From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>,
"Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>,
"Michal Hocko" <mhocko@suse.com>,
"Mel Gorman" <mgorman@techsingularity.net>,
"Petr Mladek" <pmladek@suse.com>,
"David Hildenbrand" <david@redhat.com>,
"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
"John Ogness" <john.ogness@linutronix.de>,
"Patrick Daly" <quic_pdaly@quicinc.com>,
"Sergey Senozhatsky" <senozhatsky@chromium.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Andrew Morton" <akpm@linux-foundation.org>
Subject: [PATCH 5.15 68/73] mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
Date: Mon, 24 Apr 2023 15:17:22 +0200 [thread overview]
Message-ID: <20230424131131.597501030@linuxfoundation.org> (raw)
In-Reply-To: <20230424131129.040707961@linuxfoundation.org>
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.
syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.
One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
CPU0
----
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
// e.g. timer interrupt handler runs at this moment
some_timer_func() {
kmalloc(GFP_ATOMIC) {
__alloc_pages_slowpath() {
read_seqbegin(&zonelist_update_seq) {
// spins forever because zonelist_update_seq.seqcount is odd
}
}
}
}
// e.g. timer interrupt handler finishes
write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
}
This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests. But Michal Hocko does not know whether we should go with this
approach.
Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.
CPU0 CPU1
---- ----
pty_write() {
tty_insert_flip_string_and_push_buffer() {
__build_all_zonelists() {
write_seqlock(&zonelist_update_seq);
build_zonelists() {
printk() {
vprintk() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_flush_all() {
console_emit_next_record() {
con->write() = serial8250_console_write() {
spin_lock_irqsave(&port->lock, flags);
tty_insert_flip_string() {
tty_insert_flip_string_fixed_flag() {
__tty_buffer_request_room() {
tty_buffer_alloc() {
kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
__alloc_pages_slowpath() {
zonelist_iter_begin() {
read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
}
}
}
}
}
}
}
}
spin_unlock_irqrestore(&port->lock, flags);
// message is printed to console
spin_unlock_irqrestore(&port->lock, flags);
}
}
}
}
}
}
}
}
}
write_sequnlock(&zonelist_update_seq);
}
}
}
This deadlock scenario can be eliminated by
preventing interrupt context from calling kmalloc(GFP_ATOMIC)
and
preventing printk() from calling console_flush_all()
while zonelist_update_seq.seqcount is odd.
Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by
disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
and
disabling synchronous printk() in order to avoid console_flush_all()
.
As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...
Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/page_alloc.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6416,7 +6416,21 @@ static void __build_all_zonelists(void *
int nid;
int __maybe_unused cpu;
pg_data_t *self = data;
+ unsigned long flags;
+ /*
+ * Explicitly disable this CPU's interrupts before taking seqlock
+ * to prevent any IRQ handler from calling into the page allocator
+ * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ */
+ local_irq_save(flags);
+ /*
+ * Explicitly disable this CPU's synchronous printk() before taking
+ * seqlock to prevent any printk() from trying to hold port->lock, for
+ * tty_insert_flip_string_and_push_buffer() on other CPU might be
+ * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+ */
+ printk_deferred_enter();
write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
@@ -6451,6 +6465,8 @@ static void __build_all_zonelists(void *
}
write_sequnlock(&zonelist_update_seq);
+ printk_deferred_exit();
+ local_irq_restore(flags);
}
static noinline void __init
next prev parent reply other threads:[~2023-04-24 13:22 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-24 13:16 [PATCH 5.15 00/73] 5.15.109-rc1 review Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 01/73] ARM: dts: rockchip: fix a typo error for rk3288 spdif node Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 02/73] arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 03/73] arm64: dts: meson-g12-common: specify full DMC range Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 04/73] arm64: dts: imx8mm-evk: correct pmic clock source Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 05/73] netfilter: br_netfilter: fix recent physdev match breakage Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 06/73] regulator: fan53555: Explicitly include bits header Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 07/73] regulator: fan53555: Fix wrong TCS_SLEW_MASK Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 08/73] net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 09/73] virtio_net: bugfix overflow inside xdp_linearize_page() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 10/73] sfc: Split STATE_READY in to STATE_NET_DOWN and STATE_NET_UP Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 11/73] sfc: Fix use-after-free due to selftest_work Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 12/73] netfilter: nf_tables: fix ifdef to also consider nf_tables=m Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 13/73] i40e: fix accessing vsi->active_filters without holding lock Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 14/73] i40e: fix i40e_setup_misc_vector() error handling Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 15/73] netfilter: nf_tables: validate catch-all set elements Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 16/73] netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 17/73] bnxt_en: Do not initialize PTP on older P3/P4 chips Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 18/73] mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 19/73] bonding: Fix memory leak when changing bond type to Ethernet Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 20/73] net: rpl: fix rpl header size calculation Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 21/73] mlxsw: pci: Fix possible crash during initialization Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 22/73] spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 23/73] bpf: Fix incorrect verifier pruning due to missing register precision taints Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 24/73] e1000e: Disable TSO on i219-LM card to increase speed Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 25/73] f2fs: Fix f2fs_truncate_partial_nodes ftrace event Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 26/73] Input: i8042 - add quirk for Fujitsu Lifebook A574/H Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 27/73] platform/x86 (gigabyte-wmi): Add support for A320M-S2H V2 Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 28/73] selftests: sigaltstack: fix -Wuninitialized Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 29/73] scsi: megaraid_sas: Fix fw_crash_buffer_show() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 30/73] scsi: core: Improve scsi_vpd_inquiry() checks Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 31/73] net: dsa: b53: mmap: add phy ops Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 32/73] s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 33/73] nvme-tcp: fix a possible UAF when failing to allocate an io queue Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 34/73] xen/netback: use same error messages for same errors Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 35/73] platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 36/73] rtmutex: Add acquire semantics for rtmutex lock acquisition slow path Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 37/73] iio: light: tsl2772: fix reading proximity-diodes from device tree Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 38/73] nilfs2: initialize unused bytes in segment summary blocks Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 39/73] memstick: fix memory leak if card device is never registered Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 40/73] kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 41/73] mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25 Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 42/73] drm/i915: Fix fast wake AUX sync len Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 43/73] mm/khugepaged: check again on anon uffd-wp during isolation Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 44/73] mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 45/73] sched/uclamp: Fix fits_capacity() check in feec() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 46/73] sched/uclamp: Make cpu_overutilized() use util_fits_cpu() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 47/73] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()s early exit condition Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 48/73] sched/fair: Detect capacity inversion Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 49/73] sched/fair: Consider capacity inversion in util_fits_cpu() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 50/73] sched/uclamp: Fix a uninitialized variable warnings Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 51/73] sched/fair: Fixes for capacity inversion detection Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 52/73] MIPS: Define RUNTIME_DISCARD_EXIT in LD script Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 53/73] [PATCH v2 stable-5.10.y stable-5.15.y] docs: futex: Fix kernel-doc references after code split-up preparation Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 54/73] purgatory: fix disabling debug info Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 55/73] fuse: fix attr version comparison in fuse_read_update_size() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 56/73] fuse: always revalidate rename target dentry Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 57/73] fuse: fix deadlock between atomic O_TRUNC and page invalidation Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 58/73] udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM) Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 59/73] tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 60/73] inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 61/73] dccp: Call inet6_destroy_sock() via sk->sk_destruct() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 62/73] sctp: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 63/73] pwm: meson: Explicitly set .polarity in .get_state() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 64/73] pwm: iqs620a: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 65/73] pwm: hibvt: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 66/73] counter: 104-quad-8: Fix race condition between FLAG and CNTR reads Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 67/73] iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger() Greg Kroah-Hartman
2023-04-24 13:17 ` Greg Kroah-Hartman [this message]
2023-04-24 13:17 ` [PATCH 5.15 69/73] ASoC: fsl_asrc_dma: fix potential null-ptr-deref Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 70/73] ASN.1: Fix check for strdup() success Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 71/73] soc: sifive: l2_cache: fix missing iounmap() in error path in sifive_l2_init() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 72/73] soc: sifive: l2_cache: fix missing free_irq() " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 73/73] soc: sifive: l2_cache: fix missing of_node_put() " Greg Kroah-Hartman
2023-04-25 1:14 ` [PATCH 5.15 00/73] 5.15.109-rc1 review Guenter Roeck
2023-04-25 2:42 ` Bagas Sanjaya
2023-04-25 13:37 ` Chris Paterson
2023-04-25 13:41 ` Naresh Kamboju
2023-04-25 14:01 ` Harshit Mogalapalli
2023-04-25 20:30 ` Florian Fainelli
2023-04-25 21:03 ` Ron Economos
2023-04-26 0:24 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230424131131.597501030@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=john.ogness@linutronix.de \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=patches@lists.linux.dev \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=pmladek@suse.com \
--cc=quic_pdaly@quicinc.com \
--cc=rostedt@goodmis.org \
--cc=senozhatsky@chromium.org \
--cc=stable@vger.kernel.org \
--cc=syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox