public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>,
	"Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Petr Mladek" <pmladek@suse.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"John Ogness" <john.ogness@linutronix.de>,
	"Patrick Daly" <quic_pdaly@quicinc.com>,
	"Sergey Senozhatsky" <senozhatsky@chromium.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: [PATCH 5.15 68/73] mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
Date: Mon, 24 Apr 2023 15:17:22 +0200	[thread overview]
Message-ID: <20230424131131.597501030@linuxfoundation.org> (raw)
In-Reply-To: <20230424131129.040707961@linuxfoundation.org>

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.

syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests.  But Michal Hocko does not know whether we should go with this
approach.

Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                    }
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by

  preventing interrupt context from calling kmalloc(GFP_ATOMIC)

and

  preventing printk() from calling console_flush_all()

while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by

  disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)

and

  disabling synchronous printk() in order to avoid console_flush_all()

.

As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced.  Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...

Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
  Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_alloc.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6416,7 +6416,21 @@ static void __build_all_zonelists(void *
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	unsigned long flags;
 
+	/*
+	 * Explicitly disable this CPU's interrupts before taking seqlock
+	 * to prevent any IRQ handler from calling into the page allocator
+	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 */
+	local_irq_save(flags);
+	/*
+	 * Explicitly disable this CPU's synchronous printk() before taking
+	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
+	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+	 */
+	printk_deferred_enter();
 	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
@@ -6451,6 +6465,8 @@ static void __build_all_zonelists(void *
 	}
 
 	write_sequnlock(&zonelist_update_seq);
+	printk_deferred_exit();
+	local_irq_restore(flags);
 }
 
 static noinline void __init



  parent reply	other threads:[~2023-04-24 13:22 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-24 13:16 [PATCH 5.15 00/73] 5.15.109-rc1 review Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 01/73] ARM: dts: rockchip: fix a typo error for rk3288 spdif node Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 02/73] arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 03/73] arm64: dts: meson-g12-common: specify full DMC range Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 04/73] arm64: dts: imx8mm-evk: correct pmic clock source Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 05/73] netfilter: br_netfilter: fix recent physdev match breakage Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 06/73] regulator: fan53555: Explicitly include bits header Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 07/73] regulator: fan53555: Fix wrong TCS_SLEW_MASK Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 08/73] net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 09/73] virtio_net: bugfix overflow inside xdp_linearize_page() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 10/73] sfc: Split STATE_READY in to STATE_NET_DOWN and STATE_NET_UP Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 11/73] sfc: Fix use-after-free due to selftest_work Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 12/73] netfilter: nf_tables: fix ifdef to also consider nf_tables=m Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 13/73] i40e: fix accessing vsi->active_filters without holding lock Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 14/73] i40e: fix i40e_setup_misc_vector() error handling Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 15/73] netfilter: nf_tables: validate catch-all set elements Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 16/73] netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 17/73] bnxt_en: Do not initialize PTP on older P3/P4 chips Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 18/73] mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 19/73] bonding: Fix memory leak when changing bond type to Ethernet Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 20/73] net: rpl: fix rpl header size calculation Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 21/73] mlxsw: pci: Fix possible crash during initialization Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 22/73] spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 23/73] bpf: Fix incorrect verifier pruning due to missing register precision taints Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 24/73] e1000e: Disable TSO on i219-LM card to increase speed Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 25/73] f2fs: Fix f2fs_truncate_partial_nodes ftrace event Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 26/73] Input: i8042 - add quirk for Fujitsu Lifebook A574/H Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 27/73] platform/x86 (gigabyte-wmi): Add support for A320M-S2H V2 Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 28/73] selftests: sigaltstack: fix -Wuninitialized Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 29/73] scsi: megaraid_sas: Fix fw_crash_buffer_show() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 30/73] scsi: core: Improve scsi_vpd_inquiry() checks Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 31/73] net: dsa: b53: mmap: add phy ops Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 32/73] s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 33/73] nvme-tcp: fix a possible UAF when failing to allocate an io queue Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 34/73] xen/netback: use same error messages for same errors Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 35/73] platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 36/73] rtmutex: Add acquire semantics for rtmutex lock acquisition slow path Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 37/73] iio: light: tsl2772: fix reading proximity-diodes from device tree Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 38/73] nilfs2: initialize unused bytes in segment summary blocks Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 39/73] memstick: fix memory leak if card device is never registered Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 40/73] kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 41/73] mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25 Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 42/73] drm/i915: Fix fast wake AUX sync len Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 43/73] mm/khugepaged: check again on anon uffd-wp during isolation Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 44/73] mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 45/73] sched/uclamp: Fix fits_capacity() check in feec() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 46/73] sched/uclamp: Make cpu_overutilized() use util_fits_cpu() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 47/73] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()s early exit condition Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 48/73] sched/fair: Detect capacity inversion Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 49/73] sched/fair: Consider capacity inversion in util_fits_cpu() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 50/73] sched/uclamp: Fix a uninitialized variable warnings Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 51/73] sched/fair: Fixes for capacity inversion detection Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 52/73] MIPS: Define RUNTIME_DISCARD_EXIT in LD script Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 53/73] [PATCH v2 stable-5.10.y stable-5.15.y] docs: futex: Fix kernel-doc references after code split-up preparation Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 54/73] purgatory: fix disabling debug info Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 55/73] fuse: fix attr version comparison in fuse_read_update_size() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 56/73] fuse: always revalidate rename target dentry Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 57/73] fuse: fix deadlock between atomic O_TRUNC and page invalidation Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 58/73] udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM) Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 59/73] tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 60/73] inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 61/73] dccp: Call inet6_destroy_sock() via sk->sk_destruct() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 62/73] sctp: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 63/73] pwm: meson: Explicitly set .polarity in .get_state() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 64/73] pwm: iqs620a: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 65/73] pwm: hibvt: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 66/73] counter: 104-quad-8: Fix race condition between FLAG and CNTR reads Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 67/73] iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger() Greg Kroah-Hartman
2023-04-24 13:17 ` Greg Kroah-Hartman [this message]
2023-04-24 13:17 ` [PATCH 5.15 69/73] ASoC: fsl_asrc_dma: fix potential null-ptr-deref Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 70/73] ASN.1: Fix check for strdup() success Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 71/73] soc: sifive: l2_cache: fix missing iounmap() in error path in sifive_l2_init() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 72/73] soc: sifive: l2_cache: fix missing free_irq() " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 73/73] soc: sifive: l2_cache: fix missing of_node_put() " Greg Kroah-Hartman
2023-04-25  1:14 ` [PATCH 5.15 00/73] 5.15.109-rc1 review Guenter Roeck
2023-04-25  2:42 ` Bagas Sanjaya
2023-04-25 13:37 ` Chris Paterson
2023-04-25 13:41 ` Naresh Kamboju
2023-04-25 14:01 ` Harshit Mogalapalli
2023-04-25 20:30 ` Florian Fainelli
2023-04-25 21:03 ` Ron Economos
2023-04-26  0:24 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230424131131.597501030@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=john.ogness@linutronix.de \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=patches@lists.linux.dev \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=pmladek@suse.com \
    --cc=quic_pdaly@quicinc.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox