All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>,
	"Tetsuo Handa" <penguin-kernel@I-love.SAKURA.ne.jp>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Petr Mladek" <pmladek@suse.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"John Ogness" <john.ogness@linutronix.de>,
	"Patrick Daly" <quic_pdaly@quicinc.com>,
	"Sergey Senozhatsky" <senozhatsky@chromium.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: [PATCH 5.15 68/73] mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
Date: Mon, 24 Apr 2023 15:17:22 +0200	[thread overview]
Message-ID: <20230424131131.597501030@linuxfoundation.org> (raw)
In-Reply-To: <20230424131129.040707961@linuxfoundation.org>

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

commit 1007843a91909a4995ee78a538f62d8665705b66 upstream.

syzbot is reporting circular locking dependency which involves
zonelist_update_seq seqlock [1], for this lock is checked by memory
allocation requests which do not need to be retried.

One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.

  CPU0
  ----
  __build_all_zonelists() {
    write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
    // e.g. timer interrupt handler runs at this moment
      some_timer_func() {
        kmalloc(GFP_ATOMIC) {
          __alloc_pages_slowpath() {
            read_seqbegin(&zonelist_update_seq) {
              // spins forever because zonelist_update_seq.seqcount is odd
            }
          }
        }
      }
    // e.g. timer interrupt handler finishes
    write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
  }

This deadlock scenario can be easily eliminated by not calling
read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
requests.  But Michal Hocko does not know whether we should go with this
approach.

Another deadlock scenario which syzbot is reporting is a race between
kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
port->lock held and printk() from __build_all_zonelists() with
zonelist_update_seq held.

  CPU0                                   CPU1
  ----                                   ----
  pty_write() {
    tty_insert_flip_string_and_push_buffer() {
                                         __build_all_zonelists() {
                                           write_seqlock(&zonelist_update_seq);
                                           build_zonelists() {
                                             printk() {
                                               vprintk() {
                                                 vprintk_default() {
                                                   vprintk_emit() {
                                                     console_unlock() {
                                                       console_flush_all() {
                                                         console_emit_next_record() {
                                                           con->write() = serial8250_console_write() {
      spin_lock_irqsave(&port->lock, flags);
      tty_insert_flip_string() {
        tty_insert_flip_string_fixed_flag() {
          __tty_buffer_request_room() {
            tty_buffer_alloc() {
              kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                __alloc_pages_slowpath() {
                  zonelist_iter_begin() {
                    read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                             spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                    }
                  }
                }
              }
            }
          }
        }
      }
      spin_unlock_irqrestore(&port->lock, flags);
                                                             // message is printed to console
                                                             spin_unlock_irqrestore(&port->lock, flags);
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                               }
                                             }
                                           }
                                           write_sequnlock(&zonelist_update_seq);
                                         }
    }
  }

This deadlock scenario can be eliminated by

  preventing interrupt context from calling kmalloc(GFP_ATOMIC)

and

  preventing printk() from calling console_flush_all()

while zonelist_update_seq.seqcount is odd.

Since Petr Mladek thinks that __build_all_zonelists() can become a
candidate for deferring printk() [2], let's address this problem by

  disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)

and

  disabling synchronous printk() in order to avoid console_flush_all()

.

As a side effect of minimizing duration of zonelist_update_seq.seqcount
being odd by disabling synchronous printk(), latency at
read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
__GFP_DIRECT_RECLAIM allocation requests will be reduced.  Although, from
lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
do not record unnecessary locking dependency) from interrupt context is
still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
inside
write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
section...

Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
  Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Petr Mladek <pmladek@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Patrick Daly <quic_pdaly@quicinc.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/page_alloc.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6416,7 +6416,21 @@ static void __build_all_zonelists(void *
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	unsigned long flags;
 
+	/*
+	 * Explicitly disable this CPU's interrupts before taking seqlock
+	 * to prevent any IRQ handler from calling into the page allocator
+	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 */
+	local_irq_save(flags);
+	/*
+	 * Explicitly disable this CPU's synchronous printk() before taking
+	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
+	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
+	 */
+	printk_deferred_enter();
 	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
@@ -6451,6 +6465,8 @@ static void __build_all_zonelists(void *
 	}
 
 	write_sequnlock(&zonelist_update_seq);
+	printk_deferred_exit();
+	local_irq_restore(flags);
 }
 
 static noinline void __init



  parent reply	other threads:[~2023-04-24 13:21 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-24 13:16 [PATCH 5.15 00/73] 5.15.109-rc1 review Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 01/73] ARM: dts: rockchip: fix a typo error for rk3288 spdif node Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 02/73] arm64: dts: qcom: ipq8074-hk01: enable QMP device, not the PHY node Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 03/73] arm64: dts: meson-g12-common: specify full DMC range Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 04/73] arm64: dts: imx8mm-evk: correct pmic clock source Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 05/73] netfilter: br_netfilter: fix recent physdev match breakage Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 06/73] regulator: fan53555: Explicitly include bits header Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 07/73] regulator: fan53555: Fix wrong TCS_SLEW_MASK Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 08/73] net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 09/73] virtio_net: bugfix overflow inside xdp_linearize_page() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 10/73] sfc: Split STATE_READY in to STATE_NET_DOWN and STATE_NET_UP Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 11/73] sfc: Fix use-after-free due to selftest_work Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 12/73] netfilter: nf_tables: fix ifdef to also consider nf_tables=m Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 13/73] i40e: fix accessing vsi->active_filters without holding lock Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 14/73] i40e: fix i40e_setup_misc_vector() error handling Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 15/73] netfilter: nf_tables: validate catch-all set elements Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 16/73] netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 17/73] bnxt_en: Do not initialize PTP on older P3/P4 chips Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 18/73] mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 19/73] bonding: Fix memory leak when changing bond type to Ethernet Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 20/73] net: rpl: fix rpl header size calculation Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 21/73] mlxsw: pci: Fix possible crash during initialization Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 22/73] spi: spi-rockchip: Fix missing unwind goto in rockchip_sfc_probe() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 23/73] bpf: Fix incorrect verifier pruning due to missing register precision taints Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 24/73] e1000e: Disable TSO on i219-LM card to increase speed Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 25/73] f2fs: Fix f2fs_truncate_partial_nodes ftrace event Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 26/73] Input: i8042 - add quirk for Fujitsu Lifebook A574/H Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 27/73] platform/x86 (gigabyte-wmi): Add support for A320M-S2H V2 Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 28/73] selftests: sigaltstack: fix -Wuninitialized Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 29/73] scsi: megaraid_sas: Fix fw_crash_buffer_show() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 30/73] scsi: core: Improve scsi_vpd_inquiry() checks Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 31/73] net: dsa: b53: mmap: add phy ops Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 32/73] s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 33/73] nvme-tcp: fix a possible UAF when failing to allocate an io queue Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 34/73] xen/netback: use same error messages for same errors Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 35/73] platform/x86: gigabyte-wmi: add support for X570S AORUS ELITE Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 36/73] rtmutex: Add acquire semantics for rtmutex lock acquisition slow path Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 37/73] iio: light: tsl2772: fix reading proximity-diodes from device tree Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 38/73] nilfs2: initialize unused bytes in segment summary blocks Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 39/73] memstick: fix memory leak if card device is never registered Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 40/73] kernel/sys.c: fix and improve control flow in __sys_setres[ug]id() Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 41/73] mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25 Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 42/73] drm/i915: Fix fast wake AUX sync len Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 43/73] mm/khugepaged: check again on anon uffd-wp during isolation Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 44/73] mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pages Greg Kroah-Hartman
2023-04-24 13:16 ` [PATCH 5.15 45/73] sched/uclamp: Fix fits_capacity() check in feec() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 46/73] sched/uclamp: Make cpu_overutilized() use util_fits_cpu() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 47/73] sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()s early exit condition Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 48/73] sched/fair: Detect capacity inversion Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 49/73] sched/fair: Consider capacity inversion in util_fits_cpu() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 50/73] sched/uclamp: Fix a uninitialized variable warnings Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 51/73] sched/fair: Fixes for capacity inversion detection Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 52/73] MIPS: Define RUNTIME_DISCARD_EXIT in LD script Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 53/73] [PATCH v2 stable-5.10.y stable-5.15.y] docs: futex: Fix kernel-doc references after code split-up preparation Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 54/73] purgatory: fix disabling debug info Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 55/73] fuse: fix attr version comparison in fuse_read_update_size() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 56/73] fuse: always revalidate rename target dentry Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 57/73] fuse: fix deadlock between atomic O_TRUNC and page invalidation Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 58/73] udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM) Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 59/73] tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 60/73] inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 61/73] dccp: Call inet6_destroy_sock() via sk->sk_destruct() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 62/73] sctp: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 63/73] pwm: meson: Explicitly set .polarity in .get_state() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 64/73] pwm: iqs620a: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 65/73] pwm: hibvt: " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 66/73] counter: 104-quad-8: Fix race condition between FLAG and CNTR reads Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 67/73] iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger() Greg Kroah-Hartman
2023-04-24 13:17 ` Greg Kroah-Hartman [this message]
2023-04-24 13:17 ` [PATCH 5.15 69/73] ASoC: fsl_asrc_dma: fix potential null-ptr-deref Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 70/73] ASN.1: Fix check for strdup() success Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 71/73] soc: sifive: l2_cache: fix missing iounmap() in error path in sifive_l2_init() Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 72/73] soc: sifive: l2_cache: fix missing free_irq() " Greg Kroah-Hartman
2023-04-24 13:17 ` [PATCH 5.15 73/73] soc: sifive: l2_cache: fix missing of_node_put() " Greg Kroah-Hartman
2023-04-25  1:14 ` [PATCH 5.15 00/73] 5.15.109-rc1 review Guenter Roeck
2023-04-25  2:42 ` Bagas Sanjaya
2023-04-25 10:44 ` Jon Hunter
2023-04-25 13:37 ` Chris Paterson
2023-04-25 13:41 ` Naresh Kamboju
2023-04-25 14:01 ` Harshit Mogalapalli
2023-04-25 20:30 ` Florian Fainelli
2023-04-25 21:03 ` Ron Economos
2023-04-26  0:24 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230424131131.597501030@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=john.ogness@linutronix.de \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=patches@lists.linux.dev \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=pmladek@suse.com \
    --cc=quic_pdaly@quicinc.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.