From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org, will@kernel.org, tglx@linutronix.de,
pmladek@suse.com, peterz@infradead.org,
penguin-kernel@I-love.SAKURA.ne.jp, mingo@redhat.com,
mhocko@suse.com, mgorman@techsingularity.net, longman@redhat.com,
lgoncalv@redhat.com, john.ogness@linutronix.de, david@redhat.com,
boqun.feng@gmail.com, bigeasy@linutronix.de,
akpm@linux-foundation.org
Subject: + mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch added to mm-unstable branch
Date: Sun, 02 Jul 2023 16:40:49 -0700
Message-ID: <20230702234049.DCE06C433C8@smtp.kernel.org>
The patch titled
Subject: mm/page_alloc: use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
has been added to the -mm mm-unstable branch. Its filename is
mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/page_alloc: use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Fri, 23 Jun 2023 22:15:17 +0200
__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock(). This is troublesome and leads to problems on PREEMPT_RT.
The problem is that the inner spinlock_t becomes a sleeping lock on
PREEMPT_RT and must not be acquired with disabled interrupts.
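As background on why interrupts get disabled around the write side at all, the following is a minimal userspace sketch of a sequence counter. It is an illustration under stated assumptions and not the kernel's seqlock_t: all model_* names are hypothetical, the writer-side spinlock is left out, and the reader is reduced to a single polling step so the example cannot hang. A real reader keeps polling while the count is odd, so a reader running in an interrupt handler on the CPU that holds the write side would presumably never see the count become even again.

```c
/*
 * Userspace model of a sequence counter. Illustration only: this is NOT
 * the kernel's seqlock_t, and every model_* name is hypothetical.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_seqcount {
	atomic_uint seq;	/* even: idle, odd: write in progress */
};

/* Writer entry: make the count odd so readers know an update is running. */
static void model_write_begin(struct model_seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_acq_rel);
}

/* Writer exit: make the count even again. */
static void model_write_end(struct model_seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/*
 * One polling step of a reader. A real reader would loop until this
 * succeeds; the model returns false instead of spinning.
 */
static bool model_read_try_begin(struct model_seqcount *s, unsigned int *snap)
{
	unsigned int v = atomic_load_explicit(&s->seq, memory_order_acquire);

	if (v & 1)
		return false;	/* writer active: reader would have to wait */
	*snap = v;
	return true;
}

/* True if a write started or finished since the snapshot was taken. */
static bool model_read_retry(struct model_seqcount *s, unsigned int snap)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != snap;
}

/* Walk through the interrupted-writer scenario; returns 0 on success. */
static int model_selfcheck(void)
{
	struct model_seqcount s;
	unsigned int snap = 0;

	atomic_init(&s.seq, 0);
	assert(model_read_try_begin(&s, &snap));	/* idle: reader proceeds */
	model_write_begin(&s);
	/* A reader arriving here cannot make progress until the write ends. */
	assert(!model_read_try_begin(&s, &snap));
	model_write_end(&s);
	assert(model_read_retry(&s, snap));		/* data changed: retry */
	return 0;
}
```

In this model the middle step of model_selfcheck() is the interesting one: between model_write_begin() and model_write_end() no reader can start, which is harmless across CPUs but not when the reader preempts the writer on the same CPU.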
The API provides write_seqlock_irqsave() which does the right thing in one
step. printk_deferred_enter() has to be invoked in non-migratable
context to ensure that deferred printing is enabled and disabled on the
same CPU. This is the case after zonelist_update_seq has been acquired.
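The same-CPU requirement can be illustrated with a small userspace model. It assumes that deferral is tracked by a per-CPU nesting counter; the model_* names, NR_MODEL_CPUS and the explicit cpu argument are hypothetical and only stand in for "the CPU the caller currently runs on". This is a sketch of the failure mode, not the kernel's printk code.

```c
/*
 * Userspace model of per-CPU printk deferral. Illustration only: the
 * model_* names, NR_MODEL_CPUS and the counter layout are assumptions,
 * not the kernel's implementation.
 */
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 2

/* One nesting counter per CPU; non-zero means "defer console output". */
static int model_defer_depth[NR_MODEL_CPUS];

static void model_reset(void)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		model_defer_depth[cpu] = 0;
}

/* Enter/exit act on the counter of whichever CPU the caller runs on. */
static void model_deferred_enter(int cpu)
{
	model_defer_depth[cpu]++;
}

static void model_deferred_exit(int cpu)
{
	model_defer_depth[cpu]--;
}

static bool model_is_deferred(int cpu)
{
	return model_defer_depth[cpu] != 0;
}

/* True if every enter was matched by an exit on the same CPU. */
static bool model_all_balanced(void)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		if (model_defer_depth[cpu] != 0)
			return false;
	return true;
}

/* Returns true if the section left all counters balanced. */
static bool model_run_section(int enter_cpu, int exit_cpu)
{
	model_reset();
	model_deferred_enter(enter_cpu);
	/* ... critical section; the task may have migrated by now ... */
	model_deferred_exit(exit_cpu);
	return model_all_balanced();
}
```

Calling model_run_section(0, 0) leaves both counters at zero, while model_run_section(0, 1) leaves CPU 0 permanently deferred and CPU 1 with an unmatched exit. Invoking printk_deferred_enter() only after zonelist_update_seq has been acquired avoids the second case because the task cannot migrate at that point.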
There was discussion on the first submission that the order should be:
	local_irq_disable();
	printk_deferred_enter();
	write_seqlock();
to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
only origin of such a printk() can be a lockdep splat because the lockdep
annotation happens after the sequence count is incremented. This is
exceptional and subject to change.
It was also pointed out that PREEMPT_RT can be affected by the printk problem
since its write_seqlock_irqsave() does not really disable interrupts.
This isn't the case because PREEMPT_RT's printk implementation differs
from the mainline implementation in two important aspects:
- Printing happens in dedicated threads and not during the
invocation of printk().
- In emergency cases where synchronous printing is used, a different
driver is used which does not use tty_port::lock.
Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.
Link: https://lkml.kernel.org/r/20230623201517.yw286Knb@linutronix.de
Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save
+++ a/mm/page_alloc.c
@@ -5175,19 +5175,17 @@ static void __build_all_zonelists(void *
 	unsigned long flags;
 
 	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 * The zonelist_update_seq must be acquired with irqsave because the
+	 * reader can be invoked from IRQ with GFP_ATOMIC.
 	 */
-	local_irq_save(flags);
+	write_seqlock_irqsave(&zonelist_update_seq, flags);
 	/*
-	 * Explicitly disable this CPU's synchronous printk() before taking
-	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * Also disable synchronous printk() to prevent any printk() from
+	 * trying to hold port->lock, for
 	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
 	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
 	 */
 	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -5224,9 +5222,8 @@ static void __build_all_zonelists(void *
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
-	local_irq_restore(flags);
+	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
 }
 
 static noinline void __init
_
Patches currently in -mm which might be from bigeasy@linutronix.de are
seqlock-do-the-lockdep-annotation-before-locking-in-do_write_seqcount_begin_nested.patch
mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch