* [merged mm-stable] mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch removed from -mm tree
@ 2023-08-11 22:59 Andrew Morton
0 siblings, 0 replies; only message in thread
From: Andrew Morton @ 2023-08-11 22:59 UTC (permalink / raw)
To: mm-commits, will, tglx, pmladek, peterz, penguin-kernel, mingo,
mhocko, mgorman, longman, lgoncalv, john.ogness, david,
boqun.feng, bigeasy, akpm
The quilt patch titled
Subject: mm/page_alloc: use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
has been removed from the -mm tree. Its filename was
mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/page_alloc: use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Fri, 23 Jun 2023 22:15:17 +0200
__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock(). This is troublesome and leads to problems on PREEMPT_RT.
The problem is that the inner spinlock_t becomes a sleeping lock on
PREEMPT_RT and must not be acquired with disabled interrupts.
The API provides write_seqlock_irqsave() which does the right thing in one
step. printk_deferred_enter() has to be invoked in non-migrate-able
context to ensure that deferred printing is enabled and disabled on the
same CPU. This is the case after zonelist_update_seq has been acquired.
There was discussion on the first submission that the order should be:
local_irq_disable();
printk_deferred_enter();
write_seqlock();
to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
only origin of such a printk() can be a lockdep splat because the lockdep
annotation happens after the sequence count is incremented. This is
exceptional and subject to change.
It was also pointed that PREEMPT_RT can be affected by the printk problem
since its write_seqlock_irqsave() does not really disable interrupts.
This isn't the case because PREEMPT_RT's printk implementation differs
from the mainline implementation in two important aspects:
- Printing happens in a dedicated threads and not at during the
invocation of printk().
- In emergency cases where synchronous printing is used, a different
driver is used which does not use tty_port::lock.
Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.
Link: https://lkml.kernel.org/r/20230623201517.yw286Knb@linutronix.de
Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/page_alloc.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)
--- a/mm/page_alloc.c~mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save
+++ a/mm/page_alloc.c
@@ -5139,19 +5139,17 @@ static void __build_all_zonelists(void *
unsigned long flags;
/*
- * Explicitly disable this CPU's interrupts before taking seqlock
- * to prevent any IRQ handler from calling into the page allocator
- * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+ * The zonelist_update_seq must be acquired with irqsave because the
+ * reader can be invoked from IRQ with GFP_ATOMIC.
*/
- local_irq_save(flags);
+ write_seqlock_irqsave(&zonelist_update_seq, flags);
/*
- * Explicitly disable this CPU's synchronous printk() before taking
- * seqlock to prevent any printk() from trying to hold port->lock, for
+ * Also disable synchronous printk() to prevent any printk() from
+ * trying to hold port->lock, for
* tty_insert_flip_string_and_push_buffer() on other CPU might be
* calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
*/
printk_deferred_enter();
- write_seqlock(&zonelist_update_seq);
#ifdef CONFIG_NUMA
memset(node_load, 0, sizeof(node_load));
@@ -5188,9 +5186,8 @@ static void __build_all_zonelists(void *
#endif
}
- write_sequnlock(&zonelist_update_seq);
printk_deferred_exit();
- local_irq_restore(flags);
+ write_sequnlock_irqrestore(&zonelist_update_seq, flags);
}
static noinline void __init
_
Patches currently in -mm which might be from bigeasy@linutronix.de are
seqlock-do-the-lockdep-annotation-before-locking-in-do_write_seqcount_begin_nested.patch
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2023-08-11 22:59 UTC | newest]
Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-11 22:59 [merged mm-stable] mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch removed from -mm tree Andrew Morton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.