* + mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch added to mm-unstable branch
@ 2023-07-02 23:40 Andrew Morton
  2023-07-03  0:09 ` Tetsuo Handa
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2023-07-02 23:40 UTC (permalink / raw)
  To: mm-commits, will, tglx, pmladek, peterz, penguin-kernel, mingo,
	mhocko, mgorman, longman, lgoncalv, john.ogness, david,
	boqun.feng, bigeasy, akpm


The patch titled
     Subject: mm/page_alloc: use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
has been added to the -mm mm-unstable branch.  Its filename is
     mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: mm/page_alloc: use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
Date: Fri, 23 Jun 2023 22:15:17 +0200

__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock().  This is troublesome and leads to problems on PREEMPT_RT.
The problem is that the inner spinlock_t becomes a sleeping lock on
PREEMPT_RT and must not be acquired with disabled interrupts.

The API provides write_seqlock_irqsave() which does the right thing in one
step.  printk_deferred_enter() has to be invoked in a non-migratable
context to ensure that deferred printing is enabled and disabled on the
same CPU.  This is the case after zonelist_update_seq has been acquired.
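
As an illustration of why the writer must keep interrupts off on its own
CPU, here is a minimal userspace sketch of a sequence counter.  It is not
the kernel's seqlock implementation; every model_* name is hypothetical and
exists only for this example.  It assumes the usual convention that the
counter is odd while a writer is inside, so a reader that runs on top of an
unfinished writer can never get a consistent snapshot.

```c
/* Userspace model of a sequence counter; not the kernel implementation. */
#include <assert.h>
#include <stdbool.h>

struct model_seq {
	unsigned int sequence;	/* odd while a writer is inside */
	int data;
};

static void model_write_begin(struct model_seq *s)
{
	s->sequence++;		/* becomes odd: update in progress */
}

static void model_write_end(struct model_seq *s)
{
	assert(s->sequence & 1);	/* must pair with model_write_begin() */
	s->sequence++;		/* becomes even: update complete */
}

/* One reader attempt; returns true if the snapshot in *out is consistent. */
static bool model_try_read(const struct model_seq *s, int *out)
{
	unsigned int start = s->sequence;

	if (start & 1)
		return false;	/* writer active: a real reader would spin */
	*out = s->data;
	return s->sequence == start;
}

/*
 * Count the failed attempts of a reader that runs while the writer is
 * still inside its critical section.  On a real CPU the interrupted
 * writer cannot make progress, so the reader would spin forever; the
 * model gives up after max_tries.
 */
static int model_reader_inside_writer(int max_tries)
{
	struct model_seq s = { .sequence = 0, .data = 1 };
	int failed = 0, value;

	model_write_begin(&s);
	while (failed < max_tries && !model_try_read(&s, &value))
		failed++;
	model_write_end(&s);
	return failed;
}
```

In this model every attempt made between model_write_begin() and
model_write_end() fails, which corresponds to the livelock that the
original code comment in __build_all_zonelists() describes for an IRQ
handler reaching zonelist_iter_begin.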

There was discussion on the first submission that the order should be:
	local_irq_disable();
	printk_deferred_enter();
	write_seqlock();

to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked.  The
only origin of such a printk() can be a lockdep splat because the lockdep
annotation happens after the sequence count is incremented.  This is
exceptional and subject to change.
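
The ordering concern above can be sketched with a small userspace model.
This is only an illustration under stated assumptions: the model_* names
are hypothetical, the "deferred depth" counter merely stands in for the
effect of printk_deferred_enter()/printk_deferred_exit(), and the message
"raised while locking" stands in for something like a lockdep splat.

```c
/* Hypothetical model of the ordering concern; names are illustrative only. */
#include <assert.h>

struct model_cpu {
	int deferred_depth;	/* >0 models printk_deferred_enter() in effect */
};

enum model_path { MODEL_SYNC, MODEL_DEFERRED };

/* Classify a message emitted in the current state. */
static enum model_path model_emit(const struct model_cpu *cpu)
{
	assert(cpu->deferred_depth >= 0);
	return cpu->deferred_depth > 0 ? MODEL_DEFERRED : MODEL_SYNC;
}

/* Order from the first-submission discussion: defer printk, then lock. */
static enum model_path model_splat_defer_then_lock(void)
{
	struct model_cpu cpu = { 0 };
	enum model_path during_lock;

	cpu.deferred_depth++;			/* defer first */
	during_lock = model_emit(&cpu);		/* message raised while locking */
	cpu.deferred_depth--;
	return during_lock;
}

/* Order used by this patch: lock first, then defer printk. */
static enum model_path model_splat_lock_then_defer(void)
{
	struct model_cpu cpu = { 0 };
	enum model_path during_lock;

	during_lock = model_emit(&cpu);		/* message raised while locking */
	cpu.deferred_depth++;			/* defer only afterwards */
	cpu.deferred_depth--;
	return during_lock;
}
```

In the model, only the lock-then-defer order lets a message raised during
lock acquisition take the synchronous path, which is the window the
discussion was about.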

It was also pointed out that PREEMPT_RT can be affected by the printk problem
since its write_seqlock_irqsave() does not really disable interrupts.
This isn't the case because PREEMPT_RT's printk implementation differs
from the mainline implementation in two important aspects:

- Printing happens in dedicated threads and not during the
  invocation of printk().
- In emergency cases where synchronous printing is used, a different
  driver is used which does not use tty_port::lock.

Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.

Link: https://lkml.kernel.org/r/20230623201517.yw286Knb@linutronix.de
Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/page_alloc.c |   15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save
+++ a/mm/page_alloc.c
@@ -5175,19 +5175,17 @@ static void __build_all_zonelists(void *
 	unsigned long flags;
 
 	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 * The zonelist_update_seq must be acquired with irqsave because the
+	 * reader can be invoked from IRQ with GFP_ATOMIC.
 	 */
-	local_irq_save(flags);
+	write_seqlock_irqsave(&zonelist_update_seq, flags);
 	/*
-	 * Explicitly disable this CPU's synchronous printk() before taking
-	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * Also disable synchronous printk() to prevent any printk() from
+	 * trying to hold port->lock, for
 	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
 	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
 	 */
 	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -5224,9 +5222,8 @@ static void __build_all_zonelists(void *
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
-	local_irq_restore(flags);
+	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
 }
 
 static noinline void __init
_

Patches currently in -mm which might be from bigeasy@linutronix.de are

seqlock-do-the-lockdep-annotation-before-locking-in-do_write_seqcount_begin_nested.patch
mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch



* Re: + mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch added to mm-unstable branch
  2023-07-02 23:40 + mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch added to mm-unstable branch Andrew Morton
@ 2023-07-03  0:09 ` Tetsuo Handa
  2023-07-03  8:00   ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Tetsuo Handa @ 2023-07-03  0:09 UTC (permalink / raw)
  To: Andrew Morton, bigeasy, pmladek
  Cc: will, tglx, peterz, mingo, mhocko, mgorman, longman, lgoncalv,
	john.ogness, david, boqun.feng, mm-commits

Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

because of https://lkml.kernel.org/r/a1c559b7-335e-5401-d167-301c5b1cd312@I-love.SAKURA.ne.jp .




* Re: + mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch added to mm-unstable branch
  2023-07-03  0:09 ` Tetsuo Handa
@ 2023-07-03  8:00   ` Michal Hocko
  2023-07-03  8:39     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2023-07-03  8:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, bigeasy, pmladek, will, tglx, peterz, mingo,
	mgorman, longman, lgoncalv, john.ogness, david, boqun.feng,
	mm-commits

On Mon 03-07-23 09:09:46, Tetsuo Handa wrote:
> Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
> because of https://lkml.kernel.org/r/a1c559b7-335e-5401-d167-301c5b1cd312@I-love.SAKURA.ne.jp .

This is not really a productive approach! You are raising non-material
concerns you haven't proven to be real. This is blocking an otherwise
useful fix. I am completely fine with recording your nack with a reference
to your concern, so that we can easily revert and find a different
solution should we ever trip over it, but I do not believe this should
stand in the way of the fix.

Now concerning patch 1 in the series, I do agree it should go
through the lockdep maintainers. But this fix is not really
dependent on it.


-- 
Michal Hocko
SUSE Labs


* Re: + mm-page_alloc-use-write_seqlock_irqsave-instead-write_seqlock-local_irq_save.patch added to mm-unstable branch
  2023-07-03  8:00   ` Michal Hocko
@ 2023-07-03  8:39     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 4+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-07-03  8:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Andrew Morton, pmladek, will, tglx, peterz, mingo,
	mgorman, longman, lgoncalv, john.ogness, david, boqun.feng,
	mm-commits

On 2023-07-03 10:00:47 [+0200], Michal Hocko wrote:
> On Mon 03-07-23 09:09:46, Tetsuo Handa wrote:
> > Nacked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > 
> > because of https://lkml.kernel.org/r/a1c559b7-335e-5401-d167-301c5b1cd312@I-love.SAKURA.ne.jp .
> 
> This is not really a productive approach! You are raising non-material
> concerns you haven't proven to be real. This is blocking an otherwise
> useful fix. I am completely fine with recording your nack with a reference
> to your concern, so that we can easily revert and find a different
> solution should we ever trip over it, but I do not believe this should
> stand in the way of the fix.
> 
> Now concerning patch 1 in the series, I do agree it should go
> through the lockdep maintainers. But this fix is not really
> dependent on it.

I fully agree here.

Sebastian
