public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Rik van Riel <riel@surriel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Omar Sandoval <osandov@meta.com>,
	linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: Re: [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl
Date: Wed, 24 Jul 2024 14:45:49 +0200	[thread overview]
Message-ID: <ZqD3fb8mQqWwePEU@pathway.suse.cz> (raw)
In-Reply-To: <87plrcqyii.fsf@jogness.linutronix.de>

On Wed 2024-07-17 09:22:21, John Ogness wrote:
> On 2024-07-15, Rik van Riel <riel@surriel.com> wrote:
> > Both nmi_backtrace and dump_stack_lvl call printk_cpu_sync_get_irqsave.
> >
> > However, dump_stack_lvl will call into the printk code, while holding
> > the printk_cpu_sync_get lock, and then take the console lock.
> >
> > Another CPU may end up getting an NMI stack trace printed, after
> > being stuck printing something to serial console for too long,
> > with the console lock held.
> >
> > This results in the following lock order:
> > CPU A: printk_cpu_sync_get lock -> console_lock
> > CPU B: console_lock -> (nmi) printk_cpu_sync_get lock
> >
> > This will cause the system to hang with an ABBA deadlock
> 
> The console lock is acquired via trylock, so that will not yield
> deadlock here. However, if CPU B was printing, then CPU A will spin on
> @console_waiter (in console_trylock_spinning()). _That_ is a deadlock.
>
> The purpose of printk_cpu_sync_get_irqsave() is to avoid having the
> different backtraces being interleaved in the _ringbuffer_. It really
> isn't necessary that they are printed in that context. And indeed, there
> is no guarantee that they will be printed in that context anyway.

Well, backtraces are printed when unexpected things happen. It is
usually an emergency situation and we should do our best to
flush consoles ASAP.

> Perhaps a simple solution would be for printk_cpu_sync_get_irqsave() to
> call printk_deferred_enter/_exit. Something like the below patch.
> 
> John Ogness
> 
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 65c5184470f1..1a6f5aac28bf 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -315,8 +315,10 @@ extern void __printk_cpu_sync_put(void);
>  #define printk_cpu_sync_get_irqsave(flags)		\
>  	for (;;) {					\
>  		local_irq_save(flags);			\
> +		printk_deferred_enter();		\
>  		if (__printk_cpu_sync_try_get())	\
>  			break;				\
> +		printk_deferred_exit();			\
>  		local_irq_restore(flags);		\
>  		__printk_cpu_sync_wait();		\
>  	}
> @@ -329,6 +331,7 @@ extern void __printk_cpu_sync_put(void);
>  #define printk_cpu_sync_put_irqrestore(flags)	\
>  	do {					\
>  		__printk_cpu_sync_put();	\
> +		printk_deferred_exit();		\
>  		local_irq_restore(flags);	\
>  	} while (0)
>  

OK, there is the basic rule: "Never take a lock in NMI when the lock might
be taken in another context". printk_cpu_sync_get() violates the rule.
But it is safe as long as the lock is re-entrant and tail.

The above patch fixes a situation where printk_cpu_sync_get() was not
a tail lock. So it looks like a reasonable fix if we want to keep it simple.

Well, I still prefer the alternative solution at
https://lore.kernel.org/r/93155b2ccafa43ed4845ae450451c6b8e27509cc.camel@surriel.com
which does not deffer the console output in normal/IRQ context. There
are doubts whether it is safe. I think that it is. Let me to reply there.

Best Regards,
Petr

      parent reply	other threads:[~2024-07-24 12:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-16  3:20 [RFC PATCH] nmi,printk: fix ABBA deadlock between nmi_backtrace and dump_stack_lvl Rik van Riel
2024-07-17  7:16 ` John Ogness
2024-07-17 13:47   ` Rik van Riel
2024-07-18  7:25     ` John Ogness
2024-07-18 13:38       ` Rik van Riel
2024-07-18 14:09         ` John Ogness
2024-07-18 15:23           ` Rik van Riel
2024-07-24 12:56           ` Petr Mladek
2024-07-24 14:45             ` John Ogness
2024-07-24 15:08               ` Petr Mladek
2024-09-13 17:25                 ` Rik van Riel
2024-09-16 14:33                   ` Petr Mladek
2024-07-24 16:55             ` Rik van Riel
2024-07-24 12:45   ` Petr Mladek [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZqD3fb8mQqWwePEU@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=osandov@meta.com \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox