public inbox for linux-kernel@vger.kernel.org
From: Petr Mladek <pmladek@suse.com>
To: Cheng Jian <cj.chengjian@huawei.com>
Cc: linux-kernel@vger.kernel.org, chenwandun@huawei.com,
	xiexiuqi@huawei.com, bobo.shaobowang@huawei.com,
	huawei.libin@huawei.com, sergey.senozhatsky@gmail.com,
	rostedt@goodmis.org
Subject: Re: [RFC PATCH] panic: fix deadlock in panic()
Date: Thu, 4 Jun 2020 10:29:47 +0200
Message-ID: <20200604082947.GB22497@linux-b0ei>
In-Reply-To: <20200603141915.38739-1-cj.chengjian@huawei.com>

On Wed 2020-06-03 14:19:15, Cheng Jian wrote:
> A deadlock caused by logbuf_lock occurs during panic:
> 
> 	a) Panic CPU is running in non-NMI context
> 	b) Panic CPU sends out shutdown IPI via NMI vector
> 	c) One of the CPUs that we bring down via NMI vector held logbuf_lock
> 	d) Panic CPU tries to take logbuf_lock, then deadlock occurs.
> 
> We try to re-init the logbuf_lock in printk_safe_flush_on_panic()
> to avoid deadlock, but it does not work here, because:
> 
> Firstly, it is inappropriate to check num_online_cpus() here.
> When the CPUs are brought down via NMI vector, the panic CPU won't
> wait too long for other cores to stop, so when this problem
> occurs, num_online_cpus() may be greater than 1.
> 
> Secondly, printk_safe_flush_on_panic() is called after the panic
> notifier callbacks, so if printk() is called in a panic notifier
> callback, the deadlock will still occur. E.g., if ftrace_dump_on_oops
> is set, we print some debug information, which will try to take the
> logbuf_lock.
> 
> To avoid this deadlock, drop the num_online_cpus() check and call
> printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
> attempting to re-init logbuf_lock from the panic CPU.

It might cause a double unlock (deadlock) on architectures that do not
use NMI to stop the CPUs.
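
To illustrate the concern, here is a minimal userspace sketch. It is not
kernel code: struct toy_lock and all toy_* helpers are hypothetical
stand-ins for logbuf_lock, and the "CPUs" are just sequential steps.
The first function models an owner that was stopped by NMI and never
runs again. The second models an owner that keeps running (as could
happen when the stop request is not an NMI) and later releases a lock
that was re-initialized behind its back.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for logbuf_lock: 0 = free, 1 = held. */
struct toy_lock {
	atomic_int held;
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_store(&l->held, 0);
}

static bool toy_trylock(struct toy_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->held, &expected, 1);
}

/* Returns false when the lock was not held, i.e. a double unlock. */
static bool toy_unlock(struct toy_lock *l)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&l->held, &expected, 0);
}

/* Case 1: the owner was stopped by NMI and never runs again. */
static bool reinit_helps_when_owner_is_stopped(void)
{
	struct toy_lock logbuf;

	toy_lock_init(&logbuf);
	assert(toy_trylock(&logbuf));	/* other CPU takes the lock, then is stopped */

	if (toy_trylock(&logbuf))	/* panic CPU: a real spin_lock() would spin forever */
		return false;

	toy_lock_init(&logbuf);		/* re-init from the panic CPU */
	return toy_trylock(&logbuf);	/* now the panic CPU can print */
}

/* Case 2: the owner was not really stopped and unlocks later. */
static bool reinit_causes_double_unlock_when_owner_runs(void)
{
	struct toy_lock logbuf;
	bool owner_unlocked, panic_unlocked;

	toy_lock_init(&logbuf);
	assert(toy_trylock(&logbuf));	/* other CPU takes the lock and keeps running */

	toy_lock_init(&logbuf);		/* panic CPU re-inits the lock anyway */
	assert(toy_trylock(&logbuf));	/* panic CPU now believes it owns the lock */

	owner_unlocked = toy_unlock(&logbuf);	/* old owner releases the panic CPU's lock */
	panic_unlocked = toy_unlock(&logbuf);	/* panic CPU unlocks an already free lock */

	return owner_unlocked && !panic_unlocked;
}
```

Both functions return true in this model. In the second case the lock
is free while the panic CPU still thinks it holds it, and the last
unlock operates on a lock that nobody owns.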

I have created a conservative fix for this problem for SLES, see
https://github.com/openSUSE/kernel-source/blob/SLE15-SP2-UPDATE/patches.suse/printk-panic-Avoid-deadlock-in-printk-after-stopping-CPUs-by-NMI.patch
It solves the problem only on the x86 architecture.

There are many hacks that try to solve various scenarios but it
is getting too complicated and does not solve all problems.

The only real solution is lockless printk(). The first piece is a lockless
ringbuffer. See the latest version at
https://lore.kernel.org/r/20200501094010.17694-1-john.ogness@linutronix.de
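
As a rough illustration of why a lockless buffer avoids this class of
deadlock, here is a toy userspace sketch. It is not the design from the
series linked above: struct toy_ring and the toy_* functions are
hypothetical, and the sketch ignores wrap-around collisions between
concurrent writers, which a real implementation has to handle. It only
shows the idea that a writer reserves a slot with an atomic increment
and commits it later, so a writer stopped in the middle leaves one
uncommitted record behind instead of a held lock.

```c
#include <stdatomic.h>
#include <string.h>

#define TOY_SLOTS	8
#define TOY_MSG_LEN	32

struct toy_record {
	atomic_ulong seq;	/* 0 = not committed, else record id + 1 */
	char text[TOY_MSG_LEN];
};

struct toy_ring {
	atomic_ulong next;	/* next record id to hand out */
	struct toy_record slots[TOY_SLOTS];
};

static void toy_ring_init(struct toy_ring *r)
{
	atomic_init(&r->next, 0);
	for (int i = 0; i < TOY_SLOTS; i++) {
		atomic_init(&r->slots[i].seq, 0);
		r->slots[i].text[0] = '\0';
	}
}

/* Reserve a record without taking any lock; returns its id. */
static unsigned long toy_reserve(struct toy_ring *r)
{
	unsigned long id = atomic_fetch_add(&r->next, 1);

	atomic_store(&r->slots[id % TOY_SLOTS].seq, 0);	/* mark as in progress */
	return id;
}

/* Fill in the text and publish the record to readers. */
static void toy_commit(struct toy_ring *r, unsigned long id, const char *msg)
{
	struct toy_record *rec = &r->slots[id % TOY_SLOTS];

	strncpy(rec->text, msg, TOY_MSG_LEN - 1);
	rec->text[TOY_MSG_LEN - 1] = '\0';
	atomic_store(&rec->seq, id + 1);
}

/* Returns 1 and copies the text if record id is committed, else 0. */
static int toy_read(struct toy_ring *r, unsigned long id, char *out)
{
	struct toy_record *rec = &r->slots[id % TOY_SLOTS];

	if (atomic_load(&rec->seq) != id + 1)
		return 0;
	memcpy(out, rec->text, TOY_MSG_LEN);
	/* Re-check so that a record overwritten during the copy is rejected. */
	return atomic_load(&rec->seq) == id + 1;
}
```

If one CPU calls toy_reserve() and is then stopped before toy_commit(),
another CPU can still reserve, commit and read its own record. The
interrupted record simply stays unreadable.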

We prefer to work on the lockless solution instead of adding more
complicated workarounds. This is why I did not even try to upstream
the patch for SLES.

In the meantime, you might also consider removing the offending
message from the panic notifier if it is not really important.

Best Regards,
Petr
