From: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Konstantin Khlebnikov <koct9i@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <peterz@infradead.org>,
x86@kernel.org, John Ogness <john.ogness@linutronix.de>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
Petr Tesarik <ptesarik@suse.cz>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe
Date: Tue, 23 Jul 2019 12:13:40 +0900 [thread overview]
Message-ID: <20190723031340.GA19463@jagdpanzerIV> (raw)
In-Reply-To: <20190719125753.miniwfq4nhicy76n@pathway.suse.cz>
On (07/19/19 14:57), Petr Mladek wrote:
[..]
> > Where do nested printk()-s come from? Which one of the following
> > scenarios you cover in commit message:
> >
> > scenario 1
> >
> > - we have CPUB which holds logbuf_lock
> > - we have CPUA which panic()-s the system, but can't bring CPUB down,
> > so logbuf_lock stays locked on remote CPU
>
> No, this scenario is not affected by this patch. It would always lead to
> a deadlock.
Agreed, in many cases we won't be able to panic() the system properly,
deadlocking somewhere in smp_send_stop().
> > scenario 2
> >
> > - we have CPUA which holds logbuf_lock
> > - we have panic() on CPUA, but it cannot bring down some other CPUB
> > so logbuf_lock stays locked on local CPU, and it cannot re-init
> > logbuf.
[..]
> + Before:
> + printk_safe_flush_on_panic() will keep logbuf_lock locked
> and do nothing.
>
> + kmsg_dump(), console_unblank(), or console_flush_on_panic()
> will deadlock when they try to get logbuf_lock(). They will
> not be able to process any single line.
>
> + After:
> + printk_bust_lock_safe() will keep logbuf_lock locked
>
> + All functions using logbuf_lock will not get called.
> We will not see the messages (as previously) but the
> system will not deadlock.
>
>
> But there is one more scenario 3:
Yes!
> - we have CPUB which loops or is deadlocked in IRQ context
>
> - we have CPUA which panic()-s the system, but can't bring CPUB down,
> so logbuf_lock might be takes and release from time to time
> by CPUB
Great!
This is the only case when we actually need to pay attention to
num_online_cpus(), because there is an active logbuf_lock owner;
in any other case we can unconditionally re-init printk() locks.
But there is more to it.
Note, that the problem in scenario 3 is bigger than just logbuf_lock.
Regardless of logbuf implementation we will not be able to panic()
the system.
If we have a never ending source of printk() messages, coming from
misbehaving CPU which stuck in printing loop in IRQ context, then
flush_on_panic() will never end or kmsg dump will never stop, etc.
We need to cut off misbehaving CPUs. Panic CPU waits (for up to 1
second?) in smp_send_stop() for secondary CPUs to die, if some
secondary CPUs are still chatty after that then most likely those
CPUs don't have anything good to say, just a pointless flood of same
messages over and over again; which, however, will not let panic
CPU to proceed.
And this is where the idea of "disconnecting" those CPUs from main
logbuf come from.
So what we can do:
- smp_send_stop()
- disconnect all-but-self from logbuf (via printk-safe)
- wait for 1 or 2 more extra seconds for secondary CPUs to leave
console_unlock() and to redirect printks to per-CPU buffers
- after that we are sort of good-to-go: re-init printk locks
and do kmsg_dump, flush_on_panic().
Note, misbehaving CPUs will write to per-CPU buffers, they are not
expected to be able to flush per-CPU buffers to the main logbuf. That
will require enabled IRQs, which should deliver stop IPI. But we can
do even more - just disable print_safe irq work on disconnect CPUs.
So, shall we try one more time with the "disconnect" misbehaving CPUs
approach? I can send an RFC patch.
-ss
next prev parent reply other threads:[~2019-07-23 3:13 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-16 7:28 [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Petr Mladek
2019-07-16 7:28 ` [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe Petr Mladek
2019-07-17 9:56 ` Sergey Senozhatsky
2019-07-18 8:36 ` Petr Mladek
2019-07-18 9:49 ` Sergey Senozhatsky
2019-07-19 12:57 ` Petr Mladek
2019-07-23 3:13 ` Sergey Senozhatsky [this message]
2019-07-24 12:27 ` Petr Mladek
2019-07-31 6:08 ` Sergey Senozhatsky
2019-07-31 6:59 ` Sergey Senozhatsky
2019-07-18 10:11 ` Sergey Senozhatsky
2019-07-16 7:28 ` [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop() Petr Mladek
2019-07-18 10:47 ` Sergey Senozhatsky
2019-07-18 11:07 ` Konstantin Khlebnikov
2019-07-18 11:29 ` Sergey Senozhatsky
2019-07-19 12:19 ` Petr Mladek
2019-07-18 9:59 ` [PATCH 0/2] panic/printk/x86: Prevent some more printk-related deadlocks in panic() Konstantin Khlebnikov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190723031340.GA19463@jagdpanzerIV \
--to=sergey.senozhatsky.work@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=john.ogness@linutronix.de \
--cc=koct9i@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=ptesarik@suse.cz \
--cc=rostedt@goodmis.org \
--cc=sergey.senozhatsky@gmail.com \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.