From: John Ogness <john.ogness@linutronix.de>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
Steven Rostedt <rostedt@goodmis.org>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Thomas Gleixner <tglx@linutronix.de>, Jan Kara <jack@suse.cz>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] printk for 6.11
Date: Tue, 23 Jul 2024 22:47:15 +0206
Message-ID: <87ed7jvo2c.fsf@jogness.linutronix.de>
In-Reply-To: <CAHk-=whU_woFnFN-3Jv2hNCmwLg_fkrT42AWwxm-=Ha5BmNX4w@mail.gmail.com>
Hi Linus,
On 2024-07-23, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> - In an emergency section, directly from nbcon_cpu_emergency_exit()
>> or nbcon_cpu_emergency_flush(). It allows to see the messages
>> when the system is in an unexpected state and might not be
>> able to continue properly.
>>
>> The messages are flushed at the end of the emergency section
>> to allow storing the full log (backtrace) first.
>
> What? No.
>
> One of the historically problematic situations is when a recursive
> oops or a deadlock occurs *during* the first oops.
>
> The "recursive oops" may be simple to sort out by forcing a flush at
> that point, in that hopefully the machine is "alive", but what about
> random deadlocks or other situations where the printk machinery simply
> is never ever entered again?
>
> And we most definitely have had exactly that happen due to the call
> trace code etc.
>
> At that point, it's ok if the machine is dead (this is obviously a
> very catastrophic situation - nobody should worry about how to
> continue), but it's really important that the first problem report
> makes it out.
>
> The whole notion of "to allow storing the full log (backtrace) first"
> is completely crazy. It's entirely secondary whether you have a full
> log or not, when the primary goal MUST BE that you have any output at
> all!
>
> How can this have _continued_ to be unclear, when it was my one hard
> requirement for this whole thing from day one? My *ONE* requirement
> has always been that the printk code ALWAYS does its absolute best to
> print out problem reports.
>
> Because when an oops happen, all other rules go out the window.
>
> We no longer care about "what pretty printouts", and we should strive
> to always try to just get at least *some* basic print out. The kernel
> is known to not be in a great state, and maybe the printout will fail
> due to where the problem happened, but the kernel NEEDS TO TRY.
As the primary author, I would like to clarify the motivation.
This requirement came from the printk proof-of-concept demonstration at
LPC2022. The second point in my summary [0] of that meeting stated the
new requirement.
The requirement came about because during the demonstration we
accidentally hit a situation where a warning backtrace could not be seen
because while trying to print the warning, a panic was hit. In the end
we could see the panic, but not the original warning. At that meeting we
generally agreed that it would be better to at least get the backtrace
into the buffer before entering the complex machinery of pushing out the
backlog to consoles. Then, if a panic occurs while printing, the warning
is already in the buffer and will be flushed out ahead of any panic
messages. That discussion is available online [1] (starting at 56:20).
In your response [2] to my summary email, you mentioned that we could
tweak how much is buffered and possibly change the print order
to get the important stuff out first. But it all relied on the ability
to get things into the buffer first without requiring each individual
printk() to synchronously push out the backlog to all consoles.
Petr's pull request provides the functionality for a CPU to call
printk() during emergencies so that each line only goes into the
buffer. We also include a function to perform the flush at any time. As
the series is implemented now, that flush happens after the warning is
completely stored into the buffer. In cases where there is lots of data
in the warning (such as in RCU stalls or lockdep splats), the flush
happens after significant parts of the warning.
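The store-first, flush-later behavior described above can be illustrated
with a minimal userspace sketch in C. This is not the kernel code: struct
toy_log and every toy_* function are hypothetical stand-ins invented for
illustration, the "console" is simply stdout, and the only assumption
modeled is that lines printed inside an emergency section go into the
buffer and are pushed out when the outermost section ends.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_LINES 16
#define LINE_LEN  64

/* Toy log buffer: a userspace stand-in, NOT the kernel's printk ringbuffer. */
struct toy_log {
	char lines[LOG_LINES][LINE_LEN];
	int stored;    /* lines stored in the buffer so far */
	int flushed;   /* lines already pushed to the "console" */
	int emergency; /* nesting depth of emergency sections */
};

/* Push every not-yet-printed line to stdout; return how many were printed. */
static int toy_flush(struct toy_log *log)
{
	int n = 0;

	while (log->flushed < log->stored) {
		puts(log->lines[log->flushed++]);
		n++;
	}
	return n;
}

/* Store one line. Outside an emergency section, flush it right away. */
static void toy_printk(struct toy_log *log, const char *msg)
{
	if (log->stored >= LOG_LINES)
		return; /* toy behavior: drop the line when the buffer is full */

	snprintf(log->lines[log->stored], LINE_LEN, "%s", msg);
	log->stored++;

	if (log->emergency == 0)
		toy_flush(log);
}

static void toy_emergency_enter(struct toy_log *log)
{
	log->emergency++;
}

/* Leave the section; the outermost exit flushes the backlog. */
static int toy_emergency_exit(struct toy_log *log)
{
	assert(log->emergency > 0); /* an unbalanced exit is a caller bug */
	log->emergency--;

	return log->emergency == 0 ? toy_flush(log) : 0;
}

/*
 * Store a three-line warning inside an emergency section and return how
 * many lines were still unprinted just before the section ended.
 */
int toy_demo(struct toy_log *log)
{
	int pending;

	memset(log, 0, sizeof(*log));

	toy_emergency_enter(log);
	toy_printk(log, "WARNING: something went wrong");
	toy_printk(log, "Call Trace:");
	toy_printk(log, " frame_a+0x10");
	pending = log->stored - log->flushed; /* nothing printed yet */
	toy_emergency_exit(log);

	return pending;
}
```

Calling toy_demo() on a struct toy_log returns 3, since all three lines
are in the buffer but none has reached stdout before the exit. The exit
then prints them in order, leaving log->flushed equal to log->stored.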
John Ogness
[0] https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de
[1] https://www.youtube.com/watch?v=TVhNcKQvzxI (from 56:20)
[2] https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com