public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] printk for 6.11
@ 2024-07-23 14:38 Petr Mladek
  2024-07-23 18:04 ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2024-07-23 14:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Andy Shevchenko,
	Rasmus Villemoes, Sebastian Andrzej Siewior, Thomas Gleixner,
	Jan Kara, Peter Zijlstra, linux-kernel

Hi Linus,

please pull the latest printk changes from

  git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git tags/printk-for-6.11

===========================================

Next preparation step for introducing printk kthreads. This part
wires up flushing consoles via con->write_atomic() callback.

The write_atomic() callback can be used for flushing nbcon consoles
in any context. It is needed when the console could not be reliably
flushed using the printk kthread. Namely, it is used:

  - During the early boot, from the legacy loop in console_unlock().
    It allows to see the messages before processes (kthreads) could
    get started.

    Note that it must be used as long as there is a boot console
    registered. The console_lock() serializes accesses to the same
    HW port between the early and normal console driver.

  - In an emergency section, directly from nbcon_cpu_emergency_exit()
    or nbcon_cpu_emergency_flush(). It allows to see the messages
    when the system is in an unexpected state and might not be
    able to continue properly.

    The messages are flushed at the end of the emergency section
    to allow storing the full log (backtrace) first. It helps to
    see it even when other CPUs add messages in parallel. The flush()
    API is used to explicitly flush parts of longer logs which might
    not fit into the buffer or might cause a softlockup.

    The emergency section is used for WARN(), Oops, RCU stall,
    and lockdep reports.

  - In panic(), directly from printk() on the panic-CPU. Note that
    other CPUs are not allowed to add new messages at this point.

There is also a new uart_port_lock() API in serial_core.h. It allows to
take port->lock and also acquire the nbcon console context when
the particular port gets registered as a console.

The port->lock serializes code for non-printing activities, for example,
for writing or setting the port speed. The nbcon context allows to
serialize the access in emergency or panic situations where the classic
spin lock (port->lock) could cause a deadlock. Note that the nbcon
context allows a safe takeover in the middle of a printed message.

IMPORTANT: The changes do not affect the behavior of legacy consoles,
    except for two situations:

  - Legacy consoles are not longer flushed directly from printk()
    in emergency sections. The flush is done on exit from
    the emergency section or by nbcon_cpu_emergency_flush().
    The motivation is the same as for nbcon consoles.

  - Legacy consoles are not longer flushed directly from printk()
    in panic() when crash dump is created and a nbcon console
    is registered. They could create a deadlock in compare with
    nbcon consoles.

----------------------------------------------------------------
Jeff Johnson (1):
      vsprintf: add missing MODULE_DESCRIPTION() macro

John Ogness (25):
      printk: Add notation to console_srcu locking
      printk: nbcon: Remove return value for write_atomic()
      printk: nbcon: Add detailed doc for write_atomic()
      printk: nbcon: Add callbacks to synchronize with driver
      printk: nbcon: Use driver synchronization while (un)registering
      serial: core: Provide low-level functions to lock port
      serial: core: Introduce wrapper to set @uart_port->cons
      console: Improve console_srcu_read_flags() comments
      nbcon: Add API to acquire context for non-printing operations
      serial: core: Implement processing in port->lock wrapper
      printk: nbcon: Do not rely on proxy headers
      printk: Make console_is_usable() available to nbcon
      printk: Let console_is_usable() handle nbcon
      printk: Add @flags argument for console_is_usable()
      printk: nbcon: Add helper to assign priority based on CPU state
      printk: Track registered boot consoles
      printk: nbcon: Use nbcon consoles in console_flush_all()
      printk: nbcon: Add unsafe flushing on panic
      printk: Avoid console_lock dance if no legacy or boot consoles
      printk: Track nbcon consoles
      printk: Coordinate direct printing in panic
      panic: Mark emergency section in oops
      rcu: Mark emergency sections in rcu stalls
      lockdep: Mark emergency sections in lockdep splats
      printk: nbcon: do not require migration disabled for nbcon_get_cpu_emergency_nesting()

Petr Mladek (2):
      printk: Properly deal with nbcon consoles on seq init
      Merge branch 'rework/write-atomic' into for-linus

Sebastian Andrzej Siewior (1):
      printk: Check printk_deferred_enter()/_exit() usage

Sreenath Vijayan (1):
      printk: Rename console_replay_all() and update context

Thomas Gleixner (3):
      printk: nbcon: Provide function to flush using write_atomic()
      printk: nbcon: Implement emergency sections
      panic: Mark emergency section in warn

 drivers/tty/serial/8250/8250_core.c |   6 +-
 drivers/tty/serial/amba-pl011.c     |   2 +-
 drivers/tty/serial/serial_core.c    |  16 +-
 drivers/tty/sysrq.c                 |   2 +-
 include/linux/console.h             | 112 ++++++--
 include/linux/printk.h              |  37 ++-
 include/linux/serial_core.h         | 117 +++++++-
 kernel/locking/lockdep.c            |  84 +++++-
 kernel/panic.c                      |   9 +
 kernel/printk/internal.h            |  73 ++++-
 kernel/printk/nbcon.c               | 513 ++++++++++++++++++++++++++++++++++--
 kernel/printk/printk.c              | 313 ++++++++++++++++------
 kernel/printk/printk_ringbuffer.h   |   2 +
 kernel/printk/printk_safe.c         |  23 +-
 kernel/rcu/tree_exp.h               |   9 +
 kernel/rcu/tree_stall.h             |  11 +
 lib/test_printf.c                   |   1 +
 lib/test_scanf.c                    |   1 +
 18 files changed, 1194 insertions(+), 137 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-23 14:38 [GIT PULL] printk for 6.11 Petr Mladek
@ 2024-07-23 18:04 ` Linus Torvalds
  2024-07-23 20:41   ` John Ogness
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2024-07-23 18:04 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, John Ogness, Andy Shevchenko,
	Rasmus Villemoes, Sebastian Andrzej Siewior, Thomas Gleixner,
	Jan Kara, Peter Zijlstra, linux-kernel

On Tue, 23 Jul 2024 at 07:38, Petr Mladek <pmladek@suse.com> wrote:
>
>   - In an emergency section, directly from nbcon_cpu_emergency_exit()
>     or nbcon_cpu_emergency_flush(). It allows to see the messages
>     when the system is in an unexpected state and might not be
>     able to continue properly.
>
>     The messages are flushed at the end of the emergency section
>     to allow storing the full log (backtrace) first.

What? No.

One of the historically problematic situations is when a recursive
oops or a deadlock occurs *during* the first oops.

The "recursive oops" may be simple to sort out by forcing a flush at
that point, in that hopefully the machine is "alive", but what about
random deadlocks or other situations where the printk machinery simply
is never ever entered again?

And we most definitely have had exactly that happen due to the call
trace code etc.

At that point, it's ok if the machine is dead (this is obviously a
very catastrophic situation - nobody should worry about how to
continue), but it's really important that the first problem report
makes it out.

The whole notion of "to allow storing the full log (backtrace) first"
is completely crazy. It's entirely secondary whether you have a full
log or not, when the primary goal MUST BE that you have any output at
all!

How can this have _continued_ to be unclear, when it was my one hard
requirement for this whole thing from day one? My *ONE* requirement
has always been that the printk code ALWAYS does its absolute best to
print out problem reports.

Because when an oops happen, all other rules go out the window.

We no longer care about "what pretty printouts", and we should strive
to always try to just get at least *some* basic print out. The kernel
is known to not be in a great state, and maybe the printout will fail
due to where the problem happened, but the kernel NEEDS TO TRY.

           Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-23 18:04 ` Linus Torvalds
@ 2024-07-23 20:41   ` John Ogness
  2024-07-23 21:07     ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: John Ogness @ 2024-07-23 20:41 UTC (permalink / raw)
  To: Linus Torvalds, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Andy Shevchenko,
	Rasmus Villemoes, Sebastian Andrzej Siewior, Thomas Gleixner,
	Jan Kara, Peter Zijlstra, linux-kernel

Hi Linus,

On 2024-07-23, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>   - In an emergency section, directly from nbcon_cpu_emergency_exit()
>>     or nbcon_cpu_emergency_flush(). It allows to see the messages
>>     when the system is in an unexpected state and might not be
>>     able to continue properly.
>>
>>     The messages are flushed at the end of the emergency section
>>     to allow storing the full log (backtrace) first.
>
> What? No.
>
> One of the historically problematic situations is when a recursive
> oops or a deadlock occurs *during* the first oops.
>
> The "recursive oops" may be simple to sort out by forcing a flush at
> that point, in that hopefully the machine is "alive", but what about
> random deadlocks or other situations where the printk machinery simply
> is never ever entered again?
>
> And we most definitely have had exactly that happen due to the call
> trace code etc.
>
> At that point, it's ok if the machine is dead (this is obviously a
> very catastrophic situation - nobody should worry about how to
> continue), but it's really important that the first problem report
> makes it out.
>
> The whole notion of "to allow storing the full log (backtrace) first"
> is completely crazy. It's entirely secondary whether you have a full
> log or not, when the primary goal MUST BE that you have any output at
> all!
>
> How can this have _continued_ to be unclear, when it was my one hard
> requirement for this whole thing from day one? My *ONE* requirement
> has always been that the printk code ALWAYS does its absolute best to
> print out problem reports.
>
> Because when an oops happen, all other rules go out the window.
>
> We no longer care about "what pretty printouts", and we should strive
> to always try to just get at least *some* basic print out. The kernel
> is known to not be in a great state, and maybe the printout will fail
> due to where the problem happened, but the kernel NEEDS TO TRY.

As the primary author, I would like to clarify the motivation.

During LPC2022 at the printk proof-of-concept demonstration is where
this requirement came from. The second point in my summary [0] of that
meeting stated the new requirement.

The requirement came about because during the demonstration we
accidentally hit a situation where a warning backtrace could not be seen
because while trying to print the warning, a panic was hit. In the end
we could see the panic, but not the original warning. At that meeting we
genrally agreed that it would be better to at least get the backtrace
into the buffer before entering the complex machinery of pushing out the
backlog to consoles. Then, if a panic occurs while printing, the warning
is already in the buffer and will be flushed out ahead of any panic
messages. That discussion is available online [1] (starting at 56:20).

In your response [2] to my summary email, you mentioned that we could
tweak how much is buffered as well as possibly changing the print order
to get the important stuff out first. But it all relied on the ability
to get things into the buffer first without requiring each individual
printk() to synchronously push out the backlog to all consoles.

Petr's pull request provides the functionality for a CPU to call
printk() during emergencies so that each line only goes into the
buffer. We also include a function to perform the flush at any time. As
the series is implemented now, that flush happens after the warning is
completely stored into the buffer. In cases where there is lots of data
in the warning (such as in RCU stalls or lockdep splats), the flush
happens after significant parts of the warning.

John Ogness

[0] https://lore.kernel.org/lkml/875yheqh6v.fsf@jogness.linutronix.de

[1] https://www.youtube.com/watch?v=TVhNcKQvzxI (from 56:20)

[2] https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@mail.gmail.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-23 20:41   ` John Ogness
@ 2024-07-23 21:07     ` Linus Torvalds
  2024-07-24  8:42       ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2024-07-23 21:07 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Andy Shevchenko,
	Rasmus Villemoes, Sebastian Andrzej Siewior, Thomas Gleixner,
	Jan Kara, Peter Zijlstra, linux-kernel

On Tue, 23 Jul 2024 at 13:41, John Ogness <john.ogness@linutronix.de> wrote:
>
> Petr's pull request provides the functionality for a CPU to call
> printk() during emergencies so that each line only goes into the
> buffer. We also include a function to perform the flush at any time. As
> the series is implemented now, that flush happens after the warning is
> completely stored into the buffer. In cases where there is lots of data
> in the warning (such as in RCU stalls or lockdep splats), the flush
> happens after significant parts of the warning.

I really think the flushing needs to be *way* more aggressive for any
oops. The "flush at end" is not even remotely sane.

Some amount of buffering can make sense, eg when printing out the
regular register state over a few lines, there certainly shouldn't be
anything there that can cause problems.

Let me pick a very specific example of a common thing:

   int __die(const char *str, struct pt_regs *regs, long err)

in arch/x86/kernel/dumpstack.c.

Look, do I expect problems in "__die_header()"? No.

But the *moment* you call "notify_die()", you are now calling random
debug code. The register state NEEDS TO HAVE BEEN FLUSHED before this
point.

This is not something I'm willing to debate. Some of the most painful
debugging sessions I have *EVER* had have been due to "debug code that
failed".

Are these things rare? Yes they are. Very. Thankfully.

But the scars left behind by things like "buggy kgdb hook meant that
oops printout never happened at all when kgdb wasn't even enabled" and
having wasted literally *days* on something that would have been
obvious had the oops printout just happened means that I'm very much
in the "once bitten, twice shy" camp.

So that's why I absolutely *ABOHOR* that code in "oops_begin()" that
stops printouts until "oops_end()". It's *EXACTLY* the wrong thing to
do if there's some problem in the middle.

And yes, those problems have happened. Again - rarely, but it's *so*
painful when they do, that I refuse to pull something that I consider
to be this broken.

And yes, I'm convinced we have many other situations where a problem
during printout will silence things (the obvious one being locking
issues with the printing itself). But I refuse to have that silence be
an integral part of the die() code.

                 Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-23 21:07     ` Linus Torvalds
@ 2024-07-24  8:42       ` Petr Mladek
  2024-07-24 12:47         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2024-07-24  8:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Andy Shevchenko,
	Rasmus Villemoes, Sebastian Andrzej Siewior, Thomas Gleixner,
	Jan Kara, Peter Zijlstra, linux-kernel

On Tue 2024-07-23 14:07:12, Linus Torvalds wrote:
> On Tue, 23 Jul 2024 at 13:41, John Ogness <john.ogness@linutronix.de> wrote:
> >
> > Petr's pull request provides the functionality for a CPU to call
> > printk() during emergencies so that each line only goes into the
> > buffer. We also include a function to perform the flush at any time. As
> > the series is implemented now, that flush happens after the warning is
> > completely stored into the buffer. In cases where there is lots of data
> > in the warning (such as in RCU stalls or lockdep splats), the flush
> > happens after significant parts of the warning.
> 
> I really think the flushing needs to be *way* more aggressive for any
> oops. The "flush at end" is not even remotely sane.
> 
> Some amount of buffering can make sense, eg when printing out the
> regular register state over a few lines, there certainly shouldn't be
> anything there that can cause problems.

Yes, this was the intention. I have missed the code path calling
the notifiers. The Oops/die code is more complicated.

Otherwise, nbcon_cpu_emergency_enter()/exit() is
used only around code printing various well defined debug
reports, like WARN(), lockdep, or RCU stall.

Note that the buffering is only in the emergency sections.
The messages are flushed directly after reaching panic().

Just to be sure. The buffering is _not_ there to solve cosmetic
problems. It should allow storing important information before
trying to flush it to consoles. It is because the flushing to consoles
is slow, might trigger softlockups and even cause system to die.


> Let me pick a very specific example of a common thing:
> 
>    int __die(const char *str, struct pt_regs *regs, long err)
> 
> in arch/x86/kernel/dumpstack.c.
> 
> Look, do I expect problems in "__die_header()"? No.
> 
> But the *moment* you call "notify_die()", you are now calling random
> debug code. The register state NEEDS TO HAVE BEEN FLUSHED before this
> point.

This primary goes down to notifiers (random debug code). It would make sense
to try flushing the consoles before calling them. A generic solution
would be:

diff --git a/kernel/notifier.c b/kernel/notifier.c
index b3ce28f39eb6..82989022d8fe 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -77,6 +77,8 @@ static int notifier_call_chain(struct notifier_block **nl,
 	int ret = NOTIFY_DONE;
 	struct notifier_block *nb, *next_nb;
 
+	nbcon_cpu_emergency_flush();
+
 	nb = rcu_dereference_raw(*nl);
 
 	while (nb && nr_to_call) {


> This is not something I'm willing to debate. Some of the most painful
> debugging sessions I have *EVER* had have been due to "debug code that
> failed".

I could imagine a bug even in WARN() or lockdep. We should probably
add an option to disable the buffering in emergency sections.

All in all, the buffering in emergency sections was just
a "sounds reasonable" idea which popped out during the demo at
Plumbers. It is not a clear win. We just wanted to give it a try.

Will it get acceptable if we flush the messages before calling
notifiers and with an option to disable it? I am not sure
whether it deserves more code. It probably would make sense to remove
it when it causes more harm than good in practice.

Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-24  8:42       ` Petr Mladek
@ 2024-07-24 12:47         ` Peter Zijlstra
  2024-07-24 20:33           ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2024-07-24 12:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Linus Torvalds, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Andy Shevchenko, Rasmus Villemoes, Sebastian Andrzej Siewior,
	Thomas Gleixner, Jan Kara, linux-kernel

On Wed, Jul 24, 2024 at 10:42:11AM +0200, Petr Mladek wrote:
> On Tue 2024-07-23 14:07:12, Linus Torvalds wrote:
> > On Tue, 23 Jul 2024 at 13:41, John Ogness <john.ogness@linutronix.de> wrote:
> > >
> > > Petr's pull request provides the functionality for a CPU to call
> > > printk() during emergencies so that each line only goes into the
> > > buffer. We also include a function to perform the flush at any time. As
> > > the series is implemented now, that flush happens after the warning is
> > > completely stored into the buffer. In cases where there is lots of data
> > > in the warning (such as in RCU stalls or lockdep splats), the flush
> > > happens after significant parts of the warning.
> > 
> > I really think the flushing needs to be *way* more aggressive for any
> > oops. The "flush at end" is not even remotely sane.
> > 
> > Some amount of buffering can make sense, eg when printing out the
> > regular register state over a few lines, there certainly shouldn't be
> > anything there that can cause problems.
> 
> Yes, this was the intention. I have missed the code path calling
> the notifiers. The Oops/die code is more complicated.
> 
> Otherwise, nbcon_cpu_emergency_enter()/exit() is
> used only around code printing various well defined debug
> reports, like WARN(), lockdep, or RCU stall.
> 
> Note that the buffering is only in the emergency sections.
> The messages are flushed directly after reaching panic().
> 
> Just to be sure. The buffering is _not_ there to solve cosmetic
> problems. It should allow storing important information before
> trying to flush it to consoles. It is because the flushing to consoles
> is slow, might trigger softlockups and even cause system to die.

So.. I've complained about this emergency buffering before. At the very
least the atomic consoles should never buffer and immediately print
everything. Per their definition they always work.

Slow is not a consideration. You might never reach the end of the
section, anything you can get before the machine dies is a win.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-24 12:47         ` Peter Zijlstra
@ 2024-07-24 20:33           ` Linus Torvalds
  2024-07-25 12:51             ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2024-07-24 20:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Andy Shevchenko, Rasmus Villemoes, Sebastian Andrzej Siewior,
	Thomas Gleixner, Jan Kara, linux-kernel

On Wed, 24 Jul 2024 at 05:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> So.. I've complained about this emergency buffering before. At the very
> least the atomic consoles should never buffer and immediately print
> everything. Per their definition they always work.

Yeah, my personal preference would be some variation of this.

And when I say "some variation of this", I do think that having a
per-console trylock is fine, and buffering *if* the atomic console is
already busy (presumably with an existing oops, but possibly also for
"setup issues" - ie things like "serial line is being configured" or
"VGACON is in the middle of a redraw or console size change
operation".

And yes, before anybody speaks up, that is kind of the approximation
of the current console_trylock() logic. I am aware. And I'm also aware
of how much people have hated it. And I'm not claiming it's perfect.

But I do think that the *typically* important case is "something went
horribly wrong, and the console was *not* busy at the time", and
that's the case where there is no excuse to not just print out ASAP.

And yes, then if some oops happens _while_ the console is busy
(possibly because another oops just happened on another CPU core, and
now *this* CPU also hit it), at that point you go "Ok, trying to print
now would just make things worse".

And a lot of consoles may end up simply having to buffer, ie if you
are using a network console or something, maybe "not buffering is so
painful and involves thousands of random network cards that we
realistically cannot deal with it".

(Same ultimately goes for slow serial lines - there comes a point
where waiting for things to flush just isn't worth it)

So I'm not saying "never buffer". There are very valid reasons to say
"I can't very reasonably print this synchronously".

But I really do think that we should never buffer "by default". And
that's why I kind of hate that whole concept of "oops_begin starts
buffering". It's exactly the kind of "buffer by default" mental model
that I was really hoping we'd never have.

            Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-24 20:33           ` Linus Torvalds
@ 2024-07-25 12:51             ` Petr Mladek
  2024-07-25 13:47               ` Peter Zijlstra
  2024-07-25 16:48               ` Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: Petr Mladek @ 2024-07-25 12:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Andy Shevchenko, Rasmus Villemoes, Sebastian Andrzej Siewior,
	Thomas Gleixner, Jan Kara, linux-kernel

On Wed 2024-07-24 13:33:20, Linus Torvalds wrote:
> On Wed, 24 Jul 2024 at 05:47, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > So.. I've complained about this emergency buffering before. At the very
> > least the atomic consoles should never buffer and immediately print
> > everything. Per their definition they always work.
> 
> Yeah, my personal preference would be some variation of this.
>
> And when I say "some variation of this", I do think that having a
> per-console trylock is fine, and buffering *if* the atomic console is
> already busy (presumably with an existing oops, but possibly also for
> "setup issues" - ie things like "serial line is being configured" or
> "VGACON is in the middle of a redraw or console size change
> operation".
> 
> And yes, before anybody speaks up, that is kind of the approximation
> of the current console_trylock() logic. I am aware. And I'm also aware
> of how much people have hated it. And I'm not claiming it's perfect.

I am afraid that we have to live with some buffering. Otherwhise,
the speed of the system might be limited by the speed of the consoles.
This might be especially noticeable during boot when a lot of HW
gets initialized and tons of messages are flushed to a slow serial console.

After all, the trylock trick has been added already in 2001. It has been
only 3 years after adding SMP support (console_lock) to consoles in 1998.

> But I do think that the *typically* important case is "something went
> horribly wrong, and the console was *not* busy at the time", and
> that's the case where there is no excuse to not just print out ASAP.

Yup.

Just for record. The idea of "buffering in emergency" came up
in the opposite scenario:

<flood of messages>

CPU 0					CPU 1

WARN()
  printk()
    flush_consoles()
      # handling long backlog

					panic()
					  printk()
					    flush_consoles()
					    # successfully took over the lock
					    # and continued flushing the backlog


Result: CPU 0 never printed the rest of the WARN()

It looked acceptable because WARN() code was just printing messages,
was well tested and should never fail (last famous words).

Another motivation was that the consoles were handled by separate
threads. They might allow to see the entire WARN() on fast consoles
before a serial one prints the first line.

Also there are ways to see the messages without working consoles,
e.g. via crash dump, pstore, persistent memory. The buffer-first
approach might make even more sense in this case.

> But I really do think that we should never buffer "by default". And
> that's why I kind of hate that whole concept of "oops_begin starts
> buffering". It's exactly the kind of "buffer by default" mental model
> that I was really hoping we'd never have.

I agree that buffering in emeregency is more risky than in normal
situation. The idea needs more love. Let's continue a more
conservative way for now.

John is going to rework the series and remove the buffering in
emeregency. I am going to send another pull request with
just few trivial fixes for 6.11.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-25 12:51             ` Petr Mladek
@ 2024-07-25 13:47               ` Peter Zijlstra
  2024-07-25 16:48               ` Linus Torvalds
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2024-07-25 13:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Linus Torvalds, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Andy Shevchenko, Rasmus Villemoes, Sebastian Andrzej Siewior,
	Thomas Gleixner, Jan Kara, linux-kernel

On Thu, Jul 25, 2024 at 02:51:57PM +0200, Petr Mladek wrote:

> I am afraid that we have to live with some buffering. Otherwhise,
> the speed of the system might be limited by the speed of the consoles.
> This might be especially noticeable during boot when a lot of HW
> gets initialized and tons of messages are flushed to a slow serial console.

Only if you create a slow atomic console. But other than that, I've been
booting like that for at least the last decade or so.

Most all my test boxes have forced synchronous earlyprintk at 115200
baud. It has saved me untold times.

I suppose I'll just keep my own printk hackery around for longer.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GIT PULL] printk for 6.11
  2024-07-25 12:51             ` Petr Mladek
  2024-07-25 13:47               ` Peter Zijlstra
@ 2024-07-25 16:48               ` Linus Torvalds
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2024-07-25 16:48 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Peter Zijlstra, John Ogness, Sergey Senozhatsky, Steven Rostedt,
	Andy Shevchenko, Rasmus Villemoes, Sebastian Andrzej Siewior,
	Thomas Gleixner, Jan Kara, linux-kernel

On Thu, 25 Jul 2024 at 05:52, Petr Mladek <pmladek@suse.com> wrote:
>
> I am afraid that we have to live with some buffering. Otherwhise,
> the speed of the system might be limited by the speed of the consoles.
> This might be especially noticeable during boot when a lot of HW
> gets initialized and tons of messages are flushed to a slow serial console.

Oh, I don't mind buffering during *normal* operations. Particularly
not for some slow serial line.

It's literally just oopses and KERN_ERR and similar that I think are special.

And I do think some consoles are more special than others.

"I want to buffer because serial lines are slow" does not mean "we
should always buffer" when realistically not everybody has a serial
line.

> Just for record. The idea of "buffering in emergency" came up
> in the opposite scenario:
>
> <flood of messages>
>
> CPU 0                                   CPU 1
>
> WARN()
>   printk()
>     flush_consoles()
>       # handling long backlog
>
>                                         panic()
>                                           printk()
>                                             flush_consoles()
>                                             # successfully took over the lock
>                                             # and continued flushing the backlog
>
> Result: CPU 0 never printed the rest of the WARN()

First off, I do think that's fine. This is - by definition - not a
normal situation, and a panic() is *way* more important than some
WARN.

Yes, they may obviously be related, but at the same time, if you
panic, and particularly if you have reboot-on-panic, you damn well get
what you asked for: other things *will* be hidden by the panic. Tough
luck.

There's a very real reason why I tell people that using "BUG_ON()" is
not ok for things where you can just return an error.

And there's also a very real reason why I think people who do
reboot-on-panic get to keep both pieces. It's self-inflicted damage,
and if they come crying to you, tell them so.

What you should *really* take away from this is

 (a) you fundamentally can't handle all situations.

We are - by definition - talking catastrophic kernel bugs, and
something unexpected went very very wrong,

You can always make up some case that won't work, and you NEED TO REALIZE THAT.

 (b) that means that you have to prioritize what you *DO* handle.

And I'm telling you that what needs to be prioritized is a oops (not a
warning), but particularly the *first* one.

Now, unrelated to that, I'm also claiming that the problem you
actually talk about is at least partially caused *by* the excessive
buffering. The whole "long backlog" should never happen, and never be
considered normal behavior.

So I think we also tend to have a behavioral problem, in that our
default console loglevel is too high. It's high by default, and I
suspect some users do console=verbose, which sets

        console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;

which does exactly what it looks like it would do (or, more commonly,
the debug case that "only" sets it to CONSOLE_LOGLEVEL_DEBUG, which is
effectively the same thing in practice - I don't think we have any
users that actually use log-levels past KERN_DEBUG).

The thing is, if you have CONSOLE_LOGLEVEL_DEBUG set, and a big
machine, your bootup *will* be printing a lot of data.

And it *will* be slow over a serial console, particularly one that
runs at some historical speed because that's how a lot of these silly
things work.

But dammit, we literally have a "buffer messages" mode. It's this:

    static bool suppress_message_printing(int level)
    {
        return (level >= console_loglevel && !ignore_loglevel);
    }

and it was *ALWAYS* that. This is literally why log levels exist: they
say "some messages should not be printed, because it's slow and people
don't want to see it unless they need it".

So I think in a very real sense, the "solution" may be to make sure
our loglevel is something reasonable, and find the messages that cause
excessive noise, and make sure that they are buffered by being
sufficiently low log levels!

And maybe make sure our log level logic actually works right - I think
it should always have been per-console, but for obvious reasons that
was never the original behavior, and I suspect it was never fixed
because all the loglevel stuff I see from a quick grep is just
global).

A loglevel that makes sense for a fast directly connected console may
not make sense for a server with emulated serial lines and tons of
debug info.

IOW, I think a lot of the whole "long backlog" issues come from us
doing badly with log levels.

             Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-07-25 16:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-23 14:38 [GIT PULL] printk for 6.11 Petr Mladek
2024-07-23 18:04 ` Linus Torvalds
2024-07-23 20:41   ` John Ogness
2024-07-23 21:07     ` Linus Torvalds
2024-07-24  8:42       ` Petr Mladek
2024-07-24 12:47         ` Peter Zijlstra
2024-07-24 20:33           ` Linus Torvalds
2024-07-25 12:51             ` Petr Mladek
2024-07-25 13:47               ` Peter Zijlstra
2024-07-25 16:48               ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox