public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()
Date: Thu, 18 Apr 2024 14:47:36 +0200	[thread overview]
Message-ID: <ZiEWaA3CeQsccEdj@pathway.suse.cz> (raw)
In-Reply-To: <87plunoai0.fsf@jogness.linutronix.de>

On Thu 2024-04-18 01:11:59, John Ogness wrote:
> On 2024-04-11, Petr Mladek <pmladek@suse.com> wrote:
> > I am trying to make a full picture when and how the nbcon consoles
> > will get flushed. My current understanding and view is the following,
> > starting from the easiest priority:
> >
> >
> >   1. NBCON_PRIO_PANIC messages will be flushed by calling
> >      nbcon_atomic_flush_pending() directly in vprintk_emit()
> >
> >      This will take care of any previously added messages.
> >
> >      Non-panic CPUs are not allowed to add messages anymore
> >      when there is a panic in progress.
> >
> >      [ALL OK]
> 
> OK, because the end of panic will perform unsafe takeovers if necessary.
> 
> >   2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
> >      nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().
> >
> >      This would cover all previously added messages, including
> >      the ones printed by the code between
> >      nbcon_cpu_emergency_enter()/exit().
> 
> This is best effort. If the console is owned by another context and is
> marked unsafe, nbcon_atomic_flush_pending() will do nothing.
> 
> [ PROBLEM: In this case, who will flush the emergency messages? ]

They should get flushed by the current owner when the system is still
living. Or the system is ready for panic() and the messages would
be emitted bu the final unsafe flush.

IMPORTANT: The optimistic scenario would work only when the current
	   owner really flushes everything. More on this below.


> >      This won't cover later added messages which might be
> >      a problem. Let's look at this closer. Later added
> >      messages with:
> >
> > 	+ NBCON_PRIO_PANIC will be handled in vprintk_emit()
> > 	  as explained above [OK]
> >
> > 	+ NBCON_PRIO_EMERGENCY() will be handled in the
> > 	  related nbcon_cpu_emergency_exit() as described here.
> > 	  [OK]
> >
> > 	+ NBCON_PRIO_NORMAL will be handled, see below. [?]
> >
> >      [ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]
> 
> Yes, this is also an issue, although the solution may be the same for
> this and the above problem.

This is a bit different. There was an existing console owner in the
above scenario. In this case, the code relies on the printk kthread.
But we need a solution for situations when the kthread is not working,
e.g. early boot.


> >   3. NBCON_PRIO_NORMAL messages will be flushed by:
> >
> >        + the printk kthread when it is available
> >
> >        + the legacy loop via
> >
> > 	 + console_unlock()
> > 	    + console_flush_all()
> > 	      + console nbcon_legacy_emit_next_record() [PROBLEM]
> >
> > PROBLEM: console_flush_all() does not guarantee progress with
> > 	 nbcon consoles as explained above (previous mail).
> 
> Not only this. If there is no kthread available, no printing will occur
> until the _next_ printk(), whenever that is.

> Above we have listed 3 problems:
> 
> - emergency messages will not flush if owned by another context and
>   unsafe
>
> - normal messages will not flush if owned by another context
> 
> - for the above 2 problems, if there is no kthread, nobody will flush
>   the messages

All this goes down to the problem who is would flush "ignored"
messages when the system continues working in "normal" mode.


> My question: Is this really a problem?

IMHO, it is. For example, early boot consoles exists for a reason.
People want to debug early boot problems using printk.
We should not break the reliability too much by introducing kthreads.

Later update: It is basically only about early boot debugging.

	The kthreads should always be created later. And
	we assume that they work, except for the emergency
	and panic context.

	So, this is not a problem as long as the boot consoles
	are using the legacy code paths.

	Well, I guess that they might use the "atomic_write()"
	callback in the future. And then this "bug" might hurt.


> The main idea behind the rework is that printing is deferred. The
> kthreads exist for this. If the kthreads are not available (early boot
> or shutdown) or the kthreads are not reliable enough (emergency
> messages), a best-safe-effort is made to print in the caller
> context. Only the panic situation is designed to force output (unsafely,
> if necessary). Is that not enough?

Simple answer: No, primary because the early boot behavior.

Longer answer: I tried to separate all the variants and point out
	a particular problem. The above paragraph mixes everything
	into "Wave this away" proposal.


> > My proposal:
> >
> > 	1. console_flush_all() will flush nbcon consoles only
> > 	   in NBCON_PRIO_NORMAL and when the kthreads are not
> > 	   available.
> >
> > 	   It will make it clear that this is the flusher in
> > 	   this situation.
> 
> This is the current PREEMPT_RT implementation.
> 
> > 	2. Allow to skip nbcon consoles in console_flush_all() when
> > 	   it can't take the context (as suggested in my previous
> > 	   reply).
> >
> > 	   This won't guarantee flushing NORMAL messages added
> > 	   while nbcon_cpu_emergency_exit() calls
> > 	   nbcon_atomic_flush_pending().
> 
> This was the previous version. And I agree that we need to go back to
> that.
> 
> > 	   Solve this problem by introducing[*] nbcon_atomic_flush_all()
> > 	   which would flush even newly added messages and
> > 	   call this in nbcon_cpu_emergency_exit() when the printk
> > 	   kthread does not work. It should bail out when there
> > 	   is a panic in progress.
> >
> > 	   Motivation: It does not matter which "atomic" context
> > 		flushes NORMAL/EMERGENCY messages when
> > 		the printk kthread is not available.
> 
> I do not think that solves the problem. If the console is in an unsafe
> section, nothing can be printed.

IMHO, it solves the problem.

The idea is simple:

  "The current nbcon console owner will be responsible for flushing
   all messages when the printk kthread does not exist."

The prove is more complicated:

   1. Let's put aside panic. We already do the best effort there.

   2. Emergency mode currently violates the rule because
      nbcon_atomic_flush_pending() ignores the simple rule.

      => FIX: improve nbcon_cpu_emergency_exit() to flush
	      all messages when kthreads are not ready.


   3. Normal mode flushes nbcon consoles via
      nbcon_legacy_emit_next_record() from console_unlock()
      before the kthreads are started.

      It is not reliable when nbcon_try_acquire() fails.
      But it would fail only when there is another user.
      The other owner might be:

	+ panic: will handle everything

	+ emergency: should flush everything [*]

	+ normal: can't happen because of con->device() lock.

=> The only remaining problem is to fix nbcon_atomic_flush_pending()
   to flush everything when the kthreads are not started yet.


> > 	  [*] Alternatively we could modify nbcon_atomic_flush_pending()
> > 	      to flush even newly added messages when the kthread is
> > 	      not working. But it might create another mess.
> 
> This discussion is about when kthreads are not available. If this is a
> concern, I wonder if maybe in this situation an irq_work should be
> triggered upon release of the console.

Calling irq_work() would solve the problem as well. It would move
flushing to context with "normal" priority which is serialized
by con->device_lock(). It works for me.

Does this make any sense?

It is possible that you already knew all this. And it is possible
that you did not see it as a problem because there was no plan
to convert boot consoles to nbcon variant. Maybe, it does
not even make sense because boot consoles could not use
kthreads. The only motivation would be code sharing and
simplification of the legacy loop but this is far away dream.

Sigh, all this is so complicated. I wonder how to document
this so that other people do not have to discover these
dependencies again and again. Is it even possible?

Well, I still think that it makes sense to improve
nbcon_cpu_emergency_exit() to fill the potential hole.
And ideally mention all these details somewhere
(commit message, comments, Documentation/...)

Best Regards,
Petr

  reply	other threads:[~2024-04-18 12:47 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02 22:11 [PATCH printk v4 00/27] wire up write_atomic() printing John Ogness
2024-04-02 22:11 ` [PATCH printk v4 01/27] printk: Add notation to console_srcu locking John Ogness
2024-04-02 22:11 ` [PATCH printk v4 02/27] printk: Properly deal with nbcon consoles on seq init John Ogness
2024-04-05 15:37   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 03/27] printk: nbcon: Remove return value for write_atomic() John Ogness
2024-04-08 13:17   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 04/27] printk: Check printk_deferred_enter()/_exit() usage John Ogness
2024-04-02 22:11 ` [PATCH printk v4 05/27] printk: nbcon: Add detailed doc for write_atomic() John Ogness
2024-04-08 15:20   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver John Ogness
2024-04-09  9:20   ` Petr Mladek
2024-04-16 15:01     ` John Ogness
2024-04-17 13:03       ` Petr Mladek
2024-04-17 14:54         ` John Ogness
2024-04-18 10:33           ` Petr Mladek
2024-04-18 12:10             ` John Ogness
2024-04-18 15:03               ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 07/27] printk: nbcon: Use driver synchronization while registering John Ogness
2024-04-09  9:46   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 08/27] serial: core: Provide low-level functions to lock port John Ogness
2024-04-09 12:00   ` Petr Mladek
2024-04-09 13:23   ` Greg Kroah-Hartman
2024-04-02 22:11 ` [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper John Ogness
2024-04-03 11:35   ` John Ogness
2024-04-10 12:35   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 10/27] printk: nbcon: Do not rely on proxy headers John Ogness
2024-04-03 10:33   ` Andy Shevchenko
2024-04-10 13:06   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 11/27] printk: nbcon: Fix kerneldoc for enums John Ogness
2024-04-02 22:11 ` [PATCH printk v4 12/27] printk: Make console_is_usable() available to nbcon John Ogness
2024-04-02 22:11 ` [PATCH printk v4 13/27] printk: Let console_is_usable() handle nbcon John Ogness
2024-04-02 22:11 ` [PATCH printk v4 14/27] printk: Add @flags argument for console_is_usable() John Ogness
2024-04-02 22:11 ` [PATCH printk v4 15/27] printk: nbcon: Provide function to flush using write_atomic() John Ogness
2024-04-10 14:56   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 16/27] printk: Track registered boot consoles John Ogness
2024-04-02 22:11 ` [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all() John Ogness
2024-04-11 14:14   ` Petr Mladek
2024-04-11 15:32     ` Petr Mladek
2024-04-17 23:05       ` John Ogness
2024-04-18 12:47         ` Petr Mladek [this message]
2024-04-18 21:45           ` John Ogness
2024-04-19  9:55             ` Petr Mladek
2024-04-12  9:07     ` Petr Mladek
2024-04-17 21:59     ` John Ogness
2024-04-02 22:11 ` [PATCH printk v4 18/27] printk: nbcon: Assign priority based on CPU state John Ogness
2024-04-02 22:11 ` [PATCH printk v4 19/27] printk: nbcon: Add unsafe flushing on panic John Ogness
2024-04-11 14:18   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 20/27] printk: Avoid console_lock dance if no legacy or boot consoles John Ogness
2024-04-03 10:01   ` John Ogness
2024-04-12 14:21     ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 21/27] printk: Track nbcon consoles John Ogness
2024-04-12 14:22   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 22/27] printk: Coordinate direct printing in panic John Ogness
2024-04-12 14:39   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 23/27] printk: nbcon: Implement emergency sections John Ogness
2024-04-12 15:27   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 24/27] panic: Mark emergency section in warn John Ogness
2024-04-15 13:16   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 25/27] panic: Mark emergency section in oops John Ogness
2024-04-15 13:22   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 26/27] rcu: Mark emergency section in rcu stalls John Ogness
2024-04-15 13:32   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats John Ogness
2024-04-16  9:51   ` Petr Mladek
2024-04-16 11:17   ` Peter Zijlstra
2024-04-16 12:53     ` John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiEWaA3CeQsccEdj@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox