public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()
Date: Thu, 11 Apr 2024 16:14:54 +0200	[thread overview]
Message-ID: <ZhfwXsEE2Y8IPPxX@localhost.localdomain> (raw)
In-Reply-To: <20240402221129.2613843-18-john.ogness@linutronix.de>

On Wed 2024-04-03 00:17:19, John Ogness wrote:
> Allow nbcon consoles to print messages in the legacy printk()
> caller context (printing via unlock) by integrating them into
> console_flush_all(). The write_atomic() callback is used for
> printing.

Hmm, this patch tries to flush nbcon console even in context
with NBCON_PRIO_NORMAL. Do we really want this, please?

I would expect that it would do so only when the kthread
is not working.

> Provide nbcon_legacy_emit_next_record(), which acts as the
> nbcon variant of console_emit_next_record(). Call this variant
> within console_flush_all() for nbcon consoles. Since nbcon
> consoles use their own @nbcon_seq variable to track the next
> record to print, this also must be appropriately handled.

I have been a bit confused by all the boolean return values
and what _exactly_ they mean. IMHO, we should make it more
clear how it works when it can't acquire the context.

IMHO, it is is importnat because console_flush_all() interprets
nbcon_legacy_emit_next_record() return value as @progress even when
there is no guaranteed progress. We just expect that
the other context is doing something.

It feels like it might get stuck forewer in some situatuon.
It would be good to understand if it is OK or not.


Later update:

Hmm, console_flush_all() is called from console_unlock().
It might be called in atomic context. But the current
owner might be theoretically scheduled out.

This is from documentation of nbcon_context_try_acquire()

/**
 * nbcon_context_try_acquire - Try to acquire nbcon console
 * @ctxt:	The context of the caller
 *
 * Context:	Any context which could not be migrated to another CPU.


I can't find any situation where nbcon_context_try_acquire() is
currently called in normal (schedulable) context. This is probably
why you did not see any problems with testing.

I see 3 possible solutions:

  1. Enforce that nbcon context can be acquired only with preemtion
     disabled.

  2. Enforce that nbcon context can be acquired only with
     interrupts. It would prevent deadlock when some future
     code interrupt flush in NBCON_PRIO_EMERGENCY context.
     And then a potential nested console_flush_all() won't be
     able to takeover the interrupted NBCON_PRIO_CONTEXT
     and there will be no progress.

  3. console_flush_all() should ignore nbcon console when
     it is not able to get the context, aka no progress.


I personally prefer the 3rd solution because I have spent
last 12 years on attempts to move printk into preemtible
context. And it looks wrong to move into atomic context.

Warning: console_flush_all() suddenly won't guarantee flushing
	 all messages.

	 I am not completely sure about all the consequences until
	 I see the rest of the patchset and the kthread intergration.
	 We will somehow need to guarantee that all messages
	 are flushed.


I propose the following changes to make console_flush_all()
safe. The patch is made on top of this patchset. Feel free
to squash them into your patch and remove my SoB.


From 8dd9c0be5ab693c6976a0f7f8b99de48ecebcbf6 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 11 Apr 2024 15:45:53 +0200
Subject: [PATCH] printk: nbcon: Prevent deadlock when flushing nbcon consoles
 in the legacy loop

Ignore pending records in nbcon consoles in the legacy console_flush_all()
loop when the nbcon console context can't be acquired. They will be
flushed either by the current nbcon context owner or to-be-introduced
printk kthread.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/nbcon.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index c57ad34a8d56..88c40f165be4 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -964,8 +964,12 @@ static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
  *				write_atomic() callback
  * @wctxt:	An initialized write context struct to use for this context
  *
- * Return:	False if it is known there are no more records to print,
- *		otherwise true.
+ * Return:	True, when a record has been printed and there are still
+ *		pending records. The caller might want to continue flushing.
+ *
+ *		False, when there is no pending record, or when the console
+ *		context can't be acquired, or the ownership has been lost.
+ *		The caller should give up. Either the job is or	can't be done.
  *
  * This is an internal helper to handle the locking of the console before
  * calling nbcon_emit_next_record().
@@ -975,7 +979,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 
 	if (!nbcon_context_try_acquire(ctxt))
-		return true;
+		return false;
 
 	/*
 	 * nbcon_emit_next_record() returns false when the console was
@@ -983,7 +987,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	 * longer valid.
 	 */
 	if (!nbcon_emit_next_record(wctxt))
-		return true;
+		return false;
 
 	nbcon_context_release(ctxt);
 
@@ -1023,8 +1027,13 @@ enum nbcon_prio nbcon_get_default_prio(void)
  * @cookie:	The cookie from the SRCU read lock.
  *
  * Context:	Any context except NMI.
- * Return:	False if the given console has no next record to print,
- *		otherwise true.
+ *
+ * Return:	True, when a record has been printed and there are still
+ *		pending records. The caller might want to continue flushing.
+ *
+ *		False, when there is no pending record, or when the console
+ *		context can't be acquired, or the ownership has been lost.
+ *		The caller should give up. Either the job is or	can't be done.
  *
  * This function is meant to be called by console_flush_all() to print records
  * on nbcon consoles from legacy context (printing via console unlocking).
-- 
2.44.0


Best Regards,
Petr

  reply	other threads:[~2024-04-11 14:14 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02 22:11 [PATCH printk v4 00/27] wire up write_atomic() printing John Ogness
2024-04-02 22:11 ` [PATCH printk v4 01/27] printk: Add notation to console_srcu locking John Ogness
2024-04-02 22:11 ` [PATCH printk v4 02/27] printk: Properly deal with nbcon consoles on seq init John Ogness
2024-04-05 15:37   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 03/27] printk: nbcon: Remove return value for write_atomic() John Ogness
2024-04-08 13:17   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 04/27] printk: Check printk_deferred_enter()/_exit() usage John Ogness
2024-04-02 22:11 ` [PATCH printk v4 05/27] printk: nbcon: Add detailed doc for write_atomic() John Ogness
2024-04-08 15:20   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver John Ogness
2024-04-09  9:20   ` Petr Mladek
2024-04-16 15:01     ` John Ogness
2024-04-17 13:03       ` Petr Mladek
2024-04-17 14:54         ` John Ogness
2024-04-18 10:33           ` Petr Mladek
2024-04-18 12:10             ` John Ogness
2024-04-18 15:03               ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 07/27] printk: nbcon: Use driver synchronization while registering John Ogness
2024-04-09  9:46   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 08/27] serial: core: Provide low-level functions to lock port John Ogness
2024-04-09 12:00   ` Petr Mladek
2024-04-09 13:23   ` Greg Kroah-Hartman
2024-04-02 22:11 ` [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper John Ogness
2024-04-03 11:35   ` John Ogness
2024-04-10 12:35   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 10/27] printk: nbcon: Do not rely on proxy headers John Ogness
2024-04-03 10:33   ` Andy Shevchenko
2024-04-10 13:06   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 11/27] printk: nbcon: Fix kerneldoc for enums John Ogness
2024-04-02 22:11 ` [PATCH printk v4 12/27] printk: Make console_is_usable() available to nbcon John Ogness
2024-04-02 22:11 ` [PATCH printk v4 13/27] printk: Let console_is_usable() handle nbcon John Ogness
2024-04-02 22:11 ` [PATCH printk v4 14/27] printk: Add @flags argument for console_is_usable() John Ogness
2024-04-02 22:11 ` [PATCH printk v4 15/27] printk: nbcon: Provide function to flush using write_atomic() John Ogness
2024-04-10 14:56   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 16/27] printk: Track registered boot consoles John Ogness
2024-04-02 22:11 ` [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all() John Ogness
2024-04-11 14:14   ` Petr Mladek [this message]
2024-04-11 15:32     ` Petr Mladek
2024-04-17 23:05       ` John Ogness
2024-04-18 12:47         ` Petr Mladek
2024-04-18 21:45           ` John Ogness
2024-04-19  9:55             ` Petr Mladek
2024-04-12  9:07     ` Petr Mladek
2024-04-17 21:59     ` John Ogness
2024-04-02 22:11 ` [PATCH printk v4 18/27] printk: nbcon: Assign priority based on CPU state John Ogness
2024-04-02 22:11 ` [PATCH printk v4 19/27] printk: nbcon: Add unsafe flushing on panic John Ogness
2024-04-11 14:18   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 20/27] printk: Avoid console_lock dance if no legacy or boot consoles John Ogness
2024-04-03 10:01   ` John Ogness
2024-04-12 14:21     ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 21/27] printk: Track nbcon consoles John Ogness
2024-04-12 14:22   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 22/27] printk: Coordinate direct printing in panic John Ogness
2024-04-12 14:39   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 23/27] printk: nbcon: Implement emergency sections John Ogness
2024-04-12 15:27   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 24/27] panic: Mark emergency section in warn John Ogness
2024-04-15 13:16   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 25/27] panic: Mark emergency section in oops John Ogness
2024-04-15 13:22   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 26/27] rcu: Mark emergency section in rcu stalls John Ogness
2024-04-15 13:32   ` Petr Mladek
2024-04-02 22:11 ` [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats John Ogness
2024-04-16  9:51   ` Petr Mladek
2024-04-16 11:17   ` Peter Zijlstra
2024-04-16 12:53     ` John Ogness

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhfwXsEE2Y8IPPxX@localhost.localdomain \
    --to=pmladek@suse.com \
    --cc=john.ogness@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=senozhatsky@chromium.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox