[PATCH printk v3 00/19] add threaded printing + the rest

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH printk v3 00/19] add threaded printing + the rest
@ 2024-07-22 17:19 John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context John Ogness
                   ` (18 more replies)
  0 siblings, 19 replies; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, linux-fsdevel, Jiri Slaby, linux-serial

Hi,

This is v3 of a series to implement threaded console printing as
well as some other minor pieces (such as proc and sysfs support). v2
is here [0].

For information about the motivation of the nbcon consoles, please
read the cover letter of the original v1 [1].

This series provides the remaining pieces of the printk rework. All
other components are either already mainline or are currently in
linux-next. In particular this series does:

- Implement dedicated printing threads per nbcon console.

- Implement forced threading of legacy consoles for PREEMPT_RT.

- Implement nbcon support for proc and sysfs console-related files.

- Provide a new helper function nbcon_reacquire_nobuf() to allow
  nbcon console drivers to reacquire ownership.

Note that this series does *not* provide an nbcon console driver.
That will come in a follow-up series.

Here are the main changes since v2:

- Drop the patch for "threadprintk" boot arg for !PREEMPT_RT.

- Rename variables and functions:

force_printkthreads()		-> force_legacy_kthread()
nbcon_reacquire()		-> nbcon_reacquire_nobuf()
nbcon_wake_threads()		-> nbcon_wake_kthreads()
nbcon_legacy_kthread_func()	-> legacy_kthread_func()
wake_up_legacy_kthread()	-> legacy_kthread_wake()
@printk_threads_enabled		-> @printk_kthreads_running
@nbcon_legacy_kthread		-> @printk_legacy_kthread

- Introduce @printk_kthreads_ready to flag when it is allowed to
  create kthreads.

- Rename lockdep map functions with updated comments to hopefully
  better clarify their purpose.

- The write_thread() callback is now mandatory. write_atomic()
  remains optional. (This also simplifies the changes to
  drivers/tty/tty_io.c and fs/proc/consoles.c.)

- Merge nbcon_init() into nbcon_alloc(). This simplifies the rules
  and more closely resembles the legacy registration.

- Introduce struct console_flush_type to inform which flushing type
  to use.

- Introduce flushing type macro printk_get_console_flush_type() for
  non-emergency CPU states.

- Introduce flushing type macro
  printk_get_emergency_console_flush_type() for emergency CPU
  states.

- Remove @printing_via_unlock macro since there is now the flushing
  type macros.

- Replace _all_ flushing decisions with calls to new flushing type
  macros.

- Introduce helper function nbcon_write_context_set_buf() for
  setting up the write context for nbcon printers.

- In nbcon and legacy kthread functions, do not check for spurious
  signals.

- In nbcon_kthread_func(), check for kthread_should_stop() inside
  the printing loop.

- In nbcon_kthread_func(), add cond_resched() inside the printing
  loop.

- In rcuwait_has_sleeper(), avoid unnecessary RCU dereference.

- In nbcon_wake_kthreads(), do not try to wake if kthreads are not
  fully available.

- In nbcon_legacy_emit_next_record(), split the atomic/thread cases
  before and after printing.

- Use suggested alternative implementation for
  console_prepend_message().

- Move all functions to start/stop kthreads into printk.c.

- Introduce printk_kthreads_check_locked() to start/stop kthreads if
  needed when the console list has changed.

- Unregister nbcon console if the related kthread printer cannot be
  created.

- On failure to start the legacy kthread, unregister all legacy
  consoles.

- If all legacy consoles unregister, stop legacy kthread.

- If all nbcon consoles unregister, clear @nbcon_kthreads_running.

- On shutdown, clear @nbcon_kthreads_running.

- In unregister_console_locked(), flush the console before
  unregistering.

- Let pr_flush() fail if called _very_ early.

- Do not wake nbcon kthreads in printk_trigger_flush().

- In console_try_replay_all(), add support for the legacy kthread.

- Refactor the series to hopefully provide a cleaner transition to
  the final desired stand.

- Split -20 nice for kthreads into a separate patch.

- Rewrite many comments and commit messages as requested.

John Ogness

[0] https://lore.kernel.org/lkml/20240603232453.33992-1-john.ogness@linutronix.de

[1] https://lore.kernel.org/lkml/20230302195618.156940-1-john.ogness@linutronix.de

John Ogness (18):
  printk: nbcon: Clarify nbcon_get_default_prio() context
  printk: nbcon: Consolidate alloc() and init()
  printk: nbcon: Add function for printers to reacquire ownership
  printk: nbcon: Clarify rules of the owner/waiter matching
  printk: Fail pr_flush() if before SYSTEM_SCHEDULING
  printk: Flush console on unregister_console()
  printk: Add helpers for flush type logic
  printk: nbcon: Add context to usable() and emit()
  printk: nbcon: Use thread callback if in task context for legacy
  printk: nbcon: Rely on kthreads for normal operation
  printk: Provide helper for message prepending
  printk: nbcon: Show replay message on takeover
  proc: consoles: Add notation to c_start/c_stop
  proc: Add nbcon support for /proc/consoles
  tty: sysfs: Add nbcon support for 'active'
  printk: Implement legacy printer kthread for PREEMPT_RT
  printk: nbcon: Assign nice -20 for printing threads
  printk: Avoid false positive lockdep report for legacy printing

Thomas Gleixner (1):
  printk: nbcon: Introduce printer kthreads

 drivers/tty/tty_io.c     |   2 +-
 fs/proc/consoles.c       |   7 +-
 include/linux/console.h  |  50 +++-
 kernel/printk/internal.h | 159 ++++++++++-
 kernel/printk/nbcon.c    | 530 ++++++++++++++++++++++++++++++-----
 kernel/printk/printk.c   | 588 +++++++++++++++++++++++++++++++--------
 6 files changed, 1137 insertions(+), 199 deletions(-)


base-commit: b18703ea7157f62f02eb0ceb11f6fa0138e90adc
-- 
2.39.2


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-26  8:57   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 02/19] printk: nbcon: Consolidate alloc() and init() John Ogness
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Correct the kerneldoc and code comments that claimed migration must
be disabled for nbcon_get_default_prio(). This is not true.

Add the explation in the kerneldoc.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/nbcon.c  | 5 ++++-
 kernel/printk/printk.c | 2 --
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ef6e76db0f5a..d8388faa6500 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -970,9 +970,12 @@ static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
  * nbcon_get_default_prio - The appropriate nbcon priority to use for nbcon
  *				printing on the current CPU
  *
- * Context:	Any context which could not be migrated to another CPU.
+ * Context:	Any context.
  * Return:	The nbcon_prio to use for acquiring an nbcon console in this
  *		context for printing.
+ *
+ * Allowing migration enabled relies on the fact that a context cannot be
+ * migrated to panic or emergency state CPUs.
  */
 enum nbcon_prio nbcon_get_default_prio(void)
 {
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d0bff0b0abfd..5090c0591f88 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2401,8 +2401,6 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * printing of all remaining records to all consoles so that
 		 * this context can return as soon as possible. Hopefully
 		 * another printk() caller will take over the printing.
-		 *
-		 * Also, nbcon_get_default_prio() requires migration disabled.
 		 */
 		preempt_disable();
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context
  2024-07-22 17:19 ` [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context John Ogness
@ 2024-07-26  8:57   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-26  8:57 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:21, John Ogness wrote:
> Correct the kerneldoc and code comments that claimed migration must
> be disabled for nbcon_get_default_prio(). This is not true.
> 
> Add the explation in the kerneldoc.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/printk/nbcon.c  | 5 ++++-
>  kernel/printk/printk.c | 2 --
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index ef6e76db0f5a..d8388faa6500 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -970,9 +970,12 @@ static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
>   * nbcon_get_default_prio - The appropriate nbcon priority to use for nbcon
>   *				printing on the current CPU
>   *
> - * Context:	Any context which could not be migrated to another CPU.
> + * Context:	Any context.
>   * Return:	The nbcon_prio to use for acquiring an nbcon console in this
>   *		context for printing.
> + *
> + * Allowing migration enabled relies on the fact that a context cannot be
> + * migrated to panic or emergency state CPUs.

The sentence sounds really cryptic to me. I personally prefer
straightforward descriptions for dummies ;-) What about the following?

 * The function reads per-CPU variables but it is safe in any context.
 * The CPU migration is disabled in panic or emergency CPU context
 * by definition.

>   */
>  enum nbcon_prio nbcon_get_default_prio(void)
>  {

I do not resist on the comment update. Feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 02/19] printk: nbcon: Consolidate alloc() and init()
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-26 11:58   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership John Ogness
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Rather than splitting the nbcon allocation and initialization into
two pieces, perform all initialization in nbcon_alloc(). Later,
the init_seq is calculated and can be explicitly set using
nbcon_seq_force(). This removes the need for the strong rules of
nbcon_init() that even included a BUG_ON(). It also more closely
matches the setup logic of the legacy consoles.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  2 --
 kernel/printk/nbcon.c    | 35 ++++++++++-------------------------
 kernel/printk/printk.c   |  2 +-
 3 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 0439cf2fdc22..d58f5cefbac3 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -86,7 +86,6 @@ int console_lock_spinning_disable_and_check(int cookie);
 u64 nbcon_seq_read(struct console *con);
 void nbcon_seq_force(struct console *con, u64 seq);
 bool nbcon_alloc(struct console *con);
-void nbcon_init(struct console *con, u64 init_seq);
 void nbcon_free(struct console *con);
 enum nbcon_prio nbcon_get_default_prio(void);
 void nbcon_atomic_flush_pending(void);
@@ -144,7 +143,6 @@ static inline bool printk_percpu_data_ready(void) { return false; }
 static inline u64 nbcon_seq_read(struct console *con) { return 0; }
 static inline void nbcon_seq_force(struct console *con, u64 seq) { }
 static inline bool nbcon_alloc(struct console *con) { return false; }
-static inline void nbcon_init(struct console *con, u64 init_seq) { }
 static inline void nbcon_free(struct console *con) { }
 static inline enum nbcon_prio nbcon_get_default_prio(void) { return NBCON_PRIO_NONE; }
 static inline void nbcon_atomic_flush_pending(void) { }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index d8388faa6500..1388e23a439f 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1343,17 +1343,21 @@ void nbcon_cpu_emergency_flush(void)
 }
 
 /**
- * nbcon_alloc - Allocate buffers needed by the nbcon console
- * @con:	Console to allocate buffers for
+ * nbcon_alloc - Allocate and init the nbcon console specific data
+ * @con:	Console to initialize
  *
- * Return:	True on success. False otherwise and the console cannot
- *		be used.
+ * Return:	True if the console was fully allocated and initialized.
+ *		Otherwise @con must not be registered.
  *
- * This is not part of nbcon_init() because buffer allocation must
- * be performed earlier in the console registration process.
+ * When allocation and init was successful, the console must be properly
+ * freed using nbcon_free() once it is no longer needed.
  */
 bool nbcon_alloc(struct console *con)
 {
+	struct nbcon_state state = { };
+
+	nbcon_state_set(con, &state);
+
 	if (con->flags & CON_BOOT) {
 		/*
 		 * Boot console printing is synchronized with legacy console
@@ -1372,25 +1376,6 @@ bool nbcon_alloc(struct console *con)
 	return true;
 }
 
-/**
- * nbcon_init - Initialize the nbcon console specific data
- * @con:	Console to initialize
- * @init_seq:	Sequence number of the first record to be emitted
- *
- * nbcon_alloc() *must* be called and succeed before this function
- * is called.
- */
-void nbcon_init(struct console *con, u64 init_seq)
-{
-	struct nbcon_state state = { };
-
-	/* nbcon_alloc() must have been called and successful! */
-	BUG_ON(!con->pbufs);
-
-	nbcon_seq_force(con, init_seq);
-	nbcon_state_set(con, &state);
-}
-
 /**
  * nbcon_free - Free and cleanup the nbcon console specific data
  * @con:	Console to free/cleanup nbcon data
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5090c0591f88..641c2a8b0a09 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3631,7 +3631,7 @@ void register_console(struct console *newcon)
 
 	if (newcon->flags & CON_NBCON) {
 		have_nbcon_console = true;
-		nbcon_init(newcon, init_seq);
+		nbcon_seq_force(newcon, init_seq);
 	} else {
 		have_legacy_console = true;
 		newcon->seq = init_seq;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 02/19] printk: nbcon: Consolidate alloc() and init()
  2024-07-22 17:19 ` [PATCH printk v3 02/19] printk: nbcon: Consolidate alloc() and init() John Ogness
@ 2024-07-26 11:58   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-26 11:58 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:22, John Ogness wrote:
> Rather than splitting the nbcon allocation and initialization into
> two pieces, perform all initialization in nbcon_alloc(). Later,
> the init_seq is calculated and can be explicitly set using
> nbcon_seq_force(). This removes the need for the strong rules of
> nbcon_init() that even included a BUG_ON(). It also more closely
> matches the setup logic of the legacy consoles.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

I like this.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 02/19] printk: nbcon: Consolidate alloc() and init() John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-26 12:25   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 04/19] printk: nbcon: Clarify rules of the owner/waiter matching John Ogness
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

Since ownership can be lost at any time due to handover or
takeover, a printing context _must_ be prepared to back out
immediately and carefully. However, there are scenarios where
the printing context must reacquire ownership in order to
finalize or revert hardware changes.

One such example is when interrupts are disabled during
printing. No other context will automagically re-enable the
interrupts. For this case, the disabling context _must_
reacquire nbcon ownership so that it can re-enable the
interrupts.

Provide nbcon_reacquire_nobuf() for exactly this purpose. It
allows a printing context to reacquire ownership using the same
priority as its previous ownership.

Note that after a successful reacquire the printing context
will have no output buffer because that has been lost. This
function cannot be used to resume printing.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h |  6 +++++
 kernel/printk/nbcon.c   | 56 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 4aaf053840ee..38ef6e64da19 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -366,6 +366,10 @@ struct console {
 	 *
 	 * The callback should allow the takeover whenever it is safe. It
 	 * increases the chance to see messages when the system is in trouble.
+	 * If the driver must reacquire ownership in order to finalize or
+	 * revert hardware changes, nbcon_reacquire_nobuf() can be used.
+	 * However, on reacquire the buffer content is no longer available. A
+	 * reacquire cannot be used to resume printing.
 	 *
 	 * The callback can be called from any context (including NMI).
 	 * Therefore it must avoid usage of any locking and instead rely
@@ -559,6 +563,7 @@ extern void nbcon_cpu_emergency_flush(void);
 extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
+extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
 #else
 static inline void nbcon_cpu_emergency_enter(void) { }
 static inline void nbcon_cpu_emergency_exit(void) { }
@@ -566,6 +571,7 @@ static inline void nbcon_cpu_emergency_flush(void) { }
 static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
+static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
 #endif
 
 extern int console_set_on_cmdline;
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 1388e23a439f..18a83d181622 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -834,6 +834,47 @@ bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt)
 }
 EXPORT_SYMBOL_GPL(nbcon_exit_unsafe);
 
+static void nbcon_write_context_set_buf(struct nbcon_write_context *wctxt,
+					char *buf, unsigned int len)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	struct nbcon_state cur;
+
+	wctxt->outbuf = buf;
+	wctxt->len = len;
+	nbcon_state_read(con, &cur);
+	wctxt->unsafe_takeover = cur.unsafe_takeover;
+}
+
+/**
+ * nbcon_reacquire_nobuf - Reacquire a console after losing ownership
+ *				while printing
+ * @wctxt:	The write context that was handed to the write callback
+ *
+ * Since ownership can be lost at any time due to handover or takeover, a
+ * printing context _must_ be prepared to back out immediately and
+ * carefully. However, there are scenarios where the printing context must
+ * reacquire ownership in order to finalize or revert hardware changes.
+ *
+ * This function allows a printing context to reacquire ownership using the
+ * same priority as its previous ownership.
+ *
+ * Note that after a successful reacquire the printing context will have no
+ * output buffer because that has been lost. This function cannot be used to
+ * resume printing.
+ */
+void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	while (!nbcon_context_try_acquire(ctxt))
+		cpu_relax();
+
+	nbcon_write_context_set_buf(wctxt, NULL, 0);
+}
+EXPORT_SYMBOL_GPL(nbcon_reacquire_nobuf);
+
 /**
  * nbcon_emit_next_record - Emit a record in the acquired context
  * @wctxt:	The write context that will be handed to the write function
@@ -859,7 +900,6 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 		.pbufs = ctxt->pbufs,
 	};
 	unsigned long con_dropped;
-	struct nbcon_state cur;
 	unsigned long dropped;
 
 	/*
@@ -894,10 +934,7 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 		goto update_con;
 
 	/* Initialize the write context for driver callbacks. */
-	wctxt->outbuf = &pmsg.pbufs->outbuf[0];
-	wctxt->len = pmsg.outbuf_len;
-	nbcon_state_read(con, &cur);
-	wctxt->unsafe_takeover = cur.unsafe_takeover;
+	nbcon_write_context_set_buf(wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len);
 
 	if (con->write_atomic) {
 		con->write_atomic(con, wctxt);
@@ -911,6 +948,15 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 		return false;
 	}
 
+	if (!wctxt->outbuf) {
+		/*
+		 * Ownership was lost and reacquired by the driver.
+		 * Handle it as if ownership was lost.
+		 */
+		nbcon_context_release(ctxt);
+		return false;
+	}
+
 	/*
 	 * Since any dropped message was successfully output, reset the
 	 * dropped count for the console.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership
  2024-07-22 17:19 ` [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership John Ogness
@ 2024-07-26 12:25   ` Petr Mladek
  2024-07-29  8:36     ` John Ogness
  0 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-07-26 12:25 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On Mon 2024-07-22 19:25:23, John Ogness wrote:
> Since ownership can be lost at any time due to handover or
> takeover, a printing context _must_ be prepared to back out
> immediately and carefully. However, there are scenarios where
> the printing context must reacquire ownership in order to
> finalize or revert hardware changes.
> 
> One such example is when interrupts are disabled during
> printing. No other context will automagically re-enable the
> interrupts. For this case, the disabling context _must_
> reacquire nbcon ownership so that it can re-enable the
> interrupts.

I am still not sure how this is going to be used. It is suspicious.
If the context lost the ownership than another started flushing
higher priority messages.

Is it really safe to manipulate the HW at this point?
Won't it break the higher priority context?

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -911,6 +948,15 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  		return false;
>  	}
>  
> +	if (!wctxt->outbuf) {

This check works only when con->write_atomic() called nbcon_reacquire_nobuf().

At least, we should clear the buffer also in nbcon_enter_unsafe() and
nbcon_exit_unsafe() when they realize that they do own the context.

Even better would be to add a check whether we still own the context.
Something like:

bool nbcon_can_proceed(struct nbcon_write_context *wctxt)
{
	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
	struct nbcon_state cur;

	nbcon_state_read(con, &cur);

	return nbcon_context_can_proceed(ctxt, &cur);
}

> +		/*
> +		 * Ownership was lost and reacquired by the driver.
> +		 * Handle it as if ownership was lost.
> +		 */
> +		nbcon_context_release(ctxt);
> +		return false;
> +	}
> +
>  	/*
>  	 * Since any dropped message was successfully output, reset the
>  	 * dropped count for the console.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership
  2024-07-26 12:25   ` Petr Mladek
@ 2024-07-29  8:36     ` John Ogness
  2024-07-30  9:24       ` Petr Mladek
  0 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-29  8:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> On Mon 2024-07-22 19:25:23, John Ogness wrote:
>> Since ownership can be lost at any time due to handover or
>> takeover, a printing context _must_ be prepared to back out
>> immediately and carefully. However, there are scenarios where
>> the printing context must reacquire ownership in order to
>> finalize or revert hardware changes.
>> 
>> One such example is when interrupts are disabled during
>> printing. No other context will automagically re-enable the
>> interrupts. For this case, the disabling context _must_
>> reacquire nbcon ownership so that it can re-enable the
>> interrupts.
>
> I am still not sure how this is going to be used. It is suspicious.
> If the context lost the ownership than another started flushing
> higher priority messages.
>
> Is it really safe to manipulate the HW at this point?
> Won't it break the higher priority context?

Why would it break anything? It spins until it normally and safely
acquires ownership again. The commit message provides a simple example
of why it is necessary. With a threaded printer, this situation happens
almost every time a warning occurs.

>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -911,6 +948,15 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>>  		return false;
>>  	}
>>  
>> +	if (!wctxt->outbuf) {
>
> This check works only when con->write_atomic() called
> nbcon_reacquire_nobuf().

Exactly. That is what it is for.

> At least, we should clear the buffer also in nbcon_enter_unsafe() and
> nbcon_exit_unsafe() when they realize that they do own the context.

OK.

> Even better would be to add a check whether we still own the context.
> Something like:
>
> bool nbcon_can_proceed(struct nbcon_write_context *wctxt)
> {
> 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> 	struct nbcon_state cur;
>
> 	nbcon_state_read(con, &cur);
>
> 	return nbcon_context_can_proceed(ctxt, &cur);
> }

nbcon_can_proceed() is meant to check ownership. And it only makes sense
to use it within an unsafe section. Otherwise it is racy.

Once a reacquire has occurred, the driver is allowed to proceed. It just
is not allowed to print (because its buffer is gone).

>> +		/*
>> +		 * Ownership was lost and reacquired by the driver.
>> +		 * Handle it as if ownership was lost.
>> +		 */
>> +		nbcon_context_release(ctxt);
>> +		return false;

John

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership
  2024-07-29  8:36     ` John Ogness
@ 2024-07-30  9:24       ` Petr Mladek
  2024-08-27  1:32         ` John Ogness
  0 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-07-30  9:24 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On Mon 2024-07-29 10:42:04, John Ogness wrote:
> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> > On Mon 2024-07-22 19:25:23, John Ogness wrote:
> >> Since ownership can be lost at any time due to handover or
> >> takeover, a printing context _must_ be prepared to back out
> >> immediately and carefully. However, there are scenarios where
> >> the printing context must reacquire ownership in order to
> >> finalize or revert hardware changes.
> >> 
> >> One such example is when interrupts are disabled during
> >> printing. No other context will automagically re-enable the
> >> interrupts. For this case, the disabling context _must_
> >> reacquire nbcon ownership so that it can re-enable the
> >> interrupts.
> >
> > I am still not sure how this is going to be used. It is suspicious.
> > If the context lost the ownership than another started flushing
> > higher priority messages.
> >
> > Is it really safe to manipulate the HW at this point?
> > Won't it break the higher priority context?
> 
> Why would it break anything? It spins until it normally and safely
> acquires ownership again. The commit message provides a simple example
> of why it is necessary. With a threaded printer, this situation happens
> almost every time a warning occurs.

I see. It makes sense now.

> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -911,6 +948,15 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> >>  		return false;
> >>  	}
> >>  
> >> +	if (!wctxt->outbuf) {
> >
> > This check works only when con->write_atomic() called
> > nbcon_reacquire_nobuf().
> 
> Exactly. That is what it is for.
> 
> > At least, we should clear the buffer also in nbcon_enter_unsafe() and
> > nbcon_exit_unsafe() when they realize that they do own the context.
> 
> OK.
> 
> > Even better would be to add a check whether we still own the context.
> > Something like:
> >
> > bool nbcon_can_proceed(struct nbcon_write_context *wctxt)
> > {
> > 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > 	struct nbcon_state cur;
> >
> > 	nbcon_state_read(con, &cur);
> >
> > 	return nbcon_context_can_proceed(ctxt, &cur);
> > }
> 
> nbcon_can_proceed() is meant to check ownership. And it only makes sense
> to use it within an unsafe section. Otherwise it is racy.

My idea was: "If we still own the context that we have owned it all
	      the time and con-write_atomic() succeeded."

The race is is not important. If we lose the ownership before updating
nbcon_seq then the line will get written again anyway.

> Once a reacquire has occurred, the driver is allowed to proceed. It just
> is not allowed to print (because its buffer is gone).

I see. My idea does not work because the driver is going to reacquire
the ownership. It means that nbcon_can_proceed() would return true
even when con->atomic_write() failed.

But it is not documented anywhere. And what if the driver has a bug
and does not call reacquire. Or what if the driver does not need
to restore anything.

IMHO, nbcon_emit_next_record() should check both:

	if (use_atomic)
		con->write_atomic(con, wctxt);
	else
		con->write_thread(con, wctxt);

	/* Still owns the console? */
	if (!nbcon_can_proceed(wctxt)
		return false;

	if (!wctxt->outbuf) {
		/*
		 * Ownership was lost and reacquired by the driver.
		 * Handle it as if ownership was lost.
		 */
		nbcon_context_release(ctxt);
		return false;
	}

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership
  2024-07-30  9:24       ` Petr Mladek
@ 2024-08-27  1:32         ` John Ogness
  0 siblings, 0 replies; 57+ messages in thread
From: John Ogness @ 2024-08-27  1:32 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On 2024-07-30, Petr Mladek <pmladek@suse.com> wrote:
> My idea was: "If we still own the context that we have owned it all
> 	      the time and con-write_atomic() succeeded."
>
> The race is is not important. If we lose the ownership before updating
> nbcon_seq then the line will get written again anyway.
>
>> Once a reacquire has occurred, the driver is allowed to proceed. It just
>> is not allowed to print (because its buffer is gone).
>
> I see. My idea does not work because the driver is going to reacquire
> the ownership. It means that nbcon_can_proceed() would return true
> even when con->atomic_write() failed.
>
> But it is not documented anywhere. And what if the driver has a bug
> and does not call reacquire. Or what if the driver does not need
> to restore anything.
>
> IMHO, nbcon_emit_next_record() should check both:
>
> 	if (use_atomic)
> 		con->write_atomic(con, wctxt);
> 	else
> 		con->write_thread(con, wctxt);
>
> 	/* Still owns the console? */
> 	if (!nbcon_can_proceed(wctxt)
> 		return false;
>
> 	if (!wctxt->outbuf) {
> 		/*
> 		 * Ownership was lost and reacquired by the driver.
> 		 * Handle it as if ownership was lost.
> 		 */
> 		nbcon_context_release(ctxt);
> 		return false;
> 	}

Except that the next thing nbcon_emit_next_record() does is
nbcon_context_enter_unsafe(), which checks ownership. So your suggested
nbcon_can_proceed() is redundant.

For v4 I can add comments explaining this. It would look like this (at
this point in the series):

	/* Initialize the write context for driver callbacks. */
	nbcon_write_context_set_buf(wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len);

	if (con->write_atomic) {
		con->write_atomic(con, wctxt);
	} else {
		/*
		 * This function should never be called for legacy consoles.
		 * Handle it as if ownership was lost and try to continue.
		 */
		WARN_ON_ONCE(1);
		nbcon_context_release(ctxt);
		return false;
	}

	if (!wctxt->outbuf) {
		/*
		 * Ownership was lost and reacquired by the driver. Handle it
		 * as if ownership was lost.
		 */
		nbcon_context_release(ctxt);
		return false;
	}

	/*
	 * Ownership may have been lost but _not_ reacquired by the driver.
	 * This case is detected and handled when entering unsafe to update
	 * dropped/seq values.
	 */

	/*
	 * Since any dropped message was successfully output, reset the
	 * dropped count for the console.
	 */
	dropped = 0;
update_con:
	/*
	 * The dropped count and the sequence number are updated within an
	 * unsafe section. This limits update races to the panic context and
	 * allows the panic context to win.
	 */

	if (!nbcon_context_enter_unsafe(ctxt))
		return false;

John Ogness

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 04/19] printk: nbcon: Clarify rules of the owner/waiter matching
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (2 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-26 12:55   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING John Ogness
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

The functions nbcon_owner_matches() and nbcon_waiter_matches()
use a minimal set of data to determine if a context matches.
The existing kerneldoc and comments were not clear enough and
caused the printk folks to re-prove that the functions are
indeed reliable in all cases.

Update and expand the explanations so that it is clear that the
implementations are sufficient for all cases.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/nbcon.c | 56 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 18a83d181622..db1685a6d5cd 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -241,6 +241,13 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 	struct nbcon_state new;
 
 	do {
+		/*
+		 * Panic does not imply that the console is owned. However, it
+		 * is critical that non-panic CPUs during panic are unable to
+		 * acquire ownership in order to satisfy the assumptions of
+		 * nbcon_waiter_matches(). In particular, the assumption that
+		 * lower priorities are ignored during panic.
+		 */
 		if (other_cpu_in_panic())
 			return -EPERM;
 
@@ -272,18 +279,29 @@ static bool nbcon_waiter_matches(struct nbcon_state *cur, int expected_prio)
 	/*
 	 * The request context is well defined by the @req_prio because:
 	 *
-	 * - Only a context with a higher priority can take over the request.
+	 * - Only a context with a priority higher than the owner can become
+	 *   a waiter.
+	 * - Only a context with a priority higher than the waiter can
+	 *   directly take over the request.
 	 * - There are only three priorities.
 	 * - Only one CPU is allowed to request PANIC priority.
 	 * - Lower priorities are ignored during panic() until reboot.
 	 *
 	 * As a result, the following scenario is *not* possible:
 	 *
-	 * 1. Another context with a higher priority directly takes ownership.
-	 * 2. The higher priority context releases the ownership.
-	 * 3. A lower priority context takes the ownership.
-	 * 4. Another context with the same priority as this context
+	 * 1. This context is currently a waiter.
+	 * 2. Another context with a higher priority than this context
+	 *    directly takes ownership.
+	 * 3. The higher priority context releases the ownership.
+	 * 4. Another lower priority context takes the ownership.
+	 * 5. Another context with the same priority as this context
 	 *    creates a request and starts waiting.
+	 *
+	 * Event #1 implies this context is EMERGENCY.
+	 * Event #2 implies the new context is PANIC.
+	 * Event #3 occurs when panic() has flushed the console.
+	 * Events #4 and #5 are not possible due to the other_cpu_in_panic()
+	 * check in nbcon_context_try_acquire_direct().
 	 */
 
 	return (cur->req_prio == expected_prio);
@@ -591,11 +609,29 @@ static bool nbcon_owner_matches(struct nbcon_state *cur, int expected_cpu,
 				int expected_prio)
 {
 	/*
-	 * Since consoles can only be acquired by higher priorities,
-	 * owning contexts are uniquely identified by @prio. However,
-	 * since contexts can unexpectedly lose ownership, it is
-	 * possible that later another owner appears with the same
-	 * priority. For this reason @cpu is also needed.
+	 * A similar function, nbcon_waiter_matches(), only deals with
+	 * EMERGENCY and PANIC priorities. However, this function must also
+	 * deal with the NORMAL priority, which requires additional checks
+	 * and constraints.
+	 *
+	 * For the case where preemption and interrupts are disabled, it is
+	 * enough to also verify that the owning CPU has not changed.
+	 *
+	 * For the case where preemption or interrupts are enabled, an
+	 * external synchronization method *must* be used. In particular,
+	 * the driver-specific locking mechanism used in device_lock()
+	 * (including disabling migration) should be used. It prevents
+	 * scenarios such as:
+	 *
+	 * 1. [Task A] owns a context with NBCON_PRIO_NORMAL on [CPU X] and
+	 *    is scheduled out.
+	 * 2. Another context takes over the lock with NBCON_PRIO_EMERGENCY
+	 *    and releases it.
+	 * 3. [Task B] acquires a context with NBCON_PRIO_NORMAL on [CPU X]
+	 *    and is scheduled out.
+	 * 4. [Task A] gets running on [CPU X] and sees that the console is
+	 *    still owned by a task on [CPU X] with NBON_PRIO_NORMAL. Thus
+	 *    [Task A] thinks it is the owner when it is not.
 	 */
 
 	if (cur->prio != expected_prio)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 04/19] printk: nbcon: Clarify rules of the owner/waiter matching
  2024-07-22 17:19 ` [PATCH printk v3 04/19] printk: nbcon: Clarify rules of the owner/waiter matching John Ogness
@ 2024-07-26 12:55   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-26 12:55 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:24, John Ogness wrote:
> The functions nbcon_owner_matches() and nbcon_waiter_matches()
> use a minimal set of data to determine if a context matches.
> The existing kerneldoc and comments were not clear enough and
> caused the printk folks to re-prove that the functions are
> indeed reliable in all cases.
> 
> Update and expand the explanations so that it is clear that the
> implementations are sufficient for all cases.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Thanks a lot for documenting this!

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (3 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 04/19] printk: nbcon: Clarify rules of the owner/waiter matching John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-26 13:14   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 06/19] printk: Flush console on unregister_console() John Ogness
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

A follow-up change adds pr_flush() to console unregistration.
However, with boot consoles unregistration can happen very
early if there are also regular consoles registering as well.
In this case the pr_flush() is not important because the
regular console manually flushes the boot consoles before
unregistering them.

Allow pr_flush() to fail if @system_state has not yet reached
SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
explosions that would otherwise occur.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 641c2a8b0a09..39db56a32c5e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3946,6 +3946,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 	u64 diff;
 	u64 seq;
 
+	/* Sorry, pr_flush() will not work this early. */
+	if (system_state < SYSTEM_SCHEDULING)
+		return false;
+
 	might_sleep();
 
 	seq = prb_next_reserve_seq(prb);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
  2024-07-22 17:19 ` [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING John Ogness
@ 2024-07-26 13:14   ` Petr Mladek
  2024-07-26 14:45     ` John Ogness
  0 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-07-26 13:14 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:25, John Ogness wrote:
> A follow-up change adds pr_flush() to console unregistration.
> However, with boot consoles unregistration can happen very
> early if there are also regular consoles registering as well.
> In this case the pr_flush() is not important because the
> regular console manually flushes the boot consoles before
> unregistering them.

It is not much clear what the "manual flush" means. Is it
the console_flush_all() in get_init_console_seq()?

I would write something like:

<proposal>
In this case the pr_flush() is not important because all consoles
are flushed when checking the initial console sequence number.
</proposal>

> Allow pr_flush() to fail if @system_state has not yet reached
> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
> explosions that would otherwise occur.

What do you mean with the explosion, please?
Does it add some warnings into the log?
Does it cause extra delays?

> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> ---
>  kernel/printk/printk.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 641c2a8b0a09..39db56a32c5e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3946,6 +3946,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  	u64 diff;
>  	u64 seq;
>  
> +	/* Sorry, pr_flush() will not work this early. */
> +	if (system_state < SYSTEM_SCHEDULING)
> +		return false;
> +
>  	might_sleep();
>  
>  	seq = prb_next_reserve_seq(prb);

The change seems acceptable. I just do not completely understand the
motivation.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
  2024-07-26 13:14   ` Petr Mladek
@ 2024-07-26 14:45     ` John Ogness
  2024-07-30  9:50       ` Petr Mladek
  0 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-26 14:45 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> On Mon 2024-07-22 19:25:25, John Ogness wrote:
>> A follow-up change adds pr_flush() to console unregistration.
>> However, with boot consoles unregistration can happen very
>> early if there are also regular consoles registering as well.
>> In this case the pr_flush() is not important because the
>> regular console manually flushes the boot consoles before
>> unregistering them.
>
> It is not much clear what the "manual flush" means. Is it
> the console_flush_all() in get_init_console_seq()?
>
> I would write something like:
>
> <proposal>
> In this case the pr_flush() is not important because all consoles
> are flushed when checking the initial console sequence number.
> </proposal>

Yes, clearer. Thanks.

>> Allow pr_flush() to fail if @system_state has not yet reached
>> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
>> explosions that would otherwise occur.
>
> What do you mean with the explosion, please?
> Does it add some warnings into the log?
> Does it cause extra delays?

I am certain that I could trigger a splat with might_sleep() using some
configured preemption mode. But now I cannot reproduce it. However, with
msleep() it is easy:

[    0.436739][    T0] printk: legacy console [ttyS0] enabled
[    0.439820][    T0] printk: legacy bootconsole [earlyser0] disabled
[    0.446822][    T0] BUG: scheduling while atomic: swapper/0/0/0x00000002
[    0.450491][    T0] 1 lock held by swapper/0/0:
[    0.457897][    T0]  #0: ffffffff82ae5f88 (console_mutex){+.+.}-{4:4}, at: console_list_lock+0x20/0x70
[    0.463141][    T0] Modules linked in:
[    0.465307][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc1+ #372
[    0.469394][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[    0.474402][    T0] Call Trace:
[    0.476246][    T0]  <TASK>
[    0.481473][    T0]  dump_stack_lvl+0x93/0xb0
[    0.483949][    T0]  dump_stack+0x10/0x20
[    0.486256][    T0]  __schedule_bug+0x68/0x90
[    0.488753][    T0]  __schedule+0xb9b/0xd80
[    0.491179][    T0]  ? lock_release+0xb5/0x270
[    0.493732][    T0]  schedule+0x43/0x170
[    0.495998][    T0]  schedule_timeout+0xc5/0x1e0
[    0.498634][    T0]  ? __pfx_process_timeout+0x10/0x10
[    0.501522][    T0]  ? msleep+0x13/0x50
[    0.503728][    T0]  msleep+0x3c/0x50
[    0.505847][    T0]  __pr_flush.constprop.0.isra.0+0x56/0x500
[    0.509050][    T0]  ? _printk+0x58/0x80
[    0.511332][    T0]  ? lock_is_held_type+0x9c/0x110
[    0.514106][    T0]  unregister_console_locked+0xe1/0x450
[    0.517144][    T0]  register_console+0x509/0x620
[    0.519827][    T0]  ? __pfx_univ8250_console_init+0x10/0x10
[    0.523042][    T0]  univ8250_console_init+0x24/0x40
[    0.525845][    T0]  console_init+0x43/0x210
[    0.528280][    T0]  start_kernel+0x493/0x980
[    0.530773][    T0]  x86_64_start_reservations+0x18/0x30
[    0.533755][    T0]  x86_64_start_kernel+0xae/0xc0
[    0.536473][    T0]  common_startup_64+0x12c/0x138
[    0.539210][    T0]  </TASK>

And then the kernel goes into an infinite loop complaining about:

1. releasing a pinned lock
2. unpinning an unpinned lock
3. bad: scheduling from the idle thread!
4. goto 1

John

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING
  2024-07-26 14:45     ` John Ogness
@ 2024-07-30  9:50       ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-30  9:50 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Fri 2024-07-26 16:51:47, John Ogness wrote:
> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> > On Mon 2024-07-22 19:25:25, John Ogness wrote:
> >> A follow-up change adds pr_flush() to console unregistration.
> >> However, with boot consoles unregistration can happen very
> >> early if there are also regular consoles registering as well.
> >> In this case the pr_flush() is not important because the
> >> regular console manually flushes the boot consoles before
> >> unregistering them.
> >
> > It is not much clear what the "manual flush" means. Is it
> > the console_flush_all() in get_init_console_seq()?
> >
> > I would write something like:
> >
> > <proposal>
> > In this case the pr_flush() is not important because all consoles
> > are flushed when checking the initial console sequence number.
> > </proposal>
> 
> Yes, clearer. Thanks.
> 
> >> Allow pr_flush() to fail if @system_state has not yet reached
> >> SYSTEM_SCHEDULING. This avoids might_sleep() and msleep()
> >> explosions that would otherwise occur.
> >
> > What do you mean with the explosion, please?
> > Does it add some warnings into the log?
> > Does it cause extra delays?
> 
> I am certain that I could trigger a splat with might_sleep() using some
> configured preemption mode. But now I cannot reproduce it. However, with
> msleep() it is easy:
> 
> [    0.436739][    T0] printk: legacy console [ttyS0] enabled
> [    0.439820][    T0] printk: legacy bootconsole [earlyser0] disabled
> [    0.446822][    T0] BUG: scheduling while atomic: swapper/0/0/0x00000002

It complains about sheduling while atomic.

> [    0.450491][    T0] 1 lock held by swapper/0/0:
> [    0.457897][    T0]  #0: ffffffff82ae5f88 (console_mutex){+.+.}-{4:4}, at: console_list_lock+0x20/0x70

But it is under a mutex so that scheduling should be possible.

It is weird.

> [    0.463141][    T0] Modules linked in:
> [    0.465307][    T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc1+ #372
> [    0.469394][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [    0.474402][    T0] Call Trace:
> [    0.476246][    T0]  <TASK>
> [    0.481473][    T0]  dump_stack_lvl+0x93/0xb0
> [    0.483949][    T0]  dump_stack+0x10/0x20
> [    0.486256][    T0]  __schedule_bug+0x68/0x90
> [    0.488753][    T0]  __schedule+0xb9b/0xd80
> [    0.491179][    T0]  ? lock_release+0xb5/0x270
> [    0.493732][    T0]  schedule+0x43/0x170
> [    0.495998][    T0]  schedule_timeout+0xc5/0x1e0
> [    0.498634][    T0]  ? __pfx_process_timeout+0x10/0x10
> [    0.501522][    T0]  ? msleep+0x13/0x50
> [    0.503728][    T0]  msleep+0x3c/0x50
> [    0.505847][    T0]  __pr_flush.constprop.0.isra.0+0x56/0x500
> [    0.509050][    T0]  ? _printk+0x58/0x80
> [    0.511332][    T0]  ? lock_is_held_type+0x9c/0x110
> [    0.514106][    T0]  unregister_console_locked+0xe1/0x450
> [    0.517144][    T0]  register_console+0x509/0x620
> [    0.519827][    T0]  ? __pfx_univ8250_console_init+0x10/0x10
> [    0.523042][    T0]  univ8250_console_init+0x24/0x40
> [    0.525845][    T0]  console_init+0x43/0x210
> [    0.528280][    T0]  start_kernel+0x493/0x980
> [    0.530773][    T0]  x86_64_start_reservations+0x18/0x30
> [    0.533755][    T0]  x86_64_start_kernel+0xae/0xc0
> [    0.536473][    T0]  common_startup_64+0x12c/0x138
> [    0.539210][    T0]  </TASK>
> 
> And then the kernel goes into an infinite loop complaining about:
> 
> 1. releasing a pinned lock
> 2. unpinning an unpinned lock
> 3. bad: scheduling from the idle thread!
> 4. goto 1

Hmm, I have reproduced it as well. I do not understand it much.
But yeah, this is early during the boot and some things does
not work as expected.

I do not see any better solution. I am fine with the patch.

Well, it might we worth adding the above backtrace to the commit
message so that people know what we have seen.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 06/19] printk: Flush console on unregister_console()
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (4 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-26 13:23   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 07/19] printk: Add helpers for flush type logic John Ogness
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Ensure consoles have flushed pending records before
unregistering. The console should print up to at least its
related "console disabled" record.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 39db56a32c5e..82440b1f0d1e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3727,11 +3727,16 @@ static int unregister_console_locked(struct console *console)
 	if (res > 0)
 		return 0;
 
+	if (!console_is_registered_locked(console))
+		res = -ENODEV;
+	else if (console_is_usable(console, console->flags))
+		__pr_flush(console, 1000, true);
+
 	/* Disable it unconditionally */
 	console_srcu_write_flags(console, console->flags & ~CON_ENABLED);
 
-	if (!console_is_registered_locked(console))
-		return -ENODEV;
+	if (res < 0)
+		return res;
 
 	/*
 	 * Use the driver synchronization to ensure that the hardware is not
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 06/19] printk: Flush console on unregister_console()
  2024-07-22 17:19 ` [PATCH printk v3 06/19] printk: Flush console on unregister_console() John Ogness
@ 2024-07-26 13:23   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-26 13:23 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:26, John Ogness wrote:
> Ensure consoles have flushed pending records before
> unregistering. The console should print up to at least its
> related "console disabled" record.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

The code looks hairy. But it is because of the unconditional CON_ENABLED
flag handling. I do not see any better solution.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 07/19] printk: Add helpers for flush type logic
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (5 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 06/19] printk: Flush console on unregister_console() John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-23  2:01   ` kernel test robot
                     ` (2 more replies)
  2024-07-22 17:19 ` [PATCH printk v3 08/19] printk: nbcon: Add context to usable() and emit() John Ogness
                   ` (11 subsequent siblings)
  18 siblings, 3 replies; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

There are many call sites where console flushing occur.
Depending on the system state and types of consoles, the flush
methods to use are different. A flush call site generally must
consider:

	@have_boot_console
	@have_nbcon_console
	@have_legacy_console
	@legacy_allow_panic_sync
	is_printk_preferred()

and take into account the current CPU state:

	NBCON_PRIO_NORMAL
	NBCON_PRIO_EMERGENCY
	NBCON_PRIO_PANIC

in order to decide if it should:

	flush nbcon directly via atomic_write() callback
	flush legacy directly via console_unlock
	flush legacy via offload to irq_work

All of these call sites use their own logic to make this
decision, which is complicated and error prone. Especially
later when two more flush methods will be introduced:

	flush nbcon via offload to kthread
	flush legacy via offload to kthread

Introduce a new internal struct console_flush_type that
specifies the flush method(s) that are available for a
particular call site to use.

Introduce helper functions to fill out console_flush_type to
be used for emergency and non-emergency call sites.

In many system states it is acceptable to flush legacy directly
via console_unlock or via offload to irq_work. For this reason
the non-emergency helper provides an argument @prefer_offload
for the caller to specify which method it is interested in
performing. (The helper functions will never allow both.)

Replace the logic of all flushing call sites to use the new
helpers. Note that this cleans up various corner cases where
is_printk_preferred() and @have_boot_console were not being
considerered before.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h | 100 +++++++++++++++++++++++++++++++--
 kernel/printk/nbcon.c    |  32 ++++++++---
 kernel/printk/printk.c   | 116 ++++++++++++++++++++-------------------
 3 files changed, 180 insertions(+), 68 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index d58f5cefbac3..ccdb81dc18f0 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -154,15 +154,105 @@ static inline bool console_is_usable(struct console *con, short flags) { return
 #endif /* CONFIG_PRINTK */
 
 extern bool have_boot_console;
+extern bool have_nbcon_console;
 extern bool have_legacy_console;
+extern bool legacy_allow_panic_sync;
+
+/**
+ * struct console_flush_type - Define how to flush the consoles
+ * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
+ * @legacy_direct:	Call the legacy loop in this context
+ * @legacy_offload:	Offload the legacy loop into IRQ
+ */
+struct console_flush_type {
+	bool	nbcon_atomic;
+	bool	legacy_direct;
+	bool	legacy_offload;
+};
 
 /*
- * Specifies if the console lock/unlock dance is needed for console
- * printing. If @have_boot_console is true, the nbcon consoles will
- * be printed serially along with the legacy consoles because nbcon
- * consoles cannot print simultaneously with boot consoles.
+ * Decide while console flushing methods are to be used. Used
+ * for all flushing except when flushing from emergency state.
+ *
+ * Set @prefer_offload to true if the context is only interested in
+ * offloading. Then offloading types will be set instead of direct,
+ * if appropriate.
  */
-#define printing_via_unlock (have_legacy_console || have_boot_console)
+static inline void printk_get_console_flush_type(struct console_flush_type *ft, bool prefer_offload)
+{
+	memset(ft, 0, sizeof(*ft));
+
+	switch (nbcon_get_default_prio()) {
+	case NBCON_PRIO_NORMAL:
+		if (have_legacy_console || have_boot_console) {
+			if (prefer_offload || is_printk_deferred())
+				ft->legacy_offload = true;
+			else
+				ft->legacy_direct = true;
+		}
+
+		if (have_nbcon_console && !have_boot_console)
+			ft->nbcon_atomic = true;
+		break;
+
+	case NBCON_PRIO_EMERGENCY:
+		/*
+		 * Skip. The consoles will be flushed when exiting emergency
+		 * state. See printk_get_emergency_console_flush_type().
+		 */
+		break;
+
+	case NBCON_PRIO_PANIC:
+		if (have_nbcon_console && !have_boot_console)
+			ft->nbcon_atomic = true;
+
+		/*
+		 * In panic, if nbcon atomic printing occurs, the legacy
+		 * consoles must remain silent until explicitly allowed.
+		 */
+		if (legacy_allow_panic_sync || !ft->nbcon_atomic)
+			ft->legacy_direct = true;
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
+
+/*
+ * Decide while console flushing methods are to be used from
+ * emergency state.
+ */
+static inline void printk_get_emergency_console_flush_type(struct console_flush_type *ft)
+{
+	memset(ft, 0, sizeof(*ft));
+
+	switch (nbcon_get_default_prio()) {
+	case NBCON_PRIO_EMERGENCY:
+		if (have_nbcon_console && !have_boot_console)
+			ft->nbcon_atomic = true;
+
+		if (have_legacy_console || have_boot_console) {
+			if (is_printk_deferred())
+				ft->legacy_offload = true;
+			else
+				ft->legacy_direct = true;
+		}
+		break;
+
+	case NBCON_PRIO_PANIC:
+		/*
+		 * Skip. Console flushing is handled by panic code.
+		 * See printk_get_console_flush_type().
+		 */
+		break;
+
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+}
 
 extern struct printk_buffers printk_shared_pbufs;
 
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index db1685a6d5cd..917022105b8b 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1235,6 +1235,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 					   bool allow_unsafe_takeover)
 {
+	struct console_flush_type ft;
 	unsigned long flags;
 	int err;
 
@@ -1267,7 +1268,9 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 	 * context must flush those remaining records because there is no
 	 * other context that will do it.
 	 */
-	if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
+	printk_get_console_flush_type(&ft, false);
+	if (ft.nbcon_atomic &&
+	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
 		stop_seq = prb_next_reserve_seq(prb);
 		goto again;
 	}
@@ -1362,6 +1365,7 @@ void nbcon_cpu_emergency_exit(void)
 {
 	unsigned int *cpu_emergency_nesting;
 	bool do_trigger_flush = false;
+	struct console_flush_type ft;
 
 	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
 
@@ -1374,19 +1378,25 @@ void nbcon_cpu_emergency_exit(void)
 	 * for the emergency messages is NBCON_PRIO_EMERGENCY.
 	 */
 	if (*cpu_emergency_nesting == 1) {
-		nbcon_atomic_flush_pending();
+		printk_get_emergency_console_flush_type(&ft);
+
+		if (ft.nbcon_atomic)
+			nbcon_atomic_flush_pending();
 
 		/*
 		 * Safely attempt to flush the legacy consoles in this
 		 * context. Otherwise an irq_work context is triggered
 		 * to handle it.
 		 */
-		do_trigger_flush = true;
-		if (printing_via_unlock && !is_printk_deferred()) {
+		if (ft.legacy_direct) {
 			if (console_trylock()) {
 				do_trigger_flush = false;
 				console_unlock();
+			} else {
+				do_trigger_flush = true;
 			}
+		} else {
+			do_trigger_flush = ft.legacy_offload;
 		}
 	}
 
@@ -1412,13 +1422,18 @@ void nbcon_cpu_emergency_exit(void)
  */
 void nbcon_cpu_emergency_flush(void)
 {
+	struct console_flush_type ft;
+
 	/* The explicit flush is needed only in the emergency context. */
 	if (*(nbcon_get_cpu_emergency_nesting()) == 0)
 		return;
 
-	nbcon_atomic_flush_pending();
+	printk_get_emergency_console_flush_type(&ft);
+
+	if (ft.nbcon_atomic)
+		nbcon_atomic_flush_pending();
 
-	if (printing_via_unlock && !is_printk_deferred()) {
+	if (ft.legacy_direct) {
 		if (console_trylock())
 			console_unlock();
 	}
@@ -1521,6 +1536,7 @@ EXPORT_SYMBOL_GPL(nbcon_device_try_acquire);
 void nbcon_device_release(struct console *con)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(con, nbcon_device_ctxt);
+	struct console_flush_type ft;
 	int cookie;
 
 	if (!nbcon_context_exit_unsafe(ctxt))
@@ -1533,8 +1549,10 @@ void nbcon_device_release(struct console *con)
 	 * was locked. The console_srcu_read_lock must be taken to ensure
 	 * the console is usable throughout flushing.
 	 */
+	printk_get_console_flush_type(&ft, false);
 	cookie = console_srcu_read_lock();
-	if (console_is_usable(con, console_srcu_read_flags(con)) &&
+	if (ft.nbcon_atomic &&
+	    console_is_usable(con, console_srcu_read_flags(con)) &&
 	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
 		__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb), false);
 	}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 82440b1f0d1e..3fbe27551f75 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -472,7 +472,7 @@ bool have_legacy_console;
  * synchronous printing of legacy consoles will not occur during panic until
  * the backtrace has been stored to the ringbuffer.
  */
-static bool have_nbcon_console;
+bool have_nbcon_console;
 
 /*
  * Specifies if a boot console is registered. If boot consoles are present,
@@ -482,6 +482,9 @@ static bool have_nbcon_console;
  */
 bool have_boot_console;
 
+/* See printk_legacy_allow_panic_sync() for details. */
+bool legacy_allow_panic_sync;
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
@@ -2321,8 +2324,6 @@ int vprintk_store(int facility, int level,
 	return ret;
 }
 
-static bool legacy_allow_panic_sync;
-
 /*
  * This acts as a one-way switch to allow legacy consoles to print from
  * the printk() caller context on a panic CPU. It also attempts to flush
@@ -2330,9 +2331,13 @@ static bool legacy_allow_panic_sync;
  */
 void printk_legacy_allow_panic_sync(void)
 {
+	struct console_flush_type ft;
+
 	legacy_allow_panic_sync = true;
 
-	if (printing_via_unlock && !in_nmi()) {
+	printk_get_console_flush_type(&ft, false);
+
+	if (ft.legacy_direct && !in_nmi()) {
 		if (console_trylock())
 			console_unlock();
 	}
@@ -2342,7 +2347,8 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const struct dev_printk_info *dev_info,
 			    const char *fmt, va_list args)
 {
-	bool do_trylock_unlock = printing_via_unlock;
+	struct console_flush_type ft;
+	bool defer_legacy = false;
 	int printed_len;
 
 	/* Suppress unimportant messages after panic happens */
@@ -2360,41 +2366,19 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
 		/* If called from the scheduler, we can not call up(). */
-		do_trylock_unlock = false;
+		defer_legacy = true;
 	}
 
 	printk_delay(level);
 
 	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
 
-	if (have_nbcon_console && !have_boot_console) {
-		bool is_panic_context = this_cpu_in_panic();
+	printk_get_console_flush_type(&ft, false);
 
-		/*
-		 * In panic, the legacy consoles are not allowed to print from
-		 * the printk calling context unless explicitly allowed. This
-		 * gives the safe nbcon consoles a chance to print out all the
-		 * panic messages first. This restriction only applies if
-		 * there are nbcon consoles registered.
-		 */
-		if (is_panic_context)
-			do_trylock_unlock &= legacy_allow_panic_sync;
+	if (ft.nbcon_atomic)
+		nbcon_atomic_flush_pending();
 
-		/*
-		 * There are situations where nbcon atomic printing should
-		 * happen in the printk() caller context:
-		 *
-		 * - When this CPU is in panic.
-		 *
-		 * Note that if boot consoles are registered, the console
-		 * lock/unlock dance must be relied upon instead because nbcon
-		 * consoles cannot print simultaneously with boot consoles.
-		 */
-		if (is_panic_context)
-			nbcon_atomic_flush_pending();
-	}
-
-	if (do_trylock_unlock) {
+	if (!defer_legacy && ft.legacy_direct) {
 		/*
 		 * The caller may be holding system-critical or
 		 * timing-sensitive locks. Disable preemption during
@@ -2409,22 +2393,17 @@ asmlinkage int vprintk_emit(int facility, int level,
 		 * semaphore. The release will print out buffers. With the
 		 * spinning variant, this context tries to take over the
 		 * printing from another printing context.
-		 *
-		 * Skip it in EMERGENCY priority. The console will be
-		 * explicitly flushed when exiting the emergency section.
 		 */
-		if (nbcon_get_default_prio() != NBCON_PRIO_EMERGENCY) {
-			if (console_trylock_spinning())
-				console_unlock();
-		}
+		if (console_trylock_spinning())
+			console_unlock();
 
 		preempt_enable();
 	}
 
-	if (do_trylock_unlock)
-		wake_up_klogd();
-	else
+	if ((defer_legacy && ft.legacy_direct) || ft.legacy_offload)
 		defer_console_output();
+	else
+		wake_up_klogd();
 
 	return printed_len;
 }
@@ -2728,10 +2707,15 @@ void resume_console(void)
  */
 static int console_cpu_notify(unsigned int cpu)
 {
-	if (!cpuhp_tasks_frozen && printing_via_unlock) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
+	struct console_flush_type ft;
+
+	if (!cpuhp_tasks_frozen) {
+		printk_get_console_flush_type(&ft, false);
+
+		if (ft.legacy_direct) {
+			if (console_trylock())
+				console_unlock();
+		}
 	}
 	return 0;
 }
@@ -3256,6 +3240,7 @@ static void __console_rewind_all(void)
  */
 void console_flush_on_panic(enum con_flush_mode mode)
 {
+	struct console_flush_type ft;
 	bool handover;
 	u64 next_seq;
 
@@ -3279,9 +3264,12 @@ void console_flush_on_panic(enum con_flush_mode mode)
 	if (mode == CONSOLE_REPLAY_ALL)
 		__console_rewind_all();
 
-	nbcon_atomic_flush_pending();
+	printk_get_console_flush_type(&ft, false);
+
+	if (ft.nbcon_atomic)
+		nbcon_atomic_flush_pending();
 
-	if (printing_via_unlock)
+	if (ft.legacy_direct)
 		console_flush_all(false, &next_seq, &handover);
 }
 
@@ -3943,6 +3931,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 {
 	unsigned long timeout_jiffies = msecs_to_jiffies(timeout_ms);
 	unsigned long remaining_jiffies = timeout_jiffies;
+	struct console_flush_type ft;
 	struct console *c;
 	u64 last_diff = 0;
 	u64 printk_seq;
@@ -3959,8 +3948,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 
 	seq = prb_next_reserve_seq(prb);
 
+	printk_get_console_flush_type(&ft, false);
+
 	/* Flush the consoles so that records up to @seq are printed. */
-	if (printing_via_unlock) {
+	if (ft.legacy_direct) {
 		console_lock();
 		console_unlock();
 	}
@@ -3968,18 +3959,19 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 	for (;;) {
 		unsigned long begin_jiffies;
 		unsigned long slept_jiffies;
-		bool use_console_lock = printing_via_unlock;
+
+		printk_get_console_flush_type(&ft, false);
 
 		/*
-		 * Ensure the compiler does not optimize @use_console_lock to
-		 * be @printing_via_unlock since the latter can change at any
+		 * Ensure the compiler is re-reading the various state
+		 * variables for setting @ft since they can change at any
 		 * time.
 		 */
 		barrier();
 
 		diff = 0;
 
-		if (use_console_lock) {
+		if (ft.legacy_direct) {
 			/*
 			 * Hold the console_lock to guarantee safe access to
 			 * console->seq. Releasing console_lock flushes more
@@ -4007,7 +3999,16 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 			if (flags & CON_NBCON) {
 				printk_seq = nbcon_seq_read(c);
 			} else {
-				WARN_ON_ONCE(!use_console_lock);
+				/*
+				 * Warn if this is a usable legacy console but
+				 * the console lock was not taken because it
+				 * is not allowed to do the direct printing
+				 * when the console lock is released. Without
+				 * the console lock, reading console->seq is
+				 * unsafe.
+				 */
+				WARN_ON_ONCE(!ft.legacy_direct);
+
 				printk_seq = c->seq;
 			}
 
@@ -4019,7 +4020,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 		if (diff != last_diff && reset_on_progress)
 			remaining_jiffies = timeout_jiffies;
 
-		if (use_console_lock)
+		if (ft.legacy_direct)
 			console_unlock();
 
 		/* Note: @diff is 0 if there are no usable consoles. */
@@ -4086,6 +4087,8 @@ static DEFINE_PER_CPU(struct irq_work, wake_up_klogd_work) =
 
 static void __wake_up_klogd(int val)
 {
+	struct console_flush_type ft;
+
 	if (!printk_percpu_data_ready())
 		return;
 
@@ -4110,7 +4113,8 @@ static void __wake_up_klogd(int val)
 	 * registered legacy console when writing the message about it
 	 * being enabled.
 	 */
-	if (!printing_via_unlock)
+	printk_get_console_flush_type(&ft, true);
+	if (!ft.legacy_offload)
 		val &= ~PRINTK_PENDING_OUTPUT;
 
 	if (val) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 07/19] printk: Add helpers for flush type logic
  2024-07-22 17:19 ` [PATCH printk v3 07/19] printk: Add helpers for flush type logic John Ogness
@ 2024-07-23  2:01   ` kernel test robot
  2024-07-23  8:39     ` John Ogness
  2024-07-23  3:29   ` kernel test robot
  2024-07-26 15:51   ` Petr Mladek
  2 siblings, 1 reply; 57+ messages in thread
From: kernel test robot @ 2024-07-23  2:01 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on b18703ea7157f62f02eb0ceb11f6fa0138e90adc]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/printk-nbcon-Clarify-nbcon_get_default_prio-context/20240723-015154
base:   b18703ea7157f62f02eb0ceb11f6fa0138e90adc
patch link:    https://lore.kernel.org/r/20240722171939.3349410-8-john.ogness%40linutronix.de
patch subject: [PATCH printk v3 07/19] printk: Add helpers for flush type logic
config: x86_64-buildonly-randconfig-002-20240723 (https://download.01.org/0day-ci/archive/20240723/202407230936.FqviIH1q-lkp@intel.com/config)
compiler: gcc-10 (Ubuntu 10.5.0-1ubuntu1) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240723/202407230936.FqviIH1q-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407230936.FqviIH1q-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/printk/printk.c:61:
   kernel/printk/internal.h: In function 'printk_get_console_flush_type':
>> kernel/printk/internal.h:188:26: error: implicit declaration of function 'is_printk_deferred'; did you mean '_printk_deferred'? [-Werror=implicit-function-declaration]
     188 |    if (prefer_offload || is_printk_deferred())
         |                          ^~~~~~~~~~~~~~~~~~
         |                          _printk_deferred
   cc1: some warnings being treated as errors


vim +188 kernel/printk/internal.h

   172	
   173	/*
   174	 * Decide while console flushing methods are to be used. Used
   175	 * for all flushing except when flushing from emergency state.
   176	 *
   177	 * Set @prefer_offload to true if the context is only interested in
   178	 * offloading. Then offloading types will be set instead of direct,
   179	 * if appropriate.
   180	 */
   181	static inline void printk_get_console_flush_type(struct console_flush_type *ft, bool prefer_offload)
   182	{
   183		memset(ft, 0, sizeof(*ft));
   184	
   185		switch (nbcon_get_default_prio()) {
   186		case NBCON_PRIO_NORMAL:
   187			if (have_legacy_console || have_boot_console) {
 > 188				if (prefer_offload || is_printk_deferred())
   189					ft->legacy_offload = true;
   190				else
   191					ft->legacy_direct = true;
   192			}
   193	
   194			if (have_nbcon_console && !have_boot_console)
   195				ft->nbcon_atomic = true;
   196			break;
   197	
   198		case NBCON_PRIO_EMERGENCY:
   199			/*
   200			 * Skip. The consoles will be flushed when exiting emergency
   201			 * state. See printk_get_emergency_console_flush_type().
   202			 */
   203			break;
   204	
   205		case NBCON_PRIO_PANIC:
   206			if (have_nbcon_console && !have_boot_console)
   207				ft->nbcon_atomic = true;
   208	
   209			/*
   210			 * In panic, if nbcon atomic printing occurs, the legacy
   211			 * consoles must remain silent until explicitly allowed.
   212			 */
   213			if (legacy_allow_panic_sync || !ft->nbcon_atomic)
   214				ft->legacy_direct = true;
   215			break;
   216	
   217		default:
   218			WARN_ON_ONCE(1);
   219			break;
   220		}
   221	}
   222	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 07/19] printk: Add helpers for flush type logic
  2024-07-23  2:01   ` kernel test robot
@ 2024-07-23  8:39     ` John Ogness
  0 siblings, 0 replies; 57+ messages in thread
From: John Ogness @ 2024-07-23  8:39 UTC (permalink / raw)
  To: kernel test robot, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

On 2024-07-23, kernel test robot <lkp@intel.com> wrote:
>    In file included from kernel/printk/printk.c:61:
>    kernel/printk/internal.h: In function 'printk_get_console_flush_type':
>>> kernel/printk/internal.h:188:26: error: implicit declaration of function 'is_printk_deferred'; did you mean '_printk_deferred'? [-Werror=implicit-function-declaration]
>      188 |    if (prefer_offload || is_printk_deferred())
>          |                          ^~~~~~~~~~~~~~~~~~
>          |                          _printk_deferred

Thanks lkp.

I will wait for the feedback to this series before addressing this. The
flush type macros may not land in mainline in this form anyway.

John Ogness

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 07/19] printk: Add helpers for flush type logic
  2024-07-22 17:19 ` [PATCH printk v3 07/19] printk: Add helpers for flush type logic John Ogness
  2024-07-23  2:01   ` kernel test robot
@ 2024-07-23  3:29   ` kernel test robot
  2024-07-26 15:51   ` Petr Mladek
  2 siblings, 0 replies; 57+ messages in thread
From: kernel test robot @ 2024-07-23  3:29 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: llvm, oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on b18703ea7157f62f02eb0ceb11f6fa0138e90adc]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/printk-nbcon-Clarify-nbcon_get_default_prio-context/20240723-015154
base:   b18703ea7157f62f02eb0ceb11f6fa0138e90adc
patch link:    https://lore.kernel.org/r/20240722171939.3349410-8-john.ogness%40linutronix.de
patch subject: [PATCH printk v3 07/19] printk: Add helpers for flush type logic
config: arm-randconfig-004-20240723 (https://download.01.org/0day-ci/archive/20240723/202407231135.jtSVw3hi-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project ad154281230d83ee551e12d5be48bb956ef47ed3)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240723/202407231135.jtSVw3hi-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407231135.jtSVw3hi-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/printk/printk.c:23:
   In file included from include/linux/mm.h:2253:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from kernel/printk/printk.c:61:
>> kernel/printk/internal.h:188:26: error: call to undeclared function 'is_printk_deferred'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     188 |                         if (prefer_offload || is_printk_deferred())
         |                                               ^
   kernel/printk/internal.h:188:26: note: did you mean '_printk_deferred'?
   include/linux/printk.h:218:5: note: '_printk_deferred' declared here
     218 | int _printk_deferred(const char *s, ...)
         |     ^
   In file included from kernel/printk/printk.c:61:
   kernel/printk/internal.h:237:8: error: call to undeclared function 'is_printk_deferred'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     237 |                         if (is_printk_deferred())
         |                             ^
   1 warning and 2 errors generated.


vim +/is_printk_deferred +188 kernel/printk/internal.h

   172	
   173	/*
   174	 * Decide while console flushing methods are to be used. Used
   175	 * for all flushing except when flushing from emergency state.
   176	 *
   177	 * Set @prefer_offload to true if the context is only interested in
   178	 * offloading. Then offloading types will be set instead of direct,
   179	 * if appropriate.
   180	 */
   181	static inline void printk_get_console_flush_type(struct console_flush_type *ft, bool prefer_offload)
   182	{
   183		memset(ft, 0, sizeof(*ft));
   184	
   185		switch (nbcon_get_default_prio()) {
   186		case NBCON_PRIO_NORMAL:
   187			if (have_legacy_console || have_boot_console) {
 > 188				if (prefer_offload || is_printk_deferred())
   189					ft->legacy_offload = true;
   190				else
   191					ft->legacy_direct = true;
   192			}
   193	
   194			if (have_nbcon_console && !have_boot_console)
   195				ft->nbcon_atomic = true;
   196			break;
   197	
   198		case NBCON_PRIO_EMERGENCY:
   199			/*
   200			 * Skip. The consoles will be flushed when exiting emergency
   201			 * state. See printk_get_emergency_console_flush_type().
   202			 */
   203			break;
   204	
   205		case NBCON_PRIO_PANIC:
   206			if (have_nbcon_console && !have_boot_console)
   207				ft->nbcon_atomic = true;
   208	
   209			/*
   210			 * In panic, if nbcon atomic printing occurs, the legacy
   211			 * consoles must remain silent until explicitly allowed.
   212			 */
   213			if (legacy_allow_panic_sync || !ft->nbcon_atomic)
   214				ft->legacy_direct = true;
   215			break;
   216	
   217		default:
   218			WARN_ON_ONCE(1);
   219			break;
   220		}
   221	}
   222	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 07/19] printk: Add helpers for flush type logic
  2024-07-22 17:19 ` [PATCH printk v3 07/19] printk: Add helpers for flush type logic John Ogness
  2024-07-23  2:01   ` kernel test robot
  2024-07-23  3:29   ` kernel test robot
@ 2024-07-26 15:51   ` Petr Mladek
  2 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-26 15:51 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:27, John Ogness wrote:
> There are many call sites where console flushing occur.
> Depending on the system state and types of consoles, the flush
> methods to use are different. A flush call site generally must
> consider:
> 
> 	@have_boot_console
> 	@have_nbcon_console
> 	@have_legacy_console
> 	@legacy_allow_panic_sync
> 	is_printk_preferred()
> 
> and take into account the current CPU state:
> 
> 	NBCON_PRIO_NORMAL
> 	NBCON_PRIO_EMERGENCY
> 	NBCON_PRIO_PANIC
> 
> in order to decide if it should:
> 
> 	flush nbcon directly via atomic_write() callback
> 	flush legacy directly via console_unlock
> 	flush legacy via offload to irq_work
> 
> All of these call sites use their own logic to make this
> decision, which is complicated and error prone. Especially
> later when two more flush methods will be introduced:
> 
> 	flush nbcon via offload to kthread
> 	flush legacy via offload to kthread
> 
> Introduce a new internal struct console_flush_type that
> specifies the flush method(s) that are available for a
> particular call site to use.
> 
> Introduce helper functions to fill out console_flush_type to
> be used for emergency and non-emergency call sites.
> 
> In many system states it is acceptable to flush legacy directly
> via console_unlock or via offload to irq_work. For this reason
> the non-emergency helper provides an argument @prefer_offload
> for the caller to specify which method it is interested in
> performing. (The helper functions will never allow both.)
> 
> Replace the logic of all flushing call sites to use the new
> helpers. Note that this cleans up various corner cases where
> is_printk_preferred() and @have_boot_console were not being
> considerered before.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2330,9 +2331,13 @@ static bool legacy_allow_panic_sync;
>   */
>  void printk_legacy_allow_panic_sync(void)
>  {
> +	struct console_flush_type ft;
> +
>  	legacy_allow_panic_sync = true;
>  
> -	if (printing_via_unlock && !in_nmi()) {
> +	printk_get_console_flush_type(&ft, false);
> +
> +	if (ft.legacy_direct && !in_nmi()) {

in_nmi() check is not needed any longer. It is already done in
is_printk_deferred() in printk_get_console_flush_type().

>  		if (console_trylock())
>  			console_unlock();
>  	}
> @@ -2342,7 +2347,8 @@ asmlinkage int vprintk_emit(int facility, int level,
>  			    const struct dev_printk_info *dev_info,
>  			    const char *fmt, va_list args)
>  {
> -	bool do_trylock_unlock = printing_via_unlock;
> +	struct console_flush_type ft;
> +	bool defer_legacy = false;
>  	int printed_len;
>  
>  	/* Suppress unimportant messages after panic happens */
> @@ -2360,41 +2366,19 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
>  		/* If called from the scheduler, we can not call up(). */
> -		do_trylock_unlock = false;
> +		defer_legacy = true;
>  	}
>  
>  	printk_delay(level);
>  
>  	printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>  
> -	if (have_nbcon_console && !have_boot_console) {
> -		bool is_panic_context = this_cpu_in_panic();
> +	printk_get_console_flush_type(&ft, false);

We could pass "defer_legacy" here. It won't be needed to check it
later then.

> -		/*
> -		 * In panic, the legacy consoles are not allowed to print from
> -		 * the printk calling context unless explicitly allowed. This
> -		 * gives the safe nbcon consoles a chance to print out all the
> -		 * panic messages first. This restriction only applies if
> -		 * there are nbcon consoles registered.
> -		 */
> -		if (is_panic_context)
> -			do_trylock_unlock &= legacy_allow_panic_sync;
> +	if (ft.nbcon_atomic)
> +		nbcon_atomic_flush_pending();
>  
> -		/*
> -		 * There are situations where nbcon atomic printing should
> -		 * happen in the printk() caller context:
> -		 *
> -		 * - When this CPU is in panic.
> -		 *
> -		 * Note that if boot consoles are registered, the console
> -		 * lock/unlock dance must be relied upon instead because nbcon
> -		 * consoles cannot print simultaneously with boot consoles.
> -		 */
> -		if (is_panic_context)
> -			nbcon_atomic_flush_pending();

Just for record. If I get it correctly than this actually fixes a bug.
The original code called nbcon_atomic_flush_pending() only in panic().
The nbcon consoles were not flushed when there were only nbcon consoles
(printing_via_unlock == false).

I think that we did not notice it because it probably got fixed by
later patches adding the printk kthreads.

> -	}
> -
> -	if (do_trylock_unlock) {
> +	if (!defer_legacy && ft.legacy_direct) {

@defer_legacy should not be needed if we passed it to
printk_get_console_flush_type().

>  		/*
>  		 * The caller may be holding system-critical or
>  		 * timing-sensitive locks. Disable preemption during
> @@ -2409,22 +2393,17 @@ asmlinkage int vprintk_emit(int facility, int level,
>  		 * semaphore. The release will print out buffers. With the
>  		 * spinning variant, this context tries to take over the
>  		 * printing from another printing context.
> -		 *
> -		 * Skip it in EMERGENCY priority. The console will be
> -		 * explicitly flushed when exiting the emergency section.
>  		 */
> -		if (nbcon_get_default_prio() != NBCON_PRIO_EMERGENCY) {
> -			if (console_trylock_spinning())
> -				console_unlock();
> -		}
> +		if (console_trylock_spinning())
> +			console_unlock();
>  
>  		preempt_enable();
>  	}
>  
> -	if (do_trylock_unlock)
> -		wake_up_klogd();
> -	else
> +	if ((defer_legacy && ft.legacy_direct) || ft.legacy_offload)

ditto.

>  		defer_console_output();
> +	else
> +		wake_up_klogd();
>  
>  	return printed_len;
>  }
> @@ -2728,10 +2707,15 @@ void resume_console(void)
>   */
>  static int console_cpu_notify(unsigned int cpu)
>  {
> -	if (!cpuhp_tasks_frozen && printing_via_unlock) {
> -		/* If trylock fails, someone else is doing the printing */
> -		if (console_trylock())
> -			console_unlock();
> +	struct console_flush_type ft;
> +
> +	if (!cpuhp_tasks_frozen) {
> +		printk_get_console_flush_type(&ft, false);
> +
> +		if (ft.legacy_direct) {
> +			if (console_trylock())
> +				console_unlock();
> +		}

One might be curious why we do not flush nbcon consoles here.
I guess that it will be less important after introducing
the kthreads.

Could it be called before the kthreads are started?

Anyway, this looks like a common pattern. Maybe, we could hide
it into some helper and be in the safe side:

/* Try to flush consoles directly when needed. */
void try_console_flush()
{
	struct console_flush_type ft;

	printk_get_console_flush_type(&ft, false);

	if (ft.nbcon_atomic)
		nbcon_atomic_flush_pending();

	if (ft.legacy_direct)
		console_flush_all(false, &next_seq, &handover);
}

>  	}
>  	return 0;
>  }

Otherwise, it looks good.

Heh, I wondered several times if it was worth it. The struct
console_flush_type and printk_get_console_flush_type()
sometimes looked like an overkill. But I see many advantages:

  + As the commit message says, the decision how to flush the messages
    depend on many variables. And it is nice to have it in one place.

  + The logic will get even more complicated after adding the
    kthreads.

  + printk_get_console_flush_type() allows to change the behavior
    in many situations in one place.

  + printk_get_console_flush_type() allows to easily find all
    locations where we decide how to flush the messages. It helps
    to check affected code paths.

So, I think that it is worth it in the end.

Note that I did not check the emergency code paths because
they are going to be reworked as discussed in the printk pull
request for 6.11, see
https://lore.kernel.org/r/ZqJKbcLgTeYRkDd6@pathway.suse.cz

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 08/19] printk: nbcon: Add context to usable() and emit()
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (6 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 07/19] printk: Add helpers for flush type logic John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-30 12:30   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads John Ogness
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

The nbcon consoles will have two callbacks to be used for
different contexts. In order to determine if an nbcon console
is usable, console_is_usable() must know if it is a context
that will need to use the optional write_atomic() callback.
Also, nbcon_emit_next_record() must know which callback it
needs to call.

Add an extra parameter @use_atomic to console_is_usable() and
nbcon_emit_next_record() to specify this.

Since so far only the write_atomic() callback exists,
@use_atomic is set to true for all call sites.

For legacy consoles, @use_atomic is not used.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h  |  2 +-
 kernel/printk/internal.h |  8 +++++---
 kernel/printk/nbcon.c    | 25 +++++++++++++++++++------
 kernel/printk/printk.c   |  6 +++---
 4 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 38ef6e64da19..5e9285e886f5 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -349,7 +349,7 @@ struct console {
 	/**
 	 * @write_atomic:
 	 *
-	 * NBCON callback to write out text in any context.
+	 * NBCON callback to write out text in any context. (Optional)
 	 *
 	 * This callback is called with the console already acquired. However,
 	 * a higher priority context is allowed to take it over by default.
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index ccdb81dc18f0..8211623aee9b 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -98,7 +98,7 @@ bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
  * which can also play a role in deciding if @con can be used to print
  * records.
  */
-static inline bool console_is_usable(struct console *con, short flags)
+static inline bool console_is_usable(struct console *con, short flags, bool use_atomic)
 {
 	if (!(flags & CON_ENABLED))
 		return false;
@@ -107,7 +107,8 @@ static inline bool console_is_usable(struct console *con, short flags)
 		return false;
 
 	if (flags & CON_NBCON) {
-		if (!con->write_atomic)
+		/* The write_atomic() callback is optional. */
+		if (use_atomic && !con->write_atomic)
 			return false;
 	} else {
 		if (!con->write)
@@ -149,7 +150,8 @@ static inline void nbcon_atomic_flush_pending(void) { }
 static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
 						 int cookie) { return false; }
 
-static inline bool console_is_usable(struct console *con, short flags) { return false; }
+static inline bool console_is_usable(struct console *con, short flags,
+				     bool use_atomic) { return false; }
 
 #endif /* CONFIG_PRINTK */
 
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 917022105b8b..ae9cd6d0b740 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -914,6 +914,7 @@ EXPORT_SYMBOL_GPL(nbcon_reacquire_nobuf);
 /**
  * nbcon_emit_next_record - Emit a record in the acquired context
  * @wctxt:	The write context that will be handed to the write function
+ * @use_atomic:	True if the write_atomic() callback is to be used
  *
  * Return:	True if this context still owns the console. False if
  *		ownership was handed over or taken.
@@ -927,7 +928,7 @@ EXPORT_SYMBOL_GPL(nbcon_reacquire_nobuf);
  * When true is returned, @wctxt->ctxt.backlog indicates whether there are
  * still records pending in the ringbuffer,
  */
-static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
+static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_atomic)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 	struct console *con = ctxt->console;
@@ -938,6 +939,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 	unsigned long con_dropped;
 	unsigned long dropped;
 
+	/*
+	 * This function should never be called for consoles that have not
+	 * implemented the necessary callback for writing: i.e. legacy
+	 * consoles and, when atomic, nbcon consoles with no write_atomic().
+	 * Handle it as if ownership was lost and try to continue.
+	 */
+	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
+			 !(console_srcu_read_flags(con) & CON_NBCON))) {
+		nbcon_context_release(ctxt);
+		return false;
+	}
+
 	/*
 	 * The printk buffers are filled within an unsafe section. This
 	 * prevents NBCON_PRIO_NORMAL and NBCON_PRIO_EMERGENCY from
@@ -972,7 +985,7 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 	/* Initialize the write context for driver callbacks. */
 	nbcon_write_context_set_buf(wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len);
 
-	if (con->write_atomic) {
+	if (use_atomic) {
 		con->write_atomic(con, wctxt);
 	} else {
 		/*
@@ -1104,7 +1117,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	 * The higher priority printing context takes over responsibility
 	 * to print the pending records.
 	 */
-	if (!nbcon_emit_next_record(wctxt))
+	if (!nbcon_emit_next_record(wctxt, true))
 		return false;
 
 	nbcon_context_release(ctxt);
@@ -1205,7 +1218,7 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 		 * handed over or taken over. In both cases the context is no
 		 * longer valid.
 		 */
-		if (!nbcon_emit_next_record(&wctxt))
+		if (!nbcon_emit_next_record(&wctxt, true))
 			return -EAGAIN;
 
 		if (!ctxt->backlog) {
@@ -1294,7 +1307,7 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
 		if (!(flags & CON_NBCON))
 			continue;
 
-		if (!console_is_usable(con, flags))
+		if (!console_is_usable(con, flags, true))
 			continue;
 
 		if (nbcon_seq_read(con) >= stop_seq)
@@ -1552,7 +1565,7 @@ void nbcon_device_release(struct console *con)
 	printk_get_console_flush_type(&ft, false);
 	cookie = console_srcu_read_lock();
 	if (ft.nbcon_atomic &&
-	    console_is_usable(con, console_srcu_read_flags(con)) &&
+	    console_is_usable(con, console_srcu_read_flags(con), true) &&
 	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
 		__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb), false);
 	}
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3fbe27551f75..e5e575384b3c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3018,7 +3018,7 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 			u64 printk_seq;
 			bool progress;
 
-			if (!console_is_usable(con, flags))
+			if (!console_is_usable(con, flags, true))
 				continue;
 			any_usable = true;
 
@@ -3717,7 +3717,7 @@ static int unregister_console_locked(struct console *console)
 
 	if (!console_is_registered_locked(console))
 		res = -ENODEV;
-	else if (console_is_usable(console, console->flags))
+	else if (console_is_usable(console, console->flags, true))
 		__pr_flush(console, 1000, true);
 
 	/* Disable it unconditionally */
@@ -3993,7 +3993,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 			 * that they make forward progress, so only increment
 			 * @diff for usable consoles.
 			 */
-			if (!console_is_usable(c, flags))
+			if (!console_is_usable(c, flags, true))
 				continue;
 
 			if (flags & CON_NBCON) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 08/19] printk: nbcon: Add context to usable() and emit()
  2024-07-22 17:19 ` [PATCH printk v3 08/19] printk: nbcon: Add context to usable() and emit() John Ogness
@ 2024-07-30 12:30   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-30 12:30 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On Mon 2024-07-22 19:25:28, John Ogness wrote:
> The nbcon consoles will have two callbacks to be used for
> different contexts. In order to determine if an nbcon console
> is usable, console_is_usable() must know if it is a context
> that will need to use the optional write_atomic() callback.
> Also, nbcon_emit_next_record() must know which callback it
> needs to call.
> 
> Add an extra parameter @use_atomic to console_is_usable() and
> nbcon_emit_next_record() to specify this.
> 
> Since so far only the write_atomic() callback exists,
> @use_atomic is set to true for all call sites.
> 
> For legacy consoles, @use_atomic is not used.
> 
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -938,6 +939,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	unsigned long con_dropped;
>  	unsigned long dropped;
>  
> +	/*
> +	 * This function should never be called for consoles that have not
> +	 * implemented the necessary callback for writing: i.e. legacy
> +	 * consoles and, when atomic, nbcon consoles with no write_atomic().
> +	 * Handle it as if ownership was lost and try to continue.
> +	 */
> +	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
> +			 !(console_srcu_read_flags(con) & CON_NBCON))) {
> +		nbcon_context_release(ctxt);
> +		return false;
> +	}
> +
>  	/*
>  	 * The printk buffers are filled within an unsafe section. This
>  	 * prevents NBCON_PRIO_NORMAL and NBCON_PRIO_EMERGENCY from
> @@ -972,7 +985,7 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	/* Initialize the write context for driver callbacks. */
>  	nbcon_write_context_set_buf(wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len);
>  
> -	if (con->write_atomic) {
> +	if (use_atomic) {
>  		con->write_atomic(con, wctxt);
>  	} else {
>  		/*

This "else" code path duplicates the WARN_ON_ONCE() and nbcon_context_release(ctxt).
It could/should be removed.

BTW: There is the opposite bug in the next patch which adds con->write_thread().
     It removes this duplicate "else" path but it does not extend the initial check.

     I am sorry. These are mistakes from the refactoring which I have asked you
     to do.


Otherwise, this patch good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (7 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 08/19] printk: nbcon: Add context to usable() and emit() John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-30 14:44   ` John Ogness
  2024-07-30 15:16   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 10/19] printk: nbcon: Use thread callback if in task context for legacy John Ogness
                   ` (9 subsequent siblings)
  18 siblings, 2 replies; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

From: Thomas Gleixner <tglx@linutronix.de>

Provide the main implementation for running a printer kthread
per nbcon console that is takeover/handover aware. This
includes:

- new mandatory write_thread() callback
- kthread creation
- kthread main printing loop
- kthread wakeup mechanism
- kthread shutdown

kthread creation is a bit tricky because consoles may register
before kthreads can be created. In such cases, registration
will succeed, even though no kthread exists. Once kthreads can
be created, an early_initcall will set @printk_kthreads_ready.
If there are no registered boot consoles, the early_initcall
creates the kthreads for all registered nbcon consoles. If
kthread creation fails, the related console is unregistered.

If there are registered boot consoles when
@printk_kthreads_ready is set, no kthreads are created until
the final boot console unregisters.

Once kthread creation finally occurs, @printk_kthreads_running
is set so that the system knows kthreads are available for all
registered nbcon consoles.

If @printk_kthreads_running is already set when the console
is registering, the kthread is created during registration. If
kthread creation fails, the registration will fail.

Until @printk_kthreads_running is set, console printing occurs
directly via the console_lock.

kthread shutdown on system shutdown/reboot is necessary to
ensure the printer kthreads finish their printing so that the
system can cleanly transition back to direct printing via the
console_lock in order to reliably push out the final
shutdown/reboot messages. @printk_kthreads_running is cleared
before shutting down the individual kthreads.

The kthread uses a new mandatory write_thread() callback that
is called with both device_lock() and the console context
acquired.

The console ownership handling is necessary for synchronization
against write_atomic() which is synchronized only via the
console context ownership.

The device_lock() serializes acquiring the console context with
NBCON_PRIO_NORMAL. It is needed in case the device_lock() does
not disable preemption. It prevents the following race:

CPU0				CPU1

 [ task A ]

 nbcon_context_try_acquire()
   # success with NORMAL prio
   # .unsafe == false;  // safe for takeover

 [ schedule: task A -> B ]

				WARN_ON()
				  nbcon_atomic_flush_pending()
				    nbcon_context_try_acquire()
				      # success with EMERGENCY prio

				      # flushing
				      nbcon_context_release()

				      # HERE: con->nbcon_state is free
				      #       to take by anyone !!!

 nbcon_context_try_acquire()
   # success with NORMAL prio [ task B ]

 [ schedule: task B -> A ]

 nbcon_enter_unsafe()
   nbcon_context_can_proceed()

BUG: nbcon_context_can_proceed() returns "true" because
     the console is owned by a context on CPU0 with
     NBCON_PRIO_NORMAL.

     But it should return "false". The console is owned
     by a context from task B and we do the check
     in a context from task A.

Note that with these changes, the printer kthreads do not yet
take over full responsibility for nbcon printing during normal
operation. These changes only focus on the lifecycle of the
kthreads.

Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
 include/linux/console.h  |  40 +++++++
 kernel/printk/internal.h |  27 +++++
 kernel/printk/nbcon.c    | 244 +++++++++++++++++++++++++++++++++++++--
 kernel/printk/printk.c   | 108 +++++++++++++++++
 4 files changed, 409 insertions(+), 10 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 5e9285e886f5..b8fb9fb24cbf 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,7 +16,9 @@
 
 #include <linux/atomic.h>
 #include <linux/bits.h>
+#include <linux/irq_work.h>
 #include <linux/rculist.h>
+#include <linux/rcuwait.h>
 #include <linux/types.h>
 #include <linux/vesa.h>
 
@@ -324,6 +326,9 @@ struct nbcon_write_context {
  * @nbcon_seq:		Sequence number of the next record for nbcon to print
  * @nbcon_device_ctxt:	Context available for non-printing operations
  * @pbufs:		Pointer to nbcon private buffer
+ * @kthread:		Printer kthread for this console
+ * @rcuwait:		RCU-safe wait object for @kthread waking
+ * @irq_work:		Defer @kthread waking to IRQ work context
  */
 struct console {
 	char			name[16];
@@ -377,6 +382,37 @@ struct console {
 	 */
 	void (*write_atomic)(struct console *con, struct nbcon_write_context *wctxt);
 
+	/**
+	 * @write_thread:
+	 *
+	 * NBCON callback to write out text in task context.
+	 *
+	 * This callback must be called only in task context with both
+	 * device_lock() and the nbcon console acquired with
+	 * NBCON_PRIO_NORMAL.
+	 *
+	 * The same rules for console ownership verification and unsafe
+	 * sections handling applies as with write_atomic().
+	 *
+	 * The console ownership handling is necessary for synchronization
+	 * against write_atomic() which is synchronized only via the context.
+	 *
+	 * The device_lock() provides the primary serialization for operations
+	 * on the device. It might be as relaxed (mutex)[*] or as tight
+	 * (disabled preemption and interrupts) as needed. It allows
+	 * the kthread to operate in the least restrictive mode[**].
+	 *
+	 * [*] Standalone nbcon_context_try_acquire() is not safe with
+	 *     the preemption enabled, see nbcon_owner_matches(). But it
+	 *     can be safe when always called in the preemptive context
+	 *     under the device_lock().
+	 *
+	 * [**] The device_lock() makes sure that nbcon_context_try_acquire()
+	 *      would never need to spin which is important especially with
+	 *      PREEMPT_RT.
+	 */
+	void (*write_thread)(struct console *con, struct nbcon_write_context *wctxt);
+
 	/**
 	 * @device_lock:
 	 *
@@ -423,7 +459,11 @@ struct console {
 	atomic_t		__private nbcon_state;
 	atomic_long_t		__private nbcon_seq;
 	struct nbcon_context	__private nbcon_device_ctxt;
+
 	struct printk_buffers	*pbufs;
+	struct task_struct	*kthread;
+	struct rcuwait		rcuwait;
+	struct irq_work		irq_work;
 };
 
 #ifdef CONFIG_LOCKDEP
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 8211623aee9b..12605e0aff11 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -48,6 +48,7 @@ struct printk_ringbuffer;
 struct dev_printk_info;
 
 extern struct printk_ringbuffer *prb;
+extern bool printk_kthreads_running;
 
 __printf(4, 0)
 int vprintk_store(int facility, int level,
@@ -91,6 +92,9 @@ enum nbcon_prio nbcon_get_default_prio(void);
 void nbcon_atomic_flush_pending(void);
 bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
 				   int cookie);
+bool nbcon_kthread_create(struct console *con);
+void nbcon_kthread_stop(struct console *con);
+void nbcon_wake_kthreads(void);
 
 /*
  * Check if the given console is currently capable and allowed to print
@@ -126,12 +130,34 @@ static inline bool console_is_usable(struct console *con, short flags, bool use_
 	return true;
 }
 
+/**
+ * nbcon_kthread_wake - Wake up a console printing thread
+ * @con:	Console to operate on
+ */
+static inline void nbcon_kthread_wake(struct console *con)
+{
+	/*
+	 * Guarantee any new records can be seen by tasks preparing to wait
+	 * before this context checks if the rcuwait is empty.
+	 *
+	 * The full memory barrier in rcuwait_wake_up() pairs with the full
+	 * memory barrier within set_current_state() of
+	 * ___rcuwait_wait_event(), which is called after prepare_to_rcuwait()
+	 * adds the waiter but before it has checked the wait condition.
+	 *
+	 * This pairs with nbcon_kthread_func:A.
+	 */
+	rcuwait_wake_up(&con->rcuwait); /* LMM(nbcon_kthread_wake:A) */
+}
+
 #else
 
 #define PRINTK_PREFIX_MAX	0
 #define PRINTK_MESSAGE_MAX	0
 #define PRINTKRB_RECORD_MAX	0
 
+#define printk_kthreads_running (false)
+
 /*
  * In !PRINTK builds we still export console_sem
  * semaphore and some of console functions (console_unlock()/etc.), so
@@ -149,6 +175,7 @@ static inline enum nbcon_prio nbcon_get_default_prio(void) { return NBCON_PRIO_N
 static inline void nbcon_atomic_flush_pending(void) { }
 static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
 						 int cookie) { return false; }
+static inline void nbcon_kthread_wake(struct console *con) { }
 
 static inline bool console_is_usable(struct console *con, short flags,
 				     bool use_atomic) { return false; }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ae9cd6d0b740..69cecf97bf24 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -10,6 +10,7 @@
 #include <linux/export.h>
 #include <linux/init.h>
 #include <linux/irqflags.h>
+#include <linux/kthread.h>
 #include <linux/minmax.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>
@@ -985,17 +986,10 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 	/* Initialize the write context for driver callbacks. */
 	nbcon_write_context_set_buf(wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len);
 
-	if (use_atomic) {
+	if (use_atomic)
 		con->write_atomic(con, wctxt);
-	} else {
-		/*
-		 * This function should never be called for legacy consoles.
-		 * Handle it as if ownership was lost and try to continue.
-		 */
-		WARN_ON_ONCE(1);
-		nbcon_context_release(ctxt);
-		return false;
-	}
+	else
+		con->write_thread(con, wctxt);
 
 	if (!wctxt->outbuf) {
 		/*
@@ -1031,6 +1025,219 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 	return nbcon_context_exit_unsafe(ctxt);
 }
 
+/**
+ * nbcon_kthread_should_wakeup - Check whether a printer thread should wakeup
+ * @con:	Console to operate on
+ * @ctxt:	The nbcon context from nbcon_context_try_acquire()
+ *
+ * Return:	True if the thread should shutdown or if the console is
+ *		allowed to print and a record is available. False otherwise.
+ *
+ * After the thread wakes up, it must first check if it should shutdown before
+ * attempting any printing.
+ */
+static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_context *ctxt)
+{
+	bool ret = false;
+	short flags;
+	int cookie;
+
+	if (kthread_should_stop())
+		return true;
+
+	cookie = console_srcu_read_lock();
+
+	flags = console_srcu_read_flags(con);
+	if (console_is_usable(con, flags, false)) {
+		/* Bring the sequence in @ctxt up to date */
+		ctxt->seq = nbcon_seq_read(con);
+
+		ret = prb_read_valid(prb, ctxt->seq, NULL);
+	}
+
+	console_srcu_read_unlock(cookie);
+	return ret;
+}
+
+/**
+ * nbcon_kthread_func - The printer thread function
+ * @__console:	Console to operate on
+ *
+ * Return:	0
+ */
+static int nbcon_kthread_func(void *__console)
+{
+	struct console *con = __console;
+	struct nbcon_write_context wctxt = {
+		.ctxt.console	= con,
+		.ctxt.prio	= NBCON_PRIO_NORMAL,
+	};
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+	short con_flags;
+	bool backlog;
+	int cookie;
+
+wait_for_event:
+	/*
+	 * Guarantee this task is visible on the rcuwait before
+	 * checking the wake condition.
+	 *
+	 * The full memory barrier within set_current_state() of
+	 * ___rcuwait_wait_event() pairs with the full memory
+	 * barrier within rcuwait_has_sleeper().
+	 *
+	 * This pairs with rcuwait_has_sleeper:A and nbcon_kthread_wake:A.
+	 */
+	rcuwait_wait_event(&con->rcuwait,
+			   nbcon_kthread_should_wakeup(con, ctxt),
+			   TASK_INTERRUPTIBLE); /* LMM(nbcon_kthread_func:A) */
+
+	do {
+		if (kthread_should_stop())
+			return 0;
+
+		backlog = false;
+
+		/*
+		 * Keep the srcu read lock around the entire operation so that
+		 * synchronize_srcu() can guarantee that the kthread stopped
+		 * or suspended printing.
+		 */
+		cookie = console_srcu_read_lock();
+
+		con_flags = console_srcu_read_flags(con);
+
+		if (console_is_usable(con, con_flags, false)) {
+			unsigned long lock_flags;
+
+			con->device_lock(con, &lock_flags);
+
+			/*
+			 * Ensure this stays on the CPU to make handover and
+			 * takeover possible.
+			 */
+			cant_migrate();
+
+			if (nbcon_context_try_acquire(ctxt)) {
+				/*
+				 * If the emit fails, this context is no
+				 * longer the owner.
+				 */
+				if (nbcon_emit_next_record(&wctxt, false)) {
+					nbcon_context_release(ctxt);
+					backlog = ctxt->backlog;
+				}
+			}
+
+			con->device_unlock(con, lock_flags);
+		}
+
+		console_srcu_read_unlock(cookie);
+
+		cond_resched();
+
+	} while (backlog);
+
+	goto wait_for_event;
+}
+
+/**
+ * nbcon_irq_work - irq work to wake console printer thread
+ * @irq_work:	The irq work to operate on
+ */
+static void nbcon_irq_work(struct irq_work *irq_work)
+{
+	struct console *con = container_of(irq_work, struct console, irq_work);
+
+	nbcon_kthread_wake(con);
+}
+
+static inline bool rcuwait_has_sleeper(struct rcuwait *w)
+{
+	/*
+	 * Guarantee any new records can be seen by tasks preparing to wait
+	 * before this context checks if the rcuwait is empty.
+	 *
+	 * This full memory barrier pairs with the full memory barrier within
+	 * set_current_state() of ___rcuwait_wait_event(), which is called
+	 * after prepare_to_rcuwait() adds the waiter but before it has
+	 * checked the wait condition.
+	 *
+	 * This pairs with nbcon_kthread_func:A.
+	 */
+	smp_mb(); /* LMM(rcuwait_has_sleeper:A) */
+	return rcuwait_active(w);
+}
+
+/**
+ * nbcon_wake_kthreads - Wake up printing threads using irq_work
+ */
+void nbcon_wake_kthreads(void)
+{
+	struct console *con;
+	int cookie;
+
+	if (!printk_kthreads_running)
+		return;
+
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(con) {
+		/*
+		 * Only schedule irq_work if the printing thread is
+		 * actively waiting. If not waiting, the thread will
+		 * notice by itself that it has work to do.
+		 */
+		if (rcuwait_has_sleeper(&con->rcuwait))
+			irq_work_queue(&con->irq_work);
+	}
+	console_srcu_read_unlock(cookie);
+}
+
+/*
+ * nbcon_kthread_stop - Stop a console printer thread
+ * @con:	Console to operate on
+ */
+void nbcon_kthread_stop(struct console *con)
+{
+	lockdep_assert_console_list_lock_held();
+
+	if (!con->kthread)
+		return;
+
+	kthread_stop(con->kthread);
+	con->kthread = NULL;
+}
+
+/**
+ * nbcon_kthread_create - Create a console printer thread
+ * @con:	Console to operate on
+ *
+ * Return:	True if the kthread was started or already exists.
+ *		Otherwise false and @con must not be registered.
+ *
+ * If @con was already registered, it must be unregistered before
+ * the global state variable @printk_kthreads_running can be set.
+ */
+bool nbcon_kthread_create(struct console *con)
+{
+	struct task_struct *kt;
+
+	lockdep_assert_console_list_lock_held();
+
+	if (con->kthread)
+		return true;
+
+	kt = kthread_run(nbcon_kthread_func, con, "pr/%s%d", con->name, con->index);
+	if (WARN_ON(IS_ERR(kt))) {
+		con_printk(KERN_ERR, con, "failed to start printing thread\n");
+		return false;
+	}
+
+	con->kthread = kt;
+
+	return true;
+}
+
 /* Track the nbcon emergency nesting per CPU. */
 static DEFINE_PER_CPU(unsigned int, nbcon_pcpu_emergency_nesting);
 static unsigned int early_nbcon_pcpu_emergency_nesting __initdata;
@@ -1466,6 +1673,13 @@ bool nbcon_alloc(struct console *con)
 {
 	struct nbcon_state state = { };
 
+	if (!con->write_thread) {
+		con_printk(KERN_ERR, con, "no write_thread() callback provided\n");
+		return false;
+	}
+
+	rcuwait_init(&con->rcuwait);
+	init_irq_work(&con->irq_work, nbcon_irq_work);
 	nbcon_state_set(con, &state);
 
 	if (con->flags & CON_BOOT) {
@@ -1481,6 +1695,13 @@ bool nbcon_alloc(struct console *con)
 			con_printk(KERN_ERR, con, "failed to allocate printing buffer\n");
 			return false;
 		}
+
+		if (printk_kthreads_running) {
+			if (!nbcon_kthread_create(con)) {
+				kfree(con->pbufs);
+				return false;
+			}
+		}
 	}
 
 	return true;
@@ -1494,6 +1715,9 @@ void nbcon_free(struct console *con)
 {
 	struct nbcon_state state = { };
 
+	if (printk_kthreads_running)
+		nbcon_kthread_stop(con);
+
 	nbcon_state_set(con, &state);
 
 	/* Boot consoles share global printk buffers. */
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e5e575384b3c..cf0b35eaf047 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -34,6 +34,7 @@
 #include <linux/security.h>
 #include <linux/memblock.h>
 #include <linux/syscalls.h>
+#include <linux/syscore_ops.h>
 #include <linux/vmcore_info.h>
 #include <linux/ratelimit.h>
 #include <linux/kmsg_dump.h>
@@ -493,6 +494,9 @@ static u64 syslog_seq;
 static size_t syslog_partial;
 static bool syslog_time;
 
+/* True when _all_ printer threads are available for printing. */
+bool printk_kthreads_running;
+
 struct latched_seq {
 	seqcount_latch_t	latch;
 	u64			val[2];
@@ -2974,6 +2978,8 @@ static bool console_emit_next_record(struct console *con, bool *handover, int co
 	return false;
 }
 
+static inline void printk_kthreads_check_locked(void) { }
+
 #endif /* CONFIG_PRINTK */
 
 /*
@@ -3334,6 +3340,102 @@ void console_start(struct console *console)
 }
 EXPORT_SYMBOL(console_start);
 
+#ifdef CONFIG_PRINTK
+static int unregister_console_locked(struct console *console);
+
+/* True when system boot is far enough to create printer threads. */
+static bool printk_kthreads_ready __ro_after_init;
+
+/**
+ * printk_kthreads_shutdown - shutdown all threaded printers
+ *
+ * On system shutdown all threaded printers are stopped. This allows printk
+ * to transition back to atomic printing, thus providing a robust mechanism
+ * for the final shutdown/reboot messages to be output.
+ */
+static void printk_kthreads_shutdown(void)
+{
+	struct console *con;
+
+	console_list_lock();
+	if (printk_kthreads_running) {
+		printk_kthreads_running = false;
+
+		for_each_console(con) {
+			if (con->flags & CON_NBCON)
+				nbcon_kthread_stop(con);
+		}
+
+		/*
+		 * The threads may have been stopped while printing a
+		 * backlog. Flush any records left over.
+		 */
+		nbcon_atomic_flush_pending();
+	}
+	console_list_unlock();
+}
+
+static struct syscore_ops printk_syscore_ops = {
+	.shutdown = printk_kthreads_shutdown,
+};
+
+/*
+ * If appropriate, start nbcon kthreads and set @printk_kthreads_running.
+ * If any kthreads fail to start, those consoles are unregistered.
+ *
+ * Must be called under console_list_lock().
+ */
+static void printk_kthreads_check_locked(void)
+{
+	struct hlist_node *tmp;
+	struct console *con;
+
+	lockdep_assert_console_list_lock_held();
+
+	if (!printk_kthreads_ready)
+		return;
+
+	/*
+	 * Printer threads cannot be started as long as any boot console is
+	 * registered because there is no way to synchronize the hardware
+	 * registers between boot console code and regular console code.
+	 * It can only be known that there will be no new boot consoles when
+	 * an nbcon console is registered.
+	 */
+	if (have_boot_console || !have_nbcon_console) {
+		/* Clear flag in case all nbcon consoles unregistered. */
+		printk_kthreads_running = false;
+		return;
+	}
+
+	if (printk_kthreads_running)
+		return;
+
+	hlist_for_each_entry_safe(con, tmp, &console_list, node) {
+		if (!(con->flags & CON_NBCON))
+			continue;
+
+		if (!nbcon_kthread_create(con))
+			unregister_console_locked(con);
+	}
+
+	printk_kthreads_running = true;
+}
+
+static int __init printk_set_kthreads_ready(void)
+{
+	register_syscore_ops(&printk_syscore_ops);
+
+	console_list_lock();
+	printk_kthreads_ready = true;
+	printk_kthreads_check_locked();
+	console_list_unlock();
+
+	return 0;
+}
+early_initcall(printk_set_kthreads_ready);
+#endif /* CONFIG_PRINTK */
+
 static int __read_mostly keep_bootcon;
 
 static int __init keep_bootcon_setup(char *str)
@@ -3689,6 +3791,9 @@ void register_console(struct console *newcon)
 				unregister_console_locked(con);
 		}
 	}
+
+	/* Changed console list, may require printer threads to start/stop. */
+	printk_kthreads_check_locked();
 unlock:
 	console_list_unlock();
 }
@@ -3785,6 +3890,9 @@ static int unregister_console_locked(struct console *console)
 	if (!found_nbcon_con)
 		have_nbcon_console = found_nbcon_con;
 
+	/* Changed console list, may require printer threads to start/stop. */
+	printk_kthreads_check_locked();
+
 	return res;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads
  2024-07-22 17:19 ` [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads John Ogness
@ 2024-07-30 14:44   ` John Ogness
  2024-07-31  9:59     ` Petr Mladek
  2024-07-30 15:16   ` Petr Mladek
  1 sibling, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-30 14:44 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On 2024-07-22, John Ogness <john.ogness@linutronix.de> wrote:
> +/**
> + * nbcon_kthread_should_wakeup - Check whether a printer thread should wakeup
> + * @con:	Console to operate on
> + * @ctxt:	The nbcon context from nbcon_context_try_acquire()
> + *
> + * Return:	True if the thread should shutdown or if the console is
> + *		allowed to print and a record is available. False otherwise.
> + *
> + * After the thread wakes up, it must first check if it should shutdown before
> + * attempting any printing.
> + */
> +static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_context *ctxt)
> +{
> +	bool ret = false;
> +	short flags;
> +	int cookie;
> +
> +	if (kthread_should_stop())
> +		return true;
> +
> +	cookie = console_srcu_read_lock();
> +
> +	flags = console_srcu_read_flags(con);
> +	if (console_is_usable(con, flags, false)) {
> +		/* Bring the sequence in @ctxt up to date */
> +		ctxt->seq = nbcon_seq_read(con);
> +
> +		ret = prb_read_valid(prb, ctxt->seq, NULL);

With this v3 series, the kthreads could be started before @nbcon_seq has
been set to the correct initial value. This will cause it to start
printing immediately from 0.

To fix this, I would change nbcon_alloc() to initialize @nbcon_seq to
the highest possible value:

	/*
	 * Initialize @nbcon_seq to the highest possible sequence number so
	 * that practically speaking it will have nothing to print until a
	 * desired initial sequence number has been set via nbcon_seq_force().
	 */
	atomic_long_set(&ACCESS_PRIVATE(con, nbcon_seq), ULSEQ_MAX(prb));

With the following ULSEQ_MAX macro added to printk_ringbuffer.h:

#ifdef CONFIG_64BIT
#define ULSEQ_MAX(rb) (-1)
#else
#define ULSEQ_MAX(rb) (prb_first_seq(rb) + 0x80000000)
#endif

This will allow the prb_read_valid() in nbcon_kthread_should_wakeup() to
return false and the kthread will sleep until @nbcon_seq has been
correctly setup and the kthread is woken when the "enabled" printk() is
called.

> +	}
> +
> +	console_srcu_read_unlock(cookie);
> +	return ret;
> +}

Other options would be to add extra checks to
nbcon_kthread_should_wakeup() or add some wait/notify code before
entering the kthread main loop. But both are overkill for this simple
startup scenario.

Thoughts?

John Ogness

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads
  2024-07-30 14:44   ` John Ogness
@ 2024-07-31  9:59     ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-31  9:59 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On Tue 2024-07-30 16:50:47, John Ogness wrote:
> On 2024-07-22, John Ogness <john.ogness@linutronix.de> wrote:
> > +/**
> > + * nbcon_kthread_should_wakeup - Check whether a printer thread should wakeup
> > + * @con:	Console to operate on
> > + * @ctxt:	The nbcon context from nbcon_context_try_acquire()
> > + *
> > + * Return:	True if the thread should shutdown or if the console is
> > + *		allowed to print and a record is available. False otherwise.
> > + *
> > + * After the thread wakes up, it must first check if it should shutdown before
> > + * attempting any printing.
> > + */
> > +static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_context *ctxt)
> > +{
> > +	bool ret = false;
> > +	short flags;
> > +	int cookie;
> > +
> > +	if (kthread_should_stop())
> > +		return true;
> > +
> > +	cookie = console_srcu_read_lock();
> > +
> > +	flags = console_srcu_read_flags(con);
> > +	if (console_is_usable(con, flags, false)) {
> > +		/* Bring the sequence in @ctxt up to date */
> > +		ctxt->seq = nbcon_seq_read(con);
> > +
> > +		ret = prb_read_valid(prb, ctxt->seq, NULL);
> 
> With this v3 series, the kthreads could be started before @nbcon_seq has
> been set to the correct initial value. This will cause it to start
> printing immediately from 0.

Great catch!

> To fix this, I would change nbcon_alloc() to initialize @nbcon_seq to
> the highest possible value:
> 
> 	/*
> 	 * Initialize @nbcon_seq to the highest possible sequence number so
> 	 * that practically speaking it will have nothing to print until a
> 	 * desired initial sequence number has been set via nbcon_seq_force().
> 	 */
> 	atomic_long_set(&ACCESS_PRIVATE(con, nbcon_seq), ULSEQ_MAX(prb));
> 
> With the following ULSEQ_MAX macro added to printk_ringbuffer.h:
> 
> #ifdef CONFIG_64BIT
> #define ULSEQ_MAX(rb) (-1)
> #else
> #define ULSEQ_MAX(rb) (prb_first_seq(rb) + 0x80000000)
> #endif
> 
> This will allow the prb_read_valid() in nbcon_kthread_should_wakeup() to
> return false and the kthread will sleep until @nbcon_seq has been
> correctly setup and the kthread is woken when the "enabled" printk() is
> called.

It looks good.

Alternative solution would be to delay setting of the CON_ENABLED bit.
Well, there is one catch, some console drivers pre-set CON_ENABLED to
enforce the registration, see the comment in
try_enable_preferred_console(). We would need to handle this somehow.

So the trick with ULSEQ_MAX is a better way to go.

> > +	}
> > +
> > +	console_srcu_read_unlock(cookie);
> > +	return ret;
> > +}
> 
> Other options would be to add extra checks to
> nbcon_kthread_should_wakeup() or add some wait/notify code before
> entering the kthread main loop. But both are overkill for this simple
> startup scenario.

Yeah, the trick with ULSEQ_MAX is easier and looks reliable. I am fine
with it.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads
  2024-07-22 17:19 ` [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads John Ogness
  2024-07-30 14:44   ` John Ogness
@ 2024-07-30 15:16   ` Petr Mladek
  1 sibling, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-30 15:16 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On Mon 2024-07-22 19:25:29, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Provide the main implementation for running a printer kthread
> per nbcon console that is takeover/handover aware. This
> includes:
> 
> - new mandatory write_thread() callback
> - kthread creation
> - kthread main printing loop
> - kthread wakeup mechanism
> - kthread shutdown

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -985,17 +986,10 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
>  	/* Initialize the write context for driver callbacks. */
>  	nbcon_write_context_set_buf(wctxt, &pmsg.pbufs->outbuf[0], pmsg.outbuf_len);
>  
> -	if (use_atomic) {
> +	if (use_atomic)
>  		con->write_atomic(con, wctxt);
> -	} else {
> -		/*
> -		 * This function should never be called for legacy consoles.
> -		 * Handle it as if ownership was lost and try to continue.
> -		 */
> -		WARN_ON_ONCE(1);
> -		nbcon_context_release(ctxt);
> -		return false;
> -	}

This "else" code path should have been removed in the previous patch (8th).

> +	else
> +		con->write_thread(con, wctxt);

We need to make sure that this callback exists by extending the check
at the beginning of this function:

	/*
	 * This function should never be called for consoles that have not
	 * implemented the necessary callback for writing: i.e. legacy
	 * consoles and, when atomic, nbcon consoles with no write_atomic().
	 * Handle it as if ownership was lost and try to continue.
	 */
	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
+			 (!use_atomic && !con->write_thread) ||
			 !(console_srcu_read_flags(con) & CON_NBCON))) {
		nbcon_context_release(ctxt);
		return false;
	}

>  
>  	if (!wctxt->outbuf) {
>  		/*

Otherwise, it looks good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 10/19] printk: nbcon: Use thread callback if in task context for legacy
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (8 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-30 15:35   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation John Ogness
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

When printing via console_lock, the write_atomic() callback is
used for nbcon consoles. However, if it is known that the
current context is a task context, the write_thread() callback
can be used instead.

Using write_thread() instead of write_atomic() helps to reduce
large disabled preemption regions when the device_lock does not
disable preemption.

This is mainly a preparatory change to allow avoiding
write_atomic() completely during normal operation if boot
consoles are registered.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  4 +--
 kernel/printk/nbcon.c    | 54 +++++++++++++++++++++++++++++-----------
 kernel/printk/printk.c   |  5 ++--
 3 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 12605e0aff11..bb02788acc7c 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -91,7 +91,7 @@ void nbcon_free(struct console *con);
 enum nbcon_prio nbcon_get_default_prio(void);
 void nbcon_atomic_flush_pending(void);
 bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
-				   int cookie);
+				   int cookie, bool use_atomic);
 bool nbcon_kthread_create(struct console *con);
 void nbcon_kthread_stop(struct console *con);
 void nbcon_wake_kthreads(void);
@@ -174,7 +174,7 @@ static inline void nbcon_free(struct console *con) { }
 static inline enum nbcon_prio nbcon_get_default_prio(void) { return NBCON_PRIO_NONE; }
 static inline void nbcon_atomic_flush_pending(void) { }
 static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
-						 int cookie) { return false; }
+						 int cookie, bool use_atomic) { return false; }
 static inline void nbcon_kthread_wake(struct console *con) { }
 
 static inline bool console_is_usable(struct console *con, short flags,
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 69cecf97bf24..233ab8f90fef 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1294,9 +1294,10 @@ enum nbcon_prio nbcon_get_default_prio(void)
 }
 
 /*
- * nbcon_atomic_emit_one - Print one record for an nbcon console using the
- *				write_atomic() callback
+ * nbcon_emit_one - Print one record for an nbcon console using the
+ *			specified callback
  * @wctxt:	An initialized write context struct to use for this context
+ * @use_atomic:	True if the write_atomic() callback is to be used
  *
  * Return:	True, when a record has been printed and there are still
  *		pending records. The caller might want to continue flushing.
@@ -1309,7 +1310,7 @@ enum nbcon_prio nbcon_get_default_prio(void)
  * This is an internal helper to handle the locking of the console before
  * calling nbcon_emit_next_record().
  */
-static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
+static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 
@@ -1324,7 +1325,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	 * The higher priority printing context takes over responsibility
 	 * to print the pending records.
 	 */
-	if (!nbcon_emit_next_record(wctxt, true))
+	if (!nbcon_emit_next_record(wctxt, use_atomic))
 		return false;
 
 	nbcon_context_release(ctxt);
@@ -1341,6 +1342,13 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
  *		both the console_lock and the SRCU read lock. Otherwise it
  *		is set to false.
  * @cookie:	The cookie from the SRCU read lock.
+ * @use_atomic: Set true when called in an atomic or unknown context.
+ *		It affects which nbcon callback will be used: write_atomic()
+ *		or write_thread().
+ *
+ *		When false, the write_thread() callback is used and would be
+ *		called in a preemtible context unless disabled by the
+ *		device_lock. The legacy handover is not allowed in this mode.
  *
  * Context:	Any context except NMI.
  * Return:	True, when a record has been printed and there are still
@@ -1356,26 +1364,44 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
  * Essentially it is the nbcon version of console_emit_next_record().
  */
 bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
-				   int cookie)
+				   int cookie, bool use_atomic)
 {
 	struct nbcon_write_context wctxt = { };
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
 	unsigned long flags;
 	bool progress;
 
-	/* Use the same procedure as console_emit_next_record(). */
-	printk_safe_enter_irqsave(flags);
-	console_lock_spinning_enable();
-	stop_critical_timings();
-
 	ctxt->console	= con;
 	ctxt->prio	= nbcon_get_default_prio();
 
-	progress = nbcon_atomic_emit_one(&wctxt);
+	if (use_atomic) {
+		/*
+		 * In an atomic or unknown context, use the same procedure as
+		 * in console_emit_next_record(). It allows to handover.
+		 */
+		printk_safe_enter_irqsave(flags);
+		console_lock_spinning_enable();
+		stop_critical_timings();
+	} else {
+		/*
+		 * In task context, use the same procedure as in
+		 * nbcon_kthread_func(). It might allow scheduling depending
+		 * on the device_lock().
+		 */
+		con->device_lock(con, &flags);
+		cant_migrate();
+	}
 
-	start_critical_timings();
-	*handover = console_lock_spinning_disable_and_check(cookie);
-	printk_safe_exit_irqrestore(flags);
+	progress = nbcon_emit_one(&wctxt, use_atomic);
+
+	if (use_atomic) {
+		start_critical_timings();
+		*handover = console_lock_spinning_disable_and_check(cookie);
+		printk_safe_exit_irqrestore(flags);
+	} else {
+		con->device_unlock(con, flags);
+		*handover = false;
+	}
 
 	return progress;
 }
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cf0b35eaf047..71e946a8c5fa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3024,12 +3024,13 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 			u64 printk_seq;
 			bool progress;
 
-			if (!console_is_usable(con, flags, true))
+			if (!console_is_usable(con, flags, !do_cond_resched))
 				continue;
 			any_usable = true;
 
 			if (flags & CON_NBCON) {
-				progress = nbcon_legacy_emit_next_record(con, handover, cookie);
+				progress = nbcon_legacy_emit_next_record(con, handover, cookie,
+									 !do_cond_resched);
 				printk_seq = nbcon_seq_read(con);
 			} else {
 				progress = console_emit_next_record(con, handover, cookie);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 10/19] printk: nbcon: Use thread callback if in task context for legacy
  2024-07-22 17:19 ` [PATCH printk v3 10/19] printk: nbcon: Use thread callback if in task context for legacy John Ogness
@ 2024-07-30 15:35   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-30 15:35 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:30, John Ogness wrote:
> When printing via console_lock, the write_atomic() callback is
> used for nbcon consoles. However, if it is known that the
> current context is a task context, the write_thread() callback
> can be used instead.
> 
> Using write_thread() instead of write_atomic() helps to reduce
> large disabled preemption regions when the device_lock does not
> disable preemption.
> 
> This is mainly a preparatory change to allow avoiding
> write_atomic() completely during normal operation if boot
> consoles are registered.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Well, I would update a comment, see below.

> ---
>  kernel/printk/internal.h |  4 +--
>  kernel/printk/nbcon.c    | 54 +++++++++++++++++++++++++++++-----------
>  kernel/printk/printk.c   |  5 ++--
>  3 files changed, 45 insertions(+), 18 deletions(-)
> 
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
> index 12605e0aff11..bb02788acc7c 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -91,7 +91,7 @@ void nbcon_free(struct console *con);
>  enum nbcon_prio nbcon_get_default_prio(void);
>  void nbcon_atomic_flush_pending(void);
>  bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
> -				   int cookie);
> +				   int cookie, bool use_atomic);
>  bool nbcon_kthread_create(struct console *con);
>  void nbcon_kthread_stop(struct console *con);
>  void nbcon_wake_kthreads(void);
> @@ -174,7 +174,7 @@ static inline void nbcon_free(struct console *con) { }
>  static inline enum nbcon_prio nbcon_get_default_prio(void) { return NBCON_PRIO_NONE; }
>  static inline void nbcon_atomic_flush_pending(void) { }
>  static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
> -						 int cookie) { return false; }
> +						 int cookie, bool use_atomic) { return false; }
>  static inline void nbcon_kthread_wake(struct console *con) { }
>  
>  static inline bool console_is_usable(struct console *con, short flags,
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 69cecf97bf24..233ab8f90fef 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1294,9 +1294,10 @@ enum nbcon_prio nbcon_get_default_prio(void)
>  }
>  
>  /*
> - * nbcon_atomic_emit_one - Print one record for an nbcon console using the
> - *				write_atomic() callback
> + * nbcon_emit_one - Print one record for an nbcon console using the
> + *			specified callback
>   * @wctxt:	An initialized write context struct to use for this context
> + * @use_atomic:	True if the write_atomic() callback is to be used
>   *
>   * Return:	True, when a record has been printed and there are still
>   *		pending records. The caller might want to continue flushing.
> @@ -1309,7 +1310,7 @@ enum nbcon_prio nbcon_get_default_prio(void)
>   * This is an internal helper to handle the locking of the console before
>   * calling nbcon_emit_next_record().

This is not completely true when @use_atomic == false. The function
takes only the nbcon context. But also con->device_lock()
is needed for the non-atomic case.

We should either update the commit message. Or it might make sense
to move "con->device_lock()/unlock() stuff to nbcon_emit_one
so that it actually does all the necessary locking.

BTW: We could use this function in nbcon_kthread_func() then as well.

    */
> -static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
> +static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
>  {
>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
>  

Otherwise, it looks good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (9 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 10/19] printk: nbcon: Use thread callback if in task context for legacy John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-23  3:18   ` kernel test robot
                     ` (2 more replies)
  2024-07-22 17:19 ` [PATCH printk v3 12/19] printk: Provide helper for message prepending John Ogness
                   ` (7 subsequent siblings)
  18 siblings, 3 replies; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Once the kthread is running and available
(i.e. @printk_kthreads_running is set), the kthread becomes
responsible for flushing any pending messages which are added
in NBCON_PRIO_NORMAL context. Namely the legacy
console_flush_all() and device_release() no longer flush the
console. And nbcon_atomic_flush_pending() used by
nbcon_cpu_emergency_exit() no longer flushes messages added
after the emergency messages.

The console context is safe when used by the kthread only when
one of the following conditions are true:

  1. Other caller acquires the console context with
     NBCON_PRIO_NORMAL with preemption disabled. It will
     release the context before rescheduling.

  2. Other caller acquires the console context with
     NBCON_PRIO_NORMAL under the device_lock.

  3. The kthread is the only context which acquires the console
     with NBCON_PRIO_NORMAL.

This is satisfied for all atomic printing call sites:

nbcon_legacy_emit_next_record() (#1)

nbcon_atomic_flush_pending_con() (#1)

nbcon_device_release() (#2)

It is even double guaranteed when @printk_kthreads_running
is set because then _only_ the kthread will print for
NBCON_PRIO_NORMAL. (#3)

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  6 +++++-
 kernel/printk/nbcon.c    | 13 ++++++------
 kernel/printk/printk.c   | 46 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index bb02788acc7c..66321836c3fe 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -190,11 +190,13 @@ extern bool legacy_allow_panic_sync;
 /**
  * struct console_flush_type - Define how to flush the consoles
  * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
+ * @nbcon_offload:	Offload flush to printer thread
  * @legacy_direct:	Call the legacy loop in this context
  * @legacy_offload:	Offload the legacy loop into IRQ
  */
 struct console_flush_type {
 	bool	nbcon_atomic;
+	bool	nbcon_offload;
 	bool	legacy_direct;
 	bool	legacy_offload;
 };
@@ -220,7 +222,9 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft,
 				ft->legacy_direct = true;
 		}
 
-		if (have_nbcon_console && !have_boot_console)
+		if (printk_kthreads_running)
+			ft->nbcon_offload = true;
+		else if (have_nbcon_console && !have_boot_console)
 			ft->nbcon_atomic = true;
 		break;
 
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 233ab8f90fef..8cf9e9e8c6e4 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 
 	/*
 	 * If flushing was successful but more records are available, this
-	 * context must flush those remaining records because there is no
-	 * other context that will do it.
+	 * context must flush those remaining records if the printer thread
+	 * is not available do it.
 	 */
-	printk_get_console_flush_type(&ft, false);
+	printk_get_console_flush_type(&ft, true);
 	if (ft.nbcon_atomic &&
 	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
 		stop_seq = prb_next_reserve_seq(prb);
@@ -1809,10 +1809,11 @@ void nbcon_device_release(struct console *con)
 
 	/*
 	 * This context must flush any new records added while the console
-	 * was locked. The console_srcu_read_lock must be taken to ensure
-	 * the console is usable throughout flushing.
+	 * was locked if the printer thread is not available to do it. The
+	 * console_srcu_read_lock must be taken to ensure the console is
+	 * usable throughout flushing.
 	 */
-	printk_get_console_flush_type(&ft, false);
+	printk_get_console_flush_type(&ft, true);
 	cookie = console_srcu_read_lock();
 	if (ft.nbcon_atomic &&
 	    console_is_usable(con, console_srcu_read_flags(con), true) &&
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 71e946a8c5fa..620c02b069bc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2382,6 +2382,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (ft.nbcon_atomic)
 		nbcon_atomic_flush_pending();
 
+	if (ft.nbcon_offload)
+		nbcon_wake_kthreads();
+
 	if (!defer_legacy && ft.legacy_direct) {
 		/*
 		 * The caller may be holding system-critical or
@@ -2680,6 +2683,7 @@ void suspend_console(void)
 
 void resume_console(void)
 {
+	struct console_flush_type ft;
 	struct console *con;
 
 	if (!console_suspend_enabled)
@@ -2697,6 +2701,10 @@ void resume_console(void)
 	 */
 	synchronize_srcu(&console_srcu);
 
+	printk_get_console_flush_type(&ft, true);
+	if (ft.nbcon_offload)
+		nbcon_wake_kthreads();
+
 	pr_flush(1000, true);
 }
 
@@ -3007,6 +3015,7 @@ static inline void printk_kthreads_check_locked(void) { }
  */
 static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
 {
+	struct console_flush_type ft;
 	bool any_usable = false;
 	struct console *con;
 	bool any_progress;
@@ -3018,12 +3027,21 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 	do {
 		any_progress = false;
 
+		printk_get_console_flush_type(&ft, true);
+
 		cookie = console_srcu_read_lock();
 		for_each_console_srcu(con) {
 			short flags = console_srcu_read_flags(con);
 			u64 printk_seq;
 			bool progress;
 
+			/*
+			 * console_flush_all() is only for legacy consoles when
+			 * the nbcon consoles have their printer threads.
+			 */
+			if ((flags & CON_NBCON) && ft.nbcon_offload)
+				continue;
+
 			if (!console_is_usable(con, flags, !do_cond_resched))
 				continue;
 			any_usable = true;
@@ -3334,9 +3352,28 @@ EXPORT_SYMBOL(console_stop);
 
 void console_start(struct console *console)
 {
+	struct console_flush_type ft;
+	short flags;
+	int cookie;
+
 	console_list_lock();
 	console_srcu_write_flags(console, console->flags | CON_ENABLED);
 	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. The related
+	 * printing context must be able to see it is enabled so that
+	 * it is guaranteed to wake up and resume printing.
+	 */
+	synchronize_srcu(&console_srcu);
+
+	printk_get_console_flush_type(&ft, true);
+	cookie = console_srcu_read_lock();
+	flags = console_srcu_read_flags(console);
+	if ((flags & CON_NBCON) && ft.nbcon_offload)
+		nbcon_kthread_wake(console);
+	console_srcu_read_unlock(cookie);
+
 	__pr_flush(console, 1000, true);
 }
 EXPORT_SYMBOL(console_start);
@@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 			 * that they make forward progress, so only increment
 			 * @diff for usable consoles.
 			 */
-			if (!console_is_usable(c, flags, true))
+			if (!console_is_usable(c, flags, true) &&
+			    !console_is_usable(c, flags, false)) {
 				continue;
+			}
 
 			if (flags & CON_NBCON) {
 				printk_seq = nbcon_seq_read(c);
@@ -4605,8 +4644,13 @@ EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
  */
 void console_try_replay_all(void)
 {
+	struct console_flush_type ft;
+
+	printk_get_console_flush_type(&ft, true);
 	if (console_trylock()) {
 		__console_rewind_all();
+		if (ft.nbcon_offload)
+			nbcon_wake_kthreads();
 		/* Consoles are flushed as part of console_unlock(). */
 		console_unlock();
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-22 17:19 ` [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation John Ogness
@ 2024-07-23  3:18   ` kernel test robot
  2024-07-23  8:51     ` John Ogness
  2024-07-31 13:46   ` preffer_ofload param: was: " Petr Mladek
  2024-07-31 14:06   ` Petr Mladek
  2 siblings, 1 reply; 57+ messages in thread
From: kernel test robot @ 2024-07-23  3:18 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

Hi John,

kernel test robot noticed the following build errors:

[auto build test ERROR on b18703ea7157f62f02eb0ceb11f6fa0138e90adc]

url:    https://github.com/intel-lab-lkp/linux/commits/John-Ogness/printk-nbcon-Clarify-nbcon_get_default_prio-context/20240723-015154
base:   b18703ea7157f62f02eb0ceb11f6fa0138e90adc
patch link:    https://lore.kernel.org/r/20240722171939.3349410-12-john.ogness%40linutronix.de
patch subject: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
config: x86_64-buildonly-randconfig-002-20240723 (https://download.01.org/0day-ci/archive/20240723/202407231117.SngBfEcI-lkp@intel.com/config)
compiler: gcc-10 (Ubuntu 10.5.0-1ubuntu1) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240723/202407231117.SngBfEcI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407231117.SngBfEcI-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from kernel/printk/printk.c:62:
   kernel/printk/internal.h: In function 'printk_get_console_flush_type':
   kernel/printk/internal.h:219:26: error: implicit declaration of function 'is_printk_deferred'; did you mean '_printk_deferred'? [-Werror=implicit-function-declaration]
     219 |    if (prefer_offload || is_printk_deferred())
         |                          ^~~~~~~~~~~~~~~~~~
         |                          _printk_deferred
   kernel/printk/printk.c: In function 'resume_console':
>> kernel/printk/printk.c:2706:3: error: implicit declaration of function 'nbcon_wake_kthreads'; did you mean 'irq_wake_thread'? [-Werror=implicit-function-declaration]
    2706 |   nbcon_wake_kthreads();
         |   ^~~~~~~~~~~~~~~~~~~
         |   irq_wake_thread
   cc1: some warnings being treated as errors


vim +2706 kernel/printk/printk.c

  2683	
  2684	void resume_console(void)
  2685	{
  2686		struct console_flush_type ft;
  2687		struct console *con;
  2688	
  2689		if (!console_suspend_enabled)
  2690			return;
  2691	
  2692		console_list_lock();
  2693		for_each_console(con)
  2694			console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
  2695		console_list_unlock();
  2696	
  2697		/*
  2698		 * Ensure that all SRCU list walks have completed. All printing
  2699		 * contexts must be able to see they are no longer suspended so
  2700		 * that they are guaranteed to wake up and resume printing.
  2701		 */
  2702		synchronize_srcu(&console_srcu);
  2703	
  2704		printk_get_console_flush_type(&ft, true);
  2705		if (ft.nbcon_offload)
> 2706			nbcon_wake_kthreads();
  2707	
  2708		pr_flush(1000, true);
  2709	}
  2710	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-23  3:18   ` kernel test robot
@ 2024-07-23  8:51     ` John Ogness
  0 siblings, 0 replies; 57+ messages in thread
From: John Ogness @ 2024-07-23  8:51 UTC (permalink / raw)
  To: kernel test robot, Petr Mladek
  Cc: oe-kbuild-all, Sergey Senozhatsky, Steven Rostedt,
	Thomas Gleixner, linux-kernel

On 2024-07-23, kernel test robot <lkp@intel.com> wrote:
>    kernel/printk/printk.c: In function 'resume_console':
>>> kernel/printk/printk.c:2706:3: error: implicit declaration of function 'nbcon_wake_kthreads'; did you mean 'irq_wake_thread'? [-Werror=implicit-function-declaration]
>     2706 |   nbcon_wake_kthreads();
>          |   ^~~~~~~~~~~~~~~~~~~
>          |   irq_wake_thread
>    cc1: some warnings being treated as errors

Thanks lkp.

For v4 I will fix this and also rename nbcon_wake_kthreads() to
nbcon_kthreads_wake() so that it matches the naming scheme:

nbcon_kthread_wake()
legacy_kthread_wake()

John Ogness

^ permalink raw reply	[flat|nested] 57+ messages in thread

* preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-22 17:19 ` [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation John Ogness
  2024-07-23  3:18   ` kernel test robot
@ 2024-07-31 13:46   ` Petr Mladek
  2024-08-01 14:22     ` John Ogness
  2024-07-31 14:06   ` Petr Mladek
  2 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-07-31 13:46 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:31, John Ogness wrote:
> Once the kthread is running and available
> (i.e. @printk_kthreads_running is set), the kthread becomes
> responsible for flushing any pending messages which are added
> in NBCON_PRIO_NORMAL context. Namely the legacy
> console_flush_all() and device_release() no longer flush the
> console. And nbcon_atomic_flush_pending() used by
> nbcon_cpu_emergency_exit() no longer flushes messages added
> after the emergency messages.
> 
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -190,11 +190,13 @@ extern bool legacy_allow_panic_sync;
>  /**
>   * struct console_flush_type - Define how to flush the consoles
>   * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
> + * @nbcon_offload:	Offload flush to printer thread
>   * @legacy_direct:	Call the legacy loop in this context
>   * @legacy_offload:	Offload the legacy loop into IRQ
>   */
>  struct console_flush_type {
>  	bool	nbcon_atomic;
> +	bool	nbcon_offload;
>  	bool	legacy_direct;
>  	bool	legacy_offload;
>  };
> @@ -220,7 +222,9 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft,
>  				ft->legacy_direct = true;
>  		}
>  
> -		if (have_nbcon_console && !have_boot_console)
> +		if (printk_kthreads_running)
> +			ft->nbcon_offload = true;
> +		else if (have_nbcon_console && !have_boot_console)
>  			ft->nbcon_atomic = true;
>  		break;
>  
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 233ab8f90fef..8cf9e9e8c6e4 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>  
>  	/*
>  	 * If flushing was successful but more records are available, this
> -	 * context must flush those remaining records because there is no
> -	 * other context that will do it.
> +	 * context must flush those remaining records if the printer thread
> +	 * is not available do it.
>  	 */
> -	printk_get_console_flush_type(&ft, false);
> +	printk_get_console_flush_type(&ft, true);

Hmm, it is a bit weird that we change the value even though it does
not affect the behavior. The parameter @prefer_offload affects only
the legacy loop.

It is even more confusing because ft.legacy_offload and
ft.nbcon_offload have difference motivation:

  + legacy_offload is used when the direct flush is not allowed/possible
  + nbcon_offload is used when allowed/possible

I am not 100% sure why I added the @prefer_offload parameter. I think
that it was for situations when we wanted to call the legacy loop
independently on whether direct or offload was chosen. I think
that it was for __wake_up_klogd() called from
defer_console_output().

Anyway, this is cscope output in emacs after applying this patchset:

<paste>
Finding symbol: printk_get_console_flush_type

Database directory: /prace/kernel/linux-printk/
-------------------------------------------------------------------------------
*** kernel/printk/internal.h:
printk_get_console_flush_type[225] static inline void printk_get_console_flush_type(struct console_flush_type *ft, bool prefer_offload)

*** kernel/printk/nbcon.c:
nbcon_atomic_flush_pending_con[1548] printk_get_console_flush_type(&ft, true);
nbcon_device_release[1848]     printk_get_console_flush_type(&ft, true);

*** kernel/printk/printk.c:
printk_legacy_allow_panic_sync[2343] printk_get_console_flush_type(&ft, false);
vprintk_emit[2381]             printk_get_console_flush_type(&ft, false);
resume_console[2706]           printk_get_console_flush_type(&ft, true);
console_cpu_notify[2729]       printk_get_console_flush_type(&ft, false);
console_flush_all[3102]        printk_get_console_flush_type(&ft, true);
console_unlock[3223]           printk_get_console_flush_type(&ft, false);
console_flush_on_panic[3375]   printk_get_console_flush_type(&ft, false);
console_start[3453]            printk_get_console_flush_type(&ft, true);
legacy_kthread_should_wakeup[3484] printk_get_console_flush_type(&ft, true);
__pr_flush[4290]               printk_get_console_flush_type(&ft, false);
__pr_flush[4302]               printk_get_console_flush_type(&ft, false);
__wake_up_klogd[4466]          printk_get_console_flush_type(&ft, true);
console_try_replay_all[4851]   printk_get_console_flush_type(&ft, true);
-------------------------------------------------------------------------------
</paste>

Now, the parameter @prefer_offload makes a difference only
when it is set to "true" and "ft.legacy_offload" value is
later used. It reduces the above list to:

*** kernel/printk/printk.c:
resume_console[2706]           printk_get_console_flush_type(&ft, true);
console_start[3453]            printk_get_console_flush_type(&ft, true);
__wake_up_klogd[4466]          printk_get_console_flush_type(&ft, true);
console_try_replay_all[4851]   printk_get_console_flush_type(&ft, true);

IMHO, __wake_up_klogd() is the only location where we really need
to know if there are any messages for the legacy loop, for example,
when called from printk_deferred().

It should not be needed in other situations because it there
is always __pr_flush() or console_unlock() which would flush
the legacy consoles directly anyway.

=> I suggest to

1. Remove @prefer_offload parameter from printk_get_console_flush_type

2. Update __wake_up_klogd() to check both ft.legacy_offload and
   ft.legacy_direct, like:

	printk_get_console_flush_type(&ft);
	if (!ft.legacy_offload && !ft.legacy_direct)
		val &= ~PRINTK_PENDING_OUTPUT;


NOTE: I actually suggested to use in vprintk_emit():

	printk_get_console_flush_type(&ft, deffer_legacy);

      But it can be done even without this parameter. Like it
      is done in this version of the patchset.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-31 13:46   ` preffer_ofload param: was: " Petr Mladek
@ 2024-08-01 14:22     ` John Ogness
  2024-08-01 15:40       ` Petr Mladek
  0 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-08-01 14:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
>> @@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>>  
>>  	/*
>>  	 * If flushing was successful but more records are available, this
>> -	 * context must flush those remaining records because there is no
>> -	 * other context that will do it.
>> +	 * context must flush those remaining records if the printer thread
>> +	 * is not available do it.
>>  	 */
>> -	printk_get_console_flush_type(&ft, false);
>> +	printk_get_console_flush_type(&ft, true);
>
> Hmm, it is a bit weird that we change the value even though it does
> not affect the behavior. The parameter @prefer_offload affects only
> the legacy loop.

For nbcon consoles, prefer_offload is really only annotating the
intentions. In this case, nbcon_atomic_flush_pending_con() is interested
in knowing if kthread printer is available, thus using
prefer_offload=true.

If the query yields ft.nbcon_atomic set, it means that the kthread
printer is _not_ available (nbcon_atomic and nbcon_offload are
exclusive) and it can (and must) handle its flushing responsibilities
directly (immediately, using the atomic callbacks).

You might ask, why does it not check ft.nbcon_offload? Although that
would tell it that the kthread is not available, it does not imply that
atomic printing is allowed. In order to see if atomic printing is
allowed, it would need to check ft.nbcon_atomic. And since the two are
exclusive, really it is enough just to check ft.nbcon_atomic. If
ft.nbcon_atomic is not set, either the kthread is available or there is
nothing the nbcon console can do about it anyway (for example, it must
rely on the legacy loop to handle it).

I suppose it is wrong that nbcon_atomic_flush_pending_con(false) will
set ft.nbcon_offload if the kthreads are available. I would fix that.

> IMHO, __wake_up_klogd() is the only location where we really need
> to know if there are any messages for the legacy loop, for example,
> when called from printk_deferred().
>
> It should not be needed in other situations because it there
> is always __pr_flush() or console_unlock() which would flush
> the legacy consoles directly anyway.
>
> => I suggest to
>
> 1. Remove @prefer_offload parameter from printk_get_console_flush_type
>
> 2. Update __wake_up_klogd() to check both ft.legacy_offload and
>    ft.legacy_direct, like:
>
> 	printk_get_console_flush_type(&ft);
> 	if (!ft.legacy_offload && !ft.legacy_direct)
> 		val &= ~PRINTK_PENDING_OUTPUT;
>
>
> NOTE: I actually suggested to use in vprintk_emit():
>
> 	printk_get_console_flush_type(&ft, deffer_legacy);
>
>       But it can be done even without this parameter. Like it
>       is done in this version of the patchset.

I understand what you are saying. But I don't like it. :-)

It would mean that functions only interested in offloading must check
direct. And that direct and offload are no longer exclusive. IMHO this
is a hack. The whole point of printk_get_console_flush_type() is so that
the flusher does not need any special code to figure out what to do.

If the flusher is only interested in offloaded flushing capabilities, it
should be able to query that. We could wrap things into macros to make
it clearer, but it might get a bit ugly with the efficience (depending
on how well compilers can optimize the macro usage):

#define is_nbcon_offload_available() ({			\
	struct console_flush_type ft;			\
	printk_get_console_flush_type(&ft, true);	\
	ft.nbcon_offload;				\
})

#define is_nbcon_atomic_available() ({			\
	struct console_flush_type ft;			\
	printk_get_console_flush_type(&ft, false);	\
	ft.nbcon_atomic;				\
})

And then this code looks like:

if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) &&
    !is_nbcon_offload_available() &&
    is_nbcon_atomic_available()) {
	/* flush directly using atomic callback */
}

I have backported the printk_get_console_flush_type() macro to the
atomic series for v7. I would like to keep @prefer_offload and I will
try to add comments to clarify why the different query types are used.

John

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-08-01 14:22     ` John Ogness
@ 2024-08-01 15:40       ` Petr Mladek
  2024-08-02  7:29         ` John Ogness
  0 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-08-01 15:40 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2024-08-01 16:28:28, John Ogness wrote:
> On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
> >> @@ -1511,10 +1511,10 @@ static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
> >>  
> >>  	/*
> >>  	 * If flushing was successful but more records are available, this
> >> -	 * context must flush those remaining records because there is no
> >> -	 * other context that will do it.
> >> +	 * context must flush those remaining records if the printer thread
> >> +	 * is not available do it.
> >>  	 */
> >> -	printk_get_console_flush_type(&ft, false);
> >> +	printk_get_console_flush_type(&ft, true);
> >
> > Hmm, it is a bit weird that we change the value even though it does
> > not affect the behavior. The parameter @prefer_offload affects only
> > the legacy loop.
> 
> For nbcon consoles, prefer_offload is really only annotating the
> intentions. In this case, nbcon_atomic_flush_pending_con() is interested
> in knowing if kthread printer is available, thus using
> prefer_offload=true.
> 
> If the query yields ft.nbcon_atomic set, it means that the kthread
> printer is _not_ available (nbcon_atomic and nbcon_offload are
> exclusive) and it can (and must) handle its flushing responsibilities
> directly (immediately, using the atomic callbacks).
> 
> You might ask, why does it not check ft.nbcon_offload? Although that
> would tell it that the kthread is not available, it does not imply that
> atomic printing is allowed. In order to see if atomic printing is
> allowed, it would need to check ft.nbcon_atomic. And since the two are
> exclusive, really it is enough just to check ft.nbcon_atomic. If
> ft.nbcon_atomic is not set, either the kthread is available or there is
> nothing the nbcon console can do about it anyway (for example, it must
> rely on the legacy loop to handle it).

Where exactly do you need prefer offload of the legacy consoles?

Do need to "prefer offload" or "force offload" in these situations?

Note: We are talking about "legacy consoles" and "legacy approach"
which is:

   Legacy approach: "Prefer direct flush when possible"

Also note that "force_offload" is usually detected automatically via
the context: is_printk_deferred() check.

IMHO, we need to explicitely "force_offload" only in printk_deferred()
where it is passed to vprintk_emit() via LOGLEVEL_SCHED.

IMHO, we could get rid of this hack and simply do something like:

  int vprintk_deferred(const char *fmt, va_list args)
{
	preemption_disable();
	printk_deferred_enter();

	return vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, fmt, args);

	printk_deferred_exit();
	preemption_enable();
}

> I suppose it is wrong that nbcon_atomic_flush_pending_con(false) will
> set ft.nbcon_offload if the kthreads are available. I would fix that.

This will not work in vprintk_emit(). We need to use there

   nbcon_atomic_flush_pending_con(false)

because legacy consoles should be handled directly when possible
(legacy approach).

But nbcon consoles should be offloaded to the kthread when possible
(new approach).

The new approach is acceptable "only" for nbcon consoles because
they are synchronized by the nbcon context. Namely, they allow
to take over the ownership and flush the messages directly
in emergency or panic context.

In terms of approach:

  nbcon approach: "Prefer offload when possible"

=> for nbcon consoles we would need "prefer_direct" or
   "force_direct" parameter.

> > IMHO, __wake_up_klogd() is the only location where we really need
> > to know if there are any messages for the legacy loop, for example,
> > when called from printk_deferred().
> >
> > It should not be needed in other situations because it there
> > is always __pr_flush() or console_unlock() which would flush
> > the legacy consoles directly anyway.
> >
> > => I suggest to
> >
> > 1. Remove @prefer_offload parameter from printk_get_console_flush_type
> >
> > 2. Update __wake_up_klogd() to check both ft.legacy_offload and
> >    ft.legacy_direct, like:
> >
> > 	printk_get_console_flush_type(&ft);
> > 	if (!ft.legacy_offload && !ft.legacy_direct)
> > 		val &= ~PRINTK_PENDING_OUTPUT;
> >
> >
> > NOTE: I actually suggested to use in vprintk_emit():
> >
> > 	printk_get_console_flush_type(&ft, deffer_legacy);
> >
> >       But it can be done even without this parameter. Like it
> >       is done in this version of the patchset.
> 
> I understand what you are saying. But I don't like it. :-)
> 
> It would mean that functions only interested in offloading must check
> direct. And that direct and offload are no longer exclusive. IMHO this
> is a hack. The whole point of printk_get_console_flush_type() is so that
> the flusher does not need any special code to figure out what to do.

I agree.

> If the flusher is only interested in offloaded flushing capabilities, it
> should be able to query that. We could wrap things into macros to make
> it clearer, but it might get a bit ugly with the efficience (depending
> on how well compilers can optimize the macro usage):
> 
> #define is_nbcon_offload_available() ({			\
> 	struct console_flush_type ft;			\
> 	printk_get_console_flush_type(&ft, true);	\
> 	ft.nbcon_offload;				\
> })
> 
> #define is_nbcon_atomic_available() ({			\
> 	struct console_flush_type ft;			\
> 	printk_get_console_flush_type(&ft, false);	\
> 	ft.nbcon_atomic;				\
> })
> 
> And then this code looks like:
> 
> if (prb_read_valid(prb, nbcon_seq_read(con), NULL)) &&
>     !is_nbcon_offload_available() &&
>     is_nbcon_atomic_available()) {
> 	/* flush directly using atomic callback */
> }

This is crazy. But where exactly do you need this?


> I have backported the printk_get_console_flush_type() macro to the
> atomic series for v7. I would like to keep @prefer_offload and I will
> try to add comments to clarify why the different query types are used.

Please, reconsider this.

I believe that the parameter "prefer_offload" adds more harm than good
because:

   + It is a non-sense for nbcon consoles. They always prefer offload
     except for emergency and panic. But emergency and panic is
     handled transparently by nbcon_get_default_prio().

   + It is confusing even for legacy consoles after introducing the
     kthread. There will be three types of offload:

	+ do console_lock()/unlock() in IRQ work
	+ wake kthread
	+ wake kthread in IRQ work

In fact, the meaning is rather "can_t_call_scheduler_code_a_safe_way".

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-08-01 15:40       ` Petr Mladek
@ 2024-08-02  7:29         ` John Ogness
  2024-08-02 10:19           ` Petr Mladek
  0 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-08-02  7:29 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2024-08-01, Petr Mladek <pmladek@suse.com> wrote:
> I believe that the parameter "prefer_offload" adds more harm than good
> because:
>
>    + It is a non-sense for nbcon consoles. They always prefer offload
>      except for emergency and panic. But emergency and panic is
>      handled transparently by nbcon_get_default_prio().
>
>    + It is confusing even for legacy consoles after introducing the
>      kthread. There will be three types of offload:
>
> 	+ do console_lock()/unlock() in IRQ work
> 	+ wake kthread
> 	+ wake kthread in IRQ work

I think the confusion comes from my intention of the function. I wanted
a caller to use it as:

"Tell me how to flush."

This requires input from the caller to know some information about what
the caller's intentions are.

If I change the function so that a caller uses it as:

"Tell me what flush mechanisms are available to me."

Then the function does not need to know the caller's intentions. It only
needs to know the caller's state, and that information is readily
available via global/per-cpu variables.

I will drop the @prefer_offload argument, simplifying the function to
only provide a list of available flush options. The caller will then
decide itself which option it wants to use. I believe this aligns with
your intentions as well.

John

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: preffer_ofload param: was: Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-08-02  7:29         ` John Ogness
@ 2024-08-02 10:19           ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-08-02 10:19 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Fri 2024-08-02 09:35:17, John Ogness wrote:
> On 2024-08-01, Petr Mladek <pmladek@suse.com> wrote:
> > I believe that the parameter "prefer_offload" adds more harm than good
> > because:
> >
> >    + It is a non-sense for nbcon consoles. They always prefer offload
> >      except for emergency and panic. But emergency and panic is
> >      handled transparently by nbcon_get_default_prio().
> >
> >    + It is confusing even for legacy consoles after introducing the
> >      kthread. There will be three types of offload:
> >
> > 	+ do console_lock()/unlock() in IRQ work
> > 	+ wake kthread
> > 	+ wake kthread in IRQ work
> 
> I think the confusion comes from my intention of the function. I wanted
> a caller to use it as:
> 
> "Tell me how to flush."

Yes, I understand it the same way.

> This requires input from the caller to know some information about what
> the caller's intentions are.

Does it really need it?
Where?

Please, show me two examples where the function is not able to make
the right decision by global variables?

I know about __wake_up_klogd(). And it is only because
printk_deferred() requires the offload via @level parameter
instead of the per-CPU @printk_context.

Is there any other?

Let me repeat:

    + The parameter makes a difference only when it is "true".
    + The parameter affects only legacy loop.

Let me check where the parameter is true in the current code:

 + nbcon_atomic_flush_pending_con[1548]
    + No effect! The function ignores legacy consoles.

 + nbcon_device_release[1848]
    + No effect! The function ignores legacy consoles.

 + resume_console[2706]
    + Wrong! The function should flush the legacy
	consoles directly by default. Otherwise, we would
	change the legacy behavior.

 + console_flush_all[3102]        printk_get_console_flush_type(&ft, true);
    + No effect! The function ignores legacy consoles.
    + Confusing! The function flushes legacy consoles directly!

  + console_start[3453]
    + Wrong! The function should flush the legacy
      consoles directly by default. Otherwise, we would
	change the legacy behavior.

  + legacy_kthread_should_wakeup[3484]
    + Not needed! The function is called only when
	force_legacy_kthread() == true.

  + __wake_up_klogd[4466]
    + This is the only location where it makes a difference.
    + We could simply check both legacy_direct && legacy_offload
      here.
    + Yes, it is a hack. But it is needed only because of
      printk_deferred() which is already hacky by using
      LOGLEVEL_SCHED.

  + console_try_replay_all[4851]
    + Wrong! It is called under console_lock() => the function will
	flush legacy consoles directly anyway.


> If I change the function so that a caller uses it as:
> 
> "Tell me what flush mechanisms are available to me."

No, this will make the code complicated again.

> Then the function does not need to know the caller's intentions. It only
> needs to know the caller's state, and that information is readily
> available via global/per-cpu variables.
>
> I will drop the @prefer_offload argument, simplifying the function to
> only provide a list of available flush options. The caller will then
> decide itself which option it wants to use. I believe this aligns with
> your intentions as well.

No. Please, keep the original meaning.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-22 17:19 ` [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation John Ogness
  2024-07-23  3:18   ` kernel test robot
  2024-07-31 13:46   ` preffer_ofload param: was: " Petr Mladek
@ 2024-07-31 14:06   ` Petr Mladek
  2024-07-31 15:25     ` John Ogness
  2 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-07-31 14:06 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:31, John Ogness wrote:
> Once the kthread is running and available
> (i.e. @printk_kthreads_running is set), the kthread becomes
> responsible for flushing any pending messages which are added
> in NBCON_PRIO_NORMAL context. Namely the legacy
> console_flush_all() and device_release() no longer flush the
> console. And nbcon_atomic_flush_pending() used by
> nbcon_cpu_emergency_exit() no longer flushes messages added
> after the emergency messages.
> 
> The console context is safe when used by the kthread only when
> one of the following conditions are true:
> 
>   1. Other caller acquires the console context with
>      NBCON_PRIO_NORMAL with preemption disabled. It will
>      release the context before rescheduling.
> 
>   2. Other caller acquires the console context with
>      NBCON_PRIO_NORMAL under the device_lock.
> 
>   3. The kthread is the only context which acquires the console
>      with NBCON_PRIO_NORMAL.
> 
> This is satisfied for all atomic printing call sites:
> 
> nbcon_legacy_emit_next_record() (#1)
> 
> nbcon_atomic_flush_pending_con() (#1)
> 
> nbcon_device_release() (#2)
> 
> It is even double guaranteed when @printk_kthreads_running
> is set because then _only_ the kthread will print for
> NBCON_PRIO_NORMAL. (#3)
> 
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  			 * that they make forward progress, so only increment
>  			 * @diff for usable consoles.
>  			 */
> -			if (!console_is_usable(c, flags, true))
> +			if (!console_is_usable(c, flags, true) &&
> +			    !console_is_usable(c, flags, false)) {

This looks weird. nbcon console can't make progress when
"write_atomic" is not implemented and the kthreads are not
running.

I should be:

			if (!((console_is_usable(c, flags, true)) ||
			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {

That said. Do we really want to support nbcon consoles without
@write_atomic() callback?

It would make things easier when both callbacks are mandatory.

>  				continue;
> +			}
>  
>  			if (flags & CON_NBCON) {
>  				printk_seq = nbcon_seq_read(c);

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-31 14:06   ` Petr Mladek
@ 2024-07-31 15:25     ` John Ogness
  2024-08-01  9:36       ` Petr Mladek
  0 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-31 15:25 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
>> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>>  			 * that they make forward progress, so only increment
>>  			 * @diff for usable consoles.
>>  			 */
>> -			if (!console_is_usable(c, flags, true))
>> +			if (!console_is_usable(c, flags, true) &&
>> +			    !console_is_usable(c, flags, false)) {
>
> This looks weird. nbcon console can't make progress when
> "write_atomic" is not implemented and the kthreads are not
> running.
>
> I should be:
>
> 			if (!((console_is_usable(c, flags, true)) ||
> 			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {

I would prefer to have the printk_kthreads_running check within
console_is_usable() for the !use_atomic case.

> That said. Do we really want to support nbcon consoles without
> @write_atomic() callback?

We must. Graphic consoles will not be able to implement
write_atomic(). Network and USB consoles probably will not be able to
implement it either.

John

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-07-31 15:25     ` John Ogness
@ 2024-08-01  9:36       ` Petr Mladek
  2024-08-01  9:52         ` Petr Mladek
  0 siblings, 1 reply; 57+ messages in thread
From: Petr Mladek @ 2024-08-01  9:36 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Wed 2024-07-31 17:31:02, John Ogness wrote:
> On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
> >> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> >>  			 * that they make forward progress, so only increment
> >>  			 * @diff for usable consoles.
> >>  			 */
> >> -			if (!console_is_usable(c, flags, true))
> >> +			if (!console_is_usable(c, flags, true) &&
> >> +			    !console_is_usable(c, flags, false)) {
> >
> > This looks weird. nbcon console can't make progress when
> > "write_atomic" is not implemented and the kthreads are not
> > running.
> >
> > I should be:
> >
> > 			if (!((console_is_usable(c, flags, true)) ||
> > 			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {
> 
> I would prefer to have the printk_kthreads_running check within
> console_is_usable() for the !use_atomic case.

Makes sense.

> > That said. Do we really want to support nbcon consoles without
> > @write_atomic() callback?
> 
> We must. Graphic consoles will not be able to implement
> write_atomic(). Network and USB consoles probably will not be able to
> implement it either.

I see.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation
  2024-08-01  9:36       ` Petr Mladek
@ 2024-08-01  9:52         ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-08-01  9:52 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Thu 2024-08-01 11:36:53, Petr Mladek wrote:
> On Wed 2024-07-31 17:31:02, John Ogness wrote:
> > On 2024-07-31, Petr Mladek <pmladek@suse.com> wrote:
> > >> @@ -4102,8 +4139,10 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> > >>  			 * that they make forward progress, so only increment
> > >>  			 * @diff for usable consoles.
> > >>  			 */
> > >> -			if (!console_is_usable(c, flags, true))
> > >> +			if (!console_is_usable(c, flags, true) &&
> > >> +			    !console_is_usable(c, flags, false)) {
> > >
> > > This looks weird. nbcon console can't make progress when
> > > "write_atomic" is not implemented and the kthreads are not
> > > running.
> > >
> > > I should be:
> > >
> > > 			if (!((console_is_usable(c, flags, true)) ||
> > > 			      (console_is_usable(c, flags, false) && printk_kthreads_running))) {
> > 
> > I would prefer to have the printk_kthreads_running check within
> > console_is_usable() for the !use_atomic case.
> 
> Makes sense.

I have realized right after sending the reply that it actually did
not make sense.

We try to use con->write_thread also in

 + console_flush_all()
   + nbcon_legacy_emit_next_record()
     + nbcon_emit_one()
       + nbcon_emit_next_record()

when (@do_cond_resched == true)   =>   (@use_atomic == false)

=> __pr_flush() actually should be able to flush the messages using
   con->write_thread() even when the kthreads are not running.


Result:

IMHO, we should not check @printk_kthreads_running at all after all.

Sigh, it might make sense to document the design from a top level
view somewhere. We should do it at the end before we forget
the details.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 12/19] printk: Provide helper for message prepending
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (10 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 13/19] printk: nbcon: Show replay message on takeover John Ogness
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

In order to support prepending different texts to printk
messages, split out the prepending code into a helper
function.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 620c02b069bc..89210a21cc50 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2790,30 +2790,31 @@ static void __console_unlock(void)
 #ifdef CONFIG_PRINTK
 
 /*
- * Prepend the message in @pmsg->pbufs->outbuf with a "dropped message". This
- * is achieved by shifting the existing message over and inserting the dropped
- * message.
+ * Prepend the message in @pmsg->pbufs->outbuf. This is achieved by shifting
+ * the existing message over and inserting the scratchbuf message.
  *
- * @pmsg is the printk message to prepend.
- *
- * @dropped is the dropped count to report in the dropped message.
+ * @pmsg is the original printk message.
+ * @fmt is the printf format of the message which will prepend the existing one.
  *
- * If the message text in @pmsg->pbufs->outbuf does not have enough space for
- * the dropped message, the message text will be sufficiently truncated.
+ * If there is not enough space in @pmsg->pbufs->outbuf, the existing
+ * message text will be sufficiently truncated.
  *
  * If @pmsg->pbufs->outbuf is modified, @pmsg->outbuf_len is updated.
  */
-void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
+__printf(2, 3)
+static void console_prepend_message(struct printk_message *pmsg, const char *fmt, ...)
 {
 	struct printk_buffers *pbufs = pmsg->pbufs;
 	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
 	const size_t outbuf_sz = sizeof(pbufs->outbuf);
 	char *scratchbuf = &pbufs->scratchbuf[0];
 	char *outbuf = &pbufs->outbuf[0];
+	va_list args;
 	size_t len;
 
-	len = scnprintf(scratchbuf, scratchbuf_sz,
-		       "** %lu printk messages dropped **\n", dropped);
+	va_start(args, fmt);
+	len = vscnprintf(scratchbuf, scratchbuf_sz, fmt, args);
+	va_end(args);
 
 	/*
 	 * Make sure outbuf is sufficiently large before prepending.
@@ -2835,6 +2836,19 @@ void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
 	pmsg->outbuf_len += len;
 }
 
+/*
+ * Prepend the message in @pmsg->pbufs->outbuf with a "dropped message".
+ * @pmsg->outbuf_len is updated appropriately.
+ *
+ * @pmsg is the printk message to prepend.
+ *
+ * @dropped is the dropped count to report in the dropped message.
+ */
+void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
+{
+	console_prepend_message(pmsg, "** %lu printk messages dropped **\n", dropped);
+}
+
 /*
  * Read and format the specified record (or a later record if the specified
  * record is not available).
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH printk v3 13/19] printk: nbcon: Show replay message on takeover
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (11 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 12/19] printk: Provide helper for message prepending John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-31 14:59   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 14/19] proc: consoles: Add notation to c_start/c_stop John Ogness
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

An emergency or panic context can takeover console ownership
while the current owner was printing a printk message. The
atomic printer will re-print the message that the previous
owner was printing. However, this can look confusing to the
user and may even seem as though a message was lost.

  [3430014.1
  [3430014.181123] usb 1-2: Product: USB Audio

Add a new field @nbcon_prev_seq to struct console to track
the sequence number to print that was assigned to the previous
console owner. If this matches the sequence number to print
that the current owner is assigned, then a takeover must have
occurred. In this case, print an additional message to inform
the user that the previous message is being printed again.

  [3430014.1
  ** replaying previous printk message **
  [3430014.181123] usb 1-2: Product: USB Audio

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h  |  2 ++
 kernel/printk/internal.h |  1 +
 kernel/printk/nbcon.c    | 26 ++++++++++++++++++++++++++
 kernel/printk/printk.c   | 11 +++++++++++
 4 files changed, 40 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index b8fb9fb24cbf..fde565e127b6 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -325,6 +325,7 @@ struct nbcon_write_context {
  * @nbcon_state:	State for nbcon consoles
  * @nbcon_seq:		Sequence number of the next record for nbcon to print
  * @nbcon_device_ctxt:	Context available for non-printing operations
+ * @nbcon_prev_seq:	Seq num the previous nbcon owner was assigned to print
  * @pbufs:		Pointer to nbcon private buffer
  * @kthread:		Printer kthread for this console
  * @rcuwait:		RCU-safe wait object for @kthread waking
@@ -459,6 +460,7 @@ struct console {
 	atomic_t		__private nbcon_state;
 	atomic_long_t		__private nbcon_seq;
 	struct nbcon_context	__private nbcon_device_ctxt;
+	atomic_long_t           __private nbcon_prev_seq;
 
 	struct printk_buffers	*pbufs;
 	struct task_struct	*kthread;
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 66321836c3fe..a4e7b40458b2 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -321,4 +321,5 @@ bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 
 #ifdef CONFIG_PRINTK
 void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped);
+void console_prepend_replay(struct printk_message *pmsg);
 #endif
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 8cf9e9e8c6e4..2a61f2c68ddc 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -938,7 +938,9 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 		.pbufs = ctxt->pbufs,
 	};
 	unsigned long con_dropped;
+	struct nbcon_state cur;
 	unsigned long dropped;
+	unsigned long ulseq;
 
 	/*
 	 * This function should never be called for consoles that have not
@@ -976,6 +978,29 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 	if (dropped && !is_extended)
 		console_prepend_dropped(&pmsg, dropped);
 
+	/*
+	 * If the previous owner was assigned the same record, this context
+	 * has taken over ownership and is replaying the record. Prepend a
+	 * message to let the user know the record is replayed.
+	 */
+	ulseq = atomic_long_read(&ACCESS_PRIVATE(con, nbcon_prev_seq));
+	if (__ulseq_to_u64seq(prb, ulseq) == pmsg.seq) {
+		console_prepend_replay(&pmsg);
+	} else {
+		/*
+		 * Ensure this context is still the owner before trying to
+		 * update @nbcon_prev_seq. Otherwise the value in @ulseq may
+		 * not be from the previous owner and instead be some later
+		 * value from the context that took over ownership.
+		 */
+		nbcon_state_read(con, &cur);
+		if (!nbcon_context_can_proceed(ctxt, &cur))
+			return false;
+
+		atomic_long_try_cmpxchg(&ACCESS_PRIVATE(con, nbcon_prev_seq), &ulseq,
+					__u64seq_to_ulseq(pmsg.seq));
+	}
+
 	if (!nbcon_context_exit_unsafe(ctxt))
 		return false;
 
@@ -1706,6 +1731,7 @@ bool nbcon_alloc(struct console *con)
 
 	rcuwait_init(&con->rcuwait);
 	init_irq_work(&con->irq_work, nbcon_irq_work);
+	atomic_long_set(&ACCESS_PRIVATE(con, nbcon_prev_seq), -1UL);
 	nbcon_state_set(con, &state);
 
 	if (con->flags & CON_BOOT) {
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 89210a21cc50..6c9c0a42adf3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2849,6 +2849,17 @@ void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
 	console_prepend_message(pmsg, "** %lu printk messages dropped **\n", dropped);
 }
 
+/*
+ * Prepend the message in @pmsg->pbufs->outbuf with a "replay message".
+ * @pmsg->outbuf_len is updated appropriately.
+ *
+ * @pmsg is the printk message to prepend.
+ */
+void console_prepend_replay(struct printk_message *pmsg)
+{
+	console_prepend_message(pmsg, "** replaying previous printk message **\n");
+}
+
 /*
  * Read and format the specified record (or a later record if the specified
  * record is not available).
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 13/19] printk: nbcon: Show replay message on takeover
  2024-07-22 17:19 ` [PATCH printk v3 13/19] printk: nbcon: Show replay message on takeover John Ogness
@ 2024-07-31 14:59   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-31 14:59 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman

On Mon 2024-07-22 19:25:33, John Ogness wrote:
> An emergency or panic context can takeover console ownership
> while the current owner was printing a printk message. The
> atomic printer will re-print the message that the previous
> owner was printing. However, this can look confusing to the
> user and may even seem as though a message was lost.
> 
>   [3430014.1
>   [3430014.181123] usb 1-2: Product: USB Audio
> 
> Add a new field @nbcon_prev_seq to struct console to track
> the sequence number to print that was assigned to the previous
> console owner. If this matches the sequence number to print
> that the current owner is assigned, then a takeover must have
> occurred. In this case, print an additional message to inform
> the user that the previous message is being printed again.
> 
>   [3430014.1
>   ** replaying previous printk message **
>   [3430014.181123] usb 1-2: Product: USB Audio
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 14/19] proc: consoles: Add notation to c_start/c_stop
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (12 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 13/19] printk: nbcon: Show replay message on takeover John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 15/19] proc: Add nbcon support for /proc/consoles John Ogness
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

fs/proc/consoles.c:78:13: warning: context imbalance in 'c_start'
	- wrong count at exit
fs/proc/consoles.c:104:13: warning: context imbalance in 'c_stop'
	- unexpected unlock

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 fs/proc/consoles.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/proc/consoles.c b/fs/proc/consoles.c
index e0758fe7936d..7036fdfa0bec 100644
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -68,6 +68,7 @@ static int show_console_dev(struct seq_file *m, void *v)
 }
 
 static void *c_start(struct seq_file *m, loff_t *pos)
+	__acquires(&console_mutex)
 {
 	struct console *con;
 	loff_t off = 0;
@@ -94,6 +95,7 @@ static void *c_next(struct seq_file *m, void *v, loff_t *pos)
 }
 
 static void c_stop(struct seq_file *m, void *v)
+	__releases(&console_mutex)
 {
 	console_list_unlock();
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH printk v3 15/19] proc: Add nbcon support for /proc/consoles
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (13 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 14/19] proc: consoles: Add notation to c_start/c_stop John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-31 15:07   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 16/19] tty: sysfs: Add nbcon support for 'active' John Ogness
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	linux-fsdevel

Update /proc/consoles output to show 'W' if an nbcon console is
registered. Since the write_thread() callback is mandatory, it
enough just to check if it is an nbcon console.

Also update /proc/consoles output to show 'N' if it is an
nbcon console.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 fs/proc/consoles.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/proc/consoles.c b/fs/proc/consoles.c
index 7036fdfa0bec..b7cab1ad990d 100644
--- a/fs/proc/consoles.c
+++ b/fs/proc/consoles.c
@@ -21,6 +21,7 @@ static int show_console_dev(struct seq_file *m, void *v)
 		{ CON_ENABLED,		'E' },
 		{ CON_CONSDEV,		'C' },
 		{ CON_BOOT,		'B' },
+		{ CON_NBCON,		'N' },
 		{ CON_PRINTBUFFER,	'p' },
 		{ CON_BRL,		'b' },
 		{ CON_ANYTIME,		'a' },
@@ -58,8 +59,8 @@ static int show_console_dev(struct seq_file *m, void *v)
 	seq_printf(m, "%s%d", con->name, con->index);
 	seq_pad(m, ' ');
 	seq_printf(m, "%c%c%c (%s)", con->read ? 'R' : '-',
-			con->write ? 'W' : '-', con->unblank ? 'U' : '-',
-			flags);
+		   ((con->flags & CON_NBCON) || con->write) ? 'W' : '-',
+		   con->unblank ? 'U' : '-', flags);
 	if (dev)
 		seq_printf(m, " %4d:%d", MAJOR(dev), MINOR(dev));
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 15/19] proc: Add nbcon support for /proc/consoles
  2024-07-22 17:19 ` [PATCH printk v3 15/19] proc: Add nbcon support for /proc/consoles John Ogness
@ 2024-07-31 15:07   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-31 15:07 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	linux-fsdevel

On Mon 2024-07-22 19:25:35, John Ogness wrote:
> Update /proc/consoles output to show 'W' if an nbcon console is
> registered. Since the write_thread() callback is mandatory, it
> enough just to check if it is an nbcon console.
> 
> Also update /proc/consoles output to show 'N' if it is an
> nbcon console.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 16/19] tty: sysfs: Add nbcon support for 'active'
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (14 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 15/19] proc: Add nbcon support for /proc/consoles John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-07-31 15:09   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 17/19] printk: Implement legacy printer kthread for PREEMPT_RT John Ogness
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, Jiri Slaby, linux-serial

Allow the 'active' attribute to list nbcon consoles.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 drivers/tty/tty_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 407b0d87b7c1..9140825e810f 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -3567,7 +3567,7 @@ static ssize_t show_cons_active(struct device *dev,
 	for_each_console(c) {
 		if (!c->device)
 			continue;
-		if (!c->write)
+		if (!(c->flags & CON_NBCON) && !c->write)
 			continue;
 		if ((c->flags & CON_ENABLED) == 0)
 			continue;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 16/19] tty: sysfs: Add nbcon support for 'active'
  2024-07-22 17:19 ` [PATCH printk v3 16/19] tty: sysfs: Add nbcon support for 'active' John Ogness
@ 2024-07-31 15:09   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-07-31 15:09 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel,
	Greg Kroah-Hartman, Jiri Slaby, linux-serial

On Mon 2024-07-22 19:25:36, John Ogness wrote:
> Allow the 'active' attribute to list nbcon consoles.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 17/19] printk: Implement legacy printer kthread for PREEMPT_RT
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (15 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 16/19] tty: sysfs: Add nbcon support for 'active' John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-08-02 11:45   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 18/19] printk: nbcon: Assign nice -20 for printing threads John Ogness
  2024-07-22 17:19 ` [PATCH printk v3 19/19] printk: Avoid false positive lockdep report for legacy printing John Ogness
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

The write() callback of legacy consoles usually makes use of
spinlocks. This is not permitted with PREEMPT_RT in atomic
contexts.

For PREEMPT_RT, create a new kthread to handle printing of all
the legacy consoles (and nbcon consoles if boot consoles are
registered). This allows legacy consoles to work on PREEMPT_RT
without requiring modification. (However they will not have
the reliability properties guaranteed by nbcon atomic
consoles.)

Use the existing printk_kthreads_check_locked() to start/stop
the legacy kthread as needed.

Introduce the macro force_legacy_kthread() to query if the
forced threading of legacy consoles is in effect. Although
currently only enabled for PREEMPT_RT, this acts as a simple
mechanism for the future to allow other preemption models to
easily take advantage of the non-interference property provided
by the legacy kthread.

When force_legacy_kthread() is true, the legacy kthread
fulfills the role of the console_flush_type @legacy_offload by
waking the legacy kthread instead of printing via the
console_lock in the irq_work. If the legacy kthread is not
yet available, no legacy printing takes place (unless in
panic).

If for some reason the legacy kthread fails to create, any
legacy consoles are unregistered. With force_legacy_kthread(),
the legacy kthread is a critical component for legacy consoles.

These changes only affect CONFIG_PREEMPT_RT.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  19 ++++-
 kernel/printk/printk.c   | 162 +++++++++++++++++++++++++++++++++++----
 2 files changed, 162 insertions(+), 19 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index a4e7b40458b2..0ac4bdb9e3e8 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -21,6 +21,19 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
 		(con->flags & CON_BOOT) ? "boot" : "",		\
 		con->name, con->index, ##__VA_ARGS__)
 
+/*
+ * Identify if legacy printing is forced in a dedicated kthread. If
+ * true, all printing via console lock occurs within a dedicated
+ * legacy printer thread. The only exception is on panic, after the
+ * nbcon consoles have had their chance to print the panic messages
+ * first.
+ */
+#ifdef CONFIG_PREEMPT_RT
+# define force_legacy_kthread()	(true)
+#else
+# define force_legacy_kthread()	(false)
+#endif
+
 #ifdef CONFIG_PRINTK
 
 #ifdef CONFIG_PRINTK_CALLER
@@ -192,7 +205,7 @@ extern bool legacy_allow_panic_sync;
  * @nbcon_atomic:	Flush directly using nbcon_atomic() callback
  * @nbcon_offload:	Offload flush to printer thread
  * @legacy_direct:	Call the legacy loop in this context
- * @legacy_offload:	Offload the legacy loop into IRQ
+ * @legacy_offload:	Offload the legacy loop into IRQ or legacy thread
  */
 struct console_flush_type {
 	bool	nbcon_atomic;
@@ -216,7 +229,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft,
 	switch (nbcon_get_default_prio()) {
 	case NBCON_PRIO_NORMAL:
 		if (have_legacy_console || have_boot_console) {
-			if (prefer_offload || is_printk_deferred())
+			if (force_legacy_kthread() || prefer_offload || is_printk_deferred())
 				ft->legacy_offload = true;
 			else
 				ft->legacy_direct = true;
@@ -267,7 +280,7 @@ static inline void printk_get_emergency_console_flush_type(struct console_flush_
 			ft->nbcon_atomic = true;
 
 		if (have_legacy_console || have_boot_console) {
-			if (is_printk_deferred())
+			if (force_legacy_kthread() || is_printk_deferred())
 				ft->legacy_offload = true;
 			else
 				ft->legacy_direct = true;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6c9c0a42adf3..6f70d3a7153f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -488,6 +488,7 @@ bool legacy_allow_panic_sync;
 
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
+static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
 /* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
 static u64 syslog_seq;
@@ -2450,6 +2451,7 @@ static u64 syslog_seq;
 
 static bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; }
 static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progress) { return true; }
+static inline void legacy_kthread_wake(void) { }
 
 #endif /* CONFIG_PRINTK */
 
@@ -2704,6 +2706,8 @@ void resume_console(void)
 	printk_get_console_flush_type(&ft, true);
 	if (ft.nbcon_offload)
 		nbcon_wake_kthreads();
+	if (ft.legacy_offload)
+		defer_console_output();
 
 	pr_flush(1000, true);
 }
@@ -3112,19 +3116,7 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 	return false;
 }
 
-/**
- * console_unlock - unblock the console subsystem from printing
- *
- * Releases the console_lock which the caller holds to block printing of
- * the console subsystem.
- *
- * While the console_lock was held, console output may have been buffered
- * by printk().  If this is the case, console_unlock(); emits
- * the output prior to releasing the lock.
- *
- * console_unlock(); may be called from any context.
- */
-void console_unlock(void)
+static void __console_flush_and_unlock(void)
 {
 	bool do_cond_resched;
 	bool handover;
@@ -3168,6 +3160,29 @@ void console_unlock(void)
 		 */
 	} while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
 }
+
+/**
+ * console_unlock - unblock the legacy console subsystem from printing
+ *
+ * Releases the console_lock which the caller holds to block printing of
+ * the legacy console subsystem.
+ *
+ * While the console_lock was held, console output may have been buffered
+ * by printk(). If this is the case, console_unlock() emits the output on
+ * legacy consoles prior to releasing the lock.
+ *
+ * console_unlock(); may be called from any context.
+ */
+void console_unlock(void)
+{
+	struct console_flush_type ft;
+
+	printk_get_console_flush_type(&ft, false);
+	if (ft.legacy_direct)
+		__console_flush_and_unlock();
+	else
+		__console_unlock();
+}
 EXPORT_SYMBOL(console_unlock);
 
 /**
@@ -3397,6 +3412,8 @@ void console_start(struct console *console)
 	flags = console_srcu_read_flags(console);
 	if ((flags & CON_NBCON) && ft.nbcon_offload)
 		nbcon_kthread_wake(console);
+	else if (!(flags & CON_NBCON) && ft.legacy_offload)
+		defer_console_output();
 	console_srcu_read_unlock(cookie);
 
 	__pr_flush(console, 1000, true);
@@ -3409,6 +3426,87 @@ static int unregister_console_locked(struct console *console);
 /* True when system boot is far enough to create printer threads. */
 static bool printk_kthreads_ready __ro_after_init;
 
+static struct task_struct *printk_legacy_kthread;
+
+static bool legacy_kthread_should_wakeup(void)
+{
+	struct console_flush_type ft;
+	struct console *con;
+	bool ret = false;
+	int cookie;
+
+	if (kthread_should_stop())
+		return true;
+
+	printk_get_console_flush_type(&ft, true);
+
+	cookie = console_srcu_read_lock();
+	for_each_console_srcu(con) {
+		short flags = console_srcu_read_flags(con);
+		u64 printk_seq;
+
+		/*
+		 * The legacy printer thread is only for legacy consoles when
+		 * the nbcon consoles have their printer threads.
+		 */
+		if ((flags & CON_NBCON) && ft.nbcon_offload)
+			continue;
+
+		if (!console_is_usable(con, flags, false))
+			continue;
+
+		if (flags & CON_NBCON) {
+			printk_seq = nbcon_seq_read(con);
+		} else {
+			/*
+			 * It is safe to read @seq because only this
+			 * thread context updates @seq.
+			 */
+			printk_seq = con->seq;
+		}
+
+		if (prb_read_valid(prb, printk_seq, NULL)) {
+			ret = true;
+			break;
+		}
+	}
+	console_srcu_read_unlock(cookie);
+
+	return ret;
+}
+
+static int legacy_kthread_func(void *unused)
+{
+	for (;;) {
+		wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
+
+		if (kthread_should_stop())
+			break;
+
+		console_lock();
+		__console_flush_and_unlock();
+	}
+
+	return 0;
+}
+
+static bool legacy_kthread_create(void)
+{
+	struct task_struct *kt;
+
+	lockdep_assert_console_list_lock_held();
+
+	kt = kthread_run(legacy_kthread_func, NULL, "pr/legacy");
+	if (WARN_ON(IS_ERR(kt))) {
+		pr_err("failed to start legacy printing thread\n");
+		return false;
+	}
+
+	printk_legacy_kthread = kt;
+
+	return true;
+}
+
 /**
  * printk_kthreads_shutdown - shutdown all threaded printers
  *
@@ -3458,6 +3556,27 @@ static void printk_kthreads_check_locked(void)
 	if (!printk_kthreads_ready)
 		return;
 
+	if (have_legacy_console || have_boot_console) {
+		if (!printk_legacy_kthread &&
+		    force_legacy_kthread() &&
+		    !legacy_kthread_create()) {
+			/*
+			 * All legacy consoles must be unregistered. If there
+			 * are any nbcon consoles, they will set up their own
+			 * kthread.
+			 */
+			hlist_for_each_entry_safe(con, tmp, &console_list, node) {
+				if (con->flags & CON_NBCON)
+					continue;
+
+				unregister_console_locked(con);
+			}
+		}
+	} else if (printk_legacy_kthread) {
+		kthread_stop(printk_legacy_kthread);
+		printk_legacy_kthread = NULL;
+	}
+
 	/*
 	 * Printer threads cannot be started as long as any boot console is
 	 * registered because there is no way to synchronize the hardware
@@ -4241,14 +4360,23 @@ static bool pr_flush(int timeout_ms, bool reset_on_progress)
 
 static DEFINE_PER_CPU(int, printk_pending);
 
+static void legacy_kthread_wake(void)
+{
+	if (printk_legacy_kthread)
+		wake_up_interruptible(&legacy_wait);
+}
+
 static void wake_up_klogd_work_func(struct irq_work *irq_work)
 {
 	int pending = this_cpu_xchg(printk_pending, 0);
 
 	if (pending & PRINTK_PENDING_OUTPUT) {
-		/* If trylock fails, someone else is doing the printing */
-		if (console_trylock())
-			console_unlock();
+		if (force_legacy_kthread()) {
+			legacy_kthread_wake();
+		} else {
+			if (console_trylock())
+				console_unlock();
+		}
 	}
 
 	if (pending & PRINTK_PENDING_WAKEUP)
@@ -4676,6 +4804,8 @@ void console_try_replay_all(void)
 		__console_rewind_all();
 		if (ft.nbcon_offload)
 			nbcon_wake_kthreads();
+		if (ft.legacy_offload)
+			defer_console_output();
 		/* Consoles are flushed as part of console_unlock(). */
 		console_unlock();
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 17/19] printk: Implement legacy printer kthread for PREEMPT_RT
  2024-07-22 17:19 ` [PATCH printk v3 17/19] printk: Implement legacy printer kthread for PREEMPT_RT John Ogness
@ 2024-08-02 11:45   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-08-02 11:45 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:37, John Ogness wrote:
> The write() callback of legacy consoles usually makes use of
> spinlocks. This is not permitted with PREEMPT_RT in atomic
> contexts.
> 
> For PREEMPT_RT, create a new kthread to handle printing of all
> the legacy consoles (and nbcon consoles if boot consoles are
> registered). This allows legacy consoles to work on PREEMPT_RT
> without requiring modification. (However they will not have
> the reliability properties guaranteed by nbcon atomic
> consoles.)
> 
> Use the existing printk_kthreads_check_locked() to start/stop
> the legacy kthread as needed.
> 
> Introduce the macro force_legacy_kthread() to query if the
> forced threading of legacy consoles is in effect. Although
> currently only enabled for PREEMPT_RT, this acts as a simple
> mechanism for the future to allow other preemption models to
> easily take advantage of the non-interference property provided
> by the legacy kthread.
> 
> When force_legacy_kthread() is true, the legacy kthread
> fulfills the role of the console_flush_type @legacy_offload by
> waking the legacy kthread instead of printing via the
> console_lock in the irq_work. If the legacy kthread is not
> yet available, no legacy printing takes place (unless in
> panic).
> 
> If for some reason the legacy kthread fails to create, any
> legacy consoles are unregistered. With force_legacy_kthread(),
> the legacy kthread is a critical component for legacy consoles.
> 
> These changes only affect CONFIG_PREEMPT_RT.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Looks good to me:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 18/19] printk: nbcon: Assign nice -20 for printing threads
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (16 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 17/19] printk: Implement legacy printer kthread for PREEMPT_RT John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-08-02 11:47   ` Petr Mladek
  2024-07-22 17:19 ` [PATCH printk v3 19/19] printk: Avoid false positive lockdep report for legacy printing John Ogness
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

It is important that console printing threads are scheduled
shortly after a printk call and with generous runtime budgets.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/nbcon.c  | 6 ++++++
 kernel/printk/printk.c | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 2a61f2c68ddc..e34e5c5a7843 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1260,6 +1260,12 @@ bool nbcon_kthread_create(struct console *con)
 
 	con->kthread = kt;
 
+	/*
+	 * It is important that console printing threads are scheduled
+	 * shortly after a printk call and with generous runtime budgets.
+	 */
+	sched_set_normal(con->kthread, -20);
+
 	return true;
 }
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6f70d3a7153f..36087a6d579f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3504,6 +3504,12 @@ static bool legacy_kthread_create(void)
 
 	printk_legacy_kthread = kt;
 
+	/*
+	 * It is important that console printing threads are scheduled
+	 * shortly after a printk call and with generous runtime budgets.
+	 */
+	sched_set_normal(printk_legacy_kthread, -20);
+
 	return true;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 18/19] printk: nbcon: Assign nice -20 for printing threads
  2024-07-22 17:19 ` [PATCH printk v3 18/19] printk: nbcon: Assign nice -20 for printing threads John Ogness
@ 2024-08-02 11:47   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-08-02 11:47 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:38, John Ogness wrote:
> It is important that console printing threads are scheduled
> shortly after a printk call and with generous runtime budgets.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH printk v3 19/19] printk: Avoid false positive lockdep report for legacy printing
  2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
                   ` (17 preceding siblings ...)
  2024-07-22 17:19 ` [PATCH printk v3 18/19] printk: nbcon: Assign nice -20 for printing threads John Ogness
@ 2024-07-22 17:19 ` John Ogness
  2024-08-02 12:34   ` Petr Mladek
  18 siblings, 1 reply; 57+ messages in thread
From: John Ogness @ 2024-07-22 17:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

Legacy console printing from printk() caller context may invoke
the console driver from atomic context. This leads to a lockdep
splat because the console driver will acquire a sleeping lock
and the caller may already hold a spinning lock. This is noticed
by lockdep on !PREEMPT_RT configurations because it will lead to
a problem on PREEMPT_RT.

However, on PREEMPT_RT the printing path from atomic context is
always avoided and the console driver is always invoked from a
dedicated thread. Thus the lockdep splat on !PREEMPT_RT is a
false positive.

For !PREEMPT_RT override the lock-context before invoking the
console driver to avoid the false positive.

Do not override the lock-context for PREEMPT_RT in order to
allow lockdep to catch any real locking context issues related
to the write callback usage.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/printk.c | 83 ++++++++++++++++++++++++++++++++----------
 1 file changed, 63 insertions(+), 20 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 36087a6d579f..fadfbe56019b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2929,6 +2929,34 @@ bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 	return true;
 }
 
+/*
+ * Legacy console printing from printk() caller context does not respect
+ * raw_spinlock/spinlock nesting. For !PREEMPT_RT the lockdep warning is a
+ * false positive. For PREEMPT_RT the false positive condition does not
+ * occur.
+ *
+ * This map is used to temporarily establish LD_WAIT_SLEEP context for the
+ * console write() callback when legacy printing to avoid false positive
+ * lockdep complaints, thus allowing lockdep to continue to function for
+ * real issues.
+ */
+#ifdef CONFIG_PREEMPT_RT
+static inline void printk_legacy_allow_spinlock_enter(void) { }
+static inline void printk_legacy_allow_spinlock_exit(void) { }
+#else
+static DEFINE_WAIT_OVERRIDE_MAP(printk_legacy_map, LD_WAIT_SLEEP);
+
+static inline void printk_legacy_allow_spinlock_enter(void)
+{
+	lock_map_acquire_try(&printk_legacy_map);
+}
+
+static inline void printk_legacy_allow_spinlock_exit(void)
+{
+	lock_map_release(&printk_legacy_map);
+}
+#endif /* CONFIG_PREEMPT_RT */
+
 /*
  * Used as the printk buffers for non-panic, serialized console printing.
  * This is for legacy (!CON_NBCON) as well as all boot (CON_BOOT) consoles.
@@ -2978,31 +3006,46 @@ static bool console_emit_next_record(struct console *con, bool *handover, int co
 		con->dropped = 0;
 	}
 
-	/*
-	 * While actively printing out messages, if another printk()
-	 * were to occur on another CPU, it may wait for this one to
-	 * finish. This task can not be preempted if there is a
-	 * waiter waiting to take over.
-	 *
-	 * Interrupts are disabled because the hand over to a waiter
-	 * must not be interrupted until the hand over is completed
-	 * (@console_waiter is cleared).
-	 */
-	printk_safe_enter_irqsave(flags);
-	console_lock_spinning_enable();
+	/* Write everything out to the hardware. */
 
-	/* Do not trace print latency. */
-	stop_critical_timings();
+	if (force_legacy_kthread() && !panic_in_progress()) {
+		/*
+		 * With forced threading this function is in a task context
+		 * (either legacy kthread or get_init_console_seq()). There
+		 * is no need for concern about printk reentrance, handovers,
+		 * or lockdep complaints.
+		 */
 
-	/* Write everything out to the hardware. */
-	con->write(con, outbuf, pmsg.outbuf_len);
+		con->write(con, outbuf, pmsg.outbuf_len);
+		con->seq = pmsg.seq + 1;
+	} else {
+		/*
+		 * While actively printing out messages, if another printk()
+		 * were to occur on another CPU, it may wait for this one to
+		 * finish. This task can not be preempted if there is a
+		 * waiter waiting to take over.
+		 *
+		 * Interrupts are disabled because the hand over to a waiter
+		 * must not be interrupted until the hand over is completed
+		 * (@console_waiter is cleared).
+		 */
+		printk_safe_enter_irqsave(flags);
+		console_lock_spinning_enable();
 
-	start_critical_timings();
+		/* Do not trace print latency. */
+		stop_critical_timings();
 
-	con->seq = pmsg.seq + 1;
+		printk_legacy_allow_spinlock_enter();
+		con->write(con, outbuf, pmsg.outbuf_len);
+		printk_legacy_allow_spinlock_exit();
 
-	*handover = console_lock_spinning_disable_and_check(cookie);
-	printk_safe_exit_irqrestore(flags);
+		start_critical_timings();
+
+		con->seq = pmsg.seq + 1;
+
+		*handover = console_lock_spinning_disable_and_check(cookie);
+		printk_safe_exit_irqrestore(flags);
+	}
 skip:
 	return true;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH printk v3 19/19] printk: Avoid false positive lockdep report for legacy printing
  2024-07-22 17:19 ` [PATCH printk v3 19/19] printk: Avoid false positive lockdep report for legacy printing John Ogness
@ 2024-08-02 12:34   ` Petr Mladek
  0 siblings, 0 replies; 57+ messages in thread
From: Petr Mladek @ 2024-08-02 12:34 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Thomas Gleixner, linux-kernel

On Mon 2024-07-22 19:25:39, John Ogness wrote:
> Legacy console printing from printk() caller context may invoke
> the console driver from atomic context. This leads to a lockdep
> splat because the console driver will acquire a sleeping lock
> and the caller may already hold a spinning lock. This is noticed
> by lockdep on !PREEMPT_RT configurations because it will lead to
> a problem on PREEMPT_RT.
> 
> However, on PREEMPT_RT the printing path from atomic context is
> always avoided and the console driver is always invoked from a
> dedicated thread. Thus the lockdep splat on !PREEMPT_RT is a
> false positive.
> 
> For !PREEMPT_RT override the lock-context before invoking the
> console driver to avoid the false positive.
> 
> Do not override the lock-context for PREEMPT_RT in order to
> allow lockdep to catch any real locking context issues related
> to the write callback usage.
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Makes sense:

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2024-08-27  1:32 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 17:19 [PATCH printk v3 00/19] add threaded printing + the rest John Ogness
2024-07-22 17:19 ` [PATCH printk v3 01/19] printk: nbcon: Clarify nbcon_get_default_prio() context John Ogness
2024-07-26  8:57   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 02/19] printk: nbcon: Consolidate alloc() and init() John Ogness
2024-07-26 11:58   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 03/19] printk: nbcon: Add function for printers to reacquire ownership John Ogness
2024-07-26 12:25   ` Petr Mladek
2024-07-29  8:36     ` John Ogness
2024-07-30  9:24       ` Petr Mladek
2024-08-27  1:32         ` John Ogness
2024-07-22 17:19 ` [PATCH printk v3 04/19] printk: nbcon: Clarify rules of the owner/waiter matching John Ogness
2024-07-26 12:55   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 05/19] printk: Fail pr_flush() if before SYSTEM_SCHEDULING John Ogness
2024-07-26 13:14   ` Petr Mladek
2024-07-26 14:45     ` John Ogness
2024-07-30  9:50       ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 06/19] printk: Flush console on unregister_console() John Ogness
2024-07-26 13:23   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 07/19] printk: Add helpers for flush type logic John Ogness
2024-07-23  2:01   ` kernel test robot
2024-07-23  8:39     ` John Ogness
2024-07-23  3:29   ` kernel test robot
2024-07-26 15:51   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 08/19] printk: nbcon: Add context to usable() and emit() John Ogness
2024-07-30 12:30   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 09/19] printk: nbcon: Introduce printer kthreads John Ogness
2024-07-30 14:44   ` John Ogness
2024-07-31  9:59     ` Petr Mladek
2024-07-30 15:16   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 10/19] printk: nbcon: Use thread callback if in task context for legacy John Ogness
2024-07-30 15:35   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 11/19] printk: nbcon: Rely on kthreads for normal operation John Ogness
2024-07-23  3:18   ` kernel test robot
2024-07-23  8:51     ` John Ogness
2024-07-31 13:46   ` preffer_ofload param: was: " Petr Mladek
2024-08-01 14:22     ` John Ogness
2024-08-01 15:40       ` Petr Mladek
2024-08-02  7:29         ` John Ogness
2024-08-02 10:19           ` Petr Mladek
2024-07-31 14:06   ` Petr Mladek
2024-07-31 15:25     ` John Ogness
2024-08-01  9:36       ` Petr Mladek
2024-08-01  9:52         ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 12/19] printk: Provide helper for message prepending John Ogness
2024-07-22 17:19 ` [PATCH printk v3 13/19] printk: nbcon: Show replay message on takeover John Ogness
2024-07-31 14:59   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 14/19] proc: consoles: Add notation to c_start/c_stop John Ogness
2024-07-22 17:19 ` [PATCH printk v3 15/19] proc: Add nbcon support for /proc/consoles John Ogness
2024-07-31 15:07   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 16/19] tty: sysfs: Add nbcon support for 'active' John Ogness
2024-07-31 15:09   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 17/19] printk: Implement legacy printer kthread for PREEMPT_RT John Ogness
2024-08-02 11:45   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 18/19] printk: nbcon: Assign nice -20 for printing threads John Ogness
2024-08-02 11:47   ` Petr Mladek
2024-07-22 17:19 ` [PATCH printk v3 19/19] printk: Avoid false positive lockdep report for legacy printing John Ogness
2024-08-02 12:34   ` Petr Mladek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox