[PATCH printk v1 0/1] Allow unsafe ->write

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
@ 2025-09-12 12:18 John Ogness
  2025-09-12 12:18 ` [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() " John Ogness
  2025-09-17 14:44 ` [PATCH printk v1 0/1] Allow unsafe ->write_atomic() " Breno Leitao
  0 siblings, 2 replies; 18+ messages in thread
From: John Ogness @ 2025-09-12 12:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

Hi,

An effort is underway to update netconsole to support nbcon [0].
Currently it is not known exactly how a safe ->write_atomic()
callback for netconsole could be implemented. However, without
->write_atomic() implemented, there is guaranteed to be _no_
panic output.

We decided to allow unsafe ->write_atomic() implementations to
be used only as a last resort at the end of panic. This should
allow netconsole to be updated to support nbcon and still
provide panic output in "typical panic" situations.

I considered extending the feature to also allow such consoles
to print using their ->write_thread() callback whenever it is
known to be a sleepable context. However, this started to get
quite complex. We can revisit this extended feature if it is
determined that it is necessary.

John Ogness

[0] https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt

John Ogness (1):
  printk: nbcon: Allow unsafe write_atomic() for panic

 include/linux/console.h |  3 +++
 kernel/printk/nbcon.c   | 17 ++++++++++++++---
 kernel/printk/printk.c  | 23 ++++++++++++++++-------
 3 files changed, 33 insertions(+), 10 deletions(-)


base-commit: 37dbd4203b42c10b76d55471bb866900f99d6bc1
-- 
2.39.5


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-12 12:18 [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic John Ogness
@ 2025-09-12 12:18 ` John Ogness
  2025-09-15 14:01   ` Breno Leitao
  2025-09-16 15:05   ` Petr Mladek
  2025-09-17 14:44 ` [PATCH printk v1 0/1] Allow unsafe ->write_atomic() " Breno Leitao
  1 sibling, 2 replies; 18+ messages in thread
From: John Ogness @ 2025-09-12 12:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

There may be console drivers that have not yet figured out a way
to implement safe atomic printing (->write_atomic() callback).
These drivers could choose to only implement threaded printing
(->write_thread() callback), but then it is guaranteed that _no_
output will be printed during panic. Not even attempted.

As a result, developers may be tempted to implement unsafe
->write_atomic() callbacks and/or implement some sort of custom
deferred printing trickery to try to make it work. This goes
against the principle intention of the nbcon API as well as
endangers other nbcon drivers that are doing things correctly
(safely).

As a compromise, allow nbcon drivers to implement unsafe
->write_atomic() callbacks by providing a new console flag
CON_NBCON_ATOMIC_UNSAFE. When specified, the ->write_atomic()
callback for that console will *only* be called during the
final "hope and pray" flush attempt at the end of a panic:
nbcon_atomic_flush_unsafe().

Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt
---
 include/linux/console.h |  3 +++
 kernel/printk/nbcon.c   | 17 ++++++++++++++---
 kernel/printk/printk.c  | 23 ++++++++++++++++-------
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 8f10d0a85bb4..ec68ecd13f85 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -185,6 +185,8 @@ static inline void con_debug_leave(void) { }
  *			printing callbacks must not be called.
  * @CON_NBCON:		Console can operate outside of the legacy style console_lock
  *			constraints.
+ * @CON_NBCON_ATOMIC_UNSAFE: The write_atomic() callback is not safe and is
+ *			therefore only used by nbcon_atomic_flush_unsafe().
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -196,6 +198,7 @@ enum cons_flags {
 	CON_EXTENDED		= BIT(6),
 	CON_SUSPENDED		= BIT(7),
 	CON_NBCON		= BIT(8),
+	CON_NBCON_ATOMIC_UNSAFE	= BIT(9),
 };
 
 /**
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415..8c2966b85ac3 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -972,14 +972,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
 	/*
 	 * This function should never be called for consoles that have not
 	 * implemented the necessary callback for writing: i.e. legacy
-	 * consoles and, when atomic, nbcon consoles with no write_atomic().
+	 * consoles and, when atomic, nbcon consoles with no write_atomic()
+	 * or an unsafe write_atomic() without allowing unsafe takeovers.
 	 * Handle it as if ownership was lost and try to continue.
 	 *
 	 * Note that for nbcon consoles the write_thread() callback is
 	 * mandatory and was already checked in nbcon_alloc().
 	 */
-	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
-			 !(console_srcu_read_flags(con) & CON_NBCON))) {
+	if (WARN_ON_ONCE(!(console_srcu_read_flags(con) & CON_NBCON) ||
+			 (use_atomic &&
+			  (!con->write_atomic ||
+			   (!ctxt->allow_unsafe_takeover &&
+			    (console_srcu_read_flags(con) & CON_NBCON_ATOMIC_UNSAFE)))))) {
 		nbcon_context_release(ctxt);
 		return false;
 	}
@@ -1606,6 +1610,13 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
 		if (!console_is_usable(con, flags, true))
 			continue;
 
+		/*
+		 * It is only allowed to use unsafe ->write_atomic() from
+		 * nbcon_atomic_flush_unsafe().
+		 */
+		if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !allow_unsafe_takeover)
+			continue;
+
 		if (nbcon_seq_read(con) >= stop_seq)
 			continue;
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0efbcdda9aab..1cfc6801eed0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3206,13 +3206,22 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
 			u64 printk_seq;
 			bool progress;
 
-			/*
-			 * console_flush_all() is only responsible for nbcon
-			 * consoles when the nbcon consoles cannot print via
-			 * their atomic or threaded flushing.
-			 */
-			if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
-				continue;
+			if (flags & CON_NBCON) {
+				/*
+				 * console_flush_all() is only responsible for nbcon
+				 * consoles when the nbcon consoles cannot print via
+				 * their atomic or threaded flushing.
+				 */
+				if (ft.nbcon_atomic || ft.nbcon_offload)
+					continue;
+
+				/*
+				 * It is only allowed to use unsafe ->write_atomic() from
+				 * nbcon_atomic_flush_unsafe().
+				 */
+				if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !do_cond_resched)
+					continue;
+			}
 
 			if (!console_is_usable(con, flags, !do_cond_resched))
 				continue;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-12 12:18 ` [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() " John Ogness
@ 2025-09-15 14:01   ` Breno Leitao
  2025-09-15 14:14     ` John Ogness
  2025-09-16 15:05   ` Petr Mladek
  1 sibling, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-15 14:01 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

Hello John,

First of all, thanks for this patch.

On Fri, Sep 12, 2025 at 02:24:52PM +0206, John Ogness wrote:
> @@ -1606,6 +1610,13 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
>  		if (!console_is_usable(con, flags, true))
>  			continue;
>  
> +		/*
> +		 * It is only allowed to use unsafe ->write_atomic() from
> +		 * nbcon_atomic_flush_unsafe().
> +		 */
> +		if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !allow_unsafe_takeover)
> +			continue;

What will happen with the "message" in this case? is it lost?	

Let me clarify I understand the patch. The .write_atomic callback are
called in two cases:

	1) Inside IRQ/NMI and scheduling context
	2) During panics.

In both cases, they go throught __nbcon_atomic_flush_pending_con(),
right?

Let's say that netconsole implements CON_NBCON_ATOMIC_UNSAFE. What will
happen with printks() inside IRQs (when the system is NOT panicking).
Are they coming through __nbcon_atomic_flush_pending() and will be
skipped?

Also, are these messages even deferred for later flush?

Thanks,
--breno

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-15 14:01   ` Breno Leitao
@ 2025-09-15 14:14     ` John Ogness
  2025-09-15 15:46       ` Breno Leitao
  0 siblings, 1 reply; 18+ messages in thread
From: John Ogness @ 2025-09-15 14:14 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On 2025-09-15, Breno Leitao <leitao@debian.org> wrote:
> On Fri, Sep 12, 2025 at 02:24:52PM +0206, John Ogness wrote:
>> @@ -1606,6 +1610,13 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
>>  		if (!console_is_usable(con, flags, true))
>>  			continue;
>>  
>> +		/*
>> +		 * It is only allowed to use unsafe ->write_atomic() from
>> +		 * nbcon_atomic_flush_unsafe().
>> +		 */
>> +		if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !allow_unsafe_takeover)
>> +			continue;
>
> What will happen with the "message" in this case? is it lost?
>
> Let me clarify I understand the patch. The .write_atomic callback are
> called in two cases:
>
> 	1) Inside IRQ/NMI and scheduling context
> 	2) During panics.
>
> In both cases, they go throught __nbcon_atomic_flush_pending_con(),
> right?

@allow_unsafe_takeover is only true at the very end of panic. In all
other cases, the ->write_atomic() callback is ignored as if it wasn't
implemented. That means it will rely on the deferred printing kthread to
handle it.

> Let's say that netconsole implements CON_NBCON_ATOMIC_UNSAFE. What will
> happen with printks() inside IRQs (when the system is NOT panicking).
> Are they coming through __nbcon_atomic_flush_pending() and will be
> skipped?
>
> Also, are these messages even deferred for later flush?

When the system is not panicing, CON_NBCON_ATOMIC_UNSAFE has the effect
of acting as if you never implemented ->write_atomic(). So yes, only
->write_thread() will handle everything in a deferred context. If the
system never panics, your ->write_atomic() will never be called.

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-15 14:14     ` John Ogness
@ 2025-09-15 15:46       ` Breno Leitao
  2025-09-15 19:09         ` John Ogness
  2025-09-16 13:25         ` Petr Mladek
  0 siblings, 2 replies; 18+ messages in thread
From: Breno Leitao @ 2025-09-15 15:46 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Mon, Sep 15, 2025 at 04:20:35PM +0206, John Ogness wrote:
> On 2025-09-15, Breno Leitao <leitao@debian.org> wrote:
> > On Fri, Sep 12, 2025 at 02:24:52PM +0206, John Ogness wrote:
> >> @@ -1606,6 +1610,13 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
> >>  		if (!console_is_usable(con, flags, true))
> >>  			continue;
> >>  
> >> +		/*
> >> +		 * It is only allowed to use unsafe ->write_atomic() from
> >> +		 * nbcon_atomic_flush_unsafe().
> >> +		 */
> >> +		if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !allow_unsafe_takeover)
> >> +			continue;
> >
> > What will happen with the "message" in this case? is it lost?
> >
> > Let me clarify I understand the patch. The .write_atomic callback are
> > called in two cases:
> >
> > 	1) Inside IRQ/NMI and scheduling context
> > 	2) During panics.
> >
> > In both cases, they go throught __nbcon_atomic_flush_pending_con(),
> > right?
> 
> @allow_unsafe_takeover is only true at the very end of panic. In all
> other cases, the ->write_atomic() callback is ignored as if it wasn't
> implemented. That means it will rely on the deferred printing kthread to
> handle it.
> 
> > Let's say that netconsole implements CON_NBCON_ATOMIC_UNSAFE. What will
> > happen with printks() inside IRQs (when the system is NOT panicking).
> > Are they coming through __nbcon_atomic_flush_pending() and will be
> > skipped?
> >
> > Also, are these messages even deferred for later flush?
> 
> When the system is not panicing, CON_NBCON_ATOMIC_UNSAFE has the effect
> of acting as if you never implemented ->write_atomic(). So yes, only
> ->write_thread() will handle everything in a deferred context. If the
> system never panics, your ->write_atomic() will never be called.

If there is a printk() inside an IRQ and the host is not panicking, then
the message will be deferred to the kthread, which will print through
->write_thread.

So, from a user/netconsole perspective, assuming the no panic
(allow_unsafe_takeover=false) all the messages will be transmitted
(always from a thread context), even if the printk() happens on an IRQ.
So, no message will be lost.

Is my understanding right?

Thanks,
--breno

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-15 15:46       ` Breno Leitao
@ 2025-09-15 19:09         ` John Ogness
  2025-09-16 13:25         ` Petr Mladek
  1 sibling, 0 replies; 18+ messages in thread
From: John Ogness @ 2025-09-15 19:09 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On 2025-09-15, Breno Leitao <leitao@debian.org> wrote:
> If there is a printk() inside an IRQ and the host is not panicking, then
> the message will be deferred to the kthread, which will print through
> ->write_thread.
>
> So, from a user/netconsole perspective, assuming the no panic
> (allow_unsafe_takeover=false) all the messages will be transmitted
> (always from a thread context), even if the printk() happens on an IRQ.
> So, no message will be lost.
>
> Is my understanding right?

Yes. So you will always have a safe context to write from (except in
panic, where you will do your unsafe code).

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-15 15:46       ` Breno Leitao
  2025-09-15 19:09         ` John Ogness
@ 2025-09-16 13:25         ` Petr Mladek
  1 sibling, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2025-09-16 13:25 UTC (permalink / raw)
  To: Breno Leitao
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Mon 2025-09-15 08:46:13, Breno Leitao wrote:
> On Mon, Sep 15, 2025 at 04:20:35PM +0206, John Ogness wrote:
> > On 2025-09-15, Breno Leitao <leitao@debian.org> wrote:
> > > On Fri, Sep 12, 2025 at 02:24:52PM +0206, John Ogness wrote:
> > >> @@ -1606,6 +1610,13 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
> > >>  		if (!console_is_usable(con, flags, true))
> > >>  			continue;
> > >>  
> > >> +		/*
> > >> +		 * It is only allowed to use unsafe ->write_atomic() from
> > >> +		 * nbcon_atomic_flush_unsafe().
> > >> +		 */
> > >> +		if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !allow_unsafe_takeover)
> > >> +			continue;
> > >
> > > What will happen with the "message" in this case? is it lost?
> > >
> > > Let me clarify I understand the patch. The .write_atomic callback are
> > > called in two cases:
> > >
> > > 	1) Inside IRQ/NMI and scheduling context
> > > 	2) During panics.
> > >
> > > In both cases, they go throught __nbcon_atomic_flush_pending_con(),
> > > right?
> > 
> > @allow_unsafe_takeover is only true at the very end of panic. In all
> > other cases, the ->write_atomic() callback is ignored as if it wasn't
> > implemented. That means it will rely on the deferred printing kthread to
> > handle it.
> > 
> > > Let's say that netconsole implements CON_NBCON_ATOMIC_UNSAFE. What will
> > > happen with printks() inside IRQs (when the system is NOT panicking).
> > > Are they coming through __nbcon_atomic_flush_pending() and will be
> > > skipped?
> > >
> > > Also, are these messages even deferred for later flush?
> > 
> > When the system is not panicing, CON_NBCON_ATOMIC_UNSAFE has the effect
> > of acting as if you never implemented ->write_atomic(). So yes, only
> > ->write_thread() will handle everything in a deferred context. If the
> > system never panics, your ->write_atomic() will never be called.
> 
> If there is a printk() inside an IRQ and the host is not panicking, then
> the message will be deferred to the kthread, which will print through
> ->write_thread.

Just to be sure that we are all on the same page.

Note that the above statement is true for all NBCON consoles,
including serial consoles, even with the _safe_ .write_atomic().

In fact, printk() does not distinguish IRQ or task context. The
primary distinction are the following three priorities:

  + NBCON_PRIO_NORMAL is used when the system is working properly.

    In this case, all messages are deferred to the kthread when
    the kthread is available.


  + NBCON_PRIO_EMERGENCY is currently used in some situations,
    e.g. WARN(), RCU stall, or lockdep report.

    In this case, printk() tries to emit the messages directly using
    _safe_ .write_atomic(). It must be deferred to the kthread when
    the callback is not implemented or is not safe.


  + NBCON_PRIO_PANIC is used in panic() by the panic CPU.

    Note that only panic CPU is allowed to flush messages to be on the safe
    side when other CPUs are stopped. In fact, non-panic() CPUs are
    not even allowed to add new messages by default.

    Anyway, in panic(), printk() tries to flush the messages directly
    using _safe_ .write_atomic(). They are ignored when the callback
    is not implemented.

    This patch will allow to use the _unsafe_ . write_atomic() by
    the final "can't loose anything" nbcon_atomic_flush_unsafe()
    call before the CPU enters the final infinite loop (blinking LEDs).


Note that we use the term "priority" because the context with
the higher (more critical) priority is allowed to take over
the ownership of the console even in the middle of emitting
a message.

> So, from a user/netconsole perspective, assuming the no panic
> (allow_unsafe_takeover=false) all the messages will be transmitted
> (always from a thread context), even if the printk() happens on an IRQ.
> So, no message will be lost.

True.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-12 12:18 ` [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() " John Ogness
  2025-09-15 14:01   ` Breno Leitao
@ 2025-09-16 15:05   ` Petr Mladek
  2025-09-17 12:47     ` John Ogness
  1 sibling, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-09-16 15:05 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Fri 2025-09-12 14:24:52, John Ogness wrote:
> There may be console drivers that have not yet figured out a way
> to implement safe atomic printing (->write_atomic() callback).
> These drivers could choose to only implement threaded printing
> (->write_thread() callback), but then it is guaranteed that _no_
> output will be printed during panic. Not even attempted.
> 
> As a result, developers may be tempted to implement unsafe
> ->write_atomic() callbacks and/or implement some sort of custom
> deferred printing trickery to try to make it work. This goes
> against the principle intention of the nbcon API as well as
> endangers other nbcon drivers that are doing things correctly
> (safely).
> 
> As a compromise, allow nbcon drivers to implement unsafe
> ->write_atomic() callbacks by providing a new console flag
> CON_NBCON_ATOMIC_UNSAFE. When specified, the ->write_atomic()
> callback for that console will *only* be called during the
> final "hope and pray" flush attempt at the end of a panic:
> nbcon_atomic_flush_unsafe().
> 
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Link: https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt
> ---
>  include/linux/console.h |  3 +++
>  kernel/printk/nbcon.c   | 17 ++++++++++++++---
>  kernel/printk/printk.c  | 23 ++++++++++++++++-------
>  3 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 8f10d0a85bb4..ec68ecd13f85 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -185,6 +185,8 @@ static inline void con_debug_leave(void) { }
>   *			printing callbacks must not be called.
>   * @CON_NBCON:		Console can operate outside of the legacy style console_lock
>   *			constraints.
> + * @CON_NBCON_ATOMIC_UNSAFE: The write_atomic() callback is not safe and is
> + *			therefore only used by nbcon_atomic_flush_unsafe().
>   */
>  enum cons_flags {
>  	CON_PRINTBUFFER		= BIT(0),
> @@ -196,6 +198,7 @@ enum cons_flags {
>  	CON_EXTENDED		= BIT(6),
>  	CON_SUSPENDED		= BIT(7),
>  	CON_NBCON		= BIT(8),
> +	CON_NBCON_ATOMIC_UNSAFE	= BIT(9),
>  };
>  
>  /**
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 646801813415..8c2966b85ac3 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -972,14 +972,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
>  	/*
>  	 * This function should never be called for consoles that have not
>  	 * implemented the necessary callback for writing: i.e. legacy
> -	 * consoles and, when atomic, nbcon consoles with no write_atomic().
> +	 * consoles and, when atomic, nbcon consoles with no write_atomic()
> +	 * or an unsafe write_atomic() without allowing unsafe takeovers.
>  	 * Handle it as if ownership was lost and try to continue.
>  	 *
>  	 * Note that for nbcon consoles the write_thread() callback is
>  	 * mandatory and was already checked in nbcon_alloc().
>  	 */
> -	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
> -			 !(console_srcu_read_flags(con) & CON_NBCON))) {
> +	if (WARN_ON_ONCE(!(console_srcu_read_flags(con) & CON_NBCON) ||
> +			 (use_atomic &&
> +			  (!con->write_atomic ||
> +			   (!ctxt->allow_unsafe_takeover &&
> +			    (console_srcu_read_flags(con) & CON_NBCON_ATOMIC_UNSAFE)))))) {

The condition seems to be correct. But it is evil. I wonder whether
it would make sense to replace this with:

	flags = console_srcu_read_flags(con);

	if (WARN_ON_ONCE(!(flags & CON_NBCON) ||
			 !console_is_usable(con, flags, use_atomic, ctxt->allow_unsafe_takeover))) {


Note that I have added the 4th parameter intentionally, see below.

>  		nbcon_context_release(ctxt);
>  		return false;
>  	}
> @@ -1606,6 +1610,13 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeove
>  		if (!console_is_usable(con, flags, true))
>  			continue;
>  
> +		/*
> +		 * It is only allowed to use unsafe ->write_atomic() from
> +		 * nbcon_atomic_flush_unsafe().
> +		 */
> +		if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !allow_unsafe_takeover)
> +			continue;
> +
>  		if (nbcon_seq_read(con) >= stop_seq)
>  			continue;
>  
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 0efbcdda9aab..1cfc6801eed0 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3206,13 +3206,22 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
>  			u64 printk_seq;
>  			bool progress;
>  
> -			/*
> -			 * console_flush_all() is only responsible for nbcon
> -			 * consoles when the nbcon consoles cannot print via
> -			 * their atomic or threaded flushing.
> -			 */
> -			if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
> -				continue;
> +			if (flags & CON_NBCON) {
> +				/*
> +				 * console_flush_all() is only responsible for nbcon
> +				 * consoles when the nbcon consoles cannot print via
> +				 * their atomic or threaded flushing.
> +				 */
> +				if (ft.nbcon_atomic || ft.nbcon_offload)
> +					continue;
> +
> +				/*
> +				 * It is only allowed to use unsafe ->write_atomic() from
> +				 * nbcon_atomic_flush_unsafe().
> +				 */
> +				if ((flags & CON_NBCON_ATOMIC_UNSAFE) && !do_cond_resched)
> +					continue;
> +			}
>  
>  			if (!console_is_usable(con, flags, !do_cond_resched))
>  				continue;

Adding extra check looks error prone. I think that also
__nbcon_atomic_flush_pending_con() has to be patched.
Otherwise, it might end up in an infinite loop.
It is called directly from nbcon_device_release().

Note that this patch added the check only to the other caller
__nbcon_atomic_flush_pending().

It would be more reliable when the check was integrated into
console_is_usable(). I guess that you did not do it because
it added another parameter...

I would personally prefer to add the 4th parameter.

Or maybe, we could define @allow_unsafe_takeover via a global variable,
e.g. panic_nbcon_allow_unsafe_takeover. And it might be valid
only on the panic CPU, e.g.

static inline
bool nbcon_allow_unsafe_takeover(void)
{
	return panic_on_this_cpu() && panic_nbcon_allow_unsafe_takeover;
}

It is a kind of hack. But it might be better than the 4th parameter.
And it would simplify few other APIs.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-16 15:05   ` Petr Mladek
@ 2025-09-17 12:47     ` John Ogness
  2025-09-17 13:51       ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: John Ogness @ 2025-09-17 12:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On 2025-09-16, Petr Mladek <pmladek@suse.com> wrote:
>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index 646801813415..8c2966b85ac3 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -972,14 +972,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
>>  	/*
>>  	 * This function should never be called for consoles that have not
>>  	 * implemented the necessary callback for writing: i.e. legacy
>> -	 * consoles and, when atomic, nbcon consoles with no write_atomic().
>> +	 * consoles and, when atomic, nbcon consoles with no write_atomic()
>> +	 * or an unsafe write_atomic() without allowing unsafe takeovers.
>>  	 * Handle it as if ownership was lost and try to continue.
>>  	 *
>>  	 * Note that for nbcon consoles the write_thread() callback is
>>  	 * mandatory and was already checked in nbcon_alloc().
>>  	 */
>> -	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
>> -			 !(console_srcu_read_flags(con) & CON_NBCON))) {
>> +	if (WARN_ON_ONCE(!(console_srcu_read_flags(con) & CON_NBCON) ||
>> +			 (use_atomic &&
>> +			  (!con->write_atomic ||
>> +			   (!ctxt->allow_unsafe_takeover &&
>> +			    (console_srcu_read_flags(con) & CON_NBCON_ATOMIC_UNSAFE)))))) {
>
> The condition seems to be correct. But it is evil. I wonder whether
> it would make sense to replace this with:
>
> 	flags = console_srcu_read_flags(con);
>
> 	if (WARN_ON_ONCE(!(flags & CON_NBCON) ||
> 			 !console_is_usable(con, flags, use_atomic, ctxt->allow_unsafe_takeover))) {
>
>
> Note that I have added the 4th parameter intentionally, see below.

...

> It would be more reliable when the check was integrated into
> console_is_usable(). I guess that you did not do it because
> it added another parameter...

Not all console_is_usable() call sites have a printing context. That is
why I only added the checks only to the actual ->write_atomic() paths
that were possible via nbcon_atomic_flush_unsafe().

> Or maybe, we could define @allow_unsafe_takeover via a global variable,
> e.g. panic_nbcon_allow_unsafe_takeover. And it might be valid
> only on the panic CPU, e.g.
>
> static inline
> bool nbcon_allow_unsafe_takeover(void)
> {
> 	return panic_on_this_cpu() && panic_nbcon_allow_unsafe_takeover;
> }
>
> It is a kind of hack. But it might be better than the 4th parameter.
> And it would simplify few other APIs.

After weighing the pros/cons I think that a global variable makes the
most sense. It will simplify internal APIs and provide all
console_is_usable() users a correct value. And the end result is no
different than what we do now.

We could also keep its setting inside nbcon_atomic_flush_unsafe() so
that the variable remains a printk-internal variable.

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-17 12:47     ` John Ogness
@ 2025-09-17 13:51       ` Petr Mladek
  2025-09-22 10:44         ` John Ogness
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-09-17 13:51 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Wed 2025-09-17 14:53:26, John Ogness wrote:
> On 2025-09-16, Petr Mladek <pmladek@suse.com> wrote:
> >> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> >> index 646801813415..8c2966b85ac3 100644
> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -972,14 +972,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt, bool use_a
> >>  	/*
> >>  	 * This function should never be called for consoles that have not
> >>  	 * implemented the necessary callback for writing: i.e. legacy
> >> -	 * consoles and, when atomic, nbcon consoles with no write_atomic().
> >> +	 * consoles and, when atomic, nbcon consoles with no write_atomic()
> >> +	 * or an unsafe write_atomic() without allowing unsafe takeovers.
> >>  	 * Handle it as if ownership was lost and try to continue.
> >>  	 *
> >>  	 * Note that for nbcon consoles the write_thread() callback is
> >>  	 * mandatory and was already checked in nbcon_alloc().
> >>  	 */
> >> -	if (WARN_ON_ONCE((use_atomic && !con->write_atomic) ||
> >> -			 !(console_srcu_read_flags(con) & CON_NBCON))) {
> >> +	if (WARN_ON_ONCE(!(console_srcu_read_flags(con) & CON_NBCON) ||
> >> +			 (use_atomic &&
> >> +			  (!con->write_atomic ||
> >> +			   (!ctxt->allow_unsafe_takeover &&
> >> +			    (console_srcu_read_flags(con) & CON_NBCON_ATOMIC_UNSAFE)))))) {
> >
> > The condition seems to be correct. But it is evil. I wonder whether
> > it would make sense to replace this with:
> >
> > 	flags = console_srcu_read_flags(con);
> >
> > 	if (WARN_ON_ONCE(!(flags & CON_NBCON) ||
> > 			 !console_is_usable(con, flags, use_atomic, ctxt->allow_unsafe_takeover))) {
> >
> >
> > Note that I have added the 4th parameter intentionally, see below.
> 
> ...
> 
> > It would be more reliable when the check was integrated into
> > console_is_usable(). I guess that you did not do it because
> > it added another parameter...
> 
> Not all console_is_usable() call sites have a printing context. That is
> why I only added the checks only to the actual ->write_atomic() paths
> that were possible via nbcon_atomic_flush_unsafe().

I see. But I still believe that it fits well into console_is_usable().
It is similar to the "use_atomic" parameter which depends on the
context as well. We could guess the context most of the time,
so that we hardcode the "use_atomic" value, ...


> > Or maybe, we could define @allow_unsafe_takeover via a global variable,
> > e.g. panic_nbcon_allow_unsafe_takeover. And it might be valid
> > only on the panic CPU, e.g.
> >
> > static inline
> > bool nbcon_allow_unsafe_takeover(void)
> > {
> > 	return panic_on_this_cpu() && panic_nbcon_allow_unsafe_takeover;
> > }
> >
> > It is a kind of hack. But it might be better than the 4th parameter.
> > And it would simplify few other APIs.
> 
> After weighing the pros/cons I think that a global variable makes the
> most sense. It will simplify internal APIs and provide all
> console_is_usable() users a correct value. And the end result is no
> different than what we do now.
> 
> We could also keep its setting inside nbcon_atomic_flush_unsafe() so
> that the variable remains a printk-internal variable.

Sounds good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
  2025-09-12 12:18 [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic John Ogness
  2025-09-12 12:18 ` [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() " John Ogness
@ 2025-09-17 14:44 ` Breno Leitao
  2025-09-26  9:21   ` John Ogness
  1 sibling, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-17 14:44 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Fri, Sep 12, 2025 at 02:24:51PM +0206, John Ogness wrote:
> Hi,
> 
> An effort is underway to update netconsole to support nbcon [0].
> Currently it is not known exactly how a safe ->write_atomic()
> callback for netconsole could be implemented. However, without
> ->write_atomic() implemented, there is guaranteed to be _no_
> panic output.
> 
> We decided to allow unsafe ->write_atomic() implementations to
> be used only as a last resort at the end of panic. This should
> allow netconsole to be updated to support nbcon and still
> provide panic output in "typical panic" situations.
> 
> I considered extending the feature to also allow such consoles
> to print using their ->write_thread() callback whenever it is
> known to be a sleepable context. However, this started to get
> quite complex. We can revisit this extended feature if it is
> determined that it is necessary.

Thanks for this patch! I've successfully adapted my PoC netconsole
implementation to work with NBCON using your modifications, and the
initial results are encouraging. With your patch and a corresponding
->write_atomic callback, I'm receiving the complete set of messages that
were previously available through the regular console interface.

Upon further consideration, it's worth noting that not all network
drivers rely on irq-unsafe locks. In practice, only a subset of drivers
use them, while most network drivers I'm familiar with maintain IRQ-safe
TX paths.

If we could determine the IRQ safety characteristics (IRQ-safe vs
IRQ-unsafe TX) during netconsole registration, this would enable more
optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
only when the underlying network adapter uses IRQ-unsafe locks. For
adapters with IRQ-safe implementations, netconsole could safely utilize
the ->write_atomic path without restrictions.

FWIW, I am attaching the patch I used to test netconsole on top of your
changes. Keep in mind this is is calling TX patch with IRQ disabled
given the list is depending on IRQ-safe lock.

This will be fixed once I do the following:

1) move target_list to use RCU
2) Untangling netconsole from netpoll [1]
3) They are depending on a conflicting netpoll fix [2]

Link: https://lore.kernel.org/all/willemdebruijn.kernel.a0f67bb6112a@gmail.com/ [1]
Link: https://lore.kernel.org/all/20250917-netconsole_torture-v4-1-0a5b3b8f81ce@debian.org/ [2]

commit e9f4a292a49ae6d3da29f1dca39754180d2608d7
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Aug 19 04:14:58 2025 -0700

    netconsole: Add support for nbcon
    
    Add support for running netconsole using the new non‑blocking console
    (nbcon) infrastructure.
    
    The nbcon framework improves console handling by avoiding the global
    console lock and enabling asynchronous, non‑blocking writes from
    multiple contexts.
    
    With this option enabled, netconsole can operate as a fully
    asynchronous, lockless nbcon backend, not depending on console lock
    anymore.
    
    This support is marked experimental for now until it receives wider
    testing.
    
    This depends on the CON_NBCON_ATOMIC_UNSAFE nbcon flag, and
    ->write_atomic is unsafe and only called with NBCON_PRIO_PANIC priority.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b29628d46be9b..ec9a430aa160e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -382,6 +382,16 @@ config NETCONSOLE_PREPEND_RELEASE
 	  message.  See <file:Documentation/networking/netconsole.rst> for
 	  details.
 
+config NETCONSOLE_NBCON
+	bool "Enable non blocking netconsole (EXPERIMENTAL)"
+	depends on NETCONSOLE
+	default n
+	help
+	  Move netconsole to use non-blocking console (nbcons).  Non-blocking
+	  console (nbcon) is a new console infrastructure introduced to improve
+	  console handling by avoiding the global console lock (Big Kernel
+	  Lock) and enabling non-blocking, asynchronous writes to the console.
+
 config NETPOLL
 	def_bool NETCONSOLE
 
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index e3722de08ea9f..ac2f0ace45d6e 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -1692,6 +1692,97 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 				   extradata_len);
 }
 
+static void do_write_msg(struct netconsole_target *nt, const char *msg,
+			 unsigned int len)
+{
+	const char *tmp;
+	int frag, left;
+
+	/*
+	 * We nest this inside the for-each-target loop above
+	 * so that we're able to get as much logging out to
+	 * at least one target if we die inside here, instead
+	 * of unnecessarily keeping all targets in lock-step.
+	 */
+	tmp = msg;
+	for (left = len; left;) {
+		frag = min(left, MAX_PRINT_CHUNK);
+		send_udp(nt, tmp, frag);
+		tmp += frag;
+		left -= frag;
+	}
+}
+
+#ifdef CONFIG_NETCONSOLE_NBCON
+static void netcon_write_ext_atomic(struct console *con,
+				    struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			send_ext_msg_udp(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+static void netcon_write_ext_thread(struct console *con,
+				    struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			send_ext_msg_udp(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+/* regular write call backs */
+static void netcon_write_thread(struct console *con,
+				struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (!nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			do_write_msg(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+static void netcon_write_atomic(struct console *con,
+				struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (!nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			do_write_msg(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+/* locks call backs */
+static void netconsole_device_lock(struct console *con, unsigned long *flags)
+{
+	/* protects all the targets at the same time */
+	spin_lock_irqsave(&target_list_lock, *flags);
+}
+
+static void netconsole_device_unlock(struct console *con, unsigned long flags)
+{
+	spin_unlock_irqrestore(&target_list_lock, flags);
+}
+#else
 static void write_ext_msg(struct console *con, const char *msg,
 			  unsigned int len)
 {
@@ -1710,10 +1801,8 @@ static void write_ext_msg(struct console *con, const char *msg,
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
-	int frag, left;
 	unsigned long flags;
 	struct netconsole_target *nt;
-	const char *tmp;
 
 	if (oops_only && !oops_in_progress)
 		return;
@@ -1723,24 +1812,13 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 
 	spin_lock_irqsave(&target_list_lock, flags);
 	list_for_each_entry(nt, &target_list, list) {
-		if (!nt->extended && nt->enabled && netif_running(nt->np.dev)) {
-			/*
-			 * We nest this inside the for-each-target loop above
-			 * so that we're able to get as much logging out to
-			 * at least one target if we die inside here, instead
-			 * of unnecessarily keeping all targets in lock-step.
-			 */
-			tmp = msg;
-			for (left = len; left;) {
-				frag = min(left, MAX_PRINT_CHUNK);
-				send_udp(nt, tmp, frag);
-				tmp += frag;
-				left -= frag;
-			}
-		}
+		if (!nt->extended && nt->enabled && netif_running(nt->np.dev))
+			do_write_msg(nt, msg, len);
 	}
 	spin_unlock_irqrestore(&target_list_lock, flags);
 }
+#endif
+
 
 static int netpoll_parse_ip_addr(const char *str, union inet_addr *addr)
 {
@@ -1919,14 +1997,30 @@ static void free_param_target(struct netconsole_target *nt)
 
 static struct console netconsole_ext = {
 	.name	= "netcon_ext",
+#ifdef CONFIG_NETCONSOLE_NBCON
+	.flags	= CON_ENABLED | CON_EXTENDED | CON_NBCON | CON_NBCON_ATOMIC_UNSAFE,
+	.write_thread = netcon_write_ext_thread,
+	.write_atomic = netcon_write_ext_atomic,
+	.device_lock = netconsole_device_lock,
+	.device_unlock = netconsole_device_unlock,
+#else
 	.flags	= CON_ENABLED | CON_EXTENDED,
 	.write	= write_ext_msg,
+#endif
 };
 
 static struct console netconsole = {
 	.name	= "netcon",
+#ifdef CONFIG_NETCONSOLE_NBCON
+	.flags	= CON_ENABLED | CON_NBCON | CON_NBCON_ATOMIC_UNSAFE,
+	.write_thread = netcon_write_thread,
+	.write_atomic = netcon_write_atomic,
+	.device_lock = netconsole_device_lock,
+	.device_unlock = netconsole_device_unlock,
+#else
 	.flags	= CON_ENABLED,
 	.write	= write_msg,
+#endif
 };
 
 static int __init init_netconsole(void)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-17 13:51       ` Petr Mladek
@ 2025-09-22 10:44         ` John Ogness
  2025-09-22 11:45           ` Petr Mladek
  2025-09-23 12:30           ` Breno Leitao
  0 siblings, 2 replies; 18+ messages in thread
From: John Ogness @ 2025-09-22 10:44 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On 2025-09-17, Petr Mladek <pmladek@suse.com> wrote:
>> After weighing the pros/cons I think that a global variable makes the
>> most sense. It will simplify internal APIs and provide all
>> console_is_usable() users a correct value. And the end result is no
>> different than what we do now.
>> 
>> We could also keep its setting inside nbcon_atomic_flush_unsafe() so
>> that the variable remains a printk-internal variable.
>
> Sounds good to me.

Right now things are a bit of a mess with required changes sitting in
printk and mm trees. Since this won't be going in to the upcoming merge
window, I will wait with v2 until you (Petr) can officially rebase the
printk tree to include the recent panic_*cpu*() changes. That will also
make it easier to coordinate the upcoming console_is_usable() changes as
well.

The functionality for v2 is the same as the v1, so the network folks can
continue working on the nbcon netconsole implementation.

@Breno: Or were you planning on pushing the nbcon netconsole for the 6.18
merge window next week? (I would guess no.)

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-22 10:44         ` John Ogness
@ 2025-09-22 11:45           ` Petr Mladek
  2025-09-23 12:30           ` Breno Leitao
  1 sibling, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2025-09-22 11:45 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Mon 2025-09-22 12:50:04, John Ogness wrote:
> On 2025-09-17, Petr Mladek <pmladek@suse.com> wrote:
> >> After weighing the pros/cons I think that a global variable makes the
> >> most sense. It will simplify internal APIs and provide all
> >> console_is_usable() users a correct value. And the end result is no
> >> different than what we do now.
> >> 
> >> We could also keep its setting inside nbcon_atomic_flush_unsafe() so
> >> that the variable remains a printk-internal variable.
> >
> > Sounds good to me.
> 
> Right now things are a bit of a mess with required changes sitting in
> printk and mm trees. Since this won't be going in to the upcoming merge
> window, I will wait with v2 until you (Petr) can officially rebase the
> printk tree to include the recent panic_*cpu*() changes. That will also
> make it easier to coordinate the upcoming console_is_usable() changes as
> well.

Makes sense.

> The functionality for v2 is the same as the v1, so the network folks can
> continue working on the nbcon netconsole implementation.

Yup.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() for panic
  2025-09-22 10:44         ` John Ogness
  2025-09-22 11:45           ` Petr Mladek
@ 2025-09-23 12:30           ` Breno Leitao
  1 sibling, 0 replies; 18+ messages in thread
From: Breno Leitao @ 2025-09-23 12:30 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

Hello John,

On Mon, Sep 22, 2025 at 12:50:04PM +0206, John Ogness wrote:
> On 2025-09-17, Petr Mladek <pmladek@suse.com> wrote:

> @Breno: Or were you planning on pushing the nbcon netconsole for the 6.18
> merge window next week? (I would guess no.)

Unfortuantely no, for me to get netconsole on top of nbcon, I need to
address a few dependencies:

1) Get the memleak fixed, given this is conflicting with the next item
   * https://lore.kernel.org/all/20250918-netconsole_torture-v5-0-77e25e0a4eb6@debian.org/

2) Decouple netconsole from netpoll
   * https://lore.kernel.org/all/20250902-netpoll_untangle_v3-v1-0-51a03d6411be@debian.org/#t

3) Convert the target_list_lock to RCU, to avoid having to disable IRQ
   at netconsole side.

4) Move netconsole to nbcon.

Given that these are "core" changes, I would prefer to get, at least, 3)
and 4) into the next (6.19) merge window.

Thanks for asking,
--breno

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
  2025-09-17 14:44 ` [PATCH printk v1 0/1] Allow unsafe ->write_atomic() " Breno Leitao
@ 2025-09-26  9:21   ` John Ogness
  2025-09-26 15:17     ` Breno Leitao
  0 siblings, 1 reply; 18+ messages in thread
From: John Ogness @ 2025-09-26  9:21 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

Hi Breno,

On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
> Upon further consideration, it's worth noting that not all network
> drivers rely on irq-unsafe locks. In practice, only a subset of drivers
> use them, while most network drivers I'm familiar with maintain IRQ-safe
> TX paths.
>
> If we could determine the IRQ safety characteristics (IRQ-safe vs
> IRQ-unsafe TX) during netconsole registration, this would enable more
> optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
> only when the underlying network adapter uses IRQ-unsafe locks. For
> adapters with IRQ-safe implementations, netconsole could safely utilize
> the ->write_atomic path without restrictions.

This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
set, it is expected that ->write_atomic() will also function in NMI. So
being IRQ-safe may not be enough.

John Ogness

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
  2025-09-26  9:21   ` John Ogness
@ 2025-09-26 15:17     ` Breno Leitao
  2025-09-29 12:18       ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Breno Leitao @ 2025-09-26 15:17 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

Hello John,

On Fri, Sep 26, 2025 at 11:27:49AM +0206, John Ogness wrote:
> On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
> > Upon further consideration, it's worth noting that not all network
> > drivers rely on irq-unsafe locks. In practice, only a subset of drivers
> > use them, while most network drivers I'm familiar with maintain IRQ-safe
> > TX paths.
> >
> > If we could determine the IRQ safety characteristics (IRQ-safe vs
> > IRQ-unsafe TX) during netconsole registration, this would enable more
> > optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
> > only when the underlying network adapter uses IRQ-unsafe locks. For
> > adapters with IRQ-safe implementations, netconsole could safely utilize
> > the ->write_atomic path without restrictions.
> 
> This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
> set, it is expected that ->write_atomic() will also function in NMI. So
> being IRQ-safe may not be enough.

What are the other requirements for ->write_atomic() so it could be
executed inside a NMI?

That brings me another point, I suppose that netconsole callbacks are
currently being called from NMI, given it is registered as a legacy
console, and legacy consoles are printked from inside NMIs, right?

Thanks!
--breno

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
  2025-09-26 15:17     ` Breno Leitao
@ 2025-09-29 12:18       ` Petr Mladek
  2025-09-29 13:36         ` John Ogness
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-09-29 12:18 UTC (permalink / raw)
  To: Breno Leitao
  Cc: John Ogness, Sergey Senozhatsky, Steven Rostedt, Mike Galbraith,
	linux-kernel, Greg Kroah-Hartman

On Fri 2025-09-26 08:17:53, Breno Leitao wrote:
> Hello John,
> 
> On Fri, Sep 26, 2025 at 11:27:49AM +0206, John Ogness wrote:
> > On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
> > > Upon further consideration, it's worth noting that not all network
> > > drivers rely on irq-unsafe locks. In practice, only a subset of drivers
> > > use them, while most network drivers I'm familiar with maintain IRQ-safe
> > > TX paths.
> > >
> > > If we could determine the IRQ safety characteristics (IRQ-safe vs
> > > IRQ-unsafe TX) during netconsole registration, this would enable more
> > > optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
> > > only when the underlying network adapter uses IRQ-unsafe locks. For
> > > adapters with IRQ-safe implementations, netconsole could safely utilize
> > > the ->write_atomic path without restrictions.
> > 
> > This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
> > set, it is expected that ->write_atomic() will also function in NMI. So
> > being IRQ-safe may not be enough.
> 
> What are the other requirements for ->write_atomic() so it could be
> executed inside a NMI?

Ideally, it should be synchronized only via the nbcon console context
API and should not need any additional lock.

Note that the nbcon console context should get synchronized with
the normal device lock via some wrappers, for example, see
uart_port_lock() in include/linux/serial_core.h.

> That brings me another point, I suppose that netconsole callbacks are
> currently being called from NMI, given it is registered as a legacy
> console, and legacy consoles are printked from inside NMIs, right?

The legacy console handling is automatically deferred in_nmi(), see
is_printk_legacy_deferred().

The only exception is panic() where the consoles are explicitly
flushed. It is not 100% safe. But the risk of a deadlock is reduced
by calling bust_spinlocks(1) which sets oops_in_progress.
Console drivers use trylock when oops_in_progress is set.

I could imagine that we allow the trick with oops_in_progress
and trylock also in the "unsafe" write_atomic() callbacks.
But it would be better to use only the nbcon context ownership
as suggested above.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
  2025-09-29 12:18       ` Petr Mladek
@ 2025-09-29 13:36         ` John Ogness
  0 siblings, 0 replies; 18+ messages in thread
From: John Ogness @ 2025-09-29 13:36 UTC (permalink / raw)
  To: Petr Mladek, Breno Leitao
  Cc: Sergey Senozhatsky, Steven Rostedt, Mike Galbraith, linux-kernel,
	Greg Kroah-Hartman

On 2025-09-29, Petr Mladek <pmladek@suse.com> wrote:
> On Fri 2025-09-26 08:17:53, Breno Leitao wrote:
>> On Fri, Sep 26, 2025 at 11:27:49AM +0206, John Ogness wrote:
>> > On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
>> > > Upon further consideration, it's worth noting that not all network
>> > > drivers rely on irq-unsafe locks. In practice, only a subset of drivers
>> > > use them, while most network drivers I'm familiar with maintain IRQ-safe
>> > > TX paths.
>> > >
>> > > If we could determine the IRQ safety characteristics (IRQ-safe vs
>> > > IRQ-unsafe TX) during netconsole registration, this would enable more
>> > > optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
>> > > only when the underlying network adapter uses IRQ-unsafe locks. For
>> > > adapters with IRQ-safe implementations, netconsole could safely utilize
>> > > the ->write_atomic path without restrictions.
>> > 
>> > This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
>> > set, it is expected that ->write_atomic() will also function in NMI. So
>> > being IRQ-safe may not be enough.
>> 
>> What are the other requirements for ->write_atomic() so it could be
>> executed inside a NMI?
>
> Ideally, it should be synchronized only via the nbcon console context
> API and should not need any additional lock.
>
> Note that the nbcon console context should get synchronized with
> the normal device lock via some wrappers, for example, see
> uart_port_lock() in include/linux/serial_core.h.

I would also like to add that write_atomic() calls are synchronized
against "unsafe sections"[0] of write_thread() and write_atomic()

... EXCEPT ...

As a last resort during panic, nbcon will perform hostile
takeovers. This means write_atomic() can be called even though the
driver is in unsafe sections. So your write_atomic() could be called
when your write_thread() is in an unsafe section. However,
write_atomic() can detect this by looking at unsafe_takeover of struct
nbcon_write_context. If true, you know your write_thread() was in an
unsafe section and you might need to be more careful or avoid certain
actions.

John

[0] Code surround by nbcon_enter_unsafe()/nbcon_exit_unsafe().

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-09-29 13:36 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-12 12:18 [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic John Ogness
2025-09-12 12:18 ` [PATCH printk v1 1/1] printk: nbcon: Allow unsafe write_atomic() " John Ogness
2025-09-15 14:01   ` Breno Leitao
2025-09-15 14:14     ` John Ogness
2025-09-15 15:46       ` Breno Leitao
2025-09-15 19:09         ` John Ogness
2025-09-16 13:25         ` Petr Mladek
2025-09-16 15:05   ` Petr Mladek
2025-09-17 12:47     ` John Ogness
2025-09-17 13:51       ` Petr Mladek
2025-09-22 10:44         ` John Ogness
2025-09-22 11:45           ` Petr Mladek
2025-09-23 12:30           ` Breno Leitao
2025-09-17 14:44 ` [PATCH printk v1 0/1] Allow unsafe ->write_atomic() " Breno Leitao
2025-09-26  9:21   ` John Ogness
2025-09-26 15:17     ` Breno Leitao
2025-09-29 12:18       ` Petr Mladek
2025-09-29 13:36         ` John Ogness

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox