[PATCH v2 0/3] Handle NBCON consoles on KDB

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/3] Handle NBCON consoles on KDB
@ 2025-08-11 13:32 Marcos Paulo de Souza
  2025-08-11 13:32 ` [PATCH v2 1/3] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-11 13:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza

After the proposed patches [1], it was suggested to start using
console_is_usable instead of checking if a console is enabled. It was
noticed that KDB was always calling con->write method, but this
callback is not set for NBCON consoles.

To fix this usecase, export console_is_usable and add new nbcon code to
acquire a context that KDB needs in order to call ->write_atomic. After
these patches are accepted I'm planning to go back to [1] again to use
the function expected here.

I did the tests using qemu and reapplying commit f79b163c4231
('Revert "serial: 8250: Switch to nbcon console"') created originally by
John, just to exercise the common 8250 serial from qemu. The commit can
be checked on [2]. I had to solve some conflicts since the code has been
reworked after the commit was reverted.

Without these patches, NBCON consoles won't receive the kgdb messages. I
tested using [2], and it works after this patchset.

Thanks to all reviewers of the patches posted on [1]! I hope this is the
first step into implementing all the changes suggested in that patchset.

[1]: https://lore.kernel.org/lkml/20250606-printk-cleanup-part2-v1-0-f427c743dda0@suse.com/
[2]: https://github.com/marcosps/linux/commit/bea249773c9caf56821f9ec06c7f9649e4966c61

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
Changes in v2:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v1: https://lore.kernel.org/r/20250713-nbcon-kgdboc-v1-0-51eccd9247a8@suse.com

---
Marcos Paulo de Souza (3):
      printk: nbcon: Export console_is_usable
      printk: nbcon: Introduce KDB helpers
      kdb: Adapt kdb_msg_write to work with NBCON consoles

 include/linux/console.h   | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/debug/kdb/kdb_io.c | 26 ++++++++++++++++++++----
 kernel/printk/internal.h  | 41 --------------------------------------
 kernel/printk/nbcon.c     | 26 ++++++++++++++++++++++++
 4 files changed, 98 insertions(+), 45 deletions(-)
---
base-commit: bea249773c9caf56821f9ec06c7f9649e4966c61
change-id: 20250713-nbcon-kgdboc-efcfc37fde46

Best regards,
-- 
Marcos Paulo de Souza <mpdesouza@suse.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/3] printk: nbcon: Export console_is_usable
  2025-08-11 13:32 [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
@ 2025-08-11 13:32 ` Marcos Paulo de Souza
  2025-08-20 15:03   ` Petr Mladek
  2025-08-11 13:32 ` [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-11 13:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza

The helper will be used on KDB code in the next commits.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 include/linux/console.h  | 44 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/printk/internal.h | 41 -----------------------------------------
 2 files changed, 44 insertions(+), 41 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 8f10d0a85bb4536e4b0dda4e8ccbdf87978bbb4a..67af483574727c00eea1d5a1eacc994755c92607 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -605,6 +605,48 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
 extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
+
+/*
+ * Check if the given console is currently capable and allowed to print
+ * records. Note that this function does not consider the current context,
+ * which can also play a role in deciding if @con can be used to print
+ * records.
+ */
+static inline bool console_is_usable(struct console *con, short flags, bool use_atomic)
+{
+	if (!(flags & CON_ENABLED))
+		return false;
+
+	if ((flags & CON_SUSPENDED))
+		return false;
+
+	if (flags & CON_NBCON) {
+		/* The write_atomic() callback is optional. */
+		if (use_atomic && !con->write_atomic)
+			return false;
+
+		/*
+		 * For the !use_atomic case, @printk_kthreads_running is not
+		 * checked because the write_thread() callback is also used
+		 * via the legacy loop when the printer threads are not
+		 * available.
+		 */
+	} else {
+		if (!con->write)
+			return false;
+	}
+
+	/*
+	 * Console drivers may assume that per-cpu resources have been
+	 * allocated. So unless they're explicitly marked as being able to
+	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
+	 */
+	if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
+		return false;
+
+	return true;
+}
+
 #else
 static inline void nbcon_cpu_emergency_enter(void) { }
 static inline void nbcon_cpu_emergency_exit(void) { }
@@ -612,6 +654,8 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
+static inline bool console_is_usable(struct console *con, short flags,
+				     bool use_atomic) { return false; }
 #endif
 
 extern int console_set_on_cmdline;
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index ef282001f200fdbbacae3171932bf9f049037a85..077927ed56c5f7ae22f76be4635a4b4614387de8 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -112,47 +112,6 @@ bool nbcon_kthread_create(struct console *con);
 void nbcon_kthread_stop(struct console *con);
 void nbcon_kthreads_wake(void);
 
-/*
- * Check if the given console is currently capable and allowed to print
- * records. Note that this function does not consider the current context,
- * which can also play a role in deciding if @con can be used to print
- * records.
- */
-static inline bool console_is_usable(struct console *con, short flags, bool use_atomic)
-{
-	if (!(flags & CON_ENABLED))
-		return false;
-
-	if ((flags & CON_SUSPENDED))
-		return false;
-
-	if (flags & CON_NBCON) {
-		/* The write_atomic() callback is optional. */
-		if (use_atomic && !con->write_atomic)
-			return false;
-
-		/*
-		 * For the !use_atomic case, @printk_kthreads_running is not
-		 * checked because the write_thread() callback is also used
-		 * via the legacy loop when the printer threads are not
-		 * available.
-		 */
-	} else {
-		if (!con->write)
-			return false;
-	}
-
-	/*
-	 * Console drivers may assume that per-cpu resources have been
-	 * allocated. So unless they're explicitly marked as being able to
-	 * cope (CON_ANYTIME) don't call them until this CPU is officially up.
-	 */
-	if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
-		return false;
-
-	return true;
-}
-
 /**
  * nbcon_kthread_wake - Wake up a console printing thread
  * @con:	Console to operate on

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers
  2025-08-11 13:32 [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
  2025-08-11 13:32 ` [PATCH v2 1/3] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
@ 2025-08-11 13:32 ` Marcos Paulo de Souza
  2025-08-26 14:30   ` John Ogness
  2025-08-28 15:26   ` Petr Mladek
  2025-08-11 13:32 ` [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
  2025-08-12 17:14 ` [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
  3 siblings, 2 replies; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-11 13:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza

These helpers will be used when calling console->write_atomic on
KDB code in the next patch. It's basically the same implementaion
as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
acquiring the context.

For release, differently from nbcon_device_release, we don't need to
flush the console, since all CPUs are stopped when KDB is active.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 include/linux/console.h |  6 ++++++
 kernel/printk/nbcon.c   | 26 ++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa467d8be43761cf756 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
 extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
 extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
 extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
+extern bool nbcon_kdb_try_acquire(struct console *con,
+				  struct nbcon_write_context *wctxt);
+extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
 
 /*
  * Check if the given console is currently capable and allowed to print
@@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
 static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
 static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
+static inline bool nbcon_kdb_try_acquire(struct console *con,
+					 struct nbcon_write_context *wctxt) { return false; }
+static inline void nbcon_kdb_release(struct console *con) { }
 static inline bool console_is_usable(struct console *con, short flags,
 				     bool use_atomic) { return false; }
 #endif
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415f0abe40cabf2f28ca9e30664f028..79d8c7437806119ad9787ddc48382dc2c86c23c3 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console *con)
 	console_srcu_read_unlock(cookie);
 }
 EXPORT_SYMBOL_GPL(nbcon_device_release);
+
+bool nbcon_kdb_try_acquire(struct console *con,
+			   struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	memset(ctxt, 0, sizeof(*ctxt));
+	ctxt->console = con;
+	ctxt->prio    = NBCON_PRIO_EMERGENCY;
+
+	if (!nbcon_context_try_acquire(ctxt, false))
+		return false;
+
+	if (!nbcon_context_enter_unsafe(ctxt))
+		return false;
+
+	return true;
+}
+
+void nbcon_kdb_release(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+	nbcon_context_exit_unsafe(ctxt);
+	nbcon_context_release(ctxt);
+}

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-11 13:32 [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
  2025-08-11 13:32 ` [PATCH v2 1/3] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
  2025-08-11 13:32 ` [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
@ 2025-08-11 13:32 ` Marcos Paulo de Souza
  2025-08-11 14:38   ` Daniel Thompson
  2025-08-12 17:14 ` [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
  3 siblings, 1 reply; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-11 13:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza

Function kdb_msg_write was calling con->write for any found console,
but it won't work on NBCON ones. In this case we should acquire the
ownership of the console using NBCON_PRIO_EMERGENCY, since printing
kdb messages should only be interrupted by a panic. This is done by the
nbcon_kdb_{acquire,release} functions.

At this point, the console is required to use the atomic callback. The
console is skipped if the write_atomic callback is not set or if the
context could not be acquired. The validation of NBCON is done by the
console_is_usable helper. The context is released right after
write_atomic finishes.

Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 kernel/debug/kdb/kdb_io.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 9b11b10b120cf07e451a7a4d92ce50f9a6c066b2..74f6d4316bdc9d3c4f6d4252bf425e33cce65a87 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -589,12 +589,23 @@ static void kdb_msg_write(const char *msg, int msg_len)
 	 */
 	cookie = console_srcu_read_lock();
 	for_each_console_srcu(c) {
-		if (!(console_srcu_read_flags(c) & CON_ENABLED))
+		struct nbcon_write_context wctxt = { };
+		short flags = console_srcu_read_flags(c);
+
+		if (!console_is_usable(c, flags, true))
 			continue;
 		if (c == dbg_io_ops->cons)
 			continue;
-		if (!c->write)
-			continue;
+
+		/*
+		 * Do not continue if the console is NBCON and the context
+		 * can't be acquired.
+		 */
+		if (flags & CON_NBCON) {
+			if (!nbcon_kdb_try_acquire(c, &wctxt))
+				continue;
+		}
+
 		/*
 		 * Set oops_in_progress to encourage the console drivers to
 		 * disregard their internal spin locks: in the current calling
@@ -605,7 +616,14 @@ static void kdb_msg_write(const char *msg, int msg_len)
 		 * for this calling context.
 		 */
 		++oops_in_progress;
-		c->write(c, msg, msg_len);
+		if (flags & CON_NBCON) {
+			wctxt.outbuf = (char *)msg;
+			wctxt.len = msg_len;
+			c->write_atomic(c, &wctxt);
+			nbcon_kdb_release(&wctxt);
+		} else {
+			c->write(c, msg, msg_len);
+		}
 		--oops_in_progress;
 		touch_nmi_watchdog();
 	}

-- 
2.50.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-11 13:32 ` [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
@ 2025-08-11 14:38   ` Daniel Thompson
  2025-08-11 19:19     ` Marcos Paulo de Souza
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Thompson @ 2025-08-11 14:38 UTC (permalink / raw)
  To: Marcos Paulo de Souza
  Cc: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

Hi Marcos

No objections, but a couple of questions if I may...


On Mon, Aug 11, 2025 at 10:32:47AM -0300, Marcos Paulo de Souza wrote:
> Function kdb_msg_write was calling con->write for any found console,
> but it won't work on NBCON ones. In this case we should acquire the
> ownership of the console using NBCON_PRIO_EMERGENCY, since printing
> kdb messages should only be interrupted by a panic. This is done by the
> nbcon_kdb_{acquire,release} functions.

Just wanted to check what it means to be "interrupted by a panic"?

kdb is called from the panic handler but, assuming the serial port is run
syncrhonously when "bad things are happening", the serial port should be
quiet when we enter kdb meaning we can still acquire ownership of the
console?
>
> At this point, the console is required to use the atomic callback. The
> console is skipped if the write_atomic callback is not set or if the
> context could not be acquired. The validation of NBCON is done by the
> console_is_usable helper. The context is released right after
> write_atomic finishes.
>
> Suggested-by: Petr Mladek <pmladek@suse.com>
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
>  kernel/debug/kdb/kdb_io.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> index 9b11b10b120cf07e451a7a4d92ce50f9a6c066b2..74f6d4316bdc9d3c4f6d4252bf425e33cce65a87 100644
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -589,12 +589,23 @@ static void kdb_msg_write(const char *msg, int msg_len)
>  	 */
>  	cookie = console_srcu_read_lock();
>  	for_each_console_srcu(c) {
> -		if (!(console_srcu_read_flags(c) & CON_ENABLED))
> +		struct nbcon_write_context wctxt = { };
> +		short flags = console_srcu_read_flags(c);
> +
> +		if (!console_is_usable(c, flags, true))
>  			continue;
>  		if (c == dbg_io_ops->cons)
>  			continue;
> -		if (!c->write)
> -			continue;
> +
> +		/*
> +		 * Do not continue if the console is NBCON and the context
> +		 * can't be acquired.
> +		 */
> +		if (flags & CON_NBCON) {
> +			if (!nbcon_kdb_try_acquire(c, &wctxt))
> +				continue;
> +		}
> +
>  		/*
>  		 * Set oops_in_progress to encourage the console drivers to
>  		 * disregard their internal spin locks: in the current calling
> @@ -605,7 +616,14 @@ static void kdb_msg_write(const char *msg, int msg_len)
>  		 * for this calling context.
>  		 */
>  		++oops_in_progress;

Dumb question, but is the oops_in_progress bump still useful with atomic
consoles? Will the console have any spinlocks to disregard or will the
console ownership code already handled any mutual exclusion issues meaning
there should be no spinlocks taking by the atomic write handler?


Thanks

Daniel.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-11 14:38   ` Daniel Thompson
@ 2025-08-11 19:19     ` Marcos Paulo de Souza
  2025-08-26 15:11       ` John Ogness
  0 siblings, 1 reply; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-11 19:19 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Mon, 2025-08-11 at 15:38 +0100, Daniel Thompson wrote:
> Hi Marcos
> 
> No objections, but a couple of questions if I may...

np at all, thanks for looking at the code!

> 
> 
> On Mon, Aug 11, 2025 at 10:32:47AM -0300, Marcos Paulo de Souza
> wrote:
> > Function kdb_msg_write was calling con->write for any found
> > console,
> > but it won't work on NBCON ones. In this case we should acquire the
> > ownership of the console using NBCON_PRIO_EMERGENCY, since printing
> > kdb messages should only be interrupted by a panic. This is done by
> > the
> > nbcon_kdb_{acquire,release} functions.
> 
> Just wanted to check what it means to be "interrupted by a panic"?
> 
> kdb is called from the panic handler but, assuming the serial port is
> run
> syncrhonously when "bad things are happening", the serial port should
> be
> quiet when we enter kdb meaning we can still acquire ownership of the
> console?

TBH I haven't thought about that. I talked with Petr Mladek about it,
and we agreed to have something similar to nbcon_device_try_acquire,
but with a higher priority, to make sure that we would get the context
at this point. But, when thinking about it, since KDB runs on the panic
handler, so we I'm not sure even if we need the _enter_unsafe() call at
this point, since nobody is going to interrupt either way.

About the try_acquire part, I haven't checked about the state of the
console devices when the panic happens, if they drop the ownership of
the console on non-panic CPUs or not, so I'm not sure if this is needed
or not. I'll wait for Petr and/or maybe John to add some info.

> > 
> > At this point, the console is required to use the atomic callback.
> > The
> > console is skipped if the write_atomic callback is not set or if
> > the
> > context could not be acquired. The validation of NBCON is done by
> > the
> > console_is_usable helper. The context is released right after
> > write_atomic finishes.
> > 
> > Suggested-by: Petr Mladek <pmladek@suse.com>
> > Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> > ---
> >  kernel/debug/kdb/kdb_io.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> > index
> > 9b11b10b120cf07e451a7a4d92ce50f9a6c066b2..74f6d4316bdc9d3c4f6d4252b
> > f425e33cce65a87 100644
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -589,12 +589,23 @@ static void kdb_msg_write(const char *msg,
> > int msg_len)
> >  	 */
> >  	cookie = console_srcu_read_lock();
> >  	for_each_console_srcu(c) {
> > -		if (!(console_srcu_read_flags(c) & CON_ENABLED))
> > +		struct nbcon_write_context wctxt = { };
> > +		short flags = console_srcu_read_flags(c);
> > +
> > +		if (!console_is_usable(c, flags, true))
> >  			continue;
> >  		if (c == dbg_io_ops->cons)
> >  			continue;
> > -		if (!c->write)
> > -			continue;
> > +
> > +		/*
> > +		 * Do not continue if the console is NBCON and the
> > context
> > +		 * can't be acquired.
> > +		 */
> > +		if (flags & CON_NBCON) {
> > +			if (!nbcon_kdb_try_acquire(c, &wctxt))
> > +				continue;
> > +		}
> > +
> >  		/*
> >  		 * Set oops_in_progress to encourage the console
> > drivers to
> >  		 * disregard their internal spin locks: in the
> > current calling
> > @@ -605,7 +616,14 @@ static void kdb_msg_write(const char *msg, int
> > msg_len)
> >  		 * for this calling context.
> >  		 */
> >  		++oops_in_progress;
> 
> Dumb question, but is the oops_in_progress bump still useful with
> atomic
> consoles? Will the console have any spinlocks to disregard or will
> the
> console ownership code already handled any mutual exclusion issues
> meaning
> there should be no spinlocks taking by the atomic write handler?
> 

IIUC, since we can have multiple consoles, and some of them are NB and
others  aren't, I believe that this oops_in_progress is still useful.

> 
> 
> Thanks
> 
> Daniel.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/3] Handle NBCON consoles on KDB
  2025-08-11 13:32 [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
                   ` (2 preceding siblings ...)
  2025-08-11 13:32 ` [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
@ 2025-08-12 17:14 ` Marcos Paulo de Souza
  3 siblings, 0 replies; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-12 17:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport

On Mon, 2025-08-11 at 10:32 -0300, Marcos Paulo de Souza wrote:
> After the proposed patches [1], it was suggested to start using
> console_is_usable instead of checking if a console is enabled. It was
> noticed that KDB was always calling con->write method, but this
> callback is not set for NBCON consoles.
> 
> To fix this usecase, export console_is_usable and add new nbcon code
> to
> acquire a context that KDB needs in order to call ->write_atomic.
> After
> these patches are accepted I'm planning to go back to [1] again to
> use
> the function expected here.
> 
> I did the tests using qemu and reapplying commit f79b163c4231
> ('Revert "serial: 8250: Switch to nbcon console"') created originally
> by
> John, just to exercise the common 8250 serial from qemu. The commit
> can
> be checked on [2]. I had to solve some conflicts since the code has
> been
> reworked after the commit was reverted.
> 
> Without these patches, NBCON consoles won't receive the kgdb
> messages. I
> tested using [2], and it works after this patchset.
> 
> Thanks to all reviewers of the patches posted on [1]! I hope this is
> the
> first step into implementing all the changes suggested in that
> patchset.
> 
> [1]:
> https://lore.kernel.org/lkml/20250606-printk-cleanup-part2-v1-0-f427c743dda0@suse.com/
> [2]:
> https://github.com/marcosps/linux/commit/bea249773c9caf56821f9ec06c7f9649e4966c61
> 
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
> Changes in v2:
> - EDITME: describe what is new in this series revision.
> - EDITME: use bulletpoints and terse descriptions.
> - Link to v1:
> https://lore.kernel.org/r/20250713-nbcon-kgdboc-v1-0-51eccd9247a8@suse.com

This patchset wasn't thought as a v2 of [1], but somehow I messed up
with my b4 setup, and it's like this... ignore the v2 mark :)

> 
> ---
> Marcos Paulo de Souza (3):
>       printk: nbcon: Export console_is_usable
>       printk: nbcon: Introduce KDB helpers
>       kdb: Adapt kdb_msg_write to work with NBCON consoles
> 
>  include/linux/console.h   | 50
> +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/debug/kdb/kdb_io.c | 26 ++++++++++++++++++++----
>  kernel/printk/internal.h  | 41 -------------------------------------
> -
>  kernel/printk/nbcon.c     | 26 ++++++++++++++++++++++++
>  4 files changed, 98 insertions(+), 45 deletions(-)
> ---
> base-commit: bea249773c9caf56821f9ec06c7f9649e4966c61
> change-id: 20250713-nbcon-kgdboc-efcfc37fde46
> 
> Best regards,

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/3] printk: nbcon: Export console_is_usable
  2025-08-11 13:32 ` [PATCH v2 1/3] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
@ 2025-08-20 15:03   ` Petr Mladek
  0 siblings, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2025-08-20 15:03 UTC (permalink / raw)
  To: Marcos Paulo de Souza
  Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Mon 2025-08-11 10:32:45, Marcos Paulo de Souza wrote:
> The helper will be used on KDB code in the next commits.
> 
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
>  include/linux/console.h  | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  kernel/printk/internal.h | 41 -----------------------------------------
>  2 files changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 8f10d0a85bb4536e4b0dda4e8ccbdf87978bbb4a..67af483574727c00eea1d5a1eacc994755c92607 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -612,6 +654,8 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
>  static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
>  static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
>  static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
> +static inline bool console_is_usable(struct console *con, short flags,
> +				     bool use_atomic) { return false; }

The patch should also remove the duplicated definition in
kernel/printk/internal.h.

>  #endif
>  
>  extern int console_set_on_cmdline;

Otherwise, it looks good.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers
  2025-08-11 13:32 ` [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
@ 2025-08-26 14:30   ` John Ogness
  2025-08-29 12:30     ` Marcos Paulo de Souza
  2025-08-28 15:26   ` Petr Mladek
  1 sibling, 1 reply; 18+ messages in thread
From: John Ogness @ 2025-08-26 14:30 UTC (permalink / raw)
  To: Marcos Paulo de Souza, Greg Kroah-Hartman, Petr Mladek,
	Steven Rostedt, Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza

On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> These helpers will be used when calling console->write_atomic on
> KDB code in the next patch. It's basically the same implementaion
> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> acquiring the context.
>
> For release, differently from nbcon_device_release, we don't need to
> flush the console, since all CPUs are stopped when KDB is active.
>
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
>  include/linux/console.h |  6 ++++++
>  kernel/printk/nbcon.c   | 26 ++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa467d8be43761cf756 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
>  extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
>  extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
>  extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
> +extern bool nbcon_kdb_try_acquire(struct console *con,
> +				  struct nbcon_write_context *wctxt);
> +extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
>  
>  /*
>   * Check if the given console is currently capable and allowed to print
> @@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
>  static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
>  static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
>  static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
> +static inline bool nbcon_kdb_try_acquire(struct console *con,
> +					 struct nbcon_write_context *wctxt) { return false; }
> +static inline void nbcon_kdb_release(struct console *con) { }
>  static inline bool console_is_usable(struct console *con, short flags,
>  				     bool use_atomic) { return false; }
>  #endif
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 646801813415f0abe40cabf2f28ca9e30664f028..79d8c7437806119ad9787ddc48382dc2c86c23c3 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console *con)
>  	console_srcu_read_unlock(cookie);
>  }
>  EXPORT_SYMBOL_GPL(nbcon_device_release);
> +
> +bool nbcon_kdb_try_acquire(struct console *con,
> +			   struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	memset(ctxt, 0, sizeof(*ctxt));
> +	ctxt->console = con;
> +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> +
> +	if (!nbcon_context_try_acquire(ctxt, false))
> +		return false;
> +
> +	if (!nbcon_context_enter_unsafe(ctxt))
> +		return false;
> +
> +	return true;
> +}
> +
> +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	nbcon_context_exit_unsafe(ctxt);
> +	nbcon_context_release(ctxt);

If nbcon_context_exit_unsafe() fails, the current task is no longer the
owner and thus a release is not needed. I would prefer:

	if (nbcon_context_exit_unsafe(ctxt))
		nbcon_context_release(ctxt);

or

	if (!nbcon_context_exit_unsafe(ctxt))
		return;

	nbcon_context_release(ctxt);

You can find this same pattern in nbcon_device_release().

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-11 19:19     ` Marcos Paulo de Souza
@ 2025-08-26 15:11       ` John Ogness
  2025-08-29 13:15         ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: John Ogness @ 2025-08-26 15:11 UTC (permalink / raw)
  To: Marcos Paulo de Souza, Daniel Thompson
  Cc: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> On Mon, 2025-08-11 at 15:38 +0100, Daniel Thompson wrote:
>> On Mon, Aug 11, 2025 at 10:32:47AM -0300, Marcos Paulo de Souza
>> wrote:
>> > Function kdb_msg_write was calling con->write for any found
>> > console, but it won't work on NBCON ones. In this case we should
>> > acquire the ownership of the console using NBCON_PRIO_EMERGENCY,
>> > since printing kdb messages should only be interrupted by a
>> > panic. This is done by the nbcon_kdb_{acquire,release} functions.
>> 
>> Just wanted to check what it means to be "interrupted by a panic"?
>> 
>> kdb is called from the panic handler but, assuming the serial port is
>> run syncrhonously when "bad things are happening", the serial port
>> should be quiet when we enter kdb meaning we can still acquire
>> ownership of the console?
>
> TBH I haven't thought about that. I talked with Petr Mladek about it,
> and we agreed to have something similar to nbcon_device_try_acquire,
> but with a higher priority, to make sure that we would get the context
> at this point. But, when thinking about it, since KDB runs on the panic
> handler, so we I'm not sure even if we need the _enter_unsafe() call at
> this point, since nobody is going to interrupt either way.

Well, kdb can also run when not in panic. In that case, if a panic
occurs while using kdb, those panic messages should be printed directly
on the consoles.

Also be aware that the kdb interface is using dbg_io_ops->write_char()
for printing and it will ignore a console that matches
dbg_io_ops->cons. So there is no concern about the kdb interface
conflicting with the nbcon console. This is just about the mirrored kdb
output.

> About the try_acquire part, I haven't checked about the state of the
> console devices when the panic happens, if they drop the ownership of
> the console on non-panic CPUs or not, so I'm not sure if this is needed
> or not. I'll wait for Petr and/or maybe John to add some info.

If any context owned the console and is in an unsafe section while being
interrupted by kdb (regardless if panic or not), then
nbcon_kdb_try_acquire() will fail and the mirrored kdb output will not
be visible on that console.

I am not sure how important it is that the output is mirrored in this
case. A hostile takeover is not acceptable. And implementing some sort
of delay so that the current owner has a chance to release the ownership
(similar to what was attempted here [0]) is not only complicated, but
has its own set of problems.

Currently there is no mirrored output for nbcon consoles. With this
series it is at least possible.

BTW, in order for CPU switching during panic to be able to mirror output
on nbcon consoles, an additional exception must be added to
nbcon_context_try_acquire_direct(). It would look like this:

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 79d8c74378061..2c168eaf378ed 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -10,6 +10,7 @@
 #include <linux/export.h>
 #include <linux/init.h>
 #include <linux/irqflags.h>
+#include <linux/kgdb.h>
 #include <linux/kthread.h>
 #include <linux/minmax.h>
 #include <linux/percpu.h>
@@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 		 * Panic does not imply that the console is owned. However,
 		 * since all non-panic CPUs are stopped during panic(), it
 		 * is safer to have them avoid gaining console ownership.
+		 * The only exception is if kgdb is active, which may print
+		 * from multiple CPUs during a panic.
 		 *
 		 * If this acquire is a reacquire (and an unsafe takeover
 		 * has not previously occurred) then it is allowed to attempt
@@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 		 * interrupted by the panic CPU while printing.
 		 */
 		if (other_cpu_in_panic() &&
+		    atomic_read(&kgdb_active) == -1 &&
 		    (!is_reacquire || cur->unsafe_takeover)) {
 			return -EPERM;
 		}

>> > @@ -605,7 +616,14 @@ static void kdb_msg_write(const char *msg, int msg_len)
>> > 		 * for this calling context.
>> > 		 */
>> > 		++oops_in_progress;
>> > -		c->write(c, msg, msg_len);
>> > +		if (flags & CON_NBCON) {
>> > +			wctxt.outbuf = (char *)msg;
>> > +			wctxt.len = msg_len;
>> > +			c->write_atomic(c, &wctxt);
>> > +			nbcon_kdb_release(&wctxt);
>> > +		} else {
>> > +			c->write(c, msg, msg_len);
>> > +		}
>> > 		--oops_in_progress;
>> > 		touch_nmi_watchdog();
>> > 	}
>> 
>> Dumb question, but is the oops_in_progress bump still useful with
>> atomic consoles? Will the console have any spinlocks to disregard or
>> will the console ownership code already handled any mutual exclusion
>> issues meaning there should be no spinlocks taking by the atomic
>> write handler?
>
> IIUC, since we can have multiple consoles, and some of them are NB and
> others aren't, I believe that this oops_in_progress is still useful.

Correct, but only for legacy consoles. Please move the @oops_in_progress
increment/decrement to only be around the c->write() call. This makes it
clear that the hack is only "useful" for the legacy consoles.

John

[0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers
  2025-08-11 13:32 ` [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
  2025-08-26 14:30   ` John Ogness
@ 2025-08-28 15:26   ` Petr Mladek
  2025-08-29 12:33     ` Marcos Paulo de Souza
  1 sibling, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-08-28 15:26 UTC (permalink / raw)
  To: Marcos Paulo de Souza
  Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Mon 2025-08-11 10:32:46, Marcos Paulo de Souza wrote:
> These helpers will be used when calling console->write_atomic on
> KDB code in the next patch. It's basically the same implementaion
> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> acquiring the context.
> 
> For release, differently from nbcon_device_release, we don't need to
> flush the console, since all CPUs are stopped when KDB is active.

I thought this when we were discussion the code, especially the
comment in

static void kdb_msg_write(const char *msg, int msg_len)
{
[...]

	/*
	 * The console_srcu_read_lock() only provides safe console list
	 * traversal. The use of the ->write() callback relies on all other
	 * CPUs being stopped at the moment and console drivers being able to
	 * handle reentrance when @oops_in_progress is set.


But I see that kdb_msg_write() is called from vkdb_printf() and
there is the following synchronization:

int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
{
[...]

	/* Serialize kdb_printf if multiple cpus try to write at once.
	 * But if any cpu goes recursive in kdb, just print the output,
	 * even if it is interleaved with any other text.
	 */
	local_irq_save(flags);
	this_cpu = smp_processor_id();
	for (;;) {
		old_cpu = cmpxchg(&kdb_printf_cpu, -1, this_cpu);
		if (old_cpu == -1 || old_cpu == this_cpu)
			break;

		cpu_relax();
	}

It suggests that the code might be used when other CPUs are still
running.

And for example, kgdb_panic(buf) is called in vpanic() before
the other CPUs are stopped.


My opinion:

It might make sense to flush the console after all. But probably
only the particular console, see below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console *con)
>  	console_srcu_read_unlock(cookie);
>  }
>  EXPORT_SYMBOL_GPL(nbcon_device_release);
> +

The function must be called only in the very specific kdb
context, so it would deserve a comment. Inspired by
nbcon_device_try_acquire():

/**
 * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe
 *				section, and initialized nbcon write context
 * @con:	The nbcon console to acquire
 * @wctxt:	The nbcon write context to be used on success
 *
 * Context:	Under console_srcu_read_lock() for emiting a single kdb message
 *		using the given con->write_atomic() callback. Can be called
 *		only when the console is usable at the moment.
 *
 * Return:	True if the console was acquired. False otherwise.
 *
 * kdb emits messages on consoles registered for printk() without
 * storing them into the ring buffer. It has to acquire the console
 * ownerhip so that is could call con->write_atomic() callback a safe way.
 *
 * This function acquires the nbcon console using priority NBCON_PRIO_EMERGENCY
 * and marks it unsafe for handover/takeover.
 */

> +bool nbcon_kdb_try_acquire(struct console *con,
> +			   struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	memset(ctxt, 0, sizeof(*ctxt));
> +	ctxt->console = con;
> +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> +
> +	if (!nbcon_context_try_acquire(ctxt, false))
> +		return false;
> +
> +	if (!nbcon_context_enter_unsafe(ctxt))
> +		return false;
> +
> +	return true;
> +}
> +

It deserves a comment as well, see below:

> +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +
> +	nbcon_context_exit_unsafe(ctxt);
> +	nbcon_context_release(ctxt);

I agree with John that the _release() should be called only when
exit_unsafe() succeeded.

Also it might make sense to flush the console. I would do something
like:


/**
 * nbcon_kdb_release - Exit unsafe section and release the nbcon console
 *
 * @wctxt:	The nbcon write context intialized by a succesful
 *	nbcon_kdb_try_acquire()
 *
 * Context:	Under console_srcu_read_lock() for emiting a single kdb message
 *		using the given con->write_atomic() callback. Can be called
 *		only when the console is usable at the moment.
 */
void nbcon_kdb_release(struct nbcon_write_context *wctxt)
{
	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);

	nbcon_context_exit_unsafe(ctxt);
	nbcon_context_release(ctxt);

	if (!nbcon_context_exit_unsafe(ctxt))
		return;

	nbcon_context_release(ctxt);

	/*
	 * Flush any new printk() messages added when the console was blocked.
	 * Only the console used by the given write context was	blocked.
	 * The console was locked only when the write_atomic() callback
	 * was usable.
	 */
	__nbcon_atomic_flush_pending_con(ctxt->console, prb_next_reserve_seq(prb), false);


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers
  2025-08-26 14:30   ` John Ogness
@ 2025-08-29 12:30     ` Marcos Paulo de Souza
  0 siblings, 0 replies; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-29 12:30 UTC (permalink / raw)
  To: John Ogness, Greg Kroah-Hartman, Petr Mladek, Steven Rostedt,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson
  Cc: linux-kernel, kgdb-bugreport

On Tue, 2025-08-26 at 16:36 +0206, John Ogness wrote:
> On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > These helpers will be used when calling console->write_atomic on
> > KDB code in the next patch. It's basically the same implementaion
> > as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> > acquiring the context.
> > 
> > For release, differently from nbcon_device_release, we don't need
> > to
> > flush the console, since all CPUs are stopped when KDB is active.
> > 
> > Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> > ---
> >  include/linux/console.h |  6 ++++++
> >  kernel/printk/nbcon.c   | 26 ++++++++++++++++++++++++++
> >  2 files changed, 32 insertions(+)
> > 
> > diff --git a/include/linux/console.h b/include/linux/console.h
> > index
> > 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa46
> > 7d8be43761cf756 100644
> > --- a/include/linux/console.h
> > +++ b/include/linux/console.h
> > @@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct
> > nbcon_write_context *wctxt);
> >  extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
> >  extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
> >  extern void nbcon_reacquire_nobuf(struct nbcon_write_context
> > *wctxt);
> > +extern bool nbcon_kdb_try_acquire(struct console *con,
> > +				  struct nbcon_write_context
> > *wctxt);
> > +extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
> >  
> >  /*
> >   * Check if the given console is currently capable and allowed to
> > print
> > @@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct
> > nbcon_write_context *wctxt) { return
> >  static inline bool nbcon_enter_unsafe(struct nbcon_write_context
> > *wctxt) { return false; }
> >  static inline bool nbcon_exit_unsafe(struct nbcon_write_context
> > *wctxt) { return false; }
> >  static inline void nbcon_reacquire_nobuf(struct
> > nbcon_write_context *wctxt) { }
> > +static inline bool nbcon_kdb_try_acquire(struct console *con,
> > +					 struct
> > nbcon_write_context *wctxt) { return false; }
> > +static inline void nbcon_kdb_release(struct console *con) { }
> >  static inline bool console_is_usable(struct console *con, short
> > flags,
> >  				     bool use_atomic) { return
> > false; }
> >  #endif
> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> > index
> > 646801813415f0abe40cabf2f28ca9e30664f028..79d8c7437806119ad9787ddc4
> > 8382dc2c86c23c3 100644
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console
> > *con)
> >  	console_srcu_read_unlock(cookie);
> >  }
> >  EXPORT_SYMBOL_GPL(nbcon_device_release);
> > +
> > +bool nbcon_kdb_try_acquire(struct console *con,
> > +			   struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	memset(ctxt, 0, sizeof(*ctxt));
> > +	ctxt->console = con;
> > +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> > +
> > +	if (!nbcon_context_try_acquire(ctxt, false))
> > +		return false;
> > +
> > +	if (!nbcon_context_enter_unsafe(ctxt))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	nbcon_context_exit_unsafe(ctxt);
> > +	nbcon_context_release(ctxt);
> 
> If nbcon_context_exit_unsafe() fails, the current task is no longer
> the
> owner and thus a release is not needed. I would prefer:
> 
> 	if (nbcon_context_exit_unsafe(ctxt))
> 		nbcon_context_release(ctxt);
> 
> or
> 
> 	if (!nbcon_context_exit_unsafe(ctxt))
> 		return;
> 
> 	nbcon_context_release(ctxt);
> 
> You can find this same pattern in nbcon_device_release().

Thanks for spotting this issue. Fixed localy.

> 
> John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers
  2025-08-28 15:26   ` Petr Mladek
@ 2025-08-29 12:33     ` Marcos Paulo de Souza
  0 siblings, 0 replies; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-29 12:33 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Thu, 2025-08-28 at 17:26 +0200, Petr Mladek wrote:
> On Mon 2025-08-11 10:32:46, Marcos Paulo de Souza wrote:
> > These helpers will be used when calling console->write_atomic on
> > KDB code in the next patch. It's basically the same implementaion
> > as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> > acquiring the context.
> > 
> > For release, differently from nbcon_device_release, we don't need
> > to
> > flush the console, since all CPUs are stopped when KDB is active.
> 
> I thought this when we were discussion the code, especially the
> comment in
> 
> static void kdb_msg_write(const char *msg, int msg_len)
> {
> [...]
> 
> 	/*
> 	 * The console_srcu_read_lock() only provides safe console
> list
> 	 * traversal. The use of the ->write() callback relies on
> all other
> 	 * CPUs being stopped at the moment and console drivers
> being able to
> 	 * handle reentrance when @oops_in_progress is set.
> 
> 
> But I see that kdb_msg_write() is called from vkdb_printf() and
> there is the following synchronization:
> 
> int vkdb_printf(enum kdb_msgsrc src, const char *fmt, va_list ap)
> {
> [...]
> 
> 	/* Serialize kdb_printf if multiple cpus try to write at
> once.
> 	 * But if any cpu goes recursive in kdb, just print the
> output,
> 	 * even if it is interleaved with any other text.
> 	 */
> 	local_irq_save(flags);
> 	this_cpu = smp_processor_id();
> 	for (;;) {
> 		old_cpu = cmpxchg(&kdb_printf_cpu, -1, this_cpu);
> 		if (old_cpu == -1 || old_cpu == this_cpu)
> 			break;
> 
> 		cpu_relax();
> 	}
> 
> It suggests that the code might be used when other CPUs are still
> running.
> 
> And for example, kgdb_panic(buf) is called in vpanic() before
> the other CPUs are stopped.
> 
> 
> My opinion:
> 
> It might make sense to flush the console after all. But probably
> only the particular console, see below.

Thanks for being so meticulous and double checking the solution.

> 
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -1855,3 +1855,29 @@ void nbcon_device_release(struct console
> > *con)
> >  	console_srcu_read_unlock(cookie);
> >  }
> >  EXPORT_SYMBOL_GPL(nbcon_device_release);
> > +
> 
> The function must be called only in the very specific kdb
> context, so it would deserve a comment. Inspired by
> nbcon_device_try_acquire():

Yeah... I planned to add documentation on the new functions, but
somehow I forgot until now. I'll take you suggestions and add to the
functions, thanks a lot!

> 
> /**
>  * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe
>  *				section, and initialized nbcon write
> context
>  * @con:	The nbcon console to acquire
>  * @wctxt:	The nbcon write context to be used on success
>  *
>  * Context:	Under console_srcu_read_lock() for emiting a single
> kdb message
>  *		using the given con->write_atomic() callback. Can be
> called
>  *		only when the console is usable at the moment.
>  *
>  * Return:	True if the console was acquired. False otherwise.
>  *
>  * kdb emits messages on consoles registered for printk() without
>  * storing them into the ring buffer. It has to acquire the console
>  * ownerhip so that is could call con->write_atomic() callback a safe
> way.
>  *
>  * This function acquires the nbcon console using priority
> NBCON_PRIO_EMERGENCY
>  * and marks it unsafe for handover/takeover.
>  */
> 
> > +bool nbcon_kdb_try_acquire(struct console *con,
> > +			   struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	memset(ctxt, 0, sizeof(*ctxt));
> > +	ctxt->console = con;
> > +	ctxt->prio    = NBCON_PRIO_EMERGENCY;
> > +
> > +	if (!nbcon_context_try_acquire(ctxt, false))
> > +		return false;
> > +
> > +	if (!nbcon_context_enter_unsafe(ctxt))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> 
> It deserves a comment as well, see below:
> 
> > +void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +
> > +	nbcon_context_exit_unsafe(ctxt);
> > +	nbcon_context_release(ctxt);
> 
> I agree with John that the _release() should be called only when
> exit_unsafe() succeeded.

Done.

> 
> Also it might make sense to flush the console. I would do something
> like:
> 
> 
> /**
>  * nbcon_kdb_release - Exit unsafe section and release the nbcon
> console
>  *
>  * @wctxt:	The nbcon write context intialized by a succesful
>  *	nbcon_kdb_try_acquire()
>  *
>  * Context:	Under console_srcu_read_lock() for emiting a single
> kdb message
>  *		using the given con->write_atomic() callback. Can be
> called
>  *		only when the console is usable at the moment.
>  */
> void nbcon_kdb_release(struct nbcon_write_context *wctxt)
> {
> 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> 
> 	nbcon_context_exit_unsafe(ctxt);
> 	nbcon_context_release(ctxt);
> 
> 	if (!nbcon_context_exit_unsafe(ctxt))
> 		return;
> 
> 	nbcon_context_release(ctxt);
> 
> 	/*
> 	 * Flush any new printk() messages added when the console
> was blocked.
> 	 * Only the console used by the given write context
> was	blocked.
> 	 * The console was locked only when the write_atomic()
> callback
> 	 * was usable.
> 	 */
> 	__nbcon_atomic_flush_pending_con(ctxt->console,
> prb_next_reserve_seq(prb), false);

Fixed locally, thanks a lot!

> 
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-26 15:11       ` John Ogness
@ 2025-08-29 13:15         ` Petr Mladek
  2025-08-29 14:01           ` Marcos Paulo de Souza
  2025-08-29 14:12           ` John Ogness
  0 siblings, 2 replies; 18+ messages in thread
From: Petr Mladek @ 2025-08-29 13:15 UTC (permalink / raw)
  To: John Ogness
  Cc: Marcos Paulo de Souza, Daniel Thompson, Greg Kroah-Hartman,
	Steven Rostedt, Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Tue 2025-08-26 17:17:32, John Ogness wrote:
> On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > On Mon, 2025-08-11 at 15:38 +0100, Daniel Thompson wrote:
> >> On Mon, Aug 11, 2025 at 10:32:47AM -0300, Marcos Paulo de Souza
> >> wrote:
> >> > Function kdb_msg_write was calling con->write for any found
> >> > console, but it won't work on NBCON ones. In this case we should
> >> > acquire the ownership of the console using NBCON_PRIO_EMERGENCY,
> >> > since printing kdb messages should only be interrupted by a
> >> > panic. This is done by the nbcon_kdb_{acquire,release} functions.
> >> 
> >> Just wanted to check what it means to be "interrupted by a panic"?
> >> 
> >> kdb is called from the panic handler but, assuming the serial port is
> >> run syncrhonously when "bad things are happening", the serial port
> >> should be quiet when we enter kdb meaning we can still acquire
> >> ownership of the console?
> >
> > TBH I haven't thought about that. I talked with Petr Mladek about it,
> > and we agreed to have something similar to nbcon_device_try_acquire,
> > but with a higher priority, to make sure that we would get the context
> > at this point. But, when thinking about it, since KDB runs on the panic
> > handler, so we I'm not sure even if we need the _enter_unsafe() call at
> > this point, since nobody is going to interrupt either way.
> 
> Well, kdb can also run when not in panic. In that case, if a panic
> occurs while using kdb, those panic messages should be printed directly
> on the consoles.
> 
> Also be aware that the kdb interface is using dbg_io_ops->write_char()
> for printing and it will ignore a console that matches
> dbg_io_ops->cons. So there is no concern about the kdb interface
> conflicting with the nbcon console. This is just about the mirrored kdb
> output.

I was a bit confused by the "mirrored kdb output". It was a new term ;-)
Let me try to summarize how I see it. Take it with a caution because I
am not much familiar with the kdb code.

vkdb_printf() API
-----------------

It serializes calls using a custom recursive CPU lock (kdb_printf_cpu)

It shows/stores the messages using several interfaces:

     a) gdbstub_msg_write() probably shows the message inside
	an attached gdb debugger.

     b) kdb_msg_write() shows the message on the console where kdb
	is attached via dbg_io_ops->write_char()

     c) kdb_msg_write() also writes the message on all other consoles
	registered by printk. I guess that this is what John meant
	by mirroring.

     d) printk()/pr_info() stores the messages into printk log buffer
	and eventually shows them on printk consoles. It is called
	only when @logging is enabled

Note that either gdbstub_msg_write() or kdb_msg_write() is called.
Also I guess that @logging can be enabled only when gdb debugger is
attached. kdbgetintenv() most likely returns KDB_NOTENV when gdb is
not attached.

  => vkdb_printf() shows the message on only once on consoles:

      + via kdb_msg_write() when gdb is not attached

      + via printk() when gdb is attached and logging is enabled
	by setting LOGGING= environment variable


vkdb_printf() context
---------------------

vkdb_printf() is called when entering or being inside kdb code.
I guess that we might end there via:

   + int3 (break point)
   + kgdb_panic() called in panic()
   + NMI ???

I think that it might be called even before kdb synchronizes
all CPUs into some quiescent mode. Otherwise, the custom CPU
synchronization would not be needed.

For example, I guess that the CPUs are not synchronized here:

void kgdb_panic(const char *msg)
{
[...]
	if (dbg_kdb_mode)
		kdb_printf("PANIC: %s\n", msg);


But I might be wrong. Also it seems that kgdb_cpu_enter() makes sure
that all CPUs get synchronized before entering the main loop...


Serialization of vkdb_printf() vs. printk()
===========================================

Let's look at the various interfaces showing the messages:

    a) gdbstub_msg_write() does not conflict with printk()
       at all.

    b) It seems that kdb_msg_write() -> dbg_io_ops->write_char()
       uses some special API which tries to check whether the device
       is in a safe state and eventually waits for the safe state (pooling).

       It seems to be a one way synchronization. It "guarantees" that
       kdb would start write only when safe. But it does not block
       the other side.

       It might be safe when the other side is blocked which likely
       happens in kgdb_cpu_enter().


     c) kdb_msg_write() -> con->write()/con->write_atomic(), where

	  + con->write() is the legacy console callback. It is
	    internally synchronized via some lock, e.g. port->lock.

	    But kdb_msg_write() increments @oops_in_progress so that
	    the internal lock is basically ignored.

	      => It looks like a try-and-hope approach used also by panic().


	  + con->write_atomic() is nbcon console callback. It is going
	    to be synchronized via the new nbcon_kdb_try_acquire()

	       => The output won't be guaranteed because try_acquire()
		  might fail. But it should be safe.


      d) printk()/pr_info() is synchronized against other printk() out
	 of box. It would show the messages on the consoles only when
	 safe.

	 BTW: It might make sense to call printk()/pr_info() inside
	      nbcon_cpu_emergency_enter()/exit() section so that it
	      would try to flush the messages immediately when safe.


> > About the try_acquire part, I haven't checked about the state of the
> > console devices when the panic happens, if they drop the ownership of
> > the console on non-panic CPUs or not, so I'm not sure if this is needed
> > or not. I'll wait for Petr and/or maybe John to add some info.
> 
> If any context owned the console and is in an unsafe section while being
> interrupted by kdb (regardless if panic or not), then
> nbcon_kdb_try_acquire() will fail and the mirrored kdb output will not
> be visible on that console.
> 
> I am not sure how important it is that the output is mirrored in this
> case. A hostile takeover is not acceptable. And implementing some sort
> of delay so that the current owner has a chance to release the ownership
> (similar to what was attempted here [0]) is not only complicated, but
> has its own set of problems.

The solution proposed in this patch (nbcon_kdb_try_acquire()) looks
acceptable to me. The output is not guaranteed. But is should
hopefully work in most situations.

The great thing is that it would be safe in compare with the legacy
consoles where @oops_in_progress causes ignoring the internal locking.


> Currently there is no mirrored output for nbcon consoles. With this
> series it is at least possible.
> 
> BTW, in order for CPU switching during panic to be able to mirror output
> on nbcon consoles, an additional exception must be added to
> nbcon_context_try_acquire_direct(). It would look like this:

Great idea. I am just not sure about the condition, see below.

> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 79d8c74378061..2c168eaf378ed 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -10,6 +10,7 @@
>  #include <linux/export.h>
>  #include <linux/init.h>
>  #include <linux/irqflags.h>
> +#include <linux/kgdb.h>
>  #include <linux/kthread.h>
>  #include <linux/minmax.h>
>  #include <linux/percpu.h>
> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>  		 * Panic does not imply that the console is owned. However,
>  		 * since all non-panic CPUs are stopped during panic(), it
>  		 * is safer to have them avoid gaining console ownership.
> +		 * The only exception is if kgdb is active, which may print
> +		 * from multiple CPUs during a panic.
>  		 *
>  		 * If this acquire is a reacquire (and an unsafe takeover
>  		 * has not previously occurred) then it is allowed to attempt
> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>  		 * interrupted by the panic CPU while printing.
>  		 */
>  		if (other_cpu_in_panic() &&
> +		    atomic_read(&kgdb_active) == -1 &&

This would likely work for most kgdb_printk() calls. But what about
the one called from kgdb_panic()?

Alternative solution would be to allow it only for the CPU locked
by kdb, something like:

		    READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&

Note that I used READ_ONCE() to guarantee an atomic read. The
condition will fail only when we are inside a code locked by
the kdb_printf_cpu().

An alternative would be smp_load_acquire(&kdb_printf_cpu). But
I think that it is a "too big" hammer and it does not fit here.

> +		    atomic_read(&kgdb_active) == -1 &&
>  		    (!is_reacquire || cur->unsafe_takeover)) {
>  			return -EPERM;
>  		}
> 
> >> > @@ -605,7 +616,14 @@ static void kdb_msg_write(const char *msg, int msg_len)
> >> > 		 * for this calling context.
> >> > 		 */
> >> > 		++oops_in_progress;
> >> > -		c->write(c, msg, msg_len);
> >> > +		if (flags & CON_NBCON) {
> >> > +			wctxt.outbuf = (char *)msg;
> >> > +			wctxt.len = msg_len;
> >> > +			c->write_atomic(c, &wctxt);
> >> > +			nbcon_kdb_release(&wctxt);
> >> > +		} else {
> >> > +			c->write(c, msg, msg_len);
> >> > +		}
> >> > 		--oops_in_progress;
> >> > 		touch_nmi_watchdog();
> >> > 	}
> >> 
> >> Dumb question, but is the oops_in_progress bump still useful with
> >> atomic consoles? Will the console have any spinlocks to disregard or
> >> will the console ownership code already handled any mutual exclusion
> >> issues meaning there should be no spinlocks taking by the atomic
> >> write handler?
> >
> > IIUC, since we can have multiple consoles, and some of them are NB and
> > others aren't, I believe that this oops_in_progress is still useful.
> 
> Correct, but only for legacy consoles. Please move the @oops_in_progress
> increment/decrement to only be around the c->write() call. This makes it
> clear that the hack is only "useful" for the legacy consoles.

I agree.

> John
> 
> [0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de

Sigh, I have already forgotten that we discussed this in the past.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-29 13:15         ` Petr Mladek
@ 2025-08-29 14:01           ` Marcos Paulo de Souza
  2025-08-29 14:12           ` John Ogness
  1 sibling, 0 replies; 18+ messages in thread
From: Marcos Paulo de Souza @ 2025-08-29 14:01 UTC (permalink / raw)
  To: Petr Mladek, John Ogness
  Cc: Daniel Thompson, Greg Kroah-Hartman, Steven Rostedt,
	Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Fri, 2025-08-29 at 15:15 +0200, Petr Mladek wrote:
> On Tue 2025-08-26 17:17:32, John Ogness wrote:
> > On 2025-08-11, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > > On Mon, 2025-08-11 at 15:38 +0100, Daniel Thompson wrote:
> > > > On Mon, Aug 11, 2025 at 10:32:47AM -0300, Marcos Paulo de Souza
> > > > wrote:
> > > > > Function kdb_msg_write was calling con->write for any found
> > > > > console, but it won't work on NBCON ones. In this case we
> > > > > should
> > > > > acquire the ownership of the console using
> > > > > NBCON_PRIO_EMERGENCY,
> > > > > since printing kdb messages should only be interrupted by a
> > > > > panic. This is done by the nbcon_kdb_{acquire,release}
> > > > > functions.
> > > > 
> > > > Just wanted to check what it means to be "interrupted by a
> > > > panic"?
> > > > 
> > > > kdb is called from the panic handler but, assuming the serial
> > > > port is
> > > > run syncrhonously when "bad things are happening", the serial
> > > > port
> > > > should be quiet when we enter kdb meaning we can still acquire
> > > > ownership of the console?
> > > 
> > > TBH I haven't thought about that. I talked with Petr Mladek about
> > > it,
> > > and we agreed to have something similar to
> > > nbcon_device_try_acquire,
> > > but with a higher priority, to make sure that we would get the
> > > context
> > > at this point. But, when thinking about it, since KDB runs on the
> > > panic
> > > handler, so we I'm not sure even if we need the _enter_unsafe()
> > > call at
> > > this point, since nobody is going to interrupt either way.
> > 
> > Well, kdb can also run when not in panic. In that case, if a panic
> > occurs while using kdb, those panic messages should be printed
> > directly
> > on the consoles.
> > 
> > Also be aware that the kdb interface is using dbg_io_ops-
> > >write_char()
> > for printing and it will ignore a console that matches
> > dbg_io_ops->cons. So there is no concern about the kdb interface
> > conflicting with the nbcon console. This is just about the mirrored
> > kdb
> > output.
> 
> I was a bit confused by the "mirrored kdb output". It was a new term
> ;-)
> Let me try to summarize how I see it. Take it with a caution because
> I
> am not much familiar with the kdb code.
> 
> vkdb_printf() API
> -----------------
> 
> It serializes calls using a custom recursive CPU lock
> (kdb_printf_cpu)
> 
> It shows/stores the messages using several interfaces:
> 
>      a) gdbstub_msg_write() probably shows the message inside
> 	an attached gdb debugger.
> 
>      b) kdb_msg_write() shows the message on the console where kdb
> 	is attached via dbg_io_ops->write_char()
> 
>      c) kdb_msg_write() also writes the message on all other consoles
> 	registered by printk. I guess that this is what John meant
> 	by mirroring.
> 
>      d) printk()/pr_info() stores the messages into printk log buffer
> 	and eventually shows them on printk consoles. It is called
> 	only when @logging is enabled
> 
> Note that either gdbstub_msg_write() or kdb_msg_write() is called.
> Also I guess that @logging can be enabled only when gdb debugger is
> attached. kdbgetintenv() most likely returns KDB_NOTENV when gdb is
> not attached.
> 
>   => vkdb_printf() shows the message on only once on consoles:
> 
>       + via kdb_msg_write() when gdb is not attached
> 
>       + via printk() when gdb is attached and logging is enabled
> 	by setting LOGGING= environment variable
> 
> 
> vkdb_printf() context
> ---------------------
> 
> vkdb_printf() is called when entering or being inside kdb code.
> I guess that we might end there via:
> 
>    + int3 (break point)
>    + kgdb_panic() called in panic()
>    + NMI ???
> 
> I think that it might be called even before kdb synchronizes
> all CPUs into some quiescent mode. Otherwise, the custom CPU
> synchronization would not be needed.
> 
> For example, I guess that the CPUs are not synchronized here:
> 
> void kgdb_panic(const char *msg)
> {
> [...]
> 	if (dbg_kdb_mode)
> 		kdb_printf("PANIC: %s\n", msg);
> 
> 
> But I might be wrong. Also it seems that kgdb_cpu_enter() makes sure
> that all CPUs get synchronized before entering the main loop...
> 
> 
> Serialization of vkdb_printf() vs. printk()
> ===========================================
> 
> Let's look at the various interfaces showing the messages:
> 
>     a) gdbstub_msg_write() does not conflict with printk()
>        at all.
> 
>     b) It seems that kdb_msg_write() -> dbg_io_ops->write_char()
>        uses some special API which tries to check whether the device
>        is in a safe state and eventually waits for the safe state
> (pooling).
> 
>        It seems to be a one way synchronization. It "guarantees" that
>        kdb would start write only when safe. But it does not block
>        the other side.
> 
>        It might be safe when the other side is blocked which likely
>        happens in kgdb_cpu_enter().
> 
> 
>      c) kdb_msg_write() -> con->write()/con->write_atomic(), where
> 
> 	  + con->write() is the legacy console callback. It is
> 	    internally synchronized via some lock, e.g. port->lock.
> 
> 	    But kdb_msg_write() increments @oops_in_progress so that
> 	    the internal lock is basically ignored.
> 
> 	      => It looks like a try-and-hope approach used also by
> panic().
> 
> 
> 	  + con->write_atomic() is nbcon console callback. It is
> going
> 	    to be synchronized via the new nbcon_kdb_try_acquire()
> 
> 	       => The output won't be guaranteed because
> try_acquire()
> 		  might fail. But it should be safe.
> 
> 
>       d) printk()/pr_info() is synchronized against other printk()
> out
> 	 of box. It would show the messages on the consoles only
> when
> 	 safe.
> 
> 	 BTW: It might make sense to call printk()/pr_info() inside
> 	      nbcon_cpu_emergency_enter()/exit() section so that it
> 	      would try to flush the messages immediately when safe.

Thanks for the insights!

> 
> > > About the try_acquire part, I haven't checked about the state of
> > > the
> > > console devices when the panic happens, if they drop the
> > > ownership of
> > > the console on non-panic CPUs or not, so I'm not sure if this is
> > > needed
> > > or not. I'll wait for Petr and/or maybe John to add some info.
> > 
> > If any context owned the console and is in an unsafe section while
> > being
> > interrupted by kdb (regardless if panic or not), then
> > nbcon_kdb_try_acquire() will fail and the mirrored kdb output will
> > not
> > be visible on that console.
> > 
> > I am not sure how important it is that the output is mirrored in
> > this
> > case. A hostile takeover is not acceptable. And implementing some
> > sort
> > of delay so that the current owner has a chance to release the
> > ownership
> > (similar to what was attempted here [0]) is not only complicated,
> > but
> > has its own set of problems.
> 
> The solution proposed in this patch (nbcon_kdb_try_acquire()) looks
> acceptable to me. The output is not guaranteed. But is should
> hopefully work in most situations.
> 
> The great thing is that it would be safe in compare with the legacy
> consoles where @oops_in_progress causes ignoring the internal
> locking.

great!

> 
> 
> > Currently there is no mirrored output for nbcon consoles. With this
> > series it is at least possible.
> > 
> > BTW, in order for CPU switching during panic to be able to mirror
> > output
> > on nbcon consoles, an additional exception must be added to
> > nbcon_context_try_acquire_direct(). It would look like this:
> 
> Great idea. I am just not sure about the condition, see below.
> 
> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> > index 79d8c74378061..2c168eaf378ed 100644
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/export.h>
> >  #include <linux/init.h>
> >  #include <linux/irqflags.h>
> > +#include <linux/kgdb.h>
> >  #include <linux/kthread.h>
> >  #include <linux/minmax.h>
> >  #include <linux/percpu.h>
> > @@ -247,6 +248,8 @@ static int
> > nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >  		 * Panic does not imply that the console is owned.
> > However,
> >  		 * since all non-panic CPUs are stopped during
> > panic(), it
> >  		 * is safer to have them avoid gaining console
> > ownership.
> > +		 * The only exception is if kgdb is active, which
> > may print
> > +		 * from multiple CPUs during a panic.
> >  		 *
> >  		 * If this acquire is a reacquire (and an unsafe
> > takeover
> >  		 * has not previously occurred) then it is allowed
> > to attempt
> > @@ -255,6 +258,7 @@ static int
> > nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >  		 * interrupted by the panic CPU while printing.
> >  		 */
> >  		if (other_cpu_in_panic() &&
> > +		    atomic_read(&kgdb_active) == -1 &&
> 
> This would likely work for most kgdb_printk() calls. But what about
> the one called from kgdb_panic()?
> 
> Alternative solution would be to allow it only for the CPU locked
> by kdb, something like:
> 
> 		    READ_ONCE(kdb_printf_cpu) !=
> raw_smp_processor_id() &&
> 
> Note that I used READ_ONCE() to guarantee an atomic read. The
> condition will fail only when we are inside a code locked by
> the kdb_printf_cpu().
> 
> An alternative would be smp_load_acquire(&kdb_printf_cpu). But
> I think that it is a "too big" hammer and it does not fit here.

Do you guys plan to send a separated patch for it later (or before) 
this patchset?

> 
> > +		    atomic_read(&kgdb_active) == -1 &&
> >  		    (!is_reacquire || cur->unsafe_takeover)) {
> >  			return -EPERM;
> >  		}
> > 
> > > > > @@ -605,7 +616,14 @@ static void kdb_msg_write(const char
> > > > > *msg, int msg_len)
> > > > > 		 * for this calling context.
> > > > > 		 */
> > > > > 		++oops_in_progress;
> > > > > -		c->write(c, msg, msg_len);
> > > > > +		if (flags & CON_NBCON) {
> > > > > +			wctxt.outbuf = (char *)msg;
> > > > > +			wctxt.len = msg_len;
> > > > > +			c->write_atomic(c, &wctxt);
> > > > > +			nbcon_kdb_release(&wctxt);
> > > > > +		} else {
> > > > > +			c->write(c, msg, msg_len);
> > > > > +		}
> > > > > 		--oops_in_progress;
> > > > > 		touch_nmi_watchdog();
> > > > > 	}
> > > > 
> > > > Dumb question, but is the oops_in_progress bump still useful
> > > > with
> > > > atomic consoles? Will the console have any spinlocks to
> > > > disregard or
> > > > will the console ownership code already handled any mutual
> > > > exclusion
> > > > issues meaning there should be no spinlocks taking by the
> > > > atomic
> > > > write handler?
> > > 
> > > IIUC, since we can have multiple consoles, and some of them are
> > > NB and
> > > others aren't, I believe that this oops_in_progress is still
> > > useful.
> > 
> > Correct, but only for legacy consoles. Please move the
> > @oops_in_progress
> > increment/decrement to only be around the c->write() call. This
> > makes it
> > clear that the hack is only "useful" for the legacy consoles.
> 
> I agree.

Great, this simplifies the logic by only checking the flags for NBCON
only once.

> 
> > John
> > 
> > [0]
> > https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de
> 
> Sigh, I have already forgotten that we discussed this in the past.
> 
> Best Regards,
> Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-29 13:15         ` Petr Mladek
  2025-08-29 14:01           ` Marcos Paulo de Souza
@ 2025-08-29 14:12           ` John Ogness
  2025-09-01 11:46             ` Petr Mladek
  1 sibling, 1 reply; 18+ messages in thread
From: John Ogness @ 2025-08-29 14:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Marcos Paulo de Souza, Daniel Thompson, Greg Kroah-Hartman,
	Steven Rostedt, Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On 2025-08-29, Petr Mladek <pmladek@suse.com> wrote:
>      c) kdb_msg_write() also writes the message on all other consoles
> 	registered by printk. I guess that this is what John meant
> 	by mirroring.

Yes.

>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index 79d8c74378061..2c168eaf378ed 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -10,6 +10,7 @@
>>  #include <linux/export.h>
>>  #include <linux/init.h>
>>  #include <linux/irqflags.h>
>> +#include <linux/kgdb.h>
>>  #include <linux/kthread.h>
>>  #include <linux/minmax.h>
>>  #include <linux/percpu.h>
>> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>>  		 * Panic does not imply that the console is owned. However,
>>  		 * since all non-panic CPUs are stopped during panic(), it
>>  		 * is safer to have them avoid gaining console ownership.
>> +		 * The only exception is if kgdb is active, which may print
>> +		 * from multiple CPUs during a panic.
>>  		 *
>>  		 * If this acquire is a reacquire (and an unsafe takeover
>>  		 * has not previously occurred) then it is allowed to attempt
>> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>>  		 * interrupted by the panic CPU while printing.
>>  		 */
>>  		if (other_cpu_in_panic() &&
>> +		    atomic_read(&kgdb_active) == -1 &&
>
> This would likely work for most kgdb_printk() calls. But what about
> the one called from kgdb_panic()?

Nice catch.

> Alternative solution would be to allow it only for the CPU locked
> by kdb, something like:
>
> 		    READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&

Yes, I like this.

> Note that I used READ_ONCE() to guarantee an atomic read. The
> condition will fail only when we are inside a code locked by
> the kdb_printf_cpu().

Neither the READ_ONCE() nor any memory barriers are needed because the
only interesting case is when the CPU sees that it is the one stored in
@kdb_printf_cpu. In which case it was the one that did the storing and
the value is always correctly loaded.

>> [0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de
>
> Sigh, I have already forgotten that we discussed this in the past.

After so many years, I do not think there is a printk scenario we have
not discussed. ;-)

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-08-29 14:12           ` John Ogness
@ 2025-09-01 11:46             ` Petr Mladek
  2025-09-01 12:27               ` John Ogness
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2025-09-01 11:46 UTC (permalink / raw)
  To: John Ogness
  Cc: Marcos Paulo de Souza, Daniel Thompson, Greg Kroah-Hartman,
	Steven Rostedt, Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On Fri 2025-08-29 16:18:28, John Ogness wrote:
> On 2025-08-29, Petr Mladek <pmladek@suse.com> wrote:
> >      c) kdb_msg_write() also writes the message on all other consoles
> > 	registered by printk. I guess that this is what John meant
> > 	by mirroring.
> 
> Yes.
> 
> >> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> >> index 79d8c74378061..2c168eaf378ed 100644
> >> --- a/kernel/printk/nbcon.c
> >> +++ b/kernel/printk/nbcon.c
> >> @@ -10,6 +10,7 @@
> >>  #include <linux/export.h>
> >>  #include <linux/init.h>
> >>  #include <linux/irqflags.h>
> >> +#include <linux/kgdb.h>
> >>  #include <linux/kthread.h>
> >>  #include <linux/minmax.h>
> >>  #include <linux/percpu.h>
> >> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >>  		 * Panic does not imply that the console is owned. However,
> >>  		 * since all non-panic CPUs are stopped during panic(), it
> >>  		 * is safer to have them avoid gaining console ownership.
> >> +		 * The only exception is if kgdb is active, which may print
> >> +		 * from multiple CPUs during a panic.
> >>  		 *
> >>  		 * If this acquire is a reacquire (and an unsafe takeover
> >>  		 * has not previously occurred) then it is allowed to attempt
> >> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> >>  		 * interrupted by the panic CPU while printing.
> >>  		 */
> >>  		if (other_cpu_in_panic() &&
> >> +		    atomic_read(&kgdb_active) == -1 &&
> >
> > This would likely work for most kgdb_printk() calls. But what about
> > the one called from kgdb_panic()?
> 
> Nice catch.
> 
> > Alternative solution would be to allow it only for the CPU locked
> > by kdb, something like:
> >
> > 		    READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
> 
> Yes, I like this.
>
> > Note that I used READ_ONCE() to guarantee an atomic read. The
> > condition will fail only when we are inside a code locked by
> > the kdb_printf_cpu().
> 
> Neither the READ_ONCE() nor any memory barriers are needed because the
> only interesting case is when the CPU sees that it is the one stored in
> @kdb_printf_cpu. In which case it was the one that did the storing and
> the value is always correctly loaded.

Let me play the devil advocate for a bit.
What about the following race?

kdb_printf_cpu = -1  (0xffffffff)

CPU 0xff				CPU 0x1

					panic()

printk()
  nbcon_atomic_flush_pending()
     nbcon_context_try_acquire_direct()
	# load low byte of kdb_printf_cpu
	val = 0xff

					vkdb_printf()
					  cmpxchg(&kdb_printf_cpu, ...)
					  kdb_printf_cpu == 0x1

	# load higher byte of kdb_printf_cpu
	val = 0xff

Result: CPU 0xff would be allowed to acquire the nbcon context
	because it thinks that vkdb_printf() got locked on this CPU.

	It is not fully artificial, see
	https://lwn.net/Articles/793253/#Load%20Tearing

The above race is not critical. CPU 0x1 still could wait for CPU 0xff
and acquire the nbcon context later.

But it is something unexpected. I would feel more comfortable if
we used the READ_ONCE() and be on the safe side.

> >> [0] https://lore.kernel.org/lkml/20210803131301.5588-4-john.ogness@linutronix.de
> >
> > Sigh, I have already forgotten that we discussed this in the past.
> 
> After so many years, I do not think there is a printk scenario we have
> not discussed. ;-)

;-)

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles
  2025-09-01 11:46             ` Petr Mladek
@ 2025-09-01 12:27               ` John Ogness
  0 siblings, 0 replies; 18+ messages in thread
From: John Ogness @ 2025-09-01 12:27 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Marcos Paulo de Souza, Daniel Thompson, Greg Kroah-Hartman,
	Steven Rostedt, Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
	Douglas Anderson, linux-kernel, kgdb-bugreport

On 2025-09-01, Petr Mladek <pmladek@suse.com> wrote:
> What about the following race?
>
> kdb_printf_cpu = -1  (0xffffffff)
>
> CPU 0xff				CPU 0x1
>
> 					panic()
>
> printk()
>   nbcon_atomic_flush_pending()
>      nbcon_context_try_acquire_direct()
> 	# load low byte of kdb_printf_cpu
> 	val = 0xff
>
> 					vkdb_printf()
> 					  cmpxchg(&kdb_printf_cpu, ...)
> 					  kdb_printf_cpu == 0x1
>
> 	# load higher byte of kdb_printf_cpu
> 	val = 0xff
>
> Result: CPU 0xff would be allowed to acquire the nbcon context
> 	because it thinks that vkdb_printf() got locked on this CPU.
>
> 	It is not fully artificial, see
> 	https://lwn.net/Articles/793253/#Load%20Tearing
>
> The above race is not critical. CPU 0x1 still could wait for CPU 0xff
> and acquire the nbcon context later.
>
> But it is something unexpected. I would feel more comfortable if
> we used the READ_ONCE() and be on the safe side.

Agreed.

John

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-09-01 12:27 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-11 13:32 [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza
2025-08-11 13:32 ` [PATCH v2 1/3] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
2025-08-20 15:03   ` Petr Mladek
2025-08-11 13:32 ` [PATCH v2 2/3] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
2025-08-26 14:30   ` John Ogness
2025-08-29 12:30     ` Marcos Paulo de Souza
2025-08-28 15:26   ` Petr Mladek
2025-08-29 12:33     ` Marcos Paulo de Souza
2025-08-11 13:32 ` [PATCH v2 3/3] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
2025-08-11 14:38   ` Daniel Thompson
2025-08-11 19:19     ` Marcos Paulo de Souza
2025-08-26 15:11       ` John Ogness
2025-08-29 13:15         ` Petr Mladek
2025-08-29 14:01           ` Marcos Paulo de Souza
2025-08-29 14:12           ` John Ogness
2025-09-01 11:46             ` Petr Mladek
2025-09-01 12:27               ` John Ogness
2025-08-12 17:14 ` [PATCH v2 0/3] Handle NBCON consoles on KDB Marcos Paulo de Souza

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).