* [PATCH v3 1/4] printk: nbcon: Export console_is_usable
2025-09-02 18:33 [PATCH v3 0/4] Handle NBCON consoles on KDB Marcos Paulo de Souza
@ 2025-09-02 18:33 ` Marcos Paulo de Souza
2025-09-05 13:48 ` Petr Mladek
2025-09-02 18:33 ` [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-02 18:33 UTC (permalink / raw)
To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson
Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza
The helper will be used on KDB code in the next commits.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
include/linux/console.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
kernel/printk/internal.h | 44 --------------------------------------------
2 files changed, 44 insertions(+), 44 deletions(-)
diff --git a/include/linux/console.h b/include/linux/console.h
index 8f10d0a85bb4536e4b0dda4e8ccbdf87978bbb4a..67af483574727c00eea1d5a1eacc994755c92607 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -605,6 +605,48 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
+
+/*
+ * Check if the given console is currently capable and allowed to print
+ * records. Note that this function does not consider the current context,
+ * which can also play a role in deciding if @con can be used to print
+ * records.
+ */
+static inline bool console_is_usable(struct console *con, short flags, bool use_atomic)
+{
+ if (!(flags & CON_ENABLED))
+ return false;
+
+ if ((flags & CON_SUSPENDED))
+ return false;
+
+ if (flags & CON_NBCON) {
+ /* The write_atomic() callback is optional. */
+ if (use_atomic && !con->write_atomic)
+ return false;
+
+ /*
+ * For the !use_atomic case, @printk_kthreads_running is not
+ * checked because the write_thread() callback is also used
+ * via the legacy loop when the printer threads are not
+ * available.
+ */
+ } else {
+ if (!con->write)
+ return false;
+ }
+
+ /*
+ * Console drivers may assume that per-cpu resources have been
+ * allocated. So unless they're explicitly marked as being able to
+ * cope (CON_ANYTIME) don't call them until this CPU is officially up.
+ */
+ if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
+ return false;
+
+ return true;
+}
+
#else
static inline void nbcon_cpu_emergency_enter(void) { }
static inline void nbcon_cpu_emergency_exit(void) { }
@@ -612,6 +654,8 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
+static inline bool console_is_usable(struct console *con, short flags,
+ bool use_atomic) { return false; }
#endif
extern int console_set_on_cmdline;
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index ef282001f200fdbbacae3171932bf9f049037a85..bef97f2d11793191280bfb46f7f8b13bf2560351 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -112,47 +112,6 @@ bool nbcon_kthread_create(struct console *con);
void nbcon_kthread_stop(struct console *con);
void nbcon_kthreads_wake(void);
-/*
- * Check if the given console is currently capable and allowed to print
- * records. Note that this function does not consider the current context,
- * which can also play a role in deciding if @con can be used to print
- * records.
- */
-static inline bool console_is_usable(struct console *con, short flags, bool use_atomic)
-{
- if (!(flags & CON_ENABLED))
- return false;
-
- if ((flags & CON_SUSPENDED))
- return false;
-
- if (flags & CON_NBCON) {
- /* The write_atomic() callback is optional. */
- if (use_atomic && !con->write_atomic)
- return false;
-
- /*
- * For the !use_atomic case, @printk_kthreads_running is not
- * checked because the write_thread() callback is also used
- * via the legacy loop when the printer threads are not
- * available.
- */
- } else {
- if (!con->write)
- return false;
- }
-
- /*
- * Console drivers may assume that per-cpu resources have been
- * allocated. So unless they're explicitly marked as being able to
- * cope (CON_ANYTIME) don't call them until this CPU is officially up.
- */
- if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
- return false;
-
- return true;
-}
-
/**
* nbcon_kthread_wake - Wake up a console printing thread
* @con: Console to operate on
@@ -204,9 +163,6 @@ static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *hand
static inline void nbcon_kthread_wake(struct console *con) { }
static inline void nbcon_kthreads_wake(void) { }
-static inline bool console_is_usable(struct console *con, short flags,
- bool use_atomic) { return false; }
-
#endif /* CONFIG_PRINTK */
extern bool have_boot_console;
--
2.50.0
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v3 1/4] printk: nbcon: Export console_is_usable
2025-09-02 18:33 ` [PATCH v3 1/4] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
@ 2025-09-05 13:48 ` Petr Mladek
0 siblings, 0 replies; 22+ messages in thread
From: Petr Mladek @ 2025-09-05 13:48 UTC (permalink / raw)
To: Marcos Paulo de Souza
Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Tue 2025-09-02 15:33:52, Marcos Paulo de Souza wrote:
> The helper will be used on KDB code in the next commits.
>
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Looks good:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers
2025-09-02 18:33 [PATCH v3 0/4] Handle NBCON consoles on KDB Marcos Paulo de Souza
2025-09-02 18:33 ` [PATCH v3 1/4] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
@ 2025-09-02 18:33 ` Marcos Paulo de Souza
2025-09-05 16:21 ` Petr Mladek
2025-09-02 18:33 ` [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context Marcos Paulo de Souza
2025-09-02 18:33 ` [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
3 siblings, 1 reply; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-02 18:33 UTC (permalink / raw)
To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson
Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza
These helpers will be used when calling console->write_atomic on
KDB code in the next patch. It's basically the same implementaion
as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
acquiring the context.
For release we need to flush the console, since some messages could be
added before the context was acquired, as KDB emits the messages using
con->{write,write_atomic} instead of storing them on the ring buffer.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
include/linux/console.h | 6 +++++
kernel/printk/nbcon.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 72 insertions(+)
diff --git a/include/linux/console.h b/include/linux/console.h
index 67af483574727c00eea1d5a1eacc994755c92607..b34c5a0b86303e2fb4583fa467d8be43761cf756 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -605,6 +605,9 @@ extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
extern void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt);
+extern bool nbcon_kdb_try_acquire(struct console *con,
+ struct nbcon_write_context *wctxt);
+extern void nbcon_kdb_release(struct nbcon_write_context *wctxt);
/*
* Check if the given console is currently capable and allowed to print
@@ -654,6 +657,9 @@ static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return
static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
static inline void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt) { }
+static inline bool nbcon_kdb_try_acquire(struct console *con,
+ struct nbcon_write_context *wctxt) { return false; }
+static inline void nbcon_kdb_release(struct console *con) { }
static inline bool console_is_usable(struct console *con, short flags,
bool use_atomic) { return false; }
#endif
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415f0abe40cabf2f28ca9e30664f028..ff218e95a505fd10521c2c4dfb00ad5ec5773953 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1855,3 +1855,69 @@ void nbcon_device_release(struct console *con)
console_srcu_read_unlock(cookie);
}
EXPORT_SYMBOL_GPL(nbcon_device_release);
+
+/**
+ * nbcon_kdb_try_acquire - Try to acquire nbcon console, enter unsafe
+ * section, and initialized nbcon write context
+ * @con: The nbcon console to acquire
+ * @wctxt: The nbcon write context to be used on success
+ *
+ * Context: Under console_srcu_read_lock() for emiting a single kdb message
+ * using the given con->write_atomic() callback. Can be called
+ * only when the console is usable at the moment.
+ *
+ * Return: True if the console was acquired. False otherwise.
+ *
+ * kdb emits messages on consoles registered for printk() without
+ * storing them into the ring buffer. It has to acquire the console
+ * ownerhip so that it could call con->write_atomic() callback a safe way.
+ *
+ * This function acquires the nbcon console using priority NBCON_PRIO_EMERGENCY
+ * and marks it unsafe for handover/takeover.
+ */
+bool nbcon_kdb_try_acquire(struct console *con,
+ struct nbcon_write_context *wctxt)
+{
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+ memset(ctxt, 0, sizeof(*ctxt));
+ ctxt->console = con;
+ ctxt->prio = NBCON_PRIO_EMERGENCY;
+
+ if (!nbcon_context_try_acquire(ctxt, false))
+ return false;
+
+ if (!nbcon_context_enter_unsafe(ctxt))
+ return false;
+
+ return true;
+}
+
+/**
+ * nbcon_kdb_release - Exit unsafe section and release the nbcon console
+ *
+ * @wctxt: The nbcon write context initialized by a successful
+ * nbcon_kdb_try_acquire()
+ *
+ * Context: Under console_srcu_read_lock() for emiting a single kdb message
+ * using the given con->write_atomic() callback. Can be called
+ * only when the console is usable at the moment.
+ */
+void nbcon_kdb_release(struct nbcon_write_context *wctxt)
+{
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+ if (!nbcon_context_exit_unsafe(ctxt))
+ return;
+
+ nbcon_context_release(ctxt);
+
+ /*
+ * Flush any new printk() messages added when the console was blocked.
+ * Only the console used by the given write context was blocked.
+ * The console was locked only when the write_atomic() callback
+ * was usable.
+ */
+ __nbcon_atomic_flush_pending_con(ctxt->console,
+ prb_next_reserve_seq(prb), false);
+}
--
2.50.0
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers
2025-09-02 18:33 ` [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
@ 2025-09-05 16:21 ` Petr Mladek
2025-09-08 12:09 ` John Ogness
0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-09-05 16:21 UTC (permalink / raw)
To: Marcos Paulo de Souza
Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote:
> These helpers will be used when calling console->write_atomic on
> KDB code in the next patch. It's basically the same implementaion
> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> acquiring the context.
>
> For release we need to flush the console, since some messages could be
> added before the context was acquired, as KDB emits the messages using
> con->{write,write_atomic} instead of storing them on the ring buffer.
I am a bit confused by the last paragraph. It is a very long sentence.
Sigh, I wanted to propose a simple and clear alternative. But I ended
in a rabbit hole and with a rather complex text:
<proposal>
The atomic flush in the release function is questionable. vkdb_printf()
is primary called only when other CPUs are quiescent in kdb_main_loop()
and do not call the classic printk(). But, for example, the
write_atomic() callback might print debug messages. Or there is
one kdb_printf() called in kgdb_panic() before other CPUs are
quiescent. So the flush might be useful. Especially, when
the kdb code fails to quiescent the CPUs and returns early.
Let's keep it simple and just call __nbcon_atomic_flush_pending_con().
It uses write_atomic() callback which is used by the locked kdb code
anyway.
The legacy loop (console_trylock()/console_unlock()) is not
usable in kdb context.
It might make sense to trigger the flush via the printk kthread.
But it would not work in panic() where is the only known kdb_printf()
called when other CPUs are not quiescent. So, it does not look
worth it.
</proposal>
What do you think?
My opinion:
Honestly, I think that the flush is not much important because
it will most offten have nothing to do.
I am just not sure whether it is better to have it there
or avoid it. It might be better to remove it after all.
And just document the decision.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers
2025-09-05 16:21 ` Petr Mladek
@ 2025-09-08 12:09 ` John Ogness
2025-09-09 14:21 ` Petr Mladek
0 siblings, 1 reply; 22+ messages in thread
From: John Ogness @ 2025-09-08 12:09 UTC (permalink / raw)
To: Petr Mladek, Marcos Paulo de Souza
Cc: Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky,
Jason Wessel, Daniel Thompson, Douglas Anderson, linux-kernel,
kgdb-bugreport
On 2025-09-05, Petr Mladek <pmladek@suse.com> wrote:
> On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote:
>> These helpers will be used when calling console->write_atomic on
>> KDB code in the next patch. It's basically the same implementaion
>> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
>> acquiring the context.
>>
>> For release we need to flush the console, since some messages could be
>> added before the context was acquired, as KDB emits the messages using
>> con->{write,write_atomic} instead of storing them on the ring buffer.
>
> I am a bit confused by the last paragraph. It is a very long sentence.
>
> Sigh, I wanted to propose a simple and clear alternative. But I ended
> in a rabbit hole and with a rather complex text:
>
> <proposal>
> The atomic flush in the release function is questionable. vkdb_printf()
> is primary called only when other CPUs are quiescent in kdb_main_loop()
> and do not call the classic printk(). But, for example, the
> write_atomic() callback might print debug messages. Or there is
> one kdb_printf() called in kgdb_panic() before other CPUs are
> quiescent. So the flush might be useful. Especially, when
> the kdb code fails to quiescent the CPUs and returns early.
>
> Let's keep it simple and just call __nbcon_atomic_flush_pending_con().
> It uses write_atomic() callback which is used by the locked kdb code
> anyway.
>
> The legacy loop (console_trylock()/console_unlock()) is not
> usable in kdb context.
>
> It might make sense to trigger the flush via the printk kthread.
> But it would not work in panic() where is the only known kdb_printf()
> called when other CPUs are not quiescent. So, it does not look
> worth it.
> </proposal>
>
> What do you think?
>
> My opinion:
>
> Honestly, I think that the flush is not much important because
> it will most offten have nothing to do.
>
> I am just not sure whether it is better to have it there
> or avoid it. It might be better to remove it after all.
> And just document the decision.
IMHO keeping the flush is fine. There are cases where there might be
something to print. And since a printing kthread will get no chance to
print as long as kdb is alive, we should have kdb flushing that
console.
Note that this is the only console that will actually see the new
messages immediately as all the other CPUs and quiesced. For this reason
we probably want to use __nbcon_atomic_flush_pending() to try to flush
_all_ the consoles.
As to the last paragraph of the commit message, I would keep it simple:
After release try to flush all consoles since there may be a backlog of
messages in the ringbuffer. The kthread console printers do not get a
chance to run while kdb is active.
John
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers
2025-09-08 12:09 ` John Ogness
@ 2025-09-09 14:21 ` Petr Mladek
2025-09-09 14:38 ` John Ogness
0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-09-09 14:21 UTC (permalink / raw)
To: John Ogness
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Mon 2025-09-08 14:15:08, John Ogness wrote:
> On 2025-09-05, Petr Mladek <pmladek@suse.com> wrote:
> > On Tue 2025-09-02 15:33:53, Marcos Paulo de Souza wrote:
> >> These helpers will be used when calling console->write_atomic on
> >> KDB code in the next patch. It's basically the same implementaion
> >> as nbcon_device_try_acquire, but using NBCON_PORIO_EMERGENCY when
> >> acquiring the context.
> >>
> >> For release we need to flush the console, since some messages could be
> >> added before the context was acquired, as KDB emits the messages using
> >> con->{write,write_atomic} instead of storing them on the ring buffer.
> >
> > I am a bit confused by the last paragraph. It is a very long sentence.
> >
> > Sigh, I wanted to propose a simple and clear alternative. But I ended
> > in a rabbit hole and with a rather complex text:
> >
> > <proposal>
> > The atomic flush in the release function is questionable. vkdb_printf()
> > is primary called only when other CPUs are quiescent in kdb_main_loop()
> > and do not call the classic printk(). But, for example, the
> > write_atomic() callback might print debug messages. Or there is
> > one kdb_printf() called in kgdb_panic() before other CPUs are
> > quiescent. So the flush might be useful. Especially, when
> > the kdb code fails to quiescent the CPUs and returns early.
> >
> > Let's keep it simple and just call __nbcon_atomic_flush_pending_con().
> > It uses write_atomic() callback which is used by the locked kdb code
> > anyway.
> >
> > The legacy loop (console_trylock()/console_unlock()) is not
> > usable in kdb context.
> >
> > It might make sense to trigger the flush via the printk kthread.
> > But it would not work in panic() where is the only known kdb_printf()
> > called when other CPUs are not quiescent. So, it does not look
> > worth it.
> > </proposal>
> >
> > What do you think?
> >
> > My opinion:
> >
> > Honestly, I think that the flush is not much important because
> > it will most offten have nothing to do.
> >
> > I am just not sure whether it is better to have it there
> > or avoid it. It might be better to remove it after all.
> > And just document the decision.
>
> IMHO keeping the flush is fine. There are cases where there might be
> something to print. And since a printing kthread will get no chance to
> print as long as kdb is alive, we should have kdb flushing that
> console.
>
> Note that this is the only console that will actually see the new
> messages immediately as all the other CPUs and quiesced.
I do not understand this argument. IMHO, this new
try_acquire()/release() API should primary flush only
the console which was (b)locked by this API.
It will be called in kdb_msg_write() which tries to write
to all registered consoles. So the other nbcon consoles will
get flushed when the try_acquire() succeeds on them. And the
legacy conosles were never flushed.
> For this reason
> we probably want to use __nbcon_atomic_flush_pending() to try to flush
> _all_ the consoles.
I would prefer to keep __nbcon_atomic_flush_pending_con().
I mean to flush only the console which was blocked.
Note that we would need to increment oops_in_progress if we wanted
to flush legacy consoles in this context... which would spread
the mess into nbcon code...
> As to the last paragraph of the commit message, I would keep it simple:
>
> After release try to flush all consoles since there may be a backlog of
> messages in the ringbuffer. The kthread console printers do not get a
> chance to run while kdb is active.
I like this text.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers
2025-09-09 14:21 ` Petr Mladek
@ 2025-09-09 14:38 ` John Ogness
0 siblings, 0 replies; 22+ messages in thread
From: John Ogness @ 2025-09-09 14:38 UTC (permalink / raw)
To: Petr Mladek
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On 2025-09-09, Petr Mladek <pmladek@suse.com> wrote:
>> > Honestly, I think that the flush is not much important because
>> > it will most offten have nothing to do.
>> >
>> > I am just not sure whether it is better to have it there
>> > or avoid it. It might be better to remove it after all.
>> > And just document the decision.
>>
>> IMHO keeping the flush is fine. There are cases where there might be
>> something to print. And since a printing kthread will get no chance to
>> print as long as kdb is alive, we should have kdb flushing that
>> console.
>>
>> Note that this is the only console that will actually see the new
>> messages immediately as all the other CPUs and quiesced.
>
> I do not understand this argument. IMHO, this new
> try_acquire()/release() API should primary flush only
> the console which was (b)locked by this API.
>
> It will be called in kdb_msg_write() which tries to write
> to all registered consoles. So the other nbcon consoles will
> get flushed when the try_acquire() succeeds on them. And the
> legacy conosles were never flushed.
Right. I oversaw that it acquires each of the nbcon's.
> I would prefer to keep __nbcon_atomic_flush_pending_con().
> I mean to flush only the console which was blocked.
Agreed.
>> After release try to flush all consoles since there may be a backlog of
>> messages in the ringbuffer. The kthread console printers do not get a
>> chance to run while kdb is active.
>
> I like this text.
OK, but then change it to talk only about the one console.
After release try to flush the console since there may be a backlog of
messages in the ringbuffer. The kthread console printers do not get a
chance to run while kdb is active.
John
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context
2025-09-02 18:33 [PATCH v3 0/4] Handle NBCON consoles on KDB Marcos Paulo de Souza
2025-09-02 18:33 ` [PATCH v3 1/4] printk: nbcon: Export console_is_usable Marcos Paulo de Souza
2025-09-02 18:33 ` [PATCH v3 2/4] printk: nbcon: Introduce KDB helpers Marcos Paulo de Souza
@ 2025-09-02 18:33 ` Marcos Paulo de Souza
2025-09-05 14:52 ` John Ogness
2025-09-08 15:14 ` Petr Mladek
2025-09-02 18:33 ` [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
3 siblings, 2 replies; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-02 18:33 UTC (permalink / raw)
To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson
Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza
KDB can interrupt any console to execute the "mirrored printing" at any
time, so add an exception to nbcon_context_try_acquire_direct to allow
to get the context if the current CPU is the same as kdb_printf_cpu.
This change will be necessary for the next patch, which fixes
kdb_msg_write to work with NBCON consoles by calling ->write_atomic on
such consoles. But to print it first needs to acquire the ownership of
the console, so nbcon_context_try_acquire_direct is fixed here.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
kernel/printk/nbcon.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a57d1a149218ecec 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -10,6 +10,7 @@
#include <linux/export.h>
#include <linux/init.h>
#include <linux/irqflags.h>
+#include <linux/kdb.h>
#include <linux/kthread.h>
#include <linux/minmax.h>
#include <linux/percpu.h>
@@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
* Panic does not imply that the console is owned. However,
* since all non-panic CPUs are stopped during panic(), it
* is safer to have them avoid gaining console ownership.
+ * The only exception is if kdb is active, which may print
+ * from multiple CPUs during a panic.
*
* If this acquire is a reacquire (and an unsafe takeover
* has not previously occurred) then it is allowed to attempt
@@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
* interrupted by the panic CPU while printing.
*/
if (other_cpu_in_panic() &&
+ READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
(!is_reacquire || cur->unsafe_takeover)) {
return -EPERM;
}
--
2.50.0
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context
2025-09-02 18:33 ` [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context Marcos Paulo de Souza
@ 2025-09-05 14:52 ` John Ogness
2025-09-05 18:30 ` Marcos Paulo de Souza
2025-09-08 15:23 ` Petr Mladek
2025-09-08 15:14 ` Petr Mladek
1 sibling, 2 replies; 22+ messages in thread
From: John Ogness @ 2025-09-05 14:52 UTC (permalink / raw)
To: Marcos Paulo de Souza, Greg Kroah-Hartman, Petr Mladek,
Steven Rostedt, Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson
Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza
On 2025-09-02, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a57d1a149218ecec 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> * interrupted by the panic CPU while printing.
> */
> if (other_cpu_in_panic() &&
> + READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
This needs some sort of ifdef CONFIG_KGDB_KDB wrapped around it. Maybe
it would be cleaner to have some macro.
#ifdef CONFIG_KGDB_KDB
#define KGDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id())
#else
#define KGDB_IS_ACTIVE() false
#endif
and then something like:
if (other_cpu_in_panic() &&
!KGDB_IS_ACTIVE() &&
(!is_reacquire || cur->unsafe_takeover)) {
return -EPERM;
}
Or however you prefer. We need to compile for !CONFIG_KGDB_KDB.
John
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context
2025-09-05 14:52 ` John Ogness
@ 2025-09-05 18:30 ` Marcos Paulo de Souza
2025-09-08 15:23 ` Petr Mladek
1 sibling, 0 replies; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-05 18:30 UTC (permalink / raw)
To: John Ogness, Greg Kroah-Hartman, Petr Mladek, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson
Cc: linux-kernel, kgdb-bugreport
On Fri, 2025-09-05 at 16:58 +0206, John Ogness wrote:
> On 2025-09-02, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> > index
> > ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a
> > 57d1a149218ecec 100644
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -255,6 +258,7 @@ static int
> > nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> > * interrupted by the panic CPU while printing.
> > */
> > if (other_cpu_in_panic() &&
> > + READ_ONCE(kdb_printf_cpu) !=
> > raw_smp_processor_id() &&
>
> This needs some sort of ifdef CONFIG_KGDB_KDB wrapped around it.
> Maybe
> it would be cleaner to have some macro.
Ouch.. that's true!
>
> #ifdef CONFIG_KGDB_KDB
> #define KGDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) ==
> raw_smp_processor_id())
> #else
> #define KGDB_IS_ACTIVE() false
> #endif
>
> and then something like:
>
> if (other_cpu_in_panic() &&
> !KGDB_IS_ACTIVE() &&
> (!is_reacquire || cur->unsafe_takeover)) {
> return -EPERM;
> }
>
> Or however you prefer. We need to compile for !CONFIG_KGDB_KDB.
I like the idea! Let me prepare it for v4.
>
> John
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context
2025-09-05 14:52 ` John Ogness
2025-09-05 18:30 ` Marcos Paulo de Souza
@ 2025-09-08 15:23 ` Petr Mladek
2025-09-09 7:53 ` John Ogness
1 sibling, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-09-08 15:23 UTC (permalink / raw)
To: John Ogness
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Fri 2025-09-05 16:58:34, John Ogness wrote:
> On 2025-09-02, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> > index ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a57d1a149218ecec 100644
> > --- a/kernel/printk/nbcon.c
> > +++ b/kernel/printk/nbcon.c
> > @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> > * interrupted by the panic CPU while printing.
> > */
> > if (other_cpu_in_panic() &&
> > + READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
>
> This needs some sort of ifdef CONFIG_KGDB_KDB wrapped around it. Maybe
> it would be cleaner to have some macro.
Great catch!
> #ifdef CONFIG_KGDB_KDB
> #define KGDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id())
> #else
> #define KGDB_IS_ACTIVE() false
> #endif
I like this. It would fit into include/linux/kdb.h which already
contains the #ifdef CONFIG_KGDB_KDB / #else / #endif sections.
> and then something like:
>
> if (other_cpu_in_panic() &&
> !KGDB_IS_ACTIVE() &&
> (!is_reacquire || cur->unsafe_takeover)) {
> return -EPERM;
> }
>
> Or however you prefer. We need to compile for !CONFIG_KGDB_KDB.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context
2025-09-08 15:23 ` Petr Mladek
@ 2025-09-09 7:53 ` John Ogness
0 siblings, 0 replies; 22+ messages in thread
From: John Ogness @ 2025-09-09 7:53 UTC (permalink / raw)
To: Petr Mladek
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On 2025-09-08, Petr Mladek <pmladek@suse.com> wrote:
> On Fri 2025-09-05 16:58:34, John Ogness wrote:
>> On 2025-09-02, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
>> > diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> > index ff218e95a505fd10521c2c4dfb00ad5ec5773953..352235a0eb4a484caccf86d3a57d1a149218ecec 100644
>> > --- a/kernel/printk/nbcon.c
>> > +++ b/kernel/printk/nbcon.c
>> > @@ -255,6 +258,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>> > * interrupted by the panic CPU while printing.
>> > */
>> > if (other_cpu_in_panic() &&
>> > + READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id() &&
>>
>> This needs some sort of ifdef CONFIG_KGDB_KDB wrapped around it. Maybe
>> it would be cleaner to have some macro.
>
> Great catch!
>
>> #ifdef CONFIG_KGDB_KDB
>> #define KGDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id())
>> #else
>> #define KGDB_IS_ACTIVE() false
>> #endif
>
> I like this. It would fit into include/linux/kdb.h which already
> contains the #ifdef CONFIG_KGDB_KDB / #else / #endif sections.
BTW, if there is such a macro created, it should be KDB_IS_ACTIVE()
rather than KGDB_IS_ACTIVE().
John
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context
2025-09-02 18:33 ` [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context Marcos Paulo de Souza
2025-09-05 14:52 ` John Ogness
@ 2025-09-08 15:14 ` Petr Mladek
1 sibling, 0 replies; 22+ messages in thread
From: Petr Mladek @ 2025-09-08 15:14 UTC (permalink / raw)
To: Marcos Paulo de Souza
Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Tue 2025-09-02 15:33:54, Marcos Paulo de Souza wrote:
> KDB can interrupt any console to execute the "mirrored printing" at any
> time, so add an exception to nbcon_context_try_acquire_direct to allow
> to get the context if the current CPU is the same as kdb_printf_cpu.
>
> This change will be necessary for the next patch, which fixes
> kdb_msg_write to work with NBCON consoles by calling ->write_atomic on
> such consoles. But to print it first needs to acquire the ownership of
> the console, so nbcon_context_try_acquire_direct is fixed here.
>
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -247,6 +248,8 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> * Panic does not imply that the console is owned. However,
> * since all non-panic CPUs are stopped during panic(), it
> * is safer to have them avoid gaining console ownership.
> + * The only exception is if kdb is active, which may print
> + * from multiple CPUs during a panic.
Strictly speaking this is not the only exception. The reacquire is
another one. I would put this into a separate paragraph and write:
* One exception is when kdb is active, which may print
* from multiple CPUs during a panic.
> * If this acquire is a reacquire (and an unsafe takeover
And here start the paragrah with
* Second exception is a reacquire (and an usafe ...
> * has not previously occurred) then it is allowed to attempt
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-02 18:33 [PATCH v3 0/4] Handle NBCON consoles on KDB Marcos Paulo de Souza
` (2 preceding siblings ...)
2025-09-02 18:33 ` [PATCH v3 3/4] printk: nbcon: Allow KDB to acquire the NBCON context Marcos Paulo de Souza
@ 2025-09-02 18:33 ` Marcos Paulo de Souza
2025-09-08 15:51 ` Petr Mladek
3 siblings, 1 reply; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-02 18:33 UTC (permalink / raw)
To: Greg Kroah-Hartman, Petr Mladek, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson
Cc: linux-kernel, kgdb-bugreport, Marcos Paulo de Souza
Function kdb_msg_write was calling con->write for any found console,
but it won't work on NBCON ones. In this case we should acquire the
ownership of the console using NBCON_PRIO_EMERGENCY, since printing
kdb messages should only be interrupted by a panic in the case it was
triggered by sysrq debug option. This is done by the
nbcon_kdb_{acquire,release} functions.
At this point, the console is required to use the atomic callback. The
console is skipped if the write_atomic callback is not set or if the
context could not be acquired. The validation of NBCON is done by the
console_is_usable helper. The context is released right after
write_atomic finishes.
The oops_in_progress handling is only needed in the legacy consoles,
so it was moved around the con->write callback.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
kernel/debug/kdb/kdb_io.c | 46 +++++++++++++++++++++++++++++++---------------
1 file changed, 31 insertions(+), 15 deletions(-)
diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index 9b11b10b120cf07e451a7a4d92ce50f9a6c066b2..47bc31cc71bc84750db5d9304ed75a113cd382bf 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg, int msg_len)
*/
cookie = console_srcu_read_lock();
for_each_console_srcu(c) {
- if (!(console_srcu_read_flags(c) & CON_ENABLED))
+ struct nbcon_write_context wctxt = { };
+ short flags = console_srcu_read_flags(c);
+
+ if (!console_is_usable(c, flags, true))
continue;
if (c == dbg_io_ops->cons)
continue;
- if (!c->write)
- continue;
- /*
- * Set oops_in_progress to encourage the console drivers to
- * disregard their internal spin locks: in the current calling
- * context the risk of deadlock is a bigger problem than risks
- * due to re-entering the console driver. We operate directly on
- * oops_in_progress rather than using bust_spinlocks() because
- * the calls bust_spinlocks() makes on exit are not appropriate
- * for this calling context.
- */
- ++oops_in_progress;
- c->write(c, msg, msg_len);
- --oops_in_progress;
+
+ if (flags & CON_NBCON) {
+ /*
+ * Do not continue if the console is NBCON and the context
+ * can't be acquired.
+ */
+ if (!nbcon_kdb_try_acquire(c, &wctxt))
+ continue;
+
+ wctxt.outbuf = (char *)msg;
+ wctxt.len = msg_len;
+ c->write_atomic(c, &wctxt);
+ nbcon_kdb_release(&wctxt);
+ } else {
+ /*
+ * Set oops_in_progress to encourage the console drivers to
+ * disregard their internal spin locks: in the current calling
+ * context the risk of deadlock is a bigger problem than risks
+ * due to re-entering the console driver. We operate directly on
+ * oops_in_progress rather than using bust_spinlocks() because
+ * the calls bust_spinlocks() makes on exit are not appropriate
+ * for this calling context.
+ */
+ ++oops_in_progress;
+ c->write(c, msg, msg_len);
+ --oops_in_progress;
+ }
touch_nmi_watchdog();
}
console_srcu_read_unlock(cookie);
--
2.50.0
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-02 18:33 ` [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles Marcos Paulo de Souza
@ 2025-09-08 15:51 ` Petr Mladek
2025-09-08 19:27 ` Marcos Paulo de Souza
0 siblings, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-09-08 15:51 UTC (permalink / raw)
To: Marcos Paulo de Souza
Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Tue 2025-09-02 15:33:55, Marcos Paulo de Souza wrote:
> Function kdb_msg_write was calling con->write for any found console,
> but it won't work on NBCON ones. In this case we should acquire the
> ownership of the console using NBCON_PRIO_EMERGENCY, since printing
> kdb messages should only be interrupted by a panic
I would end the paragraph here.
> in the case it was
> triggered by sysrq debug option.
This is not important. Also there are more ways how to trigger
panic(). For example, it might happen by an error in the kdb code.
Or I heard rumors that some HW even had a physical "trigger NMI" button.
> This is done by the
> nbcon_kdb_{acquire,release} functions.
I think that this is more or less obvious.
> At this point, the console is required to use the atomic callback. The
> console is skipped if the write_atomic callback is not set or if the
> context could not be acquired. The validation of NBCON is done by the
> console_is_usable helper. The context is released right after
> write_atomic finishes.
>
> The oops_in_progress handling is only needed in the legacy consoles,
> so it was moved around the con->write callback.
> --- a/kernel/debug/kdb/kdb_io.c
> +++ b/kernel/debug/kdb/kdb_io.c
> @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg, int msg_len)
> */
> cookie = console_srcu_read_lock();
> for_each_console_srcu(c) {
> - if (!(console_srcu_read_flags(c) & CON_ENABLED))
> + struct nbcon_write_context wctxt = { };
> + short flags = console_srcu_read_flags(c);
> +
> + if (!console_is_usable(c, flags, true))
> continue;
> if (c == dbg_io_ops->cons)
> continue;
> - if (!c->write)
> - continue;
> - /*
> - * Set oops_in_progress to encourage the console drivers to
> - * disregard their internal spin locks: in the current calling
> - * context the risk of deadlock is a bigger problem than risks
> - * due to re-entering the console driver. We operate directly on
> - * oops_in_progress rather than using bust_spinlocks() because
> - * the calls bust_spinlocks() makes on exit are not appropriate
> - * for this calling context.
> - */
> - ++oops_in_progress;
> - c->write(c, msg, msg_len);
> - --oops_in_progress;
> +
> + if (flags & CON_NBCON) {
> + /*
> + * Do not continue if the console is NBCON and the context
> + * can't be acquired.
> + */
> + if (!nbcon_kdb_try_acquire(c, &wctxt))
> + continue;
> +
> + wctxt.outbuf = (char *)msg;
> + wctxt.len = msg_len;
I double checked whether we initialized all members of the structure
correctly. And I found that we didn't. We should call here:
nbcon_write_context_set_buf(&wctxt,
&pmsg.pbufs->outbuf[0],
pmsg.outbuf_len);
Sigh, this is easy to miss. I remember that I was not super happy
about this design. But the original code initialized the structure
on a single place...
> + c->write_atomic(c, &wctxt);
> + nbcon_kdb_release(&wctxt);
> + } else {
> + /*
> + * Set oops_in_progress to encourage the console drivers to
> + * disregard their internal spin locks: in the current calling
> + * context the risk of deadlock is a bigger problem than risks
> + * due to re-entering the console driver. We operate directly on
> + * oops_in_progress rather than using bust_spinlocks() because
> + * the calls bust_spinlocks() makes on exit are not appropriate
> + * for this calling context.
> + */
> + ++oops_in_progress;
> + c->write(c, msg, msg_len);
> + --oops_in_progress;
> + }
> touch_nmi_watchdog();
> }
> console_srcu_read_unlock(cookie);
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-08 15:51 ` Petr Mladek
@ 2025-09-08 19:27 ` Marcos Paulo de Souza
2025-09-09 7:57 ` John Ogness
0 siblings, 1 reply; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-08 19:27 UTC (permalink / raw)
To: Petr Mladek
Cc: Greg Kroah-Hartman, Steven Rostedt, John Ogness,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Mon, 2025-09-08 at 17:51 +0200, Petr Mladek wrote:
> On Tue 2025-09-02 15:33:55, Marcos Paulo de Souza wrote:
> > Function kdb_msg_write was calling con->write for any found
> > console,
> > but it won't work on NBCON ones. In this case we should acquire the
> > ownership of the console using NBCON_PRIO_EMERGENCY, since printing
> > kdb messages should only be interrupted by a panic
>
> I would end the paragraph here.
>
> > in the case it was
> > triggered by sysrq debug option.
>
> This is not important. Also there are more ways how to trigger
> panic(). For example, it might happen by an error in the kdb code.
> Or I heard rumors that some HW even had a physical "trigger NMI"
> button.
>
> > This is done by the
> > nbcon_kdb_{acquire,release} functions.
>
> I think that this is more or less obvious.
I'll adjust the commit message per you suggestion.
>
> > At this point, the console is required to use the atomic callback.
> > The
> > console is skipped if the write_atomic callback is not set or if
> > the
> > context could not be acquired. The validation of NBCON is done by
> > the
> > console_is_usable helper. The context is released right after
> > write_atomic finishes.
> >
> > The oops_in_progress handling is only needed in the legacy
> > consoles,
> > so it was moved around the con->write callback.
>
> > --- a/kernel/debug/kdb/kdb_io.c
> > +++ b/kernel/debug/kdb/kdb_io.c
> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg,
> > int msg_len)
> > */
> > cookie = console_srcu_read_lock();
> > for_each_console_srcu(c) {
> > - if (!(console_srcu_read_flags(c) & CON_ENABLED))
> > + struct nbcon_write_context wctxt = { };
> > + short flags = console_srcu_read_flags(c);
> > +
> > + if (!console_is_usable(c, flags, true))
> > continue;
> > if (c == dbg_io_ops->cons)
> > continue;
> > - if (!c->write)
> > - continue;
> > - /*
> > - * Set oops_in_progress to encourage the console
> > drivers to
> > - * disregard their internal spin locks: in the
> > current calling
> > - * context the risk of deadlock is a bigger
> > problem than risks
> > - * due to re-entering the console driver. We
> > operate directly on
> > - * oops_in_progress rather than using
> > bust_spinlocks() because
> > - * the calls bust_spinlocks() makes on exit are
> > not appropriate
> > - * for this calling context.
> > - */
> > - ++oops_in_progress;
> > - c->write(c, msg, msg_len);
> > - --oops_in_progress;
> > +
> > + if (flags & CON_NBCON) {
> > + /*
> > + * Do not continue if the console is NBCON
> > and the context
> > + * can't be acquired.
> > + */
> > + if (!nbcon_kdb_try_acquire(c, &wctxt))
> > + continue;
> > +
> > + wctxt.outbuf = (char *)msg;
> > + wctxt.len = msg_len;
>
> I double checked whether we initialized all members of the structure
> correctly. And I found that we didn't. We should call here:
>
> nbcon_write_context_set_buf(&wctxt,
> &pmsg.pbufs-
> >outbuf[0],
>
> pmsg.outbuf_len);
>
> Sigh, this is easy to miss. I remember that I was not super happy
> about this design. But the original code initialized the structure
> on a single place...
Ok, so I'll need to also export nbcon_write_context_set_buf, since it's
currently static inside kernel/printk/nbcon.c. I'll do it for the next
version.
>
> > + c->write_atomic(c, &wctxt);
> > + nbcon_kdb_release(&wctxt);
> > + } else {
> > + /*
> > + * Set oops_in_progress to encourage the
> > console drivers to
> > + * disregard their internal spin locks: in
> > the current calling
> > + * context the risk of deadlock is a
> > bigger problem than risks
> > + * due to re-entering the console driver.
> > We operate directly on
> > + * oops_in_progress rather than using
> > bust_spinlocks() because
> > + * the calls bust_spinlocks() makes on
> > exit are not appropriate
> > + * for this calling context.
> > + */
> > + ++oops_in_progress;
> > + c->write(c, msg, msg_len);
> > + --oops_in_progress;
> > + }
> > touch_nmi_watchdog();
> > }
> > console_srcu_read_unlock(cookie);
>
> Best Regards,
> Petr
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-08 19:27 ` Marcos Paulo de Souza
@ 2025-09-09 7:57 ` John Ogness
2025-09-09 11:39 ` Marcos Paulo de Souza
2025-09-09 13:48 ` Petr Mladek
0 siblings, 2 replies; 22+ messages in thread
From: John Ogness @ 2025-09-09 7:57 UTC (permalink / raw)
To: Marcos Paulo de Souza, Petr Mladek
Cc: Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky,
Jason Wessel, Daniel Thompson, Douglas Anderson, linux-kernel,
kgdb-bugreport
On 2025-09-08, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
>> > --- a/kernel/debug/kdb/kdb_io.c
>> > +++ b/kernel/debug/kdb/kdb_io.c
>> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg,
>> > int msg_len)
>> > */
>> > cookie = console_srcu_read_lock();
>> > for_each_console_srcu(c) {
>> > - if (!(console_srcu_read_flags(c) & CON_ENABLED))
>> > + struct nbcon_write_context wctxt = { };
>> > + short flags = console_srcu_read_flags(c);
>> > +
>> > + if (!console_is_usable(c, flags, true))
>> > continue;
>> > if (c == dbg_io_ops->cons)
>> > continue;
>> > - if (!c->write)
>> > - continue;
>> > - /*
>> > - * Set oops_in_progress to encourage the console
>> > drivers to
>> > - * disregard their internal spin locks: in the
>> > current calling
>> > - * context the risk of deadlock is a bigger
>> > problem than risks
>> > - * due to re-entering the console driver. We
>> > operate directly on
>> > - * oops_in_progress rather than using
>> > bust_spinlocks() because
>> > - * the calls bust_spinlocks() makes on exit are
>> > not appropriate
>> > - * for this calling context.
>> > - */
>> > - ++oops_in_progress;
>> > - c->write(c, msg, msg_len);
>> > - --oops_in_progress;
>> > +
>> > + if (flags & CON_NBCON) {
>> > + /*
>> > + * Do not continue if the console is NBCON
>> > and the context
>> > + * can't be acquired.
>> > + */
>> > + if (!nbcon_kdb_try_acquire(c, &wctxt))
>> > + continue;
>> > +
>> > + wctxt.outbuf = (char *)msg;
>> > + wctxt.len = msg_len;
>>
>> I double checked whether we initialized all members of the structure
>> correctly. And I found that we didn't. We should call here:
>>
>> nbcon_write_context_set_buf(&wctxt,
>> &pmsg.pbufs-
>> >outbuf[0],
>>
>> pmsg.outbuf_len);
Nice catch.
>> Sigh, this is easy to miss. I remember that I was not super happy
>> about this design. But the original code initialized the structure
>> on a single place...
>
> Ok, so I'll need to also export nbcon_write_context_set_buf, since it's
> currently static inside kernel/printk/nbcon.c. I'll do it for the next
> version.
How about modifying nbcon_kdb_try_acquire() to also take @msg and
@msg_len. Then, on success, @wctxt is already prepared. I do not like
the idea of external code fiddling with the fields.
John
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-09 7:57 ` John Ogness
@ 2025-09-09 11:39 ` Marcos Paulo de Souza
2025-09-09 13:48 ` Petr Mladek
1 sibling, 0 replies; 22+ messages in thread
From: Marcos Paulo de Souza @ 2025-09-09 11:39 UTC (permalink / raw)
To: John Ogness, Petr Mladek
Cc: Greg Kroah-Hartman, Steven Rostedt, Sergey Senozhatsky,
Jason Wessel, Daniel Thompson, Douglas Anderson, linux-kernel,
kgdb-bugreport
On Tue, 2025-09-09 at 10:03 +0206, John Ogness wrote:
> On 2025-09-08, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> > > > --- a/kernel/debug/kdb/kdb_io.c
> > > > +++ b/kernel/debug/kdb/kdb_io.c
> > > > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char
> > > > *msg,
> > > > int msg_len)
> > > > */
> > > > cookie = console_srcu_read_lock();
> > > > for_each_console_srcu(c) {
> > > > - if (!(console_srcu_read_flags(c) &
> > > > CON_ENABLED))
> > > > + struct nbcon_write_context wctxt = { };
> > > > + short flags = console_srcu_read_flags(c);
> > > > +
> > > > + if (!console_is_usable(c, flags, true))
> > > > continue;
> > > > if (c == dbg_io_ops->cons)
> > > > continue;
> > > > - if (!c->write)
> > > > - continue;
> > > > - /*
> > > > - * Set oops_in_progress to encourage the
> > > > console
> > > > drivers to
> > > > - * disregard their internal spin locks: in the
> > > > current calling
> > > > - * context the risk of deadlock is a bigger
> > > > problem than risks
> > > > - * due to re-entering the console driver. We
> > > > operate directly on
> > > > - * oops_in_progress rather than using
> > > > bust_spinlocks() because
> > > > - * the calls bust_spinlocks() makes on exit
> > > > are
> > > > not appropriate
> > > > - * for this calling context.
> > > > - */
> > > > - ++oops_in_progress;
> > > > - c->write(c, msg, msg_len);
> > > > - --oops_in_progress;
> > > > +
> > > > + if (flags & CON_NBCON) {
> > > > + /*
> > > > + * Do not continue if the console is
> > > > NBCON
> > > > and the context
> > > > + * can't be acquired.
> > > > + */
> > > > + if (!nbcon_kdb_try_acquire(c, &wctxt))
> > > > + continue;
> > > > +
> > > > + wctxt.outbuf = (char *)msg;
> > > > + wctxt.len = msg_len;
> > >
> > > I double checked whether we initialized all members of the
> > > structure
> > > correctly. And I found that we didn't. We should call here:
> > >
> > > nbcon_write_context_set_buf(&wctxt,
> > > &pmsg.pbufs-
> > > > outbuf[0],
> > >
> > > pmsg.outbuf_len);
>
> Nice catch.
>
> > > Sigh, this is easy to miss. I remember that I was not super happy
> > > about this design. But the original code initialized the
> > > structure
> > > on a single place...
> >
> > Ok, so I'll need to also export nbcon_write_context_set_buf, since
> > it's
> > currently static inside kernel/printk/nbcon.c. I'll do it for the
> > next
> > version.
>
> How about modifying nbcon_kdb_try_acquire() to also take @msg and
> @msg_len. Then, on success, @wctxt is already prepared. I do not like
> the idea of external code fiddling with the fields.
Hum.. indeed, it's ugly to pass msg and len, but it's better than
exposing more code for a case like this.
>
> John
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-09 7:57 ` John Ogness
2025-09-09 11:39 ` Marcos Paulo de Souza
@ 2025-09-09 13:48 ` Petr Mladek
2025-09-09 14:23 ` John Ogness
1 sibling, 1 reply; 22+ messages in thread
From: Petr Mladek @ 2025-09-09 13:48 UTC (permalink / raw)
To: John Ogness
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Tue 2025-09-09 10:03:05, John Ogness wrote:
> On 2025-09-08, Marcos Paulo de Souza <mpdesouza@suse.com> wrote:
> >> > --- a/kernel/debug/kdb/kdb_io.c
> >> > +++ b/kernel/debug/kdb/kdb_io.c
> >> > @@ -589,24 +589,40 @@ static void kdb_msg_write(const char *msg,
> >> > int msg_len)
> >> > */
> >> > cookie = console_srcu_read_lock();
> >> > for_each_console_srcu(c) {
> >> > - if (!(console_srcu_read_flags(c) & CON_ENABLED))
> >> > + struct nbcon_write_context wctxt = { };
> >> > + short flags = console_srcu_read_flags(c);
> >> > +
> >> > + if (!console_is_usable(c, flags, true))
> >> > continue;
> >> > if (c == dbg_io_ops->cons)
> >> > continue;
> >> > - if (!c->write)
> >> > - continue;
> >> > - /*
> >> > - * Set oops_in_progress to encourage the console
> >> > drivers to
> >> > - * disregard their internal spin locks: in the
> >> > current calling
> >> > - * context the risk of deadlock is a bigger
> >> > problem than risks
> >> > - * due to re-entering the console driver. We
> >> > operate directly on
> >> > - * oops_in_progress rather than using
> >> > bust_spinlocks() because
> >> > - * the calls bust_spinlocks() makes on exit are
> >> > not appropriate
> >> > - * for this calling context.
> >> > - */
> >> > - ++oops_in_progress;
> >> > - c->write(c, msg, msg_len);
> >> > - --oops_in_progress;
> >> > +
> >> > + if (flags & CON_NBCON) {
> >> > + /*
> >> > + * Do not continue if the console is NBCON
> >> > and the context
> >> > + * can't be acquired.
> >> > + */
> >> > + if (!nbcon_kdb_try_acquire(c, &wctxt))
> >> > + continue;
> >> > +
> >> > + wctxt.outbuf = (char *)msg;
> >> > + wctxt.len = msg_len;
> >>
> >> I double checked whether we initialized all members of the structure
> >> correctly. And I found that we didn't. We should call here:
> >>
> >> nbcon_write_context_set_buf(&wctxt,
> >> &pmsg.pbufs-
> >> >outbuf[0],
> >>
> >> pmsg.outbuf_len);
>
> Nice catch.
>
> >> Sigh, this is easy to miss. I remember that I was not super happy
> >> about this design.
I looked for details. I described my concerns at
https://lore.kernel.org/lkml/ZNY5gPNyyw9RTo4X@alley/#t
> >> But the original code initialized the structure
> >> on a single place...
> >
> > Ok, so I'll need to also export nbcon_write_context_set_buf, since it's
> > currently static inside kernel/printk/nbcon.c. I'll do it for the next
> > version.
>
> How about modifying nbcon_kdb_try_acquire() to also take @msg and
> @msg_len. Then, on success, @wctxt is already prepared. I do not like
> the idea of external code fiddling with the fields.
I was thinking about another solution, e.g. an nbcon_wctxt_init()
function. The problem is that wctxt->unsafe_takeover would need
to get updated after acquiring the context. And might be reused
for different consoles, ...
But wait. I do not see any code using wctxt->unsafe_takeover.
It seems that the motivation was that console drivers might
do something else when there was an unsafe_takeover in the past,
see https://lore.kernel.org/lkml/87cyz6ro62.fsf@jogness.linutronix.de/
But it seems that no console driver is using it.
So, I would prefer to remove the "unsafe_takeover" field from
struct nbcon_write_context and keep this kdb code as it is now.
We could always add it back when really needed.
Alternatively, we could create an API which could read this information
from struct wctxt.ctxt.con. But I would create this API only when
there is an user.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-09 13:48 ` Petr Mladek
@ 2025-09-09 14:23 ` John Ogness
2025-09-09 15:03 ` Petr Mladek
0 siblings, 1 reply; 22+ messages in thread
From: John Ogness @ 2025-09-09 14:23 UTC (permalink / raw)
To: Petr Mladek
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On 2025-09-09, Petr Mladek <pmladek@suse.com> wrote:
> The problem is that wctxt->unsafe_takeover would need to get updated
> after acquiring the context. And might be reused for different
> consoles, ...
You are right. I think it is best to make nbcon_write_context_set_buf()
available.
> But wait. I do not see any code using wctxt->unsafe_takeover.
>
> It seems that the motivation was that console drivers might
> do something else when there was an unsafe_takeover in the past,
> see https://lore.kernel.org/lkml/87cyz6ro62.fsf@jogness.linutronix.de/
> But it seems that no console driver is using it.
>
> So, I would prefer to remove the "unsafe_takeover" field from
> struct nbcon_write_context and keep this kdb code as it is now.
No one is using it because the nbcon drivers are still implementing the
"hope and pray" model on unsafe takeovers. The flag is an important part
of the API to allow drivers to maximize their safety.
I think it is too early to remove the flag when there are still so few
nbcon drivers in existance.
John
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3 4/4] kdb: Adapt kdb_msg_write to work with NBCON consoles
2025-09-09 14:23 ` John Ogness
@ 2025-09-09 15:03 ` Petr Mladek
0 siblings, 0 replies; 22+ messages in thread
From: Petr Mladek @ 2025-09-09 15:03 UTC (permalink / raw)
To: John Ogness
Cc: Marcos Paulo de Souza, Greg Kroah-Hartman, Steven Rostedt,
Sergey Senozhatsky, Jason Wessel, Daniel Thompson,
Douglas Anderson, linux-kernel, kgdb-bugreport
On Tue 2025-09-09 16:29:50, John Ogness wrote:
> On 2025-09-09, Petr Mladek <pmladek@suse.com> wrote:
> > The problem is that wctxt->unsafe_takeover would need to get updated
> > after acquiring the context. And might be reused for different
> > consoles, ...
>
> You are right. I think it is best to make nbcon_write_context_set_buf()
> available.
I am fine with it.
> > But wait. I do not see any code using wctxt->unsafe_takeover.
> >
> > It seems that the motivation was that console drivers might
> > do something else when there was an unsafe_takeover in the past,
> > see https://lore.kernel.org/lkml/87cyz6ro62.fsf@jogness.linutronix.de/
> > But it seems that no console driver is using it.
> >
> > So, I would prefer to remove the "unsafe_takeover" field from
> > struct nbcon_write_context and keep this kdb code as it is now.
>
> No one is using it because the nbcon drivers are still implementing the
> "hope and pray" model on unsafe takeovers. The flag is an important part
> of the API to allow drivers to maximize their safety.
>
> I think it is too early to remove the flag when there are still so few
> nbcon drivers in existance.
I feel that that I should be more strict and push for removing
the flag because it is not used and complicates the design.
I am sure that there are cleaner ways how to provide
the information when anyone would need it.
But I do not want to fight for it. It is not worth a blood
(changing code back and forth). I am fine with exporting
nbcon_write_context_set_buf() for now. We might know more
about real users next time it causes problems ;-)
Best Regards,
Petr
^ permalink raw reply [flat|nested] 22+ messages in thread