From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jiri Slaby <jirislaby@kernel.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Esben Haabendal <esben@geanix.com>,
linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Arnd Bergmann <arnd@arndb.de>, Tony Lindgren <tony@atomide.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Serge Semin <fancer.lancer@gmail.com>,
Andrew Murray <amurray@thegoodpenguin.co.uk>,
Petr Mladek <pmladek@suse.com>
Subject: Re: [PATCH 1/3] printk/nbcon: Block printk kthreads when any CPU is in an emergency context
Date: Fri, 26 Sep 2025 16:43:33 +0206
Message-ID: <841pnti8k2.fsf@jogness.linutronix.de>
In-Reply-To: <20250926124912.243464-2-pmladek@suse.com>

On 2025-09-26, Petr Mladek <pmladek@suse.com> wrote:
> In emergency contexts, printk() tries to flush messages directly even
> on nbcon consoles. And it is allowed to take over the console ownership
> and interrupt the printk kthread in the middle of a message.
>
> Only one takeover and one repeated message should be enough in most
> situations. The first emergency message flushes the backlog and printk
> kthreads get to sleep. Next emergency messages are flushed directly
> and printk() does not wake up the kthreads.
>
> However, the one takeover is not guaranteed. Any printk() in normal
> context on another CPU could wake up the kthreads. Or a new emergency
> message might be added before the kthreads get to sleep. Note that
> the interrupted .write_kthread() callbacks usually have to call

The callback is named .write_thread(), not .write_kthread().

> nbcon_reacquire_nobuf() and restore the original device setting
> before checking for pending messages.
>
> The risk of the repeated takeovers will be even bigger because
> __nbcon_atomic_flush_pending_con is going to release the console
> ownership after each emitted record. It will be needed to prevent
> hardlockup reports on other CPUs which are busy waiting for
> the context ownership, for example, by nbcon_reacquire_nobuf() or
> __uart_port_nbcon_acquire().
>
> The repeated takeovers break the output, for example:
>
> [ 5042.650211][ T2220] Call Trace:
> [ 5042.6511
> ** replaying previous printk message **
> [ 5042.651192][ T2220] <TASK>
> [ 5042.652160][ T2220] kunit_run_
> ** replaying previous printk message **
> [ 5042.652160][ T2220] kunit_run_tests+0x72/0x90
> [ 5042.653340][ T22
> ** replaying previous printk message **
> [ 5042.653340][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5042.654628][ T2220] ? stack_trace_save+0x4d/0x70
> [ 5042.6553
> ** replaying previous printk message **
> [ 5042.655394][ T2220] ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5042.656713][ T2220] ? save_trace+0x5b/0x180
>
> A more robust solution is to block the printk kthread entirely whenever
> *any* CPU enters an emergency context. This ensures that critical messages
> can be flushed without contention from the normal, non-atomic printing
> path.
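
As an aside for readers following the thread, the gating idea described above can be modeled in userspace with C11 atomics. This is only a sketch under stated assumptions: every `model_*` name and the `backlog_pending` flag are hypothetical stand-ins, not kernel APIs, and the real patch works on `atomic_t` inside kernel/printk/nbcon.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for nbcon_cpu_emergency_cnt. */
static atomic_int emergency_cnt = 0;

/* Hypothetical stand-in for "the console has unprinted records". */
static atomic_bool backlog_pending = false;

/* Models nbcon_cpu_emergency_enter(): one more context is in an emergency. */
static void model_emergency_enter(void)
{
	atomic_fetch_add(&emergency_cnt, 1);
}

/* Models nbcon_cpu_emergency_exit(); an unbalanced exit is ignored here. */
static void model_emergency_exit(void)
{
	if (atomic_load(&emergency_cnt) == 0)
		return;
	atomic_fetch_sub(&emergency_cnt, 1);
}

/* Models the gate added to nbcon_kthread_should_wakeup(). */
static bool model_kthread_should_wakeup(void)
{
	/* Any active emergency context keeps the printer thread asleep. */
	if (atomic_load(&emergency_cnt) != 0)
		return false;

	return atomic_load(&backlog_pending);
}

/* Small usage example: pending records alone are not enough to wake up. */
static void model_self_check(void)
{
	atomic_store(&backlog_pending, true);
	assert(model_kthread_should_wakeup());

	model_emergency_enter();
	assert(!model_kthread_should_wakeup());

	model_emergency_exit();
	assert(model_kthread_should_wakeup());
}
```

The counter is global rather than per-CPU on purpose: the thread must stay blocked while an emergency is active on any CPU, not only on the CPU it runs on.
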
>
> Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
> kernel/printk/nbcon.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index d5d8c8c657e0..08b196e898cd 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -117,6 +117,9 @@
>   * from scratch.
>   */
>
> +/* Counter of active nbcon emergency contexts. */
> +atomic_t nbcon_cpu_emergency_cnt;

This can be static and should be initialized:

	static atomic_t nbcon_cpu_emergency_cnt = ATOMIC_INIT(0);

> +
>  /**
>   * nbcon_state_set - Helper function to set the console state
>   * @con: Console to update
> @@ -1168,6 +1171,16 @@ static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_contex
>  	if (kthread_should_stop())
>  		return true;
>
> +	/*
> +	 * Block the kthread when the system is in an emergency or panic mode.
> +	 * It increases the chance that these contexts would be able to show
> +	 * the messages directly. And it reduces the risk of interrupted writes
> +	 * where the context with a higher priority takes over the nbcon console
> +	 * ownership in the middle of a message.
> +	 */
> +	if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
> +		return false;
> +
>  	cookie = console_srcu_read_lock();
>
>  	flags = console_srcu_read_flags(con);
> @@ -1219,6 +1232,13 @@ static int nbcon_kthread_func(void *__console)
>  		if (kthread_should_stop())
>  			return 0;
>
> +		/*
> +		 * Block the kthread when the system is in an emergency or panic
> +		 * mode. See nbcon_kthread_should_wakeup() for more details.
> +		 */
> +		if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
> +			goto wait_for_event;
> +
>  		backlog = false;
>
>  		/*
> @@ -1660,6 +1680,8 @@ void nbcon_cpu_emergency_enter(void)
>
>  	preempt_disable();
>
> +	atomic_inc(&nbcon_cpu_emergency_cnt);
> +
>  	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
>  	(*cpu_emergency_nesting)++;
>  }
> @@ -1674,10 +1696,18 @@ void nbcon_cpu_emergency_exit(void)
>  	unsigned int *cpu_emergency_nesting;
>
>  	cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> -
>  	if (!WARN_ON_ONCE(*cpu_emergency_nesting == 0))
>  		(*cpu_emergency_nesting)--;
>
> +	/*
> +	 * Wake up kthreads because there might be some pending messages
> +	 * added by other CPUs with normal priority since the last flush
> +	 * in the emergency context.
> +	 */
> +	if (!WARN_ON_ONCE(atomic_read(&nbcon_cpu_emergency_cnt) == 0))
> +		if (atomic_dec_return(&nbcon_cpu_emergency_cnt) == 0)
> +			nbcon_kthreads_wake();

Although technically it doesn't hurt to blindly call
nbcon_kthreads_wake(), you may want to do it more formally. Maybe like
this:

	if (!WARN_ON_ONCE(atomic_read(&nbcon_cpu_emergency_cnt) == 0)) {
		if (atomic_dec_return(&nbcon_cpu_emergency_cnt) == 0) {
			struct console_flush_type ft;

			printk_get_console_flush_type(&ft);
			if (ft.nbcon_offload)
				nbcon_kthreads_wake();
		}
	}

I leave it up to you.
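
For illustration only, here is a userspace model of that exit path using C11 atomics. It is a sketch, not kernel code: all `model_*` names are hypothetical, `struct model_flush_type` carries only the one field used above, and `model_get_flush_type()` merely stands in for printk_get_console_flush_type().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct console_flush_type (one field only). */
struct model_flush_type {
	bool nbcon_offload;
};

static atomic_int model_emergency_cnt = 0;
static bool model_offload_enabled = true;	/* hypothetical knob */
static int model_wake_calls = 0;		/* counts simulated wakeups */

/* Stand-in for printk_get_console_flush_type(). */
static void model_get_flush_type(struct model_flush_type *ft)
{
	ft->nbcon_offload = model_offload_enabled;
}

/* Stand-in for nbcon_kthreads_wake(). */
static void model_kthreads_wake(void)
{
	model_wake_calls++;
}

static void model_enter(void)
{
	atomic_fetch_add(&model_emergency_cnt, 1);
}

static void model_exit(void)
{
	/* Unbalanced exit: the kernel code would WARN once and skip. */
	if (atomic_load(&model_emergency_cnt) == 0)
		return;

	/* atomic_fetch_sub() returns the old value, so 1 means "now zero". */
	if (atomic_fetch_sub(&model_emergency_cnt, 1) == 1) {
		struct model_flush_type ft;

		model_get_flush_type(&ft);
		if (ft.nbcon_offload)
			model_kthreads_wake();
	}
}

/* Usage example: only the last exit wakes the printer threads. */
static int model_demo(void)
{
	model_enter();
	model_enter();			/* second emergency context */
	model_exit();
	assert(model_wake_calls == 0);	/* counter is still 1 */
	model_exit();
	return model_wake_calls;
}
```

In this model the wakeup happens once, on the transition to zero, and is skipped entirely when printing is not offloaded to the threads.
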

With the static+initializer change:

Reviewed-by: John Ogness <john.ogness@linutronix.de>