public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH printk v1 0/1] Fix reported suspend failures
@ 2025-11-11 14:43 John Ogness
  2025-11-11 14:43 ` [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend John Ogness
  0 siblings, 1 reply; 8+ messages in thread
From: John Ogness @ 2025-11-11 14:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

Hi,

There have been multiple reports [0][1] (+ 2 offlist) of suspend
failing when NBCON console drivers are in use. With the help of
NXP and NVIDIA we were able to isolate the problem and verify
the fix.

The first NBCON drivers appeared in 6.13, so currently there is
no LTS kernel that requires this series. But it should go into
6.17.x and 6.18.

John Ogness

[0] https://lore.kernel.org/lkml/80b020fc-c18a-4da4-b222-16da1cab2f4c@nvidia.com
[1] https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com

John Ogness (1):
  printk: Avoid scheduling irq_work on suspend

 kernel/printk/internal.h |  8 ++++---
 kernel/printk/printk.c   | 51 ++++++++++++++++++++++++++++------------
 2 files changed, 41 insertions(+), 18 deletions(-)


base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
-- 
2.47.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-11 14:43 [PATCH printk v1 0/1] Fix reported suspend failures John Ogness
@ 2025-11-11 14:43 ` John Ogness
  2025-11-11 16:31   ` Petr Mladek
  2025-11-12  5:00   ` Sherry Sun
  0 siblings, 2 replies; 8+ messages in thread
From: John Ogness @ 2025-11-11 14:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

Allowing irq_work to be scheduled while trying to suspend has shown
to cause problems as some architectures interpret the pending
interrupts as a reason to not suspend. This became a problem for
printk() with the introduction of NBCON consoles. With every
printk() call, NBCON console printing kthreads are woken by queueing
irq_work. This means that irq_work continues to be queued due to
printk() calls late in the suspend procedure.

Avoid this problem by preventing printk() from queueing irq_work
once console suspending has begun. This applies to triggering NBCON
and legacy deferred printing as well as klogd waiters.

Since triggering of NBCON threaded printing relies on irq_work, the
pr_flush() within console_suspend_all() is used to perform the final
flushing before suspending consoles and blocking irq_work queueing.
NBCON consoles that are not suspended (due to the usage of the
"no_console_suspend" boot argument) transition to atomic flushing.

Introduce a new global variable @console_offload_blocked to flag
when irq_work queueing is to be avoided. The flag is used by
printk_get_console_flush_type() to avoid allowing deferred printing
and switch NBCON consoles to atomic flushing. It is also used by
vprintk_emit() to avoid klogd waking.

Cc: <stable@vger.kernel.org> # 6.13.x because no drivers in 6.12.x
Fixes: 6b93bb41f6ea ("printk: Add non-BKL (nbcon) console basic infrastructure")
Closes: https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/internal.h |  8 ++++---
 kernel/printk/printk.c   | 51 ++++++++++++++++++++++++++++------------
 2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index f72bbfa266d6c..b20929b7d71f5 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -230,6 +230,8 @@ struct console_flush_type {
 	bool	legacy_offload;
 };
 
+extern bool console_irqwork_blocked;
+
 /*
  * Identify which console flushing methods should be used in the context of
  * the caller.
@@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
 	switch (nbcon_get_default_prio()) {
 	case NBCON_PRIO_NORMAL:
 		if (have_nbcon_console && !have_boot_console) {
-			if (printk_kthreads_running)
+			if (printk_kthreads_running && !console_irqwork_blocked)
 				ft->nbcon_offload = true;
 			else
 				ft->nbcon_atomic = true;
@@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
 		if (have_legacy_console || have_boot_console) {
 			if (!is_printk_legacy_deferred())
 				ft->legacy_direct = true;
-			else
+			else if (!console_irqwork_blocked)
 				ft->legacy_offload = true;
 		}
 		break;
@@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
 		if (have_legacy_console || have_boot_console) {
 			if (!is_printk_legacy_deferred())
 				ft->legacy_direct = true;
-			else
+			else if (!console_irqwork_blocked)
 				ft->legacy_offload = true;
 		}
 		break;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5aee9ffb16b9a..94fc4a8662d4b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -462,6 +462,9 @@ bool have_boot_console;
 /* See printk_legacy_allow_panic_sync() for details. */
 bool legacy_allow_panic_sync;
 
+/* Avoid using irq_work when suspending. */
+bool console_irqwork_blocked;
+
 #ifdef CONFIG_PRINTK
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
@@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 
 	if (ft.legacy_offload)
 		defer_console_output();
-	else
+	else if (!console_irqwork_blocked)
 		wake_up_klogd();
 
 	return printed_len;
@@ -2730,10 +2733,20 @@ void console_suspend_all(void)
 {
 	struct console *con;
 
+	if (console_suspend_enabled)
+		pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
+
+	/*
+	 * Flush any console backlog and then avoid queueing irq_work until
+	 * console_resume_all(). Until then deferred printing is no longer
+	 * triggered, NBCON consoles transition to atomic flushing, and
+	 * any klogd waiters are not triggered.
+	 */
+	pr_flush(1000, true);
+	console_irqwork_blocked = true;
+
 	if (!console_suspend_enabled)
 		return;
-	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
-	pr_flush(1000, true);
 
 	console_list_lock();
 	for_each_console(con)
@@ -2754,26 +2767,34 @@ void console_resume_all(void)
 	struct console_flush_type ft;
 	struct console *con;
 
-	if (!console_suspend_enabled)
-		return;
-
-	console_list_lock();
-	for_each_console(con)
-		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
-	console_list_unlock();
-
 	/*
-	 * Ensure that all SRCU list walks have completed. All printing
-	 * contexts must be able to see they are no longer suspended so
-	 * that they are guaranteed to wake up and resume printing.
+	 * Allow queueing irq_work. After restoring console state, deferred
+	 * printing and any klogd waiters need to be triggered in case there
+	 * is now a console backlog.
 	 */
-	synchronize_srcu(&console_srcu);
+	console_irqwork_blocked = false;
+
+	if (console_suspend_enabled) {
+		console_list_lock();
+		for_each_console(con)
+			console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
+		console_list_unlock();
+
+		/*
+		 * Ensure that all SRCU list walks have completed. All printing
+		 * contexts must be able to see they are no longer suspended so
+		 * that they are guaranteed to wake up and resume printing.
+		 */
+		synchronize_srcu(&console_srcu);
+	}
 
 	printk_get_console_flush_type(&ft);
 	if (ft.nbcon_offload)
 		nbcon_kthreads_wake();
 	if (ft.legacy_offload)
 		defer_console_output();
+	else
+		wake_up_klogd();
 
 	pr_flush(1000, true);
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-11 14:43 ` [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend John Ogness
@ 2025-11-11 16:31   ` Petr Mladek
  2025-11-12 15:50     ` John Ogness
  2025-11-12  5:00   ` Sherry Sun
  1 sibling, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2025-11-11 16:31 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

On Tue 2025-11-11 15:49:22, John Ogness wrote:
> Allowing irq_work to be scheduled while trying to suspend has shown
> to cause problems as some architectures interpret the pending
> interrupts as a reason to not suspend. This became a problem for
> printk() with the introduction of NBCON consoles. With every
> printk() call, NBCON console printing kthreads are woken by queueing
> irq_work. This means that irq_work continues to be queued due to
> printk() calls late in the suspend procedure.
> 
> Avoid this problem by preventing printk() from queueing irq_work
> once console suspending has begun. This applies to triggering NBCON
> and legacy deferred printing as well as klogd waiters.
> 
> Since triggering of NBCON threaded printing relies on irq_work, the
> pr_flush() within console_suspend_all() is used to perform the final
> flushing before suspending consoles and blocking irq_work queueing.
> NBCON consoles that are not suspended (due to the usage of the
> "no_console_suspend" boot argument) transition to atomic flushing.
> 
> Introduce a new global variable @console_offload_blocked to flag

s/console_offload_blocked/console_irqwork_blocked/

> when irq_work queueing is to be avoided. The flag is used by
> printk_get_console_flush_type() to avoid allowing deferred printing
> and switch NBCON consoles to atomic flushing. It is also used by
> vprintk_emit() to avoid klogd waking.
> 
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -230,6 +230,8 @@ struct console_flush_type {
>  	bool	legacy_offload;
>  };
>  
> +extern bool console_irqwork_blocked;
> +
>  /*
>   * Identify which console flushing methods should be used in the context of
>   * the caller.
> @@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>  	switch (nbcon_get_default_prio()) {
>  	case NBCON_PRIO_NORMAL:
>  		if (have_nbcon_console && !have_boot_console) {
> -			if (printk_kthreads_running)
> +			if (printk_kthreads_running && !console_irqwork_blocked)
>  				ft->nbcon_offload = true;
>  			else
>  				ft->nbcon_atomic = true;
> @@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>  		if (have_legacy_console || have_boot_console) {
>  			if (!is_printk_legacy_deferred())
>  				ft->legacy_direct = true;
> -			else
> +			else if (!console_irqwork_blocked)
>  				ft->legacy_offload = true;
>  		}
>  		break;

This is one possibility.

Another possibility would be to block the irq work
directly in defer_console_output() and wake_up_klogd().
It would handle all situations, including printk_trigger_flush()
or defer_console_output().

Or is there any reason, why these two call paths are not handled?

I do not have strong opinion. This patch makes it more explicit
when defer_console_output() or wake_up_klogd() is called.

If we move the check into defer_console_output() or wake_up_klogd(),
it would hide the behavior. But it will make the API more safe
to use. And wake_up_klogd() is even exported via <linux/printk.h>.


> @@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>  		if (have_legacy_console || have_boot_console) {
>  			if (!is_printk_legacy_deferred())
>  				ft->legacy_direct = true;
> -			else
> +			else if (!console_irqwork_blocked)
>  				ft->legacy_offload = true;

This change won't be needed if we move the check into
defer_console_output() and wake_up_klogd().

>  		}
>  		break;
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 5aee9ffb16b9a..94fc4a8662d4b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  
>  	if (ft.legacy_offload)
>  		defer_console_output();
> -	else
> +	else if (!console_irqwork_blocked)
>  		wake_up_klogd();

Same here.

>  
>  	return printed_len;


The rest of the patch looks good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-11 14:43 ` [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend John Ogness
  2025-11-11 16:31   ` Petr Mladek
@ 2025-11-12  5:00   ` Sherry Sun
  1 sibling, 0 replies; 8+ messages in thread
From: Sherry Sun @ 2025-11-12  5:00 UTC (permalink / raw)
  To: John Ogness, Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Jacky Bai, Jon Hunter,
	Thierry Reding, Derek Barbosa, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org



> -----Original Message-----
> From: John Ogness <john.ogness@linutronix.de>
> Sent: Tuesday, November 11, 2025 10:43 PM
> To: Petr Mladek <pmladek@suse.com>
> Cc: Sergey Senozhatsky <senozhatsky@chromium.org>; Steven Rostedt
> <rostedt@goodmis.org>; Sherry Sun <sherry.sun@nxp.com>; Jacky Bai
> <ping.bai@nxp.com>; Jon Hunter <jonathanh@nvidia.com>; Thierry Reding
> <thierry.reding@gmail.com>; Derek Barbosa <debarbos@redhat.com>; linux-
> kernel@vger.kernel.org; stable@vger.kernel.org
> Subject: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
>
> Allowing irq_work to be scheduled while trying to suspend has shown to
> cause problems as some architectures interpret the pending interrupts as a
> reason to not suspend. This became a problem for
> printk() with the introduction of NBCON consoles. With every
> printk() call, NBCON console printing kthreads are woken by queueing
> irq_work. This means that irq_work continues to be queued due to
> printk() calls late in the suspend procedure.
>
> Avoid this problem by preventing printk() from queueing irq_work once
> console suspending has begun. This applies to triggering NBCON and legacy
> deferred printing as well as klogd waiters.
>
> Since triggering of NBCON threaded printing relies on irq_work, the
> pr_flush() within console_suspend_all() is used to perform the final flushing
> before suspending consoles and blocking irq_work queueing.
> NBCON consoles that are not suspended (due to the usage of the
> "no_console_suspend" boot argument) transition to atomic flushing.
>
> Introduce a new global variable @console_offload_blocked to flag when
> irq_work queueing is to be avoided. The flag is used by
> printk_get_console_flush_type() to avoid allowing deferred printing and
> switch NBCON consoles to atomic flushing. It is also used by
> vprintk_emit() to avoid klogd waking.
>
> Cc: <stable@vger.kernel.org> # 6.13.x because no drivers in 6.12.x
> Fixes: 6b93bb41f6ea ("printk: Add non-BKL (nbcon) console basic
> infrastructure")
> Closes:
> https://lore.ke/
> rnel.org%2Flkml%2FDB9PR04MB8429E7DDF2D93C2695DE401D92C4A%40DB
> 9PR04MB8429.eurprd04.prod.outlook.com&data=05%7C02%7Csherry.sun%4
> 0nxp.com%7C70da522fd21f416f3d1708de2130affc%7C686ea1d3bc2b4c6fa92
> cd99c5c301635%7C0%7C0%7C638984690180001517%7CUnknown%7CTWFp
> bGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4z
> MiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=jzpzHfyW
> 0kCsvsUz4DgQqSdoNxedZF2rnT9gS%2FuBa7A%3D&reserved=0
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

For this patch, Tested-by: Sherry Sun <sherry.sun@nxp.com>

Best Regards
Sherry

> ---
>  kernel/printk/internal.h |  8 ++++---
>  kernel/printk/printk.c   | 51 ++++++++++++++++++++++++++++------------
>  2 files changed, 41 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index
> f72bbfa266d6c..b20929b7d71f5 100644
> --- a/kernel/printk/internal.h
> +++ b/kernel/printk/internal.h
> @@ -230,6 +230,8 @@ struct console_flush_type {
>       bool    legacy_offload;
>  };
>
> +extern bool console_irqwork_blocked;
> +
>  /*
>   * Identify which console flushing methods should be used in the context of
>   * the caller.
> @@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct
> console_flush_type *ft)
>       switch (nbcon_get_default_prio()) {
>       case NBCON_PRIO_NORMAL:
>               if (have_nbcon_console && !have_boot_console) {
> -                     if (printk_kthreads_running)
> +                     if (printk_kthreads_running
> && !console_irqwork_blocked)
>                               ft->nbcon_offload = true;
>                       else
>                               ft->nbcon_atomic = true;
> @@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct
> console_flush_type *ft)
>               if (have_legacy_console || have_boot_console) {
>                       if (!is_printk_legacy_deferred())
>                               ft->legacy_direct = true;
> -                     else
> +                     else if (!console_irqwork_blocked)
>                               ft->legacy_offload = true;
>               }
>               break;
> @@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct
> console_flush_type *ft)
>               if (have_legacy_console || have_boot_console) {
>                       if (!is_printk_legacy_deferred())
>                               ft->legacy_direct = true;
> -                     else
> +                     else if (!console_irqwork_blocked)
>                               ft->legacy_offload = true;
>               }
>               break;
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index
> 5aee9ffb16b9a..94fc4a8662d4b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -462,6 +462,9 @@ bool have_boot_console;
>  /* See printk_legacy_allow_panic_sync() for details. */  bool
> legacy_allow_panic_sync;
>
> +/* Avoid using irq_work when suspending. */ bool
> +console_irqwork_blocked;
> +
>  #ifdef CONFIG_PRINTK
>  DECLARE_WAIT_QUEUE_HEAD(log_wait);
>  static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
> @@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>
>       if (ft.legacy_offload)
>               defer_console_output();
> -     else
> +     else if (!console_irqwork_blocked)
>               wake_up_klogd();
>
>       return printed_len;
> @@ -2730,10 +2733,20 @@ void console_suspend_all(void)  {
>       struct console *con;
>
> +     if (console_suspend_enabled)
> +             pr_info("Suspending console(s) (use no_console_suspend to
> debug)\n");
> +
> +     /*
> +      * Flush any console backlog and then avoid queueing irq_work until
> +      * console_resume_all(). Until then deferred printing is no longer
> +      * triggered, NBCON consoles transition to atomic flushing, and
> +      * any klogd waiters are not triggered.
> +      */
> +     pr_flush(1000, true);
> +     console_irqwork_blocked = true;
> +
>       if (!console_suspend_enabled)
>               return;
> -     pr_info("Suspending console(s) (use no_console_suspend to
> debug)\n");
> -     pr_flush(1000, true);
>
>       console_list_lock();
>       for_each_console(con)
> @@ -2754,26 +2767,34 @@ void console_resume_all(void)
>       struct console_flush_type ft;
>       struct console *con;
>
> -     if (!console_suspend_enabled)
> -             return;
> -
> -     console_list_lock();
> -     for_each_console(con)
> -             console_srcu_write_flags(con, con->flags &
> ~CON_SUSPENDED);
> -     console_list_unlock();
> -
>       /*
> -      * Ensure that all SRCU list walks have completed. All printing
> -      * contexts must be able to see they are no longer suspended so
> -      * that they are guaranteed to wake up and resume printing.
> +      * Allow queueing irq_work. After restoring console state, deferred
> +      * printing and any klogd waiters need to be triggered in case there
> +      * is now a console backlog.
>        */
> -     synchronize_srcu(&console_srcu);
> +     console_irqwork_blocked = false;
> +
> +     if (console_suspend_enabled) {
> +             console_list_lock();
> +             for_each_console(con)
> +                     console_srcu_write_flags(con, con->flags &
> ~CON_SUSPENDED);
> +             console_list_unlock();
> +
> +             /*
> +              * Ensure that all SRCU list walks have completed. All printing
> +              * contexts must be able to see they are no longer suspended
> so
> +              * that they are guaranteed to wake up and resume printing.
> +              */
> +             synchronize_srcu(&console_srcu);
> +     }
>
>       printk_get_console_flush_type(&ft);
>       if (ft.nbcon_offload)
>               nbcon_kthreads_wake();
>       if (ft.legacy_offload)
>               defer_console_output();
> +     else
> +             wake_up_klogd();
>
>       pr_flush(1000, true);
>  }
> --
> 2.47.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-11 16:31   ` Petr Mladek
@ 2025-11-12 15:50     ` John Ogness
  2025-11-13  9:52       ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: John Ogness @ 2025-11-12 15:50 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

On 2025-11-11, Petr Mladek <pmladek@suse.com> wrote:
>> Introduce a new global variable @console_offload_blocked to flag
>
> s/console_offload_blocked/console_irqwork_blocked/

Ack.

>> when irq_work queueing is to be avoided. The flag is used by
>> printk_get_console_flush_type() to avoid allowing deferred printing
>> and switch NBCON consoles to atomic flushing. It is also used by
>> vprintk_emit() to avoid klogd waking.
>> 
>> --- a/kernel/printk/internal.h
>> +++ b/kernel/printk/internal.h
>> @@ -230,6 +230,8 @@ struct console_flush_type {
>>  	bool	legacy_offload;
>>  };
>>  
>> +extern bool console_irqwork_blocked;
>> +
>>  /*
>>   * Identify which console flushing methods should be used in the context of
>>   * the caller.
>> @@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>>  	switch (nbcon_get_default_prio()) {
>>  	case NBCON_PRIO_NORMAL:
>>  		if (have_nbcon_console && !have_boot_console) {
>> -			if (printk_kthreads_running)
>> +			if (printk_kthreads_running && !console_irqwork_blocked)
>>  				ft->nbcon_offload = true;
>>  			else
>>  				ft->nbcon_atomic = true;
>> @@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>>  		if (have_legacy_console || have_boot_console) {
>>  			if (!is_printk_legacy_deferred())
>>  				ft->legacy_direct = true;
>> -			else
>> +			else if (!console_irqwork_blocked)
>>  				ft->legacy_offload = true;
>>  		}
>>  		break;
>
> This is one possibility.
>
> Another possibility would be to block the irq work
> directly in defer_console_output() and wake_up_klogd().
> It would handle all situations, including printk_trigger_flush()
> or defer_console_output().
>
> Or is there any reason, why these two call paths are not handled?

My intention was to focus only on irq_work triggered directly by
printk() calls as well as transitioning NBCON from threaded to direct.

> I do not have strong opinion. This patch makes it more explicit
> when defer_console_output() or wake_up_klogd() is called.
>
> If we move the check into defer_console_output() or wake_up_klogd(),
> it would hide the behavior. But it will make the API more safe
> to use. And wake_up_klogd() is even exported via <linux/printk.h>.

Earlier test versions of this patch did exactly as you are suggesting
here. But I felt like "properly avoiding" deferred/offloaded printing
via printk_get_console_flush_type() (rather than just hacking
irq_work_queue() callers to abort) was a cleaner solution. Especially
since printk_get_console_flush_type() modifications were needed anyway
in order to transition NBCON from threaded to direct.

>> @@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
>>  		if (have_legacy_console || have_boot_console) {
>>  			if (!is_printk_legacy_deferred())
>>  				ft->legacy_direct = true;
>> -			else
>> +			else if (!console_irqwork_blocked)
>>  				ft->legacy_offload = true;
>
> This change won't be needed if we move the check into
> defer_console_output() and wake_up_klogd().
>
>>  		}
>>  		break;
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 5aee9ffb16b9a..94fc4a8662d4b 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>>  
>>  	if (ft.legacy_offload)
>>  		defer_console_output();
>> -	else
>> +	else if (!console_irqwork_blocked)
>>  		wake_up_klogd();
>
> Same here.
>
>>  
>>  	return printed_len;

I would prefer to keep all the printk_get_console_flush_type() changes
since it returns proper available flush type information. If you would
like to _additionally_ short-circuit __wake_up_klogd() and
nbcon_kthreads_wake() in order to avoid all possible irq_work queueing,
I would be OK with that.

John

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-12 15:50     ` John Ogness
@ 2025-11-13  9:52       ` Petr Mladek
  2025-11-13 10:11         ` John Ogness
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2025-11-13  9:52 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

On Wed 2025-11-12 16:56:22, John Ogness wrote:
> On 2025-11-11, Petr Mladek <pmladek@suse.com> wrote:
> >> Introduce a new global variable @console_offload_blocked to flag
> >> when irq_work queueing is to be avoided. The flag is used by
> >> printk_get_console_flush_type() to avoid allowing deferred printing
> >> and switch NBCON consoles to atomic flushing. It is also used by
> >> vprintk_emit() to avoid klogd waking.
> >> 
> >> --- a/kernel/printk/internal.h
> >> +++ b/kernel/printk/internal.h
> >> @@ -230,6 +230,8 @@ struct console_flush_type {
> >>  	bool	legacy_offload;
> >>  };
> >>  
> >> +extern bool console_irqwork_blocked;
> >> +
> >>  /*
> >>   * Identify which console flushing methods should be used in the context of
> >>   * the caller.
> >> @@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> >>  	switch (nbcon_get_default_prio()) {
> >>  	case NBCON_PRIO_NORMAL:
> >>  		if (have_nbcon_console && !have_boot_console) {
> >> -			if (printk_kthreads_running)
> >> +			if (printk_kthreads_running && !console_irqwork_blocked)
> >>  				ft->nbcon_offload = true;
> >>  			else
> >>  				ft->nbcon_atomic = true;
> >> @@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> >>  		if (have_legacy_console || have_boot_console) {
> >>  			if (!is_printk_legacy_deferred())
> >>  				ft->legacy_direct = true;
> >> -			else
> >> +			else if (!console_irqwork_blocked)
> >>  				ft->legacy_offload = true;
> >>  		}
> >>  		break;
> >
> > This is one possibility.
> >
> > Another possibility would be to block the irq work
> > directly in defer_console_output() and wake_up_klogd().
> > It would handle all situations, including printk_trigger_flush()
> > or defer_console_output().
> >
> > Or is there any reason, why these two call paths are not handled?
> 
> My intention was to focus only on irq_work triggered directly by
> printk() calls as well as transitioning NBCON from threaded to direct.
> 
> > I do not have strong opinion. This patch makes it more explicit
> > when defer_console_output() or wake_up_klogd() is called.
> >
> > If we move the check into defer_console_output() or wake_up_klogd(),
> > it would hide the behavior. But it will make the API more safe
> > to use. And wake_up_klogd() is even exported via <linux/printk.h>.
> 
> Earlier test versions of this patch did exactly as you are suggesting
> here. But I felt like "properly avoiding" deferred/offloaded printing
> via printk_get_console_flush_type() (rather than just hacking
> irq_work_queue() callers to abort) was a cleaner solution. Especially
> since printk_get_console_flush_type() modifications were needed anyway
> in order to transition NBCON from threaded to direct.

I see.

> >> @@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> >>  		if (have_legacy_console || have_boot_console) {
> >>  			if (!is_printk_legacy_deferred())
> >>  				ft->legacy_direct = true;
> >> -			else
> >> +			else if (!console_irqwork_blocked)
> >>  				ft->legacy_offload = true;
> >
> > This change won't be needed if we move the check into
> > defer_console_output() and wake_up_klogd().
> >
> >>  		}
> >>  		break;
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 5aee9ffb16b9a..94fc4a8662d4b 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >>  
> >>  	if (ft.legacy_offload)
> >>  		defer_console_output();
> >> -	else
> >> +	else if (!console_irqwork_blocked)
> >>  		wake_up_klogd();
> >
> > Same here.
> >
> >>  
> >>  	return printed_len;
> 
> I would prefer to keep all the printk_get_console_flush_type() changes
> since it returns proper available flush type information. If you would
> like to _additionally_ short-circuit __wake_up_klogd() and
> nbcon_kthreads_wake() in order to avoid all possible irq_work queueing,
> I would be OK with that.

Combining both approaches might be a bit messy. Some code paths might
work correctly because of the explicit check and some just by chance.

But I got an idea. We could add a warning into __wake_up_klogd()
and nbcon_kthreads_wake() to report when they are called unexpectedly.

And we should also prevent calling it from lib/nmi_backtrace.c.

I played with it and came up with the following changes on
top of this patch:

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 45c663124c9b..71e31b908ad1 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -203,6 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
 extern asmlinkage void dump_stack(void) __cold;
+void printk_try_flush(void);
 void printk_trigger_flush(void);
 void console_try_replay_all(void);
 void printk_legacy_allow_panic_sync(void);
@@ -299,6 +300,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
 static inline void dump_stack(void)
 {
 }
+static inline void printk_try_flush(void)
+{
+}
 static inline void printk_trigger_flush(void)
 {
 }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ffd5a2593306..a09b8502e507 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1302,6 +1302,13 @@ void nbcon_kthreads_wake(void)
 	if (!printk_kthreads_running)
 		return;
 
+	/*
+	 * Nobody is allowed to call this function when console irq_work
+	 * is blocked.
+	 */
+	if (WARN_ON_ONCE(console_irqwork_blocked))
+		return;
+
 	cookie = console_srcu_read_lock();
 	for_each_console_srcu(con) {
 		if (!(console_srcu_read_flags(con) & CON_NBCON))
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 334b4edff08c..e9290c418d12 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -4581,6 +4581,13 @@ static void __wake_up_klogd(int val)
 	if (!printk_percpu_data_ready())
 		return;
 
+	/*
+	 * Nobody is allowed to call this function when console irq_work
+	 * is blocked.
+	 */
+	if (WARN_ON_ONCE(console_irqwork_blocked))
+		return;
+
 	preempt_disable();
 	/*
 	 * Guarantee any new records can be seen by tasks preparing to wait
@@ -4637,6 +4644,21 @@ void defer_console_output(void)
 	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
 }
 
+void printk_try_flush(void)
+{
+	struct console_flush_type ft;
+
+	printk_get_console_flush_type(&ft);
+	if (ft.nbcon_atomic)
+		nbcon_atomic_flush_pending();
+	if (ft.legacy_direct) {
+		if (console_trylock())
+			console_unlock();
+	}
+	if (ft.legacy_offload)
+		defer_console_output();
+}
+
 void printk_trigger_flush(void)
 {
 	defer_console_output();
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 33c154264bfe..632bbc28cb79 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -78,10 +78,10 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
 	nmi_backtrace_stall_check(to_cpumask(backtrace_mask));
 
 	/*
-	 * Force flush any remote buffers that might be stuck in IRQ context
+	 * Try flushing messages added CPUs which might be stuck in IRQ context
 	 * and therefore could not run their irq_work.
 	 */
-	printk_trigger_flush();
+	printk_try_flush();
 
 	clear_bit_unlock(0, &backtrace_flag);
 	put_cpu();

How does it look, please?

Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-13  9:52       ` Petr Mladek
@ 2025-11-13 10:11         ` John Ogness
  2025-11-13 11:15           ` Petr Mladek
  0 siblings, 1 reply; 8+ messages in thread
From: John Ogness @ 2025-11-13 10:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

On 2025-11-13, Petr Mladek <pmladek@suse.com> wrote:
>> I would prefer to keep all the printk_get_console_flush_type() changes
>> since it returns proper available flush type information. If you would
>> like to _additionally_ short-circuit __wake_up_klogd() and
>> nbcon_kthreads_wake() in order to avoid all possible irq_work queueing,
>> I would be OK with that.
>
> Combining both approaches might be a bit messy. Some code paths might
> work correctly because of the explicit check and some just by chance.
>
> But I got an idea. We could add a warning into __wake_up_klogd()
> and nbcon_kthreads_wake() to report when they are called unexpectedly.
>
> And we should also prevent calling it from lib/nmi_backtrace.c.
>
> I played with it and came up with the following changes on
> top of this patch:
>
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 45c663124c9b..71e31b908ad1 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -203,6 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
>  void show_regs_print_info(const char *log_lvl);
>  extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
>  extern asmlinkage void dump_stack(void) __cold;
> +void printk_try_flush(void);
>  void printk_trigger_flush(void);
>  void console_try_replay_all(void);
>  void printk_legacy_allow_panic_sync(void);
> @@ -299,6 +300,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
>  static inline void dump_stack(void)
>  {
>  }
> +static inline void printk_try_flush(void)
> +{
> +}
>  static inline void printk_trigger_flush(void)
>  {
>  }
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index ffd5a2593306..a09b8502e507 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1302,6 +1302,13 @@ void nbcon_kthreads_wake(void)
>  	if (!printk_kthreads_running)
>  		return;
>  
> +	/*
> +	 * Nobody is allowed to call this function when console irq_work
> +	 * is blocked.
> +	 */
> +	if (WARN_ON_ONCE(console_irqwork_blocked))
> +		return;
> +
>  	cookie = console_srcu_read_lock();
>  	for_each_console_srcu(con) {
>  		if (!(console_srcu_read_flags(con) & CON_NBCON))
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 334b4edff08c..e9290c418d12 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -4581,6 +4581,13 @@ static void __wake_up_klogd(int val)
>  	if (!printk_percpu_data_ready())
>  		return;
>  
> +	/*
> +	 * Nobody is allowed to call this function when console irq_work
> +	 * is blocked.
> +	 */
> +	if (WARN_ON_ONCE(console_irqwork_blocked))
> +		return;
> +
>  	preempt_disable();
>  	/*
>  	 * Guarantee any new records can be seen by tasks preparing to wait
> @@ -4637,6 +4644,21 @@ void defer_console_output(void)
>  	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
>  }
>  
> +void printk_try_flush(void)
> +{
> +	struct console_flush_type ft;
> +
> +	printk_get_console_flush_type(&ft);
> +	if (ft.nbcon_atomic)
> +		nbcon_atomic_flush_pending();

For completeness, we should probably also have here:

	if (ft.nbcon_offload)
		nbcon_kthreads_wake();

> +	if (ft.legacy_direct) {
> +		if (console_trylock())
> +			console_unlock();
> +	}
> +	if (ft.legacy_offload)
> +		defer_console_output();
> +}
> +
>  void printk_trigger_flush(void)
>  {
>  	defer_console_output();
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index 33c154264bfe..632bbc28cb79 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -78,10 +78,10 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
>  	nmi_backtrace_stall_check(to_cpumask(backtrace_mask));
>  
>  	/*
> -	 * Force flush any remote buffers that might be stuck in IRQ context
> +	 * Try flushing messages added CPUs which might be stuck in IRQ context
>  	 * and therefore could not run their irq_work.
>  	 */
> -	printk_trigger_flush();
> +	printk_try_flush();
>  
>  	clear_bit_unlock(0, &backtrace_flag);
>  	put_cpu();
>
> How does it look, please?

I like this. But I think the printk_try_flush() implementation should
simply replace the implementation of printk_trigger_flush().

For the arch/powerpc/kernel/watchdog.c:watchdog_timer_interrupt() and
lib/nmi_backtrace.c:nmi_trigger_cpumask_backtrace() sites I think it is
appropriate.

For the kernel/printk/nbcon.c:nbcon_device_release() site I think the
call should be changed to defer_console_output():

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 558ef31779760..73f315fd97a3e 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1849,7 +1849,7 @@ void nbcon_device_release(struct console *con)
 			if (console_trylock())
 				console_unlock();
 		} else if (ft.legacy_offload) {
-			printk_trigger_flush();
+			defer_console_output();
 		}
 	}
 	console_srcu_read_unlock(cookie);

Can I fold all that into a new patch?

John

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend
  2025-11-13 10:11         ` John Ogness
@ 2025-11-13 11:15           ` Petr Mladek
  0 siblings, 0 replies; 8+ messages in thread
From: Petr Mladek @ 2025-11-13 11:15 UTC (permalink / raw)
  To: John Ogness
  Cc: Sergey Senozhatsky, Steven Rostedt, Sherry Sun, Jacky Bai,
	Jon Hunter, Thierry Reding, Derek Barbosa, linux-kernel, stable

On Thu 2025-11-13 11:17:42, John Ogness wrote:
> On 2025-11-13, Petr Mladek <pmladek@suse.com> wrote:
> >> I would prefer to keep all the printk_get_console_flush_type() changes
> >> since it returns proper available flush type information. If you would
> >> like to _additionally_ short-circuit __wake_up_klogd() and
> >> nbcon_kthreads_wake() in order to avoid all possible irq_work queueing,
> >> I would be OK with that.
> >
> > Combining both approaches might be a bit messy. Some code paths might
> > work correctly because of the explicit check and some just by chance.
> >
> > But I got an idea. We could add a warning into __wake_up_klogd()
> > and nbcon_kthreads_wake() to report when they are called unexpectedly.
> >
> > And we should also prevent calling it from lib/nmi_backtrace.c.
> >
> > I played with it and came up with the following changes on
> > top of this patch:
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 334b4edff08c..e9290c418d12 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -4637,6 +4644,21 @@ void defer_console_output(void)
> >  	__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
> >  }
> >  
> > +void printk_try_flush(void)
> > +{
> > +	struct console_flush_type ft;
> > +
> > +	printk_get_console_flush_type(&ft);
> > +	if (ft.nbcon_atomic)
> > +		nbcon_atomic_flush_pending();
> 
> For completeness, we should probably also have here:
> 
> 	if (ft.nbcon_offload)
> 		nbcon_kthreads_wake();

Makes sense. I think that did not add it because I got scared by
the _wake suffix. I forgot that it just queued the irq_work.


> > +	if (ft.legacy_direct) {
> > +		if (console_trylock())
> > +			console_unlock();
> > +	}
> > +	if (ft.legacy_offload)
> > +		defer_console_output();
> > +}
> > +
> >  void printk_trigger_flush(void)
> >  {
> >  	defer_console_output();
> > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> > index 33c154264bfe..632bbc28cb79 100644
> > --- a/lib/nmi_backtrace.c
> > +++ b/lib/nmi_backtrace.c
> > @@ -78,10 +78,10 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
> >  	nmi_backtrace_stall_check(to_cpumask(backtrace_mask));
> >  
> >  	/*
> > -	 * Force flush any remote buffers that might be stuck in IRQ context
> > +	 * Try flushing messages added CPUs which might be stuck in IRQ context
> >  	 * and therefore could not run their irq_work.
> >  	 */
> > -	printk_trigger_flush();
> > +	printk_try_flush();
> >  
> >  	clear_bit_unlock(0, &backtrace_flag);
> >  	put_cpu();
> >
> > How does it look, please?
> 
> I like this. But I think the printk_try_flush() implementation should
> simply replace the implementation of printk_trigger_flush().

Make sense.

> For the arch/powerpc/kernel/watchdog.c:watchdog_timer_interrupt() and
> lib/nmi_backtrace.c:nmi_trigger_cpumask_backtrace() sites I think it is
> appropriate.

Yup.

> For the kernel/printk/nbcon.c:nbcon_device_release() site I think the
> call should be changed to defer_console_output():
> 
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 558ef31779760..73f315fd97a3e 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1849,7 +1849,7 @@ void nbcon_device_release(struct console *con)
>  			if (console_trylock())
>  				console_unlock();
>  		} else if (ft.legacy_offload) {
> -			printk_trigger_flush();
> +			defer_console_output();
>  		}
>  	}
>  	console_srcu_read_unlock(cookie);

Looks good.

> Can I fold all that into a new patch?

Go ahead.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-11-13 11:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-11 14:43 [PATCH printk v1 0/1] Fix reported suspend failures John Ogness
2025-11-11 14:43 ` [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on suspend John Ogness
2025-11-11 16:31   ` Petr Mladek
2025-11-12 15:50     ` John Ogness
2025-11-13  9:52       ` Petr Mladek
2025-11-13 10:11         ` John Ogness
2025-11-13 11:15           ` Petr Mladek
2025-11-12  5:00   ` Sherry Sun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox