* [PATCH RFC 0/2] printk: Release console_lock between printing records in legacy thread
@ 2025-09-15 12:43 Andrew Murray
2025-09-15 12:43 ` [PATCH RFC 1/2] printk: Introduce console_flush_one_record Andrew Murray
2025-09-15 12:43 ` [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread Andrew Murray
0 siblings, 2 replies; 12+ messages in thread
From: Andrew Murray @ 2025-09-15 12:43 UTC (permalink / raw)
To: Petr Mladek, Steven Rostedt, John Ogness, Sergey Senozhatsky
Cc: linux-kernel, Andrew Murray
The legacy printer kthread uses console_lock and
__console_flush_and_unlock to flush records to the console which
holds the console_lock being held for the entire flush. This
results in large waiting times for console_lock waiters
especially where there is a large volume of records or where the
console is slow (e.g. serial). During boot, this contention causes
delays in the filp_open call in console_on_rootfs.
Let's instead release and reacquire console_lock in between
printing individual records.
I've tested this on a PocketBeagle 2, with the following boot args:
"console=ttyS2,9600 initcall_debug=1 loglevel=10"
Without the patches:
[ 5.276850] +console_on_rootfs/filp_open
[ 5.311995] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[ 5.313665] probe of 2b300050.target-module returned 517 after 1179 usecs
[ 5.315279] probe of fa00000.mmc returned 0 after 263618 usecs
[ 5.383935] mmc1: new ultra high speed SDR104 SDHC card at address 5048
[ 5.394496] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[ 5.417837] mmcblk1: p1 p2
[ 5.423320] probe of mmc1:5048 returned 0 after 36996 usecs
[ 5.425531] probe of 2b300050.target-module returned 517 after 1312 usecs
[ 15.339051] probe of 2b300050.target-module returned 517 after 1062 usecs
[ 15.424672] platform 2b300050.target-module: deferred probe pending: (reason unknown)
[ 145.531229] -console_on_rootfs/filp_open
and with:
[ 5.090776] +console_on_rootfs/filp_open
[ 5.316895] mmc1: SDHCI controller on fa00000.mmc [fa00000.mmc] using ADMA 64-bit
[ 5.318994] probe of 2b300050.target-module returned 517 after 1092 usecs
[ 5.320578] probe of fa00000.mmc returned 0 after 333601 usecs
[ 5.390914] mmc1: new ultra high speed SDR104 SDHC card at address 5048
[ 5.405461] mmcblk1: mmc1:5048 SD32G 29.7 GiB
[ 5.429104] mmcblk1: p1 p2
[ 5.434603] probe of mmc1:5048 returned 0 after 41200 usecs
[ 5.436515] probe of 2b300050.target-module returned 517 after 1040 usecs
[ 7.203025] -console_on_rootfs/filp_open
Where I've added a printk surrounding the call in console_on_rootfs to filp_open.
For reference, where loglevel=1 the console_on_rootfs delay is negligible.
Please let me know if there are other ways I can measure the impact of this.
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
---
Andrew Murray (2):
printk: Introduce console_flush_one_record
printk: Use console_flush_one_record for legacy printer kthread
kernel/printk/printk.c | 202 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 147 insertions(+), 55 deletions(-)
---
base-commit: f83ec76bf285bea5727f478a68b894f5543ca76e
change-id: 20250914-printk_legacy_thread_console_lock-1c27f59bf990
Best regards,
--
Andrew Murray <amurray@thegoodpenguin.co.uk>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH RFC 1/2] printk: Introduce console_flush_one_record
2025-09-15 12:43 [PATCH RFC 0/2] printk: Release console_lock between printing records in legacy thread Andrew Murray
@ 2025-09-15 12:43 ` Andrew Murray
2025-09-18 15:22 ` Petr Mladek
2025-09-15 12:43 ` [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread Andrew Murray
1 sibling, 1 reply; 12+ messages in thread
From: Andrew Murray @ 2025-09-15 12:43 UTC (permalink / raw)
To: Petr Mladek, Steven Rostedt, John Ogness, Sergey Senozhatsky
Cc: linux-kernel, Andrew Murray
console_flush_all prints all remaining records to all usable consoles
whilst its caller holds console_lock. This can result in large waiting
times for those waiting for console_lock especially where there is a
large volume of records or where the console is slow (e.g. serial).
Let's extract the parts of this function which print a single record
into a new function named console_flush_one_record. This can later
be used for functions that will release and reacquire console_lock
between records.
This commit should not change existing functionality.
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
---
kernel/printk/printk.c | 162 +++++++++++++++++++++++++++++++------------------
1 file changed, 102 insertions(+), 60 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0efbcdda9aaba9d8d877df5e4f1db002d3a596bc..75a3c47e9c0e645a3198c5f56e47df2a8d1871e6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3161,6 +3161,101 @@ static inline void printk_kthreads_check_locked(void) { }
#endif /* CONFIG_PRINTK */
+
+/*
+ * Print out one record for each console.
+ *
+ * @do_cond_resched is set by the caller. It can be true only in schedulable
+ * context.
+ *
+ * @next_seq is set to the sequence number after the last available record.
+ * The value is valid only when this function returns true.
+ *
+ * @handover will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding the
+ * console_lock. Otherwise it is set to false.
+ *
+ * @any_usable will be set to true if there are any usable consoles.
+ *
+ * Returns true when there was at least one usable console and a record was
+ * flushed. A returned false indicates there were no records to flush for any
+ * of the consoles. It may also indicate that there were no usable consoles,
+ * the context has been lost or there is a panic suitation. Regardless the
+ * reason, the caller should assume it is not useful to immediately try again.
+ *
+ * Requires the console_lock.
+ */
+static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *handover,
+ bool *any_usable)
+{
+ struct console_flush_type ft;
+ struct console *con;
+ bool any_progress;
+ int cookie;
+
+ *any_usable = false;
+ any_progress = false;
+
+ printk_get_console_flush_type(&ft);
+
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(con) {
+ short flags = console_srcu_read_flags(con);
+ u64 printk_seq;
+ bool progress;
+
+ /*
+ * console_flush_one_record() is only responsible for
+ * nbcon consoles when the nbcon consoles cannot print via
+ * their atomic or threaded flushing.
+ */
+ if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
+ continue;
+
+ if (!console_is_usable(con, flags, !do_cond_resched))
+ continue;
+ *any_usable = true;
+
+ if (flags & CON_NBCON) {
+ progress = nbcon_legacy_emit_next_record(con, handover, cookie,
+ !do_cond_resched);
+ printk_seq = nbcon_seq_read(con);
+ } else {
+ progress = console_emit_next_record(con, handover, cookie);
+ printk_seq = con->seq;
+ }
+
+ /*
+ * If a handover has occurred, the SRCU read lock
+ * is already released.
+ */
+ if (*handover)
+ return false;
+
+ /* Track the next of the highest seq flushed. */
+ if (printk_seq > *next_seq)
+ *next_seq = printk_seq;
+
+ if (!progress)
+ continue;
+ any_progress = true;
+
+ /* Allow panic_cpu to take over the consoles safely. */
+ if (other_cpu_in_panic())
+ goto abandon;
+
+ if (do_cond_resched)
+ cond_resched();
+ }
+ console_srcu_read_unlock(cookie);
+
+ return any_progress;
+
+abandon:
+ console_srcu_read_unlock(cookie);
+ return false;
+}
+
/*
* Print out all remaining records to all consoles.
*
@@ -3186,77 +3281,24 @@ static inline void printk_kthreads_check_locked(void) { }
*/
static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
{
- struct console_flush_type ft;
- bool any_usable = false;
- struct console *con;
+ bool any_usable;
bool any_progress;
- int cookie;
*next_seq = 0;
*handover = false;
do {
- any_progress = false;
-
- printk_get_console_flush_type(&ft);
-
- cookie = console_srcu_read_lock();
- for_each_console_srcu(con) {
- short flags = console_srcu_read_flags(con);
- u64 printk_seq;
- bool progress;
+ any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
+ &any_usable);
- /*
- * console_flush_all() is only responsible for nbcon
- * consoles when the nbcon consoles cannot print via
- * their atomic or threaded flushing.
- */
- if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
- continue;
-
- if (!console_is_usable(con, flags, !do_cond_resched))
- continue;
- any_usable = true;
-
- if (flags & CON_NBCON) {
- progress = nbcon_legacy_emit_next_record(con, handover, cookie,
- !do_cond_resched);
- printk_seq = nbcon_seq_read(con);
- } else {
- progress = console_emit_next_record(con, handover, cookie);
- printk_seq = con->seq;
- }
-
- /*
- * If a handover has occurred, the SRCU read lock
- * is already released.
- */
- if (*handover)
- return false;
-
- /* Track the next of the highest seq flushed. */
- if (printk_seq > *next_seq)
- *next_seq = printk_seq;
-
- if (!progress)
- continue;
- any_progress = true;
-
- /* Allow panic_cpu to take over the consoles safely. */
- if (other_cpu_in_panic())
- goto abandon;
+ if (*handover)
+ return false;
- if (do_cond_resched)
- cond_resched();
- }
- console_srcu_read_unlock(cookie);
+ if (other_cpu_in_panic())
+ return false;
} while (any_progress);
return any_usable;
-
-abandon:
- console_srcu_read_unlock(cookie);
- return false;
}
static void __console_flush_and_unlock(void)
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread
2025-09-15 12:43 [PATCH RFC 0/2] printk: Release console_lock between printing records in legacy thread Andrew Murray
2025-09-15 12:43 ` [PATCH RFC 1/2] printk: Introduce console_flush_one_record Andrew Murray
@ 2025-09-15 12:43 ` Andrew Murray
2025-09-18 16:27 ` Petr Mladek
1 sibling, 1 reply; 12+ messages in thread
From: Andrew Murray @ 2025-09-15 12:43 UTC (permalink / raw)
To: Petr Mladek, Steven Rostedt, John Ogness, Sergey Senozhatsky
Cc: linux-kernel, Andrew Murray
The legacy printer kthread uses console_lock and
__console_flush_and_unlock to flush records to the console. This
approach results in the console_lock being held for the entire
duration of a flush. This can result in large waiting times for
those waiting for console_lock especially where there is a large
volume of records or where the console is slow (e.g. serial). This
contention is observed during boot, as the call to filp_open in
console_on_rootfs will delay progression to userspace until any
in-flight flush is completed.
Let's instead use __console_flush_unlocked which releases and
reacquires console_lock between records.
Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
---
kernel/printk/printk.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 75a3c47e9c0e645a3198c5f56e47df2a8d1871e6..53daab5cdee537c2ff55702104e495005352db1b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
return any_usable;
}
+/*
+ * Print out all remaining records to all consoles.
+ *
+ * @next_seq is set to the sequence number after the last available record.
+ * The value is valid only when this function returns true. It means that all
+ * usable consoles are completely flushed.
+ *
+ * @handover will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding the
+ * console_lock. Otherwise it is set to false.
+ *
+ * Returns true when there was at least one usable console and all messages
+ * were flushed to all usable consoles. A returned false informs the caller
+ * that everything was not flushed (either there were no usable consoles or
+ * another context has taken over printing or it is a panic situation and this
+ * is not the panic CPU). Regardless the reason, the caller should assume it
+ * is not useful to immediately try again.
+ */
+static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
+{
+ bool any_usable;
+ bool any_progress;
+
+ *next_seq = 0;
+ *handover = false;
+
+ do {
+ console_lock();
+ any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
+
+ if (*handover)
+ return false;
+
+ __console_unlock();
+
+ } while (any_progress);
+
+ return any_usable;
+}
+
static void __console_flush_and_unlock(void)
{
bool do_cond_resched;
@@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
} while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
}
+static void __console_flush_unlocked(void)
+{
+ bool handover;
+ bool flushed;
+ u64 next_seq;
+
+ do {
+ flushed = console_flush_all_unlocked(&next_seq, &handover);
+ } while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
+}
+
/**
* console_unlock - unblock the legacy console subsystem from printing
*
@@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
if (kthread_should_stop())
break;
- console_lock();
- __console_flush_and_unlock();
+ __console_flush_unlocked();
}
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] printk: Introduce console_flush_one_record
2025-09-15 12:43 ` [PATCH RFC 1/2] printk: Introduce console_flush_one_record Andrew Murray
@ 2025-09-18 15:22 ` Petr Mladek
2025-09-19 13:06 ` Andrew Murray
0 siblings, 1 reply; 12+ messages in thread
From: Petr Mladek @ 2025-09-18 15:22 UTC (permalink / raw)
To: Andrew Murray
Cc: Steven Rostedt, John Ogness, Sergey Senozhatsky, linux-kernel
On Mon 2025-09-15 13:43:05, Andrew Murray wrote:
> console_flush_all prints all remaining records to all usable consoles
> whilst its caller holds console_lock. This can result in large waiting
> times for those waiting for console_lock especially where there is a
> large volume of records or where the console is slow (e.g. serial).
>
> Let's extract the parts of this function which print a single record
> into a new function named console_flush_one_record. This can later
> be used for functions that will release and reacquire console_lock
> between records.
>
> This commit should not change existing functionality.
>
> Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
The code shufling looks correct to me. Feel free to use:
Reviewed-by: Petr Mladek <pmladek@suse.com>
See a nit below.
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3186,77 +3281,24 @@ static inline void printk_kthreads_check_locked(void) { }
> */
> static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
> {
> - struct console_flush_type ft;
> - bool any_usable = false;
> - struct console *con;
> + bool any_usable;
> bool any_progress;
> - int cookie;
>
> *next_seq = 0;
> *handover = false;
>
> do {
> - any_progress = false;
> -
> - printk_get_console_flush_type(&ft);
> -
> - cookie = console_srcu_read_lock();
> - for_each_console_srcu(con) {
> - short flags = console_srcu_read_flags(con);
> - u64 printk_seq;
> - bool progress;
> + any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
> + &any_usable);
>
> - /*
> - * console_flush_all() is only responsible for nbcon
> - * consoles when the nbcon consoles cannot print via
> - * their atomic or threaded flushing.
> - */
> - if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
> - continue;
> -
> - if (!console_is_usable(con, flags, !do_cond_resched))
> - continue;
> - any_usable = true;
> -
> - if (flags & CON_NBCON) {
> - progress = nbcon_legacy_emit_next_record(con, handover, cookie,
> - !do_cond_resched);
> - printk_seq = nbcon_seq_read(con);
> - } else {
> - progress = console_emit_next_record(con, handover, cookie);
> - printk_seq = con->seq;
> - }
> -
> - /*
> - * If a handover has occurred, the SRCU read lock
> - * is already released.
> - */
> - if (*handover)
> - return false;
> -
> - /* Track the next of the highest seq flushed. */
> - if (printk_seq > *next_seq)
> - *next_seq = printk_seq;
> -
> - if (!progress)
> - continue;
> - any_progress = true;
> -
> - /* Allow panic_cpu to take over the consoles safely. */
> - if (other_cpu_in_panic())
> - goto abandon;
> + if (*handover)
> + return false;
>
> - if (do_cond_resched)
> - cond_resched();
> - }
> - console_srcu_read_unlock(cookie);
> + if (other_cpu_in_panic())
> + return false;
> } while (any_progress);
>
> return any_usable;
The semantic of the function was always complicated. It results
into duplicated checks in the callers.
I think that we could do slightly better in this particular case.
But I would do the refactoring in a separate patch to make this
one easier for review.
I played with the code and came up with:
From 7a3c930a12d7c0157f404a4a834d1f92f3070799 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 18 Sep 2025 16:57:58 +0200
Subject: [RFC] printk: console_flush_one_record() code cleanup
console_flush_one_record() and console_flush_all() duplicate several
checks. They both want to tell the caller that consoles are not
longer usable in this context because it has lost the lock or
the lock has to be reserved for the panic CPU.
Pass this information by clearing the @any_usable parameter value
which has the same effect.
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
kernel/printk/printk.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 53daab5cdee5..2cceedcc3d34 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3175,7 +3175,8 @@ static inline void printk_kthreads_check_locked(void) { }
* console_lock, in which case the caller is no longer holding the
* console_lock. Otherwise it is set to false.
*
- * @any_usable will be set to true if there are any usable consoles.
+ * @any_usable will be set to true if there are any usable consoles
+ * in this context.
*
* Returns true when there was at least one usable console and a record was
* flushed. A returned false indicates there were no records to flush for any
@@ -3230,7 +3231,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
* is already released.
*/
if (*handover)
- return false;
+ goto unusable;
/* Track the next of the highest seq flushed. */
if (printk_seq > *next_seq)
@@ -3242,7 +3243,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
/* Allow panic_cpu to take over the consoles safely. */
if (other_cpu_in_panic())
- goto abandon;
+ goto unusable_srcu;
if (do_cond_resched)
cond_resched();
@@ -3251,8 +3252,10 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
return any_progress;
-abandon:
+unusable_srcu:
console_srcu_read_unlock(cookie);
+unusable:
+ *any_usable = false;
return false;
}
@@ -3288,14 +3291,9 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
*handover = false;
do {
- any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
- &any_usable);
+ any_progress = console_flush_one_record(do_cond_resched, next_seq,
+ handover, &any_usable);
- if (*handover)
- return false;
-
- if (other_cpu_in_panic())
- return false;
} while (any_progress);
return any_usable;
--
2.51.0
Best Regards,
Petr
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread
2025-09-15 12:43 ` [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread Andrew Murray
@ 2025-09-18 16:27 ` Petr Mladek
2025-09-19 13:38 ` Andrew Murray
0 siblings, 1 reply; 12+ messages in thread
From: Petr Mladek @ 2025-09-18 16:27 UTC (permalink / raw)
To: Andrew Murray
Cc: Steven Rostedt, John Ogness, Sergey Senozhatsky, linux-kernel
On Mon 2025-09-15 13:43:06, Andrew Murray wrote:
> The legacy printer kthread uses console_lock and
> __console_flush_and_unlock to flush records to the console. This
> approach results in the console_lock being held for the entire
> duration of a flush. This can result in large waiting times for
> those waiting for console_lock especially where there is a large
> volume of records or where the console is slow (e.g. serial). This
> contention is observed during boot, as the call to filp_open in
> console_on_rootfs will delay progression to userspace until any
> in-flight flush is completed.
It would be great to add here the boot logs from the cover letter
so that the real life numbers are stored in the git log.
> Let's instead use __console_flush_unlocked which releases and
> reacquires console_lock between records.
It seems that the patch does the right thing.
I would just suggest some refactoring, see below.
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
> return any_usable;
> }
>
> +/*
> + * Print out all remaining records to all consoles.
> + *
> + * @next_seq is set to the sequence number after the last available record.
> + * The value is valid only when this function returns true. It means that all
> + * usable consoles are completely flushed.
> + *
> + * @handover will be set to true if a printk waiter has taken over the
> + * console_lock, in which case the caller is no longer holding the
> + * console_lock. Otherwise it is set to false.
> + *
> + * Returns true when there was at least one usable console and all messages
> + * were flushed to all usable consoles. A returned false informs the caller
> + * that everything was not flushed (either there were no usable consoles or
> + * another context has taken over printing or it is a panic situation and this
> + * is not the panic CPU). Regardless the reason, the caller should assume it
> + * is not useful to immediately try again.
> + */
> +static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
> +{
> + bool any_usable;
> + bool any_progress;
> +
> + *next_seq = 0;
> + *handover = false;
> +
> + do {
> + console_lock();
> + any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
> +
> + if (*handover)
> + return false;
> +
> + __console_unlock();
> +
> + } while (any_progress);
> +
> + return any_usable;
> +}
This is yet another console_flush() API with a complicated semantic.
(return value, parameters). It is not bad. But it would be nice
if we did not need it ;-)
> static void __console_flush_and_unlock(void)
> {
> bool do_cond_resched;
> @@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
> } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
> }
>
> +static void __console_flush_unlocked(void)
> +{
> + bool handover;
> + bool flushed;
> + u64 next_seq;
> +
> + do {
> + flushed = console_flush_all_unlocked(&next_seq, &handover);
> + } while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
> +}
The semantic of this function is not much clear from the name.
IMHO, it would be a puzzle for people who would try to understand
the code in the future.
> +
> /**
> * console_unlock - unblock the legacy console subsystem from printing
> *
> @@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
> if (kthread_should_stop())
> break;
>
> - console_lock();
> - __console_flush_and_unlock();
> + __console_flush_unlocked();
IMHO, it would be pretty hard to understand what it does by trying to
understand all the layers of the code.
It might be better to open code it. And I would take inspiration
in the nbcon_kthread_func().
I played with the code and came up with:
static int legacy_kthread_func(void *unused)
{
bool any_progress;
wait_for_event:
wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
do {
bool any_usable;
bool handover;
u64 next_seq;
if (kthread_should_stop())
return 0;
console_lock();
any_progress = console_flush_one_record(true, &next_seq, &handover, &any_usable);
if (!handover)
__console_unlock();
/*
* There is no need to check whether there is any usable
* console in this context or whether there are pending
* messages. It is checked by legacy_kthread_should_wakeup()
* anyway. And console_lock() will never succeed again if
* there was panic in progress.
*/
} while (any_progress);
goto wait_for_event;
}
What do you think, please?
Best Regards,
Petr
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] printk: Introduce console_flush_one_record
2025-09-18 15:22 ` Petr Mladek
@ 2025-09-19 13:06 ` Andrew Murray
2025-09-23 13:30 ` Petr Mladek
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Murray @ 2025-09-19 13:06 UTC (permalink / raw)
To: Petr Mladek; +Cc: Steven Rostedt, John Ogness, Sergey Senozhatsky, linux-kernel
On Thu, 18 Sept 2025 at 16:22, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-09-15 13:43:05, Andrew Murray wrote:
> > console_flush_all prints all remaining records to all usable consoles
> > whilst its caller holds console_lock. This can result in large waiting
> > times for those waiting for console_lock especially where there is a
> > large volume of records or where the console is slow (e.g. serial).
> >
> > Let's extract the parts of this function which print a single record
> > into a new function named console_flush_one_record. This can later
> > be used for functions that will release and reacquire console_lock
> > between records.
> >
> > This commit should not change existing functionality.
> >
> > Signed-off-by: Andrew Murray <amurray@thegoodpenguin.co.uk>
>
> The code shufling looks correct to me. Feel free to use:
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
Thanks.
>
> See a nit below.
>
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3186,77 +3281,24 @@ static inline void printk_kthreads_check_locked(void) { }
> > */
> > static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
> > {
> > - struct console_flush_type ft;
> > - bool any_usable = false;
> > - struct console *con;
> > + bool any_usable;
> > bool any_progress;
> > - int cookie;
> >
> > *next_seq = 0;
> > *handover = false;
> >
> > do {
> > - any_progress = false;
> > -
> > - printk_get_console_flush_type(&ft);
> > -
> > - cookie = console_srcu_read_lock();
> > - for_each_console_srcu(con) {
> > - short flags = console_srcu_read_flags(con);
> > - u64 printk_seq;
> > - bool progress;
> > + any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
> > + &any_usable);
Ah, there was an error in my patch. Previously console_flush_all would
set any_usable to false at the start, and then set it to true if any
usable consoles were seen whilst printing any of the records. My
changes resulted in any_usable being set to false on each entry of
console_flush_one_record - thus effectively only indicating if any
consoles were usable for *only* the last record.
Thus to preserve functionality, I'd need to additionally add these changes:
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 75a3c47e9c0e..060d4919de32 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3193,7 +3193,6 @@ static bool console_flush_one_record(bool
do_cond_resched, u64 *next_seq, bool *
bool any_progress;
int cookie;
- *any_usable = false;
any_progress = false;
printk_get_console_flush_type(&ft);
@@ -3281,7 +3280,7 @@ static bool console_flush_one_record(bool
do_cond_resched, u64 *next_seq, bool *
*/
static bool console_flush_all(bool do_cond_resched, u64 *next_seq,
bool *handover)
{
- bool any_usable;
+ bool any_usable = false;
bool any_progress;
*next_seq = 0;
Are you happy for me to keep your reviewed-by with these changes?
> >
> > - /*
> > - * console_flush_all() is only responsible for nbcon
> > - * consoles when the nbcon consoles cannot print via
> > - * their atomic or threaded flushing.
> > - */
> > - if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
> > - continue;
> > -
> > - if (!console_is_usable(con, flags, !do_cond_resched))
> > - continue;
> > - any_usable = true;
> > -
> > - if (flags & CON_NBCON) {
> > - progress = nbcon_legacy_emit_next_record(con, handover, cookie,
> > - !do_cond_resched);
> > - printk_seq = nbcon_seq_read(con);
> > - } else {
> > - progress = console_emit_next_record(con, handover, cookie);
> > - printk_seq = con->seq;
> > - }
> > -
> > - /*
> > - * If a handover has occurred, the SRCU read lock
> > - * is already released.
> > - */
> > - if (*handover)
> > - return false;
> > -
> > - /* Track the next of the highest seq flushed. */
> > - if (printk_seq > *next_seq)
> > - *next_seq = printk_seq;
> > -
> > - if (!progress)
> > - continue;
> > - any_progress = true;
> > -
> > - /* Allow panic_cpu to take over the consoles safely. */
> > - if (other_cpu_in_panic())
> > - goto abandon;
> > + if (*handover)
> > + return false;
> >
> > - if (do_cond_resched)
> > - cond_resched();
> > - }
> > - console_srcu_read_unlock(cookie);
> > + if (other_cpu_in_panic())
> > + return false;
> > } while (any_progress);
> >
> > return any_usable;
>
> The semantic of the function was always complicated. It results
> into duplicated checks in the callers.
Yes, I've found this confusing.
any_usable is used by callers for different purposes. For
__console_flush_and_unlock, its used to determine if its possible to
try and do more work (i.e. to check if they should bother attempting
to print messages which may have been added between flush and unlock)
- i.e. useable consoles exist but also no panics or handovers have
happened. For get_init_console_seq it is used to determine if init_seq
was updated.
The comment for console_flush_all says:
* Returns true when there was at least one usable console and all messages
* were flushed to all usable consoles.
But I don't think that's true. It returns true when there was at least
one usable console and at least one message was attempted to be
written (I guess nbcon_legacy_emit_next_record or
console_emit_next_record could bail out before writing its message).
Unless I'm missing some assumptions about what can change in
iterations of the do/while loop.
>
> I think that we could do slightly better in this particular case.
> But I would do the refactoring in a separate patch to make this
> one easier for review.
>
> I played with the code and came up with:
>
> From 7a3c930a12d7c0157f404a4a834d1f92f3070799 Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Thu, 18 Sep 2025 16:57:58 +0200
> Subject: [RFC] printk: console_flush_one_record() code cleanup
>
> console_flush_one_record() and console_flush_all() duplicate several
> checks. They both want to tell the caller that consoles are not
> longer usable in this context because it has lost the lock or
> the lock has to be reserved for the panic CPU.
>
> Pass this information by clearing the @any_usable parameter value
> which has the same effect.
>
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
> kernel/printk/printk.c | 20 +++++++++-----------
> 1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 53daab5cdee5..2cceedcc3d34 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3175,7 +3175,8 @@ static inline void printk_kthreads_check_locked(void) { }
> * console_lock, in which case the caller is no longer holding the
> * console_lock. Otherwise it is set to false.
> *
> - * @any_usable will be set to true if there are any usable consoles.
> + * @any_usable will be set to true if there are any usable consoles
> + * in this context.
> *
> * Returns true when there was at least one usable console and a record was
> * flushed. A returned false indicates there were no records to flush for any
> @@ -3230,7 +3231,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> * is already released.
> */
> if (*handover)
> - return false;
> + goto unusable;
>
> /* Track the next of the highest seq flushed. */
> if (printk_seq > *next_seq)
> @@ -3242,7 +3243,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
>
> /* Allow panic_cpu to take over the consoles safely. */
> if (other_cpu_in_panic())
> - goto abandon;
> + goto unusable_srcu;
>
> if (do_cond_resched)
> cond_resched();
> @@ -3251,8 +3252,10 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
>
> return any_progress;
>
> -abandon:
> +unusable_srcu:
> console_srcu_read_unlock(cookie);
> +unusable:
> + *any_usable = false;
I'm struggling to understand this. I think your intention is to start
with any_usable being true (which would have to be set in
console_flush_all) and then set it to false where necessary (i.e.
invert the current logic). unusable_srcu and unusable are two cases
where false should be returned - however false should also be returned
where there are no consoles (for_each_console_srcu gives nothing) - or
where *all* of the consoles it provides are either NBCON(atomic or
offload) or where the !console_is_usable. I.e. console_flush_all may
erroneously return true, which may cause the loop in
__console_flush_and_unlock to iterate too many times.
> return false;
> }
>
> @@ -3288,14 +3291,9 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
> *handover = false;
>
> do {
> - any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
> - &any_usable);
> + any_progress = console_flush_one_record(do_cond_resched, next_seq,
> + handover, &any_usable);
>
> - if (*handover)
> - return false;
> -
> - if (other_cpu_in_panic())
> - return false;
Ah, that makes a lot of sense, any_progress will be false in these cases anyway.
> } while (any_progress);
>
> return any_usable;
> --
> 2.51.0
>
> Best Regards,
> Petr
Thanks,
Andrew Murray
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread
2025-09-18 16:27 ` Petr Mladek
@ 2025-09-19 13:38 ` Andrew Murray
2025-09-22 13:27 ` John Ogness
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Murray @ 2025-09-19 13:38 UTC (permalink / raw)
To: Petr Mladek; +Cc: Steven Rostedt, John Ogness, Sergey Senozhatsky, linux-kernel
On Thu, 18 Sept 2025 at 17:27, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-09-15 13:43:06, Andrew Murray wrote:
> > The legacy printer kthread uses console_lock and
> > __console_flush_and_unlock to flush records to the console. This
> > approach results in the console_lock being held for the entire
> > duration of a flush. This can result in large waiting times for
> > those waiting for console_lock especially where there is a large
> > volume of records or where the console is slow (e.g. serial). This
> > contention is observed during boot, as the call to filp_open in
> > console_on_rootfs will delay progression to userspace until any
> > in-flight flush is completed.
>
> It would be great to add here the boot logs from the cover letter
> so that the real life numbers are stored in the git log.
No problem, I'll add them to the next revision.
>
> > Let's instead use __console_flush_unlocked which releases and
> > reacquires console_lock between records.
>
> It seems that the patch does the right thing.
> I would just suggest some refactoring, see below.
>
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3301,6 +3301,46 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
> > return any_usable;
> > }
> >
> > +/*
> > + * Print out all remaining records to all consoles.
> > + *
> > + * @next_seq is set to the sequence number after the last available record.
> > + * The value is valid only when this function returns true. It means that all
> > + * usable consoles are completely flushed.
> > + *
> > + * @handover will be set to true if a printk waiter has taken over the
> > + * console_lock, in which case the caller is no longer holding the
> > + * console_lock. Otherwise it is set to false.
> > + *
> > + * Returns true when there was at least one usable console and all messages
> > + * were flushed to all usable consoles. A returned false informs the caller
> > + * that everything was not flushed (either there were no usable consoles or
> > + * another context has taken over printing or it is a panic situation and this
> > + * is not the panic CPU). Regardless the reason, the caller should assume it
> > + * is not useful to immediately try again.
> > + */
> > +static bool console_flush_all_unlocked(u64 *next_seq, bool *handover)
> > +{
> > + bool any_usable;
> > + bool any_progress;
> > +
> > + *next_seq = 0;
> > + *handover = false;
> > +
> > + do {
> > + console_lock();
> > + any_progress = console_flush_one_record(true, next_seq, handover, &any_usable);
> > +
> > + if (*handover)
> > + return false;
> > +
> > + __console_unlock();
> > +
> > + } while (any_progress);
> > +
> > + return any_usable;
> > +}
>
> This is yet another console_flush() API with a complicated semantic.
> (return value, parameters). It is not bad. But it would be nice
> if we did not need it ;-)
Yes, my approach/rationale was to replace the chain of calls from the
__console_flush_and_unlock call in legacy_kthread_func to functions
that don't require console_lock. Thus __console_flush_and_unlock
became __console_flush_unlocked, and console_flush_all became
console_flush_all_unlocked. I hoped I could then change existing users
of the locked versions over to the unlocked versions and perhaps
eliminate the locked versions. However, given that hasn't worked out,
open-coding as you suggest seems like a better approach.
>
> > static void __console_flush_and_unlock(void)
> > {
> > bool do_cond_resched;
> > @@ -3346,6 +3386,17 @@ static void __console_flush_and_unlock(void)
> > } while (prb_read_valid(prb, next_seq, NULL) && console_trylock());
> > }
> >
> > +static void __console_flush_unlocked(void)
> > +{
> > + bool handover;
> > + bool flushed;
> > + u64 next_seq;
> > +
> > + do {
> > + flushed = console_flush_all_unlocked(&next_seq, &handover);
> > + } while (flushed && !handover && prb_read_valid(prb, next_seq, NULL));
> > +}
>
> The semantic of this function is not much clear from the name.
> IMHO, it would be a puzzle for people who would try to understand
> the code in the future.
>
> > +
> > /**
> > * console_unlock - unblock the legacy console subsystem from printing
> > *
> > @@ -3676,8 +3727,7 @@ static int legacy_kthread_func(void *unused)
> > if (kthread_should_stop())
> > break;
> >
> > - console_lock();
> > - __console_flush_and_unlock();
> > + __console_flush_unlocked();
>
> IMHO, it would be pretty hard to understand what it does by trying to
> understand all the layers of the code.
>
> It might be better to open code it. And I would take inspiration
> in the nbcon_kthread_func().
Yes, I agree.
>
> I played with the code and came up with:
>
> static int legacy_kthread_func(void *unused)
> {
> bool any_progress;
>
> wait_for_event:
> wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
>
> do {
> bool any_usable;
> bool handover;
> u64 next_seq;
>
> if (kthread_should_stop())
> return 0;
This changes the behaviour from the existing legacy_kthread_func. Thus
allowing the thread to exit mid way through printing remaining
records, whereas previously the whole set of unprinted records would
first be printed. But that's probably a good thing.
>
> console_lock();
> any_progress = console_flush_one_record(true, &next_seq, &handover, &any_usable);
> if (!handover)
> __console_unlock();
>
> /*
> * There is no need to check whether there is any usable
> * console in this context or whether there are pending
> * messages. It is checked by legacy_kthread_should_wakeup()
> * anyway.
Ah, I hadn't realised this, thus the prb_read_valid call in my loop
isn't required. Thanks.
> And console_lock() will never succeed again if
> * there was panic in progress.
> */
> } while (any_progress);
>
> goto wait_for_event;
> }
>
> What do you think, please?
>
> Best Regards,
> Petr
I'll respin this series.
Thanks,
Andrew Murray
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread
2025-09-19 13:38 ` Andrew Murray
@ 2025-09-22 13:27 ` John Ogness
2025-09-23 14:22 ` Petr Mladek
0 siblings, 1 reply; 12+ messages in thread
From: John Ogness @ 2025-09-22 13:27 UTC (permalink / raw)
To: Andrew Murray, Petr Mladek
Cc: Steven Rostedt, Sergey Senozhatsky, linux-kernel
On 2025-09-19, Andrew Murray <amurray@thegoodpenguin.co.uk> wrote:
>> I played with the code and came up with:
>>
>> static int legacy_kthread_func(void *unused)
>> {
>> bool any_progress;
>>
>> wait_for_event:
>> wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
>>
>> do {
>> bool any_usable;
>> bool handover;
>> u64 next_seq;
>>
>> if (kthread_should_stop())
>> return 0;
>
> This changes the behaviour from the existing legacy_kthread_func. Thus
> allowing the thread to exit mid way through printing remaining
> records, whereas previously the whole set of unprinted records would
> first be printed. But that's probably a good thing.
It does not matter. kthread_should_stop() will only return true from
printk_kthreads_check_locked() when @have_legacy_console and
@have_boot_console are both false. That means that whatever legacy or
boot consoles there were, they are now unregistered, and were already
flushed from within their unregister_console_locked().
John
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] printk: Introduce console_flush_one_record
2025-09-19 13:06 ` Andrew Murray
@ 2025-09-23 13:30 ` Petr Mladek
2025-09-27 11:10 ` Andrew Murray
0 siblings, 1 reply; 12+ messages in thread
From: Petr Mladek @ 2025-09-23 13:30 UTC (permalink / raw)
To: Andrew Murray
Cc: Steven Rostedt, John Ogness, Sergey Senozhatsky, linux-kernel
On Fri 2025-09-19 14:06:24, Andrew Murray wrote:
> On Thu, 18 Sept 2025 at 16:22, Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Mon 2025-09-15 13:43:05, Andrew Murray wrote:
> > > console_flush_all prints all remaining records to all usable consoles
> > > whilst its caller holds console_lock. This can result in large waiting
> > > times for those waiting for console_lock especially where there is a
> > > large volume of records or where the console is slow (e.g. serial).
> > >
> > > Let's extract the parts of this function which print a single record
> > > into a new function named console_flush_one_record. This can later
> > > be used for functions that will release and reacquire console_lock
> > > between records.
> > >
> > > This commit should not change existing functionality.
> > >
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -3186,77 +3281,24 @@ static inline void printk_kthreads_check_locked(void) { }
> > > */
> > > static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handover)
> > > {
> > > - struct console_flush_type ft;
> > > - bool any_usable = false;
> > > - struct console *con;
> > > + bool any_usable;
> > > bool any_progress;
> > > - int cookie;
> > >
> > > *next_seq = 0;
> > > *handover = false;
> > >
> > > do {
> > > - any_progress = false;
> > > -
> > > - printk_get_console_flush_type(&ft);
> > > -
> > > - cookie = console_srcu_read_lock();
> > > - for_each_console_srcu(con) {
> > > - short flags = console_srcu_read_flags(con);
> > > - u64 printk_seq;
> > > - bool progress;
> > > + any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
> > > + &any_usable);
>
> Ah, there was an error in my patch. Previously console_flush_all would
> set any_usable to false at the start, and then set it to true if any
> usable consoles were seen whilst printing any of the records. My
> changes resulted in any_usable being set to false on each entry of
> console_flush_one_record - thus effectively only indicating if any
> consoles were usable for *only* the last record.
I noticed the change but I considered it cosmetic. In fact, I think
that it is better to reset the value in each cycle and return the value
reflecting the current situation.
If I get it correctly, it should not have any bad consequences.
In the worst case, we would return false when the last usable
console was disabled. It actually looks like the right value
for all callers.
> Thus to preserve functionality, I'd need to additionally add these changes:
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 75a3c47e9c0e..060d4919de32 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3193,7 +3193,6 @@ static bool console_flush_one_record(bool
> do_cond_resched, u64 *next_seq, bool *
> bool any_progress;
> int cookie;
>
> - *any_usable = false;
> any_progress = false;
>
> printk_get_console_flush_type(&ft);
> @@ -3281,7 +3280,7 @@ static bool console_flush_one_record(bool
> do_cond_resched, u64 *next_seq, bool *
> */
> static bool console_flush_all(bool do_cond_resched, u64 *next_seq,
> bool *handover)
> {
> - bool any_usable;
> + bool any_usable = false;
> bool any_progress;
>
> *next_seq = 0;
>
> Are you happy for me to keep your reviewed-by with these changes?
Feel free to do this in v2 and keep my reviewed-by.
It is always better to reshufle a code without changing
the behavior.
> > > - /*
> > > - * console_flush_all() is only responsible for nbcon
> > > - * consoles when the nbcon consoles cannot print via
> > > - * their atomic or threaded flushing.
> > > - */
> > > - if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
> > > - continue;
> > > -
> > > - if (!console_is_usable(con, flags, !do_cond_resched))
> > > - continue;
> > > - any_usable = true;
> > > -
> > > - if (flags & CON_NBCON) {
> > > - progress = nbcon_legacy_emit_next_record(con, handover, cookie,
> > > - !do_cond_resched);
> > > - printk_seq = nbcon_seq_read(con);
> > > - } else {
> > > - progress = console_emit_next_record(con, handover, cookie);
> > > - printk_seq = con->seq;
> > > - }
> > > -
> > > - /*
> > > - * If a handover has occurred, the SRCU read lock
> > > - * is already released.
> > > - */
> > > - if (*handover)
> > > - return false;
> > > -
> > > - /* Track the next of the highest seq flushed. */
> > > - if (printk_seq > *next_seq)
> > > - *next_seq = printk_seq;
> > > -
> > > - if (!progress)
> > > - continue;
> > > - any_progress = true;
> > > -
> > > - /* Allow panic_cpu to take over the consoles safely. */
> > > - if (other_cpu_in_panic())
> > > - goto abandon;
> > > + if (*handover)
> > > + return false;
> > >
> > > - if (do_cond_resched)
> > > - cond_resched();
> > > - }
> > > - console_srcu_read_unlock(cookie);
> > > + if (other_cpu_in_panic())
> > > + return false;
> > > } while (any_progress);
> > >
> > > return any_usable;
> >
> > The semantic of the function was always complicated. It results
> > into duplicated checks in the callers.
>
> Yes, I've found this confusing.
>
> any_usable is used by callers for different purposes. For
> __console_flush_and_unlock, its used to determine if its possible to
> try and do more work (i.e. to check if they should bother attempting
> to print messages which may have been added between flush and
> unlock)
This implements a historic approach when the current console_lock
owner is responsible for flushing all pending messages.
It makes sure that all messages are flushed to the console.
And parallel printk() does not need to wait for console_lock.
The "try and do more work" handles the following race:
CPU0 CPU1
printk()
console_trylock()
console_unlock()
__console_flush_and_unlock()
console_flush_all()
printk()
console_trylock()
# fails because console_sem
# is still owned by CPU0
__console_unlock()
Problem: Nobody owns console_lock but the message added by
CPU1 has not been flushed yet.
Solution: The last owner must check whether there is a new
message after releasing console_lock. And it must
try to flush it.
The try_lock might fail when another CPU gets
the console_lock in the meantime. But it is fine.
The new owner will flush the message then.
> - i.e. useable consoles exist but also no panics or handovers have
> happened.
These are situations when this context is not longer responsible
for flushing the rest because:
+ there are no usable consoles.
+ non-panic CPUs are going to be stopped. And the messages
will get flushed by panic CPU, either by printk() called
on panic CPU or by explicit flush in panic().
+ the console_lock ownership has been passed to another
CPU via console_trylock_spinning(). The new owner will
flush the pending messages. It helps to reduce the risk
of softlockup reports with this historic approach.
> For get_init_console_seq it is used to determine if init_seq
> was updated.
I would rather say that it is used to determine whether init_seq
has a reliable value. Where reliable means that it flushed and
checked all usable consoles.
Note that this value is use to prevent duplicate output when
a proper console driver replaces an early console driver.
We are not able to detect which early/proper drivers are using
the same physical device => it is better to flush all consoles
and use the last next_seq which should be the same on all consoles.
> The comment for console_flush_all says:
>
> * Returns true when there was at least one usable console and all messages
> * were flushed to all usable consoles.
>
> But I don't think that's true. It returns true when there was at least
> one usable console and at least one message was attempted to be
> written (I guess nbcon_legacy_emit_next_record or
> console_emit_next_record could bail out before writing its message).
> Unless I'm missing some assumptions about what can change in
> iterations of the do/while loop.
The "bail out" should be detected. This is why console_flush_all()
explicitly returns "false" in the following situations:
+ handover happened
+ other_cpu_in_panic() returns true
+ both any_progress and any_usable are false
Hmm, I agree that the last case is not 100% correct:
1. The original code did not reset any_usable in each cycle.
It might be true even when all usable consoles have been
disabled in the meantime.
Well, it is a corner case and should not cause problems.
In the worst case, console_unlock() will do one more
cycle (try_lock + console_flush_all) and the new
console_flush_all() would return false because
there won't be any usable console
This problem should not happen with get_init_console_seq()
because it is called under console_list_lock. Consoles
could not be disabled in parallel.
This will actually be fixed by resetting "any_usable" in
each cycle.
2. "any_progress" is set to true when at least one console
made progress.
It is not a big deal:
+ console_flush_all() would explicitly return false when
there was a handover or other_cpu_in_panic()
returned true.
+ console_emit_next_record() always finishes the job
when there is a pending message.
+ nbcon_legacy_emit_next_record() might lose the owner
ship. But the new owner will become responsible for
flushing everything. And console_flush_all() would
try one more cycle...
It seems to be working correctly in all situations after all.
> > I think that we could do slightly better in this particular case.
> > But I would do the refactoring in a separate patch to make this
> > one easier for review.
> >
> > I played with the code and came up with:
> >
> > From 7a3c930a12d7c0157f404a4a834d1f92f3070799 Mon Sep 17 00:00:00 2001
> > From: Petr Mladek <pmladek@suse.com>
> > Date: Thu, 18 Sep 2025 16:57:58 +0200
> > Subject: [RFC] printk: console_flush_one_record() code cleanup
> >
> > console_flush_one_record() and console_flush_all() duplicate several
> > checks. They both want to tell the caller that consoles are not
> > longer usable in this context because it has lost the lock or
> > the lock has to be reserved for the panic CPU.
> >
> > Pass this information by clearing the @any_usable parameter value
> > which has the same effect.
> >
> > Signed-off-by: Petr Mladek <pmladek@suse.com>
> > ---
> > kernel/printk/printk.c | 20 +++++++++-----------
> > 1 file changed, 9 insertions(+), 11 deletions(-)
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 53daab5cdee5..2cceedcc3d34 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3175,7 +3175,8 @@ static inline void printk_kthreads_check_locked(void) { }
> > * console_lock, in which case the caller is no longer holding the
> > * console_lock. Otherwise it is set to false.
> > *
> > - * @any_usable will be set to true if there are any usable consoles.
> > + * @any_usable will be set to true if there are any usable consoles
> > + * in this context.
> > *
> > * Returns true when there was at least one usable console and a record was
> > * flushed. A returned false indicates there were no records to flush for any
> > @@ -3230,7 +3231,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> > * is already released.
> > */
> > if (*handover)
> > - return false;
> > + goto unusable;
> >
> > /* Track the next of the highest seq flushed. */
> > if (printk_seq > *next_seq)
> > @@ -3242,7 +3243,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> >
> > /* Allow panic_cpu to take over the consoles safely. */
> > if (other_cpu_in_panic())
> > - goto abandon;
> > + goto unusable_srcu;
> >
> > if (do_cond_resched)
> > cond_resched();
> > @@ -3251,8 +3252,10 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> >
> > return any_progress;
> >
> > -abandon:
> > +unusable_srcu:
> > console_srcu_read_unlock(cookie);
> > +unusable:
> > + *any_usable = false;
>
> I'm struggling to understand this. I think your intention is to start
> with any_usable being true (which would have to be set in
> console_flush_all)
No, my intention is to start with @any_usable set to "false".
It would be set to "true" only when console_is_usable() returned
true for at least one console.
And it would be set back to "false" when the function was not
able to finish the job (bailed out) because:
+ handover
+ other_cpu_in_panic()
By other words, the variable @any_usable would have the same
semantic as console_flush_all() return value.
Maybe, we could rename the variable to "flushed". It would
be set to "true" when the function was able to flush on
all usable consoles and when it was not interrupted.
I hope that it makes more sense now.
>
> > return false;
> > }
> >
> > @@ -3288,14 +3291,9 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
> > *handover = false;
> >
> > do {
> > - any_progress = console_flush_one_record(do_cond_resched, next_seq, handover,
> > - &any_usable);
> > + any_progress = console_flush_one_record(do_cond_resched, next_seq,
> > + handover, &any_usable);
> >
> > - if (*handover)
> > - return false;
> > -
> > - if (other_cpu_in_panic())
> > - return false;
>
> Ah, that makes a lot of sense, any_progress will be false in these cases anyway.
>
>
> > } while (any_progress);
> >
> > return any_usable;
Best Regards,
Petr
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread
2025-09-22 13:27 ` John Ogness
@ 2025-09-23 14:22 ` Petr Mladek
2025-09-27 12:10 ` Andrew Murray
0 siblings, 1 reply; 12+ messages in thread
From: Petr Mladek @ 2025-09-23 14:22 UTC (permalink / raw)
To: John Ogness
Cc: Andrew Murray, Steven Rostedt, Sergey Senozhatsky, linux-kernel
On Mon 2025-09-22 15:33:12, John Ogness wrote:
> On 2025-09-19, Andrew Murray <amurray@thegoodpenguin.co.uk> wrote:
> >> I played with the code and came up with:
> >>
> >> static int legacy_kthread_func(void *unused)
> >> {
> >> bool any_progress;
> >>
> >> wait_for_event:
> >> wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
> >>
> >> do {
> >> bool any_usable;
> >> bool handover;
> >> u64 next_seq;
> >>
> >> if (kthread_should_stop())
> >> return 0;
> >
> > This changes the behaviour from the existing legacy_kthread_func. Thus
> > allowing the thread to exit mid way through printing remaining
> > records, whereas previously the whole set of unprinted records would
> > first be printed. But that's probably a good thing.
Good catch! I admit that I just copied this from nbcon_kthread_func()
and it looked reasonable.
> It does not matter. kthread_should_stop() will only return true from
> printk_kthreads_check_locked() when @have_legacy_console and
> @have_boot_console are both false. That means that whatever legacy or
> boot consoles there were, they are now unregistered, and were already
> flushed from within their unregister_console_locked().
Yup. I would keep it in the do/while loop to keep it consistent with
the nbcon kthread.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 1/2] printk: Introduce console_flush_one_record
2025-09-23 13:30 ` Petr Mladek
@ 2025-09-27 11:10 ` Andrew Murray
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Murray @ 2025-09-27 11:10 UTC (permalink / raw)
To: Petr Mladek; +Cc: Steven Rostedt, John Ogness, Sergey Senozhatsky, linux-kernel
On Tue, 23 Sept 2025 at 14:30, Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2025-09-19 14:06:24, Andrew Murray wrote:
> > On Thu, 18 Sept 2025 at 16:22, Petr Mladek <pmladek@suse.com> wrote:
> > Thus to preserve functionality, I'd need to additionally add these changes:
> >
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 75a3c47e9c0e..060d4919de32 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3193,7 +3193,6 @@ static bool console_flush_one_record(bool
> > do_cond_resched, u64 *next_seq, bool *
> > bool any_progress;
> > int cookie;
> >
> > - *any_usable = false;
> > any_progress = false;
> >
> > printk_get_console_flush_type(&ft);
> > @@ -3281,7 +3280,7 @@ static bool console_flush_one_record(bool
> > do_cond_resched, u64 *next_seq, bool *
> > */
> > static bool console_flush_all(bool do_cond_resched, u64 *next_seq,
> > bool *handover)
> > {
> > - bool any_usable;
> > + bool any_usable = false;
> > bool any_progress;
> >
> > *next_seq = 0;
> >
> > Are you happy for me to keep your reviewed-by with these changes?
>
> Feel free to do this in v2 and keep my reviewed-by.
> It is always better to reshufle a code without changing
> the behavior.
No problem, I've squashed the above into a v2 of this first patch.
I've also included the reverse of this change into your cleanup patch,
thus ensuring we set any_usable to false at the start of
console_flush_one_record, as suggested.
>
> > > > - /*
> > > > - * console_flush_all() is only responsible for nbcon
> > > > - * consoles when the nbcon consoles cannot print via
> > > > - * their atomic or threaded flushing.
> > > > - */
> > > > - if ((flags & CON_NBCON) && (ft.nbcon_atomic || ft.nbcon_offload))
> > > > - continue;
> > > > -
> > > > - if (!console_is_usable(con, flags, !do_cond_resched))
> > > > - continue;
> > > > - any_usable = true;
> > > > -
> > > > - if (flags & CON_NBCON) {
> > > > - progress = nbcon_legacy_emit_next_record(con, handover, cookie,
> > > > - !do_cond_resched);
> > > > - printk_seq = nbcon_seq_read(con);
> > > > - } else {
> > > > - progress = console_emit_next_record(con, handover, cookie);
> > > > - printk_seq = con->seq;
> > > > - }
> > > > -
> > > > - /*
> > > > - * If a handover has occurred, the SRCU read lock
> > > > - * is already released.
> > > > - */
> > > > - if (*handover)
> > > > - return false;
> > > > -
> > > > - /* Track the next of the highest seq flushed. */
> > > > - if (printk_seq > *next_seq)
> > > > - *next_seq = printk_seq;
> > > > -
> > > > - if (!progress)
> > > > - continue;
> > > > - any_progress = true;
> > > > -
> > > > - /* Allow panic_cpu to take over the consoles safely. */
> > > > - if (other_cpu_in_panic())
> > > > - goto abandon;
> > > > + if (*handover)
> > > > + return false;
> > > >
> > > > - if (do_cond_resched)
> > > > - cond_resched();
> > > > - }
> > > > - console_srcu_read_unlock(cookie);
> > > > + if (other_cpu_in_panic())
> > > > + return false;
> > > > } while (any_progress);
> > > >
> > > > return any_usable;
> > >
> > > The semantic of the function was always complicated. It results
> > > into duplicated checks in the callers.
> >
> > Yes, I've found this confusing.
> >
> > any_usable is used by callers for different purposes. For
> > __console_flush_and_unlock, its used to determine if its possible to
> > try and do more work (i.e. to check if they should bother attempting
> > to print messages which may have been added between flush and
> > unlock)
>
> This implements a historic approach when the current console_lock
> owner is responsible for flushing all pending messages.
>
> It makes sure that all messages are flushed to the console.
> And parallel printk() does not need to wait for console_lock.
>
> The "try and do more work" handles the following race:
>
> CPU0 CPU1
>
> printk()
>
> console_trylock()
> console_unlock()
> __console_flush_and_unlock()
> console_flush_all()
>
> printk()
> console_trylock()
> # fails because console_sem
> # is still owned by CPU0
>
>
> __console_unlock()
>
> Problem: Nobody owns console_lock but the message added by
> CPU1 has not been flushed yet.
>
> Solution: The last owner must check whether there is a new
> message after releasing console_lock. And it must
> try to flush it.
>
> The try_lock might fail when another CPU gets
> the console_lock in the meantime. But it is fine.
> The new owner will flush the message then.
>
>
> > - i.e. useable consoles exist but also no panics or handovers have
> > happened.
>
> These are situations when this context is not longer responsible
> for flushing the rest because:
>
> + there are no usable consoles.
>
> + non-panic CPUs are going to be stopped. And the messages
> will get flushed by panic CPU, either by printk() called
> on panic CPU or by explicit flush in panic().
>
> + the console_lock ownership has been passed to another
> CPU via console_trylock_spinning(). The new owner will
> flush the pending messages. It helps to reduce the risk
> of softlockup reports with this historic approach.
>
>
> > For get_init_console_seq it is used to determine if init_seq
> > was updated.
>
> I would rather say that it is used to determine whether init_seq
> has a reliable value. Where reliable means that it flushed and
> checked all usable consoles.
>
> Note that this value is use to prevent duplicate output when
> a proper console driver replaces an early console driver.
>
> We are not able to detect which early/proper drivers are using
> the same physical device => it is better to flush all consoles
> and use the last next_seq which should be the same on all consoles.
>
>
> > The comment for console_flush_all says:
> >
> > * Returns true when there was at least one usable console and all messages
> > * were flushed to all usable consoles.
> >
> > But I don't think that's true. It returns true when there was at least
> > one usable console and at least one message was attempted to be
> > written (I guess nbcon_legacy_emit_next_record or
> > console_emit_next_record could bail out before writing its message).
> > Unless I'm missing some assumptions about what can change in
> > iterations of the do/while loop.
>
> The "bail out" should be detected. This is why console_flush_all()
> explicitly returns "false" in the following situations:
>
> + handover happened
> + other_cpu_in_panic() returns true
> + both any_progress and any_usable are false
>
> Hmm, I agree that the last case is not 100% correct:
>
> 1. The original code did not reset any_usable in each cycle.
> It might be true even when all usable consoles have been
> disabled in the meantime.
>
> Well, it is a corner case and should not cause problems.
>
> In the worst case, console_unlock() will do one more
> cycle (try_lock + console_flush_all) and the new
> console_flush_all() would return false because
> there won't be any usable console
>
> This problem should not happen with get_init_console_seq()
> because it is called under console_list_lock. Consoles
> could not be disabled in parallel.
>
> This will actually be fixed by resetting "any_usable" in
> each cycle.
>
>
> 2. "any_progress" is set to true when at least one console
> made progress.
>
> It is not a big deal:
>
> + console_flush_all() would explicitly return false when
> there was a handover or other_cpu_in_panic()
> returned true.
>
> + console_emit_next_record() always finishes the job
> when there is a pending message.
>
> + nbcon_legacy_emit_next_record() might lose the owner
> ship. But the new owner will become responsible for
> flushing everything. And console_flush_all() would
> try one more cycle...
>
> It seems to be working correctly in all situations after all.
Thanks for the explanations, much appreciated.
>
>
> > > I think that we could do slightly better in this particular case.
> > > But I would do the refactoring in a separate patch to make this
> > > one easier for review.
> > >
> > > I played with the code and came up with:
> > >
> > > From 7a3c930a12d7c0157f404a4a834d1f92f3070799 Mon Sep 17 00:00:00 2001
> > > From: Petr Mladek <pmladek@suse.com>
> > > Date: Thu, 18 Sep 2025 16:57:58 +0200
> > > Subject: [RFC] printk: console_flush_one_record() code cleanup
> > >
> > > console_flush_one_record() and console_flush_all() duplicate several
> > > checks. They both want to tell the caller that consoles are not
> > > longer usable in this context because it has lost the lock or
> > > the lock has to be reserved for the panic CPU.
> > >
> > > Pass this information by clearing the @any_usable parameter value
> > > which has the same effect.
> > >
> > > Signed-off-by: Petr Mladek <pmladek@suse.com>
> > > ---
> > > kernel/printk/printk.c | 20 +++++++++-----------
> > > 1 file changed, 9 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > > index 53daab5cdee5..2cceedcc3d34 100644
> > > --- a/kernel/printk/printk.c
> > > +++ b/kernel/printk/printk.c
> > > @@ -3175,7 +3175,8 @@ static inline void printk_kthreads_check_locked(void) { }
> > > * console_lock, in which case the caller is no longer holding the
> > > * console_lock. Otherwise it is set to false.
> > > *
> > > - * @any_usable will be set to true if there are any usable consoles.
> > > + * @any_usable will be set to true if there are any usable consoles
> > > + * in this context.
> > > *
> > > * Returns true when there was at least one usable console and a record was
> > > * flushed. A returned false indicates there were no records to flush for any
> > > @@ -3230,7 +3231,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> > > * is already released.
> > > */
> > > if (*handover)
> > > - return false;
> > > + goto unusable;
> > >
> > > /* Track the next of the highest seq flushed. */
> > > if (printk_seq > *next_seq)
> > > @@ -3242,7 +3243,7 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> > >
> > > /* Allow panic_cpu to take over the consoles safely. */
> > > if (other_cpu_in_panic())
> > > - goto abandon;
> > > + goto unusable_srcu;
> > >
> > > if (do_cond_resched)
> > > cond_resched();
> > > @@ -3251,8 +3252,10 @@ static bool console_flush_one_record(bool do_cond_resched, u64 *next_seq, bool *
> > >
> > > return any_progress;
> > >
> > > -abandon:
> > > +unusable_srcu:
> > > console_srcu_read_unlock(cookie);
> > > +unusable:
> > > + *any_usable = false;
> >
> > I'm struggling to understand this. I think your intention is to start
> > with any_usable being true (which would have to be set in
> > console_flush_all)
>
> No, my intention is to start with @any_usable set to "false".
>
> It would be set to "true" only when console_is_usable() returned
> true for at least one console.
>
> And it would be set back to "false" when the function was not
> able to finish the job (bailed out) because:
>
> + handover
> + other_cpu_in_panic()
>
> By other words, the variable @any_usable would have the same
> semantic as console_flush_all() return value.
Thanks for explaining, I'm happy with this patch now, I've rebased it
onto my series. As suggested, I've updated it to ensure any_usable is
false at the start of console_flush_one_record.
>
> Maybe, we could rename the variable to "flushed". It would
> be set to "true" when the function was able to flush on
> all usable consoles and when it was not interrupted.
Thanks,
Andrew Murray
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread
2025-09-23 14:22 ` Petr Mladek
@ 2025-09-27 12:10 ` Andrew Murray
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Murray @ 2025-09-27 12:10 UTC (permalink / raw)
To: Petr Mladek; +Cc: John Ogness, Steven Rostedt, Sergey Senozhatsky, linux-kernel
On Tue, 23 Sept 2025 at 15:22, Petr Mladek <pmladek@suse.com> wrote:
>
> On Mon 2025-09-22 15:33:12, John Ogness wrote:
> > On 2025-09-19, Andrew Murray <amurray@thegoodpenguin.co.uk> wrote:
> > >> I played with the code and came up with:
> > >>
> > >> static int legacy_kthread_func(void *unused)
> > >> {
> > >> bool any_progress;
> > >>
> > >> wait_for_event:
> > >> wait_event_interruptible(legacy_wait, legacy_kthread_should_wakeup());
> > >>
> > >> do {
> > >> bool any_usable;
> > >> bool handover;
> > >> u64 next_seq;
> > >>
> > >> if (kthread_should_stop())
> > >> return 0;
> > >
> > > This changes the behaviour from the existing legacy_kthread_func. Thus
> > > allowing the thread to exit mid way through printing remaining
> > > records, whereas previously the whole set of unprinted records would
> > > first be printed. But that's probably a good thing.
>
> Good catch! I admit that I just copied this from nbcon_kthread_func()
> and it looked reasonable.
>
> > It does not matter. kthread_should_stop() will only return true from
> > printk_kthreads_check_locked() when @have_legacy_console and
> > @have_boot_console are both false. That means that whatever legacy or
> > boot consoles there were, they are now unregistered, and were already
> > flushed from within their unregister_console_locked().
>
> Yup. I would keep it in the do/while loop to keep it consistent with
> the nbcon kthread.
Thanks for the feedback, the only change I've made in my v2 from
Petr's code is to set handover to false in the loop. Otherwise if
there are no usable consoles, the then uninitialised variable may lead
to incorrectly calling or not calling __console_unlock.
Thanks,
Andrew Murray
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-09-27 12:11 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-15 12:43 [PATCH RFC 0/2] printk: Release console_lock between printing records in legacy thread Andrew Murray
2025-09-15 12:43 ` [PATCH RFC 1/2] printk: Introduce console_flush_one_record Andrew Murray
2025-09-18 15:22 ` Petr Mladek
2025-09-19 13:06 ` Andrew Murray
2025-09-23 13:30 ` Petr Mladek
2025-09-27 11:10 ` Andrew Murray
2025-09-15 12:43 ` [PATCH RFC 2/2] printk: Use console_flush_one_record for legacy printer kthread Andrew Murray
2025-09-18 16:27 ` Petr Mladek
2025-09-19 13:38 ` Andrew Murray
2025-09-22 13:27 ` John Ogness
2025-09-23 14:22 ` Petr Mladek
2025-09-27 12:10 ` Andrew Murray
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox