[PATCH] printk: CPU backtrace not printing on panic

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] printk: CPU backtrace not printing on panic
@ 2024-07-26  9:44 takakura
  2024-07-26 10:46 ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: takakura @ 2024-07-26  9:44 UTC (permalink / raw)
  To: pmladek, rostedt, john.ogness, senozhatsky, akpm, bhe, lukas,
	wangkefeng.wang, ubizjak, feng.tang, j.granados,
	stephen.s.brennan
  Cc: linux-kernel, nishimura, taka, Ryo Takakura

From: Ryo Takakura <takakura@valinux.co.jp>

commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
to ringbuffer") disabled non-panic CPUs to further write messages to
ringbuffer after panicked.

Since the commit, non-panicked CPU's are not allowed to write to
ring buffer after panicked and CPU backtrace which is triggered
after panicked to sample non-panicked CPUs' backtrace no longer
serves its function as it can't print anything.

Fix the issue by allowing non-panicked CPUs to write to ringbuffer
when CPU backtrace was enabled if there is no need for message
suppression. This patch brings back commit 13fb0f74d702 ("printk:
Avoid livelock with heavy printk during panic")'s implementation
for suppressing non-panicked CPUs' messages when detected messages
being dropped.

Fixes: 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
---

I was not really sure what the best approach would be if we want to
allow cpu backtrace to function while prioritizing the panicked CPU's 
messages. Hoping to get some feedbacks, thanks!

Sincerely,
Ryo Takakura

---
 include/linux/panic.h  |  9 +++++++++
 include/linux/printk.h |  1 +
 kernel/panic.c         | 10 +---------
 kernel/printk/printk.c | 26 ++++++++++++++++++++++++--
 4 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 6717b15e798c..70dc17284785 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -16,6 +16,15 @@ extern void oops_enter(void);
 extern void oops_exit(void);
 extern bool oops_may_print(void);
 
+#define PANIC_PRINT_TASK_INFO		0x00000001
+#define PANIC_PRINT_MEM_INFO		0x00000002
+#define PANIC_PRINT_TIMER_INFO		0x00000004
+#define PANIC_PRINT_LOCK_INFO		0x00000008
+#define PANIC_PRINT_FTRACE_INFO		0x00000010
+#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
+#define PANIC_PRINT_ALL_CPU_BT		0x00000040
+#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
+
 extern int panic_timeout;
 extern unsigned long panic_print;
 extern int panic_on_oops;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 40afab23881a..cbb4c1ec1fca 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -78,6 +78,7 @@ extern char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE];
 struct ctl_table;
 
 extern int suppress_printk;
+extern int suppress_panic_printk;
 
 struct va_format {
 	const char *fmt;
diff --git a/kernel/panic.c b/kernel/panic.c
index 8bff183d6180..8c23cd1f876e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -66,14 +66,6 @@ static unsigned int warn_limit __read_mostly;
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
-#define PANIC_PRINT_TASK_INFO		0x00000001
-#define PANIC_PRINT_MEM_INFO		0x00000002
-#define PANIC_PRINT_TIMER_INFO		0x00000004
-#define PANIC_PRINT_LOCK_INFO		0x00000008
-#define PANIC_PRINT_FTRACE_INFO		0x00000010
-#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
-#define PANIC_PRINT_ALL_CPU_BT		0x00000040
-#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
 unsigned long panic_print;
 
 ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
@@ -252,7 +244,7 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+	if (panic_print & PANIC_PRINT_ALL_CPU_BT && !suppress_panic_printk)
 		trigger_all_cpu_backtrace();
 
 	/*
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 420fd310129d..0ac2ca1abbbf 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -459,6 +459,13 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
 static DEFINE_MUTEX(syslog_lock);
 
 #ifdef CONFIG_PRINTK
+
+/*
+ * During panic, heavy printk by other CPUs can delay the
+ * panic and risk deadlock on console resources.
+ */
+int __read_mostly suppress_panic_printk;
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 /* All 3 protected by @syslog_lock. */
 /* the next printk record to read by syslog(READ) or /proc/kmsg */
@@ -2311,9 +2318,11 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/*
 	 * The messages on the panic CPU are the most important. If
 	 * non-panic CPUs are generating any messages, they will be
-	 * silently dropped.
+	 * silently dropped unless CPU backtrace on panic is enabled.
 	 */
-	if (other_cpu_in_panic())
+	if (other_cpu_in_panic() &&
+	   (!(panic_print & PANIC_PRINT_ALL_CPU_BT) ||
+	    unlikely(suppress_panic_printk)))
 		return 0;
 
 	if (level == LOGLEVEL_SCHED) {
@@ -2816,6 +2825,8 @@ void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
 bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 			     bool is_extended, bool may_suppress)
 {
+	static int panic_console_dropped;
+
 	struct printk_buffers *pbufs = pmsg->pbufs;
 	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
 	const size_t outbuf_sz = sizeof(pbufs->outbuf);
@@ -2843,6 +2854,17 @@ bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
 	pmsg->seq = r.info->seq;
 	pmsg->dropped = r.info->seq - seq;
 
+	/*
+	 * Check for dropped messages in panic here so that printk
+	 * suppression can occur as early as possible if necessary.
+	 */
+	if (pmsg->dropped &&
+	    panic_in_progress() &&
+	    panic_console_dropped++ > 10) {
+		suppress_panic_printk = 1;
+		pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
+	}
+
 	/* Skip record that has level above the console loglevel. */
 	if (may_suppress && suppress_message_printing(r.info->level))
 		goto out;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-26  9:44 [PATCH] printk: CPU backtrace not printing on panic takakura
@ 2024-07-26 10:46 ` Petr Mladek
  2024-07-26 13:56   ` John Ogness
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2024-07-26 10:46 UTC (permalink / raw)
  To: takakura
  Cc: rostedt, john.ogness, senozhatsky, akpm, bhe, lukas,
	wangkefeng.wang, ubizjak, feng.tang, j.granados,
	stephen.s.brennan, linux-kernel, nishimura, taka

On Fri 2024-07-26 18:44:37, takakura@valinux.co.jp wrote:
> From: Ryo Takakura <takakura@valinux.co.jp>
> 
> commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
> to ringbuffer") disabled non-panic CPUs to further write messages to
> ringbuffer after panicked.
> 
> Since the commit, non-panicked CPU's are not allowed to write to
> ring buffer after panicked and CPU backtrace which is triggered
> after panicked to sample non-panicked CPUs' backtrace no longer
> serves its function as it can't print anything.

Great catch!

> Fix the issue by allowing non-panicked CPUs to write to ringbuffer
> when CPU backtrace was enabled if there is no need for message
> suppression. This patch brings back commit 13fb0f74d702 ("printk:
> Avoid livelock with heavy printk during panic")'s implementation
> for suppressing non-panicked CPUs' messages when detected messages
> being dropped.
> 
> Fixes: 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer")
> Signed-off-by: Ryo Takakura <takakura@valinux.co.jp>
> ---
> 
> I was not really sure what the best approach would be if we want to
> allow cpu backtrace to function while prioritizing the panicked CPU's 
> messages. Hoping to get some feedbacks, thanks!
> 
> Sincerely,
> Ryo Takakura
> 
> ---
>  include/linux/panic.h  |  9 +++++++++
>  include/linux/printk.h |  1 +
>  kernel/panic.c         | 10 +---------
>  kernel/printk/printk.c | 26 ++++++++++++++++++++++++--
>  4 files changed, 35 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/panic.h b/include/linux/panic.h
> index 6717b15e798c..70dc17284785 100644
> --- a/include/linux/panic.h
> +++ b/include/linux/panic.h
> @@ -16,6 +16,15 @@ extern void oops_enter(void);
>  extern void oops_exit(void);
>  extern bool oops_may_print(void);
>  
> +#define PANIC_PRINT_TASK_INFO		0x00000001
> +#define PANIC_PRINT_MEM_INFO		0x00000002
> +#define PANIC_PRINT_TIMER_INFO		0x00000004
> +#define PANIC_PRINT_LOCK_INFO		0x00000008
> +#define PANIC_PRINT_FTRACE_INFO		0x00000010
> +#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
> +#define PANIC_PRINT_ALL_CPU_BT		0x00000040
> +#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
> +
>  extern int panic_timeout;
>  extern unsigned long panic_print;
>  extern int panic_on_oops;
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 40afab23881a..cbb4c1ec1fca 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -78,6 +78,7 @@ extern char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE];
>  struct ctl_table;
>  
>  extern int suppress_printk;
> +extern int suppress_panic_printk;
>  
>  struct va_format {
>  	const char *fmt;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 8bff183d6180..8c23cd1f876e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -66,14 +66,6 @@ static unsigned int warn_limit __read_mostly;
>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>  EXPORT_SYMBOL_GPL(panic_timeout);
>  
> -#define PANIC_PRINT_TASK_INFO		0x00000001
> -#define PANIC_PRINT_MEM_INFO		0x00000002
> -#define PANIC_PRINT_TIMER_INFO		0x00000004
> -#define PANIC_PRINT_LOCK_INFO		0x00000008
> -#define PANIC_PRINT_FTRACE_INFO		0x00000010
> -#define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
> -#define PANIC_PRINT_ALL_CPU_BT		0x00000040
> -#define PANIC_PRINT_BLOCKED_TASKS	0x00000080
>  unsigned long panic_print;
>  
>  ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
> @@ -252,7 +244,7 @@ void check_panic_on_warn(const char *origin)
>   */
>  static void panic_other_cpus_shutdown(bool crash_kexec)
>  {
> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT && !suppress_panic_printk)
>  		trigger_all_cpu_backtrace();
>  
>  	/*
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 420fd310129d..0ac2ca1abbbf 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -459,6 +459,13 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
>  static DEFINE_MUTEX(syslog_lock);
>  
>  #ifdef CONFIG_PRINTK
> +
> +/*
> + * During panic, heavy printk by other CPUs can delay the
> + * panic and risk deadlock on console resources.
> + */
> +int __read_mostly suppress_panic_printk;
> +
>  DECLARE_WAIT_QUEUE_HEAD(log_wait);
>  /* All 3 protected by @syslog_lock. */
>  /* the next printk record to read by syslog(READ) or /proc/kmsg */
> @@ -2311,9 +2318,11 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	/*
>  	 * The messages on the panic CPU are the most important. If
>  	 * non-panic CPUs are generating any messages, they will be
> -	 * silently dropped.
> +	 * silently dropped unless CPU backtrace on panic is enabled.
>  	 */
> -	if (other_cpu_in_panic())
> +	if (other_cpu_in_panic() &&
> +	   (!(panic_print & PANIC_PRINT_ALL_CPU_BT) ||

I would do it the other way and enable printing from other CPUs only
when triggring the backtrace. We could do it because
trigger_all_cpu_backtrace() waits until all backtraces are
printed.

Something like:

diff --git a/include/linux/panic.h b/include/linux/panic.h
index 3130e0b5116b..980bacbdfcfc 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -16,6 +16,7 @@ extern void oops_enter(void);
 extern void oops_exit(void);
 extern bool oops_may_print(void);
 
+extern int panic_triggering_all_cpu_backtrace;
 extern int panic_timeout;
 extern unsigned long panic_print;
 extern int panic_on_oops;
diff --git a/kernel/panic.c b/kernel/panic.c
index f861bedc1925..7e9e97d59b1e 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -64,6 +64,8 @@ unsigned long panic_on_taint;
 bool panic_on_taint_nousertaint = false;
 static unsigned int warn_limit __read_mostly;
 
+int panic_triggering_all_cpu_backtrace;
+
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
@@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
+		/* Temporary allow printing messages on non-panic CPUs. */
+		panic_triggering_all_cpu_backtrace = true;
 		trigger_all_cpu_backtrace();
+		panic_triggering_all_cpu_backtrace = false;
+	}
 
 	/*
 	 * Note that smp_send_stop() is the usual SMP shutdown function,
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 054c0e7784fd..c22b07049c38 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * non-panic CPUs are generating any messages, they will be
 	 * silently dropped.
 	 */
-	if (other_cpu_in_panic())
+	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
 		return 0;
 
 	if (level == LOGLEVEL_SCHED) {


> +	    unlikely(suppress_panic_printk)))
>  		return 0;
>  
>  	if (level == LOGLEVEL_SCHED) {
> @@ -2816,6 +2825,8 @@ void console_prepend_dropped(struct printk_message *pmsg, unsigned long dropped)
>  bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>  			     bool is_extended, bool may_suppress)
>  {
> +	static int panic_console_dropped;
> +
>  	struct printk_buffers *pbufs = pmsg->pbufs;
>  	const size_t scratchbuf_sz = sizeof(pbufs->scratchbuf);
>  	const size_t outbuf_sz = sizeof(pbufs->outbuf);
> @@ -2843,6 +2854,17 @@ bool printk_get_next_message(struct printk_message *pmsg, u64 seq,
>  	pmsg->seq = r.info->seq;
>  	pmsg->dropped = r.info->seq - seq;
>  
> +	/*
> +	 * Check for dropped messages in panic here so that printk
> +	 * suppression can occur as early as possible if necessary.
> +	 */
> +	if (pmsg->dropped &&
> +	    panic_in_progress() &&
> +	    panic_console_dropped++ > 10) {
> +		suppress_panic_printk = 1;
> +		pr_warn_once("Too many dropped messages. Suppress messages on non-panic CPUs to prevent livelock.\n");
> +	}

I would keep it simple and omit this change. The risk of mess is lower
if we enable other CPUs only around printing the backtraces.
We could always add it back when we see problems in practice.

> +
>  	/* Skip record that has level above the console loglevel. */
>  	if (may_suppress && suppress_message_printing(r.info->level))
>  		goto out;

Best Regards,
Petr

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-26 10:46 ` Petr Mladek
@ 2024-07-26 13:56   ` John Ogness
  2024-07-29 11:46     ` takakura
  2024-07-30 12:14     ` Petr Mladek
  0 siblings, 2 replies; 10+ messages in thread
From: John Ogness @ 2024-07-26 13:56 UTC (permalink / raw)
  To: Petr Mladek, takakura
  Cc: rostedt, senozhatsky, akpm, bhe, lukas, wangkefeng.wang, ubizjak,
	feng.tang, j.granados, stephen.s.brennan, linux-kernel, nishimura,
	taka

On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> I would do it the other way and enable printing from other CPUs only
> when triggring the backtrace. We could do it because
> trigger_all_cpu_backtrace() waits until all backtraces are
> printed.
>
> Something like:
>
> diff --git a/include/linux/panic.h b/include/linux/panic.h
> index 3130e0b5116b..980bacbdfcfc 100644
> --- a/include/linux/panic.h
> +++ b/include/linux/panic.h
> @@ -16,6 +16,7 @@ extern void oops_enter(void);
>  extern void oops_exit(void);
>  extern bool oops_may_print(void);
>  
> +extern int panic_triggering_all_cpu_backtrace;
>  extern int panic_timeout;
>  extern unsigned long panic_print;
>  extern int panic_on_oops;
> diff --git a/kernel/panic.c b/kernel/panic.c
> index f861bedc1925..7e9e97d59b1e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>  bool panic_on_taint_nousertaint = false;
>  static unsigned int warn_limit __read_mostly;
>  
> +int panic_triggering_all_cpu_backtrace;
> +
>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>  EXPORT_SYMBOL_GPL(panic_timeout);
>  
> @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>   */
>  static void panic_other_cpus_shutdown(bool crash_kexec)
>  {
> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
> +		/* Temporary allow printing messages on non-panic CPUs. */
> +		panic_triggering_all_cpu_backtrace = true;
>  		trigger_all_cpu_backtrace();
> +		panic_triggering_all_cpu_backtrace = false;

Note, here we should also add

		nbcon_atomic_flush_pending();

Your suggestion allows the other CPUs to dump their backtrace into the
ringbuffer, but they are still forbidden from acquiring the nbcon
console contexts for printing. That is a necessary requirement of
nbcon_waiter_matches().

Or since the cpu_sync is held while printing the backtrace, we could
allow the non-panic CPUs to print by modifying the check in
nbcon_context_try_acquire_direct():

----- BEGIN -----
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ef6e76db0f5a..cd8724840edc 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 	struct nbcon_state new;
 
 	do {
-		if (other_cpu_in_panic())
+		if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
 			return -EPERM;
 
 		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 613644c55e54..f486d0b84a84 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -4569,6 +4569,11 @@ void __printk_cpu_sync_wait(void)
 }
 EXPORT_SYMBOL(__printk_cpu_sync_wait);
 
+bool __printk_cpu_sync_owner(void)
+{
+	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
+}
+
 /**
  * __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
  *                               spinning lock.
----- END -----

> +	}
>  
>  	/*
>  	 * Note that smp_send_stop() is the usual SMP shutdown function,
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 054c0e7784fd..c22b07049c38 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	 * non-panic CPUs are generating any messages, they will be
>  	 * silently dropped.
>  	 */
> -	if (other_cpu_in_panic())
> +	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
>  		return 0;

I wonder if it is enough to check if it is holding the cpu_sync. Then we
would not need @panic_triggering_all_cpu_backtrace.

John

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-26 13:56   ` John Ogness
@ 2024-07-29 11:46     ` takakura
  2024-07-29 12:15       ` John Ogness
  2024-07-30 12:14     ` Petr Mladek
  1 sibling, 1 reply; 10+ messages in thread
From: takakura @ 2024-07-29 11:46 UTC (permalink / raw)
  To: pmladek, john.ogness
  Cc: rostedt, senozhatsky, akpm, bhe, lukas, wangkefeng.wang, ubizjak,
	feng.tang, j.granados, stephen.s.brennan, linux-kernel, nishimura,
	taka

Hi Petr and John,

Thanks for the feeedback!
Sorry that I was not keeping up with the recent nbcon updates and I 
don't think the patch I sent was reflecting them.

On 2024-07-26, John Ogness wrote:
>On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
>> I would do it the other way and enable printing from other CPUs only
>> when triggring the backtrace. We could do it because
>> trigger_all_cpu_backtrace() waits until all backtraces are
>> printed.
>>

Yes, I think doing it other way around works better filtering out 
the non-panicked CPU's irrelevant messages as it does now.
As suggested, it also makes it simple as it removes the need for 
what I suggested (bringing back commit 13fb0f74d702 ("printk: Avoid 
livelock with heavy printk during panic")) which intention was 
to check wheather to trigger backtrace or not based on those 
irrelevant messages.

>> Something like:
>>
>> diff --git a/include/linux/panic.h b/include/linux/panic.h
>> index 3130e0b5116b..980bacbdfcfc 100644
>> --- a/include/linux/panic.h
>> +++ b/include/linux/panic.h
>> @@ -16,6 +16,7 @@ extern void oops_enter(void);
>>  extern void oops_exit(void);
>>  extern bool oops_may_print(void);
>>  
>> +extern int panic_triggering_all_cpu_backtrace;
>>  extern int panic_timeout;
>>  extern unsigned long panic_print;
>>  extern int panic_on_oops;
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index f861bedc1925..7e9e97d59b1e 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>>  bool panic_on_taint_nousertaint = false;
>>  static unsigned int warn_limit __read_mostly;
>>  
>> +int panic_triggering_all_cpu_backtrace;
>> +
>>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>>  EXPORT_SYMBOL_GPL(panic_timeout);
>>  
>> @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>>   */
>>  static void panic_other_cpus_shutdown(bool crash_kexec)
>>  {
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> +		/* Temporary allow printing messages on non-panic CPUs. */
>> +		panic_triggering_all_cpu_backtrace = true;
>>  		trigger_all_cpu_backtrace();
>> +		panic_triggering_all_cpu_backtrace = false;
>
>Note, here we should also add
>
>		nbcon_atomic_flush_pending();
>
>Your suggestion allows the other CPUs to dump their backtrace into the
>ringbuffer, but they are still forbidden from acquiring the nbcon
>console contexts for printing. That is a necessary requirement of
>nbcon_waiter_matches().

Thanks for pointing it out. I see that it can only allows to write into 
the ringbuffer and printing also needs to be taken care as well.

>Or since the cpu_sync is held while printing the backtrace, we could
>allow the non-panic CPUs to print by modifying the check in
>nbcon_context_try_acquire_direct():
>
>----- BEGIN -----
>diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>index ef6e76db0f5a..cd8724840edc 100644
>--- a/kernel/printk/nbcon.c
>+++ b/kernel/printk/nbcon.c
>@@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> 	struct nbcon_state new;
> 
> 	do {
>-		if (other_cpu_in_panic())
>+		if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
> 			return -EPERM;
> 
> 		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>index 613644c55e54..f486d0b84a84 100644
>--- a/kernel/printk/printk.c
>+++ b/kernel/printk/printk.c
>@@ -4569,6 +4569,11 @@ void __printk_cpu_sync_wait(void)
> }
> EXPORT_SYMBOL(__printk_cpu_sync_wait);
> 
>+bool __printk_cpu_sync_owner(void)
>+{
>+	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
>+}
>+
> /**
>  * __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
>  *                               spinning lock.
>----- END -----
>
>> +	}
>>  
>>  	/*
>>  	 * Note that smp_send_stop() is the usual SMP shutdown function,
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 054c0e7784fd..c22b07049c38 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>>  	 * non-panic CPUs are generating any messages, they will be
>>  	 * silently dropped.
>>  	 */
>> -	if (other_cpu_in_panic())
>> +	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
>>  		return 0;
>I wonder if it is enough to check if it is holding the cpu_sync. Then we
>would not need @panic_triggering_all_cpu_backtrace.

Taking a look at the cpu_sync, it seems CPU backtrace is the only one holding 
the cpu_sync after panic. If so, I also think that it can replace 
@panic_triggering_all_cpu_backtrace.

But I thought that we still need @panic_triggering_all_cpu_backtrace
to let non-panic CPUs writing their backtrace to ringbuffer if we were to 
use cpu_sync to pass the check within nbcon_context_try_acquire_direct().
Or can we use cpu_sync for checking wheather non-panicked CPUs can write to 
ringbuffer and let nbcon_atomic_flush_pending() do the printing something like:

----- BEGIN -----
diff --git a/kernel/panic.c b/kernel/panic.c
index 7e207092576b..eed76e3e061b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
 		trigger_all_cpu_backtrace();
+		nbcon_atomic_flush_pending();
+	}

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d0bff0b0abfd..b8132801ea07 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2354,7 +2354,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * non-panic CPUs are generating any messages, they will be
 	 * silently dropped.
 	 */
-	if (other_cpu_in_panic())
+	if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
 		return 0;
 
 	if (level == LOGLEVEL_SCHED) {
@@ -4511,6 +4511,11 @@ void __printk_cpu_sync_wait(void)
 }
 EXPORT_SYMBOL(__printk_cpu_sync_wait);
 
+bool __printk_cpu_sync_owner(void)
+{
+	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
+}
+
----- END -----


>John

Sincerely,
Ryo Takakura

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-29 11:46     ` takakura
@ 2024-07-29 12:15       ` John Ogness
  2024-07-29 12:19         ` John Ogness
  0 siblings, 1 reply; 10+ messages in thread
From: John Ogness @ 2024-07-29 12:15 UTC (permalink / raw)
  To: takakura, pmladek
  Cc: rostedt, senozhatsky, akpm, bhe, lukas, wangkefeng.wang, ubizjak,
	feng.tang, j.granados, stephen.s.brennan, linux-kernel, nishimura,
	taka

Hi Ryo,

On 2024-07-29, takakura@valinux.co.jp wrote:
> Or can we use cpu_sync for checking wheather non-panicked CPUs can
> write to ringbuffer and let nbcon_atomic_flush_pending() do the
> printing

Yes, I like this. Let the non-panic CPUs write their messages, but keep
the actual printing to the panic CPU.

Right after the call to panic_other_cpus_shutdown() we plan [0] on
calling printk_legacy_allow_panic_sync(), which will handle flushing the
legacy consoles.

> something like:
>
> ----- BEGIN -----
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 7e207092576b..eed76e3e061b 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
>   */
>  static void panic_other_cpus_shutdown(bool crash_kexec)
>  {
> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>  		trigger_all_cpu_backtrace();
> +		nbcon_atomic_flush_pending();
> +	}
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d0bff0b0abfd..b8132801ea07 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2354,7 +2354,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	 * non-panic CPUs are generating any messages, they will be
>  	 * silently dropped.
>  	 */
> -	if (other_cpu_in_panic())
> +	if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
>  		return 0;
>  
>  	if (level == LOGLEVEL_SCHED) {
> @@ -4511,6 +4511,11 @@ void __printk_cpu_sync_wait(void)
>  }
>  EXPORT_SYMBOL(__printk_cpu_sync_wait);
>  
> +bool __printk_cpu_sync_owner(void)
> +{
> +	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
> +}
> +
> ----- END -----

Note that printing on the non-panic CPUs is guaranteed for nbcon
consoles because of the other_cpu_in_panic() in
nbcon_context_try_acquire_direct() and for the legacy consoles because
the cpu_sync holder must defer printing [1].

I am curious what Petr has to say.

John

[0] https://lore.kernel.org/lkml/20240527063749.391035-26-john.ogness@linutronix.de

[1] https://lore.kernel.org/lkml/87plrcqyii.fsf@jogness.linutronix.de

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-29 12:15       ` John Ogness
@ 2024-07-29 12:19         ` John Ogness
  0 siblings, 0 replies; 10+ messages in thread
From: John Ogness @ 2024-07-29 12:19 UTC (permalink / raw)
  To: takakura, pmladek
  Cc: rostedt, senozhatsky, akpm, bhe, lukas, wangkefeng.wang, ubizjak,
	feng.tang, j.granados, stephen.s.brennan, linux-kernel, nishimura,
	taka

Sorry, missed an important negation...

On 2024-07-29, John Ogness <john.ogness@linutronix.de> wrote:
> Hi Ryo,
>
> On 2024-07-29, takakura@valinux.co.jp wrote:
>> Or can we use cpu_sync for checking wheather non-panicked CPUs can
>> write to ringbuffer and let nbcon_atomic_flush_pending() do the
>> printing
>
> Yes, I like this. Let the non-panic CPUs write their messages, but keep
> the actual printing to the panic CPU.
>
> Right after the call to panic_other_cpus_shutdown() we plan [0] on
> calling printk_legacy_allow_panic_sync(), which will handle flushing the
> legacy consoles.
>
>> something like:
>>
>> ----- BEGIN -----
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 7e207092576b..eed76e3e061b 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
>>   */
>>  static void panic_other_cpus_shutdown(bool crash_kexec)
>>  {
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>>  		trigger_all_cpu_backtrace();
>> +		nbcon_atomic_flush_pending();
>> +	}
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index d0bff0b0abfd..b8132801ea07 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2354,7 +2354,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>>  	 * non-panic CPUs are generating any messages, they will be
>>  	 * silently dropped.
>>  	 */
>> -	if (other_cpu_in_panic())
>> +	if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
>>  		return 0;
>>  
>>  	if (level == LOGLEVEL_SCHED) {
>> @@ -4511,6 +4511,11 @@ void __printk_cpu_sync_wait(void)
>>  }
>>  EXPORT_SYMBOL(__printk_cpu_sync_wait);
>>  
>> +bool __printk_cpu_sync_owner(void)
>> +{
>> +	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
>> +}
>> +
>> ----- END -----
>
> Note that printing on the non-panic CPUs is guaranteed for nbcon

Note that _preventing_ printing on the non-panic CPUs...

> consoles because of the other_cpu_in_panic() in
> nbcon_context_try_acquire_direct() and for the legacy consoles because
> the cpu_sync holder must defer printing [1].
>
> I am curious what Petr has to say.
>
> John
>
> [0] https://lore.kernel.org/lkml/20240527063749.391035-26-john.ogness@linutronix.de
>
> [1] https://lore.kernel.org/lkml/87plrcqyii.fsf@jogness.linutronix.de

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-26 13:56   ` John Ogness
  2024-07-29 11:46     ` takakura
@ 2024-07-30 12:14     ` Petr Mladek
  2024-08-01  8:27       ` takakura
  1 sibling, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2024-07-30 12:14 UTC (permalink / raw)
  To: John Ogness
  Cc: takakura, rostedt, senozhatsky, akpm, bhe, lukas, wangkefeng.wang,
	ubizjak, feng.tang, j.granados, stephen.s.brennan, linux-kernel,
	nishimura, taka

On Fri 2024-07-26 16:02:45, John Ogness wrote:
> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> > I would do it the other way and enable printing from other CPUs only
> > when triggring the backtrace. We could do it because
> > trigger_all_cpu_backtrace() waits until all backtraces are
> > printed.
> >
> > Something like:
> >
> > diff --git a/include/linux/panic.h b/include/linux/panic.h
> > index 3130e0b5116b..980bacbdfcfc 100644
> > --- a/include/linux/panic.h
> > +++ b/include/linux/panic.h
> > @@ -16,6 +16,7 @@ extern void oops_enter(void);
> >  extern void oops_exit(void);
> >  extern bool oops_may_print(void);
> >  
> > +extern int panic_triggering_all_cpu_backtrace;
> >  extern int panic_timeout;
> >  extern unsigned long panic_print;
> >  extern int panic_on_oops;
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index f861bedc1925..7e9e97d59b1e 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
> >  bool panic_on_taint_nousertaint = false;
> >  static unsigned int warn_limit __read_mostly;
> >  
> > +int panic_triggering_all_cpu_backtrace;
> > +
> >  int panic_timeout = CONFIG_PANIC_TIMEOUT;
> >  EXPORT_SYMBOL_GPL(panic_timeout);
> >  
> > @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
> >   */
> >  static void panic_other_cpus_shutdown(bool crash_kexec)
> >  {
> > -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> > +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
> > +		/* Temporary allow printing messages on non-panic CPUs. */
> > +		panic_triggering_all_cpu_backtrace = true;
> >  		trigger_all_cpu_backtrace();
> > +		panic_triggering_all_cpu_backtrace = false;
> 
> Note, here we should also add
> 
> 		nbcon_atomic_flush_pending();
> 
> Your suggestion allows the other CPUs to dump their backtrace into the
> ringbuffer, but they are still forbidden from acquiring the nbcon
> console contexts for printing. That is a necessary requirement of
> nbcon_waiter_matches().

Great catch!

I would prefer to solve this in a separate patch. This problem existed
even before the commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
to ringbuffer"). In fact, the problem existed very long time even for
the legacy consoles.

It is pity that we need to handle both consoles separately. IMHO,
we could get the same job done by calling

	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

It flushes both nbcon and legacy consoles.

> Or since the cpu_sync is held while printing the backtrace, we could
> allow the non-panic CPUs to print by modifying the check in
> nbcon_context_try_acquire_direct():
> 
> ----- BEGIN -----
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index ef6e76db0f5a..cd8724840edc 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>  	struct nbcon_state new;
>  
>  	do {
> -		if (other_cpu_in_panic())
> +		if (other_cpu_in_panic() && !__printk_cpu_sync_owner())

Interesting idea. I am not completely against it.

Well, this would be the only situation when nmi_cpu_backtrace() would
be allowed to flush the messages directly. Also it would be yet
another exception.

I would probably keep it simple and just flush the messages from
the panic-CPU (using console_flush_on_panic(CONSOLE_FLUSH_PENDING).


>  			return -EPERM;
>  
>  		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  	 * non-panic CPUs are generating any messages, they will be
> >  	 * silently dropped.
> >  	 */
> > -	if (other_cpu_in_panic())
> > +	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
> >  		return 0;
> 
> I wonder if it is enough to check if it is holding the cpu_sync. Then we
> would not need @panic_triggering_all_cpu_backtrace.

I prefer to keep panic_triggering_all_cpu_backtrace. I know, it is an
ugly long name. But it clearly defines what we want to achieve.
And it limits the exception to printing the backtraces.

The check of the cpu_owner would work now because it is used basically
only for the backtraces. But it might change anytime in the future.
cpu_owner is a "generic" lock. I guess that it will be used
in more situations in the future. Any change might break this
scenario again...

Summary:

I prefer two patches:

  1st patch would allow storing the backtraces using the variable
     panic_triggering_all_cpu_backtrace (better name appreciated).

  2nd patch would cause flushing the backtraces. And I would use
     console_flush_on_panic(CONSOLE_FLUSH_PENDING) as a variant
     which can be backported to stable kernels. It might later
     be updated by the upcoming printk rework.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-07-30 12:14     ` Petr Mladek
@ 2024-08-01  8:27       ` takakura
  2024-08-01 11:12         ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: takakura @ 2024-08-01  8:27 UTC (permalink / raw)
  To: pmladek, john.ogness
  Cc: akpm, bhe, feng.tang, j.granados, linux-kernel, lukas, nishimura,
	rostedt, senozhatsky, stephen.s.brennan, taka, takakura, ubizjak,
	wangkefeng.wang

Hi Petr and John,

On 2024-07-30, Petr Mladek <pmladek@suse.com> wrote:
>On Fri 2024-07-26 16:02:45, John Ogness wrote:
>> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
>> > I would do it the other way and enable printing from other CPUs only
>> > when triggring the backtrace. We could do it because
>> > trigger_all_cpu_backtrace() waits until all backtraces are
>> > printed.
>> >
>> > Something like:
>> >
>> > diff --git a/include/linux/panic.h b/include/linux/panic.h
>> > index 3130e0b5116b..980bacbdfcfc 100644
>> > --- a/include/linux/panic.h
>> > +++ b/include/linux/panic.h
>> > @@ -16,6 +16,7 @@ extern void oops_enter(void);
>> >  extern void oops_exit(void);
>> >  extern bool oops_may_print(void);
>> >  
>> > +extern int panic_triggering_all_cpu_backtrace;
>> >  extern int panic_timeout;
>> >  extern unsigned long panic_print;
>> >  extern int panic_on_oops;
>> > diff --git a/kernel/panic.c b/kernel/panic.c
>> > index f861bedc1925..7e9e97d59b1e 100644
>> > --- a/kernel/panic.c
>> > +++ b/kernel/panic.c
>> > @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>> >  bool panic_on_taint_nousertaint = false;
>> >  static unsigned int warn_limit __read_mostly;
>> >  
>> > +int panic_triggering_all_cpu_backtrace;
>> > +
>> >  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>> >  EXPORT_SYMBOL_GPL(panic_timeout);
>> >  
>> > @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>> >   */
>> >  static void panic_other_cpus_shutdown(bool crash_kexec)
>> >  {
>> > -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> > +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> > +		/* Temporary allow printing messages on non-panic CPUs. */
>> > +		panic_triggering_all_cpu_backtrace = true;
>> >  		trigger_all_cpu_backtrace();
>> > +		panic_triggering_all_cpu_backtrace = false;
>> 
>> Note, here we should also add
>> 
>> 		nbcon_atomic_flush_pending();
>> 
>> Your suggestion allows the other CPUs to dump their backtrace into the
>> ringbuffer, but they are still forbidden from acquiring the nbcon
>> console contexts for printing. That is a necessary requirement of
>> nbcon_waiter_matches().
>
>Great catch!
>
>I would prefer to solve this in a separate patch. This problem existed
>even before the commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
>to ringbuffer"). In fact, the problem existed very long time even for
>the legacy consoles.
>

Good point! I guess the problem existed since the commit 51a1d258e50e 
("printk: Keep non-panic-CPUs out of console lock") as it forbade the 
acquisition of console lock for non-panic cpus?

>It is pity that we need to handle both consoles separately. IMHO,
>we could get the same job done by calling
>
>	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
>It flushes both nbcon and legacy consoles.
>
>> Or since the cpu_sync is held while printing the backtrace, we could
>> allow the non-panic CPUs to print by modifying the check in
>> nbcon_context_try_acquire_direct():
>> 
>> ----- BEGIN -----
>> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>> index ef6e76db0f5a..cd8724840edc 100644
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>>  	struct nbcon_state new;
>>  
>>  	do {
>> -		if (other_cpu_in_panic())
>> +		if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
>
>Interesting idea. I am not completely against it.
>
>Well, this would be the only situation when nmi_cpu_backtrace() would
>be allowed to flush the messages directly. Also it would be yet
>another exception.
>
>I would probably keep it simple and just flush the messages from
>the panic-CPU (using console_flush_on_panic(CONSOLE_FLUSH_PENDING).
>
>
>>  			return -EPERM;
>>  
>>  		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>> >  	 * non-panic CPUs are generating any messages, they will be
>> >  	 * silently dropped.
>> >  	 */
>> > -	if (other_cpu_in_panic())
>> > +	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
>> >  		return 0;
>> 
>> I wonder if it is enough to check if it is holding the cpu_sync. Then we
>> would not need @panic_triggering_all_cpu_backtrace.
>
>I prefer to keep panic_triggering_all_cpu_backtrace. I know, it is an
>ugly long name. But it clearly defines what we want to achieve.
>And it limits the exception to printing the backtraces.
>
>The check of the cpu_owner would work now because it is used basically
>only for the backtraces. But it might change anytime in the future.
>cpu_owner is a "generic" lock. I guess that it will be used
>in more situations in the future. Any change might break this
>scenario again...
>

I agree that the checking of cpu_owner can be insufficient in the future and 
the use of panic_triggering_all_cpu_backtrace is more reliable in that sense.

>Summary:
>
>I prefer two patches:
>
>  1st patch would allow storing the backtraces using the variable
>     panic_triggering_all_cpu_backtrace (better name appreciated).
>
>  2nd patch would cause flushing the backtraces. And I would use
>     console_flush_on_panic(CONSOLE_FLUSH_PENDING) as a variant
>     which can be backported to stable kernels. It might later
>     be updated by the upcoming printk rework.
>
>Best Regards,
>Petr

Thanks! I'll prepare another patch based on it!

Sincerely,
Ryo Takakura

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-08-01  8:27       ` takakura
@ 2024-08-01 11:12         ` Petr Mladek
  2024-08-03  7:51           ` takakura
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2024-08-01 11:12 UTC (permalink / raw)
  To: takakura
  Cc: john.ogness, akpm, bhe, feng.tang, j.granados, linux-kernel,
	lukas, nishimura, rostedt, senozhatsky, stephen.s.brennan, taka,
	ubizjak, wangkefeng.wang

On Thu 2024-08-01 17:27:21, takakura@valinux.co.jp wrote:
> Hi Petr and John,
> 
> On 2024-07-30, Petr Mladek <pmladek@suse.com> wrote:
> >On Fri 2024-07-26 16:02:45, John Ogness wrote:
> >> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
> >> > I would do it the other way and enable printing from other CPUs only
> >> > when triggring the backtrace. We could do it because
> >> > trigger_all_cpu_backtrace() waits until all backtraces are
> >> > printed.
> >> >
> >> > Something like:
> >> >
> >> > diff --git a/include/linux/panic.h b/include/linux/panic.h
> >> > index 3130e0b5116b..980bacbdfcfc 100644
> >> > --- a/include/linux/panic.h
> >> > +++ b/include/linux/panic.h
> >> > @@ -16,6 +16,7 @@ extern void oops_enter(void);
> >> >  extern void oops_exit(void);
> >> >  extern bool oops_may_print(void);
> >> >  
> >> > +extern int panic_triggering_all_cpu_backtrace;
> >> >  extern int panic_timeout;
> >> >  extern unsigned long panic_print;
> >> >  extern int panic_on_oops;
> >> > diff --git a/kernel/panic.c b/kernel/panic.c
> >> > index f861bedc1925..7e9e97d59b1e 100644
> >> > --- a/kernel/panic.c
> >> > +++ b/kernel/panic.c
> >> > @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
> >> >  bool panic_on_taint_nousertaint = false;
> >> >  static unsigned int warn_limit __read_mostly;
> >> >  
> >> > +int panic_triggering_all_cpu_backtrace;
> >> > +
> >> >  int panic_timeout = CONFIG_PANIC_TIMEOUT;
> >> >  EXPORT_SYMBOL_GPL(panic_timeout);
> >> >  
> >> > @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
> >> >   */
> >> >  static void panic_other_cpus_shutdown(bool crash_kexec)
> >> >  {
> >> > -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> >> > +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
> >> > +		/* Temporary allow printing messages on non-panic CPUs. */
> >> > +		panic_triggering_all_cpu_backtrace = true;
> >> >  		trigger_all_cpu_backtrace();
> >> > +		panic_triggering_all_cpu_backtrace = false;
> >> 
> >> Note, here we should also add
> >> 
> >> 		nbcon_atomic_flush_pending();
> >> 
> >> Your suggestion allows the other CPUs to dump their backtrace into the
> >> ringbuffer, but they are still forbidden from acquiring the nbcon
> >> console contexts for printing. That is a necessary requirement of
> >> nbcon_waiter_matches().
> >
> >Great catch!
> >
> >I would prefer to solve this in a separate patch. This problem existed
> >even before the commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
> >to ringbuffer"). In fact, the problem existed very long time even for
> >the legacy consoles.
> >
> 
> Good point! I guess the problem existed since the commit 51a1d258e50e 
> ("printk: Keep non-panic-CPUs out of console lock") as it forbade the 
> acquisition of console lock for non-panic cpus?

It most likely existed since the commit 7acac3445acde1c94
("printk: always use deferred printk when flush printk_safe lines")

These were times when printk() serialized access to the log buffer
using a spin lock. The backtraces from other CPUs were stored in
temporary per-CPU buffers and later copied to the main log buffer.
The above mentioned commit caused that printk_safe_flush_line()
did not longer try to flush the messages to the console after
copying the temporary stored messages.

Well, the above commit was in Jan 2017. It was before panic
allowed to show the backtraces.

In practice, the problem with flushing bracktraces in panic has
existed since the option to print the backtraces was added by
the commit 60c958d8df9cfc40b ("panic: add sysctl to dump
all CPUs backtraces on oops event") in Jun 2020.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] printk: CPU backtrace not printing on panic
  2024-08-01 11:12         ` Petr Mladek
@ 2024-08-03  7:51           ` takakura
  0 siblings, 0 replies; 10+ messages in thread
From: takakura @ 2024-08-03  7:51 UTC (permalink / raw)
  To: pmladek
  Cc: john.ogness, rostedt, senozhatsky, akpm, bhe, lukas,
	wangkefeng.wang, ubizjak, feng.tang, j.granados,
	stephen.s.brennan, linux-kernel, nishimura, taka

Hi Petr,

On 2024-08-01, Petr Mladek <pmladek@suse.com> wrote:
>On Thu 2024-08-01 17:27:21, takakura@valinux.co.jp wrote:
>> Hi Petr and John,
>>
>> On 2024-07-30, Petr Mladek <pmladek@suse.com> wrote:
>> >On Fri 2024-07-26 16:02:45, John Ogness wrote:
>> >> On 2024-07-26, Petr Mladek <pmladek@suse.com> wrote:
>> >> > I would do it the other way and enable printing from other CPUs only
>> >> > when triggring the backtrace. We could do it because
>> >> > trigger_all_cpu_backtrace() waits until all backtraces are
>> >> > printed.
>> >> >
>> >> > Something like:
>> >> >
>> >> > diff --git a/include/linux/panic.h b/include/linux/panic.h
>> >> > index 3130e0b5116b..980bacbdfcfc 100644
>> >> > --- a/include/linux/panic.h
>> >> > +++ b/include/linux/panic.h
>> >> > @@ -16,6 +16,7 @@ extern void oops_enter(void);
>> >> >  extern void oops_exit(void);
>> >> >  extern bool oops_may_print(void);
>> >> >
>> >> > +extern int panic_triggering_all_cpu_backtrace;
>> >> >  extern int panic_timeout;
>> >> >  extern unsigned long panic_print;
>> >> >  extern int panic_on_oops;
>> >> > diff --git a/kernel/panic.c b/kernel/panic.c
>> >> > index f861bedc1925..7e9e97d59b1e 100644
>> >> > --- a/kernel/panic.c
>> >> > +++ b/kernel/panic.c
>> >> > @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>> >> >  bool panic_on_taint_nousertaint = false;
>> >> >  static unsigned int warn_limit __read_mostly;
>> >> >
>> >> > +int panic_triggering_all_cpu_backtrace;
>> >> > +
>> >> >  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>> >> >  EXPORT_SYMBOL_GPL(panic_timeout);
>> >> >
>> >> > @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>> >> >   */
>> >> >  static void panic_other_cpus_shutdown(bool crash_kexec)
>> >> >  {
>> >> > -       if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> >> > +       if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> >> > +               /* Temporary allow printing messages on non-panic CPUs. */
>> >> > +               panic_triggering_all_cpu_backtrace = true;
>> >> >                 trigger_all_cpu_backtrace();
>> >> > +               panic_triggering_all_cpu_backtrace = false;
>> >>
>> >> Note, here we should also add
>> >>
>> >>           nbcon_atomic_flush_pending();
>> >>
>> >> Your suggestion allows the other CPUs to dump their backtrace into the
>> >> ringbuffer, but they are still forbidden from acquiring the nbcon
>> >> console contexts for printing. That is a necessary requirement of
>> >> nbcon_waiter_matches().
>> >
>> >Great catch!
>> >
>> >I would prefer to solve this in a separate patch. This problem existed
>> >even before the commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing
>> >to ringbuffer"). In fact, the problem existed very long time even for
>> >the legacy consoles.
>> >
>>
>> Good point! I guess the problem existed since the commit 51a1d258e50e
>> ("printk: Keep non-panic-CPUs out of console lock") as it forbade the
>> acquisition of console lock for non-panic cpus?
>
>It most likely existed since the commit 7acac3445acde1c94
>("printk: always use deferred printk when flush printk_safe lines")
>
>These were times when printk() serialized access to the log buffer
>using a spin lock. The backtraces from other CPUs were stored in
>temporary per-CPU buffers and later copied to the main log buffer.
>The above mentioned commit caused that printk_safe_flush_line()
>did not longer try to flush the messages to the console after
>copying the temporary stored messages.
>
>Well, the above commit was in Jan 2017. It was before panic
>allowed to show the backtraces.
>
>In practice, the problem with flushing bracktraces in panic has
>existed since the option to print the backtraces was added by
>the commit 60c958d8df9cfc40b ("panic: add sysctl to dump
>all CPUs backtraces on oops event") in Jun 2020.
>
>Best Regards,
>Petr

That is quite interesting how flushing of backtraces in panic 
has been affected differently over time until now, even going back 
to days before CPU backtrace was introduced in panic.

Thanks for sharing this!

Sincerely,
Ryo Takakura

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-08-03  7:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-26  9:44 [PATCH] printk: CPU backtrace not printing on panic takakura
2024-07-26 10:46 ` Petr Mladek
2024-07-26 13:56   ` John Ogness
2024-07-29 11:46     ` takakura
2024-07-29 12:15       ` John Ogness
2024-07-29 12:19         ` John Ogness
2024-07-30 12:14     ` Petr Mladek
2024-08-01  8:27       ` takakura
2024-08-01 11:12         ` Petr Mladek
2024-08-03  7:51           ` takakura

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox