* [PATCH] printk: Fix panic log flush to serial console during kdump in PREEMPT_RT kernels
@ 2025-08-07 11:22 cuiguoqi
2025-08-19 13:31 ` Ryo Takakura
2025-08-19 13:39 ` Petr Mladek
0 siblings, 2 replies; 4+ messages in thread
From: cuiguoqi @ 2025-08-07 11:22 UTC (permalink / raw)
To: catalin.marinas, will, bigeasy, clrkwllms, rostedt
Cc: farbere, guoqi0226, cuiguoqi, tglx, pmladek, akpm, feng.tang,
joel.granados, john.ogness, namcao, takakura, sravankumarlpu,
linux-arm-kernel, linux-kernel, linux-rt-devel
When a system running a real-time (PREEMPT_RT) kernel panics and triggers kdump,
the critical log messages (e.g., panic reason, stack traces) may fail to appear
on the serial console.
When kdump cannot be used properly, serial console logs are crucial,
whether for diagnosing kdump issues or troubleshooting the underlying problem.
This issue arises due to synchronization or deferred flushing of the printk buffer
in real-time contexts, where preemptible console locks or delayed workqueues prevent
timely log output before kexec transitions to the crash kernel.
The test results are as follows:
[ T197] Kernel panic - not syncing: sysrq triggered crash
[ T197] Call trace:
[ T197] dump_backtrace+0x9c/0x120
[ T197] show_stack+0x1c/0x30
[ T197] dump_stack_lvl+0x34/0x88
[ T197] dump_stack+0x14/0x20
[ T197] panic+0x3c4/0x3f8
[ T197] sysrq_handle_crash+0x20/0x28
[ T197] __handle_sysrq+0xd4/0x1e0
[ T197] write_sysrq_trigger+0x88/0x108
[ T197] proc_reg_write+0x9c/0xf8
[ T197] vfs_write+0xf4/0x450
[ T197] ksys_write+0x70/0x100
[ T197] __arm64_sys_write+0x20/0x30
[ T197] invoke_syscall+0x48/0x110
[ T197] el0_svc_common.constprop.0+0x44/0xe8
[ T197] do_el0_svc+0x20/0x30
[ T197] el0_svc+0x24/0x88
[ T197] el0t_64_sync_handler+0xb8/0xc0
[ T197] el0t_64_sync+0x14c/0x150
[ T197] SMP: stopping secondary CPUs
[ T197] Starting crashdump kernel...
[ T197] Bye!
Signed-off-by: cuiguoqi <cuiguoqi@kylinos.cn>
---
arch/arm64/kernel/machine_kexec.c | 4 ++++
kernel/panic.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 6f121a0..66c7d90 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -24,6 +24,7 @@
#include <asm/page.h>
#include <asm/sections.h>
#include <asm/trans_pgd.h>
+#include <linux/console.h>
/**
* kexec_image_info - For debugging output.
@@ -176,6 +177,9 @@ void machine_kexec(struct kimage *kimage)
pr_info("Bye!\n");
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_kexec_crash)
+ console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+
local_daif_mask();
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 72fcbb5..e0ad0df 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -437,6 +437,8 @@ void vpanic(const char *fmt, va_list args)
*/
kgdb_panic(buf);
+ printk_legacy_allow_panic_sync();
+
/*
* If we have crashed and we have a crash kernel loaded let it handle
* everything else.
@@ -450,8 +452,6 @@ void vpanic(const char *fmt, va_list args)
panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
- printk_legacy_allow_panic_sync();
-
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
--
2.7.4
* Re: [PATCH] printk: Fix panic log flush to serial console during kdump in PREEMPT_RT kernels
2025-08-07 11:22 [PATCH] printk: Fix panic log flush to serial console during kdump in PREEMPT_RT kernels cuiguoqi
@ 2025-08-19 13:31 ` Ryo Takakura
2025-08-19 13:39 ` Petr Mladek
1 sibling, 0 replies; 4+ messages in thread
From: Ryo Takakura @ 2025-08-19 13:31 UTC (permalink / raw)
To: cuiguoqi
Cc: akpm, bigeasy, catalin.marinas, clrkwllms, farbere, feng.tang,
guoqi0226, joel.granados, john.ogness, linux-arm-kernel,
linux-kernel, linux-rt-devel, namcao, pmladek, rostedt,
sravankumarlpu, takakura, tglx, will
Hi cuiguoqi!
I'm not an expert on the subject, but hope this helps.
On Thu, 7 Aug 2025 19:22:47 +0800, cuiguoqi wrote:
>When a system running a real-time (PREEMPT_RT) kernel panics and triggers kdump,
>the critical log messages (e.g., panic reason, stack traces) may fail to appear
>on the serial console.
>
>When kdump cannot be used properly, serial console logs are crucial,
>whether for diagnosing kdump issues or troubleshooting the underlying problem.
The console not being flushed in the kexec case is expected,
as described in [0]. It's about prioritizing kexec over serial output.
>This issue arises due to synchronization or deferred flushing of the printk buffer
>in real-time contexts, where preemptible console locks or delayed workqueues prevent
>timely log output before kexec transitions to the crash kernel.
>
> /**
> * kexec_image_info - For debugging output.
>@@ -176,6 +177,9 @@ void machine_kexec(struct kimage *kimage)
>
> pr_info("Bye!\n");
>
>+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_kexec_crash)
>+ console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>+
Calling console_flush_on_panic() while trying to kexec will
reduce its chance of success.
>diff --git a/kernel/panic.c b/kernel/panic.c
>index 72fcbb5..e0ad0df 100644
>--- a/kernel/panic.c
>+++ b/kernel/panic.c
>@@ -437,6 +437,8 @@ void vpanic(const char *fmt, va_list args)
> */
> kgdb_panic(buf);
>
>+ printk_legacy_allow_panic_sync();
>+
> /*
> * If we have crashed and we have a crash kernel loaded let it handle
> * everything else.
>@@ -450,8 +452,6 @@ void vpanic(const char *fmt, va_list args)
>
> panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>
>- printk_legacy_allow_panic_sync();
>-
The ordering here should be kept, because we don't want CPUs other than
the panicking one flushing legacy consoles.
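The ordering argument can be illustrated with a small userspace model. This is only a sketch: the enum labels, arrays and helper names below are hypothetical and merely mirror the call order visible in the quoted hunks of vpanic(); they are not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the calls visible in the quoted vpanic() hunks. */
enum panic_step {
	STEP_KGDB_PANIC,
	STEP_CRASH_KEXEC,
	STEP_OTHER_CPUS_SHUTDOWN,
	STEP_LEGACY_ALLOW_SYNC,
	STEP_PANIC_NOTIFIERS,
};

#define ORDER_LEN 5

/* Call order before the patch. */
static const enum panic_step current_order[ORDER_LEN] = {
	STEP_KGDB_PANIC,
	STEP_CRASH_KEXEC,
	STEP_OTHER_CPUS_SHUTDOWN,
	STEP_LEGACY_ALLOW_SYNC,
	STEP_PANIC_NOTIFIERS,
};

/* Call order proposed by the patch. */
static const enum panic_step patched_order[ORDER_LEN] = {
	STEP_KGDB_PANIC,
	STEP_LEGACY_ALLOW_SYNC,
	STEP_CRASH_KEXEC,
	STEP_OTHER_CPUS_SHUTDOWN,
	STEP_PANIC_NOTIFIERS,
};

/* Return the position of @step in @order, or -1 if it is absent. */
static int step_index(const enum panic_step *order, size_t len,
		      enum panic_step step)
{
	for (size_t i = 0; i < len; i++) {
		if (order[i] == step)
			return (int)i;
	}
	return -1;
}

/*
 * Return 1 when legacy consoles are allowed to flush only after the
 * other CPUs have been stopped, 0 otherwise.
 */
static int legacy_sync_after_cpu_stop(const enum panic_step *order, size_t len)
{
	int stop = step_index(order, len, STEP_OTHER_CPUS_SHUTDOWN);
	int allow = step_index(order, len, STEP_LEGACY_ALLOW_SYNC);

	return stop >= 0 && allow >= 0 && stop < allow;
}

/* Self-check of the model: the pre-patch order keeps the property. */
static void check_current_order(void)
{
	assert(legacy_sync_after_cpu_stop(current_order, ORDER_LEN));
}
```

In this model the pre-patch order satisfies legacy_sync_after_cpu_stop() and the patched order does not, which is the property the comment above asks to preserve.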
Sincerely,
Ryo Takakura
[0] https://lore.kernel.org/lkml/847cagmjsx.fsf@jogness.linutronix.de/
* Re: [PATCH] printk: Fix panic log flush to serial console during kdump in PREEMPT_RT kernels
2025-08-07 11:22 [PATCH] printk: Fix panic log flush to serial console during kdump in PREEMPT_RT kernels cuiguoqi
2025-08-19 13:31 ` Ryo Takakura
@ 2025-08-19 13:39 ` Petr Mladek
2025-08-25 3:02 ` [PATCH] drivers: example: fix memory leak cuiguoqi
1 sibling, 1 reply; 4+ messages in thread
From: Petr Mladek @ 2025-08-19 13:39 UTC (permalink / raw)
To: cuiguoqi
Cc: catalin.marinas, will, bigeasy, clrkwllms, rostedt, farbere,
guoqi0226, tglx, akpm, feng.tang, joel.granados, john.ogness,
namcao, takakura, sravankumarlpu, linux-arm-kernel, linux-kernel,
linux-rt-devel
On Thu 2025-08-07 19:22:47, cuiguoqi wrote:
> When a system running a real-time (PREEMPT_RT) kernel panics and triggers kdump,
> the critical log messages (e.g., panic reason, stack traces) may fail to appear
> on the serial console.
How did you find this problem, please?
Were you investigating why a log was missing?
Or was it just by reading the code?
In other words, is this problem theoretical or did you find
it when debugging a real-life problem?
I ask because there is no ideal solution. This change might help
in one situation and make it worse in other situations.
See below.
> When kdump cannot be used properly, serial console logs are crucial,
> whether for diagnosing kdump issues or troubleshooting the underlying problem.
>
> This issue arises due to synchronization or deferred flushing of the printk buffer
> in real-time contexts, where preemptible console locks or delayed workqueues prevent
> timely log output before kexec transitions to the crash kernel.
>
> The test results are as follows:
> [ T197] Kernel panic - not syncing: sysrq triggered crash
> [ T197] Call trace:
> [ T197] dump_backtrace+0x9c/0x120
> [ T197] show_stack+0x1c/0x30
> [ T197] dump_stack_lvl+0x34/0x88
> [ T197] dump_stack+0x14/0x20
> [ T197] panic+0x3c4/0x3f8
> [ T197] sysrq_handle_crash+0x20/0x28
> [ T197] __handle_sysrq+0xd4/0x1e0
> [ T197] write_sysrq_trigger+0x88/0x108
> [ T197] proc_reg_write+0x9c/0xf8
> [ T197] vfs_write+0xf4/0x450
> [ T197] ksys_write+0x70/0x100
> [ T197] __arm64_sys_write+0x20/0x30
> [ T197] invoke_syscall+0x48/0x110
> [ T197] el0_svc_common.constprop.0+0x44/0xe8
> [ T197] do_el0_svc+0x20/0x30
> [ T197] el0_svc+0x24/0x88
> [ T197] el0t_64_sync_handler+0xb8/0xc0
> [ T197] el0t_64_sync+0x14c/0x150
> [ T197] SMP: stopping secondary CPUs
> [ T197] Starting crashdump kernel...
> [ T197] Bye!
>
> Signed-off-by: cuiguoqi <cuiguoqi@kylinos.cn>
> ---
> arch/arm64/kernel/machine_kexec.c | 4 ++++
> kernel/panic.c | 4 ++--
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
> index 6f121a0..66c7d90 100644
> --- a/arch/arm64/kernel/machine_kexec.c
> +++ b/arch/arm64/kernel/machine_kexec.c
> @@ -24,6 +24,7 @@
> #include <asm/page.h>
> #include <asm/sections.h>
> #include <asm/trans_pgd.h>
> +#include <linux/console.h>
>
> /**
> * kexec_image_info - For debugging output.
> @@ -176,6 +177,9 @@ void machine_kexec(struct kimage *kimage)
>
> pr_info("Bye!\n");
>
> + if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_kexec_crash)
> + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
IMHO, this is a bad idea. console_flush_on_panic() is supposed
to be called as the last attempt to flush the kernel messages
when there is no other chance to see them.
console_flush_on_panic() ignores console_lock() because it might
create a deadlock. This is why vpanic() calls debug_locks_off() first.
But ignoring synchronization might obviously bring other problems
and break the system in another way.
console_lock() should _not_ be ignored when we try to create
crash_dump(). It would increase the risk that the crash_dump
would fail. And crash_dump() is the preferred way to preserve
the kernel messages in this code path.
> +
> local_daif_mask();
>
> /*
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 72fcbb5..e0ad0df 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -437,6 +437,8 @@ void vpanic(const char *fmt, va_list args)
> */
> kgdb_panic(buf);
>
> + printk_legacy_allow_panic_sync();
I do not like this either.
The commit message for the commit e35a8884270bae1 ("printk:
Coordinate direct printing in panic") says that the primary
purpose is to disable flushing legacy consoles when printing
the backtrace by dump_stack(). This change looks OK from this POV.
But we wanted to delay this after __crash_kexec() and
panic_other_cpus_shutdown() because the legacy consoles
are not safe in panic(). They ignore the internal spin locks
after calling bust_spinlocks(1).
This change would increase the risk that __crash_kexec() would
fail. Also, the legacy consoles are safer after stopping
other CPUs.
IMPORTANT: The legacy consoles are blocked only when some
"nbcon" console is registered. And nbcon consoles are never
blocked. It guarantees that the messages are flushed on some
consoles even before this call.
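The gating described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the struct and function names are hypothetical, and the logic is a simplification of the rule stated in the paragraph above, not a copy of the printk code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct console_state {
	bool have_nbcon_console;      /* at least one nbcon console registered */
	bool have_legacy_console;     /* at least one legacy console registered */
	bool legacy_allow_panic_sync; /* printk_legacy_allow_panic_sync() has run */
};

/* In this model nbcon consoles are never blocked during panic. */
static bool nbcon_atomic_allowed(const struct console_state *s)
{
	return s->have_nbcon_console;
}

/*
 * Legacy consoles may print directly during panic unless an nbcon
 * console is registered and the explicit "allow" step has not run yet.
 */
static bool legacy_direct_allowed(const struct console_state *s)
{
	if (!s->have_legacy_console)
		return false;
	if (nbcon_atomic_allowed(s) && !s->legacy_allow_panic_sync)
		return false;
	return true;
}

/* Self-check of the model: with both console types, something can print. */
static void check_some_console_prints(void)
{
	struct console_state s = {
		.have_nbcon_console = true,
		.have_legacy_console = true,
		.legacy_allow_panic_sync = false,
	};

	assert(nbcon_atomic_allowed(&s) || legacy_direct_allowed(&s));
}
```

With only legacy consoles registered, legacy_direct_allowed() is true regardless of the flag, so in this model moving the "allow" call earlier changes behavior only on systems that also have an nbcon console.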
> +
> /*
> * If we have crashed and we have a crash kernel loaded let it handle
> * everything else.
> @@ -450,8 +452,6 @@ void vpanic(const char *fmt, va_list args)
>
> panic_other_cpus_shutdown(_crash_kexec_post_notifiers);
>
> - printk_legacy_allow_panic_sync();
> -
> /*
> * Run any panic handlers, including those that might need to
> * add information to the kmsg dump output.
Best Regards,
Petr
* Re: [PATCH] drivers: example: fix memory leak
2025-08-19 13:39 ` Petr Mladek
@ 2025-08-25 3:02 ` cuiguoqi
0 siblings, 0 replies; 4+ messages in thread
From: cuiguoqi @ 2025-08-25 3:02 UTC (permalink / raw)
To: pmladek
Cc: akpm, bigeasy, catalin.marinas, clrkwllms, cuiguoqi, farbere,
feng.tang, guoqi0226, joel.granados, john.ogness,
linux-arm-kernel, linux-kernel, linux-rt-devel, namcao, rostedt,
sravankumarlpu, takakura, tglx, will
From: Petr Mladek <pmladek@suse.com>
Hi Petr:
> How did you find this problem, please?
> Were you investigating why a log was missing?
> Or was it just by reading the code?
While I was developing on the Linux real-time kernel, the system crashed unexpectedly
and kdump failed to enter the second kernel to save dmesg and vmcore.
When such a panic happens, the crash context and part of the kexec transition log
are not printed, which hurts the efficiency of debugging and testing.
Motivation for the fix:
1. For RT kernels with kdump deployed, ensure that all relevant information, such as call stacks,
is printed to the serial port throughout the path from the panic to the transition into the second kernel.
This improves debugging efficiency.
2. When Kdump is not deployed, the call stack can be directly output in RT kernels as shown below:
```c
vpanic {
+	printk_legacy_allow_panic_sync();
+	debug_locks_off();
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+	nbcon_atomic_flush_unsafe();
}
```
3. Therefore, I am currently wondering whether the issue lies with `debug_locks_off();` or with
the placement of `printk_legacy_allow_panic_sync();`.
My understanding is that by the time we reach machine_kexec(kexec_crash_image);
other cores should have already been notified and shut down. Additionally,
since this is clearly an emergency situation, flushing the log buffer to the terminal
should not introduce further adverse effects.
I would greatly appreciate your insights and guidance on this matter.