public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kgdb: Flush console before entering kgdb on panic
@ 2023-08-22 20:19 Douglas Anderson
  2023-08-25 10:08 ` Daniel Thompson
  0 siblings, 1 reply; 6+ messages in thread
From: Douglas Anderson @ 2023-08-22 20:19 UTC (permalink / raw)
  To: Daniel Thompson
  Cc: Petr Mladek, Douglas Anderson, Jason Wessel, kgdb-bugreport,
	linux-kernel

When entering kdb/kgdb on a kernel panic, it was be observed that the
console isn't flushed before the `kdb` prompt came up. Specifically,
when using the buddy lockup detector on arm64 and running:
  echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT

I could see:
  [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
  [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
  [   32.552865] Sending NMI from CPU 5 to CPUs 6:
  [   32.557359] NMI backtrace for cpu 6
  ... [backtrace for cpu 6] ...
  [   32.558353] NMI backtrace for cpu 5
  ... [backtrace for cpu 5] ...
  [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
  [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP

  Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
  [5]kdb>

As you can see, backtraces for the other CPUs start printing and get
interleaved with the kdb PANIC print.

Let's replicate the commands to flush the console in the kdb panic
entry point to avoid this.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 kernel/debug/debug_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index d5e9ccde3ab8..3a904d8697c8 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
 	if (panic_timeout)
 		return;
 
+	debug_locks_off();
+	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+
 	if (dbg_kdb_mode)
 		kdb_printf("PANIC: %s\n", msg);
 
-- 
2.42.0.rc1.204.g551eb34607-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] kgdb: Flush console before entering kgdb on panic
  2023-08-22 20:19 [PATCH] kgdb: Flush console before entering kgdb on panic Douglas Anderson
@ 2023-08-25 10:08 ` Daniel Thompson
  2023-08-25 14:18   ` Doug Anderson
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Thompson @ 2023-08-25 10:08 UTC (permalink / raw)
  To: Douglas Anderson; +Cc: Petr Mladek, Jason Wessel, kgdb-bugreport, linux-kernel

On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> When entering kdb/kgdb on a kernel panic, it was be observed that the
> console isn't flushed before the `kdb` prompt came up. Specifically,
> when using the buddy lockup detector on arm64 and running:
>   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
>
> I could see:
>   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
>   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
>   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
>   [   32.557359] NMI backtrace for cpu 6
>   ... [backtrace for cpu 6] ...
>   [   32.558353] NMI backtrace for cpu 5
>   ... [backtrace for cpu 5] ...
>   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
>   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
>
>   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
>   [5]kdb>
>
> As you can see, backtraces for the other CPUs start printing and get
> interleaved with the kdb PANIC print.
>
> Let's replicate the commands to flush the console in the kdb panic
> entry point to avoid this.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  kernel/debug/debug_core.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index d5e9ccde3ab8..3a904d8697c8 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
>  	if (panic_timeout)
>  		return;
>
> +	debug_locks_off();
> +	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> +
>  	if (dbg_kdb_mode)
>  		kdb_printf("PANIC: %s\n", msg);

I'm somewhat included to say *this* (calling kdb_printf() when not
actually in the debugger) is the cause of the problem. kdb_printf()
does some pretty horid things to the console and isn't intended to
run while the system is active.

I'd therefore be more tempted to defer the print to the b.p. trap
handler itself and make this part of kgdb_panic() look more like:

	kgdb_panic_msg = msg;
	kgdb_breakpoint();
	kgdb_panic_msg = NULL;


Daniel.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kgdb: Flush console before entering kgdb on panic
  2023-08-25 10:08 ` Daniel Thompson
@ 2023-08-25 14:18   ` Doug Anderson
  2023-08-30  9:53     ` Daniel Thompson
  0 siblings, 1 reply; 6+ messages in thread
From: Doug Anderson @ 2023-08-25 14:18 UTC (permalink / raw)
  To: Daniel Thompson; +Cc: Petr Mladek, Jason Wessel, kgdb-bugreport, linux-kernel

Hi,

On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
<daniel.thompson@linaro.org> wrote:
>
> On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > console isn't flushed before the `kdb` prompt came up. Specifically,
> > when using the buddy lockup detector on arm64 and running:
> >   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> >
> > I could see:
> >   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
> >   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> >   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
> >   [   32.557359] NMI backtrace for cpu 6
> >   ... [backtrace for cpu 6] ...
> >   [   32.558353] NMI backtrace for cpu 5
> >   ... [backtrace for cpu 5] ...
> >   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> >   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> >
> >   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> >   [5]kdb>
> >
> > As you can see, backtraces for the other CPUs start printing and get
> > interleaved with the kdb PANIC print.
> >
> > Let's replicate the commands to flush the console in the kdb panic
> > entry point to avoid this.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  kernel/debug/debug_core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index d5e9ccde3ab8..3a904d8697c8 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> >       if (panic_timeout)
> >               return;
> >
> > +     debug_locks_off();
> > +     console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > +
> >       if (dbg_kdb_mode)
> >               kdb_printf("PANIC: %s\n", msg);
>
> I'm somewhat included to say *this* (calling kdb_printf() when not
> actually in the debugger) is the cause of the problem. kdb_printf()
> does some pretty horid things to the console and isn't intended to
> run while the system is active.
>
> I'd therefore be more tempted to defer the print to the b.p. trap
> handler itself and make this part of kgdb_panic() look more like:
>
>         kgdb_panic_msg = msg;
>         kgdb_breakpoint();
>         kgdb_panic_msg = NULL;

Unfortunately I think that only solves half the problem. As a quick
test, I tried simply commenting out the "kdb_printf" line in
kgdb_panic(). While that avoids the interleaved panic message and
backtrace, it does nothing to actually get the backtraces printed out
before you end up in kdb. As an example, this is what happened when I
used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
had the "kdb_printf" in kgdb_panic() commented out:

[   72.658424] lkdtm: Performing direct entry HARDLOCKUP
[   82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
...
[   82.234801] Sending NMI from CPU 5 to CPUs 6:
[   82.239296] NMI backtrace for cpu 6
... [ stack trace for CPU 6 ] ...
[   82.240294] NMI backtrace for cpu 5
... [ stack trace for CPU 5 ] ...
[   82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
[   82.581291] NMI backtrace
Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
to Keyboard Entry
[5]kdb>

As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
show up if I use the "dmesg" command but it's a bit of a hassle to run
"dmesg" to look for any extra debug messages every time I drop in kdb.

I guess perhaps that part isn't obvious from the commit message?
Should I send a new version with an updated commit message indicating
that it's not just the jumbled text that's a problem but also the lack
of stack traces?

Thanks!

-Doug

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kgdb: Flush console before entering kgdb on panic
  2023-08-25 14:18   ` Doug Anderson
@ 2023-08-30  9:53     ` Daniel Thompson
  2023-09-20 12:01       ` Petr Mladek
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Thompson @ 2023-08-30  9:53 UTC (permalink / raw)
  To: Doug Anderson; +Cc: Petr Mladek, Jason Wessel, kgdb-bugreport, linux-kernel

On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> Hi,
>
> On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> <daniel.thompson@linaro.org> wrote:
> >
> > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > when using the buddy lockup detector on arm64 and running:
> > >   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > >
> > > I could see:
> > >   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > >   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > >   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
> > >   [   32.557359] NMI backtrace for cpu 6
> > >   ... [backtrace for cpu 6] ...
> > >   [   32.558353] NMI backtrace for cpu 5
> > >   ... [backtrace for cpu 5] ...
> > >   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > >   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > >
> > >   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > >   [5]kdb>
> > >
> > > As you can see, backtraces for the other CPUs start printing and get
> > > interleaved with the kdb PANIC print.
> > >
> > > Let's replicate the commands to flush the console in the kdb panic
> > > entry point to avoid this.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > >  kernel/debug/debug_core.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > --- a/kernel/debug/debug_core.c
> > > +++ b/kernel/debug/debug_core.c
> > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > >       if (panic_timeout)
> > >               return;
> > >
> > > +     debug_locks_off();
> > > +     console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > +
> > >       if (dbg_kdb_mode)
> > >               kdb_printf("PANIC: %s\n", msg);
> >
> > I'm somewhat included to say *this* (calling kdb_printf() when not
> > actually in the debugger) is the cause of the problem. kdb_printf()
> > does some pretty horid things to the console and isn't intended to
> > run while the system is active.
> >
> > I'd therefore be more tempted to defer the print to the b.p. trap
> > handler itself and make this part of kgdb_panic() look more like:
> >
> >         kgdb_panic_msg = msg;
> >         kgdb_breakpoint();
> >         kgdb_panic_msg = NULL;
>
> Unfortunately I think that only solves half the problem. As a quick
> test, I tried simply commenting out the "kdb_printf" line in
> kgdb_panic(). While that avoids the interleaved panic message and
> backtrace, it does nothing to actually get the backtraces printed out
> before you end up in kdb. As an example, this is what happened when I
> used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> had the "kdb_printf" in kgdb_panic() commented out:
>
> [   72.658424] lkdtm: Performing direct entry HARDLOCKUP
> [   82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> ...
> [   82.234801] Sending NMI from CPU 5 to CPUs 6:
> [   82.239296] NMI backtrace for cpu 6
> ... [ stack trace for CPU 6 ] ...
> [   82.240294] NMI backtrace for cpu 5
> ... [ stack trace for CPU 5 ] ...
> [   82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> [   82.581291] NMI backtrace
> Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> to Keyboard Entry
> [5]kdb>
>
> As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> show up if I use the "dmesg" command but it's a bit of a hassle to run
> "dmesg" to look for any extra debug messages every time I drop in kdb.
>
> I guess perhaps that part isn't obvious from the commit message?

I figured it was a risk.

In fact it's an area where my instinct to honour console messages and my
instinct to get into the kernel as soon as possible after the decision
to invoke it has been made come into conflict.

In other words does it matter that the console buffers are not flushed
before entering kgdb? However having thought about it for a little while
(and knowing the console code tends to be written to be decently robust)
I can come to the view the flushing is best.


> Should I send a new version with an updated commit message indicating
> that it's not just the jumbled text that's a problem but also the lack
> of stack traces?

No real need.

I don't really like seeing kdb_printf() being called from here but
having reviewed a bit of console code I think we can might be able
to use the new infrastructure to make kdb_printf() a slightly less
hateful ;-).


Daniel.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kgdb: Flush console before entering kgdb on panic
  2023-08-30  9:53     ` Daniel Thompson
@ 2023-09-20 12:01       ` Petr Mladek
  2023-10-16 14:00         ` Daniel Thompson
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Mladek @ 2023-09-20 12:01 UTC (permalink / raw)
  To: Daniel Thompson; +Cc: Doug Anderson, Jason Wessel, kgdb-bugreport, linux-kernel

On Wed 2023-08-30 10:53:59, Daniel Thompson wrote:
> On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> > <daniel.thompson@linaro.org> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > > when using the buddy lockup detector on arm64 and running:
> > > >   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > > >
> > > > I could see:
> > > >   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > >   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > >   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > >   [   32.557359] NMI backtrace for cpu 6
> > > >   ... [backtrace for cpu 6] ...
> > > >   [   32.558353] NMI backtrace for cpu 5
> > > >   ... [backtrace for cpu 5] ...
> > > >   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > >   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > > >
> > > >   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > >   [5]kdb>
> > > >
> > > > As you can see, backtraces for the other CPUs start printing and get
> > > > interleaved with the kdb PANIC print.
> > > >
> > > > Let's replicate the commands to flush the console in the kdb panic
> > > > entry point to avoid this.
> > > >
> > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > ---
> > > >
> > > >  kernel/debug/debug_core.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > > --- a/kernel/debug/debug_core.c
> > > > +++ b/kernel/debug/debug_core.c
> > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > >       if (panic_timeout)
> > > >               return;
> > > >
> > > > +     debug_locks_off();
> > > > +     console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > > +
> > > >       if (dbg_kdb_mode)
> > > >               kdb_printf("PANIC: %s\n", msg);
> > >
> > > I'm somewhat included to say *this* (calling kdb_printf() when not
> > > actually in the debugger) is the cause of the problem. kdb_printf()
> > > does some pretty horid things to the console and isn't intended to
> > > run while the system is active.
> > >
> > > I'd therefore be more tempted to defer the print to the b.p. trap
> > > handler itself and make this part of kgdb_panic() look more like:
> > >
> > >         kgdb_panic_msg = msg;
> > >         kgdb_breakpoint();
> > >         kgdb_panic_msg = NULL;
> >
> > Unfortunately I think that only solves half the problem. As a quick
> > test, I tried simply commenting out the "kdb_printf" line in
> > kgdb_panic(). While that avoids the interleaved panic message and
> > backtrace, it does nothing to actually get the backtraces printed out
> > before you end up in kdb. As an example, this is what happened when I
> > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> > had the "kdb_printf" in kgdb_panic() commented out:
> >
> > [   72.658424] lkdtm: Performing direct entry HARDLOCKUP
> > [   82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > ...
> > [   82.234801] Sending NMI from CPU 5 to CPUs 6:
> > [   82.239296] NMI backtrace for cpu 6
> > ... [ stack trace for CPU 6 ] ...
> > [   82.240294] NMI backtrace for cpu 5
> > ... [ stack trace for CPU 5 ] ...
> > [   82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> > [   82.581291] NMI backtrace
> > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> > to Keyboard Entry
> > [5]kdb>
> >
> > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> > show up if I use the "dmesg" command but it's a bit of a hassle to run
> > "dmesg" to look for any extra debug messages every time I drop in kdb.
> >
> > I guess perhaps that part isn't obvious from the commit message?
> 
> I figured it was a risk.
> 
> In fact it's an area where my instinct to honour console messages and my
> instinct to get into the kernel as soon as possible after the decision
> to invoke it has been made come into conflict.
> 
> In other words does it matter that the console buffers are not flushed
> before entering kgdb? However having thought about it for a little while
> (and knowing the console code tends to be written to be decently robust)
> I can come to the view the flushing is best.

Just for record. I wondered if there was bug in printk() that the
backtraces were not flushed. And I think that it is because they
are printed in NMI context where the conosle handling is deferred.

From this POV, the explicit flush makes perfect sense.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kgdb: Flush console before entering kgdb on panic
  2023-09-20 12:01       ` Petr Mladek
@ 2023-10-16 14:00         ` Daniel Thompson
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Thompson @ 2023-10-16 14:00 UTC (permalink / raw)
  To: Petr Mladek; +Cc: Doug Anderson, Jason Wessel, kgdb-bugreport, linux-kernel

On Wed, Sep 20, 2023 at 02:01:30PM +0200, Petr Mladek wrote:
> On Wed 2023-08-30 10:53:59, Daniel Thompson wrote:
> > On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> > > <daniel.thompson@linaro.org> wrote:
> > > >
> > > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > > > when using the buddy lockup detector on arm64 and running:
> > > > >   echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > > > >
> > > > > I could see:
> > > > >   [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > > >   [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > > >   [   32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > > >   [   32.557359] NMI backtrace for cpu 6
> > > > >   ... [backtrace for cpu 6] ...
> > > > >   [   32.558353] NMI backtrace for cpu 5
> > > > >   ... [backtrace for cpu 5] ...
> > > > >   [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > > >   [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > > > >
> > > > >   Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > > >   [5]kdb>
> > > > >
> > > > > As you can see, backtraces for the other CPUs start printing and get
> > > > > interleaved with the kdb PANIC print.
> > > > >
> > > > > Let's replicate the commands to flush the console in the kdb panic
> > > > > entry point to avoid this.
> > > > >
> > > > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > > > ---
> > > > >
> > > > >  kernel/debug/debug_core.c | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > > > --- a/kernel/debug/debug_core.c
> > > > > +++ b/kernel/debug/debug_core.c
> > > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > > >       if (panic_timeout)
> > > > >               return;
> > > > >
> > > > > +     debug_locks_off();
> > > > > +     console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > > > +
> > > > >       if (dbg_kdb_mode)
> > > > >               kdb_printf("PANIC: %s\n", msg);
> > > >
> > > > I'm somewhat included to say *this* (calling kdb_printf() when not
> > > > actually in the debugger) is the cause of the problem. kdb_printf()
> > > > does some pretty horid things to the console and isn't intended to
> > > > run while the system is active.
> > > >
> > > > I'd therefore be more tempted to defer the print to the b.p. trap
> > > > handler itself and make this part of kgdb_panic() look more like:
> > > >
> > > >         kgdb_panic_msg = msg;
> > > >         kgdb_breakpoint();
> > > >         kgdb_panic_msg = NULL;
> > >
> > > Unfortunately I think that only solves half the problem. As a quick
> > > test, I tried simply commenting out the "kdb_printf" line in
> > > kgdb_panic(). While that avoids the interleaved panic message and
> > > backtrace, it does nothing to actually get the backtraces printed out
> > > before you end up in kdb. As an example, this is what happened when I
> > > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> > > had the "kdb_printf" in kgdb_panic() commented out:
> > >
> > > [   72.658424] lkdtm: Performing direct entry HARDLOCKUP
> > > [   82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > ...
> > > [   82.234801] Sending NMI from CPU 5 to CPUs 6:
> > > [   82.239296] NMI backtrace for cpu 6
> > > ... [ stack trace for CPU 6 ] ...
> > > [   82.240294] NMI backtrace for cpu 5
> > > ... [ stack trace for CPU 5 ] ...
> > > [   82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > [   82.581291] NMI backtrace
> > > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> > > to Keyboard Entry
> > > [5]kdb>
> > >
> > > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> > > show up if I use the "dmesg" command but it's a bit of a hassle to run
> > > "dmesg" to look for any extra debug messages every time I drop in kdb.
> > >
> > > I guess perhaps that part isn't obvious from the commit message?
> >
> > I figured it was a risk.
> >
> > In fact it's an area where my instinct to honour console messages and my
> > instinct to get into the kernel as soon as possible after the decision
> > to invoke it has been made come into conflict.
> >
> > In other words does it matter that the console buffers are not flushed
> > before entering kgdb? However having thought about it for a little while
> > (and knowing the console code tends to be written to be decently robust)
> > I can come to the view the flushing is best.
>
> Just for record. I wondered if there was bug in printk() that the
> backtraces were not flushed. And I think that it is because they
> are printed in NMI context where the conosle handling is deferred.
>
> From this POV, the explicit flush makes perfect sense.

Agreed... and (with apologies for the length of time to get this
done):
Applied!


Thanks

Daniel.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-10-16 14:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-22 20:19 [PATCH] kgdb: Flush console before entering kgdb on panic Douglas Anderson
2023-08-25 10:08 ` Daniel Thompson
2023-08-25 14:18   ` Doug Anderson
2023-08-30  9:53     ` Daniel Thompson
2023-09-20 12:01       ` Petr Mladek
2023-10-16 14:00         ` Daniel Thompson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox