* + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree
@ 2018-12-04 7:15 akpm
2018-12-04 10:10 ` Sergey Senozhatsky
2018-12-04 10:20 ` Petr Mladek
0 siblings, 2 replies; 25+ messages in thread
From: akpm @ 2018-12-04 7:15 UTC (permalink / raw)
To: bp, feng.tang, keescook, mm-commits, pmladek, sergey.senozhatsky,
stable, tglx
The patch titled
Subject: panic: Avoid the extra noise dmesg
has been added to the -mm tree. Its filename is
panic-avoid-the-extra-noise-dmesg.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/panic-avoid-the-extra-noise-dmesg.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/panic-avoid-the-extra-noise-dmesg.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Feng Tang <feng.tang@intel.com>
Subject: panic: Avoid the extra noise dmesg
When kernel panic happens, it will first print the panic call stack,
then the ending msg like:
[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------
The above message are very useful for debugging.
But if system is configured to not reboot on panic, say the "panic_timeout"
parameter equals 0, it will likely print out many noisy message like
WARN() call stack for each and every CPU except the panic one, messages
like below:
WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:
<IRQ>
try_to_wake_up
default_wake_function
autoremove_wake_function
__wake_up_common
__wake_up_common_lock
__wake_up
wake_up_klogd_work_func
irq_work_run_list
irq_work_tick
update_process_times
tick_sched_timer
__hrtimer_run_queues
hrtimer_interrupt
smp_apic_timer_interrupt
apic_timer_interrupt
For people working in console mode, the screen will first show the panic
call stack, but immediately overridded by these noisy extra messages, which
makes debugging much more difficult, as the original context gets lost on
screen.
Also these noisy messages will confuse some users, as I have seen many bug
reporters posted the noisy message into bugzilla, instead of the real panic
call stack and context.
Removing the "local_irq_enable" will avoid the noisy message.
The justification for the removing is: when code runs to this point, it
means user has chosed to not reboot, or do any special handling by using
the panic notifier method, no much point in re-enabling the interrupt.
Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
--- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg
+++ a/kernel/panic.c
@@ -322,7 +322,6 @@ void panic(const char *fmt, ...)
}
#endif
pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
- local_irq_enable();
for (i = 0; ; i += PANIC_TIMER_STEP) {
touch_softlockup_watchdog();
if (i >= i_next) {
_
Patches currently in -mm which might be from feng.tang@intel.com are
panic-add-options-to-print-system-info-when-panic-happens.patch
kernel-sysctl-add-panic_print-into-sysctl.patch
panic-avoid-the-extra-noise-dmesg.patch
^ permalink raw reply [flat|nested] 25+ messages in thread* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-04 7:15 + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree akpm @ 2018-12-04 10:10 ` Sergey Senozhatsky 2018-12-04 10:20 ` Petr Mladek 1 sibling, 0 replies; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-04 10:10 UTC (permalink / raw) To: Feng Tang Cc: bp, feng.tang, keescook, mm-commits, pmladek, sergey.senozhatsky, stable, tglx, akpm, peterz, rostedt On (12/03/18 23:15), akpm@linux-foundation.org wrote: [..] > Feng Tang wrote: > [..] > > For people working in console mode, the screen will first show the panic > call stack, but immediately overridded by these noisy extra messages, which > makes debugging much more difficult, as the original context gets lost on > screen. > > Also these noisy messages will confuse some users, as I have seen many bug > reporters posted the noisy message into bugzilla, instead of the real panic > call stack and context. > > Removing the "local_irq_enable" will avoid the noisy message. > > The justification for the removing is: when code runs to this point, it > means user has chosed to not reboot, or do any special handling by using > the panic notifier method, no much point in re-enabling the interrupt. [..] > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) > } > #endif > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > - local_irq_enable(); > for (i = 0; ; i += PANIC_TIMER_STEP) { > touch_softlockup_watchdog(); > if (i >= i_next) { Hmm, looking at 2.4 kernel: --- panic() --- ... sti(); for(;;) { #if defined(CONFIG_X86) && defined(CONFIG_VT) extern void panic_blink(void); panic_blink(); #endif CHECK_EMERGENCY_SYNC } ---------------- CHECK_EMERGENCY_SYNC is #define CHECK_EMERGENCY_SYNC \ if (emergency_sync_scheduled) \ do_emergency_sync(); And emergency_sync_scheduled is set by sysrq. I wonder if this is still the case - sysrq over serial, for instance, we want local irqs for that. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-04 7:15 + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree akpm 2018-12-04 10:10 ` Sergey Senozhatsky @ 2018-12-04 10:20 ` Petr Mladek 2018-12-04 15:49 ` Feng Tang 1 sibling, 1 reply; 25+ messages in thread From: Petr Mladek @ 2018-12-04 10:20 UTC (permalink / raw) To: akpm Cc: bp, feng.tang, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra On Mon 2018-12-03 23:15:31, Andrew Morton wrote: > > The patch titled > Subject: panic: Avoid the extra noise dmesg > has been added to the -mm tree. Its filename is > panic-avoid-the-extra-noise-dmesg.patch > > This patch should soon appear at > http://ozlabs.org/~akpm/mmots/broken-out/panic-avoid-the-extra-noise-dmesg.patch > and later at > http://ozlabs.org/~akpm/mmotm/broken-out/panic-avoid-the-extra-noise-dmesg.patch > > Before you just go and hit "reply", please: > a) Consider who else should be cc'ed > b) Prefer to cc a suitable mailing list as well > c) Ideally: find the original patch on the mailing list and do a > reply-to-all to that, adding suitable additional cc's > > *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** > > The -mm tree is included into linux-next and is updated > there every 3-4 working days > > ------------------------------------------------------ > From: Feng Tang <feng.tang@intel.com> > Subject: panic: Avoid the extra noise dmesg > > When kernel panic happens, it will first print the panic call stack, > then the ending msg like: > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > [ 35.749975] ------------[ cut here ]------------ > > The above message are very useful for debugging. > > But if system is configured to not reboot on panic, say the "panic_timeout" > parameter equals 0, it will likely print out many noisy message like > WARN() call stack for each and every CPU except the panic one, messages > like below: > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 > Call Trace: > <IRQ> > try_to_wake_up > default_wake_function > autoremove_wake_function > __wake_up_common > __wake_up_common_lock > __wake_up > wake_up_klogd_work_func > irq_work_run_list > irq_work_tick > update_process_times > tick_sched_timer > __hrtimer_run_queues > hrtimer_interrupt > smp_apic_timer_interrupt > apic_timer_interrupt I guess that it is a warning about migrating tasks to an offline CPU. > For people working in console mode, the screen will first show the panic > call stack, but immediately overridded by these noisy extra messages, which > makes debugging much more difficult, as the original context gets lost on > screen. > > Also these noisy messages will confuse some users, as I have seen many bug > reporters posted the noisy message into bugzilla, instead of the real panic > call stack and context. > > Removing the "local_irq_enable" will avoid the noisy message. > > The justification for the removing is: when code runs to this point, it > means user has chosed to not reboot, or do any special handling by using > the panic notifier method, no much point in re-enabling the interrupt. > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com > Signed-off-by: Feng Tang <feng.tang@intel.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Kees Cook <keescook@chromium.org> > Cc: Borislav Petkov <bp@suse.de> > Cc: Petr Mladek <pmladek@suse.com> > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > Cc: <stable@vger.kernel.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg > +++ a/kernel/panic.c > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) > } > #endif > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > - local_irq_enable(); > for (i = 0; ; i += PANIC_TIMER_STEP) { > touch_softlockup_watchdog(); > if (i >= i_next) { Hmm, this calls panic_blink(). It seems that it depends on workqueues and the scheduler: + led_panic_blink() + led_trigger_event() + led_set_brightness() + schedule_work(set_brightness_work) I guess that blinking might be important in some situations and on some devices. On the other hand, we are interested only into blinking from this point on. The easiest solution seems to be to make a noop from printk(). For example, we could add a global flag: int panic_blinking; and add the following into vprintk_func() /* * Do not push away real panic() message by warnings from led * blinking code. */ if (panic_blinking) return 0; How does that sound? Best Regards, Petr ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-04 10:20 ` Petr Mladek @ 2018-12-04 15:49 ` Feng Tang 2018-12-04 16:01 ` Petr Mladek 2018-12-05 2:26 ` Sergey Senozhatsky 0 siblings, 2 replies; 25+ messages in thread From: Feng Tang @ 2018-12-04 15:49 UTC (permalink / raw) To: Petr Mladek Cc: akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin + Sasha Thanks Petr and Sergey for the reviews. On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote: > On Mon 2018-12-03 23:15:31, Andrew Morton wrote: > > > > ------------------------------------------------------ > > From: Feng Tang <feng.tang@intel.com> > > Subject: panic: Avoid the extra noise dmesg > > > > When kernel panic happens, it will first print the panic call stack, > > then the ending msg like: > > > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > > [ 35.749975] ------------[ cut here ]------------ > > > > The above message are very useful for debugging. > > > > But if system is configured to not reboot on panic, say the "panic_timeout" > > parameter equals 0, it will likely print out many noisy message like > > WARN() call stack for each and every CPU except the panic one, messages > > like below: > > > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 > > Call Trace: > > <IRQ> > > try_to_wake_up > > default_wake_function > > autoremove_wake_function > > __wake_up_common > > __wake_up_common_lock > > __wake_up > > wake_up_klogd_work_func > > irq_work_run_list > > irq_work_tick > > update_process_times > > tick_sched_timer > > __hrtimer_run_queues > > hrtimer_interrupt > > smp_apic_timer_interrupt > > apic_timer_interrupt > > I guess that it is a warning about migrating tasks to an offline CPU. My v1 patch was trying to add some hacky code into architecture code to address several WARN()s directly, but that turns out to be very hacky and involve much code for many archs. > > For people working in console mode, the screen will first show the panic > > call stack, but immediately overridded by these noisy extra messages, which > > makes debugging much more difficult, as the original context gets lost on > > screen. > > > > Also these noisy messages will confuse some users, as I have seen many bug > > reporters posted the noisy message into bugzilla, instead of the real panic > > call stack and context. > > > > Removing the "local_irq_enable" will avoid the noisy message. > > > > The justification for the removing is: when code runs to this point, it > > means user has chosed to not reboot, or do any special handling by using > > the panic notifier method, no much point in re-enabling the interrupt. > > > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Borislav Petkov <bp@suse.de> > > Cc: Petr Mladek <pmladek@suse.com> > > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > > Cc: <stable@vger.kernel.org> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > --- > > > > > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg > > +++ a/kernel/panic.c > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) > > } > > #endif > > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > > - local_irq_enable(); > > for (i = 0; ; i += PANIC_TIMER_STEP) { > > touch_softlockup_watchdog(); > > if (i >= i_next) { > > Hmm, this calls panic_blink(). It seems that it depends on workqueues > and the scheduler: > > + led_panic_blink() > + led_trigger_event() > + led_set_brightness() > + schedule_work(set_brightness_work) > > I guess that blinking might be important in some situations and > on some devices. On the other hand, we are interested only into > blinking from this point on. > > The easiest solution seems to be to make a noop from printk(). > For example, we could add a global flag: > > int panic_blinking; > > and add the following into vprintk_func() > > /* > * Do not push away real panic() message by warnings from led > * blinking code. > */ > if (panic_blinking) > return 0; > > How does that sound? This should be able to achieve the same goal. One thing I can think of is what mentioned by Sergey that some sysrq handler may want to print out something, but it should mostly be covered by 2 other panic debug print patches, which will print out task/mem/timer/lock/ftrace info runtime on demand. Thanks, Feng ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-04 15:49 ` Feng Tang @ 2018-12-04 16:01 ` Petr Mladek 2018-12-05 1:53 ` Feng Tang 2018-12-05 2:26 ` Sergey Senozhatsky 1 sibling, 1 reply; 25+ messages in thread From: Petr Mladek @ 2018-12-04 16:01 UTC (permalink / raw) To: Feng Tang Cc: akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin On Tue 2018-12-04 23:49:36, Feng Tang wrote: > + Sasha > > Thanks Petr and Sergey for the reviews. > > > On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote: > > On Mon 2018-12-03 23:15:31, Andrew Morton wrote: > > > > > > ------------------------------------------------------ > > > From: Feng Tang <feng.tang@intel.com> > > > Subject: panic: Avoid the extra noise dmesg > > > > > > When kernel panic happens, it will first print the panic call stack, > > > then the ending msg like: > > > > > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > > > [ 35.749975] ------------[ cut here ]------------ > > > > > > The above message are very useful for debugging. > > > > > > But if system is configured to not reboot on panic, say the "panic_timeout" > > > parameter equals 0, it will likely print out many noisy message like > > > WARN() call stack for each and every CPU except the panic one, messages > > > like below: > > > > > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 > > > Call Trace: > > > <IRQ> > > > try_to_wake_up > > > default_wake_function > > > autoremove_wake_function > > > __wake_up_common > > > __wake_up_common_lock > > > __wake_up > > > wake_up_klogd_work_func > > > irq_work_run_list > > > irq_work_tick > > > update_process_times > > > tick_sched_timer > > > __hrtimer_run_queues > > > hrtimer_interrupt > > > smp_apic_timer_interrupt > > > apic_timer_interrupt > > > > I guess that it is a warning about migrating tasks to an offline CPU. > > My v1 patch was trying to add some hacky code into architecture code to > address several WARN()s directly, but that turns out to be very hacky and > involve much code for many archs. > > > > For people working in console mode, the screen will first show the panic > > > call stack, but immediately overridded by these noisy extra messages, which > > > makes debugging much more difficult, as the original context gets lost on > > > screen. > > > > > > Also these noisy messages will confuse some users, as I have seen many bug > > > reporters posted the noisy message into bugzilla, instead of the real panic > > > call stack and context. > > > > > > Removing the "local_irq_enable" will avoid the noisy message. > > > > > > The justification for the removing is: when code runs to this point, it > > > means user has chosed to not reboot, or do any special handling by using > > > the panic notifier method, no much point in re-enabling the interrupt. > > > > > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > Cc: Kees Cook <keescook@chromium.org> > > > Cc: Borislav Petkov <bp@suse.de> > > > Cc: Petr Mladek <pmladek@suse.com> > > > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > > > Cc: <stable@vger.kernel.org> > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > > --- > > > > > > > > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg > > > +++ a/kernel/panic.c > > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) > > > } > > > #endif > > > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > > > - local_irq_enable(); > > > for (i = 0; ; i += PANIC_TIMER_STEP) { > > > touch_softlockup_watchdog(); > > > if (i >= i_next) { > > > > Hmm, this calls panic_blink(). It seems that it depends on workqueues > > and the scheduler: > > > > + led_panic_blink() > > + led_trigger_event() > > + led_set_brightness() > > + schedule_work(set_brightness_work) > > > > I guess that blinking might be important in some situations and > > on some devices. On the other hand, we are interested only into > > blinking from this point on. > > > > The easiest solution seems to be to make a noop from printk(). > > For example, we could add a global flag: > > > > int panic_blinking; > > > > and add the following into vprintk_func() > > > > /* > > * Do not push away real panic() message by warnings from led > > * blinking code. > > */ > > if (panic_blinking) > > return 0; > > > > How does that sound? > > This should be able to achieve the same goal. > > One thing I can think of is what mentioned by Sergey that some sysrq > handler may want to print out something, but it should mostly be > covered by 2 other panic debug print patches, which will print out > task/mem/timer/lock/ftrace info runtime on demand. I think that we could simply clear panic_blinking from __handle_sysrq(). The user will still be able to capture the screen before touching the keyboard. But it will keep the things simple. I hope that we did not miss anything else. Anyway, the approach with making printk a nop still looks like the best maintainable solution to me. Best Regards, Petr ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-04 16:01 ` Petr Mladek @ 2018-12-05 1:53 ` Feng Tang 2018-12-05 2:50 ` Sergey Senozhatsky 0 siblings, 1 reply; 25+ messages in thread From: Feng Tang @ 2018-12-05 1:53 UTC (permalink / raw) To: Petr Mladek Cc: akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin Hi Petr, On Tue, Dec 04, 2018 at 05:01:52PM +0100, Petr Mladek wrote: > On Tue 2018-12-04 23:49:36, Feng Tang wrote: > > + Sasha > > > > Thanks Petr and Sergey for the reviews. > > > > > > On Tue, Dec 04, 2018 at 11:20:33AM +0100, Petr Mladek wrote: > > > On Mon 2018-12-03 23:15:31, Andrew Morton wrote: > > > > > > > > ------------------------------------------------------ > > > > From: Feng Tang <feng.tang@intel.com> > > > > Subject: panic: Avoid the extra noise dmesg > > > > > > > > When kernel panic happens, it will first print the panic call stack, > > > > then the ending msg like: > > > > > > > > [ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception > > > > [ 35.749975] ------------[ cut here ]------------ > > > > > > > > The above message are very useful for debugging. > > > > > > > > But if system is configured to not reboot on panic, say the "panic_timeout" > > > > parameter equals 0, it will likely print out many noisy message like > > > > WARN() call stack for each and every CPU except the panic one, messages > > > > like below: > > > > > > > > WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190 > > > > Call Trace: > > > > <IRQ> > > > > try_to_wake_up > > > > default_wake_function > > > > autoremove_wake_function > > > > __wake_up_common > > > > __wake_up_common_lock > > > > __wake_up > > > > wake_up_klogd_work_func > > > > irq_work_run_list > > > > irq_work_tick > > > > update_process_times > > > > tick_sched_timer > > > > __hrtimer_run_queues > > > > hrtimer_interrupt > > > > smp_apic_timer_interrupt > > > > apic_timer_interrupt > > > > > > I guess that it is a warning about migrating tasks to an offline CPU. > > > > My v1 patch was trying to add some hacky code into architecture code to > > address several WARN()s directly, but that turns out to be very hacky and > > involve much code for many archs. > > > > > > For people working in console mode, the screen will first show the panic > > > > call stack, but immediately overridded by these noisy extra messages, which > > > > makes debugging much more difficult, as the original context gets lost on > > > > screen. > > > > > > > > Also these noisy messages will confuse some users, as I have seen many bug > > > > reporters posted the noisy message into bugzilla, instead of the real panic > > > > call stack and context. > > > > > > > > Removing the "local_irq_enable" will avoid the noisy message. > > > > > > > > The justification for the removing is: when code runs to this point, it > > > > means user has chosed to not reboot, or do any special handling by using > > > > the panic notifier method, no much point in re-enabling the interrupt. > > > > > > > > Link: http://lkml.kernel.org/r/1543902228-23834-1-git-send-email-feng.tang@intel.com > > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > > > Cc: Thomas Gleixner <tglx@linutronix.de> > > > > Cc: Kees Cook <keescook@chromium.org> > > > > Cc: Borislav Petkov <bp@suse.de> > > > > Cc: Petr Mladek <pmladek@suse.com> > > > > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> > > > > Cc: <stable@vger.kernel.org> > > > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > > > --- > > > > > > > > > > > > --- a/kernel/panic.c~panic-avoid-the-extra-noise-dmesg > > > > +++ a/kernel/panic.c > > > > @@ -322,7 +322,6 @@ void panic(const char *fmt, ...) > > > > } > > > > #endif > > > > pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); > > > > - local_irq_enable(); > > > > for (i = 0; ; i += PANIC_TIMER_STEP) { > > > > touch_softlockup_watchdog(); > > > > if (i >= i_next) { > > > > > > Hmm, this calls panic_blink(). It seems that it depends on workqueues > > > and the scheduler: > > > > > > + led_panic_blink() > > > + led_trigger_event() > > > + led_set_brightness() > > > + schedule_work(set_brightness_work) > > > > > > I guess that blinking might be important in some situations and > > > on some devices. On the other hand, we are interested only into > > > blinking from this point on. > > > > > > The easiest solution seems to be to make a noop from printk(). > > > For example, we could add a global flag: > > > > > > int panic_blinking; > > > > > > and add the following into vprintk_func() > > > > > > /* > > > * Do not push away real panic() message by warnings from led > > > * blinking code. > > > */ > > > if (panic_blinking) > > > return 0; > > > > > > How does that sound? > > > > This should be able to achieve the same goal. > > > > One thing I can think of is what mentioned by Sergey that some sysrq > > handler may want to print out something, but it should mostly be > > covered by 2 other panic debug print patches, which will print out > > task/mem/timer/lock/ftrace info runtime on demand. > > I think that we could simply clear panic_blinking from > __handle_sysrq(). The user will still be able to capture the screen > before touching the keyboard. But it will keep the things simple. > > I hope that we did not miss anything else. Anyway, the approach with > making printk a nop still looks like the best maintainable solution > to me. I will setup a platform which can handle sysrq request and try your suggestion. thanks, - Feng ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 1:53 ` Feng Tang @ 2018-12-05 2:50 ` Sergey Senozhatsky 2018-12-05 3:05 ` Sergey Senozhatsky 0 siblings, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-05 2:50 UTC (permalink / raw) To: Feng Tang Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin On (12/05/18 09:53), Feng Tang wrote: > > I think that we could simply clear panic_blinking from > > __handle_sysrq(). The user will still be able to capture the screen > > before touching the keyboard. But it will keep the things simple. > > > > I hope that we did not miss anything else. Anyway, the approach with > > making printk a nop still looks like the best maintainable solution > > to me. > > I will setup a platform which can handle sysrq request and try your > suggestion. thanks, I don't entirely understand this patch series, sorry. So you want to keep local IRQs disabled to, supposedly, have less printk-s between dump_stack() from panic CPU and "end Kernel panic" marker; yet at the same time you add *significantly* more printk-s between dump_stack() from panic CPU and "end Kernel panic" marker. panic_print_sys_info() can be very verbose, and it happens much later than dump_stack() from panic CPU. So you are guaranteed to have same problems you are trying to avoid: "the original context gets lost on screen" and "confused people post bad bug reports". Am I missing something? Dunno. Just a bunch of ideas (raw ideas). Is something like below going to work for you instead? --- @@ -327,6 +327,9 @@ void panic(const char *fmt, ...) #endif pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); local_irq_enable(); + + dump_stack(); + for (i = 0; ; i += PANIC_TIMER_STEP) { touch_softlockup_watchdog(); if (i >= i_next) { --- Or... *Maybe* you can even do a ratelimited dump_stack() from that PANIC_TIMER_STEP loop. Say, one dump_stack() every 10 minutes. The WARN_ON noise should stop at some point. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 2:50 ` Sergey Senozhatsky @ 2018-12-05 3:05 ` Sergey Senozhatsky 2018-12-05 3:27 ` Feng Tang 0 siblings, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-05 3:05 UTC (permalink / raw) To: Feng Tang Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin, Sergey Senozhatsky On (12/05/18 11:50), Sergey Senozhatsky wrote: > > panic_print_sys_info() can be very verbose, and it happens much later > than dump_stack() from panic CPU. So you are guaranteed to have same > problems you are trying to avoid: "the original context gets > lost on screen" and "confused people post bad bug reports". > And it probably would be _a bit_ better to do panic_print_sys_info() before console_flush_on_panic(). --- debug_locks_off(); - console_flush_on_panic(); - panic_print_sys_info(); + console_flush_on_panic(); --- -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 3:05 ` Sergey Senozhatsky @ 2018-12-05 3:27 ` Feng Tang 0 siblings, 0 replies; 25+ messages in thread From: Feng Tang @ 2018-12-05 3:27 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin Hi Sergey, On Wed, Dec 05, 2018 at 12:05:55PM +0900, Sergey Senozhatsky wrote: > On (12/05/18 11:50), Sergey Senozhatsky wrote: > > > > panic_print_sys_info() can be very verbose, and it happens much later > > than dump_stack() from panic CPU. So you are guaranteed to have same > > problems you are trying to avoid: "the original context gets > > lost on screen" and "confused people post bad bug reports". > > > > And it probably would be _a bit_ better to do panic_print_sys_info() > before console_flush_on_panic(). > > --- > debug_locks_off(); > - console_flush_on_panic(); > - > panic_print_sys_info(); > + console_flush_on_panic(); > --- Current situation of kernel panic print is like this: 1. local_irq_disable() 2. panic stack dump ----> Msg A 3. local_irq_enable() 4. lots of WARN() ----> Msg B Msg A is what we need, and Msg B is want we want to eliminate. https://lkml.org/lkml/2018/11/27/459 has the example log for kernel panic. My v3 patch is to remove the step 3, keep the IRQ disabled to avoid Msg B. Given the sysrq and led blinking concerns brought by you and Petr, I'm trying to test the panic_blink way while keeping the step 3 of local_irq_enable() Thanks, Feng ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-04 15:49 ` Feng Tang 2018-12-04 16:01 ` Petr Mladek @ 2018-12-05 2:26 ` Sergey Senozhatsky 2018-12-05 2:47 ` Feng Tang 1 sibling, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-05 2:26 UTC (permalink / raw) To: Feng Tang Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin On (12/04/18 23:49), Feng Tang wrote: > This should be able to achieve the same goal. > > One thing I can think of is what mentioned by Sergey that some sysrq > handler may want to print out something, but it should mostly be > covered by 2 other panic debug print patches, which will print out > task/mem/timer/lock/ftrace info runtime on demand. Well, not all sysrq handlers just printk stuff; some do sane things, like emergency sync, umount, etc. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 2:26 ` Sergey Senozhatsky @ 2018-12-05 2:47 ` Feng Tang 2018-12-05 2:57 ` Sergey Senozhatsky 0 siblings, 1 reply; 25+ messages in thread From: Feng Tang @ 2018-12-05 2:47 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin Hi Sergey, On Wed, Dec 05, 2018 at 11:26:54AM +0900, Sergey Senozhatsky wrote: > On (12/04/18 23:49), Feng Tang wrote: > > This should be able to achieve the same goal. > > > > One thing I can think of is what mentioned by Sergey that some sysrq > > handler may want to print out something, but it should mostly be > > covered by 2 other panic debug print patches, which will print out > > task/mem/timer/lock/ftrace info runtime on demand. > > Well, not all sysrq handlers just printk stuff; some do sane things, > like emergency sync, umount, etc. Yes, I understand. I said that by assuming we will keep the irq enabled and use the panic_blinking to control the flag, as suggested by Petr and you. With irq enabled, these sync, umount will be completed as usual. Btw, just FYI, I just tried the sysrq (using minicom CTL + A + F + 'magic key'), it works with system is running, but failed after I trigger a panic, I will check more though I'm not very familiar with sysrq yet. Thanks, Feng ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 2:47 ` Feng Tang @ 2018-12-05 2:57 ` Sergey Senozhatsky 2018-12-05 5:29 ` Sergey Senozhatsky 0 siblings, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-05 2:57 UTC (permalink / raw) To: Feng Tang Cc: Sergey Senozhatsky, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Peter Zijlstra, Sasha Levin On (12/05/18 10:47), Feng Tang wrote: > > Btw, just FYI, I just tried the sysrq (using minicom CTL + A + F + 'magic key'), > it works with system is running, but failed after I trigger a panic, I will > check more though I'm not very familiar with sysrq yet. > Thanks! I'm not entirely sure if people still do this. I just saw the code in linux 2.4 and assumed that this still might be the case. Well, "theoretically, why not" :) The system can panic() because of misbehaving video driver, for instance, so the emergency sync still should be possible. But maybe it's all irrelevant, I don't know. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 2:57 ` Sergey Senozhatsky @ 2018-12-05 5:29 ` Sergey Senozhatsky 2018-12-05 8:00 ` Sergey Senozhatsky 0 siblings, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-05 5:29 UTC (permalink / raw) To: Feng Tang, Peter Zijlstra Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Sergey Senozhatsky, Andi Kleen On (12/05/18 11:57), Sergey Senozhatsky wrote: > On (12/05/18 10:47), Feng Tang wrote: > > > > Btw, just FYI, I just tried the sysrq (using minicom CTL + A + F + 'magic key'), > > it works with system is running, but failed after I trigger a panic, I will > > check more though I'm not very familiar with sysrq yet. OK... So, apparently, what's happening is panic() calls smp_send_stop(). And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC. So no fun anymore. If I keep APIC enabled on panic CPU, then I have my keyboard working, including PageUp-PageDown scrolling, sysrq handling, and so on. I think I'm not the only one who'd want scrollback to work after panic (yes, fremebuffer for debugging). Andi Kleen [1] wrote (Cc-ed): : Oops/warnings are getting longer and longer, often scrolling away : from the screen, and if the kernel crashes backscroll does not work : anymore, so precious information is lost. PeterZ, for those folks who sometimes have to use framebuffer for debugging (just a trivial "let me scrollback and see the panic backtrace") and not always have access to serial console, can we have local APIC enabled on the panic_cpu? Or is it a terrible thing to ask for? [1] https://lore.kernel.org/lkml/878tcvt592.fsf@linux.intel.com/T/#u -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 5:29 ` Sergey Senozhatsky @ 2018-12-05 8:00 ` Sergey Senozhatsky 2018-12-05 15:46 ` Feng Tang 0 siblings, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-05 8:00 UTC (permalink / raw) To: Feng Tang, Peter Zijlstra Cc: Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, Sergey Senozhatsky On (12/05/18 14:29), Sergey Senozhatsky wrote: > On (12/05/18 11:57), Sergey Senozhatsky wrote: > > On (12/05/18 10:47), Feng Tang wrote: > > OK... So, apparently, what's happening is panic() calls smp_send_stop(). > And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC. > So no fun anymore. > > If I keep APIC enabled on panic CPU, then I have my keyboard working, > including PageUp-PageDown scrolling, sysrq handling, and so on. Meeh, it's much harder than this. Worked on one machine only. > PeterZ, > for those folks who sometimes have to use framebuffer for debugging > (just a trivial "let me scrollback and see the panic backtrace") and > not always have access to serial console, can we have local APIC > enabled on the panic_cpu? Or is it a terrible thing to ask for? The question remains: can we have input irqs on panic_cpu after panic? -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 8:00 ` Sergey Senozhatsky @ 2018-12-05 15:46 ` Feng Tang 2018-12-06 3:58 ` Feng Tang 0 siblings, 1 reply; 25+ messages in thread From: Feng Tang @ 2018-12-05 15:46 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen On Wed, Dec 05, 2018 at 05:00:44PM +0900, Sergey Senozhatsky wrote: > On (12/05/18 14:29), Sergey Senozhatsky wrote: > > On (12/05/18 11:57), Sergey Senozhatsky wrote: > > > On (12/05/18 10:47), Feng Tang wrote: > > > > OK... So, apparently, what's happening is panic() calls smp_send_stop(). > > And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC. > > So no fun anymore. > > > > If I keep APIC enabled on panic CPU, then I have my keyboard working, > > including PageUp-PageDown scrolling, sysrq handling, and so on. > > Meeh, it's much harder than this. Worked on one machine only. Same here, I tried on several platforms and hardly get the sysrq magic key working, though it works while system is running. And it make me wondering if those workqueue dependent led blinking code can still really work. > > > PeterZ, > > for those folks who sometimes have to use framebuffer for debugging > > (just a trivial "let me scrollback and see the panic backtrace") and > > not always have access to serial console, can we have local APIC > > enabled on the panic_cpu? Or is it a terrible thing to ask for? > > The question remains: can we have input irqs on panic_cpu after panic? With current kernel, the irq is still enabled for the panic CPU, I did see timer and some device's ISR get called, but they will only fire IRQ one time after panic triggered. Thanks, Feng ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-05 15:46 ` Feng Tang @ 2018-12-06 3:58 ` Feng Tang 2018-12-07 9:50 ` Sergey Senozhatsky 0 siblings, 1 reply; 25+ messages in thread From: Feng Tang @ 2018-12-06 3:58 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen On Wed, Dec 05, 2018 at 11:46:20PM +0800, Feng Tang wrote: > On Wed, Dec 05, 2018 at 05:00:44PM +0900, Sergey Senozhatsky wrote: > > On (12/05/18 14:29), Sergey Senozhatsky wrote: > > > On (12/05/18 11:57), Sergey Senozhatsky wrote: > > > > On (12/05/18 10:47), Feng Tang wrote: > > > > > > OK... So, apparently, what's happening is panic() calls smp_send_stop(). > > > And smp_send_stop()->native_stop_other_cpus() on x86 disables local APIC. > > > So no fun anymore. > > > > > > If I keep APIC enabled on panic CPU, then I have my keyboard working, > > > including PageUp-PageDown scrolling, sysrq handling, and so on. > > > > Meeh, it's much harder than this. Worked on one machine only. > > Same here, I tried on several platforms and hardly get the sysrq magic key > working, though it works while system is running. > > And it make me wondering if those workqueue dependent led blinking code > can still really work. Also, IMHO, if we need a panic blink method, it should better be simple and robust with only HW registers access plus delay function, as I'm not sure if the scheduling can still work. Anyway, can I propose to make the "local_irq_enable" conditional and off by default, and add a warning. @@ -295,7 +295,10 @@ void panic(const char *fmt, ...) } #endif pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf); - local_irq_enable(); + if (panic_keep_irq_on) + local_irq_enable(); + else + pr_emerg("Please add panic_keep_irq_on to cmdline, if you want blink/sysrq work"); for (i = 0; ; i += PANIC_TIMER_STEP) { touch_softlockup_watchdog(); if (i >= i_next) { Any comments? thanks! - Feng > > > > > > PeterZ, > > > for those folks who sometimes have to use framebuffer for debugging > > > (just a trivial "let me scrollback and see the panic backtrace") and > > > not always have access to serial console, can we have local APIC > > > enabled on the panic_cpu? Or is it a terrible thing to ask for? > > > > The question remains: can we have input irqs on panic_cpu after panic? > > With current kernel, the irq is still enabled for the panic CPU, I did > see timer and some device's ISR get called, but they will only fire IRQ > one time after panic triggered. > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-06 3:58 ` Feng Tang @ 2018-12-07 9:50 ` Sergey Senozhatsky 2018-12-10 9:45 ` Feng Tang 0 siblings, 1 reply; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-07 9:50 UTC (permalink / raw) To: Feng Tang Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen On (12/06/18 11:58), Feng Tang wrote: > > Same here, I tried on several platforms and hardly get the sysrq magic key > > working, though it works while system is running. > > > > And it make me wondering if those workqueue dependent led blinking code > > can still really work. > > Also, IMHO, if we need a panic blink method, it should better be simple > and robust with only HW registers access plus delay function, as I'm not > sure if the scheduling can still work. > > Anyway, can I propose to make the "local_irq_enable" conditional and off > by default, and add a warning. I'm not sure what to do about this. I think that the behaviour is platform specific. For instance, arm64 keeps secondary CPUs in a busy loop while (1) cpu_relax(); (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, if I got it right); so it seems that arm64 can handle IRQs after panic. And if there are platforms that handle IRQ (including sysrq) after panic, then both options - making printk a noop or keeping local irqs off - maybe can cause some problems. Or maybe not. We better ask arch people. Personally, on my x86 laptop, I'd prefer the srollback to work after panic. Just my 5 cents. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-07 9:50 ` Sergey Senozhatsky @ 2018-12-10 9:45 ` Feng Tang 2018-12-10 15:57 ` Petr Mladek 2018-12-11 8:00 ` Sergey Senozhatsky 0 siblings, 2 replies; 25+ messages in thread From: Feng Tang @ 2018-12-10 9:45 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel Hi Sergey, + lkml. On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote: > On (12/06/18 11:58), Feng Tang wrote: > > > Same here, I tried on several platforms and hardly get the sysrq magic key > > > working, though it works while system is running. > > > > > > And it make me wondering if those workqueue dependent led blinking code > > > can still really work. > > > > Also, IMHO, if we need a panic blink method, it should better be simple > > and robust with only HW registers access plus delay function, as I'm not > > sure if the scheduling can still work. > > > > Anyway, can I propose to make the "local_irq_enable" conditional and off > > by default, and add a warning. > > I'm not sure what to do about this. I think that the behaviour is platform > specific. For instance, arm64 keeps secondary CPUs in a busy loop > while (1) > cpu_relax(); Yes, it's similar to x86's handling for non-panic CPU. > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > if I got it right); so it seems that arm64 can handle IRQs after panic. And > if there are platforms that handle IRQ (including sysrq) after panic, then > both options - making printk a noop or keeping local irqs off - maybe can > cause some problems. Or maybe not. We better ask arch people. Yes, this is very valid concern. And after Petr and you raised it, I did some experiments with 3 x86 platforms at my hand, one Apollolake IOT device with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key all works well before panic, and fails after panic. But I did remember the PageUp/PageDown key worked on some laptop years ago. And you actually raised a good question: what do we expect for the post-panic kernel? For the v4 patch, my thought is, for experienced developers to make sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, or runtime change it by "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" while for normal user, they can by default see the clean panic call stack either on a screen or a serial console. Thanks, Feng > > Personally, on my x86 laptop, I'd prefer the srollback to work after panic. > Just my 5 cents. > > -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-10 9:45 ` Feng Tang @ 2018-12-10 15:57 ` Petr Mladek 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:00 ` Sergey Senozhatsky 1 sibling, 1 reply; 25+ messages in thread From: Petr Mladek @ 2018-12-10 15:57 UTC (permalink / raw) To: Feng Tang Cc: Sergey Senozhatsky, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On Mon 2018-12-10 17:45:54, Feng Tang wrote: > Hi Sergey, > > + lkml. > > On Fri, Dec 07, 2018 at 06:50:04PM +0900, Sergey Senozhatsky wrote: > > On (12/06/18 11:58), Feng Tang wrote: > > > > Same here, I tried on several platforms and hardly get the sysrq magic key > > > > working, though it works while system is running. > > > > > > > > And it make me wondering if those workqueue dependent led blinking code > > > > can still really work. > > > > > > Also, IMHO, if we need a panic blink method, it should better be simple > > > and robust with only HW registers access plus delay function, as I'm not > > > sure if the scheduling can still work. > > > > > > Anyway, can I propose to make the "local_irq_enable" conditional and off > > > by default, and add a warning. > > > > I'm not sure what to do about this. I think that the behaviour is platform > > specific. For instance, arm64 keeps secondary CPUs in a busy loop > > while (1) > > cpu_relax(); > > Yes, it's similar to x86's handling for non-panic CPU. > > > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > if there are platforms that handle IRQ (including sysrq) after panic, then > > both options - making printk a noop or keeping local irqs off - maybe can > > cause some problems. Or maybe not. We better ask arch people. > > Yes, this is very valid concern. And after Petr and you raised it, I did > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > all works well before panic, and fails after panic. But I did remember the > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > good question: what do we expect for the post-panic kernel? I am not sure why it does not work. But it would be nice if sysrq worked. > For the v4 patch, my thought is, for experienced developers to make > sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, > or runtime change it by > "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" > while for normal user, they can by default see the clean panic call stack > either on a screen or a serial console. I see this problematic from two reasons: First, the blinking and especially sysrq might be handy but they are disabled by default. It is too late to do anything when the system is panicing. Second, new parameters or configure options should be avoided whenever possible. They are easy to implement but the are less easy to use. The current amount of them is already overhelming. I still think that calming down printk() is acceptable when it can be restored from sysrq. I think that only few people might be interested into debugging post-panic problems. We could print a warning for them about that printk() has got disabled. Best Regards, Petr ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-10 15:57 ` Petr Mladek @ 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:22 ` Petr Mladek 2018-12-11 8:32 ` Feng Tang 0 siblings, 2 replies; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 8:07 UTC (permalink / raw) To: Petr Mladek Cc: Feng Tang, Sergey Senozhatsky, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On (12/10/18 16:57), Petr Mladek wrote: > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > > if there are platforms that handle IRQ (including sysrq) after panic, then > > > both options - making printk a noop or keeping local irqs off - maybe can > > > cause some problems. Or maybe not. We better ask arch people. > > > > Yes, this is very valid concern. And after Petr and you raised it, I did > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > > all works well before panic, and fails after panic. But I did remember the > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > > good question: what do we expect for the post-panic kernel? > > I am not sure why it does not work. But it would be nice if sysrq > worked. Absolutely. [..] > I still think that calming down printk() is acceptable when > it can be restored from sysrq. I would agree; peeking one of the two solutions, printk patch is probably preferable. > I think that only few people might be interested into debugging > post-panic problems. We could print a warning for them about > that printk() has got disabled. Dunno. This _maybe_ (speculation!) can upset folks on those platforms that have sysrq working after panic. printk is a common code. I'm probably missing a lot of things here, but just in case, I'm not sure at which point the idea of patching some files under arch/x86 directory was ruled out and why. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:07 ` Sergey Senozhatsky @ 2018-12-11 8:22 ` Petr Mladek 2018-12-11 8:26 ` Sergey Senozhatsky 2018-12-11 8:32 ` Feng Tang 1 sibling, 1 reply; 25+ messages in thread From: Petr Mladek @ 2018-12-11 8:22 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Feng Tang, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On Tue 2018-12-11 17:07:43, Sergey Senozhatsky wrote: > On (12/10/18 16:57), Petr Mladek wrote: > > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > > > if there are platforms that handle IRQ (including sysrq) after panic, then > > > > both options - making printk a noop or keeping local irqs off - maybe can > > > > cause some problems. Or maybe not. We better ask arch people. > > > > > > Yes, this is very valid concern. And after Petr and you raised it, I did > > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > > > all works well before panic, and fails after panic. But I did remember the > > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > > > good question: what do we expect for the post-panic kernel? > > > > I am not sure why it does not work. But it would be nice if sysrq > > worked. > > Absolutely. > > [..] > > I still think that calming down printk() is acceptable when > > it can be restored from sysrq. > > I would agree; peeking one of the two solutions, printk patch is > probably preferable. > > > I think that only few people might be interested into debugging > > post-panic problems. We could print a warning for them about > > that printk() has got disabled. > > Dunno. This _maybe_ (speculation!) can upset folks on those platforms > that have sysrq working after panic. printk is a common code. > > I'm probably missing a lot of things here, but just in case, I'm not > sure at which point the idea of patching some files under arch/x86 > directory was ruled out and why. I suggested to clear the panic_blinking (or whatever name) in __handle_sysrq(). The idea is that sysrq needs manual intervention. It allows to see the original message before it gets overridden by a potential sysrq-related output. It assumes that sysrq is the only interesting operation when printk() might be useful at this state. Best Regards, Petr ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:22 ` Petr Mladek @ 2018-12-11 8:26 ` Sergey Senozhatsky 0 siblings, 0 replies; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 8:26 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Feng Tang, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel On (12/11/18 09:22), Petr Mladek wrote: > I suggested to clear the panic_blinking (or whatever name) in > __handle_sysrq(). The idea is that sysrq needs manual intervention. > It allows to see the original message before it gets overridden > by a potential sysrq-related output. Ah, indeed. Sounds good. > It assumes that sysrq is the only interesting operation when > printk() might be useful at this state. Agreed. Makes sense. -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:22 ` Petr Mladek @ 2018-12-11 8:32 ` Feng Tang 2018-12-11 9:08 ` Sergey Senozhatsky 1 sibling, 1 reply; 25+ messages in thread From: Feng Tang @ 2018-12-11 8:32 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Peter Zijlstra, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel Hi Sergey, On Tue, Dec 11, 2018 at 05:07:43PM +0900, Sergey Senozhatsky wrote: > On (12/10/18 16:57), Petr Mladek wrote: > > > > (masked out) and on panic_cpu disables only SDEI (interrupts from firmware, > > > > if I got it right); so it seems that arm64 can handle IRQs after panic. And > > > > if there are platforms that handle IRQ (including sysrq) after panic, then > > > > both options - making printk a noop or keeping local irqs off - maybe can > > > > cause some problems. Or maybe not. We better ask arch people. > > > > > > Yes, this is very valid concern. And after Petr and you raised it, I did > > > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > > > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > > > all works well before panic, and fails after panic. But I did remember the > > > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > > > good question: what do we expect for the post-panic kernel? > > > > I am not sure why it does not work. But it would be nice if sysrq > > worked. > > Absolutely. > > [..] > > I still think that calming down printk() is acceptable when > > it can be restored from sysrq. > > I would agree; peeking one of the two solutions, printk patch is > probably preferable. > > > I think that only few people might be interested into debugging > > post-panic problems. We could print a warning for them about > > that printk() has got disabled. > > Dunno. This _maybe_ (speculation!) can upset folks on those platforms > that have sysrq working after panic. printk is a common code. > > I'm probably missing a lot of things here, but just in case, I'm not > sure at which point the idea of patching some files under arch/x86 > directory was ruled out and why. Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304 And actually no one ruled out the v1 patch :), I don't have HW of other archs like arm/ppc, so I just read some of the arch code, and found most of them use the similar flow like x86, that's why I chosed to finding a soluton inside panic.c itself. Thanks, Feng ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-11 8:32 ` Feng Tang @ 2018-12-11 9:08 ` Sergey Senozhatsky 0 siblings, 0 replies; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 9:08 UTC (permalink / raw) To: Feng Tang, Peter Zijlstra, Petr Mladek, Steven Rostedt, Thomas Gleixner, Frederic Weisbecker Cc: Sergey Senozhatsky, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, Sasha Levin, Andi Kleen, linux-kernel Adding Frederic, https://lkml.org/lkml/2018/10/11/304 On (12/11/18 16:32), Feng Tang wrote: > Here is the v1 patch: https://lkml.org/lkml/2018/10/11/304 > > And actually no one ruled out the v1 patch :), I don't have HW of other > archs like arm/ppc, so I just read some of the arch code, and found > most of them use the similar flow like x86, that's why I chosed to > finding a soluton inside panic.c itself. Interesting. So if the problem is that we need to clear cpu bit in several cpumaks (e.g. nohz.idle_cpus_mask) when we stop_this_cpu(), then I'd say let's clear cpumasks which are needed to be clear (doing some of the things which sched_cpu_dying() does, except that we need it on !CONFIG_HOTPLUG_CPU systems too). The idea of notifiers also looks interesting. x86 and sched gurus, can you please help? -ss ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree 2018-12-10 9:45 ` Feng Tang 2018-12-10 15:57 ` Petr Mladek @ 2018-12-11 8:00 ` Sergey Senozhatsky 1 sibling, 0 replies; 25+ messages in thread From: Sergey Senozhatsky @ 2018-12-11 8:00 UTC (permalink / raw) To: Feng Tang Cc: Sergey Senozhatsky, Peter Zijlstra, Petr Mladek, akpm, bp, keescook, mm-commits, sergey.senozhatsky, stable, tglx, Steven Rostedt, Sasha Levin, Andi Kleen, linux-kernel Hi, On (12/10/18 17:45), Feng Tang wrote: > Yes, this is very valid concern. And after Petr and you raised it, I did > some experiments with 3 x86 platforms at my hand, one Apollolake IOT device > with serial console, one IvyBridge laptop and one Kabylake NUC, the magic key > all works well before panic, and fails after panic. But I did remember the > PageUp/PageDown key worked on some laptop years ago. And you actually raised a > good question: what do we expect for the post-panic kernel? Yeah. It used to be case that people expected some things to work after panic. > For the v4 patch, my thought is, for experienced developers to make > sysrq/panic_blink work, it's easy to add "panic_keep_irq_on" to kernel cmdline, > or runtime change it by > "echo Y > /sys/module/kernel/parameters/panic_keep_irq_on" > while for normal user, they can by default see the clean panic call stack > either on a screen or a serial console. Before we move on, just a quick question, since I wasn't Cc-ed to v1 and v2 of this patch - did you have a chance to ask x86 people if they can help in any way? Asking to make sure that we are not fixing a _maybe_ x86-specific problem in arch-independent/common code. /* offtopic */ LOL, wish this was a "dumb-and-ugly-solutions" contest; I'm pretty sure I'd take the first prize with this one: --- diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 04adc8d60aed..40f643bb7fdc 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -181,6 +181,16 @@ asmlinkage __visible void smp_reboot_interrupt(void) irq_exit(); } +static void native_smp_suppress_reschedule(int cpu) +{ +} + +static void native_smp_to_up(void) +{ + WARN_ON_ONCE(num_online_cpus() > 1); + smp_ops.smp_send_reschedule = native_smp_suppress_reschedule; +} + static void native_stop_other_cpus(int wait) { unsigned long flags; @@ -250,6 +260,7 @@ static void native_stop_other_cpus(int wait) local_irq_save(flags); disable_local_APIC(); mcheck_cpu_clear(this_cpu_ptr(&cpu_info)); + native_smp_to_up(); local_irq_restore(flags); } --- If the system is not SMP anymore (hlt non-panic CPUs) - rewrite some smp_ops pointers to NOOP stubs to suppress some of those warnings. I know it's utterly awful. -ss ^ permalink raw reply related [flat|nested] 25+ messages in thread
end of thread, other threads:[~2018-12-11 9:08 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-12-04 7:15 + panic-avoid-the-extra-noise-dmesg.patch added to -mm tree akpm 2018-12-04 10:10 ` Sergey Senozhatsky 2018-12-04 10:20 ` Petr Mladek 2018-12-04 15:49 ` Feng Tang 2018-12-04 16:01 ` Petr Mladek 2018-12-05 1:53 ` Feng Tang 2018-12-05 2:50 ` Sergey Senozhatsky 2018-12-05 3:05 ` Sergey Senozhatsky 2018-12-05 3:27 ` Feng Tang 2018-12-05 2:26 ` Sergey Senozhatsky 2018-12-05 2:47 ` Feng Tang 2018-12-05 2:57 ` Sergey Senozhatsky 2018-12-05 5:29 ` Sergey Senozhatsky 2018-12-05 8:00 ` Sergey Senozhatsky 2018-12-05 15:46 ` Feng Tang 2018-12-06 3:58 ` Feng Tang 2018-12-07 9:50 ` Sergey Senozhatsky 2018-12-10 9:45 ` Feng Tang 2018-12-10 15:57 ` Petr Mladek 2018-12-11 8:07 ` Sergey Senozhatsky 2018-12-11 8:22 ` Petr Mladek 2018-12-11 8:26 ` Sergey Senozhatsky 2018-12-11 8:32 ` Feng Tang 2018-12-11 9:08 ` Sergey Senozhatsky 2018-12-11 8:00 ` Sergey Senozhatsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox