* [PATCH v3 0/4] printk: reduce deadlocks during panic
@ 2022-02-01 18:57 Stephen Brennan
2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Stephen Brennan @ 2022-02-01 18:57 UTC (permalink / raw)
To: Sergey Senozhatsky, Petr Mladek; +Cc: Steven Rostedt, linux-kernel, John Ogness
When a caller writes heavily to the kernel log (e.g. writing to
/dev/kmsg in a loop) while another panics, there's currently a high
likelihood of a deadlock (see patch 2 for the full description of this
deadlock).
The principle fix is to disable the optimistic spin once panic_cpu is
set, so the panic CPU doesn't spin waiting for a halted CPU to hand over
the console_sem.
However, this exposed us to a livelock situation, where the panic CPU
holds the console_sem, and another CPU could fill up the log buffer
faster than the consoles could drain it, preventing the panic from
progressing and halting the other CPUs. To avoid this, patch 3 adds a
mechanism to suppress printk (from non-panic-CPU) during panic, if we
reach a threshold of dropped messages.
A major goal with all of these patches is to try to decrease the
likelihood that another CPU is holding the console_sem when we halt it
in panic(). This reduces the odds of needing to break locks and
potentially encountering further deadlocks with the console drivers.
To test, I use the following script, kmsg_panic.sh:
#!/bin/bash
date
# 991 chars (based on log buffer size):
chars="$(printf 'a%.0s' {1..991})"
while :; do
echo $chars > /dev/kmsg
done &
echo c > /proc/sysrq-trigger &
date
exit
I defined a hang as any time the system did not reboot to a login prompt
on the serial console within 60 seconds. Here are the statistics on
hangs using this script, before and after the patch.
before: 776 hangs / 1484 trials - 52.3%
after : 0 hangs / 15k trials - 0.0%
v3:
Some mild style changes, none of which affect testing (which has run
continuously and is now over 15 thousand trials without a hang!)
v2:
Each patch has minor updates from code reviews. I've re-done testing and
updated the above statistics. Exact changes are in each patch.
Stephen Brennan (4):
printk: Add panic_in_progress helper
printk: disable optimistic spin during panic
printk: Avoid livelock with heavy printk during panic
printk: Drop console_sem during panic
kernel/printk/printk.c | 55 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH v3 1/4] printk: Add panic_in_progress helper 2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan @ 2022-02-01 18:57 ` Stephen Brennan 2022-02-02 1:39 ` Stephen Brennan 2022-02-01 18:58 ` [PATCH v3 2/4] printk: disable optimistic spin during panic Stephen Brennan ` (2 subsequent siblings) 3 siblings, 1 reply; 11+ messages in thread From: Stephen Brennan @ 2022-02-01 18:57 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan This will be used help avoid deadlocks during panics. Although it would be better to include this in linux/panic.h, it would require that header to include linux/atomic.h as well. On some architectures, this results in a circular dependency. So instead add the helper directly to printk.c. Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Petr Mladek <pmladek@suse.com> --- Notes: v3: Move the helper into printk.c due to circular include v2: Switch from macro to static inline function kernel/printk/printk.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 57b132b658e1..b33c2861a8fc 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1762,6 +1762,11 @@ static DEFINE_RAW_SPINLOCK(console_owner_lock); static struct task_struct *console_owner; static bool console_waiter; +static bool panic_in_progress(void) +{ + return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID); +} + /** * console_lock_spinning_enable - mark beginning of code where another * thread might safely busy wait -- 2.30.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/4] printk: Add panic_in_progress helper 2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan @ 2022-02-02 1:39 ` Stephen Brennan 0 siblings, 0 replies; 11+ messages in thread From: Stephen Brennan @ 2022-02-02 1:39 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek; +Cc: Steven Rostedt, linux-kernel, John Ogness Stephen Brennan <stephen.s.brennan@oracle.com> writes: > This will be used help avoid deadlocks during panics. Although it would > be better to include this in linux/panic.h, it would require that header > to include linux/atomic.h as well. On some architectures, this results > in a circular dependency. So instead add the helper directly to > printk.c. > > Suggested-by: Petr Mladek <pmladek@suse.com> > Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> > Reviewed-by: Petr Mladek <pmladek@suse.com> > --- > > Notes: > v3: Move the helper into printk.c due to circular include > v2: Switch from macro to static inline function > > kernel/printk/printk.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 57b132b658e1..b33c2861a8fc 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -1762,6 +1762,11 @@ static DEFINE_RAW_SPINLOCK(console_owner_lock); > static struct task_struct *console_owner; > static bool console_waiter; > > +static bool panic_in_progress(void) > +{ > + return unlikely(atomic_read(&panic_cpu) != PANIC_CPU_INVALID); > +} > + Unfortunately this is defined within a (very large) #ifdef CONFIG_PRINTK section. Thanks to the kbuild bot which will have earned the reported-by tag on this patch :D v4 will be tomorrow. Sorry for the noise everyone! > /** > * console_lock_spinning_enable - mark beginning of code where another > * thread might safely busy wait > -- > 2.30.2 ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 2/4] printk: disable optimistic spin during panic 2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan 2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan @ 2022-02-01 18:58 ` Stephen Brennan 2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan 2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan 3 siblings, 0 replies; 11+ messages in thread From: Stephen Brennan @ 2022-02-01 18:58 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan A CPU executing with console lock spinning enabled might be halted during a panic. Before the panicking CPU calls console_flush_on_panic(), it may call console_trylock(), which attempts to optimistically spin, deadlocking the panic CPU: CPU 0 (panic CPU) CPU 1 ----------------- ------ printk() { vprintk_func() { vprintk_default() { vprintk_emit() { console_unlock() { console_lock_spinning_enable(); ... printing to console ... panic() { crash_smp_send_stop() { NMI -------------------> HALT } atomic_notifier_call_chain() { printk() { ... console_trylock_spinnning() { // optimistic spin infinitely This hang during panic can be induced when a kdump kernel is loaded, and crash_kexec_post_notifiers=1 is present on the kernel command line. The following script which concurrently writes to /dev/kmsg, and triggers a panic, can result in this hang: #!/bin/bash date # 991 chars (based on log buffer size): chars="$(printf 'a%.0s' {1..991})" while :; do echo $chars > /dev/kmsg done & echo c > /proc/sysrq-trigger & date exit To avoid this deadlock, ensure that console_trylock_spinning() does not allow spinning once a panic has begun. Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes") Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Petr Mladek <pmladek@suse.com> --- kernel/printk/printk.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index b33c2861a8fc..1b96166eea35 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1848,6 +1848,16 @@ static int console_trylock_spinning(void) if (console_trylock()) return 1; + /* + * It's unsafe to spin once a panic has begun. If we are the + * panic CPU, we may have already halted the owner of the + * console_sem. If we are not the panic CPU, then we should + * avoid taking console_sem, so the panic CPU has a better + * chance of cleanly acquiring it later. + */ + if (panic_in_progress()) + return 0; + printk_safe_enter_irqsave(flags); raw_spin_lock(&console_owner_lock); -- 2.30.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v3 3/4] printk: Avoid livelock with heavy printk during panic 2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan 2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan 2022-02-01 18:58 ` [PATCH v3 2/4] printk: disable optimistic spin during panic Stephen Brennan @ 2022-02-01 18:58 ` Stephen Brennan 2022-02-02 0:16 ` kernel test robot 2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan 3 siblings, 1 reply; 11+ messages in thread From: Stephen Brennan @ 2022-02-01 18:58 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan During panic(), if another CPU is writing heavily the kernel log (e.g. via /dev/kmsg), then the panic CPU may livelock writing out its messages to the console. Note when too many messages are dropped during panic and suppress further printk, except from the panic CPU. This could result in some important messages being dropped. However, messages are already being dropped, so this approach at least prevents a livelock. Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> --- Notes: v3: Use pr_warn_once, and don't break the message line v2: Add pr_warn when we suppress printk on non-panic CPU kernel/printk/printk.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 1b96166eea35..cc7bb86f7bfe 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -93,6 +93,12 @@ EXPORT_SYMBOL_GPL(console_drivers); */ int __read_mostly suppress_printk; +/* + * During panic, heavy printk by other CPUs can delay the + * panic and risk deadlock on console resources. + */ +int __read_mostly suppress_panic_printk; + #ifdef CONFIG_LOCKDEP static struct lockdep_map console_lock_dep_map = { .name = "console_lock" @@ -2233,6 +2239,10 @@ asmlinkage int vprintk_emit(int facility, int level, if (unlikely(suppress_printk)) return 0; + if (unlikely(suppress_panic_printk) && + atomic_read(&panic_cpu) != raw_smp_processor_id()) + return 0; + if (level == LOGLEVEL_SCHED) { level = LOGLEVEL_DEFAULT; in_sched = true; @@ -2618,6 +2628,7 @@ void console_unlock(void) { static char ext_text[CONSOLE_EXT_LOG_MAX]; static char text[CONSOLE_LOG_MAX]; + static int panic_console_dropped; unsigned long flags; bool do_cond_resched, retry; struct printk_info info; @@ -2672,6 +2683,10 @@ void console_unlock(void) if (console_seq != r.info->seq) { console_dropped += r.info->seq - console_seq; console_seq = r.info->seq; + if (panic_in_progress() && panic_console_dropped++ > 10) { + suppress_panic_printk = 1; + pr_warn_once("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n"); + } } if (suppress_message_printing(r.info->level)) { -- 2.30.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3 3/4] printk: Avoid livelock with heavy printk during panic 2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan @ 2022-02-02 0:16 ` kernel test robot 0 siblings, 0 replies; 11+ messages in thread From: kernel test robot @ 2022-02-02 0:16 UTC (permalink / raw) To: Stephen Brennan; +Cc: llvm, kbuild-all Hi Stephen, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.17-rc2 next-20220201] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 26291c54e111ff6ba87a164d85d4a4e134b7315c config: hexagon-buildonly-randconfig-r006-20220130 (https://download.01.org/0day-ci/archive/20220202/202202020859.21684oMh-lkp@intel.com/config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6b1e844b69f15bb7dffaf9365cd2b355d2eb7579) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/f8bc56f0cf3ba4bdd9948d9743b42927f8415345 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025 git checkout f8bc56f0cf3ba4bdd9948d9743b42927f8415345 # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/printk/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> kernel/printk/printk.c:2686:8: error: implicit declaration of function 'panic_in_progress' [-Werror,-Wimplicit-function-declaration] if (panic_in_progress() && panic_console_dropped++ > 10) { ^ 1 error generated. vim +/panic_in_progress +2686 kernel/printk/printk.c 2612 2613 /** 2614 * console_unlock - unlock the console system 2615 * 2616 * Releases the console_lock which the caller holds on the console system 2617 * and the console driver list. 2618 * 2619 * While the console_lock was held, console output may have been buffered 2620 * by printk(). If this is the case, console_unlock(); emits 2621 * the output prior to releasing the lock. 2622 * 2623 * If there is output waiting, we wake /dev/kmsg and syslog() users. 2624 * 2625 * console_unlock(); may be called from any context. 2626 */ 2627 void console_unlock(void) 2628 { 2629 static char ext_text[CONSOLE_EXT_LOG_MAX]; 2630 static char text[CONSOLE_LOG_MAX]; 2631 static int panic_console_dropped; 2632 unsigned long flags; 2633 bool do_cond_resched, retry; 2634 struct printk_info info; 2635 struct printk_record r; 2636 u64 __maybe_unused next_seq; 2637 2638 if (console_suspended) { 2639 up_console_sem(); 2640 return; 2641 } 2642 2643 prb_rec_init_rd(&r, &info, text, sizeof(text)); 2644 2645 /* 2646 * Console drivers are called with interrupts disabled, so 2647 * @console_may_schedule should be cleared before; however, we may 2648 * end up dumping a lot of lines, for example, if called from 2649 * console registration path, and should invoke cond_resched() 2650 * between lines if allowable. Not doing so can cause a very long 2651 * scheduling stall on a slow console leading to RCU stall and 2652 * softlockup warnings which exacerbate the issue with more 2653 * messages practically incapacitating the system. 2654 * 2655 * console_trylock() is not able to detect the preemptive 2656 * context reliably. Therefore the value must be stored before 2657 * and cleared after the "again" goto label. 2658 */ 2659 do_cond_resched = console_may_schedule; 2660 again: 2661 console_may_schedule = 0; 2662 2663 /* 2664 * We released the console_sem lock, so we need to recheck if 2665 * cpu is online and (if not) is there at least one CON_ANYTIME 2666 * console. 2667 */ 2668 if (!can_use_console()) { 2669 console_locked = 0; 2670 up_console_sem(); 2671 return; 2672 } 2673 2674 for (;;) { 2675 size_t ext_len = 0; 2676 int handover; 2677 size_t len; 2678 2679 skip: 2680 if (!prb_read_valid(prb, console_seq, &r)) 2681 break; 2682 2683 if (console_seq != r.info->seq) { 2684 console_dropped += r.info->seq - console_seq; 2685 console_seq = r.info->seq; > 2686 if (panic_in_progress() && panic_console_dropped++ > 10) { 2687 suppress_panic_printk = 1; 2688 pr_warn_once("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n"); 2689 } 2690 } 2691 2692 if (suppress_message_printing(r.info->level)) { 2693 /* 2694 * Skip record we have buffered and already printed 2695 * directly to the console when we received it, and 2696 * record that has level above the console loglevel. 2697 */ 2698 console_seq++; 2699 goto skip; 2700 } 2701 2702 /* Output to all consoles once old messages replayed. */ 2703 if (unlikely(exclusive_console && 2704 console_seq >= exclusive_console_stop_seq)) { 2705 exclusive_console = NULL; 2706 } 2707 2708 /* 2709 * Handle extended console text first because later 2710 * record_print_text() will modify the record buffer in-place. 2711 */ 2712 if (nr_ext_console_drivers) { 2713 ext_len = info_print_ext_header(ext_text, 2714 sizeof(ext_text), 2715 r.info); 2716 ext_len += msg_print_ext_body(ext_text + ext_len, 2717 sizeof(ext_text) - ext_len, 2718 &r.text_buf[0], 2719 r.info->text_len, 2720 &r.info->dev_info); 2721 } 2722 len = record_print_text(&r, 2723 console_msg_format & MSG_FORMAT_SYSLOG, 2724 printk_time); 2725 console_seq++; 2726 2727 /* 2728 * While actively printing out messages, if another printk() 2729 * were to occur on another CPU, it may wait for this one to 2730 * finish. This task can not be preempted if there is a 2731 * waiter waiting to take over. 2732 * 2733 * Interrupts are disabled because the hand over to a waiter 2734 * must not be interrupted until the hand over is completed 2735 * (@console_waiter is cleared). 2736 */ 2737 printk_safe_enter_irqsave(flags); 2738 console_lock_spinning_enable(); 2739 2740 stop_critical_timings(); /* don't trace print latency */ 2741 call_console_drivers(ext_text, ext_len, text, len); 2742 start_critical_timings(); 2743 2744 handover = console_lock_spinning_disable_and_check(); 2745 printk_safe_exit_irqrestore(flags); 2746 if (handover) 2747 return; 2748 2749 if (do_cond_resched) 2750 cond_resched(); 2751 } 2752 2753 /* Get consistent value of the next-to-be-used sequence number. */ 2754 next_seq = console_seq; 2755 2756 console_locked = 0; 2757 up_console_sem(); 2758 2759 /* 2760 * Someone could have filled up the buffer again, so re-check if there's 2761 * something to flush. In case we cannot trylock the console_sem again, 2762 * there's a new owner and the console_unlock() from them will do the 2763 * flush, no worries. 2764 */ 2765 retry = prb_read_valid(prb, next_seq, NULL); 2766 if (retry && console_trylock()) 2767 goto again; 2768 } 2769 EXPORT_SYMBOL(console_unlock); 2770 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 3/4] printk: Avoid livelock with heavy printk during panic @ 2022-02-02 0:16 ` kernel test robot 0 siblings, 0 replies; 11+ messages in thread From: kernel test robot @ 2022-02-02 0:16 UTC (permalink / raw) To: kbuild-all [-- Attachment #1: Type: text/plain, Size: 8011 bytes --] Hi Stephen, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v5.17-rc2 next-20220201] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 26291c54e111ff6ba87a164d85d4a4e134b7315c config: hexagon-buildonly-randconfig-r006-20220130 (https://download.01.org/0day-ci/archive/20220202/202202020859.21684oMh-lkp(a)intel.com/config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 6b1e844b69f15bb7dffaf9365cd2b355d2eb7579) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/f8bc56f0cf3ba4bdd9948d9743b42927f8415345 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Stephen-Brennan/printk-reduce-deadlocks-during-panic/20220202-030025 git checkout f8bc56f0cf3ba4bdd9948d9743b42927f8415345 # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash kernel/printk/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): >> kernel/printk/printk.c:2686:8: error: implicit declaration of function 'panic_in_progress' [-Werror,-Wimplicit-function-declaration] if (panic_in_progress() && panic_console_dropped++ > 10) { ^ 1 error generated. vim +/panic_in_progress +2686 kernel/printk/printk.c 2612 2613 /** 2614 * console_unlock - unlock the console system 2615 * 2616 * Releases the console_lock which the caller holds on the console system 2617 * and the console driver list. 2618 * 2619 * While the console_lock was held, console output may have been buffered 2620 * by printk(). If this is the case, console_unlock(); emits 2621 * the output prior to releasing the lock. 2622 * 2623 * If there is output waiting, we wake /dev/kmsg and syslog() users. 2624 * 2625 * console_unlock(); may be called from any context. 2626 */ 2627 void console_unlock(void) 2628 { 2629 static char ext_text[CONSOLE_EXT_LOG_MAX]; 2630 static char text[CONSOLE_LOG_MAX]; 2631 static int panic_console_dropped; 2632 unsigned long flags; 2633 bool do_cond_resched, retry; 2634 struct printk_info info; 2635 struct printk_record r; 2636 u64 __maybe_unused next_seq; 2637 2638 if (console_suspended) { 2639 up_console_sem(); 2640 return; 2641 } 2642 2643 prb_rec_init_rd(&r, &info, text, sizeof(text)); 2644 2645 /* 2646 * Console drivers are called with interrupts disabled, so 2647 * @console_may_schedule should be cleared before; however, we may 2648 * end up dumping a lot of lines, for example, if called from 2649 * console registration path, and should invoke cond_resched() 2650 * between lines if allowable. Not doing so can cause a very long 2651 * scheduling stall on a slow console leading to RCU stall and 2652 * softlockup warnings which exacerbate the issue with more 2653 * messages practically incapacitating the system. 2654 * 2655 * console_trylock() is not able to detect the preemptive 2656 * context reliably. Therefore the value must be stored before 2657 * and cleared after the "again" goto label. 2658 */ 2659 do_cond_resched = console_may_schedule; 2660 again: 2661 console_may_schedule = 0; 2662 2663 /* 2664 * We released the console_sem lock, so we need to recheck if 2665 * cpu is online and (if not) is there at least one CON_ANYTIME 2666 * console. 2667 */ 2668 if (!can_use_console()) { 2669 console_locked = 0; 2670 up_console_sem(); 2671 return; 2672 } 2673 2674 for (;;) { 2675 size_t ext_len = 0; 2676 int handover; 2677 size_t len; 2678 2679 skip: 2680 if (!prb_read_valid(prb, console_seq, &r)) 2681 break; 2682 2683 if (console_seq != r.info->seq) { 2684 console_dropped += r.info->seq - console_seq; 2685 console_seq = r.info->seq; > 2686 if (panic_in_progress() && panic_console_dropped++ > 10) { 2687 suppress_panic_printk = 1; 2688 pr_warn_once("Too many dropped messages. Supress messages on non-panic CPUs to prevent livelock.\n"); 2689 } 2690 } 2691 2692 if (suppress_message_printing(r.info->level)) { 2693 /* 2694 * Skip record we have buffered and already printed 2695 * directly to the console when we received it, and 2696 * record that has level above the console loglevel. 2697 */ 2698 console_seq++; 2699 goto skip; 2700 } 2701 2702 /* Output to all consoles once old messages replayed. */ 2703 if (unlikely(exclusive_console && 2704 console_seq >= exclusive_console_stop_seq)) { 2705 exclusive_console = NULL; 2706 } 2707 2708 /* 2709 * Handle extended console text first because later 2710 * record_print_text() will modify the record buffer in-place. 2711 */ 2712 if (nr_ext_console_drivers) { 2713 ext_len = info_print_ext_header(ext_text, 2714 sizeof(ext_text), 2715 r.info); 2716 ext_len += msg_print_ext_body(ext_text + ext_len, 2717 sizeof(ext_text) - ext_len, 2718 &r.text_buf[0], 2719 r.info->text_len, 2720 &r.info->dev_info); 2721 } 2722 len = record_print_text(&r, 2723 console_msg_format & MSG_FORMAT_SYSLOG, 2724 printk_time); 2725 console_seq++; 2726 2727 /* 2728 * While actively printing out messages, if another printk() 2729 * were to occur on another CPU, it may wait for this one to 2730 * finish. This task can not be preempted if there is a 2731 * waiter waiting to take over. 2732 * 2733 * Interrupts are disabled because the hand over to a waiter 2734 * must not be interrupted until the hand over is completed 2735 * (@console_waiter is cleared). 2736 */ 2737 printk_safe_enter_irqsave(flags); 2738 console_lock_spinning_enable(); 2739 2740 stop_critical_timings(); /* don't trace print latency */ 2741 call_console_drivers(ext_text, ext_len, text, len); 2742 start_critical_timings(); 2743 2744 handover = console_lock_spinning_disable_and_check(); 2745 printk_safe_exit_irqrestore(flags); 2746 if (handover) 2747 return; 2748 2749 if (do_cond_resched) 2750 cond_resched(); 2751 } 2752 2753 /* Get consistent value of the next-to-be-used sequence number. */ 2754 next_seq = console_seq; 2755 2756 console_locked = 0; 2757 up_console_sem(); 2758 2759 /* 2760 * Someone could have filled up the buffer again, so re-check if there's 2761 * something to flush. In case we cannot trylock the console_sem again, 2762 * there's a new owner and the console_unlock() from them will do the 2763 * flush, no worries. 2764 */ 2765 retry = prb_read_valid(prb, next_seq, NULL); 2766 if (retry && console_trylock()) 2767 goto again; 2768 } 2769 EXPORT_SYMBOL(console_unlock); 2770 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3 4/4] printk: Drop console_sem during panic 2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan ` (2 preceding siblings ...) 2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan @ 2022-02-01 18:58 ` Stephen Brennan 2022-02-04 4:04 ` Sergey Senozhatsky 3 siblings, 1 reply; 11+ messages in thread From: Stephen Brennan @ 2022-02-01 18:58 UTC (permalink / raw) To: Sergey Senozhatsky, Petr Mladek Cc: Steven Rostedt, linux-kernel, John Ogness, Stephen Brennan If another CPU is in panic, we are about to be halted. Try to gracefully abandon the console_sem, leaving it free for the panic CPU to grab. Suggested-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> --- Notes: v2: Factor check out to a helper, and check at the end of console_unlock() to prevent retry as well. kernel/printk/printk.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index cc7bb86f7bfe..35676e76482e 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2598,6 +2598,25 @@ static int have_callable_console(void) return 0; } +/* + * Return true when this CPU should unlock console_sem without pushing all + * messages to the console. This reduces the chance that the console is + * locked when the panic CPU tries to use it. + */ +static bool abandon_console_lock_in_panic(void) +{ + if (!panic_in_progress()) + return false; + + /* + * We can use raw_smp_processor_id() here because it is impossible for + * the task to be migrated to the panic_cpu, or away from it. If + * panic_cpu has already been set, and we're not currently executing on + * that CPU, then we never will be. + */ + return atomic_read(&panic_cpu) != raw_smp_processor_id(); +} + /* * Can we actually use the console at this time on this cpu? * @@ -2746,6 +2765,10 @@ void console_unlock(void) if (handover) return; + /* Allow panic_cpu to take over the consoles safely */ + if (abandon_console_lock_in_panic()) + break; + if (do_cond_resched) cond_resched(); } @@ -2763,7 +2786,7 @@ void console_unlock(void) * flush, no worries. */ retry = prb_read_valid(prb, next_seq, NULL); - if (retry && console_trylock()) + if (retry && !abandon_console_lock_in_panic() && console_trylock()) goto again; } EXPORT_SYMBOL(console_unlock); -- 2.30.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v3 4/4] printk: Drop console_sem during panic 2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan @ 2022-02-04 4:04 ` Sergey Senozhatsky 2022-02-04 18:53 ` Stephen Brennan 0 siblings, 1 reply; 11+ messages in thread From: Sergey Senozhatsky @ 2022-02-04 4:04 UTC (permalink / raw) To: Stephen Brennan Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel, John Ogness On (22/02/01 10:58), Stephen Brennan wrote: > +/* > + * Return true when this CPU should unlock console_sem without pushing all > + * messages to the console. This reduces the chance that the console is > + * locked when the panic CPU tries to use it. > + */ > +static bool abandon_console_lock_in_panic(void) > +{ > + if (!panic_in_progress()) > + return false; > + > + /* > + * We can use raw_smp_processor_id() here because it is impossible for > + * the task to be migrated to the panic_cpu, or away from it. If > + * panic_cpu has already been set, and we're not currently executing on > + * that CPU, then we never will be. > + */ > + return atomic_read(&panic_cpu) != raw_smp_processor_id(); > +} > + > /* > * Can we actually use the console at this time on this cpu? > * > @@ -2746,6 +2765,10 @@ void console_unlock(void) > if (handover) > return; > > + /* Allow panic_cpu to take over the consoles safely */ > + if (abandon_console_lock_in_panic()) > + break; Sorry, why not just `return` like in handover case? > + > if (do_cond_resched) > cond_resched(); > } > @@ -2763,7 +2786,7 @@ void console_unlock(void) > * flush, no worries. > */ > retry = prb_read_valid(prb, next_seq, NULL); > - if (retry && console_trylock()) > + if (retry && !abandon_console_lock_in_panic() && console_trylock()) > goto again; > } > EXPORT_SYMBOL(console_unlock); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 4/4] printk: Drop console_sem during panic 2022-02-04 4:04 ` Sergey Senozhatsky @ 2022-02-04 18:53 ` Stephen Brennan 2022-02-08 2:15 ` Sergey Senozhatsky 0 siblings, 1 reply; 11+ messages in thread From: Stephen Brennan @ 2022-02-04 18:53 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel, John Ogness Sergey Senozhatsky <senozhatsky@chromium.org> writes: > On (22/02/01 10:58), Stephen Brennan wrote: >> +/* >> + * Return true when this CPU should unlock console_sem without pushing all >> + * messages to the console. This reduces the chance that the console is >> + * locked when the panic CPU tries to use it. >> + */ >> +static bool abandon_console_lock_in_panic(void) >> +{ >> + if (!panic_in_progress()) >> + return false; >> + >> + /* >> + * We can use raw_smp_processor_id() here because it is impossible for >> + * the task to be migrated to the panic_cpu, or away from it. If >> + * panic_cpu has already been set, and we're not currently executing on >> + * that CPU, then we never will be. >> + */ >> + return atomic_read(&panic_cpu) != raw_smp_processor_id(); >> +} >> + >> /* >> * Can we actually use the console at this time on this cpu? >> * >> @@ -2746,6 +2765,10 @@ void console_unlock(void) >> if (handover) >> return; >> >> + /* Allow panic_cpu to take over the consoles safely */ >> + if (abandon_console_lock_in_panic()) >> + break; > > Sorry, why not just `return` like in handover case? We need to drop console_sem before returning, since the whole benefit here is to increase the chance that console_sem is unlocked when the panic_cpu halts this CPU. in the handover case, there's another cpu waiting, and we're essentially transferring the console_sem ownership to that cpu, so we explicitly return and skip the unlocking portion. Does this need some comments to clarify it? Admittedly if I had a few more lines of context in the diff, you would see the console unlock directly after the loop; it's a bit clearer when you're looking at the function as whole: 2768 /* Allow panic_cpu to take over the consoles safely */ 2769 if (abandon_console_lock_in_panic()) 2770 break; 2771 2772 if (do_cond_resched) 2773 cond_resched(); 2774 } 2775 2776 /* Get consistent value of the next-to-be-used sequence number. */ 2777 next_seq = console_seq; 2778 2779 console_locked = 0; 2780 up_console_sem(); Stephen > >> + >> if (do_cond_resched) >> cond_resched(); >> } >> @@ -2763,7 +2786,7 @@ void console_unlock(void) >> * flush, no worries. >> */ >> retry = prb_read_valid(prb, next_seq, NULL); >> - if (retry && console_trylock()) >> + if (retry && !abandon_console_lock_in_panic() && console_trylock()) >> goto again; >> } >> EXPORT_SYMBOL(console_unlock); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v3 4/4] printk: Drop console_sem during panic 2022-02-04 18:53 ` Stephen Brennan @ 2022-02-08 2:15 ` Sergey Senozhatsky 0 siblings, 0 replies; 11+ messages in thread From: Sergey Senozhatsky @ 2022-02-08 2:15 UTC (permalink / raw) To: Stephen Brennan Cc: Sergey Senozhatsky, Petr Mladek, Steven Rostedt, linux-kernel, John Ogness On (22/02/04 10:53), Stephen Brennan wrote: > Sergey Senozhatsky <senozhatsky@chromium.org> writes: > > On (22/02/01 10:58), Stephen Brennan wrote: > >> +/* > >> + * Return true when this CPU should unlock console_sem without pushing all > >> + * messages to the console. This reduces the chance that the console is > >> + * locked when the panic CPU tries to use it. > >> + */ > >> +static bool abandon_console_lock_in_panic(void) > >> +{ > >> + if (!panic_in_progress()) > >> + return false; > >> + > >> + /* > >> + * We can use raw_smp_processor_id() here because it is impossible for > >> + * the task to be migrated to the panic_cpu, or away from it. If > >> + * panic_cpu has already been set, and we're not currently executing on > >> + * that CPU, then we never will be. > >> + */ > >> + return atomic_read(&panic_cpu) != raw_smp_processor_id(); > >> +} > >> + > >> /* > >> * Can we actually use the console at this time on this cpu? > >> * > >> @@ -2746,6 +2765,10 @@ void console_unlock(void) > >> if (handover) > >> return; > >> > >> + /* Allow panic_cpu to take over the consoles safely */ > >> + if (abandon_console_lock_in_panic()) > >> + break; > > > > Sorry, why not just `return` like in handover case? > > We need to drop console_sem before returning, since the whole benefit > here is to increase the chance that console_sem is unlocked when the > panic_cpu halts this CPU. Yes, that makes sense. > in the handover case, there's another cpu waiting, and we're essentially > transferring the console_sem ownership to that cpu, so we explicitly > return and skip the unlocking portion. > > Does this need some comments to clarify it? No. Everything looks good. Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-02-08 2:16 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-02-01 18:57 [PATCH v3 0/4] printk: reduce deadlocks during panic Stephen Brennan 2022-02-01 18:57 ` [PATCH v3 1/4] printk: Add panic_in_progress helper Stephen Brennan 2022-02-02 1:39 ` Stephen Brennan 2022-02-01 18:58 ` [PATCH v3 2/4] printk: disable optimistic spin during panic Stephen Brennan 2022-02-01 18:58 ` [PATCH v3 3/4] printk: Avoid livelock with heavy printk " Stephen Brennan 2022-02-02 0:16 ` kernel test robot 2022-02-02 0:16 ` kernel test robot 2022-02-01 18:58 ` [PATCH v3 4/4] printk: Drop console_sem " Stephen Brennan 2022-02-04 4:04 ` Sergey Senozhatsky 2022-02-04 18:53 ` Stephen Brennan 2022-02-08 2:15 ` Sergey Senozhatsky
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.