* [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios
@ 2020-03-15 17:09 Eugeniu Rosca
2020-03-15 17:09 ` [RFC PATCH 1/3] printk: convert ignore_loglevel to atomic_t Eugeniu Rosca
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Eugeniu Rosca @ 2020-03-15 17:09 UTC (permalink / raw)
To: linux-kernel
Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar,
Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks,
Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme,
Eugeniu Rosca
Dear community,
The motivation behind this seris is to save days/weeks, if not months,
of debugging efforts for users who:
* experience an issue like softlockup/hardlockup/hung task/oom, whose
reproduction is not clear and whose occurrence rate is very low
* are constrained to use a low loglevel value (1,2,3) in production
* mostly rely on console logs to debug the issue post-mortem
(e.g. saved to persistent storage via e.g. pstore)
As pointed out in the last patch from this series, under the above
circumstances, users might simply lack any relevant logs during
post-mortem analysis.
Why this series is marked as RFC is because:
* There are several possible approaches to turn console verbosity on
and off. Current series employs the 'ignore_loglevel' functionality,
but an alternative way is to use the 'console_loglevel' variable. The
latter is more intrusive, hence the former has been chosen as v1.
* Manipulating 'ignore_loglevel' might be seen as an abuse, especially
because it breaks the expectation of users who assume the system to
be dead silent after passing loglevel=0 on kernel command line.
Thank you for your comments!
Eugeniu Rosca (3):
printk: convert ignore_loglevel to atomic_t
printk: add console_verbose_{start,end}
watchdog: Turn console verbosity on when reporting softlockup
include/linux/printk.h | 10 ++++++++++
kernel/printk/printk.c | 30 ++++++++++++++++++++++++++----
kernel/watchdog.c | 4 ++++
3 files changed, 40 insertions(+), 4 deletions(-)
--
2.25.0
^ permalink raw reply [flat|nested] 19+ messages in thread* [RFC PATCH 1/3] printk: convert ignore_loglevel to atomic_t 2020-03-15 17:09 [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Eugeniu Rosca @ 2020-03-15 17:09 ` Eugeniu Rosca 2020-03-15 17:09 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} Eugeniu Rosca ` (2 subsequent siblings) 3 siblings, 0 replies; 19+ messages in thread From: Eugeniu Rosca @ 2020-03-15 17:09 UTC (permalink / raw) To: linux-kernel Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca Prepare for using the 'ignore_loglevel' variable as a lockless and SMP safe vehicle to turn console verbosity on and off in error paths which usually (but not necessarily) end up with panic (configurable via user-selectable knobs). No functional change intended. Tested on R-Car H3ULCB. Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com> --- kernel/printk/printk.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index fada22dc4ab6..d2c75955a0d7 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1181,24 +1181,46 @@ void __init setup_log_buf(int early) free, (free * 100) / __LOG_BUF_LEN); } -static bool __read_mostly ignore_loglevel; +atomic_t __read_mostly ignore_loglevel = ATOMIC_INIT(0); static int __init ignore_loglevel_setup(char *str) { - ignore_loglevel = true; + atomic_set(&ignore_loglevel, 1); pr_info("debug: ignoring loglevel setting.\n"); return 0; } +static int kparam_set_atomic(const char *val, const struct kernel_param *kp) +{ + if (param_set_bool(val, kp)) + return -EINVAL; + + atomic_set(&ignore_loglevel, *(bool *)kp->arg ? 1 : 0); + + return 0; +} + +static int kparam_get_atomic(char *val, const struct kernel_param *kp) +{ + return sprintf(val, "%c\n", atomic_read(&ignore_loglevel) ? 'Y' : 'N'); +} + +static const struct kernel_param_ops kparam_ops = { + .set = kparam_set_atomic, + .get = kparam_get_atomic, +}; + +static int kparam_buf; + early_param("ignore_loglevel", ignore_loglevel_setup); -module_param(ignore_loglevel, bool, S_IRUGO | S_IWUSR); +module_param_cb(ignore_loglevel, &kparam_ops, &kparam_buf, 0644); MODULE_PARM_DESC(ignore_loglevel, "ignore loglevel setting (prints all kernel messages to the console)"); static bool suppress_message_printing(int level) { - return (level >= console_loglevel && !ignore_loglevel); + return (level >= console_loglevel && !atomic_read(&ignore_loglevel)); } #ifdef CONFIG_BOOT_PRINTK_DELAY -- 2.25.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC PATCH 2/3] printk: add console_verbose_{start,end} 2020-03-15 17:09 [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Eugeniu Rosca 2020-03-15 17:09 ` [RFC PATCH 1/3] printk: convert ignore_loglevel to atomic_t Eugeniu Rosca @ 2020-03-15 17:09 ` Eugeniu Rosca 2020-03-16 21:19 ` [RFC PATCH 2/3] printk: add console_verbose_{start, end} kbuild test robot ` (2 more replies) 2020-03-15 17:09 ` [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup Eugeniu Rosca 2020-03-16 18:35 ` [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Steven Rostedt 3 siblings, 3 replies; 19+ messages in thread From: Eugeniu Rosca @ 2020-03-15 17:09 UTC (permalink / raw) To: linux-kernel Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca Consider below example scenarios: - soft lockup - hard lockup - hung task - scheduling while atomic - rcu stall - oops - oom - WARN In most of the above situations, it is up to the user to terminate the execution with a panic (see panic_on_* in kernel/sysctl.c) or to decide not to abort and try to recover. A general concern applicable to the above use-cases is that, depending on the console loglevel set by the user, precious information conveyed by show_regs(), print_modules(), dump_stack() and friends may simply not appear on the console. Below example commits tackle this exact concern in the panic paths: * commit 168e06f7937d96 ("kernel/hung_task.c: force console verbose before panic") * commit 5b530fc1832460 ("panic: call console_verbose() in panic") The approach behind the above commits is straightforward. Whenever panic is imminent, they simply call console_verbose(). Unfortunately, the same technique does not apply to non-panic paths. It requires a counterpart of console_verbose(), which currently does not exist. With that in mind, create a pair of functions named console_verbose_start() and console_verbose_end() which turn the console verbosity on and off in a lockless and SMP safe way. Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com> --- include/linux/printk.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/printk.h b/include/linux/printk.h index 1e6108b8d15f..14755ef7b017 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -3,6 +3,7 @@ #define __KERNEL_PRINTK__ #include <stdarg.h> +#include <linux/atomic.h> #include <linux/init.h> #include <linux/kern_levels.h> #include <linux/linkage.h> @@ -77,6 +78,15 @@ static inline void console_verbose(void) console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH; } +#ifdef CONFIG_PRINTK +extern atomic_t ignore_loglevel; +static inline void console_verbose_start(void) { atomic_inc(&ignore_loglevel); } +static inline void console_verbose_end(void) { atomic_dec(&ignore_loglevel); } +#else +static inline void console_verbose_start(void) { } +static inline void console_verbose_end(void) { } +#endif + /* strlen("ratelimit") + 1 */ #define DEVKMSG_STR_MAX_SIZE 10 extern char devkmsg_log_str[]; -- 2.25.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 2/3] printk: add console_verbose_{start, end} 2020-03-15 17:09 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} Eugeniu Rosca @ 2020-03-16 21:19 ` kbuild test robot 2020-03-16 21:30 ` kbuild test robot 2020-03-18 16:00 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} John Ogness 2 siblings, 0 replies; 19+ messages in thread From: kbuild test robot @ 2020-03-16 21:19 UTC (permalink / raw) To: kbuild-all [-- Attachment #1: Type: text/plain, Size: 7468 bytes --] Hi Eugeniu, [FYI, it's a private test report for your RFC patch.] [auto build test ERROR on linux/master] [also build test ERROR on pmladek/for-next linus/master v5.6-rc6 next-20200316] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Eugeniu-Rosca/Fix-quiet-console-in-pre-panic-scenarios/20200317-040928 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f5ae2ea6347a308cfe91f53b53682ce635497d0d config: powerpc-allnoconfig (attached as .config) compiler: powerpc-linux-gcc (GCC) 9.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=9.2.0 make.cross ARCH=powerpc If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from arch/powerpc/include/asm/atomic.h:11, from include/linux/atomic.h:7, from include/linux/printk.h:6, from include/linux/kernel.h:15, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/page-flags.h:10, from kernel/bounds.c:10: arch/powerpc/include/asm/cmpxchg.h: In function '__xchg_local': arch/powerpc/include/asm/cmpxchg.h:173:2: error: implicit declaration of function 'BUILD_BUG_ON_MSG' [-Werror=implicit-function-declaration] 173 | BUILD_BUG_ON_MSG(1, "Unsupported size for __xchg"); | ^~~~~~~~~~~~~~~~ In file included from arch/powerpc/include/asm/page.h:254, from arch/powerpc/include/asm/thread_info.h:23, from include/linux/thread_info.h:38, from arch/powerpc/include/asm/ptrace.h:218, from arch/powerpc/include/asm/hw_irq.h:12, from arch/powerpc/include/asm/irqflags.h:12, from include/linux/irqflags.h:16, from include/asm-generic/cmpxchg-local.h:6, from arch/powerpc/include/asm/cmpxchg.h:537, from arch/powerpc/include/asm/atomic.h:11, from include/linux/atomic.h:7, from include/linux/printk.h:6, from include/linux/kernel.h:15, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/page-flags.h:10, from kernel/bounds.c:10: arch/powerpc/include/asm/page_32.h: In function 'clear_page': >> arch/powerpc/include/asm/page_32.h:54:2: error: implicit declaration of function 'WARN_ON' [-Werror=implicit-function-declaration] 54 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | ^~~~~~~ In file included from arch/powerpc/include/asm/ptrace.h:218, from arch/powerpc/include/asm/hw_irq.h:12, from arch/powerpc/include/asm/irqflags.h:12, from include/linux/irqflags.h:16, from include/asm-generic/cmpxchg-local.h:6, from arch/powerpc/include/asm/cmpxchg.h:537, from arch/powerpc/include/asm/atomic.h:11, from include/linux/atomic.h:7, from include/linux/printk.h:6, from include/linux/kernel.h:15, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/page-flags.h:10, from kernel/bounds.c:10: include/linux/thread_info.h: In function 'copy_overflow': include/linux/thread_info.h:134:2: error: implicit declaration of function 'WARN' [-Werror=implicit-function-declaration] 134 | WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); | ^~~~ include/linux/thread_info.h: In function 'check_copy_size': include/linux/thread_info.h:150:6: error: implicit declaration of function 'WARN_ON_ONCE' [-Werror=implicit-function-declaration] 150 | if (WARN_ON_ONCE(bytes > INT_MAX)) | ^~~~~~~~~~~~ cc1: some warnings being treated as errors make[2]: *** [scripts/Makefile.build:99: kernel/bounds.s] Error 1 make[2]: Target '__build' not remade because of errors. make[1]: *** [Makefile:1113: prepare0] Error 2 make[1]: Target 'prepare' not remade because of errors. make: *** [Makefile:179: sub-make] Error 2 16 real 4 user 6 sys 68.27% cpu make prepare vim +/WARN_ON +54 arch/powerpc/include/asm/page_32.h 7ab0b7cb8951d4 arch/powerpc/include/asm/page_32.h Christophe Leroy 2019-08-16 44 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 45 /* 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 46 * Clear page using the dcbz instruction, which doesn't cause any 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 47 * memory traffic (except to write out any cache lines which get 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 48 * displaced). This only works on cacheable memory. 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 49 */ 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 50 static inline void clear_page(void *addr) 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 51 { 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 52 unsigned int i; 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 53 7ab0b7cb8951d4 arch/powerpc/include/asm/page_32.h Christophe Leroy 2019-08-16 @54 WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); 7ab0b7cb8951d4 arch/powerpc/include/asm/page_32.h Christophe Leroy 2019-08-16 55 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 56 for (i = 0; i < PAGE_SIZE / L1_CACHE_BYTES; i++, addr += L1_CACHE_BYTES) 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 57 dcbz(addr); 5736f96d12dd42 arch/powerpc/include/asm/page_32.h Christophe Leroy 2016-02-09 58 } 5cd16ee934eafc include/asm-powerpc/page_32.h Michael Ellerman 2005-11-11 59 extern void copy_page(void *to, void *from); 5cd16ee934eafc include/asm-powerpc/page_32.h Michael Ellerman 2005-11-11 60 :::::: The code@line 54 was first introduced by commit :::::: 7ab0b7cb8951d4095d73e203759b74d41916e455 powerpc/32: Add warning on misaligned copy_page() or clear_page() :::::: TO: Christophe Leroy <christophe.leroy@c-s.fr> :::::: CC: Michael Ellerman <mpe@ellerman.id.au> --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org [-- Attachment #2: config.gz --] [-- Type: application/gzip, Size: 6334 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 2/3] printk: add console_verbose_{start, end} 2020-03-15 17:09 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} Eugeniu Rosca 2020-03-16 21:19 ` [RFC PATCH 2/3] printk: add console_verbose_{start, end} kbuild test robot @ 2020-03-16 21:30 ` kbuild test robot 2020-03-18 16:00 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} John Ogness 2 siblings, 0 replies; 19+ messages in thread From: kbuild test robot @ 2020-03-16 21:30 UTC (permalink / raw) To: kbuild-all [-- Attachment #1: Type: text/plain, Size: 8207 bytes --] Hi Eugeniu, [FYI, it's a private test report for your RFC patch.] [auto build test ERROR on linux/master] [also build test ERROR on pmladek/for-next linus/master v5.6-rc6 next-20200316] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Eugeniu-Rosca/Fix-quiet-console-in-pre-panic-scenarios/20200317-040928 base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f5ae2ea6347a308cfe91f53b53682ce635497d0d config: powerpc-mpc836x_rdk_defconfig (attached as .config) compiler: powerpc-linux-gcc (GCC) 9.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=9.2.0 make.cross ARCH=powerpc If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> All error/warnings (new ones prefixed by >>): In file included from arch/powerpc/include/asm/atomic.h:11, from include/linux/atomic.h:7, from include/linux/printk.h:6, from include/linux/kernel.h:15, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/page-flags.h:10, from kernel/bounds.c:10: arch/powerpc/include/asm/cmpxchg.h: In function '__xchg_local': >> arch/powerpc/include/asm/cmpxchg.h:173:2: error: implicit declaration of function 'BUILD_BUG_ON_MSG' [-Werror=implicit-function-declaration] 173 | BUILD_BUG_ON_MSG(1, "Unsupported size for __xchg"); | ^~~~~~~~~~~~~~~~ In file included from include/linux/bug.h:5, from include/linux/page-flags.h:10, from kernel/bounds.c:10: arch/powerpc/include/asm/page_32.h: In function 'clear_page': >> arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function '__WARN' [-Werror=implicit-function-declaration] 87 | __WARN(); \ | ^~~~~~ >> arch/powerpc/include/asm/page_32.h:54:2: note: in expansion of macro 'WARN_ON' 54 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | ^~~~~~~ >> arch/powerpc/include/asm/bug.h:90:38: error: 'TAINT_WARN' undeclared (first use in this function) 90 | BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ | ^~~~~~~~~~ arch/powerpc/include/asm/bug.h:57:10: note: in definition of macro 'BUG_ENTRY' 57 | "i" (flags), \ | ^~~~~ >> arch/powerpc/include/asm/bug.h:90:24: note: in expansion of macro 'BUGFLAG_TAINT' 90 | BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ | ^~~~~~~~~~~~~ >> arch/powerpc/include/asm/page_32.h:54:2: note: in expansion of macro 'WARN_ON' 54 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | ^~~~~~~ arch/powerpc/include/asm/bug.h:90:38: note: each undeclared identifier is reported only once for each function it appears in 90 | BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ | ^~~~~~~~~~ arch/powerpc/include/asm/bug.h:57:10: note: in definition of macro 'BUG_ENTRY' 57 | "i" (flags), \ | ^~~~~ >> arch/powerpc/include/asm/bug.h:90:24: note: in expansion of macro 'BUGFLAG_TAINT' 90 | BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \ | ^~~~~~~~~~~~~ >> arch/powerpc/include/asm/page_32.h:54:2: note: in expansion of macro 'WARN_ON' 54 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | ^~~~~~~ >> arch/powerpc/include/asm/bug.h:58:17: error: invalid application of 'sizeof' to incomplete type 'struct bug_entry' 58 | "i" (sizeof(struct bug_entry)), \ | ^~~~~~ >> arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro 'BUG_ENTRY' 89 | BUG_ENTRY(PPC_TLNEI " %4, 0", \ | ^~~~~~~~~ >> arch/powerpc/include/asm/page_32.h:54:2: note: in expansion of macro 'WARN_ON' 54 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | ^~~~~~~ In file included from arch/powerpc/include/asm/ptrace.h:218, from arch/powerpc/include/asm/hw_irq.h:12, from arch/powerpc/include/asm/irqflags.h:12, from include/linux/irqflags.h:16, from include/asm-generic/cmpxchg-local.h:6, from arch/powerpc/include/asm/cmpxchg.h:537, from arch/powerpc/include/asm/atomic.h:11, from include/linux/atomic.h:7, from include/linux/printk.h:6, from include/linux/kernel.h:15, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/page-flags.h:10, from kernel/bounds.c:10: include/linux/thread_info.h: In function 'copy_overflow': >> include/linux/thread_info.h:134:2: error: implicit declaration of function 'WARN' [-Werror=implicit-function-declaration] 134 | WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); | ^~~~ include/linux/thread_info.h: In function 'check_copy_size': >> include/linux/thread_info.h:150:6: error: implicit declaration of function 'WARN_ON_ONCE' [-Werror=implicit-function-declaration] 150 | if (WARN_ON_ONCE(bytes > INT_MAX)) | ^~~~~~~~~~~~ cc1: some warnings being treated as errors make[2]: *** [scripts/Makefile.build:99: kernel/bounds.s] Error 1 make[2]: Target '__build' not remade because of errors. make[1]: *** [Makefile:1113: prepare0] Error 2 make[1]: Target 'prepare' not remade because of errors. make: *** [Makefile:179: sub-make] Error 2 38 real 7 user 12 sys 51.95% cpu make prepare vim +/BUILD_BUG_ON_MSG +173 arch/powerpc/include/asm/cmpxchg.h ae3a197e3d0bfe David Howells 2012-03-28 157 ae3a197e3d0bfe David Howells 2012-03-28 158 static __always_inline unsigned long d0563a1297e234 Pan Xinhui 2016-04-27 159 __xchg_local(void *ptr, unsigned long x, unsigned int size) ae3a197e3d0bfe David Howells 2012-03-28 160 { ae3a197e3d0bfe David Howells 2012-03-28 161 switch (size) { d0563a1297e234 Pan Xinhui 2016-04-27 162 case 1: d0563a1297e234 Pan Xinhui 2016-04-27 163 return __xchg_u8_local(ptr, x); d0563a1297e234 Pan Xinhui 2016-04-27 164 case 2: d0563a1297e234 Pan Xinhui 2016-04-27 165 return __xchg_u16_local(ptr, x); ae3a197e3d0bfe David Howells 2012-03-28 166 case 4: 26760fc19a7e66 Boqun Feng 2015-12-15 167 return __xchg_u32_local(ptr, x); ae3a197e3d0bfe David Howells 2012-03-28 168 #ifdef CONFIG_PPC64 ae3a197e3d0bfe David Howells 2012-03-28 169 case 8: 26760fc19a7e66 Boqun Feng 2015-12-15 170 return __xchg_u64_local(ptr, x); ae3a197e3d0bfe David Howells 2012-03-28 171 #endif ae3a197e3d0bfe David Howells 2012-03-28 172 } 10d8b1480e6966 pan xinhui 2016-02-23 @173 BUILD_BUG_ON_MSG(1, "Unsupported size for __xchg"); ae3a197e3d0bfe David Howells 2012-03-28 174 return x; ae3a197e3d0bfe David Howells 2012-03-28 175 } ae3a197e3d0bfe David Howells 2012-03-28 176 :::::: The code at line 173 was first introduced by commit :::::: 10d8b1480e6966ba03cd3afad3b5d6eb5e341fae powerpc: Use BUILD_BUG_ON_MSG() for unsupported {cmp}xchg sizes :::::: TO: pan xinhui <xinhui.pan@linux.vnet.ibm.com> :::::: CC: Michael Ellerman <mpe@ellerman.id.au> --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org [-- Attachment #2: config.gz --] [-- Type: application/gzip, Size: 16082 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 2/3] printk: add console_verbose_{start,end} 2020-03-15 17:09 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} Eugeniu Rosca 2020-03-16 21:19 ` [RFC PATCH 2/3] printk: add console_verbose_{start, end} kbuild test robot 2020-03-16 21:30 ` kbuild test robot @ 2020-03-18 16:00 ` John Ogness 2 siblings, 0 replies; 19+ messages in thread From: John Ogness @ 2020-03-18 16:00 UTC (permalink / raw) To: Eugeniu Rosca Cc: linux-kernel, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On 2020-03-15, Eugeniu Rosca <roscaeugeniu@gmail.com> wrote: > diff --git a/include/linux/printk.h b/include/linux/printk.h > index 1e6108b8d15f..14755ef7b017 100644 > --- a/include/linux/printk.h > +++ b/include/linux/printk.h > @@ -3,6 +3,7 @@ > #define __KERNEL_PRINTK__ > > #include <stdarg.h> > +#include <linux/atomic.h> > #include <linux/init.h> > #include <linux/kern_levels.h> > #include <linux/linkage.h> > @@ -77,6 +78,15 @@ static inline void console_verbose(void) > console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH; > } > > +#ifdef CONFIG_PRINTK > +extern atomic_t ignore_loglevel; > +static inline void console_verbose_start(void) { atomic_inc(&ignore_loglevel); } > +static inline void console_verbose_end(void) { atomic_dec(&ignore_loglevel); } To be compatible with console_verbose() semantics, this should also respect console_loglevel=0. Perhaps by updating suppress_message_printing() to be something like: static bool suppress_message_printing(int level) { return (!console_loglevel || (level >= console_loglevel && !ignore_loglevel)); } John Ogness ^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-15 17:09 [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Eugeniu Rosca 2020-03-15 17:09 ` [RFC PATCH 1/3] printk: convert ignore_loglevel to atomic_t Eugeniu Rosca 2020-03-15 17:09 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} Eugeniu Rosca @ 2020-03-15 17:09 ` Eugeniu Rosca 2020-03-17 2:18 ` Sergey Senozhatsky 2020-03-16 18:35 ` [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Steven Rostedt 3 siblings, 1 reply; 19+ messages in thread From: Eugeniu Rosca @ 2020-03-15 17:09 UTC (permalink / raw) To: linux-kernel Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca As pointed out in v5.0 commit 168e06f7937d96 ("kernel/hung_task.c: force console verbose before panic"), users may miss crucial information in pre-panic scenarios, depending on the loglevel value set by the distribion/platform maintainers. For example, assuming loglevel=3 (i.e. print EMERG/ALERT/CRIT only), below is the output of a softlockup got with pure vanilla kernel [1]. In contrast, with current patch applied, output [2] is far more helpful. Save potentially days/weeks/months of debugging efforts for those who: * deal with a softlockup whose reproduction scenario is not clear * are constrainted to use a low loglevel value in production * mostly/exclusively rely on logs like [2] post-mortem [1] Softlockup output [loglevel=3, pure v5.6-rc5, H3ULCB] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [modprobe:2354] [2] Softlockup output [loglevel=3, v5.6-rc5 + this patch, H3ULCB] [ 292.259324] watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [modprobe:2437] [ 292.266834] Modules linked in: <-- cut --> [ 292.302880] irq event stamp: 519660 [ 292.306555] hardirqs last enabled at (519659): [<ffffa00010082a18>] el1_irq+0xd8/0x180 [ 292.314909] hardirqs last disabled at (519660): [<ffffa000100829bc>] el1_irq+0x7c/0x180 [ 292.323274] softirqs last enabled at (519658): [<ffffa00010080d0c>] __do_softirq+0x2e4/0x7c4 [ 292.332179] softirqs last disabled at (519651): [<ffffa000100c3e48>] irq_exit+0x150/0x1e0 [ 292.340720] CPU: 3 PID: 2437 Comm: modprobe Not tainted 5.6.0-rc5+ #96 [ 292.347538] Hardware name: Renesas H3ULCB Kingfisher board based on r8a77951 (DT) [ 292.355355] pstate: 20000005 (nzCv daif -PAN -UAO) [ 292.360435] pc : lkdtm_SOFTLOCKUP+0x18/0x20 [lkdtm] [ 292.365592] lr : lkdtm_SOFTLOCKUP+0x18/0x20 [lkdtm] [ 292.370699] sp : ffff0006cd9af780 [ 292.374187] x29: ffff0006cd9af780 x28: ffffa0000907d000 [ 292.379757] x27: ffff0006cbaebc10 x26: ffffa000090751c0 [ 292.385327] x25: ffffa0000907dfc0 x24: 0000000000000009 [ 292.390896] x23: ffffa0000907d478 x22: 0000000000000000 [ 292.396464] x21: ffffa0000907d340 x20: ffffa0000907d340 [ 292.402034] x19: ffffa000090751c0 x18: 0000000000000000 [ 292.407603] x17: 0000000000000000 x16: 0000000000000000 [ 292.413171] x15: 0000000000000000 x14: ffffa00000000003 [ 292.418740] x13: ffffa00010dc0180 x12: 0000000000000000 [ 292.424309] x11: ffff8000bfc2401d x10: 1fffe000bfc2401d [ 292.429877] x9 : dfffa00000000000 x8 : ffffa00010082c00 [ 292.435445] x7 : ffff0005fe1200ef x6 : 0000000000000000 [ 292.441012] x5 : dfffa00000000000 x4 : ffff0006c82e9c50 [ 292.446580] x3 : ffffa0001024b848 x2 : 0000000000000001 [ 292.452147] x1 : 0000000000000000 x0 : 0000000000000000 [ 292.457713] Call trace: [ 292.460351] lkdtm_SOFTLOCKUP+0x18/0x20 [lkdtm] [ 292.465147] lkdtm_do_action+0x40/0x44 [lkdtm] [ 292.469853] lkdtm_register_cpoint+0x44/0xbc [lkdtm] [ 292.475099] lkdtm_module_init+0x220/0x27c [lkdtm] [ 292.480120] do_one_initcall+0x1c8/0x40c [ 292.484244] do_init_module+0xb4/0x330 [ 292.488185] load_module+0x3404/0x34c8 [ 292.492126] __do_sys_finit_module+0x11c/0x13c [ 292.496787] __arm64_sys_finit_module+0x4c/0x5c [ 292.501539] el0_svc_common.constprop.0+0x160/0x200 [ 292.506647] do_el0_svc+0x74/0xa4 [ 292.510139] el0_sync_handler+0xc4/0x104 [ 292.514257] el0_sync+0x140/0x180 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com> --- kernel/watchdog.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index b6b1f54a7837..cbe7e2c4c5fb 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) } } + console_verbose_start(); + pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", smp_processor_id(), duration, current->comm, task_pid_nr(current)); @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) if (softlockup_panic) panic("softlockup: hung tasks"); __this_cpu_write(soft_watchdog_warn, true); + + console_verbose_end(); } else __this_cpu_write(soft_watchdog_warn, false); -- 2.25.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-15 17:09 ` [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup Eugeniu Rosca @ 2020-03-17 2:18 ` Sergey Senozhatsky 2020-03-18 18:05 ` Eugeniu Rosca 0 siblings, 1 reply; 19+ messages in thread From: Sergey Senozhatsky @ 2020-03-17 2:18 UTC (permalink / raw) To: Eugeniu Rosca Cc: linux-kernel, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On (20/03/15 18:09), Eugeniu Rosca wrote: [..] > @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > } > } > > + console_verbose_start(); > + > pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", > smp_processor_id(), duration, > current->comm, task_pid_nr(current)); > @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > if (softlockup_panic) > panic("softlockup: hung tasks"); > __this_cpu_write(soft_watchdog_warn, true); > + > + console_verbose_end(); > } else > __this_cpu_write(soft_watchdog_warn, false); I'm afraid, as of now, this approach is not going to work the way it's supposed to work in 100% of cases. Because the only thing that printk() call sort of guarantees is that the message will be stored somewhere. Either in the main kernel log buffer, on in one of auxiliary per-CPU log buffers. It does not guarantee, generally speaking, that the message will be printed on the console immediately. Consider the following example: CPU0 CPU1 console_lock(); schedule(); watchdog() console_verbose_start(); printk() log_store() if (!console_trylock()) return; console_verbose_end(); ... console_unlock() print logbuf messages to the consoles we missed the console_verbose_start/end on CPU1 IIRC, we had a similar approach in the past. See commit 375899cddcbb26 ("printk: make sure to print log on console"). And we reverted it, see a6ae928c25835 ("Revert "printk: make sure to print log on console."). -ss ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-17 2:18 ` Sergey Senozhatsky @ 2020-03-18 18:05 ` Eugeniu Rosca 2020-03-19 6:48 ` Sergey Senozhatsky 2020-03-19 8:20 ` John Ogness 0 siblings, 2 replies; 19+ messages in thread From: Eugeniu Rosca @ 2020-03-18 18:05 UTC (permalink / raw) To: Sergey Senozhatsky Cc: linux-kernel, John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca, Eugeniu Rosca Hi Sergey, Many thanks for your feedback! On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote: > On (20/03/15 18:09), Eugeniu Rosca wrote: > > [..] > > > @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > > } > > } > > > > + console_verbose_start(); > > + > > pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", > > smp_processor_id(), duration, > > current->comm, task_pid_nr(current)); > > @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > > if (softlockup_panic) > > panic("softlockup: hung tasks"); > > __this_cpu_write(soft_watchdog_warn, true); > > + > > + console_verbose_end(); > > } else > > __this_cpu_write(soft_watchdog_warn, false); > > > I'm afraid, as of now, this approach is not going to work the way it's > supposed to work in 100% of cases. Because the only thing that printk() > call sort of guarantees is that the message will be stored somewhere. > Either in the main kernel log buffer, on in one of auxiliary per-CPU > log buffers. It does not guarantee, generally speaking, that the message > will be printed on the console immediately. I take this passage as an acknowledgement of the problem being _real_, in spite of the fix being not perfect. One aspect I would like to emphasize is that (please, NAK this statement if it's not accurate) the problem reported in this patch is not specific to the existing printk mechanism, but also applies to the upcoming kthread-based printk. If that's true, then IMHO this is a compelling argument to join forces and try to find a working, safe and future-proof solution. > > Consider the following example: > > CPU0 CPU1 > console_lock(); > schedule(); > > watchdog() > console_verbose_start(); > printk() > log_store() > if (!console_trylock()) > return; > console_verbose_end(); > > ... > console_unlock() > print logbuf messages to the consoles > we missed the console_verbose_start/end > on CPU1 This looks plausible. However, I wonder to which degree the same scenario is a concern in the kthread-based approach? My current standpoint is that as long as points [A-D] are met, it should do no harm to accept a (partial) fix like seen in my series: - [A] the patch tackles at least a subset of problematic use-cases - [B] the fix is non-intrusive and easy to review - [C] there is hope to reuse it in the new lockless buffer based printk - [D] there are no regressions employing the major console knobs (ignore_loglevel, quiet, loglevel, etc) as it happened in a6ae928c25835c ("Revert "printk: make sure to print log on console."") From the above points, my only major concern is that current series breaks the expectations of users who pass loglevel=0 on kernel command line and expect the system to be totally silent. This has already been expressed in the cover letter. I would especially appreciate if the same view is shared (or invalidated) by others. > > IIRC, we had a similar approach in the past. See commit 375899cddcbb26 > ("printk: make sure to print log on console"). And we reverted it, see > a6ae928c25835 ("Revert "printk: make sure to print log on console."). Thanks for this reference. It looks to me that in spite of being relatively compact, commit 375899cddcbb26 ("printk: make sure to print log on console.") broke criteria [D] listed above. I intend to avoid it by testing multiple console knob values on my arm64 system. Looking forward to your feedback on the questions posted above. TIA! -- Best Regards Eugeniu Rosca ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-18 18:05 ` Eugeniu Rosca @ 2020-03-19 6:48 ` Sergey Senozhatsky 2020-03-19 7:38 ` Valdis Klētnieks 2020-05-19 14:43 ` Petr Mladek 2020-03-19 8:20 ` John Ogness 1 sibling, 2 replies; 19+ messages in thread From: Sergey Senozhatsky @ 2020-03-19 6:48 UTC (permalink / raw) To: Eugeniu Rosca Cc: Sergey Senozhatsky, linux-kernel, John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca Hi, On (20/03/18 19:05), Eugeniu Rosca wrote: [..] > > I'm afraid, as of now, this approach is not going to work the way it's > > supposed to work in 100% of cases. Because the only thing that printk() > > call sort of guarantees is that the message will be stored somewhere. > > Either in the main kernel log buffer, on in one of auxiliary per-CPU > > log buffers. It does not guarantee, generally speaking, that the message > > will be printed on the console immediately. > > I take this passage as an acknowledgement of the problem being _real_, > in spite of the fix being not perfect. > > One aspect I would like to emphasize is that (please, NAK this > statement if it's not accurate) the problem reported in this patch is > not specific to the existing printk mechanism, but also applies to the > upcoming kthread-based printk. If that's true, then IMHO this is a > compelling argument to join forces and try to find a working, safe and > future-proof solution. Yes, this is a "known issue". > > Consider the following example: > > > > CPU0 CPU1 > > console_lock(); > > schedule(); > > > > watchdog() > > console_verbose_start(); > > printk() > > log_store() > > if (!console_trylock()) > > return; > > console_verbose_end(); > > > > ... > > console_unlock() > > print logbuf messages to the consoles > > we missed the console_verbose_start/end > > on CPU1 > > This looks plausible. However, I wonder to which degree the same > scenario is a concern in the kthread-based approach? This is a problem for any async/deferred printk() printout. When we print messages on the consoles we don't have any idea anymore if the CPU which added those messages to the logbuf had any issues going on. > My current standpoint is that as long as points [A-D] are met, it > should do no harm to accept a (partial) fix like seen in my series: > > - [A] the patch tackles at least a subset of problematic use-cases > - [B] the fix is non-intrusive and easy to review > - [C] there is hope to reuse it in the new lockless buffer based printk > - [D] there are no regressions employing the major console knobs > (ignore_loglevel, quiet, loglevel, etc) as it happened in > a6ae928c25835c ("Revert "printk: make sure to print log on console."") So the issue is that when we bump `console_loglevel' or `ignore_loglevel' we lift restrictions globally. For all CPUs, for all contexts which call printk(). This may have negative impact. Fuzzing is one example. It usually generates a lot of printk() noise, so lifting global printk() log_level restrictions does cause problems. I recall that fuzzing people really disliked the old_level = console_loglevel console_loglevel = new_level .... console_loglevel = old_level approach. Because if lets all CPUs and tasks to pollute serial logs. While what we want is to print messages from a particular CPU/task. So _maybe_ we can explore the per-context printk() flags instead. The idea is to store a "desired"/emergency ->log_level. When a context sets that ->log_level we need to make any printk() messages issued by that context at least of that ->log_level. IOW we overwrite messages log_levels, because printk() is not synchronous by design. We had similar discussion before [0]. I was misunderstood back then and in order to avoid the same misunderstanding this time around, let me first state that - *I DO NOT* propose to disable preemption for printk(). The code in question was *just a POC snippet*. To move that POC further - we can have the same per-context printk() fields in every task_struct struct. Ideally without consuming additional memory, or consuming as few extra bytes per task_struct as possible. But even if we will have ->log_level embedded into every task_struct, we still need to distinguish various cases (contexts): - when current task wants to declare printk() emergency and sets current->log_level, only this process is in printk() emergency. - IRQs and NMIs which interrupt current should not be in printk() emergency, unless IRQ and/or NMI explicitly declare printk() emergency. - when IRQ declares printk() emergency and sets current->log_level, only this IRQ is in printk() emergency - when NMI declares printk() emergency and sets current->log_level, only this NMI is in printk() emergency Another thing is. Suppose current sets ->log_level, and then encounters a page_fault, which printk-s from exception handler. We shall distinguish such cases too, I think. So something like printk_emergency_enter(LOG_LEVEL) current->context[ctx] = LOG_LEVEL; ... printk() // overwrite message log_level to current->context[ctx] level dump_stack() // same log_level overwrite ... printk_emergency_exit() current->context[ctx] = -1 So when we iterate pending logbuf messages (CPU0->console_unlock() in the example from my first email) messages from CPU1 will have appropriate "desired" log_level and we will print those messages on the consoles. Well. Something like this. [0] https://lore.kernel.org/linuxppc-dev/20191112021747.GA68506@google.com/ -ss ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-19 6:48 ` Sergey Senozhatsky @ 2020-03-19 7:38 ` Valdis Klētnieks 2020-03-19 8:01 ` Sergey Senozhatsky 2020-05-19 14:43 ` Petr Mladek 1 sibling, 1 reply; 19+ messages in thread From: Valdis Klētnieks @ 2020-03-19 7:38 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Eugeniu Rosca, linux-kernel, John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca [-- Attachment #1: Type: text/plain, Size: 1220 bytes --] On Thu, 19 Mar 2020 15:48:36 +0900, Sergey Senozhatsky said: > So the issue is that when we bump `console_loglevel' or `ignore_loglevel' > we lift restrictions globally. For all CPUs, for all contexts which call > printk(). This may have negative impact. Fuzzing is one example. It > usually generates a lot of printk() noise, so lifting global printk() > log_level restrictions does cause problems. I recall that fuzzing people > really disliked the > old_level = console_loglevel > console_loglevel = new_level > .... > console_loglevel = old_level > > approach. Because if lets all CPUs and tasks to pollute serial logs. > While what we want is to print messages from a particular CPU/task. Well... how does this sound for a RFC idea? We already have CONFIG_PRINTK_TIME and CONFIG_PRINTK_CALLER. How about adding an option to allow printing the calling CPU as well, or just extend CALLER to print [pid/cpu] rather than just [pid]? (And yes, I know that the concept of "this CPU" can change quickly, which is why per_cpu data may not be accurate for anything more than statistical aggregation purposes. It's only worth pondering how to deal with that if it's worth developing the patch at all... :) [-- Attachment #2: Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-19 7:38 ` Valdis Klētnieks @ 2020-03-19 8:01 ` Sergey Senozhatsky 2020-03-19 8:18 ` Valdis Klētnieks 0 siblings, 1 reply; 19+ messages in thread From: Sergey Senozhatsky @ 2020-03-19 8:01 UTC (permalink / raw) To: Valdis Klētnieks Cc: Sergey Senozhatsky, Eugeniu Rosca, linux-kernel, John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On (20/03/19 03:38), Valdis Klētnieks wrote: > On Thu, 19 Mar 2020 15:48:36 +0900, Sergey Senozhatsky said: > > > So the issue is that when we bump `console_loglevel' or `ignore_loglevel' > > we lift restrictions globally. For all CPUs, for all contexts which call > > printk(). This may have negative impact. Fuzzing is one example. It > > usually generates a lot of printk() noise, so lifting global printk() > > log_level restrictions does cause problems. I recall that fuzzing people > > really disliked the > > old_level = console_loglevel > > console_loglevel = new_level > > .... > > console_loglevel = old_level > > > > approach. Because if lets all CPUs and tasks to pollute serial logs. > > While what we want is to print messages from a particular CPU/task. > > Well... how does this sound for a RFC idea? We already have CONFIG_PRINTK_TIME > and CONFIG_PRINTK_CALLER. How about adding an option to allow printing the > calling CPU as well, or just extend CALLER to print [pid/cpu] rather than just > [pid]? IIRC, CONFIG_PRINTK_CALLER prints pid when printk() caller is a running task, and CPU-id otherwise. It was added by fuzzing people, because it's almost (or "simply") impossible to get through serial logs sometimes. -ss ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-19 8:01 ` Sergey Senozhatsky @ 2020-03-19 8:18 ` Valdis Klētnieks 0 siblings, 0 replies; 19+ messages in thread From: Valdis Klētnieks @ 2020-03-19 8:18 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Eugeniu Rosca, linux-kernel, John Ogness, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca [-- Attachment #1: Type: text/plain, Size: 362 bytes --] On Thu, 19 Mar 2020 17:01:56 +0900, Sergey Senozhatsky said: > IIRC, CONFIG_PRINTK_CALLER prints pid when printk() caller is a > running task, and CPU-id otherwise. Ah... so it does. #ifdef CONFIG_PRINTK_CALLER u32 caller_id; /* thread id or processor id */ #endif I haven't seen it output a processor ID yet, so I didn't know that part... [-- Attachment #2: Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-19 6:48 ` Sergey Senozhatsky 2020-03-19 7:38 ` Valdis Klētnieks @ 2020-05-19 14:43 ` Petr Mladek 1 sibling, 0 replies; 19+ messages in thread From: Petr Mladek @ 2020-05-19 14:43 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Eugeniu Rosca, linux-kernel, John Ogness, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On Thu 2020-03-19 15:48:36, Sergey Senozhatsky wrote: > On (20/03/18 19:05), Eugeniu Rosca wrote: > > My current standpoint is that as long as points [A-D] are met, it > > should do no harm to accept a (partial) fix like seen in my series: > > > > - [A] the patch tackles at least a subset of problematic use-cases > > - [B] the fix is non-intrusive and easy to review > > - [C] there is hope to reuse it in the new lockless buffer based printk > > - [D] there are no regressions employing the major console knobs > > (ignore_loglevel, quiet, loglevel, etc) as it happened in > > a6ae928c25835c ("Revert "printk: make sure to print log on console."") > > So the issue is that when we bump `console_loglevel' or `ignore_loglevel' > we lift restrictions globally. For all CPUs, for all contexts which call > printk(). This may have negative impact. Another impact is that many more messages might suddenly appear on the console even though admins wanted them quiet because they were slow. The problem is to define what information is critical. In the ideal world, all messages are visible on the console so that developers could use them for debugging. The console loglevel is there to keep it working in the real world. IMHO, an acceptable solution would be: + Print a single critical message about that a lockup happened + Make it configurable which log level will get used for the details. Note that there is a pending patchset that allows to show stacks with a given loglevel, see https://lore.kernel.org/linux-riscv/20200418201944.482088-1-dima@arista.com/ Well, I am slightly afraid that it might open a can with hundred of printk-related options shuffling log level for various events. The upcoming printk kthread might help to handle even more messages on slow consoles. It might allow to increase the default loglevel in these situations. But the problem will still be there. The throughput will always be limited and different people have different opinion on what messages are important. Best Regards, Petr ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup 2020-03-18 18:05 ` Eugeniu Rosca 2020-03-19 6:48 ` Sergey Senozhatsky @ 2020-03-19 8:20 ` John Ogness 1 sibling, 0 replies; 19+ messages in thread From: John Ogness @ 2020-03-19 8:20 UTC (permalink / raw) To: Eugeniu Rosca Cc: Sergey Senozhatsky, linux-kernel, Petr Mladek, Sergey Senozhatsky, Steven Rostedt, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On 2020-03-18, Eugeniu Rosca <erosca@de.adit-jv.com> wrote: > On Tue, Mar 17, 2020 at 11:18:18AM +0900, Sergey Senozhatsky wrote: >> On (20/03/15 18:09), Eugeniu Rosca wrote: >>> @@ -428,6 +428,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) >>> } >>> } >>> >>> + console_verbose_start(); >>> + >>> pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n", >>> smp_processor_id(), duration, >>> current->comm, task_pid_nr(current)); >>> @@ -453,6 +455,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) >>> if (softlockup_panic) >>> panic("softlockup: hung tasks"); >>> __this_cpu_write(soft_watchdog_warn, true); >>> + >>> + console_verbose_end(); >>> } else >>> __this_cpu_write(soft_watchdog_warn, false); >> >> I'm afraid, as of now, this approach is not going to work the way it's >> supposed to work in 100% of cases. Because the only thing that printk() >> call sort of guarantees is that the message will be stored somewhere. >> Either in the main kernel log buffer, on in one of auxiliary per-CPU >> log buffers. It does not guarantee, generally speaking, that the message >> will be printed on the console immediately. > > I take this passage as an acknowledgement of the problem being _real_, > in spite of the fix being not perfect. > > One aspect I would like to emphasize is that (please, NAK this > statement if it's not accurate) the problem reported in this patch is > not specific to the existing printk mechanism, but also applies to the > upcoming kthread-based printk. If that's true, then IMHO this is a > compelling argument to join forces and try to find a working, safe and > future-proof solution. Let me clarify that the upcoming rework is _not_ simply to make a kthread-based printk. There upcoming rework addresses several issues (kthreads being only a piece): 1. allow printk-callers to get their messages immediately and locklessly into the log buffer from any context 2. during normal operation, printk-callers are completely decoupled from console printing 3. in the case of an emergency, every printk-caller will directly and synchronously perform console printing of its messages on consoles that support atomic writing For the rework we decided that triggering an oops is what irreversibly transitions the system from "normal operation" to "emergency". One could interpret the current "console_verbose()" to be some sort of equivalent to irreversibly transitioning the system to "emergency". But that is not how it was decided to be interpreted. For the rework, printk-callers do not have any influence on forcing immediate console printing (unless they trigger an oops). console_verbose() is still relevant in the rework. console_verbose() is affecting _what_ is printed to consoles and _not when_. Once the printk rework is complete, the mechanisms for atomic and synchronous console printing will be in place. It would be possible that these mechanisms could also be used in non-oops scenarios. But that would need to be discussed in depth and clearly defined with caution for abuse. John Ogness ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios 2020-03-15 17:09 [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Eugeniu Rosca ` (2 preceding siblings ...) 2020-03-15 17:09 ` [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup Eugeniu Rosca @ 2020-03-16 18:35 ` Steven Rostedt 2020-03-16 19:09 ` Sebastian Andrzej Siewior 3 siblings, 1 reply; 19+ messages in thread From: Steven Rostedt @ 2020-03-16 18:35 UTC (permalink / raw) To: Eugeniu Rosca Cc: linux-kernel, Petr Mladek, Sergey Senozhatsky, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Sebastian Andrzej Siewior, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On Sun, 15 Mar 2020 18:09:00 +0100 Eugeniu Rosca <roscaeugeniu@gmail.com> wrote: > Dear community, > > The motivation behind this seris is to save days/weeks, if not months, > of debugging efforts for users who: > > * experience an issue like softlockup/hardlockup/hung task/oom, whose > reproduction is not clear and whose occurrence rate is very low > * are constrained to use a low loglevel value (1,2,3) in production > * mostly rely on console logs to debug the issue post-mortem > (e.g. saved to persistent storage via e.g. pstore) > > As pointed out in the last patch from this series, under the above > circumstances, users might simply lack any relevant logs during > post-mortem analysis. > > Why this series is marked as RFC is because: > * There are several possible approaches to turn console verbosity on > and off. Current series employs the 'ignore_loglevel' functionality, > but an alternative way is to use the 'console_loglevel' variable. The > latter is more intrusive, hence the former has been chosen as v1. > * Manipulating 'ignore_loglevel' might be seen as an abuse, especially > because it breaks the expectation of users who assume the system to > be dead silent after passing loglevel=0 on kernel command line. > > Thank you for your comments! I don't see any issues with this patch set. What do others think? Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [ Note, I only acked, and did not give a deep review of it ] -- Steve > > Eugeniu Rosca (3): > printk: convert ignore_loglevel to atomic_t > printk: add console_verbose_{start,end} > watchdog: Turn console verbosity on when reporting softlockup > > include/linux/printk.h | 10 ++++++++++ > kernel/printk/printk.c | 30 ++++++++++++++++++++++++++---- > kernel/watchdog.c | 4 ++++ > 3 files changed, 40 insertions(+), 4 deletions(-) > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios 2020-03-16 18:35 ` [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Steven Rostedt @ 2020-03-16 19:09 ` Sebastian Andrzej Siewior 2020-03-18 14:56 ` Eugeniu Rosca 0 siblings, 1 reply; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-16 19:09 UTC (permalink / raw) To: Steven Rostedt Cc: Eugeniu Rosca, linux-kernel, Petr Mladek, Sergey Senozhatsky, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On 2020-03-16 14:35:17 [-0400], Steven Rostedt wrote: > I don't see any issues with this patch set. What do others think? > > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > > [ Note, I only acked, and did not give a deep review of it ] What is the state of the other larger printk rework? If this does not solve any -stable related issues then it will be replaced? Sebastian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios 2020-03-16 19:09 ` Sebastian Andrzej Siewior @ 2020-03-18 14:56 ` Eugeniu Rosca 2020-03-18 15:40 ` John Ogness 0 siblings, 1 reply; 19+ messages in thread From: Eugeniu Rosca @ 2020-03-18 14:56 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, John Ogness, Thomas Gleixner, linux-kernel, Petr Mladek, Sergey Senozhatsky, Ingo Molnar, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca, Eugeniu Rosca Hi Sebastian, Cc: John Ogness On Mon, Mar 16, 2020 at 08:09:48PM +0100, Sebastian Andrzej Siewior wrote: > On 2020-03-16 14:35:17 [-0400], Steven Rostedt wrote: > > I don't see any issues with this patch set. What do others think? > > > > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> > > > > [ Note, I only acked, and did not give a deep review of it ] > > What is the state of the other larger printk rework? If this does not > solve any -stable related issues then it will be replaced? Is this a question to John and his most recent series in https://lore.kernel.org/lkml/20200128161948.8524-1-john.ogness@linutronix.de/ ? Is there any upstream agreement to already keep the current printk mechanism away from any updates? -- Best Regards Eugeniu Rosca ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios 2020-03-18 14:56 ` Eugeniu Rosca @ 2020-03-18 15:40 ` John Ogness 0 siblings, 0 replies; 19+ messages in thread From: John Ogness @ 2020-03-18 15:40 UTC (permalink / raw) To: Eugeniu Rosca Cc: Sebastian Andrzej Siewior, Steven Rostedt, Thomas Gleixner, linux-kernel, Petr Mladek, Sergey Senozhatsky, Ingo Molnar, Peter Zijlstra, Jisheng Zhang, Valdis Kletnieks, Andrew Gabbasov, Dirk Behme, Eugeniu Rosca On 2020-03-18, Eugeniu Rosca <erosca@de.adit-jv.com> wrote: > On Mon, Mar 16, 2020 at 08:09:48PM +0100, Sebastian Andrzej Siewior wrote: >> On 2020-03-16 14:35:17 [-0400], Steven Rostedt wrote: >> > I don't see any issues with this patch set. What do others think? >> > >> > Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> >> > >> > [ Note, I only acked, and did not give a deep review of it ] >> >> What is the state of the other larger printk rework? If this does not >> solve any -stable related issues then it will be replaced? > > Is this a question to John and his most recent series in > https://lore.kernel.org/lkml/20200128161948.8524-1-john.ogness@linutronix.de/ > ? > > Is there any upstream agreement to already keep the current printk > mechanism away from any updates? No. Fixes are welcome! The only thing we are trying to avoid at this stage is massive refactoring/cleanup work. Your series does not fit into that category. As to my opinion on this series, assuming it is acceptable for the maintainers, I would like to see console_verbose() become an alias to console_verbose_start(). Then @ignore_loglevel would be used for both and CONSOLE_LOGLEVEL_MOTORMOUTH could be removed. John Ogness ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2020-05-19 14:44 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-15 17:09 [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Eugeniu Rosca
2020-03-15 17:09 ` [RFC PATCH 1/3] printk: convert ignore_loglevel to atomic_t Eugeniu Rosca
2020-03-15 17:09 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} Eugeniu Rosca
2020-03-16 21:19 ` [RFC PATCH 2/3] printk: add console_verbose_{start, end} kbuild test robot
2020-03-16 21:30 ` kbuild test robot
2020-03-18 16:00 ` [RFC PATCH 2/3] printk: add console_verbose_{start,end} John Ogness
2020-03-15 17:09 ` [RFC PATCH 3/3] watchdog: Turn console verbosity on when reporting softlockup Eugeniu Rosca
2020-03-17 2:18 ` Sergey Senozhatsky
2020-03-18 18:05 ` Eugeniu Rosca
2020-03-19 6:48 ` Sergey Senozhatsky
2020-03-19 7:38 ` Valdis Klētnieks
2020-03-19 8:01 ` Sergey Senozhatsky
2020-03-19 8:18 ` Valdis Klētnieks
2020-05-19 14:43 ` Petr Mladek
2020-03-19 8:20 ` John Ogness
2020-03-16 18:35 ` [RFC PATCH 0/3] Fix quiet console in pre-panic scenarios Steven Rostedt
2020-03-16 19:09 ` Sebastian Andrzej Siewior
2020-03-18 14:56 ` Eugeniu Rosca
2020-03-18 15:40 ` John Ogness
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.