* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 [not found] ` <20160223155331.GA448@swordfish> @ 2016-02-24 1:19 ` Sergey Senozhatsky 2016-02-24 11:46 ` Petr Mladek 0 siblings, 1 reply; 7+ messages in thread From: Sergey Senozhatsky @ 2016-02-24 1:19 UTC (permalink / raw) To: kernel test robot Cc: lkp, Rob Herring, Frank Rowand, Grant Likely, devicetree, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Petr Mladek, Stephen Rothwell, Sergey Senozhatsky, Sergey Senozhatsky Hello, Cc Rob, Frank, Grant On (02/24/16 00:53), Sergey Senozhatsky wrote: [..] > 284 [ 0.000000] per task-struct memory footprint: 2112 bytes > 285 [ 0.000000] per task-struct memory footprint: 2112 bytes > 286 [ 0.000000] ------------------------ > 287 [ 0.000000] ------------------------ > 288 [ 0.000000] | Locking API testsuite: > 289 [ 0.000000] | Locking API testsuite: > 290 [ 0.000000] ---------------------------------------------------------------------------- > 291 [ 0.000000] ---------------------------------------------------------------------------- > 292 [ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem | > 293 [ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem | > 294 [ 0.000000] -------------------------------------------------------------------------- > 295 [ 0.000000] -------------------------------------------------------------------------- > > > looking at your Kernel command line > > [ 0.000000] Kernel command line: root=/dev/ram0 user=lkp job=/lkp/scheduled/vm-kbuild-yocto-i386-53/bisect_boot-1-yocto-minimal-i386.cgz-i386-randconfig-h1-02192137-34578dc67f38c02ccbe696e4099967884caa8e15-20160220-72722-vao2m5-0.yaml ARCH=i386 kconfig=i386-randconfig-h1-02192137 branch=linux-next/master commi t=34578dc67f38c02ccbe696e4099967884caa8e15 BOOT_IMAGE=/pkg/linux/i386-randconfig-h1-02192137/gcc-5/34578dc67f38c02ccbe696e4099967884caa8e15/vmlinuz-4.5.0-rc4-00295-g34578dc max_uptime=600 RESULT_ROOT=/result/boot/1/vm-kbuild-yocto-i386/yocto-minimal-i386.cgz/i386-randconfig-h1-02192137/gcc-5/34578dc67f38c02ccbe69 6e4099967884caa8e15/9 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 pan ic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw ip=::::vm-kbuild-yo cto-i386-53::dhcp drbd.minor_count=8 > > > - earlyprintk=ttyS0,115200 > - console=ttyS0,115200 > - console=tty0 > > and I see "bootconsole [earlyser0] enabled" but no "bootconsole [earlyser0] disabled" message, which > I'd expect to see... > and you get the NMI watchdog softlockup because you have a whole bunch of "of_overlay_destroy: Could not find overlay #6" "### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6" messages to print. seems that somehitng just pushes them in a loop. there are too many of them: ** 16981217 printk messages dropped ** [ 33.495591] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 ** 16981217 printk messages dropped ** [ 33.495591] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.495593] of_overlay_destroy: Could not find overlay #6 [ 33.495593] of_overlay_destroy: Could not find overlay #6 ** 16981217 printk messages dropped ** [ 33.497583] of_overlay_destroy: Could not find overlay #6 [ 33.497583] of_overlay_destroy: Could not find overlay #6 [ 33.497584] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497584] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497586] of_overlay_destroy: Could not find overlay #6 [ 33.497586] of_overlay_destroy: Could not find overlay #6 [ 33.497587] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497587] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497589] of_overlay_destroy: Could not find overlay #6 [ 33.497589] of_overlay_destroy: Could not find overlay #6 [ 33.497589] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497589] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497591] of_overlay_destroy: Could not find overlay #6 [ 33.497591] of_overlay_destroy: Could not find overlay #6 [ 33.497592] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497592] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497594] of_overlay_destroy: Could not find overlay #6 [ 33.497594] of_overlay_destroy: Could not find overlay #6 [ 33.497595] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497595] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497596] of_overlay_destroy: Could not find overlay #6 [ 33.497596] of_overlay_destroy: Could not find overlay #6 [ 33.497597] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497597] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497599] of_overlay_destroy: Could not find overlay #6 [ 33.497599] of_overlay_destroy: Could not find overlay #6 [ 33.497600] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497600] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497601] of_overlay_destroy: Could not find overlay #6 [ 33.497601] of_overlay_destroy: Could not find overlay #6 [ 33.497602] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497602] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497604] of_overlay_destroy: Could not find overlay #6 [ 33.497604] of_overlay_destroy: Could not find overlay #6 [ 33.497605] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 [ 33.497605] ### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6 the source of "too many messages to printk" is this `while (1) loop' static void of_unittest_destroy_tracked_overlays(void) { int id, ret, defers; if (overlay_first_id < 0) return; /* try until no defers */ do { defers = 0; /* remove in reverse order */ for (id = MAX_UNITTEST_OVERLAYS - 1; id >= 0; id--) { if (!(overlay_id_bits[BIT_WORD(id)] & BIT_MASK(id))) continue; ret = of_overlay_destroy(id + overlay_first_id); if (ret != 0) { defers++; pr_warn("%s: overlay destroy failed for #%d\n", __func__, id + overlay_first_id); continue; } overlay_id_bits[BIT_WORD(id)] &= ~BIT_MASK(id); } } while (defers > 0); } I don't know what's the root cause of missing overlay id in idr, but in this particular case the loop transform into /* well, "defers" should overflow at some point, but I doubt we must count on it */ do { ret = of_overlay_destroy()->idr_find()->pr_err("Could not find overlay"), return -ENODEV if (ret != 0) { pr_warn("overlay destroy failed for"); continue; } } while (1); the "while (1) printk();" pattern is known to be dangerous; we need to fix printk(). so something like this perhaps. From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Subject: [PATCH] fix endless loop --- drivers/of/unittest.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c index 979b6e4..5058017 100644 --- a/drivers/of/unittest.c +++ b/drivers/of/unittest.c @@ -1165,6 +1165,12 @@ static void of_unittest_destroy_tracked_overlays(void) continue; ret = of_overlay_destroy(id + overlay_first_id); + if (ret == -ENODEV) { + pr_warn("%s: no overlay to destroy for #%d\n", + __func__, id + overlay_first_id); + continue; + } + if (ret != 0) { defers++; pr_warn("%s: overlay destroy failed for #%d\n", -- 2.7.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 2016-02-24 1:19 ` [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 Sergey Senozhatsky @ 2016-02-24 11:46 ` Petr Mladek [not found] ` <20160224114614.GY3305-KsEp0d+Q8qECVLCxKZUutA@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Petr Mladek @ 2016-02-24 11:46 UTC (permalink / raw) To: Sergey Senozhatsky Cc: kernel test robot, lkp, Rob Herring, Frank Rowand, Grant Likely, devicetree, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Stephen Rothwell, Sergey Senozhatsky On Wed 2016-02-24 10:19:41, Sergey Senozhatsky wrote: > > Hello, > > Cc Rob, Frank, Grant > > On (02/24/16 00:53), Sergey Senozhatsky wrote: > [..] > > 284 [ 0.000000] per task-struct memory footprint: 2112 bytes > > 285 [ 0.000000] per task-struct memory footprint: 2112 bytes > > 286 [ 0.000000] ------------------------ > > 287 [ 0.000000] ------------------------ > > 288 [ 0.000000] | Locking API testsuite: > > 289 [ 0.000000] | Locking API testsuite: > > 290 [ 0.000000] ---------------------------------------------------------------------------- > > 291 [ 0.000000] ---------------------------------------------------------------------------- > > 292 [ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem | > > 293 [ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem | > > 294 [ 0.000000] -------------------------------------------------------------------------- > > 295 [ 0.000000] -------------------------------------------------------------------------- > > > > > > looking at your Kernel command line > > > > [ 0.000000] Kernel command line: root=/dev/ram0 user=lkp job=/lkp/scheduled/vm-kbuild-yocto-i386-53/bisect_boot-1-yocto-minimal-i386.cgz-i386-randconfig-h1-02192137-34578dc67f38c02ccbe696e4099967884caa8e15-20160220-72722-vao2m5-0.yaml ARCH=i386 kconfig=i386-randconfig-h1-02192137 branch=linux-next/master commi t=34578dc67f38c02ccbe696e4099967884caa8e15 BOOT_IMAGE=/pkg/linux/i386-randconfig-h1-02192137/gcc-5/34578dc67f38c02ccbe696e4099967884caa8e15/vmlinuz-4.5.0-rc4-00295-g34578dc max_uptime=600 RESULT_ROOT=/result/boot/1/vm-kbuild-yocto-i386/yocto-minimal-i386.cgz/i386-randconfig-h1-02192137/gcc-5/34578dc67f38c02ccbe69 6e4099967884caa8e15/9 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 p anic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw ip=::::vm-kbuild-yo cto-i386-53::dhcp drbd.minor_count=8 > > > > > > - earlyprintk=ttyS0,115200 > > - console=ttyS0,115200 > > - console=tty0 > > > > and I see "bootconsole [earlyser0] enabled" but no "bootconsole [earlyser0] disabled" message, which > > I'd expect to see... > > > > > and you get the NMI watchdog softlockup because you have a whole bunch of > > "of_overlay_destroy: Could not find overlay #6" > "### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6" > > messages to print. seems that somehitng just pushes them in a loop. > there are too many of them: This sounds like a reasonable explanation. It seems that of_unittest_destroy_tracked_overlays() really ended in an infinite loop. But I am still curious why the softlookup points to [ 33.497718] EIP is at vprintk_emit+0x1ea/0x600 Also there is on the stack [ 33.497741] [<c068e712>] vprintk_default+0x32/0x40 [ 33.497741] [<c068e712>] vprintk_default+0x32/0x40 [ 33.497744] [<c06fdf6e>] printk+0x11/0x13 [ 33.497744] [<c06fdf6e>] printk+0x11/0x13 [ 33.497748] [<c0df5eec>] of_unittest_overlay+0x8d1/0x900 [ 33.497748] [<c0df5eec>] of_unittest_overlay+0x8d1/0x900 [ 33.497750] [<c0df6b1f>] of_unittest+0xc04/0xc2d [ 33.497750] [<c0df6b1f>] of_unittest+0xc04/0xc2d I would expect that the soft lookup happens in console_unlock() called with IRQs disabled. It seems to me that of_unittest_overlay() is called with IRQs enabled. I want to be sure that the patch in printk() did not introduce a deadlock that is visible only under a high printk load. I guess that the softlookup was not visible before the printk patch was applied. I wonder if the patch made something worse. It was supposed to improve things but... Do I miss something, please? Best Regards, Petr ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <20160224114614.GY3305-KsEp0d+Q8qECVLCxKZUutA@public.gmane.org>]
* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 [not found] ` <20160224114614.GY3305-KsEp0d+Q8qECVLCxKZUutA@public.gmane.org> @ 2016-02-24 12:50 ` Sergey Senozhatsky 2016-02-24 15:12 ` Sergey Senozhatsky 0 siblings, 1 reply; 7+ messages in thread From: Sergey Senozhatsky @ 2016-02-24 12:50 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, kernel test robot, lkp-JC7UmRfGjtg, Rob Herring, Frank Rowand, Grant Likely, devicetree-u79uwXL29TY76Z2rM5mHXA, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Stephen Rothwell, Sergey Senozhatsky Hello, On (02/24/16 12:46), Petr Mladek wrote: [..] > > and you get the NMI watchdog softlockup because you have a whole bunch of > > > > "of_overlay_destroy: Could not find overlay #6" > > "### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6" > > > > messages to print. seems that somehitng just pushes them in a loop. > > there are too many of them: > > This sounds like a reasonable explanation. It seems that > of_unittest_destroy_tracked_overlays() really ended in an infinite > loop. > > But I am still curious why the softlookup points to > > [ 33.497718] EIP is at vprintk_emit+0x1ea/0x600 > > Also there is on the stack > > [ 33.497741] [<c068e712>] vprintk_default+0x32/0x40 > [ 33.497741] [<c068e712>] vprintk_default+0x32/0x40 > [ 33.497744] [<c06fdf6e>] printk+0x11/0x13 > [ 33.497744] [<c06fdf6e>] printk+0x11/0x13 > [ 33.497748] [<c0df5eec>] of_unittest_overlay+0x8d1/0x900 > [ 33.497748] [<c0df5eec>] of_unittest_overlay+0x8d1/0x900 > [ 33.497750] [<c0df6b1f>] of_unittest+0xc04/0xc2d > [ 33.497750] [<c0df6b1f>] of_unittest+0xc04/0xc2d > > I would expect that the soft lookup happens in console_unlock() > called with IRQs disabled. It seems to me that of_unittest_overlay() > is called with IRQs enabled. watchdog has two different parts: hrtimer part (via IRQ) checks the touch_ts and wakeups the updater smpboot kthread; and updater kthread, that updates touch_ts. to get a lockup you can just keep the preemption disabled for 20+ seconds, so updater kthread will not update touch_ts. the next hrtimer irq will detect lockup. and the .config we have is CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set that can add up to the issue. (well, at least watchdog was working this way the last time I saw it). > I want to be sure that the patch in printk() did not introduce > a deadlock that is visible only under a high printk load. sure. I did additional intensive tests today, and saw no issues. schematically, something like this: u64 start = local_lock() >> 31UL: { preempt_disable(), local_irq_save() } while (1) { u64 now = local_clock() >> 31UL; if (now - start > TIMEOUT) goto out; pr_err(">>>>>\n"); } out: { preempt_enable(), local_irq_restore() } > I guess that the softlookup was not visible before the > printk patch was applied. I wonder if the patch made something > worse. It was supposed to improve things but... I think the softlockup under the given conditions (endless printk and no preemption) will show up regardless the patches being applied; but I see you point, of course. -ss -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 2016-02-24 12:50 ` Sergey Senozhatsky @ 2016-02-24 15:12 ` Sergey Senozhatsky 2016-02-25 5:10 ` Sergey Senozhatsky 0 siblings, 1 reply; 7+ messages in thread From: Sergey Senozhatsky @ 2016-02-24 15:12 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Petr Mladek, Sergey Senozhatsky, kernel test robot, lkp, Rob Herring, Frank Rowand, Grant Likely, devicetree, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Stephen Rothwell On (02/24/16 21:50), Sergey Senozhatsky wrote: > On (02/24/16 12:46), Petr Mladek wrote: > [..] > > > and you get the NMI watchdog softlockup because you have a whole bunch of > > > > > > "of_overlay_destroy: Could not find overlay #6" > > > "### dt-test ### of_unittest_destroy_tracked_overlays: overlay destroy failed for #6" > > > > > > messages to print. seems that somehitng just pushes them in a loop. > > > there are too many of them: > > > > This sounds like a reasonable explanation. It seems that > > of_unittest_destroy_tracked_overlays() really ended in an infinite > > loop. > > > > But I am still curious why the softlookup points to > > > > [ 33.497718] EIP is at vprintk_emit+0x1ea/0x600 > > > > Also there is on the stack > > > > [ 33.497741] [<c068e712>] vprintk_default+0x32/0x40 > > [ 33.497741] [<c068e712>] vprintk_default+0x32/0x40 > > [ 33.497744] [<c06fdf6e>] printk+0x11/0x13 > > [ 33.497744] [<c06fdf6e>] printk+0x11/0x13 > > [ 33.497748] [<c0df5eec>] of_unittest_overlay+0x8d1/0x900 > > [ 33.497748] [<c0df5eec>] of_unittest_overlay+0x8d1/0x900 > > [ 33.497750] [<c0df6b1f>] of_unittest+0xc04/0xc2d > > [ 33.497750] [<c0df6b1f>] of_unittest+0xc04/0xc2d > > > > I would expect that the soft lookup happens in console_unlock() > > called with IRQs disabled. It seems to me that of_unittest_overlay() > > is called with IRQs enabled. hm... both of the logbuf_lock/irq reqions in vprintk_emit/console_unlock are not modified by the patch set. there is, however, one thing that has changed -- additional console_cont_flush() calls, which does spin_lock_irq logbuf_lock and spin_unlock_irq logbuf_lock. > > I want to be sure that the patch in printk() did not introduce > > a deadlock that is visible only under a high printk load. I'll do more tests, certainly. -ss ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 2016-02-24 15:12 ` Sergey Senozhatsky @ 2016-02-25 5:10 ` Sergey Senozhatsky 2016-03-02 10:30 ` Petr Mladek 0 siblings, 1 reply; 7+ messages in thread From: Sergey Senozhatsky @ 2016-02-25 5:10 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, kernel test robot, lkp-JC7UmRfGjtg, Rob Herring, Frank Rowand, Grant Likely, devicetree-u79uwXL29TY76Z2rM5mHXA, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Stephen Rothwell Hello, On (02/25/16 00:12), Sergey Senozhatsky wrote: [..] > > > I want to be sure that the patch in printk() did not introduce > > > a deadlock that is visible only under a high printk load. > > I'll do more tests, certainly. No behavioral change between linux-next and linux-next modulo printk patches was observed. next-20160224 git revert 8a30bf8ff5e07a589ad43363b06ea361ad723a0e 2822460a0ea97656ae71b2b4ddac087f79c5876a b6751672d3ac1d3d876fd5ea7ec579956eee4d6b set PREEMPT_NONE CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set CONFIG_PREEMPT_COUNT=y add PANIC of softlockup and panic on hung_task CONFIG_LOCKUP_DETECTOR=y CONFIG_HARDLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1 CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y << CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1 CONFIG_DETECT_HUNG_TASK=y CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=11 CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y << CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=1 CONFIG_WQ_WATCHDOG=y CONFIG_PANIC_ON_OOPS=y CONFIG_PANIC_ON_OOPS_VALUE=1 CONFIG_PANIC_TIMEOUT=23 modify watchdog timeout (so it detects lockup sooner) and add printk() flood -- the code does not even disable preemption, just busy looping, we have PREEMPT_NONE kernel. git diff diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index 46055db..8759dbb 100644 --- a/drivers/block/zram/zram_drv.c +++ b/drivers/block/zram/zram_drv.c @@ -1410,6 +1410,21 @@ static int __init zram_init(void) { int ret; + /*unsigned long flags;*/ + u64 s, e; + + /*preempt_disable();*/ + s = local_clock() >> 31UL; + while (1) { + e = local_clock() >> 31UL; + if (e - s > 10) + goto out; + pr_err(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n"); + } +out: + /*preempt_enable();*/ + + ret = class_register(&zram_control_class); if (ret) { pr_err("Unable to register zram-control class\n"); diff --git a/kernel/watchdog.c b/kernel/watchdog.c index b3ace6e..8655364 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -54,7 +54,7 @@ static unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED; int __read_mostly nmi_watchdog_enabled; int __read_mostly soft_watchdog_enabled; int __read_mostly watchdog_user_enabled; -int __read_mostly watchdog_thresh = 10; +int __read_mostly watchdog_thresh = 1; #ifdef CONFIG_SMP int __read_mostly sysctl_softlockup_all_cpu_backtrace; --- The test (unpatched linux-next) NMI watchdog: BUG: soft lockup - CPU#2 stuck for 3s! [modprobe:418] Modules linked in: ... irq event stamp: 40679 hardirqs last enabled at (40679): [<ffffffff8108b771>] vprintk_emit+0x3f0/0x499 hardirqs last disabled at (40678): [<ffffffff8108b3d4>] vprintk_emit+0x53/0x499 softirqs last enabled at (35360): [<ffffffff810475e9>] __do_softirq+0x2d8 softirqs last disabled at (35357): [<ffffffff810478ce>] irq_exit+0x41/0x95 CPU: 2 PID: 418 Comm: modprobe Tainted: G O 4.5.0-rc5-next-20160224-dbg-00012-gfedb791-dirty #187 Hardware name: ... task: ffff88013290ce00 ti: ffff880131f98000 task.ti: ffff880131f98000 RIP: 0010:[<ffffffff8108b2d8>] [<ffffffff8108b2d8>] console_unlock+0x3ad/0x456 RSP: 0018:ffff880131f9bbc8 EFLAGS: 00000246 RAX: ffff88013290ce00 RBX: ffffffff82891fdc RCX: 00000000ffffffff RDX: ffffffff812c99a8 RSI: 0000000000000001 RDI: ffffffff8108b2d5 RBP: ffff880131f9bc08 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffff00ff R11: 000000000000042b R12: 0000000000000000 R13: 0000000000000246 R14: 0000000000000003 R15: 0000000000000001 S: 00007fe6ed4fd700(0000) GS:ffff880137d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe6eb905000 CR3: 00000000bb7fc000 CR4: 00000000000006e0 Stack: 0000000000000000 0000000000000029 000000000001a128 00000000ffffffff 0000000000000003 0000000000000028 ffffffff82877602 0000000000000028 ffff880131f9bc60 ffffffff8108b7cb 0000000000000000 0000000000000000 Call Trace: [<ffffffff8108b7cb>] vprintk_emit+0x44a/0x499 [<ffffffffa01f0000>] ? 0xffffffffa01f0000 [<ffffffff8108b931>] vprintk_default+0x1d/0x1f [<ffffffff810f1b21>] printk+0x48/0x50 [<ffffffffa01f0032>] zram_init+0x32/0x1000 [zram] [<ffffffff81000402>] do_one_initcall+0xe7/0x16 [<ffffffff810f1d5f>] ? do_init_module+0x27/0x1d9 [<ffffffff81093619>] ? rcu_read_lock_sched_held+0x61/0x69 [<ffffffff811330fa>] ? kmem_cache_alloc_trace+0x138/0x1d0 [<ffffffff810f1d5f>] ? do_init_module+0x27/0x1d9 [<ffffffff810f1d97>] do_init_module+0x5f/0x1d9 [<ffffffff810b2b2c>] load_module+0x19da/0x1eb8 [<ffffffff810830c4>] ? lock_acquire+0x10f/0x1a3 [<ffffffff810b3109>] SyS_init_module+0xff/0x125 [<ffffffff810b3109>] ? SyS_init_module+0xff/0x125 [<ffffffff8150e525>] entry_SYSCALL_64_fastpath+0x18/0xa8 Code: 89 05 11 cc 80 01 4c 89 35 0e cc 80 01 88 d0 83 e2 fe 83 e0 1e 88 53 0f 0f b6 c0 89 05 f2 cb 80 01 eb 9e e8 9e 5a ff ff 41 55 9d <e9> ed fe ff ff 48 83 3d 3b cd 80 01 00 c7 05 3d cd 80 01 00 00 Kernel panic - not syncing: softlockup: hung tasks CPU: 2 PID: 418 Comm: modprobe Tainted: G O L 4.5.0-rc5-next-20160224-dbg-00012-gfedb791-dirty #187 Hardware name: ... 0000000000000000 ffff880137d03d68 ffffffff8123334d ffffffff817ad7c7 ffff880137d03df8 ffff880137d03de8 ffffffff810f124e ffffffff00000008 ffff880137d03df8 ffff880137d03d90 0000000000000000 00000000ffffffff Call Trace: <IRQ> [<ffffffff8123334d>] dump_stack+0x67/0x90 [<ffffffff810f124e>] panic+0xd1/0x212 [<ffffffff8107d1c1>] ? down_trylock+0x2d/0x37 [<ffffffff810c12c5>] watchdog_timer_fn+0x18e/0x1b0 [<ffffffff8109cee9>] __hrtimer_run_queues+0x1ee/0x418 [<ffffffff810c1137>] ? __touch_watchdog+0x1c/0x1c [<ffffffff8109d2af>] hrtimer_interrupt+0xa8/0x192 [<ffffffff810354cd>] hpet_interrupt_handler+0x2a/0x31 [<ffffffff8108ccd8>] handle_irq_event_percpu+0xd2/0x2e4 [<ffffffff8108cf22>] handle_irq_event+0x38/0x56 [<ffffffff8108fc63>] handle_edge_irq+0xd9/0xfb [<ffffffff810174ad>] handle_irq+0x1c/0x20 [<ffffffff81510795>] do_IRQ+0x85/0x1 [<ffffffff8150ee86>] common_interrupt+0x86/0x86 <EOI> [<ffffffff812c99a8>] ? vt_console_print+0x2f9/0x308 [<ffffffff8108b2d5>] ? console_unlock+0x3aa/0x456 [<ffffffff8108b2d8>] ? console_unlock+0x3ad/0x456 [<ffffffff8108b7cb>] vprintk_emit+0x44a/0x499 [<ffffffffa01f0000>] ? 0xffffffffa01f0000 [<ffffffff8108b931>] vprintk_default+0x1d/0x1f [<ffffffff810f1b21>] printk+0x48/0x50 [<ffffffffa01f0032>] zram_init+0x32/0x1000 [zram] [<ffffffff81000402>] do_one_initcall+0xe7/0x16c [<ffffffff810f1d5f>] ? do_init_module+0x27/0x1d9 [<ffffffff81093619>] ? rcu_read_lock_sched_held+0x61/0x69 [<ffffffff811330fa>] ? kmem_cache_alloc_trace+0x138/0x1d0 [<ffffffff810f1d5f>] ? do_init_module+0x27/0x1d9 [<ffffffff810f1d97>] do_init_module+0x5f/0x1d9 [<ffffffff810b2b2c>] load_module+0x19da/0x1eb8 [<ffffffff810830c4>] ? lock_acquire+0x10f/0x1a3 [<ffffffff810b3109>] SyS_init_module+0xff/0x125 [<ffffffff810b3109>] ? SyS_init_module+0xff/0x125 [<ffffffff8150e525>] entry_SYSCALL_64_fastpath+0x18/0xa8 so in this test printk()->console_unlock() was performed on the same CPU. but in general those can be different CPUs. one busy spinning doing printk(), the other busy spinning doing console_unlock()->call_console_drivers(). -ss -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 2016-02-25 5:10 ` Sergey Senozhatsky @ 2016-03-02 10:30 ` Petr Mladek 2016-03-02 10:41 ` Sergey Senozhatsky 0 siblings, 1 reply; 7+ messages in thread From: Petr Mladek @ 2016-03-02 10:30 UTC (permalink / raw) To: Sergey Senozhatsky Cc: Sergey Senozhatsky, kernel test robot, lkp, Rob Herring, Frank Rowand, Grant Likely, devicetree, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Stephen Rothwell On Thu 2016-02-25 14:10:05, Sergey Senozhatsky wrote: > CONFIG_PREEMPT_NONE=y > # CONFIG_PREEMPT_VOLUNTARY is not set > # CONFIG_PREEMPT is not set > CONFIG_PREEMPT_COUNT=y I was curious why your patch actually did not help to avoid the softlockup. The infinite printk loop was called in a safe context, CONFIG_PREEMPT_COUNT=y, so it did set console_may_schedule = 1. But it never got console_sem, never called console_unlock() and never called cond_resched() there. So, it never got preempted in the PREEMPT_NONE kernel. The console_sem was owned by another process that was busy handling the flood of messages. Note that the infinite cycle calling printk() might be interrupted anywhere by the NMI watchdog. It explains why the original report pointed at the beginning of the printk. It was not a deadlock. I got the softlockup on random locations here. Also it is not the classic softlockup in console_unlock(). The process handling the console actually could sleep if it has console_may_schedule = 1. This is why console_unlock() did not appeared in the softlockup backtrace. Conclusion: Sergey is correct and his patches are innocent here. Sergey, if you send the fix for of_unittest_destroy_tracked_overlays(), please add me into CC. Best Regards, Petr PS: I am sorry for the noise and that it took me so long to shake my head around this problem. Thanks a lot for patience. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 2016-03-02 10:30 ` Petr Mladek @ 2016-03-02 10:41 ` Sergey Senozhatsky 0 siblings, 0 replies; 7+ messages in thread From: Sergey Senozhatsky @ 2016-03-02 10:41 UTC (permalink / raw) To: Petr Mladek Cc: Sergey Senozhatsky, Sergey Senozhatsky, kernel test robot, lkp, Rob Herring, Frank Rowand, Grant Likely, devicetree, LKML, Andrew Morton, Calvin Owens, Dave Jones, Kyle McMartin, Tejun Heo, Jan Kara, Stephen Rothwell On (03/02/16 11:30), Petr Mladek wrote: > On Thu 2016-02-25 14:10:05, Sergey Senozhatsky wrote: > > CONFIG_PREEMPT_NONE=y > > # CONFIG_PREEMPT_VOLUNTARY is not set > > # CONFIG_PREEMPT is not set > > CONFIG_PREEMPT_COUNT=y > > I was curious why your patch actually did not help to avoid the > softlockup. The infinite printk loop was called in a safe context, > CONFIG_PREEMPT_COUNT=y, so it did set console_may_schedule = 1. > > But it never got console_sem, never called console_unlock() > and never called cond_resched() there. So, it never got preempted > in the PREEMPT_NONE kernel. The console_sem was owned by > another process that was busy handling the flood of messages. > > Note that the infinite cycle calling printk() might be interrupted anywhere > by the NMI watchdog. It explains why the original report pointed > at the beginning of the printk. It was not a deadlock. > I got the softlockup on random locations here. > > Also it is not the classic softlockup in console_unlock(). > The process handling the console actually could sleep if it > has console_may_schedule = 1. This is why console_unlock() > did not appeared in the softlockup backtrace. > > > Conclusion: > > Sergey is correct and his patches are innocent here. thank you, Petr! > Sergey, if you send the fix for > of_unittest_destroy_tracked_overlays(), please add me into CC. of course, will try to send it out tonight. > PS: I am sorry for the noise and that it took me so long to shake my > head around this problem. Thanks a lot for patience. no problem at all. thank you. the more we double check it - the better. -ss ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-03-02 10:41 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <87fuwk1c0o.fsf@yhuang-dev.intel.com> [not found] ` <20160223131538.GA2048@swordfish> [not found] ` <20160223155331.GA448@swordfish> 2016-02-24 1:19 ` [lkp] [printk] 34578dc67f: EIP is at vprintk_emit+0x1ea/0x600 Sergey Senozhatsky 2016-02-24 11:46 ` Petr Mladek [not found] ` <20160224114614.GY3305-KsEp0d+Q8qECVLCxKZUutA@public.gmane.org> 2016-02-24 12:50 ` Sergey Senozhatsky 2016-02-24 15:12 ` Sergey Senozhatsky 2016-02-25 5:10 ` Sergey Senozhatsky 2016-03-02 10:30 ` Petr Mladek 2016-03-02 10:41 ` Sergey Senozhatsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).