* upstream boot error: BUG: soft lockup in __do_softirq
@ 2020-07-31  6:44 syzbot
  2020-07-31  6:50 ` Dmitry Vyukov
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2020-07-31  6:44 UTC
  To: bp, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    92ed3019 Linux 5.8-rc7
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
compiler:       gcc (GCC) 10.1.0-syz 20200507

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8472ea265fe32cc3bf78@syzkaller.appspotmail.com

hrtimer: interrupt took 42698779 ns
random: crng init done
watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]
Modules linked in:
hardirqs last  enabled at (2780): [<ffffffff88200204>] __do_softirq+0x204/0xa60 kernel/softirq.c:276
hardirqs last disabled at (2781): [<ffffffff87e5b2ed>] idtentry_enter_cond_rcu+0x1d/0x50 arch/x86/entry/common.c:649
softirqs last  enabled at (2760): [<ffffffff88200748>] __do_softirq+0x748/0xa60 kernel/softirq.c:319
softirqs last disabled at (2779): [<ffffffff88000f0f>] asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:711
CPU: 3 PID: 4749 Comm: grep Not tainted 5.8.0-rc7-syzkaller #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:__do_softirq+0x22f/0xa60 kernel/softirq.c:278
Code: c7 c0 98 e0 b4 89 48 c1 e8 03 42 80 3c 30 00 0f 85 70 07 00 00 48 83 3d 76 de 94 01 00 0f 84 4a 06 00 00 fb 66 0f 1f 44 00 00 <48> c7 44 24 08 c0 90 a0 89 b8 ff ff ff ff 0f bc 04 24 83 c0 01 89
RSP: 0000:ffffc90000598f70 EFLAGS: 00000286
RAX: 1ffffffff1369c13 RBX: ffff8880294683c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff88200204
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff88000c3a
R13: 0000000000000000 R14: dffffc0000000000 R15: 0000000000000000
FS:  00007fe02dd23700(0000) GS:ffff88802d100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000f5d1e8 CR3: 000000001efc6000 CR4: 0000000000340ee0
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 </IRQ>
 invoke_softirq kernel/softirq.c:387 [inline]
 __irq_exit_rcu kernel/softirq.c:417 [inline]
 irq_exit_rcu+0x229/0x270 kernel/softirq.c:429
 sysvec_apic_timer_interrupt+0x54/0x120 arch/x86/kernel/apic/apic.c:1091
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:585
Code: ba 63 03 00 00 e8 f5 bf 00 00 0f 1f 44 00 00 48 89 5c 24 e0 48 89 6c 24 e8 48 89 fb 4c 89 64 24 f0 4c 89 6c 24 f8 48 83 ec 38 <0f> b6 47 04 83 e0 0f 83 f8 06 0f 85 3e 01 00 00 31 d2 66 83 7b 06
RSP: 002b:00007ffd13103ee0 EFLAGS: 00010202
RAX: 00000000000003b7 RBX: 00007fe02d584588 RCX: 0000000000000000
RDX: 00007fe02d57d94c RSI: 0000000000000edc RDI: 00007fe02d584588
RBP: 00007fe02dd2aef8 R08: 00000000004282a7 R09: 0000000000000004
R10: 00007ffd13103f70 R11: 00007ffd13103f70 R12: 0000000000000004
R13: 0000000010a0a9c4 R14: 0000000000000002 R15: 00007ffd131040f8


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

* Re: upstream boot error: BUG: soft lockup in __do_softirq
  2020-07-31  6:44 upstream boot error: BUG: soft lockup in __do_softirq syzbot
@ 2020-07-31  6:50 ` Dmitry Vyukov
  2020-07-31 16:08   ` Randy Dunlap
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Vyukov @ 2020-07-31  6:50 UTC
  To: syzbot
  Cc: Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers

On Fri, Jul 31, 2020 at 8:44 AM syzbot
<syzbot+8472ea265fe32cc3bf78@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    92ed3019 Linux 5.8-rc7
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
> dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
> compiler:       gcc (GCC) 10.1.0-syz 20200507
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8472ea265fe32cc3bf78@syzkaller.appspotmail.com

This is a qemu-kvm instance somehow killing the host kernel: the host
kernel running the qemu instances is itself full of RCU stalls. I think
this is not a bug in the tested kernel.
We change the RCU stall timeout from the default 21s to 120 seconds, but
only after boot, using sysctls. I did not find any way to change the RCU
timeout via the command line or config (that would be useful).
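For reference, a minimal C sketch of changing the RCU CPU stall timeout at runtime. It assumes the knob is the rcupdate module parameter exposed in sysfs at /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout (value in seconds, root required) rather than a sysctl; the helper name write_int_param is hypothetical.

```c
#include <stdio.h>

/* Assumed runtime knob: the rcupdate module parameter exposed in sysfs.
 * The value is in seconds and writing it requires root. */
#define RCU_STALL_PARAM "/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout"

/* Hypothetical helper: write an integer to a sysfs/procfs-style file.
 * Returns 0 on success, -1 on any error. */
int write_int_param(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int ok = fprintf(f, "%d\n", value) > 0;

	/* fclose() flushes the buffer; a failed flush means the write was rejected. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling write_int_param(RCU_STALL_PARAM, 120) from an early boot script would approximate the 120-second setting described above, but like a sysctl it only takes effect once userspace is running.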

* Re: upstream boot error: BUG: soft lockup in __do_softirq
  2020-07-31  6:50 ` Dmitry Vyukov
@ 2020-07-31 16:08   ` Randy Dunlap
  2020-07-31 16:21     ` Dmitry Vyukov
  0 siblings, 1 reply; 5+ messages in thread
From: Randy Dunlap @ 2020-07-31 16:08 UTC
  To: Dmitry Vyukov, syzbot
  Cc: Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers, Paul E. McKenney

On 7/30/20 11:50 PM, Dmitry Vyukov wrote:
> This is a qemu-kvm instance somehow killing the host kernel: the host
> kernel running the qemu instances is itself full of RCU stalls. I think
> this is not a bug in the tested kernel.
> We change the RCU stall timeout from the default 21s to 120 seconds, but
> only after boot, using sysctls. I did not find any way to change the RCU
> timeout via the command line or config (that would be useful).

(adding Paul)


Documentation/RCU/stallwarn.rst says there is a Kconfig:

CONFIG_RCU_CPU_STALL_TIMEOUT

	This kernel configuration parameter defines the period of time
	that RCU will wait from the beginning of a grace period until it
	issues an RCU CPU stall warning.  This time period is normally
	21 seconds.

and Documentation/admin-guide/kernel-parameters.txt has 2 RCU stall timeouts,
one for CPU and one for tasks:

	rcupdate.rcu_cpu_stall_timeout= [KNL]
			Set timeout for RCU CPU stall warning messages.

	rcupdate.rcu_task_stall_timeout= [KNL]
			Set timeout in jiffies for RCU task stall warning
			messages.  Disable with a value less than or equal
			to zero.
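A short C sketch of how the units of these two parameters differ. It assumes, based on our reading of v5.8-era RCU code, that rcu_cpu_stall_timeout (in seconds) is clamped to 3..300, matching the Kconfig range, before being converted to jiffies, while rcu_task_stall_timeout is already in jiffies. HZ is passed in explicitly, any extra debug delay the kernel may add is ignored, and the function names are hypothetical.

```c
/* Hypothetical mirror of the assumed clamp-then-convert step for the
 * RCU CPU stall timeout: seconds in, jiffies out. */
int rcu_stall_jiffies(int stall_timeout_s, int hz)
{
	if (stall_timeout_s < 3)
		stall_timeout_s = 3;        /* assumed lower bound */
	else if (stall_timeout_s > 300)
		stall_timeout_s = 300;      /* assumed upper bound */
	return stall_timeout_s * hz;
}

/* rcu_task_stall_timeout is already in jiffies; convert it to seconds
 * for display. A value <= 0 disables the warning, reported here as 0. */
double task_stall_seconds(long stall_jiffies, int hz)
{
	return stall_jiffies <= 0 ? 0.0 : (double)stall_jiffies / hz;
}
```

With HZ=1000, the CONFIG_RCU_CPU_STALL_TIMEOUT=100 setting mentioned later in this thread would correspond to 100000 jiffies under these assumptions.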


-- 
~Randy


* Re: upstream boot error: BUG: soft lockup in __do_softirq
  2020-07-31 16:08   ` Randy Dunlap
@ 2020-07-31 16:21     ` Dmitry Vyukov
  2020-07-31 16:23       ` Dmitry Vyukov
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Vyukov @ 2020-07-31 16:21 UTC
  To: Randy Dunlap
  Cc: syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers, Paul E. McKenney

On Fri, Jul 31, 2020 at 6:08 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> (adding Paul)
>
>
> Documentation/RCU/stallwarn.rst says there is a Kconfig:
>
> CONFIG_RCU_CPU_STALL_TIMEOUT
>
>         This kernel configuration parameter defines the period of time
>         that RCU will wait from the beginning of a grace period until it
>         issues an RCU CPU stall warning.  This time period is normally
>         21 seconds.
>
> and Documentation/admin-guide/kernel-parameters.txt has 2 RCU stall timeouts,
> one for CPU and one for tasks:
>
>         rcupdate.rcu_cpu_stall_timeout= [KNL]
>                         Set timeout for RCU CPU stall warning messages.
>
>         rcupdate.rcu_task_stall_timeout= [KNL]
>                         Set timeout in jiffies for RCU task stall warning
>                         messages.  Disable with a value less than or equal
>                         to zero.

Hi Randy,

Thanks for looking into this.
But I think I messed things up. The config has
CONFIG_RCU_CPU_STALL_TIMEOUT=100, but this is not an RCU stall:

watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]

This is what the kernel.watchdog_thresh sysctl controls (?).
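A minimal C sketch for checking the soft-lockup threshold on a running system. It assumes, per our reading of kernel/watchdog.c, that the soft-lockup threshold is twice kernel.watchdog_thresh (exposed as /proc/sys/kernel/watchdog_thresh), so the default of 10 gives 20 seconds, which would be consistent with the "stuck for 21s" report. The helper names are hypothetical.

```c
#include <stdio.h>

/* Procfs file behind the kernel.watchdog_thresh sysctl. */
#define WATCHDOG_THRESH_PATH "/proc/sys/kernel/watchdog_thresh"

/* Assumption: soft-lockup threshold = 2 * watchdog_thresh seconds. */
int softlockup_thresh_s(int watchdog_thresh)
{
	return watchdog_thresh * 2;
}

/* Hypothetical helper: read one integer from a procfs-style file.
 * Returns -1 if the file cannot be opened or parsed. */
int read_int_file(const char *path)
{
	FILE *f = fopen(path, "r");
	int value = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

On a live system, softlockup_thresh_s(read_int_file(WATCHDOG_THRESH_PATH)) would report the effective threshold in seconds, provided the read succeeds.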

* Re: upstream boot error: BUG: soft lockup in __do_softirq
  2020-07-31 16:21     ` Dmitry Vyukov
@ 2020-07-31 16:23       ` Dmitry Vyukov
  0 siblings, 0 replies; 5+ messages in thread
From: Dmitry Vyukov @ 2020-07-31 16:23 UTC
  To: Randy Dunlap
  Cc: syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, Thomas Gleixner,
	the arch/x86 maintainers, Paul E. McKenney

On Fri, Jul 31, 2020 at 6:21 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Hi Randy,
>
> Thanks for looking into this.
> But I think I messed things up. The config has
> CONFIG_RCU_CPU_STALL_TIMEOUT=100, but this is not an RCU stall:
>
> watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]
>
> This is what the kernel.watchdog_thresh sysctl controls (?).

And there is actually a cmdline parameter for this:

static int __init watchdog_thresh_setup(char *str)
{
    get_option(&str, &watchdog_thresh);
    return 1;
}
__setup("watchdog_thresh=", watchdog_thresh_setup);

I will write it down somewhere.
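The __setup() handler above runs inside the kernel at boot. To confirm what a booted system actually received, one can search /proc/cmdline; the sketch below is a simplified user-space approximation (space-separated tokens, base-10 values only), not the kernel's get_option() logic. cmdline_get_int is a hypothetical name, and the same helper works for rcupdate.rcu_cpu_stall_timeout=.

```c
#include <stdlib.h>
#include <string.h>

/* Find "key" (e.g. "watchdog_thresh=") at the start of a token in a
 * command line and parse the integer after it.
 * Returns 1 and stores the value in *out if found, 0 otherwise. */
int cmdline_get_int(const char *cmdline, const char *key, int *out)
{
	size_t klen = strlen(key);
	const char *p = cmdline;

	if (klen == 0)
		return 0;

	while ((p = strstr(p, key)) != NULL) {
		/* Only match at a token boundary, not inside another parameter. */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			long v = strtol(p + klen, &end, 10);

			if (end != p + klen) {  /* at least one digit parsed */
				*out = (int)v;
				return 1;
			}
		}
		p += klen;
	}
	return 0;
}
```

A trailing newline, as found in /proc/cmdline, is harmless because strtol() stops at the first non-digit.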
