* 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s @ 2012-07-01 7:18 Christian Kujau 2012-07-02 4:50 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 6+ messages in thread From: Christian Kujau @ 2012-07-01 7:18 UTC (permalink / raw) To: linuxppc-dev; +Cc: LKML Hi, while trying to upgrade from 3.4.0 to 3.5.0-rc5 on this Powerbook G4 (powerpc 32 bit), this happens during booting: -------------- usb 2-1: new full-speed USB device number 4 using ohci_hcd SCSI subsystem initialized scsi0 : SBP-2 IEEE-1394 scsi1 : SBP-2 IEEE-1394 firewire_sbp2 fw1.0: logged in to LUN 0000 (0 retries) scsi 0:0:0:0: Direct-Access Ext Hard Disk PQ: 0 ANSI: 4 [25 seconds later] BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:695] Modules linked in: sd_mod arc4 firewire_sbp2 scsi_mod b43 mac80211 cfg80211 irq event stamp: 56975532 hardirqs last enabled at (56975531): <c053aea8>] _raw_spin_unlock_irqrestore+0x40/0x9c hardirqs last disabled at (56975532): <c0011688>] reenable_mmu+0x40/0x98 softirqs last enabled at (56966080): <c000f3f0>] call_do_softirq+0x14/0x24 softirqs last disabled at (56966073): <c000f3f0>] call_do_softirq+0x14/0x24 NIP: c053aeac LR: c053aea8 CTR: c009d510 REGS: eed1fd40 TRAP: 0901 Not tainted (3.5.0-rc5) MSR: 00009032 ME IR DR RI > CR: 24042484 XER: 00000000 TASK = eece5c00[695] 'modprobe' THREAD: eed1e000 #012GPR00: 012GPR08: NIP [c053aeac] _raw_spin_unlock_irqrestore+0x44/0x9c LR [c053aea8] _raw_spin_unlock_irqrestore+0x40/0x9c Call Trace: [eed1fdf0] [c053aea8] _raw_spin_unlock_irqrestore+0x40/0x9c (unreliable) [eed1fe00] [c005fbd8] lowest_in_progress+0xa8/0xcc [eed1fe20] [c005fc5c] async_synchronize_cookie_domain+0x60/0x1f4 [eed1fe70] [c005fe3c] async_synchronize_full+0x3c/0x74 [eed1fe90] [c0083908] sys_init_module+0x174/0x1168 [eed1ff40] [c00117e8] ret_from_syscall+0x0/0x38 --- Exception: c01 at 0xff5d694#012[ 36.339234] LR = 0x100041c0 Instruction dump: 38630010 73e08000 -------------- This is what netconsole was able to receive. I've made a screenshot from another boot with the same vanilla 3.5.0-rc5 (today's mainline) and put it (along with some more details) up here: http://nerdbynature.de/bits/3.5.0-rc5/soft_lockup/ Any ideas? Thanks, Christian. -- BOFH excuse #436: Daemon escaped from pentagram ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s 2012-07-01 7:18 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s Christian Kujau @ 2012-07-02 4:50 ` Benjamin Herrenschmidt 2012-07-02 6:30 ` Christian Kujau 2012-07-02 11:58 ` Christian Kujau 0 siblings, 2 replies; 6+ messages in thread From: Benjamin Herrenschmidt @ 2012-07-02 4:50 UTC (permalink / raw) To: Christian Kujau; +Cc: linuxppc-dev, LKML On Sun, 2012-07-01 at 00:18 -0700, Christian Kujau wrote: > Hi, > > while trying to upgrade from 3.4.0 to 3.5.0-rc5 on this Powerbook G4 > (powerpc 32 bit), this happens during booting: > > -------------- > usb 2-1: new full-speed USB device number 4 using ohci_hcd > SCSI subsystem initialized > scsi0 : SBP-2 IEEE-1394 > scsi1 : SBP-2 IEEE-1394 > firewire_sbp2 fw1.0: logged in to LUN 0000 (0 retries) > scsi 0:0:0:0: Direct-Access Ext Hard Disk PQ: 0 ANSI: 4 Interesting... I observed something roughly similar on a dual G4 the other day associated with a 30s to 1mn pause during boot. RCU was complaining loudly. In my case, it did continue booting normally, is that the case for you ? Also if I compile the kernel without CONFIG_SMP, it did go away as well, do you observe that too ? I don't have a spare cycle to investigate this problem this week I'm afraid, it might help if you could try a bisection though. Cheers, Ben. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s 2012-07-02 4:50 ` Benjamin Herrenschmidt @ 2012-07-02 6:30 ` Christian Kujau 2012-07-02 16:45 ` Paul E. McKenney 2012-07-02 11:58 ` Christian Kujau 1 sibling, 1 reply; 6+ messages in thread From: Christian Kujau @ 2012-07-02 6:30 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, LKML On Mon, 2 Jul 2012 at 14:50, Benjamin Herrenschmidt wrote: > Interesting... I observed something roughly similar on a dual G4 > the other day associated with a 30s to 1mn pause during boot. RCU > was complaining loudly. > > In my case, it did continue booting normally, is that the case for you ? No, in my case it stopped booting, though there was no "panic" message. > Also if I compile the kernel without CONFIG_SMP, it did go away as well, > do you observe that too ? This PoweBook G4 is UP anyway and I'm always building w/o CONFIG_SMP. > I don't have a spare cycle to investigate this problem this week I'm > afraid, it might help if you could try a bisection though. Yes, I've started a bisection already and will report back. FYI, I've put a netconsole-log below from another boot (in the middle of a bisection), the instruction dump is slightly more complete. Thanks! Christian. [ 40.345973] BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:691] [ 40.347737] Modules linked in: sd_mod arc4 firewire_sbp2 scsi_mod b43 mac80211 cfg80211 [ 40.349443] irq event stamp: 65779272 [ 40.351156] hardirqs last enabled at (65779271): [<c052fae8>] _raw_spin_unlock_irqrestore+0x40/0x9c [ 40.352911] hardirqs last disabled at (65779272): [<c0011648>] reenable_mmu+0x40/0x98 [ 40.354649] softirqs last enabled at (65769864): [<c000f3c0>] call_do_softirq+0x14/0x24 [ 40.356386] softirqs last disabled at (65769857): [<c000f3c0>] call_do_softirq+0x14/0x24 [ 40.358098] NIP: c005ee64 LR: c005ee54 CTR: c009c578 [ 40.359788] REGS: eece3d70 TRAP: 0901 Not tainted (3.4.0-05609-gc80ddb5) [ 40.361517] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24042482 XER: 00000000 [ 40.363261] TASK = eec30b80[691] 'modprobe' THREAD: eece2000#012 [ 40.363261] GPR00: c005ee54 eece3e20 eec30b80 00000000 00000002 c005edd0 00000000 00000000 #012 [ 40.363261] GPR08: 00000000 eece2000 00000000 00000001 24042488 [ 40.368310] NIP [c005ee64] async_synchronize_cookie_domain+0x70/0x1f4 [ 40.369978] LR [c005ee54] async_synchronize_cookie_domain+0x60/0x1f4 [ 40.371614] Call Trace: [ 40.373226] [eece3e20] [c005ee54] async_synchronize_cookie_domain+0x60/0x1f4 (unreliable) [ 40.374884] [eece3e70] [c005f034] async_synchronize_full+0x3c/0x74 [ 40.376524] [eece3e90] [c0082790] sys_init_module+0x178/0x1164 [ 40.378151] [eece3f40] [c00117a8] ret_from_syscall+0x0/0x38 [ 40.379783] --- Exception: c01 at 0xff5d694#012[ 40.379783] LR = 0x100041c0 [ 40.382928] Instruction dump: [ 40.384489] 93a10044 419e0014 3d20c06f 8009d020 2f800000 419e017c 7fc3f378 4bfffed9 [ 40.386081] 7f9f1840 419d0060 7f9f1800 419e0050 <881bf008> 2f800000 419e0014 3d20c06f -- BOFH excuse #241: _Rosin_ core solder? But... ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s 2012-07-02 6:30 ` Christian Kujau @ 2012-07-02 16:45 ` Paul E. McKenney 2012-07-02 18:46 ` Christian Kujau 0 siblings, 1 reply; 6+ messages in thread From: Paul E. McKenney @ 2012-07-02 16:45 UTC (permalink / raw) To: Christian Kujau; +Cc: Benjamin Herrenschmidt, linuxppc-dev, LKML On Sun, Jul 01, 2012 at 11:30:40PM -0700, Christian Kujau wrote: > On Mon, 2 Jul 2012 at 14:50, Benjamin Herrenschmidt wrote: > > Interesting... I observed something roughly similar on a dual G4 > > the other day associated with a 30s to 1mn pause during boot. RCU > > was complaining loudly. > > > > In my case, it did continue booting normally, is that the case for you ? > > No, in my case it stopped booting, though there was no "panic" message. > > > Also if I compile the kernel without CONFIG_SMP, it did go away as well, > > do you observe that too ? > > This PoweBook G4 is UP anyway and I'm always building w/o CONFIG_SMP. > > > I don't have a spare cycle to investigate this problem this week I'm > > afraid, it might help if you could try a bisection though. > > Yes, I've started a bisection already and will report back. > > FYI, I've put a netconsole-log below from another boot (in the > middle of a bisection), the instruction dump is slightly more complete. Li Zhong posted a patch to fix this (async_synchronize_full() below): https://lkml.org/lkml/2012/7/2/32 This gets rid of this problem on my UP testing. If it works for you, please give Li Zhong a Tested-by. Thanx, Paul > Thanks! > Christian. > > [ 40.345973] BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:691] > [ 40.347737] Modules linked in: sd_mod arc4 firewire_sbp2 scsi_mod b43 mac80211 cfg80211 > [ 40.349443] irq event stamp: 65779272 > [ 40.351156] hardirqs last enabled at (65779271): [<c052fae8>] _raw_spin_unlock_irqrestore+0x40/0x9c > [ 40.352911] hardirqs last disabled at (65779272): [<c0011648>] reenable_mmu+0x40/0x98 > [ 40.354649] softirqs last enabled at (65769864): [<c000f3c0>] call_do_softirq+0x14/0x24 > [ 40.356386] softirqs last disabled at (65769857): [<c000f3c0>] call_do_softirq+0x14/0x24 > [ 40.358098] NIP: c005ee64 LR: c005ee54 CTR: c009c578 > [ 40.359788] REGS: eece3d70 TRAP: 0901 Not tainted (3.4.0-05609-gc80ddb5) > [ 40.361517] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24042482 XER: 00000000 > [ 40.363261] TASK = eec30b80[691] 'modprobe' THREAD: eece2000#012 > [ 40.363261] GPR00: c005ee54 eece3e20 eec30b80 00000000 00000002 c005edd0 00000000 00000000 #012 > [ 40.363261] GPR08: 00000000 eece2000 00000000 00000001 24042488 > [ 40.368310] NIP [c005ee64] async_synchronize_cookie_domain+0x70/0x1f4 > [ 40.369978] LR [c005ee54] async_synchronize_cookie_domain+0x60/0x1f4 > [ 40.371614] Call Trace: > [ 40.373226] [eece3e20] [c005ee54] async_synchronize_cookie_domain+0x60/0x1f4 (unreliable) > [ 40.374884] [eece3e70] [c005f034] async_synchronize_full+0x3c/0x74 > [ 40.376524] [eece3e90] [c0082790] sys_init_module+0x178/0x1164 > [ 40.378151] [eece3f40] [c00117a8] ret_from_syscall+0x0/0x38 > [ 40.379783] --- Exception: c01 at 0xff5d694#012[ 40.379783] LR = 0x100041c0 > [ 40.382928] Instruction dump: > [ 40.384489] 93a10044 419e0014 3d20c06f 8009d020 2f800000 419e017c 7fc3f378 4bfffed9 > [ 40.386081] 7f9f1840 419d0060 7f9f1800 419e0050 <881bf008> 2f800000 419e0014 3d20c06f > > > -- > BOFH excuse #241: > > _Rosin_ core solder? But... > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s 2012-07-02 16:45 ` Paul E. McKenney @ 2012-07-02 18:46 ` Christian Kujau 0 siblings, 0 replies; 6+ messages in thread From: Christian Kujau @ 2012-07-02 18:46 UTC (permalink / raw) To: Paul E. McKenney, zhong; +Cc: Benjamin Herrenschmidt, linuxppc-dev, LKML, arjan On Mon, 2 Jul 2012 at 09:45, Paul E. McKenney wrote: > On Sun, Jul 01, 2012 at 11:30:40PM -0700, Christian Kujau wrote: > > On Mon, 2 Jul 2012 at 14:50, Benjamin Herrenschmidt wrote: > > > Interesting... I observed something roughly similar on a dual G4 > > > the other day associated with a 30s to 1mn pause during boot. RCU > > > was complaining loudly. > > > > > > In my case, it did continue booting normally, is that the case for you ? > > > > No, in my case it stopped booting, though there was no "panic" message. > > > > > Also if I compile the kernel without CONFIG_SMP, it did go away as well, > > > do you observe that too ? > > > > This PoweBook G4 is UP anyway and I'm always building w/o CONFIG_SMP. > > > > > I don't have a spare cycle to investigate this problem this week I'm > > > afraid, it might help if you could try a bisection though. > > > > Yes, I've started a bisection already and will report back. > > > > FYI, I've put a netconsole-log below from another boot (in the > > middle of a bisection), the instruction dump is slightly more complete. > > Li Zhong posted a patch to fix this (async_synchronize_full() below): > > https://lkml.org/lkml/2012/7/2/32 > > This gets rid of this problem on my UP testing. If it works for you, > please give Li Zhong a Tested-by. Great! When applied to -rc5 Li's patch fixes it for my powerpc UP system as well. Tested-by: Christian Kujau <lists@nerdbynature.de> Many thanks! Christian. > > [ 40.345973] BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:691] > > [ 40.347737] Modules linked in: sd_mod arc4 firewire_sbp2 scsi_mod b43 mac80211 cfg80211 > > [ 40.349443] irq event stamp: 65779272 > > [ 40.351156] hardirqs last enabled at (65779271): [<c052fae8>] _raw_spin_unlock_irqrestore+0x40/0x9c > > [ 40.352911] hardirqs last disabled at (65779272): [<c0011648>] reenable_mmu+0x40/0x98 > > [ 40.354649] softirqs last enabled at (65769864): [<c000f3c0>] call_do_softirq+0x14/0x24 > > [ 40.356386] softirqs last disabled at (65769857): [<c000f3c0>] call_do_softirq+0x14/0x24 > > [ 40.358098] NIP: c005ee64 LR: c005ee54 CTR: c009c578 > > [ 40.359788] REGS: eece3d70 TRAP: 0901 Not tainted (3.4.0-05609-gc80ddb5) > > [ 40.361517] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 24042482 XER: 00000000 > > [ 40.363261] TASK = eec30b80[691] 'modprobe' THREAD: eece2000#012 > > [ 40.363261] GPR00: c005ee54 eece3e20 eec30b80 00000000 00000002 c005edd0 00000000 00000000 #012 > > [ 40.363261] GPR08: 00000000 eece2000 00000000 00000001 24042488 > > [ 40.368310] NIP [c005ee64] async_synchronize_cookie_domain+0x70/0x1f4 > > [ 40.369978] LR [c005ee54] async_synchronize_cookie_domain+0x60/0x1f4 > > [ 40.371614] Call Trace: > > [ 40.373226] [eece3e20] [c005ee54] async_synchronize_cookie_domain+0x60/0x1f4 (unreliable) > > [ 40.374884] [eece3e70] [c005f034] async_synchronize_full+0x3c/0x74 > > [ 40.376524] [eece3e90] [c0082790] sys_init_module+0x178/0x1164 > > [ 40.378151] [eece3f40] [c00117a8] ret_from_syscall+0x0/0x38 > > [ 40.379783] --- Exception: c01 at 0xff5d694#012[ 40.379783] LR = 0x100041c0 > > [ 40.382928] Instruction dump: > > [ 40.384489] 93a10044 419e0014 3d20c06f 8009d020 2f800000 419e017c 7fc3f378 4bfffed9 > > [ 40.386081] 7f9f1840 419d0060 7f9f1800 419e0050 <881bf008> 2f800000 419e0014 3d20c06f -- BOFH excuse #280: Traceroute says that there is a routing problem in the backbone. It's not our problem. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s 2012-07-02 4:50 ` Benjamin Herrenschmidt 2012-07-02 6:30 ` Christian Kujau @ 2012-07-02 11:58 ` Christian Kujau 1 sibling, 0 replies; 6+ messages in thread From: Christian Kujau @ 2012-07-02 11:58 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: linuxppc-dev, LKML, dan.j.williams, JBottomley, stern On Mon, 2 Jul 2012 at 14:50, Benjamin Herrenschmidt wrote: > > while trying to upgrade from 3.4.0 to 3.5.0-rc5 on this Powerbook G4 > > (powerpc 32 bit), this happens during booting: > > > > -------------- > > usb 2-1: new full-speed USB device number 4 using ohci_hcd > > SCSI subsystem initialized > > scsi0 : SBP-2 IEEE-1394 > > scsi1 : SBP-2 IEEE-1394 > > firewire_sbp2 fw1.0: logged in to LUN 0000 (0 retries) > > scsi 0:0:0:0: Direct-Access Ext Hard Disk PQ: 0 ANSI: 4 > > Interesting... I observed something roughly similar on a dual G4 > the other day associated with a 30s to 1mn pause during boot. RCU > was complaining loudly. > > In my case, it did continue booting normally, is that the case for you ? > > Also if I compile the kernel without CONFIG_SMP, it did go away as well, > do you observe that too ? > > I don't have a spare cycle to investigate this problem this week I'm > afraid, it might help if you could try a bisection though. OK, after a git-bisect, the following commit has been identified as bad: --------------------------- a7a20d103994fd760766e6c9d494daa569cbfe06 is the first bad commit commit a7a20d103994fd760766e6c9d494daa569cbfe06 Author: Dan Williams <dan.j.williams@intel.com> Date: Thu Mar 22 17:05:11 2012 -0700 [SCSI] sd: limit the scope of the async probe domain --------------------------- Unfortunately it's not possible to "git revert" just this single commit: ------------ $ git revert a7a20d103994fd760766e6c9d494daa569cbfe06 error: could not revert a7a20d1... [SCSI] sd: limit the scope of the async probe domain hint: after resolving the conflicts, mark the corrected paths hint: with 'git add <paths>' or 'git rm <paths>' hint: and commit the result with 'git commit' ------------ I _guess_ this is probably due to: ------------ commit ea80dadec7a06889562b478cf0b87afbe62b7ac8 Author: James Bottomley <JBottomley@Parallels.com> Date: Wed Jun 6 14:54:13 2012 +0900 [SCSI] Fix sd_probe_domain config problem ------------ The whole git-bisect log, dmesg, .config and so on is here: http://nerdbynature.de/bits/3.5.0-rc5/soft_lockup/ Anyone got an idea how to go from here? Thanks, Christian. -- BOFH excuse #380: Operators killed when huge stack of backup tapes fell over. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-07-02 18:46 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-07-01 7:18 3.5.0-rc5: BUG: soft lockup - CPU#0 stuck for 22s Christian Kujau 2012-07-02 4:50 ` Benjamin Herrenschmidt 2012-07-02 6:30 ` Christian Kujau 2012-07-02 16:45 ` Paul E. McKenney 2012-07-02 18:46 ` Christian Kujau 2012-07-02 11:58 ` Christian Kujau
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox