* RCU stalls on 32-bit pmac SMP
@ 2012-06-18 0:35 Benjamin Herrenschmidt
2012-06-18 17:05 ` Paul E. McKenney
0 siblings, 1 reply; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-18 0:35 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: linuxppc-dev
Hi Paul !
I was about to go debug something else when I hit that with -rc3 (plus
the patch to fix the current bug.h breakage) on a 32-bit PowerMac G4 SMP
(2 CPUs): About 1mn pause at boot followed by a bunch of RCU stall
messages. Log below. Any idea where to start looking ?
pci 0002:20:0f.0: reg 30: [mem 0xf5100000-0xf51fffff pref]
PCI 0000:00 Cannot reserve Legacy IO [io 0x802000-0x802fff]
PCI 0001:10 Cannot reserve Legacy IO [io 0x0000-0x0fff]
pci_bus 0000:00: resource 4 [io 0x802000-0x1001fff]
pci_bus 0000:00: resource 5 [mem 0xf1000000-0xf1ffffff]
pci_bus 0000:00: resource 6 [mem 0x90000000-0x9fffffff]
pci_bus 0001:10: resource 4 [io 0x0000-0x7fffff]
pci_bus 0001:10: resource 5 [mem 0xf3000000-0xf3ffffff]
pci_bus 0001:10: resource 6 [mem 0x80000000-0x8fffffff]
pci_bus 0002:20: resource 4 [io 0xff7fe000-0xffffdfff]
pci_bus 0002:20: resource 5 [mem 0xf5000000-0xf5ffffff]
bio: create slab <bio-0> at 0
vgaarb: device added: PCI:0000:00:10.0,decodes=io+mem,owns=io+mem,locks=none
vgaarb: loaded
vgaarb: bridge control possible 0000:00:10.0
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource timebase
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP: reno registered
UDP hash table entries: 512 (order: 2, 16384 bytes)
UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
pci 0001:10:18.0: enabling device (0000 -> 0002)
pci 0001:10:19.0: enabling device (0000 -> 0002)
PCI: CLS mismatch (32 != 1020), using 32 bytes
highmem bounce pool size: 64 pages
NFS: Registering the id_resolver key type
Key type id_resolver registered
msgmni has been set to 1500
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
Using unsupported 1680x1050 ATY,BlueStone_A at 9c008000, depth=8, pitch=1792
Console: switching to colour frame buffer device 210x65
fb0: Open Firmware frame buffer device on /pci@f0000000/ATY,BlueStoneParent@10/ATY,BlueStone_A
Using unsupported 640x480 ATY,BlueStone_B at 99008000, depth=8, pitch=768
checking generic (9c008000 1cb600) vs hw (99008000 5a000)
fb1: Open Firmware frame buffer device on /pci@f0000000/ATY,BlueStoneParent@10/ATY,BlueStone_B
Generic RTC Driver v1.07
Generic non-volatile memory driver v1.1
brd: module loaded
loop: module loaded
MacIO PCI driver attached to Keylargo chipset
adb: starting probe task...
adb: finished probe task...
mesh: configured for synchronous 5 MB/s
st: Version 20101219, fixed bufsize 32768, s/g segs 256
pata-macio 0.0001f000:ata-4: Activating pata-macio chipset KeyLargo ATA-4, Apple bus ID 2
scsi0 : pata_macio
ata1: PATA max UDMA/66 irq 19
ata1.00: ATA-5: ST380021A, 5.05, max UDMA/100
ata1.00: 156301488 sectors, multi 16: LBA
ata1.01: ATA-5: ST380021A, 3.75, max UDMA/100
ata1.01: 156301488 sectors, multi 16: LBA
ata1.00: configured for UDMA/66
ata1.01: configured for UDMA/66
scsi 0:0:0:0: Direct-Access ATA ST380021A 5.05 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: Attached scsi generic sg0 type 0
sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 0:0:1:0: Direct-Access ATA ST380021A 3.75 PQ: 0 ANSI: 5
sd 0:0:1:0: [sdb] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
sd 0:0:1:0: [sdb] Write Protect is off
sd 0:0:1:0: Attached scsi generic sg1 type 0
sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
INFO: rcu_sched self-detected stall on CPU { 0} (t=16163 jiffies)
Call Trace:
INFO: rcu_sched self-detected stall on CPU { 1} (t=16163 jiffies)
Call Trace:
[ef877d30] [c0008d04] show_stack+0x50/0x158 (unreliable)
[ef877d70] [c0097fe4] __rcu_pending+0x184/0x46c
[ef877da0] [c00991a0] rcu_check_callbacks+0x7c/0x168
[ef877dc0] [c0044a40] update_process_times+0x3c/0x70
[ef877de0] [c0083a3c] tick_sched_timer+0x88/0x100
[ef877e10] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
[ef877e30] [c005bf44] hrtimer_interrupt+0x158/0x3f8
[ef877ea0] [c000b5c4] timer_interrupt+0x1cc/0x204
[ef877ed0] [c0011b88] ret_from_except+0x0/0x1c
--- Exception: 901 at cpu_idle+0xe4/0x188
LR = cpu_idle+0xc8/0x188
[ef877f90] [c00097e8] cpu_idle+0x60/0x188 (unreliable)
[ef877fc0] [c046531c] start_secondary+0x2c8/0x2cc
[ef877ff0] [00003278] 0x3278
[ef873b60] [c0008d04] show_stack+0x50/0x158 (unreliable)
[ef873ba0] [c0097fe4] __rcu_pending+0x184/0x46c
[ef873bd0] [c0099240] rcu_check_callbacks+0x11c/0x168
[ef873bf0] [c0044a40] update_process_times+0x3c/0x70
[ef873c10] [c0083a3c] tick_sched_timer+0x88/0x100
[ef873c40] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
[ef873c60] [c005bf44] hrtimer_interrupt+0x158/0x3f8
[ef873cd0] [c000b5c4] timer_interrupt+0x1cc/0x204
[ef873d00] [c0011b88] ret_from_except+0x0/0x1c
--- Exception: 901 at wake_up_new_task+0x134/0x16c
LR = wake_up_new_task+0x134/0x16c
[ef873dc0] [c0065f08] wake_up_new_task+0xfc/0x16c (unreliable)
[ef873df0] [c0035530] do_fork+0xe8/0x2bc
[ef873e30] [c0008a4c] sys_clone+0x50/0x90
[ef873e50] [c00114b8] ret_from_syscall+0x0/0x40
--- Exception: c00 at kernel_thread+0x28/0x68
LR = __call_usermodehelper+0x40/0xdc
[ef873f10] [c005f5c8] async_run_entry_fn+0x128/0x1e4 (unreliable)
[ef873f20] [ef878c00] 0xef878c00
[ef873f40] [c004e668] process_one_work+0x150/0x3f0
[ef873f70] [c00514f0] worker_thread+0x18c/0x37c
[ef873fb0] [c00567bc] kthread+0x84/0x88
[ef873ff0] [c000f514] kernel_thread+0x4c/0x68
pata-macio 0.00020000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 0
scsi1 : pata_macio
ata2: PATA max MWDMA2 irq 20
sdb: [mac] sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 sdb13
sd 0:0:1:0: [sdb] Attached SCSI disk
ata2.00: ATAPI: PIONEER DVD-RW DVR-104, A227, max UDMA/33
ata2.00: configured for MWDMA2
scsi 1:0:0:0: CD-ROM PIONEER DVD-RW DVR-104 A227 PQ: 0 ANSI: 5
pata-macio 0.00021000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 1
scsi2 : pata_macio
ata3: PATA max MWDMA2 irq 21
sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
cdrom: Uniform CD-ROM driver Revision: 3.20
sr 1:0:0:0: Attached scsi CD-ROM sr0
pcnet32: pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de
sr 1:0:0:0: Attached scsi generic sg2 type 5
sungem.c:v1.0 David S. Miller <davem@redhat.com>
gem 0002:20:0f.0: eth0: Sun GEM (PCI) 10/100/1000BaseT Ethernet 00:03:93:6f:04:b2
PPP generic driver version 2.4.2
PPP Deflate Compression module registered
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0001:10:18.0: OHCI Host Controller
ohci_hcd 0001:10:18.0: new USB bus registered, assigned bus number 1
ohci_hcd 0001:10:18.0: irq 27, io mem 0x80081000
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 2 ports detected
ohci_hcd 0001:10:19.0: OHCI Host Controller
ohci_hcd 0001:10:19.0: new USB bus registered, assigned bus number 2
ohci_hcd 0001:10:19.0: irq 28, io mem 0x80080000
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
mousedev: PS/2 mouse device common for all mice
usbcore: registered new interface driver appletouch
PowerMac i2c bus pmu 2 registered
i2c i2c-0: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/rtc
i2c i2c-0: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/power-mgt
PowerMac i2c bus pmu 1 registered
i2c i2c-1: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/rtc
i2c i2c-1: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/power-mgt
PowerMac i2c bus mac-io 0 registered
i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/cereal
i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/deq
i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/i2c-modem
PowerMac i2c bus uni-n 1 registered
i2c i2c-3: i2c-powermac: invalid reg on /uni-n@f8000000/i2c@f8001000/cereal
PowerMac i2c bus uni-n 0 registered
i2c i2c-4: i2c-powermac: invalid reg on /uni-n@f8000000/i2c@f8001000/cereal
APM Battery Driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
oprofile: using ppc/7450 performance monitoring.
TCP: cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
NET: Registered protocol family 15
Key type dns_resolver registered
PM: Hibernation image not present or could not be loaded.
input: PMU as /devices/virtual/input/input0
kjournald starting. Commit interval 5 seconds
EXT3-fs (sda15): mounted filesystem with writeback data mode
VFS: Mounted root (ext3 filesystem) readonly on device 8:15.
usb 1-1: new full-speed USB device number 2 using ohci_hcd
devtmpfs: mounted
Freeing unused kernel memory: 240k freed
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 3 ports detected
usb 1-2: new full-speed USB device number 3 using ohci_hcd
hub 1-2:1.0: USB hub found
hub 1-2:1.0: 3 ports detected
usb 1-1.3: new full-speed USB device number 4 using ohci_hcd
input: Mitsumi Electric Apple Extended USB Keyboard as /devices/pci0001:10/0001:10:18.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1
hid-generic 0003:05AC:020B.0001: input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:18.0-1.3/input0
input: Mitsumi Electric Apple Extended USB Keyboard as /devices/pci0001:10/0001:10:18.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2
hid-generic 0003:05AC:020B.0002: input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:18.0-1.3/input1
usb 1-2.3: new low-speed USB device number 5 using ohci_hcd
hid-generic 0003:05AC:9219.0003: claimed by neither input, hiddev nor hidraw
udevd[1209]: starting version 175
EXT3-fs (sda15): using internal journal
sungem_phy: PHY ID: 2060e1, addr: 0
gem 0002:20:0f.0: eth0: Found BCM5421 PHY
gem 0002:20:0f.0: eth0: Link is up at 100 Mbps, full-duplex
gem 0002:20:0f.0: eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
sshd (2042): /proc/2042/oom_adj is deprecated, please use /proc/2042/oom_score_adj instead.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RCU stalls on 32-bit pmac SMP
2012-06-18 0:35 RCU stalls on 32-bit pmac SMP Benjamin Herrenschmidt
@ 2012-06-18 17:05 ` Paul E. McKenney
2012-06-18 20:45 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2012-06-18 17:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On Mon, Jun 18, 2012 at 10:35:46AM +1000, Benjamin Herrenschmidt wrote:
> Hi Paul !
>
> I was about to go debug something else when I hit that with -rc3 (plus
> the patch to fix the current bug.h breakage) on a 32-bit PowerMac G4 SMP
> (2 CPUs): About 1mn pause at boot followed by a bunch of RCU stall
> messages. Log below. Any idea where to start looking ?
>
> pci 0002:20:0f.0: reg 30: [mem 0xf5100000-0xf51fffff pref]
> PCI 0000:00 Cannot reserve Legacy IO [io 0x802000-0x802fff]
> PCI 0001:10 Cannot reserve Legacy IO [io 0x0000-0x0fff]
> pci_bus 0000:00: resource 4 [io 0x802000-0x1001fff]
> pci_bus 0000:00: resource 5 [mem 0xf1000000-0xf1ffffff]
> pci_bus 0000:00: resource 6 [mem 0x90000000-0x9fffffff]
> pci_bus 0001:10: resource 4 [io 0x0000-0x7fffff]
> pci_bus 0001:10: resource 5 [mem 0xf3000000-0xf3ffffff]
> pci_bus 0001:10: resource 6 [mem 0x80000000-0x8fffffff]
> pci_bus 0002:20: resource 4 [io 0xff7fe000-0xffffdfff]
> pci_bus 0002:20: resource 5 [mem 0xf5000000-0xf5ffffff]
> bio: create slab <bio-0> at 0
> vgaarb: device added: PCI:0000:00:10.0,decodes=io+mem,owns=io+mem,locks=none
> vgaarb: loaded
> vgaarb: bridge control possible 0000:00:10.0
> SCSI subsystem initialized
> libata version 3.00 loaded.
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> Switching to clocksource timebase
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
> TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
> TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP: reno registered
> UDP hash table entries: 512 (order: 2, 16384 bytes)
> UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
> NET: Registered protocol family 1
> RPC: Registered named UNIX socket transport module.
> RPC: Registered udp transport module.
> RPC: Registered tcp transport module.
> RPC: Registered tcp NFSv4.1 backchannel transport module.
> pci 0001:10:18.0: enabling device (0000 -> 0002)
> pci 0001:10:19.0: enabling device (0000 -> 0002)
> PCI: CLS mismatch (32 != 1020), using 32 bytes
> highmem bounce pool size: 64 pages
> NFS: Registering the id_resolver key type
> Key type id_resolver registered
> msgmni has been set to 1500
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
> io scheduler noop registered
> io scheduler deadline registered
> io scheduler cfq registered (default)
> Using unsupported 1680x1050 ATY,BlueStone_A at 9c008000, depth=8, pitch=1792
> Console: switching to colour frame buffer device 210x65
> fb0: Open Firmware frame buffer device on /pci@f0000000/ATY,BlueStoneParent@10/ATY,BlueStone_A
> Using unsupported 640x480 ATY,BlueStone_B at 99008000, depth=8, pitch=768
> checking generic (9c008000 1cb600) vs hw (99008000 5a000)
> fb1: Open Firmware frame buffer device on /pci@f0000000/ATY,BlueStoneParent@10/ATY,BlueStone_B
> Generic RTC Driver v1.07
> Generic non-volatile memory driver v1.1
> brd: module loaded
> loop: module loaded
> MacIO PCI driver attached to Keylargo chipset
> adb: starting probe task...
> adb: finished probe task...
> mesh: configured for synchronous 5 MB/s
> st: Version 20101219, fixed bufsize 32768, s/g segs 256
> pata-macio 0.0001f000:ata-4: Activating pata-macio chipset KeyLargo ATA-4, Apple bus ID 2
> scsi0 : pata_macio
> ata1: PATA max UDMA/66 irq 19
> ata1.00: ATA-5: ST380021A, 5.05, max UDMA/100
> ata1.00: 156301488 sectors, multi 16: LBA
> ata1.01: ATA-5: ST380021A, 3.75, max UDMA/100
> ata1.01: 156301488 sectors, multi 16: LBA
> ata1.00: configured for UDMA/66
> ata1.01: configured for UDMA/66
> scsi 0:0:0:0: Direct-Access ATA ST380021A 5.05 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sd 0:0:0:0: Attached scsi generic sg0 type 0
> sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15
> sd 0:0:0:0: [sda] Attached SCSI disk
> scsi 0:0:1:0: Direct-Access ATA ST380021A 3.75 PQ: 0 ANSI: 5
> sd 0:0:1:0: [sdb] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
> sd 0:0:1:0: [sdb] Write Protect is off
> sd 0:0:1:0: Attached scsi generic sg1 type 0
> sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
> sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> INFO: rcu_sched self-detected stall on CPU { 0} (t=16163 jiffies)
> Call Trace:
> INFO: rcu_sched self-detected stall on CPU { 1} (t=16163 jiffies)
> Call Trace:
> [ef877d30] [c0008d04] show_stack+0x50/0x158 (unreliable)
> [ef877d70] [c0097fe4] __rcu_pending+0x184/0x46c
> [ef877da0] [c00991a0] rcu_check_callbacks+0x7c/0x168
> [ef877dc0] [c0044a40] update_process_times+0x3c/0x70
> [ef877de0] [c0083a3c] tick_sched_timer+0x88/0x100
> [ef877e10] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> [ef877e30] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> [ef877ea0] [c000b5c4] timer_interrupt+0x1cc/0x204
> [ef877ed0] [c0011b88] ret_from_except+0x0/0x1c
> --- Exception: 901 at cpu_idle+0xe4/0x188
Am I reading this correctly? Did the system really take a scheduling-clock
interrupt from within another scheduling-clock interrupt?
If so, my first question is "Why did the scheduling-clock interrupt
run for so long?" ;-)
Thanx, Paul
> LR = cpu_idle+0xc8/0x188
> [ef877f90] [c00097e8] cpu_idle+0x60/0x188 (unreliable)
> [ef877fc0] [c046531c] start_secondary+0x2c8/0x2cc
> [ef877ff0] [00003278] 0x3278
> [ef873b60] [c0008d04] show_stack+0x50/0x158 (unreliable)
> [ef873ba0] [c0097fe4] __rcu_pending+0x184/0x46c
> [ef873bd0] [c0099240] rcu_check_callbacks+0x11c/0x168
> [ef873bf0] [c0044a40] update_process_times+0x3c/0x70
> [ef873c10] [c0083a3c] tick_sched_timer+0x88/0x100
> [ef873c40] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> [ef873c60] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> [ef873cd0] [c000b5c4] timer_interrupt+0x1cc/0x204
> [ef873d00] [c0011b88] ret_from_except+0x0/0x1c
> --- Exception: 901 at wake_up_new_task+0x134/0x16c
> LR = wake_up_new_task+0x134/0x16c
> [ef873dc0] [c0065f08] wake_up_new_task+0xfc/0x16c (unreliable)
> [ef873df0] [c0035530] do_fork+0xe8/0x2bc
> [ef873e30] [c0008a4c] sys_clone+0x50/0x90
> [ef873e50] [c00114b8] ret_from_syscall+0x0/0x40
> --- Exception: c00 at kernel_thread+0x28/0x68
> LR = __call_usermodehelper+0x40/0xdc
> [ef873f10] [c005f5c8] async_run_entry_fn+0x128/0x1e4 (unreliable)
> [ef873f20] [ef878c00] 0xef878c00
> [ef873f40] [c004e668] process_one_work+0x150/0x3f0
> [ef873f70] [c00514f0] worker_thread+0x18c/0x37c
> [ef873fb0] [c00567bc] kthread+0x84/0x88
> [ef873ff0] [c000f514] kernel_thread+0x4c/0x68
> pata-macio 0.00020000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 0
> scsi1 : pata_macio
> ata2: PATA max MWDMA2 irq 20
> sdb: [mac] sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 sdb13
> sd 0:0:1:0: [sdb] Attached SCSI disk
> ata2.00: ATAPI: PIONEER DVD-RW DVR-104, A227, max UDMA/33
> ata2.00: configured for MWDMA2
> scsi 1:0:0:0: CD-ROM PIONEER DVD-RW DVR-104 A227 PQ: 0 ANSI: 5
> pata-macio 0.00021000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 1
> scsi2 : pata_macio
> ata3: PATA max MWDMA2 irq 21
> sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
> cdrom: Uniform CD-ROM driver Revision: 3.20
> sr 1:0:0:0: Attached scsi CD-ROM sr0
> pcnet32: pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de
> sr 1:0:0:0: Attached scsi generic sg2 type 5
> sungem.c:v1.0 David S. Miller <davem@redhat.com>
> gem 0002:20:0f.0: eth0: Sun GEM (PCI) 10/100/1000BaseT Ethernet 00:03:93:6f:04:b2
> PPP generic driver version 2.4.2
> PPP Deflate Compression module registered
> ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> ohci_hcd 0001:10:18.0: OHCI Host Controller
> ohci_hcd 0001:10:18.0: new USB bus registered, assigned bus number 1
> ohci_hcd 0001:10:18.0: irq 27, io mem 0x80081000
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 2 ports detected
> ohci_hcd 0001:10:19.0: OHCI Host Controller
> ohci_hcd 0001:10:19.0: new USB bus registered, assigned bus number 2
> ohci_hcd 0001:10:19.0: irq 28, io mem 0x80080000
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 2 ports detected
> mousedev: PS/2 mouse device common for all mice
> usbcore: registered new interface driver appletouch
> PowerMac i2c bus pmu 2 registered
> i2c i2c-0: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/rtc
> i2c i2c-0: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/power-mgt
> PowerMac i2c bus pmu 1 registered
> i2c i2c-1: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/rtc
> i2c i2c-1: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/power-mgt
> PowerMac i2c bus mac-io 0 registered
> i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/cereal
> i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/deq
> i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/i2c-modem
> PowerMac i2c bus uni-n 1 registered
> i2c i2c-3: i2c-powermac: invalid reg on /uni-n@f8000000/i2c@f8001000/cereal
> PowerMac i2c bus uni-n 0 registered
> i2c i2c-4: i2c-powermac: invalid reg on /uni-n@f8000000/i2c@f8001000/cereal
> APM Battery Driver
> usbcore: registered new interface driver usbhid
> usbhid: USB HID core driver
> oprofile: using ppc/7450 performance monitoring.
> TCP: cubic registered
> Initializing XFRM netlink socket
> NET: Registered protocol family 17
> NET: Registered protocol family 15
> Key type dns_resolver registered
> PM: Hibernation image not present or could not be loaded.
> input: PMU as /devices/virtual/input/input0
> kjournald starting. Commit interval 5 seconds
> EXT3-fs (sda15): mounted filesystem with writeback data mode
> VFS: Mounted root (ext3 filesystem) readonly on device 8:15.
> usb 1-1: new full-speed USB device number 2 using ohci_hcd
> devtmpfs: mounted
> Freeing unused kernel memory: 240k freed
> hub 1-1:1.0: USB hub found
> hub 1-1:1.0: 3 ports detected
> usb 1-2: new full-speed USB device number 3 using ohci_hcd
> hub 1-2:1.0: USB hub found
> hub 1-2:1.0: 3 ports detected
> usb 1-1.3: new full-speed USB device number 4 using ohci_hcd
> input: Mitsumi Electric Apple Extended USB Keyboard as /devices/pci0001:10/0001:10:18.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1
> hid-generic 0003:05AC:020B.0001: input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:18.0-1.3/input0
> input: Mitsumi Electric Apple Extended USB Keyboard as /devices/pci0001:10/0001:10:18.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2
> hid-generic 0003:05AC:020B.0002: input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:18.0-1.3/input1
> usb 1-2.3: new low-speed USB device number 5 using ohci_hcd
> hid-generic 0003:05AC:9219.0003: claimed by neither input, hiddev nor hidraw
> udevd[1209]: starting version 175
> EXT3-fs (sda15): using internal journal
> sungem_phy: PHY ID: 2060e1, addr: 0
> gem 0002:20:0f.0: eth0: Found BCM5421 PHY
> gem 0002:20:0f.0: eth0: Link is up at 100 Mbps, full-duplex
> gem 0002:20:0f.0: eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
> sshd (2042): /proc/2042/oom_adj is deprecated, please use /proc/2042/oom_score_adj instead.
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RCU stalls on 32-bit pmac SMP
2012-06-18 17:05 ` Paul E. McKenney
@ 2012-06-18 20:45 ` Benjamin Herrenschmidt
2012-06-18 21:58 ` Paul E. McKenney
0 siblings, 1 reply; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2012-06-18 20:45 UTC (permalink / raw)
To: paulmck; +Cc: linuxppc-dev
Hi Paul !
On Mon, 2012-06-18 at 10:05 -0700, Paul E. McKenney wrote:
> > sd 0:0:0:0: Attached scsi generic sg0 type 0
> > sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15
> > sd 0:0:0:0: [sda] Attached SCSI disk
> > scsi 0:0:1:0: Direct-Access ATA ST380021A 3.75 PQ: 0 ANSI: 5
> > sd 0:0:1:0: [sdb] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
> > sd 0:0:1:0: [sdb] Write Protect is off
> > sd 0:0:1:0: Attached scsi generic sg1 type 0
> > sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
> > sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > INFO: rcu_sched self-detected stall on CPU { 0} (t=16163 jiffies)
> > Call Trace:
> > INFO: rcu_sched self-detected stall on CPU { 1} (t=16163 jiffies)
> > Call Trace:
> > [ef877d30] [c0008d04] show_stack+0x50/0x158 (unreliable)
> > [ef877d70] [c0097fe4] __rcu_pending+0x184/0x46c
> > [ef877da0] [c00991a0] rcu_check_callbacks+0x7c/0x168
> > [ef877dc0] [c0044a40] update_process_times+0x3c/0x70
> > [ef877de0] [c0083a3c] tick_sched_timer+0x88/0x100
> > [ef877e10] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> > [ef877e30] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> > [ef877ea0] [c000b5c4] timer_interrupt+0x1cc/0x204
> > [ef877ed0] [c0011b88] ret_from_except+0x0/0x1c
> > --- Exception: 901 at cpu_idle+0xe4/0x188
>
> Am I reading this correctly? Did the system really take a scheduling-clock
> interrupt from within another scheduling-clock interrupt?
Not that I can see. I see only one interrupt on that CPU. It was idle,
took a timer interrupt -> hrtimer ... etc... RCU boom !
> If so, my first question is "Why did the scheduling-clock interrupt
> run for so long?" ;-)
This is still that CPU:
> LR = cpu_idle+0xc8/0x188
> > [ef877f90] [c00097e8] cpu_idle+0x60/0x188 (unreliable)
> > [ef877fc0] [c046531c] start_secondary+0x2c8/0x2cc
> > [ef877ff0] [00003278] 0x3278
And now we have the other one:
> > [ef873b60] [c0008d04] show_stack+0x50/0x158 (unreliable)
> > [ef873ba0] [c0097fe4] __rcu_pending+0x184/0x46c
> > [ef873bd0] [c0099240] rcu_check_callbacks+0x11c/0x168
> > [ef873bf0] [c0044a40] update_process_times+0x3c/0x70
> > [ef873c10] [c0083a3c] tick_sched_timer+0x88/0x100
> > [ef873c40] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> > [ef873c60] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> > [ef873cd0] [c000b5c4] timer_interrupt+0x1cc/0x204
> > [ef873d00] [c0011b88] ret_from_except+0x0/0x1c
> > --- Exception: 901 at wake_up_new_task+0x134/0x16c
> > LR = wake_up_new_task+0x134/0x16c
> > [ef873dc0] [c0065f08] wake_up_new_task+0xfc/0x16c (unreliable)
> > [ef873df0] [c0035530] do_fork+0xe8/0x2bc
> > [ef873e30] [c0008a4c] sys_clone+0x50/0x90
> > [ef873e50] [c00114b8] ret_from_syscall+0x0/0x40
> > --- Exception: c00 at kernel_thread+0x28/0x68
> > LR = __call_usermodehelper+0x40/0xdc
> > [ef873f10] [c005f5c8] async_run_entry_fn+0x128/0x1e4 (unreliable)
> > [ef873f20] [ef878c00] 0xef878c00
> > [ef873f40] [c004e668] process_one_work+0x150/0x3f0
> > [ef873f70] [c00514f0] worker_thread+0x18c/0x37c
> > [ef873fb0] [c00567bc] kthread+0x84/0x88
> > [ef873ff0] [c000f514] kernel_thread+0x4c/0x68
Here what we have is a kernel thread trying to call_usermodehelper
(probably the hotplug stuff from discovering the disk) taking a timer
interrupt from wake_up_new_task.
Looks like the hang has to be with some RCU stuff update_process_times
does vs. taking the interrupt in wake_up_new_task ? Hard to tell...
This happens more/less reliably once at boot on this machine, then no
more (after the long pause the machine moves on, apparently fine).
Cheers,
Ben.
> > pata-macio 0.00020000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 0
> > scsi1 : pata_macio
> > ata2: PATA max MWDMA2 irq 20
> > sdb: [mac] sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 sdb13
> > sd 0:0:1:0: [sdb] Attached SCSI disk
> > ata2.00: ATAPI: PIONEER DVD-RW DVR-104, A227, max UDMA/33
> > ata2.00: configured for MWDMA2
> > scsi 1:0:0:0: CD-ROM PIONEER DVD-RW DVR-104 A227 PQ: 0 ANSI: 5
> > pata-macio 0.00021000:ata-3: Activating pata-macio chipset KeyLargo ATA-3, Apple bus ID 1
> > scsi2 : pata_macio
> > ata3: PATA max MWDMA2 irq 21
> > sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
> > cdrom: Uniform CD-ROM driver Revision: 3.20
> > sr 1:0:0:0: Attached scsi CD-ROM sr0
> > pcnet32: pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de
> > sr 1:0:0:0: Attached scsi generic sg2 type 5
> > sungem.c:v1.0 David S. Miller <davem@redhat.com>
> > gem 0002:20:0f.0: eth0: Sun GEM (PCI) 10/100/1000BaseT Ethernet 00:03:93:6f:04:b2
> > PPP generic driver version 2.4.2
> > PPP Deflate Compression module registered
> > ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> > ohci_hcd 0001:10:18.0: OHCI Host Controller
> > ohci_hcd 0001:10:18.0: new USB bus registered, assigned bus number 1
> > ohci_hcd 0001:10:18.0: irq 27, io mem 0x80081000
> > hub 1-0:1.0: USB hub found
> > hub 1-0:1.0: 2 ports detected
> > ohci_hcd 0001:10:19.0: OHCI Host Controller
> > ohci_hcd 0001:10:19.0: new USB bus registered, assigned bus number 2
> > ohci_hcd 0001:10:19.0: irq 28, io mem 0x80080000
> > hub 2-0:1.0: USB hub found
> > hub 2-0:1.0: 2 ports detected
> > mousedev: PS/2 mouse device common for all mice
> > usbcore: registered new interface driver appletouch
> > PowerMac i2c bus pmu 2 registered
> > i2c i2c-0: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/rtc
> > i2c i2c-0: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/power-mgt
> > PowerMac i2c bus pmu 1 registered
> > i2c i2c-1: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/rtc
> > i2c i2c-1: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/via-pmu@16000/power-mgt
> > PowerMac i2c bus mac-io 0 registered
> > i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/cereal
> > i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/deq
> > i2c i2c-2: i2c-powermac: invalid reg on /pci@f2000000/mac-io@17/i2c@18000/i2c-modem
> > PowerMac i2c bus uni-n 1 registered
> > i2c i2c-3: i2c-powermac: invalid reg on /uni-n@f8000000/i2c@f8001000/cereal
> > PowerMac i2c bus uni-n 0 registered
> > i2c i2c-4: i2c-powermac: invalid reg on /uni-n@f8000000/i2c@f8001000/cereal
> > APM Battery Driver
> > usbcore: registered new interface driver usbhid
> > usbhid: USB HID core driver
> > oprofile: using ppc/7450 performance monitoring.
> > TCP: cubic registered
> > Initializing XFRM netlink socket
> > NET: Registered protocol family 17
> > NET: Registered protocol family 15
> > Key type dns_resolver registered
> > PM: Hibernation image not present or could not be loaded.
> > input: PMU as /devices/virtual/input/input0
> > kjournald starting. Commit interval 5 seconds
> > EXT3-fs (sda15): mounted filesystem with writeback data mode
> > VFS: Mounted root (ext3 filesystem) readonly on device 8:15.
> > usb 1-1: new full-speed USB device number 2 using ohci_hcd
> > devtmpfs: mounted
> > Freeing unused kernel memory: 240k freed
> > hub 1-1:1.0: USB hub found
> > hub 1-1:1.0: 3 ports detected
> > usb 1-2: new full-speed USB device number 3 using ohci_hcd
> > hub 1-2:1.0: USB hub found
> > hub 1-2:1.0: 3 ports detected
> > usb 1-1.3: new full-speed USB device number 4 using ohci_hcd
> > input: Mitsumi Electric Apple Extended USB Keyboard as /devices/pci0001:10/0001:10:18.0/usb1/1-1/1-1.3/1-1.3:1.0/input/input1
> > hid-generic 0003:05AC:020B.0001: input: USB HID v1.10 Keyboard [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:18.0-1.3/input0
> > input: Mitsumi Electric Apple Extended USB Keyboard as /devices/pci0001:10/0001:10:18.0/usb1/1-1/1-1.3/1-1.3:1.1/input/input2
> > hid-generic 0003:05AC:020B.0002: input: USB HID v1.10 Device [Mitsumi Electric Apple Extended USB Keyboard] on usb-0001:10:18.0-1.3/input1
> > usb 1-2.3: new low-speed USB device number 5 using ohci_hcd
> > hid-generic 0003:05AC:9219.0003: claimed by neither input, hiddev nor hidraw
> > udevd[1209]: starting version 175
> > EXT3-fs (sda15): using internal journal
> > sungem_phy: PHY ID: 2060e1, addr: 0
> > gem 0002:20:0f.0: eth0: Found BCM5421 PHY
> > gem 0002:20:0f.0: eth0: Link is up at 100 Mbps, full-duplex
> > gem 0002:20:0f.0: eth0: Pause is enabled (rxfifo: 10240 off: 7168 on: 5632)
> > sshd (2042): /proc/2042/oom_adj is deprecated, please use /proc/2042/oom_score_adj instead.
> >
> >
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RCU stalls on 32-bit pmac SMP
2012-06-18 20:45 ` Benjamin Herrenschmidt
@ 2012-06-18 21:58 ` Paul E. McKenney
2012-06-19 14:19 ` Paul E. McKenney
0 siblings, 1 reply; 5+ messages in thread
From: Paul E. McKenney @ 2012-06-18 21:58 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On Tue, Jun 19, 2012 at 06:45:31AM +1000, Benjamin Herrenschmidt wrote:
> Hi Paul !
>
> On Mon, 2012-06-18 at 10:05 -0700, Paul E. McKenney wrote:
>
> > > sd 0:0:0:0: Attached scsi generic sg0 type 0
> > > sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15
> > > sd 0:0:0:0: [sda] Attached SCSI disk
> > > scsi 0:0:1:0: Direct-Access ATA ST380021A 3.75 PQ: 0 ANSI: 5
> > > sd 0:0:1:0: [sdb] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
> > > sd 0:0:1:0: [sdb] Write Protect is off
> > > sd 0:0:1:0: Attached scsi generic sg1 type 0
> > > sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
> > > sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > INFO: rcu_sched self-detected stall on CPU { 0} (t=16163 jiffies)
> > > Call Trace:
> > > INFO: rcu_sched self-detected stall on CPU { 1} (t=16163 jiffies)
> > > Call Trace:
> > > [ef877d30] [c0008d04] show_stack+0x50/0x158 (unreliable)
> > > [ef877d70] [c0097fe4] __rcu_pending+0x184/0x46c
> > > [ef877da0] [c00991a0] rcu_check_callbacks+0x7c/0x168
> > > [ef877dc0] [c0044a40] update_process_times+0x3c/0x70
> > > [ef877de0] [c0083a3c] tick_sched_timer+0x88/0x100
> > > [ef877e10] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> > > [ef877e30] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> > > [ef877ea0] [c000b5c4] timer_interrupt+0x1cc/0x204
> > > [ef877ed0] [c0011b88] ret_from_except+0x0/0x1c
> > > --- Exception: 901 at cpu_idle+0xe4/0x188
> >
> > Am I reading this correctly? Did the system really take a scheduling-clock
> > interrupt from within another scheduling-clock interrupt?
>
> Not that I can see. I see only one interrupt on that CPU. It was idle,
> took a timer interrupt -> hrtimer ... etc... RCU boom !
>
> > If so, my first question is "Why did the scheduling-clock interrupt
> > run for so long?" ;-)
>
> This is still that CPU:
>
> > LR = cpu_idle+0xc8/0x188
> > > [ef877f90] [c00097e8] cpu_idle+0x60/0x188 (unreliable)
> > > [ef877fc0] [c046531c] start_secondary+0x2c8/0x2cc
> > > [ef877ff0] [00003278] 0x3278
>
> And now we have the other one:
>
> > > [ef873b60] [c0008d04] show_stack+0x50/0x158 (unreliable)
> > > [ef873ba0] [c0097fe4] __rcu_pending+0x184/0x46c
> > > [ef873bd0] [c0099240] rcu_check_callbacks+0x11c/0x168
> > > [ef873bf0] [c0044a40] update_process_times+0x3c/0x70
> > > [ef873c10] [c0083a3c] tick_sched_timer+0x88/0x100
> > > [ef873c40] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> > > [ef873c60] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> > > [ef873cd0] [c000b5c4] timer_interrupt+0x1cc/0x204
> > > [ef873d00] [c0011b88] ret_from_except+0x0/0x1c
> > > --- Exception: 901 at wake_up_new_task+0x134/0x16c
> > > LR = wake_up_new_task+0x134/0x16c
> > > [ef873dc0] [c0065f08] wake_up_new_task+0xfc/0x16c (unreliable)
> > > [ef873df0] [c0035530] do_fork+0xe8/0x2bc
> > > [ef873e30] [c0008a4c] sys_clone+0x50/0x90
> > > [ef873e50] [c00114b8] ret_from_syscall+0x0/0x40
> > > --- Exception: c00 at kernel_thread+0x28/0x68
> > > LR = __call_usermodehelper+0x40/0xdc
> > > [ef873f10] [c005f5c8] async_run_entry_fn+0x128/0x1e4 (unreliable)
> > > [ef873f20] [ef878c00] 0xef878c00
> > > [ef873f40] [c004e668] process_one_work+0x150/0x3f0
> > > [ef873f70] [c00514f0] worker_thread+0x18c/0x37c
> > > [ef873fb0] [c00567bc] kthread+0x84/0x88
> > > [ef873ff0] [c000f514] kernel_thread+0x4c/0x68
>
> Here what we have is a kernel thread trying to call_usermodehelper
> (probably the hotplug stuff from discovering the disk) taking a timer
> interrupt from wake_up_new_task.
>
> Looks like the hang has to be with some RCU stuff update_process_times
> does vs. taking the interrupt in wake_up_new_task ? Hard to tell...
The RCU stuff from update_process_times() is usually the diagnostic.
> This happens more/less reliably once at boot on this machine, then no
> more (after the long pause the machine moves on, apparently fine).
What happens if you reduce the stall-warning timeout? You can use either
the CONFIG_RCU_CPU_STALL_TIMEOUT kernel config parameter or the
rcutree.rcu_cpu_stall_timeout boot parameter.
If you can get two stall-warning messages to appear during the hang,
comparing the stacks is usually informative.
Thanx, Paul
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: RCU stalls on 32-bit pmac SMP
2012-06-18 21:58 ` Paul E. McKenney
@ 2012-06-19 14:19 ` Paul E. McKenney
0 siblings, 0 replies; 5+ messages in thread
From: Paul E. McKenney @ 2012-06-19 14:19 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On Mon, Jun 18, 2012 at 02:58:57PM -0700, Paul E. McKenney wrote:
> On Tue, Jun 19, 2012 at 06:45:31AM +1000, Benjamin Herrenschmidt wrote:
> > Hi Paul !
> >
> > On Mon, 2012-06-18 at 10:05 -0700, Paul E. McKenney wrote:
> >
> > > > sd 0:0:0:0: Attached scsi generic sg0 type 0
> > > > sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15
> > > > sd 0:0:0:0: [sda] Attached SCSI disk
> > > > scsi 0:0:1:0: Direct-Access ATA ST380021A 3.75 PQ: 0 ANSI: 5
> > > > sd 0:0:1:0: [sdb] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
> > > > sd 0:0:1:0: [sdb] Write Protect is off
> > > > sd 0:0:1:0: Attached scsi generic sg1 type 0
> > > > sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
> > > > sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > INFO: rcu_sched self-detected stall on CPU { 0} (t=16163 jiffies)
> > > > Call Trace:
> > > > INFO: rcu_sched self-detected stall on CPU { 1} (t=16163 jiffies)
> > > > Call Trace:
> > > > [ef877d30] [c0008d04] show_stack+0x50/0x158 (unreliable)
> > > > [ef877d70] [c0097fe4] __rcu_pending+0x184/0x46c
> > > > [ef877da0] [c00991a0] rcu_check_callbacks+0x7c/0x168
> > > > [ef877dc0] [c0044a40] update_process_times+0x3c/0x70
> > > > [ef877de0] [c0083a3c] tick_sched_timer+0x88/0x100
> > > > [ef877e10] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> > > > [ef877e30] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> > > > [ef877ea0] [c000b5c4] timer_interrupt+0x1cc/0x204
> > > > [ef877ed0] [c0011b88] ret_from_except+0x0/0x1c
> > > > --- Exception: 901 at cpu_idle+0xe4/0x188
> > >
> > > Am I reading this correctly? Did the system really take a scheduling-clock
> > > interrupt from within another scheduling-clock interrupt?
> >
> > Not that I can see. I see only one interrupt on that CPU. It was idle,
> > took a timer interrupt -> hrtimer ... etc... RCU boom !
> >
> > > If so, my first question is "Why did the scheduling-clock interrupt
> > > run for so long?" ;-)
> >
> > This is still that CPU:
> >
> > > LR = cpu_idle+0xc8/0x188
> > > > [ef877f90] [c00097e8] cpu_idle+0x60/0x188 (unreliable)
> > > > [ef877fc0] [c046531c] start_secondary+0x2c8/0x2cc
> > > > [ef877ff0] [00003278] 0x3278
> >
> > And now we have the other one:
> >
> > > > [ef873b60] [c0008d04] show_stack+0x50/0x158 (unreliable)
> > > > [ef873ba0] [c0097fe4] __rcu_pending+0x184/0x46c
> > > > [ef873bd0] [c0099240] rcu_check_callbacks+0x11c/0x168
> > > > [ef873bf0] [c0044a40] update_process_times+0x3c/0x70
> > > > [ef873c10] [c0083a3c] tick_sched_timer+0x88/0x100
> > > > [ef873c40] [c005b11c] __run_hrtimer.clone.29+0x54/0x104
> > > > [ef873c60] [c005bf44] hrtimer_interrupt+0x158/0x3f8
> > > > [ef873cd0] [c000b5c4] timer_interrupt+0x1cc/0x204
> > > > [ef873d00] [c0011b88] ret_from_except+0x0/0x1c
> > > > --- Exception: 901 at wake_up_new_task+0x134/0x16c
> > > > LR = wake_up_new_task+0x134/0x16c
> > > > [ef873dc0] [c0065f08] wake_up_new_task+0xfc/0x16c (unreliable)
> > > > [ef873df0] [c0035530] do_fork+0xe8/0x2bc
> > > > [ef873e30] [c0008a4c] sys_clone+0x50/0x90
> > > > [ef873e50] [c00114b8] ret_from_syscall+0x0/0x40
> > > > --- Exception: c00 at kernel_thread+0x28/0x68
> > > > LR = __call_usermodehelper+0x40/0xdc
> > > > [ef873f10] [c005f5c8] async_run_entry_fn+0x128/0x1e4 (unreliable)
> > > > [ef873f20] [ef878c00] 0xef878c00
> > > > [ef873f40] [c004e668] process_one_work+0x150/0x3f0
> > > > [ef873f70] [c00514f0] worker_thread+0x18c/0x37c
> > > > [ef873fb0] [c00567bc] kthread+0x84/0x88
> > > > [ef873ff0] [c000f514] kernel_thread+0x4c/0x68
> >
> > Here what we have is a kernel thread trying to call_usermodehelper
> > (probably the hotplug stuff from discovering the disk) taking a timer
> > interrupt from wake_up_new_task.
> >
> > Looks like the hang has to be with some RCU stuff update_process_times
> > does vs. taking the interrupt in wake_up_new_task ? Hard to tell...
>
> The RCU stuff from update_process_times() is usually the diagnostic.
>
> > This happens more/less reliably once at boot on this machine, then no
> > more (after the long pause the machine moves on, apparently fine).
>
> What happens if you reduce the stall-warning timeout? You can use either
> the CONFIG_RCU_CPU_STALL_TIMEOUT kernel config parameter or the
> rcutree.rcu_cpu_stall_timeout boot parameter.
>
> If you can get two stall-warning messages to appear during the hang,
> comparing the stacks is usually informative.
Also, would you be willing to send along your .config?
Thanx, Paul
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-06-19 14:20 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-18 0:35 RCU stalls on 32-bit pmac SMP Benjamin Herrenschmidt
2012-06-18 17:05 ` Paul E. McKenney
2012-06-18 20:45 ` Benjamin Herrenschmidt
2012-06-18 21:58 ` Paul E. McKenney
2012-06-19 14:19 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).