* [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
@ 2009-03-31 9:27 Sachin Sant
2009-03-31 22:44 ` Benjamin Herrenschmidt
From: Sachin Sant @ 2009-03-31 9:27 UTC (permalink / raw)
To: linuxppc-dev, Benjamin Herrenschmidt
[-- Attachment #1: Type: text/plain, Size: 2020 bytes --]
While executing the CPU HotPlug[1] tests, I observed that an exception
is thrown every time a CPU is offlined.
cpu 0x2: Vector: 700 (Program Check) at [c0000000074c7ca0]
pc: 00000000007b6640
lr: 000000000079ddc0
sp: c0000000074c7f20
msr: 8000000000081002
current = 0xc0000000fe1c8580
paca = 0xc000000000ab2800
pid = 0, comm = swapper
2:mon> r
R00 = 0000000000000000 R16 = 0000000000000002
R01 = c0000000074c7f20 R17 = 0000000000000000
R02 = 00000000009e8dc0 R18 = 0000000000000000
R03 = 0000000000008278 R19 = 0000000000000000
R04 = 0000000000008000 R20 = 0000000000000000
R05 = 0000000000000002 R21 = 0000000000000000
R06 = 0000000000000002 R22 = c000000000b33ae0
R07 = 0000000000000000 R23 = 0000000000000000
R08 = 0000000000000000 R24 = 0000000000000002
R09 = 00000000000082fc R25 = 0000000000000000
R10 = 0000000000000000 R26 = 0000000000000004
R11 = a000000000001002 R27 = c000000000a95bd8
R12 = a000000000000000 R28 = 0000000000000008
R13 = c000000000ab2800 R29 = ffffffffffffffff
R14 = 0000000000000000 R30 = c00000000095e750
R15 = 0000000007531868 R31 = 0000000007d70b20
pc = 00000000007b6640
lr = 000000000079ddc0
msr = 8000000000081002 cr = 22000004
ctr = 0000000000000000 xer = 0000000000000020 trap = 700
2:mon> u
SLB contents of cpu 2
00 c000000008000000 40004f7ca3000500 1T ESID= c00000 VSID= 4f7ca3 LLP:100
01 d000000008000000 4000eb71b0000510 1T ESID= d00000 VSID= eb71b0 LLP:110
24 0000000008000000 0000000000000c80 256M ESID= 0 VSID= 0 LLP: 0
2:mon>
I can recreate this problem very easily on POWER5
as well as POWER6 boxes.
2.6.29-git6 did not have this problem. Let me know if there
is any other information I can provide. I have attached the
dmesg log here.
Thanks
-Sachin
[1] -> CPU Hotplug test which is part of LTP.
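For readers unfamiliar with the test: CPU hotplug on Linux is driven through sysfs, where writing 0 to /sys/devices/system/cpu/cpuN/online takes CPU N offline and writing 1 brings it back. A stress test such as the LTP one presumably repeats that cycle. The minimal C sketch below performs one such cycle. The helper names (cpu_online_path, write_online, cycle_cpu) are hypothetical, and the sysfs directory is a parameter so the code can be exercised without root or real hotplug.

```c
#include <stdio.h>

/*
 * Build the per-CPU hotplug control file path, e.g.
 * "/sys/devices/system/cpu/cpu2/online".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int cpu_online_path(char *buf, size_t len, const char *cpu_dir, int cpu)
{
    int n = snprintf(buf, len, "%s/cpu%d/online", cpu_dir, cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "0" (take the CPU offline) or "1" (bring it back online). */
static int write_online(const char *path, int online)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", online ? 1 : 0) > 0;
    /* With stdio buffering, the actual write(), and any error the
     * kernel returns for it, happens at fclose() time. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* One offline/online cycle, the unit a hotplug stress test repeats. */
static int cycle_cpu(const char *cpu_dir, int cpu)
{
    char path[256];

    if (cpu_online_path(path, sizeof(path), cpu_dir, cpu) != 0)
        return -1;
    if (write_online(path, 0) != 0)   /* offline */
        return -1;
    return write_online(path, 1);     /* back online */
}
```

On a real machine, run as root, cycle_cpu("/sys/devices/system/cpu", 2) would exercise the same offline/online path as in the report.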
--
---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
[-- Attachment #2: dmesg_cpu_hotplug --]
[-- Type: text/plain, Size: 9380 bytes --]
<6>Phyp-dump disabled at boot time.
<6>Using pSeries machine description.
<7>Page orders: linear mapping = 24, virtual = 16, io = 12.
<6>Using 1TB segments.
<4>Found initrd at 0xc0000000034d0000:0xc000000003c7f14f.
<6>console [udbg0] enabled.
<6>Partition configured for 4 cpus..
<6>CPU maps initialized for 2 threads per core.
<7> (thread shift is 1).
<4>Starting Linux PPC64 #3 SMP Tue Mar 31 14:33:34 IST 2009.
<4>-----------------------------------------------------.
<4>ppc64_pft_size = 0x1a.
<4>physicalMemorySize = 0x100000000.
<4>htab_hash_mask = 0x7ffff.
<4>-----------------------------------------------------.
<6>Initializing cgroup subsys cpuset.
<6>Initializing cgroup subsys cpu.
<5>Linux version 2.6.29-git7 (root@llm62) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #3 SMP Tue Mar 31 14:33:34 IST 2009.
<4>[boot]0012 Setup Arch.
<7>Node 0 Memory: 0x0-0x100000000.
<4>EEH: No capable adapters found.
<6>PPC64 nvram contains 15360 bytes.
<7>Using shared processor idle loop.
<4>Zone PFN ranges:.
<4> DMA 0x00000000 -> 0x00010000.
<4> Normal 0x00010000 -> 0x00010000.
<4>Movable zone start PFN for each node.
<4>early_node_map[1] active PFN ranges.
<4> 0: 0x00000000 -> 0x00010000.
<7>On node 0 totalpages: 65536.
<7> DMA zone: 56 pages used for memmap.
<7> DMA zone: 0 pages reserved.
<7> DMA zone: 65480 pages, LIFO batch:1.
<4>[boot]0015 Setup Done.
<4>Built 1 zonelists in Node order, mobility grouping on. Total pages: 65480.
<4>Policy zone: DMA.
<5>Kernel command line: root=/dev/sda5 sysrq=1 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M .
<6>NR_IRQS:512.
<4>[boot]0020 XICS Init.
<4>[boot]0021 XICS Done.
<7>pic: no ISA interrupt controller.
<4>PID hash table entries: 4096 (order: 12, 32768 bytes).
<7>time_init: decrementer frequency = 512.000000 MHz.
<7>time_init: processor frequency = 4704.000000 MHz.
<6>clocksource: timebase mult[7d0000] shift[22] registered.
<7>clockevent: decrementer mult[8312] shift[16] cpu[0].
<4>Console: colour dummy device 80x25.
<6>console handover: boot [udbg0] -> real [hvc0].
<6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes).
<6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes).
<6>allocated 2621440 bytes of page_cgroup.
<6>please try cgroup_disable=memory option if you don't want.
<4>freeing bootmem node 0.
<6>Memory: 4119872k/4194304k available (8192k kernel code, 74432k reserved, 1984k data, 4194k bss, 448k init).
<6>Calibrating delay loop... 1022.36 BogoMIPS (lpj=5111808).
<6>Security Framework initialized.
<6>SELinux: Disabled at boot..
<4>Mount-cache hash table entries: 4096.
<6>Initializing cgroup subsys debug.
<6>Initializing cgroup subsys ns.
<6>Initializing cgroup subsys cpuacct.
<6>Initializing cgroup subsys memory.
<6>Initializing cgroup subsys devices.
<6>Initializing cgroup subsys freezer.
<7>clockevent: decrementer mult[8312] shift[16] cpu[1].
<4>Processor 1 found..
<7>clockevent: decrementer mult[8312] shift[16] cpu[2].
<4>Processor 2 found..
<7>clockevent: decrementer mult[8312] shift[16] cpu[3].
<4>Processor 3 found..
<6>Brought up 4 CPUs.
<7>Node 0 CPUs: 0-3.
<7>CPU0 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 0 1.
<7> domain 1: span 0-3 level CPU.
<7> groups: 0-1 2-3.
<7> domain 2: span 0-3 level NODE.
<7> groups: 0-3.
<7>CPU1 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 1 0.
<7> domain 1: span 0-3 level CPU.
<7> groups: 0-1 2-3.
<7> domain 2: span 0-3 level NODE.
<7> groups: 0-3.
<7>CPU2 attaching sched-domain:.
<7> domain 0: span 2-3 level SIBLING.
<7> groups: 2 3.
<7> domain 1: span 0-3 level CPU.
<7> groups: 2-3 0-1.
<7> domain 2: span 0-3 level NODE.
<7> groups: 0-3.
<7>CPU3 attaching sched-domain:.
<7> domain 0: span 2-3 level SIBLING.
<7> groups: 3 2.
<7> domain 1: span 0-3 level CPU.
<7> groups: 2-3 0-1.
<7> domain 2: span 0-3 level NODE.
<7> groups: 0-3.
<6>net_namespace: 1888 bytes.
<6>NET: Registered protocol family 16.
<6>IBM eBus Device Driver.
<6>PCI: Probing PCI hardware.
<7>PCI: Probing PCI hardware done.
<4>bio: create slab <bio-0> at 0.
<6>usbcore: registered new interface driver usbfs.
<6>usbcore: registered new interface driver hub.
<6>usbcore: registered new device driver usb.
<6>NET: Registered protocol family 2.
<7>Switched to high resolution mode on CPU 0.
<7>Switched to high resolution mode on CPU 1.
<7>Switched to high resolution mode on CPU 2.
<7>Switched to high resolution mode on CPU 3.
<6>IP route cache hash table entries: 32768 (order: 2, 262144 bytes).
<6>TCP established hash table entries: 131072 (order: 5, 2097152 bytes).
<6>TCP bind hash table entries: 65536 (order: 4, 1048576 bytes).
<6>TCP: Hash tables configured (established 131072 bind 65536).
<6>TCP reno registered.
<6>NET: Registered protocol family 1.
<6>Unpacking initramfs... done.
<4>Freeing initrd memory: 7868k freed.
<6>IOMMU table initialized, virtual merging enabled.
<7>RTAS daemon started.
<6>audit: initializing netlink socket (disabled).
<5>type=2000 audit(1238490478.637:1): initialized.
<6>Kprobe smoke test started.
<6>Kprobe smoke test passed successfully.
<6>HugeTLB registered 16 MB page size, pre-allocated 0 pages.
<6>HugeTLB registered 16 GB page size, pre-allocated 0 pages.
<5>VFS: Disk quotas dquot_6.5.2.
<4>Dquot-cache hash table entries: 8192 (order 0, 65536 bytes).
<6>msgmni has been set to 8060.
<6>alg: No test for stdrng (krng).
<6>Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254).
<6>io scheduler noop registered.
<6>io scheduler anticipatory registered.
<6>io scheduler deadline registered.
<6>io scheduler cfq registered (default).
<6>pci_hotplug: PCI Hot Plug PCI Core version: 0.5.
<6>rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1.
<7>vio_register_driver: driver hvc_console registering.
<7>HVSI: registered 0 devices.
<6>Generic RTC Driver v1.07.
<6>Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled.
<6>pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>).
<6>input: Macintosh mouse button emulation as /devices/virtual/input/input0.
<6>Uniform Multi-Platform E-IDE driver.
<6>ide-gd driver 1.18.
<6>IBM eHEA ethernet device driver (Release EHEA_0100).
<6>ehea: eth0: Jumbo frames are disabled.
<6>ehea: eth0 -> logical port id #2.
<6>ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver.
<6>ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver.
<6>mice: PS/2 mouse device common for all mice.
<6>EDAC MC: Ver: 2.1.0 Mar 31 2009.
<6>usbcore: registered new interface driver hiddev.
<6>usbcore: registered new interface driver usbhid.
<6>usbhid: v2.6:USB HID core driver.
<6>TCP cubic registered.
<6>NET: Registered protocol family 15.
<4>registered taskstats version 1.
<4>Freeing unused kernel memory: 448k freed.
<6>SysRq : Changing Loglevel.
<4>Loglevel set to 1.
<5>SCSI subsystem initialized.
<7>vio_register_driver: driver ibmvscsi registering.
<6>ibmvscsi 30000002: SRP_VERSION: 16.a.
<6>scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8.
<6>ibmvscsi 30000002: partner initialization complete.
<6>ibmvscsi 30000002: sent SRP login.
<6>ibmvscsi 30000002: SRP_LOGIN succeeded.
<6>ibmvscsi 30000002: host srp version: 16.a, host partition VIO (1), OS 3, max io 1048576.
<5>scsi 0:0:1:0: Direct-Access AIX VDASD 0001 PQ: 0 ANSI: 3.
<6>udevd version 128 started.
<4>Driver 'sd' needs updating - please use bus_type methods.
<5>sd 0:0:1:0: [sda] 167772160 512-byte hardware sectors: (85.8 GB/80.0 GiB).
<5>sd 0:0:1:0: [sda] Write Protect is off.
<7>sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08.
<5>sd 0:0:1:0: [sda] Cache data unavailable.
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through.
<5>sd 0:0:1:0: [sda] Cache data unavailable.
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through.
<6> sda: sda1 sda2 < sda5 > sda3 sda4.
<5>sd 0:0:1:0: [sda] Attached SCSI disk.
<6>kjournald starting. Commit interval 5 seconds.
<6>EXT3 FS on sda5, internal journal.
<6>EXT3-fs: mounted filesystem with ordered data mode..
<6>udevd version 128 started.
<5>sd 0:0:1:0: Attached scsi generic sg0 type 0.
<6>Adding 1044096k swap on /dev/sda3. Priority:-1 extents:1 across:1044096k .
<6>device-mapper: uevent: version 1.0.3.
<6>device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel@redhat.com.
<6>loop: module loaded.
<6>fuse init (API version 7.11).
<6>ehea: eth0: Physical port up.
<6>ehea: External switch port is backup port.
<6>NET: Registered protocol family 10.
<6>lo: Disabled Privacy Extensions.
<7>eth0: no IPv6 routers present.
<4>cpu 2 (hwid 2) Ready to die....
<7>CPU0 attaching NULL sched-domain..
<7>CPU1 attaching NULL sched-domain..
<7>CPU2 attaching NULL sched-domain..
<7>CPU3 attaching NULL sched-domain..
<7>CPU0 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 0 1.
<7> domain 1: span 0-1,3 level CPU.
<7> groups: 0-1 3.
<7> domain 2: span 0-1,3 level NODE.
<7> groups: 0-1,3.
<7>CPU1 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 1 0.
<7> domain 1: span 0-1,3 level CPU.
<7> groups: 0-1 3.
<7> domain 2: span 0-1,3 level NODE.
<7> groups: 0-1,3.
<7>CPU3 attaching sched-domain:.
<7> domain 0: span 0-1,3 level CPU.
<7> groups: 3 0-1.
<7> domain 1: span 0-1,3 level NODE.
<7> groups: 0-1,3.....................
* Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
2009-03-31 9:27 [ppc64] 2.6.29-git7 : offlining a cpu causes an exception Sachin Sant
@ 2009-03-31 22:44 ` Benjamin Herrenschmidt
2009-04-01 6:40 ` Sachin Sant
From: Benjamin Herrenschmidt @ 2009-03-31 22:44 UTC (permalink / raw)
To: Sachin Sant; +Cc: linuxppc-dev
On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
> While executing the CPU HotPlug[1] tests, I observed that an exception
> is thrown every time a CPU is offlined.
Looks like a BUG_ON() to me... can you look at what other
messages appear just before that?
Either that, or look up where the PC and LR values are in System.map
and maybe get us a backtrace from xmon?
(You seem to have no symbols; have you built with kallsyms?)
Ben.
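Ben's suggestion of looking up the PC and LR in System.map amounts to a simple search: System.map lists kernel symbols sorted by address, one "<address> <type> <name>" entry per line, and an address resolves to the last symbol at or below it, plus an offset. The C sketch below is illustrative only. Its two map entries are not copied from a real System.map; they are back-computed from the symbol+offset values xmon printed later in this thread (.kmem_cache_init+0x2d8 for the PC, .mem_init+0x150 for the LR). The struct and function names are hypothetical. Because System.map carries no symbol sizes, an address in a gap between functions is still attributed to the preceding symbol.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 64

struct sym {
    uint64_t addr;
    char name[128];
};

/* Parse System.map text: one "<hex addr> <type> <name>" entry per line,
 * already sorted by address.  Returns the number of entries stored. */
static int parse_map(const char *text, struct sym *syms, int max)
{
    int n = 0;

    while (*text != '\0' && n < max) {
        unsigned long long addr;
        char name[128];

        /* %*c skips the one-letter symbol type (T, t, D, ...) */
        if (sscanf(text, "%llx %*c %127s", &addr, name) == 2) {
            syms[n].addr = (uint64_t)addr;
            strcpy(syms[n].name, name);
            n++;
        }
        const char *nl = strchr(text, '\n');
        if (nl == NULL)
            break;
        text = nl + 1;
    }
    return n;
}

/* Resolve addr to the last symbol at or below it ("name+offset").
 * Returns the symbol index, or -1 if addr lies below every symbol. */
static int lookup(const struct sym *syms, int n, uint64_t addr, uint64_t *off)
{
    int best = -1;

    for (int i = 0; i < n && syms[i].addr <= addr; i++)
        best = i;
    if (best >= 0)
        *off = addr - syms[best].addr;
    return best;
}
```

With the two sample entries loaded, looking up 0xc0000000007b6640 returns .kmem_cache_init with offset 0x2d8, and 0xc00000000079ddc0 returns .mem_init with offset 0x150.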
> cpu 0x2: Vector: 700 (Program Check) at [c0000000074c7ca0]
> pc: 00000000007b6640
> lr: 000000000079ddc0
> sp: c0000000074c7f20
> msr: 8000000000081002
> current = 0xc0000000fe1c8580
> paca = 0xc000000000ab2800
> pid = 0, comm = swapper
> 2:mon> r
> R00 = 0000000000000000 R16 = 0000000000000002
> R01 = c0000000074c7f20 R17 = 0000000000000000
> R02 = 00000000009e8dc0 R18 = 0000000000000000
> R03 = 0000000000008278 R19 = 0000000000000000
> R04 = 0000000000008000 R20 = 0000000000000000
> R05 = 0000000000000002 R21 = 0000000000000000
> R06 = 0000000000000002 R22 = c000000000b33ae0
> R07 = 0000000000000000 R23 = 0000000000000000
> R08 = 0000000000000000 R24 = 0000000000000002
> R09 = 00000000000082fc R25 = 0000000000000000
> R10 = 0000000000000000 R26 = 0000000000000004
> R11 = a000000000001002 R27 = c000000000a95bd8
> R12 = a000000000000000 R28 = 0000000000000008
> R13 = c000000000ab2800 R29 = ffffffffffffffff
> R14 = 0000000000000000 R30 = c00000000095e750
> R15 = 0000000007531868 R31 = 0000000007d70b20
> pc = 00000000007b6640
> lr = 000000000079ddc0
> msr = 8000000000081002 cr = 22000004
> ctr = 0000000000000000 xer = 0000000000000020 trap = 700
> 2:mon> u
> SLB contents of cpu 2
> 00 c000000008000000 40004f7ca3000500 1T ESID= c00000 VSID= 4f7ca3 LLP:100
> 01 d000000008000000 4000eb71b0000510 1T ESID= d00000 VSID= eb71b0 LLP:110
> 24 0000000008000000 0000000000000c80 256M ESID= 0 VSID= 0 LLP: 0
> 2:mon>
>
> I can recreate this problem very easily on POWER5
> as well as POWER6 boxes.
>
> 2.6.29-git6 did not have this problem. Let me know if there
> is any other information I can provide. I have attached the
> dmesg log here.
>
> Thanks
> -Sachin
>
> [1] -> CPU Hotplug test which is part of LTP.
* Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
2009-03-31 22:44 ` Benjamin Herrenschmidt
@ 2009-04-01 6:40 ` Sachin Sant
2009-04-01 11:48 ` Sachin Sant
From: Sachin Sant @ 2009-04-01 6:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Benjamin Herrenschmidt wrote:
> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>
>> While executing the CPU HotPlug[1] tests, I observed that an exception
>> is thrown every time a CPU is offlined.
>>
>
> Looks like a BUG_ON() to me... can you look at what other
> messages appear just before that?
>
I don't get any other messages when the problem occurs. In fact,
if I don't have xmon enabled, the machine just hangs without
any messages on the console. I extracted the dmesg log
(attached in my previous mail) through xmon. Here are the last few
related messages from the 2.6.29-git8 kernel during problem recreation.
<4>IRQ 18 affinity broken off cpu 2
<4>cpu 2 (hwid 2) Ready to die....
<7>CPU0 attaching NULL sched-domain..
<7>CPU1 attaching NULL sched-domain..
<7>CPU2 attaching NULL sched-domain..
<7>CPU3 attaching NULL sched-domain..
<7>CPU0 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 0 1.
<7> domain 1: span 0-1,3 level CPU.
<7> groups: 0-1 3.
<7> domain 2: span 0-1,3 level NODE
<7> groups: 0-1,3.
<7>CPU1 attaching sched-domain:.
<7> domain 0: span 0-1 level SIBLING.
<7> groups: 1 0.
<7> domain 1: span 0-1,3 level CPU.
<7> groups: 0-1 3.
<7> domain 2: span 0-1,3 level NODE.
<7> groups: 0-1,3.
<7>CPU3 attaching sched-domain:.
<7> domain 0: span 0-1,3 level CPU.
<7> groups: 3 0-1.
<7> domain 1: span 0-1,3 level NODE.
<7> groups: 0-1,3...
> Either that, or look up where the PC and LR values are in System.map
> and maybe get us a backtrace from xmon?
>
> (You seem to have no symbols; have you built with kallsyms?)
I have kallsyms and debug info options enabled.
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_DEBUG_INFO=y
Here is the related information from the 2.6.29-git8 kernel.
llm62 login: cpu 0x2: Vector: 700 (Program Check) at [c0000000074c7ca0]
pc: 00000000007b6640
lr: 000000000079ddc0
sp: c0000000074c7f20
msr: 8000000000081002
current = 0xc0000000fe1c8580
paca = 0xc000000000ab2800
pid = 0, comm = swapper
enter ? for help
[c0000000074c7f20] 0000000000018694 (unreliable)
[c0000000074c7f90] 0000000000008278
SP (4f00000003) is in userspace
2:mon> la %pc
00000000007b6640
2:mon> la c0000000007b6640
c0000000007b6640: .kmem_cache_init+0x2d8/0x528
2:mon> la %lr
000000000079ddc0
2:mon> la c00000000079ddc0
c00000000079ddc0: .mem_init+0x150/0x22c
2:mon>
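The raw pc and lr values lack the 0xc000000000000000 prefix of ppc64 kernel addresses, which is why la could only resolve them once they were retyped with that prefix. The MSR in the same dump (8000000000081002) has both the IR (0x20) and DR (0x10) bits clear, which suggests the CPU was running with address translation off when it trapped, so the registers hold real rather than effective addresses. A small C sketch of that arithmetic follows, assuming the standard 64-bit Book3S MSR bit values and the KERNELBASE used by ppc64 kernels of this era; the helper names are hypothetical.

```c
#include <stdint.h>

#define MSR_IR     0x20ULL                 /* instruction translation on */
#define MSR_DR     0x10ULL                 /* data translation on */
#define KERNELBASE 0xc000000000000000ULL   /* start of the ppc64 kernel linear map */

/* Nonzero if both instruction and data translation were off (real mode). */
static int msr_real_mode(uint64_t msr)
{
    return (msr & (MSR_IR | MSR_DR)) == 0;
}

/* Map a real address inside the kernel image to the effective address
 * that System.map and xmon's symbol lookup use. */
static uint64_t real_to_kernel_ea(uint64_t real)
{
    return real + KERNELBASE;
}
```

Applied to this dump, msr_real_mode(0x8000000000081002) is true, and real_to_kernel_ea turns 0x7b6640 and 0x79ddc0 into the c0000000007b6640 and c00000000079ddc0 addresses passed to la above.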
Regards
-Sachin
* Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
2009-04-01 6:40 ` Sachin Sant
@ 2009-04-01 11:48 ` Sachin Sant
2009-04-16 5:36 ` Sachin Sant
From: Sachin Sant @ 2009-04-01 11:48 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>>
>>> While executing the CPU HotPlug[1] tests, I observed that an exception
>>> is thrown every time a CPU is offlined.
>>>
>>
>> Looks like a BUG_ON() to me... can you look at what other
>> messages appear just before that?
>
Ben, it seems the following patch is causing the CPU hotplug
test failure.
[PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
If I back out this patch, I am able to offline/online CPUs
without any issue.
Thanks
-Sachin
* Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
2009-04-01 11:48 ` Sachin Sant
@ 2009-04-16 5:36 ` Sachin Sant
2009-04-16 8:25 ` Michael Ellerman
From: Sachin Sant @ 2009-04-16 5:36 UTC (permalink / raw)
To: linuxppc-dev
Sachin Sant wrote:
> Sachin Sant wrote:
>> Benjamin Herrenschmidt wrote:
>>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
>>>
>>>> While executing the CPU HotPlug[1] tests, I observed that an exception
>>>> is thrown every time a CPU is offlined.
>>>>
>>>
>>> Looks like a BUG_ON() to me... can you look at what other
>>> messages appear just before that?
>>
> Ben, it seems the following patch is causing the CPU hotplug
> test failure.
> [PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
>
> http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
>
> If I back out this patch, I am able to offline/online CPUs
> without any issue.
I can recreate this problem with 2.6.30-rc2-git1 as well: the same BUG_ON
while running the CPU hotplug tests.
Let me know if there is anything I can do to help find a fix for this.
Thanks
-Sachin
* Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
2009-04-16 5:36 ` Sachin Sant
@ 2009-04-16 8:25 ` Michael Ellerman
2009-04-16 10:15 ` Sachin Sant
From: Michael Ellerman @ 2009-04-16 8:25 UTC (permalink / raw)
To: Sachin Sant; +Cc: linuxppc-dev
On Thu, 2009-04-16 at 11:06 +0530, Sachin Sant wrote:
> Sachin Sant wrote:
> > Sachin Sant wrote:
> >> Benjamin Herrenschmidt wrote:
> >>> On Tue, 2009-03-31 at 14:57 +0530, Sachin Sant wrote:
> >>>
> >>>> While executing the CPU HotPlug[1] tests, I observed that an exception
> >>>> is thrown every time a CPU is offlined.
> >>>>
> >>>
> >>> Looks like a BUG_ON() to me... can you look at what other
> >>> messages appear just before that?
> >>
> > Ben, it seems the following patch is causing the CPU hotplug
> > test failure.
> > [PATCH 6/6] powerpc/mm: Introduce early_init_mmu() on 64-bit
> >
> > http://ozlabs.org/pipermail/linuxppc-dev/2009-March/069613.html
> >
> > If I back out this patch, I am able to offline/online CPUs
> > without any issue.
> I can recreate this problem with 2.6.30-rc2-git1 as well: the same BUG_ON
> while running the CPU hotplug tests.
>
> Let me know if there is anything I can do to help find a fix for this.
Hi Sachin,
Does this patch, on top of Ben's patch, fix it?
cheers
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index db556d2..1ade7eb 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -753,7 +753,7 @@ void __init early_init_mmu(void)
 }
 
 #ifdef CONFIG_SMP
-void __init early_init_mmu_secondary(void)
+void __cpuinit early_init_mmu_secondary(void)
 {
 	/* Initialize hash table for that CPU */
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
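Michael's one-line fix changes the annotation on early_init_mmu_secondary() from __init to __cpuinit. In kernels of this era, code marked __init is discarded once boot completes (the "Freeing unused kernel memory: 448k freed" line in the dmesg log), whereas __cpuinit code is kept when CONFIG_HOTPLUG_CPU is enabled. A CPU that is brought back online after boot, as the hotplug test does after each offline, runs its MMU setup again, so with the __init annotation it would branch into freed memory. The PC and LR resolving to other __init functions (kmem_cache_init, mem_init) is consistent with that. The toy model below only illustrates the rule; the enum, struct and function names are hypothetical, and it does not model any real kernel data structure.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: where a function was placed. */
enum section {
    SEC_TEXT,      /* ordinary kernel text, always resident */
    SEC_INIT,      /* __init: freed once boot has finished */
    SEC_CPUINIT,   /* __cpuinit: kept if CPU hotplug is configured */
};

struct kernel_state {
    bool hotplug_cpu;      /* models CONFIG_HOTPLUG_CPU=y */
    bool init_mem_freed;   /* models "Freeing unused kernel memory" */
};

/* Is code placed in section s still resident and safe to call? */
static bool section_present(const struct kernel_state *k, enum section s)
{
    switch (s) {
    case SEC_TEXT:
        return true;
    case SEC_INIT:
        return !k->init_mem_freed;
    case SEC_CPUINIT:
        /* without hotplug support, __cpuinit is discarded like __init */
        return k->hotplug_cpu || !k->init_mem_freed;
    }
    return false;
}

/* Bringing a secondary CPU up runs its MMU setup routine, which lives in
 * section s.  false means the real kernel would branch into freed (and
 * possibly reused) memory instead. */
static bool secondary_mmu_setup_ok(const struct kernel_state *k, enum section s)
{
    return section_present(k, s);
}
```

At boot both annotations work; only after init memory is freed does the choice matter, which matches the bug appearing on re-online during the hotplug test rather than at boot.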
* Re: [ppc64] 2.6.29-git7 : offlining a cpu causes an exception
2009-04-16 8:25 ` Michael Ellerman
@ 2009-04-16 10:15 ` Sachin Sant
From: Sachin Sant @ 2009-04-16 10:15 UTC (permalink / raw)
To: michael; +Cc: linuxppc-dev
Michael Ellerman wrote:
> Does this patch, on top of Ben's patch, fix it?
>
> cheers
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index db556d2..1ade7eb 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -753,7 +753,7 @@ void __init early_init_mmu(void)
> }
>
> #ifdef CONFIG_SMP
> -void __init early_init_mmu_secondary(void)
> +void __cpuinit early_init_mmu_secondary(void)
> {
> /* Initialize hash table for that CPU */
> if (!firmware_has_feature(FW_FEATURE_LPAR))
Yes, this patch fixed the issue. Now I can offline/online CPUs without
any problem.
Thanks
-Sachin
Thread overview: 7+ messages
2009-03-31 9:27 [ppc64] 2.6.29-git7 : offlining a cpu causes an exception Sachin Sant
2009-03-31 22:44 ` Benjamin Herrenschmidt
2009-04-01 6:40 ` Sachin Sant
2009-04-01 11:48 ` Sachin Sant
2009-04-16 5:36 ` Sachin Sant
2009-04-16 8:25 ` Michael Ellerman
2009-04-16 10:15 ` Sachin Sant