linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Sachin Sant <sachinp@in.ibm.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>,
	Linux/PPC Development <linuxppc-dev@ozlabs.org>
Subject: Re: 2.6.31-git5 kernel boot hangs on powerpc
Date: Fri, 25 Sep 2009 12:45:49 +0530	[thread overview]
Message-ID: <4ABC6E25.7090904@in.ibm.com> (raw)
In-Reply-To: <4ABC376D.1020704@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2979 bytes --]

Tejun Heo wrote:
> Benjamin Herrenschmidt wrote:
>   
>>> --- Exception: 301 at .memset+0x60/0xfc
>>>     LR = .pcpu_alloc+0x718/0x8fc
>>>       
>> So it's memsetting something that causes it to hash_page(), ie, faulting
>> in pages (vmalloc space ?) so far nothing obviously wrong....
>>     
>
> It's probably memset() call near the end of pcpu_populate_chunk()
> where percpu allocator clears the allocated areas before returning to
> user.  I don't think the first chunk is causing the problem as they're
> all in the linear mapped area.  From the second chunk on, they're on
> vmalloc area and very near to the top of it, so that might be exposing
> a hidden problem in paging code?  BTW, for some reason, the problem is
> not reproducible on my powerstation.
>
> Sachin, can you please apply the attached patch on top of the current
> linus tree, reproduce the hang and report full kernel log?  Let's see
> which address is causing the problem.
>   
Here is the dmesg log captured with the debug patch.

Some of the debug messages related to PERCPU

<6>PERCPU: Embedded 2 pages/cpu @c000000001100000 s97160 r0 d33912 u524288
<6>pcpu-alloc: s97160 r0 d33912 u524288 alloc=1*1048576
<6>pcpu-alloc: [0] 0 1
<4>PERCPU: initialized 19 slots [c000000001120200,c000000001120330)
<4>PERCPU: chunk 0 relocating -1 -> 13 c000000001120380 <c000000001120380:c000000001120380>
<4>PERCPU: relocated <c0000000011202d0:c0000000011202d0>

<4>PERCPU: chunk 0 relocating 13 -> 12 c000000001120380 <c0000000011202d0:c0000000011202d0>
<4>PERCPU: relocated <c0000000011202c0:c0000000011202c0>

<4>PERCPU: chunk 0 relocating 12 -> 11 c000000001120380 <c0000000011202c0:c0000000011202c0>
<4>PERCPU: relocated <c0000000011202b0:c0000000011202b0>

<6>ehea: eth0: Physical port up
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
<4>PERCPU: chunk 0 relocating 11 -> 10 c000000001120380 <c0000000011202b0:c0000000011202b0>
<4>PERCPU: relocated <c0000000011202a0:c0000000011202a0>
<4>PERCPU: chunk 0 relocating 10 -> 9 c000000001120380 <c0000000011202a0:c0000000011202a0>
<4>PERCPU: relocated <c000000001120290:c000000001120290>
<4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00 <c0000000db70fb00:c0000000db70fb00>
<4>PERCPU: relocated <c000000001120320:c000000001120320>
<4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00 <c000000001120320:c000000001120320>
<4>PERCPU: relocated <c000000001120300:c000000001120300>
<4>PERCPU: chunk 1, alloc pages [0,1)
<4>PERCPU: chunk 1, map pages [0,1)
<4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
<4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
<4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)

Thanks
-Sachin



-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


[-- Attachment #2: pcpu-debug-log --]
[-- Type: text/plain, Size: 12987 bytes --]

1:mon> c 0
0:mon> t
[link register   ] c00000000016d0d0 .pcpu_alloc+0x8a4/0xae4
[c0000000da427960] c00000000016d094 .pcpu_alloc+0x868/0xae4 (unreliable)
[c0000000da427a90] c000000000613c08 .snmp_mib_init+0x34/0x9c
[c0000000da427b20] d00000000262e074 .ipv6_add_dev+0x1d4/0x3e4 [ipv6]
[c0000000da427bc0] d00000000265997c .addrconf_init+0x6c/0x194 [ipv6]
[c0000000da427c50] d00000000265974c .inet6_init+0x1bc/0x34c [ipv6]
[c0000000da427ce0] c0000000000097a4 .do_one_initcall+0x88/0x1bc
[c0000000da427d90] c0000000000d59fc .SyS_init_module+0x118/0x2a4
[c0000000da427e30] c0000000000085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000fff8350b568
SP (fffdc3a7f50) is in userspace
0:mon> dl
<4>Crash kernel location must be 0x2000000
<6>Reserving 256MB of memory at 32MB for crashkernel (System RAM: 4096MB)
<6>Phyp-dump disabled at boot time
<6>Using pSeries machine description
<7>Page orders: linear mapping = 24, virtual = 16, io = 12
<6>Using 1TB segments
<4>Found initrd at 0xc000000003500000:0xc000000003cca9b1
<6>bootconsole [udbg0] enabled
<6>Partition configured for 2 cpus.
<6>CPU maps initialized for 2 threads per core
<7> (thread shift is 1)
<4>Starting Linux PPC64 #2 SMP Fri Sep 25 12:31:24 IST 2009
<4>-----------------------------------------------------
<4>ppc64_pft_size                = 0x1a
<4>physicalMemorySize            = 0x100000000
<4>htab_hash_mask                = 0x7ffff
<4>-----------------------------------------------------
<6>Initializing cgroup subsys cpuset
<6>Initializing cgroup subsys cpu
<5>Linux version 2.6.31-git15 (root@mpower6lp5) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #2 SMP Fri Sep 25 12:31:24 IST 2009
<4>[boot]0012 Setup Arch
<7>Node 0 Memory:
<7>Node 2 Memory: 0x0-0xe0000000
<7>Node 3 Memory: 0xe0000000-0x100000000
<4>EEH: No capable adapters found
<6>PPC64 nvram contains 15360 bytes
<7>Using shared processor idle loop
<4>Zone PFN ranges:
<4>  DMA      0x00000000 -> 0x00010000
<4>  Normal   0x00010000 -> 0x00010000
<4>Movable zone start PFN for each node
<4>early_node_map[2] active PFN ranges
<4>    2: 0x00000000 -> 0x0000e000
<4>    3: 0x0000e000 -> 0x00010000
<4>Could not find start_pfn for node 0
<7>On node 0 totalpages: 0
<7>On node 2 totalpages: 57344
<7>  DMA zone: 56 pages used for memmap
<7>  DMA zone: 0 pages reserved
<7>  DMA zone: 57288 pages, LIFO batch:1
<7>On node 3 totalpages: 8192
<7>  DMA zone: 8 pages used for memmap
<7>  DMA zone: 0 pages reserved
<7>  DMA zone: 8184 pages, LIFO batch:0
<4>[boot]0015 Setup Done
<6>PERCPU: Embedded 2 pages/cpu @c000000001100000 s97160 r0 d33912 u524288
<6>pcpu-alloc: s97160 r0 d33912 u524288 alloc=1*1048576
<6>pcpu-alloc: [0] 0 1 
<4>PERCPU: initialized 19 slots [c000000001120200,c000000001120330)
<4>PERCPU: chunk 0 relocating -1 -> 13 c000000001120380 <c000000001120380:c000000001120380>
<4>PERCPU: relocated <c0000000011202d0:c0000000011202d0>
<4>Built 3 zonelists in Node order, mobility grouping on.  Total pages: 65472
<4>Policy zone: DMA
<5>Kernel command line: root=/dev/sda3 sysrq=8 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M xmon=on 
<6>PID hash table entries: 4096 (order: -1, 32768 bytes)
<4>freeing bootmem node 2
<4>freeing bootmem node 3
<6>Memory: 3897152k/4194304k available (9664k kernel code, 297152k reserved, 2944k data, 4274k bss, 576k init)
<6>Hierarchical RCU implementation.
<6>RCU-based detection of stalled CPUs is enabled.
<6>NR_IRQS:512
<4>[boot]0020 XICS Init
<4>[boot]0021 XICS Done
<7>pic: no ISA interrupt controller
<7>time_init: decrementer frequency = 512.000000 MHz
<7>time_init: processor frequency   = 4704.000000 MHz
<6>clocksource: timebase mult[7d0000] shift[22] registered
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
<4>Console: colour dummy device 80x25
<6>console [hvc0] enabled, bootconsole disabled
<6>allocated 2621440 bytes of page_cgroup
<6>please try 'cgroup_disable=memory' option if you don't want memory cgroups
<6>Security Framework initialized
<6>SELinux:  Disabled at boot.
<6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes)
<6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes)
<4>Mount-cache hash table entries: 4096
<6>Initializing cgroup subsys ns
<6>Initializing cgroup subsys cpuacct
<6>Initializing cgroup subsys memory
<6>Initializing cgroup subsys devices
<6>Initializing cgroup subsys freezer
<7>irq: irq 2 on host null mapped to virtual irq 16
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<6>Brought up 2 CPUs
<7>Node 0 CPUs: 0-1
<7>Node 2 CPUs:
<7>Node 3 CPUs:
<7>CPU0 attaching sched-domain:
<7> domain 0: span 0-1 level SIBLING
<7>  groups: 0 (cpu_power = 589) 1 (cpu_power = 589)
<7>  domain 1: span 0-1 level CPU
<7>   groups: 0-1 (cpu_power = 1178)
<7>CPU1 attaching sched-domain:
<7> domain 0: span 0-1 level SIBLING
<7>  groups: 1 (cpu_power = 589) 0 (cpu_power = 589)
<7>  domain 1: span 0-1 level CPU
<7>   groups: 0-1 (cpu_power = 1178)
<6>NET: Registered protocol family 16
<6>IBM eBus Device Driver
<6>POWER6 performance monitor hardware support registered
<6>PCI: Probing PCI hardware
<7>PCI: Probing PCI hardware done
<4>bio: create slab <bio-0> at 0
<4>PERCPU: chunk 0 relocating 13 -> 12 c000000001120380 <c0000000011202d0:c0000000011202d0>
<4>PERCPU: relocated <c0000000011202c0:c0000000011202c0>
<6>vgaarb: loaded
<6>usbcore: registered new interface driver usbfs
<6>usbcore: registered new interface driver hub
<6>usbcore: registered new device driver usb
<6>Switching to clocksource timebase
<6>NET: Registered protocol family 2
<6>IP route cache hash table entries: 32768 (order: 2, 262144 bytes)
<6>TCP established hash table entries: 131072 (order: 5, 2097152 bytes)
<6>TCP bind hash table entries: 65536 (order: 5, 2097152 bytes)
<6>TCP: Hash tables configured (established 131072 bind 65536)
<6>TCP reno registered
<6>NET: Registered protocol family 1
<6>Unpacking initramfs...
<7>Switched to high resolution mode on CPU 0
<7>Switched to high resolution mode on CPU 1
<7>irq: irq 655360 on host null mapped to virtual irq 17
<7>irq: irq 655367 on host null mapped to virtual irq 18
<6>IOMMU table initialized, virtual merging enabled
<7>irq: irq 589825 on host null mapped to virtual irq 19
<7>RTAS daemon started
<6>audit: initializing netlink socket (disabled)
<5>type=2000 audit(1253862433.200:1): initialized
<6>Kprobe smoke test started
<6>Kprobe smoke test passed successfully
<6>HugeTLB registered 16 MB page size, pre-allocated 0 pages
<6>HugeTLB registered 16 GB page size, pre-allocated 0 pages
<5>VFS: Disk quotas dquot_6.5.2
<4>Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
<6>Btrfs loaded
<6>msgmni has been set to 7608
<4>PERCPU: chunk 0 relocating 12 -> 11 c000000001120380 <c0000000011202c0:c0000000011202c0>
<4>PERCPU: relocated <c0000000011202b0:c0000000011202b0>
<6>alg: No test for stdrng (krng)
<6>Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
<6>io scheduler noop registered
<6>io scheduler anticipatory registered
<6>io scheduler deadline registered
<6>io scheduler cfq registered (default)
<6>pci_hotplug: PCI Hot Plug PCI Core version: 0.5
<6>rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
<7>vio_register_driver: driver hvc_console registering
<7>HVSI: registered 0 devices
<6>Generic RTC Driver v1.07
<6>Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
<6>pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh@kernel.crashing.org>)
<6>input: Macintosh mouse button emulation as /devices/virtual/input/input0
<6>Uniform Multi-Platform E-IDE driver
<6>ide-gd driver 1.18
<6>IBM eHEA ethernet device driver (Release EHEA_0102)
<7>irq: irq 590088 on host null mapped to virtual irq 264
<6>ehea: eth0: Jumbo frames are disabled
<6>ehea: eth0 -> logical port id #2
<6>ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
<6>ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
<6>mice: PS/2 mouse device common for all mice
<6>EDAC MC: Ver: 2.1.0 Sep 25 2009
<6>usbcore: registered new interface driver hiddev
<6>usbcore: registered new interface driver usbhid
<6>usbhid: v2.6:USB HID core driver
<6>TCP cubic registered
<6>NET: Registered protocol family 15
<4>registered taskstats version 1
<4>Freeing unused kernel memory: 576k freed
<6>SysRq : Changing Loglevel
<4>Loglevel set to 8
<5>SCSI subsystem initialized
<7>vio_register_driver: driver ibmvscsi registering
<6>ibmvscsi 30000007: SRP_VERSION: 16.a
<6>scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8
<6>ibmvscsi 30000007: partner initialization complete
<6>ibmvscsi 30000007: host srp version: 16.a, host partition VIO Server (1), OS 3, max io 1048576
<6>ibmvscsi 30000007: Client reserve enabled
<6>ibmvscsi 30000007: sent SRP login
<6>ibmvscsi 30000007: SRP_LOGIN succeeded
<5>scsi 0:0:1:0: Direct-Access     AIX      VDASD            0001 PQ: 0 ANSI: 3
<5>scsi 0:0:2:0: CD-ROM            AIX      VOPTA                 PQ: 0 ANSI: 4
<6>udevd version 128 started
<5>sd 0:0:1:0: [sda] 146800640 512-byte logical blocks: (75.1 GB/70.0 GiB)
<5>sd 0:0:1:0: [sda] Write Protect is off
<7>sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<6> sda: sda1 sda2 sda3
<5>sd 0:0:1:0: [sda] Cache data unavailable
<3>sd 0:0:1:0: [sda] Assuming drive cache: write through
<5>sd 0:0:1:0: [sda] Attached SCSI disk
<6>kjournald starting.  Commit interval 5 seconds
<6>EXT3 FS on sda3, internal journal
<6>EXT3-fs: mounted filesystem with writeback data mode.
<6>udevd version 128 started
<5>sd 0:0:1:0: Attached scsi generic sg0 type 0
<5>scsi 0:0:2:0: Attached scsi generic sg1 type 5
<4>sr0: scsi-1 drive
<6>Uniform CD-ROM driver Revision: 3.20
<7>sr 0:0:2:0: Attached scsi CD-ROM sr0
<6>Adding 2096320k swap on /dev/sda2.  Priority:-1 extents:1 across:2096320k 
<6>device-mapper: uevent: version 1.0.3
<6>device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
<6>loop: module loaded
<6>fuse init (API version 7.13)
<7>irq: irq 33539 on host null mapped to virtual irq 259
<6>ehea: eth0: Physical port up
<6>ehea: External switch port is backup port
<7>irq: irq 33540 on host null mapped to virtual irq 260
<6>NET: Registered protocol family 10
<4>PERCPU: chunk 0 relocating 11 -> 10 c000000001120380 <c0000000011202b0:c0000000011202b0>
<4>PERCPU: relocated <c0000000011202a0:c0000000011202a0>
<4>PERCPU: chunk 0 relocating 10 -> 9 c000000001120380 <c0000000011202a0:c0000000011202a0>
<4>PERCPU: relocated <c000000001120290:c000000001120290>
<4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00 <c0000000db70fb00:c0000000db70fb00>
<4>PERCPU: relocated <c000000001120320:c000000001120320>
<4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00 <c000000001120320:c000000001120320>
<4>PERCPU: relocated <c000000001120300:c000000001120300>
<4>PERCPU: chunk 1, alloc pages [0,1)
<4>PERCPU: chunk 1, map pages [0,1)
<4>PERCPU: map 0xd00007fffff00000, 1 pages 53544
<4>PERCPU: map 0xd00007fffff80000, 1 pages 53545
<4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=4000 jiffies)
<3>BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:1864]
<4>Modules linked in: ipv6(+) fuse loop dm_mod sr_mod cdrom sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
<4>NIP: c000000000043140 LR: c00000000016d0d0 CTR: 0000000000000040
<4>REGS: c0000000da4276e0 TRAP: 0901   Not tainted  (2.6.31-git15)
<4>MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 44224880  XER: 20000002
<4>TASK = c0000000dbbf71d0[1864] 'modprobe' THREAD: c0000000da424000 CPU: 0
<4>GPR00: 0000000000000040 c0000000da427960 c000000000b8bca8 d00007fffff00000 
<4>GPR04: 0000000000000000 0000000000000000 d00007fffff00000 80000000565a6cc0 
<4>GPR08: 0000000000000000 c000000001120180 c000000000c45aa0 00000000000003c0 
<4>GPR12: 0000000028224882 c000000000c62600 
<4>NIP [c000000000043140] .memset+0x60/0xfc
<4>LR [c00000000016d0d0] .pcpu_alloc+0x8a4/0xae4
<4>Call Trace:
<4>[c0000000da427960] [c00000000016d094] .pcpu_alloc+0x868/0xae4 (unreliable)
<4>[c0000000da427a90] [c000000000613c08] .snmp_mib_init+0x34/0x9c
<4>[c0000000da427b20] [d00000000262e074] .ipv6_add_dev+0x1d4/0x3e4 [ipv6]
<4>[c0000000da427bc0] [d00000000265997c] .addrconf_init+0x6c/0x194 [ipv6]
<4>[c0000000da427c50] [d00000000265974c] .inet6_init+0x1bc/0x34c [ipv6]
<4>[c0000000da427ce0] [c0000000000097a4] .do_one_initcall+0x88/0x1bc
<4>[c0000000da427d90] [c0000000000d59fc] .SyS_init_module+0x118/0x2a4
<4>[c0000000da427e30] [c0000000000085b4] syscall_exit+0x0/0x40
<4>Instruction dump:
<4>98860000 38c60001 409e000c b0860000 38c60002 409d000c 90860000 38c60004 
<4>78a0d183 78a506a0 7c0903a6 4182002c <f8860000> f8860008 f8860010 f8860018 
<3>INFO: RCU detected CPU 0 stall (t=7000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=10000 jiffies)
0:mon>



  parent reply	other threads:[~2009-09-25  7:15 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-16 12:25 2.6.31-git5 kernel boot hangs on powerpc Sachin Sant
2009-09-17 10:51 ` Sachin Sant
2009-09-17 11:13   ` Benjamin Herrenschmidt
2009-09-17 15:53     ` Tejun Heo
2009-09-17 16:41       ` Sachin Sant
2009-09-19  8:54         ` Sachin Sant
2009-09-23  8:23           ` Sachin Sant
2009-09-23  8:34             ` Tejun Heo
2009-09-23 14:17               ` Tejun Heo
2009-09-24  7:58                 ` Sachin Sant
2009-09-24 12:59                   ` Tejun Heo
2009-09-24 13:23                     ` Sachin Sant
2009-09-24 21:05                       ` Benjamin Herrenschmidt
2009-09-25  3:22                         ` Tejun Heo
2009-09-25  3:40                           ` Benjamin Herrenschmidt
2009-09-25  7:15                           ` Sachin Sant [this message]
2009-09-25  7:39                             ` Tejun Heo
2009-09-25  7:43                               ` Tejun Heo
2009-09-25  8:03                                 ` Sachin Sant
2009-09-25  9:01                                   ` Tejun Heo
2009-09-25  9:48                                     ` Benjamin Herrenschmidt
2009-10-05  6:54                                       ` Sachin Sant
2009-09-25  8:31                               ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ABC6E25.7090904@in.ibm.com \
    --to=sachinp@in.ibm.com \
    --cc=davem@davemloft.net \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).