From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from e28smtp02.in.ibm.com (e28smtp02.in.ibm.com [59.145.155.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "e28smtp02.in.ibm.com", Issuer "Equifax" (verified OK)) by ozlabs.org (Postfix) with ESMTPS id 38E0BB7334 for ; Fri, 25 Sep 2009 17:15:55 +1000 (EST) Received: from d28relay05.in.ibm.com (d28relay05.in.ibm.com [9.184.220.62]) by e28smtp02.in.ibm.com (8.14.3/8.13.1) with ESMTP id n8P7FoCR004665 for ; Fri, 25 Sep 2009 12:45:50 +0530 Received: from d28av03.in.ibm.com (d28av03.in.ibm.com [9.184.220.65]) by d28relay05.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id n8P7FouS2166914 for ; Fri, 25 Sep 2009 12:45:50 +0530 Received: from d28av03.in.ibm.com (loopback [127.0.0.1]) by d28av03.in.ibm.com (8.14.3/8.13.1/NCO v10.0 AVout) with ESMTP id n8P7Fn0B006495 for ; Fri, 25 Sep 2009 17:15:50 +1000 Message-ID: <4ABC6E25.7090904@in.ibm.com> Date: Fri, 25 Sep 2009 12:45:49 +0530 From: Sachin Sant MIME-Version: 1.0 To: Tejun Heo Subject: Re: 2.6.31-git5 kernel boot hangs on powerpc References: <4AB0D947.8010301@in.ibm.com> <4AB214C3.4040109@in.ibm.com> <1253185994.8375.352.camel@pasglop> <4AB25B61.9020609@kernel.org> <4AB266AF.9080705@in.ibm.com> <4AB49C37.6020003@in.ibm.com> <4AB9DAEC.3060309@in.ibm.com> <4AB9DD8F.1040305@kernel.org> <4ABA2DE2.6000601@kernel.org> <4ABB269F.6020309@in.ibm.com> <4ABB6D33.6060706@kernel.org> <4ABB72BD.9050905@in.ibm.com> <1253826309.7103.461.camel@pasglop> <4ABC376D.1020704@kernel.org> In-Reply-To: <4ABC376D.1020704@kernel.org> Content-Type: multipart/mixed; boundary="------------010001010803070307080207" Cc: David Miller , Linux/PPC Development List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , This is a multi-part message in MIME format. --------------010001010803070307080207 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Tejun Heo wrote: > Benjamin Herrenschmidt wrote: > >>> --- Exception: 301 at .memset+0x60/0xfc >>> LR = .pcpu_alloc+0x718/0x8fc >>> >> So it's memsetting something that causes it to hash_page(), ie, faulting >> in pages (vmalloc space ?) so far nothing obviously wrong.... >> > > It's probably memset() call near the end of pcpu_populate_chunk() > where percpu allocator clears the allocated areas before returning to > user. I don't think the first chunk is causing the problem as they're > all in the linear mapped area. From the second chunk on, they're on > vmalloc area and very near to the top of it, so that might be exposing > a hidden problem in paging code? BTW, for some reason, the problem is > not reproducible on my powerstation. > > Sachin, can you please apply the attached patch on top of the current > linus tree, reproduce the hang and report full kernel log? Let's see > which address is causing the problem. > Here is the dmesg log captured with the debug patch. Some of the debug messages related to PERCPU <6>PERCPU: Embedded 2 pages/cpu @c000000001100000 s97160 r0 d33912 u524288 <6>pcpu-alloc: s97160 r0 d33912 u524288 alloc=1*1048576 <6>pcpu-alloc: [0] 0 1 <4>PERCPU: initialized 19 slots [c000000001120200,c000000001120330) <4>PERCPU: chunk 0 relocating -1 -> 13 c000000001120380 <4>PERCPU: relocated <4>PERCPU: chunk 0 relocating 13 -> 12 c000000001120380 <4>PERCPU: relocated <4>PERCPU: chunk 0 relocating 12 -> 11 c000000001120380 <4>PERCPU: relocated <6>ehea: eth0: Physical port up <6>ehea: External switch port is backup port <7>irq: irq 33540 on host null mapped to virtual irq 260 <6>NET: Registered protocol family 10 <4>PERCPU: chunk 0 relocating 11 -> 10 c000000001120380 <4>PERCPU: relocated <4>PERCPU: chunk 0 relocating 10 -> 9 c000000001120380 <4>PERCPU: relocated <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00 <4>PERCPU: relocated <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00 <4>PERCPU: relocated <4>PERCPU: chunk 1, alloc pages [0,1) <4>PERCPU: chunk 1, map pages [0,1) <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544 <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545 <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000 <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies) Thanks -Sachin -- --------------------------------- Sachin Sant IBM Linux Technology Center India Systems and Technology Labs Bangalore, India --------------------------------- --------------010001010803070307080207 Content-Type: text/plain; name="pcpu-debug-log" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="pcpu-debug-log" 1:mon> c 0 0:mon> t [link register ] c00000000016d0d0 .pcpu_alloc+0x8a4/0xae4 [c0000000da427960] c00000000016d094 .pcpu_alloc+0x868/0xae4 (unreliable) [c0000000da427a90] c000000000613c08 .snmp_mib_init+0x34/0x9c [c0000000da427b20] d00000000262e074 .ipv6_add_dev+0x1d4/0x3e4 [ipv6] [c0000000da427bc0] d00000000265997c .addrconf_init+0x6c/0x194 [ipv6] [c0000000da427c50] d00000000265974c .inet6_init+0x1bc/0x34c [ipv6] [c0000000da427ce0] c0000000000097a4 .do_one_initcall+0x88/0x1bc [c0000000da427d90] c0000000000d59fc .SyS_init_module+0x118/0x2a4 [c0000000da427e30] c0000000000085b4 syscall_exit+0x0/0x40 --- Exception: c01 (System Call) at 00000fff8350b568 SP (fffdc3a7f50) is in userspace 0:mon> dl <4>Crash kernel location must be 0x2000000 <6>Reserving 256MB of memory at 32MB for crashkernel (System RAM: 4096MB) <6>Phyp-dump disabled at boot time <6>Using pSeries machine description <7>Page orders: linear mapping = 24, virtual = 16, io = 12 <6>Using 1TB segments <4>Found initrd at 0xc000000003500000:0xc000000003cca9b1 <6>bootconsole [udbg0] enabled <6>Partition configured for 2 cpus. <6>CPU maps initialized for 2 threads per core <7> (thread shift is 1) <4>Starting Linux PPC64 #2 SMP Fri Sep 25 12:31:24 IST 2009 <4>----------------------------------------------------- <4>ppc64_pft_size = 0x1a <4>physicalMemorySize = 0x100000000 <4>htab_hash_mask = 0x7ffff <4>----------------------------------------------------- <6>Initializing cgroup subsys cpuset <6>Initializing cgroup subsys cpu <5>Linux version 2.6.31-git15 (root@mpower6lp5) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #2 SMP Fri Sep 25 12:31:24 IST 2009 <4>[boot]0012 Setup Arch <7>Node 0 Memory: <7>Node 2 Memory: 0x0-0xe0000000 <7>Node 3 Memory: 0xe0000000-0x100000000 <4>EEH: No capable adapters found <6>PPC64 nvram contains 15360 bytes <7>Using shared processor idle loop <4>Zone PFN ranges: <4> DMA 0x00000000 -> 0x00010000 <4> Normal 0x00010000 -> 0x00010000 <4>Movable zone start PFN for each node <4>early_node_map[2] active PFN ranges <4> 2: 0x00000000 -> 0x0000e000 <4> 3: 0x0000e000 -> 0x00010000 <4>Could not find start_pfn for node 0 <7>On node 0 totalpages: 0 <7>On node 2 totalpages: 57344 <7> DMA zone: 56 pages used for memmap <7> DMA zone: 0 pages reserved <7> DMA zone: 57288 pages, LIFO batch:1 <7>On node 3 totalpages: 8192 <7> DMA zone: 8 pages used for memmap <7> DMA zone: 0 pages reserved <7> DMA zone: 8184 pages, LIFO batch:0 <4>[boot]0015 Setup Done <6>PERCPU: Embedded 2 pages/cpu @c000000001100000 s97160 r0 d33912 u524288 <6>pcpu-alloc: s97160 r0 d33912 u524288 alloc=1*1048576 <6>pcpu-alloc: [0] 0 1 <4>PERCPU: initialized 19 slots [c000000001120200,c000000001120330) <4>PERCPU: chunk 0 relocating -1 -> 13 c000000001120380 <4>PERCPU: relocated <4>Built 3 zonelists in Node order, mobility grouping on. Total pages: 65472 <4>Policy zone: DMA <5>Kernel command line: root=/dev/sda3 sysrq=8 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M xmon=on <6>PID hash table entries: 4096 (order: -1, 32768 bytes) <4>freeing bootmem node 2 <4>freeing bootmem node 3 <6>Memory: 3897152k/4194304k available (9664k kernel code, 297152k reserved, 2944k data, 4274k bss, 576k init) <6>Hierarchical RCU implementation. <6>RCU-based detection of stalled CPUs is enabled. <6>NR_IRQS:512 <4>[boot]0020 XICS Init <4>[boot]0021 XICS Done <7>pic: no ISA interrupt controller <7>time_init: decrementer frequency = 512.000000 MHz <7>time_init: processor frequency = 4704.000000 MHz <6>clocksource: timebase mult[7d0000] shift[22] registered <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0] <4>Console: colour dummy device 80x25 <6>console [hvc0] enabled, bootconsole disabled <6>allocated 2621440 bytes of page_cgroup <6>please try 'cgroup_disable=memory' option if you don't want memory cgroups <6>Security Framework initialized <6>SELinux: Disabled at boot. <6>Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes) <6>Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes) <4>Mount-cache hash table entries: 4096 <6>Initializing cgroup subsys ns <6>Initializing cgroup subsys cpuacct <6>Initializing cgroup subsys memory <6>Initializing cgroup subsys devices <6>Initializing cgroup subsys freezer <7>irq: irq 2 on host null mapped to virtual irq 16 <7>clockevent: decrementer mult[83126e97] shift[32] cpu[1] <4>Processor 1 found. <6>Brought up 2 CPUs <7>Node 0 CPUs: 0-1 <7>Node 2 CPUs: <7>Node 3 CPUs: <7>CPU0 attaching sched-domain: <7> domain 0: span 0-1 level SIBLING <7> groups: 0 (cpu_power = 589) 1 (cpu_power = 589) <7> domain 1: span 0-1 level CPU <7> groups: 0-1 (cpu_power = 1178) <7>CPU1 attaching sched-domain: <7> domain 0: span 0-1 level SIBLING <7> groups: 1 (cpu_power = 589) 0 (cpu_power = 589) <7> domain 1: span 0-1 level CPU <7> groups: 0-1 (cpu_power = 1178) <6>NET: Registered protocol family 16 <6>IBM eBus Device Driver <6>POWER6 performance monitor hardware support registered <6>PCI: Probing PCI hardware <7>PCI: Probing PCI hardware done <4>bio: create slab at 0 <4>PERCPU: chunk 0 relocating 13 -> 12 c000000001120380 <4>PERCPU: relocated <6>vgaarb: loaded <6>usbcore: registered new interface driver usbfs <6>usbcore: registered new interface driver hub <6>usbcore: registered new device driver usb <6>Switching to clocksource timebase <6>NET: Registered protocol family 2 <6>IP route cache hash table entries: 32768 (order: 2, 262144 bytes) <6>TCP established hash table entries: 131072 (order: 5, 2097152 bytes) <6>TCP bind hash table entries: 65536 (order: 5, 2097152 bytes) <6>TCP: Hash tables configured (established 131072 bind 65536) <6>TCP reno registered <6>NET: Registered protocol family 1 <6>Unpacking initramfs... <7>Switched to high resolution mode on CPU 0 <7>Switched to high resolution mode on CPU 1 <7>irq: irq 655360 on host null mapped to virtual irq 17 <7>irq: irq 655367 on host null mapped to virtual irq 18 <6>IOMMU table initialized, virtual merging enabled <7>irq: irq 589825 on host null mapped to virtual irq 19 <7>RTAS daemon started <6>audit: initializing netlink socket (disabled) <5>type=2000 audit(1253862433.200:1): initialized <6>Kprobe smoke test started <6>Kprobe smoke test passed successfully <6>HugeTLB registered 16 MB page size, pre-allocated 0 pages <6>HugeTLB registered 16 GB page size, pre-allocated 0 pages <5>VFS: Disk quotas dquot_6.5.2 <4>Dquot-cache hash table entries: 8192 (order 0, 65536 bytes) <6>Btrfs loaded <6>msgmni has been set to 7608 <4>PERCPU: chunk 0 relocating 12 -> 11 c000000001120380 <4>PERCPU: relocated <6>alg: No test for stdrng (krng) <6>Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254) <6>io scheduler noop registered <6>io scheduler anticipatory registered <6>io scheduler deadline registered <6>io scheduler cfq registered (default) <6>pci_hotplug: PCI Hot Plug PCI Core version: 0.5 <6>rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1 <7>vio_register_driver: driver hvc_console registering <7>HVSI: registered 0 devices <6>Generic RTC Driver v1.07 <6>Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled <6>pmac_zilog: 0.6 (Benjamin Herrenschmidt ) <6>input: Macintosh mouse button emulation as /devices/virtual/input/input0 <6>Uniform Multi-Platform E-IDE driver <6>ide-gd driver 1.18 <6>IBM eHEA ethernet device driver (Release EHEA_0102) <7>irq: irq 590088 on host null mapped to virtual irq 264 <6>ehea: eth0: Jumbo frames are disabled <6>ehea: eth0 -> logical port id #2 <6>ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver <6>ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver <6>mice: PS/2 mouse device common for all mice <6>EDAC MC: Ver: 2.1.0 Sep 25 2009 <6>usbcore: registered new interface driver hiddev <6>usbcore: registered new interface driver usbhid <6>usbhid: v2.6:USB HID core driver <6>TCP cubic registered <6>NET: Registered protocol family 15 <4>registered taskstats version 1 <4>Freeing unused kernel memory: 576k freed <6>SysRq : Changing Loglevel <4>Loglevel set to 8 <5>SCSI subsystem initialized <7>vio_register_driver: driver ibmvscsi registering <6>ibmvscsi 30000007: SRP_VERSION: 16.a <6>scsi0 : IBM POWER Virtual SCSI Adapter 1.5.8 <6>ibmvscsi 30000007: partner initialization complete <6>ibmvscsi 30000007: host srp version: 16.a, host partition VIO Server (1), OS 3, max io 1048576 <6>ibmvscsi 30000007: Client reserve enabled <6>ibmvscsi 30000007: sent SRP login <6>ibmvscsi 30000007: SRP_LOGIN succeeded <5>scsi 0:0:1:0: Direct-Access AIX VDASD 0001 PQ: 0 ANSI: 3 <5>scsi 0:0:2:0: CD-ROM AIX VOPTA PQ: 0 ANSI: 4 <6>udevd version 128 started <5>sd 0:0:1:0: [sda] 146800640 512-byte logical blocks: (75.1 GB/70.0 GiB) <5>sd 0:0:1:0: [sda] Write Protect is off <7>sd 0:0:1:0: [sda] Mode Sense: 17 00 00 08 <5>sd 0:0:1:0: [sda] Cache data unavailable <3>sd 0:0:1:0: [sda] Assuming drive cache: write through <5>sd 0:0:1:0: [sda] Cache data unavailable <3>sd 0:0:1:0: [sda] Assuming drive cache: write through <6> sda: sda1 sda2 sda3 <5>sd 0:0:1:0: [sda] Cache data unavailable <3>sd 0:0:1:0: [sda] Assuming drive cache: write through <5>sd 0:0:1:0: [sda] Attached SCSI disk <6>kjournald starting. Commit interval 5 seconds <6>EXT3 FS on sda3, internal journal <6>EXT3-fs: mounted filesystem with writeback data mode. <6>udevd version 128 started <5>sd 0:0:1:0: Attached scsi generic sg0 type 0 <5>scsi 0:0:2:0: Attached scsi generic sg1 type 5 <4>sr0: scsi-1 drive <6>Uniform CD-ROM driver Revision: 3.20 <7>sr 0:0:2:0: Attached scsi CD-ROM sr0 <6>Adding 2096320k swap on /dev/sda2. Priority:-1 extents:1 across:2096320k <6>device-mapper: uevent: version 1.0.3 <6>device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com <6>loop: module loaded <6>fuse init (API version 7.13) <7>irq: irq 33539 on host null mapped to virtual irq 259 <6>ehea: eth0: Physical port up <6>ehea: External switch port is backup port <7>irq: irq 33540 on host null mapped to virtual irq 260 <6>NET: Registered protocol family 10 <4>PERCPU: chunk 0 relocating 11 -> 10 c000000001120380 <4>PERCPU: relocated <4>PERCPU: chunk 0 relocating 10 -> 9 c000000001120380 <4>PERCPU: relocated <4>PERCPU: chunk 1 relocating -1 -> 18 c0000000db70fb00 <4>PERCPU: relocated <4>PERCPU: chunk 1 relocating 18 -> 16 c0000000db70fb00 <4>PERCPU: relocated <4>PERCPU: chunk 1, alloc pages [0,1) <4>PERCPU: chunk 1, map pages [0,1) <4>PERCPU: map 0xd00007fffff00000, 1 pages 53544 <4>PERCPU: map 0xd00007fffff80000, 1 pages 53545 <4>PERCPU: chunk 1, will clear 4096b/unit d00007fffff00000 d00007fffff80000 <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies) <3>INFO: RCU detected CPU 0 stall (t=4000 jiffies) <3>BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:1864] <4>Modules linked in: ipv6(+) fuse loop dm_mod sr_mod cdrom sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod <4>NIP: c000000000043140 LR: c00000000016d0d0 CTR: 0000000000000040 <4>REGS: c0000000da4276e0 TRAP: 0901 Not tainted (2.6.31-git15) <4>MSR: 8000000000009032 CR: 44224880 XER: 20000002 <4>TASK = c0000000dbbf71d0[1864] 'modprobe' THREAD: c0000000da424000 CPU: 0 <4>GPR00: 0000000000000040 c0000000da427960 c000000000b8bca8 d00007fffff00000 <4>GPR04: 0000000000000000 0000000000000000 d00007fffff00000 80000000565a6cc0 <4>GPR08: 0000000000000000 c000000001120180 c000000000c45aa0 00000000000003c0 <4>GPR12: 0000000028224882 c000000000c62600 <4>NIP [c000000000043140] .memset+0x60/0xfc <4>LR [c00000000016d0d0] .pcpu_alloc+0x8a4/0xae4 <4>Call Trace: <4>[c0000000da427960] [c00000000016d094] .pcpu_alloc+0x868/0xae4 (unreliable) <4>[c0000000da427a90] [c000000000613c08] .snmp_mib_init+0x34/0x9c <4>[c0000000da427b20] [d00000000262e074] .ipv6_add_dev+0x1d4/0x3e4 [ipv6] <4>[c0000000da427bc0] [d00000000265997c] .addrconf_init+0x6c/0x194 [ipv6] <4>[c0000000da427c50] [d00000000265974c] .inet6_init+0x1bc/0x34c [ipv6] <4>[c0000000da427ce0] [c0000000000097a4] .do_one_initcall+0x88/0x1bc <4>[c0000000da427d90] [c0000000000d59fc] .SyS_init_module+0x118/0x2a4 <4>[c0000000da427e30] [c0000000000085b4] syscall_exit+0x0/0x40 <4>Instruction dump: <4>98860000 38c60001 409e000c b0860000 38c60002 409d000c 90860000 38c60004 <4>78a0d183 78a506a0 7c0903a6 4182002c f8860008 f8860010 f8860018 <3>INFO: RCU detected CPU 0 stall (t=7000 jiffies) <3>INFO: RCU detected CPU 0 stall (t=10000 jiffies) 0:mon> --------------010001010803070307080207--