* Boot failure on the powerstation with 2.6.30 latest
From: James Bottomley @ 2009-06-22 15:16 UTC
To: linuxppc-dev

2.6.30-rc8 worked fine ... unless this is a known problem, I suppose I
can begin bisecting.

The boot log of the hang is:

Please wait, loading kernel...
Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 02500000, size: 8280 Kbytes
OF stdout device is: /ht/isa@8/serial@2f8
Preparing to boot Linux version 2.6.30 (jejb@claymoor) (gcc version 4.3.3 (Debian 4.3.3-10) ) #1 SMP Mon Jun 22 09:59:35 CDT 2009
command line: root=/dev/sda3 ro console=ttyS0,19200n1
memory layout at init:
alloc_bottom : 0000000002d16000
alloc_top : 0000000030000000
alloc_top_hi : 0000000080000000
rmo_top : 0000000030000000
ram_top : 0000000080000000
instantiating rtas at 0x000000002fff5000... done
boot cpu hw idx 0000000000000000
starting cpu hw idx 0000000000000001... done
starting cpu hw idx 0000000000000002... done
starting cpu hw idx 0000000000000003... done
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000003117000 -> 0x0000000003117640
Device tree struct 0x0000000003118000 -> 0x000000000311b000
Calling quiesce...
returning from prom_init

So it looks like some type of early boot failure or handoff in head_64.

James
* Re: Boot failure on the powerstation with 2.6.30 latest
From: Brian King @ 2009-06-22 16:11 UTC
To: James Bottomley; +Cc: linuxppc-dev

James,

I was running into a similar hang on one of my Power boxes as well.
Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
to boot. It looks like that patch injected a bug where we can end up
waiting on an uninitialized mutex:

[c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
[c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
[c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
[c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
[c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
[c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44

The mutex gets initialized in cpu_hotplug_init, which doesn't get called
until after pgtable_cache_init.

-Brian

--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center
* Re: Boot failure on the powerstation with 2.6.30 latest
From: James Bottomley @ 2009-06-22 22:03 UTC
To: Brian King; +Cc: linuxppc-dev

On Mon, 2009-06-22 at 11:11 -0500, Brian King wrote:
> James,
>
> I was running into a similar hang on one of my Power boxes as well.
> Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
> to boot. It looks like that patch injected a bug where we can end up
> waiting on an uninitialized mutex:
>
> [c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
> [c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
> [c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
> [c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
> [c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
> [c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44
>
> The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
> after pgtable_cache_init.

Actually, no, reverting that one doesn't fix it.

A full run of git bisect turns up this commit as the culprit; I'll make
a fuss on lkml:

83b519e8b9572c319c8e0c615ee5dd7272856090 is first bad commit
commit 83b519e8b9572c319c8e0c615ee5dd7272856090
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date: Wed Jun 10 19:40:04 2009 +0300

    slab: setup allocators earlier in the boot sequence

James
* Re: Boot failure on the powerstation with 2.6.30 latest
From: Benjamin Herrenschmidt @ 2009-06-22 22:25 UTC
To: James Bottomley; +Cc: Brian King, linuxppc-dev, Pekka Enberg, Linux Kernel list

> Actually, no, reverting that one doesn't fix it.
>
> A full run of git bisect turns up this commit as the culprit; I'll make
> a fuss on lkml:

I haven't had the full log of that boot failure, but indeed, reverting
the patch Brian suggested won't work well: as I said, from the moment
slab is initialized, page table allocations will use kmem caches, which
are initialized by pgtable_cache_init().

So the problem does indeed seem to be another fallout of moving the
allocator initialization earlier.

I'm working from home today, but I'll see if I can get somebody in the
office to wire up the powerstation for me (it got disconnected for some
reason last week) so I can have a look.

The mutex issue Brian noticed will definitely break _any_ kmem_cache
operation anyway, so that's at least one bug that needs fixing (well,
provided Brian's analysis is right; I didn't have a chance to look
myself yet :-)

Cheers,
Ben.
* Re: Boot failure on the powerstation with 2.6.30 latest
From: Benjamin Herrenschmidt @ 2009-06-22 22:21 UTC
To: Brian King; +Cc: James Bottomley, linuxppc-dev, Pekka Enberg, Linux Kernel list

On Mon, 2009-06-22 at 11:11 -0500, Brian King wrote:
> James,
>
> I was running into a similar hang on one of my Power boxes as well.
> Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
> to boot. It looks like that patch injected a bug where we can end up
> waiting on an uninitialized mutex:
>
> [c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
> [c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
> [c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
> [c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
> [c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
> [c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44
>
> The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
> after pgtable_cache_init.

Ah good, I didn't have a chance to track that one down yet.

So the problem here is that we must do pgtable_cache_init there because
vmalloc is initialized right after, and vmalloc relies on allocating
page tables, which will need kmem caches on some archs.

So I suspect we need to sort out this mutex: either initialize it from
elsewhere, move cpu_hotplug_init() earlier, or avoid taking it when the
kernel state isn't SYSTEM_RUNNING. I haven't looked in detail yet.

Cheers,
Ben.