* Re: Boot failure on the powerstation with 2.6.30 latest
From: Benjamin Herrenschmidt @ 2009-06-22 22:21 UTC (permalink / raw)
To: Brian King; +Cc: James Bottomley, linuxppc-dev, Pekka Enberg, Linux Kernel list
On Mon, 2009-06-22 at 11:11 -0500, Brian King wrote:
> James,
>
> I was running into a similar hang on one of my Power boxes as well.
> Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
> to boot. It looks like that patch injected a bug where we can end up
> waiting on an uninitialized mutex:
>
> [c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
> [c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
> [c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
> [c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
> [c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
> [c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44
>
> The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
> after pgtable_cache_init.
Ah good, I hadn't had a chance to track that one down yet.
So the problem here is that we must do pgtable_cache_init() there because
vmalloc is initialized right after, and vmalloc relies on allocating page
tables, which will need kmem caches on some archs.
So I suspect we need to sort out this mutex by either initializing it from
elsewhere, moving cpu_hotplug_init() earlier, or avoiding it when the
kernel state isn't SYSTEM_RUNNING. I haven't looked in detail yet.
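To make two of those options concrete, here is a rough userspace sketch,
not kernel code. It models a statically initialized lock (so there is no
window where it is uninitialized) together with a check that skips the
lock while still booting. Every model_* name is hypothetical, and the
pthread mutex only stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's system_state. */
enum model_state { MODEL_BOOTING, MODEL_RUNNING };
static enum model_state model_system_state = MODEL_BOOTING;

/*
 * Option A: initialize the lock statically, so it is valid before any
 * init function has run (assumption: this mirrors a static kernel mutex).
 */
static pthread_mutex_t model_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static int model_refcount;

/*
 * Option C: do not touch the lock at all while the system is still
 * booting. Returns true only if a reference was actually taken.
 */
static bool model_get_online_cpus(void)
{
    if (model_system_state != MODEL_RUNNING)
        return false; /* early boot: nothing to protect against yet */

    pthread_mutex_lock(&model_hotplug_lock);
    model_refcount++;
    pthread_mutex_unlock(&model_hotplug_lock);
    return true;
}
```

Either option alone would avoid waiting on an uninitialized lock in this
model. Whether skipping the lock during boot is actually safe in the
kernel depends on what else can run that early, which this sketch does
not cover.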
Cheers,
Ben.
> -Brian
>
> James Bottomley wrote:
> > 2.6.30-rc8 worked fine ... unless this is a known problem, I suppose I
> > can begin bisecting.
> >
> > The boot log of the hang is:
> >
> > Please wait, loading kernel...
> > Elf64 kernel loaded...
> > Loading ramdisk...
> > ramdisk loaded at 02500000, size: 8280 Kbytes
> > OF stdout device is: /ht/isa@8/serial@2f8
> > Preparing to boot Linux version 2.6.30 (jejb@claymoor) (gcc version 4.3.3 (Debian 4.3.3-10) ) #1 SMP Mon Jun 22 09:59:35 CDT 2009
> > command line: root=/dev/sda3 ro console=ttyS0,19200n1
> > memory layout at init:
> > alloc_bottom : 0000000002d16000
> > alloc_top : 0000000030000000
> > alloc_top_hi : 0000000080000000
> > rmo_top : 0000000030000000
> > ram_top : 0000000080000000
> > instantiating rtas at 0x000000002fff5000... done
> > boot cpu hw idx 0000000000000000
> > starting cpu hw idx 0000000000000001... done
> > starting cpu hw idx 0000000000000002... done
> > starting cpu hw idx 0000000000000003... done
> > copying OF device tree...
> > Building dt strings...
> > Building dt structure...
> > Device tree strings 0x0000000003117000 -> 0x0000000003117640
> > Device tree struct 0x0000000003118000 -> 0x000000000311b000
> > Calling quiesce...
> > returning from prom_init
> >
> > So it looks like some type of early boot failure or handoff in head_64
> >
> > James
* Re: Boot failure on the powerstation with 2.6.30 latest
From: Benjamin Herrenschmidt @ 2009-06-22 22:25 UTC (permalink / raw)
To: James Bottomley; +Cc: Brian King, linuxppc-dev, Pekka Enberg, Linux Kernel list
> Actually, no, reverting that one doesn't fix it.
>
> A full run of git bisect turns up this commit as the culprit; I'll make
> a fuss on lkml:
I haven't seen the full log of that boot failure, but reverting the patch
Brian suggested indeed won't work well: as I said, from the moment slab
is initialized, page table allocations will use kmem caches, which are
initialized by pgtable_cache_init().
So the problem does indeed seem to be more fallout from moving the
allocator initialization earlier.
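The ordering constraints discussed in this thread can be written down as
a small dependency check. This is only a simplified model in plain C:
the step names and the dependency table are my assumptions, taken from
the backtrace and the description above, and are not the real
start_kernel() sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boot steps; names only loosely follow the kernel's. */
enum step {
    STEP_SLAB,
    STEP_CPU_HOTPLUG,
    STEP_PGTABLE_CACHE,
    STEP_VMALLOC,
    STEP_COUNT
};

/* needs[s] is a bitmask of the steps that must have run before s. */
static const unsigned needs[STEP_COUNT] = {
    [STEP_SLAB]          = 0,
    [STEP_CPU_HOTPLUG]   = 0,
    /* kmem_cache_create() needs slab and takes the hotplug mutex */
    [STEP_PGTABLE_CACHE] = (1u << STEP_SLAB) | (1u << STEP_CPU_HOTPLUG),
    /* vmalloc allocates page tables from the pgtable caches */
    [STEP_VMALLOC]       = 1u << STEP_PGTABLE_CACHE,
};

/* Return the index of the first step run too early, or -1 if none. */
static int first_order_violation(const enum step *order, size_t n)
{
    unsigned done = 0;

    for (size_t i = 0; i < n; i++) {
        if ((needs[order[i]] & ~done) != 0)
            return (int)i;
        done |= 1u << order[i];
    }
    return -1;
}

/* Order resembling the reported hang: hotplug init comes too late. */
static const enum step failing_order[] = {
    STEP_SLAB, STEP_PGTABLE_CACHE, STEP_VMALLOC, STEP_CPU_HOTPLUG
};

/* One possible fix in this model: move hotplug init earlier. */
static const enum step fixed_order[] = {
    STEP_SLAB, STEP_CPU_HOTPLUG, STEP_PGTABLE_CACHE, STEP_VMALLOC
};
```

In this model first_order_violation() flags the pgtable cache step in
failing_order and nothing in fixed_order.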
I'm working from home today but I'll see if I can get somebody in the
office to wire up the powerstation (got disconnected for some reason
last week) for me so I can have a look.
The mutex issue Brian noticed will definitely break _any_ kmem_cache
operation anyway, so that's at least one bug that needs fixing (well,
provided Brian's analysis is right; I haven't had a chance to look myself
yet :-)
Cheers,
Ben.
> 83b519e8b9572c319c8e0c615ee5dd7272856090 is first bad commit
> commit 83b519e8b9572c319c8e0c615ee5dd7272856090
> Author: Pekka Enberg <penberg@cs.helsinki.fi>
> Date: Wed Jun 10 19:40:04 2009 +0300
>
> slab: setup allocators earlier in the boot sequence
>
> James
>