Boot failure on the powerstation with 2.6.30 latest

linuxppc-dev.lists.ozlabs.org archive mirror

* Boot failure on the powerstation with 2.6.30 latest
@ 2009-06-22 15:16 James Bottomley
  2009-06-22 16:11 ` Brian King
  0 siblings, 1 reply; 5+ messages in thread
From: James Bottomley @ 2009-06-22 15:16 UTC (permalink / raw)
  To: linuxppc-dev

2.6.30-rc8 worked fine ... unless this is a known problem, I suppose I
can begin bisecting.

The boot log of the hang is:

Please wait, loading kernel...
   Elf64 kernel loaded...
Loading ramdisk...
ramdisk loaded at 02500000, size: 8280 Kbytes
OF stdout device is: /ht/isa@8/serial@2f8
Preparing to boot Linux version 2.6.30 (jejb@claymoor) (gcc version 4.3.3 (Debian 4.3.3-10) ) #1 SMP Mon Jun 22 09:59:35 CDT 2009
command line: root=/dev/sda3 ro console=ttyS0,19200n1 
memory layout at init:
  alloc_bottom : 0000000002d16000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000080000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000080000000
instantiating rtas at 0x000000002fff5000... done
boot cpu hw idx 0000000000000000
starting cpu hw idx 0000000000000001... done
starting cpu hw idx 0000000000000002... done
starting cpu hw idx 0000000000000003... done
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000003117000 -> 0x0000000003117640
Device tree struct  0x0000000003118000 -> 0x000000000311b000
Calling quiesce...
returning from prom_init

So it looks like some kind of early boot failure, or a failed handoff, in head_64

James


* Re: Boot failure on the powerstation with 2.6.30 latest
  2009-06-22 15:16 Boot failure on the powerstation with 2.6.30 latest James Bottomley
@ 2009-06-22 16:11 ` Brian King
  2009-06-22 22:03   ` James Bottomley
  2009-06-22 22:21   ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 5+ messages in thread
From: Brian King @ 2009-06-22 16:11 UTC (permalink / raw)
  To: James Bottomley; +Cc: linuxppc-dev

James,

I was running into a similar hang on one of my Power boxes as well.
Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
to boot. It looks like that patch introduced a bug where we can end up
waiting on an uninitialized mutex:

[c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
[c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
[c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
[c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
[c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
[c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44

The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
after pgtable_cache_init.

-Brian


-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center


* Re: Boot failure on the powerstation with 2.6.30 latest
  2009-06-22 16:11 ` Brian King
@ 2009-06-22 22:03   ` James Bottomley
  2009-06-22 22:25     ` Benjamin Herrenschmidt
  2009-06-22 22:21   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 5+ messages in thread
From: James Bottomley @ 2009-06-22 22:03 UTC (permalink / raw)
  To: Brian King; +Cc: linuxppc-dev

On Mon, 2009-06-22 at 11:11 -0500, Brian King wrote:
> James,
> 
> I was running into a similar hang on one of my Power boxes as well.
> Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
> to boot. It looks like that patch injected a bug where we can end up
> waiting on an uninitialized mutex:
> 
> [c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
> [c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
> [c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
> [c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
> [c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
> [c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44
> 
> The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
> after pgtable_cache_init.

Actually, no, reverting that one doesn't fix it.

A full run of git bisect turns up this commit as the culprit; I'll make
a fuss on lkml:

83b519e8b9572c319c8e0c615ee5dd7272856090 is first bad commit
commit 83b519e8b9572c319c8e0c615ee5dd7272856090
Author: Pekka Enberg <penberg@cs.helsinki.fi>
Date:   Wed Jun 10 19:40:04 2009 +0300

    slab: setup allocators earlier in the boot sequence

James


* Re: Boot failure on the powerstation with 2.6.30 latest
  2009-06-22 16:11 ` Brian King
  2009-06-22 22:03   ` James Bottomley
@ 2009-06-22 22:21   ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2009-06-22 22:21 UTC (permalink / raw)
  To: Brian King; +Cc: James Bottomley, linuxppc-dev, Pekka Enberg, Linux Kernel list

On Mon, 2009-06-22 at 11:11 -0500, Brian King wrote:
> James,
> 
> I was running into a similar hang on one of my Power boxes as well.
> Reverting c868d550115b9ccc0027c67265b9520790f05601 allowed my system
> to boot. It looks like that patch injected a bug where we can end up
> waiting on an uninitialized mutex:
> 
> [c0000000009f3c30] c00000000052c7dc .mutex_lock+0x34/0x50
> [c0000000009f3cb0] c00000000008b190 .get_online_cpus+0x3c/0x74
> [c0000000009f3d40] c000000000146cd0 .kmem_cache_create+0xcc/0x548
> [c0000000009f3e50] c000000000032ae0 .pgtable_cache_init+0x28/0x6c
> [c0000000009f3ee0] c000000000780960 .start_kernel+0x1ec/0x520
> [c0000000009f3f90] c0000000000083d8 .start_here_common+0x1c/0x44
> 
> The mutex gets initialized in cpu_hotplug_init, which doesn't get called until
> after pgtable_cache_init.

Ah good, I didn't have a chance to track that one down yet.

So the problem here is that we must do pgtable_cache_init there because
vmalloc is initialized right after, and it relies on allocating page
tables, which will need kmem caches on some archs.

So I suspect we need to sort out this mutex, either by initializing it from
elsewhere, by moving cpu_hotplug_init() earlier, or by avoiding it when the
kernel state isn't SYSTEM_RUNNING. I haven't looked in detail yet.

Cheers,
Ben.



* Re: Boot failure on the powerstation with 2.6.30 latest
  2009-06-22 22:03   ` James Bottomley
@ 2009-06-22 22:25     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2009-06-22 22:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: Brian King, linuxppc-dev, Pekka Enberg, Linux Kernel list

> Actually, no, reverting that one doesn't fix it.
> 
> A full run of git bisect turns up this commit as the culprit; I'll make
> a fuss on lkml:

I haven't seen the full log of that boot failure, but reverting the patch
Brian suggested indeed won't work well: as I said, from the moment slab
is initialized, page table allocations will use kmem caches, which are
initialized by pgtable_cache_init().

So the problem does indeed seem to be more fallout from moving the
allocator initialization earlier.

I'm working from home today but I'll see if I can get somebody in the
office to wire up the powerstation (got disconnected for some reason
last week) for me so I can have a look.

The mutex issue Brian noticed will definitely break _any_ kmem_cache
operation anyway, so that's at least one bug that needs fixing (well,
provided Brian's analysis is right; I didn't have a chance to look myself
yet :-)

Cheers,
Ben.

> 83b519e8b9572c319c8e0c615ee5dd7272856090 is first bad commit
> commit 83b519e8b9572c319c8e0c615ee5dd7272856090
> Author: Pekka Enberg <penberg@cs.helsinki.fi>
> Date:   Wed Jun 10 19:40:04 2009 +0300
> 
>     slab: setup allocators earlier in the boot sequence
> 
> James
> 
