From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.228])
	by ozlabs.org (Postfix) with ESMTP id CA76EDDDDB
	for ; Thu, 5 Feb 2009 05:48:13 +1100 (EST)
Received: by rv-out-0506.google.com with SMTP id f6so2502859rvb.9
	for ; Wed, 04 Feb 2009 10:48:11 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <02f6bf324a381ee5c2fc5e91313dbca9@bga.com>
References: <02f6bf324a381ee5c2fc5e91313dbca9@bga.com>
Date: Wed, 4 Feb 2009 13:48:11 -0500
Message-ID:
Subject: Re: Maple PPC970 kexec crash-dump problems
From: Benjamin Walsh
To: Milton Miller
Content-Type: multipart/alternative; boundary=000e0cd2903ac31f6304621c3a7c
Cc: linuxppc-dev list
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

--000e0cd2903ac31f6304621c3a7c
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi Milton,

I've tracked it down to the device tree passed to the second kernel being
screwed up when patched by kexec-tools. Namely, it was creating
linux,usable-memory entries that were wrong, and the MMU initialization
hung when it failed to allocate memory for the page tables. I hacked the
tool and got past that point in the init sequence, but the very first
I/O-mapped access fails, so the MMU doesn't seem to be set up correctly.

Anyway, on to my question: is the crash dump (kdump) kernel supposed to use
the memory reserved for it by the first kernel for its working memory?
E.g., on that board, I have 0->2GB and 4->6GB for a total of 4GB of RAM.
Let's say I reserve 128M@32M, that's 0x2000000->0xa000000. Is the second
kernel supposed to use

(0x2000000+<kernel size>) -> 0xa000000

for its memory pool and leave everything else:

0->0x2000000, 0xa000000 -> 0x80000000, 0x100000000 -> 0x180000000

as memory that is from the first kernel, used to debug it?

Basically, I am trying to figure out if I patched the tool correctly.
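The arithmetic in the question can be checked with a short sketch. It assumes the reservation is written in the `size@base` form (128M@32M) and uses the two RAM banks described above; the helper names `parse_size`, `parse_crashkernel` and `subtract` are hypothetical and are not taken from kexec-tools. It only shows which address ranges fall inside and outside the reservation, not what the tool or the kernel actually does with them.

```python
# Sketch only: helper names are hypothetical, not kexec-tools code.

def parse_size(text):
    """Parse a number with an optional K/M/G suffix, e.g. '128M' or '0x2000000'."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1], 0) * units[text[-1]]
    return int(text, 0)

def parse_crashkernel(spec):
    """Turn 'size@base' into a half-open (start, end) range; assumes '@' is present."""
    size, _, base = spec.partition("@")
    start = parse_size(base)
    return start, start + parse_size(size)

def subtract(ranges, hole):
    """Remove the half-open range `hole` from a list of half-open ranges."""
    hole_start, hole_end = hole
    remaining = []
    for start, end in ranges:
        if hole_end <= start or hole_start >= end:
            remaining.append((start, end))  # no overlap with the hole
            continue
        if start < hole_start:
            remaining.append((start, hole_start))  # piece below the hole
        if hole_end < end:
            remaining.append((hole_end, end))  # piece above the hole
    return remaining

# RAM banks from the message: 0->2GB and 4->6GB.
RAM = [(0x0, 0x80000000), (0x100000000, 0x180000000)]

crash = parse_crashkernel("128M@32M")
first_kernel = subtract(RAM, crash)

print("reserved:", [hex(x) for x in crash])
for start, end in first_kernel:
    print("first kernel:", hex(start), "->", hex(end))
```

The reserved range comes out as 0x2000000 -> 0xa000000, and the three remaining ranges match the list in the question once the missing 0x prefix on 0x80000000 is restored.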
Thanks,
Ben

On Sat, Jan 24, 2009 at 2:52 AM, Milton Miller <miltonm@bga.com> wrote:

> On Sat Jan 24 at 07:59:47 EST in 2009, Benjamin Walsh wrote:
>
>> I am trying to use kexec with a crash dump kernel on a Maple board
>> (Motorola ATCA6101 to be precise). This board is running a two-CPU
>> PPC970FX. I am running a 2.6.27-10 kernel and have tried both older
>> kexec-tools and the newest ones. I have tried SMP and non-SMP kernels.
>
> Once you start the second CPU it is likely executing instructions somewhere.
>
> Prior to 2.6.27 you had to compile a fixed-offset kernel to run kdump.
> With 2.6.27 that option was removed and replaced with the relocatable
> kernel. However, because of the way Linux interacts with Open Firmware, the
> kernel will still move itself to 0 unless a specific flag is set. The
> location of the flag was changed twice during the merge process, and the
> patches for kexec-tools were not made until early this year.
>
>> Using kexec -l to fast boot works correctly. However, loading a crash dump
>> kernel and triggering a crash via echo c > /proc/sysrq-trigger simply
>> hangs the board. I have traced the sequence down to after the call to
>> kexec_copy_flush(), when the CPU returns to real-address mode (bl
>> real_mode). At this point I have no further debugging information.
>>
>> Two things could help me:
>>
>> - Getting the fix if this is a known issue and a fix exists. I have looked
>> at recent patches and nothing leapt to mind, mostly relocatable kernel
>> support.
>
> That is a major change.
>
> That said, I don't know if anyone has tested kexec panic beyond pseries for
> 64-bit powerpc.
>
> I know Paul originally prototyped the relocatable patch on a powermac, but
> I don't know what, if any, SMP testing he performed. And you said you are
> actually on Maple, not a powermac, so the startup issues are different.
>
>> - Obtaining the address of the serial port @3f8 in real mode.
>> The init sequence with udbg ON says that the physical address of the
>> port is 0xf40003f8; however, setting it up in poll mode and trying to
>> stuff characters in the tx buffer doesn't produce anything.
>
> Ah yes. In real mode you can only talk to cacheable memory without
> implementation-specific assistance. However, if you look in the kernel for
> the Maple early udbg support, you will find the code you need to talk to
> that serial port in real mode.
>
>> Has anyone recently tried to use the serial port in real mode?
>>
>> Thanks for any help.
>>
>> Ben
>
> Hope this gets you started. I wrote a lot of the kernel code, but I had
> the advantage of external JTAG access to the processor to see where it
> ended up when it went astray.
>
> milton

--000e0cd2903ac31f6304621c3a7c
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Milton,

I've tracked it down to the device tree passed to the second kernel being screwed up when patched by kexec-tools. Namely, it was creating linux,usable-memory entries that were wrong, and the MMU initialization hung when it failed to allocate memory for the page tables. I hacked the tool and got past that point in the init sequence, but the very first I/O-mapped access fails, so the MMU doesn't seem to be set up correctly.
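For readers checking such entries by hand, here is a minimal sketch of how one region could be packed into property bytes. It assumes each linux,usable-memory entry is a (base, size) pair of 64-bit big-endian values, which should be verified against the kernel and kexec-tools sources for the platform in question; the function names are hypothetical, and the region used is the 128M@32M reservation discussed in this message.

```python
import struct

# Assumption: one entry = 64-bit big-endian base followed by 64-bit big-endian size.
ENTRY_FORMAT = ">QQ"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes under this assumption

def encode_usable_memory(regions):
    """Pack a list of (base, size) pairs into raw property bytes."""
    return b"".join(struct.pack(ENTRY_FORMAT, base, size) for base, size in regions)

def decode_usable_memory(blob):
    """Unpack raw property bytes back into a list of (base, size) pairs."""
    return [struct.unpack_from(ENTRY_FORMAT, blob, offset)
            for offset in range(0, len(blob), ENTRY_SIZE)]

# 128M at 32M: base 0x2000000, size 0x8000000.
blob = encode_usable_memory([(0x2000000, 0x8000000)])
print(blob.hex())
print([(hex(base), hex(size)) for base, size in decode_usable_memory(blob)])
```

Decoding the bytes that a patched tool actually wrote and comparing them with the intended regions is one way to spot a wrong entry before booting the second kernel.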

Anyway, on to my question: is the crash dump (kdump) kernel supposed to use the memory reserved for it by the first kernel for its working memory? E.g., on that board, I have 0->2GB and 4->6GB for a total of 4GB of RAM. Let's say I reserve 128M@32M, that's 0x2000000->0xa000000. Is the second kernel supposed to use

(0x2000000+<kernel size>) -> 0xa000000

for its memory pool and leave everything else:

0->0x2000000, 0xa000000 -> 0x80000000, 0x100000000 -> 0x180000000

as memory that is from the first kernel, used to debug it?

Basically, I am trying to figure out if I patched the tool correctly.
Thanks,
Ben

On Sat, Jan 24, 2009 at 2:52 AM, Milton Miller <miltonm@bga.com> wrote:
On Sat Jan 24 at 07:59:47 EST in 2009, Benjamin Walsh wrote:
I am trying to use kexec with a crash dump kernel on a Maple board (Motorola
ATCA6101 to be precise). This board is running a two-CPU PPC970FX. I am
running a 2.6.27-10 kernel and have tried both older kexec-tools and the newest ones. I have tried SMP and non-SMP kernels.

Once you start the second CPU it is likely executing instructions somewhere.

Prior to 2.6.27 you had to compile a fixed-offset kernel to run kdump. With 2.6.27 that option was removed and replaced with the relocatable kernel. However, because of the way Linux interacts with Open Firmware, the kernel will still move itself to 0 unless a specific flag is set. The location of the flag was changed twice during the merge process, and the patches for kexec-tools were not made until early this year.


Using kexec -l to fast boot works correctly. However, loading a crash dump
kernel and triggering a crash via echo c > /proc/sysrq-trigger simply hangs
the board. I have traced the sequence down to after the call to
kexec_copy_flush(), when the CPU returns to real-address mode (bl
real_mode). At this point I have no further debugging information.


Two things could help me:

- Getting the fix if this is a known issue and a fix exists. I have looked
at recent patches and nothing leapt to mind, mostly relocatable kernel
support.

That is a major change.

That said, I don't know if anyone has tested kexec panic beyond pseries for 64-bit powerpc.

I know Paul originally prototyped the relocatable patch on a powermac, but I don't know what, if any, SMP testing he performed. And you said you are actually on Maple, not a powermac, so the startup issues are different.


- Obtaining the address of the serial port @3f8 in real mode. The init
sequence with udbg ON says that the physical address of the port is
0xf40003f8; however, setting it up in poll mode and trying to stuff
characters in the tx buffer doesn't produce anything.

Ah yes. In real mode you can only talk to cacheable memory without implementation-specific assistance. However, if you look in the kernel for the Maple early udbg support, you will find the code you need to talk to that serial port in real mode.



Has anyone recently tried to use the serial port in real mode?

Thanks for any help.

Ben

Hope this gets you started. I wrote a lot of the kernel code, but I had the advantage of external JTAG access to the processor to see where it ended up when it went astray.

milton


--000e0cd2903ac31f6304621c3a7c--