* Linux kernel panics and core dumps.
@ 2003-04-08 15:58 Arun Dharankar
0 siblings, 0 replies; 6+ messages in thread
From: Arun Dharankar @ 2003-04-08 15:58 UTC (permalink / raw)
To: linuxppc-embedded
Greetings.
On x86 architectures there seem to be at least two ways of
producing Linux kernel panic dumps. These projects are
hosted at
"http://lkcd.sourceforge.net/" (originated in SGI), and
"http://oss.missioncriticallinux.com/projects/mcore/"
(originated in MCLX).
Of the two, the second one seems to work quite well on x86
PCs. I don't know how much of it is actively supported on
PowerPC. So, the first question is:
Has anyone tried this on PowerPC, specifically with Linux
kernel versions 2.4.x? The code for PowerPC seems to
be there, but the Makefiles don't seem to be up to date,
and could be broken.
Furthermore, this same project has some documentation
with a good discussion of different approaches to Linux
kernel memory dumps. One item in this discussion is about
BIOS/bootloader support.
Essentially, if PPCBoot/U-Boot were to recognize the Linux
kernel memory layout, a much more reliable scheme could
be implemented. For example, under all panic or hang
conditions (watchdog), the system could just be rebooted.
During startup, PPCBoot/U-Boot, along with Linux, could
then save the Linux kernel dump reliably. The MCLX scheme
seems to follow this approach, but does not rely on the
bootloader.
Has anyone investigated this? Has anything already been done
that someone would care to share? Any thoughts on this?
Best regards,
-Arun.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* RE: Linux kernel panics and core dumps.
@ 2003-04-08 16:20 Wright, David
2003-04-09 3:38 ` Arun Dharankar
0 siblings, 1 reply; 6+ messages in thread
From: Wright, David @ 2003-04-08 16:20 UTC (permalink / raw)
To: Arun Dharankar, linuxppc-embedded
I looked at the SGI project, but it wasn't suitable for our
platform, as we didn't have the right sort of disk setup.
I ported the MCLX code to a 405-based platform. The code as
originally distributed had a few bugs in it, but I don't think
they ever fixed the bugs -- I fed changes back to them, and those
changes may have made it to IBM. Or at least they told me IBM
was interested in the project.
If you do use their code, ignore the program that uncompresses
the crash dumps -- as implemented, it's incredibly slow, and
although that can be fixed, it's pointless, since crash can read
the compressed dumps just fine (and can't read the uncompressed
ones, just as a final irony).
I don't have a complete list of the changes I needed to make, but
here are a few:
  - Makefile didn't specify compiling crash.c
  - crash.c (machine-specific) specified "regs" instead of "gprs"; also
    specified tss.ksp instead of thread.ksp; also must #define
    PFN_PHYS() itself.
  - The code in do_init_bootmem() is trying to work in bytes, not in
    frames.
Anyway, once I got these various problems ironed out, plus a few
in crash(1), the facility worked fine. The main problem you're
apt to run into is having enough physical memory to run your
system, hold the dump, and copy the dump from RAM into some file.
NFS can be very useful here.
The dump facility did prove to be quite useful and we did use it
on live systems to track down problems. One thing to watch out
for is diags that scrub memory, since they'll scrub out your dump,
too.
-- David Wright, InfiniSwitch Corp.
* Re: Linux kernel panics and core dumps.
2003-04-08 16:20 Wright, David
@ 2003-04-09 3:38 ` Arun Dharankar
0 siblings, 0 replies; 6+ messages in thread
From: Arun Dharankar @ 2003-04-09 3:38 UTC (permalink / raw)
To: Wright, David, linuxppc-embedded
David, thanks for your input!
The working Linux kernel version for me is 2.4.20. After applying
the patch and following the steps you outlined, the kernel boots
OK. However, just as the user processes start up, there are kernel
exceptions for all the processes, and the system eventually panics.
The nearest patch for the Linux kernel I could get from the MCLX
site is for 2.4.17. Even so, the patch failed only for init/main.c
and kernel/panic.c. Making those changes manually did not seem to
need anything significant.
I have not gone through the patch completely and have only some
understanding of it, so I didn't understand what you meant by "The
code in do_init_bootmem() is trying to work in bytes, not in frames."
Could you elaborate, please? Maybe this is where I have made
the changes incorrectly.
Best regards,
-Arun.
* RE: Linux kernel panics and core dumps.
@ 2003-04-09 16:19 Wright, David
2003-04-09 17:18 ` Arun Dharankar
0 siblings, 1 reply; 6+ messages in thread
From: Wright, David @ 2003-04-09 16:19 UTC (permalink / raw)
To: Arun Dharankar; +Cc: linuxppc-embedded
Boy, it took me a while to dig this out of my notes.
> I have not gone through the patch completely and have only some
> understanding of it, so I didn't understand what you meant by "The
> code in do_init_bootmem() is trying to work in bytes, not in frames."
> Could you elaborate, please? Maybe this is where I have made
> the changes incorrectly.
This is the proper way to do the call to crash_init:
    #if defined(CONFIG_MCL_COREDUMP)
        crash_init((u_long)phys_to_virt(start),
                   (u_long)phys_to_virt(start + (33 * PAGE_SIZE)),
                   (u_long)phys_to_virt(start + (33 + crash_pages) * PAGE_SIZE));
    #endif
Well, OK, those "33" literals in the code aren't so great, but
there are limits to how much cleanup I was able to do.
Also, in panic.c, these lines:
    #ifdef CONFIG_MCL_COREDUMP
        smp_call_function((void *)smp_crash_funnel_cpu, 0, 0, 0);
        crash_save_current_state(current);
    #endif
should come after the invocation of "notifier_call_chain", if
you have any sort of watchdog timer on your system. We did, and
the watchdog would expire while the crash was being generated,
since it was the call chain that shut down the watchdog.
-- David Wright, InfiniSwitch Corp.
* Re: Linux kernel panics and core dumps.
2003-04-09 16:19 Linux kernel panics and core dumps Wright, David
@ 2003-04-09 17:18 ` Arun Dharankar
2003-04-10 3:16 ` Arun Dharankar
0 siblings, 1 reply; 6+ messages in thread
From: Arun Dharankar @ 2003-04-09 17:18 UTC (permalink / raw)
To: Wright, David; +Cc: linuxppc-embedded
David, thanks very much!
It looks like I need to spend some time to understand and debug the
problem. The crash init seems to go through OK and prints the relevant
info. Then, just as init starts, the exceptions start to occur, and
don't stop until a bus timeout happens.
It probably makes sense for me to gather some more information
before asking any sensible question.
Thanks again!
Best regards,
-Arun.
Just for completeness, snippets of the console output are as follows.
Memory BAT mapping: BAT2=256Mb, BAT3=256Mb, residual: 0Mb
Total memory = 512MB; using 1024kB for hash table (at c0300000)
Linux version 2.4.20 (root@host) (gcc version 2.95.2 19991024
(release)) #68 Wed Apr 9 13:26:50 EDT 2003
crash_init (crash_va: c0021000)
crash_dump_header c0021000 {
magic[0] = bf61001c
map = 7cbb2b78
map_pages = 3bc0ffea
data_pages = 408201f8
compr_units = 2f840000
boot_reserved_start = 419e015c
boot_reserved_end = 8002027c
memset...done
On node 0 totalpages: 131072
zone(0): 131072 pages.
zone(1): 0 pages.
zone(2): 0 pages.
... ... ...
... ... ...
Mounted devfs on /dev
Freeing unused kernel memory: 224k init
INIT: version 2.78 booting
Oops: Exception in kernel mode, sig: 4
NIP: C0021000 XER: 00000000 LR: C000701C SP: C02ADEF0 REGS: c02ade40
TRAP: 0700
Not tainted
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c02ac000[1] 'init' Last syscall: 174
last math c02ac000 last altivec 00000000
GPR00: C000701C C02ADEF0 C02AC000 00000000 7FFFF6B8 7FFFF738
... ... ...
Kernel panic: Attempted to kill init!
save_core: started on CPU0
Oops: Exception in kernel mode, sig: 4
* Re: Linux kernel panics and core dumps.
2003-04-09 17:18 ` Arun Dharankar
@ 2003-04-10 3:16 ` Arun Dharankar
0 siblings, 0 replies; 6+ messages in thread
From: Arun Dharankar @ 2003-04-10 3:16 UTC (permalink / raw)
To: Wright, David; +Cc: linuxppc-embedded
Thanks to the follow-ups and the detailed input from David Wright, I
now have the MCLX mcore working on an MPC8260/MPC7410-based platform.
For reference: I have posted the patch at:
http://oss.missioncriticallinux.com/pipermail/mcore-dev/2003-April/000028.html
Best regards,
-Arun.