PPC440 Kernel Stack overflow


* PPC440 Kernel Stack overflow
@ 2004-07-16 10:49 Steve Boorman
  0 siblings, 0 replies; 2+ messages in thread
From: Steve Boorman @ 2004-07-16 10:49 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

We recently traced a system hang to a bug in one of our drivers.
The bug effectively caused the driver to call itself repeatedly,
which overflowed the kernel stack. The surprising thing is that the
machine just hung: no output on the console, and all interrupts,
including the timer, were dead. We never got the message "Kernel
stack overflow in process", which is what I expected.

We are running a ported version of 2.4.26 on our hardware (PPC440GP
based). Suspecting that something might be adrift with the port, I
tried this with the stock 2.4.26 IBM Ebony kernel running on the
Ebony eval board, using a test driver written as a loadable module.
The driver simulated a kernel stack overflow by repeated calls to a
function within the same module. The result was identical, i.e. no
messages on the console and the system froze completely.
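
As a rough illustration of what such a test driver does, here is a
minimal user-space sketch in C. It is not the driver from this post:
the function names, the 256-byte frame size and the explicit depth
bound are all assumptions made for the example. The real module
recursed without a bound, and the actual frame size depends on the
compiler and the code.

```c
#include <stddef.h>

/* Hypothetical per-call stack usage; a real driver's frames will differ. */
#define FRAME_PAD_BYTES 256u

/*
 * Estimate how many frames of frame_bytes fit into a stack of
 * stack_bytes. Returns 0 for a zero frame size to avoid dividing by 0.
 */
size_t frames_until_overflow(size_t stack_bytes, size_t frame_bytes)
{
    if (frame_bytes == 0)
        return 0;
    return stack_bytes / frame_bytes;
}

/*
 * Recurse until max_depth is reached and return the deepest level.
 * The volatile buffer is written before and after the recursive call
 * so that each level keeps its own stack frame and the call cannot be
 * turned into a loop. Start with depth = 1.
 */
unsigned int consume_stack(unsigned int depth, unsigned int max_depth)
{
    volatile unsigned char pad[FRAME_PAD_BYTES];
    unsigned int reached;

    pad[0] = 1;
    if (depth >= max_depth)
        return depth;
    reached = consume_stack(depth + 1, max_depth);
    pad[FRAME_PAD_BYTES - 1] = 0;
    return reached;
}
```

Assuming, purely for illustration, an 8192-byte stack,
frames_until_overflow(8192, FRAME_PAD_BYTES) suggests that only a few
dozen such frames fit before the stack runs into whatever lies below
it.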

Am I expecting too much here, or is something wrong in the kernel
stack overflow detection?

The problem is that this type of hang is very hard to debug. We have
implemented the PPC440 watch-dog in our Kernel port, and whilst that
happily traps code spinning in a loop, it does not trap this kernel
stack problem, presumably because even critical exception interrupts
are not being processed. The watch-dog is definitely expiring.

We do not have a BDI2000 at the moment. Would it be any good at
tracking down this type of crash anyway?

Any thoughts on this would be appreciated.

Regards,

Steve Boorman

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: PPC440 Kernel Stack overflow
@ 2004-07-16 14:49 Steve Boorman
  0 siblings, 0 replies; 2+ messages in thread
From: Steve Boorman @ 2004-07-16 14:49 UTC (permalink / raw)
  To: linuxppc-embedded

Mark,

Thanks for your comments. You are basically confirming what I
suspected: accesses to memory outside the allocated stack area are
not trapped. It looks like the kernel stack pointer is only compared
when a context switch occurs, and an overflow is only reported then,
if the pointer is out of range.
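
The consequence of checking only at certain points can be shown with
a small user-space model in C. This is a sketch of the idea, not the
kernel's code: struct kstack, sp_in_range() and overrun_bytes() are
hypothetical names, and the addresses in the usage note are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one kernel stack; it grows downwards from high towards low. */
struct kstack {
    uintptr_t low;   /* lowest address the stack may use */
    uintptr_t high;  /* initial stack pointer */
};

/* The comparison itself: is sp still inside the allocated area? */
bool sp_in_range(const struct kstack *ks, uintptr_t sp)
{
    return sp >= ks->low && sp <= ks->high;
}

/*
 * Push `calls` frames of `frame` bytes with no check in between, then
 * do the comparison once at the end. Returns how many bytes below
 * ks->low the stack pointer had already gone by the time anyone
 * looked, or 0 if it stayed in range. Assumes the subtraction does
 * not wrap around zero.
 */
size_t overrun_bytes(const struct kstack *ks, size_t frame,
                     unsigned int calls)
{
    uintptr_t sp = ks->high;
    unsigned int i;

    for (i = 0; i < calls; i++)
        sp -= frame;            /* nothing looks at sp here */

    return sp_in_range(ks, sp) ? 0 : (size_t)(ks->low - sp);
}
```

With a model stack from 0x10000 to 0x12000 (8 KB) and 256-byte
frames, 40 calls leave the pointer 2048 bytes below the stack before
the first comparison. In a real kernel that memory belongs to
something else and has been overwritten by then, which may be why no
message ever gets printed.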

> I'm curious about your watchdog - a true watchdog should hard reset
> your board if not serviced.  So you've really got a high priority
> interrupt I guess, which is probably not much use, (as you're
> finding out).  If it were an NMI you could at least write some code
> that would jump to the reset vector or something like that.

The watch-dog we use is built into the PPC440GP. The first time it
expires it generates a critical interrupt; the second time, if still
enabled, it causes a hardware reset, provided that function is
enabled. Whilst tracking down our original bug we didn't want the
hardware reset to occur, as it would definitely lose any debug
information that might still be left in memory, hence we run it with
reset disabled.
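
The two-stage behaviour described above can be written down as a tiny
state machine. The following C model is a simplification for
illustration only: it does not touch or mirror the real PPC440GP timer
registers, and wdt_model, wdt_expire() and wdt_service() are
hypothetical names.

```c
#include <stdbool.h>

/* What the modelled watch-dog does on a given expiry. */
enum wdt_action {
    WDT_NONE,          /* nothing further happens */
    WDT_CRITICAL_IRQ,  /* first expiry: critical interrupt */
    WDT_RESET          /* second expiry with reset enabled */
};

struct wdt_model {
    bool reset_enabled;     /* is the hardware-reset function enabled? */
    unsigned int expiries;  /* expiries since the last service */
};

/* Called each time the watch-dog period elapses without service. */
enum wdt_action wdt_expire(struct wdt_model *w)
{
    w->expiries++;
    if (w->expiries == 1)
        return WDT_CRITICAL_IRQ;
    return w->reset_enabled ? WDT_RESET : WDT_NONE;
}

/* Called when software services the watch-dog in time. */
void wdt_service(struct wdt_model *w)
{
    w->expiries = 0;
}
```

With reset_enabled false, as in the setup described here, the second
expiry returns WDT_NONE: if the critical interrupt from the first
expiry is never handled, the model has no further way to signal the
hang.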

Regards,
Steve Boorman

