public inbox for linux-kernel@vger.kernel.org
* How to go about debuging a system lockup?
From: Lennart Sorensen @ 2006-11-16 15:34 UTC
  To: linux-kernel

We have a router with a Geode SC1200 cpu, with 4 AMD 972 ethernet ports
(pcnet32) behind a PLX 6152 PCI-PCI bridge, which quite regularly locks
up completely if we try to run simultaneous traffic on all 4 ports (our
test case sends data from port 1 to port 2 and back, and from port 3 to
port 4 and back, at a rate of 8000 packets per second using 1500-byte
packets).  We usually manage to run the test for about 1 minute before
the system hangs.  This happens on every one of the systems we have
tried so far.  If we only run 2 ports, it seems to never die, and with 3
ports we haven't seen any failures yet, although maybe we just haven't
tested long enough.  If we just receive the packets but don't forward
them out again, then we never crash, so it seems to be related to
simultaneous transmit on the pcnet32s.
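
To put the test load in perspective, here is a minimal sketch of the
per-stream arithmetic.  It assumes the 1500 bytes are counted per packet
and ignores Ethernet framing overhead (preamble, inter-frame gap); the
function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bits per second carried by one stream, counting only the packet bytes
 * themselves (assumption: no preamble, inter-frame gap or other framing
 * overhead is included). */
uint64_t stream_bits_per_second(uint64_t packets_per_second,
                                uint64_t bytes_per_packet)
{
    /* Guard against overflow of the 64-bit product. */
    assert(packets_per_second == 0 ||
           bytes_per_packet <= UINT64_MAX / 8 / packets_per_second);

    return packets_per_second * bytes_per_packet * 8;
}
```

With 8000 packets per second and 1500-byte packets this gives 96,000,000
bits per second per stream, which is close to the nominal rate of Fast
Ethernet if the links run at 100 Mbit/s.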

So far I have tried printing a message every time the pcnet32 driver
enables and disables interrupts to find out if it hangs somewhere with
interrupts disabled, but that didn't seem to indicate anything
meaningful.

I have tried this with 2.6.8, 2.6.16.22, and 2.6.18.2, with no
difference so far.  I can't think of what kind of event could cause the
system to just hang with no further console output, kernel panic,
oops, or anything.  Usually most errors produce some kind of message.

Does anyone have any suggestions for where I go from here to find out
what is happening and where to look?  I don't even know if I should
suspect the hardware or the software at this point.  I want to know if
the program counter is still changing, or if the cpu is simply hung or
something, but I have no idea how to get at that.

--
Len Sorensen

* RE: How to go about debuging a system lockup?
From: Protasevich, Natalie @ 2006-11-16 22:01 UTC
  To: Lennart Sorensen, Jesper Juhl; +Cc: linux-kernel

> I don't know of a good version yet.  I so far don't know if there ever
> was one.  This could even be a bug in the PCI hardware, or the way the
> BIOS on this system on a board configured the PCI controller.  Maybe I
> should go back and try a 2.4 kernel.
> 
> > Hope some of that helps :)
> 
> Well hopefully.
> 

If you can't drop into kdb and SysRq doesn't respond, then your
interrupts are disabled. It used to be (with older systems anyway) that
there was an NMI button on the system, so one could send an NMI and make
the handler print a trace. Newer systems might not have that, so you can
build your own PCI card to send an NMI :)
Another possibility is to use port 80 and make suspicious code print
something to it. Once we used a small self-built thing with LEDs to
catch the output to the parallel port while debugging a silent boot
failure. There are some port 80 cards that you can buy:
http://auctions.yahoo.com/i:Port%2080%20Card%20and%20power%20supply%20tester:102201489
http://www.amazon.com/gp/product/B000234U3I/ref=pd_cp_e_title/103-8875588-5330221
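
The port 80 idea can be sketched roughly as below.  This is only an
illustration: the checkpoint codes and all function names are
hypothetical and not taken from the pcnet32 driver.  The actual port
write is passed in as a function pointer; in kernel code it would be a
thin wrapper around outb(value, port), while the recording writer shown
here only stores the value so the logic can be tried off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POST_PORT 0x80  /* diagnostic port displayed by port 80 cards */

/* Hypothetical checkpoint codes, chosen only for this example. */
enum checkpoint {
    CP_IRQ_ENTER = 0x10,
    CP_IRQ_EXIT  = 0x1f,
    CP_TX_START  = 0x20,
    CP_TX_DONE   = 0x21
};

/* Performs the real port write; in a driver this would call outb(). */
typedef void (*port_write_fn)(uint8_t value, uint16_t port);

/* Emit one checkpoint code to the diagnostic port. After a hang, the
 * last code left on the card shows the last checkpoint that was passed. */
void checkpoint_emit(port_write_fn write_port, enum checkpoint cp)
{
    assert(write_port != NULL);
    write_port((uint8_t)cp, POST_PORT);
}

/* Stand-in writer for dry runs: remembers the last value and port. */
uint8_t  last_value;
uint16_t last_port;

void record_write(uint8_t value, uint16_t port)
{
    last_value = value;
    last_port = port;
}
```

Placing one checkpoint_emit() call at the entry and exit of each
suspicious path keeps the overhead to a single port write per call,
which disturbs timing far less than printing to the console.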

If your system has JTAG, then an in-target probe would be useful, if you
have one (or can borrow one; those are expensive).

--Natalie



