netdev.vger.kernel.org archive mirror
* 2.6.20.4: NETDEV WATCHDOG and lockups
@ 2007-04-02 19:41 Christian Kujau
From: Christian Kujau @ 2007-04-02 19:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev, malte


Hi there,

we have serious problems with 2 of our servers: both are shiny new amd64
dual-core machines with 2GB RAM each, running a 32bit kernel+userland
(Debian/testing). Each server has 2 NICs: RTL8139 (eth0, irq10) and
RTL8169s (eth1, irq11).

Both boxes are running fine but after "a while" they lock up and 
eventually restart all of a sudden. The last messages in the logfile 
are:

14:15:11 db2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
14:15:14 db2 kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

Then the box reboots, nothing else in the log.

As the servers have been set up recently, we only know that it happened
with Debian's 2.6.17-? kernel. When we upgraded the installation, we
went to 2.6.18-4-k7 and the problem persisted. We're now using vanilla
2.6.20.4 and while the problem persists, it takes longer to lock up (~20h
as opposed to 4-5h). While this is a good thing for us, it's now harder
to reproduce (we have to wait longer).

Searching the archives turned up quite a few results but no real fix and 
lots of old postings too. We then disabled ACPI completely and booted 
with 'noapic'. Both boxes have now been running for > 20h and we're curious
how long they make it. However, booting with 'noapic' slowed down both 
servers *a lot*.

From /proc/interrupts we can see that only CPU0 (core 0) is handling
interrupts while CPU1 does not. We compiled with CONFIG_IRQBALANCE=n so 
that irqbalance(1) would work - but to no avail.
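
For anyone who wants to check the same thing on their own box, here is a
minimal sketch that reads /proc/interrupts-style text and reports which
CPUs handled none of the given IRQs. The function names and the sample
counts are made up for illustration; only the IRQ numbers (10 for eth0,
11 for eth1) come from the hosts described above, and the parser assumes
the usual layout of a CPU header row followed by one row per IRQ.

```python
def parse_interrupts(text):
    """Return (cpu_names, {irq_label: [count per CPU]}) from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row, e.g. CPU0 CPU1
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:len(cpus)]:
            if not field.isdigit():
                break  # reached the controller/device description
            per_cpu.append(int(field))
        # Rows such as ERR: carry a single total rather than per-CPU counts.
        if len(per_cpu) == len(cpus):
            counts[label.strip()] = per_cpu
    return cpus, counts


def idle_cpus(cpus, counts, irqs):
    """Names of CPUs that handled none of the given IRQs."""
    return [cpu for i, cpu in enumerate(cpus)
            if all(counts[irq][i] == 0 for irq in irqs if irq in counts)]


# Invented sample in the usual layout; not taken from the affected hosts.
SAMPLE = """\
           CPU0       CPU1
  0:     123456          0    XT-PIC  timer
 10:      98765          0    XT-PIC  eth0
 11:      54321          0    XT-PIC  eth1
NMI:          0          0
ERR:          0
"""

cpus, counts = parse_interrupts(SAMPLE)
print(idle_cpus(cpus, counts, ["10", "11"]))
```

On a live system the same functions can be fed the contents of
/proc/interrupts instead of SAMPLE.
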

Please see http://nerdbynature.de/bits/2.6.20.4/ for details for both 
hosts and feel free to ask for more details. Although both boxes are in 
production we'll be happy to test more boot options/patches and the like.

TIA,
Christian.
-- 
BOFH excuse #266:

All of the packets are empty.
