* 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out
@ 2004-06-14 16:47 David Greaves
From: David Greaves @ 2004-06-14 16:47 UTC
  To: shemminger, scott.feldman; +Cc: netdev

Hi

I have two machines with Intel PRO/1000 MT cards.

One machine seems to work fine (AFAIK); the other has major problems.
I've swapped the cards and the problem stays with the machine.

I'm using e1000 driver version 5.2.39-k2 from the stock 2.6.6 kernel on both machines.

Any sustained traffic causes repeated messages like:
Jun 14 16:29:14 ash kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jun 14 16:29:17 ash kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex
Jun 14 16:29:17 ash kernel: nfs: server cu OK

I had a pair of Realtek r8169s that worked fine but only gave me 10Mb/s,
so I exchanged them for the Intel PRO/1000 cards in the hope of something
better. Now, even with scp's rate limiter as low as 10kb/s, it
still occurs.

I have played with all the module parameters and not found anything that
affects it at 1Gbps.

Even dropping to 100Mbps:
Jun 14 17:33:03 ash kernel: e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
Jun 14 17:33:33 ash kernel: NETDEV WATCHDOG: eth0: transmit timed out

At 10Mbps it does work:
scp reports a throughput of 1.0Mb/s (... less than thrilling);
however, scp now transfers a few Mb and then says:
Disconnecting: Corrupted MAC on input.

I found this mail from Stephen:
  http://oss.sgi.com/projects/netdev/archive/2004-06/msg00256.html
which appears to revert this mail from Scott:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=107516205706542&w=2
which I gather was supposed to fix this problem :)

I have seen no suggestions about other subsystems (e.g. ACPI) that
could also be tried.

David

* RE: 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler
@ 2004-06-18 14:40 Venkatesan, Ganesh
From: Venkatesan, Ganesh @ 2004-06-18 14:40 UTC
  To: David Greaves, Jens Laas; +Cc: Stephen Hemminger, netdev

Jens/David:

I did not mean to go off the list. For some reason, my subscription to
netdev is not working (even after re-subscribing), so I grabbed your
message from the archive.

I am trying to recreate your failure scenario in our lab. In the
meantime, please send me any new information you have on this issue.

Thanks,
ganesh 
 
-------------------------------------------------
Ganesh Venkatesan
Network/Storage Division, Hillsboro, OR

-----Original Message-----
From: David Greaves [mailto:david@dgreaves.com] 
Sent: Friday, June 18, 2004 5:52 AM
To: Jens Laas
Cc: Stephen Hemminger; netdev@oss.sgi.com; Venkatesan, Ganesh
Subject: Re: 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler

New info:
I booted into XP and the card works there, so it doesn't look like a
simple hardware incompatibility.
[I've got no real way to test the performance, but Cygwin's wget against
Apache 1.3 on the Linux box returns about 25M/s initially and then 15M/s
sustained for 500Mb.]

Jens Laas wrote:

>>
>> I'm speaking with Ganesh Venkatesan at intel about it. Ganesh you 
>> went off list - do you want to include Jens or maybe go back on-list?
>
>
> If others run into this problem, I'm sure they'll appreciate it if it's on
> the list.
> Since we have no idea what causes this (AFAIK), it may be a more
> general problem than the device driver.

I tend to agree, but I wasn't sure if this was the place, and I'll do as
I'm told ;)

>> A simple failure case for me is: 'ping -s 1500 '
>> This doesn't cause the timeout but doesn't succeed either.
>>
>> ping -f with standard packet size succeeds (slow rate though) and 
>> doesn't timeout.
>
>
>
> I don't see the ping problems at all, unless you try to ping when the
> interface has "hanged"?

<sigh> I thought that might be helpful.
Ping with -s and -f seems to let me trigger errors, and it seems a
lot more debuggable than scp or NFS :)
No, all tests are run when it's reset and 'clean'.
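One thing that distinguishes 'ping -s 1500' from a default-size flood ping is packet size on the wire: assuming a standard 1500-byte MTU (the thread never states the MTU), 1500 data bytes plus the 8-byte ICMP header plus a 20-byte IPv4 header come to 1528 bytes, which no longer fits in a single packet and has to go out as IP fragments. The Python sketch below counts the fragments for a few payload sizes, assuming an option-free IPv4 header; `ping_fragments` is a hypothetical helper, not part of any tool mentioned in this thread, and the arithmetic says nothing about the cause of the timeouts.

```python
# Assumptions (not stated in the thread): a 1500-byte MTU, a 20-byte IPv4
# header with no options, and the 8-byte ICMP echo header added by ping.
MTU = 1500
IP_HEADER = 20
ICMP_HEADER = 8


def ping_fragments(payload_size: int, mtu: int = MTU) -> list[int]:
    """Hypothetical helper: IPv4 packet sizes (bytes) sent for `ping -s payload_size`."""
    remaining = ICMP_HEADER + payload_size  # bytes the IP layer has to carry
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8  # 1480 when mtu == 1500
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(IP_HEADER + chunk)
        remaining -= chunk
    return sizes


if __name__ == "__main__":
    for size in (56, 1500, 65468):
        frags = ping_fragments(size)
        print(f"ping -s {size}: {len(frags)} packet(s), first/last {frags[0]}/{frags[-1]} bytes")
```

Under these assumptions, the common 56-byte default payload fits in one 84-byte packet, -s 1500 becomes two packets of 1500 and 48 bytes, and -s 65468 becomes 45 packets, the last one 376 bytes.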

>> ============
>> From here on down it's 2.6.7 with Stephen's recent delay scheduler patch.
>>
>> This changed the behaviour.
>
>
>
> This is strange unless you are actually using the delay scheduler?
> The default is sch_generic (that is, pfifo), which does not exhibit the
> problems corrected by the patch.

I'll go back and double-check in case I cocked up...
(I noticed the e1000 module was rebuilt, but you're right that's incidental.)

I've rebuilt the kernel and modules with and without the patch, rebooted a
few times, and I can't reproduce that effect; sorry for the red herring.
So the results I reported are still reproducible after reverting
Stephen's patch.

>> 10592 packets transmitted, 10591 packets received, 0% packet loss
>> round-trip min/avg/max = 5.4/5.5/83.5 ms
>>
>> Increasing Transmit Descriptors to 4096 avoids the "No buffer space
>> available" error with packet sizes up to -s65468 (still 100% failure, though)
>
>
> Increasing the number of buffers is not a way to fix the problem.

Agreed. However, in my ignorance of the driver's deeper behaviour, I'm
reporting things that affect it in ways I don't expect.
I expected it to take longer to run out of buffers; that didn't happen :)

(Anyway, on retesting I find that this was wrong: I suspect the
interface was down and I didn't notice.)

>
> I had hoped to hear something about this from Scott..

I'm happy to hear from anyone: I don't have *that* long until my RMA
option expires, and I don't fancy keeping them as ornaments!

David

Thread overview: 14+ messages
2004-06-14 16:47 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out David Greaves
     [not found] ` <20040615155111.26d6b809@dell_ss3.pdx.osdl.net>
2004-06-16 10:59   ` David Greaves
2004-06-18  8:04     ` Jens Laas
2004-06-18  9:08       ` 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler David Greaves
2004-06-18 10:27         ` Jens Laas
2004-06-18 12:51           ` David Greaves
2004-06-21 16:42         ` Thayne Harbaugh
2004-06-21 17:29           ` David Greaves
2004-06-21 17:43             ` ganesh.venkatesan
2004-06-21 18:34               ` David Greaves
2004-06-18 18:11       ` 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out Stephen Hemminger
2004-06-18 18:44         ` David Greaves
     [not found]           ` <20040618141629.0edd9766@dell_ss3.pdx.osdl.net>
2004-06-18 21:28             ` David Greaves
  -- strict thread matches above, loose matches on Subject: below --
2004-06-18 14:40 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler Venkatesan, Ganesh
