netdev.vger.kernel.org archive mirror
* 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out
@ 2004-06-14 16:47 David Greaves
       [not found] ` <20040615155111.26d6b809@dell_ss3.pdx.osdl.net>
  0 siblings, 1 reply; 14+ messages in thread
From: David Greaves @ 2004-06-14 16:47 UTC (permalink / raw)
  To: shemminger, scott.feldman; +Cc: netdev

Hi

I have 2 machines with Intel/Pro 1000MT cards.

One machine seems to work fine (AFAIK), the other has major problems.
I've swapped the cards and the problem stays on the machine.

I'm using version 5.2.39-k2 from the stock 2.6.6 kernel on both machines.

Any sustained traffic causes repeated:
Jun 14 16:29:14 ash kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jun 14 16:29:17 ash kernel: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex
Jun 14 16:29:17 ash kernel: nfs: server cu OK
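
To compare runs, the log excerpt above can be tallied mechanically. The following is a minimal sketch, not part of the original report: it assumes the classic syslog line layout shown above, an English/C locale for month names, and a caller-supplied year (syslog lines carry none). The function name `summarize` is hypothetical.

```python
import re
from datetime import datetime

# Matches "Jun 14 16:29:14 ash kernel: <message>" (layout assumed from the excerpt).
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$")

def summarize(lines, year=2004):
    """Count watchdog timeouts and time each following 'Link is Up' recovery.

    Returns (timeout_count, recovery_seconds), where recovery_seconds holds
    one entry per timeout that was followed by a link-up message.
    """
    timeouts = 0
    recoveries = []
    last_timeout = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a kernel syslog line in the assumed layout
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if "NETDEV WATCHDOG" in msg and "transmit timed out" in msg:
            timeouts += 1
            last_timeout = stamp
        elif "NIC Link is Up" in msg and last_timeout is not None:
            recoveries.append((stamp - last_timeout).total_seconds())
            last_timeout = None
    return timeouts, recoveries
```

Feeding it the lines of a saved kernel log gives the number of timeouts and how long each link reset took, which makes it easier to tell whether a module parameter change had any effect.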

I had a pair of Realtek r8169s that worked fine but only gave me 10Mb/s 
so I exchanged them for the Intel/Pro cards in the hope of something 
better - now, even with scp's rate limiter as low as 10kb/s, it 
still occurs.

I have played with all the module parameters and not found anything that 
affects it at 1Gbps

Even dropping to 100Mbps:
Jun 14 17:33:03 ash kernel: e1000: eth0 NIC Link is Up 100 Mbps Full Duplex
Jun 14 17:33:33 ash kernel: NETDEV WATCHDOG: eth0: transmit timed out

It can do 10Mbps:
scp reports a throughput of 1.0Mb/s (... less than thrilling)
however scp now transfers a few Mb and says:
Disconnecting: Corrupted MAC on input.



I found this mail:
  http://oss.sgi.com/projects/netdev/archive/2004-06/msg00256.html
from Stephen

which appears to reverse this mail:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=107516205706542&w=2
from Scott
which I gather was supposed to correct this problem :)

I have seen no suggestions about other subsystems (eg ACPI etc) that 
could also be tried.

David

* RE: 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler
@ 2004-06-18 14:40 Venkatesan, Ganesh
  0 siblings, 0 replies; 14+ messages in thread
From: Venkatesan, Ganesh @ 2004-06-18 14:40 UTC (permalink / raw)
  To: David Greaves, Jens Laas; +Cc: Stephen Hemminger, netdev

Jens/David:

Did not mean to get off the list. For some reason, my subscription to
netdev is not working (even after re-subscribing). So, I grabbed your
message off of the archive.

I am trying to recreate your failure scenario in our lab. In the
meantime, please send me any new information you have on this issue.

Thanks,
ganesh 
 
-------------------------------------------------
Ganesh Venkatesan
Network/Storage Division, Hillsboro, OR

-----Original Message-----
From: David Greaves [mailto:david@dgreaves.com] 
Sent: Friday, June 18, 2004 5:52 AM
To: Jens Laas
Cc: Stephen Hemminger; netdev@oss.sgi.com; Venkatesan, Ganesh
Subject: Re: 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler

New info:
I booted into XP and the card works there - so it doesn't look like a 
simple hardware incompatibility.
[I've got no real way to test the performance but cygwin's wget against 
apache1.3 on the linux box returns about 25M/s initially and then 15M/s 
sustained for 500Mb]

Jens Laas wrote:

>>
>> I'm speaking with Ganesh Venkatesan at intel about it. Ganesh you 
>> went off list - do you want to include Jens or maybe go back on-list?
>
>
> If others run into this problem I'm sure they'll appreciate it if it's on 
> list.
> Since we have no idea what causes this (AFAIK) it may be a more 
> general problem than the device driver.

I tend to agree - but I wasn't sure if this was the place and I'll do as
I'm told ;)

>> A simple failure case for me is : 'ping -s 1500 '
>> This doesn't cause the timeout but doesn't succeed either.
>>
>> ping -f with standard packet size succeeds (slow rate though) and 
>> doesn't timeout.
>
>
>
> I don't see the ping problems at all. Unless you try to ping when the 
> interface has "hung"?

<sigh> thought that might be helpful.
Ping with -s and -f seems to allow me to trigger errors and it seems a 
lot more debug-able than scp or nfs :)
No, all tests are run when it's reset and 'clean'.
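
One thing worth checking when reading the 'ping -s' results: the size given to -s is the ICMP payload only, so on a standard Ethernet link a request with -s 1500 no longer fits in a single frame and has to be fragmented. The sketch below works through the arithmetic. It assumes IPv4 with no header options and an MTU of 1500 (the thread does not state the MTU in use), and the helper names are hypothetical. It only shows which test sizes exercise fragmentation; it does not claim fragmentation is the cause of the failures.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header

def max_unfragmented_payload(mtu=1500):
    """Largest 'ping -s' value that still fits in one IPv4 packet."""
    return mtu - IP_HEADER - ICMP_HEADER

def fragments_needed(payload, mtu=1500):
    """Number of IPv4 fragments for one echo request with `payload` data bytes."""
    icmp_len = payload + ICMP_HEADER
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(icmp_len / per_fragment)
```

With these assumptions, sizes up to 1472 go out as one packet, -s 1500 needs two fragments, and -s 65468 needs 45, so comparing -s 1472 against -s 1473 would separate "large frames fail" from "fragmented traffic fails".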

>> ============
>> From hereon down it's 2.6.7 with Stephen's recent delay scheduler patch
>>
>> This changed the behaviour.
>
>
>
> This is strange unless you are actually using the delay scheduler ?
> Default is sch_generic (that is pfifo) that does not exhibit the 
> problems corrected by the patch.

I'll go back and double check in case I cocked up...
(I noticed the e1000 module rebuild but you're right that's incidental)

I've rebuilt the kernel and modules with and w/o patch and rebooted a 
few times and I can't reproduce that effect - sorry for the red herring.
So after I reverted Stephen's patch the results I reported are still 
reproducible w/o the patch.

>> 10592 packets transmitted, 10591 packets received, 0% packet loss
>> round-trip min/avg/max = 5.4/5.5/83.5 ms
>>
>> Increasing Transmit Descriptors to 4096 avoids the No buffer space 
>> available with packet sizes up to -s65468 (still 100% failure though)
>
>
> Increasing nr of buffers is not a way to fix the problem.

agreed - however in my ignorance of the deep behaviour I'm reporting 
things that affect behaviour in ways I don't expect.
I expected it to take longer to run out of buffers - that didn't happen :)

(Anyway, on retesting I find that this was wrong - I suspect the 
interface was down and I didn't notice)

>
> I had hoped to hear something about this from Scott..

I'm happy to hear from anyone - I don't have *that* long until my RMA 
option expires and I don't fancy keeping them as ornaments!

David


end of thread, other threads:[~2004-06-21 18:34 UTC | newest]

Thread overview: 14+ messages
2004-06-14 16:47 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out David Greaves
     [not found] ` <20040615155111.26d6b809@dell_ss3.pdx.osdl.net>
2004-06-16 10:59   ` David Greaves
2004-06-18  8:04     ` Jens Laas
2004-06-18  9:08       ` 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler David Greaves
2004-06-18 10:27         ` Jens Laas
2004-06-18 12:51           ` David Greaves
2004-06-21 16:42         ` Thayne Harbaugh
2004-06-21 17:29           ` David Greaves
2004-06-21 17:43             ` ganesh.venkatesan
2004-06-21 18:34               ` David Greaves
2004-06-18 18:11       ` 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out Stephen Hemminger
2004-06-18 18:44         ` David Greaves
     [not found]           ` <20040618141629.0edd9766@dell_ss3.pdx.osdl.net>
2004-06-18 21:28             ` David Greaves
  -- strict thread matches above, loose matches on Subject: below --
2004-06-18 14:40 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler Venkatesan, Ganesh
