From: David Greaves <david@dgreaves.com>
To: Jens Laas <jens.laas@data.slu.se>
Cc: Stephen Hemminger <shemminger@osdl.org>,
netdev@oss.sgi.com, ganesh.venkatesan@intel.com
Subject: Re: 2.6.6 e1000 NETDEV WATCHDOG: eth0: transmit timed out+ delay scheduler
Date: Fri, 18 Jun 2004 10:08:36 +0100
Message-ID: <40D2B114.5020201@dgreaves.com>
In-Reply-To: <Pine.LNX.4.60.0406180953240.1089@jlaas2.data.slu.se>
Stephen, I applied your delay scheduler patch and some results appear below.
Jens Laas wrote:
> (04.06.16 at 11:59) David Greaves wrote the following to Stephen Hemminger:
>
> We have seen the same symptoms. (2.6.x + e1000)
>
> Our system is an SMP system. That might be what's triggering the problem.
> Is your system UP or SMP ?
UP
> (Next reboot we will test running on only one CPU).
>
> We have tried with and without NAPI, both exhibit the same problem.
Me too
> We have tried different versions of e1000 without luck.
Me too, 3 cards.
(Did I mention I have 2 machines with very similar specs (AMD/VIA KT600)?
The other one works; actually, to be accurate, it hasn't yet failed
but hasn't yet run at full speed, and it has a higher CPU speed.)
> We have tried with 100Mb and gigabit switches.
I'm now running two e1000's back to back over a piece of cat5...
>
> Make sure that flowcontrol is disabled on your switch (if it has it
> implemented).
...so it's not that smart anymore ;)
>>
>> module parameters.
>
>
> I believe following is recommended by driver developers:
> TxDescriptors=256 RxDescriptors=256 FlowControl=0 XsumRX=0
Yes, I'm running with module defaults unless otherwise stated but I've
tried that combo (to no effect)
I'm speaking with Ganesh Venkatesan at Intel about it. Ganesh, you went
off-list - do you want to include Jens or maybe go back on-list?
A simple failure case for me is: 'ping -s 1500 '
This doesn't cause the timeout but doesn't succeed either.
ping -f with standard packet size succeeds (at a slow rate, though) and
doesn't time out.
Using 8139 100Mb/s card:
272384 packets transmitted, 272383 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/4.0 ms
real 0m32.179s
Using Pro/1000:
60992 packets transmitted, 60991 packets received, 0% packet loss
round-trip min/avg/max = 0.0/0.5/8.4 ms
real 0m38.257s
Any ping with -s >1500 results in 100% packet loss.
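For a rough comparison of the two flood-ping runs above, the packet counts can be divided by the 'real' wall-clock times. This is only a sketch: it assumes 'real' covers the whole flood run (it also includes process start-up and shutdown), and the names `FloodRun` and `packets_per_second` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloodRun:
    """One 'ping -f' run as reported in this message (hypothetical container)."""
    label: str
    transmitted: int
    received: int
    real_seconds: float  # wall-clock time from the shell's 'real' line


def packets_per_second(run: FloodRun) -> float:
    """Approximate send rate: packets transmitted per second of wall-clock time."""
    return run.transmitted / run.real_seconds


# Figures copied from the ping output above.
rtl8139 = FloodRun("8139 100Mb/s", 272384, 272383, 32.179)
e1000 = FloodRun("Pro/1000", 60992, 60991, 38.257)

if __name__ == "__main__":
    for run in (rtl8139, e1000):
        print(f"{run.label}: {packets_per_second(run):.0f} packets/s")
    ratio = packets_per_second(rtl8139) / packets_per_second(e1000)
    print(f"8139 vs Pro/1000 flood rate: about {ratio:.1f}x")
```

On these figures the 100Mb/s card sustains roughly five times the flood rate of the gigabit card.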
============
From here on down it's 2.6.7 with Stephen's recent delay scheduler patch.
This changed the behaviour.
Now ping -s 1500 works,
but beyond that size it gets lossy:
root@ash:~ # ping -s3000 10.0.1.1
PING 10.0.1.1 (10.0.1.1): 3000 data bytes
3008 bytes from 10.0.1.1: icmp_seq=1 ttl=64 time=0.5 ms
3008 bytes from 10.0.1.1: icmp_seq=11 ttl=64 time=0.5 ms
3008 bytes from 10.0.1.1: icmp_seq=12 ttl=64 time=0.4 ms
3008 bytes from 10.0.1.1: icmp_seq=13 ttl=64 time=0.9 ms
3008 bytes from 10.0.1.1: icmp_seq=15 ttl=64 time=0.4 ms
3008 bytes from 10.0.1.1: icmp_seq=16 ttl=64 time=0.3 ms
and now I'm seeing ping generate:
Jun 18 09:41:57 ash kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jun 18 09:41:59 ash kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
ping -f now works for packet sizes up to -s 2952 (2 packets at MTU 1500).
ping -f -s 2953 results in:
PING 10.0.1.1 (10.0.1.1): 2953 data bytes
..............................ping: sendto: No buffer space available
ping: wrote 10.0.1.1 2961 chars, ret=-1
.ping: sendto: No buffer space available
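The 2952/2953 boundary matches plain IPv4 fragmentation arithmetic, which a short sketch can check. It assumes a 20-byte IPv4 header with no options, the 8-byte ICMP echo header, and an MTU of 1500; the function name `fragments_needed` is made up for this illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def fragments_needed(ping_size: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments for 'ping -s ping_size' at the given MTU."""
    ip_payload = ping_size + ICMP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    # Integer ceiling division.
    return -(-ip_payload // per_fragment)


if __name__ == "__main__":
    for size in (1472, 1500, 2952, 2953, 3000):
        print(f"-s {size}: {fragments_needed(size)} fragment(s)")
```

Under these assumptions -s 2952 gives 2960 bytes of IP payload, which fills exactly two 1480-byte fragments, while -s 2953 gives 2961 bytes (the figure in the "wrote 10.0.1.1 2961 chars" line) and needs a third.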
NB: with the patch, between the same machines via an alternate pair of NICs:
root@ash:~ # ping -f -s29550 haze
PING haze.dgreaves.com (10.0.0.88): 29550 data bytes
.
--- haze.dgreaves.com ping statistics ---
10592 packets transmitted, 10591 packets received, 0% packet loss
round-trip min/avg/max = 5.4/5.5/83.5 ms
Increasing TxDescriptors to 4096 avoids the "No buffer space available"
error with packet sizes up to -s65468 (still 100% failure, though).
I'm not sure that adds much now so I'll leave it until I get some more
suggestions.
HTH
David