From: Tony Battersby <tonyb@cybernetics.com>
To: Stephen Hemminger <shemminger@linux-foundation.org>,
	netdev@vger.kernel.org
Subject: sky2: tx hang on dual-port Yukon XL when rx csum disabled
Date: Mon, 28 Jan 2008 13:43:19 -0500
Message-ID: <479E2247.8080109@cybernetics.com>

I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
2.6.24.  The problem is triggered by both ports transmitting at high
speed simultaneously, and it reproduces quickly every time.  Here is
the setup:

PC #1 with Intel PRO/1000 NIC:
e1000 IP address 192.168.1.1
running iperf -s

PC #2 with Intel PRO/1000 NIC:
e1000 IP address 192.168.2.1
running iperf -s

PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express):
sky2 IP address 192.168.1.2
sky2 IP address 192.168.2.2

So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
-s".  Each of these Intel NICs is directly cabled to one of the two
ports of the SysKonnect NIC.

When I run:
(PC #3 tty1) iperf -c 192.168.1.1 -t 30
(wait for a second or two)
(PC #3 tty2) iperf -c 192.168.2.1 -t 30

"iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
finish.  Press Ctrl-C to abort the hung iperf.  Ping 192.168.1.1 does
not respond.  Ping 192.168.2.1 does respond, but each ping has almost
exactly 1 second latency (the latency should be < 1 ms).

When I switch the order of the tests, whichever iperf -c was started
_first_ is the one that locks up with no ping afterward, and whichever
was started _second_ is the one that finishes, but with a 1-second ping
latency afterward.  So the problem follows the ordering of the tests
rather than a specific port.
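
The reproduction steps above could be scripted roughly as follows.  This
is only a sketch: the helper names (iperf_client_cmd, ping_cmd,
reproduce), the 2-second delay, the 15-second hang margin, and the
"--run" switch are assumptions made for illustration, and it presumes
iperf and ping are on the PATH of PC #3.  Without "--run" it only prints
the commands it would execute.

```python
import subprocess
import sys
import time

# Server addresses taken from the setup described above.
FIRST_SERVER = "192.168.1.1"
SECOND_SERVER = "192.168.2.1"


def iperf_client_cmd(server, seconds=30):
    """Build the argv for one 'iperf -c <server> -t <seconds>' client."""
    return ["iperf", "-c", server, "-t", str(seconds)]


def ping_cmd(server, count=3):
    """Build the argv for a short ping to check latency afterward."""
    return ["ping", "-c", str(count), server]


def reproduce(first, second, delay=2.0, seconds=30, hang_margin=15):
    """Start two overlapping iperf clients; return the servers whose client hung."""
    procs = {first: subprocess.Popen(iperf_client_cmd(first, seconds))}
    time.sleep(delay)  # "wait for a second or two"
    procs[second] = subprocess.Popen(iperf_client_cmd(second, seconds))

    hung = []
    for server, proc in procs.items():
        try:
            proc.wait(timeout=seconds + hang_margin)
        except subprocess.TimeoutExpired:
            proc.kill()  # stand-in for pressing Ctrl-C on the hung client
            proc.wait()
            hung.append(server)
    return hung


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        hung = reproduce(FIRST_SERVER, SECOND_SERVER)
        print("hung clients:", hung or "none")
        for server in (FIRST_SERVER, SECOND_SERVER):
            subprocess.run(ping_cmd(server))
    else:
        # Dry run: only show the iperf commands that would be executed.
        print(" ".join(iperf_client_cmd(FIRST_SERVER)))
        print(" ".join(iperf_client_cmd(SECOND_SERVER)))
```

Swapping the two arguments to reproduce() gives the reversed ordering
described above.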

Also, the trigger seems to be transmitting, not receiving.  If I run
"iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000
PCs, then the tests pass.

When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx
checksumming on both ports of the SysKonnect NIC, both tests pass
successfully.  Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2:
disable rx checksum on Yukon XL" disabled rx checksumming by default on
this NIC to get rid of some "hw csum failure" messages
(http://marc.info/?l=linux-netdev&m=119497815523843&w=4).  However, this
seems to have exposed a different (and arguably worse) bug.

I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect
the problem.

As a temporary workaround, I will use ethtool to turn on rx checksumming
and live with the "hw csum failure" messages, since they are better than
network lockups.
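
A minimal sketch of applying that workaround to both ports at boot is
shown below.  The interface names eth0 and eth1 come from the ethtool
commands quoted earlier; the helper name rx_csum_cmd and the "--run"
switch are assumptions for illustration.  Actually changing the setting
needs root, so by default the script only prints the commands.

```python
import subprocess
import sys

# Both ports of the SysKonnect NIC, as named in the ethtool commands above.
PORTS = ("eth0", "eth1")


def rx_csum_cmd(iface, enable=True):
    """Build the argv for 'ethtool -K <iface> rx on|off'."""
    return ["ethtool", "-K", iface, "rx", "on" if enable else "off"]


if __name__ == "__main__":
    for iface in PORTS:
        cmd = rx_csum_cmd(iface)
        if "--run" in sys.argv[1:]:
            subprocess.run(cmd, check=True)  # requires root privileges
        else:
            print(" ".join(cmd))  # dry run: show the command only
```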

Let me know if I can be of any further assistance in tracking down this
problem.

Tony Battersby
Cybernetics

