* sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Tony Battersby @ 2008-01-28 18:43 UTC
To: Stephen Hemminger, netdev
I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
2.6.24. The problem is triggered by both ports transmitting at high
speed simultaneously. It reproduces quickly, 100% of the time. Here
is the setup:

PC #1 with Intel PRO/1000 NIC:
    e1000 IP address 192.168.1.1
    running iperf -s

PC #2 with Intel PRO/1000 NIC:
    e1000 IP address 192.168.2.1
    running iperf -s

PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express)
    sky2 IP address 192.168.1.2
    sky2 IP address 192.168.2.2

So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
-s". Each of these Intel NICs is directly cabled to one of the two
ports of the SysKonnect NIC.

When I run:
    (PC #3 tty1) iperf -c 192.168.1.1 -t 30
    (wait for a second or two)
    (PC #3 tty2) iperf -c 192.168.2.1 -t 30

"iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
finish. Press Ctrl-C to abort the hung iperf. Ping 192.168.1.1 does
not respond. Ping 192.168.2.1 does respond, but each ping has almost
exactly 1 second latency (the latency should be < 1 ms).
When I switch the order of the tests, whichever iperf -c was started
_first_ is the one that locks up with no ping afterward, and whichever
was started _second_ is the one that finishes, but with a 1-second ping
latency afterward. So the problem follows the ordering of the tests
rather than a specific port.
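
The reproduction steps above could be scripted roughly as follows. This is only a sketch: the function names, the 2-second stagger, and the 10-second grace period are assumptions made for illustration, and the only iperf options used are the `-c` and `-t` flags already shown in the report.

```python
import shlex
import subprocess
import time


def build_repro_commands(servers, duration=30):
    """Return one iperf client command (as an argv list) per server address."""
    return [["iperf", "-c", addr, "-t", str(duration)] for addr in servers]


def run_repro(servers, duration=30, stagger=2.0, grace=10.0):
    """Start the clients `stagger` seconds apart and report which ones hang.

    Returns a dict mapping each server address to "finished" or "hung".
    """
    procs = []
    for cmd in build_repro_commands(servers, duration):
        print("starting:", shlex.join(cmd))
        procs.append((cmd[2], subprocess.Popen(cmd)))
        time.sleep(stagger)  # "wait for a second or two" between clients

    results = {}
    # A healthy client exits after about `duration` seconds; allow some slack.
    deadline = time.monotonic() + duration + grace
    for addr, proc in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
            results[addr] = "finished"
        except subprocess.TimeoutExpired:
            proc.kill()  # stands in for the manual Ctrl-C
            proc.wait()
            results[addr] = "hung"
    return results
```

Calling `run_repro(["192.168.1.1", "192.168.2.1"])` on PC #3 and then again with the addresses swapped would show whether the hang follows the start order, as described above.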
Also, the trigger seems to be transmitting, not receiving. If I run
"iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000
PCs, then the tests pass.
When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx
checksumming on both ports of the SysKonnect NIC, both tests pass
successfully. Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2:
disable rx checksum on Yukon XL" disabled rx checksumming by default on
this NIC to get rid of some "hw csum failure" messages
(http://marc.info/?l=linux-netdev&m=119497815523843&w=4). However, this
seems to have exposed a different (and arguably worse) bug.
I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect
the problem.
As a temporary workaround, I will use ethtool to turn on rx checksumming
and live with the "hw csum failure" messages, since they are better than
network lockups.
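
For anyone who wants to apply the same workaround at boot, a minimal sketch follows. It only wraps the `ethtool -K <iface> rx on` command quoted above; the helper names and the injectable `runner` parameter are hypothetical, and the interface names eth0/eth1 are taken from this particular setup.

```python
import subprocess


def rx_csum_command(iface, enable=True):
    """Build the ethtool command that turns rx checksumming on or off."""
    return ["ethtool", "-K", iface, "rx", "on" if enable else "off"]


def apply_rx_csum(ifaces, enable=True, runner=subprocess.run):
    """Run the ethtool command for each interface; raise if any call fails.

    `runner` defaults to subprocess.run and is a parameter only so the
    function can be exercised without touching real hardware.
    """
    for iface in ifaces:
        runner(rx_csum_command(iface, enable), check=True)
```

`apply_rx_csum(["eth0", "eth1"])` needs root privileges and is equivalent to the two ethtool invocations given earlier in this message.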
Let me know if I can be of any further assistance in tracking down this
problem.
Tony Battersby
Cybernetics
* Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Stephen Hemminger @ 2008-01-28 20:43 UTC
To: Tony Battersby; +Cc: netdev
What bus and chipset is in use on the systems with sky2?
I have seen problems when using PCI-X on AMD systems (documented in AMD errata)
due to multiple outstanding transactions.
--
Stephen Hemminger <stephen.hemminger@vyatta.com>
* Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Tony Battersby @ 2008-01-28 20:58 UTC
To: Stephen Hemminger; +Cc: netdev
> What bus and chipset is in use on the systems with sky2?
> I have seen problems when using PCI-X on AMD systems (documented in AMD errata)
> due to multiple outstanding transactions.
Motherboard: SuperMicro PDSME
Chipset: Intel E7230
Processor: Intel Pentium D 3.4 GHz
(note: tried both SMP and booting with maxcpus=1)
lspci:
00:00.0 Host bridge: Intel Corporation E7230/3000/3010 Memory Controller Hub (rev 81)
00:01.0 PCI bridge: Intel Corporation E7230/3000/3010 PCI Express Root Port (rev 81)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
04:00.0 Ethernet controller: SysKonnect SK-9E21D 10/100/1000Base-T Adapter, Copper RJ-45 (rev 14)
05:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
06:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
0a:04.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
cat /proc/cpuinfo:
processor: 0
vendor_id: GenuineIntel
cpu family: 15
model: 6
model name: Intel(R) Pentium(R) D CPU 3.40GHz
stepping: 4
cpu MHz: 3391.734
cache size: 2048 KB
physical id: 0
siblings: 2
core id: 0
cpu cores: 2
fdiv_bug: no
hlt_bug: no
f00f_bug: no
coma_bug: no
fpu: yes
fpu_exception: yes
cpuid level: 6
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips: 6789.26
clflush size: 64
processor: 1
vendor_id: GenuineIntel
cpu family: 15
model: 6
model name: Intel(R) Pentium(R) D CPU 3.40GHz
stepping: 4
cpu MHz: 3391.734
cache size: 2048 KB
physical id: 0
siblings: 2
core id: 1
cpu cores: 2
fdiv_bug: no
hlt_bug: no
f00f_bug: no
coma_bug: no
fpu: yes
fpu_exception: yes
cpuid level: 6
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl est cid cx16 xtpr lahf_lm
bogomips: 6783.57
clflush size: 64
cat /proc/interrupts
           CPU0       CPU1
  0:         86          0   IO-APIC-edge      timer
  1:         81          0   IO-APIC-edge      i8042
  7:          0          0   IO-APIC-edge      parport0
  8:          1          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          5          0   IO-APIC-edge      i8042
 14:        412          0   IO-APIC-edge      ide0
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:         31          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
219:          1          0   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:       1924        514   Local timer interrupts
RES:         16         20   Rescheduling interrupts
CAL:         19         56   function call interrupts
TLB:         21         41   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0
(note: tried booting with pci=nomsi also)
Tony
* RE: sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Brandeburg, Jesse @ 2008-01-28 21:21 UTC
To: Tony Battersby, Stephen Hemminger, netdev
Tony Battersby wrote:
> I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
> 2.6.24. The problem is triggered by both ports transmitting at high
> speed simultaneously. This problem is 100% quickly reproducible.
> Here is the setup:
>
> PC #1 with Intel PRO/1000 NIC:
> e1000 IP address 192.168.1.1
> running iperf -s
>
> PC #2 with Intel PRO/1000 NIC:
> e1000 IP address 192.168.2.1
> running iperf -s
>
> PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express)
> sky2 IP address 192.168.1.2
> sky2 IP address 192.168.2.2
>
> So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
> -s". Each of these Intel NICs is directly cabled to one of the two
> ports of the SysKonnect NIC.
Make sure to disable the default Linux ARP behavior on PC #3 for this
kind of test, as follows:*
[root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
[root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter
[root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/eth1/arp_filter
*see http://linux-ip.net/html/ether-arp.html
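
The same three writes could be done from a startup script along these lines. This is a sketch only: the function names and the `root` parameter are made up for illustration, while the `/proc/sys/net/ipv4/conf/<name>/arp_filter` paths are the ones shown above.

```python
def arp_filter_paths(ifaces, root="/proc/sys/net/ipv4/conf"):
    """Return the arp_filter control files for "all" plus each interface."""
    return [f"{root}/{name}/arp_filter" for name in ["all", *ifaces]]


def enable_arp_filter(ifaces, root="/proc/sys/net/ipv4/conf"):
    """Write "1" to every arp_filter file (requires root on a real system)."""
    for path in arp_filter_paths(ifaces, root):
        with open(path, "w") as f:
            f.write("1\n")
```

`enable_arp_filter(["eth0", "eth1"])` mirrors the echo commands; passing a different `root` lets the function be tried against a scratch directory first.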
* Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Tony Battersby @ 2008-01-28 21:38 UTC
To: Brandeburg, Jesse; +Cc: Stephen Hemminger, netdev
Brandeburg, Jesse wrote:
> make sure to disable the default Linux arp behavior for this kind of
> test on PC3 by*
> [root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
> [root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/eth0/arp_filter
> [root@real-server]# echo 1 > /proc/sys/net/ipv4/conf/eth1/arp_filter
>
> *see http://linux-ip.net/html/ether-arp.html
>
>
Yeah, that bit me a few years ago, and I now have it in one of my boot
startup scripts...
But thanks anyway.
Tony
* Re: sky2: tx hang on dual-port Yukon XL when rx csum disabled
From: Tony Battersby @ 2008-01-29 15:26 UTC
To: Stephen Hemminger, netdev
Tony Battersby wrote:
> "iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
> finish. Press Ctrl-C to abort the hung iperf. Ping 192.168.1.1 does
> not respond. Ping 192.168.2.1 does respond, but each ping has almost
> exactly 1 second latency (the latency should be < 1 ms).
>
>
Update: after triggering the problem, the ping latency on the interface
that still responds is the same as the ping interval. The default ping
interval is 1 second, so in my initial test I was seeing a 1 second ping
latency. If I do "ping -i 2 192.168.2.1", then each ping takes 2
seconds to receive the response. If I do "ping -i 5 192.168.2.1", then
each ping takes 5 seconds to receive the response. This implies that
the network stack doesn't realize that it received the ping reply until
it goes to send another ping.
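
The inference can be illustrated with a toy model. It assumes, purely as a hypothesis, that a received reply sits unprocessed until the next transmit on that interface; it says nothing about what the driver or hardware actually does, and the function name and the 0.5 ms default round-trip time are invented for the example.

```python
import math


def observed_latency(interval, true_rtt=0.0005):
    """Latency ping would report if a reply is only noticed at the next send.

    The request goes out at t = 0 and the reply arrives at t = true_rtt, but
    under the assumed fault it is not seen until the first later transmit,
    which happens at the next multiple of `interval`.
    """
    sends_until_noticed = math.floor(true_rtt / interval) + 1
    return sends_until_noticed * interval
```

Under that assumption, `observed_latency(1)`, `observed_latency(2)` and `observed_latency(5)` give 1, 2 and 5 seconds, matching the measurements with the default interval, `-i 2`, and `-i 5`.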
Hope that helps.
Tony Battersby
Cybernetics