netdev.vger.kernel.org archive mirror
* sky2 hangs, hw csum errors with 2.6.18
From: Martin Lucina @ 2006-09-22 11:24 UTC
  To: netdev; +Cc: Stephen Hemminger

Hello,

I'm having problems with my sky2 NIC hanging under heavy load.  This
appears to be an old problem, since it also happened for me with 2.6.17.
Upgrading the affected systems to 2.6.18 has not solved it.  It's easy
for me to reproduce, since I'm running application stress tests that
saturate the link.

I've had a look at the recent traffic on linux-kernel, netdev and the
relevant bugzilla (http://bugzilla.kernel.org/show_bug.cgi?id=6839) but
it's not clear to me which patch I should try against a stock 2.6.18
kernel.  If someone could confirm that the "TX pause fix" attached to
the bugzilla is sufficient, that would be great.

The card in question is a:

Sep 22 12:17:27 dezo kernel: sky2 v1.5 addr 0xf3000000 irq 169 Yukon-XL (0xb3) rev 1

It's a SysKonnect SK-9E21 PCI-E Server Adapter, and the driver is using
PCI-MSI interrupts on my system.

The chip on the card is a Marvell 88E8061.

The actual errors leading up to the latest hang are:

Sep 21 21:47:06 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 21:47:06 dezo kernel: sky2 eth1: tx timeout
Sep 21 21:47:06 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
Sep 21 21:47:06 dezo kernel: sky2 hardware hung? flushing
Sep 21 21:59:41 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 21:59:41 dezo kernel: sky2 eth1: tx timeout
Sep 21 21:59:41 dezo kernel: sky2 eth1: transmit ring 179 .. 138 report=220 done=220
Sep 21 21:59:41 dezo kernel: sky2 status report lost?
Sep 21 22:00:41 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 22:00:41 dezo kernel: sky2 eth1: tx timeout
Sep 21 22:00:41 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
Sep 21 22:00:41 dezo kernel: sky2 hardware hung? flushing
Sep 21 22:13:10 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 22:13:10 dezo kernel: sky2 eth1: tx timeout
Sep 21 22:13:10 dezo kernel: sky2 eth1: transmit ring 179 .. 138 report=220 done=220
Sep 21 22:13:10 dezo kernel: sky2 status report lost?
Sep 21 22:14:20 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
Sep 21 22:14:20 dezo kernel: sky2 eth1: tx timeout
Sep 21 22:14:20 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
Sep 21 22:14:20 dezo kernel: sky2 hardware hung? flushing
Sep 21 22:15:09 dezo kernel: sky2 eth1: disabling interface
Sep 21 22:15:09 dezo kernel: sky2 eth1: enabling interface
Sep 21 22:15:12 dezo kernel: sky2 eth1: Link is up at 1000 Mbps, full duplex, flow control both
Sep 21 22:15:20 dezo kernel: eth1: no IPv6 routers present

While the interface does appear to have been reset, it never actually
started working again and the system was hung until I rebooted it this
morning.
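For anyone comparing logs from their own machine, here is a minimal
sketch (not part of the original report) that pulls the numbers out of
the "transmit ring" lines shown above.  The regular expression is
modelled only on the message format as it appears in this log; the
function name parse_ring_lines and the SAMPLE list are hypothetical
names chosen for illustration, and the script does not try to interpret
what the two ring indices mean inside the driver.

```python
import re

# Matches lines such as:
#   "... sky2 eth1: transmit ring 220 .. 179 report=220 done=220"
# The format is taken from the log excerpt in this message only.
RING_RE = re.compile(
    r"sky2 (?P<dev>\S+): transmit ring (?P<first>\d+) \.\. (?P<second>\d+) "
    r"report=(?P<report>\d+) done=(?P<done>\d+)"
)


def parse_ring_lines(lines):
    """Return one dict per 'transmit ring' line; other lines are skipped."""
    entries = []
    for line in lines:
        match = RING_RE.search(line)
        if match is None:
            continue
        fields = match.groupdict()
        entries.append({
            "dev": fields["dev"],
            "first": int(fields["first"]),
            "second": int(fields["second"]),
            "report": int(fields["report"]),
            "done": int(fields["done"]),
        })
    return entries


# Three lines copied from the log above; only the last one should match.
SAMPLE = [
    "Sep 21 21:47:06 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out",
    "Sep 21 21:47:06 dezo kernel: sky2 eth1: tx timeout",
    "Sep 21 21:47:06 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220",
]

entries = parse_ring_lines(SAMPLE)
for entry in entries:
    print(entry)
```

Feeding it a whole syslog file (for example the lines of
/var/log/messages, if that is where your kernel messages end up) gives
one record per timeout, which makes it easier to see whether the ring
indices and the report/done values repeat from one timeout to the next,
as they do in the excerpt above.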

I'm also seeing a lot of these under high load:

Sep 21 21:34:24 dezo kernel: eth1: hw csum failure.
Sep 21 21:34:24 dezo kernel: 
Sep 21 21:34:24 dezo kernel: Call Trace:
Sep 21 21:34:24 dezo kernel:  [dump_stack+16/21] dump_stack+0x10/0x15
Sep 21 21:34:24 dezo kernel:  [__skb_checksum_complete+85/121] __skb_checksum_complete+0x55/0x79
Sep 21 21:34:24 dezo kernel:  [tcp_v4_rcv+218/2405] tcp_v4_rcv+0xda/0x965
Sep 21 21:34:24 dezo kernel:  [ip_local_deliver+433/635] ip_local_deliver+0x1b1/0x27b
Sep 21 21:34:24 dezo kernel:  [ip_rcv+1234/1311] ip_rcv+0x4d2/0x51f
Sep 21 21:34:24 dezo kernel:  [netif_receive_skb+589/621] netif_receive_skb+0x24d/0x26d
Sep 21 21:34:24 dezo kernel:  [__nosave_end+128712870/2129981440] :sky2:sky2_status_intr+0x23b/0x404
Sep 21 21:34:24 dezo kernel:  [__nosave_end+128714646/2129981440] :sky2:sky2_poll+0x100/0x1a1
Sep 21 21:34:24 dezo kernel:  [net_rx_action+132/268] net_rx_action+0x84/0x10c
Sep 21 21:34:24 dezo kernel:  [__do_softirq+107/226] __do_softirq+0x6b/0xe2
Sep 21 21:34:24 dezo kernel:  [call_softirq+28/40] call_softirq+0x1c/0x28
Sep 21 21:34:24 dezo kernel:  [do_softirq+45/129] do_softirq+0x2d/0x81
Sep 21 21:34:24 dezo kernel:  [do_IRQ+112/132] do_IRQ+0x70/0x84
Sep 21 21:34:24 dezo kernel:  [ret_from_intr+0/11] ret_from_intr+0x0/0xb
Sep 21 21:34:24 dezo kernel:  [mwait_idle+58/82] mwait_idle+0x3a/0x52
Sep 21 21:34:24 dezo kernel:  [cpu_idle+105/140] cpu_idle+0x69/0x8c
Sep 21 21:34:24 dezo kernel:  [start_kernel+483/488] start_kernel+0x1e3/0x1e8
Sep 21 21:34:24 dezo kernel:  [x86_64_start_kernel+459/474] x86_64_start_kernel+0x1cb/0x1d

I'm happy to help track this down.

Thanks,

-mato
