* r8169: panic on 2.6.11
From: Stephen Hemminger @ 2005-03-04 21:28 UTC
To: Francois Romieu; +Cc: netdev
I was intentionally over stressing r8169 without NAPI to test out
an alternative version of netif_rx, and discovered the following bug.
Looks like a problem in r8169. I was using pktgen to overwhelm the r8169
from a fast machine.
eth0: Too much work at interrupt!
eth0: Too much work at interrupt!
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:91!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: i810 md5 ipv6 autofs4 sunrpc reiserfs video button battery ad
CPU: 0
EIP: 0060:[<c0258c78>] Not tainted VLI
EFLAGS: 00010296 (2.6.11-netrx)
EIP is at skb_over_panic+0x38/0x50
eax: 0000002e ebx: d7144000 ecx: 00000000 edx: c036cf40
esi: c02cee94 edi: 00000000 ebp: c036cf58 esp: c036cf3c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c036c000 task=c02f8b20)
Stack: c02e38f8 d8856a10 00001fec 00001fec d7144000 d39fe6a0 00000036 c036cf94
d8856a15 00000002 c1378b20 d8856a30 00001fec d395b360 00000100 00109f36
d7144220 d7144000 d39fe6a0 00000001 d881cc00 d7144000 c036cfbc d8856b4d
Call Trace:
[<c010344a>] show_stack+0x7a/0x90
[<c01035c9>] show_registers+0x149/0x1c0
[<c01037cd>] die+0xdd/0x170
[<c0103b65>] do_invalid_op+0xa5/0xb0
[<c01030d7>] error_code+0x2b/0x30
[<d8856a15>] rtl8169_rx_interrupt+0x365/0x380 [r8169]
[<d8856b4d>] rtl8169_interrupt+0xed/0x140 [r8169]
[<c0137cc5>] handle_IRQ_event+0x35/0x70
[<c0137dbe>] __do_IRQ+0xbe/0x150
[<c01048a1>] do_IRQ+0x41/0x70
=======================
[<c010309e>] common_interrupt+0x1a/0x20
[<c02716bd>] ip_route_input_slow+0x6d/0x9f0
[<c0274660>] ip_rcv+0x380/0x480
[<c025eda5>] netif_receive_skb+0x1e5/0x220
[<c025ee5e>] process_backlog+0x7e/0x100
[<c025ef45>] net_rx_action+0x65/0xf0
[<c011fbb2>] __do_softirq+0x42/0xa0
[<c01049b4>] do_softirq+0x44/0x60
=======================
[<c011fcd8>] irq_exit+0x38/0x40
[<c01048a8>] do_IRQ+0x48/0x70
[<c010309e>] common_interrupt+0x1a/0x20
[<c0102e7e>] need_resched+0x1f/0x21
Code: c0 89 5d f8 8b 58 18 89 54 24 0c 85 db 0f 44 de 89 5c 24 10 8b 40 60 89 4
<0>Kernel panic - not syncing: Fatal exception in interrupt
* Re: r8169: panic on 2.6.11
From: Jeff Garzik @ 2005-03-04 21:37 UTC
To: Stephen Hemminger; +Cc: Francois Romieu, netdev

Stephen Hemminger wrote:
> I was intentionally over stressing r8169 without NAPI to test out
> an alternative version of netif_rx, and discovered the following bug.
> Looks like a problem in r8169. I was using pktgen to overwhelm the r8169
> from a fast machine.

Does 2.6.10 fail in a similar manner?

> eth0: Too much work at interrupt!
> eth0: Too much work at interrupt!
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:91!

Any idea what this BUG actually means? Since it's in skb_over_panic(),
a function designed to do nothing but BUG(), that doesn't tell us a
whole lot about the callsite.

Jeff
* Re: r8169: panic on 2.6.11
From: Francois Romieu @ 2005-03-04 21:50 UTC
To: Jeff Garzik; +Cc: Stephen Hemminger, netdev

Jeff Garzik <jgarzik@pobox.com> :
[...]
> Any idea what this BUG actually means? Since it's in skb_over_panic(),
> a function designed to do nothing but BUG(), that doesn't tell us a
> whole lot about the callsite.

It has been reported to me that the driver has not been protected against
oversized frames since at least 2.6.10...

I've just got an instant reboot on the sparc box while testing it :o/

--
Ueimor
[parent not found: <20050304135922.0b0a3911@dxpl.pdx.osdl.net>]
[parent not found: <20050304221826.GA1028@electric-eye.fr.zoreil.com>]
* Re: r8169: panic on 2.6.11
From: Stephen Hemminger @ 2005-03-04 22:53 UTC
To: Francois Romieu; +Cc: netdev

On Fri, 4 Mar 2005 23:18:26 +0100 Francois Romieu <romieu@fr.zoreil.com> wrote:

> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > My pktgen script is below.
> > Sends lots of small packets.
>
> Ok, so I have two issues...
>
> It could help to know if it was the NAPI or the IRQ version of the driver
> which crashed + rough estimate of the expected pkts/s count during such test.

NAPI is not enabled, it is the IRQ version. Hitting

Added instrumentation:.

skb=0xd1e28380 len=8172 head=d1e32000 data=d1e32012 tail=d1e33ffe end=d1e32620

Looks like the board is running back-to-back packets together, MTU is 1500.
No Jumbo frames exist on my little network and the gigabit switch (Netgear) won't
even take them. Probably a chip bug.

Need to add a check for len > mtu before processing?
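The instrumentation line in the message above can be checked by hand. What follows is a minimal userspace sketch, assuming the four values are the sk_buff head/data/tail/end pointers as printed after skb_put() had already advanced tail. `struct skb_model` and the `skb_model_*` helpers are made-up names for illustration, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sk_buff buffer pointers,
 * holding the 32-bit addresses from the instrumentation line. */
struct skb_model {
	uint32_t head;	/* start of the allocated buffer */
	uint32_t data;	/* start of the packet data */
	uint32_t tail;	/* end of the packet data */
	uint32_t end;	/* end of the allocated buffer */
};

/* Bytes of packet data currently described by the skb. */
static inline uint32_t skb_model_len(const struct skb_model *skb)
{
	assert(skb->tail >= skb->data);
	return skb->tail - skb->data;
}

/* Size of the allocated data area. */
static inline uint32_t skb_model_bufsize(const struct skb_model *skb)
{
	assert(skb->end >= skb->head);
	return skb->end - skb->head;
}

/* Bytes by which tail runs past end; 0 means the data still fits. */
static inline uint32_t skb_model_overrun(const struct skb_model *skb)
{
	return skb->tail > skb->end ? skb->tail - skb->end : 0;
}
```

Fed head=0xd1e32000, data=0xd1e32012, tail=0xd1e33ffe and end=0xd1e32620, the helpers give a length of 8172 bytes (matching len=8172), a 1568-byte data area, and an overrun of 6622 bytes. A tail beyond end is the condition skb_over_panic() reports.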
* Re: r8169: panic on 2.6.11
From: Francois Romieu @ 2005-03-04 23:02 UTC
To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger <shemminger@osdl.org> :
[...]
> NAPI is not enabled, it is the IRQ version. Hitting
>
> Added instrumentation:.
>
> skb=0xd1e28380 len=8172 head=d1e32000 data=d1e32012 tail=d1e33ffe end=d1e32620
>
> Looks like the board is running back-to-back packets together, MTU is 1500.
> No Jumbo frames exist on my little network and the gigabit switch (Netgear) won't
> even take them. Probably a chip bug.

/me scratches head: play with the interframe gap ?

> Need to add a check for len > mtu before processing?

Please.

diff -puN drivers/net/r8169.c~r8169-470 drivers/net/r8169.c
--- linux-2.6.11/drivers/net/r8169.c~r8169-470	2005-03-04 22:51:35.038710839 +0100
+++ linux-2.6.11-fr/drivers/net/r8169.c	2005-03-04 23:16:29.422289316 +0100
@@ -2194,6 +2194,7 @@ rtl8169_rx_interrupt(struct net_device *
 			int pkt_size = (status & 0x00001FFF) - 4;
 			void (*pci_action)(struct pci_dev *, dma_addr_t,
 				size_t, int) = pci_dma_sync_single_for_device;
+			static int show_size = 0;

 			rtl8169_rx_csum(skb, desc);

@@ -2210,6 +2211,24 @@ rtl8169_rx_interrupt(struct net_device *
 			pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
 				   tp->rx_buf_sz, PCI_DMA_FROMDEVICE);

+			if (pkt_size >= tp->rx_buf_sz) {
+				show_size = 1;
+				pkt_size = tp->rx_buf_sz;
+			}
+
+			if (show_size) {
+				printk(KERN_INFO "%s: pkt_size=%d\n", dev->name,
+				       pkt_size);
+				printk(KERN_INFO "%s: opts1= %08x\n", dev->name,
+				       desc->opts1);
+				printk(KERN_INFO "%s: opts2= %08x\n", dev->name,
+				       desc->opts2);
+				printk(KERN_INFO "%s: addrl= %08x\n", dev->name,
+				       (u32)desc->addr);
+				printk(KERN_INFO "%s: addrh= %08x\n", dev->name,
+				       (u32)(desc->addr >> 32));
+			}
+
 			skb->dev = dev;
 			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);

_
* Re: r8169: panic on 2.6.11
From: Jon Mason @ 2005-03-04 23:28 UTC
To: Francois Romieu; +Cc: Stephen Hemminger, netdev

On Friday 04 March 2005 05:02 pm, Francois Romieu wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
>
> > NAPI is not enabled, it is the IRQ version. Hitting
> >
> > Added instrumentation:.
> >
> > skb=0xd1e28380 len=8172 head=d1e32000 data=d1e32012 tail=d1e33ffe
> > end=d1e32620
> >
> > Looks like the board is running back-to-back packets together, MTU is
> > 1500. No Jumbo frames exist on my little network and the gigabit switch
> > (Netgear) won't even take them. Probably a chip bug.
>
> /me scratches head: play with the interframe gap ?
>
> > Need to add a check for len > mtu before processing?
>
> Please.
>
> diff -puN drivers/net/r8169.c~r8169-470 drivers/net/r8169.c
> --- linux-2.6.11/drivers/net/r8169.c~r8169-470	2005-03-04 22:51:35.038710839 +0100
> +++ linux-2.6.11-fr/drivers/net/r8169.c	2005-03-04 23:16:29.422289316 +0100
> @@ -2194,6 +2194,7 @@ rtl8169_rx_interrupt(struct net_device *
>  			int pkt_size = (status & 0x00001FFF) - 4;
>  			void (*pci_action)(struct pci_dev *, dma_addr_t,
>  				size_t, int) = pci_dma_sync_single_for_device;
> +			static int show_size = 0;
>
>  			rtl8169_rx_csum(skb, desc);
>
> @@ -2210,6 +2211,24 @@ rtl8169_rx_interrupt(struct net_device *
>  			pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
>  				   tp->rx_buf_sz, PCI_DMA_FROMDEVICE);
>
> +			if (pkt_size >= tp->rx_buf_sz) {
> +				show_size = 1;
> +				pkt_size = tp->rx_buf_sz;
> +			}

Shouldn't the above be dev->mtu (instead of tp->rx_buf_sz), otherwise there
won't be enough room for ethernet header, CRC, etc.

> +
> +			if (show_size) {
> +				printk(KERN_INFO "%s: pkt_size=%d\n", dev->name,
> +				       pkt_size);
> +				printk(KERN_INFO "%s: opts1= %08x\n", dev->name,
> +				       desc->opts1);
> +				printk(KERN_INFO "%s: opts2= %08x\n", dev->name,
> +				       desc->opts2);
> +				printk(KERN_INFO "%s: addrl= %08x\n", dev->name,
> +				       (u32)desc->addr);
> +				printk(KERN_INFO "%s: addrh= %08x\n", dev->name,
> +				       (u32)(desc->addr >> 32));
> +			}
> +
>  			skb->dev = dev;
>  			skb_put(skb, pkt_size);
>  			skb->protocol = eth_type_trans(skb, dev);
>
> _
* Re: r8169: panic on 2.6.11
From: Stephen Hemminger @ 2005-03-04 23:49 UTC
To: Francois Romieu; +Cc: Jon Mason, netdev

I am really killing this poor beast, the target is a 1.2 Ghz Celeron
and it is trying to handle 1,000,000 packets/sec (from tg3 Opteron)

Added this and took out "too much work at interrupt message"

--- drivers/net/r8169.c.orig	2005-03-04 13:19:08.000000000 -0800
+++ drivers/net/r8169.c	2005-03-04 15:41:30.000000000 -0800
@@ -2210,6 +2210,16 @@
 			pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
 				   tp->rx_buf_sz, PCI_DMA_FROMDEVICE);

+			if (pkt_size > tp->rx_buf_sz) {
+				printk(KERN_WARNING "%s: status=%x opts=%x opts2 =%x addr=%x:%x\n",
+				       dev->name, status, desc->opts1,
+				       desc->opts2,
+				       (u32) (desc->addr >> 32),
+				       (u32) desc->addr);
+
+				goto ditch;
+			}
+
 			skb->dev = dev;
 			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);
@@ -2222,6 +2232,7 @@
 			tp->stats.rx_packets++;
 		}

+	ditch:
 		cur_rx++;
 		rx_left--;
 	}

And got this (before it died):

eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15194812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15194012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9efb812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9efb012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:10215812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:10215012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de2812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de2012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:1559a812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:10782012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b7f3812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b7f3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15125812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:9ef9812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9ef9012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15707812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15707012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:588c812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:588c012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15120812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15120012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:12b2c812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:12b2c012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:6561812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:6561012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13bea812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13bea012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:112b4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:112b4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:157d7812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:157d7012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:39a5812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:39a5012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:58c3812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:58c3012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:1582f812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1582f012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116f6812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116f6012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:109f0812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:109f0012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13b82812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13b82012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:ce31812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:ce31012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:29d3812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:29d3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b3b4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b3b4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5a73812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5a73012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:d288812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:d288012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:107e3812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:107e3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15198812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:15198012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:16654812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16654012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16741812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16741012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b911812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b911012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:12e5d812
eth0: status=24829808 opts=24829808 opts2=0 addr=0:a6ee012
eth0: status=4829808 opts=4829808 opts2=0 addr=0:127ff812
eth0: status=4829808 opts=4829808 opts2=0 addr=0:127ff012
eth0: status=14829808 opts=14829808 opts2=0 addr=0:16b84812
eth0: status=20802842 opts=20802842 opts2=0 addr=0:16b84012
eth0: status=802842 opts=802842 opts2=0 addr=0:125d1812
eth0: status=802842 opts=802842 opts2=0 addr=0:125d1012
eth0: status=802842 opts=802842 opts2=0 addr=0:11104812
eth0: status=802842 opts=802842 opts2=0 addr=0:11104012
eth0: status=802842 opts=802842 opts2=0 addr=0:636a812
eth0: status=10802842 opts=10802842 opts2=0 addr=0:636a012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:138d5812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:138d5012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11cb0812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11cb0012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:88a4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:88a4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1082a812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1082a012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:168af812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:168af012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:1660f812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:112c3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:c3e0812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:c3e0012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1f7e812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1f7e012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:2f5c812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:2f5c012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:132f6812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:132f6012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:c74a812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:c74a012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:11140812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11140012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:127be812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:127be012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16475812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16475012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:114d7812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:114d7012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:87a3812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:87a3012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:11abb812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:11c68012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8968812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8968012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5588812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5588012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14f77812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14f77012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8951812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8951012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5a6f812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:5a6f012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:12acc012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8e90812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8e90012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9d53812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9d53012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1390a812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1390a012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8952812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8952012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11a76812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:11a76012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:11167812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11167012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11548812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11548012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16428812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16428012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8762812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8762012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14139812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14139012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:1157c812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:8dcd012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de8812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de8012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8f08812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8f08012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116a4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116a4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11b98812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11b98012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8cf4812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:8cf4012
Unable to handle kernel paging request at virtual address 4228b92f
printing eip:
c011772d
*pde = 00000000
Oops: 0002 [#1]
PREEMPT
Modules linked in: r8169 i810 md5 ipv6 autofs4 sunrpc reiserfs video button battery ac 813d
CPU: 0
EIP: 0060:[<c011772d>] Not tainted VLI
EFLAGS: 00010003 (2.6.11-netrx)
EIP is at scheduler_tick+0x3d/0x2c0
eax: 4228b927 ebx: c036c000 ecx: 000f4240 edx: 000f44af
esi: d785ba40 edi: c036cf18 ebp: c036ceb0 esp: c036ce9c
ds: 007b es: 007b ss: 0068
Process '.(B'.(B@ (pid: 1109979719, threadinfo=c036c000 task=d785ba40)
Stack: 00000046 00000000 c036cf18 00000000 c036cf18 c036cec4 c010712a c02fd300
00000000 c036cf18 c036cee0 c0137cc5 00000000 00000000 00000000 c033ea40
c036c000 c036cf00 c0137dbe c036c000 c02fd300 c036cf18 000000e2 00000000
Call Trace:
[<c010344a>] show_stack+0x7a/0x90
[<c01035c9>] show_registers+0x149/0x1c0
[<c01037cd>] die+0xdd/0x170
[<c0115643>] do_page_fault+0x453/0x675
[<c01030d7>] error_code+0x2b/0x30
[<c010712a>] timer_interrupt+0x4a/0x120
[<c0137cc5>] handle_IRQ_event+0x35/0x70
[<c0137dbe>] __do_IRQ+0xbe/0x150
[<c01048bf>] do_IRQ+0x5f/0x70
[<c010309e>] common_interrupt+0x1a/0x20
[<d8856bad>] rtl8169_interrupt+0xdd/0x130 [r8169]
[<c0137cc5>] handle_IRQ_event+0x35/0x70
[<c0137dbe>] __do_IRQ+0xbe/0x150
[<c01048a1>] do_IRQ+0x41/0x70
=======================
[<c010309e>] common_interrupt+0x1a/0x20
[<c02719ec>] ip_route_input_slow+0x39c/0x9f0
[<c0274660>] ip_rcv+0x380/0x480
[<c025eda5>] netif_receive_skb+0x1e5/0x220
[<c025ee5e>] process_backlog+0x7e/0x100
[<c025ef45>] net_rx_action+0x65/0xf0
[<c011fbb2>] __do_softirq+0x42/0xa0
[<c01049b4>] do_softirq+0x44/0x60
=======================
[<c011fcd8>] irq_exit+0x38/0x40
[<c01048a8>] do_IRQ+0x48/0x70
[<c010309e>] common_interrupt+0x1a/0x20
[<c02b9507>] schedule_timeout+0x57/0xb0
[<c017db4c>] ep_poll+0x10c/0x190
[<c017cc72>] sys_epoll_wait+0x92/0xa0
[<c0102ed9>] sysenter_past_esp+0x52/0x75
Code: 7d fc 21 e3 8b 33 e8 03 71 ff ff 39 35 80 ef 36 c0 a3 74 ef 36 c0 89 15 78 ef 36 c0
<0>Kernel panic - not syncing: Fatal exception in interrupt
* Re: r8169: panic on 2.6.11 2005-03-04 23:49 ` Stephen Hemminger @ 2005-03-05 0:37 ` Francois Romieu 2005-03-05 5:03 ` Jon Mason 2005-03-07 18:59 ` Stephen Hemminger 0 siblings, 2 replies; 13+ messages in thread From: Francois Romieu @ 2005-03-05 0:37 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Jon Mason, netdev Stephen Hemminger <shemminger@osdl.org> : [...] > Added this and took out "too much work at interrupt message" Ok (leaks but see below). > And got this (before it died): > > eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012 ^^^^^^ 3ff0 would be a ~16k packet. The high weight byte is missing: the descriptor would be strictly somewhere between the first descriptor and the last descriptor for the packet. opts=80xxxx -> FIFO overflow (*bulb flashes*) The code does not look pretty there. Can you add something like the patch below on top of your current patch (untested but you get the idea): diff -puN drivers/net/r8169.c~r8169-480 drivers/net/r8169.c --- linux-2.6.11/drivers/net/r8169.c~r8169-480 2005-03-05 00:16:58.575516900 +0100 +++ linux-2.6.11-fr/drivers/net/r8169.c 2005-03-05 01:32:20.122261946 +0100 @@ -240,6 +241,7 @@ enum RTL8169_register_content { RxOK = 0x01, /* RxStatusDesc */ + RxOVF = 0x00800000, RxRES = 0x00200000, RxCRC = 0x00080000, RxRUNT = 0x00100000, @@ -2181,13 +2183,14 @@ rtl8169_rx_interrupt(struct net_device * if (status & DescOwn) break; - if (status & RxRES) { + if (status & (RxRES | RxOVF)) { printk(KERN_INFO "%s: Rx ERROR!!!\n", dev->name); tp->stats.rx_errors++; if (status & (RxRWT | RxRUNT)) tp->stats.rx_length_errors++; if (status & RxCRC) tp->stats.rx_crc_errors++; + rtl8169_return_to_asic(tp->RxDescArray + entry, tp->rx_buf_sz); } else { struct RxDesc *desc = tp->RxDescArray + entry; struct sk_buff *skb = tp->Rx_skbuff[entry]; _ /me goes to bed. Out of curiosity it would be interesting to see how non-PREEMPT and NAPI behaves (the rings are surely too small). 
-- Ueimor ^ permalink raw reply [flat|nested] 13+ messages in thread
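The status words discussed in this message can be decoded mechanically. The sketch below is a userspace illustration, not driver code: the size mask comes from the `pkt_size = (status & 0x00001FFF) - 4` line in the earlier patch context, and the RxRES/RxOVF values from the patch just above. The `FIRST_FRAG`/`LAST_FRAG` bit positions are an assumption (they are not spelled out anywhere in the thread), and `struct rx_status` and `decode_rx_status()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the thread's patches. */
#define RX_SIZE_MASK	0x00001FFFu	/* pkt_size = (status & mask) - 4 */
#define RX_OVF		0x00800000u	/* RxOVF */
#define RX_RES		0x00200000u	/* RxRES */

/* ASSUMPTION: segment marker positions, not stated in the thread. */
#define FIRST_FRAG	0x20000000u
#define LAST_FRAG	0x10000000u

/* Hypothetical decoded view of one Rx descriptor status word. */
struct rx_status {
	int pkt_size;		/* size as the driver computes it, CRC removed */
	bool overflow;		/* RxOVF set */
	bool error;		/* would take the patched error branch */
	bool first_frag;
	bool last_frag;
};

static inline struct rx_status decode_rx_status(uint32_t status)
{
	struct rx_status st;

	st.pkt_size = (int)(status & RX_SIZE_MASK) - 4;
	st.overflow = (status & RX_OVF) != 0;
	st.error = (status & (RX_RES | RX_OVF)) != 0;
	st.first_frag = (status & FIRST_FRAG) != 0;
	st.last_frag = (status & LAST_FRAG) != 0;
	return st;
}
```

With the 13-bit mask, 0x803ff0 decodes to a pkt_size of 8172, the same value as the len=8172 seen in the earlier instrumentation, with RxOVF set and, under the assumed bit positions, neither segment marker present. The 0x3481c040 value reported later in the thread decodes to 60 bytes with both markers and RxOVF set.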
* Re: r8169: panic on 2.6.11
From: Jon Mason @ 2005-03-05 5:03 UTC
To: Francois Romieu; +Cc: Stephen Hemminger, netdev

I tested the patch below on amd64, and have found a problem. My adapter
always has the FOVF bit set, so the adapter never pings.

After looking at the opts1 register output of my adapter and Stephen's, I
noticed something weird. For every packet, I am getting opts1 = 0x3481c040.
Now, compare this to opts=803ff0 from Stephen's last test. It appears that
the upper 8 bits have been lost. These are the FirstSegment and LastSegment
indicators (which should always be True for < 8191). This looks a lot like
some of the funky behavior that I was seeing with my > 8191 jumbo frames patch.

What size packets are being sent across the wire?

On Friday 04 March 2005 06:37 pm, Francois Romieu wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
>
> > Added this and took out "too much work at interrupt message"
>
> Ok (leaks but see below).
>
> > And got this (before it died):
> >
> > eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
>
>                             ^^^^^^
> 3ff0 would be a ~16k packet. The high weight byte is
> missing: the descriptor would be strictly somewhere between
> the first descriptor and the last descriptor for the packet.
>
> opts=80xxxx -> FIFO overflow (*bulb flashes*)
>
> The code does not look pretty there.
>
> Can you add something like the patch below on top of your
> current patch (untested but you get the idea):
>
> diff -puN drivers/net/r8169.c~r8169-480 drivers/net/r8169.c
> --- linux-2.6.11/drivers/net/r8169.c~r8169-480	2005-03-05 00:16:58.575516900 +0100
> +++ linux-2.6.11-fr/drivers/net/r8169.c	2005-03-05 01:32:20.122261946 +0100
> @@ -240,6 +241,7 @@ enum RTL8169_register_content {
>  	RxOK = 0x01,
>
>  	/* RxStatusDesc */
> +	RxOVF = 0x00800000,
>  	RxRES = 0x00200000,
>  	RxCRC = 0x00080000,
>  	RxRUNT = 0x00100000,
> @@ -2181,13 +2183,14 @@ rtl8169_rx_interrupt(struct net_device *
>
>  		if (status & DescOwn)
>  			break;
> -		if (status & RxRES) {
> +		if (status & (RxRES | RxOVF)) {
>  			printk(KERN_INFO "%s: Rx ERROR!!!\n", dev->name);
>  			tp->stats.rx_errors++;
>  			if (status & (RxRWT | RxRUNT))
>  				tp->stats.rx_length_errors++;
>  			if (status & RxCRC)
>  				tp->stats.rx_crc_errors++;
> +			rtl8169_return_to_asic(tp->RxDescArray + entry, tp->rx_buf_sz);
>  		} else {
>  			struct RxDesc *desc = tp->RxDescArray + entry;
>  			struct sk_buff *skb = tp->Rx_skbuff[entry];
>
> _
>
> /me goes to bed.
>
> Out of curiosity it would be interesting to see how non-PREEMPT and NAPI
> behaves (the rings are surely too small).
>
> --
> Ueimor
* Re: r8169: panic on 2.6.11
From: Stephen Hemminger @ 2005-03-05 6:34 UTC
To: Jon Mason; +Cc: netdev

The packet burst is 10 million 64 byte packets. Could be the sender or switch
causing merging, but I suspect the r8169 on the slow receiver.

Details are:
# cat /proc/net/pktgen/eth0
Params: count 10000000 min_pkt_size: 60 max_pkt_size: 60
     frags: 0 delay: 0 clone_skb: 1000000 ifname: eth0
     flows: 0 flowlen: 0
     dst_min: 172.2.251.143 dst_max:
     src_min: src_max:
     src_mac: 00:0D:60:53:08:18 dst_mac: 00:09:5B:BD:B1:F9
     udp_src_min: 9 udp_src_max: 9 udp_dst_min: 9 udp_dst_max: 9
     src_mac_count: 0 dst_mac_count: 0
     Flags:
Current:
     pkts-sofar: 10000000 errors: 6374336
     started: 1109979805822722us stopped: 1109979815051630us idle: 166364us
     seq_num: 10000011 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
     cur_saddr: 0x670114ac cur_daddr: 0x8ffb02ac
     cur_udp_dst: 9 cur_udp_src: 9
     flows: 0
Result: OK: 9228908(c9062544+d166364) usec, 10000000 (60byte,0frags)
  1083551pps 520Mb/sec (520104480bps) errors: 6374336
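The figures on the Result line are mutually consistent, which can be verified with a few lines of integer arithmetic. This is a sketch that reproduces the reported numbers under the assumption of truncating division and a 60-byte payload per packet; it is not necessarily how pktgen computes them internally, and `rate_pps()`/`rate_bps()` are hypothetical helper names.

```c
#include <stdint.h>

/* Packets per second from a packet count and an elapsed time in
 * microseconds (integer division, truncating). */
static inline uint64_t rate_pps(uint64_t packets, uint64_t elapsed_us)
{
	return packets * 1000000ULL / elapsed_us;
}

/* Bits per second for a packet rate and a per-packet size in bytes. */
static inline uint64_t rate_bps(uint64_t pps, uint64_t pkt_bytes)
{
	return pps * pkt_bytes * 8;
}
```

The stopped and started timestamps differ by 9228908 us, the elapsed time shown on the Result line. `rate_pps(10000000, 9228908)` gives 1083551, and `rate_bps(1083551, 60)` gives 520104480, matching the reported pps and bps values.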
* Re: r8169: panic on 2.6.11
From: Stephen Hemminger @ 2005-03-07 18:59 UTC
To: Francois Romieu; +Cc: Jon Mason, netdev

On Sat, 5 Mar 2005 01:37:35 +0100 Francois Romieu <romieu@fr.zoreil.com> wrote:

> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > Added this and took out "too much work at interrupt message"
>
> Ok (leaks but see below).
>
> > And got this (before it died):
> >
> > eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
>                             ^^^^^^
> 3ff0 would be a ~16k packet. The high weight byte is
> missing: the descriptor would be strictly somewhere between
> the first descriptor and the last descriptor for the packet.
>
> opts=80xxxx -> FIFO overflow (*bulb flashes*)
>
> The code does not look pretty there.

Doesn't come up because it gets that bit set in the normal case.

eth0: Rx ERROR status=328440ff!!!
* Re: r8169: panic on 2.6.11
From: Francois Romieu @ 2005-03-04 23:58 UTC
To: Jon Mason; +Cc: Stephen Hemminger, netdev

Jon Mason <jdmason@us.ibm.com> :
[...]
> > @@ -2210,6 +2211,24 @@ rtl8169_rx_interrupt(struct net_device *
> >  			pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
> >  				   tp->rx_buf_sz, PCI_DMA_FROMDEVICE);
> >
> > +			if (pkt_size >= tp->rx_buf_sz) {
> > +				show_size = 1;
> > +				pkt_size = tp->rx_buf_sz;
> > +			}
>
> Shouldn't the above be dev->mtu (instead of tp->rx_buf_sz), otherwise there
> won't be enough room for ethernet header, CRC, etc.

There is no room left in the buffer. I want it to be translated into the
biggest skb_put() possible.

tp->rx_buf_sz already accounts for the headers (see rtl8169_set_rxbufsize), no ?

At worst it is possible we are a bit pessimistic imho.

--
Ueimor
* Re: r8169: panic on 2.6.11
From: Francois Romieu @ 2005-03-04 21:39 UTC
To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger <shemminger@osdl.org> :
> I was intentionally over stressing r8169 without NAPI to test out
> an alternative version of netif_rx, and discovered the following bug.
> Looks like a problem in r8169. I was using pktgen to overwhelm the r8169
> from a fast machine.

Thanks. I'm on it.

--
Ueimor
Thread overview: 13+ messages
2005-03-04 21:28 r8169: panic on 2.6.11 Stephen Hemminger
2005-03-04 21:37 ` Jeff Garzik
2005-03-04 21:50 ` Francois Romieu
[not found] ` <20050304135922.0b0a3911@dxpl.pdx.osdl.net>
[not found] ` <20050304221826.GA1028@electric-eye.fr.zoreil.com>
2005-03-04 22:53 ` Stephen Hemminger
2005-03-04 23:02 ` Francois Romieu
2005-03-04 23:28 ` Jon Mason
2005-03-04 23:49 ` Stephen Hemminger
2005-03-05 0:37 ` Francois Romieu
2005-03-05 5:03 ` Jon Mason
2005-03-05 6:34 ` Stephen Hemminger
2005-03-07 18:59 ` Stephen Hemminger
2005-03-04 23:58 ` Francois Romieu
2005-03-04 21:39 ` Francois Romieu