netdev.vger.kernel.org archive mirror
* r8169: panic on 2.6.11
@ 2005-03-04 21:28 Stephen Hemminger
  2005-03-04 21:37 ` Jeff Garzik
  2005-03-04 21:39 ` Francois Romieu
  0 siblings, 2 replies; 13+ messages in thread
From: Stephen Hemminger @ 2005-03-04 21:28 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

I was intentionally overstressing r8169 without NAPI to test out
an alternative version of netif_rx, and discovered the following bug.
Looks like a problem in r8169. I was using pktgen to overwhelm the r8169
from a fast machine.

eth0: Too much work at interrupt!
eth0: Too much work at interrupt!
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:91!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: i810 md5 ipv6 autofs4 sunrpc reiserfs video button battery ad
CPU:    0
EIP:    0060:[<c0258c78>]    Not tainted VLI
EFLAGS: 00010296   (2.6.11-netrx)
EIP is at skb_over_panic+0x38/0x50
eax: 0000002e   ebx: d7144000   ecx: 00000000   edx: c036cf40
esi: c02cee94   edi: 00000000   ebp: c036cf58   esp: c036cf3c
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c036c000 task=c02f8b20)
Stack: c02e38f8 d8856a10 00001fec 00001fec d7144000 d39fe6a0 00000036 c036cf94
       d8856a15 00000002 c1378b20 d8856a30 00001fec d395b360 00000100 00109f36
       d7144220 d7144000 d39fe6a0 00000001 d881cc00 d7144000 c036cfbc d8856b4d
Call Trace:
 [<c010344a>] show_stack+0x7a/0x90
 [<c01035c9>] show_registers+0x149/0x1c0
 [<c01037cd>] die+0xdd/0x170
 [<c0103b65>] do_invalid_op+0xa5/0xb0
 [<c01030d7>] error_code+0x2b/0x30
 [<d8856a15>] rtl8169_rx_interrupt+0x365/0x380 [r8169]
 [<d8856b4d>] rtl8169_interrupt+0xed/0x140 [r8169]
 [<c0137cc5>] handle_IRQ_event+0x35/0x70
 [<c0137dbe>] __do_IRQ+0xbe/0x150
 [<c01048a1>] do_IRQ+0x41/0x70
 =======================
 [<c010309e>] common_interrupt+0x1a/0x20
 [<c02716bd>] ip_route_input_slow+0x6d/0x9f0
 [<c0274660>] ip_rcv+0x380/0x480
 [<c025eda5>] netif_receive_skb+0x1e5/0x220
 [<c025ee5e>] process_backlog+0x7e/0x100
 [<c025ef45>] net_rx_action+0x65/0xf0
 [<c011fbb2>] __do_softirq+0x42/0xa0
 [<c01049b4>] do_softirq+0x44/0x60
 =======================
 [<c011fcd8>] irq_exit+0x38/0x40
 [<c01048a8>] do_IRQ+0x48/0x70
 [<c010309e>] common_interrupt+0x1a/0x20
 [<c0102e7e>] need_resched+0x1f/0x21
Code: c0 89 5d f8 8b 58 18 89 54 24 0c 85 db 0f 44 de 89 5c 24 10 8b 40 60 89 4
 <0>Kernel panic - not syncing: Fatal exception in interrupt


* Re: r8169: panic on 2.6.11
  2005-03-04 21:28 r8169: panic on 2.6.11 Stephen Hemminger
@ 2005-03-04 21:37 ` Jeff Garzik
  2005-03-04 21:50   ` Francois Romieu
  2005-03-04 21:39 ` Francois Romieu
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2005-03-04 21:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Francois Romieu, netdev

Stephen Hemminger wrote:
> I was intentionally over stressing r8169 without NAPI to test out
> an alternative version of netif_rx, and discovered the following bug.
> Looks like a problem in r8169. I was using pktgen to overwhelm the r8169
> from a fast machine.

Does 2.6.10 fail in a similar manner?


> eth0: Too much work at interrupt!
> eth0: Too much work at interrupt!
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:91!

Any idea what this BUG actually means?  Since it's in skb_over_panic(), 
a function designed to do nothing but BUG(), that doesn't tell us a 
whole lot about the callsite.

	Jeff


* Re: r8169: panic on 2.6.11
  2005-03-04 21:28 r8169: panic on 2.6.11 Stephen Hemminger
  2005-03-04 21:37 ` Jeff Garzik
@ 2005-03-04 21:39 ` Francois Romieu
  1 sibling, 0 replies; 13+ messages in thread
From: Francois Romieu @ 2005-03-04 21:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger <shemminger@osdl.org> :
> I was intentionally over stressing r8169 without NAPI to test out
> an alternative version of netif_rx, and discovered the following bug.
> Looks like a problem in r8169. I was using pktgen to overwhelm the r8169
> from a fast machine.

Thanks. I'm on it.

--
Ueimor


* Re: r8169: panic on 2.6.11
  2005-03-04 21:37 ` Jeff Garzik
@ 2005-03-04 21:50   ` Francois Romieu
       [not found]     ` <20050304135922.0b0a3911@dxpl.pdx.osdl.net>
  0 siblings, 1 reply; 13+ messages in thread
From: Francois Romieu @ 2005-03-04 21:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Stephen Hemminger, netdev

Jeff Garzik <jgarzik@pobox.com> :
[...]
> Any idea what this BUG actually means?  Since it's in skb_over_panic(), 
> a function designed to do nothing but BUG(), that doesn't tell us a 
> whole lot about the callsite.

It has been reported to me that the driver has not been protected against
oversized frames since 2.6.10 at least... I've just got an instant reboot
on the sparc box while testing it :o/

--
Ueimor


* Re: r8169: panic on 2.6.11
       [not found]       ` <20050304221826.GA1028@electric-eye.fr.zoreil.com>
@ 2005-03-04 22:53         ` Stephen Hemminger
  2005-03-04 23:02           ` Francois Romieu
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2005-03-04 22:53 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

On Fri, 4 Mar 2005 23:18:26 +0100
Francois Romieu <romieu@fr.zoreil.com> wrote:

> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > My pktgen script is below.
> > Sends lots of small packets.
> 
> Ok, so I have two issues...
> 
> It could help to know if it was the NAPI or the IRQ version of the driver
> which crashed + rough estimate of the expected pkts/s count during such test.

NAPI is not enabled, it is the IRQ version. Hitting 

Added instrumentation:

skb=0xd1e28380 len=8172 head=d1e32000 data=d1e32012 tail=d1e33ffe end=d1e32620

Looks like the board is running back-to-back packets together, MTU is 1500.
No Jumbo frames exist on my little network and the gigabit switch (Netgear) won't 
even take them.  Probably a chip bug.

Need to add a check for len > mtu before processing?
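
The instrumented values can be checked by hand. The sketch below is a userspace model, not kernel code: `struct skb_model`, `buffer_size()` and `put_would_overrun()` are hypothetical names, and the check only mirrors the idea behind skb_put() (panic once tail would pass end). With the numbers from the line above, the buffer spans end - head = 0x620 = 1568 bytes, while data + 8172 lands exactly on the reported tail, far past end.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the four sk_buff pointers (hypothetical struct,
 * 32-bit addresses as printed in the report). */
struct skb_model {
	uint32_t head;	/* start of the allocated buffer */
	uint32_t data;	/* start of the packet data */
	uint32_t tail;	/* end of the packet data */
	uint32_t end;	/* end of the allocated buffer */
};

/* Bytes available between head and end. */
static uint32_t buffer_size(const struct skb_model *skb)
{
	return skb->end - skb->head;
}

/* Returns 1 if appending len bytes at tail would run past end,
 * which is the condition that leads to skb_over_panic(). */
static int put_would_overrun(const struct skb_model *skb, uint32_t len)
{
	assert(skb->head <= skb->data && skb->data <= skb->tail);
	return skb->tail + len > skb->end;
}
```

Fed with head=0xd1e32000, data=tail=0xd1e32012 and end=0xd1e32620, a put of 8172 bytes overruns while a put of 60 bytes does not.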


* Re: r8169: panic on 2.6.11
  2005-03-04 22:53         ` Stephen Hemminger
@ 2005-03-04 23:02           ` Francois Romieu
  2005-03-04 23:28             ` Jon Mason
  0 siblings, 1 reply; 13+ messages in thread
From: Francois Romieu @ 2005-03-04 23:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Stephen Hemminger <shemminger@osdl.org> :
[...]
> NAPI is not enabled, it is the IRQ version. Hitting 
> 
> Added instrumentation:.
> 
> skb=0xd1e28380 len=8172 head=d1e32000 data=d1e32012 tail=d1e33ffe end=d1e32620
> 
> Looks like the board is running back-to-back packets together, MTU is 1500.
> No Jumbo frames exist on my little network and the gigabit switch (Netgear) won't 
> even take them.  Probably a chip bug.

/me scratches head: play with the interframe gap ?

> Need to add a check for len > mtu before processing?

Please.

diff -puN drivers/net/r8169.c~r8169-470 drivers/net/r8169.c
--- linux-2.6.11/drivers/net/r8169.c~r8169-470	2005-03-04 22:51:35.038710839 +0100
+++ linux-2.6.11-fr/drivers/net/r8169.c	2005-03-04 23:16:29.422289316 +0100
@@ -2194,6 +2194,7 @@ rtl8169_rx_interrupt(struct net_device *
 			int pkt_size = (status & 0x00001FFF) - 4;
 			void (*pci_action)(struct pci_dev *, dma_addr_t,
 				size_t, int) = pci_dma_sync_single_for_device;
+			static int show_size = 0;
 
 			rtl8169_rx_csum(skb, desc);
 			
@@ -2210,6 +2211,24 @@ rtl8169_rx_interrupt(struct net_device *
 			pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
 				   tp->rx_buf_sz, PCI_DMA_FROMDEVICE);
 
+			if (pkt_size >= tp->rx_buf_sz) {
+				show_size = 1;
+				pkt_size = tp->rx_buf_sz;
+			}
+
+			if (show_size) {
+				printk(KERN_INFO "%s: pkt_size=%d\n", dev->name,
+				       pkt_size);
+				printk(KERN_INFO "%s: opts1= %08x\n", dev->name,
+				       desc->opts1);
+				printk(KERN_INFO "%s: opts2= %08x\n", dev->name,
+				       desc->opts2);
+				printk(KERN_INFO "%s: addrl= %08x\n", dev->name,
+				       (u32)desc->addr);
+				printk(KERN_INFO "%s: addrh= %08x\n", dev->name,
+				       (u32)(desc->addr >> 32));
+			}
+
 			skb->dev = dev;
 			skb_put(skb, pkt_size);
 			skb->protocol = eth_type_trans(skb, dev);

_


* Re: r8169: panic on 2.6.11
  2005-03-04 23:02           ` Francois Romieu
@ 2005-03-04 23:28             ` Jon Mason
  2005-03-04 23:49               ` Stephen Hemminger
  2005-03-04 23:58               ` Francois Romieu
  0 siblings, 2 replies; 13+ messages in thread
From: Jon Mason @ 2005-03-04 23:28 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Stephen Hemminger, netdev

On Friday 04 March 2005 05:02 pm, Francois Romieu wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
>
> > NAPI is not enabled, it is the IRQ version. Hitting
> >
> > Added instrumentation:.
> >
> > skb=0xd1e28380 len=8172 head=d1e32000 data=d1e32012 tail=d1e33ffe
> > end=d1e32620
> >
> > Looks like the board is running back-to-back packets together, MTU is
> > 1500. No Jumbo frames exist on my little network and the gigabit switch
> > (Netgear) won't even take them.  Probably a chip bug.
>
> /me scratches head: play with the interframe gap ?
>
> > Need to add a check for len > mtu before processing?
>
> Please.
>
> diff -puN drivers/net/r8169.c~r8169-470 drivers/net/r8169.c
> --- linux-2.6.11/drivers/net/r8169.c~r8169-470 2005-03-04 22:51:35.038710839 +0100
> +++ linux-2.6.11-fr/drivers/net/r8169.c 2005-03-04 23:16:29.422289316 +0100
> @@ -2194,6 +2194,7 @@ rtl8169_rx_interrupt(struct net_device *
>     int pkt_size = (status & 0x00001FFF) - 4;
>     void (*pci_action)(struct pci_dev *, dma_addr_t,
>      size_t, int) = pci_dma_sync_single_for_device;
> +   static int show_size = 0;
>
>     rtl8169_rx_csum(skb, desc);
>
> @@ -2210,6 +2211,24 @@ rtl8169_rx_interrupt(struct net_device *
>     pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
>         tp->rx_buf_sz, PCI_DMA_FROMDEVICE);
>
> +   if (pkt_size >= tp->rx_buf_sz) {
> +    show_size = 1;
> +    pkt_size = tp->rx_buf_sz;
> +   }

Shouldn't the above be dev->mtu (instead of tp->rx_buf_sz)? Otherwise there
won't be enough room for the Ethernet header, CRC, etc.

> +
> +   if (show_size) {
> +    printk(KERN_INFO "%s: pkt_size=%d\n", dev->name,
> +           pkt_size);
> +    printk(KERN_INFO "%s: opts1= %08x\n", dev->name,
> +           desc->opts1);
> +    printk(KERN_INFO "%s: opts2= %08x\n", dev->name,
> +           desc->opts2);
> +    printk(KERN_INFO "%s: addrl= %08x\n", dev->name,
> +           (u32)desc->addr);
> +    printk(KERN_INFO "%s: addrh= %08x\n", dev->name,
> +           (u32)(desc->addr >> 32));
> +   }
> +
>     skb->dev = dev;
>     skb_put(skb, pkt_size);
>     skb->protocol = eth_type_trans(skb, dev);
>
> _


* Re: r8169: panic on 2.6.11
  2005-03-04 23:28             ` Jon Mason
@ 2005-03-04 23:49               ` Stephen Hemminger
  2005-03-05  0:37                 ` Francois Romieu
  2005-03-04 23:58               ` Francois Romieu
  1 sibling, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2005-03-04 23:49 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Jon Mason, netdev

I am really killing this poor beast: the target is a 1.2 GHz Celeron and it
is trying to handle 1,000,000 packets/sec (from a tg3 on an Opteron).

Added this and took out the "Too much work at interrupt!" message:

--- drivers/net/r8169.c.orig    2005-03-04 13:19:08.000000000 -0800
+++ drivers/net/r8169.c 2005-03-04 15:41:30.000000000 -0800
@@ -2210,6 +2210,16 @@
                        pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
                                   tp->rx_buf_sz, PCI_DMA_FROMDEVICE);

+                       if (pkt_size > tp->rx_buf_sz) {
+                               printk(KERN_WARNING "%s: status=%x opts=%x opts2=%x addr=%x:%x\n",
+                                      dev->name, status, desc->opts1,
+                                      desc->opts2,
+                                      (u32) (desc->addr >> 32),
+                                      (u32) desc->addr);
+
+                               goto ditch;
+                       }
+
                        skb->dev = dev;
                        skb_put(skb, pkt_size);
                        skb->protocol = eth_type_trans(skb, dev);
@@ -2222,6 +2232,7 @@
                        tp->stats.rx_packets++;
                }

+       ditch:
                cur_rx++;
                rx_left--;
        }


And got this (before it died):

eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15194812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15194012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9efb812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9efb012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:10215812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:10215012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de2812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de2012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:1559a812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:10782012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b7f3812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b7f3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15125812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:9ef9812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9ef9012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15707812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15707012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:588c812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:588c012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15120812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15120012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:12b2c812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:12b2c012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:6561812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:6561012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13bea812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13bea012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:112b4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:112b4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:157d7812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:157d7012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:39a5812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:39a5012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:58c3812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:58c3012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:1582f812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1582f012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116f6812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116f6012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:109f0812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:109f0012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13b82812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:13b82012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:ce31812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:ce31012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:29d3812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:29d3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b3b4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b3b4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5a73812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5a73012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:d288812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:d288012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:107e3812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:107e3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:15198812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:15198012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:16654812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16654012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16741812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16741012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b911812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:b911012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:12e5d812
eth0: status=24829808 opts=24829808 opts2=0 addr=0:a6ee012
eth0: status=4829808 opts=4829808 opts2=0 addr=0:127ff812
eth0: status=4829808 opts=4829808 opts2=0 addr=0:127ff012
eth0: status=14829808 opts=14829808 opts2=0 addr=0:16b84812
eth0: status=20802842 opts=20802842 opts2=0 addr=0:16b84012
eth0: status=802842 opts=802842 opts2=0 addr=0:125d1812
eth0: status=802842 opts=802842 opts2=0 addr=0:125d1012
eth0: status=802842 opts=802842 opts2=0 addr=0:11104812
eth0: status=802842 opts=802842 opts2=0 addr=0:11104012
eth0: status=802842 opts=802842 opts2=0 addr=0:636a812
eth0: status=10802842 opts=10802842 opts2=0 addr=0:636a012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:138d5812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:138d5012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11cb0812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11cb0012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:88a4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:88a4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1082a812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1082a012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:168af812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:168af012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:1660f812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:112c3012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:c3e0812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:c3e0012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1f7e812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1f7e012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:2f5c812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:2f5c012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:132f6812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:132f6012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:c74a812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:c74a012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:11140812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11140012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:127be812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:127be012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16475812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16475012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:114d7812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:114d7012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:87a3812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:87a3012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:11abb812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:11c68012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8968812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8968012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5588812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5588012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14f77812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14f77012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8951812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8951012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:5a6f812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:5a6f012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:12acc012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8e90812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8e90012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9d53812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:9d53012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1390a812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:1390a012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8952812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8952012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11a76812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:11a76012
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:11167812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11167012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11548812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11548012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16428812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:16428012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8762812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8762012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14139812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:14139012
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:1157c812
eth0: status=20803ff0 opts=20803ff0 opts2=0 addr=0:8dcd012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de8812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8de8012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8f08812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8f08012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116a4812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:116a4012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11b98812
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:11b98012
eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:8cf4812
eth0: status=10803ff0 opts=10803ff0 opts2=0 addr=0:8cf4012
Unable to handle kernel paging request at virtual address 4228b92f
 printing eip:
c011772d
*pde = 00000000
Oops: 0002 [#1]
PREEMPT
Modules linked in: r8169 i810 md5 ipv6 autofs4 sunrpc reiserfs video button battery ac 813d
CPU:    0
EIP:    0060:[<c011772d>]    Not tainted VLI
EFLAGS: 00010003   (2.6.11-netrx)
EIP is at scheduler_tick+0x3d/0x2c0
eax: 4228b927   ebx: c036c000   ecx: 000f4240   edx: 000f44af
esi: d785ba40   edi: c036cf18   ebp: c036ceb0   esp: c036ce9c
ds: 007b   es: 007b   ss: 0068
Process '.(B'.(B@ (pid: 1109979719, threadinfo=c036c000 task=d785ba40)
Stack: 00000046 00000000 c036cf18 00000000 c036cf18 c036cec4 c010712a c02fd300
       00000000 c036cf18 c036cee0 c0137cc5 00000000 00000000 00000000 c033ea40
       c036c000 c036cf00 c0137dbe c036c000 c02fd300 c036cf18 000000e2 00000000
Call Trace:
 [<c010344a>] show_stack+0x7a/0x90
 [<c01035c9>] show_registers+0x149/0x1c0
 [<c01037cd>] die+0xdd/0x170
 [<c0115643>] do_page_fault+0x453/0x675
 [<c01030d7>] error_code+0x2b/0x30
 [<c010712a>] timer_interrupt+0x4a/0x120
 [<c0137cc5>] handle_IRQ_event+0x35/0x70
 [<c0137dbe>] __do_IRQ+0xbe/0x150
 [<c01048bf>] do_IRQ+0x5f/0x70
 [<c010309e>] common_interrupt+0x1a/0x20
 [<d8856bad>] rtl8169_interrupt+0xdd/0x130 [r8169]
 [<c0137cc5>] handle_IRQ_event+0x35/0x70
 [<c0137dbe>] __do_IRQ+0xbe/0x150
 [<c01048a1>] do_IRQ+0x41/0x70
 =======================
 [<c010309e>] common_interrupt+0x1a/0x20
 [<c02719ec>] ip_route_input_slow+0x39c/0x9f0
 [<c0274660>] ip_rcv+0x380/0x480
 [<c025eda5>] netif_receive_skb+0x1e5/0x220
 [<c025ee5e>] process_backlog+0x7e/0x100
 [<c025ef45>] net_rx_action+0x65/0xf0
 [<c011fbb2>] __do_softirq+0x42/0xa0
 [<c01049b4>] do_softirq+0x44/0x60
 =======================
 [<c011fcd8>] irq_exit+0x38/0x40
 [<c01048a8>] do_IRQ+0x48/0x70
 [<c010309e>] common_interrupt+0x1a/0x20
 [<c02b9507>] schedule_timeout+0x57/0xb0
 [<c017db4c>] ep_poll+0x10c/0x190
 [<c017cc72>] sys_epoll_wait+0x92/0xa0
 [<c0102ed9>] sysenter_past_esp+0x52/0x75
Code: 7d fc 21 e3 8b 33 e8 03 71 ff ff 39 35 80 ef 36 c0 a3 74 ef 36 c0 89 15 78 ef 36 c0
 <0>Kernel panic - not syncing: Fatal exception in interrupt


* Re: r8169: panic on 2.6.11
  2005-03-04 23:28             ` Jon Mason
  2005-03-04 23:49               ` Stephen Hemminger
@ 2005-03-04 23:58               ` Francois Romieu
  1 sibling, 0 replies; 13+ messages in thread
From: Francois Romieu @ 2005-03-04 23:58 UTC (permalink / raw)
  To: Jon Mason; +Cc: Stephen Hemminger, netdev

Jon Mason <jdmason@us.ibm.com> :
[...]
> > @@ -2210,6 +2211,24 @@ rtl8169_rx_interrupt(struct net_device *
> >     pci_action(tp->pci_dev, le64_to_cpu(desc->addr),
> >         tp->rx_buf_sz, PCI_DMA_FROMDEVICE);
> >
> > +   if (pkt_size >= tp->rx_buf_sz) {
> > +    show_size = 1;
> > +    pkt_size = tp->rx_buf_sz;
> > +   }
> 
> Shouldn't the above be dev->mtu (instead of tp->rx_buf_sz), otherwise there 
> won't be enough room for ethernet header, CRC, etc.

There is no room left in the buffer. I want it to be translated into the
biggest skb_put() possible. tp->rx_buf_sz already accounts for the headers
(see rtl8169_set_rxbufsize), no?

At worst we are a bit pessimistic, imho.

--
Ueimor


* Re: r8169: panic on 2.6.11
  2005-03-04 23:49               ` Stephen Hemminger
@ 2005-03-05  0:37                 ` Francois Romieu
  2005-03-05  5:03                   ` Jon Mason
  2005-03-07 18:59                   ` Stephen Hemminger
  0 siblings, 2 replies; 13+ messages in thread
From: Francois Romieu @ 2005-03-05  0:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Jon Mason, netdev

Stephen Hemminger <shemminger@osdl.org> :
[...]
> Added this and took out "too much work at interrupt message"

Ok (leaks but see below).

> And got this (before it died):
> 
> eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
                           ^^^^^^
3ff0 would be a ~16k packet. The high-order byte is
clear: the descriptor would be strictly somewhere between
the first descriptor and the last descriptor for the packet.

opts=80xxxx -> FIFO overflow (*bulb flashes*)

The code does not look pretty there.

Can you add something like the patch below on top of your
current patch (untested but you get the idea):

diff -puN drivers/net/r8169.c~r8169-480 drivers/net/r8169.c
--- linux-2.6.11/drivers/net/r8169.c~r8169-480	2005-03-05 00:16:58.575516900 +0100
+++ linux-2.6.11-fr/drivers/net/r8169.c	2005-03-05 01:32:20.122261946 +0100
@@ -240,6 +241,7 @@ enum RTL8169_register_content {
 	RxOK = 0x01,
 
 	/* RxStatusDesc */
+	RxOVF = 0x00800000,
 	RxRES = 0x00200000,
 	RxCRC = 0x00080000,
 	RxRUNT = 0x00100000,
@@ -2181,13 +2183,14 @@ rtl8169_rx_interrupt(struct net_device *
 
 		if (status & DescOwn)
 			break;
-		if (status & RxRES) {
+		if (status & (RxRES | RxOVF)) {
 			printk(KERN_INFO "%s: Rx ERROR!!!\n", dev->name);
 			tp->stats.rx_errors++;
 			if (status & (RxRWT | RxRUNT))
 				tp->stats.rx_length_errors++;
 			if (status & RxCRC)
 				tp->stats.rx_crc_errors++;
+			rtl8169_return_to_asic(tp->RxDescArray + entry, tp->rx_buf_sz);
 		} else {
 			struct RxDesc *desc = tp->RxDescArray + entry;
 			struct sk_buff *skb = tp->Rx_skbuff[entry];

_

/me goes to bed.

Out of curiosity it would be interesting to see how non-PREEMPT and NAPI behave (the
rings are surely too small).
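
As a worked check of the reading above, the sketch below decodes the reported word 0x00803ff0. `RX_OVF` and the 13-bit mask with its "- 4" come from the patches in this thread; the 14-bit mask is an assumption made only to reproduce the "3ff0 / ~16k" interpretation, and the helper names are hypothetical. With the driver's 13-bit mask the same word yields 8172 bytes, which is the len seen in the earlier instrumentation.

```c
#include <assert.h>
#include <stdint.h>

#define RX_OVF       0x00800000u  /* RxOVF, as defined in the patch above */
#define LEN_MASK_13  0x00001FFFu  /* mask in the driver's pkt_size line */
#define LEN_MASK_14  0x00003FFFu  /* assumption: wider field behind "3ff0" */
#define CRC_LEN      4            /* the "- 4" in the driver's pkt_size line */

/* Mirrors "(status & 0x00001FFF) - 4" from rtl8169_rx_interrupt(). */
static int pkt_size_13(uint32_t status)
{
	return (int)(status & LEN_MASK_13) - CRC_LEN;
}

/* Raw length if the field were 14 bits wide (assumption). */
static int raw_len_14(uint32_t status)
{
	return (int)(status & LEN_MASK_14);
}

static int fifo_overflow(uint32_t status)
{
	return (status & RX_OVF) != 0;
}

/* Worked check against the word reported in the thread. */
static void check_reported_word(void)
{
	const uint32_t status = 0x00803ff0u;

	assert(fifo_overflow(status));
	assert(pkt_size_13(status) == 8172);  /* matches len=8172 seen earlier */
	assert(raw_len_14(status) == 16368);  /* the "~16k" reading */
}
```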

--
Ueimor


* Re: r8169: panic on 2.6.11
  2005-03-05  0:37                 ` Francois Romieu
@ 2005-03-05  5:03                   ` Jon Mason
  2005-03-05  6:34                     ` Stephen Hemminger
  2005-03-07 18:59                   ` Stephen Hemminger
  1 sibling, 1 reply; 13+ messages in thread
From: Jon Mason @ 2005-03-05  5:03 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Stephen Hemminger, netdev

I tested the patch below on amd64, and have found a problem.  My adapter
always has the FOVF bit set, so the adapter never pings.

After looking at the opts1 register output of my adapter and Stephen's, I
noticed something weird.  For every packet, I am getting opts1 = 0x3481c040.
Now, compare this to opts=803ff0 from Stephen's last test.  It appears that
the upper 8 bits have been lost.  These are the FirstSegment and LastSegment
indicators (which should always be True for < 8191).  This looks a lot like
some of the funky behavior that I was seeing with my > 8191 jumbo frames
patch.

What size packets are being sent across the wire?
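
The comparison can be made concrete with a small decoder. This is a sketch under stated assumptions: the first/last segment flags are taken to be bits 29 and 28 of opts1, which is not confirmed anywhere in this thread but is consistent with the 20803ff0 / 10803ff0 words in Stephen's log; `RX_OVF` is the value from Francois's patch; `classify()` and `enum frag_kind` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define FIRST_FRAG  (1u << 29)
#define LAST_FRAG   (1u << 28)
#define RX_OVF      0x00800000u  /* RxOVF from Francois's patch */

enum frag_kind { FRAG_MIDDLE, FRAG_FIRST, FRAG_LAST, FRAG_WHOLE };

/* Classify a descriptor by its first/last segment flags. */
static enum frag_kind classify(uint32_t opts1)
{
	int first = (opts1 & FIRST_FRAG) != 0;
	int last = (opts1 & LAST_FRAG) != 0;

	if (first && last)
		return FRAG_WHOLE;
	if (first)
		return FRAG_FIRST;
	if (last)
		return FRAG_LAST;
	return FRAG_MIDDLE;
}

/* Worked check against the two words compared in this message. */
static void check_compared_words(void)
{
	assert(classify(0x3481c040u) == FRAG_WHOLE);   /* single-descriptor frame */
	assert(classify(0x00803ff0u) == FRAG_MIDDLE);  /* neither flag set */
	assert((0x3481c040u & RX_OVF) != 0);           /* bit 23 set here too */
}
```

Under these assumptions the healthy word describes a complete frame in one descriptor, the word from the failing run carries neither flag, and bit 23 is set in both, so testing that bit alone cannot tell the two cases apart.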

On Friday 04 March 2005 06:37 pm, Francois Romieu wrote:
> Stephen Hemminger <shemminger@osdl.org> :
> [...]
>
> > Added this and took out "too much work at interrupt message"
>
> Ok (leaks but see below).
>
> > And got this (before it died):
> >
> > eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
>
>                            ^^^^^^
> 3ff0 would be a ~16k packet. The high weight byte is
> missing: the descriptor would be strictly somewhere between
> the first descriptor and the last descriptor for the packet.
>
> opts=80xxxx -> FIFO overflow (*bulb flashes*)
>
> The code does not look pretty there.
>
> Can you add something like the patch below on top of your
> current patch (untested but you get the idea):
>
> diff -puN drivers/net/r8169.c~r8169-480 drivers/net/r8169.c
> --- linux-2.6.11/drivers/net/r8169.c~r8169-480 2005-03-05 00:16:58.575516900 +0100
> +++ linux-2.6.11-fr/drivers/net/r8169.c 2005-03-05 01:32:20.122261946 +0100
> @@ -240,6 +241,7 @@ enum RTL8169_register_content {
>   RxOK = 0x01,
>
>   /* RxStatusDesc */
> + RxOVF = 0x00800000,
>   RxRES = 0x00200000,
>   RxCRC = 0x00080000,
>   RxRUNT = 0x00100000,
> @@ -2181,13 +2183,14 @@ rtl8169_rx_interrupt(struct net_device *
>
>    if (status & DescOwn)
>     break;
> -  if (status & RxRES) {
> +  if (status & (RxRES | RxOVF)) {
>     printk(KERN_INFO "%s: Rx ERROR!!!\n", dev->name);
>     tp->stats.rx_errors++;
>     if (status & (RxRWT | RxRUNT))
>      tp->stats.rx_length_errors++;
>     if (status & RxCRC)
>      tp->stats.rx_crc_errors++;
> +   rtl8169_return_to_asic(tp->RxDescArray + entry, tp->rx_buf_sz);
>    } else {
>     struct RxDesc *desc = tp->RxDescArray + entry;
>     struct sk_buff *skb = tp->Rx_skbuff[entry];
>
> _
>
> /me goes to bed.
>
> Out of curiosity it would be interesting to see how non-PREEMPT and NAPI
> behaves (the rings are surely too small).
>
> --
> Ueimor


* Re: r8169: panic on 2.6.11
  2005-03-05  5:03                   ` Jon Mason
@ 2005-03-05  6:34                     ` Stephen Hemminger
  0 siblings, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2005-03-05  6:34 UTC (permalink / raw)
  To: Jon Mason; +Cc: netdev

The packet burst is 10 million 64-byte packets. It could be the sender
or the switch merging them, but I suspect the r8169 on the slow receiver.

Details are:
# cat /proc/net/pktgen/eth0
Params: count 10000000  min_pkt_size: 60  max_pkt_size: 60
      frags: 0  delay: 0  clone_skb: 1000000  ifname: eth0
      flows: 0 flowlen: 0
      dst_min: 172.2.251.143  dst_max:
      src_min:   src_max:
      src_mac: 00:0D:60:53:08:18  dst_mac: 00:09:5B:BD:B1:F9
      udp_src_min: 9  udp_src_max: 9  udp_dst_min: 9  udp_dst_max: 9
      src_mac_count: 0  dst_mac_count: 0
      Flags:
Current:
      pkts-sofar: 10000000  errors: 6374336
      started: 1109979805822722us  stopped: 1109979815051630us idle: 166364us
      seq_num: 10000011  cur_dst_mac_offset: 0  cur_src_mac_offset: 0
      cur_saddr: 0x670114ac  cur_daddr: 0x8ffb02ac
      cur_udp_dst: 9  cur_udp_src: 9
      flows: 0
Result: OK: 9228908(c9062544+d166364) usec, 10000000 (60byte,0frags)
   1083551pps 520Mb/sec (520104480bps) errors: 6374336
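
The Result figures can be reproduced with integer arithmetic. The sketch below is not pktgen's code; `pps()` and `bps()` are hypothetical helpers that happen to give the same numbers as the output above when fed the reported count, elapsed time (9062544 + 166364 = 9228908 usec) and 60-byte packet size.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second from a packet count and elapsed microseconds
 * (integer division, rounding down). */
static uint64_t pps(uint64_t count, uint64_t elapsed_us)
{
	assert(elapsed_us != 0);
	return count * 1000000ull / elapsed_us;
}

/* Bits per second for fixed-size packets of pkt_bytes each. */
static uint64_t bps(uint64_t pkts_per_sec, uint64_t pkt_bytes)
{
	return pkts_per_sec * pkt_bytes * 8;
}
```

With count = 10000000 and elapsed_us = 9228908 this gives 1083551 pps, and 1083551 pps at 60 bytes gives 520104480 bps, matching the Result line.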


* Re: r8169: panic on 2.6.11
  2005-03-05  0:37                 ` Francois Romieu
  2005-03-05  5:03                   ` Jon Mason
@ 2005-03-07 18:59                   ` Stephen Hemminger
  1 sibling, 0 replies; 13+ messages in thread
From: Stephen Hemminger @ 2005-03-07 18:59 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Jon Mason, netdev

On Sat, 5 Mar 2005 01:37:35 +0100
Francois Romieu <romieu@fr.zoreil.com> wrote:

> Stephen Hemminger <shemminger@osdl.org> :
> [...]
> > Added this and took out "too much work at interrupt message"
> 
> Ok (leaks but see below).
> 
> > And got this (before it died):
> > 
> > eth0: status=803ff0 opts=803ff0 opts2=0 addr=0:106ad012
>                            ^^^^^^
> 3ff0 would be a ~16k packet. The high weight byte is
> missing: the descriptor would be strictly somewhere between
> the first descriptor and the last descriptor for the packet.
> 
> opts=80xxxx -> FIFO overflow (*bulb flashes*)
> 
> The code does not look pretty there.


Doesn't come up because it gets that bit set in the normal case.

eth0: Rx ERROR status=328440ff!!!


Thread overview: 13+ messages
2005-03-04 21:28 r8169: panic on 2.6.11 Stephen Hemminger
2005-03-04 21:37 ` Jeff Garzik
2005-03-04 21:50   ` Francois Romieu
     [not found]     ` <20050304135922.0b0a3911@dxpl.pdx.osdl.net>
     [not found]       ` <20050304221826.GA1028@electric-eye.fr.zoreil.com>
2005-03-04 22:53         ` Stephen Hemminger
2005-03-04 23:02           ` Francois Romieu
2005-03-04 23:28             ` Jon Mason
2005-03-04 23:49               ` Stephen Hemminger
2005-03-05  0:37                 ` Francois Romieu
2005-03-05  5:03                   ` Jon Mason
2005-03-05  6:34                     ` Stephen Hemminger
2005-03-07 18:59                   ` Stephen Hemminger
2005-03-04 23:58               ` Francois Romieu
2005-03-04 21:39 ` Francois Romieu
