* r8169 OOPSen in rtl_rx
@ 2013-08-13  9:43 Peter Zijlstra
  2013-08-13 21:15 ` Francois Romieu
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-13  9:43 UTC
To: nic_swsd, romieu; +Cc: netdev

Hi r8169 people,

I've got an AMD x86_64 machine with two realtek NICs:

01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

I currently run a 3.10.0.6 based kernel on the machine and frequently
(several times a week) get OOPSen in the rtl_rx path.

Now the horribly sad part is that this machine doesn't (currently) have
a working serial line -- it's got pins on the board but I need to go
hunt for an expansion bracket for it :/

I recently added the RTL8111 (rev 6) card so that this machine could do
firewall duties (it was a general server using the RTL-8169 for a long
time before that and always ran without problems).

I have tried netconsole, but that's not working, which leads me to
believe it's the inward-facing NIC that's buggered -- which would be the
RTL-8169 (rev 10) -- pure speculation though, it could just crash hard
enough for nothing to really work anymore.

The video-card also doesn't support 80x50/60 text modes and
KMS/framebuffer also didn't work (as in, I get graphics based text at
high res but OOPSen don't actually make it to the screen).

So all I've got to offer currently is a partial backtrace -- see
attached image.

Partial transcribe:

 ? rtl8169_try_rx_copy.isra.77
 rtl_rx
 rtl8169_poll
 net_rx_action
 ? get_vtime_delta
 __do_softirq
 irq_exit
 do_IRQ
 common_interrupt
 ? native_safe_halt
 ? rcu_eqs_enter_common.isra.48
 default_idle
 amd_e400_idle
 arch_cpu_idle
 cpu_idle_loop
 ...

I did look at the r8169 log between 3.10 and current head and there
wasn't anything obviously related to RX crashes so I haven't upgraded to
3.11-rc; if you think I should try please say so.

I'm also willing to try patches -- although as said, reproduction can
take a few days -- although sometimes I'm 'lucky' and it crashes
multiple times a day :/

~ Peter

* Re: r8169 OOPSen in rtl_rx
  2013-08-13  9:43 r8169 OOPSen in rtl_rx Peter Zijlstra
@ 2013-08-13 21:15 ` Francois Romieu
  2013-08-14  9:29   ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Francois Romieu @ 2013-08-13 21:15 UTC
To: Peter Zijlstra; +Cc: nic_swsd, netdev

Peter Zijlstra <peterz@infradead.org> :
[...]
> I've got an AMD x86_64 machine with two realtek NICs:
>
> 01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

Which XID (see kernel dmesg) ?

[...]
> So all I've got to offer currently is a partial backtrace -- see
> attached image.

(no attachment)

> Partial transcribe:
>
>  ? rtl8169_try_rx_copy.isra.77

/me scratches head.

You may check that pkt_size is > 0, <= ETH_FRAME_LEN (no jumbo ?) and
see if it helps.

--
Ueimor

* Re: r8169 OOPSen in rtl_rx
  2013-08-13 21:15 ` Francois Romieu
@ 2013-08-14  9:29   ` Peter Zijlstra
  2013-08-14  9:52     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-14  9:29 UTC
To: Francois Romieu; +Cc: nic_swsd, netdev

[-- Attachment #1: Type: text/plain, Size: 2937 bytes --]

On Tue, Aug 13, 2013 at 11:15:34PM +0200, Francois Romieu wrote:
> Peter Zijlstra <peterz@infradead.org> :
> [...]
> > I've got an AMD x86_64 machine with two realtek NICs:
> >
> > 01:08.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
> > 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
>
> Which XID (see kernel dmesg) ?

$ dmesg | grep -i xid
[    1.706427] r8169 0000:01:08.0 eth0: RTL8110s at 0xffffc9000063ac00, 00:0e:2e:87:8b:70, XID 04000000 IRQ 18
[    1.717012] r8169 0000:03:00.0 eth1: RTL8168e/8111e at 0xffffc90000646000, a0:f3:c1:00:74:a3, XID 0c200000 IRQ 43

> > So all I've got to offer currently is a partial backtrace -- see
> > attached image.
>
> (no attachment)

Oh, duh..

> > Partial transcribe:
> >
> >  ? rtl8169_try_rx_copy.isra.77
>
> /me scratches head.
>
> You may check that pkt_size is > 0, <= ETH_FRAME_LEN (no jumbo ?) and
> see if it helps.

So eth0 runs at 1000Mb/s but has a MTU:1500, eth1 runs at 100Mb/s also
MTU:1500.

I'll try a kernel with the below. Hopefully the change in
dumpstack_64.c will avoid printing the 'process' stack and give a
little more useful information.

Will let you know.

Thanks

---
 arch/x86/kernel/dumpstack_64.c       | 7 ++++---
 drivers/net/ethernet/realtek/r8169.c | 2 ++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index addb207..f76e98f 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -182,7 +182,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 				stack = (unsigned long *) (irq_stack_end[-1]);
 				irq_stack_end = NULL;
 				ops->stack(data, "EOI");
-				continue;
+				goto out;
 			}
 		}
 		break;
@@ -192,6 +192,7 @@ void dump_trace(struct task_struct *task, struct pt_regs *regs,
 	 * This handles the process stack:
 	 */
 	bp = ops->walk_stack(tinfo, stack, bp, ops, data, NULL, &graph);
+out:
 	put_cpu();
 }
 EXPORT_SYMBOL(dump_trace);
@@ -231,8 +232,8 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
 				pr_cont(" <EOI> ");
 			}
 		} else {
-			if (((long) stack & (THREAD_SIZE-1)) == 0)
-				break;
+			if (((long) stack & (THREAD_SIZE-1)) == 0)
+				break;
 		}
 		if (i && ((i % STACKSLOTS_PER_LINE) == 0))
 			pr_cont("\n");
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 393f961..76d1c18 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6185,6 +6185,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 			else
 				pkt_size = status & 0x00003fff;
 
+			WARN_ON(!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN));
+
 			/*
 			 * The driver does not support incoming fragmented
 			 * frames. They are seen as a symptom of over-mtu

[-- Attachment #2: IMG_20130810_195601.jpg --]
[-- Type: image/jpeg, Size: 301515 bytes --]

* Re: r8169 OOPSen in rtl_rx
  2013-08-14  9:29   ` Peter Zijlstra
@ 2013-08-14  9:52     ` Peter Zijlstra
  2013-09-05 15:20       ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-08-14  9:52 UTC
To: Francois Romieu; +Cc: nic_swsd, netdev

On Wed, Aug 14, 2013 at 11:29:15AM +0200, Peter Zijlstra wrote:
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 393f961..76d1c18 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6185,6 +6185,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
>  			else
>  				pkt_size = status & 0x00003fff;
>
> +			WARN_ON(!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN));
> +
>  			/*
>  			 * The driver does not support incoming fragmented
>  			 * frames. They are seen as a symptom of over-mtu

OK, I changed that to:

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 393f961..81e0bf4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6185,6 +6185,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 			else
 				pkt_size = status & 0x00003fff;
 
+			if (!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)) {
+				dev->stats.rx_dropped++;
+				printk("%s Funny sized packet: %d\n", dev->name, pkt_size);
+				goto release_descriptor;
+			}
+
 			/*
 			 * The driver does not support incoming fragmented
 			 * frames. They are seen as a symptom of over-mtu
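
The guard in the patch above can be tried outside the kernel. What follows is a minimal userspace sketch of the same predicate, not driver code. Assumptions: rx_pkt_size() and rx_size_ok() are hypothetical helper names that do not exist in r8169.c; ETH_FRAME_LEN is defined locally as 1514, its value in <linux/if_ether.h>; only the masking branch visible in the hunk context (status & 0x00003fff) is modelled, since the branch above the "else" is not quoted in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the <linux/if_ether.h> constant: max frame octets without FCS. */
#define ETH_FRAME_LEN 1514

/* Hypothetical helper: pull the 14-bit length field out of the
 * descriptor status word, as the "else" branch in the hunk does. */
static uint32_t rx_pkt_size(uint32_t status)
{
	return status & 0x00003fff;
}

/* Hypothetical helper: the condition the patch requires before the
 * packet is handed on; anything else is counted as dropped. */
static bool rx_size_ok(uint32_t pkt_size)
{
	return pkt_size > 0 && pkt_size <= ETH_FRAME_LEN;
}
```

With these helpers a length of 1514 passes, while 0 or anything above 1514 is rejected. That is the case in which the patched rtl_rx bumps rx_dropped, prints the "Funny sized packet" line and jumps to release_descriptor.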

* Re: r8169 OOPSen in rtl_rx
  2013-08-14  9:52     ` Peter Zijlstra
@ 2013-09-05 15:20       ` Peter Zijlstra
  2013-09-05 23:09         ` Francois Romieu
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2013-09-05 15:20 UTC
To: Francois Romieu; +Cc: nic_swsd, netdev

On Wed, Aug 14, 2013 at 11:52:33AM +0200, Peter Zijlstra wrote:
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 393f961..81e0bf4 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6185,6 +6185,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
>  			else
>  				pkt_size = status & 0x00003fff;
>
> +			if (!(pkt_size > 0 && pkt_size <= ETH_FRAME_LEN)) {
> +				dev->stats.rx_dropped++;
> +				printk("%s Funny sized packet: %d\n", dev->name, pkt_size);
> +				goto release_descriptor;
> +			}
> +
>  			/*
>  			 * The driver does not support incoming fragmented
>  			 * frames. They are seen as a symptom of over-mtu

Yay, it triggered..

$ dmesg | awk '/Funny sized packet/ { t[$6]++ } END { for (i in t) { printf "%d %d\n", t[i], i; } }' | sort -n
1 4237
1 4983
1 5811
1 6062
1 6594
2 10709
2 12073
2 9197
4 14624
4 14870
266 16364

dev->name is always the same and the internal NIC (eth0, RTL8110s).

When it happens the NIC stops working as every packet is mal-sized,
however an ifconfig down; ifconfig up will restore it to working order.

It appears to happen when I saturate my outside link such that all
packets are fwd to the internal network -- I've got a 30Mbit/s down link
which isn't all that much given it's a GBE capable card.

When I try and saturate the internal nic, with traffic from the firewall
to an internal machine we reach GBE speeds but nothing falls over.

* Re: r8169 OOPSen in rtl_rx
  2013-09-05 15:20       ` Peter Zijlstra
@ 2013-09-05 23:09         ` Francois Romieu
  0 siblings, 0 replies; 6+ messages in thread
From: Francois Romieu @ 2013-09-05 23:09 UTC
To: Peter Zijlstra; +Cc: nic_swsd, netdev

Peter Zijlstra <peterz@infradead.org> :
[...]
> Yay, it triggered..

Bingo.

Can you display the whole descriptor entry (opts1 and opts2) and its
index (cur_rx) when abnormal packets are detected ?

We can always check the packet size but I'd welcome some more specific
pattern in the remaining bits of the descriptor.

Btw, you may try to revert aee77e4accbeb2c86b1d294cd84fec4a12dde3bd
("r8169: use unlimited DMA burst for TX") and see if it changes the
Rx / Tx balance. It would only be a bandaid though.

--
Ueimor
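
The request above (print opts1, opts2 and the ring index whenever a bad size is seen) could look roughly like the userspace sketch below. It only formats the line; it is not a patch against r8169.c. Assumptions: struct rx_desc_snapshot and format_rx_desc() are hypothetical names, the two words are taken to be already converted to CPU byte order, and NUM_RX_DESC is assumed to be 256, so "entry" is cur_rx modulo that ring size.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: size of the Rx descriptor ring. */
#define NUM_RX_DESC 256

/* Hypothetical snapshot of the two descriptor words, in CPU byte order. */
struct rx_desc_snapshot {
	uint32_t opts1;
	uint32_t opts2;
};

/* Format one diagnostic line: decoded length, running index, ring slot
 * and both raw words, so patterns in the remaining bits become visible.
 * Returns what snprintf() returns. */
static int format_rx_desc(char *buf, size_t len, const char *name,
			  uint32_t cur_rx, const struct rx_desc_snapshot *d)
{
	return snprintf(buf, len,
			"%s size=%u cur_rx=%u entry=%u opts1=%08x opts2=%08x",
			name,
			(unsigned int)(d->opts1 & 0x00003fff),
			(unsigned int)cur_rx,
			(unsigned int)(cur_rx % NUM_RX_DESC),
			(unsigned int)d->opts1,
			(unsigned int)d->opts2);
}
```

In the driver the same fields would presumably be printed next to the existing "Funny sized packet" message, after converting the little-endian descriptor words to CPU order. The sample values used to exercise the formatter are made up, not taken from Peter's machine.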