* Repeatable Oops, mv643xx_eth in 2.6.18.x
@ 2006-11-10 19:17 Erik Andersen
2006-11-10 19:54 ` [RFT] mv643xxx_eth_start_xmit oops Stephen Hemminger
2006-11-10 20:29 ` Repeatable Oops, mv643xx_eth in 2.6.18.x Olaf Hering
0 siblings, 2 replies; 5+ messages in thread
From: Erik Andersen @ 2006-11-10 19:17 UTC (permalink / raw)
To: linux-kernel
I have a Pegasos2 powerpc system acting as my home server. With
2.6.16.x it was 100% stable and I had months of uptime, rebooting
only to periodically apply security updates to the kernel.
With 2.6.17 and 2.6.18, after an uptime of no more than 2 days,
and usually much less, I get a kernel panic, with nothing in the
log. I finally caught it in the act, and took a picture.
http://codepoet.org/oops.jpg
A quick transcription of the Oops in the screenshot follows:
--------------------------------------------
kernel BUG in eth_alloc_tx_desc_index at drivers/net/mv643xx_eth.c:1070
Oops: Exception in kernel mode, sig: 5 [#1]
Modules linked in: usb_storage uhci_hcd sd_mod scsi_mod tun ide_cd cdrom eepro100 via_rhine
NIP: C0160894 LR: C01609B8 CTR: C0160AF4
REGS: ca2b79b0 TRAP: 0700 Not tainted (2.6.18.1-erik)
MSR: 00021032 <ME,IR,DR> CR: 82202488 XER: 0000000
TASK = de0ec7b0[9823] 'dansguardian' THREAD: ca2b6000
GPR00: <see screenshot>
GPR08: <see screenshot>
GPR16: <see screenshot>
GPR24: <see screenshot>
NIP [C0160894] eth_alloc_tx_desc_index+0x48/0x50
LR [C01609B8] eth_tx_submit_descs_for_skb+0x28/0x164
Call Trace:
(unreliable)
mv643xx_eth_start_xmit+0xa8/0x170
dev_hard_start_xmit+0x8c/0x14c
dev_queue_xmit+0x11c/0x2e0
ip_output+0x198/0x288
ip_queue_xmit+0x3c0/0x470
tcp_transmit_skb+x308/0x4c8
tcp_push_one+0xfc/0x14c
tcp_sendmsg+0x3d0/0xca4
inet_sendmsg+0x60/0x7c
sock_sendmsg+0xb0/0xe4
sys_sendto+0xd4/0x10c
sys_socketcall+0x120/0x1d8
ret_drom_syscall+0x0/0x38
--- Exception: x01 at 0xfdb0e5c
LR = 0x100120a8
Instruction dump:
<see screenshot>
<see screenshot>
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 180 seconds.._
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
^ permalink raw reply [flat|nested] 5+ messages in thread* [RFT] mv643xxx_eth_start_xmit oops
2006-11-10 19:17 Repeatable Oops, mv643xx_eth in 2.6.18.x Erik Andersen
@ 2006-11-10 19:54 ` Stephen Hemminger
2006-11-10 20:47 ` Erik Andersen
2006-11-11 23:44 ` Erik Andersen
2006-11-10 20:29 ` Repeatable Oops, mv643xx_eth in 2.6.18.x Olaf Hering
1 sibling, 2 replies; 5+ messages in thread
From: Stephen Hemminger @ 2006-11-10 19:54 UTC (permalink / raw)
To: linux-kernel
On Fri, 10 Nov 2006 12:17:45 -0700
Erik Andersen <andersen@codepoet.org> wrote:
> I have a Pegasos2 powerpc system acting as my home server. With
> 2.6.16.x it was 100% stable and I had months of uptime, rebooting
> only to periodically apply security updates to the kernel.
>
> With 2.6.17 and 2.6.18, after an uptime of no more than 2 days,
> and usually much less, I get a kernel panic, with nothing in the
> log. I finally caught it in the act, and took a picture.
> http://codepoet.org/oops.jpg
>
> A quick transcription of the Oops in the screenshot follows:
> --------------------------------------------
>
The code int mv643xx_eth_start_xmit is not safe on SMP it was checking for space outside of lock.
Does the following (untested) fix it?
0. Fix race where space check is outside of lock.
1. Eliminate bogus BUG_ON()'s
2. Use proper transmit routine return values
3. Cleanup potential kernel log overrun if hit with unaligned frags
4. Compare with actual space needed rather than worst case
5. Use correct return code for case of linearize() failure.
---
drivers/net/mv643xx_eth.c | 30 ++++++++++++++----------------
1 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
index 9997081..4052bfe 100644
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1191,25 +1191,23 @@ static int mv643xx_eth_start_xmit(struct
struct net_device_stats *stats = &mp->stats;
unsigned long flags;
- BUG_ON(netif_queue_stopped(dev));
- BUG_ON(skb == NULL);
-
- if (mp->tx_ring_size - mp->tx_desc_count < MAX_DESCS_PER_SKB) {
- printk(KERN_ERR "%s: transmit with queue full\n", dev->name);
- netif_stop_queue(dev);
- return 1;
- }
-
- if (has_tiny_unaligned_frags(skb)) {
- if (__skb_linearize(skb)) {
- stats->tx_dropped++;
+ if (has_tiny_unaligned_frags(skb) && __skb_linearize(skb)) {
+ stats->tx_dropped++;
+ if (net_ratelimit())
printk(KERN_DEBUG "%s: failed to linearize tiny "
- "unaligned fragment\n", dev->name);
- return 1;
- }
+ "unaligned fragment\n", dev->name);
+ return NETDEV_TX_OK;
}
spin_lock_irqsave(&mp->lock, flags);
+ if (mp->tx_ring_size - mp->tx_desc_count < skb_shinfo(skb)->nr_frags + 1) {
+ if (!netif_queue_stopped(dev)) {
+ printk(KERN_ERR "%s: transmit with queue full\n", dev->name);
+ netif_stop_queue(dev);
+ }
+ spin_unlock_irqrestore(&mp->lock, flags);
+ return NETDEV_TX_BUSY;;
+ }
eth_tx_submit_descs_for_skb(mp, skb);
stats->tx_bytes = skb->len;
@@ -1221,7 +1219,7 @@ static int mv643xx_eth_start_xmit(struct
spin_unlock_irqrestore(&mp->lock, flags);
- return 0; /* success */
+ return NETDEV_TX_OK;
}
/*
--
1.4.1
--
Stephen Hemminger <shemminger@osdl.org>
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [RFT] mv643xxx_eth_start_xmit oops
2006-11-10 19:54 ` [RFT] mv643xxx_eth_start_xmit oops Stephen Hemminger
@ 2006-11-10 20:47 ` Erik Andersen
2006-11-11 23:44 ` Erik Andersen
1 sibling, 0 replies; 5+ messages in thread
From: Erik Andersen @ 2006-11-10 20:47 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: linux-kernel
On Fri Nov 10, 2006 at 11:54:44AM -0800, Stephen Hemminger wrote:
> The code int mv643xx_eth_start_xmit is not safe on SMP it was
> checking for space outside of lock.
Hmm. I do not have CONFIG_SMP enabled... But then I suppose the
function is not reentrant either, so networking from N apps would
eventually result in a collision where it would blow up. That
seems consistant with what I was seeing.
> Does the following (untested) fix it?
Thanks! It at least applies cleanly and compiles...
Will let you know if it seems fixed.
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFT] mv643xxx_eth_start_xmit oops
2006-11-10 19:54 ` [RFT] mv643xxx_eth_start_xmit oops Stephen Hemminger
2006-11-10 20:47 ` Erik Andersen
@ 2006-11-11 23:44 ` Erik Andersen
1 sibling, 0 replies; 5+ messages in thread
From: Erik Andersen @ 2006-11-11 23:44 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: linux-kernel
On Fri Nov 10, 2006 at 11:54:44AM -0800, Stephen Hemminger wrote:
> The code int mv643xx_eth_start_xmit is not safe on SMP it was
> checking for space outside of lock. Does the following
> (untested) fix it?
With your patch applied, I got the exact same oops a few minutes
ago while trying to rsync about a GB of stuff...
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Repeatable Oops, mv643xx_eth in 2.6.18.x
2006-11-10 19:17 Repeatable Oops, mv643xx_eth in 2.6.18.x Erik Andersen
2006-11-10 19:54 ` [RFT] mv643xxx_eth_start_xmit oops Stephen Hemminger
@ 2006-11-10 20:29 ` Olaf Hering
1 sibling, 0 replies; 5+ messages in thread
From: Olaf Hering @ 2006-11-10 20:29 UTC (permalink / raw)
To: andersen, linux-kernel
On Fri, Nov 10, Erik Andersen wrote:
> I have a Pegasos2 powerpc system acting as my home server. With
> 2.6.16.x it was 100% stable and I had months of uptime, rebooting
> only to periodically apply security updates to the kernel.
>
> With 2.6.17 and 2.6.18, after an uptime of no more than 2 days,
> and usually much less, I get a kernel panic, with nothing in the
> log. I finally caught it in the act, and took a picture.
> http://codepoet.org/oops.jpg
I got a similar crash while doing Xorg pci domain testing (I guess):
Welcome to openSUSE 10.2 (PPC) Beta1plus - Kernel 2.6.18.1-20061102142452-default (console).
e89 login: Oops: Exception in kernel mode, sig: 5 [#1]
Modules linked in: autofs4 ipv6 nfs lockd nfs_acl sunrpc loop pcspkr ide_cd cdrom ohci1394 via_ircc irda crc_ccitt parport_pc ieee1394 uhci_hcd parport via_rhine mv643xx_eth mii via82cxxx
NIP: D1035504 LR: D1037AE0 CTR: D10379E4
REGS: c3919b10 TRAP: 0700 Tainted: G U (2.6.18.1-20061102142452-default)
MSR: 00021032 <ME,IR,DR> CR: 28000488 XER: 00000000
TASK = cc0d9340[3789] 'sshd' THREAD: c3918000
GPR00: 00000000 C3919BC0 CC0D9340 CFF1D280 CFF1D000 00000000 CE4109D2 00000000
GPR08: 00000176 00000001 00001000 00000177 00000000 08080AA0 00000000 00000001
GPR16: 00000005 C3919E34 CD8904E0 00000050 000005A8 00000000 00009032 CFF1D2FC
GPR24: 00000010 00000000 00000000 00000000 D10379E4 CFF1D000 CFF1D280 CB0A62A8
NIP [D1035504] eth_alloc_tx_desc_index+0x44/0x50 [mv643xx_eth]
LR [D1037AE0] mv643xx_eth_start_xmit+0xfc/0x374 [mv643xx_eth]
Call Trace:
[C3919BC0] [CE4109D0] 0xce4109d0 (unreliable)
[C3919C00] [C0262254] dev_hard_start_xmit+0x244/0x2dc
[C3919C20] [C026478C] dev_queue_xmit+0x21c/0x2dc
[C3919C40] [C02884CC] ip_output+0x240/0x294
[C3919C60] [C0288B10] ip_queue_xmit+0x3b4/0x41c
[C3919CE0] [C0298374] tcp_transmit_skb+0x6d4/0x718
[C3919D10] [C029A178] __tcp_push_pending_frames+0x76c/0x858
[C3919D50] [C028F19C] tcp_sendmsg+0xa90/0xbd0
[C3919DB0] [C02AA498] inet_sendmsg+0x60/0x74
[C3919DD0] [C02552A4] do_sock_write+0xd8/0xec
[C3919DF0] [C0255B48] sock_aio_write+0x58/0x74
[C3919E50] [C0083278] do_sync_write+0xbc/0x104
[C3919EF0] [C0083F40] vfs_write+0x104/0x1c8
[C3919F10] [C00845DC] sys_write+0x4c/0x8c
[C3919F40] [C00125DC] ret_from_syscall+0x0/0x40
--- Exception: c01 at 0x7a439b8
LR = 0x8041020
Instruction dump:
5400fffe 0f000000 81030020 81230024 39680001 7c0b53d6 7c0051d6 7d605850
7d694a78 91630020 7d290034 5529d97e <0f090000> 7d034378 4e800020 2f840001
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 180 seconds..PegasosII Boot Strap (c) 2002-2005 bplan GmbH (BUILD 20051216202806)
Running on CPU PVR:80020101
It happend only once.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2006-11-11 23:44 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-10 19:17 Repeatable Oops, mv643xx_eth in 2.6.18.x Erik Andersen
2006-11-10 19:54 ` [RFT] mv643xxx_eth_start_xmit oops Stephen Hemminger
2006-11-10 20:47 ` Erik Andersen
2006-11-11 23:44 ` Erik Andersen
2006-11-10 20:29 ` Repeatable Oops, mv643xx_eth in 2.6.18.x Olaf Hering
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox