* skb_pull_rcsum - Fatal exception in interrupt
From: Alan J. Wylie @ 2007-08-15 15:07 UTC (permalink / raw)
To: netdev
We have been shipping Linux-based servers to customers for several
years now, with few problems. Recently, however, a single customer has
been seeing kernel panics. Unfortunately, the customer is about 200
miles away, so physical access is limited. There are two Ethernet
interfaces: one should be plugged into a local RFC1918 network, and the
other is connected to the internet. If eth0 is plugged into the local
network, the system panics a short time later.
Hardware: Intel S5000VSA server
Network cards: Intel e1000
Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper)
We shipped a second system, and this displayed identical symptoms. We
have tested with several recent 2.6 kernels, including
2.6.22
2.6.17.14
2.6.20.15
all of which crash.
We have a couple of photographs showing the tail end of the messages
on the screen.
The last two lines are:
EIP: [<c02b6fb2>] skb_pull_rcsum+0x6d/0x71 SS:ESP 09068:c03e1ea4
Kernel panic - not syncing: Fatal exception in interrupt
The photos, along with the following information, are available at
http://wylie.me.uk/skb_pull_rcsum/
lspci
lspci -n
lspci -v
ethtool -d
/proc/interrupts
kernel config
There are no related messages in the syslog files.
The code for skb_pull_rcsum() is short, but contains two BUG_ON()
calls that check for invalid lengths:
unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
{
	BUG_ON(len > skb->len);
	skb->len -= len;
	BUG_ON(skb->len < skb->data_len);
	skb_postpull_rcsum(skb, skb->data, len);
	return skb->data += len;
}
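To make the two invariants concrete, here is a userspace sketch (not kernel code) that models only the two length fields skb_pull_rcsum() checks. skb->len is the total length and skb->data_len is the part held in paged fragments, so skb->len - skb->data_len is the linear area that skb->data points into. The names struct fake_skb, enum pull_result and pull_check() are hypothetical, chosen for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the two struct sk_buff fields that
 * skb_pull_rcsum() checks; this is not the kernel's structure. */
struct fake_skb {
	unsigned int len;      /* total bytes: linear part plus fragments */
	unsigned int data_len; /* bytes held in paged fragments */
};

enum pull_result {
	PULL_OK,
	PULL_TOO_SHORT,  /* would trip BUG_ON(len > skb->len) */
	PULL_NOT_LINEAR  /* would trip BUG_ON(skb->len < skb->data_len) */
};

/* Report which BUG_ON() in skb_pull_rcsum() pulling 'len' bytes would
 * hit. The buffer is not modified. */
enum pull_result pull_check(const struct fake_skb *skb, unsigned int len)
{
	if (len > skb->len)
		return PULL_TOO_SHORT;
	/* After the pull, skb->len would be skb->len - len. The second
	 * BUG_ON() fires when that drops below data_len, i.e. when the
	 * pull reaches past the linear area (skb->len - skb->data_len). */
	if (skb->len - len < skb->data_len)
		return PULL_NOT_LINEAR;
	return PULL_OK;
}
```

For example, a 60-byte frame of which only 3 bytes are linear (len = 60, data_len = 57) gives PULL_NOT_LINEAR for a 5-byte pull, while the same frame held entirely in the linear area (data_len = 0) gives PULL_OK.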
I wonder whether this problem bears any resemblance to
http://bugzilla.kernel.org/show_bug.cgi?id=2979
| We were overreacting to invalid incoming AppleTalk frames. Better
| just drop invalid frames than crash the kernel ;)
<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75559c167bddc1254db5bcff032ad5eed8bd6f4a>
| [APPLETALK]: Fix a remotely triggerable crash
| When we receive an AppleTalk frame shorter than what its header
| says, we still attempt to verify its checksum, and trip on the
| BUG_ON() at the end of function atalk_sum_skb() because of the
| length mismatch.
| This has security implications because this can be triggered by
| simply sending a specially crafted ethernet frame to a target
| victim, effectively crashing that host. Thus this qualifies, I
| think, as a remote DoS.
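The AppleTalk fix quoted above follows a general rule: compare a header's length field with the number of bytes actually received before walking that many bytes, for example to checksum them. A minimal sketch under stated assumptions: the frame layout (a big-endian total length in the first two bytes) and the names declared_len(), sum_bytes() and checked_sum() are hypothetical, and sum_bytes() is a plain byte sum, not the AppleTalk checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout: bytes 0-1 hold the total frame length
 * (big-endian) as claimed by the sender. */
static size_t declared_len(const uint8_t *frame)
{
	return ((size_t)frame[0] << 8) | frame[1];
}

/* Placeholder checksum: a plain sum of bytes, for illustration only. */
static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
	uint32_t s = 0;
	for (size_t i = 0; i < n; i++)
		s += p[i];
	return s;
}

/* Returns 0 and stores the checksum if the frame is consistent, or -1
 * (drop the frame) if the header claims more bytes than were received,
 * instead of reading past the end of the buffer. */
int checked_sum(const uint8_t *frame, size_t received, uint32_t *out)
{
	if (received < 2)
		return -1;              /* not even a length field */
	size_t claimed = declared_len(frame);
	if (claimed < 2 || claimed > received)
		return -1;              /* short or malformed frame: drop */
	*out = sum_bytes(frame, claimed);
	return 0;
}
```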
Our system is also installed in a school. We have remote access to the
box, and can, with some inconvenience, arrange for the box to be
rebooted. We are currently arranging for two different network cards
(RealTek RTL8139) to be installed.
I am pretty certain that the problem is to do with network traffic,
rather than hardware or software configurations - this box is pretty
well identical to tens of other boxes working successfully, the only
difference being that recently the on-board ethernet changed from
8086:1079 (rev 03) to 8086:1096 (rev 01) requiring an updated e1000
driver.
What is the best way to track this bug down, remembering that we have
little more than ssh access and a remote finger to press the reboot
button?
Could we modify the code to log and drop the packet, rather than
panicking the kernel?
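Logging and dropping amounts to turning the assertions into an error return that the caller acts on, with the message rate-limited so that a stream of bad frames cannot also flood the log. A userspace sketch, assuming hypothetical names (struct fake_skb, checked_pull(), log_budget); in the kernel, net_ratelimit() is the usual helper for limiting such messages, and it limits over time rather than by a fixed count as this model does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the sk_buff length fields; not kernel code. */
struct fake_skb {
	unsigned int len;      /* total payload bytes */
	unsigned int data_len; /* bytes in paged fragments */
	unsigned int offset;   /* how far the data pointer has advanced */
};

/* Hypothetical message budget standing in for net_ratelimit():
 * allow a fixed number of messages, then stay quiet. */
static unsigned int log_budget = 10;

/* Checked pull: where skb_pull_rcsum() would hit a BUG_ON(), log (with
 * a limit) and return -1 so the caller can free the packet instead.
 * On success, update the lengths as a pull would; the checksum update
 * done by skb_postpull_rcsum() is not modelled. */
int checked_pull(struct fake_skb *skb, unsigned int len)
{
	/* Short-circuit keeps skb->len - len from underflowing. */
	if (len > skb->len || skb->len - len < skb->data_len) {
		if (log_budget > 0) {
			log_budget--;
			fprintf(stderr, "short packet: len %u, data_len %u, pull %u\n",
				skb->len, skb->data_len, len);
		}
		return -1;
	}
	skb->len -= len;
	skb->offset += len;
	return 0;
}
```

A caller that gets -1 frees the packet (kfree_skb() in the kernel) and returns, rather than passing it on.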
--
Alan J. Wylie http://www.wylie.me.uk/
"Perfection [in design] is achieved not when there is nothing left to add,
but rather when there is nothing left to take away."
-- Antoine de Saint-Exupery
* Re: skb_pull_rcsum - Fatal exception in interrupt
From: Evgeniy Polyakov @ 2007-08-15 15:54 UTC (permalink / raw)
To: Alan J. Wylie; +Cc: netdev
Hi Alan.
On Wed, Aug 15, 2007 at 04:07:23PM +0100, Alan J. Wylie (alan@wylie.me.uk) wrote:
> EIP: [<c02b6fb2>] skb_pull_rcsum+0x6d/0x71 SS:ESP 09068:c03e1ea4
> Kernel panic - not syncing: Fatal exception in interrupt
At least with this patch it should not panic.
A more correct solution might be to use pskb_may_pull() or to check the
additional length in llc_fixup_skb().
If dmesg shows that there is data in the fragments, it should use
pskb_may_pull(). The same bug exists in the bridge and VLAN code, by the
way, so perhaps the solution is to remove the BUG_ON()s from
skb_pull_rcsum() and call pskb_may_pull() instead?
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
diff --git a/net/802/psnap.c b/net/802/psnap.c
index 04ee43e..5f410e9 100644
--- a/net/802/psnap.c
+++ b/net/802/psnap.c
@@ -60,13 +60,24 @@ static int snap_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (proto) {
 		/* Pass the frame on. */
 		skb->transport_header += 5;
+		if (skb->len < 5 || skb->len - 5 < skb->data_len) {
+			if (net_ratelimit())
+				printk(KERN_NOTICE "Short packet: len: %u, "
+					"data_len: %u.\n",
+					skb->len, skb->data_len);
+			goto err_out;
+		}
 		skb_pull_rcsum(skb, 5);
 		rc = proto->rcvfunc(skb, dev, &snap_packet_type, orig_dev);
-	} else {
-		skb->sk = NULL;
-		kfree_skb(skb);
-		rc = 1;
 	}
+
+	rcu_read_unlock();
+	return rc;
+
+err_out:
+	skb->sk = NULL;
+	kfree_skb(skb);
+	rc = 1;
 
 	rcu_read_unlock();
 	return rc;
--
Evgeniy Polyakov
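A receive handler owns the skb it is given: on every path it must either hand the skb on or free it. The sketch below models that ownership rule for the SNAP receive path with the patch's length check, using hypothetical names (struct fake_skb, fake_free() in place of kfree_skb(), model_snap_rcv()). Note that the patch as quoted deletes the old else branch, so when no SNAP client matches (proto is NULL) the function returns without freeing the skb; the model keeps an explicit drop for that path.

```c
#include <assert.h>

/* Hypothetical model of an skb as seen by the SNAP receive path;
 * not kernel code. */
struct fake_skb {
	unsigned int len;      /* total bytes */
	unsigned int data_len; /* bytes in paged fragments */
	int freed;             /* set by fake_free() */
	int delivered;         /* set when handed to the protocol handler */
};

/* Hypothetical stand-in for kfree_skb(). */
static void fake_free(struct fake_skb *skb)
{
	skb->freed = 1;
}

/* 'have_proto' stands for the SNAP client lookup returning a client.
 * Every path either delivers or frees the skb. */
int model_snap_rcv(struct fake_skb *skb, int have_proto)
{
	if (!have_proto)
		goto drop;              /* no SNAP client: free, don't leak */
	/* The patch's check: would a 5-byte pull trip a BUG_ON()? */
	if (skb->len < 5 || skb->len - 5 < skb->data_len)
		goto drop;
	skb->len -= 5;                  /* models skb_pull_rcsum(skb, 5) */
	skb->delivered = 1;             /* the handler now owns the skb */
	return 0;                       /* hypothetical "delivered" code */
drop:
	fake_free(skb);
	return 1;                       /* the quoted drop path sets rc = 1 */
}
```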
* Re: skb_pull_rcsum - Fatal exception in interrupt
From: Herbert Xu @ 2007-08-16 2:31 UTC (permalink / raw)
To: Alan J. Wylie; +Cc: netdev
Alan J. Wylie <alan@wylie.me.uk> wrote:
>
> The photos, along with the following information are available at
> http://wylie.me.uk/skb_pull_rcsum/
The really important bit has scrolled off. Try booting with
vga=<appropriate setting> to increase the resolution. Or use
a serial console if you can.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* RE: skb_pull_rcsum - Fatal exception in interrupt
From: Brandeburg, Jesse @ 2007-08-20 16:21 UTC (permalink / raw)
To: Alan J. Wylie; +Cc: e1000-devel, Linux Network Development list
Alan J. Wylie wrote:
> We have been shipping Linux based servers to customers for several
> years now, with few problems. Recently, however, a single customer has
> been seeing kernel panics. Unfortunately, the customer is about 200
> miles away, so physical access is limited. There are two ethernet
> interfaces, one should be plugged into a local RFC1918 network, the
> other is connected to the internet. If eth0 is plugged into the local
> network, a short time later the system panics.
>
> Hardware: Intel S5000VSA server
>
> Network cards: Intel e1000
> Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper)
Hi Alan, I work on the team that supports e1000. I'd be interested in
seeing the dmesg output from the machine before it crashes; maybe you
can add that to your web collection of data below?
Many of the 5000-series machines have BMCs, so it's possible that you
could set up remote management to reboot the machine remotely, but that
may not be worth the extra effort. It could, however, give you a serial
console over Ethernet, which would get us the full panic message (but
see below).
> # CONFIG_E1000_DISABLE_PACKET_SPLIT is not set
Can you try setting CONFIG_E1000_DISABLE_PACKET_SPLIT=y? This prevents
the driver from splitting the header from the packet data, which could
be exacerbating this problem.
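With packet split enabled, the driver can place part of a received frame in page fragments, so the skb is non-linear (skb->data_len > 0); that is the case the second BUG_ON() in skb_pull_rcsum() guards against. Evgeniy's suggestion of pskb_may_pull() handles it by moving bytes from the fragments into the linear area before pulling. The userspace sketch below models only that length bookkeeping; struct fake_skb, headlen() and may_pull() are hypothetical names, and the byte copy and possible allocation failure in the kernel are not modelled.

```c
#include <assert.h>

/* Hypothetical model of the sk_buff length fields; not kernel code. */
struct fake_skb {
	unsigned int len;      /* total bytes */
	unsigned int data_len; /* bytes in paged fragments */
};

/* Length of the linear area, computed the way skb_headlen() does. */
unsigned int headlen(const struct fake_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Model of pskb_may_pull(): return 1 if the first 'len' bytes are, or
 * can be made, linear; 0 if the packet is simply too short. Moving
 * bytes out of the fragments is modelled by shrinking data_len. */
int may_pull(struct fake_skb *skb, unsigned int len)
{
	if (len <= headlen(skb))
		return 1;               /* already linear: nothing to do */
	if (len > skb->len)
		return 0;               /* not enough bytes anywhere */
	skb->data_len -= len - headlen(skb);
	return 1;
}
```

After a successful may_pull(skb, 5), a 5-byte pull can no longer trip either BUG_ON(), because at least 5 bytes are linear.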
It's not immediately obvious whether this is a kernel or a driver
problem. I hope you don't mind that I cc'd e1000-devel, since this is
possibly relevant to other e1000 users and developers.
> We shipped a second system, and this displayed identical symptoms. We
> have tested with several recent 2.6 kernels, including
>
> 2.6.22
> 2.6.17.14
> 2.6.20.15
>
> all of which crash.
>
> We have a couple of photographs showing the tail end of the messages
> on the screen.
>
> The last two lines are:
>
> EIP: [<c02b6fb2>] skb_pull_rcsum+0x6d/0x71 SS:ESP 09068:c03e1ea4
> Kernel panic - not syncing: Fatal exception in interrupt
Can you boot with vga=0x318 appended to the kernel options? This might
help you get more on the screen. You could also look into netconsole,
but because this is a networking crash I don't know whether you'll get
any data out of netconsole, and I don't know if you can use netconsole
across the 'net', as I've only used it for local logging.
Jesse
* Re: skb_pull_rcsum - Fatal exception in interrupt
From: Alan J. Wylie @ 2007-08-20 17:04 UTC (permalink / raw)
To: Brandeburg, Jesse; +Cc: e1000-devel, Linux Network Development list
On Mon, 20 Aug 2007 09:21:54 -0700, "Brandeburg, Jesse" <jesse.brandeburg@intel.com> said:
> Hi Alan, I work on the team that supports e1000, I'd be interested
> in seeing the dmesg output from the machine before it crashes, maybe
> you can add that to your web collection of data below?
Don't worry - it's definitely not an e1000 problem. I'm in contact
with the netdev guys, who have produced a patch.
Thanks anyway
Alan.
--
Alan J. Wylie http://www.wylie.me.uk/
"Perfection [in design] is achieved not when there is nothing left to add,
but rather when there is nothing left to take away."
-- Antoine de Saint-Exupery
Thread overview: 6+ messages
2007-08-15 15:07 skb_pull_rcsum - Fatal exception in interrupt Alan J. Wylie
2007-08-15 15:54 ` Evgeniy Polyakov
2007-08-20 14:24 ` Herbert Xu
2007-08-16 2:31 ` Herbert Xu
2007-08-20 16:21 ` Brandeburg, Jesse
2007-08-20 17:04 ` Alan J. Wylie