2.6.18 forcedeth GSO panic on send

* 2.6.18 forcedeth GSO panic on send
@ 2006-10-26 23:17 Denis Vlasenko
  2006-10-27  3:58 ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-26 23:17 UTC
  To: Manfred Spraul, Jeff Garzik, netdev, linux-kernel

Hello,

I am using an AMD64 box with 32bit userspace / 64bit kernel.

Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff
over the net - for example, "svn commit", scp are affected.
2.6.17.11 does not seem to be affected.

Unfortunately, even a 60-line screen is not big enough
to catch the whole trace. There are at least two traces,
and the first one scrolls off. I have a photo at
http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg

Something goes wrong when the kernel tries
to send some data:

...
error_exit
skb_over_panic
skb_over_panic
skb_segment
tcp_tso_segment
inet_gso_segment
skb_gso_segment
dev_hard_start_xmit
dev_queue_xmit
...

It looks like this is related to hardware acceleration in forcedeth.
I will try disabling all hardware acceleration.

The attached tarball contains:

.config
dmesg
ethtool-k
lspci
lspci-v
--
vda

[-- Attachment #2: gso_panic.tar.bz2 --]
[-- Type: application/x-tbz, Size: 15933 bytes --]


* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-26 23:17 2.6.18 forcedeth GSO panic on send Denis Vlasenko
@ 2006-10-27  3:58 ` Herbert Xu
  2006-10-29 12:55   ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2006-10-27  3:58 UTC
  To: Denis Vlasenko
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel,
	David S. Miller

On Thu, Oct 26, 2006 at 11:17:57PM +0000, Denis Vlasenko wrote:
> 
> I am using an AMD64 box with 32bit userspace / 64bit kernel.
> 
> Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff
> over the net - for example, "svn commit", scp are affected.
> 2.6.17.11 does not seem to be affected.
> 
> Unfortunately even 60-line screen is not big enough
> to catch whole trace. There are at least two traces,
> and first scrolls off. I have a photo at
> http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg

Looks like a network stack bug rather than a driver problem.
However, I'd really like to see the first oops including the
print out from skb_over_panic.

Could you try booting with pause_on_oops=1 or perhaps use a
serial console?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-27  3:58 ` Herbert Xu
@ 2006-10-29 12:55   ` Denis Vlasenko
  2006-10-29 14:10     ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-29 12:55 UTC
  To: Herbert Xu
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel,
	David S. Miller

Sorry for the delay.

On Friday 27 October 2006 05:58, Herbert Xu wrote:
> > I am using an AMD64 box with 32bit userspace / 64bit kernel.
> > 
> > Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff
> > over the net - for example, "svn commit", scp are affected.
> > 2.6.17.11 does not seem to be affected.
> > 
> > Unfortunately even 60-line screen is not big enough
> > to catch whole trace. There are at least two traces,
> > and first scrolls off. I have a photo at
> > http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg
> 
> Looks like a network stack bug rather than a driver problem.
> However, I'd really like to see the first oops including the
> print out from skb_over_panic.

With "echo 1 >/proc/sys/kernel/panic_on_oops" I got
what you requested. See the screenshot:

http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg

--
vda


* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-29 12:55   ` Denis Vlasenko
@ 2006-10-29 14:10     ` Herbert Xu
  2006-10-29 21:45       ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2006-10-29 14:10 UTC
  To: Denis Vlasenko
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel,
	David S. Miller

On Sun, Oct 29, 2006 at 01:55:56PM +0100, Denis Vlasenko wrote:
> 
> With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
> what you're requested. See screenshot:
> 
> http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg

Thanks!

Please let me know if this patch fixes it:

[NET]: Fix segmentation of linear packets

skb_segment fails to segment linear packets correctly because it
tries to write all linear parts of the original skb into each
segment.  This will always panic as each segment only contains
enough space for one MSS.

This was not detected earlier because linear packets should be
rare for GSO.  In fact it still remains to be seen what exactly
created the linear packets that triggered this bug.  Basically
the only time this should happen is if someone enables GSO
emulation on an interface that does not support SG.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

The fact that you're triggering this bug at all means that
something else has gone wrong.  First of all your picture
shows a device setting of "lo".  This does not tally with
the fact that the BUG was triggered by ssh.  Do you have
any idea why this is the case? Do you have any netfilter
rules that might cause this?

If it is indeed lo, could you please check the ethtool -k
setting on it? Also what is the ethtool -k setting on the
interface where you expect ssh to go out?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3c23760..f735455 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1946,7 +1946,7 @@ struct sk_buff *skb_segment(struct sk_bu
 	do {
 		struct sk_buff *nskb;
 		skb_frag_t *frag;
-		int hsize, nsize;
+		int hsize;
 		int k;
 		int size;

@@ -1957,11 +1957,10 @@ struct sk_buff *skb_segment(struct sk_bu
 		hsize = skb_headlen(skb) - offset;
 		if (hsize < 0)
 			hsize = 0;
-		nsize = hsize + doffset;
-		if (nsize > len + doffset || !sg)
-			nsize = len + doffset;
+		if (hsize > len || !sg)
+			hsize = len;

-		nskb = alloc_skb(nsize + headroom, GFP_ATOMIC);
+		nskb = alloc_skb(hsize + doffset + headroom, GFP_ATOMIC);
 		if (unlikely(!nskb))
 			goto err;
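
As an illustration of the sizing change in the diff above, here is a minimal userspace sketch in C. It is not kernel code: struct seg_sizes, seg_len, sizes_before, sizes_after and overruns are hypothetical names introduced only for this example. The model also assumes that with scatter-gather enabled the new segment's linear area receives doffset header bytes plus hsize payload bytes, and that without it exactly len payload bytes are copied. Under those assumptions, only the pre-patch sizing can write more than it allocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Sizes chosen for one output segment (simplified model, not kernel code). */
struct seg_sizes {
	int alloc;   /* bytes allocated for the linear area, excluding headroom */
	int written; /* bytes then written into it: headers plus payload */
};

/* Payload carried by this segment: whatever is left, capped at one MSS. */
static int seg_len(int skb_len, int offset, int mss)
{
	int len = skb_len - offset;

	assert(mss > 0 && len >= 0);
	return len > mss ? mss : len;
}

/* Pre-patch sizing: nsize is clamped to one MSS, hsize is not. */
static struct seg_sizes sizes_before(int skb_len, int headlen, int offset,
				     int mss, int doffset, bool sg)
{
	int len = seg_len(skb_len, offset, mss);
	int hsize = headlen - offset;
	int nsize;

	if (hsize < 0)
		hsize = 0;
	nsize = hsize + doffset;
	if (nsize > len + doffset || !sg)
		nsize = len + doffset;

	/* Assumption: the sg path copies hsize linear bytes, the other len. */
	return (struct seg_sizes){ nsize, doffset + (sg ? hsize : len) };
}

/* Post-patch sizing: hsize itself is clamped, so the copy is bounded too. */
static struct seg_sizes sizes_after(int skb_len, int headlen, int offset,
				    int mss, int doffset, bool sg)
{
	int len = seg_len(skb_len, offset, mss);
	int hsize = headlen - offset;

	if (hsize < 0)
		hsize = 0;
	if (hsize > len || !sg)
		hsize = len;

	return (struct seg_sizes){ hsize + doffset, doffset + hsize };
}

/* True when the model writes past the end of the allocation. */
static bool overruns(struct seg_sizes s)
{
	return s.written > s.alloc;
}
```

For example, take a fully linear packet with 66 header bytes and 4344 payload bytes, an MSS of 1448, and sg set. sizes_before(4410, 4410, 66, 1448, 66, true) gives an allocation of 1514 bytes but 4410 bytes written. sizes_after with the same arguments gives 1514 for both.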


* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-29 14:10     ` Herbert Xu
@ 2006-10-29 21:45       ` Denis Vlasenko
  2006-10-29 21:57         ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-29 21:45 UTC
  To: Herbert Xu
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel,
	David S. Miller

On Sunday 29 October 2006 15:10, Herbert Xu wrote:
> On Sun, Oct 29, 2006 at 01:55:56PM +0100, Denis Vlasenko wrote:
> > 
> > With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
> > what you're requested. See screenshot:
> > 
> > http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg
> 
> Thanks!
> 
> Please let me know if this patch fixes it:
> 
> [NET]: Fix segmentation of linear packets

Okay, will test now.

> The fact that you're triggering this bug at all means that
> something else has gone wrong.  First of all your picture
> shows a device setting of "lo".  This does not tally with
> the fact that the BUG was triggered by ssh.  Do you have
> any idea why this is the case? Do you have any netfilter
> rules that might cause this?

Yes, I have netfilter rules which redirect all outgoing
port 22 connections to local port 2222. I run tcpserver
on port 2222, which spawns hose for each connection.

All this ugliness is required to get ssh working across
a buggy D-Link router. It chokes on Nagle-disabled sessions,
I think.

The rules:

# iptables -t nat -N redir22
# iptables -t nat -A redir22 -d 192.168.1.111 -j RETURN
# iptables -t nat -A redir22 -d 127.0.0.1 -j RETURN
# iptables -t nat -A redir22 --match owner --uid-owner daemon -j RETURN
# iptables -t nat -A redir22 -p tcp -j REDIRECT --to 2222
# iptables -t nat -A OUTPUT -p tcp --dport 22 -j redir22

---FILTER--
Chain INPUT (policy ACCEPT 20 packets, 1360 bytes)
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 2 packets, 100 bytes)
---NAT-----
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
Chain POSTROUTING (policy ACCEPT 1 packets, 60 bytes)
Chain OUTPUT (policy ACCEPT 1 packets, 60 bytes)
       0        0 redir22    tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22
Chain redir22 (1 references)
       0        0 RETURN     all  --  *      *       0.0.0.0/0            192.168.1.111
       0        0 RETURN     all  --  *      *       0.0.0.0/0            127.0.0.1
       0        0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           OWNER UID match 50
       0        0 REDIRECT   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           redir ports 2222
---MANGLE--
Chain PREROUTING (policy ACCEPT 2 packets, 100 bytes)
Chain INPUT (policy ACCEPT 2 packets, 100 bytes)
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 2 packets, 100 bytes)
Chain POSTROUTING (policy ACCEPT 2 packets, 100 bytes)

On port 2222 I have the following running:

run:
====
ip=0.0.0.0
port=2222
user=daemon
maxconn=100
# envuidgid $user tcpserver -U:
# run tcpserver as $user
exec \
env - \
softlimit \
envuidgid $user \
tcpserver \
    -U \
    -v -R -H -l 0 -c $maxconn \
    $ip $port \
./startproxy22

startproxy22
============
targetip="$TCPORIGDSTIP"
targetport=22
# Loop prevention
if test "$TCPLOCALIP" = "$targetip"; then
    if test "$TCPLOCALPORT" = "$targetport"; then
        env >loop.detected
        exit 0
    fi
fi
# netcat rulez... however hose of netpipes fame
# is still better because --netslave properly closes
# connection on input EOF.
exec env - hose "$targetip" "$targetport" --netslave

> If it is indeed lo, could you please check the ethtool -k
> setting on it? Also what is the ethtool -k setting on the
> interface where you expect ssh to go out?

# ethtool -k lo
Offload parameters for lo:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
--
vda


* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-29 21:45       ` Denis Vlasenko
@ 2006-10-29 21:57         ` Denis Vlasenko
  0 siblings, 0 replies; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-29 21:57 UTC
  To: Herbert Xu
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel,
	David S. Miller

On Sunday 29 October 2006 22:45, Denis Vlasenko wrote:
> On Sunday 29 October 2006 15:10, Herbert Xu wrote:
> > On Sun, Oct 29, 2006 at 01:55:56PM +0100, Denis Vlasenko wrote:
> > > With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
> > > what you're requested. See screenshot:
> > > 
> > > http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg
> > 
> > Thanks!
> > 
> > Please let me know if this patch fixes it:
> > 
> > [NET]: Fix segmentation of linear packets
> 
> Okay, will test now.

The patch seems to fix the issue. Thanks.

> setting on it? Also what is the ethtool -k setting on the
> interface where you expect ssh to go out?

# ethtool -k if
Offload parameters for if:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
--
vda
