* 2.6.18 forcedeth GSO panic on send
@ 2006-10-26 23:17 Denis Vlasenko
  2006-10-27  3:58 ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-26 23:17 UTC (permalink / raw)
  To: Manfred Spraul, Jeff Garzik, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

Hello,

I am using an AMD64 box with 32bit userspace / 64bit kernel.

Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff
over the net - for example, "svn commit", scp are affected.
2.6.17.11 does not seem to be affected.

Unfortunately even 60-line screen is not big enough
to catch whole trace. There are at least two traces,
and first scrolls off. I have a photo at
http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg

Something bad is happening here, when kernel tries to send some data:

...
error_exit
skb_over_panic
skb_over_panic
skb_segment
tcp_tso_segment
inet_gso_segment
skb_gso_segment
dev_hard_start_xmit
dev_queue_xmit
...

Looks like it is related to hardware accel in forcedeth.
I will try disabling all hw accel.

Please find in attached tarball:

.config
dmesg
ethtool-k
lspci
lspci-v
--
vda

[-- Attachment #2: gso_panic.tar.bz2 --]
[-- Type: application/x-tbz, Size: 15933 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-26 23:17 2.6.18 forcedeth GSO panic on send Denis Vlasenko
@ 2006-10-27  3:58 ` Herbert Xu
  2006-10-29 12:55   ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2006-10-27 3:58 UTC
  To: Denis Vlasenko
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel, David S. Miller

On Thu, Oct 26, 2006 at 11:17:57PM +0000, Denis Vlasenko wrote:
>
> I am using an AMD64 box with 32bit userspace / 64bit kernel.
>
> Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff
> over the net - for example, "svn commit", scp are affected.
> 2.6.17.11 does not seem to be affected.
>
> Unfortunately even 60-line screen is not big enough
> to catch whole trace. There are at least two traces,
> and first scrolls off. I have a photo at
> http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg

Looks like a network stack bug rather than a driver problem.
However, I'd really like to see the first oops including the
print out from skb_over_panic.

Could you try booting with pause_on_oops=1 or perhaps use a
serial console?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-27  3:58 ` Herbert Xu
@ 2006-10-29 12:55   ` Denis Vlasenko
  2006-10-29 14:10     ` Herbert Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-29 12:55 UTC
  To: Herbert Xu
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel, David S. Miller

Sorry for the delay.

On Friday 27 October 2006 05:58, Herbert Xu wrote:
> > I am using an AMD64 box with 32bit userspace / 64bit kernel.
> >
> > Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff
> > over the net - for example, "svn commit", scp are affected.
> > 2.6.17.11 does not seem to be affected.
> >
> > Unfortunately even 60-line screen is not big enough
> > to catch whole trace. There are at least two traces,
> > and first scrolls off. I have a photo at
> > http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg
>
> Looks like a network stack bug rather than a driver problem.
> However, I'd really like to see the first oops including the
> print out from skb_over_panic.

With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
what you're requested. See screenshot:

http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg
--
vda
* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-29 12:55   ` Denis Vlasenko
@ 2006-10-29 14:10     ` Herbert Xu
  2006-10-29 21:45       ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Herbert Xu @ 2006-10-29 14:10 UTC
  To: Denis Vlasenko
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel, David S. Miller

On Sun, Oct 29, 2006 at 01:55:56PM +0100, Denis Vlasenko wrote:
>
> With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
> what you're requested. See screenshot:
>
> http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg

Thanks!

Please let me know if this patch fixes it:

[NET]: Fix segmentation of linear packets

skb_segment fails to segment linear packets correctly because it
tries to write all linear parts of the original skb into each
segment.  This will always panic as each segment only contains
enough space for one MSS.

This was not detected earlier because linear packets should be
rare for GSO.  In fact it still remains to be seen what exactly
created the linear packets that triggered this bug.  Basically
the only time this should happen is if someone enables GSO
emulation on an interface that does not support SG.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

The fact that you're triggering this bug at all means that
something else has gone wrong.  First of all your picture
shows a device setting of "lo".  This does not tally with
the fact that the BUG was triggered by ssh.  Do you have
any idea why this is the case?  Do you have any netfilter
rules that might cause this?

If it is indeed lo, could you please check the ethtool -k
setting on it?  Also what is the ethtool -k setting on the
interface where you expect ssh to go out?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3c23760..f735455 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1946,7 +1946,7 @@ struct sk_buff *skb_segment(struct sk_bu
 	do {
 		struct sk_buff *nskb;
 		skb_frag_t *frag;
-		int hsize, nsize;
+		int hsize;
 		int k;
 		int size;
 
@@ -1957,11 +1957,10 @@ struct sk_buff *skb_segment(struct sk_bu
 		hsize = skb_headlen(skb) - offset;
 		if (hsize < 0)
 			hsize = 0;
-		nsize = hsize + doffset;
-		if (nsize > len + doffset || !sg)
-			nsize = len + doffset;
+		if (hsize > len || !sg)
+			hsize = len;
 
-		nskb = alloc_skb(nsize + headroom, GFP_ATOMIC);
+		nskb = alloc_skb(hsize + doffset + headroom, GFP_ATOMIC);
 		if (unlikely(!nskb))
 			goto err;
 
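[Editorial sketch, not part of the original thread.] The size arithmetic
changed by the patch above can be modelled in user space. The model below
is a simplification under stated assumptions: it covers only the
scatter-gather path (sg != 0), it leaves out headroom, and it assumes that
the segment is later filled with hsize bytes of linear data, which is what
the commit message describes. The names seg_sizes, remaining_head,
sizes_before, sizes_after and overruns are hypothetical and do not exist in
the kernel.

```c
#include <assert.h>

/* Sizes chosen for one output segment; headroom is left out of this model. */
struct seg_sizes {
	int alloc;       /* bytes requested for the segment, minus headroom */
	int linear_copy; /* bytes of linear data assumed to be written into it */
};

/* Linear bytes of the original skb remaining past 'offset', never negative. */
int remaining_head(int headlen, int offset)
{
	int hsize = headlen - offset;

	assert(headlen >= 0 && offset >= 0);
	return hsize < 0 ? 0 : hsize;
}

/* Pre-patch arithmetic: only nsize is clamped, hsize keeps its full value. */
struct seg_sizes sizes_before(int headlen, int offset, int len, int doffset)
{
	struct seg_sizes s;
	int hsize = remaining_head(headlen, offset);
	int nsize = hsize + doffset;

	if (nsize > len + doffset)
		nsize = len + doffset;
	s.alloc = nsize;
	s.linear_copy = hsize;
	return s;
}

/* Post-patch arithmetic: hsize itself is clamped to len before it is used. */
struct seg_sizes sizes_after(int headlen, int offset, int len, int doffset)
{
	struct seg_sizes s;
	int hsize = remaining_head(headlen, offset);

	if (hsize > len)
		hsize = len;
	s.alloc = hsize + doffset;
	s.linear_copy = hsize;
	return s;
}

/* Nonzero when the linear copy would not fit behind the doffset header bytes. */
int overruns(struct seg_sizes s, int doffset)
{
	return s.linear_copy + doffset > s.alloc;
}
```

With illustrative numbers (a fully linear skb with 4000 bytes of head, one
MSS of 1448 bytes in len, and 52 bytes of headers in doffset; all three
values are assumptions), sizes_before() yields an allocation of 1500 bytes
but a linear copy of 4000 bytes, so overruns() is nonzero. sizes_after()
yields 1500 and 1448, which fits. When the head is shorter than len, both
versions agree, which is consistent with the remark that the bug only
shows up for linear packets.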
* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-29 14:10     ` Herbert Xu
@ 2006-10-29 21:45       ` Denis Vlasenko
  2006-10-29 21:57         ` Denis Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-29 21:45 UTC
  To: Herbert Xu
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel, David S. Miller

On Sunday 29 October 2006 15:10, Herbert Xu wrote:
> On Sun, Oct 29, 2006 at 01:55:56PM +0100, Denis Vlasenko wrote:
> >
> > With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
> > what you're requested. See screenshot:
> >
> > http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg
>
> Thanks!
>
> Please let me know if this patch fixes it:
>
> [NET]: Fix segmentation of linear packets

Okay, will test now.

> The fact that you're triggering this bug at all means that
> something else has gone wrong.  First of all your picture
> shows a device setting of "lo".  This does not tally with
> the fact that the BUG was triggered by ssh.  Do you have
> any idea why this is the case?  Do you have any netfilter
> rules that might cause this?

Yes, I have netfilter rules which redirect all outgoing
port 22 connections to local port 2222. I run tcpserver on 2222
which spawns hose for each connection. All this ugliness
is required to get ssh working across buggy D-Link router.
It chokes on nagle-disabled sessions, I think.

The rules:

# iptables -t nat -N redir22
# iptables -t nat -A redir22 -d 192.168.1.111 -j RETURN
# iptables -t nat -A redir22 -d 127.0.0.1 -j RETURN
# iptables -t nat -A redir22 --match owner --uid-owner daemon -j RETURN
# iptables -t nat -A redir22 -p tcp -j REDIRECT --to 2222
# iptables -t nat -A OUTPUT -p tcp --dport 22 -j redir22

---FILTER--
Chain INPUT (policy ACCEPT 20 packets, 1360 bytes)
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 2 packets, 100 bytes)
---NAT-----
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
Chain POSTROUTING (policy ACCEPT 1 packets, 60 bytes)
Chain OUTPUT (policy ACCEPT 1 packets, 60 bytes)
    0     0 redir22    tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22
Chain redir22 (1 references)
    0     0 RETURN     all  --  *      *       0.0.0.0/0            192.168.1.111
    0     0 RETURN     all  --  *      *       0.0.0.0/0            127.0.0.1
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0           OWNER UID match 50
    0     0 REDIRECT   tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           redir ports 2222
---MANGLE--
Chain PREROUTING (policy ACCEPT 2 packets, 100 bytes)
Chain INPUT (policy ACCEPT 2 packets, 100 bytes)
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 2 packets, 100 bytes)
Chain POSTROUTING (policy ACCEPT 2 packets, 100 bytes)

On port 2222 I have the following running:

run:
====
ip=0.0.0.0
port=2222
user=daemon
maxconn=100

# envuidgid $user tcpserver -U:
# run tcpserver as $user
exec \
env - \
softlimit \
envuidgid $user \
tcpserver \
-U \
-v -R -H -l 0 -c $maxconn \
$ip $port \
./startproxy22

startproxy22
============
targetip="$TCPORIGDSTIP"
targetport=22

# Loop prevention
if test "$TCPLOCALIP" = "$targetip"; then
    if test "$TCPLOCALPORT" = "$targetport"; then
        env >loop.detected
        exit 0
    fi
fi

# netcat rulez... however hose of netpipes fame
# is still better because --netslave properly closes
# connection on input EOF.
exec env - hose "$targetip" "$targetport" --netslave

> If it is indeed lo, could you please check the ethtool -k
> setting on it?  Also what is the ethtool -k setting on the
> interface where you expect ssh to go out?

# ethtool -k lo
Offload parameters for lo:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
--
vda
* Re: 2.6.18 forcedeth GSO panic on send
  2006-10-29 21:45       ` Denis Vlasenko
@ 2006-10-29 21:57         ` Denis Vlasenko
  0 siblings, 0 replies; 6+ messages in thread
From: Denis Vlasenko @ 2006-10-29 21:57 UTC
  To: Herbert Xu
  Cc: Manfred Spraul, Jeff Garzik, netdev, linux-kernel, David S. Miller

On Sunday 29 October 2006 22:45, Denis Vlasenko wrote:
> On Sunday 29 October 2006 15:10, Herbert Xu wrote:
> > On Sun, Oct 29, 2006 at 01:55:56PM +0100, Denis Vlasenko wrote:
> > > With "echo 1 >/proc/sys/kernel/panic_on_oops" I've got
> > > what you're requested. See screenshot:
> > >
> > > http://busybox.net/~vda/gso_panic/forcedeth_gso_panic2.jpg
> >
> > Thanks!
> >
> > Please let me know if this patch fixes it:
> >
> > [NET]: Fix segmentation of linear packets
>
> Okay, will test now.

Patch seems to fix the issue. Thanks.

> setting on it?  Also what is the ethtool -k setting on the
> interface where you expect ssh to go out?

# ethtool -k if
Offload parameters for if:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
--
vda