* rare bad TCP checksum with 2.6.19?
@ 2007-01-14 22:59 Michael Tokarev
2007-01-15 9:39 ` Herbert Xu
0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-14 22:59 UTC (permalink / raw)
To: netdev
I noticed, after running 2.6.19 for more than a month, that
sometimes a file transfer, when one of the ends is running 2.6.19,
stalls at the very end of the file, forever.
Playing with tcpdump, I noticed that the host sends out packets with
wrong checksums, like this:
01:28:07.608457 IP (tos 0x0, ttl 64, id 11740, offset 0, flags [DF], length: 82)
81.13.94.6.80 > 216.168.29.244.57064: FP [bad tcp cksum b011 (->7ae2)!]
140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 87610 145676467>
(here, 81.13.94.6 is running linux 2.6.19).
It happens only in rare cases, and is not reliably repeatable.
After further playing I noticed that - almost - only packets with the FIN flag
set (like the above), *and* containing some data (again, like the
above), show this behaviour.
With FIN set, the thing is 100% repeatable (the only problem is to force the
system to actually send such a packet -- for that, one has to push quite some
data to the socket and immediately close it, so that there will be some data
to send in kernel buffer still at the moment of close).
This explains the observed behaviour - rare, unreliable stalls at the end of
a transfer -- because a FIN packet only rarely carries data.
But sometimes, other packets go out with bad checksum, too:
01:20:01.712146 IP (tos 0x0, ttl 64, id 52870, offset 0, flags [DF], length: 1500)
81.13.94.6.80 > 216.168.29.244.57655: . [bad tcp cksum ab7e (->dcbd)!]
112945:114393(1448) ack 125 win 2896 <nop,nop,timestamp 39006 145190996>
(again, 81.13.94.6 is a machine running linux 2.6.19). That's one in a row of
other pretty normal packets - it has been retransmitted a bit later, with correct
checksum.
When switching back to 2.6.17 (the previous kernel running on this
machine), things go back to normal, or at least so it seems.
Note there's no funny/interesting hardware involved, like network cards with
tcp checksumming offload capabilities (this is a plain dumb 8139 card).
I'll try to collect further information tomorrow. But if someone has some
clue before.... ;)
Thanks!
/mjt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-14 22:59 rare bad TCP checksum with 2.6.19? Michael Tokarev
@ 2007-01-15 9:39 ` Herbert Xu
2007-01-15 13:34 ` Michael Tokarev
0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-15 9:39 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev

Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Note there's no funny/interesting hardware involved, like network cards with
> tcp checksumming offload capabilities (this is plain dumb 8139 card).

The 8139 card might be dumb, but the driver isn't :) It emulates
checksum offload in software, meaning that tcpdump will show bogus
checksums.

So please disable hardware checksum offload with ethtool -K and
then try again.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 9:39 ` Herbert Xu
@ 2007-01-15 13:34 ` Michael Tokarev
2007-01-15 14:25 ` Michael Tokarev
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 13:34 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev

Herbert Xu wrote:
> Michael Tokarev <mjt@tls.msk.ru> wrote:
>> Note there's no funny/interesting hardware involved, like network cards with
>> tcp checksumming offload capabilities (this is plain dumb 8139 card).
>
> The 8139 card might be dumb, but the driver isn't :) It emulates
> checksum offload in software, meaning that tcpdump will show bogus
> checksums.
>
> So please disable hardware checksum offload with ethtool -K and
> then try again.

# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available

# ethtool -K eth0 rx off tx off tso off
Cannot set device rx csum settings: Operation not supported

So I guess the problem is not related to hw checksumming offloading.

Meanwhile, I tried many times to reproduce the problem - with little
success. With different sizes, options, etc. - I can't force the
sending side to send some data within a FIN packet. I.e., most of the
time, the thing just works, because no data goes with the FIN packet.
But once every 50..100 tries, I see a single FIN-with-data packet, and
that one ALWAYS has a bad checksum.

I was never able to reproduce the problem on a LAN, only when going
from a distant host. And even with that distant host, it's very
difficult to reproduce.

At least one network (also distant) triggers this problem on every
2nd try or so (the one I experimented with yesterday). But I've no
access to that network - I kindly asked for help yesterday, but I
can't abuse their willingness to help more.

And another thing I noticed. Right now I'm experimenting with another
machine, running 2.6.17(.13) - it also shows similar behavior with bad
csums, but MUCH rarer than this 2.6.19. Like this:

16:29:32.490976 IP (tos 0x60, ttl 48, id 14110, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum f4b4 (->c1cc)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528199 5497679,nop,nop,sack sack 3 {104991:109335}{110783:112231}{104991:109335} >
16:29:32.525988 IP (tos 0x60, ttl 48, id 14112, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 3fb1 (->1819)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528202 5497679,nop,nop,sack sack 3 {110783:113679}{122367:123815}{110783:113679} >
16:29:32.561407 IP (tos 0x60, ttl 48, id 14116, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 87c0 (->2610)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528205 5497679,nop,nop,sack sack 3 {122367:127103}{128551:129572}{122367:127103} >

Here, 69.42.67.34 is 2.6.17 from which I'm requesting data, and
81.13.94.6 is the sender.

This behavior so far is demonstrated with sack packets only, but
I've seen it in the other direction too (also with sack), at least once.

Any idea how to force sending FIN-with-data?

Thanks!

/mjt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 13:34 ` Michael Tokarev
@ 2007-01-15 14:25 ` Michael Tokarev
2007-01-15 18:13 ` Eric Dumazet
2007-01-15 20:10 ` Herbert Xu
2 siblings, 0 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 14:25 UTC (permalink / raw)
Cc: Herbert Xu, netdev

Michael Tokarev wrote:
[]
> And another thing I noticed. Right now I'm experimenting with another
> machine, running 2.6.17(.13) - it also shows similar behavior with bad
> csums, but MUCH rarer than this 2.6.19. Like this:
>
> 16:29:32.490976 IP (tos 0x60, ttl 48, id 14110, offset 0, flags [DF], length: 80)
> 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum f4b4 (->c1cc)!] ack 93407 win 9821
> <nop,nop,timestamp 1046528199 5497679,nop,nop,sack sack 3 {104991:109335}{110783:112231}{104991:109335} >

This seems to be a tcpdump bug. At least the same packet(s), on another
machine (in-between the two), with updated tcpdump, shows as having
correct checksum. After updating tcpdump on this machine, I'm not seeing
this 'sack bad cksum' stuff anymore.

/mjt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 13:34 ` Michael Tokarev
2007-01-15 14:25 ` Michael Tokarev
@ 2007-01-15 18:13 ` Eric Dumazet
2007-01-15 19:33 ` Michael Tokarev
2007-01-15 20:10 ` Herbert Xu
2 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2007-01-15 18:13 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Herbert Xu, netdev

Michael Tokarev wrote:
>
> Any idea how to force sending FIN-with-data?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
send(fd, data, datalen, 0);
close(fd);

Eric Dumazet
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 18:13 ` Eric Dumazet
@ 2007-01-15 19:33 ` Michael Tokarev
2007-01-15 23:36 ` Eric Dumazet
0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 19:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Herbert Xu, netdev

Eric Dumazet wrote:
> Michael Tokarev wrote:
>>
>> Any idea how to force sending FIN-with-data?
>
> int flag_on = 1;
> setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
> send(fd, data, datalen, 0);
> close(fd);

That produces two packets - one (or more - depending on the
size) data packet and one FIN packet w/o any data.

This is the first thing I've tried.

/mjt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 19:33 ` Michael Tokarev
@ 2007-01-15 23:36 ` Eric Dumazet
0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2007-01-15 23:36 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Herbert Xu, netdev

Michael Tokarev wrote:
> Eric Dumazet wrote:
>> Michael Tokarev wrote:
>>> Any idea how to force sending FIN-with-data?
>> int flag_on = 1;
>> setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
>> send(fd, data, datalen, 0);
>> close(fd);
>
> That produces two packets - one (or more - depending on the
> size) data packet and one FIN packet w/o any data.
>
> This is the first thing I've tried.

Maybe this is because I forgot the shutdown()?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
send(fd, data, datalen, 0);
shutdown(fd, 1);
close(fd);

At least this is working on my machines (with and without shutdown())

Eric
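For anyone who wants to try the cork-then-close sequence discussed above as a standalone helper, a minimal sketch might look like the following. The function name `send_corked_and_close` is made up for illustration and does not come from the thread. It assumes Linux (for `TCP_CORK`) and an already connected, blocking TCP socket, and it uses `IPPROTO_TCP`, which has the same value as `SOL_TCP`. Whether the kernel really merges the remaining data with the FIN still depends on timing and window state, as the reports in this thread suggest.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name not from the thread): cork the socket, queue
 * the data, then shut down the write side and close, so that the FIN is
 * queued while the data may still be sitting unsent in the kernel buffer.
 *
 * Returns 0 if every call succeeded, -1 otherwise. If setsockopt() fails
 * the descriptor is left untouched; after that point it is always closed.
 */
int send_corked_and_close(int fd, const void *data, size_t datalen)
{
	int flag_on = 1;
	int rc = 0;
	ssize_t n;

	/* Hold back partial frames until the cork is removed or the socket closes. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &flag_on, sizeof(flag_on)) < 0)
		return -1;

	/* On a blocking socket this queues the whole buffer or fails. */
	n = send(fd, data, datalen, 0);
	if (n < 0 || (size_t)n != datalen)
		rc = -1;

	/* SHUT_WR is the symbolic name for the literal 1 used in the thread. */
	if (shutdown(fd, SHUT_WR) < 0)
		rc = -1;

	if (close(fd) < 0)
		rc = -1;

	return rc;
}
```

A caller would pass a connected socket and a small buffer, then watch the wire with tcpdump to see whether the last segment carries both data and FIN.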
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 13:34 ` Michael Tokarev
2007-01-15 14:25 ` Michael Tokarev
2007-01-15 18:13 ` Eric Dumazet
@ 2007-01-15 20:10 ` Herbert Xu
2007-01-15 21:46 ` Michael Tokarev
2 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-15 20:10 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev

On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:
>
> # ethtool -k eth0
> Offload parameters for eth0:
> Cannot get device rx csum settings: Operation not supported
> Cannot get device tx csum settings: Operation not supported
> Cannot get device scatter-gather settings: Operation not supported
> Cannot get device tcp segmentation offload settings: Operation not supported
> no offload info available
>
> # ethtool -K eth0 rx off tx off tso off
> Cannot set device rx csum settings: Operation not supported
>
> So I guess the problem is not related to hw checksumming offloading.

Nope, it just means that 8139too doesn't provide ethtool handlers to
disable checksum offloading.

So I suggest that you try doing the tcpdump on the receive side as
that should show the real checksum.

BTW, the reason tcpdump only shows some packets with bogus checksums
is because it cuts packets off at 100 bytes by default so for most
packets it can't verify the checksum at all. If you run it with
-s 1600 you should see bogus checksums on every packet with payload.

Cheers,
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 20:10 ` Herbert Xu
@ 2007-01-15 21:46 ` Michael Tokarev
2007-01-15 23:35 ` Herbert Xu
2007-01-16 3:27 ` Herbert Xu
0 siblings, 2 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 21:46 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev

Herbert Xu wrote:
> On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:
[]
>> So I guess the problem is not related to hw checksumming offloading.
>
> Nope, it just means that 8139too doesn't provide ethtool handlers to
> disable checksum offloading.
>
> So I suggest that you try doing the tcpdump on the receive side as
> that should show the real checksum.

I'm doing the capture on an intermediate host - the whole day today ;)

> BTW, the reason tcpdump only shows some packets with bogus checksums
> is because it cuts packets off at 100 bytes by default so for most
> packets it can't verify the checksum at all. If you run it with
> -s 1600 you should see bogus checksums on every packet with payload.

And I'm capturing with -s 2000. By the way, tcpdump just does not
verify the checksum of truncated (due to capture size) packets. At
least not the version I'm using (which is 3.9.5).

Herbert, the problem IS real, it's not due to some bad behavior
due to improper capturing or something like that. Yes it's difficult
to come to it, but it is real.

I've saved quite a lot of packets today, but it's all quite.. useless
as the thing is difficult to hit. Here are some traces made with the
following filter:

 proto TCP and tcp[tcpflags] & (tcp-fin|tcp-push) == (tcp-fin|tcp-push)

(I've chosen FIN+PUSH because this combination is where the problem
is seen most - to be fair, it looks like I haven't seen it with
other flags).

In there, some packets are ok, but some are not. So - again, it
seems like - I was wrong about 100% "hit ratio" -- ie, that the
"bad checksum" is ALWAYS the case with packets where some data goes
in FIN packets -- this is incorrect, because the trace shows quite
a few examples of right behavior.

The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin
(it contains some data which it shouldn't - but I hope there's nothing
confidential in there ;)

So, after the whole day digging around, I still don't have any
more-or-less clean way to reproduce it. But I've noticed another
thing as well: many different machines here, with different kernels,
behave the same way. So it can't be a hardware problem for example.
And only in VERY rare cases does the thing cause noticeable transfer
slowdowns or stalls. But some networks trigger those rare cases
more often than others (so the only more or less sane conclusion
I can come to is that it's somehow timing-related).

Thanks!

/mjt
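As a side note on the capture filter quoted above: `tcp[tcpflags]` reads byte 13 of the TCP header, and the expression matches segments in which both FIN (0x01) and PSH (0x08) are set, whatever the other flag bits are. A rough C equivalent, with made-up function and macro names, could be:

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits within byte 13 of the TCP header (names chosen for this sketch). */
#define FLAG_FIN 0x01u
#define FLAG_PSH 0x08u

/* Offset of the flags byte, i.e. what the filter calls tcp[tcpflags]. */
#define TCP_FLAGS_OFFSET 13

/* Same test as: tcp[tcpflags] & (tcp-fin|tcp-push) == (tcp-fin|tcp-push) */
int flags_have_fin_and_push(uint8_t flags)
{
	return (flags & (FLAG_FIN | FLAG_PSH)) == (FLAG_FIN | FLAG_PSH);
}

/*
 * Apply the test to a raw TCP header. `tcp` points at the first byte of
 * the header and `len` is the number of captured bytes; a header that is
 * too short to contain the flags byte never matches.
 */
int tcp_header_has_fin_and_push(const uint8_t *tcp, size_t len)
{
	if (tcp == NULL || len <= TCP_FLAGS_OFFSET)
		return 0;
	return flags_have_fin_and_push(tcp[TCP_FLAGS_OFFSET]);
}
```

For example, a flags byte of 0x19 (FIN, PSH and ACK) matches, while 0x18 (PSH and ACK only) and 0x11 (FIN and ACK only) do not.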
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 21:46 ` Michael Tokarev
@ 2007-01-15 23:35 ` Herbert Xu
2007-01-16 3:27 ` Herbert Xu
1 sibling, 0 replies; 32+ messages in thread
From: Herbert Xu @ 2007-01-15 23:35 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev

On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote:
>
> I'm doing the capture on an intermediate host - the whole day today ;)

Cool, I was just trying to make sure :)

> The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

I'll take a look. Are you using anything extra like netfilter?

Cheers,
* Re: rare bad TCP checksum with 2.6.19?
2007-01-15 21:46 ` Michael Tokarev
2007-01-15 23:35 ` Herbert Xu
@ 2007-01-16 3:27 ` Herbert Xu
2007-01-16 3:38 ` Herbert Xu
1 sibling, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-16 3:27 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev

On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote:
>
> The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

I'm sorry but this dump does NOT look like it was taken from an
intermediate box. I verified two bad checksums (chosen randomly)
and they were both correct but partial checksums. This means that
this dump was most likely taken from the sending host.

Cheers,
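To make the distinction between a "partial" and a full TCP checksum concrete, here is a small sketch of the RFC 1071 arithmetic. It assumes, as I understand the Linux checksum-offload convention, that a partial checksum is the folded, uncomplemented sum over the IPv4 pseudo-header only, left in the checksum field for a later stage to finish. The function names are invented for this example and are not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the bytes of buf to sum as 16-bit big-endian words (no folding yet). */
uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte is padded with zero */
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Folded, NOT complemented sum over the IPv4 TCP pseudo-header:
 * source address, destination address, protocol 6, TCP length.
 * Addresses are given in host order, e.g. 0xc0000201 for 192.0.2.1.
 */
uint16_t tcp_pseudo_sum(uint32_t saddr, uint32_t daddr, uint16_t tcp_len)
{
	uint32_t sum = 0;

	sum += saddr >> 16;
	sum += saddr & 0xffffu;
	sum += daddr >> 16;
	sum += daddr & 0xffffu;
	sum += 6;			/* protocol number for TCP */
	sum += tcp_len;
	return csum_fold(sum);
}

/*
 * Full checksum: complement of (pseudo-header sum + whole TCP segment).
 * The caller must zero the checksum field (bytes 16-17) of seg first.
 */
uint16_t tcp_full_csum(uint32_t saddr, uint32_t daddr,
		       const uint8_t *seg, size_t len)
{
	uint32_t sum = tcp_pseudo_sum(saddr, daddr, (uint16_t)len);

	sum = csum_add_bytes(sum, seg, len);
	return (uint16_t)~csum_fold(sum);
}
```

Under that assumption, a capture in which the checksum field equals `tcp_pseudo_sum()` rather than `tcp_full_csum()` suggests the packet was seen before the checksum was completed, which is the kind of check described in the message above.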
* Re: rare bad TCP checksum with 2.6.19?
2007-01-16 3:27 ` Herbert Xu
@ 2007-01-16 3:38 ` Herbert Xu
2007-01-16 8:08 ` Michael Tokarev
0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-16 3:38 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev

On Tue, Jan 16, 2007 at 02:27:39PM +1100, Herbert Xu wrote:
>
> I'm sorry but this dump does NOT look like it was taken from an
> intermediate box. I verified two bad checksums (chosen randomly)
> and they were both correct but partial checksums. This means that
> this dump was most likely taken from the sending host.

I did see one strange bit:

02:39:51.758803 IP (tos 0x0, ttl 63, id 41084, offset 0, flags [DF], length: 102) 192.168.1.1.25 > 81.13.94.6.21350: FP [bad tcp cksum 81b0 (->9ee8)!] 4271854025:4271854075(50) ack 3772789166 win 272 <nop,nop,timestamp 145420525 6279830>
	0x0000:  4500 0066 a07c 4000 3f06 2a59 c0a8 0101  E..f.|@.?.*Y....
	0x0010:  510d 5e06 0019 5366 fe9f 51c9 e0e0 31ae  Q.^...Sf..Q...1.
	0x0020:  8019 0110 81b0 0000 0101 080a 08aa f0ed  ................
	0x0030:  005f d296 3235 3020 322e 302e 3020 4f6b  ._..250.2.0.0.Ok
	0x0040:  3a20 7175 6575 6564 2061 7320 3631 3345  :.queued.as.613E
	0x0050:  4137 4637 440d 0a32 3231 2032 2e30 2e30  A7F7D..221.2.0.0
	0x0060:  2042 7965 0d0a                           .Bye..

Most of the bad checksums are from 81.13.94.6, which I presume is
the host you were dumping on. However, this packet is destined
for it instead and yet it too has a partial (but correct) checksum.

So the question is where in your network is 192.168.1.1 and how is
your network setup in terms of NAT?

Cheers,
* Re: rare bad TCP checksum with 2.6.19?
2007-01-16 3:38 ` Herbert Xu
@ 2007-01-16 8:08 ` Michael Tokarev
2007-01-16 11:50 ` Herbert Xu
0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-16 8:08 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev

Herbert Xu wrote:
> On Tue, Jan 16, 2007 at 02:27:39PM +1100, Herbert Xu wrote:
>> I'm sorry but this dump does NOT look like it was taken from an
>> intermediate box. I verified two bad checksums (chosen randomly)
>> and they were both correct but partial checksums. This means that
>> this dump was most likely taken from the sending host.
>
> I did see one strange bit:
>
> 02:39:51.758803 IP (tos 0x0, ttl 63, id 41084, offset 0, flags [DF], length: 102) 192.168.1.1.25 > 81.13.94.6.21350: FP [bad tcp cksum 81b0 (->9ee8)!] 4271854025:4271854075(50) ack 3772789166 win 272 <nop,nop,timestamp 145420525 6279830>
> 0x0000: 4500 0066 a07c 4000 3f06 2a59 c0a8 0101 E..f.|@.?.*Y....
> 0x0010: 510d 5e06 0019 5366 fe9f 51c9 e0e0 31ae Q.^...Sf..Q...1.
> 0x0020: 8019 0110 81b0 0000 0101 080a 08aa f0ed ................
> 0x0030: 005f d296 3235 3020 322e 302e 3020 4f6b ._..250.2.0.0.Ok
> 0x0040: 3a20 7175 6575 6564 2061 7320 3631 3345 :.queued.as.613E
> 0x0050: 4137 4637 440d 0a32 3231 2032 2e30 2e30 A7F7D..221.2.0.0
> 0x0060: 2042 7965 0d0a .Bye..
>
> Most of the bad checksums are from 81.13.94.6, which I presume is
> the host you were dumping on. However, this packet is destined
> for it instead and yet it too has a partial (but correct) checksum.
>
> So the question is where in your network is 192.168.1.1 and how is
> your network setup in terms of NAT?

This 192.168.* network is internal, and this very packet - I didn't
think it'll be there, but.. hum. The network looks like this:

     internet
        |                 81.13.94.6 etc
   [ router ] - [ DMZ ]
        |
     [ LAN ]
  192.168.1.1 etc

The capture has been made on the router, on the interface which
is connected to a DMZ segment (so no netfilter stuff should be
involved at all; but there's no fancy netfilter setup between
dmz and external interface, many packets don't even go to conntrack).

81.13.94.6 is a machine in the DMZ segment (it's www.corpit.ru, by
the way). 192.168.1.1 is a machine in LAN. So the packet you're
referring to belongs to a connection between internal (on LAN)
mailserver and a DMZ mailserver - and that one, -- at least I didn't
think about capturing *that* traffic. At least most of the packets
were between dmz and external interface.

That is to say - the 192.168.1.1 machine also has this problem (as I
mentioned before - it happens on several different machines with
different kernels (all are 2.6.19 still - it doesn't happen with
2.6.18 or before)), but it wasn't the main machine I did the
testing on.

Ok. Here's another trace, from that remote network that triggers
this thing more-or-less reliably (every 2nd transfer at least) --
http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
between 216.168.29.244 - the requesting/receiving side -- and
81.13.94.6 -- our sending side (the file being transferred is some
trojan horse I found on a friend's PC, so be careful ;)

The last packet(s) -- they're repeated many times, ad infinitum,
because the receiving side discards incorrectly checksummed packets
and thus never sees the final part of the data -- here it is as
captured on the router (above, included in the trace):

10:52:35.702649 IP (tos 0x0, ttl 64, id 61117, offset 0, flags [DF], proto: TCP (6), length: 82) 81.13.94.6.80 > 216.168.29.244.55354: FP, cksum 0x9185 (incorrect (-> 0x5c56), 140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 12118000 265951653>

And here it is again, captured on the RECEIVING side (on 216.168.29.244):

07:52:35.816545 IP (tos 0x0, ttl 48, id 61117, offset 0, flags [DF], proto: TCP (6), length: 82) 81.13.94.6.80 > 216.168.29.244.55354: FP, cksum 0x9185 (incorrect (-> 0x5c56), 140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 12118000 265951653>

(the only difference in headers I see is in the TTL, which is expected).

The transfer never finishes, it sits at 98% or so. On the receiving
side (which is running FreeBSD), the "bad checksums" statistics counter
increases with every FP packet. It also makes no difference whether
tcpdump is running on either side or on an intermediate host or not.

Thanks!

/mjt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-16 8:08 ` Michael Tokarev
@ 2007-01-16 11:50 ` Herbert Xu
2007-01-16 12:15 ` Patrick McHardy
2007-01-17 14:12 ` Michael Tokarev
0 siblings, 2 replies; 32+ messages in thread
From: Herbert Xu @ 2007-01-16 11:50 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev, Patrick McHardy

On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
>
> Ok. Here's another trace, from that remote network that triggers
> this thing more-or-less reliable (every 2nd transfer at least) --
> http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
> between 216.168.29.244 - the requesting/receiving side -- and
> 81.13.94.6 -- our sending side (the file being transferred is some
> trojan horse I found on a friend's PC, so be careful ;)

I'll have a look at this tomorrow.

Since you're certain that this is being seen on the wire, one
possibility is that we've got a bug somewhere that's zeroing
skb->ip_summed on a packet with a partial checksum.

One potential spot where this could happen is netfilter.
Patrick, do you know of any recent changes (this is happening
with 2.6.19) that might cause this?

Cheers,
* Re: rare bad TCP checksum with 2.6.19?
2007-01-16 11:50 ` Herbert Xu
@ 2007-01-16 12:15 ` Patrick McHardy
2007-01-16 14:38 ` Michael Tokarev
2007-01-17 14:12 ` Michael Tokarev
1 sibling, 1 reply; 32+ messages in thread
From: Patrick McHardy @ 2007-01-16 12:15 UTC (permalink / raw)
To: Herbert Xu; +Cc: Michael Tokarev, netdev

Herbert Xu wrote:
> On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
>
>>Ok. Here's another trace, from that remote network that triggers
>>this thing more-or-less reliable (every 2nd transfer at least) --
>>http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
>>between 216.168.29.244 - the requesting/receiving side -- and
>>81.13.94.6 -- our sending side (the file being transferred is some
>>trojan horse I found on a friend's PC, so be careful ;)
>
>
> I'll have a look at this tomorrow.
>
> Since you're certain that this is being seen on the wire, one
> possibility is that we've got a bug somewhere that's zeroing
> skb->ip_summed on a packet with a partial checksum.
>
> One potential spot where this could happen is netfilter.
> Patrick, do you know of any recent changes (this is happening
> with 2.6.19) that might cause this?

The incremental HW checksum update stuff went in 2.6.19, so that's
a prime suspect. Can't see where this could be happening though.

Michael, how exactly is netfilter involved in your setup?
* Re: rare bad TCP checksum with 2.6.19?
2007-01-16 12:15 ` Patrick McHardy
@ 2007-01-16 14:38 ` Michael Tokarev
0 siblings, 0 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-16 14:38 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Herbert Xu, netdev

Patrick McHardy wrote:
> Herbert Xu wrote:
[]
>> Since you're certain that this is being seen on the wire, one
>> possibility is that we've got a bug somewhere that's zeroing
>> skb->ip_summed on a packet with a partial checksum.
>>
>> One potential spot where this could happen is netfilter.
>> Patrick, do you know of any recent changes (this is happening
>> with 2.6.19) that might cause this?
>
> The incremental HW checksum update stuff went in 2.6.19, so thats
> a prime suspect. Can't see where this could be happening though.
>
> Michael, how exactly is netfilter involved in your setup?

I don't think it is involved.

The captures I did were done on a router box, which indeed has
some netfilter stuff. But:

1) the capture has been done on an interface directly connected
 to the segment where the "testing" machine is located (not
 on the "external" interface)

2) the "testing" machine itself does not have any netfilter modules
 loaded

3) the packets look exactly the same in at least 3 places (modulo
 the TTL values): on the sending machine, on the router (on the
 interface connected to the sending machine - in those 2 places,
 the TTL is the same), and at the receiving side, which is 20+
 hops away.

4) I tried another machine today (upgraded from 2.6.17 to 2.6.19) -
 stand-alone, without any netfilter modules loaded (but it's under
 quite.. some load - see http://j.ns.dsbl.org/nsg/ -- with this
 load it'll die right after iptables module loading, it's a 600MHz
 Celeron box replying to 15000 DNS packets every second) - it
 started showing the same behavior.

/mjt
* Re: rare bad TCP checksum with 2.6.19?
2007-01-16 11:50 ` Herbert Xu
2007-01-16 12:15 ` Patrick McHardy
@ 2007-01-17 14:12 ` Michael Tokarev
2007-01-19 11:06 ` [PATCH] tcp_output: " Jarek Poplawski
1 sibling, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-17 14:12 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev, Patrick McHardy

Herbert Xu wrote:
> On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
>> Ok. Here's another trace, from that remote network that triggers
>> this thing more-or-less reliable (every 2nd transfer at least) --
>> http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
>> between 216.168.29.244 - the requesting/receiving side -- and
>> 81.13.94.6 -- our sending side (the file being transferred is some
>> trojan horse I found on a friend's PC, so be careful ;)
>
> I'll have a look at this tomorrow.
>
> Since you're certain that this is being seen on the wire, one
> possibility is that we've got a bug somewhere that's zeroing
> skb->ip_summed on a packet with a partial checksum.

Here's another sample, which may be more useful. I've seen quite
a lot of very similar stuff while running tcpdump.

 http://www.corpit.ru/mjt/bad-cksum-session3-dmp.bin

The scenario looks like this.

A client (82.84.172.37 -- a zombie machine trying to send us spam
in this case) connects to port 25 here (81.13.94.6:25). The SYN+ACK
sequence completes. Next, our server sends an initial SMTP greeting
message, but almost right after that, the client sends a FIN packet,
WITHOUT acknowledging that it received the (first and only) data
packet. So some time later our machine re-sends the data, AND adds
the FIN flag to the packet (also replying to the FIN received from the
client). And *that* packet - the original data packet which is modified
to also include FIN - has an incorrect checksum.

So it looks like the checksum isn't being updated WHEN ADDING MORE
FLAGS to the original data packet.

/mjt
* [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-17 14:12 ` Michael Tokarev
@ 2007-01-19 11:06 ` Jarek Poplawski
2007-01-19 12:14 ` Patrick McHardy
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-19 11:06 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu

On 17-01-2007 15:12, Michael Tokarev wrote:
> Herbert Xu wrote:
>> On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
>>> Ok. Here's another trace, from that remote network that triggers
>>> this thing more-or-less reliable (every 2nd transfer at least) --
>>> http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
>>> between 216.168.29.244 - the requesting/receiving side -- and
>>> 81.13.94.6 -- our sending side (the file being transferred is some
>>> trojan horse I found on a friend's PC, so be careful ;)
>> I'll have a look at this tomorrow.
>>
>> Since you're certain that this is being seen on the wire, one
>> possibility is that we've got a bug somewhere that's zeroing
>> skb->ip_summed on a packet with a partial checksum.
>
> Here's another sample, which may be more useful. I've seen quite
> alot of very similar stuff while running tcpdump.
>
> http://www.corpit.ru/mjt/bad-cksum-session3-dmp.bin
>
> The scenario looks like this.
>
> A client (82.84.172.37 -- a zombie machine trying to send us spam
> in this case) connects to a port 25 here (81.13.94.6:25). SYN+ACK
> sequence completes. Next, our server send an initial SMTP greething
> message, but almost right after that, the client sends a FIN packet,
> WITHOUT acknowleging that it received the (first and only) data
> packet. So some time later our machine re-sends the data, AND adds
> FIN flag to the packet (also replying to the FIN received from the
> client). And *that* packet - original data packet which is modified
> to also include FIN - has incorrect checksum.
>
> So it looks like the checksum isn't being updated WHEN ADDING MORE
> FLAGS to the original data packet.
>

Hi,

Here is my patch proposal. If I'm not totally wrong,
there is a possibility that, during collapsing, empty
skb with FIN is added to "normal" packet and changes
its ip_summed field to CHECKSUM_NONE.

Regards,
Jarek P.

PS: probably there are also other possibilities...

---

[PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19

The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from
collapsed skb. This patch reverts this change.

All substantial work including heavy testing
and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

---

diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
--- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
@@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
 
 		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
 
-		skb->ip_summed = next_skb->ip_summed;
+		if (next_skb->ip_summed == CHECKSUM_PARTIAL)
+			skb->ip_summed = CHECKSUM_PARTIAL;
 
 		if (skb->ip_summed != CHECKSUM_PARTIAL)
 			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
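The one-line decision that the patch changes can be modelled outside the kernel. The sketch below is only an illustration of that decision, not kernel code: the enum and function names are invented, and it says nothing about whether this is the whole cause of the bad checksums seen on the wire.

```c
/*
 * Illustrative stand-ins for two of the kernel's ip_summed states.
 * Only the distinction between them matters here, not the values.
 */
enum csum_state {
	CSUM_NONE,	/* checksum is computed fully in software */
	CSUM_PARTIAL	/* only the pseudo-header part is filled in so far */
};

/*
 * Behaviour being replaced: after collapsing next_skb into skb, the
 * merged skb unconditionally takes over next_skb's state.
 */
enum csum_state collapse_state_unconditional(enum csum_state skb,
					     enum csum_state next_skb)
{
	(void)skb;		/* the previous state of skb is ignored */
	return next_skb;
}

/*
 * Behaviour with the proposed patch: PARTIAL is sticky. If next_skb is
 * PARTIAL the merged skb becomes PARTIAL, otherwise skb keeps its state.
 */
enum csum_state collapse_state_patched(enum csum_state skb,
				       enum csum_state next_skb)
{
	if (next_skb == CSUM_PARTIAL)
		return CSUM_PARTIAL;
	return skb;
}
```

In this model, collapsing an empty FIN skb in state CSUM_NONE into a data skb in state CSUM_PARTIAL yields CSUM_NONE with the unconditional copy and CSUM_PARTIAL with the patched logic, which is the case described in the patch proposal.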
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 11:06 ` [PATCH] tcp_output: " Jarek Poplawski
@ 2007-01-19 12:14 ` Patrick McHardy
2007-01-19 13:23 ` Michael Tokarev
2007-01-19 14:32 ` Jarek Poplawski
2007-01-19 13:20 ` Michael Tokarev
2007-01-19 21:10 ` Herbert Xu
2 siblings, 2 replies; 32+ messages in thread
From: Patrick McHardy @ 2007-01-19 12:14 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Michael Tokarev, netdev, Herbert Xu
Jarek Poplawski wrote:
> Here is my patch proposal. If I'm not totally wrong,
> there is a possibility that, during collapsing, empty
> skb with FIN is added to "normal" packet and changes
> its ip_summed field to CHECKSUM_NONE.
>
> diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
> --- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
> +++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
> @@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
>
>  	memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
>
> -	skb->ip_summed = next_skb->ip_summed;
> +	if (next_skb->ip_summed == CHECKSUM_PARTIAL)
> +		skb->ip_summed = CHECKSUM_PARTIAL;
>
>  	if (skb->ip_summed != CHECKSUM_PARTIAL)
>  		skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
>

I noticed this too, but I can't see how it could lead to
a partial checksum on the wire since the checksumming is
done after changing ip_summed to CHECKSUM_NONE. Is this
patch verified to fix Michael's problem?
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 12:14 ` Patrick McHardy
@ 2007-01-19 13:23 ` Michael Tokarev
2007-01-19 14:32 ` Jarek Poplawski
1 sibling, 0 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-19 13:23 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Jarek Poplawski, netdev, Herbert Xu
Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Here is my patch proposal. If I'm not totally wrong,
>> there is a possibility that, during collapsing, empty
>> skb with FIN is added to "normal" packet and changes
>> its ip_summed field to CHECKSUM_NONE.
>>
>> diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
>> --- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
>> +++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
>> @@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
>>
>>  	memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
>>
>> -	skb->ip_summed = next_skb->ip_summed;
>> +	if (next_skb->ip_summed == CHECKSUM_PARTIAL)
>> +		skb->ip_summed = CHECKSUM_PARTIAL;
>>
>>  	if (skb->ip_summed != CHECKSUM_PARTIAL)
>>  		skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
>>
>
> I noticed this too, but I can't see how it could lead to
> a partial checksum on the wire since the checksumming is
> done after changing ip_summed to CHECKSUM_NONE. Is this
> patch verified to fix Michael's problem?

It seems to fix this "my" problem, yes - at least I can't reproduce
it anymore. Tcpdump is running however - let's see... :)

/mjt
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 12:14 ` Patrick McHardy
2007-01-19 13:23 ` Michael Tokarev
@ 2007-01-19 14:32 ` Jarek Poplawski
1 sibling, 0 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-19 14:32 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Michael Tokarev, netdev, Herbert Xu
On Fri, Jan 19, 2007 at 01:14:52PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > Here is my patch proposal. If I'm not totally wrong,
> > there is a possibility that, during collapsing, empty
> > skb with FIN is added to "normal" packet and changes
> > its ip_summed field to CHECKSUM_NONE.
> >
> > diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
> > --- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
> > +++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
> > @@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
> >
> >  	memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
> >
> > -	skb->ip_summed = next_skb->ip_summed;
> > +	if (next_skb->ip_summed == CHECKSUM_PARTIAL)
> > +		skb->ip_summed = CHECKSUM_PARTIAL;
> >
> >  	if (skb->ip_summed != CHECKSUM_PARTIAL)
> >  		skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
> >
>
> I noticed this too, but I can't see how it could lead to
> a partial checksum on the wire since the checksumming is
> done after changing ip_summed to CHECKSUM_NONE. Is this
> patch verified to fix Michael's problem?

No, this was intended as a proposal for testing.
I didn't verify the whole checksum path here, but
I guessed such a change during the summing could
matter (probably for skb_copy_and_csum_dev and
maybe earlier), and I couldn't find a more suspicious
change since 2.6.17 near these FINs.

But if it really works, it shouldn't be so hard
to verify the mechanism, I hope.

Jarek P.
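For the software path that the patch leaves in place (`ip_summed != CHECKSUM_PARTIAL`), the merged segment's `csum` is built from the two per-segment sums with `csum_block_add`. The user-space sketch below illustrates only the underlying one's-complement arithmetic; `model_csum` and `model_csum_add` are hypothetical helpers, not the kernel's functions, and the combine step assumes the first block has an even length (the offset argument seen in the patch is ignored here).

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of a buffer read as big-endian 16-bit words,
 * in the style of RFC 1071. Returns the folded sum, not its complement. */
uint16_t model_csum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Combine two folded sums with end-around carry. Only meaningful when
 * the block behind 'b' started at an even byte offset. */
uint16_t model_csum_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	return (uint16_t)((sum & 0xffff) + (sum >> 16));
}
```

Under that even-length assumption, summing two blocks separately and combining them gives the same value as summing the concatenation, and a zero-length block (such as a FIN with no payload) contributes a sum of zero. The combination is only meaningful if both inputs really are data checksums.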
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 11:06 ` [PATCH] tcp_output: " Jarek Poplawski
2007-01-19 12:14 ` Patrick McHardy
@ 2007-01-19 13:20 ` Michael Tokarev
2007-01-19 14:08 ` Jarek Poplawski
2007-01-19 21:10 ` Herbert Xu
2 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-19 13:20 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev, Patrick McHardy, Herbert Xu
Jarek Poplawski wrote:
> On 17-01-2007 15:12, Michael Tokarev wrote:
[]
>> Here's another sample, which may be more useful. I've seen quite
>> a lot of very similar stuff while running tcpdump.
>>
>> http://www.corpit.ru/mjt/bad-cksum-session3-dmp.bin
>>
>> The scenario looks like this.
>>
>> A client (82.84.172.37 -- a zombie machine trying to send us spam
>> in this case) connects to a port 25 here (81.13.94.6:25). SYN+ACK
>> sequence completes. Next, our server sends an initial SMTP greeting
>> message, but almost right after that, the client sends a FIN packet,
>> WITHOUT acknowledging that it received the (first and only) data
>> packet. So some time later our machine re-sends the data, AND adds
>> FIN flag to the packet (also replying to the FIN received from the
>> client). And *that* packet - original data packet which is modified
>> to also include FIN - has incorrect checksum.
>>
>> So it looks like the checksum isn't being updated WHEN ADDING MORE
>> FLAGS to the original data packet.
>>
>
> Hi,
>
> Here is my patch proposal. If I'm not totally wrong,
> there is a possibility that, during collapsing, empty
> skb with FIN is added to "normal" packet and changes
> its ip_summed field to CHECKSUM_NONE.
>
> Regards,
> Jarek P.
>
> PS: probably there are also other possibilities...

Well.. I just tried it - with this patch applied, no more bad checksums
are shown. Tried from the network that triggers it most reliably - and
wasn't able to reproduce the bad behavior.

I'm running a tcpdump right now, and so far it only captured a few bad-cksum
packets from other hosts (which are also running 2.6.19 ;)

Thanks Jarek!

/mjt
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 13:20 ` Michael Tokarev
@ 2007-01-19 14:08 ` Jarek Poplawski
2007-01-22 7:13 ` Jarek Poplawski
0 siblings, 1 reply; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-19 14:08 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu
On Fri, Jan 19, 2007 at 04:20:01PM +0300, Michael Tokarev wrote:
...
> Well.. I just tried it - with this patch applied, no more bad checksums
> are shown. Tried from the network that triggers it most reliable - and
> wasn't able to reproduce the bad behavior.
>
> I'm running a tcpdump right now, and so far it only captured a few bad-cksum
> packets from other hosts (which are also running 2.6.19 ;)
>
> Thanks Jarek!

You are welcome! But you probably didn't read this with
attention: if it works, you should thank mainly to that
other guy...

Btw. I can't remember I've seen such ferocious testing
ever!

Cheers,
Jarek P.
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 14:08 ` Jarek Poplawski
@ 2007-01-22 7:13 ` Jarek Poplawski
2007-01-22 7:19 ` Michael Tokarev
0 siblings, 1 reply; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22 7:13 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu
On Fri, Jan 19, 2007 at 03:08:20PM +0100, Jarek Poplawski wrote:
...
> You are welcome! But you probably didn't read this with
> attention: if it works, you should thank mainly to that
> other guy...
>
> Btw. I can't remember I've seen such ferocious testing
> ever!

After checking in the dictionary I found my btw. could
be rather confusing:

> ferocious:
> 1.savagely fierce, as a wild beast, person, action,
> or aspect; violently cruel: a ferocious beating.
> 2.extreme or intense: a ferocious thirst.

I've only meant #2 - and nothing like #1.
If you were confused - sorry!

Jarek P.
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-22 7:13 ` Jarek Poplawski
@ 2007-01-22 7:19 ` Michael Tokarev
2007-01-22 8:03 ` Jarek Poplawski
0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-22 7:19 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: netdev, Patrick McHardy, Herbert Xu
Jarek Poplawski wrote:
> On Fri, Jan 19, 2007 at 03:08:20PM +0100, Jarek Poplawski wrote:
> ...
>> You are welcome! But you probably didn't read this with
>> attention: if it works, you should thank mainly to that
>> other guy...
>>
>> Btw. I can't remember I've seen such ferocious testing
>> ever!
>
> After checking in the dictionary I found my btw. could
> be rather confusing:
>
>> ferocious:
>> 1.savagely fierce, as a wild beast, person, action,
>> or aspect; violently cruel: a ferocious beating.
>> 2.extreme or intense: a ferocious thirst.
>
> I've only meant #2 - and nothing like #1.
> If you were confused - sorry!

Heh. Jarek, thank you for your appreciation of my efforts.
And no, I noticed this statement of yours in the first place
(in the good sense - like #2 above) -- I wanted to comment on
it first, but didn't.

I was only running tcpdump - yes, it was running almost the
whole day, with different options. I did almost nothing.

You over-estimate my contribution, really ;)

The very good thing is that this bug is now found, and *that*
is what matters.

/mjt
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-22 7:19 ` Michael Tokarev
@ 2007-01-22 8:03 ` Jarek Poplawski
0 siblings, 0 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22 8:03 UTC (permalink / raw)
To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu
On Mon, Jan 22, 2007 at 10:19:18AM +0300, Michael Tokarev wrote:
...
> I was only running tcpdump - yes, it was running almost the
> whole day, with different options. I did almost nothing.
>
> You over-estimate my contribution, really ;)
>
> The very good thing is that this bug is now found, and *that*
> is what matters.

But if you consider that 2.6.19 isn't so fresh now and
the bug was in a very often used place and it for sure
had to disturb many times in many places, so I'm not
sure, if without your doing "almost nothing", this bug
would be diagnosed in the next year or two, really.

Jarek P.
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 11:06 ` [PATCH] tcp_output: " Jarek Poplawski
2007-01-19 12:14 ` Patrick McHardy
2007-01-19 13:20 ` Michael Tokarev
@ 2007-01-19 21:10 ` Herbert Xu
2007-01-22 6:52 ` Jarek Poplawski
2 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-19 21:10 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller
On Fri, Jan 19, 2007 at 12:06:41PM +0100, Jarek Poplawski wrote:
>
> [PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19
>
> The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
> changed to unconditional copying of ip_summed field from collapsed
> skb. This patch reverts this change.
>
> All substantial work including heavy testing and diagnosing by:
> Michael Tokarev <mjt@tls.msk.ru>
>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks for catching this! I'll take the credit for adding this bug :)

Dave, we'll need this fix for 2.6.20 as well as 2.6.19.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-19 21:10 ` Herbert Xu
@ 2007-01-22 6:52 ` Jarek Poplawski
2007-01-22 7:45 ` Herbert Xu
2007-01-24 6:08 ` David Miller
0 siblings, 2 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22 6:52 UTC (permalink / raw)
To: Herbert Xu; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller
On Sat, Jan 20, 2007 at 08:10:27AM +1100, Herbert Xu wrote:
> On Fri, Jan 19, 2007 at 12:06:41PM +0100, Jarek Poplawski wrote:
> >
> > [PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19
> >
> > The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
> > changed to unconditional copying of ip_summed field from collapsed
> > skb. This patch reverts this change.
> >
> > All substantial work including heavy testing and diagnosing by:
> > Michael Tokarev <mjt@tls.msk.ru>
> >
> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
>
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> Thanks for catching this! I'll take the credit for adding this bug :)
>
> Dave, we'll need this fix for 2.6.20 as well as 2.6.19.

I was so impressed by the amount of work done by Michael
that I magnified his merit and forgot to mention the role
of Patrick and Herbert, particularly here:

> Since you're certain that this is being seen on the wire, one
> possibility is that we've got a bug somewhere that's zeroing
> skb->ip_summed on a packet with a partial checksum.

which exactly pointed the reason.

So, I apologize to them and, if there is such possibility,
I would like to ask David Miller to change the description
like that:

---

[PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19

The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from collapsed
skb. This patch reverts this change.

The majority of substantial work including heavy testing
and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
Possible reasons pointed by: Herbert Xu and Patrick McHardy.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
---

Regards,
Jarek P.
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-22 6:52 ` Jarek Poplawski
@ 2007-01-22 7:45 ` Herbert Xu
2007-01-22 8:48 ` Jarek Poplawski
2007-01-22 13:46 ` Patrick McHardy
2007-01-24 6:08 ` David Miller
1 sibling, 2 replies; 32+ messages in thread
From: Herbert Xu @ 2007-01-22 7:45 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller
On Mon, Jan 22, 2007 at 07:52:14AM +0100, Jarek Poplawski wrote:
>
> I was so impressed by the amount of work done by Michael
> that I magnified his merit and forgot to mention the role
> of Patrick and Herbert, particularly here:

You don't need to be so modest!

While no doubt Patrick helped in tracking this down, my role in
this thread is solely in adding this bug in the first place :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-22 7:45 ` Herbert Xu
@ 2007-01-22 8:48 ` Jarek Poplawski
2007-01-22 13:46 ` Patrick McHardy
1 sibling, 0 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22 8:48 UTC (permalink / raw)
To: Herbert Xu; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller
On Mon, Jan 22, 2007 at 06:45:57PM +1100, Herbert Xu wrote:
> On Mon, Jan 22, 2007 at 07:52:14AM +0100, Jarek Poplawski wrote:
> >
> > I was so impressed by the amount of work done by Michael
> > that I magnified his merit and forgot to mention the role
> > of Patrick and Herbert, particularly here:
>
> You don't need to be so modest!
>
> While no doubt Patrick helped in tracking this down, my role in
> this thread is solely in adding this bug in the first place :)

Let's be sincere then: I really spent more time on
thinking why, with this hint of yours and the last
summary of Michael, you didn't send this patch yet,
and checking if I don't repeat or appropriate your
work, than on finding the place. My only explanation
was: you were overworked with something else and
I'm sure you think the same, now.

Jarek P.

PS: considering the amount of diplomacy relative to
the size of the patch, I think we had better stop
amusing this list here.
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-22 7:45 ` Herbert Xu
2007-01-22 8:48 ` Jarek Poplawski
@ 2007-01-22 13:46 ` Patrick McHardy
1 sibling, 0 replies; 32+ messages in thread
From: Patrick McHardy @ 2007-01-22 13:46 UTC (permalink / raw)
To: Herbert Xu; +Cc: Jarek Poplawski, Michael Tokarev, netdev, David S. Miller
Herbert Xu wrote:
> On Mon, Jan 22, 2007 at 07:52:14AM +0100, Jarek Poplawski wrote:
>
>>
>>I was so impressed by the amount of work done by Michael
>>that I magnified his merit and forgot to mention the role
>>of Patrick and Herbert, particularly here:
>
>
> You don't need to be so modest!
>
> While no doubt Patrick helped in tracking this down, my role in
> this thread is solely in adding this bug in the first place :)

Actually I only meditated over the code without figuring out
anything useful, but thanks anyway :)
* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
2007-01-22 6:52 ` Jarek Poplawski
2007-01-22 7:45 ` Herbert Xu
@ 2007-01-24 6:08 ` David Miller
1 sibling, 0 replies; 32+ messages in thread
From: David Miller @ 2007-01-24 6:08 UTC (permalink / raw)
To: jarkao2; +Cc: herbert, mjt, netdev, kaber
From: Jarek Poplawski <jarkao2@o2.pl>
Date: Mon, 22 Jan 2007 07:52:14 +0100

> [PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19
>
> The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
> changed to unconditional copying of ip_summed field from collapsed
> skb. This patch reverts this change.
>
> The majority of substantial work including heavy testing
> and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
> Possible reasons pointed by: Herbert Xu and Patrick McHardy.
>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
>
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks a lot everyone.

I'll take care of submitting this to 2.6.19-stable.

Thanks again.