rare bad TCP checksum with 2.6.19?

netdev.vger.kernel.org archive mirror

* rare bad TCP checksum with 2.6.19?
@ 2007-01-14 22:59 Michael Tokarev
  2007-01-15  9:39 ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-14 22:59 UTC (permalink / raw)
  To: netdev

I noticed, after running 2.6.19 for more than a month, that
sometimes a file transfer, when one of the ends is running 2.6.19,
stalls at the very end of the file, forever.

Playing with tcpdump, I noticed that the host sends out packets with
wrong checksums, like this:

01:28:07.608457 IP (tos 0x0, ttl  64, id 11740, offset 0, flags [DF], length: 82)
    81.13.94.6.80 > 216.168.29.244.57064: FP [bad tcp cksum b011 (->7ae2)!]
    140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 87610 145676467>

(here, 81.13.94.6 is running linux 2.6.19).

It happens only in rare cases, and is not reliably repeatable.

After further playing I noticed that (almost) only packets with the FIN flag
set (like the above), *and* containing some data (again, like the
above), show this behaviour.

With FIN set, the thing is 100% repeatable (the only problem is forcing the
system to actually send such a packet: one has to push quite a lot of
data to the socket and immediately close it, so that some data is still
waiting in the kernel buffer at the moment of close).

This explains the observed behaviour (rare, unreliable stalls at the end of
a transfer), because it's relatively rare for a FIN packet to contain data.

But sometimes, other packets go out with bad checksum, too:

01:20:01.712146 IP (tos 0x0, ttl  64, id 52870, offset 0, flags [DF], length: 1500)
    81.13.94.6.80 > 216.168.29.244.57655: . [bad tcp cksum ab7e (->dcbd)!]
    112945:114393(1448) ack 125 win 2896 <nop,nop,timestamp 39006 145190996>

(again, 81.13.94.6 is a machine running linux 2.6.19).  That's one packet in a
run of otherwise normal packets; it was retransmitted a bit later with a
correct checksum.

When switching back to 2.6.17 (the previous kernel running on this
machine), things go back to normal, or at least so it seems.

Note there's no funny/interesting hardware involved, like network cards with
tcp checksumming offload capabilities (this is plain dumb 8139 card).

I'll try to collect further information tomorrow.  But if someone has some
clue before.... ;)

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: rare bad TCP checksum with 2.6.19?
  2007-01-14 22:59 rare bad TCP checksum with 2.6.19? Michael Tokarev
@ 2007-01-15  9:39 ` Herbert Xu
  2007-01-15 13:34   ` Michael Tokarev
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-15  9:39 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev

Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Note there's no funny/interesting hardware involved, like network cards with
> tcp checksumming offload capabilities (this is plain dumb 8139 card).

The 8139 card might be dumb, but the driver isn't :) It emulates
checksum offload in software, meaning that tcpdump will show bogus
checksums.

So please disable hardware checksum offload with ethtool -K and
then try again.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15  9:39 ` Herbert Xu
@ 2007-01-15 13:34   ` Michael Tokarev
  2007-01-15 14:25     ` Michael Tokarev
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 13:34 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

Herbert Xu wrote:
> Michael Tokarev <mjt@tls.msk.ru> wrote:
>> Note there's no funny/interesting hardware involved, like network cards with
>> tcp checksumming offload capabilities (this is plain dumb 8139 card).
> 
> The 8139 card might be dumb, but the driver isn't :) It emulates
> checksum offload in software, meaning that tcpdump will show bogus
> checksums.
> 
> So please disable hardware checksum offload with ethtool -K and
> then try again.

# ethtool -k eth0
Offload parameters for eth0:
Cannot get device rx csum settings: Operation not supported
Cannot get device tx csum settings: Operation not supported
Cannot get device scatter-gather settings: Operation not supported
Cannot get device tcp segmentation offload settings: Operation not supported
no offload info available

# ethtool -K eth0 rx off tx off tso off
Cannot set device rx csum settings: Operation not supported

So I guess the problem is not related to hw checksumming offloading.

Meanwhile, I tried many times to reproduce the problem, with little
success.  With different sizes, options, etc., I can't force the
sending side to send data within a FIN packet.  I.e., most of the
time, the thing just works, because no data goes with the FIN packet.
But once every 50..100 tries I see a single FIN-with-data packet, and
that one ALWAYS has a bad checksum.

I was never able to reproduce the problem on a LAN, only when going from
a distant host.  And even with that distant host, it's very difficult to
reproduce.

At least one network (also distant) triggers this problem on every 2nd
try or so (the one I experimented with yesterday).  But I've no access
to that network - I kindly asked for help yesterday, but I can't abuse
their willingness to help more.

And another thing I noticed.  Right now I'm experimenting with another
machine, running 2.6.17(.13) - it also shows similar behavior with bad
csums, but MUCH rarer than this 2.6.19.  Like this:

16:29:32.490976 IP (tos 0x60, ttl  48, id 14110, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum f4b4 (->c1cc)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528199 5497679,nop,nop,sack sack 3 {104991:109335}{110783:112231}{104991:109335} >
16:29:32.525988 IP (tos 0x60, ttl  48, id 14112, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 3fb1 (->1819)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528202 5497679,nop,nop,sack sack 3 {110783:113679}{122367:123815}{110783:113679} >
16:29:32.561407 IP (tos 0x60, ttl  48, id 14116, offset 0, flags [DF], length: 80)
 69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum 87c0 (->2610)!] ack 93407 win 9821
 <nop,nop,timestamp 1046528205 5497679,nop,nop,sack sack 3 {122367:127103}{128551:129572}{122367:127103} >

Here, 69.42.67.34 is 2.6.17 from which I'm requesting data, and
81.13.94.6 is the sender.  This behavior so far is demonstrated with
sack packets only, but I've seen it in other direction too (also with
sack), at least once.

Any idea how to force sending FIN-with-data?

Thanks!

/mjt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 13:34   ` Michael Tokarev
@ 2007-01-15 14:25     ` Michael Tokarev
  2007-01-15 18:13     ` Eric Dumazet
  2007-01-15 20:10     ` Herbert Xu
  2 siblings, 0 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 14:25 UTC (permalink / raw)
  Cc: Herbert Xu, netdev

Michael Tokarev wrote:
[]
> And another thing I noticed.  Right now I'm experimenting with another
> machine, running 2.6.17(.13) - it also shows similar behavior with bad
> csums, but MUCH rarer than this 2.6.19.  Like this:
> 
> 16:29:32.490976 IP (tos 0x60, ttl  48, id 14110, offset 0, flags [DF], length: 80)
>  69.42.67.34.2612 > 81.13.94.6.1234: . [bad tcp cksum f4b4 (->c1cc)!] ack 93407 win 9821
>  <nop,nop,timestamp 1046528199 5497679,nop,nop,sack sack 3 {104991:109335}{110783:112231}{104991:109335} >

This seems to be a tcpdump bug.  At least the same packet(s), on another machine
(in-between the two), with updated tcpdump, shows as having correct checksum.
After updating tcpdump on this machine, I'm not seeing this 'sack bad cksum'
stuff anymore.

/mjt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 13:34   ` Michael Tokarev
  2007-01-15 14:25     ` Michael Tokarev
@ 2007-01-15 18:13     ` Eric Dumazet
  2007-01-15 19:33       ` Michael Tokarev
  2007-01-15 20:10     ` Herbert Xu
  2 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2007-01-15 18:13 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Herbert Xu, netdev

Michael Tokarev wrote:
> 
> Any idea how to force sending FIN-with-data?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
send(fd, data, datalen, 0);
close(fd);


Eric Dumazet


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 18:13     ` Eric Dumazet
@ 2007-01-15 19:33       ` Michael Tokarev
  2007-01-15 23:36         ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 19:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, netdev

Eric Dumazet wrote:
> Michael Tokarev wrote:
>>
>> Any idea how to force sending FIN-with-data?
> 
> int flag_on = 1;
> setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
> send(fd, data, datalen, 0);
> close(fd);

That produces two packets - one (or more - depending on the
size) data packet and one FIN packet w/o any data.

This is the first thing I've tried.

/mjt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 19:33       ` Michael Tokarev
@ 2007-01-15 23:36         ` Eric Dumazet
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2007-01-15 23:36 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Herbert Xu, netdev

Michael Tokarev wrote:
> Eric Dumazet wrote:
>> Michael Tokarev wrote:
>>> Any idea how to force sending FIN-with-data?
>> int flag_on = 1;
>> setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
>> send(fd, data, datalen, 0);
>> close(fd);
> 
> That produces two packets - one (or more - depending on the
> size) data packet and one FIN packet w/o any data.
> 
> This is the first thing I've tried.

This may be because I forgot the shutdown() ?

int flag_on = 1;
setsockopt(fd, SOL_TCP, TCP_CORK, &flag_on, sizeof(int));
send(fd, data, datalen, 0);
shutdown(fd, 1);
close(fd);

At least this is working on my machines (with and without shutdown())
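The fragments above leave out headers and error handling. A minimal, self-contained sketch of the same idea follows; the helper name send_with_fin is hypothetical, and IPPROTO_TCP is used in place of SOL_TCP (they have the same value on Linux). Whether the FIN actually ends up in the same segment as the data depends on the kernel, the amount of data and timing: this thread reports both outcomes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: cork the socket so queued data is held back,
 * queue the data, then shut down the write side so that the FIN is
 * queued while the data may still be unsent.
 *
 * Takes ownership of fd and always closes it.
 * Returns 0 on success, -1 if any step failed.
 */
int send_with_fin(int fd, const void *data, size_t len)
{
    int on = 1;
    int rc = -1;

    assert(data != NULL);

    /* TCP_CORK is Linux-specific. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on)) == 0 &&
        send(fd, data, len, 0) == (ssize_t)len &&
        shutdown(fd, SHUT_WR) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

The caller passes a connected TCP socket and must not use the descriptor afterwards, since the helper closes it on every path.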

Eric


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 13:34   ` Michael Tokarev
  2007-01-15 14:25     ` Michael Tokarev
  2007-01-15 18:13     ` Eric Dumazet
@ 2007-01-15 20:10     ` Herbert Xu
  2007-01-15 21:46       ` Michael Tokarev
  2 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-15 20:10 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev

On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:
> 
> # ethtool -k eth0
> Offload parameters for eth0:
> Cannot get device rx csum settings: Operation not supported
> Cannot get device tx csum settings: Operation not supported
> Cannot get device scatter-gather settings: Operation not supported
> Cannot get device tcp segmentation offload settings: Operation not supported
> no offload info available
> 
> # ethtool -K eth0 rx off tx off tso off
> Cannot set device rx csum settings: Operation not supported
> 
> So I guess the problem is not related to hw checksumming offloading.

Nope, it just means that 8139too doesn't provide ethtool handlers to
disable checksum offloading.

So I suggest that you try doing the tcpdump on the receive side as
that should show the real checksum.

BTW, the reason tcpdump only shows some packets with bogus checksums
is because it cuts packets off at 100 bytes by default so for most
packets it can't verify the checksum at all.  If you run it with
-s 1600 you should see bogus checksums on every packet with payload.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 20:10     ` Herbert Xu
@ 2007-01-15 21:46       ` Michael Tokarev
  2007-01-15 23:35         ` Herbert Xu
  2007-01-16  3:27         ` Herbert Xu
  0 siblings, 2 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-15 21:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

Herbert Xu wrote:
> On Mon, Jan 15, 2007 at 04:34:41PM +0300, Michael Tokarev wrote:
[]
>> So I guess the problem is not related to hw checksumming offloading.
> 
> Nope, it just means that 8139too doesn't provide ethtool handlers to
> disable checksum offloading.
> 
> So I suggest that you try doing the tcpdump on the receive side as
> that should show the real checksum.

I'm doing the capture on an intermediate host - the whole day today ;)

> BTW, the reason tcpdump only shows some packets with bogus checksums
> is because it cuts packets off at 100 bytes by default so for most
> packets it can't verify the checksum at all.  If you run it with
> -s 1600 you should see bogus checksums on every packet with payload.

And I'm capturing with -s 2000.  By the way, tcpdump simply does not
verify the checksum of packets truncated by the capture size.  At
least not the version I'm using (which is 3.9.5).
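The point about truncation can be made concrete: a checksum covers every byte of the segment, so it can only be verified when the capture holds the whole packet. The sketch below is an illustration under that assumption, not tcpdump's code; the function name is hypothetical, and it expects a pointer to the IPv4 header with any link-layer header already stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if the captured bytes cover the whole IPv4 packet, so that
 * a TCP checksum over it could be verified; returns 0 if the capture
 * was truncated (or is too short to hold an IPv4 header at all).
 *
 * ip     points at the IPv4 header (link-layer header not included)
 * caplen is the number of captured bytes starting at ip
 */
int capture_covers_packet(const uint8_t *ip, size_t caplen)
{
    assert(ip != NULL);

    if (caplen < 20)
        return 0;

    /* IPv4 total length: big-endian 16-bit field at offset 2. */
    size_t total = ((size_t)ip[2] << 8) | ip[3];

    return caplen >= total;
}
```

For a 1500-byte packet captured with a 100-byte limit this returns 0, which matches the behaviour described above: the checksum is simply not checked.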

Herbert, the problem IS real; it's not due to improper capturing or
something like that.  Yes, it's difficult to hit, but it is real.

I've saved quite a lot of packets today, but it's all quite.. useless
as the thing is difficult to hit.  Here are some traces made with the
following filter:

 proto TCP and tcp[tcpflags] & (tcp-fin|tcp-push) == (tcp-fin|tcp-push)

(I've chosen FIN+PUSH because this combination is where the problem
is seen most - to be fair, it looks like I haven't seen it with other
flags).
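For reference, the same FIN+PUSH test can be written in C against a raw TCP header. This is a minimal sketch with hypothetical names; the only facts it relies on are that the flags byte sits at offset 13 of the TCP header and that FIN and PSH are bits 0x01 and 0x08.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    TCP_FLAG_FIN = 0x01,
    TCP_FLAG_PSH = 0x08
};

/*
 * Returns 1 if both FIN and PSH are set in the TCP header at tcp,
 * 0 otherwise (including when fewer than 14 bytes are available).
 */
int has_fin_and_push(const uint8_t *tcp, size_t len)
{
    const uint8_t want = TCP_FLAG_FIN | TCP_FLAG_PSH;

    assert(tcp != NULL);

    if (len < 14)
        return 0;

    /* The parentheses matter in C: == binds tighter than &. */
    return (tcp[13] & want) == want;
}
```

A flags byte of 0x19 (FIN, PSH and ACK, as in the packets shown in this thread) matches; 0x18 (PSH and ACK only) does not.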

In there, some packets are ok, but some are not.  So, again, it seems
I was wrong about the 100% "hit ratio", i.e., that the "bad checksum"
ALWAYS occurs when some data goes in a FIN packet.  That is incorrect,
because the trace shows quite a few examples of correct behavior.

The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

(it contains some data which it shouldn't - but I hope there's nothing
confidential in there ;)

So, after the whole day digging around, I still don't have any more-or-less
clean way to reproduce it.  But I've noticed another thing as well: many
different machines here, with different kernels, behave the same way.
So it can't be a hardware problem, for example.

And only in VERY rare cases does the thing cause noticeable transfer slowdowns
or stalls.  But some networks trigger those rare cases more often than others
(so the only more or less sane conclusion I can come to is that it's
somehow timing-related).

Thanks!

/mjt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 21:46       ` Michael Tokarev
@ 2007-01-15 23:35         ` Herbert Xu
  2007-01-16  3:27         ` Herbert Xu
  1 sibling, 0 replies; 32+ messages in thread
From: Herbert Xu @ 2007-01-15 23:35 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev

On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote:
> 
> I'm doing the capture on an intermediate host - the whole day today ;)

Cool, I was just trying to make sure :)
 
> The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

I'll take a look.

Are you using anything extra like netfilter?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-15 21:46       ` Michael Tokarev
  2007-01-15 23:35         ` Herbert Xu
@ 2007-01-16  3:27         ` Herbert Xu
  2007-01-16  3:38           ` Herbert Xu
  1 sibling, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-16  3:27 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev

On Tue, Jan 16, 2007 at 12:46:08AM +0300, Michael Tokarev wrote:
> 
> The trace is here: http://www.corpit.ru/mjt/bad-tcp-cksum-dmp.bin

I'm sorry but this dump does NOT look like it was taken from an
intermediate box.  I verified two bad checksums (chosen randomly)
and they were both correct but partial checksums.  This means that
this dump was most likely taken from the sending host.
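A hedged sketch of what a "partial" checksum looks like, assuming the usual Linux behaviour that, when the final checksum is left to the device or driver, the TCP checksum field is pre-filled with the folded, uncomplemented sum of the pseudo-header only. The function names are hypothetical and the addresses in the usage note come from the documentation ranges, not from the trace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Folded (but NOT complemented) ones'-complement sum of the IPv4 TCP
 * pseudo-header: source address, destination address, protocol number
 * and TCP length.  Addresses are given in host byte order.
 */
uint16_t tcp_pseudo_sum_ipv4(uint32_t saddr, uint32_t daddr, uint16_t tcp_len)
{
    uint32_t sum = 0;

    assert(tcp_len >= 20);          /* a TCP header is at least 20 bytes */

    sum += saddr >> 16;
    sum += saddr & 0xffff;
    sum += daddr >> 16;
    sum += daddr & 0xffff;
    sum += 6;                       /* IP protocol number for TCP */
    sum += tcp_len;

    while (sum >> 16)               /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}

/* Returns 1 if the checksum field holds exactly the pseudo-header sum. */
int looks_like_partial_checksum(uint16_t field, uint32_t saddr,
                                uint32_t daddr, uint16_t tcp_len)
{
    return field == tcp_pseudo_sum_ipv4(saddr, daddr, tcp_len);
}
```

For a 50-byte segment from 192.0.2.1 to 198.51.100.2 the pseudo-header sum is 0xec6f, so a capture showing that value in the checksum field would suggest the packet was seen before checksumming was finished.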

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-16  3:27         ` Herbert Xu
@ 2007-01-16  3:38           ` Herbert Xu
  2007-01-16  8:08             ` Michael Tokarev
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-16  3:38 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev

On Tue, Jan 16, 2007 at 02:27:39PM +1100, Herbert Xu wrote:
> 
> I'm sorry but this dump does NOT look like it was taken from an
> intermediate box.  I verified two bad checksums (chosen randomly)
> and they were both correct but partial checksums.  This means that
> this dump was most likely taken from the sending host.

I did see one strange bit:

02:39:51.758803 IP (tos 0x0, ttl  63, id 41084, offset 0, flags [DF], length: 102)
    192.168.1.1.25 > 81.13.94.6.21350: FP [bad tcp cksum 81b0 (->9ee8)!]
    4271854025:4271854075(50) ack 3772789166 win 272 <nop,nop,timestamp 145420525 6279830>
        0x0000:  4500 0066 a07c 4000 3f06 2a59 c0a8 0101  E..f.|@.?.*Y....
        0x0010:  510d 5e06 0019 5366 fe9f 51c9 e0e0 31ae  Q.^...Sf..Q...1.
        0x0020:  8019 0110 81b0 0000 0101 080a 08aa f0ed  ................
        0x0030:  005f d296 3235 3020 322e 302e 3020 4f6b  ._..250.2.0.0.Ok
        0x0040:  3a20 7175 6575 6564 2061 7320 3631 3345  :.queued.as.613E
        0x0050:  4137 4637 440d 0a32 3231 2032 2e30 2e30  A7F7D..221.2.0.0
        0x0060:  2042 7965 0d0a                           .Bye..

Most of the bad checksums are from 81.13.94.6, which I presume is
the host you were dumping on.  However, this packet is destined
for it instead and yet it too has a partial (but correct) checksum.
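The "bad tcp cksum 81b0 (->9ee8)" annotation on the packet above can be reproduced by hand. The sketch below is a minimal illustration, not tcpdump's actual code: it computes the RFC 1071 ones'-complement checksum over the TCP pseudo-header and segment, skipping the stored checksum field. The function names and the sample_packet array (transcribed from the hex dump above) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 102-byte IPv4 packet from the hex dump above. */
const uint8_t sample_packet[102] = {
    0x45, 0x00, 0x00, 0x66, 0xa0, 0x7c, 0x40, 0x00,
    0x3f, 0x06, 0x2a, 0x59, 0xc0, 0xa8, 0x01, 0x01,
    0x51, 0x0d, 0x5e, 0x06, 0x00, 0x19, 0x53, 0x66,
    0xfe, 0x9f, 0x51, 0xc9, 0xe0, 0xe0, 0x31, 0xae,
    0x80, 0x19, 0x01, 0x10, 0x81, 0xb0, 0x00, 0x00,
    0x01, 0x01, 0x08, 0x0a, 0x08, 0xaa, 0xf0, 0xed,
    0x00, 0x5f, 0xd2, 0x96, 0x32, 0x35, 0x30, 0x20,
    0x32, 0x2e, 0x30, 0x2e, 0x30, 0x20, 0x4f, 0x6b,
    0x3a, 0x20, 0x71, 0x75, 0x65, 0x75, 0x65, 0x64,
    0x20, 0x61, 0x73, 0x20, 0x36, 0x31, 0x33, 0x45,
    0x41, 0x37, 0x46, 0x37, 0x44, 0x0d, 0x0a, 0x32,
    0x32, 0x31, 0x20, 0x32, 0x2e, 0x30, 0x2e, 0x30,
    0x20, 0x42, 0x79, 0x65, 0x0d, 0x0a
};

/* Add a buffer to a running sum as big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold carries back into the low 16 bits. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Expected TCP checksum for the IPv4 packet at ip (len bytes, starting
 * at the IP header).  The stored checksum field is skipped, which is
 * equivalent to treating it as zero.
 */
uint16_t tcp_checksum_ipv4(const uint8_t *ip, size_t len)
{
    assert(ip != NULL && len >= 40);

    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    assert(ihl >= 20 && len >= ihl + 20);

    const uint8_t *tcp = ip + ihl;
    size_t tcp_len = len - ihl;
    uint32_t sum = 0;

    sum = sum16(ip + 12, 8, sum);               /* src and dst address */
    sum += 6;                                   /* protocol: TCP */
    sum += (uint32_t)tcp_len;                   /* TCP length */
    sum = sum16(tcp, 16, sum);                  /* header before checksum */
    sum = sum16(tcp + 18, tcp_len - 18, sum);   /* everything after it */

    return (uint16_t)~fold16(sum);
}
```

Applied to sample_packet, tcp_checksum_ipv4() returns 0x9ee8, the value tcpdump printed after the arrow, while the checksum field in the packet itself holds 0x81b0.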

So the question is where in your network is 192.168.1.1 and how is
your network setup in terms of NAT?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-16  3:38           ` Herbert Xu
@ 2007-01-16  8:08             ` Michael Tokarev
  2007-01-16 11:50               ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-16  8:08 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

Herbert Xu wrote:
> On Tue, Jan 16, 2007 at 02:27:39PM +1100, Herbert Xu wrote:
>> I'm sorry but this dump does NOT look like it was taken from an
>> intermediate box.  I verified two bad checksums (chosen randomly)
>> and they were both correct but partial checksums.  This means that
>> this dump was most likely taken from the sending host.
> 
> I did see one strange bit:
> 
> 02:39:51.758803 IP (tos 0x0, ttl  63, id 41084, offset 0, flags [DF], length: 102)
>     192.168.1.1.25 > 81.13.94.6.21350: FP [bad tcp cksum 81b0 (->9ee8)!]
>     4271854025:4271854075(50) ack 3772789166 win 272 <nop,nop,timestamp 145420525 6279830>
>         0x0000:  4500 0066 a07c 4000 3f06 2a59 c0a8 0101  E..f.|@.?.*Y....
>         0x0010:  510d 5e06 0019 5366 fe9f 51c9 e0e0 31ae  Q.^...Sf..Q...1.
>         0x0020:  8019 0110 81b0 0000 0101 080a 08aa f0ed  ................
>         0x0030:  005f d296 3235 3020 322e 302e 3020 4f6b  ._..250.2.0.0.Ok
>         0x0040:  3a20 7175 6575 6564 2061 7320 3631 3345  :.queued.as.613E
>         0x0050:  4137 4637 440d 0a32 3231 2032 2e30 2e30  A7F7D..221.2.0.0
>         0x0060:  2042 7965 0d0a                           .Bye..
> 
> Most of the bad checksums are from 81.13.94.6, which I presume is
> the host you were dumping on.  However, this packet is destined
> for it instead and yet it too has a partial (but correct) checksum.
> 
> So the question is where in your network is 192.168.1.1 and how is
> your network setup in terms of NAT?

This 192.168.* network is internal, and as for this very packet, I didn't
think it would be there, but.. hum.

The network looks like this:

               internet
                  |          81.13.94.6 etc
             [ router ]  -  [ DMZ ]
                  |
               [ LAN ] 192.168.1.1 etc

The capture was made on the router, on the interface which is
connected to the DMZ segment (so no netfilter stuff should be involved
at all; in any case there's no fancy netfilter setup between the DMZ and
the external interface, and many packets don't even go to conntrack).

81.13.94.6 is a machine in the DMZ segment (it's www.corpit.ru, by the
way).

192.168.1.1 is a machine in LAN.

So the packet you're referring to belongs to a connection between the
internal (LAN) mailserver and a DMZ mailserver, and I didn't intend
to capture *that* traffic.  Most of the packets were between the DMZ
and the external interface.  That is to say, the 192.168.1.1 machine
also has this problem (as I mentioned before, it happens on several
different machines with different kernels, all of them 2.6.19; it
doesn't happen with 2.6.18 or before), but it wasn't the main machine
I did the testing on.

Ok.  Here's another trace, from that remote network that triggers
this thing more or less reliably (every 2nd transfer at least) --
http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
between 216.168.29.244 - the requesting/receiving side -- and
81.13.94.6 -- our sending side (the file being transferred is some
trojan horse I found on a friend's PC, so be careful ;)

The last packet(s) are repeated many times, ad infinitum,
because the receiving side discards incorrectly checksummed packets
and thus never sees the final part of the data.  Here it is as
captured on the router (the one in the diagram above; included in the trace):

10:52:35.702649 IP (tos 0x0, ttl  64, id 61117, offset 0, flags [DF], proto: TCP (6), length: 82)
 81.13.94.6.80 > 216.168.29.244.55354: FP, cksum 0x9185 (incorrect (-> 0x5c56),
 140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 12118000 265951653>

And here it is again, captured on the RECEIVING side (on 216.168.29.244):

07:52:35.816545 IP (tos 0x0, ttl  48, id 61117, offset 0, flags [DF], proto: TCP (6), length: 82)
 81.13.94.6.80 > 216.168.29.244.55354: FP, cksum 0x9185 (incorrect (-> 0x5c56),
 140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 12118000 265951653>

(the only difference in the headers I see is the TTL, which is expected).

The transfer never finishes; it sits at 98% or so.  On the receiving side
(which is running FreeBSD), the "bad checksums" statistics counter increases
with every FP packet.  It also makes no difference whether tcpdump is running
on either side, on an intermediate host, or not at all.

Thanks!

/mjt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-16  8:08             ` Michael Tokarev
@ 2007-01-16 11:50               ` Herbert Xu
  2007-01-16 12:15                 ` Patrick McHardy
  2007-01-17 14:12                 ` Michael Tokarev
  0 siblings, 2 replies; 32+ messages in thread
From: Herbert Xu @ 2007-01-16 11:50 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev, Patrick McHardy

On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
> 
> Ok.  Here's another trace, from that remote network that triggers
> this thing more or less reliably (every 2nd transfer at least) --
> http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
> between 216.168.29.244 - the requesting/receiving side -- and
> 81.13.94.6 -- our sending side (the file being transferred is some
> trojan horse I found on a friend's PC, so be careful ;)

I'll have a look at this tomorrow.

Since you're certain that this is being seen on the wire, one
possibility is that we've got a bug somewhere that's zeroing
skb->ip_summed on a packet with a partial checksum.

One potential spot where this could happen is netfilter.
Patrick, do you know of any recent changes (this is happening
with 2.6.19) that might cause this?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-16 11:50               ` Herbert Xu
@ 2007-01-16 12:15                 ` Patrick McHardy
  2007-01-16 14:38                   ` Michael Tokarev
  2007-01-17 14:12                 ` Michael Tokarev
  1 sibling, 1 reply; 32+ messages in thread
From: Patrick McHardy @ 2007-01-16 12:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Michael Tokarev, netdev

Herbert Xu wrote:
> On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
> 
>>Ok.  Here's another trace, from that remote network that triggers
>>this thing more or less reliably (every 2nd transfer at least) --
>>http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
>>between 216.168.29.244 - the requesting/receiving side -- and
>>81.13.94.6 -- our sending side (the file being transferred is some
>>trojan horse I found on a friend's PC, so be careful ;)
> 
> 
> I'll have a look at this tomorrow.
> 
> Since you're certain that this is being seen on the wire, one
> possibility is that we've got a bug somewhere that's zeroing
> skb->ip_summed on a packet with a partial checksum.
> 
> One potential spot where this could happen is netfilter.
> Patrick, do you know of any recent changes (this is happening
> with 2.6.19) that might cause this?


The incremental HW checksum update stuff went in 2.6.19, so that's
a prime suspect. Can't see where this could be happening though.

Michael, how exactly is netfilter involved in your setup?


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-16 12:15                 ` Patrick McHardy
@ 2007-01-16 14:38                   ` Michael Tokarev
  0 siblings, 0 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-16 14:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Herbert Xu, netdev

Patrick McHardy wrote:
> Herbert Xu wrote:
[]
>> Since you're certain that this is being seen on the wire, one
>> possibility is that we've got a bug somewhere that's zeroing
>> skb->ip_summed on a packet with a partial checksum.
>>
>> One potential spot where this could happen is netfilter.
>> Patrick, do you know of any recent changes (this is happening
>> with 2.6.19) that might cause this?
> 
> The incremental HW checksum update stuff went in 2.6.19, so that's
> a prime suspect. Can't see where this could be happening though.
> 
> Michael, how exactly is netfilter involved in your setup?

I don't think it is involved.

The captures I did were done on a router box, which indeed has some
netfilter stuff.  But:

 1) the capture has been done on an interface directly connected to
   the segment where the "testing" machine is located (not on the
   "external" interface)

 2) the "testing" machine itself does not have any netfilter modules
   loaded

 3) the packets look exactly the same in at least 3 places (modulo
   the TTL values): on the sending machine, on the router (on the
   interface connected to the sending machine - in those 2 places,
   the TTL is the same), and at the receiving side, which is 20+
   hops away.

 4) I tried another machine today (upgraded from 2.6.17 to 2.6.19) -
   stand-alone, without any netfilter modules loaded (but it's under
   quite.. some load - see http://j.ns.dsbl.org/nsg/ -- with this load
   it'll die right after iptables module loading, it's a 600MHz Celeron
   box replying to 15000 DNS packets every second) - it started showing
   the same behavior.

/mjt


* Re: rare bad TCP checksum with 2.6.19?
  2007-01-16 11:50               ` Herbert Xu
  2007-01-16 12:15                 ` Patrick McHardy
@ 2007-01-17 14:12                 ` Michael Tokarev
  2007-01-19 11:06                   ` [PATCH] tcp_output: " Jarek Poplawski
  1 sibling, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-17 14:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, Patrick McHardy

Herbert Xu wrote:
> On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
>> Ok.  Here's another trace, from that remote network that triggers
>> this thing more or less reliably (every 2nd transfer at least) --
>> http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
>> between 216.168.29.244 - the requesting/receiving side -- and
>> 81.13.94.6 -- our sending side (the file being transferred is some
>> trojan horse I found on a friend's PC, so be careful ;)
> 
> I'll have a look at this tomorrow.
> 
> Since you're certain that this is being seen on the wire, one
> possibility is that we've got a bug somewhere that's zeroing
> skb->ip_summed on a packet with a partial checksum.

Here's another sample, which may be more useful.  I've seen quite
a lot of very similar stuff while running tcpdump.

  http://www.corpit.ru/mjt/bad-cksum-session3-dmp.bin

The scenario looks like this.

A client (82.84.172.37 -- a zombie machine trying to send us spam
in this case) connects to port 25 here (81.13.94.6:25).  The SYN+ACK
sequence completes.  Next, our server sends an initial SMTP greeting
message, but almost right after that, the client sends a FIN packet,
WITHOUT acknowledging that it received the (first and only) data
packet.  So some time later our machine re-sends the data, AND adds
the FIN flag to the packet (also replying to the FIN received from the
client).  And *that* packet - the original data packet, modified
to also include FIN - has an incorrect checksum.

So it looks like the checksum isn't being updated WHEN ADDING MORE
FLAGS to the original data packet.
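To show what "updating the checksum when adding a flag" means arithmetically, here is a minimal sketch of the incremental update from RFC 1624 (HC' = ~(~HC + ~m + m')). It is an illustration only and makes no claim about how the kernel recomputes checksums on retransmit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RFC 1624 incremental update: given the old checksum and one 16-bit
 * word of the covered data changing from old_word to new_word, return
 * the new checksum without re-summing the whole packet.
 */
uint16_t csum_update16(uint16_t check, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;

    while (sum >> 16)               /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/*
 * Adding FIN changes the 16-bit word that holds the data offset and the
 * flags (bytes 12-13 of the TCP header); FIN is the lowest bit.
 */
uint16_t csum_after_adding_fin(uint16_t check, uint16_t offset_flags)
{
    assert((offset_flags & 0x0001) == 0);   /* FIN must not be set yet */
    return csum_update16(check, offset_flags, offset_flags | 0x0001);
}
```

For example, a segment whose offset/flags word is 0x8018 (PSH, ACK) with checksum 0x9ee9 ends up with checksum 0x9ee8 once the word becomes 0x8019 (FIN, PSH, ACK). A sender that sets the flag but transmits the old checksum produces exactly the kind of off-by-a-few-bits mismatch a receiver will reject.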

/mjt


* [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-17 14:12                 ` Michael Tokarev
@ 2007-01-19 11:06                   ` Jarek Poplawski
  2007-01-19 12:14                     ` Patrick McHardy
                                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-19 11:06 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu

On 17-01-2007 15:12, Michael Tokarev wrote:
> Herbert Xu wrote:
>> On Tue, Jan 16, 2007 at 11:08:51AM +0300, Michael Tokarev wrote:
>>> Ok.  Here's another trace, from that remote network that triggers
>>> this thing more or less reliably (every 2nd transfer at least) --
>>> http://www.corpit.ru/mjt/bh-bad-cksum-dmp.bin . It's a full session
>>> between 216.168.29.244 - the requesting/receiving side -- and
>>> 81.13.94.6 -- our sending side (the file being transferred is some
>>> trojan horse I found on a friend's PC, so be careful ;)
>> I'll have a look at this tomorrow.
>>
>> Since you're certain that this is being seen on the wire, one
>> possibility is that we've got a bug somewhere that's zeroing
>> skb->ip_summed on a packet with a partial checksum.
> 
> Here's another sample, which may be more useful.  I've seen quite
> a lot of very similar stuff while running tcpdump.
> 
>   http://www.corpit.ru/mjt/bad-cksum-session3-dmp.bin
> 
> The scenario looks like this.
> 
> A client (82.84.172.37 -- a zombie machine trying to send us spam
> in this case) connects to a port 25 here (81.13.94.6:25).  SYN+ACK
> sequence completes.  Next, our server sends an initial SMTP greeting
> message, but almost right after that, the client sends a FIN packet,
> WITHOUT acknowledging that it received the (first and only) data
> packet.  So some time later our machine re-sends the data, AND adds
> FIN flag to the packet (also replying to the FIN received from the
> client).  And *that* packet - original data packet which is modified
> to also include FIN - has incorrect checksum.
> 
> So it looks like the checksum isn't being updated WHEN ADDING MORE
> FLAGS to the original data packet.
> 

Hi,

Here is my patch proposal. If I'm not totally wrong,
there is a possibility that, during collapsing, empty
skb with FIN is added to "normal" packet and changes
its ip_summed field to CHECKSUM_NONE.

Regards,
Jarek P.

PS: probably there are also other possibilities...
---

[PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19

The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from collapsed
skb. This patch reverts this change.   

All substantial work including heavy testing and diagnosing by:
Michael Tokarev <mjt@tls.msk.ru>

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
---

diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
--- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
@@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
 
 		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
 
-		skb->ip_summed = next_skb->ip_summed;
+		if (next_skb->ip_summed == CHECKSUM_PARTIAL)
+			skb->ip_summed = CHECKSUM_PARTIAL;
 
 		if (skb->ip_summed != CHECKSUM_PARTIAL)
 			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
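
A toy model of the decision the hunk above changes, assuming only two
checksum states and ignoring the csum arithmetic entirely. It is a
sketch for illustration, not kernel code: toy_csum_state,
collapse_unconditional, collapse_patched and collapse_demo are
hypothetical names, and the enum values are not claimed to match the
kernel's.

```c
#include <assert.h>

/* Toy stand-ins for the ip_summed states named in the patch. */
enum toy_csum_state {
	TOY_CHECKSUM_NONE,	/* roughly: checksum handled in software */
	TOY_CHECKSUM_PARTIAL	/* roughly: checksum left for the device to finish */
};

/* Behaviour of the removed line: always take the state of next_skb. */
enum toy_csum_state collapse_unconditional(enum toy_csum_state skb,
					   enum toy_csum_state next_skb)
{
	(void)skb;		/* the previous state of skb is simply overwritten */
	return next_skb;
}

/* Behaviour of the added lines: only ever switch to PARTIAL. */
enum toy_csum_state collapse_patched(enum toy_csum_state skb,
				     enum toy_csum_state next_skb)
{
	if (next_skb == TOY_CHECKSUM_PARTIAL)
		return TOY_CHECKSUM_PARTIAL;
	return skb;
}

/* The case from the report: a PARTIAL data skb absorbs a NONE FIN skb. */
void collapse_demo(void)
{
	assert(collapse_unconditional(TOY_CHECKSUM_PARTIAL, TOY_CHECKSUM_NONE)
	       == TOY_CHECKSUM_NONE);		/* state is downgraded */
	assert(collapse_patched(TOY_CHECKSUM_PARTIAL, TOY_CHECKSUM_NONE)
	       == TOY_CHECKSUM_PARTIAL);	/* state is preserved */
}
```

The two functions agree whenever next_skb is PARTIAL or both skbs are
NONE; they differ only in the PARTIAL + NONE case above.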

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 11:06                   ` [PATCH] tcp_output: " Jarek Poplawski
@ 2007-01-19 12:14                     ` Patrick McHardy
  2007-01-19 13:23                       ` Michael Tokarev
  2007-01-19 14:32                       ` Jarek Poplawski
  2007-01-19 13:20                     ` Michael Tokarev
  2007-01-19 21:10                     ` Herbert Xu
  2 siblings, 2 replies; 32+ messages in thread
From: Patrick McHardy @ 2007-01-19 12:14 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Michael Tokarev, netdev, Herbert Xu

Jarek Poplawski wrote:
> Here is my patch proposal. If I'm not totally wrong,
> there is a possibility that, during collapsing, empty
> skb with FIN is added to "normal" packet and changes
> its ip_summed field to CHECKSUM_NONE.
> 
> diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
> --- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
> +++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
> @@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
>  
>  		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
>  
> -		skb->ip_summed = next_skb->ip_summed;
> +		if (next_skb->ip_summed == CHECKSUM_PARTIAL)
> +			skb->ip_summed = CHECKSUM_PARTIAL;
>  
>  		if (skb->ip_summed != CHECKSUM_PARTIAL)
>  			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
> 

I noticed this too, but I can't see how it could lead to
a partial checksum on the wire since the checksumming is
done after changing ip_summed to CHECKSUM_NONE. Is this
patch verified to fix Michael's problem?


* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 12:14                     ` Patrick McHardy
@ 2007-01-19 13:23                       ` Michael Tokarev
  2007-01-19 14:32                       ` Jarek Poplawski
  1 sibling, 0 replies; 32+ messages in thread
From: Michael Tokarev @ 2007-01-19 13:23 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Jarek Poplawski, netdev, Herbert Xu

Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Here is my patch proposal. If I'm not totally wrong,
>> there is a possibility that, during collapsing, empty
>> skb with FIN is added to "normal" packet and changes
>> its ip_summed field to CHECKSUM_NONE.
>>
>> diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
>> --- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
>> +++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
>> @@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
>>  
>>  		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
>>  
>> -		skb->ip_summed = next_skb->ip_summed;
>> +		if (next_skb->ip_summed == CHECKSUM_PARTIAL)
>> +			skb->ip_summed = CHECKSUM_PARTIAL;
>>  
>>  		if (skb->ip_summed != CHECKSUM_PARTIAL)
>>  			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
>>
> 
> I noticed this too, but I can't see how it could lead to
> a partial checksum on the wire since the checksumming is
> done after changing ip_summed to CHECKSUM_NONE. Is this
> patch verified to fix Michael's problem?

It seems to fix "my" problem, yes - at least I can't reproduce it anymore.
Tcpdump is running however - let's see... :)

/mjt

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 12:14                     ` Patrick McHardy
  2007-01-19 13:23                       ` Michael Tokarev
@ 2007-01-19 14:32                       ` Jarek Poplawski
  1 sibling, 0 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-19 14:32 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Michael Tokarev, netdev, Herbert Xu

On Fri, Jan 19, 2007 at 01:14:52PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > Here is my patch proposal. If I'm not totally wrong,
> > there is a possibility that, during collapsing, empty
> > skb with FIN is added to "normal" packet and changes
> > its ip_summed field to CHECKSUM_NONE.
> > 
> > diff -Nurp linux-2.6.19-/net/ipv4/tcp_output.c linux-2.6.19/net/ipv4/tcp_output.c
> > --- linux-2.6.19-/net/ipv4/tcp_output.c	2006-11-29 22:57:37.000000000 +0100
> > +++ linux-2.6.19/net/ipv4/tcp_output.c	2007-01-19 07:58:39.000000000 +0100
> > @@ -1590,7 +1590,8 @@ static void tcp_retrans_try_collapse(str
> >  
> >  		memcpy(skb_put(skb, next_skb_size), next_skb->data, next_skb_size);
> >  
> > -		skb->ip_summed = next_skb->ip_summed;
> > +		if (next_skb->ip_summed == CHECKSUM_PARTIAL)
> > +			skb->ip_summed = CHECKSUM_PARTIAL;
> >  
> >  		if (skb->ip_summed != CHECKSUM_PARTIAL)
> >  			skb->csum = csum_block_add(skb->csum, next_skb->csum, skb_size);
> > 
> 
> I noticed this too, but I can't see how it could lead to
> a partial checksum on the wire since the checksumming is
> done after changing ip_summed to CHECKSUM_NONE. Is this
> patch verified to fix Michael's problem?

No, this was intended as a proposal for testing.
I didn't verify the whole checksum path here, but I
guessed such a change during the summing could matter
(probably for skb_copy_and_csum_dev and maybe earlier)
and I couldn't find a more suspicious change since 2.6.17
near these FINs. But if it really works, it shouldn't be
so hard to verify the mechanism, I hope.

Jarek P.
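
For background on the summing step, here is a minimal sketch of how the
one's-complement sums of two adjacent blocks can be combined, which is
the job csum_block_add() does in the patch context with skb_size as the
offset. It assumes 16-bit folded sums over big-endian words instead of
the kernel's own representation; toy_fold, toy_sum, toy_block_add and
block_add_demo are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
uint16_t toy_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum of a buffer as big-endian words (not inverted). */
uint16_t toy_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum += (uint32_t)(buf[len - 1] << 8);
	return toy_fold(sum);
}

/* Combine the sum of a first block with the sum of a second block that
 * starts `offset` bytes into the whole buffer. At an odd offset the bytes
 * of the second block land in the other half of each word, so its sum is
 * byte-swapped before being added. */
uint16_t toy_block_add(uint16_t a, uint16_t b, size_t offset)
{
	if (offset & 1)
		b = (uint16_t)((b << 8) | (b >> 8));
	return toy_fold((uint32_t)a + b);
}

/* Summing two pieces separately gives the same result as summing the whole. */
void block_add_demo(void)
{
	const uint8_t whole[5] = { 0x01, 0x02, 0x03, 0x04, 0x05 };

	assert(toy_block_add(toy_sum(whole, 3), toy_sum(whole + 3, 2), 3)
	       == toy_sum(whole, 5));	/* odd split point */
	assert(toy_block_add(toy_sum(whole, 2), toy_sum(whole + 2, 3), 2)
	       == toy_sum(whole, 5));	/* even split point */
}
```

This only works if both inputs really are sums of their data, which is
why the ip_summed state of the merged skb matters for this step.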

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 11:06                   ` [PATCH] tcp_output: " Jarek Poplawski
  2007-01-19 12:14                     ` Patrick McHardy
@ 2007-01-19 13:20                     ` Michael Tokarev
  2007-01-19 14:08                       ` Jarek Poplawski
  2007-01-19 21:10                     ` Herbert Xu
  2 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-19 13:20 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, Patrick McHardy, Herbert Xu

Jarek Poplawski wrote:
> On 17-01-2007 15:12, Michael Tokarev wrote:
[]
>> Here's another sample, which may be more useful.  I've seen quite
>> a lot of very similar stuff while running tcpdump.
>>
>>   http://www.corpit.ru/mjt/bad-cksum-session3-dmp.bin
>>
>> The scenario looks like this.
>>
>> A client (82.84.172.37 -- a zombie machine trying to send us spam
>> in this case) connects to a port 25 here (81.13.94.6:25).  SYN+ACK
>> sequence completes.  Next, our server sends an initial SMTP greeting
>> message, but almost right after that, the client sends a FIN packet,
>> WITHOUT acknowledging that it received the (first and only) data
>> packet.  So some time later our machine re-sends the data, AND adds
>> FIN flag to the packet (also replying to the FIN received from the
>> client).  And *that* packet - original data packet which is modified
>> to also include FIN - has incorrect checksum.
>>
>> So it looks like the checksum isn't being updated WHEN ADDING MORE
>> FLAGS to the original data packet.
>>
> 
> Hi,
> 
> Here is my patch proposal. If I'm not totally wrong,
> there is a possibility that, during collapsing, empty
> skb with FIN is added to "normal" packet and changes
> its ip_summed field to CHECKSUM_NONE.
> 
> Regards,
> Jarek P.
> 
> PS: probably there are also other possibilities...

Well..  I just tried it - with this patch applied, no more bad checksums
are shown.  Tried from the network that triggers it most reliably - and
wasn't able to reproduce the bad behavior.

I'm running a tcpdump right now, and so far it only captured a few bad-cksum
packets from other hosts (which are also running 2.6.19 ;)

Thanks Jarek!

/mjt

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 13:20                     ` Michael Tokarev
@ 2007-01-19 14:08                       ` Jarek Poplawski
  2007-01-22  7:13                         ` Jarek Poplawski
  0 siblings, 1 reply; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-19 14:08 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu

On Fri, Jan 19, 2007 at 04:20:01PM +0300, Michael Tokarev wrote:
...
> Well..  I just tried it - with this patch applied, no more bad checksums
> are shown.  Tried from the network that triggers it most reliably - and
> wasn't able to reproduce the bad behavior.
> 
> I'm running a tcpdump right now, and so far it only captured a few bad-cksum
> packets from other hosts (which are also running 2.6.19 ;)
> 
> Thanks Jarek!

You are welcome! But you probably didn't read this
carefully: if it works, you should mainly thank that
other guy...

Btw. I can't remember ever seeing such ferocious
testing!

Cheers,
Jarek P.

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 14:08                       ` Jarek Poplawski
@ 2007-01-22  7:13                         ` Jarek Poplawski
  2007-01-22  7:19                           ` Michael Tokarev
  0 siblings, 1 reply; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22  7:13 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu

On Fri, Jan 19, 2007 at 03:08:20PM +0100, Jarek Poplawski wrote:
...
> You are welcome! But you probably didn't read this with
> attention: if it works, you should thank mainly to that
> other guy...
> 
> Btw. I can't remember I've seen such ferocious testing
> ever!

After checking in the dictionary I found my btw. could
be rather confusing:

> ferocious:
> 1.savagely fierce, as a wild beast, person, action,
>  or aspect; violently cruel: a ferocious beating.
> 2.extreme or intense: a ferocious thirst. 

I only meant #2 - and nothing like #1.
If you were confused - sorry!

Jarek P.

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-22  7:13                         ` Jarek Poplawski
@ 2007-01-22  7:19                           ` Michael Tokarev
  2007-01-22  8:03                             ` Jarek Poplawski
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Tokarev @ 2007-01-22  7:19 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, Patrick McHardy, Herbert Xu

Jarek Poplawski wrote:
> On Fri, Jan 19, 2007 at 03:08:20PM +0100, Jarek Poplawski wrote:
> ...
>> You are welcome! But you probably didn't read this with
>> attention: if it works, you should thank mainly to that
>> other guy...
>>
>> Btw. I can't remember I've seen such ferocious testing
>> ever!
> 
> After checking in the dictionary I found my btw. could
> be rather confusing:
> 
>> ferocious:
>> 1.savagely fierce, as a wild beast, person, action,
>>  or aspect; violently cruel: a ferocious beating.
>> 2.extreme or intense: a ferocious thirst. 
> 
> I've only meant #2 - and nothing like #1.
> If you were confused - sorry!

Heh.

Jarek, thank you for your appreciation of my efforts.

And no, I noticed this statement of yours in the first place
(in the good sense - like #2 above) -- I wanted to comment on
it first, but didn't.

I was only running tcpdump - yes, it was running almost the
whole day, with different options.  I did almost nothing.

You over-estimate my contribution, really ;)

The very good thing is that this bug is now found, and *that*
is what matters.

/mjt

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-22  7:19                           ` Michael Tokarev
@ 2007-01-22  8:03                             ` Jarek Poplawski
  0 siblings, 0 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22  8:03 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: netdev, Patrick McHardy, Herbert Xu

On Mon, Jan 22, 2007 at 10:19:18AM +0300, Michael Tokarev wrote:
...
> I was only running tcpdump - yes, it was running almost the
> whole day, with different options.  I did almost nothing.
> 
> You over-estimate my contribution, really ;)
> 
> The very good thing is that this bug is now found, and *that*
> is what matters.

But consider that 2.6.19 isn't so fresh now, that the
bug was in a very frequently used place, and that it
surely must have caused trouble many times in many
places. I'm not sure whether, without your doing "almost
nothing", this bug would have been diagnosed in the next
year or two, really.

Jarek P.

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 11:06                   ` [PATCH] tcp_output: " Jarek Poplawski
  2007-01-19 12:14                     ` Patrick McHardy
  2007-01-19 13:20                     ` Michael Tokarev
@ 2007-01-19 21:10                     ` Herbert Xu
  2007-01-22  6:52                       ` Jarek Poplawski
  2 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2007-01-19 21:10 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller

On Fri, Jan 19, 2007 at 12:06:41PM +0100, Jarek Poplawski wrote:
> 
> [PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19
> 
> The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
> changed to unconditional copying of ip_summed field from collapsed
> skb. This patch reverts this change.   
> 
> All substantial work including heavy testing and diagnosing by:
> Michael Tokarev <mjt@tls.msk.ru>
> 
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks for catching this! I'll take the credit for adding this bug :)

Dave, we'll need this fix for 2.6.20 as well as 2.6.19.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-19 21:10                     ` Herbert Xu
@ 2007-01-22  6:52                       ` Jarek Poplawski
  2007-01-22  7:45                         ` Herbert Xu
  2007-01-24  6:08                         ` David Miller
  0 siblings, 2 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22  6:52 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller

On Sat, Jan 20, 2007 at 08:10:27AM +1100, Herbert Xu wrote:
> On Fri, Jan 19, 2007 at 12:06:41PM +0100, Jarek Poplawski wrote:
> > 
> > [PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19
> > 
> > The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
> > changed to unconditional copying of ip_summed field from collapsed
> > skb. This patch reverts this change.   
> > 
> > All substantial work including heavy testing and diagnosing by:
> > Michael Tokarev <mjt@tls.msk.ru>
> > 
> > Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Thanks for catching this! I'll take the credit for adding this bug :)
> 
> Dave, we'll need this fix for 2.6.20 as well as 2.6.19.
 
I was so impressed by the amount of work done by Michael
that I magnified his merit and forgot to mention the role
of Patrick and Herbert, particularly here:

> Since you're certain that this is being seen on the wire, one
> possibility is that we've got a bug somewhere that's zeroing
> skb->ip_summed on a packet with a partial checksum.

which pointed exactly to the reason.

So, I apologize to them and, if there is such a possibility,
I would like to ask David Miller to change the description
like this:
---
[PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19

The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from collapsed
skb. This patch reverts this change.

The majority of substantial work including heavy testing
and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
Possible reasons pointed by: Herbert Xu and Patrick McHardy.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
---

Regards,
Jarek P.

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-22  6:52                       ` Jarek Poplawski
@ 2007-01-22  7:45                         ` Herbert Xu
  2007-01-22  8:48                           ` Jarek Poplawski
  2007-01-22 13:46                           ` Patrick McHardy
  2007-01-24  6:08                         ` David Miller
  1 sibling, 2 replies; 32+ messages in thread
From: Herbert Xu @ 2007-01-22  7:45 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller

On Mon, Jan 22, 2007 at 07:52:14AM +0100, Jarek Poplawski wrote:
>  
> I was so impressed by the amount of work done by Michael
> that I magnified his merit and forgot to mention the role
> of Patrick and Herbert, particularly here:

You don't need to be so modest!

While no doubt Patrick helped in tracking this down, my role in
this thread is solely in adding this bug in the first place :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-22  7:45                         ` Herbert Xu
@ 2007-01-22  8:48                           ` Jarek Poplawski
  2007-01-22 13:46                           ` Patrick McHardy
  1 sibling, 0 replies; 32+ messages in thread
From: Jarek Poplawski @ 2007-01-22  8:48 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Michael Tokarev, netdev, Patrick McHardy, David S. Miller

On Mon, Jan 22, 2007 at 06:45:57PM +1100, Herbert Xu wrote:
> On Mon, Jan 22, 2007 at 07:52:14AM +0100, Jarek Poplawski wrote:
> >  
> > I was so impressed by the amount of work done by Michael
> > that I magnified his merit and forgot to mention the role
> > of Patrick and Herbert, particularly here:
> 
> You don't need to be so modest!
> 
> While no doubt Patrick helped in tracking this down, my role in
> this thread is solely in adding this bug in the first place :)

Let's be sincere then: I really spent more time on
thinking why, with this hint of yours and the last
summary of Michael, you hadn't sent this patch yet,
and checking that I wasn't repeating or appropriating your
work, than on finding the place. My only explanation
was: you were overworked with something else and I'm
sure you think the same, now.

Jarek P.

PS: considering the amount of diplomacy relative to
the size of the patch, I think we had better stop
amusing this list here.

* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-22  7:45                         ` Herbert Xu
  2007-01-22  8:48                           ` Jarek Poplawski
@ 2007-01-22 13:46                           ` Patrick McHardy
  1 sibling, 0 replies; 32+ messages in thread
From: Patrick McHardy @ 2007-01-22 13:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Jarek Poplawski, Michael Tokarev, netdev, David S. Miller

Herbert Xu wrote:
> On Mon, Jan 22, 2007 at 07:52:14AM +0100, Jarek Poplawski wrote:
> 
>> 
>>I was so impressed by the amount of work done by Michael
>>that I magnified his merit and forgot to mention the role
>>of Patrick and Herbert, particularly here:
> 
> 
> You don't need to be so modest!
> 
> While no doubt Patrick helped in tracking this down, my role in
> this thread is solely in adding this bug in the first place :)


Actually I only meditated over the code without figuring out
anything useful, but thanks anyway :)


* Re: [PATCH] tcp_output: Re: rare bad TCP checksum with 2.6.19?
  2007-01-22  6:52                       ` Jarek Poplawski
  2007-01-22  7:45                         ` Herbert Xu
@ 2007-01-24  6:08                         ` David Miller
  1 sibling, 0 replies; 32+ messages in thread
From: David Miller @ 2007-01-24  6:08 UTC (permalink / raw)
  To: jarkao2; +Cc: herbert, mjt, netdev, kaber

From: Jarek Poplawski <jarkao2@o2.pl>
Date: Mon, 22 Jan 2007 07:52:14 +0100

> [PATCH][NET] tcp_output: rare bad TCP checksum with 2.6.19
> 
> The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
> changed to unconditional copying of ip_summed field from collapsed
> skb. This patch reverts this change.
> 
> The majority of substantial work including heavy testing
> and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
> Possible reasons pointed by: Herbert Xu and Patrick McHardy.
> 
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
> 
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks a lot everyone.

I'll take care of submitting this to 2.6.19-stable.

Thanks again.

Thread overview: 32+ messages
2007-01-14 22:59 rare bad TCP checksum with 2.6.19? Michael Tokarev
2007-01-15  9:39 ` Herbert Xu
2007-01-15 13:34   ` Michael Tokarev
2007-01-15 14:25     ` Michael Tokarev
2007-01-15 18:13     ` Eric Dumazet
2007-01-15 19:33       ` Michael Tokarev
2007-01-15 23:36         ` Eric Dumazet
2007-01-15 20:10     ` Herbert Xu
2007-01-15 21:46       ` Michael Tokarev
2007-01-15 23:35         ` Herbert Xu
2007-01-16  3:27         ` Herbert Xu
2007-01-16  3:38           ` Herbert Xu
2007-01-16  8:08             ` Michael Tokarev
2007-01-16 11:50               ` Herbert Xu
2007-01-16 12:15                 ` Patrick McHardy
2007-01-16 14:38                   ` Michael Tokarev
2007-01-17 14:12                 ` Michael Tokarev
2007-01-19 11:06                   ` [PATCH] tcp_output: " Jarek Poplawski
2007-01-19 12:14                     ` Patrick McHardy
2007-01-19 13:23                       ` Michael Tokarev
2007-01-19 14:32                       ` Jarek Poplawski
2007-01-19 13:20                     ` Michael Tokarev
2007-01-19 14:08                       ` Jarek Poplawski
2007-01-22  7:13                         ` Jarek Poplawski
2007-01-22  7:19                           ` Michael Tokarev
2007-01-22  8:03                             ` Jarek Poplawski
2007-01-19 21:10                     ` Herbert Xu
2007-01-22  6:52                       ` Jarek Poplawski
2007-01-22  7:45                         ` Herbert Xu
2007-01-22  8:48                           ` Jarek Poplawski
2007-01-22 13:46                           ` Patrick McHardy
2007-01-24  6:08                         ` David Miller
