* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
@ 2006-06-04 5:52 ` Richard Braun
2006-06-04 14:32 ` Samuel Thibault
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Richard Braun @ 2006-06-04 5:52 UTC (permalink / raw)
To: sparclinux
[-- Attachment #1: Type: text/plain, Size: 3668 bytes --]
Hello,
Some time ago, I sent a message about bad TCP checkums in some packets.
I've been able to get more free time and track the bug. Here is the story :
The bug is triggered by tcp_fragment() in net/ipv4/tcp_output.c (I also
had problems with IPv6, but have not yet checked that the origin of the
problem is the same - it probably is). The code involved is :
if (!skb_shinfo(skb)->nr_frags && skb->ip_summed != CHECKSUM_HW) {
/* Copy and checksum data tail into the new buffer. */
buff->csum = csum_partial_copy_nocheck(skb->data + len, skb_put(buff, nsize), nsize, 0);
skb_trim(skb, len);
skb->csum = csum_block_sub(skb->csum, buff->csum, len);
}
As many Sun machines have NICs that support hardware TCP/UDP
checksumming, you really need to force linux to use the software
implementation. One way to trigger the bug is to transfer a large amount
of data between two hosts in a LAN, and in the middle of the process,
lower the MTU of the sparc interface. tcp_fragment() should be called.
Depending on the content of the packets, you may or may not get a bad
checksum (let's say it's somehow pseudo-random). When grabbing packets
with tcpdump, don't forget to use -s 0, so that tcpdump gets all the
content of each packet, in order to calculate and check the TCP checksum.
When csum_block_sub() is called, it calls in turn csum_sub() and
csum_add(). In csum_sub(), the second parameter sent to csum_block_sub()
is complemented before being sent to csum_add() (in one's complement
arithmetic, subtracting is adding the complement). But on sparc64, the
csum_partial_xxx functions returns 16 bits words (actually 32 bits words
so that there is room for a carry bit). The value returned by
csum_block_sub() may have the 16 MSB bits set. Later, it is used in
tcp_v4_send_check() (net/ipv4/tcp_ipv4.c), in csum_partial() :
th->check = tcp_v4_check(th, len, inet->saddr, inet->daddr,
csum_partial((char *)th,
th->doff << 2,
skb->csum));
After lots of checks, it seems that the value returned by csum_partial()
is causing the problem. When skb->csum has the 16 MSB bits set,
csum_partial() forgets to add a carry bit. E.g. :
skb->csum = 0xffffabcd
tcphdr_csum = 0xdcba
skb->csum + tcphdr_csum = 0x100008887
The MSB bit is the 33rd bit and is apparently silently ignored.
The partial checksum is then 1 less than it should be. When returning
the complement of that partial checksum, the value is then 1 more than
it should be.
Here is a tcpdump output from my last message :
13:14:30.397874 > 0800 1416: IP (tos 0x8, ttl 63, id 26859, offset 0, flags
[DF], length: 1400) 10.0.0.2.22 > 84.96.34.158.59002: . [bad tcp cksum 696d
(->696c)!] 3016044:3017392(1348) ack 7729 win 6788 <nop,nop,timestamp 34000990
1795720>
You can clearly see that the checksum calculated by tcpdump is 1 more than
the packet checksum.
When studying the sparc64-specific csum_partial() assembly function
(arch/sparc64/lib/checksum.S), I noticed that a 64 bits register is used
to compute the checksum of the given data. But when adding the sum
parameter to that computed value, the code directly folds from 32 bits
to 16 bits. This is where the carry bit is lost. Unfortunately, I'm not
skilled enough in sparc64 assembly language to provide a functional patch
(my tests included a C version of csum_partial()) but I guess it won't
be difficult for any hacker around (say David Miller :-)) to fix this.
After all, it was only about a carry bit. Just a little bit...
Hope it helps :-).
--
Richard Braun
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
2006-06-04 5:52 ` Richard Braun
@ 2006-06-04 14:32 ` Samuel Thibault
2006-06-04 14:34 ` Samuel Thibault
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Samuel Thibault @ 2006-06-04 14:32 UTC (permalink / raw)
To: sparclinux
Hi,
Syn wrote:
> The value returned by csum_block_sub() may have the 16 MSB bits set.
> th->check = tcp_v4_check(th, len, inet->saddr, inet->daddr,
> csum_partial((char *)th,
> th->doff << 2,
> skb->csum));
There indeed seems to be an issue in csum_partial: quoting beginning and
end of csum_partial in checksum.S:
.align 32
.globl csum_partial
csum_partial: /* %o0=buff, %o1=len, %o2=sum */
/* skb->csum is in %o2 */
... a lot of computing that doesn't involve %o2 ...
1: add %o2, %o4, %o2
csum_partial_finish:
retl
mov %o2, %o0
The last add doesn't take care of the potential carry into the 33th
bit. I guess something like
1: add %o2, %o4, %o2
srlx %o2, 32, %o4
srl %o2, 0, %o2
add %o2, %o4, %o2
csum_partial_finish:
retl
mov %o2, %o0
should be correct.
Samuel
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
2006-06-04 5:52 ` Richard Braun
2006-06-04 14:32 ` Samuel Thibault
@ 2006-06-04 14:34 ` Samuel Thibault
2006-06-05 4:05 ` David Miller
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Samuel Thibault @ 2006-06-04 14:34 UTC (permalink / raw)
To: sparclinux
Samuel Thibault, le Sun 04 Jun 2006 16:32:15 +0200, a écrit :
> The last add doesn't take care of the potential carry into the 33th
> bit.
(which _do_ happen when the csum has its 16 MSB set, as Richard pointed
out).
Samuel
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
` (2 preceding siblings ...)
2006-06-04 14:34 ` Samuel Thibault
@ 2006-06-05 4:05 ` David Miller
2006-06-05 4:32 ` David Miller
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2006-06-05 4:05 UTC (permalink / raw)
To: sparclinux
From: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date: Sun, 4 Jun 2006 16:34:38 +0200
[ BTW: linux-sparc doesn't exist, the list name is sparclinux.
Samuel, what in your mail client caused it to change
from sparclinux@vger to linux-sparc@vger in your replies? ]
> Samuel Thibault, le Sun 04 Jun 2006 16:32:15 +0200, a écrit :
> > The last add doesn't take care of the potential carry into the 33th
> > bit.
>
> (which _do_ happen when the csum has its 16 MSB set, as Richard pointed
> out).
Thanks for all of the excellent detective work guys.
This patch below should fix things up.
Let me know if it works for you.
diff --git a/arch/sparc64/lib/checksum.S b/arch/sparc64/lib/checksum.S
index ba9cd3c..1d230f6 100644
--- a/arch/sparc64/lib/checksum.S
+++ b/arch/sparc64/lib/checksum.S
@@ -165,8 +165,9 @@ csum_partial_end_cruft:
sll %g1, 8, %g1
or %o5, %g1, %o4
-1: add %o2, %o4, %o2
+1: addcc %o2, %o4, %o2
+ addc %g0, %o2, %o2
csum_partial_finish:
retl
- mov %o2, %o0
+ srl %o2, 0, %o0
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
` (3 preceding siblings ...)
2006-06-05 4:05 ` David Miller
@ 2006-06-05 4:32 ` David Miller
2006-06-05 12:05 ` Richard Braun
2006-06-05 18:19 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2006-06-05 4:32 UTC (permalink / raw)
To: sparclinux
From: David Miller <davem@davemloft.net>
Date: Sun, 04 Jun 2006 21:05:14 -0700 (PDT)
> Let me know if it works for you.
It turns out the checksum+copy code had the same exact problem,
so let's kill that bug too.
Here is the current patch:
diff-tree ae5de0ff0bc24664a053109c6caa782ba2ad7c53 (from 672c6108a51bf559d19595d9f8193dfd81f0f752)
Author: David S. Miller <davem@sunset.davemloft.net>
Date: Sun Jun 4 21:32:01 2006 -0700
[SPARC64]: Fix missing fold at end of checksums.
Both csum_partial() and the csum_partial_copy*() family of routines
forget to do a final fold on the computed checksum value on sparc64.
So do the standard Sparc "add + set condition codes, add carry"
sequence, then make sure the high 32-bits of the return value are
clear.
Based upon some excellent detective work and debugging done by
Richard Braun and Samuel Thibault.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/lib/checksum.S b/arch/sparc64/lib/checksum.S
index ba9cd3c..1d230f6 100644
--- a/arch/sparc64/lib/checksum.S
+++ b/arch/sparc64/lib/checksum.S
@@ -165,8 +165,9 @@ csum_partial_end_cruft:
sll %g1, 8, %g1
or %o5, %g1, %o4
-1: add %o2, %o4, %o2
+1: addcc %o2, %o4, %o2
+ addc %g0, %o2, %o2
csum_partial_finish:
retl
- mov %o2, %o0
+ srl %o2, 0, %o0
diff --git a/arch/sparc64/lib/csum_copy.S b/arch/sparc64/lib/csum_copy.S
index 71af488..e566c77 100644
--- a/arch/sparc64/lib/csum_copy.S
+++ b/arch/sparc64/lib/csum_copy.S
@@ -221,11 +221,12 @@ FUNC_NAME: /* %o0=src, %o1=dst, %o2=len
sll %g1, 8, %g1
or %o5, %g1, %o4
-1: add %o3, %o4, %o3
+1: addcc %o3, %o4, %o3
+ addc %g0, %o3, %o3
70:
retl
- mov %o3, %o0
+ srl %o3, 0, %o0
95: mov 0, GLOBAL_SPARE
brlez,pn %o2, 4f
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
` (4 preceding siblings ...)
2006-06-05 4:32 ` David Miller
@ 2006-06-05 12:05 ` Richard Braun
2006-06-05 18:19 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Richard Braun @ 2006-06-05 12:05 UTC (permalink / raw)
To: sparclinux
[-- Attachment #1: Type: text/plain, Size: 240 bytes --]
On Sun, Jun 04, 2006 at 09:32:46PM -0700, David Miller wrote:
> > Let me know if it works for you.
>
> Here is the current patch:
I've applied the latest patch today, and didn't get any trouble since.
Thanks.
--
Richard Braun
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: Checksum problem on sparc64
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
` (5 preceding siblings ...)
2006-06-05 12:05 ` Richard Braun
@ 2006-06-05 18:19 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2006-06-05 18:19 UTC (permalink / raw)
To: sparclinux
From: Richard Braun <syn@sceen.net>
Date: Mon, 5 Jun 2006 14:05:28 +0200
> On Sun, Jun 04, 2006 at 09:32:46PM -0700, David Miller wrote:
> > > Let me know if it works for you.
> >
> > Here is the current patch:
>
> I've applied the latest patch today, and didn't get any trouble since.
> Thanks.
Thanks for testing.
^ permalink raw reply [flat|nested] 8+ messages in thread