All of lore.kernel.org
 help / color / mirror / Atom feed
* Checksum problem on sparc64
@ 2006-03-23 22:35 Richard Braun
  2006-06-04  5:52 ` Richard Braun
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Richard Braun @ 2006-03-23 22:35 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 2304 bytes --]

Hello,

I'm using kernel 2.6.15.6 on my Ultra1 (vanilla kernel without modification).
My network setup is a bit weird since I'm using a PPPoE link with MTU 1400 (I
have packet losses if using a higher MTU because of the link quality). My NAT
router is a 2.6.15.6 kernel too running on i386. The problems I experience
were already present when I used 2.6.12.6 kernel. Now here is the problem :
I have a SSH server on my sparc64 box, and port 22 on the public interface
is redirected to that host. When downloading with sftp for example, everything
runs fine until some point. Using tcpdump to discover what was going on,
I captured these packets (on public interface) :

13:14:30.359867 > 0800 1416: IP (tos 0x8, ttl  63, id 26858, offset 0, flags
[DF], length: 1400) 213.41.243.68.22 > 84.96.34.158.59002: . [tcp sum ok]
3703221:3704569(1348) ack 9968 win 6788 <nop,nop,timestamp 34000986 1795720>

13:14:30.397874 > 0800 1416: IP (tos 0x8, ttl  63, id 26859, offset 0, flags
[DF], length: 1400) 10.0.0.2.22 > 84.96.34.158.59002: . [bad tcp cksum 696d
(->696c)!] 3016044:3017392(1348) ack 7729 win 6788 <nop,nop,timestamp 34000990
1795720>

Packets were similar on the LAN interface with the exception of NAT for the
first packet :

13:14:30.359816 08:00:20:86:c9:98 > 00:90:27:71:ed:ae, ethertype IPv4 (0x0800),
length 1414: IP (tos 0x8, ttl  64, id 26858, offset 0, flags [DF], length:
1400) 10.0.0.2.22 > 84.96.34.158.59002: . [tcp sum ok] 3703221:3704569(1348)
ack 9872 win 6788 <nop,nop,timestamp 34000986 1795720>

13:14:30.397818 08:00:20:86:c9:98 > 00:90:27:71:ed:ae, ethertype IPv4 (0x0800),
length 1414: IP (tos 0x8, ttl  64, id 26859, offset 0, flags [DF], length:
1400) 10.0.0.2.22 > 84.96.34.158.59002: . [bad tcp cksum 696d (->696c)!]
3643729:3645077(1348) ack 9872 win 6788 <nop,nop,timestamp 34000990 1795720>

I'm not sure why the tcp checksum is bad. It could be a hardware problem or
a kernel bug. Therefore I decided to report this. I think I have similar
problems with IPv6, but I have not investigated yet. The problem looks quite
random for now, but I can easily reproduce it (I begin a download over sftp,
wait a few seconds/minutes, then the rate drops).

Hope it helps (and sorry for previous private queries btw).

-- 
Richard Braun

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
@ 2006-06-04  5:52 ` Richard Braun
  2006-06-04 14:32 ` Samuel Thibault
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Richard Braun @ 2006-06-04  5:52 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 3668 bytes --]

Hello,

Some time ago, I sent a message about bad TCP checkums in some packets.
I've been able to get more free time and track the bug. Here is the story :

The bug is triggered by tcp_fragment() in net/ipv4/tcp_output.c (I also
had problems with IPv6, but have not yet checked that the origin of the
problem is the same - it probably is). The code involved is :

if (!skb_shinfo(skb)->nr_frags && skb->ip_summed != CHECKSUM_HW) {
	/* Copy and checksum data tail into the new buffer. */
	buff->csum = csum_partial_copy_nocheck(skb->data + len, skb_put(buff, nsize), nsize, 0);

	skb_trim(skb, len);

	skb->csum = csum_block_sub(skb->csum, buff->csum, len);
}

As many Sun machines have NICs that support hardware TCP/UDP
checksumming, you really need to force linux to use the software
implementation. One way to trigger the bug is to transfer a large amount
of data between two hosts in a LAN, and in the middle of the process,
lower the MTU of the sparc interface. tcp_fragment() should be called.
Depending on the content of the packets, you may or may not get a bad
checksum (let's say it's somehow pseudo-random). When grabbing packets
with tcpdump, don't forget to use -s 0, so that tcpdump gets all the
content of each packet, in order to calculate and check the TCP checksum.

When csum_block_sub() is called, it calls in turn csum_sub() and
csum_add(). In csum_sub(), the second parameter sent to csum_block_sub()
is complemented before being sent to csum_add() (in one's complement
arithmetic, subtracting is adding the complement). But on sparc64, the
csum_partial_xxx functions returns 16 bits words (actually 32 bits words
so that there is room for a carry bit). The value returned by
csum_block_sub() may have the 16 MSB bits set. Later, it is used in
tcp_v4_send_check() (net/ipv4/tcp_ipv4.c), in csum_partial() :
th->check = tcp_v4_check(th, len, inet->saddr, inet->daddr,
                         csum_partial((char *)th,
                                      th->doff << 2,
                                      skb->csum));

After lots of checks, it seems that the value returned by csum_partial()
is causing the problem. When skb->csum has the 16 MSB bits set,
csum_partial() forgets to add a carry bit. E.g. :
skb->csum = 0xffffabcd
tcphdr_csum = 0xdcba
skb->csum + tcphdr_csum = 0x100008887

The MSB bit is the 33rd bit and is apparently silently ignored.
The partial checksum is then 1 less than it should be. When returning
the complement of that partial checksum, the value is then 1 more than
it should be.

Here is a tcpdump output from my last message :

13:14:30.397874 > 0800 1416: IP (tos 0x8, ttl  63, id 26859, offset 0, flags
[DF], length: 1400) 10.0.0.2.22 > 84.96.34.158.59002: . [bad tcp cksum 696d
(->696c)!] 3016044:3017392(1348) ack 7729 win 6788 <nop,nop,timestamp 34000990
1795720>

You can clearly see that the checksum calculated by tcpdump is 1 more than
the packet checksum.

When studying the sparc64-specific csum_partial() assembly function
(arch/sparc64/lib/checksum.S), I noticed that a 64 bits register is used
to compute the checksum of the given data. But when adding the sum
parameter to that computed value, the code directly folds from 32 bits
to 16 bits. This is where the carry bit is lost. Unfortunately, I'm not
skilled enough in sparc64 assembly language to provide a functional patch
(my tests included a C version of csum_partial()) but I guess it won't
be difficult for any hacker around (say David Miller :-)) to fix this.
After all, it was only about a carry bit. Just a little bit...

Hope it helps :-).

-- 
Richard Braun

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
  2006-06-04  5:52 ` Richard Braun
@ 2006-06-04 14:32 ` Samuel Thibault
  2006-06-04 14:34 ` Samuel Thibault
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Thibault @ 2006-06-04 14:32 UTC (permalink / raw)
  To: sparclinux

Hi,

Syn wrote:
> The value returned by csum_block_sub() may have the 16 MSB bits set.
> th->check = tcp_v4_check(th, len, inet->saddr, inet->daddr,
>                         csum_partial((char *)th,
>                                      th->doff << 2,
>                                      skb->csum));

There indeed seems to be an issue in csum_partial: quoting beginning and
end of csum_partial in checksum.S:

        .align          32
        .globl          csum_partial
csum_partial:           /* %o0=buff, %o1=len, %o2=sum */
	/* skb->csum is in %o2 */

 	 ... a lot of computing that doesn't involve %o2 ...

1:      add             %o2, %o4, %o2

csum_partial_finish:
        retl
         mov            %o2, %o0

The last add doesn't take care of the potential carry into the 33th
bit. I guess something like

1:      add             %o2, %o4, %o2
	srlx		%o2, 32, %o4
	srl		%o2, 0, %o2
	add		%o2, %o4, %o2

csum_partial_finish:
        retl
         mov            %o2, %o0

should be correct.

Samuel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
  2006-06-04  5:52 ` Richard Braun
  2006-06-04 14:32 ` Samuel Thibault
@ 2006-06-04 14:34 ` Samuel Thibault
  2006-06-05  4:05 ` David Miller
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Thibault @ 2006-06-04 14:34 UTC (permalink / raw)
  To: sparclinux

Samuel Thibault, le Sun 04 Jun 2006 16:32:15 +0200, a écrit :
> The last add doesn't take care of the potential carry into the 33th
> bit.

(which _do_ happen when the csum has its 16 MSB set, as Richard pointed
out).

Samuel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
                   ` (2 preceding siblings ...)
  2006-06-04 14:34 ` Samuel Thibault
@ 2006-06-05  4:05 ` David Miller
  2006-06-05  4:32 ` David Miller
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2006-06-05  4:05 UTC (permalink / raw)
  To: sparclinux

From: Samuel Thibault <samuel.thibault@ens-lyon.org>
Date: Sun, 4 Jun 2006 16:34:38 +0200

[ BTW: linux-sparc doesn't exist, the list name is sparclinux.
       Samuel, what in your mail client caused it to change
       from sparclinux@vger to linux-sparc@vger in your replies? ]

> Samuel Thibault, le Sun 04 Jun 2006 16:32:15 +0200, a écrit :
> > The last add doesn't take care of the potential carry into the 33th
> > bit.
> 
> (which _do_ happen when the csum has its 16 MSB set, as Richard pointed
> out).

Thanks for all of the excellent detective work guys.

This patch below should fix things up.

Let me know if it works for you.

diff --git a/arch/sparc64/lib/checksum.S b/arch/sparc64/lib/checksum.S
index ba9cd3c..1d230f6 100644
--- a/arch/sparc64/lib/checksum.S
+++ b/arch/sparc64/lib/checksum.S
@@ -165,8 +165,9 @@ csum_partial_end_cruft:
 	sll		%g1, 8, %g1
 	or		%o5, %g1, %o4
 
-1:	add		%o2, %o4, %o2
+1:	addcc		%o2, %o4, %o2
+	addc		%g0, %o2, %o2
 
 csum_partial_finish:
 	retl
-	 mov		%o2, %o0
+	 srl		%o2, 0, %o0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
                   ` (3 preceding siblings ...)
  2006-06-05  4:05 ` David Miller
@ 2006-06-05  4:32 ` David Miller
  2006-06-05 12:05 ` Richard Braun
  2006-06-05 18:19 ` David Miller
  6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2006-06-05  4:32 UTC (permalink / raw)
  To: sparclinux

From: David Miller <davem@davemloft.net>
Date: Sun, 04 Jun 2006 21:05:14 -0700 (PDT)

> Let me know if it works for you.

It turns out the checksum+copy code had the same exact problem,
so let's kill that bug too.

Here is the current patch:

diff-tree ae5de0ff0bc24664a053109c6caa782ba2ad7c53 (from 672c6108a51bf559d19595d9f8193dfd81f0f752)
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Sun Jun 4 21:32:01 2006 -0700

    [SPARC64]: Fix missing fold at end of checksums.
    
    Both csum_partial() and the csum_partial_copy*() family of routines
    forget to do a final fold on the computed checksum value on sparc64.
    So do the standard Sparc "add + set condition codes, add carry"
    sequence, then make sure the high 32-bits of the return value are
    clear.
    
    Based upon some excellent detective work and debugging done by
    Richard Braun and Samuel Thibault.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/sparc64/lib/checksum.S b/arch/sparc64/lib/checksum.S
index ba9cd3c..1d230f6 100644
--- a/arch/sparc64/lib/checksum.S
+++ b/arch/sparc64/lib/checksum.S
@@ -165,8 +165,9 @@ csum_partial_end_cruft:
 	sll		%g1, 8, %g1
 	or		%o5, %g1, %o4
 
-1:	add		%o2, %o4, %o2
+1:	addcc		%o2, %o4, %o2
+	addc		%g0, %o2, %o2
 
 csum_partial_finish:
 	retl
-	 mov		%o2, %o0
+	 srl		%o2, 0, %o0
diff --git a/arch/sparc64/lib/csum_copy.S b/arch/sparc64/lib/csum_copy.S
index 71af488..e566c77 100644
--- a/arch/sparc64/lib/csum_copy.S
+++ b/arch/sparc64/lib/csum_copy.S
@@ -221,11 +221,12 @@ FUNC_NAME:		/* %o0=src, %o1=dst, %o2=len
 	sll		%g1, 8, %g1
 	or		%o5, %g1, %o4
 
-1:	add		%o3, %o4, %o3
+1:	addcc		%o3, %o4, %o3
+	addc		%g0, %o3, %o3
 
 70:
 	retl
-	 mov		%o3, %o0
+	 srl		%o3, 0, %o0
 
 95:	mov		0, GLOBAL_SPARE
 	brlez,pn	%o2, 4f

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
                   ` (4 preceding siblings ...)
  2006-06-05  4:32 ` David Miller
@ 2006-06-05 12:05 ` Richard Braun
  2006-06-05 18:19 ` David Miller
  6 siblings, 0 replies; 8+ messages in thread
From: Richard Braun @ 2006-06-05 12:05 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 240 bytes --]

On Sun, Jun 04, 2006 at 09:32:46PM -0700, David Miller wrote:
> > Let me know if it works for you.
> 
> Here is the current patch:

I've applied the latest patch today, and didn't get any trouble since.
Thanks.

-- 
Richard Braun

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Checksum problem on sparc64
  2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
                   ` (5 preceding siblings ...)
  2006-06-05 12:05 ` Richard Braun
@ 2006-06-05 18:19 ` David Miller
  6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2006-06-05 18:19 UTC (permalink / raw)
  To: sparclinux

From: Richard Braun <syn@sceen.net>
Date: Mon, 5 Jun 2006 14:05:28 +0200

> On Sun, Jun 04, 2006 at 09:32:46PM -0700, David Miller wrote:
> > > Let me know if it works for you.
> > 
> > Here is the current patch:
> 
> I've applied the latest patch today, and didn't get any trouble since.
> Thanks.

Thanks for testing.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-06-05 18:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-23 22:35 Checksum problem on sparc64 Richard Braun
2006-06-04  5:52 ` Richard Braun
2006-06-04 14:32 ` Samuel Thibault
2006-06-04 14:34 ` Samuel Thibault
2006-06-05  4:05 ` David Miller
2006-06-05  4:32 ` David Miller
2006-06-05 12:05 ` Richard Braun
2006-06-05 18:19 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.