* MIPS checksum bug
[not found] <072748C6-07A9-4167-A8A5-80D0F7D9C784@darkforest.org>
@ 2008-09-17 5:15 ` Bryan Phillippe
2008-09-17 10:40 ` Maciej W. Rozycki
0 siblings, 1 reply; 31+ messages in thread
From: Bryan Phillippe @ 2008-09-17 5:15 UTC (permalink / raw)
To: linux-mips
Hello,
It appears that the following problem once reported for Sparc64
affects MIPS/64 as well:
original report: http://www.spinics.net/lists/sparclinux/msg00173.html
resolution: http://www.spinics.net/lists/sparclinux/msg00179.html
The net result is that when TCP fragments unacked segments due to PMTU
discovery, the shortened segment can have a bad TCP csum. I'm testing
on Linux-2.6.26 (FWIW, on a Cavium CN3010). My repro is fairly
simple: MIPS/64 client behind a Linux router, where the Linux router
has an outbound MTU of 1492. When the client attempts to send
segments of size 1460 (1500), the router sends back an ICMP
unreachable/PMTU and the client resends the first portion of the
segment reduced to 1452 (1492), and the segments *often* (but not
always) have a bad TCP csum. Note that you can't have hardware
checksums enabled or the bug is masked.
I've experimented with the following change:
--- /home/bp/tmp/csum_partial.S.orig 2008-09-16 12:01:00.000000000 -0700
+++ arch/mips/lib/csum_partial.S 2008-09-16 11:51:44.000000000 -0700
@@ -281,6 +281,23 @@
.set reorder
/* Add the passed partial csum. */
ADDC(sum, a2)
+
+ /* fold checksum again to clear the high bits before returning */
+ .set push
+ .set noat
+#ifdef USE_DOUBLE
+ dsll32 v1, sum, 0
+ daddu sum, v1
+ sltu v1, sum, v1
+ dsra32 sum, sum, 0
+ addu sum, v1
+#endif
+ sll v1, sum, 16
+ addu sum, v1
+ sltu v1, sum, v1
+ srl sum, sum, 16
+ addu sum, v1
+
jr ra
.set noreorder
END(csum_partial)
and it seems to fix the problem for me. Can you comment?
Thanks,
--
-bp
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 5:15 ` MIPS checksum bug Bryan Phillippe
@ 2008-09-17 10:40 ` Maciej W. Rozycki
2008-09-17 13:23 ` Atsushi Nemoto
2008-09-18 4:43 ` Bryan Phillippe
0 siblings, 2 replies; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-17 10:40 UTC (permalink / raw)
To: Bryan Phillippe; +Cc: linux-mips
On Tue, 16 Sep 2008, Bryan Phillippe wrote:
> I've experimented with the following change:
>
> --- /home/bp/tmp/csum_partial.S.orig 2008-09-16 12:01:00.000000000 -0700
> +++ arch/mips/lib/csum_partial.S 2008-09-16 11:51:44.000000000 -0700
> @@ -281,6 +281,23 @@
> .set reorder
> /* Add the passed partial csum. */
> ADDC(sum, a2)
> +
> + /* fold checksum again to clear the high bits before returning */
> + .set push
> + .set noat
> +#ifdef USE_DOUBLE
> + dsll32 v1, sum, 0
> + daddu sum, v1
> + sltu v1, sum, v1
> + dsra32 sum, sum, 0
> + addu sum, v1
> +#endif
> + sll v1, sum, 16
> + addu sum, v1
> + sltu v1, sum, v1
> + srl sum, sum, 16
> + addu sum, v1
> +
> jr ra
> .set noreorder
> END(csum_partial)
>
> and it seems to fix the problem for me. Can you comment?
It seems obvious that a carry from the bit #15 in the last addition of
the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
folding. However a simpler fix should do as well. Try if the following
patch works for you. Please note this is completely untested and further
optimisation is possible, but I've skipped it in this version for clarity.
Thanks for raising the issue.
Maciej
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
--- a/arch/mips/lib/csum_partial.S 2008-05-05 02:55:23.000000000
+0000
+++ b/arch/mips/lib/csum_partial.S 2008-09-17 10:32:37.000000000
+0000
@@ -253,6 +253,9 @@ LEAF(csum_partial)
1: ADDC(sum, t1)
+ /* Add the passed partial csum. */
+ ADDC(sum, a2)
+
/* fold checksum */
.set push
.set noat
@@ -278,11 +281,8 @@ LEAF(csum_partial)
andi sum, 0xffff
.set pop
1:
- .set reorder
- /* Add the passed partial csum. */
- ADDC(sum, a2)
jr ra
- .set noreorder
+ nop
END(csum_partial)
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 10:40 ` Maciej W. Rozycki
@ 2008-09-17 13:23 ` Atsushi Nemoto
2008-09-17 14:46 ` Maciej W. Rozycki
2008-09-17 22:52 ` Bryan Phillippe
2008-09-18 4:43 ` Bryan Phillippe
1 sibling, 2 replies; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-17 13:23 UTC (permalink / raw)
To: macro; +Cc: u1, linux-mips
On Wed, 17 Sep 2008 11:40:01 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > and it seems to fix the problem for me. Can you comment?
>
> It seems obvious that a carry from the bit #15 in the last addition of
> the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
> folding. However a simpler fix should do as well. Try if the following
> patch works for you. Please note this is completely untested and further
> optimisation is possible, but I've skipped it in this version for clarity.
Well, csum_partial()'s return value is __wsum (32-bit). So strictly
there is no need to folding into 16-bit.
I think this bug was introduced by my commit
ed99e2bc1dc5dc54eb5a019f4975562dbef20103 ("[MIPS] Optimize
csum_partial for 64bit kernel"). This commit changed ADDC to using
daddu for 64-bit kernel and that was wrong for the last addition of
partial csum which should be 32-bit addition.
Here is my proposal fix. Bryan, could you try it too?
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..8b15766 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -60,6 +60,14 @@
ADD sum, v1; \
.set pop
+#define ADDC32(sum,reg) \
+ .set push; \
+ .set noat; \
+ addu sum, reg; \
+ sltu v1, sum, reg; \
+ addu sum, v1; \
+ .set pop
+
#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
LOAD _t0, (offset + UNIT(0))(src); \
LOAD _t1, (offset + UNIT(1))(src); \
@@ -280,7 +288,7 @@ LEAF(csum_partial)
1:
.set reorder
/* Add the passed partial csum. */
- ADDC(sum, a2)
+ ADDC32(sum, a2)
jr ra
.set noreorder
END(csum_partial)
@@ -681,7 +689,7 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
.set pop
1:
.set reorder
- ADDC(sum, psum)
+ ADDC32(sum, psum)
jr ra
.set noreorder
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 13:23 ` Atsushi Nemoto
@ 2008-09-17 14:46 ` Maciej W. Rozycki
2008-09-17 15:27 ` Atsushi Nemoto
2008-09-17 22:52 ` Bryan Phillippe
1 sibling, 1 reply; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-17 14:46 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: u1, linux-mips
On Wed, 17 Sep 2008, Atsushi Nemoto wrote:
> On Wed, 17 Sep 2008 11:40:01 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > > and it seems to fix the problem for me. Can you comment?
> >
> > It seems obvious that a carry from the bit #15 in the last addition of
> > the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
> > folding. However a simpler fix should do as well. Try if the following
> > patch works for you. Please note this is completely untested and further
> > optimisation is possible, but I've skipped it in this version for clarity.
>
> Well, csum_partial()'s return value is __wsum (32-bit). So strictly
> there is no need to folding into 16-bit.
Hmm, what's the purpose of doing the fold in csum_partial() then?
Anyway, having looked at the code again the byte swap at the end means
the passed checksum should be added afterwards, so my proposal is
incorrect. Your suggestion to use 32-bit addition in all cases is
certainly valid.
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 14:46 ` Maciej W. Rozycki
@ 2008-09-17 15:27 ` Atsushi Nemoto
2008-09-17 18:21 ` Maciej W. Rozycki
0 siblings, 1 reply; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-17 15:27 UTC (permalink / raw)
To: macro; +Cc: u1, linux-mips
On Wed, 17 Sep 2008 15:46:56 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org> wrote:
> > Well, csum_partial()'s return value is __wsum (32-bit). So strictly
> > there is no need to folding into 16-bit.
>
> Hmm, what's the purpose of doing the fold in csum_partial() then?
Well, maybe odd-byte handling requires 16-bit holded values?
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 15:27 ` Atsushi Nemoto
@ 2008-09-17 18:21 ` Maciej W. Rozycki
2008-09-18 22:07 ` Ralf Baechle
0 siblings, 1 reply; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-17 18:21 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: u1, linux-mips
On Thu, 18 Sep 2008, Atsushi Nemoto wrote:
> > Hmm, what's the purpose of doing the fold in csum_partial() then?
>
> Well, maybe odd-byte handling requires 16-bit holded values?
It should be enough to swap odd and even bytes in the word in the
unaligned path. Though perhaps extra code to do masking would make it no
shorter/faster than what we have now; however the aligned path would
benefit. Hmm...
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 13:23 ` Atsushi Nemoto
2008-09-17 14:46 ` Maciej W. Rozycki
@ 2008-09-17 22:52 ` Bryan Phillippe
2008-09-18 16:17 ` Atsushi Nemoto
1 sibling, 1 reply; 31+ messages in thread
From: Bryan Phillippe @ 2008-09-17 22:52 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: macro, linux-mips
Hello All,
I tested the simplified patch below (ADDC32), and it does not address
the checksum bug. I suspect the problem is that we're still leaving
the carry bit in the upper 16 bits of the 32 bit csum returned, and
this is resulting in a computed checksum that is 1 greater than it
should be. The upper 16 bits of the return value of this function
must be 0, right?
Thanks,
--
-bp
On Sep 17, 2008, at 6:23 AM, Atsushi Nemoto wrote:
> On Wed, 17 Sep 2008 11:40:01 +0100 (BST), "Maciej W. Rozycki" <macro@linux-mips.org
> > wrote:
>>> and it seems to fix the problem for me. Can you comment?
>>
>> It seems obvious that a carry from the bit #15 in the last addition
>> of
>> the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
>> folding. However a simpler fix should do as well. Try if the
>> following
>> patch works for you. Please note this is completely untested and
>> further
>> optimisation is possible, but I've skipped it in this version for
>> clarity.
>
> Well, csum_partial()'s return value is __wsum (32-bit). So strictly
> there is no need to folding into 16-bit.
>
> I think this bug was introduced by my commit
> ed99e2bc1dc5dc54eb5a019f4975562dbef20103 ("[MIPS] Optimize
> csum_partial for 64bit kernel"). This commit changed ADDC to using
> daddu for 64-bit kernel and that was wrong for the last addition of
> partial csum which should be 32-bit addition.
>
> Here is my proposal fix. Bryan, could you try it too?
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/
> csum_partial.S
> index 8d77841..8b15766 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -60,6 +60,14 @@
> ADD sum, v1; \
> .set pop
>
> +#define ADDC32(sum,reg) \
> + .set push; \
> + .set noat; \
> + addu sum, reg; \
> + sltu v1, sum, reg; \
> + addu sum, v1; \
> + .set pop
> +
> #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
> LOAD _t0, (offset + UNIT(0))(src); \
> LOAD _t1, (offset + UNIT(1))(src); \
> @@ -280,7 +288,7 @@ LEAF(csum_partial)
> 1:
> .set reorder
> /* Add the passed partial csum. */
> - ADDC(sum, a2)
> + ADDC32(sum, a2)
> jr ra
> .set noreorder
> END(csum_partial)
> @@ -681,7 +689,7 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
> .set pop
> 1:
> .set reorder
> - ADDC(sum, psum)
> + ADDC32(sum, psum)
> jr ra
> .set noreorder
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 10:40 ` Maciej W. Rozycki
2008-09-17 13:23 ` Atsushi Nemoto
@ 2008-09-18 4:43 ` Bryan Phillippe
2008-09-18 10:06 ` Maciej W. Rozycki
1 sibling, 1 reply; 31+ messages in thread
From: Bryan Phillippe @ 2008-09-18 4:43 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: linux-mips
FWIW... your patch (below) seems to actually fix the checksum problem
in my testing. What was your concern about it?
--
-bp
On Sep 17, 2008, at 3:40 AM, Maciej W. Rozycki wrote:
> On Tue, 16 Sep 2008, Bryan Phillippe wrote:
>
>> I've experimented with the following change:
>>
>> --- /home/bp/tmp/csum_partial.S.orig 2008-09-16 12:01:00.000000000
>> -0700
>> +++ arch/mips/lib/csum_partial.S 2008-09-16 11:51:44.000000000 -0700
>> @@ -281,6 +281,23 @@
>> .set reorder
>> /* Add the passed partial csum. */
>> ADDC(sum, a2)
>> +
>> + /* fold checksum again to clear the high bits before returning */
>> + .set push
>> + .set noat
>> +#ifdef USE_DOUBLE
>> + dsll32 v1, sum, 0
>> + daddu sum, v1
>> + sltu v1, sum, v1
>> + dsra32 sum, sum, 0
>> + addu sum, v1
>> +#endif
>> + sll v1, sum, 16
>> + addu sum, v1
>> + sltu v1, sum, v1
>> + srl sum, sum, 16
>> + addu sum, v1
>> +
>> jr ra
>> .set noreorder
>> END(csum_partial)
>>
>> and it seems to fix the problem for me. Can you comment?
>
> It seems obvious that a carry from the bit #15 in the last addition of
> the passed checksum -- ADDC(sum, a2) -- will negate the effect of the
> folding. However a simpler fix should do as well. Try if the
> following
> patch works for you. Please note this is completely untested and
> further
> optimisation is possible, but I've skipped it in this version for
> clarity.
>
> Thanks for raising the issue.
>
> Maciej
>
> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
> --- a/arch/mips/lib/csum_partial.S 2008-05-05 02:55:23.000000000
> +0000
> +++ b/arch/mips/lib/csum_partial.S 2008-09-17 10:32:37.000000000
> +0000
> @@ -253,6 +253,9 @@ LEAF(csum_partial)
>
> 1: ADDC(sum, t1)
>
> + /* Add the passed partial csum. */
> + ADDC(sum, a2)
> +
> /* fold checksum */
> .set push
> .set noat
> @@ -278,11 +281,8 @@ LEAF(csum_partial)
> andi sum, 0xffff
> .set pop
> 1:
> - .set reorder
> - /* Add the passed partial csum. */
> - ADDC(sum, a2)
> jr ra
> - .set noreorder
> + nop
> END(csum_partial)
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-18 4:43 ` Bryan Phillippe
@ 2008-09-18 10:06 ` Maciej W. Rozycki
0 siblings, 0 replies; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-18 10:06 UTC (permalink / raw)
To: Bryan Phillippe; +Cc: linux-mips
On Wed, 17 Sep 2008, Bryan Phillippe wrote:
> FWIW... your patch (below) seems to actually fix the checksum problem
> in my testing. What was your concern about it?
For unaligned buffers the passed checksum is added before the result has
been byte-swapped. That is probably not seen too often as the network
stack normally aligns IP packets, but it does not make the change correct.
One possibility with no performance impact to the common aligned case
would be to byte-swap the passed checksum too, but currently I am somewhat
puzzled about the API of the function; specifically as to whether the
checksums passed to and from it are expected to be folded or not. The use
of a 32-bit type does not imply it is valid for the upper 16 bits to be
non-zero in the values passed.
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 22:52 ` Bryan Phillippe
@ 2008-09-18 16:17 ` Atsushi Nemoto
2008-09-19 1:14 ` Ralf Baechle
2008-09-19 15:43 ` Atsushi Nemoto
0 siblings, 2 replies; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-18 16:17 UTC (permalink / raw)
To: u1; +Cc: macro, linux-mips
On Wed, 17 Sep 2008 15:52:45 -0700, Bryan Phillippe <u1@terran.org> wrote:
> I tested the simplified patch below (ADDC32), and it does not address
> the checksum bug. I suspect the problem is that we're still leaving
> the carry bit in the upper 16 bits of the 32 bit csum returned, and
> this is resulting in a computed checksum that is 1 greater than it
> should be. The upper 16 bits of the return value of this function
> must be 0, right?
Thank you for testing. Though this patch did not fixed your problem,
I still have a doubt on 64-bit optimization.
If your hardware could run 32-bit kernel, could you confirm the
problem can happens in 32-bit too or not?
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-17 18:21 ` Maciej W. Rozycki
@ 2008-09-18 22:07 ` Ralf Baechle
2008-09-19 10:12 ` Maciej W. Rozycki
0 siblings, 1 reply; 31+ messages in thread
From: Ralf Baechle @ 2008-09-18 22:07 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Atsushi Nemoto, u1, linux-mips
On Wed, Sep 17, 2008 at 07:21:23PM +0100, Maciej W. Rozycki wrote:
> > > Hmm, what's the purpose of doing the fold in csum_partial() then?
> >
> > Well, maybe odd-byte handling requires 16-bit holded values?
>
> It should be enough to swap odd and even bytes in the word in the
> unaligned path. Though perhaps extra code to do masking would make it no
> shorter/faster than what we have now; however the aligned path would
> benefit. Hmm...
Which is a truely weird operation - but MIPS R2 happens to have a wonderful
instruction for this operation, WSBH / DSBH.
Ralf
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-18 16:17 ` Atsushi Nemoto
@ 2008-09-19 1:14 ` Ralf Baechle
2008-09-19 15:43 ` Atsushi Nemoto
1 sibling, 0 replies; 31+ messages in thread
From: Ralf Baechle @ 2008-09-19 1:14 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: u1, macro, linux-mips
On Fri, Sep 19, 2008 at 01:17:04AM +0900, Atsushi Nemoto wrote:
It seems __csum_partial_copy_user and csum_partial_copy_nocheck were
affected by the same bug. Below a patch which tries to fix the issue.
I've tested it on 64-bit only. I seem to observe that TCP transfers
on my test machine are ramping up to full bandwith somewhat more
slowly than on another machine but there are all sorts of reasons which
make that an unscientific test. Anyway, I'd appreciate if people could
test this on 32-bit and 64-bit machines asap.
Ralf
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..9143a42 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -53,12 +53,14 @@
#define UNIT(unit) ((unit)*NBYTES)
#define ADDC(sum,reg) \
- .set push; \
- .set noat; \
ADD sum, reg; \
sltu v1, sum, reg; \
ADD sum, v1; \
- .set pop
+
+#define ADDC32(sum,reg) \
+ addu sum, reg; \
+ sltu v1, sum, reg; \
+ addu sum, v1; \
#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
LOAD _t0, (offset + UNIT(0))(src); \
@@ -263,24 +265,25 @@ LEAF(csum_partial)
dsra32 sum, sum, 0
addu sum, v1
#endif
- sll v1, sum, 16
- addu sum, v1
- sltu v1, sum, v1
- srl sum, sum, 16
- addu sum, v1
/* odd buffer alignment? */
beqz t7, 1f
nop
- sll v1, sum, 8
+#ifdef CPU_MIPSR2
+ wsbh sum, sum
+#else
+ li v1, 0xff00ff
+ and t0, sum, v1
+ sll t0, t0, 8
srl sum, sum, 8
- or sum, v1
- andi sum, 0xffff
+ and sum, sum, v1
+ or sum, sum, t0
+#endif
.set pop
1:
.set reorder
/* Add the passed partial csum. */
- ADDC(sum, a2)
+ ADDC32(sum, a2)
jr ra
.set noreorder
END(csum_partial)
@@ -665,23 +668,24 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
dsra32 sum, sum, 0
addu sum, v1
#endif
- sll v1, sum, 16
- addu sum, v1
- sltu v1, sum, v1
- srl sum, sum, 16
- addu sum, v1
/* odd buffer alignment? */
beqz odd, 1f
nop
- sll v1, sum, 8
+#ifdef CPU_MIPSR2
+ wsbh sum, sum
+#else
+ li v1, 0xff00ff
+ and t0, sum, v1
+ sll t0, t0, 8
srl sum, sum, 8
- or sum, v1
- andi sum, 0xffff
+ and sum, sum, v1
+ or sum, sum, t0
+#endif
.set pop
1:
.set reorder
- ADDC(sum, psum)
+ ADDC32(sum, psum)
jr ra
.set noreorder
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-18 22:07 ` Ralf Baechle
@ 2008-09-19 10:12 ` Maciej W. Rozycki
2008-09-19 11:23 ` Ralf Baechle
0 siblings, 1 reply; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-19 10:12 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Atsushi Nemoto, u1, linux-mips
On Fri, 19 Sep 2008, Ralf Baechle wrote:
> Which is a truely weird operation - but MIPS R2 happens to have a wonderful
> instruction for this operation, WSBH / DSBH.
Ah, finally a justification for the R2 ISA!
Seriously though, I smell a caller somewhere fails to call csum_fold() on
the result obtained from csum_partial() where it should, so it would be
good to fix the bug rather than trying to cover it. Bryan, would you be
able to track down the caller?
I can see you have done the microoptimisation I had in mind meanwhile --
thanks for saving me the effort. ;) There is a delay slot to fill left
though -- will you take care of it too?
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 10:12 ` Maciej W. Rozycki
@ 2008-09-19 11:23 ` Ralf Baechle
2008-09-19 11:47 ` [PATCH] MIPS checksum fix Ralf Baechle
2008-09-19 12:26 ` MIPS checksum bug Maciej W. Rozycki
0 siblings, 2 replies; 31+ messages in thread
From: Ralf Baechle @ 2008-09-19 11:23 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Atsushi Nemoto, u1, linux-mips
On Fri, Sep 19, 2008 at 11:12:22AM +0100, Maciej W. Rozycki wrote:
> > Which is a truely weird operation - but MIPS R2 happens to have a wonderful
> > instruction for this operation, WSBH / DSBH.
>
> Ah, finally a justification for the R2 ISA!
As you may recall the real reason is that this instruction was designed
not by engineers but patent law. Which is f*cked into the head but ...
> Seriously though, I smell a caller somewhere fails to call csum_fold() on
> the result obtained from csum_partial() where it should, so it would be
> good to fix the bug rather than trying to cover it. Bryan, would you be
> able to track down the caller?
Not quite. Internally the IP stack maintains the checksum as a 32-bit
value for performance sake. It only folds it to 16-bit when it has to.
> I can see you have done the microoptimisation I had in mind meanwhile --
> thanks for saving me the effort. ;) There is a delay slot to fill left
> though -- will you take care of it too?
Will do - just couldn't be bothered (too) late last night ...
Ralf
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH] MIPS checksum fix
2008-09-19 11:23 ` Ralf Baechle
@ 2008-09-19 11:47 ` Ralf Baechle
2008-09-19 12:07 ` Ralf Baechle
2008-09-19 12:26 ` MIPS checksum bug Maciej W. Rozycki
1 sibling, 1 reply; 31+ messages in thread
From: Ralf Baechle @ 2008-09-19 11:47 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Atsushi Nemoto, u1, linux-mips, netdev
On Fri, Sep 19, 2008 at 01:23:04PM +0200, Ralf Baechle wrote:
> > I can see you have done the microoptimisation I had in mind meanwhile --
> > thanks for saving me the effort. ;) There is a delay slot to fill left
> > though -- will you take care of it too?
>
> Will do - just couldn't be bothered (too) late last night ...
Voila.
I'm interested in test reports of this on all sorts of configurations -
32-bit, 64-bit, big / little endian, R2 processors and pre-R2. In
particular Cavium being the only MIPS64 R2 implementation would be
interesting. This definately is stuff which should go upstream for 2.6.27.
Ralf
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..eac0d61 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -53,12 +53,14 @@
#define UNIT(unit) ((unit)*NBYTES)
#define ADDC(sum,reg) \
- .set push; \
- .set noat; \
ADD sum, reg; \
sltu v1, sum, reg; \
ADD sum, v1; \
- .set pop
+
+#define ADDC32(sum,reg) \
+ addu sum, reg; \
+ sltu v1, sum, reg; \
+ addu sum, v1; \
#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
LOAD _t0, (offset + UNIT(0))(src); \
@@ -254,8 +256,6 @@ LEAF(csum_partial)
1: ADDC(sum, t1)
/* fold checksum */
- .set push
- .set noat
#ifdef USE_DOUBLE
dsll32 v1, sum, 0
daddu sum, v1
@@ -263,24 +263,25 @@ LEAF(csum_partial)
dsra32 sum, sum, 0
addu sum, v1
#endif
- sll v1, sum, 16
- addu sum, v1
- sltu v1, sum, v1
- srl sum, sum, 16
- addu sum, v1
/* odd buffer alignment? */
- beqz t7, 1f
- nop
- sll v1, sum, 8
+#ifdef CPU_MIPSR2
+ wsbh sum, sum
+ movn sum, v1, t7
+#else
+ beqz t7, 1f /* odd buffer alignment? */
+ lui v1, 0x00ff
+ addu v1, 0x00ff
+ and t0, sum, v1
+ sll t0, t0, 8
srl sum, sum, 8
- or sum, v1
- andi sum, 0xffff
- .set pop
+ and sum, sum, v1
+ or sum, sum, t0
1:
+#endif
.set reorder
/* Add the passed partial csum. */
- ADDC(sum, a2)
+ ADDC32(sum, a2)
jr ra
.set noreorder
END(csum_partial)
@@ -656,8 +657,6 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
ADDC(sum, t2)
.Ldone:
/* fold checksum */
- .set push
- .set noat
#ifdef USE_DOUBLE
dsll32 v1, sum, 0
daddu sum, v1
@@ -665,23 +664,23 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
dsra32 sum, sum, 0
addu sum, v1
#endif
- sll v1, sum, 16
- addu sum, v1
- sltu v1, sum, v1
- srl sum, sum, 16
- addu sum, v1
- /* odd buffer alignment? */
- beqz odd, 1f
- nop
- sll v1, sum, 8
+#ifdef CPU_MIPSR2
+ wsbh v1, sum
+ movn sum, v1, odd
+#else
+ beqz odd, 1f /* odd buffer alignment? */
+ lui v1, 0x00ff
+ addu v1, 0x00ff
+ and t0, sum, v1
+ sll t0, t0, 8
srl sum, sum, 8
- or sum, v1
- andi sum, 0xffff
- .set pop
+ and sum, sum, v1
+ or sum, sum, t0
1:
+#endif
.set reorder
- ADDC(sum, psum)
+ ADDC32(sum, psum)
jr ra
.set noreorder
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-19 11:47 ` [PATCH] MIPS checksum fix Ralf Baechle
@ 2008-09-19 12:07 ` Ralf Baechle
2008-09-19 12:15 ` Maciej W. Rozycki
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Ralf Baechle @ 2008-09-19 12:07 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Atsushi Nemoto, u1, linux-mips, netdev
On Fri, Sep 19, 2008 at 01:47:43PM +0200, Ralf Baechle wrote:
> I'm interested in test reports of this on all sorts of configurations -
> 32-bit, 64-bit, big / little endian, R2 processors and pre-R2. In
> particular Cavium being the only MIPS64 R2 implementation would be
> interesting. This definately is stuff which should go upstream for 2.6.27.
There was a trivial bug in the R2 code.
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-19 12:07 ` Ralf Baechle
@ 2008-09-19 12:15 ` Maciej W. Rozycki
2008-09-19 14:09 ` Atsushi Nemoto
2008-09-23 21:52 ` Bryan Phillippe
2 siblings, 0 replies; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-19 12:15 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Atsushi Nemoto, u1, linux-mips, netdev
On Fri, 19 Sep 2008, Ralf Baechle wrote:
> + beqz t7, 1f /* odd buffer alignment? */
> + lui v1, 0x00ff
Well, .set reorder to move something from before the branch would have
been a little bit better for the common aligned case. ;) There is nothing
special about branch delay slots in the whole epilogue, so one from just
before the return might simply be relocated here.
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 11:23 ` Ralf Baechle
2008-09-19 11:47 ` [PATCH] MIPS checksum fix Ralf Baechle
@ 2008-09-19 12:26 ` Maciej W. Rozycki
2008-09-19 16:04 ` Bryan Phillippe
1 sibling, 1 reply; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-19 12:26 UTC (permalink / raw)
To: Ralf Baechle; +Cc: Atsushi Nemoto, u1, linux-mips
On Fri, 19 Sep 2008, Ralf Baechle wrote:
> > Seriously though, I smell a caller somewhere fails to call csum_fold() on
> > the result obtained from csum_partial() where it should, so it would be
> > good to fix the bug rather than trying to cover it. Bryan, would you be
> > able to track down the caller?
>
> Not quite. Internally the IP stack maintains the checksum as a 32-bit
> value for performance sake. It only folds it to 16-bit when it has to.
That's been my understanding from my little investigation yesterday
evening, but Bryan's problem has come from somewhere after all and
Atsushi-san's 32-bit addition fix didn't reportedly work while full
folding did, so I have assumed there must be some dependency somewhere
where the final folding does not happen. I have referred to the original
report concerning SPARC64 now and it seems to narrow the problem down to
the 32 MSBs only, so I would prefer to have any confusion cleared.
Bryan, can you please verify whether Ralf's fix works for you or not?
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-19 12:07 ` Ralf Baechle
2008-09-19 12:15 ` Maciej W. Rozycki
@ 2008-09-19 14:09 ` Atsushi Nemoto
2008-09-19 15:02 ` Maciej W. Rozycki
2008-09-20 15:09 ` Ralf Baechle
2008-09-23 21:52 ` Bryan Phillippe
2 siblings, 2 replies; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-19 14:09 UTC (permalink / raw)
To: ralf; +Cc: macro, u1, linux-mips, netdev
On Fri, 19 Sep 2008 14:07:52 +0200, Ralf Baechle <ralf@linux-mips.org> wrote:
> From 97ad23f4696a322cb3bc379a25a8c0f6526751d6 Mon Sep 17 00:00:00 2001
> From: Ralf Baechle <ralf@linux-mips.org>
> Date: Fri, 19 Sep 2008 14:05:53 +0200
> Subject: [PATCH] [MIPS] Fix 64-bit csum_partial, __csum_partial_copy_user and csum_partial_copy
... and __csum_partial_copy_nocheck, you mean? ;)
> On 64-bit machines it wouldn't handle a possible carry when adding the
> 32-bit folded checksum and checksum argument.
>
> While at it, add a few trivial optimizations, also for R2 processors.
I think it would be better splitting bugfix and optimization. This
code is too complex to do many things at a time, isn't it?
> @@ -53,12 +53,14 @@
> #define UNIT(unit) ((unit)*NBYTES)
>
> #define ADDC(sum,reg) \
> - .set push; \
> - .set noat; \
> ADD sum, reg; \
> sltu v1, sum, reg; \
> ADD sum, v1; \
> - .set pop
Is this required? Just a cleanup?
> @@ -254,8 +256,6 @@ LEAF(csum_partial)
> 1: ADDC(sum, t1)
>
> /* fold checksum */
> - .set push
> - .set noat
> #ifdef USE_DOUBLE
> dsll32 v1, sum, 0
> daddu sum, v1
> @@ -263,24 +263,25 @@ LEAF(csum_partial)
> dsra32 sum, sum, 0
> addu sum, v1
> #endif
> - sll v1, sum, 16
> - addu sum, v1
> - sltu v1, sum, v1
> - srl sum, sum, 16
> - addu sum, v1
>
> /* odd buffer alignment? */
> - beqz t7, 1f
> - nop
> - sll v1, sum, 8
> +#ifdef CPU_MIPSR2
> + wsbh v1, sum
> + movn sum, v1, t7
> +#else
> + beqz t7, 1f /* odd buffer alignment? */
> + lui v1, 0x00ff
> + addu v1, 0x00ff
> + and t0, sum, v1
> + sll t0, t0, 8
> srl sum, sum, 8
> - or sum, v1
> - andi sum, 0xffff
> - .set pop
> + and sum, sum, v1
> + or sum, sum, t0
> 1:
> +#endif
Is this just an optimization? or contain any fixes?
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-19 14:09 ` Atsushi Nemoto
@ 2008-09-19 15:02 ` Maciej W. Rozycki
2008-09-20 15:09 ` Ralf Baechle
1 sibling, 0 replies; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-19 15:02 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: ralf, u1, linux-mips, netdev
On Fri, 19 Sep 2008, Atsushi Nemoto wrote:
> I think it would be better splitting bugfix and optimization. This
> code is too complex to do many things at a time, isn't it?
It's probably easier for people to test a single patch now and it can
then be split at the commitment time. Of course if something actually
breaks, then keeping changes combined won't help tracking down the cause.
;)
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-18 16:17 ` Atsushi Nemoto
2008-09-19 1:14 ` Ralf Baechle
@ 2008-09-19 15:43 ` Atsushi Nemoto
2008-09-19 16:09 ` Maciej W. Rozycki
1 sibling, 1 reply; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-19 15:43 UTC (permalink / raw)
To: u1; +Cc: macro, linux-mips
On Fri, 19 Sep 2008 01:17:04 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> Thank you for testing. Though this patch did not fixed your problem,
> I still have a doubt on 64-bit optimization.
>
> If your hardware could run 32-bit kernel, could you confirm the
> problem can happens in 32-bit too or not?
I think I found possible breakage on 64-bit path.
There are some "lw" (and "ulw") used in 64-bit path and they should be
"lwu" (and "ulwu" ... but there is no such pseudo insn) to avoid
sign-extention.
Here is a completely untested patch.
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 8d77841..40f9174 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -39,12 +39,14 @@
#ifdef USE_DOUBLE
#define LOAD ld
+#define LOAD32 lwu
#define ADD daddu
#define NBYTES 8
#else
#define LOAD lw
+#define LOAD32 lw
#define ADD addu
#define NBYTES 4
@@ -60,6 +62,14 @@
ADD sum, v1; \
.set pop
+#define ADDC32(sum,reg) \
+ .set push; \
+ .set noat; \
+ addu sum, reg; \
+ sltu v1, sum, reg; \
+ addu sum, v1; \
+ .set pop
+
#define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
LOAD _t0, (offset + UNIT(0))(src); \
LOAD _t1, (offset + UNIT(1))(src); \
@@ -132,7 +142,7 @@ LEAF(csum_partial)
beqz t8, .Lqword_align
andi t8, src, 0x8
- lw t0, 0x00(src)
+ LOAD32 t0, 0x00(src)
LONG_SUBU a1, a1, 0x4
ADDC(sum, t0)
PTR_ADDU src, src, 0x4
@@ -211,7 +221,7 @@ LEAF(csum_partial)
LONG_SRL t8, t8, 0x2
.Lend_words:
- lw t0, (src)
+ LOAD32 t0, (src)
LONG_SUBU t8, t8, 0x1
ADDC(sum, t0)
.set reorder /* DADDI_WAR */
@@ -229,6 +239,9 @@ LEAF(csum_partial)
/* Still a full word to go */
ulw t1, (src)
+#ifdef USE_DOUBLE
+ add t1, zero /* clear upper 32bit */
+#endif
PTR_ADDIU src, 4
ADDC(sum, t1)
@@ -280,7 +293,7 @@ LEAF(csum_partial)
1:
.set reorder
/* Add the passed partial csum. */
- ADDC(sum, a2)
+ ADDC32(sum, a2)
jr ra
.set noreorder
END(csum_partial)
@@ -681,7 +694,7 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
.set pop
1:
.set reorder
- ADDC(sum, psum)
+ ADDC32(sum, psum)
jr ra
.set noreorder
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 12:26 ` MIPS checksum bug Maciej W. Rozycki
@ 2008-09-19 16:04 ` Bryan Phillippe
0 siblings, 0 replies; 31+ messages in thread
From: Bryan Phillippe @ 2008-09-19 16:04 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, Atsushi Nemoto, linux-mips
On Sep 19, 2008, at 5:26 AM, Maciej W. Rozycki wrote:
> On Fri, 19 Sep 2008, Ralf Baechle wrote:
>
>>> Seriously though, I smell a caller somewhere fails to call
>>> csum_fold() on
>>> the result obtained from csum_partial() where it should, so it
>>> would be
>>> good to fix the bug rather than trying to cover it. Bryan, would
>>> you be
>>> able to track down the caller?
>>
>> Not quite. Internally the IP stack maintains the checksum as a 32-
>> bit
>> value for performance sake. It only folds it to 16-bit when it has
>> to.
>
> That's been my understanding from my little investigation yesterday
> evening, but Bryan's problem has come from somewhere after all and
> Atsushi-san's 32-bit addition fix didn't reportedly work while full
> folding did, so I have assumed there must be some dependency somewhere
> where the final folding does not happen. I have referred to the
> original
> report concerning SPARC64 now and it seems to narrow the problem
> down to
> the 32 MSBs only, so I would prefer to have any confusion cleared.
>
> Bryan, can you please verify whether Ralf's fix works for you or not?
Ralph is correct on the usage of the checksum by the IP stack, and the
original problem report on the Sparc64 list is pretty thorough about
describing how the bug affects the stack.
original report: http://www.spinics.net/lists/sparclinux/msg00173.html
resolution: http://www.spinics.net/lists/sparclinux/msg00179.html
A synopsis is that when TCP resends a portion of a segment whose
checksum was already previously calculated (correctly), it splits the
segment into a new and shortened old segment [see net/ipv4/
tcp_output.c:tcp_fragment "Copy and checksum data tail into the new
buffer" ~ line 737ish]. The new segment has its checksum calculated
via csum_partial_copy_nocheck(), and the shortened old segment has its
checksum calculated via csum_block_sub() (for fast checksum delta).
It is the path through csum_block_sub() where the bug occurs, in which
the output from csum_block_sub() has the upper bits of the csum set
for the carry, and later when tcp_v4_send_check() is called with the
skb->csum, the csum is off-by-one. Obviously all of these need to be
in the non-hardware checksum case.
I can test a patch later today on the Cavium CN3010 64bit; I have been
running with the suboptimal but apparently effective fold-relocation
patch and the problem is completely gone, fwiw.
Also... I was thinking it would be trivial to extract the relevant
bits of the trigger from this repro and just write a small native app
to test the csum generation-and-return; I though about doing it myself
earlier, because I have a tcpdump capture of the segment that was
triggering the bug.
--
-bp
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 15:43 ` Atsushi Nemoto
@ 2008-09-19 16:09 ` Maciej W. Rozycki
2008-09-19 16:35 ` Thiemo Seufer
2008-09-20 0:13 ` Ralf Baechle
0 siblings, 2 replies; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-19 16:09 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: u1, linux-mips
On Sat, 20 Sep 2008, Atsushi Nemoto wrote:
> @@ -229,6 +239,9 @@ LEAF(csum_partial)
>
> /* Still a full word to go */
> ulw t1, (src)
> +#ifdef USE_DOUBLE
> + add t1, zero /* clear upper 32bit */
> +#endif
> PTR_ADDIU src, 4
> ADDC(sum, t1)
>
Unfortunately you can't zero-extend with a single instruction (you can
use a single sll(v) to sign-extend), unless the R2 ISA provides some
suitable oddity (which I haven't checked). You want something like:
dsll32 t1, t1, 0
dsrl32 t1, t1, 0
instead.
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 16:09 ` Maciej W. Rozycki
@ 2008-09-19 16:35 ` Thiemo Seufer
2008-09-19 23:18 ` Maciej W. Rozycki
2008-09-20 0:13 ` Ralf Baechle
1 sibling, 1 reply; 31+ messages in thread
From: Thiemo Seufer @ 2008-09-19 16:35 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Atsushi Nemoto, u1, linux-mips
Maciej W. Rozycki wrote:
> On Sat, 20 Sep 2008, Atsushi Nemoto wrote:
>
> > @@ -229,6 +239,9 @@ LEAF(csum_partial)
> >
> > /* Still a full word to go */
> > ulw t1, (src)
> > +#ifdef USE_DOUBLE
> > + add t1, zero /* clear upper 32bit */
> > +#endif
> > PTR_ADDIU src, 4
> > ADDC(sum, t1)
> >
>
> Unfortunately you can't zero-extend with a single instruction (you can
> use a single sll(v) to sign-extend), unless the R2 ISA provides some
> suitable oddity (which I haven't checked).
AFAIR dext can do this for MIPS64 R2.
Thiemo
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 16:35 ` Thiemo Seufer
@ 2008-09-19 23:18 ` Maciej W. Rozycki
0 siblings, 0 replies; 31+ messages in thread
From: Maciej W. Rozycki @ 2008-09-19 23:18 UTC (permalink / raw)
To: Thiemo Seufer; +Cc: Atsushi Nemoto, u1, linux-mips
On Fri, 19 Sep 2008, Thiemo Seufer wrote:
> > Unfortunately you can't zero-extend with a single instruction (you can
> > use a single sll(v) to sign-extend), unless the R2 ISA provides some
> > suitable oddity (which I haven't checked).
>
> AFAIR dext can do this for MIPS64 R2.
Yeah, I would have thought if anything, it would be something as odd as
that...
Maciej
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-19 16:09 ` Maciej W. Rozycki
2008-09-19 16:35 ` Thiemo Seufer
@ 2008-09-20 0:13 ` Ralf Baechle
2008-09-20 13:45 ` Atsushi Nemoto
1 sibling, 1 reply; 31+ messages in thread
From: Ralf Baechle @ 2008-09-20 0:13 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Atsushi Nemoto, u1, linux-mips
On Fri, Sep 19, 2008 at 05:09:17PM +0100, Maciej W. Rozycki wrote:
> > @@ -229,6 +239,9 @@ LEAF(csum_partial)
> >
> > /* Still a full word to go */
> > ulw t1, (src)
> > +#ifdef USE_DOUBLE
> > + add t1, zero /* clear upper 32bit */
> > +#endif
> > PTR_ADDIU src, 4
> > ADDC(sum, t1)
> >
>
> Unfortunately you can't zero-extend with a single instruction (you can
> use a single sll(v) to sign-extend), unless the R2 ISA provides some
> suitable oddity (which I haven't checked). You want something like:
>
> dsll32 t1, t1, 0
> dsrl32 t1, t1, 0
>
> instead.
For a one's complement checksum it doesn't matter in which of the 4
halfwords the data ends is loaded. So the easiest solution is:
/* Still a full word to go */
ulw t1, (src)
#ifdef USE_DOUBLE
dsll t1, t1, 32 /* clear lower 32bit */
#endif
PTR_ADDIU src, 4
ADDC(sum, t1)
Ralf
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: MIPS checksum bug
2008-09-20 0:13 ` Ralf Baechle
@ 2008-09-20 13:45 ` Atsushi Nemoto
0 siblings, 0 replies; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-20 13:45 UTC (permalink / raw)
To: ralf; +Cc: macro, u1, linux-mips
On Sat, 20 Sep 2008 02:13:44 +0200, Ralf Baechle <ralf@linux-mips.org> wrote:
> > > +#ifdef USE_DOUBLE
> > > + add t1, zero /* clear upper 32bit */
> > > +#endif
> > > PTR_ADDIU src, 4
> > > ADDC(sum, t1)
> > >
> >
> > Unfortunately you can't zero-extend with a single instruction (you can
> > use a single sll(v) to sign-extend), unless the R2 ISA provides some
> > suitable oddity (which I haven't checked). You want something like:
> >
> > dsll32 t1, t1, 0
> > dsrl32 t1, t1, 0
> >
> > instead.
>
> For a one's complement checksum it doesn't matter in which of the 4
> halfwords the data ends is loaded. So the easiest solution is:
>
> /* Still a full word to go */
> ulw t1, (src)
> #ifdef USE_DOUBLE
> dsll t1, t1, 32 /* clear lower 32bit */
> #endif
> PTR_ADDIU src, 4
> ADDC(sum, t1)
Oops, my fault. Thank you all.
---
Atsushi Nemoto
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-19 14:09 ` Atsushi Nemoto
2008-09-19 15:02 ` Maciej W. Rozycki
@ 2008-09-20 15:09 ` Ralf Baechle
1 sibling, 0 replies; 31+ messages in thread
From: Ralf Baechle @ 2008-09-20 15:09 UTC (permalink / raw)
To: Atsushi Nemoto; +Cc: macro, u1, linux-mips, netdev
On Fri, Sep 19, 2008 at 11:09:52PM +0900, Atsushi Nemoto wrote:
> I think it would be better splitting bugfix and optimization. This
> code is too complex to do many things at a time, isn't it?
>
> > @@ -53,12 +53,14 @@
> > #define UNIT(unit) ((unit)*NBYTES)
> >
> > #define ADDC(sum,reg) \
> > - .set push; \
> > - .set noat; \
> > ADD sum, reg; \
> > sltu v1, sum, reg; \
> > ADD sum, v1; \
> > - .set pop
>
> Is this required? Just a cleanup?
It papers over potencially important warnings so had to go. I argue the
caller of ADDC should set noat mode, if at all.
Ralf
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-19 12:07 ` Ralf Baechle
2008-09-19 12:15 ` Maciej W. Rozycki
2008-09-19 14:09 ` Atsushi Nemoto
@ 2008-09-23 21:52 ` Bryan Phillippe
2008-09-23 22:06 ` Ralf Baechle
2008-09-29 15:28 ` Atsushi Nemoto
2 siblings, 2 replies; 31+ messages in thread
From: Bryan Phillippe @ 2008-09-23 21:52 UTC (permalink / raw)
To: Ralf Baechle, Atsushi Nemoto; +Cc: Maciej W. Rozycki, linux-mips, netdev
On Sep 19, 2008, at 5:07 AM, Ralf Baechle wrote:
> On Fri, Sep 19, 2008 at 01:47:43PM +0200, Ralf Baechle wrote:
>
>> I'm interested in test reports of this on all sorts of
>> configurations -
>> 32-bit, 64-bit, big / little endian, R2 processors and pre-R2. In
>> particular Cavium being the only MIPS64 R2 implementation would be
>> interesting. This definately is stuff which should go upstream for
>> 2.6.27.
>
> There was a trivial bug in the R2 code.
>
>> From 97ad23f4696a322cb3bc379a25a8c0f6526751d6 Mon Sep 17 00:00:00
>> 2001
> From: Ralf Baechle <ralf@linux-mips.org>
> Date: Fri, 19 Sep 2008 14:05:53 +0200
> Subject: [PATCH] [MIPS] Fix 64-bit csum_partial,
> __csum_partial_copy_user and csum_partial_copy
>
> On 64-bit machines it wouldn't handle a possible carry when adding the
> 32-bit folded checksum and checksum argument.
>
> While at it, add a few trivial optimizations, also for R2 processors.
I tried this patch (with and without Atsushi's addition, shown below)
on a MIPS64 today and the checksums were all bad (i.e. worse than the
original problem).
Note that I had to manually create the diff, because of "malformed
patch" errors at line 21 (second hunk).
If anyone would like to send me an updated unified diff for this
issue, I can re-test today within the next day.
--
-bp
> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/
> csum_partial.S
> index 8d77841..c77a7a0 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -53,12 +53,14 @@
> #define UNIT(unit) ((unit)*NBYTES)
>
> #define ADDC(sum,reg) \
> - .set push; \
> - .set noat; \
> ADD sum, reg; \
> sltu v1, sum, reg; \
> ADD sum, v1; \
> - .set pop
> +
> +#define ADDC32(sum,reg) \
> + addu sum, reg; \
> + sltu v1, sum, reg; \
> + addu sum, v1; \
>
> #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
> LOAD _t0, (offset + UNIT(0))(src); \
> @@ -254,8 +256,6 @@ LEAF(csum_partial)
> 1: ADDC(sum, t1)
>
> /* fold checksum */
> - .set push
> - .set noat
> #ifdef USE_DOUBLE
> dsll32 v1, sum, 0
> daddu sum, v1
> @@ -263,24 +263,25 @@ LEAF(csum_partial)
> dsra32 sum, sum, 0
> addu sum, v1
> #endif
> - sll v1, sum, 16
> - addu sum, v1
> - sltu v1, sum, v1
> - srl sum, sum, 16
> - addu sum, v1
>
> /* odd buffer alignment? */
> - beqz t7, 1f
> - nop
> - sll v1, sum, 8
> +#ifdef CPU_MIPSR2
> + wsbh v1, sum
> + movn sum, v1, t7
> +#else
> + beqz t7, 1f /* odd buffer alignment? */
> + lui v1, 0x00ff
> + addu v1, 0x00ff
> + and t0, sum, v1
> + sll t0, t0, 8
> srl sum, sum, 8
> - or sum, v1
> - andi sum, 0xffff
> - .set pop
> + and sum, sum, v1
> + or sum, sum, t0
> 1:
> +#endif
> .set reorder
> /* Add the passed partial csum. */
> - ADDC(sum, a2)
> + ADDC32(sum, a2)
> jr ra
> .set noreorder
> END(csum_partial)
> @@ -656,8 +657,6 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
> ADDC(sum, t2)
> .Ldone:
> /* fold checksum */
> - .set push
> - .set noat
> #ifdef USE_DOUBLE
> dsll32 v1, sum, 0
> daddu sum, v1
> @@ -665,23 +664,23 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
> dsra32 sum, sum, 0
> addu sum, v1
> #endif
> - sll v1, sum, 16
> - addu sum, v1
> - sltu v1, sum, v1
> - srl sum, sum, 16
> - addu sum, v1
>
> - /* odd buffer alignment? */
> - beqz odd, 1f
> - nop
> - sll v1, sum, 8
> +#ifdef CPU_MIPSR2
> + wsbh v1, sum
> + movn sum, v1, odd
> +#else
> + beqz odd, 1f /* odd buffer alignment? */
> + lui v1, 0x00ff
> + addu v1, 0x00ff
> + and t0, sum, v1
> + sll t0, t0, 8
> srl sum, sum, 8
> - or sum, v1
> - andi sum, 0xffff
> - .set pop
> + and sum, sum, v1
> + or sum, sum, t0
> 1:
> +#endif
> .set reorder
> - ADDC(sum, psum)
> + ADDC32(sum, psum)
> jr ra
> .set noreorder
>
>
Begin forwarded message:
> From: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
> Date: September 19, 2008 8:43:19 AM PDT
> To: u1@terran.org
> Cc: macro@linux-mips.org, linux-mips@linux-mips.org
> Subject: Re: MIPS checksum bug
>
> On Fri, 19 Sep 2008 01:17:04 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp
> > wrote:
>> Thank you for testing. Though this patch did not fixed your problem,
>> I still have a doubt on 64-bit optimization.
>>
>> If your hardware could run 32-bit kernel, could you confirm the
>> problem can happens in 32-bit too or not?
>
> I think I found possible breakage on 64-bit path.
>
> There are some "lw" (and "ulw") used in 64-bit path and they should be
> "lwu" (and "ulwu" ... but there is no such pseudo insn) to avoid
> sign-extention.
>
> Here is a completely untested patch.
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/
> csum_partial.S
> index 8d77841..40f9174 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -39,12 +39,14 @@
> #ifdef USE_DOUBLE
>
> #define LOAD ld
> +#define LOAD32 lwu
> #define ADD daddu
> #define NBYTES 8
>
> #else
>
> #define LOAD lw
> +#define LOAD32 lw
> #define ADD addu
> #define NBYTES 4
>
> @@ -60,6 +62,14 @@
> ADD sum, v1; \
> .set pop
>
> +#define ADDC32(sum,reg) \
> + .set push; \
> + .set noat; \
> + addu sum, reg; \
> + sltu v1, sum, reg; \
> + addu sum, v1; \
> + .set pop
> +
> #define CSUM_BIGCHUNK1(src, offset, sum, _t0, _t1, _t2, _t3) \
> LOAD _t0, (offset + UNIT(0))(src); \
> LOAD _t1, (offset + UNIT(1))(src); \
> @@ -132,7 +142,7 @@ LEAF(csum_partial)
> beqz t8, .Lqword_align
> andi t8, src, 0x8
>
> - lw t0, 0x00(src)
> + LOAD32 t0, 0x00(src)
> LONG_SUBU a1, a1, 0x4
> ADDC(sum, t0)
> PTR_ADDU src, src, 0x4
> @@ -211,7 +221,7 @@ LEAF(csum_partial)
> LONG_SRL t8, t8, 0x2
>
> .Lend_words:
> - lw t0, (src)
> + LOAD32 t0, (src)
> LONG_SUBU t8, t8, 0x1
> ADDC(sum, t0)
> .set reorder /* DADDI_WAR */
> @@ -229,6 +239,9 @@ LEAF(csum_partial)
>
> /* Still a full word to go */
> ulw t1, (src)
> +#ifdef USE_DOUBLE
> + add t1, zero /* clear upper 32bit */
> +#endif
> PTR_ADDIU src, 4
> ADDC(sum, t1)
>
> @@ -280,7 +293,7 @@ LEAF(csum_partial)
> 1:
> .set reorder
> /* Add the passed partial csum. */
> - ADDC(sum, a2)
> + ADDC32(sum, a2)
> jr ra
> .set noreorder
> END(csum_partial)
> @@ -681,7 +694,7 @@ EXC( sb t0, NBYTES-2(dst), .Ls_exc)
> .set pop
> 1:
> .set reorder
> - ADDC(sum, psum)
> + ADDC32(sum, psum)
> jr ra
> .set noreorder
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-23 21:52 ` Bryan Phillippe
@ 2008-09-23 22:06 ` Ralf Baechle
2008-09-29 15:28 ` Atsushi Nemoto
1 sibling, 0 replies; 31+ messages in thread
From: Ralf Baechle @ 2008-09-23 22:06 UTC (permalink / raw)
To: Bryan Phillippe; +Cc: Atsushi Nemoto, Maciej W. Rozycki, linux-mips, netdev
On Tue, Sep 23, 2008 at 02:52:24PM -0700, Bryan Phillippe wrote:
> If anyone would like to send me an updated unified diff for this issue, I
> can re-test today within the next day.
An updated fix is already in the lmo and kernel.org git repositories.
Ralf
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] MIPS checksum fix
2008-09-23 21:52 ` Bryan Phillippe
2008-09-23 22:06 ` Ralf Baechle
@ 2008-09-29 15:28 ` Atsushi Nemoto
1 sibling, 0 replies; 31+ messages in thread
From: Atsushi Nemoto @ 2008-09-29 15:28 UTC (permalink / raw)
To: u1; +Cc: ralf, macro, linux-mips, netdev
On Tue, 23 Sep 2008 14:52:24 -0700, Bryan Phillippe <u1@terran.org> wrote:
>
> I tried this patch (with and without Atsushi's addition, shown below)
> on a MIPS64 today and the checksums were all bad (i.e. worse than the
> original problem).
>
> Note that I had to manually create the diff, because of "malformed
> patch" errors at line 21 (second hunk).
>
> If anyone would like to send me an updated unified diff for this
> issue, I can re-test today within the next day.
I suppose your problem is still not fixed, right?
If so, could you try this patch? With this patch, checksums is always
compared with a result of "reference" implementation. If any mismatch
was found, please report the log.
arch/mips/lib/Makefile | 1 +
arch/mips/lib/csum_partial.S | 4 +-
arch/mips/lib/csum_partial_test.c | 109 +++++++++++++++++++++++++++++++++++++
3 files changed, 112 insertions(+), 2 deletions(-)
diff --git a/arch/mips/lib/Makefile b/arch/mips/lib/Makefile
index 8810dfb..cb56e19 100644
--- a/arch/mips/lib/Makefile
+++ b/arch/mips/lib/Makefile
@@ -29,3 +29,4 @@ obj-$(CONFIG_CPU_VR41XX) += dump_tlb.o
# libgcc-style stuff needed in the kernel
obj-y += ashldi3.o ashrdi3.o cmpdi2.o lshrdi3.o ucmpdi2.o
+obj-y += csum_partial_test.o
diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index edac989..22e9767 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -101,7 +101,7 @@
.text
.set noreorder
.align 5
-LEAF(csum_partial)
+LEAF(asm_csum_partial)
move sum, zero
move t7, zero
@@ -296,7 +296,7 @@ LEAF(csum_partial)
ADDC32(sum, a2)
jr ra
.set noreorder
- END(csum_partial)
+ END(asm_csum_partial)
/*
diff --git a/arch/mips/lib/csum_partial_test.c b/arch/mips/lib/csum_partial_test.c
new file mode 100644
index 0000000..4a4887e
--- /dev/null
+++ b/arch/mips/lib/csum_partial_test.c
@@ -0,0 +1,109 @@
+#include <net/checksum.h>
+
+static inline __sum16 ref_csum_fold(__wsum csum)
+{
+ u32 sum = (__force u32)csum;
+ sum = (sum & 0xffff) + (sum >> 16);
+ sum = (sum & 0xffff) + (sum >> 16);
+ return (__force __sum16)~sum;
+}
+
+static inline unsigned short from32to16(unsigned int x)
+{
+ /* add up 16-bit and 16-bit for 16+c bit */
+ x = (x & 0xffff) + (x >> 16);
+ /* add up carry.. */
+ x = (x & 0xffff) + (x >> 16);
+ return x;
+}
+
+static unsigned int do_csum(const unsigned char * buff, int len)
+{
+ int odd, count;
+ unsigned int result = 0;
+
+ if (len <= 0)
+ goto out;
+ odd = 1 & (unsigned long) buff;
+ if (odd) {
+#ifdef __BIG_ENDIAN
+ result = *buff;
+#else
+ result = *buff << 8;
+#endif
+ len--;
+ buff++;
+ }
+ count = len >> 1; /* nr of 16-bit words.. */
+ if (count) {
+ if (2 & (unsigned long) buff) {
+ result += *(unsigned short *) buff;
+ count--;
+ len -= 2;
+ buff += 2;
+ }
+ count >>= 1; /* nr of 32-bit words.. */
+ if (count) {
+ unsigned int carry = 0;
+ do {
+ unsigned int w = *(unsigned int *) buff;
+ count--;
+ buff += 4;
+ result += carry;
+ result += w;
+ carry = (w > result);
+ } while (count);
+ result += carry;
+ result = (result & 0xffff) + (result >> 16);
+ }
+ if (len & 2) {
+ result += *(unsigned short *) buff;
+ buff += 2;
+ }
+ }
+ if (len & 1) {
+#ifdef __BIG_ENDIAN
+ result += (*buff << 8);
+#else
+ result += *buff;
+#endif
+ }
+ result = from32to16(result);
+ if (odd)
+ result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
+out:
+ return result;
+}
+
+static __wsum ref_csum_partial(const void *buff, int len, __wsum sum)
+{
+ unsigned int result = do_csum(buff, len);
+
+ /* add in old sum, and carry.. */
+ result += (__force u32)sum;
+ if ((__force u32)sum > result)
+ result += 1;
+ return (__force __wsum)result;
+}
+
+__wsum asm_csum_partial(const void *buff, int len, __wsum sum);
+
+__wsum csum_partial(const void *buff, int len, __wsum sum)
+{
+ __wsum ref_wsum = ref_csum_partial(buff, len, sum);
+ __sum16 ref_sum = ref_csum_fold(ref_wsum);
+ __wsum cal_wsum = asm_csum_partial(buff, len, sum);
+ __sum16 cal_sum = csum_fold(cal_wsum);
+ if (ref_sum != cal_sum) {
+ int i;
+ printk("csum_partial error."
+ " %#04x(%#08x) != %#04x(%#08x)\n",
+ ref_sum, ref_wsum, cal_sum, cal_wsum);
+ printk("len %#04x, sum %#08x\n", len, sum);
+ printk("buf");
+ for (i = 0; i < len; i++)
+ printk(" %#02x", *((const u8 *)buff + i));
+ printk("\n");
+ }
+ return ref_wsum;
+}
^ permalink raw reply related [flat|nested] 31+ messages in thread
end of thread, other threads:[~2008-09-29 15:29 UTC | newest]
Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <072748C6-07A9-4167-A8A5-80D0F7D9C784@darkforest.org>
2008-09-17 5:15 ` MIPS checksum bug Bryan Phillippe
2008-09-17 10:40 ` Maciej W. Rozycki
2008-09-17 13:23 ` Atsushi Nemoto
2008-09-17 14:46 ` Maciej W. Rozycki
2008-09-17 15:27 ` Atsushi Nemoto
2008-09-17 18:21 ` Maciej W. Rozycki
2008-09-18 22:07 ` Ralf Baechle
2008-09-19 10:12 ` Maciej W. Rozycki
2008-09-19 11:23 ` Ralf Baechle
2008-09-19 11:47 ` [PATCH] MIPS checksum fix Ralf Baechle
2008-09-19 12:07 ` Ralf Baechle
2008-09-19 12:15 ` Maciej W. Rozycki
2008-09-19 14:09 ` Atsushi Nemoto
2008-09-19 15:02 ` Maciej W. Rozycki
2008-09-20 15:09 ` Ralf Baechle
2008-09-23 21:52 ` Bryan Phillippe
2008-09-23 22:06 ` Ralf Baechle
2008-09-29 15:28 ` Atsushi Nemoto
2008-09-19 12:26 ` MIPS checksum bug Maciej W. Rozycki
2008-09-19 16:04 ` Bryan Phillippe
2008-09-17 22:52 ` Bryan Phillippe
2008-09-18 16:17 ` Atsushi Nemoto
2008-09-19 1:14 ` Ralf Baechle
2008-09-19 15:43 ` Atsushi Nemoto
2008-09-19 16:09 ` Maciej W. Rozycki
2008-09-19 16:35 ` Thiemo Seufer
2008-09-19 23:18 ` Maciej W. Rozycki
2008-09-20 0:13 ` Ralf Baechle
2008-09-20 13:45 ` Atsushi Nemoto
2008-09-18 4:43 ` Bryan Phillippe
2008-09-18 10:06 ` Maciej W. Rozycki
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.