gcc miscompiles csum_tcpudp_magic() on ARMv5

From: w@1wt.eu (Willy Tarreau)
To: linux-arm-kernel@lists.infradead.org
Subject: gcc miscompiles csum_tcpudp_magic() on ARMv5
Date: Thu, 12 Dec 2013 18:11:08 +0100
Message-ID: <20131212171108.GA2337@1wt.eu>
In-Reply-To: <20131212164748.GS4360@n2100.arm.linux.org.uk>

On Thu, Dec 12, 2013 at 04:47:48PM +0000, Russell King - ARM Linux wrote:
> > Then changing the type of the function argument would probably be safer!
> 
> Actually, I think we can do a bit better with this code.  We really don't
> need much of this messing around here, we can combine some of these steps.
> 
> We have:
> 
> 16-bit protocol in host endian
> 16-bit length in host endian
> 
> and we need to combine them into a 32-bit checksum which is then
> subsequently folded down to 16-bits by adding the top and bottom halves.
> 
> Now, what we can do is this:
> 
> 1. Construct a combined 32-bit protocol and length:
> 
> 	unsigned lenproto = len | proto << 16;
> 
> 2. Pass this into the assembly thusly:
> 
>                 __asm__(
>                 "adds   %0, %1, %2      @ csum_tcpudp_nofold    \n\t"
>                 "adcs   %0, %0, %3                              \n\t"
> #ifdef __ARMEB__
>                 "adcs   %0, %0, %4                              \n\t"
> #else
>                 "adcs   %0, %0, %4, ror #8                      \n\t"
> #endif
>                 "adc    %0, %0, #0"
>                 : "=&r"(sum)
>                 : "r" (sum), "r" (daddr), "r" (saddr), "r" (lenproto)
>                 : "cc");
> 
> with no swabbing at this stage.  Well, where do we get the endian
> conversion?  See that ror #8 - that's a 32-bit rotate by 8 bits.  As
> these are two 16-bit quantities, we end up with this:
> 
> original:
> 	31..24	23..16	15..8	7..0
> 	len_h	len_l	pro_h	pro_l
> 
> accumulated:
> 	31..24	23..16	15..8	7..0
> 	pro_l	len_h	len_l	pro_h
> 
> And now when we fold it down to 16-bit:
> 
> 			15..8	7..0
> 			len_l	pro_h
> 			pro_l	len_h

Amusing, I used the same optimization yesterday when computing a
TCP pseudo-header checksum.
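
For anyone who wants to convince themselves of the rotate trick quoted
above, here is a minimal sketch in portable C. It only checks the
len/proto part on its own (no addresses, no running sum), and the helper
names (ror32, swab16, fold16, lenproto_swab_first, lenproto_rotate) are
made up for this illustration; they are not the kernel's functions.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits (valid for 0 < n < 32). */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
	return (x >> n) | (x << (32 - n));
}

/* Swap the two bytes of a 16-bit word. */
static inline uint16_t swab16(uint16_t x)
{
	return (uint16_t)((x >> 8) | (x << 8));
}

/* Fold a 32-bit sum down to 16 bits with end-around carry. */
static inline uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Reference version: byte-swap each field first, then add and fold. */
static inline uint16_t lenproto_swab_first(uint16_t len, uint16_t proto)
{
	return fold16((uint32_t)swab16(len) + swab16(proto));
}

/* Rotate version: combine both fields, rotate by 8 bits, then fold. */
static inline uint16_t lenproto_rotate(uint16_t len, uint16_t proto)
{
	uint32_t lenproto = (uint32_t)len | ((uint32_t)proto << 16);

	return fold16(ror32(lenproto, 8));
}
```

Both versions should agree for any len and proto, since after the rotate
the two halves add up to exactly swab16(len) + swab16(proto). For
example, len = 1500 and proto = 6 give 0xe205 either way.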

Another thing that can be done to improve the folding of the 16-bit
checksum is to swap the two 16-bit halves, add the swapped word to the
original, and keep only the upper half of the result, which already
contains the carry. At least on x86 I save some cycles doing this:

              31:24  23:16  15:8  7:0
     sum32 =    D      C      B    A

     To fold this down to 16 bits in one step, I just do this:

                   31:24     23:16          15:8  7:0
     sum32           D         C              B    A
  +  sum32swapped    B         A              D    C
  =                 D+B  C+A+carry(B+D/C+A)  B+D  C+A

So just take the upper half of the result and you get the final 16-bit
word at once.

In C, with sum32 being an unsigned 32-bit integer, this is:

       fold16 = (((sum32 >> 16) | (sum32 << 16)) + sum32) >> 16

When the CPU has a rotate instruction, it's fast :-)
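
As a quick sanity check, here is a small sketch comparing the one-liner
above with the usual two-step fold. The function names fold16_twostep
and fold16_rotate are just placeholders for this example, and uint32_t
is assumed for sum32 so that the addition wraps at 32 bits.

```c
#include <stdint.h>

/* Conventional fold: add the two halves, then add the carry back in. */
static inline uint16_t fold16_twostep(uint32_t sum32)
{
	sum32 = (sum32 & 0xffff) + (sum32 >> 16);
	sum32 = (sum32 & 0xffff) + (sum32 >> 16);
	return (uint16_t)sum32;
}

/*
 * Swap-and-add fold: the lower half of the addition produces the carry,
 * which propagates into the upper half where the same two 16-bit values
 * are added again. The upper half is therefore the folded result.
 */
static inline uint16_t fold16_rotate(uint32_t sum32)
{
	uint32_t swapped = (sum32 >> 16) | (sum32 << 16);

	return (uint16_t)((swapped + sum32) >> 16);
}
```

For instance, sum32 = 0x12345678 folds to 0x68ac with both functions,
and sum32 = 0x0001ffff folds to 0x0001, which exercises the end-around
carry.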

Cheers,
Willy
