From: w@1wt.eu (Willy Tarreau)
To: linux-arm-kernel@lists.infradead.org
Subject: gcc miscompiles csum_tcpudp_magic() on ARMv5
Date: Thu, 12 Dec 2013 18:11:08 +0100
Message-ID: <20131212171108.GA2337@1wt.eu>
In-Reply-To: <20131212164748.GS4360@n2100.arm.linux.org.uk>
On Thu, Dec 12, 2013 at 04:47:48PM +0000, Russell King - ARM Linux wrote:
> > Then changing the type of the function argument would probably be safer!
>
> Actually, I think we can do a bit better with this code. We really don't
> need much of this messing around here, we can combine some of these steps.
>
> We have:
>
> 16-bit protocol in host endian
> 16-bit length in host endian
>
> and we need to combine them into a 32-bit checksum which is then
> subsequently folded down to 16-bits by adding the top and bottom halves.
>
> Now, what we can do is this:
>
> 1. Construct a combined 32-bit protocol and length:
>
> unsigned lenproto = len | proto << 16;
>
> 2. Pass this into the assembly thusly:
>
> __asm__(
> "adds %0, %1, %2 @ csum_tcpudp_nofold \n\t"
> "adcs %0, %0, %3 \n\t"
> #ifdef __ARMEB__
> "adcs %0, %0, %4 \n\t"
> #else
> "adcs %0, %0, %4, ror #8 \n\t"
> #endif
> "adc %0, %0, #0"
> : "=&r"(sum)
> : "r" (sum), "r" (daddr), "r" (saddr), "r" (lenprot)
> : "cc");
>
> with no swabbing at this stage. Well, where do we get the endian
> conversion? See that ror #8 - that a 32 bit rotate by 8 bits. As
> these are two 16-bit quantities, we end up with this:
>
> original:
> 31..24 23..16 15..8 7..0
> len_h len_l pro_h pro_l
>
> accumulated:
> 31..24 23..16 15..8 7..0
> pro_l len_h len_l pro_h
>
> And now when we fold it down to 16-bit:
>
> 15..8 7..0
> len_l pro_h
> pro_l len_h

Amusing, I used the same optimization yesterday when computing a
TCP pseudo-header checksum.

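For anyone who wants to convince themselves that the rotate trick quoted
above works, here is a minimal C sketch. The helper names (csum_fold32,
ror8, swab16, ror8_matches_swab) are made up for illustration and are not
the kernel's; it only checks that rotating the combined 32-bit word right
by 8 bits and then folding gives the same 16 bits as folding first and
byte-swapping afterwards:

```c
#include <stdint.h>

/* Fold a 32-bit ones'-complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);  /* absorb the carry of the first add */
    return (uint16_t)sum;
}

/* Rotate right by 8 bits, like the "ror #8" operand modifier. */
static uint32_t ror8(uint32_t x)
{
    return (x >> 8) | (x << 24);
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swab16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Returns 1 when rotate-then-fold matches fold-then-swap for this input. */
static int ror8_matches_swab(uint32_t lenproto)
{
    return csum_fold32(ror8(lenproto)) == swab16(csum_fold32(lenproto));
}
```

This relies on the usual property of the ones'-complement sum that byte
swapping commutes with the addition, so the order of the halves inside the
32-bit word does not matter once it is folded.
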
Another thing that can be done to speed up folding the 32-bit sum down
to 16 bits is to add the sum to a copy of itself with the two 16-bit
halves swapped, and keep only the upper half of the result, which
already includes the carry. At least on x86 this saves me some cycles:

                   31:24  23:16                15:8  7:0
    sum32 =          D      C                    B    A

To fold this into 16 bits in one step, I just do this:

                   31:24  23:16                15:8  7:0
    sum32            D      C                    B    A
  + sum32swapped     B      A                    D    C
  =                 B+D    C+A+carry(B+D/C+A)   B+D  C+A

So just take the upper half of the result and you get the final 16-bit
word at once.

In C this gives:

    fold16 = (((sum32 >> 16) | (sum32 << 16)) + sum32) >> 16

When the CPU has a rotate instruction, it's fast :-)

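Spelled out as a self-contained sketch, with the conventional two-step
fold next to it for comparison. The function names fold16_classic and
fold16_rotate are only illustrative, and the code assumes sum32 is an
unsigned 32-bit value so that the addition wraps modulo 2^32:

```c
#include <stdint.h>

/* Conventional fold: add the halves, then add the carry back in. */
static uint16_t fold16_classic(uint32_t sum32)
{
    sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    sum32 = (sum32 & 0xffff) + (sum32 >> 16);
    return (uint16_t)sum32;
}

/*
 * Swap-and-add fold: the lower half of the addition produces the carry,
 * which propagates into the upper half, so the upper half is already the
 * folded result.
 */
static uint16_t fold16_rotate(uint32_t sum32)
{
    uint32_t swapped = (sum32 >> 16) | (sum32 << 16);  /* rotate by 16 */

    return (uint16_t)((swapped + sum32) >> 16);
}
```

Compilers usually recognise the shift/or pair as a 16-bit rotate, which is
what makes this cheap on CPUs that have one.
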
Cheers,
Willy