From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Scott Wood <scottwood@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
netdev@vger.kernel.org
Subject: Re: [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()
Date: Mon, 29 Feb 2016 13:53:07 +0100
Message-ID: <56D43F33.4030702@c-s.fr>
In-Reply-To: <1445571040.701.149.camel@freescale.com>
On 23/10/2015 05:30, Scott Wood wrote:
> On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
>> r5 already contains the value to be updated, so let's use r5 all the
>> way through for that. It makes the code more readable.
>>
>> To avoid confusion, it is better to use adde instead of addc.
>>
>> The first addition is useless. Its only purpose is to clear carry.
>> As r4 is a signed int that is always positive, this can be done by
>> using srawi instead of srwi
>>
>> Let's also remove the comment about bdnz having no overhead, as it
>> is not correct on all powerpc; it is not on MPC8xx at least.
>>
>> In the last part, in our situation, the remaining quantity of bytes
>> to be processed is between 0 and 3. Therefore, we can base that part
>> on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
>> then proceeding with comparisons and subtractions.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>> arch/powerpc/lib/checksum_32.S | 37 +++++++++++++++++--------------------
>> 1 file changed, 17 insertions(+), 20 deletions(-)
> Do you have benchmarks for these optimizations?
>
> -Scott
Using mftbl() to read the timebase just before and just after the call
to csum_partial(), I get the following on an MPC885:
* 78-byte packets: 9% faster (11.5 to 10.4 tb ticks)
* 328-byte packets: 3% faster (47.9 to 46.5 tb ticks)
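
For illustration only, the measurement described above could be sketched
roughly as follows. This is not the code that produced the figures: the
tb_read_fn and work_fn callbacks are hypothetical stand-ins for mftbl() and
for the csum_partial() call, and avg_ticks() and percent_faster() are names
invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: on the target, read_tb would wrap mftbl() and
 * work would wrap the csum_partial() call being measured. */
typedef uint32_t (*tb_read_fn)(void);
typedef void (*work_fn)(void);

/* Average number of timebase ticks spent in work() over iters runs. */
static double avg_ticks(tb_read_fn read_tb, work_fn work, unsigned int iters)
{
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i < iters; i++) {
		uint32_t start = read_tb();

		work();
		/* Unsigned subtraction stays correct across one wraparound. */
		total += (uint32_t)(read_tb() - start);
	}
	return iters ? (double)total / iters : 0.0;
}

/* Relative gain, in percent, when going from 'before' to 'after' ticks. */
static double percent_faster(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

With the tick counts quoted above, percent_faster(11.5, 10.4) is a little
over 9.5 and percent_faster(47.9, 46.5) is a little over 2.9, consistent
with the rounded 9% and 3%.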
Christophe
>
>> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
>> index 3472372..9c12602 100644
>> --- a/arch/powerpc/lib/checksum_32.S
>> +++ b/arch/powerpc/lib/checksum_32.S
>> @@ -27,35 +27,32 @@
>> * csum_partial(buff, len, sum)
>> */
>> _GLOBAL(csum_partial)
>> - addic r0,r5,0
>> subi r3,r3,4
>> - srwi. r6,r4,2
>> + srawi. r6,r4,2 /* Divide len by 4 and also clear carry */
>> beq 3f /* if we're doing < 4 bytes */
>> - andi. r5,r3,2 /* Align buffer to longword boundary */
>> + andi. r0,r3,2 /* Align buffer to longword boundary */
>> beq+ 1f
>> - lhz r5,4(r3) /* do 2 bytes to get aligned */
>> - addi r3,r3,2
>> + lhz r0,4(r3) /* do 2 bytes to get aligned */
>> subi r4,r4,2
>> - addc r0,r0,r5
>> + addi r3,r3,2
>> srwi. r6,r4,2 /* # words to do */
>> + adde r5,r5,r0
>> beq 3f
>> 1: mtctr r6
>> -2: lwzu r5,4(r3) /* the bdnz has zero overhead, so it should */
>> - adde r0,r0,r5 /* be unnecessary to unroll this loop */
>> +2: lwzu r0,4(r3)
>> + adde r5,r5,r0
>> bdnz 2b
>> - andi. r4,r4,3
>> -3: cmpwi 0,r4,2
>> - blt+ 4f
>> - lhz r5,4(r3)
>> +3: andi. r0,r4,2
>> + beq+ 4f
>> + lhz r0,4(r3)
>> addi r3,r3,2
>> - subi r4,r4,2
>> - adde r0,r0,r5
>> -4: cmpwi 0,r4,1
>> - bne+ 5f
>> - lbz r5,4(r3)
>> - slwi r5,r5,8 /* Upper byte of word */
>> - adde r0,r0,r5
>> -5: addze r3,r0 /* add in final carry */
>> + adde r5,r5,r0
>> +4: andi. r0,r4,1
>> + beq+ 5f
>> + lbz r0,4(r3)
>> + slwi r0,r0,8 /* Upper byte of word */
>> + adde r5,r5,r0
>> +5: addze r3,r5 /* add in final carry */
>> blr
>>
>> /*
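
As a reading aid, the tail handling in the quoted patch can be modelled in
portable C. This is a hedged sketch, not the kernel implementation: the names
csum_partial_model(), add_carry() and fold16() are invented here, the 2-byte
alignment prologue is left out, bytes are read big-endian as on powerpc32, and
the carry is folded in after each addition instead of being carried through
an adde chain, so only the folded 16-bit result is meant to be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit addition with end-around carry (stands in for adde ... addze). */
static uint32_t add_carry(uint32_t sum, uint32_t v)
{
	uint64_t t = (uint64_t)sum + v;

	return (uint32_t)t + (uint32_t)(t >> 32);
}

/* Hypothetical C model of the word loop and of the new tail logic. */
static uint32_t csum_partial_model(const uint8_t *buf, uint32_t len,
				   uint32_t sum)
{
	uint32_t words = len >> 2;	/* srawi. r6,r4,2 */

	assert(buf != NULL || len == 0);

	while (words--) {		/* lwzu / adde / bdnz loop */
		uint32_t w = (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
			     (uint32_t)buf[2] << 8 | buf[3];

		sum = add_carry(sum, w);
		buf += 4;
	}
	if (len & 2) {			/* andi. r0,r4,2: bit 30 of r4 */
		sum = add_carry(sum, (uint32_t)buf[0] << 8 | buf[1]);
		buf += 2;
	}
	if (len & 1)			/* andi. r0,r4,1: bit 31 of r4 */
		sum = add_carry(sum, (uint32_t)buf[0] << 8);
	return sum;
}

/* Fold a 32-bit partial sum down to 16 bits. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The two tests on len replace the mask, compare and subtract sequence of the
old code: once the word loop is done, bits 30 and 31 of the length say
directly whether a trailing halfword and a trailing byte remain.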