linuxppc-dev.lists.ozlabs.org archive mirror
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Scott Wood <scottwood@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()
Date: Mon, 29 Feb 2016 13:53:07 +0100
Message-ID: <56D43F33.4030702@c-s.fr>
In-Reply-To: <1445571040.701.149.camel@freescale.com>



On 23/10/2015 05:30, Scott Wood wrote:
> On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
>> r5 does contain the value to be updated, so let's use r5 throughout
>> for that. It makes the code more readable.
>>
>> To avoid confusion, it is better to use adde instead of addc.
>>
>> The first addition is useless. Its only purpose is to clear carry.
>> As r4 is a signed int that is always positive, this can be done by
>> using srawi instead of srwi
>>
>> Let's also remove the comment about bdnz having no overhead, as it
>> is not correct on all powerpc platforms; it does not hold on MPC8xx
>> at least.
>>
>> In the last part, in our situation, the number of remaining bytes
>> to be processed is between 0 and 3. Therefore, we can base that part
>> on the value of bit 31 and bit 30 of r4 instead of ANDing r4 with 3
>> and then proceeding with comparisons and subtractions.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   arch/powerpc/lib/checksum_32.S | 37 +++++++++++++++++--------------------
>>   1 file changed, 17 insertions(+), 20 deletions(-)
> Do you have benchmarks for these optimizations?
>
> -Scott
Using mftbl() to read the timebase just before and just after the call
to csum_partial(), I get the following on an MPC885:
* 78-byte packets: 9% faster (11.5 to 10.4 tb ticks)
* 328-byte packets: 3% faster (47.9 to 46.5 tb ticks)
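
A measurement of this kind can be sketched in portable C as follows. This
is only an illustration, not the code used for the figures above:
read_ticks() is a hypothetical stand-in for mftbl() built on the standard
clock(), dummy_csum() is a placeholder for csum_partial(), and averaging
over many runs is an assumption. percent_faster() shows how a percentage
follows from two tick counts (11.5 to 10.4 is about 9.6%, 47.9 to 46.5 is
about 2.9%).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for mftbl(): any monotonically increasing
 * counter will do for the purpose of this sketch. */
static uint32_t read_ticks(void)
{
    return (uint32_t)clock();
}

typedef uint32_t (*csum_fn)(const uint8_t *buf, int len, uint32_t sum);

/* Placeholder for the routine under test; NOT the kernel's csum_partial(). */
static uint32_t dummy_csum(const uint8_t *buf, int len, uint32_t sum)
{
    for (int i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Volatile sink so the compiler cannot drop the measured call. */
static volatile uint32_t sink;

/* Average number of ticks spent in one call of fn(buf, len, 0). */
static double avg_ticks(csum_fn fn, const uint8_t *buf, int len, unsigned runs)
{
    uint64_t total = 0;

    for (unsigned i = 0; i < runs; i++) {
        uint32_t before = read_ticks();
        sink = fn(buf, len, 0);
        uint32_t after = read_ticks();
        total += (uint32_t)(after - before); /* unsigned math survives a wrap */
    }
    return runs ? (double)total / runs : 0.0;
}

/* Relative improvement, in percent, between two average tick counts. */
static double percent_faster(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

Reading the counter immediately around the call keeps the measured window
small, and unsigned subtraction gives the right difference even if the
32-bit counter wraps between the two reads.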

Christophe
>
>> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
>> index 3472372..9c12602 100644
>> --- a/arch/powerpc/lib/checksum_32.S
>> +++ b/arch/powerpc/lib/checksum_32.S
>> @@ -27,35 +27,32 @@
>>    * csum_partial(buff, len, sum)
>>    */
>>   _GLOBAL(csum_partial)
>> -     addic   r0,r5,0
>>        subi    r3,r3,4
>> -     srwi.   r6,r4,2
>> +     srawi.  r6,r4,2         /* Divide len by 4 and also clear carry */
>>        beq     3f              /* if we're doing < 4 bytes */
>> -     andi.   r5,r3,2         /* Align buffer to longword boundary */
>> +     andi.   r0,r3,2         /* Align buffer to longword boundary */
>>        beq+    1f
>> -     lhz     r5,4(r3)        /* do 2 bytes to get aligned */
>> -     addi    r3,r3,2
>> +     lhz     r0,4(r3)        /* do 2 bytes to get aligned */
>>        subi    r4,r4,2
>> -     addc    r0,r0,r5
>> +     addi    r3,r3,2
>>        srwi.   r6,r4,2         /* # words to do */
>> +     adde    r5,r5,r0
>>        beq     3f
>>   1:   mtctr   r6
>> -2:   lwzu    r5,4(r3)        /* the bdnz has zero overhead, so it should */
>> -     adde    r0,r0,r5        /* be unnecessary to unroll this loop */
>> +2:   lwzu    r0,4(r3)
>> +     adde    r5,r5,r0
>>        bdnz    2b
>> -     andi.   r4,r4,3
>> -3:   cmpwi   0,r4,2
>> -     blt+    4f
>> -     lhz     r5,4(r3)
>> +3:   andi.   r0,r4,2
>> +     beq+    4f
>> +     lhz     r0,4(r3)
>>        addi    r3,r3,2
>> -     subi    r4,r4,2
>> -     adde    r0,r0,r5
>> -4:   cmpwi   0,r4,1
>> -     bne+    5f
>> -     lbz     r5,4(r3)
>> -     slwi    r5,r5,8         /* Upper byte of word */
>> -     adde    r0,r0,r5
>> -5:   addze   r3,r0           /* add in final carry */
>> +     adde    r5,r5,r0
>> +4:   andi.   r0,r4,1
>> +     beq+    5f
>> +     lbz     r0,4(r3)
>> +     slwi    r0,r0,8         /* Upper byte of word */
>> +     adde    r5,r5,r0
>> +5:   addze   r3,r5           /* add in final carry */
>>        blr
>>   
>>   /*
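
The two ideas in the patch description (srawi clearing the carry for a
positive length, and testing the low bits of the length instead of
masking, comparing and subtracting) can be modelled in C. This is a sketch
for illustration, not the kernel code: all function names are
hypothetical, csum_add32() folds the carry back in at every step rather
than deferring it to a final addze, and the halfword is assembled
big-endian to mirror lhz on a big-endian PowerPC.

```c
#include <stddef.h>
#include <stdint.h>

/* CA as set by srawi: 1 only if the source is negative AND at least one
 * 1-bit is shifted out. For a positive len it is therefore always 0.
 * Valid for 1 <= n <= 31. */
static int srawi_carry(int32_t x, unsigned n)
{
    uint32_t shifted_out = (uint32_t)x & ((1u << n) - 1u);

    return x < 0 && shifted_out != 0;
}

/* End-around-carry addition, a C model of the adde accumulation. */
static uint32_t csum_add32(uint32_t sum, uint32_t v)
{
    uint64_t t = (uint64_t)sum + v;

    return (uint32_t)(t + (t >> 32));
}

/* Old tail: mask len with 3, then compare and subtract. */
static uint32_t tail_compare(const uint8_t *p, int len, uint32_t sum)
{
    len &= 3;
    if (len >= 2) {
        sum = csum_add32(sum, ((uint32_t)p[0] << 8) | p[1]); /* lhz */
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum = csum_add32(sum, (uint32_t)p[0] << 8); /* lbz, slwi 8 */
    return sum;
}

/* New tail: test bit 1 and bit 0 of len directly (bits 30 and 31 in
 * IBM bit numbering), without modifying len. */
static uint32_t tail_bits(const uint8_t *p, int len, uint32_t sum)
{
    if (len & 2) {
        sum = csum_add32(sum, ((uint32_t)p[0] << 8) | p[1]);
        p += 2;
    }
    if (len & 1)
        sum = csum_add32(sum, (uint32_t)p[0] << 8);
    return sum;
}
```

Here p is assumed to point at the first byte left over after the word
loop. Both tail variants consume the same 0 to 3 bytes for any
non-negative len, because only the two low bits of len decide what
remains.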

Thread overview: 22+ messages
2015-09-22 14:34 [PATCH 0/9] powerpc32: set of optimisation of network checksum functions Christophe Leroy
2015-09-22 14:34 ` [PATCH 1/9] powerpc: unexport csum_tcpudp_magic Christophe Leroy
2015-09-22 14:34 ` [PATCH 2/9] powerpc: mark xer clobbered in csum_add() Christophe Leroy
2015-09-22 14:34 ` [PATCH 3/9] powerpc32: checksum_wrappers_64 becomes checksum_wrappers Christophe Leroy
2015-10-23  3:26   ` Scott Wood
2015-10-28 11:11     ` Anton Blanchard
2015-09-22 14:34 ` [PATCH 4/9] powerpc: inline ip_fast_csum() Christophe Leroy
2015-09-23  5:43   ` Denis Kirjanov
2016-02-29  7:25     ` Christophe Leroy
2016-03-05  3:50   ` [4/9] " Scott Wood
2015-09-22 14:34 ` [PATCH 5/9] powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user() Christophe Leroy
2015-09-22 14:34 ` [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial() Christophe Leroy
2015-10-23  3:30   ` Scott Wood
2016-02-29 12:53     ` Christophe Leroy [this message]
2015-09-22 14:34 ` [PATCH 7/9] powerpc32: optimise csum_partial() loop Christophe Leroy
2015-09-22 14:34 ` [PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0 Christophe Leroy
2015-10-23  3:33   ` Scott Wood
2016-02-29  7:26     ` Christophe Leroy
2015-09-22 14:34 ` [PATCH 9/9] powerpc: optimise csum_partial() call when len is constant Christophe Leroy
2015-10-23  3:32   ` Scott Wood
2016-03-05  5:29   ` [9/9] " Scott Wood
2015-09-23 22:38 ` [PATCH 0/9] powerpc32: set of optimisation of network checksum functions David Miller
