All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Scott Wood <scottwood@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()
Date: Mon, 29 Feb 2016 13:53:07 +0100	[thread overview]
Message-ID: <56D43F33.4030702@c-s.fr> (raw)
In-Reply-To: <1445571040.701.149.camel@freescale.com>



Le 23/10/2015 05:30, Scott Wood a écrit :
> On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
>> r5 does contain the value to be updated, so lets use r5 all way long
>> for that. It makes the code more readable.
>>
>> To avoid confusion, it is better to use adde instead of addc
>>
>> The first addition is useless. Its only purpose is to clear carry.
>> As r4 is a signed int that is always positive, this can be done by
>> using srawi instead of srwi
>>
>> Let's also remove the comment about bdnz having no overhead as it
>> is not correct on all powerpc, at least on MPC8xx
>>
>> In the last part, in our situation, the remaining quantity of bytes
>> to be proceeded is between 0 and 3. Therefore, we can base that part
>> on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
>> then proceding on comparisons and substractions.
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   arch/powerpc/lib/checksum_32.S | 37 +++++++++++++++++--------------------
>>   1 file changed, 17 insertions(+), 20 deletions(-)
> Do you have benchmarks for these optimizations?
>
> -Scott
Using mftbl() to get timebase just before and after call to 
csum_partial(), I get the following on an MPC885:
* 78 bytes packets: 9% faster (11,5 to 10,4 tb ticks)
* 328 bytes packets: 3% faster (47,9 to 46,5 tb ticks)

Christophe
>
>> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
>> index 3472372..9c12602 100644
>> --- a/arch/powerpc/lib/checksum_32.S
>> +++ b/arch/powerpc/lib/checksum_32.S
>> @@ -27,35 +27,32 @@
>>    * csum_partial(buff, len, sum)
>>    */
>>   _GLOBAL(csum_partial)
>> -     addic   r0,r5,0
>>        subi    r3,r3,4
>> -     srwi.   r6,r4,2
>> +     srawi.  r6,r4,2         /* Divide len by 4 and also clear carry */
>>        beq     3f              /* if we're doing < 4 bytes */
>> -     andi.   r5,r3,2         /* Align buffer to longword boundary */
>> +     andi.   r0,r3,2         /* Align buffer to longword boundary */
>>        beq+    1f
>> -     lhz     r5,4(r3)        /* do 2 bytes to get aligned */
>> -     addi    r3,r3,2
>> +     lhz     r0,4(r3)        /* do 2 bytes to get aligned */
>>        subi    r4,r4,2
>> -     addc    r0,r0,r5
>> +     addi    r3,r3,2
>>        srwi.   r6,r4,2         /* # words to do */
>> +     adde    r5,r5,r0
>>        beq     3f
>>   1:   mtctr   r6
>> -2:   lwzu    r5,4(r3)        /* the bdnz has zero overhead, so it should */
>> -     adde    r0,r0,r5        /* be unnecessary to unroll this loop */
>> +2:   lwzu    r0,4(r3)
>> +     adde    r5,r5,r0
>>        bdnz    2b
>> -     andi.   r4,r4,3
>> -3:   cmpwi   0,r4,2
>> -     blt+    4f
>> -     lhz     r5,4(r3)
>> +3:   andi.   r0,r4,2
>> +     beq+    4f
>> +     lhz     r0,4(r3)
>>        addi    r3,r3,2
>> -     subi    r4,r4,2
>> -     adde    r0,r0,r5
>> -4:   cmpwi   0,r4,1
>> -     bne+    5f
>> -     lbz     r5,4(r3)
>> -     slwi    r5,r5,8         /* Upper byte of word */
>> -     adde    r0,r0,r5
>> -5:   addze   r3,r0           /* add in final carry */
>> +     adde    r5,r5,r0
>> +4:   andi.   r0,r4,1
>> +     beq+    5f
>> +     lbz     r0,4(r3)
>> +     slwi    r0,r0,8         /* Upper byte of word */
>> +     adde    r5,r5,r0
>> +5:   addze   r3,r5           /* add in final carry */
>>        blr
>>   
>>   /*

  reply	other threads:[~2016-02-29 12:53 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-22 14:34 [PATCH 0/9] powerpc32: set of optimisation of network checksum functions Christophe Leroy
2015-09-22 14:34 ` [PATCH 1/9] powerpc: unexport csum_tcpudp_magic Christophe Leroy
2015-09-22 14:34   ` Christophe Leroy
2015-09-22 14:34 ` [PATCH 2/9] powerpc: mark xer clobbered in csum_add() Christophe Leroy
2015-09-22 14:34 ` [PATCH 3/9] powerpc32: checksum_wrappers_64 becomes checksum_wrappers Christophe Leroy
2015-10-23  3:26   ` Scott Wood
2015-10-28 11:11     ` Anton Blanchard
2015-09-22 14:34 ` [PATCH 4/9] powerpc: inline ip_fast_csum() Christophe Leroy
2015-09-23  5:43   ` Denis Kirjanov
2016-02-29  7:25     ` Christophe Leroy
2016-03-05  3:50   ` [4/9] " Scott Wood
2016-03-05  3:50     ` Scott Wood
2015-09-22 14:34 ` [PATCH 5/9] powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user() Christophe Leroy
2015-09-22 14:34 ` [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial() Christophe Leroy
2015-10-23  3:30   ` Scott Wood
2016-02-29 12:53     ` Christophe Leroy [this message]
2015-09-22 14:34 ` [PATCH 7/9] powerpc32: optimise csum_partial() loop Christophe Leroy
2015-09-22 14:34 ` [PATCH 8/9] powerpc: simplify csum_add(a, b) in case a or b is constant 0 Christophe Leroy
2015-10-23  3:33   ` Scott Wood
2016-02-29  7:26     ` Christophe Leroy
2015-09-22 14:34 ` [PATCH 9/9] powerpc: optimise csum_partial() call when len is constant Christophe Leroy
2015-10-23  3:32   ` Scott Wood
2016-03-05  5:29   ` [9/9] " Scott Wood
2015-09-23 22:38 ` [PATCH 0/9] powerpc32: set of optimisation of network checksum functions David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56D43F33.4030702@c-s.fr \
    --to=christophe.leroy@c-s.fr \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=netdev@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=scottwood@freescale.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.