From: Scott Wood <scottwood@freescale.com>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	<linux-kernel@vger.kernel.org>, <linuxppc-dev@lists.ozlabs.org>,
	<netdev@vger.kernel.org>
Subject: Re: [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()
Date: Thu, 22 Oct 2015 22:30:40 -0500
Message-ID: <1445571040.701.149.camel@freescale.com>
In-Reply-To: <0a4e1624642137dc2b16bbb68ea87b1a479dfa34.1442876807.git.christophe.leroy@c-s.fr>

On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
> r5 does contain the value to be updated, so let's use r5 all the way
> through for that. It makes the code more readable.
> 
> To avoid confusion, it is better to use adde instead of addc
> 
> The first addition is useless. Its only purpose is to clear carry.
> As r4 is a signed int that is never negative, this can be done by
> using srawi instead of srwi.
> 
> Let's also remove the comment about bdnz having no overhead as it
> is not correct on all powerpc, at least on MPC8xx
> 
> In the last part, in our situation, the remaining number of bytes
> to be processed is between 0 and 3. Therefore, we can base that part
> on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3
> and then proceeding with comparisons and subtractions.
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  arch/powerpc/lib/checksum_32.S | 37 +++++++++++++++++--------------------
>  1 file changed, 17 insertions(+), 20 deletions(-)

Do you have benchmarks for these optimizations?

-Scott

> 
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index 3472372..9c12602 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -27,35 +27,32 @@
>   * csum_partial(buff, len, sum)
>   */
>  _GLOBAL(csum_partial)
> -     addic   r0,r5,0
>       subi    r3,r3,4
> -     srwi.   r6,r4,2
> +     srawi.  r6,r4,2         /* Divide len by 4 and also clear carry */
>       beq     3f              /* if we're doing < 4 bytes */
> -     andi.   r5,r3,2         /* Align buffer to longword boundary */
> +     andi.   r0,r3,2         /* Align buffer to longword boundary */
>       beq+    1f
> -     lhz     r5,4(r3)        /* do 2 bytes to get aligned */
> -     addi    r3,r3,2
> +     lhz     r0,4(r3)        /* do 2 bytes to get aligned */
>       subi    r4,r4,2
> -     addc    r0,r0,r5
> +     addi    r3,r3,2
>       srwi.   r6,r4,2         /* # words to do */
> +     adde    r5,r5,r0
>       beq     3f
>  1:   mtctr   r6
> -2:   lwzu    r5,4(r3)        /* the bdnz has zero overhead, so it should */
> -     adde    r0,r0,r5        /* be unnecessary to unroll this loop */
> +2:   lwzu    r0,4(r3)
> +     adde    r5,r5,r0
>       bdnz    2b
> -     andi.   r4,r4,3
> -3:   cmpwi   0,r4,2
> -     blt+    4f
> -     lhz     r5,4(r3)
> +3:   andi.   r0,r4,2
> +     beq+    4f
> +     lhz     r0,4(r3)
>       addi    r3,r3,2
> -     subi    r4,r4,2
> -     adde    r0,r0,r5
> -4:   cmpwi   0,r4,1
> -     bne+    5f
> -     lbz     r5,4(r3)
> -     slwi    r5,r5,8         /* Upper byte of word */
> -     adde    r0,r0,r5
> -5:   addze   r3,r0           /* add in final carry */
> +     adde    r5,r5,r0
> +4:   andi.   r0,r4,1
> +     beq+    5f
> +     lbz     r0,4(r3)
> +     slwi    r0,r0,8         /* Upper byte of word */
> +     adde    r5,r5,r0
> +5:   addze   r3,r5           /* add in final carry */
>       blr
>  
>  /*
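
For anyone checking the patched control flow, here is a minimal C model of
it. It is only a sketch under stated assumptions, not the kernel code: the
names csum_partial_model, fold16, adde, srawi_carry, be16 and be32 are
hypothetical and exist only for this illustration. Loads are modelled as
big-endian byte reads, and the carry bit is an explicit variable. The tail
is driven by (len & 2) and (len & 1), which are bits 30 and 31 of r4 in
PowerPC numbering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add with carry-in and carry-out, like adde. */
static uint32_t adde(uint32_t a, uint32_t b, unsigned *carry)
{
	uint64_t t = (uint64_t)a + b + *carry;

	*carry = (unsigned)(t >> 32);
	return (uint32_t)t;
}

/*
 * Hypothetical helper: CA as left by srawi. It is set only when the
 * source is negative and a 1 bit is shifted out, so a non-negative
 * len always leaves CA cleared. Assumes sh < 32.
 */
static unsigned srawi_carry(int32_t x, unsigned sh)
{
	return x < 0 && ((uint32_t)x & ((1u << sh) - 1u)) != 0;
}

/* Big-endian loads standing in for lhz and lwzu. */
static uint32_t be16(const uint8_t *p)
{
	return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t be32(const uint8_t *p)
{
	return (be16(p) << 16) | be16(p + 2);
}

/* Model of the patched csum_partial(buff, len, sum); len must be >= 0. */
uint32_t csum_partial_model(const uint8_t *buf, int len, uint32_t sum)
{
	unsigned carry = srawi_carry(len, 2);	/* srawi. r6,r4,2 */
	uint32_t words = (uint32_t)len >> 2;

	if (words != 0) {
		if ((uintptr_t)buf & 2) {	/* do 2 bytes to get aligned */
			uint32_t h = be16(buf);

			len -= 2;
			buf += 2;
			words = (uint32_t)len >> 2;
			sum = adde(sum, h, &carry);
		}
		for (; words != 0; words--) {	/* lwzu / adde / bdnz loop */
			sum = adde(sum, be32(buf), &carry);
			buf += 4;
		}
	}
	if (len & 2) {				/* andi. r0,r4,2 */
		sum = adde(sum, be16(buf), &carry);
		buf += 2;
	}
	if (len & 1)				/* andi. r0,r4,1 */
		sum = adde(sum, (uint32_t)buf[0] << 8, &carry);
	return sum + carry;			/* addze: final carry */
}

/* Hypothetical helper: fold to 16 bits without inverting. */
uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

The 32-bit value can differ depending on whether the alignment step runs,
so results from this model should be compared only after folding to 16
bits. The model says nothing about cycle counts.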
