From: Scott Wood <scottwood@freescale.com>
To: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
<linux-kernel@vger.kernel.org>, <linuxppc-dev@lists.ozlabs.org>,
<netdev@vger.kernel.org>
Subject: Re: [PATCH 6/9] powerpc32: optimise a few instructions in csum_partial()
Date: Thu, 22 Oct 2015 22:30:40 -0500
Message-ID: <1445571040.701.149.camel@freescale.com>
In-Reply-To: <0a4e1624642137dc2b16bbb68ea87b1a479dfa34.1442876807.git.christophe.leroy@c-s.fr>

On Tue, 2015-09-22 at 16:34 +0200, Christophe Leroy wrote:
> r5 contains the value to be updated, so let's use r5 all the way
> through for that. It makes the code more readable.
>
> To avoid confusion, it is better to use adde instead of addc.
>
> The first addition is useless; its only purpose is to clear the carry.
> As r4 is a signed int that is never negative, this can be done by
> using srawi instead of srwi.
>
> Let's also remove the comment about bdnz having no overhead, as it
> is not true on all powerpc CPUs; the MPC8xx is one counterexample.
>
> In the last part, the number of bytes remaining to be processed is
> between 0 and 3. Therefore, we can base that part on the values of
> bits 31 and 30 of r4 instead of ANDing r4 with 3 and then doing
> comparisons and subtractions.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
> arch/powerpc/lib/checksum_32.S | 37 +++++++++++++++++--------------------
> 1 file changed, 17 insertions(+), 20 deletions(-)

Do you have benchmarks for these optimizations?

-Scott

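The carry argument in the quoted commit message (srawi. replacing the initial addic, and adde used throughout) can be checked with a small C model. This is an illustrative sketch, not kernel code: the helper names srawi_ca(), adde32() and addze32() are hypothetical, and the model assumes the Power ISA definition of srawi, which sets CA only when the source is negative and at least one 1 bit is shifted out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "srawi rD,rS,sh": arithmetic shift right that
 * also reports CA. Assumed semantics (Power ISA): CA = 1 only when rS is
 * negative and at least one 1 bit is shifted out. */
static uint32_t srawi_ca(int32_t rs, unsigned sh, unsigned *ca)
{
    uint32_t u = (uint32_t)rs;
    uint32_t lost;

    assert(sh < 32);
    lost = u & ((1u << sh) - 1u);           /* bits shifted out */
    *ca = (rs < 0 && lost != 0) ? 1u : 0u;
    /* Portable arithmetic shift: shift the complement for negative values. */
    return rs < 0 ? ~(~u >> sh) : u >> sh;
}

/* Hypothetical model of "adde rD,rA,rB": rD = rA + rB + CA, and CA
 * becomes the carry out of the most significant bit. */
static uint32_t adde32(uint32_t a, uint32_t b, unsigned *ca)
{
    uint64_t s = (uint64_t)a + b + *ca;

    *ca = (unsigned)(s >> 32);
    return (uint32_t)s;
}

/* Hypothetical model of "addze rD,rA": rD = rA + CA (final carry fold). */
static uint32_t addze32(uint32_t a, unsigned *ca)
{
    return adde32(a, 0, ca);
}
```

Because len is never negative, srawi. always leaves CA at 0 here, so the first adde in the patched routine adds with no carry-in, which is the state the removed addic r0,r5,0 used to establish.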
>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index 3472372..9c12602 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -27,35 +27,32 @@
> * csum_partial(buff, len, sum)
> */
> _GLOBAL(csum_partial)
> - addic r0,r5,0
> subi r3,r3,4
> - srwi. r6,r4,2
> + srawi. r6,r4,2 /* Divide len by 4 and also clear carry */
> beq 3f /* if we're doing < 4 bytes */
> - andi. r5,r3,2 /* Align buffer to longword boundary */
> + andi. r0,r3,2 /* Align buffer to longword boundary */
> beq+ 1f
> - lhz r5,4(r3) /* do 2 bytes to get aligned */
> - addi r3,r3,2
> + lhz r0,4(r3) /* do 2 bytes to get aligned */
> subi r4,r4,2
> - addc r0,r0,r5
> + addi r3,r3,2
> srwi. r6,r4,2 /* # words to do */
> + adde r5,r5,r0
> beq 3f
> 1: mtctr r6
> -2: lwzu r5,4(r3) /* the bdnz has zero overhead, so it should */
> - adde r0,r0,r5 /* be unnecessary to unroll this loop */
> +2: lwzu r0,4(r3)
> + adde r5,r5,r0
> bdnz 2b
> - andi. r4,r4,3
> -3: cmpwi 0,r4,2
> - blt+ 4f
> - lhz r5,4(r3)
> +3: andi. r0,r4,2
> + beq+ 4f
> + lhz r0,4(r3)
> addi r3,r3,2
> - subi r4,r4,2
> - adde r0,r0,r5
> -4: cmpwi 0,r4,1
> - bne+ 5f
> - lbz r5,4(r3)
> - slwi r5,r5,8 /* Upper byte of word */
> - adde r0,r0,r5
> -5: addze r3,r0 /* add in final carry */
> + adde r5,r5,r0
> +4: andi. r0,r4,1
> + beq+ 5f
> + lbz r0,4(r3)
> + slwi r0,r0,8 /* Upper byte of word */
> + adde r5,r5,r0
> +5: addze r3,r5 /* add in final carry */
> blr
>
> /*
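The last paragraph of the quoted commit message argues that, because at most 3 bytes remain, testing the two low bits of r4 is equivalent to the old sequence of ANDing with 3, comparing and subtracting. The sketch below compares the addends that the old and new tail sequences feed into the sum. It is a model under stated assumptions, not the kernel code: it uses a plain 64-bit accumulator instead of the adde carry chain, assumes big-endian halfword loads as lhz performs on powerpc32, and the names load_be16(), tail_old(), tail_new() and check_tails() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: big-endian halfword load, as lhz does on powerpc32. */
static uint32_t load_be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Old tail: rem = len & 3, then compare and subtract. */
static uint64_t tail_old(const uint8_t *p, uint32_t len)
{
    uint32_t rem = len & 3;
    uint64_t sum = 0;

    if (rem >= 2) {                 /* cmpwi 0,r4,2 ; blt+ 4f */
        sum += load_be16(p);
        p += 2;
        rem -= 2;                   /* subi r4,r4,2 */
    }
    if (rem == 1)                   /* cmpwi 0,r4,1 ; bne+ 5f */
        sum += (uint32_t)p[0] << 8; /* byte goes in the upper half of the halfword */
    return sum;
}

/* New tail: test bit 30 (value 2) and bit 31 (value 1) of len directly. */
static uint64_t tail_new(const uint8_t *p, uint32_t len)
{
    uint64_t sum = 0;

    if (len & 2) {                  /* andi. r0,r4,2 ; beq+ 4f */
        sum += load_be16(p);
        p += 2;
    }
    if (len & 1)                    /* andi. r0,r4,1 ; beq+ 5f */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Both versions produce the same addends for every combination of the
 * two low bits of len. */
static void check_tails(void)
{
    static const uint8_t buf[3] = { 0x12, 0x34, 0x56 };
    uint32_t len;

    for (len = 0; len < 64; len++)
        assert(tail_old(buf, len) == tail_new(buf, len));
}
```

Looping len over 0 to 63 exercises every value of the two low bits, which is all the tail logic looks at; the bits above them only matter to the word loop, whose count comes from the shift.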