From: James Hogan <james.hogan@imgtec.com>
To: linux-mips@linux-mips.org
Cc: chenj <chenj@lemote.com>, markos.chandras@imgtec.com, chenhc@lemote.com
Subject: Re: [PATCH, v2] MIPS: lib: csum_partial: more instruction paral
Date: Mon, 19 May 2014 07:59:11 +0100
Message-ID: <1818781.bbVdBBlkH9@radagast>
In-Reply-To: <1400469247-17788-1-git-send-email-chenj@lemote.com>

On Monday 19 May 2014 11:14:07 chenj wrote:
> Computing sum introduces true data dependency, e.g.
> 	ADDC(sum, t0)
> 	ADDC(sum, t1)
> 	ADDC(sum, t2)
> 	ADDC(sum, t3)
> Here, each ADDC(sum, ...) reads the sum value written by the previous ADDC.
> 
> In this patch, the above sequence is rearranged as follows:
> 	ADDC(t0, t1)
> 	ADDC(t2, t3)
> 	ADDC(sum, t0)
> 	ADDC(sum, t2)
> The first two ADDC operations are independent, so they can execute in
> parallel where the hardware allows it.

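In the copy loops, though, the patch actually changes it to this:
	ADDC(t0, t1)
	ADDC(sum, t0)
	ADDC(t2, t3)
	ADDC(sum, t2)

which is slightly different (presumably because of the stores interleaved 
between the ADDCs in those cases). Only the first hunk, the macro with the 
four LOADs, uses the order given in the commit message.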
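
For what it's worth, the two orderings should still produce the same sum. 
Below is a minimal C sketch of why, assuming ADDC(sum, reg) behaves as an add 
followed by folding the carry back in, on 32-bit registers. The names addc, 
csum_serial, csum_paired and csum_interleaved are made up for illustration and 
are not from csum_partial.S:

```c
#include <stdint.h>

/* Assumed model of ADDC(sum, reg): add, then add the carry-out back in. */
static uint32_t addc(uint32_t sum, uint32_t reg)
{
	sum += reg;
	return sum + (sum < reg);	/* (sum < reg) is 1 only if the add wrapped */
}

/* Original order: every ADDC waits on the sum from the previous one. */
static uint32_t csum_serial(uint32_t sum, const uint32_t t[4])
{
	sum = addc(sum, t[0]);
	sum = addc(sum, t[1]);
	sum = addc(sum, t[2]);
	sum = addc(sum, t[3]);
	return sum;
}

/* Commit message order: the two pair sums do not depend on each other. */
static uint32_t csum_paired(uint32_t sum, const uint32_t t[4])
{
	uint32_t t0 = addc(t[0], t[1]);
	uint32_t t2 = addc(t[2], t[3]);

	sum = addc(sum, t0);
	sum = addc(sum, t2);
	return sum;
}

/* Copy-loop order: pair sum, accumulate, pair sum, accumulate. */
static uint32_t csum_interleaved(uint32_t sum, const uint32_t t[4])
{
	uint32_t t0, t2;

	t0 = addc(t[0], t[1]);
	sum = addc(sum, t0);
	t2 = addc(t[2], t[3]);	/* still independent of the line above */
	sum = addc(sum, t2);
	return sum;
}
```

Calling all three with the same starting sum and the same four words should 
return the same value, since end-around-carry addition does not care about 
grouping. In the interleaved form ADDC(t2, t3) still does not depend on 
ADDC(sum, t0), so an out-of-order core ought to be able to overlap them much as 
in the paired form. This is only a model of the arithmetic and says nothing 
about the timing on real hardware.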

> This patch improves instruction-level parallelism and yields up to a 50%
> csum performance gain on the Loongson 3a processor[1].

Nice results.

The stuff below the --- will get dropped when the patch is applied though, 
after which the "[1]" won't refer to anything.

Cheers
James

> 
> ---
> 1. The result can be found at
> http://dev.lemote.com/files/upload/software/csum-opti/csum-opti-benchmark.html
> And is generated by a userspace test program:
> http://dev.lemote.com/files/upload/software/csum-opti/csum-test.tar.gz
> 
> [v2: amend commit message]
> 
>  arch/mips/lib/csum_partial.S | 38 +++++++++++++++++++-------------------
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
> index 9901237..6cea101 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -76,10 +76,10 @@
>  	LOAD	_t1, (offset + UNIT(1))(src);			\
>  	LOAD	_t2, (offset + UNIT(2))(src);			\
>  	LOAD	_t3, (offset + UNIT(3))(src);			\
> +	ADDC(_t0, _t1);						\
> +	ADDC(_t2, _t3);						\
>  	ADDC(sum, _t0);						\
> -	ADDC(sum, _t1);						\
> -	ADDC(sum, _t2);						\
> -	ADDC(sum, _t3)
> +	ADDC(sum, _t2)
> 
>  #ifdef USE_DOUBLE
>  #define CSUM_BIGCHUNK(src, offset, sum, _t0, _t1, _t2, _t3)	\
> @@ -501,21 +501,21 @@ LEAF(csum_partial)
>  	SUB	len, len, 8*NBYTES
>  	ADD	src, src, 8*NBYTES
>  	STORE(t0, UNIT(0)(dst),	.Ls_exc\@)
> -	ADDC(sum, t0)
> +	ADDC(t0, t1)
>  	STORE(t1, UNIT(1)(dst),	.Ls_exc\@)
> -	ADDC(sum, t1)
> +	ADDC(sum, t0)
>  	STORE(t2, UNIT(2)(dst),	.Ls_exc\@)
> -	ADDC(sum, t2)
> +	ADDC(t2, t3)
>  	STORE(t3, UNIT(3)(dst),	.Ls_exc\@)
> -	ADDC(sum, t3)
> +	ADDC(sum, t2)
>  	STORE(t4, UNIT(4)(dst),	.Ls_exc\@)
> -	ADDC(sum, t4)
> +	ADDC(t4, t5)
>  	STORE(t5, UNIT(5)(dst),	.Ls_exc\@)
> -	ADDC(sum, t5)
> +	ADDC(sum, t4)
>  	STORE(t6, UNIT(6)(dst),	.Ls_exc\@)
> -	ADDC(sum, t6)
> +	ADDC(t6, t7)
>  	STORE(t7, UNIT(7)(dst),	.Ls_exc\@)
> -	ADDC(sum, t7)
> +	ADDC(sum, t6)
>  	.set	reorder				/* DADDI_WAR */
>  	ADD	dst, dst, 8*NBYTES
>  	bgez	len, 1b
> @@ -541,13 +541,13 @@ LEAF(csum_partial)
>  	SUB	len, len, 4*NBYTES
>  	ADD	src, src, 4*NBYTES
>  	STORE(t0, UNIT(0)(dst),	.Ls_exc\@)
> -	ADDC(sum, t0)
> +	ADDC(t0, t1)
>  	STORE(t1, UNIT(1)(dst),	.Ls_exc\@)
> -	ADDC(sum, t1)
> +	ADDC(sum, t0)
>  	STORE(t2, UNIT(2)(dst),	.Ls_exc\@)
> -	ADDC(sum, t2)
> +	ADDC(t2, t3)
>  	STORE(t3, UNIT(3)(dst),	.Ls_exc\@)
> -	ADDC(sum, t3)
> +	ADDC(sum, t2)
>  	.set	reorder				/* DADDI_WAR */
>  	ADD	dst, dst, 4*NBYTES
>  	beqz	len, .Ldone\@
> @@ -646,13 +646,13 @@ LEAF(csum_partial)
>  	nop				# improves slotting
>  #endif
>  	STORE(t0, UNIT(0)(dst),	.Ls_exc\@)
> -	ADDC(sum, t0)
> +	ADDC(t0, t1)
>  	STORE(t1, UNIT(1)(dst),	.Ls_exc\@)
> -	ADDC(sum, t1)
> +	ADDC(sum, t0)
>  	STORE(t2, UNIT(2)(dst),	.Ls_exc\@)
> -	ADDC(sum, t2)
> +	ADDC(t2, t3)
>  	STORE(t3, UNIT(3)(dst),	.Ls_exc\@)
> -	ADDC(sum, t3)
> +	ADDC(sum, t2)
>  	.set	reorder				/* DADDI_WAR */
>  	ADD	dst, dst, 4*NBYTES
>  	bne	len, rem, 1b

