From: James Hogan <james.hogan@imgtec.com>
To: linux-mips@linux-mips.org
Cc: chenj <chenj@lemote.com>, markos.chandras@imgtec.com, chenhc@lemote.com
Subject: Re: [PATCH, v2] MIPS: lib: csum_partial: more instruction paral
Date: Mon, 19 May 2014 07:59:11 +0100
Message-ID: <1818781.bbVdBBlkH9@radagast>
In-Reply-To: <1400469247-17788-1-git-send-email-chenj@lemote.com>
On Monday 19 May 2014 11:14:07 chenj wrote:
> Computing the sum introduces a true data dependency, e.g.
> ADDC(sum, t0)
> ADDC(sum, t1)
> ADDC(sum, t2)
> ADDC(sum, t3)
> Here, each ADDC(sum, ...) references the sum value updated by the previous ADDC.
>
> In this patch, the above sequence is adjusted as follows:
> ADDC(t0, t1)
> ADDC(t2, t3)
> ADDC(sum, t0)
> ADDC(sum, t2)
> The first two ADDC operations are independent, hence can be executed
> simultaneously if possible.
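For anyone following along, a minimal C sketch of the two orderings (not kernel code). It assumes ADDC(sum, reg) behaves as an add followed by folding the carry back in, i.e. the usual add + sltu + add sequence; addc(), csum4_serial(), csum4_paired() and csum4_selfcheck() are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ADDC(sum, reg): add, then fold the carry-out back in.
 * Assumption: this mirrors an add + sltu + add sequence. */
static inline uint32_t addc(uint32_t sum, uint32_t reg)
{
	sum += reg;
	return sum + (sum < reg);	/* (sum < reg) is 1 exactly when the add wrapped */
}

/* Original order: four adds, each waiting on the previous sum. */
static uint32_t csum4_serial(uint32_t sum, const uint32_t t[4])
{
	sum = addc(sum, t[0]);
	sum = addc(sum, t[1]);
	sum = addc(sum, t[2]);
	sum = addc(sum, t[3]);
	return sum;
}

/* Order from the commit message: the first two adds do not depend
 * on each other or on sum, so they could issue together. */
static uint32_t csum4_paired(uint32_t sum, const uint32_t t[4])
{
	uint32_t t0 = addc(t[0], t[1]);
	uint32_t t2 = addc(t[2], t[3]);

	sum = addc(sum, t0);
	sum = addc(sum, t2);
	return sum;
}

/* Hypothetical self-check: both orders agree on inputs that carry. */
static void csum4_selfcheck(void)
{
	static const uint32_t words[4] = {
		0xffffffffu, 0x00000001u, 0x80000000u, 0x80000001u
	};

	assert(csum4_serial(0x1234u, words) == csum4_paired(0x1234u, words));
}
```

The regrouping is safe because end-around-carry addition gives the same result in any grouping; the dependency chain on sum drops from four adds to two, with one independent add feeding each.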
The actual patch appears to change it to this:
ADDC(t0, t1)
ADDC(sum, t0)
ADDC(t2, t3)
ADDC(sum, t2)
which is slightly different (presumably due to the interleaved stores in the
hunks that have them; the CSUM_BIGCHUNK hunk does follow the order from the
commit message).
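If I read it right, the store constraint looks roughly like this in C. This is only a sketch under my own assumptions: ADDC(t0, t1) is modelled as overwriting t0 with the carry-folded sum, so it can only run once t0 has been stored. addc(), copy4_interleaved(), copy4_serial() and copy4_selfcheck() are hypothetical names, not anything from csum_partial.S.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of ADDC(sum, reg): add, then fold the carry-out back in. */
static inline uint32_t addc(uint32_t sum, uint32_t reg)
{
	sum += reg;
	return sum + (sum < reg);
}

/* Copy four words and checksum them in the interleaved order. */
static uint32_t copy4_interleaved(uint32_t sum, const uint32_t *src,
				  uint32_t *dst)
{
	uint32_t t0 = src[0], t1 = src[1], t2 = src[2], t3 = src[3];

	dst[0] = t0;		/* STORE t0 before it is clobbered */
	t0 = addc(t0, t1);	/* ADDC(t0, t1): t0 no longer holds src[0] */
	dst[1] = t1;		/* t1 is still intact */
	sum = addc(sum, t0);	/* ADDC(sum, t0) */
	dst[2] = t2;		/* STORE t2 before it is clobbered */
	t2 = addc(t2, t3);	/* ADDC(t2, t3) */
	dst[3] = t3;
	sum = addc(sum, t2);	/* ADDC(sum, t2) */
	return sum;
}

/* Reference: the pre-patch order, one ADDC(sum, tN) after each store. */
static uint32_t copy4_serial(uint32_t sum, const uint32_t *src,
			     uint32_t *dst)
{
	for (int i = 0; i < 4; i++) {
		dst[i] = src[i];
		sum = addc(sum, src[i]);
	}
	return sum;
}

/* Hypothetical self-check: same checksum, and the copy is unchanged. */
static void copy4_selfcheck(void)
{
	static const uint32_t src[4] = {
		0xffffffffu, 0x00000002u, 0xfffffffeu, 0x00000003u
	};
	uint32_t a[4], b[4];

	assert(copy4_interleaved(0, src, a) == copy4_serial(0, src, b));
	assert(memcmp(a, src, sizeof(a)) == 0);
	assert(memcmp(b, src, sizeof(b)) == 0);
}
```

Moving ADDC(t0, t1) ahead of the first store in this model would write the partial sum to dst[0] instead of the source word, which may be why the patch keeps each pairwise add after the store of its destination register.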
> This patch improves instruction level parallelism, and brings up to a 50%
> csum performance gain on Loongson 3a processor[1].
Nice results.
The stuff below the --- will get dropped when the patch is applied though,
after which the "[1]" won't refer to anything.
Cheers
James
>
> ---
> 1. The result can be found at
> http://dev.lemote.com/files/upload/software/csum-opti/csum-opti-benchmark.html
> and is generated by a userspace test program:
> http://dev.lemote.com/files/upload/software/csum-opti/csum-test.tar.gz
>
> [v2: amend commit message]
>
> arch/mips/lib/csum_partial.S | 38 +++++++++++++++++++-------------------
> 1 file changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
> index 9901237..6cea101 100644
> --- a/arch/mips/lib/csum_partial.S
> +++ b/arch/mips/lib/csum_partial.S
> @@ -76,10 +76,10 @@
> LOAD _t1, (offset + UNIT(1))(src); \
> LOAD _t2, (offset + UNIT(2))(src); \
> LOAD _t3, (offset + UNIT(3))(src); \
> + ADDC(_t0, _t1); \
> + ADDC(_t2, _t3); \
> ADDC(sum, _t0); \
> - ADDC(sum, _t1); \
> - ADDC(sum, _t2); \
> - ADDC(sum, _t3)
> + ADDC(sum, _t2)
>
> #ifdef USE_DOUBLE
> #define CSUM_BIGCHUNK(src, offset, sum, _t0, _t1, _t2, _t3) \
> @@ -501,21 +501,21 @@ LEAF(csum_partial)
> SUB len, len, 8*NBYTES
> ADD src, src, 8*NBYTES
> STORE(t0, UNIT(0)(dst), .Ls_exc\@)
> - ADDC(sum, t0)
> + ADDC(t0, t1)
> STORE(t1, UNIT(1)(dst), .Ls_exc\@)
> - ADDC(sum, t1)
> + ADDC(sum, t0)
> STORE(t2, UNIT(2)(dst), .Ls_exc\@)
> - ADDC(sum, t2)
> + ADDC(t2, t3)
> STORE(t3, UNIT(3)(dst), .Ls_exc\@)
> - ADDC(sum, t3)
> + ADDC(sum, t2)
> STORE(t4, UNIT(4)(dst), .Ls_exc\@)
> - ADDC(sum, t4)
> + ADDC(t4, t5)
> STORE(t5, UNIT(5)(dst), .Ls_exc\@)
> - ADDC(sum, t5)
> + ADDC(sum, t4)
> STORE(t6, UNIT(6)(dst), .Ls_exc\@)
> - ADDC(sum, t6)
> + ADDC(t6, t7)
> STORE(t7, UNIT(7)(dst), .Ls_exc\@)
> - ADDC(sum, t7)
> + ADDC(sum, t6)
> .set reorder /* DADDI_WAR */
> ADD dst, dst, 8*NBYTES
> bgez len, 1b
> @@ -541,13 +541,13 @@ LEAF(csum_partial)
> SUB len, len, 4*NBYTES
> ADD src, src, 4*NBYTES
> STORE(t0, UNIT(0)(dst), .Ls_exc\@)
> - ADDC(sum, t0)
> + ADDC(t0, t1)
> STORE(t1, UNIT(1)(dst), .Ls_exc\@)
> - ADDC(sum, t1)
> + ADDC(sum, t0)
> STORE(t2, UNIT(2)(dst), .Ls_exc\@)
> - ADDC(sum, t2)
> + ADDC(t2, t3)
> STORE(t3, UNIT(3)(dst), .Ls_exc\@)
> - ADDC(sum, t3)
> + ADDC(sum, t2)
> .set reorder /* DADDI_WAR */
> ADD dst, dst, 4*NBYTES
> beqz len, .Ldone\@
> @@ -646,13 +646,13 @@ LEAF(csum_partial)
> nop # improves slotting
> #endif
> STORE(t0, UNIT(0)(dst), .Ls_exc\@)
> - ADDC(sum, t0)
> + ADDC(t0, t1)
> STORE(t1, UNIT(1)(dst), .Ls_exc\@)
> - ADDC(sum, t1)
> + ADDC(sum, t0)
> STORE(t2, UNIT(2)(dst), .Ls_exc\@)
> - ADDC(sum, t2)
> + ADDC(t2, t3)
> STORE(t3, UNIT(3)(dst), .Ls_exc\@)
> - ADDC(sum, t3)
> + ADDC(sum, t2)
> .set reorder /* DADDI_WAR */
> ADD dst, dst, 4*NBYTES
> bne len, rem, 1b