* [PATCH 1/2] MIPS: lib: csum_partial: more instruction paral
From: chenj @ 2014-05-15  7:09 UTC (permalink / raw)
  To: linux-mips; +Cc: chenj

Add the loaded words pairwise before accumulating them into sum. This
shortens the dependency chain on sum and gives up to a 50% performance
gain on Loongson 3A. See
http://dev.lemote.com/files/upload/software/csum-opti/csum-opti-benchmark.html

The benchmark was run in userspace using
http://dev.lemote.com/files/upload/software/csum-opti/csum-test.tar.gz
---
 arch/mips/lib/csum_partial.S | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)
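
For readers who do not read MIPS assembly, the regrouping done in the hunks
below can be sketched in C. This is an illustration only, not code from the
patch: addc(), csum4_serial(), csum4_paired() and csum4_check() are
hypothetical names, and addc() assumes that ADDC is an add with end-around
carry on 64-bit words. The serial form makes four dependent updates of sum
per chunk. The paired form computes two independent pair sums first, so only
two updates depend on the running sum.

```c
#include <assert.h>
#include <stdint.h>

/* Add with end-around carry (assumed to model what ADDC computes). */
static uint64_t addc(uint64_t sum, uint64_t reg)
{
	sum += reg;
	return sum + (sum < reg);	/* fold the carry-out back in */
}

/* Old ordering: every add waits for the previous update of sum. */
static uint64_t csum4_serial(uint64_t sum, const uint64_t w[4])
{
	sum = addc(sum, w[0]);
	sum = addc(sum, w[1]);
	sum = addc(sum, w[2]);
	sum = addc(sum, w[3]);
	return sum;
}

/* New ordering: the two pair sums do not depend on sum or on each other. */
static uint64_t csum4_paired(uint64_t sum, const uint64_t w[4])
{
	uint64_t t0 = addc(w[0], w[1]);
	uint64_t t2 = addc(w[2], w[3]);

	sum = addc(sum, t0);
	return addc(sum, t2);
}

/* Self-check: both orderings must produce the same running sum. */
static void csum4_check(uint64_t sum, const uint64_t w[4])
{
	assert(csum4_serial(sum, w) == csum4_paired(sum, w));
}
```

The regrouping is safe because end-around-carry addition is associative and
commutative, so only the order of evaluation changes. Whether the shorter
chain turns into a speedup depends on how many independent adds the core can
issue per cycle.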

diff --git a/arch/mips/lib/csum_partial.S b/arch/mips/lib/csum_partial.S
index 9901237..6cea101 100644
--- a/arch/mips/lib/csum_partial.S
+++ b/arch/mips/lib/csum_partial.S
@@ -76,10 +76,10 @@
 	LOAD	_t1, (offset + UNIT(1))(src);			\
 	LOAD	_t2, (offset + UNIT(2))(src);			\
 	LOAD	_t3, (offset + UNIT(3))(src);			\
+	ADDC(_t0, _t1);						\
+	ADDC(_t2, _t3);						\
 	ADDC(sum, _t0);						\
-	ADDC(sum, _t1);						\
-	ADDC(sum, _t2);						\
-	ADDC(sum, _t3)
+	ADDC(sum, _t2)
 
 #ifdef USE_DOUBLE
 #define CSUM_BIGCHUNK(src, offset, sum, _t0, _t1, _t2, _t3)	\
@@ -501,21 +501,21 @@ LEAF(csum_partial)
 	SUB	len, len, 8*NBYTES
 	ADD	src, src, 8*NBYTES
 	STORE(t0, UNIT(0)(dst),	.Ls_exc\@)
-	ADDC(sum, t0)
+	ADDC(t0, t1)
 	STORE(t1, UNIT(1)(dst),	.Ls_exc\@)
-	ADDC(sum, t1)
+	ADDC(sum, t0)
 	STORE(t2, UNIT(2)(dst),	.Ls_exc\@)
-	ADDC(sum, t2)
+	ADDC(t2, t3)
 	STORE(t3, UNIT(3)(dst),	.Ls_exc\@)
-	ADDC(sum, t3)
+	ADDC(sum, t2)
 	STORE(t4, UNIT(4)(dst),	.Ls_exc\@)
-	ADDC(sum, t4)
+	ADDC(t4, t5)
 	STORE(t5, UNIT(5)(dst),	.Ls_exc\@)
-	ADDC(sum, t5)
+	ADDC(sum, t4)
 	STORE(t6, UNIT(6)(dst),	.Ls_exc\@)
-	ADDC(sum, t6)
+	ADDC(t6, t7)
 	STORE(t7, UNIT(7)(dst),	.Ls_exc\@)
-	ADDC(sum, t7)
+	ADDC(sum, t6)
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, 8*NBYTES
 	bgez	len, 1b
@@ -541,13 +541,13 @@ LEAF(csum_partial)
 	SUB	len, len, 4*NBYTES
 	ADD	src, src, 4*NBYTES
 	STORE(t0, UNIT(0)(dst),	.Ls_exc\@)
-	ADDC(sum, t0)
+	ADDC(t0, t1)
 	STORE(t1, UNIT(1)(dst),	.Ls_exc\@)
-	ADDC(sum, t1)
+	ADDC(sum, t0)
 	STORE(t2, UNIT(2)(dst),	.Ls_exc\@)
-	ADDC(sum, t2)
+	ADDC(t2, t3)
 	STORE(t3, UNIT(3)(dst),	.Ls_exc\@)
-	ADDC(sum, t3)
+	ADDC(sum, t2)
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, 4*NBYTES
 	beqz	len, .Ldone\@
@@ -646,13 +646,13 @@ LEAF(csum_partial)
 	nop				# improves slotting
 #endif
 	STORE(t0, UNIT(0)(dst),	.Ls_exc\@)
-	ADDC(sum, t0)
+	ADDC(t0, t1)
 	STORE(t1, UNIT(1)(dst),	.Ls_exc\@)
-	ADDC(sum, t1)
+	ADDC(sum, t0)
 	STORE(t2, UNIT(2)(dst),	.Ls_exc\@)
-	ADDC(sum, t2)
+	ADDC(t2, t3)
 	STORE(t3, UNIT(3)(dst),	.Ls_exc\@)
-	ADDC(sum, t3)
+	ADDC(sum, t2)
 	.set	reorder				/* DADDI_WAR */
 	ADD	dst, dst, 4*NBYTES
 	bne	len, rem, 1b
-- 
1.9.0

* Re: [PATCH 2/2] MIPS: lib: csum_partial: use wsbh/movn on ls3
From: chenj @ 2014-05-18  8:23 UTC (permalink / raw)
  To: linux-mips, paul.burton; +Cc: chenhc

[PATCH] MIPS: Loongson 3: Select CPU_MIPS64_R2

To chenhc: Please review this.


Thread overview: 24+ messages
2014-05-15  7:09 [PATCH 1/2] MIPS: lib: csum_partial: more instruction paral chenj
2014-05-15  7:09 ` [PATCH 2/2] MIPS: lib: csum_partial: use wsbh/movn on ls3 chenj
2014-05-15 11:40   ` Paul Burton
2014-05-16 13:29     ` Chen Jie
2014-05-16 15:21       ` Paul Burton
2014-06-03 11:03   ` Ralf Baechle
2014-06-03 15:03     ` Chen Jie
2014-06-03 18:44   ` Ralf Baechle
2014-06-04  7:57     ` Chen Jie
2014-05-15  8:20 ` [PATCH 1/2] MIPS: lib: csum_partial: more instruction paral Markos Chandras
2014-05-19 16:36   ` Ralf Baechle
2014-05-19  3:14 ` [PATCH, v2] " chenj
2014-05-19  6:59   ` James Hogan
2014-05-19 15:32     ` Chen Jie
2014-08-15 20:15       ` Chen Jie
  -- strict thread matches above, loose matches on Subject: below --
2014-05-18  8:23 [PATCH 2/2] MIPS: lib: csum_partial: use wsbh/movn on ls3 chenj
2014-05-18 14:39 ` Huacai Chen
2014-05-19 10:02   ` Paul Burton
2014-05-19 15:15     ` Chen Jie
2014-05-19 17:06       ` Ralf Baechle
2014-05-20 12:33         ` cee1
2014-05-24  1:33           ` Huacai Chen
