linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: Run checksumming in parallel accross multiple alu's
@ 2013-10-11 16:51 Neil Horman
  2013-10-12 17:21 ` Ingo Molnar
                   ` (3 more replies)
  0 siblings, 4 replies; 105+ messages in thread
From: Neil Horman @ 2013-10-11 16:51 UTC (permalink / raw)
  To: linux-kernel
  Cc: Neil Horman, sebastien.dugue, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86

Sébastien Dugué reported to me that devices implementing ipoib (which don't have
checksum offload hardware were spending a significant amount of time computing
checksums.  We found that by splitting the checksum computation into two
separate streams, each skipping successive elements of the buffer being summed,
we could parallelize the checksum operation accros multiple alus.  Since neither
chain is dependent on the result of the other, we get a speedup in execution (on
hardware that has multiple alu's available, which is almost ubiquitous on x86),
and only a negligible decrease on hardware that has only a single alu (an extra
addition is introduced).  Since addition in commutative, the result is the same,
only faster

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: sebastien.dugue@bull.net
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: x86@kernel.org
---
 arch/x86/lib/csum-partial_64.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 9845371..2c7bc50 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -29,11 +29,12 @@ static inline unsigned short from32to16(unsigned a)
  * Things tried and found to not make it faster:
  * Manual Prefetching
  * Unrolling to an 128 bytes inner loop.
- * Using interleaving with more registers to break the carry chains.
  */
 static unsigned do_csum(const unsigned char *buff, unsigned len)
 {
 	unsigned odd, count;
+	unsigned long result1 = 0;
+	unsigned long result2 = 0;
 	unsigned long result = 0;
 
 	if (unlikely(len == 0))
@@ -68,22 +69,34 @@ static unsigned do_csum(const unsigned char *buff, unsigned len)
 			zero = 0;
 			count64 = count >> 3;
 			while (count64) { 
-				asm("addq 0*8(%[src]),%[res]\n\t"
-				    "adcq 1*8(%[src]),%[res]\n\t"
-				    "adcq 2*8(%[src]),%[res]\n\t"
-				    "adcq 3*8(%[src]),%[res]\n\t"
-				    "adcq 4*8(%[src]),%[res]\n\t"
-				    "adcq 5*8(%[src]),%[res]\n\t"
-				    "adcq 6*8(%[src]),%[res]\n\t"
-				    "adcq 7*8(%[src]),%[res]\n\t"
-				    "adcq %[zero],%[res]"
-				    : [res] "=r" (result)
+				asm("addq 0*8(%[src]),%[res1]\n\t"
+				    "adcq 2*8(%[src]),%[res1]\n\t"
+				    "adcq 4*8(%[src]),%[res1]\n\t"
+				    "adcq 6*8(%[src]),%[res1]\n\t"
+				    "adcq %[zero],%[res1]\n\t"
+
+				    "addq 1*8(%[src]),%[res2]\n\t"
+				    "adcq 3*8(%[src]),%[res2]\n\t"
+				    "adcq 5*8(%[src]),%[res2]\n\t"
+				    "adcq 7*8(%[src]),%[res2]\n\t"
+				    "adcq %[zero],%[res2]"
+				    : [res1] "=r" (result1),
+				      [res2] "=r" (result2)
 				    : [src] "r" (buff), [zero] "r" (zero),
-				    "[res]" (result));
+				      "[res1]" (result1), "[res2]" (result2));
 				buff += 64;
 				count64--;
 			}
 
+			asm("addq %[res1],%[res]\n\t"
+			    "adcq %[res2],%[res]\n\t"
+			    "adcq %[zero],%[res]"
+			    : [res] "=r" (result)
+			    : [res1] "r" (result1),
+			      [res2] "r" (result2),
+			      [zero] "r" (zero),
+			      "0" (result));
+
 			/* last up to 7 8byte blocks */
 			count %= 8; 
 			while (count) { 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 105+ messages in thread

end of thread, other threads:[~2013-11-11 21:18 UTC | newest]

Thread overview: 105+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-11 16:51 [PATCH] x86: Run checksumming in parallel accross multiple alu's Neil Horman
2013-10-12 17:21 ` Ingo Molnar
2013-10-13 12:53   ` Neil Horman
2013-10-14 20:28   ` Neil Horman
2013-10-14 21:19     ` Eric Dumazet
2013-10-14 22:18       ` Eric Dumazet
2013-10-14 22:37         ` Joe Perches
2013-10-14 22:44           ` Eric Dumazet
2013-10-14 22:49             ` Joe Perches
2013-10-15  7:41               ` Ingo Molnar
2013-10-15 10:51                 ` Borislav Petkov
2013-10-15 12:04                   ` Ingo Molnar
2013-10-15 16:21                 ` Joe Perches
2013-10-16  0:34                   ` Eric Dumazet
2013-10-16  6:25                   ` Ingo Molnar
2013-10-16 16:55                     ` Joe Perches
2013-10-17  0:34         ` Neil Horman
2013-10-17  1:42           ` Eric Dumazet
2013-10-18 16:50             ` Neil Horman
2013-10-18 17:20               ` Eric Dumazet
2013-10-18 20:11                 ` Neil Horman
2013-10-18 21:15                   ` Eric Dumazet
2013-10-20 21:29                     ` Neil Horman
2013-10-21 17:31                       ` Eric Dumazet
2013-10-21 17:46                         ` Neil Horman
2013-10-21 19:21                     ` Neil Horman
2013-10-21 19:44                       ` Eric Dumazet
2013-10-21 20:19                         ` Neil Horman
2013-10-26 12:01                           ` Ingo Molnar
2013-10-26 13:58                             ` Neil Horman
2013-10-27  7:26                               ` Ingo Molnar
2013-10-27 17:05                                 ` Neil Horman
2013-10-17  8:41           ` Ingo Molnar
2013-10-17 18:19             ` H. Peter Anvin
2013-10-17 18:48               ` Eric Dumazet
2013-10-18  6:43               ` Ingo Molnar
2013-10-28 16:01             ` Neil Horman
2013-10-28 16:20               ` Ingo Molnar
2013-10-28 17:49                 ` Neil Horman
2013-10-28 16:24               ` Ingo Molnar
2013-10-28 16:49                 ` David Ahern
2013-10-28 17:46                 ` Neil Horman
2013-10-28 18:29                   ` Neil Horman
2013-10-29  8:25                     ` Ingo Molnar
2013-10-29 11:20                       ` Neil Horman
2013-10-29 11:30                         ` Ingo Molnar
2013-10-29 11:49                           ` Neil Horman
2013-10-29 12:52                             ` Ingo Molnar
2013-10-29 13:07                               ` Neil Horman
2013-10-29 13:11                                 ` Ingo Molnar
2013-10-29 13:20                                   ` Neil Horman
2013-10-29 14:17                                   ` Neil Horman
2013-10-29 14:27                                     ` Ingo Molnar
2013-10-29 20:26                                       ` Neil Horman
2013-10-31 10:22                                         ` Ingo Molnar
2013-10-31 14:33                                           ` Neil Horman
2013-11-01  9:13                                             ` Ingo Molnar
2013-11-01 14:06                                               ` Neil Horman
2013-10-29 14:12                               ` David Ahern
2013-10-15  7:32     ` Ingo Molnar
2013-10-15 13:14       ` Neil Horman
2013-10-12 22:29 ` H. Peter Anvin
2013-10-13 12:53   ` Neil Horman
2013-10-18 16:42   ` Neil Horman
2013-10-18 17:09     ` H. Peter Anvin
2013-10-25 13:06       ` Neil Horman
2013-10-14  4:38 ` Andi Kleen
2013-10-14  7:49   ` Ingo Molnar
2013-10-14 21:07     ` Eric Dumazet
2013-10-15 13:17       ` Neil Horman
2013-10-14 20:25   ` Neil Horman
2013-10-15  7:12     ` Sébastien Dugué
2013-10-15 13:33       ` Andi Kleen
2013-10-15 13:56         ` Sébastien Dugué
2013-10-15 14:06           ` Eric Dumazet
2013-10-15 14:15             ` Sébastien Dugué
2013-10-15 14:26               ` Eric Dumazet
2013-10-15 14:52                 ` Eric Dumazet
2013-10-15 16:02                   ` Andi Kleen
2013-10-16  0:28                     ` Eric Dumazet
2013-11-06 15:23 ` x86: Enhance perf checksum profiling and x86 implementation Neil Horman
2013-11-06 15:23   ` [PATCH v2 1/2] perf: Add csum benchmark tests to perf Neil Horman
2013-11-06 15:23   ` [PATCH v2 2/2] x86: add prefetching to do_csum Neil Horman
2013-11-06 15:34     ` Dave Jones
2013-11-06 15:54       ` Neil Horman
2013-11-06 17:19         ` Joe Perches
2013-11-06 18:11           ` Neil Horman
2013-11-06 20:02           ` Neil Horman
2013-11-06 20:07             ` Joe Perches
2013-11-08 16:25               ` Neil Horman
2013-11-08 16:51                 ` Joe Perches
2013-11-08 19:07                   ` Neil Horman
2013-11-08 19:17                     ` Joe Perches
2013-11-08 20:08                       ` Neil Horman
2013-11-08 19:17                     ` H. Peter Anvin
2013-11-08 19:01           ` Neil Horman
2013-11-08 19:33             ` Joe Perches
2013-11-08 20:14               ` Neil Horman
2013-11-08 20:29                 ` Joe Perches
2013-11-11 19:40                   ` Neil Horman
2013-11-11 21:18                     ` Ingo Molnar
2013-11-06 18:23         ` Eric Dumazet
2013-11-06 18:59           ` Neil Horman
2013-11-06 20:19     ` Andi Kleen
2013-11-07 21:23       ` Neil Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).