public inbox for linux-kernel@vger.kernel.org
* [PATCH] lib/x86: Optimise csum_partial of buffers that are not multiples of 8 bytes.
@ 2021-12-13 18:00 David Laight
  2021-12-13 18:40 ` Alexander Duyck
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: David Laight @ 2021-12-13 18:00 UTC (permalink / raw)
  To: 'Noah Goldstein', 'Eric Dumazet'
  Cc: 'tglx@linutronix.de', 'mingo@redhat.com',
	'Borislav Petkov', 'dave.hansen@linux.intel.com',
	'X86 ML', 'hpa@zytor.com',
	'peterz@infradead.org', 'alexanderduyck@fb.com',
	'open list', 'netdev'


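Add in the trailing bytes first so that there is no need to worry
about the sum exceeding 64 bits: the incoming sum is only 32 bits wide
and the trailing bytes occupy at most 56 bits, so a plain add cannot
carry out.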

Signed-off-by: David Laight <david.laight@aculab.com>
---

This ought to be faster because all the 'adcq $0' instructions on the
trailing-byte path are removed.
Guessing how fast x86 code will run is hard!
There are other ways of handling buffers that are shorter than 8 bytes,
but I'd hope they don't occur in any hot paths.

Note - I've not even compile tested it.
(But have tested an equivalent change before.)
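
For illustration only, the "trailing bytes first" step can be sketched
in user space as below. This is a minimal sketch, not the kernel code:
tail_bytes() and tail_bytes_ref() are hypothetical names, memcpy()
stands in for the unaligned load, and a little-endian host with
len >= 8 is assumed. On little-endian the trailing (len & 7) bytes sit
at the top of the last 8-byte word, so they are shifted down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return the trailing
 * (len & 7) bytes of buf as a value in the low bytes of a u64.
 * Assumes a little-endian host and len >= 8, so the 8-byte load that
 * ends at the last byte of the buffer stays inside the buffer.
 */
static uint64_t tail_bytes(const unsigned char *buf, size_t len)
{
	unsigned int n = len & 7;
	uint64_t last8;

	assert(len >= 8);
	if (n == 0)
		return 0;
	/* Load the final 8 bytes; memcpy() avoids an unaligned access. */
	memcpy(&last8, buf + len - 8, sizeof(last8));
	/* The wanted bytes are the top n bytes of the word: shift down. */
	return last8 >> ((8 - n) * 8);
}

/* Byte-at-a-time reference, used only to cross-check the helper. */
static uint64_t tail_bytes_ref(const unsigned char *buf, size_t len)
{
	unsigned int n = len & 7, i;
	uint64_t v = 0;

	for (i = 0; i < n; i++)
		v |= (uint64_t)buf[len - n + i] << (8 * i);
	return v;
}
```

The returned value is always below 2^56, so adding it to a
zero-extended 32-bit sum cannot carry out of 64 bits.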

 arch/x86/lib/csum-partial_64.c | 55 ++++++++++++----------------------
 1 file changed, 19 insertions(+), 36 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index abf819dd8525..fbcc073fc2b5 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -37,6 +37,24 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	u64 temp64 = (__force u64)sum;
 	unsigned result;
 
+	if (len & 7) {
+		if (unlikely(len < 8)) {
+			/* Avoid falling off the start of the buffer */
+			if (len & 4) {
+				temp64 += *(u32 *)buff;
+				buff += 4;
+			}
+			if (len & 2) {
+				temp64 += *(u16 *)buff;
+				buff += 2;
+			}
+			if (len & 1)
+				temp64 += *(u8 *)buff;
+			goto reduce_to32;
+		}
+		temp64 += *(u64 *)(buff + len - 8) >> (8 - (len & 7)) * 8;
+	}
+
 	while (unlikely(len >= 64)) {
 		asm("addq 0*8(%[src]),%[res]\n\t"
 		    "adcq 1*8(%[src]),%[res]\n\t"
@@ -82,43 +100,8 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 			: "memory");
 		buff += 8;
 	}
-	if (len & 7) {
-#ifdef CONFIG_DCACHE_WORD_ACCESS
-		unsigned int shift = (8 - (len & 7)) * 8;
-		unsigned long trail;
-
-		trail = (load_unaligned_zeropad(buff) << shift) >> shift;
 
-		asm("addq %[trail],%[res]\n\t"
-		    "adcq $0,%[res]"
-			: [res] "+r" (temp64)
-			: [trail] "r" (trail));
-#else
-		if (len & 4) {
-			asm("addq %[val],%[res]\n\t"
-			    "adcq $0,%[res]"
-				: [res] "+r" (temp64)
-				: [val] "r" ((u64)*(u32 *)buff)
-				: "memory");
-			buff += 4;
-		}
-		if (len & 2) {
-			asm("addq %[val],%[res]\n\t"
-			    "adcq $0,%[res]"
-				: [res] "+r" (temp64)
-				: [val] "r" ((u64)*(u16 *)buff)
-				: "memory");
-			buff += 2;
-		}
-		if (len & 1) {
-			asm("addq %[val],%[res]\n\t"
-			    "adcq $0,%[res]"
-				: [res] "+r" (temp64)
-				: [val] "r" ((u64)*(u8 *)buff)
-				: "memory");
-		}
-#endif
-	}
+reduce_to32:
 	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
 	return (__force __wsum)result;
 }
-- 
2.17.1
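
For readers unfamiliar with the final reduce_to32 step, here is a
portable sketch of folding the 64-bit accumulator down to 32 bits.
It assumes add32_with_carry() performs a 32-bit add with the carry
added back in (an end-around carry); the function names below are
hypothetical stand-ins, not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical portable stand-in for add32_with_carry(): add two
 * 32-bit values and wrap any carry out of bit 31 back into bit 0.
 */
static uint32_t add32_with_carry_sketch(uint32_t a, uint32_t b)
{
	uint64_t s = (uint64_t)a + b;

	/* If a carry occurred, the low half is at most 2^32 - 2. */
	return (uint32_t)s + (uint32_t)(s >> 32);
}

/* Fold the high and low halves of the 64-bit sum together. */
static uint32_t reduce_to32_sketch(uint64_t sum64)
{
	return add32_with_carry_sketch((uint32_t)(sum64 >> 32),
				       (uint32_t)(sum64 & 0xffffffff));
}
```

For example, reduce_to32_sketch(0x00000001ffffffffULL) yields 1: the
halves add to 2^32 and the carry wraps around into bit 0.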

