Subject: [PATCH v2] x86/lib: Remove the special case for odd-aligned buffers in csum-partial_64.c
From: David Laight
Date: 2022-01-06 14:45 UTC
  To: 'Eric Dumazet', Peter Zijlstra
  Cc: 'tglx@linutronix.de', 'mingo@redhat.com',
	'Borislav Petkov', 'dave.hansen@linux.intel.com',
	'X86 ML', 'hpa@zytor.com',
	'alexanderduyck@fb.com', 'open list',
	'netdev', 'Noah Goldstein'

There is no need to special-case the very unusual odd-aligned buffers:
they are no worse than 4n+2 aligned buffers.
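The deleted fix-up was only compensating for consuming one leading byte:
it relies on the RFC 1071 byte-order-independence property, namely that
byte-swapping every 16-bit word of the input byte-swaps the ones-complement
sum. A stand-alone user-space sketch of that property (illustrative only,
not kernel code; csum16 is a hypothetical helper):

#include <stdint.h>
#include <stdio.h>

/* Plain 16-bit ones-complement sum with end-around carry folding. */
static uint16_t csum16(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	while (n--)
		sum += *w++;
	while (sum > 0xffff)		/* fold end-around carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

int main(void)
{
	uint16_t a[8], b[8], sa, sb;
	int i;

	for (i = 0; i < 8; i++) {
		a[i] = (uint16_t)(i * 0x1357 + 0x2468);
		b[i] = (uint16_t)((a[i] >> 8) | (a[i] << 8));	/* bswap16 */
	}
	sa = csum16(a, 8);
	sb = csum16(b, 8);
	/* prints the same value twice */
	printf("%#x %#x\n", sa, ((sb >> 8) | (sb << 8)) & 0xffff);
	return 0;
}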

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Eric Dumazet
---

resend - v1 seems to have got lost :-)

v2: Also delete from32to16()
    Add acked-by from Eric (he sent one at some point)
    Fix possible whitespace error in the last hunk.

The penalty for any misaligned access seems to be minimal.
On an i7-7700, misaligned buffers add 2 or 3 clocks (out of 115) to a
512-byte checksum; that is less than 1 clock for each cache line.
This measures just the main loop, with an lfence prior to rdpmc to
read PERF_COUNT_HW_CPU_CYCLES (sketched below).
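A minimal user-space sketch of that measurement idea (not the exact
harness used; it assumes user-space rdpmc is enabled, e.g.
'echo 2 > /sys/devices/cpu/rdpmc', and that the index of the
PERF_COUNT_HW_CPU_CYCLES counter is already known, e.g. taken from the
mmap'ed perf_event_open() page):

#include <stdint.h>
#include <x86intrin.h>

/* lfence waits for preceding instructions to complete, so the
 * counter is only read after the code being timed has finished. */
static inline uint64_t cycles_now(int pmc_index)
{
	_mm_lfence();
	return __rdpmc(pmc_index);
}

/*
 * Usage (idx is the assumed counter index):
 *	t0 = cycles_now(idx);
 *	sum = csum_partial(buf, len, 0);
 *	t1 = cycles_now(idx);
 * t1 - t0 approximates the cycles for one call.
 */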

 arch/x86/lib/csum-partial_64.c | 28 ++--------------------------
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1f8a8f895173..061b1ed74d6a 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -11,16 +11,6 @@
 #include <asm/checksum.h>
 #include <asm/word-at-a-time.h>
 
-static inline unsigned short from32to16(unsigned a) 
-{
-	unsigned short b = a >> 16; 
-	asm("addw %w2,%w0\n\t"
-	    "adcw $0,%w0\n" 
-	    : "=r" (b)
-	    : "0" (b), "r" (a));
-	return b;
-}
-
 /*
  * Do a checksum on an arbitrary memory area.
  * Returns a 32bit checksum.
@@ -30,22 +20,12 @@ static inline unsigned short from32to16(unsigned a)
  *
  * Still, with CHECKSUM_COMPLETE this is called to compute
  * checksums on IPv6 headers (40 bytes) and other small parts.
- * it's best to have buff aligned on a 64-bit boundary
+ * The penalty for misaligned buff is negligible.
  */
 __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	u64 temp64 = (__force u64)sum;
-	unsigned odd, result;
-
-	odd = 1 & (unsigned long) buff;
-	if (unlikely(odd)) {
-		if (unlikely(len == 0))
-			return sum;
-		temp64 = ror32((__force u32)sum, 8);
-		temp64 += (*(unsigned char *)buff << 8);
-		len--;
-		buff++;
-	}
+	unsigned result;
 
 	while (unlikely(len >= 64)) {
 		asm("addq 0*8(%[src]),%[res]\n\t"
@@ -130,10 +110,6 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 #endif
 	}
 	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
-	if (unlikely(odd)) {
-		result = from32to16(result);
-		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
-	}
 	return (__force __wsum)result;
 }
 EXPORT_SYMBOL(csum_partial);
-- 
2.17.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
