All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: frank zago <fzago@systemfabricworks.com>
Cc: linux-kernel@vger.kernel.org,
	Bob Pearson <rpearson@systemfabricworks.com>,
	Roland Dreier <roland@kernel.org>
Subject: Re: [PATCH] add slice by 8 algorithm to crc32.c
Date: Thu, 28 Jul 2011 15:16:19 -0700	[thread overview]
Message-ID: <20110728151619.8b4cc4e3.akpm@linux-foundation.org> (raw)
In-Reply-To: <4E27547E.4000100@systemfabricworks.com>


(I don't think rdreier@cisco.com exists any more)

On Wed, 20 Jul 2011 17:19:42 -0500
frank zago <fzago@systemfabricworks.com> wrote:

> Hello,
> 
> Added support for slice by 8 to existing crc32 algorithm. Also
> modified gen_crc32table.c to only produce table entries that are
> actually used. The parameters CRC_LE_BITS and CRC_BE_BITS determine
> the number of bits in the input array that are processed during each
> step. Generally the more bits the faster the algorithm is but the
> more table data required.
> 
> Using an x86_64 Opteron machine running at 2100MHz the following table
> was collected with a pre-warmed cache by computing the crc 1000 times
> on a buffer of 4096 bytes.
> 
> 	BITS	Size	LE Cycles/byte	BE Cycles/byte
> 	----------------------------------------------
> 	1	873	41.65		34.60
> 	2	1097	25.43		29.61
> 	4	1057	13.29		15.28
> 	8	2913	7.13		8.19
> 	32	9684	2.80		2.82
> 	64	18178	1.53		1.53
> 
> 	BITS is the value of CRC_LE_BITS or CRC_BE_BITS. The old
> 	default was 8 which actually selected the 32 bit algorithm. In
> 	this version the value 8 is used to select the standard
> 	8 bit algorithm and two new values: 32 and 64 are introduced
> 	to select the slice by 4 and slice by 8 algorithms respectively.
> 
> 	Where Size is the size of crc32.o's text segment which includes
> 	code and table data when both LE and BE versions are set to BITS.
> 
> The current version of crc32.c by default uses the slice by 4 algorithm
> which requires about 2.8 cycles per byte. The slice by 8 algorithm is
> roughly 2X faster and enables packet processing at over 1GB/sec on a typical
> 2-3GHz system.

Are you sure that the code doesn't add any unaligned accesses?  Those
are very bad on some non-x86 architectures.

> --- lib/crc32.c
> +++ lib/crc32.c

Please prepare patches in `patch -p1' form.  So that should have been

--- a/lib/crc32.c
+++ a/lib/crc32.c

>
> ...
>
> @@ -28,14 +31,17 @@
>  #include <linux/init.h>
>  #include <asm/atomic.h>
>  #include "crc32defs.h"
> -#if CRC_LE_BITS == 8
> -# define tole(x) __constant_cpu_to_le32(x)
> +
> +#include <asm/msr.h>

That breaks the build on all non-x86.  Fortunately the inclusion
appears to be unnecessary.

> +#if CRC_LE_BITS > 8
> +# define tole(x) (__force u32) __constant_cpu_to_le32(x)
>  #else
>  # define tole(x) (x)
>  #endif
>  
> -#if CRC_BE_BITS == 8
> -# define tobe(x) __constant_cpu_to_be32(x)
> +#if CRC_BE_BITS > 8
> +# define tobe(x) (__force u32) __constant_cpu_to_be32(x)
>  #else
>  # define tobe(x) (x)
>  #endif
> @@ -45,54 +51,228 @@ MODULE_AUTHOR("Matt Domsch <Matt_Domsch@
>  MODULE_DESCRIPTION("Ethernet CRC32 calculations");
>  MODULE_LICENSE("GPL");
>  
> -#if CRC_LE_BITS == 8 || CRC_BE_BITS == 8
> +#if CRC_LE_BITS > 8
> +static inline u32 crc32_le_body(u32 crc, u8 const *buf, size_t len)

`inline' is largely ignored by gcc, with gcc-version dependencies. 
__always_inline is more likely to succeed.  But the compiler would
inline this all by itself anyway.

> +{
> +	const u8 *p8;
> +	const u32 *p32;
> +	int init_bytes, end_bytes;
> +	size_t words;
> +	int i;
> +	u32 q;
> +	u8 i0, i1, i2, i3;
> +
> +	crc = (__force u32) __cpu_to_le32(crc);
> +
> +#if CRC_LE_BITS == 64
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 7) & ~7);

Perhaps using ALIGN() would be clearer.

> +
> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;

Please take a look at the types of init_bytes and end_bytes. 
ptrdiff_t, size_t and uint would all eb more appropriate than `int'.

> +	if (init_bytes > len)
> +		init_bytes = len;
> +	words = (len - init_bytes) >> 3;
> +	end_bytes = (len - init_bytes) & 7;
> +#else
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 3) & ~3);

ALIGN()?

> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
> +	if (init_bytes > len)
> +		init_bytes = len;

max()?

> +	words = (len - init_bytes) >> 2;
> +	end_bytes = (len - init_bytes) & 3;
> +#endif
> +
> +	for (i = 0; i < init_bytes; i++) {
> +#ifdef __LITTLE_ENDIAN
> +		i0 = *p8++ ^ crc;
> +		crc = t0_le[i0] ^ (crc >> 8);
> +#else
> +		i0 = *p8++ ^ (crc >> 24);
> +		crc = t0_le[i0] ^ (crc << 8);
> +#endif
> +	}

All the ifdeffing in here bursts my brain.

Has this code been carefully tested in LE and BE?

>
> ...
>
> +#if CRC_LE_BITS == 64
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 7) & ~7);
> +
> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
> +	if (init_bytes > len)
> +		init_bytes = len;
> +	words = (len - init_bytes) >> 3;
> +	end_bytes = (len - init_bytes) & 7;

Various dittoes.

> +#else
> +	p8 = buf;
> +	p32 = (u32 *)(((uintptr_t)p8 + 3) & ~3);
> +
> +	init_bytes = (uintptr_t)p32 - (uintptr_t)p8;
> +	if (init_bytes > len)
> +		init_bytes = len;
> +	words = (len - init_bytes) >> 2;
> +	end_bytes = (len - init_bytes) & 3;
> +#endif
> +
> +	for (i = 0; i < init_bytes; i++) {
> +#ifdef __LITTLE_ENDIAN
> +		i0 = *p8++ ^ crc;
> +		crc = t0_be[i0] ^ (crc >> 8);
> +#else
> +		i0 = *p8++ ^ (crc >> 24);
> +		crc = t0_be[i0] ^ (crc << 8);
> +#endif
> +	}
>  


  reply	other threads:[~2011-07-28 22:17 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-20 22:19 [PATCH] add slice by 8 algorithm to crc32.c frank zago
2011-07-28 22:16 ` Andrew Morton [this message]
2011-07-29  1:47   ` Bob Pearson
2011-08-01 19:39     ` Andrew Morton
     [not found] <OF4AE0115F.3AA5397E-ONC12578DF.002EC6DF-C12578DF.003348E5@transmode.se>
2011-08-02 21:14 ` Bob Pearson
2011-08-02 21:19   ` Bob Pearson
2011-08-04 11:54   ` Joakim Tjernlund
2011-08-04 18:53     ` Bob Pearson
2011-08-05  9:22       ` Joakim Tjernlund
2011-08-05 15:51         ` Bob Pearson
2011-08-08  7:11           ` Joakim Tjernlund
2011-08-05 17:27         ` Bob Pearson
2011-08-08  7:15           ` Joakim Tjernlund
     [not found]       ` <OF14136E0E.3F2388EF-ONC12578E3.00301969-C12578E3.00338524@LocalDomain>
2011-08-05 13:34         ` Joakim Tjernlund
  -- strict thread matches above, loose matches on Subject: below --
2011-08-08  9:28 George Spelvin
2011-08-08 10:31 ` Joakim Tjernlund
2011-08-08 10:52   ` George Spelvin
2011-08-08 11:11     ` Joakim Tjernlund
2011-08-08 17:04       ` Bob Pearson
     [not found]     ` <OFEA1BD2B2.B2A7F07F-ONC12578E6.003D368C-C12578E6.003D7468@LocalDomain>
2011-08-08 11:24       ` Joakim Tjernlund
2011-08-08 11:42     ` Joakim Tjernlund
2011-08-08 12:54       ` George Spelvin
2011-08-08 17:01         ` Bob Pearson
2011-08-08 20:45           ` George Spelvin
2011-08-08 22:21             ` Bob Pearson
2011-08-08 16:54     ` Bob Pearson
2011-08-08 16:50 ` Bob Pearson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110728151619.8b4cc4e3.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=fzago@systemfabricworks.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=roland@kernel.org \
    --cc=rpearson@systemfabricworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.