Re: [PATCH] x86/crc32: optimize tail handling for crc32c short inputs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Laight <david.laight.linux@gmail.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: linux-kernel@vger.kernel.org, Bill Wendling <morbo@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <nick.desaulniers+lkml@gmail.com>,
	Justin Stitt <justinstitt@google.com>,
	linux-crypto@vger.kernel.org, llvm@lists.linux.dev
Subject: Re: [PATCH] x86/crc32: optimize tail handling for crc32c short inputs
Date: Thu, 6 Mar 2025 05:17:40 +0000	[thread overview]
Message-ID: <20250306051740.6f454cac@pumpkin> (raw)
In-Reply-To: <20250306025643.GA1592@sol.localdomain>

On Wed, 5 Mar 2025 18:56:43 -0800
Eric Biggers <ebiggers@kernel.org> wrote:

> On Wed, Mar 05, 2025 at 10:07:39PM +0000, David Laight wrote:
> > On Wed, 5 Mar 2025 11:16:08 -0800
> > Eric Biggers <ebiggers@kernel.org> wrote:
> >   
> > > On Wed, Mar 05, 2025 at 02:26:53PM +0000, David Laight wrote:  
> > > > On Tue,  4 Mar 2025 13:32:16 -0800
> > > > Eric Biggers <ebiggers@kernel.org> wrote:
> > > >     
> > > > > From: Eric Biggers <ebiggers@google.com>
> > > > > 
> > > > > For handling the 0 <= len < sizeof(unsigned long) bytes left at the end,
> > > > > do a 4-2-1 step-down instead of a byte-at-a-time loop.  This allows
> > > > > taking advantage of wider CRC instructions.  Note that crc32c-3way.S
> > > > > already uses this same optimization too.    
> > > > 
> > > > An alternative is to add extra zero bytes at the start of the buffer.
> > > > They don't affect the crc and just need the first 8 bytes shifted left.
> > > > 
> > > > I think any non-zero 'crc-in' just needs to be xor'ed over the first
> > > > 4 actual data bytes.
> > > > (It's over 40 years since I did the maths of CRC.)  
> > ...  
> > > > 	David    
> > > 
> > > Sure, but that only works when len >= sizeof(unsigned long).  Also, the initial
> > > CRC sometimes has to be divided between two unsigned longs.  
> > 
> > Yes, I was thinking that might make it a bit more tricky.
> > I need to find some spare time :-)
> > 
> > I wasn't taught anything about using non-carry multiplies either.
> > And I can't remember the relevant 'number field' stuff either.
> > But (with no-carry maths) I think you have:
> > 	crc(n + 1) = (crc(n) + data(n)) * poly
> > If data(n+1) and data(n+2) are zero (handled elsewhere) you have:
> > 	crc(n + 3) = (((crc(n) + data(n)) * poly) * poly) * poly
> > I think that because it is a field this is the same as
> > 	crc(n + 3) = (crc(n) + data(n)) * (poly * poly * poly)
> > which is just a different crc polynomial.
> > If true your '3-way' cpu doesn't have to use big blocks.  
> 
> Well, to extend by some constant number of bits 'n', you can carryless-multiply
> by the polynomial x^n, pre-reduced by the CRC's generator polynomial.  That's
> basically how all the CRC implementations using carryless multiplication work.
> Take a look at the x86 and riscv optimized code, for example -- especially my
> new versions in the crc-next tree at
> https://web.git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git/log/?h=crc-next.
> 
> But x86 does not have a scalar carryless multiplication instruction, only vector
> (PCLMULQDQ).  It does have a scalar CRC instruction, for crc32c *specifically*,
> and that is what the code we're discussing is taking advantage of.  Given that
> there is overhead associated with using kernel-mode FPU (i.e. vector), it makes
> sense to do that, at least on short messages.
> 
> On longer messages a PCLMULQDQ-only implementation would work well, but so does
> interleaving the crc32c scalar instruction on multiple chunks, which is what is
> currently wired up in the kernel via crc32c-3way.S.  And yes, the chunks for
> that do not *have* to be long, but you still need to use pclmulqdq instructions
> to combine them

You should be able to use lookup tables to jump over the other chunks
(so processing a block of zero bytes).
Possibly 8 lookup tables of 16 entries each for each nibble (512 bytes).
Not nice for the d-cache though.

> (unless you do a really slow bit-at-a-time carryless
> multiplication), and you have to enter a kernel-mode FPU section to do that.

That pseudo-code is probably completely wrong.
I suspect that since it is 'just' doing 'data % poly' it relies on the fact that
	327 % 7 == 3 * (100 % 7) + 27
and reduces the length by one digit.
(Where a 'digit' could be 64 bits)
I need to look up pclmulqdq.
Trying to do anything with the simd instructions makes my head hurt.

	David

next prev parent reply	other threads:[~2025-03-06  5:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-04 21:32 [PATCH] x86/crc32: optimize tail handling for crc32c short inputs Eric Biggers
2025-03-05 10:10 ` Uros Bizjak
2025-03-05 14:26 ` David Laight
2025-03-05 19:16   ` Eric Biggers
2025-03-05 22:07     ` David Laight
2025-03-06  2:56       ` Eric Biggers
2025-03-06  5:17         ` David Laight [this message]
2025-03-06 17:21 ` Eric Biggers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250306051740.6f454cac@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=ebiggers@kernel.org \
    --cc=hpa@zytor.com \
    --cc=justinstitt@google.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mingo@redhat.com \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=nick.desaulniers+lkml@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.