Re: [PATCH] net/crc: add 4x folding loop for x86 SSE implementation

From: Stephen Hemminger <stephen@networkplumber.org>
To: Shreesh Adiga <16567adigashreesh@gmail.com>
Cc: Jasvinder Singh <jasvinder.singh@intel.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>,
	dev@dpdk.org
Subject: Re: [PATCH] net/crc: add 4x folding loop for x86 SSE implementation
Date: Thu, 11 Jun 2026 10:06:11 -0700
Message-ID: <20260611100611.17880d3b@phoenix.local>
In-Reply-To: <20260609075712.247286-1-16567adigashreesh@gmail.com>

On Tue,  9 Jun 2026 13:27:12 +0530
Shreesh Adiga <16567adigashreesh@gmail.com> wrote:

> Add a 64-byte loop that maintains 4 fold registers and processes
> 64 bytes at a time. The 4x fold registers is then reduced to 16 byte
> single fold, similar to AVX512 implementation. This technique is
> described in the paper by Intel:
> "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> 
> This results in roughly 50% performance improvement due to better ILP
> for large input sizes like 1024.
> 
> Signed-off-by: Shreesh Adiga <16567adigashreesh@gmail.com>
> ---

Looks good, applied to next-net.

A couple of nits from a more detailed AI review that you might still want to look at:

The current crc_autotest does not exercise the new 64-byte CRC16 path.
Its CRC32 vectors are 1512 and 348 bytes, so the CRC32 4x loop is
covered, but all three CRC16 vectors are 32 bytes or shorter. So the
new CRC16 rk1_rk2 (64-byte fold) constants ship untested in CI. My
exhaustive test confirms they're correct, but a future regression there
wouldn't be caught. Suggest adding a CRC16 vector ≥64 bytes, ideally a
non-multiple of 64 (e.g. 80 or 100) so it hits the 4x loop, the
single-fold tail, and the partial-bytes path together.
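
As a rough sketch of how the expected value for such a vector could be
derived independently of the SIMD code, a bit-at-a-time reference along
these lines may help. The names crc16_ref and fill_pattern are
hypothetical, and the parameters (reflected polynomial 0x8408, initial
value 0xffff, inverted result) are my assumption about what the CRC16
CCITT handler computes, so check them against the scalar handler first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time reference CRC16 (hypothetical helper, not DPDK code).
 * ASSUMPTION: reflected CCITT polynomial 0x1021 (0x8408 bit-reversed),
 * initial value 0xffff, final result inverted (CRC-16/X-25 parameters).
 */
static uint16_t
crc16_ref(const uint8_t *data, size_t len)
{
	uint16_t crc = 0xffff;
	size_t i;
	int bit;

	assert(len == 0 || data != NULL);

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++) {
			if (crc & 1)
				crc = (uint16_t)((crc >> 1) ^ 0x8408);
			else
				crc = (uint16_t)(crc >> 1);
		}
	}
	return (uint16_t)~crc;
}

/* Hypothetical helper: fill a test vector with an incrementing pattern. */
static void
fill_pattern(uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		buf[i] = (uint8_t)i;
}
```

Filling a 100-byte buffer with fill_pattern() and passing it to
crc16_ref() would give a candidate expected value to compare against
both the scalar and the SSE handlers.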

In partial_bytes, the comment /* k = rk1 & rk2 */ is now stale: after
the patch, k holds rk3_rk4 on every path that reaches it. The comment
itself predates this patch, but the patch is what made it wrong, so it
is worth fixing in passing.

Thread overview: 3+ messages
2026-06-09  7:57 [PATCH] net/crc: add 4x folding loop for x86 SSE implementation Shreesh Adiga
2026-06-11 17:06 ` Stephen Hemminger [this message]
2026-06-12  3:02   ` Shreesh Adiga
