public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Daniel Gregory <code@danielg0.com>
Cc: "Stanisław Kardach" <stanislaw.kardach@gmail.com>,
	dev@dpdk.org, "Punit Agrawal" <punit.agrawal@bytedance.com>,
	"Liang Ma" <liangma@liangbit.com>,
	"Pengcheng Wang" <wangpengcheng.pp@bytedance.com>,
	"Chunsong Feng" <fengchunsong@bytedance.com>,
	"Daniel Gregory" <daniel.gregory@bytedance.com>,
	"Sun Yuechi" <sunyuechi@iscas.ac.cn>
Subject: Re: [PATCH v4 00/10] riscv: implement accelerated crc using zbc
Date: Sun, 29 Mar 2026 11:22:10 -0700	[thread overview]
Message-ID: <20260329112210.638d1811@phoenix.local> (raw)
In-Reply-To: <cover.1771772598.git.code@danielg0.com>

On Sun, 22 Feb 2026 16:29:54 +0100
Daniel Gregory <code@danielg0.com> wrote:

> The RISC-V Zbc extension adds instructions for carry-less multiplication
> we can use to implement CRC in hardware. This patch set contains two new
> implementations:
> 
> - one in lib/hash/rte_crc_riscv64.h that uses a Barrett reduction to
>   implement the four rte_hash_crc_* functions
> - one in lib/net/net_crc_zbc.c that uses repeated single-folds to reduce
>   the buffer until it is small enough for a Barrett reduction to
>   implement rte_crc16_ccitt_zbc_handler and rte_crc32_eth_zbc_handler
> 
> My approach is largely based on the Intel's "Fast CRC Computation Using
> PCLMULQDQ Instruction" white paper
> https://www.researchgate.net/publication/263424619_Fast_CRC_computation
> and a post about "Optimizing CRC32 for small payload sizes on x86"
> https://mary.rs/lab/crc32/
> 
> Whether these new implementations are enabled is controlled by new
> build-time and run-time detection of the RISC-V extensions present in
> the compiler and on the target system.
> 
> I have carried out some performance comparisons between the generic
> table implementations and the new hardware implementations. Listed below
> is the number of cycles it takes to compute the CRC hash for buffers of
> various sizes (as reported by rte_get_timer_cycles()). These results
> were collected on a Kendryte K230 and averaged over 20 samples:
> 
> |Buffer    | CRC32-ETH (lib/net) | CRC32C (lib/hash)   |
> |Size (MB) | Table    | Hardware | Table    | Hardware |
> |----------|----------|----------|----------|----------|
> |        1 |   155168 |    11610 |    73026 |    18385 |
> |        2 |   311203 |    22998 |   145586 |    35886 |
> |        3 |   466744 |    34370 |   218536 |    53939 |
> |        4 |   621843 |    45536 |   291574 |    71944 |
> |        5 |   777908 |    56989 |   364152 |    89706 |
> |        6 |   932736 |    68023 |   437016 |   107726 |
> |        7 |  1088756 |    79236 |   510197 |   125426 |
> |        8 |  1243794 |    90467 |   583231 |   143614 |
> 
> These results suggest a speed-up of lib/net by thirteen times, and of
> lib/hash by four times.
> 
> I have also run the hash_functions_autotest benchmark in dpdk_test,
> which measures the performance of the lib/hash implementation on small
> buffers, getting the following times:
> 
> | Key Length | Time (ticks/op)     |
> | (bytes)    | Table    | Hardware |
> |------------|----------|----------|
> |          1 |     0.47 |     0.85 |
> |          2 |     0.57 |     0.87 |
> |          4 |     0.99 |     0.88 |
> |          8 |     1.35 |     0.88 |
> |          9 |     1.20 |     1.09 |
> |         13 |     1.76 |     1.35 |
> |         16 |     1.87 |     1.02 |
> |         32 |     2.96 |     0.98 |
> |         37 |     3.35 |     1.45 |
> |         40 |     3.49 |     1.12 |
> |         48 |     4.02 |     1.25 |
> |         64 |     5.08 |     1.54 |
> 
> v4:
> - rebase on 26.03-rc1
> - RISC64 -> RISCV64 in test_hash.c (Stephen Hemminger)
> - Added section to release notes (Stephen Hemminger)
> - SPDX-License_Identifier -> SPDX-License-Identifier in
>   rte_crc_riscv64.h (Stephen Hemminger)
> - Fix header guard in rte_crc_riscv64.h (Stephen Hemminger)
> - assert -> RTE_ASSERT in rte_crc_riscv64.h (Stephen Hemminger)
> - Fix copyright statement in net_crc_zbc.c (Stephen Hemminger)
> - Make crc context structs static in net_crc_zbc.c (Stephen Hemminger)
> - prefer the optimised crc when zbc present over jhash in rte_fbk_hash.c
> v3:
> - rebase on 24.07
> - replace crc with CRC in commits (check-git-log.sh)
> v2:
> - replace compile flag with build-time (riscv extension macros) and
>   run-time detection (linux hwprobe syscall) (Stephen Hemminger)
> - add qemu target that supports zbc (Stanislaw Kardach)
> - fix spelling error in commit message
> - fix a bug in the net/ implementation that would cause segfaults on
>   small unaligned buffers
> - refactor net/ implementation to move variable declarations to top of
>   functions
> - enable the optimisation in a couple other places optimised crc is
>   preferred to jhash
>   - l3fwd-power
>   - cuckoo-hash
> 
> Daniel Gregory (10):
>   config/riscv: detect presence of Zbc extension
>   hash: implement CRC using riscv carryless multiply
>   net: implement CRC using riscv carryless multiply
>   config/riscv: add qemu crossbuild target
>   examples/l3fwd: use accelerated CRC on riscv
>   ipfrag: use accelerated CRC on riscv
>   examples/l3fwd-power: use accelerated CRC on riscv
>   hash: use accelerated CRC on riscv
>   member: use accelerated CRC on riscv
>   doc: implement CRC using riscv carryless multiply
> 
>  .mailmap                                      |   2 +-
>  MAINTAINERS                                   |   2 +
>  app/test/test_crc.c                           |  10 +
>  app/test/test_hash.c                          |   7 +
>  config/riscv/meson.build                      |  33 +++
>  config/riscv/riscv64_qemu_linux_gcc           |  17 ++
>  .../linux_gsg/cross_build_dpdk_for_riscv.rst  |   5 +
>  doc/guides/rel_notes/release_26_03.rst        |   8 +
>  examples/l3fwd-power/main.c                   |   2 +-
>  examples/l3fwd/l3fwd_em.c                     |   2 +-
>  lib/eal/riscv/include/rte_cpuflags.h          |   2 +
>  lib/eal/riscv/rte_cpuflags.c                  | 112 +++++++---
>  lib/hash/meson.build                          |   1 +
>  lib/hash/rte_crc_riscv64.h                    |  90 ++++++++
>  lib/hash/rte_cuckoo_hash.c                    |   3 +
>  lib/hash/rte_fbk_hash.c                       |   3 +
>  lib/hash/rte_hash_crc.c                       |  13 +-
>  lib/hash/rte_hash_crc.h                       |   6 +-
>  lib/ip_frag/ip_frag_internal.c                |   6 +-
>  lib/member/member.h                           |   2 +-
>  lib/net/meson.build                           |   4 +
>  lib/net/net_crc.h                             |  11 +
>  lib/net/net_crc_zbc.c                         | 194 ++++++++++++++++++
>  lib/net/rte_net_crc.c                         |  30 ++-
>  lib/net/rte_net_crc.h                         |   3 +
>  25 files changed, 526 insertions(+), 42 deletions(-)
>  create mode 100644 config/riscv/riscv64_qemu_linux_gcc
>  create mode 100644 lib/hash/rte_crc_riscv64.h
>  create mode 100644 lib/net/net_crc_zbc.c
> 

Since don't have riscv hardware or detailed CPU knowledge, turned to AI review for help.
Looks like more work is needed. Not all AI comments are correct; please look at in detail.

Summary of Findings

  The patch series correctly introduces RISC-V Zbc (carryless multiplication) support for CRC acceleration in both hash and
  net libraries, including runtime detection via the Linux hwprobe interface. However, I identified a significant alignment
  bug in the buffer processing logic and a missing header that will likely break the build on some systems.

  ---

  Correctness Bugs (High Priority)

  1. Alignment Logic Error in crc32_eth_calc_zbc (lib/net/net_crc_zbc.c)
  The code attempts to align the input buffer to an 8-byte boundary before performing 64-bit wide operations, but the
  calculation is incorrect:

   1 +    /* Barrett reduce until buffer aligned to 8-byte word */
   2 +    uint32_t misalign = (size_t)data & 7;
   3 +    if (misalign != 0 && misalign <= data_len) {
   4 +        crc = crc32_repeated_barrett_zbc(data, misalign, crc, params);
   5 +        data += misalign;
   6 +        data_len -= misalign;
   7 +    }

  Issue: If data is 0x...001 (1-byte misaligned), misalign is 1. After processing 1 byte and incrementing data, the new
  address is 0x...002, which is still not 8-byte aligned. This leads to misaligned 64-bit loads (*(const uint64_t *)data)
  later in the function, which can cause performance degradation or traps depending on the RISC-V implementation.
  Recommendation: The number of bytes to process to reach alignment should be (8 - ((uintptr_t)data & 7)) & 7.

  2. Missing Header for cpu_set_t (lib/eal/riscv/rte_cpuflags.c)
  In Patch 01/10, the function rte_cpu_hwprobe_ima_ext introduces the use of cpu_set_t:

   1 +    /* empty set of cpus returns extensions present on all cpus */
   2 +    cpu_set_t *cpus = NULL;

  Issue: lib/eal/riscv/rte_cpuflags.c does not include <sched.h>, which is required for the cpu_set_t definition. This will
  cause a compilation error.
  Recommendation: Add #include <sched.h> to the EAL CPU flags implementation.

  3. Algorithm Overriding in rte_hash_crc_set_alg (lib/hash/rte_hash_crc.c)
  The implementation overrides the user's requested algorithm if the Zbc extension is detected:

   1 +    if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_RISCV_EXT_ZBC))
   2 +        rte_hash_crc32_alg = CRC32_RISCV64;

  Issue: If a user explicitly calls rte_hash_crc_set_alg(CRC32_SW), the code will still force the use of CRC32_RISCV64. While
  this matches the existing (and arguably flawed) pattern for ARM64, it violates the expectation that set_alg respects the
  user's choice.

  ---

  Style and Process Compliance

  4. Inconsistent Assertion Macros
   - Patch 02/10 uses RTE_ASSERT in lib/hash/rte_crc_riscv64.h.
   - Patch 03/10 uses standard C assert() in lib/net/net_crc_zbc.c.
  Recommendation: Use RTE_ASSERT consistently across the codebase per DPDK standards.

  5. Implicit Pointer Comparison Nits
  While most comparisons are explicit, the project style generally favors if (ptr == NULL) over if (!ptr). There are a few
  instances in Patch 01 (Meson logic) and Patch 02 (intrinsics checks) that could be more explicit, though they mostly follow
  the local conventions of those specific files.

  6. Barrett Reduction Constants
  The constants mu and p in lib/hash/rte_crc_riscv64.h are lowercase. While these are static const variables, global/static
  constants in DPDK are often written in ALL_UPPERCASE.

  ---

  Additional Observations

   - Toolchain Requirements: The Meson logic correctly identifies that GCC 14.1+ or Clang 18.1+ is required for the Zbc
     intrinsics.
   - Cross-compilation: The addition of a QEMU cross-build target is a helpful addition for testing these extensions in
     virtualized environments.
   - Hardware Probing: The use of the Linux hwprobe syscall is the correct modern approach for RISC-V extension discovery.
                                          



      parent reply	other threads:[~2026-03-29 18:22 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
2024-06-18 20:03   ` Stephen Hemminger
2024-06-19  7:08     ` Morten Brørup
2024-06-19 14:49       ` Stephen Hemminger
2024-06-19 16:41       ` Daniel Gregory
2024-10-07  8:14         ` Stanisław Kardach
2024-10-07 15:20           ` Stephen Hemminger
2024-10-08  5:52             ` Stanisław Kardach
2024-10-08 15:35               ` Stephen Hemminger
2024-06-18 17:41 ` [PATCH 2/5] hash: implement crc using riscv carryless multiply Daniel Gregory
2024-06-18 17:41 ` [PATCH 3/5] net: " Daniel Gregory
2024-06-18 17:41 ` [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
2024-06-18 17:41 ` [PATCH 5/5] ipfrag: " Daniel Gregory
2024-07-12 15:46 ` [PATCH v2 0/9] riscv: implement accelerated crc using zbc Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 1/9] config/riscv: detect presence of Zbc extension Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 2/9] hash: implement crc using riscv carryless multiply Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 3/9] net: " Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 4/9] config/riscv: add qemu crossbuild target Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 5/9] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 6/9] ipfrag: " Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 7/9] examples/l3fwd-power: " Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 8/9] hash/cuckoo: " Daniel Gregory
2024-07-12 15:46   ` [PATCH v2 9/9] member: " Daniel Gregory
2024-07-12 17:19   ` [PATCH v2 0/9] riscv: implement accelerated crc using zbc David Marchand
2024-08-27 15:32   ` [PATCH v3 " Daniel Gregory
2024-08-27 15:32     ` [PATCH v3 1/9] config/riscv: detect presence of Zbc extension Daniel Gregory
2024-08-27 15:32     ` [PATCH v3 2/9] hash: implement CRC using riscv carryless multiply Daniel Gregory
2024-08-27 15:32     ` [PATCH v3 3/9] net: " Daniel Gregory
2024-08-27 15:32     ` [PATCH v3 4/9] config/riscv: add qemu crossbuild target Daniel Gregory
2024-08-27 15:36     ` [PATCH v3 5/9] examples/l3fwd: use accelerated CRC on riscv Daniel Gregory
2024-08-27 15:36       ` [PATCH v3 6/9] ipfrag: " Daniel Gregory
2024-08-27 15:36       ` [PATCH v3 7/9] examples/l3fwd-power: " Daniel Gregory
2024-08-27 15:36       ` [PATCH v3 8/9] hash/cuckoo: " Daniel Gregory
2024-08-27 15:36       ` [PATCH v3 9/9] member: " Daniel Gregory
2024-09-17 14:26     ` [PATCH v3 0/9] riscv: implement accelerated crc using zbc Daniel Gregory
2025-11-17  4:47       ` sunyuechi
2026-01-13  1:07     ` Stephen Hemminger
2026-02-22 15:29     ` [PATCH v4 00/10] " Daniel Gregory
2026-02-22 15:29       ` [PATCH v4 01/10] config/riscv: detect presence of Zbc extension Daniel Gregory
2026-02-22 15:29       ` [PATCH v4 02/10] hash: implement CRC using riscv carryless multiply Daniel Gregory
2026-02-22 15:29       ` [PATCH v4 03/10] net: " Daniel Gregory
2026-02-22 15:29       ` [PATCH v4 04/10] config/riscv: add qemu crossbuild target Daniel Gregory
2026-02-22 15:29       ` [PATCH v4 05/10] examples/l3fwd: use accelerated CRC on riscv Daniel Gregory
2026-02-22 15:30       ` [PATCH v4 06/10] ipfrag: " Daniel Gregory
2026-02-22 15:30       ` [PATCH v4 07/10] examples/l3fwd-power: " Daniel Gregory
2026-02-22 15:30       ` [PATCH v4 08/10] hash: " Daniel Gregory
2026-02-22 15:30       ` [PATCH v4 09/10] member: " Daniel Gregory
2026-02-22 15:30       ` [PATCH v4 10/10] doc: implement CRC using riscv carryless multiply Daniel Gregory
2026-02-22 18:03       ` [PATCH v4 00/10] riscv: implement accelerated crc using zbc Stephen Hemminger
2026-02-22 19:42         ` Morten Brørup
2026-03-23  5:42       ` dangshiwei
2026-03-29 18:22       ` Stephen Hemminger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260329112210.638d1811@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=code@danielg0.com \
    --cc=daniel.gregory@bytedance.com \
    --cc=dev@dpdk.org \
    --cc=fengchunsong@bytedance.com \
    --cc=liangma@liangbit.com \
    --cc=punit.agrawal@bytedance.com \
    --cc=stanislaw.kardach@gmail.com \
    --cc=sunyuechi@iscas.ac.cn \
    --cc=wangpengcheng.pp@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox