* [PATCH v14 0/2] net: optimize __rte_raw_cksum
@ 2026-01-12 12:04 scott.k.mitch1
2026-01-12 12:04 ` [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
` (2 more replies)
0 siblings, 3 replies; 39+ messages in thread
From: scott.k.mitch1 @ 2026-01-12 12:04 UTC (permalink / raw)
To: dev; +Cc: mb, stephen, Scott Mitchell
From: Scott Mitchell <scott.k.mitch1@gmail.com>
This series optimizes __rte_raw_cksum by replacing memcpy with direct
pointer access, enabling compiler vectorization on both GCC and Clang.
Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC
strict-aliasing bug where struct initialization is incorrectly elided.
Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum
to enable compiler optimizations while maintaining correctness across
all architectures (including strict-alignment platforms).
Performance results show significant improvements (40% for small buffers,
up to 8x for larger buffers) on Intel Xeon with Clang 18.1.
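For readers who want to try the approach outside DPDK, the optimized loop can be sketched as below. This is a minimal stand-in: `u16_alias` approximates what unaligned_uint16_t becomes after this series, spelled with raw GCC/Clang attributes rather than the DPDK macros, and `raw_cksum_sketch` is an illustrative helper, not the library function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's unaligned_uint16_t after this series:
 * a 16-bit type that may alias anything and has no alignment requirement. */
typedef uint16_t u16_alias __attribute__((__may_alias__, __aligned__(1)));

/* Sketch of the optimized loop: direct pointer reads instead of a
 * per-iteration memcpy, so the compiler can vectorize the summation. */
static uint32_t raw_cksum_sketch(const void *buf, size_t len, uint32_t sum)
{
	const u16_alias *p = (const u16_alias *)buf;
	const u16_alias *end = p + len / sizeof(*p);

	for (; p != end; p++)
		sum += *p;

	/* odd trailing byte, kept byte-order independent */
	if (len % 2) {
		uint16_t left = 0;

		memcpy(&left, end, 1);
		sum += left;
	}
	return sum;
}
```

The returned value is the raw 32-bit running sum, not the final 16-bit checksum; folding and complementing happen in a later step.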
Changes in v14:
- Split into two patches: EAL typedef fix and checksum optimization
- Use unaligned_uint16_t directly instead of wrapper struct
- Added __rte_may_alias to unaligned typedefs to prevent GCC bug
Scott Mitchell (2):
eal: add __rte_may_alias to unaligned typedefs
net: __rte_raw_cksum pointers enable compiler optimizations
app/test/meson.build | 1 +
app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++
app/test/test_cksum_perf.c | 2 +-
lib/eal/include/rte_common.h | 34 ++---
lib/net/rte_cksum.h | 14 +-
5 files changed, 266 insertions(+), 25 deletions(-)
create mode 100644 app/test/test_cksum_fuzz.c
--
2.39.5 (Apple Git-154)
^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs
  2026-01-12 12:04 [PATCH v14 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
@ 2026-01-12 12:04 ` scott.k.mitch1
  2026-01-12 13:28   ` Morten Brørup
  2026-01-12 12:04 ` [PATCH v14 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
  2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
  2 siblings, 1 reply; 39+ messages in thread
From: scott.k.mitch1 @ 2026-01-12 12:04 UTC (permalink / raw)
  To: dev; +Cc: mb, stephen, Scott Mitchell

From: Scott Mitchell <scott.k.mitch1@gmail.com>

Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs
to prevent GCC strict-aliasing optimization bugs. GCC has a bug where
it incorrectly elides struct initialization when strict aliasing is
enabled, causing reads from uninitialized memory.

The __rte_may_alias attribute signals to the compiler that these types
can alias other types, preventing the incorrect optimization.

Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
---
 lib/eal/include/rte_common.h | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 9e7d84f929..ac70270cfb 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -121,14 +121,27 @@ extern "C" {
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
 #endif
 
+/**
+ * Macro to mark a type that is not subject to type-based aliasing rules
+ */
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __rte_may_alias
+#else
+#define __rte_may_alias __attribute__((__may_alias__))
+#endif
+
+/**
+ * __rte_may_alias avoids compiler bugs (GCC) that elide initialization
+ * of memory when strict-aliasing is enabled.
+ */
 #ifdef RTE_ARCH_STRICT_ALIGN
-typedef uint64_t unaligned_uint64_t __rte_aligned(1);
-typedef uint32_t unaligned_uint32_t __rte_aligned(1);
-typedef uint16_t unaligned_uint16_t __rte_aligned(1);
+typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1);
+typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1);
+typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1);
 #else
-typedef uint64_t unaligned_uint64_t;
-typedef uint32_t unaligned_uint32_t;
-typedef uint16_t unaligned_uint16_t;
+typedef uint64_t unaligned_uint64_t __rte_may_alias;
+typedef uint32_t unaligned_uint32_t __rte_may_alias;
+typedef uint16_t unaligned_uint16_t __rte_may_alias;
 #endif
 
 /**
@@ -159,15 +172,6 @@ typedef uint16_t unaligned_uint16_t;
 #define __rte_packed_end __attribute__((__packed__))
 #endif
 
-/**
- * Macro to mark a type that is not subject to type-based aliasing rules
- */
-#ifdef RTE_TOOLCHAIN_MSVC
-#define __rte_may_alias
-#else
-#define __rte_may_alias __attribute__((__may_alias__))
-#endif
-
 /******* Macro to mark functions and fields scheduled for removal *****/
 #ifdef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated
-- 
2.39.5 (Apple Git-154)

^ permalink raw reply related	[flat|nested] 39+ messages in thread
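For readers unfamiliar with the attribute the patch above relies on: a `__may_alias__`-qualified typedef makes reads through a differently-typed pointer well defined under the C aliasing rules. A minimal illustration, written with the raw GCC/Clang attribute rather than the `__rte_may_alias` macro:

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent to what __rte_may_alias expands to on GCC/Clang. */
typedef uint16_t u16_may_alias __attribute__((__may_alias__));

/* Reading a uint32_t through a plain uint16_t * violates strict
 * aliasing; through the may_alias-qualified type it is well defined. */
static uint32_t sum_halves(const uint32_t *v)
{
	const u16_may_alias *p = (const u16_may_alias *)v;

	return (uint32_t)p[0] + p[1];
}
```

Because addition commutes, the result here is the same on little- and big-endian targets, which also hints at why the checksum loop itself is byte-order tolerant.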
* RE: [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs
  2026-01-12 12:04 ` [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
@ 2026-01-12 13:28   ` Morten Brørup
  2026-01-12 15:00     ` Scott Mitchell
  0 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2026-01-12 13:28 UTC (permalink / raw)
  To: scott.k.mitch1, dev; +Cc: stephen

> From: Scott Mitchell <scott.k.mitch1@gmail.com>
>
> Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs
> to prevent GCC strict-aliasing optimization bugs. GCC has a bug where
> it incorrectly elides struct initialization when strict aliasing is
> enabled, causing reads from uninitialized memory.
>
> The __rte_may_alias attribute signals to the compiler that these types
> can alias other types, preventing the incorrect optimization.

I'm wondering if this is the right place to add __rte_may_alias, i.e.
if the scope of the workaround is correct.

Are the unaligned_uintNN_t types only used in a way where they are
affected by the GCC bug?
If not, adding __rte_may_alias to the types themselves may be too broad.

Does the GCC bug only affect the unaligned_uintNN_t types?
Or does it occur elsewhere or for other types too? Then this workaround
only solves the problem for parts of the code.

Minor detail:
If the bug only occurs on GCC, not Clang, please make the workaround
GCC-only, using the preprocessor.

>
> Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
> ---
>  lib/eal/include/rte_common.h | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> index 9e7d84f929..ac70270cfb 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -121,14 +121,27 @@ extern "C" {
>  #define __rte_aligned(a) __attribute__((__aligned__(a)))
>  #endif
>
> +/**
> + * Macro to mark a type that is not subject to type-based aliasing
> rules
> + */
> +#ifdef RTE_TOOLCHAIN_MSVC
> +#define __rte_may_alias
> +#else
> +#define __rte_may_alias __attribute__((__may_alias__))
> +#endif
> +
> +/**
> + * __rte_may_alias avoids compiler bugs (GCC) that elide
> initialization
> + * of memory when strict-aliasing is enabled.
> + */
> #ifdef RTE_ARCH_STRICT_ALIGN
> -typedef uint64_t unaligned_uint64_t __rte_aligned(1);
> -typedef uint32_t unaligned_uint32_t __rte_aligned(1);
> -typedef uint16_t unaligned_uint16_t __rte_aligned(1);
> +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1);
> +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1);
> +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1);
> #else
> -typedef uint64_t unaligned_uint64_t;
> -typedef uint32_t unaligned_uint32_t;
> -typedef uint16_t unaligned_uint16_t;
> +typedef uint64_t unaligned_uint64_t __rte_may_alias;
> +typedef uint32_t unaligned_uint32_t __rte_may_alias;
> +typedef uint16_t unaligned_uint16_t __rte_may_alias;
> #endif
>
> /**
> @@ -159,15 +172,6 @@ typedef uint16_t unaligned_uint16_t;
> #define __rte_packed_end __attribute__((__packed__))
> #endif
>
> -/**
> - * Macro to mark a type that is not subject to type-based aliasing
> rules
> - */
> -#ifdef RTE_TOOLCHAIN_MSVC
> -#define __rte_may_alias
> -#else
> -#define __rte_may_alias __attribute__((__may_alias__))
> -#endif
> -
> /******* Macro to mark functions and fields scheduled for removal
> *****/
> #ifdef RTE_TOOLCHAIN_MSVC
> #define __rte_deprecated
> --
> 2.39.5 (Apple Git-154)

^ permalink raw reply	[flat|nested] 39+ messages in thread
* Re: [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs
  2026-01-12 13:28 ` Morten Brørup
@ 2026-01-12 15:00   ` Scott Mitchell
  0 siblings, 0 replies; 39+ messages in thread
From: Scott Mitchell @ 2026-01-12 15:00 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, stephen

> I'm wondering if this is the right place to add __rte_may_alias, i.e.
> if the scope of the workaround is correct.
>
> Are the unaligned_uintNN_t types only used in a way where they are
> affected by the GCC bug?

All uses in DPDK are for aliasing (casting pointers to access existing
memory). There are no cases where these types declare actual data.
Therefore adding __rte_may_alias is semantically correct and not too
broad.

> If not, adding __rte_may_alias to the types themselves may be too
> broad.
>
> Does the GCC bug only affect the unaligned_uintNN_t types?
> Or does it occur elsewhere or for other types too? Then this
> workaround only solves the problem for parts of the code.
>

The GCC strict-aliasing bug is broader and can occur with other
aliasing patterns involving struct initialization. This patch targets
the unaligned_uintNN_t types specifically because:

1. They are known to trigger the bug in practice (reproduced in testing)
2. They are explicitly designed for aliasing
3. All existing DPDK usage is for aliasing

Added benefits of this approach:

1. Simplifies existing workarounds: We can remove the intermediate
   packed structs in rte_memcpy.h for x86 and use unaligned_uintNN_t
   directly
   (https://elixir.bootlin.com/dpdk/v25.11/source/lib/eal/x86/include/rte_memcpy.h#L66)
2. Provides safe aliasing primitive: If other code needs to alias types
   and wants to avoid potential bugs, these unaligned_uintNN_t types
   are now a correct, safe option

> Minor detail:
> If the bug only occurs on GCC, not Clang, please make the workaround
> GCC-only, using the preprocessor.

I've only reproduced the bug on GCC, but __rte_may_alias is
semantically correct for these types on all compilers since they're
exclusively used for aliasing. The attribute:

- Has no negative impact on GCC/Clang (verified on Godbolt - still
  optimizes correctly: https://godbolt.org/z/Gj9EfqMTn)
- Makes the code semantically accurate about its intent
- Avoids #ifdef complexity

However, if you prefer a GCC-only workaround for minimal change, I'm
happy to add the preprocessor conditionals. Please let me know your
preference.

^ permalink raw reply	[flat|nested] 39+ messages in thread
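The rte_memcpy simplification mentioned in the reply above rests on the pattern below: a typedef combining `may_alias` with 1-byte alignment permits a direct unaligned load whose result matches a byte-wise memcpy. A sketch with raw GCC/Clang attributes (the typedef name is illustrative, not DPDK's):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximates unaligned_uint64_t after this patch on strict-alignment
 * targets: may alias anything, no alignment requirement. */
typedef uint64_t unaligned_u64 __attribute__((__may_alias__, __aligned__(1)));

/* Direct load through the aliasing-safe, alignment-free type. */
static uint64_t load_u64_unaligned(const void *p)
{
	return *(const unaligned_u64 *)p;
}

/* Portable reference load for comparison. */
static uint64_t load_u64_memcpy(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

On compilers without such attributes (e.g. MSVC, where `__rte_may_alias` expands to nothing), the memcpy form remains the portable fallback.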
* [PATCH v14 2/2] net: __rte_raw_cksum pointers enable compiler optimizations
  2026-01-12 12:04 [PATCH v14 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
  2026-01-12 12:04 ` [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
@ 2026-01-12 12:04 ` scott.k.mitch1
  2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
  2 siblings, 0 replies; 39+ messages in thread
From: scott.k.mitch1 @ 2026-01-12 12:04 UTC (permalink / raw)
  To: dev; +Cc: mb, stephen, Scott Mitchell

From: Scott Mitchell <scott.k.mitch1@gmail.com>

__rte_raw_cksum uses a loop with memcpy on each iteration. GCC 15+ is
able to vectorize the loop but Clang 18.1 is not.

Replace memcpy with direct pointer access using unaligned_uint16_t.
This enables both GCC and Clang to vectorize the loop while handling
unaligned access safely on all architectures.

Performance results from cksum_perf_autotest on Intel Xeon
(Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):

Block size   Before   After   Improvement
       100     0.40    0.24   ~40%
      1500     0.50    0.06   ~8x
      9000     0.49    0.06   ~8x

Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
---
 app/test/meson.build       |   1 +
 app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++++
 app/test/test_cksum_perf.c |   2 +-
 lib/net/rte_cksum.h        |  14 +-
 4 files changed, 247 insertions(+), 10 deletions(-)
 create mode 100644 app/test/test_cksum_fuzz.c

diff --git a/app/test/meson.build b/app/test/meson.build
index efec42a6bf..c92325ad58 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -38,6 +38,7 @@ source_file_deps = {
     'test_byteorder.c': [],
     'test_cfgfile.c': ['cfgfile'],
     'test_cksum.c': ['net'],
+    'test_cksum_fuzz.c': ['net'],
     'test_cksum_perf.c': ['net'],
     'test_cmdline.c': [],
     'test_cmdline_cirbuf.c': [],
diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c
new file mode 100644
index 0000000000..839861f57d
--- /dev/null
+++ b/app/test/test_cksum_fuzz.c
@@ -0,0 +1,240 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Apple Inc.
+ */
+
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_hexdump.h>
+#include <rte_cksum.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include "test.h"
+
+/*
+ * Fuzz test for __rte_raw_cksum optimization.
+ * Compares the optimized implementation against the original reference
+ * implementation across random data of various lengths.
+ */
+
+#define DEFAULT_ITERATIONS 1000
+#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */
+
+/*
+ * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11.
+ * This is retained here for comparison testing against the optimized version.
+ */
+static inline uint32_t
+__rte_raw_cksum_reference(const void *buf, size_t len, uint32_t sum)
+{
+	const void *end;
+
+	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
+	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
+		uint16_t v;
+
+		memcpy(&v, buf, sizeof(uint16_t));
+		sum += v;
+	}
+
+	/* if length is odd, keeping it byte order independent */
+	if (unlikely(len % 2)) {
+		uint16_t left = 0;
+
+		memcpy(&left, end, 1);
+		sum += left;
+	}
+
+	return sum;
+}
+
+static void
+init_random_buffer(uint8_t *buf, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++)
+		buf[i] = (uint8_t)rte_rand();
+}
+
+static inline uint32_t
+get_initial_sum(bool random_initial_sum)
+{
+	return random_initial_sum ? (rte_rand() & 0xFFFFFFFF) : 0;
+}
+
+/*
+ * Test a single buffer length with specific alignment and initial sum
+ */
+static int
+test_cksum_fuzz_length_aligned(size_t len, bool aligned, uint32_t initial_sum)
+{
+	uint8_t *data;
+	uint8_t *buf;
+	size_t alloc_size;
+	uint32_t sum_ref, sum_opt;
+
+	if (len == 0 && !aligned) {
+		/* Skip unaligned test for zero length - nothing to test */
+		return TEST_SUCCESS;
+	}
+
+	/* Allocate exact size for aligned, +1 for unaligned offset */
+	alloc_size = aligned ? len : len + 1;
+	if (alloc_size == 0)
+		alloc_size = 1; /* rte_malloc doesn't like 0 */
+
+	data = rte_malloc(NULL, alloc_size, 64);
+	if (data == NULL) {
+		printf("Failed to allocate %zu bytes\n", alloc_size);
+		return TEST_FAILED;
+	}
+
+	buf = aligned ? data : (data + 1);
+
+	init_random_buffer(buf, len);
+
+	sum_ref = __rte_raw_cksum_reference(buf, len, initial_sum);
+	sum_opt = __rte_raw_cksum(buf, len, initial_sum);
+
+	if (sum_ref != sum_opt) {
+		printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n",
+		       len, aligned ? "aligned" : "unaligned",
+		       initial_sum, sum_ref, sum_opt);
+		rte_hexdump(stdout, "failing buffer", buf, len);
+		rte_free(data);
+		return TEST_FAILED;
+	}
+
+	rte_free(data);
+	return TEST_SUCCESS;
+}
+
+/*
+ * Test a length with both alignments
+ */
+static int
+test_cksum_fuzz_length(size_t len, uint32_t initial_sum)
+{
+	int rc;
+
+	/* Test aligned */
+	rc = test_cksum_fuzz_length_aligned(len, true, initial_sum);
+	if (rc != TEST_SUCCESS)
+		return rc;
+
+	/* Test unaligned */
+	rc = test_cksum_fuzz_length_aligned(len, false, initial_sum);
+
+	return rc;
+}
+
+/*
+ * Test specific edge case lengths
+ */
+static int
+test_cksum_fuzz_edge_cases(void)
+{
+	/* Edge case lengths that might trigger bugs */
+	static const size_t edge_lengths[] = {
+		0, 1, 2, 3, 4, 5, 6, 7, 8,
+		15, 16, 17,
+		31, 32, 33,
+		63, 64, 65,
+		127, 128, 129,
+		255, 256, 257,
+		511, 512, 513,
+		1023, 1024, 1025,
+		1500, 1501, /* MTU boundaries */
+		2047, 2048, 2049,
+		4095, 4096, 4097,
+		8191, 8192, 8193,
+		16383, 16384, 16385,
+		32767, 32768, 32769,
+		65534, 65535, 65536 /* 64K GRO boundaries */
+	};
+	unsigned int i;
+	int rc;
+
+	printf("Testing edge case lengths...\n");
+
+	for (i = 0; i < RTE_DIM(edge_lengths); i++) {
+		/* Test with zero initial sum */
+		rc = test_cksum_fuzz_length(edge_lengths[i], 0);
+		if (rc != TEST_SUCCESS)
+			return rc;
+
+		/* Test with random initial sum */
+		rc = test_cksum_fuzz_length(edge_lengths[i], get_initial_sum(true));
+		if (rc != TEST_SUCCESS)
+			return rc;
+	}
+
+	return TEST_SUCCESS;
+}
+
+/*
+ * Test random lengths with optional random initial sums
+ */
+static int
+test_cksum_fuzz_random(unsigned int iterations, bool random_initial_sum)
+{
+	unsigned int i;
+	int rc;
+
+	printf("Testing random lengths (0-%d)%s...\n", MAX_TEST_LEN,
+	       random_initial_sum ? " with random initial sums" : "");
+
+	for (i = 0; i < iterations; i++) {
+		size_t len = rte_rand() % (MAX_TEST_LEN + 1);
+
+		rc = test_cksum_fuzz_length(len, get_initial_sum(random_initial_sum));
+		if (rc != TEST_SUCCESS) {
+			printf("Failed at len=%zu\n", len);
+			return rc;
+		}
+	}
+
+	return TEST_SUCCESS;
+}
+
+static int
+test_cksum_fuzz(void)
+{
+	int rc;
+	unsigned int iterations = DEFAULT_ITERATIONS;
+	printf("### __rte_raw_cksum optimization fuzz test ###\n");
+	printf("Iterations per test: %u\n\n", iterations);
+
+	/* Test edge cases */
+	rc = test_cksum_fuzz_edge_cases();
+	if (rc != TEST_SUCCESS) {
+		printf("Edge case test FAILED\n");
+		return rc;
+	}
+	printf("Edge case test PASSED\n\n");
+
+	/* Test random lengths with zero initial sum */
+	rc = test_cksum_fuzz_random(iterations, false);
+	if (rc != TEST_SUCCESS) {
+		printf("Random length test FAILED\n");
+		return rc;
+	}
+	printf("Random length test PASSED\n\n");
+
+	/* Test random lengths with random initial sums */
+	rc = test_cksum_fuzz_random(iterations, true);
+	if (rc != TEST_SUCCESS) {
+		printf("Random initial sum test FAILED\n");
+		return rc;
+	}
+	printf("Random initial sum test PASSED\n\n");
+
+	printf("All fuzz tests PASSED!\n");
+	return TEST_SUCCESS;
+}
+
+REGISTER_FAST_TEST(cksum_fuzz_autotest, true, true, test_cksum_fuzz);
diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c
index 0b919cd59f..6b1d4589e0 100644
--- a/app/test/test_cksum_perf.c
+++ b/app/test/test_cksum_perf.c
@@ -15,7 +15,7 @@
 #define NUM_BLOCKS 10
 #define ITERATIONS 1000000
 
-static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 };
+static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501, 9000, 9001, 65536, 65537 };
 
 static __rte_noinline uint16_t
 do_rte_raw_cksum(const void *buf, size_t len)
diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h
index a8e8927952..f04b46a6c3 100644
--- a/lib/net/rte_cksum.h
+++ b/lib/net/rte_cksum.h
@@ -42,15 +42,11 @@ extern "C" {
 static inline uint32_t
 __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
 {
-	const void *end;
-
-	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
-	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
-		uint16_t v;
-
-		memcpy(&v, buf, sizeof(uint16_t));
-		sum += v;
-	}
+	/* Process uint16 chunks to preserve overflow/carry math. GCC/Clang vectorize the loop. */
+	const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf;
+	const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16));
+	for (; buf16 != end; buf16++)
+		sum += *buf16;
 
 	/* if length is odd, keeping it byte order independent */
 	if (unlikely(len % 2)) {
-- 
2.39.5 (Apple Git-154)

^ permalink raw reply related	[flat|nested] 39+ messages in thread
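As context for the raw sums compared by the fuzz test above: the 32-bit running value returned by __rte_raw_cksum is later folded to 16 bits with end-around carry, as ones'-complement checksums require. The helper below is an illustrative sketch of that folding step, not DPDK's source:

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit running sum into 16 bits with end-around carry
 * (illustrative helper; DPDK performs this in its reduce step). */
static uint16_t cksum_fold(uint32_t sum)
{
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff); /* second pass absorbs any carry */
	return (uint16_t)sum;
}
```

Two passes suffice for any 32-bit input: the first leaves at most 17 significant bits, so the second can carry at most once.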
* [PATCH v15 0/2] net: optimize __rte_raw_cksum
  2026-01-12 12:04 [PATCH v14 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
  2026-01-12 12:04 ` [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
  2026-01-12 12:04 ` [PATCH v14 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
@ 2026-01-17 21:21 ` scott.k.mitch1
  2026-01-17 21:21   ` [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
  ` (3 more replies)
  2 siblings, 4 replies; 39+ messages in thread
From: scott.k.mitch1 @ 2026-01-17 21:21 UTC (permalink / raw)
  To: dev; +Cc: mb, stephen, Scott Mitchell

From: Scott Mitchell <scott.k.mitch1@gmail.com>

This series optimizes __rte_raw_cksum by replacing memcpy with direct
pointer access, enabling compiler vectorization on both GCC and Clang.

Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC
strict-aliasing bug where struct initialization is incorrectly elided.

Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum
to enable compiler optimizations while maintaining correctness across
all architectures (including strict-alignment platforms).

Performance results show significant improvements (40% for small buffers,
up to 8x for larger buffers) on Intel Xeon with Clang 18.1.

Changes in v15:
- Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST

Changes in v14:
- Split into two patches: EAL typedef fix and checksum optimization
- Use unaligned_uint16_t directly instead of wrapper struct
- Added __rte_may_alias to unaligned typedefs to prevent GCC bug

Scott Mitchell (2):
  eal: add __rte_may_alias to unaligned typedefs
  net: __rte_raw_cksum pointers enable compiler optimizations

 app/test/meson.build         |   1 +
 app/test/test_cksum_fuzz.c   | 240 +++++++++++++++++++++++++++++++++++
 app/test/test_cksum_perf.c   |   2 +-
 lib/eal/include/rte_common.h |  34 ++---
 lib/net/rte_cksum.h          |  14 +-
 5 files changed, 266 insertions(+), 25 deletions(-)
 create mode 100644 app/test/test_cksum_fuzz.c

-- 
2.39.5 (Apple Git-154)

^ permalink raw reply	[flat|nested] 39+ messages in thread
* [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs
  2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
@ 2026-01-17 21:21   ` scott.k.mitch1
  2026-01-20 15:23     ` Morten Brørup
  2026-01-17 21:21   ` [PATCH v15 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: scott.k.mitch1 @ 2026-01-17 21:21 UTC (permalink / raw)
  To: dev; +Cc: mb, stephen, Scott Mitchell

From: Scott Mitchell <scott.k.mitch1@gmail.com>

Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs
to prevent GCC strict-aliasing optimization bugs. GCC has a bug where
it incorrectly elides struct initialization when strict aliasing is
enabled, causing reads from uninitialized memory.

The __rte_may_alias attribute signals to the compiler that these types
can alias other types, preventing the incorrect optimization.

Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
---
 lib/eal/include/rte_common.h | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 9e7d84f929..ac70270cfb 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -121,14 +121,27 @@ extern "C" {
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
 #endif
 
+/**
+ * Macro to mark a type that is not subject to type-based aliasing rules
+ */
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __rte_may_alias
+#else
+#define __rte_may_alias __attribute__((__may_alias__))
+#endif
+
+/**
+ * __rte_may_alias avoids compiler bugs (GCC) that elide initialization
+ * of memory when strict-aliasing is enabled.
+ */
 #ifdef RTE_ARCH_STRICT_ALIGN
-typedef uint64_t unaligned_uint64_t __rte_aligned(1);
-typedef uint32_t unaligned_uint32_t __rte_aligned(1);
-typedef uint16_t unaligned_uint16_t __rte_aligned(1);
+typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1);
+typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1);
+typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1);
 #else
-typedef uint64_t unaligned_uint64_t;
-typedef uint32_t unaligned_uint32_t;
-typedef uint16_t unaligned_uint16_t;
+typedef uint64_t unaligned_uint64_t __rte_may_alias;
+typedef uint32_t unaligned_uint32_t __rte_may_alias;
+typedef uint16_t unaligned_uint16_t __rte_may_alias;
 #endif
 
 /**
@@ -159,15 +172,6 @@ typedef uint16_t unaligned_uint16_t;
 #define __rte_packed_end __attribute__((__packed__))
 #endif
 
-/**
- * Macro to mark a type that is not subject to type-based aliasing rules
- */
-#ifdef RTE_TOOLCHAIN_MSVC
-#define __rte_may_alias
-#else
-#define __rte_may_alias __attribute__((__may_alias__))
-#endif
-
 /******* Macro to mark functions and fields scheduled for removal *****/
 #ifdef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated
-- 
2.39.5 (Apple Git-154)

^ permalink raw reply related	[flat|nested] 39+ messages in thread
* RE: [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs
  2026-01-17 21:21 ` [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
@ 2026-01-20 15:23   ` Morten Brørup
  2026-01-23 14:34     ` Scott Mitchell
  0 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2026-01-20 15:23 UTC (permalink / raw)
  To: stable; +Cc: scott.k.mitch1, dev, stephen

> From: scott.k.mitch1@gmail.com [mailto:scott.k.mitch1@gmail.com]
> Sent: Saturday, 17 January 2026 22.21
> To: dev@dpdk.org
> Cc: Morten Brørup; stephen@networkplumber.org; Scott Mitchell
> Subject: [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs
>
> From: Scott Mitchell <scott.k.mitch1@gmail.com>
>
> Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs
> to prevent GCC strict-aliasing optimization bugs. GCC has a bug where
> it incorrectly elides struct initialization when strict aliasing is
> enabled, causing reads from uninitialized memory.
>
> The __rte_may_alias attribute signals to the compiler that these types
> can alias other types, preventing the incorrect optimization.

Although this is a workaround to fix bugs in GCC, and not DPDK itself,
I think it should be backported.
It may fix (GCC induced) bugs in applications using these types.
That's my opinion; let's get more opinions.

If so, add:

Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types")

>
> Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
> ---
>  lib/eal/include/rte_common.h | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> index 9e7d84f929..ac70270cfb 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -121,14 +121,27 @@ extern "C" {
>  #define __rte_aligned(a) __attribute__((__aligned__(a)))
>  #endif
>
> +/**
> + * Macro to mark a type that is not subject to type-based aliasing
> rules
> + */
> +#ifdef RTE_TOOLCHAIN_MSVC
> +#define __rte_may_alias
> +#else
> +#define __rte_may_alias __attribute__((__may_alias__))
> +#endif
> +
> +/**
> + * __rte_may_alias avoids compiler bugs (GCC) that elide
> initialization
> + * of memory when strict-aliasing is enabled.
> + */
> #ifdef RTE_ARCH_STRICT_ALIGN
> -typedef uint64_t unaligned_uint64_t __rte_aligned(1);
> -typedef uint32_t unaligned_uint32_t __rte_aligned(1);
> -typedef uint16_t unaligned_uint16_t __rte_aligned(1);
> +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1);
> +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1);
> +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1);
> #else
> -typedef uint64_t unaligned_uint64_t;
> -typedef uint32_t unaligned_uint32_t;
> -typedef uint16_t unaligned_uint16_t;
> +typedef uint64_t unaligned_uint64_t __rte_may_alias;
> +typedef uint32_t unaligned_uint32_t __rte_may_alias;
> +typedef uint16_t unaligned_uint16_t __rte_may_alias;
> #endif
>
> /**
> @@ -159,15 +172,6 @@ typedef uint16_t unaligned_uint16_t;
> #define __rte_packed_end __attribute__((__packed__))
> #endif
>
> -/**
> - * Macro to mark a type that is not subject to type-based aliasing
> rules
> - */
> -#ifdef RTE_TOOLCHAIN_MSVC
> -#define __rte_may_alias
> -#else
> -#define __rte_may_alias __attribute__((__may_alias__))
> -#endif
> -
> /******* Macro to mark functions and fields scheduled for removal
> *****/
> #ifdef RTE_TOOLCHAIN_MSVC
> #define __rte_deprecated
> --
> 2.39.5 (Apple Git-154)

^ permalink raw reply	[flat|nested] 39+ messages in thread
* Re: [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs
  2026-01-20 15:23 ` Morten Brørup
@ 2026-01-23 14:34   ` Scott Mitchell
  0 siblings, 0 replies; 39+ messages in thread
From: Scott Mitchell @ 2026-01-23 14:34 UTC (permalink / raw)
  To: Morten Brørup; +Cc: stable, dev, stephen

>
> Although this is a workaround to fix bugs in GCC, and not DPDK itself,
> I think it should be backported.
> It may fix (GCC induced) bugs in applications using these types.
> That's my opinion; let's get more opinions.
>
> If so, add:
>
> Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types")
>

I agree. I will submit a v16 with this.

^ permalink raw reply	[flat|nested] 39+ messages in thread
* [PATCH v15 2/2] net: __rte_raw_cksum pointers enable compiler optimizations
  2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
  2026-01-17 21:21   ` [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
@ 2026-01-17 21:21   ` scott.k.mitch1
  2026-01-17 22:08   ` [PATCH v15 0/2] net: optimize __rte_raw_cksum Stephen Hemminger
  2026-01-23 16:02   ` [PATCH v16 " scott.k.mitch1
  3 siblings, 0 replies; 39+ messages in thread
From: scott.k.mitch1 @ 2026-01-17 21:21 UTC (permalink / raw)
  To: dev; +Cc: mb, stephen, Scott Mitchell

From: Scott Mitchell <scott.k.mitch1@gmail.com>

__rte_raw_cksum uses a loop with memcpy on each iteration. GCC 15+ is
able to vectorize the loop but Clang 18.1 is not.

Replace memcpy with direct pointer access using unaligned_uint16_t.
This enables both GCC and Clang to vectorize the loop while handling
unaligned access safely on all architectures.

Performance results from cksum_perf_autotest on Intel Xeon
(Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte):

Block size   Before   After   Improvement
       100     0.40    0.24   ~40%
      1500     0.50    0.06   ~8x
      9000     0.49    0.06   ~8x

Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>
---
 app/test/meson.build       |   1 +
 app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++++
 app/test/test_cksum_perf.c |   2 +-
 lib/net/rte_cksum.h        |  14 +-
 4 files changed, 247 insertions(+), 10 deletions(-)
 create mode 100644 app/test/test_cksum_fuzz.c

diff --git a/app/test/meson.build b/app/test/meson.build
index efec42a6bf..c92325ad58 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -38,6 +38,7 @@ source_file_deps = {
     'test_byteorder.c': [],
     'test_cfgfile.c': ['cfgfile'],
     'test_cksum.c': ['net'],
+    'test_cksum_fuzz.c': ['net'],
     'test_cksum_perf.c': ['net'],
     'test_cmdline.c': [],
     'test_cmdline_cirbuf.c': [],
diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c
new file mode 100644
index 0000000000..3df11e3dc2
--- /dev/null
+++ b/app/test/test_cksum_fuzz.c
@@ -0,0 +1,240 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Apple Inc.
+ */
+
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_hexdump.h>
+#include <rte_cksum.h>
+#include <rte_malloc.h>
+#include <rte_random.h>
+
+#include "test.h"
+
+/*
+ * Fuzz test for __rte_raw_cksum optimization.
+ * Compares the optimized implementation against the original reference
+ * implementation across random data of various lengths.
+ */
+
+#define DEFAULT_ITERATIONS 1000
+#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */
+
+/*
+ * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11.
+ * This is retained here for comparison testing against the optimized version.
+ */
+static inline uint32_t
+__rte_raw_cksum_reference(const void *buf, size_t len, uint32_t sum)
+{
+	const void *end;
+
+	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
+	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
+		uint16_t v;
+
+		memcpy(&v, buf, sizeof(uint16_t));
+		sum += v;
+	}
+
+	/* if length is odd, keeping it byte order independent */
+	if (unlikely(len % 2)) {
+		uint16_t left = 0;
+
+		memcpy(&left, end, 1);
+		sum += left;
+	}
+
+	return sum;
+}
+
+static void
+init_random_buffer(uint8_t *buf, size_t len)
+{
+	size_t i;
+
+	for (i = 0; i < len; i++)
+		buf[i] = (uint8_t)rte_rand();
+}
+
+static inline uint32_t
+get_initial_sum(bool random_initial_sum)
+{
+	return random_initial_sum ? (rte_rand() & 0xFFFFFFFF) : 0;
+}
+
+/*
+ * Test a single buffer length with specific alignment and initial sum
+ */
+static int
+test_cksum_fuzz_length_aligned(size_t len, bool aligned, uint32_t initial_sum)
+{
+	uint8_t *data;
+	uint8_t *buf;
+	size_t alloc_size;
+	uint32_t sum_ref, sum_opt;
+
+	if (len == 0 && !aligned) {
+		/* Skip unaligned test for zero length - nothing to test */
+		return TEST_SUCCESS;
+	}
+
+	/* Allocate exact size for aligned, +1 for unaligned offset */
+	alloc_size = aligned ? len : len + 1;
+	if (alloc_size == 0)
+		alloc_size = 1; /* rte_malloc doesn't like 0 */
+
+	data = rte_malloc(NULL, alloc_size, 64);
+	if (data == NULL) {
+		printf("Failed to allocate %zu bytes\n", alloc_size);
+		return TEST_FAILED;
+	}
+
+	buf = aligned ? data : (data + 1);
+
+	init_random_buffer(buf, len);
+
+	sum_ref = __rte_raw_cksum_reference(buf, len, initial_sum);
+	sum_opt = __rte_raw_cksum(buf, len, initial_sum);
+
+	if (sum_ref != sum_opt) {
+		printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n",
+		       len, aligned ? "aligned" : "unaligned",
+		       initial_sum, sum_ref, sum_opt);
+		rte_hexdump(stdout, "failing buffer", buf, len);
+		rte_free(data);
+		return TEST_FAILED;
+	}
+
+	rte_free(data);
+	return TEST_SUCCESS;
+}
+
+/*
+ * Test a length with both alignments
+ */
+static int
+test_cksum_fuzz_length(size_t len, uint32_t initial_sum)
+{
+	int rc;
+
+	/* Test aligned */
+	rc = test_cksum_fuzz_length_aligned(len, true, initial_sum);
+	if (rc != TEST_SUCCESS)
+		return rc;
+
+	/* Test unaligned */
+	rc = test_cksum_fuzz_length_aligned(len, false, initial_sum);
+
+	return rc;
+}
+
+/*
+ * Test specific edge case lengths
+ */
+static int
+test_cksum_fuzz_edge_cases(void)
+{
+	/* Edge case lengths that might trigger bugs */
+	static const size_t edge_lengths[] = {
+		0, 1, 2, 3, 4, 5, 6, 7, 8,
+		15, 16, 17,
+		31, 32, 33,
+		63, 64, 65,
+		127, 128, 129,
+		255, 256, 257,
+		511, 512, 513,
+		1023, 1024, 1025,
+		1500, 1501, /* MTU boundaries */
+		2047, 2048, 2049,
+		4095, 4096, 4097,
+		8191, 8192, 8193,
+		16383, 16384, 16385,
+		32767, 32768, 32769,
+		65534, 65535, 65536 /* 64K GRO boundaries */
+	};
+	unsigned int i;
+	int rc;
+
+	printf("Testing edge case lengths...\n");
+
+	for (i = 0; i < RTE_DIM(edge_lengths); i++) {
+		/* Test with zero initial sum */
+		rc = test_cksum_fuzz_length(edge_lengths[i], 0);
+		if (rc != TEST_SUCCESS)
+			return rc;
+
+		/* Test with random initial sum */
+		rc = test_cksum_fuzz_length(edge_lengths[i], get_initial_sum(true));
+		if (rc != TEST_SUCCESS)
+			return rc;
+	}
+
+	return TEST_SUCCESS;
+}
+
+/*
+ * Test random lengths with optional random initial sums
+ */
+static int
+test_cksum_fuzz_random(unsigned int iterations, bool random_initial_sum)
+{
+	unsigned int i;
+	int rc;
+
+	printf("Testing random lengths (0-%d)%s...\n", MAX_TEST_LEN,
+	       random_initial_sum ?
" with random initial sums" : ""); + + for (i = 0; i < iterations; i++) { + size_t len = rte_rand() % (MAX_TEST_LEN + 1); + + rc = test_cksum_fuzz_length(len, get_initial_sum(random_initial_sum)); + if (rc != TEST_SUCCESS) { + printf("Failed at len=%zu\n", len); + return rc; + } + } + + return TEST_SUCCESS; +} + +static int +test_cksum_fuzz(void) +{ + int rc; + unsigned int iterations = DEFAULT_ITERATIONS; + printf("### __rte_raw_cksum optimization fuzz test ###\n"); + printf("Iterations per test: %u\n\n", iterations); + + /* Test edge cases */ + rc = test_cksum_fuzz_edge_cases(); + if (rc != TEST_SUCCESS) { + printf("Edge case test FAILED\n"); + return rc; + } + printf("Edge case test PASSED\n\n"); + + /* Test random lengths with zero initial sum */ + rc = test_cksum_fuzz_random(iterations, false); + if (rc != TEST_SUCCESS) { + printf("Random length test FAILED\n"); + return rc; + } + printf("Random length test PASSED\n\n"); + + /* Test random lengths with random initial sums */ + rc = test_cksum_fuzz_random(iterations, true); + if (rc != TEST_SUCCESS) { + printf("Random initial sum test FAILED\n"); + return rc; + } + printf("Random initial sum test PASSED\n\n"); + + printf("All fuzz tests PASSED!\n"); + return TEST_SUCCESS; +} + +REGISTER_FAST_TEST(cksum_fuzz_autotest, NOHUGE_OK, ASAN_OK, test_cksum_fuzz); diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c index 0b919cd59f..6b1d4589e0 100644 --- a/app/test/test_cksum_perf.c +++ b/app/test/test_cksum_perf.c @@ -15,7 +15,7 @@ #define NUM_BLOCKS 10 #define ITERATIONS 1000000 -static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 }; +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501, 9000, 9001, 65536, 65537 }; static __rte_noinline uint16_t do_rte_raw_cksum(const void *buf, size_t len) diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h index a8e8927952..f04b46a6c3 100644 --- a/lib/net/rte_cksum.h +++ b/lib/net/rte_cksum.h @@ -42,15 +42,11 @@ extern "C" { static 
inline uint32_t __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) { - const void *end; - - for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); - buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { - uint16_t v; - - memcpy(&v, buf, sizeof(uint16_t)); - sum += v; - } + /* Process uint16 chunks to preserve overflow/carry math. GCC/Clang vectorize the loop. */ + const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf; + const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16)); + for (; buf16 != end; buf16++) + sum += *buf16; /* if length is odd, keeping it byte order independent */ if (unlikely(len % 2)) { -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v15 0/2] net: optimize __rte_raw_cksum 2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2026-01-17 21:21 ` [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1 2026-01-17 21:21 ` [PATCH v15 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 @ 2026-01-17 22:08 ` Stephen Hemminger 2026-01-20 12:45 ` Morten Brørup 2026-01-23 16:02 ` [PATCH v16 " scott.k.mitch1 3 siblings, 1 reply; 39+ messages in thread From: Stephen Hemminger @ 2026-01-17 22:08 UTC (permalink / raw) To: scott.k.mitch1; +Cc: dev, mb On Sat, 17 Jan 2026 13:21:12 -0800 scott.k.mitch1@gmail.com wrote: > From: Scott Mitchell <scott.k.mitch1@gmail.com> > > This series optimizes __rte_raw_cksum by replacing memcpy with direct > pointer access, enabling compiler vectorization on both GCC and Clang. > > Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC > strict-aliasing bug where struct initialization is incorrectly elided. > > Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum > to enable compiler optimizations while maintaining correctness across > all architectures (including strict-alignment platforms). > > Performance results show significant improvements (40% for small buffers, > up to 8x for larger buffers) on Intel Xeon with Clang 18.1. 
> > Changes in v15: > - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST > > Changes in v14: > - Split into two patches: EAL typedef fix and checksum optimization > - Use unaligned_uint16_t directly instead of wrapper struct > - Added __rte_may_alias to unaligned typedefs to prevent GCC bug > > Scott Mitchell (2): > eal: add __rte_may_alias to unaligned typedefs > net: __rte_raw_cksum pointers enable compiler optimizations > > app/test/meson.build | 1 + > app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++ > app/test/test_cksum_perf.c | 2 +- > lib/eal/include/rte_common.h | 34 ++--- > lib/net/rte_cksum.h | 14 +- > 5 files changed, 266 insertions(+), 25 deletions(-) > create mode 100644 app/test/test_cksum_fuzz.c > > -- > 2.39.5 (Apple Git-154) > Looks good now. Acked-by: Stephen Hemminger <stephen@networkplumber.org> AI review agrees with me... ## Patch Review: [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs ### Commit Message Analysis | Criterion | Status | Notes | |-----------|--------|-------| | Subject ≤60 chars | ✅ Pass | 47 characters | | Lowercase after colon | ✅ Pass | "add __rte_may_alias..." | | Imperative mood | ✅ Pass | "add" | | No trailing period | ✅ Pass | | | Correct prefix | ✅ Pass | "eal:" for lib/eal/ files | | Body ≤75 chars/line | ✅ Pass | Lines appear within limit | | Body doesn't start with "It" | ✅ Pass | Starts with "Add" | | Signed-off-by present | ✅ Pass | `Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com>` | ### Missing Tags (Warning) **No `Fixes:` tag**: The commit message describes fixing "GCC strict-aliasing optimization bugs" and "incorrect optimization." This sounds like a bug fix that should reference the original commit introducing the unaligned typedefs. 
Consider adding: ``` Fixes: <12-char-sha> ("original commit introducing unaligned typedefs") ``` **No `Cc: stable@dpdk.org`**: If this fixes a real bug causing reads from uninitialized memory, it's likely a stable backport candidate. ### Code Review **Positive aspects:** - Proper Doxygen comment added for `__rte_may_alias` macro - Good explanation of the GCC bug workaround - MSVC fallback handled correctly - Macro moved before its use (necessary for the typedefs) **Minor observations:** - The second comment block (lines 121-124 in the diff) is somewhat redundant with the first Doxygen comment. Consider consolidating. --- ## Patch Review: [PATCH v15 2/2] net: use unaligned type for raw checksum ### Commit Message Analysis The mbox was truncated, but based on what's visible: | Criterion | Status | Notes | |-----------|--------|-------| | Correct prefix | ✅ Pass | "net:" for lib/net/ files | ### Code Review - lib/net/rte_cksum.h **The core change:** ```c // OLD (memcpy-based): for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { uint16_t v; memcpy(&v, buf, sizeof(uint16_t)); sum += v; } // NEW (direct access via unaligned type): const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf; const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16)); for (; buf16 != end; buf16++) sum += *buf16; ``` **Positive aspects:** - Cleaner, more readable code - Relies on the `__rte_may_alias` attribute from patch 1 to prevent aliasing bugs - Comment explains vectorization benefit: "GCC/Clang vectorize the loop" - Good dependency ordering (patch 1 must come before patch 2) **Style observations:** - ✅ Line length within 100 chars - ✅ Proper use of `const` ### Code Review - app/test/test_cksum_fuzz.c (New File) **Positive aspects:** - ✅ Uses `TEST_SUCCESS`/`TEST_FAILED` correctly - ✅ Uses `REGISTER_FAST_TEST` macro properly - ✅ `printf()` usage is acceptable in test code per AGENTS.md - ✅ `rte_malloc()` 
usage acceptable in test code - ✅ Comprehensive edge case testing (power-of-2 boundaries, MTU sizes, GRO boundaries) - ✅ Tests both aligned and unaligned cases - ✅ Tests with zero and random initial sums **Issues to verify** (file header not visible in truncated mbox): - Ensure SPDX license identifier present on first line - Ensure copyright line follows SPDX - Ensure blank line before includes **Style warning (lines 394-396):** ```c printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n", len, aligned ? "aligned" : "unaligned", initial_sum, sum_ref, sum_opt); ``` Line length appears to be ~95 chars which is acceptable (<100). ### Code Review - app/test/test_cksum_perf.c Minor change extending test coverage - looks fine. --- ## Summary ### Errors (Must Fix) None identified. ### Warnings (Should Fix) | Issue | Patch | Recommendation | |-------|-------|----------------| | Missing `Fixes:` tag | 1/2 | Add if this fixes a regression from a specific commit | | Missing `Cc: stable@dpdk.org` | 1/2 | Consider if this should be backported | | Verify SPDX header | 2/2 | Ensure test_cksum_fuzz.c has proper license header | ### Info (Consider) 1. **Patch 1**: The two comment blocks for `__rte_may_alias` could be consolidated into a single, more comprehensive Doxygen comment. 2. **Patch 2**: The new fuzz test is well-structured and follows DPDK test conventions. Good use of the `unit_test_suite_runner`-style approach with `REGISTER_FAST_TEST`. 3. **Series overall**: Good logical ordering - patch 1 provides the infrastructure, patch 2 uses it. Each commit should compile independently. --- **Verdict**: This is a well-structured patch series at v15. The code changes are clean and the test coverage is thorough. The main actionable items are adding appropriate `Fixes:` and `Cc: stable` tags if this is indeed a bug fix worth backporting. ^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v15 0/2] net: optimize __rte_raw_cksum 2026-01-17 22:08 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum Stephen Hemminger @ 2026-01-20 12:45 ` Morten Brørup 2026-01-23 15:43 ` Scott Mitchell 0 siblings, 1 reply; 39+ messages in thread From: Morten Brørup @ 2026-01-20 12:45 UTC (permalink / raw) To: scott.k.mitch1; +Cc: dev, Stephen Hemminger > > From: Scott Mitchell <scott.k.mitch1@gmail.com> > > > > This series optimizes __rte_raw_cksum by replacing memcpy with direct > > pointer access, enabling compiler vectorization on both GCC and > Clang. > > > > Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC > > strict-aliasing bug where struct initialization is incorrectly > elided. > > > > Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum > > to enable compiler optimizations while maintaining correctness across > > all architectures (including strict-alignment platforms). > > > > Performance results show significant improvements (40% for small > buffers, > > up to 8x for larger buffers) on Intel Xeon with Clang 18.1. > > > > Changes in v15: > > - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST > > > > Changes in v14: > > - Split into two patches: EAL typedef fix and checksum optimization > > - Use unaligned_uint16_t directly instead of wrapper struct > > - Added __rte_may_alias to unaligned typedefs to prevent GCC bug > > > > Scott Mitchell (2): > > eal: add __rte_may_alias to unaligned typedefs > > net: __rte_raw_cksum pointers enable compiler optimizations > > > > app/test/meson.build | 1 + > > app/test/test_cksum_fuzz.c | 240 > +++++++++++++++++++++++++++++++++++ > > app/test/test_cksum_perf.c | 2 +- > > lib/eal/include/rte_common.h | 34 ++--- > > lib/net/rte_cksum.h | 14 +- > > 5 files changed, 266 insertions(+), 25 deletions(-) > > create mode 100644 app/test/test_cksum_fuzz.c > > > > -- > > 2.39.5 (Apple Git-154) > > > > Looks good now. > Acked-by: Stephen Hemminger <stephen@networkplumber.org> LGTM too. 
Acked-by: Morten Brørup <mb@smartsharesystems.com> Thank you for the effort and prompt reaction to feedback, Scott. It has been a pleasure reviewing this series! ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v15 0/2] net: optimize __rte_raw_cksum 2026-01-20 12:45 ` Morten Brørup @ 2026-01-23 15:43 ` Scott Mitchell 0 siblings, 0 replies; 39+ messages in thread From: Scott Mitchell @ 2026-01-23 15:43 UTC (permalink / raw) To: Morten Brørup; +Cc: dev, Stephen Hemminger Awesome! Thanks Morten & Stephen for the review and constructive feedback, leading to a better result in the end! I will submit a v16 with 1/2 including "Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types")" as requested. ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v16 0/2] net: optimize __rte_raw_cksum 2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 ` (2 preceding siblings ...) 2026-01-17 22:08 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum Stephen Hemminger @ 2026-01-23 16:02 ` scott.k.mitch1 2026-01-23 16:02 ` [PATCH v16 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1 ` (3 more replies) 3 siblings, 4 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-23 16:02 UTC (permalink / raw) To: dev; +Cc: mb, stephen, Scott From: Scott <scott_mitchell@apple.com> This series optimizes __rte_raw_cksum by replacing memcpy with direct pointer access, enabling compiler vectorization on both GCC and Clang. Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC strict-aliasing bug where struct initialization is incorrectly elided. Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum to enable compiler optimizations while maintaining correctness across all architectures (including strict-alignment platforms). Performance results show significant improvements (40% for small buffers, up to 8x for larger buffers) on Intel Xeon with Clang 18.1. 
Changes in v16: - Add Fixes tag and Cc stable/author for backporting (patch 1) Changes in v15: - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST Changes in v14: - Split into two patches: EAL typedef fix and checksum optimization - Use unaligned_uint16_t directly instead of wrapper struct - Added __rte_may_alias to unaligned typedefs to prevent GCC bug Scott Mitchell (2): eal: add __rte_may_alias to unaligned typedefs net: __rte_raw_cksum pointers enable compiler optimizations app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- lib/eal/include/rte_common.h | 34 ++--- lib/net/rte_cksum.h | 14 +- 5 files changed, 266 insertions(+), 25 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c -- 2.39.5 (Apple Git-154) ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v16 1/2] eal: add __rte_may_alias to unaligned typedefs 2026-01-23 16:02 ` [PATCH v16 " scott.k.mitch1 @ 2026-01-23 16:02 ` scott.k.mitch1 2026-01-23 16:02 ` [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 ` (2 subsequent siblings) 3 siblings, 0 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-23 16:02 UTC (permalink / raw) To: dev; +Cc: mb, stephen, Scott Mitchell, Cyril Chemparathy, stable From: Scott Mitchell <scott.k.mitch1@gmail.com> Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs to prevent GCC strict-aliasing optimization bugs. GCC has a bug where it incorrectly elides struct initialization when strict aliasing is enabled, causing reads from uninitialized memory. The __rte_may_alias attribute signals to the compiler that these types can alias other types, preventing the incorrect optimization. Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types") Cc: Cyril Chemparathy <cchemparathy@ezchip.com> Cc: stable@dpdk.org Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- lib/eal/include/rte_common.h | 34 +++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 573bf4f2ce..8a9623ea74 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -121,14 +121,27 @@ extern "C" { #define __rte_aligned(a) __attribute__((__aligned__(a))) #endif +/** + * Macro to mark a type that is not subject to type-based aliasing rules + */ +#ifdef RTE_TOOLCHAIN_MSVC +#define __rte_may_alias +#else +#define __rte_may_alias __attribute__((__may_alias__)) +#endif + +/** + * __rte_may_alias avoids compiler bugs (GCC) that elide initialization + * of memory when strict-aliasing is enabled. 
+ */ #ifdef RTE_ARCH_STRICT_ALIGN -typedef uint64_t unaligned_uint64_t __rte_aligned(1); -typedef uint32_t unaligned_uint32_t __rte_aligned(1); -typedef uint16_t unaligned_uint16_t __rte_aligned(1); +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1); +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1); +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1); #else -typedef uint64_t unaligned_uint64_t; -typedef uint32_t unaligned_uint32_t; -typedef uint16_t unaligned_uint16_t; +typedef uint64_t unaligned_uint64_t __rte_may_alias; +typedef uint32_t unaligned_uint32_t __rte_may_alias; +typedef uint16_t unaligned_uint16_t __rte_may_alias; #endif /** @@ -159,15 +172,6 @@ typedef uint16_t unaligned_uint16_t; #define __rte_packed_end __attribute__((__packed__)) #endif -/** - * Macro to mark a type that is not subject to type-based aliasing rules - */ -#ifdef RTE_TOOLCHAIN_MSVC -#define __rte_may_alias -#else -#define __rte_may_alias __attribute__((__may_alias__)) -#endif - /******* Macro to mark functions and fields scheduled for removal *****/ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_deprecated -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-01-23 16:02 ` [PATCH v16 " scott.k.mitch1 2026-01-23 16:02 ` [PATCH v16 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1 @ 2026-01-23 16:02 ` scott.k.mitch1 2026-01-28 11:05 ` David Marchand 2026-01-24 8:23 ` [PATCH v16 0/2] net: optimize __rte_raw_cksum Morten Brørup 2026-01-28 18:05 ` [PATCH v17 " scott.k.mitch1 3 siblings, 1 reply; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-23 16:02 UTC (permalink / raw) To: dev; +Cc: mb, stephen, Scott Mitchell From: Scott Mitchell <scott.k.mitch1@gmail.com> __rte_raw_cksum uses a loop with memcpy on each iteration. GCC 15+ is able to vectorize the loop but Clang 18.1 is not. Replace memcpy with direct pointer access using unaligned_uint16_t. This enables both GCC and Clang to vectorize the loop while handling unaligned access safely on all architectures. Performance results from cksum_perf_autotest on Intel Xeon (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte): Block size Before After Improvement 100 0.40 0.24 ~40% 1500 0.50 0.06 ~8x 9000 0.49 0.06 ~8x Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 240 +++++++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- lib/net/rte_cksum.h | 14 +-- 4 files changed, 247 insertions(+), 10 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c diff --git a/app/test/meson.build b/app/test/meson.build index f4d04a6e42..2ca17716b9 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -38,6 +38,7 @@ source_file_deps = { 'test_byteorder.c': [], 'test_cfgfile.c': ['cfgfile'], 'test_cksum.c': ['net'], + 'test_cksum_fuzz.c': ['net'], 'test_cksum_perf.c': ['net'], 'test_cmdline.c': [], 'test_cmdline_cirbuf.c': [], diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c new file mode 100644 index 0000000000..3df11e3dc2 --- /dev/null +++ b/app/test/test_cksum_fuzz.c 
@@ -0,0 +1,240 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Apple Inc. + */ + +#include <stdio.h> +#include <string.h> + +#include <rte_common.h> +#include <rte_cycles.h> +#include <rte_hexdump.h> +#include <rte_cksum.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include "test.h" + +/* + * Fuzz test for __rte_raw_cksum optimization. + * Compares the optimized implementation against the original reference + * implementation across random data of various lengths. + */ + +#define DEFAULT_ITERATIONS 1000 +#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */ + +/* + * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11. + * This is retained here for comparison testing against the optimized version. + */ +static inline uint32_t +__rte_raw_cksum_reference(const void *buf, size_t len, uint32_t sum) +{ + const void *end; + + for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { + uint16_t v; + + memcpy(&v, buf, sizeof(uint16_t)); + sum += v; + } + + /* if length is odd, keeping it byte order independent */ + if (unlikely(len % 2)) { + uint16_t left = 0; + + memcpy(&left, end, 1); + sum += left; + } + + return sum; +} + +static void +init_random_buffer(uint8_t *buf, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) + buf[i] = (uint8_t)rte_rand(); +} + +static inline uint32_t +get_initial_sum(bool random_initial_sum) +{ + return random_initial_sum ? 
(rte_rand() & 0xFFFFFFFF) : 0; +} + +/* + * Test a single buffer length with specific alignment and initial sum + */ +static int +test_cksum_fuzz_length_aligned(size_t len, bool aligned, uint32_t initial_sum) +{ + uint8_t *data; + uint8_t *buf; + size_t alloc_size; + uint32_t sum_ref, sum_opt; + + if (len == 0 && !aligned) { + /* Skip unaligned test for zero length - nothing to test */ + return TEST_SUCCESS; + } + + /* Allocate exact size for aligned, +1 for unaligned offset */ + alloc_size = aligned ? len : len + 1; + if (alloc_size == 0) + alloc_size = 1; /* rte_malloc doesn't like 0 */ + + data = rte_malloc(NULL, alloc_size, 64); + if (data == NULL) { + printf("Failed to allocate %zu bytes\n", alloc_size); + return TEST_FAILED; + } + + buf = aligned ? data : (data + 1); + + init_random_buffer(buf, len); + + sum_ref = __rte_raw_cksum_reference(buf, len, initial_sum); + sum_opt = __rte_raw_cksum(buf, len, initial_sum); + + if (sum_ref != sum_opt) { + printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n", + len, aligned ? 
"aligned" : "unaligned", + initial_sum, sum_ref, sum_opt); + rte_hexdump(stdout, "failing buffer", buf, len); + rte_free(data); + return TEST_FAILED; + } + + rte_free(data); + return TEST_SUCCESS; +} + +/* + * Test a length with both alignments + */ +static int +test_cksum_fuzz_length(size_t len, uint32_t initial_sum) +{ + int rc; + + /* Test aligned */ + rc = test_cksum_fuzz_length_aligned(len, true, initial_sum); + if (rc != TEST_SUCCESS) + return rc; + + /* Test unaligned */ + rc = test_cksum_fuzz_length_aligned(len, false, initial_sum); + + return rc; +} + +/* + * Test specific edge case lengths + */ +static int +test_cksum_fuzz_edge_cases(void) +{ + /* Edge case lengths that might trigger bugs */ + static const size_t edge_lengths[] = { + 0, 1, 2, 3, 4, 5, 6, 7, 8, + 15, 16, 17, + 31, 32, 33, + 63, 64, 65, + 127, 128, 129, + 255, 256, 257, + 511, 512, 513, + 1023, 1024, 1025, + 1500, 1501, /* MTU boundaries */ + 2047, 2048, 2049, + 4095, 4096, 4097, + 8191, 8192, 8193, + 16383, 16384, 16385, + 32767, 32768, 32769, + 65534, 65535, 65536 /* 64K GRO boundaries */ + }; + unsigned int i; + int rc; + + printf("Testing edge case lengths...\n"); + + for (i = 0; i < RTE_DIM(edge_lengths); i++) { + /* Test with zero initial sum */ + rc = test_cksum_fuzz_length(edge_lengths[i], 0); + if (rc != TEST_SUCCESS) + return rc; + + /* Test with random initial sum */ + rc = test_cksum_fuzz_length(edge_lengths[i], get_initial_sum(true)); + if (rc != TEST_SUCCESS) + return rc; + } + + return TEST_SUCCESS; +} + +/* + * Test random lengths with optional random initial sums + */ +static int +test_cksum_fuzz_random(unsigned int iterations, bool random_initial_sum) +{ + unsigned int i; + int rc; + + printf("Testing random lengths (0-%d)%s...\n", MAX_TEST_LEN, + random_initial_sum ? 
" with random initial sums" : ""); + + for (i = 0; i < iterations; i++) { + size_t len = rte_rand() % (MAX_TEST_LEN + 1); + + rc = test_cksum_fuzz_length(len, get_initial_sum(random_initial_sum)); + if (rc != TEST_SUCCESS) { + printf("Failed at len=%zu\n", len); + return rc; + } + } + + return TEST_SUCCESS; +} + +static int +test_cksum_fuzz(void) +{ + int rc; + unsigned int iterations = DEFAULT_ITERATIONS; + printf("### __rte_raw_cksum optimization fuzz test ###\n"); + printf("Iterations per test: %u\n\n", iterations); + + /* Test edge cases */ + rc = test_cksum_fuzz_edge_cases(); + if (rc != TEST_SUCCESS) { + printf("Edge case test FAILED\n"); + return rc; + } + printf("Edge case test PASSED\n\n"); + + /* Test random lengths with zero initial sum */ + rc = test_cksum_fuzz_random(iterations, false); + if (rc != TEST_SUCCESS) { + printf("Random length test FAILED\n"); + return rc; + } + printf("Random length test PASSED\n\n"); + + /* Test random lengths with random initial sums */ + rc = test_cksum_fuzz_random(iterations, true); + if (rc != TEST_SUCCESS) { + printf("Random initial sum test FAILED\n"); + return rc; + } + printf("Random initial sum test PASSED\n\n"); + + printf("All fuzz tests PASSED!\n"); + return TEST_SUCCESS; +} + +REGISTER_FAST_TEST(cksum_fuzz_autotest, NOHUGE_OK, ASAN_OK, test_cksum_fuzz); diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c index 0b919cd59f..6b1d4589e0 100644 --- a/app/test/test_cksum_perf.c +++ b/app/test/test_cksum_perf.c @@ -15,7 +15,7 @@ #define NUM_BLOCKS 10 #define ITERATIONS 1000000 -static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 }; +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501, 9000, 9001, 65536, 65537 }; static __rte_noinline uint16_t do_rte_raw_cksum(const void *buf, size_t len) diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h index a8e8927952..f04b46a6c3 100644 --- a/lib/net/rte_cksum.h +++ b/lib/net/rte_cksum.h @@ -42,15 +42,11 @@ extern "C" { static 
inline uint32_t __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) { - const void *end; - - for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); - buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { - uint16_t v; - - memcpy(&v, buf, sizeof(uint16_t)); - sum += v; - } + /* Process uint16 chunks to preserve overflow/carry math. GCC/Clang vectorize the loop. */ + const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf; + const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16)); + for (; buf16 != end; buf16++) + sum += *buf16; /* if length is odd, keeping it byte order independent */ if (unlikely(len % 2)) { -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-01-23 16:02 ` [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 @ 2026-01-28 11:05 ` David Marchand 2026-01-28 17:39 ` Scott Mitchell 0 siblings, 1 reply; 39+ messages in thread From: David Marchand @ 2026-01-28 11:05 UTC (permalink / raw) To: scott.k.mitch1; +Cc: dev, mb, stephen Hello Scott, On Fri, 23 Jan 2026 at 17:03, <scott.k.mitch1@gmail.com> wrote: > > From: Scott Mitchell <scott.k.mitch1@gmail.com> > > __rte_raw_cksum uses a loop with memcpy on each iteration. > GCC 15+ is able to vectorize the loop but Clang 18.1 is not. > > Replace memcpy with direct pointer access using unaligned_uint16_t. > This enables both GCC and Clang to vectorize the loop while handling > unaligned access safely on all architectures. > > Performance results from cksum_perf_autotest on Intel Xeon > (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte): > > Block size Before After Improvement > 100 0.40 0.24 ~40% > 1500 0.50 0.06 ~8x > 9000 0.49 0.06 ~8x > > Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> Unfortunately, clang 14 (Ubuntu 22.04) is complaining about unaligned access in the new test. Could you have a look? 
RTE>>cksum_fuzz_autotest
../lib/net/rte_cksum.h:49:10: runtime error: load of misaligned address 0x0001816c2e81 for type 'const unaligned_uint16_t' (aka 'const unsigned short'), which requires 2 byte alignment
0x0001816c2e81: note: pointer points here
 00 00 00 00 70 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ^
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../lib/net/rte_cksum.h:49:10 in
The whole backtrace is as follows:
RTE>>cksum_fuzz_autotest
../lib/net/rte_cksum.h:49:10: runtime error: load of misaligned address 0x0001816c2e81 for type 'const unaligned_uint16_t' (aka 'const unsigned short'), which requires 2 byte alignment
0x0001816c2e81: note: pointer points here
 00 00 00 00 0e ce 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ^
 #0 0x55a725ec25e7 in __rte_raw_cksum test_cksum_fuzz.c
 #1 0x55a725ec21ce in test_cksum_fuzz_length_aligned test_cksum_fuzz.c
 #2 0x55a725ec1f65 in test_cksum_fuzz_length test_cksum_fuzz.c
 #3 0x55a725ec1c8f in test_cksum_fuzz_edge_cases test_cksum_fuzz.c
 #4 0x55a725ec1ab2 in test_cksum_fuzz test_cksum_fuzz.c
 #5 0x55a725ceece9 in cmd_autotest_parsed commands.c
 #6 0x7fdb96d7e668 in __cmdline_parse cmdline_parse.c
 #7 0x7fdb96d7dcb1 in cmdline_parse (/home/runner/work/dpdk/dpdk/build/app/../lib/librte_cmdline.so.26+0x1bcb1) (BuildId: bcf9387da4939ba68c89cec1938166c878fca318)
 #8 0x7fdb96d74b69 in cmdline_valid_buffer cmdline.c
 #9 0x7fdb96d8b9c3 in rdline_char_in (/home/runner/work/dpdk/dpdk/build/app/../lib/librte_cmdline.so.26+0x299c3) (BuildId: bcf9387da4939ba68c89cec1938166c878fca318)
 #10 0x7fdb96d752d3 in cmdline_in (/home/runner/work/dpdk/dpdk/build/app/../lib/librte_cmdline.so.26+0x132d3) (BuildId: bcf9387da4939ba68c89cec1938166c878fca318)
 #11 0x55a725cf0f0b in main (/home/runner/work/dpdk/dpdk/build/app/dpdk-test+0x4ddf0b) (BuildId: 5905b821f00329f9c5b95c7064ea051d7aacac48)
 #12 0x7fdb94629d8f in __libc_start_call_main
csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #13 0x7fdb94629e3f in __libc_start_main csu/../csu/libc-start.c:392:3 #14 0x55a725cc5ed4 in _start (/home/runner/work/dpdk/dpdk/build/app/dpdk-test+0x4b2ed4) (BuildId: 5905b821f00329f9c5b95c7064ea051d7aacac48) [snip] > diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c > new file mode 100644 > index 0000000000..3df11e3dc2 > --- /dev/null > +++ b/app/test/test_cksum_fuzz.c > @@ -0,0 +1,240 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2026 Apple Inc. > + */ > + > +#include <stdio.h> > +#include <string.h> > + > +#include <rte_common.h> > +#include <rte_cycles.h> > +#include <rte_hexdump.h> > +#include <rte_cksum.h> > +#include <rte_malloc.h> > +#include <rte_random.h> > + > +#include "test.h" > + > +/* > + * Fuzz test for __rte_raw_cksum optimization. > + * Compares the optimized implementation against the original reference > + * implementation across random data of various lengths. > + */ > + > +#define DEFAULT_ITERATIONS 1000 > +#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */ > + > +/* > + * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11. > + * This is retained here for comparison testing against the optimized version. > + */ > +static inline uint32_t > +__rte_raw_cksum_reference(const void *buf, size_t len, uint32_t sum) > +{ Just a nit, I prefer we don't declare test functions with the same prefix as a public dpdk API. It is confusing when reading the test code. -- David Marchand ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-01-28 11:05 ` David Marchand @ 2026-01-28 17:39 ` Scott Mitchell 0 siblings, 0 replies; 39+ messages in thread From: Scott Mitchell @ 2026-01-28 17:39 UTC (permalink / raw) To: David Marchand; +Cc: dev, mb, stephen > Unfortunately, clang 14 (Ubuntu 22.04) is complaining about unaligned > access in the new test. > Could you have a look? Yes, thx for flagging this. I think the unaligned types need both `__rte_may_alias __rte_aligned(1)` unconditionally and will submit a v17. I verified the asm on clang/gcc on x86 (-mavx512cd) and armv8 (-msve-vector-bits=512) are identical when adding `__rte_aligned(1)`: https://godbolt.org/z/fdYPdoTa5 ^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v16 0/2] net: optimize __rte_raw_cksum 2026-01-23 16:02 ` [PATCH v16 " scott.k.mitch1 2026-01-23 16:02 ` [PATCH v16 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1 2026-01-23 16:02 ` [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 @ 2026-01-24 8:23 ` Morten Brørup 2026-01-28 18:05 ` [PATCH v17 " scott.k.mitch1 3 siblings, 0 replies; 39+ messages in thread From: Morten Brørup @ 2026-01-24 8:23 UTC (permalink / raw) To: scott.k.mitch1, dev; +Cc: stephen, Scott > From: Scott <scott_mitchell@apple.com> > > This series optimizes __rte_raw_cksum by replacing memcpy with direct > pointer access, enabling compiler vectorization on both GCC and Clang. > > Patch 1 adds __rte_may_alias to unaligned typedefs to prevent a GCC > strict-aliasing bug where struct initialization is incorrectly elided. > > Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum > to enable compiler optimizations while maintaining correctness across > all architectures (including strict-alignment platforms). > > Performance results show significant improvements (40% for small > buffers, > up to 8x for larger buffers) on Intel Xeon with Clang 18.1. It's usually allowed to carry forward ACKs from previous versions. With major changes between versions, the author should consider if previous ACKs can remain or not. Carrying forward from v15 of the series, Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v17 0/2] net: optimize __rte_raw_cksum 2026-01-23 16:02 ` [PATCH v16 " scott.k.mitch1 ` (2 preceding siblings ...) 2026-01-24 8:23 ` [PATCH v16 0/2] net: optimize __rte_raw_cksum Morten Brørup @ 2026-01-28 18:05 ` scott.k.mitch1 2026-01-28 18:05 ` [PATCH v17 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 ` (2 more replies) 3 siblings, 3 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-28 18:05 UTC (permalink / raw) To: dev; +Cc: mb, stephen, bruce.richardson, david.marchand, Scott From: Scott <scott.k.mitch1@gmail.com> This series optimizes __rte_raw_cksum by replacing memcpy with direct pointer access, enabling compiler vectorization on both GCC and Clang. Patch 1 adds __rte_may_alias and __rte_aligned(1) to unaligned typedefs to prevent a GCC strict-aliasing bug where struct initialization is incorrectly elided, and avoid UB by clarifying access can be from any address. Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum to enable compiler optimizations while maintaining correctness across all architectures (including strict-alignment platforms). Performance results show significant improvements (40% for small buffers, up to 8x for larger buffers) on Intel Xeon with Clang 18.1. 
Changes in v17: - Use __rte_aligned(1) unconditionally on unaligned type aliases - test_cksum_fuzz uses unit_test_suite_runner - test_cksum_fuzz reference method rename to test_cksum_fuzz_cksum_reference Changes in v16: - Add Fixes tag and Cc stable/author for backporting (patch 1) Changes in v15: - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST Changes in v14: - Split into two patches: EAL typedef fix and checksum optimization - Use unaligned_uint16_t directly instead of wrapper struct - Added __rte_may_alias to unaligned typedefs to prevent GCC bug Scott Mitchell (2): eal: add __rte_may_alias and __rte_aligned to unaligned typedefs net: __rte_raw_cksum pointers enable compiler optimizations app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- lib/eal/include/rte_common.h | 39 +++--- lib/net/rte_cksum.h | 14 +-- 5 files changed, 264 insertions(+), 26 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c -- 2.39.5 (Apple Git-154) ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v17 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-01-28 18:05 ` [PATCH v17 " scott.k.mitch1 @ 2026-01-28 18:05 ` scott.k.mitch1 2026-01-28 18:05 ` [PATCH v17 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 2026-01-28 19:41 ` [PATCH v18 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2 siblings, 0 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-28 18:05 UTC (permalink / raw) To: dev Cc: mb, stephen, bruce.richardson, david.marchand, Scott Mitchell, Cyril Chemparathy, stable From: Scott Mitchell <scott.k.mitch1@gmail.com> Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs to prevent GCC strict-aliasing optimization bugs. GCC has a bug where it incorrectly elides struct initialization when strict aliasing is enabled, causing reads from uninitialized memory. Add __rte_aligned(1) attribute to unaligned_uint{16,32,64}_t typedefs, which allows safe access at any alignment. Without it, accessing a uint16_t at an odd address is undefined behavior, which UBSan reports as `UndefinedBehaviorSanitizer: undefined-behavior`. 
Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types") Cc: Cyril Chemparathy <cchemparathy@ezchip.com> Cc: stable@dpdk.org Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- lib/eal/include/rte_common.h | 39 +++++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 573bf4f2ce..15d379619a 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -121,16 +121,32 @@ extern "C" { #define __rte_aligned(a) __attribute__((__aligned__(a))) #endif -#ifdef RTE_ARCH_STRICT_ALIGN -typedef uint64_t unaligned_uint64_t __rte_aligned(1); -typedef uint32_t unaligned_uint32_t __rte_aligned(1); -typedef uint16_t unaligned_uint16_t __rte_aligned(1); +/** + * Macro to mark a type that is not subject to type-based aliasing rules + */ +#ifdef RTE_TOOLCHAIN_MSVC +#define __rte_may_alias #else -typedef uint64_t unaligned_uint64_t; -typedef uint32_t unaligned_uint32_t; -typedef uint16_t unaligned_uint16_t; +#define __rte_may_alias __attribute__((__may_alias__)) #endif +/** + * Types for potentially unaligned access. + * + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, allowing + * these types to safely access memory at any address. + * Without this, accessing a uint16_t at an odd address + * is undefined behavior (even on x86 where hardware + * handles it). + * + * __rte_may_alias - Prevents strict-aliasing optimization bugs where + * compilers may incorrectly elide memory operations + * when casting between pointer types. 
+ */ +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1); +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1); +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1); + /** * @deprecated * @see __rte_packed_begin @@ -159,15 +175,6 @@ typedef uint16_t unaligned_uint16_t; #define __rte_packed_end __attribute__((__packed__)) #endif -/** - * Macro to mark a type that is not subject to type-based aliasing rules - */ -#ifdef RTE_TOOLCHAIN_MSVC -#define __rte_may_alias -#else -#define __rte_may_alias __attribute__((__may_alias__)) -#endif - /******* Macro to mark functions and fields scheduled for removal *****/ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_deprecated -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v17 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-01-28 18:05 ` [PATCH v17 " scott.k.mitch1 2026-01-28 18:05 ` [PATCH v17 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 @ 2026-01-28 18:05 ` scott.k.mitch1 2026-01-28 19:41 ` [PATCH v18 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2 siblings, 0 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-28 18:05 UTC (permalink / raw) To: dev; +Cc: mb, stephen, bruce.richardson, david.marchand, Scott Mitchell From: Scott Mitchell <scott.k.mitch1@gmail.com> __rte_raw_cksum uses a loop with memcpy on each iteration. GCC 15+ is able to vectorize the loop but Clang 18.1 is not. Replace memcpy with direct pointer access using unaligned_uint16_t. This enables both GCC and Clang to vectorize the loop while handling unaligned access safely on all architectures. Performance results from cksum_perf_autotest on Intel Xeon (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte): Block size Before After Improvement 100 0.40 0.24 ~40% 1500 0.50 0.06 ~8x 9000 0.49 0.06 ~8x Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- lib/net/rte_cksum.h | 14 +-- 4 files changed, 241 insertions(+), 10 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c diff --git a/app/test/meson.build b/app/test/meson.build index f4d04a6e42..2ca17716b9 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -38,6 +38,7 @@ source_file_deps = { 'test_byteorder.c': [], 'test_cfgfile.c': ['cfgfile'], 'test_cksum.c': ['net'], + 'test_cksum_fuzz.c': ['net'], 'test_cksum_perf.c': ['net'], 'test_cmdline.c': [], 'test_cmdline_cirbuf.c': [], diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c new file mode 100644 index 0000000000..33b4c77f51 --- /dev/null +++ b/app/test/test_cksum_fuzz.c @@ -0,0 +1,234 @@ +/* 
SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Apple Inc. + */ + +#include <stdio.h> +#include <string.h> + +#include <rte_common.h> +#include <rte_cycles.h> +#include <rte_hexdump.h> +#include <rte_cksum.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include "test.h" + +/* + * Fuzz test for __rte_raw_cksum optimization. + * Compares the optimized implementation against the original reference + * implementation across random data of various lengths. + */ + +#define DEFAULT_ITERATIONS 1000 +#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */ + +/* + * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11. + * This is retained here for comparison testing against the optimized version. + */ +static inline uint32_t +test_cksum_fuzz_cksum_reference(const void *buf, size_t len, uint32_t sum) +{ + const void *end; + + for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { + uint16_t v; + + memcpy(&v, buf, sizeof(uint16_t)); + sum += v; + } + + /* if length is odd, keeping it byte order independent */ + if (unlikely(len % 2)) { + uint16_t left = 0; + + memcpy(&left, end, 1); + sum += left; + } + + return sum; +} + +static void +init_random_buffer(uint8_t *buf, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) + buf[i] = (uint8_t)rte_rand(); +} + +static inline uint32_t +get_initial_sum(bool random_initial_sum) +{ + return random_initial_sum ? (rte_rand() & 0xFFFFFFFF) : 0; +} + +/* + * Test a single buffer length with specific alignment and initial sum + */ +static int +test_cksum_fuzz_length_aligned(size_t len, bool aligned, uint32_t initial_sum) +{ + uint8_t *data; + uint8_t *buf; + size_t alloc_size; + uint32_t sum_ref, sum_opt; + + if (len == 0 && !aligned) { + /* Skip unaligned test for zero length - nothing to test */ + return TEST_SUCCESS; + } + + /* Allocate exact size for aligned, +1 for unaligned offset */ + alloc_size = aligned ? 
len : len + 1; + if (alloc_size == 0) + alloc_size = 1; /* rte_malloc doesn't like 0 */ + + data = rte_malloc(NULL, alloc_size, 64); + if (data == NULL) { + printf("Failed to allocate %zu bytes\n", alloc_size); + return TEST_FAILED; + } + + buf = aligned ? data : (data + 1); + + init_random_buffer(buf, len); + + sum_ref = test_cksum_fuzz_cksum_reference(buf, len, initial_sum); + sum_opt = __rte_raw_cksum(buf, len, initial_sum); + + if (sum_ref != sum_opt) { + printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n", + len, aligned ? "aligned" : "unaligned", + initial_sum, sum_ref, sum_opt); + rte_hexdump(stdout, "failing buffer", buf, len); + rte_free(data); + return TEST_FAILED; + } + + rte_free(data); + return TEST_SUCCESS; +} + +/* + * Test a length with both alignments + */ +static int +test_cksum_fuzz_length(size_t len, uint32_t initial_sum) +{ + int rc; + + /* Test aligned */ + rc = test_cksum_fuzz_length_aligned(len, true, initial_sum); + if (rc != TEST_SUCCESS) + return rc; + + /* Test unaligned */ + rc = test_cksum_fuzz_length_aligned(len, false, initial_sum); + + return rc; +} + +/* + * Test specific edge case lengths + */ +static int +test_cksum_fuzz_edge_cases(void) +{ + /* Edge case lengths that might trigger bugs */ + static const size_t edge_lengths[] = { + 0, 1, 2, 3, 4, 5, 6, 7, 8, + 15, 16, 17, + 31, 32, 33, + 63, 64, 65, + 127, 128, 129, + 255, 256, 257, + 511, 512, 513, + 1023, 1024, 1025, + 1500, 1501, /* MTU boundaries */ + 2047, 2048, 2049, + 4095, 4096, 4097, + 8191, 8192, 8193, + 16383, 16384, 16385, + 32767, 32768, 32769, + 65534, 65535, 65536 /* 64K GRO boundaries */ + }; + unsigned int i; + int rc; + + printf("Testing edge case lengths...\n"); + + for (i = 0; i < RTE_DIM(edge_lengths); i++) { + /* Test with zero initial sum */ + rc = test_cksum_fuzz_length(edge_lengths[i], 0); + if (rc != TEST_SUCCESS) + return rc; + + /* Test with random initial sum */ + rc = test_cksum_fuzz_length(edge_lengths[i], 
get_initial_sum(true)); + if (rc != TEST_SUCCESS) + return rc; + } + + return TEST_SUCCESS; +} + +/* + * Test random lengths with optional random initial sums + */ +static int +test_cksum_fuzz_random(unsigned int iterations, bool random_initial_sum) +{ + unsigned int i; + int rc; + + printf("Testing random lengths (0-%d)%s...\n", MAX_TEST_LEN, + random_initial_sum ? " with random initial sums" : ""); + + for (i = 0; i < iterations; i++) { + size_t len = rte_rand() % (MAX_TEST_LEN + 1); + + rc = test_cksum_fuzz_length(len, get_initial_sum(random_initial_sum)); + if (rc != TEST_SUCCESS) { + printf("Failed at len=%zu\n", len); + return rc; + } + } + + return TEST_SUCCESS; +} + +static int +test_cksum_fuzz_random_zero_sum(void) +{ + return test_cksum_fuzz_random(DEFAULT_ITERATIONS, false); +} + +static int +test_cksum_fuzz_random_random_sum(void) +{ + return test_cksum_fuzz_random(DEFAULT_ITERATIONS, true); +} + +static struct unit_test_suite ptr_cksum_fuzz_suite = { + .suite_name = "cksum fuzz autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { + TEST_CASE(test_cksum_fuzz_edge_cases), + TEST_CASE(test_cksum_fuzz_random_zero_sum), + TEST_CASE(test_cksum_fuzz_random_random_sum), + TEST_CASES_END() + } +}; + +static int +test_cksum_fuzz_suite(void) +{ + return unit_test_suite_runner(&ptr_cksum_fuzz_suite); +} + +REGISTER_FAST_TEST(cksum_fuzz_autotest, NOHUGE_OK, ASAN_OK, test_cksum_fuzz_suite); diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c index 0b919cd59f..6b1d4589e0 100644 --- a/app/test/test_cksum_perf.c +++ b/app/test/test_cksum_perf.c @@ -15,7 +15,7 @@ #define NUM_BLOCKS 10 #define ITERATIONS 1000000 -static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 }; +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501, 9000, 9001, 65536, 65537 }; static __rte_noinline uint16_t do_rte_raw_cksum(const void *buf, size_t len) diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h index a8e8927952..f04b46a6c3 
100644 --- a/lib/net/rte_cksum.h +++ b/lib/net/rte_cksum.h @@ -42,15 +42,11 @@ extern "C" { static inline uint32_t __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) { - const void *end; - - for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); - buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { - uint16_t v; - - memcpy(&v, buf, sizeof(uint16_t)); - sum += v; - } + /* Process uint16 chunks to preserve overflow/carry math. GCC/Clang vectorize the loop. */ + const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf; + const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16)); + for (; buf16 != end; buf16++) + sum += *buf16; /* if length is odd, keeping it byte order independent */ if (unlikely(len % 2)) { -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v18 0/2] net: optimize __rte_raw_cksum 2026-01-28 18:05 ` [PATCH v17 " scott.k.mitch1 2026-01-28 18:05 ` [PATCH v17 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 2026-01-28 18:05 ` [PATCH v17 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 @ 2026-01-28 19:41 ` scott.k.mitch1 2026-01-28 19:41 ` [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 ` (2 more replies) 2 siblings, 3 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-28 19:41 UTC (permalink / raw) To: dev; +Cc: mb, stephen, bruce.richardson, david.marchand, Scott From: Scott <scott.k.mitch1@gmail.com> This series optimizes __rte_raw_cksum by replacing memcpy with direct pointer access, enabling compiler vectorization on both GCC and Clang. Patch 1 adds __rte_may_alias and __rte_aligned(1) to unaligned typedefs to prevent a GCC strict-aliasing bug where struct initialization is incorrectly elided, and avoid UB by clarifying access can be from any address. Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum to enable compiler optimizations while maintaining correctness across all architectures (including strict-alignment platforms). Performance results show significant improvements (40% for small buffers, up to 8x for larger buffers) on Intel Xeon with Clang 18.1. 
Changes in v18: - Fix MSVC compile error __rte_aligned(1) must come before type - Fix test_hash_functions incorrect usage of unaligned_uint32_t Changes in v17: - Use __rte_aligned(1) unconditionally on unaligned type aliases - test_cksum_fuzz uses unit_test_suite_runner - test_cksum_fuzz reference method rename to test_cksum_fuzz_cksum_reference Changes in v16: - Add Fixes tag and Cc stable/author for backporting (patch 1) Changes in v15: - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST Changes in v14: - Split into two patches: EAL typedef fix and checksum optimization - Use unaligned_uint16_t directly instead of wrapper struct - Added __rte_may_alias to unaligned typedefs to prevent GCC bug Scott Mitchell (2): eal: add __rte_may_alias and __rte_aligned to unaligned typedefs net: __rte_raw_cksum pointers enable compiler optimizations app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- app/test/test_hash_functions.c | 2 +- lib/eal/include/rte_common.h | 45 ++++--- lib/net/rte_cksum.h | 14 +- 6 files changed, 271 insertions(+), 27 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c -- 2.39.5 (Apple Git-154) ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-01-28 19:41 ` [PATCH v18 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 @ 2026-01-28 19:41 ` scott.k.mitch1 2026-01-29 8:28 ` Morten Brørup 2026-01-28 19:41 ` [PATCH v18 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 2026-02-02 4:48 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2 siblings, 1 reply; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-28 19:41 UTC (permalink / raw) To: dev Cc: mb, stephen, bruce.richardson, david.marchand, Scott Mitchell, Cyril Chemparathy, stable From: Scott Mitchell <scott.k.mitch1@gmail.com> Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs to prevent GCC strict-aliasing optimization bugs. GCC has a bug where it incorrectly elides struct initialization when strict aliasing is enabled, causing reads from uninitialized memory. Add __rte_aligned(1) attribute to unaligned_uint{16,32,64}_t typedefs, which allows safe access at any alignment. Without it, accessing a uint16_t at an odd address is undefined behavior, which UBSan reports as `UndefinedBehaviorSanitizer: undefined-behavior`. 
Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types") Cc: Cyril Chemparathy <cchemparathy@ezchip.com> Cc: stable@dpdk.org Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- app/test/test_hash_functions.c | 2 +- lib/eal/include/rte_common.h | 45 ++++++++++++++++++++++------------ 2 files changed, 30 insertions(+), 17 deletions(-) diff --git a/app/test/test_hash_functions.c b/app/test/test_hash_functions.c index 70820d1f19..5b8b9c3e5d 100644 --- a/app/test/test_hash_functions.c +++ b/app/test/test_hash_functions.c @@ -199,7 +199,7 @@ verify_jhash_32bits(void) hash = rte_jhash(key, hashtest_key_lens[i], hashtest_initvals[j]); /* Divide key length by 4 in rte_jhash for 32 bits */ - hash32 = rte_jhash_32b((const unaligned_uint32_t *)key, + hash32 = rte_jhash_32b((const uint32_t *)key, hashtest_key_lens[i] >> 2, hashtest_initvals[j]); if (hash != hash32) { diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 573bf4f2ce..b10816d0d7 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -121,14 +121,36 @@ extern "C" { #define __rte_aligned(a) __attribute__((__aligned__(a))) #endif -#ifdef RTE_ARCH_STRICT_ALIGN -typedef uint64_t unaligned_uint64_t __rte_aligned(1); -typedef uint32_t unaligned_uint32_t __rte_aligned(1); -typedef uint16_t unaligned_uint16_t __rte_aligned(1); +/** + * Macro to mark a type that is not subject to type-based aliasing rules + */ +#ifdef RTE_TOOLCHAIN_MSVC +#define __rte_may_alias #else -typedef uint64_t unaligned_uint64_t; -typedef uint32_t unaligned_uint32_t; -typedef uint16_t unaligned_uint16_t; +#define __rte_may_alias __attribute__((__may_alias__)) +#endif + +/** + * Types for potentially unaligned access. + * + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, allowing + * these types to safely access memory at any address. + * Without this, accessing a uint16_t at an odd address + * is undefined behavior (even on x86 where hardware + * handles it). 
+ * + * __rte_may_alias - Prevents strict-aliasing optimization bugs where + * compilers may incorrectly elide memory operations + * when casting between pointer types. + */ +#ifdef RTE_TOOLCHAIN_MSVC +typedef __rte_may_alias __rte_aligned(1) uint64_t unaligned_uint64_t; +typedef __rte_may_alias __rte_aligned(1) uint32_t unaligned_uint32_t; +typedef __rte_may_alias __rte_aligned(1) uint16_t unaligned_uint16_t; +#else +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1); +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1); +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1); #endif /** @@ -159,15 +181,6 @@ typedef uint16_t unaligned_uint16_t; #define __rte_packed_end __attribute__((__packed__)) #endif -/** - * Macro to mark a type that is not subject to type-based aliasing rules - */ -#ifdef RTE_TOOLCHAIN_MSVC -#define __rte_may_alias -#else -#define __rte_may_alias __attribute__((__may_alias__)) -#endif - /******* Macro to mark functions and fields scheduled for removal *****/ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_deprecated -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* RE: [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-01-28 19:41 ` [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 @ 2026-01-29 8:28 ` Morten Brørup 2026-02-02 4:31 ` Scott Mitchell 0 siblings, 1 reply; 39+ messages in thread From: Morten Brørup @ 2026-01-29 8:28 UTC (permalink / raw) To: scott.k.mitch1, dev Cc: stephen, bruce.richardson, david.marchand, Cyril Chemparathy, stable > @@ -199,7 +199,7 @@ verify_jhash_32bits(void) > hash = rte_jhash(key, hashtest_key_lens[i], > hashtest_initvals[j]); > /* Divide key length by 4 in rte_jhash for 32 > bits */ > - hash32 = rte_jhash_32b((const > unaligned_uint32_t *)key, > + hash32 = rte_jhash_32b((const uint32_t *)key, > hashtest_key_lens[i] >> 2, > hashtest_initvals[j]); > if (hash != hash32) { rte_jhash_32b() correctly takes a pointer to (aligned) uint32_t, not unaligned, so casting to unaligned might be introducing a bug. (The automatically aligned allocation of the local "key" variable prevents this bug from occurring, but anyway.) Instead of changing the type cast, I'd prefer fixing this as follows: Add a local variable uint32_t key32[sizeof(key)/sizeof(uint32_t)], and memcpy(key32,key,sizeof(key)), and then call rte_jhash_32b(key32,...) without type casting. > +/** > + * Types for potentially unaligned access. > + * > + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, > allowing > + * these types to safely access memory at any > address. > + * Without this, accessing a uint16_t at an odd > address > + * is undefined behavior (even on x86 where > hardware > + * handles it). > + * > + * __rte_may_alias - Prevents strict-aliasing optimization bugs where > + * compilers may incorrectly elide memory > operations > + * when casting between pointer types. 
> + */ > +#ifdef RTE_TOOLCHAIN_MSVC > +typedef __rte_may_alias __rte_aligned(1) uint64_t unaligned_uint64_t; > +typedef __rte_may_alias __rte_aligned(1) uint32_t unaligned_uint32_t; > +typedef __rte_may_alias __rte_aligned(1) uint16_t unaligned_uint16_t; > +#else > +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1); > +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1); > +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1); > #endif Skimming GCC documentation, it looks like older versions required placing such attributes after the type, but newer versions seem to recommend placing them before, like qualifiers (const, volatile, ...). Placing them before the type, like qualifiers, seems more natural to me. And apparently, MSVC requires it. Does it work for GCC and Clang if they are placed before, like MSVC? Then we can get rid of the #ifdef RTE_TOOLCHAIN_MSVC. ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-01-29 8:28 ` Morten Brørup @ 2026-02-02 4:31 ` Scott Mitchell 0 siblings, 0 replies; 39+ messages in thread From: Scott Mitchell @ 2026-02-02 4:31 UTC (permalink / raw) To: Morten Brørup Cc: dev, stephen, bruce.richardson, david.marchand, Cyril Chemparathy, stable > > + hash32 = rte_jhash_32b((const uint32_t *)key, > > hashtest_key_lens[i] >> 2, > > hashtest_initvals[j]); > > if (hash != hash32) { > > rte_jhash_32b() correctly takes a pointer to (aligned) uint32_t, not unaligned, so casting to unaligned might be introducing a bug. (The automatically aligned allocation of the local "key" variable prevents this bug from occurring, but anyway.) > Instead of changing the type cast, I'd prefer fixing this as follows: > Add a local variable uint32_t key32[sizeof(key)/sizeof(uint32_t)], and memcpy(key32,key,sizeof(key)), and then call rte_jhash_32b(key32,...) without type casting. Sounds good, fix coming in v19. > > +/** > > + * Types for potentially unaligned access. > > + * > > + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, > > allowing > > + * these types to safely access memory at any > > address. > > + * Without this, accessing a uint16_t at an odd > > address > > + * is undefined behavior (even on x86 where > > hardware > > + * handles it). > > + * > > + * __rte_may_alias - Prevents strict-aliasing optimization bugs where > > + * compilers may incorrectly elide memory > > operations > > + * when casting between pointer types. 
> > + */ > > +#ifdef RTE_TOOLCHAIN_MSVC > > +typedef __rte_may_alias __rte_aligned(1) uint64_t unaligned_uint64_t; > > +typedef __rte_may_alias __rte_aligned(1) uint32_t unaligned_uint32_t; > > +typedef __rte_may_alias __rte_aligned(1) uint16_t unaligned_uint16_t; > > +#else > > +typedef uint64_t unaligned_uint64_t __rte_may_alias __rte_aligned(1); > > +typedef uint32_t unaligned_uint32_t __rte_may_alias __rte_aligned(1); > > +typedef uint16_t unaligned_uint16_t __rte_may_alias __rte_aligned(1); > > #endif > > Skimming GCC documentation, it looks like older versions required placing such attributes after the type, but newer versions seem to recommend placing them before, like qualifiers (const, volatile, ...). > Placing them before the type, like qualifiers, seems more natural to me. > And apparently, MSVC requires it. > Does it work for GCC and Clang if they are placed before, like MSVC? > Then we can get rid of the #ifdef RTE_TOOLCHAIN_MSVC. Good call! https://godbolt.org/z/oYrnfsMM3 gcc 8 and clang 7 both support attributes before the type. ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v18 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-01-28 19:41 ` [PATCH v18 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2026-01-28 19:41 ` [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 @ 2026-01-28 19:41 ` scott.k.mitch1 2026-01-29 8:31 ` Morten Brørup 2026-02-02 4:48 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2 siblings, 1 reply; 39+ messages in thread From: scott.k.mitch1 @ 2026-01-28 19:41 UTC (permalink / raw) To: dev; +Cc: mb, stephen, bruce.richardson, david.marchand, Scott Mitchell From: Scott Mitchell <scott.k.mitch1@gmail.com> __rte_raw_cksum uses a loop with memcpy on each iteration. GCC 15+ is able to vectorize the loop but Clang 18.1 is not. Replace memcpy with direct pointer access using unaligned_uint16_t. This enables both GCC and Clang to vectorize the loop while handling unaligned access safely on all architectures. Performance results from cksum_perf_autotest on Intel Xeon (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte): Block size Before After Improvement 100 0.40 0.24 ~40% 1500 0.50 0.06 ~8x 9000 0.49 0.06 ~8x Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- lib/net/rte_cksum.h | 14 +-- 4 files changed, 241 insertions(+), 10 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c diff --git a/app/test/meson.build b/app/test/meson.build index f4d04a6e42..2ca17716b9 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -38,6 +38,7 @@ source_file_deps = { 'test_byteorder.c': [], 'test_cfgfile.c': ['cfgfile'], 'test_cksum.c': ['net'], + 'test_cksum_fuzz.c': ['net'], 'test_cksum_perf.c': ['net'], 'test_cmdline.c': [], 'test_cmdline_cirbuf.c': [], diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c new file mode 100644 index 0000000000..33b4c77f51 --- 
/dev/null +++ b/app/test/test_cksum_fuzz.c @@ -0,0 +1,234 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Apple Inc. + */ + +#include <stdio.h> +#include <string.h> + +#include <rte_common.h> +#include <rte_cycles.h> +#include <rte_hexdump.h> +#include <rte_cksum.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include "test.h" + +/* + * Fuzz test for __rte_raw_cksum optimization. + * Compares the optimized implementation against the original reference + * implementation across random data of various lengths. + */ + +#define DEFAULT_ITERATIONS 1000 +#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */ + +/* + * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11. + * This is retained here for comparison testing against the optimized version. + */ +static inline uint32_t +test_cksum_fuzz_cksum_reference(const void *buf, size_t len, uint32_t sum) +{ + const void *end; + + for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { + uint16_t v; + + memcpy(&v, buf, sizeof(uint16_t)); + sum += v; + } + + /* if length is odd, keeping it byte order independent */ + if (unlikely(len % 2)) { + uint16_t left = 0; + + memcpy(&left, end, 1); + sum += left; + } + + return sum; +} + +static void +init_random_buffer(uint8_t *buf, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) + buf[i] = (uint8_t)rte_rand(); +} + +static inline uint32_t +get_initial_sum(bool random_initial_sum) +{ + return random_initial_sum ? 
(rte_rand() & 0xFFFFFFFF) : 0; +} + +/* + * Test a single buffer length with specific alignment and initial sum + */ +static int +test_cksum_fuzz_length_aligned(size_t len, bool aligned, uint32_t initial_sum) +{ + uint8_t *data; + uint8_t *buf; + size_t alloc_size; + uint32_t sum_ref, sum_opt; + + if (len == 0 && !aligned) { + /* Skip unaligned test for zero length - nothing to test */ + return TEST_SUCCESS; + } + + /* Allocate exact size for aligned, +1 for unaligned offset */ + alloc_size = aligned ? len : len + 1; + if (alloc_size == 0) + alloc_size = 1; /* rte_malloc doesn't like 0 */ + + data = rte_malloc(NULL, alloc_size, 64); + if (data == NULL) { + printf("Failed to allocate %zu bytes\n", alloc_size); + return TEST_FAILED; + } + + buf = aligned ? data : (data + 1); + + init_random_buffer(buf, len); + + sum_ref = test_cksum_fuzz_cksum_reference(buf, len, initial_sum); + sum_opt = __rte_raw_cksum(buf, len, initial_sum); + + if (sum_ref != sum_opt) { + printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n", + len, aligned ? 
"aligned" : "unaligned", + initial_sum, sum_ref, sum_opt); + rte_hexdump(stdout, "failing buffer", buf, len); + rte_free(data); + return TEST_FAILED; + } + + rte_free(data); + return TEST_SUCCESS; +} + +/* + * Test a length with both alignments + */ +static int +test_cksum_fuzz_length(size_t len, uint32_t initial_sum) +{ + int rc; + + /* Test aligned */ + rc = test_cksum_fuzz_length_aligned(len, true, initial_sum); + if (rc != TEST_SUCCESS) + return rc; + + /* Test unaligned */ + rc = test_cksum_fuzz_length_aligned(len, false, initial_sum); + + return rc; +} + +/* + * Test specific edge case lengths + */ +static int +test_cksum_fuzz_edge_cases(void) +{ + /* Edge case lengths that might trigger bugs */ + static const size_t edge_lengths[] = { + 0, 1, 2, 3, 4, 5, 6, 7, 8, + 15, 16, 17, + 31, 32, 33, + 63, 64, 65, + 127, 128, 129, + 255, 256, 257, + 511, 512, 513, + 1023, 1024, 1025, + 1500, 1501, /* MTU boundaries */ + 2047, 2048, 2049, + 4095, 4096, 4097, + 8191, 8192, 8193, + 16383, 16384, 16385, + 32767, 32768, 32769, + 65534, 65535, 65536 /* 64K GRO boundaries */ + }; + unsigned int i; + int rc; + + printf("Testing edge case lengths...\n"); + + for (i = 0; i < RTE_DIM(edge_lengths); i++) { + /* Test with zero initial sum */ + rc = test_cksum_fuzz_length(edge_lengths[i], 0); + if (rc != TEST_SUCCESS) + return rc; + + /* Test with random initial sum */ + rc = test_cksum_fuzz_length(edge_lengths[i], get_initial_sum(true)); + if (rc != TEST_SUCCESS) + return rc; + } + + return TEST_SUCCESS; +} + +/* + * Test random lengths with optional random initial sums + */ +static int +test_cksum_fuzz_random(unsigned int iterations, bool random_initial_sum) +{ + unsigned int i; + int rc; + + printf("Testing random lengths (0-%d)%s...\n", MAX_TEST_LEN, + random_initial_sum ? 
" with random initial sums" : ""); + + for (i = 0; i < iterations; i++) { + size_t len = rte_rand() % (MAX_TEST_LEN + 1); + + rc = test_cksum_fuzz_length(len, get_initial_sum(random_initial_sum)); + if (rc != TEST_SUCCESS) { + printf("Failed at len=%zu\n", len); + return rc; + } + } + + return TEST_SUCCESS; +} + +static int +test_cksum_fuzz_random_zero_sum(void) +{ + return test_cksum_fuzz_random(DEFAULT_ITERATIONS, false); +} + +static int +test_cksum_fuzz_random_random_sum(void) +{ + return test_cksum_fuzz_random(DEFAULT_ITERATIONS, true); +} + +static struct unit_test_suite ptr_cksum_fuzz_suite = { + .suite_name = "cksum fuzz autotest", + .setup = NULL, + .teardown = NULL, + .unit_test_cases = { + TEST_CASE(test_cksum_fuzz_edge_cases), + TEST_CASE(test_cksum_fuzz_random_zero_sum), + TEST_CASE(test_cksum_fuzz_random_random_sum), + TEST_CASES_END() + } +}; + +static int +test_cksum_fuzz_suite(void) +{ + return unit_test_suite_runner(&ptr_cksum_fuzz_suite); +} + +REGISTER_FAST_TEST(cksum_fuzz_autotest, NOHUGE_OK, ASAN_OK, test_cksum_fuzz_suite); diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c index 0b919cd59f..6b1d4589e0 100644 --- a/app/test/test_cksum_perf.c +++ b/app/test/test_cksum_perf.c @@ -15,7 +15,7 @@ #define NUM_BLOCKS 10 #define ITERATIONS 1000000 -static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 }; +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501, 9000, 9001, 65536, 65537 }; static __rte_noinline uint16_t do_rte_raw_cksum(const void *buf, size_t len) diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h index a8e8927952..f04b46a6c3 100644 --- a/lib/net/rte_cksum.h +++ b/lib/net/rte_cksum.h @@ -42,15 +42,11 @@ extern "C" { static inline uint32_t __rte_raw_cksum(const void *buf, size_t len, uint32_t sum) { - const void *end; - - for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); - buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { - uint16_t v; - - memcpy(&v, buf, 
sizeof(uint16_t)); - sum += v; - } + /* Process uint16 chunks to preserve overflow/carry math. GCC/Clang vectorize the loop. */ + const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf; + const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16)); + for (; buf16 != end; buf16++) + sum += *buf16; /* if length is odd, keeping it byte order independent */ if (unlikely(len % 2)) { -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* RE: [PATCH v18 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-01-28 19:41 ` [PATCH v18 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 @ 2026-01-29 8:31 ` Morten Brørup 0 siblings, 0 replies; 39+ messages in thread From: Morten Brørup @ 2026-01-29 8:31 UTC (permalink / raw) To: scott.k.mitch1, dev; +Cc: stephen, bruce.richardson, david.marchand Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v19 0/2] net: optimize __rte_raw_cksum 2026-01-28 19:41 ` [PATCH v18 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2026-01-28 19:41 ` [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 2026-01-28 19:41 ` [PATCH v18 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 @ 2026-02-02 4:48 ` scott.k.mitch1 2026-02-02 4:48 ` [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 ` (2 more replies) 2 siblings, 3 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-02-02 4:48 UTC (permalink / raw) To: dev; +Cc: mb, stephen, bruce.richardson, david.marchand, Scott From: Scott <scott.k.mitch1@gmail.com> This series optimizes __rte_raw_cksum by replacing memcpy with direct pointer access, enabling compiler vectorization on both GCC and Clang. Patch 1 adds __rte_may_alias and __rte_aligned(1) to unaligned typedefs to prevent a GCC strict-aliasing bug where struct initialization is incorrectly elided, and avoid UB by clarifying access can be from any address. Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum to enable compiler optimizations while maintaining correctness across all architectures (including strict-alignment platforms). Performance results show significant improvements (40% for small buffers, up to 8x for larger buffers) on Intel Xeon with Clang 18.1. 
Changes in v19: - Move qualifiers before typedef on all platforms - test_hash_functions explicit 32 bit variable use Changes in v18: - Fix MSVC compile error __rte_aligned(1) must come before type - Fix test_hash_functions incorrect usage of unaligned_uint32_t Changes in v17: - Use __rte_aligned(1) unconditionally on unaligned type aliases - test_cksum_fuzz uses unit_test_suite_runner - test_cksum_fuzz reference method rename to test_cksum_fuzz_cksum_reference Changes in v16: - Add Fixes tag and Cc stable/author for backporting (patch 1) Changes in v15: - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST Changes in v14: - Split into two patches: EAL typedef fix and checksum optimization - Use unaligned_uint16_t directly instead of wrapper struct - Added __rte_may_alias to unaligned typedefs to prevent GCC bug Scott Mitchell (2): eal: add __rte_may_alias and __rte_aligned to unaligned typedefs net: __rte_raw_cksum pointers enable compiler optimizations app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- app/test/test_hash_functions.c | 6 +- lib/eal/include/rte_common.h | 49 ++++--- lib/net/rte_cksum.h | 14 +- 6 files changed, 279 insertions(+), 27 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c -- 2.39.5 (Apple Git-154) ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-02-02 4:48 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 @ 2026-02-02 4:48 ` scott.k.mitch1 2026-02-03 8:18 ` Morten Brørup 2026-02-16 14:29 ` David Marchand 2026-02-02 4:48 ` [PATCH v19 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1 2026-02-06 14:54 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum David Marchand 2 siblings, 2 replies; 39+ messages in thread From: scott.k.mitch1 @ 2026-02-02 4:48 UTC (permalink / raw) To: dev Cc: mb, stephen, bruce.richardson, david.marchand, Scott Mitchell, Cyril Chemparathy, stable From: Scott Mitchell <scott.k.mitch1@gmail.com> Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs to prevent GCC strict-aliasing optimization bugs. GCC has a bug where it incorrectly elides struct initialization when strict aliasing is enabled, causing reads from uninitialized memory. Add __rte_aligned(1) attribute to unaligned_uint{16,32,64}_t typedefs which allows for safe access at any alignment. Without this, accessing a uint16_t at an odd address is undefined behavior. Without this UBSan detects `UndefinedBehaviorSanitizer: undefined-behavior`. 
Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types") Cc: Cyril Chemparathy <cchemparathy@ezchip.com> Cc: stable@dpdk.org Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- app/test/test_hash_functions.c | 6 ++++- lib/eal/include/rte_common.h | 49 +++++++++++++++++++++++----------- 2 files changed, 38 insertions(+), 17 deletions(-) diff --git a/app/test/test_hash_functions.c b/app/test/test_hash_functions.c index 70820d1f19..9524e3135f 100644 --- a/app/test/test_hash_functions.c +++ b/app/test/test_hash_functions.c @@ -187,11 +187,15 @@ verify_jhash_32bits(void) { unsigned i, j; uint8_t key[64]; + /* to guarantee alignment for rte_jhash_32b, use u32 and copy data */ + uint32_t key32[sizeof(key) / sizeof(uint32_t)]; uint32_t hash, hash32; for (i = 0; i < 64; i++) key[i] = rand() & 0xff; + memcpy(key32, key, sizeof(key)); + for (i = 0; i < RTE_DIM(hashtest_key_lens); i++) { for (j = 0; j < RTE_DIM(hashtest_initvals); j++) { /* Key size must be multiple of 4 (32 bits) */ @@ -199,7 +203,7 @@ verify_jhash_32bits(void) hash = rte_jhash(key, hashtest_key_lens[i], hashtest_initvals[j]); /* Divide key length by 4 in rte_jhash for 32 bits */ - hash32 = rte_jhash_32b((const unaligned_uint32_t *)key, + hash32 = rte_jhash_32b(key32, hashtest_key_lens[i] >> 2, hashtest_initvals[j]); if (hash != hash32) { diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 573bf4f2ce..7b36966019 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -121,16 +121,42 @@ extern "C" { #define __rte_aligned(a) __attribute__((__aligned__(a))) #endif -#ifdef RTE_ARCH_STRICT_ALIGN -typedef uint64_t unaligned_uint64_t __rte_aligned(1); -typedef uint32_t unaligned_uint32_t __rte_aligned(1); -typedef uint16_t unaligned_uint16_t __rte_aligned(1); +/** + * Macro to mark a type that is not subject to type-based aliasing rules + */ +#ifdef RTE_TOOLCHAIN_MSVC +#define __rte_may_alias #else -typedef uint64_t unaligned_uint64_t; -typedef 
uint32_t unaligned_uint32_t; -typedef uint16_t unaligned_uint16_t; +#define __rte_may_alias __attribute__((__may_alias__)) #endif +/* Unaligned types implementation notes: + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, allowing + * these types to safely access memory at any address. + * Without this, accessing a uint16_t at an odd address + * is undefined behavior (even on x86 where hardware + * handles it). + * + * __rte_may_alias - Prevents strict-aliasing optimization bugs where + * compilers may incorrectly elide memory operations + * when casting between pointer types. + */ + +/** + * Type for safe unaligned u64 access. + */ +typedef __rte_may_alias __rte_aligned(1) uint64_t unaligned_uint64_t; + +/** + * Type for safe unaligned u32 access. + */ +typedef __rte_may_alias __rte_aligned(1) uint32_t unaligned_uint32_t; + +/** + * Type for safe unaligned u16 access. + */ +typedef __rte_may_alias __rte_aligned(1) uint16_t unaligned_uint16_t; + /** * @deprecated * @see __rte_packed_begin @@ -159,15 +185,6 @@ typedef uint16_t unaligned_uint16_t; #define __rte_packed_end __attribute__((__packed__)) #endif -/** - * Macro to mark a type that is not subject to type-based aliasing rules - */ -#ifdef RTE_TOOLCHAIN_MSVC -#define __rte_may_alias -#else -#define __rte_may_alias __attribute__((__may_alias__)) -#endif - /******* Macro to mark functions and fields scheduled for removal *****/ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_deprecated -- 2.39.5 (Apple Git-154) ^ permalink raw reply related [flat|nested] 39+ messages in thread
* RE: [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-02-02 4:48 ` [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 @ 2026-02-03 8:18 ` Morten Brørup 2026-02-16 14:29 ` David Marchand 1 sibling, 0 replies; 39+ messages in thread From: Morten Brørup @ 2026-02-03 8:18 UTC (permalink / raw) To: scott.k.mitch1, dev Cc: stephen, bruce.richardson, david.marchand, Cyril Chemparathy, stable Acked-by: Morten Brørup <mb@smartsharesystems.com> ^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-02-02 4:48 ` [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 2026-02-03 8:18 ` Morten Brørup @ 2026-02-16 14:29 ` David Marchand 2026-02-16 15:00 ` Morten Brørup 1 sibling, 1 reply; 39+ messages in thread From: David Marchand @ 2026-02-16 14:29 UTC (permalink / raw) To: scott.k.mitch1, Andre Muezerie, Tyler Retzlaff Cc: dev, mb, stephen, bruce.richardson, Cyril Chemparathy, stable Hello Scott, Andre, Tyler, On Mon, 2 Feb 2026 at 05:48, <scott.k.mitch1@gmail.com> wrote: > > From: Scott Mitchell <scott.k.mitch1@gmail.com> > > Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs > to prevent GCC strict-aliasing optimization bugs. GCC has a bug where > it incorrectly elides struct initialization when strict aliasing is > enabled, causing reads from uninitialized memory. > > Add __rte_aligned(1) attribute to unaligned_uint{16,32,64}_t typedefs > which allows for safe access at any alignment. Without this, accessing > a uint16_t at an odd address is undefined behavior. Without this > UBSan detects `UndefinedBehaviorSanitizer: undefined-behavior`. 
> > Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types") > Cc: stable@dpdk.org > > Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> [snip] > diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h > index 573bf4f2ce..7b36966019 100644 > --- a/lib/eal/include/rte_common.h > +++ b/lib/eal/include/rte_common.h > @@ -121,16 +121,42 @@ extern "C" { > #define __rte_aligned(a) __attribute__((__aligned__(a))) > #endif > > -#ifdef RTE_ARCH_STRICT_ALIGN > -typedef uint64_t unaligned_uint64_t __rte_aligned(1); > -typedef uint32_t unaligned_uint32_t __rte_aligned(1); > -typedef uint16_t unaligned_uint16_t __rte_aligned(1); > +/** > + * Macro to mark a type that is not subject to type-based aliasing rules > + */ > +#ifdef RTE_TOOLCHAIN_MSVC > +#define __rte_may_alias > #else > -typedef uint64_t unaligned_uint64_t; > -typedef uint32_t unaligned_uint32_t; > -typedef uint16_t unaligned_uint16_t; > +#define __rte_may_alias __attribute__((__may_alias__)) > #endif > > +/* Unaligned types implementation notes: > + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, allowing > + * these types to safely access memory at any address. > + * Without this, accessing a uint16_t at an odd address > + * is undefined behavior (even on x86 where hardware > + * handles it). > + * > + * __rte_may_alias - Prevents strict-aliasing optimization bugs where > + * compilers may incorrectly elide memory operations > + * when casting between pointer types. > + */ > + > +/** > + * Type for safe unaligned u64 access. > + */ > +typedef __rte_may_alias __rte_aligned(1) uint64_t unaligned_uint64_t; > + > +/** > + * Type for safe unaligned u32 access. > + */ > +typedef __rte_may_alias __rte_aligned(1) uint32_t unaligned_uint32_t; > + > +/** > + * Type for safe unaligned u16 access. 
> + */ > +typedef __rte_may_alias __rte_aligned(1) uint16_t unaligned_uint16_t; > + > /** > * @deprecated > * @see __rte_packed_begin > @@ -159,15 +185,6 @@ typedef uint16_t unaligned_uint16_t; > #define __rte_packed_end __attribute__((__packed__)) > #endif > > -/** > - * Macro to mark a type that is not subject to type-based aliasing rules > - */ > -#ifdef RTE_TOOLCHAIN_MSVC > -#define __rte_may_alias > -#else > -#define __rte_may_alias __attribute__((__may_alias__)) > -#endif > - > /******* Macro to mark functions and fields scheduled for removal *****/ > #ifdef RTE_TOOLCHAIN_MSVC > #define __rte_deprecated This change raises a warning in checkpatch. https://mails.dpdk.org/archives/test-report/2026-February/955237.html IIRC, we added this check for MSVC support, making sure no __rte_aligned() would be added in unsupported locations. @Microsoft guys, do you have a suggestion? -- David Marchand ^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs 2026-02-16 14:29 ` David Marchand @ 2026-02-16 15:00 ` Morten Brørup 0 siblings, 0 replies; 39+ messages in thread From: Morten Brørup @ 2026-02-16 15:00 UTC (permalink / raw) To: David Marchand, scott.k.mitch1, Andre Muezerie, Tyler Retzlaff Cc: dev, stephen, bruce.richardson, Cyril Chemparathy, stable > From: David Marchand [mailto:david.marchand@redhat.com] > Sent: Monday, 16 February 2026 15.29 > > Hello Scott, Andre, Tyler, > > On Mon, 2 Feb 2026 at 05:48, <scott.k.mitch1@gmail.com> wrote: > > > > From: Scott Mitchell <scott.k.mitch1@gmail.com> > > > > Add __rte_may_alias attribute to unaligned_uint{16,32,64}_t typedefs > > to prevent GCC strict-aliasing optimization bugs. GCC has a bug where > > it incorrectly elides struct initialization when strict aliasing is > > enabled, causing reads from uninitialized memory. > > > > Add __rte_aligned(1) attribute to unaligned_uint{16,32,64}_t typedefs > > which allows for safe access at any alignment. Without this, > accessing > > a uint16_t at an odd address is undefined behavior. Without this > > UBSan detects `UndefinedBehaviorSanitizer: undefined-behavior`. 
> > > > Fixes: 7621d6a8d0bd ("eal: add and use unaligned integer types") > > Cc: stable@dpdk.org > > > > Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> > > [snip] > > > diff --git a/lib/eal/include/rte_common.h > b/lib/eal/include/rte_common.h > > index 573bf4f2ce..7b36966019 100644 > > --- a/lib/eal/include/rte_common.h > > +++ b/lib/eal/include/rte_common.h > > @@ -121,16 +121,42 @@ extern "C" { > > #define __rte_aligned(a) __attribute__((__aligned__(a))) > > #endif > > > > -#ifdef RTE_ARCH_STRICT_ALIGN > > -typedef uint64_t unaligned_uint64_t __rte_aligned(1); > > -typedef uint32_t unaligned_uint32_t __rte_aligned(1); > > -typedef uint16_t unaligned_uint16_t __rte_aligned(1); > > +/** > > + * Macro to mark a type that is not subject to type-based aliasing > rules > > + */ > > +#ifdef RTE_TOOLCHAIN_MSVC > > +#define __rte_may_alias > > #else > > -typedef uint64_t unaligned_uint64_t; > > -typedef uint32_t unaligned_uint32_t; > > -typedef uint16_t unaligned_uint16_t; > > +#define __rte_may_alias __attribute__((__may_alias__)) > > #endif > > > > +/* Unaligned types implementation notes: > > + * __rte_aligned(1) - Reduces alignment requirement to 1 byte, > allowing > > + * these types to safely access memory at any > address. > > + * Without this, accessing a uint16_t at an odd > address > > + * is undefined behavior (even on x86 where > hardware > > + * handles it). > > + * > > + * __rte_may_alias - Prevents strict-aliasing optimization bugs > where > > + * compilers may incorrectly elide memory > operations > > + * when casting between pointer types. > > + */ > > + > > +/** > > + * Type for safe unaligned u64 access. > > + */ > > +typedef __rte_may_alias __rte_aligned(1) uint64_t > unaligned_uint64_t; > > + > > +/** > > + * Type for safe unaligned u32 access. > > + */ > > +typedef __rte_may_alias __rte_aligned(1) uint32_t > unaligned_uint32_t; > > + > > +/** > > + * Type for safe unaligned u16 access. 
> > + */ > > +typedef __rte_may_alias __rte_aligned(1) uint16_t > unaligned_uint16_t; > > + > > /** > > * @deprecated > > * @see __rte_packed_begin > > @@ -159,15 +185,6 @@ typedef uint16_t unaligned_uint16_t; > > #define __rte_packed_end __attribute__((__packed__)) > > #endif > > > > -/** > > - * Macro to mark a type that is not subject to type-based aliasing > rules > > - */ > > -#ifdef RTE_TOOLCHAIN_MSVC > > -#define __rte_may_alias > > -#else > > -#define __rte_may_alias __attribute__((__may_alias__)) > > -#endif > > - > > /******* Macro to mark functions and fields scheduled for removal > *****/ > > #ifdef RTE_TOOLCHAIN_MSVC > > #define __rte_deprecated > > This change raises a warning in checkpatch. > https://mails.dpdk.org/archives/test-report/2026-February/955237.html > > IIRC, we added this check for MSVC support, making sure no > __rte_aligned() would be added in unsupported locations. It looks like MSVC can use alignment for type definitions too: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170#vclrf_declspecaligntypedef It is applied on a structure, though, so may not be viable for scalar types. IDK. But it looks like MSVC can only increase alignment: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170#:~:text=__declspec(align(%23))%20can%20only%20increase%20alignment%20restrictions. So packing may be needed too. > > @Microsoft guys, do you have a suggestion? > > > -- > David Marchand ^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v19 2/2] net: __rte_raw_cksum pointers enable compiler optimizations 2026-02-02 4:48 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum scott.k.mitch1 2026-02-02 4:48 ` [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1 @ 2026-02-02 4:48 ` scott.k.mitch1 2026-02-03 8:19 ` Morten Brørup 2026-02-06 14:54 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum David Marchand 2 siblings, 1 reply; 39+ messages in thread From: scott.k.mitch1 @ 2026-02-02 4:48 UTC (permalink / raw) To: dev; +Cc: mb, stephen, bruce.richardson, david.marchand, Scott Mitchell From: Scott Mitchell <scott.k.mitch1@gmail.com> __rte_raw_cksum uses a loop with memcpy on each iteration. GCC 15+ is able to vectorize the loop but Clang 18.1 is not. Replace memcpy with direct pointer access using unaligned_uint16_t. This enables both GCC and Clang to vectorize the loop while handling unaligned access safely on all architectures. Performance results from cksum_perf_autotest on Intel Xeon (Cascade Lake, AVX-512) built with Clang 18.1 (TSC cycles/byte): Block size Before After Improvement 100 0.40 0.24 ~40% 1500 0.50 0.06 ~8x 9000 0.49 0.06 ~8x Signed-off-by: Scott Mitchell <scott.k.mitch1@gmail.com> --- app/test/meson.build | 1 + app/test/test_cksum_fuzz.c | 234 +++++++++++++++++++++++++++++++++++++ app/test/test_cksum_perf.c | 2 +- lib/net/rte_cksum.h | 14 +-- 4 files changed, 241 insertions(+), 10 deletions(-) create mode 100644 app/test/test_cksum_fuzz.c diff --git a/app/test/meson.build b/app/test/meson.build index f4d04a6e42..2ca17716b9 100644 --- a/app/test/meson.build +++ b/app/test/meson.build @@ -38,6 +38,7 @@ source_file_deps = { 'test_byteorder.c': [], 'test_cfgfile.c': ['cfgfile'], 'test_cksum.c': ['net'], + 'test_cksum_fuzz.c': ['net'], 'test_cksum_perf.c': ['net'], 'test_cmdline.c': [], 'test_cmdline_cirbuf.c': [], diff --git a/app/test/test_cksum_fuzz.c b/app/test/test_cksum_fuzz.c new file mode 100644 index 0000000000..33b4c77f51 --- /dev/null 
+++ b/app/test/test_cksum_fuzz.c @@ -0,0 +1,234 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2026 Apple Inc. + */ + +#include <stdio.h> +#include <string.h> + +#include <rte_common.h> +#include <rte_cycles.h> +#include <rte_hexdump.h> +#include <rte_cksum.h> +#include <rte_malloc.h> +#include <rte_random.h> + +#include "test.h" + +/* + * Fuzz test for __rte_raw_cksum optimization. + * Compares the optimized implementation against the original reference + * implementation across random data of various lengths. + */ + +#define DEFAULT_ITERATIONS 1000 +#define MAX_TEST_LEN 65536 /* 64K to match GRO frame sizes */ + +/* + * Original (reference) implementation of __rte_raw_cksum from DPDK v23.11. + * This is retained here for comparison testing against the optimized version. + */ +static inline uint32_t +test_cksum_fuzz_cksum_reference(const void *buf, size_t len, uint32_t sum) +{ + const void *end; + + for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t))); + buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) { + uint16_t v; + + memcpy(&v, buf, sizeof(uint16_t)); + sum += v; + } + + /* if length is odd, keeping it byte order independent */ + if (unlikely(len % 2)) { + uint16_t left = 0; + + memcpy(&left, end, 1); + sum += left; + } + + return sum; +} + +static void +init_random_buffer(uint8_t *buf, size_t len) +{ + size_t i; + + for (i = 0; i < len; i++) + buf[i] = (uint8_t)rte_rand(); +} + +static inline uint32_t +get_initial_sum(bool random_initial_sum) +{ + return random_initial_sum ? 
(rte_rand() & 0xFFFFFFFF) : 0; +} + +/* + * Test a single buffer length with specific alignment and initial sum + */ +static int +test_cksum_fuzz_length_aligned(size_t len, bool aligned, uint32_t initial_sum) +{ + uint8_t *data; + uint8_t *buf; + size_t alloc_size; + uint32_t sum_ref, sum_opt; + + if (len == 0 && !aligned) { + /* Skip unaligned test for zero length - nothing to test */ + return TEST_SUCCESS; + } + + /* Allocate exact size for aligned, +1 for unaligned offset */ + alloc_size = aligned ? len : len + 1; + if (alloc_size == 0) + alloc_size = 1; /* rte_malloc doesn't like 0 */ + + data = rte_malloc(NULL, alloc_size, 64); + if (data == NULL) { + printf("Failed to allocate %zu bytes\n", alloc_size); + return TEST_FAILED; + } + + buf = aligned ? data : (data + 1); + + init_random_buffer(buf, len); + + sum_ref = test_cksum_fuzz_cksum_reference(buf, len, initial_sum); + sum_opt = __rte_raw_cksum(buf, len, initial_sum); + + if (sum_ref != sum_opt) { + printf("MISMATCH at len=%zu aligned='%s' initial_sum=0x%08x ref=0x%08x opt=0x%08x\n", + len, aligned ? 
+			"aligned" : "unaligned",
+			initial_sum, sum_ref, sum_opt);
+		rte_hexdump(stdout, "failing buffer", buf, len);
+		rte_free(data);
+		return TEST_FAILED;
+	}
+
+	rte_free(data);
+	return TEST_SUCCESS;
+}
+
+/*
+ * Test a length with both alignments
+ */
+static int
+test_cksum_fuzz_length(size_t len, uint32_t initial_sum)
+{
+	int rc;
+
+	/* Test aligned */
+	rc = test_cksum_fuzz_length_aligned(len, true, initial_sum);
+	if (rc != TEST_SUCCESS)
+		return rc;
+
+	/* Test unaligned */
+	rc = test_cksum_fuzz_length_aligned(len, false, initial_sum);
+
+	return rc;
+}
+
+/*
+ * Test specific edge case lengths
+ */
+static int
+test_cksum_fuzz_edge_cases(void)
+{
+	/* Edge case lengths that might trigger bugs */
+	static const size_t edge_lengths[] = {
+		0, 1, 2, 3, 4, 5, 6, 7, 8,
+		15, 16, 17,
+		31, 32, 33,
+		63, 64, 65,
+		127, 128, 129,
+		255, 256, 257,
+		511, 512, 513,
+		1023, 1024, 1025,
+		1500, 1501, /* MTU boundaries */
+		2047, 2048, 2049,
+		4095, 4096, 4097,
+		8191, 8192, 8193,
+		16383, 16384, 16385,
+		32767, 32768, 32769,
+		65534, 65535, 65536 /* 64K GRO boundaries */
+	};
+	unsigned int i;
+	int rc;
+
+	printf("Testing edge case lengths...\n");
+
+	for (i = 0; i < RTE_DIM(edge_lengths); i++) {
+		/* Test with zero initial sum */
+		rc = test_cksum_fuzz_length(edge_lengths[i], 0);
+		if (rc != TEST_SUCCESS)
+			return rc;
+
+		/* Test with random initial sum */
+		rc = test_cksum_fuzz_length(edge_lengths[i], get_initial_sum(true));
+		if (rc != TEST_SUCCESS)
+			return rc;
+	}
+
+	return TEST_SUCCESS;
+}
+
+/*
+ * Test random lengths with optional random initial sums
+ */
+static int
+test_cksum_fuzz_random(unsigned int iterations, bool random_initial_sum)
+{
+	unsigned int i;
+	int rc;
+
+	printf("Testing random lengths (0-%d)%s...\n", MAX_TEST_LEN,
+		random_initial_sum ? " with random initial sums" : "");
+
+	for (i = 0; i < iterations; i++) {
+		size_t len = rte_rand() % (MAX_TEST_LEN + 1);
+
+		rc = test_cksum_fuzz_length(len, get_initial_sum(random_initial_sum));
+		if (rc != TEST_SUCCESS) {
+			printf("Failed at len=%zu\n", len);
+			return rc;
+		}
+	}
+
+	return TEST_SUCCESS;
+}
+
+static int
+test_cksum_fuzz_random_zero_sum(void)
+{
+	return test_cksum_fuzz_random(DEFAULT_ITERATIONS, false);
+}
+
+static int
+test_cksum_fuzz_random_random_sum(void)
+{
+	return test_cksum_fuzz_random(DEFAULT_ITERATIONS, true);
+}
+
+static struct unit_test_suite ptr_cksum_fuzz_suite = {
+	.suite_name = "cksum fuzz autotest",
+	.setup = NULL,
+	.teardown = NULL,
+	.unit_test_cases = {
+		TEST_CASE(test_cksum_fuzz_edge_cases),
+		TEST_CASE(test_cksum_fuzz_random_zero_sum),
+		TEST_CASE(test_cksum_fuzz_random_random_sum),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_cksum_fuzz_suite(void)
+{
+	return unit_test_suite_runner(&ptr_cksum_fuzz_suite);
+}
+
+REGISTER_FAST_TEST(cksum_fuzz_autotest, NOHUGE_OK, ASAN_OK, test_cksum_fuzz_suite);
diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c
index 0b919cd59f..6b1d4589e0 100644
--- a/app/test/test_cksum_perf.c
+++ b/app/test/test_cksum_perf.c
@@ -15,7 +15,7 @@
 #define NUM_BLOCKS 10
 #define ITERATIONS 1000000
 
-static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 };
+static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501, 9000, 9001, 65536, 65537 };
 
 static __rte_noinline uint16_t
 do_rte_raw_cksum(const void *buf, size_t len)
diff --git a/lib/net/rte_cksum.h b/lib/net/rte_cksum.h
index a8e8927952..f04b46a6c3 100644
--- a/lib/net/rte_cksum.h
+++ b/lib/net/rte_cksum.h
@@ -42,15 +42,11 @@ extern "C" {
 static inline uint32_t
 __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
 {
-	const void *end;
-
-	for (end = RTE_PTR_ADD(buf, RTE_ALIGN_FLOOR(len, sizeof(uint16_t)));
-	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
-		uint16_t v;
-
-		memcpy(&v, buf, sizeof(uint16_t));
-		sum += v;
-	}
+	/* Process uint16 chunks to preserve overflow/carry math. GCC/Clang vectorize the loop. */
+	const unaligned_uint16_t *buf16 = (const unaligned_uint16_t *)buf;
+	const unaligned_uint16_t *end = buf16 + (len / sizeof(*buf16));
+	for (; buf16 != end; buf16++)
+		sum += *buf16;
 
 	/* if length is odd, keeping it byte order independent */
 	if (unlikely(len % 2)) {
--
2.39.5 (Apple Git-154)
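The hunk above replaces the memcpy-based loop with direct reads through the aliasing-safe unaligned type. For readers who want to try the technique outside the DPDK tree, here is a minimal stand-alone sketch (GCC/Clang attribute syntax assumed; `u16_unaligned`, `raw_cksum_ref`, and `raw_cksum_opt` are hypothetical names, with `u16_unaligned` standing in for DPDK's `unaligned_uint16_t`). It checks the pointer loop against the original memcpy loop, including the odd-length tail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for DPDK's unaligned_uint16_t: a 16-bit type whose loads may
 * alias any object and carry no alignment requirement. */
typedef uint16_t __attribute__((__may_alias__, __aligned__(1))) u16_unaligned;

/* Reference: the pre-patch memcpy-based accumulation. */
static uint32_t
raw_cksum_ref(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;
	size_t i;

	for (i = 0; i + sizeof(uint16_t) <= len; i += sizeof(uint16_t)) {
		uint16_t v;

		memcpy(&v, p + i, sizeof(uint16_t));
		sum += v;
	}
	if (len % 2) {
		/* odd tail: pad with a zero byte, byte-order independent */
		uint16_t left = 0;

		memcpy(&left, p + len - 1, 1);
		sum += left;
	}
	return sum;
}

/* Patch-style variant: direct pointer reads through the may_alias type,
 * which the optimizer is free to vectorize. */
static uint32_t
raw_cksum_opt(const void *buf, size_t len, uint32_t sum)
{
	const u16_unaligned *p16 = (const u16_unaligned *)buf;
	const u16_unaligned *end = p16 + len / sizeof(*p16);

	for (; p16 != end; p16++)
		sum += *p16;
	if (len % 2) {
		uint16_t left = 0;

		memcpy(&left, (const unsigned char *)buf + len - 1, 1);
		sum += left;
	}
	return sum;
}
```

Both functions return the same 32-bit running sum for any length, alignment, and initial sum; the only difference is that the second form gives the compiler license to read the buffer directly despite misalignment and type-based aliasing rules, which is what enables vectorization.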
* RE: [PATCH v19 2/2] net: __rte_raw_cksum pointers enable compiler optimizations
  2026-02-02  4:48 ` [PATCH v19 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
@ 2026-02-03  8:19 ` Morten Brørup
  0 siblings, 0 replies; 39+ messages in thread
From: Morten Brørup @ 2026-02-03 8:19 UTC (permalink / raw)
To: scott.k.mitch1, dev; +Cc: stephen, bruce.richardson, david.marchand

Acked-by: Morten Brørup <mb@smartsharesystems.com>
* Re: [PATCH v19 0/2] net: optimize __rte_raw_cksum
  2026-02-02  4:48 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
  2026-02-02  4:48 ` [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1
  2026-02-02  4:48 ` [PATCH v19 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
@ 2026-02-06 14:54 ` David Marchand
  2026-02-07  1:29 ` Scott Mitchell
  2 siblings, 1 reply; 39+ messages in thread
From: David Marchand @ 2026-02-06 14:54 UTC (permalink / raw)
To: scott.k.mitch1; +Cc: dev, mb, stephen, bruce.richardson, Thomas Monjalon

Hi Scott,

On Mon, 2 Feb 2026 at 05:48, <scott.k.mitch1@gmail.com> wrote:
>
> From: Scott <scott.k.mitch1@gmail.com>
>
> This series optimizes __rte_raw_cksum by replacing memcpy with direct
> pointer access, enabling compiler vectorization on both GCC and Clang.
>
> Patch 1 adds __rte_may_alias and __rte_aligned(1) to unaligned typedefs
> to prevent a GCC strict-aliasing bug where struct initialization is
> incorrectly elided, and avoid UB by clarifying access can be from any
> address.
>
> Patch 2 uses the improved unaligned_uint16_t type in __rte_raw_cksum
> to enable compiler optimizations while maintaining correctness across
> all architectures (including strict-alignment platforms).
>
> Performance results show significant improvements (40% for small buffers,
> up to 8x for larger buffers) on Intel Xeon with Clang 18.1.
>
> Changes in v19:
> - Move qualifiers before typedef on all platforms
> - test_hash_functions explicit 32 bit variable use
>
> Changes in v18:
> - Fix MSVC compile error __rte_aligned(1) must come before type
> - Fix test_hash_functions incorrect usage of unaligned_uint32_t
>
> Changes in v17:
> - Use __rte_aligned(1) unconditionally on unaligned type aliases
> - test_cksum_fuzz uses unit_test_suite_runner
> - test_cksum_fuzz reference method rename to
>   test_cksum_fuzz_cksum_reference
>
> Changes in v16:
> - Add Fixes tag and Cc stable/author for backporting (patch 1)
>
> Changes in v15:
> - Use NOHUGE_OK and ASAN_OK constants in REGISTER_FAST_TEST
>
> Changes in v14:
> - Split into two patches: EAL typedef fix and checksum optimization
> - Use unaligned_uint16_t directly instead of wrapper struct
> - Added __rte_may_alias to unaligned typedefs to prevent GCC bug
>
> Scott Mitchell (2):
>   eal: add __rte_may_alias and __rte_aligned to unaligned typedefs
>   net: __rte_raw_cksum pointers enable compiler optimizations
>
>  app/test/meson.build           |   1 +
>  app/test/test_cksum_fuzz.c     | 234 +++++++++++++++++++++++++++++++++
>  app/test/test_cksum_perf.c     |   2 +-
>  app/test/test_hash_functions.c |   6 +-
>  lib/eal/include/rte_common.h   |  49 ++++---
>  lib/net/rte_cksum.h            |  14 +-
>  6 files changed, 279 insertions(+), 27 deletions(-)
>  create mode 100644 app/test/test_cksum_fuzz.c

I have been trying to reproduce the numbers with one (venerable) Skylake
processor but I see no difference before/after the series.
Numbers are in the same range with gcc (11) and clang (20) on this RHEL 9
system.

RTE>>cksum_perf_autotest
### rte_raw_cksum() performance ###
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                13.0               0.65
Unaligned          20                13.0               0.65
Aligned            21                14.0               0.67
Unaligned          21                14.0               0.67
Aligned           100                19.1               0.19
Unaligned         100                19.4               0.19
Aligned           101                20.1               0.20
Unaligned         101                22.1               0.22
Aligned          1500               132.5               0.09
Unaligned        1500               134.9               0.09
Aligned          1501               133.1               0.09
Unaligned        1501               146.3               0.10
Aligned          9000               766.7               0.09
Unaligned        9000               802.2               0.09
Aligned          9001               767.6               0.09
Unaligned        9001               800.3               0.09
Aligned         65536              5404.8               0.08
Unaligned       65536              5596.3               0.09
Aligned         65537              5406.8               0.08
Unaligned       65537              5604.5               0.09

Is the improvement only affecting clang 18?
Other things I should check?

--
David Marchand
* Re: [PATCH v19 0/2] net: optimize __rte_raw_cksum
  2026-02-06 14:54 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum David Marchand
@ 2026-02-07  1:29 ` Scott Mitchell
  2026-02-10 11:53 ` Thomas Monjalon
  2026-02-16 14:04 ` David Marchand
  0 siblings, 2 replies; 39+ messages in thread
From: Scott Mitchell @ 2026-02-07 1:29 UTC (permalink / raw)
To: David Marchand; +Cc: dev, mb, stephen, bruce.richardson, Thomas Monjalon

Thanks for testing! I included my build/host config, results on the
main branch, and then with this patch applied below. What is your build
flags/configuration (e.g., cpu_instruction_set, march, optimization
level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and curious if
your config enables vectorization.

#### build / host config
User defined options
  b_lto              : false
  buildtype          : release
  c_args             : -fno-omit-frame-pointer -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
  cpu_instruction_set: cascadelake
  default_library    : static
  max_lcores         : 128
  optimization       : 3
$ clang --version
clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.4 (Plow)

#### main branch
$ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
### rte_raw_cksum() performance ###
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                10.0               0.50
Unaligned          20                10.1               0.50
Aligned            21                11.1               0.53
Unaligned          21                11.6               0.55
Aligned           100                39.4               0.39
Unaligned         100                67.3               0.67
Aligned           101                43.3               0.43
Unaligned         101                41.5               0.41
Aligned          1500               728.2               0.49
Unaligned        1500               805.8               0.54
Aligned          1501               768.8               0.51
Unaligned        1501               787.3               0.52
Test OK

#### with this patch
$ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
### rte_raw_cksum() performance ###
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                12.6               0.63
Unaligned          20                12.3               0.62
Aligned            21                13.6               0.65
Unaligned          21                13.6               0.65
Aligned           100                22.7               0.23
Unaligned         100                22.6               0.23
Aligned           101                47.4               0.47
Unaligned         101                23.9               0.24
Aligned          1500                73.9               0.05
Unaligned        1500                73.9               0.05
Aligned          1501                95.7               0.06
Unaligned        1501                73.9               0.05
Aligned          9000               459.8               0.05
Unaligned        9000               523.5               0.06
Aligned          9001               536.7               0.06
Unaligned        9001               507.5               0.06
Aligned         65536              3158.4               0.05
Unaligned       65536              3506.1               0.05
Aligned         65537              3277.6               0.05
Unaligned       65537              3697.6               0.06
Test OK
* Re: [PATCH v19 0/2] net: optimize __rte_raw_cksum
  2026-02-07  1:29 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum Scott Mitchell
@ 2026-02-10 11:53 ` Thomas Monjalon
  2026-02-16 14:04 ` David Marchand
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Monjalon @ 2026-02-10 11:53 UTC (permalink / raw)
To: Scott Mitchell; +Cc: David Marchand, dev, mb, stephen, bruce.richardson

Here are my test results:

buildtype       : debugoptimized
default_library : shared
-march=x86-64-v4 (Cascade Lake)
gcc 15.2.1
clang 21.1.6

GCC - BEFORE
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                20.5               1.02
Unaligned          20                14.1               0.70
Aligned            21                15.8               0.75
Unaligned          21                15.8               0.75
Aligned          1500               148.2               0.10
Unaligned        1500               148.3               0.10
Aligned          1501               148.4               0.10
Unaligned        1501               148.2               0.10

GCC - AFTER
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                20.8               1.04
Unaligned          20                15.6               0.78
Aligned            21                16.9               0.81
Unaligned          21                16.9               0.80
Aligned          1500               109.5               0.07
Unaligned        1500               111.6               0.07
Aligned          1501               111.1               0.07
Unaligned        1501               113.0               0.08
Aligned          9000               612.4               0.07
Unaligned        9000               612.6               0.07
Aligned          9001               581.5               0.06
Unaligned        9001               601.7               0.07

CLANG - BEFORE
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                14.2               0.71
Unaligned          20                 9.5               0.47
Aligned            21                11.7               0.56
Unaligned          21                11.8               0.56
Aligned          1500               610.7               0.41
Unaligned        1500               632.0               0.42
Aligned          1501               610.4               0.41
Unaligned        1501               627.6               0.42

CLANG - AFTER
Alignment  Block size    TSC cycles/block    TSC cycles/byte
Aligned            20                14.0               0.70
Unaligned          20                 9.1               0.45
Aligned            21                 9.7               0.46
Unaligned          21                 9.6               0.46
Aligned          1500                77.9               0.05
Unaligned        1500                79.4               0.05
Aligned          1501                79.4               0.05
Unaligned        1501                80.4               0.05
Aligned          9000               447.8               0.05
Unaligned        9000               492.1               0.05
Aligned          9001               448.5               0.05
Unaligned        9001               492.6               0.05

Before your patch: with small block size, clang is better than GCC;
with large block size, GCC is better than clang.
After your patch, clang is always better than GCC.

07/02/2026 02:29, Scott Mitchell:
> Thanks for testing! I included my build/host config, results on the
> main branch, and then with this patch applied below. What is your build
> flags/configuration (e.g., cpu_instruction_set, march, optimization
> level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
> vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and curious if
> your config enables vectorization.
>
> #### build / host config
> User defined options
>   b_lto              : false
>   buildtype          : release
>   c_args             : -fno-omit-frame-pointer -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
>   cpu_instruction_set: cascadelake
>   default_library    : static
>   max_lcores         : 128
>   optimization       : 3
> $ clang --version
> clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.4 (Plow)
>
> #### main branch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block    TSC cycles/byte
> Aligned            20                10.0               0.50
> Unaligned          20                10.1               0.50
> Aligned            21                11.1               0.53
> Unaligned          21                11.6               0.55
> Aligned           100                39.4               0.39
> Unaligned         100                67.3               0.67
> Aligned           101                43.3               0.43
> Unaligned         101                41.5               0.41
> Aligned          1500               728.2               0.49
> Unaligned        1500               805.8               0.54
> Aligned          1501               768.8               0.51
> Unaligned        1501               787.3               0.52
> Test OK
>
> #### with this patch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block    TSC cycles/byte
> Aligned            20                12.6               0.63
> Unaligned          20                12.3               0.62
> Aligned            21                13.6               0.65
> Unaligned          21                13.6               0.65
> Aligned           100                22.7               0.23
> Unaligned         100                22.6               0.23
> Aligned           101                47.4               0.47
> Unaligned         101                23.9               0.24
> Aligned          1500                73.9               0.05
> Unaligned        1500                73.9               0.05
> Aligned          1501                95.7               0.06
> Unaligned        1501                73.9               0.05
> Aligned          9000               459.8               0.05
> Unaligned        9000               523.5               0.06
> Aligned          9001               536.7               0.06
> Unaligned        9001               507.5               0.06
> Aligned         65536              3158.4               0.05
> Unaligned       65536              3506.1               0.05
> Aligned         65537              3277.6               0.05
> Unaligned       65537              3697.6               0.06
> Test OK
* Re: [PATCH v19 0/2] net: optimize __rte_raw_cksum
  2026-02-07  1:29 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum Scott Mitchell
  2026-02-10 11:53 ` Thomas Monjalon
@ 2026-02-16 14:04 ` David Marchand
  1 sibling, 0 replies; 39+ messages in thread
From: David Marchand @ 2026-02-16 14:04 UTC (permalink / raw)
To: Scott Mitchell; +Cc: dev, mb, stephen, bruce.richardson, Thomas Monjalon

On Sat, 7 Feb 2026 at 02:29, Scott Mitchell <scott.k.mitch1@gmail.com> wrote:
>
> Thanks for testing! I included my build/host config, results on the
> main branch, and then with this patch applied below. What is your build
> flags/configuration (e.g., cpu_instruction_set, march, optimization
> level, etc.)? I wasn't able to get any Clang version (18, 19, 20) to
> vectorize on Godbolt https://godbolt.org/z/8149r7sq8, and curious if
> your config enables vectorization.
>
> #### build / host config
> User defined options
>   b_lto              : false
>   buildtype          : release
>   c_args             : -fno-omit-frame-pointer -DPACKET_QDISC_BYPASS=1 -DRTE_MEMCPY_AVX512=1
>   cpu_instruction_set: cascadelake
>   default_library    : static
>   max_lcores         : 128
>   optimization       : 3
> $ clang --version
> clang version 18.1.8 (Red Hat, Inc. 18.1.8-3.el9)
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux release 9.4 (Plow)
>
> #### main branch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block    TSC cycles/byte
> Aligned            20                10.0               0.50
> Unaligned          20                10.1               0.50
> Aligned            21                11.1               0.53
> Unaligned          21                11.6               0.55
> Aligned           100                39.4               0.39
> Unaligned         100                67.3               0.67
> Aligned           101                43.3               0.43
> Unaligned         101                41.5               0.41
> Aligned          1500               728.2               0.49
> Unaligned        1500               805.8               0.54
> Aligned          1501               768.8               0.51
> Unaligned        1501               787.3               0.52
> Test OK
>
> #### with this patch
> $ echo "cksum_perf_autotest" | /usr/local/bin/dpdk-test
> ### rte_raw_cksum() performance ###
> Alignment  Block size    TSC cycles/block    TSC cycles/byte
> Aligned            20                12.6               0.63
> Unaligned          20                12.3               0.62
> Aligned            21                13.6               0.65
> Unaligned          21                13.6               0.65
> Aligned           100                22.7               0.23
> Unaligned         100                22.6               0.23
> Aligned           101                47.4               0.47
> Unaligned         101                23.9               0.24
> Aligned          1500                73.9               0.05
> Unaligned        1500                73.9               0.05
> Aligned          1501                95.7               0.06
> Unaligned        1501                73.9               0.05
> Aligned          9000               459.8               0.05
> Unaligned        9000               523.5               0.06
> Aligned          9001               536.7               0.06
> Unaligned        9001               507.5               0.06
> Aligned         65536              3158.4               0.05
> Unaligned       65536              3506.1               0.05
> Aligned         65537              3277.6               0.05
> Unaligned       65537              3697.6               0.06
> Test OK

I redid my bench from scratch and I do see an improvement for clang.

-Aligned          1500               905.3               0.60
-Unaligned        1500               924.9               0.62
-Aligned          1501               907.6               0.60
-Unaligned        1501               932.1               0.62
-Aligned          9000              5252.1               0.58
-Unaligned        9000              5433.0               0.60
-Aligned          9001              5260.9               0.58
-Unaligned        9001              5440.4               0.60
-Aligned         65536             38395.2               0.59
-Unaligned       65536             39639.5               0.60
-Aligned         65537             38030.3               0.58
-Unaligned       65537             39292.7               0.60
+Aligned          1500               104.0               0.07
+Unaligned        1500               106.5               0.07
+Aligned          1501               104.1               0.07
+Unaligned        1501               107.0               0.07
+Aligned          9000               596.7               0.07
+Unaligned        9000               655.1               0.07
+Aligned          9001               597.6               0.07
+Unaligned        9001               657.2               0.07
+Aligned         65536              4139.3               0.06
+Unaligned       65536              4583.2               0.07
+Aligned         65537              4139.9               0.06
+Unaligned       65537              4585.9               0.07

Something was most likely wrong in my test (and seeing how the gcc and
clang numbers looked so close... I may have been using the gcc binary...).

This is noticeable with clang, with no special cpu_instruction_set or any
kind of compiler optimisation level set.

I'll finish my checks and merge this nice improvement for rc1.

--
David Marchand
Thread overview: 39+ messages
2026-01-12 12:04 [PATCH v14 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
2026-01-12 12:04 ` [PATCH v14 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
2026-01-12 13:28   ` Morten Brørup
2026-01-12 15:00     ` Scott Mitchell
2026-01-12 12:04 ` [PATCH v14 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
2026-01-17 21:21 ` [PATCH v15 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
2026-01-17 21:21   ` [PATCH v15 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
2026-01-20 15:23     ` Morten Brørup
2026-01-23 14:34       ` Scott Mitchell
2026-01-17 21:21   ` [PATCH v15 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
2026-01-17 22:08   ` [PATCH v15 0/2] net: optimize __rte_raw_cksum Stephen Hemminger
2026-01-20 12:45     ` Morten Brørup
2026-01-23 15:43       ` Scott Mitchell
2026-01-23 16:02 ` [PATCH v16 " scott.k.mitch1
2026-01-23 16:02   ` [PATCH v16 1/2] eal: add __rte_may_alias to unaligned typedefs scott.k.mitch1
2026-01-23 16:02   ` [PATCH v16 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
2026-01-28 11:05     ` David Marchand
2026-01-28 17:39       ` Scott Mitchell
2026-01-24  8:23   ` [PATCH v16 0/2] net: optimize __rte_raw_cksum Morten Brørup
2026-01-28 18:05 ` [PATCH v17 " scott.k.mitch1
2026-01-28 18:05   ` [PATCH v17 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1
2026-01-28 18:05   ` [PATCH v17 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
2026-01-28 19:41 ` [PATCH v18 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
2026-01-28 19:41   ` [PATCH v18 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1
2026-01-29  8:28     ` Morten Brørup
2026-02-02  4:31       ` Scott Mitchell
2026-01-28 19:41   ` [PATCH v18 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
2026-01-29  8:31     ` Morten Brørup
2026-02-02  4:48 ` [PATCH v19 0/2] net: optimize __rte_raw_cksum scott.k.mitch1
2026-02-02  4:48   ` [PATCH v19 1/2] eal: add __rte_may_alias and __rte_aligned to unaligned typedefs scott.k.mitch1
2026-02-03  8:18     ` Morten Brørup
2026-02-16 14:29       ` David Marchand
2026-02-16 15:00         ` Morten Brørup
2026-02-02  4:48   ` [PATCH v19 2/2] net: __rte_raw_cksum pointers enable compiler optimizations scott.k.mitch1
2026-02-03  8:19     ` Morten Brørup
2026-02-06 14:54   ` [PATCH v19 0/2] net: optimize __rte_raw_cksum David Marchand
2026-02-07  1:29     ` Scott Mitchell
2026-02-10 11:53       ` Thomas Monjalon
2026-02-16 14:04       ` David Marchand