DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: dev@dpdk.org, Bruce Richardson <bruce.richardson@intel.com>,
	Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
	Vipin Varghese <vipin.varghese@amd.com>,
	Liangxing Wang <wangliangxing@hygon.cn>,
	Thiyagarajan P <Thiyagarajan.P@amd.com>,
	Bala Murali Krishna <Bala.MuraliKrishna@amd.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>,
	Vladimir Medvedkin <vladimir.medvedkin@intel.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>
Subject: Re: [PATCH v11] eal/x86: optimize memcpy of small sizes
Date: Thu, 21 May 2026 12:48:48 -0700	[thread overview]
Message-ID: <20260521124848.62b69a7e@phoenix.local> (raw)
In-Reply-To: <20260521185631.116046-1-mb@smartsharesystems.com>

On Thu, 21 May 2026 18:56:31 +0000
Morten Brørup <mb@smartsharesystems.com> wrote:

> The implementation for copying up to 64 bytes does not depend on address
> alignment with the size of the CPU's vector registers. Nonetheless, the
> exact same code for copying up to 64 bytes was present in both the aligned
> copy function and all the CPU vector register size specific variants of
> the unaligned copy functions.
> With this patch, the implementation for copying up to 64 bytes was
> consolidated into one instance, located in the common copy function,
> before checking alignment requirements.
> This provides three benefits:
> 1. No copy-paste in the source code.
> 2. A performance gain for copying up to 64 bytes, because the
> address alignment check is avoided in this case.
> 3. Reduced instruction memory footprint, because the compiler only
> generates one instance of the function for copying up to 64 bytes, instead
> of two instances (one in the unaligned copy function, and one in the
> aligned copy function).
> 
> Furthermore, __rte_restrict was added to source and destination addresses.
> 
> Also, the missing implementation of rte_mov48() was added.
> 
> Until recently, some drivers required disabling stringop-overflow warnings
> when using rte_memcpy().
> For some strange reason, these warnings were disabled in the rte_memcpy
> header file, instead of in the problematic drivers.
> With series-38174 ("remove use of rte_memcpy from net/intel"), the
> problematic drivers were updated to use memcpy() instead of rte_memcpy(),
> so disabling these warnings is no longer required, and was removed.
> 
> Regarding performance...
> The memcpy performance test (cache-to-cache copy) shows:
> Copying up to 15 bytes takes ca. 4.5 cycles, versus ca. 6.5 cycles before.
> Copying 8 bytes takes 4 cycles, versus 7 cycles before.
> Copying 16 bytes takes 2 cycles, versus 4 cycles before.
> Copying 64 bytes takes 4 cycles, versus 7 cycles before.
> 
> Depends-on: series-38174 ("remove use of rte_memcpy from net/intel")
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> ---

Used this as test case for comparing AI reviews.

Anthropic saw:

## Warnings

### 1. Missing rte_mov48() test coverage

**Issue:** The patch adds a new `rte_mov48()` function but does not mention adding tests for it.

**Why it matters:** New utility functions should have test coverage to verify correctness, especially for low-level copy operations that could silently corrupt data if implemented incorrectly.

**Suggested fix:** Add a test case in `app/test/test_memcpy.c` (or equivalent) that verifies `rte_mov48()` copies 48 bytes correctly and preserves data beyond the copy range.

### 2. Documentation comment inconsistency

**Issue:** The main `rte_memcpy()` function description comment was removed (lines 37-39 of original), but the opening documentation text about overlapping locations remains inconsistent across helper functions.

The original stated:
```c
/**
 * Copy bytes from one location to another. The locations must not overlap.
 *
 * @note This is implemented as a macro, so it's address should not be taken
 * and care is needed as parameter expressions may be evaluated multiple times.
```

The new version removes the `@note` about macro implementation (which is correct since it's now a function), but does not update the main function comment to include the same `@param` / `@return` documentation that other memory functions have.

**Suggested fix:** Add complete Doxygen documentation to `rte_memcpy()`:
```c
/**
 * Copy bytes from one location to another. The locations must not overlap.
 *
 * @param dst
 *   Pointer to the destination of the data.
 * @param src
 *   Pointer to the source data.
 * @param n
 *   Number of bytes to copy.
 * @return
 *   Pointer to the destination data.
 */
```

xAI (Grok) saw:
**Error**

- `rte_mov48()` implementation casts away `__rte_restrict` qualifiers (lines 137-140 and 143-147).  
  Casting a restrict-qualified pointer to a non-restrict pointer is undefined behavior under the C standard and can break compiler alias analysis.  
  Fix: either make the callees (`rte_mov32`/`rte_mov16`) accept the restrict qualifiers, or use a local non-restrict pointer only inside `rte_mov48` after proper validation.

**Error**

- Multiple functions (`rte_mov16`, `rte_mov32`, `rte_mov48`, `rte_mov64`, `rte_mov128`, `rte_mov256`, `rte_mov128blocks`, `rte_mov512blocks`, `rte_memcpy_generic_more_than_64`, etc.) now take `uint8_t *__rte_restrict` parameters, but their callers inside the same file (e.g., the new `rte_memcpy` implementation and the various block-copy macros) pass `uint8_t *` without the restrict qualifier.  
  This creates implicit non-restrict paths that defeat the purpose of adding `__rte_restrict` and may trigger compiler warnings or incorrect optimizations.  
  Recommended fix: either propagate `__rte_restrict` consistently through all internal call sites or keep internal helpers non-restrict and only expose restrict at the public `rte_memcpy` boundary.

**Warning**

- `rte_memcpy_generic_more_than_64` and `rte_memcpy_aligned_more_than_64` are now the only implementations for >64-byte copies, but the old `rte_memcpy_generic` and `rte_memcpy_aligned` names have been removed.  
  Any external code (or test) that took the address of the old functions will break.  
  The patch should either keep the old names as aliases or document the removal in release notes.

**Warning**

- The new `rte_memcpy` implementation for 17–64 bytes contains a compile-time constant check (`if (__rte_constant(n) && n == 64)`) followed by a fallback path that still executes `rte_mov16` three times for SSE when `n > 48`.  
  This path is correct but duplicates logic that already exists in `rte_mov48`.  
  Consider calling `rte_mov48` for the AVX case and a single `rte_mov48` + tail for SSE to reduce code duplication.

**Info**

- The removal of the `#pragma GCC diagnostic` block for `-Wstringop-overflow` is correct per the patch description (drivers were updated). No further action needed.

**Info**

- All other style, API-tag, and documentation requirements appear satisfied. No additional correctness bugs (leaks, use-after-free, races, or forbidden tokens) were identified.




  reply	other threads:[~2026-05-21 19:48 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-20 11:45 [PATCH] eal/x86: reduce memcpy code duplication Morten Brørup
2025-11-21 10:35 ` [PATCH v2] eal/x86: optimize memcpy of small sizes Morten Brørup
2025-11-21 16:57   ` Stephen Hemminger
2025-11-21 17:02     ` Bruce Richardson
2025-11-21 17:11       ` Stephen Hemminger
2025-11-21 21:36         ` Morten Brørup
2025-11-21 10:40 ` Morten Brørup
2025-11-21 10:40 ` [PATCH v3] " Morten Brørup
2025-11-24 13:36   ` Morten Brørup
2025-11-24 15:46     ` Patrick Robb
2025-11-28 14:02   ` Konstantin Ananyev
2025-11-28 15:55     ` Morten Brørup
2025-11-28 18:10       ` Konstantin Ananyev
2025-11-29  2:17         ` Morten Brørup
2025-12-01  9:35           ` Konstantin Ananyev
2025-12-01 10:41             ` Morten Brørup
2025-11-24 20:31 ` [PATCH v4] " Morten Brørup
2025-11-25  8:19   ` Morten Brørup
2025-12-01 15:55 ` [PATCH v5] " Morten Brørup
2025-12-03 13:29   ` Morten Brørup
2026-01-03 17:53   ` Morten Brørup
2026-01-09 15:05     ` Varghese, Vipin
2026-01-11 15:52     ` Konstantin Ananyev
2026-01-11 16:01       ` Stephen Hemminger
2026-01-12  8:02       ` Morten Brørup
2026-01-12 16:00         ` Scott Mitchell
2026-01-13  0:39           ` Stephen Hemminger
2026-01-12 12:03 ` [PATCH v6] " Morten Brørup
2026-01-13 23:19   ` Stephen Hemminger
2026-01-20 11:00     ` Varghese, Vipin
2026-01-20 11:19       ` Varghese, Vipin
2026-01-20 11:22         ` Morten Brørup
2026-01-21 11:48           ` Varghese, Vipin
2026-01-22  6:59             ` Varghese, Vipin
2026-01-22  7:28               ` Liangxing Wang
2026-01-23  6:58               ` Varghese, Vipin
2026-02-20 11:08 ` [PATCH v7] " Morten Brørup
2026-03-11  7:28   ` Morten Brørup
2026-03-11 16:58   ` Bruce Richardson
2026-03-11 18:29     ` Morten Brørup
2026-03-11 19:09       ` Bruce Richardson
2026-03-12  8:33   ` Konstantin Ananyev
2026-03-19 15:55   ` Morten Brørup
2026-04-29  9:36 ` [PATCH v8] " Morten Brørup
2026-04-29 10:35 ` [PATCH v9] " Morten Brørup
2026-04-29 11:24   ` Morten Brørup
2026-05-08  6:32   ` Morten Brørup
2026-05-21 10:54   ` [TEST PATCH " Morten Brørup
2026-05-08  9:58 ` [PATCH v10] " Morten Brørup
2026-05-21 18:56 ` [PATCH v11] " Morten Brørup
2026-05-21 19:48   ` Stephen Hemminger [this message]
2026-05-21 22:42   ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260521124848.62b69a7e@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=Bala.MuraliKrishna@amd.com \
    --cc=Thiyagarajan.P@amd.com \
    --cc=anatoly.burakov@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@huawei.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=mb@smartsharesystems.com \
    --cc=vipin.varghese@amd.com \
    --cc=vladimir.medvedkin@intel.com \
    --cc=wangliangxing@hygon.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox