public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
* [PATCH] mempool: simplify get objects
@ 2026-01-20  8:20 Morten Brørup
  2026-01-20  8:57 ` Morten Brørup
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Morten Brørup @ 2026-01-20  8:20 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: stable, Morten Brørup

Removed explicit test for build time constant request size,
and added comment that the compiler loop unrolls when request size is
build time constant, to improve source code readability.

Also, when putting objects, the compiler does not know if calling
rte_mempool_ops_enqueue_bulk() modifies cache->len, so load it before the
call, so it doesn't have to be loaded again after the call.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.h | 38 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 28 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index aedc100964..61b415e336 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1410,8 +1410,9 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 		 * Flush the cache to make room for the objects.
 		 */
 		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		const uint32_t len = cache->len;
 		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
 	} else {
 		/* The request itself is too big for the cache. */
 		goto driver_enqueue_stats_incremented;
@@ -1531,11 +1532,11 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	cache_objs = &cache->objs[cache->len];
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
-	if (__rte_constant(n) && n <= cache->len) {
+	if (likely(n <= cache->len)) {
 		/*
-		 * The request size is known at build time, and
-		 * the entire request can be satisfied from the cache,
-		 * so let the compiler unroll the fixed length copy loop.
+		 * The entire request can be satisfied from the cache.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
@@ -1547,31 +1548,13 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 		return 0;
 	}
 
-	/*
-	 * Use the cache as much as we have to return hot objects first.
-	 * If the request size 'n' is known at build time, the above comparison
-	 * ensures that n > cache->len here, so omit RTE_MIN().
-	 */
-	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
-	cache->len -= len;
+	/* Use the cache as much as we have to return hot objects first. */
+	len = cache->len;
 	remaining = n - len;
+	cache->len = 0;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	/*
-	 * If the request size 'n' is known at build time, the case
-	 * where the entire request can be satisfied from the cache
-	 * has already been handled above, so omit handling it here.
-	 */
-	if (!__rte_constant(n) && likely(remaining == 0)) {
-		/* The entire request is satisfied from the cache. */
-
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-		return 0;
-	}
-
 	/* Dequeue below would overflow mem allocated for cache? */
 	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto driver_dequeue;
@@ -1592,11 +1575,10 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
 	for (index = 0; index < remaining; index++)
 		*obj_table++ = *--cache_objs;
 
-	cache->len = cache->size;
-
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH] mempool: simplify get objects
  2026-01-20  8:20 [PATCH] mempool: simplify get objects Morten Brørup
@ 2026-01-20  8:57 ` Morten Brørup
  2026-01-20 10:17 ` [PATCH v2] " Morten Brørup
  2026-02-16  9:27 ` [PATCH v3] " Morten Brørup
  2 siblings, 0 replies; 9+ messages in thread
From: Morten Brørup @ 2026-01-20  8:57 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: stable

PLEASE IGNORE.
The patch description is rubbish.
Will post a v2.

> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Tuesday, 20 January 2026 09.21
> 
> Removed explicit test for build time constant request size,
> and added comment that the compiler loop unrolls when request size is
> build time constant, to improve source code readability.
> 
> Also, when putting objects, the compiler does not know if calling
> rte_mempool_ops_enqueue_bulk() modifies cache->len, so load it before
> the
> call, so it doesn't have to be loaded again after the call.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/rte_mempool.h | 38 ++++++++++----------------------------
>  1 file changed, 10 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index aedc100964..61b415e336 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1410,8 +1410,9 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
>  		 * Flush the cache to make room for the objects.
>  		 */
>  		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> +		const uint32_t len = cache->len;
>  		cache->len = n;
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
>  	} else {
>  		/* The request itself is too big for the cache. */
>  		goto driver_enqueue_stats_incremented;
> @@ -1531,11 +1532,11 @@ rte_mempool_do_generic_get(struct rte_mempool
> *mp, void **obj_table,
>  	cache_objs = &cache->objs[cache->len];
> 
>  	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> -	if (__rte_constant(n) && n <= cache->len) {
> +	if (likely(n <= cache->len)) {
>  		/*
> -		 * The request size is known at build time, and
> -		 * the entire request can be satisfied from the cache,
> -		 * so let the compiler unroll the fixed length copy loop.
> +		 * The entire request can be satisfied from the cache.
> +		 * If the request size is known at build time,
> +		 * the compiler unrolls the fixed length copy loop.
>  		 */
>  		cache->len -= n;
>  		for (index = 0; index < n; index++)
> @@ -1547,31 +1548,13 @@ rte_mempool_do_generic_get(struct rte_mempool
> *mp, void **obj_table,
>  		return 0;
>  	}
> 
> -	/*
> -	 * Use the cache as much as we have to return hot objects first.
> -	 * If the request size 'n' is known at build time, the above
> comparison
> -	 * ensures that n > cache->len here, so omit RTE_MIN().
> -	 */
> -	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
> -	cache->len -= len;
> +	/* Use the cache as much as we have to return hot objects first.
> */
> +	len = cache->len;
>  	remaining = n - len;
> +	cache->len = 0;
>  	for (index = 0; index < len; index++)
>  		*obj_table++ = *--cache_objs;
> 
> -	/*
> -	 * If the request size 'n' is known at build time, the case
> -	 * where the entire request can be satisfied from the cache
> -	 * has already been handled above, so omit handling it here.
> -	 */
> -	if (!__rte_constant(n) && likely(remaining == 0)) {
> -		/* The entire request is satisfied from the cache. */
> -
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -
> -		return 0;
> -	}
> -
>  	/* Dequeue below would overflow mem allocated for cache? */
>  	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
>  		goto driver_dequeue;
> @@ -1592,11 +1575,10 @@ rte_mempool_do_generic_get(struct rte_mempool
> *mp, void **obj_table,
>  	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
>  	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
>  	cache_objs = &cache->objs[cache->size + remaining];
> +	cache->len = cache->size;
>  	for (index = 0; index < remaining; index++)
>  		*obj_table++ = *--cache_objs;
> 
> -	cache->len = cache->size;
> -
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> 
> --
> 2.43.0



* [PATCH v2] mempool: simplify get objects
  2026-01-20  8:20 [PATCH] mempool: simplify get objects Morten Brørup
  2026-01-20  8:57 ` Morten Brørup
@ 2026-01-20 10:17 ` Morten Brørup
  2026-01-20 20:00   ` Stephen Hemminger
  2026-02-03 10:03   ` Morten Brørup
  2026-02-16  9:27 ` [PATCH v3] " Morten Brørup
  2 siblings, 2 replies; 9+ messages in thread
From: Morten Brørup @ 2026-01-20 10:17 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Removed explicit test for build time constant request size,
and added comment that the compiler loop unrolls when request size is
build time constant, to improve source code readability.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v2:
* Removed unrelated microoptimization from rte_mempool_do_generic_put(),
  which was also described incorrectly.
---
 lib/mempool/rte_mempool.h | 35 ++++++++---------------------------
 1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index aedc100964..4213784e14 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1531,11 +1531,11 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	cache_objs = &cache->objs[cache->len];
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
-	if (__rte_constant(n) && n <= cache->len) {
+	if (likely(n <= cache->len)) {
 		/*
-		 * The request size is known at build time, and
-		 * the entire request can be satisfied from the cache,
-		 * so let the compiler unroll the fixed length copy loop.
+		 * The entire request can be satisfied from the cache.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
@@ -1547,31 +1547,13 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 		return 0;
 	}
 
-	/*
-	 * Use the cache as much as we have to return hot objects first.
-	 * If the request size 'n' is known at build time, the above comparison
-	 * ensures that n > cache->len here, so omit RTE_MIN().
-	 */
-	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
-	cache->len -= len;
+	/* Use the cache as much as we have to return hot objects first. */
+	len = cache->len;
 	remaining = n - len;
+	cache->len = 0;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	/*
-	 * If the request size 'n' is known at build time, the case
-	 * where the entire request can be satisfied from the cache
-	 * has already been handled above, so omit handling it here.
-	 */
-	if (!__rte_constant(n) && likely(remaining == 0)) {
-		/* The entire request is satisfied from the cache. */
-
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-		return 0;
-	}
-
 	/* Dequeue below would overflow mem allocated for cache? */
 	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto driver_dequeue;
@@ -1592,11 +1574,10 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
 	for (index = 0; index < remaining; index++)
 		*obj_table++ = *--cache_objs;
 
-	cache->len = cache->size;
-
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] mempool: simplify get objects
  2026-01-20 10:17 ` [PATCH v2] " Morten Brørup
@ 2026-01-20 20:00   ` Stephen Hemminger
  2026-01-21 11:17     ` Morten Brørup
  2026-02-03 10:03   ` Morten Brørup
  1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2026-01-20 20:00 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Andrew Rybchenko, dev

On Tue, 20 Jan 2026 10:17:01 +0000
Morten Brørup <mb@smartsharesystems.com> wrote:

> Removed explicit test for build time constant request size,
> and added comment that the compiler loop unrolls when request size is
> build time constant, to improve source code readability.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

Not sure I agree with what the AI review shows, but here it is:


## Commit Message Review

**✓ Good:**
- Subject line: 29 characters (well under 60 limit)
- Proper component prefix: `mempool:`
- Lowercase, imperative mood, no trailing period
- Body properly wrapped under 75 characters
- Has `Signed-off-by:` with real name and email
- Version changelog correctly placed after `---`

**⚠ Warnings:**
- No mention of testing or performance validation
- Missing justification for why this simplification is beneficial

## Code Style Review

**✓ Good:**
- Comment formatting follows DPDK conventions
- Lines under 100 characters
- No trailing whitespace visible
- Consistent indentation

**⚠ Code Logic Concerns:**

1. **Removed optimization paths**: The patch removes `__rte_constant(n)` checks that allowed the compiler to optimize for build-time constant request sizes. While the comment claims "the compiler unrolls the fixed length copy loop," removing the explicit constant checks may reduce optimization effectiveness.

2. **Removed early return**: The old code had an optimization for the case where `remaining == 0` (entire request satisfied from partial cache use):
   ```c
   if (!__rte_constant(n) && likely(remaining == 0)) {
       /* stats and return */
   }
   ```
   This early return is now removed, which could slightly impact performance for this common case.

3. **Unnecessary code movement**: Line 127 moves `cache->len = cache->size;` before the loop when it was previously after. Since the loop doesn't use `cache->len`, this change appears cosmetic but reduces code clarity - it's less obvious why the assignment happens before consuming the objects.

## Missing Elements

**⚠ Should address:**
- No performance testing mentioned for a patch that removes optimizations
- No explanation of why simplification is worth potential performance impact
- Consider benchmarking showing no regression

## Recommendations

1. **Enhance commit message** with performance testing results to justify the simplification
2. **Consider keeping** the early return at line 111-118 for the `remaining == 0` case as it's a worthwhile optimization
3. **Reconsider** moving `cache->len = cache->size;` - leave it in the original position for clarity
4. **Add rationale** explaining why removing `__rte_constant(n)` checks doesn't harm performance

The patch compiles and maintains functional correctness, but the performance implications of removing compiler optimization hints should be validated and documented.


* RE: [PATCH v2] mempool: simplify get objects
  2026-01-20 20:00   ` Stephen Hemminger
@ 2026-01-21 11:17     ` Morten Brørup
  0 siblings, 0 replies; 9+ messages in thread
From: Morten Brørup @ 2026-01-21 11:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Rybchenko, dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 20 January 2026 21.00
> 
> On Tue, 20 Jan 2026 10:17:01 +0000
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > Removed explicit test for build time constant request size,
> > and added comment that the compiler loop unrolls when request size is
> > build time constant, to improve source code readability.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> 
> Not sure I agree with what AI review shows, but here it is:

In this case, I consider the AI review very interesting; it tells a lot about the AI.
Thank you, Stephen!

> 
> 
> ## Commit Message Review
> 
> **✓ Good:**
> - Subject line: 29 characters (well under 60 limit)
> - Proper component prefix: `mempool:`
> - Lowercase, imperative mood, no trailing period
> - Body properly wrapped under 75 characters
> - Has `Signed-off-by:` with real name and email
> - Version changelog correctly placed after `---`
> 
> **⚠ Warnings:**
> - No mention of testing or performance validation
> - Missing justification for why this simplification is beneficial

My description has "to improve source code readability", which I consider the benefit.

> 
> ## Code Style Review
> 
> **✓ Good:**
> - Comment formatting follows DPDK conventions
> - Lines under 100 characters
> - No trailing whitespace visible
> - Consistent indentation
> 
> **⚠ Code Logic Concerns:**
> 
> 1. **Removed optimization paths**: The patch removes
> `__rte_constant(n)` checks that allowed the compiler to optimize for
> build-time constant request sizes. While the comment claims "the
> compiler unrolls the fixed length copy loop," removing the explicit
> constant checks may reduce optimization effectiveness.
> 
> 2. **Removed early return**: The old code had an optimization for the
> case where `remaining == 0` (entire request satisfied from partial
> cache use):
>    ```c
>    if (!__rte_constant(n) && likely(remaining == 0)) {
>        /* stats and return */
>    }
>    ```
>    This early return is now removed, which could slightly impact
> performance for this common case.

For 1 and 2, it seems the AI is not clever enough to realize that the resulting code is exactly the same with or without these changes.
So the AI worries if there might be a performance impact, and mentions it here.

And since the AI thinks there is a potential performance impact, it correctly added the warning bullet about "No mention of testing or performance validation".
I guess that is also why the AI asks for justification (as a separate warning bullet).

> 
> 3. **Unnecessary code movement**: Line 127 moves `cache->len = cache-
> >size;` before the loop when it was previously after. Since the loop
> doesn't use `cache->len`, this change appears cosmetic but reduces code
> clarity - it's less obvious why the assignment happens before consuming
> the objects.

It is cosmetic; for code similarity, cache->len is now set before each copy loop.
I would consider it an improvement to make three instances of code look similar.

On the other hand, we generally don't like unnecessary code changes. And this could look like one.

But it's not only cosmetic.
It's also a micro-optimization, which I omitted from the description (to reduce noise):
The function's pointer parameters are not marked "restrict", so writing to "obj_table" in the copy loop might formally modify cache->size. And thus, setting cache->len = cache->size after the copy loop requires loading cache->size again after copying the objects.
Moving this line up before the copy loop avoids that extra load of cache->size when setting cache->len.

The AI is not clever enough to realize this, which probably also means it is not clever enough to suggest such optimizations.

> 
> ## Missing Elements
> 
> **⚠ Should address:**
> - No performance testing mentioned for a patch that removes
> optimizations
> - No explanation of why simplification is worth potential performance
> impact
> - Consider benchmarking showing no regression
> 
> ## Recommendations
> 
> 1. **Enhance commit message** with performance testing results to
> justify the simplification
> 2. **Consider keeping** the early return at line 111-118 for the
> `remaining == 0` case as it's a worthwhile optimization
> 3. **Reconsider** moving `cache->len = cache->size;` - leave it in the
> original position for clarity
> 4. **Add rationale** explaining why removing `__rte_constant(n)` checks
> doesn't harm performance
> 
> The patch compiles and maintains functional correctness, but the
> performance implications of removing compiler optimization hints should
> be validated and documented.


* RE: [PATCH v2] mempool: simplify get objects
  2026-01-20 10:17 ` [PATCH v2] " Morten Brørup
  2026-01-20 20:00   ` Stephen Hemminger
@ 2026-02-03 10:03   ` Morten Brørup
  1 sibling, 0 replies; 9+ messages in thread
From: Morten Brørup @ 2026-02-03 10:03 UTC (permalink / raw)
  To: Andrew Rybchenko, dev

PING for review.

Here's some elaboration for reviewers...

Clearly, when the request can be served from the cache (n <= cache->len), the patch is correct, regardless of whether n is constant or variable:

	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
	if (likely(n <= cache->len)) {
 		/*
		 * The entire request can be satisfied from the cache.
		 * If the request size is known at build time,
		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
			*obj_table++ = *--cache_objs;

		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);

		return 0;
	}


Now, let's see what happens when the request cannot be served from the cache,
i.e. when n > cache->len:

	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
	if (__rte_constant(n) && n <= cache->len) {
// FALSE, because n > cache->len
// Regardless of whether n is constant or variable
//		/*
//		 * The request size is known at build time, and
//		 * the entire request can be satisfied from the cache,
//		 * so let the compiler unroll the fixed length copy loop.
//		 */
//		cache->len -= n;
//		for (index = 0; index < n; index++)
//			*obj_table++ = *--cache_objs;
//
//		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
//		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
//
//		return 0;
//	}

	/*
	 * Use the cache as much as we have to return hot objects first.
	 * If the request size 'n' is known at build time, the above comparison
	 * ensures that n > cache->len here, so omit RTE_MIN().
	 */
	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
// ALWAYS: len = cache->len
// When n is constant:
//	len = cache->len
// When n is variable:
//	len = RTE_MIN(n, cache->len)
//		= cache->len, because n > cache->len
	cache->len -= len;
// ALWAYS: cache->len = 0, because len == cache->len
	remaining = n - len;
	for (index = 0; index < len; index++)
		*obj_table++ = *--cache_objs;

	/*
	 * If the request size 'n' is known at build time, the case
	 * where the entire request can be satisfied from the cache
	 * has already been handled above, so omit handling it here.
	 */
	if (!__rte_constant(n) && likely(remaining == 0)) {
// FALSE, because remaining > 0
//		/* The entire request is satisfied from the cache. */
//
//		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
//		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
//
//		return 0;
//	}

	/* Dequeue below would overflow mem allocated for cache? */
	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
		goto driver_dequeue;

Venlig hilsen / Kind regards,
-Morten Brørup


> -----Original Message-----
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Tuesday, 20 January 2026 11.17
> To: Andrew Rybchenko; dev@dpdk.org
> Cc: Morten Brørup
> Subject: [PATCH v2] mempool: simplify get objects
> 
> Removed explicit test for build time constant request size,
> and added comment that the compiler loop unrolls when request size is
> build time constant, to improve source code readability.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v2:
> * Removed unrelated microoptimization from
> rte_mempool_do_generic_put(),
>   which was also described incorrectly.
> ---
>  lib/mempool/rte_mempool.h | 35 ++++++++---------------------------
>  1 file changed, 8 insertions(+), 27 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index aedc100964..4213784e14 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1531,11 +1531,11 @@ rte_mempool_do_generic_get(struct rte_mempool
> *mp, void **obj_table,
>  	cache_objs = &cache->objs[cache->len];
> 
>  	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> -	if (__rte_constant(n) && n <= cache->len) {
> +	if (likely(n <= cache->len)) {
>  		/*
> -		 * The request size is known at build time, and
> -		 * the entire request can be satisfied from the cache,
> -		 * so let the compiler unroll the fixed length copy loop.
> +		 * The entire request can be satisfied from the cache.
> +		 * If the request size is known at build time,
> +		 * the compiler unrolls the fixed length copy loop.
>  		 */
>  		cache->len -= n;
>  		for (index = 0; index < n; index++)
> @@ -1547,31 +1547,13 @@ rte_mempool_do_generic_get(struct rte_mempool
> *mp, void **obj_table,
>  		return 0;
>  	}
> 
> -	/*
> -	 * Use the cache as much as we have to return hot objects first.
> -	 * If the request size 'n' is known at build time, the above
> comparison
> -	 * ensures that n > cache->len here, so omit RTE_MIN().
> -	 */
> -	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
> -	cache->len -= len;
> +	/* Use the cache as much as we have to return hot objects first.
> */
> +	len = cache->len;
>  	remaining = n - len;
> +	cache->len = 0;
>  	for (index = 0; index < len; index++)
>  		*obj_table++ = *--cache_objs;
> 
> -	/*
> -	 * If the request size 'n' is known at build time, the case
> -	 * where the entire request can be satisfied from the cache
> -	 * has already been handled above, so omit handling it here.
> -	 */
> -	if (!__rte_constant(n) && likely(remaining == 0)) {
> -		/* The entire request is satisfied from the cache. */
> -
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -
> -		return 0;
> -	}
> -
>  	/* Dequeue below would overflow mem allocated for cache? */
>  	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
>  		goto driver_dequeue;
> @@ -1592,11 +1574,10 @@ rte_mempool_do_generic_get(struct rte_mempool
> *mp, void **obj_table,
>  	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
>  	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
>  	cache_objs = &cache->objs[cache->size + remaining];
> +	cache->len = cache->size;
>  	for (index = 0; index < remaining; index++)
>  		*obj_table++ = *--cache_objs;
> 
> -	cache->len = cache->size;
> -
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
>  	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> 
> --
> 2.43.0



* [PATCH v3] mempool: simplify get objects
  2026-01-20  8:20 [PATCH] mempool: simplify get objects Morten Brørup
  2026-01-20  8:57 ` Morten Brørup
  2026-01-20 10:17 ` [PATCH v2] " Morten Brørup
@ 2026-02-16  9:27 ` Morten Brørup
  2026-02-17  6:53   ` Andrew Rybchenko
  2 siblings, 1 reply; 9+ messages in thread
From: Morten Brørup @ 2026-02-16  9:27 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Removed explicit test for build time constant request size,
and added comment that the compiler loop unrolls when request size is
build time constant, to improve source code readability.

Moved setting cache->len up before the copy loop; not only for code
similarity (cache->len is now set before each copy loop), but also as an
optimization:
The function's pointer parameters are not marked restrict, so writing to
obj_table in the copy loop might formally modify cache->size. And thus,
setting cache->len = cache->size after the copy loop requires loading
cache->size again after copying the objects.
Moving this line up before the copy loop avoids that extra load of
cache->size when setting cache->len.

Similarly, moved statistics update up before the copy loops.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v3:
* Added to description why setting cache->len was moved up before the copy
  loop.
* Moved statistics update up before the copy loop.
v2:
* Removed unrelated microoptimization from rte_mempool_do_generic_put(),
  which was also described incorrectly.
---
 lib/mempool/rte_mempool.h | 47 ++++++++++++---------------------------
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index aedc100964..7989d7a475 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1531,47 +1531,29 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	cache_objs = &cache->objs[cache->len];
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
-	if (__rte_constant(n) && n <= cache->len) {
+	if (likely(n <= cache->len)) {
+		/* The entire request can be satisfied from the cache. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 		/*
-		 * The request size is known at build time, and
-		 * the entire request can be satisfied from the cache,
-		 * so let the compiler unroll the fixed length copy loop.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
 			*obj_table++ = *--cache_objs;
 
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 		return 0;
 	}
 
-	/*
-	 * Use the cache as much as we have to return hot objects first.
-	 * If the request size 'n' is known at build time, the above comparison
-	 * ensures that n > cache->len here, so omit RTE_MIN().
-	 */
-	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
-	cache->len -= len;
+	/* Use the cache as much as we have to return hot objects first. */
+	len = cache->len;
 	remaining = n - len;
+	cache->len = 0;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	/*
-	 * If the request size 'n' is known at build time, the case
-	 * where the entire request can be satisfied from the cache
-	 * has already been handled above, so omit handling it here.
-	 */
-	if (!__rte_constant(n) && likely(remaining == 0)) {
-		/* The entire request is satisfied from the cache. */
-
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-		return 0;
-	}
-
 	/* Dequeue below would overflow mem allocated for cache? */
 	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto driver_dequeue;
@@ -1589,17 +1571,16 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	}
 
 	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
 	for (index = 0; index < remaining; index++)
 		*obj_table++ = *--cache_objs;
 
-	cache->len = cache->size;
-
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 	return 0;
 
 driver_dequeue:
-- 
2.43.0



* Re: [PATCH v3] mempool: simplify get objects
  2026-02-16  9:27 ` [PATCH v3] " Morten Brørup
@ 2026-02-17  6:53   ` Andrew Rybchenko
  2026-03-17  8:51     ` Thomas Monjalon
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Rybchenko @ 2026-02-17  6:53 UTC (permalink / raw)
  To: Morten Brørup, dev

On 2/16/26 12:27 PM, Morten Brørup wrote:
> Removed explicit test for build time constant request size,
> and added comment that the compiler loop unrolls when request size is
> build time constant, to improve source code readability.
> 
> Moved setting cache->len up before the copy loop; not only for code
> similarity (cache->len is now set before each copy loop), but also as an
> optimization:
> The function's pointer parameters are not marked restrict, so writing to
> obj_table in the copy loop might formally modify cache->size. And thus,
> setting cache->len = cache->size after the copy loop requires loading
> cache->size again after copying the objects.
> Moving this line up before the copy loop avoids that extra load of
> cache->size when setting cache->len.
> 
> Similarly, moved statistics update up before the copy loops.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>



* Re: [PATCH v3] mempool: simplify get objects
  2026-02-17  6:53   ` Andrew Rybchenko
@ 2026-03-17  8:51     ` Thomas Monjalon
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Monjalon @ 2026-03-17  8:51 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, Andrew Rybchenko

17/02/2026 07:53, Andrew Rybchenko:
> On 2/16/26 12:27 PM, Morten Brørup wrote:
> > Removed explicit test for build time constant request size,
> > and added comment that the compiler loop unrolls when request size is
> > build time constant, to improve source code readability.
> > 
> > Moved setting cache->len up before the copy loop; not only for code
> > similarity (cache->len is now set before each copy loop), but also as an
> > optimization:
> > The function's pointer parameters are not marked restrict, so writing to
> > obj_table in the copy loop might formally modify cache->size. And thus,
> > setting cache->len = cache->size after the copy loop requires loading
> > cache->size again after copying the objects.
> > Moving this line up before the copy loop avoids that extra load of
> > cache->size when setting cache->len.
> > 
> > Similarly, moved statistics update up before the copy loops.
> > 
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> 
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

Applied, thanks.




