public inbox for dev@dpdk.org
* [RFC PATCH 0/2] mempool: de-inline get/put objects unlikely code
@ 2026-02-16 11:58 Morten Brørup
  2026-02-16 11:58 ` [RFC PATCH 1/2] mempool: simplify get objects Morten Brørup
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 11:58 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Reduced compiled code footprint by de-inlining unlikely
code paths.

Morten Brørup (2):
  mempool: simplify get objects
  mempool: de-inline get/put objects unlikely code paths

 lib/mempool/rte_mempool.c | 153 +++++++++++++++++++++++++
 lib/mempool/rte_mempool.h | 229 +++++++++++++++++---------------------
 2 files changed, 258 insertions(+), 124 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH 1/2] mempool: simplify get objects
  2026-02-16 11:58 [RFC PATCH 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
@ 2026-02-16 11:58 ` Morten Brørup
  2026-02-16 11:58 ` [RFC PATCH 2/2] mempool: de-inline get/put objects unlikely code paths Morten Brørup
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 11:58 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Removed the explicit test for a build-time constant request size, and
added a comment noting that the compiler unrolls the copy loop when the
request size is a build-time constant, to improve source code
readability.

Moved the setting of cache->len up before the copy loop; not only for
code similarity (cache->len is now set before each copy loop), but also
as an optimization:
The function's pointer parameters are not marked 'restrict', so writing
to obj_table in the copy loop might formally modify cache->size. Thus,
setting cache->len = cache->size after the copy loop would require
loading cache->size again after copying the objects.
Moving this line up before the copy loop avoids that extra load of
cache->size when setting cache->len.

Similarly, moved statistics update up before the copy loops.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v3:
* Added to description why setting cache->len was moved up before the copy
  loop.
* Moved statistics update up before the copy loop.
v2:
* Removed an unrelated micro-optimization from rte_mempool_do_generic_put(),
  which was also described incorrectly.
---
 lib/mempool/rte_mempool.h | 47 ++++++++++++---------------------------
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index aedc100964..7989d7a475 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1531,47 +1531,29 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	cache_objs = &cache->objs[cache->len];
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
-	if (__rte_constant(n) && n <= cache->len) {
+	if (likely(n <= cache->len)) {
+		/* The entire request can be satisfied from the cache. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 		/*
-		 * The request size is known at build time, and
-		 * the entire request can be satisfied from the cache,
-		 * so let the compiler unroll the fixed length copy loop.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
 			*obj_table++ = *--cache_objs;
 
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 		return 0;
 	}
 
-	/*
-	 * Use the cache as much as we have to return hot objects first.
-	 * If the request size 'n' is known at build time, the above comparison
-	 * ensures that n > cache->len here, so omit RTE_MIN().
-	 */
-	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
-	cache->len -= len;
+	/* Use the cache as much as we have to return hot objects first. */
+	len = cache->len;
 	remaining = n - len;
+	cache->len = 0;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	/*
-	 * If the request size 'n' is known at build time, the case
-	 * where the entire request can be satisfied from the cache
-	 * has already been handled above, so omit handling it here.
-	 */
-	if (!__rte_constant(n) && likely(remaining == 0)) {
-		/* The entire request is satisfied from the cache. */
-
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-		return 0;
-	}
-
 	/* Dequeue below would overflow mem allocated for cache? */
 	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto driver_dequeue;
@@ -1589,17 +1571,16 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	}
 
 	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
 	for (index = 0; index < remaining; index++)
 		*obj_table++ = *--cache_objs;
 
-	cache->len = cache->size;
-
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 	return 0;
 
 driver_dequeue:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 2/2] mempool: de-inline get/put objects unlikely code paths
  2026-02-16 11:58 [RFC PATCH 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
  2026-02-16 11:58 ` [RFC PATCH 1/2] mempool: simplify get objects Morten Brørup
@ 2026-02-16 11:58 ` Morten Brørup
  2026-02-16 13:13 ` [RFC PATCH v2 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
  2026-02-16 15:23 ` [RFC PATCH v3 0/2] mempool: de-inline get/put " Morten Brørup
  3 siblings, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 11:58 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Reduced compiled code footprint by de-inlining unlikely
code paths.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.c | 153 +++++++++++++++++++++++++++++
 lib/mempool/rte_mempool.h | 202 +++++++++++++++++++-------------------
 2 files changed, 254 insertions(+), 101 deletions(-)

diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 3042d94c14..078d6143c7 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1016,6 +1016,118 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 	return NULL;
 }
 
+/* internal */
+RTE_EXPORT_INTERNAL_SYMBOL(_rte_mempool_do_generic_put_more)
+void
+_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
+		unsigned int n, struct rte_mempool_cache *cache)
+{
+	__rte_assume(cache->flushthresh <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	__rte_assume(cache->len <= cache->flushthresh);
+	__rte_assume(cache->len + n > cache->flushthresh);
+	if (likely(n <= cache->flushthresh)) {
+		uint32_t len;
+		void **cache_objs;
+
+		/*
+		 * The cache is big enough for the objects, but - as detected by
+		 * rte_mempool_do_generic_put() - has insufficient room for them.
+		 * Flush the cache to make room for the objects.
+		 */
+		len = cache->len;
+		cache_objs = &cache->objs[0];
+		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
+
+		/* Add the objects to the cache. */
+#if 0 /* Simple alternative to rte_memcpy(). */
+		for (uint32_t index = 0; index < n; index++)
+			*cache_objs++ = *obj_table++;
+#else
+		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+#endif
+
+		return;
+	}
+
+	/* The request itself is too big for the cache. Push objects directly to the backend. */
+	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+}
+
+/* internal */
+RTE_EXPORT_INTERNAL_SYMBOL(_rte_mempool_do_generic_get_more)
+int
+_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
+		unsigned int n, struct rte_mempool_cache *cache)
+{
+	int ret;
+	unsigned int remaining;
+	uint32_t index, len;
+	void **cache_objs;
+
+	/* Use the cache as much as we have to return hot objects first. */
+	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	len = cache->len;
+	remaining = n - len;
+	cache_objs = &cache->objs[len];
+	cache->len = 0;
+	for (index = 0; index < len; index++)
+		*obj_table++ = *--cache_objs;
+
+	/* Dequeue below would overflow mem allocated for cache? */
+	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
+		goto driver_dequeue;
+
+	/* Fill the cache from the backend; fetch size + remaining objects. */
+	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
+			cache->size + remaining);
+	if (unlikely(ret < 0)) {
+		/*
+		 * We are buffer constrained, and not able to fetch all that.
+		 * Do not fill the cache, just satisfy the remaining part of
+		 * the request directly from the backend.
+		 */
+		goto driver_dequeue;
+	}
+
+	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
+	for (index = 0; index < remaining; index++)
+		*obj_table++ = *--cache_objs;
+
+	return 0;
+
+driver_dequeue:
+
+	/* Get remaining objects directly from the backend. */
+	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
+
+	if (unlikely(ret < 0)) {
+		cache->len = n - remaining;
+		/*
+		 * No further action is required to roll the first part
+		 * of the request back into the cache, as objects in
+		 * the cache are intact.
+		 */
+
+		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+	} else {
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+		__rte_assume(ret == 0);
+	}
+
+	return ret;
+}
+
 /* Return the number of entries in the mempool */
 RTE_EXPORT_SYMBOL(rte_mempool_avail_count)
 unsigned int
@@ -1634,3 +1746,44 @@ RTE_INIT(mempool_init_telemetry)
 	rte_telemetry_register_cmd("/mempool/info", mempool_handle_info,
 		"Returns mempool info. Parameters: pool_name");
 }
+
+void
+review_rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
+                           unsigned int n, struct rte_mempool_cache *cache)
+{ rte_mempool_do_generic_put(mp, obj_table, n, cache); }
+
+void
+review_rte_mempool_do_generic_put_const32(struct rte_mempool *mp, void * const *obj_table,
+                           struct rte_mempool_cache *cache)
+{ rte_mempool_do_generic_put(mp, obj_table, 32, cache); }
+
+void
+review_rte_mempool_do_generic_put_const1(struct rte_mempool *mp, void * const *obj_table,
+                           struct rte_mempool_cache *cache)
+{ rte_mempool_do_generic_put(mp, obj_table, 1, cache); }
+
+int
+review_rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
+                           unsigned int n, struct rte_mempool_cache *cache)
+{ return rte_mempool_do_generic_get(mp, obj_table, n, cache); }
+
+int
+review_rte_mempool_do_generic_get_const32(struct rte_mempool *mp, void **obj_table,
+                           struct rte_mempool_cache *cache)
+{ return rte_mempool_do_generic_get(mp, obj_table, 32, cache); }
+
+int
+review_rte_mempool_do_generic_get_const1(struct rte_mempool *mp, void **obj_table,
+                           struct rte_mempool_cache *cache)
+{ return rte_mempool_do_generic_get(mp, obj_table, 1, cache); }
+
+int
+review_rte_mempool_do_generic_get_const1ret(struct rte_mempool *mp, void **obj_table,
+                           struct rte_mempool_cache *cache)
+{
+    int ret = rte_mempool_do_generic_get(mp, obj_table, 1, cache);
+    if (ret == 0)
+        return 0x1234;
+    else
+        exit(ret);
+}
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 7989d7a475..9658bd7b4a 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1370,6 +1370,24 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @internal
+ * Put several objects back in the mempool, more than the cache has room for; used internally.
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to store back in the mempool, must be strictly
+ *   positive.
+ * @param cache
+ *   A pointer to a mempool cache structure.
+ */
+__rte_internal
+void
+_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
+		unsigned int n, struct rte_mempool_cache *cache);
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1388,9 +1406,16 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided? */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	if (unlikely(cache == NULL)) {
+		/* No cache. Push objects directly to the backend. */
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+
+		rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+
+		return;
+	}
 
 	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
@@ -1403,35 +1428,43 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 		/* Sufficient room in the cache for the objects. */
 		cache_objs = &cache->objs[cache->len];
 		cache->len += n;
-	} else if (n <= cache->flushthresh) {
+
+cache_enqueue:
+#if 0 /* Simple alternative to rte_memcpy(). */
 		/*
-		 * The cache is big enough for the objects, but - as detected by
-		 * the comparison above - has insufficient room for them.
-		 * Flush the cache to make room for the objects.
+		 * Add the objects to the cache.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
-	} else {
-		/* The request itself is too big for the cache. */
-		goto driver_enqueue_stats_incremented;
-	}
-
-	/* Add the objects to the cache. */
-	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+		for (uint32_t index = 0; index < n; index++)
+			*cache_objs++ = *obj_table++;
+#else
+		/* Add the objects to the cache. */
+		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+#endif
 
-	return;
+		return;
+	}
 
-driver_enqueue:
+	if (__rte_constant(n) && likely(n <= cache->flushthresh)) {
+		uint32_t len;
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+		/*
+		 * The cache is big enough for the objects, but - as detected
+		 * above - has insufficient room for them.
+		 * Flush the cache to make room for the objects.
+		 */
+		len = cache->len;
+		cache_objs = &cache->objs[0];
+		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
 
-driver_enqueue_stats_incremented:
+		/* Add the objects to the cache. */
+		goto cache_enqueue;
+	}
 
-	/* push objects to the backend */
-	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+	/* Insufficient room in the cache for the objects. */
+	_rte_mempool_do_generic_put_more(mp, obj_table, n, cache);
 }
 
 
@@ -1498,6 +1531,26 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
 	rte_mempool_put_bulk(mp, &obj, 1);
 }
 
+/**
+ * @internal
+ * Get several objects from the mempool, more than held in the cache; used internally.
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to get, must be strictly positive.
+ * @param cache
+ *   A pointer to a mempool cache structure.
+ * @return
+ *   - 0: Success.
+ *   - <0: Error; code of driver dequeue function.
+ */
+__rte_internal
+int
+_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
+		unsigned int n, struct rte_mempool_cache *cache);
+
 /**
  * @internal Get several objects from the mempool; used internally.
  * @param mp
@@ -1516,26 +1569,36 @@ static __rte_always_inline int
 rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 			   unsigned int n, struct rte_mempool_cache *cache)
 {
-	int ret;
-	unsigned int remaining;
-	uint32_t index, len;
-	void **cache_objs;
-
-	/* No cache provided? */
 	if (unlikely(cache == NULL)) {
-		remaining = n;
-		goto driver_dequeue;
-	}
+		int ret;
 
-	/* The cache is a stack, so copy will be in reverse order. */
-	cache_objs = &cache->objs[cache->len];
+		/* No cache. Get objects directly from the backend. */
+		ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);
+
+		if (unlikely(ret < 0)) {
+			RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+		} else {
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+			__rte_assume(ret == 0);
+		}
+
+		return ret;
+	}
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
 	if (likely(n <= cache->len)) {
+		uint32_t index;
+		void **cache_objs;
+
 		/* The entire request can be satisfied from the cache. */
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
+		/* The cache is a stack, so copy will be in reverse order. */
+		cache_objs = &cache->objs[cache->len];
+
 		/*
 		 * If the request size is known at build time,
 		 * the compiler unrolls the fixed length copy loop.
@@ -1547,71 +1610,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 		return 0;
 	}
 
-	/* Use the cache as much as we have to return hot objects first. */
-	len = cache->len;
-	remaining = n - len;
-	cache->len = 0;
-	for (index = 0; index < len; index++)
-		*obj_table++ = *--cache_objs;
-
-	/* Dequeue below would overflow mem allocated for cache? */
-	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
-		goto driver_dequeue;
-
-	/* Fill the cache from the backend; fetch size + remaining objects. */
-	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
-			cache->size + remaining);
-	if (unlikely(ret < 0)) {
-		/*
-		 * We are buffer constrained, and not able to fetch all that.
-		 * Do not fill the cache, just satisfy the remaining part of
-		 * the request directly from the backend.
-		 */
-		goto driver_dequeue;
-	}
-
-	/* Satisfy the remaining part of the request from the filled cache. */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
-	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
-	cache_objs = &cache->objs[cache->size + remaining];
-	cache->len = cache->size;
-	for (index = 0; index < remaining; index++)
-		*obj_table++ = *--cache_objs;
-
-	return 0;
-
-driver_dequeue:
-
-	/* Get remaining objects directly from the backend. */
-	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
-
-	if (unlikely(ret < 0)) {
-		if (likely(cache != NULL)) {
-			cache->len = n - remaining;
-			/*
-			 * No further action is required to roll the first part
-			 * of the request back into the cache, as objects in
-			 * the cache are intact.
-			 */
-		}
-
-		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
-		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
-	} else {
-		if (likely(cache != NULL)) {
-			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-		} else {
-			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
-			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
-		}
-		__rte_assume(ret == 0);
-	}
-
-	return ret;
+	/* The entire request cannot be satisfied from the cache. */
+	return _rte_mempool_do_generic_get_more(mp, obj_table, n, cache);
 }
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 0/2] mempool: de-inline get/put objects unlikely code
  2026-02-16 11:58 [RFC PATCH 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
  2026-02-16 11:58 ` [RFC PATCH 1/2] mempool: simplify get objects Morten Brørup
  2026-02-16 11:58 ` [RFC PATCH 2/2] mempool: de-inline get/put objects unlikely code paths Morten Brørup
@ 2026-02-16 13:13 ` Morten Brørup
  2026-02-16 13:13   ` [RFC PATCH v2 1/2] mempool: simplify get objects Morten Brørup
  2026-02-16 13:13   ` [RFC PATCH v2 2/2] mempool: de-inline get/put objects unlikely code paths Morten Brørup
  2026-02-16 15:23 ` [RFC PATCH v3 0/2] mempool: de-inline get/put " Morten Brørup
  3 siblings, 2 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 13:13 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Reduced compiled code footprint by de-inlining unlikely
code paths.

Morten Brørup (2):
  mempool: simplify get objects
  mempool: de-inline get/put objects unlikely code paths

 lib/mempool/rte_mempool.c | 114 ++++++++++++++++++-
 lib/mempool/rte_mempool.h | 229 +++++++++++++++++---------------------
 2 files changed, 218 insertions(+), 125 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 1/2] mempool: simplify get objects
  2026-02-16 13:13 ` [RFC PATCH v2 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
@ 2026-02-16 13:13   ` Morten Brørup
  2026-02-16 13:13   ` [RFC PATCH v2 2/2] mempool: de-inline get/put objects unlikely code paths Morten Brørup
  1 sibling, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 13:13 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Removed the explicit test for a build-time constant request size, and
added a comment noting that the compiler unrolls the copy loop when the
request size is a build-time constant, to improve source code
readability.

Moved the setting of cache->len up before the copy loop; not only for
code similarity (cache->len is now set before each copy loop), but also
as an optimization:
The function's pointer parameters are not marked 'restrict', so writing
to obj_table in the copy loop might formally modify cache->size. Thus,
setting cache->len = cache->size after the copy loop would require
loading cache->size again after copying the objects.
Moving this line up before the copy loop avoids that extra load of
cache->size when setting cache->len.

Similarly, moved statistics update up before the copy loops.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v3:
* Added to description why setting cache->len was moved up before the copy
  loop.
* Moved statistics update up before the copy loop.
v2:
* Removed an unrelated micro-optimization from rte_mempool_do_generic_put(),
  which was also described incorrectly.
---
 lib/mempool/rte_mempool.h | 47 ++++++++++++---------------------------
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index aedc100964..7989d7a475 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1531,47 +1531,29 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	cache_objs = &cache->objs[cache->len];
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
-	if (__rte_constant(n) && n <= cache->len) {
+	if (likely(n <= cache->len)) {
+		/* The entire request can be satisfied from the cache. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 		/*
-		 * The request size is known at build time, and
-		 * the entire request can be satisfied from the cache,
-		 * so let the compiler unroll the fixed length copy loop.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
 			*obj_table++ = *--cache_objs;
 
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 		return 0;
 	}
 
-	/*
-	 * Use the cache as much as we have to return hot objects first.
-	 * If the request size 'n' is known at build time, the above comparison
-	 * ensures that n > cache->len here, so omit RTE_MIN().
-	 */
-	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
-	cache->len -= len;
+	/* Use the cache as much as we have to return hot objects first. */
+	len = cache->len;
 	remaining = n - len;
+	cache->len = 0;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	/*
-	 * If the request size 'n' is known at build time, the case
-	 * where the entire request can be satisfied from the cache
-	 * has already been handled above, so omit handling it here.
-	 */
-	if (!__rte_constant(n) && likely(remaining == 0)) {
-		/* The entire request is satisfied from the cache. */
-
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-		return 0;
-	}
-
 	/* Dequeue below would overflow mem allocated for cache? */
 	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto driver_dequeue;
@@ -1589,17 +1571,16 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	}
 
 	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
 	for (index = 0; index < remaining; index++)
 		*obj_table++ = *--cache_objs;
 
-	cache->len = cache->size;
-
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 	return 0;
 
 driver_dequeue:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v2 2/2] mempool: de-inline get/put objects unlikely code paths
  2026-02-16 13:13 ` [RFC PATCH v2 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
  2026-02-16 13:13   ` [RFC PATCH v2 1/2] mempool: simplify get objects Morten Brørup
@ 2026-02-16 13:13   ` Morten Brørup
  1 sibling, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 13:13 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

De-inline unlikely code paths, for smaller footprint.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v2:
* Removed review functions.
* Changed #if 0 to #ifdef AVOID_RTE_MEMCPY.
---
 lib/mempool/rte_mempool.c | 114 ++++++++++++++++++++-
 lib/mempool/rte_mempool.h | 202 +++++++++++++++++++-------------------
 2 files changed, 214 insertions(+), 102 deletions(-)

diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 3042d94c14..c9e6f49de5 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1016,6 +1016,118 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 	return NULL;
 }
 
+/* internal */
+RTE_EXPORT_INTERNAL_SYMBOL(_rte_mempool_do_generic_put_more)
+void
+_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
+		unsigned int n, struct rte_mempool_cache *cache)
+{
+	__rte_assume(cache->flushthresh <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	__rte_assume(cache->len <= cache->flushthresh);
+	__rte_assume(cache->len + n > cache->flushthresh);
+	if (likely(n <= cache->flushthresh)) {
+		uint32_t len;
+		void **cache_objs;
+
+		/*
+		 * The cache is big enough for the objects, but - as detected by
+		 * rte_mempool_do_generic_put() - has insufficient room for them.
+		 * Flush the cache to make room for the objects.
+		 */
+		len = cache->len;
+		cache_objs = &cache->objs[0];
+		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
+
+		/* Add the objects to the cache. */
+#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
+		for (uint32_t index = 0; index < n; index++)
+			*cache_objs++ = *obj_table++;
+#else
+		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+#endif
+
+		return;
+	}
+
+	/* The request itself is too big for the cache. Push objects directly to the backend. */
+	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+}
+
+/* internal */
+RTE_EXPORT_INTERNAL_SYMBOL(_rte_mempool_do_generic_get_more)
+int
+_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
+		unsigned int n, struct rte_mempool_cache *cache)
+{
+	int ret;
+	unsigned int remaining;
+	uint32_t index, len;
+	void **cache_objs;
+
+	/* Use the cache as much as we have to return hot objects first. */
+	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	len = cache->len;
+	remaining = n - len;
+	cache_objs = &cache->objs[len];
+	cache->len = 0;
+	for (index = 0; index < len; index++)
+		*obj_table++ = *--cache_objs;
+
+	/* Dequeue below would overflow mem allocated for cache? */
+	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
+		goto driver_dequeue;
+
+	/* Fill the cache from the backend; fetch size + remaining objects. */
+	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
+			cache->size + remaining);
+	if (unlikely(ret < 0)) {
+		/*
+		 * We are buffer constrained, and not able to fetch all that.
+		 * Do not fill the cache, just satisfy the remaining part of
+		 * the request directly from the backend.
+		 */
+		goto driver_dequeue;
+	}
+
+	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
+	for (index = 0; index < remaining; index++)
+		*obj_table++ = *--cache_objs;
+
+	return 0;
+
+driver_dequeue:
+
+	/* Get remaining objects directly from the backend. */
+	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
+
+	if (unlikely(ret < 0)) {
+		cache->len = n - remaining;
+		/*
+		 * No further action is required to roll the first part
+		 * of the request back into the cache, as objects in
+		 * the cache are intact.
+		 */
+
+		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+	} else {
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+		__rte_assume(ret == 0);
+	}
+
+	return ret;
+}
+
 /* Return the number of entries in the mempool */
 RTE_EXPORT_SYMBOL(rte_mempool_avail_count)
 unsigned int
@@ -1633,4 +1745,4 @@ RTE_INIT(mempool_init_telemetry)
 		"Returns list of available mempool. Takes no parameters");
 	rte_telemetry_register_cmd("/mempool/info", mempool_handle_info,
 		"Returns mempool info. Parameters: pool_name");
-}
+}
\ No newline at end of file
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 7989d7a475..86163e5377 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1370,6 +1370,24 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @internal
+ * Put several objects back in the mempool, more than the cache has room for; used internally.
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to store back in the mempool, must be strictly
+ *   positive.
+ * @param cache
+ *   A pointer to a mempool cache structure.
+ */
+__rte_internal
+void
+_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
+		unsigned int n, struct rte_mempool_cache *cache);
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1388,9 +1406,16 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided? */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	if (unlikely(cache == NULL)) {
+		/* No cache. Push objects directly to the backend. */
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+
+		rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+
+		return;
+	}
 
 	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
@@ -1403,35 +1428,43 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 		/* Sufficient room in the cache for the objects. */
 		cache_objs = &cache->objs[cache->len];
 		cache->len += n;
-	} else if (n <= cache->flushthresh) {
+
+cache_enqueue:
+#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
 		/*
-		 * The cache is big enough for the objects, but - as detected by
-		 * the comparison above - has insufficient room for them.
-		 * Flush the cache to make room for the objects.
+		 * Add the objects to the cache.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
-	} else {
-		/* The request itself is too big for the cache. */
-		goto driver_enqueue_stats_incremented;
-	}
-
-	/* Add the objects to the cache. */
-	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+		for (uint32_t index = 0; index < n; index++)
+			*cache_objs++ = *obj_table++;
+#else
+		/* Add the objects to the cache. */
+		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+#endif
 
-	return;
+		return;
+	}
 
-driver_enqueue:
+	if (__rte_constant(n) && likely(n <= cache->flushthresh)) {
+		uint32_t len;
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+		/*
+		 * The cache is big enough for the objects, but - as detected
+		 * above - has insufficient room for them.
+		 * Flush the cache to make room for the objects.
+		 */
+		len = cache->len;
+		cache_objs = &cache->objs[0];
+		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
 
-driver_enqueue_stats_incremented:
+		/* Add the objects to the cache. */
+		goto cache_enqueue;
+	}
 
-	/* push objects to the backend */
-	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+	/* Insufficient room in the cache for the objects. */
+	_rte_mempool_do_generic_put_more(mp, obj_table, n, cache);
 }
 
 
@@ -1498,6 +1531,26 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
 	rte_mempool_put_bulk(mp, &obj, 1);
 }
 
+/**
+ * @internal
+ * Get several objects from the mempool, more than held in the cache; used internally.
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to get, must be strictly positive.
+ * @param cache
+ *   A pointer to a mempool cache structure.
+ * @return
+ *   - 0: Success.
+ *   - <0: Error; code of driver dequeue function.
+ */
+__rte_internal
+int
+_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
+		unsigned int n, struct rte_mempool_cache *cache);
+
 /**
  * @internal Get several objects from the mempool; used internally.
  * @param mp
@@ -1516,26 +1569,36 @@ static __rte_always_inline int
 rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 			   unsigned int n, struct rte_mempool_cache *cache)
 {
-	int ret;
-	unsigned int remaining;
-	uint32_t index, len;
-	void **cache_objs;
-
-	/* No cache provided? */
 	if (unlikely(cache == NULL)) {
-		remaining = n;
-		goto driver_dequeue;
-	}
+		int ret;
 
-	/* The cache is a stack, so copy will be in reverse order. */
-	cache_objs = &cache->objs[cache->len];
+		/* No cache. Get objects directly from the backend. */
+		ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);
+
+		if (unlikely(ret < 0)) {
+			RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+		} else {
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+			__rte_assume(ret == 0);
+		}
+
+		return ret;
+	}
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
 	if (likely(n <= cache->len)) {
+		uint32_t index;
+		void **cache_objs;
+
 		/* The entire request can be satisfied from the cache. */
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
+		/* The cache is a stack, so copy will be in reverse order. */
+		cache_objs = &cache->objs[cache->len];
+
 		/*
 		 * If the request size is known at build time,
 		 * the compiler unrolls the fixed length copy loop.
@@ -1547,71 +1610,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 		return 0;
 	}
 
-	/* Use the cache as much as we have to return hot objects first. */
-	len = cache->len;
-	remaining = n - len;
-	cache->len = 0;
-	for (index = 0; index < len; index++)
-		*obj_table++ = *--cache_objs;
-
-	/* Dequeue below would overflow mem allocated for cache? */
-	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
-		goto driver_dequeue;
-
-	/* Fill the cache from the backend; fetch size + remaining objects. */
-	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
-			cache->size + remaining);
-	if (unlikely(ret < 0)) {
-		/*
-		 * We are buffer constrained, and not able to fetch all that.
-		 * Do not fill the cache, just satisfy the remaining part of
-		 * the request directly from the backend.
-		 */
-		goto driver_dequeue;
-	}
-
-	/* Satisfy the remaining part of the request from the filled cache. */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
-	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
-	cache_objs = &cache->objs[cache->size + remaining];
-	cache->len = cache->size;
-	for (index = 0; index < remaining; index++)
-		*obj_table++ = *--cache_objs;
-
-	return 0;
-
-driver_dequeue:
-
-	/* Get remaining objects directly from the backend. */
-	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
-
-	if (unlikely(ret < 0)) {
-		if (likely(cache != NULL)) {
-			cache->len = n - remaining;
-			/*
-			 * No further action is required to roll the first part
-			 * of the request back into the cache, as objects in
-			 * the cache are intact.
-			 */
-		}
-
-		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
-		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
-	} else {
-		if (likely(cache != NULL)) {
-			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-		} else {
-			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
-			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
-		}
-		__rte_assume(ret == 0);
-	}
-
-	return ret;
+	/* The entire request cannot be satisfied from the cache. */
+	return _rte_mempool_do_generic_get_more(mp, obj_table, n, cache);
 }
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v3 0/2] mempool: de-inline get/put unlikely code paths
  2026-02-16 11:58 [RFC PATCH 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
                   ` (2 preceding siblings ...)
  2026-02-16 13:13 ` [RFC PATCH v2 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
@ 2026-02-16 15:23 ` Morten Brørup
  2026-02-16 15:23   ` [RFC PATCH v3 1/2] mempool: simplify get objects Morten Brørup
  2026-02-16 15:23   ` [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths Morten Brørup
  3 siblings, 2 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 15:23 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

De-inline unlikely code paths, for smaller footprint.

Morten Brørup (2):
  mempool: simplify get objects
  mempool: de-inline get/put unlikely code paths

 lib/mempool/rte_mempool.c | 112 ++++++++++++++++++
 lib/mempool/rte_mempool.h | 239 ++++++++++++++++++--------------------
 2 files changed, 227 insertions(+), 124 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH v3 1/2] mempool: simplify get objects
  2026-02-16 15:23 ` [RFC PATCH v3 0/2] mempool: de-inline get/put " Morten Brørup
@ 2026-02-16 15:23   ` Morten Brørup
  2026-02-17  6:19     ` Andrew Rybchenko
  2026-02-16 15:23   ` [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths Morten Brørup
  1 sibling, 1 reply; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 15:23 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

Removed the explicit test for a build-time constant request size,
and added a comment that the compiler unrolls the copy loop when the
request size is a build-time constant, to improve source code readability.

Moved setting cache->len up before the copy loop; not only for code
similarity (cache->len is now set before each copy loop), but also as an
optimization:
The function's pointer parameters are not marked restrict, so writing to
obj_table in the copy loop might formally modify cache->size. Thus,
setting cache->len = cache->size after the copy loop requires loading
cache->size again after copying the objects.
Moving this line up before the copy loop avoids that extra load of
cache->size when setting cache->len.

Similarly, moved statistics update up before the copy loops.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v3:
* Added to description why setting cache->len was moved up before the copy
  loop.
* Moved statistics update up before the copy loop.
v2:
* Removed unrelated microoptimization from rte_mempool_do_generic_put(),
  which was also described incorrectly.
---
 lib/mempool/rte_mempool.h | 47 ++++++++++++---------------------------
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index aedc100964..7989d7a475 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1531,47 +1531,29 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	cache_objs = &cache->objs[cache->len];
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
-	if (__rte_constant(n) && n <= cache->len) {
+	if (likely(n <= cache->len)) {
+		/* The entire request can be satisfied from the cache. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 		/*
-		 * The request size is known at build time, and
-		 * the entire request can be satisfied from the cache,
-		 * so let the compiler unroll the fixed length copy loop.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
 		cache->len -= n;
 		for (index = 0; index < n; index++)
 			*obj_table++ = *--cache_objs;
 
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 		return 0;
 	}
 
-	/*
-	 * Use the cache as much as we have to return hot objects first.
-	 * If the request size 'n' is known at build time, the above comparison
-	 * ensures that n > cache->len here, so omit RTE_MIN().
-	 */
-	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
-	cache->len -= len;
+	/* Use the cache as much as we have to return hot objects first. */
+	len = cache->len;
 	remaining = n - len;
+	cache->len = 0;
 	for (index = 0; index < len; index++)
 		*obj_table++ = *--cache_objs;
 
-	/*
-	 * If the request size 'n' is known at build time, the case
-	 * where the entire request can be satisfied from the cache
-	 * has already been handled above, so omit handling it here.
-	 */
-	if (!__rte_constant(n) && likely(remaining == 0)) {
-		/* The entire request is satisfied from the cache. */
-
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-		return 0;
-	}
-
 	/* Dequeue below would overflow mem allocated for cache? */
 	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
 		goto driver_dequeue;
@@ -1589,17 +1571,16 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	}
 
 	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
 	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
 	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
 	for (index = 0; index < remaining; index++)
 		*obj_table++ = *--cache_objs;
 
-	cache->len = cache->size;
-
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
 	return 0;
 
 driver_dequeue:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths
  2026-02-16 15:23 ` [RFC PATCH v3 0/2] mempool: de-inline get/put " Morten Brørup
  2026-02-16 15:23   ` [RFC PATCH v3 1/2] mempool: simplify get objects Morten Brørup
@ 2026-02-16 15:23   ` Morten Brørup
  2026-02-16 17:35     ` Stephen Hemminger
  2026-02-17  6:37     ` Andrew Rybchenko
  1 sibling, 2 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 15:23 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Morten Brørup

De-inline unlikely code paths, for smaller footprint.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
v3:
* New functions are called from inline code, so make them experimental
  instead of internal.
v2:
* Removed review functions.
* Changed #if 0 to #if AVOID_RTE_MEMCPY.
---
 lib/mempool/rte_mempool.c | 112 ++++++++++++++++++++
 lib/mempool/rte_mempool.h | 212 ++++++++++++++++++++------------------
 2 files changed, 223 insertions(+), 101 deletions(-)

diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index 3042d94c14..30dce3a2fd 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1016,6 +1016,118 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 	return NULL;
 }
 
+/* internal */
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(_rte_mempool_do_generic_put_more, 26.03)
+void
+_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
+		unsigned int n, struct rte_mempool_cache *cache)
+{
+	__rte_assume(cache->flushthresh <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	__rte_assume(cache->len <= cache->flushthresh);
+	__rte_assume(cache->len + n > cache->flushthresh);
+	if (likely(n <= cache->flushthresh)) {
+		uint32_t len;
+		void **cache_objs;
+
+		/*
+		 * The cache is big enough for the objects, but - as detected by
+		 * rte_mempool_do_generic_put() - has insufficient room for them.
+		 * Flush the cache to make room for the objects.
+		 */
+		len = cache->len;
+		cache_objs = &cache->objs[0];
+		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
+
+		/* Add the objects to the cache. */
+#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
+		for (uint32_t index = 0; index < n; index++)
+			*cache_objs++ = *obj_table++;
+#else
+		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+#endif
+
+		return;
+	}
+
+	/* The request itself is too big for the cache. Push objects directly to the backend. */
+	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+}
+
+/* internal */
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(_rte_mempool_do_generic_get_more, 26.03)
+int
+_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
+		unsigned int n, struct rte_mempool_cache *cache)
+{
+	int ret;
+	unsigned int remaining;
+	uint32_t index, len;
+	void **cache_objs;
+
+	/* Use the cache as much as we have to return hot objects first. */
+	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
+	len = cache->len;
+	remaining = n - len;
+	cache_objs = &cache->objs[len];
+	cache->len = 0;
+	for (index = 0; index < len; index++)
+		*obj_table++ = *--cache_objs;
+
+	/* Dequeue below would overflow mem allocated for cache? */
+	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
+		goto driver_dequeue;
+
+	/* Fill the cache from the backend; fetch size + remaining objects. */
+	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
+			cache->size + remaining);
+	if (unlikely(ret < 0)) {
+		/*
+		 * We are buffer constrained, and not able to fetch all that.
+		 * Do not fill the cache, just satisfy the remaining part of
+		 * the request directly from the backend.
+		 */
+		goto driver_dequeue;
+	}
+
+	/* Satisfy the remaining part of the request from the filled cache. */
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
+	cache_objs = &cache->objs[cache->size + remaining];
+	cache->len = cache->size;
+	for (index = 0; index < remaining; index++)
+		*obj_table++ = *--cache_objs;
+
+	return 0;
+
+driver_dequeue:
+
+	/* Get remaining objects directly from the backend. */
+	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
+
+	if (unlikely(ret < 0)) {
+		cache->len = n - remaining;
+		/*
+		 * No further action is required to roll the first part
+		 * of the request back into the cache, as objects in
+		 * the cache are intact.
+		 */
+
+		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+	} else {
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+		__rte_assume(ret == 0);
+	}
+
+	return ret;
+}
+
 /* Return the number of entries in the mempool */
 RTE_EXPORT_SYMBOL(rte_mempool_avail_count)
 unsigned int
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 7989d7a475..c6df285194 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1370,8 +1370,31 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal
+ * Put several objects back in the mempool, more than the cache has room for; used internally.
+ *
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to store back in the mempool, must be strictly
+ *   positive.
+ * @param cache
+ *   A pointer to a mempool cache structure.
+ */
+__rte_experimental
+void
+_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
+		unsigned int n, struct rte_mempool_cache *cache);
+
 /**
  * @internal Put several objects back in the mempool; used internally.
+ *
  * @param mp
  *   A pointer to the mempool structure.
  * @param obj_table
@@ -1388,9 +1411,16 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided? */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	if (unlikely(cache == NULL)) {
+		/* No cache. Push objects directly to the backend. */
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+
+		rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+
+		return;
+	}
 
 	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
@@ -1403,35 +1433,43 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 		/* Sufficient room in the cache for the objects. */
 		cache_objs = &cache->objs[cache->len];
 		cache->len += n;
-	} else if (n <= cache->flushthresh) {
+
+cache_enqueue:
+#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
 		/*
-		 * The cache is big enough for the objects, but - as detected by
-		 * the comparison above - has insufficient room for them.
-		 * Flush the cache to make room for the objects.
+		 * Add the objects to the cache.
+		 * If the request size is known at build time,
+		 * the compiler unrolls the fixed length copy loop.
 		 */
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
-	} else {
-		/* The request itself is too big for the cache. */
-		goto driver_enqueue_stats_incremented;
-	}
-
-	/* Add the objects to the cache. */
-	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+		for (uint32_t index = 0; index < n; index++)
+			*cache_objs++ = *obj_table++;
+#else
+		/* Add the objects to the cache. */
+		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
+#endif
 
-	return;
+		return;
+	}
 
-driver_enqueue:
+	if (__rte_constant(n) && likely(n <= cache->flushthresh)) {
+		uint32_t len;
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+		/*
+		 * The cache is big enough for the objects, but - as detected
+		 * above - has insufficient room for them.
+		 * Flush the cache to make room for the objects.
+		 */
+		len = cache->len;
+		cache_objs = &cache->objs[0];
+		cache->len = n;
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
 
-driver_enqueue_stats_incremented:
+		/* Add the objects to the cache. */
+		goto cache_enqueue;
+	}
 
-	/* push objects to the backend */
-	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
+	/* Insufficient room in the cache for the objects. */
+	_rte_mempool_do_generic_put_more(mp, obj_table, n, cache);
 }
 
 
@@ -1498,8 +1536,33 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
 	rte_mempool_put_bulk(mp, &obj, 1);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal
+ * Get several objects from the mempool, more than held in the cache; used internally.
+ *
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to get, must be strictly positive.
+ * @param cache
+ *   A pointer to a mempool cache structure.
+ * @return
+ *   - 0: Success.
+ *   - <0: Error; code of driver dequeue function.
+ */
+__rte_experimental
+int
+_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
+		unsigned int n, struct rte_mempool_cache *cache);
+
 /**
  * @internal Get several objects from the mempool; used internally.
+ *
  * @param mp
  *   A pointer to the mempool structure.
  * @param obj_table
@@ -1516,26 +1579,36 @@ static __rte_always_inline int
 rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 			   unsigned int n, struct rte_mempool_cache *cache)
 {
-	int ret;
-	unsigned int remaining;
-	uint32_t index, len;
-	void **cache_objs;
-
-	/* No cache provided? */
 	if (unlikely(cache == NULL)) {
-		remaining = n;
-		goto driver_dequeue;
-	}
+		int ret;
+
+		/* No cache. Get objects directly from the backend. */
+		ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);
+
+		if (unlikely(ret < 0)) {
+			RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+		} else {
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+			__rte_assume(ret == 0);
+		}
 
-	/* The cache is a stack, so copy will be in reverse order. */
-	cache_objs = &cache->objs[cache->len];
+		return ret;
+	}
 
 	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
 	if (likely(n <= cache->len)) {
+		uint32_t index;
+		void **cache_objs;
+
 		/* The entire request can be satisfied from the cache. */
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
 		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
+		/* The cache is a stack, so copy will be in reverse order. */
+		cache_objs = &cache->objs[cache->len];
+
 		/*
 		 * If the request size is known at build time,
 		 * the compiler unrolls the fixed length copy loop.
@@ -1547,71 +1620,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 		return 0;
 	}
 
-	/* Use the cache as much as we have to return hot objects first. */
-	len = cache->len;
-	remaining = n - len;
-	cache->len = 0;
-	for (index = 0; index < len; index++)
-		*obj_table++ = *--cache_objs;
-
-	/* Dequeue below would overflow mem allocated for cache? */
-	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
-		goto driver_dequeue;
-
-	/* Fill the cache from the backend; fetch size + remaining objects. */
-	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
-			cache->size + remaining);
-	if (unlikely(ret < 0)) {
-		/*
-		 * We are buffer constrained, and not able to fetch all that.
-		 * Do not fill the cache, just satisfy the remaining part of
-		 * the request directly from the backend.
-		 */
-		goto driver_dequeue;
-	}
-
-	/* Satisfy the remaining part of the request from the filled cache. */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-
-	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
-	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
-	cache_objs = &cache->objs[cache->size + remaining];
-	cache->len = cache->size;
-	for (index = 0; index < remaining; index++)
-		*obj_table++ = *--cache_objs;
-
-	return 0;
-
-driver_dequeue:
-
-	/* Get remaining objects directly from the backend. */
-	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
-
-	if (unlikely(ret < 0)) {
-		if (likely(cache != NULL)) {
-			cache->len = n - remaining;
-			/*
-			 * No further action is required to roll the first part
-			 * of the request back into the cache, as objects in
-			 * the cache are intact.
-			 */
-		}
-
-		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
-		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
-	} else {
-		if (likely(cache != NULL)) {
-			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
-			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
-		} else {
-			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
-			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
-		}
-		__rte_assume(ret == 0);
-	}
-
-	return ret;
+	/* The entire request cannot be satisfied from the cache. */
+	return _rte_mempool_do_generic_get_more(mp, obj_table, n, cache);
 }
 
 /**
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths
  2026-02-16 15:23   ` [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths Morten Brørup
@ 2026-02-16 17:35     ` Stephen Hemminger
  2026-02-16 19:59       ` Morten Brørup
  2026-02-17  6:37     ` Andrew Rybchenko
  1 sibling, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2026-02-16 17:35 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Andrew Rybchenko, dev

On Mon, 16 Feb 2026 15:23:20 +0000
Morten Brørup <mb@smartsharesystems.com> wrote:

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal
> + * Put several objects back in the mempool, more than the cache has room for; used internally.
> + *
> + * @param mp
> + *   A pointer to the mempool structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to store back in the mempool, must be strictly
> + *   positive.
> + * @param cache
> + *   A pointer to a mempool cache structure.
> + */
> +__rte_experimental
> +void
> +_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
> +		unsigned int n, struct rte_mempool_cache *cache);
> +

Don't you want internal, not experimental, on this?
You don't want or expect direct callers.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths
  2026-02-16 17:35     ` Stephen Hemminger
@ 2026-02-16 19:59       ` Morten Brørup
  0 siblings, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-02-16 19:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Rybchenko, dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Monday, 16 February 2026 18.36
> 
> On Mon, 16 Feb 2026 15:23:20 +0000
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * @internal
> > + * Put several objects back in the mempool, more than the cache has
> room for; used internally.
> > + *
> > + * @param mp
> > + *   A pointer to the mempool structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to store back in the mempool, must be
> strictly
> > + *   positive.
> > + * @param cache
> > + *   A pointer to a mempool cache structure.
> > + */
> > +__rte_experimental
> > +void
> > +_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void *
> const *obj_table,
> > +		unsigned int n, struct rte_mempool_cache *cache);
> > +
> 
> Don't you want internal not experimental on this.
> You don't want or expect direct callers.

I initially had it internal, but it's being called from an inline function, so it needs to be publicly accessible from applications.
It seems experimental doesn't suffice either - if only stable APIs are allowed - so it probably needs to be a regular API.
We'll get that sorted.

The core question of the RFC is the tradeoff of making the unlikely code path non-inline to gain a smaller footprint for the likely code path.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v3 1/2] mempool: simplify get objects
  2026-02-16 15:23   ` [RFC PATCH v3 1/2] mempool: simplify get objects Morten Brørup
@ 2026-02-17  6:19     ` Andrew Rybchenko
  2026-03-13 15:36       ` Morten Brørup
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Rybchenko @ 2026-02-17  6:19 UTC (permalink / raw)
  To: Morten Brørup, dev

Hi Morten,

On 2/16/26 6:23 PM, Morten Brørup wrote:
> Removed explicit test for build time constant request size,
> and added comment that the compiler loop unrolls when request size is
> build time constant, to improve source code readability.
> 
> Moved setting cache->len up before the copy loop; not only for code
> similarity (cache->len is now set before each copy loop), but also as an
> optimization:
> The function's pointer parameters are not marked restrict, so writing to
> obj_table in the copy loop might formally modify cache->size. And thus,
> setting cache->len = cache->size after the copy loop requires loading
> cache->size again after copying the objects.
> Moving this line up before the copy loop avoids that extra load of
> cache->size when setting cache->len.
> 
> Similarly, moved statistics update up before the copy loops.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

LGTM, the result looks simpler, easier to read and understand.
If there is no measurable performance degradation after the patch:

Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

> ---
> v3:
> * Added to description why setting cache->len was moved up before the copy
>    loop.
> * Moved statistics update up before the copy loop.
> v2:
> * Removed unrelated microoptimization from rte_mempool_do_generic_put(),
>    which was also described incorrectly.
> ---
>   lib/mempool/rte_mempool.h | 47 ++++++++++++---------------------------
>   1 file changed, 14 insertions(+), 33 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index aedc100964..7989d7a475 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1531,47 +1531,29 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   	cache_objs = &cache->objs[cache->len];
>   
>   	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> -	if (__rte_constant(n) && n <= cache->len) {
> +	if (likely(n <= cache->len)) {
> +		/* The entire request can be satisfied from the cache. */
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
>   		/*
> -		 * The request size is known at build time, and
> -		 * the entire request can be satisfied from the cache,
> -		 * so let the compiler unroll the fixed length copy loop.
> +		 * If the request size is known at build time,
> +		 * the compiler unrolls the fixed length copy loop.
>   		 */
>   		cache->len -= n;
>   		for (index = 0; index < n; index++)
>   			*obj_table++ = *--cache_objs;
>   
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -
>   		return 0;
>   	}
>   
> -	/*
> -	 * Use the cache as much as we have to return hot objects first.
> -	 * If the request size 'n' is known at build time, the above comparison
> -	 * ensures that n > cache->len here, so omit RTE_MIN().
> -	 */
> -	len = __rte_constant(n) ? cache->len : RTE_MIN(n, cache->len);
> -	cache->len -= len;
> +	/* Use the cache as much as we have to return hot objects first. */
> +	len = cache->len;
>   	remaining = n - len;
> +	cache->len = 0;
>   	for (index = 0; index < len; index++)
>   		*obj_table++ = *--cache_objs;
>   
> -	/*
> -	 * If the request size 'n' is known at build time, the case
> -	 * where the entire request can be satisfied from the cache
> -	 * has already been handled above, so omit handling it here.
> -	 */
> -	if (!__rte_constant(n) && likely(remaining == 0)) {
> -		/* The entire request is satisfied from the cache. */
> -
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -
> -		return 0;
> -	}
> -
>   	/* Dequeue below would overflow mem allocated for cache? */
>   	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
>   		goto driver_dequeue;
> @@ -1589,17 +1571,16 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   	}
>   
>   	/* Satisfy the remaining part of the request from the filled cache. */
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
>   	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
>   	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
>   	cache_objs = &cache->objs[cache->size + remaining];
> +	cache->len = cache->size;
>   	for (index = 0; index < remaining; index++)
>   		*obj_table++ = *--cache_objs;
>   
> -	cache->len = cache->size;
> -
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -
>   	return 0;
>   
>   driver_dequeue:


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths
  2026-02-16 15:23   ` [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths Morten Brørup
  2026-02-16 17:35     ` Stephen Hemminger
@ 2026-02-17  6:37     ` Andrew Rybchenko
  2026-03-13 15:27       ` Morten Brørup
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Rybchenko @ 2026-02-17  6:37 UTC (permalink / raw)
  To: Morten Brørup, dev

On 2/16/26 6:23 PM, Morten Brørup wrote:
> De-inline unlikely code paths, for smaller footprint.

The idea is interesting and makes sense to me. But could you share
performance figures so we know the impact?

> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v3:
> * New functions are called from inline code, so make them experimental
>    instead of internal.
> v2:
> * Removed review functions.
> * Changed #if 0 to #if AVOID_RTE_MEMCPY.
> ---
>   lib/mempool/rte_mempool.c | 112 ++++++++++++++++++++
>   lib/mempool/rte_mempool.h | 212 ++++++++++++++++++++------------------
>   2 files changed, 223 insertions(+), 101 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index 3042d94c14..30dce3a2fd 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -1016,6 +1016,118 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
>   	return NULL;
>   }
>   
> +/* internal */
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(_rte_mempool_do_generic_put_more, 26.03)
> +void
> +_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
> +		unsigned int n, struct rte_mempool_cache *cache)
> +{

I'd add comments explaining why stats are not updated by this
function. It is a drawback of the solution, so at the very least
comments should be added to make it clear. The stats update would
be very easy to lose in future changes.
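
A hedged illustration of the point: when the inline fast path counts the statistics before delegating, the de-inlined helper must document that it deliberately does not, or a future change could double-count. Names here are invented for the example, not the DPDK API.

```c
#include <assert.h>

struct toy_stats {
	unsigned long put_bulk;
	unsigned long put_objs;
};

/* Cold slow path.
 * NOTE: statistics are intentionally NOT updated here; the inline
 * caller below has already counted the operation. Keep this invariant,
 * or operations get counted twice. */
__attribute__((noinline))
void toy_put_more(struct toy_stats *st, unsigned int n)
{
	(void)st;
	(void)n;	/* pretend to flush objects to the backend */
}

static inline void
toy_put(struct toy_stats *st, unsigned int n, unsigned int room)
{
	/* Stats counted exactly once, up front. */
	st->put_bulk += 1;
	st->put_objs += n;
	if (n <= room)
		return;			/* fast path: fits in the cache */
	toy_put_more(st, n);		/* slow path: stats already counted */
}
```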

> +	__rte_assume(cache->flushthresh <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> +	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> +	__rte_assume(cache->len <= cache->flushthresh);
> +	__rte_assume(cache->len + n > cache->flushthresh);
> +	if (likely(n <= cache->flushthresh)) {
> +		uint32_t len;
> +		void **cache_objs;
> +
> +		/*
> +		 * The cache is big enough for the objects, but - as detected by
> +		 * rte_mempool_do_generic_put() - has insufficient room for them.
> +		 * Flush the cache to make room for the objects.
> +		 */
> +		len = cache->len;
> +		cache_objs = &cache->objs[0];
> +		cache->len = n;
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
> +
> +		/* Add the objects to the cache. */
> +#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */

I'd not mix the introduction of AVOID_RTE_MEMCPY with the other goals
of the patch. If AVOID_RTE_MEMCPY is really useful, it could be added
separately and appropriately motivated.
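
For context, the alternative being toggled is roughly the following: a plain pointer-copy loop versus a memcpy() of the object-pointer array. Both produce identical results; the question is only which one the compiler turns into better code for small, often build-time-constant counts. This sketch uses standard memcpy() as a stand-in for rte_memcpy(), with illustrative names:

```c
#include <assert.h>
#include <string.h>

/* Simple alternative to memcpy(): the compiler can fully unroll this
 * loop when n is a build-time constant. */
static void
copy_loop(void **dst, void * const *src, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		*dst++ = *src++;
}

/* memcpy() variant, as the current code does via rte_memcpy(). */
static void
copy_memcpy(void **dst, void * const *src, unsigned int n)
{
	memcpy(dst, src, sizeof(void *) * n);
}
```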

> +		for (uint32_t index = 0; index < n; index++)
> +			*cache_objs++ = *obj_table++;
> +#else
> +		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> +#endif
> +
> +		return;
> +	}
> +
> +	/* The request itself is too big for the cache. Push objects directly to the backend. */
> +	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> +}
> +
> +/* internal */
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(_rte_mempool_do_generic_get_more, 26.03)
> +int
> +_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
> +		unsigned int n, struct rte_mempool_cache *cache)
> +{
> +	int ret;
> +	unsigned int remaining;
> +	uint32_t index, len;
> +	void **cache_objs;
> +
> +	/* Use the cache as much as we have to return hot objects first. */
> +	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> +	len = cache->len;
> +	remaining = n - len;
> +	cache_objs = &cache->objs[len];
> +	cache->len = 0;
> +	for (index = 0; index < len; index++)
> +		*obj_table++ = *--cache_objs;
> +
> +	/* Dequeue below would overflow mem allocated for cache? */
> +	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
> +		goto driver_dequeue;
> +
> +	/* Fill the cache from the backend; fetch size + remaining objects. */
> +	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
> +			cache->size + remaining);
> +	if (unlikely(ret < 0)) {
> +		/*
> +		 * We are buffer constrained, and not able to fetch all that.
> +		 * Do not fill the cache, just satisfy the remaining part of
> +		 * the request directly from the backend.
> +		 */
> +		goto driver_dequeue;
> +	}
> +
> +	/* Satisfy the remaining part of the request from the filled cache. */
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> +	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> +	cache_objs = &cache->objs[cache->size + remaining];
> +	cache->len = cache->size;
> +	for (index = 0; index < remaining; index++)
> +		*obj_table++ = *--cache_objs;
> +
> +	return 0;
> +
> +driver_dequeue:
> +
> +	/* Get remaining objects directly from the backend. */
> +	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
> +
> +	if (unlikely(ret < 0)) {
> +		cache->len = n - remaining;
> +		/*
> +		 * No further action is required to roll the first part
> +		 * of the request back into the cache, as objects in
> +		 * the cache are intact.
> +		 */
> +
> +		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> +		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> +	} else {
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +		__rte_assume(ret == 0);
> +	}
> +
> +	return ret;
> +}
> +
>   /* Return the number of entries in the mempool */
>   RTE_EXPORT_SYMBOL(rte_mempool_avail_count)
>   unsigned int
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 7989d7a475..c6df285194 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1370,8 +1370,31 @@ rte_mempool_cache_flush(struct rte_mempool_cache *cache,
>   	cache->len = 0;
>   }
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal
> + * Put several objects back in the mempool, more than the cache has room for; used internally.
> + *
> + * @param mp
> + *   A pointer to the mempool structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to store back in the mempool, must be strictly
> + *   positive.
> + * @param cache
> + *   A pointer to a mempool cache structure.
> + */
> +__rte_experimental
> +void
> +_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void * const *obj_table,
> +		unsigned int n, struct rte_mempool_cache *cache);
> +
>   /**
>    * @internal Put several objects back in the mempool; used internally.
> + *
>    * @param mp
>    *   A pointer to the mempool structure.
>    * @param obj_table
> @@ -1388,9 +1411,16 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   {
>   	void **cache_objs;
>   
> -	/* No cache provided? */
> -	if (unlikely(cache == NULL))
> -		goto driver_enqueue;
> +	if (unlikely(cache == NULL)) {

The patch summary says it de-inlines unlikely code, but you still have
it inline here. Maybe it is better to be consistent and handle this
case in the de-inlined code as well.

> +		/* No cache. Push objects directly to the backend. */
> +		/* Increment stats now, adding in mempool always succeeds. */
> +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +
> +		rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> +
> +		return;
> +	}
>   
>   	/* Increment stats now, adding in mempool always succeeds. */
>   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> @@ -1403,35 +1433,43 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   		/* Sufficient room in the cache for the objects. */
>   		cache_objs = &cache->objs[cache->len];
>   		cache->len += n;
> -	} else if (n <= cache->flushthresh) {
> +
> +cache_enqueue:
> +#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
>   		/*
> -		 * The cache is big enough for the objects, but - as detected by
> -		 * the comparison above - has insufficient room for them.
> -		 * Flush the cache to make room for the objects.
> +		 * Add the objects to the cache.
> +		 * If the request size is known at build time,
> +		 * the compiler unrolls the fixed length copy loop.
>   		 */
> -		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> -		cache->len = n;
> -	} else {
> -		/* The request itself is too big for the cache. */
> -		goto driver_enqueue_stats_incremented;
> -	}
> -
> -	/* Add the objects to the cache. */
> -	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> +		for (uint32_t index = 0; index < n; index++)
> +			*cache_objs++ = *obj_table++;
> +#else
> +		/* Add the objects to the cache. */
> +		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> +#endif
>   
> -	return;
> +		return;
> +	}
>   
> -driver_enqueue:
> +	if (__rte_constant(n) && likely(n <= cache->flushthresh)) {
> +		uint32_t len;
>   
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +		/*
> +		 * The cache is big enough for the objects, but - as detected
> +		 * above - has insufficient room for them.
> +		 * Flush the cache to make room for the objects.
> +		 */
> +		len = cache->len;
> +		cache_objs = &cache->objs[0];
> +		cache->len = n;
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
>   
> -driver_enqueue_stats_incremented:
> +		/* Add the objects to the cache. */
> +		goto cache_enqueue;
> +	}
>   
> -	/* push objects to the backend */
> -	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> +	/* Insufficient room in the cache for the objects. */
> +	_rte_mempool_do_generic_put_more(mp, obj_table, n, cache);
>   }
>   
>   
> @@ -1498,8 +1536,33 @@ rte_mempool_put(struct rte_mempool *mp, void *obj)
>   	rte_mempool_put_bulk(mp, &obj, 1);
>   }
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal
> + * Get several objects from the mempool, more than held in the cache; used internally.
> + *
> + * @param mp
> + *   A pointer to the mempool structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to get, must be strictly positive.
> + * @param cache
> + *   A pointer to a mempool cache structure.
> + * @return
> + *   - 0: Success.
> + *   - <0: Error; code of driver dequeue function.
> + */
> +__rte_experimental
> +int
> +_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void **obj_table,
> +		unsigned int n, struct rte_mempool_cache *cache);
> +
>   /**
>    * @internal Get several objects from the mempool; used internally.
> + *
>    * @param mp
>    *   A pointer to the mempool structure.
>    * @param obj_table
> @@ -1516,26 +1579,36 @@ static __rte_always_inline int
>   rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   			   unsigned int n, struct rte_mempool_cache *cache)
>   {
> -	int ret;
> -	unsigned int remaining;
> -	uint32_t index, len;
> -	void **cache_objs;
> -
> -	/* No cache provided? */
>   	if (unlikely(cache == NULL)) {

The patch summary says it de-inlines unlikely code, but you still have
it inline here.

> -		remaining = n;
> -		goto driver_dequeue;
> -	}
> +		int ret;
> +
> +		/* No cache. Get objects directly from the backend. */
> +		ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);
> +
> +		if (unlikely(ret < 0)) {
> +			RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> +			RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> +		} else {
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +			__rte_assume(ret == 0);
> +		}
>   
> -	/* The cache is a stack, so copy will be in reverse order. */
> -	cache_objs = &cache->objs[cache->len];
> +		return ret;
> +	}
>   
>   	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
>   	if (likely(n <= cache->len)) {
> +		uint32_t index;
> +		void **cache_objs;
> +
>   		/* The entire request can be satisfied from the cache. */
>   		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
>   		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>   
> +		/* The cache is a stack, so copy will be in reverse order. */
> +		cache_objs = &cache->objs[cache->len];
> +
>   		/*
>   		 * If the request size is known at build time,
>   		 * the compiler unrolls the fixed length copy loop.
> @@ -1547,71 +1620,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   		return 0;
>   	}
>   
> -	/* Use the cache as much as we have to return hot objects first. */
> -	len = cache->len;
> -	remaining = n - len;
> -	cache->len = 0;
> -	for (index = 0; index < len; index++)
> -		*obj_table++ = *--cache_objs;
> -
> -	/* Dequeue below would overflow mem allocated for cache? */
> -	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
> -		goto driver_dequeue;
> -
> -	/* Fill the cache from the backend; fetch size + remaining objects. */
> -	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
> -			cache->size + remaining);
> -	if (unlikely(ret < 0)) {
> -		/*
> -		 * We are buffer constrained, and not able to fetch all that.
> -		 * Do not fill the cache, just satisfy the remaining part of
> -		 * the request directly from the backend.
> -		 */
> -		goto driver_dequeue;
> -	}
> -
> -	/* Satisfy the remaining part of the request from the filled cache. */
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -
> -	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> -	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> -	cache_objs = &cache->objs[cache->size + remaining];
> -	cache->len = cache->size;
> -	for (index = 0; index < remaining; index++)
> -		*obj_table++ = *--cache_objs;
> -
> -	return 0;
> -
> -driver_dequeue:
> -
> -	/* Get remaining objects directly from the backend. */
> -	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
> -
> -	if (unlikely(ret < 0)) {
> -		if (likely(cache != NULL)) {
> -			cache->len = n - remaining;
> -			/*
> -			 * No further action is required to roll the first part
> -			 * of the request back into the cache, as objects in
> -			 * the cache are intact.
> -			 */
> -		}
> -
> -		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> -	} else {
> -		if (likely(cache != NULL)) {
> -			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> -			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> -		} else {
> -			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> -		}
> -		__rte_assume(ret == 0);
> -	}
> -
> -	return ret;
> +	/* The entire request cannot be satisfied from the cache. */
> +	return _rte_mempool_do_generic_get_more(mp, obj_table, n, cache);
>   }
>   
>   /**


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths
  2026-02-17  6:37     ` Andrew Rybchenko
@ 2026-03-13 15:27       ` Morten Brørup
  0 siblings, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-03-13 15:27 UTC (permalink / raw)
  To: Andrew Rybchenko, dev

> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Tuesday, 17 February 2026 07.37
> 
> On 2/16/26 6:23 PM, Morten Brørup wrote:
> > De-inline unlikely code paths, for smaller footprint.
> 
> The idea is interesting and makes sense to me. But could you share
> performance figures so we know the impact?
> 
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> > v3:
> > * New functions are called from inline code, so make them
> experimental
> >    instead of internal.
> > v2:
> > * Removed review functions.
> > * Changed #if 0 to #if AVOID_RTE_MEMCPY.
> > ---
> >   lib/mempool/rte_mempool.c | 112 ++++++++++++++++++++
> >   lib/mempool/rte_mempool.h | 212 ++++++++++++++++++++---------------
> ---
> >   2 files changed, 223 insertions(+), 101 deletions(-)
> >
> > diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> > index 3042d94c14..30dce3a2fd 100644
> > --- a/lib/mempool/rte_mempool.c
> > +++ b/lib/mempool/rte_mempool.c
> > @@ -1016,6 +1016,118 @@ rte_mempool_create(const char *name, unsigned
> n, unsigned elt_size,
> >   	return NULL;
> >   }
> >
> > +/* internal */
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(_rte_mempool_do_generic_put_more,
> 26.03)
> > +void
> > +_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void *
> const *obj_table,
> > +		unsigned int n, struct rte_mempool_cache *cache)
> > +{
> 
> I'd add comments explaining why stats are not updated by this
> function. It is a drawback of the solution, so at the very least
> comments should be added to make it clear. The stats update would
> be very easy to lose in future changes.
> 
> > +	__rte_assume(cache->flushthresh <= RTE_MEMPOOL_CACHE_MAX_SIZE *
> 2);
> > +	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> > +	__rte_assume(cache->len <= cache->flushthresh);
> > +	__rte_assume(cache->len + n > cache->flushthresh);
> > +	if (likely(n <= cache->flushthresh)) {
> > +		uint32_t len;
> > +		void **cache_objs;
> > +
> > +		/*
> > +		 * The cache is big enough for the objects, but - as
> detected by
> > +		 * rte_mempool_do_generic_put() - has insufficient room for
> them.
> > +		 * Flush the cache to make room for the objects.
> > +		 */
> > +		len = cache->len;
> > +		cache_objs = &cache->objs[0];
> > +		cache->len = n;
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
> > +
> > +		/* Add the objects to the cache. */
> > +#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
> 
> I'd not mix the introduction of AVOID_RTE_MEMCPY with the other goals
> of the patch. If AVOID_RTE_MEMCPY is really useful, it could be added
> separately and appropriately motivated.
> 
> > +		for (uint32_t index = 0; index < n; index++)
> > +			*cache_objs++ = *obj_table++;
> > +#else
> > +		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> > +#endif
> > +
> > +		return;
> > +	}
> > +
> > +	/* The request itself is too big for the cache. Push objects
> directly to the backend. */
> > +	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > +}
> > +
> > +/* internal */
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(_rte_mempool_do_generic_get_more,
> 26.03)
> > +int
> > +_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void
> **obj_table,
> > +		unsigned int n, struct rte_mempool_cache *cache)
> > +{
> > +	int ret;
> > +	unsigned int remaining;
> > +	uint32_t index, len;
> > +	void **cache_objs;
> > +
> > +	/* Use the cache as much as we have to return hot objects first.
> */
> > +	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> > +	len = cache->len;
> > +	remaining = n - len;
> > +	cache_objs = &cache->objs[len];
> > +	cache->len = 0;
> > +	for (index = 0; index < len; index++)
> > +		*obj_table++ = *--cache_objs;
> > +
> > +	/* Dequeue below would overflow mem allocated for cache? */
> > +	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
> > +		goto driver_dequeue;
> > +
> > +	/* Fill the cache from the backend; fetch size + remaining
> objects. */
> > +	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
> > +			cache->size + remaining);
> > +	if (unlikely(ret < 0)) {
> > +		/*
> > +		 * We are buffer constrained, and not able to fetch all
> that.
> > +		 * Do not fill the cache, just satisfy the remaining part
> of
> > +		 * the request directly from the backend.
> > +		 */
> > +		goto driver_dequeue;
> > +	}
> > +
> > +	/* Satisfy the remaining part of the request from the filled
> cache. */
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +
> > +	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > +	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > +	cache_objs = &cache->objs[cache->size + remaining];
> > +	cache->len = cache->size;
> > +	for (index = 0; index < remaining; index++)
> > +		*obj_table++ = *--cache_objs;
> > +
> > +	return 0;
> > +
> > +driver_dequeue:
> > +
> > +	/* Get remaining objects directly from the backend. */
> > +	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
> > +
> > +	if (unlikely(ret < 0)) {
> > +		cache->len = n - remaining;
> > +		/*
> > +		 * No further action is required to roll the first part
> > +		 * of the request back into the cache, as objects in
> > +		 * the cache are intact.
> > +		 */
> > +
> > +		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > +		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > +	} else {
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +		__rte_assume(ret == 0);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> >   /* Return the number of entries in the mempool */
> >   RTE_EXPORT_SYMBOL(rte_mempool_avail_count)
> >   unsigned int
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 7989d7a475..c6df285194 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -1370,8 +1370,31 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
> >   	cache->len = 0;
> >   }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * @internal
> > + * Put several objects back in the mempool, more than the cache has
> room for; used internally.
> > + *
> > + * @param mp
> > + *   A pointer to the mempool structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to store back in the mempool, must be
> strictly
> > + *   positive.
> > + * @param cache
> > + *   A pointer to a mempool cache structure.
> > + */
> > +__rte_experimental
> > +void
> > +_rte_mempool_do_generic_put_more(struct rte_mempool *mp, void *
> const *obj_table,
> > +		unsigned int n, struct rte_mempool_cache *cache);
> > +
> >   /**
> >    * @internal Put several objects back in the mempool; used
> internally.
> > + *
> >    * @param mp
> >    *   A pointer to the mempool structure.
> >    * @param obj_table
> > @@ -1388,9 +1411,16 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >   {
> >   	void **cache_objs;
> >
> > -	/* No cache provided? */
> > -	if (unlikely(cache == NULL))
> > -		goto driver_enqueue;
> > +	if (unlikely(cache == NULL)) {
> 
> The patch summary says it de-inlines unlikely code, but you still
> have it inline here. Maybe it is better to be consistent and handle
> this case in the de-inlined code as well.
> 
> > +		/* No cache. Push objects directly to the backend. */
> > +		/* Increment stats now, adding in mempool always succeeds.
> */
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > +
> > +		rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > +
> > +		return;
> > +	}
> >
> >   	/* Increment stats now, adding in mempool always succeeds. */
> >   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > @@ -1403,35 +1433,43 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >   		/* Sufficient room in the cache for the objects. */
> >   		cache_objs = &cache->objs[cache->len];
> >   		cache->len += n;
> > -	} else if (n <= cache->flushthresh) {
> > +
> > +cache_enqueue:
> > +#ifdef AVOID_RTE_MEMCPY /* Simple alternative to rte_memcpy(). */
> >   		/*
> > -		 * The cache is big enough for the objects, but - as
> detected by
> > -		 * the comparison above - has insufficient room for them.
> > -		 * Flush the cache to make room for the objects.
> > +		 * Add the objects to the cache.
> > +		 * If the request size is known at build time,
> > +		 * the compiler unrolls the fixed length copy loop.
> >   		 */
> > -		cache_objs = &cache->objs[0];
> > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> > -		cache->len = n;
> > -	} else {
> > -		/* The request itself is too big for the cache. */
> > -		goto driver_enqueue_stats_incremented;
> > -	}
> > -
> > -	/* Add the objects to the cache. */
> > -	rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> > +		for (uint32_t index = 0; index < n; index++)
> > +			*cache_objs++ = *obj_table++;
> > +#else
> > +		/* Add the objects to the cache. */
> > +		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
> > +#endif
> >
> > -	return;
> > +		return;
> > +	}
> >
> > -driver_enqueue:
> > +	if (__rte_constant(n) && likely(n <= cache->flushthresh)) {
> > +		uint32_t len;
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > +		/*
> > +		 * The cache is big enough for the objects, but - as
> detected
> > +		 * above - has insufficient room for them.
> > +		 * Flush the cache to make room for the objects.
> > +		 */
> > +		len = cache->len;
> > +		cache_objs = &cache->objs[0];
> > +		cache->len = n;
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, len);
> >
> > -driver_enqueue_stats_incremented:
> > +		/* Add the objects to the cache. */
> > +		goto cache_enqueue;
> > +	}
> >
> > -	/* push objects to the backend */
> > -	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> > +	/* Insufficient room in the cache for the objects. */
> > +	_rte_mempool_do_generic_put_more(mp, obj_table, n, cache);
> >   }
> >
> >
> > @@ -1498,8 +1536,33 @@ rte_mempool_put(struct rte_mempool *mp, void
> *obj)
> >   	rte_mempool_put_bulk(mp, &obj, 1);
> >   }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * @internal
> > + * Get several objects from the mempool, more than held in the
> cache; used internally.
> > + *
> > + * @param mp
> > + *   A pointer to the mempool structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to get, must be strictly positive.
> > + * @param cache
> > + *   A pointer to a mempool cache structure.
> > + * @return
> > + *   - 0: Success.
> > + *   - <0: Error; code of driver dequeue function.
> > + */
> > +__rte_experimental
> > +int
> > +_rte_mempool_do_generic_get_more(struct rte_mempool *mp, void
> **obj_table,
> > +		unsigned int n, struct rte_mempool_cache *cache);
> > +
> >   /**
> >    * @internal Get several objects from the mempool; used internally.
> > + *
> >    * @param mp
> >    *   A pointer to the mempool structure.
> >    * @param obj_table
> > @@ -1516,26 +1579,36 @@ static __rte_always_inline int
> >   rte_mempool_do_generic_get(struct rte_mempool *mp, void
> **obj_table,
> >   			   unsigned int n, struct rte_mempool_cache *cache)
> >   {
> > -	int ret;
> > -	unsigned int remaining;
> > -	uint32_t index, len;
> > -	void **cache_objs;
> > -
> > -	/* No cache provided? */
> >   	if (unlikely(cache == NULL)) {
> 
> The patch summary says it de-inlines unlikely code, but you still
> have it inline here.
> 
> > -		remaining = n;
> > -		goto driver_dequeue;
> > -	}
> > +		int ret;
> > +
> > +		/* No cache. Get objects directly from the backend. */
> > +		ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);
> > +
> > +		if (unlikely(ret < 0)) {
> > +			RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > +			RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > +		} else {
> > +			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> > +			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> > +			__rte_assume(ret == 0);
> > +		}
> >
> > -	/* The cache is a stack, so copy will be in reverse order. */
> > -	cache_objs = &cache->objs[cache->len];
> > +		return ret;
> > +	}
> >
> >   	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE * 2);
> >   	if (likely(n <= cache->len)) {
> > +		uint32_t index;
> > +		void **cache_objs;
> > +
> >   		/* The entire request can be satisfied from the cache. */
> >   		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> >   		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> >
> > +		/* The cache is a stack, so copy will be in reverse order. */
> > +		cache_objs = &cache->objs[cache->len];
> > +
> >   		/*
> >   		 * If the request size is known at build time,
> >   		 * the compiler unrolls the fixed length copy loop.
> > @@ -1547,71 +1620,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
> >   		return 0;
> >   	}
> >
> > -	/* Use the cache as much as we have to return hot objects first. */
> > -	len = cache->len;
> > -	remaining = n - len;
> > -	cache->len = 0;
> > -	for (index = 0; index < len; index++)
> > -		*obj_table++ = *--cache_objs;
> > -
> > -	/* Dequeue below would overflow mem allocated for cache? */
> > -	if (unlikely(remaining > RTE_MEMPOOL_CACHE_MAX_SIZE))
> > -		goto driver_dequeue;
> > -
> > -	/* Fill the cache from the backend; fetch size + remaining objects. */
> > -	ret = rte_mempool_ops_dequeue_bulk(mp, cache->objs,
> > -			cache->size + remaining);
> > -	if (unlikely(ret < 0)) {
> > -		/*
> > -		 * We are buffer constrained, and not able to fetch all that.
> > -		 * Do not fill the cache, just satisfy the remaining part of
> > -		 * the request directly from the backend.
> > -		 */
> > -		goto driver_dequeue;
> > -	}
> > -
> > -	/* Satisfy the remaining part of the request from the filled cache. */
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > -
> > -	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > -	__rte_assume(remaining <= RTE_MEMPOOL_CACHE_MAX_SIZE);
> > -	cache_objs = &cache->objs[cache->size + remaining];
> > -	cache->len = cache->size;
> > -	for (index = 0; index < remaining; index++)
> > -		*obj_table++ = *--cache_objs;
> > -
> > -	return 0;
> > -
> > -driver_dequeue:
> > -
> > -	/* Get remaining objects directly from the backend. */
> > -	ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
> > -
> > -	if (unlikely(ret < 0)) {
> > -		if (likely(cache != NULL)) {
> > -			cache->len = n - remaining;
> > -			/*
> > -			 * No further action is required to roll the first part
> > -			 * of the request back into the cache, as objects in
> > -			 * the cache are intact.
> > -			 */
> > -		}
> > -
> > -		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> > -		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> > -	} else {
> > -		if (likely(cache != NULL)) {
> > -			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > -			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > -		} else {
> > -			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> > -			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> > -		}
> > -		__rte_assume(ret == 0);
> > -	}
> > -
> > -	return ret;
> > +	/* The entire request cannot be satisfied from the cache. */
> > +	return _rte_mempool_do_generic_get_more(mp, obj_table, n, cache);
> >   }
> >
> >   /**

Good feedback, Andrew.
I'll mark as changes requested, and follow up with a new version later.

-Morten
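
The pattern the RFC applies (an always-inline fast path in the header that
delegates the unlikely path to an out-of-line helper) can be sketched roughly
as follows; the names and the refill logic are hypothetical, not the actual
DPDK code:

```c
#include <assert.h>
#include <stddef.h>

/* Cold path, compiled out of line (one copy in the library). */
static int pool_get_slow(unsigned int *cache_len, void **obj_table,
		unsigned int n)
{
	(void)obj_table;
	/* Placeholder: drain the cache and pretend the backend can
	 * supply a shortfall of up to 16 objects. */
	*cache_len = 0;
	return n <= 16 ? 0 : -1;
}

/* Hot path, kept inline in the header; it is duplicated at every call
 * site, so it should stay as small as possible. */
static inline int pool_get(unsigned int *cache_len, void **obj_table,
		unsigned int n)
{
	if (n <= *cache_len) {		/* the likely case */
		*cache_len -= n;
		return 0;
	}
	return pool_get_slow(cache_len, obj_table, n);	/* unlikely */
}
```

Every call site then carries only the short fast path; the cold path exists
once in the library, which is where the footprint reduction claimed in the
cover letter comes from.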



* RE: [RFC PATCH v3 1/2] mempool: simplify get objects
  2026-02-17  6:19     ` Andrew Rybchenko
@ 2026-03-13 15:36       ` Morten Brørup
  0 siblings, 0 replies; 15+ messages in thread
From: Morten Brørup @ 2026-03-13 15:36 UTC (permalink / raw)
  To: dev; +Cc: Andrew Rybchenko

> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Tuesday, 17 February 2026 07.19
> 
> Hi Morten,
> 
> On 2/16/26 6:23 PM, Morten Brørup wrote:
> > Removed explicit test for build time constant request size,
> > and added comment that the compiler loop unrolls when request size is
> > build time constant, to improve source code readability.
> >
> > Moved setting cache->len up before the copy loop; not only for code
> > similarity (cache->len is now set before each copy loop), but also as an
> > optimization:
> > The function's pointer parameters are not marked restrict, so writing to
> > obj_table in the copy loop might formally modify cache->size. And thus,
> > setting cache->len = cache->size after the copy loop requires loading
> > cache->size again after copying the objects.
> > Moving this line up before the copy loop avoids that extra load of
> > cache->size when setting cache->len.
> >
> > Similarly, moved statistics update up before the copy loops.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> 
> LGTM, the result looks simpler, easier to read and understand.
> If there is no measurable performance degradation after the patch
> 
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 

There's a real patch [1] for this RFC patch 1/2.
Marking this RFC patch 1/2 as Superseded.

[1]: https://patchwork.dpdk.org/project/dpdk/patch/20260216092750.94137-1-mb@smartsharesystems.com/
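
The aliasing argument in the quoted commit message can be illustrated with a
miniature example; the struct and function below are hypothetical stand-ins,
not the real rte_mempool structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a mempool cache, for illustration only. */
struct tiny_cache {
	unsigned int size;
	unsigned int len;
	void *objs[8];
};

/*
 * Satisfy a request entirely from the cache. obj_table is not
 * restrict-qualified, so stores through it could formally alias the
 * cache fields; updating cache->len before the copy loop means the
 * compiler does not have to reload any cache field afterwards.
 */
static int tiny_cache_get(struct tiny_cache *cache, void **obj_table,
		unsigned int n)
{
	void **cache_objs;
	unsigned int index;

	if (n > cache->len)
		return -1;	/* would go to the backend; out of scope here */

	/* The cache is a stack, so the copy is in reverse order. */
	cache_objs = &cache->objs[cache->len];
	cache->len -= n;	/* updated before, not after, the copy loop */
	for (index = 0; index < n; index++)
		*obj_table++ = *--cache_objs;

	return 0;
}
```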



end of thread, other threads:[~2026-03-13 15:36 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-16 11:58 [RFC PATCH 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
2026-02-16 11:58 ` [RFC PATCH 1/2] mempool: simplify get objects Morten Brørup
2026-02-16 11:58 ` [RFC PATCH 2/2] mempool: de-inline get/put objects unlikely code paths Morten Brørup
2026-02-16 13:13 ` [RFC PATCH v2 0/2] mempool: de-inline get/put objects unlikely code Morten Brørup
2026-02-16 13:13   ` [RFC PATCH v2 1/2] mempool: simplify get objects Morten Brørup
2026-02-16 13:13   ` [RFC PATCH v2 2/2] mempool: de-inline get/put objects unlikely code paths Morten Brørup
2026-02-16 15:23 ` [RFC PATCH v3 0/2] mempool: de-inline get/put " Morten Brørup
2026-02-16 15:23   ` [RFC PATCH v3 1/2] mempool: simplify get objects Morten Brørup
2026-02-17  6:19     ` Andrew Rybchenko
2026-03-13 15:36       ` Morten Brørup
2026-02-16 15:23   ` [RFC PATCH v3 2/2] mempool: de-inline get/put unlikely code paths Morten Brørup
2026-02-16 17:35     ` Stephen Hemminger
2026-02-16 19:59       ` Morten Brørup
2026-02-17  6:37     ` Andrew Rybchenko
2026-03-13 15:27       ` Morten Brørup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox