* [PATCH] net/intel: do not bypass mbuf lib for mbuf fast-free
@ 2026-04-18 9:56 Morten Brørup
2026-04-19 6:29 ` Morten Brørup
2026-04-21 10:34 ` Bruce Richardson
0 siblings, 2 replies; 5+ messages in thread
From: Morten Brørup @ 2026-04-18 9:56 UTC (permalink / raw)
To: dev, Bruce Richardson; +Cc: Morten Brørup
Freeing mbufs directly into the mempool meant that mbuf instrumentation,
including mbuf history marking, was omitted.
The mbufs are now freed via the rte_mbuf_raw_free_bulk() function instead.
Added a static_assert to ensure that type casting the array of struct
ci_tx_entry_vec to an array of rte_mbuf pointers remains sound.
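A minimal, self-contained sketch of the layout check described above. It uses hypothetical stand-in types (toy_mbuf, toy_tx_entry_vec) instead of the real DPDK structs and assumes the Tx entry holds nothing but an mbuf pointer, which is what the static_assert in the patch enforces. The offsetof() check is an extra illustration, not part of the patch, and memcpy() is used here to make the reinterpretation explicit, whereas the patch passes (void *)txep straight to rte_mbuf_raw_free_bulk().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: toy_mbuf plays the role of struct rte_mbuf,
 * toy_tx_entry_vec mirrors a one-member Tx entry holding only an mbuf pointer. */
struct toy_mbuf { int id; };
struct toy_tx_entry_vec { struct toy_mbuf *mbuf; };

/* Treating the entry array as a pointer array is only sound if each entry is
 * exactly one pointer wide with no padding; otherwise fail the build. */
static_assert(sizeof(struct toy_tx_entry_vec) == sizeof(struct toy_mbuf *),
	"entry array is not similar to an array of mbuf pointers");
static_assert(offsetof(struct toy_tx_entry_vec, mbuf) == 0,
	"mbuf pointer must be the first (and only) member");

/* Copy n entries out as a flat array of mbuf pointers, the form a bulk-free
 * function consumes. */
static void entries_to_ptrs(struct toy_mbuf **dst,
		const struct toy_tx_entry_vec *src, size_t n)
{
	memcpy(dst, src, n * sizeof(*dst));
}
```

If a field is ever added to the entry struct, the build fails at the static_assert instead of the bulk free silently reading the wrong words as mbuf pointers.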
Performance note:
The (n & 31) == 0 condition was not removed.
For the default tx_rs_thresh value (32), the condition is true.
Due to inlining, rte_mbuf_raw_free_bulk() ends up in an rte_memcpy(),
where the optimizer can take advantage of knowing that the lower five
bits of n are not set.
This should compensate somewhat for removing the handcoded optimization of
copying in chunks of 32 mbufs.
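To illustrate the performance note, the plain C sketch below (no DPDK; function names are hypothetical) contrasts the removed hand-coded copy in chunks of 32 pointers with a single bulk copy behind the same (n & 31) == 0 guard. Whether a particular compiler actually exploits the known-zero low bits after inlining is compiler-dependent; the sketch only shows that both copies produce the same result when n is a multiple of 32.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Removed style: copy in fixed chunks of 32 pointers.
 * Only correct when n is a multiple of 32. */
static void copy_chunked32(void **dst, void *const *src, size_t n)
{
	assert((n & 31) == 0);
	for (size_t copied = 0; copied < n; copied += 32)
		memcpy(&dst[copied], &src[copied], 32 * sizeof(void *));
}

/* New style: one bulk copy of all n pointers, guarded by the same check.
 * Returns 0 if n is not a multiple of 32 (caller uses the per-mbuf path). */
static int copy_if_multiple_of_32(void **dst, void *const *src, size_t n)
{
	if ((n & 31) != 0)
		return 0;
	memcpy(dst, src, n * sizeof(void *));
	return 1;
}
```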
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
doc/guides/rel_notes/release_26_07.rst | 4 +++
drivers/net/intel/common/tx.h | 36 +++-----------------------
2 files changed, 7 insertions(+), 33 deletions(-)
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index 060b26ff61..9367d38b13 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -24,6 +24,10 @@ DPDK Release 26.07
New Features
------------
+* **Updated Intel common driver.**
+
+ * Added missing mbuf history marking to vectorized Tx path for MBUF_FAST_FREE.
+
.. This section should contain new features added in this release.
Sample format:
diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
index 283bd58d5d..4a201da83c 100644
--- a/drivers/net/intel/common/tx.h
+++ b/drivers/net/intel/common/tx.h
@@ -285,42 +285,12 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
(txq->fast_free_mp = txep[0].mbuf->pool);
if (mp != NULL && (n & 31) == 0) {
- void **cache_objs;
- struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
-
- if (cache == NULL)
- goto normal;
-
- cache_objs = &cache->objs[cache->len];
-
- if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
- rte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);
- goto done;
- }
-
- /* The cache follows the following algorithm
- * 1. Add the objects to the cache
- * 2. Anything greater than the cache min value (if it
- * crosses the cache flush threshold) is flushed to the ring.
- */
- /* Add elements back into the cache */
- uint32_t copied = 0;
- /* n is multiple of 32 */
- while (copied < n) {
- memcpy(&cache_objs[copied], &txep[copied], 32 * sizeof(void *));
- copied += 32;
- }
- cache->len += n;
-
- if (cache->len >= cache->flushthresh) {
- rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
- cache->len - cache->size);
- cache->len = cache->size;
- }
+ static_assert(sizeof(*txep) == sizeof(struct rte_mbuf *),
+ "txep array is not similar to an array of rte_mbuf pointers");
+ rte_mbuf_raw_free_bulk(mp, (void *)txep, n);
goto done;
}
-normal:
m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
if (likely(m)) {
free[0] = m;
--
2.43.0
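For reference, the sketch below models the logic of the removed hand-coded path: append to the per-lcore cache, flush everything above the cache size to the backing ring once the flush threshold is reached, and bypass the cache for oversized bursts. All names are hypothetical stand-ins (toy_cache for struct rte_mempool_cache, toy_ring_enqueue() for rte_mempool_ops_enqueue_bulk(), TOY_CACHE_MAX for RTE_MEMPOOL_CACHE_MAX_SIZE), and the real mempool library may differ in detail. Going through rte_mbuf_raw_free_bulk() instead also runs the mbuf instrumentation that this shortcut skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_MAX 512  /* hypothetical stand-in for RTE_MEMPOOL_CACHE_MAX_SIZE */

struct toy_cache {
	unsigned int size;         /* fill level to return to after a flush */
	unsigned int flushthresh;  /* flush once len reaches this */
	unsigned int len;          /* objects currently cached */
	void *objs[TOY_CACHE_MAX * 2];
};

/* Counts objects handed to the backing store (the mempool ring). */
static size_t toy_ring_count;

static void toy_ring_enqueue(void *const *objs, unsigned int n)
{
	(void)objs;
	toy_ring_count += n;
}

static void toy_cache_put(struct toy_cache *c, void *const *objs, unsigned int n)
{
	if (n > TOY_CACHE_MAX) {
		/* Burst too large for the cache: go straight to the ring. */
		toy_ring_enqueue(objs, n);
		return;
	}
	assert(c->len + n <= sizeof(c->objs) / sizeof(c->objs[0]));
	memcpy(&c->objs[c->len], objs, n * sizeof(void *));
	c->len += n;
	if (c->len >= c->flushthresh) {
		/* Flush everything above 'size' back to the ring. */
		toy_ring_enqueue(&c->objs[c->size], c->len - c->size);
		c->len = c->size;
	}
}
```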
* RE: [PATCH] net/intel: do not bypass mbuf lib for mbuf fast-free
From: Morten Brørup @ 2026-04-19 6:29 UTC (permalink / raw)
To: dev
Recheck-request: iol-unit-amd64-testing
FYI, the log from the failure:
54/143 DPDK:fast-tests / eal_flags_misc_autotest FAIL 1.21s (exit status 255 or signal 127 SIGinvalid)
>>> DPDK_TEST=eal_flags_misc_autotest MALLOC_PERTURB_=27 /root/workspace/Generic-Unit-Test-DPDK/dpdk/build/app/dpdk-test --file-prefix=eal_flags_misc_autotest
-------------------------------------------------------------------------- --------------------------------------------------------------------------
[...]
Calling recursive action for test_misc_flags
Returned from recursive action for test_misc_flags with 0
Cleaning up /root/workspace/Generic-Unit-Test-DPDK/dpdk/build/app/dpdk-test recursive instance
/root/workspace/Generic-Unit-Test-DPDK/dpdk/build/app/dpdk-test recursive instance returning 0
Running binary with argv[]:'/root/workspace/Generic-Unit-Test-DPDK/dpdk/build/app/dpdk-test' '--file-prefix=virtaddr' '--log-level=lib.eal:debug' '--no-pci' '--base-virtaddr=0x23456789'
EAL: lib.eal log level changed from info to debug
EAL: Detected lcore 0 as core 0 on NUMA node 0
EAL: Detected lcore 1 as core 0 on NUMA node 0
EAL: Detected lcore 2 as core 0 on NUMA node 0
EAL: Detected lcore 3 as core 0 on NUMA node 0
EAL: Detected lcore 4 as core 0 on NUMA node 0
EAL: Detected lcore 5 as core 0 on NUMA node 0
EAL: Detected lcore 6 as core 0 on NUMA node 0
EAL: Detected lcore 7 as core 0 on NUMA node 0
EAL: Detected lcore 8 as core 0 on NUMA node 1
EAL: Detected lcore 9 as core 0 on NUMA node 1
EAL: Detected lcore 10 as core 0 on NUMA node 1
EAL: Detected lcore 11 as core 0 on NUMA node 1
EAL: Detected lcore 12 as core 0 on NUMA node 1
EAL: Detected lcore 13 as core 0 on NUMA node 1
EAL: Detected lcore 14 as core 0 on NUMA node 1
EAL: Detected lcore 15 as core 0 on NUMA node 1
EAL: Maximum logical cores by configuration: 128
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 2
EAL: Cores selected by affinity auto-detection: 0
EAL: lcore 0 mapped to physical core 0
EAL: Control threads will use cores: 0
EAL: Checking presence of .so 'librte_eal.so.26.2'
EAL: Checking presence of .so 'librte_eal.so.26'
EAL: Checking presence of .so 'librte_eal.so'
EAL: Detected static linkage of DPDK
EAL: Ask a virtual area of 0x7000 bytes
EAL: Cannot get a virtual area at requested address: 0x23ff9000 (got 0x7fa029a47000)
EAL: Cannot mmap memory for rte_config
EAL: Cannot init config
EAL: rte_intr_fd_get, ln 209: Interrupt instance unallocated
EAL: Unregistering with invalid input parameter
Error with EAL initialization, ret = -1
Error (line 1101) - process did not run ok with --base-virtaddr parameter
Test Failed
RTE>>
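The failure is in EAL initialization for the --base-virtaddr case, not in the Tx code touched by the patch: EAL asked for a virtual area at a specific address and got a different one. Assuming EAL passes the requested address to mmap() as a hint without MAP_FIXED, which the "Cannot get a virtual area at requested address ... (got ...)" line suggests, the sketch below shows how such a request can succeed at a different address. The helper names are hypothetical; mmap(), munmap() and sysconf() are standard POSIX calls.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Size of one page, a convenient mapping length. */
static size_t page_size(void)
{
	return (size_t)sysconf(_SC_PAGESIZE);
}

/* Ask for len bytes at hint WITHOUT MAP_FIXED: the kernel treats hint as a
 * suggestion and may place the mapping elsewhere (e.g. if the range is taken).
 * Returns the mapping, or NULL on failure; *honored is 1 only if the mapping
 * landed exactly at hint. */
static void *map_at_hint(void *hint, size_t len, int *honored)
{
	assert(honored != NULL);
	void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		*honored = 0;
		return NULL;
	}
	*honored = (p == hint);
	return p;
}
```

A caller that needs the exact address has to treat honored == 0 as an error and unmap the area, which is roughly the situation the log reports.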
* Re: [PATCH] net/intel: do not bypass mbuf lib for mbuf fast-free
From: Bruce Richardson @ 2026-04-21 10:34 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev
On Sat, Apr 18, 2026 at 09:56:38AM +0000, Morten Brørup wrote:
> Freeing mbufs directly into the mempool meant that mbuf instrumentation,
> including mbuf history marking, was omitted.
> The mbufs are now freed via the rte_mbuf_raw_free_bulk() function instead.
>
> Added a static_assert to ensure that type casting the array of struct
> ci_tx_entry_vec to an array of rte_mbuf pointers remains sound.
>
> Performance note:
> The (n & 31) condition was not removed.
> For the default tx_rs_thresh value (32), the condition will be true.
> And due to inlining, the rte_mbuf_raw_free_bulk() ends up in an
> rte_memcpy(), where the optimizer takes advantage of knowing that the
> lower bits are not set.
> This should compensate somewhat for removing the handcoded optimization of
> copying in chunks of 32 mbufs.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
Ran a very quick perf test using a couple of 100G ports, no regression
seen with this patch, maybe even a slight perf bump. Therefore:
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
One comment inline below:
> doc/guides/rel_notes/release_26_07.rst | 4 +++
> drivers/net/intel/common/tx.h | 36 +++-----------------------
> 2 files changed, 7 insertions(+), 33 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> index 060b26ff61..9367d38b13 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -24,6 +24,10 @@ DPDK Release 26.07
> New Features
> ------------
>
> +* **Updated Intel common driver.**
> +
> + * Added missing mbuf history marking to vectorized Tx path for MBUF_FAST_FREE.
> +
I don't think this is a big enough change to require a release note update.
It's really more of a bug fix. If you are ok with it, I'd like to drop this
RN entry on apply of the patch?
> .. This section should contain new features added in this release.
> Sample format:
>
> diff --git a/drivers/net/intel/common/tx.h b/drivers/net/intel/common/tx.h
> index 283bd58d5d..4a201da83c 100644
> --- a/drivers/net/intel/common/tx.h
> +++ b/drivers/net/intel/common/tx.h
> @@ -285,42 +285,12 @@ ci_tx_free_bufs_vec(struct ci_tx_queue *txq, ci_desc_done_fn desc_done, bool ctx
> (txq->fast_free_mp = txep[0].mbuf->pool);
>
> if (mp != NULL && (n & 31) == 0) {
> - void **cache_objs;
> - struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> -
> - if (cache == NULL)
> - goto normal;
> -
> - cache_objs = &cache->objs[cache->len];
> -
> - if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> - rte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);
> - goto done;
> - }
> -
> - /* The cache follows the following algorithm
> - * 1. Add the objects to the cache
> - * 2. Anything greater than the cache min value (if it
> - * crosses the cache flush threshold) is flushed to the ring.
> - */
> - /* Add elements back into the cache */
> - uint32_t copied = 0;
> - /* n is multiple of 32 */
> - while (copied < n) {
> - memcpy(&cache_objs[copied], &txep[copied], 32 * sizeof(void *));
> - copied += 32;
> - }
> - cache->len += n;
> -
> - if (cache->len >= cache->flushthresh) {
> - rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
> - cache->len - cache->size);
> - cache->len = cache->size;
> - }
> + static_assert(sizeof(*txep) == sizeof(struct rte_mbuf *),
> + "txep array is not similar to an array of rte_mbuf pointers");
> + rte_mbuf_raw_free_bulk(mp, (void *)txep, n);
> goto done;
> }
>
> -normal:
> m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
> if (likely(m)) {
> free[0] = m;
> --
> 2.43.0
>
* RE: [PATCH] net/intel: do not bypass mbuf lib for mbuf fast-free
From: Morten Brørup @ 2026-04-21 10:44 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Tuesday, 21 April 2026 12.35
>
> On Sat, Apr 18, 2026 at 09:56:38AM +0000, Morten Brørup wrote:
> > Freeing mbufs directly into the mempool meant that mbuf instrumentation,
> > including mbuf history marking, was omitted.
> > The mbufs are now freed via the rte_mbuf_raw_free_bulk() function instead.
> >
> > Added a static_assert to ensure that type casting the array of struct
> > ci_tx_entry_vec to an array of rte_mbuf pointers remains sound.
> >
> > Performance note:
> > The (n & 31) condition was not removed.
> > For the default tx_rs_thresh value (32), the condition will be true.
> > And due to inlining, the rte_mbuf_raw_free_bulk() ends up in an
> > rte_memcpy(), where the optimizer takes advantage of knowing that the
> > lower bits are not set.
> > This should compensate somewhat for removing the handcoded optimization of
> > copying in chunks of 32 mbufs.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
>
> Ran a very quick perf test using a couple of 100G ports, no regression
> seen with this patch, maybe even a slight perf bump. Therefore:
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Tested-by: Bruce Richardson <bruce.richardson@intel.com>
>
> One comment inline below:
>
> > doc/guides/rel_notes/release_26_07.rst | 4 +++
> > drivers/net/intel/common/tx.h | 36 +++-----------------------
> > 2 files changed, 7 insertions(+), 33 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> > index 060b26ff61..9367d38b13 100644
> > --- a/doc/guides/rel_notes/release_26_07.rst
> > +++ b/doc/guides/rel_notes/release_26_07.rst
> > @@ -24,6 +24,10 @@ DPDK Release 26.07
> > New Features
> > ------------
> >
> > +* **Updated Intel common driver.**
> > +
> > + * Added missing mbuf history marking to vectorized Tx path for MBUF_FAST_FREE.
> > +
>
> I don't think this is a big enough change to require a release note update.
> It's really more of a bug fix. If you are ok with it, I'd like to drop this
> RN entry on apply of the patch?
OK with me.
* Re: [PATCH] net/intel: do not bypass mbuf lib for mbuf fast-free
From: Bruce Richardson @ 2026-04-21 11:00 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev
On Tue, Apr 21, 2026 at 11:34:46AM +0100, Bruce Richardson wrote:
> On Sat, Apr 18, 2026 at 09:56:38AM +0000, Morten Brørup wrote:
> > Freeing mbufs directly into the mempool meant that mbuf instrumentation,
> > including mbuf history marking, was omitted.
> > The mbufs are now freed via the rte_mbuf_raw_free_bulk() function instead.
> >
> > Added a static_assert to ensure that type casting the array of struct
> > ci_tx_entry_vec to an array of rte_mbuf pointers remains sound.
> >
> > Performance note:
> > The (n & 31) condition was not removed.
> > For the default tx_rs_thresh value (32), the condition will be true.
> > And due to inlining, the rte_mbuf_raw_free_bulk() ends up in an
> > rte_memcpy(), where the optimizer takes advantage of knowing that the
> > lower bits are not set.
> > This should compensate somewhat for removing the handcoded optimization of
> > copying in chunks of 32 mbufs.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
>
> Ran a very quick perf test using a couple of 100G ports, no regression
> seen with this patch, maybe even a slight perf bump. Therefore:
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> Tested-by: Bruce Richardson <bruce.richardson@intel.com>
>
Applied to dpdk-next-net-intel.
Thanks,
/Bruce