* [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf
2025-08-23 6:29 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
@ 2025-08-23 6:30 ` Morten Brørup
2025-10-09 16:49 ` Bruce Richardson
2025-08-23 6:30 ` [PATCH v8 2/3] mbuf: promote raw free and alloc bulk functions as stable Morten Brørup
` (3 subsequent siblings)
4 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2025-08-23 6:30 UTC (permalink / raw)
To: dev, Thomas Monjalon, Stephen Hemminger, Bruce Richardson,
Konstantin Ananyev, Andrew Rybchenko, Ivan Malov, Chengwen Feng
Cc: Morten Brørup
Sanity checking a reinitialized mbuf (a.k.a. raw mbuf) has been refactored
to follow the same design pattern as sanity checking a normal mbuf, and
now depends on RTE_LIBRTE_MBUF_DEBUG instead of RTE_ENABLE_ASSERT.
The details of the changes are as follows:
Non-inlined functions rte_mbuf_raw_sanity_check() and rte_mbuf_raw_check()
have been added.
They do for a reinitialized mbuf what rte_mbuf_sanity_check() and
rte_mbuf_check() do for a normal mbuf.
They basically check the same conditions as __rte_mbuf_raw_sanity_check()
previously did, but use "if (!condition) rte_panic(message)" instead of
RTE_ASSERT(), so they don't depend on RTE_ENABLE_ASSERT.
The inline function __rte_mbuf_raw_sanity_check() has been replaced
by the new macro __rte_mbuf_raw_sanity_check_mp(), which either calls
rte_mbuf_raw_sanity_check() or does nothing, depending on
RTE_LIBRTE_MBUF_DEBUG, just like the __rte_mbuf_sanity_check() macro does
for a normal mbuf.
Note that the new macro __rte_mbuf_raw_sanity_check_mp() takes an optional
mempool parameter to verify that the mbuf belongs to the expected mbuf
pool.
This addition is mainly relevant for sanity checking reinitialized mbufs
freed directly into a given mempool by a PMD when using
RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE.
The macro __rte_mbuf_raw_sanity_check() has been kept for backwards API
compatibility.
It simply calls __rte_mbuf_raw_sanity_check_mp() without specifying a
mempool.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
v3:
* Removed experimental status for the new functions.
Experimental status is optional for new symbols, to allow for future API
changes. But this functionality has been around for a long time. (Stephen Hemminger)
* Consequently, the added build check for ALLOW_EXPERIMENTAL_API with
RTE_LIBRTE_MBUF_DEBUG is no longer needed, and was removed.
v2:
* Added explicit build check for ALLOW_EXPERIMENTAL_API being enabled when
RTE_LIBRTE_MBUF_DEBUG is enabled, with descriptive error message if not
so. (Ivan Malov)
* Fixed typo in patch description.
---
lib/mbuf/rte_mbuf.c | 61 ++++++++++++++++++++++++++++
lib/mbuf/rte_mbuf.h | 96 +++++++++++++++++++++++++++------------------
2 files changed, 119 insertions(+), 38 deletions(-)
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index 9e7731a8a2..af39c13cf7 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -373,6 +373,67 @@ rte_pktmbuf_pool_create_extbuf(const char *name, unsigned int n,
return mp;
}
+/* do some sanity checks on a reinitialized mbuf: panic if it fails */
+RTE_EXPORT_SYMBOL(rte_mbuf_raw_sanity_check)
+void
+rte_mbuf_raw_sanity_check(const struct rte_mbuf *m, const struct rte_mempool *mp)
+{
+ const char *reason;
+
+ if (rte_mbuf_raw_check(m, mp, &reason))
+ rte_panic("%s\n", reason);
+}
+
+RTE_EXPORT_SYMBOL(rte_mbuf_raw_check)
+int rte_mbuf_raw_check(const struct rte_mbuf *m, const struct rte_mempool *mp,
+ const char **reason)
+{
+ /* check sanity */
+ if (rte_mbuf_check(m, 0, reason) == -1)
+ return -1;
+
+ /* check initialized */
+ if (rte_mbuf_refcnt_read(m) != 1) {
+ *reason = "uninitialized ref cnt";
+ return -1;
+ }
+ if (m->next != NULL) {
+ *reason = "uninitialized next ptr";
+ return -1;
+ }
+ if (m->nb_segs != 1) {
+ *reason = "uninitialized nb_segs";
+ return -1;
+ }
+ if (RTE_MBUF_CLONED(m)) {
+ *reason = "cloned";
+ return -1;
+ }
+ if (RTE_MBUF_HAS_EXTBUF(m)) {
+ if (!RTE_MBUF_HAS_PINNED_EXTBUF(m)) {
+ *reason = "external buffer not pinned";
+ return -1;
+ }
+
+ uint16_t cnt = rte_mbuf_ext_refcnt_read(m->shinfo);
+ if ((cnt == 0) || (cnt == UINT16_MAX)) {
+ *reason = "pinned external buffer bad ref cnt";
+ return -1;
+ }
+ if (cnt != 1) {
+ *reason = "pinned external buffer uninitialized ref cnt";
+ return -1;
+ }
+ }
+
+ if (mp != NULL && m->pool != mp) {
+ *reason = "wrong mbuf pool";
+ return -1;
+ }
+
+ return 0;
+}
+
/* do some sanity checks on a mbuf: panic if it fails */
RTE_EXPORT_SYMBOL(rte_mbuf_sanity_check)
void
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 06ab7502a5..552cda1ae5 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -339,11 +339,17 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
#ifdef RTE_LIBRTE_MBUF_DEBUG
+/** check reinitialized mbuf type in debug mode */
+#define __rte_mbuf_raw_sanity_check_mp(m, mp) rte_mbuf_raw_sanity_check(m, mp)
+
/** check mbuf type in debug mode */
#define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
#else /* RTE_LIBRTE_MBUF_DEBUG */
+/** check reinitialized mbuf type in debug mode */
+#define __rte_mbuf_raw_sanity_check_mp(m, mp) do { } while (0)
+
/** check mbuf type in debug mode */
#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
@@ -513,6 +519,46 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
} while (0)
+/**
+ * Sanity checks on a reinitialized mbuf.
+ *
+ * Check the consistency of the given reinitialized mbuf.
+ * The function will cause a panic if corruption is detected.
+ *
+ * Check that the mbuf is properly reinitialized (refcnt=1, next=NULL,
+ * nb_segs=1), as done by rte_pktmbuf_prefree_seg().
+ *
+ * @param m
+ * The mbuf to be checked.
+ * @param mp
+ * The mempool to which the mbuf belongs.
+ * NULL if unknown, not to be checked.
+ */
+void
+rte_mbuf_raw_sanity_check(const struct rte_mbuf *m, const struct rte_mempool *mp);
+
+/**
+ * Sanity checks on a reinitialized mbuf.
+ *
+ * Almost like rte_mbuf_raw_sanity_check(), but this function gives the reason
+ * if corruption is detected rather than panic.
+ *
+ * @param m
+ * The mbuf to be checked.
+ * @param mp
+ * The mempool to which the mbuf belongs.
+ * NULL if unknown, not to be checked.
+ * @param reason
+ * A reference to a string pointer where to store the reason why a mbuf is
+ * considered invalid.
+ * @return
+ * - 0 if no issue has been found, reason is left untouched.
+ * - -1 if a problem is detected, reason then points to a string describing
+ * the reason why the mbuf is deemed invalid.
+ */
+int rte_mbuf_raw_check(const struct rte_mbuf *m, const struct rte_mempool *mp,
+ const char **reason);
+
/**
* Sanity checks on an mbuf.
*
@@ -550,33 +596,11 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
const char **reason);
-/**
- * Sanity checks on a reinitialized mbuf in debug mode.
- *
- * Check the consistency of the given reinitialized mbuf.
- * The function will cause a panic if corruption is detected.
- *
- * Check that the mbuf is properly reinitialized (refcnt=1, next=NULL,
- * nb_segs=1), as done by rte_pktmbuf_prefree_seg().
- *
- * @param m
- * The mbuf to be checked.
- */
-static __rte_always_inline void
-__rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
-{
- RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
- RTE_ASSERT(m->next == NULL);
- RTE_ASSERT(m->nb_segs == 1);
- RTE_ASSERT(!RTE_MBUF_CLONED(m));
- RTE_ASSERT(!RTE_MBUF_HAS_EXTBUF(m) ||
- (RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
- rte_mbuf_ext_refcnt_read(m->shinfo) == 1));
- __rte_mbuf_sanity_check(m, 0);
-}
+/** For backwards compatibility. */
+#define __rte_mbuf_raw_sanity_check(m) __rte_mbuf_raw_sanity_check_mp(m, NULL)
/** For backwards compatibility. */
-#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_sanity_check(m)
+#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_sanity_check_mp(m, NULL)
/**
* Allocate an uninitialized mbuf from mempool *mp*.
@@ -606,7 +630,7 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
if (rte_mempool_get(mp, &ret.ptr) < 0)
return NULL;
- __rte_mbuf_raw_sanity_check(ret.m);
+ __rte_mbuf_raw_sanity_check_mp(ret.m, mp);
return ret.m;
}
@@ -644,7 +668,7 @@ rte_mbuf_raw_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigne
int rc = rte_mempool_get_bulk(mp, (void **)mbufs, count);
if (likely(rc == 0))
for (unsigned int idx = 0; idx < count; idx++)
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_sanity_check_mp(mbufs[idx], mp);
return rc;
}
@@ -665,7 +689,7 @@ rte_mbuf_raw_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigne
static __rte_always_inline void
rte_mbuf_raw_free(struct rte_mbuf *m)
{
- __rte_mbuf_raw_sanity_check(m);
+ __rte_mbuf_raw_sanity_check_mp(m, NULL);
rte_mempool_put(m->pool, m);
}
@@ -696,12 +720,8 @@ __rte_experimental
static __rte_always_inline void
rte_mbuf_raw_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
{
- for (unsigned int idx = 0; idx < count; idx++) {
- const struct rte_mbuf *m = mbufs[idx];
- RTE_ASSERT(m != NULL);
- RTE_ASSERT(m->pool == mp);
- __rte_mbuf_raw_sanity_check(m);
- }
+ for (unsigned int idx = 0; idx < count; idx++)
+ __rte_mbuf_raw_sanity_check_mp(mbufs[idx], mp);
rte_mempool_put_bulk(mp, (void **)mbufs, count);
}
@@ -1021,22 +1041,22 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
switch (count % 4) {
case 0:
while (idx != count) {
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 3:
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 2:
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 1:
- __rte_mbuf_raw_sanity_check(mbufs[idx]);
+ __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
--
2.43.0
^ permalink raw reply related [flat|nested] 39+ messages in thread

* Re: [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf
2025-08-23 6:30 ` [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf Morten Brørup
@ 2025-10-09 16:49 ` Bruce Richardson
2025-10-09 17:12 ` Morten Brørup
0 siblings, 1 reply; 39+ messages in thread
From: Bruce Richardson @ 2025-10-09 16:49 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Sat, Aug 23, 2025 at 06:30:00AM +0000, Morten Brørup wrote:
> Sanity checking a reinitialized mbuf (a.k.a. raw mbuf) has been refactored
> to follow the same design pattern as sanity checking a normal mbuf, and
> now depends on RTE_LIBRTE_MBUF_DEBUG instead of RTE_ENABLE_ASSERT.
>
> The details of the changes are as follows:
>
> Non-inlined functions rte_mbuf_raw_sanity_check() and rte_mbuf_raw_check()
> have been added.
> They do for a reinitialized mbuf what rte_mbuf_sanity_check() and
> rte_mbuf_check() do for a normal mbuf.
> They basically check the same conditions as __rte_mbuf_raw_sanity_check()
> previously did, but use "if (!condition) rte_panic(message)" instead of
> RTE_ASSERT(), so they don't depend on RTE_ENABLE_ASSERT.
>
> The inline function __rte_mbuf_raw_sanity_check() has been replaced
> by the new macro __rte_mbuf_raw_sanity_check_mp(), which either calls
> rte_mbuf_raw_sanity_check() or does nothing, depending on
> RTE_LIBRTE_MBUF_DEBUG, just like the __rte_mbuf_sanity_check() macro does
> for a normal mbuf.
>
> Note that the new macro __rte_mbuf_raw_sanity_check_mp() takes an optional
> mempool parameter to verify that the mbuf belongs to the expected mbuf
> pool.
> This addition is mainly relevant for sanity checking reinitialized mbufs
> freed directly into a given mempool by a PMD when using
> RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE.
>
> The macro __rte_mbuf_raw_sanity_check() has been kept for backwards API
> compatibility.
> It simply calls __rte_mbuf_raw_sanity_check_mp() without specifying a
> mempool.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> ---
Looks ok to me, one comment inline below.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> v3:
> * Removed experimental status for the new functions.
> Experimental is optional for new symbols, to allow for future API
> changes. But this has been around for a long time. (Stephen Hemminger)
> * Consequentially, the added build check for ALLOW_EXPERIMENTAL_API with
> RTE_LIBRTE_MBUF_DEBUG is no longer needed, and was removed.
> v2:
> * Added explicit build check for ALLOW_EXPERIMENTAL_API being enabled when
> RTE_LIBRTE_MBUF_DEBUG is enabled, with descriptive error message if not
> so. (Ivan Malov)
> * Fixed typo in patch description.
> ---
> lib/mbuf/rte_mbuf.c | 61 ++++++++++++++++++++++++++++
> lib/mbuf/rte_mbuf.h | 96 +++++++++++++++++++++++++++------------------
> 2 files changed, 119 insertions(+), 38 deletions(-)
>
> diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
> index 9e7731a8a2..af39c13cf7 100644
> --- a/lib/mbuf/rte_mbuf.c
> +++ b/lib/mbuf/rte_mbuf.c
> @@ -373,6 +373,67 @@ rte_pktmbuf_pool_create_extbuf(const char *name, unsigned int n,
> return mp;
> }
>
> +/* do some sanity checks on a reinitialized mbuf: panic if it fails */
> +RTE_EXPORT_SYMBOL(rte_mbuf_raw_sanity_check)
> +void
> +rte_mbuf_raw_sanity_check(const struct rte_mbuf *m, const struct rte_mempool *mp)
> +{
> + const char *reason;
> +
> + if (rte_mbuf_raw_check(m, mp, &reason))
> + rte_panic("%s\n", reason);
> +}
> +
> +RTE_EXPORT_SYMBOL(rte_mbuf_raw_check)
> +int rte_mbuf_raw_check(const struct rte_mbuf *m, const struct rte_mempool *mp,
> + const char **reason)
> +{
> + /* check sanity */
> + if (rte_mbuf_check(m, 0, reason) == -1)
> + return -1;
> +
> + /* check initialized */
> + if (rte_mbuf_refcnt_read(m) != 1) {
> + *reason = "uninitialized ref cnt";
> + return -1;
> + }
> + if (m->next != NULL) {
> + *reason = "uninitialized next ptr";
> + return -1;
> + }
> + if (m->nb_segs != 1) {
> + *reason = "uninitialized nb_segs";
> + return -1;
> + }
> + if (RTE_MBUF_CLONED(m)) {
> + *reason = "cloned";
> + return -1;
> + }
> + if (RTE_MBUF_HAS_EXTBUF(m)) {
> + if (!RTE_MBUF_HAS_PINNED_EXTBUF(m)) {
> + *reason = "external buffer not pinned";
> + return -1;
> + }
> +
> + uint16_t cnt = rte_mbuf_ext_refcnt_read(m->shinfo);
> + if ((cnt == 0) || (cnt == UINT16_MAX)) {
> + *reason = "pinned external buffer bad ref cnt";
> + return -1;
> + }
> + if (cnt != 1) {
> + *reason = "pinned external buffer uninitialized ref cnt";
> + return -1;
> + }
> + }
> +
> + if (mp != NULL && m->pool != mp) {
> + *reason = "wrong mbuf pool";
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> /* do some sanity checks on a mbuf: panic if it fails */
> RTE_EXPORT_SYMBOL(rte_mbuf_sanity_check)
> void
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index 06ab7502a5..552cda1ae5 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -339,11 +339,17 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
>
> #ifdef RTE_LIBRTE_MBUF_DEBUG
>
> +/** check reinitialized mbuf type in debug mode */
> +#define __rte_mbuf_raw_sanity_check_mp(m, mp) rte_mbuf_raw_sanity_check(m, mp)
> +
> /** check mbuf type in debug mode */
> #define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
>
> #else /* RTE_LIBRTE_MBUF_DEBUG */
>
> +/** check reinitialized mbuf type in debug mode */
This is in release mode, not debug mode. Comment below seems wrong too.
> +#define __rte_mbuf_raw_sanity_check_mp(m, mp) do { } while (0)
> +
> /** check mbuf type in debug mode */
> #define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
>
> @@ -513,6 +519,46 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
> } while (0)
<snip>
^ permalink raw reply [flat|nested] 39+ messages in thread

* RE: [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf
2025-10-09 16:49 ` Bruce Richardson
@ 2025-10-09 17:12 ` Morten Brørup
2025-10-09 17:29 ` Thomas Monjalon
0 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2025-10-09 17:12 UTC (permalink / raw)
To: Bruce Richardson
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 9 October 2025 18.50
>
> On Sat, Aug 23, 2025 at 06:30:00AM +0000, Morten Brørup wrote:
> > Sanity checking a reinitialized mbuf (a.k.a. raw mbuf) has been
> refactored
> > to follow the same design pattern as sanity checking a normal mbuf,
> and
> > now depends on RTE_LIBRTE_MBUF_DEBUG instead of RTE_ENABLE_ASSERT.
> >
> > The details of the changes are as follows:
> >
> > Non-inlined functions rte_mbuf_raw_sanity_check() and
> rte_mbuf_raw_check()
> > have been added.
> > They do for a reinitialized mbuf what rte_mbuf_sanity_check() and
> > rte_mbuf_check() do for a normal mbuf.
> > They basically check the same conditions as
> __rte_mbuf_raw_sanity_check()
> > previously did, but use "if (!condition) rte_panic(message)" instead
> of
> > RTE_ASSERT(), so they don't depend on RTE_ENABLE_ASSERT.
> >
> > The inline function __rte_mbuf_raw_sanity_check() has been replaced
> > by the new macro __rte_mbuf_raw_sanity_check_mp(), which either calls
> > rte_mbuf_raw_sanity_check() or does nothing, depending on
> > RTE_LIBRTE_MBUF_DEBUG, just like the __rte_mbuf_sanity_check() macro
> does
> > for a normal mbuf.
> >
> > Note that the new macro __rte_mbuf_raw_sanity_check_mp() takes an
> optional
> > mempool parameter to verify that the mbuf belongs to the expected
> mbuf
> > pool.
> > This addition is mainly relevant for sanity checking reinitialized
> mbufs
> > freed directly into a given mempool by a PMD when using
> > RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE.
> >
> > The macro __rte_mbuf_raw_sanity_check() has been kept for backwards
> API
> > compatibility.
> > It simply calls __rte_mbuf_raw_sanity_check_mp() without specifying a
> > mempool.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > ---
>
> Looks ok to me, one comment inline below.
Thank you for reviewing, Bruce.
Comment replied to inline below. :-)
>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
> > v3:
> > * Removed experimental status for the new functions.
> > Experimental is optional for new symbols, to allow for future API
> > changes. But this has been around for a long time. (Stephen
> Hemminger)
> > * Consequentially, the added build check for ALLOW_EXPERIMENTAL_API
> with
> > RTE_LIBRTE_MBUF_DEBUG is no longer needed, and was removed.
> > v2:
> > * Added explicit build check for ALLOW_EXPERIMENTAL_API being enabled
> when
> > RTE_LIBRTE_MBUF_DEBUG is enabled, with descriptive error message if
> not
> > so. (Ivan Malov)
> > * Fixed typo in patch description.
> > ---
> > lib/mbuf/rte_mbuf.c | 61 ++++++++++++++++++++++++++++
> > lib/mbuf/rte_mbuf.h | 96 +++++++++++++++++++++++++++----------------
> --
> > 2 files changed, 119 insertions(+), 38 deletions(-)
> >
> > diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
> > index 9e7731a8a2..af39c13cf7 100644
> > --- a/lib/mbuf/rte_mbuf.c
> > +++ b/lib/mbuf/rte_mbuf.c
> > @@ -373,6 +373,67 @@ rte_pktmbuf_pool_create_extbuf(const char *name,
> unsigned int n,
> > return mp;
> > }
> >
> > +/* do some sanity checks on a reinitialized mbuf: panic if it fails
> */
> > +RTE_EXPORT_SYMBOL(rte_mbuf_raw_sanity_check)
> > +void
> > +rte_mbuf_raw_sanity_check(const struct rte_mbuf *m, const struct
> rte_mempool *mp)
> > +{
> > + const char *reason;
> > +
> > + if (rte_mbuf_raw_check(m, mp, &reason))
> > + rte_panic("%s\n", reason);
> > +}
> > +
> > +RTE_EXPORT_SYMBOL(rte_mbuf_raw_check)
> > +int rte_mbuf_raw_check(const struct rte_mbuf *m, const struct
> rte_mempool *mp,
> > + const char **reason)
> > +{
> > + /* check sanity */
> > + if (rte_mbuf_check(m, 0, reason) == -1)
> > + return -1;
> > +
> > + /* check initialized */
> > + if (rte_mbuf_refcnt_read(m) != 1) {
> > + *reason = "uninitialized ref cnt";
> > + return -1;
> > + }
> > + if (m->next != NULL) {
> > + *reason = "uninitialized next ptr";
> > + return -1;
> > + }
> > + if (m->nb_segs != 1) {
> > + *reason = "uninitialized nb_segs";
> > + return -1;
> > + }
> > + if (RTE_MBUF_CLONED(m)) {
> > + *reason = "cloned";
> > + return -1;
> > + }
> > + if (RTE_MBUF_HAS_EXTBUF(m)) {
> > + if (!RTE_MBUF_HAS_PINNED_EXTBUF(m)) {
> > + *reason = "external buffer not pinned";
> > + return -1;
> > + }
> > +
> > + uint16_t cnt = rte_mbuf_ext_refcnt_read(m->shinfo);
> > + if ((cnt == 0) || (cnt == UINT16_MAX)) {
> > + *reason = "pinned external buffer bad ref cnt";
> > + return -1;
> > + }
> > + if (cnt != 1) {
> > + *reason = "pinned external buffer uninitialized ref
> cnt";
> > + return -1;
> > + }
> > + }
> > +
> > + if (mp != NULL && m->pool != mp) {
> > + *reason = "wrong mbuf pool";
> > + return -1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > /* do some sanity checks on a mbuf: panic if it fails */
> > RTE_EXPORT_SYMBOL(rte_mbuf_sanity_check)
> > void
> > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > index 06ab7502a5..552cda1ae5 100644
> > --- a/lib/mbuf/rte_mbuf.h
> > +++ b/lib/mbuf/rte_mbuf.h
> > @@ -339,11 +339,17 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
> >
> > #ifdef RTE_LIBRTE_MBUF_DEBUG
> >
> > +/** check reinitialized mbuf type in debug mode */
> > +#define __rte_mbuf_raw_sanity_check_mp(m, mp)
> rte_mbuf_raw_sanity_check(m, mp)
> > +
> > /** check mbuf type in debug mode */
> > #define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m,
> is_h)
> >
> > #else /* RTE_LIBRTE_MBUF_DEBUG */
> >
> > +/** check reinitialized mbuf type in debug mode */
>
> This is in release mode, not debug mode. Comment below seems wrong too.
Yes, I noticed the comment was present in both debug and release mode, which I couldn't understand. So I guessed it was for Doxygen or some other parser.
I have seen weird stuff for Doxygen, e.g. "#ifdef __DOXYGEN__" for documenting a function [1], so I didn't attempt to understand the reason for it, but just followed the same pattern.
Also, the comment says that the macro does something in debug mode, so I guess that the comment is not incorrect (but misleading) when describing a release mode macro.
Again, I didn't understand it, so I left it as is.
[1]: https://elixir.bootlin.com/dpdk/v25.07/source/lib/eal/include/generic/rte_memcpy.h#L94
>
> > +#define __rte_mbuf_raw_sanity_check_mp(m, mp) do { } while (0)
> > +
> > /** check mbuf type in debug mode */
> > #define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
> >
> > @@ -513,6 +519,46 @@ rte_mbuf_ext_refcnt_update(struct
> rte_mbuf_ext_shared_info *shinfo,
> > } while (0)
>
> <snip>
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf
2025-10-09 17:12 ` Morten Brørup
@ 2025-10-09 17:29 ` Thomas Monjalon
2025-10-09 17:55 ` Morten Brørup
0 siblings, 1 reply; 39+ messages in thread
From: Thomas Monjalon @ 2025-10-09 17:29 UTC (permalink / raw)
To: Morten Brørup
Cc: Bruce Richardson, dev, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
09/10/2025 19:12, Morten Brørup:
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > On Sat, Aug 23, 2025 at 06:30:00AM +0000, Morten Brørup wrote:
> > > +/** check reinitialized mbuf type in debug mode */
> >
> > This is in release mode, not debug mode. Comment below seems wrong too.
>
> Yes, I noticed the comment was present in both debug and release mode,
> which I couldn't understand. So I guessed it was for Doxygen or some other parser.
> I have seen weird stuff for Doxygen, e.g. "#ifdef __DOXYGEN__"
> for documenting a function [1],
> so I didn't attempt to understand the reason for it,
> but just followed the same pattern.
Hum, as a maintainer, I would prefer you try to understand, or ask please.
Note: we can use Slack for such questions.
__DOXYGEN__ is defined only by Doxygen,
so any code inside #ifdef __DOXYGEN__ is for documentation only.
It was supposed to be used in lib/eal/include/generic
for functions which are really defined inline per CPU implementation.
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf
2025-10-09 17:29 ` Thomas Monjalon
@ 2025-10-09 17:55 ` Morten Brørup
2025-10-19 20:22 ` Thomas Monjalon
0 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2025-10-09 17:55 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Bruce Richardson, dev, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 9 October 2025 19.29
>
> 09/10/2025 19:12, Morten Brørup:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > On Sat, Aug 23, 2025 at 06:30:00AM +0000, Morten Brørup wrote:
> > > > +/** check reinitialized mbuf type in debug mode */
> > >
> > > This is in release mode, not debug mode. Comment below seems wrong
> too.
> >
> > Yes, I noticed the comment was present in both debug and release
> mode,
> > which I couldn't understand. So I guessed it was for Doxygen or some
> other parser.
> > I have seen weird stuff for Doxygen, e.g. "#ifdef __DOXYGEN__"
> > for documenting a function [1],
> > so I didn't attempt to understand the reason for it,
> > but just followed the same pattern.
>
> Hum, as a maintainer, I would prefer you try to understand, or ask
> please.
> Note: we can use Slack for such questions.
Point taken. Good guidance!
I'll try to broaden my scope of knowledge to include preprocessing for Doxygen.
>
> __DOXYGEN__ is defined only by Doxygen,
> so any code inside #ifdef __DOXYGEN__ is for documentation only.
> It was supposed to be used in lib/eal/include/generic
> for functions which are really defined inline per CPU implementation.
>
Yes, I get that.
But I don't get - when there are two definitions of a macro - why is the same comment/documentation present in both instances?
And if the instances have different comments/documentation, how is that reflected in the documentation output from Doxygen?
In this specific case, how should the code look to both 1) get the wanted Doxygen documentation output, and 2) have relevant comments inline in the source code for both instances?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf
2025-10-09 17:55 ` Morten Brørup
@ 2025-10-19 20:22 ` Thomas Monjalon
0 siblings, 0 replies; 39+ messages in thread
From: Thomas Monjalon @ 2025-10-19 20:22 UTC (permalink / raw)
To: Morten Brørup
Cc: Bruce Richardson, dev, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
09/10/2025 19:55, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 9 October 2025 19.29
> >
> > 09/10/2025 19:12, Morten Brørup:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > On Sat, Aug 23, 2025 at 06:30:00AM +0000, Morten Brørup wrote:
> > > > > +/** check reinitialized mbuf type in debug mode */
> > > >
> > > > This is in release mode, not debug mode. Comment below seems wrong
> > too.
> > >
> > > Yes, I noticed the comment was present in both debug and release
> > mode,
> > > which I couldn't understand. So I guessed it was for Doxygen or some
> > other parser.
> > > I have seen weird stuff for Doxygen, e.g. "#ifdef __DOXYGEN__"
> > > for documenting a function [1],
> > > so I didn't attempt to understand the reason for it,
> > > but just followed the same pattern.
> >
> > Hum, as a maintainer, I would prefer you try to understand, or ask
> > please.
> > Note: we can use Slack for such questions.
>
> Point taken. Good guidance!
> I'll try to broaden my scope of knowledge to include preprocessing for Doxygen.
>
> > __DOXYGEN__ is defined only by Doxygen,
> > so any code inside #ifdef __DOXYGEN__ is for documentation only.
> > It was supposed to be used in lib/eal/include/generic
> > for functions which are really defined inline per CPU implementation.
>
> Yes, I get that.
> But I don't get - when there are two definitions of a macro -
> why is the same comment/documentation present in both instances?
Because the author had no better idea I suppose.
> And if the instances have different comments/documentation,
> how is that reflected in the documentation output from Doxygen?
You can have only one instance of a symbol in Doxygen.
> In this specific case, how should the code look to both
> 1) get the wanted Doxygen documentation output,
#if defined RTE_LIBRTE_MBUF_DEBUG || defined __DOXYGEN__
/** Check reinitialized mbuf type in debug mode. */
#define __rte_mbuf_raw_sanity_check_mp(m, mp) rte_mbuf_raw_sanity_check(m, mp)
/** Check mbuf type in debug mode. */
#define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
#else /* !RTE_LIBRTE_MBUF_DEBUG */
#define __rte_mbuf_raw_sanity_check_mp(m, mp) do { } while (0)
#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
#endif /* RTE_LIBRTE_MBUF_DEBUG */
> and 2) have relevant comments inline in the source code for both instances?
No need to comment in release mode, it is not relevant for debug macros.
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v8 2/3] mbuf: promote raw free and alloc bulk functions as stable
2025-08-23 6:29 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
2025-08-23 6:30 ` [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf Morten Brørup
@ 2025-08-23 6:30 ` Morten Brørup
2025-10-09 16:53 ` Bruce Richardson
2025-08-23 6:30 ` [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs Morten Brørup
` (2 subsequent siblings)
4 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2025-08-23 6:30 UTC (permalink / raw)
To: dev, Thomas Monjalon, Stephen Hemminger, Bruce Richardson,
Konstantin Ananyev, Andrew Rybchenko, Ivan Malov, Chengwen Feng
Cc: Morten Brørup
If ethdev drivers are to use these APIs for allocating/freeing mbufs
instead of bypassing the mbuf library and accessing the mempool APIs
directly, the APIs cannot be experimental anymore.
Also updated the packet mbuf alloc bulk function to use the raw alloc bulk
function, now that it is stable.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
lib/mbuf/rte_mbuf.h | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 552cda1ae5..49c93ab356 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -635,9 +635,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
}
/**
- * @warning
- * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
- *
* Allocate a bulk of uninitialized mbufs from mempool *mp*.
*
* This function can be used by PMDs (especially in Rx functions)
@@ -661,7 +658,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
* - 0: Success.
* - -ENOENT: Not enough entries in the mempool; no mbufs are retrieved.
*/
-__rte_experimental
static __rte_always_inline int
rte_mbuf_raw_alloc_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
{
@@ -694,9 +690,6 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
}
/**
- * @warning
- * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
- *
* Put a bulk of mbufs allocated from the same mempool back into the mempool.
*
* The caller must ensure that the mbufs come from the specified mempool,
@@ -716,7 +709,6 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
* @param count
* Array size.
*/
-__rte_experimental
static __rte_always_inline void
rte_mbuf_raw_free_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
{
@@ -1029,7 +1021,7 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
unsigned idx = 0;
int rc;
- rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
+ rc = rte_mbuf_raw_alloc_bulk(pool, mbufs, count);
if (unlikely(rc))
return rc;
@@ -1041,22 +1033,18 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
switch (count % 4) {
case 0:
while (idx != count) {
- __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 3:
- __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 2:
- __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
case 1:
- __rte_mbuf_raw_sanity_check_mp(mbufs[idx], pool);
rte_pktmbuf_reset(mbufs[idx]);
idx++;
/* fall-through */
--
2.43.0
^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v8 2/3] mbuf: promote raw free and alloc bulk functions as stable
2025-08-23 6:30 ` [PATCH v8 2/3] mbuf: promote raw free and alloc bulk functions as stable Morten Brørup
@ 2025-10-09 16:53 ` Bruce Richardson
0 siblings, 0 replies; 39+ messages in thread
From: Bruce Richardson @ 2025-10-09 16:53 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Sat, Aug 23, 2025 at 06:30:01AM +0000, Morten Brørup wrote:
> If ethdev drivers should use these APIs for allocating/freeing mbufs
> instead of bypassing the mbuf library and accessing the mempool APIs, they
> cannot be experimental anymore.
>
Not sure this is actually true. AFAIK there is no issue with internal use
of experimental functions (with certain caveats around use in macros or
inlines) since we will fixup any internal issues if the function changes.
However, no issue with making these stable.
> Also updated the packet mbuf alloc bulk function to use the raw alloc bulk
> function, now that it is stable.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-08-23 6:29 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
2025-08-23 6:30 ` [PATCH v8 1/3] mbuf: de-inline sanity checking a reinitialized mbuf Morten Brørup
2025-08-23 6:30 ` [PATCH v8 2/3] mbuf: promote raw free and alloc bulk functions as stable Morten Brørup
@ 2025-08-23 6:30 ` Morten Brørup
2025-08-23 14:28 ` Stephen Hemminger
2025-10-09 17:15 ` Bruce Richardson
2025-10-06 14:43 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
2025-10-19 20:42 ` Thomas Monjalon
4 siblings, 2 replies; 39+ messages in thread
From: Morten Brørup @ 2025-08-23 6:30 UTC (permalink / raw)
To: dev, Thomas Monjalon, Stephen Hemminger, Bruce Richardson,
Konstantin Ananyev, Andrew Rybchenko, Ivan Malov, Chengwen Feng
Cc: Morten Brørup
An optimized function for resetting a bulk of newly allocated
reinitialized mbufs (a.k.a. raw mbufs) was added.
Compared to the normal packet mbuf reset function, it takes advantage of
the following two details:
1. The 'next' and 'nb_segs' fields are already reset, so resetting them
has been omitted.
2. When resetting the mbuf, the 'ol_flags' field must indicate whether the
mbuf uses an external buffer, and the 'data_off' field must not exceed the
data room size when resetting the data offset to include the default
headroom.
Unlike the normal packet mbuf reset function, which reads the mbuf itself
to get the information required for resetting these two fields, this
function gets the information from the mempool.
This makes the function write-only of the mbuf, unlike the normal packet
mbuf reset function, which is read-modify-write of the mbuf.
Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------------
1 file changed, 46 insertions(+), 28 deletions(-)
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 49c93ab356..6f37a2e91e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -954,6 +954,50 @@ static inline void rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
(uint16_t)m->buf_len);
}
+/**
+ * Reset the fields of a bulk of packet mbufs to their default values.
+ *
+ * The caller must ensure that the mbufs come from the specified mempool,
+ * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1),
+ * as done by rte_pktmbuf_prefree_seg().
+ *
+ * This function should be used with care, when optimization is required.
+ * For standard needs, prefer rte_pktmbuf_reset().
+ *
+ * @param mp
+ * The mempool to which the mbuf belongs.
+ * @param mbufs
+ * Array of pointers to packet mbufs.
+ * The array must not contain NULL pointers.
+ * @param count
+ * Array size.
+ */
+static inline void
+rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
+{
+ uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
+ RTE_MBUF_F_EXTERNAL : 0;
+ uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM, rte_pktmbuf_data_room_size(mp),
+ uint16_t);
+
+ for (unsigned int idx = 0; idx < count; idx++) {
+ struct rte_mbuf *m = mbufs[idx];
+
+ m->pkt_len = 0;
+ m->tx_offload = 0;
+ m->vlan_tci = 0;
+ m->vlan_tci_outer = 0;
+ m->port = RTE_MBUF_PORT_INVALID;
+
+ m->ol_flags = ol_flags;
+ m->packet_type = 0;
+ m->data_off = data_off;
+
+ m->data_len = 0;
+ __rte_mbuf_sanity_check(m, 1);
+ }
+}
+
/**
* Reset the fields of a packet mbuf to their default values.
*
@@ -997,7 +1041,7 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
{
struct rte_mbuf *m;
if ((m = rte_mbuf_raw_alloc(mp)) != NULL)
- rte_pktmbuf_reset(m);
+ rte_mbuf_raw_reset_bulk(mp, &m, 1);
return m;
}
@@ -1018,38 +1062,12 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
struct rte_mbuf **mbufs, unsigned count)
{
- unsigned idx = 0;
int rc;
rc = rte_mbuf_raw_alloc_bulk(pool, mbufs, count);
if (unlikely(rc))
return rc;
-
- /* To understand duff's device on loop unwinding optimization, see
- * https://en.wikipedia.org/wiki/Duff's_device.
- * Here while() loop is used rather than do() while{} to avoid extra
- * check if count is zero.
- */
- switch (count % 4) {
- case 0:
- while (idx != count) {
- rte_pktmbuf_reset(mbufs[idx]);
- idx++;
- /* fall-through */
- case 3:
- rte_pktmbuf_reset(mbufs[idx]);
- idx++;
- /* fall-through */
- case 2:
- rte_pktmbuf_reset(mbufs[idx]);
- idx++;
- /* fall-through */
- case 1:
- rte_pktmbuf_reset(mbufs[idx]);
- idx++;
- /* fall-through */
- }
- }
+ rte_mbuf_raw_reset_bulk(pool, mbufs, count);
return 0;
}
--
2.43.0
^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-08-23 6:30 ` [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs Morten Brørup
@ 2025-08-23 14:28 ` Stephen Hemminger
2025-10-09 17:15 ` Bruce Richardson
1 sibling, 0 replies; 39+ messages in thread
From: Stephen Hemminger @ 2025-08-23 14:28 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Thomas Monjalon, Bruce Richardson, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Sat, 23 Aug 2025 06:30:02 +0000
Morten Brørup <mb@smartsharesystems.com> wrote:
> An optimized function for resetting a bulk of newly allocated
> reinitialized mbufs (a.k.a. raw mbufs) was added.
>
> Compared to the normal packet mbuf reset function, it takes advantage of
> the following two details:
> 1. The 'next' and 'nb_segs' fields are already reset, so resetting them
> has been omitted.
This doesn't matter: since they are on the same cache line, the
operation of touching them costs nothing.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-08-23 6:30 ` [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs Morten Brørup
2025-08-23 14:28 ` Stephen Hemminger
@ 2025-10-09 17:15 ` Bruce Richardson
2025-10-09 17:35 ` Morten Brørup
2025-10-19 20:45 ` Stephen Hemminger
1 sibling, 2 replies; 39+ messages in thread
From: Bruce Richardson @ 2025-10-09 17:15 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> An optimized function for resetting a bulk of newly allocated
> reinitialized mbufs (a.k.a. raw mbufs) was added.
>
> Compared to the normal packet mbuf reset function, it takes advantage of
> the following two details:
> 1. The 'next' and 'nb_segs' fields are already reset, so resetting them
> has been omitted.
> 2. When resetting the mbuf, the 'ol_flags' field must indicate whether the
> mbuf uses an external buffer, and the 'data_off' field must not exceed the
> data room size when resetting the data offset to include the default
> headroom.
> Unlike the normal packet mbuf reset function, which reads the mbuf itself
> to get the information required for resetting these two fields, this
> function gets the information from the mempool.
>
> This makes the function write-only of the mbuf, unlike the normal packet
> mbuf reset function, which is read-modify-write of the mbuf.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------------
> 1 file changed, 46 insertions(+), 28 deletions(-)
>
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index 49c93ab356..6f37a2e91e 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -954,6 +954,50 @@ static inline void rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> (uint16_t)m->buf_len);
> }
>
> +/**
> + * Reset the fields of a bulk of packet mbufs to their default values.
> + *
> + * The caller must ensure that the mbufs come from the specified mempool,
> + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1),
> + * as done by rte_pktmbuf_prefree_seg().
> + *
> + * This function should be used with care, when optimization is required.
> + * For standard needs, prefer rte_pktmbuf_reset().
> + *
> + * @param mp
> + * The mempool to which the mbuf belongs.
> + * @param mbufs
> + * Array of pointers to packet mbufs.
> + * The array must not contain NULL pointers.
> + * @param count
> + * Array size.
> + */
> +static inline void
> +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> +{
> + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> + RTE_MBUF_F_EXTERNAL : 0;
> + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM, rte_pktmbuf_data_room_size(mp),
> + uint16_t);
> +
> + for (unsigned int idx = 0; idx < count; idx++) {
> + struct rte_mbuf *m = mbufs[idx];
> +
> + m->pkt_len = 0;
> + m->tx_offload = 0;
> + m->vlan_tci = 0;
> + m->vlan_tci_outer = 0;
> + m->port = RTE_MBUF_PORT_INVALID;
Have you considered doing all initialization using 64-bit stores? It's
generally cheaper to do a single 64-bit store than e.g. set of 16-bit ones.
This also means that we could remove the restriction on having refcnt and
nb_segs already set. As in PMDs, a single store can init data_off, ref_cnt,
nb_segs and port.
Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss fields
etc. For max performance, the whole of the mbuf cleared here can be done in
40 bytes, or 5 64-bit stores. If we do the stores in order, possibly the
compiler can even opportunistically coalesce more stores, so we could even
end up getting 128-bit or larger stores depending on the ISA compiled for.
[Maybe the compiler will do this even if they are not in order, but I'd
like to maximize my chances here! :-)]
/Bruce
> +
> + m->ol_flags = ol_flags;
> + m->packet_type = 0;
> + m->data_off = data_off;
> +
> + m->data_len = 0;
> + __rte_mbuf_sanity_check(m, 1);
> + }
> +}
> +
<snip>
^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-09 17:15 ` Bruce Richardson
@ 2025-10-09 17:35 ` Morten Brørup
2025-10-10 7:43 ` Bruce Richardson
2025-10-19 16:59 ` Thomas Monjalon
2025-10-19 20:45 ` Stephen Hemminger
1 sibling, 2 replies; 39+ messages in thread
From: Morten Brørup @ 2025-10-09 17:35 UTC (permalink / raw)
To: Bruce Richardson
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Thursday, 9 October 2025 19.15
>
> On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > An optimized function for resetting a bulk of newly allocated
> > reinitialized mbufs (a.k.a. raw mbufs) was added.
> >
> > Compared to the normal packet mbuf reset function, it takes advantage
> of
> > the following two details:
> > 1. The 'next' and 'nb_segs' fields are already reset, so resetting
> them
> > has been omitted.
> > 2. When resetting the mbuf, the 'ol_flags' field must indicate
> whether the
> > mbuf uses an external buffer, and the 'data_off' field must not
> exceed the
> > data room size when resetting the data offset to include the default
> > headroom.
> > Unlike the normal packet mbuf reset function, which reads the mbuf
> itself
> > to get the information required for resetting these two fields, this
> > function gets the information from the mempool.
> >
> > This makes the function write-only of the mbuf, unlike the normal
> packet
> > mbuf reset function, which is read-modify-write of the mbuf.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++---------------
> --
> > 1 file changed, 46 insertions(+), 28 deletions(-)
> >
> > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > index 49c93ab356..6f37a2e91e 100644
> > --- a/lib/mbuf/rte_mbuf.h
> > +++ b/lib/mbuf/rte_mbuf.h
> > @@ -954,6 +954,50 @@ static inline void
> rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > (uint16_t)m->buf_len);
> > }
> >
> > +/**
> > + * Reset the fields of a bulk of packet mbufs to their default
> values.
> > + *
> > + * The caller must ensure that the mbufs come from the specified
> mempool,
> > + * are direct and properly reinitialized (refcnt=1, next=NULL,
> nb_segs=1),
> > + * as done by rte_pktmbuf_prefree_seg().
> > + *
> > + * This function should be used with care, when optimization is
> required.
> > + * For standard needs, prefer rte_pktmbuf_reset().
> > + *
> > + * @param mp
> > + * The mempool to which the mbuf belongs.
> > + * @param mbufs
> > + * Array of pointers to packet mbufs.
> > + * The array must not contain NULL pointers.
> > + * @param count
> > + * Array size.
> > + */
> > +static inline void
> > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf
> **mbufs, unsigned int count)
> > +{
> > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > + RTE_MBUF_F_EXTERNAL : 0;
> > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> rte_pktmbuf_data_room_size(mp),
> > + uint16_t);
> > +
> > + for (unsigned int idx = 0; idx < count; idx++) {
> > + struct rte_mbuf *m = mbufs[idx];
> > +
> > + m->pkt_len = 0;
> > + m->tx_offload = 0;
> > + m->vlan_tci = 0;
> > + m->vlan_tci_outer = 0;
> > + m->port = RTE_MBUF_PORT_INVALID;
>
> Have you considered doing all initialization using 64-bit stores? It's
> generally cheaper to do a single 64-bit store than e.g. set of 16-bit
> ones.
The code is basically copy-paste from rte_pktmbuf_reset().
I kept it the same way for readability.
> This also means that we could remove the restriction on having refcnt
> and
> nb_segs already set. As in PMDs, a single store can init data_off,
> ref_cnt,
> nb_segs and port.
Yes, I have given the concept a lot of thought already.
If we didn't require mbufs residing in the mempool to have any fields initialized, specifically "next" and "nb_segs", it would improve performance for drivers freeing mbufs back to the mempool, because writing to the mbufs would no longer be required at that point; the mbufs could simply be freed back to the mempool. Instead, we would require the driver to initialize these fields - which it probably does on RX anyway, if it supports segmented packets.
But I consider this concept a major API change, also affecting applications assuming that these fields are initialized when allocating raw mbufs from the mempool. So I haven't pursued it.
>
> Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss
> fields
> etc. For max performance, the whole of the mbuf cleared here can be
> done in
> 40 bytes, or 5 64-bit stores. If we do the stores in order, possibly
> the
> compiler can even opportunistically coalesce more stores, so we could
> even
> end up getting 128-bit or larger stores depending on the ISA compiled
> for.
> [Maybe the compiler will do this even if they are not in order, but I'd
> like to maximize my chances here! :-)]
>
> /Bruce
>
> > +
> > + m->ol_flags = ol_flags;
> > + m->packet_type = 0;
> > + m->data_off = data_off;
> > +
> > + m->data_len = 0;
> > + __rte_mbuf_sanity_check(m, 1);
> > + }
> > +}
> > +
> <snip>
^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-09 17:35 ` Morten Brørup
@ 2025-10-10 7:43 ` Bruce Richardson
2025-10-10 9:22 ` Morten Brørup
2025-10-19 16:59 ` Thomas Monjalon
1 sibling, 1 reply; 39+ messages in thread
From: Bruce Richardson @ 2025-10-10 7:43 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Thu, Oct 09, 2025 at 07:35:54PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Thursday, 9 October 2025 19.15
> >
> > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > An optimized function for resetting a bulk of newly allocated
> > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > >
> > > Compared to the normal packet mbuf reset function, it takes advantage
> > of
> > > the following two details:
> > > 1. The 'next' and 'nb_segs' fields are already reset, so resetting
> > them
> > > has been omitted.
> > > 2. When resetting the mbuf, the 'ol_flags' field must indicate
> > whether the
> > > mbuf uses an external buffer, and the 'data_off' field must not
> > exceed the
> > > data room size when resetting the data offset to include the default
> > > headroom.
> > > Unlike the normal packet mbuf reset function, which reads the mbuf
> > itself
> > > to get the information required for resetting these two fields, this
> > > function gets the information from the mempool.
> > >
> > > This makes the function write-only of the mbuf, unlike the normal
> > packet
> > > mbuf reset function, which is read-modify-write of the mbuf.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++---------------
> > --
> > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > index 49c93ab356..6f37a2e91e 100644
> > > --- a/lib/mbuf/rte_mbuf.h
> > > +++ b/lib/mbuf/rte_mbuf.h
> > > @@ -954,6 +954,50 @@ static inline void
> > rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > (uint16_t)m->buf_len);
> > > }
> > >
> > > +/**
> > > + * Reset the fields of a bulk of packet mbufs to their default
> > values.
> > > + *
> > > + * The caller must ensure that the mbufs come from the specified
> > mempool,
> > > + * are direct and properly reinitialized (refcnt=1, next=NULL,
> > nb_segs=1),
> > > + * as done by rte_pktmbuf_prefree_seg().
> > > + *
> > > + * This function should be used with care, when optimization is
> > required.
> > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > + *
> > > + * @param mp
> > > + * The mempool to which the mbuf belongs.
> > > + * @param mbufs
> > > + * Array of pointers to packet mbufs.
> > > + * The array must not contain NULL pointers.
> > > + * @param count
> > > + * Array size.
> > > + */
> > > +static inline void
> > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf
> > **mbufs, unsigned int count)
> > > +{
> > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> > RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > + RTE_MBUF_F_EXTERNAL : 0;
> > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> > rte_pktmbuf_data_room_size(mp),
> > > + uint16_t);
> > > +
> > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > + struct rte_mbuf *m = mbufs[idx];
> > > +
> > > + m->pkt_len = 0;
> > > + m->tx_offload = 0;
> > > + m->vlan_tci = 0;
> > > + m->vlan_tci_outer = 0;
> > > + m->port = RTE_MBUF_PORT_INVALID;
> >
> > Have you considered doing all initialization using 64-bit stores? It's
> > generally cheaper to do a single 64-bit store than e.g. set of 16-bit
> > ones.
>
> The code is basically copy-paste from rte_pktmbuf_reset().
> I kept it the same way for readability.
>
I'd think using 64-bit stores should be fine for readability so long as
there is a comment on each one (maybe with compile-time checks for field
layout).
> > This also means that we could remove the restriction on having refcnt
> > and
> > nb_segs already set. As in PMDs, a single store can init data_off,
> > ref_cnt,
> > nb_segs and port.
>
> Yes, I have given the concept a lot of thought already.
> If we didn't require mbufs residing in the mempool to have any fields initialized, specifically "next" and "nb_segs", it would improve performance for drivers freeing mbufs back to the mempool, because writing to the mbufs would no longer be required at that point; the mbufs could simply be freed back to the mempool. Instead, we would require the driver to initialize these fields - which it probably does on RX anyway, if it supports segmented packets.
> But I consider this concept a major API change, also affecting applications assuming that these fields are initialized when allocating raw mbufs from the mempool. So I haven't pursued it.
>
Yes, agreed. Let's not change anything in those restrictions.
/Bruce
^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-10 7:43 ` Bruce Richardson
@ 2025-10-10 9:22 ` Morten Brørup
0 siblings, 0 replies; 39+ messages in thread
From: Morten Brørup @ 2025-10-10 9:22 UTC (permalink / raw)
To: Bruce Richardson
Cc: dev, Thomas Monjalon, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 10 October 2025 09.44
>
> On Thu, Oct 09, 2025 at 07:35:54PM +0200, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Thursday, 9 October 2025 19.15
> > >
> > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > > An optimized function for resetting a bulk of newly allocated
> > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > >
> > > > Compared to the normal packet mbuf reset function, it takes
> advantage
> > > of
> > > > the following two details:
> > > > 1. The 'next' and 'nb_segs' fields are already reset, so
> resetting
> > > them
> > > > has been omitted.
> > > > 2. When resetting the mbuf, the 'ol_flags' field must indicate
> > > whether the
> > > > mbuf uses an external buffer, and the 'data_off' field must not
> > > exceed the
> > > > data room size when resetting the data offset to include the
> default
> > > > headroom.
> > > > Unlike the normal packet mbuf reset function, which reads the
> mbuf
> > > itself
> > > > to get the information required for resetting these two fields,
> this
> > > > function gets the information from the mempool.
> > > >
> > > > This makes the function write-only of the mbuf, unlike the normal
> > > packet
> > > > mbuf reset function, which is read-modify-write of the mbuf.
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > ---
> > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------
> ----
> > > --
> > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > index 49c93ab356..6f37a2e91e 100644
> > > > --- a/lib/mbuf/rte_mbuf.h
> > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > @@ -954,6 +954,50 @@ static inline void
> > > rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > (uint16_t)m->buf_len);
> > > > }
> > > >
> > > > +/**
> > > > + * Reset the fields of a bulk of packet mbufs to their default
> > > values.
> > > > + *
> > > > + * The caller must ensure that the mbufs come from the specified
> > > mempool,
> > > > + * are direct and properly reinitialized (refcnt=1, next=NULL,
> > > nb_segs=1),
> > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > + *
> > > > + * This function should be used with care, when optimization is
> > > required.
> > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > + *
> > > > + * @param mp
> > > > + * The mempool to which the mbuf belongs.
> > > > + * @param mbufs
> > > > + * Array of pointers to packet mbufs.
> > > > + * The array must not contain NULL pointers.
> > > > + * @param count
> > > > + * Array size.
> > > > + */
> > > > +static inline void
> > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf
> > > **mbufs, unsigned int count)
> > > > +{
> > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> > > RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> > > rte_pktmbuf_data_room_size(mp),
> > > > + uint16_t);
> > > > +
> > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > + struct rte_mbuf *m = mbufs[idx];
> > > > +
> > > > + m->pkt_len = 0;
> > > > + m->tx_offload = 0;
> > > > + m->vlan_tci = 0;
> > > > + m->vlan_tci_outer = 0;
> > > > + m->port = RTE_MBUF_PORT_INVALID;
> > >
> > > Have you considered doing all initialization using 64-bit stores?
> It's
> > > generally cheaper to do a single 64-bit store than e.g. set of 16-
> bit
> > > ones.
> >
> > The code is basically copy-paste from rte_pktmbuf_reset().
> > I kept it the same way for readability.
> >
>
> I'd think using 64-bit stores should be fine for readability so long as
> there is a comment on each one (maybe with compile-time checks for
> field
> layout).
OK. I'll put this additional optimization on my TODO list for later, and will do it for both rte_mbuf_raw_reset_bulk() and rte_pktmbuf_reset(). I'm too busy with another project to do it now.
>
> > > This also means that we could remove the restriction on having
> refcnt
> > > and
> > > nb_segs already set. As in PMDs, a single store can init data_off,
> > > ref_cnt,
> > > nb_segs and port.
> >
> > Yes, I have given the concept a lot of thought already.
> > If we didn't require mbufs residing in the mempool to have any fields
> initialized, specifically "next" and "nb_segs", it would improve
> performance for drivers freeing mbufs back to the mempool, because
> writing to the mbufs would no longer be required at that point; the
> mbufs could simply be freed back to the mempool. Instead, we would
> require the driver to initialize these fields - which it probably does
> on RX anyway, if it supports segmented packets.
> > But I consider this concept a major API change, also affecting
> applications assuming that these fields are initialized when allocating
> raw mbufs from the mempool. So I haven't pursued it.
> >
> Yes, agreed. Let's not change anything in those restrictions.
Maybe later, if the tech board agrees to such a change.
>
> /Bruce
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-09 17:35 ` Morten Brørup
2025-10-10 7:43 ` Bruce Richardson
@ 2025-10-19 16:59 ` Thomas Monjalon
2025-10-19 18:45 ` Morten Brørup
1 sibling, 1 reply; 39+ messages in thread
From: Thomas Monjalon @ 2025-10-19 16:59 UTC (permalink / raw)
To: Morten Brørup
Cc: Bruce Richardson, dev, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
09/10/2025 19:35, Morten Brørup:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > + m->pkt_len = 0;
> > > + m->tx_offload = 0;
> > > + m->vlan_tci = 0;
> > > + m->vlan_tci_outer = 0;
> > > + m->port = RTE_MBUF_PORT_INVALID;
> >
> > Have you considered doing all initialization using 64-bit stores? It's
> > generally cheaper to do a single 64-bit store than e.g. set of 16-bit
> > ones.
>
> The code is basically copy-paste from rte_pktmbuf_reset().
> I kept it the same way for readability.
>
> > This also means that we could remove the restriction on having refcnt
> > and
> > nb_segs already set. As in PMDs, a single store can init data_off,
> > ref_cnt,
> > nb_segs and port.
>
> Yes, I have given the concept a lot of thought already.
> If we didn't require mbufs residing in the mempool to have any fields initialized, specifically "next" and "nb_segs", it would improve performance for drivers freeing mbufs back to the mempool, because writing to the mbufs would no longer be required at that point; the mbufs could simply be freed back to the mempool. Instead, we would require the driver to initialize these fields - which it probably does on RX anyway, if it supports segmented packets.
> But I consider this concept a major API change, also affecting applications assuming that these fields are initialized when allocating raw mbufs from the mempool. So I haven't pursued it.
>
> >
> > Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss
> > fields
> > etc. For max performance, the whole of the mbuf cleared here can be
> > done in
> > 40 bytes, or 5 64-bit stores. If we do the stores in order, possibly
> > the
> > compiler can even opportunistically coalesce more stores, so we could
> > even
> > end up getting 128-bit or larger stores depending on the ISA compiled
> > for.
> > [Maybe the compiler will do this even if they are not in order, but I'd
> > like to maximize my chances here! :-)]
Morten, you didn't reply to this.
Can we optimize more with big stores?
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-19 16:59 ` Thomas Monjalon
@ 2025-10-19 18:45 ` Morten Brørup
0 siblings, 0 replies; 39+ messages in thread
From: Morten Brørup @ 2025-10-19 18:45 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Bruce Richardson, dev, Stephen Hemminger, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Sunday, 19 October 2025 18.59
>
> 09/10/2025 19:35, Morten Brørup:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > + m->pkt_len = 0;
> > > > + m->tx_offload = 0;
> > > > + m->vlan_tci = 0;
> > > > + m->vlan_tci_outer = 0;
> > > > + m->port = RTE_MBUF_PORT_INVALID;
> > >
> > > Have you considered doing all initialization using 64-bit stores?
> It's
> > > generally cheaper to do a single 64-bit store than e.g. set of 16-
> bit
> > > ones.
> >
> > The code is basically copy-paste from rte_pktmbuf_reset().
> > I kept it the same way for readability.
> >
> > > This also means that we could remove the restriction on having
> refcnt
> > > and
> > > nb_segs already set. As in PMDs, a single store can init data_off,
> > > ref_cnt,
> > > nb_segs and port.
> >
> > Yes, I have given the concept a lot of thought already.
> > If we didn't require mbufs residing in the mempool to have any fields
> initialized, specifically "next" and "nb_segs", it would improve
> performance for drivers freeing mbufs back to the mempool, because
> writing to the mbufs would no longer be required at that point; the
> mbufs could simply be freed back to the mempool. Instead, we would
> require the driver to initialize these fields - which it probably does
> on RX anyway, if it supports segmented packets.
> > But I consider this concept a major API change, also affecting
> applications assuming that these fields are initialized when allocating
> raw mbufs from the mempool. So I haven't pursued it.
> >
> > >
> > > Similarly for packet_type and pkt_len, and data_len/vlan_tci and
> rss
> > > fields
> > > etc. For max performance, the whole of the mbuf cleared here can be
> > > done in
> > > 40 bytes, or 5 64-bit stores. If we do the stores in order,
> possibly
> > > the
> > > compiler can even opportunistically coalesce more stores, so we
> could
> > > even
> > > end up getting 128-bit or larger stores depending on the ISA
> compiled
> > > for.
> > > [Maybe the compiler will do this even if they are not in order, but
> I'd
> > > like to maximize my chances here! :-)]
>
> Morten, you didn't reply to this.
> Can we optimize more with big stores?
>
I did reply:
https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35F654BB@smartserver.smartshare.dk/
Essentially, it's a different type of optimization, which should also be applied to rte_pktmbuf_reset().
I have postponed such optimization for later; I don't have time to do it properly now.
-Morten
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-09 17:15 ` Bruce Richardson
2025-10-09 17:35 ` Morten Brørup
@ 2025-10-19 20:45 ` Stephen Hemminger
2025-10-20 8:46 ` Bruce Richardson
1 sibling, 1 reply; 39+ messages in thread
From: Stephen Hemminger @ 2025-10-19 20:45 UTC (permalink / raw)
To: Bruce Richardson
Cc: Morten Brørup, dev, Thomas Monjalon, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Thu, 9 Oct 2025 18:15:12 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > An optimized function for resetting a bulk of newly allocated
> > reinitialized mbufs (a.k.a. raw mbufs) was added.
> >
> > Compared to the normal packet mbuf reset function, it takes advantage of
> > the following two details:
> > 1. The 'next' and 'nb_segs' fields are already reset, so resetting them
> > has been omitted.
> > 2. When resetting the mbuf, the 'ol_flags' field must indicate whether the
> > mbuf uses an external buffer, and the 'data_off' field must not exceed the
> > data room size when resetting the data offset to include the default
> > headroom.
> > Unlike the normal packet mbuf reset function, which reads the mbuf itself
> > to get the information required for resetting these two fields, this
> > function gets the information from the mempool.
> >
> > This makes the function write-only of the mbuf, unlike the normal packet
> > mbuf reset function, which is read-modify-write of the mbuf.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------------
> > 1 file changed, 46 insertions(+), 28 deletions(-)
> >
> > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > index 49c93ab356..6f37a2e91e 100644
> > --- a/lib/mbuf/rte_mbuf.h
> > +++ b/lib/mbuf/rte_mbuf.h
> > @@ -954,6 +954,50 @@ static inline void rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > (uint16_t)m->buf_len);
> > }
> >
> > +/**
> > + * Reset the fields of a bulk of packet mbufs to their default values.
> > + *
> > + * The caller must ensure that the mbufs come from the specified mempool,
> > + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1),
> > + * as done by rte_pktmbuf_prefree_seg().
> > + *
> > + * This function should be used with care, when optimization is required.
> > + * For standard needs, prefer rte_pktmbuf_reset().
> > + *
> > + * @param mp
> > + * The mempool to which the mbuf belongs.
> > + * @param mbufs
> > + * Array of pointers to packet mbufs.
> > + * The array must not contain NULL pointers.
> > + * @param count
> > + * Array size.
> > + */
> > +static inline void
> > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> > +{
> > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > + RTE_MBUF_F_EXTERNAL : 0;
> > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM, rte_pktmbuf_data_room_size(mp),
> > + uint16_t);
> > +
> > + for (unsigned int idx = 0; idx < count; idx++) {
> > + struct rte_mbuf *m = mbufs[idx];
> > +
> > + m->pkt_len = 0;
> > + m->tx_offload = 0;
> > + m->vlan_tci = 0;
> > + m->vlan_tci_outer = 0;
> > + m->port = RTE_MBUF_PORT_INVALID;
>
> Have you considered doing all initialization using 64-bit stores? It's
> generally cheaper to do a single 64-bit store than e.g. set of 16-bit ones.
> This also means that we could remove the restriction on having refcnt and
> nb_segs already set. As in PMDs, a single store can init data_off, ref_cnt,
> nb_segs and port.
>
> Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss fields
> etc. For max performance, the whole of the mbuf cleared here can be done in
> 40 bytes, or 5 64-bit stores. If we do the stores in order, possibly the
> compiler can even opportunistically coalesce more stores, so we could even
> end up getting 128-bit or larger stores depending on the ISA compiled for.
> [Maybe the compiler will do this even if they are not in order, but I'd
> like to maximize my chances here! :-)]
>
> /Bruce
Although it is possible to use fewer CPU instructions, the performance
limiting factor is which fields are in cache.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-19 20:45 ` Stephen Hemminger
@ 2025-10-20 8:46 ` Bruce Richardson
2026-03-06 12:19 ` Rahul Bhansali
0 siblings, 1 reply; 39+ messages in thread
From: Bruce Richardson @ 2025-10-20 8:46 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Morten Brørup, dev, Thomas Monjalon, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger wrote:
> On Thu, 9 Oct 2025 18:15:12 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
> > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > An optimized function for resetting a bulk of newly allocated
> > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > >
> > > Compared to the normal packet mbuf reset function, it takes advantage of
> > > the following two details:
> > > 1. The 'next' and 'nb_segs' fields are already reset, so resetting them
> > > has been omitted.
> > > 2. When resetting the mbuf, the 'ol_flags' field must indicate whether the
> > > mbuf uses an external buffer, and the 'data_off' field must not exceed the
> > > data room size when resetting the data offset to include the default
> > > headroom.
> > > Unlike the normal packet mbuf reset function, which reads the mbuf itself
> > > to get the information required for resetting these two fields, this
> > > function gets the information from the mempool.
> > >
> > > This makes the function write-only of the mbuf, unlike the normal packet
> > > mbuf reset function, which is read-modify-write of the mbuf.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------------
> > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > index 49c93ab356..6f37a2e91e 100644
> > > --- a/lib/mbuf/rte_mbuf.h
> > > +++ b/lib/mbuf/rte_mbuf.h
> > > @@ -954,6 +954,50 @@ static inline void rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > (uint16_t)m->buf_len);
> > > }
> > >
> > > +/**
> > > + * Reset the fields of a bulk of packet mbufs to their default values.
> > > + *
> > > + * The caller must ensure that the mbufs come from the specified mempool,
> > > + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1),
> > > + * as done by rte_pktmbuf_prefree_seg().
> > > + *
> > > + * This function should be used with care, when optimization is required.
> > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > + *
> > > + * @param mp
> > > + * The mempool to which the mbuf belongs.
> > > + * @param mbufs
> > > + * Array of pointers to packet mbufs.
> > > + * The array must not contain NULL pointers.
> > > + * @param count
> > > + * Array size.
> > > + */
> > > +static inline void
> > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> > > +{
> > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > + RTE_MBUF_F_EXTERNAL : 0;
> > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM, rte_pktmbuf_data_room_size(mp),
> > > + uint16_t);
> > > +
> > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > + struct rte_mbuf *m = mbufs[idx];
> > > +
> > > + m->pkt_len = 0;
> > > + m->tx_offload = 0;
> > > + m->vlan_tci = 0;
> > > + m->vlan_tci_outer = 0;
> > > + m->port = RTE_MBUF_PORT_INVALID;
> >
> > Have you considered doing all initialization using 64-bit stores? It's
> > generally cheaper to do a single 64-bit store than e.g. set of 16-bit ones.
> > This also means that we could remove the restriction on having refcnt and
> > nb_segs already set. As in PMDs, a single store can init data_off, ref_cnt,
> > nb_segs and port.
> >
> > Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss fields
> > etc. For max performance, the whole of the mbuf cleared here can be done in
> > 40 bytes, or 5 64-bit stores. If we do the stores in order, possibly the
> > compiler can even opportunistically coalesce more stores, so we could even
> > end up getting 128-bit or larger stores depending on the ISA compiled for.
> > [Maybe the compiler will do this even if they are not in order, but I'd
> > like to maximize my chances here! :-)]
> >
> > /Bruce
>
> Although it is possible to use less CPU instructions, the performance
> limiting factor is which fields are in cache.
Yes, the cache presence of the target of the stores has a massive effect on
how well the code will perform. However, the number of stores can make a
difference too - especially if you are in store-heavy code. Consider the
number of store operations which would be generated by storing
field-by-field to a burst of 32 packets. With the previous work we have
done on our PMDs, and vectorizing them, we got a noticeable benefit from
doing larger vector stores compared to smaller ones!
/Bruce
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2025-10-20 8:46 ` Bruce Richardson
@ 2026-03-06 12:19 ` Rahul Bhansali
2026-03-06 14:53 ` Morten Brørup
0 siblings, 1 reply; 39+ messages in thread
From: Rahul Bhansali @ 2026-03-06 12:19 UTC (permalink / raw)
To: Bruce Richardson, Stephen Hemminger
Cc: Morten Brørup, dev@dpdk.org, Thomas Monjalon,
Konstantin Ananyev, Andrew Rybchenko, Ivan Malov, Chengwen Feng,
Jerin Jacob, Nithin Kumar Dabilpuram, Ashwin Sekhar T K
Please see inline.
> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Monday, October 20, 2025 2:17 PM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: Morten Brørup <mb@smartsharesystems.com>; dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Konstantin Ananyev
> <konstantin.ananyev@huawei.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Ivan Malov
> <ivan.malov@arknetworks.am>; Chengwen Feng <fengchengwen@huawei.com>
> Subject: Re: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
>
> On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger wrote:
> > On Thu, 9 Oct 2025 18:15:12 +0100
> > Bruce Richardson <mailto:bruce.richardson@intel.com> wrote:
> >
> > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > > An optimized function for resetting a bulk of newly allocated
> > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > >
> > > > Compared to the normal packet mbuf reset function, it takes advantage of
> > > > the following two details:
> > > > 1. The 'next' and 'nb_segs' fields are already reset, so resetting them
> > > > has been omitted.
> > > > 2. When resetting the mbuf, the 'ol_flags' field must indicate whether the
> > > > mbuf uses an external buffer, and the 'data_off' field must not exceed the
> > > > data room size when resetting the data offset to include the default
> > > > headroom.
> > > > Unlike the normal packet mbuf reset function, which reads the mbuf itself
> > > > to get the information required for resetting these two fields, this
> > > > function gets the information from the mempool.
> > > >
> > > > This makes the function write-only of the mbuf, unlike the normal packet
> > > > mbuf reset function, which is read-modify-write of the mbuf.
> > > >
> > > > Signed-off-by: Morten Brørup <mailto:mb@smartsharesystems.com>
> > > > ---
> > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----------------
> > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > index 49c93ab356..6f37a2e91e 100644
> > > > --- a/lib/mbuf/rte_mbuf.h
> > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > @@ -954,6 +954,50 @@ static inline void rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > (uint16_t)m->buf_len);
> > > > }
> > > >
> > > > +/**
> > > > + * Reset the fields of a bulk of packet mbufs to their default values.
> > > > + *
> > > > + * The caller must ensure that the mbufs come from the specified mempool,
> > > > + * are direct and properly reinitialized (refcnt=1, next=NULL, nb_segs=1),
[Rahul] For Marvell's CNxx SoCs, mbuf pointer alloc and free are offloaded to HW for Rx/Tx, so the "next" and "nb_segs" fields will not be reset to default values by HW.
When packets come from the wire, we reset these fields in the Rx fastpath, but for SW-allocated mbufs we cannot do it in Marvell's mempool driver, as it is unaware of mbufs.
Is it possible to reset these also in rte_mbuf_raw_reset_bulk() itself for mbuf alloc requests?
> > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > + *
> > > > + * This function should be used with care, when optimization is required.
> > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > + *
> > > > + * @param mp
> > > > + * The mempool to which the mbuf belongs.
> > > > + * @param mbufs
> > > > + * Array of pointers to packet mbufs.
> > > > + * The array must not contain NULL pointers.
> > > > + * @param count
> > > > + * Array size.
> > > > + */
> > > > +static inline void
> > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int count)
> > > > +{
> > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM, rte_pktmbuf_data_room_size(mp),
> > > > + uint16_t);
> > > > +
> > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > + struct rte_mbuf *m = mbufs[idx];
> > > > +
> > > > + m->pkt_len = 0;
> > > > + m->tx_offload = 0;
> > > > + m->vlan_tci = 0;
> > > > + m->vlan_tci_outer = 0;
> > > > + m->port = RTE_MBUF_PORT_INVALID;
> > >
> > > Have you considered doing all initialization using 64-bit stores? It's
> > > generally cheaper to do a single 64-bit store than e.g. set of 16-bit ones.
> > > This also means that we could remove the restriction on having refcnt and
> > > nb_segs already set. As in PMDs, a single store can init data_off, ref_cnt,
> > > nb_segs and port.
> > >
> > > Similarly for packet_type and pkt_len, and data_len/vlan_tci and rss fields
> > > etc. For max performance, the whole of the mbuf cleared here can be done in
> > > 40 bytes, or 5 64-bit stores. If we do the stores in order, possibly the
> > > compiler can even opportunistically coalesce more stores, so we could even
> > > end up getting 128-bit or larger stores depending on the ISA compiled for.
> > > [Maybe the compiler will do this even if they are not in order, but I'd
> > > like to maximize my chances here! :-)]
> > >
> > > /Bruce
> >
> > Although it is possible to use less CPU instructions, the performance
> > limiting factor is which fields are in cache.
>
> Yes, the cache presence of the target of the stores has a massive effect on
> how well the code will perform. However, the number of stores can make a
> difference too - especially if you are in store-heavy code. Consider the
> number of store operations which would be generated by storing
> field-by-field to a burst of 32 packets. With the previous work we have
> done on our PMDs, and vectorizing them, we got a noticible benefit from
> doing larger vector stores compared to smaller ones!
>
> /Bruce
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2026-03-06 12:19 ` Rahul Bhansali
@ 2026-03-06 14:53 ` Morten Brørup
2026-03-06 16:04 ` Rahul Bhansali
0 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2026-03-06 14:53 UTC (permalink / raw)
To: Rahul Bhansali, Bruce Richardson, Stephen Hemminger
Cc: dev, Thomas Monjalon, Konstantin Ananyev, Andrew Rybchenko,
Ivan Malov, Chengwen Feng, Jerin Jacob, Nithin Kumar Dabilpuram,
Ashwin Sekhar T K
> From: Rahul Bhansali [mailto:rbhansali@marvell.com]
> Sent: Friday, 6 March 2026 13.19
>
> Please see inline.
>
> > From: Bruce Richardson <bruce.richardson@intel.com>
> > Sent: Monday, October 20, 2025 2:17 PM
> >
> > On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger wrote:
> > > On Thu, 9 Oct 2025 18:15:12 +0100
> > > Bruce Richardson <mailto:bruce.richardson@intel.com> wrote:
> > >
> > > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > > > An optimized function for resetting a bulk of newly allocated
> > > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > > >
> > > > > Compared to the normal packet mbuf reset function, it takes
> advantage of
> > > > > the following two details:
> > > > > 1. The 'next' and 'nb_segs' fields are already reset, so
> resetting them
> > > > > has been omitted.
> > > > > 2. When resetting the mbuf, the 'ol_flags' field must indicate
> whether the
> > > > > mbuf uses an external buffer, and the 'data_off' field must not
> exceed the
> > > > > data room size when resetting the data offset to include the
> default
> > > > > headroom.
> > > > > Unlike the normal packet mbuf reset function, which reads the
> mbuf itself
> > > > > to get the information required for resetting these two fields,
> this
> > > > > function gets the information from the mempool.
> > > > >
> > > > > This makes the function write-only of the mbuf, unlike the
> normal packet
> > > > > mbuf reset function, which is read-modify-write of the mbuf.
> > > > >
> > > > > Signed-off-by: Morten Brørup <mailto:mb@smartsharesystems.com>
> > > > > ---
> > > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++---------
> --------
> > > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > > >
> > > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > > index 49c93ab356..6f37a2e91e 100644
> > > > > --- a/lib/mbuf/rte_mbuf.h
> > > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > > @@ -954,6 +954,50 @@ static inline void
> rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > > (uint16_t)m->buf_len);
> > > > > }
> > > > >
> > > > > +/**
> > > > > + * Reset the fields of a bulk of packet mbufs to their default
> values.
> > > > > + *
> > > > > + * The caller must ensure that the mbufs come from the
> specified mempool,
> > > > > + * are direct and properly reinitialized (refcnt=1, next=NULL,
> nb_segs=1),
>
> [Rahul] For Marvell's CNxx SoCs, mbuf pointers alloc and free are
> offloaded to HW for Rx/Tx, so these fields "next and nb_segs" will not
> be reset to default values by HW.
> When packets are coming from wire, we reset these fields in Rx
> fastpath, but in case of SW allocated mbuf, we cannot do it in
> Marvell's mempool driver as that is unaware of mbuf.
It has always been an invariant that mbufs stored in a mempool have their "next" and "nb_segs" fields reset.
This means that these fields must be reset before free.
In an ethdev driver's normal Tx path, the driver calls rte_pktmbuf_prefree_seg() before freeing an mbuf.
Does your ethdev driver not do that?
> Is it possible to reset these also in rte_mbuf_raw_reset_bulk() itself
> for mbuf alloc requests ?
Due to the invariant (about mbufs stored in a mempool having their "next" and "nb_segs" fields reset), resetting them again in rte_mbuf_raw_reset_bulk() after fetching the mbufs from the mempool (i.e. after calling rte_mempool_get_bulk()) is considered unnecessary.
PS:
I wish for a roadmap towards eliminating this invariant, and instead require the ethdev drivers to reset the "nb_segs" and "next" fields in the Rx fastpath instead - where the driver is initializing many other mbuf fields anyway, and the additional cost is near-zero.
One of the steps in such a roadmap could be to reset the "nb_segs" and "next" fields in the rte_mbuf_raw_reset_bulk() function, for ethdev drivers that haven't implemented it yet.
>
> > > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > > + *
> > > > > + * This function should be used with care, when optimization
> is required.
> > > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > > + *
> > > > > + * @param mp
> > > > > + * The mempool to which the mbuf belongs.
> > > > > + * @param mbufs
> > > > > + * Array of pointers to packet mbufs.
> > > > > + * The array must not contain NULL pointers.
> > > > > + * @param count
> > > > > + * Array size.
> > > > > + */
> > > > > +static inline void
> > > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct
> rte_mbuf **mbufs, unsigned int count)
> > > > > +{
> > > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> rte_pktmbuf_data_room_size(mp),
> > > > > + uint16_t);
> > > > > +
> > > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > > + struct rte_mbuf *m = mbufs[idx];
> > > > > +
> > > > > + m->pkt_len = 0;
> > > > > + m->tx_offload = 0;
> > > > > + m->vlan_tci = 0;
> > > > > + m->vlan_tci_outer = 0;
> > > > > + m->port = RTE_MBUF_PORT_INVALID;
> > > >
> > > > Have you considered doing all initialization using 64-bit stores?
> It's
> > > > generally cheaper to do a single 64-bit store than e.g. set of
> 16-bit ones.
> > > > This also means that we could remove the restriction on having
> refcnt and
> > > > nb_segs already set. As in PMDs, a single store can init
> data_off, ref_cnt,
> > > > nb_segs and port.
> > > >
> > > > Similarly for packet_type and pkt_len, and data_len/vlan_tci and
> rss fields
> > > > etc. For max performance, the whole of the mbuf cleared here can
> be done in
> > > > 40 bytes, or 5 64-bit stores. If we do the stores in order,
> possibly the
> > > > compiler can even opportunistically coalesce more stores, so we
> could even
> > > > end up getting 128-bit or larger stores depending on the ISA
> compiled for.
> > > > [Maybe the compiler will do this even if they are not in order,
> but I'd
> > > > like to maximize my chances here! :-)]
> > > >
> > > > /Bruce
> > >
> > > Although it is possible to use less CPU instructions, the
> performance
> > > limiting factor is which fields are in cache.
> >
> > Yes, the cache presence of the target of the stores has a massive
> effect on
> > how well the code will perform. However, the number of stores can
> make a
> > difference too - especially if you are in store-heavy code. Consider
> the
> > number of store operations which would be generated by storing
> > field-by-field to a burst of 32 packets. With the previous work we
> have
> > done on our PMDs, and vectorizing them, we got a noticible benefit
> from
> > doing larger vector stores compared to smaller ones!
> >
> > /Bruce
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2026-03-06 14:53 ` Morten Brørup
@ 2026-03-06 16:04 ` Rahul Bhansali
2026-03-06 20:04 ` Morten Brørup
0 siblings, 1 reply; 39+ messages in thread
From: Rahul Bhansali @ 2026-03-06 16:04 UTC (permalink / raw)
To: Morten Brørup, Bruce Richardson, Stephen Hemminger
Cc: dev@dpdk.org, Thomas Monjalon, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng, Jerin Jacob,
Nithin Kumar Dabilpuram, Ashwin Sekhar T K
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Friday, March 6, 2026 8:23 PM
> To: Rahul Bhansali <rbhansali@marvell.com>; Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Konstantin Ananyev <konstantin.ananyev@huawei.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Ivan Malov <ivan.malov@arknetworks.am>; Chengwen Feng
> <fengchengwen@huawei.com>; Jerin Jacob <jerinj@marvell.com>; Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>; Ashwin
> Sekhar T K <asekhar@marvell.com>
> Subject: RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
>
> > From: Rahul Bhansali [mailto:rbhansali@marvell.com]
> > Sent: Friday, 6 March 2026 13.19
> >
> > Please see inline.
> >
> > > From: Bruce Richardson <mailto:bruce.richardson@intel.com>
> > > Sent: Monday, October 20, 2025 2:17 PM
> > >
> > > On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger wrote:
> > > > On Thu, 9 Oct 2025 18:15:12 +0100
> > > > Bruce Richardson <mailto:bruce.richardson@intel.com> wrote:
> > > >
> > > > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup wrote:
> > > > > > An optimized function for resetting a bulk of newly allocated
> > > > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > > > >
> > > > > > Compared to the normal packet mbuf reset function, it takes
> > advantage of
> > > > > > the following two details:
> > > > > > 1. The 'next' and 'nb_segs' fields are already reset, so
> > resetting them
> > > > > > has been omitted.
> > > > > > 2. When resetting the mbuf, the 'ol_flags' field must indicate
> > whether the
> > > > > > mbuf uses an external buffer, and the 'data_off' field must not
> > exceed the
> > > > > > data room size when resetting the data offset to include the
> > default
> > > > > > headroom.
> > > > > > Unlike the normal packet mbuf reset function, which reads the
> > mbuf itself
> > > > > > to get the information required for resetting these two fields,
> > this
> > > > > > function gets the information from the mempool.
> > > > > >
> > > > > > This makes the function write-only of the mbuf, unlike the
> > normal packet
> > > > > > mbuf reset function, which is read-modify-write of the mbuf.
> > > > > >
> > > > > > Signed-off-by: Morten Brørup <mailto:mb@smartsharesystems.com>
> > > > > > ---
> > > > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++---------
> > --------
> > > > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > > > index 49c93ab356..6f37a2e91e 100644
> > > > > > --- a/lib/mbuf/rte_mbuf.h
> > > > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > > > @@ -954,6 +954,50 @@ static inline void
> > rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > > > (uint16_t)m->buf_len);
> > > > > > }
> > > > > >
> > > > > > +/**
> > > > > > + * Reset the fields of a bulk of packet mbufs to their default
> > values.
> > > > > > + *
> > > > > > + * The caller must ensure that the mbufs come from the
> > specified mempool,
> > > > > > + * are direct and properly reinitialized (refcnt=1, next=NULL,
> > nb_segs=1),
> >
> > [Rahul] For Marvell's CNxx SoCs, mbuf pointers alloc and free are
> > offloaded to HW for Rx/Tx, so these fields "next and nb_segs" will not
> > be reset to default values by HW.
> > When packets are coming from wire, we reset these fields in Rx
> > fastpath, but in case of SW allocated mbuf, we cannot do it in
> > Marvell's mempool driver as that is unaware of mbuf.
>
> It has always been an invariant that mbufs stored in a mempool have their "next" and "nb_segs" fields reset.
> This means that these fields must be reset before free.
>
> In an ethdev driver's normal Tx path, the driver calls rte_pktmbuf_prefree_seg() before freeing an mbuf.
> Does your ethdev driver not do that?
[Rahul] We support this when the mbuf fast free offload (RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) is disabled.
When the mbuf fast free offload is enabled, the mbuf is freed in HW after transmission.
>
> > Is it possible to reset these also in rte_mbuf_raw_reset_bulk() itself
> > for mbuf alloc requests ?
>
> Due to the invariant (about mbufs stored in a mempool having their "next" and "nb_segs" fields reset), resetting them again in
> rte_mbuf_raw_reset_bulk() after fetching the mbufs from the mempool (i.e. after calling rte_mempool_get_bulk()) is considered
> unnecessary.
>
> PS:
> I wish for a roadmap towards eliminating this invariant, and instead require the ethdev drivers to reset the "nb_segs" and "next" fields in
> the Rx fastpath instead - where the driver is initializing many other mbuf fields anyway, and the additional cost is near-zero.
> One of the steps in such a roadmap could be to reset the "nb_segs" and "next" fields in the rte_mbuf_raw_reset_bulk() function, for
> ethdev drivers which hasn't implemented it yet.
>
> >
> > > > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > > > + *
> > > > > > + * This function should be used with care, when optimization
> > is required.
> > > > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > > > + *
> > > > > > + * @param mp
> > > > > > + * The mempool to which the mbuf belongs.
> > > > > > + * @param mbufs
> > > > > > + * Array of pointers to packet mbufs.
> > > > > > + * The array must not contain NULL pointers.
> > > > > > + * @param count
> > > > > > + * Array size.
> > > > > > + */
> > > > > > +static inline void
> > > > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct
> > rte_mbuf **mbufs, unsigned int count)
> > > > > > +{
> > > > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> > RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> > rte_pktmbuf_data_room_size(mp),
> > > > > > + uint16_t);
> > > > > > +
> > > > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > > > + struct rte_mbuf *m = mbufs[idx];
> > > > > > +
> > > > > > + m->pkt_len = 0;
> > > > > > + m->tx_offload = 0;
> > > > > > + m->vlan_tci = 0;
> > > > > > + m->vlan_tci_outer = 0;
> > > > > > + m->port = RTE_MBUF_PORT_INVALID;
> > > > >
> > > > > Have you considered doing all initialization using 64-bit stores?
> > It's
> > > > > generally cheaper to do a single 64-bit store than e.g. set of
> > 16-bit ones.
> > > > > This also means that we could remove the restriction on having
> > refcnt and
> > > > > nb_segs already set. As in PMDs, a single store can init
> > data_off, ref_cnt,
> > > > > nb_segs and port.
> > > > >
> > > > > Similarly for packet_type and pkt_len, and data_len/vlan_tci and
> > rss fields
> > > > > etc. For max performance, the whole of the mbuf cleared here can
> > be done in
> > > > > 40 bytes, or 5 64-bit stores. If we do the stores in order,
> > possibly the
> > > > > compiler can even opportunistically coalesce more stores, so we
> > could even
> > > > > end up getting 128-bit or larger stores depending on the ISA
> > compiled for.
> > > > > [Maybe the compiler will do this even if they are not in order,
> > but I'd
> > > > > like to maximize my chances here! :-)]
> > > > >
> > > > > /Bruce
> > > >
> > > > Although it is possible to use fewer CPU instructions, the
> > performance
> > > > limiting factor is which fields are in cache.
> > >
> > > Yes, the cache presence of the target of the stores has a massive
> > effect on
> > > how well the code will perform. However, the number of stores can
> > make a
> > > difference too - especially if you are in store-heavy code. Consider
> > the
> > > number of store operations which would be generated by storing
> > > field-by-field to a burst of 32 packets. With the previous work we
> > have
> > done on our PMDs, and vectorizing them, we got a noticeable benefit
> > from
> > > doing larger vector stores compared to smaller ones!
> > >
> > > /Bruce
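Bruce's coalesced-store suggestion can be illustrated with a small, self-contained sketch. The struct layout and names below (`rearm_block`, `make_rearm_data()`, `rearm()`) are hypothetical stand-ins mimicking four adjacent 16-bit fields of struct rte_mbuf, not the real layout or any DPDK API:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative layout only: four adjacent 16-bit fields, similar to
 * data_off/refcnt/nb_segs/port in struct rte_mbuf (not the real layout). */
struct rearm_block {
	uint16_t data_off;
	uint16_t refcnt;
	uint16_t nb_segs;
	uint16_t port;
};

/* Precompute the 8-byte initialization pattern once, e.g. per burst. */
static uint64_t
make_rearm_data(uint16_t data_off, uint16_t port)
{
	struct rearm_block b = {
		.data_off = data_off,
		.refcnt = 1,
		.nb_segs = 1,
		.port = port,
	};
	uint64_t v;

	/* memcpy avoids strict-aliasing issues; compilers turn it into
	 * a plain 64-bit load. */
	memcpy(&v, &b, sizeof(v));
	return v;
}

/* Initialize all four fields with a single 64-bit store instead of
 * four separate 16-bit stores. */
static void
rearm(struct rearm_block *blk, uint64_t rearm_data)
{
	memcpy(blk, &rearm_data, sizeof(rearm_data));
}
```

Because the pattern is built and consumed through the same struct on the same machine, the sketch is endianness-agnostic; doing such stores in field order also gives the compiler a chance to coalesce adjacent ones further, as suggested above.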
^ permalink raw reply [flat|nested] 39+ messages in thread

* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2026-03-06 16:04 ` Rahul Bhansali
@ 2026-03-06 20:04 ` Morten Brørup
2026-04-01 6:12 ` Rahul Bhansali
0 siblings, 1 reply; 39+ messages in thread
From: Morten Brørup @ 2026-03-06 20:04 UTC (permalink / raw)
To: Rahul Bhansali, Bruce Richardson, Stephen Hemminger
Cc: dev, Thomas Monjalon, Konstantin Ananyev, Andrew Rybchenko,
Ivan Malov, Chengwen Feng, Jerin Jacob, Nithin Kumar Dabilpuram,
Ashwin Sekhar T K
> From: Rahul Bhansali [mailto:rbhansali@marvell.com]
> Sent: Friday, 6 March 2026 17.04
>
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Friday, March 6, 2026 8:23 PM
> >
> > > From: Rahul Bhansali [mailto:rbhansali@marvell.com]
> > > Sent: Friday, 6 March 2026 13.19
> > >
> > > Please see inline.
> > >
> > > > From: Bruce Richardson <mailto:bruce.richardson@intel.com>
> > > > Sent: Monday, October 20, 2025 2:17 PM
> > > >
> > > > On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger
> wrote:
> > > > > On Thu, 9 Oct 2025 18:15:12 +0100
> > > > > Bruce Richardson <mailto:bruce.richardson@intel.com> wrote:
> > > > >
> > > > > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup
> wrote:
> > > > > > > An optimized function for resetting a bulk of newly
> allocated
> > > > > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > > > > >
> > > > > > > Compared to the normal packet mbuf reset function, it takes
> > > advantage of
> > > > > > > the following two details:
> > > > > > > 1. The 'next' and 'nb_segs' fields are already reset, so
> > > resetting them
> > > > > > > has been omitted.
> > > > > > > 2. When resetting the mbuf, the 'ol_flags' field must
> indicate
> > > whether the
> > > > > > > mbuf uses an external buffer, and the 'data_off' field must
> not
> > > exceed the
> > > > > > > data room size when resetting the data offset to include
> the
> > > default
> > > > > > > headroom.
> > > > > > > Unlike the normal packet mbuf reset function, which reads
> the
> > > mbuf itself
> > > > > > > to get the information required for resetting these two
> fields,
> > > this
> > > > > > > function gets the information from the mempool.
> > > > > > >
> > > > > > > This makes the function write-only of the mbuf, unlike the
> > > normal packet
> > > > > > > mbuf reset function, which is read-modify-write of the
> mbuf.
> > > > > > >
> > > > > > > Signed-off-by: Morten Brørup
> <mailto:mb@smartsharesystems.com>
> > > > > > > ---
> > > > > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----
> ----
> > > --------
> > > > > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > > > > >
> > > > > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > > > > index 49c93ab356..6f37a2e91e 100644
> > > > > > > --- a/lib/mbuf/rte_mbuf.h
> > > > > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > > > > @@ -954,6 +954,50 @@ static inline void
> > > rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > > > > (uint16_t)m->buf_len);
> > > > > > > }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Reset the fields of a bulk of packet mbufs to their
> default
> > > values.
> > > > > > > + *
> > > > > > > + * The caller must ensure that the mbufs come from the
> > > specified mempool,
> > > > > > > + * are direct and properly reinitialized (refcnt=1,
> next=NULL,
> > > nb_segs=1),
> > >
> > > [Rahul] For Marvell's CNxx SoCs, mbuf pointers alloc and free are
> > > offloaded to HW for Rx/Tx, so these fields "next and nb_segs" will
> not
> > > be reset to default values by HW.
> > > When packets are coming from wire, we reset these fields in Rx
> > > fastpath, but in case of SW allocated mbuf, we cannot do it in
> > > Marvell's mempool driver as that is unaware of mbuf.
> >
> > It has always been an invariant that mbufs stored in a mempool have
> their "next" and "nb_segs" fields reset.
> > This means that these fields must be reset before free.
> >
> > In an ethdev driver's normal Tx path, the driver calls
> rte_pktmbuf_prefree_seg() before freeing an mbuf.
> > Does your ethdev driver not do that?
> [Rahul] We support this in case of no mbuf fast free offload
> (RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload is disabled) .
> When mbuf fast free offload is enabled, then mbuf will free in HW after
> transmission.
Great. This limits the challenge to FAST_FREE Tx processing only.
There are two different solutions to this:
1. When choosing the Tx function for a queue, only select your "fast" Tx function if both RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE is set and RTE_ETH_TX_OFFLOAD_MULTI_SEGS is not set.
Note: If MULTI_SEGS is not set, packets are not segmented, so the "next" and "nb_segs" fields are never modified, and thus remain reset.
This limits the "fast" Tx function to supporting non-segmented packets only, so it is not the optimal solution.
2. Modify your "fast" Tx function to reset the mbuf "next" and "nb_segs" fields when writing the hardware Tx descriptor (i.e. at an earlier Tx processing stage in the driver), so the fields are already reset when freeing the mbuf.
This allows the "fast" Tx function to support both non-segmented and segmented packets.
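Solution 2 can be sketched as a simplified model. Here, struct mbuf is a minimal stand-in for struct rte_mbuf, and fast_tx_enqueue()/hw_fast_free() are hypothetical names for the driver's descriptor-write stage and the hardware fast-free stage, not real DPDK or driver APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for struct rte_mbuf; field names follow the real
 * struct, but this is an illustrative model, not the real layout. */
struct mbuf {
	struct mbuf *next; /* next segment, NULL for the last one */
	uint16_t nb_segs;  /* number of segments in the chain */
	int freed;         /* models the mbuf being returned to its pool */
};

/* Hypothetical "fast" Tx stage (solution 2 above): while walking the
 * segment chain to write the hardware Tx descriptors, reset "next" and
 * "nb_segs", so the mempool invariant already holds when the hardware
 * frees the mbufs after transmission. */
static void
fast_tx_enqueue(struct mbuf *pkt)
{
	struct mbuf *seg = pkt;

	while (seg != NULL) {
		struct mbuf *next = seg->next;

		/* ... write the hardware Tx descriptor for this segment ... */

		seg->next = NULL; /* restore mempool invariant */
		seg->nb_segs = 1; /* restore mempool invariant */
		seg = next;
	}
}

/* Models the hardware freeing the mbufs after transmission; no software
 * runs at this point, so the invariant must already hold. */
static void
hw_fast_free(struct mbuf *segs[], unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		assert(segs[i]->next == NULL && segs[i]->nb_segs == 1);
		segs[i]->freed = 1;
	}
}
```

With this ordering, both non-segmented and segmented packets satisfy the invariant at free time, at the cost of two extra stores per segment in the descriptor-write loop.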
>
> >
> > > Is it possible to reset these also in rte_mbuf_raw_reset_bulk()
> itself
> > > for mbuf alloc requests ?
> >
> > Due to the invariant (about mbufs stored in a mempool having their
> "next" and "nb_segs" fields reset), resetting them again in
> > rte_mbuf_raw_reset_bulk() after fetching the mbufs from the mempool
> (i.e. after calling rte_mempool_get_bulk()) is considered
> > unnecessary.
> >
> > PS:
> > I wish for a roadmap towards eliminating this invariant, and instead
> require the ethdev drivers to reset the "nb_segs" and "next" fields in
> > the Rx fastpath instead - where the driver is initializing many other
> mbuf fields anyway, and the additional cost is near-zero.
> > One of the steps in such a roadmap could be to reset the "nb_segs"
> and "next" fields in the rte_mbuf_raw_reset_bulk() function, for
> > ethdev drivers which haven't implemented it yet.
> >
> > >
> > > > > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > > > > + *
> > > > > > > + * This function should be used with care, when
> optimization
> > > is required.
> > > > > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > > > > + *
> > > > > > > + * @param mp
> > > > > > > + * The mempool to which the mbuf belongs.
> > > > > > > + * @param mbufs
> > > > > > > + * Array of pointers to packet mbufs.
> > > > > > > + * The array must not contain NULL pointers.
> > > > > > > + * @param count
> > > > > > > + * Array size.
> > > > > > > + */
> > > > > > > +static inline void
> > > > > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct
> > > rte_mbuf **mbufs, unsigned int count)
> > > > > > > +{
> > > > > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> > > RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> > > rte_pktmbuf_data_room_size(mp),
> > > > > > > + uint16_t);
> > > > > > > +
> > > > > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > > > > + struct rte_mbuf *m = mbufs[idx];
> > > > > > > +
> > > > > > > + m->pkt_len = 0;
> > > > > > > + m->tx_offload = 0;
> > > > > > > + m->vlan_tci = 0;
> > > > > > > + m->vlan_tci_outer = 0;
> > > > > > > + m->port = RTE_MBUF_PORT_INVALID;
> > > > > >
> > > > > > Have you considered doing all initialization using 64-bit
> stores?
> > > It's
> > > > > > generally cheaper to do a single 64-bit store than e.g. set
> of
> > > 16-bit ones.
> > > > > > This also means that we could remove the restriction on
> having
> > > refcnt and
> > > > > > nb_segs already set. As in PMDs, a single store can init
> > > data_off, ref_cnt,
> > > > > > nb_segs and port.
> > > > > >
> > > > > > Similarly for packet_type and pkt_len, and data_len/vlan_tci
> and
> > > rss fields
> > > > > > etc. For max performance, the whole of the mbuf cleared here
> can
> > > be done in
> > > > > > 40 bytes, or 5 64-bit stores. If we do the stores in order,
> > > possibly the
> > > > > > compiler can even opportunistically coalesce more stores, so
> we
> > > could even
> > > > > > end up getting 128-bit or larger stores depending on the ISA
> > > compiled for.
> > > > > > [Maybe the compiler will do this even if they are not in
> order,
> > > but I'd
> > > > > > like to maximize my chances here! :-)]
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > Although it is possible to use fewer CPU instructions, the
> > > performance
> > > > > limiting factor is which fields are in cache.
> > > >
> > > > Yes, the cache presence of the target of the stores has a massive
> > > effect on
> > > > how well the code will perform. However, the number of stores can
> > > make a
> > > > difference too - especially if you are in store-heavy code.
> Consider
> > > the
> > > > number of store operations which would be generated by storing
> > > > field-by-field to a burst of 32 packets. With the previous work
> we
> > > have
> > > > done on our PMDs, and vectorizing them, we got a noticeable
> benefit
> > > from
> > > > doing larger vector stores compared to smaller ones!
> > > >
> > > > /Bruce
^ permalink raw reply [flat|nested] 39+ messages in thread

* RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
2026-03-06 20:04 ` Morten Brørup
@ 2026-04-01 6:12 ` Rahul Bhansali
0 siblings, 0 replies; 39+ messages in thread
From: Rahul Bhansali @ 2026-04-01 6:12 UTC (permalink / raw)
To: Morten Brørup, Bruce Richardson, Stephen Hemminger
Cc: dev@dpdk.org, Thomas Monjalon, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng, Jerin Jacob,
Nithin Kumar Dabilpuram, Ashwin Sekhar T K
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Saturday, March 7, 2026 1:34 AM
> To: Rahul Bhansali <rbhansali@marvell.com>; Bruce Richardson <bruce.richardson@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Konstantin Ananyev <konstantin.ananyev@huawei.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Ivan Malov <ivan.malov@arknetworks.am>; Chengwen Feng
> <fengchengwen@huawei.com>; Jerin Jacob <jerinj@marvell.com>; Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>; Ashwin
> Sekhar T K <asekhar@marvell.com>
> Subject: [EXTERNAL] RE: [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs
>
> > From: Rahul Bhansali [mailto:rbhansali@marvell.com]
> > Sent: Friday, 6 March 2026 17.04
> >
> > > From: Morten Brørup <mailto:mb@smartsharesystems.com>
> > > Sent: Friday, March 6, 2026 8:23 PM
> > >
> > > > From: Rahul Bhansali [mailto:rbhansali@marvell.com]
> > > > Sent: Friday, 6 March 2026 13.19
> > > >
> > > > Please see inline.
> > > >
> > > > > From: Bruce Richardson <mailto:bruce.richardson@intel.com>
> > > > > Sent: Monday, October 20, 2025 2:17 PM
> > > > >
> > > > > On Sun, Oct 19, 2025 at 01:45:45PM -0700, Stephen Hemminger
> > wrote:
> > > > > > On Thu, 9 Oct 2025 18:15:12 +0100
> > > > > > Bruce Richardson <mailto:bruce.richardson@intel.com> wrote:
> > > > > >
> > > > > > > On Sat, Aug 23, 2025 at 06:30:02AM +0000, Morten Brørup
> > wrote:
> > > > > > > > An optimized function for resetting a bulk of newly
> > allocated
> > > > > > > > reinitialized mbufs (a.k.a. raw mbufs) was added.
> > > > > > > >
> > > > > > > > Compared to the normal packet mbuf reset function, it takes
> > > > advantage of
> > > > > > > > the following two details:
> > > > > > > > 1. The 'next' and 'nb_segs' fields are already reset, so
> > > > resetting them
> > > > > > > > has been omitted.
> > > > > > > > 2. When resetting the mbuf, the 'ol_flags' field must
> > indicate
> > > > whether the
> > > > > > > > mbuf uses an external buffer, and the 'data_off' field must
> > not
> > > > exceed the
> > > > > > > > data room size when resetting the data offset to include
> > the
> > > > default
> > > > > > > > headroom.
> > > > > > > > Unlike the normal packet mbuf reset function, which reads
> > the
> > > > mbuf itself
> > > > > > > > to get the information required for resetting these two
> > fields,
> > > > this
> > > > > > > > function gets the information from the mempool.
> > > > > > > >
> > > > > > > > This makes the function write-only of the mbuf, unlike the
> > > > normal packet
> > > > > > > > mbuf reset function, which is read-modify-write of the
> > mbuf.
> > > > > > > >
> > > > > > > > Signed-off-by: Morten Brørup
> > <mailto:mb@smartsharesystems.com>
> > > > > > > > ---
> > > > > > > > lib/mbuf/rte_mbuf.h | 74 ++++++++++++++++++++++++++++-----
> > ----
> > > > --------
> > > > > > > > 1 file changed, 46 insertions(+), 28 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > > > > > > index 49c93ab356..6f37a2e91e 100644
> > > > > > > > --- a/lib/mbuf/rte_mbuf.h
> > > > > > > > +++ b/lib/mbuf/rte_mbuf.h
> > > > > > > > @@ -954,6 +954,50 @@ static inline void
> > > > rte_pktmbuf_reset_headroom(struct rte_mbuf *m)
> > > > > > > > (uint16_t)m->buf_len);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +/**
> > > > > > > > + * Reset the fields of a bulk of packet mbufs to their
> > default
> > > > values.
> > > > > > > > + *
> > > > > > > > + * The caller must ensure that the mbufs come from the
> > > > specified mempool,
> > > > > > > > + * are direct and properly reinitialized (refcnt=1,
> > next=NULL,
> > > > nb_segs=1),
> > > >
> > > > [Rahul] For Marvell's CNxx SoCs, mbuf pointers alloc and free are
> > > > offloaded to HW for Rx/Tx, so these fields "next and nb_segs" will
> > not
> > > > be reset to default values by HW.
> > > > When packets are coming from wire, we reset these fields in Rx
> > > > fastpath, but in case of SW allocated mbuf, we cannot do it in
> > > > Marvell's mempool driver as that is unaware of mbuf.
> > >
> > > It has always been an invariant that mbufs stored in a mempool have
> > their "next" and "nb_segs" fields reset.
> > > This means that these fields must be reset before free.
> > >
> > > In an ethdev driver's normal Tx path, the driver calls
> > rte_pktmbuf_prefree_seg() before freeing an mbuf.
> > > Does your ethdev driver not do that?
> > [Rahul] We support this in case of no mbuf fast free offload
> > (RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE offload is disabled) .
> > When mbuf fast free offload is enabled, then mbuf will free in HW after
> > transmission.
>
> Great. This limits the challenge to FAST_FREE Tx processing only.
> There are two different solutions to this:
>
> 1. When choosing the Tx function for a queue, only select your "fast" Tx function if both RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE is set
> and RTE_ETH_TX_OFFLOAD_MULTI_SEGS is not set.
> Note: If MULTI_SEGS is not set, packets are not segmented, so the "next" and "nb_segs" fields are never modified, and thus remain
> reset.
> This limits the "fast" Tx function to support non-segmented packets only, so not the optimal solution.
>
> 2. Modify your "fast" Tx function to reset the mbuf "next" and "nb_segs" fields when writing the hardware Tx descriptor (i.e. at an
> earlier Tx processing stage in the driver), so the fields are already reset when freeing the mbuf.
> This allows the "fast" Tx function to support both non-segmented and segmented packets.
Sorry, I missed replying to this. We already select fastpath functions based on offload flags, and will make changes to reset the mbuf "next" and "nb_segs" fields in the multi-seg offload Tx path.
>
> >
> > >
> > > > Is it possible to reset these also in rte_mbuf_raw_reset_bulk()
> > itself
> > > > for mbuf alloc requests ?
> > >
> > > Due to the invariant (about mbufs stored in a mempool having their
> > "next" and "nb_segs" fields reset), resetting them again in
> > > rte_mbuf_raw_reset_bulk() after fetching the mbufs from the mempool
> > (i.e. after calling rte_mempool_get_bulk()) is considered
> > > unnecessary.
> > >
> > > PS:
> > > I wish for a roadmap towards eliminating this invariant, and instead
> > require the ethdev drivers to reset the "nb_segs" and "next" fields in
> > > the Rx fastpath instead - where the driver is initializing many other
> > mbuf fields anyway, and the additional cost is near-zero.
> > > One of the steps in such a roadmap could be to reset the "nb_segs"
> > and "next" fields in the rte_mbuf_raw_reset_bulk() function, for
> > > ethdev drivers which haven't implemented it yet.
> > >
> > > >
> > > > > > > > + * as done by rte_pktmbuf_prefree_seg().
> > > > > > > > + *
> > > > > > > > + * This function should be used with care, when
> > optimization
> > > > is required.
> > > > > > > > + * For standard needs, prefer rte_pktmbuf_reset().
> > > > > > > > + *
> > > > > > > > + * @param mp
> > > > > > > > + * The mempool to which the mbuf belongs.
> > > > > > > > + * @param mbufs
> > > > > > > > + * Array of pointers to packet mbufs.
> > > > > > > > + * The array must not contain NULL pointers.
> > > > > > > > + * @param count
> > > > > > > > + * Array size.
> > > > > > > > + */
> > > > > > > > +static inline void
> > > > > > > > +rte_mbuf_raw_reset_bulk(struct rte_mempool *mp, struct
> > > > rte_mbuf **mbufs, unsigned int count)
> > > > > > > > +{
> > > > > > > > + uint64_t ol_flags = (rte_pktmbuf_priv_flags(mp) &
> > > > RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
> > > > > > > > + RTE_MBUF_F_EXTERNAL : 0;
> > > > > > > > + uint16_t data_off = RTE_MIN_T(RTE_PKTMBUF_HEADROOM,
> > > > rte_pktmbuf_data_room_size(mp),
> > > > > > > > + uint16_t);
> > > > > > > > +
> > > > > > > > + for (unsigned int idx = 0; idx < count; idx++) {
> > > > > > > > + struct rte_mbuf *m = mbufs[idx];
> > > > > > > > +
> > > > > > > > + m->pkt_len = 0;
> > > > > > > > + m->tx_offload = 0;
> > > > > > > > + m->vlan_tci = 0;
> > > > > > > > + m->vlan_tci_outer = 0;
> > > > > > > > + m->port = RTE_MBUF_PORT_INVALID;
> > > > > > >
> > > > > > > Have you considered doing all initialization using 64-bit
> > stores?
> > > > It's
> > > > > > > generally cheaper to do a single 64-bit store than e.g. set
> > of
> > > > 16-bit ones.
> > > > > > > This also means that we could remove the restriction on
> > having
> > > > refcnt and
> > > > > > > nb_segs already set. As in PMDs, a single store can init
> > > > data_off, ref_cnt,
> > > > > > > nb_segs and port.
> > > > > > >
> > > > > > > Similarly for packet_type and pkt_len, and data_len/vlan_tci
> > and
> > > > rss fields
> > > > > > > etc. For max performance, the whole of the mbuf cleared here
> > can
> > > > be done in
> > > > > > > 40 bytes, or 5 64-bit stores. If we do the stores in order,
> > > > possibly the
> > > > > > > compiler can even opportunistically coalesce more stores, so
> > we
> > > > could even
> > > > > > > end up getting 128-bit or larger stores depending on the ISA
> > > > compiled for.
> > > > > > > [Maybe the compiler will do this even if they are not in
> > order,
> > > > but I'd
> > > > > > > like to maximize my chances here! :-)]
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Although it is possible to use fewer CPU instructions, the
> > > > performance
> > > > > > limiting factor is which fields are in cache.
> > > > >
> > > > > Yes, the cache presence of the target of the stores has a massive
> > > > effect on
> > > > > how well the code will perform. However, the number of stores can
> > > > make a
> > > > > difference too - especially if you are in store-heavy code.
> > Consider
> > > > the
> > > > > number of store operations which would be generated by storing
> > > > > field-by-field to a burst of 32 packets. With the previous work
> > we
> > > > have
> > > > > done on our PMDs, and vectorizing them, we got a noticeable
> > benefit
> > > > from
> > > > > doing larger vector stores compared to smaller ones!
> > > > >
> > > > > /Bruce
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v8 0/3] mbuf: simplify handling of reinitialized mbufs
2025-08-23 6:29 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
` (2 preceding siblings ...)
2025-08-23 6:30 ` [PATCH v8 3/3] mbuf: optimize reset of reinitialized mbufs Morten Brørup
@ 2025-10-06 14:43 ` Morten Brørup
2025-10-19 20:42 ` Thomas Monjalon
4 siblings, 0 replies; 39+ messages in thread
From: Morten Brørup @ 2025-10-06 14:43 UTC (permalink / raw)
To: dev, Thomas Monjalon, Stephen Hemminger, Bruce Richardson,
Konstantin Ananyev, Andrew Rybchenko, Ivan Malov, Chengwen Feng
PING for review of this series.
Venlig hilsen / Kind regards,
-Morten Brørup
> -----Original Message-----
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Saturday, 23 August 2025 08.30
> To: dev@dpdk.org; Thomas Monjalon; Stephen Hemminger; Bruce Richardson;
> Konstantin Ananyev; Andrew Rybchenko; Ivan Malov; Chengwen Feng
> Cc: Morten Brørup
> Subject: [PATCH v8 0/3] mbuf: simplify handling of reinitialized mbufs
>
> This series contains three changes:
> - sanity checking a reinitialized mbuf (a.k.a. raw mbuf) was de-inlined
> - reinitialized mbuf free and alloc bulk functions were promoted as
> stable
> - resetting fields of reinitialized mbufs was optimized, making the
> reset
> operation write-only of the mbuf, instead of read-modify-write
>
> Morten Brørup (3):
> mbuf: de-inline sanity checking a reinitialized mbuf
> mbuf: promote raw free and alloc bulk functions as stable
> mbuf: optimize reset of reinitialized mbufs
>
> lib/mbuf/rte_mbuf.c | 61 +++++++++++++++
> lib/mbuf/rte_mbuf.h | 176 +++++++++++++++++++++++++-------------------
> 2 files changed, 162 insertions(+), 75 deletions(-)
>
> --
> 2.43.0
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: [PATCH v8 0/3] mbuf: simplify handling of reinitialized mbufs
2025-08-23 6:29 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
` (3 preceding siblings ...)
2025-10-06 14:43 ` [PATCH v8 0/3] mbuf: simplify handling " Morten Brørup
@ 2025-10-19 20:42 ` Thomas Monjalon
4 siblings, 0 replies; 39+ messages in thread
From: Thomas Monjalon @ 2025-10-19 20:42 UTC (permalink / raw)
To: Morten Brørup
Cc: dev, Stephen Hemminger, Bruce Richardson, Konstantin Ananyev,
Andrew Rybchenko, Ivan Malov, Chengwen Feng
23/08/2025 08:29, Morten Brørup:
> This series contains three changes:
> - sanity checking a reinitialized mbuf (a.k.a. raw mbuf) was de-inlined
> - reinitialized mbuf free and alloc bulk functions were promoted as stable
> - resetting fields of reinitialized mbufs was optimized, making the reset
> operation write-only of the mbuf, instead of read-modify-write
>
> Morten Brørup (3):
> mbuf: de-inline sanity checking a reinitialized mbuf
> mbuf: promote raw free and alloc bulk functions as stable
> mbuf: optimize reset of reinitialized mbufs
Applied, thanks.
2 notes:
- do you know how much it improves performance?
- it is adding (more or less) new functions directly as stable
^ permalink raw reply [flat|nested] 39+ messages in thread