From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Bruce Richardson" <bruce.richardson@intel.com>
Cc: <dev@dpdk.org>,
"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Jingjing Wu" <jingjing.wu@intel.com>,
"Praveen Shetty" <praveen.shetty@intel.com>,
"Hemant Agrawal" <hemant.agrawal@nxp.com>,
"Sachin Saxena" <sachin.saxena@oss.nxp.com>
Subject: RE: [PATCH v5] mempool: improve cache behaviour and performance
Date: Wed, 27 May 2026 11:22:01 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F658A0@smartserver.smartshare.dk>
In-Reply-To: <ahavyHwTzyPPDkQ4@bricha3-mobl1.ger.corp.intel.com>
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Wednesday, 27 May 2026 10.48
>
> On Tue, May 26, 2026 at 07:45:24PM +0200, Morten Brørup wrote:
> > > From: Morten Brørup [mailto:mb@smartsharesystems.com]
> > > Sent: Tuesday, 26 May 2026 12.37
> > >
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Tuesday, 26 May 2026 11.40
> > > >
> >
> > [...]
> >
> > > > [In all this, I am making the assumption that burst size is well
> > > > less than cache size. Also, similar logic would be applicable for
> > > > the inverse scenario, e.g. flush to empty (and fill burst) and
> > > > fill to 75%]
> > >
> > > I'm not so sure about this assumption.
> > > With a cache size of 512 and bursts of 64, the cache only holds 8
> > > bursts.
> > > 50% is 4 bursts, and 25% is only 2 bursts.
> > >
> > > Using a replenish/drain level in the middle requires 5 bursts in
> > > either direction to pass the edge (and trigger replenish/flush).
> > > Using a replenish/drain level 25% from the edge requires only 3
> > > bursts in the wrong direction to pass the edge (and trigger
> > > replenish/flush).
> > > Much higher probability with random get/put.
> > >
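
As a quick sanity check of the burst counts above, here is a minimal sketch in standalone C (not DPDK code; both helper names are made up for illustration). It assumes every operation is a full burst in the same direction, starting from the given cache level.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of consecutive put bursts, starting from
 * `level` objects in a cache of `size`, until a put no longer fits.
 */
static unsigned int
bursts_to_overflow(unsigned int size, unsigned int level, unsigned int burst)
{
	assert(burst > 0 && level <= size);
	/* Puts fit while level + k * burst <= size; the next one overflows. */
	return (size - level) / burst + 1;
}

/*
 * Hypothetical helper: number of consecutive get bursts, starting from
 * `level` objects, until a get can no longer be served from the cache.
 */
static unsigned int
bursts_to_underflow(unsigned int level, unsigned int burst)
{
	assert(burst > 0);
	/* Gets are served while k * burst <= level; the next one underflows. */
	return level / burst + 1;
}
```

With size 512 and burst 64, a starting level of 256 gives 5 in both directions. A starting level of 384 (25% from the top) gives 3 for bursts_to_overflow() and 7 for bursts_to_underflow().
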
> > > >
> > > > Now, all said, I tend to agree that we want to leave space for a
> > > > decent size burst after a fill. That is why I think that filling
> > > > to 75% is reasonable. After an alloc that triggers a fill, I
> > > > don't want the cache less than 50% full, but not completely full
> > > > so there is room for a free without a flush, and similarly for a
> > > > free that triggers a flush, the cache should not be empty, but
> > > > also should not be more than half full.
> > > >
> > > > One suggestion - we could always add a simple tunable that
> > > > specifies the margin, or reserved entries for alloc and free. We
> > > > can then guide in the docs that the value should be e.g. "zero
> > > > for apps where alloc and free take place on different cores.
> > > > 20%-50% of cache is recommended where alloc and free take place
> > > > on the same core"
> > >
> > > Yes, a simple tunable is a really good idea.
> > >
> > > At this point, I think we should optimize for use case #1, and go
> > > for the 50% fill level.
> > > Then we can add a tunable to optimize for use case #2 later. I will
> > > try to come up with a draft for such a follow-up patch within the
> > > next few days.
> >
> > Adding a tunable is not so simple...
> > The choice of mempool cache algorithm (drain/replenish to 50% vs.
> > drain/replenish completely) should be passed via the "flags"
> > parameter in rte_mempool_create(), but rte_pktmbuf_pool_create() is
> > missing the "flags" parameter.
> > We can add it at the next ABI breaking release.
> > WDYT?
> >
> I don't want this to be just a binary flag with two settings; I think
> it should be an actual numeric value.

If it were plain and simple to support a numeric value, I'd do it.
But the two algorithms differ too much.
If needed, the flag can be used as an enum to support more algorithms in the future.

My WIP (not even build tested) shows the two algorithms, selected by the cache->flags field:

static __rte_always_inline void
rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
		unsigned int n, struct rte_mempool_cache *cache)
{
	void **cache_objs;

	/* No cache provided? */
	if (unlikely(cache == NULL))
		goto driver_enqueue;

	/* Increment stats now, adding to the mempool always succeeds. */
	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);

	__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
	__rte_assume(cache->len <= RTE_MEMPOOL_CACHE_MAX_SIZE);
	__rte_assume(cache->len <= cache->size);
	if (likely(cache->len + n <= cache->size)) {
		/* Sufficient room in the cache for the objects. */
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
		/* Add the objects to the cache. */
		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
		return;
	}

	/* Insufficient room in the cache for the objects. */
	if (cache->flags & RTE_MEMPOOL_F_CACHE_ACCESS_ONE_WAY) {
		/* The algorithm is optimized for put or get operations only. */

		/* The request itself exceeds the cache bounce buffer limit? */
		__rte_assume(cache->size <= RTE_MEMPOOL_CACHE_MAX_SIZE);
		if (n > cache->size)
			goto driver_enqueue_stats_incremented;

		/* Fill the cache completely by adding the first part of the objects. */
		cache_objs = &cache->objs[cache->len];
		rte_memcpy(cache_objs, obj_table,
				sizeof(void *) * (cache->size - cache->len));
		obj_table += cache->size - cache->len;

		/* Flush the entire cache to the backend. */
		rte_mempool_ops_enqueue_bulk(mp, &cache->objs[0], cache->size);

		/* Add the remaining objects to the cache. */
		cache->len = n - (cache->size - cache->len);
		rte_memcpy(&cache->objs[0], obj_table, sizeof(void *) * cache->len);
	} else {
		/* The algorithm is optimized for a balanced mix of put and get operations. */

		/* The request itself exceeds the cache bounce buffer limit? */
		__rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
		if (n > cache->size / 2)
			goto driver_enqueue_stats_incremented;

		/*
		 * Flush part of the cache to the backend to make room for the objects;
		 * flush (size / 2) objects from the bottom of the cache, where
		 * objects are less hot, and move down the remaining objects, which
		 * are more hot, from the upper half of the cache.
		 */
		__rte_assume(cache->len > cache->size / 2);
		rte_mempool_ops_enqueue_bulk(mp, &cache->objs[0], cache->size / 2);
		rte_memcpy(&cache->objs[0], &cache->objs[cache->size / 2],
				sizeof(void *) * (cache->len - cache->size / 2));
		cache_objs = &cache->objs[cache->len - cache->size / 2];
		cache->len = cache->len - cache->size / 2 + n;

		/* Add the objects to the cache. */
		rte_memcpy(cache_objs, obj_table, sizeof(void *) * n);
	}

	return;

driver_enqueue:

	/* Increment stats now, adding to the mempool always succeeds. */
	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);

driver_enqueue_stats_incremented:

	/* Push the objects to the backend. */
	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
}
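
To make the difference between the two overflow paths easier to see, here is a toy model in standalone C. It is only a sketch: it does not use any DPDK types, the cache holds ints instead of object pointers, the backend merely counts flushed objects, and every name in it (toy_cache, toy_put, TOY_CACHE_SIZE, ...) is made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_SIZE 8	/* hypothetical; stands in for cache->size */

struct toy_cache {
	unsigned int len;		/* objects currently in the cache */
	int objs[TOY_CACHE_SIZE];	/* ints stand in for object pointers */
	unsigned int flushed;		/* objects handed to the toy backend */
};

/* Stand-in for the backend enqueue: it only counts the objects. */
static void
toy_backend_enqueue(struct toy_cache *c, const int *objs, unsigned int n)
{
	(void)objs;
	c->flushed += n;
}

/* Put n objects; one_way != 0 selects the fill-and-flush-everything path. */
static void
toy_put(struct toy_cache *c, const int *obj_table, unsigned int n, int one_way)
{
	const unsigned int size = TOY_CACHE_SIZE;
	const unsigned int half = size / 2;

	if (c->len + n <= size) {
		/* Sufficient room: just append. */
		memcpy(&c->objs[c->len], obj_table, sizeof(int) * n);
		c->len += n;
		return;
	}

	/* Requests above the per-algorithm limit bypass the cache. */
	if (n > (one_way ? size : half)) {
		toy_backend_enqueue(c, obj_table, n);
		return;
	}

	if (one_way) {
		/* Top up the cache, flush all of it, keep the remainder. */
		unsigned int head = size - c->len;

		memcpy(&c->objs[c->len], obj_table, sizeof(int) * head);
		toy_backend_enqueue(c, c->objs, size);
		c->len = n - head;
		memcpy(c->objs, obj_table + head, sizeof(int) * c->len);
	} else {
		/* Flush the colder lower half, move the upper part down, append. */
		toy_backend_enqueue(c, c->objs, half);
		memmove(c->objs, &c->objs[half], sizeof(int) * (c->len - half));
		c->len -= half;
		memcpy(&c->objs[c->len], obj_table, sizeof(int) * n);
		c->len += n;
	}
}
```

Starting from 6 cached objects and putting 3 more, the one-way path flushes 8 objects and leaves 1 in the cache, while the balanced path flushes 4 objects and leaves 5 in the cache.
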
> Can we not use function versioning to add the
> new parameter to all functions needing it, without worrying about ABI
> breakage?
The public structures will also change, so I don't think so.
>
> /Bruce