public inbox for netdev@vger.kernel.org
* [RFC net-next] net: page_pool: cap alloc cache size and refill by pool ring size
@ 2026-02-23  9:24 Nimrod Oren
  2026-02-24 23:39 ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Nimrod Oren @ 2026-02-23  9:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel, Gal Pressman, Nimrod Oren, Dragos Tatulea,
	Tariq Toukan

Hi all,

The current page_pool alloc cache constants were chosen to match the NAPI
budget and to leave headroom for XDP_DROP recycling, hence the defaults
PP_ALLOC_CACHE_REFILL (64) and PP_ALLOC_CACHE_SIZE (128).

This logic implicitly assumes a reasonably large backing ring. However, on
systems with 64K page size, these values may exceed the number of pages
actually managed by a pool instance. In practice this means we can
bulk-allocate or cache significantly more pages than a given pool can ever
meaningfully use. This becomes particularly problematic when scaling to
many interfaces/channels, where the total amount of memory tied up in
per-pool alloc caches becomes significant.

I'm proposing to cap the alloc cache size and refill values by the pool
ring size, while preserving the existing behavior as much as possible.

The implementation I have right now (Option A) is:

pool->alloc.refill = min_t(unsigned int, PP_ALLOC_CACHE_REFILL, ring_qsize);
pool->alloc.size   = pool->alloc.refill * 2;

This keeps the existing relationship "cache size = 2 x refill" and ensures
that refill never exceeds ring_qsize.

I am also considering a couple of alternatives and would like feedback on
which shape makes most sense:

Option B:

pool->alloc.size   = min_t(unsigned int, PP_ALLOC_CACHE_SIZE, ring_qsize);
pool->alloc.refill = pool->alloc.size / 2;

Option C:

pool->alloc.size   = min_t(unsigned int, PP_ALLOC_CACHE_SIZE, ring_qsize);
pool->alloc.refill = min_t(unsigned int, PP_ALLOC_CACHE_REFILL, ring_qsize);

Option A keeps refill as the primary parameter and derives size from it,
preserving the current "refill == NAPI budget" motivation as long as the
ring is large enough. Options B and C instead cap size directly by
ring_qsize and then either derive refill from size (B) or cap both
independently (C).

Looking forward, it might be useful to allow drivers to configure these
values explicitly, so they can tune the cache and refill based on their
specific use case and hardware characteristics. Even if such an option is
added later, the logic above would still define the default behavior.

I'd appreciate feedback on:
* Whether this per-pool cache capping approach makes sense
* If so, which option is preferable
* Any alternative suggestions to better cap/scale the page_pool cache
  parameters for large pages

Thanks,
Nimrod Oren

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Nimrod Oren <noren@nvidia.com>
---
 include/net/page_pool/types.h |  2 ++
 net/core/page_pool.c          | 10 +++++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 0d453484a585..521d0ca587dd 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -55,6 +55,8 @@
 #define PP_ALLOC_CACHE_REFILL	64
 struct pp_alloc_cache {
 	u32 count;
+	u8 refill;
+	u8 size;
 	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
 };
 
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 265a729431bb..07474ff201d5 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -213,6 +213,10 @@ static int page_pool_init(struct page_pool *pool,
 	if (pool->p.pool_size)
 		ring_qsize = min(pool->p.pool_size, 16384);
 
+	pool->alloc.refill = min_t(unsigned int, PP_ALLOC_CACHE_REFILL,
+				   ring_qsize);
+	pool->alloc.size = pool->alloc.refill * 2;
+
 	/* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
 	 * DMA_BIDIRECTIONAL is for allowing page used for DMA sending,
 	 * which is the XDP_TX use-case.
@@ -416,7 +420,7 @@ static noinline netmem_ref page_pool_refill_alloc_cache(struct page_pool *pool)
 			netmem = 0;
 			break;
 		}
-	} while (pool->alloc.count < PP_ALLOC_CACHE_REFILL);
+	} while (pool->alloc.count < pool->alloc.refill);
 
 	/* Return last page */
 	if (likely(pool->alloc.count > 0)) {
@@ -590,7 +594,7 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 static noinline netmem_ref __page_pool_alloc_netmems_slow(struct page_pool *pool,
 							  gfp_t gfp)
 {
-	const int bulk = PP_ALLOC_CACHE_REFILL;
+	const int bulk = pool->alloc.refill;
 	unsigned int pp_order = pool->p.order;
 	bool dma_map = pool->dma_map;
 	netmem_ref netmem;
@@ -799,7 +803,7 @@ static bool page_pool_recycle_in_ring(struct page_pool *pool, netmem_ref netmem)
 static bool page_pool_recycle_in_cache(netmem_ref netmem,
 				       struct page_pool *pool)
 {
-	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE)) {
+	if (unlikely(pool->alloc.count == pool->alloc.size)) {
 		recycle_stat_inc(pool, cache_full);
 		return false;
 	}
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC net-next] net: page_pool: cap alloc cache size and refill by pool ring size
  2026-02-23  9:24 [RFC net-next] net: page_pool: cap alloc cache size and refill by pool ring size Nimrod Oren
@ 2026-02-24 23:39 ` Jakub Kicinski
  2026-02-26 15:28   ` Nimrod Oren
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2026-02-24 23:39 UTC (permalink / raw)
  To: Nimrod Oren
  Cc: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Gal Pressman, Dragos Tatulea, Tariq Toukan

On Mon, 23 Feb 2026 11:24:10 +0200 Nimrod Oren wrote:
> I'd appreciate feedback on:
> * Whether this per-pool cache capping approach makes sense
> * If so, which option is preferable
> * Any alternative suggestions to better cap/scale the page_pool cache
>   parameters for large pages

I'd simply change the defines based on PAGE_SIZE. The allocation batch
size has nothing to do with the ring size, it's just amortizing
allocations within a single NAPI cycle.

#if PAGE_SIZE >= SZ_64K
#define PP_ALLOC_CACHE_REFILL	16
#elif PAGE_SIZE >= SZ_16K
#define PP_ALLOC_CACHE_REFILL	32
#else
#define PP_ALLOC_CACHE_REFILL	64
#endif

#define PP_ALLOC_CACHE_SIZE	(PP_ALLOC_CACHE_REFILL * 2)


* Re: [RFC net-next] net: page_pool: cap alloc cache size and refill by pool ring size
  2026-02-24 23:39 ` Jakub Kicinski
@ 2026-02-26 15:28   ` Nimrod Oren
  2026-02-26 16:56     ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Nimrod Oren @ 2026-02-26 15:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Gal Pressman, Dragos Tatulea, Tariq Toukan

On 25/02/2026 1:39, Jakub Kicinski wrote:
> I'd simply change the defines based on PAGE_SIZE. The allocation batch
> size has nothing to do with the ring size, it's just amortizing
> allocations within a single NAPI cycle.

Thanks, that sounds good to me.
Do you think a formula like this would work?

#define PP_ALLOC_CACHE_REFILL ((64 * SZ_4K) / PAGE_SIZE)
#define PP_ALLOC_CACHE_SIZE   (PP_ALLOC_CACHE_REFILL * 2)

It keeps both values constant in bytes across page sizes, ensuring a
consistent memory footprint for pool alloc-caches regardless of a
system's page size.


* Re: [RFC net-next] net: page_pool: cap alloc cache size and refill by pool ring size
  2026-02-26 15:28   ` Nimrod Oren
@ 2026-02-26 16:56     ` Jakub Kicinski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2026-02-26 16:56 UTC (permalink / raw)
  To: Nimrod Oren
  Cc: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, netdev, linux-kernel,
	Gal Pressman, Dragos Tatulea, Tariq Toukan

On Thu, 26 Feb 2026 17:28:55 +0200 Nimrod Oren wrote:
> On 25/02/2026 1:39, Jakub Kicinski wrote:
> > I'd simply change the defines based on PAGE_SIZE. The allocation batch
> > size has nothing to do with the ring size, it's just amortizing
> > allocations within a single NAPI cycle.  
> 
> Thanks, that sounds good to me.
> Do you think a formula like this would work?
> 
> #define PP_ALLOC_CACHE_REFILL ((64 * SZ_4K) / PAGE_SIZE)
> #define PP_ALLOC_CACHE_SIZE   (PP_ALLOC_CACHE_REFILL * 2)
> 
> It keeps both values constant in bytes across page sizes, ensuring a
> consistent memory footprint for pool alloc-caches regardless of a
> system's page size.

I like the clarity / obviousness of the if/elif/else list. 
No need to think what the result would be for a given page size. 
But up to you.

