* [PATCH net v1 1/2] page_pool: expose max page pool ring size
@ 2025-11-05 20:07 Mina Almasry
From: Mina Almasry @ 2025-11-05 20:07 UTC
To: netdev, linux-kernel
Cc: Mina Almasry, Joshua Washington, Harshitha Ramamurthy,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
Simon Horman, Willem de Bruijn
Expose this as a constant so we can reuse it in drivers.
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
include/net/page_pool/types.h | 2 ++
net/core/page_pool.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1509a536cb85..5edba3122b10 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -58,6 +58,8 @@ struct pp_alloc_cache {
netmem_ref cache[PP_ALLOC_CACHE_SIZE];
};
+#define PAGE_POOL_MAX_RING_SIZE 16384
+
/**
* struct page_pool_params - page pool parameters
* @fast: params accessed frequently on hotpath
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a5edec485f1..7b2808da294f 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -211,7 +211,7 @@ static int page_pool_init(struct page_pool *pool,
return -EINVAL;
if (pool->p.pool_size)
- ring_qsize = min(pool->p.pool_size, 16384);
+ ring_qsize = min(pool->p.pool_size, PAGE_POOL_MAX_RING_SIZE);
/* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
* DMA_BIDIRECTIONAL is for allowing page used for DMA sending,
base-commit: 327c20c21d80e0d87834b392d83ae73c955ad8ff
--
2.51.2.1026.g39e6a42477-goog
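To illustrate the intended reuse, here is a minimal sketch of a driver sizing its page_pool request against the new constant. The foo_* names are hypothetical and not gve code; the point is simply that a driver can now clamp its own request instead of hardcoding 16384.

```c
#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>

/* Hypothetical driver structures, for illustration only. */
struct foo_rx_ring {
	struct device *dev;
	u32 desc_count;
};

static struct page_pool *foo_create_page_pool(struct foo_rx_ring *rx)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		/* Ask for twice the descriptor count, but never more than
		 * the core will accept anyway.
		 */
		.pool_size	= min_t(u32, rx->desc_count * 2,
					PAGE_POOL_MAX_RING_SIZE),
		.nid		= NUMA_NO_NODE,
		.dev		= rx->dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}
```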

* [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-05 20:07 UTC
To: netdev, linux-kernel
Cc: Mina Almasry, Joshua Washington, Harshitha Ramamurthy,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Simon Horman, Willem de Bruijn, ziweixiao, Vedant Mathur

NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
current gve devmem tcp configuration.

Root causing showed that this particular workload results in a very
bursty pattern of devmem allocations and frees, exhausting the page_pool
ring buffer. This results in sock_devmem_dontneed taking up to 5ms to
free a batch of 128 netmems, because each free does not find an available
entry in the pp->ring and falls all the way back to the (slow) gen_pool.
Meanwhile, gve_alloc_buffer runs into bursts of successive allocations
which also don't find entries in the pp->ring (not dontneed'd yet,
presumably); each allocation takes up to 100us, slowing down the napi
poll loop.

From there, the slowness of the napi poll loop results, I suspect, in
the rx buffers not being processed in time, and in packet drops detected
by tcpdump. The sum of all this results in the workload running at
around 0.5 GB/s, when the expected perf is around 12 GB/s.

This entire behavior can be avoided by increasing the pp->ring size to
the max allowed 16384. This makes the pp able to handle the bursty
alloc/frees of this particular workload. AFAICT there should be no
negative side effect of arbitrarily increasing the pp->ring size in this
manner for ZC configs - the memory is preallocated and pinned by the
memory provider anyway.

Tested by running an AllToAll PXN=2 workload. Before:

Avg bus bandwidth    : 0.434191

After:

Avg bus bandwidth    : 12.5494

Note that there is more we can do to optimize this path, such as bulk
netmem dontneeds, bulk netmem pp refills, and possibly taking a page
from the io_uring zcrx playbook and replacing the gen_pool with a
simpler fixed-size array based allocator, but this seems sufficient to
fix these critical workloads.
With thanks to Willem and Eric for helping root cause this,

Cc: ziweixiao@google.com
Fixes: 62d7f40503bc ("gve: support unreadable netmem")
Reported-by: Vedant Mathur <vedantmathur@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 0e2b703c673a..f63ffdd3b3ba 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -8,6 +8,8 @@
 #include "gve.h"
 #include "gve_utils.h"

+#include "net/netdev_queues.h"
+
 int gve_buf_ref_cnt(struct gve_rx_buf_state_dqo *bs)
 {
 	return page_count(bs->page_info.page) - bs->page_info.pagecnt_bias;
@@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
 	if (priv->header_split_enabled) {
 		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
 		pp.queue_idx = rx->q_num;
+		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
+			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
 	}

 	return page_pool_create(&pp);
--
2.51.2.1026.g39e6a42477-goog
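For readers less familiar with the internals being discussed, here is a simplified, illustrative sketch of the free path the commit message describes; it is not the exact upstream code, and the helper name pp_put_netmem_sketch is invented. The point it shows: when the pp->ring is full, a freed netmem cannot be cached and is handed back to the memory provider, which for devmem TCP means a gen_pool free — the slow path this series avoids by enlarging the ring.

```c
#include <linux/ptr_ring.h>
#include <net/page_pool/types.h>
#include <net/page_pool/memory_provider.h>

/* Illustrative only: roughly what happens when a netmem is returned. */
static void pp_put_netmem_sketch(struct page_pool *pool, netmem_ref netmem)
{
	/* Fast path: stash the netmem in the pool's ptr_ring so a later
	 * allocation can reuse it without touching the provider.
	 */
	if (!ptr_ring_produce_bh(&pool->ring, (__force void *)netmem))
		return;

	/* Ring full: hand the netmem back to the memory provider. For
	 * devmem TCP this ends up freeing into a gen_pool, which is slow
	 * when it happens for every buffer in a burst.
	 */
	if (pool->mp_ops)
		pool->mp_ops->release_netmem(pool, netmem);
}
```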

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jesper Dangaard Brouer @ 2025-11-05 21:58 UTC

On 05/11/2025 21.07, Mina Almasry wrote:
> diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> index 0e2b703c673a..f63ffdd3b3ba 100644
> --- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> +++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> @@ -8,6 +8,8 @@
>  #include "gve.h"
>  #include "gve_utils.h"
>
> +#include "net/netdev_queues.h"
> +

Shouldn't this be with "<net/netdev_queues.h>" ?

And why include this and not net/page_pool/types.h that you just
modified in previous patch?

> int gve_buf_ref_cnt(struct gve_rx_buf_state_dqo *bs)
> {
> 	return page_count(bs->page_info.page) - bs->page_info.pagecnt_bias;
> @@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
> 	if (priv->header_split_enabled) {
> 		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
> 		pp.queue_idx = rx->q_num;
> +		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
> +			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
> }

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-05 22:44 UTC

On Wed, Nov 5, 2025 at 1:58 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> On 05/11/2025 21.07, Mina Almasry wrote:
> > diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> > index 0e2b703c673a..f63ffdd3b3ba 100644
> > --- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> > +++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> > @@ -8,6 +8,8 @@
> >  #include "gve.h"
> >  #include "gve_utils.h"
> >
> > +#include "net/netdev_queues.h"
> > +
>
> Shouldn't this be with "<net/netdev_queues.h>" ?

Yes, will do.

> And why include this and not net/page_pool/types.h that you just
> modified in previous patch?

netdev_queues.h is actually for netif_rxq_has_unreadable_mp. I did
indeed forget to explicitly include net/page_pool/types.h for the
PAGE_POOL_MAX_RING_SIZE. Will also do in v2.

--
Thanks,
Mina
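Based on the two fixes agreed above, the v2 include block would presumably end up looking like the following (a sketch, not the actual v2 patch):

```c
#include "gve.h"
#include "gve_utils.h"

#include <net/netdev_queues.h>		/* netif_rxq_has_unreadable_mp() */
#include <net/page_pool/types.h>	/* PAGE_POOL_MAX_RING_SIZE */
```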

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Harshitha Ramamurthy @ 2025-11-05 22:15 UTC

On Wed, Nov 5, 2025 at 12:08 PM Mina Almasry <almasrymina@google.com> wrote:
>
> NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> current gve devmem tcp configuration.
>
> [...]
>
> @@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
>  	if (priv->header_split_enabled) {
>  		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
>  		pp.queue_idx = rx->q_num;
> +		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
> +			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
>  	}

Would it make sense to also wrap setting of the
PP_FLAG_ALLOW_UNREADABLE_NETMEM under the
netif_rxq_has_unreadable_mp() helper?

>
>  	return page_pool_create(&pp);
> --
> 2.51.2.1026.g39e6a42477-goog
>

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-05 22:46 UTC

On Wed, Nov 5, 2025 at 2:15 PM Harshitha Ramamurthy <hramamurthy@google.com> wrote:
> On Wed, Nov 5, 2025 at 12:08 PM Mina Almasry <almasrymina@google.com> wrote:
> >
> > [...]
> >
> > @@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
> >  	if (priv->header_split_enabled) {
> >  		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
> >  		pp.queue_idx = rx->q_num;
> > +		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
> > +			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
> >  	}
>
> Would it make sense to also wrap setting of the
> PP_FLAG_ALLOW_UNREADABLE_NETMEM under the
> netif_rxq_has_unreadable_mp() helper?

Eh, maybe. PP_FLAG_ALLOW_UNREADABLE_NETMEM is supposed to say 'the
driver supports unreadable netmem, core can enable unreadable netmem
if it wants'. The hope was that the driver would never need to
understand whether it's actually configured for readable or
unreadable. But then we found that for reasons like this, the driver
sometimes wants to know if it's about to be configured for unreadable,
hence netif_rxq_has_unreadable_mp was added.

I would say this bit is correct as written, but let me know if you
disagree.

--
Thanks,
Mina

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-06 1:11 UTC

On Wed, 5 Nov 2025 20:07:58 +0000 Mina Almasry wrote:
> NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> current gve devmem tcp configuration.

Hardcoding the ring size because some other attribute makes you think
that a specific application is running is rather unclean IMO..

Do you want me to respin the per-ring config series? Or you can take it over.
IDK where the buffer size config is after recent discussion but IIUC
it will not drag in my config infra so it shouldn't conflict.

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-06 1:56 UTC

On Wed, Nov 5, 2025 at 5:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 5 Nov 2025 20:07:58 +0000 Mina Almasry wrote:
> > NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> > current gve devmem tcp configuration.
>
> Hardcoding the ring size because some other attribute makes you think
> that a specific application is running is rather unclean IMO..

I did not see it this way tbh. I am thinking for devmem tcp to be as
robust as possible to the burstiness of frag frees, we need a bit of a
generous ring size. The specific application I'm referring to is just
an example of how this could happen.

I was thinking maybe binding->dma_buf->size / net_iov_size (so that
the ring is large enough to hold every single netmem if need be) would
be the upper bound, but in practice increasing to the current max
allowed was good enough, so I'm trying that.

> Do you want me to respin the per-ring config series? Or you can take it over.
> IDK where the buffer size config is after recent discussion but IIUC
> it will not drag in my config infra so it shouldn't conflict.

You mean this one? "[RFC net-next 00/22] net: per-queue rx-buf-len
configuration"

I don't see the connection between rx-buf-len and the ring size,
unless you're thinking about some netlink-configurable way to
configure the pp->ring size?

I am hoping for something backportable with fixes to make this class
of workloads usable.

--
Thanks,
Mina
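To make the "binding->dma_buf->size / net_iov_size" idea above concrete, here is a hypothetical helper showing that upper bound; the function name is invented, the dma-buf size is taken as a plain parameter, and PAGE_SIZE is assumed as the per-net_iov size.

```c
/* Hypothetical: size the ring so it could hold every net_iov backing
 * the bound dma-buf, while still respecting the page_pool maximum.
 */
static u32 devmem_pool_size_upper_bound(size_t dmabuf_size)
{
	return min_t(u32, dmabuf_size / PAGE_SIZE, PAGE_POOL_MAX_RING_SIZE);
}
```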

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-06 2:22 UTC

On Wed, 5 Nov 2025 17:56:10 -0800 Mina Almasry wrote:
> On Wed, Nov 5, 2025 at 5:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 5 Nov 2025 20:07:58 +0000 Mina Almasry wrote:
> > > NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> > > current gve devmem tcp configuration.
> >
> > Hardcoding the ring size because some other attribute makes you think
> > that a specific application is running is rather unclean IMO..
>
> I did not see it this way tbh. I am thinking for devmem tcp to be as
> robust as possible to the burstiness of frag frees, we need a bit of a
> generous ring size. The specific application I'm referring to is just
> an example of how this could happen.
>
> I was thinking maybe binding->dma_buf->size / net_iov_size (so that
> the ring is large enough to hold every single netmem if need be) would
> be the upper bound, but in practice increasing to the current max
> allowed was good enough, so I'm trying that.

Increasing cache sizes to the max seems very hacky at best.
The underlying implementation uses genpool and doesn't even
bother to do batching.

> > Do you want me to respin the per-ring config series? Or you can take it over.
> > IDK where the buffer size config is after recent discussion but IIUC
> > it will not drag in my config infra so it shouldn't conflict.
>
> You mean this one? "[RFC net-next 00/22] net: per-queue rx-buf-len
> configuration"
>
> I don't see the connection between rx-buf-len and the ring size,
> unless you're thinking about some netlink-configurable way to
> configure the pp->ring size?

The latter. We usually have the opposite problem - drivers configure
the cache way too large for any practical production needs and waste
memory.

> I am hoping for something backportable with fixes to make this class
> of workloads usable.

Oh, let's be clear, no way this is getting a fixes tag :/

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-06 2:56 UTC

On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 5 Nov 2025 17:56:10 -0800 Mina Almasry wrote:
> > [...]
> > I was thinking maybe binding->dma_buf->size / net_iov_size (so that
> > the ring is large enough to hold every single netmem if need be) would
> > be the upper bound, but in practice increasing to the current max
> > allowed was good enough, so I'm trying that.
>
> Increasing cache sizes to the max seems very hacky at best.
> The underlying implementation uses genpool and doesn't even
> bother to do batching.

OK, my bad. I tried to think through downsides of arbitrarily
increasing the ring size in a ZC scenario where the underlying memory
is pre-pinned and allocated anyway, and I couldn't think of any, but I
won't argue the point any further.

> > > Do you want me to respin the per-ring config series? Or you can take it over.
> > > IDK where the buffer size config is after recent discussion but IIUC
> > > it will not drag in my config infra so it shouldn't conflict.
> >
> > You mean this one? "[RFC net-next 00/22] net: per-queue rx-buf-len
> > configuration"
> >
> > I don't see the connection between rx-buf-len and the ring size,
> > unless you're thinking about some netlink-configurable way to
> > configure the pp->ring size?
>
> The latter. We usually have the opposite problem - drivers configure
> the cache way too large for any practical production needs and waste
> memory.

Sounds good, does this sound like roughly what we're looking for here?
I'm thinking configuring pp->ring size could be simpler than
rx-buf-len because it's really all used by core, so maybe not
something we need to bubble all the way down to the driver, so
something like:

- We add a new field, netdev_rx_queue[->mp_params?]->pp_ring_size.
- We add a netlink api to configure the above.
- When a pp is being created, we check
  netdev_rx_queue[->mp_params]->pp_ring_size, if it's set, then it
  overrides the driver-provided value.

Does that make sense? I don't immediately see why the driver needs to
be told the pp_ring_size via the queue API, as it's really needed by
the pp anyway, no? Or am I missing something?

--
Thanks,
Mina
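A hypothetical sketch of the third bullet in the proposal above, i.e. letting a user-configured per-queue value override the driver-provided pool_size when the page_pool is created. The pp_ring_size field is invented for illustration (it does not exist today); the queue lookup mirrors how page_pool_init() already finds the rx queue for memory-provider setup.

```c
#include <net/netdev_rx_queue.h>
#include <net/page_pool/types.h>

/* Hypothetical: resolve the effective ring size for an MP-backed pool. */
static u32 pp_ring_qsize_sketch(struct page_pool *pool, u32 driver_qsize)
{
	struct netdev_rx_queue *rxq;

	if (!pool->slow.netdev ||
	    pool->slow.queue_idx >= pool->slow.netdev->real_num_rx_queues)
		return driver_qsize;

	rxq = __netif_get_rx_queue(pool->slow.netdev, pool->slow.queue_idx);
	if (rxq->mp_params.pp_ring_size)	/* invented field */
		return min_t(u32, rxq->mp_params.pp_ring_size,
			     PAGE_POOL_MAX_RING_SIZE);

	return driver_qsize;
}
```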

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Dragos Tatulea @ 2025-11-06 17:25 UTC

On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:
> On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > [...]
> > Increasing cache sizes to the max seems very hacky at best.
> > The underlying implementation uses genpool and doesn't even
> > bother to do batching.
>
> OK, my bad. I tried to think through downsides of arbitrarily
> increasing the ring size in a ZC scenario where the underlying memory
> is pre-pinned and allocated anyway, and I couldn't think of any, but I
> won't argue the point any further.
>
I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
size there are ~1% allocation errors during a simple zcrx test.

mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
that size (16K * 4K). Increasing the buffer doesn't help because the
pool size is still what the driver asked for (+ also the internal pool
limit). Even worse: eventually ENOSPC is returned to the application.
But maybe this error has a different fix.

Adapting the pool size to the io_uring buffer size works very well. The
allocation errors are gone and performance is improved.

AFAIU, a page_pool with underlying pre-allocated memory is not really a
cache. So it is useful to be able to adapt to the capacity reserved by
the application.

Maybe one could argue that the zcrx example from liburing could also be
improved. But one thing is sure: aligning the buffer size to the
page_pool size calculated by the driver based on ring size and MTU
is a hassle. If the application provides a large enough buffer, things
should "just work".

> > [...]
> >
> > The latter. We usually have the opposite problem - drivers configure
> > the cache way too large for any practical production needs and waste
> > memory.
>
> Sounds good, does this sound like roughly what we're looking for here?
> I'm thinking configuring pp->ring size could be simpler than
> rx-buf-len because it's really all used by core, so maybe not
> something we need to bubble all the way down to the driver, so
> something like:
>
> - We add a new field, netdev_rx_queue[->mp_params?]->pp_ring_size.
> - We add a netlink api to configure the above.
> - When a pp is being created, we check
>   netdev_rx_queue[->mp_params]->pp_ring_size, if it's set, then it
>   overrides the driver-provided value.
>
And you would do the last item in page_pool_init() when mp_ops and
pp_ring_size is set?

> Does that make sense?
It does to me.

I would add that for this case the page_pool limit should not apply at
all anymore. Maybe you thought about this but I didn't see it mentioned.

> I don't immediately see why the driver needs to
> be told the pp_ring_size via the queue API, as it's really needed by
> the pp anyway, no? Or am I missing something?
It doesn't. The only corner case to take care of is when the pool_size
is below what the driver asked for.

Thanks,
Dragos

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-07 1:18 UTC

On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:
> > [...]
>
> I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> size there are ~1% allocation errors during a simple zcrx test.
>
> mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
> that size (16K * 4K). Increasing the buffer doesn't help because the
> pool size is still what the driver asked for (+ also the
> internal pool limit). Even worse: eventually ENOSPC is returned to the
> application. But maybe this error has a different fix.

Hm, yes, did you trace it all the way to where it comes from?
page pool itself does not have any ENOSPC AFAICT. If the cache
is full we free the page back to the provider via .release_netmem

> Adapting the pool size to the io_uring buffer size works very well. The
> allocation errors are gone and performance is improved.
>
> AFAIU, a page_pool with underlying pre-allocated memory is not really a
> cache. So it is useful to be able to adapt to the capacity reserved by
> the application.
>
> Maybe one could argue that the zcrx example from liburing could also be
> improved. But one thing is sure: aligning the buffer size to the
> page_pool size calculated by the driver based on ring size and MTU
> is a hassle. If the application provides a large enough buffer, things
> should "just work".

Yes, there should be no ENOSPC. I think io_uring is more thorough
in handling the corner cases so what you're describing is more of
a concern..

Keep in mind that we expect multiple page pools from one provider.
We want the pages to flow back to the MP level so other PPs can grab
them.

> > Sounds good, does this sound like roughly what we're looking for here?
> > I'm thinking configuring pp->ring size could be simpler than
> > rx-buf-len because it's really all used by core, so maybe not
> > something we need to bubble all the way down to the driver, so
> > something like:
> >
> > - We add a new field, netdev_rx_queue[->mp_params?]->pp_ring_size.

My series was extending dev->cfg / dev->cfg_pending to also have
per-queue params. So the user config goes there, not in the queue
struct.

> > - We add a netlink api to configure the above.
> > - When a pp is being created, we check
> >   netdev_rx_queue[->mp_params]->pp_ring_size, if it's set, then it
> >   overrides the driver-provided value.
>
> And you would do the last item in page_pool_init() when mp_ops and
> pp_ring_size is set?

Yes.

> > Does that make sense?
> It does to me.
>
> I would add that for this case the page_pool limit should not apply at
> all anymore. Maybe you thought about this but I didn't see it mentioned.
>
> > I don't immediately see why the driver needs to
> > be told the pp_ring_size via the queue API, as it's really needed by
> > the pp anyway, no? Or am I missing something?
> It doesn't. The only corner case to take care of is when the pool_size
> is below what the driver asked for.

Hm, I can't think why driver would care. Of course if someone sets the
cache size to a tiny value and enables IOMMU=strict they shouldn't
expect great performance.. If a driver appears that has a min I think
we can plumb thru the checking later.

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Dragos Tatulea @ 2025-11-07 13:35 UTC

On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > [...]
> > mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
> > that size (16K * 4K). Increasing the buffer doesn't help because the
> > pool size is still what the driver asked for (+ also the
> > internal pool limit). Even worse: eventually ENOSPC is returned to the
> > application. But maybe this error has a different fix.
>
> Hm, yes, did you trace it all the way to where it comes from?
> page pool itself does not have any ENOSPC AFAICT. If the cache
> is full we free the page back to the provider via .release_netmem
>
Yes I did. It happens in io_cqe_cache_refill() when there are no more
CQEs:
https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775

Looking at the code in zcrx I see that the amount of RQ entries and CQ
entries is 4K, which matches the device ring size, but doesn't match the
amount of pages available in the buffer:
https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176

Doubling the CQs (or both RQ and CQ size) makes the ENOSPC go away.

> > Adapting the pool size to the io_uring buffer size works very well. The
> > allocation errors are gone and performance is improved.
> > [...]
>
> Yes, there should be no ENOSPC. I think io_uring is more thorough
> in handling the corner cases so what you're describing is more of
> a concern..
>
Is this error something that io_uring should fix or is this similar to
EAGAIN where the application has to retry?

> Keep in mind that we expect multiple page pools from one provider.
> We want the pages to flow back to the MP level so other PPs can grab
> them.
>
Oh, right, I forgot... And this can happen now only for devmem though,
right?

Still, this is an additional reason to give more control to the MP
over the page_pool config, right?

[...]

Thanks,
Dragos

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-08 2:04 UTC
To: Dragos Tatulea
Cc: Mina Almasry, netdev, linux-kernel, Joshua Washington,
	Harshitha Ramamurthy, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Simon Horman, Willem de Bruijn, ziweixiao, Vedant Mathur,
	io-uring, David Wei

On Fri, 7 Nov 2025 13:35:44 +0000 Dragos Tatulea wrote:
> On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> > On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > > I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> > > size there are ~1% allocation errors during a simple zcrx test.
> > > [...]
> >
> > Hm, yes, did you trace it all the way to where it comes from?
> > page pool itself does not have any ENOSPC AFAICT. If the cache
> > is full we free the page back to the provider via .release_netmem
>
> Yes I did. It happens in io_cqe_cache_refill() when there are no more
> CQEs:
> https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775
>
> Looking at the code in zcrx I see that the amount of RQ entries and CQ
> entries is 4K, which matches the device ring size, but doesn't match the
> amount of pages available in the buffer:
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176
>
> Doubling the CQs (or both RQ and CQ size) makes the ENOSPC go away.
>
> > Yes, there should be no ENOSPC. I think io_uring is more thorough
> > in handling the corner cases so what you're describing is more of
> > a concern..
>
> Is this error something that io_uring should fix or is this similar to
> EAGAIN where the application has to retry?

Not sure.. let me CC them.

> > Keep in mind that we expect multiple page pools from one provider.
> > We want the pages to flow back to the MP level so other PPs can grab
> > them.
>
> Oh, right, I forgot... And this can happen now only for devmem though,
> right?

Right, tho I think David is also working on some queue sharing?

> Still, this is an additional reason to give more control to the MP
> over the page_pool config, right?

This one I'm really not sure needs to be exposed via MP vs just
netdev-nl. But yes, I'd imagine the driver default may be sub-optimal
in either direction so giving user control over the sizing would be
good.

* Re: [PATCH net v1 1/2] page_pool: expose max page pool ring size
From: Jesper Dangaard Brouer @ 2025-11-05 21:56 UTC

On 05/11/2025 21.07, Mina Almasry wrote:
> Expose this as a constant so we can reuse it in drivers.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
>  include/net/page_pool/types.h | 2 ++
>  net/core/page_pool.c          | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)

Looks good to me

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>

> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 1509a536cb85..5edba3122b10 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -58,6 +58,8 @@ struct pp_alloc_cache {
>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>  };
>
> +#define PAGE_POOL_MAX_RING_SIZE 16384
> +

IIRC this was recently reduced to 16384 (from 32K), do you have a
use-case for higher limits?

> [...]

* Re: [PATCH net v1 1/2] page_pool: expose max page pool ring size
From: Mina Almasry @ 2025-11-05 22:56 UTC

On Wed, Nov 5, 2025 at 1:56 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> > index 1509a536cb85..5edba3122b10 100644
> > --- a/include/net/page_pool/types.h
> > +++ b/include/net/page_pool/types.h
> > @@ -58,6 +58,8 @@ struct pp_alloc_cache {
> >  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
> >  };
> >
> > +#define PAGE_POOL_MAX_RING_SIZE 16384
> > +
>
> IIRC this was recently reduced to 16384 (from 32K), do you have a
> use-case for higher limits?

Yes, I noticed that. Increasing to 16384 resolved my issue, so I did
not feel the need to revert back to 32K.

We're expanding testing more and more, and I may run into another issue
that asks that I re-increase the limit (maybe with a check that it's
indeed a ZC configuration, so no extra memory will be pinned), but for
now I did not find a justification for that.

--
Thanks,
Mina

* Re: [PATCH net v1 1/2] page_pool: expose max page pool ring size
From: Ilias Apalodimas @ 2025-11-06 13:12 UTC

On Wed, 5 Nov 2025 at 22:08, Mina Almasry <almasrymina@google.com> wrote:
>
> Expose this as a constant so we can reuse it in drivers.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
> [...]
>
> base-commit: 327c20c21d80e0d87834b392d83ae73c955ad8ff
> --
> 2.51.2.1026.g39e6a42477-goog
>

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>