* [PATCH net v1 1/2] page_pool: expose max page pool ring size
@ 2025-11-05 20:07 Mina Almasry
From: Mina Almasry @ 2025-11-05 20:07 UTC
To: netdev, linux-kernel
Cc: Mina Almasry, Joshua Washington, Harshitha Ramamurthy,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
Simon Horman, Willem de Bruijn
Expose this as a constant so we can reuse it in drivers.
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
include/net/page_pool/types.h | 2 ++
net/core/page_pool.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1509a536cb85..5edba3122b10 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -58,6 +58,8 @@ struct pp_alloc_cache {
netmem_ref cache[PP_ALLOC_CACHE_SIZE];
};
+#define PAGE_POOL_MAX_RING_SIZE 16384
+
/**
* struct page_pool_params - page pool parameters
* @fast: params accessed frequently on hotpath
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1a5edec485f1..7b2808da294f 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -211,7 +211,7 @@ static int page_pool_init(struct page_pool *pool,
return -EINVAL;
if (pool->p.pool_size)
- ring_qsize = min(pool->p.pool_size, 16384);
+ ring_qsize = min(pool->p.pool_size, PAGE_POOL_MAX_RING_SIZE);
/* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
* DMA_BIDIRECTIONAL is for allowing page used for DMA sending,
base-commit: 327c20c21d80e0d87834b392d83ae73c955ad8ff
--
2.51.2.1026.g39e6a42477-goog
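To illustrate the intended reuse, here is a minimal sketch of a driver sizing its page_pool request against the new constant. The foo_* names are hypothetical and not gve code; the point is simply that a driver can now clamp its own request instead of hardcoding 16384.

```c
#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>

/* Hypothetical driver structures, for illustration only. */
struct foo_rx_ring {
	struct device *dev;
	u32 desc_count;
};

static struct page_pool *foo_create_page_pool(struct foo_rx_ring *rx)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		/* Ask for twice the descriptor count, but never more than
		 * the core will accept anyway.
		 */
		.pool_size	= min_t(u32, rx->desc_count * 2,
					PAGE_POOL_MAX_RING_SIZE),
		.nid		= NUMA_NO_NODE,
		.dev		= rx->dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}
```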

* [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-05 20:07 UTC
To: netdev, linux-kernel
Cc: Mina Almasry, Joshua Washington, Harshitha Ramamurthy,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Simon Horman, Willem de Bruijn, ziweixiao, Vedant Mathur

NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
current gve devmem tcp configuration.

Root causing showed that this particular workload results in a very
bursty pattern of devmem allocations and frees, exhausting the page_pool
ring buffer. This results in sock_devmem_dontneed taking up to 5ms to
free a batch of 128 netmems, because each free does not find an available
entry in the pp->ring and falls all the way back to the (slow) gen_pool.
Meanwhile, gve_alloc_buffer runs into bursts of successive allocations
which also don't find entries in the pp->ring (not dontneed'd yet,
presumably); each allocation takes up to 100us, slowing down the napi
poll loop.

From there, the slowness of the napi poll loop results, I suspect, in
the rx buffers not being processed in time, and in packet drops detected
by tcpdump. The sum of all this results in the workload running at
around 0.5 GB/s, when the expected perf is around 12 GB/s.

This entire behavior can be avoided by increasing the pp->ring size to
the max allowed 16384. This makes the pp able to handle the bursty
alloc/frees of this particular workload. AFAICT there should be no
negative side effect of arbitrarily increasing the pp->ring size in this
manner for ZC configs - the memory is preallocated and pinned by the
memory provider anyway.

Tested by running an AllToAll PXN=2 workload. Before:

Avg bus bandwidth    : 0.434191

After:

Avg bus bandwidth    : 12.5494

Note that there is more we can do to optimize this path, such as bulk
netmem dontneeds, bulk netmem pp refills, and possibly taking a page
from the io_uring zcrx playbook and replacing the gen_pool with a
simpler fixed-size array based allocator, but this seems sufficient to
fix these critical workloads.
With thanks to Willem and Eric for helping root cause this,

Cc: ziweixiao@google.com
Fixes: 62d7f40503bc ("gve: support unreadable netmem")
Reported-by: Vedant Mathur <vedantmathur@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
 drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 0e2b703c673a..f63ffdd3b3ba 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -8,6 +8,8 @@
 #include "gve.h"
 #include "gve_utils.h"

+#include "net/netdev_queues.h"
+
 int gve_buf_ref_cnt(struct gve_rx_buf_state_dqo *bs)
 {
 	return page_count(bs->page_info.page) - bs->page_info.pagecnt_bias;
@@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
 	if (priv->header_split_enabled) {
 		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
 		pp.queue_idx = rx->q_num;
+		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
+			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
 	}

 	return page_pool_create(&pp);
--
2.51.2.1026.g39e6a42477-goog
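For readers less familiar with the internals being discussed, here is a simplified, illustrative sketch of the free path the commit message describes; it is not the exact upstream code, and the helper name pp_put_netmem_sketch is invented. The point it shows: when the pp->ring is full, a freed netmem cannot be cached and is handed back to the memory provider, which for devmem TCP means a gen_pool free — the slow path this series avoids by enlarging the ring.

```c
#include <linux/ptr_ring.h>
#include <net/page_pool/types.h>
#include <net/page_pool/memory_provider.h>

/* Illustrative only: roughly what happens when a netmem is returned. */
static void pp_put_netmem_sketch(struct page_pool *pool, netmem_ref netmem)
{
	/* Fast path: stash the netmem in the pool's ptr_ring so a later
	 * allocation can reuse it without touching the provider.
	 */
	if (!ptr_ring_produce_bh(&pool->ring, (__force void *)netmem))
		return;

	/* Ring full: hand the netmem back to the memory provider. For
	 * devmem TCP this ends up freeing into a gen_pool, which is slow
	 * when it happens for every buffer in a burst.
	 */
	if (pool->mp_ops)
		pool->mp_ops->release_netmem(pool, netmem);
}
```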

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jesper Dangaard Brouer @ 2025-11-05 21:58 UTC

On 05/11/2025 21.07, Mina Almasry wrote:
> diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> index 0e2b703c673a..f63ffdd3b3ba 100644
> --- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> +++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> @@ -8,6 +8,8 @@
>  #include "gve.h"
>  #include "gve_utils.h"
>
> +#include "net/netdev_queues.h"
> +

Shouldn't this be with "<net/netdev_queues.h>" ?

And why include this and not net/page_pool/types.h that you just
modified in previous patch?

> int gve_buf_ref_cnt(struct gve_rx_buf_state_dqo *bs)
> {
> 	return page_count(bs->page_info.page) - bs->page_info.pagecnt_bias;
> @@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
> 	if (priv->header_split_enabled) {
> 		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
> 		pp.queue_idx = rx->q_num;
> +		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
> +			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
> }

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-05 22:44 UTC

On Wed, Nov 5, 2025 at 1:58 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> On 05/11/2025 21.07, Mina Almasry wrote:
> > diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> > index 0e2b703c673a..f63ffdd3b3ba 100644
> > --- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> > +++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
> > @@ -8,6 +8,8 @@
> >  #include "gve.h"
> >  #include "gve_utils.h"
> >
> > +#include "net/netdev_queues.h"
> > +
>
> Shouldn't this be with "<net/netdev_queues.h>" ?

Yes, will do.

> And why include this and not net/page_pool/types.h that you just
> modified in previous patch?

netdev_queues.h is actually for netif_rxq_has_unreadable_mp. I did
indeed forget to explicitly include net/page_pool/types.h for the
PAGE_POOL_MAX_RING_SIZE. Will also do in v2.

--
Thanks,
Mina
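Based on the two fixes agreed above, the v2 include block would presumably end up looking like the following (a sketch, not the actual v2 patch):

```c
#include "gve.h"
#include "gve_utils.h"

#include <net/netdev_queues.h>		/* netif_rxq_has_unreadable_mp() */
#include <net/page_pool/types.h>	/* PAGE_POOL_MAX_RING_SIZE */
```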

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Harshitha Ramamurthy @ 2025-11-05 22:15 UTC

On Wed, Nov 5, 2025 at 12:08 PM Mina Almasry <almasrymina@google.com> wrote:
>
> NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> current gve devmem tcp configuration.
>
> [...]
>
> @@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
>  	if (priv->header_split_enabled) {
>  		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
>  		pp.queue_idx = rx->q_num;
> +		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
> +			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
>  	}

Would it make sense to also wrap setting of the
PP_FLAG_ALLOW_UNREADABLE_NETMEM under the
netif_rxq_has_unreadable_mp() helper?

>
>  	return page_pool_create(&pp);
> --
> 2.51.2.1026.g39e6a42477-goog
>

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-05 22:46 UTC

On Wed, Nov 5, 2025 at 2:15 PM Harshitha Ramamurthy <hramamurthy@google.com> wrote:
> On Wed, Nov 5, 2025 at 12:08 PM Mina Almasry <almasrymina@google.com> wrote:
> >
> > [...]
> >
> > @@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
> >  	if (priv->header_split_enabled) {
> >  		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
> >  		pp.queue_idx = rx->q_num;
> > +		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
> > +			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
> >  	}
>
> Would it make sense to also wrap setting of the
> PP_FLAG_ALLOW_UNREADABLE_NETMEM under the
> netif_rxq_has_unreadable_mp() helper?

Eh, maybe. PP_FLAG_ALLOW_UNREADABLE_NETMEM is supposed to say 'the
driver supports unreadable netmem, core can enable unreadable netmem
if it wants'. The hope was that the driver would never need to
understand whether it's actually configured for readable or
unreadable. But then we found that for reasons like this, the driver
sometimes wants to know if it's about to be configured for unreadable,
hence netif_rxq_has_unreadable_mp was added.

I would say this bit is correct as written, but let me know if you
disagree.

--
Thanks,
Mina

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-06 1:11 UTC

On Wed, 5 Nov 2025 20:07:58 +0000 Mina Almasry wrote:
> NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> current gve devmem tcp configuration.

Hardcoding the ring size because some other attribute makes you think
that a specific application is running is rather unclean IMO..

Do you want me to respin the per-ring config series? Or you can take it over.
IDK where the buffer size config is after recent discussion but IIUC
it will not drag in my config infra so it shouldn't conflict.

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-06 1:56 UTC

On Wed, Nov 5, 2025 at 5:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 5 Nov 2025 20:07:58 +0000 Mina Almasry wrote:
> > NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> > current gve devmem tcp configuration.
>
> Hardcoding the ring size because some other attribute makes you think
> that a specific application is running is rather unclean IMO..

I did not see it this way tbh. I am thinking for devmem tcp to be as
robust as possible to the burstiness of frag frees, we need a bit of a
generous ring size. The specific application I'm referring to is just
an example of how this could happen.

I was thinking maybe binding->dma_buf->size / net_iov_size (so that
the ring is large enough to hold every single netmem if need be) would
be the upper bound, but in practice increasing to the current max
allowed was good enough, so I'm trying that.

> Do you want me to respin the per-ring config series? Or you can take it over.
> IDK where the buffer size config is after recent discussion but IIUC
> it will not drag in my config infra so it shouldn't conflict.

You mean this one? "[RFC net-next 00/22] net: per-queue rx-buf-len
configuration"

I don't see the connection between rx-buf-len and the ring size,
unless you're thinking about some netlink-configurable way to
configure the pp->ring size?

I am hoping for something backportable with fixes to make this class
of workloads usable.

--
Thanks,
Mina
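To make the "binding->dma_buf->size / net_iov_size" idea above concrete, here is a hypothetical helper showing that upper bound; the function name is invented, the dma-buf size is taken as a plain parameter, and PAGE_SIZE is assumed as the per-net_iov size.

```c
/* Hypothetical: size the ring so it could hold every net_iov backing
 * the bound dma-buf, while still respecting the page_pool maximum.
 */
static u32 devmem_pool_size_upper_bound(size_t dmabuf_size)
{
	return min_t(u32, dmabuf_size / PAGE_SIZE, PAGE_POOL_MAX_RING_SIZE);
}
```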

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-06 2:22 UTC

On Wed, 5 Nov 2025 17:56:10 -0800 Mina Almasry wrote:
> On Wed, Nov 5, 2025 at 5:11 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 5 Nov 2025 20:07:58 +0000 Mina Almasry wrote:
> > > NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
> > > current gve devmem tcp configuration.
> >
> > Hardcoding the ring size because some other attribute makes you think
> > that a specific application is running is rather unclean IMO..
>
> I did not see it this way tbh. I am thinking for devmem tcp to be as
> robust as possible to the burstiness of frag frees, we need a bit of a
> generous ring size. The specific application I'm referring to is just
> an example of how this could happen.
>
> I was thinking maybe binding->dma_buf->size / net_iov_size (so that
> the ring is large enough to hold every single netmem if need be) would
> be the upper bound, but in practice increasing to the current max
> allowed was good enough, so I'm trying that.

Increasing cache sizes to the max seems very hacky at best.
The underlying implementation uses genpool and doesn't even
bother to do batching.

> > Do you want me to respin the per-ring config series? Or you can take it over.
> > IDK where the buffer size config is after recent discussion but IIUC
> > it will not drag in my config infra so it shouldn't conflict.
>
> You mean this one? "[RFC net-next 00/22] net: per-queue rx-buf-len
> configuration"
>
> I don't see the connection between rx-buf-len and the ring size,
> unless you're thinking about some netlink-configurable way to
> configure the pp->ring size?

The latter. We usually have the opposite problem - drivers configure
the cache way too large for any practical production needs and waste
memory.

> I am hoping for something backportable with fixes to make this class
> of workloads usable.

Oh, let's be clear, no way this is getting a fixes tag :/

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Mina Almasry @ 2025-11-06 2:56 UTC

On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 5 Nov 2025 17:56:10 -0800 Mina Almasry wrote:
> > [...]
> > I was thinking maybe binding->dma_buf->size / net_iov_size (so that
> > the ring is large enough to hold every single netmem if need be) would
> > be the upper bound, but in practice increasing to the current max
> > allowed was good enough, so I'm trying that.
>
> Increasing cache sizes to the max seems very hacky at best.
> The underlying implementation uses genpool and doesn't even
> bother to do batching.

OK, my bad. I tried to think through downsides of arbitrarily
increasing the ring size in a ZC scenario where the underlying memory
is pre-pinned and allocated anyway, and I couldn't think of any, but I
won't argue the point any further.

> > > Do you want me to respin the per-ring config series? Or you can take it over.
> > > IDK where the buffer size config is after recent discussion but IIUC
> > > it will not drag in my config infra so it shouldn't conflict.
> >
> > You mean this one? "[RFC net-next 00/22] net: per-queue rx-buf-len
> > configuration"
> >
> > I don't see the connection between rx-buf-len and the ring size,
> > unless you're thinking about some netlink-configurable way to
> > configure the pp->ring size?
>
> The latter. We usually have the opposite problem - drivers configure
> the cache way too large for any practical production needs and waste
> memory.

Sounds good, does this sound like roughly what we're looking for here?
I'm thinking configuring pp->ring size could be simpler than
rx-buf-len because it's really all used by core, so maybe not
something we need to bubble all the way down to the driver, so
something like:

- We add a new field, netdev_rx_queue[->mp_params?]->pp_ring_size.
- We add a netlink api to configure the above.
- When a pp is being created, we check
  netdev_rx_queue[->mp_params]->pp_ring_size, if it's set, then it
  overrides the driver-provided value.

Does that make sense? I don't immediately see why the driver needs to
be told the pp_ring_size via the queue API, as it's really needed by
the pp anyway, no? Or am I missing something?

--
Thanks,
Mina
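A hypothetical sketch of the third bullet in the proposal above, i.e. letting a user-configured per-queue value override the driver-provided pool_size when the page_pool is created. The pp_ring_size field is invented for illustration (it does not exist today); the queue lookup mirrors how page_pool_init() already finds the rx queue for memory-provider setup.

```c
#include <net/netdev_rx_queue.h>
#include <net/page_pool/types.h>

/* Hypothetical: resolve the effective ring size for an MP-backed pool. */
static u32 pp_ring_qsize_sketch(struct page_pool *pool, u32 driver_qsize)
{
	struct netdev_rx_queue *rxq;

	if (!pool->slow.netdev ||
	    pool->slow.queue_idx >= pool->slow.netdev->real_num_rx_queues)
		return driver_qsize;

	rxq = __netif_get_rx_queue(pool->slow.netdev, pool->slow.queue_idx);
	if (rxq->mp_params.pp_ring_size)	/* invented field */
		return min_t(u32, rxq->mp_params.pp_ring_size,
			     PAGE_POOL_MAX_RING_SIZE);

	return driver_qsize;
}
```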

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Dragos Tatulea @ 2025-11-06 17:25 UTC

On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:
> On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > [...]
> > Increasing cache sizes to the max seems very hacky at best.
> > The underlying implementation uses genpool and doesn't even
> > bother to do batching.
>
> OK, my bad. I tried to think through downsides of arbitrarily
> increasing the ring size in a ZC scenario where the underlying memory
> is pre-pinned and allocated anyway, and I couldn't think of any, but I
> won't argue the point any further.
>
I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
size there are ~1% allocation errors during a simple zcrx test.

mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
that size (16K * 4K). Increasing the buffer doesn't help because the
pool size is still what the driver asked for (+ also the internal pool
limit). Even worse: eventually ENOSPC is returned to the application.
But maybe this error has a different fix.

Adapting the pool size to the io_uring buffer size works very well. The
allocation errors are gone and performance is improved.

AFAIU, a page_pool with underlying pre-allocated memory is not really a
cache. So it is useful to be able to adapt to the capacity reserved by
the application.

Maybe one could argue that the zcrx example from liburing could also be
improved. But one thing is sure: aligning the buffer size to the
page_pool size calculated by the driver based on ring size and MTU
is a hassle. If the application provides a large enough buffer, things
should "just work".

> > [...]
> >
> > The latter. We usually have the opposite problem - drivers configure
> > the cache way too large for any practical production needs and waste
> > memory.
>
> Sounds good, does this sound like roughly what we're looking for here?
> I'm thinking configuring pp->ring size could be simpler than
> rx-buf-len because it's really all used by core, so maybe not
> something we need to bubble all the way down to the driver, so
> something like:
>
> - We add a new field, netdev_rx_queue[->mp_params?]->pp_ring_size.
> - We add a netlink api to configure the above.
> - When a pp is being created, we check
>   netdev_rx_queue[->mp_params]->pp_ring_size, if it's set, then it
>   overrides the driver-provided value.
>
And you would do the last item in page_pool_init() when mp_ops and
pp_ring_size is set?

> Does that make sense?
It does to me.

I would add that for this case the page_pool limit should not apply at
all anymore. Maybe you thought about this but I didn't see it mentioned.

> I don't immediately see why the driver needs to
> be told the pp_ring_size via the queue API, as it's really needed by
> the pp anyway, no? Or am I missing something?
It doesn't. The only corner case to take care of is when the pool_size
is below what the driver asked for.

Thanks,
Dragos

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-07 1:18 UTC

On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:
> > [...]
>
> I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> size there are ~1% allocation errors during a simple zcrx test.
>
> mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
> that size (16K * 4K). Increasing the buffer doesn't help because the
> pool size is still what the driver asked for (+ also the
> internal pool limit). Even worse: eventually ENOSPC is returned to the
> application. But maybe this error has a different fix.

Hm, yes, did you trace it all the way to where it comes from?
page pool itself does not have any ENOSPC AFAICT. If the cache
is full we free the page back to the provider via .release_netmem

> Adapting the pool size to the io_uring buffer size works very well. The
> allocation errors are gone and performance is improved.
>
> AFAIU, a page_pool with underlying pre-allocated memory is not really a
> cache. So it is useful to be able to adapt to the capacity reserved by
> the application.
>
> Maybe one could argue that the zcrx example from liburing could also be
> improved. But one thing is sure: aligning the buffer size to the
> page_pool size calculated by the driver based on ring size and MTU
> is a hassle. If the application provides a large enough buffer, things
> should "just work".

Yes, there should be no ENOSPC. I think io_uring is more thorough
in handling the corner cases so what you're describing is more of
a concern..

Keep in mind that we expect multiple page pools from one provider.
We want the pages to flow back to the MP level so other PPs can grab
them.

> > Sounds good, does this sound like roughly what we're looking for here?
> > I'm thinking configuring pp->ring size could be simpler than
> > rx-buf-len because it's really all used by core, so maybe not
> > something we need to bubble all the way down to the driver, so
> > something like:
> >
> > - We add a new field, netdev_rx_queue[->mp_params?]->pp_ring_size.

My series was extending dev->cfg / dev->cfg_pending to also have
per-queue params. So the user config goes there, not in the queue
struct.

> > - We add a netlink api to configure the above.
> > - When a pp is being created, we check
> >   netdev_rx_queue[->mp_params]->pp_ring_size, if it's set, then it
> >   overrides the driver-provided value.
>
> And you would do the last item in page_pool_init() when mp_ops and
> pp_ring_size is set?

Yes.

> > Does that make sense?
> It does to me.
>
> I would add that for this case the page_pool limit should not apply at
> all anymore. Maybe you thought about this but I didn't see it mentioned.
>
> > I don't immediately see why the driver needs to
> > be told the pp_ring_size via the queue API, as it's really needed by
> > the pp anyway, no? Or am I missing something?
> It doesn't. The only corner case to take care of is when the pool_size
> is below what the driver asked for.

Hm, I can't think why driver would care. Of course if someone sets the
cache size to a tiny value and enables IOMMU=strict they shouldn't
expect great performance.. If a driver appears that has a min I think
we can plumb thru the checking later.

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Dragos Tatulea @ 2025-11-07 13:35 UTC

On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > [...]
> > mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
> > that size (16K * 4K). Increasing the buffer doesn't help because the
> > pool size is still what the driver asked for (+ also the
> > internal pool limit). Even worse: eventually ENOSPC is returned to the
> > application. But maybe this error has a different fix.
>
> Hm, yes, did you trace it all the way to where it comes from?
> page pool itself does not have any ENOSPC AFAICT. If the cache
> is full we free the page back to the provider via .release_netmem
>
Yes I did. It happens in io_cqe_cache_refill() when there are no more
CQEs:
https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775

Looking at the code in zcrx I see that the amount of RQ entries and CQ
entries is 4K, which matches the device ring size, but doesn't match the
amount of pages available in the buffer:
https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176

Doubling the CQs (or both RQ and CQ size) makes the ENOSPC go away.

> > Adapting the pool size to the io_uring buffer size works very well. The
> > allocation errors are gone and performance is improved.
> > [...]
>
> Yes, there should be no ENOSPC. I think io_uring is more thorough
> in handling the corner cases so what you're describing is more of
> a concern..
>
Is this error something that io_uring should fix or is this similar to
EAGAIN where the application has to retry?

> Keep in mind that we expect multiple page pools from one provider.
> We want the pages to flow back to the MP level so other PPs can grab
> them.
>
Oh, right, I forgot... And this can happen now only for devmem though,
right?

Still, this is an additional reason to give more control to the MP
over the page_pool config, right?

[...]

Thanks,
Dragos

* Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Jakub Kicinski @ 2025-11-08 2:04 UTC
To: Dragos Tatulea
Cc: Mina Almasry, netdev, linux-kernel, Joshua Washington,
	Harshitha Ramamurthy, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Jesper Dangaard Brouer, Ilias Apalodimas,
	Simon Horman, Willem de Bruijn, ziweixiao, Vedant Mathur,
	io-uring, David Wei

On Fri, 7 Nov 2025 13:35:44 +0000 Dragos Tatulea wrote:
> On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> > On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > > I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> > > size there are ~1% allocation errors during a simple zcrx test.
> > > [...]
> >
> > Hm, yes, did you trace it all the way to where it comes from?
> > page pool itself does not have any ENOSPC AFAICT. If the cache
> > is full we free the page back to the provider via .release_netmem
>
> Yes I did. It happens in io_cqe_cache_refill() when there are no more
> CQEs:
> https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775
>
> Looking at the code in zcrx I see that the amount of RQ entries and CQ
> entries is 4K, which matches the device ring size, but doesn't match the
> amount of pages available in the buffer:
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176
>
> Doubling the CQs (or both RQ and CQ size) makes the ENOSPC go away.
>
> > Yes, there should be no ENOSPC. I think io_uring is more thorough
> > in handling the corner cases so what you're describing is more of
> > a concern..
>
> Is this error something that io_uring should fix or is this similar to
> EAGAIN where the application has to retry?

Not sure.. let me CC them.

> > Keep in mind that we expect multiple page pools from one provider.
> > We want the pages to flow back to the MP level so other PPs can grab
> > them.
>
> Oh, right, I forgot... And this can happen now only for devmem though,
> right?

Right, tho I think David is also working on some queue sharing?

> Still, this is an additional reason to give more control to the MP
> over the page_pool config, right?

This one I'm really not sure needs to be exposed via MP vs just
netdev-nl. But yes, I'd imagine the driver default may be sub-optimal
in either direction so giving user control over the sizing would be
good.

* Re: [PATCH net v1 1/2] page_pool: expose max page pool ring size
From: Jesper Dangaard Brouer @ 2025-11-05 21:56 UTC

On 05/11/2025 21.07, Mina Almasry wrote:
> Expose this as a constant so we can reuse it in drivers.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
>  include/net/page_pool/types.h | 2 ++
>  net/core/page_pool.c          | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)

Looks good to me

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>

> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 1509a536cb85..5edba3122b10 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -58,6 +58,8 @@ struct pp_alloc_cache {
>  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
>  };
>
> +#define PAGE_POOL_MAX_RING_SIZE 16384
> +

IIRC this was recently reduced to 16384 (from 32K), do you have a
use-case for higher limits?

> [...]

* Re: [PATCH net v1 1/2] page_pool: expose max page pool ring size
From: Mina Almasry @ 2025-11-05 22:56 UTC

On Wed, Nov 5, 2025 at 1:56 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
> > diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> > index 1509a536cb85..5edba3122b10 100644
> > --- a/include/net/page_pool/types.h
> > +++ b/include/net/page_pool/types.h
> > @@ -58,6 +58,8 @@ struct pp_alloc_cache {
> >  	netmem_ref cache[PP_ALLOC_CACHE_SIZE];
> >  };
> >
> > +#define PAGE_POOL_MAX_RING_SIZE 16384
> > +
>
> IIRC this was recently reduced to 16384 (from 32K), do you have a
> use-case for higher limits?

Yes, I noticed that. Increasing to 16384 resolved my issue, so I did
not feel the need to revert back to 32K.

We're expanding testing more and more, and I may run into another issue
that asks that I re-increase the limit (maybe with a check that it's
indeed a ZC configuration, so no extra memory will be pinned), but for
now I did not find a justification for that.

--
Thanks,
Mina

* Re: [PATCH net v1 1/2] page_pool: expose max page pool ring size
From: Ilias Apalodimas @ 2025-11-06 13:12 UTC

On Wed, 5 Nov 2025 at 22:08, Mina Almasry <almasrymina@google.com> wrote:
>
> Expose this as a constant so we can reuse it in drivers.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> ---
> [...]
>
> base-commit: 327c20c21d80e0d87834b392d83ae73c955ad8ff
> --
> 2.51.2.1026.g39e6a42477-goog
>

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>