From: Jakub Kicinski <kuba@kernel.org>
To: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Mina Almasry <almasrymina@google.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Joshua Washington <joshwash@google.com>,
Harshitha Ramamurthy <hramamurthy@google.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>,
ziweixiao@google.com, Vedant Mathur <vedantmathur@google.com>,
io-uring@vger.kernel.org, David Wei <dw@davidwei.uk>
Subject: Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
Date: Fri, 7 Nov 2025 18:04:53 -0800
Message-ID: <20251107180453.17f0ed39@kernel.org>
In-Reply-To: <k3h635mirxo3wichhpxosw4hxvfu67khqs2jyna3muhhj5pmvm@4t2gypnckuri>

On Fri, 7 Nov 2025 13:35:44 +0000 Dragos Tatulea wrote:
> On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> > On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > > I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> > > size there are ~1% allocation errors during a simple zcrx test.
> > >
> > > mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
> > > that size (16K * 4K). Increasing the buffer doesn't help because the
> > > pool size is still what the driver asked for (+ also the
> > > internal pool limit). Even worse: eventually ENOSPC is returned to the
> > > application. But maybe this error has a different fix.
> >
> > Hm, yes, did you trace it all the way to where it comes from?
> > page pool itself does not have any ENOSPC AFAICT. If the cache
> > is full we free the page back to the provider via .release_netmem
> >
> Yes I did. It happens in io_cqe_cache_refill() when there are no more
> CQEs:
> https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775
>
> Looking at the code in zcrx I see that the amount of RQ entries and CQ
> entries is 4K, which matches the device ring size, but doesn't match the
> amount of pages available in the buffer:
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176
>
> Doubling the CQs (or both RQ and CQ size) makes the ENOSPC go away.
>
> > > Adapting the pool size to the io_uring buffer size works very well. The
> > > allocation errors are gone and performance is improved.
> > >
> > > AFAIU, a page_pool with underlying pre-allocated memory is not really a
> > > cache. So it is useful to be able to adapt to the capacity reserved by
> > > the application.
> > >
> > > Maybe one could argue that the zcrx example from liburing could also be
> > > improved. But one thing is sure: aligning the buffer size to the
> > > page_pool size calculated by the driver based on ring size and MTU
> > > is a hassle. If the application provides a large enough buffer, things
> > > should "just work".
> >
> > Yes, there should be no ENOSPC. I think io_uring is more thorough
> > in handling the corner cases so what you're describing is more of
> > a concern..
>
> Is this error something that io_uring should fix or is this similar to
> EAGAIN where the application has to retry?

Not sure.. let me CC them.

> > Keep in mind that we expect multiple page pools from one provider.
> > We want the pages to flow back to the MP level so other PPs can grab
> > them.
> >
> Oh, right, I forgot... And this can happen now only for devmem though,
> right?

Right, tho I think David is also working on some queue sharing?

> Still, this is an additional reason to give more control to the MP
> over the page_pool config, right?

This one I'm really not sure needs to be exposed via MP vs just
netdev-nl. But yes, I'd imagine the driver default may be sub-optimal
in either direction so giving user control over the sizing would be
good.