From: Jakub Kicinski <kuba@kernel.org>
To: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Mina Almasry <almasrymina@google.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Joshua Washington <joshwash@google.com>,
Harshitha Ramamurthy <hramamurthy@google.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Simon Horman <horms@kernel.org>,
Willem de Bruijn <willemb@google.com>,
ziweixiao@google.com, Vedant Mathur <vedantmathur@google.com>,
io-uring@vger.kernel.org, David Wei <dw@davidwei.uk>
Subject: Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
Date: Fri, 7 Nov 2025 18:04:53 -0800
Message-ID: <20251107180453.17f0ed39@kernel.org>
In-Reply-To: <k3h635mirxo3wichhpxosw4hxvfu67khqs2jyna3muhhj5pmvm@4t2gypnckuri>

On Fri, 7 Nov 2025 13:35:44 +0000 Dragos Tatulea wrote:
> On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> > On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > > I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> > > size there are ~1% allocation errors during a simple zcrx test.
> > >
> > > mlx5 calculates 16K pages and the io_uring zcrx buffer exactly matches
> > > that size (16K * 4K). Increasing the buffer doesn't help because the
> > > pool size is still what the driver asked for (and is further capped by
> > > the internal pool limit). Even worse: eventually ENOSPC is returned to
> > > the application. But maybe this error has a different fix.
> >
> > Hm, yes, did you trace it all the way to where it comes from?
> > page pool itself does not have any ENOSPC AFAICT. If the cache
> > is full we free the page back to the provider via .release_netmem.
> >
> Yes, I did. It happens in io_cqe_cache_refill() when there are no more
> CQEs:
> https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775
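For reference, the gist of that check, paraphrased (not verbatim; see the
link above for the exact v6.17 code):

	/* io_cqe_cache_refill(), roughly: refilling fails when no
	 * contiguous CQE space is left, which the zcrx path then
	 * surfaces to the application as -ENOSPC.
	 */
	queued = min(__io_cqring_events(ctx), ctx->cq_entries);
	free   = ctx->cq_entries - queued;
	len    = min(free, ctx->cq_entries - off);
	if (!len)
		return false;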
>
> Looking at the code in zcrx, I see that the number of RQ entries and CQ
> entries is 4K, which matches the device ring size but not the number of
> pages available in the buffer:
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
> https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176
>
> Doubling the CQ size (or both the RQ and CQ sizes) makes the ENOSPC go away.
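To make the mismatch concrete, here is a minimal standalone sketch using
the numbers from the mlx5 case above (illustrative only; nothing below is
taken from the liburing example itself):

	#include <stdio.h>

	int main(void)
	{
		const unsigned long page_size  = 4096;
		const unsigned long area_size  = 16384UL * page_size; /* the 16K pages mlx5 computes */
		const unsigned long ring_size  = 4096;                /* device RX ring */
		const unsigned long cq_entries = ring_size;           /* what the example sizes the CQ to */
		unsigned long area_pages = area_size / page_size;

		/* Every buffer handed to the application needs a CQE before
		 * it can be consumed and returned via the refill ring, so a
		 * CQ smaller than the number of in-flight buffers can fill
		 * up, and posting fails with ENOSPC.
		 */
		printf("area pages: %lu, CQ entries: %lu -> %s\n",
		       area_pages, cq_entries,
		       area_pages > cq_entries ? "CQ can fill up" : "OK");
		return 0;
	}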
>
> > > Adapting the pool size to the io_uring buffer size works very well. The
> > > allocation errors are gone and performance is improved.
> > >
> > > AFAIU, a page_pool with underlying pre-allocated memory is not really a
> > > cache. So it is useful to be able to adapt to the capacity reserved by
> > > the application.
> > >
> > > Maybe one could argue that the zcrx example from liburing could also be
> > > improved. But one thing is certain: aligning the buffer size with the
> > > page_pool size that the driver calculates from ring size and MTU
> > > is a hassle. If the application provides a large enough buffer, things
> > > should "just work".
> >
> > Yes, there should be no ENOSPC. I think io_uring is more thorough
> > in handling the corner cases, so what you're describing is more of
> > a concern.
>
> Is this error something that io_uring should fix or is this similar to
> EAGAIN where the application has to retry?
Not sure.. let me CC them.
> > Keep in mind that we expect multiple page pools from one provider.
> > We want the pages to flow back to the MP level so other PPs can grab
> > them.
> >
> Oh, right, I forgot... And this can currently happen only for devmem,
> right?
Right, though I think David is also working on some queue sharing?
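For context, a trimmed sketch of the provider hooks in play, from my
reading of include/net/page_pool/memory_provider.h (not verbatim; exact
signatures may differ):

	/* On cache overflow the page pool hands pages back through
	 * .release_netmem; that is how they flow back to the MP level
	 * and become available to other page pools on the same provider.
	 */
	struct memory_provider_ops {
		netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
		bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
		int (*init)(struct page_pool *pool);
		void (*destroy)(struct page_pool *pool);
		/* netlink fill / uninstall hooks omitted */
	};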
> Still, this is an additional reason to give more control to the MP
> over the page_pool config, right?
I'm really not sure this one needs to be exposed via the MP rather than
just netdev-nl. But yes, I'd imagine the driver default may be
sub-optimal in either direction, so giving the user control over the
sizing would be good.
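As a strawman, driver-side sizing from the provider's buffer could look
something like the sketch below. mp_buffer_pages is a hypothetical input,
not an existing API; the 32768 cap mirrors the ring-size limit that
page_pool_init() enforces today:

	/* Hypothetical sketch: size the pool from the memory actually
	 * backing it rather than from ring size * MTU.
	 */
	static struct page_pool *create_zc_pool(struct device *dev,
						unsigned int mp_buffer_pages)
	{
		struct page_pool_params pp = {
			.order		= 0,
			.dev		= dev,
			.dma_dir	= DMA_FROM_DEVICE,
			/* stay under page_pool's internal ring-size limit */
			.pool_size	= min(mp_buffer_pages, 32768U),
		};

		return page_pool_create(&pp);
	}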
Thread overview: 17+ messages
2025-11-05 20:07 [PATCH net v1 1/2] page_pool: expose max page pool ring size Mina Almasry
2025-11-05 20:07 ` [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools Mina Almasry
2025-11-05 21:58 ` Jesper Dangaard Brouer
2025-11-05 22:44 ` Mina Almasry
2025-11-05 22:15 ` Harshitha Ramamurthy
2025-11-05 22:46 ` Mina Almasry
2025-11-06 1:11 ` Jakub Kicinski
2025-11-06 1:56 ` Mina Almasry
2025-11-06 2:22 ` Jakub Kicinski
2025-11-06 2:56 ` Mina Almasry
2025-11-06 17:25 ` Dragos Tatulea
2025-11-07 1:18 ` Jakub Kicinski
2025-11-07 13:35 ` Dragos Tatulea
2025-11-08 2:04 ` Jakub Kicinski [this message]
2025-11-05 21:56 ` [PATCH net v1 1/2] page_pool: expose max page pool ring size Jesper Dangaard Brouer
2025-11-05 22:56 ` Mina Almasry
2025-11-06 13:12 ` Ilias Apalodimas