From: Stanislav Fomichev <stfomichev@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: Cosmin Ratiu <cratiu@nvidia.com>,
netdev@vger.kernel.org, Jason Gunthorpe <jgg@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>,
Dragos Tatulea <dtatulea@nvidia.com>,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net 1/2] net/devmem: Reject insufficiently large dmabuf pools
Date: Thu, 24 Apr 2025 17:37:52 -0700
Message-ID: <aArZYDpFyThATgYN@mini-arch>
In-Reply-To: <CAHS8izPm_yWCRTD3ngUgDqapqiGmtpw5hhG1DFAwqwtXC-CHLA@mail.gmail.com>

On 04/24, Mina Almasry wrote:
> On Thu, Apr 24, 2025 at 3:40 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 04/24, Mina Almasry wrote:
> > > On Thu, Apr 24, 2025 at 3:10 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > >
> > > > On 04/24, Mina Almasry wrote:
> > > > > On Wed, Apr 23, 2025 at 1:15 PM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> > > > > >
> > > > > > On 04/23, Mina Almasry wrote:
> > > > > > > On Wed, Apr 23, 2025 at 9:03 AM Cosmin Ratiu <cratiu@nvidia.com> wrote:
> > > > > > > >
> > > > > > > > Drivers that are told to allocate RX buffers from pools of DMA memory
> > > > > > > > should have enough memory in the pool to satisfy projected allocation
> > > > > > > > requests (a function of ring size, MTU & other parameters). If there's
> > > > > > > > not enough memory, RX ring refill might fail later at inconvenient times
> > > > > > > > (e.g. during NAPI poll).
> > > > > > > >
> > > > > > >
> > > > > > > My understanding is that if the RX ring refill fails, the driver will
> > > > > > > post the buffers it was able to allocate data for, and will not post
> > > > > > > the others. So it will run with degraded performance, but nothing
> > > > > > > overly bad should happen. This is the same behavior as when the
> > > > > > > machine is under memory pressure.
> > > > > > >
> > > > > > > In general I don't know about this change. If the user wants to use
> > > > > > > very small dmabufs, they should be able to, without jumping through
> > > > > > > hoops to reduce the number of rx ring slots the driver has (if it
> > > > > > > supports configuring that).
> > > > > > >
> > > > > > > I think maybe printing an error or warning that the dmabuf is too
> > > > > > > small for the pool_size may be fine. But outright failing this
> > > > > > > configuration? I don't think so.
> > > > > > >
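
A warning-only variant of the check quoted further down might look roughly
like this (untested sketch; pr_warn is used here so it doesn't assume
anything about the page_pool's netdev):

	size = gen_pool_size(binding->chunk_pool) >> PAGE_SHIFT;
	if (size < pool->p.pool_size)
		pr_warn("devmem: dmabuf provides %zu pages, page_pool wants %u\n",
			size, pool->p.pool_size);

That would keep the configuration working and just flag the mismatch.
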
> > > > > > > > This commit adds a check at dmabuf pool init time that compares the
> > > > > > > > amount of memory in the underlying chunk pool (configured by the user
> > > > > > > > space application providing dmabuf memory) with the desired pool size
> > > > > > > > (previously set by the driver) and fails with an error message if chunk
> > > > > > > > memory isn't enough.
> > > > > > > >
> > > > > > > > Fixes: 0f9214046893 ("memory-provider: dmabuf devmem memory provider")
> > > > > > > > Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> > > > > > > > ---
> > > > > > > > net/core/devmem.c | 11 +++++++++++
> > > > > > > > 1 file changed, 11 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/net/core/devmem.c b/net/core/devmem.c
> > > > > > > > index 6e27a47d0493..651cd55ebb28 100644
> > > > > > > > --- a/net/core/devmem.c
> > > > > > > > +++ b/net/core/devmem.c
> > > > > > > > @@ -299,6 +299,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> > > > > > > > int mp_dmabuf_devmem_init(struct page_pool *pool)
> > > > > > > > {
> > > > > > > > struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> > > > > > > > + size_t size;
> > > > > > > >
> > > > > > > > if (!binding)
> > > > > > > > return -EINVAL;
> > > > > > > > @@ -312,6 +313,16 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
> > > > > > > > if (pool->p.order != 0)
> > > > > > > > return -E2BIG;
> > > > > > > >
> > > > > > > > + /* Validate that the underlying dmabuf has enough memory to satisfy
> > > > > > > > + * requested pool size.
> > > > > > > > + */
> > > > > > > > + size = gen_pool_size(binding->chunk_pool) >> PAGE_SHIFT;
> > > > > > > > + if (size < pool->p.pool_size) {
> > > > > > >
> > > > > > > pool_size seems to be the number of ptr_ring slots in the page_pool,
> > > > > > > not some upper or lower bound on the amount of memory the page_pool
> > > > > > > can provide. So this check seems useless? The page_pool can still not
> > > > > > > provide this amount of memory with dmabuf (if the netmems aren't being
> > > > > > > recycled fast enough) or with normal memory (under memory pressure).
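
To make the comparison concrete (hypothetical numbers, 4 KiB pages): for a
page_pool created with pool_size = 16384, the new check requires

	gen_pool_size(binding->chunk_pool) >= 16384 << PAGE_SHIFT = 64 MiB

of dmabuf, even though pool_size only sizes the ptr_ring and the pool can
still run dry at runtime if netmems aren't recycled back fast enough.
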
> > > > > >
> > > > > > I read this check more as "are there enough chunks in the binding to
> > > > > > fully fill in the page pool". The user controls the size of the rx ring
> > > > >
> > > > > Only on drivers that support ethtool -G, and only where it lets you
> > > > > configure the ring size you want.
> > > >
> > > > gve is the minority here, any major nic (brcm/mlx/intel) supports resizing
> > > > the rings.
> > > >
> > >
> > > GVE supports resizing rings; other drivers may not. And even on drivers
> > > that do support it, some users may have a use case for a dmabuf smaller
> > > than the minimum ring size their driver accepts.
> > >
> > > > > > which controls the size of the page pool, which somewhat dictates the
> > > > > > minimal size of the binding (maybe).
> > > > >
> > > > > See the test I ran in the other thread. Seems at least GVE is fine
> > > > > with dmabuf size < ring size. I don't know what other drivers do, but
> > > > > generally speaking I think specific driver limitations should not
> > > > > limit what others can do with their drivers. Sure, for the GPU mem
> > > > > applications you're probably looking at, the dmabufs are huge and
> > > > > supporting small dmabufs is not a concern, but someone somewhere may
> > > > > want to run with a 1 MB dmabuf for some use case, and if their driver
> > > > > is fine with it, core should not prevent it, I think.
> > > > >
> > > > > > So it's more of a sanity check.
> > > > > >
> > > > > > Maybe having better defaults in ncdevmem would've been a better option? It
> > > > > > allocates (16000*4096) bytes (slightly less than 64MB, why? to fit into
> > > > > > default /sys/module/udmabuf/parameters/size_limit_mb?) and on my setup
> > > > > > PP wants to get 64MB at least..
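
For reference, the arithmetic behind "slightly less than 64MB":

	16000 * 4096 bytes = 65,536,000 bytes  (~62.5 MiB)
	udmabuf default:    size_limit_mb=64   (67,108,864 bytes)

so the ncdevmem default just squeezes under the udmabuf cap.
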
> > > > >
> > > > > Yeah, udmabuf has a limitation that it only supports 64MB max size
> > > > > last I looked.
> > > >
> > > > We can use /sys/module/udmabuf/parameters/size_limit_mb to allocate
> > > > more than 64MB, ncdevmem can change it.
> > >
> > > The udmabuf limit is hardcoded in udmabuf.c or configured on module
> > > load, and ncdevmem doesn't load udmabuf. I guess it could be changed
> > > to do that, but currently ncdevmem works with CONFIG_UDMABUF=y.
> >
> > You don't need to load/reload the module to change module params:
> >
> > # id
> > uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys)
> > # cat /sys/module/udmabuf/parameters/size_limit_mb
> > 64
> > # echo 128 > /sys/module/udmabuf/parameters/size_limit_mb
> > # cat /sys/module/udmabuf/parameters/size_limit_mb
> > 128
> >
>
> Today I learned! Thanks!
>
> I will put it on my todo list to make ncdevmem force a larger limit to

Or we can ask Cosmin to send something out? Since he's already looking
into the buffer sizes..
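
FWIW, a rough and untested sketch of what ncdevmem could do before creating
the udmabuf, assuming the parameter stays writable at the sysfs path shown
above (the helper name below is made up):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Best-effort bump of the udmabuf size cap; needs root and a
	 * CONFIG_UDMABUF build that exposes size_limit_mb.
	 */
	static void maybe_raise_udmabuf_limit(unsigned int mb)
	{
		const char *path = "/sys/module/udmabuf/parameters/size_limit_mb";
		char buf[16];
		int fd, len;

		fd = open(path, O_WRONLY);
		if (fd < 0)
			return; /* parameter missing or not privileged */

		len = snprintf(buf, sizeof(buf), "%u", mb);
		if (write(fd, buf, len) < 0)
			perror("writing size_limit_mb");
		close(fd);
	}

The caller could then ask for, say, 128 and fall back to smaller dmabufs if
the write fails.
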
Thread overview: 15+ messages
2025-04-23 15:35 [PATCH net 1/2] net/devmem: Reject insufficiently large dmabuf pools Cosmin Ratiu
2025-04-23 15:35 ` [PATCH net 2/2] tests/ncdevmem: Fix double-free of queue array Cosmin Ratiu
2025-04-23 16:54 ` Stanislav Fomichev
2025-04-23 18:49 ` Mina Almasry
2025-04-23 16:53 ` [PATCH net 1/2] net/devmem: Reject insufficiently large dmabuf pools Stanislav Fomichev
2025-04-23 17:30 ` Mina Almasry
2025-04-23 20:15 ` Stanislav Fomichev
2025-04-24 20:57 ` Mina Almasry
2025-04-24 22:10 ` Stanislav Fomichev
2025-04-24 22:26 ` Mina Almasry
2025-04-24 22:40 ` Stanislav Fomichev
2025-04-24 23:42 ` Mina Almasry
2025-04-25 0:37 ` Stanislav Fomichev [this message]
2025-04-24 8:47 ` Cosmin Ratiu
2025-04-24 20:50 ` Mina Almasry