All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	bpf@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Cc: "David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Donald Hunter" <donald.hunter@gmail.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
	"Matt Turner" <mattst88@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Helge Deller" <deller@gmx.de>,
	"Andreas Larsson" <andreas@gaisler.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Steffen Klassert" <steffen.klassert@secunet.com>,
	"Herbert Xu" <herbert@gondor.apana.org.au>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Nikolay Aleksandrov" <razor@blackwall.org>,
	"Taehee Yoo" <ap420073@gmail.com>, "David Wei" <dw@davidwei.uk>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Yunsheng Lin" <linyunsheng@huawei.com>,
	"Shailend Chand" <shailend@google.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [PATCH net-next v19 06/13] memory-provider: dmabuf devmem memory provider
Date: Wed, 14 Aug 2024 15:12:22 +0100	[thread overview]
Message-ID: <de7daf80-a2e4-4451-b666-2a67ccc3649e@gmail.com> (raw)
In-Reply-To: <20240813211317.3381180-7-almasrymina@google.com>

On 8/13/24 22:13, Mina Almasry wrote:
> Implement a memory provider that allocates dmabuf devmem in the form of
> net_iov.
> 
> The provider receives a reference to the struct netdev_dmabuf_binding
> via the pool->mp_priv pointer. The driver needs to set this pointer for
> the provider in the net_iov.
> 
> The provider obtains a reference on the netdev_dmabuf_binding which
> guarantees the binding and the underlying mapping remains alive until
> the provider is destroyed.
> 
> Usage of PP_FLAG_DMA_MAP is required for this memory provide such that
> the page_pool can provide the driver with the dma-addrs of the devmem.
> 
> Support for PP_FLAG_DMA_SYNC_DEV is omitted for simplicity & p.order !=
> 0.
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
> 
> ---
> 
> v19:
> - Add PP_FLAG_ALLOW_UNREADABLE_NETMEM flag. It serves 2 purposes, (a)
>    it guards drivers that don't support unreadable netmem (net_iov
>    backed) from accidentally getting exposed to it, and (b) drivers that
>    wish to create header pools can unset it for that pool to force
>    readable netmem.
> - Add page_pool_check_memory_provider, which verifies that the driver
>    has created a page_pool with the expected configuration. This is used
>    to report to the user if the mp configuration succeeded, and also
>    verify that the driver is doing the right thing.
> - Don't reset niov->dma_addr on allocation/free.
> 
> v17:
> - Use ASSERT_RTNL (Jakub)
> 
> v16:
> - Add DEBUG_NET_WARN_ON_ONCE(!rtnl_is_locked()), to catch cases if
>    page_pool_init without rtnl_locking when the queue is provided. In
>    this case, the queue configuration may be changed while we're initing
>    the page_pool, which could be a race.
> 
> v13:
> - Return on warning (Pavel).
> - Fixed pool->recycle_stats not being freed on error (Pavel).
> - Applied reviewed-by from Pavel.
> 
> v11:
> - Rebase to not use the ops. (Christoph)
> 
> v8:
> - Use skb_frag_size instead of frag->bv_len to fix patch-by-patch build
>    error
> 
> v6:
> - refactor new memory provider functions into net/core/devmem.c (Pavel)
> 
> v2:
> - Disable devmem for p.order != 0
> 
> v1:
> - static_branch check in page_is_page_pool_iov() (Willem & Paolo).
> - PP_DEVMEM -> PP_IOV (David).
> - Require PP_FLAG_DMA_MAP (Jakub).
> 
...
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 301f4250ca82..2f2a7f4dee4c 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -17,6 +17,7 @@
>   #include <linux/genalloc.h>
>   #include <linux/dma-buf.h>
>   #include <net/devmem.h>
> +#include <net/mp_dmabuf_devmem.h>
>   #include <net/netdev_queues.h>
>   
>   #include "page_pool_priv.h"
> @@ -153,6 +154,10 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
>   	if (err)
>   		goto err_xa_erase;
>   
> +	err = page_pool_check_memory_provider(dev, rxq, binding);

Frankly, I pretty much don't like it.

1. We do it after reconfiguring the queue just to fail and reconfigure
it again.

2. It should be a part of the common path like netdev_rx_queue_restart(),
not specific to devmem TCP.

These two can be fixed by moving the check into
netdev_rx_queue_restart() just after ->ndo_queue_mem_alloc, assuming
that the callback where we init page pools.

3. That implicit check gives me bad feeling, instead of just getting
direct feedback from the driver, either it's a flag or an error
returned, we have to try to figure what exactly the driver did, with
a high chance this inference will fail us at some point.

And page_pool_check_memory_provider() is not that straightforward,
it doesn't walk through pools of a queue. Not looking too deep,
but it seems like the nested loop can be moved out with the same
effect, so it first looks for a pool in the device and the follows
with the bound_rxqs. And seems the bound_rxqs check would always turn
true, you set the binding into the map in
net_devmem_bind_dmabuf_to_queue() before the restart and it'll be there
after restart for page_pool_check_memory_provider(). Maybe I missed
something, but it's not super clear.

4. And the last thing Jakub mentioned is that we need to be prepared
to expose a flag to the userspace for whether a queue supports
netiov. Not really doable in a sane manner with such implicit
post configuration checks.

And that brings us back to the first approach I mentioned, where
we have a flag in the queue structure, drivers set it, and
netdev_rx_queue_restart() checks it before any callback. That's
where the thread with Jakub stopped, and it reads like at least
he's not against the idea.


> +	if (err)
> +		goto err_xa_erase;
> +
>   	return 0;
>   
>   err_xa_erase:
> @@ -305,4 +310,69 @@ void dev_dmabuf_uninstall(struct net_device *dev)
>   				xa_erase(&binding->bound_rxqs, xa_idx);
>   	}
>   }
> +
...
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 3a3277ba167b..cbc54ee4f670 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -344,6 +344,32 @@ void page_pool_unlist(struct page_pool *pool)
>   	mutex_unlock(&page_pools_lock);
>   }
>   
> +int page_pool_check_memory_provider(struct net_device *dev,
> +				    struct netdev_rx_queue *rxq,
> +				    struct net_devmem_dmabuf_binding *binding)
> +{
> +	struct netdev_rx_queue *binding_rxq;
> +	struct page_pool *pool;
> +	struct hlist_node *n;
> +	unsigned long xa_idx;
> +
> +	mutex_lock(&page_pools_lock);
> +	hlist_for_each_entry_safe(pool, n, &dev->page_pools, user.list) {
> +		if (pool->mp_priv != binding)
> +			continue;
> +
> +		xa_for_each(&binding->bound_rxqs, xa_idx, binding_rxq) {
> +			if (rxq != binding_rxq)
> +				continue;
> +
> +			mutex_unlock(&page_pools_lock);
> +			return 0;
> +		}
> +	}
> +	mutex_unlock(&page_pools_lock);
> +	return -ENODATA;
> +}
> +
>   static void page_pool_unreg_netdev_wipe(struct net_device *netdev)
>   {
>   	struct page_pool *pool;

-- 
Pavel Begunkov

  reply	other threads:[~2024-08-14 14:11 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-13 21:13 [PATCH net-next v19 06/13] memory-provider: dmabuf devmem memory provider Mina Almasry
2024-08-14 14:12 ` Pavel Begunkov [this message]
2024-08-14 14:55   ` Mina Almasry
2024-08-14 16:32     ` Pavel Begunkov
2024-08-16  1:22       ` Jakub Kicinski
2024-08-16 12:20         ` Mina Almasry
2024-08-16 15:35           ` Jakub Kicinski
2024-08-16  0:55 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de7daf80-a2e4-4451-b666-2a67ccc3649e@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=almasrymina@google.com \
    --cc=andreas@gaisler.com \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=ast@kernel.org \
    --cc=bagasdotme@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=donald.hunter@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=dw@davidwei.uk \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=hch@infradead.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=hramamurthy@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jeroendb@google.com \
    --cc=jgg@ziepe.ca \
    --cc=john.fastabend@gmail.com \
    --cc=kaiyuanz@google.com \
    --cc=kuba@kernel.org \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mattst88@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=razor@blackwall.org \
    --cc=richard.henderson@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=shailend@google.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shuah@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    --cc=sumit.semwal@linaro.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.