All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Mina Almasry <almasrymina@google.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	"Donald Hunter" <donald.hunter@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
	"Matt Turner" <mattst88@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Helge Deller" <deller@gmx.de>,
	"Andreas Larsson" <andreas@gaisler.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Steffen Klassert" <steffen.klassert@secunet.com>,
	"Herbert Xu" <herbert@gondor.apana.org.au>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Pavel Begunkov" <asml.silence@gmail.com>,
	"David Wei" <dw@davidwei.uk>, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Yunsheng Lin" <linyunsheng@huawei.com>,
	"Shailend Chand" <shailend@google.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Nikolay Aleksandrov" <razor@blackwall.org>,
	"Taehee Yoo" <ap420073@gmail.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [PATCH net-next v24 06/13] memory-provider: dmabuf devmem memory provider
Date: Tue, 3 Sep 2024 14:19:48 -0700	[thread overview]
Message-ID: <20240903141948.269e22bb@kernel.org> (raw)
In-Reply-To: <20240831004313.3713467-7-almasrymina@google.com>

On Sat, 31 Aug 2024 00:43:06 +0000 Mina Almasry wrote:
> diff --git a/include/net/mp_dmabuf_devmem.h b/include/net/mp_dmabuf_devmem.h
> new file mode 100644
> index 000000000000..6d1cf2a77f6b
> --- /dev/null
> +++ b/include/net/mp_dmabuf_devmem.h

this header can live under net/core/ like netmem_priv.h right?
devmem internals should be of no interest outside of core networking.

In fact the same is true for include/net/devmem.h ?

Sorry for pushing back on all the header exports, we have had bad
experiences with people treating anything under include/ as public 
API for any subsystem to use..

> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Dmabuf device memory provider.
> + *
> + * Authors:	Mina Almasry <almasrymina@google.com>
> + *
> + */
> +#ifndef _NET_MP_DMABUF_DEVMEM_H
> +#define _NET_MP_DMABUF_DEVMEM_H
> +
> +#include <net/netmem.h>
> +
> +#if defined(CONFIG_NET_DEVMEM)
> +int mp_dmabuf_devmem_init(struct page_pool *pool);
> +
> +netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, gfp_t gfp);
> +
> +void mp_dmabuf_devmem_destroy(struct page_pool *pool);
> +
> +bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem);
> +#else
> +static inline int mp_dmabuf_devmem_init(struct page_pool *pool)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool,
> +							gfp_t gfp)

Please break the lines after the return type if the line gets long:

static inline netmem_ref 
mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, gfp_t gfp)

Please fix where you can (at least where it cases going over 80 chars)

> +{
> +	return 0;
> +}
> +
> +static inline void mp_dmabuf_devmem_destroy(struct page_pool *pool)
> +{
> +}
> +
> +static inline bool mp_dmabuf_devmem_release_page(struct page_pool *pool,
> +						 netmem_ref netmem)
> +{
> +	return false;
> +}
> +#endif
> +
> +#endif /* _NET_MP_DMABUF_DEVMEM_H */
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index ac6c7945117b..61400d4b0d66 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -8,6 +8,7 @@
>  #ifndef _NET_NETMEM_H
>  #define _NET_NETMEM_H
>  
> +#include <linux/mm.h>
>  #include <net/devmem.h>
>  #include <net/net_debug.h>
>  
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 4afd6dd56351..1b4698710f25 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -20,8 +20,17 @@
>  					* device driver responsibility
>  					*/
>  #define PP_FLAG_SYSTEM_POOL	BIT(2) /* Global system page_pool */
> +#define PP_FLAG_ALLOW_UNREADABLE_NETMEM	BIT(3) /* Allow unreadable (net_iov
> +						* backed) netmem in this
> +						* page_pool. Drivers setting
> +						* this must be able to support
> +						* unreadable netmem, where
> +						* netmem_address() would return
> +						* NULL. This flag should not be
> +						* set for header page_pools.
> +						*/

Maybe move the comment before the define:

/* Allow unreadable (net_iov backed) netmem in this page_pool. Drivers
setting
 * this must be able to support unreadable netmem, where netmem_address() would 
 * return NULL. This flag should not be set for header page_pools.
 */
#define PP_FLAG_ALLOW_UNREADABLE_NETMEM	BIT(3)

? up to you.

>  #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | \
> -				 PP_FLAG_SYSTEM_POOL)
> +				 PP_FLAG_SYSTEM_POOL | PP_FLAG_ALLOW_UNREADABLE_NETMEM)
>  
>  /*
>   * Fast allocation side cache array/stack
> @@ -57,7 +66,9 @@ struct pp_alloc_cache {
>   * @offset:	DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
>   * @slow:	params with slowpath access only (initialization and Netlink)
>   * @netdev:	netdev this pool will serve (leave as NULL if none or multiple)
> - * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_SYSTEM_POOL
> + * @queue:	struct netdev_rx_queue this page_pool is being created for.
> + * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_SYSTEM_POOL,
> + *		PP_FLAG_ALLOW_UNREADABLE_NETMEM.
>   */
>  struct page_pool_params {
>  	struct_group_tagged(page_pool_params_fast, fast,
> @@ -72,6 +83,7 @@ struct page_pool_params {
>  	);
>  	struct_group_tagged(page_pool_params_slow, slow,
>  		struct net_device *netdev;
> +		struct netdev_rx_queue *queue;

Why set a pointer? It should work but drivers don't usually deal with
netdev_rx_queue struct directly. struct xdp_rxq_info takes an integer
queue id, and it serves a somewhat similar function.

Keep in mind that there will be more drivers than core code, so
convenience for them matters more.

>  		unsigned int	flags;
>  /* private: used by test code only */
>  		void (*init_callback)(netmem_ref netmem, void *arg);
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 727e5ee39f30..c8c112360caa 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -13,6 +13,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/types.h>
>  #include <net/devmem.h>
> +#include <net/mp_dmabuf_devmem.h>
>  #include <net/netdev_queues.h>
>  #include <net/netdev_rx_queue.h>
>  #include <net/page_pool/helpers.h>
> @@ -320,3 +321,68 @@ void dev_dmabuf_uninstall(struct net_device *dev)
>  			}
>  	}
>  }
> +
> +/*** "Dmabuf devmem memory provider" ***/
> +
> +int mp_dmabuf_devmem_init(struct page_pool *pool)
> +{
> +	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> +
> +	if (!binding)
> +		return -EINVAL;
> +
> +	if (!pool->dma_map)
> +		return -EOPNOTSUPP;
> +
> +	if (pool->dma_sync)
> +		return -EOPNOTSUPP;
> +
> +	if (pool->p.order != 0)
> +		return -E2BIG;
> +
> +	net_devmem_dmabuf_binding_get(binding);
> +	return 0;
> +}
> +
> +netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, gfp_t gfp)
> +{
> +	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> +	netmem_ref netmem;
> +	struct net_iov *niov;

nit: reverse xmas tree

> +	niov = net_devmem_alloc_dmabuf(binding);
> +	if (!niov)
> +		return 0;
> +
> +	netmem = net_iov_to_netmem(niov);
> +
> +	page_pool_set_pp_info(pool, netmem);
> +
> +	pool->pages_state_hold_cnt++;
> +	trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
> +	return netmem;
> +}
> +
> +void mp_dmabuf_devmem_destroy(struct page_pool *pool)
> +{
> +	struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
> +
> +	net_devmem_dmabuf_binding_put(binding);
> +}
> +
> +bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
> +{
> +	if (WARN_ON_ONCE(!netmem_is_net_iov(netmem)))
> +		return false;
> +
> +	if (WARN_ON_ONCE(atomic_long_read(netmem_get_pp_ref_count_ref(netmem)) !=
> +		     1))

something needs factoring out here, to make this line shorter, please..
either netmem -> net_iov conversion or at least reading of the ref
count?

> +		return false;
> +
> +	page_pool_clear_pp_info(netmem);
> +
> +	net_devmem_free_dmabuf(netmem_to_net_iov(netmem));
> +
> +	/* We don't want the page pool put_page()ing our net_iovs. */
> +	return false;
> +}
> diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
> index da11720a5983..e217a5838c87 100644
> --- a/net/core/netdev_rx_queue.c
> +++ b/net/core/netdev_rx_queue.c
> @@ -4,8 +4,11 @@
>  #include <net/netdev_queues.h>
>  #include <net/netdev_rx_queue.h>
>  
> +#include "page_pool_priv.h"
> +
>  int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
>  {
> +	struct netdev_rx_queue *rxq = __netif_get_rx_queue(dev, rxq_idx);
>  	void *new_mem, *old_mem;
>  	int err;
>  
> @@ -31,6 +34,10 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
>  	if (err)
>  		goto err_free_old_mem;
>  
> +	err = page_pool_check_memory_provider(dev, rxq);
> +	if (err)
> +		goto err_free_new_queue_mem;
> +
>  	err = dev->queue_mgmt_ops->ndo_queue_stop(dev, old_mem, rxq_idx);
>  	if (err)
>  		goto err_free_new_queue_mem;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 52659db2d765..6e24950f2be4 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -11,6 +11,8 @@
>  #include <linux/slab.h>
>  #include <linux/device.h>
>  
> +#include <net/mp_dmabuf_devmem.h>
> +#include <net/netdev_rx_queue.h>
>  #include <net/page_pool/helpers.h>
>  #include <net/xdp.h>
>  
> @@ -190,6 +192,7 @@ static int page_pool_init(struct page_pool *pool,
>  			  int cpuid)
>  {
>  	unsigned int ring_qsize = 1024; /* Default */
> +	int err;
>  
>  	page_pool_struct_check();
>  
> @@ -271,7 +274,36 @@ static int page_pool_init(struct page_pool *pool,
>  	if (pool->dma_map)
>  		get_device(pool->p.dev);
>  
> +	if (pool->slow.queue &&
> +	    pool->slow.flags & PP_FLAG_ALLOW_UNREADABLE_NETMEM) {
> +		/* We rely on rtnl_lock()ing to make sure netdev_rx_queue
> +		 * configuration doesn't change while we're initializing the

nit: 'the' to next line

> +		 * page_pool.
> +		 */
> +		ASSERT_RTNL();
> +		pool->mp_priv = pool->slow.queue->mp_params.mp_priv;
> +	}

  reply	other threads:[~2024-09-03 21:19 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-31  0:43 [PATCH net-next v24 00/13] Device Memory TCP Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 01/13] netdev: add netdev_rx_queue_restart() Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 02/13] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 03/13] netdev: support binding dma-buf to netdevice Mina Almasry
2024-09-03 20:57   ` Jakub Kicinski
2024-08-31  0:43 ` [PATCH net-next v24 04/13] netdev: netdevice devmem allocator Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 05/13] page_pool: devmem support Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 06/13] memory-provider: dmabuf devmem memory provider Mina Almasry
2024-09-03 21:19   ` Jakub Kicinski [this message]
2024-09-09  0:21     ` Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 07/13] net: support non paged skb frags Mina Almasry
2024-09-03 21:31   ` Jakub Kicinski
2024-08-31  0:43 ` [PATCH net-next v24 08/13] net: add support for skbs with unreadable frags Mina Almasry
2024-09-03 21:40   ` Jakub Kicinski
2024-09-04 15:18     ` Mina Almasry
2024-09-04 15:29       ` Eric Dumazet
2024-08-31  0:43 ` [PATCH net-next v24 09/13] tcp: RX path for devmem TCP Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 10/13] net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 11/13] net: add devmem TCP documentation Mina Almasry
2024-09-03 21:51   ` Jakub Kicinski
2024-08-31  0:43 ` [PATCH net-next v24 12/13] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2024-08-31  0:43 ` [PATCH net-next v24 13/13] netdev: add dmabuf introspection Mina Almasry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240903141948.269e22bb@kernel.org \
    --to=kuba@kernel.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=almasrymina@google.com \
    --cc=andreas@gaisler.com \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=asml.silence@gmail.com \
    --cc=ast@kernel.org \
    --cc=bagasdotme@gmail.com \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=donald.hunter@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=dw@davidwei.uk \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=hch@infradead.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=hramamurthy@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jeroendb@google.com \
    --cc=jgg@ziepe.ca \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kaiyuanz@google.com \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mattst88@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=razor@blackwall.org \
    --cc=richard.henderson@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=shailend@google.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shuah@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    --cc=sumit.semwal@linaro.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.