All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Mina Almasry <almasrymina@google.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Donald Hunter" <donald.hunter@gmail.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
	"Matt Turner" <mattst88@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>,
	"Helge Deller" <deller@gmx.de>,
	"Andreas Larsson" <andreas@gaisler.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	"Steffen Klassert" <steffen.klassert@secunet.com>,
	"Herbert Xu" <herbert@gondor.apana.org.au>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Bagas Sanjaya" <bagasdotme@gmail.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Nikolay Aleksandrov" <razor@blackwall.org>,
	"Pavel Begunkov" <asml.silence@gmail.com>,
	"David Wei" <dw@davidwei.uk>, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Yunsheng Lin" <linyunsheng@huawei.com>,
	"Shailend Chand" <shailend@google.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [PATCH net-next v15 03/14] netdev: support binding dma-buf to netdevice
Date: Tue, 2 Jul 2024 18:09:08 -0700	[thread overview]
Message-ID: <20240702180908.0eccf78f@kernel.org> (raw)
In-Reply-To: <20240628003253.1694510-4-almasrymina@google.com>

On Fri, 28 Jun 2024 00:32:40 +0000 Mina Almasry wrote:
> +/* Protected by rtnl_lock() */
> +static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
> +
> +void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
> +{
> +	struct netdev_rx_queue *rxq;
> +	unsigned long xa_idx;
> +	unsigned int rxq_idx;
> +
> +	if (!binding)
> +		return;

nit: I don't see how it can happen, no defensive programming, please

> +	if (binding->list.next)
> +		list_del(&binding->list);
> +
> +	xa_for_each(&binding->bound_rxq_list, xa_idx, rxq) {

nit: s/bound_rxq_list/bound_rxqs/ ? it's not a list

> +		if (rxq->mp_params.mp_priv == binding) {
> +			/* We hold the rtnl_lock while binding/unbinding
> +			 * dma-buf, so we can't race with another thread that
> +			 * is also modifying this value. However, the page_pool
> +			 * may read this config while it's creating its
> +			 * rx-queues. WRITE_ONCE() here to match the
> +			 * READ_ONCE() in the page_pool.
> +			 */
> +			WRITE_ONCE(rxq->mp_params.mp_priv, NULL);

Is this really sufficient in terms of locking? @binding is not
RCU-protected and neither is the reader guaranteed to be in 
an RCU critical section. Actually the "reader" tries to take a ref 
and use this struct so it's not even a pure reader.

Let's add a lock or use one of the existing locks

Or, perhaps time to add a mutex to struct net_device

> +			rxq_idx = get_netdev_rx_queue_index(rxq);
> +
> +			netdev_rx_queue_restart(binding->dev, rxq_idx);
> +		}
> +	}
> +
> +	xa_erase(&net_devmem_dmabuf_bindings, binding->id);
> +
> +	net_devmem_dmabuf_binding_put(binding);
> +}
> +
> +int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> +				    struct net_devmem_dmabuf_binding *binding)
> +{
> +	struct netdev_rx_queue *rxq;
> +	u32 xa_idx;
> +	int err;
> +
> +	if (rxq_idx >= dev->num_rx_queues)
> +		return -ERANGE;
> +
> +	rxq = __netif_get_rx_queue(dev, rxq_idx);
> +	if (rxq->mp_params.mp_priv)
> +		return -EEXIST;

Makes me wonder - do we need an API to unbind or we assume
application will only have one binding per socket and close 
it every time? I guess that's fine for future extension.

> +	err = xa_alloc(&binding->bound_rxq_list, &xa_idx, rxq, xa_limit_32b,
> +		       GFP_KERNEL);
> +	if (err)
> +		return err;
> +
> +	/* We hold the rtnl_lock while binding/unbinding dma-buf, so we can't
> +	 * race with another thread that is also modifying this value. However,
> +	 * the driver may read this config while it's creating its * rx-queues.
> +	 * WRITE_ONCE() here to match the READ_ONCE() in the driver.
> +	 */
> +	WRITE_ONCE(rxq->mp_params.mp_priv, binding);
> +
> +	err = netdev_rx_queue_restart(dev, rxq_idx);
> +	if (err)
> +		goto err_xa_erase;
> +
> +	return 0;
> +
> +err_xa_erase:
> +	WRITE_ONCE(rxq->mp_params.mp_priv, NULL);
> +	xa_erase(&binding->bound_rxq_list, xa_idx);
> +
> +	return err;
> +}
> +
> +int net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
> +			   struct net_devmem_dmabuf_binding **out)
> +{
> +	struct net_devmem_dmabuf_binding *binding;
> +	static u32 id_alloc_next;
> +	struct scatterlist *sg;
> +	struct dma_buf *dmabuf;
> +	unsigned int sg_idx, i;
> +	unsigned long virtual;
> +	int err;
> +
> +	dmabuf = dma_buf_get(dmabuf_fd);
> +	if (IS_ERR(dmabuf))
> +		return -EBADFD;

nit: I think error pointers are nicer than **out parameters :(
     you can ERR_CAST() all the DMABUF errors

> +	binding = kzalloc_node(sizeof(*binding), GFP_KERNEL,
> +			       dev_to_node(&dev->dev));
> +	if (!binding) {
> +		err = -ENOMEM;
> +		goto err_put_dmabuf;
> +	}
> +
> +	binding->dev = dev;
> +
> +	err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id,
> +			      binding, xa_limit_32b, &id_alloc_next,
> +			      GFP_KERNEL);
> +	if (err < 0)
> +		goto err_free_binding;
> +
> +	xa_init_flags(&binding->bound_rxq_list, XA_FLAGS_ALLOC);
> +
> +	refcount_set(&binding->ref, 1);
> +
> +	binding->dmabuf = dmabuf;
> +
> +	binding->attachment = dma_buf_attach(binding->dmabuf, dev->dev.parent);
> +	if (IS_ERR(binding->attachment)) {
> +		err = PTR_ERR(binding->attachment);
> +		goto err_free_id;
> +	}

> -/* Stub */
>  int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
>  {
> -	return 0;
> +	struct nlattr *tb[ARRAY_SIZE(netdev_queue_dmabuf_nl_policy)];
> +	struct net_devmem_dmabuf_binding *out_binding;
> +	struct list_head *sock_binding_list;
> +	u32 ifindex, dmabuf_fd, rxq_idx;
> +	struct net_device *netdev;
> +	struct sk_buff *rsp;
> +	struct nlattr *attr;
> +	int rem, err = 0;
> +	void *hdr;
> +
> +	if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_DEV_IFINDEX) ||
> +	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_BIND_DMABUF_DMABUF_FD) ||
> +	    GENL_REQ_ATTR_CHECK(info, NETDEV_A_BIND_DMABUF_QUEUES))
> +		return -EINVAL;
> +
> +	ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]);
> +	dmabuf_fd = nla_get_u32(info->attrs[NETDEV_A_BIND_DMABUF_DMABUF_FD]);
> +
> +	rtnl_lock();
> +
> +	netdev = __dev_get_by_index(genl_info_net(info), ifindex);
> +	if (!netdev) {

 || !netif_device_present(netdev)

> +		err = -ENODEV;
> +		goto err_unlock;
> +	}
> +
> +	err = net_devmem_bind_dmabuf(netdev, dmabuf_fd, &out_binding);
> +	if (err)
> +		goto err_unlock;
> +
> +	nla_for_each_attr(attr, genlmsg_data(info->genlhdr),
> +			  genlmsg_len(info->genlhdr), rem) {
> +
> +		if (nla_type(attr) != NETDEV_A_BIND_DMABUF_QUEUES)
> +			continue;

nit: nla_for_each_attr_type()

> +		err = nla_parse_nested(
> +			tb, ARRAY_SIZE(netdev_queue_dmabuf_nl_policy) - 1, attr,
> +			netdev_queue_dmabuf_nl_policy, info->extack);
> +		if (err < 0)
> +			goto err_unbind;
> +
> +		rxq_idx = nla_get_u32(tb[NETDEV_A_QUEUE_DMABUF_IDX]);
> +
> +		err = net_devmem_bind_dmabuf_to_queue(netdev, rxq_idx,
> +						      out_binding);
> +		if (err)
> +			goto err_unbind;
> +	}
> +
> +	sock_binding_list = genl_sk_priv_get(&netdev_nl_family,
> +					     NETLINK_CB(skb).sk);
> +	if (IS_ERR(sock_binding_list)) {
> +		err = PTR_ERR(sock_binding_list);
> +		goto err_unbind;
> +	}
> +
> +	list_add(&out_binding->list, sock_binding_list);
> +
> +	rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!rsp) {
> +		err = -ENOMEM;
> +		goto err_unbind;
> +	}
> +
> +	hdr = genlmsg_iput(rsp, info);
> +	if (!hdr) {
> +		err = -EMSGSIZE;
> +		goto err_genlmsg_free;
> +	}

I'd move genl_sk_priv_get(), genlmsg_new() and genlmsg_iput() before we
take rtnl_lock(), but I admit it's a bit late for this sort of
feedback.. :)

> +	nla_put_u32(rsp, NETDEV_A_BIND_DMABUF_DMABUF_ID, out_binding->id);
> +	genlmsg_end(rsp, hdr);
> +
> +	rtnl_unlock();
> +
> +	return genlmsg_reply(rsp, info);
> +
> +err_genlmsg_free:
> +	nlmsg_free(rsp);
> +err_unbind:
> +	net_devmem_unbind_dmabuf(out_binding);
> +err_unlock:
> +	rtnl_unlock();
> +	return err;
>  }

  parent reply	other threads:[~2024-07-03  1:09 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-28  0:32 [PATCH net-next v15 00/14] Device Memory TCP Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 01/14] netdev: add netdev_rx_queue_restart() Mina Almasry
2024-07-03  0:01   ` Jakub Kicinski
2024-06-28  0:32 ` [PATCH net-next v15 02/14] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2024-06-28 10:04   ` Donald Hunter
2024-07-01 19:04     ` Mina Almasry
2024-07-02  9:33       ` Donald Hunter
2024-07-03  0:19   ` Jakub Kicinski
2024-06-28  0:32 ` [PATCH net-next v15 03/14] netdev: support binding dma-buf to netdevice Mina Almasry
2024-07-02 11:08   ` Paolo Abeni
2024-07-02 14:17     ` Paolo Abeni
2024-07-03  1:09   ` Jakub Kicinski [this message]
2024-07-03 16:56     ` Mina Almasry
2024-07-03 18:31       ` Jakub Kicinski
2024-07-04 15:30   ` Taehee Yoo
2024-07-04 17:56   ` Taehee Yoo
2024-07-08 20:08     ` Mina Almasry
2024-07-09 15:37       ` Taehee Yoo
2024-07-09 17:44         ` Mina Almasry
2024-07-23 21:49         ` Mina Almasry
2024-07-24 14:11           ` Taehee Yoo
2024-06-28  0:32 ` [PATCH net-next v15 04/14] netdev: netdevice devmem allocator Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 05/14] page_pool: convert to use netmem Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 06/14] page_pool: devmem support Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 07/14] memory-provider: dmabuf devmem memory provider Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 08/14] net: support non paged skb frags Mina Almasry
2024-07-02 13:14   ` Eric Dumazet
2024-06-28  0:32 ` [PATCH net-next v15 09/14] net: add support for skbs with unreadable frags Mina Almasry
2024-07-02 14:06   ` Eric Dumazet
2024-06-28  0:32 ` [PATCH net-next v15 10/14] tcp: RX path for devmem TCP Mina Almasry
2024-07-02 14:16   ` Eric Dumazet
2024-07-02 14:51   ` Pavel Begunkov
2024-07-02 15:25   ` Arnd Bergmann
2024-07-02 18:14     ` Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 11/14] net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags Mina Almasry
2024-07-02 14:30   ` Eric Dumazet
2024-07-02 14:56   ` Pavel Begunkov
2024-07-02 15:01   ` David Ahern
2024-06-28  0:32 ` [PATCH net-next v15 12/14] net: add devmem TCP documentation Mina Almasry
2024-06-28 10:09   ` Donald Hunter
2024-07-01 19:00     ` Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 13/14] tools: net: package libynl for use in selftests Mina Almasry
2024-06-28  0:32 ` [PATCH net-next v15 14/14] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2024-07-04  7:59   ` Taehee Yoo
2024-06-28  6:45 ` [PATCH net-next v15 00/14] Device Memory TCP Mina Almasry
2024-07-03  2:02 ` Jakub Kicinski
2024-07-03  4:20 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240702180908.0eccf78f@kernel.org \
    --to=kuba@kernel.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=almasrymina@google.com \
    --cc=andreas@gaisler.com \
    --cc=andrii@kernel.org \
    --cc=arnd@arndb.de \
    --cc=asml.silence@gmail.com \
    --cc=ast@kernel.org \
    --cc=bagasdotme@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=christian.koenig@amd.com \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=deller@gmx.de \
    --cc=donald.hunter@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=dsahern@kernel.org \
    --cc=dw@davidwei.uk \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=hch@infradead.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=hramamurthy@google.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jeroendb@google.com \
    --cc=jgg@ziepe.ca \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kaiyuanz@google.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=martin.lau@linux.dev \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mattst88@gmail.com \
    --cc=mhiramat@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pkaligineedi@google.com \
    --cc=razor@blackwall.org \
    --cc=richard.henderson@linaro.org \
    --cc=rostedt@goodmis.org \
    --cc=sdf@fomichev.me \
    --cc=shailend@google.com \
    --cc=shakeel.butt@linux.dev \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    --cc=sumit.semwal@linaro.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.