From: Stanislav Fomichev <stfomichev@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, kuba@kernel.org,
	davem@davemloft.net, razor@blackwall.org, pabeni@redhat.com,
	willemb@google.com, sdf@fomichev.me, john.fastabend@gmail.com,
	martin.lau@kernel.org, jordan@jrife.io,
	maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
	dw@davidwei.uk, toke@redhat.com, yangzhenze@bytedance.com,
	wangdongdong.6@bytedance.com
Subject: Re: [PATCH net-next v3 05/15] net: Proxy net_mp_{open,close}_rxq for mapped queues
Date: Fri, 24 Oct 2025 11:36:49 -0700
Message-ID: <aPvHQYXJ8SGA-lSw@mini-arch>
In-Reply-To: <20251020162355.136118-6-daniel@iogearbox.net>

On 10/20, Daniel Borkmann wrote:
> From: David Wei <dw@davidwei.uk>
> 
> When a process in a container wants to setup a memory provider, it will
> use the virtual netdev and a mapped rxq, and call net_mp_{open,close}_rxq
> to try and restart the queue. At this point, proxy the queue restart on
> the real rxq in the physical netdev.
> 
> For memory providers (io_uring zero-copy rx and devmem), it causes the
> real rxq in the physical netdev to be filled from a memory provider that
> has DMA mapped memory from a process within a container.
> 
> Signed-off-by: David Wei <dw@davidwei.uk>
> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/net/page_pool/memory_provider.h |  4 +-
>  net/core/netdev_rx_queue.c              | 57 +++++++++++++++++--------
>  2 files changed, 41 insertions(+), 20 deletions(-)
> 
> diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
> index ada4f968960a..b6f811c3416b 100644
> --- a/include/net/page_pool/memory_provider.h
> +++ b/include/net/page_pool/memory_provider.h
> @@ -23,12 +23,12 @@ bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
>  void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
>  void net_mp_niov_clear_page_pool(struct net_iov *niov);
>  
> -int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
> +int net_mp_open_rxq(struct net_device *dev, unsigned int rxq_idx,
>  		    struct pp_memory_provider_params *p);
>  int __net_mp_open_rxq(struct net_device *dev, unsigned int rxq_idx,
>  		      const struct pp_memory_provider_params *p,
>  		      struct netlink_ext_ack *extack);
> -void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
> +void net_mp_close_rxq(struct net_device *dev, unsigned int rxq_idx,
>  		      struct pp_memory_provider_params *old_p);
>  void __net_mp_close_rxq(struct net_device *dev, unsigned int rxq_idx,
>  			const struct pp_memory_provider_params *old_p);
> diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
> index 8ee289316c06..b4ff3497e086 100644
> --- a/net/core/netdev_rx_queue.c
> +++ b/net/core/netdev_rx_queue.c
> @@ -170,48 +170,63 @@ int __net_mp_open_rxq(struct net_device *dev, unsigned int rxq_idx,
>  		      struct netlink_ext_ack *extack)
>  {
>  	struct netdev_rx_queue *rxq;
> +	bool needs_unlock = false;
>  	int ret;
>  
>  	if (!netdev_need_ops_lock(dev))
>  		return -EOPNOTSUPP;
> -
>  	if (rxq_idx >= dev->real_num_rx_queues) {
>  		NL_SET_ERR_MSG(extack, "rx queue index out of range");
>  		return -ERANGE;
>  	}
> -	rxq_idx = array_index_nospec(rxq_idx, dev->real_num_rx_queues);
>  
> +	rxq_idx = array_index_nospec(rxq_idx, dev->real_num_rx_queues);
> +	rxq = netif_get_rx_queue_peer_locked(&dev, &rxq_idx, &needs_unlock);
> +	if (!rxq) {
> +		NL_SET_ERR_MSG(extack, "rx queue peered to a virtual netdev");
> +		return -EBUSY;
> +	}
> +	if (!dev->dev.parent) {
> +		NL_SET_ERR_MSG(extack, "rx queue is mapped to a virtual netdev");
> +		ret = -EBUSY;
> +		goto out;
> +	}
>  	if (dev->cfg->hds_config != ETHTOOL_TCP_DATA_SPLIT_ENABLED) {
>  		NL_SET_ERR_MSG(extack, "tcp-data-split is disabled");
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out;
>  	}
>  	if (dev->cfg->hds_thresh) {
>  		NL_SET_ERR_MSG(extack, "hds-thresh is not zero");
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out;
>  	}
>  	if (dev_xdp_prog_count(dev)) {
>  		NL_SET_ERR_MSG(extack, "unable to custom memory provider to device with XDP program attached");
> -		return -EEXIST;
> +		ret = -EEXIST;
> +		goto out;
>  	}
> -
> -	rxq = __netif_get_rx_queue(dev, rxq_idx);
>  	if (rxq->mp_params.mp_ops) {
>  		NL_SET_ERR_MSG(extack, "designated queue already memory provider bound");
> -		return -EEXIST;
> +		ret = -EEXIST;
> +		goto out;
>  	}
>  #ifdef CONFIG_XDP_SOCKETS
>  	if (rxq->pool) {
>  		NL_SET_ERR_MSG(extack, "designated queue already in use by AF_XDP");
> -		return -EBUSY;
> +		ret = -EBUSY;
> +		goto out;
>  	}
>  #endif
> -
>  	rxq->mp_params = *p;
>  	ret = netdev_rx_queue_restart(dev, rxq_idx);
>  	if (ret) {
>  		rxq->mp_params.mp_ops = NULL;
>  		rxq->mp_params.mp_priv = NULL;
>  	}
> +out:
> +	if (needs_unlock)
> +		netdev_unlock(dev);

Can we do something better than the needs_unlock flag? Maybe something like
the following?

static void netif_put_rx_queue_peer_locked(struct net_device *orig_dev,
					   struct net_device *dev)
{
	if (orig_dev != dev)
		netdev_unlock(dev);
}

Then we can do:

orig_dev = dev;
rxq = netif_get_rx_queue_peer_locked(&dev, &rxq_idx);
...
netif_put_rx_queue_peer_locked(orig_dev, dev);
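
To make the get/put pairing above concrete, here is a rough userspace
sketch, not kernel code. Everything prefixed with mock_ is hypothetical:
struct mock_netdev stands in for struct net_device, a plain counter stands
in for the netdev instance lock, and the mapping is simplified to one peer
per device rather than one per queue. It only illustrates that comparing
orig_dev against dev is enough to decide whether the peer lock was taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; not the kernel type. */
struct mock_netdev {
	const char *name;
	int lock_depth;			/* counter standing in for the instance lock */
	struct mock_netdev *peer;	/* physical device behind a virtual one, or NULL */
	unsigned int peer_rxq;		/* rxq index on the peer (simplified: one per device) */
};

static void mock_netdev_lock(struct mock_netdev *dev)
{
	dev->lock_depth++;
}

static void mock_netdev_unlock(struct mock_netdev *dev)
{
	assert(dev->lock_depth > 0);	/* catch unbalanced unlocks */
	dev->lock_depth--;
}

/*
 * Assumed "get" behaviour: if the device is mapped to a peer, lock the
 * peer and redirect *dev and *rxq_idx to it. Otherwise leave both alone.
 */
static void mock_get_rxq_peer_locked(struct mock_netdev **dev,
				     unsigned int *rxq_idx)
{
	struct mock_netdev *peer = (*dev)->peer;

	if (!peer)
		return;
	mock_netdev_lock(peer);
	*rxq_idx = (*dev)->peer_rxq;
	*dev = peer;
}

/* The "put" side: unlock only if "get" switched to a different device. */
static void mock_put_rxq_peer_locked(struct mock_netdev *orig_dev,
				     struct mock_netdev *dev)
{
	if (orig_dev != dev)
		mock_netdev_unlock(dev);
}

/* Caller shape: remember the original device, get, work, put. */
static int mock_open_rxq(struct mock_netdev *dev, unsigned int rxq_idx,
			 unsigned int *resolved_idx)
{
	struct mock_netdev *orig_dev = dev;

	mock_get_rxq_peer_locked(&dev, &rxq_idx);
	*resolved_idx = rxq_idx;	/* checks and queue restart would go here */
	mock_put_rxq_peer_locked(orig_dev, dev);
	return 0;
}
```

With this shape every exit path after the "get" can funnel through a single
"put" call, and no extra bool has to be threaded through the caller.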
