From: Stanislav Fomichev <stfomichev@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, kuba@kernel.org,
davem@davemloft.net, razor@blackwall.org, pabeni@redhat.com,
willemb@google.com, sdf@fomichev.me, john.fastabend@gmail.com,
martin.lau@kernel.org, jordan@jrife.io,
maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
dw@davidwei.uk, toke@redhat.com, yangzhenze@bytedance.com,
wangdongdong.6@bytedance.com
Subject: Re: [PATCH net-next v3 02/15] net: Implement netdev_nl_bind_queue_doit
Date: Fri, 24 Oct 2025 11:20:11 -0700
Message-ID: <aPvDW0o89kmtGFfH@mini-arch>
In-Reply-To: <20251020162355.136118-3-daniel@iogearbox.net>
On 10/20, Daniel Borkmann wrote:
> From: David Wei <dw@davidwei.uk>
>
> Implement netdev_nl_bind_queue_doit() that creates an rx queue in a
> virtual netdev and then binds it to an rxq in a real netdev to create
> a queue pair.
>
> Example with ynl client:
>
> # ./pyynl/cli.py \
> --spec ~/netlink/specs/netdev.yaml \
> --do bind-queue \
> --json '{"src-ifindex": 4, "src-queue-id": 15, "dst-ifindex": 8, "queue-type": "rx"}'
> {'dst-queue-id': 1}
>
> Note that the netdevice locking order is always from the virtual to
> the physical device.
>
> Signed-off-by: David Wei <dw@davidwei.uk>
> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
> include/net/netdev_queues.h | 5 ++
> include/net/netdev_rx_queue.h | 36 ++++++++-
> net/core/netdev-genl.c | 141 +++++++++++++++++++++++++++++++++-
> net/core/netdev_rx_queue.c | 61 +++++++++++++++
> 4 files changed, 240 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
> index cd00e0406cf4..286d5edce07d 100644
> --- a/include/net/netdev_queues.h
> +++ b/include/net/netdev_queues.h
> @@ -130,6 +130,10 @@ void netdev_stat_queue_sum(struct net_device *netdev,
> * @ndo_queue_get_dma_dev: Get dma device for zero-copy operations to be used
> * for this queue. Return NULL on error.
> *
> + * @ndo_queue_create: Create a new RX queue which can be bound to another queue.
> + * Ops on this queue are redirected to the peer queue e.g.
> + * when opening a memory provider.
> + *
> * Note that @ndo_queue_mem_alloc and @ndo_queue_mem_free may be called while
> * the interface is closed. @ndo_queue_start and @ndo_queue_stop will only
> * be called for an interface which is open.
> @@ -149,6 +153,7 @@ struct netdev_queue_mgmt_ops {
> int idx);
> struct device * (*ndo_queue_get_dma_dev)(struct net_device *dev,
> int idx);
> + int (*ndo_queue_create)(struct net_device *dev);
> };
>
> bool netif_rxq_has_unreadable_mp(struct net_device *dev, int idx);
> diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
> index 8cdcd138b33f..db3ef94c0744 100644
> --- a/include/net/netdev_rx_queue.h
> +++ b/include/net/netdev_rx_queue.h
> @@ -28,6 +28,7 @@ struct netdev_rx_queue {
> #endif
> struct napi_struct *napi;
> struct pp_memory_provider_params mp_params;
> + struct netdev_rx_queue *peer;
> } ____cacheline_aligned_in_smp;
>
> /*
> @@ -56,6 +57,37 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
> return index;
> }
>
> -int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
> +static inline void __netdev_rx_queue_peer(struct netdev_rx_queue *src_rxq,
> + struct netdev_rx_queue *dst_rxq)
> +{
> + src_rxq->peer = dst_rxq;
> + dst_rxq->peer = src_rxq;
> +}
>
> -#endif
> +static inline void __netdev_rx_queue_unpeer(struct netdev_rx_queue *src_rxq,
> + struct netdev_rx_queue *dst_rxq)
> +{
> + src_rxq->peer = NULL;
> + dst_rxq->peer = NULL;
> +}
> +
> +static inline bool netdev_rx_queue_peered(struct net_device *dev,
> + u16 queue_id)
> +{
> + if (queue_id < dev->real_num_rx_queues)
> + return dev->_rx[queue_id].peer;
> + return false;
> +}
> +
> +void netdev_rx_queue_peer(struct net_device *src_dev,
> + struct netdev_rx_queue *src_rxq,
> + struct netdev_rx_queue *dst_rxq);
> +void netdev_rx_queue_unpeer(struct net_device *src_dev,
> + struct netdev_rx_queue *src_rxq,
> + struct netdev_rx_queue *dst_rxq);
> +int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
> +struct netdev_rx_queue *
> +netif_get_rx_queue_peer_locked(struct net_device **dev,
> + unsigned int *rxq_idx,
> + bool *needs_unlock);
> +#endif /* _LINUX_NETDEV_RX_QUEUE_H */
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index ce1018ea390f..579469abac8c 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -1122,7 +1122,146 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
>
> int netdev_nl_bind_queue_doit(struct sk_buff *skb, struct genl_info *info)
> {
> - return -EOPNOTSUPP;
> + u32 src_ifidx, src_qid, dst_ifidx, dst_qid, q_type;
> + struct netdev_rx_queue *src_rxq, *dst_rxq, *tmp_rxq;
> + struct net_device *src_dev, *dst_dev;
> + struct sk_buff *rsp;
> + int err = 0;
> + void *hdr;
> +
> + if (GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_PAIR_QUEUE_TYPE) ||
> + GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_PAIR_SRC_IFINDEX) ||
> + GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_PAIR_SRC_QUEUE_ID) ||
> + GENL_REQ_ATTR_CHECK(info, NETDEV_A_QUEUE_PAIR_DST_IFINDEX))
> + return -EINVAL;
> +
> + src_ifidx = nla_get_u32(info->attrs[NETDEV_A_QUEUE_PAIR_SRC_IFINDEX]);
> + src_qid = nla_get_u32(info->attrs[NETDEV_A_QUEUE_PAIR_SRC_QUEUE_ID]);
> + dst_ifidx = nla_get_u32(info->attrs[NETDEV_A_QUEUE_PAIR_DST_IFINDEX]);
> + q_type = nla_get_u32(info->attrs[NETDEV_A_QUEUE_PAIR_QUEUE_TYPE]);
> +
> + if (q_type != NETDEV_QUEUE_TYPE_RX) {
> + NL_SET_ERR_MSG(info->extack, "Only binding of RX queue supported");
> + return -EOPNOTSUPP;
> + }
> + if (dst_ifidx == src_ifidx) {
> + NL_SET_ERR_MSG(info->extack,
> + "Destination driver cannot be same as source driver");
> + return -EOPNOTSUPP;
> + }
> +
> + rsp = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> + if (!rsp)
> + return -ENOMEM;
> +
> + hdr = genlmsg_iput(rsp, info);
> + if (!hdr) {
> + err = -EMSGSIZE;
> + goto err_genlmsg_free;
> + }
[..]
> + /* Locking order is always from the virtual to the physical device
> + * since this is also the same order when applications open the
> + * memory provider later on.
> + */
> + dst_dev = netdev_get_by_index_lock(genl_info_net(info), dst_ifidx);
> + if (!dst_dev) {
> + err = -ENODEV;
> + goto err_genlmsg_free;
> + }
...
> + src_dev = netdev_get_by_index_lock(genl_info_net(info), src_ifidx);
> + if (!src_dev) {
> + err = -ENODEV;
> + goto err_unlock_dst_dev;
> + }
But isn't the above susceptible to an ABBA deadlock triggered from
userspace? I could issue two concurrent requests, the second one with
dst_dev and src_dev swapped. Or do we assume that the swapped case bails
out earlier based on some other condition?