From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bobby Eshleman
Date: Mon, 11 May 2026 18:17:57 -0700
Subject: [PATCH net-next v4 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Message-Id: <20260511-tcp-dm-netkit-v4-3-841b78b99d74@meta.com>
References: <20260511-tcp-dm-netkit-v4-0-841b78b99d74@meta.com>
In-Reply-To: <20260511-tcp-dm-netkit-v4-0-841b78b99d74@meta.com>
To: Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski,
 Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
 Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
 Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
 Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
 kernel-team@meta.com, Daniel Borkmann, Nikolay Aleksandrov
Cc: dw@davidwei.uk, sdf.kernel@gmail.com, mohsin.bashr@gmail.com,
 willemb@google.com, jiang.kun2@zte.com.cn, xu.xin16@zte.com.cn,
 wang.yaxin@zte.com.cn, netdev@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Stanislav Fomichev, Mina Almasry,
 Bobby Eshleman
X-Mailer: b4 0.14.3

When a netkit virtual device leases queues from a physical NIC, devmem
TX bindings created on the netkit device must still result in the
dmabuf being mapped for DMA by the physical device. This patch
accomplishes that by teaching the bind handler to search for the
underlying DMA-capable device, looking it up via the leased RX queues.

The function netdev_find_netmem_tx_dev(), used for finding the
underlying DMA-capable device, can be extended to support other
non-netkit NETMEM_TX_NO_DMA devices in the future if needed.

Additionally, this patch extends validate_xmit_unreadable_skb() to
support the netkit case, where the skb is validated twice: once on the
netkit guest device and again on the physical NIC after a BPF redirect
or IP forwarding.
Acked-by: Stanislav Fomichev
Signed-off-by: Bobby Eshleman
---
Changes in v4:
- Fold the `NETMEM_TX_NO_DMA` check into `validate_xmit_unreadable_skb()`
  (Stan, Jakub)
- Convert `binding->vdev` to a void * opaque cookie with comment (Jakub)
Changes in v3:
- Fix validate_xmit_unreadable_skb() bug for non-devmem unreadable niovs
  (should not be dropped)
- Major simplification of validate_xmit_unreadable_skb()
- Fix prematurely released lock in bind-tx handler (Jakub)
Changes in v2:
- Check the netmem_tx mode in validate_xmit_unreadable_skb() before
  inspecting frags (Jakub)
- Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev
  to fix lockdep (Sashiko)
---
 net/core/dev.c         |  3 ++-
 net/core/devmem.c      |  6 +++--
 net/core/devmem.h      | 10 ++++++--
 net/core/netdev-genl.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fbe4c328a367..e1dc3ec88fe2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3993,7 +3993,8 @@ static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
 	struct skb_shared_info *shinfo;
 	struct net_iov *niov;
 
-	if (likely(skb_frags_readable(skb)))
+	if (likely(skb_frags_readable(skb) ||
+		   dev->netmem_tx == NETMEM_TX_NO_DMA))
 		goto out;
 
 	if (dev->netmem_tx == NETMEM_TX_NONE)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index cde4c89bc146..1c67e7524246 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -181,7 +181,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 }
 
 struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -212,6 +212,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	}
 
 	binding->dev = dev;
+	binding->vdev = vdev;
 	xa_init_flags(&binding->bound_rxqs, XA_FLAGS_ALLOC);
 
 	err = percpu_ref_init(&binding->ref,
@@ -397,7 +398,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
 	 */
 	dst_dev = dst_dev_rcu(dst);
 	if (unlikely(!dst_dev) ||
-	    unlikely(dst_dev != READ_ONCE(binding->dev))) {
+	    unlikely(dst_dev != READ_ONCE(binding->dev) &&
+		     dst_dev != READ_ONCE(binding->vdev))) {
 		err = -ENODEV;
 		goto out_unlock;
 	}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 1c5c18581fcb..3852a56036cb 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -19,7 +19,13 @@ struct net_devmem_dmabuf_binding {
 	struct dma_buf *dmabuf;
 	struct dma_buf_attachment *attachment;
 	struct sg_table *sgt;
+	/* Physical NIC that does the actual DMA for this binding. */
 	struct net_device *dev;
+	/* Opaque cookie identifying the virtual device (e.g. netkit) the user
+	 * called bind-tx on. Used only for pointer comparison. Never
+	 * dereferenced.
+	 */
+	void *vdev;
 	struct gen_pool *chunk_pool;
 	/* Protect dev */
 	struct mutex lock;
@@ -84,7 +90,7 @@ struct dmabuf_genpool_chunk_owner {
 void __net_devmem_dmabuf_binding_free(struct work_struct *wq);
 
 struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -165,7 +171,7 @@ static inline void net_devmem_put_net_iov(struct net_iov *niov)
 }
 
 static inline struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, void *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 4d2c49371cdb..b4d48f3672a5 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -1077,7 +1077,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_rxq_bitmap;
 	}
 
-	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
+	binding = net_devmem_bind_dmabuf(netdev, NULL, dma_dev, DMA_FROM_DEVICE,
 					 dmabuf_fd, priv, info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
@@ -1119,9 +1119,43 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	return err;
 }
 
+/* Find the DMA-capable device for a netmem TX binding.
+ *
+ * For NETMEM_TX_DMA devices, return the device itself.
+ * For NETMEM_TX_NO_DMA devices, walk leased RX queues to find the underlying
+ * physical device and return it.
+ */
+static struct net_device *
+netdev_find_netmem_tx_dev(struct net_device *dev)
+{
+	struct netdev_rx_queue *lease_rxq;
+	struct net_device *phys_dev;
+	int i;
+
+	if (dev->netmem_tx == NETMEM_TX_DMA)
+		return dev;
+
+	if (dev->netmem_tx != NETMEM_TX_NO_DMA)
+		return NULL;
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		lease_rxq = READ_ONCE(__netif_get_rx_queue(dev, i)->lease);
+		if (!lease_rxq)
+			continue;
+
+		phys_dev = lease_rxq->dev;
+		if (netif_device_present(phys_dev) &&
+		    phys_dev->netmem_tx == NETMEM_TX_DMA)
+			return phys_dev;
+	}
+
+	return NULL;
+}
+
 int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net_devmem_dmabuf_binding *binding;
+	struct net_device *bind_dev;
 	struct netdev_nl_sock *priv;
 	struct net_device *netdev;
 	struct device *dma_dev;
@@ -1171,22 +1205,41 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock_netdev;
 	}
 
-	dma_dev = netdev_queue_get_dma_dev(netdev, 0, NETDEV_QUEUE_TYPE_TX);
-	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+	bind_dev = netdev_find_netmem_tx_dev(netdev);
+	if (!bind_dev) {
+		err = -EOPNOTSUPP;
+		NL_SET_ERR_MSG(info->extack,
+			       "No DMA-capable device found for netmem TX");
+		goto err_unlock_netdev;
+	}
+
+	if (bind_dev != netdev)
+		netdev_lock(bind_dev);
+
+	dma_dev = netdev_queue_get_dma_dev(bind_dev, 0, NETDEV_QUEUE_TYPE_TX);
+
+	binding = net_devmem_bind_dmabuf(bind_dev,
+					 bind_dev != netdev ? netdev : NULL,
+					 dma_dev, DMA_TO_DEVICE, dmabuf_fd,
+					 priv, info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
-		goto err_unlock_netdev;
+		goto err_unlock_bind_dev;
 	}
 
 	nla_put_u32(rsp, NETDEV_A_DMABUF_ID, binding->id);
 	genlmsg_end(rsp, hdr);
 
+	if (bind_dev != netdev)
+		netdev_unlock(bind_dev);
 	netdev_unlock(netdev);
 	mutex_unlock(&priv->lock);
 
 	return genlmsg_reply(rsp, info);
 
+err_unlock_bind_dev:
+	if (bind_dev != netdev)
+		netdev_unlock(bind_dev);
 err_unlock_netdev:
 	netdev_unlock(netdev);
 err_unlock_sock:

-- 
2.53.0-Meta