* [PATCH net-next v2 0/6] net: devmem: support devmem with netkit devices
From: Bobby Eshleman @ 2026-05-05 0:27 UTC
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan
Cc: netdev, linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
Stanislav Fomichev, Mina Almasry, Bobby Eshleman
This series enables TCP devmem TX through netkit devices.
Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc.). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or IP
forwarding) to the physical NIC for DMA.
Two things in the existing devmem TX path prevent this from working:
1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
forward a dmabuf-backed (unreadable) skb. This protects skbs from
landing on devices that don't have the IOMMU mappings for the backing
dmabuf or that don't speak netmem. Netkit, however, neither performs
DMA nor reads unreadable skb pages, so it doesn't break netmem (it is
pure skb routing and redirection). It is functionally capable of
routing unreadable skbs, but the TX validation path has no way to
distinguish a device that will actually DMA the skb from a device
(like netkit) that does not DMA but also does not break netmem.
2. bind_tx_doit uses the bound device as the DMA device. When the user
binds devmem TX to the netkit guest, the bind handler attempts to
create DMA mappings against netkit, which has no DMA capability and
no IOMMU mappings.
This series solves these problems as follows:
1. Extend netmem_tx to two bits, assigned to one of three values:
NETMEM_TX_NONE - netmem not supported
NETMEM_TX_DMA - netmem supported and performs DMA
NETMEM_TX_NO_DMA - netmem supported, but does not DMA
With these bits, physical devices can set NETMEM_TX_DMA and devices like
netkit can set NETMEM_TX_NO_DMA. The TX validation path ensures that any
DMA-capable netdev exactly matches the bound device, guaranteeing the
correct mapping of the bound dmabuf. The TX validation path also
allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
will not misuse netmem or run into IOMMU faults. After redirection or
routing, when the skb finally makes its way through the stack to a
physical device's TX path, the above NETMEM_TX_DMA check is performed
again to guarantee the device has the appropriate binding/mappings.
2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
finds the phys TX device and binds to that instead. For the netkit
case, if it has been leased a queue from a DMA-capable device
already, then the bind action is performed on the DMA-capable device
instead and the dmabuf is mapped correctly.
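The per-mode TX validation decision described in item 1 can be modelled in
plain user-space C. This is a minimal sketch under stated assumptions, not the
kernel code from this series: struct model_dev and
model_unreadable_skb_allowed() are hypothetical names, and the dmabuf binding
is reduced to a pointer to the device it was bound to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three modes named in this cover letter. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,
	NETMEM_TX_DMA,
	NETMEM_TX_NO_DMA,
};

/* Hypothetical stand-in for struct net_device; only the mode matters here. */
struct model_dev {
	enum netmem_tx_mode netmem_tx;
};

/*
 * Hypothetical helper: returns true if an unreadable skb whose dmabuf is
 * bound to @bound may be transmitted on @dev, false if it must be dropped.
 */
static bool model_unreadable_skb_allowed(const struct model_dev *dev,
					 const struct model_dev *bound)
{
	assert(dev != NULL);

	switch (dev->netmem_tx) {
	case NETMEM_TX_NO_DMA:
		/* Pass-through device: never DMAs, never reads the frags. */
		return true;
	case NETMEM_TX_DMA:
		/* DMA device: must be the one holding the mappings. */
		return dev == bound;
	case NETMEM_TX_NONE:
	default:
		return false;
	}
}
```

In this model a netkit-like device always passes, and the same check applied
later to the physical NIC passes only when that NIC is the bound device.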
---
Changes in v2:
- Squash driver conversion patches (2-5) into patch 1 (Jakub)
- In validate_xmit_unreadable_skb(), check netmem_tx mode before inspecting
frags (Jakub)
- Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
fix lockdep (Sashiko)
- Move require_devmem() into individual test functions so KsftSkipEx propagates up to
ksft_run() (Sashiko)
- Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
- Link to v1:
https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Simon Horman <horms@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
To: Alex Shi <alexs@kernel.org>
To: Yanteng Si <si.yanteng@linux.dev>
To: Dongliang Mu <dzm91@hust.edu.cn>
To: Michael Chan <michael.chan@broadcom.com>
To: Pavan Chebbi <pavan.chebbi@broadcom.com>
To: Joshua Washington <joshwash@google.com>
To: Harshitha Ramamurthy <hramamurthy@google.com>
To: Saeed Mahameed <saeedm@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>
To: Mark Bloch <mbloch@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>
To: Alexander Duyck <alexanderduyck@fb.com>
To: kernel-team@meta.com
To: Daniel Borkmann <daniel@iogearbox.net>
To: Nikolay Aleksandrov <razor@blackwall.org>
To: Shuah Khan <shuah@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Bobby Eshleman (6):
net: add netmem_tx modes that indicate dma capability
net: devmem: support TX over NETMEM_TX_NO_DMA devices
selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
selftests: drv-net: refactor devmem command builders into lib module
selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
selftests: drv-net: add netkit devmem tests
.../networking/net_cachelines/net_device.rst | 2 +-
Documentation/networking/netmem.rst | 8 +-
.../translations/zh_CN/networking/netmem.rst | 7 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/google/gve/gve_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 +-
drivers/net/netkit.c | 1 +
include/linux/netdevice.h | 11 +-
net/core/dev.c | 21 +-
net/core/devmem.c | 6 +-
net/core/devmem.h | 9 +-
net/core/netdev-genl.c | 57 +++++-
tools/testing/selftests/drivers/net/hw/Makefile | 1 +
tools/testing/selftests/drivers/net/hw/devmem.py | 73 +------
.../selftests/drivers/net/hw/lib/py/devmem.py | 222 +++++++++++++++++++++
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 58 +++---
.../testing/selftests/drivers/net/hw/nk_devmem.py | 40 ++++
.../drivers/net/hw/nk_primary_rx_redirect.bpf.c | 41 ++++
tools/testing/selftests/drivers/net/lib/py/env.py | 67 +++++--
20 files changed, 507 insertions(+), 125 deletions(-)
---
base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
change-id: 20260423-tcp-dm-netkit-2bd78b638d30
Best regards,
--
Bobby Eshleman <bobbyeshleman@meta.com>
* [PATCH net-next v2 1/6] net: add netmem_tx modes that indicate dma capability
From: Bobby Eshleman @ 2026-05-05 0:27 UTC
From: Bobby Eshleman <bobbyeshleman@meta.com>
Devices that support netmem TX previously set dev->netmem_tx = true.
This was checked in validate_xmit_unreadable_skb() to drop unreadable
skbs (skbs with dmabuf-backed frags) before they reach drivers that
would mishandle them or devices that would not have the IOMMU mappings
for them.
Some virtual devices like netkit (or ifb) never DMA and never touch frag
contents, as they essentially just forward the skb to another device.
They are unable to forward unreadable skbs, however, because they fail
the TX validation check on dev->netmem_tx. This single-bit flag
doesn't give the TX validator enough information to differentiate
devices that will attempt DMA on the unreadable skb from those that will
simply route it untouched.
This patch fixes the issue by adding a second bit to netmem_tx, so
that drivers can indicate 1) whether they support netmem, and 2) if they
do, whether they are DMA-capable.
Replace the boolean with a 2-bit enum:
NETMEM_TX_NONE - no netmem TX support (drop unreadable skbs)
NETMEM_TX_DMA - full support, device does DMA
NETMEM_TX_NO_DMA - pass-through, device never DMAs
Update drivers to reflect these definitions. NIC drivers use
NETMEM_TX_DMA, and netkit uses NETMEM_TX_NO_DMA.
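The field has to grow because a 1-bit field cannot hold the third enum value.
The following user-space sketch illustrates this. It is an illustration only:
struct model_flags and both helpers are hypothetical, and unsigned int
bitfields are used instead of the kernel's unsigned long for portability.

```c
#include <assert.h>

enum netmem_tx_mode {
	NETMEM_TX_NONE,		/* 0 */
	NETMEM_TX_DMA,		/* 1 */
	NETMEM_TX_NO_DMA,	/* 2 */
};

/* Hypothetical miniature of the flag group, not the real struct net_device. */
struct model_flags {
	unsigned int lltx:1;
	unsigned int old_netmem_tx:1;	/* width before this patch */
	unsigned int netmem_tx:2;	/* width after this patch */
};

/* Store @mode in the old 1-bit field and read it back. */
static unsigned int model_store_1bit(unsigned int mode)
{
	struct model_flags f = { 0 };

	f.old_netmem_tx = mode;	/* value is reduced modulo 2 */
	return f.old_netmem_tx;
}

/* Store @mode in the new 2-bit field and read it back. */
static unsigned int model_store_2bit(unsigned int mode)
{
	struct model_flags f = { 0 };

	assert(mode <= NETMEM_TX_NO_DMA);
	f.netmem_tx = mode;
	return f.netmem_tx;
}
```

With one bit, NETMEM_TX_NO_DMA (2) would read back as NETMEM_TX_NONE (0); with
two bits all three modes survive the round trip.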
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v2:
- Squash driver conversion patches (2-5) into patch 1 (Jakub)
---
Documentation/networking/net_cachelines/net_device.rst | 2 +-
Documentation/networking/netmem.rst | 8 +++++++-
Documentation/translations/zh_CN/networking/netmem.rst | 7 ++++++-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/google/gve/gve_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 +-
drivers/net/netkit.c | 1 +
include/linux/netdevice.h | 11 +++++++++--
9 files changed, 28 insertions(+), 9 deletions(-)
diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index 1c19bb7705df..c85784259544 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -10,7 +10,7 @@ Type Name fastpath_tx_acce
=================================== =========================== =================== =================== ===================================================================================
unsigned_long:32 priv_flags read_mostly __dev_queue_xmit(tx)
unsigned_long:1 lltx read_mostly HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx)
-unsigned long:1 netmem_tx:1; read_mostly
+unsigned long:2 netmem_tx:2; read_mostly
char name[16]
struct netdev_name_node* name_node
struct dev_ifalias* ifalias
diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
index b63aded46337..217869d1108d 100644
--- a/Documentation/networking/netmem.rst
+++ b/Documentation/networking/netmem.rst
@@ -95,4 +95,10 @@ Driver TX Requirements
netdev@, or reach out to the maintainers and/or almasrymina@google.com for
help adding the netmem API.
-2. Driver should declare support by setting `netdev->netmem_tx = true`
+2. Driver should declare support by setting `netdev->netmem_tx` to the
+ appropriate mode:
+
+ - `NETMEM_TX_DMA`: for physical devices that perform DMA.
+
+ - `NETMEM_TX_NO_DMA`: for virtual or passthrough devices that do
+ not DMA, but still support handling of netmem-backed skbs.
diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
index fe351a240f02..320f3eacf51b 100644
--- a/Documentation/translations/zh_CN/networking/netmem.rst
+++ b/Documentation/translations/zh_CN/networking/netmem.rst
@@ -89,4 +89,9 @@ dma-mapping API 去处理。
使用某个还不存在的 netmem API,你可以自行添加并提交到 netdev@,也可以联系维护
人员或者发送邮件至 almasrymina@google.com 寻求帮助。
-2. 驱动程序应通过设置 netdev->netmem_tx = true 来表明自身支持 netmem 功能。
+2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
+
+ - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
+
+ - `NETMEM_TX_NO_DMA`:适用于不执行 DMA 的虚拟或透传设备,但仍支持
+ 处理 netmem 支持的 skb。
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c55874f44ca..ed9c22dc4a5a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -17120,7 +17120,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
if (BNXT_SUPPORTS_QUEUE_API(bp))
dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
- dev->netmem_tx = true;
+ dev->netmem_tx = NETMEM_TX_DMA;
rc = register_netdev(dev);
if (rc)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 424d973c97f2..dd2b8f087163 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2894,7 +2894,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto abort_with_wq;
if (!gve_is_gqi(priv) && !gve_is_qpl(priv))
- dev->netmem_tx = true;
+ dev->netmem_tx = NETMEM_TX_DMA;
err = register_netdev(dev);
if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5a46870c4b74..fc49aae38807 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5924,7 +5924,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
netdev->priv_flags |= IFF_UNICAST_FLT;
- netdev->netmem_tx = true;
+ netdev->netmem_tx = NETMEM_TX_DMA;
netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
mlx5e_set_xdp_feature(priv);
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index c406a3b56b37..138e522ef9b9 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -752,7 +752,7 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
netdev->netdev_ops = &fbnic_netdev_ops;
netdev->stat_ops = &fbnic_stat_ops;
netdev->queue_mgmt_ops = &fbnic_queue_mgmt_ops;
- netdev->netmem_tx = true;
+ netdev->netmem_tx = NETMEM_TX_DMA;
fbnic_set_ethtool_ops(netdev);
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 5e2eecc3165d..0ad6a806d7d5 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -466,6 +466,7 @@ static void netkit_setup(struct net_device *dev)
dev->priv_flags |= IFF_NO_QUEUE;
dev->priv_flags |= IFF_DISABLE_NETPOLL;
dev->lltx = true;
+ dev->netmem_tx = NETMEM_TX_NO_DMA;
dev->netdev_ops = &netkit_netdev_ops;
dev->ethtool_ops = &netkit_ethtool_ops;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0e1e581efc5a..11d68e75eb4f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1788,6 +1788,12 @@ enum netdev_stat_type {
NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
};
+enum netmem_tx_mode {
+ NETMEM_TX_NONE, /* no netmem TX support */
+ NETMEM_TX_DMA, /* DMA-capable netmem TX (real HW) */
+ NETMEM_TX_NO_DMA, /* no DMA, e.g. passthrough for virtual devs */
+};
+
enum netdev_reg_state {
NETREG_UNINITIALIZED = 0,
NETREG_REGISTERED, /* completed register_netdevice */
@@ -1809,7 +1815,8 @@ enum netdev_reg_state {
* @lltx: device supports lockless Tx. Deprecated for real HW
* drivers. Mainly used by logical interfaces, such as
* bonding and tunnels
- * @netmem_tx: device support netmem_tx.
+ * @netmem_tx: device netmem TX mode (NETMEM_TX_NONE, NETMEM_TX_DMA,
+ * or NETMEM_TX_NO_DMA).
*
* @name: This is the first field of the "visible" part of this structure
* (i.e. as seen by users in the "Space.c" file). It is the name
@@ -2132,7 +2139,7 @@ struct net_device {
struct_group(priv_flags_fast,
unsigned long priv_flags:32;
unsigned long lltx:1;
- unsigned long netmem_tx:1;
+ unsigned long netmem_tx:2;
);
const struct net_device_ops *netdev_ops;
const struct header_ops *header_ops;
--
2.52.0
* [PATCH net-next v2 2/6] net: devmem: support TX over NETMEM_TX_NO_DMA devices
From: Bobby Eshleman @ 2026-05-05 0:27 UTC
From: Bobby Eshleman <bobbyeshleman@meta.com>
When a netkit virtual device leases queues from a physical NIC, devmem
TX bindings created on the netkit device must still result in the dmabuf
being mapped for DMA by the physical device. This patch accomplishes
this by teaching the bind handler to find the underlying
DMA-capable device via the leased RX queues. The function
netdev_find_netmem_tx_dev(), used for finding the underlying DMA-capable
device, can be extended to support other non-netkit NETMEM_TX_NO_DMA
devices in the future if needed.
Additionally, this patch extends validate_xmit_unreadable_skb() to
support the netkit case, where the skb is validated twice: once on the
netkit guest device and again on the physical NIC after BPF redirect or
ip forwarding.
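The lookup described above can be sketched as a user-space model. This is a
simplified illustration, not the kernel implementation: struct model_dev,
struct model_rxq and model_find_tx_dev() are hypothetical, and locking,
READ_ONCE() and device-presence checks are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

enum netmem_tx_mode {
	NETMEM_TX_NONE,
	NETMEM_TX_DMA,
	NETMEM_TX_NO_DMA,
};

struct model_dev;

/* Hypothetical RX queue; @lease is the physical queue it was leased from. */
struct model_rxq {
	struct model_dev *dev;		/* device owning this queue */
	struct model_rxq *lease;	/* NULL if the queue is not leased */
};

/* Hypothetical stand-in for struct net_device. */
struct model_dev {
	enum netmem_tx_mode netmem_tx;
	struct model_rxq *rxqs;
	size_t num_rxqs;
};

/* Return the device that should own the DMA mapping, or NULL if none. */
static struct model_dev *model_find_tx_dev(struct model_dev *dev)
{
	size_t i;

	assert(dev != NULL);

	if (dev->netmem_tx == NETMEM_TX_DMA)
		return dev;
	if (dev->netmem_tx != NETMEM_TX_NO_DMA)
		return NULL;

	/* Pass-through device: follow the first lease to a DMA-capable NIC. */
	for (i = 0; i < dev->num_rxqs; i++) {
		struct model_rxq *lease = dev->rxqs[i].lease;

		if (lease && lease->dev->netmem_tx == NETMEM_TX_DMA)
			return lease->dev;
	}
	return NULL;
}
```

A pass-through device with no leased queue yields NULL in this model, which
corresponds to the bind request being rejected.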
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v2:
- In validate_xmit_unreadable_skb(), check netmem_tx mode before
inspecting frags (Jakub)
- Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev !=
netdev to fix lockdep (Sashiko)
---
net/core/dev.c | 21 ++++++++++++-------
net/core/devmem.c | 6 ++++--
net/core/devmem.h | 9 ++++++--
net/core/netdev-genl.c | 57 +++++++++++++++++++++++++++++++++++++++++++++-----
4 files changed, 77 insertions(+), 16 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 06c195906231..74eb4eb170cd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3990,23 +3990,30 @@ static struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb,
static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
struct net_device *dev)
{
+ struct net_devmem_dmabuf_binding *binding;
struct skb_shared_info *shinfo;
struct net_iov *niov;
if (likely(skb_frags_readable(skb)))
goto out;
- if (!dev->netmem_tx)
+ if (dev->netmem_tx == NETMEM_TX_NONE)
goto out_free;
+ if (dev->netmem_tx == NETMEM_TX_NO_DMA)
+ goto out;
+
shinfo = skb_shinfo(skb);
+ if (shinfo->nr_frags == 0)
+ goto out;
- if (shinfo->nr_frags > 0) {
- niov = netmem_to_net_iov(skb_frag_netmem(&shinfo->frags[0]));
- if (net_is_devmem_iov(niov) &&
- READ_ONCE(net_devmem_iov_binding(niov)->dev) != dev)
- goto out_free;
- }
+ niov = netmem_to_net_iov(skb_frag_netmem(&shinfo->frags[0]));
+ if (!net_is_devmem_iov(niov))
+ goto out_free;
+
+ binding = net_devmem_iov_binding(niov);
+ if (READ_ONCE(binding->dev) != dev)
+ goto out_free;
out:
return skb;
diff --git a/net/core/devmem.c b/net/core/devmem.c
index cde4c89bc146..644c286b778f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -181,7 +181,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
}
struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *vdev,
struct device *dma_dev,
enum dma_data_direction direction,
unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -212,6 +212,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
}
binding->dev = dev;
+ binding->vdev = vdev;
xa_init_flags(&binding->bound_rxqs, XA_FLAGS_ALLOC);
err = percpu_ref_init(&binding->ref,
@@ -397,7 +398,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
*/
dst_dev = dst_dev_rcu(dst);
if (unlikely(!dst_dev) ||
- unlikely(dst_dev != READ_ONCE(binding->dev))) {
+ unlikely(dst_dev != READ_ONCE(binding->dev) &&
+ dst_dev != READ_ONCE(binding->vdev))) {
err = -ENODEV;
goto out_unlock;
}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 1c5c18581fcb..f399632b3c4b 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -19,7 +19,12 @@ struct net_devmem_dmabuf_binding {
struct dma_buf *dmabuf;
struct dma_buf_attachment *attachment;
struct sg_table *sgt;
+ /* Physical NIC that does the actual DMA for this binding. */
struct net_device *dev;
+ /* Virtual device (e.g. netkit) the user called bind-tx on. Must be
+ * NETMEM_TX_NO_DMA.
+ */
+ struct net_device *vdev;
struct gen_pool *chunk_pool;
/* Protect dev */
struct mutex lock;
@@ -84,7 +89,7 @@ struct dmabuf_genpool_chunk_owner {
void __net_devmem_dmabuf_binding_free(struct work_struct *wq);
struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *vdev,
struct device *dma_dev,
enum dma_data_direction direction,
unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -165,7 +170,7 @@ static inline void net_devmem_put_net_iov(struct net_iov *niov)
}
static inline struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *vdev,
struct device *dma_dev,
enum dma_data_direction direction,
unsigned int dmabuf_fd,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index b8f6076d8007..0e296c3bb677 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -1077,7 +1077,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_rxq_bitmap;
}
- binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
+ binding = net_devmem_bind_dmabuf(netdev, NULL, dma_dev, DMA_FROM_DEVICE,
dmabuf_fd, priv, info->extack);
if (IS_ERR(binding)) {
err = PTR_ERR(binding);
@@ -1119,9 +1119,42 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
return err;
}
+/* Find the DMA-capable device for netmem TX binding.
+ * For NETMEM_TX_DMA devices, returns the device itself.
+ * For NETMEM_TX_NO_DMA devices (e.g. netkit), walks leased queues
+ * to find the underlying physical device.
+ * Returns NULL if no suitable device is found.
+ */
+static struct net_device *netdev_find_netmem_tx_dev(struct net_device *dev)
+{
+ struct netdev_rx_queue *lease_rxq;
+ struct net_device *phys_dev;
+ int i;
+
+ if (dev->netmem_tx == NETMEM_TX_DMA)
+ return dev;
+
+ if (dev->netmem_tx != NETMEM_TX_NO_DMA)
+ return NULL;
+
+ for (i = 0; i < dev->real_num_rx_queues; i++) {
+ lease_rxq = READ_ONCE(__netif_get_rx_queue(dev, i)->lease);
+ if (!lease_rxq)
+ continue;
+
+ phys_dev = lease_rxq->dev;
+ if (netif_device_present(phys_dev) &&
+ phys_dev->netmem_tx == NETMEM_TX_DMA)
+ return phys_dev;
+ }
+
+ return NULL;
+}
+
int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
{
struct net_devmem_dmabuf_binding *binding;
+ struct net_device *bind_dev;
struct netdev_nl_sock *priv;
struct net_device *netdev;
struct device *dma_dev;
@@ -1164,16 +1197,30 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
goto err_unlock_netdev;
}
- if (!netdev->netmem_tx) {
+ if (netdev->netmem_tx == NETMEM_TX_NONE) {
err = -EOPNOTSUPP;
NL_SET_ERR_MSG(info->extack,
"Driver does not support netmem TX");
goto err_unlock_netdev;
}
- dma_dev = netdev_queue_get_dma_dev(netdev, 0, NETDEV_QUEUE_TYPE_TX);
- binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
- dmabuf_fd, priv, info->extack);
+ bind_dev = netdev_find_netmem_tx_dev(netdev);
+ if (!bind_dev) {
+ err = -EOPNOTSUPP;
+ NL_SET_ERR_MSG(info->extack,
+ "No DMA-capable device found for netmem TX");
+ goto err_unlock_netdev;
+ }
+
+ if (bind_dev != netdev)
+ netdev_lock(bind_dev);
+ dma_dev = netdev_queue_get_dma_dev(bind_dev, 0, NETDEV_QUEUE_TYPE_TX);
+ if (bind_dev != netdev)
+ netdev_unlock(bind_dev);
+ binding = net_devmem_bind_dmabuf(bind_dev,
+ bind_dev != netdev ? netdev : NULL,
+ dma_dev, DMA_TO_DEVICE, dmabuf_fd,
+ priv, info->extack);
if (IS_ERR(binding)) {
err = PTR_ERR(binding);
goto err_unlock_netdev;
--
2.52.0
* [PATCH net-next v2 3/6] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
From: Bobby Eshleman @ 2026-05-05 0:27 UTC
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add a -n (skip_config) flag that causes ncdevmem to skip NIC
configuration when operating as an RX server. When -n is passed,
ncdevmem skips configuring header split, RSS, and flow steering, as well
as their teardown on exit.
This allows ksft tests to pre-configure the NIC in the host namespace
before launching ncdevmem in the guest namespace. This is needed for
netkit devmem tests where the test harness namespace has direct access
to the NIC and the ncdevmem namespace does not.
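The flag handling can be illustrated with a small stand-alone getopt() sketch.
It is an assumption-laden illustration rather than the ncdevmem code:
model_parse_skip_config() is a hypothetical helper and the option string is
shortened to three existing options plus the new -n.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* expose getopt() under strict -std modes */
#endif

#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if -n was passed, 0 if it was not, and -1
 * on an unknown option. "n" carries no trailing colon because the flag
 * takes no argument.
 */
static int model_parse_skip_config(int argc, char *argv[])
{
	int skip_config = 0;
	int opt;

	assert(argc > 0 && argv != NULL);

	optind = 1;	/* restart scanning so the helper can be reused */
	opterr = 0;	/* keep getopt() from printing its own diagnostics */

	while ((opt = getopt(argc, argv, "ls:p:n")) != -1) {
		switch (opt) {
		case 'n':
			skip_config = 1;
			break;
		case '?':
			return -1;
		default:
			/* Other options are irrelevant to this sketch. */
			break;
		}
	}
	return skip_config;
}
```

A caller would then guard each NIC configuration and teardown step on the
returned value, as the patch does with skip_config.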
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 58 +++++++++++++----------
1 file changed, 34 insertions(+), 24 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index e098d6534c3c..d96e8a3b5a65 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -93,6 +93,7 @@ static char *port;
static size_t do_validation;
static int start_queue = -1;
static int num_queues = -1;
+static int skip_config;
static char *ifname;
static unsigned int ifindex;
static unsigned int dmabuf_id;
@@ -828,7 +829,7 @@ static struct netdev_queue_id *create_queues(void)
static int do_server(struct memory_buffer *mem)
{
- struct ethtool_rings_get_rsp *ring_config;
+ struct ethtool_rings_get_rsp *ring_config = NULL;
char ctrl_data[sizeof(int) * 20000];
size_t non_page_aligned_frags = 0;
struct sockaddr_in6 client_addr;
@@ -851,27 +852,29 @@ static int do_server(struct memory_buffer *mem)
return -1;
}
- ring_config = get_ring_config();
- if (!ring_config) {
- pr_err("Failed to get current ring configuration");
- return -1;
- }
+ if (!skip_config) {
+ ring_config = get_ring_config();
+ if (!ring_config) {
+ pr_err("Failed to get current ring configuration");
+ return -1;
+ }
- if (configure_headersplit(ring_config, 1)) {
- pr_err("Failed to enable TCP header split");
- goto err_free_ring_config;
- }
+ if (configure_headersplit(ring_config, 1)) {
+ pr_err("Failed to enable TCP header split");
+ goto err_free_ring_config;
+ }
- /* Configure RSS to divert all traffic from our devmem queues */
- if (configure_rss()) {
- pr_err("Failed to configure rss");
- goto err_reset_headersplit;
- }
+ /* Configure RSS to divert all traffic from our devmem queues */
+ if (configure_rss()) {
+ pr_err("Failed to configure rss");
+ goto err_reset_headersplit;
+ }
- /* Flow steer our devmem flows to start_queue */
- if (configure_flow_steering(&server_sin)) {
- pr_err("Failed to configure flow steering");
- goto err_reset_rss;
+ /* Flow steer our devmem flows to start_queue */
+ if (configure_flow_steering(&server_sin)) {
+ pr_err("Failed to configure flow steering");
+ goto err_reset_rss;
+ }
}
if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys)) {
@@ -1052,13 +1055,17 @@ static int do_server(struct memory_buffer *mem)
err_unbind:
ynl_sock_destroy(ys);
err_reset_flow_steering:
- reset_flow_steering();
+ if (!skip_config)
+ reset_flow_steering();
err_reset_rss:
- reset_rss();
+ if (!skip_config)
+ reset_rss();
err_reset_headersplit:
- restore_ring_config(ring_config);
+ if (!skip_config)
+ restore_ring_config(ring_config);
err_free_ring_config:
- ethtool_rings_get_rsp_free(ring_config);
+ if (!skip_config)
+ ethtool_rings_get_rsp_free(ring_config);
return err;
}
@@ -1404,7 +1411,7 @@ int main(int argc, char *argv[])
int is_server = 0, opt;
int ret, err = 1;
- while ((opt = getopt(argc, argv, "Lls:c:p:v:q:t:f:z:")) != -1) {
+ while ((opt = getopt(argc, argv, "Lls:c:p:v:q:t:f:z:n")) != -1) {
switch (opt) {
case 'L':
fail_on_linear = true;
@@ -1436,6 +1443,9 @@ int main(int argc, char *argv[])
case 'z':
max_chunk = atoi(optarg);
break;
+ case 'n':
+ skip_config = 1;
+ break;
case '?':
fprintf(stderr, "unknown option: %c\n", optopt);
break;
--
2.52.0
* [PATCH net-next v2 4/6] selftests: drv-net: refactor devmem command builders into lib module
From: Bobby Eshleman @ 2026-05-05 0:27 UTC
From: Bobby Eshleman <bobbyeshleman@meta.com>
Adding netkit-based devmem tests is a straight-forward copy of devmem
test commands plus some args for the nk cases, so this patch breaks out
these command builders into helpers used by both.
Though we tried to avoid libraries to avoid increasing the barrier of
entry/complexity (see selftests/drivers/net/README.md, section "Avoid
libraries and frameworks"), factoring out these functions seemed like
the lesser of two evils in this case of using the same commands, just
with slightly different args per environment.
I experimented with just having all of the tests in the same file to
avoid having helpers in a library file, but because ksft_run() is
limited to a single call per file, and the new tests will require
different environments (NetDrvContEnv/NetDrvEpEnv), it would have been
necessary to have each test set up its own environment instead of
sharing one for the entire ksft_run() run. This came at the cost of
ballooning the test time (from under 5s to 30s on my test system), so to
strike a balance these tests were placed in separate files so they could
keep a shared environment across a single ksft_run() run shared across
all tests using the same env type (introduced in subsequent patches).
The helpers work transparently with both plain and netkit environments
by inspecting cfg for netkit-specific attributes (netns, nk_queue,
etc.).
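The attribute-based dispatch described above can be sketched roughly as
follows. This is a simplified, standalone model and not the code in this
patch: build_tx_cmd() and the nk_guest_ifname attribute are hypothetical
names, the addresses and interface names are made up, and cfg is stood
in for by SimpleNamespace objects that carry only the attributes the
sketch reads.

```python
from types import SimpleNamespace


def build_tx_cmd(cfg, port, chunk_size=0):
    """Build an ncdevmem TX command line for a plain or netkit cfg.

    Hypothetical, simplified model of the helper pattern: the presence
    of a 'netns' attribute marks a netkit (container) environment.
    """
    if hasattr(cfg, "netns"):
        # Netkit case: send from the guest interface and pass -n so
        # that ncdevmem skips NIC configuration.
        ifname = cfg.nk_guest_ifname
        extra = ["-t", "0", "-q", "1", "-n"]
    else:
        # Plain case: send from the physical interface, no extra args.
        ifname = cfg.ifname
        extra = []

    args = [cfg.bin_local, "-f", ifname, "-s", cfg.remote_addr,
            "-p", str(port)] + extra
    if chunk_size:
        args += ["-z", str(chunk_size)]
    return " ".join(args)


# Made-up configurations for illustration only.
plain = SimpleNamespace(bin_local="./ncdevmem", ifname="eth0",
                        remote_addr="2001:db8::2")
netkit = SimpleNamespace(bin_local="./ncdevmem", ifname="eth0",
                         remote_addr="2001:db8::2", netns="ns0",
                         nk_guest_ifname="nk_guest")

plain_cmd = build_tx_cmd(plain, 5201)
netkit_cmd = build_tx_cmd(netkit, 5201, chunk_size=3)
```

Building the argument list first and joining once avoids the stray
double spaces that string concatenation with optional pieces tends to
produce.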
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v2:
- Move require_devmem() into individual test functions so KsftSkipEx goes up to
ksft_run() (Sashiko)
- in ncdevmem_rx(), move -v 7 to take effect for both netns and
non-netns when verify=True
---
tools/testing/selftests/drivers/net/hw/devmem.py | 73 +------
.../selftests/drivers/net/hw/lib/py/devmem.py | 222 +++++++++++++++++++++
2 files changed, 231 insertions(+), 64 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py
index ee863e90d1e0..33648e39577a 100755
--- a/tools/testing/selftests/drivers/net/hw/devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem.py
@@ -1,92 +1,37 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: GPL-2.0
+"""Test devmem TCP."""
from os import path
-from lib.py import ksft_run, ksft_exit
-from lib.py import ksft_eq, KsftSkipEx
+from lib.py import ksft_run, ksft_exit, ksft_disruptive
from lib.py import NetDrvEpEnv
-from lib.py import bkg, cmd, rand_port, wait_port_listen
-from lib.py import ksft_disruptive
-
-
-def require_devmem(cfg):
- if not hasattr(cfg, "_devmem_probed"):
- probe_command = f"{cfg.bin_local} -f {cfg.ifname}"
- cfg._devmem_supported = cmd(probe_command, fail=False, shell=True).ret == 0
- cfg._devmem_probed = True
-
- if not cfg._devmem_supported:
- raise KsftSkipEx("Test requires devmem support")
+from lib.py.devmem import setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds
@ksft_disruptive
def check_rx(cfg) -> None:
- require_devmem(cfg)
-
- port = rand_port()
- socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
- listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7"
-
- with bkg(listen_cmd, exit_wait=True) as ncdevmem:
- wait_port_listen(port)
- cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
- head -c 1K | {socat}", host=cfg.remote, shell=True)
-
- ksft_eq(ncdevmem.ret, 0)
+ run_rx(cfg)
@ksft_disruptive
def check_tx(cfg) -> None:
- require_devmem(cfg)
-
- port = rand_port()
- listen_cmd = f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}"
-
- with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
- wait_port_listen(port, host=cfg.remote)
- cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_local} -f {cfg.ifname} -s {cfg.remote_addr} -p {port}", shell=True)
-
- ksft_eq(socat.stdout.strip(), "hello\nworld")
+ run_tx(cfg)
@ksft_disruptive
def check_tx_chunks(cfg) -> None:
- require_devmem(cfg)
-
- port = rand_port()
- listen_cmd = f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}"
-
- with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
- wait_port_listen(port, host=cfg.remote)
- cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_local} -f {cfg.ifname} -s {cfg.remote_addr} -p {port} -z 3", shell=True)
-
- ksft_eq(socat.stdout.strip(), "hello\nworld")
+ run_tx_chunks(cfg)
def check_rx_hds(cfg) -> None:
- """Test HDS splitting across payload sizes."""
- require_devmem(cfg)
-
- for size in [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]:
- port = rand_port()
- listen_cmd = f"{cfg.bin_local} -L -l -f {cfg.ifname} -s {cfg.addr} -p {port}"
-
- with bkg(listen_cmd, exit_wait=True) as ncdevmem:
- wait_port_listen(port)
- cmd(f"dd if=/dev/zero bs={size} count=1 2>/dev/null | " +
- f"socat -b {size} -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},nodelay",
- host=cfg.remote, shell=True)
-
- ksft_eq(ncdevmem.ret, 0, f"HDS failed for payload size {size}")
+ run_rx_hds(cfg)
def main() -> None:
with NetDrvEpEnv(__file__) as cfg:
- cfg.bin_local = path.abspath(path.dirname(__file__) + "/ncdevmem")
- cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
-
+ setup_test(cfg, path.abspath(path.dirname(__file__) + "/ncdevmem"))
ksft_run([check_rx, check_tx, check_tx_chunks, check_rx_hds],
- args=(cfg, ))
+ args=(cfg,))
ksft_exit()
diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/devmem.py b/tools/testing/selftests/drivers/net/hw/lib/py/devmem.py
new file mode 100644
index 000000000000..6f8a3f5aae14
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/lib/py/devmem.py
@@ -0,0 +1,222 @@
+# SPDX-License-Identifier: GPL-2.0
+"""Shared helpers for devmem TCP selftests."""
+
+import re
+
+from net.lib.py import (bkg, cmd, defer, ethtool, rand_port, wait_port_listen,
+ ksft_eq, KsftSkipEx, NetNSEnter, EthtoolFamily,
+ NetdevFamily)
+
+
+def require_devmem(cfg):
+ if not hasattr(cfg, "_devmem_probed"):
+ probe_command = f"{cfg.bin_local} -f {cfg.ifname}"
+ cfg._devmem_supported = cmd(probe_command, fail=False, shell=True).ret == 0
+ cfg._devmem_probed = True
+
+ if not cfg._devmem_supported:
+ raise KsftSkipEx("Test requires devmem support")
+
+
+def configure_nic(cfg):
+ """Channels, rings, RSS, queue lease for netkit devmem.
+
+ Rings and RSS are re-applied each call because per-test defers restore
+ them after every test case. The queue lease is created only once.
+ """
+ cfg.require_ipver('6')
+ ethnl = EthtoolFamily()
+
+ channels = ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
+ channels = channels['combined-count']
+ if channels < 2:
+ raise KsftSkipEx(
+ 'Test requires NETIF with at least 2 combined channels'
+ )
+
+ rings = ethnl.rings_get({'header': {'dev-index': cfg.ifindex}})
+ rx_rings = rings['rx']
+ hds_thresh = rings.get('hds-thresh', 0)
+ orig_data_split = rings.get('tcp-data-split', 'unknown')
+
+ ethnl.rings_set({'header': {'dev-index': cfg.ifindex},
+ 'tcp-data-split': 'enabled',
+ 'hds-thresh': 0,
+ 'rx': min(64, rx_rings)})
+ defer(ethnl.rings_set, {'header': {'dev-index': cfg.ifindex},
+ 'tcp-data-split': orig_data_split,
+ 'hds-thresh': hds_thresh,
+ 'rx': rx_rings})
+
+ cfg.src_queue = channels - 1
+ ethtool(f"-X {cfg.ifname} equal {cfg.src_queue}")
+ defer(ethtool, f"-X {cfg.ifname} default")
+
+ if not hasattr(cfg, 'nk_queue'):
+ with NetNSEnter(str(cfg.netns)):
+ netdevnl = NetdevFamily()
+ lease_result = netdevnl.queue_create({
+ "ifindex": cfg.nk_guest_ifindex,
+ "type": "rx",
+ "lease": {
+ "ifindex": cfg.ifindex,
+ "queue": {"id": cfg.src_queue, "type": "rx"},
+ "netns-id": 0,
+ },
+ })
+ cfg.nk_queue = lease_result['id']
+
+
+def set_flow_rule(cfg, port):
+ output = ethtool(
+ f"-N {cfg.ifname} flow-type tcp6 dst-port {port}"
+ f" action {cfg.src_queue}"
+ ).stdout
+ return int(re.search(r'ID (\d+)', output).group(1))
+
+
+def ncdevmem_rx(cfg, port, verify=True, fail_on_linear=False, flow_steer=False):
+ if hasattr(cfg, 'netns'):
+ flow_rule_id = set_flow_rule(cfg, port)
+ defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
+
+ ifname = cfg._nk_guest_ifname
+ addr = cfg.nk_guest_ipv6
+ extras = f" -t {cfg.nk_queue} -q 1 -n"
+ else:
+ ifname = cfg.ifname
+ addr = cfg.addr
+
+ extras = ""
+ if flow_steer:
+ extras += f"-c {cfg.remote_addr}"
+
+ if verify:
+ extras += " -v 7"
+
+ if fail_on_linear:
+ extras += " -L"
+
+ return f"{cfg.bin_local} -l -f {ifname} -s {addr} -p {port} {extras}"
+
+
+def ncdevmem_tx(cfg, port, chunk_size=0):
+ """ncdevmem TX send command (without stdin pipe)."""
+ if hasattr(cfg, 'netns'):
+ ifname = cfg._nk_guest_ifname
+ addr = cfg.remote_addr_v['6']
+ nk_args = "-t 0 -q 1 -n"
+ else:
+ ifname = cfg.ifname
+ addr = cfg.remote_addr
+ nk_args = ""
+
+ chunk = f"-z {chunk_size}" if chunk_size else ""
+
+ return (f"{cfg.bin_local} -f {ifname} -s {addr} -p {port}"
+ f" {nk_args} {chunk}").rstrip()
+
+
+def socat_send(cfg, port, buf_size=0, nodelay=False, bind=False):
+ """Socat command for sending to the devmem listener."""
+ proto = f"TCP{cfg.addr_ipver}"
+
+ if hasattr(cfg, 'netns'):
+ addr = f"[{cfg.nk_guest_ipv6}]"
+ else:
+ addr = cfg.baddr
+
+ buf = f"-b {buf_size} " if buf_size else ""
+
+ suffix = ""
+ if nodelay:
+ suffix += ",nodelay"
+ # Match the 5-tuple flow rule ncdevmem installs when given -c.
+ if bind:
+ suffix += f",bind={cfg.remote_baddr}:{port}"
+
+ return f"socat {buf}-u - {proto}:{addr}:{port}{suffix}"
+
+
+def socat_listen(cfg, port):
+ """Socat listen command for TX tests."""
+ proto = f"TCP{cfg.addr_ipver}"
+
+ if hasattr(cfg, 'netns'):
+ opts = ",reuseaddr"
+ else:
+ opts = ""
+
+ return f"socat -U - {proto}-LISTEN:{port}{opts}"
+
+
+def setup_test(cfg, bin_local):
+ cfg.bin_local = bin_local
+ cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
+ cfg.listen_ns = getattr(cfg, 'netns', None)
+
+
+def run_rx(cfg):
+ require_devmem(cfg)
+ if hasattr(cfg, 'netns'):
+ configure_nic(cfg)
+ port = rand_port()
+ socat = socat_send(cfg, port)
+ data_pipe = (f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | head -c 1K"
+ f" | {socat}")
+ ns = getattr(cfg, "netns", None)
+
+ listen_cmd = ncdevmem_rx(cfg, port)
+ with bkg(listen_cmd, exit_wait=True, ns=ns) as ncdevmem:
+ wait_port_listen(port, proto="tcp", ns=ns)
+ cmd(data_pipe, host=cfg.remote, shell=True)
+ ksft_eq(ncdevmem.ret, 0)
+
+
+def run_tx(cfg):
+ require_devmem(cfg)
+ if hasattr(cfg, 'netns'):
+ configure_nic(cfg)
+ ns = getattr(cfg, "netns", None)
+ port = rand_port()
+ tx = ncdevmem_tx(cfg, port)
+ listen_cmd = socat_listen(cfg, port)
+
+ with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
+ wait_port_listen(port, host=cfg.remote)
+ cmd(f"bash -c 'echo -e \"hello\\nworld\" | {tx}'", ns=ns, shell=True)
+ ksft_eq(socat.stdout.strip(), "hello\nworld")
+
+
+def run_tx_chunks(cfg):
+ require_devmem(cfg)
+ if hasattr(cfg, 'netns'):
+ configure_nic(cfg)
+ ns = getattr(cfg, "netns", None)
+ port = rand_port()
+ tx = ncdevmem_tx(cfg, port, chunk_size=3)
+ listen_cmd = socat_listen(cfg, port)
+
+ with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
+ wait_port_listen(port, host=cfg.remote)
+ cmd(f"bash -c 'echo -e \"hello\\nworld\" | {tx}'", ns=ns, shell=True)
+ ksft_eq(socat.stdout.strip(), "hello\nworld")
+
+
+def run_rx_hds(cfg):
+ require_devmem(cfg)
+ if hasattr(cfg, 'netns'):
+ configure_nic(cfg)
+ ns = getattr(cfg, "netns", None)
+
+ for size in [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]:
+ port = rand_port()
+
+ listen_cmd = ncdevmem_rx(cfg, port, verify=False, fail_on_linear=True)
+ socat = socat_send(cfg, port, buf_size=size, nodelay=True)
+
+ with bkg(listen_cmd, exit_wait=True, ns=ns) as ncdevmem:
+ wait_port_listen(port, proto="tcp", ns=ns)
+ cmd(f"dd if=/dev/zero bs={size} count=1 2>/dev/null | "
+ f"{socat}", host=cfg.remote, shell=True)
+ ksft_eq(ncdevmem.ret, 0, f"HDS failed for payload size {size}")
--
2.52.0
* [PATCH net-next v2 5/6] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
2026-05-05 0:27 [PATCH net-next v2 0/6] net: devmem: support devmem with netkit devices Bobby Eshleman
` (3 preceding siblings ...)
2026-05-05 0:27 ` [PATCH net-next v2 4/6] selftests: drv-net: refactor devmem command builders into lib module Bobby Eshleman
@ 2026-05-05 0:27 ` Bobby Eshleman
2026-05-05 0:27 ` [PATCH net-next v2 6/6] selftests: drv-net: add netkit devmem tests Bobby Eshleman
5 siblings, 0 replies; 8+ messages in thread
From: Bobby Eshleman @ 2026-05-05 0:27 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan
Cc: netdev, linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
Stanislav Fomichev, Mina Almasry, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
When sending from a namespace that has access to a netkit device with a
leased queue, the nk primary in the host namespace needs to redirect its
RX to the physical device. This patch adds that redirection BPF program
and teaches the harness to install it.
Add a primary_rx_redirect=False parameter to NetDrvContEnv.__init__().
When enabled, _attach_primary_rx_redirect_bpf() attaches a new BPF TC
program (nk_primary_rx_redirect.bpf.c) to the primary (host-side) netkit
interface. The program redirects non-ICMPv6 IPv6 packets to the
physical NIC via bpf_redirect_neigh(), with the physical ifindex
configured via the .bss map.
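As a rough illustration of the filtering decision described above, here
is a userspace Python model of which frames would be redirected. It is
only a sketch: should_redirect() and make_frame() are hypothetical
helpers, it works on raw bytes rather than an skb, and like the
description it looks only at the fixed IPv6 header's next-header field
(extension headers are not walked).

```python
import struct

ETH_P_IPV6 = 0x86DD
IPPROTO_ICMPV6 = 58
ETH_HLEN = 14    # Ethernet header length
IPV6_HLEN = 40   # fixed IPv6 header length


def should_redirect(frame: bytes) -> bool:
    """Return True if the frame would be redirected to the physical NIC."""
    if len(frame) < ETH_HLEN:
        return False  # truncated Ethernet header: pass through
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IPV6:
        return False  # not IPv6: pass through
    if len(frame) < ETH_HLEN + IPV6_HLEN:
        return False  # truncated IPv6 header: pass through
    next_header = frame[ETH_HLEN + 6]  # nexthdr is byte 6 of the IPv6 header
    return next_header != IPPROTO_ICMPV6


def make_frame(ethertype: int, next_header: int) -> bytes:
    """Build a minimal frame with zeroed addresses (illustration only)."""
    eth = bytes(12) + struct.pack("!H", ethertype)
    ip6 = bytearray(IPV6_HLEN)
    ip6[0] = 0x60          # IP version 6
    ip6[6] = next_header
    return eth + bytes(ip6)
```

ICMPv6 is left on the normal path, which keeps neighbour discovery on
the netkit pair working while TCP traffic is redirected.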
Extract _find_bss_map_id() from _attach_bpf() into a reusable helper so
other BPF attachment methods can use it.
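A minimal standalone sketch of the lookup and of the value written into
the map follows. It is a model under stated assumptions, not the harness
code: show_map is a hypothetical callable standing in for a bpftool map
query, the map names and IDs are made up, and the ifindex is encoded as
four little-endian bytes, as the harness does.

```python
def find_bss_map_id(map_ids, show_map):
    """Return the first map ID whose name ends with 'bss'.

    show_map is a hypothetical stand-in for a per-map bpftool query;
    it takes a map ID and returns a dict with at least a 'name' key.
    """
    for map_id in map_ids:
        if show_map(map_id).get("name", "").endswith("bss"):
            return map_id
    raise LookupError("no .bss map found")


def ifindex_value_hex(ifindex):
    """Format a 32-bit ifindex as space-separated little-endian hex bytes."""
    raw = ifindex.to_bytes(4, byteorder="little")
    return " ".join(f"{b:02x}" for b in raw)


# Made-up map table for illustration only.
fake_maps = {7: {"name": "example.rodata"}, 9: {"name": "example.bss"}}
bss_id = find_bss_map_id([7, 9], fake_maps.__getitem__)
value_hex = ifindex_value_hex(3)
```

Passing the lookup in as a callable keeps the search logic testable
without a loaded BPF program.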
Also add an IPv6 host route on the remote endpoint for the netkit guest
IP via the physical NIC address, so the remote can send packets that
traverse the redirect path to the guest.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
.../drivers/net/hw/nk_primary_rx_redirect.bpf.c | 41 +++++++++++++
tools/testing/selftests/drivers/net/lib/py/env.py | 67 ++++++++++++++++++----
2 files changed, 96 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/nk_primary_rx_redirect.bpf.c b/tools/testing/selftests/drivers/net/hw/nk_primary_rx_redirect.bpf.c
new file mode 100644
index 000000000000..fe3c127a4fd0
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/nk_primary_rx_redirect.bpf.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/ipv6.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define TC_ACT_OK 0
+#define ETH_P_IPV6 0x86DD
+#define IPPROTO_ICMPV6 58
+
+#define ctx_ptr(field) ((void *)(long)(field))
+
+volatile __u32 phys_ifindex;
+
+SEC("tc/ingress")
+int nk_primary_rx_redirect(struct __sk_buff *skb)
+{
+ void *data_end = ctx_ptr(skb->data_end);
+ void *data = ctx_ptr(skb->data);
+ struct ethhdr *eth;
+ struct ipv6hdr *ip6h;
+
+ eth = data;
+ if ((void *)(eth + 1) > data_end)
+ return TC_ACT_OK;
+
+ if (eth->h_proto != bpf_htons(ETH_P_IPV6))
+ return TC_ACT_OK;
+
+ ip6h = data + sizeof(struct ethhdr);
+ if ((void *)(ip6h + 1) > data_end)
+ return TC_ACT_OK;
+
+ if (ip6h->nexthdr == IPPROTO_ICMPV6)
+ return TC_ACT_OK;
+
+ return bpf_redirect_neigh(phys_ifindex, NULL, 0, 0);
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 24ce122abd9c..d569d01ef791 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -336,15 +336,17 @@ class NetDrvContEnv(NetDrvEpEnv):
+---------------+
"""
- def __init__(self, src_path, rxqueues=1, **kwargs):
+ def __init__(self, src_path, rxqueues=1, primary_rx_redirect=False, **kwargs):
self.netns = None
self._nk_host_ifname = None
self._nk_guest_ifname = None
self._tc_clsact_added = False
self._tc_attached = False
+ self._primary_rx_redirect_attached = False
self._bpf_prog_pref = None
self._bpf_prog_id = None
self._init_ns_attached = False
+ self._remote_route_added = False
self._old_fwd = None
self._old_accept_ra = None
@@ -396,8 +398,14 @@ class NetDrvContEnv(NetDrvEpEnv):
self._setup_ns()
self._attach_bpf()
+ if primary_rx_redirect:
+ self._attach_primary_rx_redirect_bpf()
def __del__(self):
+ if self._primary_rx_redirect_attached:
+ cmd(f"tc qdisc del dev {self._nk_host_ifname} clsact", fail=False)
+ self._primary_rx_redirect_attached = False
+
if self._tc_attached:
cmd(f"tc filter del dev {self.ifname} ingress pref {self._bpf_prog_pref}")
self._tc_attached = False
@@ -406,6 +414,11 @@ class NetDrvContEnv(NetDrvEpEnv):
cmd(f"tc qdisc del dev {self.ifname} clsact")
self._tc_clsact_added = False
+ if self._remote_route_added:
+ cmd(f"ip -6 route del {self.nk_guest_ipv6}/128",
+ host=self.remote, fail=False)
+ self._remote_route_added = False
+
if self._nk_host_ifname:
cmd(f"ip link del dev {self._nk_host_ifname}")
self._nk_host_ifname = None
@@ -459,6 +472,9 @@ class NetDrvContEnv(NetDrvEpEnv):
ip(f"-6 addr add {self.nk_guest_ipv6}/64 dev {self._nk_guest_ifname} nodad", ns=self.netns)
ip(f"-6 route add default via fe80::1 dev {self._nk_guest_ifname}", ns=self.netns)
+ ip(f"-6 route add {self.nk_guest_ipv6}/128 via {self.addr_v['6']}", host=self.remote)
+ self._remote_route_added = True
+
def _tc_ensure_clsact(self):
qdisc = json.loads(cmd(f"tc -j qdisc show dev {self.ifname}").stdout)
for q in qdisc:
@@ -476,6 +492,15 @@ class NetDrvContEnv(NetDrvEpEnv):
return (bpf['pref'], bpf['options']['prog']['id'])
raise Exception("Failed to get BPF prog ID")
+ def _find_bss_map_id(self, prog_id):
+ """Find the .bss map ID for a loaded BPF program."""
+ prog_info = bpftool(f"prog show id {prog_id}", json=True)
+ for map_id in prog_info.get("map_ids", []):
+ map_info = bpftool(f"map show id {map_id}", json=True)
+ if map_info.get("name", "").endswith("bss"):
+ return map_id
+ raise Exception(f"Failed to find .bss map for prog {prog_id}")
+
def _attach_bpf(self):
bpf_obj = self.test_dir / "nk_forward.bpf.o"
if not bpf_obj.exists():
@@ -487,17 +512,7 @@ class NetDrvContEnv(NetDrvEpEnv):
self._tc_attached = True
(self._bpf_prog_pref, self._bpf_prog_id) = self._get_bpf_prog_ids()
- prog_info = bpftool(f"prog show id {self._bpf_prog_id}", json=True)
- map_ids = prog_info.get("map_ids", [])
-
- bss_map_id = None
- for map_id in map_ids:
- map_info = bpftool(f"map show id {map_id}", json=True)
- if map_info.get("name").endswith("bss"):
- bss_map_id = map_id
-
- if bss_map_id is None:
- raise Exception("Failed to find .bss map")
+ bss_map_id = self._find_bss_map_id(self._bpf_prog_id)
ipv6_addr = ipaddress.IPv6Address(self.ipv6_prefix)
ipv6_bytes = ipv6_addr.packed
@@ -505,3 +520,31 @@ class NetDrvContEnv(NetDrvEpEnv):
value = ipv6_bytes + ifindex_bytes
value_hex = ' '.join(f'{b:02x}' for b in value)
bpftool(f"map update id {bss_map_id} key hex 00 00 00 00 value hex {value_hex}")
+
+ def _attach_primary_rx_redirect_bpf(self):
+ """Attach BPF redirect program on the primary netkit ingress."""
+ bpf_obj = self.test_dir / "nk_primary_rx_redirect.bpf.o"
+ if not bpf_obj.exists():
+ raise KsftSkipEx("Primary RX redirect BPF prog not found")
+
+ cmd(f"tc qdisc add dev {self._nk_host_ifname} clsact")
+ cmd(f"tc filter add dev {self._nk_host_ifname} ingress"
+ f" bpf obj {bpf_obj} sec tc/ingress direct-action")
+ self._primary_rx_redirect_attached = True
+
+ filters = json.loads(
+ cmd(f"tc -j filter show dev {self._nk_host_ifname} ingress").stdout)
+ redirect_prog_id = None
+ for bpf in filters:
+ if 'options' not in bpf:
+ continue
+ if bpf['options']['bpf_name'].startswith('nk_primary_rx_redirect'):
+ redirect_prog_id = bpf['options']['prog']['id']
+ break
+ if redirect_prog_id is None:
+ raise Exception("Failed to get primary RX redirect BPF prog ID")
+
+ bss_map_id = self._find_bss_map_id(redirect_prog_id)
+ phys_ifindex_bytes = self.ifindex.to_bytes(4, byteorder='little')
+ value_hex = ' '.join(f'{b:02x}' for b in phys_ifindex_bytes)
+ bpftool(f"map update id {bss_map_id} key hex 00 00 00 00 value hex {value_hex}")
--
2.52.0
* [PATCH net-next v2 6/6] selftests: drv-net: add netkit devmem tests
2026-05-05 0:27 [PATCH net-next v2 0/6] net: devmem: support devmem with netkit devices Bobby Eshleman
` (4 preceding siblings ...)
2026-05-05 0:27 ` [PATCH net-next v2 5/6] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv Bobby Eshleman
@ 2026-05-05 0:27 ` Bobby Eshleman
5 siblings, 0 replies; 8+ messages in thread
From: Bobby Eshleman @ 2026-05-05 0:27 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan
Cc: netdev, linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
Stanislav Fomichev, Mina Almasry, Bobby Eshleman
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add nk_devmem.py with four tests for TCP devmem through a netkit device:
check_nk_rx, check_nk_tx, check_nk_tx_chunks, and check_nk_rx_hds.
These tests duplicate the original devmem tests with some adjusted
parameters, such as telling ncdevmem to skip device setup (since it only
has access to netkit, not a physical device).
The tests share one NetDrvContEnv created with primary_rx_redirect=True,
which sets up the BPF redirect program on the primary netkit interface.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v2:
- Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
---
tools/testing/selftests/drivers/net/hw/Makefile | 1 +
.../testing/selftests/drivers/net/hw/nk_devmem.py | 40 ++++++++++++++++++++++
2 files changed, 41 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 85ca4d1ecf9e..2f78c6aec397 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -34,6 +34,7 @@ TEST_PROGS = \
irq.py \
loopback.sh \
nic_timestamp.py \
+ nk_devmem.py \
nk_netns.py \
nk_qlease.py \
ntuple.py \
diff --git a/tools/testing/selftests/drivers/net/hw/nk_devmem.py b/tools/testing/selftests/drivers/net/hw/nk_devmem.py
new file mode 100755
index 000000000000..c069d525798b
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/nk_devmem.py
@@ -0,0 +1,40 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""Test devmem TCP with netkit."""
+
+from os import path
+from lib.py import ksft_run, ksft_exit, ksft_disruptive
+from lib.py import NetDrvContEnv
+from lib.py.devmem import setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds
+
+
+@ksft_disruptive
+def check_nk_rx(cfg) -> None:
+ run_rx(cfg)
+
+
+@ksft_disruptive
+def check_nk_tx(cfg) -> None:
+ run_tx(cfg)
+
+
+@ksft_disruptive
+def check_nk_tx_chunks(cfg) -> None:
+ run_tx_chunks(cfg)
+
+
+@ksft_disruptive
+def check_nk_rx_hds(cfg) -> None:
+ run_rx_hds(cfg)
+
+
+def main() -> None:
+ with NetDrvContEnv(__file__, rxqueues=2, primary_rx_redirect=True) as cfg:
+ setup_test(cfg, path.abspath(path.dirname(__file__) + "/ncdevmem"))
+ ksft_run([check_nk_rx, check_nk_tx, check_nk_tx_chunks, check_nk_rx_hds],
+ args=(cfg,))
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
--
2.52.0
* Re: [PATCH net-next v2 1/6] net: add netmem_tx modes that indicate dma capability
2026-05-05 0:27 ` [PATCH net-next v2 1/6] net: add netmem_tx modes that indicate dma capability Bobby Eshleman
@ 2026-05-05 17:41 ` Harshitha Ramamurthy
0 siblings, 0 replies; 8+ messages in thread
From: Harshitha Ramamurthy @ 2026-05-05 17:41 UTC (permalink / raw)
To: Bobby Eshleman
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
Joshua Washington, Saeed Mahameed, Tariq Toukan, Mark Bloch,
Leon Romanovsky, Alexander Duyck, kernel-team, Daniel Borkmann,
Nikolay Aleksandrov, Shuah Khan, netdev, linux-doc, linux-kernel,
linux-rdma, bpf, linux-kselftest, Stanislav Fomichev,
Mina Almasry, Bobby Eshleman
On Mon, May 4, 2026 at 5:27 PM Bobby Eshleman <bobbyeshleman@gmail.com> wrote:
>
> From: Bobby Eshleman <bobbyeshleman@meta.com>
>
> Devices that support netmem TX previously set dev->netmem_tx = true.
> This was checked in validate_xmit_unreadable_skb() to drop unreadable
> skbs (skbs with dmabuf-backed frags) before they reach drivers that
> would mishandle them or devices that would not have the iommu mappings
> for them.
>
> Some virtual devices like netkit (or ifb) never DMA and never touch frag
> contents, as they essentially just forward the skb to another device.
> They are unable to forward unreadable skbs, however, because they fail
> to pass TX validation checks on dev->netmem_tx. This single bit flag
> doesn't give the TX validator enough information to differentiate
> devices that will attempt DMA on the unreadable skb and those that will
> simply route it untouched.
>
> This patch fixes this issue by adding an additional bit to netmem_tx, so
> that drivers can indicate 1) if they have netmem support, and 2) if they
> do, are they DMA-capable or not?
>
> Replace the boolean with a 2-bit enum:
>
> NETMEM_TX_NONE - no netmem TX support (drop unreadable skbs)
> NETMEM_TX_DMA - full support, device does DMA
> NETMEM_TX_NO_DMA - pass-through, device never DMAs
>
> Update drivers to reflect these definitions. NIC drivers use
> NETMEM_TX_DMA, and netkit uses NETMEM_TX_NO_DMA.
>
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
> Changes in v2:
> - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> ---
> Documentation/networking/net_cachelines/net_device.rst | 2 +-
> Documentation/networking/netmem.rst | 8 +++++++-
> Documentation/translations/zh_CN/networking/netmem.rst | 7 ++++++-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> drivers/net/ethernet/google/gve/gve_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
> drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 +-
> drivers/net/netkit.c | 1 +
> include/linux/netdevice.h | 11 +++++++++--
> 9 files changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
> index 1c19bb7705df..c85784259544 100644
> --- a/Documentation/networking/net_cachelines/net_device.rst
> +++ b/Documentation/networking/net_cachelines/net_device.rst
> @@ -10,7 +10,7 @@ Type Name fastpath_tx_acce
> =================================== =========================== =================== =================== ===================================================================================
> unsigned_long:32 priv_flags read_mostly __dev_queue_xmit(tx)
> unsigned_long:1 lltx read_mostly HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx)
> -unsigned long:1 netmem_tx:1; read_mostly
> +unsigned long:2 netmem_tx:2; read_mostly
> char name[16]
> struct netdev_name_node* name_node
> struct dev_ifalias* ifalias
> diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
> index b63aded46337..217869d1108d 100644
> --- a/Documentation/networking/netmem.rst
> +++ b/Documentation/networking/netmem.rst
> @@ -95,4 +95,10 @@ Driver TX Requirements
> netdev@, or reach out to the maintainers and/or almasrymina@google.com for
> help adding the netmem API.
>
> -2. Driver should declare support by setting `netdev->netmem_tx = true`
> +2. Driver should declare support by setting `netdev->netmem_tx` to the
> + appropriate mode:
> +
> + - `NETMEM_TX_DMA`: for physical devices that perform DMA.
> +
> + - `NETMEM_TX_NO_DMA`: for virtual or passthrough devices that do
> + not DMA, but still support handling of netmem-backed skbs.
> diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
> index fe351a240f02..320f3eacf51b 100644
> --- a/Documentation/translations/zh_CN/networking/netmem.rst
> +++ b/Documentation/translations/zh_CN/networking/netmem.rst
> @@ -89,4 +89,9 @@ dma-mapping API 去处理。
> 使用某个还不存在的 netmem API,你可以自行添加并提交到 netdev@,也可以联系维护
> 人员或者发送邮件至 almasrymina@google.com 寻求帮助。
>
> -2. 驱动程序应通过设置 netdev->netmem_tx = true 来表明自身支持 netmem 功能。
> +2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
> +
> + - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
> +
> + - `NETMEM_TX_NO_DMA`:适用于不执行 DMA 的虚拟或透传设备,但仍支持
> + 处理 netmem 支持的 skb。
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 8c55874f44ca..ed9c22dc4a5a 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -17120,7 +17120,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
> if (BNXT_SUPPORTS_QUEUE_API(bp))
> dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
> - dev->netmem_tx = true;
> + dev->netmem_tx = NETMEM_TX_DMA;
>
> rc = register_netdev(dev);
> if (rc)
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index 424d973c97f2..dd2b8f087163 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -2894,7 +2894,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> goto abort_with_wq;
>
> if (!gve_is_gqi(priv) && !gve_is_qpl(priv))
> - dev->netmem_tx = true;
> + dev->netmem_tx = NETMEM_TX_DMA;
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
>
> err = register_netdev(dev);
> if (err)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 5a46870c4b74..fc49aae38807 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -5924,7 +5924,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>
> netdev->priv_flags |= IFF_UNICAST_FLT;
>
> - netdev->netmem_tx = true;
> + netdev->netmem_tx = NETMEM_TX_DMA;
>
> netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
> mlx5e_set_xdp_feature(priv);
> diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> index c406a3b56b37..138e522ef9b9 100644
> --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> @@ -752,7 +752,7 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
> netdev->netdev_ops = &fbnic_netdev_ops;
> netdev->stat_ops = &fbnic_stat_ops;
> netdev->queue_mgmt_ops = &fbnic_queue_mgmt_ops;
> - netdev->netmem_tx = true;
> + netdev->netmem_tx = NETMEM_TX_DMA;
>
> fbnic_set_ethtool_ops(netdev);
>
> diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
> index 5e2eecc3165d..0ad6a806d7d5 100644
> --- a/drivers/net/netkit.c
> +++ b/drivers/net/netkit.c
> @@ -466,6 +466,7 @@ static void netkit_setup(struct net_device *dev)
> dev->priv_flags |= IFF_NO_QUEUE;
> dev->priv_flags |= IFF_DISABLE_NETPOLL;
> dev->lltx = true;
> + dev->netmem_tx = NETMEM_TX_NO_DMA;
>
> dev->netdev_ops = &netkit_netdev_ops;
> dev->ethtool_ops = &netkit_ethtool_ops;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0e1e581efc5a..11d68e75eb4f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1788,6 +1788,12 @@ enum netdev_stat_type {
> NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
> };
>
> +enum netmem_tx_mode {
> + NETMEM_TX_NONE, /* no netmem TX support */
> + NETMEM_TX_DMA, /* DMA-capable netmem TX (real HW) */
> + NETMEM_TX_NO_DMA, /* no DMA, e.g. passthrough for virtual devs */
> +};
> +
> enum netdev_reg_state {
> NETREG_UNINITIALIZED = 0,
> NETREG_REGISTERED, /* completed register_netdevice */
> @@ -1809,7 +1815,8 @@ enum netdev_reg_state {
> * @lltx: device supports lockless Tx. Deprecated for real HW
> * drivers. Mainly used by logical interfaces, such as
> * bonding and tunnels
> - * @netmem_tx: device support netmem_tx.
> + * @netmem_tx: device netmem TX mode (NETMEM_TX_NONE, NETMEM_TX_DMA,
> + * or NETMEM_TX_NO_DMA).
> *
> * @name: This is the first field of the "visible" part of this structure
> * (i.e. as seen by users in the "Space.c" file). It is the name
> @@ -2132,7 +2139,7 @@ struct net_device {
> struct_group(priv_flags_fast,
> unsigned long priv_flags:32;
> unsigned long lltx:1;
> - unsigned long netmem_tx:1;
> + unsigned long netmem_tx:2;
> );
> const struct net_device_ops *netdev_ops;
> const struct header_ops *header_ops;
>
> --
> 2.52.0
>