Linux Kernel Selftest development
* [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
@ 2026-05-08  2:27 Bobby Eshleman
  2026-05-08  2:27 ` [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum Bobby Eshleman
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

This series enables TCP devmem TX through netkit devices.

Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc.). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or IP
forwarding) to the physical NIC for DMA.

Two things in the existing devmem TX path prevent this from working:

1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
   forward a dmabuf-backed (unreadable) skb. This protects skbs from
   landing on devices that don't have the IOMMU mappings for the backing
   dmabuf or that don't speak netmem. Netkit, however, performs no DMA
   and never reads unreadable skb pages, so it doesn't break netmem (it
   is pure skb routing and redirection). It is functionally capable of
   routing unreadable skbs, but the TX validation path has no way to
   distinguish a device that will actually DMA the skb from a device
   (like netkit) that does not DMA but also does not break netmem.

2. netdev_nl_bind_tx_doit() uses the bound device as the DMA device.
   When the user binds devmem TX to the netkit guest, the bind handler
   attempts to create DMA mappings against netkit, which has no DMA
   capability and no IOMMU mappings.

This series solves these problems as follows:

1. Extend netmem_tx to two bits, assigned to one of three values:

   NETMEM_TX_NONE   - netmem not supported
   NETMEM_TX_DMA    - netmem supported and performs DMA
   NETMEM_TX_NO_DMA - netmem supported, but does not DMA

   With these bits, physical devices set NETMEM_TX_DMA and devices like
   netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
   DMA-capable netdev exactly matches the bound device, guaranteeing the
   correct mapping of the bound dmabuf. The validation TX path also
   allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
   will not misuse netmem or run into IOMMU faults. After redirection or
   routing, when the skb finally makes its way through the stack to a
   physical device's TX path, the NETMEM_TX_DMA check above is performed
   again to guarantee the device has the appropriate binding/mappings.

2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices,
   finds the physical TX device, and binds to that instead. For the
   netkit case, if netkit has already been leased a queue from a
   DMA-capable device, the bind is performed on that DMA-capable device
   and the dmabuf is mapped correctly.
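
The validation rules in point 1 can be sketched as a small userspace
model. This is only an illustration, not the kernel code: struct
model_dev, struct model_skb and model_xmit_allowed() are hypothetical
names, and the model covers only devmem-backed skbs (it ignores the
non-devmem unreadable niov case mentioned in the v3 changelog).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the three modes named in this cover letter. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,
	NETMEM_TX_DMA,
	NETMEM_TX_NO_DMA,
};

/* Hypothetical stand-in for struct net_device: only the field used here. */
struct model_dev {
	enum netmem_tx_mode netmem_tx;
};

/* Hypothetical stand-in for an unreadable skb: remembers which device
 * the backing dmabuf was bound (DMA-mapped) to.
 */
struct model_skb {
	const struct model_dev *bound_dev;
};

/* Return true if the unreadable skb may be transmitted on dev. */
static bool model_xmit_allowed(const struct model_skb *skb,
			       const struct model_dev *dev)
{
	assert(skb != NULL && dev != NULL);

	switch (dev->netmem_tx) {
	case NETMEM_TX_NO_DMA:
		/* Pass-through device: never DMAs, so no mapping needed. */
		return true;
	case NETMEM_TX_DMA:
		/* DMA device: must be the device the dmabuf is mapped for. */
		return skb->bound_dev == dev;
	case NETMEM_TX_NONE:
	default:
		/* No netmem support: the skb must be dropped. */
		return false;
	}
}
```

In this model an skb bound to a physical NIC passes the check on a
NETMEM_TX_NO_DMA device and on that same NIC, but fails on any other
DMA-capable device and on a NETMEM_TX_NONE device.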

---
Changes in v3:
- Fix validate_xmit_unreadable_skb() logic for non-devmem
  unreadable niovs (should not be dropped) (Sashiko)
- Simplify lock handling in bind_tx, no premature release (Jakub)
- Split NO_DMA changes into separate patch (Jakub)
- Fix some pylint issues; one required an additional patch ("selftests:
  drv-net: make attr _nk_guest_ifname public") to rename a variable from
  private to public
- See per-patch changelogs for more detailed changes
- Link to v2: https://lore.kernel.org/r/20260504-tcp-dm-netkit-v2-0-56d52ac72fd4@meta.com

Changes in v2:
- Squash driver conversion patches (2-5) into patch 1 (Jakub)
- Check netmem_tx mode in validate_xmit_unreadable_skb() before inspecting
  frags (Jakub)
- Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
  fix lockdep (Sashiko)
- Move require_devmem() into individual test functions so KsftSkipEx
  propagates up to ksft_run() (Sashiko)
- Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
- Link to v1:
  https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Bobby Eshleman (8):
      net: convert netmem_tx flag to enum
      net: netkit: declare NETMEM_TX_NO_DMA mode
      net: devmem: support TX over NETMEM_TX_NO_DMA devices
      selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
      selftests: drv-net: make attr _nk_guest_ifname public
      selftests: drv-net: refactor devmem command builders into lib module
      selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
      selftests: drv-net: add netkit devmem tests

 .../networking/net_cachelines/net_device.rst       |   2 +-
 Documentation/networking/netmem.rst                |   8 +-
 .../translations/zh_CN/networking/netmem.rst       |   7 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   2 +-
 drivers/net/ethernet/google/gve/gve_main.c         |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   2 +-
 drivers/net/netkit.c                               |   1 +
 include/linux/netdevice.h                          |  11 +-
 net/core/dev.c                                     |   5 +-
 net/core/devmem.c                                  |   6 +-
 net/core/devmem.h                                  |   9 +-
 net/core/netdev-genl.c                             |  65 +++++-
 tools/testing/selftests/drivers/net/hw/Makefile    |   1 +
 tools/testing/selftests/drivers/net/hw/devmem.py   |  77 ++------
 .../selftests/drivers/net/hw/lib/py/devmem.py      | 218 +++++++++++++++++++++
 tools/testing/selftests/drivers/net/hw/ncdevmem.c  |  58 +++---
 .../testing/selftests/drivers/net/hw/nk_devmem.py  |  55 ++++++
 .../drivers/net/hw/nk_primary_rx_redirect.bpf.c    |  39 ++++
 .../testing/selftests/drivers/net/hw/nk_qlease.py  |   8 +-
 tools/testing/selftests/drivers/net/lib/py/env.py  | 109 ++++++++---
 21 files changed, 549 insertions(+), 138 deletions(-)
---
base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
change-id: 20260423-tcp-dm-netkit-2bd78b638d30

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>



* [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 14:56   ` Stanislav Fomichev
  2026-05-08  2:27 ` [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode Bobby Eshleman
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Devices that support netmem TX previously set dev->netmem_tx = true.
This was checked in validate_xmit_unreadable_skb() to drop unreadable
skbs (skbs with dmabuf-backed frags) before they reach drivers that
would mishandle them or devices that would not have the IOMMU mappings
for them.

A subsequent patch will introduce a third state for virtual devices
that forward unreadable skbs without ever performing DMA on them. To
prepare for that, convert the boolean dev->netmem_tx into an enum:

NETMEM_TX_NONE   - no netmem TX support (drop unreadable skbs)
NETMEM_TX_DMA    - full support, device does DMA

Update the existing NIC drivers (bnxt, gve, mlx5, fbnic) and the
validators in net/core to use the new enum. No functional change.

Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v3:
- Split NO_DMA changes into subsequent commit (Jakub)
- Move !netdev->netmem_tx -> netdev->netmem_tx ==
  NETMEM_TX_NONE conversions to this patch (Jakub)

Changes in v2:
- Squash driver conversion patches (2-5) into patch 1 (Jakub)
---
 Documentation/networking/netmem.rst                    | 5 ++++-
 Documentation/translations/zh_CN/networking/netmem.rst | 4 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c              | 2 +-
 drivers/net/ethernet/google/gve/gve_main.c             | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c      | 2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c         | 2 +-
 include/linux/netdevice.h                              | 8 +++++++-
 net/core/dev.c                                         | 2 +-
 net/core/netdev-genl.c                                 | 2 +-
 9 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
index b63aded46337..5ccadba4f373 100644
--- a/Documentation/networking/netmem.rst
+++ b/Documentation/networking/netmem.rst
@@ -95,4 +95,7 @@ Driver TX Requirements
    netdev@, or reach out to the maintainers and/or almasrymina@google.com for
    help adding the netmem API.
 
-2. Driver should declare support by setting `netdev->netmem_tx = true`
+2. Driver should declare support by setting `netdev->netmem_tx` to the
+   appropriate mode:
+
+   - `NETMEM_TX_DMA`: for physical devices that perform DMA.
diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
index fe351a240f02..9c84423b7528 100644
--- a/Documentation/translations/zh_CN/networking/netmem.rst
+++ b/Documentation/translations/zh_CN/networking/netmem.rst
@@ -89,4 +89,6 @@ dma-mapping API 去处理。
 使用某个还不存在的 netmem API,你可以自行添加并提交到 netdev@,也可以联系维护
 人员或者发送邮件至 almasrymina@google.com 寻求帮助。
 
-2. 驱动程序应通过设置 netdev->netmem_tx = true 来表明自身支持 netmem 功能。
+2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
+
+   - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8c55874f44ca..ed9c22dc4a5a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -17120,7 +17120,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
 	if (BNXT_SUPPORTS_QUEUE_API(bp))
 		dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
-	dev->netmem_tx = true;
+	dev->netmem_tx = NETMEM_TX_DMA;
 
 	rc = register_netdev(dev);
 	if (rc)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 424d973c97f2..dd2b8f087163 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -2894,7 +2894,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto abort_with_wq;
 
 	if (!gve_is_gqi(priv) && !gve_is_qpl(priv))
-		dev->netmem_tx = true;
+		dev->netmem_tx = NETMEM_TX_DMA;
 
 	err = register_netdev(dev);
 	if (err)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 5a46870c4b74..fc49aae38807 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5924,7 +5924,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
-	netdev->netmem_tx = true;
+	netdev->netmem_tx = NETMEM_TX_DMA;
 
 	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
 	mlx5e_set_xdp_feature(priv);
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
index c406a3b56b37..138e522ef9b9 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
@@ -752,7 +752,7 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
 	netdev->netdev_ops = &fbnic_netdev_ops;
 	netdev->stat_ops = &fbnic_stat_ops;
 	netdev->queue_mgmt_ops = &fbnic_queue_mgmt_ops;
-	netdev->netmem_tx = true;
+	netdev->netmem_tx = NETMEM_TX_DMA;
 
 	fbnic_set_ethtool_ops(netdev);
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0e1e581efc5a..580bccb118a0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1788,6 +1788,11 @@ enum netdev_stat_type {
 	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
 };
 
+enum netmem_tx_mode {
+	NETMEM_TX_NONE,		/* no netmem TX support */
+	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
+};
+
 enum netdev_reg_state {
 	NETREG_UNINITIALIZED = 0,
 	NETREG_REGISTERED,	/* completed register_netdevice */
@@ -1809,7 +1814,8 @@ enum netdev_reg_state {
  *	@lltx:		device supports lockless Tx. Deprecated for real HW
  *			drivers. Mainly used by logical interfaces, such as
  *			bonding and tunnels
- *	@netmem_tx:	device support netmem_tx.
+ *	@netmem_tx:	device netmem TX mode (NETMEM_TX_NONE or
+ *			NETMEM_TX_DMA).
  *
  *	@name:	This is the first field of the "visible" part of this structure
  *		(i.e. as seen by users in the "Space.c" file).  It is the name
diff --git a/net/core/dev.c b/net/core/dev.c
index 06c195906231..fbe4c328a367 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3996,7 +3996,7 @@ static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
 	if (likely(skb_frags_readable(skb)))
 		goto out;
 
-	if (!dev->netmem_tx)
+	if (dev->netmem_tx == NETMEM_TX_NONE)
 		goto out_free;
 
 	shinfo = skb_shinfo(skb);
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index b8f6076d8007..4d2c49371cdb 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -1164,7 +1164,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock_netdev;
 	}
 
-	if (!netdev->netmem_tx) {
+	if (netdev->netmem_tx == NETMEM_TX_NONE) {
 		err = -EOPNOTSUPP;
 		NL_SET_ERR_MSG(info->extack,
 			       "Driver does not support netmem TX");

-- 
2.53.0-Meta



* [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
  2026-05-08  2:27 ` [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 14:57   ` Stanislav Fomichev
  2026-05-08  2:27 ` [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices Bobby Eshleman
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Some virtual devices like netkit (or ifb) never DMA and never touch frag
contents; they just forward the skb to another device. They are unable
to forward unreadable skbs, however, because they fail the TX validation
checks on dev->netmem_tx. The existing two-state enum (NETMEM_TX_NONE /
NETMEM_TX_DMA) doesn't give the TX validator enough information to
differentiate devices that will attempt DMA on the unreadable skb from
those that will simply route it untouched.

Add a third mode to the enum so drivers can indicate 1) if they have
netmem TX support, and 2) if they do, whether they are DMA-capable:

NETMEM_TX_NO_DMA - pass-through, device never DMAs

Widen dev->netmem_tx from a 1-bit field to 2 bits to fit the new value,
and declare netkit as NETMEM_TX_NO_DMA. Devmem TX support over these
devices comes in a follow-up patch.
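
As a rough userspace illustration of why the field has to be widened
(not kernel code): a 1-bit unsigned bit-field keeps only the low bit, so
storing NETMEM_TX_NO_DMA (value 2) would read back as NETMEM_TX_NONE.
The struct and helper names below are hypothetical, and unsigned int is
used for the bit-fields for portability where the kernel uses unsigned
long.

```c
#include <assert.h>

/* Same enumerator order as the enum in include/linux/netdevice.h. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,		/* 0 */
	NETMEM_TX_DMA,		/* 1 */
	NETMEM_TX_NO_DMA,	/* 2 */
};

/* Every mode must be representable in a 2-bit field. */
static_assert(NETMEM_TX_NO_DMA < (1 << 2), "mode must fit in 2 bits");

/* Hypothetical holders modelling the old and new field widths. */
struct flags_1bit { unsigned int netmem_tx:1; };
struct flags_2bit { unsigned int netmem_tx:2; };

/* Store mode in a 1-bit field and read it back (keeps only bit 0). */
static unsigned int store_1bit(unsigned int mode)
{
	struct flags_1bit f = { 0 };

	f.netmem_tx = mode;
	return f.netmem_tx;
}

/* Store mode in a 2-bit field and read it back (keeps bits 0-1). */
static unsigned int store_2bit(unsigned int mode)
{
	struct flags_2bit f = { 0 };

	f.netmem_tx = mode;
	return f.netmem_tx;
}
```

With the 1-bit layout, NETMEM_TX_NO_DMA is indistinguishable from
NETMEM_TX_NONE after the store; the 2-bit layout round-trips all three
values.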

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v3:
- net_cachelines/net_device.rst: align the netmem_tx row's type column
  with the rest of the table by using "unsigned_long:2" instead of
  "unsigned long:2"
- Split this into a distinct patch (Jakub)
---
 Documentation/networking/net_cachelines/net_device.rst | 2 +-
 Documentation/networking/netmem.rst                    | 3 +++
 Documentation/translations/zh_CN/networking/netmem.rst | 3 +++
 drivers/net/netkit.c                                   | 1 +
 include/linux/netdevice.h                              | 7 ++++---
 5 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/networking/net_cachelines/net_device.rst b/Documentation/networking/net_cachelines/net_device.rst
index 1c19bb7705df..7b3392553fd6 100644
--- a/Documentation/networking/net_cachelines/net_device.rst
+++ b/Documentation/networking/net_cachelines/net_device.rst
@@ -10,7 +10,7 @@ Type                                Name                        fastpath_tx_acce
 =================================== =========================== =================== =================== ===================================================================================
 unsigned_long:32                    priv_flags                  read_mostly                             __dev_queue_xmit(tx)
 unsigned_long:1                     lltx                        read_mostly                             HARD_TX_LOCK,HARD_TX_TRYLOCK,HARD_TX_UNLOCK(tx)
-unsigned long:1                     netmem_tx:1;                read_mostly
+unsigned_long:2                     netmem_tx:2;                read_mostly
 char                                name[16]
 struct netdev_name_node*            name_node
 struct dev_ifalias*                 ifalias
diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
index 5ccadba4f373..217869d1108d 100644
--- a/Documentation/networking/netmem.rst
+++ b/Documentation/networking/netmem.rst
@@ -99,3 +99,6 @@ Driver TX Requirements
    appropriate mode:
 
    - `NETMEM_TX_DMA`: for physical devices that perform DMA.
+
+   - `NETMEM_TX_NO_DMA`: for virtual or passthrough devices that do
+     not DMA, but still support handling of netmem-backed skbs.
diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
index 9c84423b7528..320f3eacf51b 100644
--- a/Documentation/translations/zh_CN/networking/netmem.rst
+++ b/Documentation/translations/zh_CN/networking/netmem.rst
@@ -92,3 +92,6 @@ dma-mapping API 去处理。
 2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
 
    - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
+
+   - `NETMEM_TX_NO_DMA`:适用于不执行 DMA 的虚拟或透传设备,但仍支持
+     处理 netmem 支持的 skb。
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index 5e2eecc3165d..0ad6a806d7d5 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -466,6 +466,7 @@ static void netkit_setup(struct net_device *dev)
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->priv_flags |= IFF_DISABLE_NETPOLL;
 	dev->lltx = true;
+	dev->netmem_tx = NETMEM_TX_NO_DMA;
 
 	dev->netdev_ops     = &netkit_netdev_ops;
 	dev->ethtool_ops    = &netkit_ethtool_ops;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 580bccb118a0..11d68e75eb4f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1791,6 +1791,7 @@ enum netdev_stat_type {
 enum netmem_tx_mode {
 	NETMEM_TX_NONE,		/* no netmem TX support */
 	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
+	NETMEM_TX_NO_DMA,	/* no DMA, e.g. passthrough for virtual devs */
 };
 
 enum netdev_reg_state {
@@ -1814,8 +1815,8 @@ enum netdev_reg_state {
  *	@lltx:		device supports lockless Tx. Deprecated for real HW
  *			drivers. Mainly used by logical interfaces, such as
  *			bonding and tunnels
- *	@netmem_tx:	device netmem TX mode (NETMEM_TX_NONE or
- *			NETMEM_TX_DMA).
+ *	@netmem_tx:	device netmem TX mode (NETMEM_TX_NONE, NETMEM_TX_DMA,
+ *			or NETMEM_TX_NO_DMA).
  *
  *	@name:	This is the first field of the "visible" part of this structure
  *		(i.e. as seen by users in the "Space.c" file).  It is the name
@@ -2138,7 +2139,7 @@ struct net_device {
 	struct_group(priv_flags_fast,
 		unsigned long		priv_flags:32;
 		unsigned long		lltx:1;
-		unsigned long		netmem_tx:1;
+		unsigned long		netmem_tx:2;
 	);
 	const struct net_device_ops *netdev_ops;
 	const struct header_ops *header_ops;

-- 
2.53.0-Meta



* [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
  2026-05-08  2:27 ` [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum Bobby Eshleman
  2026-05-08  2:27 ` [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 15:01   ` Stanislav Fomichev
  2026-05-08 20:47   ` Jakub Kicinski
  2026-05-08  2:27 ` [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration Bobby Eshleman
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

When a netkit virtual device leases queues from a physical NIC, devmem
TX bindings created on the netkit device must still result in the dmabuf
being mapped for DMA by the physical device. This patch accomplishes
this by teaching the bind handler to find the underlying DMA-capable
device via the leased RX queues. The lookup function,
netdev_find_netmem_tx_dev(), can be extended to support other non-netkit
NETMEM_TX_NO_DMA devices in the future if needed.

Additionally, this patch extends validate_xmit_unreadable_skb() to
support the netkit case, where the skb is validated twice: once on the
netkit guest device and again on the physical NIC after BPF redirect or
IP forwarding.
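
The lookup described above can be modelled in userspace roughly as
follows. This is a simplified sketch: struct model_dev, struct model_rxq
and model_find_tx_dev() are hypothetical stand-ins, and the model leaves
out the locking, READ_ONCE() and netif_device_present() handling that
the real code needs.

```c
#include <assert.h>
#include <stddef.h>

enum netmem_tx_mode {
	NETMEM_TX_NONE,
	NETMEM_TX_DMA,
	NETMEM_TX_NO_DMA,
};

struct model_dev;

/* Hypothetical RX queue: lease_dev is the physical device that owns the
 * leased queue, or NULL if this queue is not leased.
 */
struct model_rxq {
	struct model_dev *lease_dev;
};

/* Hypothetical stand-in for struct net_device. */
struct model_dev {
	enum netmem_tx_mode netmem_tx;
	struct model_rxq *rxqs;
	size_t num_rxqs;
};

/* Return the device that should perform DMA for a TX binding on dev,
 * or NULL if none can be found.
 */
static struct model_dev *model_find_tx_dev(struct model_dev *dev)
{
	size_t i;

	assert(dev != NULL);

	/* A DMA-capable device binds to itself. */
	if (dev->netmem_tx == NETMEM_TX_DMA)
		return dev;

	/* Devices without netmem TX support cannot be bound at all. */
	if (dev->netmem_tx != NETMEM_TX_NO_DMA)
		return NULL;

	/* Pass-through device: use the first leased queue whose owner
	 * is DMA-capable.
	 */
	for (i = 0; i < dev->num_rxqs; i++) {
		struct model_dev *phys = dev->rxqs[i].lease_dev;

		if (phys && phys->netmem_tx == NETMEM_TX_DMA)
			return phys;
	}

	return NULL;
}
```

In this model a pass-through device with no leased queues yields NULL,
which corresponds to the bind request being rejected.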

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v3:
- Fix validate_xmit_unreadable_skb() bug for non-devmem
  unreadable niovs (should not be dropped)
- Major simplification of validate_xmit_unreadable_skb()
- Fix prematurely released lock in bind-tx handler (Jakub)

Changes in v2:
- Check netmem_tx mode in validate_xmit_unreadable_skb() before
  inspecting frags (Jakub)
- Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev !=
  netdev to fix lockdep (Sashiko)
---
 net/core/dev.c         |  3 +++
 net/core/devmem.c      |  6 +++--
 net/core/devmem.h      |  9 ++++++--
 net/core/netdev-genl.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fbe4c328a367..268417c9ef22 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3999,6 +3999,9 @@ static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
 	if (dev->netmem_tx == NETMEM_TX_NONE)
 		goto out_free;
 
+	if (dev->netmem_tx == NETMEM_TX_NO_DMA)
+		goto out;
+
 	shinfo = skb_shinfo(skb);
 
 	if (shinfo->nr_frags > 0) {
diff --git a/net/core/devmem.c b/net/core/devmem.c
index cde4c89bc146..644c286b778f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -181,7 +181,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
 }
 
 struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -212,6 +212,7 @@ net_devmem_bind_dmabuf(struct net_device *dev,
 	}
 
 	binding->dev = dev;
+	binding->vdev = vdev;
 	xa_init_flags(&binding->bound_rxqs, XA_FLAGS_ALLOC);
 
 	err = percpu_ref_init(&binding->ref,
@@ -397,7 +398,8 @@ struct net_devmem_dmabuf_binding *net_devmem_get_binding(struct sock *sk,
 	 */
 	dst_dev = dst_dev_rcu(dst);
 	if (unlikely(!dst_dev) ||
-	    unlikely(dst_dev != READ_ONCE(binding->dev))) {
+	    unlikely(dst_dev != READ_ONCE(binding->dev) &&
+		     dst_dev != READ_ONCE(binding->vdev))) {
 		err = -ENODEV;
 		goto out_unlock;
 	}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 1c5c18581fcb..f399632b3c4b 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -19,7 +19,12 @@ struct net_devmem_dmabuf_binding {
 	struct dma_buf *dmabuf;
 	struct dma_buf_attachment *attachment;
 	struct sg_table *sgt;
+	/* Physical NIC that does the actual DMA for this binding. */
 	struct net_device *dev;
+	/* Virtual device (e.g. netkit) the user called bind-tx on. Must be
+	 * NETMEM_TX_NO_DMA.
+	 */
+	struct net_device *vdev;
 	struct gen_pool *chunk_pool;
 	/* Protect dev */
 	struct mutex lock;
@@ -84,7 +89,7 @@ struct dmabuf_genpool_chunk_owner {
 
 void __net_devmem_dmabuf_binding_free(struct work_struct *wq);
 struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd, struct netdev_nl_sock *priv,
@@ -165,7 +170,7 @@ static inline void net_devmem_put_net_iov(struct net_iov *niov)
 }
 
 static inline struct net_devmem_dmabuf_binding *
-net_devmem_bind_dmabuf(struct net_device *dev,
+net_devmem_bind_dmabuf(struct net_device *dev, struct net_device *vdev,
 		       struct device *dma_dev,
 		       enum dma_data_direction direction,
 		       unsigned int dmabuf_fd,
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 4d2c49371cdb..b4d48f3672a5 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -1077,7 +1077,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_rxq_bitmap;
 	}
 
-	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE,
+	binding = net_devmem_bind_dmabuf(netdev, NULL, dma_dev, DMA_FROM_DEVICE,
 					 dmabuf_fd, priv, info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
@@ -1119,9 +1119,43 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info)
 	return err;
 }
 
+/* Find the DMA-capable device for a netmem TX binding.
+ *
+ * For NETMEM_TX_DMA devices, return the device itself.
+ * For NETMEM_TX_NO_DMA devices, walk leased RX queues to find the underlying
+ * physical device and return it.
+ */
+static struct net_device *
+netdev_find_netmem_tx_dev(struct net_device *dev)
+{
+	struct netdev_rx_queue *lease_rxq;
+	struct net_device *phys_dev;
+	int i;
+
+	if (dev->netmem_tx == NETMEM_TX_DMA)
+		return dev;
+
+	if (dev->netmem_tx != NETMEM_TX_NO_DMA)
+		return NULL;
+
+	for (i = 0; i < dev->real_num_rx_queues; i++) {
+		lease_rxq = READ_ONCE(__netif_get_rx_queue(dev, i)->lease);
+		if (!lease_rxq)
+			continue;
+
+		phys_dev = lease_rxq->dev;
+		if (netif_device_present(phys_dev) &&
+		    phys_dev->netmem_tx == NETMEM_TX_DMA)
+			return phys_dev;
+	}
+
+	return NULL;
+}
+
 int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net_devmem_dmabuf_binding *binding;
+	struct net_device *bind_dev;
 	struct netdev_nl_sock *priv;
 	struct net_device *netdev;
 	struct device *dma_dev;
@@ -1171,22 +1205,41 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock_netdev;
 	}
 
-	dma_dev = netdev_queue_get_dma_dev(netdev, 0, NETDEV_QUEUE_TYPE_TX);
-	binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE,
-					 dmabuf_fd, priv, info->extack);
+	bind_dev = netdev_find_netmem_tx_dev(netdev);
+	if (!bind_dev) {
+		err = -EOPNOTSUPP;
+		NL_SET_ERR_MSG(info->extack,
+			       "No DMA-capable device found for netmem TX");
+		goto err_unlock_netdev;
+	}
+
+	if (bind_dev != netdev)
+		netdev_lock(bind_dev);
+
+	dma_dev = netdev_queue_get_dma_dev(bind_dev, 0, NETDEV_QUEUE_TYPE_TX);
+
+	binding = net_devmem_bind_dmabuf(bind_dev,
+					 bind_dev != netdev ? netdev : NULL,
+					 dma_dev, DMA_TO_DEVICE, dmabuf_fd,
+					 priv, info->extack);
 	if (IS_ERR(binding)) {
 		err = PTR_ERR(binding);
-		goto err_unlock_netdev;
+		goto err_unlock_bind_dev;
 	}
 
 	nla_put_u32(rsp, NETDEV_A_DMABUF_ID, binding->id);
 	genlmsg_end(rsp, hdr);
 
+	if (bind_dev != netdev)
+		netdev_unlock(bind_dev);
 	netdev_unlock(netdev);
 	mutex_unlock(&priv->lock);
 
 	return genlmsg_reply(rsp, info);
 
+err_unlock_bind_dev:
+	if (bind_dev != netdev)
+		netdev_unlock(bind_dev);
 err_unlock_netdev:
 	netdev_unlock(netdev);
 err_unlock_sock:

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
                   ` (2 preceding siblings ...)
  2026-05-08  2:27 ` [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 15:01   ` Stanislav Fomichev
  2026-05-08  2:27 ` [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public Bobby Eshleman
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add a -n (skip_config) flag that causes ncdevmem to skip NIC
configuration when operating as an RX server. When -n is passed,
ncdevmem skips configuring header split, RSS, and flow steering, as well
as their teardown on exit.

This allows ksft tests to pre-configure the NIC in the host namespace
before launching ncdevmem in the guest namespace. This is needed for
netkit devmem tests where the test harness namespace has direct access
to the NIC and the ncdevmem namespace does not.
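
As a rough illustration of how a harness might use the flag, the sketch
below builds an ncdevmem RX command line and appends -n only when the
NIC was already configured from the host namespace. The helper name
build_rx_cmd() and its parameters are hypothetical; only the -l, -f,
-s, -p and -n options come from ncdevmem.

```python
def build_rx_cmd(bin_path, ifname, addr, port, nic_preconfigured=False):
    """Return an ncdevmem RX listener command line (hypothetical helper)."""
    parts = [bin_path, "-l", f"-f {ifname}", f"-s {addr}", f"-p {port}"]
    if nic_preconfigured:
        # The harness already set up header split, RSS and flow steering
        # in the host namespace, so ask ncdevmem to skip both the
        # configuration and its teardown.
        parts.append("-n")
    return " ".join(parts)


# Example values only: interface name, address and port are made up.
print(build_rx_cmd("./ncdevmem", "nk_guest", "2001:db8::2", 5201,
                   nic_preconfigured=True))
```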

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 tools/testing/selftests/drivers/net/hw/ncdevmem.c | 58 +++++++++++++----------
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index e098d6534c3c..d96e8a3b5a65 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -93,6 +93,7 @@ static char *port;
 static size_t do_validation;
 static int start_queue = -1;
 static int num_queues = -1;
+static int skip_config;
 static char *ifname;
 static unsigned int ifindex;
 static unsigned int dmabuf_id;
@@ -828,7 +829,7 @@ static struct netdev_queue_id *create_queues(void)
 
 static int do_server(struct memory_buffer *mem)
 {
-	struct ethtool_rings_get_rsp *ring_config;
+	struct ethtool_rings_get_rsp *ring_config = NULL;
 	char ctrl_data[sizeof(int) * 20000];
 	size_t non_page_aligned_frags = 0;
 	struct sockaddr_in6 client_addr;
@@ -851,27 +852,29 @@ static int do_server(struct memory_buffer *mem)
 		return -1;
 	}
 
-	ring_config = get_ring_config();
-	if (!ring_config) {
-		pr_err("Failed to get current ring configuration");
-		return -1;
-	}
+	if (!skip_config) {
+		ring_config = get_ring_config();
+		if (!ring_config) {
+			pr_err("Failed to get current ring configuration");
+			return -1;
+		}
 
-	if (configure_headersplit(ring_config, 1)) {
-		pr_err("Failed to enable TCP header split");
-		goto err_free_ring_config;
-	}
+		if (configure_headersplit(ring_config, 1)) {
+			pr_err("Failed to enable TCP header split");
+			goto err_free_ring_config;
+		}
 
-	/* Configure RSS to divert all traffic from our devmem queues */
-	if (configure_rss()) {
-		pr_err("Failed to configure rss");
-		goto err_reset_headersplit;
-	}
+		/* Configure RSS to divert all traffic from our devmem queues */
+		if (configure_rss()) {
+			pr_err("Failed to configure rss");
+			goto err_reset_headersplit;
+		}
 
-	/* Flow steer our devmem flows to start_queue */
-	if (configure_flow_steering(&server_sin)) {
-		pr_err("Failed to configure flow steering");
-		goto err_reset_rss;
+		/* Flow steer our devmem flows to start_queue */
+		if (configure_flow_steering(&server_sin)) {
+			pr_err("Failed to configure flow steering");
+			goto err_reset_rss;
+		}
 	}
 
 	if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys)) {
@@ -1052,13 +1055,17 @@ static int do_server(struct memory_buffer *mem)
 err_unbind:
 	ynl_sock_destroy(ys);
 err_reset_flow_steering:
-	reset_flow_steering();
+	if (!skip_config)
+		reset_flow_steering();
 err_reset_rss:
-	reset_rss();
+	if (!skip_config)
+		reset_rss();
 err_reset_headersplit:
-	restore_ring_config(ring_config);
+	if (!skip_config)
+		restore_ring_config(ring_config);
 err_free_ring_config:
-	ethtool_rings_get_rsp_free(ring_config);
+	if (!skip_config)
+		ethtool_rings_get_rsp_free(ring_config);
 	return err;
 }
 
@@ -1404,7 +1411,7 @@ int main(int argc, char *argv[])
 	int is_server = 0, opt;
 	int ret, err = 1;
 
-	while ((opt = getopt(argc, argv, "Lls:c:p:v:q:t:f:z:")) != -1) {
+	while ((opt = getopt(argc, argv, "Lls:c:p:v:q:t:f:z:n")) != -1) {
 		switch (opt) {
 		case 'L':
 			fail_on_linear = true;
@@ -1436,6 +1443,9 @@ int main(int argc, char *argv[])
 		case 'z':
 			max_chunk = atoi(optarg);
 			break;
+		case 'n':
+			skip_config = 1;
+			break;
 		case '?':
 			fprintf(stderr, "unknown option: %c\n", optopt);
 			break;

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
                   ` (3 preceding siblings ...)
  2026-05-08  2:27 ` [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 15:01   ` Stanislav Fomichev
  2026-05-08  2:27 ` [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module Bobby Eshleman
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Subsequent patches will use the _nk_guest_ifname as a public attr for
setting up devmem. Rename to nk_guest_ifname to avoid angering the
linter about the '_' prefix being used for a non-private attr.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 tools/testing/selftests/drivers/net/hw/nk_qlease.py |  8 ++++----
 tools/testing/selftests/drivers/net/lib/py/env.py   | 16 ++++++++--------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/nk_qlease.py b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
index aa83dc321328..139a91ebd229 100755
--- a/tools/testing/selftests/drivers/net/hw/nk_qlease.py
+++ b/tools/testing/selftests/drivers/net/hw/nk_qlease.py
@@ -71,7 +71,7 @@ def test_iou_zcrx(cfg) -> None:
     flow_rule_id = set_flow_rule(cfg)
     defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
 
-    rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg._nk_guest_ifname} -q {cfg.nk_queue}"
+    rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg.nk_guest_ifname} -q {cfg.nk_queue}"
     tx_cmd = f"{cfg.bin_remote} -c -h {cfg.nk_guest_ipv6} -p {cfg.port} -l 12840"
     with bkg(rx_cmd, exit_wait=True):
         wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
@@ -128,7 +128,7 @@ def test_attach_xdp_with_mp(cfg) -> None:
 
     netdevnl = NetdevFamily()
 
-    rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg._nk_guest_ifname} -q {cfg.nk_queue}"
+    rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg.nk_guest_ifname} -q {cfg.nk_queue}"
     with bkg(rx_cmd):
         wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
 
@@ -178,7 +178,7 @@ def test_destroy(cfg) -> None:
     ethtool(f"-X {cfg.ifname} equal {cfg.src_queue}")
     defer(ethtool, f"-X {cfg.ifname} default")
 
-    rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg._nk_guest_ifname} -q {cfg.nk_queue}"
+    rx_cmd = f"ip netns exec {cfg.netns.name} {cfg.bin_local} -s -p {cfg.port} -i {cfg.nk_guest_ifname} -q {cfg.nk_queue}"
     rx_proc = cmd(rx_cmd, background=True)
     wait_port_listen(cfg.port, proto="tcp", ns=cfg.netns)
 
@@ -196,7 +196,7 @@ def test_destroy(cfg) -> None:
     ip(f"link del dev {cfg._nk_host_ifname}")
     kill_timer.join()
     cfg._nk_host_ifname = None
-    cfg._nk_guest_ifname = None
+    cfg.nk_guest_ifname = None
 
     queue_info = netdevnl.queue_get(
         {"ifindex": cfg.ifindex, "id": cfg.src_queue, "type": "rx"}
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 24ce122abd9c..409b41922245 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -339,7 +339,7 @@ class NetDrvContEnv(NetDrvEpEnv):
     def __init__(self, src_path, rxqueues=1, **kwargs):
         self.netns = None
         self._nk_host_ifname = None
-        self._nk_guest_ifname = None
+        self.nk_guest_ifname = None
         self._tc_clsact_added = False
         self._tc_attached = False
         self._bpf_prog_pref = None
@@ -390,7 +390,7 @@ class NetDrvContEnv(NetDrvEpEnv):
 
         netkit_links.sort(key=lambda x: x['ifindex'])
         self._nk_host_ifname = netkit_links[1]['ifname']
-        self._nk_guest_ifname = netkit_links[0]['ifname']
+        self.nk_guest_ifname = netkit_links[0]['ifname']
         self.nk_host_ifindex = netkit_links[1]['ifindex']
         self.nk_guest_ifindex = netkit_links[0]['ifindex']
 
@@ -409,7 +409,7 @@ class NetDrvContEnv(NetDrvEpEnv):
         if self._nk_host_ifname:
             cmd(f"ip link del dev {self._nk_host_ifname}")
             self._nk_host_ifname = None
-            self._nk_guest_ifname = None
+            self.nk_guest_ifname = None
 
         if self._init_ns_attached:
             cmd("ip netns del init", fail=False)
@@ -448,16 +448,16 @@ class NetDrvContEnv(NetDrvEpEnv):
         cmd("ip netns attach init 1")
         self._init_ns_attached = True
         ip("netns set init 0", ns=self.netns)
-        ip(f"link set dev {self._nk_guest_ifname} netns {self.netns.name}")
+        ip(f"link set dev {self.nk_guest_ifname} netns {self.netns.name}")
         ip(f"link set dev {self._nk_host_ifname} up")
         ip(f"-6 addr add fe80::1/64 dev {self._nk_host_ifname} nodad")
         ip(f"-6 route add {self.nk_guest_ipv6}/128 via fe80::2 dev {self._nk_host_ifname}")
 
         ip("link set lo up", ns=self.netns)
-        ip(f"link set dev {self._nk_guest_ifname} up", ns=self.netns)
-        ip(f"-6 addr add fe80::2/64 dev {self._nk_guest_ifname}", ns=self.netns)
-        ip(f"-6 addr add {self.nk_guest_ipv6}/64 dev {self._nk_guest_ifname} nodad", ns=self.netns)
-        ip(f"-6 route add default via fe80::1 dev {self._nk_guest_ifname}", ns=self.netns)
+        ip(f"link set dev {self.nk_guest_ifname} up", ns=self.netns)
+        ip(f"-6 addr add fe80::2/64 dev {self.nk_guest_ifname}", ns=self.netns)
+        ip(f"-6 addr add {self.nk_guest_ipv6}/64 dev {self.nk_guest_ifname} nodad", ns=self.netns)
+        ip(f"-6 route add default via fe80::1 dev {self.nk_guest_ifname}", ns=self.netns)
 
     def _tc_ensure_clsact(self):
         qdisc = json.loads(cmd(f"tc -j qdisc show dev {self.ifname}").stdout)

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
                   ` (4 preceding siblings ...)
  2026-05-08  2:27 ` [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 15:03   ` Stanislav Fomichev
  2026-05-08  2:27 ` [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv Bobby Eshleman
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Adding netkit-based devmem tests is a straightforward copy of the devmem
test commands plus some args for the nk cases, so this patch breaks out
these command builders into helpers used by both.

Though we tried to avoid libraries so as not to raise the barrier to
entry or add complexity (see selftests/drivers/net/README.md, section
"Avoid libraries and frameworks"), factoring out these functions seemed
like the lesser of two evils here, since both environments use the same
commands with only slightly different args.

I experimented with keeping all of the tests in one file to avoid having
helpers in a library file. However, ksft_run() is limited to a single
call per file and the new tests require different environments
(NetDrvContEnv/NetDrvEpEnv), so each test would have had to set up its
own environment instead of sharing one for the entire ksft_run() run.
That ballooned the test time (from under 5s to 30s on my test system).
To strike a balance, the tests are placed in separate files so that all
tests using the same env type share one environment across a single
ksft_run() run (introduced in subsequent patches).

The helpers work transparently with both plain and netkit environments
by inspecting cfg for netkit-specific attributes (netns, nk_queue,
etc...).
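
A minimal sketch of that attribute inspection, assuming a cfg object
that carries the attribute names used in this patch (netns,
nk_guest_ifname, nk_guest_ipv6, ifname, addr). The function
pick_rx_target() and the SimpleNamespace stand-ins are hypothetical and
are not part of the library:

```python
from types import SimpleNamespace


def pick_rx_target(cfg):
    """Return (ifname, addr) for the RX listener (hypothetical helper)."""
    if hasattr(cfg, "netns"):
        # Netkit environment: listen on the guest interface in the netns.
        return cfg.nk_guest_ifname, cfg.nk_guest_ipv6
    # Plain environment: listen on the physical NIC.
    return cfg.ifname, cfg.addr


# Stand-ins for the two environment types; all values are made up.
plain = SimpleNamespace(ifname="eth0", addr="192.0.2.1")
netkit = SimpleNamespace(ifname="eth0", addr="192.0.2.1", netns="ns0",
                         nk_guest_ifname="nk1",
                         nk_guest_ipv6="2001:db8::2")

print(pick_rx_target(plain))
print(pick_rx_target(netkit))
```

Note that hasattr() only tests for the presence of the attribute, so an
environment that defines netns but leaves it set to None would still
take the netkit branch in a check written this way.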

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v4:
- Make socat_send() always bind the source; drop its bind= parameter
  and the matching bind=not_ns at the run_rx call site.
- Drop socat_send()'s nodelay= arg; have buf_size>0 imply TCP_NODELAY
  since they are only meaningful together.
- configure_nic(): stash originals on cfg instead of using defer(); add
  paired cleanup_nic() helper. Drop the per-test configure_nic() calls
  from run_rx/run_tx/run_tx_chunks/run_rx_hds; the netkit test file
  invokes configure_nic/cleanup_nic once around ksft_run().
- Make cfg.devmem_supported and cfg.devmem_probed public attrs (no '_')
  for the sake of linting
- general cleanup of the code, linting fixes

Changes in v3:
- In setup_test, drop the unused cfg.listen_ns = getattr(cfg, 'netns',
  None) assignment.
- In run_rx, pass flow_steer=not_ns to ncdevmem_rx and bind=not_ns to
  socat_send to avoid changing functionality (we want just a straight
  refactor here)

Changes in v2:
- Move require_devmem() into individual test functions so KsftSkipEx goes up to
  ksft_run() (Sashiko)
- in ncdevmem_rx(), move -v 7 to take effect for both netns and
  non-netns when verify=True
---
 tools/testing/selftests/drivers/net/hw/devmem.py   |  77 ++------
 .../selftests/drivers/net/hw/lib/py/devmem.py      | 218 +++++++++++++++++++++
 2 files changed, 231 insertions(+), 64 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/devmem.py b/tools/testing/selftests/drivers/net/hw/devmem.py
index ee863e90d1e0..dbc1e6a27b6a 100755
--- a/tools/testing/selftests/drivers/net/hw/devmem.py
+++ b/tools/testing/selftests/drivers/net/hw/devmem.py
@@ -2,91 +2,40 @@
 # SPDX-License-Identifier: GPL-2.0
 
 from os import path
-from lib.py import ksft_run, ksft_exit
-from lib.py import ksft_eq, KsftSkipEx
+from lib.py import ksft_run, ksft_exit, ksft_disruptive
 from lib.py import NetDrvEpEnv
-from lib.py import bkg, cmd, rand_port, wait_port_listen
-from lib.py import ksft_disruptive
-
-
-def require_devmem(cfg):
-    if not hasattr(cfg, "_devmem_probed"):
-        probe_command = f"{cfg.bin_local} -f {cfg.ifname}"
-        cfg._devmem_supported = cmd(probe_command, fail=False, shell=True).ret == 0
-        cfg._devmem_probed = True
-
-    if not cfg._devmem_supported:
-        raise KsftSkipEx("Test requires devmem support")
+from lib.py.devmem import setup_test, run_rx, run_tx, run_tx_chunks, run_rx_hds
 
 
 @ksft_disruptive
 def check_rx(cfg) -> None:
-    require_devmem(cfg)
-
-    port = rand_port()
-    socat = f"socat -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},bind={cfg.remote_baddr}:{port}"
-    listen_cmd = f"{cfg.bin_local} -l -f {cfg.ifname} -s {cfg.addr} -p {port} -c {cfg.remote_addr} -v 7"
-
-    with bkg(listen_cmd, exit_wait=True) as ncdevmem:
-        wait_port_listen(port)
-        cmd(f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | \
-            head -c 1K | {socat}", host=cfg.remote, shell=True)
-
-    ksft_eq(ncdevmem.ret, 0)
+    """Run the devmem RX test."""
+    run_rx(cfg)
 
 
 @ksft_disruptive
 def check_tx(cfg) -> None:
-    require_devmem(cfg)
-
-    port = rand_port()
-    listen_cmd = f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}"
-
-    with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
-        wait_port_listen(port, host=cfg.remote)
-        cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_local} -f {cfg.ifname} -s {cfg.remote_addr} -p {port}", shell=True)
-
-    ksft_eq(socat.stdout.strip(), "hello\nworld")
+    """Run the devmem TX test."""
+    run_tx(cfg)
 
 
 @ksft_disruptive
 def check_tx_chunks(cfg) -> None:
-    require_devmem(cfg)
-
-    port = rand_port()
-    listen_cmd = f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}"
-
-    with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
-        wait_port_listen(port, host=cfg.remote)
-        cmd(f"echo -e \"hello\\nworld\"| {cfg.bin_local} -f {cfg.ifname} -s {cfg.remote_addr} -p {port} -z 3", shell=True)
-
-    ksft_eq(socat.stdout.strip(), "hello\nworld")
+    """Run the devmem TX chunking test."""
+    run_tx_chunks(cfg)
 
 
 def check_rx_hds(cfg) -> None:
-    """Test HDS splitting across payload sizes."""
-    require_devmem(cfg)
-
-    for size in [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]:
-        port = rand_port()
-        listen_cmd = f"{cfg.bin_local} -L -l -f {cfg.ifname} -s {cfg.addr} -p {port}"
-
-        with bkg(listen_cmd, exit_wait=True) as ncdevmem:
-            wait_port_listen(port)
-            cmd(f"dd if=/dev/zero bs={size} count=1 2>/dev/null | " +
-                f"socat -b {size} -u - TCP{cfg.addr_ipver}:{cfg.baddr}:{port},nodelay",
-                host=cfg.remote, shell=True)
-
-        ksft_eq(ncdevmem.ret, 0, f"HDS failed for payload size {size}")
+    """Run the HDS test."""
+    run_rx_hds(cfg)
 
 
 def main() -> None:
+    """Run the devmem test cases."""
     with NetDrvEpEnv(__file__) as cfg:
-        cfg.bin_local = path.abspath(path.dirname(__file__) + "/ncdevmem")
-        cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
-
+        setup_test(cfg, path.abspath(path.dirname(__file__) + "/ncdevmem"))
         ksft_run([check_rx, check_tx, check_tx_chunks, check_rx_hds],
-                 args=(cfg, ))
+                 args=(cfg,))
     ksft_exit()
 
 
diff --git a/tools/testing/selftests/drivers/net/hw/lib/py/devmem.py b/tools/testing/selftests/drivers/net/hw/lib/py/devmem.py
new file mode 100644
index 000000000000..d3e7a3645cba
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/lib/py/devmem.py
@@ -0,0 +1,218 @@
+# SPDX-License-Identifier: GPL-2.0
+"""Shared helpers for devmem TCP selftests."""
+
+import re
+
+from net.lib.py import (bkg, cmd, defer, ethtool, rand_port, wait_port_listen,
+                        ksft_eq, KsftSkipEx, NetNSEnter, EthtoolFamily,
+                        NetdevFamily)
+
+
+def require_devmem(cfg):
+    """Probe ncdevmem on cfg.ifname and SKIP the test if devmem isn't supported."""
+    if not hasattr(cfg, "devmem_probed"):
+        probe_command = f"{cfg.bin_local} -f {cfg.ifname}"
+        cfg.devmem_supported = cmd(probe_command, fail=False, shell=True).ret == 0
+        cfg.devmem_probed = True
+
+    if not cfg.devmem_supported:
+        raise KsftSkipEx("Test requires devmem support")
+
+
+def configure_nic(cfg):
+    """Channels, rings, RSS, queue lease for netkit devmem."""
+    cfg.require_ipver('6')
+    ethnl = EthtoolFamily()
+
+    channels = ethnl.channels_get({'header': {'dev-index': cfg.ifindex}})
+    channels = channels['combined-count']
+    if channels < 2:
+        raise KsftSkipEx(
+            'Test requires NETIF with at least 2 combined channels'
+        )
+
+    rings = ethnl.rings_get({'header': {'dev-index': cfg.ifindex}})
+    cfg.orig_rx_rings = rings['rx']
+    cfg.orig_hds_thresh = rings.get('hds-thresh', 0)
+    cfg.orig_data_split = rings.get('tcp-data-split', 'unknown')
+
+    ethnl.rings_set({'header': {'dev-index': cfg.ifindex},
+                     'tcp-data-split': 'enabled',
+                     'hds-thresh': 0,
+                     'rx': min(64, cfg.orig_rx_rings)})
+
+    cfg.src_queue = channels - 1
+    ethtool(f"-X {cfg.ifname} equal {cfg.src_queue}")
+
+    with NetNSEnter(str(cfg.netns)):
+        netdevnl = NetdevFamily()
+        lease_result = netdevnl.queue_create({
+            "ifindex": cfg.nk_guest_ifindex,
+            "type": "rx",
+            "lease": {
+                "ifindex": cfg.ifindex,
+                "queue": {"id": cfg.src_queue, "type": "rx"},
+                "netns-id": 0,
+            },
+        })
+        cfg.nk_queue = lease_result['id']
+
+
+def cleanup_nic(cfg):
+    """Undo configure_nic() by restoring RSS and ring settings."""
+    ethtool(f"-X {cfg.ifname} default")
+    EthtoolFamily().rings_set({'header': {'dev-index': cfg.ifindex},
+                               'tcp-data-split': cfg.orig_data_split,
+                               'hds-thresh': cfg.orig_hds_thresh,
+                               'rx': cfg.orig_rx_rings})
+
+
+def set_flow_rule(cfg, port):
+    """Install a flow rule steering to src_queue and return the flow rule ID."""
+    output = ethtool(
+        f"-N {cfg.ifname} flow-type tcp6 dst-port {port}"
+        f" action {cfg.src_queue}"
+    ).stdout
+    return int(re.search(r'ID (\d+)', output).group(1))
+
+
+def ncdevmem_rx(cfg, port, verify=True, fail_on_linear=False, flow_steer=False):
+    """Build the ncdevmem RX listener command."""
+    if hasattr(cfg, 'netns'):
+        flow_rule_id = set_flow_rule(cfg, port)
+        defer(ethtool, f"-N {cfg.ifname} delete {flow_rule_id}")
+
+        ifname = cfg.nk_guest_ifname
+        addr = cfg.nk_guest_ipv6
+        extras = [f"-t {cfg.nk_queue}", "-q 1", "-n"]
+    else:
+        ifname = cfg.ifname
+        addr = cfg.addr
+        extras = []
+        if flow_steer:
+            extras.append(f"-c {cfg.remote_addr}")
+
+    if verify:
+        extras.append("-v 7")
+    if fail_on_linear:
+        extras.append("-L")
+
+    parts = [cfg.bin_local, "-l", f"-f {ifname}", f"-s {addr}",
+             f"-p {port}", *extras]
+    return " ".join(parts)
+
+
+def ncdevmem_tx(cfg, port, chunk_size=0):
+    """Build the ncdevmem TX send command."""
+    if hasattr(cfg, 'netns'):
+        ifname = cfg.nk_guest_ifname
+        addr = cfg.remote_addr_v['6']
+        extras = ["-t 0", "-q 1", "-n"]
+    else:
+        ifname = cfg.ifname
+        addr = cfg.remote_addr
+        extras = []
+
+    if chunk_size:
+        extras.append(f"-z {chunk_size}")
+
+    parts = [cfg.bin_local, f"-f {ifname}", f"-s {addr}",
+             f"-p {port}", *extras]
+    return " ".join(parts)
+
+
+def socat_send(cfg, port, buf_size=0):
+    """Socat command for sending to the devmem listener.
+
+    When buf_size > 0, force one TCP segment per write of exactly that size by
+    setting socat's buffer (-b) and disabling Nagle (TCP_NODELAY).
+    """
+    proto = f"TCP{cfg.addr_ipver}"
+
+    if hasattr(cfg, 'netns'):
+        addr = f"[{cfg.nk_guest_ipv6}]"
+    else:
+        addr = cfg.baddr
+
+    suffix = f",bind={cfg.remote_baddr}:{port}"
+
+    buf = ""
+    if buf_size:
+        buf = f"-b {buf_size}"
+        suffix += ",nodelay"
+
+    return f"socat {buf} -u - {proto}:{addr}:{port}{suffix}"
+
+
+def socat_listen(cfg, port):
+    """Socat listen command for TX tests."""
+    return f"socat -U - TCP{cfg.addr_ipver}-LISTEN:{port}"
+
+
+def setup_test(cfg, bin_local):
+    """Stash the local ncdevmem path on cfg and deploy it to the remote."""
+    cfg.bin_local = bin_local
+    cfg.bin_remote = cfg.remote.deploy(cfg.bin_local)
+
+
+def run_rx(cfg):
+    """Run the devmem RX test."""
+    require_devmem(cfg)
+    port = rand_port()
+    socat = socat_send(cfg, port)
+    data_pipe = (f"yes $(echo -e \x01\x02\x03\x04\x05\x06) | head -c 1K"
+                 f" | {socat}")
+    netns = getattr(cfg, "netns", None)
+
+    listen_cmd = ncdevmem_rx(cfg, port, flow_steer=not hasattr(cfg, 'netns'))
+    with bkg(listen_cmd, exit_wait=True, ns=netns) as ncdevmem:
+        wait_port_listen(port, proto="tcp", ns=netns)
+        cmd(data_pipe, host=cfg.remote, shell=True)
+    ksft_eq(ncdevmem.ret, 0)
+
+
+def run_tx(cfg):
+    """Run the devmem TX test."""
+    require_devmem(cfg)
+    netns = getattr(cfg, "netns", None)
+    port = rand_port()
+    tx_cmd = ncdevmem_tx(cfg, port)
+    listen_cmd = socat_listen(cfg, port)
+
+    with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
+        wait_port_listen(port, host=cfg.remote)
+        cmd(f"bash -c 'echo -e \"hello\\nworld\" | {tx_cmd}'", ns=netns, shell=True)
+    ksft_eq(socat.stdout.strip(), "hello\nworld")
+
+
+def run_tx_chunks(cfg):
+    """Run the devmem TX chunking test."""
+    require_devmem(cfg)
+    netns = getattr(cfg, "netns", None)
+    port = rand_port()
+    tx_cmd = ncdevmem_tx(cfg, port, chunk_size=3)
+    listen_cmd = socat_listen(cfg, port)
+
+    with bkg(listen_cmd, host=cfg.remote, exit_wait=True) as socat:
+        wait_port_listen(port, host=cfg.remote)
+        cmd(f"bash -c 'echo -e \"hello\\nworld\" | {tx_cmd}'", ns=netns, shell=True)
+    ksft_eq(socat.stdout.strip(), "hello\nworld")
+
+
+def run_rx_hds(cfg):
+    """Run the HDS test by running devmem RX across a segment size sweep."""
+    require_devmem(cfg)
+    netns = getattr(cfg, "netns", None)
+
+    for size in [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]:
+        port = rand_port()
+
+        listen_cmd = ncdevmem_rx(cfg, port, verify=False,
+                                 fail_on_linear=True)
+        socat = socat_send(cfg, port, buf_size=size)
+
+        with bkg(listen_cmd, exit_wait=True, ns=netns) as ncdevmem:
+            wait_port_listen(port, proto="tcp", ns=netns)
+            cmd(f"dd if=/dev/zero bs={size} count=1 2>/dev/null | "
+                f"{socat}", host=cfg.remote, shell=True)
+        ksft_eq(ncdevmem.ret, 0, f"HDS failed for payload size {size}")

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
                   ` (5 preceding siblings ...)
  2026-05-08  2:27 ` [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 15:03   ` Stanislav Fomichev
  2026-05-08  2:27 ` [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests Bobby Eshleman
  2026-05-10 20:33 ` [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Zhu Yanjun
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

When sending from a namespace that has access to a netkit device with a
leased queue, the nk primary in the host namespace needs to redirect its
RX to the physical device. This patch adds that redirection bpf program
and teaches the harness to install it.

Add primary_rx_redirect=False parameter to NetDrvContEnv.__init__().
When enabled, _attach_primary_rx_redirect_bpf() attaches a new BPF TC
program (nk_primary_rx_redirect.bpf.c) to the primary (host-side) netkit
interface. The program redirects non-ICMPv6 IPv6 packets to the physical
NIC via bpf_redirect_neigh(), with the physical ifindex configured via
the .bss map. ICMPv6 is left on the host's netkit primary so IPv6
neighbor discovery still works locally.
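
The pass/redirect decision described above can be modelled in user space
for reasoning about which frames are affected. The sketch below is a
plain Python approximation and not the BPF program itself; classify()
and make_frame() are hypothetical names, and like the description above
it looks only at the first IPv6 next-header field, not at extension
headers.

```python
import struct

ETH_P_IPV6 = 0x86DD      # EtherType for IPv6
IPPROTO_ICMPV6 = 58      # IPv6 next-header value for ICMPv6
ETH_HLEN = 14            # Ethernet header length
IPV6_HLEN = 40           # fixed IPv6 header length


def classify(frame: bytes) -> str:
    """Return "pass" or "redirect" for a raw Ethernet frame."""
    if len(frame) < ETH_HLEN:
        return "pass"                      # truncated Ethernet header
    (proto,) = struct.unpack_from("!H", frame, 12)
    if proto != ETH_P_IPV6:
        return "pass"                      # non-IPv6 stays on the host
    if len(frame) < ETH_HLEN + IPV6_HLEN:
        return "pass"                      # truncated IPv6 header
    next_header = frame[ETH_HLEN + 6]      # nexthdr is byte 6 of the header
    if next_header == IPPROTO_ICMPV6:
        return "pass"                      # keep neighbor discovery local
    return "redirect"                      # would go to the physical NIC


def make_frame(ethertype: int, next_header: int = 0) -> bytes:
    """Build a zero-filled test frame with the given type fields."""
    eth = bytes(12) + struct.pack("!H", ethertype)
    ip6 = bytearray(IPV6_HLEN)
    ip6[0] = 0x60                          # IP version 6
    ip6[6] = next_header
    return eth + bytes(ip6)


print(classify(make_frame(ETH_P_IPV6, 6)))               # TCP over IPv6
print(classify(make_frame(ETH_P_IPV6, IPPROTO_ICMPV6)))  # ICMPv6
print(classify(make_frame(0x0800)))                      # IPv4 EtherType
```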

Extract _find_bss_map_id() from _attach_bpf() into a reusable helper so
other BPF attachment methods can use it.
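
As a rough illustration of what the helper and its callers do, the
sketch below separates the two steps: picking the map whose name ends
in "bss", and encoding the ifindex as the hex byte string handed to
bpftool. The get_map_info callable is an assumption of this sketch (a
stand-in for the bpftool JSON query), and the little-endian byte order
simply mirrors what env.py does.

```python
def find_bss_map_id(map_ids, get_map_info):
    """Return the first map ID whose name ends in "bss".

    get_map_info is a hypothetical callable returning the map's JSON
    description as a dict; it stands in for the bpftool query.
    """
    for map_id in map_ids:
        if get_map_info(map_id).get("name", "").endswith("bss"):
            return map_id
    raise LookupError("no .bss map found")


def ifindex_to_hex(ifindex: int) -> str:
    """Encode a 32-bit ifindex as space-separated little-endian hex bytes."""
    return " ".join(f"{b:02x}" for b in ifindex.to_bytes(4, byteorder="little"))
```

With an ifindex of 3, the second helper yields "03 00 00 00", which is
the shape of the value passed after "value hex" in the map update.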

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v3:
- nk_primary_rx_redirect.bpf.c: add header includes to avoid hardcoding
  values
- update commit message explaining why ICMP is passed through
- env.py: re-use _tc_ensure_clsact() (had to add ifname parameter)
- env.py: gate the remote IPv6 host route install on primary_rx_redirect
  by moving it from _setup_ns() into _attach_primary_rx_redirect_bpf()
---
 .../drivers/net/hw/nk_primary_rx_redirect.bpf.c    | 39 +++++++++
 tools/testing/selftests/drivers/net/lib/py/env.py  | 93 +++++++++++++++++-----
 2 files changed, 114 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/nk_primary_rx_redirect.bpf.c b/tools/testing/selftests/drivers/net/hw/nk_primary_rx_redirect.bpf.c
new file mode 100644
index 000000000000..46ff494b23de
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/nk_primary_rx_redirect.bpf.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ipv6.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#define ctx_ptr(field)		((void *)(long)(field))
+
+volatile __u32 phys_ifindex;
+
+SEC("tc/ingress")
+int nk_primary_rx_redirect(struct __sk_buff *skb)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	struct ethhdr *eth;
+	struct ipv6hdr *ip6h;
+
+	eth = data;
+	if ((void *)(eth + 1) > data_end)
+		return TC_ACT_OK;
+
+	if (eth->h_proto != bpf_htons(ETH_P_IPV6))
+		return TC_ACT_OK;
+
+	ip6h = data + sizeof(struct ethhdr);
+	if ((void *)(ip6h + 1) > data_end)
+		return TC_ACT_OK;
+
+	if (ip6h->nexthdr == IPPROTO_ICMPV6)
+		return TC_ACT_OK;
+
+	return bpf_redirect_neigh(phys_ifindex, NULL, 0, 0);
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 409b41922245..af8e1de8ed7b 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -336,15 +336,18 @@ class NetDrvContEnv(NetDrvEpEnv):
               +---------------+
     """
 
-    def __init__(self, src_path, rxqueues=1, **kwargs):
+    def __init__(self, src_path, rxqueues=1, primary_rx_redirect=False, **kwargs):
         self.netns = None
         self._nk_host_ifname = None
         self.nk_guest_ifname = None
         self._tc_clsact_added = False
         self._tc_attached = False
+        self._primary_rx_redirect_attached = False
+        self._primary_rx_redirect_clsact_added = False
         self._bpf_prog_pref = None
         self._bpf_prog_id = None
         self._init_ns_attached = False
+        self._remote_route_added = False
         self._old_fwd = None
         self._old_accept_ra = None
 
@@ -396,8 +399,18 @@ class NetDrvContEnv(NetDrvEpEnv):
 
         self._setup_ns()
         self._attach_bpf()
+        if primary_rx_redirect:
+            self._attach_primary_rx_redirect_bpf()
 
     def __del__(self):
+        if self._primary_rx_redirect_attached:
+            cmd(f"tc filter del dev {self._nk_host_ifname} ingress", fail=False)
+            self._primary_rx_redirect_attached = False
+
+        if self._primary_rx_redirect_clsact_added:
+            cmd(f"tc qdisc del dev {self._nk_host_ifname} clsact", fail=False)
+            self._primary_rx_redirect_clsact_added = False
+
         if self._tc_attached:
             cmd(f"tc filter del dev {self.ifname} ingress pref {self._bpf_prog_pref}")
             self._tc_attached = False
@@ -406,6 +419,11 @@ class NetDrvContEnv(NetDrvEpEnv):
             cmd(f"tc qdisc del dev {self.ifname} clsact")
             self._tc_clsact_added = False
 
+        if self._remote_route_added:
+            cmd(f"ip -6 route del {self.nk_guest_ipv6}/128",
+                host=self.remote, fail=False)
+            self._remote_route_added = False
+
         if self._nk_host_ifname:
             cmd(f"ip link del dev {self._nk_host_ifname}")
             self._nk_host_ifname = None
@@ -459,13 +477,19 @@ class NetDrvContEnv(NetDrvEpEnv):
         ip(f"-6 addr add {self.nk_guest_ipv6}/64 dev {self.nk_guest_ifname} nodad", ns=self.netns)
         ip(f"-6 route add default via fe80::1 dev {self.nk_guest_ifname}", ns=self.netns)
 
-    def _tc_ensure_clsact(self):
-        qdisc = json.loads(cmd(f"tc -j qdisc show dev {self.ifname}").stdout)
+    def _tc_ensure_clsact(self, ifname=None):
+        """Ensure a clsact qdisc exists on @ifname.
+
+        Returns True if this call added the qdisc, otherwise returns False.
+        """
+        if ifname is None:
+            ifname = self.ifname
+        qdisc = json.loads(cmd(f"tc -j qdisc show dev {ifname}").stdout)
         for q in qdisc:
             if q['kind'] == 'clsact':
-                return
-        cmd(f"tc qdisc add dev {self.ifname} clsact")
-        self._tc_clsact_added = True
+                return False
+        cmd(f"tc qdisc add dev {ifname} clsact")
+        return True
 
     def _get_bpf_prog_ids(self):
         filters = json.loads(cmd(f"tc -j filter show dev {self.ifname} ingress").stdout)
@@ -476,28 +500,28 @@ class NetDrvContEnv(NetDrvEpEnv):
                 return (bpf['pref'], bpf['options']['prog']['id'])
         raise Exception("Failed to get BPF prog ID")
 
+    def _find_bss_map_id(self, prog_id):
+        """Find the .bss map ID for a loaded BPF program."""
+        prog_info = bpftool(f"prog show id {prog_id}", json=True)
+        for map_id in prog_info.get("map_ids", []):
+            map_info = bpftool(f"map show id {map_id}", json=True)
+            if map_info.get("name", "").endswith("bss"):
+                return map_id
+        raise Exception(f"Failed to find .bss map for prog {prog_id}")
+
     def _attach_bpf(self):
         bpf_obj = self.test_dir / "nk_forward.bpf.o"
         if not bpf_obj.exists():
             raise KsftSkipEx("BPF prog not found")
 
-        self._tc_ensure_clsact()
+        if self._tc_ensure_clsact():
+            self._tc_clsact_added = True
         cmd(f"tc filter add dev {self.ifname} ingress bpf obj {bpf_obj}"
             " sec tc/ingress direct-action")
         self._tc_attached = True
 
         (self._bpf_prog_pref, self._bpf_prog_id) = self._get_bpf_prog_ids()
-        prog_info = bpftool(f"prog show id {self._bpf_prog_id}", json=True)
-        map_ids = prog_info.get("map_ids", [])
-
-        bss_map_id = None
-        for map_id in map_ids:
-            map_info = bpftool(f"map show id {map_id}", json=True)
-            if map_info.get("name").endswith("bss"):
-                bss_map_id = map_id
-
-        if bss_map_id is None:
-            raise Exception("Failed to find .bss map")
+        bss_map_id = self._find_bss_map_id(self._bpf_prog_id)
 
         ipv6_addr = ipaddress.IPv6Address(self.ipv6_prefix)
         ipv6_bytes = ipv6_addr.packed
@@ -505,3 +529,36 @@ class NetDrvContEnv(NetDrvEpEnv):
         value = ipv6_bytes + ifindex_bytes
         value_hex = ' '.join(f'{b:02x}' for b in value)
         bpftool(f"map update id {bss_map_id} key hex 00 00 00 00 value hex {value_hex}")
+
+    def _attach_primary_rx_redirect_bpf(self):
+        """Attach BPF redirect program on the primary netkit ingress."""
+        bpf_obj = self.test_dir / "nk_primary_rx_redirect.bpf.o"
+        if not bpf_obj.exists():
+            raise KsftSkipEx("Primary RX redirect BPF prog not found")
+
+        if self._tc_ensure_clsact(self._nk_host_ifname):
+            self._primary_rx_redirect_clsact_added = True
+        cmd(f"tc filter add dev {self._nk_host_ifname} ingress"
+            f" bpf obj {bpf_obj} sec tc/ingress direct-action")
+        self._primary_rx_redirect_attached = True
+
+        ip(f"-6 route add {self.nk_guest_ipv6}/128 via {self.addr_v['6']}",
+           host=self.remote)
+        self._remote_route_added = True
+
+        filters = json.loads(
+            cmd(f"tc -j filter show dev {self._nk_host_ifname} ingress").stdout)
+        redirect_prog_id = None
+        for bpf in filters:
+            if 'options' not in bpf:
+                continue
+            if bpf['options']['bpf_name'].startswith('nk_primary_rx_redirect'):
+                redirect_prog_id = bpf['options']['prog']['id']
+                break
+        if redirect_prog_id is None:
+            raise Exception("Failed to get primary RX redirect BPF prog ID")
+
+        bss_map_id = self._find_bss_map_id(redirect_prog_id)
+        phys_ifindex_bytes = self.ifindex.to_bytes(4, byteorder='little')
+        value_hex = ' '.join(f'{b:02x}' for b in phys_ifindex_bytes)
+        bpftool(f"map update id {bss_map_id} key hex 00 00 00 00 value hex {value_hex}")

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
                   ` (6 preceding siblings ...)
  2026-05-08  2:27 ` [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv Bobby Eshleman
@ 2026-05-08  2:27 ` Bobby Eshleman
  2026-05-08 15:03   ` Stanislav Fomichev
  2026-05-10 20:33 ` [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Zhu Yanjun
  8 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08  2:27 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Bobby Eshleman

From: Bobby Eshleman <bobbyeshleman@meta.com>

Add nk_devmem.py with four tests for TCP devmem through a netkit device:
check_nk_rx, check_nk_tx, check_nk_tx_chunks, and check_nk_rx_hds.

These tests are just duplicates of the original devmem tests, with some
adjusted parameters such as telling ncdevmem to avoid device setup
(since it only has access to netkit, not a phys device).

Each test uses NetDrvContEnv with primary_rx_redirect=True to set up the
BPF redirect program on the primary netkit interface.

The NIC (HDS, RSS, queue lease) is configured once in main() before
ksft_run() and torn down in a finally block via cleanup_nic(), mirroring
the nk_qlease.py pattern. This avoids re-toggling NIC settings around
every test case.
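
The configure-once pattern can be sketched generically as below. The
run_suite() name and its callable arguments are hypothetical and only
illustrate the ordering; the real test uses configure_nic(),
cleanup_nic() and ksft_run() as shown in the diff.

```python
def run_suite(configure, cleanup, tests):
    """Configure once, run every test, and always clean up (illustrative)."""
    results = []
    configure()
    try:
        for test in tests:
            results.append(test())
    finally:
        # Runs even if a test raises, so device settings are restored.
        cleanup()
    return results
```

Placing the cleanup in the finally clause is what guarantees the
teardown when the test run exits with an exception.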

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v4:
- Call configure_nic()/cleanup_nic() once around ksft_run() rather than
  relying on per-test configuration inside the run_* helpers.

Changes in v3:
- Reorder os.path expressions
- Drop @ksft_disruptive from check_nk_rx_hds to mirror the original
  check_rx_hds in devmem.py

Changes in v2:
- Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
---
 tools/testing/selftests/drivers/net/hw/Makefile    |  1 +
 .../testing/selftests/drivers/net/hw/nk_devmem.py  | 55 ++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/hw/Makefile b/tools/testing/selftests/drivers/net/hw/Makefile
index 85ca4d1ecf9e..2f78c6aec397 100644
--- a/tools/testing/selftests/drivers/net/hw/Makefile
+++ b/tools/testing/selftests/drivers/net/hw/Makefile
@@ -34,6 +34,7 @@ TEST_PROGS = \
 	irq.py \
 	loopback.sh \
 	nic_timestamp.py \
+	nk_devmem.py \
 	nk_netns.py \
 	nk_qlease.py \
 	ntuple.py \
diff --git a/tools/testing/selftests/drivers/net/hw/nk_devmem.py b/tools/testing/selftests/drivers/net/hw/nk_devmem.py
new file mode 100755
index 000000000000..0e36a0fa9688
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/hw/nk_devmem.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""Test devmem TCP with netkit."""
+
+import os
+from lib.py import ksft_run, ksft_exit, ksft_disruptive
+from lib.py import NetDrvContEnv
+from lib.py.devmem import (setup_test, require_devmem, configure_nic,
+                           cleanup_nic, run_rx, run_tx, run_tx_chunks,
+                           run_rx_hds)
+
+
+@ksft_disruptive
+def check_nk_rx(cfg) -> None:
+    """Run the devmem RX test through netkit."""
+    run_rx(cfg)
+
+
+@ksft_disruptive
+def check_nk_tx(cfg) -> None:
+    """Run the devmem TX test through netkit."""
+    run_tx(cfg)
+
+
+@ksft_disruptive
+def check_nk_tx_chunks(cfg) -> None:
+    """Run the devmem TX chunking test through netkit."""
+    run_tx_chunks(cfg)
+
+
+def check_nk_rx_hds(cfg) -> None:
+    """Run the HDS test through netkit."""
+    run_rx_hds(cfg)
+
+
+def main() -> None:
+    """Configure the NIC once, then run the netkit devmem test cases."""
+    with NetDrvContEnv(__file__, rxqueues=2, primary_rx_redirect=True) as cfg:
+        setup_test(cfg,
+                   os.path.join(os.path.dirname(os.path.abspath(__file__)),
+                                "ncdevmem"))
+
+        require_devmem(cfg)
+        configure_nic(cfg)
+        try:
+            ksft_run([check_nk_rx, check_nk_tx, check_nk_tx_chunks,
+                      check_nk_rx_hds], args=(cfg,))
+        finally:
+            cleanup_nic(cfg)
+
+    ksft_exit()
+
+
+if __name__ == "__main__":
+    main()

-- 
2.53.0-Meta


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum
  2026-05-08  2:27 ` [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum Bobby Eshleman
@ 2026-05-08 14:56   ` Stanislav Fomichev
  2026-05-08 16:11     ` Bobby Eshleman
  0 siblings, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 14:56 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> Devices that support netmem TX previously set dev->netmem_tx = true.
> This was checked in validate_xmit_unreadable_skb() to drop unreadable
> skbs (skbs with dmabuf-backed frags) before they reach drivers that
> would mishandle them or devices that would not have the iommu mappings
> for them.
> 
> A subsequent patch will introduce a third state for virtual devices
> that forward unreadable skbs without ever performing DMA on them. To
> prepare for that, convert the boolean dev->netmem_tx into an enum:
> 
> NETMEM_TX_NONE   - no netmem TX support (drop unreadable skbs)
> NETMEM_TX_DMA    - full support, device does DMA
> 
> Update the existing NIC drivers (bnxt, gve, mlx5, fbnic) and the
> validators in net/core to use the new enum. No functional change.
> 
> Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
> Changes in v3:
> - Split NO_DMA changes into subsequent commit (Jakub)
> - Move !netdev->netmem_tx -> netdev->netmem_tx ==
>   NETMEM_TX_NONE conversions to this patch (Jakub)
> 
> Changes in v2:
> - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> ---
>  Documentation/networking/netmem.rst                    | 5 ++++-
>  Documentation/translations/zh_CN/networking/netmem.rst | 4 +++-
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c              | 2 +-
>  drivers/net/ethernet/google/gve/gve_main.c             | 2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c      | 2 +-
>  drivers/net/ethernet/meta/fbnic/fbnic_netdev.c         | 2 +-
>  include/linux/netdevice.h                              | 8 +++++++-
>  net/core/dev.c                                         | 2 +-
>  net/core/netdev-genl.c                                 | 2 +-
>  9 files changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
> index b63aded46337..5ccadba4f373 100644
> --- a/Documentation/networking/netmem.rst
> +++ b/Documentation/networking/netmem.rst
> @@ -95,4 +95,7 @@ Driver TX Requirements
>     netdev@, or reach out to the maintainers and/or almasrymina@google.com for
>     help adding the netmem API.
>  
> -2. Driver should declare support by setting `netdev->netmem_tx = true`
> +2. Driver should declare support by setting `netdev->netmem_tx` to the
> +   appropriate mode:
> +
> +   - `NETMEM_TX_DMA`: for physical devices that perform DMA.
> diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
> index fe351a240f02..9c84423b7528 100644
> --- a/Documentation/translations/zh_CN/networking/netmem.rst
> +++ b/Documentation/translations/zh_CN/networking/netmem.rst
> @@ -89,4 +89,6 @@ dma-mapping API 去处理。
>  使用某个还不存在的 netmem API,你可以自行添加并提交到 netdev@,也可以联系维护
>  人员或者发送邮件至 almasrymina@google.com 寻求帮助。
>  
> -2. 驱动程序应通过设置 netdev->netmem_tx = true 来表明自身支持 netmem 功能。
> +2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
> +
> +   - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 8c55874f44ca..ed9c22dc4a5a 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -17120,7 +17120,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
>  	if (BNXT_SUPPORTS_QUEUE_API(bp))
>  		dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
> -	dev->netmem_tx = true;
> +	dev->netmem_tx = NETMEM_TX_DMA;
>  
>  	rc = register_netdev(dev);
>  	if (rc)
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index 424d973c97f2..dd2b8f087163 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -2894,7 +2894,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		goto abort_with_wq;
>  
>  	if (!gve_is_gqi(priv) && !gve_is_qpl(priv))
> -		dev->netmem_tx = true;
> +		dev->netmem_tx = NETMEM_TX_DMA;
>  
>  	err = register_netdev(dev);
>  	if (err)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 5a46870c4b74..fc49aae38807 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -5924,7 +5924,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>  
>  	netdev->priv_flags       |= IFF_UNICAST_FLT;
>  
> -	netdev->netmem_tx = true;
> +	netdev->netmem_tx = NETMEM_TX_DMA;
>  
>  	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
>  	mlx5e_set_xdp_feature(priv);
> diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> index c406a3b56b37..138e522ef9b9 100644
> --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> @@ -752,7 +752,7 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
>  	netdev->netdev_ops = &fbnic_netdev_ops;
>  	netdev->stat_ops = &fbnic_stat_ops;
>  	netdev->queue_mgmt_ops = &fbnic_queue_mgmt_ops;
> -	netdev->netmem_tx = true;
> +	netdev->netmem_tx = NETMEM_TX_DMA;
>  
>  	fbnic_set_ethtool_ops(netdev);
>  
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0e1e581efc5a..580bccb118a0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1788,6 +1788,11 @@ enum netdev_stat_type {
>  	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
>  };
>  
> +enum netmem_tx_mode {
> +	NETMEM_TX_NONE,		/* no netmem TX support */
> +	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
> +};
> +
>  enum netdev_reg_state {
>  	NETREG_UNINITIALIZED = 0,
>  	NETREG_REGISTERED,	/* completed register_netdevice */
> @@ -1809,7 +1814,8 @@ enum netdev_reg_state {
>   *	@lltx:		device supports lockless Tx. Deprecated for real HW
>   *			drivers. Mainly used by logical interfaces, such as
>   *			bonding and tunnels
> - *	@netmem_tx:	device support netmem_tx.
> + *	@netmem_tx:	device netmem TX mode (NETMEM_TX_NONE or
> + *			NETMEM_TX_DMA).


nit: if you happen to repost, listing enum values here seems too much?

"device netmem TX mode" should be enough

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode
  2026-05-08  2:27 ` [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode Bobby Eshleman
@ 2026-05-08 14:57   ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 14:57 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> Some virtual devices like netkit (or ifb) never DMA and never touch frag
> contents, they just forward the skb to another device. They are unable
> to forward unreadable skbs, however, because they fail to pass TX
> validation checks on dev->netmem_tx. The existing two-state
> NETMEM_TX_NONE / NETMEM_TX_DMA doesn't give the TX validator enough
> information to differentiate devices that will attempt DMA on the
> unreadable skb from those that will simply route it untouched.
> 
> Add a third mode to the enum so drivers can indicate 1) if they have
> netmem TX support, and 2) if they do, whether they are DMA-capable:
> 
> NETMEM_TX_NO_DMA - pass-through, device never DMAs
> 
> Widen dev->netmem_tx from a 1-bit field to 2 bits to fit the new value,
> and declare netkit as NETMEM_TX_NO_DMA. Devmem TX support over these
> devices comes in a follow-up patch.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08  2:27 ` [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices Bobby Eshleman
@ 2026-05-08 15:01   ` Stanislav Fomichev
  2026-05-08 16:19     ` Bobby Eshleman
  2026-05-08 20:44     ` Jakub Kicinski
  2026-05-08 20:47   ` Jakub Kicinski
  1 sibling, 2 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 15:01 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> When a netkit virtual device leases queues from a physical NIC, devmem
> TX bindings created on the netkit device must still result in the dmabuf
> being mapped for dma by the physical device. This patch accomplishes
> this by teaching the bind handler to search for the underlying
> DMA-capable device by looking it up via leased rx queues. The function
> netdev_find_netmem_tx_dev(), used for finding the underlying DMA-capable
> device, can be extended to support other non-netkit NETMEM_TX_NO_DMA
> devices in the future if needed.
> 
> Additionally, this patch extends validate_xmit_unreadable_skb() to
> support the netkit case, where the skb is validated twice: once on the
> netkit guest device and again on the physical NIC after BPF redirect or
> ip forwarding.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
> Changes in v3:
> - Fix validate_xmit_unreadable_skb() bug for non-devmem
>   unreadable niovs (should not be dropped)
> - Major simplification of validate_xmit_unreadable_skb()
> - Fix prematurely released lock in bind-tx handler (Jakub)
> 
> Changes in v2:
> - Change validate_xmit_unreadable_skb() to check netmem_tx mode before
>   inspecting frags (Jakub)
> - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev !=
>   netdev to fix lockdep (Sashiko)
> ---
>  net/core/dev.c         |  3 +++
>  net/core/devmem.c      |  6 +++--
>  net/core/devmem.h      |  9 ++++++--
>  net/core/netdev-genl.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++----
>  4 files changed, 72 insertions(+), 9 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fbe4c328a367..268417c9ef22 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3999,6 +3999,9 @@ static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
>  	if (dev->netmem_tx == NETMEM_TX_NONE)
>  		goto out_free;
>  
> +	if (dev->netmem_tx == NETMEM_TX_NO_DMA)
> +		goto out;
> +

Since this is a good case, maybe fold it into skb_frags_readable check above?

	if (likely(skb_frags_readable(skb) || dev->netmem_tx == NETMEM_TX_NO_DMA))

Otherwise it's a bit confusing to have:

if (xxx)
	goto out;
if (yyy)
	goto out_free;
if (zzz)
	goto out;

(or, reorder to be out/out/out_free)
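
To make the folded ordering concrete, here is a small Python model of
the control flow. It is a sketch under assumptions, not the kernel
code: the NetmemTx enum values are arbitrary, and dma_checks_ok is a
made-up stand-in for whatever per-frag checks remain for
NETMEM_TX_DMA devices.

```python
from enum import Enum, auto


class NetmemTx(Enum):
    # Numeric values are arbitrary here; only the three names matter.
    NONE = auto()
    DMA = auto()
    NO_DMA = auto()


def validate(frags_readable: bool, mode: NetmemTx, dma_checks_ok: bool) -> str:
    """Model of the validator with both pass-through cases folded together."""
    # "out": readable skbs and pass-through devices need no further checks.
    if frags_readable or mode is NetmemTx.NO_DMA:
        return "pass"
    # "out_free": unreadable skb on a device without netmem TX support.
    if mode is NetmemTx.NONE:
        return "drop"
    # NETMEM_TX_DMA: outcome depends on the remaining checks (stand-in).
    return "pass" if dma_checks_ok else "drop"
```

Folding the two good cases into the first test leaves a single pass
check followed only by drop conditions.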

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
  2026-05-08  2:27 ` [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration Bobby Eshleman
@ 2026-05-08 15:01   ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 15:01 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> Add a -n (skip_config) flag that causes ncdevmem to skip NIC
> configuration when operating as an RX server. When -n is passed,
> ncdevmem skips configuring header split, RSS, and flow steering, as well
> as their teardown on exit.
> 
> This allows ksft tests to pre-configure the NIC in the host namespace
> before launching ncdevmem in the guest namespace. This is needed for
> netkit devmem tests where the test harness namespace has direct access
> to the NIC and the ncdevmem namespace does not.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public
  2026-05-08  2:27 ` [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public Bobby Eshleman
@ 2026-05-08 15:01   ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 15:01 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> Subsequent patches will use the _nk_guest_ifname as a public attr for
> setting up devmem. Rename to nk_guest_ifname to avoid angering the
> linter about the '_' prefix being used for a non-private attr.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module
  2026-05-08  2:27 ` [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module Bobby Eshleman
@ 2026-05-08 15:03   ` Stanislav Fomichev
  2026-05-08 16:19     ` Bobby Eshleman
  0 siblings, 1 reply; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 15:03 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> Adding netkit-based devmem tests is a straightforward copy of devmem
> test commands plus some args for the nk cases, so this patch breaks out
> these command builders into helpers used by both.
> 
> Though we tried to avoid libraries to avoid increasing the barrier to
> entry/complexity (see selftests/drivers/net/README.md, section "Avoid
> libraries and frameworks"), factoring out these functions seemed like
> the lesser of two evils in this case of using the same commands, just
> with slightly different args per environment.
> 
> I experimented with just having all of the tests in the same file to
> avoid having helpers in a library file, but because ksft_run() is
> limited to a single call per file, and the new tests will require
> different environments (NetDrvContEnv/NetDrvEpEnv), it would have been
> necessary to have each test set up its own environment instead of
> sharing one for the entire ksft_run() run. This came at the cost of
> ballooning the test time (from under 5s to 30s on my test system), so to
> strike a balance these tests were placed in separate files so that all
> tests using the same env type can share one environment across a single
> ksft_run() run (introduced in subsequent patches).
> 
> The helpers work transparently with both plain and netkit environments
> by inspecting cfg for netkit-specific attributes (netns, nk_queue,
> etc...).
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---

[..]

> Changes in v4:

This is a v3, but you already have changes for v4 :-p

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

* Re: [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
  2026-05-08  2:27 ` [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv Bobby Eshleman
@ 2026-05-08 15:03   ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 15:03 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> When sending from a namespace that has access to a netkit device with a
> leased queue, the nk primary in the host namespace needs to redirect its
> RX to the physical device. This patch adds that redirection bpf program
> and teaches the harness to install it.
> 
> Add primary_rx_redirect=False parameter to NetDrvContEnv.__init__().
> When enabled, _attach_primary_rx_redirect_bpf() attaches a new BPF TC
> program (nk_primary_rx_redirect.bpf.c) to the primary (host-side) netkit
> interface. The program redirects non-ICMPv6 IPv6 packets to the physical
> NIC via bpf_redirect_neigh(), with the physical ifindex configured via
> the .bss map. ICMPv6 is left on the host's netkit primary so IPv6
> neighbor discovery still works locally.
> 
> Extract _find_bss_map_id() from _attach_bpf() into a reusable helper so
> other BPF attachment methods can use it.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>
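
For readers skimming the thread, the classification rule described in the
commit message (redirect IPv6 traffic unless it is ICMPv6) can be modeled in
plain userspace C. This is only a sketch of the decision, not the BPF program
from the patch. The helper name sketch_should_redirect(), the SKETCH_*
constants, and the assumptions that the frame starts with an untagged Ethernet
header followed directly by a fixed IPv6 header (no extension headers) and that
non-IPv6 traffic is left alone are all illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define SKETCH_IPV6_HDR_LEN	40	/* fixed IPv6 header */
#define SKETCH_ETHERTYPE_IPV6	0x86DD
#define SKETCH_NEXTHDR_ICMPV6	58

/*
 * Hypothetical helper: returns true if the frame would be redirected to
 * the physical NIC, false if it stays on the netkit primary.
 */
static bool sketch_should_redirect(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	/* Too short to hold Ethernet + IPv6 headers: leave it alone. */
	if (len < SKETCH_ETH_HDR_LEN + SKETCH_IPV6_HDR_LEN)
		return false;

	/* EtherType is big-endian at bytes 12..13 of the Ethernet header. */
	ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
	if (ethertype != SKETCH_ETHERTYPE_IPV6)
		return false;

	/* Next Header is byte 6 of the IPv6 header. ICMPv6 stays local. */
	return frame[SKETCH_ETH_HDR_LEN + 6] != SKETCH_NEXTHDR_ICMPV6;
}
```

Under these assumptions a TCP-over-IPv6 frame is redirected, while neighbor
discovery (ICMPv6) stays on the host side, which matches the intent stated in
the commit message.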

* Re: [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests
  2026-05-08  2:27 ` [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests Bobby Eshleman
@ 2026-05-08 15:03   ` Stanislav Fomichev
  0 siblings, 0 replies; 27+ messages in thread
From: Stanislav Fomichev @ 2026-05-08 15:03 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 05/07, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> Add nk_devmem.py with four tests for TCP devmem through a netkit device:
> 
> These tests are just duplicates of the original devmem tests, with some
> adjusted parameters such as telling ncdevmem to avoid device setup
> (since it only has access to netkit, not a phys device).
> 
> Each test uses NetDrvContEnv with primary_rx_redirect=True to set up the
> BPF redirect program on the primary netkit interface.
> 
> The NIC (HDS, RSS, queue lease) is configured once in main() before
> ksft_run() and torn down in a finally block via cleanup_nic(), mirroring
> the nk_qlease.py pattern. This avoids re-toggling NIC settings around
> every test case.
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

* Re: [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum
  2026-05-08 14:56   ` Stanislav Fomichev
@ 2026-05-08 16:11     ` Bobby Eshleman
  0 siblings, 0 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08 16:11 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, May 08, 2026 at 07:56:42AM -0700, Stanislav Fomichev wrote:
> On 05/07, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Devices that support netmem TX previously set dev->netmem_tx = true.
> > This was checked in validate_xmit_unreadable_skb() to drop unreadable
> > skbs (skbs with dmabuf-backed frags) before they reach drivers that
> > would mishandle them or devices that would not have the iommu mappings
> > for them.
> > 
> > A subsequent patch will introduce a third state for virtual devices
> > that forward unreadable skbs without ever performing DMA on them. To
> > prepare for that, convert the boolean dev->netmem_tx into an enum:
> > 
> > NETMEM_TX_NONE   - no netmem TX support (drop unreadable skbs)
> > NETMEM_TX_DMA    - full support, device does DMA
> > 
> > Update the existing NIC drivers (bnxt, gve, mlx5, fbnic) and the
> > validators in net/core to use the new enum. No functional change.
> > 
> > Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v3:
> > - Split NO_DMA changes into subsequent commit (Jakub)
> > - Move !netdev->netmem_tx -> netdev->netmem_tx ==
> >   NETMEM_TX_NONE conversions to this patch (Jakub)
> > 
> > Changes in v2:
> > - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> > ---
> >  Documentation/networking/netmem.rst                    | 5 ++++-
> >  Documentation/translations/zh_CN/networking/netmem.rst | 4 +++-
> >  drivers/net/ethernet/broadcom/bnxt/bnxt.c              | 2 +-
> >  drivers/net/ethernet/google/gve/gve_main.c             | 2 +-
> >  drivers/net/ethernet/mellanox/mlx5/core/en_main.c      | 2 +-
> >  drivers/net/ethernet/meta/fbnic/fbnic_netdev.c         | 2 +-
> >  include/linux/netdevice.h                              | 8 +++++++-
> >  net/core/dev.c                                         | 2 +-
> >  net/core/netdev-genl.c                                 | 2 +-
> >  9 files changed, 20 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Documentation/networking/netmem.rst b/Documentation/networking/netmem.rst
> > index b63aded46337..5ccadba4f373 100644
> > --- a/Documentation/networking/netmem.rst
> > +++ b/Documentation/networking/netmem.rst
> > @@ -95,4 +95,7 @@ Driver TX Requirements
> >     netdev@, or reach out to the maintainers and/or almasrymina@google.com for
> >     help adding the netmem API.
> >  
> > -2. Driver should declare support by setting `netdev->netmem_tx = true`
> > +2. Driver should declare support by setting `netdev->netmem_tx` to the
> > +   appropriate mode:
> > +
> > +   - `NETMEM_TX_DMA`: for physical devices that perform DMA.
> > diff --git a/Documentation/translations/zh_CN/networking/netmem.rst b/Documentation/translations/zh_CN/networking/netmem.rst
> > index fe351a240f02..9c84423b7528 100644
> > --- a/Documentation/translations/zh_CN/networking/netmem.rst
> > +++ b/Documentation/translations/zh_CN/networking/netmem.rst
> > @@ -89,4 +89,6 @@ dma-mapping API 去处理。
> >  使用某个还不存在的 netmem API,你可以自行添加并提交到 netdev@,也可以联系维护
> >  人员或者发送邮件至 almasrymina@google.com 寻求帮助。
> >  
> > -2. 驱动程序应通过设置 netdev->netmem_tx = true 来表明自身支持 netmem 功能。
> > +2. 驱动程序应将 `netdev->netmem_tx` 设置为适当的模式:
> > +
> > +   - `NETMEM_TX_DMA`:适用于执行 DMA 的物理设备。
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 8c55874f44ca..ed9c22dc4a5a 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -17120,7 +17120,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> >  	dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops_unsupp;
> >  	if (BNXT_SUPPORTS_QUEUE_API(bp))
> >  		dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops;
> > -	dev->netmem_tx = true;
> > +	dev->netmem_tx = NETMEM_TX_DMA;
> >  
> >  	rc = register_netdev(dev);
> >  	if (rc)
> > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> > index 424d973c97f2..dd2b8f087163 100644
> > --- a/drivers/net/ethernet/google/gve/gve_main.c
> > +++ b/drivers/net/ethernet/google/gve/gve_main.c
> > @@ -2894,7 +2894,7 @@ static int gve_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >  		goto abort_with_wq;
> >  
> >  	if (!gve_is_gqi(priv) && !gve_is_qpl(priv))
> > -		dev->netmem_tx = true;
> > +		dev->netmem_tx = NETMEM_TX_DMA;
> >  
> >  	err = register_netdev(dev);
> >  	if (err)
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 5a46870c4b74..fc49aae38807 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -5924,7 +5924,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >  
> >  	netdev->priv_flags       |= IFF_UNICAST_FLT;
> >  
> > -	netdev->netmem_tx = true;
> > +	netdev->netmem_tx = NETMEM_TX_DMA;
> >  
> >  	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
> >  	mlx5e_set_xdp_feature(priv);
> > diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> > index c406a3b56b37..138e522ef9b9 100644
> > --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> > +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c
> > @@ -752,7 +752,7 @@ struct net_device *fbnic_netdev_alloc(struct fbnic_dev *fbd)
> >  	netdev->netdev_ops = &fbnic_netdev_ops;
> >  	netdev->stat_ops = &fbnic_stat_ops;
> >  	netdev->queue_mgmt_ops = &fbnic_queue_mgmt_ops;
> > -	netdev->netmem_tx = true;
> > +	netdev->netmem_tx = NETMEM_TX_DMA;
> >  
> >  	fbnic_set_ethtool_ops(netdev);
> >  
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 0e1e581efc5a..580bccb118a0 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1788,6 +1788,11 @@ enum netdev_stat_type {
> >  	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
> >  };
> >  
> > +enum netmem_tx_mode {
> > +	NETMEM_TX_NONE,		/* no netmem TX support */
> > +	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
> > +};
> > +
> >  enum netdev_reg_state {
> >  	NETREG_UNINITIALIZED = 0,
> >  	NETREG_REGISTERED,	/* completed register_netdevice */
> > @@ -1809,7 +1814,8 @@ enum netdev_reg_state {
> >   *	@lltx:		device supports lockless Tx. Deprecated for real HW
> >   *			drivers. Mainly used by logical interfaces, such as
> >   *			bonding and tunnels
> > - *	@netmem_tx:	device support netmem_tx.
> > + *	@netmem_tx:	device netmem TX mode (NETMEM_TX_NONE or
> > + *			NETMEM_TX_DMA).
> 
> 
> nit: if you happen to repost, listing enum values here seems too much?
> 
> "device netmem TX mode" should be enough

Will do!

Thanks,
Bobby

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08 15:01   ` Stanislav Fomichev
@ 2026-05-08 16:19     ` Bobby Eshleman
  2026-05-08 20:44     ` Jakub Kicinski
  1 sibling, 0 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08 16:19 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, May 08, 2026 at 08:01:17AM -0700, Stanislav Fomichev wrote:
> On 05/07, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > When a netkit virtual device leases queues from a physical NIC, devmem
> > TX bindings created on the netkit device must still result in the dmabuf
> > being mapped for dma by the physical device. This patch accomplishes
> > this by teaching the bind handler to search for the underlying
> > DMA-capable device by looking it up via leased rx queues. The function
> > netdev_find_netmem_tx_dev(), used for finding the underlying DMA-capable
> > device, can be extended to support other non-netkit NETMEM_TX_NO_DMA
> > devices in the future if needed.
> > 
> > Additionally, this patch extends validate_xmit_unreadable_skb() to
> > support the netkit case, where the skb is validated twice: once on the
> > netkit guest device and again on the physical NIC after BPF redirect or
> > ip forwarding.
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> > Changes in v3:
> > - Fix validate_xmit_unreadable_skb() bug for non-devmem
> >   unreadable niovs (should not be dropped)
> > - Major simplification of validate_xmit_unreadable_skb()
> > - Fix prematurely released lock in bind-tx handler (Jakub)
> > 
> > Changes in v2:
> > - In validate_xmit_unreadable_skb(), check netmem_tx mode before
> >   inspecting frags (Jakub)
> > - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev !=
> >   netdev to fix lockdep (Sashiko)
> > ---
> >  net/core/dev.c         |  3 +++
> >  net/core/devmem.c      |  6 +++--
> >  net/core/devmem.h      |  9 ++++++--
> >  net/core/netdev-genl.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++----
> >  4 files changed, 72 insertions(+), 9 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index fbe4c328a367..268417c9ef22 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3999,6 +3999,9 @@ static struct sk_buff *validate_xmit_unreadable_skb(struct sk_buff *skb,
> >  	if (dev->netmem_tx == NETMEM_TX_NONE)
> >  		goto out_free;
> >  
> > +	if (dev->netmem_tx == NETMEM_TX_NO_DMA)
> > +		goto out;
> > +
> 
> Since this is a good case, maybe fold it into skb_frags_readable check above?
> 
> 	if (likely(skb_frags_readable() || netmem_tx == NETMEM_TX_NO_DMA))
> 
> Otherwise it's a bit confusing to have:
> 
> if (xxx)
> 	goto out;
> if (yyy)
> 	goto out_free;
> if (zzz)
> 	goto out;
> 
> (or, reorder to be out/out/out_free)
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Makes sense, will use the combined conditional.

Best,
Bobby
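
As a rough illustration of the reordering agreed on above (pass cases first,
then drop), the decision can be modeled in standalone C. This is a sketch
under stated assumptions, not the code that will land: struct sketch_dev,
struct sketch_skb, the verdict enum, and sketch_validate_unreadable() are
hypothetical stand-ins, the enum value order is a guess based on the cover
letter, and the non-devmem unreadable niov case mentioned in the v3 changelog
is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum netmem_tx_mode {
	NETMEM_TX_NONE,		/* no netmem TX support */
	NETMEM_TX_DMA,		/* DMA-capable netmem TX (real HW) */
	NETMEM_TX_NO_DMA,	/* forwards unreadable skbs, never DMAs */
};

enum sketch_verdict { XMIT_PASS, XMIT_DROP };

/* Hypothetical stand-in for struct net_device. */
struct sketch_dev {
	enum netmem_tx_mode netmem_tx;
};

/* Hypothetical stand-in for an skb carrying (possibly) dmabuf frags. */
struct sketch_skb {
	bool frags_readable;
	const struct sketch_dev *bound_dev;	/* device the dmabuf is mapped for */
};

static enum sketch_verdict
sketch_validate_unreadable(const struct sketch_skb *skb,
			   const struct sketch_dev *dev)
{
	/* Combined "good" case: readable frags, or a pass-through device. */
	if (skb->frags_readable || dev->netmem_tx == NETMEM_TX_NO_DMA)
		return XMIT_PASS;

	/* Device does not speak netmem at all. */
	if (dev->netmem_tx == NETMEM_TX_NONE)
		return XMIT_DROP;

	/* NETMEM_TX_DMA: only the device holding the mapping may transmit. */
	return skb->bound_dev == dev ? XMIT_PASS : XMIT_DROP;
}
```

In this model an unreadable skb passes on the netkit-like device and is
checked again on the physical device, where it passes only if that device is
the one the dmabuf was bound to.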

* Re: [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module
  2026-05-08 15:03   ` Stanislav Fomichev
@ 2026-05-08 16:19     ` Bobby Eshleman
  0 siblings, 0 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08 16:19 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, May 08, 2026 at 08:03:11AM -0700, Stanislav Fomichev wrote:
> On 05/07, Bobby Eshleman wrote:
> > From: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > Adding netkit-based devmem tests is a straight-forward copy of devmem
> > test commands plus some args for the nk cases, so this patch breaks out
> > these command builders into helpers used by both.
> > 
> > Though we tried to avoid libraries to avoid increasing the barrier of
> > entry/complexity (see selftests/drivers/net/README.md, section "Avoid
> > libraries and frameworks"), factoring out these functions seemed like
> > the lesser of two evils in this case of using the same commands, just
> > with slightly different args per environment.
> > 
> > I experimented with just having all of the tests in the same file to
> > avoid having helpers in a library file, but because ksft_run() is
> > limited to a single call per file, and the new tests will require
> > different environments (NetDrvContEnv/NetDrvEpEnv), it would have been
> > necessary to have each test set up its own environment instead of
> > sharing one for the entire ksft_run() run. This came at the cost of
> > ballooning the test time (from under 5s to 30s on my test system), so to
> > strike a balance these tests were placed in separate files so that all
> > tests using the same env type can share one environment across a single
> > ksft_run() run (introduced in subsequent patches).
> > 
> > The helpers work transparently with both plain and netkit environments
> > by inspecting cfg for netkit-specific attributes (netns, nk_queue,
> > etc...).
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > ---
> 
> [..]
> 
> > Changes in v4:
> 
> This is a v3, but you already have changes for v4 :-p

oops, will change on respin.

Thanks,
Bobby

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08 15:01   ` Stanislav Fomichev
  2026-05-08 16:19     ` Bobby Eshleman
@ 2026-05-08 20:44     ` Jakub Kicinski
  1 sibling, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2026-05-08 20:44 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Bobby Eshleman, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	mohsin.bashr, willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev,
	linux-doc, linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, 8 May 2026 08:01:17 -0700 Stanislav Fomichev wrote:
> Since this is a good case, maybe fold it into skb_frags_readable check above?
> 
> 	if (likely(skb_frags_readable() || netmem_tx == NETMEM_TX_NO_DMA))

FWIW I had the same feeling on v2, so probably worth fixing.

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08  2:27 ` [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices Bobby Eshleman
  2026-05-08 15:01   ` Stanislav Fomichev
@ 2026-05-08 20:47   ` Jakub Kicinski
  2026-05-08 21:28     ` Bobby Eshleman
  1 sibling, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2026-05-08 20:47 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi, Yanteng Si,
	Dongliang Mu, Michael Chan, Pavan Chebbi, Joshua Washington,
	Harshitha Ramamurthy, Saeed Mahameed, Tariq Toukan, Mark Bloch,
	Leon Romanovsky, Alexander Duyck, kernel-team, Daniel Borkmann,
	Nikolay Aleksandrov, Shuah Khan, dw, sdf.kernel, mohsin.bashr,
	willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev, linux-doc,
	linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Thu, 07 May 2026 19:27:48 -0700 Bobby Eshleman wrote:
> +	/* Virtual device (e.g. netkit) the user called bind-tx on. Must be
> +	 * NETMEM_TX_NO_DMA.
> +	 */
> +	struct net_device *vdev;

AI keeps complaining that we don't hold a reference to this dev which 
I think is fine, we're just comparing pointers. Could we maybe make this
a void pointer and mention in the comment that we treat it as "best
effort cookie" (better phrasing welcome).

Or we should wipe these vdev pointers when vdevs disappear, not sure
how hard that'd be (or whether it's worth the extra state).
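
A minimal sketch of the "cookie" option, assuming the field is only ever
compared by address and never dereferenced. The names struct sketch_binding,
vdev_cookie, and sketch_binding_matches_vdev() are hypothetical and are not
taken from the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the devmem binding. */
struct sketch_binding {
	/*
	 * Identity of the virtual device bind-tx was called on. No
	 * reference is held, so this may be stale: compare it, never
	 * dereference it.
	 */
	const void *vdev_cookie;
};

/* True if the binding was created through the given device pointer. */
static bool sketch_binding_matches_vdev(const struct sketch_binding *b,
					const void *dev)
{
	return b->vdev_cookie != NULL && b->vdev_cookie == dev;
}
```

Making the field a void pointer means the compiler rejects any attempt to
dereference it without a cast. The "best effort" caveat still applies: if the
original device is freed and a new one is allocated at the same address, the
comparison would match the new device.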

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08 20:47   ` Jakub Kicinski
@ 2026-05-08 21:28     ` Bobby Eshleman
  2026-05-08 22:27       ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08 21:28 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi, Yanteng Si,
	Dongliang Mu, Michael Chan, Pavan Chebbi, Joshua Washington,
	Harshitha Ramamurthy, Saeed Mahameed, Tariq Toukan, Mark Bloch,
	Leon Romanovsky, Alexander Duyck, kernel-team, Daniel Borkmann,
	Nikolay Aleksandrov, Shuah Khan, dw, sdf.kernel, mohsin.bashr,
	willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev, linux-doc,
	linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, May 08, 2026 at 01:47:17PM -0700, Jakub Kicinski wrote:
> On Thu, 07 May 2026 19:27:48 -0700 Bobby Eshleman wrote:
> > +	/* Virtual device (e.g. netkit) the user called bind-tx on. Must be
> > +	 * NETMEM_TX_NO_DMA.
> > +	 */
> > +	struct net_device *vdev;
> 
> AI keeps complaining that we don't hold a reference to this dev which 
> I think is fine, we're just comparing pointers. Could we maybe make this
> a void pointer and mention in the comment that we treat it as "best
> effort cookie" (better phrasing welcome).
> 
> Or we should wipe these vdev pointers when vdevs disappear, not sure
> how hard that'd be (or whether it's worth the extra state).

My guess is this would probably be the simplest way?

diff --git a/net/core/devmem.c b/net/core/devmem.c
index 644c286b778f..e28fae14c687 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -533,3 +533,38 @@ static const struct memory_provider_ops dmabuf_devmem_ops = {
 	.nl_fill		= mp_dmabuf_devmem_nl_fill,
 	.uninstall		= mp_dmabuf_devmem_uninstall,
 };
+
+static int net_devmem_netdev_event(struct notifier_block *nb,
+				   unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct net_devmem_dmabuf_binding *binding;
+	unsigned long id;
+
+	if (event != NETDEV_UNREGISTER)
+		return NOTIFY_DONE;
+
+	xa_for_each(&net_devmem_dmabuf_bindings, id, binding) {
+		if (!net_devmem_dmabuf_binding_get(binding))
+			continue;
+		mutex_lock(&binding->lock);
+		if (READ_ONCE(binding->vdev) == dev) {
+			ASSERT_EXCLUSIVE_WRITER(binding->vdev);
+			WRITE_ONCE(binding->vdev, NULL);
+		}
+		mutex_unlock(&binding->lock);
+		net_devmem_dmabuf_binding_put(binding);
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block net_devmem_netdev_nb = {
+	.notifier_call = net_devmem_netdev_event,
+};
+
+static int __init net_devmem_init(void)
+{
+	return register_netdevice_notifier(&net_devmem_netdev_nb);
+}
+subsys_initcall(net_devmem_init);


I'm open to either approach. The void* + comment is good too, IMHO. For
the notifier, I'd probably want to add a test to ensure sendmsg() kicks
back an error after the device is removed.

Best,
Bobby

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08 21:28     ` Bobby Eshleman
@ 2026-05-08 22:27       ` Jakub Kicinski
  2026-05-08 23:03         ` Bobby Eshleman
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2026-05-08 22:27 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi, Yanteng Si,
	Dongliang Mu, Michael Chan, Pavan Chebbi, Joshua Washington,
	Harshitha Ramamurthy, Saeed Mahameed, Tariq Toukan, Mark Bloch,
	Leon Romanovsky, Alexander Duyck, kernel-team, Daniel Borkmann,
	Nikolay Aleksandrov, Shuah Khan, dw, sdf.kernel, mohsin.bashr,
	willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev, linux-doc,
	linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, 8 May 2026 14:28:55 -0700 Bobby Eshleman wrote:
> My guess is this would probably be the simplest way?

IDK. Notifiers are so inelegant. Don't we have the same problem with
the main ->dev on Tx binding?

* Re: [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices
  2026-05-08 22:27       ` Jakub Kicinski
@ 2026-05-08 23:03         ` Bobby Eshleman
  0 siblings, 0 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-08 23:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi, Yanteng Si,
	Dongliang Mu, Michael Chan, Pavan Chebbi, Joshua Washington,
	Harshitha Ramamurthy, Saeed Mahameed, Tariq Toukan, Mark Bloch,
	Leon Romanovsky, Alexander Duyck, kernel-team, Daniel Borkmann,
	Nikolay Aleksandrov, Shuah Khan, dw, sdf.kernel, mohsin.bashr,
	willemb, jiang.kun2, xu.xin16, wang.yaxin, netdev, linux-doc,
	linux-kernel, linux-rdma, bpf, linux-kselftest,
	Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Fri, May 08, 2026 at 03:27:08PM -0700, Jakub Kicinski wrote:
> On Fri, 8 May 2026 14:28:55 -0700 Bobby Eshleman wrote:
> > My guess is this would probably be the simplest way?
> 
> IDK. Notifiers are so inelegant. Don't we have the same problem with
> the main ->dev on Tx binding?

Yes, true. For some reason, I thought I recalled the dma buf attachment
causing some chain of reference holding that kept the device alive, but
that is actually not true...

* Re: [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
  2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
                   ` (7 preceding siblings ...)
  2026-05-08  2:27 ` [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests Bobby Eshleman
@ 2026-05-10 20:33 ` Zhu Yanjun
  2026-05-11 17:01   ` Bobby Eshleman
  8 siblings, 1 reply; 27+ messages in thread
From: Zhu Yanjun @ 2026-05-10 20:33 UTC (permalink / raw)
  To: Bobby Eshleman, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan, Alex Shi, Yanteng Si, Dongliang Mu, Michael Chan,
	Pavan Chebbi, Joshua Washington, Harshitha Ramamurthy,
	Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Alexander Duyck, kernel-team, Daniel Borkmann,
	Nikolay Aleksandrov, Shuah Khan, yanjun.zhu@linux.dev
  Cc: dw, sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On 2026/5/7 19:27, Bobby Eshleman wrote:
> This series enables TCP devmem TX through netkit devices.
> 
> Netkit now supports queue leasing. A physical NIC's RX queue can be
> leased to a netkit guest interface inside a container namespace. This
> gives the container a devmem-capable data path on the RX side (bind-rx,
> etc...). On the TX side, the container process binds to its netkit guest
> interface and sends traffic that netkit redirects (via BPF or ip
> forwarding) to the physical NIC for DMA.
> 
> Two things in the existing devmem TX path prevent this from working:
> 
> 1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
>     forward a dmabuf-backed (unreadable) skb. This protects skbs from
>     landing on devices that don't have the IOMMU mappings for the backing
>     dmabuf or that don't speak netmem. Netkit, however, does not support
>     DMA, doesn't attempt to read unreadable skb pages and so doesn't
>     break netmem (it is pure skb routing and redirection). It is
>     functionally capable of routing unreadable skbs, but there is no way
>     for the TX validation pathway to distinguish between a device that
>     will actually attempt DMA-ing the skb and another device
>     (like netkit) that does not DMA but also does not break
>     netmem.
> 
> 2. bind_tx_doit uses the bound device as the DMA device.  When the user
>     binds devmem TX to the netkit guest, the bind handler attempts to
>     create DMA mappings against netkit, which has no DMA capability and
>     no IOMMU mappings.
> 
> This series solves these problems as follows:
> 
> 1. Extend netmem_tx to two bits, assigned to one of three values:
> 
>     NETMEM_TX_NONE   - netmem not supported
>     NETMEM_TX_DMA    - netmem supported and performs DMA
>     NETMEM_TX_NO_DMA - netmem supported, but does not DMA
> 
>     With these bits, phys devices can set NETMEM_TX_DMA and devices like
>     netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
>     DMA-capable netdev exactly matches the bound device, guaranteeing the
>     correct mapping of the bound dmabuf. The validation TX path also
>     allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
>     will not misuse netmem or run into IOMMU faults. After redirection or
>     routing and the skb finally makes its way through the stack to a
>     physical device's TX path, the above NETMEM_TX_DMA check is performed
>     again to guarantee the device has the appropriate binding/mappings.
> 
> 2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
>     finds the phys TX device and binds to that instead. For the netkit
>     case, if it has been leased a queue from a DMA-capable device
>     already, then the bind action is performed on the DMA-capable device
>     instead and the dmabuf is mapped correctly.
> 
> ---
> Changes in v3:
> - Fix validate_xmit_unreadable_skb() logic for non-devmem
>    unreadable niovs (should not be dropped) (Sashiko)
> - Simplify lock handling in bind_tx, no premature release (Jakub)
> - split NO_DMA changes into separate patch (Jakub)
> - fixed some pylint issues, one required an additional patch ("selftests:
>    drv-net: make attr _nk_guest_ifname public") to rename a variable from
>    private to public
> - see per-patch changelist for more detailed changes
> - Link to v2: https://lore.kernel.org/r/20260504-tcp-dm-netkit-v2-0-56d52ac72fd4@meta.com
> 
> Changes in v2:
> - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> - In validate_xmit_unreadable_skb(), check netmem_tx mode before inspecting
>    frags (Jakub)
> - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
>    fix lockdep (Sashiko)
> - Move require_devmem() into individual test functions so KsftSkipEx goes up to
>    ksft_run() (Sashiko)
> - Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
> - Link to v1:
>    https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> ---
> Bobby Eshleman (8):
>        net: convert netmem_tx flag to enum
>        net: netkit: declare NETMEM_TX_NO_DMA mode
>        net: devmem: support TX over NETMEM_TX_NO_DMA devices

I applied this patchset to my local kernel tree, built a new kernel
image, and booted it in my test environment. All the test cases appear
to pass.

I do not see this patchset causing any regressions in my test
environment.

Zhu Yanjun

>        selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
>        selftests: drv-net: make attr _nk_guest_ifname public
>        selftests: drv-net: refactor devmem command builders into lib module
>        selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
>        selftests: drv-net: add netkit devmem tests
> 
>   .../networking/net_cachelines/net_device.rst       |   2 +-
>   Documentation/networking/netmem.rst                |   8 +-
>   .../translations/zh_CN/networking/netmem.rst       |   7 +-
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   2 +-
>   drivers/net/ethernet/google/gve/gve_main.c         |   2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
>   drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   2 +-
>   drivers/net/netkit.c                               |   1 +
>   include/linux/netdevice.h                          |  11 +-
>   net/core/dev.c                                     |   5 +-
>   net/core/devmem.c                                  |   6 +-
>   net/core/devmem.h                                  |   9 +-
>   net/core/netdev-genl.c                             |  65 +++++-
>   tools/testing/selftests/drivers/net/hw/Makefile    |   1 +
>   tools/testing/selftests/drivers/net/hw/devmem.py   |  77 ++------
>   .../selftests/drivers/net/hw/lib/py/devmem.py      | 218 +++++++++++++++++++++
>   tools/testing/selftests/drivers/net/hw/ncdevmem.c  |  58 +++---
>   .../testing/selftests/drivers/net/hw/nk_devmem.py  |  55 ++++++
>   .../drivers/net/hw/nk_primary_rx_redirect.bpf.c    |  39 ++++
>   .../testing/selftests/drivers/net/hw/nk_qlease.py  |   8 +-
>   tools/testing/selftests/drivers/net/lib/py/env.py  | 109 ++++++++---
>   21 files changed, 549 insertions(+), 138 deletions(-)
> ---
> base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
> change-id: 20260423-tcp-dm-netkit-2bd78b638d30
> 
> Best regards,



* Re: [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
  2026-05-10 20:33 ` [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Zhu Yanjun
@ 2026-05-11 17:01   ` Bobby Eshleman
  0 siblings, 0 replies; 27+ messages in thread
From: Bobby Eshleman @ 2026-05-11 17:01 UTC (permalink / raw)
  To: Zhu Yanjun
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, Alex Shi,
	Yanteng Si, Dongliang Mu, Michael Chan, Pavan Chebbi,
	Joshua Washington, Harshitha Ramamurthy, Saeed Mahameed,
	Tariq Toukan, Mark Bloch, Leon Romanovsky, Alexander Duyck,
	kernel-team, Daniel Borkmann, Nikolay Aleksandrov, Shuah Khan, dw,
	sdf.kernel, mohsin.bashr, willemb, jiang.kun2, xu.xin16,
	wang.yaxin, netdev, linux-doc, linux-kernel, linux-rdma, bpf,
	linux-kselftest, Stanislav Fomichev, Mina Almasry, Bobby Eshleman

On Sun, May 10, 2026 at 01:33:18PM -0700, Zhu Yanjun wrote:
> On 2026/5/7 19:27, Bobby Eshleman wrote:
> > This series enables TCP devmem TX through netkit devices.
> > 
> > Netkit now supports queue leasing. A physical NIC's RX queue can be
> > leased to a netkit guest interface inside a container namespace. This
> > gives the container a devmem-capable data path on the RX side (bind-rx,
> > etc.). On the TX side, the container process binds to its netkit guest
> > interface and sends traffic that netkit redirects (via BPF or IP
> > forwarding) to the physical NIC for DMA.
> > 
> > Two things in the existing devmem TX path prevent this from working:
> > 
> > 1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
> >     forward a dmabuf-backed (unreadable) skb. This protects skbs from
> >     landing on devices that don't have the IOMMU mappings for the backing
> >     dmabuf or that don't speak netmem. Netkit, however, does not support
> >     DMA, doesn't attempt to read unreadable skb pages and so doesn't
> >     break netmem (it is pure skb routing and redirection). It is
> >     functionally capable of routing unreadable skbs, but there is no way
> >     for the TX validation pathway to distinguish between a device that
> >     will actually attempt DMA-ing the skb and another device
> >     (like netkit) that does not DMA but also does not break
> >     netmem.
> > 
> > 2. bind_tx_doit uses the bound device as the DMA device.  When the user
> >     binds devmem TX to the netkit guest, the bind handler attempts to
> >     create DMA mappings against netkit, which has no DMA capability and
> >     no IOMMU mappings.
> > 
> > This series solves these problems as follows:
> > 
> > 1. Extend netmem_tx to two bits, assigned to one of three values:
> > 
> >     NETMEM_TX_NONE   - netmem not supported
> >     NETMEM_TX_DMA    - netmem supported and performs DMA
> >     NETMEM_TX_NO_DMA - netmem supported, but does not DMA
> > 
> >     With these bits, phys devices can set NETMEM_TX_DMA and devices like
> >     netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
> >     DMA-capable netdev exactly matches the bound device, guaranteeing the
> >     correct mapping of the bound dmabuf. The validation TX path also
> >     allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
> >     will not misuse netmem or run into IOMMU faults. After redirection or
> >     routing, when the skb finally makes its way through the stack to a
> >     physical device's TX path, the above NETMEM_TX_DMA check is performed
> >     again to guarantee the device has the appropriate binding/mappings.
> > 
> > 2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
> >     finds the phys TX device and binds to that instead. For the netkit
> >     case, if it has been leased a queue from a DMA-capable device
> >     already, then the bind action is performed on the DMA-capable device
> >     instead and the dmabuf is mapped correctly.
> > 
> > ---
> > Changes in v3:
> > - Fix validate_xmit_unreadable_skb() logic for non-devmem
> >    unreadable niovs (should not be dropped) (Sashiko)
> > - Simplify lock handling in bind_tx, no premature release (Jakub)
> > - split NO_DMA changes into separate patch (Jakub)
> > - fixed some pylint issues, one required an additional patch ("selftests:
> >    drv-net: make attr _nk_guest_ifname public") to rename a variable from
> >    private to public
> > - see per-patch changelist for more detailed changes
> > - Link to v2: https://lore.kernel.org/r/20260504-tcp-dm-netkit-v2-0-56d52ac72fd4@meta.com
> > 
> > Changes in v2:
> > - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> > - In validate_xmit_unreadable_skb(), check netmem_tx mode before inspecting
> >    frags (Jakub)
> > - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
> >    fix lockdep (Sashiko)
> > - Move require_devmem() into individual test functions so KsftSkipEx goes up to
> >    ksft_run() (Sashiko)
> > - Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
> > - Link to v1:
> >    https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/
> > 
> > Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> > 
> > ---
> > Bobby Eshleman (8):
> >        net: convert netmem_tx flag to enum
> >        net: netkit: declare NETMEM_TX_NO_DMA mode
> >        net: devmem: support TX over NETMEM_TX_NO_DMA devices
> 
> I applied this patchset to my local kernel tree, built a new kernel
> image, and booted it in my test environment. All the test cases appear
> to pass.
> 
> I do not see this patchset causing any regressions in my test
> environment.
> 
> Zhu Yanjun

Thanks for testing!

Best,
Bobby


Thread overview: 27+ messages
2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum Bobby Eshleman
2026-05-08 14:56   ` Stanislav Fomichev
2026-05-08 16:11     ` Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode Bobby Eshleman
2026-05-08 14:57   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices Bobby Eshleman
2026-05-08 15:01   ` Stanislav Fomichev
2026-05-08 16:19     ` Bobby Eshleman
2026-05-08 20:44     ` Jakub Kicinski
2026-05-08 20:47   ` Jakub Kicinski
2026-05-08 21:28     ` Bobby Eshleman
2026-05-08 22:27       ` Jakub Kicinski
2026-05-08 23:03         ` Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration Bobby Eshleman
2026-05-08 15:01   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public Bobby Eshleman
2026-05-08 15:01   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module Bobby Eshleman
2026-05-08 15:03   ` Stanislav Fomichev
2026-05-08 16:19     ` Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv Bobby Eshleman
2026-05-08 15:03   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests Bobby Eshleman
2026-05-08 15:03   ` Stanislav Fomichev
2026-05-10 20:33 ` [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Zhu Yanjun
2026-05-11 17:01   ` Bobby Eshleman
