From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Alex Shi <alexs@kernel.org>, Yanteng Si <si.yanteng@linux.dev>,
Dongliang Mu <dzm91@hust.edu.cn>,
Michael Chan <michael.chan@broadcom.com>,
Pavan Chebbi <pavan.chebbi@broadcom.com>,
Joshua Washington <joshwash@google.com>,
Harshitha Ramamurthy <hramamurthy@google.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Alexander Duyck <alexanderduyck@fb.com>,
kernel-team@meta.com, Daniel Borkmann <daniel@iogearbox.net>,
Nikolay Aleksandrov <razor@blackwall.org>,
Shuah Khan <shuah@kernel.org>
Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
Stanislav Fomichev <sdf@fomichev.me>,
Mina Almasry <almasrymina@google.com>,
Bobby Eshleman <bobbyeshleman@meta.com>
Subject: [PATCH net-next 00/11] net: devmem: support devmem with netkit devices
Date: Tue, 28 Apr 2026 15:41:57 -0700
Message-ID: <20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com>
This series enables TCP devmem TX through netkit devices.
Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc.). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or IP
forwarding) to the physical NIC for DMA.
Two things in the existing devmem TX path prevent this from working:
1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
forward a dmabuf-backed (unreadable) skb. This protects skbs from
landing on devices that don't have the IOMMU mappings for the backing
dmabuf or that don't speak netmem. Netkit, however, performs no DMA and
never reads unreadable skb pages, so it does not break netmem (it is
pure skb routing and redirection). It is functionally capable of
routing unreadable skbs, but the TX validation path has no way to
distinguish a device that will actually DMA the skb from a device
(like netkit) that does not DMA but also does not break netmem.
2. bind_tx_doit uses the bound device as the DMA device. When the user
binds devmem TX to the netkit guest, the bind handler attempts to
create DMA mappings against netkit, which has no DMA capability and
no IOMMU mappings.
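
The first limitation can be illustrated with a small userspace model. This is a sketch only: struct bool_dev, struct bool_skb and bool_validate() are hypothetical stand-ins, not kernel code, and the real validate_xmit_unreadable_skb() does more than this. Only the netmem_tx name and the gating behaviour come from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device; only the flag matters. */
struct bool_dev {
	const char *name;
	bool netmem_tx;		/* single bit: device may carry unreadable skbs */
};

/* Hypothetical stand-in for an skb; unreadable means dmabuf-backed. */
struct bool_skb {
	bool unreadable;
};

/*
 * Model of the boolean gate: returns the skb if the device may carry it,
 * or NULL if it would be dropped. Readable skbs always pass.
 */
static struct bool_skb *bool_validate(struct bool_skb *skb,
				      const struct bool_dev *dev)
{
	if (!skb->unreadable)
		return skb;
	return dev->netmem_tx ? skb : NULL;
}
```

With a single bit there is no value a netkit-like device can set that means "safe to pass through, but do not map against me": leaving it clear drops every unreadable skb, while setting it would claim DMA capability the device does not have.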
This series solves these problems as follows:
1. Extend netmem_tx to two bits, assigned to one of three values:
NETMEM_TX_NONE - netmem not supported
NETMEM_TX_DMA - netmem supported and performs DMA
NETMEM_TX_NO_DMA - netmem supported, but does not DMA
With these bits, physical devices set NETMEM_TX_DMA and devices like
netkit set NETMEM_TX_NO_DMA. The TX validation path ensures that any
DMA-capable netdev exactly matches the bound device, guaranteeing the
correct mapping of the bound dmabuf. It also allows devices with
NETMEM_TX_NO_DMA to pass, knowing these devices will not misuse netmem
or run into IOMMU faults. Once the skb has been redirected or routed
and finally reaches a physical device's TX path, the NETMEM_TX_DMA
check above is performed again to guarantee the device has the
appropriate binding/mappings.
2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices,
finds the physical TX device, and binds to that instead. For netkit,
if it has already been leased a queue from a DMA-capable device, the
bind is performed on that DMA-capable device and the dmabuf is mapped
correctly.
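
The two steps above can be sketched as a userspace model. The three mode names are taken from this series; everything else (the enum tag, struct model_dev, struct model_skb, the lease_from field, and both helper functions) is hypothetical and is not the code in the patches. Returning NULL for a NETMEM_TX_NO_DMA device without a lease is an assumption, since the text above only describes the leased case.

```c
#include <stddef.h>

/* Mode names are from the series; the enum tag is a placeholder. */
enum model_netmem_tx {
	NETMEM_TX_NONE,		/* netmem not supported */
	NETMEM_TX_DMA,		/* netmem supported and performs DMA */
	NETMEM_TX_NO_DMA,	/* netmem supported, but does not DMA */
};

struct model_dev {
	enum model_netmem_tx netmem_tx;
	struct model_dev *lease_from;	/* hypothetical: device that leased a queue */
};

struct model_skb {
	/* NULL for a readable skb, else the device the dmabuf is mapped for. */
	struct model_dev *bound_dev;
};

/* Step 1: TX validation. Returns the skb, or NULL if it would be dropped. */
static struct model_skb *model_validate(struct model_skb *skb,
					const struct model_dev *dev)
{
	if (!skb->bound_dev)
		return skb;		/* readable skbs are unaffected */

	switch (dev->netmem_tx) {
	case NETMEM_TX_NO_DMA:
		return skb;		/* pure passthrough, no DMA here */
	case NETMEM_TX_DMA:
		/* DMA-capable device must be the one the dmabuf is bound to. */
		return skb->bound_dev == dev ? skb : NULL;
	default:
		return NULL;		/* NETMEM_TX_NONE */
	}
}

/* Step 2: pick the device to create DMA mappings against at bind time. */
static struct model_dev *model_bind_target(struct model_dev *dev)
{
	if (dev->netmem_tx == NETMEM_TX_DMA)
		return dev;
	if (dev->netmem_tx == NETMEM_TX_NO_DMA && dev->lease_from &&
	    dev->lease_from->netmem_tx == NETMEM_TX_DMA)
		return dev->lease_from;
	return NULL;			/* assumption: no DMA device, bind fails */
}
```

In this model, binding on a netkit-like device resolves to the leasing NIC, the resulting skb passes validation on the netkit device, and it passes again on the NIC only because the NIC is the bound device. Any other DMA-capable device rejects it.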
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Bobby Eshleman (11):
net: add netmem_tx modes that indicate dma capability
net: bnxt: convert netmem_tx from bool to NETMEM_TX_DMA enum
gve: convert netmem_tx from bool to NETMEM_TX_DMA enum
net/mlx5e: convert netmem_tx from bool to NETMEM_TX_DMA enum
eth: fbnic: convert netmem_tx from bool to NETMEM_TX_DMA enum
netkit: set NETMEM_TX_NO_DMA for unreadable skb passthrough
net: devmem: support TX over NETMEM_TX_NO_DMA devices
selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
selftests: drv-net: refactor devmem command builders into lib module
selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
selftests: drv-net: add netkit devmem tests
.../networking/net_cachelines/net_device.rst | 2 +-
Documentation/networking/netmem.rst | 8 +-
.../translations/zh_CN/networking/netmem.rst | 7 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/google/gve/gve_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 +-
drivers/net/netkit.c | 1 +
include/linux/netdevice.h | 11 +-
net/core/dev.c | 24 ++-
net/core/devmem.c | 6 +-
net/core/devmem.h | 9 +-
net/core/netdev-genl.c | 53 ++++-
tools/testing/selftests/drivers/net/hw/devmem.py | 73 +------
.../selftests/drivers/net/hw/lib/py/devmem.py | 215 +++++++++++++++++++++
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 58 +++---
.../testing/selftests/drivers/net/hw/nk_devmem.py | 40 ++++
.../drivers/net/hw/nk_primary_rx_redirect.bpf.c | 41 ++++
tools/testing/selftests/drivers/net/lib/py/env.py | 67 +++++--
19 files changed, 498 insertions(+), 125 deletions(-)
---
base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
change-id: 20260423-tcp-dm-netkit-2bd78b638d30
Best regards,
--
Bobby Eshleman <bobbyeshleman@meta.com>