From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Bobby Eshleman <bobbyeshleman@gmail.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Alex Shi <alexs@kernel.org>, Yanteng Si <si.yanteng@linux.dev>,
Dongliang Mu <dzm91@hust.edu.cn>,
Michael Chan <michael.chan@broadcom.com>,
Pavan Chebbi <pavan.chebbi@broadcom.com>,
Joshua Washington <joshwash@google.com>,
Harshitha Ramamurthy <hramamurthy@google.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Alexander Duyck <alexanderduyck@fb.com>,
kernel-team@meta.com, Daniel Borkmann <daniel@iogearbox.net>,
Nikolay Aleksandrov <razor@blackwall.org>,
Shuah Khan <shuah@kernel.org>,
"yanjun.zhu@linux.dev" <yanjun.zhu@linux.dev>
Cc: dw@davidwei.uk, sdf.kernel@gmail.com, mohsin.bashr@gmail.com,
willemb@google.com, jiang.kun2@zte.com.cn, xu.xin16@zte.com.cn,
wang.yaxin@zte.com.cn, netdev@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org,
Stanislav Fomichev <sdf@fomichev.me>,
Mina Almasry <almasrymina@google.com>,
Bobby Eshleman <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
Date: Sun, 10 May 2026 13:33:18 -0700
Message-ID: <d7de2f17-af2d-4b6a-be65-f009d78e3d20@linux.dev>
In-Reply-To: <20260507-tcp-dm-netkit-v3-0-52821445867c@meta.com>
On 2026/5/7 19:27, Bobby Eshleman wrote:
> This series enables TCP devmem TX through netkit devices.
>
> Netkit now supports queue leasing. A physical NIC's RX queue can be
> leased to a netkit guest interface inside a container namespace. This
> gives the container a devmem-capable data path on the RX side (bind-rx,
> etc...). On the TX side, the container process binds to its netkit guest
> interface and sends traffic that netkit redirects (via BPF or ip
> forwarding) to the physical NIC for DMA.
>
> Two things in the existing devmem TX path prevent this from working:
>
> 1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
> forward a dmabuf-backed (unreadable) skb. This protects skbs from
> landing on devices that don't have the IOMMU mappings for the backing
> dmabuf or that don't speak netmem. Netkit, however, does not support
> DMA, doesn't attempt to read unreadable skb pages and so doesn't
> break netmem (it is pure skb routing and redirection). It is
> functionally capable of routing unreadable skbs, but there is no way
> for the TX validation pathway to distinguish between a device that
> will actually attempt DMA-ing the skb and another device
> (like netkit) that does not DMA but also does not break
> netmem.
>
> 2. bind_tx_doit uses the bound device as the DMA device. When the user
> binds devmem TX to the netkit guest, the bind handler attempts to
> create DMA mappings against netkit, which has no DMA capability and
> no IOMMU mappings.
>
> This series solves these problems as follows:
>
> 1. Extend netmem_tx to two bits, assigned to one of three values:
>
> NETMEM_TX_NONE - netmem not supported
> NETMEM_TX_DMA - netmem supported and performs DMA
> NETMEM_TX_NO_DMA - netmem supported, but does not DMA
>
> With these bits, phys devices can set NETMEM_TX_DMA and devices like
> netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
> DMA-capable netdev exactly matches the bound device, guaranteeing the
> correct mapping of the bound dmabuf. The validation TX path also
> allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
> will not misuse netmem or run into IOMMU faults. After redirection or
> routing, when the skb finally makes its way through the stack to a
> physical device's TX path, the above NETMEM_TX_DMA check is performed
> again to guarantee the device has the appropriate binding/mappings.
>
> 2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
> finds the phys TX device and binds to that instead. For the netkit
> case, if it has been leased a queue from a DMA-capable device
> already, then the bind action is performed on the DMA-capable device
> instead and the dmabuf is mapped correctly.
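The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the series: the three NETMEM_TX_* names come from the cover letter, while struct sketch_dev, struct sketch_skb, the lease_from field, and both helper functions are hypothetical stand-ins for the real kernel structures. The sketch also ignores the non-devmem unreadable niov case mentioned in the v3 changelog.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three modes named in the cover letter. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,   /* netmem not supported */
	NETMEM_TX_DMA,    /* netmem supported and performs DMA */
	NETMEM_TX_NO_DMA, /* netmem supported, but does not DMA */
};

/* Hypothetical stand-in for struct net_device. */
struct sketch_dev {
	const char *name;
	enum netmem_tx_mode netmem_tx;
	/* Assumption: device that leased a queue to us, or NULL. */
	const struct sketch_dev *lease_from;
};

/* Hypothetical stand-in for an skb carrying dmabuf-backed frags. */
struct sketch_skb {
	bool unreadable;                    /* true if dmabuf-backed */
	const struct sketch_dev *bound_dev; /* device the dmabuf is mapped for */
};

/*
 * TX validation: readable skbs always pass. Unreadable skbs pass
 * through NO_DMA devices untouched, and pass DMA devices only when
 * the device is exactly the one the dmabuf was bound to.
 */
static bool sketch_xmit_allowed(const struct sketch_dev *dev,
				const struct sketch_skb *skb)
{
	if (!skb->unreadable)
		return true;

	switch (dev->netmem_tx) {
	case NETMEM_TX_NO_DMA:
		return true;
	case NETMEM_TX_DMA:
		return skb->bound_dev == dev;
	case NETMEM_TX_NONE:
	default:
		return false;
	}
}

/*
 * TX bind: pick the device that DMA mappings should be created
 * against. A NO_DMA device defers to the DMA-capable device that
 * leased it a queue; NULL means the bind cannot proceed.
 */
static const struct sketch_dev *
sketch_bind_target(const struct sketch_dev *dev)
{
	switch (dev->netmem_tx) {
	case NETMEM_TX_DMA:
		return dev;
	case NETMEM_TX_NO_DMA:
		if (dev->lease_from &&
		    dev->lease_from->netmem_tx == NETMEM_TX_DMA)
			return dev->lease_from;
		return NULL;
	case NETMEM_TX_NONE:
	default:
		return NULL;
	}
}
```

In this model, binding on a netkit guest resolves to the physical NIC, the skb then passes the netkit check because of NETMEM_TX_NO_DMA, and it passes the final check on the NIC only because the NIC is the bound device.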
>
> ---
> Changes in v3:
> - Fix validate_xmit_unreadable_skb() logic for non-devmem
> unreadable niovs (should not be dropped) (Sashiko)
> - Simplify lock handling in bind_tx, no premature release (Jakub)
> - split NO_DMA changes into separate patch (Jakub)
> - fixed some pylint issues, one required an additional patch ("selftests:
> drv-net: make attr _nk_guest_ifname public") to rename a variable from
> private to public
> - see per-patch changelist for more detailed changes
> - Link to v2: https://lore.kernel.org/r/20260504-tcp-dm-netkit-v2-0-56d52ac72fd4@meta.com
>
> Changes in v2:
> - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> - In validate_xmit_unreadable_skb(), check netmem_tx mode before inspecting
> frags (Jakub)
> - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
> fix lockdep (Sashiko)
> - Move require_devmem() into individual test functions so KsftSkipEx goes up to
> ksft_run() (Sashiko)
> - Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
> - Link to v1:
> https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/
>
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
>
> ---
> Bobby Eshleman (8):
> net: convert netmem_tx flag to enum
> net: netkit: declare NETMEM_TX_NO_DMA mode
> net: devmem: support TX over NETMEM_TX_NO_DMA devices
I applied this patchset to my local kernel tree, built a new kernel
image, and booted it in my test environment. All the test cases appear
to pass.
I do not see any regressions from this patchset in my test
environment.
Zhu Yanjun
> selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
> selftests: drv-net: make attr _nk_guest_ifname public
> selftests: drv-net: refactor devmem command builders into lib module
> selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
> selftests: drv-net: add netkit devmem tests
>
> .../networking/net_cachelines/net_device.rst | 2 +-
> Documentation/networking/netmem.rst | 8 +-
> .../translations/zh_CN/networking/netmem.rst | 7 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
> drivers/net/ethernet/google/gve/gve_main.c | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
> drivers/net/ethernet/meta/fbnic/fbnic_netdev.c | 2 +-
> drivers/net/netkit.c | 1 +
> include/linux/netdevice.h | 11 +-
> net/core/dev.c | 5 +-
> net/core/devmem.c | 6 +-
> net/core/devmem.h | 9 +-
> net/core/netdev-genl.c | 65 +++++-
> tools/testing/selftests/drivers/net/hw/Makefile | 1 +
> tools/testing/selftests/drivers/net/hw/devmem.py | 77 ++------
> .../selftests/drivers/net/hw/lib/py/devmem.py | 218 +++++++++++++++++++++
> tools/testing/selftests/drivers/net/hw/ncdevmem.c | 58 +++---
> .../testing/selftests/drivers/net/hw/nk_devmem.py | 55 ++++++
> .../drivers/net/hw/nk_primary_rx_redirect.bpf.c | 39 ++++
> .../testing/selftests/drivers/net/hw/nk_qlease.py | 8 +-
> tools/testing/selftests/drivers/net/lib/py/env.py | 109 ++++++++---
> 21 files changed, 549 insertions(+), 138 deletions(-)
> ---
> base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
> change-id: 20260423-tcp-dm-netkit-2bd78b638d30
>
> Best regards,