Linux Kernel Selftest development
From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Bobby Eshleman <bobbyeshleman@gmail.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Alex Shi <alexs@kernel.org>, Yanteng Si <si.yanteng@linux.dev>,
	Dongliang Mu <dzm91@hust.edu.cn>,
	Michael Chan <michael.chan@broadcom.com>,
	Pavan Chebbi <pavan.chebbi@broadcom.com>,
	Joshua Washington <joshwash@google.com>,
	Harshitha Ramamurthy <hramamurthy@google.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Alexander Duyck <alexanderduyck@fb.com>,
	kernel-team@meta.com, Daniel Borkmann <daniel@iogearbox.net>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	Shuah Khan <shuah@kernel.org>,
	"yanjun.zhu@linux.dev" <yanjun.zhu@linux.dev>
Cc: dw@davidwei.uk, sdf.kernel@gmail.com, mohsin.bashr@gmail.com,
	willemb@google.com, jiang.kun2@zte.com.cn, xu.xin16@zte.com.cn,
	wang.yaxin@zte.com.cn, netdev@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Stanislav Fomichev <sdf@fomichev.me>,
	Mina Almasry <almasrymina@google.com>,
	Bobby Eshleman <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
Date: Sun, 10 May 2026 13:33:18 -0700
Message-ID: <d7de2f17-af2d-4b6a-be65-f009d78e3d20@linux.dev>
In-Reply-To: <20260507-tcp-dm-netkit-v3-0-52821445867c@meta.com>

On 2026/5/7 19:27, Bobby Eshleman wrote:
> This series enables TCP devmem TX through netkit devices.
> 
> Netkit now supports queue leasing. A physical NIC's RX queue can be
> leased to a netkit guest interface inside a container namespace. This
> gives the container a devmem-capable data path on the RX side (bind-rx,
> etc...). On the TX side, the container process binds to its netkit guest
> interface and sends traffic that netkit redirects (via BPF or ip
> forwarding) to the physical NIC for DMA.
> 
> Two things in the existing devmem TX path prevent this from working:
> 
> 1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
>     forward a dmabuf-backed (unreadable) skb. This protects skbs from
>     landing on devices that don't have the IOMMU mappings for the backing
>     dmabuf or that don't speak netmem. Netkit, however, does not support
>     DMA, doesn't attempt to read unreadable skb pages and so doesn't
>     break netmem (it is pure skb routing and redirection). It is
>     functionally capable of routing unreadable skbs, but there is no way
>     for the TX validation pathway to distinguish between a device that
>     will actually attempt DMA-ing the skb and another device
>     (like netkit) that does not DMA but also does not break
>     netmem.
> 
> 2. bind_tx_doit uses the bound device as the DMA device.  When the user
>     binds devmem TX to the netkit guest, the bind handler attempts to
>     create DMA mappings against netkit, which has no DMA capability and
>     no IOMMU mappings.
> 
> This series solves these problems as follows:
> 
> 1. Extend netmem_tx to two bits, assigned to one of three values:
> 
>     NETMEM_TX_NONE   - netmem not supported
>     NETMEM_TX_DMA    - netmem supported and performs DMA
>     NETMEM_TX_NO_DMA - netmem supported, but does not DMA
> 
>     With these bits, phys devices can set NETMEM_TX_DMA and devices like
>     netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
>     DMA-capable netdev exactly matches the bound device, guaranteeing the
>     correct mapping of the bound dmabuf. The validation TX path also
>     allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
>     will not misuse netmem or run into IOMMU faults. After redirection or
>     routing, when the skb finally makes its way through the stack to a
>     physical device's TX path, the above NETMEM_TX_DMA check is performed
>     again to guarantee the device has the appropriate binding/mappings.
> 
> 2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
>     finds the phys TX device and binds to that instead. For the netkit
>     case, if it has been leased a queue from a DMA-capable device
>     already, then the bind action is performed on the DMA-capable device
>     instead and the dmabuf is mapped correctly.
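
For readers following along, the two rules above can be modelled in a few
lines of userspace C. This is only a sketch under stated assumptions, not the
code from the series: the three mode names come from the cover letter, while
struct model_dev, struct model_skb, the lessor field, and both model_*
functions are hypothetical names invented for illustration. It covers only
devmem-backed skbs and ignores the non-devmem unreadable niov case mentioned
in the v3 changelog.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mode names are taken from the cover letter; all else is hypothetical. */
enum netmem_tx_mode {
	NETMEM_TX_NONE,		/* netmem not supported */
	NETMEM_TX_DMA,		/* netmem supported and performs DMA */
	NETMEM_TX_NO_DMA,	/* netmem supported, but does not DMA */
};

/* Hypothetical stand-in for a net device. */
struct model_dev {
	const char *name;
	enum netmem_tx_mode netmem_tx;
	const struct model_dev *lessor;	/* device that leased us a queue, or NULL */
};

/* Hypothetical stand-in for an skb. */
struct model_skb {
	bool devmem_backed;			/* payload lives in a dmabuf */
	const struct model_dev *bound_dev;	/* device the dmabuf was mapped for */
};

/* Rule 1: may this skb be transmitted on dev? */
static bool model_validate_xmit(const struct model_skb *skb,
				const struct model_dev *dev)
{
	if (!skb->devmem_backed)
		return true;		/* ordinary skbs are never restricted */

	switch (dev->netmem_tx) {
	case NETMEM_TX_NO_DMA:
		return true;		/* pure routing, never touches the pages */
	case NETMEM_TX_DMA:
		return skb->bound_dev == dev;	/* mappings must belong to dev */
	case NETMEM_TX_NONE:
	default:
		return false;
	}
}

/* Rule 2: which device should a TX bind request actually map against? */
static const struct model_dev *
model_resolve_bind_dev(const struct model_dev *dev)
{
	switch (dev->netmem_tx) {
	case NETMEM_TX_DMA:
		return dev;
	case NETMEM_TX_NO_DMA:
		/* Only usable once a DMA-capable device has leased a queue. */
		if (dev->lessor && dev->lessor->netmem_tx == NETMEM_TX_DMA)
			return dev->lessor;
		return NULL;
	case NETMEM_TX_NONE:
	default:
		return NULL;
	}
}
```

In this model, a netkit guest marked NETMEM_TX_NO_DMA whose lessor is a
physical NIC marked NETMEM_TX_DMA resolves its bind to the NIC. A devmem skb
bound to that NIC then passes validation on both the guest and the NIC, but
not on a second, unrelated DMA-capable device.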
> 
> ---
> Changes in v3:
> - Fix validate_xmit_unreadable_skb() logic for non-devmem
>    unreadable niovs (should not be dropped) (Sashiko)
> - Simplify lock handling in bind_tx, no premature release (Jakub)
> - split NO_DMA changes into separate patch (Jakub)
> - fixed some pylint issues, one required an additional patch ("selftests:
>    drv-net: make attr _nk_guest_ifname public") to rename a variable from
>    private to public
> - see per-patch changelist for more detailed changes
> - Link to v2: https://lore.kernel.org/r/20260504-tcp-dm-netkit-v2-0-56d52ac72fd4@meta.com
> 
> Changes in v2:
> - Squash driver conversion patches (2-5) into patch 1 (Jakub)
> - Make validate_xmit_unreadable_skb() check netmem_tx mode before inspecting
>    frags (Jakub)
> - Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
>    fix lockdep (Sashiko)
> - Move require_devmem() into individual test functions so KsftSkipEx goes up to
>    ksft_run() (Sashiko)
> - Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
> - Link to v1:
>    https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/
> 
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> 
> ---
> Bobby Eshleman (8):
>        net: convert netmem_tx flag to enum
>        net: netkit: declare NETMEM_TX_NO_DMA mode
>        net: devmem: support TX over NETMEM_TX_NO_DMA devices

I applied this patchset to my local kernel tree, built a new kernel
image, and booted it in my test environment. All the test cases appear
to pass.

As far as I can tell, this patchset does not cause any regressions in
my test environment.

Zhu Yanjun

>        selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
>        selftests: drv-net: make attr _nk_guest_ifname public
>        selftests: drv-net: refactor devmem command builders into lib module
>        selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
>        selftests: drv-net: add netkit devmem tests
> 
>   .../networking/net_cachelines/net_device.rst       |   2 +-
>   Documentation/networking/netmem.rst                |   8 +-
>   .../translations/zh_CN/networking/netmem.rst       |   7 +-
>   drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   2 +-
>   drivers/net/ethernet/google/gve/gve_main.c         |   2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
>   drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   2 +-
>   drivers/net/netkit.c                               |   1 +
>   include/linux/netdevice.h                          |  11 +-
>   net/core/dev.c                                     |   5 +-
>   net/core/devmem.c                                  |   6 +-
>   net/core/devmem.h                                  |   9 +-
>   net/core/netdev-genl.c                             |  65 +++++-
>   tools/testing/selftests/drivers/net/hw/Makefile    |   1 +
>   tools/testing/selftests/drivers/net/hw/devmem.py   |  77 ++------
>   .../selftests/drivers/net/hw/lib/py/devmem.py      | 218 +++++++++++++++++++++
>   tools/testing/selftests/drivers/net/hw/ncdevmem.c  |  58 +++---
>   .../testing/selftests/drivers/net/hw/nk_devmem.py  |  55 ++++++
>   .../drivers/net/hw/nk_primary_rx_redirect.bpf.c    |  39 ++++
>   .../testing/selftests/drivers/net/hw/nk_qlease.py  |   8 +-
>   tools/testing/selftests/drivers/net/lib/py/env.py  | 109 ++++++++---
>   21 files changed, 549 insertions(+), 138 deletions(-)
> ---
> base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
> change-id: 20260423-tcp-dm-netkit-2bd78b638d30
> 
> Best regards,



Thread overview: 26+ messages
2026-05-08  2:27 [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 1/8] net: convert netmem_tx flag to enum Bobby Eshleman
2026-05-08 14:56   ` Stanislav Fomichev
2026-05-08 16:11     ` Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 2/8] net: netkit: declare NETMEM_TX_NO_DMA mode Bobby Eshleman
2026-05-08 14:57   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 3/8] net: devmem: support TX over NETMEM_TX_NO_DMA devices Bobby Eshleman
2026-05-08 15:01   ` Stanislav Fomichev
2026-05-08 16:19     ` Bobby Eshleman
2026-05-08 20:44     ` Jakub Kicinski
2026-05-08 20:47   ` Jakub Kicinski
2026-05-08 21:28     ` Bobby Eshleman
2026-05-08 22:27       ` Jakub Kicinski
2026-05-08 23:03         ` Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 4/8] selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration Bobby Eshleman
2026-05-08 15:01   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 5/8] selftests: drv-net: make attr _nk_guest_ifname public Bobby Eshleman
2026-05-08 15:01   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 6/8] selftests: drv-net: refactor devmem command builders into lib module Bobby Eshleman
2026-05-08 15:03   ` Stanislav Fomichev
2026-05-08 16:19     ` Bobby Eshleman
2026-05-08  2:27 ` [PATCH net-next v3 7/8] selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv Bobby Eshleman
2026-05-08 15:03   ` Stanislav Fomichev
2026-05-08  2:27 ` [PATCH net-next v3 8/8] selftests: drv-net: add netkit devmem tests Bobby Eshleman
2026-05-08 15:03   ` Stanislav Fomichev
2026-05-10 20:33 ` Zhu Yanjun [this message]
