From: Daniel Borkmann <daniel@iogearbox.net>
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, kuba@kernel.org, davem@davemloft.net,
razor@blackwall.org, pabeni@redhat.com, willemb@google.com,
sdf@fomichev.me, john.fastabend@gmail.com, martin.lau@kernel.org,
jordan@jrife.io, maciej.fijalkowski@intel.com,
magnus.karlsson@intel.com, dw@davidwei.uk, toke@redhat.com,
yangzhenze@bytedance.com, wangdongdong.6@bytedance.com
Subject: [PATCH net-next v11 00/14] netkit: Support for io_uring zero-copy and AF_XDP
Date: Fri, 3 Apr 2026 01:10:17 +0200
Message-ID: <20260402231031.447597-1-daniel@iogearbox.net>
Containers use virtual netdevs to route traffic from a physical netdev
in the host namespace. They do not have access to the physical netdev
in the host and thus can't use memory providers or AF_XDP, both of which
require reconfiguring/restarting queues in the physical netdev.
This patchset adds the concept of queue leasing to virtual netdevs, which
allows containers to use memory providers and AF_XDP at native speed.
A leased queue is bound to a real queue in a physical netdev and acts
as a proxy for it.
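
The proxy relationship can be pictured with a small userspace model. This is
only a sketch: `QueueRef` and `LeaseTable` are hypothetical names made up for
illustration and do not mirror the kernel's data structures, and the ifindex
and queue id values are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueueRef:
    """An (ifindex, queue id) pair identifying one queue on one netdev."""
    ifindex: int
    queue_id: int


@dataclass
class LeaseTable:
    """Toy stand-in for lease state; not the kernel's data structure."""
    leases: dict = field(default_factory=dict)

    def lease(self, virt: QueueRef, real: QueueRef) -> None:
        # Bind a queue on the virtual netdev to a real queue on the physical one.
        if virt in self.leases:
            raise ValueError(f"{virt} is already leased")
        self.leases[virt] = real

    def resolve(self, ref: QueueRef) -> QueueRef:
        # A leased queue proxies to its real queue; anything else is unchanged.
        return self.leases.get(ref, ref)


table = LeaseTable()
# Hypothetical numbers: ifindex 7 is a netkit device, ifindex 2 a physical NIC.
table.lease(QueueRef(ifindex=7, queue_id=1), QueueRef(ifindex=2, queue_id=15))
```

In this model, `table.resolve(QueueRef(7, 1))` yields the physical queue
`QueueRef(2, 15)`, while a queue that was never leased resolves to itself.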
Memory providers and AF_XDP operations take an ifindex and queue id,
so containers would pass in an ifindex for a virtual netdev and a queue
id of a leased queue, which then gets proxied to the underlying real
queue.
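
For AF_XDP, the ifindex and queue id travel in the `struct sockaddr_xdp`
passed to bind(2). The sketch below only packs that address in userspace to
show where the two values go; it does not create a socket or a UMEM. The
helper name `pack_sockaddr_xdp` and the example values are assumptions made
for illustration, while the field layout follows include/uapi/linux/if_xdp.h.

```python
import struct

AF_XDP = 44  # address family number from the Linux UAPI headers

# struct sockaddr_xdp: __u16 sxdp_family, __u16 sxdp_flags, __u32 sxdp_ifindex,
# __u32 sxdp_queue_id, __u32 sxdp_shared_umem_fd
SOCKADDR_XDP = struct.Struct("=HHIII")


def pack_sockaddr_xdp(ifindex: int, queue_id: int, flags: int = 0) -> bytes:
    """Pack the address an AF_XDP socket would bind to."""
    return SOCKADDR_XDP.pack(AF_XDP, flags, ifindex, queue_id, 0)


# Hypothetical values: ifindex 7 for the netkit device, leased queue id 1.
addr = pack_sockaddr_xdp(ifindex=7, queue_id=1)
```

From inside the container, the application would fill in the virtual netdev's
ifindex and the leased queue's id here, not those of the physical device.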
We have implemented support for this concept in netkit and tested it
against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
(bnxt_en) 100G NICs. For more details see the individual patches.
v10->v11:
- Fix missing mp_ops->uninstall upon unlease path (Gemini)
- Fix dma device retrieval on tx queue when rx queue is leased (Gemini)
- Rework ethtool channel checks to check rx/tx individually (Gemini)
- The remainder of the Gemini findings were non-issues
- Add more extensive net selftests around queue leasing corner cases
- Rebased and retested everything with mlx5 + bnxt_en
v9->v10:
- Fix netkit selftest ruff checks (Jakub)
- Pass in net arg in netdev_put_lock (Gemini)
- Make the ethtool channel busy checks direction-aware (Gemini)
- Fix list corruption in netkit_check_lease_unregister (Gemini)
- Fix netkit_set_headroom in single device mode (Gemini)
- Fix xsk_clear_pool_at_qid bounds to use num_{rx,tx}_queues (Gemini)
- Fix netkit xsk teardown state checks (Gemini)
- The remainder of the Gemini findings were non-issues
- Rebased and retested everything with mlx5 + bnxt_en
v8->v9:
- Use ynl --family netdev in commit desc (Jakub)
- Update docs and comments about locking (Jakub)
- Propagate extack to the driver in ndo_queue_create (Jakub)
- Move function to net/core/dev.h core where possible (Jakub)
- Add comment about lease in netdev_rx_queue struct (Jakub)
- Add min: 0 check to netns id in policy (Jakub)
- Drop ifindex == ifindex_lease test in netdev_nl_queue_create_doit (Jakub)
- Detailed extack errors in netkit's ndo_queue_create (Jakub)
- Refactor lease dump into own function in netdev_nl_queue_fill_one (Jakub)
- Dump mp and xsk info for both queues in netdev_nl_queue_fill_one (Jakub)
- Replace the net_ prefix with netif_ for net_mp_{open,close}_rxq (Jakub)
- Remove ifq_idx naming cleanup from net_mp_{open,close}_rxq (Jakub)
- Rework the mp entry points to have explicit code that deals with
  the lease (Jakub)
- Fix locking in netdev_queue_get_dma_dev (Jakub)
- Remove unneeded carrier flap in netkit_queue_create (Jakub)
- Drop base net selftests that were already merged
- Add more netkit queue leasing coverage of base functionality (Jakub)
- Remove any mp leftovers upon unlease found via test cases
- Rebased and retested everything with mlx5 + bnxt_en
v7->v8:
- Rework and refactor net_mp_{open,close}_rxq patch (Jakub)
- Moved queue lease netlink command from env into test (Jakub)
- Support netns_id also in queue create call and not only dump
- Rebased and retested everything with mlx5 + bnxt_en
v6->v7:
- Add xsk_dev_queue_valid real_num_rx_queues check given bound
  xs->queue_id could be from a TX queue (Claude)
- Fix up exception path in queue leasing selftest (Claude)
- Rebased and retested everything with mlx5 + bnxt_en
v5->v6:
- Fix nest_queue test in netdev_nl_queue_fill_one (Jakub/Claude)
- Fix netdev notifier locking leak (Jakub/Claude)
- Drop NETREG_UNREGISTERING WARN_ON_ONCE to avoid confusion (Stan)
- Remove slipped-in .gitignore cruft in net selftest (Stan)
- Fix Pylint warnings in net selftest (Jakub)
- Rebased and retested everything with mlx5 + bnxt_en
v4->v5:
- Rework of the core API into queue-create op (Jakub)
- Rename from queue peering to queue leasing (Jakub)
- Add net selftests for queue leasing (Stan, Jakub)
- Move netkit_queue_get_dma_dev into core (Jakub)
- Dropped netkit_get_channels (Jakub)
- Moved ndo_queue_create back to return index or error (Jakub)
- Inline __netdev_rx_queue_{peer,unpeer} helpers (Jakub)
- Adding helpers in patches where they are used (Jakub)
- Undo inline for netdev_put_lock (Jakub)
- Factoring out checks whether device can lease (Jakub)
- Fix up return codes in netdev_nl_bind_queue_doit (Jakub)
- Reject when AF_XDP or mp already bound (Jakub)
- Switch some error cases to NL_SET_BAD_ATTR() (Jakub)
- Rebased and retested everything with mlx5 + bnxt_en
v3->v4:
- ndo_queue_create store dst queue via arg (Nikolay)
- Small nits like a spelling issue + rev xmas (Nikolay)
- admin-perm flag in bind-queue spec (Jakub)
- Fix potential ABBA deadlock situation in bind (Jakub, Paolo, Stan)
- Add a peer dev_tracker to not reuse the sysfs one (Jakub)
- New patch (12/14) to handle the underlying device going away (Jakub)
- Improve commit message on queue-get (Jakub)
- Do not expose phys dev info from container on queue-get (Jakub)
- Add netif_put_rx_queue_peer_locked to simplify code (Stan)
- Rework xsk handling to simplify the code and drop a few patches
- Rebased and retested everything with mlx5 + bnxt_en
v2->v3:
- Use netdev_ops_assert_locked instead of netdev_assert_locked (syzbot)
- Add missing netdev_lockdep_set_classes in netkit
v1->v2:
- Removed bind sample ynl code (Stan)
- Reworked netdev locking to have consistent order (Stan, Kuba)
- Return 'not supported' in API patch (Stan)
- Improved ynl documentation (Kuba)
- Added 'max: s32-max' in ynl spec for ifindex (Kuba)
- Added also queue type in ynl to have user specify rx to make
  it obvious (Kuba)
- Use of netdev_hold (Kuba)
- Avoid static inlines from another header (Kuba)
- Squashed some commits (Kuba, Stan)
- Removed ndo_{peer,unpeer}_queues callback and simplified
  code (Kuba)
- Improved commit messages (Toke, Kuba, Stan, zf)
- Got rid of locking genl_sk_priv_get (Stan)
- Removed af_xdp cleanup churn (Maciej)
- Added netdev locking asserts (Stan)
- Reject ethtool ioctl path queue resizing (Kuba)
- Added kdoc for ndo_queue_create (Stan)
- Uninvert logic in netkit single dev mode (Jordan)
- Added binding support for multiple queues

Daniel Borkmann (10):
  net: Add queue-create operation
  net: Implement netdev_nl_queue_create_doit
  net: Add lease info to queue-get response
  net, ethtool: Disallow leased real rxqs to be resized
  net: Slightly simplify net_mp_{open,close}_rxq
  xsk: Extend xsk_rcv_check validation
  xsk: Proxy pool management for leased queues
  netkit: Add single device mode for netkit
  netkit: Add netkit notifier to check for unregistering devices
  netkit: Add xsk support for af_xdp applications

David Wei (4):
  net: Proxy netif_mp_{open,close}_rxq for leased queues
  net: Proxy netdev_queue_get_dma_dev for leased queues
  netkit: Implement rtnl_link_ops->alloc and ndo_queue_create
  selftests/net: Add queue leasing tests with netkit

Documentation/netlink/specs/netdev.yaml | 46 +
Documentation/netlink/specs/rt-link.yaml | 11 +
Documentation/networking/netdevices.rst | 6 +
drivers/net/netkit.c | 412 ++++-
include/linux/netdevice.h | 11 +-
include/net/netdev_queues.h | 23 +-
include/net/netdev_rx_queue.h | 29 +-
include/net/page_pool/memory_provider.h | 8 +-
include/uapi/linux/if_link.h | 6 +
include/uapi/linux/netdev.h | 11 +
io_uring/zcrx.c | 12 +-
net/core/dev.c | 18 +-
net/core/dev.h | 12 +
net/core/devmem.c | 6 +-
net/core/netdev-genl-gen.c | 20 +
net/core/netdev-genl-gen.h | 2 +
net/core/netdev-genl.c | 238 ++-
net/core/netdev_queues.c | 103 +-
net/core/netdev_rx_queue.c | 205 ++-
net/ethtool/channels.c | 28 +-
net/ethtool/ioctl.c | 21 +-
net/xdp/xsk.c | 76 +-
tools/include/uapi/linux/netdev.h | 11 +
.../testing/selftests/drivers/net/hw/Makefile | 1 +
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/hw/nk_qlease.py | 1407 +++++++++++++++++
26 files changed, 2558 insertions(+), 169 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/hw/nk_qlease.py
--
2.43.0