From: Stanislav Fomichev <stfomichev@gmail.com>
To: David Wei <dw@davidwei.uk>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
netdev@vger.kernel.org, bpf@vger.kernel.org, kuba@kernel.org,
davem@davemloft.net, razor@blackwall.org, pabeni@redhat.com,
willemb@google.com, sdf@fomichev.me, john.fastabend@gmail.com,
martin.lau@kernel.org, jordan@jrife.io,
maciej.fijalkowski@intel.com, magnus.karlsson@intel.com,
toke@redhat.com, yangzhenze@bytedance.com,
wangdongdong.6@bytedance.com
Subject: Re: [PATCH net-next v4 00/14] netkit: Support for io_uring zero-copy and AF_XDP
Date: Wed, 5 Nov 2025 11:51:50 -0800
Message-ID: <aQuq1mhm7cM8kkLY@mini-arch>
In-Reply-To: <458d088f-dace-4869-b4af-b381d6ca5af1@davidwei.uk>
On 11/04, David Wei wrote:
> On 2025-11-04 15:22, Stanislav Fomichev wrote:
> > On 10/31, Daniel Borkmann wrote:
> > > Containers use virtual netdevs to route traffic from a physical netdev
> > > in the host namespace. They do not have access to the physical netdev
> > > in the host and thus can't use memory providers or AF_XDP that require
> > > reconfiguring/restarting queues in the physical netdev.
> > >
> > > This patchset adds the concept of queue peering to virtual netdevs that
> > > allow containers to use memory providers and AF_XDP at native speed.
> > > These mapped queues are bound to a real queue in a physical netdev and
> > > act as a proxy.
> > >
> > > Memory providers and AF_XDP operations take an ifindex and queue id,
> > > so containers would pass in an ifindex for a virtual netdev and a queue
> > > id of a mapped queue, which then gets proxied to the underlying real
> > > queue. Peered queues are created and bound to a real queue atomically
> > > through a generic ynl netdev operation.
> > >
> > > We have implemented support for this concept in netkit and tested the
> > > latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504
> > > (bnxt_en) 100G NICs. For more details see the individual patches.
> > >
> > > v3->v4:
> > > - ndo_queue_create store dst queue via arg (Nikolay)
> > > - Small nits like a spelling issue + rev xmas (Nikolay)
> > > - admin-perm flag in bind-queue spec (Jakub)
> > > - Fix potential ABBA deadlock situation in bind (Jakub, Paolo, Stan)
> > > - Add a peer dev_tracker to not reuse the sysfs one (Jakub)
> > > - New patch (12/14) to handle the underlying device going away (Jakub)
> > > - Improve commit message on queue-get (Jakub)
> > > - Do not expose phys dev info from container on queue-get (Jakub)
> > > - Add netif_put_rx_queue_peer_locked to simplify code (Stan)
> > > - Rework xsk handling to simplify the code and drop a few patches
> > > - Rebase and retested everything with mlx5 + bnxt_en
> >
> > I mostly looked at patches 1-8 and they look good to me. Will it be
> > possible to put your sample runs from 13 and 14 into a selftest form? Even
> > if you require real hw, that should be doable, similar to
> > tools/testing/selftests/drivers/net/hw/devmem.py, right?
>
> Thanks for taking a look. For io_uring at least, it requires both a
> routable VIP that can be assigned to the netkit in a netns and a BPF
> program for skb forwarding. I could add a selftest, but it'll be hard to
> generalise across all envs. I'm hoping to get self contained QEMU VM
> selftest support first. WDYT?

You can at least start by turning what you have in patch 3 into a
selftest. NIPA runs with the fbnic qemu model, so you should be able to
at least test the netns setup, make sure peer-info works as expected,
etc. You can also verify that things like changing the number of
channels are blocked while a queue is bound to netkit.
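
To make the channel check concrete, here is a toy Python model of what
such a selftest would assert. It is only a sketch under assumptions:
ToyNetdev, bind_queue, unbind_queue, set_channels and QueueBusyError are
made-up names for illustration, not the kernel, ethtool or ynl API, and
a real test would drive the actual devices instead of this stand-in.

```python
class QueueBusyError(Exception):
    """Hypothetical error: a resize was refused because a queue is peered."""


class ToyNetdev:
    """Toy stand-in for a physical netdev; not a kernel API."""

    def __init__(self, name, num_rx_queues):
        self.name = name
        self.num_rx_queues = num_rx_queues
        # real queue id -> (virtual dev name, virtual queue id)
        self.peered = {}

    def bind_queue(self, real_qid, virt_dev, virt_qid):
        # Bind a queue of a virtual device to one real rx queue.
        if not 0 <= real_qid < self.num_rx_queues:
            raise ValueError(f"no rx queue {real_qid} on {self.name}")
        if real_qid in self.peered:
            raise QueueBusyError(f"rx queue {real_qid} is already peered")
        self.peered[real_qid] = (virt_dev, virt_qid)

    def unbind_queue(self, real_qid):
        # Drop the peering; a no-op if the queue was not peered.
        self.peered.pop(real_qid, None)

    def set_channels(self, count):
        # Assumption: any peered rx queue blocks a channel count change.
        if self.peered:
            raise QueueBusyError(f"{self.name} has peered rx queues")
        self.num_rx_queues = count


def check_resize_blocked():
    """Return (resize was blocked while peered, queue count after unbind)."""
    phys = ToyNetdev("eth0", 8)
    phys.bind_queue(7, "nk0", 1)
    try:
        phys.set_channels(4)
        blocked = False
    except QueueBusyError:
        blocked = True
    # Once the peering is gone the same resize should go through.
    phys.unbind_queue(7)
    phys.set_channels(4)
    return blocked, phys.num_rx_queues
```

The selftest version would do the same three steps against fbnic: bind,
try to change the channel count and expect a failure, then unbind and
expect the change to succeed.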

Also, regarding the datapath test, I'm not sure you need another qemu.
I'm not even sure why you need a VIP. Can't you carve out a single port
and share the same host IP in the netns? Alternatively, I think you can
carve out 192.168.x.y from /32 and assign it to the machine. We have
datapath devmem tests working without any special qemu VMs (besides,
well, the special fbnic qemu, but you should be able to test on it as
well).