From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>, bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, martin.lau@linux.dev,
razor@blackwall.org, ast@kernel.org, andrii@kernel.org,
john.fastabend@gmail.com, sdf@google.com, kuba@kernel.org,
andrew@lunn.ch, Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH bpf-next v3 1/7] netkit, bpf: Add bpf programmable net device
Date: Tue, 24 Oct 2023 18:07:56 +0200
Message-ID: <87msw8ovfn.fsf@toke.dk>
In-Reply-To: <20231023171856.18324-2-daniel@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> This work adds a new, minimal BPF-programmable device called "netkit"
> (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
> core idea is that BPF programs are executed within the drivers xmit routine
> and therefore e.g. in case of containers/Pods moving BPF processing closer
> to the source.
>
> One of the goals was that in case of Pod egress traffic, this allows to
> move BPF programs from hostns tcx ingress into the device itself, providing
> earlier drop or forward mechanisms, for example, if the BPF program
> determines that the skb must be sent out of the node, then a redirect to
> the physical device can take place directly without going through per-CPU
> backlog queue. This helps to shift processing for such traffic from softirq
> to process context, leading to better scheduling decisions/performance (see
> measurements in the slides).
>
> In this initial version, the netkit device ships as a pair, but we plan to
> extend this further so it can also operate in single device mode. The pair
> comes with a primary and a peer device. Only the primary device, typically
> residing in hostns, can manage BPF programs for itself and its peer. The
> peer device is designated for containers/Pods and cannot attach/detach
> BPF programs. Upon the device creation, the user can set the default policy
> to 'forward' or 'drop' for the case when no BPF program is attached.

Nit: according to the code, the policies are 'pass' and 'drop'? :)

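To make the default-policy behaviour described above concrete, here is a minimal userspace sketch in C. It is not the driver code: the names nk_verdict, nk_dev, nk_prog_fn and nk_xmit_verdict are hypothetical and exist only for illustration, and it models a single optional program rather than the bpf_mprog multi-attach path. It only shows the rule stated in the commit message: with no program attached, the device falls back to its configured 'pass' or 'drop' policy.

```c
#include <stddef.h>

/* Hypothetical verdict names for this sketch; they stand in for the
 * 'pass' and 'drop' policies mentioned in the review comment. */
enum nk_verdict { NK_PASS, NK_DROP };

/* Stand-in for an attached BPF program: inspects a packet, returns a verdict. */
typedef enum nk_verdict (*nk_prog_fn)(const unsigned char *data, size_t len);

/* Hypothetical per-device state: a default policy plus an optional program. */
struct nk_dev {
	enum nk_verdict policy;	/* used only when no program is attached */
	nk_prog_fn prog;	/* NULL when nothing is attached */
};

/* Decide what happens to a packet on transmit: the program's verdict if
 * one is attached, otherwise the device's default policy. */
static enum nk_verdict nk_xmit_verdict(const struct nk_dev *dev,
				       const unsigned char *data, size_t len)
{
	if (!dev->prog)
		return dev->policy;
	return dev->prog(data, len);
}

/* Example program for the sketch: pass anything that is not empty. */
static enum nk_verdict pass_nonempty(const unsigned char *data, size_t len)
{
	(void)data;
	return len ? NK_PASS : NK_DROP;
}
```

With policy set to NK_DROP and prog left NULL, every packet is dropped until a program such as pass_nonempty is installed, which mirrors the "no traffic leaves before the agent attaches programs" use case described further down in the commit message.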
> Additionally, the device can be operated in L3 (default) or L2 mode. The
> management of BPF programs is done via bpf_mprog, so that multi-attach is
> supported right from the beginning with similar API and dependency controls
> as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic
> attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
> so that existing programs can be easily migrated.
>
> Going forward, we plan to use netkit devices in Cilium as the main device
> type for connecting Pods. They will be operated in L3 mode in order to
> simplify a Pod's neighbor management and the peer will operate in default
> drop mode, so that no traffic is leaving between the time when a Pod is
> brought up by the CNI plugin and programs attached by the agent.
> Additionally, the programs we attach via tcx on the physical devices are
> using bpf_redirect_peer() for inbound traffic into netkit device, hence the
> latter is also supporting the ndo_get_peer_dev callback. Similarly, we use
> bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device
> directly. Also, BIG TCP is supported on netkit device. For the follow-up
> work in single device mode, we plan to convert Cilium's cilium_host/_net
> devices into a single one.
>
> An extensive test suite for checking device operations and the BPF program
> and link management API comes as BPF selftests in this series.
>
> Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://github.com/borkmann/iproute2/tree/pr/netkit
> Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)

I like the new name - thank you for changing it! :)

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>