netdev.vger.kernel.org archive mirror
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>, bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, martin.lau@linux.dev,
	razor@blackwall.org, ast@kernel.org, andrii@kernel.org,
	john.fastabend@gmail.com, sdf@google.com, kuba@kernel.org,
	andrew@lunn.ch, Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH bpf-next v3 1/7] netkit, bpf: Add bpf programmable net device
Date: Tue, 24 Oct 2023 18:07:56 +0200
Message-ID: <87msw8ovfn.fsf@toke.dk>
In-Reply-To: <20231023171856.18324-2-daniel@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> This work adds a new, minimal BPF-programmable device called "netkit"
> (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The
> core idea is that BPF programs are executed within the driver's xmit routine
> and therefore, e.g. in the case of containers/Pods, BPF processing moves
> closer to the source.
>
> One of the goals was that, in the case of Pod egress traffic, this allows us
> to move BPF programs from hostns tcx ingress into the device itself, providing
> earlier drop or forward mechanisms. For example, if the BPF program
> determines that the skb must be sent out of the node, then a redirect to
> the physical device can take place directly without going through per-CPU
> backlog queue. This helps to shift processing for such traffic from softirq
> to process context, leading to better scheduling decisions/performance (see
> measurements in the slides).
>
> In this initial version, the netkit device ships as a pair, but we plan to
> extend this further so it can also operate in single device mode. The pair
> comes with a primary and a peer device. Only the primary device, typically
> residing in hostns, can manage BPF programs for itself and its peer. The
> peer device is designated for containers/Pods and cannot attach/detach
> BPF programs. Upon the device creation, the user can set the default policy
> to 'forward' or 'drop' for the case when no BPF program is attached.

Nit: according to the code the policies are 'pass' and 'drop'? :)
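
For readers following along, here is a minimal userspace sketch of the default-policy semantics under discussion. It is not the driver code: every `sketch_*` name is hypothetical, the verdict values are an assumption modelled on the tcx-style return codes, and treating a chain that only ever says "next" as a pass is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdict codes; the numeric values are an assumption. */
enum sketch_action {
	SKETCH_NEXT = -1, /* defer to the next attached program */
	SKETCH_PASS = 0,  /* let the packet continue to the peer */
	SKETCH_DROP = 2,  /* discard the packet */
};

/* Hypothetical stand-in for an skb: just a destination port. */
struct sketch_pkt {
	int dst_port;
};

typedef int (*sketch_prog_fn)(const struct sketch_pkt *pkt);

/* Example program: has no opinion, always defers. */
int sketch_prog_skip(const struct sketch_pkt *pkt)
{
	(void)pkt;
	return SKETCH_NEXT;
}

/* Example program: drops telnet traffic, defers on everything else. */
int sketch_prog_drop_telnet(const struct sketch_pkt *pkt)
{
	return pkt->dst_port == 23 ? SKETCH_DROP : SKETCH_NEXT;
}

/*
 * Return the verdict for one packet. With no programs attached, the
 * device's default policy ('pass' or 'drop') decides. Otherwise the
 * programs run in order until one returns something other than NEXT.
 */
int sketch_run(const sketch_prog_fn *progs, size_t nprogs,
	       const struct sketch_pkt *pkt, int default_policy)
{
	int ret = default_policy;
	size_t i;

	assert(progs != NULL || nprogs == 0);
	for (i = 0; i < nprogs; i++) {
		ret = progs[i](pkt);
		if (ret != SKETCH_NEXT)
			break;
	}
	/* Assumption: a chain that only defers ends up passing. */
	return ret == SKETCH_NEXT ? SKETCH_PASS : ret;
}
```

Calling `sketch_run(NULL, 0, &pkt, SKETCH_DROP)` models a freshly created device in default-drop mode with nothing attached: the packet is dropped until a program is attached and says otherwise.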

> Additionally, the device can be operated in L3 (default) or L2 mode. The
> management of BPF programs is done via bpf_mprog, so that multi-attach is
> supported right from the beginning with similar API and dependency controls
> as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic
> attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
> so that existing programs can be easily migrated.
>
> Going forward, we plan to use netkit devices in Cilium as the main device
> type for connecting Pods. They will be operated in L3 mode in order to
> simplify a Pod's neighbor management and the peer will operate in default
> drop mode, so that no traffic leaves between the time a Pod is brought up
> by the CNI plugin and the time programs are attached by the agent.
> Additionally, the programs we attach via tcx on the physical devices
> use bpf_redirect_peer() for inbound traffic into the netkit device, hence
> the latter also supports the ndo_get_peer_dev callback. Similarly, we use
> bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device
> directly. Also, BIG TCP is supported on netkit devices. For the follow-up
> work in single device mode, we plan to convert Cilium's cilium_host/_net
> devices into a single one.
>
> An extensive test suite for checking device operations and the BPF program
> and link management API comes as BPF selftests in this series.
>
> Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://github.com/borkmann/iproute2/tree/pr/netkit
> Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)

I like the new name - thank you for changing it! :)

Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>


Thread overview: 15+ messages
2023-10-23 17:18 [PATCH bpf-next v3 0/7] Add bpf programmable net device Daniel Borkmann
2023-10-23 17:18 ` [PATCH bpf-next v3 1/7] netkit, bpf: " Daniel Borkmann
2023-10-24 16:07   ` Toke Høiland-Jørgensen [this message]
2023-10-24 18:16     ` Daniel Borkmann
2023-10-24 16:40   ` Stanislav Fomichev
2023-10-24 18:05     ` Daniel Borkmann
2023-10-24 18:27       ` Stanislav Fomichev
2023-10-24 19:58         ` Daniel Borkmann
2023-10-23 17:18 ` [PATCH bpf-next v3 2/7] tools: Sync if_link uapi header Daniel Borkmann
2023-10-23 17:18 ` [PATCH bpf-next v3 3/7] libbpf: Add link-based API for netkit Daniel Borkmann
2023-10-23 17:18 ` [PATCH bpf-next v3 4/7] bpftool: Implement link show support " Daniel Borkmann
2023-10-24 16:08   ` Toke Høiland-Jørgensen
2023-10-23 17:18 ` [PATCH bpf-next v3 5/7] bpftool: Extend net dump with netkit progs Daniel Borkmann
2023-10-23 17:18 ` [PATCH bpf-next v3 6/7] selftests/bpf: Add netlink helper library Daniel Borkmann
2023-10-23 17:18 ` [PATCH bpf-next v3 7/7] selftests/bpf: Add selftests for netkit Daniel Borkmann
