netdev.vger.kernel.org archive mirror
From: Stanislav Fomichev <sdf@google.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	martin.lau@kernel.org,  razor@blackwall.org, ast@kernel.org,
	andrii@kernel.org,  john.fastabend@gmail.com
Subject: Re: [PATCH bpf-next 1/8] meta, bpf: Add bpf programmable meta device
Date: Tue, 26 Sep 2023 14:26:29 -0700
Message-ID: <ZRNMhVfuqPrK3J6O@google.com>
In-Reply-To: <20230926055913.9859-2-daniel@iogearbox.net>

On 09/26, Daniel Borkmann wrote:
> This work adds a new, minimal BPF-programmable device called "meta" we
> recently presented at LSF/MM/BPF. The latter name derives from the Greek
> μετά, encompassing a wide array of meanings such as "on top of", "beyond".
> Given business logic is defined by BPF, this device can have many meanings.
> The core idea is that BPF programs are executed within the driver's xmit
> routine, which, e.g. in the case of containers/Pods, moves BPF processing
> closer to the source.
> 
> One of the goals was that, in the case of Pod egress traffic, BPF programs
> can be moved from hostns tcx ingress into the device itself, providing
> earlier drop or forward mechanisms. For example, if the BPF program
> determines that the skb must be sent out of the node, then a redirect to
> the physical device can take place directly without going through the
> per-CPU backlog queue. This helps to shift processing for such traffic from
> softirq to process context, leading to better scheduling decisions and
> better performance.
> 
> In this initial version, the meta device ships as a pair, but we plan to
> extend this further so it can also operate in single device mode. The pair
> comes with a primary and a peer device. Only the primary device, typically
> residing in hostns, can manage BPF programs for itself and its peer. The
> peer device is designated for containers/Pods and cannot attach/detach
> BPF programs. Upon the device creation, the user can set the default policy
> to 'forward' or 'drop' for the case when no BPF program is attached.
> 
> Additionally, the device can be operated in L3 (default) or L2 mode. The
> management of BPF programs is done via bpf_mprog, so that multi-attach is
> supported right from the beginning with similar API/dependency controls as
> tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic
> attach/detach/query API for multi-progs"). tc BPF compatibility is provided,
> so that existing programs can be easily migrated.
> 
> Going forward, we plan to use meta devices in Cilium as the main device type
> for connecting Pods. They will be operated in L3 mode in order to simplify
> a Pod's neighbor management and the peer will operate in default drop mode,
> so that no traffic leaves between the time a Pod is brought up by the CNI
> plugin and the time programs are attached by the agent. Additionally, the
> programs we attach via tcx on the physical devices are using
> bpf_redirect_peer() for inbound traffic into meta device, hence the latter
> also supports the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the
> way out, pushing to phys device directly. Also, BIG TCP is supported on meta
> device. For the follow-up work in single device mode, we plan to convert
> Cilium's cilium_host/_net devices into a single one.
> 
> An extensive test suite for checking device operations and the BPF program
> and link management API comes as BPF selftests in this series.
> 
> Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://github.com/borkmann/iproute2/commits/pr/meta
> Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
> ---
>  MAINTAINERS                    |   9 +
>  drivers/net/Kconfig            |   9 +
>  drivers/net/Makefile           |   1 +
>  drivers/net/meta.c             | 734 +++++++++++++++++++++++++++++++++
>  include/linux/netdevice.h      |   2 +
>  include/net/meta.h             |  31 ++
>  include/uapi/linux/bpf.h       |   2 +
>  include/uapi/linux/if_link.h   |  25 ++
>  kernel/bpf/syscall.c           |  30 +-
>  tools/include/uapi/linux/bpf.h |   2 +
>  10 files changed, 840 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/net/meta.c
>  create mode 100644 include/net/meta.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8985a1b0b5ee..ec3edd4caa56 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3774,6 +3774,15 @@ L:	bpf@vger.kernel.org
>  S:	Maintained
>  F:	tools/lib/bpf/
>  
> +BPF [META]
> +M:	Daniel Borkmann <daniel@iogearbox.net>
> +M:	Nikolay Aleksandrov <razor@blackwall.org>
> +L:	bpf@vger.kernel.org
> +L:	netdev@vger.kernel.org
> +S:	Supported
> +F:	drivers/net/meta.c
> +F:	include/net/meta.h
> +
>  BPF [MISC]
>  L:	bpf@vger.kernel.org
>  S:	Odd Fixes
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 44eeb5d61ba9..9959cdd50b0b 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -448,6 +448,15 @@ config NLMON
>  	  diagnostics, etc. This is mostly intended for developers or support
>  	  to debug netlink issues. If unsure, say N.
>  
> +config META
> +	bool "BPF-programmable meta device"
> +	depends on BPF_SYSCALL
> +	help
> +	  The virtual meta devices can be created in pairs and used to connect
> +	  two network namespaces. A BPF program can be attached to the device(s)
> +	  which then gets executed on transmission to implement the driver
> +	  internal logic.
> +
>  config NET_VRF
>  	tristate "Virtual Routing and Forwarding (Lite)"
>  	depends on IP_MULTIPLE_TABLES
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index e26f98f897c5..18eabeb78ece 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -22,6 +22,7 @@ obj-$(CONFIG_MDIO) += mdio.o
>  obj-$(CONFIG_NET) += loopback.o
>  obj-$(CONFIG_NETDEV_LEGACY_INIT) += Space.o
>  obj-$(CONFIG_NETCONSOLE) += netconsole.o
> +obj-$(CONFIG_META) += meta.o
>  obj-y += phy/
>  obj-y += pse-pd/
>  obj-y += mdio/
> diff --git a/drivers/net/meta.c b/drivers/net/meta.c
> new file mode 100644
> index 000000000000..e464f547b0a6
> --- /dev/null
> +++ b/drivers/net/meta.c
> @@ -0,0 +1,734 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2023 Isovalent */
> +
> +#include <linux/netdevice.h>
> +#include <linux/ethtool.h>
> +#include <linux/etherdevice.h>
> +#include <linux/filter.h>
> +#include <linux/netfilter_netdev.h>
> +#include <linux/bpf_mprog.h>
> +
> +#include <net/meta.h>
> +#include <net/dst.h>
> +#include <net/tcx.h>
> +
> +#define DRV_NAME	"meta"
> +#define DRV_VERSION	"1.0"
> +
> +struct meta {
> +	/* Needed in fast-path */
> +	struct net_device __rcu *peer;
> +	struct bpf_mprog_entry __rcu *active;
> +	enum meta_action policy;
> +	struct bpf_mprog_bundle	bundle;
> +	/* Needed in slow-path */
> +	enum meta_mode mode;
> +	bool primary;
> +	u32 headroom;
> +};
> +
> +static void meta_scrub_minimum(struct sk_buff *skb)
> +{

[..]

> +	skb->skb_iif = 0;
> +	skb->ignore_df = 0;
> +	skb->priority = 0;
> +	skb_dst_drop(skb);
> +	skb_ext_reset(skb);
> +	nf_reset_ct(skb);
> +	nf_reset_trace(skb);
> +	nf_skip_egress(skb, true);
> +	ipvs_reset(skb);

This looks similar to skb_scrub_packet; what's the difference?
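
For reference, the visible part of the quoted hunk can be modeled in
userspace roughly as below. This is only a sketch: struct toy_skb and its
boolean fields are hypothetical stand-ins for the state the kernel helpers
clear, not the real struct sk_buff, and the lines elided by "[..]" above are
deliberately not modeled. If I remember net/core/skbuff.c correctly,
skb_scrub_packet() leaves skb->priority alone, does not call
nf_skip_egress(), and only calls ipvs_reset() when its xnet argument is
true, but that should be checked against the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sk_buff state touched by the
 * visible lines of the quoted hunk. Not the kernel's struct sk_buff. */
struct toy_skb {
	int      skb_iif;        /* ingress ifindex */
	bool     ignore_df;
	uint32_t priority;
	bool     has_dst;        /* state cleared by skb_dst_drop() */
	bool     has_ext;        /* state cleared by skb_ext_reset() */
	bool     has_ct;         /* state cleared by nf_reset_ct() */
	bool     nf_trace;       /* state cleared by nf_reset_trace() */
	bool     nf_skip_egress; /* state set by nf_skip_egress(skb, true) */
	bool     ipvs_property;  /* state cleared by ipvs_reset() */
};

/* Mirrors only the statements visible in the quoted hunk, in order.
 * Whatever precedes them in the real function is elided in the mail
 * and therefore not reproduced here. */
static void toy_scrub_minimum(struct toy_skb *skb)
{
	assert(skb != NULL);

	skb->skb_iif = 0;
	skb->ignore_df = false;
	skb->priority = 0;
	skb->has_dst = false;
	skb->has_ext = false;
	skb->has_ct = false;
	skb->nf_trace = false;
	skb->nf_skip_egress = true;
	skb->ipvs_property = false;
}

/* Returns true when every modeled field is in its scrubbed state. */
static bool toy_is_scrubbed(const struct toy_skb *skb)
{
	return skb->skb_iif == 0 && !skb->ignore_df && skb->priority == 0 &&
	       !skb->has_dst && !skb->has_ext && !skb->has_ct &&
	       !skb->nf_trace && skb->nf_skip_egress && !skb->ipvs_property;
}
```

Calling toy_scrub_minimum() on a toy_skb with any of these fields set leaves
toy_is_scrubbed() returning true; the model says nothing about fields that
the real helpers may touch beyond the ones named in the hunk.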
