From: sdf@google.com
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	marek@cloudflare.com, John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	eyal.birger@gmail.com, colrack@gmail.com
Subject: Re: [PATCH bpf-next V7 4/8] bpf: add BPF-helper for MTU checking
Date: Thu, 3 Dec 2020 10:25:41 -0800
Message-ID: <X8ktpX/BYfiL0l2l@google.com>
In-Reply-To: <160588910708.2817268.17750536562819017509.stgit@firesoul>

On 11/20, Jesper Dangaard Brouer wrote:
> This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs.

> The SKB object is complex and the skb->len value (accessible from
> BPF-prog) also includes the length of any extra GRO/GSO segments, but
> without taking into account that these GRO/GSO segments get transport
> (L4) and network (L3) headers added before being transmitted. Thus,
> this BPF-helper is created such that the BPF-programmer doesn't need to
> handle these details in the BPF-prog.

> The API is designed to help the BPF-programmer who wants to do packet
> context size changes, which involve other helpers. These other helpers
> usually do a delta size adjustment. This helper also supports a delta
> size (len_diff), which allows the BPF-programmer to reuse arguments
> needed by these other helpers, and perform the MTU check prior to doing
> any actual size adjustment of the packet context.

> It is on purpose that we allow the len adjustment to become a negative
> result, which will pass the MTU check. This might seem weird, but it's not
> this helper's responsibility to "catch" wrong len_diff adjustments. Other
> helpers will take care of these checks, if the BPF-programmer chooses to
> do an actual size adjustment.

> V6:
> - Took John's advice and dropped BPF_MTU_CHK_RELAX
> - Returned MTU is kept at L3-level (like fib_lookup)

> V4: Lot of changes
>   - ifindex 0 now uses current netdev for MTU lookup
>   - rename helper from bpf_mtu_check to bpf_check_mtu
>   - fix bug for GSO pkt length (as skb->len is total len)
>   - remove __bpf_len_adj_positive, simply allow negative len adj

> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   include/uapi/linux/bpf.h       |   67 ++++++++++++++++++++++
>   net/core/filter.c              |  122 ++++++++++++++++++++++++++++++++++++++++
>   tools/include/uapi/linux/bpf.h |   67 ++++++++++++++++++++++
>   3 files changed, 256 insertions(+)

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index beacd312ea17..2619ea8c5a08 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3790,6 +3790,61 @@ union bpf_attr {
>    *		*ARG_PTR_TO_BTF_ID* of type *task_struct*.
>    *	Return
>    *		Pointer to the current task.
> + *
> + * int bpf_check_mtu(void *ctx, u32 ifindex, u32 *mtu_len, s32 len_diff, u64 flags)
> + *	Description
> + *		Check ctx packet size against MTU of net device (based on
> + *		*ifindex*).  This helper will likely be used in combination with
> + *		helpers that adjust/change the packet size.  The argument
> + *		*len_diff* can be used for querying with a planned size
> + *		change. This allows checking the MTU prior to changing packet ctx.
> + *
> + *		Specifying *ifindex* zero means the MTU check is performed
> + *		against the current net device.  This is practical if this isn't
> + *		used prior to redirect.
> + *
> + *		The Linux kernel route table can configure MTUs on a more
> + *		specific per route level, which is not provided by this helper.
> + *		For route level MTU checks use the **bpf_fib_lookup**\ ()
> + *		helper.
> + *
> + *		*ctx* is either **struct xdp_md** for XDP programs or
> + *		**struct sk_buff** for tc cls_act programs.
> + *
> + *		The *flags* argument can be a combination of one or more of the
> + *		following values:
> + *
> + *		**BPF_MTU_CHK_SEGS**
> + *			This flag only works for *ctx* **struct sk_buff**.
> + *			If packet context contains extra packet segment buffers
> + *			(often known as GSO skb), then MTU check is harder to
> + *			check at this point, because in transmit path it is
> + *			possible for the skb packet to get re-segmented
> + *			(depending on net device features).  This could still be
> + *			an MTU violation, so this flag enables performing MTU
> + *			check against segments, with a different violation
> + *			return code to tell it apart. Check cannot use len_diff.
> + *
> + *		On return *mtu_len* pointer contains the MTU value of the net
> + *		device.  Remember the net device configured MTU is the L3 size,
> + *		which is what is returned here, while XDP and TX length operate
> + *		at L2.  The helper takes this into account for you, but keep it
> + *		in mind when using the MTU value in your BPF-code.  On input
> + *		*mtu_len* must be a valid pointer and be initialized (to zero),
> + *		else verifier will reject BPF program.
> + *
> + *	Return
> + *		* 0 on success, and populate MTU value in *mtu_len* pointer.
> + *
> + *		* < 0 if any input argument is invalid (*mtu_len* not updated)
> + *
> + *		MTU violations return positive values, but also populate MTU
> + *		value in *mtu_len* pointer, as this can be needed for
> + *		implementing PMTU handling:
> + *
> + *		* **BPF_MTU_CHK_RET_FRAG_NEEDED**
> + *		* **BPF_MTU_CHK_RET_SEGS_TOOBIG**
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -3951,6 +4006,7 @@ union bpf_attr {
>   	FN(task_storage_get),		\
>   	FN(task_storage_delete),	\
>   	FN(get_current_task_btf),	\
> +	FN(check_mtu),			\
>   	/* */

>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> @@ -4978,6 +5034,17 @@ struct bpf_redir_neigh {
>   	};
>   };

> +/* bpf_check_mtu flags*/
> +enum  bpf_check_mtu_flags {
> +	BPF_MTU_CHK_SEGS  = (1U << 0),
> +};
> +
> +enum bpf_check_mtu_ret {
> +	BPF_MTU_CHK_RET_SUCCESS,      /* check and lookup successful */
> +	BPF_MTU_CHK_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
> +	BPF_MTU_CHK_RET_SEGS_TOOBIG,  /* GSO re-segmentation needed to fwd */
> +};
> +
>   enum bpf_task_fd_type {
>   	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
>   	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 25b137ffdced..d6125cfc49c3 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5604,6 +5604,124 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
>   	.arg4_type	= ARG_ANYTHING,
>   };

> +static struct net_device *__dev_via_ifindex(struct net_device *dev_curr,
> +					    u32 ifindex)
> +{
> +	struct net *netns = dev_net(dev_curr);
> +
> +	/* Non-redirect use-cases can use ifindex=0 and save ifindex lookup */
> +	if (ifindex == 0)
> +		return dev_curr;
> +
> +	return dev_get_by_index_rcu(netns, ifindex);
> +}
> +
> +BPF_CALL_5(bpf_skb_check_mtu, struct sk_buff *, skb,
> +	   u32, ifindex, u32 *, mtu_len, s32, len_diff, u64, flags)
> +{
> +	int ret = BPF_MTU_CHK_RET_FRAG_NEEDED;
> +	struct net_device *dev = skb->dev;
> +	int len;
> +	int mtu;
> +
> +	if (flags & ~(BPF_MTU_CHK_SEGS))
> +		return -EINVAL;
> +
> +	dev = __dev_via_ifindex(dev, ifindex);
> +	if (!dev)
> +		return -ENODEV;
> +
> +	mtu = READ_ONCE(dev->mtu);
> +
> +	/* TC len is L2, remove L2-header as dev MTU is L3 size */

[..]
> +	len = skb->len - ETH_HLEN;
Any reason not to do s/ETH_HLEN/dev->hard_header_len/ (or min_header_len?)
throughout this patch?
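
To illustrate why the constant might matter, here is a minimal userspace
sketch, not kernel code. struct fake_netdev and the three functions are
hypothetical stand-ins; only ETH_HLEN, mtu and hard_header_len mirror
names from the quoted patch and from struct net_device. It assumes
len_diff is simply added to the L3 length, as the helper description
suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN 14 /* Ethernet header length, same value as the kernel's */

/* Hypothetical stand-in for the two struct net_device fields used here. */
struct fake_netdev {
	unsigned int mtu;               /* configured MTU, an L3 size */
	unsigned short hard_header_len; /* L2 header length of this device */
};

/* L3 length as in the quoted line: always strips an Ethernet header. */
static int l3_len_eth(int skb_len, int len_diff)
{
	return skb_len - ETH_HLEN + len_diff;
}

/* L3 length as suggested: strip the L2 header the device declares. */
static int l3_len_dev(const struct fake_netdev *dev, int skb_len, int len_diff)
{
	assert(dev != NULL);
	return skb_len - (int)dev->hard_header_len + len_diff;
}

/* True when the computed L3 length does not exceed the device MTU. */
static bool fits_mtu(const struct fake_netdev *dev, int l3_len)
{
	assert(dev != NULL);
	return l3_len <= (int)dev->mtu;
}
```

For an Ethernet-like device (hard_header_len == 14) both variants agree.
For a hypothetical device with hard_header_len == 0 and MTU 1500, a
1510-byte skb passes the ETH_HLEN-based check (1496 <= 1500) but fails
the dev->hard_header_len-based one (1510 > 1500).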
