Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: John Fastabend <john.fastabend@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, Daniel Borkmann <borkmann@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	maze@google.com, lmb@cloudflare.com, shaun@tigera.io,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	marek@cloudflare.com, John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	eyal.birger@gmail.com
Subject: Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
Date: Wed, 07 Oct 2020 15:36:58 -0700	[thread overview]
Message-ID: <5f7e430ae158b_1a8312084d@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <7aeb6082-48a3-9b71-2e2c-10adeb5ee79a@iogearbox.net>

Daniel Borkmann wrote:
> On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote:
> [...]
> >   net/core/dev.c |   24 ++++++++++++++++++++++--
> >   1 file changed, 22 insertions(+), 2 deletions(-)

Couple high-level comments. Whats the problem with just letting the driver
consume the packet? I would chalk it up to a buggy BPF program that is
sending these packets. The drivers really shouldn't panic or do anything
horrible under this case because even today I don't think we can be
100% certain MTU on skb matches set MTU. Imagine the case where I change
the MTU from 9kB->1500B there will be some skbs in-flight with the larger
length and some with the shorter. If the drivers panic/fault or otherwise
does something else horrible thats not going to be friendly in general case
regardless of what BPF does. And seeing this type of config is all done
async its tricky (not practical) to flush any skbs in-flight.

I've spent many hours debugging these types of feature flag, mtu
change bugs on the driver side I'm not sure it can be resolved by
the stack easily. Better to just build drivers that can handle it IMO.

Do we know if sending >MTU size skbs to drivers causes problems in real
cases? I haven't tried on the NICs I have here, but I expect they should
be fine. Fine here being system keeps running as expected. Dropping the
skb either on TX or RX side is expected. Even with this change though
its possible for the skb to slip through if I configure MTU on a live
system.

> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index b433098896b2..19406013f93e 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3870,6 +3870,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> >   	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
> >   	case TC_ACT_OK:
> >   	case TC_ACT_RECLASSIFY:
> > +		*ret = NET_XMIT_SUCCESS;
> >   		skb->tc_index = TC_H_MIN(cl_res.classid);
> >   		break;
> >   	case TC_ACT_SHOT:
> > @@ -4064,9 +4065,12 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   {
> >   	struct net_device *dev = skb->dev;
> >   	struct netdev_queue *txq;
> > +#ifdef CONFIG_NET_CLS_ACT
> > +	bool mtu_check = false;
> > +#endif
> > +	bool again = false;
> >   	struct Qdisc *q;
> >   	int rc = -ENOMEM;
> > -	bool again = false;
> >   
> >   	skb_reset_mac_header(skb);
> >   
> > @@ -4082,14 +4086,28 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   
> >   	qdisc_pkt_len_init(skb);
> >   #ifdef CONFIG_NET_CLS_ACT
> > +	mtu_check = skb_is_redirected(skb);
> >   	skb->tc_at_ingress = 0;
> >   # ifdef CONFIG_NET_EGRESS
> >   	if (static_branch_unlikely(&egress_needed_key)) {
> > +		unsigned int len_orig = skb->len;
> > +
> >   		skb = sch_handle_egress(skb, &rc, dev);
> >   		if (!skb)
> >   			goto out;
> > +		/* BPF-prog ran and could have changed packet size beyond MTU */
> > +		if (rc == NET_XMIT_SUCCESS && skb->len > len_orig)
> > +			mtu_check = true;
> >   	}
> >   # endif
> > +	/* MTU-check only happens on "last" net_device in a redirect sequence
> > +	 * (e.g. above sch_handle_egress can steal SKB and skb_do_redirect it
> > +	 * either ingress or egress to another device).
> > +	 */
> 
> Hmm, quite some overhead in fast path. Also, won't this be checked multiple times
> on stacked devices? :( Moreover, this missed the fact that 'real' qdiscs can have
> filters attached too, this would come after this check. Can't this instead be in
> driver layer for those that really need it? I would probably only drop the check
> as done in 1/6 and allow the BPF prog to do the validation if needed.

Any checks like this should probably go in validate_xmit_skb_list() this is
where we check other things GSO, checksum, etc. that depend on drivers. Worse
case we could add a netif_needs_mtu() case in validate_xmit_skb if drivers
really can't handle >MTU size.

> 
> > +	if (mtu_check && !is_skb_forwardable(dev, skb)) {
> > +		rc = -EMSGSIZE;
> > +		goto drop;
> > +	}
> >   #endif
> >   	/* If device/qdisc don't need skb->dst, release it right now while
> >   	 * its hot in this cpu cache.
> > @@ -4157,7 +4175,9 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >   
> >   	rc = -ENETDOWN;
> >   	rcu_read_unlock_bh();
> > -
> > +#ifdef CONFIG_NET_CLS_ACT
> > +drop:
> > +#endif
> >   	atomic_long_inc(&dev->tx_dropped);
> >   	kfree_skb_list(skb);
> >   	return rc;
> >

next prev parent reply	other threads:[~2020-10-07 22:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-07 16:22 [PATCH bpf-next V2 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-07 21:26   ` Daniel Borkmann
2020-10-07 23:46   ` Maciej Żenczykowski
2020-10-08 11:06     ` Jesper Dangaard Brouer
2020-10-08 12:33       ` Willem de Bruijn
2020-10-08 14:07         ` Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 3/6] bpf: add BPF-helper for MTU checking Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
2020-10-07 16:23 ` [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
2020-10-07 21:37   ` Daniel Borkmann
2020-10-07 22:36     ` John Fastabend [this message]
2020-10-07 23:52       ` Maciej Żenczykowski
2020-10-08  3:19         ` John Fastabend
2020-10-08  8:07           ` Jesper Dangaard Brouer
2020-10-08  8:25             ` Daniel Borkmann
2020-10-08  8:30     ` Jesper Dangaard Brouer
2020-10-07 16:23 ` [PATCH bpf-next V2 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f7e430ae158b_1a8312084d@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=eyal.birger@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=lorenzo@kernel.org \
    --cc=marek@cloudflare.com \
    --cc=maze@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=shaun@tigera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox