From: Jesper Dangaard Brouer <brouer@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: "Maciej Żenczykowski" <maze@google.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
bpf <bpf@vger.kernel.org>,
"Linux NetDev" <netdev@vger.kernel.org>,
"Daniel Borkmann" <borkmann@iogearbox.net>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"Lorenz Bauer" <lmb@cloudflare.com>,
"Shaun Crampton" <shaun@tigera.io>,
"Lorenzo Bianconi" <lorenzo@kernel.org>,
"Marek Majkowski" <marek@cloudflare.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Eyal Birger" <eyal.birger@gmail.com>,
brouer@redhat.com
Subject: Re: [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook
Date: Thu, 8 Oct 2020 10:07:23 +0200 [thread overview]
Message-ID: <20201008100723.33e14dca@carbon> (raw)
In-Reply-To: <5f7e854b111fc_2acac2087e@john-XPS-13-9370.notmuch>
On Wed, 07 Oct 2020 20:19:39 -0700
John Fastabend <john.fastabend@gmail.com> wrote:
> Maciej Żenczykowski wrote:
> > On Wed, Oct 7, 2020 at 3:37 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > >
> > > Daniel Borkmann wrote:
> > > > On 10/7/20 6:23 PM, Jesper Dangaard Brouer wrote:
> > > > [...]
> > > > > net/core/dev.c | 24 ++++++++++++++++++++++--
> > > > > 1 file changed, 22 insertions(+), 2 deletions(-)
> > >
> > > Couple high-level comments. Whats the problem with just letting the driver
> > > consume the packet? I would chalk it up to a buggy BPF program that is
> > > sending these packets. The drivers really shouldn't panic or do anything
> > > horrible under this case because even today I don't think we can be
> > > 100% certain MTU on skb matches set MTU. Imagine the case where I change
> > > the MTU from 9kB->1500B there will be some skbs in-flight with the larger
> > > length and some with the shorter. If the drivers panic/fault or otherwise
> > > does something else horrible thats not going to be friendly in general case
> > > regardless of what BPF does. And seeing this type of config is all done
> > > async its tricky (not practical) to flush any skbs in-flight.
> > >
> > > I've spent many hours debugging these types of feature flag, mtu
> > > change bugs on the driver side I'm not sure it can be resolved by
> > > the stack easily. Better to just build drivers that can handle it IMO.
> > >
> > > Do we know if sending >MTU size skbs to drivers causes problems in real
> > > cases? I haven't tried on the NICs I have here, but I expect they should
> > > be fine. Fine here being system keeps running as expected. Dropping the
> > > skb either on TX or RX side is expected. Even with this change though
> > > its possible for the skb to slip through if I configure MTU on a live
> > > system.
> >
> > I wholeheartedly agree with the above.
> >
> > Ideally the only >mtu check should happen at driver admittance.
> > But again ideally it should happen in some core stack location not in
> > the driver itself.
>
> Ideally maybe, but IMO we should just let the skb go to the driver
> and let the driver sort it out. Even if this means pushing the packet
> onto the wire then the switch will drop it or the receiver, etc. A
> BPF program can do lots of horrible things that should never be
> on the wire otherwise. MTU is just one of them, but sending corrupted
> payloads, adding bogus headers, checksums etc. so I don't think we can
> reasonable protect against all of them.
That is a good point.
> Of course if the driver is going to hang/panic then something needs
> to be done. Perhaps a needs_mtu_check feature flag, although thats
> not so nice either so perhaps drivers just need to handle it themselves.
> Also even today the case could happen without BPF as best I can tell
> so the drivers should be prepared for it.
Yes, borderline cases do exist already today (like your reconf with
inflight packets), so I guess drivers should at-least not hang/panic
and be robust enough that we can drop this check.
I think you have convinced me that we can drop this check. My only
concern is how people can troubleshoot this, but its not different from
current state.
> > However, due to both gso and vlan offload, even this is not trivial to do...
> > The mtu is L3, but drivers/hardware/the wire usually care about L2...
If net_device->mtu is L3 (1500) and XDP (and TC, right?) operate at L2,
that likely means that the "strict" bpf_mtu_check (in my BPF-helper) is
wrong, as XDP (and TC) length at this point include the 14 bytes
Ethernet header. I will check and fix.
Is this accounted for via net_device->hard_header_len ?
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2020-10-08 8:07 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-07 16:22 [PATCH bpf-next V2 0/6] bpf: New approach for BPF MTU handling and enforcement Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 1/6] bpf: Remove MTU check in __bpf_skb_max_len Jesper Dangaard Brouer
2020-10-07 21:26 ` Daniel Borkmann
2020-10-07 23:46 ` Maciej Żenczykowski
2020-10-08 11:06 ` Jesper Dangaard Brouer
2020-10-08 12:33 ` Willem de Bruijn
2020-10-08 14:07 ` Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 2/6] bpf: bpf_fib_lookup return MTU value as output when looked up Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 3/6] bpf: add BPF-helper for MTU checking Jesper Dangaard Brouer
2020-10-07 16:22 ` [PATCH bpf-next V2 4/6] bpf: make it possible to identify BPF redirected SKBs Jesper Dangaard Brouer
2020-10-07 16:23 ` [PATCH bpf-next V2 5/6] bpf: Add MTU check for TC-BPF packets after egress hook Jesper Dangaard Brouer
2020-10-07 21:37 ` Daniel Borkmann
2020-10-07 22:36 ` John Fastabend
2020-10-07 23:52 ` Maciej Żenczykowski
2020-10-08 3:19 ` John Fastabend
2020-10-08 8:07 ` Jesper Dangaard Brouer [this message]
2020-10-08 8:25 ` Daniel Borkmann
2020-10-08 8:30 ` Jesper Dangaard Brouer
2020-10-07 16:23 ` [PATCH bpf-next V2 6/6] bpf: drop MTU check when doing TC-BPF redirect to ingress Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201008100723.33e14dca@carbon \
--to=brouer@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=borkmann@iogearbox.net \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eyal.birger@gmail.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=lmb@cloudflare.com \
--cc=lorenzo@kernel.org \
--cc=marek@cloudflare.com \
--cc=maze@google.com \
--cc=netdev@vger.kernel.org \
--cc=shaun@tigera.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.