All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>,
	Saeed Mahameed <saeed@kernel.org>
Cc: "Maciej Żenczykowski" <maze@google.com>,
	"Daniel Borkmann" <borkmann@iogearbox.net>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Lorenzo Bianconi" <lorenzo.bianconi@redhat.com>,
	"Lorenz Bauer" <lmb@cloudflare.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Shaun Crampton" <shaun@tigera.io>,
	"David Miller" <davem@davemloft.net>,
	"Marek Majkowski" <marek@cloudflare.com>,
	brouer@redhat.com
Subject: Re: BPF redirect API design issue for BPF-prog MTU feedback?
Date: Fri, 18 Sep 2020 12:34:08 +0200	[thread overview]
Message-ID: <87ft7ffk67.fsf@toke.dk> (raw)
In-Reply-To: <20200918120016.7007f437@carbon>

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Thu, 17 Sep 2020 12:11:33 -0700
> Saeed Mahameed <saeed@kernel.org> wrote:
>
>> On Thu, 2020-09-17 at 05:54 -0700, Maciej Żenczykowski wrote:
>> > On Thu, Sep 17, 2020 at 5:39 AM Jesper Dangaard Brouer
>> > <brouer@redhat.com> wrote:  
>> > > 
>> > > As you likely know[1] I'm looking into moving the MTU check (for
>> > > TC-BPF) in __bpf_skb_max_len() when e.g. called by
>> > > bpf_skb_adjust_room(), because when redirecting packets to
>> > > another netdev it is not correct to limit the MTU based on the
>> > > incoming netdev.
>> > > 
>> > > I was looking at doing the MTU check in bpf_redirect() helper,
>> > > because at this point we know the redirect to netdev, and
>> > > returning an indication/error that MTU was exceed, would allow
>> > > the BPF-prog logic to react, e.g. sending ICMP (instead of packet
>> > > getting silently dropped). 
>> > > BUT this is not possible because bpf_redirect(index, flags) helper
>> > > don't provide the packet context-object (so I cannot lookup the
>> > > packet length).
>> > > 
>> > > Seeking input:
>> > > 
>> > > Should/can we change the bpf_redirect API or create a new helper
>> > > with packet-context?
>> > > 
>> > >  Note: We have the same need for the packet context for XDP when
>> > >  redirecting the new multi-buffer packets, as not all destination
>> > >  netdev will support these new multi-buffer packets.
>> > > 
>> > > I can of-cause do the MTU checks on kernel-side in
>> > > skb_do_redirect, but then how do people debug this? as packet
>> > > will basically be silently dropped.
>> > > 
>> > > 
>> > > 
>> > > (Looking at how does BPF-prog logic handle MTU today)
>> > > 
>> > > How do bpf_skb_adjust_room() report that the MTU was exceeded?
>> > > Unfortunately it uses a common return code -ENOTSUPP which used
>> > > for multiple cases (include MTU exceeded). Thus, the BPF-prog
>> > > logic cannot use this reliably to know if this is a MTU exceeded
>> > > event. (Looked BPF-prog code and they all simply exit with
>> > > TC_ACT_SHOT for all error codes, cloudflare have the most
>> > > advanced handling with metrics->errors_total_encap_adjust_failed++).
>> > > 
>> > > 
>> > > [1] 
>> > > https://lore.kernel.org/bpf/159921182827.1260200.9699352760916903781.stgit@firesoul/
>> > > --
>> > > Best regards,
>> > >   Jesper Dangaard Brouer
>> > >   MSc.CS, Principal Kernel Engineer at Red Hat
>> > >   LinkedIn: http://www.linkedin.com/in/brouer
>> > >   
>> > 
>> > (a) the current state of the world seems very hard to use correctly,
>> > so adding new apis, or even changing existing ones seems ok to me.
>> > especially if this just means changing what error code they return
>> > 
>> > (b) another complexity with bpf_redirect() is you can call it, it
>> > can succeed, but then you can not return TC_ACT_REDIRECT from the
>> > bpf program, which effectively makes the earlier *successful*
>> > bpf_redirect() call an utter no-op.
>> > 
>> > (bpf_redirect() just determines what a future return TC_ACT_REDIRECT
>> > will do)
>> > 
>> > so if you bpf_redirect to interface with larger mtu, then increase
>> > packet size,  
>> 
>> why would you redirect then touch the packet afterwards ? 
>> if you have a bad program, then it is a user issue.
>> 
>> > then return TC_ACT_OK, then you potentially end up with excessively
>> > large packet egressing through original interface (with small mtu).
>> > 
>
> This is a good point.  As bpf_skb_adjust_room() can just be run after
> bpf_redirect() call, then a MTU check in bpf_redirect() actually
> doesn't make much sense.  As clever/bad BPF program can then avoid the
> MTU check anyhow.  This basically means that we have to do the MTU
> check (again) on kernel side anyhow to catch such clever/bad BPF
> programs.  (And I don't like wasting cycles on doing the same check two
> times).
>
> If we do the MTU check on the kernel side, then there are no feedback
> to the program, and how are end-users going to debug this?

The same way any other MTU-related error is seen? Isn't there a counter
or something? Presumably (since this is in the skb path) it would also
be caught by drop_monitor?

-Toke


  reply	other threads:[~2020-09-18 10:34 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17 12:38 BPF redirect API design issue for BPF-prog MTU feedback? Jesper Dangaard Brouer
2020-09-17 12:54 ` Maciej Żenczykowski
2020-09-17 19:11   ` Saeed Mahameed
2020-09-18 10:00     ` Jesper Dangaard Brouer
2020-09-18 10:34       ` Toke Høiland-Jørgensen [this message]
2020-09-18 23:06       ` Maciej Żenczykowski
2020-09-21 10:37         ` Lorenz Bauer
2020-09-21 12:49           ` Jesper Dangaard Brouer
2020-09-21 15:08             ` Daniel Borkmann
2020-09-21 16:21               ` Marek Zavodsky
2020-09-21 21:17                 ` Willem de Bruijn
2020-09-22  9:15                   ` Jesper Dangaard Brouer
2020-09-21 16:26               ` Jesper Dangaard Brouer
2020-09-22  6:56                 ` Eyal Birger
2020-09-21 18:04               ` John Fastabend
2020-10-06 11:45                 ` Jesper Dangaard Brouer
2020-09-21 10:42   ` Lorenz Bauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ft7ffk67.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=marek@cloudflare.com \
    --cc=maze@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeed@kernel.org \
    --cc=shaun@tigera.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.