All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: "Jesper Dangaard Brouer" <brouer@redhat.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	"Alexander Lobakin" <alexandr.lobakin@intel.com>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>,
	"Magnus Karlsson" <magnus.karlsson@gmail.com>,
	"David Ahern" <dsahern@kernel.org>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Saeed Mahameed" <saeed@kernel.org>,
	"kurt@linutronix.de" <kurt@linutronix.de>,
	"Raczynski, Piotr" <piotr.raczynski@intel.com>,
	"Zhang, Jessica" <jessica.zhang@intel.com>,
	"Maloor, Kishen" <kishen.maloor@intel.com>,
	"Gomes, Vinicius" <vinicius.gomes@intel.com>,
	"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
	"Swiatkowski, Michal" <michal.swiatkowski@intel.com>,
	"Plantykow, Marta A" <marta.a.plantykow@intel.com>,
	"Desouza, Ederson" <ederson.desouza@intel.com>,
	"Song, Yoong Siang" <yoong.siang.song@intel.com>,
	"Czapnik, Lukasz" <lukasz.czapnik@intel.com>,
	"Joseph, Jithu" <jithu.joseph@intel.com>,
	"William Tu" <u9012063@gmail.com>,
	"Ong Boon Leong" <boon.leong.ong@intel.com>,
	xdp-hints@xdp-project.net
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Fri, 28 May 2021 17:33:06 +0200	[thread overview]
Message-ID: <87o8cug9fx.fsf@toke.dk> (raw)
In-Reply-To: <60b0ffb63a21a_1cf82089e@john-XPS-13-9370.notmuch>

John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> John Fastabend <john.fastabend@gmail.com> writes:
>> 
>> >> > > union and independent set of BTFs are two different things, I'll let
>> >> > > you guys figure out which one you need, but I replied how it could
>> >> > > look like in CO-RE world
>> >> >
>> >> > I think a union is sufficient and more aligned with how the
>> >> > hardware would actually work.
>> >> 
>> >> Sure. And I think those are two orthogonal concerns. You can start
>> >> with a single struct mynic_metadata with union inside it, and later
>> >> add the ability to swap mynic_metadata with another
>> >> mynic_metadata___v2 that will have a similar union but with a
>> >> different layout.
>> >
>> > Right and then you just have normal upgrade/downgrade problems with
>> > any struct.
>> >
>> > Seems like a workable path to me. But, need to circle back to the
>> > what we want to do with it part that Jesper replied to.
>> 
>> So while this seems to be a viable path for getting libbpf to do all the
>> relocations (and thanks for hashing that out, I did not have a good grip
>> of the details), doing it all in userspace means that there is no way
>> for the XDP program to react to changes once it has been loaded. So this
>> leaves us with a selection of non-very-attractive options, IMO. I.e.,
>> we would have to:
>
> I don't really understand what this means 'having XDP program to
> react to changes once it has been loaded.' What would a program look
> like thats dynamic? You can always version your metadata and
> write programs like this,
>
>   if (meta->version == VERSION1) {do_foo}
>   else {do_bar}
>
> And then have a headeer,
>
>    struct meta {
>      int version;
>      union ...    // union of versions
>    }
>
> I fail to see how a program could 'react' dynamically. An agent could
> load new programs dynamically into tail call maps of fentry with
> the need handlers, which would work as well and avoid unions.

By "react" I meant "not break", as in the program should still be able
to parse the metadata even if config changes. See below:

>> 
>> - have to block any modifications to the hardware config that would
>>   change the metadata format; this will probably result in irate users
>
> I'll need a concrete example if I swap out my parser block, I should
> also swap out my BPF for my shiny new protocol. I don't see how a
> user might write programs for things they've not configured hardware
> for yet. Leaving aside knobs like VLAN on/off, VXLAN on/off, and
> such which brings the next point.
>
>> 
>> - require XDP programs to deal with all possible metadata permutations
>>   supported by that driver (by exporting them all via a BTF union or
>>   similar); this means a potential for combinatorial explosion of config
>>   options and as NICs become programmable themselves I'm not even sure
>>   if it's possible for the driver to know ahead of time
>
> I don't see the problem sorry. For current things that exist I can't
> think up too many fields vlan, timestamp, checksum(?), pkt_type,
> hash maybe.

Even with five fields (assuming they can be individually toggled),
that's 32 different metadata formats. Add two more and we're at 128.
That's what I meant with "combinatorial explosion" (although I suppose
it's only exponential, not combinatorial if we fix the order of the
fields). I suppose it may be that you're right and that in practice the
number of fields is small enough that it's manageable, but right off the
bat it seems like a pretty limiting design to me.

> For programmable pipelines (P4) then I don't see a problem with
> reloading your program or swapping out a program. I don't see the
> value of adding a new protocol for example dynamically. Surely the
> hardware is going to get hit with a big reset anyways.

Hmm, okay, I do buy that completely reprogramming the NIC is probably
not something that is done as dynamically as toggling existing feature
bits, so maybe that is not such a huge concern...

>> - throw up our hands and just let the user deal with it (i.e., to
>>   nothing and so require XDP programs to be reloaded if the NIC config
>>   changes); this is not very friendly and is likely to lead to subtle
>>   bugs if an XDP program parses the metadata assuming it is in a
>>   different format than it is
>
> I'm not opposed to user error causing logic bugs.  If I give
> users power to reprogram their NICs they should be capabable
> of managing a few BPF programs. And if not then its a space
> where a distro/vendor should help them with tooling.
>
>> 
>> Given that hardware config changes are not just done by ethtool, but
>> also by things like running `tcpdump -j`, I really think we have to
>> assume that they can be quite dynamic; which IMO means we have to solve
>> this as part of the initial design. And I have a hard time seeing how
>> this is possible without involving the kernel somehow.
>
> I guess here your talking about building an skb? Wouldn't it
> use whatever logic it uses today to include the timestamp.
> This is a bit of an aside from metadata in the BPF program.

Building skbs is a separate concern, yeah, but that was not actually
what I meant here. Say I install an XDP program that reads metadata
like (after CO-RE rewriting):

struct meta {
  u32 rxhash;
  u8 vlan;
};

and that is merrily running and doing its thing, but then someone runs
`tcpdump -j`, causing the NIC to turn on hardware timestamping, thus
changing the effective metadata layout to:

struct meta {
  u32 rxhash;
  u32 timestamp;
  u8 vlan;
};

suddenly my XDP program will be reading garbage without knowing it, even
though it's not interested in the timestamp at all.

>> Unless I'm missing something? WDYT?
>
> Distilling above down. I think we disagree on how useful
> dynamic programs are because of two reasons. First I don't
> see a large list of common attributes that would make the
> union approach as painful as you fear.

See above; but I wouldn't actually mind being proven wrong here, I'm
just worried that we end up setting something in stone ABI-wise so we
can't change it later should there end up being a need for it.

> And two, I believe users who are touching core hardware firmware need
> to also be smart enough (or have smart tools) to swap out their BPF
> programs in the correct order so as to not create subtle races. I
> didn't do it here but if we agree walking through that program swap
> flow with firmware update would be useful.

Sure, I do think this would be useful; I only have a very fuzzy idea how
this is likely to work. But I think we may also differ in the assumption
of who controls the XDP programs: I very much view it as in scope for a
system to be able to run different XDP programs from different
applications without any other point of coordination than what the
kernel and libbpf/libxdp APIs offer. So if application A needs to
reprogram the hardware, how does application B's XDP program get
re-loaded so it can get its CO-RE relocations re-applied with the new
BTF format?

-Toke


  reply	other threads:[~2021-05-28 15:33 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 10:58 XDP-hints: Howto support multiple BTF types per packet basis? Jesper Dangaard Brouer
2021-05-26 19:12 ` Andrii Nakryiko
2021-05-26 20:20   ` Jesper Dangaard Brouer
2021-05-26 20:37     ` John Fastabend
2021-05-26 22:39     ` Andrii Nakryiko
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko
2021-05-26 20:31   ` John Fastabend
2021-05-26 22:54     ` Andrii Nakryiko
2021-05-27  0:44       ` John Fastabend
2021-05-27 17:44         ` Andrii Nakryiko
2021-05-28  5:48           ` John Fastabend
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen [this message]
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
2021-07-28  9:54 ` XDP-hints: how to inform driver about hints Michal Swiatkowski
2021-07-28 18:40   ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8cug9fx.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexandr.lobakin@intel.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bjorn@kernel.org \
    --cc=boon.leong.ong@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=dsahern@kernel.org \
    --cc=ederson.desouza@intel.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jessica.zhang@intel.com \
    --cc=jithu.joseph@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=kishen.maloor@intel.com \
    --cc=kurt@linutronix.de \
    --cc=lukasz.czapnik@intel.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=magnus.karlsson@intel.com \
    --cc=marta.a.plantykow@intel.com \
    --cc=michal.swiatkowski@intel.com \
    --cc=piotr.raczynski@intel.com \
    --cc=saeed@kernel.org \
    --cc=u9012063@gmail.com \
    --cc=vinicius.gomes@intel.com \
    --cc=xdp-hints@xdp-project.net \
    --cc=yoong.siang.song@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.