All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	BPF-dev-list <bpf@vger.kernel.org>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	William Tu <u9012063@gmail.com>,
	XDP-hints working-group <xdp-hints@xdp-project.net>,
	brouer@redhat.com
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Mon, 31 May 2021 15:17:48 +0200	[thread overview]
Message-ID: <20210531151748.39fc5aa6@carbon> (raw)
In-Reply-To: <8735u3dv2l.fsf@toke.dk>

On Mon, 31 May 2021 13:03:14 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> John Fastabend <john.fastabend@gmail.com> writes:
> 
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> >  struct meta {
> >    __u32 f1;
> >    __u32 f2;
> >    __u32 f3;
> >    __u32 f4;
> >    __u32 f5;
> >  }
> >
> > I'm still confused why the meta data would change just because the feature
> > is enabled or disabled. I've written drivers before and I don't want to
> > move around where I write f1 based on some combination of features f2,f3,f4,f5

Just be be clear:  I'm not suggesting drivers need to dynamically write
at different offsets.  Adding this BTF-flexibility make is possible,
but in practice the driver need to be smart about this.  I think we all
understand this would kill performance and create too complex drivers.

Drivers and hardware have a natural need to place info at fixed offsets.
The hole exercise is to create a layer between drivers and BPF+netstack
via BTF that can express flexibility.  That BTF can express this
flexibility doesn't mean that the drivers/hardware all of a sudden need
to be as flexible.

Drivers will continue to place info at fixed offsets (likely make sense
to have a big struct with unions and everything). I'm suggesting that
the driver will tell the "flex-layer" via BTF which-members-are-what by
setting a BTF-ID that "describe" this.
(Hint, drivers have knowledge like what HW features bits are enabled
that can be translated into a given BTF-ID.  Potentially drivers can
update this BTF-ID when setup events via ethtool occurs)

To avoid the combinations explode, the driver can choose to limit these
via e.g. always include the vlan_id but set it to zero even when
vlan-offloads are turned off.  Drivers can also *choose* to have a
single and very advanced BTF-layout with bitfields and everything (as
the BTF-side is superflexible and support this).  Again the drivers
should be moderate and not implement a combination explosion of BTF-IDs
just because BTF allowed this high level of flexibility.


> > state of enabled/disabled. If features are mutual exclusive I can build a
> > sensible union. If its possible for all fields to enabled then I just lay
> > them out like above.  
> 
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
> 
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?
> 
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
> 
> - It requires the system admin to coordinate device config with all of
>   their installed XDP applications. This is error-prone, especially as
>   the number of applications grows (say if different containers have
>   different XDP programs installed on their virtual devices).
> 
> - It has synchronisation issues. Say I have an XDP program with optional
>   support for hardware timestamps and a software fallback. It gets
>   installed in software fallback mode; then the admin has to make sure
>   to enable the hardware timestamps before switching the application
>   into the mode where it will read that metadata field (and the opposite
>   order when disabling the hardware mode).

IMHO this synchronization issue is problematic.  E.g. when turning off
HW-timestamping, userspace BPF-application have to be quick, as it need
to disable BPF-prog global-variable BEFORE hardware stops setting
HW-TS, else BPF-prog will think HW-TS is on and read garbage. (There is
a similar issue for in-flight packets when turning this on).

Today enable/disable of HW-TS happens via a socket API. Do you imagine
the BPF-prog need to catch these events (turning HW-TS off) and then
update the BPF-prog global-variable?
 
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program
> running on a veth device inside a container above: if this gets
> packets redirected into it from different NICs with different
> layouts, it needs to be able to figure out which packet came from
> where.
> 
> With this in mind I think we have to encode the metadata format into
> the packet metadata itself somehow. This could just be a matter of
> including the BTF ID as part of the struct itself, so that your
> example could essentially do:
> 
>   if (data->meta_btf_id == timestamp_id) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }
> 
> 
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
> 
>  struct normal_meta {
>   u32 rxhash;
>   u32 padding;
>   u8 vlan;
>  };
> 
> and
> 
>  struct timestamp_meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

This aligns well with my above suggestion to name a member differently
like "padding" in above.

Another way to "remove" members is the change the metadata size.
This way the BPF program cannot access it.  Notice, that is why I had
my timestamp info in the top of the struct in my example.  The driver
code is of-cause simply written such that the offsets are not dynamic (I
hope this is also clear to others, else feel free to poke me to explain
better...).

> This still gets us exponential growth in the number of metadata
> structs, but at least we could create tooling to auto-generate the
> BTF for the variants so the complexity is reduced to just consuming a
> lot of BTF IDs.
> 
> Alternatively we could have an explicit bitmap of valid fields, like:
> 
>  struct all_meta {
>    u32 _valid_field_bitmap;
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };
> 
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to
> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having
> lots of different variants with separate BTF IDs. I'm not sure what
> it would take to teach CO-RE to support such a scheme, though...
> 
> WDYT?

Keeping above intact, as I (also) want to hear what John thinks.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  reply	other threads:[~2021-05-31 13:19 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 10:58 XDP-hints: Howto support multiple BTF types per packet basis? Jesper Dangaard Brouer
2021-05-26 19:12 ` Andrii Nakryiko
2021-05-26 20:20   ` Jesper Dangaard Brouer
2021-05-26 20:37     ` John Fastabend
2021-05-26 22:39     ` Andrii Nakryiko
2021-05-28 11:16       ` Jesper Dangaard Brouer
2021-05-30  3:24         ` Andrii Nakryiko
2021-05-26 20:31   ` John Fastabend
2021-05-26 22:54     ` Andrii Nakryiko
2021-05-27  0:44       ` John Fastabend
2021-05-27 17:44         ` Andrii Nakryiko
2021-05-28  5:48           ` John Fastabend
2021-05-28  9:16             ` Toke Høiland-Jørgensen
2021-05-28 10:38               ` Alexander Lobakin
2021-05-28 14:35               ` John Fastabend
2021-05-28 15:33                 ` Toke Høiland-Jørgensen
2021-05-28 16:02                 ` Jesper Dangaard Brouer
2021-05-28 17:29                   ` John Fastabend
2021-05-30  3:27                     ` Andrii Nakryiko
2021-05-31 11:03                     ` Toke Høiland-Jørgensen
2021-05-31 13:17                       ` Jesper Dangaard Brouer [this message]
2021-06-02  0:22                       ` John Fastabend
2021-06-02 16:18                         ` Jakub Kicinski
2021-06-22  7:44                           ` Michal Swiatkowski
2021-06-22 11:53                             ` Toke Høiland-Jørgensen
2021-06-23  8:32                               ` Michal Swiatkowski
2021-06-24 12:23                                 ` Toke Høiland-Jørgensen
2021-06-24 13:07                                   ` Magnus Karlsson
2021-06-24 14:58                                     ` Alexei Starovoitov
2021-06-24 15:11                                   ` Zvi Effron
2021-06-24 16:04                                     ` Toke Høiland-Jørgensen
2021-06-24 16:32                                       ` Zvi Effron
2021-06-24 16:45                                       ` Jesper Dangaard Brouer
2021-07-08  8:32                                   ` Michal Swiatkowski
2021-07-09 10:57                                     ` Toke Høiland-Jørgensen
2021-09-02  2:49                                       ` Michal Swiatkowski
2021-09-02  9:17                                         ` Jesper Dangaard Brouer
2021-09-07  6:27                                           ` Michal Swiatkowski
2021-09-08 13:28                                             ` Jesper Dangaard Brouer
2021-09-09 18:19                                               ` Andrii Nakryiko
2021-09-10 11:16                                                 ` Jesper Dangaard Brouer
2021-07-28  9:54 ` XDP-hints: how to inform driver about hints Michal Swiatkowski
2021-07-28 18:40   ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210531151748.39fc5aa6@carbon \
    --to=brouer@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=toke@redhat.com \
    --cc=u9012063@gmail.com \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.