From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Jesper Dangaard Brouer <brouer@redhat.com>
Cc: BPF-dev-list <bpf@vger.kernel.org>,
"Alexander Lobakin" <alexandr.lobakin@intel.com>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Magnus Karlsson" <magnus.karlsson@gmail.com>,
"David Ahern" <dsahern@kernel.org>,
"Björn Töpel" <bjorn@kernel.org>,
"Saeed Mahameed" <saeed@kernel.org>,
"kurt@linutronix.de" <kurt@linutronix.de>,
"Raczynski, Piotr" <piotr.raczynski@intel.com>,
"Zhang, Jessica" <jessica.zhang@intel.com>,
"Maloor, Kishen" <kishen.maloor@intel.com>,
"Gomes, Vinicius" <vinicius.gomes@intel.com>,
"Brandeburg, Jesse" <jesse.brandeburg@intel.com>,
"Swiatkowski, Michal" <michal.swiatkowski@intel.com>,
"Plantykow, Marta A" <marta.a.plantykow@intel.com>,
"Desouza, Ederson" <ederson.desouza@intel.com>,
"Song, Yoong Siang" <yoong.siang.song@intel.com>,
"Czapnik, Lukasz" <lukasz.czapnik@intel.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Joseph, Jithu" <jithu.joseph@intel.com>,
"William Tu" <u9012063@gmail.com>,
"Ong Boon Leong" <boon.leong.ong@intel.com>
Subject: Re: XDP-hints: Howto support multiple BTF types per packet basis?
Date: Wed, 26 May 2021 13:31:26 -0700 [thread overview]
Message-ID: <60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <CAEf4BzYXUDyQaBjZmb_Q5-z3jw1-Uvdgxm+cfcQjSwb9oRoXnQ@mail.gmail.com>
Andrii Nakryiko wrote:
> On Wed, May 26, 2021 at 3:59 AM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > Hi All,
> >
> > I see a need for a driver to use different XDP metadata layout on a per
> > packet basis. E.g. PTP packets contains a hardware timestamp. E.g. VLAN
> > offloading and associated metadata as only relevant for packets using
> > VLANs. (Reserving room for every possible HW-hint is against the idea
> > of BTF).
> >
> > The question is how to support multiple BTF types on per packet basis?
> > (I need input from BTF experts, to tell me if I'm going in the wrong
> > direction with below ideas).
>
> I'm trying to follow all three threads, but still, can someone dumb it
Thanks ;)
> down for me and use few very specific examples to show how all this is
> supposed to work end-to-end. I.e., how the C definition for those
> custom BTF layouts might look like and how they are used in BPF
> programs, etc. I'm struggling to put all the pieces together, even
> ignoring all the netdev-specific configuration questions.
Best to start with the simplest possible usable thing and get more
complex over time.
For a C definition I would expect drivers to do something like this,
struct mynic_rx_descriptor {
__u64 len;
__u64 head;
__u64 tail;
__u64 foobar;
}
struct mynic_metadata {
__u64 timestamp;
__u64 hash;
__u64 pkt_type;
struct mynic_rx_descriptor *ptr_to_rx;
/* other things */
}
It doesn't really matter how the driver folks generate their metadata
though. They might use some non-C thing that is more natural for
writing parser/action/tcam codes.
Anyways given some C block like above we generate BTF from above
using normal method, quick hack just `pahole -J` the thing. Now we
have a BTF file.
Next up write some XDP program to do something with it,
void myxdp_prog(struct xdp_md *ctx) {
struct mynic_metadata m = (struct mynic_metadata *)ctx->data_meta;
// now I can get data using normal CO-RE
// I usually have this _(&) to put CO-RE attributes in I
// believe that is standard? Or use the other macros
__u64 pkt_type = _(&m->pkt_type)
// we can even walk into structs if we have probe read
// around.
struct mynic_rx_descriptor *rxdesc = _(&m->ptr_to_rx)
// now do whatever I like with above metadata
}
Run above program through normal CO-RE pass and as long as it has
access to the BTF from above it will work. I have some logic
sitting around to stitch two BTF blocks together but we have
that now done properly for linking.
probe_read from XDP should be added regardless of above. I've
found it super handy in skmsg programs to dig out kernel info
inline. With probe_read we can also start to walk net_device
struct for more detailed info as needed. Or into sock structs
for process level conntrack (other thread). Even without
probe_read above would be useful but fields would need to fit
into the metadata where we know we can read/write data.
Having drivers export their BTF over a /sys/fs/ interface
so that BTF can change with fimware/parser updates is possible
as well, but I would want to see above working in real world
before committing to a /sys/fs interface. Anyways the
interface is just a convienence.
>
> As for BTF on a per-packet basis. This means that BTF itself is not
> known at the BPF program verification time, so there will be some sort
> of if/else if/else conditions to handle all recognized BTF IDs? Is
> that right? Fake but specific code would help (at least me) to
> actually join the discussion. Thanks.
I don't think we actually want per-packet data that sounds a bit
clumsy for me. Lets use a union and define it so that we have a
single BTF.
struct mynic_metadata {
__u64 pkt_type
union {
struct ipv6_meta meta;
struct ipv4_meta meta;
struct arp_meta meta;
}
};
Then program has to swivel on pkt_type but that is most natural
C thing to do IMO.
Honestly we have about 90% of the necessary bits to do this now.
Typed that up a bit fast hope its legible. Got a lot going on today.
Andrii, make sense?
Thanks,
John
>
> >
> > Let me describe a possible/proposed packet flow (feel free to disagree):
> >
> > When driver RX e.g. a PTP packet it knows HW is configured for PTP-TS and
> > when it sees a TS is available, then it chooses a code path that use the
> > BTF layout that contains RX-TS. To communicate what BTF-type the
> > XDP-metadata contains, it simply store the BTF-ID in xdp_buff->btf_id.
> >
> > When redirecting the xdp_buff is converted to xdp_frame, and also contains
> > the btf_id member. When converting xdp_frame to SKB, then netcore-code
> > checks if this BTF-ID have been registered, if so there is a (callback or
> > BPF-hook) registered to handle this BTF-type that transfer the fields from
> > XDP-metadata area into SKB fields.
> >
> > The XDP-prog also have access to this ctx->btf_id and can multiplex on
> > this in the BPF-code itself. Or use other methods like parsing PTP packet
> > and extract TS as expected BTF offset in XDP metadata (perhaps add a
> > sanity check if metadata-size match).
> >
> >
> > I talked to AF_XDP people (Magnus, Bjørn and William) about this idea,
> > and they pointed out that AF_XDP also need to know what BTF-layout is
> > used. As Magnus wrote in other thread; there is only 32-bit left in
> > AF_XDP descriptor option. We could store the BTF-ID in this field, but
> > it would block for other use-cases. Bjørn came up with the idea of
> > storing the BTF-ID in the BTF-layout itself, but as the last-member (to
> > have fixed offset to check in userspace AF_XDP program). Then we only
> > need to use a single bit in AF_XDP descriptor option to say
> > XDP-metadata is BTF described.
> >
> > In the AF_XDP userspace program, the programmers can have a similar
> > callback system per known BTF-ID. This way they can compile efficient
> > code per ID via requesting the BTF layout from the kernel. (Hint:
> > `bpftool btf dump id 42 format c`).
> >
> > Please let me know if this it the right or wrong direction?
> >
> > --
> > Best regards,
> > Jesper Dangaard Brouer
> > MSc.CS, Principal Kernel Engineer at Red Hat
> > LinkedIn: http://www.linkedin.com/in/brouer
> >
next prev parent reply other threads:[~2021-05-26 20:31 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-26 10:58 XDP-hints: Howto support multiple BTF types per packet basis? Jesper Dangaard Brouer
2021-05-26 19:12 ` Andrii Nakryiko
2021-05-26 20:20 ` Jesper Dangaard Brouer
2021-05-26 20:37 ` John Fastabend
2021-05-26 22:39 ` Andrii Nakryiko
2021-05-28 11:16 ` Jesper Dangaard Brouer
2021-05-30 3:24 ` Andrii Nakryiko
2021-05-26 20:31 ` John Fastabend [this message]
2021-05-26 22:54 ` Andrii Nakryiko
2021-05-27 0:44 ` John Fastabend
2021-05-27 17:44 ` Andrii Nakryiko
2021-05-28 5:48 ` John Fastabend
2021-05-28 9:16 ` Toke Høiland-Jørgensen
2021-05-28 10:38 ` Alexander Lobakin
2021-05-28 14:35 ` John Fastabend
2021-05-28 15:33 ` Toke Høiland-Jørgensen
2021-05-28 16:02 ` Jesper Dangaard Brouer
2021-05-28 17:29 ` John Fastabend
2021-05-30 3:27 ` Andrii Nakryiko
2021-05-31 11:03 ` Toke Høiland-Jørgensen
2021-05-31 13:17 ` Jesper Dangaard Brouer
2021-06-02 0:22 ` John Fastabend
2021-06-02 16:18 ` Jakub Kicinski
2021-06-22 7:44 ` Michal Swiatkowski
2021-06-22 11:53 ` Toke Høiland-Jørgensen
2021-06-23 8:32 ` Michal Swiatkowski
2021-06-24 12:23 ` Toke Høiland-Jørgensen
2021-06-24 13:07 ` Magnus Karlsson
2021-06-24 14:58 ` Alexei Starovoitov
2021-06-24 15:11 ` Zvi Effron
2021-06-24 16:04 ` Toke Høiland-Jørgensen
2021-06-24 16:32 ` Zvi Effron
2021-06-24 16:45 ` Jesper Dangaard Brouer
2021-07-08 8:32 ` Michal Swiatkowski
2021-07-09 10:57 ` Toke Høiland-Jørgensen
2021-09-02 2:49 ` Michal Swiatkowski
2021-09-02 9:17 ` Jesper Dangaard Brouer
2021-09-07 6:27 ` Michal Swiatkowski
2021-09-08 13:28 ` Jesper Dangaard Brouer
2021-09-09 18:19 ` Andrii Nakryiko
2021-09-10 11:16 ` Jesper Dangaard Brouer
2021-07-28 9:54 ` XDP-hints: how to inform driver about hints Michal Swiatkowski
2021-07-28 18:40 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=60aeb01ebcd10_fe49208b8@john-XPS-13-9370.notmuch \
--to=john.fastabend@gmail.com \
--cc=alexandr.lobakin@intel.com \
--cc=andrii.nakryiko@gmail.com \
--cc=bjorn@kernel.org \
--cc=boon.leong.ong@intel.com \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=dsahern@kernel.org \
--cc=ederson.desouza@intel.com \
--cc=jesse.brandeburg@intel.com \
--cc=jessica.zhang@intel.com \
--cc=jithu.joseph@intel.com \
--cc=kishen.maloor@intel.com \
--cc=kurt@linutronix.de \
--cc=lukasz.czapnik@intel.com \
--cc=magnus.karlsson@gmail.com \
--cc=magnus.karlsson@intel.com \
--cc=marta.a.plantykow@intel.com \
--cc=michal.swiatkowski@intel.com \
--cc=piotr.raczynski@intel.com \
--cc=saeed@kernel.org \
--cc=u9012063@gmail.com \
--cc=vinicius.gomes@intel.com \
--cc=yoong.siang.song@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).