public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed
diff for duplicates of <87a69o94wz.fsf@toke.dk>

diff --git a/a/1.txt b/N1/1.txt
index 1ca3ba4..e4aa316 100644
--- a/a/1.txt
+++ b/N1/1.txt
@@ -1,142 +1,191 @@
-Alexander Lobakin <alexandr.lobakin@intel.com> writes:
+From: Toke Høiland-Jørgensen <toke@redhat.com>
+Date: Mon, 04 Jul 2022 19:14:04 +0200
 
-> From: Toke H??iland-J??rgensen <toke@redhat.com>
-> Date: Wed, 29 Jun 2022 15:43:05 +0200
->
->> John Fastabend <john.fastabend@gmail.com> writes:
->> 
->> > Alexander Lobakin wrote:
->> >> This RFC is to give the whole picture. It will most likely be split
->> >> onto several series, maybe even merge cycles. See the "table of
->> >> contents" below.
->> >
->> > Even for RFC its a bit much. Probably improve the summary
->> > message here as well I'm still not clear on the overall
->> > architecture so not sure I want to dig into patches.
->> 
->> +1 on this, and piggybacking on your comment to chime in on the general
->> architecture.
->> 
->> >> Now, a NIC driver, or even a SmartNIC itself, can put those params
->> >> there in a well-defined format. The format is fixed, but can be of
->> >> several different types represented by structures, which definitions
->> >> are available to the kernel, BPF programs and the userland.
->> >
->> > I don't think in general the format needs to be fixed.
->> 
->> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll
->> use CO-RE to enable dynamic formats...
->> 
->> [...]
->> 
->> >> It is fixed due to it being almost a UAPI, and the exact format can
->> >> be determined by reading the last 10 bytes of metadata. They contain
->> >> a 2-byte magic ID to not confuse it with a non-compatible meta and
->> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this
->> >> structure is defined and the ID of that definition inside that BTF.
->> >> Users can obtain BTF IDs by structure types using helpers available
->> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland
->> >> (libbpf -> kernel call) and then rely on those ID when reading data
->> >> to make sure whether they support it and what to do with it.
->> >> Why separate magic and ID? The idea is to make different formats
->> >> always contain the basic/"generic" structure embedded at the end.
->> >> This way we can still benefit in purely generic consumers (like
->> >> cpumap) while providing some "extra" data to those who support it.
->> >
->> > I don't follow this. If you have a struct in your driver name it
->> > something obvious, ice_xdp_metadata. If I understand things
->> > correctly just dump the BTF for the driver, extract the
->> > struct and done you can use CO-RE reads. For the 'fixed' case
->> > this looks easy. And I don't think you even need a patch for this.
->> 
->> ...however as we've discussed previously, we do need a bit of
->> infrastructure around this. In particular, we need to embed the embed
->> the BTF ID into the metadata itself so BPF can do runtime disambiguation
->> between different formats (and add the right CO-RE primitives to make
->> this easy). This is for two reasons:
->> 
->> - The metadata might be different per-packet (e.g., PTP packets with
->>   timestamps interleaved with bulk data without them)
->> 
->> - With redirects we may end up processing packets from different devices
->>   in a single XDP program (in devmap or cpumap, or on a veth) so we need
->>   to be able to disambiguate at runtime.
->> 
->> So I think the part of the design that puts the BTF ID into the end of
->> the metadata struct is sound; however, the actual format doesn't have to
->> be fixed, we can use CO-RE to pick out the bits that a given BPF program
->> needs; we just need a convention for how drivers report which format(s)
->> they support. Which we should also agree on (and add core infrastructure
->> around) so each driver doesn't go around inventing their own
->> conventions.
->> 
->> >> The enablement of this feature is controlled on attaching/replacing
->> >> XDP program on an interface with two new parameters: that combined
->> >> BTF+type ID and metadata threshold.
->> >> The threshold specifies the minimum frame size which a driver (or
->> >> NIC) should start composing metadata from. It is introduced instead
->> >> of just false/true flag due to that often it's not worth it to spend
->> >> cycles to fetch all that data for such small frames: let's say, it
->> >> can be even faster to just calculate checksums for them on CPU
->> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses
->> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help
->> >> mitigate that.
->> >
->> > I would put this in the bonus category. Can you do the simple thing
->> > above without these extra bits and then add them later. Just
->> > pick some overly conservative threshold to start with.
->> 
->> Yeah, I'd agree this kind of configuration is something that can be
->> added later, and also it's sort of orthogonal to the consumption of the
->> metadata itself.
->> 
->> Also, tying this configuration into the loading of an XDP program is a
->> terrible interface: these are hardware configuration options, let's just
->> put them into ethtool or 'ip link' like any other piece of device
->> configuration.
->
-> I don't believe it fits there, especially Ethtool. Ethtool is for
-> hardware configuration, XDP/AF_XDP is 95% software stuff (apart from
-> offload bits which is purely NFP's for now).
+> Alexander Lobakin <alexandr.lobakin@intel.com> writes:
+> 
+> > From: Toke H??iland-J??rgensen <toke@redhat.com>
+> > Date: Wed, 29 Jun 2022 15:43:05 +0200
+> >
+> >> John Fastabend <john.fastabend@gmail.com> writes:
+> >> 
+> >> > Alexander Lobakin wrote:
+> >> >> This RFC is to give the whole picture. It will most likely be split
+> >> >> onto several series, maybe even merge cycles. See the "table of
+> >> >> contents" below.
+> >> >
+> >> > Even for RFC its a bit much. Probably improve the summary
+> >> > message here as well I'm still not clear on the overall
+> >> > architecture so not sure I want to dig into patches.
+> >> 
+> >> +1 on this, and piggybacking on your comment to chime in on the general
+> >> architecture.
+> >> 
+> >> >> Now, a NIC driver, or even a SmartNIC itself, can put those params
+> >> >> there in a well-defined format. The format is fixed, but can be of
+> >> >> several different types represented by structures, which definitions
+> >> >> are available to the kernel, BPF programs and the userland.
+> >> >
+> >> > I don't think in general the format needs to be fixed.
+> >> 
+> >> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll
+> >> use CO-RE to enable dynamic formats...
+> >> 
+> >> [...]
+> >> 
+> >> >> It is fixed due to it being almost a UAPI, and the exact format can
+> >> >> be determined by reading the last 10 bytes of metadata. They contain
+> >> >> a 2-byte magic ID to not confuse it with a non-compatible meta and
+> >> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this
+> >> >> structure is defined and the ID of that definition inside that BTF.
+> >> >> Users can obtain BTF IDs by structure types using helpers available
+> >> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland
+> >> >> (libbpf -> kernel call) and then rely on those ID when reading data
+> >> >> to make sure whether they support it and what to do with it.
+> >> >> Why separate magic and ID? The idea is to make different formats
+> >> >> always contain the basic/"generic" structure embedded at the end.
+> >> >> This way we can still benefit in purely generic consumers (like
+> >> >> cpumap) while providing some "extra" data to those who support it.
+> >> >
+> >> > I don't follow this. If you have a struct in your driver name it
+> >> > something obvious, ice_xdp_metadata. If I understand things
+> >> > correctly just dump the BTF for the driver, extract the
+> >> > struct and done you can use CO-RE reads. For the 'fixed' case
+> >> > this looks easy. And I don't think you even need a patch for this.
+> >> 
+> >> ...however as we've discussed previously, we do need a bit of
+> >> infrastructure around this. In particular, we need to embed the embed
+> >> the BTF ID into the metadata itself so BPF can do runtime disambiguation
+> >> between different formats (and add the right CO-RE primitives to make
+> >> this easy). This is for two reasons:
+> >> 
+> >> - The metadata might be different per-packet (e.g., PTP packets with
+> >>   timestamps interleaved with bulk data without them)
+> >> 
+> >> - With redirects we may end up processing packets from different devices
+> >>   in a single XDP program (in devmap or cpumap, or on a veth) so we need
+> >>   to be able to disambiguate at runtime.
+> >> 
+> >> So I think the part of the design that puts the BTF ID into the end of
+> >> the metadata struct is sound; however, the actual format doesn't have to
+> >> be fixed, we can use CO-RE to pick out the bits that a given BPF program
+> >> needs; we just need a convention for how drivers report which format(s)
+> >> they support. Which we should also agree on (and add core infrastructure
+> >> around) so each driver doesn't go around inventing their own
+> >> conventions.
+> >> 
+> >> >> The enablement of this feature is controlled on attaching/replacing
+> >> >> XDP program on an interface with two new parameters: that combined
+> >> >> BTF+type ID and metadata threshold.
+> >> >> The threshold specifies the minimum frame size which a driver (or
+> >> >> NIC) should start composing metadata from. It is introduced instead
+> >> >> of just false/true flag due to that often it's not worth it to spend
+> >> >> cycles to fetch all that data for such small frames: let's say, it
+> >> >> can be even faster to just calculate checksums for them on CPU
+> >> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses
+> >> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help
+> >> >> mitigate that.
+> >> >
+> >> > I would put this in the bonus category. Can you do the simple thing
+> >> > above without these extra bits and then add them later. Just
+> >> > pick some overly conservative threshold to start with.
+> >> 
+> >> Yeah, I'd agree this kind of configuration is something that can be
+> >> added later, and also it's sort of orthogonal to the consumption of the
+> >> metadata itself.
+> >> 
+> >> Also, tying this configuration into the loading of an XDP program is a
+> >> terrible interface: these are hardware configuration options, let's just
+> >> put them into ethtool or 'ip link' like any other piece of device
+> >> configuration.
+> >
+> > I don't believe it fits there, especially Ethtool. Ethtool is for
+> > hardware configuration, XDP/AF_XDP is 95% software stuff (apart from
+> > offload bits which is purely NFP's for now).
+> 
+> But XDP-hints is about consuming hardware features. When you're
+> configuring which metadata items you want, you're saying "please provide
+> me with these (hardware) features". So ethtool is an excellent place to
+> do that :)
 
-But XDP-hints is about consuming hardware features. When you're
-configuring which metadata items you want, you're saying "please provide
-me with these (hardware) features". So ethtool is an excellent place to
-do that :)
+With Ethtool you configure the hardware, e.g. it won't strip VLAN
+tags if you disable rx-cvlan-stripping. With configuring metadata
+you only tell what you want to see there, don't you?
 
-> I follow that way:
->
-> 1) you pick a program you want to attach;
-> 2) usually they are written for special needs and usecases;
-> 3) so most likely that program will be tied with metadata/driver/etc
->    in some way;
-> 4) so you want to enable Hints of a particular format primarily for
->    this program and usecase, same with threshold and everything
->    else.
->
-> Pls explain how you see it, I might be wrong for sure.
+> 
+> > I follow that way:
+> >
+> > 1) you pick a program you want to attach;
+> > 2) usually they are written for special needs and usecases;
+> > 3) so most likely that program will be tied with metadata/driver/etc
+> >    in some way;
+> > 4) so you want to enable Hints of a particular format primarily for
+> >    this program and usecase, same with threshold and everything
+> >    else.
+> >
+> > Pls explain how you see it, I might be wrong for sure.
+> 
+> As above: XDP hints is about giving XDP programs (and AF_XDP consumers)
+> access to metadata that is not currently available. Tying the lifetime
+> of that hardware configuration (i.e., which information to provide) to
+> the lifetime of an XDP program is not a good interface: for one thing,
+> how will it handle multiple programs? What about when XDP is not used at
 
-As above: XDP hints is about giving XDP programs (and AF_XDP consumers)
-access to metadata that is not currently available. Tying the lifetime
-of that hardware configuration (i.e., which information to provide) to
-the lifetime of an XDP program is not a good interface: for one thing,
-how will it handle multiple programs? What about when XDP is not used at
-all but you still want to configure the same features?
+Multiple progs is stuff I didn't cover, but will do later (as you
+all say to me, "let's start with something simple" :)). Aaaand
+multiple XDP progs (I'm not talking about attaching progs in
+differeng modes) is not a kernel feature, rather a libpf feature,
+so I believe it should be handled there later...
 
-In addition, in every other case where we do dynamic data access (with
-CO-RE) the BPF program is a consumer that modifies itself to access the
-data provided by the kernel. I get that this is harder to achieve for
-AF_XDP, but then let's solve that instead of making a totally
-inconsistent interface for XDP.
+> all but you still want to configure the same features?
 
-I'm as excited as you about the prospect of having totally programmable
-hardware where you can just specify any arbitrary metadata format and
-it'll provide that for you. But that is an orthogonal feature: let's
-start with creating a dynamic interface for consuming the (static)
-hardware features we already have, and then later we can have a separate
-interface for configuring more dynamic hardware features. XDP-hints is
-about adding this consumption feature in a way that's sufficiently
-dynamic that we can do the other (programmable hardware) thing on top
-later...
+What's the point of configuring metadata when there are no progs
+attached? To configure it once and not on every prog attach? I'm
+not saying I don't like it, just want to clarify.
+Maybe I need opinions from some more people, just to have an
+overview of how most of folks see it and would like to configure
+it. 'Cause I heard from at least one of the consumers that
+libpf API is a perfect place for Hints to him :)
 
--Toke
+> 
+> In addition, in every other case where we do dynamic data access (with
+> CO-RE) the BPF program is a consumer that modifies itself to access the
+> data provided by the kernel. I get that this is harder to achieve for
+> AF_XDP, but then let's solve that instead of making a totally
+> inconsistent interface for XDP.
+
+I also see CO-RE more fitting and convenient way to use them, but
+didn't manage to solve two things:
+
+1) AF_XDP programs, so what to do with them? Prepare patches for
+   LLVM to make it able to do CO-RE on AF_XDP program load? Or
+   just hardcode them for particular usecases and NICs? What about
+   "general-purpose" programs?
+   And if hardcode, what's the point then to do Generic Hints at
+   all? Then all it needs is making driver building some meta in
+   front of frames via on-off button and that's it? Why BTF ID in
+   the meta then if consumers will access meta hardcoded (via CO-RE
+   or literally hardcoded, doesn't matter)?
+2) In-kernel metadata consumers? Also do CO-RE? Otherwise, with no
+   generic metadata structure they won't be able to benefit from
+   Hints. But I guess we still need to provide kernel with meta?
+   Or no?
+
+> 
+> I'm as excited as you about the prospect of having totally programmable
+
+But I mostly care about current generation with no programmable
+Hints...
+
+> hardware where you can just specify any arbitrary metadata format and
+> it'll provide that for you. But that is an orthogonal feature: let's
+> start with creating a dynamic interface for consuming the (static)
+> hardware features we already have, and then later we can have a separate
+> interface for configuring more dynamic hardware features. XDP-hints is
+> about adding this consumption feature in a way that's sufficiently
+> dynamic that we can do the other (programmable hardware) thing on top
+> later...
+> 
+> -Toke
+
+Thanks,
+Olek
diff --git a/a/content_digest b/N1/content_digest
index fae8949..5d40daa 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -1,12 +1,9 @@
- "ref\020220628194812.1453059-1-alexandr.lobakin@intel.com\0"
- "ref\062bbedf07f44a_2181420830@john.notmuch\0"
- "ref\087iloja8ly.fsf@toke.dk\0"
- "ref\020220704154440.7567-1-alexandr.lobakin@intel.com\0"
- "From\0Toke H\303\270iland-J\303\270rgensen <toke@redhat.com>\0"
+ "From\0Alexander Lobakin <alexandr.lobakin@intel.com>\0"
  "Subject\0Re: [xdp-hints] Re: [PATCH RFC bpf-next 00/52] bpf, xdp: introduce and use Generic Hints/metadata\0"
- "Date\0Mon, 04 Jul 2022 19:14:04 +0200\0"
- "To\0Alexander Lobakin <alexandr.lobakin@intel.com>\0"
- "Cc\0John Fastabend <john.fastabend@gmail.com>"
+ "Date\0Tue,  5 Jul 2022 17:15:09 +0200\0"
+ "To\0Toke H\303\270iland-J\303\270rgensen <toke@redhat.com>\0"
+ "Cc\0Alexander Lobakin <alexandr.lobakin@intel.com>"
+  John Fastabend <john.fastabend@gmail.com>
   Alexei Starovoitov <ast@kernel.org>
   Daniel Borkmann <daniel@iogearbox.net>
   Andrii Nakryiko <andrii@kernel.org>
@@ -31,147 +28,196 @@
  " xdp-hints@xdp-project.net\0"
  "\00:1\0"
  "b\0"
- "Alexander Lobakin <alexandr.lobakin@intel.com> writes:\n"
+ "From: Toke H\303\270iland-J\303\270rgensen <toke@redhat.com>\n"
+ "Date: Mon, 04 Jul 2022 19:14:04 +0200\n"
  "\n"
- "> From: Toke H??iland-J??rgensen <toke@redhat.com>\n"
- "> Date: Wed, 29 Jun 2022 15:43:05 +0200\n"
- ">\n"
- ">> John Fastabend <john.fastabend@gmail.com> writes:\n"
- ">> \n"
- ">> > Alexander Lobakin wrote:\n"
- ">> >> This RFC is to give the whole picture. It will most likely be split\n"
- ">> >> onto several series, maybe even merge cycles. See the \"table of\n"
- ">> >> contents\" below.\n"
- ">> >\n"
- ">> > Even for RFC its a bit much. Probably improve the summary\n"
- ">> > message here as well I'm still not clear on the overall\n"
- ">> > architecture so not sure I want to dig into patches.\n"
- ">> \n"
- ">> +1 on this, and piggybacking on your comment to chime in on the general\n"
- ">> architecture.\n"
- ">> \n"
- ">> >> Now, a NIC driver, or even a SmartNIC itself, can put those params\n"
- ">> >> there in a well-defined format. The format is fixed, but can be of\n"
- ">> >> several different types represented by structures, which definitions\n"
- ">> >> are available to the kernel, BPF programs and the userland.\n"
- ">> >\n"
- ">> > I don't think in general the format needs to be fixed.\n"
- ">> \n"
- ">> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll\n"
- ">> use CO-RE to enable dynamic formats...\n"
- ">> \n"
- ">> [...]\n"
- ">> \n"
- ">> >> It is fixed due to it being almost a UAPI, and the exact format can\n"
- ">> >> be determined by reading the last 10 bytes of metadata. They contain\n"
- ">> >> a 2-byte magic ID to not confuse it with a non-compatible meta and\n"
- ">> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this\n"
- ">> >> structure is defined and the ID of that definition inside that BTF.\n"
- ">> >> Users can obtain BTF IDs by structure types using helpers available\n"
- ">> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland\n"
- ">> >> (libbpf -> kernel call) and then rely on those ID when reading data\n"
- ">> >> to make sure whether they support it and what to do with it.\n"
- ">> >> Why separate magic and ID? The idea is to make different formats\n"
- ">> >> always contain the basic/\"generic\" structure embedded at the end.\n"
- ">> >> This way we can still benefit in purely generic consumers (like\n"
- ">> >> cpumap) while providing some \"extra\" data to those who support it.\n"
- ">> >\n"
- ">> > I don't follow this. If you have a struct in your driver name it\n"
- ">> > something obvious, ice_xdp_metadata. If I understand things\n"
- ">> > correctly just dump the BTF for the driver, extract the\n"
- ">> > struct and done you can use CO-RE reads. For the 'fixed' case\n"
- ">> > this looks easy. And I don't think you even need a patch for this.\n"
- ">> \n"
- ">> ...however as we've discussed previously, we do need a bit of\n"
- ">> infrastructure around this. In particular, we need to embed the embed\n"
- ">> the BTF ID into the metadata itself so BPF can do runtime disambiguation\n"
- ">> between different formats (and add the right CO-RE primitives to make\n"
- ">> this easy). This is for two reasons:\n"
- ">> \n"
- ">> - The metadata might be different per-packet (e.g., PTP packets with\n"
- ">>   timestamps interleaved with bulk data without them)\n"
- ">> \n"
- ">> - With redirects we may end up processing packets from different devices\n"
- ">>   in a single XDP program (in devmap or cpumap, or on a veth) so we need\n"
- ">>   to be able to disambiguate at runtime.\n"
- ">> \n"
- ">> So I think the part of the design that puts the BTF ID into the end of\n"
- ">> the metadata struct is sound; however, the actual format doesn't have to\n"
- ">> be fixed, we can use CO-RE to pick out the bits that a given BPF program\n"
- ">> needs; we just need a convention for how drivers report which format(s)\n"
- ">> they support. Which we should also agree on (and add core infrastructure\n"
- ">> around) so each driver doesn't go around inventing their own\n"
- ">> conventions.\n"
- ">> \n"
- ">> >> The enablement of this feature is controlled on attaching/replacing\n"
- ">> >> XDP program on an interface with two new parameters: that combined\n"
- ">> >> BTF+type ID and metadata threshold.\n"
- ">> >> The threshold specifies the minimum frame size which a driver (or\n"
- ">> >> NIC) should start composing metadata from. It is introduced instead\n"
- ">> >> of just false/true flag due to that often it's not worth it to spend\n"
- ">> >> cycles to fetch all that data for such small frames: let's say, it\n"
- ">> >> can be even faster to just calculate checksums for them on CPU\n"
- ">> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses\n"
- ">> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help\n"
- ">> >> mitigate that.\n"
- ">> >\n"
- ">> > I would put this in the bonus category. Can you do the simple thing\n"
- ">> > above without these extra bits and then add them later. Just\n"
- ">> > pick some overly conservative threshold to start with.\n"
- ">> \n"
- ">> Yeah, I'd agree this kind of configuration is something that can be\n"
- ">> added later, and also it's sort of orthogonal to the consumption of the\n"
- ">> metadata itself.\n"
- ">> \n"
- ">> Also, tying this configuration into the loading of an XDP program is a\n"
- ">> terrible interface: these are hardware configuration options, let's just\n"
- ">> put them into ethtool or 'ip link' like any other piece of device\n"
- ">> configuration.\n"
- ">\n"
- "> I don't believe it fits there, especially Ethtool. Ethtool is for\n"
- "> hardware configuration, XDP/AF_XDP is 95% software stuff (apart from\n"
- "> offload bits which is purely NFP's for now).\n"
+ "> Alexander Lobakin <alexandr.lobakin@intel.com> writes:\n"
+ "> \n"
+ "> > From: Toke H??iland-J??rgensen <toke@redhat.com>\n"
+ "> > Date: Wed, 29 Jun 2022 15:43:05 +0200\n"
+ "> >\n"
+ "> >> John Fastabend <john.fastabend@gmail.com> writes:\n"
+ "> >> \n"
+ "> >> > Alexander Lobakin wrote:\n"
+ "> >> >> This RFC is to give the whole picture. It will most likely be split\n"
+ "> >> >> onto several series, maybe even merge cycles. See the \"table of\n"
+ "> >> >> contents\" below.\n"
+ "> >> >\n"
+ "> >> > Even for RFC its a bit much. Probably improve the summary\n"
+ "> >> > message here as well I'm still not clear on the overall\n"
+ "> >> > architecture so not sure I want to dig into patches.\n"
+ "> >> \n"
+ "> >> +1 on this, and piggybacking on your comment to chime in on the general\n"
+ "> >> architecture.\n"
+ "> >> \n"
+ "> >> >> Now, a NIC driver, or even a SmartNIC itself, can put those params\n"
+ "> >> >> there in a well-defined format. The format is fixed, but can be of\n"
+ "> >> >> several different types represented by structures, which definitions\n"
+ "> >> >> are available to the kernel, BPF programs and the userland.\n"
+ "> >> >\n"
+ "> >> > I don't think in general the format needs to be fixed.\n"
+ "> >> \n"
+ "> >> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll\n"
+ "> >> use CO-RE to enable dynamic formats...\n"
+ "> >> \n"
+ "> >> [...]\n"
+ "> >> \n"
+ "> >> >> It is fixed due to it being almost a UAPI, and the exact format can\n"
+ "> >> >> be determined by reading the last 10 bytes of metadata. They contain\n"
+ "> >> >> a 2-byte magic ID to not confuse it with a non-compatible meta and\n"
+ "> >> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this\n"
+ "> >> >> structure is defined and the ID of that definition inside that BTF.\n"
+ "> >> >> Users can obtain BTF IDs by structure types using helpers available\n"
+ "> >> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland\n"
+ "> >> >> (libbpf -> kernel call) and then rely on those ID when reading data\n"
+ "> >> >> to make sure whether they support it and what to do with it.\n"
+ "> >> >> Why separate magic and ID? The idea is to make different formats\n"
+ "> >> >> always contain the basic/\"generic\" structure embedded at the end.\n"
+ "> >> >> This way we can still benefit in purely generic consumers (like\n"
+ "> >> >> cpumap) while providing some \"extra\" data to those who support it.\n"
+ "> >> >\n"
+ "> >> > I don't follow this. If you have a struct in your driver name it\n"
+ "> >> > something obvious, ice_xdp_metadata. If I understand things\n"
+ "> >> > correctly just dump the BTF for the driver, extract the\n"
+ "> >> > struct and done you can use CO-RE reads. For the 'fixed' case\n"
+ "> >> > this looks easy. And I don't think you even need a patch for this.\n"
+ "> >> \n"
+ "> >> ...however as we've discussed previously, we do need a bit of\n"
+ "> >> infrastructure around this. In particular, we need to embed the embed\n"
+ "> >> the BTF ID into the metadata itself so BPF can do runtime disambiguation\n"
+ "> >> between different formats (and add the right CO-RE primitives to make\n"
+ "> >> this easy). This is for two reasons:\n"
+ "> >> \n"
+ "> >> - The metadata might be different per-packet (e.g., PTP packets with\n"
+ "> >>   timestamps interleaved with bulk data without them)\n"
+ "> >> \n"
+ "> >> - With redirects we may end up processing packets from different devices\n"
+ "> >>   in a single XDP program (in devmap or cpumap, or on a veth) so we need\n"
+ "> >>   to be able to disambiguate at runtime.\n"
+ "> >> \n"
+ "> >> So I think the part of the design that puts the BTF ID into the end of\n"
+ "> >> the metadata struct is sound; however, the actual format doesn't have to\n"
+ "> >> be fixed, we can use CO-RE to pick out the bits that a given BPF program\n"
+ "> >> needs; we just need a convention for how drivers report which format(s)\n"
+ "> >> they support. Which we should also agree on (and add core infrastructure\n"
+ "> >> around) so each driver doesn't go around inventing their own\n"
+ "> >> conventions.\n"
+ "> >> \n"
+ "> >> >> The enablement of this feature is controlled on attaching/replacing\n"
+ "> >> >> XDP program on an interface with two new parameters: that combined\n"
+ "> >> >> BTF+type ID and metadata threshold.\n"
+ "> >> >> The threshold specifies the minimum frame size which a driver (or\n"
+ "> >> >> NIC) should start composing metadata from. It is introduced instead\n"
+ "> >> >> of just false/true flag due to that often it's not worth it to spend\n"
+ "> >> >> cycles to fetch all that data for such small frames: let's say, it\n"
+ "> >> >> can be even faster to just calculate checksums for them on CPU\n"
+ "> >> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses\n"
+ "> >> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help\n"
+ "> >> >> mitigate that.\n"
+ "> >> >\n"
+ "> >> > I would put this in the bonus category. Can you do the simple thing\n"
+ "> >> > above without these extra bits and then add them later. Just\n"
+ "> >> > pick some overly conservative threshold to start with.\n"
+ "> >> \n"
+ "> >> Yeah, I'd agree this kind of configuration is something that can be\n"
+ "> >> added later, and also it's sort of orthogonal to the consumption of the\n"
+ "> >> metadata itself.\n"
+ "> >> \n"
+ "> >> Also, tying this configuration into the loading of an XDP program is a\n"
+ "> >> terrible interface: these are hardware configuration options, let's just\n"
+ "> >> put them into ethtool or 'ip link' like any other piece of device\n"
+ "> >> configuration.\n"
+ "> >\n"
+ "> > I don't believe it fits there, especially Ethtool. Ethtool is for\n"
+ "> > hardware configuration, XDP/AF_XDP is 95% software stuff (apart from\n"
+ "> > offload bits which is purely NFP's for now).\n"
+ "> \n"
+ "> But XDP-hints is about consuming hardware features. When you're\n"
+ "> configuring which metadata items you want, you're saying \"please provide\n"
+ "> me with these (hardware) features\". So ethtool is an excellent place to\n"
+ "> do that :)\n"
  "\n"
- "But XDP-hints is about consuming hardware features. When you're\n"
- "configuring which metadata items you want, you're saying \"please provide\n"
- "me with these (hardware) features\". So ethtool is an excellent place to\n"
- "do that :)\n"
+ "With Ethtool you configure the hardware, e.g. it won't strip VLAN\n"
+ "tags if you disable rx-cvlan-stripping. With configuring metadata\n"
+ "you only tell what you want to see there, don't you?\n"
  "\n"
- "> I follow that way:\n"
- ">\n"
- "> 1) you pick a program you want to attach;\n"
- "> 2) usually they are written for special needs and usecases;\n"
- "> 3) so most likely that program will be tied with metadata/driver/etc\n"
- ">    in some way;\n"
- "> 4) so you want to enable Hints of a particular format primarily for\n"
- ">    this program and usecase, same with threshold and everything\n"
- ">    else.\n"
- ">\n"
- "> Pls explain how you see it, I might be wrong for sure.\n"
+ "> \n"
+ "> > I follow that way:\n"
+ "> >\n"
+ "> > 1) you pick a program you want to attach;\n"
+ "> > 2) usually they are written for special needs and usecases;\n"
+ "> > 3) so most likely that program will be tied with metadata/driver/etc\n"
+ "> >    in some way;\n"
+ "> > 4) so you want to enable Hints of a particular format primarily for\n"
+ "> >    this program and usecase, same with threshold and everything\n"
+ "> >    else.\n"
+ "> >\n"
+ "> > Pls explain how you see it, I might be wrong for sure.\n"
+ "> \n"
+ "> As above: XDP hints is about giving XDP programs (and AF_XDP consumers)\n"
+ "> access to metadata that is not currently available. Tying the lifetime\n"
+ "> of that hardware configuration (i.e., which information to provide) to\n"
+ "> the lifetime of an XDP program is not a good interface: for one thing,\n"
+ "> how will it handle multiple programs? What about when XDP is not used at\n"
  "\n"
- "As above: XDP hints is about giving XDP programs (and AF_XDP consumers)\n"
- "access to metadata that is not currently available. Tying the lifetime\n"
- "of that hardware configuration (i.e., which information to provide) to\n"
- "the lifetime of an XDP program is not a good interface: for one thing,\n"
- "how will it handle multiple programs? What about when XDP is not used at\n"
- "all but you still want to configure the same features?\n"
+ "Multiple progs is stuff I didn't cover, but will do later (as you\n"
+ "all say to me, \"let's start with something simple\" :)). Aaaand\n"
+ "multiple XDP progs (I'm not talking about attaching progs in\n"
+ "differeng modes) is not a kernel feature, rather a libpf feature,\n"
+ "so I believe it should be handled there later...\n"
  "\n"
- "In addition, in every other case where we do dynamic data access (with\n"
- "CO-RE) the BPF program is a consumer that modifies itself to access the\n"
- "data provided by the kernel. I get that this is harder to achieve for\n"
- "AF_XDP, but then let's solve that instead of making a totally\n"
- "inconsistent interface for XDP.\n"
+ "> all but you still want to configure the same features?\n"
  "\n"
- "I'm as excited as you about the prospect of having totally programmable\n"
- "hardware where you can just specify any arbitrary metadata format and\n"
- "it'll provide that for you. But that is an orthogonal feature: let's\n"
- "start with creating a dynamic interface for consuming the (static)\n"
- "hardware features we already have, and then later we can have a separate\n"
- "interface for configuring more dynamic hardware features. XDP-hints is\n"
- "about adding this consumption feature in a way that's sufficiently\n"
- "dynamic that we can do the other (programmable hardware) thing on top\n"
- "later...\n"
+ "What's the point of configuring metadata when there are no progs\n"
+ "attached? To configure it once and not on every prog attach? I'm\n"
+ "not saying I don't like it, just want to clarify.\n"
+ "Maybe I need opinions from some more people, just to have an\n"
+ "overview of how most of folks see it and would like to configure\n"
+ "it. 'Cause I heard from at least one of the consumers that\n"
+ "libpf API is a perfect place for Hints to him :)\n"
  "\n"
- -Toke
+ "> \n"
+ "> In addition, in every other case where we do dynamic data access (with\n"
+ "> CO-RE) the BPF program is a consumer that modifies itself to access the\n"
+ "> data provided by the kernel. I get that this is harder to achieve for\n"
+ "> AF_XDP, but then let's solve that instead of making a totally\n"
+ "> inconsistent interface for XDP.\n"
+ "\n"
+ "I also see CO-RE more fitting and convenient way to use them, but\n"
+ "didn't manage to solve two things:\n"
+ "\n"
+ "1) AF_XDP programs, so what to do with them? Prepare patches for\n"
+ "   LLVM to make it able to do CO-RE on AF_XDP program load? Or\n"
+ "   just hardcode them for particular usecases and NICs? What about\n"
+ "   \"general-purpose\" programs?\n"
+ "   And if hardcode, what's the point then to do Generic Hints at\n"
+ "   all? Then all it needs is making driver building some meta in\n"
+ "   front of frames via on-off button and that's it? Why BTF ID in\n"
+ "   the meta then if consumers will access meta hardcoded (via CO-RE\n"
+ "   or literally hardcoded, doesn't matter)?\n"
+ "2) In-kernel metadata consumers? Also do CO-RE? Otherwise, with no\n"
+ "   generic metadata structure they won't be able to benefit from\n"
+ "   Hints. But I guess we still need to provide kernel with meta?\n"
+ "   Or no?\n"
+ "\n"
+ "> \n"
+ "> I'm as excited as you about the prospect of having totally programmable\n"
+ "\n"
+ "But I mostly care about current generation with no programmable\n"
+ "Hints...\n"
+ "\n"
+ "> hardware where you can just specify any arbitrary metadata format and\n"
+ "> it'll provide that for you. But that is an orthogonal feature: let's\n"
+ "> start with creating a dynamic interface for consuming the (static)\n"
+ "> hardware features we already have, and then later we can have a separate\n"
+ "> interface for configuring more dynamic hardware features. XDP-hints is\n"
+ "> about adding this consumption feature in a way that's sufficiently\n"
+ "> dynamic that we can do the other (programmable hardware) thing on top\n"
+ "> later...\n"
+ "> \n"
+ "> -Toke\n"
+ "\n"
+ "Thanks,\n"
+ Olek
 
-c72dc416836c50636a9981877feaff1bcb1c823441ad3152d8a1b2a65188b1ff
+8f75568259bb40949ca39a06e8d33b0e5cdd88ba293b3917893e73d9d530d758

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox