diff for duplicates of <87a69o94wz.fsf@toke.dk> diff --git a/a/1.txt b/N1/1.txt index 1ca3ba4..e4aa316 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -1,142 +1,191 @@ -Alexander Lobakin <alexandr.lobakin@intel.com> writes: +From: Toke Høiland-Jørgensen <toke@redhat.com> +Date: Mon, 04 Jul 2022 19:14:04 +0200 -> From: Toke H??iland-J??rgensen <toke@redhat.com> -> Date: Wed, 29 Jun 2022 15:43:05 +0200 -> ->> John Fastabend <john.fastabend@gmail.com> writes: ->> ->> > Alexander Lobakin wrote: ->> >> This RFC is to give the whole picture. It will most likely be split ->> >> onto several series, maybe even merge cycles. See the "table of ->> >> contents" below. ->> > ->> > Even for RFC its a bit much. Probably improve the summary ->> > message here as well I'm still not clear on the overall ->> > architecture so not sure I want to dig into patches. ->> ->> +1 on this, and piggybacking on your comment to chime in on the general ->> architecture. ->> ->> >> Now, a NIC driver, or even a SmartNIC itself, can put those params ->> >> there in a well-defined format. The format is fixed, but can be of ->> >> several different types represented by structures, which definitions ->> >> are available to the kernel, BPF programs and the userland. ->> > ->> > I don't think in general the format needs to be fixed. ->> ->> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll ->> use CO-RE to enable dynamic formats... ->> ->> [...] ->> ->> >> It is fixed due to it being almost a UAPI, and the exact format can ->> >> be determined by reading the last 10 bytes of metadata. They contain ->> >> a 2-byte magic ID to not confuse it with a non-compatible meta and ->> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this ->> >> structure is defined and the ID of that definition inside that BTF. ->> >> Users can obtain BTF IDs by structure types using helpers available ->> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland ->> >> (libbpf -> kernel call) and then rely on those ID when reading data ->> >> to make sure whether they support it and what to do with it. ->> >> Why separate magic and ID? The idea is to make different formats ->> >> always contain the basic/"generic" structure embedded at the end. ->> >> This way we can still benefit in purely generic consumers (like ->> >> cpumap) while providing some "extra" data to those who support it. ->> > ->> > I don't follow this. If you have a struct in your driver name it ->> > something obvious, ice_xdp_metadata. If I understand things ->> > correctly just dump the BTF for the driver, extract the ->> > struct and done you can use CO-RE reads. For the 'fixed' case ->> > this looks easy. And I don't think you even need a patch for this. ->> ->> ...however as we've discussed previously, we do need a bit of ->> infrastructure around this. In particular, we need to embed the embed ->> the BTF ID into the metadata itself so BPF can do runtime disambiguation ->> between different formats (and add the right CO-RE primitives to make ->> this easy). This is for two reasons: ->> ->> - The metadata might be different per-packet (e.g., PTP packets with ->> timestamps interleaved with bulk data without them) ->> ->> - With redirects we may end up processing packets from different devices ->> in a single XDP program (in devmap or cpumap, or on a veth) so we need ->> to be able to disambiguate at runtime. ->> ->> So I think the part of the design that puts the BTF ID into the end of ->> the metadata struct is sound; however, the actual format doesn't have to ->> be fixed, we can use CO-RE to pick out the bits that a given BPF program ->> needs; we just need a convention for how drivers report which format(s) ->> they support. Which we should also agree on (and add core infrastructure ->> around) so each driver doesn't go around inventing their own ->> conventions. ->> ->> >> The enablement of this feature is controlled on attaching/replacing ->> >> XDP program on an interface with two new parameters: that combined ->> >> BTF+type ID and metadata threshold. ->> >> The threshold specifies the minimum frame size which a driver (or ->> >> NIC) should start composing metadata from. It is introduced instead ->> >> of just false/true flag due to that often it's not worth it to spend ->> >> cycles to fetch all that data for such small frames: let's say, it ->> >> can be even faster to just calculate checksums for them on CPU ->> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses ->> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help ->> >> mitigate that. ->> > ->> > I would put this in the bonus category. Can you do the simple thing ->> > above without these extra bits and then add them later. Just ->> > pick some overly conservative threshold to start with. ->> ->> Yeah, I'd agree this kind of configuration is something that can be ->> added later, and also it's sort of orthogonal to the consumption of the ->> metadata itself. ->> ->> Also, tying this configuration into the loading of an XDP program is a ->> terrible interface: these are hardware configuration options, let's just ->> put them into ethtool or 'ip link' like any other piece of device ->> configuration. -> -> I don't believe it fits there, especially Ethtool. Ethtool is for -> hardware configuration, XDP/AF_XDP is 95% software stuff (apart from -> offload bits which is purely NFP's for now). +> Alexander Lobakin <alexandr.lobakin@intel.com> writes: +> +> > From: Toke H??iland-J??rgensen <toke@redhat.com> +> > Date: Wed, 29 Jun 2022 15:43:05 +0200 +> > +> >> John Fastabend <john.fastabend@gmail.com> writes: +> >> +> >> > Alexander Lobakin wrote: +> >> >> This RFC is to give the whole picture. It will most likely be split +> >> >> onto several series, maybe even merge cycles. See the "table of +> >> >> contents" below. +> >> > +> >> > Even for RFC its a bit much. Probably improve the summary +> >> > message here as well I'm still not clear on the overall +> >> > architecture so not sure I want to dig into patches. +> >> +> >> +1 on this, and piggybacking on your comment to chime in on the general +> >> architecture. +> >> +> >> >> Now, a NIC driver, or even a SmartNIC itself, can put those params +> >> >> there in a well-defined format. The format is fixed, but can be of +> >> >> several different types represented by structures, which definitions +> >> >> are available to the kernel, BPF programs and the userland. +> >> > +> >> > I don't think in general the format needs to be fixed. +> >> +> >> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll +> >> use CO-RE to enable dynamic formats... +> >> +> >> [...] +> >> +> >> >> It is fixed due to it being almost a UAPI, and the exact format can +> >> >> be determined by reading the last 10 bytes of metadata. They contain +> >> >> a 2-byte magic ID to not confuse it with a non-compatible meta and +> >> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this +> >> >> structure is defined and the ID of that definition inside that BTF. +> >> >> Users can obtain BTF IDs by structure types using helpers available +> >> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland +> >> >> (libbpf -> kernel call) and then rely on those ID when reading data +> >> >> to make sure whether they support it and what to do with it. +> >> >> Why separate magic and ID? The idea is to make different formats +> >> >> always contain the basic/"generic" structure embedded at the end. +> >> >> This way we can still benefit in purely generic consumers (like +> >> >> cpumap) while providing some "extra" data to those who support it. +> >> > +> >> > I don't follow this. If you have a struct in your driver name it +> >> > something obvious, ice_xdp_metadata. If I understand things +> >> > correctly just dump the BTF for the driver, extract the +> >> > struct and done you can use CO-RE reads. For the 'fixed' case +> >> > this looks easy. And I don't think you even need a patch for this. +> >> +> >> ...however as we've discussed previously, we do need a bit of +> >> infrastructure around this. In particular, we need to embed the embed +> >> the BTF ID into the metadata itself so BPF can do runtime disambiguation +> >> between different formats (and add the right CO-RE primitives to make +> >> this easy). This is for two reasons: +> >> +> >> - The metadata might be different per-packet (e.g., PTP packets with +> >> timestamps interleaved with bulk data without them) +> >> +> >> - With redirects we may end up processing packets from different devices +> >> in a single XDP program (in devmap or cpumap, or on a veth) so we need +> >> to be able to disambiguate at runtime. +> >> +> >> So I think the part of the design that puts the BTF ID into the end of +> >> the metadata struct is sound; however, the actual format doesn't have to +> >> be fixed, we can use CO-RE to pick out the bits that a given BPF program +> >> needs; we just need a convention for how drivers report which format(s) +> >> they support. Which we should also agree on (and add core infrastructure +> >> around) so each driver doesn't go around inventing their own +> >> conventions. +> >> +> >> >> The enablement of this feature is controlled on attaching/replacing +> >> >> XDP program on an interface with two new parameters: that combined +> >> >> BTF+type ID and metadata threshold. +> >> >> The threshold specifies the minimum frame size which a driver (or +> >> >> NIC) should start composing metadata from. It is introduced instead +> >> >> of just false/true flag due to that often it's not worth it to spend +> >> >> cycles to fetch all that data for such small frames: let's say, it +> >> >> can be even faster to just calculate checksums for them on CPU +> >> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses +> >> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help +> >> >> mitigate that. +> >> > +> >> > I would put this in the bonus category. Can you do the simple thing +> >> > above without these extra bits and then add them later. Just +> >> > pick some overly conservative threshold to start with. +> >> +> >> Yeah, I'd agree this kind of configuration is something that can be +> >> added later, and also it's sort of orthogonal to the consumption of the +> >> metadata itself. +> >> +> >> Also, tying this configuration into the loading of an XDP program is a +> >> terrible interface: these are hardware configuration options, let's just +> >> put them into ethtool or 'ip link' like any other piece of device +> >> configuration. +> > +> > I don't believe it fits there, especially Ethtool. Ethtool is for +> > hardware configuration, XDP/AF_XDP is 95% software stuff (apart from +> > offload bits which is purely NFP's for now). +> +> But XDP-hints is about consuming hardware features. When you're +> configuring which metadata items you want, you're saying "please provide +> me with these (hardware) features". So ethtool is an excellent place to +> do that :) -But XDP-hints is about consuming hardware features. When you're -configuring which metadata items you want, you're saying "please provide -me with these (hardware) features". So ethtool is an excellent place to -do that :) +With Ethtool you configure the hardware, e.g. it won't strip VLAN +tags if you disable rx-cvlan-stripping. With configuring metadata +you only tell what you want to see there, don't you? -> I follow that way: -> -> 1) you pick a program you want to attach; -> 2) usually they are written for special needs and usecases; -> 3) so most likely that program will be tied with metadata/driver/etc -> in some way; -> 4) so you want to enable Hints of a particular format primarily for -> this program and usecase, same with threshold and everything -> else. -> -> Pls explain how you see it, I might be wrong for sure. +> +> > I follow that way: +> > +> > 1) you pick a program you want to attach; +> > 2) usually they are written for special needs and usecases; +> > 3) so most likely that program will be tied with metadata/driver/etc +> > in some way; +> > 4) so you want to enable Hints of a particular format primarily for +> > this program and usecase, same with threshold and everything +> > else. +> > +> > Pls explain how you see it, I might be wrong for sure. +> +> As above: XDP hints is about giving XDP programs (and AF_XDP consumers) +> access to metadata that is not currently available. Tying the lifetime +> of that hardware configuration (i.e., which information to provide) to +> the lifetime of an XDP program is not a good interface: for one thing, +> how will it handle multiple programs? What about when XDP is not used at -As above: XDP hints is about giving XDP programs (and AF_XDP consumers) -access to metadata that is not currently available. Tying the lifetime -of that hardware configuration (i.e., which information to provide) to -the lifetime of an XDP program is not a good interface: for one thing, -how will it handle multiple programs? What about when XDP is not used at -all but you still want to configure the same features? +Multiple progs is stuff I didn't cover, but will do later (as you +all say to me, "let's start with something simple" :)). Aaaand +multiple XDP progs (I'm not talking about attaching progs in +differeng modes) is not a kernel feature, rather a libpf feature, +so I believe it should be handled there later... -In addition, in every other case where we do dynamic data access (with -CO-RE) the BPF program is a consumer that modifies itself to access the -data provided by the kernel. I get that this is harder to achieve for -AF_XDP, but then let's solve that instead of making a totally -inconsistent interface for XDP. +> all but you still want to configure the same features? -I'm as excited as you about the prospect of having totally programmable -hardware where you can just specify any arbitrary metadata format and -it'll provide that for you. But that is an orthogonal feature: let's -start with creating a dynamic interface for consuming the (static) -hardware features we already have, and then later we can have a separate -interface for configuring more dynamic hardware features. XDP-hints is -about adding this consumption feature in a way that's sufficiently -dynamic that we can do the other (programmable hardware) thing on top -later... +What's the point of configuring metadata when there are no progs +attached? To configure it once and not on every prog attach? I'm +not saying I don't like it, just want to clarify. +Maybe I need opinions from some more people, just to have an +overview of how most of folks see it and would like to configure +it. 'Cause I heard from at least one of the consumers that +libpf API is a perfect place for Hints to him :) --Toke +> +> In addition, in every other case where we do dynamic data access (with +> CO-RE) the BPF program is a consumer that modifies itself to access the +> data provided by the kernel. I get that this is harder to achieve for +> AF_XDP, but then let's solve that instead of making a totally +> inconsistent interface for XDP. + +I also see CO-RE more fitting and convenient way to use them, but +didn't manage to solve two things: + +1) AF_XDP programs, so what to do with them? Prepare patches for + LLVM to make it able to do CO-RE on AF_XDP program load? Or + just hardcode them for particular usecases and NICs? What about + "general-purpose" programs? + And if hardcode, what's the point then to do Generic Hints at + all? Then all it needs is making driver building some meta in + front of frames via on-off button and that's it? Why BTF ID in + the meta then if consumers will access meta hardcoded (via CO-RE + or literally hardcoded, doesn't matter)? +2) In-kernel metadata consumers? Also do CO-RE? Otherwise, with no + generic metadata structure they won't be able to benefit from + Hints. But I guess we still need to provide kernel with meta? + Or no? + +> +> I'm as excited as you about the prospect of having totally programmable + +But I mostly care about current generation with no programmable +Hints... + +> hardware where you can just specify any arbitrary metadata format and +> it'll provide that for you. But that is an orthogonal feature: let's +> start with creating a dynamic interface for consuming the (static) +> hardware features we already have, and then later we can have a separate +> interface for configuring more dynamic hardware features. XDP-hints is +> about adding this consumption feature in a way that's sufficiently +> dynamic that we can do the other (programmable hardware) thing on top +> later... +> +> -Toke + +Thanks, +Olek diff --git a/a/content_digest b/N1/content_digest index fae8949..5d40daa 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -1,12 +1,9 @@ - "ref\020220628194812.1453059-1-alexandr.lobakin@intel.com\0" - "ref\062bbedf07f44a_2181420830@john.notmuch\0" - "ref\087iloja8ly.fsf@toke.dk\0" - "ref\020220704154440.7567-1-alexandr.lobakin@intel.com\0" - "From\0Toke H\303\270iland-J\303\270rgensen <toke@redhat.com>\0" + "From\0Alexander Lobakin <alexandr.lobakin@intel.com>\0" "Subject\0Re: [xdp-hints] Re: [PATCH RFC bpf-next 00/52] bpf, xdp: introduce and use Generic Hints/metadata\0" - "Date\0Mon, 04 Jul 2022 19:14:04 +0200\0" - "To\0Alexander Lobakin <alexandr.lobakin@intel.com>\0" - "Cc\0John Fastabend <john.fastabend@gmail.com>" + "Date\0Tue, 5 Jul 2022 17:15:09 +0200\0" + "To\0Toke H\303\270iland-J\303\270rgensen <toke@redhat.com>\0" + "Cc\0Alexander Lobakin <alexandr.lobakin@intel.com>" + John Fastabend <john.fastabend@gmail.com> Alexei Starovoitov <ast@kernel.org> Daniel Borkmann <daniel@iogearbox.net> Andrii Nakryiko <andrii@kernel.org> @@ -31,147 +28,196 @@ " xdp-hints@xdp-project.net\0" "\00:1\0" "b\0" - "Alexander Lobakin <alexandr.lobakin@intel.com> writes:\n" + "From: Toke H\303\270iland-J\303\270rgensen <toke@redhat.com>\n" + "Date: Mon, 04 Jul 2022 19:14:04 +0200\n" "\n" - "> From: Toke H??iland-J??rgensen <toke@redhat.com>\n" - "> Date: Wed, 29 Jun 2022 15:43:05 +0200\n" - ">\n" - ">> John Fastabend <john.fastabend@gmail.com> writes:\n" - ">> \n" - ">> > Alexander Lobakin wrote:\n" - ">> >> This RFC is to give the whole picture. It will most likely be split\n" - ">> >> onto several series, maybe even merge cycles. See the \"table of\n" - ">> >> contents\" below.\n" - ">> >\n" - ">> > Even for RFC its a bit much. Probably improve the summary\n" - ">> > message here as well I'm still not clear on the overall\n" - ">> > architecture so not sure I want to dig into patches.\n" - ">> \n" - ">> +1 on this, and piggybacking on your comment to chime in on the general\n" - ">> architecture.\n" - ">> \n" - ">> >> Now, a NIC driver, or even a SmartNIC itself, can put those params\n" - ">> >> there in a well-defined format. The format is fixed, but can be of\n" - ">> >> several different types represented by structures, which definitions\n" - ">> >> are available to the kernel, BPF programs and the userland.\n" - ">> >\n" - ">> > I don't think in general the format needs to be fixed.\n" - ">> \n" - ">> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll\n" - ">> use CO-RE to enable dynamic formats...\n" - ">> \n" - ">> [...]\n" - ">> \n" - ">> >> It is fixed due to it being almost a UAPI, and the exact format can\n" - ">> >> be determined by reading the last 10 bytes of metadata. They contain\n" - ">> >> a 2-byte magic ID to not confuse it with a non-compatible meta and\n" - ">> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this\n" - ">> >> structure is defined and the ID of that definition inside that BTF.\n" - ">> >> Users can obtain BTF IDs by structure types using helpers available\n" - ">> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland\n" - ">> >> (libbpf -> kernel call) and then rely on those ID when reading data\n" - ">> >> to make sure whether they support it and what to do with it.\n" - ">> >> Why separate magic and ID? The idea is to make different formats\n" - ">> >> always contain the basic/\"generic\" structure embedded at the end.\n" - ">> >> This way we can still benefit in purely generic consumers (like\n" - ">> >> cpumap) while providing some \"extra\" data to those who support it.\n" - ">> >\n" - ">> > I don't follow this. If you have a struct in your driver name it\n" - ">> > something obvious, ice_xdp_metadata. If I understand things\n" - ">> > correctly just dump the BTF for the driver, extract the\n" - ">> > struct and done you can use CO-RE reads. For the 'fixed' case\n" - ">> > this looks easy. And I don't think you even need a patch for this.\n" - ">> \n" - ">> ...however as we've discussed previously, we do need a bit of\n" - ">> infrastructure around this. In particular, we need to embed the embed\n" - ">> the BTF ID into the metadata itself so BPF can do runtime disambiguation\n" - ">> between different formats (and add the right CO-RE primitives to make\n" - ">> this easy). This is for two reasons:\n" - ">> \n" - ">> - The metadata might be different per-packet (e.g., PTP packets with\n" - ">> timestamps interleaved with bulk data without them)\n" - ">> \n" - ">> - With redirects we may end up processing packets from different devices\n" - ">> in a single XDP program (in devmap or cpumap, or on a veth) so we need\n" - ">> to be able to disambiguate at runtime.\n" - ">> \n" - ">> So I think the part of the design that puts the BTF ID into the end of\n" - ">> the metadata struct is sound; however, the actual format doesn't have to\n" - ">> be fixed, we can use CO-RE to pick out the bits that a given BPF program\n" - ">> needs; we just need a convention for how drivers report which format(s)\n" - ">> they support. Which we should also agree on (and add core infrastructure\n" - ">> around) so each driver doesn't go around inventing their own\n" - ">> conventions.\n" - ">> \n" - ">> >> The enablement of this feature is controlled on attaching/replacing\n" - ">> >> XDP program on an interface with two new parameters: that combined\n" - ">> >> BTF+type ID and metadata threshold.\n" - ">> >> The threshold specifies the minimum frame size which a driver (or\n" - ">> >> NIC) should start composing metadata from. It is introduced instead\n" - ">> >> of just false/true flag due to that often it's not worth it to spend\n" - ">> >> cycles to fetch all that data for such small frames: let's say, it\n" - ">> >> can be even faster to just calculate checksums for them on CPU\n" - ">> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses\n" - ">> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help\n" - ">> >> mitigate that.\n" - ">> >\n" - ">> > I would put this in the bonus category. Can you do the simple thing\n" - ">> > above without these extra bits and then add them later. Just\n" - ">> > pick some overly conservative threshold to start with.\n" - ">> \n" - ">> Yeah, I'd agree this kind of configuration is something that can be\n" - ">> added later, and also it's sort of orthogonal to the consumption of the\n" - ">> metadata itself.\n" - ">> \n" - ">> Also, tying this configuration into the loading of an XDP program is a\n" - ">> terrible interface: these are hardware configuration options, let's just\n" - ">> put them into ethtool or 'ip link' like any other piece of device\n" - ">> configuration.\n" - ">\n" - "> I don't believe it fits there, especially Ethtool. Ethtool is for\n" - "> hardware configuration, XDP/AF_XDP is 95% software stuff (apart from\n" - "> offload bits which is purely NFP's for now).\n" + "> Alexander Lobakin <alexandr.lobakin@intel.com> writes:\n" + "> \n" + "> > From: Toke H??iland-J??rgensen <toke@redhat.com>\n" + "> > Date: Wed, 29 Jun 2022 15:43:05 +0200\n" + "> >\n" + "> >> John Fastabend <john.fastabend@gmail.com> writes:\n" + "> >> \n" + "> >> > Alexander Lobakin wrote:\n" + "> >> >> This RFC is to give the whole picture. It will most likely be split\n" + "> >> >> onto several series, maybe even merge cycles. See the \"table of\n" + "> >> >> contents\" below.\n" + "> >> >\n" + "> >> > Even for RFC its a bit much. Probably improve the summary\n" + "> >> > message here as well I'm still not clear on the overall\n" + "> >> > architecture so not sure I want to dig into patches.\n" + "> >> \n" + "> >> +1 on this, and piggybacking on your comment to chime in on the general\n" + "> >> architecture.\n" + "> >> \n" + "> >> >> Now, a NIC driver, or even a SmartNIC itself, can put those params\n" + "> >> >> there in a well-defined format. The format is fixed, but can be of\n" + "> >> >> several different types represented by structures, which definitions\n" + "> >> >> are available to the kernel, BPF programs and the userland.\n" + "> >> >\n" + "> >> > I don't think in general the format needs to be fixed.\n" + "> >> \n" + "> >> No, that's the whole point of BTF: it's not supposed to be UAPI, we'll\n" + "> >> use CO-RE to enable dynamic formats...\n" + "> >> \n" + "> >> [...]\n" + "> >> \n" + "> >> >> It is fixed due to it being almost a UAPI, and the exact format can\n" + "> >> >> be determined by reading the last 10 bytes of metadata. They contain\n" + "> >> >> a 2-byte magic ID to not confuse it with a non-compatible meta and\n" + "> >> >> a 8-byte combined BTF ID + type ID: the ID of the BTF where this\n" + "> >> >> structure is defined and the ID of that definition inside that BTF.\n" + "> >> >> Users can obtain BTF IDs by structure types using helpers available\n" + "> >> >> in the kernel, BPF (written by the CO-RE/verifier) and the userland\n" + "> >> >> (libbpf -> kernel call) and then rely on those ID when reading data\n" + "> >> >> to make sure whether they support it and what to do with it.\n" + "> >> >> Why separate magic and ID? The idea is to make different formats\n" + "> >> >> always contain the basic/\"generic\" structure embedded at the end.\n" + "> >> >> This way we can still benefit in purely generic consumers (like\n" + "> >> >> cpumap) while providing some \"extra\" data to those who support it.\n" + "> >> >\n" + "> >> > I don't follow this. If you have a struct in your driver name it\n" + "> >> > something obvious, ice_xdp_metadata. If I understand things\n" + "> >> > correctly just dump the BTF for the driver, extract the\n" + "> >> > struct and done you can use CO-RE reads. For the 'fixed' case\n" + "> >> > this looks easy. And I don't think you even need a patch for this.\n" + "> >> \n" + "> >> ...however as we've discussed previously, we do need a bit of\n" + "> >> infrastructure around this. In particular, we need to embed the embed\n" + "> >> the BTF ID into the metadata itself so BPF can do runtime disambiguation\n" + "> >> between different formats (and add the right CO-RE primitives to make\n" + "> >> this easy). This is for two reasons:\n" + "> >> \n" + "> >> - The metadata might be different per-packet (e.g., PTP packets with\n" + "> >> timestamps interleaved with bulk data without them)\n" + "> >> \n" + "> >> - With redirects we may end up processing packets from different devices\n" + "> >> in a single XDP program (in devmap or cpumap, or on a veth) so we need\n" + "> >> to be able to disambiguate at runtime.\n" + "> >> \n" + "> >> So I think the part of the design that puts the BTF ID into the end of\n" + "> >> the metadata struct is sound; however, the actual format doesn't have to\n" + "> >> be fixed, we can use CO-RE to pick out the bits that a given BPF program\n" + "> >> needs; we just need a convention for how drivers report which format(s)\n" + "> >> they support. Which we should also agree on (and add core infrastructure\n" + "> >> around) so each driver doesn't go around inventing their own\n" + "> >> conventions.\n" + "> >> \n" + "> >> >> The enablement of this feature is controlled on attaching/replacing\n" + "> >> >> XDP program on an interface with two new parameters: that combined\n" + "> >> >> BTF+type ID and metadata threshold.\n" + "> >> >> The threshold specifies the minimum frame size which a driver (or\n" + "> >> >> NIC) should start composing metadata from. It is introduced instead\n" + "> >> >> of just false/true flag due to that often it's not worth it to spend\n" + "> >> >> cycles to fetch all that data for such small frames: let's say, it\n" + "> >> >> can be even faster to just calculate checksums for them on CPU\n" + "> >> >> rather than touch non-coherent DMA zone. Simple XDP_DROP case loses\n" + "> >> >> 15 Mpps on 64 byte frames with enabled metadata, threshold can help\n" + "> >> >> mitigate that.\n" + "> >> >\n" + "> >> > I would put this in the bonus category. Can you do the simple thing\n" + "> >> > above without these extra bits and then add them later. Just\n" + "> >> > pick some overly conservative threshold to start with.\n" + "> >> \n" + "> >> Yeah, I'd agree this kind of configuration is something that can be\n" + "> >> added later, and also it's sort of orthogonal to the consumption of the\n" + "> >> metadata itself.\n" + "> >> \n" + "> >> Also, tying this configuration into the loading of an XDP program is a\n" + "> >> terrible interface: these are hardware configuration options, let's just\n" + "> >> put them into ethtool or 'ip link' like any other piece of device\n" + "> >> configuration.\n" + "> >\n" + "> > I don't believe it fits there, especially Ethtool. Ethtool is for\n" + "> > hardware configuration, XDP/AF_XDP is 95% software stuff (apart from\n" + "> > offload bits which is purely NFP's for now).\n" + "> \n" + "> But XDP-hints is about consuming hardware features. When you're\n" + "> configuring which metadata items you want, you're saying \"please provide\n" + "> me with these (hardware) features\". So ethtool is an excellent place to\n" + "> do that :)\n" "\n" - "But XDP-hints is about consuming hardware features. When you're\n" - "configuring which metadata items you want, you're saying \"please provide\n" - "me with these (hardware) features\". So ethtool is an excellent place to\n" - "do that :)\n" + "With Ethtool you configure the hardware, e.g. it won't strip VLAN\n" + "tags if you disable rx-cvlan-stripping. With configuring metadata\n" + "you only tell what you want to see there, don't you?\n" "\n" - "> I follow that way:\n" - ">\n" - "> 1) you pick a program you want to attach;\n" - "> 2) usually they are written for special needs and usecases;\n" - "> 3) so most likely that program will be tied with metadata/driver/etc\n" - "> in some way;\n" - "> 4) so you want to enable Hints of a particular format primarily for\n" - "> this program and usecase, same with threshold and everything\n" - "> else.\n" - ">\n" - "> Pls explain how you see it, I might be wrong for sure.\n" + "> \n" + "> > I follow that way:\n" + "> >\n" + "> > 1) you pick a program you want to attach;\n" + "> > 2) usually they are written for special needs and usecases;\n" + "> > 3) so most likely that program will be tied with metadata/driver/etc\n" + "> > in some way;\n" + "> > 4) so you want to enable Hints of a particular format primarily for\n" + "> > this program and usecase, same with threshold and everything\n" + "> > else.\n" + "> >\n" + "> > Pls explain how you see it, I might be wrong for sure.\n" + "> \n" + "> As above: XDP hints is about giving XDP programs (and AF_XDP consumers)\n" + "> access to metadata that is not currently available. Tying the lifetime\n" + "> of that hardware configuration (i.e., which information to provide) to\n" + "> the lifetime of an XDP program is not a good interface: for one thing,\n" + "> how will it handle multiple programs? What about when XDP is not used at\n" "\n" - "As above: XDP hints is about giving XDP programs (and AF_XDP consumers)\n" - "access to metadata that is not currently available. Tying the lifetime\n" - "of that hardware configuration (i.e., which information to provide) to\n" - "the lifetime of an XDP program is not a good interface: for one thing,\n" - "how will it handle multiple programs? What about when XDP is not used at\n" - "all but you still want to configure the same features?\n" + "Multiple progs is stuff I didn't cover, but will do later (as you\n" + "all say to me, \"let's start with something simple\" :)). Aaaand\n" + "multiple XDP progs (I'm not talking about attaching progs in\n" + "differeng modes) is not a kernel feature, rather a libpf feature,\n" + "so I believe it should be handled there later...\n" "\n" - "In addition, in every other case where we do dynamic data access (with\n" - "CO-RE) the BPF program is a consumer that modifies itself to access the\n" - "data provided by the kernel. I get that this is harder to achieve for\n" - "AF_XDP, but then let's solve that instead of making a totally\n" - "inconsistent interface for XDP.\n" + "> all but you still want to configure the same features?\n" "\n" - "I'm as excited as you about the prospect of having totally programmable\n" - "hardware where you can just specify any arbitrary metadata format and\n" - "it'll provide that for you. But that is an orthogonal feature: let's\n" - "start with creating a dynamic interface for consuming the (static)\n" - "hardware features we already have, and then later we can have a separate\n" - "interface for configuring more dynamic hardware features. XDP-hints is\n" - "about adding this consumption feature in a way that's sufficiently\n" - "dynamic that we can do the other (programmable hardware) thing on top\n" - "later...\n" + "What's the point of configuring metadata when there are no progs\n" + "attached? To configure it once and not on every prog attach? I'm\n" + "not saying I don't like it, just want to clarify.\n" + "Maybe I need opinions from some more people, just to have an\n" + "overview of how most of folks see it and would like to configure\n" + "it. 'Cause I heard from at least one of the consumers that\n" + "libpf API is a perfect place for Hints to him :)\n" "\n" - -Toke + "> \n" + "> In addition, in every other case where we do dynamic data access (with\n" + "> CO-RE) the BPF program is a consumer that modifies itself to access the\n" + "> data provided by the kernel. I get that this is harder to achieve for\n" + "> AF_XDP, but then let's solve that instead of making a totally\n" + "> inconsistent interface for XDP.\n" + "\n" + "I also see CO-RE more fitting and convenient way to use them, but\n" + "didn't manage to solve two things:\n" + "\n" + "1) AF_XDP programs, so what to do with them? Prepare patches for\n" + " LLVM to make it able to do CO-RE on AF_XDP program load? Or\n" + " just hardcode them for particular usecases and NICs? What about\n" + " \"general-purpose\" programs?\n" + " And if hardcode, what's the point then to do Generic Hints at\n" + " all? Then all it needs is making driver building some meta in\n" + " front of frames via on-off button and that's it? Why BTF ID in\n" + " the meta then if consumers will access meta hardcoded (via CO-RE\n" + " or literally hardcoded, doesn't matter)?\n" + "2) In-kernel metadata consumers? Also do CO-RE? Otherwise, with no\n" + " generic metadata structure they won't be able to benefit from\n" + " Hints. But I guess we still need to provide kernel with meta?\n" + " Or no?\n" + "\n" + "> \n" + "> I'm as excited as you about the prospect of having totally programmable\n" + "\n" + "But I mostly care about current generation with no programmable\n" + "Hints...\n" + "\n" + "> hardware where you can just specify any arbitrary metadata format and\n" + "> it'll provide that for you. But that is an orthogonal feature: let's\n" + "> start with creating a dynamic interface for consuming the (static)\n" + "> hardware features we already have, and then later we can have a separate\n" + "> interface for configuring more dynamic hardware features. XDP-hints is\n" + "> about adding this consumption feature in a way that's sufficiently\n" + "> dynamic that we can do the other (programmable hardware) thing on top\n" + "> later...\n" + "> \n" + "> -Toke\n" + "\n" + "Thanks,\n" + Olek -c72dc416836c50636a9981877feaff1bcb1c823441ad3152d8a1b2a65188b1ff +8f75568259bb40949ca39a06e8d33b0e5cdd88ba293b3917893e73d9d530d758
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox