From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Stanislav Fomichev <sdf@google.com>
Cc: lsf-pc@lists.linux-foundation.org, bpf@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] XDP metadata for TX
Date: Wed, 01 Mar 2023 00:08:36 +0100 [thread overview]
Message-ID: <87lekhgzgr.fsf@toke.dk> (raw)
In-Reply-To: <Y/6IBI8hlyI3DcUz@google.com>
Stanislav Fomichev <sdf@google.com> writes:
> On 02/28, Toke H�iland-J�rgensen wrote:
>> Stanislav Fomichev <sdf@google.com> writes:
>
>> > On Mon, Feb 27, 2023 at 3:54 PM Toke H�iland-J�rgensen
>> <toke@kernel.org> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > On Mon, Feb 27, 2023 at 6:17 AM Toke H�iland-J�rgensen
>> <toke@kernel.org> wrote:
>> >> >>
>> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >>
>> >> >> > On Thu, Feb 23, 2023 at 3:22 PM Toke H�iland-J�rgensen
>> <toke@kernel.org> wrote:
>> >> >> >>
>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >> >>
>> >> >> >> > I'd like to discuss a potential follow up for the
>> previous "XDP RX
>> >> >> >> > metadata" series [0].
>> >> >> >> >
>> >> >> >> > Now that we can access (a subset of) packet metadata at RX,
>> I'd like to
>> >> >> >> > explore the options where we can export some of that metadata
>> on TX. And
>> >> >> >> > also whether it might be possible to access some of the TX
>> completion
>> >> >> >> > metadata (things like TX timestamp).
>> >> >> >> >
>> >> >> >> > I'm currently trying to understand whether the same approach
>> I've used
>> >> >> >> > on RX could work at TX. By May I plan to have a bunch of
>> options laid
>> >> >> >> > out (currently considering XSK tx/compl programs and XDP
>> tx/compl
>> >> >> >> > programs) so we have something to discuss.
>> >> >> >>
>> >> >> >> I've been looking at ways of getting a TX-completion hook for
>> the XDP
>> >> >> >> queueing stuff as well. For that, I think it could work to just
>> hook
>> >> >> >> into xdp_return_frame(), but if you want to access hardware
>> metadata
>> >> >> >> it'll obviously have to be in the driver. A hook in the driver
>> could
>> >> >> >> certainly be used for the queueing return as well, though, which
>> may
>> >> >> >> help making it worth the trouble :)
>> >> >> >
>> >> >> > Yeah, I'd like to get to completion descriptors ideally; so
>> nothing
>> >> >> > better than a driver hook comes to mind so far :-(
>> >> >> > (I'm eye-balling mlx5's mlx5e_free_xdpsq_desc AF_XDP path mostly
>> so far).
>> >> >>
>> >> >> Is there any other use case for this than getting the TX timestamp?
>> Not
>> >> >> really sure what else those descriptors contain...
>> >> >
>> >> > I don't think so; at least looking at mlx5 and bnxt (the latter
>> >> > doesn't have a timestamp in the completion ring).
>> >> > So yeah, not sure, maybe that should be on the side and be AF_XDP
>> specific.
>> >> > And not even involve bpf, just put the tx tstamp somewhere in umem:
>> >> > setsockopt(xsk_fd, SOL_XDP, XSK_STAMP_TX_COMPLETION,
>> >> > &data_relative_offset, ..);
>> >> > OTOH, if it is only a timestamp now, it doesn't mean that's all we'd
>> >> > have for eternity? (plus, this needs a driver "hook" for af_xdp
>> >> > anyway, so why not make it generic?)
>> >>
>> >> So since this is read-only in any case, we could just make it a
>> >> tracepoint instead of a whole new hook? That's what I was planning to
>> do
>> >> for xdp_return_frame(); we just need a way to refer back to the
>> original
>> >> frame, but we'd need to solve that anyway. Just letting XDP/AF_XDP
>> >> specify their own packet ID in some form, and make that part of the
>> >> tracepoint data, would probably be sufficient?
>> >
>> > That would probably mean a driver specific tracepoint (since we might
>> > need to get to the completion queue descriptor)?
>> > Idk, probably still makes sense to have something that works across
>> > different drivers?
>> > Or are you suggesting to just do fentry/mlx5e_free_xdpsq_desc and go
>> from there?
>> > Not sure I can get a umem frame, as you've mentioned; and also it
>> > doesn't look like cqe is there...
>
>> No, we'd define the tracepoint in one place (like along with the others
>> in include/xdp/trace/event.h), and have it include the fields we need
>> (ifindex, queue index, timestamp, some kind of packet ID). And then add
>> a call in each driver that will support this, wherever it makes sense in
>> that driver. This is what we do with the trace_xdp_exception() calls
>> already scattered through drivers in the RX handling code.
>
>> With this, a BPF listener can attach to just the one tracepoint, and get
>> events from all drivers that support it (but it can filter on ifindex
>> for the events it's interested in).
>
>> This probably doesn't scale indefinitely: it's possible to add new
>> fields to the tracepoint if we discover other uses than the timestamp,
>> but this would probably get unwieldy at some point. However, it's a
>> lightweight way to add a "hook" if we don't have any other use cases
>> than getting the timestamp for now.
>
>> > I guess the fact that it would arrive out-of-band (not in a umem
>> > frame) is a minor inconvenience, the userspace should be able to join
>> > the data together hopefully.
>
>> Yeah, that's why I suggested the user-defined packet ID: if the event
>> just includes that, the BPF or userspace program can do its own matching
>> depending on its needs...
>
> I guess we can implement this user-defined packet ID as a TX metadata
> as well? Some kind of u32 tag/mark that the XDP program can set? (and
> some new AF_XDP TX hook where we can set it up as well)
Yeah, good point, that would be an excellent way to resolve that
particular issue :)
-Toke
next prev parent reply other threads:[~2023-02-28 23:08 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-23 22:23 [LSF/MM/BPF TOPIC] XDP metadata for TX Stanislav Fomichev
2023-02-23 23:22 ` Toke Høiland-Jørgensen
2023-02-23 23:51 ` Stanislav Fomichev
2023-02-27 14:17 ` Toke Høiland-Jørgensen
2023-02-27 20:03 ` Stanislav Fomichev
2023-02-27 23:54 ` Toke Høiland-Jørgensen
2023-02-28 21:18 ` Stanislav Fomichev
2023-02-28 22:09 ` Toke Høiland-Jørgensen
2023-02-28 23:02 ` Stanislav Fomichev
2023-02-28 23:08 ` Toke Høiland-Jørgensen [this message]
2023-03-03 7:42 ` Magnus Karlsson
2023-03-03 12:37 ` Toke Høiland-Jørgensen
2023-03-03 17:16 ` Stanislav Fomichev
2023-03-07 19:32 ` Jesper Dangaard Brouer
2023-03-09 18:04 ` Stanislav Fomichev
2023-03-10 11:09 ` Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87lekhgzgr.fsf@toke.dk \
--to=toke@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=lsf-pc@lists.linux-foundation.org \
--cc=sdf@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox