From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Stanislav Fomichev <sdf@google.com>
Cc: lsf-pc@lists.linux-foundation.org, bpf@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] XDP metadata for TX
Date: Wed, 01 Mar 2023 00:08:36 +0100 [thread overview]
Message-ID: <87lekhgzgr.fsf@toke.dk> (raw)
In-Reply-To: <Y/6IBI8hlyI3DcUz@google.com>
Stanislav Fomichev <sdf@google.com> writes:
> On 02/28, Toke H�iland-J�rgensen wrote:
>> Stanislav Fomichev <sdf@google.com> writes:
>
>> > On Mon, Feb 27, 2023 at 3:54 PM Toke H�iland-J�rgensen
>> <toke@kernel.org> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > On Mon, Feb 27, 2023 at 6:17 AM Toke H�iland-J�rgensen
>> <toke@kernel.org> wrote:
>> >> >>
>> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >>
>> >> >> > On Thu, Feb 23, 2023 at 3:22 PM Toke H�iland-J�rgensen
>> <toke@kernel.org> wrote:
>> >> >> >>
>> >> >> >> Stanislav Fomichev <sdf@google.com> writes:
>> >> >> >>
>> >> >> >> > I'd like to discuss a potential follow up for the
>> previous "XDP RX
>> >> >> >> > metadata" series [0].
>> >> >> >> >
>> >> >> >> > Now that we can access (a subset of) packet metadata at RX,
>> I'd like to
>> >> >> >> > explore the options where we can export some of that metadata
>> on TX. And
>> >> >> >> > also whether it might be possible to access some of the TX
>> completion
>> >> >> >> > metadata (things like TX timestamp).
>> >> >> >> >
>> >> >> >> > I'm currently trying to understand whether the same approach
>> I've used
>> >> >> >> > on RX could work at TX. By May I plan to have a bunch of
>> options laid
>> >> >> >> > out (currently considering XSK tx/compl programs and XDP
>> tx/compl
>> >> >> >> > programs) so we have something to discuss.
>> >> >> >>
>> >> >> >> I've been looking at ways of getting a TX-completion hook for
>> the XDP
>> >> >> >> queueing stuff as well. For that, I think it could work to just
>> hook
>> >> >> >> into xdp_return_frame(), but if you want to access hardware
>> metadata
>> >> >> >> it'll obviously have to be in the driver. A hook in the driver
>> could
>> >> >> >> certainly be used for the queueing return as well, though, which
>> may
>> >> >> >> help making it worth the trouble :)
>> >> >> >
>> >> >> > Yeah, I'd like to get to completion descriptors ideally; so
>> nothing
>> >> >> > better than a driver hook comes to mind so far :-(
>> >> >> > (I'm eye-balling mlx5's mlx5e_free_xdpsq_desc AF_XDP path mostly
>> so far).
>> >> >>
>> >> >> Is there any other use case for this than getting the TX timestamp?
>> Not
>> >> >> really sure what else those descriptors contain...
>> >> >
>> >> > I don't think so; at least looking at mlx5 and bnxt (the latter
>> >> > doesn't have a timestamp in the completion ring).
>> >> > So yeah, not sure, maybe that should be on the side and be AF_XDP
>> specific.
>> >> > And not even involve bpf, just put the tx tstamp somewhere in umem:
>> >> > setsockopt(xsk_fd, SOL_XDP, XSK_STAMP_TX_COMPLETION,
>> >> > &data_relative_offset, ..);
>> >> > OTOH, if it is only a timestamp now, it doesn't mean that's all we'd
>> >> > have for eternity? (plus, this needs a driver "hook" for af_xdp
>> >> > anyway, so why not make it generic?)
>> >>
>> >> So since this is read-only in any case, we could just make it a
>> >> tracepoint instead of a whole new hook? That's what I was planning to
>> do
>> >> for xdp_return_frame(); we just need a way to refer back to the
>> original
>> >> frame, but we'd need to solve that anyway. Just letting XDP/AF_XDP
>> >> specify their own packet ID in some form, and make that part of the
>> >> tracepoint data, would probably be sufficient?
>> >
>> > That would probably mean a driver specific tracepoint (since we might
>> > need to get to the completion queue descriptor)?
>> > Idk, probably still makes sense to have something that works across
>> > different drivers?
>> > Or are you suggesting to just do fentry/mlx5e_free_xdpsq_desc and go
>> from there?
>> > Not sure I can get a umem frame, as you've mentioned; and also it
>> > doesn't look like cqe is there...
>
>> No, we'd define the tracepoint in one place (like along with the others
>> in include/xdp/trace/event.h), and have it include the fields we need
>> (ifindex, queue index, timestamp, some kind of packet ID). And then add
>> a call in each driver that will support this, wherever it makes sense in
>> that driver. This is what we do with the trace_xdp_exception() calls
>> already scattered through drivers in the RX handling code.
>
>> With this, a BPF listener can attach to just the one tracepoint, and get
>> events from all drivers that support it (but it can filter on ifindex
>> for the events it's interested in).
>
>> This probably doesn't scale indefinitely: it's possible to add new
>> fields to the tracepoint if we discover other uses than the timestamp,
>> but this would probably get unwieldy at some point. However, it's a
>> lightweight way to add a "hook" if we don't have any other use cases
>> than getting the timestamp for now.
>
>> > I guess the fact that it would arrive out-of-band (not in a umem
>> > frame) is a minor inconvenience, the userspace should be able to join
>> > the data together hopefully.
>
>> Yeah, that's why I suggested the user-defined packet ID: if the event
>> just includes that, the BPF or userspace program can do its own matching
>> depending on its needs...
>
> I guess we can implement this user-defined packet ID as a TX metadata
> as well? Some kind of u32 tag/mark that the XDP program can set? (and
> some new AF_XDP TX hook where we can set it up as well)
Yeah, good point, that would be an excellent way to resolve that
particular issue :)
-Toke
next prev parent reply other threads:[~2023-02-28 23:08 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-23 22:23 [LSF/MM/BPF TOPIC] XDP metadata for TX Stanislav Fomichev
2023-02-23 23:22 ` Toke Høiland-Jørgensen
2023-02-23 23:51 ` Stanislav Fomichev
2023-02-27 14:17 ` Toke Høiland-Jørgensen
2023-02-27 20:03 ` Stanislav Fomichev
2023-02-27 23:54 ` Toke Høiland-Jørgensen
2023-02-28 21:18 ` Stanislav Fomichev
2023-02-28 22:09 ` Toke Høiland-Jørgensen
2023-02-28 23:02 ` Stanislav Fomichev
2023-02-28 23:08 ` Toke Høiland-Jørgensen [this message]
2023-03-03 7:42 ` Magnus Karlsson
2023-03-03 12:37 ` Toke Høiland-Jørgensen
2023-03-03 17:16 ` Stanislav Fomichev
2023-03-07 19:32 ` Jesper Dangaard Brouer
2023-03-09 18:04 ` Stanislav Fomichev
2023-03-10 11:09 ` Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87lekhgzgr.fsf@toke.dk \
--to=toke@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=lsf-pc@lists.linux-foundation.org \
--cc=sdf@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.