From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Stanislav Fomichev" <sdf@google.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
"Jiri Olsa" <jolsa@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Björn Töpel" <bjorn@kernel.org>,
"Magnus Karlsson" <magnus.karlsson@intel.com>,
"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
"Jonathan Lemon" <jonathan.lemon@gmail.com>,
"Mykola Lysenko" <mykolal@fb.com>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
"Freysteinn Alfredsson" <freysteinn.alfredsson@kau.se>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities
Date: Mon, 18 Jul 2022 14:25:46 +0200 [thread overview]
Message-ID: <87v8ruli9h.fsf@toke.dk> (raw)
In-Reply-To: <YtRfG5YNtNHZXOUc@pop-os.localdomain>
Cong Wang <xiyou.wangcong@gmail.com> writes:
> On Thu, Jul 14, 2022 at 12:46:54PM +0200, Toke Høiland-Jørgensen wrote:
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Wed, Jul 13, 2022 at 2:52 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Stanislav Fomichev <sdf@google.com> writes:
>> >>
>> >> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Packet forwarding is an important use case for XDP, which offers
>> >> >> significant performance improvements compared to forwarding using the
>> >> >> regular networking stack. However, XDP currently offers no mechanism to
>> >> >> delay, queue or schedule packets, which limits the practical uses for
>> >> >> XDP-based forwarding to those where the capacity of input and output links
>> >> >> always match each other (i.e., no rate transitions or many-to-one
>> >> >> forwarding). It also prevents an XDP-based router from doing any kind of
>> >> >> traffic shaping or reordering to enforce policy.
>> >> >>
>> >> >> This series represents a first RFC of our attempt to remedy this lack. The
>> >> >> code in these patches is functional, but needs additional testing and
>> >> >> polishing before being considered for merging. I'm posting it here as an
>> >> >> RFC to get some early feedback on the API and overall design of the
>> >> >> feature.
>> >> >>
>> >> >> DESIGN
>> >> >>
>> >> >> The design consists of three components: A new map type for storing XDP
>> >> >> frames, a new 'dequeue' program type that will run in the TX softirq to
>> >> >> provide the stack with packets to transmit, and a set of helpers to dequeue
>> >> >> packets from the map, optionally drop them, and to schedule an interface
>> >> >> for transmission.
>> >> >>
>> >> >> The new map type is modelled on the PIFO data structure proposed in the
>> >> >> literature[0][1]. It represents a priority queue where packets can be
>> >> >> enqueued in any priority, but is always dequeued from the head. From the
>> >> >> XDP side, the map is simply used as a target for the bpf_redirect_map()
>> >> >> helper, where the target index is the desired priority.
>> >> >
>> >> > I have the same question I asked on the series from Cong:
>> >> > Any considerations for existing carousel/edt-like models?
>> >>
>> >> Well, the reason for the addition in patch 5 (continuously increasing
>> >> priorities) is exactly to be able to implement EDT-like behaviour, where
>> >> the priority is used as time units to clock out packets.
>> >
>> > Ah, ok, I didn't read the patches closely enough. I saw some limits
>> > for the ranges and assumed that it wasn't capable of efficiently
>> > storing 64-bit timestamps..
>>
>> The goal is definitely to support full 64-bit priorities. Right now you
>> have to start out at 0 but can go on for a full 64 bits, but that's a
>> bit of an API wart that I'd like to get rid of eventually...
>>
>> >> > Can we make the map flexible enough to implement different qdisc
>> >> > policies?
>> >>
>> >> That's one of the things we want to be absolutely sure about. We are
>> >> starting out with the PIFO map type because the literature makes a good
>> >> case that it is flexible enough to implement all conceivable policies.
>> >> The goal of the test harness linked as note [4] is to actually examine
>> >> this; Frey is our PhD student working on this bit.
>> >>
>> >> Thus far we haven't hit any limitations on this, but we'll need to add
>> >> more policies before we are done with this. Another consideration is
>> >> performance, of course, so we're also planning to do a comparison with a
>> >> more traditional "bunch of FIFO queues" type data structure for at least
>> >> a subset of the algorithms. Kartikeya also had an idea for an
>> >> alternative way to implement a priority queue using (semi-)lockless
>> >> skiplists, which may turn out to perform better.
>> >>
>> >> If there's any particular policy/algorithm you'd like to see included in
>> >> this evaluation, please do let us know, BTW! :)
>> >
>> > I honestly am not sure what the bar for accepting this should be. But
>> > on the Cong's series I mentioned Martin's CC bpf work as a great
>> > example of what we should be trying to do for qdisc-like maps. Having
>> > a bpf version of fq/fq_codel/whatever_other_complex_qdisc might be
>> > very convincing :-)
>>
>> Just doing flow queueing is quite straight forward with PIFOs. We're
>> working on fq_codel. Personally I also want to implement something that
>> has feature parity with sch_cake (which includes every feature and the
>> kitchen sink already) :)
>
> And how exactly would you plan to implement Least Slack Time First with
> PIFOs? See https://www.usenix.org/system/files/nsdi20-paper-sharma.pdf.
> Can you be as specific as possible ideally with a pesudo code?
By sticking flows into the PIFO instead of individual packets.
Basically:
enqueue:
flow_id = hash_pkt(pkt);
flow_pifo = &flows[flow_id];
pifo_enqueue(flow_pifo, pkt, 0); // always enqueue at rank 0, so effectively a FIFO
pifo_enqueue(toplevel_pifo, flow_id, compute_rank(flow_id));
dequeue:
flow_id = pifo_dequeue(toplevel_pifo);
flow_pifo = &flows[flow_id];
pkt = pifo_dequeue(flow_pifo);
pifo_enqueue(toplevel_pifo, flow_id, compute_rank(flow_id)); // re-enqueue
return pkt;
We have not gotten around to doing a full implementation of this yet,
but SRPT/LSTF is on our list of algorithms to add :)
> BTW, this is very easy to do with my approach as no FO limitations.
How does being able to dequeue out-of-order actually help with this
particular scheme? On dequeue you still process things in priority
order?
-Toke
next prev parent reply other threads:[~2022-07-18 12:25 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-13 11:14 [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 01/17] dev: Move received_rps counter next to RPS members in softnet data Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 02/17] bpf: Expand map key argument of bpf_redirect_map to u64 Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 03/17] bpf: Use 64-bit return value for bpf_prog_run Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 04/17] bpf: Add a PIFO priority queue map type Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 05/17] pifomap: Add queue rotation for continuously increasing rank mode Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 06/17] xdp: Add dequeue program type for getting packets from a PIFO Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 07/17] bpf: Teach the verifier about referenced packets returned from dequeue programs Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 08/17] bpf: Add helpers to dequeue from a PIFO map Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 09/17] bpf: Introduce pkt_uid member for PTR_TO_PACKET Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 10/17] bpf: Implement direct packet access in dequeue progs Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 11/17] dev: Add XDP dequeue hook Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 12/17] bpf: Add helper to schedule an interface for TX dequeue Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 13/17] libbpf: Add support for dequeue program type and PIFO map type Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 14/17] libbpf: Add support for querying dequeue programs Toke Høiland-Jørgensen
2022-07-14 5:36 ` Andrii Nakryiko
2022-07-14 10:13 ` Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 15/17] selftests/bpf: Add verifier tests for dequeue prog Toke Høiland-Jørgensen
2022-07-14 5:38 ` Andrii Nakryiko
2022-07-14 6:45 ` Kumar Kartikeya Dwivedi
2022-07-14 18:54 ` Andrii Nakryiko
2022-07-15 11:11 ` Kumar Kartikeya Dwivedi
2022-07-13 11:14 ` [RFC PATCH 16/17] selftests/bpf: Add test for XDP queueing through PIFO maps Toke Høiland-Jørgensen
2022-07-14 5:41 ` Andrii Nakryiko
2022-07-14 10:18 ` Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 17/17] samples/bpf: Add queueing support to xdp_fwd sample Toke Høiland-Jørgensen
2022-07-13 18:36 ` [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities Stanislav Fomichev
2022-07-13 21:52 ` Toke Høiland-Jørgensen
2022-07-13 22:56 ` Stanislav Fomichev
2022-07-14 10:46 ` Toke Høiland-Jørgensen
2022-07-14 17:24 ` Stanislav Fomichev
2022-07-15 1:12 ` Alexei Starovoitov
2022-07-15 12:55 ` Toke Høiland-Jørgensen
2022-07-17 19:12 ` Cong Wang
2022-07-18 12:25 ` Toke Høiland-Jørgensen [this message]
2022-07-14 6:34 ` Kumar Kartikeya Dwivedi
2022-07-17 18:17 ` Cong Wang
2022-07-17 18:41 ` Kumar Kartikeya Dwivedi
2022-07-17 19:23 ` Cong Wang
2022-07-18 12:12 ` Toke Høiland-Jørgensen
2022-07-14 14:05 ` Jamal Hadi Salim
2022-07-14 14:56 ` Dave Taht
2022-07-14 15:33 ` Jamal Hadi Salim
2022-07-14 16:21 ` Toke Høiland-Jørgensen
2022-07-17 17:46 ` Cong Wang
2022-07-18 12:45 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v8ruli9h.fsf@toke.dk \
--to=toke@redhat.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=freysteinn.alfredsson@kau.se \
--cc=haoluo@google.com \
--cc=hawk@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=jonathan.lemon@gmail.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=mykolal@fb.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=xiyou.wangcong@gmail.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).