From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Stanislav Fomichev" <sdf@google.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
"Jiri Olsa" <jolsa@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Björn Töpel" <bjorn@kernel.org>,
"Magnus Karlsson" <magnus.karlsson@intel.com>,
"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
"Jonathan Lemon" <jonathan.lemon@gmail.com>,
"Mykola Lysenko" <mykolal@fb.com>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
"Freysteinn Alfredsson" <freysteinn.alfredsson@kau.se>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities
Date: Mon, 18 Jul 2022 14:12:05 +0200
Message-ID: <87y1wqliwa.fsf@toke.dk>
In-Reply-To: <YtRSOaCtujBfzHUS@pop-os.localdomain>

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Wed, Jul 13, 2022 at 11:52:07PM +0200, Toke Høiland-Jørgensen wrote:
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Packet forwarding is an important use case for XDP, which offers
>> >> significant performance improvements compared to forwarding using the
>> >> regular networking stack. However, XDP currently offers no mechanism to
>> >> delay, queue or schedule packets, which limits the practical uses for
>> >> XDP-based forwarding to those where the capacity of input and output links
>> >> always match each other (i.e., no rate transitions or many-to-one
>> >> forwarding). It also prevents an XDP-based router from doing any kind of
>> >> traffic shaping or reordering to enforce policy.
>> >>
>> >> This series represents a first RFC of our attempt to remedy this lack. The
>> >> code in these patches is functional, but needs additional testing and
>> >> polishing before being considered for merging. I'm posting it here as an
>> >> RFC to get some early feedback on the API and overall design of the
>> >> feature.
>> >>
>> >> DESIGN
>> >>
>> >> The design consists of three components: A new map type for storing XDP
>> >> frames, a new 'dequeue' program type that will run in the TX softirq to
>> >> provide the stack with packets to transmit, and a set of helpers to dequeue
>> >> packets from the map, optionally drop them, and to schedule an interface
>> >> for transmission.
>> >>
>> >> The new map type is modelled on the PIFO data structure proposed in the
>> >> literature[0][1]. It represents a priority queue where packets can be
>> >> enqueued in any priority, but is always dequeued from the head. From the
>> >> XDP side, the map is simply used as a target for the bpf_redirect_map()
>> >> helper, where the target index is the desired priority.
>> >
>> > I have the same question I asked on the series from Cong:
>> > Any considerations for existing carousel/edt-like models?
>>
>> Well, the reason for the addition in patch 5 (continuously increasing
>> priorities) is exactly to be able to implement EDT-like behaviour, where
>> the priority is used as time units to clock out packets.
>
> Are you sure? I seriously doubt your patch can do this at all...
>
> Since your patch relies on bpf_map_push_elem(), which has no room for
> 'key' hence you reuse 'flags' but you also reserve 4 bits there... How
> could tstamp be packed with 4 reserved bits??

Well, my point was that the *data structure* itself supports 64-bit
priorities, and that's what we use from bpf_redirect_map() in XDP.
Reserving four bits was a somewhat arbitrary choice on my part. I
actually figured 60 bits would be plenty to represent timestamps, but I
guess I miscalculated a bit for nanosecond timestamps (60 bits only
gets you about 36 years of range there).

We could lower that to 2 reserved bits, which gets you a range of about
146 years using 62 bits; or users could just right-shift the value by a
couple of bits before putting it in the map (scheduling with
single-nanosecond precision is not possible anyway, so losing a few bits
of precision is no big deal); or we could add a new helper instead of
reusing the existing one.
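
A quick sketch of that arithmetic, in case it helps. This is not code
from the series: the function names ns_range_years() and
tstamp_to_rank() are made up for illustration, and a year is taken as
365 days.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumption: 365-day years, leap days ignored. */
#define SEC_PER_YEAR (365ULL * 24 * 60 * 60)

/* Whole years covered by a nanosecond timestamp that is 'bits' wide. */
static uint64_t ns_range_years(unsigned int bits)
{
	assert(bits >= 1 && bits <= 63);

	uint64_t max_ns = (1ULL << bits) - 1;

	return max_ns / NSEC_PER_SEC / SEC_PER_YEAR;
}

/*
 * Hypothetical helper: drop the 'shift' low-order bits of a nanosecond
 * timestamp so the result leaves the top 'shift' bits free for flags.
 * The rank then counts units of (1 << shift) nanoseconds.
 */
static uint64_t tstamp_to_rank(uint64_t tstamp_ns, unsigned int shift)
{
	return tstamp_ns >> shift;
}
```

With these, ns_range_years(60) comes out at 36 and ns_range_years(62)
at 146, and any 64-bit timestamp shifted right by 4 stays below
1ULL << 60, so it fits next to four reserved bits.
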
> Actually, if we look into the in-kernel EDT implementation
> (net/sched/sch_etf.c), it is also based on rbtree rather than PIFO.

The main reason I eschewed the existing rbtree code is that I don't
believe it's sufficiently performant, mainly due to the rebalancing.
This is a hunch, though, and as I mentioned in a different reply I'm
planning to go back and revisit the data structure, including
benchmarking different implementations against each other.

-Toke