netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Stanislav Fomichev" <sdf@google.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
	"Jonathan Lemon" <jonathan.lemon@gmail.com>,
	"Mykola Lysenko" <mykolal@fb.com>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	"Freysteinn Alfredsson" <freysteinn.alfredsson@kau.se>
Subject: Re: [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities
Date: Mon, 18 Jul 2022 14:12:05 +0200	[thread overview]
Message-ID: <87y1wqliwa.fsf@toke.dk> (raw)
In-Reply-To: <YtRSOaCtujBfzHUS@pop-os.localdomain>

Cong Wang <xiyou.wangcong@gmail.com> writes:

> On Wed, Jul 13, 2022 at 11:52:07PM +0200, Toke Høiland-Jørgensen wrote:
>> Stanislav Fomichev <sdf@google.com> writes:
>> 
>> > On Wed, Jul 13, 2022 at 4:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Packet forwarding is an important use case for XDP, which offers
>> >> significant performance improvements compared to forwarding using the
>> >> regular networking stack. However, XDP currently offers no mechanism to
>> >> delay, queue or schedule packets, which limits the practical uses for
>> >> XDP-based forwarding to those where the capacity of input and output links
>> >> always match each other (i.e., no rate transitions or many-to-one
>> >> forwarding). It also prevents an XDP-based router from doing any kind of
>> >> traffic shaping or reordering to enforce policy.
>> >>
>> >> This series represents a first RFC of our attempt to remedy this lack. The
>> >> code in these patches is functional, but needs additional testing and
>> >> polishing before being considered for merging. I'm posting it here as an
>> >> RFC to get some early feedback on the API and overall design of the
>> >> feature.
>> >>
>> >> DESIGN
>> >>
>> >> The design consists of three components: A new map type for storing XDP
>> >> frames, a new 'dequeue' program type that will run in the TX softirq to
>> >> provide the stack with packets to transmit, and a set of helpers to dequeue
>> >> packets from the map, optionally drop them, and to schedule an interface
>> >> for transmission.
>> >>
>> >> The new map type is modelled on the PIFO data structure proposed in the
>> >> literature[0][1]. It represents a priority queue where packets can be
>> >> enqueued in any priority, but is always dequeued from the head. From the
>> >> XDP side, the map is simply used as a target for the bpf_redirect_map()
>> >> helper, where the target index is the desired priority.
>> >
>> > I have the same question I asked on the series from Cong:
>> > Any considerations for existing carousel/edt-like models?
>> 
>> Well, the reason for the addition in patch 5 (continuously increasing
>> priorities) is exactly to be able to implement EDT-like behaviour, where
>> the priority is used as time units to clock out packets.
>
> Are you sure? I seriouly doubt your patch can do this at all...
>
> Since your patch relies on bpf_map_push_elem(), which has no room for
> 'key' hence you reuse 'flags' but you also reserve 4 bits there... How
> could tstamp be packed with 4 reserved bits??

Well, my point was that the *data structure* itself supports 64-bit
priorities, and that's what we use from bpf_map_redirect() in XDP. The
choice of reserving four bits was a bit of an arbitrary choice on my
part. I actually figured 60 bits would be plenty to represent timestamps
in themselves, but I guess I miscalculated a bit for nanosecond
timestamps (60 bits only gets you 36 years of range there).

We could lower that to 2 reserved bits, which gets you a range of 146
years using 62 bits; or users could just right-shift the value by a
couple of bits before putting them in the map (scheduling with
single-nanosecond precision is not possible anyway, so losing a few bits
of precision is no big deal); or we could add a new helper instead of
reusing the existing one.

> Actually, if we look into the in-kernel EDT implementation
> (net/sched/sch_etf.c), it is also based on rbtree rather than PIFO.

The main reason I eschewed the existing rbtree code is that I don't
believe it's sufficiently performant, mainly due to the rebalancing.
This is a hunch, though, and as I mentioned in a different reply I'm
planning to go back and revisit the data structure, including
benchmarking different implementations against each other.

-Toke


  parent reply	other threads:[~2022-07-18 12:12 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-13 11:14 [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 01/17] dev: Move received_rps counter next to RPS members in softnet data Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 02/17] bpf: Expand map key argument of bpf_redirect_map to u64 Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 03/17] bpf: Use 64-bit return value for bpf_prog_run Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 04/17] bpf: Add a PIFO priority queue map type Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 05/17] pifomap: Add queue rotation for continuously increasing rank mode Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 06/17] xdp: Add dequeue program type for getting packets from a PIFO Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 07/17] bpf: Teach the verifier about referenced packets returned from dequeue programs Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 08/17] bpf: Add helpers to dequeue from a PIFO map Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 09/17] bpf: Introduce pkt_uid member for PTR_TO_PACKET Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 10/17] bpf: Implement direct packet access in dequeue progs Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 11/17] dev: Add XDP dequeue hook Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 12/17] bpf: Add helper to schedule an interface for TX dequeue Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 13/17] libbpf: Add support for dequeue program type and PIFO map type Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 14/17] libbpf: Add support for querying dequeue programs Toke Høiland-Jørgensen
2022-07-14  5:36   ` Andrii Nakryiko
2022-07-14 10:13     ` Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 15/17] selftests/bpf: Add verifier tests for dequeue prog Toke Høiland-Jørgensen
2022-07-14  5:38   ` Andrii Nakryiko
2022-07-14  6:45     ` Kumar Kartikeya Dwivedi
2022-07-14 18:54       ` Andrii Nakryiko
2022-07-15 11:11         ` Kumar Kartikeya Dwivedi
2022-07-13 11:14 ` [RFC PATCH 16/17] selftests/bpf: Add test for XDP queueing through PIFO maps Toke Høiland-Jørgensen
2022-07-14  5:41   ` Andrii Nakryiko
2022-07-14 10:18     ` Toke Høiland-Jørgensen
2022-07-13 11:14 ` [RFC PATCH 17/17] samples/bpf: Add queueing support to xdp_fwd sample Toke Høiland-Jørgensen
2022-07-13 18:36 ` [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities Stanislav Fomichev
2022-07-13 21:52   ` Toke Høiland-Jørgensen
2022-07-13 22:56     ` Stanislav Fomichev
2022-07-14 10:46       ` Toke Høiland-Jørgensen
2022-07-14 17:24         ` Stanislav Fomichev
2022-07-15  1:12         ` Alexei Starovoitov
2022-07-15 12:55           ` Toke Høiland-Jørgensen
2022-07-17 19:12         ` Cong Wang
2022-07-18 12:25           ` Toke Høiland-Jørgensen
2022-07-14  6:34     ` Kumar Kartikeya Dwivedi
2022-07-17 18:17     ` Cong Wang
2022-07-17 18:41       ` Kumar Kartikeya Dwivedi
2022-07-17 19:23         ` Cong Wang
2022-07-18 12:12       ` Toke Høiland-Jørgensen [this message]
2022-07-14 14:05 ` Jamal Hadi Salim
2022-07-14 14:56   ` Dave Taht
2022-07-14 15:33     ` Jamal Hadi Salim
2022-07-14 16:21   ` Toke Høiland-Jørgensen
2022-07-17 17:46 ` Cong Wang
2022-07-18 12:45   ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y1wqliwa.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=freysteinn.alfredsson@kau.se \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=jonathan.lemon@gmail.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=mykolal@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).