From: Barret Rhoden <brho@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org, mingo@redhat.com,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org,
joshdon@google.com, pjt@google.com, derkling@google.com,
haoluo@google.com, dvernet@meta.com, dschatzberg@meta.com,
dskarlat@cs.cmu.edu, riel@surriel.com,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCHSET v4] sched: Implement BPF extensible scheduler class
Date: Mon, 24 Jul 2023 11:11:10 -0400
Message-ID: <2bef1006-3798-3fbe-87ad-4adfaee08cc0@google.com>
In-Reply-To: <ZLrQdTvzbmi5XFeq@slm.duckdns.org>
Hi -
On 7/21/23 14:37, Tejun Heo wrote:
> Hello,
>
> It's been more than half a year since the initial posting of the patchset
> and we are now at the fourth iteration. There have been some reviews around
> specifics (should be all addressed now except for the ones Andrea raised on
> this iteration) but none at high level. There also were some in-person and
> off-list discussions. Some, I believe, are addressed by the cover letter but
> it'd be nonetheless useful to delve into them on-list.
>
> On our side, we've been diligently experimenting.
On the Google side, we're still experimenting and developing schedulers
based on ghost, which we think we can port over to sched_ext.
Specifically, I've been working on a framework called 'Flux' for writing
multicore schedulers in BPF. The idea in brief is to compose a
scheduler as a hierarchy of "subschedulers", where CPU allocations go
up and down the tree.
Flux is open source, but it currently needs the ghost kernel and our BPF
extensions (which are also open source, but harder for people to
use). I'll send a proposal to talk about it at LPC in case people
are interested - if not in the scheduler framework itself, then as a
"this is some crazy stuff people can do with BPF" talk.
As far as results go, I wrote a custom scheduler with Flux for our
Search app and have been testing it on our single-leaf loadtester. The
initial results out of the box were pretty great: 17% QPS increase, 43%
p99 decrease (default settings for the loadtester). But the loadtester
varies a bit, so it's hard to get reliable numbers out of it for an A/B
comparison of schedulers. Overall, we run equal to or better than CFS. I
did a sweep across various offered loads, and we got 5% better QPS on
average, 30% better p99 latency, and 6% lower utilization. The better
numbers come under higher load, as you'd expect, when there are more
threads competing for the CPU.
The big caveat to those numbers is that the single-leaf loadtester isn't a
representative test. It's more of a microbenchmark. Our next step is
to run a full cluster load test, which will give us a better signal.
Anyway, this scheduler is highly specific to our app, including shared
memory regions where the app's threads can tell us stuff like RPC
deadlines. It's the sort of thing you could only reasonably do with a
pluggable scheduler like sched_ext or ghost.
> We are comfortable with the current API. Everything we tried fit pretty
> well. It will continue to evolve but sched_ext now seems mature enough for
> initial inclusion. I suppose lack of response doesn't indicate tacit
> agreement from everyone, so what are you guys all thinking?
Btw, I backported your patchset to our "franken-kernel". I was able to
boot it on one of our nodes, and run the search loadtest on CFS.
Nothing broke, performance was the same, etc. Not a huge surprise,
since I didn't turn on sched_ext. I haven't been able to get a
sched_ext scheduler to work yet with our kernel - there's more patch
backporting needed for your schedulers themselves (the bpf_for iterators
and whatnot). I'll report back if/when I can get it running.
Thanks,
Barret