From: David Vernet <void@manifault.com>
To: Qais Yousef <qyousef@layalina.io>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
torvalds@linux-foundation.org, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com, ast@kernel.org,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org,
joshdon@google.com, brho@google.com, pjt@google.com,
derkling@google.com, haoluo@google.com, dvernet@meta.com,
dschatzberg@meta.com, dskarlat@cs.cmu.edu, riel@surriel.com,
changwoo@igalia.com, himadrics@inria.fr, memxor@gmail.com,
andrea.righi@canonical.com, joel@joelfernandes.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCHSET v6] sched: Implement BPF extensible scheduler class
Date: Tue, 14 May 2024 16:34:02 -0500 [thread overview]
Message-ID: <20240514213402.GB295811@maniforge> (raw)
In-Reply-To: <20240514000715.4765jfpwi5ovlizj@airbuntu>
[-- Attachment #1: Type: text/plain, Size: 5954 bytes --]
On Tue, May 14, 2024 at 01:07:15AM +0100, Qais Yousef wrote:
[...]
> > >
> > > How does this BPF muck translate into better quality patches for me?
> >
> > Here's how we will be using it (we will likely be porting sched_ext to
> > ChromeOS regardless of its acceptance).
> >
> > Doing testing of scheduler changes in the field is extremely time
> > consuming and complex. We tested EEVDF vs CFS by backporting EEVDF to
> > 5.15 (as that is the kernel version we are using on the chromebooks we
> > were testing on), and then we need to add a user space "switch" to
> > change the scheduler. Note, this also risks causing a bug in adding
> > these changes. Then we push the kernel out, and then start our
> > experiment that enables our feature to a small percentage, and slowly
> > increases the number of users until we have a enough for a statistical
> > result.
> >
> > What sched_ext would give us is a easy way to try different scheduling
> > algorithms and get feedback much quicker. Once we determine a solution
> > that improves things, we would then spend the time to implement it in
> > the scheduler, and yes, send it upstream.
> >
> > To me, sched_ext should never be the final solution, but it can be
> > extremely useful in testing various changes quickly in the field. Which
> > to me would encourage more contributions.
Hello Qais,
[...]
> I really don't buy the rapid development aspect too. The scheduler was heavily
There are already several examples from users who have shown that the rapid
development and experimentation is extremely useful. Imagine if you're
iterating on the scheduler to improve p99 frame rates on the Steam Deck, as
Changwoo described. It's much more efficient to be able to just tweak and load
a BPF scheduler (that is safe and can't crash the machine) to try some random
idea out than it is to:
1. Tweak and recompile the kernel
2. Reinstall the kernel on the Steam Deck
3. Reboot the Steam Deck
4. Reload a game and let caches rewarm
5. Measure FPS
You're talking about a 5 second compile job + 1 second to reload a safe BPF
scheduler vs. having to do all of the above steps _and_ potentially making a
mistake that brings the machine down. These benefits are also extremely useful
for testing workloads on production servers, etc. Let’s also not forget that
unlike many other kernel features, you probably can’t get reliable scheduling
results from running in a VM. The experimentation overhead is very real.
[...]
> influenced by the early contributors which come from server market that had
> (few) very specific workloads they needed to optimize for and throughput had
> a heavier weight vs latency. Fast forward to now, things are different. Even on
> server market latency/responsiveness has become more important. Power and
> thermal are important on a larger class of systems now too. I'd dare say even
> on server market. How do you know when it's okay for an app/task to consume too
> much power and when it is not? Hint hint, you can't unless someone in userspace
> tells you. Similarly for latency vs throughput. What is the correct way to
> write an application to provide this info? Then we can ask what is missing in
> the scheduler to enable this.
Hmm, you seem to be arguing that the way forward here is to have our one
general purpose scheduler be entirely driven by user space hinting. Assuming
I’m not misunderstanding you, I strongly disagree with this sentiment. User
space hinting can be powerful, but I think we need to have a general purpose
scheduler that's completely agnostic to whatever is running in user space.
We’ve also been able to get strong results from sched_ext schedulers that don’t
use any user space hinting.
Also, even if this ended up being the way forward, I don’t see it being
practical to implement. Wouldn’t it require us to update all of user space
globally just to update how it interfaces with the scheduler?
[...]
> Note the original min/wakeup_granularity_ns, latency_ns etc were tuned by
> default for throughput by the way (server market bias). You can manipulate
> those and get better latencies.
Those knobs aren't available anymore in EEVDF.
[...]
> point IMO, not the scheduler algorithm. If the latter need to change, it needs
> to be as the result of this friction - which what EEVDF came about from to my
> understanding. To enable implementing a latency interface easier. But Vincent
> had a working implementation with CFS too which I think would have worked fine
> by the way.
This friction is nothing new. It's why we already find ourselves in the
unfortunate position of having a large corpus of out of tree scheduler patches.
If there is a lot of performance being left on the table, vendors are going to
find a way to get that performance. Corporations don't need our consent to ship
kernels with custom schedulers on their devices. They've already been doing it
for years, and it's ultimately the users who suffer.
I genuinely believe that the fair.c scheduler will benefit from being able to
apply ideas conceived in a sched_ext scheduler which end up working well for
general use cases. For example, in scx_rusty, we’re able to get very good
interactivity [0] by determining a task’s deadline as a function of its average
runtime (along with some other great ideas that Changwoo first added to
scx_lavd) rather than from its eligibility + slice as with what EEVDF does.
Over the course of a day or two, I tried way more ideas that didn’t work than
would have been possible in that time frame than with a recompile-reboot cycle,
and ended up finding one that seems to work very well. It would be awesome if
these ideas were added to EEVDF so that everyone can benefit.
[0]: https://drive.google.com/file/d/1fyHt9BYGha6apl7HAkibwpy52UTi8-AQ/view?usp=drive_link
Thanks,
David
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2024-05-14 21:34 UTC|newest]
Thread overview: 142+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-01 15:09 [PATCHSET v6] sched: Implement BPF extensible scheduler class Tejun Heo
2024-05-01 15:09 ` [PATCH 01/39] cgroup: Implement cgroup_show_cftypes() Tejun Heo
2024-05-01 15:09 ` [PATCH 02/39] sched: Restructure sched_class order sanity checks in sched_init() Tejun Heo
2024-05-01 15:09 ` [PATCH 03/39] sched: Allow sched_cgroup_fork() to fail and introduce sched_cancel_fork() Tejun Heo
2024-05-01 15:09 ` [PATCH 04/39] sched: Add sched_class->reweight_task() Tejun Heo
2024-06-24 10:23 ` Peter Zijlstra
2024-06-24 10:31 ` Peter Zijlstra
2024-06-24 23:59 ` Tejun Heo
2024-06-25 7:29 ` Peter Zijlstra
2024-06-25 23:57 ` Tejun Heo
2024-06-26 1:29 ` [PATCH sched/urgent] sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks Tejun Heo
2024-06-26 2:19 ` [PATCH sched_ext/for-6.11] sched_ext: Account for idle policy when setting p->scx.weight in scx_ops_enable_task() Tejun Heo
2024-07-08 19:29 ` [PATCH v2 " Tejun Heo
2024-07-08 15:07 ` [tip: sched/core] sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks tip-bot2 for Tejun Heo
2024-05-01 15:09 ` [PATCH 05/39] sched: Add sched_class->switching_to() and expose check_class_changing/changed() Tejun Heo
2024-06-24 11:06 ` Peter Zijlstra
2024-06-24 22:18 ` Tejun Heo
2024-06-25 8:16 ` Peter Zijlstra
2024-05-01 15:09 ` [PATCH 06/39] sched: Factor out cgroup weight conversion functions Tejun Heo
2024-05-01 15:09 ` [PATCH 07/39] sched: Expose css_tg() and __setscheduler_prio() Tejun Heo
2024-06-24 11:19 ` Peter Zijlstra
2024-06-24 18:56 ` Tejun Heo
2024-05-01 15:09 ` [PATCH 08/39] sched: Enumerate CPU cgroup file types Tejun Heo
2024-05-01 15:09 ` [PATCH 09/39] sched: Add @reason to sched_class->rq_{on|off}line() Tejun Heo
2024-06-24 11:32 ` Peter Zijlstra
2024-06-24 21:18 ` Tejun Heo
2024-06-25 8:29 ` Peter Zijlstra
2024-06-25 23:41 ` Tejun Heo
2024-06-26 8:23 ` Peter Zijlstra
2024-06-26 18:01 ` Tejun Heo
2024-06-27 1:27 ` [PATCH sched_ext/for-6.11] sched_ext: Disallow loading BPF scheduler if isolcpus= domain isolation is in effect Tejun Heo
2024-07-08 19:30 ` Tejun Heo
2024-05-01 15:09 ` [PATCH 10/39] sched: Factor out update_other_load_avgs() from __update_blocked_others() Tejun Heo
2024-06-24 12:35 ` Peter Zijlstra
2024-06-24 16:15 ` Vincent Guittot
2024-06-24 19:24 ` Tejun Heo
2024-06-25 9:13 ` Vincent Guittot
2024-06-26 20:49 ` Tejun Heo
2024-05-01 15:09 ` [PATCH 11/39] cpufreq_schedutil: Refactor sugov_cpu_is_busy() Tejun Heo
2024-05-01 15:09 ` [PATCH 12/39] sched: Add normal_policy() Tejun Heo
2024-05-01 15:09 ` [PATCH 13/39] sched_ext: Add boilerplate for extensible scheduler class Tejun Heo
2024-05-01 15:09 ` [PATCH 14/39] sched_ext: Implement BPF " Tejun Heo
2024-05-01 15:09 ` [PATCH 15/39] sched_ext: Add scx_simple and scx_example_qmap example schedulers Tejun Heo
2024-05-01 15:09 ` [PATCH 16/39] sched_ext: Add sysrq-S which disables the BPF scheduler Tejun Heo
2024-05-01 15:09 ` [PATCH 17/39] sched_ext: Implement runnable task stall watchdog Tejun Heo
2024-05-01 15:09 ` [PATCH 18/39] sched_ext: Allow BPF schedulers to disallow specific tasks from joining SCHED_EXT Tejun Heo
2024-06-24 12:40 ` Peter Zijlstra
2024-06-24 19:06 ` Tejun Heo
2024-05-01 15:09 ` [PATCH 19/39] sched_ext: Print sched_ext info when dumping stack Tejun Heo
2024-06-24 12:46 ` Peter Zijlstra
2024-06-24 14:25 ` Linus Torvalds
2024-06-24 18:34 ` Tejun Heo
2024-05-01 15:09 ` [PATCH 20/39] sched_ext: Print debug dump after an error exit Tejun Heo
2024-05-01 15:09 ` [PATCH 21/39] tools/sched_ext: Add scx_show_state.py Tejun Heo
2024-05-01 15:09 ` [PATCH 22/39] sched_ext: Implement scx_bpf_kick_cpu() and task preemption support Tejun Heo
2024-05-01 15:09 ` [PATCH 23/39] sched_ext: Add a central scheduler which makes all scheduling decisions on one CPU Tejun Heo
2024-05-01 15:09 ` [PATCH 24/39] sched_ext: Make watchdog handle ops.dispatch() looping stall Tejun Heo
2024-05-01 15:10 ` [PATCH 25/39] sched_ext: Add task state tracking operations Tejun Heo
2024-05-01 15:10 ` [PATCH 26/39] sched_ext: Implement tickless support Tejun Heo
2024-05-01 15:10 ` [PATCH 27/39] sched_ext: Track tasks that are subjects of the in-flight SCX operation Tejun Heo
2024-05-01 15:10 ` [PATCH 28/39] sched_ext: Add cgroup support Tejun Heo
2024-05-01 15:10 ` [PATCH 29/39] sched_ext: Add a cgroup scheduler which uses flattened hierarchy Tejun Heo
2024-05-01 15:10 ` [PATCH 30/39] sched_ext: Implement SCX_KICK_WAIT Tejun Heo
2024-05-01 15:10 ` [PATCH 31/39] sched_ext: Implement sched_ext_ops.cpu_acquire/release() Tejun Heo
2024-05-01 15:10 ` [PATCH 32/39] sched_ext: Implement sched_ext_ops.cpu_online/offline() Tejun Heo
2024-05-01 15:10 ` [PATCH 33/39] sched_ext: Bypass BPF scheduler while PM events are in progress Tejun Heo
2024-05-01 15:10 ` [PATCH 34/39] sched_ext: Implement core-sched support Tejun Heo
2024-05-01 15:10 ` [PATCH 35/39] sched_ext: Add vtime-ordered priority queue to dispatch_q's Tejun Heo
2024-05-01 15:10 ` [PATCH 36/39] sched_ext: Implement DSQ iterator Tejun Heo
2024-05-01 15:10 ` [PATCH 37/39] sched_ext: Add cpuperf support Tejun Heo
2024-05-01 15:10 ` [PATCH 38/39] sched_ext: Documentation: scheduler: Document extensible scheduler class Tejun Heo
2024-05-02 2:24 ` Bagas Sanjaya
2024-05-01 15:10 ` [PATCH 39/39] sched_ext: Add selftests Tejun Heo
2024-05-02 8:48 ` [PATCHSET v6] sched: Implement BPF extensible scheduler class Peter Zijlstra
2024-05-02 19:20 ` Tejun Heo
2024-05-03 8:52 ` Peter Zijlstra
2024-05-05 23:31 ` Tejun Heo
2024-05-13 8:03 ` Peter Zijlstra
2024-05-13 18:26 ` Steven Rostedt
2024-05-14 0:07 ` Qais Yousef
2024-05-14 21:34 ` David Vernet [this message]
2024-05-27 21:25 ` Qais Yousef
2024-05-28 23:46 ` Tejun Heo
2024-05-29 22:09 ` Qais Yousef
2024-05-17 9:58 ` Peter Zijlstra
2024-05-27 20:29 ` Qais Yousef
2024-05-14 20:22 ` Chris Mason
2024-05-14 22:06 ` Josh Don
2024-05-15 20:41 ` Tejun Heo
2024-05-21 0:19 ` Tejun Heo
2024-05-30 16:49 ` Tejun Heo
2024-05-06 18:47 ` Rik van Riel
2024-05-07 19:33 ` Tejun Heo
2024-05-07 19:47 ` Rik van Riel
2024-05-09 7:38 ` Changwoo Min
2024-05-10 18:24 ` Peter Jung
2024-05-13 20:36 ` Andrea Righi
2024-06-11 21:34 ` Linus Torvalds
2024-06-13 23:38 ` Tejun Heo
2024-06-19 20:56 ` Thomas Gleixner
2024-06-19 22:10 ` Linus Torvalds
2024-06-19 22:27 ` Thomas Gleixner
2024-06-19 22:55 ` Linus Torvalds
2024-06-20 2:35 ` Thomas Gleixner
2024-06-20 5:07 ` Linus Torvalds
2024-06-20 17:11 ` Linus Torvalds
2024-06-20 17:41 ` Tejun Heo
2024-06-20 22:15 ` [PATCH sched_ext/for-6.11] sched, sched_ext: Replace scx_next_task_picked() with sched_class->switch_class() Tejun Heo
2024-06-20 22:42 ` Linus Torvalds
2024-06-21 19:46 ` Tejun Heo
2024-06-24 9:04 ` Peter Zijlstra
2024-06-24 18:41 ` Tejun Heo
2024-06-24 9:02 ` Peter Zijlstra
2024-06-21 19:52 ` Tejun Heo
2024-06-24 8:59 ` Peter Zijlstra
2024-06-24 21:01 ` Tejun Heo
2024-06-25 7:49 ` Peter Zijlstra
2024-06-25 23:30 ` Tejun Heo
2024-06-26 8:28 ` Peter Zijlstra
2024-06-26 17:56 ` Tejun Heo
2024-06-20 18:47 ` [PATCHSET v6] sched: Implement BPF extensible scheduler class Thomas Gleixner
2024-06-20 19:20 ` Linus Torvalds
2024-06-21 9:35 ` Thomas Gleixner
2024-06-21 16:34 ` Linus Torvalds
2024-06-23 2:00 ` Tejun Heo
2024-06-23 10:31 ` Thomas Gleixner
2024-06-23 10:33 ` Thomas Gleixner
2024-06-24 14:23 ` Jason Gunthorpe
2024-06-20 19:58 ` Tejun Heo
2024-06-24 9:34 ` Peter Zijlstra
2024-06-24 20:17 ` Tejun Heo
2024-06-24 20:51 ` [PATCH sched_ext/for-6.11] sched, sched_ext: Simplify dl_prio() case handling in sched_fork() Tejun Heo
2024-07-08 18:56 ` Tejun Heo
2024-06-20 19:35 ` [PATCHSET v6] sched: Implement BPF extensible scheduler class Tejun Heo
2024-06-21 10:46 ` Thomas Gleixner
2024-06-21 21:14 ` Chris Mason
2024-06-23 8:14 ` Thomas Gleixner
2024-06-24 16:42 ` Chris Mason
2024-06-24 18:11 ` Tejun Heo
2024-06-24 22:01 ` Peter Oskolkov
2024-06-24 22:17 ` David Vernet
2024-06-24 21:54 ` Peter Oskolkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240514213402.GB295811@maniforge \
--to=void@manifault.com \
--cc=andrea.righi@canonical.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brho@google.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=changwoo@igalia.com \
--cc=daniel@iogearbox.net \
--cc=derkling@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=dschatzberg@meta.com \
--cc=dskarlat@cs.cmu.edu \
--cc=dvernet@meta.com \
--cc=haoluo@google.com \
--cc=himadrics@inria.fr \
--cc=joel@joelfernandes.org \
--cc=joshdon@google.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=qyousef@layalina.io \
--cc=riel@surriel.com \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox