From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com,
void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
cgroups@vger.kernel.org, sched-ext@lists.linux.dev,
liuwenfang@honor.com, tglx@linutronix.de
Subject: Re: [PATCH 12/14] sched: Add shared runqueue locking to __task_rq_lock()
Date: Thu, 25 Sep 2025 10:35:33 +0200 [thread overview]
Message-ID: <20250925083533.GW4067720@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <aMnnslT_mUfAtytN@slm.duckdns.org>
Hi! Sorry for the delay,
On Tue, Sep 16, 2025 at 12:41:54PM -1000, Tejun Heo wrote:
> On Tue, Sep 16, 2025 at 12:29:57PM -1000, Tejun Heo wrote:
> ...
> > Long term, I think maintaining flexibility is of higher importance for
> > sched_ext than e.g. small performance improvements or even design or
> > implementation aesthetics. The primary purpose is enabling trying out new,
> > sometimes wild, things after all. As such, I don't think it'd be a good idea
> > to put strict restrictions on how the BPF side operates unless it affects
> > the ability to recover the system from a malfunctioning BPF scheduler, of
> > course.
>
> Thinking a bit more about it. I wonder the status-quo is actually an okay
> balance. All in-kernel sched classes are per-CPU rich rq design, which
> meshes well with the current locking scheme, for obvious reasons.
>
> sched_ext is an oddball in that it may want to hot-migrate tasks at the last
> minute because who knows what the BPF side wants to do. However, this just
> boils down to having to always call balance() before any pick_task()
> attempts (including DL server case). Yeah, it's a niggle, especially as
> there needs to be a secondary hook to handle losing the race between
> balance() and pick_task(), but it's pretty contained conceptually and not a
> lot of code.
Status quo isn't sufficient; there is that guy that wants to fix some RT
interaction, and there is that dl_server series.
The only viable option other than overhauling the locking, is pushing rf
into pick_task() and have that do all the lock dancing. This gets rid of
that balance abuse (which is needed for dl_server) and also allows
fixing that rt thing.
It just makes a giant mess of pick_task_scx() which might have to drop
locks and retry/abort -- which you weren't very keen on, but yeah, it
should work.
As to letting BPF do wild experiments; that's fine of course, but not
exposing the actual locking requirements is like denying reality. You
can't do lock-break in pick_task_scx() and then claim lockless or
advanced locking -- that's just not true.
Also, you cannot claim bpf-sched author is clever enough to implement
advanced locking, but then somehow not clever enough to deal with a
simple interface to express locking to the core code. That feels
disingenuous.
For all the DSQ based schedulers, this new locking really is an
improvement, but if you don't want to constrain bpf-sched authors to
reality, then perhaps only do the lock break dance for them?
Anyway, I'll go poke at this series again -- the latest queue.git
version seemed to work reliably for me (I could run stress-ng while
having scx_simple loaded), but the robot seems to have found an issue.
next prev parent reply other threads:[~2025-09-25 8:35 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-10 15:44 [PATCH 00/14] sched: Support shared runqueue locking Peter Zijlstra
2025-09-10 15:44 ` [PATCH 01/14] sched: Employ sched_change guards Peter Zijlstra
2025-09-11 9:06 ` K Prateek Nayak
2025-09-11 9:55 ` Peter Zijlstra
2025-09-11 10:10 ` Peter Zijlstra
2025-09-11 10:37 ` K Prateek Nayak
2025-10-06 15:21 ` Shrikanth Hegde
2025-10-06 18:14 ` Peter Zijlstra
2025-10-07 5:12 ` Shrikanth Hegde
2025-10-07 9:34 ` Peter Zijlstra
2025-10-16 9:33 ` [tip: sched/core] sched: Mandate shared flags for sched_change tip-bot2 for Peter Zijlstra
2025-09-10 15:44 ` [PATCH 02/14] sched: Re-arrange the {EN,DE}QUEUE flags Peter Zijlstra
2025-09-10 15:44 ` [PATCH 03/14] sched: Fold sched_class::switch{ing,ed}_{to,from}() into the change pattern Peter Zijlstra
2025-09-10 15:44 ` [PATCH 04/14] sched: Cleanup sched_delayed handling for class switches Peter Zijlstra
2025-09-10 15:44 ` [PATCH 05/14] sched: Move sched_class::prio_changed() into the change pattern Peter Zijlstra
2025-09-11 1:44 ` Tejun Heo
2025-09-10 15:44 ` [PATCH 06/14] sched: Fix migrate_disable_switch() locking Peter Zijlstra
2025-09-10 15:44 ` [PATCH 07/14] sched: Fix do_set_cpus_allowed() locking Peter Zijlstra
2025-10-30 0:12 ` Mark Brown
2025-10-30 9:07 ` Peter Zijlstra
2025-10-30 12:47 ` Mark Brown
2025-09-10 15:44 ` [PATCH 08/14] sched: Rename do_set_cpus_allowed() Peter Zijlstra
2025-09-10 15:44 ` [PATCH 09/14] sched: Make __do_set_cpus_allowed() use the sched_change pattern Peter Zijlstra
2025-09-10 15:44 ` [PATCH 10/14] sched: Add locking comments to sched_class methods Peter Zijlstra
2025-09-10 15:44 ` [PATCH 11/14] sched: Add flags to {put_prev,set_next}_task() methods Peter Zijlstra
2025-09-10 15:44 ` [PATCH 12/14] sched: Add shared runqueue locking to __task_rq_lock() Peter Zijlstra
2025-09-12 0:19 ` Tejun Heo
2025-09-12 11:54 ` Peter Zijlstra
2025-09-12 14:11 ` Peter Zijlstra
2025-09-12 17:56 ` Tejun Heo
2025-09-15 8:38 ` Peter Zijlstra
2025-09-16 22:29 ` Tejun Heo
2025-09-16 22:41 ` Tejun Heo
2025-09-25 8:35 ` Peter Zijlstra [this message]
2025-09-25 21:43 ` Tejun Heo
2025-09-26 9:59 ` Peter Zijlstra
2025-09-26 16:48 ` Tejun Heo
2025-09-26 10:36 ` Peter Zijlstra
2025-09-26 21:39 ` Tejun Heo
2025-09-29 10:06 ` Peter Zijlstra
2025-09-30 23:49 ` Tejun Heo
2025-10-01 11:54 ` Peter Zijlstra
2025-10-02 23:32 ` Tejun Heo
2025-09-10 15:44 ` [PATCH 13/14] sched: Add {DE,EN}QUEUE_LOCKED Peter Zijlstra
2025-09-11 2:01 ` Tejun Heo
2025-09-11 9:42 ` Peter Zijlstra
2025-09-11 20:40 ` Tejun Heo
2025-09-12 14:19 ` Peter Zijlstra
2025-09-12 16:32 ` Tejun Heo
2025-09-13 22:32 ` Tejun Heo
2025-09-15 8:48 ` Peter Zijlstra
2025-09-25 13:10 ` Peter Zijlstra
2025-09-25 15:40 ` Tejun Heo
2025-09-25 15:53 ` Peter Zijlstra
2025-09-25 18:44 ` Tejun Heo
2025-09-10 15:44 ` [PATCH 14/14] sched/ext: Implement p->srq_lock support Peter Zijlstra
2025-09-10 16:07 ` Peter Zijlstra
2025-09-10 17:32 ` [PATCH 00/14] sched: Support shared runqueue locking Andrea Righi
2025-09-10 18:19 ` Peter Zijlstra
2025-09-10 18:35 ` Peter Zijlstra
2025-09-10 19:00 ` Andrea Righi
2025-09-11 9:58 ` Peter Zijlstra
2025-09-11 14:51 ` Andrea Righi
2025-09-11 14:00 ` Peter Zijlstra
2025-09-11 14:30 ` Peter Zijlstra
2025-09-11 14:48 ` Andrea Righi
2025-09-18 15:15 ` Christian Loehle
2025-09-25 9:00 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250925083533.GW4067720@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=arighi@nvidia.com \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=changwoo@igalia.com \
--cc=dietmar.eggemann@arm.com \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=liuwenfang@honor.com \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=rostedt@goodmis.org \
--cc=sched-ext@lists.linux.dev \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=void@manifault.com \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox