The Linux Kernel Mailing List
From: Tejun Heo <tj@kernel.org>
To: arighi@nvidia.com, void@manifault.com, changwoo@igalia.com,
	jstultz@google.com
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	vschneid@redhat.com, kprateek.nayak@amd.com,
	christian.loehle@arm.com, kobak@nvidia.com,
	joelagnelf@nvidia.com, emil@etsalapatis.com,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH sched_ext/for-7.2 0/10] sched: Make proxy execution compatible with sched_ext
Date: Fri,  8 May 2026 15:00:59 -1000
Message-ID: <20260509010059.345908-1-tj@kernel.org>
In-Reply-To: <20260506174639.535232-1-arighi@nvidia.com>

Hello,

I'm a bit worried this is more invasive than what it buys. Even with
the full series, the cross-CPU gap Prateek raised stays open:
find_proxy_task() doesn't go through put_prev_set_next_task(), so the
owner runs without ops.running(owner) being called. Closing that seems
to need yet another
protocol on top, either synthetic running/stopping events or scx core
taking over dispatch_dequeue for substitutions. The BPF scheduler ends
up dispatching tasks it didn't pick and observing callbacks for tasks
it didn't enqueue, which feels too magical and error-prone.

Maybe worth considering an alternative where, when scx is loaded, we
just turn proxy-exec off entirely and expose blocked_on to the BPF
scheduler. Schedulers that want PI can implement it themselves on top
of the blocked_on relationship; ones that don't want it pay nothing.

scx_enable could flip the proxy_exec static branch off, after which the
existing gates in __schedule keep blocked tasks off the runqueue and
skip find_proxy_task on their own. The remaining concern is in-flight
donors at the moment of the flip - the existing scx_bypass walk already
visits every rq's runnable list during enable, and could force-block
any task it sees with blocked_on set. Mutex unlock would re-wake them
through wake_q normally after that. blocked_on itself is set and
cleared in mutex.c regardless of proxy_exec, so the signal we'd want
to surface is already there.
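
To make the enable-time sequence concrete, here is a rough user-space
model of it, not kernel code. Every name in it (flip_task, flip_rq,
flip_scx_enable, flip_mutex_unlock) is made up for illustration, the
static branch is modeled as a plain bool, and unlock is simplified to
clear blocked_on and wake the first waiter it finds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLIP_MAX_TASKS 8

struct flip_mutex { int id; };

struct flip_task {
	int pid;
	bool on_rq;                    /* true while the task sits on a runqueue */
	struct flip_mutex *blocked_on; /* mutex the task waits on, or NULL */
};

struct flip_rq {
	struct flip_task *runnable[FLIP_MAX_TASKS];
	size_t nr;
};

/* Stand-in for the proxy_exec static branch (hypothetical model). */
static bool flip_proxy_exec = true;

/*
 * Model of the enable path: turn proxy-exec off, then walk every rq's
 * runnable list and force-block tasks that still have blocked_on set.
 * Returns how many in-flight donors were taken off their runqueue.
 */
static size_t flip_scx_enable(struct flip_rq *rqs, size_t nr_rqs)
{
	size_t forced = 0;

	flip_proxy_exec = false;

	for (size_t i = 0; i < nr_rqs; i++) {
		assert(rqs[i].nr <= FLIP_MAX_TASKS);
		for (size_t j = 0; j < rqs[i].nr; j++) {
			struct flip_task *t = rqs[i].runnable[j];

			if (t->on_rq && t->blocked_on) {
				t->on_rq = false;
				forced++;
			}
		}
	}
	return forced;
}

/*
 * Model of mutex unlock: the first task found waiting on @m is
 * unblocked and put back on its runqueue. Returns it, or NULL.
 */
static struct flip_task *flip_mutex_unlock(struct flip_mutex *m,
					   struct flip_task **tasks, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (tasks[i]->blocked_on == m) {
			tasks[i]->blocked_on = NULL;
			tasks[i]->on_rq = true;
			return tasks[i];
		}
	}
	return NULL;
}
```

In this model a task that was still on a runqueue with blocked_on set
at the flip ends up off the runqueue, and comes back only through the
unlock path, which is the property the enable-time walk would need to
guarantee.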

For the BPF side, the natural shape seems to be tagging the existing
ops.quiescent and ops.runnable callbacks with a bit indicating "this
sleep/wake was a mutex transition," plus a small kfunc that returns
the owner of the mutex p is blocked on. A scheduler that wants PI then
records the owner in its own task storage on the quiescent side, boosts
it via the existing vtime / slice / dsq_move / kick primitives, and
drops the boost when the runnable side fires. No new dispatch protocol
is needed, and the BPF scheduler stays in charge of who runs.
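
As a rough illustration of the bookkeeping a scheduler would do, here
is a plain C model rather than a BPF program. The flag name
PI_SKETCH_MUTEX_TRANSITION, the pi_mutex_owner() helper standing in for
the proposed kfunc, and the pi_task fields are all hypothetical; none of
them exist in sched_ext today. The boost is modeled as donating the
waiter's vtime to the owner:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit meaning "this sleep/wake was a mutex transition". */
#define PI_SKETCH_MUTEX_TRANSITION (1ULL << 0)

struct pi_task {
	int pid;
	uint64_t vtime;              /* the task's own vtime; lower runs sooner */
	struct pi_task *mutex_owner; /* what the owner kfunc would return */

	/* What a scheduler might keep in its own per-task storage. */
	struct pi_task *boosting;    /* owner this task is currently boosting */
	uint64_t boost_vtime;        /* lowest vtime donated by any waiter */
	unsigned int nr_boosts;      /* number of waiters boosting this task */
};

/* vtime the scheduler would order by: own vtime unless a boost is lower. */
static uint64_t pi_effective_vtime(const struct pi_task *t)
{
	if (t->nr_boosts && t->boost_vtime < t->vtime)
		return t->boost_vtime;
	return t->vtime;
}

/* Stand-in for the proposed kfunc: owner of the mutex @p is blocked on. */
static struct pi_task *pi_mutex_owner(const struct pi_task *p)
{
	return p->mutex_owner;
}

/* Quiescent side: record the owner and boost it. */
static void pi_quiescent(struct pi_task *p, uint64_t flags)
{
	struct pi_task *owner;
	uint64_t donated;

	if (!(flags & PI_SKETCH_MUTEX_TRANSITION))
		return;

	owner = pi_mutex_owner(p);
	if (!owner || p->boosting)
		return;

	donated = pi_effective_vtime(p);
	if (!owner->nr_boosts || donated < owner->boost_vtime)
		owner->boost_vtime = donated;
	owner->nr_boosts++;
	p->boosting = owner;
}

/* Runnable side: drop the boost recorded on the quiescent side. */
static void pi_runnable(struct pi_task *p, uint64_t flags)
{
	struct pi_task *owner = p->boosting;

	if (!(flags & PI_SKETCH_MUTEX_TRANSITION) || !owner)
		return;

	assert(owner->nr_boosts > 0);
	owner->nr_boosts--;
	p->boosting = NULL;
}
```

The model is deliberately conservative: with several waiters the owner
keeps the lowest donated vtime until the last one wakes, and a boost
is not re-propagated down a chain of owners after the waiter has
blocked. A real scheduler would also need to move or kick the owner
for the boost to take effect.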

Does that direction seem reasonable, or am I missing something that
makes it not work?

Thanks.
--
tejun


Thread overview: 21+ messages
2026-05-06 17:45 [RFC PATCH sched_ext/for-7.2 0/10] sched: Make proxy execution compatible with sched_ext Andrea Righi
2026-05-06 17:45 ` [PATCH 01/10] sched/core: Skip migration disabled tasks in proxy execution Andrea Righi
2026-05-06 21:09   ` John Stultz
2026-05-07  3:34     ` K Prateek Nayak
2026-05-07  6:31       ` Andrea Righi
2026-05-07  7:45         ` K Prateek Nayak
2026-05-07 10:13           ` Andrea Righi
2026-05-07 15:47             ` K Prateek Nayak
2026-05-08  7:40               ` Andrea Righi
2026-05-06 17:45 ` [PATCH 02/10] sched/core: Skip put_prev_task/set_next_task re-entry for sched_ext donors Andrea Righi
2026-05-06 17:45 ` [PATCH 03/10] sched/ext: Split curr|donor references properly Andrea Righi
2026-05-06 17:45 ` [PATCH 04/10] sched/ext: Avoid migrating blocked tasks with proxy execution Andrea Righi
2026-05-06 17:45 ` [PATCH 05/10] sched_ext: Fix TOCTOU race in consume_remote_task() Andrea Righi
2026-05-06 17:45 ` [PATCH 06/10] sched_ext: Fix ops.running/stopping() pairing for proxy-exec donors Andrea Righi
2026-05-06 17:45 ` [PATCH 07/10] sched_ext: Save/restore kf_tasks[] when task ops nest Andrea Righi
2026-05-06 17:45 ` [PATCH 08/10] sched_ext: Skip ops.runnable() when nested in SCX_CALL_OP_TASK Andrea Righi
2026-05-06 17:45 ` [PATCH 09/10] sched/core: Disable proxy-exec context switch under sched_ext by default Andrea Righi
2026-05-06 17:45 ` [PATCH 10/10] sched: Allow enabling proxy exec with sched_ext Andrea Righi
2026-05-09  1:00 ` Tejun Heo [this message]
2026-05-10 15:06   ` [RFC PATCH sched_ext/for-7.2 0/10] sched: Make proxy execution compatible " Andrea Righi
2026-05-10 19:41     ` Tejun Heo
