From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: linux-kernel@vger.kernel.org, sched-ext@lists.linux.dev,
void@manifault.com, changwoo@igalia.com, emil@etsalapatis.com,
hannes@cmpxchg.org, mkoutny@suse.com, cgroups@vger.kernel.org
Subject: Re: [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime
Date: Fri, 27 Feb 2026 12:25:45 -1000
Message-ID: <aaIZ6aeNJrZp14kh@slm.duckdns.org>
In-Reply-To: <aaBjHUr29afGuKVh@gpd4>
Hello, Andrea.
On Thu, Feb 26, 2026 at 04:13:33PM +0100, Andrea Righi wrote:
> My concern with this is that we may introduce some overhead for those
> schedulers that require frequent adjustment of slice / dsq_vtime directly.
I'm a bit skeptical about the premise. Unless p->scx.vtime/slice are used
for BPF side book-keeping, the only times they need to be modified are:
- When inserting into a vtime DSQ, vtime needs to be set. However, the
interface functions already have provisions for setting vtime, so direct
manipulation isn't necessary.
- slice can be similar but can also be a bit more complicated, as slice only
takes effect when the task actually gets on the CPU and a task may not have
its eventual slice known at the time of its insertion into a user DSQ. In
such cases, it may be necessary to set the slice as the task starts
execution from e.g. ops.running().
- While a task is running, slice modification can be used to give the task
more or less CPU time. Most commonly, these would be either extending the
slice to keep running the current task or preempting the task by setting
the slice to zero and triggering a scheduling event.
So, as long as p->scx.vtime/slice are used to instruct the kernel what to
do, as opposed to being used for BPF side book-keeping, vtime doesn't need
to be directly modified at all and, while slice may need to be modified,
those modifications are mostly tied directly to actual scheduling operations
and context switches. I'd be surprised if the kfunc overhead is noticeable
at all. kfunc calls aren't expensive unless you're banging on them in a
tight loop. Also,
note that in the lowest overhead scheduling scenario - direct dispatch to a
local DSQ from select_cpu()/enqueue() - neither is needed. It'd just be a
single scx_bpf_dsq_insert() call.
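
The cases listed above can be sketched with a small userspace model. This
is not sched_ext code and does not run in the kernel: struct model_task and
the model_*() helpers are hypothetical stand-ins made up for illustration,
and only the field names slice and dsq_vtime come from the discussion. Each
helper corresponds to one point at which the fields would plausibly be
written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel-visible fields discussed above. */
struct model_task {
	uint64_t slice;     /* remaining time slice, in ns */
	uint64_t dsq_vtime; /* ordering key inside a vtime DSQ */
	bool resched;       /* models "a scheduling event was triggered" */
};

/* Insertion into a vtime DSQ: the insert call itself carries both values,
 * so no separate write to dsq_vtime is needed. */
static void model_insert_vtime(struct model_task *p, uint64_t slice,
			       uint64_t vtime)
{
	p->slice = slice;
	p->dsq_vtime = vtime;
}

/* The slice was not known at insertion time, so it is set as the task
 * starts executing (the ops.running() case). */
static void model_running(struct model_task *p, uint64_t slice)
{
	p->slice = slice;
}

/* While running: extend the slice to keep the current task on the CPU. */
static void model_extend(struct model_task *p, uint64_t extra)
{
	p->slice += extra;
}

/* While running: preempt by zeroing the slice and requesting a
 * scheduling event. */
static void model_preempt(struct model_task *p)
{
	p->slice = 0;
	p->resched = true;
}
```

In this model every write happens at an insertion, at the start of
execution, or at an explicit extend/preempt decision, which is the sense in
which slice updates are tied to scheduling operations.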
> While the scx_task_on_sched() check itself has likely zero impact, the
> kfunc invocations can potentially introduce measurable overhead.
>
> I'm wondering if we could instead delegate the authority check at
> verification time, introducing something similar to PTR_TRUSTED
> (PTR_SCX_AUTH?) to struct task_struct * to represent that the scheduler has
> authority to access the task and allow direct writes to p->scx.slice /
> p->scx.dsq_vtime only when the register has that flag.
>
> Then:
> - for tasks passed from the core ops (enqueue, dispatch, etc.) we
> automatically tag them with PTR_SCX_AUTH,
> - tasks obtained externally (e.g., via bpf_task_from_pid()): they don't
> have the flag (so no modification allowed) and in this case maybe we
> provide a scx_bpf_auth_task() kfunc to perform the scx_task_on_sched()
> check that returns p (or NULL) setting the auth flag if the scheduler
> has full access to the task.
So, I'm not sure this is something we need to invest complexity into. The
only case I can think of where the overhead might become visible is if the
BPF sched uses these fields for internal bookkeeping and keeps updating them
a lot more often than there are actual scheduling events. However, I don't
think that's a usage model that we want to encourage.
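
The distinction between bookkeeping and instructing the kernel can be
sketched in userspace as follows. Everything here is a hypothetical model
rather than the code from the patch: struct sub_sched, struct owned_task,
struct owned_task_ctx and the helper names are invented, and the ownership
test only mimics the idea behind the scx_task_on_sched() check mentioned in
the quoted text. Frequent accounting stays in scheduler-private state, and
the ownership-checked setter is reached only when a scheduling decision is
made.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scheduler instance. */
struct sub_sched {
	int id;
};

/* Hypothetical task with a kernel-visible slice and an owning scheduler. */
struct owned_task {
	const struct sub_sched *owner;
	uint64_t slice;
};

/* Scheduler-private per-task state, standing in for BPF-side storage. */
struct owned_task_ctx {
	uint64_t budget;              /* internal bookkeeping value */
	unsigned long checked_writes; /* how often the checked setter ran */
};

/* Models the ownership test: does @sch own @p? */
static bool task_on_sched(const struct sub_sched *sch,
			  const struct owned_task *p)
{
	return p->owner == sch;
}

/* Checked setter: refuses the write unless @sch owns @p. */
static bool checked_set_slice(const struct sub_sched *sch,
			      struct owned_task *p,
			      struct owned_task_ctx *ctx, uint64_t slice)
{
	if (!task_on_sched(sch, p))
		return false;
	p->slice = slice;
	ctx->checked_writes++;
	return true;
}

/* Frequent bookkeeping touches only private state, never p->slice. */
static void charge_budget(struct owned_task_ctx *ctx, uint64_t used)
{
	ctx->budget = ctx->budget > used ? ctx->budget - used : 0;
}
```

With this split, the number of checked writes follows the number of
scheduling decisions rather than the number of accounting updates, and a
scheduler that does not own the task cannot change its slice.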
Thanks.
--
tejun