From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, sched-ext@lists.linux.dev,
void@manifault.com, changwoo@igalia.com, emil@etsalapatis.com,
hannes@cmpxchg.org, mkoutny@suse.com, cgroups@vger.kernel.org
Subject: Re: [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime
Date: Sat, 28 Feb 2026 00:50:57 +0100
Message-ID: <aaIt4aZzWAZSktJg@gpd4>
In-Reply-To: <aaIZ6aeNJrZp14kh@slm.duckdns.org>
On Fri, Feb 27, 2026 at 12:25:45PM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> On Thu, Feb 26, 2026 at 04:13:33PM +0100, Andrea Righi wrote:
> > My concern with this is that we may introduce some overhead for those
> > schedulers that require frequent adjustment of slice / dsq_vtime directly.
>
> I'm a bit skeptical about the premise. Unless p->scx.vtime/slice are used
> for BPF-side bookkeeping, the only times they need to be modified are:
>
> - When inserting into a vtime DSQ, vtime needs to be set. However, the
> interface functions already have provisions for setting vtime, so direct
> manipulation isn't necessary.
>
> - slice can be similar but can also be a bit more complicated. slice only
> takes effect once the task actually gets on the CPU, and a task may not have
> its eventual slice known at the time of its insertion into a user DSQ. In
> such cases, it may be necessary to set the slice as the task starts
> execution from, e.g., ops.running().
>
> - While a task is running, slice modification can be used to give the task
> more or less CPU time. Most commonly, these would be either extending
> slice to keep running the current task or preempting the task by setting
> the slice to zero and triggering a scheduling event.
>
> So, as long as p->scx.vtime/slice are used to instruct the kernel what to
> do, as opposed to being used for BPF-side bookkeeping, vtime doesn't need
> to be directly modified at all and while slice may need to be modified,
> those are mostly directly tied to actual scheduling operations and context
> switches. I'd be surprised if the kfunc overhead is noticeable at all. kfunc
> calls aren't expensive unless you're banging on them in a tight loop. Also,
> note that in the lowest overhead scheduling scenario - direct dispatch to a
> local DSQ from select_cpu()/enqueue() - neither is needed. It'd just be a
> single scx_bpf_dsq_insert() call.
>
> > While the scx_task_on_sched() check itself likely has zero impact, the
> > kfunc invocations can potentially introduce measurable overhead.
> >
> > I'm wondering if we could instead move the authority check to
> > verification time, introducing something similar to PTR_TRUSTED
> > (PTR_SCX_AUTH?) for struct task_struct * to represent that the scheduler
> > has authority to access the task, and allow direct writes to
> > p->scx.slice / p->scx.dsq_vtime only when the register has that flag.
> >
> > Then:
> > - tasks passed from the core ops (enqueue, dispatch, etc.) are
> > automatically tagged with PTR_SCX_AUTH,
> > - tasks obtained externally (e.g., via bpf_task_from_pid()) don't have
> > the flag, so no modification is allowed. In this case, we could
> > provide a scx_bpf_auth_task() kfunc that performs the
> > scx_task_on_sched() check and returns p with the auth flag set if the
> > scheduler has full access to the task, or NULL otherwise.
>
> So, I'm not sure this is something we need to invest complexity into. The
> only cases I can think of where the overhead might become visible is if the
> BPF sched uses these fields for internal bookkeeping and keeps updating a
> lot more times than there are actual scheduling events. However, I don't
> think that's a usage model that we want to encourage.
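The PTR_SCX_AUTH suggestion quoted above amounts to "check ownership once, then carry the result in the type of the pointer". The following is only a rough userspace analogy. The real proposal would be a BPF verifier register flag, not a C type, and plain C cannot stop code from building the handle by hand. All names here (`struct sched`, `struct task`, `struct auth_task`, `from_core_op()`, `task_auth()`, `auth_set_slice()`, `auth_set_dsq_vtime()`) are hypothetical. `task_auth()` only mirrors the role of the suggested scx_bpf_auth_task().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a scheduler instance and a task. */
struct sched {
	const char *name;
};

struct task {
	const struct sched *owner;	/* scheduler the task is attached to */
	uint64_t slice;			/* models p->scx.slice */
	uint64_t dsq_vtime;		/* models p->scx.dsq_vtime */
};

/*
 * Hypothetical "authorized task" handle, standing in for a task pointer
 * tagged with the proposed PTR_SCX_AUTH flag. The write helpers below only
 * accept this type, not a plain struct task *.
 */
struct auth_task {
	struct task *p;
};

/* Models a task handed to a core op (enqueue, dispatch, ...): tagged as-is. */
static struct auth_task from_core_op(struct task *p)
{
	return (struct auth_task){ .p = p };
}

/*
 * Models the suggested scx_bpf_auth_task(): run the ownership check once
 * and return a tagged handle, or a handle with p == NULL if @sch does not
 * own @p.
 */
static struct auth_task task_auth(const struct sched *sch, struct task *p)
{
	struct auth_task at = { .p = NULL };

	if (p != NULL && p->owner == sch)
		at.p = p;
	return at;
}

/* Writes need no further ownership check: the handle type carries it. */
static void auth_set_slice(struct auth_task at, uint64_t slice)
{
	assert(at.p != NULL);	/* caller must have checked task_auth() */
	at.p->slice = slice;
}

static void auth_set_dsq_vtime(struct auth_task at, uint64_t vtime)
{
	assert(at.p != NULL);
	at.p->dsq_vtime = vtime;
}
```

The cost moves to the single `task_auth()` call (or disappears for tasks coming from core ops), which is the trade-off the proposal was after.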
Ack. Also, we don't necessarily need to make it perfect right now; we can
begin with the set_slice/set_dsq_vtime kfuncs and refine the approach later
if we find performance regressions.
Thanks,
-Andrea
Thread overview: 49+ messages
2026-02-25 5:00 [PATCHSET v2 sched_ext/for-7.1] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
2026-02-25 5:00 ` [PATCH 01/34] sched_ext: Implement cgroup subtree iteration for scx_task_iter Tejun Heo
2026-02-25 5:00 ` [PATCH 02/34] sched_ext: Add @kargs to scx_fork() Tejun Heo
2026-02-25 5:00 ` [PATCH 03/34] sched/core: Swap the order between sched_post_fork() and cgroup_post_fork() Tejun Heo
2026-02-25 5:00 ` [PATCH 04/34] cgroup: Expose some cgroup helpers Tejun Heo
2026-02-25 5:00 ` [PATCH 05/34] sched_ext: Update p->scx.disallow warning in scx_init_task() Tejun Heo
2026-02-25 5:00 ` [PATCH 06/34] sched_ext: Reorganize enable/disable path for multi-scheduler support Tejun Heo
2026-02-25 5:00 ` [PATCH 07/34] sched_ext: Introduce cgroup sub-sched support Tejun Heo
2026-02-26 15:37 ` Andrea Righi
2026-02-27 20:14 ` Tejun Heo
2026-02-26 15:52 ` Andrea Righi
2026-02-27 19:51 ` Tejun Heo
2026-02-27 20:04 ` Tejun Heo
2026-02-25 5:00 ` [PATCH 08/34] sched_ext: Introduce scx_task_sched[_rcu]() Tejun Heo
2026-02-25 5:00 ` [PATCH 09/34] sched_ext: Introduce scx_prog_sched() Tejun Heo
2026-02-25 5:00 ` [PATCH 10/34] sched_ext: Enforce scheduling authority in dispatch and select_cpu operations Tejun Heo
2026-02-25 5:00 ` [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime Tejun Heo
2026-02-26 15:13 ` Andrea Righi
2026-02-27 22:25 ` Tejun Heo
2026-02-27 23:50 ` Andrea Righi [this message]
2026-02-25 5:00 ` [PATCH 12/34] sched_ext: scx_dsq_move() should validate the task belongs to the right scheduler Tejun Heo
2026-02-25 5:00 ` [PATCH 13/34] sched_ext: Refactor task init/exit helpers Tejun Heo
2026-02-27 6:55 ` Andrea Righi
2026-02-27 19:50 ` Tejun Heo
2026-02-25 5:00 ` [PATCH 14/34] sched_ext: Make scx_prio_less() handle multiple schedulers Tejun Heo
2026-02-25 5:00 ` [PATCH 15/34] sched_ext: Move default slice to per-scheduler field Tejun Heo
2026-02-25 5:00 ` [PATCH 16/34] sched_ext: Move aborting flag " Tejun Heo
2026-02-25 5:00 ` [PATCH 17/34] sched_ext: Move bypass_dsq into scx_sched_pcpu Tejun Heo
2026-02-25 5:00 ` [PATCH 18/34] sched_ext: Move bypass state into scx_sched Tejun Heo
2026-02-25 5:00 ` [PATCH 19/34] sched_ext: Prepare bypass mode for hierarchical operation Tejun Heo
2026-02-25 5:00 ` [PATCH 20/34] sched_ext: Factor out scx_dispatch_sched() Tejun Heo
2026-02-25 5:00 ` [PATCH 21/34] sched_ext: When calling ops.dispatch() @prev must be on the same scx_sched Tejun Heo
2026-02-25 5:00 ` [PATCH 22/34] sched_ext: Separate bypass dispatch enabling from bypass depth tracking Tejun Heo
2026-02-25 5:00 ` [PATCH 23/34] sched_ext: Implement hierarchical bypass mode Tejun Heo
2026-02-25 5:00 ` [PATCH 24/34] sched_ext: Dispatch from all scx_sched instances Tejun Heo
2026-02-25 5:01 ` [PATCH 25/34] sched_ext: Move scx_dsp_ctx and scx_dsp_max_batch into scx_sched Tejun Heo
2026-02-25 5:01 ` [PATCH 26/34] sched_ext: Make watchdog sub-sched aware Tejun Heo
2026-02-25 5:01 ` [PATCH 27/34] sched_ext: Convert scx_dump_state() spinlock to raw spinlock Tejun Heo
2026-02-25 5:01 ` [PATCH 28/34] sched_ext: Support dumping multiple schedulers and add scheduler identification Tejun Heo
2026-02-25 5:01 ` [PATCH 29/34] sched_ext: Implement cgroup sub-sched enabling and disabling Tejun Heo
2026-02-25 5:01 ` [PATCH 30/34] sched_ext: Add scx_sched back pointer to scx_sched_pcpu Tejun Heo
2026-02-25 5:01 ` [PATCH 31/34] sched_ext: Make scx_bpf_reenqueue_local() sub-sched aware Tejun Heo
2026-02-25 5:01 ` [PATCH 32/34] sched_ext: Factor out scx_link_sched() and scx_unlink_sched() Tejun Heo
2026-02-25 5:01 ` [PATCH 33/34] sched_ext: Add rhashtable lookup for sub-schedulers Tejun Heo
2026-02-25 5:01 ` [PATCH 34/34] sched_ext: Add basic building blocks for nested sub-scheduler dispatching Tejun Heo
2026-02-25 5:14 ` [PATCHSET v2 sched_ext/for-7.1] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
-- strict thread matches above, loose matches on Subject: below --
2026-03-04 22:00 [PATCHSET v3 " Tejun Heo
2026-03-04 22:00 ` [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime Tejun Heo
2026-02-25 5:01 [PATCHSET v2 sched_ext/for-7.1] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
2026-02-25 5:01 ` [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime Tejun Heo
2026-01-21 23:11 [PATCHSET v1 sched_ext/for-6.20] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
2026-01-21 23:11 ` [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime Tejun Heo