From: Tejun Heo <tj@kernel.org>
To: linux-kernel@vger.kernel.org, sched-ext@lists.linux.dev
Cc: void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
emil@etsalapatis.com, hannes@cmpxchg.org, mkoutny@suse.com,
cgroups@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: [PATCH 10/34] sched_ext: Enforce scheduling authority in dispatch and select_cpu operations
Date: Tue, 24 Feb 2026 19:00:45 -1000 [thread overview]
Message-ID: <20260225050109.1070059-11-tj@kernel.org> (raw)
In-Reply-To: <20260225050109.1070059-1-tj@kernel.org>
Add checks to enforce scheduling authority boundaries when multiple
schedulers are present:
1. In scx_dsq_insert_preamble() and the dispatch retry path, ignore attempts
to insert tasks that the scheduler doesn't own, counting them via
SCX_EV_INSERT_NOT_OWNED. As BPF schedulers are allowed to ignore
dequeues, such attempts can occur legitimately during sub-scheduler
enabling when tasks move between schedulers. The counter helps distinguish
normal cases from scheduler bugs.
2. For scx_bpf_dsq_insert_vtime() and scx_bpf_select_cpu_and(), error out
when sub-schedulers are attached. These functions lack the aux__prog
parameter needed to identify the calling scheduler, so they cannot be used
safely with multiple schedulers. BPF programs should use the arg-wrapped
versions (__scx_bpf_dsq_insert_vtime() and __scx_bpf_select_cpu_and())
instead.
These checks ensure that with multiple concurrent schedulers, scheduler
identity can be properly determined and unauthorized task operations are
prevented or tracked.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext.c | 26 ++++++++++++++++++++++++++
kernel/sched/ext_idle.c | 11 +++++++++++
kernel/sched/ext_internal.h | 12 ++++++++++++
3 files changed, 49 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 30dd65b33802..56ac2d5655a2 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2321,6 +2321,12 @@ static void finish_dispatch(struct scx_sched *sch, struct rq *rq,
if ((opss & SCX_OPSS_QSEQ_MASK) != qseq_at_dispatch)
return;
+ /* see SCX_EV_INSERT_NOT_OWNED definition */
+ if (unlikely(!scx_task_on_sched(sch, p))) {
+ __scx_add_event(sch, SCX_EV_INSERT_NOT_OWNED, 1);
+ return;
+ }
+
/*
* While we know @p is accessible, we don't yet have a claim on
* it - the BPF scheduler is allowed to dispatch tasks
@@ -4011,6 +4017,7 @@ static ssize_t scx_attr_events_show(struct kobject *kobj,
at += scx_attr_event_show(buf, at, &events, SCX_EV_BYPASS_DURATION);
at += scx_attr_event_show(buf, at, &events, SCX_EV_BYPASS_DISPATCH);
at += scx_attr_event_show(buf, at, &events, SCX_EV_BYPASS_ACTIVATE);
+ at += scx_attr_event_show(buf, at, &events, SCX_EV_INSERT_NOT_OWNED);
return at;
}
SCX_ATTR(events);
@@ -5131,6 +5138,7 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
scx_dump_event(s, &events, SCX_EV_BYPASS_DURATION);
scx_dump_event(s, &events, SCX_EV_BYPASS_DISPATCH);
scx_dump_event(s, &events, SCX_EV_BYPASS_ACTIVATE);
+ scx_dump_event(s, &events, SCX_EV_INSERT_NOT_OWNED);
if (seq_buf_has_overflowed(&s) && dump_len >= sizeof(trunc_marker))
memcpy(ei->dump + dump_len - sizeof(trunc_marker),
@@ -6409,6 +6417,12 @@ static bool scx_dsq_insert_preamble(struct scx_sched *sch, struct task_struct *p
return false;
}
+ /* see SCX_EV_INSERT_NOT_OWNED definition */
+ if (unlikely(!scx_task_on_sched(sch, p))) {
+ __scx_add_event(sch, SCX_EV_INSERT_NOT_OWNED, 1);
+ return false;
+ }
+
return true;
}
@@ -6601,6 +6615,17 @@ __bpf_kfunc void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
if (unlikely(!sch))
return;
+#ifdef CONFIG_EXT_SUB_SCHED
+ /*
+ * Disallow if any sub-scheds are attached. There is no way to tell
+ * which scheduler called us, just error out @p's scheduler.
+ */
+ if (unlikely(!list_empty(&sch->children))) {
+ scx_error(scx_task_sched(p), "__scx_bpf_dsq_insert_vtime() must be used");
+ return;
+ }
+#endif
+
scx_dsq_insert_vtime(sch, p, dsq_id, slice, vtime, enq_flags);
}
@@ -7933,6 +7958,7 @@ static void scx_read_events(struct scx_sched *sch, struct scx_event_stats *event
scx_agg_event(events, e_cpu, SCX_EV_BYPASS_DURATION);
scx_agg_event(events, e_cpu, SCX_EV_BYPASS_DISPATCH);
scx_agg_event(events, e_cpu, SCX_EV_BYPASS_ACTIVATE);
+ scx_agg_event(events, e_cpu, SCX_EV_INSERT_NOT_OWNED);
}
}
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 34487a83d3f7..321efd7b14fb 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -1061,6 +1061,17 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
if (unlikely(!sch))
return -ENODEV;
+#ifdef CONFIG_EXT_SUB_SCHED
+ /*
+ * Disallow if any sub-scheds are attached. There is no way to tell
+ * which scheduler called us, just error out @p's scheduler.
+ */
+ if (unlikely(!list_empty(&sch->children))) {
+ scx_error(scx_task_sched(p), "__scx_bpf_select_cpu_and() must be used");
+ return -EINVAL;
+ }
+#endif
+
return select_cpu_from_kfunc(sch, p, prev_cpu, wake_flags,
cpus_allowed, flags);
}
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index 679325e8c19b..3b17180ba3dd 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -911,6 +911,18 @@ struct scx_event_stats {
* The number of times the bypassing mode has been activated.
*/
s64 SCX_EV_BYPASS_ACTIVATE;
+
+ /*
+ * The number of times the scheduler attempted to insert a task that it
+ * doesn't own into a DSQ. Such attempts are ignored.
+ *
+ * As BPF schedulers are allowed to ignore dequeues, it's difficult to
+ * tell whether such an attempt is from a scheduler malfunction or an
+ * ignored dequeue around sub-sched enabling. If this count keeps going
+ * up regardless of sub-sched enabling, it likely indicates a bug in the
+ * scheduler.
+ */
+ s64 SCX_EV_INSERT_NOT_OWNED;
};
struct scx_sched_pcpu {
--
2.53.0
next prev parent reply other threads:[~2026-02-25 5:01 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-25 5:00 [PATCHSET v2 sched_ext/for-7.1] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
2026-02-25 5:00 ` [PATCH 01/34] sched_ext: Implement cgroup subtree iteration for scx_task_iter Tejun Heo
2026-02-25 5:00 ` [PATCH 02/34] sched_ext: Add @kargs to scx_fork() Tejun Heo
2026-02-25 5:00 ` [PATCH 03/34] sched/core: Swap the order between sched_post_fork() and cgroup_post_fork() Tejun Heo
2026-02-25 5:00 ` [PATCH 04/34] cgroup: Expose some cgroup helpers Tejun Heo
2026-02-25 5:00 ` [PATCH 05/34] sched_ext: Update p->scx.disallow warning in scx_init_task() Tejun Heo
2026-02-25 5:00 ` [PATCH 06/34] sched_ext: Reorganize enable/disable path for multi-scheduler support Tejun Heo
2026-02-25 5:00 ` [PATCH 07/34] sched_ext: Introduce cgroup sub-sched support Tejun Heo
2026-02-26 15:37 ` Andrea Righi
2026-02-27 20:14 ` Tejun Heo
2026-02-26 15:52 ` Andrea Righi
2026-02-27 19:51 ` Tejun Heo
2026-02-27 20:04 ` Tejun Heo
2026-02-25 5:00 ` [PATCH 08/34] sched_ext: Introduce scx_task_sched[_rcu]() Tejun Heo
2026-02-25 5:00 ` [PATCH 09/34] sched_ext: Introduce scx_prog_sched() Tejun Heo
2026-02-25 5:00 ` Tejun Heo [this message]
2026-02-25 5:00 ` [PATCH 11/34] sched_ext: Enforce scheduler ownership when updating slice and dsq_vtime Tejun Heo
2026-02-26 15:13 ` Andrea Righi
2026-02-27 22:25 ` Tejun Heo
2026-02-27 23:50 ` Andrea Righi
2026-02-25 5:00 ` [PATCH 12/34] sched_ext: scx_dsq_move() should validate the task belongs to the right scheduler Tejun Heo
2026-02-25 5:00 ` [PATCH 13/34] sched_ext: Refactor task init/exit helpers Tejun Heo
2026-02-27 6:55 ` Andrea Righi
2026-02-27 19:50 ` Tejun Heo
2026-02-25 5:00 ` [PATCH 14/34] sched_ext: Make scx_prio_less() handle multiple schedulers Tejun Heo
2026-02-25 5:00 ` [PATCH 15/34] sched_ext: Move default slice to per-scheduler field Tejun Heo
2026-02-25 5:00 ` [PATCH 16/34] sched_ext: Move aborting flag " Tejun Heo
2026-02-25 5:00 ` [PATCH 17/34] sched_ext: Move bypass_dsq into scx_sched_pcpu Tejun Heo
2026-02-25 5:00 ` [PATCH 18/34] sched_ext: Move bypass state into scx_sched Tejun Heo
2026-02-25 5:00 ` [PATCH 19/34] sched_ext: Prepare bypass mode for hierarchical operation Tejun Heo
2026-02-25 5:00 ` [PATCH 20/34] sched_ext: Factor out scx_dispatch_sched() Tejun Heo
2026-02-25 5:00 ` [PATCH 21/34] sched_ext: When calling ops.dispatch() @prev must be on the same scx_sched Tejun Heo
2026-02-25 5:00 ` [PATCH 22/34] sched_ext: Separate bypass dispatch enabling from bypass depth tracking Tejun Heo
2026-02-25 5:00 ` [PATCH 23/34] sched_ext: Implement hierarchical bypass mode Tejun Heo
2026-02-25 5:00 ` [PATCH 24/34] sched_ext: Dispatch from all scx_sched instances Tejun Heo
2026-02-25 5:01 ` [PATCH 25/34] sched_ext: Move scx_dsp_ctx and scx_dsp_max_batch into scx_sched Tejun Heo
2026-02-25 5:01 ` [PATCH 26/34] sched_ext: Make watchdog sub-sched aware Tejun Heo
2026-02-25 5:01 ` [PATCH 27/34] sched_ext: Convert scx_dump_state() spinlock to raw spinlock Tejun Heo
2026-02-25 5:01 ` [PATCH 28/34] sched_ext: Support dumping multiple schedulers and add scheduler identification Tejun Heo
2026-02-25 5:01 ` [PATCH 29/34] sched_ext: Implement cgroup sub-sched enabling and disabling Tejun Heo
2026-02-25 5:01 ` [PATCH 30/34] sched_ext: Add scx_sched back pointer to scx_sched_pcpu Tejun Heo
2026-02-25 5:01 ` [PATCH 31/34] sched_ext: Make scx_bpf_reenqueue_local() sub-sched aware Tejun Heo
2026-02-25 5:01 ` [PATCH 32/34] sched_ext: Factor out scx_link_sched() and scx_unlink_sched() Tejun Heo
2026-02-25 5:01 ` [PATCH 33/34] sched_ext: Add rhashtable lookup for sub-schedulers Tejun Heo
2026-02-25 5:01 ` [PATCH 34/34] sched_ext: Add basic building blocks for nested sub-scheduler dispatching Tejun Heo
2026-02-25 5:14 ` [PATCHSET v2 sched_ext/for-7.1] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
-- strict thread matches above, loose matches on Subject: below --
2026-03-04 22:00 [PATCHSET v3 " Tejun Heo
2026-03-04 22:00 ` [PATCH 10/34] sched_ext: Enforce scheduling authority in dispatch and select_cpu operations Tejun Heo
2026-02-25 5:01 [PATCHSET v2 sched_ext/for-7.1] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
2026-02-25 5:01 ` [PATCH 10/34] sched_ext: Enforce scheduling authority in dispatch and select_cpu operations Tejun Heo
2026-01-21 23:11 [PATCHSET v1 sched_ext/for-6.20] sched_ext: Implement cgroup sub-scheduler support Tejun Heo
2026-01-21 23:11 ` [PATCH 10/34] sched_ext: Enforce scheduling authority in dispatch and select_cpu operations Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260225050109.1070059-11-tj@kernel.org \
--to=tj@kernel.org \
--cc=arighi@nvidia.com \
--cc=cgroups@vger.kernel.org \
--cc=changwoo@igalia.com \
--cc=emil@etsalapatis.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkoutny@suse.com \
--cc=sched-ext@lists.linux.dev \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.