From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
Andrea Righi <arighi@nvidia.com>,
Changwoo Min <changwoo@igalia.com>
Cc: sched-ext@lists.linux.dev, Emil Tsalapatis <emil@etsalapatis.com>,
linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: [PATCH sched_ext/for-7.3 22/32] sched_ext: Add the SCX_CAP_ENQ_IMMED cap
Date: Thu, 2 Jul 2026 22:01:49 -1000
Message-ID: <20260703080159.2314350-23-tj@kernel.org>
In-Reply-To: <20260703080159.2314350-1-tj@kernel.org>

Replace the __SCX_CAP_DUMMY placeholder with SCX_CAP_ENQ_IMMED, which gates
inserting IMMED tasks onto a cid's local DSQ. An IMMED enqueue is guaranteed
to either get its task running on the cpu at once or hand it back to the
scheduler, so IMMED work can never pile up on the cpu's queue and a cpu can
be shared across sub-scheds through IMMED access without any of them
swamping it.

That makes ENQ_IMMED the natural baseline, the minimal cap to make any use
of a cpu. SCX_CAP_BASE aliases it so gates on basic cpu access can state the
intention instead of naming ENQ_IMMED.

Enforcement covers inserts and queued tasks. An insert without the cap is
diverted to the reject DSQ, and queued tasks are reenqueued when the cap is
lost. scx_bpf_sub_dispatch() skips a child that lacks the cap on the cpu, as
its inserts would only be rejected. Vacating the running task on cap loss
lands in a later patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext/ext.c | 2 +-
kernel/sched/ext/internal.h | 11 ++++++++---
kernel/sched/ext/sub.c | 8 ++++++++
kernel/sched/ext/sub.h | 4 ++--
4 files changed, 19 insertions(+), 6 deletions(-)
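
Reader's note, not part of the patch: the two enforcement rules in the commit
message (an insert without the cap goes to the reject DSQ, and queued tasks are
reenqueued when the cap is lost) can be modelled in userspace roughly as
follows. This is only a sketch under assumptions. The bit values are copied from
the enum in this patch, but missing_caps(), gate_insert(),
reenq_on_cap_change() and enum insert_target are hypothetical names invented
for illustration, and the real kernel helpers take a scheduler and a cpu rather
than a bare mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_U64(n)		(UINT64_C(1) << (n))

/* Bit values copied from the enum in this patch. */
#define CAP_ENQ_IMMED		BIT_U64(0)
#define CAPS_REENQ_ON_LOSS	CAP_ENQ_IMMED

static_assert(CAPS_REENQ_ON_LOSS == CAP_ENQ_IMMED,
	      "losing ENQ_IMMED is what strands queued tasks in this patch");

/* Hypothetical: where a local-DSQ insert attempt ends up. */
enum insert_target {
	INSERT_LOCAL_DSQ,	/* cap held: task goes onto the local DSQ */
	INSERT_REJECT_DSQ,	/* cap missing: task is diverted to the reject DSQ */
};

/* Hypothetical: bits of @needed that are absent from @ecaps. */
static inline uint64_t missing_caps(uint64_t ecaps, uint64_t needed)
{
	return needed & ~ecaps;
}

/* Gate an insert on the effective caps held on the target cpu. */
static inline enum insert_target gate_insert(uint64_t ecaps)
{
	return missing_caps(ecaps, CAP_ENQ_IMMED) ?
		INSERT_REJECT_DSQ : INSERT_LOCAL_DSQ;
}

/* True if the cap change drops a bit whose loss requires a reenqueue. */
static inline bool reenq_on_cap_change(uint64_t old_ecaps, uint64_t new_ecaps)
{
	uint64_t lost = old_ecaps & ~new_ecaps;

	return (lost & CAPS_REENQ_ON_LOSS) != 0;
}
```

In this model, gaining the cap or keeping it unchanged never triggers a
reenqueue; only a transition that clears the bit does.
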
diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index b6d68a80a04f..9309d57e3f4f 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -4757,7 +4757,7 @@ SCX_ATTR(events);
#ifdef CONFIG_EXT_SUB_SCHED
static const char *scx_cap_names[__SCX_NR_CAPS] = {
- [__SCX_CAP_DUMMY] = "dummy",
+ [__SCX_CAP_ENQ_IMMED] = "enq_immed",
};
static ssize_t scx_attr_caps_show(struct kobject *kobj,
diff --git a/kernel/sched/ext/internal.h b/kernel/sched/ext/internal.h
index ef6b4d0f7dee..20a1ffbe4c26 100644
--- a/kernel/sched/ext/internal.h
+++ b/kernel/sched/ext/internal.h
@@ -1270,17 +1270,22 @@ struct scx_sched_pnode {
* topology-aligned and likely to serve as the locality unit when cids are
* allocated to schedulers, so per-shard lock granularity scales naturally with
* the allocation pattern.
+ *
+ * ENQ_IMMED insert an IMMED task onto the cid's local DSQ
*/
enum scx_cap_flags {
- __SCX_CAP_DUMMY = 0,
+ __SCX_CAP_ENQ_IMMED = 0,
__SCX_NR_CAPS,
__SCX_CAP_ALL = BIT_U64(__SCX_NR_CAPS) - 1,
- SCX_CAP_DUMMY = BIT_U64(__SCX_CAP_DUMMY),
+ SCX_CAP_ENQ_IMMED = BIT_U64(__SCX_CAP_ENQ_IMMED),
+
+ /* alias for minimal cap to make any use of a cpu */
+ SCX_CAP_BASE = SCX_CAP_ENQ_IMMED,
/* caps whose loss strands queued tasks, see scx_process_sync_ecaps() */
- SCX_CAPS_REENQ_ON_LOSS = 0,
+ SCX_CAPS_REENQ_ON_LOSS = SCX_CAP_ENQ_IMMED,
};
#ifdef CONFIG_EXT_SUB_SCHED
diff --git a/kernel/sched/ext/sub.c b/kernel/sched/ext/sub.c
index aea63484edc5..2f1e19db8e72 100644
--- a/kernel/sched/ext/sub.c
+++ b/kernel/sched/ext/sub.c
@@ -1230,6 +1230,14 @@ __bpf_kfunc bool scx_bpf_sub_dispatch(u64 cgroup_id, const struct bpf_prog_aux *
return false;
}
+ /*
+ * Skip a child that does not effectively hold the base cap on this cpu:
+ * its inserts would only be rejected. ecaps are synced at the top of
+ * balance_one() before dispatch, so this reflects the in-effect state.
+ */
+ if (scx_missing_caps(child, cpu_of(this_rq), SCX_CAP_BASE))
+ return false;
+
return scx_dispatch_sched(child, this_rq, this_rq->scx.sub_dispatch_prev,
true);
}
diff --git a/kernel/sched/ext/sub.h b/kernel/sched/ext/sub.h
index 89d1458ff450..ea8bea347bb0 100644
--- a/kernel/sched/ext/sub.h
+++ b/kernel/sched/ext/sub.h
@@ -105,13 +105,13 @@ static inline u64 scx_missing_caps(struct scx_sched *sch, s32 cpu, u64 needed)
/* map @enq_flags to the SCX_CAP_* bit required for the local-DSQ insert */
static inline u64 scx_caps_for_enq(u64 enq_flags)
{
- return 0;
+ return SCX_CAP_ENQ_IMMED;
}
/* map queued @p to the SCX_CAP_* bit required to stay on its local DSQ */
static inline u64 scx_caps_for_task(struct task_struct *p)
{
- return 0;
+ return SCX_CAP_ENQ_IMMED;
}
/* caps implied by holding @cap */
--
2.54.0
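
Reader's note, not part of the patch: with a single cap defined, the bit layout
and the SCX_CAP_BASE alias from the internal.h hunk work out as in the sketch
below, which also mimics the early return added to scx_bpf_sub_dispatch(). It is
a userspace approximation under assumptions: the constants mirror the diff, but
missing_caps() and sub_dispatch_allowed() are hypothetical stand-ins that take a
plain mask, whereas the kernel's scx_missing_caps() takes a scheduler and a cpu
and its body is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_U64(n)	(UINT64_C(1) << (n))

/* Bit indices, mirroring __SCX_CAP_ENQ_IMMED and __SCX_NR_CAPS. */
enum {
	CAP_ENQ_IMMED_SHIFT = 0,
	NR_CAPS,
};

#define CAP_ALL		(BIT_U64(NR_CAPS) - 1)
#define CAP_ENQ_IMMED	BIT_U64(CAP_ENQ_IMMED_SHIFT)
#define CAP_BASE	CAP_ENQ_IMMED	/* alias: minimal cap to use a cpu */

/* With exactly one cap, the full mask and the base cap coincide. */
static_assert(CAP_ALL == CAP_ENQ_IMMED, "one cap defined so far");
static_assert(CAP_BASE == CAP_ENQ_IMMED, "BASE aliases ENQ_IMMED");

/* Hypothetical: bits of @needed that are absent from @ecaps. */
static inline uint64_t missing_caps(uint64_t ecaps, uint64_t needed)
{
	return needed & ~ecaps;
}

/*
 * Hypothetical model of the new check: dispatching into a child is only
 * worthwhile if it effectively holds the base cap on this cpu.
 */
static inline bool sub_dispatch_allowed(uint64_t child_ecaps)
{
	return missing_caps(child_ecaps, CAP_BASE) == 0;
}
```

Gating on CAP_BASE rather than CAP_ENQ_IMMED keeps the call site unchanged if a
later patch redefines which cap counts as the baseline.
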