* [PATCH] sched/ext: Implement cgroup_set_idle() callback
@ 2025-09-30 6:10 zhidao su
2025-10-07 19:15 ` Tejun Heo
0 siblings, 1 reply; 5+ messages in thread
From: zhidao su @ 2025-09-30 6:10 UTC (permalink / raw)
To: sched-ext; +Cc: tj, zhidao su
From: zhidao su <suzhidao@xiaomi.com>
Implement the missing cgroup_set_idle() callback that was marked as a
TODO. This allows BPF schedulers to be notified when a cgroup's idle
state changes, enabling them to adjust their scheduling behavior
accordingly.
The implementation follows the same pattern as other cgroup callbacks
like cgroup_set_weight() and cgroup_set_bandwidth(). It checks if the
BPF scheduler has implemented the callback and invokes it with the
appropriate parameters.
Also fix a spelling error in the cgroup_set_bandwidth() documentation.
Signed-off-by: zhidao su <suzhidao@xiaomi.com>
---
kernel/sched/ext.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 088ceff38c8a..72bf2ad382fb 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -688,12 +688,23 @@ struct sched_ext_ops {
* 2_500_000. @cgrp is entitled to 2.5 CPUs. @burst_us can be
* interpreted in the same fashion and specifies how much @cgrp can
* burst temporarily. The specific control mechanism and thus the
- * interpretation of @period_us and burstiness is upto to the BPF
+ * interpretation of @period_us and burstiness is up to the BPF
* scheduler.
*/
void (*cgroup_set_bandwidth)(struct cgroup *cgrp,
u64 period_us, u64 quota_us, u64 burst_us);
+ /**
+ * @cgroup_set_idle: A cgroup's idle state is being changed
+ * @cgrp: cgroup whose idle state is being updated
+ * @idle: whether the cgroup is entering or exiting idle state
+ *
+ * Update @cgrp's idle state to @idle. This callback is invoked when
+ * a cgroup transitions between idle and non-idle states, allowing the
+ * BPF scheduler to adjust its behavior accordingly.
+ */
+ void (*cgroup_set_idle)(struct cgroup *cgrp, bool idle);
+
#endif /* CONFIG_EXT_GROUP_SCHED */
/*
@@ -4258,7 +4269,15 @@ void scx_group_set_weight(struct task_group *tg, unsigned long weight)
 void scx_group_set_idle(struct task_group *tg, bool idle)
 {
-	/* TODO: Implement ops->cgroup_set_idle() */
+	struct scx_sched *sch = scx_root;
+
+	percpu_down_read(&scx_cgroup_rwsem);
+
+	if (scx_cgroup_enabled && SCX_HAS_OP(sch, cgroup_set_idle))
+		SCX_CALL_OP(sch, SCX_KF_UNLOCKED, cgroup_set_idle, NULL,
+			    tg_cgrp(tg), idle);
+
+	percpu_up_read(&scx_cgroup_rwsem);
 }
void scx_group_set_bandwidth(struct task_group *tg,
@@ -6004,6 +6023,7 @@ static void sched_ext_ops__cgroup_move(struct task_struct *p, struct cgroup *fro
static void sched_ext_ops__cgroup_cancel_move(struct task_struct *p, struct cgroup *from, struct cgroup *to) {}
static void sched_ext_ops__cgroup_set_weight(struct cgroup *cgrp, u32 weight) {}
static void sched_ext_ops__cgroup_set_bandwidth(struct cgroup *cgrp, u64 period_us, u64 quota_us, u64 burst_us) {}
+static void sched_ext_ops__cgroup_set_idle(struct cgroup *cgrp, bool idle) {}
#endif
static void sched_ext_ops__cpu_online(s32 cpu) {}
static void sched_ext_ops__cpu_offline(s32 cpu) {}
@@ -6042,6 +6062,7 @@ static struct sched_ext_ops __bpf_ops_sched_ext_ops = {
.cgroup_cancel_move = sched_ext_ops__cgroup_cancel_move,
.cgroup_set_weight = sched_ext_ops__cgroup_set_weight,
.cgroup_set_bandwidth = sched_ext_ops__cgroup_set_bandwidth,
+ .cgroup_set_idle = sched_ext_ops__cgroup_set_idle,
#endif
.cpu_online = sched_ext_ops__cpu_online,
.cpu_offline = sched_ext_ops__cpu_offline,
--
2.43.0
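
The "check whether the scheduler implements the callback, then invoke it" pattern described in the commit message can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `mock_cgroup`, `mock_ops`, `sample_set_idle()` and `notify_set_idle()` are hypothetical names, a plain NULL check stands in for SCX_HAS_OP(), and the scx_cgroup_rwsem locking is omitted entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup; not the kernel type. */
struct mock_cgroup {
	int id;
};

/* Hypothetical ops table with one optional callback. */
struct mock_ops {
	/* May be left NULL by a scheduler that ignores idle changes. */
	void (*cgroup_set_idle)(struct mock_cgroup *cgrp, bool idle);
};

/* What the sample callback last observed. */
static int last_cgrp_id = -1;
static bool last_idle;
static int nr_calls;

/* Sample callback: only records its arguments. */
void sample_set_idle(struct mock_cgroup *cgrp, bool idle)
{
	last_cgrp_id = cgrp->id;
	last_idle = idle;
	nr_calls++;
}

/*
 * Invoke the callback only if the ops table provides one.
 * Returns true when the callback was called.
 */
bool notify_set_idle(const struct mock_ops *ops,
		     struct mock_cgroup *cgrp, bool idle)
{
	if (!ops || !ops->cgroup_set_idle)
		return false;

	ops->cgroup_set_idle(cgrp, idle);
	return true;
}
```

With an ops table that leaves `cgroup_set_idle` unset, `notify_set_idle()` does nothing and returns false. With `sample_set_idle` installed, the callback receives the cgroup and the new idle value.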
* Re: [PATCH] sched/ext: Implement cgroup_set_idle() callback
2025-09-30 6:10 [PATCH] sched/ext: Implement cgroup_set_idle() callback zhidao su
@ 2025-10-07 19:15 ` Tejun Heo
2025-10-08 2:09 ` [PATCH v2] sched/ext: Add tg->scx.idle which tracks the current state zhidao su
2025-10-08 2:09 ` [PATCH] " zhidao su
0 siblings, 2 replies; 5+ messages in thread
From: Tejun Heo @ 2025-10-07 19:15 UTC (permalink / raw)
To: zhidao su; +Cc: sched-ext, zhidao su
Hello,
On Tue, Sep 30, 2025 at 02:10:45PM +0800, zhidao su wrote:
> From: zhidao su <suzhidao@xiaomi.com>
>
> Implement the missing cgroup_set_idle() callback that was marked as a
> TODO. This allows BPF schedulers to be notified when a cgroup's idle
> state changes, enabling them to adjust their scheduling behavior
> accordingly.
>
> The implementation follows the same pattern as other cgroup callbacks
> like cgroup_set_weight() and cgroup_set_bandwidth(). It checks if the
> BPF scheduler has implemented the callback and invokes it with the
> appropriate parameters.
Can you please add tg->scx.idle which tracks the current state?
Thanks.
--
tejun
* Re: [PATCH v2] sched/ext: Add tg->scx.idle which tracks the current state
2025-10-07 19:15 ` Tejun Heo
@ 2025-10-08 2:09 ` zhidao su
2025-10-08 2:09 ` [PATCH] " zhidao su
1 sibling, 0 replies; 5+ messages in thread
From: zhidao su @ 2025-10-08 2:09 UTC (permalink / raw)
To: tj; +Cc: sched-ext, suzhidao
v2: Add tg->scx.idle which tracks the current state
FYI, thanks
* [PATCH] sched/ext: Add tg->scx.idle which tracks the current state
2025-10-07 19:15 ` Tejun Heo
2025-10-08 2:09 ` [PATCH v2] sched/ext: Add tg->scx.idle which tracks the current state zhidao su
@ 2025-10-08 2:09 ` zhidao su
2025-10-08 18:20 ` Tejun Heo
1 sibling, 1 reply; 5+ messages in thread
From: zhidao su @ 2025-10-08 2:09 UTC (permalink / raw)
To: tj; +Cc: sched-ext, suzhidao, zhidao su
Add an idle field to the scx_task_group structure to track the current
idle state of a task group. This field is initialized to false in
scx_tg_init() and updated in scx_group_set_idle() when the idle state
changes.
This allows BPF schedulers to check the current idle state of a task
group directly from the scx_task_group structure.
v2: Add tg->scx.idle which tracks the current state
Signed-off-by: zhidao su <soolaugust@gmail.com>
---
include/linux/sched/ext.h | 1 +
kernel/sched/ext.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index 7047101dbf58..b65e9abafcb6 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -224,6 +224,7 @@ struct scx_task_group {
u64 bw_period_us;
u64 bw_quota_us;
u64 bw_burst_us;
+ bool idle;
#endif
};
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 72bf2ad382fb..a2bbcc34e5d5 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4103,6 +4103,7 @@ void scx_tg_init(struct task_group *tg)
tg->scx.weight = CGROUP_WEIGHT_DFL;
tg->scx.bw_period_us = default_bw_period_us();
tg->scx.bw_quota_us = RUNTIME_INF;
+ tg->scx.idle = false;
}
int scx_tg_online(struct task_group *tg)
@@ -4273,6 +4274,9 @@ void scx_group_set_idle(struct task_group *tg, bool idle)
 	percpu_down_read(&scx_cgroup_rwsem);
+	/* Update the task group's idle state */
+	tg->scx.idle = idle;
+
 	if (scx_cgroup_enabled && SCX_HAS_OP(sch, cgroup_set_idle))
 		SCX_CALL_OP(sch, SCX_KF_UNLOCKED, cgroup_set_idle, NULL,
 			    tg_cgrp(tg), idle);
--
2.43.0
* Re: [PATCH] sched/ext: Add tg->scx.idle which tracks the current state
2025-10-08 2:09 ` [PATCH] " zhidao su
@ 2025-10-08 18:20 ` Tejun Heo
0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2025-10-08 18:20 UTC (permalink / raw)
To: zhidao su; +Cc: sched-ext, suzhidao
On Wed, Oct 08, 2025 at 10:09:39AM +0800, zhidao su wrote:
> Add an idle field to the scx_task_group structure to track the current
> idle state of a task group. This field is initialized to false in
> scx_tg_init() and updated in scx_group_set_idle() when the idle state
> changes.
>
> This allows BPF schedulers to check the current idle state of a task
> group directly from the scx_task_group structure.
>
> v2: Add tg->scx.idle which tracks the current state
Hmm... maybe a blank line here? I don't know whether there actually is an
established convention but I don't think we usually put version delta
descriptions in the tag section.
> Signed-off-by: zhidao su <soolaugust@gmail.com>
...
> @@ -4273,6 +4274,9 @@ void scx_group_set_idle(struct task_group *tg, bool idle)
>
> percpu_down_read(&scx_cgroup_rwsem);
>
> + /* Update the task group's idle state */
> + tg->scx.idle = idle;
> +
> if (scx_cgroup_enabled && SCX_HAS_OP(sch, cgroup_set_idle))
> SCX_CALL_OP(sch, SCX_KF_UNLOCKED, cgroup_set_idle, NULL,
> tg_cgrp(tg), idle);
This can be either way but scx_group_set_weight() sets the weight after
calling ops.cgroup_set_weight(), so let's match that.
Thanks.
--
tejun
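
The ordering requested in the review (invoke the callback first, then record the new state, as scx_group_set_weight() does for the weight) can be sketched in userspace as follows. This is an illustration under assumptions, not the eventual patch: `mock_tg`, `mock_ops`, `observe_set_idle()` and `mock_group_set_idle()` are hypothetical names, the callback takes the mock task group directly rather than a cgroup, and the locking and the scx_cgroup_enabled check are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task group; idle mirrors tg->scx.idle. */
struct mock_tg {
	bool idle;
};

/* Hypothetical ops table with an optional idle callback. */
struct mock_ops {
	void (*cgroup_set_idle)(struct mock_tg *tg, bool idle);
};

/* State seen by the sample callback while it runs. */
static bool seen_old_state;
static bool seen_new_state;

/* Sample callback: records the stored state and the incoming value. */
void observe_set_idle(struct mock_tg *tg, bool idle)
{
	seen_old_state = tg->idle;
	seen_new_state = idle;
}

/*
 * Call the optional callback first, then store the new state, so the
 * callback still sees the previous value in tg->idle.
 */
void mock_group_set_idle(const struct mock_ops *ops,
			 struct mock_tg *tg, bool idle)
{
	if (ops && ops->cgroup_set_idle)
		ops->cgroup_set_idle(tg, idle);

	tg->idle = idle;
}
```

One consequence of this ordering is that a callback can compare the stored value with the incoming one to tell whether the state is actually changing. The stored value is updated even when no callback is installed.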