* [PATCH sched_ext/for-7.1-fixes] sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
@ 2026-05-11 23:18 Tejun Heo
2026-05-12 21:33 ` Tejun Heo
2026-05-13 4:00 ` sashiko-bot
0 siblings, 2 replies; 3+ messages in thread
From: Tejun Heo @ 2026-05-11 23:18 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, Emil Tsalapatis, linux-kernel
scx_sub_enable_workfn() pins parent->kobj before dropping scx_sched_lock,
but that does not pin parent->sub_kset. Concurrent disable can
kset_unregister and free sub_kset before scx_alloc_and_add_sched()
dereferences it.
Split sub_kset teardown: kobject_del() at disable keeps sysfs removal; defer
kobject_put() to scx_sched_free_rcu_work so the memory survives. A racing
child sees state_in_sysfs=0 with valid memory, sysfs_create_dir() fails, and
the existing exit_kind gate in scx_link_sched() turns it away with -ENOENT.
Fixes: 411d3ef1a705 ("sched_ext: Unregister sub_kset on scheduler disable")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1fb3329ae2b4..e18c0abbc575 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4829,6 +4829,8 @@ static void scx_sched_free_rcu_work(struct work_struct *work)
kfree(sch->cgrp_path);
if (sch_cgroup(sch))
cgroup_put(sch_cgroup(sch));
+ if (sch->sub_kset)
+ kobject_put(&sch->sub_kset->kobj);
#endif /* CONFIG_EXT_SUB_SCHED */
for_each_possible_cpu(cpu) {
@@ -5869,7 +5871,7 @@ static void scx_sub_disable(struct scx_sched *sch)
if (sch->ops.exit)
SCX_CALL_OP(sch, exit, NULL, sch->exit_info);
if (sch->sub_kset)
- kset_unregister(sch->sub_kset);
+ kobject_del(&sch->sub_kset->kobj);
kobject_del(&sch->kobj);
}
#else /* CONFIG_EXT_SUB_SCHED */
@@ -6003,7 +6005,7 @@ static void scx_root_disable(struct scx_sched *sch)
*/
#ifdef CONFIG_EXT_SUB_SCHED
if (sch->sub_kset)
- kset_unregister(sch->sub_kset);
+ kobject_del(&sch->sub_kset->kobj);
#endif
kobject_del(&sch->kobj);
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
From: Tejun Heo @ 2026-05-12 21:33 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, Emil Tsalapatis, linux-kernel
Hello,
Applied to sched_ext/for-7.1-fixes.
Thanks.
--
tejun
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
From: sashiko-bot @ 2026-05-13 4:00 UTC (permalink / raw)
To: Tejun Heo; +Cc: sched-ext
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Relying on sysfs_create_dir() failure causes user-triggerable kernel log spam (pr_err).
- [Medium] Inconsistent error handling paths for the same race condition.
--
commit e9f250877d2f7deea4cca723a738f6925005797e
Author: Tejun Heo <tj@kernel.org>
sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
This commit splits the sub_kset teardown by moving kobject_put() to
scx_sched_free_rcu_work() while keeping kobject_del() in the disable
path. This ensures the memory remains valid during concurrent disable
and enable operations, relying on sysfs creation failure to catch races.
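As a rough illustration of the del/put split summarized above, here is a
minimal userspace sketch. It is a toy model, not the kernel's kobject code:
struct toy_kobj and the toy_*() helpers are hypothetical names invented for
this example, and the single in_sysfs flag only approximates state_in_sysfs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a kobject: a refcount plus an "in sysfs" flag. */
struct toy_kobj {
	int refcount;
	bool in_sysfs;
	bool *freed;		/* set when the last reference is dropped */
};

static struct toy_kobj *toy_create(bool *freed)
{
	struct toy_kobj *k = malloc(sizeof(*k));

	if (!k)
		return NULL;
	k->refcount = 1;
	k->in_sysfs = true;
	k->freed = freed;
	*freed = false;
	return k;
}

/* Models kobject_del(): unlink from the tree, keep the memory. */
static void toy_del(struct toy_kobj *k)
{
	k->in_sysfs = false;
}

/* Models kobject_put(): drop a reference, free on the last one. */
static void toy_put(struct toy_kobj *k)
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		*k->freed = true;
		free(k);
	}
}

/* Models a child trying to register under @parent. */
static int toy_add_child(const struct toy_kobj *parent)
{
	return parent->in_sysfs ? 0 : -ENOENT;
}
```

In this model, calling toy_del() at disable time and toy_put() only from the
deferred free path means a late toy_add_child() reads valid memory and gets
-ENOENT, rather than dereferencing a freed object.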
> A racing child sees state_in_sysfs=0 with valid memory, sysfs_create_dir() fails, and
> the existing exit_kind gate in scx_link_sched() turns it away with -ENOENT.
Does this accurately describe the error path? If sysfs_create_dir() fails,
wouldn't scx_alloc_and_add_sched() return ERR_PTR(-ENOENT) and jump directly
to out_unlock in scx_sub_enable_workfn(), completely bypassing
scx_link_sched()?
Also, does this create an inconsistent API contract for userspace? If the
sysfs creation fails, userspace receives a synchronous -ENOENT. However, if
sysfs creation succeeds but the race is caught later by the exit_kind gate in
scx_link_sched(), the error path jumps to err_disable which sets cmd->ret = 0,
resulting in an asynchronous failure instead.
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> @@ -5801,7 +5803,7 @@ static void scx_sub_disable(struct scx_sched *sch)
> if (sch->ops.exit)
> SCX_CALL_OP(sch, exit, NULL, sch->exit_info);
> if (sch->sub_kset)
> - kset_unregister(sch->sub_kset);
> + kobject_del(&sch->sub_kset->kobj);
> kobject_del(&sch->kobj);
> }
Will relying on sysfs failure here cause user-triggerable kernel log spam?
When the parent's sysfs directory is removed via kobject_del(), its ->sd
pointer is set to NULL. When the racing child subsequently calls
kobject_init_and_add() and sysfs_create_dir_ns() fails with -ENOENT,
kobject_add_internal() will unconditionally print a pr_err() about the failure.
Since this is an expected race condition, is it safe to handle it by letting
sysfs fail and print an error to dmesg?
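To make the log spam concern concrete, the sketch below models one
conceivable alternative: consult a teardown gate before attempting the add,
so the expected race fails quietly. This is an assumption-laden toy, not a
proposed patch. struct toy_parent, the exiting flag (a hypothetical stand-in
for the exit_kind gate) and the toy_err_logs counter (standing in for
pr_err()) are all invented here, and the model is single-threaded, so it
says nothing about which lock such a check would need in ext.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_parent {
	bool in_sysfs;	/* cleared by the modeled kobject_del() */
	bool exiting;	/* hypothetical stand-in for the exit_kind gate */
};

static int toy_err_logs;	/* counts modeled pr_err() calls */

/* Models the add path: a missing parent directory fails loudly. */
static int toy_add_child(const struct toy_parent *p)
{
	if (!p->in_sysfs) {
		toy_err_logs++;
		return -ENOENT;
	}
	return 0;
}

/* Hypothetical variant: consult the gate first and fail quietly. */
static int toy_add_child_gated(const struct toy_parent *p)
{
	if (p->exiting)
		return -ENOENT;
	return toy_add_child(p);
}
```

Both helpers return -ENOENT for a parent that is being torn down; only the
ungated one bumps the error counter. Whether an equivalent early check is
feasible in scx_sub_enable_workfn() is not established by this thread.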
--
Sashiko AI review · https://sashiko.dev/#/patchset/2f2b7481ef217ad89c5d90fd56fc0c01@kernel.org?part=1