Sched_ext development
* [PATCH sched_ext/for-7.1-fixes] sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
@ 2026-05-11 23:18 Tejun Heo
  2026-05-12 21:33 ` Tejun Heo
  2026-05-13  4:00 ` sashiko-bot
  0 siblings, 2 replies; 3+ messages in thread
From: Tejun Heo @ 2026-05-11 23:18 UTC
  To: David Vernet, Andrea Righi, Changwoo Min
  Cc: sched-ext, Emil Tsalapatis, linux-kernel

scx_sub_enable_workfn() pins parent->kobj before dropping scx_sched_lock,
but that does not pin parent->sub_kset. A concurrent disable can call
kset_unregister() and free sub_kset before scx_alloc_and_add_sched()
dereferences it.

Split the sub_kset teardown: kobject_del() stays in the disable path so sysfs
removal still happens there; kobject_put() is deferred to
scx_sched_free_rcu_work() so the memory survives. A racing
child sees state_in_sysfs=0 with valid memory, sysfs_create_dir() fails, and
the existing exit_kind gate in scx_link_sched() turns it away with -ENOENT.

Fixes: 411d3ef1a705 ("sched_ext: Unregister sub_kset on scheduler disable")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1fb3329ae2b4..e18c0abbc575 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4829,6 +4829,8 @@ static void scx_sched_free_rcu_work(struct work_struct *work)
 	kfree(sch->cgrp_path);
 	if (sch_cgroup(sch))
 		cgroup_put(sch_cgroup(sch));
+	if (sch->sub_kset)
+		kobject_put(&sch->sub_kset->kobj);
 #endif	/* CONFIG_EXT_SUB_SCHED */

 	for_each_possible_cpu(cpu) {
@@ -5869,7 +5871,7 @@ static void scx_sub_disable(struct scx_sched *sch)
 	if (sch->ops.exit)
 		SCX_CALL_OP(sch, exit, NULL, sch->exit_info);
 	if (sch->sub_kset)
-		kset_unregister(sch->sub_kset);
+		kobject_del(&sch->sub_kset->kobj);
 	kobject_del(&sch->kobj);
 }
 #else	/* CONFIG_EXT_SUB_SCHED */
@@ -6003,7 +6005,7 @@ static void scx_root_disable(struct scx_sched *sch)
 	 */
 #ifdef CONFIG_EXT_SUB_SCHED
 	if (sch->sub_kset)
-		kset_unregister(sch->sub_kset);
+		kobject_del(&sch->sub_kset->kobj);
 #endif
 	kobject_del(&sch->kobj);



* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
From: Tejun Heo @ 2026-05-12 21:33 UTC
  To: David Vernet, Andrea Righi, Changwoo Min
  Cc: sched-ext, Emil Tsalapatis, linux-kernel

Hello,

Applied to sched_ext/for-7.1-fixes.

Thanks.

--
tejun


* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work
From: sashiko-bot @ 2026-05-13  4:00 UTC
  To: Tejun Heo; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Relying on sysfs_create_dir() failure triggers a user-triggerable kernel log spam (pr_err).
- [Medium] Inconsistent error handling paths for the same race condition.
--

commit e9f250877d2f7deea4cca723a738f6925005797e
Author: Tejun Heo <tj@kernel.org>

sched_ext: Defer sub_kset base put to scx_sched_free_rcu_work

This commit splits the sub_kset teardown by moving kobject_put() to
scx_sched_free_rcu_work() while keeping kobject_del() in the disable
path. This ensures the memory remains valid during concurrent disable
and enable operations, relying on sysfs creation failure to catch races.
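
For readers unfamiliar with the kobject lifetime rules, the split can be
modelled in plain userspace C. This is only a single-threaded sketch under
stated assumptions: struct toy_kobj and the toy_*() helpers are hypothetical
stand-ins invented for illustration, not kernel APIs, and the model ignores
locking and RCU entirely. It relies on one fact about the real API:
kset_unregister() is kobject_del() followed by kobject_put(), so replacing it
with kobject_del() alone leaves the base reference to be dropped later.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject/kset: a refcount plus a visibility flag. */
struct toy_kobj {
	int refcount;	/* models the kobject reference count */
	bool in_sysfs;	/* models state_in_sysfs */
	bool *freed;	/* set on release so the demo can observe the free */
};

static struct toy_kobj *toy_create(bool *freed)
{
	struct toy_kobj *k = malloc(sizeof(*k));

	if (!k)
		return NULL;
	k->refcount = 1;	/* the "base" reference */
	k->in_sysfs = true;
	k->freed = freed;
	*freed = false;
	return k;
}

/* Models kobject_del(): the object leaves sysfs but its memory stays valid. */
static void toy_del(struct toy_kobj *k)
{
	k->in_sysfs = false;
}

/* Models kobject_put(): drop one reference and free on the last one. */
static void toy_put(struct toy_kobj *k)
{
	if (--k->refcount == 0) {
		*k->freed = true;
		free(k);
	}
}

/* Models a child registering under @parent; fails once the parent is deleted. */
static int toy_add_child(const struct toy_kobj *parent)
{
	return parent->in_sysfs ? 0 : -ENOENT;
}

/* Walks the ordering the patch sets up and returns what the late child sees. */
int toy_demo(void)
{
	bool freed;
	struct toy_kobj *sub_kset = toy_create(&freed);
	int ret;

	if (!sub_kset)
		return -ENOMEM;

	assert(toy_add_child(sub_kset) == 0);	/* before disable: succeeds */

	toy_del(sub_kset);			/* disable path: del only */
	assert(!freed);				/* memory is still valid */

	ret = toy_add_child(sub_kset);		/* late child: safe to look */

	toy_put(sub_kset);			/* deferred base put */
	assert(freed);
	return ret;
}
```

In this model toy_demo() returns -ENOENT: the late child can still
dereference the parent safely and is refused, instead of touching freed
memory as it could if the del and the put happened together at disable time.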

> A racing child sees state_in_sysfs=0 with valid memory, sysfs_create_dir() fails, and
> the existing exit_kind gate in scx_link_sched() turns it away with -ENOENT.

Does this accurately describe the error path? If sysfs_create_dir() fails,
wouldn't scx_alloc_and_add_sched() return ERR_PTR(-ENOENT) and jump directly
to out_unlock in scx_sub_enable_workfn(), completely bypassing
scx_link_sched()?

Also, does this create an inconsistent API contract for userspace? If the
sysfs creation fails, userspace receives a synchronous -ENOENT. However, if
sysfs creation succeeds but the race is caught later by the exit_kind gate in
scx_link_sched(), the error path jumps to err_disable which sets cmd->ret = 0,
resulting in an asynchronous failure instead.

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> @@ -5801,7 +5803,7 @@ static void scx_sub_disable(struct scx_sched *sch)
>  	if (sch->ops.exit)
>  		SCX_CALL_OP(sch, exit, NULL, sch->exit_info);
>  	if (sch->sub_kset)
> -		kset_unregister(sch->sub_kset);
> +		kobject_del(&sch->sub_kset->kobj);
>  	kobject_del(&sch->kobj);
>  }

Will relying on sysfs failure here cause user-triggerable kernel log spam?

When the parent's sysfs directory is removed via kobject_del(), its ->sd
pointer is set to NULL. When the racing child subsequently calls
kobject_init_and_add() and sysfs_create_dir_ns() fails with -ENOENT,
kobject_add_internal() will unconditionally print a pr_err() about the failure.

Since this is an expected race condition, is it safe to handle it by letting
sysfs fail and print an error to dmesg?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/2f2b7481ef217ad89c5d90fd56fc0c01@kernel.org?part=1
