* [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks
@ 2026-05-17 0:41 Tejun Heo
2026-05-17 10:56 ` Andrea Righi
2026-05-17 17:43 ` [PATCH v2 " Tejun Heo
0 siblings, 2 replies; 7+ messages in thread
From: Tejun Heo @ 2026-05-17 0:41 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min
Cc: sched-ext, Emil Tsalapatis, linux-kernel
scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to
clear [__]scx_switching_all. task_should_scx() short-circuits on DISABLING,
so forks in that window land on fair while next_active_class() still skips
fair - the new tasks stall.
This can deadlock the disable path itself: scx_alloc_and_add_sched() runs
under scx_enable_mutex and creates a helper kthread; if that new kthread is
one of the stalled fair tasks, the mutex holder waits forever and
scx_root_disable() can never make progress. Only sub-sched support exposes
this, since sub-sched enables are the only path where
scx_alloc_and_add_sched() can race the root's disable.
Move the DISABLING check after @scx_switching_all so that whenever
@scx_switching_all is set, forks keep going to scx and stay in lockstep with
__scx_switched_all. Once both are cleared (together under the mutex),
DISABLING applies normally.
Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5092,10 +5092,31 @@ static const struct kset_uevent_ops scx_
*/
bool task_should_scx(int policy)
{
- if (!scx_enabled() || unlikely(scx_enable_state() == SCX_DISABLING))
+ /* if disabled, nothing should be on it */
+ if (!scx_enabled())
return false;
+
+ /* scx is taking over all SCHED_OTHER and SCHED_EXT tasks */
if (READ_ONCE(scx_switching_all))
return true;
+
+ /*
+ * scx is tearing down - keep new SCHED_EXT tasks out.
+ *
+ * Must come after scx_switching_all test. While both are set, we must
+ * return true via the branch above: [__]scx_switching_all are cleared
+ * together under scx_enable_mutex, and a fork routed to fair while
+ * __scx_switched_all is still on would stall because
+ * next_active_class() skips fair.
+ *
+ * This can develop into a deadlock - scx holds scx_enable_mutex across
+ * kthread_create() in scx_alloc_and_add_sched(); if the new kthread is
+ * the stalled task, the disable path can never grab the mutex to clear
+ * scx_switching_all.
+ */
+ if (unlikely(scx_enable_state() == SCX_DISABLING))
+ return false;
+
return policy == SCHED_EXT;
}
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks 2026-05-17 0:41 [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks Tejun Heo @ 2026-05-17 10:56 ` Andrea Righi 2026-05-17 17:25 ` Tejun Heo 2026-05-17 17:43 ` [PATCH v2 " Tejun Heo 1 sibling, 1 reply; 7+ messages in thread From: Andrea Righi @ 2026-05-17 10:56 UTC (permalink / raw) To: Tejun Heo Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis, linux-kernel Hi Tejun, On Sat, May 16, 2026 at 02:41:20PM -1000, Tejun Heo wrote: > scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to > clear [__]scx_switching_all. task_should_scx() short-circuits on DISABLING, > so forks in that window land on fair while next_active_class() still skips > fair - the new tasks stall. > > This can deadlock the disable path itself: scx_alloc_and_add_sched() runs > under scx_enable_mutex and creates a helper kthread; if that new kthread is > one of the stalled fair tasks, the mutex holder waits forever and > scx_root_disable() can never make progress. Only sub-sched support exposes > this, since sub-sched enables are the only path where > scx_alloc_and_add_sched() can race the root's disable. > > Move the DISABLING check after @scx_switching_all so that whenever > @scx_switching_all is set, forks keep going to scx and stay in lockstep with > __scx_switched_all. Once both are cleared (together under the mutex), > DISABLING applies normally. > > Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling") > Signed-off-by: Tejun Heo <tj@kernel.org> > --- > kernel/sched/ext.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > --- a/kernel/sched/ext.c > +++ b/kernel/sched/ext.c > @@ -5092,10 +5092,31 @@ static const struct kset_uevent_ops scx_ > */ > bool task_should_scx(int policy) > { > - if (!scx_enabled() || unlikely(scx_enable_state() == SCX_DISABLING)) > + /* if disabled, nothing should be on it */ > + if (!scx_enabled()) > return false; > + > + /* scx is taking over all SCHED_OTHER and SCHED_EXT tasks */ > if (READ_ONCE(scx_switching_all)) > return true; > + > + /* > + * scx is tearing down - keep new SCHED_EXT tasks out. > + * > + * Must come after scx_switching_all test. While both are set, we must > + * return true via the branch above: [__]scx_switching_all are cleared > + * together under scx_enable_mutex, and a fork routed to fair while > + * __scx_switched_all is still on would stall because > + * next_active_class() skips fair. Just being extra picky: [__]scx_switching_all are cleared together sequentially, but not atomically (in fact the order is what matters). To make it more clear, how about rephrasing the comment block above like this: * Must come after the scx_switching_all test. scx_root_disable() * clears __scx_switched_all before scx_switching_all (both under * scx_enable_mutex), so while scx_switching_all is observed as true, * __scx_switched_all may still be on. A fork routed to fair in that * window would stall because next_active_class() skips fair. > + * > + * This can develop into a deadlock - scx holds scx_enable_mutex across > + * kthread_create() in scx_alloc_and_add_sched(); if the new kthread is > + * the stalled task, the disable path can never grab the mutex to clear > + * scx_switching_all. > + */ > + if (unlikely(scx_enable_state() == SCX_DISABLING)) > + return false; > + > return policy == SCHED_EXT; > } > Other than that, looks good to me. Reviewed-by: Andrea Righi <arighi@nvidia.com> Thanks, -Andrea ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks 2026-05-17 10:56 ` Andrea Righi @ 2026-05-17 17:25 ` Tejun Heo 0 siblings, 0 replies; 7+ messages in thread From: Tejun Heo @ 2026-05-17 17:25 UTC (permalink / raw) To: Andrea Righi Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis, linux-kernel Hello, On Sun, May 17, 2026 at 12:56:34PM +0200, Andrea Righi wrote: > > + * Must come after scx_switching_all test. While both are set, we must > > + * return true via the branch above: [__]scx_switching_all are cleared > > + * together under scx_enable_mutex, and a fork routed to fair while > > + * __scx_switched_all is still on would stall because > > + * next_active_class() skips fair. > > Just being extra picky: [__]scx_switching_all are cleared together sequentially, > but not atomically (in fact the order is what matters). To make it more clear, > how about rephrasing the comment block above like this: > > * Must come after the scx_switching_all test. scx_root_disable() > * clears __scx_switched_all before scx_switching_all (both under > * scx_enable_mutex), so while scx_switching_all is observed as true, > * __scx_switched_all may still be on. A fork routed to fair in that > * window would stall because next_active_class() skips fair. Hmm... I don't think the ordering between scx_switching_all and __scx_switching_all matters here. The stall is caused by the gap between the earlier DISABLING transition and __scx_switching_all being turned off which here is tested through scx_switching_all and at this point as the mutex is already held, even if you swapped scx_switching_all's position with __scx_switching_all, it wouldn't matter. It's just kinda confusing because what's actually involved in the stall and deadlock is __scx_switching_all but we're testing it via scx_switching_all. I'll update the comment so that it just mentions __scx_switching_all. I'm not even sure we actually need scx_switching_all. Thanks. -- tejun ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks 2026-05-17 0:41 [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks Tejun Heo 2026-05-17 10:56 ` Andrea Righi @ 2026-05-17 17:43 ` Tejun Heo 2026-05-17 18:47 ` Andrea Righi 2026-05-17 19:15 ` Tejun Heo 1 sibling, 2 replies; 7+ messages in thread From: Tejun Heo @ 2026-05-17 17:43 UTC (permalink / raw) To: David Vernet, Andrea Righi, Changwoo Min Cc: sched-ext, Emil Tsalapatis, linux-kernel scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to clear __scx_switched_all and scx_switching_all. task_should_scx() short-circuits on DISABLING, so forks in that window land on fair while next_active_class() still skips fair - the new tasks stall. This can deadlock the disable path itself: scx_alloc_and_add_sched() runs under scx_enable_mutex and creates a helper kthread; if that new kthread is one of the stalled fair tasks, the mutex holder waits forever and scx_root_disable() can never make progress. Only sub-sched support exposes this, since sub-sched enables are the only path where scx_alloc_and_add_sched() can race the root's disable. Move the DISABLING check after @scx_switching_all. @scx_switching_all serves as a proxy for __scx_switched_all, so while it's set, forks keep going to scx. Once cleared, DISABLING applies normally. v2: Reword in-source comment and description. (Andrea) Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> --- kernel/sched/ext.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -5092,10 +5092,30 @@ static const struct kset_uevent_ops scx_ */ bool task_should_scx(int policy) { - if (!scx_enabled() || unlikely(scx_enable_state() == SCX_DISABLING)) + /* if disabled, nothing should be on it */ + if (!scx_enabled()) return false; + + /* scx is taking over all SCHED_OTHER and SCHED_EXT tasks */ if (READ_ONCE(scx_switching_all)) return true; + + /* + * scx is tearing down - keep new SCHED_EXT tasks out. + * + * Must come after scx_switching_all test, which serves as a proxy + * for __scx_switched_all. While __scx_switched_all is set, we must + * return true via the branch above: a fork routed to fair would + * stall because next_active_class() skips fair. + * + * This can develop into a deadlock - scx holds scx_enable_mutex across + * kthread_create() in scx_alloc_and_add_sched(); if the new kthread is + * the stalled task, the disable path can never grab the mutex to clear + * scx_switching_all. + */ + if (unlikely(scx_enable_state() == SCX_DISABLING)) + return false; + return policy == SCHED_EXT; } ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks 2026-05-17 17:43 ` [PATCH v2 " Tejun Heo @ 2026-05-17 18:47 ` Andrea Righi 2026-05-17 19:08 ` Tejun Heo 2026-05-17 19:15 ` Tejun Heo 1 sibling, 1 reply; 7+ messages in thread From: Andrea Righi @ 2026-05-17 18:47 UTC (permalink / raw) To: Tejun Heo Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis, linux-kernel Hi Tejun, On Sun, May 17, 2026 at 07:43:16AM -1000, Tejun Heo wrote: > scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to > clear __scx_switched_all and scx_switching_all. task_should_scx() short-circuits on DISABLING, > so forks in that window land on fair while next_active_class() still skips > fair - the new tasks stall. > > This can deadlock the disable path itself: scx_alloc_and_add_sched() runs > under scx_enable_mutex and creates a helper kthread; if that new kthread is > one of the stalled fair tasks, the mutex holder waits forever and > scx_root_disable() can never make progress. Only sub-sched support exposes > this, since sub-sched enables are the only path where > scx_alloc_and_add_sched() can race the root's disable. > > Move the DISABLING check after @scx_switching_all. @scx_switching_all > serves as a proxy for __scx_switched_all, so while it's set, forks keep > going to scx. Once cleared, DISABLING applies normally. > > v2: Reword in-source comment and description. (Andrea) > > Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling") > Signed-off-by: Tejun Heo <tj@kernel.org> > Reviewed-by: Andrea Righi <arighi@nvidia.com> > --- > kernel/sched/ext.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) > > --- a/kernel/sched/ext.c > +++ b/kernel/sched/ext.c > @@ -5092,10 +5092,30 @@ static const struct kset_uevent_ops scx_ > */ > bool task_should_scx(int policy) > { > - if (!scx_enabled() || unlikely(scx_enable_state() == SCX_DISABLING)) > + /* if disabled, nothing should be on it */ > + if (!scx_enabled()) > return false; > + > + /* scx is taking over all SCHED_OTHER and SCHED_EXT tasks */ > if (READ_ONCE(scx_switching_all)) > return true; > + > + /* > + * scx is tearing down - keep new SCHED_EXT tasks out. > + * > + * Must come after scx_switching_all test, which serves as a proxy > + * for __scx_switched_all. While __scx_switched_all is set, we must > + * return true via the branch above: a fork routed to fair would > + * stall because next_active_class() skips fair. > + * > + * This can develop into a deadlock - scx holds scx_enable_mutex across > + * kthread_create() in scx_alloc_and_add_sched(); if the new kthread is > + * the stalled task, the disable path can never grab the mutex to clear > + * scx_switching_all. > + */ Yeah, this is much better than my comment (that was quite confusing). To make sure I understand: what fixes the deadlock is checking scx_switching_all before DISABLING in task_should_scx(), because in this way the sched_ext_helper kthread goes to scx (not fair), runs, the enable path completes, releases the mutex and the disable path moves forward. When I wrote my comment I was looking at the ordering of [__]scx_switched_all in scx_root_disable(): static_branch_disable(&__scx_switched_all); WRITE_ONCE(scx_switching_all, false); And I was wondering, if we invert those we'd have a similar issue: a small window where __scx_switched_all == ON and scx_switching_all == false. But the current order is already the safe one, so no change needed. Thanks, -Andrea > + if (unlikely(scx_enable_state() == SCX_DISABLING)) > + return false; > + > return policy == SCHED_EXT; > } > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks 2026-05-17 18:47 ` Andrea Righi @ 2026-05-17 19:08 ` Tejun Heo 0 siblings, 0 replies; 7+ messages in thread From: Tejun Heo @ 2026-05-17 19:08 UTC (permalink / raw) To: Andrea Righi Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis, linux-kernel Hello, On Sun, May 17, 2026 at 08:47:31PM +0200, Andrea Righi wrote: ... > Yeah, this is much better than my comment (that was quite confusing). > > To make sure I understand: what fixes the deadlock is checking scx_switching_all > before DISABLING in task_should_scx(), because in this way the sched_ext_helper > kthread goes to scx (not fair), runs, the enable path completes, releases the > mutex and the disable path moves forward. > > When I wrote my comment I was looking at the ordering of [__]scx_switched_all in > scx_root_disable(): > > static_branch_disable(&__scx_switched_all); > WRITE_ONCE(scx_switching_all, false); > > And I was wondering, if we invert those we'd have a similar issue: a small > window where __scx_switched_all == ON and scx_switching_all == false. But the > current order is already the safe one, so no change needed. Yeah, and even if create that window between __scx_switched_all and scx_switching_all, it's transient. Let's say a task slips into eevdf between the two. The task has no way of preventing disable from completing __scx_switched_all transition, and the condition would unwind. The problem with DISABLING transition was that it could make a racing enable path to wait for kthread creation to finish while holding enable_mutex. Because disable path needs the same mutex to turn off __scx_switched_all and the stalled task needs __scx_switched_all to be turned off to progress, we end up in a deadlock. Thanks. -- tejun ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks 2026-05-17 17:43 ` [PATCH v2 " Tejun Heo 2026-05-17 18:47 ` Andrea Righi @ 2026-05-17 19:15 ` Tejun Heo 1 sibling, 0 replies; 7+ messages in thread From: Tejun Heo @ 2026-05-17 19:15 UTC (permalink / raw) To: David Vernet, Andrea Righi, Changwoo Min Cc: sched-ext, Emil Tsalapatis, linux-kernel Hello, Applied to sched_ext/for-7.1-fixes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-17 19:15 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-17 0:41 [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks Tejun Heo 2026-05-17 10:56 ` Andrea Righi 2026-05-17 17:25 ` Tejun Heo 2026-05-17 17:43 ` [PATCH v2 " Tejun Heo 2026-05-17 18:47 ` Andrea Righi 2026-05-17 19:08 ` Tejun Heo 2026-05-17 19:15 ` Tejun Heo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.