From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
sched-ext@lists.linux.dev, Emil Tsalapatis <emil@etsalapatis.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks
Date: Sun, 17 May 2026 20:47:31 +0200
Message-ID: <agoNQ7cKgkX1pWmo@gpd4>
In-Reply-To: <362a365eb559003ed21c6dac12d92c5d@kernel.org>

Hi Tejun,

On Sun, May 17, 2026 at 07:43:16AM -1000, Tejun Heo wrote:
> scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to
> clear __scx_switched_all and scx_switching_all. task_should_scx() short-circuits on DISABLING,
> so forks in that window land on fair while next_active_class() still skips
> fair - the new tasks stall.
>
> This can deadlock the disable path itself: scx_alloc_and_add_sched() runs
> under scx_enable_mutex and creates a helper kthread; if that new kthread is
> one of the stalled fair tasks, the mutex holder waits forever and
> scx_root_disable() can never make progress. Only sub-sched support exposes
> this, since sub-sched enables are the only path where
> scx_alloc_and_add_sched() can race the root's disable.
>
> Move the DISABLING check after @scx_switching_all. @scx_switching_all
> serves as a proxy for __scx_switched_all, so while it's set, forks keep
> going to scx. Once cleared, DISABLING applies normally.
>
> v2: Reword in-source comment and description. (Andrea)
>
> Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Andrea Righi <arighi@nvidia.com>
> ---
> kernel/sched/ext.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5092,10 +5092,30 @@ static const struct kset_uevent_ops scx_
> */
> bool task_should_scx(int policy)
> {
> - if (!scx_enabled() || unlikely(scx_enable_state() == SCX_DISABLING))
> + /* if disabled, nothing should be on it */
> + if (!scx_enabled())
> return false;
> +
> + /* scx is taking over all SCHED_OTHER and SCHED_EXT tasks */
> if (READ_ONCE(scx_switching_all))
> return true;
> +
> + /*
> + * scx is tearing down - keep new SCHED_EXT tasks out.
> + *
> + * Must come after scx_switching_all test, which serves as a proxy
> + * for __scx_switched_all. While __scx_switched_all is set, we must
> + * return true via the branch above: a fork routed to fair would
> + * stall because next_active_class() skips fair.
> + *
> + * This can develop into a deadlock - scx holds scx_enable_mutex across
> + * kthread_create() in scx_alloc_and_add_sched(); if the new kthread is
> + * the stalled task, the disable path can never grab the mutex to clear
> + * scx_switching_all.
> + */
Yeah, this is much better than my comment (which was quite confusing).

To make sure I understand: the deadlock is fixed by checking scx_switching_all
before DISABLING in task_should_scx(). That way the sched_ext_helper kthread
goes to scx (not fair) and runs, the enable path completes and releases the
mutex, and the disable path can move forward.
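
To illustrate that reading, here is a rough userspace model of the two check
orderings. It is only a sketch: struct model_scx and the model_* names are
hypothetical stand-ins, the three booleans merely approximate scx_enabled(),
the SCX_DISABLING state and scx_switching_all, and none of this is the kernel
code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduling policies; not kernel values. */
enum model_policy { MODEL_SCHED_OTHER, MODEL_SCHED_EXT };

/* Assumed, simplified view of the state task_should_scx() looks at. */
struct model_scx {
	bool enabled;        /* models scx_enabled() */
	bool disabling;      /* models scx_enable_state() == SCX_DISABLING */
	bool switching_all;  /* models scx_switching_all */
};

/* Pre-patch ordering: DISABLING short-circuits before switching_all. */
static bool model_should_scx_old(const struct model_scx *s,
				 enum model_policy policy)
{
	if (!s->enabled || s->disabling)
		return false;
	if (s->switching_all)
		return true;
	return policy == MODEL_SCHED_EXT;
}

/* Patched ordering: switching_all is tested before DISABLING. */
static bool model_should_scx_new(const struct model_scx *s,
				 enum model_policy policy)
{
	if (!s->enabled)
		return false;
	if (s->switching_all)
		return true;
	if (s->disabling)
		return false;
	return policy == MODEL_SCHED_EXT;
}
```

With enabled, disabling and switching_all all set (the window described in the
patch), the old ordering sends a forked SCHED_OTHER task to fair, while the new
ordering keeps it on scx until switching_all is cleared.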

When I wrote my comment I was looking at the order in which
scx_root_disable() clears __scx_switched_all and scx_switching_all:

	static_branch_disable(&__scx_switched_all);
	WRITE_ONCE(scx_switching_all, false);

I was wondering whether inverting those would cause a similar issue: a small
window where __scx_switched_all is still on and scx_switching_all is already
false. But the current order is already the safe one, so no change is needed.

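
The ordering argument can be checked with a tiny single-threaded model. This
is a sketch under simplifying assumptions: struct model_flags and the model_*
helpers are hypothetical, the two booleans only stand in for
__scx_switched_all and scx_switching_all, and memory ordering and real
concurrency are ignored. It just looks at the state between the two clears.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed, simplified stand-ins for the two flags. */
struct model_flags {
	bool switched_all;   /* models the __scx_switched_all static branch */
	bool switching_all;  /* models scx_switching_all */
};

/*
 * The problematic state: fair would still be skipped, but the proxy flag
 * already says "not switching all", so a fork could be routed to fair.
 */
static bool model_unsafe(struct model_flags f)
{
	return f.switched_all && !f.switching_all;
}

/*
 * Perform the two clears in the given order and report whether any state
 * along the way (initial, intermediate, final) is unsafe.
 */
static bool model_disable_has_window(bool clear_switched_first)
{
	struct model_flags f = { .switched_all = true, .switching_all = true };
	bool unsafe = model_unsafe(f);

	/* First clear. */
	if (clear_switched_first)
		f.switched_all = false;
	else
		f.switching_all = false;
	unsafe |= model_unsafe(f);

	/* Second clear: both flags are now off. */
	f.switched_all = false;
	f.switching_all = false;
	unsafe |= model_unsafe(f);

	return unsafe;
}
```

In this model, model_disable_has_window(true) (the current order) never sees
the unsafe state, while model_disable_has_window(false) (the inverted order)
does.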
Thanks,
-Andrea
> + if (unlikely(scx_enable_state() == SCX_DISABLING))
> + return false;
> +
> return policy == SCHED_EXT;
> }
>