From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	sched-ext@lists.linux.dev, Emil Tsalapatis <emil@etsalapatis.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Fix deadlock between scx_root_disable() and concurrent forks
Date: Sun, 17 May 2026 12:56:34 +0200
Message-ID: <agme4i4w5K6Ab9cM@gpd4>
In-Reply-To: <39ab37b4e79c6e5361a907c06ab27e72@kernel.org>

Hi Tejun,

On Sat, May 16, 2026 at 02:41:20PM -1000, Tejun Heo wrote:
> scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to
> clear [__]scx_switching_all. task_should_scx() short-circuits on DISABLING,
> so forks in that window land on fair while next_active_class() still skips
> fair - the new tasks stall.
> 
> This can deadlock the disable path itself: scx_alloc_and_add_sched() runs
> under scx_enable_mutex and creates a helper kthread; if that new kthread is
> one of the stalled fair tasks, the mutex holder waits forever and
> scx_root_disable() can never make progress. Only sub-sched support exposes
> this, since sub-sched enables are the only path where
> scx_alloc_and_add_sched() can race the root's disable.
> 
> Move the DISABLING check after @scx_switching_all so that whenever
> @scx_switching_all is set, forks keep going to scx and stay in lockstep with
> __scx_switched_all. Once both are cleared (together under the mutex),
> DISABLING applies normally.
> 
> Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/ext.c |   23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5092,10 +5092,31 @@ static const struct kset_uevent_ops scx_
>   */
>  bool task_should_scx(int policy)
>  {
> -	if (!scx_enabled() || unlikely(scx_enable_state() == SCX_DISABLING))
> +	/* if disabled, nothing should be on it */
> +	if (!scx_enabled())
>  		return false;
> +
> +	/* scx is taking over all SCHED_OTHER and SCHED_EXT tasks */
>  	if (READ_ONCE(scx_switching_all))
>  		return true;
> +
> +	/*
> +	 * scx is tearing down - keep new SCHED_EXT tasks out.
> +	 *
> +	 * Must come after scx_switching_all test. While both are set, we must
> +	 * return true via the branch above: [__]scx_switching_all are cleared
> +	 * together under scx_enable_mutex, and a fork routed to fair while
> +	 * __scx_switched_all is still on would stall because
> +	 * next_active_class() skips fair.

Just being extra picky: [__]scx_switching_all are cleared sequentially, not
atomically (in fact, the order is what matters). To make this clearer, how
about rephrasing the comment block above like this:

  * Must come after the scx_switching_all test. scx_root_disable()
  * clears __scx_switched_all before scx_switching_all (both under
  * scx_enable_mutex), so while scx_switching_all is observed as true,
  * __scx_switched_all may still be on. A fork routed to fair in that
  * window would stall because next_active_class() skips fair.
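
To illustrate the window described above, here is a minimal userspace sketch
of the two check orders. It is only a model, not the kernel code: every
model_* name is hypothetical, the struct merely stands in for
scx_enable_state(), scx_switching_all and __scx_switched_all, and the stall
rule assumes that next_active_class() skips fair whenever __scx_switched_all
is on, as the patch description says.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state; not the real definitions. */
enum model_state { MODEL_DISABLED, MODEL_ENABLED, MODEL_DISABLING };
enum model_policy { MODEL_POLICY_NORMAL, MODEL_POLICY_EXT };

struct model_scx {
	enum model_state state;	/* models scx_enable_state() */
	bool switching_all;	/* models scx_switching_all */
	bool switched_all;	/* models __scx_switched_all */
};

/* Pre-patch order: DISABLING short-circuits before switching_all is read. */
bool model_should_scx_old(const struct model_scx *s, enum model_policy policy)
{
	if (s->state == MODEL_DISABLED || s->state == MODEL_DISABLING)
		return false;
	if (s->switching_all)
		return true;
	return policy == MODEL_POLICY_EXT;
}

/* Patched order: switching_all wins for as long as it is still set. */
bool model_should_scx_new(const struct model_scx *s, enum model_policy policy)
{
	if (s->state == MODEL_DISABLED)
		return false;
	if (s->switching_all)
		return true;
	if (s->state == MODEL_DISABLING)
		return false;
	return policy == MODEL_POLICY_EXT;
}

/*
 * Assumption of this model: a forked task stalls when it is routed to fair
 * while fair is still being skipped (switched_all on).
 */
bool model_fork_would_stall(const struct model_scx *s, bool routed_to_scx)
{
	return !routed_to_scx && s->switched_all;
}
```

With state DISABLING and both flags still set, the old order routes a
SCHED_OTHER fork to fair and the model reports a stall, while the new order
keeps it on scx. Once both flags are cleared, the new order returns false for
any policy and nothing stalls.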

> +	 *
> +	 * This can develop into a deadlock - scx holds scx_enable_mutex across
> +	 * kthread_create() in scx_alloc_and_add_sched(); if the new kthread is
> +	 * the stalled task, the disable path can never grab the mutex to clear
> +	 * scx_switching_all.
> +	 */
> +	if (unlikely(scx_enable_state() == SCX_DISABLING))
> +		return false;
> +
>  	return policy == SCHED_EXT;
>  }
> 

Other than that, looks good to me.

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea
