From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH sched_ext/for-6.19-fixes] sched_ext: Short-circuit sched_class operations on dead tasks
Date: Wed, 4 Feb 2026 23:12:32 +0100
Message-ID: <aYPEUNo4oGY8-Fw0@gpd4>
In-Reply-To: <fc034891cef55029c16122f4279e4057@kernel.org>

On Wed, Feb 04, 2026 at 10:07:55AM -1000, Tejun Heo wrote:
> 7900aa699c34 ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free()
> to finish_task_switch()") moved sched_ext_free() to finish_task_switch() and
> renamed it to sched_ext_dead() to fix cgroup exit ordering issues. However,
> this created a race window where certain sched_class ops may be invoked on
> dead tasks leading to failures - e.g. sched_setscheduler() may try to switch a
> task which finished sched_ext_dead() back into SCX triggering invalid SCX task
> state transitions.
> 
> Add task_dead_and_done() which tests whether a task is TASK_DEAD and has
> completed its final context switch, and use it to short-circuit sched_class
> operations which may be called on dead tasks.
> 
> Fixes: 7900aa699c34 ("sched_ext: Fix cgroup exit ordering by moving sched_ext_free() to finish_task_switch()")
> Reported-by: Andrea Righi <arighi@nvidia.com>
> Link: http://lkml.kernel.org/r/20260202151341.796959-1-arighi@nvidia.com
> Signed-off-by: Tejun Heo <tj@kernel.org>

Looks good to me, thanks for tracking down the exact issue!

Reviewed-by: Andrea Righi <arighi@nvidia.com>

-Andrea

> ---
>  kernel/sched/ext.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
> 
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -194,6 +194,7 @@ MODULE_PARM_DESC(bypass_lb_intv_us, "byp
>  #include <trace/events/sched_ext.h>
> 
>  static void process_ddsp_deferred_locals(struct rq *rq);
> +static bool task_dead_and_done(struct task_struct *p);
>  static u32 reenq_local(struct rq *rq);
>  static void scx_kick_cpu(struct scx_sched *sch, s32 cpu, u64 flags);
>  static bool scx_vexit(struct scx_sched *sch, enum scx_exit_kind kind,
> @@ -2618,6 +2619,9 @@ static void set_cpus_allowed_scx(struct
> 
>  	set_cpus_allowed_common(p, ac);
> 
> +	if (task_dead_and_done(p))
> +		return;
> +
>  	/*
>  	 * The effective cpumask is stored in @p->cpus_ptr which may temporarily
>  	 * differ from the configured one in @p->cpus_mask. Always tell the bpf
> @@ -3033,10 +3037,45 @@ void scx_cancel_fork(struct task_struct
>  	percpu_up_read(&scx_fork_rwsem);
>  }
> 
> +/**
> + * task_dead_and_done - Is a task dead and done running?
> + * @p: target task
> + *
> + * Once sched_ext_dead() removes the dead task from scx_tasks and exits it, the
> + * task no longer exists from SCX's POV. However, certain sched_class ops may be
> + * invoked on these dead tasks leading to failures - e.g. sched_setscheduler()
> + * may try to switch a task which finished sched_ext_dead() back into SCX
> + * triggering invalid SCX task state transitions and worse.
> + *
> + * Once a task has finished the final switch, sched_ext_dead() is the only thing
> + * that needs to happen on the task. Use this test to short-circuit sched_class
> + * operations which may be called on dead tasks.
> + */
> +static bool task_dead_and_done(struct task_struct *p)
> +{
> +	struct rq *rq = task_rq(p);
> +
> +	lockdep_assert_rq_held(rq);
> +
> +	/*
> +	 * In do_task_dead(), a dying task sets %TASK_DEAD with preemption
> +	 * disabled and __schedule(). If @p has %TASK_DEAD set and off CPU, @p
> +	 * won't ever run again.
> +	 */
> +	return unlikely(READ_ONCE(p->__state) == TASK_DEAD) &&
> +		!task_on_cpu(rq, p);
> +}
> +
>  void sched_ext_dead(struct task_struct *p)
>  {
>  	unsigned long flags;
> 
> +	/*
> +	 * By the time control reaches here, @p has %TASK_DEAD set, switched out
> +	 * for the last time and then dropped the rq lock - task_dead_and_done()
> +	 * should be returning %true nullifying the straggling sched_class ops.
> +	 * Remove from scx_tasks and exit @p.
> +	 */
>  	raw_spin_lock_irqsave(&scx_tasks_lock, flags);
>  	list_del_init(&p->scx.tasks_node);
>  	raw_spin_unlock_irqrestore(&scx_tasks_lock, flags);
> @@ -3062,6 +3101,9 @@ static void reweight_task_scx(struct rq
> 
>  	lockdep_assert_rq_held(task_rq(p));
> 
> +	if (task_dead_and_done(p))
> +		return;
> +
>  	p->scx.weight = sched_weight_to_cgroup(scale_load_down(lw->weight));
>  	if (SCX_HAS_OP(sch, set_weight))
>  		SCX_CALL_OP_TASK(sch, SCX_KF_REST, set_weight, rq,
> @@ -3076,6 +3118,9 @@ static void switching_to_scx(struct rq *
>  {
>  	struct scx_sched *sch = scx_root;
> 
> +	if (task_dead_and_done(p))
> +		return;
> +
>  	scx_enable_task(p);
> 
>  	/*
> @@ -3089,6 +3134,9 @@ static void switching_to_scx(struct rq *
> 
>  static void switched_from_scx(struct rq *rq, struct task_struct *p)
>  {
> +	if (task_dead_and_done(p))
> +		return;
> +
>  	scx_disable_task(p);
>  }
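
For anyone following the thread, the guard in the patch can be modeled in
user space. This is only a rough sketch under stated assumptions, not the
kernel code: every mock_* name below is hypothetical, the rq lock and
lockdep assertion are left out, and a plain bool stands in for
task_on_cpu(rq, p). It only illustrates why a task that is both TASK_DEAD
and off CPU should be skipped by the sched_class callbacks.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
enum mock_state { MOCK_TASK_RUNNING = 0, MOCK_TASK_DEAD = 1 };

struct mock_task {
	enum mock_state state;	/* models p->__state */
	bool on_cpu;		/* models task_on_cpu(rq, p) */
	bool scx_enabled;	/* models "SCX currently tracks this task" */
	int enable_calls;	/* counts how often the enable path ran */
};

/* Models task_dead_and_done(): dead and already switched out for good. */
static bool mock_task_dead_and_done(const struct mock_task *p)
{
	return p->state == MOCK_TASK_DEAD && !p->on_cpu;
}

/* Models switching_to_scx(): short-circuit before touching SCX state. */
static void mock_switching_to_scx(struct mock_task *p)
{
	if (mock_task_dead_and_done(p))
		return;

	p->scx_enabled = true;
	p->enable_calls++;
}

/* Models sched_ext_dead(): SCX forgets the task after the final switch. */
static void mock_sched_ext_dead(struct mock_task *p)
{
	p->scx_enabled = false;
}
```

In this model, a task that has set the dead state but is still on CPU is
not yet "done", so the guard stays false until the final switch. Once the
task is off CPU and mock_sched_ext_dead() has run, a late
mock_switching_to_scx() call (the sched_setscheduler() case from the
changelog) returns early instead of re-enabling the task.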
