public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Chris Mason <clm@meta.com>, Ryan Newton <newton@meta.com>
Subject: Re: [PATCH v2 13/13] sched_ext: Refuse cross-task select_cpu_from_kfunc calls
Date: Sat, 25 Apr 2026 08:50:59 +0200	[thread overview]
Message-ID: <aexkU_cROm-NyTNE@gpd4> (raw)
In-Reply-To: <20260425001942.3987351-1-tj@kernel.org>

Hi Tejun,

On Fri, Apr 24, 2026 at 02:19:42PM -1000, Tejun Heo wrote:
> select_cpu_from_kfunc() skipped pi_lock for @p when called from
> ops.select_cpu() or another rq-locked SCX op, assuming the held lock
> protects @p. scx_bpf_select_cpu_dfl() / __scx_bpf_select_cpu_and() accept an
> arbitrary KF_RCU task_struct, so a caller in e.g. ops.select_cpu(p1) or
> ops.enqueue(p1) can pass some other p2 - the held pi_lock / rq lock is p1's,
> not p2's - and reading p2->cpus_ptr / nr_cpus_allowed races with
> set_cpus_allowed_ptr() and migrate_disable_switch() on another CPU.
> 
> Abort the scheduler on cross-task calls in both branches: for
> ops.select_cpu() use scx_kf_arg_task_ok() to verify @p is the wake-up
> task recorded in current->scx.kf_tasks[] by SCX_CALL_OP_TASK_RET();
> for other rq-locked SCX ops compare task_rq(p) against scx_locked_rq().
> 
> v2: Per Andrea Righi: switch the in_select_cpu cross-task check from
> direct_dispatch_task comparison to scx_kf_arg_task_ok(). The former
> spuriously rejects when ops.select_cpu() calls scx_bpf_dsq_insert()
> first (mark_direct_dispatch() stamps direct_dispatch_task =
> ERR_PTR(-ESRCH)), then calls scx_bpf_select_cpu_*() on the same task.
> 
> Fixes: 0022b328504d ("sched_ext: Decouple kfunc unlocked-context check from kf_mask")
> Reported-by: Chris Mason <clm@meta.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Andrea Righi <arighi@nvidia.com>

Looks good!

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
>  kernel/sched/ext_idle.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index c43d62d90e40..7468560a6d80 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -927,14 +927,24 @@ static s32 select_cpu_from_kfunc(struct scx_sched *sch, struct task_struct *p,
>  	 * Accessing p->cpus_ptr / p->nr_cpus_allowed needs either @p's rq
>  	 * lock or @p's pi_lock. Three cases:
>  	 *
> -	 *  - inside ops.select_cpu(): try_to_wake_up() holds @p's pi_lock.
> +	 *  - inside ops.select_cpu(): try_to_wake_up() holds the wake-up
> +	 *    task's pi_lock; the wake-up task is recorded in kf_tasks[0]
> +	 *    by SCX_CALL_OP_TASK_RET().
>  	 *  - other rq-locked SCX op: scx_locked_rq() points at the held rq.
>  	 *  - truly unlocked (UNLOCKED ops, SYSCALL, non-SCX struct_ops):
>  	 *    nothing held, take pi_lock ourselves.
> +	 *
> +	 * In the first two cases, BPF schedulers may pass an arbitrary task
> +	 * that the held lock doesn't cover. Refuse those.
>  	 */
>  	if (this_rq()->scx.in_select_cpu) {
> +		if (!scx_kf_arg_task_ok(sch, p))
> +			return -EINVAL;
>  		lockdep_assert_held(&p->pi_lock);
> -	} else if (!scx_locked_rq()) {
> +	} else if (scx_locked_rq()) {
> +		if (task_rq(p) != scx_locked_rq())
> +			goto cross_task;
> +	} else {
>  		raw_spin_lock_irqsave(&p->pi_lock, irq_flags);
>  		we_locked = true;
>  	}
> @@ -960,6 +970,11 @@ static s32 select_cpu_from_kfunc(struct scx_sched *sch, struct task_struct *p,
>  		raw_spin_unlock_irqrestore(&p->pi_lock, irq_flags);
>  
>  	return cpu;
> +
> +cross_task:
> +	scx_error(sch, "select_cpu kfunc called cross-task on %s[%d]",
> +		  p->comm, p->pid);
> +	return -EINVAL;
>  }
>  
>  /**
> -- 
> 2.53.0
> 

  reply	other threads:[~2026-04-25  6:51 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-24 20:44 [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Assorted fixes Tejun Heo
2026-04-24 20:44 ` [PATCH 01/13] sched_ext: Unregister sub_kset on scheduler disable Tejun Heo
2026-04-24 20:44 ` [PATCH 02/13] sched_ext: Guard scx_dsq_move() against NULL kit->dsq after failed iter_new Tejun Heo
2026-04-24 20:44 ` [PATCH 03/13] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu() Tejun Heo
2026-04-24 20:44 ` [PATCH 04/13] sched_ext: Don't disable tasks in scx_sub_enable_workfn() abort path Tejun Heo
2026-04-24 20:44 ` [PATCH 05/13] sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters Tejun Heo
2026-04-24 20:44 ` [PATCH 06/13] sched_ext: Resolve caller's scheduler in scx_bpf_destroy_dsq() / scx_bpf_dsq_nr_queued() Tejun Heo
2026-04-24 20:44 ` [PATCH 07/13] sched_ext: Use dsq->first_task instead of list_empty() in dispatch_enqueue() FIFO-tail Tejun Heo
2026-04-24 20:44 ` [PATCH 08/13] sched_ext: Save and restore scx_locked_rq across SCX_CALL_OP Tejun Heo
2026-04-24 20:44 ` [PATCH 09/13] sched_ext: Pass held rq to SCX_CALL_OP() for dump_cpu/dump_task Tejun Heo
2026-04-24 20:44 ` [PATCH 10/13] sched_ext: Pass held rq to SCX_CALL_OP() for core_sched_before Tejun Heo
2026-04-24 20:44 ` [PATCH 11/13] sched_ext: Make bypass LB cpumasks per-scheduler Tejun Heo
2026-04-24 20:44 ` [PATCH 12/13] sched_ext: Align cgroup #ifdef guards with SUB_SCHED vs GROUP_SCHED Tejun Heo
2026-04-24 20:44 ` [PATCH 13/13] sched_ext: Refuse cross-task select_cpu_from_kfunc calls Tejun Heo
2026-04-24 21:46   ` Andrea Righi
2026-04-25  0:19   ` [PATCH v2 " Tejun Heo
2026-04-25  6:50     ` Andrea Righi [this message]
2026-04-24 21:08 ` [PATCH 14/13] sched_ext: Reject NULL-sch callers in scx_bpf_task_set_slice/dsq_vtime Tejun Heo
2026-04-24 21:08   ` [PATCH 15/13] sched_ext: Release cpus_read_lock on scx_link_sched() failure in root enable Tejun Heo
2026-04-24 22:00     ` Andrea Righi
2026-04-25  0:19     ` [PATCH v2 " Tejun Heo
2026-04-25  6:51       ` Andrea Righi
2026-04-24 22:10 ` [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Assorted fixes Andrea Righi
2026-04-25  0:39 ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aexkU_cROm-NyTNE@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=clm@meta.com \
    --cc=emil@etsalapatis.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=newton@meta.com \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox