* [PATCHSET v2 sched_ext/for-6.16] sched_ext: Extend usability of scx_bpf_select_cpu_and()
From: Andrea Righi @ 2025-05-15 19:11 UTC
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Many scx schedulers implement their own idle CPU selection policy, which
may be invoked from different contexts, not just from ops.select_cpu().

For example, some schedulers may need to trigger a proactive CPU wakeup
from ops.enqueue() after enqueuing a task, while others may expose this
functionality to user space, relying on a BPF test_run call to pick an idle
CPU.
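
As a sketch of the ops.enqueue() case (the callback name is
hypothetical, and this assumes built-in idle tracking is enabled), an
enqueue path could proactively pick and kick an idle CPU like this:

  void BPF_STRUCT_OPS(sketch_enqueue, struct task_struct *p, u64 enq_flags)
  {
      s32 cpu;

      /* usable from ops.enqueue() as well as ops.select_cpu() */
      cpu = scx_bpf_select_cpu_and(p, scx_bpf_task_cpu(p), 0, p->cpus_ptr, 0);
      if (cpu >= 0) {
          /* dispatch to the idle CPU and proactively wake it up */
          scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL_ON | cpu, SCX_SLICE_DFL, enq_flags);
          scx_bpf_kick_cpu(cpu, SCX_KICK_IDLE);
      } else {
          scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
      }
  }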

To maintain a consistent selection policy, these schedulers often implement
their own idle CPU selection logic, since the built-in one is only usable
from ops.select_cpu(), leading to unnecessary code duplication and
fragmentation.

To address this, allow scx_bpf_select_cpu_and() to be called not only from
ops.select_cpu() and ops.enqueue(), but also from unlocked contexts (e.g.,
when triggered from user space via a BPF test_run call).

This allows scx schedulers to consistently leverage the built-in idle CPU
selection logic, helping reduce code duplication and fragmentation.
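
For the unlocked case, user space runs a SEC("syscall") BPF program
that calls the kfunc, triggered via BPF_PROG_TEST_RUN. A minimal sketch
of the user-space side (the function and program names are
hypothetical; patch 4 adds a selftest along these lines):

  static int pick_idle_cpu_for(struct bpf_program *prog, pid_t pid)
  {
      __u64 args[1] = { pid };
      LIBBPF_OPTS(bpf_test_run_opts, opts,
          .ctx_in = args,
          .ctx_size_in = sizeof(args),
      );
      int err;

      err = bpf_prog_test_run_opts(bpf_program__fd(prog), &opts);
      if (err)
          return err;

      /* retval >= 0: selected idle CPU; < 0: error from the kfunc */
      return (int)opts.retval;
  }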

This patchset is also available in the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git scx-select-cpu-and

Changes in v2:
 - Enable scx_bpf_select_cpu_and() to be called from ops.enqueue(),
   ops.select_cpu(), and unlocked contexts
 - Add locking validation to ensure safe access to p->cpus_ptr and
   p->nr_cpus_allowed
 - Extend selftest to cover scx_bpf_select_cpu_and() when invoked from user
   space
 - Link to v1: https://lore.kernel.org/all/20250512151743.42988-1-arighi@nvidia.com/

Andrea Righi (4):
      sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
      sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()
      sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
      selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_run

 kernel/sched/ext.c                                 |  5 ---
 kernel/sched/ext.h                                 |  5 +++
 kernel/sched/ext_idle.c                            | 45 +++++++++++++++++-----
 .../testing/selftests/sched_ext/allowed_cpus.bpf.c | 23 +++++++++++
 tools/testing/selftests/sched_ext/allowed_cpus.c   | 27 +++++++++++++
 5 files changed, 91 insertions(+), 14 deletions(-)


* [PATCH 1/4] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
From: Andrea Righi @ 2025-05-15 19:11 UTC
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Relocate scx_kf_allowed_if_unlocked() so it can be used from other
source files (e.g., ext_idle.c).

No functional change.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 5 -----
 kernel/sched/ext.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 52e0f9553e730..793e288f63cf4 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1387,11 +1387,6 @@ static __always_inline bool scx_kf_allowed_on_arg_tasks(u32 mask,
 	return true;
 }
 
-static bool scx_kf_allowed_if_unlocked(void)
-{
-	return !current->scx.kf_mask;
-}
-
 /**
  * nldsq_next_task - Iterate to the next task in a non-local DSQ
  * @dsq: user dsq being iterated
diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h
index 3053cdd61eb9c..6e5072f577718 100644
--- a/kernel/sched/ext.h
+++ b/kernel/sched/ext.h
@@ -8,6 +8,11 @@
  */
 #ifdef CONFIG_SCHED_CLASS_EXT
 
+static inline bool scx_kf_allowed_if_unlocked(void)
+{
+	return !current->scx.kf_mask;
+}
+
 DECLARE_STATIC_KEY_FALSE(scx_ops_allow_queued_wakeup);
 
 void scx_tick(struct rq *rq);
-- 
2.49.0



* [PATCH 2/4] sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()
From: Andrea Righi @ 2025-05-15 19:11 UTC
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Validate locking correctness when accessing p->nr_cpus_allowed and
p->cpus_ptr inside scx_bpf_select_cpu_and(): if the rq lock is held,
access is safe; otherwise, require that p->pi_lock is held.

This makes it possible to catch potentially unsafe calls to
scx_bpf_select_cpu_and().

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext_idle.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index f0ebf8b5b908e..716863f1f8cee 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -935,6 +935,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 wake_flags,
 				       const struct cpumask *cpus_allowed, u64 flags)
 {
+	struct rq *rq;
 	s32 cpu;
 
 	if (!kf_cpu_valid(prev_cpu, NULL))
@@ -946,6 +947,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
 	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
 		return -EPERM;
 
+	/*
+	 * Validate locking correctness to access p->cpus_ptr and
+	 * p->nr_cpus_allowed: if we're holding an rq lock, we're safe;
+	 * otherwise, assert that p->pi_lock is held.
+	 */
+	rq = scx_locked_rq();
+	if (!rq)
+		lockdep_assert_held(&p->pi_lock);
+
 #ifdef CONFIG_SMP
 	/*
 	 * This may also be called from ops.enqueue(), so we need to handle
-- 
2.49.0



* [PATCH 3/4] sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
From: Andrea Righi @ 2025-05-15 19:11 UTC
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Allow scx_bpf_select_cpu_and() to be used from an unlocked context, in
addition to ops.enqueue() or ops.select_cpu().

This enables schedulers, including user-space ones, to implement a
consistent idle CPU selection policy and helps reduce code duplication.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext_idle.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 716863f1f8cee..37279a09900ca 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -922,9 +922,10 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
  * @cpus_allowed: cpumask of allowed CPUs
  * @flags: %SCX_PICK_IDLE* flags
  *
- * Can only be called from ops.select_cpu() or ops.enqueue() if the
- * built-in CPU selection is enabled: ops.update_idle() is missing or
- * %SCX_OPS_KEEP_BUILTIN_IDLE is set.
+ * Can be called from ops.select_cpu(), ops.enqueue(), or from an unlocked
+ * context such as a BPF test_run() call, as long as built-in CPU selection
+ * is enabled: ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE
+ * is set.
  *
  * @p, @prev_cpu and @wake_flags match ops.select_cpu().
  *
@@ -936,6 +937,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
 				       const struct cpumask *cpus_allowed, u64 flags)
 {
 	struct rq *rq;
+	struct rq_flags rf;
 	s32 cpu;
 
 	if (!kf_cpu_valid(prev_cpu, NULL))
@@ -944,15 +946,26 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
 	if (!check_builtin_idle_enabled())
 		return -EBUSY;
 
-	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
-		return -EPERM;
+	/*
+	 * If called from an unlocked context, acquire the task's rq lock,
+	 * so that we can safely access p->cpus_ptr and p->nr_cpus_allowed.
+	 *
+	 * Otherwise, allow this kfunc to be used only from ops.select_cpu()
+	 * and ops.enqueue().
+	 */
+	if (scx_kf_allowed_if_unlocked()) {
+		rq = task_rq_lock(p, &rf);
+	} else {
+		if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
+			return -EPERM;
+		rq = scx_locked_rq();
+	}
 
 	/*
 	 * Validate locking correctness to access p->cpus_ptr and
 	 * p->nr_cpus_allowed: if we're holding an rq lock, we're safe;
 	 * otherwise, assert that p->pi_lock is held.
 	 */
-	rq = scx_locked_rq();
 	if (!rq)
 		lockdep_assert_held(&p->pi_lock);
 
@@ -966,13 +979,17 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
 	if (p->nr_cpus_allowed == 1) {
 		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
 		    scx_idle_test_and_clear_cpu(prev_cpu))
-			return prev_cpu;
-		return -EBUSY;
+			cpu = prev_cpu;
+		else
+			cpu = -EBUSY;
+	} else {
+		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
 	}
-	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
 #else
 	cpu = -EBUSY;
 #endif
+	if (scx_kf_allowed_if_unlocked())
+		task_rq_unlock(rq, p, &rf);
 
 	return cpu;
 }
@@ -1276,6 +1293,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
 BTF_KFUNCS_END(scx_kfunc_ids_idle)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
@@ -1285,7 +1303,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
 
 BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
 BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
-BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
 BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
-- 
2.49.0



* [PATCH 4/4] selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_run
From: Andrea Righi @ 2025-05-15 19:11 UTC
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Update the allowed_cpus selftest to include a check to validate the
behavior of scx_bpf_select_cpu_and() when invoked via a BPF test_run
call.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 .../selftests/sched_ext/allowed_cpus.bpf.c    | 23 ++++++++++++++++
 .../selftests/sched_ext/allowed_cpus.c        | 27 +++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c b/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
index 39d57f7f74099..35923e74a2ec3 100644
--- a/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
+++ b/tools/testing/selftests/sched_ext/allowed_cpus.bpf.c
@@ -111,6 +111,29 @@ void BPF_STRUCT_OPS(allowed_cpus_exit, struct scx_exit_info *ei)
 	UEI_RECORD(uei, ei);
 }
 
+struct task_cpu_arg {
+	pid_t pid;
+};
+
+SEC("syscall")
+int select_cpu_from_user(struct task_cpu_arg *input)
+{
+	struct task_struct *p;
+	int cpu;
+
+	p = bpf_task_from_pid(input->pid);
+	if (!p)
+		return -EINVAL;
+
+	bpf_rcu_read_lock();
+	cpu = scx_bpf_select_cpu_and(p, bpf_get_smp_processor_id(), 0, p->cpus_ptr, 0);
+	bpf_rcu_read_unlock();
+
+	bpf_task_release(p);
+
+	return cpu;
+}
+
 SEC(".struct_ops.link")
 struct sched_ext_ops allowed_cpus_ops = {
 	.select_cpu		= (void *)allowed_cpus_select_cpu,
diff --git a/tools/testing/selftests/sched_ext/allowed_cpus.c b/tools/testing/selftests/sched_ext/allowed_cpus.c
index a001a3a0e9f1f..093f285ab4bae 100644
--- a/tools/testing/selftests/sched_ext/allowed_cpus.c
+++ b/tools/testing/selftests/sched_ext/allowed_cpus.c
@@ -23,6 +23,30 @@ static enum scx_test_status setup(void **ctx)
 	return SCX_TEST_PASS;
 }
 
+static int test_select_cpu_from_user(const struct allowed_cpus *skel)
+{
+	int fd, ret;
+	__u64 args[1];
+
+	LIBBPF_OPTS(bpf_test_run_opts, attr,
+		.ctx_in = args,
+		.ctx_size_in = sizeof(args),
+	);
+
+	args[0] = getpid();
+	fd = bpf_program__fd(skel->progs.select_cpu_from_user);
+	if (fd < 0)
+		return fd;
+
+	ret = bpf_prog_test_run_opts(fd, &attr);
+	if (ret < 0)
+		return ret;
+
+	fprintf(stderr, "%s: CPU %d\n", __func__, attr.retval);
+
+	return 0;
+}
+
 static enum scx_test_status run(void *ctx)
 {
 	struct allowed_cpus *skel = ctx;
@@ -31,6 +55,9 @@ static enum scx_test_status run(void *ctx)
 	link = bpf_map__attach_struct_ops(skel->maps.allowed_cpus_ops);
 	SCX_FAIL_IF(!link, "Failed to attach scheduler");
 
+	/* Pick an idle CPU from user-space */
+	SCX_FAIL_IF(test_select_cpu_from_user(skel), "Failed to pick idle CPU");
+
 	/* Just sleeping is fine, plenty of scheduling events happening */
 	sleep(1);
 
-- 
2.49.0



* Re: [PATCH 3/4] sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
From: David Vernet @ 2025-05-17 17:54 UTC
  To: Andrea Righi; +Cc: Tejun Heo, Changwoo Min, linux-kernel

On Thu, May 15, 2025 at 09:11:44PM +0200, Andrea Righi wrote:
> Allow scx_bpf_select_cpu_and() to be used from an unlocked context, in
> addition to ops.enqueue() or ops.select_cpu().
> 
> This enables schedulers, including user-space ones, to implement a
> consistent idle CPU selection policy and helps reduce code duplication.
> 
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

Hey Andrea,

Nice, this looks correct and reasonable to me. Just left one suggestion below
that I'd be curious to hear your thoughts on.

> ---
>  kernel/sched/ext_idle.c | 37 +++++++++++++++++++++++++++----------
>  1 file changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index 716863f1f8cee..37279a09900ca 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -922,9 +922,10 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>   * @cpus_allowed: cpumask of allowed CPUs
>   * @flags: %SCX_PICK_IDLE* flags
>   *
> - * Can only be called from ops.select_cpu() or ops.enqueue() if the
> - * built-in CPU selection is enabled: ops.update_idle() is missing or
> - * %SCX_OPS_KEEP_BUILTIN_IDLE is set.
> + * Can be called from ops.select_cpu(), ops.enqueue(), or from an unlocked
> + * context such as a BPF test_run() call, as long as built-in CPU selection
> + * is enabled: ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE
> + * is set.
>   *
>   * @p, @prev_cpu and @wake_flags match ops.select_cpu().
>   *
> @@ -936,6 +937,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
>  				       const struct cpumask *cpus_allowed, u64 flags)
>  {
>  	struct rq *rq;
> +	struct rq_flags rf;
>  	s32 cpu;
>  
>  	if (!kf_cpu_valid(prev_cpu, NULL))
> @@ -944,15 +946,26 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
>  	if (!check_builtin_idle_enabled())
>  		return -EBUSY;
>  
> -	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> -		return -EPERM;
> +	/*
> +	 * If called from an unlocked context, acquire the task's rq lock,
> +	 * so that we can safely access p->cpus_ptr and p->nr_cpus_allowed.
> +	 *
> +	 * Otherwise, allow this kfunc to be used only from ops.select_cpu()
> +	 * and ops.enqueue().
> +	 */
> +	if (scx_kf_allowed_if_unlocked()) {
> +		rq = task_rq_lock(p, &rf);
> +	} else {
> +		if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> +			return -EPERM;
> +		rq = scx_locked_rq();
> +	}
>  
>  	/*
>  	 * Validate locking correctness to access p->cpus_ptr and
>  	 * p->nr_cpus_allowed: if we're holding an rq lock, we're safe;
>  	 * otherwise, assert that p->pi_lock is held.
>  	 */
> -	rq = scx_locked_rq();
>  	if (!rq)
>  		lockdep_assert_held(&p->pi_lock);
>  
> @@ -966,13 +979,17 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
>  	if (p->nr_cpus_allowed == 1) {
>  		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
>  		    scx_idle_test_and_clear_cpu(prev_cpu))
> -			return prev_cpu;
> -		return -EBUSY;
> +			cpu = prev_cpu;
> +		else
> +			cpu = -EBUSY;
> +	} else {
> +		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);

I wonder if we should just bring this into scx_select_cpu_dfl()? It seems like
it would make sense to do this optimization whether we're looking at
cpus_allowed here, or p->cpus_ptr in scx_select_cpu_dfl(). I seem to recall us
having this in there before so there may be a reason we removed it, but I've
been out of the game for a while so I'm not sure.

Anyways, if we could do this, then we could bring both scx_bpf_select_cpu_and()
and scx_select_cpu_dfl() into the scx_kfunc_ids_idle kfunc group and remove
scx_kfunc_ids_select_cpu.

What do you think?

Thanks,
David



* Re: [PATCH 3/4] sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
From: Andrea Righi @ 2025-05-17 22:50 UTC
  To: David Vernet; +Cc: Tejun Heo, Changwoo Min, linux-kernel

Hi David,

On Sat, May 17, 2025 at 12:54:29PM -0500, David Vernet wrote:
> On Thu, May 15, 2025 at 09:11:44PM +0200, Andrea Righi wrote:
> > Allow scx_bpf_select_cpu_and() to be used from an unlocked context, in
> > addition to ops.enqueue() or ops.select_cpu().
> > 
> > This enables schedulers, including user-space ones, to implement a
> > consistent idle CPU selection policy and helps reduce code duplication.
> > 
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> 
> Hey Andrea,
> 
> Nice, this looks correct and reasonable to me. Just left one suggestion below
> that I'd be curious to hear your thoughts on.

Thanks for looking at this!

> 
> > ---
> >  kernel/sched/ext_idle.c | 37 +++++++++++++++++++++++++++----------
> >  1 file changed, 27 insertions(+), 10 deletions(-)
> > 
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 716863f1f8cee..37279a09900ca 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -922,9 +922,10 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
> >   * @cpus_allowed: cpumask of allowed CPUs
> >   * @flags: %SCX_PICK_IDLE* flags
> >   *
> > - * Can only be called from ops.select_cpu() or ops.enqueue() if the
> > - * built-in CPU selection is enabled: ops.update_idle() is missing or
> > - * %SCX_OPS_KEEP_BUILTIN_IDLE is set.
> > + * Can be called from ops.select_cpu(), ops.enqueue(), or from an unlocked
> > + * context such as a BPF test_run() call, as long as built-in CPU selection
> > + * is enabled: ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE
> > + * is set.
> >   *
> >   * @p, @prev_cpu and @wake_flags match ops.select_cpu().
> >   *
> > @@ -936,6 +937,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> >  				       const struct cpumask *cpus_allowed, u64 flags)
> >  {
> >  	struct rq *rq;
> > +	struct rq_flags rf;
> >  	s32 cpu;
> >  
> >  	if (!kf_cpu_valid(prev_cpu, NULL))
> > @@ -944,15 +946,26 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> >  	if (!check_builtin_idle_enabled())
> >  		return -EBUSY;
> >  
> > -	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> > -		return -EPERM;
> > +	/*
> > +	 * If called from an unlocked context, acquire the task's rq lock,
> > +	 * so that we can safely access p->cpus_ptr and p->nr_cpus_allowed.
> > +	 *
> > +	 * Otherwise, allow this kfunc to be used only from ops.select_cpu()
> > +	 * and ops.enqueue().
> > +	 */
> > +	if (scx_kf_allowed_if_unlocked()) {
> > +		rq = task_rq_lock(p, &rf);
> > +	} else {
> > +		if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> > +			return -EPERM;
> > +		rq = scx_locked_rq();
> > +	}
> >  
> >  	/*
> >  	 * Validate locking correctness to access p->cpus_ptr and
> >  	 * p->nr_cpus_allowed: if we're holding an rq lock, we're safe;
> >  	 * otherwise, assert that p->pi_lock is held.
> >  	 */
> > -	rq = scx_locked_rq();
> >  	if (!rq)
> >  		lockdep_assert_held(&p->pi_lock);
> >  
> > @@ -966,13 +979,17 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> >  	if (p->nr_cpus_allowed == 1) {
> >  		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
> >  		    scx_idle_test_and_clear_cpu(prev_cpu))
> > -			return prev_cpu;
> > -		return -EBUSY;
> > +			cpu = prev_cpu;
> > +		else
> > +			cpu = -EBUSY;
> > +	} else {
> > +		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
> 
> I wonder if we should just bring this into scx_select_cpu_dfl()? It seems like
> it would make sense to do this optimization whether we're looking at
> cpus_allowed here, or p->cpus_ptr in scx_select_cpu_dfl(). I seem to recall us
> having this in there before so there may be a reason we removed it, but I've
> been out of the game for a while so I'm not sure.

Trying to remember... probably it was removed because ops.select_cpu() is
never called for tasks that can only run on 1 CPU.

> 
> Anyways, if we could do this, then we could bring both scx_bpf_select_cpu_and()
> and scx_select_cpu_dfl() into the scx_kfunc_ids_idle kfunc group and remove
> scx_kfunc_ids_select_cpu.
> 
> What do you think?

Are you suggesting that scx_bpf_select_cpu_dfl() should also be allowed in
the same contexts as scx_bpf_select_cpu_and()?

I did consider that, but was initially concerned about the potential
overhead of handling different contexts, even though these extra checks to
manage the contexts would likely be negligible in terms of overhead. And it
would make it possible to use scx_bpf_select_cpu_dfl() in ops.enqueue()
as well, so overall I see more pros than cons.

Thanks,
-Andrea


* Re: [PATCH 3/4] sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
From: David Vernet @ 2025-05-18  0:40 UTC
  To: Andrea Righi; +Cc: Tejun Heo, Changwoo Min, linux-kernel

On Sun, May 18, 2025 at 12:50:29AM +0200, Andrea Righi wrote:

Hey Andrea,

[...]

> > I wonder if we should just bring this into scx_select_cpu_dfl()? It seems like
> > it would make sense to do this optimization whether we're looking at
> > cpus_allowed here, or p->cpus_ptr in scx_select_cpu_dfl(). I seem to recall us
> > having this in there before so there may be a reason we removed it, but I've
> > been out of the game for a while so I'm not sure.
> 
> Trying to remember... probably it was removed because ops.select_cpu() is
> never called for tasks that can only run on 1 CPU.

Hmmm, I think it is called even for pcpu tasks, no?

> > Anyways, if we could do this, then we could bring both scx_bpf_select_cpu_and()
> > and scx_select_cpu_dfl() into the scx_kfunc_ids_idle kfunc group and remove
> > scx_kfunc_ids_select_cpu.
> > 
> > What do you think?
> 
> Are you suggesting that scx_bpf_select_cpu_dfl() should also be allowed in
> the same contexts as scx_bpf_select_cpu_and()?

Yep that's what I was thinking.

> I did consider that, but was initially concerned about the potential
> overhead of handling different contexts, even though these extra checks to
> manage the contexts would likely be negligible in terms of overhead. And it
> would make it possible to use scx_bpf_select_cpu_dfl() in ops.enqueue()
> as well, so overall I see more pros than cons.

Now that you mention it I don't see any reason that scx_bpf_select_cpu_dfl()
couldn't be called from ops.enqueue() even now, as we do hold the rq lock on
that path. But in general I think that if we want to make
scx_bpf_select_cpu_and() callable without the rq lock held, that we might as
well do it for both as I don't think there's any semantic difference between
the two to the user; it's just that one version also does an AND.

As I mentioned in the other thread, I don't have a strong opinion either way,
but in my opinion it seems more consistent to move the extra context-handling
logic into scx_bpf_select_cpu_dfl() if we do decide to allow callers to use
this.

Thanks,
David



* Re: [PATCH 3/4] sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
From: Andrea Righi @ 2025-05-18  5:52 UTC
  To: David Vernet; +Cc: Tejun Heo, Changwoo Min, linux-kernel

On Sat, May 17, 2025 at 07:40:22PM -0500, David Vernet wrote:
> On Sun, May 18, 2025 at 12:50:29AM +0200, Andrea Righi wrote:
> 
> Hey Andrea,
> 
> [...]
> 
> > > I wonder if we should just bring this into scx_select_cpu_dfl()? It seems like
> > > it would make sense to do this optimization whether we're looking at
> > > cpus_allowed here, or p->cpus_ptr in scx_select_cpu_dfl(). I seem to recall us
> > > having this in there before so there may be a reason we removed it, but I've
> > > been out of the game for a while so I'm not sure.
> > 
> > Trying to remember... probably it was removed because ops.select_cpu() is
> > never called for tasks that can only run on 1 CPU.
> 
> Hmmm, I think it is called even for pcpu tasks, no?

IIUC ops.select_cpu() is triggered by select_task_rq(), but only if
p->nr_cpus_allowed > 1 and it's not a migration-disabled task, see:

https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c#n3582
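
For reference, the gating condition (lightly paraphrased from
select_task_rq() in kernel/sched/core.c):

  if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
      cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
  else
      cpu = cpumask_any(p->cpus_ptr);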

> 
> > > Anyways, if we could do this, then we could bring both scx_bpf_select_cpu_and()
> > > and scx_select_cpu_dfl() into the scx_kfunc_ids_idle kfunc group and remove
> > > scx_kfunc_ids_select_cpu.
> > > 
> > > What do you think?
> > 
> > Are you suggesting that scx_bpf_select_cpu_dfl() should also be allowed in
> > the same contexts as scx_bpf_select_cpu_and()?
> 
> Yep that's what I was thinking.
> 
> > I did consider that, but was initially concerned about the potential
> > overhead of handling different contexts, even though these extra checks to
> > manage the contexts would likely be negligible in terms of overhead. And it
> > would give the possibility to use scx_bpf_select_cpu_dfl() in ops.enqueue()
> > as well, so overall I see more pros than cons.
> 
> Now that you mention it I don't see any reason that scx_bpf_select_cpu_dfl()
> couldn't be called from ops.enqueue() even now, as we do hold the rq lock on
> that path. But in general I think that if we want to make
> scx_bpf_select_cpu_and() callable without the rq lock held, that we might as
> well do it for both as I don't think there's any semantic difference between
> the two to the user; it's just that one version also does an AND.

Semantically speaking, yes, like you say, they're the same, except that one
also accepts an additional AND cpumask.

The only reason to keep them separate might be the slight overhead of
managing contexts, but, again, that should be negligible and not worth
preserving different and potentially confusing semantics.

> 
> As I mentioned in the other thread, I don't have a strong opinion either way,
> but in my opinion it seems more consistent to move the extra context-handling
> logic into scx_bpf_select_cpu_dfl() if we do decide to allow callers to use
> this.

I think I agree, I'll send another patch on top to unify the two kfuncs.

Thanks,
-Andrea


* Re: [PATCH 3/4] sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
From: David Vernet @ 2025-05-18 13:49 UTC
  To: Andrea Righi; +Cc: Tejun Heo, Changwoo Min, linux-kernel

On Sun, May 18, 2025 at 07:52:30AM +0200, Andrea Righi wrote:
> > > 
> > > Trying to remember... probably it was removed because ops.select_cpu() is
> > > never called for tasks that can only run on 1 CPU.
> > 
> > Hmmm, I think it is called even for pcpu tasks, no?
> 
> IIUC ops.select_cpu() is triggered by select_task_rq(), but only if
> p->nr_cpus_allowed > 1 and it's not a migration-disabled task, see:
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c#n3582

Ahh, right you are, I'd forgotten about that. Thanks for the link.

> > > > Anyways, if we could do this, then we could bring both scx_bpf_select_cpu_and()
> > > > and scx_select_cpu_dfl() into the scx_kfunc_ids_idle kfunc group and remove
> > > > scx_kfunc_ids_select_cpu.
> > > > 
> > > > What do you think?
> > > 
> > > Are you suggesting that scx_bpf_select_cpu_dfl() should also be allowed in
> > > the same contexts as scx_bpf_select_cpu_and()?
> > 
> > Yep that's what I was thinking.
> > 
> > > I did consider that, but was initially concerned about the potential
> > > overhead of handling different contexts, even though these extra checks to
> > > manage the contexts would likely be negligible in terms of overhead. And it
> > > would make it possible to use scx_bpf_select_cpu_dfl() in ops.enqueue()
> > > as well, so overall I see more pros than cons.
> > 
> > Now that you mention it I don't see any reason that scx_bpf_select_cpu_dfl()
> > couldn't be called from ops.enqueue() even now, as we do hold the rq lock on
> > that path. But in general I think that if we want to make
> > scx_bpf_select_cpu_and() callable without the rq lock held, that we might as
> > well do it for both as I don't think there's any semantic difference between
> > the two to the user; it's just that one version also does an AND.
> 
> Semantically speaking, yes, like you say, they're the same, except that one
> also accepts an additional AND cpumask.
> 
> The only reason to keep them separate might be the slight overhead of
> managing contexts, but, again, that should be negligible and not worth
> preserving different and potentially confusing semantics.

Yep, I think we're aligned.

> > As I mentioned in the other thread, I don't have a strong opinion either way,
> > but in my opinion it seems more consistent to move the extra context-handling
> > logic into scx_bpf_select_cpu_dfl() if we do decide to allow callers to use
> > this.
> 
> I think I agree, I'll send another patch on top to unify the two kfuncs.

Thanks, Andrea.

- David



* Re: [PATCHSET v2 sched_ext/for-6.16] sched_ext: Extend usability of scx_bpf_select_cpu_and()
From: Tejun Heo @ 2025-05-19 18:02 UTC
  To: Andrea Righi; +Cc: David Vernet, Changwoo Min, linux-kernel

On Thu, May 15, 2025 at 09:11:41PM +0200, Andrea Righi wrote:
> Many scx schedulers implement their own idle CPU selection policy, which
> may be invoked from different contexts, not just from ops.select_cpu().
> 
> For example, some schedulers may need to trigger a proactive CPU wakeup
> from ops.enqueue() after enqueuing a task, while others may expose this
> functionality to user space, relying on a BPF test_run call to pick an idle
> CPU.
> 
> To maintain a consistent selection policy, these schedulers often implement
> their own idle CPU selection logic, since the built-in one is only usable
> from ops.select_cpu(), leading to unnecessary code duplication and
> fragmentation.
> 
> To address this, allow scx_bpf_select_cpu_and() to be called not only from
> ops.select_cpu() and ops.enqueue(), but also from unlocked contexts (e.g.,
> when triggered from user space via a BPF test_run call).
> 
> This allows scx schedulers to consistently leverage the built-in idle CPU
> selection logic, helping reduce code duplication and fragmentation.
> 
> This patchset is also available in the following git branch:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git scx-select-cpu-and
> 
> Changes in v2:
>  - Enable scx_bpf_select_cpu_and() to be called from ops.enqueue(),
>    ops.select_cpu(), and unlocked contexts
>  - Add locking validation to ensure safe access to p->cpus_ptr and
>    p->nr_cpus_allowed
>  - Extend selftest to cover scx_bpf_select_cpu_and() when invoked from user
>    space
>  - Link to v1: https://lore.kernel.org/all/20250512151743.42988-1-arighi@nvidia.com/
> 
> Andrea Righi (4):
>       sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
>       sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()
>       sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
>       selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_run

Applied to sched_ext/for-6.16.

Thanks.

-- 
tejun
