[PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and()
@ 2025-05-12 15:14 Andrea Righi
  2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 15:14 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Many scx schedulers implement their own idle CPU selection policy, which
may be invoked from different contexts, not just from ops.select_cpu().

For example, some schedulers may need to trigger a proactive CPU wakeup
from ops.enqueue() after enqueuing a task, while others may expose this
functionality to user space, relying on a BPF test_run call to pick an idle
CPU.

To maintain a consistent selection policy, these schedulers often implement
their own idle CPU selection logic, since the built-in one is only usable
from ops.select_cpu(), leading to unnecessary code duplication and
fragmentation.

To address this, allow scx_bpf_select_cpu_and() to be used from any
context. This gives scx schedulers the option to use the built-in idle CPU
selection policy in a consistent way, potentially reducing code duplication
and fragmentation.

Andrea Righi (2):
      sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
      sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context

 kernel/sched/ext.c      |  5 -----
 kernel/sched/ext.h      |  5 +++++
 kernel/sched/ext_idle.c | 23 +++++++++++++++++------
 3 files changed, 22 insertions(+), 11 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
  2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi
@ 2025-05-12 15:14 ` Andrea Righi
  2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi
  2025-05-13  0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min
  2 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 15:14 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Relocate the scx_kf_allowed_if_unlocked(), so it can be used from other
source files (e.g., ext_idle.c).

No functional change.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 5 -----
 kernel/sched/ext.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8ccb5c7ff55c9..51623d1bd5d48 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1358,11 +1358,6 @@ static __always_inline bool scx_kf_allowed_on_arg_tasks(u32 mask,
 	return true;
 }
 
-static bool scx_kf_allowed_if_unlocked(void)
-{
-	return !current->scx.kf_mask;
-}
-
 /**
  * nldsq_next_task - Iterate to the next task in a non-local DSQ
  * @dsq: user dsq being iterated
diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h
index 3053cdd61eb9c..6e5072f577718 100644
--- a/kernel/sched/ext.h
+++ b/kernel/sched/ext.h
@@ -8,6 +8,11 @@
  */
 #ifdef CONFIG_SCHED_CLASS_EXT
 
+static inline bool scx_kf_allowed_if_unlocked(void)
+{
+	return !current->scx.kf_mask;
+}
+
 DECLARE_STATIC_KEY_FALSE(scx_ops_allow_queued_wakeup);
 
 void scx_tick(struct rq *rq);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi
  2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi
@ 2025-05-12 15:14 ` Andrea Righi
  2025-05-12 16:58   ` David Vernet
  2025-05-12 17:19   ` Tejun Heo
  2025-05-13  0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min
  2 siblings, 2 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 15:14 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

Allow scx_bpf_select_cpu_and() to be used from any context, not just
from ops.enqueue() or ops.select_cpu().

This enables schedulers, including user-space ones, to implement a
consistent idle CPU selection policy and helps reduce code duplication.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext_idle.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 6915685cd3d6b..5373132db02b6 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
 	if (!check_builtin_idle_enabled())
 		return -EBUSY;
 
-	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
-		return -EPERM;
-
 #ifdef CONFIG_SMP
+	/*
+	 * If called from an unlocked context, try to acquire
+	 * cpus_read_lock() to avoid races with CPU hotplug.
+	 */
+	if (scx_kf_allowed_if_unlocked())
+		if (!cpus_read_trylock())
+			return -EBUSY;
+
 	/*
 	 * This may also be called from ops.enqueue(), so we need to handle
 	 * per-CPU tasks as well. For these tasks, we can skip all idle CPU
@@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
 	if (p->nr_cpus_allowed == 1) {
 		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
 		    scx_idle_test_and_clear_cpu(prev_cpu))
-			return prev_cpu;
-		return -EBUSY;
+			cpu = prev_cpu;
+		else
+			cpu = -EBUSY;
+		goto out_unlock;
 	}
 	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
+
+out_unlock:
+	if (scx_kf_allowed_if_unlocked())
+		cpus_read_unlock();
 #else
 	cpu = -EBUSY;
 #endif
@@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
 BTF_KFUNCS_END(scx_kfunc_ids_idle)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
@@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
 
 BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
 BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
-BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
 BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
 
 static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi
@ 2025-05-12 16:58   ` David Vernet
  2025-05-12 18:29     ` Andrea Righi
  2025-05-12 17:19   ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: David Vernet @ 2025-05-12 16:58 UTC (permalink / raw)
  To: Andrea Righi; +Cc: Tejun Heo, Changwoo Min, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3912 bytes --]

On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote:
> Allow scx_bpf_select_cpu_and() to be used from any context, not just
> from ops.enqueue() or ops.select_cpu().
> 
> This enables schedulers, including user-space ones, to implement a
> consistent idle CPU selection policy and helps reduce code duplication.
> 
> Signed-off-by: Andrea Righi <arighi@nvidia.com>

Hi Andrea,

> ---
>  kernel/sched/ext_idle.c | 23 +++++++++++++++++------
>  1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> index 6915685cd3d6b..5373132db02b6 100644
> --- a/kernel/sched/ext_idle.c
> +++ b/kernel/sched/ext_idle.c
> @@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64

FYI it looks like there are a few comments sprinkled around here that specify
that only the .select_cpu() and .enqueue() ops are allowed.

>  	if (!check_builtin_idle_enabled())
>  		return -EBUSY;
>  
> -	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> -		return -EPERM;
> -
>  #ifdef CONFIG_SMP
> +	/*
> +	 * If called from an unlocked context, try to acquire
> +	 * cpus_read_lock() to avoid races with CPU hotplug.
> +	 */
> +	if (scx_kf_allowed_if_unlocked())
> +		if (!cpus_read_trylock())
> +			return -EBUSY;

Hmm, so this now assumes that this function can be called without the rq lock
held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr,
which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and()
checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by
the rq lock, etc depending on its state.

This might be fine in practice as I _think_ the only unsafe thing would be if
p->cpus_ptr could have a torn read or write. scx_select_cpu_dfl() does do
preempt_disable() so no issues in querying the per-cpu variables there.

As long as this is safe (or can be made safe) I don't have a strong opinion one
way or the other. I think it's probably a good idea to allow users to remove
code duplication, and in general it's up to the user to use this API correctly
(e.g. if you're preempted during a call to scx_bpf_select_cpu_and() and
prev_cpu changes out from under you due to the task being migrated, that's
likely just a bug in the scheduler).

Thanks,
David

>  	/*
>  	 * This may also be called from ops.enqueue(), so we need to handle
>  	 * per-CPU tasks as well. For these tasks, we can skip all idle CPU
> @@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
>  	if (p->nr_cpus_allowed == 1) {
>  		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
>  		    scx_idle_test_and_clear_cpu(prev_cpu))
> -			return prev_cpu;
> -		return -EBUSY;
> +			cpu = prev_cpu;
> +		else
> +			cpu = -EBUSY;
> +		goto out_unlock;
>  	}
>  	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
> +
> +out_unlock:
> +	if (scx_kf_allowed_if_unlocked())
> +		cpus_read_unlock();
>  #else
>  	cpu = -EBUSY;
>  #endif
> @@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU)
>  BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
> +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
>  BTF_KFUNCS_END(scx_kfunc_ids_idle)
>  
>  static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
> @@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
>  
>  BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
>  BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
> -BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
>  BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
>  
>  static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
> -- 
> 2.49.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi
  2025-05-12 16:58   ` David Vernet
@ 2025-05-12 17:19   ` Tejun Heo
  2025-05-12 18:35     ` Andrea Righi
  1 sibling, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2025-05-12 17:19 UTC (permalink / raw)
  To: Andrea Righi; +Cc: David Vernet, Changwoo Min, linux-kernel

Hello,

On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote:
>  #ifdef CONFIG_SMP
> +	/*
> +	 * If called from an unlocked context, try to acquire
> +	 * cpus_read_lock() to avoid races with CPU hotplug.
> +	 */
> +	if (scx_kf_allowed_if_unlocked())
> +		if (!cpus_read_trylock())
> +			return -EBUSY;

Is this meaningful? The idle CPU selection is already racy against CPU
hotplugs and we depend on the scheduler core to fix it up afterwards. Even
if scx_bpf_select_cpu_and() is not racy, it will drop the cpus lock before
returning and becomes racy again right there. ie. This doesn't add any
meaningful protection.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 16:58   ` David Vernet
@ 2025-05-12 18:29     ` Andrea Righi
  2025-05-12 19:07       ` Andrea Righi
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 18:29 UTC (permalink / raw)
  To: David Vernet; +Cc: Tejun Heo, Changwoo Min, linux-kernel

Hi David,

On Mon, May 12, 2025 at 11:58:29AM -0500, David Vernet wrote:
> On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote:
> > Allow scx_bpf_select_cpu_and() to be used from any context, not just
> > from ops.enqueue() or ops.select_cpu().
> > 
> > This enables schedulers, including user-space ones, to implement a
> > consistent idle CPU selection policy and helps reduce code duplication.
> > 
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> 
> Hi Andrea,
> 
> > ---
> >  kernel/sched/ext_idle.c | 23 +++++++++++++++++------
> >  1 file changed, 17 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
> > index 6915685cd3d6b..5373132db02b6 100644
> > --- a/kernel/sched/ext_idle.c
> > +++ b/kernel/sched/ext_idle.c
> > @@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> 
> FYI it looks like there are a few comments sprinkled around here that specify
> that only the .select_cpu() and .enqueue() ops are allowed.

Ah, good catch! I'll fix the comments.

> 
> >  	if (!check_builtin_idle_enabled())
> >  		return -EBUSY;
> >  
> > -	if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE))
> > -		return -EPERM;
> > -
> >  #ifdef CONFIG_SMP
> > +	/*
> > +	 * If called from an unlocked context, try to acquire
> > +	 * cpus_read_lock() to avoid races with CPU hotplug.
> > +	 */
> > +	if (scx_kf_allowed_if_unlocked())
> > +		if (!cpus_read_trylock())
> > +			return -EBUSY;
> 
> Hmm, so this now assumes that this function can be called without the rq lock
> held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr,
> which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and()
> checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by
> the rq lock, etc depending on its state.

Yeah, it makes locking quite tricky. Maybe we can acquire the rq lock if
called from an unlocked context, instead of cpu_read_lock(), but then we
still have to deal with pi_lock.

> 
> This might be fine in practice as I _think_ the only unsafe thing would be if
> p->cpus_ptr could have a torn read or write. scx_select_cpu_dfl() does do
> preempt_disable() so no issues in querying the per-cpu variables there.

I've been running `stress-ng --race-sched N` for a while to stress test
affinity changes  and I haven't triggered any error, but that doesn't mean
the code is safe...

> 
> As long as this is safe (or can be made safe) I don't have a strong opinion one
> way or the other. I think it's probably a good idea to allow users to remove
> code duplication, and in general it's up to the user to use this API correctly
> (e.g. if you're preempted during a call to scx_bpf_select_cpu_and() and
> prev_cpu changes out from under you due to the task being migrated, that's
> likely just a bug in the scheduler).

Right, I guess the biggest issue here is dealing with locking in a safe
way. About the scheduler not being 100% accurate with prev_cpu, etc. that's
a non-problem I think, at the end we don't need to be formally perfect, we
can tolerate little errors if this API is used from a non-atomic context.

I'll think more about it and do some experiemnts with locking.

Thanks!
-Andrea

> 
> Thanks,
> David
> 
> >  	/*
> >  	 * This may also be called from ops.enqueue(), so we need to handle
> >  	 * per-CPU tasks as well. For these tasks, we can skip all idle CPU
> > @@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64
> >  	if (p->nr_cpus_allowed == 1) {
> >  		if (cpumask_test_cpu(prev_cpu, cpus_allowed) &&
> >  		    scx_idle_test_and_clear_cpu(prev_cpu))
> > -			return prev_cpu;
> > -		return -EBUSY;
> > +			cpu = prev_cpu;
> > +		else
> > +			cpu = -EBUSY;
> > +		goto out_unlock;
> >  	}
> >  	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags);
> > +
> > +out_unlock:
> > +	if (scx_kf_allowed_if_unlocked())
> > +		cpus_read_unlock();
> >  #else
> >  	cpu = -EBUSY;
> >  #endif
> > @@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU)
> >  BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
> > +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
> >  BTF_KFUNCS_END(scx_kfunc_ids_idle)
> >  
> >  static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
> > @@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = {
> >  
> >  BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
> >  BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
> > -BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU)
> >  BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
> >  
> >  static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
> > -- 
> > 2.49.0
> > 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 17:19   ` Tejun Heo
@ 2025-05-12 18:35     ` Andrea Righi
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 18:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel

Hi Tejun,

On Mon, May 12, 2025 at 07:19:53AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote:
> >  #ifdef CONFIG_SMP
> > +	/*
> > +	 * If called from an unlocked context, try to acquire
> > +	 * cpus_read_lock() to avoid races with CPU hotplug.
> > +	 */
> > +	if (scx_kf_allowed_if_unlocked())
> > +		if (!cpus_read_trylock())
> > +			return -EBUSY;
> 
> Is this meaningful? The idle CPU selection is already racy against CPU
> hotplugs and we depend on the scheduler core to fix it up afterwards. Even
> if scx_bpf_select_cpu_and() is not racy, it will drop the cpus lock before
> returning and becomes racy again right there. ie. This doesn't add any
> meaningful protection.

I was concerned that accessing llc_span() / llc_weight() from
scx_select_cpu_dfl() might be problematic if a CPU goes offline underneath,
but we are accessing them with rcu_read_lock() held, we probably don't need
cpus_read_lock() protection.

And about the scheduler picking a bad CPU, that can be fixed by the
sched_ext core, so there's no problem for that.

I'll think more about the CPU hotplugging scenario.

Thanks,
-Andrea

> 
> Thanks.
> 
> -- 
> tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 18:29     ` Andrea Righi
@ 2025-05-12 19:07       ` Andrea Righi
  2025-05-12 20:00         ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 19:07 UTC (permalink / raw)
  To: David Vernet; +Cc: Tejun Heo, Changwoo Min, linux-kernel

On Mon, May 12, 2025 at 08:29:26PM +0200, Andrea Righi wrote:
...
> > > +	/*
> > > +	 * If called from an unlocked context, try to acquire
> > > +	 * cpus_read_lock() to avoid races with CPU hotplug.
> > > +	 */
> > > +	if (scx_kf_allowed_if_unlocked())
> > > +		if (!cpus_read_trylock())
> > > +			return -EBUSY;
> > 
> > Hmm, so this now assumes that this function can be called without the rq lock
> > held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr,
> > which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and()
> > checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by
> > the rq lock, etc depending on its state.
> 
> Yeah, it makes locking quite tricky. Maybe we can acquire the rq lock if
> called from an unlocked context, instead of cpu_read_lock(), but then we
> still have to deal with pi_lock.

Actually that might work, since pi_lock should never be held when
scx_kf_allowed_if_unlocked() is true, so basically we can do:

	if (scx_kf_allowed_if_unlocked())
		rq = task_rq_lock(p, &rf);
	...
	if (scx_kf_allowed_if_unlocked())
		task_rq_unlock(rq, p, &rf);

Or at least it should cover all current use cases. The tricky one is
scx_bpf_select_cpu_and() being called via BPF test_run from a user-space
task (scx_rustland_core).

If we had a way to clearly identify a test_run context, we could restrict
this to BPF_PROG_TYPE_STRUCT_OPS and BPF_PROG_TYPE_TEST_RUN (but as far as
I can tell, the latter doesn't exist).

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 19:07       ` Andrea Righi
@ 2025-05-12 20:00         ` Tejun Heo
  2025-05-13  5:23           ` Andrea Righi
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2025-05-12 20:00 UTC (permalink / raw)
  To: Andrea Righi; +Cc: David Vernet, Changwoo Min, linux-kernel

Hello,

On Mon, May 12, 2025 at 09:07:49PM +0200, Andrea Righi wrote:
...
> 	if (scx_kf_allowed_if_unlocked())
> 		rq = task_rq_lock(p, &rf);
> 	...
> 	if (scx_kf_allowed_if_unlocked())
> 		task_rq_unlock(rq, p, &rf);
> 
> Or at least it should cover all current use cases. The tricky one is
> scx_bpf_select_cpu_and() being called via BPF test_run from a user-space
> task (scx_rustland_core).
> 
> If we had a way to clearly identify a test_run context, we could restrict
> this to BPF_PROG_TYPE_STRUCT_OPS and BPF_PROG_TYPE_TEST_RUN (but as far as
> I can tell, the latter doesn't exist).

Shouldn't that work as-is? TEST_RUN isn't gonna set any kf_mask, so the same
condition would work?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and()
  2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi
  2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi
  2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi
@ 2025-05-13  0:46 ` Changwoo Min
  2025-05-13  5:48   ` Andrea Righi
  2 siblings, 1 reply; 12+ messages in thread
From: Changwoo Min @ 2025-05-13  0:46 UTC (permalink / raw)
  To: Andrea Righi, Tejun Heo, David Vernet; +Cc: linux-kernel

Hi Andrea,

On 5/13/25 00:14, Andrea Righi wrote:
> Many scx schedulers implement their own idle CPU selection policy, which
> may be invoked from different contexts, not just from ops.select_cpu().
> 
> For example, some schedulers may need to trigger a proactive CPU wakeup
> from ops.enqueue() after enqueuing a task, while others may expose this
> functionality to user space, relying on a BPF test_run call to pick an idle
> CPU.
> 
> To maintain a consistent selection policy, these schedulers often implement
> their own idle CPU selection logic, since the built-in one is only usable
> from ops.select_cpu(), leading to unnecessary code duplication and
> fragmentation.

Besides ops.select_cpu() and ops.enqueue(), do you have use cases in
mind?

Regards,
Changwoo Min

> 
> To address this, allow scx_bpf_select_cpu_and() to be used from any
> context. This gives scx schedulers the option to use the built-in idle CPU
> selection policy in a consistent way, potentially reducing code duplication
> and fragmentation.
> 
> Andrea Righi (2):
>        sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
>        sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
> 
>   kernel/sched/ext.c      |  5 -----
>   kernel/sched/ext.h      |  5 +++++
>   kernel/sched/ext_idle.c | 23 +++++++++++++++++------
>   3 files changed, 22 insertions(+), 11 deletions(-)
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
  2025-05-12 20:00         ` Tejun Heo
@ 2025-05-13  5:23           ` Andrea Righi
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-13  5:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel

On Mon, May 12, 2025 at 10:00:19AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Mon, May 12, 2025 at 09:07:49PM +0200, Andrea Righi wrote:
> ...
> > 	if (scx_kf_allowed_if_unlocked())
> > 		rq = task_rq_lock(p, &rf);
> > 	...
> > 	if (scx_kf_allowed_if_unlocked())
> > 		task_rq_unlock(rq, p, &rf);
> > 
> > Or at least it should cover all current use cases. The tricky one is
> > scx_bpf_select_cpu_and() being called via BPF test_run from a user-space
> > task (scx_rustland_core).
> > 
> > If we had a way to clearly identify a test_run context, we could restrict
> > this to BPF_PROG_TYPE_STRUCT_OPS and BPF_PROG_TYPE_TEST_RUN (but as far as
> > I can tell, the latter doesn't exist).
> 
> Shouldn't that work as-is? TEST_RUN isn't gonna set any kf_mask, so the same
> condition would work?

Oh yes, it works, my point is that, without any restrictions, the kfunc
might be used in problematic contexts, for example if it's called with
pi_lock held it could trigger a deadlock with task_rq_lock(). Limiting its
use to ops.select_cpu(), ops.enqueue(), ops.dispatch(), and test_run would
allow to cover all the use cases while being safe.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and()
  2025-05-13  0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min
@ 2025-05-13  5:48   ` Andrea Righi
  0 siblings, 0 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-13  5:48 UTC (permalink / raw)
  To: Changwoo Min; +Cc: Tejun Heo, David Vernet, linux-kernel

Hi Changwoo,

On Tue, May 13, 2025 at 09:46:04AM +0900, Changwoo Min wrote:
> Hi Andrea,
> 
> On 5/13/25 00:14, Andrea Righi wrote:
> > Many scx schedulers implement their own idle CPU selection policy, which
> > may be invoked from different contexts, not just from ops.select_cpu().
> > 
> > For example, some schedulers may need to trigger a proactive CPU wakeup
> > from ops.enqueue() after enqueuing a task, while others may expose this
> > functionality to user space, relying on a BPF test_run call to pick an idle
> > CPU.
> > 
> > To maintain a consistent selection policy, these schedulers often implement
> > their own idle CPU selection logic, since the built-in one is only usable
> > from ops.select_cpu(), leading to unnecessary code duplication and
> > fragmentation.
> 
> Besides ops.select_cpu() and ops.enqueue(), do you have use cases in
> mind?

scx_rustland_core exposes the select_cpu() functionality to user space,
which internally invokes a custom pick_idle_cpu() from the user-space
scheduler via a BPF test_run call.

In this case, having access to scx_bpf_select_cpu_and() would be really
useful, as it would allow us to eliminate a significant amount of code that
effectively re-implements the built-in idle CPU selection logic, including
all the code required to expose all the topology information to BPF.

Another potential use case could be within ops.dispatch(). For example, if
the current CPU isn't the best fit for the first task about to be consumed
from a DSQ, we could use the built-in idle selection logic to try to pick a
more suitable idle CPU and bounce the task there.

Thanks,
-Andrea

> 
> Regards,
> Changwoo Min
> 
> > 
> > To address this, allow scx_bpf_select_cpu_and() to be used from any
> > context. This gives scx schedulers the option to use the built-in idle CPU
> > selection policy in a consistent way, potentially reducing code duplication
> > and fragmentation.
> > 
> > Andrea Righi (2):
> >        sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
> >        sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
> > 
> >   kernel/sched/ext.c      |  5 -----
> >   kernel/sched/ext.h      |  5 +++++
> >   kernel/sched/ext_idle.c | 23 +++++++++++++++++------
> >   3 files changed, 22 insertions(+), 11 deletions(-)
> > 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-13  5:48 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi
2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi
2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi
2025-05-12 16:58   ` David Vernet
2025-05-12 18:29     ` Andrea Righi
2025-05-12 19:07       ` Andrea Righi
2025-05-12 20:00         ` Tejun Heo
2025-05-13  5:23           ` Andrea Righi
2025-05-12 17:19   ` Tejun Heo
2025-05-12 18:35     ` Andrea Righi
2025-05-13  0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min
2025-05-13  5:48   ` Andrea Righi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.