* [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and()
@ 2025-05-12 15:14 Andrea Righi
2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Andrea Righi @ 2025-05-12 15:14 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel
Many scx schedulers implement their own idle CPU selection policy, which
may be invoked from different contexts, not just from ops.select_cpu().
For example, some schedulers may need to trigger a proactive CPU wakeup
from ops.enqueue() after enqueuing a task, while others may expose this
functionality to user space, relying on a BPF test_run call to pick an idle
CPU.
To maintain a consistent selection policy, these schedulers often implement
their own idle CPU selection logic, since the built-in one is only usable
from ops.select_cpu(), leading to unnecessary code duplication and
fragmentation.
To address this, allow scx_bpf_select_cpu_and() to be used from any
context. This gives scx schedulers the option to use the built-in idle CPU
selection policy in a consistent way, potentially reducing code duplication
and fragmentation.
Andrea Righi (2):
sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c
sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context
kernel/sched/ext.c | 5 -----
kernel/sched/ext.h | 5 +++++
kernel/sched/ext_idle.c | 23 +++++++++++++++++------
3 files changed, 22 insertions(+), 11 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c 2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi @ 2025-05-12 15:14 ` Andrea Righi 2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi 2025-05-13 0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min 2 siblings, 0 replies; 12+ messages in thread From: Andrea Righi @ 2025-05-12 15:14 UTC (permalink / raw) To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel Relocate the scx_kf_allowed_if_unlocked(), so it can be used from other source files (e.g., ext_idle.c). No functional change. Signed-off-by: Andrea Righi <arighi@nvidia.com> --- kernel/sched/ext.c | 5 ----- kernel/sched/ext.h | 5 +++++ 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 8ccb5c7ff55c9..51623d1bd5d48 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -1358,11 +1358,6 @@ static __always_inline bool scx_kf_allowed_on_arg_tasks(u32 mask, return true; } -static bool scx_kf_allowed_if_unlocked(void) -{ - return !current->scx.kf_mask; -} - /** * nldsq_next_task - Iterate to the next task in a non-local DSQ * @dsq: user dsq being iterated diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h index 3053cdd61eb9c..6e5072f577718 100644 --- a/kernel/sched/ext.h +++ b/kernel/sched/ext.h @@ -8,6 +8,11 @@ */ #ifdef CONFIG_SCHED_CLASS_EXT +static inline bool scx_kf_allowed_if_unlocked(void) +{ + return !current->scx.kf_mask; +} + DECLARE_STATIC_KEY_FALSE(scx_ops_allow_queued_wakeup); void scx_tick(struct rq *rq); -- 2.49.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi 2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi @ 2025-05-12 15:14 ` Andrea Righi 2025-05-12 16:58 ` David Vernet 2025-05-12 17:19 ` Tejun Heo 2025-05-13 0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min 2 siblings, 2 replies; 12+ messages in thread From: Andrea Righi @ 2025-05-12 15:14 UTC (permalink / raw) To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel Allow scx_bpf_select_cpu_and() to be used from any context, not just from ops.enqueue() or ops.select_cpu(). This enables schedulers, including user-space ones, to implement a consistent idle CPU selection policy and helps reduce code duplication. Signed-off-by: Andrea Righi <arighi@nvidia.com> --- kernel/sched/ext_idle.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c index 6915685cd3d6b..5373132db02b6 100644 --- a/kernel/sched/ext_idle.c +++ b/kernel/sched/ext_idle.c @@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 if (!check_builtin_idle_enabled()) return -EBUSY; - if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE)) - return -EPERM; - #ifdef CONFIG_SMP + /* + * If called from an unlocked context, try to acquire + * cpus_read_lock() to avoid races with CPU hotplug. + */ + if (scx_kf_allowed_if_unlocked()) + if (!cpus_read_trylock()) + return -EBUSY; + /* * This may also be called from ops.enqueue(), so we need to handle * per-CPU tasks as well. For these tasks, we can skip all idle CPU @@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 if (p->nr_cpus_allowed == 1) { if (cpumask_test_cpu(prev_cpu, cpus_allowed) && scx_idle_test_and_clear_cpu(prev_cpu)) - return prev_cpu; - return -EBUSY; + cpu = prev_cpu; + else + cpu = -EBUSY; + goto out_unlock; } cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags); + +out_unlock: + if (scx_kf_allowed_if_unlocked()) + cpus_read_unlock(); #else cpu = -EBUSY; #endif @@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU) BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU) BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU) BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU) +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU) BTF_KFUNCS_END(scx_kfunc_ids_idle) static const struct btf_kfunc_id_set scx_kfunc_set_idle = { @@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = { BTF_KFUNCS_START(scx_kfunc_ids_select_cpu) BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU) -BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU) BTF_KFUNCS_END(scx_kfunc_ids_select_cpu) static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = { -- 2.49.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi @ 2025-05-12 16:58 ` David Vernet 2025-05-12 18:29 ` Andrea Righi 2025-05-12 17:19 ` Tejun Heo 1 sibling, 1 reply; 12+ messages in thread From: David Vernet @ 2025-05-12 16:58 UTC (permalink / raw) To: Andrea Righi; +Cc: Tejun Heo, Changwoo Min, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3912 bytes --] On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote: > Allow scx_bpf_select_cpu_and() to be used from any context, not just > from ops.enqueue() or ops.select_cpu(). > > This enables schedulers, including user-space ones, to implement a > consistent idle CPU selection policy and helps reduce code duplication. > > Signed-off-by: Andrea Righi <arighi@nvidia.com> Hi Andrea, > --- > kernel/sched/ext_idle.c | 23 +++++++++++++++++------ > 1 file changed, 17 insertions(+), 6 deletions(-) > > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c > index 6915685cd3d6b..5373132db02b6 100644 > --- a/kernel/sched/ext_idle.c > +++ b/kernel/sched/ext_idle.c > @@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 FYI it looks like there are a few comments sprinkled around here that specify that only the .select_cpu() and .enqueue() ops are allowed. > if (!check_builtin_idle_enabled()) > return -EBUSY; > > - if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE)) > - return -EPERM; > - > #ifdef CONFIG_SMP > + /* > + * If called from an unlocked context, try to acquire > + * cpus_read_lock() to avoid races with CPU hotplug. > + */ > + if (scx_kf_allowed_if_unlocked()) > + if (!cpus_read_trylock()) > + return -EBUSY; Hmm, so this now assumes that this function can be called without the rq lock held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr, which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and() checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by the rq lock, etc depending on its state. This might be fine in practice as I _think_ the only unsafe thing would be if p->cpus_ptr could have a torn read or write. scx_select_cpu_dfl() does do preempt_disable() so no issues in querying the per-cpu variables there. As long as this is safe (or can be made safe) I don't have a strong opinion one way or the other. I think it's probably a good idea to allow users to remove code duplication, and in general it's up to the user to use this API correctly (e.g. if you're preempted during a call to scx_bpf_select_cpu_and() and prev_cpu changes out from under you due to the task being migrated, that's likely just a bug in the scheduler). Thanks, David > /* > * This may also be called from ops.enqueue(), so we need to handle > * per-CPU tasks as well. For these tasks, we can skip all idle CPU > @@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 > if (p->nr_cpus_allowed == 1) { > if (cpumask_test_cpu(prev_cpu, cpus_allowed) && > scx_idle_test_and_clear_cpu(prev_cpu)) > - return prev_cpu; > - return -EBUSY; > + cpu = prev_cpu; > + else > + cpu = -EBUSY; > + goto out_unlock; > } > cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags); > + > +out_unlock: > + if (scx_kf_allowed_if_unlocked()) > + cpus_read_unlock(); > #else > cpu = -EBUSY; > #endif > @@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU) > BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU) > BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU) > BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU) > +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU) > BTF_KFUNCS_END(scx_kfunc_ids_idle) > > static const struct btf_kfunc_id_set scx_kfunc_set_idle = { > @@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = { > > BTF_KFUNCS_START(scx_kfunc_ids_select_cpu) > BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU) > -BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU) > BTF_KFUNCS_END(scx_kfunc_ids_select_cpu) > > static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = { > -- > 2.49.0 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 16:58 ` David Vernet @ 2025-05-12 18:29 ` Andrea Righi 2025-05-12 19:07 ` Andrea Righi 0 siblings, 1 reply; 12+ messages in thread From: Andrea Righi @ 2025-05-12 18:29 UTC (permalink / raw) To: David Vernet; +Cc: Tejun Heo, Changwoo Min, linux-kernel Hi David, On Mon, May 12, 2025 at 11:58:29AM -0500, David Vernet wrote: > On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote: > > Allow scx_bpf_select_cpu_and() to be used from any context, not just > > from ops.enqueue() or ops.select_cpu(). > > > > This enables schedulers, including user-space ones, to implement a > > consistent idle CPU selection policy and helps reduce code duplication. > > > > Signed-off-by: Andrea Righi <arighi@nvidia.com> > > Hi Andrea, > > > --- > > kernel/sched/ext_idle.c | 23 +++++++++++++++++------ > > 1 file changed, 17 insertions(+), 6 deletions(-) > > > > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c > > index 6915685cd3d6b..5373132db02b6 100644 > > --- a/kernel/sched/ext_idle.c > > +++ b/kernel/sched/ext_idle.c > > @@ -943,10 +943,15 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 > > FYI it looks like there are a few comments sprinkled around here that specify > that only the .select_cpu() and .enqueue() ops are allowed. Ah, good catch! I'll fix the comments. > > > if (!check_builtin_idle_enabled()) > > return -EBUSY; > > > > - if (!scx_kf_allowed(SCX_KF_SELECT_CPU | SCX_KF_ENQUEUE)) > > - return -EPERM; > > - > > #ifdef CONFIG_SMP > > + /* > > + * If called from an unlocked context, try to acquire > > + * cpus_read_lock() to avoid races with CPU hotplug. > > + */ > > + if (scx_kf_allowed_if_unlocked()) > > + if (!cpus_read_trylock()) > > + return -EBUSY; > > Hmm, so this now assumes that this function can be called without the rq lock > held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr, > which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and() > checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by > the rq lock, etc depending on its state. Yeah, it makes locking quite tricky. Maybe we can acquire the rq lock if called from an unlocked context, instead of cpu_read_lock(), but then we still have to deal with pi_lock. > > This might be fine in practice as I _think_ the only unsafe thing would be if > p->cpus_ptr could have a torn read or write. scx_select_cpu_dfl() does do > preempt_disable() so no issues in querying the per-cpu variables there. I've been running `stress-ng --race-sched N` for a while to stress test affinity changes and I haven't triggered any error, but that doesn't mean the code is safe... > > As long as this is safe (or can be made safe) I don't have a strong opinion one > way or the other. I think it's probably a good idea to allow users to remove > code duplication, and in general it's up to the user to use this API correctly > (e.g. if you're preempted during a call to scx_bpf_select_cpu_and() and > prev_cpu changes out from under you due to the task being migrated, that's > likely just a bug in the scheduler). Right, I guess the biggest issue here is dealing with locking in a safe way. About the scheduler not being 100% accurate with prev_cpu, etc. that's a non-problem I think, at the end we don't need to be formally perfect, we can tolerate little errors if this API is used from a non-atomic context. I'll think more about it and do some experiemnts with locking. Thanks! -Andrea > > Thanks, > David > > > /* > > * This may also be called from ops.enqueue(), so we need to handle > > * per-CPU tasks as well. For these tasks, we can skip all idle CPU > > @@ -956,10 +961,16 @@ __bpf_kfunc s32 scx_bpf_select_cpu_and(struct task_struct *p, s32 prev_cpu, u64 > > if (p->nr_cpus_allowed == 1) { > > if (cpumask_test_cpu(prev_cpu, cpus_allowed) && > > scx_idle_test_and_clear_cpu(prev_cpu)) > > - return prev_cpu; > > - return -EBUSY; > > + cpu = prev_cpu; > > + else > > + cpu = -EBUSY; > > + goto out_unlock; > > } > > cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, cpus_allowed, flags); > > + > > +out_unlock: > > + if (scx_kf_allowed_if_unlocked()) > > + cpus_read_unlock(); > > #else > > cpu = -EBUSY; > > #endif > > @@ -1266,6 +1277,7 @@ BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu_node, KF_RCU) > > BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU) > > BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu_node, KF_RCU) > > BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU) > > +BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU) > > BTF_KFUNCS_END(scx_kfunc_ids_idle) > > > > static const struct btf_kfunc_id_set scx_kfunc_set_idle = { > > @@ -1275,7 +1287,6 @@ static const struct btf_kfunc_id_set scx_kfunc_set_idle = { > > > > BTF_KFUNCS_START(scx_kfunc_ids_select_cpu) > > BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU) > > -BTF_ID_FLAGS(func, scx_bpf_select_cpu_and, KF_RCU) > > BTF_KFUNCS_END(scx_kfunc_ids_select_cpu) > > > > static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = { > > -- > > 2.49.0 > > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 18:29 ` Andrea Righi @ 2025-05-12 19:07 ` Andrea Righi 2025-05-12 20:00 ` Tejun Heo 0 siblings, 1 reply; 12+ messages in thread From: Andrea Righi @ 2025-05-12 19:07 UTC (permalink / raw) To: David Vernet; +Cc: Tejun Heo, Changwoo Min, linux-kernel On Mon, May 12, 2025 at 08:29:26PM +0200, Andrea Righi wrote: ... > > > + /* > > > + * If called from an unlocked context, try to acquire > > > + * cpus_read_lock() to avoid races with CPU hotplug. > > > + */ > > > + if (scx_kf_allowed_if_unlocked()) > > > + if (!cpus_read_trylock()) > > > + return -EBUSY; > > > > Hmm, so this now assumes that this function can be called without the rq lock > > held. I'm not sure if that's safe, as scx_select_cpu_dfl() queries p->cpus_ptr, > > which is protected by the rq lock. Additionally, scx_bpf_select_cpu_and() > > checks p->nr_cpus_allowed which is protected sometimes by pi_lock, sometimes by > > the rq lock, etc depending on its state. > > Yeah, it makes locking quite tricky. Maybe we can acquire the rq lock if > called from an unlocked context, instead of cpu_read_lock(), but then we > still have to deal with pi_lock. Actually that might work, since pi_lock should never be held when scx_kf_allowed_if_unlocked() is true, so basically we can do: if (scx_kf_allowed_if_unlocked()) rq = task_rq_lock(p, &rf); ... if (scx_kf_allowed_if_unlocked()) task_rq_unlock(rq, p, &rf); Or at least it should cover all current use cases. The tricky one is scx_bpf_select_cpu_and() being called via BPF test_run from a user-space task (scx_rustland_core). If we had a way to clearly identify a test_run context, we could restrict this to BPF_PROG_TYPE_STRUCT_OPS and BPF_PROG_TYPE_TEST_RUN (but as far as I can tell, the latter doesn't exist). Thanks, -Andrea ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 19:07 ` Andrea Righi @ 2025-05-12 20:00 ` Tejun Heo 2025-05-13 5:23 ` Andrea Righi 0 siblings, 1 reply; 12+ messages in thread From: Tejun Heo @ 2025-05-12 20:00 UTC (permalink / raw) To: Andrea Righi; +Cc: David Vernet, Changwoo Min, linux-kernel Hello, On Mon, May 12, 2025 at 09:07:49PM +0200, Andrea Righi wrote: ... > if (scx_kf_allowed_if_unlocked()) > rq = task_rq_lock(p, &rf); > ... > if (scx_kf_allowed_if_unlocked()) > task_rq_unlock(rq, p, &rf); > > Or at least it should cover all current use cases. The tricky one is > scx_bpf_select_cpu_and() being called via BPF test_run from a user-space > task (scx_rustland_core). > > If we had a way to clearly identify a test_run context, we could restrict > this to BPF_PROG_TYPE_STRUCT_OPS and BPF_PROG_TYPE_TEST_RUN (but as far as > I can tell, the latter doesn't exist). Shouldn't that work as-is? TEST_RUN isn't gonna set any kf_mask, so the same condition would work? Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 20:00 ` Tejun Heo @ 2025-05-13 5:23 ` Andrea Righi 0 siblings, 0 replies; 12+ messages in thread From: Andrea Righi @ 2025-05-13 5:23 UTC (permalink / raw) To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel On Mon, May 12, 2025 at 10:00:19AM -1000, Tejun Heo wrote: > Hello, > > On Mon, May 12, 2025 at 09:07:49PM +0200, Andrea Righi wrote: > ... > > if (scx_kf_allowed_if_unlocked()) > > rq = task_rq_lock(p, &rf); > > ... > > if (scx_kf_allowed_if_unlocked()) > > task_rq_unlock(rq, p, &rf); > > > > Or at least it should cover all current use cases. The tricky one is > > scx_bpf_select_cpu_and() being called via BPF test_run from a user-space > > task (scx_rustland_core). > > > > If we had a way to clearly identify a test_run context, we could restrict > > this to BPF_PROG_TYPE_STRUCT_OPS and BPF_PROG_TYPE_TEST_RUN (but as far as > > I can tell, the latter doesn't exist). > > Shouldn't that work as-is? TEST_RUN isn't gonna set any kf_mask, so the same > condition would work? Oh yes, it works, my point is that, without any restrictions, the kfunc might be used in problematic contexts, for example if it's called with pi_lock held it could trigger a deadlock with task_rq_lock(). Limiting its use to ops.select_cpu(), ops.enqueue(), ops.dispatch(), and test_run would allow to cover all the use cases while being safe. Thanks, -Andrea ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi 2025-05-12 16:58 ` David Vernet @ 2025-05-12 17:19 ` Tejun Heo 2025-05-12 18:35 ` Andrea Righi 1 sibling, 1 reply; 12+ messages in thread From: Tejun Heo @ 2025-05-12 17:19 UTC (permalink / raw) To: Andrea Righi; +Cc: David Vernet, Changwoo Min, linux-kernel Hello, On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote: > #ifdef CONFIG_SMP > + /* > + * If called from an unlocked context, try to acquire > + * cpus_read_lock() to avoid races with CPU hotplug. > + */ > + if (scx_kf_allowed_if_unlocked()) > + if (!cpus_read_trylock()) > + return -EBUSY; Is this meaningful? The idle CPU selection is already racy against CPU hotplugs and we depend on the scheduler core to fix it up afterwards. Even if scx_bpf_select_cpu_and() is not racy, it will drop the cpus lock before returning and becomes racy again right there. ie. This doesn't add any meaningful protection. Thanks. -- tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context 2025-05-12 17:19 ` Tejun Heo @ 2025-05-12 18:35 ` Andrea Righi 0 siblings, 0 replies; 12+ messages in thread From: Andrea Righi @ 2025-05-12 18:35 UTC (permalink / raw) To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel Hi Tejun, On Mon, May 12, 2025 at 07:19:53AM -1000, Tejun Heo wrote: > Hello, > > On Mon, May 12, 2025 at 05:14:56PM +0200, Andrea Righi wrote: > > #ifdef CONFIG_SMP > > + /* > > + * If called from an unlocked context, try to acquire > > + * cpus_read_lock() to avoid races with CPU hotplug. > > + */ > > + if (scx_kf_allowed_if_unlocked()) > > + if (!cpus_read_trylock()) > > + return -EBUSY; > > Is this meaningful? The idle CPU selection is already racy against CPU > hotplugs and we depend on the scheduler core to fix it up afterwards. Even > if scx_bpf_select_cpu_and() is not racy, it will drop the cpus lock before > returning and becomes racy again right there. ie. This doesn't add any > meaningful protection. I was concerned that accessing llc_span() / llc_weight() from scx_select_cpu_dfl() might be problematic if a CPU goes offline underneath, but we are accessing them with rcu_read_lock() held, we probably don't need cpus_read_lock() protection. And about the scheduler picking a bad CPU, that can be fixed by the sched_ext core, so there's no problem for that. I'll think more about the CPU hotplugging scenario. Thanks, -Andrea > > Thanks. > > -- > tejun ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() 2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi 2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi 2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi @ 2025-05-13 0:46 ` Changwoo Min 2025-05-13 5:48 ` Andrea Righi 2 siblings, 1 reply; 12+ messages in thread From: Changwoo Min @ 2025-05-13 0:46 UTC (permalink / raw) To: Andrea Righi, Tejun Heo, David Vernet; +Cc: linux-kernel Hi Andrea, On 5/13/25 00:14, Andrea Righi wrote: > Many scx schedulers implement their own idle CPU selection policy, which > may be invoked from different contexts, not just from ops.select_cpu(). > > For example, some schedulers may need to trigger a proactive CPU wakeup > from ops.enqueue() after enqueuing a task, while others may expose this > functionality to user space, relying on a BPF test_run call to pick an idle > CPU. > > To maintain a consistent selection policy, these schedulers often implement > their own idle CPU selection logic, since the built-in one is only usable > from ops.select_cpu(), leading to unnecessary code duplication and > fragmentation. Besides ops.select_cpu() and ops.enqueue(), do you have use cases in mind? Regards, Changwoo Min > > To address this, allow scx_bpf_select_cpu_and() to be used from any > context. This gives scx schedulers the option to use the built-in idle CPU > selection policy in a consistent way, potentially reducing code duplication > and fragmentation. > > Andrea Righi (2): > sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c > sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context > > kernel/sched/ext.c | 5 ----- > kernel/sched/ext.h | 5 +++++ > kernel/sched/ext_idle.c | 23 +++++++++++++++++------ > 3 files changed, 22 insertions(+), 11 deletions(-) > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() 2025-05-13 0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min @ 2025-05-13 5:48 ` Andrea Righi 0 siblings, 0 replies; 12+ messages in thread From: Andrea Righi @ 2025-05-13 5:48 UTC (permalink / raw) To: Changwoo Min; +Cc: Tejun Heo, David Vernet, linux-kernel Hi Changwoo, On Tue, May 13, 2025 at 09:46:04AM +0900, Changwoo Min wrote: > Hi Andrea, > > On 5/13/25 00:14, Andrea Righi wrote: > > Many scx schedulers implement their own idle CPU selection policy, which > > may be invoked from different contexts, not just from ops.select_cpu(). > > > > For example, some schedulers may need to trigger a proactive CPU wakeup > > from ops.enqueue() after enqueuing a task, while others may expose this > > functionality to user space, relying on a BPF test_run call to pick an idle > > CPU. > > > > To maintain a consistent selection policy, these schedulers often implement > > their own idle CPU selection logic, since the built-in one is only usable > > from ops.select_cpu(), leading to unnecessary code duplication and > > fragmentation. > > Besides ops.select_cpu() and ops.enqueue(), do you have use cases in > mind? scx_rustland_core exposes the select_cpu() functionality to user space, which internally invokes a custom pick_idle_cpu() from the user-space scheduler via a BPF test_run call. In this case, having access to scx_bpf_select_cpu_and() would be really useful, as it would allow us to eliminate a significant amount of code that effectively re-implements the built-in idle CPU selection logic, including all the code required to expose all the topology information to BPF. Another potential use case could be within ops.dispatch(). For example, if the current CPU isn't the best fit for the first task about to be consumed from a DSQ, we could use the built-in idle selection logic to try to pick a more suitable idle CPU and bounce the task there. Thanks, -Andrea > > Regards, > Changwoo Min > > > > > To address this, allow scx_bpf_select_cpu_and() to be used from any > > context. This gives scx schedulers the option to use the built-in idle CPU > > selection policy in a consistent way, potentially reducing code duplication > > and fragmentation. > > > > Andrea Righi (2): > > sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c > > sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context > > > > kernel/sched/ext.c | 5 ----- > > kernel/sched/ext.h | 5 +++++ > > kernel/sched/ext_idle.c | 23 +++++++++++++++++------ > > 3 files changed, 22 insertions(+), 11 deletions(-) > > > ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-05-13 5:48 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-12 15:14 [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Andrea Righi 2025-05-12 15:14 ` [PATCH 1/2] sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c Andrea Righi 2025-05-12 15:14 ` [PATCH 2/2] sched_ext/idle: Make scx_bpf_select_cpu_and() usable from any context Andrea Righi 2025-05-12 16:58 ` David Vernet 2025-05-12 18:29 ` Andrea Righi 2025-05-12 19:07 ` Andrea Righi 2025-05-12 20:00 ` Tejun Heo 2025-05-13 5:23 ` Andrea Righi 2025-05-12 17:19 ` Tejun Heo 2025-05-12 18:35 ` Andrea Righi 2025-05-13 0:46 ` [PATCH sched_ext/for-6.16 0/2] sched_ext: Extend usability of scx_bpf_select_cpu_and() Changwoo Min 2025-05-13 5:48 ` Andrea Righi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.