* [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks
@ 2025-02-07 20:58 Tejun Heo
2025-02-07 20:59 ` [PATCH sched_ext/for-6.14-fixes 2/2] sched_ext: Fix migration disabled handling in targeted dispatches Tejun Heo
2025-02-07 22:36 ` [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Andrea Righi
0 siblings, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2025-02-07 20:58 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min; +Cc: linux-kernel, sched-ext
Migration disabled tasks are special and pinned to their previous CPUs. They
tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may
not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers
by automatically dispatching them to the pinned local DSQs by default. If a
BPF scheduler wants to handle migration disabled tasks explicitly, it can
set SCX_OPS_ENQ_MIGRATION_DISABLED.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -123,6 +123,19 @@ enum scx_ops_flags {
SCX_OPS_SWITCH_PARTIAL = 1LLU << 3,
/*
+ * A migration disabled task can only execute on its current CPU. By
+ * default, such tasks are automatically put on the CPU's local DSQ with
+ * the default slice on enqueue. If this ops flag is set, they also go
+ * through ops.enqueue().
+ *
+ * A migration disabled task never invokes ops.select_cpu() as it can
+ * only select the current CPU. Also, p->cpus_ptr will only contain its
+ * current CPU while p->nr_cpus_allowed keeps tracking p->user_cpus_ptr
+ * and thus may disagree with cpumask_weight(p->cpus_ptr).
+ */
+ SCX_OPS_ENQ_MIGRATION_DISABLED = 1LLU << 4,
+
+ /*
* CPU cgroup support flags
*/
SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16, /* cpu.weight */
@@ -130,6 +143,7 @@ enum scx_ops_flags {
SCX_OPS_ALL_FLAGS = SCX_OPS_KEEP_BUILTIN_IDLE |
SCX_OPS_ENQ_LAST |
SCX_OPS_ENQ_EXITING |
+ SCX_OPS_ENQ_MIGRATION_DISABLED |
SCX_OPS_SWITCH_PARTIAL |
SCX_OPS_HAS_CGROUP_WEIGHT,
};
@@ -882,6 +896,7 @@ static bool scx_warned_zero_slice;
static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
+static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_migration_disabled);
static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
@@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
unlikely(p->flags & PF_EXITING))
goto local;
+ /* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
+ if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
+ is_migration_disabled(p))
+ goto local;
+
if (!SCX_HAS_OP(enqueue))
goto global;
@@ -5052,6 +5072,7 @@ static void scx_ops_disable_workfn(struc
static_branch_disable(&scx_has_op[i]);
static_branch_disable(&scx_ops_enq_last);
static_branch_disable(&scx_ops_enq_exiting);
+ static_branch_disable(&scx_ops_enq_migration_disabled);
static_branch_disable(&scx_ops_cpu_preempt);
static_branch_disable(&scx_builtin_idle_enabled);
synchronize_rcu();
@@ -5661,6 +5682,8 @@ static int scx_ops_enable(struct sched_e
if (ops->flags & SCX_OPS_ENQ_EXITING)
static_branch_enable(&scx_ops_enq_exiting);
+ if (ops->flags & SCX_OPS_ENQ_MIGRATION_DISABLED)
+ static_branch_enable(&scx_ops_enq_migration_disabled);
if (scx_ops.cpu_acquire || scx_ops.cpu_release)
static_branch_enable(&scx_ops_cpu_preempt);
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH sched_ext/for-6.14-fixes 2/2] sched_ext: Fix migration disabled handling in targeted dispatches
2025-02-07 20:58 [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Tejun Heo
@ 2025-02-07 20:59 ` Tejun Heo
2025-02-09 6:33 ` Tejun Heo
2025-02-07 22:36 ` [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Andrea Righi
1 sibling, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-02-07 20:59 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min; +Cc: linux-kernel, sched-ext
A dispatch operation that can target a specific local DSQ -
scx_bpf_dsq_move_to_local() or scx_bpf_dsq_move() - checks whether the task
can be migrated to the target CPU using task_can_run_on_remote_rq(). If the
task can't be migrated to the targeted CPU, it is bounced through a global
DSQ.
task_can_run_on_remote_rq() assumes that the task is on a CPU that's
different from the targeted CPU but the callers doesn't uphold the
assumption and may call the function when the task is already on the target
CPU. When such task has migration disabled, task_can_run_on_remote_rq() ends
up returning %false incorrectly unnecessarily bouncing the task to a global
DSQ.
Fix it by updating the callers to only call task_can_run_on_remote_rq() when
the task is on a different CPU than the target CPU. As this is a bit subtle,
for clarity and documentation:
- Make task_can_run_on_remote_rq() trigger SCHED_WARN_ON() if the task is on
the same CPU as the target CPU.
- is_migration_disabled() test in task_can_run_on_remote_rq() cannot trigger
if the task is on a different CPU than the target CPU as the preceding
task_allowed_on_cpu() test should fail beforehand. Convert the test into
SCHED_WARN_ON().
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Fixes: 0366017e0973 ("sched_ext: Use task_can_run_on_remote_rq() test in dispatch_to_local_dsq()")
Cc: stable@vger.kernel.org # v6.12+
---
kernel/sched/ext.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2333,12 +2333,16 @@ static void move_remote_task_to_local_ds
*
* - The BPF scheduler is bypassed while the rq is offline and we can always say
* no to the BPF scheduler initiated migrations while offline.
+ *
+ * The caller must ensure that @p and @rq are on different CPUs.
*/
static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq,
bool trigger_error)
{
int cpu = cpu_of(rq);
+ SCHED_WARN_ON(task_cpu(p) == cpu);
+
/*
* We don't require the BPF scheduler to avoid dispatching to offline
* CPUs mostly for convenience but also because CPUs can go offline
@@ -2352,8 +2356,11 @@ static bool task_can_run_on_remote_rq(st
return false;
}
- if (unlikely(is_migration_disabled(p)))
- return false;
+ /*
+ * If @p has migration disabled, @p->cpus_ptr only contains its current
+ * CPU and the above task_allowed_on_cpu() test should have failed.
+ */
+ SCHED_WARN_ON(is_migration_disabled(p));
if (!scx_rq_online(rq))
return false;
@@ -2457,7 +2464,8 @@ static struct rq *move_task_between_dsqs
if (dst_dsq->id == SCX_DSQ_LOCAL) {
dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq);
- if (!task_can_run_on_remote_rq(p, dst_rq, true)) {
+ if (src_rq != dst_rq &&
+ unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) {
dst_dsq = find_global_dsq(p);
dst_rq = src_rq;
}
@@ -2611,7 +2619,8 @@ static void dispatch_to_local_dsq(struct
}
#ifdef CONFIG_SMP
- if (unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) {
+ if (src_rq != dst_rq &&
+ unlikely(!task_can_run_on_remote_rq(p, dst_rq, true))) {
dispatch_enqueue(find_global_dsq(p), p,
enq_flags | SCX_ENQ_CLEAR_OPSS);
return;
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks
2025-02-07 20:58 [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Tejun Heo
2025-02-07 20:59 ` [PATCH sched_ext/for-6.14-fixes 2/2] sched_ext: Fix migration disabled handling in targeted dispatches Tejun Heo
@ 2025-02-07 22:36 ` Andrea Righi
2025-02-07 22:44 ` Tejun Heo
1 sibling, 1 reply; 6+ messages in thread
From: Andrea Righi @ 2025-02-07 22:36 UTC (permalink / raw)
To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel, sched-ext
Hi Tejun,
On Fri, Feb 07, 2025 at 10:58:23AM -1000, Tejun Heo wrote:
> Migration disabled tasks are special and pinned to their previous CPUs. They
> tripped up some unsuspecting BPF schedulers as their ->nr_cpus_allowed may
> not agree with the bits set in ->cpus_ptr. Make it easier for BPF schedulers
> by automatically dispatching them to the pinned local DSQs by default. If a
> BPF scheduler wants to handle migration disabled tasks explicitly, it can
> set SCX_OPS_ENQ_MIGRATION_DISABLED.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -123,6 +123,19 @@ enum scx_ops_flags {
> SCX_OPS_SWITCH_PARTIAL = 1LLU << 3,
>
> /*
> + * A migration disabled task can only execute on its current CPU. By
> + * default, such tasks are automatically put on the CPU's local DSQ with
> + * the default slice on enqueue. If this ops flag is set, they also go
> + * through ops.enqueue().
> + *
> + * A migration disabled task never invokes ops.select_cpu() as it can
> + * only select the current CPU. Also, p->cpus_ptr will only contain its
> + * current CPU while p->nr_cpus_allowed keeps tracking p->user_cpus_ptr
> + * and thus may disagree with cpumask_weight(p->cpus_ptr).
> + */
> + SCX_OPS_ENQ_MIGRATION_DISABLED = 1LLU << 4,
> +
> + /*
> * CPU cgroup support flags
> */
> SCX_OPS_HAS_CGROUP_WEIGHT = 1LLU << 16, /* cpu.weight */
> @@ -130,6 +143,7 @@ enum scx_ops_flags {
> SCX_OPS_ALL_FLAGS = SCX_OPS_KEEP_BUILTIN_IDLE |
> SCX_OPS_ENQ_LAST |
> SCX_OPS_ENQ_EXITING |
> + SCX_OPS_ENQ_MIGRATION_DISABLED |
> SCX_OPS_SWITCH_PARTIAL |
> SCX_OPS_HAS_CGROUP_WEIGHT,
> };
> @@ -882,6 +896,7 @@ static bool scx_warned_zero_slice;
>
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
> +static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_migration_disabled);
> static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
> static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
>
> @@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
> unlikely(p->flags & PF_EXITING))
> goto local;
>
> + /* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
> + if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
> + is_migration_disabled(p))
> + goto local;
Maybe not in this patch set, but it'd be nice to have an event counter for
this, as skipping ops.enqueue() might introduce latency issues. Having a
feedback could help to determine if we need to enable
SCX_OPS_ENQ_MIGRATION_DISABLED in some schedulers.
I'm also a bit conflicted if the default should be on or off, we're
changing the previous behavior, but OTOH this is going to prevent some
potential breakage (due to the nr_cpus_allowed mismatch) and server
workload is going to benefit from this, so it seems that there are more
pros than cons at dispatching migration_disabled tasks directly by default.
And I also did a quick test with this and it seems good, so:
Acked-by: Andrea Righi <arighi@nvidia.com>
-Andrea
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks
2025-02-07 22:36 ` [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Andrea Righi
@ 2025-02-07 22:44 ` Tejun Heo
2025-02-07 22:59 ` Andrea Righi
0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2025-02-07 22:44 UTC (permalink / raw)
To: Andrea Righi; +Cc: David Vernet, Changwoo Min, linux-kernel, sched-ext
On Fri, Feb 07, 2025 at 11:36:08PM +0100, Andrea Righi wrote:
> > @@ -2014,6 +2029,11 @@ static void do_enqueue_task(struct rq *r
> > unlikely(p->flags & PF_EXITING))
> > goto local;
> >
> > + /* see %SCX_OPS_ENQ_MIGRATION_DISABLED */
> > + if (!static_branch_unlikely(&scx_ops_enq_migration_disabled) &&
> > + is_migration_disabled(p))
> > + goto local;
>
> Maybe not in this patch set, but it'd be nice to have an event counter for
> this, as skipping ops.enqueue() might introduce latency issues. Having a
> feedback could help to determine if we need to enable
> SCX_OPS_ENQ_MIGRATION_DISABLED in some schedulers.
Yeah, this patch is headed for sched_ext/for-6.14-fixes and the counters are
in for-6.15, so that can come after this lands in for-6.14-fixes and that
gets pulled into for-6.15.
> I'm also a bit conflicted if the default should be on or off, we're
> changing the previous behavior, but OTOH this is going to prevent some
> potential breakage (due to the nr_cpus_allowed mismatch) and server
> workload is going to benefit from this, so it seems that there are more
> pros than cons at dispatching migration_disabled tasks directly by default.
Hmm.. I didn't see a lot of migrate_disable() while testing with stress-ng
and migrate_disable() isn't used that much in the kernel to begin with. Do
you happen to know where migrate_disable() was coming from when you saw them
with bpfland?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks
2025-02-07 22:44 ` Tejun Heo
@ 2025-02-07 22:59 ` Andrea Righi
0 siblings, 0 replies; 6+ messages in thread
From: Andrea Righi @ 2025-02-07 22:59 UTC (permalink / raw)
To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel, sched-ext
On Fri, Feb 07, 2025 at 12:44:07PM -1000, Tejun Heo wrote:
...
> > I'm also a bit conflicted if the default should be on or off, we're
> > changing the previous behavior, but OTOH this is going to prevent some
> > potential breakage (due to the nr_cpus_allowed mismatch) and server
> > workload is going to benefit from this, so it seems that there are more
> > pros than cons at dispatching migration_disabled tasks directly by default.
>
> Hmm.. I didn't see a lot of migrate_disable() while testing with stress-ng
> and migrate_disable() isn't used that much in the kernel to begin with. Do
> you happen to know where migrate_disable() was coming from when you saw them
> with bpfland?
I also don't see a lot of migrate_disable() invocations, but if I do the
usual benchmark, measuring the fps while building the kernel, with the
migration_disabled tasks dispatched directly I see a significant
performance drop in the fps and the kernel build is faster, so it seems
something related to the IO? I'll investigate a bit...
-Andrea
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH sched_ext/for-6.14-fixes 2/2] sched_ext: Fix migration disabled handling in targeted dispatches
2025-02-07 20:59 ` [PATCH sched_ext/for-6.14-fixes 2/2] sched_ext: Fix migration disabled handling in targeted dispatches Tejun Heo
@ 2025-02-09 6:33 ` Tejun Heo
0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-02-09 6:33 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min; +Cc: linux-kernel, sched-ext
On Fri, Feb 07, 2025 at 10:59:06AM -1000, Tejun Heo wrote:
> A dispatch operation that can target a specific local DSQ -
> scx_bpf_dsq_move_to_local() or scx_bpf_dsq_move() - checks whether the task
> can be migrated to the target CPU using task_can_run_on_remote_rq(). If the
> task can't be migrated to the targeted CPU, it is bounced through a global
> DSQ.
>
> task_can_run_on_remote_rq() assumes that the task is on a CPU that's
> different from the targeted CPU but the callers doesn't uphold the
> assumption and may call the function when the task is already on the target
> CPU. When such task has migration disabled, task_can_run_on_remote_rq() ends
> up returning %false incorrectly unnecessarily bouncing the task to a global
> DSQ.
>
> Fix it by updating the callers to only call task_can_run_on_remote_rq() when
> the task is on a different CPU than the target CPU. As this is a bit subtle,
> for clarity and documentation:
>
> - Make task_can_run_on_remote_rq() trigger SCHED_WARN_ON() if the task is on
> the same CPU as the target CPU.
>
> - is_migration_disabled() test in task_can_run_on_remote_rq() cannot trigger
> if the task is on a different CPU than the target CPU as the preceding
> task_allowed_on_cpu() test should fail beforehand. Convert the test into
> SCHED_WARN_ON().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
> Fixes: 0366017e0973 ("sched_ext: Use task_can_run_on_remote_rq() test in dispatch_to_local_dsq()")
> Cc: stable@vger.kernel.org # v6.12+
Applied 1-2 to sched_ext/for-6.14-fixes.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-02-09 6:33 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-07 20:58 [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Tejun Heo
2025-02-07 20:59 ` [PATCH sched_ext/for-6.14-fixes 2/2] sched_ext: Fix migration disabled handling in targeted dispatches Tejun Heo
2025-02-09 6:33 ` Tejun Heo
2025-02-07 22:36 ` [PATCH sched_ext/for-6.14-fixes 1/2] sched_ext: Implement auto local dispatching of migration disabled tasks Andrea Righi
2025-02-07 22:44 ` Tejun Heo
2025-02-07 22:59 ` Andrea Righi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.