* [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
@ 2026-05-13 13:01 Andrea Righi
2026-05-14 5:02 ` sashiko-bot
2026-05-15 9:18 ` Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree gregkh
0 siblings, 2 replies; 4+ messages in thread
From: Andrea Righi @ 2026-05-13 13:01 UTC (permalink / raw)
To: Greg Kroah-Hartman, Tejun Heo, David Vernet, Changwoo Min
Cc: Chris Mason, Peter Schneider, sched-ext, stable, linux-kernel
From: Tejun Heo <tj@kernel.org>
commit da2d81b4118a74e65d2335e221a38d665902a98c upstream.
bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
migrating them - task_cpu() only updates when the donee later consumes the
task via move_remote_task_to_local_dsq(). If the LB timer fires again before
consumption and the new DSQ becomes a donor, @p is still on the previous CPU
and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.
Skip such tasks.
Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
[ arighi: replace donor_rq with rq, not present in v7.0.y ]
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 89814646a9868..ddd7c19daa17c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4008,6 +4008,15 @@ static u32 bypass_lb_cpu(struct scx_sched *sch, struct rq *rq,
if (cpumask_empty(donee_mask))
break;
+ /*
+ * If an earlier pass placed @p on @donor_dsq from a different
+ * CPU and the donee hasn't consumed it yet, @p is still on the
+ * previous CPU and task_rq(@p) != @rq. @p can't be moved
+ * without its rq locked. Skip.
+ */
+ if (task_rq(p) != rq)
+ continue;
+
donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
if (donee >= nr_cpu_ids)
continue;
--
2.54.0
* Re: [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
2026-05-13 13:01 [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu() Andrea Righi
@ 2026-05-14 5:02 ` sashiko-bot
2026-05-15 9:18 ` Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree gregkh
1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-05-14 5:02 UTC (permalink / raw)
To: Andrea Righi; +Cc: sched-ext
Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [High] Unbounded lock hold time with IRQs disabled when skipping tasks in `bypass_lb_cpu()`.
- [Low] Obsolete comment contradicts the new task_rq lock constraints.
--
commit 68327c3a383140f9157a96a24a7b23be2c135ac5
Author: Tejun Heo <tj@kernel.org>
sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
This commit updates bypass_lb_cpu() to skip tasks whose task_rq() does not
match the donor rq. These tasks were placed on the donor DSQ from a different
CPU but haven't been consumed yet, preventing their migration without the
proper rq lock.
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> while ((p = n)) {
> struct rq *donee_rq;
[ ... ]
> + /*
> + * If an earlier pass placed @p on @donor_dsq from a different
> + * CPU and the donee hasn't consumed it yet, @p is still on the
> + * previous CPU and task_rq(@p) != @rq. @p can't be moved
> + * without its rq locked. Skip.
> + */
> + if (task_rq(p) != rq)
> + continue;
Because this uses continue, does it bypass the lock-yielding check at the
bottom of the loop (if (!(nr_balanced % SCX_BYPASS_LB_BATCH)))?
If there is a large batch of unconsumed tasks left over from previous passes,
could skipping them without incrementing nr_balanced result in an unbounded
loop?
Traversing a long list here while holding the rq lock and donor_dsq lock
with IRQs disabled could potentially cause latency spikes or trigger the
soft lockup watchdog.
> donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
> if (donee >= nr_cpu_ids)
> continue;
>
> donee_rq = cpu_rq(donee);
> donee_dsq = &donee_rq->scx.bypass_dsq;
>
> /*
> * $p's rq is not locked but $p's DSQ lock protects its
> * scheduling properties making this test safe.
> */
This isn't a bug, but is this comment still accurate?
Since the function initially acquires raw_spin_rq_lock_irq(rq) and the
newly added check guarantees task_rq(p) == rq, it appears the task's rq is
actually locked when it reaches this point.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260513130111.689740-1-arighi@nvidia.com?part=1
* Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree
2026-05-13 13:01 [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu() Andrea Righi
2026-05-14 5:02 ` sashiko-bot
@ 2026-05-15 9:18 ` gregkh
2026-05-15 9:39 ` sashiko-bot
1 sibling, 1 reply; 4+ messages in thread
From: gregkh @ 2026-05-15 9:18 UTC (permalink / raw)
To: arighi, changwoo, clm, gregkh, pschneider1968, sched-ext, tj,
void
Cc: stable-commits
This is a note to let you know that I've just added the patch titled
sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
to the 7.0-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
and it can be found in the queue-7.0 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From arighi@nvidia.com Wed May 13 15:01:26 2026
From: Andrea Righi <arighi@nvidia.com>
Date: Wed, 13 May 2026 15:01:11 +0200
Subject: sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>, Changwoo Min <changwoo@igalia.com>
Cc: Chris Mason <clm@meta.com>, Peter Schneider <pschneider1968@googlemail.com>, sched-ext@lists.linux.dev, stable@vger.kernel.org, linux-kernel@vger.kernel.org
Message-ID: <20260513130111.689740-1-arighi@nvidia.com>
From: Tejun Heo <tj@kernel.org>
commit da2d81b4118a74e65d2335e221a38d665902a98c upstream.
bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
migrating them - task_cpu() only updates when the donee later consumes the
task via move_remote_task_to_local_dsq(). If the LB timer fires again before
consumption and the new DSQ becomes a donor, @p is still on the previous CPU
and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.
Skip such tasks.
Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
[ arighi: replace donor_rq with rq, not present in v7.0.y ]
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/sched/ext.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4010,6 +4010,15 @@ resume:
if (cpumask_empty(donee_mask))
break;
+ /*
+ * If an earlier pass placed @p on @donor_dsq from a different
+ * CPU and the donee hasn't consumed it yet, @p is still on the
+ * previous CPU and task_rq(@p) != @rq. @p can't be moved
+ * without its rq locked. Skip.
+ */
+ if (task_rq(p) != rq)
+ continue;
+
donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
if (donee >= nr_cpu_ids)
continue;
Patches currently in stable-queue which might be from arighi@nvidia.com are
queue-7.0/sched_ext-use-hk_type_domain_boot-to-detect-isolcpus-domain-isolation.patch
queue-7.0/sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
* Re: Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree
2026-05-15 9:18 ` Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree gregkh
@ 2026-05-15 9:39 ` sashiko-bot
0 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-05-15 9:39 UTC (permalink / raw)
To: gregkh; +Cc: sched-ext
Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [High] Unbounded iteration with interrupts disabled in `bypass_lb_cpu()` due to skipping tasks without yielding.
--
commit a8567d453e072dcdd0102db7ef988700bdb7dc40
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
This commit updates the bypass_lb_cpu() load balancer to skip tasks that
have a stale task_rq. This prevents moving tasks between per-CPU bypass
dispatch queues without having the correct runqueue lock held.
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> @@ -4005,6 +4005,15 @@ static u32 bypass_lb_cpu(struct scx_sched *sch, struct rq *rq,
> if (cpumask_empty(donee_mask))
> break;
>
> + /*
> + * If an earlier pass placed @p on @donor_dsq from a different
> + * CPU and the donee hasn't consumed it yet, @p is still on the
> + * previous CPU and task_rq(@p) != @rq. @p can't be moved
> + * without its rq locked. Skip.
> + */
> + if (task_rq(p) != rq)
> + continue;
Does this regression lead to unbounded iteration with local interrupts
disabled?
Looking at bypass_lb_cpu(), it iterates over tasks in donor_dsq while holding
rq->lock and donor_dsq->lock. The function includes a batching mechanism to
drop these locks and prevent CPU stalls:
nr_balanced++;
if (!(nr_balanced % SCX_BYPASS_LB_BATCH) && n) {
list_move_tail(&cursor.node, &n->scx.dsq_list.node);
raw_spin_unlock(&donor_dsq->lock);
raw_spin_rq_unlock_irq(rq);
cpu_relax();
...
When tasks are skipped via the new continue statement, nr_balanced is not
incremented.
If the bypass dispatch queue contains a massive number of tasks that were
recently load-balanced but not yet consumed, they will all have a stale
task_rq.
Since they are all skipped, the loop might process an immense queue without
ever incrementing nr_balanced or triggering the lock yield.
Would it be safer to use a separate loop iteration counter rather than the
number of successfully balanced tasks to trigger the yield?
--
Sashiko AI review · https://sashiko.dev/#/patchset/2026051520-oak-marlin-d109@gregkh?part=1