* [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
@ 2026-05-13 13:01 Andrea Righi
  2026-05-14  5:02 ` sashiko-bot
  2026-05-15  9:18 ` Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree gregkh
  0 siblings, 2 replies; 4+ messages in thread
From: Andrea Righi @ 2026-05-13 13:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo, David Vernet, Changwoo Min
  Cc: Chris Mason, Peter Schneider, sched-ext, stable, linux-kernel

From: Tejun Heo <tj@kernel.org>

commit da2d81b4118a74e65d2335e221a38d665902a98c upstream.

bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
migrating them - task_cpu() only updates when the donee later consumes the
task via move_remote_task_to_local_dsq(). If the LB timer fires again before
consumption and the new DSQ becomes a donor, @p is still on the previous CPU
and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.

Skip such tasks.

Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
[ arighi: replace donor_rq with rq, not present in v7.0.y ]
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 89814646a9868..ddd7c19daa17c 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4008,6 +4008,15 @@ static u32 bypass_lb_cpu(struct scx_sched *sch, struct rq *rq,
 		if (cpumask_empty(donee_mask))
 			break;
 
+		/*
+		 * If an earlier pass placed @p on @donor_dsq from a different
+		 * CPU and the donee hasn't consumed it yet, @p is still on the
+		 * previous CPU and task_rq(@p) != @rq. @p can't be moved
+		 * without its rq locked. Skip.
+		 */
+		if (task_rq(p) != rq)
+			continue;
+
 		donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
 		if (donee >= nr_cpu_ids)
 			continue;
-- 
2.54.0



* Re: [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
  2026-05-13 13:01 [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu() Andrea Righi
@ 2026-05-14  5:02 ` sashiko-bot
  2026-05-15  9:18 ` Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree gregkh
  1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-05-14  5:02 UTC (permalink / raw)
  To: Andrea Righi; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unbounded lock hold time with IRQs disabled when skipping tasks in `bypass_lb_cpu()`.
- [Low] Obsolete comment contradicts the new task_rq lock constraints.
--

commit 68327c3a383140f9157a96a24a7b23be2c135ac5
Author: Tejun Heo <tj@kernel.org>

sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()

This commit updates bypass_lb_cpu() to skip tasks whose task_rq() does not
match the donor rq. These tasks were placed on the donor DSQ from a different
CPU but haven't been consumed yet, preventing their migration without the
proper rq lock.

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c

[ ... ]

>  	while ((p = n)) {
>  		struct rq *donee_rq;

[ ... ]

> +		/*
> +		 * If an earlier pass placed @p on @donor_dsq from a different
> +		 * CPU and the donee hasn't consumed it yet, @p is still on the
> +		 * previous CPU and task_rq(@p) != @rq. @p can't be moved
> +		 * without its rq locked. Skip.
> +		 */
> +		if (task_rq(p) != rq)
> +			continue;

Because this uses continue, does it bypass the lock-yielding check at the
bottom of the loop (if (!(nr_balanced % SCX_BYPASS_LB_BATCH)))?

If there is a large batch of unconsumed tasks left over from previous passes,
could skipping them without incrementing nr_balanced result in an unbounded
loop?

Traversing a long list here while holding the rq lock and donor_dsq lock
with IRQs disabled could potentially cause latency spikes or trigger the
soft lockup watchdog.

>  		donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
>  		if (donee >= nr_cpu_ids)
>  			continue;
>  
>  		donee_rq = cpu_rq(donee);
>  		donee_dsq = &donee_rq->scx.bypass_dsq;
>  
>  		/*
>  		 * $p's rq is not locked but $p's DSQ lock protects its
>  		 * scheduling properties making this test safe.
>  		 */

This isn't a bug, but is this comment still accurate?

Since the function initially acquires raw_spin_rq_lock_irq(rq) and the
newly added check guarantees task_rq(p) == rq, it appears the task's rq is
actually locked when it reaches this point.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260513130111.689740-1-arighi@nvidia.com?part=1


* Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree
  2026-05-13 13:01 [PATCH 7.0.y] sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu() Andrea Righi
  2026-05-14  5:02 ` sashiko-bot
@ 2026-05-15  9:18 ` gregkh
  2026-05-15  9:39   ` sashiko-bot
  1 sibling, 1 reply; 4+ messages in thread
From: gregkh @ 2026-05-15  9:18 UTC (permalink / raw)
  To: arighi, changwoo, clm, gregkh, pschneider1968, sched-ext, tj,
	void
  Cc: stable-commits


This is a note to let you know that I've just added the patch titled

    sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()

to the 7.0-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch
and it can be found in the queue-7.0 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From arighi@nvidia.com Wed May 13 15:01:26 2026
From: Andrea Righi <arighi@nvidia.com>
Date: Wed, 13 May 2026 15:01:11 +0200
Subject: sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>, Changwoo Min <changwoo@igalia.com>
Cc: Chris Mason <clm@meta.com>, Peter Schneider <pschneider1968@googlemail.com>, sched-ext@lists.linux.dev, stable@vger.kernel.org, linux-kernel@vger.kernel.org
Message-ID: <20260513130111.689740-1-arighi@nvidia.com>

From: Tejun Heo <tj@kernel.org>

commit da2d81b4118a74e65d2335e221a38d665902a98c upstream.

bypass_lb_cpu() transfers tasks between per-CPU bypass DSQs without
migrating them - task_cpu() only updates when the donee later consumes the
task via move_remote_task_to_local_dsq(). If the LB timer fires again before
consumption and the new DSQ becomes a donor, @p is still on the previous CPU
and task_rq(@p) != donor_rq. @p can't be moved without its own rq locked.

Skip such tasks.

Fixes: 95d1df610cdc ("sched_ext: Implement load balancer for bypass mode")
Cc: stable@vger.kernel.org # v6.19+
Reported-by: Chris Mason <clm@meta.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
[ arighi: replace donor_rq with rq, not present in v7.0.y ]
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/sched/ext.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4010,6 +4010,15 @@ resume:
 		if (cpumask_empty(donee_mask))
 			break;
 
+		/*
+		 * If an earlier pass placed @p on @donor_dsq from a different
+		 * CPU and the donee hasn't consumed it yet, @p is still on the
+		 * previous CPU and task_rq(@p) != @rq. @p can't be moved
+		 * without its rq locked. Skip.
+		 */
+		if (task_rq(p) != rq)
+			continue;
+
 		donee = cpumask_any_and_distribute(donee_mask, p->cpus_ptr);
 		if (donee >= nr_cpu_ids)
 			continue;


Patches currently in stable-queue which might be from arighi@nvidia.com are

queue-7.0/sched_ext-use-hk_type_domain_boot-to-detect-isolcpus-domain-isolation.patch
queue-7.0/sched_ext-skip-tasks-with-stale-task_rq-in-bypass_lb_cpu.patch


* Re: Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree
  2026-05-15  9:18 ` Patch "sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()" has been added to the 7.0-stable tree gregkh
@ 2026-05-15  9:39   ` sashiko-bot
  0 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-05-15  9:39 UTC (permalink / raw)
  To: gregkh; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Unbounded iteration with interrupts disabled in `bypass_lb_cpu()` due to skipping tasks without yielding.
--

commit a8567d453e072dcdd0102db7ef988700bdb7dc40
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu()

This commit updates the bypass_lb_cpu() load balancer to skip tasks that
have a stale task_rq. This prevents moving tasks between per-CPU bypass
dispatch queues without having the correct runqueue lock held.

> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
[ ... ]
> @@ -4005,6 +4005,15 @@ static u32 bypass_lb_cpu(struct scx_sched *sch, struct rq *rq,
>  		if (cpumask_empty(donee_mask))
>  			break;
>  
> +		/*
> +		 * If an earlier pass placed @p on @donor_dsq from a different
> +		 * CPU and the donee hasn't consumed it yet, @p is still on the
> +		 * previous CPU and task_rq(@p) != @rq. @p can't be moved
> +		 * without its rq locked. Skip.
> +		 */
> +		if (task_rq(p) != rq)
> +			continue;

Does this regression lead to unbounded iteration with local interrupts
disabled?

Looking at bypass_lb_cpu(), it iterates over tasks in donor_dsq while holding
rq->lock and donor_dsq->lock. The function includes a batching mechanism to
drop these locks and prevent CPU stalls:

    nr_balanced++;
    if (!(nr_balanced % SCX_BYPASS_LB_BATCH) && n) {
        list_move_tail(&cursor.node, &n->scx.dsq_list.node);
        raw_spin_unlock(&donor_dsq->lock);
        raw_spin_rq_unlock_irq(rq);
        cpu_relax();
        ...

When tasks are skipped via the new continue statement, nr_balanced is not
incremented.

If the bypass dispatch queue contains a massive number of tasks that were
recently load-balanced but not yet consumed, they will all have a stale
task_rq.

Since they are all skipped, the loop might process an immense queue without
ever incrementing nr_balanced or triggering the lock yield.

Would it be safer to use a separate loop iteration counter rather than the
number of successfully balanced tasks to trigger the yield?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/2026051520-oak-marlin-d109@gregkh?part=1

