public inbox for stable@vger.kernel.org
* [PATCH 1/2] sched_ext: Don't kick CPUs running higher classes
  2026-01-24  9:20 [PATCH 0/2] SCX kick fixes from 6.19 Christian Loehle
@ 2026-01-24  9:20 ` Christian Loehle
  2026-01-28 14:18   ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Loehle @ 2026-01-24  9:20 UTC
  To: stable, tj; +Cc: arighi, void, sched-ext

From: Tejun Heo <tj@kernel.org>

commit a9c1fbbd6dadbaa38c157a07d5d11005460b86b9 upstream.

When a sched_ext scheduler tries to kick a CPU, the CPU may be running a
higher class task. sched_ext has no control over such CPUs. A sched_ext
scheduler couldn't have expected to get access to the CPU after kicking it
anyway. Skip kicking when the target CPU is running a higher class.

Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 31eda2a56920..3d53b2232937 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5164,18 +5164,23 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *pseqs)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct scx_rq *this_scx = &this_rq->scx;
+	const struct sched_class *cur_class;
 	bool should_wait = false;
 	unsigned long flags;
 
 	raw_spin_rq_lock_irqsave(rq, flags);
+	cur_class = rq->curr->sched_class;
 
 	/*
 	 * During CPU hotplug, a CPU may depend on kicking itself to make
-	 * forward progress. Allow kicking self regardless of online state.
+	 * forward progress. Allow kicking self regardless of online state. If
+	 * @cpu is running a higher class task, we have no control over @cpu.
+	 * Skip kicking.
 	 */
-	if (cpu_online(cpu) || cpu == cpu_of(this_rq)) {
+	if ((cpu_online(cpu) || cpu == cpu_of(this_rq)) &&
+	    !sched_class_above(cur_class, &ext_sched_class)) {
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_preempt)) {
-			if (rq->curr->sched_class == &ext_sched_class)
+			if (cur_class == &ext_sched_class)
 				rq->curr->scx.slice = 0;
 			cpumask_clear_cpu(cpu, this_scx->cpus_to_preempt);
 		}
-- 
2.34.1



* Re: [PATCH 1/2] sched_ext: Don't kick CPUs running higher classes
  2026-01-24  9:20 ` [PATCH 1/2] sched_ext: Don't kick CPUs running higher classes Christian Loehle
@ 2026-01-28 14:18   ` Greg KH
  0 siblings, 0 replies; 5+ messages in thread
From: Greg KH @ 2026-01-28 14:18 UTC
  To: Christian Loehle; +Cc: stable, tj, arighi, void, sched-ext

On Sat, Jan 24, 2026 at 09:20:42AM +0000, Christian Loehle wrote:
> From: Tejun Heo <tj@kernel.org>
> 
> commit a9c1fbbd6dadbaa38c157a07d5d11005460b86b9 upstream.
> 
> When a sched_ext scheduler tries to kick a CPU, the CPU may be running a
> higher class task. sched_ext has no control over such CPUs. A sched_ext
> scheduler couldn't have expected to get access to the CPU after kicking it
> anyway. Skip kicking when the target CPU is running a higher class.
> 
> Reviewed-by: Andrea Righi <arighi@nvidia.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/ext.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)

You did not sign off on these patches that you are forwarding on for us
to apply :(




* [PATCHv2 0/2] SCX kick fixes from 6.19
@ 2026-01-29  9:25 Christian Loehle
  2026-01-29  9:25 ` [PATCH 1/2] sched_ext: Don't kick CPUs running higher classes Christian Loehle
  2026-01-29  9:25 ` [PATCH 2/2] sched_ext: Fix SCX_KICK_WAIT to work reliably Christian Loehle
  0 siblings, 2 replies; 5+ messages in thread
From: Christian Loehle @ 2026-01-29  9:25 UTC
  To: stable, tj; +Cc: arighi, void, sched-ext, Christian Loehle

See https://lore.kernel.org/lkml/20251022205629.845930-1-tj@kernel.org/
These apply to linux-6.18.y
The issue also affects 6.12, but that requires a different backport.

v2:
- Add the sign-off, no other changes

Tejun Heo (2):
  sched_ext: Don't kick CPUs running higher classes
  sched_ext: Fix SCX_KICK_WAIT to work reliably

 kernel/sched/ext.c          | 57 ++++++++++++++++++++++---------------
 kernel/sched/ext_internal.h |  6 ++--
 2 files changed, 38 insertions(+), 25 deletions(-)

-- 
2.34.1



* [PATCH 1/2] sched_ext: Don't kick CPUs running higher classes
  2026-01-29  9:25 [PATCHv2 0/2] SCX kick fixes from 6.19 Christian Loehle
@ 2026-01-29  9:25 ` Christian Loehle
  2026-01-29  9:25 ` [PATCH 2/2] sched_ext: Fix SCX_KICK_WAIT to work reliably Christian Loehle
  1 sibling, 0 replies; 5+ messages in thread
From: Christian Loehle @ 2026-01-29  9:25 UTC
  To: stable, tj; +Cc: arighi, void, sched-ext, Christian Loehle

From: Tejun Heo <tj@kernel.org>

commit a9c1fbbd6dadbaa38c157a07d5d11005460b86b9 upstream.

When a sched_ext scheduler tries to kick a CPU, the CPU may be running a
higher class task. sched_ext has no control over such CPUs. A sched_ext
scheduler couldn't have expected to get access to the CPU after kicking it
anyway. Skip kicking when the target CPU is running a higher class.

Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 kernel/sched/ext.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 31eda2a56920..3d53b2232937 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5164,18 +5164,23 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *pseqs)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct scx_rq *this_scx = &this_rq->scx;
+	const struct sched_class *cur_class;
 	bool should_wait = false;
 	unsigned long flags;
 
 	raw_spin_rq_lock_irqsave(rq, flags);
+	cur_class = rq->curr->sched_class;
 
 	/*
 	 * During CPU hotplug, a CPU may depend on kicking itself to make
-	 * forward progress. Allow kicking self regardless of online state.
+	 * forward progress. Allow kicking self regardless of online state. If
+	 * @cpu is running a higher class task, we have no control over @cpu.
+	 * Skip kicking.
 	 */
-	if (cpu_online(cpu) || cpu == cpu_of(this_rq)) {
+	if ((cpu_online(cpu) || cpu == cpu_of(this_rq)) &&
+	    !sched_class_above(cur_class, &ext_sched_class)) {
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_preempt)) {
-			if (rq->curr->sched_class == &ext_sched_class)
+			if (cur_class == &ext_sched_class)
 				rq->curr->scx.slice = 0;
 			cpumask_clear_cpu(cpu, this_scx->cpus_to_preempt);
 		}
-- 
2.34.1
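
Editorial illustration (not part of the patch): the gate added to kick_one_cpu() can be modelled in plain userspace C. This is a minimal sketch under stated assumptions. Every model_* name is hypothetical, and the class ordering (stop, deadline, RT, fair, ext, idle) is assumed here for illustration rather than taken from this thread. It only shows the decision the patch describes: skip the kick when the target runs a class above ext, and zero the slice only when the target is currently running an ext task.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's sched_class ordering. A lower
 * value means a higher-priority class. This ordering is an assumption
 * made for the sketch; it is not kernel code.
 */
enum model_class {
	MODEL_STOP,
	MODEL_DL,
	MODEL_RT,
	MODEL_FAIR,
	MODEL_EXT,
	MODEL_IDLE,
};

/* Hypothetical per-CPU state: what the target CPU is running right now. */
struct model_cpu {
	enum model_class cur_class;
	bool online;
	unsigned long slice;
};

/* Models sched_class_above(): true if @a outranks @b. */
static bool model_class_above(enum model_class a, enum model_class b)
{
	return a < b;
}

/*
 * Models the patched condition in kick_one_cpu(). Returns true when the
 * target would be rescheduled, false when the kick is skipped.
 */
static bool model_kick(struct model_cpu *target, bool is_self, bool preempt)
{
	/* Offline CPUs are skipped unless the CPU is kicking itself. */
	if (!(target->online || is_self))
		return false;

	/* New in the patch: no control over CPUs running a higher class. */
	if (model_class_above(target->cur_class, MODEL_EXT))
		return false;

	/* Preemption clears the slice only if an ext task is running. */
	if (preempt && target->cur_class == MODEL_EXT)
		target->slice = 0;

	return true;
}
```

With this model, a kick aimed at a CPU running an RT task is dropped and leaves the slice untouched, while a kick aimed at a CPU running an ext or idle task still goes through.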



* [PATCH 2/2] sched_ext: Fix SCX_KICK_WAIT to work reliably
  2026-01-29  9:25 [PATCHv2 0/2] SCX kick fixes from 6.19 Christian Loehle
  2026-01-29  9:25 ` [PATCH 1/2] sched_ext: Don't kick CPUs running higher classes Christian Loehle
@ 2026-01-29  9:25 ` Christian Loehle
  1 sibling, 0 replies; 5+ messages in thread
From: Christian Loehle @ 2026-01-29  9:25 UTC
  To: stable, tj
  Cc: arighi, void, sched-ext, Wen-Fang Liu, Peter Zijlstra,
	Christian Loehle

From: Tejun Heo <tj@kernel.org>

commit a379fa1e2cae15d7422b4eead83a6366f2f445cb upstream.

SCX_KICK_WAIT is used to synchronously wait for the target CPU to complete
a reschedule and can be used to implement operations like core scheduling.

This used to be implemented by scx_next_task_picked() incrementing pnt_seq,
which was always called when a CPU picks the next task to run, allowing
SCX_KICK_WAIT to reliably wait for the target CPU to enter the scheduler and
pick the next task.

However, commit b999e365c298 ("sched_ext: Replace scx_next_task_picked()
with switch_class()") replaced scx_next_task_picked() with the
switch_class() callback, which is only called when switching between sched
classes. This broke SCX_KICK_WAIT because pnt_seq would no longer be
reliably incremented unless the previous task was SCX and the next task was
not.

This fix leverages commit 4c95380701f5 ("sched/ext: Fold balance_scx() into
pick_task_scx()") which refactored the pick path making put_prev_task_scx()
the natural place to track task switches for SCX_KICK_WAIT. The fix moves
pnt_seq increment to put_prev_task_scx() and also increments it in
pick_task_scx() to handle cases where the same task is re-selected, whether
by BPF scheduler decision or slice refill. The semantics: If the current
task on the target CPU is SCX, SCX_KICK_WAIT waits until the CPU enters the
scheduling path. This provides sufficient guarantee for use cases like core
scheduling while keeping the operation self-contained within SCX.

v2: - Also increment pnt_seq in pick_task_scx() to handle same-task
      re-selection (Andrea Righi).
    - Use smp_cond_load_acquire() for the busy-wait loop for better
      architecture optimization (Peter Zijlstra).

Reported-by: Wen-Fang Liu <liuwenfang@honor.com>
Link: http://lkml.kernel.org/r/228ebd9e6ed3437996dffe15735a9caa@honor.com
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 kernel/sched/ext.c          | 46 +++++++++++++++++++++----------------
 kernel/sched/ext_internal.h |  6 +++--
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 3d53b2232937..2ff7034841c7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2306,12 +2306,6 @@ static void switch_class(struct rq *rq, struct task_struct *next)
 	struct scx_sched *sch = scx_root;
 	const struct sched_class *next_class = next->sched_class;
 
-	/*
-	 * Pairs with the smp_load_acquire() issued by a CPU in
-	 * kick_cpus_irq_workfn() who is waiting for this CPU to perform a
-	 * resched.
-	 */
-	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
 	if (!(sch->ops.flags & SCX_OPS_HAS_CPU_PREEMPT))
 		return;
 
@@ -2351,6 +2345,10 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 			      struct task_struct *next)
 {
 	struct scx_sched *sch = scx_root;
+
+	/* see kick_cpus_irq_workfn() */
+	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
+
 	update_curr_scx(rq);
 
 	/* see dequeue_task_scx() on why we skip when !QUEUED */
@@ -2404,6 +2402,9 @@ static struct task_struct *pick_task_scx(struct rq *rq)
 	bool keep_prev = rq->scx.flags & SCX_RQ_BAL_KEEP;
 	bool kick_idle = false;
 
+	/* see kick_cpus_irq_workfn() */
+	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
+
 	/*
 	 * WORKAROUND:
 	 *
@@ -5186,8 +5187,12 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *pseqs)
 		}
 
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_wait)) {
-			pseqs[cpu] = rq->scx.pnt_seq;
-			should_wait = true;
+			if (cur_class == &ext_sched_class) {
+				pseqs[cpu] = rq->scx.pnt_seq;
+				should_wait = true;
+			} else {
+				cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
+			}
 		}
 
 		resched_curr(rq);
@@ -5248,18 +5253,19 @@ static void kick_cpus_irq_workfn(struct irq_work *irq_work)
 	for_each_cpu(cpu, this_scx->cpus_to_wait) {
 		unsigned long *wait_pnt_seq = &cpu_rq(cpu)->scx.pnt_seq;
 
-		if (cpu != cpu_of(this_rq)) {
-			/*
-			 * Pairs with smp_store_release() issued by this CPU in
-			 * switch_class() on the resched path.
-			 *
-			 * We busy-wait here to guarantee that no other task can
-			 * be scheduled on our core before the target CPU has
-			 * entered the resched path.
-			 */
-			while (smp_load_acquire(wait_pnt_seq) == pseqs[cpu])
-				cpu_relax();
-		}
+		/*
+		 * Busy-wait until the task running at the time of kicking is no
+		 * longer running. This can be used to implement e.g. core
+		 * scheduling.
+		 *
+		 * smp_cond_load_acquire() pairs with store_releases in
+		 * pick_task_scx() and put_prev_task_scx(). The former breaks
+		 * the wait if SCX's scheduling path is entered even if the same
+		 * task is picked subsequently. The latter is necessary to break
+		 * the wait when $cpu is taken by a higher sched class.
+		 */
+		if (cpu != cpu_of(this_rq))
+			smp_cond_load_acquire(wait_pnt_seq, VAL != pseqs[cpu]);
 
 		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
 	}
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index b3617abed510..601cfae8cc76 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -986,8 +986,10 @@ enum scx_kick_flags {
 	SCX_KICK_PREEMPT	= 1LLU << 1,
 
 	/*
-	 * Wait for the CPU to be rescheduled. The scx_bpf_kick_cpu() call will
-	 * return after the target CPU finishes picking the next task.
+	 * The scx_bpf_kick_cpu() call will return after the current SCX task of
+	 * the target CPU switches out. This can be used to implement e.g. core
+	 * scheduling. This has no effect if the current task on the target CPU
+	 * is not on SCX.
 	 */
 	SCX_KICK_WAIT		= 1LLU << 2,
 };
-- 
2.34.1
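
Editorial illustration (not part of the patch): the pnt_seq handshake described in the commit message can be sketched with C11 atomics in userspace. This is a rough model under stated assumptions, not the kernel implementation. The model_* names are hypothetical, and C11 release/acquire operations stand in for smp_store_release() and smp_cond_load_acquire(). The kicker snapshots the sequence number only if the target is running an SCX task. The target bumps the number whenever it enters put_prev_task_scx() or pick_task_scx(), and the kicker spins until the number differs from its snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU runqueue state relevant to SCX_KICK_WAIT. */
struct model_rq {
	atomic_ulong pnt_seq;	/* bumped on entry to the SCX pick/put paths */
	bool curr_is_scx;	/* is the current task an SCX task? */
};

/*
 * Target side: models the store-release added to put_prev_task_scx() and
 * pick_task_scx(). Only the owning CPU writes, so a relaxed load followed
 * by a release store is enough in this model.
 */
static void model_bump_seq(struct model_rq *rq)
{
	unsigned long seq =
		atomic_load_explicit(&rq->pnt_seq, memory_order_relaxed);

	atomic_store_explicit(&rq->pnt_seq, seq + 1, memory_order_release);
}

/*
 * Kicker side: models the cpus_to_wait branch of kick_one_cpu(). Returns
 * true and records a snapshot only when the target runs an SCX task;
 * otherwise the wait is dropped, matching "no effect if the current task
 * on the target CPU is not on SCX".
 */
static bool model_arm_wait(struct model_rq *rq, unsigned long *snap)
{
	if (!rq->curr_is_scx)
		return false;

	*snap = atomic_load_explicit(&rq->pnt_seq, memory_order_relaxed);
	return true;
}

/* True once the target has entered the scheduling path since the snapshot. */
static bool model_wait_done(struct model_rq *rq, unsigned long snap)
{
	return atomic_load_explicit(&rq->pnt_seq, memory_order_acquire) != snap;
}

/* Busy-wait, standing in for smp_cond_load_acquire(seq, VAL != snap). */
static void model_wait(struct model_rq *rq, unsigned long snap)
{
	while (!model_wait_done(rq, snap))
		;
}
```

Because the counter is bumped on entry to the pick path rather than on a class switch, the wait in this model ends even when the same task is selected again, which is the case the v2 note in the commit message calls out.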


