[PATCHSET RESEND v2 sched_ext/for-6.19] sched_ext: Fix SCX_KICK

public inbox for linux-kernel@vger.kernel.org

* [PATCHSET RESEND v2 sched_ext/for-6.19] sched_ext: Fix SCX_KICK_WAIT reliability
@ 2025-10-22 20:56 Tejun Heo
From: Tejun Heo @ 2025-10-22 20:56 UTC (permalink / raw)
  To: void, arighi, changwoo; +Cc: linux-kernel, sched-ext, peterz

Resending because the original v2 posting didn't include the full recipient
list on the individual patches due to a git send-email invocation error. Sorry
about the noise.

SCX_KICK_WAIT is used to synchronously wait for the target CPU to complete
a reschedule and can be used to implement operations like core scheduling.
However, recent scheduler refactorings broke its reliability. This series
fixes the issue and improves the code clarity.

v2: - In patch #2, also increment pnt_seq in pick_task_scx() to handle
      same-task re-selection (Andrea Righi).
    - In patch #2, use smp_cond_load_acquire() for the busy-wait loop for
      better architecture optimization (Peter Zijlstra).
    - Added patch #3 to rename pnt_seq to kick_sync for clarity.

v1: http://lkml.kernel.org/r/20251021210354.89570-1-tj@kernel.org

Based on sched_ext/for-6.19 (2dbbdeda77a6).

 1 sched_ext: Don't kick CPUs running higher classes
 2 sched_ext: Fix SCX_KICK_WAIT to work reliably
 3 sched_ext: Rename pnt_seq to kick_sync

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-fix-kick_wait

 kernel/sched/ext.c          | 129 ++++++++++++++++++++++++--------------------
 kernel/sched/ext_internal.h |   6 ++-
 kernel/sched/sched.h        |   2 +-
 3 files changed, 75 insertions(+), 62 deletions(-)

--
tejun

* [PATCH RESEND v2 1/3] sched_ext: Don't kick CPUs running higher classes
@ 2025-10-22 20:56 Tejun Heo
From: Tejun Heo @ 2025-10-22 20:56 UTC (permalink / raw)
  To: void, arighi, changwoo; +Cc: linux-kernel, sched-ext, peterz, Tejun Heo

When a sched_ext scheduler tries to kick a CPU, the CPU may be running a
task of a higher sched class. sched_ext has no control over such CPUs, so a
sched_ext scheduler couldn't have expected to gain access to the CPU by
kicking it anyway. Skip kicking when the target CPU is running a higher
class task.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index dc86ce0be32a..7db43a14a6fc 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5122,18 +5122,23 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *pseqs)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct scx_rq *this_scx = &this_rq->scx;
+	const struct sched_class *cur_class;
 	bool should_wait = false;
 	unsigned long flags;
 
 	raw_spin_rq_lock_irqsave(rq, flags);
+	cur_class = rq->curr->sched_class;
 
 	/*
 	 * During CPU hotplug, a CPU may depend on kicking itself to make
-	 * forward progress. Allow kicking self regardless of online state.
+	 * forward progress. Allow kicking self regardless of online state. If
+	 * @cpu is running a higher class task, we have no control over @cpu.
+	 * Skip kicking.
 	 */
-	if (cpu_online(cpu) || cpu == cpu_of(this_rq)) {
+	if ((cpu_online(cpu) || cpu == cpu_of(this_rq)) &&
+	    !sched_class_above(cur_class, &ext_sched_class)) {
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_preempt)) {
-			if (rq->curr->sched_class == &ext_sched_class)
+			if (cur_class == &ext_sched_class)
 				rq->curr->scx.slice = 0;
 			cpumask_clear_cpu(cpu, this_scx->cpus_to_preempt);
 		}
-- 
2.47.1


* [PATCH RESEND v2 2/3] sched_ext: Fix SCX_KICK_WAIT to work reliably
@ 2025-10-22 20:56 Tejun Heo
From: Tejun Heo @ 2025-10-22 20:56 UTC (permalink / raw)
  To: void, arighi, changwoo
  Cc: linux-kernel, sched-ext, peterz, Tejun Heo, Wen-Fang Liu

SCX_KICK_WAIT is used to synchronously wait for the target CPU to complete
a reschedule and can be used to implement operations like core scheduling.

This used to be implemented by having scx_next_task_picked(), which was
always called when a CPU picked the next task to run, increment pnt_seq.
This allowed SCX_KICK_WAIT to reliably wait for the target CPU to enter the
scheduler and pick the next task.

However, commit b999e365c298 ("sched_ext: Replace scx_next_task_picked()
with switch_class()") replaced scx_next_task_picked() with the
switch_class() callback, which is only called when switching between sched
classes. This broke SCX_KICK_WAIT because pnt_seq would no longer be
reliably incremented unless the previous task was SCX and the next task was
not.
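
To see the gap, it can help to count how many times the counter moves during
one reschedule of a CPU whose current task is SCX. The following is a
hypothetical model (enum model_next, model_old_bumps() and model_new_bumps()
are illustrative names, not kernel APIs) that encodes only the call rules
stated in this message: switch_class() runs only when the next task belongs
to a different class, while the fix described below bumps the counter in
pick_task_scx() whenever SCX's pick path runs and in put_prev_task_scx()
whenever the SCX task is switched out. It also assumes that when a
higher-class task takes the CPU, SCX's pick path is not reached.

```c
#include <assert.h>

/* Hypothetical model: what replaces the current SCX task on a reschedule. */
enum model_next {
	NEXT_HIGHER_CLASS,	/* e.g. an RT task takes the CPU */
	NEXT_OTHER_SCX,		/* a different SCX task is picked */
	NEXT_SAME_SCX,		/* the same SCX task is re-selected */
};

/* Before the fix: only ext's switch_class() bumped pnt_seq, and
 * switch_class() runs only when the next task is of a different class. */
static unsigned long model_old_bumps(enum model_next next)
{
	assert(next <= NEXT_SAME_SCX);
	return next == NEXT_HIGHER_CLASS ? 1 : 0;
}

/* After the fix: pick_task_scx() bumps whenever SCX's pick path runs, and
 * put_prev_task_scx() bumps whenever the SCX task is switched out. */
static unsigned long model_new_bumps(enum model_next next)
{
	unsigned long n = 0;

	assert(next <= NEXT_SAME_SCX);
	if (next != NEXT_HIGHER_CLASS)
		n++;	/* pick_task_scx() */
	if (next != NEXT_SAME_SCX)
		n++;	/* put_prev_task_scx() */
	return n;
}
```

A waiter breaks out only when the count is nonzero, so under the old rules a
switch to another SCX task, or re-selection of the same one, leaves it
spinning.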

This fix leverages commit 4c95380701f5 ("sched/ext: Fold balance_scx() into
pick_task_scx()"), which refactored the pick path and made
put_prev_task_scx() the natural place to track task switches for
SCX_KICK_WAIT. The fix moves the pnt_seq increment to put_prev_task_scx()
and also increments it in pick_task_scx() to handle cases where the same
task is re-selected, whether by BPF scheduler decision or slice refill. The
resulting semantics are: if the current task on the target CPU is SCX,
SCX_KICK_WAIT waits until the CPU enters the scheduling path. This provides
a sufficient guarantee for use cases like core scheduling while keeping the
operation self-contained within SCX.

v2: - Also increment pnt_seq in pick_task_scx() to handle same-task
      re-selection (Andrea Righi).
    - Use smp_cond_load_acquire() for the busy-wait loop for better
      architecture optimization (Peter Zijlstra).

Reported-by: Wen-Fang Liu <liuwenfang@honor.com>
Link: http://lkml.kernel.org/r/228ebd9e6ed3437996dffe15735a9caa@honor.com
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c          | 46 +++++++++++++++++++++----------------
 kernel/sched/ext_internal.h |  6 +++--
 2 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 7db43a14a6fc..3f87f3d31ccd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2260,12 +2260,6 @@ static void switch_class(struct rq *rq, struct task_struct *next)
 	struct scx_sched *sch = scx_root;
 	const struct sched_class *next_class = next->sched_class;
 
-	/*
-	 * Pairs with the smp_load_acquire() issued by a CPU in
-	 * kick_cpus_irq_workfn() who is waiting for this CPU to perform a
-	 * resched.
-	 */
-	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
 	if (!(sch->ops.flags & SCX_OPS_HAS_CPU_PREEMPT))
 		return;
 
@@ -2305,6 +2299,10 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 			      struct task_struct *next)
 {
 	struct scx_sched *sch = scx_root;
+
+	/* see kick_cpus_irq_workfn() */
+	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
+
 	update_curr_scx(rq);
 
 	/* see dequeue_task_scx() on why we skip when !QUEUED */
@@ -2358,6 +2356,9 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf)
 	bool keep_prev, kick_idle = false;
 	struct task_struct *p;
 
+	/* see kick_cpus_irq_workfn() */
+	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
+
 	rq_modified_clear(rq);
 
 	rq_unpin_lock(rq, rf);
@@ -5144,8 +5145,12 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *pseqs)
 		}
 
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_wait)) {
-			pseqs[cpu] = rq->scx.pnt_seq;
-			should_wait = true;
+			if (cur_class == &ext_sched_class) {
+				pseqs[cpu] = rq->scx.pnt_seq;
+				should_wait = true;
+			} else {
+				cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
+			}
 		}
 
 		resched_curr(rq);
@@ -5206,18 +5211,19 @@ static void kick_cpus_irq_workfn(struct irq_work *irq_work)
 	for_each_cpu(cpu, this_scx->cpus_to_wait) {
 		unsigned long *wait_pnt_seq = &cpu_rq(cpu)->scx.pnt_seq;
 
-		if (cpu != cpu_of(this_rq)) {
-			/*
-			 * Pairs with smp_store_release() issued by this CPU in
-			 * switch_class() on the resched path.
-			 *
-			 * We busy-wait here to guarantee that no other task can
-			 * be scheduled on our core before the target CPU has
-			 * entered the resched path.
-			 */
-			while (smp_load_acquire(wait_pnt_seq) == pseqs[cpu])
-				cpu_relax();
-		}
+		/*
+		 * Busy-wait until the task running at the time of kicking is no
+		 * longer running. This can be used to implement e.g. core
+		 * scheduling.
+		 *
+		 * smp_cond_load_acquire() pairs with store_releases in
+		 * pick_task_scx() and put_prev_task_scx(). The former breaks
+		 * the wait if SCX's scheduling path is entered even if the same
+		 * task is picked subsequently. The latter is necessary to break
+		 * the wait when $cpu is taken by a higher sched class.
+		 */
+		if (cpu != cpu_of(this_rq))
+			smp_cond_load_acquire(wait_pnt_seq, VAL != pseqs[cpu]);
 
 		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
 	}
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index 87e5e22bfade..21c0ccaf9c71 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -997,8 +997,10 @@ enum scx_kick_flags {
 	SCX_KICK_PREEMPT	= 1LLU << 1,
 
 	/*
-	 * Wait for the CPU to be rescheduled. The scx_bpf_kick_cpu() call will
-	 * return after the target CPU finishes picking the next task.
+	 * The scx_bpf_kick_cpu() call will return after the current SCX task of
+	 * the target CPU switches out. This can be used to implement e.g. core
+	 * scheduling. This has no effect if the current task on the target CPU
+	 * is not on SCX.
 	 */
 	SCX_KICK_WAIT		= 1LLU << 2,
 };
-- 
2.47.1


* [PATCH RESEND v2 3/3] sched_ext: Rename pnt_seq to kick_sync
@ 2025-10-22 20:56 Tejun Heo
From: Tejun Heo @ 2025-10-22 20:56 UTC (permalink / raw)
  To: void, arighi, changwoo; +Cc: linux-kernel, sched-ext, peterz, Tejun Heo

The pnt_seq field and related infrastructure were named after "pick next
task sequence", reflecting their original implementation in
scx_next_task_picked(). However, the sequence counter is now incremented in
both put_prev_task_scx() and pick_task_scx(), and its purpose is to
synchronize kick operations via SCX_KICK_WAIT rather than to track
pick_next_task events specifically.

Rename to better reflect the actual semantics:
- pnt_seq -> kick_sync
- scx_kick_pseqs -> scx_kick_syncs
- pseqs variables -> ksyncs
- Update comments to refer to "kick_sync sequence" instead of "pick_task
  sequence"

This is a pure renaming with no functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c   | 80 ++++++++++++++++++++++++++--------------------------
 kernel/sched/sched.h |  2 +-
 2 files changed, 41 insertions(+), 41 deletions(-)

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -68,18 +68,18 @@ static unsigned long scx_watchdog_timest
 static struct delayed_work scx_watchdog_work;
 
 /*
- * For %SCX_KICK_WAIT: Each CPU has a pointer to an array of pick_task sequence
+ * For %SCX_KICK_WAIT: Each CPU has a pointer to an array of kick_sync sequence
  * numbers. The arrays are allocated with kvzalloc() as size can exceed percpu
  * allocator limits on large machines. O(nr_cpu_ids^2) allocation, allocated
  * lazily when enabling and freed when disabling to avoid waste when sched_ext
  * isn't active.
  */
-struct scx_kick_pseqs {
+struct scx_kick_syncs {
 	struct rcu_head		rcu;
-	unsigned long		seqs[];
+	unsigned long		syncs[];
 };
 
-static DEFINE_PER_CPU(struct scx_kick_pseqs __rcu *, scx_kick_pseqs);
+static DEFINE_PER_CPU(struct scx_kick_syncs __rcu *, scx_kick_syncs);
 
 /*
  * Direct dispatch marker.
@@ -2301,7 +2301,7 @@ static void put_prev_task_scx(struct rq
 	struct scx_sched *sch = scx_root;
 
 	/* see kick_cpus_irq_workfn() */
-	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
+	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
 
 	update_curr_scx(rq);
 
@@ -2357,7 +2357,7 @@ do_pick_task_scx(struct rq *rq, struct r
 	struct task_struct *p;
 
 	/* see kick_cpus_irq_workfn() */
-	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
+	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
 
 	rq_modified_clear(rq);
 
@@ -3883,24 +3883,24 @@ static const char *scx_exit_reason(enum
 	}
 }
 
-static void free_kick_pseqs_rcu(struct rcu_head *rcu)
+static void free_kick_syncs_rcu(struct rcu_head *rcu)
 {
-	struct scx_kick_pseqs *pseqs = container_of(rcu, struct scx_kick_pseqs, rcu);
+	struct scx_kick_syncs *ksyncs = container_of(rcu, struct scx_kick_syncs, rcu);
 
-	kvfree(pseqs);
+	kvfree(ksyncs);
 }
 
-static void free_kick_pseqs(void)
+static void free_kick_syncs(void)
 {
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct scx_kick_pseqs **pseqs = per_cpu_ptr(&scx_kick_pseqs, cpu);
-		struct scx_kick_pseqs *to_free;
+		struct scx_kick_syncs **ksyncs = per_cpu_ptr(&scx_kick_syncs, cpu);
+		struct scx_kick_syncs *to_free;
 
-		to_free = rcu_replace_pointer(*pseqs, NULL, true);
+		to_free = rcu_replace_pointer(*ksyncs, NULL, true);
 		if (to_free)
-			call_rcu(&to_free->rcu, free_kick_pseqs_rcu);
+			call_rcu(&to_free->rcu, free_kick_syncs_rcu);
 	}
 }
 
@@ -4038,7 +4038,7 @@ static void scx_disable_workfn(struct kt
 	free_percpu(scx_dsp_ctx);
 	scx_dsp_ctx = NULL;
 	scx_dsp_max_batch = 0;
-	free_kick_pseqs();
+	free_kick_syncs();
 
 	mutex_unlock(&scx_enable_mutex);
 
@@ -4287,10 +4287,10 @@ static void scx_dump_state(struct scx_ex
 		seq_buf_init(&ns, buf, avail);
 
 		dump_newline(&ns);
-		dump_line(&ns, "CPU %-4d: nr_run=%u flags=0x%x cpu_rel=%d ops_qseq=%lu pnt_seq=%lu",
+		dump_line(&ns, "CPU %-4d: nr_run=%u flags=0x%x cpu_rel=%d ops_qseq=%lu ksync=%lu",
 			  cpu, rq->scx.nr_running, rq->scx.flags,
 			  rq->scx.cpu_released, rq->scx.ops_qseq,
-			  rq->scx.pnt_seq);
+			  rq->scx.kick_sync);
 		dump_line(&ns, "          curr=%s[%d] class=%ps",
 			  rq->curr->comm, rq->curr->pid,
 			  rq->curr->sched_class);
@@ -4401,7 +4401,7 @@ static void scx_vexit(struct scx_sched *
 	irq_work_queue(&sch->error_irq_work);
 }
 
-static int alloc_kick_pseqs(void)
+static int alloc_kick_syncs(void)
 {
 	int cpu;
 
@@ -4410,19 +4410,19 @@ static int alloc_kick_pseqs(void)
 	 * can exceed percpu allocator limits on large machines.
 	 */
 	for_each_possible_cpu(cpu) {
-		struct scx_kick_pseqs **pseqs = per_cpu_ptr(&scx_kick_pseqs, cpu);
-		struct scx_kick_pseqs *new_pseqs;
+		struct scx_kick_syncs **ksyncs = per_cpu_ptr(&scx_kick_syncs, cpu);
+		struct scx_kick_syncs *new_ksyncs;
 
-		WARN_ON_ONCE(rcu_access_pointer(*pseqs));
+		WARN_ON_ONCE(rcu_access_pointer(*ksyncs));
 
-		new_pseqs = kvzalloc_node(struct_size(new_pseqs, seqs, nr_cpu_ids),
-					  GFP_KERNEL, cpu_to_node(cpu));
-		if (!new_pseqs) {
-			free_kick_pseqs();
+		new_ksyncs = kvzalloc_node(struct_size(new_ksyncs, syncs, nr_cpu_ids),
+					   GFP_KERNEL, cpu_to_node(cpu));
+		if (!new_ksyncs) {
+			free_kick_syncs();
 			return -ENOMEM;
 		}
 
-		rcu_assign_pointer(*pseqs, new_pseqs);
+		rcu_assign_pointer(*ksyncs, new_ksyncs);
 	}
 
 	return 0;
@@ -4578,14 +4578,14 @@ static int scx_enable(struct sched_ext_o
 		goto err_unlock;
 	}
 
-	ret = alloc_kick_pseqs();
+	ret = alloc_kick_syncs();
 	if (ret)
 		goto err_unlock;
 
 	sch = scx_alloc_and_add_sched(ops);
 	if (IS_ERR(sch)) {
 		ret = PTR_ERR(sch);
-		goto err_free_pseqs;
+		goto err_free_ksyncs;
 	}
 
 	/*
@@ -4788,8 +4788,8 @@ static int scx_enable(struct sched_ext_o
 
 	return 0;
 
-err_free_pseqs:
-	free_kick_pseqs();
+err_free_ksyncs:
+	free_kick_syncs();
 err_unlock:
 	mutex_unlock(&scx_enable_mutex);
 	return ret;
@@ -5119,7 +5119,7 @@ static bool can_skip_idle_kick(struct rq
 	return !is_idle_task(rq->curr) && !(rq->scx.flags & SCX_RQ_IN_BALANCE);
 }
 
-static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *pseqs)
+static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *ksyncs)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct scx_rq *this_scx = &this_rq->scx;
@@ -5146,7 +5146,7 @@ static bool kick_one_cpu(s32 cpu, struct
 
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_wait)) {
 			if (cur_class == &ext_sched_class) {
-				pseqs[cpu] = rq->scx.pnt_seq;
+				ksyncs[cpu] = rq->scx.kick_sync;
 				should_wait = true;
 			} else {
 				cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
@@ -5182,20 +5182,20 @@ static void kick_cpus_irq_workfn(struct
 {
 	struct rq *this_rq = this_rq();
 	struct scx_rq *this_scx = &this_rq->scx;
-	struct scx_kick_pseqs __rcu *pseqs_pcpu = __this_cpu_read(scx_kick_pseqs);
+	struct scx_kick_syncs __rcu *ksyncs_pcpu = __this_cpu_read(scx_kick_syncs);
 	bool should_wait = false;
-	unsigned long *pseqs;
+	unsigned long *ksyncs;
 	s32 cpu;
 
-	if (unlikely(!pseqs_pcpu)) {
-		pr_warn_once("kick_cpus_irq_workfn() called with NULL scx_kick_pseqs");
+	if (unlikely(!ksyncs_pcpu)) {
+		pr_warn_once("kick_cpus_irq_workfn() called with NULL scx_kick_syncs");
 		return;
 	}
 
-	pseqs = rcu_dereference_bh(pseqs_pcpu)->seqs;
+	ksyncs = rcu_dereference_bh(ksyncs_pcpu)->syncs;
 
 	for_each_cpu(cpu, this_scx->cpus_to_kick) {
-		should_wait |= kick_one_cpu(cpu, this_rq, pseqs);
+		should_wait |= kick_one_cpu(cpu, this_rq, ksyncs);
 		cpumask_clear_cpu(cpu, this_scx->cpus_to_kick);
 		cpumask_clear_cpu(cpu, this_scx->cpus_to_kick_if_idle);
 	}
@@ -5209,7 +5209,7 @@ static void kick_cpus_irq_workfn(struct
 		return;
 
 	for_each_cpu(cpu, this_scx->cpus_to_wait) {
-		unsigned long *wait_pnt_seq = &cpu_rq(cpu)->scx.pnt_seq;
+		unsigned long *wait_kick_sync = &cpu_rq(cpu)->scx.kick_sync;
 
 		/*
 		 * Busy-wait until the task running at the time of kicking is no
@@ -5223,7 +5223,7 @@ static void kick_cpus_irq_workfn(struct
 		 * the wait when $cpu is taken by a higher sched class.
 		 */
 		if (cpu != cpu_of(this_rq))
-			smp_cond_load_acquire(wait_pnt_seq, VAL != pseqs[cpu]);
+			smp_cond_load_acquire(wait_kick_sync, VAL != ksyncs[cpu]);
 
 		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -803,7 +803,7 @@ struct scx_rq {
 	cpumask_var_t		cpus_to_kick_if_idle;
 	cpumask_var_t		cpus_to_preempt;
 	cpumask_var_t		cpus_to_wait;
-	unsigned long		pnt_seq;
+	unsigned long		kick_sync;
 	struct balance_callback	deferred_bal_cb;
 	struct irq_work		deferred_irq_work;
 	struct irq_work		kick_cpus_irq_work;
-- 
2.47.1

* Re: [PATCHSET RESEND v2 sched_ext/for-6.19] sched_ext: Fix SCX_KICK_WAIT reliability
@ 2025-10-22 21:18 Andrea Righi
From: Andrea Righi @ 2025-10-22 21:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: void, changwoo, linux-kernel, sched-ext, peterz

Hi Tejun,

On Wed, Oct 22, 2025 at 10:56:26AM -1000, Tejun Heo wrote:
> Resending because the original v2 posting didn't include the full recipient
> list on the individual patches due to a git send-email invocation error. Sorry
> about the noise.
> 
> SCX_KICK_WAIT is used to synchronously wait for the target CPU to complete
> a reschedule and can be used to implement operations like core scheduling.
> However, recent scheduler refactorings broke its reliability. This series
> fixes the issue and improves the code clarity.
> 
> v2: - In patch #2, also increment pnt_seq in pick_task_scx() to handle
>       same-task re-selection (Andrea Righi).
>     - In patch #2, use smp_cond_load_acquire() for the busy-wait loop for
>       better architecture optimization (Peter Zijlstra).
>     - Added patch #3 to rename pnt_seq to kick_sync for clarity.
> 
> v1: http://lkml.kernel.org/r/20251021210354.89570-1-tj@kernel.org

Looks good to me!

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> 
> Based on sched_ext/for-6.19 (2dbbdeda77a6).
> 
>  1 sched_ext: Don't kick CPUs running higher classes
>  2 sched_ext: Fix SCX_KICK_WAIT to work reliably
>  3 sched_ext: Rename pnt_seq to kick_sync
> 
> Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-fix-kick_wait
> 
>  kernel/sched/ext.c          | 129 ++++++++++++++++++++++++--------------------
>  kernel/sched/ext_internal.h |   6 ++-
>  kernel/sched/sched.h        |   2 +-
>  3 files changed, 75 insertions(+), 62 deletions(-)
> 
> --
> tejun

* Re: [PATCH v2] sched_ext: Fix SCX_KICK_WAIT reliability
@ 2025-10-22 21:47 Tejun Heo
From: Tejun Heo @ 2025-10-22 21:47 UTC (permalink / raw)
  To: void, arighi, changwoo; +Cc: linux-kernel, sched-ext, peterz

> Tejun Heo (3):
>   sched_ext: Don't kick CPUs running higher classes
>   sched_ext: Fix SCX_KICK_WAIT to work reliably
>   sched_ext: Rename pnt_seq to kick_sync

Applied 1-3 to sched_ext/for-6.19.

Thanks.
--
tejun
