[sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs

All of lore.kernel.org
 help / color / mirror / Atom feed

* [sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs
@ 2024-08-07 23:14 Tejun Heo
  2024-08-07 23:15 ` [PATCH RESEND sched_ext/for-6.12] " Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2024-08-07 23:14 UTC (permalink / raw)
  To: David Vernet; +Cc: linux-kernel, kernel-team, Peter Zijlstra

consume_remote_task() and dispatch_to_local_dsq() use
move_task_to_local_dsq() to migrate the task to the target CPU. Currently,
move_task_to_local_dsq() expects the caller to lock both the source and
destination rq's. While this may save a few lock operations while the rq's
are not contended, under contention, the double locking can exacerbate the
situation significantly (refer to the linked message below).

Update the migration path so that double locking is not used.
move_task_to_local_dsq() now expects the caller to be locking the source rq,
drops it and then acquires the destination rq lock. Code is simpler this way
and, on a 2-way NUMA machine w/ Xeon Gold 6138, 'hackbench 100 thread 5000`
shows ~3% improvement with scx_simple.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20240806082716.GP37996@noisy.programming.kicks-ass.net
---
 kernel/sched/ext.c |  136 ++++++++++++++++++-----------------------------------
 1 file changed, 47 insertions(+), 89 deletions(-)

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2097,11 +2097,12 @@ static bool yield_to_task_scx(struct rq
 #ifdef CONFIG_SMP
 /**
  * move_task_to_local_dsq - Move a task from a different rq to a local DSQ
- * @rq: rq to move the task into, currently locked
  * @p: task to move
  * @enq_flags: %SCX_ENQ_*
+ * @src_rq: rq to move the task from, locked on entry, released on return
+ * @dst_rq: rq to move the task into, locked on return
  *
- * Move @p which is currently on a different rq to @rq's local DSQ. The caller
+ * Move @p which is currently on @src_rq to @dst_rq's local DSQ. The caller
  * must:
  *
  * 1. Start with exclusive access to @p either through its DSQ lock or
@@ -2109,109 +2110,63 @@ static bool yield_to_task_scx(struct rq
  *
  * 2. Set @p->scx.holding_cpu to raw_smp_processor_id().
  *
- * 3. Remember task_rq(@p). Release the exclusive access so that we don't
- *    deadlock with dequeue.
+ * 3. Remember task_rq(@p) as @src_rq. Release the exclusive access so that we
+ *    don't deadlock with dequeue.
  *
- * 4. Lock @rq and the task_rq from #3.
+ * 4. Lock @src_rq from #3.
  *
  * 5. Call this function.
  *
  * Returns %true if @p was successfully moved. %false after racing dequeue and
- * losing.
+ * losing. On return, @src_rq is unlocked and @dst_rq is locked.
  */
-static bool move_task_to_local_dsq(struct rq *rq, struct task_struct *p,
-				   u64 enq_flags)
+static bool move_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
+				   struct rq *src_rq, struct rq *dst_rq)
 {
-	struct rq *task_rq;
-
-	lockdep_assert_rq_held(rq);
+	lockdep_assert_rq_held(src_rq);
 
 	/*
-	 * If dequeue got to @p while we were trying to lock both rq's, it'd
-	 * have cleared @p->scx.holding_cpu to -1. While other cpus may have
-	 * updated it to different values afterwards, as this operation can't be
+	 * If dequeue got to @p while we were trying to lock @src_rq, it'd have
+	 * cleared @p->scx.holding_cpu to -1. While other cpus may have updated
+	 * it to different values afterwards, as this operation can't be
 	 * preempted or recurse, @p->scx.holding_cpu can never become
 	 * raw_smp_processor_id() again before we're done. Thus, we can tell
 	 * whether we lost to dequeue by testing whether @p->scx.holding_cpu is
 	 * still raw_smp_processor_id().
 	 *
+	 * @p->rq couldn't have changed if we're still the holding cpu.
+	 *
 	 * See dispatch_dequeue() for the counterpart.
 	 */
-	if (unlikely(p->scx.holding_cpu != raw_smp_processor_id()))
+	if (unlikely(p->scx.holding_cpu != raw_smp_processor_id()) ||
+	    WARN_ON_ONCE(src_rq != task_rq(p))) {
+		raw_spin_rq_unlock(src_rq);
+		raw_spin_rq_lock(dst_rq);
 		return false;
+	}
 
-	/* @p->rq couldn't have changed if we're still the holding cpu */
-	task_rq = task_rq(p);
-	lockdep_assert_rq_held(task_rq);
-
-	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(rq), p->cpus_ptr));
-	deactivate_task(task_rq, p, 0);
-	set_task_cpu(p, cpu_of(rq));
-	p->scx.sticky_cpu = cpu_of(rq);
+	/* the following marks @p MIGRATING which excludes dequeue */
+	deactivate_task(src_rq, p, 0);
+	set_task_cpu(p, cpu_of(dst_rq));
+	p->scx.sticky_cpu = cpu_of(dst_rq);
+
+	raw_spin_rq_unlock(src_rq);
+	raw_spin_rq_lock(dst_rq);
 
 	/*
 	 * We want to pass scx-specific enq_flags but activate_task() will
 	 * truncate the upper 32 bit. As we own @rq, we can pass them through
 	 * @rq->scx.extra_enq_flags instead.
 	 */
-	WARN_ON_ONCE(rq->scx.extra_enq_flags);
-	rq->scx.extra_enq_flags = enq_flags;
-	activate_task(rq, p, 0);
-	rq->scx.extra_enq_flags = 0;
+	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));
+	WARN_ON_ONCE(dst_rq->scx.extra_enq_flags);
+	dst_rq->scx.extra_enq_flags = enq_flags;
+	activate_task(dst_rq, p, 0);
+	dst_rq->scx.extra_enq_flags = 0;
 
 	return true;
 }
 
-/**
- * dispatch_to_local_dsq_lock - Ensure source and destination rq's are locked
- * @rq: current rq which is locked
- * @src_rq: rq to move task from
- * @dst_rq: rq to move task to
- *
- * We're holding @rq lock and trying to dispatch a task from @src_rq to
- * @dst_rq's local DSQ and thus need to lock both @src_rq and @dst_rq. Whether
- * @rq stays locked isn't important as long as the state is restored after
- * dispatch_to_local_dsq_unlock().
- */
-static void dispatch_to_local_dsq_lock(struct rq *rq, struct rq *src_rq,
-				       struct rq *dst_rq)
-{
-	if (src_rq == dst_rq) {
-		raw_spin_rq_unlock(rq);
-		raw_spin_rq_lock(dst_rq);
-	} else if (rq == src_rq) {
-		double_lock_balance(rq, dst_rq);
-	} else if (rq == dst_rq) {
-		double_lock_balance(rq, src_rq);
-	} else {
-		raw_spin_rq_unlock(rq);
-		double_rq_lock(src_rq, dst_rq);
-	}
-}
-
-/**
- * dispatch_to_local_dsq_unlock - Undo dispatch_to_local_dsq_lock()
- * @rq: current rq which is locked
- * @src_rq: rq to move task from
- * @dst_rq: rq to move task to
- *
- * Unlock @src_rq and @dst_rq and ensure that @rq is locked on return.
- */
-static void dispatch_to_local_dsq_unlock(struct rq *rq, struct rq *src_rq,
-					 struct rq *dst_rq)
-{
-	if (src_rq == dst_rq) {
-		raw_spin_rq_unlock(dst_rq);
-		raw_spin_rq_lock(rq);
-	} else if (rq == src_rq) {
-		double_unlock_balance(rq, dst_rq);
-	} else if (rq == dst_rq) {
-		double_unlock_balance(rq, src_rq);
-	} else {
-		double_rq_unlock(src_rq, dst_rq);
-		raw_spin_rq_lock(rq);
-	}
-}
 #endif	/* CONFIG_SMP */
 
 static void consume_local_task(struct rq *rq, struct scx_dispatch_q *dsq,
@@ -2263,8 +2218,6 @@ static bool task_can_run_on_remote_rq(st
 static bool consume_remote_task(struct rq *rq, struct scx_dispatch_q *dsq,
 				struct task_struct *p, struct rq *task_rq)
 {
-	bool moved = false;
-
 	lockdep_assert_held(&dsq->lock);	/* released on return */
 
 	/*
@@ -2280,13 +2233,10 @@ static bool consume_remote_task(struct r
 	p->scx.holding_cpu = raw_smp_processor_id();
 	raw_spin_unlock(&dsq->lock);
 
-	double_lock_balance(rq, task_rq);
-
-	moved = move_task_to_local_dsq(rq, p, 0);
+	raw_spin_rq_unlock(rq);
+	raw_spin_rq_lock(task_rq);
 
-	double_unlock_balance(rq, task_rq);
-
-	return moved;
+	return move_task_to_local_dsq(p, 0, task_rq, rq);
 }
 #else	/* CONFIG_SMP */
 static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq) { return false; }
@@ -2380,7 +2330,6 @@ dispatch_to_local_dsq(struct rq *rq, u64
 
 #ifdef CONFIG_SMP
 	if (cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr)) {
-		struct rq *locked_dst_rq = dst_rq;
 		bool dsp;
 
 		/*
@@ -2399,7 +2348,11 @@ dispatch_to_local_dsq(struct rq *rq, u64
 		/* store_release ensures that dequeue sees the above */
 		atomic_long_set_release(&p->scx.ops_state, SCX_OPSS_NONE);
 
-		dispatch_to_local_dsq_lock(rq, src_rq, locked_dst_rq);
+		/* switch to @src_rq lock */
+		if (rq != src_rq) {
+			raw_spin_rq_unlock(rq);
+			raw_spin_rq_lock(src_rq);
+		}
 
 		/*
 		 * We don't require the BPF scheduler to avoid dispatching to
@@ -2426,7 +2379,8 @@ dispatch_to_local_dsq(struct rq *rq, u64
 						 enq_flags);
 			}
 		} else {
-			dsp = move_task_to_local_dsq(dst_rq, p, enq_flags);
+			dsp = move_task_to_local_dsq(p, enq_flags,
+						     src_rq, dst_rq);
 		}
 
 		/* if the destination CPU is idle, wake it up */
@@ -2434,7 +2388,11 @@ dispatch_to_local_dsq(struct rq *rq, u64
 					     dst_rq->curr->sched_class))
 			resched_curr(dst_rq);
 
-		dispatch_to_local_dsq_unlock(rq, src_rq, locked_dst_rq);
+		/* switch back to @rq lock */
+		if (rq != dst_rq) {
+			raw_spin_rq_unlock(dst_rq);
+			raw_spin_rq_lock(rq);
+		}
 
 		return dsp ? DTL_DISPATCHED : DTL_LOST;
 	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH RESEND sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs
  2024-08-07 23:14 [sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs Tejun Heo
@ 2024-08-07 23:15 ` Tejun Heo
  2024-08-08 18:19   ` David Vernet
  2024-08-13 19:09   ` Tejun Heo
  0 siblings, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2024-08-07 23:15 UTC (permalink / raw)
  To: David Vernet; +Cc: linux-kernel, kernel-team, Peter Zijlstra

consume_remote_task() and dispatch_to_local_dsq() use
move_task_to_local_dsq() to migrate the task to the target CPU. Currently,
move_task_to_local_dsq() expects the caller to lock both the source and
destination rq's. While this may save a few lock operations while the rq's
are not contended, under contention, the double locking can exacerbate the
situation significantly (refer to the linked message below).

Update the migration path so that double locking is not used.
move_task_to_local_dsq() now expects the caller to be locking the source rq,
drops it and then acquires the destination rq lock. Code is simpler this way
and, on a 2-way NUMA machine w/ Xeon Gold 6138, 'hackbench 100 thread 5000`
shows ~3% improvement with scx_simple.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20240806082716.GP37996@noisy.programming.kicks-ass.net
---
Oops, PATCH prefix was missing. Resending.

 kernel/sched/ext.c |  136 ++++++++++++++++++-----------------------------------
 1 file changed, 47 insertions(+), 89 deletions(-)

--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2097,11 +2097,12 @@ static bool yield_to_task_scx(struct rq
 #ifdef CONFIG_SMP
 /**
  * move_task_to_local_dsq - Move a task from a different rq to a local DSQ
- * @rq: rq to move the task into, currently locked
  * @p: task to move
  * @enq_flags: %SCX_ENQ_*
+ * @src_rq: rq to move the task from, locked on entry, released on return
+ * @dst_rq: rq to move the task into, locked on return
  *
- * Move @p which is currently on a different rq to @rq's local DSQ. The caller
+ * Move @p which is currently on @src_rq to @dst_rq's local DSQ. The caller
  * must:
  *
  * 1. Start with exclusive access to @p either through its DSQ lock or
@@ -2109,109 +2110,63 @@ static bool yield_to_task_scx(struct rq
  *
  * 2. Set @p->scx.holding_cpu to raw_smp_processor_id().
  *
- * 3. Remember task_rq(@p). Release the exclusive access so that we don't
- *    deadlock with dequeue.
+ * 3. Remember task_rq(@p) as @src_rq. Release the exclusive access so that we
+ *    don't deadlock with dequeue.
  *
- * 4. Lock @rq and the task_rq from #3.
+ * 4. Lock @src_rq from #3.
  *
  * 5. Call this function.
  *
  * Returns %true if @p was successfully moved. %false after racing dequeue and
- * losing.
+ * losing. On return, @src_rq is unlocked and @dst_rq is locked.
  */
-static bool move_task_to_local_dsq(struct rq *rq, struct task_struct *p,
-				   u64 enq_flags)
+static bool move_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
+				   struct rq *src_rq, struct rq *dst_rq)
 {
-	struct rq *task_rq;
-
-	lockdep_assert_rq_held(rq);
+	lockdep_assert_rq_held(src_rq);
 
 	/*
-	 * If dequeue got to @p while we were trying to lock both rq's, it'd
-	 * have cleared @p->scx.holding_cpu to -1. While other cpus may have
-	 * updated it to different values afterwards, as this operation can't be
+	 * If dequeue got to @p while we were trying to lock @src_rq, it'd have
+	 * cleared @p->scx.holding_cpu to -1. While other cpus may have updated
+	 * it to different values afterwards, as this operation can't be
 	 * preempted or recurse, @p->scx.holding_cpu can never become
 	 * raw_smp_processor_id() again before we're done. Thus, we can tell
 	 * whether we lost to dequeue by testing whether @p->scx.holding_cpu is
 	 * still raw_smp_processor_id().
 	 *
+	 * @p->rq couldn't have changed if we're still the holding cpu.
+	 *
 	 * See dispatch_dequeue() for the counterpart.
 	 */
-	if (unlikely(p->scx.holding_cpu != raw_smp_processor_id()))
+	if (unlikely(p->scx.holding_cpu != raw_smp_processor_id()) ||
+	    WARN_ON_ONCE(src_rq != task_rq(p))) {
+		raw_spin_rq_unlock(src_rq);
+		raw_spin_rq_lock(dst_rq);
 		return false;
+	}
 
-	/* @p->rq couldn't have changed if we're still the holding cpu */
-	task_rq = task_rq(p);
-	lockdep_assert_rq_held(task_rq);
-
-	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(rq), p->cpus_ptr));
-	deactivate_task(task_rq, p, 0);
-	set_task_cpu(p, cpu_of(rq));
-	p->scx.sticky_cpu = cpu_of(rq);
+	/* the following marks @p MIGRATING which excludes dequeue */
+	deactivate_task(src_rq, p, 0);
+	set_task_cpu(p, cpu_of(dst_rq));
+	p->scx.sticky_cpu = cpu_of(dst_rq);
+
+	raw_spin_rq_unlock(src_rq);
+	raw_spin_rq_lock(dst_rq);
 
 	/*
 	 * We want to pass scx-specific enq_flags but activate_task() will
 	 * truncate the upper 32 bit. As we own @rq, we can pass them through
 	 * @rq->scx.extra_enq_flags instead.
 	 */
-	WARN_ON_ONCE(rq->scx.extra_enq_flags);
-	rq->scx.extra_enq_flags = enq_flags;
-	activate_task(rq, p, 0);
-	rq->scx.extra_enq_flags = 0;
+	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));
+	WARN_ON_ONCE(dst_rq->scx.extra_enq_flags);
+	dst_rq->scx.extra_enq_flags = enq_flags;
+	activate_task(dst_rq, p, 0);
+	dst_rq->scx.extra_enq_flags = 0;
 
 	return true;
 }
 
-/**
- * dispatch_to_local_dsq_lock - Ensure source and destination rq's are locked
- * @rq: current rq which is locked
- * @src_rq: rq to move task from
- * @dst_rq: rq to move task to
- *
- * We're holding @rq lock and trying to dispatch a task from @src_rq to
- * @dst_rq's local DSQ and thus need to lock both @src_rq and @dst_rq. Whether
- * @rq stays locked isn't important as long as the state is restored after
- * dispatch_to_local_dsq_unlock().
- */
-static void dispatch_to_local_dsq_lock(struct rq *rq, struct rq *src_rq,
-				       struct rq *dst_rq)
-{
-	if (src_rq == dst_rq) {
-		raw_spin_rq_unlock(rq);
-		raw_spin_rq_lock(dst_rq);
-	} else if (rq == src_rq) {
-		double_lock_balance(rq, dst_rq);
-	} else if (rq == dst_rq) {
-		double_lock_balance(rq, src_rq);
-	} else {
-		raw_spin_rq_unlock(rq);
-		double_rq_lock(src_rq, dst_rq);
-	}
-}
-
-/**
- * dispatch_to_local_dsq_unlock - Undo dispatch_to_local_dsq_lock()
- * @rq: current rq which is locked
- * @src_rq: rq to move task from
- * @dst_rq: rq to move task to
- *
- * Unlock @src_rq and @dst_rq and ensure that @rq is locked on return.
- */
-static void dispatch_to_local_dsq_unlock(struct rq *rq, struct rq *src_rq,
-					 struct rq *dst_rq)
-{
-	if (src_rq == dst_rq) {
-		raw_spin_rq_unlock(dst_rq);
-		raw_spin_rq_lock(rq);
-	} else if (rq == src_rq) {
-		double_unlock_balance(rq, dst_rq);
-	} else if (rq == dst_rq) {
-		double_unlock_balance(rq, src_rq);
-	} else {
-		double_rq_unlock(src_rq, dst_rq);
-		raw_spin_rq_lock(rq);
-	}
-}
 #endif	/* CONFIG_SMP */
 
 static void consume_local_task(struct rq *rq, struct scx_dispatch_q *dsq,
@@ -2263,8 +2218,6 @@ static bool task_can_run_on_remote_rq(st
 static bool consume_remote_task(struct rq *rq, struct scx_dispatch_q *dsq,
 				struct task_struct *p, struct rq *task_rq)
 {
-	bool moved = false;
-
 	lockdep_assert_held(&dsq->lock);	/* released on return */
 
 	/*
@@ -2280,13 +2233,10 @@ static bool consume_remote_task(struct r
 	p->scx.holding_cpu = raw_smp_processor_id();
 	raw_spin_unlock(&dsq->lock);
 
-	double_lock_balance(rq, task_rq);
-
-	moved = move_task_to_local_dsq(rq, p, 0);
+	raw_spin_rq_unlock(rq);
+	raw_spin_rq_lock(task_rq);
 
-	double_unlock_balance(rq, task_rq);
-
-	return moved;
+	return move_task_to_local_dsq(p, 0, task_rq, rq);
 }
 #else	/* CONFIG_SMP */
 static bool task_can_run_on_remote_rq(struct task_struct *p, struct rq *rq) { return false; }
@@ -2380,7 +2330,6 @@ dispatch_to_local_dsq(struct rq *rq, u64
 
 #ifdef CONFIG_SMP
 	if (cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr)) {
-		struct rq *locked_dst_rq = dst_rq;
 		bool dsp;
 
 		/*
@@ -2399,7 +2348,11 @@ dispatch_to_local_dsq(struct rq *rq, u64
 		/* store_release ensures that dequeue sees the above */
 		atomic_long_set_release(&p->scx.ops_state, SCX_OPSS_NONE);
 
-		dispatch_to_local_dsq_lock(rq, src_rq, locked_dst_rq);
+		/* switch to @src_rq lock */
+		if (rq != src_rq) {
+			raw_spin_rq_unlock(rq);
+			raw_spin_rq_lock(src_rq);
+		}
 
 		/*
 		 * We don't require the BPF scheduler to avoid dispatching to
@@ -2426,7 +2379,8 @@ dispatch_to_local_dsq(struct rq *rq, u64
 						 enq_flags);
 			}
 		} else {
-			dsp = move_task_to_local_dsq(dst_rq, p, enq_flags);
+			dsp = move_task_to_local_dsq(p, enq_flags,
+						     src_rq, dst_rq);
 		}
 
 		/* if the destination CPU is idle, wake it up */
@@ -2434,7 +2388,11 @@ dispatch_to_local_dsq(struct rq *rq, u64
 					     dst_rq->curr->sched_class))
 			resched_curr(dst_rq);
 
-		dispatch_to_local_dsq_unlock(rq, src_rq, locked_dst_rq);
+		/* switch back to @rq lock */
+		if (rq != dst_rq) {
+			raw_spin_rq_unlock(dst_rq);
+			raw_spin_rq_lock(rq);
+		}
 
 		return dsp ? DTL_DISPATCHED : DTL_LOST;
 	}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs
  2024-08-07 23:15 ` [PATCH RESEND sched_ext/for-6.12] " Tejun Heo
@ 2024-08-08 18:19   ` David Vernet
  2024-08-08 18:24     ` Tejun Heo
  2024-08-13 19:09   ` Tejun Heo
  1 sibling, 1 reply; 6+ messages in thread
From: David Vernet @ 2024-08-08 18:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, kernel-team, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 3053 bytes --]

On Wed, Aug 07, 2024 at 01:15:19PM -1000, Tejun Heo wrote:

Hi Tejun,

[...]

> -static bool move_task_to_local_dsq(struct rq *rq, struct task_struct *p,
> -				   u64 enq_flags)
> +static bool move_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> +				   struct rq *src_rq, struct rq *dst_rq)
>  {
> -	struct rq *task_rq;
> -
> -	lockdep_assert_rq_held(rq);
> +	lockdep_assert_rq_held(src_rq);
>  
>  	/*
> -	 * If dequeue got to @p while we were trying to lock both rq's, it'd
> -	 * have cleared @p->scx.holding_cpu to -1. While other cpus may have
> -	 * updated it to different values afterwards, as this operation can't be
> +	 * If dequeue got to @p while we were trying to lock @src_rq, it'd have
> +	 * cleared @p->scx.holding_cpu to -1. While other cpus may have updated
> +	 * it to different values afterwards, as this operation can't be
>  	 * preempted or recurse, @p->scx.holding_cpu can never become
>  	 * raw_smp_processor_id() again before we're done. Thus, we can tell
>  	 * whether we lost to dequeue by testing whether @p->scx.holding_cpu is
>  	 * still raw_smp_processor_id().
>  	 *
> +	 * @p->rq couldn't have changed if we're still the holding cpu.
> +	 *
>  	 * See dispatch_dequeue() for the counterpart.
>  	 */
> -	if (unlikely(p->scx.holding_cpu != raw_smp_processor_id()))
> +	if (unlikely(p->scx.holding_cpu != raw_smp_processor_id()) ||
> +	    WARN_ON_ONCE(src_rq != task_rq(p))) {
> +		raw_spin_rq_unlock(src_rq);
> +		raw_spin_rq_lock(dst_rq);
>  		return false;
> +	}
>  
> -	/* @p->rq couldn't have changed if we're still the holding cpu */
> -	task_rq = task_rq(p);
> -	lockdep_assert_rq_held(task_rq);
> -
> -	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(rq), p->cpus_ptr));
> -	deactivate_task(task_rq, p, 0);
> -	set_task_cpu(p, cpu_of(rq));
> -	p->scx.sticky_cpu = cpu_of(rq);
> +	/* the following marks @p MIGRATING which excludes dequeue */
> +	deactivate_task(src_rq, p, 0);
> +	set_task_cpu(p, cpu_of(dst_rq));
> +	p->scx.sticky_cpu = cpu_of(dst_rq);
> +
> +	raw_spin_rq_unlock(src_rq);
> +	raw_spin_rq_lock(dst_rq);
>  
>  	/*
>  	 * We want to pass scx-specific enq_flags but activate_task() will
>  	 * truncate the upper 32 bit. As we own @rq, we can pass them through
>  	 * @rq->scx.extra_enq_flags instead.
>  	 */
> -	WARN_ON_ONCE(rq->scx.extra_enq_flags);
> -	rq->scx.extra_enq_flags = enq_flags;
> -	activate_task(rq, p, 0);
> -	rq->scx.extra_enq_flags = 0;
> +	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));

Hmmm, what's to stop someone from issuing a call to
set_cpus_allowed_ptr() after we drop the src_rq lock above?  Before we
held any relevant rq lock so everything should have been protected, but
I'm not following how we prevent racing with the cpus_allowed logic in
core.c here.

> +	WARN_ON_ONCE(dst_rq->scx.extra_enq_flags);
> +	dst_rq->scx.extra_enq_flags = enq_flags;
> +	activate_task(dst_rq, p, 0);
> +	dst_rq->scx.extra_enq_flags = 0;
>  
>  	return true;
>  }

Thanks,
David

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs
  2024-08-08 18:19   ` David Vernet
@ 2024-08-08 18:24     ` Tejun Heo
  2024-08-08 18:25       ` David Vernet
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2024-08-08 18:24 UTC (permalink / raw)
  To: David Vernet; +Cc: linux-kernel, kernel-team, Peter Zijlstra

Hello,

On Thu, Aug 08, 2024 at 01:19:27PM -0500, David Vernet wrote:
...
> > +	deactivate_task(src_rq, p, 0);
> > +	set_task_cpu(p, cpu_of(dst_rq));
> > +	p->scx.sticky_cpu = cpu_of(dst_rq);
> > +
> > +	raw_spin_rq_unlock(src_rq);
> > +	raw_spin_rq_lock(dst_rq);
> >  
> >  	/*
> >  	 * We want to pass scx-specific enq_flags but activate_task() will
> >  	 * truncate the upper 32 bit. As we own @rq, we can pass them through
> >  	 * @rq->scx.extra_enq_flags instead.
> >  	 */
> > -	WARN_ON_ONCE(rq->scx.extra_enq_flags);
> > -	rq->scx.extra_enq_flags = enq_flags;
> > -	activate_task(rq, p, 0);
> > -	rq->scx.extra_enq_flags = 0;
> > +	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));
> 
> Hmmm, what's to stop someone from issuing a call to
> set_cpus_allowed_ptr() after we drop the src_rq lock above?  Before we
> held any relevant rq lock so everything should have been protected, but
> I'm not following how we prevent racing with the cpus_allowed logic in
> core.c here.

deactivate_task(src_rq, p, 0) sets p->on_rq to TASK_ON_RQ_MIGRATING and
task_rq_lock() can't lock p until it's cleared, so set_cpus_allowed_ptr()
is blocked till the migration is complete.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs
  2024-08-08 18:24     ` Tejun Heo
@ 2024-08-08 18:25       ` David Vernet
  0 siblings, 0 replies; 6+ messages in thread
From: David Vernet @ 2024-08-08 18:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, kernel-team, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]

On Thu, Aug 08, 2024 at 08:24:55AM -1000, Tejun Heo wrote:
> Hello,
> 
> On Thu, Aug 08, 2024 at 01:19:27PM -0500, David Vernet wrote:
> ...
> > > +	deactivate_task(src_rq, p, 0);
> > > +	set_task_cpu(p, cpu_of(dst_rq));
> > > +	p->scx.sticky_cpu = cpu_of(dst_rq);
> > > +
> > > +	raw_spin_rq_unlock(src_rq);
> > > +	raw_spin_rq_lock(dst_rq);
> > >  
> > >  	/*
> > >  	 * We want to pass scx-specific enq_flags but activate_task() will
> > >  	 * truncate the upper 32 bit. As we own @rq, we can pass them through
> > >  	 * @rq->scx.extra_enq_flags instead.
> > >  	 */
> > > -	WARN_ON_ONCE(rq->scx.extra_enq_flags);
> > > -	rq->scx.extra_enq_flags = enq_flags;
> > > -	activate_task(rq, p, 0);
> > > -	rq->scx.extra_enq_flags = 0;
> > > +	WARN_ON_ONCE(!cpumask_test_cpu(cpu_of(dst_rq), p->cpus_ptr));
> > 
> > Hmmm, what's to stop someone from issuing a call to
> > set_cpus_allowed_ptr() after we drop the src_rq lock above?  Before we
> > held any relevant rq lock so everything should have been protected, but
> > I'm not following how we prevent racing with the cpus_allowed logic in
> > core.c here.
> 
> deactivate_task(src_rq, p, 0) sets p->on_rq to TASK_ON_RQ_MIGRATING and
> task_rq_lock() can't lock p until it's cleared, so set_cpus_allowed_ptr()
> is blocked till the migration is complete.

Ahh I see, thanks. This LGTM then.

Acked-by: David Vernet <void@manifault.com>

> 
> Thanks.
> 
> -- 
> tejun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH RESEND sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs
  2024-08-07 23:15 ` [PATCH RESEND sched_ext/for-6.12] " Tejun Heo
  2024-08-08 18:19   ` David Vernet
@ 2024-08-13 19:09   ` Tejun Heo
  1 sibling, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2024-08-13 19:09 UTC (permalink / raw)
  To: David Vernet; +Cc: linux-kernel, kernel-team, Peter Zijlstra

On Wed, Aug 07, 2024 at 01:15:19PM -1000, Tejun Heo wrote:
> consume_remote_task() and dispatch_to_local_dsq() use
> move_task_to_local_dsq() to migrate the task to the target CPU. Currently,
> move_task_to_local_dsq() expects the caller to lock both the source and
> destination rq's. While this may save a few lock operations while the rq's
> are not contended, under contention, the double locking can exacerbate the
> situation significantly (refer to the linked message below).
> 
> Update the migration path so that double locking is not used.
> move_task_to_local_dsq() now expects the caller to be locking the source rq,
> drops it and then acquires the destination rq lock. Code is simpler this way
> and, on a 2-way NUMA machine w/ Xeon Gold 6138, 'hackbench 100 thread 5000`
> shows ~3% improvement with scx_simple.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20240806082716.GP37996@noisy.programming.kicks-ass.net

Applied to sched_ext/for-6.12.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-08-13 19:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-07 23:14 [sched_ext/for-6.12] sched_ext: Don't use double locking to migrate tasks across CPUs Tejun Heo
2024-08-07 23:15 ` [PATCH RESEND sched_ext/for-6.12] " Tejun Heo
2024-08-08 18:19   ` David Vernet
2024-08-08 18:24     ` Tejun Heo
2024-08-08 18:25       ` David Vernet
2024-08-13 19:09   ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.