Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: K Prateek Nayak <kprateek.nayak@amd.com>
To: John Stultz <jstultz@google.com>, LKML <linux-kernel@vger.kernel.org>
Cc: Joel Fernandes <joelagnelf@nvidia.com>,
	Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Zimuzo Ezeozue <zezeozue@google.com>,
	Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Metin Kaya <Metin.Kaya@arm.com>,
	Xuewen Yan <xuewen.yan94@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Daniel Lezcano" <daniel.lezcano@linaro.org>,
	Suleiman Souhlal <suleiman@google.com>,
	kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
	<kernel-team@android.com>
Subject: Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration)
Date: Thu, 30 Oct 2025 15:02:45 +0530	[thread overview]
Message-ID: <0c337bca-4ecf-4654-9256-df766573c7de@amd.com> (raw)
In-Reply-To: <20251030001857.681432-7-jstultz@google.com>

Hello John,

On 10/30/2025 5:48 AM, John Stultz wrote:
> -static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *donor)
> +/*
> + * If the blocked-on relationship crosses CPUs, migrate @p to the
> + * owner's CPU.
> + *
> + * This is because we must respect the CPU affinity of execution
> + * contexts (owner) but we can ignore affinity for scheduling
> + * contexts (@p). So we have to move scheduling contexts towards
> + * potential execution contexts.
> + *
> + * Note: The owner can disappear, but simply migrate to @target_cpu
> + * and leave that CPU to sort things out.
> + */
> +static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
> +			       struct task_struct *p, int target_cpu)
>  {
> -	if (!__proxy_deactivate(rq, donor)) {
> -		/*
> -		 * XXX: For now, if deactivation failed, set donor
> -		 * as unblocked, as we aren't doing proxy-migrations
> -		 * yet (more logic will be needed then).
> -		 */
> -		clear_task_blocked_on(donor, NULL);
> +	struct rq *target_rq = cpu_rq(target_cpu);
> +
> +	lockdep_assert_rq_held(rq);
> +
> +	/*
> +	 * Since we're going to drop @rq, we have to put(@rq->donor) first,
> +	 * otherwise we have a reference that no longer belongs to us.
> +	 *
> +	 * Additionally, as we put_prev_task(prev) earlier, its possible that
> +	 * prev will migrate away as soon as we drop the rq lock, however we
> +	 * still have it marked as rq->curr, as we've not yet switched tasks.
> +	 *
> +	 * So call proxy_resched_idle() to let go of the references before
> +	 * we release the lock.
> +	 */
> +	proxy_resched_idle(rq);
> +
> +	WARN_ON(p == rq->curr);
> +
> +	deactivate_task(rq, p, 0);

DEQUEUE_NOCLOCK since we arrive here with the clock updated from
schedule().

> +	proxy_set_task_cpu(p, target_cpu);
> +
> +	/*
> +	 * We have to zap callbacks before unlocking the rq
> +	 * as another CPU may jump in and call sched_balance_rq
> +	 * which can trip the warning in rq_pin_lock() if we
> +	 * leave callbacks set.
> +	 */
> +	zap_balance_callbacks(rq);
> +	rq_unpin_lock(rq, rf);
> +	raw_spin_rq_unlock(rq);
> +	raw_spin_rq_lock(target_rq);
> +
> +	activate_task(target_rq, p, 0);
> +	wakeup_preempt(target_rq, p, 0);
> +
> +	raw_spin_rq_unlock(target_rq);
> +	raw_spin_rq_lock(rq);
> +	rq_repin_lock(rq, rf);

We should perhaps update the clock once we've reacquired the rq_lock
given we are going into schedule() again for another pick.

> +}
> +
> +static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> +			       struct task_struct *p)
> +{
> +	struct rq *this_rq, *target_rq;
> +	struct rq_flags this_rf;
> +	int cpu, wake_flag = 0;
> +
> +	lockdep_assert_rq_held(rq);
> +	WARN_ON(p == rq->curr);
> +
> +	get_task_struct(p);
> +
> +	/*
> +	 * We have to zap callbacks before unlocking the rq
> +	 * as another CPU may jump in and call sched_balance_rq
> +	 * which can trip the warning in rq_pin_lock() if we
> +	 * leave callbacks set.
> +	 */
> +	zap_balance_callbacks(rq);
> +	rq_unpin_lock(rq, rf);
> +	raw_spin_rq_unlock(rq);
> +
> +	/*
> +	 * We drop the rq lock, and re-grab task_rq_lock to get
> +	 * the pi_lock (needed for select_task_rq) as well.
> +	 */
> +	this_rq = task_rq_lock(p, &this_rf);
> +	update_rq_clock(this_rq);

I think we can delay the clock update until proxy_resched_idle().

> +
> +	/*
> +	 * Since we let go of the rq lock, the task may have been
> +	 * woken or migrated to another rq before we  got the
> +	 * task_rq_lock. So re-check we're on the same RQ. If
> +	 * not, the task has already been migrated and that CPU
> +	 * will handle any futher migrations.
> +	 */
> +	if (this_rq != rq)
> +		goto err_out;
> +
> +	/* Similarly, if we've been dequeued, someone else will wake us */
> +	if (!task_on_rq_queued(p))
> +		goto err_out;
> +
> +	/*
> +	 * Since we should only be calling here from __schedule()
> +	 * -> find_proxy_task(), no one else should have
> +	 * assigned current out from under us. But check and warn
> +	 * if we see this, then bail.
> +	 */
> +	if (task_current(this_rq, p) || task_on_cpu(this_rq, p)) {
> +		WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d  on_cpu: %i\n",
> +			  __func__, cpu_of(this_rq),
> +			  p->comm, p->pid, p->on_cpu);
> +		goto err_out;
>  	}
> -	return NULL;
> +
> +	proxy_resched_idle(this_rq);
> +	deactivate_task(this_rq, p, 0);

This should add DEQUEUE_NOCLOCK since we've already updated the rq clock
before the call.

> +	cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
> +	set_task_cpu(p, cpu);
> +	target_rq = cpu_rq(cpu);
> +	clear_task_blocked_on(p, NULL);
> +	task_rq_unlock(this_rq, p, &this_rf);
> +
> +	/* Drop this_rq and grab target_rq for activation */
> +	raw_spin_rq_lock(target_rq);
> +	activate_task(target_rq, p, 0);
> +	wakeup_preempt(target_rq, p, 0);
> +	put_task_struct(p);
> +	raw_spin_rq_unlock(target_rq);
> +
> +	/* Finally, re-grab the origianl rq lock and return to pick-again */
> +	raw_spin_rq_lock(rq);
> +	rq_repin_lock(rq, rf);
> +	return;
> +
> +err_out:
> +	put_task_struct(p);
> +	task_rq_unlock(this_rq, p, &this_rf);

I believe as long a we have the task_rq_lock(), the task cannot
disappear under us but are there any concern on doing a
put_task_struct() and then using the same task reference for
task_rq_unlock()?

> +	raw_spin_rq_lock(rq);
> +	rq_repin_lock(rq, rf);

Probably a clock update once we reacquire the rq_lock since we
go into schedule() again to retry pick?

> +	return;
>  }
>  
>  /*
> @@ -6655,10 +6792,12 @@ static struct task_struct *
>  find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>  {
>  	struct task_struct *owner = NULL;
> +	bool curr_in_chain = false;
>  	int this_cpu = cpu_of(rq);
>  	struct task_struct *p;
>  	struct mutex *mutex;
> -	enum { FOUND, DEACTIVATE_DONOR } action = FOUND;
> +	int owner_cpu;
> +	enum { FOUND, DEACTIVATE_DONOR, MIGRATE, NEEDS_RETURN } action = FOUND;
>  
>  	/* Follow blocked_on chain. */
>  	for (p = donor; task_is_blocked(p); p = owner) {
> @@ -6667,9 +6806,15 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>  		if (!mutex)
>  			return NULL;
>  
> -		/* if its PROXY_WAKING, resched_idle so ttwu can complete */
> -		if (mutex == PROXY_WAKING)
> -			return proxy_resched_idle(rq);
> +		/* if its PROXY_WAKING, do return migration or run if current */
> +		if (mutex == PROXY_WAKING) {
> +			if (task_current(rq, p)) {
> +				clear_task_blocked_on(p, PROXY_WAKING);
> +				return p;
> +			}
> +			action = NEEDS_RETURN;
> +			break;
> +		}
>  
>  		/*
>  		 * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
> @@ -6689,26 +6834,41 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>  			return NULL;
>  		}
>  
> +		if (task_current(rq, p))
> +			curr_in_chain = true;
> +
>  		owner = __mutex_owner(mutex);
>  		if (!owner) {
>  			/*
> -			 * If there is no owner, clear blocked_on
> -			 * and return p so it can run and try to
> -			 * acquire the lock
> +			 * If there is no owner, either clear blocked_on
> +			 * and return p (if it is current and safe to
> +			 * just run on this rq), or return-migrate the task.
>  			 */
> -			__clear_task_blocked_on(p, mutex);
> -			return p;
> +			if (task_current(rq, p)) {
> +				__clear_task_blocked_on(p, NULL);
> +				return p;
> +			}
> +			action = NEEDS_RETURN;
> +			break;
>  		}
>  
>  		if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {

Should we handle task_on_rq_migrating() in the similar way?
Wait for the owner to finish migrating and look at the
task_cpu(owner) once it is reliable?

>  			/* XXX Don't handle blocked owners/delayed dequeue yet */
> +			if (curr_in_chain)
> +				return proxy_resched_idle(rq);
>  			action = DEACTIVATE_DONOR;
>  			break;
>  		}
>  
> -		if (task_cpu(owner) != this_cpu) {
> -			/* XXX Don't handle migrations yet */
> -			action = DEACTIVATE_DONOR;
> +		owner_cpu = task_cpu(owner);
> +		if (owner_cpu != this_cpu) {
> +			/*
> +			 * @owner can disappear, simply migrate to @owner_cpu
> +			 * and leave that CPU to sort things out.
> +			 */
> +			if (curr_in_chain)
> +				return proxy_resched_idle(rq);
> +			action = MIGRATE;
>  			break;
>  		}
>  
> @@ -6770,7 +6930,17 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>  	/* Handle actions we need to do outside of the guard() scope */
>  	switch (action) {
>  	case DEACTIVATE_DONOR:
> -		return proxy_deactivate(rq, donor);
> +		if (proxy_deactivate(rq, donor))
> +			return NULL;
> +		/* If deactivate fails, force return */
> +		p = donor;
> +		fallthrough;
> +	case NEEDS_RETURN:
> +		proxy_force_return(rq, rf, p);
> +		return NULL;
> +	case MIGRATE:
> +		proxy_migrate_task(rq, rf, p, owner_cpu);
> +		return NULL;
>  	case FOUND:
>  		/* fallthrough */;
>  	}

-- 
Thanks and Regards,
Prateek

next prev parent reply	other threads:[~2025-10-30  9:33 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-30  0:18 [PATCH v23 0/9] Donor Migration for Proxy Execution (v23) John Stultz
2025-10-30  0:18 ` [PATCH v23 1/9] locking: Add task::blocked_lock to serialize blocked_on state John Stultz
2025-10-30  0:18 ` [PATCH v23 2/9] sched: Fix modifying donor->blocked on without proper locking John Stultz
2025-10-30  4:51   ` K Prateek Nayak
2025-10-30 23:42     ` John Stultz
2025-10-30  0:18 ` [PATCH v23 3/9] sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration John Stultz
2025-10-30  7:32   ` K Prateek Nayak
2025-10-30 23:53     ` John Stultz
2025-10-30  0:18 ` [PATCH v23 4/9] sched: Add assert_balance_callbacks_empty helper John Stultz
2025-10-30  7:38   ` K Prateek Nayak
2025-10-30  0:18 ` [PATCH v23 5/9] sched: Add logic to zap balance callbacks if we pick again John Stultz
2025-10-30  8:08   ` K Prateek Nayak
2025-10-31  3:15     ` John Stultz
2025-10-31  3:50       ` K Prateek Nayak
2025-10-30  0:18 ` [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration) John Stultz
2025-10-30  9:32   ` K Prateek Nayak [this message]
2025-11-07 23:18     ` John Stultz
2025-11-10  4:47       ` K Prateek Nayak
2025-11-20  1:53         ` John Stultz
2025-11-20  2:00           ` John Stultz
2025-11-20  2:55             ` K Prateek Nayak
2025-11-20  6:33               ` John Stultz
2025-11-20  7:16                 ` K Prateek Nayak
2025-11-20  7:27                   ` John Stultz
2025-11-07 15:19   ` Juri Lelli
2025-11-07 17:24     ` John Stultz
2025-10-30  0:18 ` [PATCH v23 7/9] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case John Stultz
2025-10-31  4:27   ` K Prateek Nayak
2025-11-20  1:05     ` John Stultz
2025-11-20  3:15       ` K Prateek Nayak
2025-11-20  7:34         ` John Stultz
2025-10-30  0:18 ` [PATCH v23 8/9] sched: Add blocked_donor link to task for smarter mutex handoffs John Stultz
2025-10-31  5:01   ` K Prateek Nayak
2025-11-11  7:50     ` John Stultz
2025-11-11  8:35       ` K Prateek Nayak
2025-10-30  0:18 ` [PATCH v23 9/9] sched: Migrate whole chain in proxy_migrate_task() John Stultz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c337bca-4ecf-4654-9256-df766573c7de@amd.com \
    --to=kprateek.nayak@amd.com \
    --cc=Metin.Kaya@arm.com \
    --cc=boqun.feng@gmail.com \
    --cc=bsegall@google.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=hupu.gm@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=jstultz@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=kuyo.chang@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=rostedt@goodmis.org \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=xuewen.yan94@gmail.com \
    --cc=zezeozue@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.