All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: John Stultz <jstultz@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Qais Yousef <qyousef@layalina.io>, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Zimuzo Ezeozue <zezeozue@google.com>,
	Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Metin Kaya <Metin.Kaya@arm.com>,
	Xuewen Yan <xuewen.yan94@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Suleiman Souhlal <suleiman@google.com>,
	kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
	kernel-team@android.com
Subject: Re: [PATCH v26 00/10] Simple Donor Migration for Proxy Execution
Date: Fri, 3 Apr 2026 11:52:25 +0200	[thread overview]
Message-ID: <20260403095225.GY3738010@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <e6dfce6a-ee6f-441b-9f21-e2036d0a3e5d@amd.com>

On Fri, Apr 03, 2026 at 11:39:25AM +0530, K Prateek Nayak wrote:

> >> @@ -3702,28 +3723,39 @@ ttwu_do_activate(struct rq *rq, struct t
> >>   */
> >>  static int ttwu_runnable(struct task_struct *p, int wake_flags)
> >>  {
> >> -       struct rq_flags rf;
> >> -       struct rq *rq;
> >> -       int ret = 0;
> >> +       ACQUIRE(__task_rq_lock, guard)(p);
> >> +       struct rq *rq = guard.rq;
> >>
> >> -       rq = __task_rq_lock(p, &rf);
> >> -       if (task_on_rq_queued(p)) {
> >> -               update_rq_clock(rq);
> >> -               if (p->se.sched_delayed)
> >> -                       enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> >> -               if (!task_on_cpu(rq, p)) {
> >> +       if (!task_on_rq_queued(p))
> >> +               return 0;
> >> +
> >> +       if (sched_proxy_exec() && p->blocked_on) {
> >> +               guard(raw_spinlock)(&p->blocked_lock);
> >> +               struct mutex *lock = p->blocked_on;
> >> +               if (lock) {
> >>                         /*
> >> -                        * When on_rq && !on_cpu the task is preempted, see if
> >> -                        * it should preempt the task that is current now.
> >> +                        * TASK_WAKING is a special state and results in
> >> +                        * DEQUEUE_SPECIAL such that the task will actually be
> >> +                        * forced from the runqueue.
> >>                          */
> >> -                       wakeup_preempt(rq, p, wake_flags);
> >> +                       block_task(rq, p, TASK_WAKING);
> 
> This needs to reset the rq->donor if the task getting woken up is the
> current donor.

*groan*, that is a fun case. I'll ponder that.

> >> +                       p->blocked_on = NULL;
> >> +                       return 0;
> >>                 }
> >> -               ttwu_do_wakeup(p);
> >> -               ret = 1;
> >>         }
> >> -       __task_rq_unlock(rq, p, &rf);
> >>
> >> -       return ret;
> >> +       update_rq_clock(rq);
> 
> nit. Since block_task() adds a DEQUEUE_NOCLOCK now we need to move that
> clock update before the block.

D'0h :-)

> >> +       if (p->se.sched_delayed)
> >> +               enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> > 
> > I can't precisely remember the details now, but I believe we need to
> > handle enqueueing sched_delayed tasks before handling blocked_on
> > tasks.
> 
> So proxy_deactivate() can still delay the task leading to
> task_on_rq_queued() and the wakeup coming to ttwu_runnable() so either
> we can dequeue it fully in proxy_deactivate() or we need to teach
> block_task() to add a DEQUEUE_DELAYED flag when task_is_blocked().
> 
> I think the former is cleaner but we don't decay lag for fair task :-(
> 
> We can't simply re-enqueue it either since proxy migration might have
> put it on a CPU outside its affinity mask so we need to take a full
> dequeue + wakeup in ttwu_runnable().

Right, sanest option is to have ttwu_runnable() deal with this.

> >> -static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> >> -                              struct task_struct *p)
> >> -       __must_hold(__rq_lockp(rq))
> >> -{
> >> -}
> >> -
> 
> Went a little heavy of the delete there did you? :-)

Well, I thought that was the whole idea, have ttwu() handle this :-)

> >>  /*
> >>   * Find runnable lock owner to proxy for mutex blocked donor
> >>   *
> >> @@ -6777,7 +6723,7 @@ find_proxy_task(struct rq *rq, struct ta
> >>                                 clear_task_blocked_on(p, PROXY_WAKING);
> >>                                 return p;
> >>                         }
> >> -                       goto force_return;
> >> +                       goto deactivate;
> >>                 }
> 
> This makes sense if we preserve the !TASK_RUNNING + p->blocked_on
> invariant since we'll definitely get a wakeup here.

Right, so TASK_RUNNING must imply !->blocked_on.

> >>
> >>                 /*
> >> @@ -6812,7 +6758,7 @@ find_proxy_task(struct rq *rq, struct ta
> >>                                 __clear_task_blocked_on(p, NULL);
> >>                                 return p;
> >>                         }
> >> -                       goto force_return;
> >> +                       goto deactivate;
> 
> This too makes sense considering !owner implies some task will be woken
> up but ... if we take this task off and another task steals the mutex,
> this task will no longer be able to proxy it since it is completely
> blocked now.
> 
> Probably not desired. We should at least let it run and see if it can
> get the mutex and evaluate the "p->blocked_on" again since !owner is
> a limbo state.

I need to go re-read the mutex side of things, but doesn't that do
hand-off way more agressively?

Anyway, one thing that is completely missing is a fast path for when the
task is still inside its valid mask. I suspect adding that back will
cure some of these issues.

> So I added the following in top of Peter's diff on top of
> queue:sched/core and it hasn't crashed and burnt yet when running a
> handful instances of sched-messaging with a mix of fair and SCHED_RR
> priority:
> 
>   (Includes John's findings from the parallel thread)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5b2b2451720a..e845e3a8ae65 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2160,7 +2160,7 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
>  	dequeue_task(rq, p, flags);
>  }
>  
> -static bool block_task(struct rq *rq, struct task_struct *p, unsigned long task_state)
> +static void block_task(struct rq *rq, struct task_struct *p, unsigned long task_state)
>  {
>  	int flags = DEQUEUE_NOCLOCK;
>  
> @@ -3696,6 +3696,20 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
>  	}
>  }
>  
> +static void zap_balance_callbacks(struct rq *rq);
> +
> +static inline void proxy_reset_donor(struct rq *rq)
> +{
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> +	WARN_ON_ONCE(rq->curr == rq->donor);
> +
> +	put_prev_set_next_task(rq, rq->donor, rq->curr);
> +	rq_set_donor(rq, rq->curr);
> +	zap_balance_callbacks(rq);
> +	resched_curr(rq);
> +#endif
> +}

This one hurts my bain :-)

>  /*
>   * Consider @p being inside a wait loop:
>   *
> @@ -3730,6 +3744,8 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
>  		return 0;
>  
>  	update_rq_clock(rq);
> +	if (p->se.sched_delayed)
> +		enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);

Right, this works but seems wasteful, might be better to add
DEQUEUE_DELAYED in the blocked_on case.

>  	if (sched_proxy_exec() && p->blocked_on) {

So I had doubts about this lockless test of ->blocked_on, I still cannot
convince myself it is correct.

>  		guard(raw_spinlock)(&p->blocked_lock);
>  		struct mutex *lock = p->blocked_on;
> @@ -3738,15 +3754,20 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
>  			 * TASK_WAKING is a special state and results in
>  			 * DEQUEUE_SPECIAL such that the task will actually be
>  			 * forced from the runqueue.
> +			 *
> +			 * XXX: All of this is now equivalent of
> +			 * proxy_needs_return() from John's series :-)
>  			 */
> -			block_task(rq, p, TASK_WAKING);
>  			p->blocked_on = NULL;
> +			if (task_current(rq, p))
> +				goto out;

Right, fair enough :-) This could also be done when rq->cpu is inside
p->cpus_ptr mask, because in that case we don't strictly need a
migration. Thinking about that was on the todo list.

> +			if (task_current_donor(rq, p))
> +				proxy_reset_donor(rq);

Fun fun fun :-)

> +			block_task(rq, p, TASK_WAKING);
>  			return 0;
>  		}
>  	}
> -
> -	if (p->se.sched_delayed)
> -		enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
> +out:
>  	if (!task_on_cpu(rq, p)) {
>  		/*
>  		 * When on_rq && !on_cpu the task is preempted, see if
> @@ -4256,6 +4277,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		 */
>  		smp_cond_load_acquire(&p->on_cpu, !VAL);
>  
> +		/*
> +		 * We never clear the blocked_on relation on proxy_deactivate.
> +		 * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
> +		 * when waking up. Since this is a fully blocked, off CPU task
> +		 * waking up, it should be safe to clear the blocked_on relation.
> +		 */
> +		if (task_is_blocked(p))
> +			clear_task_blocked_on(p, NULL);
> +

Aah, yes! This is when find_proxy_task() hits deactivate() for us and we
skip ttwu_runnable(). We still need to clear ->blocked_on.

I am once again not sure on the lockless nature of accessing
->blocked_on.

>  		cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
>  		if (task_cpu(p) != cpu) {
>  			if (p->in_iowait) {
> @@ -6977,6 +7007,10 @@ static void __sched notrace __schedule(int sched_mode)
>  		switch_count = &prev->nvcsw;
>  	}
>  
> +	/* See: https://github.com/kudureranganath/linux/commit/0d6a01bb19db39f045d6f0f5fb4d196500091637 */
> +	if (!prev_state && task_is_blocked(prev))
> +		clear_task_blocked_on(prev, NULL);
> +

This one confuses me, ttwu() should never results in ->blocked_on being
set.

>  pick_again:
>  	assert_balance_callbacks_empty(rq);
>  	next = pick_next_task(rq, rq->donor, &rf);

  reply	other threads:[~2026-04-03  9:52 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-24 19:13 [PATCH v26 00/10] Simple Donor Migration for Proxy Execution John Stultz
2026-03-24 19:13 ` [PATCH v26 01/10] sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr() John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 02/10] sched: Minimise repeated sched_proxy_exec() checking John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 03/10] sched: Fix potentially missing balancing with Proxy Exec John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 04/10] locking: Add task::blocked_lock to serialize blocked_on state John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 05/10] sched: Fix modifying donor->blocked on without proper locking John Stultz
2026-03-26 21:45   ` Steven Rostedt
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 06/10] sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 07/10] sched: Add assert_balance_callbacks_empty helper John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 08/10] sched: Add logic to zap balance callbacks if we pick again John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 09/10] sched: Move attach_one_task and attach_task helpers to sched.h John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-24 19:13 ` [PATCH v26 10/10] sched: Handle blocked-waiter migration (and return migration) John Stultz
2026-03-26 22:52   ` Steven Rostedt
2026-03-27  4:47     ` K Prateek Nayak
2026-03-27 12:47       ` Peter Zijlstra
2026-04-02 14:43   ` Peter Zijlstra
2026-04-02 15:08     ` Peter Zijlstra
2026-04-02 17:43       ` John Stultz
2026-04-02 17:34     ` John Stultz
2026-04-03 12:30   ` [tip: sched/core] " tip-bot2 for John Stultz
2026-03-25 10:52 ` [PATCH v26 00/10] Simple Donor Migration for Proxy Execution K Prateek Nayak
2026-03-27 11:48   ` Peter Zijlstra
2026-03-27 13:33     ` K Prateek Nayak
2026-03-27 15:20       ` Peter Zijlstra
2026-03-27 15:41         ` Peter Zijlstra
2026-03-27 16:00       ` Peter Zijlstra
2026-03-27 16:57         ` K Prateek Nayak
2026-04-02 15:50           ` Peter Zijlstra
2026-04-02 18:31             ` John Stultz
2026-04-02 21:04               ` John Stultz
2026-04-03  6:09               ` K Prateek Nayak
2026-04-03  9:52                 ` Peter Zijlstra [this message]
2026-04-03 10:25                   ` K Prateek Nayak
2026-04-03 11:28                     ` Peter Zijlstra
2026-04-03 13:43                       ` K Prateek Nayak
2026-04-03 14:38                         ` Peter Zijlstra
2026-04-03 15:39                           ` K Prateek Nayak
2026-04-03 21:08                             ` Peter Zijlstra
2026-04-04  0:26                             ` John Stultz
2026-04-04  5:49                               ` K Prateek Nayak
2026-04-04  6:07                                 ` John Stultz
2026-04-06  2:40                                   ` K Prateek Nayak
2026-04-03 12:54                     ` Peter Zijlstra
2026-04-03  9:18               ` Peter Zijlstra
2026-03-27 19:15     ` John Stultz
2026-03-27 19:10   ` John Stultz
2026-03-28  4:53     ` K Prateek Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260403095225.GY3738010@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Metin.Kaya@arm.com \
    --cc=boqun.feng@gmail.com \
    --cc=bsegall@google.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=hupu.gm@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=jstultz@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=kprateek.nayak@amd.com \
    --cc=kuyo.chang@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=qyousef@layalina.io \
    --cc=rostedt@goodmis.org \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=xuewen.yan94@gmail.com \
    --cc=zezeozue@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.