From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 22 Apr 2026 23:06:46 +0000
In-Reply-To: <20260422230659.903191-1-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References:
 <20260422230659.903191-1-jstultz@google.com>
X-Mailer: git-send-email 2.54.0.rc2.533.g4f5dca5207-goog
Message-ID: <20260422230659.903191-6-jstultz@google.com>
Subject: [PATCH v28 5/8] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney",
 Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
 Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
 kernel-team@android.com
Content-Type: text/plain; charset="UTF-8"

This patch adds logic so try_to_wake_up() will notice if we are
waking a task where blocked_on == PROXY_WAKING, and if necessary
dequeue the task so the wakeup will naturally return-migrate the
donor task back to a cpu it can run on.

This helps performance, as we do the dequeue and wakeup under the
locks normally taken in try_to_wake_up() and avoid having to do
proxy_force_return() from __schedule(), which has to re-take
similar locks and then force a pick-again loop.

This was split out from the larger proxy patch, and significantly
reworked.
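As an illustration only, the wakeup-side decision described above can be modeled in plain userspace C. This is a simplified sketch, not the kernel code: struct toy_task, struct toy_rq and toy_needs_return() are hypothetical names, locking is omitted, and "dequeue" is reduced to clearing a flag. It only mirrors the order of checks in proxy_needs_return() from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's task and runqueue. */
struct toy_task {
	const void *blocked_on;	/* NULL when not blocked, else a lock or waking marker */
	bool on_rq;		/* still queued on a runqueue */
};

struct toy_rq {
	struct toy_task *curr;	/* task actually executing */
	struct toy_task *donor;	/* task whose scheduling context is in use */
};

/*
 * Returns true when the woken task was dequeued, so the caller must take
 * the full wakeup path, which picks a CPU the task is allowed to run on.
 */
static bool toy_needs_return(struct toy_rq *rq, struct toy_task *p)
{
	assert(rq && p);

	if (p->blocked_on == NULL)
		return false;		/* ordinary wakeup, nothing to undo */

	p->blocked_on = NULL;		/* waking: drop the blocked_on relation */

	if (rq->curr == p)
		return false;		/* already running here, no migration */

	if (rq->donor == p)
		rq->donor = rq->curr;	/* stop borrowing p's scheduling context */

	p->on_rq = false;		/* dequeue so the full wakeup can place it */
	return true;
}
```

In this model a blocked donor that is not the running task comes back dequeued with the donor reset to the running task, while a task that is itself current only has its blocked_on cleared.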
Credits for the original patch go to:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Signed-off-by: John Stultz
---
XXX - Make sure to switch to use ACQUIRE(__task_rq_lock, guard)
in 7.0-rc+ branches
---
v24:
* Reworked proxy_needs_return() so it's less nested as suggested
  by K Prateek
* Switch to using block_task with DEQUEUE_SPECIAL as suggested
  by K Prateek
* Fix edge case to reset wake_cpu if select_task_rq() chooses
  the current rq and we skip set_task_cpu()
v26:
* Handle both blocked and PROXY_WAKING tasks in
  proxy_needs_return(), as suggested by K Prateek
* Try to handle signal edge case in ttwu that K Prateek pointed
  out
v27:
* Integrate simplifications to proxy_needs_return() suggested
  by K Prateek
* Rework ttwu_runnable() to align with
  ACQUIRE(__task_rq_lock, guard)(p) usage as suggested by Peter
* Major rework suggested by Peter to get rid of
  proxy_force_return() completely, using proxy_deactivate() and
  allow ttwu to handle all the return migration. Lots of helpful
  improvements suggested by K Prateek included as well here.
v28:
* Folded in change suggested by K Prateek to introduce
  proxy_reset_donor() to reset the donor to current task when
  the donor is woken up.
* Drop an unnecessary PROXY_WAKING assignment which was noted
  by K Prateek

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 include/linux/sched.h |   2 +-
 kernel/sched/core.c   | 196 +++++++++++++++++++++---------------------
 2 files changed, 98 insertions(+), 100 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 368c7b4d7cb51..5b68a1c9eedcf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -161,7 +161,7 @@ struct user_event_mm;
  */
 #define is_special_task_state(state)					\
 	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED |	\
-		    TASK_DEAD | TASK_FROZEN))
+		    TASK_DEAD | TASK_WAKING | TASK_FROZEN))
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value)				\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 942af3b34ffe0..17797e1f76f25 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3722,6 +3722,54 @@ void update_rq_avg_idle(struct rq *rq)
 	rq->idle_stamp = 0;
 }
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+static void zap_balance_callbacks(struct rq *rq);
+
+static inline void proxy_reset_donor(struct rq *rq)
+{
+	WARN_ON_ONCE(rq->donor == rq->curr);
+
+	put_prev_set_next_task(rq, rq->donor, rq->curr);
+	rq_set_donor(rq, rq->curr);
+	zap_balance_callbacks(rq);
+	resched_curr(rq);
+}
+
+/*
+ * Checks to see if task p has been proxy-migrated to another rq
+ * and needs to be returned. If so, we deactivate the task here
+ * so that it can be properly woken up on the p->wake_cpu
+ * (or whichever cpu select_task_rq() picks at the bottom of
+ * try_to_wake_up()).
+ */
+static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
+{
+	if (!task_is_blocked(p))
+		return false;
+
+	guard(raw_spinlock)(&p->blocked_lock);
+
+	/* Task is waking up; clear any blocked_on relationship */
+	__clear_task_blocked_on(p, NULL);
+
+	/* If already current, don't need to return migrate */
+	if (task_current(rq, p))
+		return false;
+
+	/* If we're return migrating the rq->donor, reset the donor to rq->curr */
+	if (task_current_donor(rq, p))
+		proxy_reset_donor(rq);
+
+	block_task(rq, p, TASK_WAKING);
+	return true;
+}
+#else /* !CONFIG_SCHED_PROXY_EXEC */
+static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
+{
+	return false;
+}
+#endif /* CONFIG_SCHED_PROXY_EXEC */
+
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		 struct rq_flags *rf)
@@ -3786,28 +3834,26 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
  */
 static int ttwu_runnable(struct task_struct *p, int wake_flags)
 {
-	struct rq_flags rf;
-	struct rq *rq;
-	int ret = 0;
+	ACQUIRE(__task_rq_lock, guard)(p);
+	struct rq *rq = guard.rq;
 
-	rq = __task_rq_lock(p, &rf);
-	if (task_on_rq_queued(p)) {
-		update_rq_clock(rq);
-		if (p->se.sched_delayed)
-			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
-		if (!task_on_cpu(rq, p)) {
-			/*
-			 * When on_rq && !on_cpu the task is preempted, see if
-			 * it should preempt the task that is current now.
-			 */
-			wakeup_preempt(rq, p, wake_flags);
-		}
-		ttwu_do_wakeup(p);
-		ret = 1;
-	}
-	__task_rq_unlock(rq, p, &rf);
+	if (!task_on_rq_queued(p))
+		return 0;
 
-	return ret;
+	update_rq_clock(rq);
+	if (p->se.sched_delayed)
+		enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+	if (proxy_needs_return(rq, p))
+		return 0;
+	if (!task_on_cpu(rq, p)) {
+		/*
+		 * When on_rq && !on_cpu the task is preempted, see if
+		 * it should preempt the task that is current now.
+		 */
+		wakeup_preempt(rq, p, wake_flags);
+	}
+	ttwu_do_wakeup(p);
+	return 1;
 }
 
 void sched_ttwu_pending(void *arg)
@@ -4194,6 +4240,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 * it disabling IRQs (this allows not taking ->pi_lock).
 		 */
 		WARN_ON_ONCE(p->se.sched_delayed);
+		/* If p is current, we know we can run here, so clear blocked_on */
+		clear_task_blocked_on(p, NULL);
 		if (!ttwu_state_match(p, state, &success))
 			goto out;
 
@@ -4210,6 +4258,7 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 */
 	scoped_guard (raw_spinlock_irqsave, &p->pi_lock) {
 		smp_mb__after_spinlock();
+
 		if (!ttwu_state_match(p, state, &success))
 			break;
 
@@ -4274,6 +4323,14 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 */
 		WRITE_ONCE(p->__state, TASK_WAKING);
 
+		/*
+		 * We never clear the blocked_on relation on proxy_deactivate.
+		 * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
+		 * when waking up. Since this is a fully blocked, off CPU task
+		 * waking up, it should be safe to clear the blocked_on relation.
+		 */
+		if (task_is_blocked(p))
+			clear_task_blocked_on(p, NULL);
 		/*
 		 * If the owning (remote) CPU is still in the middle of schedule() with
 		 * this task as prev, considering queueing p on the remote CPUs wake_list
@@ -4318,6 +4375,16 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 				wake_flags |= WF_MIGRATED;
 			psi_ttwu_dequeue(p);
 			set_task_cpu(p, cpu);
+		} else if (cpu != p->wake_cpu) {
+			/*
+			 * If we were proxy-migrated to cpu, then
+			 * select_task_rq() picks cpu instead of wake_cpu
+			 * to return to, we won't call set_task_cpu(),
+			 * leaving a stale wake_cpu pointing to where we
+			 * proxy-migrated from. So just fixup wake_cpu here
+			 * if it's not correct.
+			 */
+			p->wake_cpu = cpu;
 		}
 
 		ttwu_queue(p, cpu, wake_flags);
@@ -6606,7 +6673,7 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	if (signal_pending_state(task_state, p)) {
 		WRITE_ONCE(p->__state, TASK_RUNNING);
 		*task_state_p = TASK_RUNNING;
-		set_task_blocked_on_waking(p, NULL);
+		clear_task_blocked_on(p, NULL);
 		return false;
 	}
 
@@ -6649,13 +6716,11 @@ static inline struct task_struct *proxy_resched_idle(struct rq *rq)
 	return rq->idle;
 }
 
-static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static void proxy_deactivate(struct rq *rq, struct task_struct *donor)
 {
 	unsigned long state = READ_ONCE(donor->__state);
 
-	/* Don't deactivate if the state has been changed to TASK_RUNNING */
-	if (state == TASK_RUNNING)
-		return false;
+	WARN_ON_ONCE(state == TASK_RUNNING);
 	/*
 	 * Because we got donor from pick_next_task(), it is *crucial*
 	 * that we call proxy_resched_idle() before we deactivate it.
@@ -6666,7 +6731,7 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
 	 * need to be changed from next *before* we deactivate.
 	 */
 	proxy_resched_idle(rq);
-	return try_to_block_task(rq, donor, &state, true);
+	block_task(rq, donor, state);
 }
 
 static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *rf)
@@ -6740,71 +6805,6 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	proxy_reacquire_rq_lock(rq, rf);
 }
 
-static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
-			       struct task_struct *p)
-	__must_hold(__rq_lockp(rq))
-{
-	struct rq *task_rq, *target_rq = NULL;
-	int cpu, wake_flag = WF_TTWU;
-
-	lockdep_assert_rq_held(rq);
-	WARN_ON(p == rq->curr);
-
-	if (p == rq->donor)
-		proxy_resched_idle(rq);
-
-	proxy_release_rq_lock(rq, rf);
-	/*
-	 * We drop the rq lock, and re-grab task_rq_lock to get
-	 * the pi_lock (needed for select_task_rq) as well.
-	 */
-	scoped_guard (task_rq_lock, p) {
-		task_rq = scope.rq;
-
-		/*
-		 * Since we let go of the rq lock, the task may have been
-		 * woken or migrated to another rq before we got the
-		 * task_rq_lock. So re-check we're on the same RQ. If
-		 * not, the task has already been migrated and that CPU
-		 * will handle any futher migrations.
-		 */
-		if (task_rq != rq)
-			break;
-
-		/*
-		 * Similarly, if we've been dequeued, someone else will
-		 * wake us
-		 */
-		if (!task_on_rq_queued(p))
-			break;
-
-		/*
-		 * Since we should only be calling here from __schedule()
-		 * -> find_proxy_task(), no one else should have
-		 * assigned current out from under us. But check and warn
-		 * if we see this, then bail.
-		 */
-		if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) {
-			WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n",
-				  __func__, cpu_of(task_rq),
-				  p->comm, p->pid, p->on_cpu);
-			break;
-		}
-
-		update_rq_clock(task_rq);
-		deactivate_task(task_rq, p, DEQUEUE_NOCLOCK);
-		cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
-		set_task_cpu(p, cpu);
-		target_rq = cpu_rq(cpu);
-		clear_task_blocked_on(p, NULL);
-	}
-
-	if (target_rq)
-		attach_one_task(target_rq, p);
-
-	proxy_reacquire_rq_lock(rq, rf);
-}
-
 /*
  * Find runnable lock owner to proxy for mutex blocked donor
  *
@@ -6840,7 +6840,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 				clear_task_blocked_on(p, PROXY_WAKING);
 				return p;
 			}
-			goto force_return;
+			goto deactivate;
 		}
 
 		/*
@@ -6875,7 +6875,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 				__clear_task_blocked_on(p, NULL);
 				return p;
 			}
-			goto force_return;
+			goto deactivate;
 		}
 
 		if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
@@ -6954,12 +6954,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 	return owner;
 
 deactivate:
-	if (proxy_deactivate(rq, donor))
-		return NULL;
-	/* If deactivate fails, force return */
-	p = donor;
-force_return:
-	proxy_force_return(rq, rf, p);
+	proxy_deactivate(rq, p);
 	return NULL;
 migrate_task:
 	proxy_migrate_task(rq, rf, p, owner_cpu);
@@ -7106,6 +7101,9 @@ static void __sched notrace __schedule(int sched_mode)
 	if (sched_proxy_exec()) {
 		struct task_struct *prev_donor = rq->donor;
 
+		if (!prev_state && prev->blocked_on)
+			clear_task_blocked_on(prev, NULL);
+
 		rq_set_donor(rq, next);
 		if (unlikely(next->blocked_on)) {
 			next = find_proxy_task(rq, next, &rf);
-- 
2.54.0.rc2.533.g4f5dca5207-goog
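
For readers following the wake_cpu fixup in the try_to_wake_up() hunk, here is a small userspace model of that branch. It is a sketch under stated assumptions, not kernel code: struct toy_wake_task and toy_commit_cpu() are hypothetical names, selected_cpu stands in for the result of select_task_rq(), and the first branch assumes (as the patch comment implies) that set_task_cpu() also refreshes wake_cpu.

```c
#include <assert.h>

/* Hypothetical stand-in for the two CPU fields the hunk touches. */
struct toy_wake_task {
	int cpu;	/* CPU whose runqueue the task currently sits on */
	int wake_cpu;	/* CPU the task should prefer on its next wakeup */
};

/* Models the tail of CPU selection in the wakeup path. */
static void toy_commit_cpu(struct toy_wake_task *p, int selected_cpu)
{
	assert(p);

	if (p->cpu != selected_cpu) {
		/* Real migration: assumed to update both fields, like set_task_cpu(). */
		p->cpu = selected_cpu;
		p->wake_cpu = selected_cpu;
	} else if (p->wake_cpu != selected_cpu) {
		/* Staying put after a proxy-migration: only the stale hint needs fixing. */
		p->wake_cpu = selected_cpu;
	}
}
```

A task proxy-migrated from CPU 0 to CPU 2 has cpu == 2 and wake_cpu == 0. If CPU 2 is then selected, no migration happens, and without the second branch wake_cpu would keep pointing at CPU 0.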