Date: Sat, 4 Apr 2026 05:36:23 +0000
In-Reply-To: <20260404053632.1729280-1-jstultz@google.com>
References: <20260404053632.1729280-1-jstultz@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.53.0.1213.gd9a14994de-goog
Message-ID: <20260404053632.1729280-7-jstultz@google.com>
Subject: [PATCH v27 06/10] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Mel Gorman, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney",
 Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
 Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu,
 kernel-team@android.com
Content-Type: text/plain; charset="UTF-8"

This patch adds logic so try_to_wake_up() will notice if we are
waking a task where blocked_on == PROXY_WAKING, and if necessary
dequeue the task so the wakeup will naturally return-migrate the
donor task back to a cpu it can run on.

This helps performance as we do the dequeue and wakeup under the
locks normally taken in try_to_wake_up(), and it avoids having to
do proxy_force_return() from __schedule(), which has to re-take
similar locks and then force a pick-again loop.

This was split out from the larger proxy patch, and
significantly reworked.

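For readers less familiar with the flow described above, here is a
rough userspace sketch of the idea. It is only a toy model and not
the kernel code: every toy_* name is hypothetical, the structs are
simplified stand-ins for task_struct and rq, locking is omitted, and
the CPU selection is reduced to "use wake_cpu" where the real code
calls select_task_rq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT the kernel's struct task_struct / struct rq. */
struct toy_task {
	bool on_rq;	/* queued on some runqueue */
	bool blocked;	/* models task_is_blocked(p), incl. PROXY_WAKING */
	int cpu;	/* runqueue the task currently sits on */
	int wake_cpu;	/* CPU the task is allowed to return to */
};

struct toy_rq {
	int cpu;
	const struct toy_task *curr;	/* task actually executing */
	const struct toy_task *donor;	/* task lending its scheduling context */
};

/*
 * Models proxy_needs_return(): returns true if the task was dequeued,
 * meaning the full wakeup path has to place it on a runqueue again.
 */
static bool toy_needs_return(struct toy_rq *rq, struct toy_task *p)
{
	if (!p->blocked)
		return false;

	p->blocked = false;	/* waking up: drop the blocked_on relation */

	if (rq->curr == p)
		return false;	/* already running here, nothing to migrate */

	if (rq->donor == p)
		rq->donor = NULL;	/* the patch swaps in the idle task here */

	p->on_rq = false;	/* models block_task() */
	return true;
}

/* Models the fast path in ttwu_runnable(): 1 means "woken in place". */
static int toy_ttwu_runnable(struct toy_rq *rq, struct toy_task *p)
{
	if (!p->on_rq)
		return 0;
	if (toy_needs_return(rq, p))
		return 0;
	return 1;
}

/* Models the slow path: pick a CPU (simply wake_cpu here) and enqueue. */
static void toy_wake_up(struct toy_rq *rq, struct toy_task *p)
{
	assert(rq != NULL && p != NULL);

	if (toy_ttwu_runnable(rq, p))
		return;

	p->cpu = p->wake_cpu;
	p->on_rq = true;
}
```

In this model, waking a blocked task that sits on a foreign runqueue
as donor dequeues it in the fast path and lets the ordinary slow path
put it back on wake_cpu, which is the "natural" return-migration the
commit message refers to.
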
Credits for the original patch go to:
  Peter Zijlstra (Intel)
  Juri Lelli
  Valentin Schneider
  Connor O'Brien

Signed-off-by: John Stultz
---
v24:
* Reworked proxy_needs_return() so it's less nested as suggested
  by K Prateek
* Switch to using block_task with DEQUEUE_SPECIAL as suggested by
  K Prateek
* Fix edge case to reset wake_cpu if select_task_rq() chooses the
  current rq and we skip set_task_cpu()
v26:
* Handle both blocked and PROXY_WAKING tasks in proxy_needs_return(),
  as suggested by K Prateek
* Try to handle signal edge case in ttwu that K Prateek pointed out
v27:
* Integrate simplifications to proxy_needs_return() suggested by
  K Prateek
* Rework ttwu_runnable() to align with
  ACQUIRE(__task_rq_lock, guard)(p) usage as suggested by Peter
* Major rework suggested by Peter to get rid of proxy_force_return()
  completely, using proxy_deactivate() and allow ttwu to handle all
  the return migration. Lots of helpful improvements suggested by
  K Prateek included as well here.

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Mel Gorman
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Suleiman Souhlal
Cc: kuyo chang
Cc: hupu
Cc: kernel-team@android.com
---
 include/linux/sched.h |   2 +-
 kernel/sched/core.c   | 194 +++++++++++++++++++++---------------------
 2 files changed, 96 insertions(+), 100 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8ec3b6d7d718b..3ae1330801157 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -161,7 +161,7 @@ struct user_event_mm;
  */
 #define is_special_task_state(state)					\
 	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED |	\
-		    TASK_DEAD | TASK_FROZEN))
+		    TASK_DEAD | TASK_WAKING | TASK_FROZEN))
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value)				\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8f1b14a830851..2b5f9f905afe1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3659,6 +3659,44 @@ void update_rq_avg_idle(struct rq *rq)
 	rq->idle_stamp = 0;
 }
 
+#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline struct task_struct *proxy_resched_idle(struct rq *rq);
+
+/*
+ * Checks to see if task p has been proxy-migrated to another rq
+ * and needs to be returned. If so, we deactivate the task here
+ * so that it can be properly woken up on the p->wake_cpu
+ * (or whichever cpu select_task_rq() picks at the bottom of
+ * try_to_wake_up()).
+ */
+static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
+{
+	if (!task_is_blocked(p))
+		return false;
+
+	guard(raw_spinlock)(&p->blocked_lock);
+
+	/* Task is waking up; clear any blocked_on relationship */
+	__clear_task_blocked_on(p, NULL);
+
+	/* If already current, don't need to return migrate */
+	if (task_current(rq, p))
+		return false;
+
+	/* If we're return migrating the rq->donor, switch it out for idle */
+	if (task_current_donor(rq, p))
+		proxy_resched_idle(rq);
+
+	block_task(rq, p, TASK_WAKING);
+	return true;
+}
+#else /* !CONFIG_SCHED_PROXY_EXEC */
+static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
+{
+	return false;
+}
+#endif /* CONFIG_SCHED_PROXY_EXEC */
+
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		 struct rq_flags *rf)
@@ -3723,28 +3761,26 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
  */
 static int ttwu_runnable(struct task_struct *p, int wake_flags)
 {
-	struct rq_flags rf;
-	struct rq *rq;
-	int ret = 0;
+	ACQUIRE(__task_rq_lock, guard)(p);
+	struct rq *rq = guard.rq;
 
-	rq = __task_rq_lock(p, &rf);
-	if (task_on_rq_queued(p)) {
-		update_rq_clock(rq);
-		if (p->se.sched_delayed)
-			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
-		if (!task_on_cpu(rq, p)) {
-			/*
-			 * When on_rq && !on_cpu the task is preempted, see if
-			 * it should preempt the task that is current now.
-			 */
-			wakeup_preempt(rq, p, wake_flags);
-		}
-		ttwu_do_wakeup(p);
-		ret = 1;
-	}
-	__task_rq_unlock(rq, p, &rf);
+	if (!task_on_rq_queued(p))
+		return 0;
 
-	return ret;
+	update_rq_clock(rq);
+	if (p->se.sched_delayed)
+		enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+	if (proxy_needs_return(rq, p))
+		return 0;
+	if (!task_on_cpu(rq, p)) {
+		/*
+		 * When on_rq && !on_cpu the task is preempted, see if
+		 * it should preempt the task that is current now.
+		 */
+		wakeup_preempt(rq, p, wake_flags);
+	}
+	ttwu_do_wakeup(p);
+	return 1;
 }
 
 void sched_ttwu_pending(void *arg)
@@ -4131,6 +4167,8 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 * it disabling IRQs (this allows not taking ->pi_lock).
 		 */
 		WARN_ON_ONCE(p->se.sched_delayed);
+		/* If p is current, we know we can run here, so clear blocked_on */
+		clear_task_blocked_on(p, NULL);
 		if (!ttwu_state_match(p, state, &success))
 			goto out;
 
@@ -4147,6 +4185,15 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	 */
 	scoped_guard (raw_spinlock_irqsave, &p->pi_lock) {
 		smp_mb__after_spinlock();
+
+		/*
+		 * We could get a wakeup from a signal which wouldn't
+		 * mark the blocked_on state as PROXY_WAKING. So
+		 * set the woken task as PROXY_WAKING here so we are
+		 * sure the task will wake and run.
+		 */
+		set_task_blocked_on_waking(p, NULL);
+
 		if (!ttwu_state_match(p, state, &success))
 			break;
 
@@ -4211,6 +4258,14 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		 */
 		WRITE_ONCE(p->__state, TASK_WAKING);
 
+		/*
+		 * We never clear the blocked_on relation on proxy_deactivate.
+		 * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
+		 * when waking up. Since this is a fully blocked, off CPU task
+		 * waking up, it should be safe to clear the blocked_on relation.
+		 */
+		if (task_is_blocked(p))
+			clear_task_blocked_on(p, NULL);
 		/*
 		 * If the owning (remote) CPU is still in the middle of schedule() with
 		 * this task as prev, considering queueing p on the remote CPUs wake_list
@@ -4255,6 +4310,16 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 			wake_flags |= WF_MIGRATED;
 			psi_ttwu_dequeue(p);
 			set_task_cpu(p, cpu);
+		} else if (cpu != p->wake_cpu) {
+			/*
+			 * If we were proxy-migrated to cpu, then
+			 * select_task_rq() picks cpu instead of wake_cpu
+			 * to return to, we won't call set_task_cpu(),
+			 * leaving a stale wake_cpu pointing to where we
+			 * proxy-migrated from. So just fixup wake_cpu here
+			 * if it's not correct
+			 */
+			p->wake_cpu = cpu;
 		}
 
 		ttwu_queue(p, cpu, wake_flags);
@@ -6542,7 +6607,7 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	if (signal_pending_state(task_state, p)) {
 		WRITE_ONCE(p->__state, TASK_RUNNING);
 		*task_state_p = TASK_RUNNING;
-		set_task_blocked_on_waking(p, NULL);
+		clear_task_blocked_on(p, NULL);
 		return false;
 	}
 
@@ -6585,13 +6650,11 @@ static inline struct task_struct *proxy_resched_idle(struct rq *rq)
 	return rq->idle;
 }
 
-static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
+static void proxy_deactivate(struct rq *rq, struct task_struct *donor)
 {
 	unsigned long state = READ_ONCE(donor->__state);
 
-	/* Don't deactivate if the state has been changed to TASK_RUNNING */
-	if (state == TASK_RUNNING)
-		return false;
+	WARN_ON_ONCE(state == TASK_RUNNING);
 	/*
 	 * Because we got donor from pick_next_task(), it is *crucial*
 	 * that we call proxy_resched_idle() before we deactivate it.
@@ -6602,7 +6665,7 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
 	 * need to be changed from next *before* we deactivate.
 	 */
 	proxy_resched_idle(rq);
-	return try_to_block_task(rq, donor, &state, true);
+	block_task(rq, donor, state);
 }
 
 static inline void proxy_release_rq_lock(struct rq *rq, struct rq_flags *rf)
@@ -6676,71 +6739,6 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
 	proxy_reacquire_rq_lock(rq, rf);
 }
 
-static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
-			       struct task_struct *p)
-	__must_hold(__rq_lockp(rq))
-{
-	struct rq *task_rq, *target_rq = NULL;
-	int cpu, wake_flag = WF_TTWU;
-
-	lockdep_assert_rq_held(rq);
-	WARN_ON(p == rq->curr);
-
-	if (p == rq->donor)
-		proxy_resched_idle(rq);
-
-	proxy_release_rq_lock(rq, rf);
-	/*
-	 * We drop the rq lock, and re-grab task_rq_lock to get
-	 * the pi_lock (needed for select_task_rq) as well.
-	 */
-	scoped_guard (task_rq_lock, p) {
-		task_rq = scope.rq;
-
-		/*
-		 * Since we let go of the rq lock, the task may have been
-		 * woken or migrated to another rq before we got the
-		 * task_rq_lock. So re-check we're on the same RQ. If
-		 * not, the task has already been migrated and that CPU
-		 * will handle any futher migrations.
-		 */
-		if (task_rq != rq)
-			break;
-
-		/*
-		 * Similarly, if we've been dequeued, someone else will
-		 * wake us
-		 */
-		if (!task_on_rq_queued(p))
-			break;
-
-		/*
-		 * Since we should only be calling here from __schedule()
-		 * -> find_proxy_task(), no one else should have
-		 * assigned current out from under us. But check and warn
-		 * if we see this, then bail.
-		 */
-		if (task_current(task_rq, p) || task_on_cpu(task_rq, p)) {
-			WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n",
-				  __func__, cpu_of(task_rq),
-				  p->comm, p->pid, p->on_cpu);
-			break;
-		}
-
-		update_rq_clock(task_rq);
-		deactivate_task(task_rq, p, DEQUEUE_NOCLOCK);
-		cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
-		set_task_cpu(p, cpu);
-		target_rq = cpu_rq(cpu);
-		clear_task_blocked_on(p, NULL);
-	}
-
-	if (target_rq)
-		attach_one_task(target_rq, p);
-
-	proxy_reacquire_rq_lock(rq, rf);
-}
-
 /*
  * Find runnable lock owner to proxy for mutex blocked donor
  *
@@ -6776,7 +6774,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 				clear_task_blocked_on(p, PROXY_WAKING);
 				return p;
 			}
-			goto force_return;
+			goto deactivate;
 		}
 
 		/*
@@ -6811,7 +6809,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 				__clear_task_blocked_on(p, NULL);
 				return p;
 			}
-			goto force_return;
+			goto deactivate;
 		}
 
 		if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
@@ -6890,12 +6888,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
 	return owner;
 
 deactivate:
-	if (proxy_deactivate(rq, donor))
-		return NULL;
-	/* If deactivate fails, force return */
-	p = donor;
-force_return:
-	proxy_force_return(rq, rf, p);
+	proxy_deactivate(rq, p);
 	return NULL;
 migrate_task:
 	proxy_migrate_task(rq, rf, p, owner_cpu);
@@ -7043,6 +7036,9 @@ static void __sched notrace __schedule(int sched_mode)
 	if (sched_proxy_exec()) {
 		struct task_struct *prev_donor = rq->donor;
 
+		if (!prev_state && prev->blocked_on)
+			clear_task_blocked_on(prev, NULL);
+
 		rq_set_donor(rq, next);
 		if (unlikely(next->blocked_on)) {
 			next = find_proxy_task(rq, next, &rf);
-- 
2.53.0.1213.gd9a14994de-goog