From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Joel Fernandes <joelaf@google.com>,
Qais Yousef <qyousef@google.com>, Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Youssef Esmat <youssefesmat@google.com>,
Mel Gorman <mgorman@suse.de>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
kernel-team@android.com,
Valentin Schneider <valentin.schneider@arm.com>,
"Connor O'Brien" <connoro@google.com>,
John Stultz <jstultz@google.com>
Subject: [PATCH v7 13/23] sched: Start blocked_on chain processing in find_proxy_task()
Date: Tue, 19 Dec 2023 16:18:24 -0800
Message-ID: <20231220001856.3710363-14-jstultz@google.com>
In-Reply-To: <20231220001856.3710363-1-jstultz@google.com>
From: Peter Zijlstra <peterz@infradead.org>
Start to flesh out the real find_proxy_task() implementation, but
avoid the migration cases for now; in those cases just deactivate
the selected task and pick again.

To ensure the selected task and other blocked tasks in the chain
aren't migrated away while we're running the proxy, this patch also
tweaks the CFS logic to avoid migrating selected or mutex-blocked
tasks.
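For illustration only (not part of the patch), the chain walk that
find_proxy_task() performs can be modeled in plain C roughly as
below. The task/mutex structures and the helper name here are
simplified stand-ins, not the kernel's real types, and the locking,
self-owner, and migrating-owner cases are omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for task_struct / mutex (illustrative only). */
struct mutex;
struct task {
	struct mutex *blocked_on;	/* mutex this task waits on, or NULL */
	int cpu;			/* CPU the task is queued on */
	int on_rq;			/* task is runnable on its runqueue */
};
struct mutex {
	struct task *owner;		/* current lock holder, or NULL */
};

/*
 * Follow the blocked-on relation (task->blocked_on->owner->...) from
 * @next and return the runnable owner at the end of the chain. Return
 * NULL where the real code punts and picks again: owner on another
 * CPU (migration cases) or owner itself not runnable (blocked owner).
 * If a mutex in the chain has no owner, the waiter itself can run.
 */
static struct task *find_chain_owner(struct task *next, int this_cpu)
{
	struct task *p = next, *owner;

	while (p->blocked_on) {
		owner = p->blocked_on->owner;
		if (!owner)
			return p;		/* lock released: p can run */
		if (owner->cpu != this_cpu)
			return NULL;		/* migration cases punted */
		if (!owner->on_rq)
			return NULL;		/* blocked owners punted */
		p = owner;
	}
	return p;
}
```

With a chain A -> m1(owned by B) -> m2(owned by C), where C is
runnable on this CPU, the walk returns C as the execution context;
if C sits on another CPU, it bails out so __schedule() can pick again.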
Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: This change was split out from the larger proxy patch]
Signed-off-by: John Stultz <jstultz@google.com>
---
v5:
* Split this out from larger proxy patch
v7:
* Minor refactoring of core find_proxy_task() function
* Minor spelling and corrections suggested by Metin Kaya
* Dropped an added BUG_ON that was frequently tripped
* Minor commit message tweaks from Metin Kaya
---
kernel/sched/core.c | 154 +++++++++++++++++++++++++++++++++++++-------
kernel/sched/fair.c | 9 ++-
2 files changed, 137 insertions(+), 26 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f6bf3b62194c..42e25bbdfe6b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -94,6 +94,7 @@
#include "../workqueue_internal.h"
#include "../../io_uring/io-wq.h"
#include "../smpboot.h"
+#include "../locking/mutex.h"
EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpu);
EXPORT_TRACEPOINT_SYMBOL_GPL(ipi_send_cpumask);
@@ -6609,6 +6610,15 @@ static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p,
#ifdef CONFIG_SCHED_PROXY_EXEC
+static inline struct task_struct *
+proxy_resched_idle(struct rq *rq, struct task_struct *next)
+{
+ put_prev_task(rq, next);
+ rq_set_selected(rq, rq->idle);
+ set_tsk_need_resched(rq->idle);
+ return rq->idle;
+}
+
static bool proxy_deactivate(struct rq *rq, struct task_struct *next)
{
unsigned long state = READ_ONCE(next->__state);
@@ -6618,48 +6628,138 @@ static bool proxy_deactivate(struct rq *rq, struct task_struct *next)
return false;
if (!try_to_deactivate_task(rq, next, state, true))
return false;
- put_prev_task(rq, next);
- rq_set_selected(rq, rq->idle);
- resched_curr(rq);
+ proxy_resched_idle(rq, next);
return true;
}
/*
- * Initial simple proxy that just returns the task if it's waking
- * or deactivates the blocked task so we can pick something that
- * isn't blocked.
+ * Find who @next (currently blocked on a mutex) can proxy for.
+ *
+ * Follow the blocked-on relation:
+ * task->blocked_on -> mutex->owner -> task...
+ *
+ * Lock order:
+ *
+ * p->pi_lock
+ * rq->lock
+ * mutex->wait_lock
+ * p->blocked_lock
+ *
+ * Returns the task that is going to be used as execution context (the one
+ * that is actually going to be put to run on cpu_of(rq)).
*/
static struct task_struct *
find_proxy_task(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
{
+ struct task_struct *owner = NULL;
struct task_struct *ret = NULL;
- struct task_struct *p = next;
+ struct task_struct *p;
struct mutex *mutex;
+ int this_cpu = cpu_of(rq);
- mutex = p->blocked_on;
- /* Something changed in the chain, so pick again */
- if (!mutex)
- return NULL;
/*
- * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
- * and ensure @owner sticks around.
+ * Follow blocked_on chain.
+ *
+ * TODO: deadlock detection
*/
- raw_spin_lock(&mutex->wait_lock);
- raw_spin_lock(&p->blocked_lock);
+ for (p = next; task_is_blocked(p); p = owner) {
+ mutex = p->blocked_on;
+ /* Something changed in the chain, so pick again */
+ if (!mutex)
+ return NULL;
- /* Check again that p is blocked with blocked_lock held */
- if (!task_is_blocked(p) || mutex != p->blocked_on) {
/*
- * Something changed in the blocked_on chain and
- * we don't know if only at this level. So, let's
- * just bail out completely and let __schedule
- * figure things out (pick_again loop).
+ * By taking mutex->wait_lock we hold off concurrent mutex_unlock()
+ * and ensure @owner sticks around.
*/
- goto out;
+ raw_spin_lock(&mutex->wait_lock);
+ raw_spin_lock(&p->blocked_lock);
+
+ /* Check again that p is blocked with blocked_lock held */
+ if (mutex != p->blocked_on) {
+ /*
+ * Something changed in the blocked_on chain and
+ * we don't know if only at this level. So, let's
+ * just bail out completely and let __schedule
+ * figure things out (pick_again loop).
+ */
+ goto out;
+ }
+
+ owner = __mutex_owner(mutex);
+ if (!owner) {
+ ret = p;
+ goto out;
+ }
+
+ if (task_cpu(owner) != this_cpu) {
+ /* XXX Don't handle migrations yet */
+ if (!proxy_deactivate(rq, next))
+ ret = next;
+ goto out;
+ }
+
+ if (task_on_rq_migrating(owner)) {
+ /*
+ * One of the chain of mutex owners is currently migrating to this
+ * CPU, but has not yet been enqueued because we are holding the
+ * rq lock. As a simple solution, just schedule rq->idle to give
+ * the migration a chance to complete. Much like the migrate_task
+ * case we should end up back in proxy(), this time hopefully with
+ * all relevant tasks already enqueued.
+ */
+ raw_spin_unlock(&p->blocked_lock);
+ raw_spin_unlock(&mutex->wait_lock);
+ return proxy_resched_idle(rq, next);
+ }
+
+ if (!owner->on_rq) {
+ /* XXX Don't handle blocked owners yet */
+ if (!proxy_deactivate(rq, next))
+ ret = next;
+ goto out;
+ }
+
+ if (owner == p) {
+ /*
+ * It's possible we interleave with mutex_unlock like:
+ *
+ * lock(&rq->lock);
+ * find_proxy_task()
+ * mutex_unlock()
+ * lock(&wait_lock);
+ * next(owner) = current->blocked_donor;
+ * unlock(&wait_lock);
+ *
+ * wake_up_q();
+ * ...
+ * ttwu_runnable()
+ * __task_rq_lock()
+ * lock(&wait_lock);
+ * owner == p
+ *
+ * Which leaves us to finish the ttwu_runnable() and make it go.
+ *
+ * So schedule rq->idle so that ttwu_runnable can get the rq lock
+ * and mark owner as running.
+ */
+ raw_spin_unlock(&p->blocked_lock);
+ raw_spin_unlock(&mutex->wait_lock);
+ return proxy_resched_idle(rq, next);
+ }
+
+ /*
+ * OK, now we're absolutely sure @owner is not blocked _and_
+ * on this rq, therefore holding @rq->lock is sufficient to
+ * guarantee its existence, as per ttwu_remote().
+ */
+ raw_spin_unlock(&p->blocked_lock);
+ raw_spin_unlock(&mutex->wait_lock);
}
- if (!proxy_deactivate(rq, next))
- ret = p;
+ WARN_ON_ONCE(owner && !owner->on_rq);
+ return owner;
+
out:
raw_spin_unlock(&p->blocked_lock);
raw_spin_unlock(&mutex->wait_lock);
@@ -6738,6 +6838,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
struct rq_flags rf;
struct rq *rq;
int cpu;
+ bool preserve_need_resched = false;
cpu = smp_processor_id();
rq = cpu_rq(cpu);
@@ -6798,9 +6899,12 @@ static void __sched notrace __schedule(unsigned int sched_mode)
rq_repin_lock(rq, &rf);
goto pick_again;
}
+ if (next == rq->idle && prev == rq->idle)
+ preserve_need_resched = true;
}
- clear_tsk_need_resched(prev);
+ if (!preserve_need_resched)
+ clear_tsk_need_resched(prev);
clear_preempt_need_resched();
#ifdef CONFIG_SCHED_DEBUG
rq->last_seen_need_resched_ns = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 085941db5bf1..954b41e5b7df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8905,6 +8905,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
if (kthread_is_per_cpu(p))
return 0;
+ if (task_is_blocked(p))
+ return 0;
+
if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
int cpu;
@@ -8941,7 +8944,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
/* Record that we found at least one task that could run on dst_cpu */
env->flags &= ~LBF_ALL_PINNED;
- if (task_on_cpu(env->src_rq, p)) {
+ if (task_on_cpu(env->src_rq, p) ||
+ task_current_selected(env->src_rq, p)) {
schedstat_inc(p->stats.nr_failed_migrations_running);
return 0;
}
@@ -8980,6 +8984,9 @@ static void detach_task(struct task_struct *p, struct lb_env *env)
{
lockdep_assert_rq_held(env->src_rq);
+ BUG_ON(task_current(env->src_rq, p));
+ BUG_ON(task_current_selected(env->src_rq, p));
+
deactivate_task(env->src_rq, p, DEQUEUE_NOCLOCK);
set_task_cpu(p, env->dst_cpu);
}
--
2.43.0.472.g3155946c3a-goog