From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: linux-kernel@vger.kernel.org
Cc: Ben Segall <bsegall@google.com>,
Daniel Bristot de Oliveira <bristot@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ingo Molnar <mingo@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Valentin Schneider <vschneid@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH] sched: Consider task_struct::saved_state in wait_task_inactive().
Date: Fri, 17 Feb 2023 15:53:02 +0100 [thread overview]
Message-ID: <Y++UzubyNavLKFDP@linutronix.de> (raw)
wait_task_inactive() waits for thread to unschedule in a certain task state.
On PREEMPT_RT that state may be stored in task_struct::saved_state while the
thread, that is being waited for, blocks on a sleeping lock and
task_struct::__state is set to TASK_RTLOCK_WAIT.
It is not possible to check only for TASK_RTLOCK_WAIT to be sure that the task
is blocked on a sleeping lock because during wake up (after the sleeping lock
has been acquired) the task state is set TASK_RUNNING. After the task in on CPU
and acquired the pi_lock it will reset the state accordingly but until then
TASK_RUNNING will be observed (with the desired state is saved in saved_state).
Check also for task_struct::saved_state if the desired match was not found in
task_struct::__state on PREEMPT_RT. If the state was found in saved_state, wait
until the task is idle and state is visible in task_struct::__state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
---
Repost of https://lore.kernel.org/Yt%2FpQAFQ1xKNK0RY@linutronix.de
kernel/sched/core.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 76 insertions(+), 5 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3266,6 +3266,76 @@ int migrate_swap(struct task_struct *cur
}
#endif /* CONFIG_NUMA_BALANCING */
+#ifdef CONFIG_PREEMPT_RT
+
+/*
+ * Consider:
+ *
+ * set_special_state(X);
+ *
+ * do_things()
+ * // Somewhere in there is an rtlock that can be contended:
+ * current_save_and_set_rtlock_wait_state();
+ * [...]
+ * schedule_rtlock(); (A)
+ * [...]
+ * current_restore_rtlock_saved_state();
+ *
+ * schedule(); (B)
+ *
+ * If p->saved_state is anything else than TASK_RUNNING, then p blocked on an
+ * rtlock (A) *before* voluntarily calling into schedule() (B) after setting its
+ * state to X. For things like ptrace (X=TASK_TRACED), the task could have more
+ * work to do upon acquiring the lock in do_things() before whoever called
+ * wait_task_inactive() should return. IOW, we have to wait for:
+ *
+ * p.saved_state = TASK_RUNNING
+ * p.__state = X
+ *
+ * which implies the task isn't blocked on an RT lock and got to schedule() (B).
+ *
+ * Also see comments in ttwu_state_match().
+ */
+
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+ unsigned long flags;
+ bool mismatch;
+
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
+ if (READ_ONCE(p->__state) & match_state)
+ mismatch = false;
+ else if (READ_ONCE(p->saved_state) & match_state)
+ mismatch = false;
+ else
+ mismatch = true;
+
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ return mismatch;
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+ bool *wait)
+{
+ if (READ_ONCE(p->__state) & match_state)
+ return true;
+ if (READ_ONCE(p->saved_state) & match_state) {
+ *wait = true;
+ return true;
+ }
+ return false;
+}
+#else
+static __always_inline bool state_mismatch(struct task_struct *p, unsigned int match_state)
+{
+ return !(READ_ONCE(p->__state) & match_state);
+}
+static __always_inline bool state_match(struct task_struct *p, unsigned int match_state,
+ bool *wait)
+{
+ return (READ_ONCE(p->__state) & match_state);
+}
+#endif
+
/*
* wait_task_inactive - wait for a thread to unschedule.
*
@@ -3284,7 +3354,7 @@ int migrate_swap(struct task_struct *cur
*/
unsigned long wait_task_inactive(struct task_struct *p, unsigned int match_state)
{
- int running, queued;
+ bool running, wait;
struct rq_flags rf;
unsigned long ncsw;
struct rq *rq;
@@ -3310,7 +3380,7 @@ unsigned long wait_task_inactive(struct
* is actually now running somewhere else!
*/
while (task_on_cpu(rq, p)) {
- if (!(READ_ONCE(p->__state) & match_state))
+ if (state_mismatch(p, match_state))
return 0;
cpu_relax();
}
@@ -3323,9 +3393,10 @@ unsigned long wait_task_inactive(struct
rq = task_rq_lock(p, &rf);
trace_sched_wait_task(p);
running = task_on_cpu(rq, p);
- queued = task_on_rq_queued(p);
+ wait = task_on_rq_queued(p);
ncsw = 0;
- if (READ_ONCE(p->__state) & match_state)
+
+ if (state_match(p, match_state, &wait))
ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
task_rq_unlock(rq, p, &rf);
@@ -3355,7 +3426,7 @@ unsigned long wait_task_inactive(struct
* running right now), it's preempted, and we should
* yield - it could be a while.
*/
- if (unlikely(queued)) {
+ if (unlikely(wait)) {
ktime_t to = NSEC_PER_SEC / HZ;
set_current_state(TASK_UNINTERRUPTIBLE);
next reply other threads:[~2023-02-17 14:53 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-17 14:53 Sebastian Andrzej Siewior [this message]
2023-02-22 13:36 ` [PATCH] sched: Consider task_struct::saved_state in wait_task_inactive() Peter Zijlstra
2023-02-23 16:53 ` Sebastian Andrzej Siewior
2023-03-29 13:33 ` Sebastian Andrzej Siewior
2023-05-24 14:59 ` Sebastian Andrzej Siewior
2023-05-25 16:52 ` Peter Zijlstra
2023-05-26 8:05 ` Peter Zijlstra
2023-05-26 15:13 ` Sebastian Andrzej Siewior
2023-06-01 9:12 ` Peter Zijlstra
2023-06-02 8:25 ` Peter Zijlstra
2023-06-02 10:37 ` Peter Zijlstra
2023-06-02 10:49 ` Sebastian Andrzej Siewior
2023-06-02 11:18 ` Peter Zijlstra
2023-06-05 16:15 ` Sebastian Andrzej Siewior
2023-06-05 19:16 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2023-05-26 8:47 ` [PATCH] " Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y++UzubyNavLKFDP@linutronix.de \
--to=bigeasy@linutronix.de \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox