From: Peter Zijlstra <peterz@infradead.org>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
linux-kernel@vger.kernel.org, vincent.guittot@linaro.org,
mgorman@suse.de, Oleg Nesterov <oleg@redhat.com>,
david@fromorbit.com
Subject: Re: [RFC][PATCH] sched: Better document ttwu()
Date: Fri, 3 Jul 2020 10:30:12 +0200
Message-ID: <20200703083012.GU4800@hirez.programming.kicks-ass.net>
In-Reply-To: <jhjd05d92y3.mognet@arm.com>
On Thu, Jul 02, 2020 at 07:39:16PM +0100, Valentin Schneider wrote:
> > + * Special state:
> > + *
> > + * System-calls and anything external will use task_rq_lock() which acquires
> > + * both p->pi_lock and rq->lock. As a consequence the state they change is stable
> > + * while holding either lock:
> > + *
> > + * - sched_setaffinity(): p->cpus_ptr
> > + * - set_user_nice(): p->se.load, p->static_prio
> > + * - __sched_setscheduler(): p->sched_class, p->policy, p->*prio, p->se.load,
> > + * p->dl.dl_{runtime, deadline, period, flags, bw, density}
>
> Only extra thing that comes to mind is p->uclamp*; dunno how exhaustive you
> want this list to be.
Indeed, I seem to have missed that one.
> > + * - sched_setnuma(): p->numa_preferred_nid
> > + * - sched_move_task()/
> > + * cpu_cgroup_fork(): p->sched_task_group
> > + *
> > + * task_cpu(p): is changed by set_task_cpu(), the rules are:
> > + *
> > + * - Don't call set_task_cpu() on a blocked task:
> > + *
> > + * We don't care what CPU we're not running on, this simplifies hotplug,
> > + * the CPU assignment of blocked tasks isn't required to be valid.
> > + *
>
> That's more of a good practice rather than a hard rule, right? We do that
> with proxy execution (the whole migrate to owner's rq thing), at least in
> its current shape.
Yeah, but none of that is upstream yet. That said, the distinguishing
feature there is that we create a class of blocked tasks that will still
be 'runnable', and as such we'll care about their placement.
> > + * - for try_to_wake_up(), called under p->pi_lock:
> > + *
> > + * This allows try_to_wake_up() to only take one rq->lock, see its comment.
> > + *
> > + * - for migration called under rq->lock:
> > + * [ see task_on_rq_migrating() in task_rq_lock() ]
> > + *
> > + * o move_queued_task()
> > + * o __migrate_swap_task()
>
> Isn't that one under double_rq_lock()?
Indeed, /me moves.
> > + * o detach_task()
> > + *
> > + * - for migration called under double_rq_lock():
> > + *
> > + * o push_rt_task() / pull_rt_task()
> > + * o push_dl_task() / pull_dl_task()
> > + * o dl_task_offline_migration()
> > + *
> > + */
> > /*
> > - * Called in case the task @p isn't fully descheduled from its runqueue,
> > - * in this case we must do a remote wakeup. Its a 'light' wakeup though,
> > - * since all we need to do is flip p->state to TASK_RUNNING, since
> > - * the task is still ->on_rq.
> > + * Consider @p being inside a wait loop:
> > + *
> > + * for (;;) {
> > + * set_current_state(TASK_UNINTERRUPTIBLE);
> > + *
> > + * if (CONDITION)
> > + * break;
>
> For some reason the alignment is off in my mail view, but looks okay once
> applied.
I'll go eradicate tabstops :-)
> > + *
> > + * schedule();
> > + * }
> > + * __set_current_state(TASK_RUNNING);
> > + *
> > + * between set_current_state() and schedule(). In this case @p is still
> > + * runnable, so all that needs doing is change p->state back to TASK_RUNNING in
> > + * an atomic manner.
> > + *
>
> Sorry if I'm being dense; don't you mean "running" here? If it stops being
> current inbetween set_current_state() and schedule(), __schedule() will
> deactivate() it, so AFAICT it can only be either running or deactivated.
Runnable, the task could be preempted. At this point we don't care if it
is actually running or not.
> > + * By taking task_rq(p)->lock we serialize against schedule(), if @p->on_rq
> > + * then schedule() must still happen and p->state can be changed to
> > + * TASK_RUNNING. Otherwise we lost the race, schedule() has happened, and we
> > + * need to do a full wakeup with enqueue.
> > + *
> > + * Returns: %true when the wakeup is done,
> > + * %false otherwise.
> > */
> > -static int ttwu_remote(struct task_struct *p, int wake_flags)
> > +static int ttwu_runnable(struct task_struct *p, int wake_flags)
> > {
> > struct rq_flags rf;
> > struct rq *rq;
> > + * Tries really hard to only take one task_rq(p)->lock for performance.
> > + * Takes rq->lock in:
> > + * - ttwu_runnable() -- old rq, unavoidable, see comment there;
> > + * - ttwu_queue() -- new rq, for enqueue of the task;
> > + * - psi_ttwu_dequeue() -- much sadness :-( accounting will kill us.
> > + *
> > + * As a concequence we race really badly with just about everything. See the
>
> s/concequence/consequence/
ta!
> > @@ -3134,8 +3274,12 @@ static inline void prepare_task(struct task_struct *next)
> > /*
> > * Claim the task as running, we do this before switching to it
> > * such that any running task will have this set.
> > + *
> > + * __schedule()'s rq->lock and smp_mb__after_spin_lock() orders this
> > + * store against prior state change of @next, also see
> > + * try_to_wake_up(), specifically smp_load_acquire(&p->on_cpu).
>
> smp_*cond*_load_acquire(&p->on_cpu, <blah>)
Both, but yeah.. arguably the cond one is the more important one.
>
> > */
> > - next->on_cpu = 1;
> > + WRITE_ONCE(next->on_cpu, 1);
> > #endif
> > }
> > +/*
> > + * Lockdep annotation that avoid accidental unlock; any
> > + * raw_spin_unlock(&rq->lock) without preceding rq_unpin_lock() with the
> > + * correct cookie will result in a WARN.
> > + *
>
> ISTR that being described (by yourself?) as a "sticky/continuous
> lockdep_assert_held()", which I think gets the point across.
Ah indeed! Clever of my past self :-) I'll go reword it.
2020-07-02 12:52 [RFC][PATCH] sched: Better document ttwu() Peter Zijlstra
2020-07-02 13:13 ` Phil Auld
2020-07-02 15:23 ` Peter Zijlstra
2020-07-02 18:39 ` Valentin Schneider
2020-07-03 8:30 ` Peter Zijlstra [this message]
2020-07-03 11:36 ` Peter Zijlstra
2020-07-03 10:12 ` Dietmar Eggemann
2020-07-03 12:39 ` Vincent Guittot
2020-07-22 9:12 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra