From: Kirill Tkhai <ktkhai@parallels.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: <linux-kernel@vger.kernel.org>,
Mike Galbraith <umgwanakikbuti@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Tim Chen <tim.c.chen@linux.intel.com>,
Nicolas Pitre <nicolas.pitre@linaro.org>,
Ingo Molnar <mingo@kernel.org>, Paul Turner <pjt@google.com>,
<tkhai@yandex.ru>, Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH 2/5] sched: Teach scheduler to understand ONRQ_MIGRATING state
Date: Tue, 22 Jul 2014 16:24:00 +0400 [thread overview]
Message-ID: <1406031840.3526.60.camel@tkhai> (raw)
In-Reply-To: <20140722114542.GG20603@laptop.programming.kicks-ass.net>
В Вт, 22/07/2014 в 13:45 +0200, Peter Zijlstra пишет:
> On Tue, Jul 22, 2014 at 03:30:16PM +0400, Kirill Tkhai wrote:
> >
> > This is new on_rq state for the cases when task is migrating
> > from one src_rq to another dst_rq, and locks of the both RQs
> > are unlocked.
> >
> > We will use the state this way:
> >
> > raw_spin_lock(&src_rq->lock);
> > dequeue_task(src_rq, p, 0);
> > p->on_rq = ONRQ_MIGRATING;
> > set_task_cpu(p, dst_cpu);
> > raw_spin_unlock(&src_rq->lock);
> >
> > raw_spin_lock(&dst_rq->lock);
> > p->on_rq = ONRQ_QUEUED;
> > enqueue_task(dst_rq, p, 0);
> > raw_spin_unlock(&dst_rq->lock);
> >
> > The profit is that double_rq_lock() is not needed now,
> > and this may reduce the latencies in some situations.
> >
> > The logic of try_to_wake_up() remained the same as it
> > was. Its behaviour changes in a small subset of cases
> > (when preempted task in ~TASK_RUNNING state is queued
> > on rq and we are migrating it to another).
>
> more details is better ;-) Also, I think Oleg enjoys these kind of
> things, so I've added him to the CC.
try_to_wake_up() wakes tasks in the particular states,
no one calls it with TASK_RUNNING argument. So, our
logic will have a deal with tasks in !TASK_RUNNING state.
If they are on rq and !TASK_RUNNING, this means, they were
preempted using preempt_schedule{,_irq}. If they were preempted
in !TASK_RUNNING state, this was in one of the cases like:
set_current_state(TASK_INTERRUPTIBLE);
(actions)
schedule();
And someone is migrating this task.
Really small subset of cases :)
> A few questions, haven't really thought about things yet.
>
> > @@ -1491,10 +1491,14 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
> > static void
> > ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
> > {
> > - check_preempt_curr(rq, p, wake_flags);
> > trace_sched_wakeup(p, true);
> >
> > p->state = TASK_RUNNING;
> > +
> > + if (!task_queued(p))
> > + return;
>
> How can this happen? we're in the middle of a wakeup, we're just added
> the task to the rq and are still holding the appropriate rq->lock.
try_to_wake_up()->ttwu_remote()->ttwu_do_wakeup():
task is migrating at the moment, and the only thing we do is we change
its state on TASK_RUNNING. It will be queued with new state (TASK_RUNNING) by the code, which is migrating it.
It looks like this situation in general is not very often. It's only
for the cases when we're migrating a task, which was preempted with
!TASK_RUNNING state.
> > @@ -4623,9 +4629,14 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
> > struct rq *rq;
> > unsigned int dest_cpu;
> > int ret = 0;
> > -
> > +again:
> > rq = task_rq_lock(p, &flags);
> >
> > + if (unlikely(p->on_rq) == ONRQ_MIGRATING) {
> > + task_rq_unlock(rq, p, &flags);
> > + goto again;
> > + }
> > +
> > if (cpumask_equal(&p->cpus_allowed, new_mask))
> > goto out;
> >
>
> That looks like a non-deterministic spin loop, 'waiting' for the
> migration to finish. Not particularly nice and something I think we
> should avoid for it has bad (TM) worst case behaviour.
>
> Also, why only this site and not all task_rq_lock() sites?
All other places tests for task_queued(p) under rq's lock. I watched
all of them and did not find a place who needs that. For example,
rt_mutex_setprio() enqueues only if task was really queued before.
Patch [1/5] made all preparation for that. With a hope, nothing was
skipped by /me.
About set_cpus_allowed_ptr(). We could return -EAGAIN if a task is
migrating, but set_cpus_allowed_ptr() must not fail on kernel threads.
We'd would miss something important this way, it's softirq affinity
for example.
Going to again label looks like manual spinlock for me. We used to
spin there before. The time of held lock was similar, and we also
had competition with other rq's lock users. The lock just was locked
permanently, and we didn't have overhead with repeating spin_{lock,unlock}
here.
I don't see what to do instead..
Thanks,
Kirill
next prev parent reply other threads:[~2014-07-22 12:24 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20140722102425.29682.24086.stgit@tkhai>
2014-07-22 11:30 ` [PATCH 1/5] sched: Wrapper for checking task_struct's .on_rq Kirill Tkhai
2014-07-22 11:30 ` [PATCH 2/5] sched: Teach scheduler to understand ONRQ_MIGRATING state Kirill Tkhai
2014-07-22 11:45 ` Peter Zijlstra
2014-07-22 12:24 ` Kirill Tkhai [this message]
2014-07-22 12:25 ` Steven Rostedt
2014-07-22 13:20 ` Kirill Tkhai
2014-07-24 19:03 ` Oleg Nesterov
2014-07-25 7:11 ` Kirill Tkhai
2014-07-22 11:30 ` [PATCH 3/5] sched: Remove double_rq_lock() from __migrate_task() Kirill Tkhai
2014-07-22 11:30 ` [PATCH 4/5] sched/fair: Remove double_lock_balance() from active_load_balance_cpu_stop() Kirill Tkhai
2014-07-25 0:04 ` Tim Chen
2014-07-25 7:05 ` Kirill Tkhai
2014-07-22 11:31 ` [PATCH 5/5] sched/fair: Remove double_lock_balance() from load_balance() Kirill Tkhai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1406031840.3526.60.camel@tkhai \
--to=ktkhai@parallels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=nicolas.pitre@linaro.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=rostedt@goodmis.org \
--cc=tim.c.chen@linux.intel.com \
--cc=tkhai@yandex.ru \
--cc=umgwanakikbuti@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox