From: Michael wang <wangyun@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: sched: hang in migrate_swap
Date: Tue, 25 Feb 2014 11:01:01 +0800
Message-ID: <530C076D.1050603@linux.vnet.ibm.com>
In-Reply-To: <20140224121218.GR15586@twins.programming.kicks-ass.net>

On 02/24/2014 08:12 PM, Peter Zijlstra wrote:
[snip]
>>
>> ...what about moving idle_balance() back to its old position?
>
> I've always hated that, idle_balance() is very much a fair policy thing
> and shouldn't live in the core code.
>
>> Logically, pull_rt_task() could run after idle_balance() if there are
>> still no FAIR or DL tasks, and then go into the pick loop. That might
>> make things cleaner and clearer; should we give it a try?
>
> So the reason pull_{rt,dl}_task() is before idle_balance() is that we
> don't want to add the execution latency of idle_balance() to the rt/dl
> task pulling.

Yeah, that makes sense. Just wondering: since RT also has its own
balancing, maybe we could add a new callback for each class and call it
from the old position?

The new idle_balance() could look like:

	void idle_balance(struct rq *rq)
	{
		const struct sched_class *class;

		for_each_class(class)
			if (class->idle_balance(rq))
				break;
	}

>
> Anyway, the below seems to work; it avoids playing tricks with the idle
> thread and instead uses a magic constant.
>
> The comparison should be faster too; seeing how we avoid dereferencing
> p->sched_class.

Great, this idea had crossed my mind too, but you achieved it without
adding a new parameter. Let's ignore my wondering above :)
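
To illustrate the comparison described above, here is a minimal user-space
sketch, not kernel code. RETRY_TASK is defined as in the patch below;
struct task, enum pick_result, classify() and task_prio() are hypothetical
names made up for this example. The point is only that the result of a pick
can be sorted into "nothing", "retry" or "real task" by comparing pointer
values, without ever touching the memory behind the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct task_struct (assumption). */
struct task {
	int prio;
};

/* Same definition as in the patch: an address no real task can have. */
#define RETRY_TASK ((void *)-1UL)

/* Hypothetical helper type, only for this sketch. */
enum pick_result { PICK_NONE, PICK_RETRY, PICK_TASK };

/* Classify a pick result by pointer value only; p is never dereferenced. */
static enum pick_result classify(const struct task *p)
{
	if (!p)
		return PICK_NONE;
	if (p == RETRY_TASK)
		return PICK_RETRY;
	return PICK_TASK;
}

/* Dereference only after both sentinel values have been ruled out. */
static int task_prio(const struct task *p)
{
	assert(classify(p) == PICK_TASK);
	return p->prio;
}
```

A caller would check classify() (or the two comparisons directly) before
using the returned pointer, which is what the core loop in the patch does
with "if (p)" followed by "p == RETRY_TASK".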

Regards,
Michael Wang
>
> ---
> Subject: sched: Guarantee task priority in pick_next_task()
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri Feb 14 12:25:08 CET 2014
>
> Michael spotted that the idle_balance() push down created a task
> priority problem.
>
> Previously, when we called idle_balance() before pick_next_task() it
> wasn't a problem when -- because of the rq->lock droppage -- an rt/dl
> task slipped in.
>
> Similarly for pre_schedule(), rt pre-schedule could have a dl task
> slip in.
>
> But by pulling it into the pick_next_task() loop, we'll not try a
> higher task priority again.
>
> Cure this by creating a re-start condition in pick_next_task(); and
> triggering this from pick_next_task_{rt,fair}().
>
> Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()")
> Cc: Juri Lelli <juri.lelli@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Reported-by: Michael Wang <wangyun@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> ---
> kernel/sched/core.c | 12 ++++++++----
> kernel/sched/fair.c | 13 ++++++++++++-
> kernel/sched/rt.c | 10 +++++++++-
> kernel/sched/sched.h | 5 +++++
> 4 files changed, 34 insertions(+), 6 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2586,24 +2586,28 @@ static inline void schedule_debug(struct
> static inline struct task_struct *
> pick_next_task(struct rq *rq, struct task_struct *prev)
> {
> - const struct sched_class *class;
> + const struct sched_class *class = &fair_sched_class;
> struct task_struct *p;
>
> /*
> * Optimization: we know that if all tasks are in
> * the fair class we can call that function directly:
> */
> - if (likely(prev->sched_class == &fair_sched_class &&
> + if (likely(prev->sched_class == class &&
> rq->nr_running == rq->cfs.h_nr_running)) {
> p = fair_sched_class.pick_next_task(rq, prev);
> - if (likely(p))
> + if (likely(p && p != RETRY_TASK))
> return p;
> }
>
> +again:
> for_each_class(class) {
> p = class->pick_next_task(rq, prev);
> - if (p)
> + if (p) {
> + if (unlikely(p == RETRY_TASK))
> + goto again;
> return p;
> + }
> }
>
> BUG(); /* the idle class will always have a runnable task */
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4687,6 +4687,7 @@ pick_next_task_fair(struct rq *rq, struc
> struct cfs_rq *cfs_rq = &rq->cfs;
> struct sched_entity *se;
> struct task_struct *p;
> + int new_tasks;
>
> again:
> #ifdef CONFIG_FAIR_GROUP_SCHED
> @@ -4785,7 +4786,17 @@ pick_next_task_fair(struct rq *rq, struc
> return p;
>
> idle:
> - if (idle_balance(rq)) /* drops rq->lock */
> + /*
> + * Because idle_balance() releases (and re-acquires) rq->lock, it is
> + * possible for any higher priority task to appear. In that case we
> + * must re-start the pick_next_entity() loop.
> + */
> + new_tasks = idle_balance(rq);
> +
> + if (rq->nr_running != rq->cfs.h_nr_running)
> + return RETRY_TASK;
> +
> + if (new_tasks)
> goto again;
>
> return NULL;
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1360,8 +1360,16 @@ pick_next_task_rt(struct rq *rq, struct
> struct task_struct *p;
> struct rt_rq *rt_rq = &rq->rt;
>
> - if (need_pull_rt_task(rq, prev))
> + if (need_pull_rt_task(rq, prev)) {
> pull_rt_task(rq);
> + /*
> + * pull_rt_task() can drop (and re-acquire) rq->lock; this
> + * means a dl task can slip in, in which case we need to
> + * re-start task selection.
> + */
> + if (unlikely(rq->dl.dl_nr_running))
> + return RETRY_TASK;
> + }
>
> if (!rt_rq->rt_nr_running)
> return NULL;
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1090,6 +1090,8 @@ static const u32 prio_to_wmult[40] = {
>
> #define DEQUEUE_SLEEP 1
>
> +#define RETRY_TASK ((void *)-1UL)
> +
> struct sched_class {
> const struct sched_class *next;
>
> @@ -1104,6 +1106,9 @@ struct sched_class {
> * It is the responsibility of the pick_next_task() method that will
> * return the next task to call put_prev_task() on the @prev task or
> * something equivalent.
> + *
> + * May return RETRY_TASK when it finds a higher prio class has runnable
> + * tasks.
> */
> struct task_struct * (*pick_next_task) (struct rq *rq,
> struct task_struct *prev);
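
The restart condition in the patch above can be modelled in user space
roughly as follows. This is a simplified sketch under stated assumptions,
not the kernel implementation: struct task, struct rq and its fields
(including pending_rt, which stands in for a wakeup that happens while
rq->lock is dropped), the pick_*() helpers and the classes[] array are all
hypothetical. Only RETRY_TASK and the shape of the "again:" loop are taken
from the patch. The real fair class checks
rq->nr_running != rq->cfs.h_nr_running; the sketch approximates that with a
simple "is there an RT task now" test.

```c
#include <assert.h>
#include <stddef.h>

#define RETRY_TASK ((void *)-1UL)	/* as defined in the patch */

struct task {
	const char *name;
};

/* Toy runqueue with at most one task per class (hypothetical layout). */
struct rq {
	struct task *rt;		/* higher-priority class */
	struct task *fair;		/* lower-priority class */
	struct task *pending_rt;	/* queued while the "lock" is dropped */
};

typedef struct task *(*pick_fn)(struct rq *rq);

static struct task idle_task = { "idle" };

static struct task *pick_rt(struct rq *rq)
{
	return rq->rt;
}

static struct task *pick_fair(struct rq *rq)
{
	if (rq->fair)
		return rq->fair;

	/* Models idle_balance(): the lock is dropped, so an RT task may appear. */
	if (rq->pending_rt) {
		rq->rt = rq->pending_rt;
		rq->pending_rt = NULL;
	}

	/* A higher-priority task is now runnable: ask the core to start over. */
	if (rq->rt)
		return RETRY_TASK;

	return NULL;
}

static struct task *pick_idle(struct rq *rq)
{
	(void)rq;
	return &idle_task;
}

/* Classes in descending priority order. */
static const pick_fn classes[] = { pick_rt, pick_fair, pick_idle };

static struct task *pick_next_task(struct rq *rq)
{
	size_t i;

again:
	for (i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
		struct task *p = classes[i](rq);

		if (p) {
			if (p == RETRY_TASK)
				goto again;	/* re-check higher classes */
			return p;
		}
	}

	assert(!"the idle class always has a runnable task");
	return NULL;
}
```

With an empty runqueue and an RT task pending, the first pass finds nothing
in the RT class, the fair class "balances" and reports RETRY_TASK, and the
second pass returns the RT task. Without the restart, the loop would have
fallen through to the idle task even though an RT task was runnable.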