From: Yuyang Du <yuyang.du@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Rabin Vincent <rabin.vincent@axis.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Date: Thu, 2 Jul 2015 07:25:11 +0800	[thread overview]
Message-ID: <20150701232511.GA5197@intel.com> (raw)
In-Reply-To: <20150701204404.GH25159@twins.programming.kicks-ass.net>

Hello,

We have two basic kinds of load balancing: idle balancing and periodic balancing.

It takes a full tick for periodic balancing to happen again, so the livelock
cannot come from it.

Idle balancing, on the other hand, occurs on demand at arbitrary times, so
it can produce the livelock.

And obviously, the idle balancing livelock CAN happen: one CPU pulls
tasks from the other, making that CPU idle in turn, and this iterates...

That being said, it is also obvious how to prevent the livelock from happening:
stop idle pulling once the source rq's nr_running drops to 1, because otherwise
we merely avoid idleness on one CPU by creating it on another.

I hope the patch at the end does the job.

On Wed, Jul 01, 2015 at 10:44:04PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 01, 2015 at 04:55:51PM +0200, Rabin Vincent wrote:
> >  PID: 413    TASK: 8edda408  CPU: 1   COMMAND: "rngd"
> >   task_h_load():     0 [ = (load_avg_contrib {    0} * cfs_rq->h_load {    0}) / (cfs_rq->runnable_load_avg {    0} + 1) ]
> >   SE: 8edda450 load_avg_contrib:     0 load.weight:  1024 PARENT: 8fffbd00 GROUPNAME: (null)
> >   SE: 8fffbd00 load_avg_contrib:     0 load.weight:     2 PARENT: 8f531f80 GROUPNAME: rngd@hwrng.service
> >   SE: 8f531f80 load_avg_contrib:     0 load.weight:  1024 PARENT: 8f456e00 GROUPNAME: system-rngd.slice
> >   SE: 8f456e00 load_avg_contrib:   118 load.weight:   911 PARENT: 00000000 GROUPNAME: system.slice
> 
> Firstly, a group (parent) load_avg_contrib should never be less than
> that of its constituent parts, therefore the top 3 SEs should have at
> least 118 too.

I think in this listing the parent is printed downward, so in this case
"parent is bigger than child" does hold.

But if the child is the only contributor to the parent's load (which is probably
the case here), then the parent's load_avg_contrib should not jump suddenly from
0 to 118.

So this might be due to __update_group_entity_contrib(). I did not look into
the details of this complex function, but a glimpse suggests it is at least
inconsistent, if not arbitrary, so it likely will not satisfy "parent is
bigger than child".

But this patch series http://comments.gmane.org/gmane.linux.kernel/1981970 should
be fully consistent in computing all SEs' load averages, and thus satisfy that
universal truth...

> Now its been a while since I looked at the per entity load tracking
> stuff so some of the details have left me, but while it looks like we
> add the se->avg.load_avg_contrib to its cfs->runnable_load, we do not
> propagate that into the corresponding (group) se.
> 
> This means the se->avg.load_avg_contrib is accounted per cpu without
> migration benefits. So if our task just got migrated onto a cpu that
> hasn't ran the group in a while, the group will not have accumulated
> runtime.

Yes, this is true; migration touches neither the source nor the destination
group SEs. Down this road, if we want the addition and subtraction after
migration, we need to do it bottom-up along the SE hierarchy, and do it
without rq->lock (?). I need to think about it for a while.

Thanks,
Yuyang
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 40a7fcb..f7cc1ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5898,6 +5898,10 @@ static int detach_tasks(struct lb_env *env)
 		return 0;
 
 	while (!list_empty(tasks)) {
+
+		if (env->idle == CPU_NEWLY_IDLE && env->src_rq->nr_running <= 1)
+			break;
+
 		p = list_first_entry(tasks, struct task_struct, se.group_node);
 
 		env->loop++;

Thread overview: 31+ messages
2015-06-30 14:30 [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Rabin Vincent
2015-07-01  5:36 ` Mike Galbraith
2015-07-01 14:55   ` Rabin Vincent
2015-07-01 15:47     ` Mike Galbraith
2015-07-01 20:44     ` Peter Zijlstra
2015-07-01 23:25       ` Yuyang Du [this message]
2015-07-02  8:05         ` Mike Galbraith
2015-07-02  1:05           ` Yuyang Du
2015-07-02 10:25             ` Mike Galbraith
2015-07-02 11:40             ` Morten Rasmussen
2015-07-02 19:37               ` Yuyang Du
2015-07-03  9:34                 ` Morten Rasmussen
2015-07-03 16:38                   ` Peter Zijlstra
2015-07-05 22:31                     ` Yuyang Du
2015-07-09 14:32                       ` Morten Rasmussen
2015-07-09 23:24                         ` Yuyang Du
2015-07-05 20:12                   ` Yuyang Du
2015-07-06 17:36                     ` Dietmar Eggemann
2015-07-07 11:17                       ` Rabin Vincent
2015-07-13 17:43                         ` Dietmar Eggemann
2015-07-09 13:53                     ` Morten Rasmussen
2015-07-09 22:34                       ` Yuyang Du
2015-07-02 10:53         ` Peter Zijlstra
2015-07-02 11:44           ` Morten Rasmussen
2015-07-02 18:42             ` Yuyang Du
2015-07-03  4:42               ` Mike Galbraith
2015-07-03 16:39         ` Peter Zijlstra
2015-07-05 22:11           ` Yuyang Du
2015-07-09  6:15             ` Stefan Ekenberg
2015-07-26 18:57             ` Yuyang Du
2015-08-03 17:05             ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du
