Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yuyang Du <yuyang.du@intel.com>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rabin Vincent <rabin.vincent@axis.com>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Date: Mon, 6 Jul 2015 04:12:41 +0800	[thread overview]
Message-ID: <20150705201241.GE5197@intel.com> (raw)
In-Reply-To: <20150703093441.GA15477@e105550-lin.cambridge.arm.com>

Hi Morten,

On Fri, Jul 03, 2015 at 10:34:41AM +0100, Morten Rasmussen wrote:
> > > IOW, since task groups include blocked load in the load_avg_contrib (see
> > > __update_group_entity_contrib() and __update_cfs_rq_tg_load_contrib()) the
> > > imbalance includes blocked load and hence env->imbalance >=
> > > sum(task_h_load(p)) for all tasks p on the rq. Which leads to
> > > detach_tasks() emptying the rq completely in the reported scenario where
> > > blocked load > runnable load.
> > 
> > Whenever I want to know the load avg concerning task group, I need to
> > walk through the complete codes again, I prefer not to do it this time.
> > But it should not be that simply to say "the 118 comes from the blocked load".
> 
> But the whole hierarchy of group entities is updated each time we enqueue
> or dequeue a task. I don't see how the group entity load_avg_contrib is
> not up to date? Why do you need to update it again?
> 
> In any case, we have one task in the group hierarchy which has a
> load_avg_contrib of 0 and the grand-grand parent group entity has a
> load_avg_contrib of 118 and no additional tasks. That load contribution
> must be from tasks which are no longer around on the rq? No?

load_avg_contrib has WEIGHT inside, so the most I can say is:
SE: 8f456e00's load_avg_contrib 118 = (its cfs_rq's runnable + blocked) / (tg->load_avg + 1) * tg->shares

The tg->shares is probably 1024 (at least 911). So we are just left with:

cfs_rq / tg = 11.5%

I myself did question the sudden jump from 0 to 118 (see a previous reply).

But anyway, this really is irrelevant to the discusstion.
 
> > Anyway, with blocked load, yes, we definitely can't move (or even find) some
> > ammount of the imbalance if we only look at the tasks on the queue. But this
> > may or may not be a problem.
> > 
> > Firstly, the question comes to whether we want blocked load anywhere.
> > This is just about a "now vs. average" question.
> 
> That is what I meant in the paragraph below. It is a scheduling policy
> question.
> 
> > Secondly, if we stick to average, we just need to treat the blocked load
> > consistently, not that group SE has it, but task SE does not, or somewhere
> > has it, others not.
> 
> I agree that inconsistent use of blocked load will lead us into trouble.
> The problem is that none of the load-balance logic was designed for
> blocked load. It was written to deal load that is currently on the rq
> and slightly biased by average cpu load, not dealing with load
> contribution of tasks which we can't migrate at the moment because they
> are blocked. The load-balance code has to be updated to deal with
> blocked load. We will run into all sorts of issues if we don't and roll
> out use of blocked load everywhere.
> 
> However, before we can rework the load-balance code, we have to agree on
> the now vs average balance policy.
> 
> Your proposed patch implements a policy somewhere in between. We try to
> balance based on average, but we don't allow idle_balance() to empty a
> cpu completely. A pure average balance policy would allow a cpu to go
> idle even if we could do have tasks waiting on other rqs if the blocked
> load indicates that other task will show up shortly one the cpu. A pure
> "now" balance would balance based on runnable_load_avg for all entities
> including groups ignoring all blocked load, but that goes against the
> PELT group balancing design.
> 
> I'm not against having a policy that sits somewhere in between, we just
> have to agree it is the right policy and clean up the load-balance code
> such that the implemented policy is clear.
 
The proposed patch sits in between? I agree, but would rather see it from
another perspective.

First, I don't think it merits a solution/policy. It is just a cheap
"last guard" to protect the "king" - no crash.

Second, a "pure average" policy is pretty fine in general, but it does not
mean we would simply allow a CPU to be pulled empty, that is because we are
making a bet from a prediction (average) here. By prediction, it basically
means sometimes it fails. As the failure could lead to a disater, without
blaming the prediction, it is reasonable we make a sort of "damage control".

Thanks,
Yuyang

next prev parent reply	other threads:[~2015-07-06  4:04 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-30 14:30 [PATCH?] Livelock in pick_next_task_fair() / idle_balance() Rabin Vincent
2015-07-01  5:36 ` Mike Galbraith
2015-07-01 14:55   ` Rabin Vincent
2015-07-01 15:47     ` Mike Galbraith
2015-07-01 20:44     ` Peter Zijlstra
2015-07-01 23:25       ` Yuyang Du
2015-07-02  8:05         ` Mike Galbraith
2015-07-02  1:05           ` Yuyang Du
2015-07-02 10:25             ` Mike Galbraith
2015-07-02 11:40             ` Morten Rasmussen
2015-07-02 19:37               ` Yuyang Du
2015-07-03  9:34                 ` Morten Rasmussen
2015-07-03 16:38                   ` Peter Zijlstra
2015-07-05 22:31                     ` Yuyang Du
2015-07-09 14:32                       ` Morten Rasmussen
2015-07-09 23:24                         ` Yuyang Du
2015-07-05 20:12                   ` Yuyang Du [this message]
2015-07-06 17:36                     ` Dietmar Eggemann
2015-07-07 11:17                       ` Rabin Vincent
2015-07-13 17:43                         ` Dietmar Eggemann
2015-07-09 13:53                     ` Morten Rasmussen
2015-07-09 22:34                       ` Yuyang Du
2015-07-02 10:53         ` Peter Zijlstra
2015-07-02 11:44           ` Morten Rasmussen
2015-07-02 18:42             ` Yuyang Du
2015-07-03  4:42               ` Mike Galbraith
2015-07-03 16:39         ` Peter Zijlstra
2015-07-05 22:11           ` Yuyang Du
2015-07-09  6:15             ` Stefan Ekenberg
2015-07-26 18:57             ` Yuyang Du
2015-08-03 17:05             ` [tip:sched/core] sched/fair: Avoid pulling all tasks in idle balancing tip-bot for Yuyang Du

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150705201241.GE5197@intel.com \
    --to=yuyang.du@intel.com \
    --cc=bsegall@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rabin.vincent@axis.com \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.