public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: "Li, Tong N" <tong.n.li@intel.com>
Cc: Chris Friesen <cfriesen@nortel.com>,
	linux-kernel@vger.kernel.org, vatsa@linux.vnet.ibm.com,
	mingo@elte.hu, pj@sgi.com
Subject: RE: fair group scheduler not so fair?
Date: Thu, 22 May 2008 23:13:39 +0200	[thread overview]
Message-ID: <1211490819.6463.172.camel@lappy.programming.kicks-ass.net> (raw)
In-Reply-To: <5FD5754DDBA0B1499B5A0B4BB54194850357ED61@fmsmsx411.amr.corp.intel.com>

On Thu, 2008-05-22 at 13:18 -0700, Li, Tong N wrote:
> Peter,
> 
> I didn't look at your patches, but I thought you were flattening group
> weights down to task-level so that the scheduler only looks at per-task
> weights. That'd make group fairness as good as task fairness gets. Is
> this still the case?

We still have hierarchical runqueues - getting rid of that is another
tree I'm working on, it has an EEVDF based rq scheduler.

For load balancing purposes we are indeed projecting everything to a
flat level.

A rather quick description of what we do:


We'll have:

  task-weight     - the weight for a task
  group-weight    - the weight for a group (same units as for tasks)
  group-shares    - the weight for a group on a particular cpu
  runqueue-weight - the sum of weights

we compute group-shares as:

  s_(i,g) = W_g * rw_(i,g) / \Sum_j rw_(j,g)
  
s_(i,g)  := group g's shares for cpu i
W_g      := group g's weight
rw_(i,g) := group g's runqueue weight for cpu i

(all for a given group)

We compute these shares while walking the task group tree bottom up,
since the shares for a child's group will affect the runqueue weight for
its parent.

We then select the busiest runqueue from the available set solely based
on top level runqueue weight (since that accurately reflects all the
child group weights after updating the shares).

We compute an imbalance between this rq and the busiest rq in top
weight.

Then, for this busiest cpu we compute the hierarchical load for each
group:

 h_load_(i,g) = rw_(i,0) \Prod_{l=1} s_(i,l)/rw_(i,{l-1})

Where l iterates over the tree levels (not the cpus).

h_load represents the full weight of the group as seen from the top
level. This is used to convert the weight of each moved task to top
weight, and we'll keep on moving tasks until the imbalance is satisfied.


Given the following:

      root
     / | \
   _A_ 1  2
  /| |\
 3 4 5 B
      / \
     6   7

     CPU0            CPU1
     root            root
     /  \            /  \
    A    1          A    2
   / \             / \
  4   B           3   5
     / \
    6   7


Numerical examples given the above scenario, assuming every body's
weight is 1024:

 s_(0,A) = s_(1,A) = 512
 s_(0,B) = 1024, s_(1,B) = 0

 rw_(0,A) = rw(1,A) = 2048
 rw_(0,B) = 2048, rw_(1,B) = 0

 h_load_(0,A) = h_load_(1,A) = 512
 h_load_(0,B) = 256, h_load(1,B) = 0





  reply	other threads:[~2008-05-22 21:15 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-21 23:59 fair group scheduler not so fair? Chris Friesen
2008-05-22  6:56 ` Peter Zijlstra
2008-05-22 20:02   ` Chris Friesen
2008-05-22 20:07     ` Peter Zijlstra
2008-05-22 20:18       ` Li, Tong N
2008-05-22 21:13         ` Peter Zijlstra [this message]
2008-05-23  0:17           ` Chris Friesen
2008-05-23  7:44             ` Srivatsa Vaddagiri
2008-05-23  9:42         ` Srivatsa Vaddagiri
2008-05-23  9:39           ` Peter Zijlstra
2008-05-23 10:19             ` Srivatsa Vaddagiri
2008-05-23 10:16               ` Peter Zijlstra
2008-05-27 17:15 ` Srivatsa Vaddagiri
2008-05-27 18:13   ` Chris Friesen
2008-05-28 16:33     ` Srivatsa Vaddagiri
2008-05-28 18:35       ` Chris Friesen
2008-05-28 18:47         ` Dhaval Giani
2008-05-29  2:50         ` Srivatsa Vaddagiri
2008-05-29 16:46         ` Srivatsa Vaddagiri
2008-05-29 16:47           ` Srivatsa Vaddagiri
2008-05-29 21:30           ` Chris Friesen
2008-05-30  6:43             ` Dhaval Giani
2008-05-30 10:21               ` Srivatsa Vaddagiri
2008-05-30 11:36             ` Srivatsa Vaddagiri
2008-06-02 20:03               ` Chris Friesen
2008-05-27 17:28 ` Srivatsa Vaddagiri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1211490819.6463.172.camel@lappy.programming.kicks-ass.net \
    --to=a.p.zijlstra@chello.nl \
    --cc=cfriesen@nortel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=pj@sgi.com \
    --cc=tong.n.li@intel.com \
    --cc=vatsa@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox