All of lore.kernel.org
 help / color / mirror / Atom feed
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	efault@gmx.de, kernel@kolivas.org, containers@lists.osdl.org,
	ckrm-tech@lists.sourceforge.net, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, pwil3058@bigpond.net.au,
	tingy@cs.umass.edu, tong.n.li@intel.com,
	Balbir Singh <balbir@in.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH 0/3] Add group fairness to CFS
Date: Fri, 25 May 2007 21:44:24 +0530	[thread overview]
Message-ID: <20070525161424.GA5162@in.ibm.com> (raw)
In-Reply-To: <20070523180316.GY19966@holomorphy.com>

On Wed, May 23, 2007 at 11:03:16AM -0700, William Lee Irwin III wrote:
> Well, SMP load balancing is what makes all this hard.

Agreed. I am optimistic that we can achieve good degree of SMP
fairness using similar mechanism as smpnice ..

> On Wed, May 23, 2007 at 10:18:59PM +0530, Srivatsa Vaddagiri wrote:
> > Salient points which needs discussion:
> > 1. This patch reuses CFS core to achieve fairness at group level also.
> >    To make this possible, CFS core has been abstracted to deal with generic 
> >    schedulable "entities" (tasks, users etc).
> 
> The ability to handle deeper hierarchies would be useful for those
> who want such semantics.

sure, although the more levels of hierarchy scheduler recoginizes, more
the (accounting/scheduling) cost is!

> On Wed, May 23, 2007 at 10:18:59PM +0530, Srivatsa Vaddagiri wrote:
> > 2. The per-cpu rb-tree has been split to be per-group per-cpu.
> >    schedule() now becomes two step on every cpu : pick a group first (from
> >    group rb-tree) and a task within that group next (from that group's task
> >    rb-tree)
> 
> That assumes per-user scheduling groups; other configurations would
> make it one step for each level of hierarchy. It may be possible to
> reduce those steps to only state transitions that change weightings
> and incremental updates of task weightings. By and large, one needs
> the groups to determine task weightings as opposed to hierarchically
> scheduling, so there are alternative ways of going about this, ones
> that would even make load balancing easier.

Yeah I agree that providing hierarchical group-fairness at the cost of single 
(or fewer) scheduling levels would be a nice thing to target for,
although I don't know of any good way to do it. Do you have any ideas
here? Doing group fairness in a single level, using a common rb-tree for tasks 
from all groups is very difficult IMHO. We need atleast two levels.

One possibility is that we recognize deeper hierarchies only in user-space,
but flatten this view from kernel perspective i.e some user space tool
will have to distributed the weights accordingly in this flattened view
to the kernel.

> On Wed, May 23, 2007 at 10:18:59PM +0530, Srivatsa Vaddagiri wrote:
> > 3. Grouping mechanism - I have used 'uid' as the basis of grouping for
> >    timebeing (since that grouping concept is already in mainline today).
> >    The patch can be adapted to a more generic process grouping mechanism
> >    (like http://lkml.org/lkml/2007/4/27/146) later.
> 
> I'd like to see how desirable the semantics achieved by reflecting
> more of the process hierarchy structure in the scheduler groupings are.
> Users, sessions, pgrps, and thread_groups would be the levels of
> hierarchy there, where some handling of orphan pgrps is needed.

Good point. Essentially all users should get fair cpu first, then all
sessions/pgrps under a user should get fair share, followed by
process-groups under a session, followed by processes in a
process-group, followed by threads in a process (phew) .. ?

The container patches by Paul Menage at http://lkml.org/lkml/2007/4/27/146
provide a generic enough mechanism to group tasks in a hierarchical
manner for each resource controller. For ex: for the cpu controller, if the 
desired fairness is as per the above scheme (user/session/pgrp/threads etc), 
then it is possible to write a script which creates such a tree under cpu
controller filesystem:

	# mkdir /dev/cpuctl
	# mount -t container -o cpuctl none /dev/cpuctl

/dev/cpuctl is the cpu controller filesystem which can look like this:

	/dev/cpuctl
		|----uid root
		|       |-- sid 10
		|	|      |------ pgrp 20
		| 	|      | 	  |-- process 100
		|	|      |          |-- process 101
		|	|      |          |
		|       |
		|	|-- sid 11
		|
		|--- uid guest

(If the cpu controller really supports those many levels that is!)

user scripts can be written to modify this filesystem tree upon every 
login/session/user creation (if that is possible to trap on). Essentially it 
lets this semantics (what you ask) be dynamic/tunable by user.

> Kernel compiles are markedly poor benchmarks. Try lat_ctx from lmbench,
> VolanoMark, AIM7, OAST, SDET, and so on.

Thanks for this list of tests. I intend to run all of them if possible
for my next version.

-- 
Regards,
vatsa

  parent reply	other threads:[~2007-05-25 16:06 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-23 16:48 [RFC] [PATCH 0/3] Add group fairness to CFS Srivatsa Vaddagiri
2007-05-23 16:51 ` [RFC] [PATCH 1/3] task_cpu(p) needs to be correct always Srivatsa Vaddagiri
2007-05-23 16:54 ` [RFC] [PATCH 2/3] Introduce two new structures - struct lrq and sched_entity Srivatsa Vaddagiri
2007-05-23 16:56 ` [RFC] [PATCH 3/3] Generalize CFS core and provide per-user fairness Srivatsa Vaddagiri
2007-05-23 18:32 ` [RFC] [PATCH 0/3] Add group fairness to CFS Ingo Molnar
2007-05-25  7:59   ` Srivatsa Vaddagiri
     [not found] ` <3d8471ca0705231112rfac9cfbt9145ac2da8ec1c85@mail.gmail.com>
     [not found]   ` <20070523183824.GA7388@elte.hu>
     [not found]     ` <4654BF88.3030404@yahoo.fr>
2007-05-25  7:45       ` Srivatsa Vaddagiri
2007-05-25  8:29         ` Ingo Molnar
2007-05-25 10:56           ` Srivatsa Vaddagiri
2007-05-25 11:11             ` Ingo Molnar
2007-05-25 11:28               ` Srivatsa Vaddagiri
2007-05-25 12:05                 ` Ingo Molnar
2007-05-25 12:41                   ` Srivatsa Vaddagiri
2007-05-25 13:05           ` Kirill Korotaev
2007-05-25 15:34             ` [ckrm-tech] " Srivatsa Vaddagiri
2007-05-25 16:18               ` Kirill Korotaev
2007-05-25 18:08                 ` Srivatsa Vaddagiri
2007-05-26  0:17                   ` Peter Williams
2007-05-26 15:41                     ` William Lee Irwin III
2007-05-27  1:29                       ` Peter Williams
2007-05-29 10:48                         ` William Lee Irwin III
2007-05-30  0:09                           ` Peter Williams
2007-05-30  2:48                             ` William Lee Irwin III
2007-05-30  4:07                               ` Peter Williams
2007-05-30 17:14                       ` Srivatsa Vaddagiri
2007-05-30 20:13                         ` William Lee Irwin III
2007-05-31  3:26                           ` Srivatsa Vaddagiri
2007-05-31  4:09                             ` William Lee Irwin III
2007-05-31  5:48                               ` Srivatsa Vaddagiri
2007-05-31  6:36                                 ` William Lee Irwin III
2007-05-31  8:33                                   ` Srivatsa Vaddagiri
2007-05-31  8:43                                     ` William Lee Irwin III
2007-05-31  8:56                                     ` Srivatsa Vaddagiri
2007-05-31  9:15                                       ` William Lee Irwin III
2007-05-31  9:36                                         ` Srivatsa Vaddagiri
2007-05-28 17:26                     ` Srivatsa Vaddagiri
2007-05-29  0:18                       ` Peter Williams
2007-05-29  1:55                         ` Paul Menage
2007-05-29  3:30                         ` Peter Williams
2007-05-25  9:30         ` Guillaume Chazarain
     [not found] ` <20070523180316.GY19966@holomorphy.com>
2007-05-25 16:14   ` Srivatsa Vaddagiri [this message]
2007-05-25 17:14     ` Li, Tong N
2007-05-28 16:39       ` [ckrm-tech] " Srivatsa Vaddagiri
2007-05-30  0:14         ` Bill Huey
2007-05-30  2:51         ` William Lee Irwin III

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070525161424.GA5162@in.ibm.com \
    --to=vatsa@in.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@in.ibm.com \
    --cc=ckrm-tech@lists.sourceforge.net \
    --cc=containers@lists.osdl.org \
    --cc=efault@gmx.de \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=pwil3058@bigpond.net.au \
    --cc=tingy@cs.umass.edu \
    --cc=tong.n.li@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.