Re: [ckrm-tech] [RFC] [PATCH 0/3] Add group fairness to CFS

From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	ckrm-tech@lists.sourceforge.net, Balbir Singh <balbir@in.ibm.com>,
	efault@gmx.de, linux-kernel@vger.kernel.org, tingy@cs.umass.edu,
	Peter Williams <pwil3058@bigpond.net.au>,
	kernel@kolivas.org, tong.n.li@intel.com,
	containers@lists.osdl.org, Ingo Molnar <mingo@elte.hu>,
	Kirill Korotaev <dev@sw.ru>,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	Guillaume Chazarain <guichaz@yahoo.fr>
Subject: Re: [ckrm-tech] [RFC] [PATCH 0/3] Add group fairness to CFS
Date: Thu, 31 May 2007 14:03:53 +0530
Message-ID: <20070531083353.GF663@in.ibm.com>
In-Reply-To: <20070531063647.GC15426@holomorphy.com>

On Wed, May 30, 2007 at 11:36:47PM -0700, William Lee Irwin III wrote:
> On Thu, May 31, 2007 at 11:18:28AM +0530, Srivatsa Vaddagiri wrote:
> > Hmm ..the fact that each task runs for a minimum of 1 tick seems to
> > complicate the matters to me (when doing group fairness given a single
> > level hierarchy). A user with 1000 (or more) tasks can be unduly
> > advantaged compared to another user with just 1 (or fewer) task
> > because of this?
> 
> Temporarily, yes. All this only works when averaged out.

So essentially when we calculate delta_mine component for each of those
1000 tasks, we will find that it has executed for 1 tick (4 ms say) but 
its fair share was very very low.

	fair_share = delta_exec * p->load_weight / total_weight

If p->load_weight has been calculated after factoring in hierarchy (as
you outlined in a previous mail), then p->load_weight of those 1000 tasks
will be far less compared to the p->load_weight of one task belonging to
other user, correct? Just to make sure I get all this correct:

	User U1 has tasks T0 - T999
	User U2 has task T1000

assuming each task's weight is 1 and each user's weight is 1 then:


	WT0 = (WU1 / (WU1 + WU2)) * (WT0 / (WT0 + WT1 + ... + WT999))
	    = (1 / (1 + 1)) * (1 / 1000)
	    = 1/2000
	    = 0.0005

	WT1 ..WT999 will be same as WT0

whereas, weight of T1000 will be:


	WT1000 	= (WU2 / (WU1 + WU2)) * (WT1000 / WT1000)
		= (1 / (1 + 1)) * (1 / 1)
		= 0.5

?
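
To make the arithmetic concrete, here is a minimal sketch of that weight
calculation for a single-level (user -> task) hierarchy. The function and
parameter names are made up for illustration and do not come from the CFS
patches; it only assumes that a task's effective weight is its user's share
of all users multiplied by the task's share within that user.

```c
#include <assert.h>

/*
 * Effective weight of one task in a single-level (user -> task) hierarchy.
 * Hypothetical helper, not part of CFS.
 *
 *   user_weight:       weight of the user owning the task
 *   total_user_weight: sum of the weights of all users
 *   task_weight:       weight of the task itself
 *   user_task_weight:  sum of the weights of all tasks of that user
 */
static double effective_weight(double user_weight, double total_user_weight,
			       double task_weight, double user_task_weight)
{
	assert(total_user_weight > 0.0 && user_task_weight > 0.0);

	/* user's share among users, times task's share within the user */
	return (user_weight / total_user_weight) *
	       (task_weight / user_task_weight);
}
```

With the numbers above, effective_weight(1, 2, 1, 1000) gives 0.0005 for
each of T0 ..T999 and effective_weight(1, 2, 1, 1) gives 0.5 for T1000.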


So when T0 (or any of T1 ..T999) executes for 1 tick (4 ms), its fair share
would be:
	T0's fair_share (delta_mine)
			= 4 ms * 0.0005 / (0.0005 * 1000 + 0.5)
			= 4 ms * 0.0005 / 1
			= 0.002 ms (2000 ns)

This would cause T0's ->wait_runtime to go sharply negative, causing it to be
inserted back in the rb-tree well ahead in the future. One change I can foresee
in CFS is with regard to limit_wait_runtime() ..We will have to change
its default limit, at least when the group fairness thingy is enabled.

Compared to this when T1000 executes for 1 tick, its fair share would be
calculated as:

	T1000's fair_share (delta_mine)
				= 4 ms * 0.5 / (0.0005 * 1000 + 0.5)
				= 4 ms * 0.5 / 1
				= 2 ms (2000000 ns)

Its ->wait_runtime will drop less significantly, which lets it be
inserted in rb-tree much to the left of those 1000 tasks (and which indirectly
lets it gain back its fair share during subsequent schedule cycles).
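
A rough sketch of the two calculations, in integer nanoseconds. The helper
names are hypothetical, and the weights are scaled by 2000 so they stay
integral (0.0005 -> 1, 0.5 -> 1000, total 1 -> 2000). The second helper
rests on an assumption not spelled out above: that a running task's
->wait_runtime changes by (delta_mine - delta_exec) for each slice it runs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * fair_share = delta_exec * load_weight / total_weight
 * Times are in nanoseconds; weights share one common integer scale.
 * Hypothetical helper, not the CFS implementation.
 */
static int64_t fair_share_ns(int64_t delta_exec_ns, uint64_t load_weight,
			     uint64_t total_weight)
{
	assert(delta_exec_ns >= 0 && total_weight > 0);

	return (int64_t)((uint64_t)delta_exec_ns * load_weight / total_weight);
}

/*
 * Assumed accounting: after running for delta_exec, the task is charged
 * the difference between what it was entitled to and what it consumed.
 */
static int64_t wait_runtime_delta_ns(int64_t delta_exec_ns,
				     uint64_t load_weight,
				     uint64_t total_weight)
{
	return fair_share_ns(delta_exec_ns, load_weight, total_weight) -
	       delta_exec_ns;
}
```

For a 4 ms tick (4000000 ns) this gives a fair share of 2000 ns for T0 and
2000000 ns for T1000. Under the stated assumption, T0's ->wait_runtime
changes by -3998000 ns per tick and T1000's by -2000000 ns.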

Hmm ..is that the theory?

Ingo, do you have any comments on this approach?

/me is tempted to try this all out.


> The basic
> idea is that you want a constant upper bound on the difference between
> the CPU time a task receives and the CPU time it was intended to get.
> This discretization is one of the larger sources of the "error" in the
> CPU time granted. The constant upper bound usually only applies to the
> largest difference for any task. When absolute values of differences
> are summed across tasks the aggregate will be O(tasks) because there's
> something almost like a constant per-task lower bound a la Heisenberg.
> It would have to get more exact the more tasks there are on the system
> for that to work, and something of the opposite actually holds.
> 
> It might be appropriate for the scheduler to dynamically adjust a
> periodic timer's period or to set up one-shot timers at involuntary
> preemption times in order to achieve more precise fairness in this
> sort of situation. In the case of few preemption points such one-shot
> code or low periodicity code would also save on taking interrupts that
> would otherwise manifest as overhead.
> 
> In short, a user with many tasks can reap a temporary advantage
> relative to users with fewer tasks because of this, but over time,
> longer-running tasks will receive the CPU time intended to within
> some constant upper bound, provided other things aren't broken.

-- 
Regards,
vatsa
