From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
To: Kirill Korotaev <dev@sw.ru>
Cc: Ingo Molnar <mingo@elte.hu>,
	kernel@kolivas.org, Nick Piggin <nickpiggin@yahoo.com.au>,
	ckrm-tech@lists.sourceforge.net, efault@gmx.de,
	pwil3058@bigpond.net.au, wli@holomorphy.com,
	linux-kernel@vger.kernel.org, tingy@cs.umass.edu,
	tong.n.li@intel.com, containers@lists.osdl.org,
	akpm@linux-foundation.org, torvalds@linux-foundation.org,
	Guillaume Chazarain <guichaz@yahoo.fr>,
	Balbir Singh <balbir@in.ibm.com>
Subject: Re: [ckrm-tech] [RFC] [PATCH 0/3] Add group fairness to CFS
Date: Fri, 25 May 2007 21:04:50 +0530
Message-ID: <20070525153450.GA4679@in.ibm.com>
In-Reply-To: <4656DF0C.9090306@sw.ru>

On Fri, May 25, 2007 at 05:05:16PM +0400, Kirill Korotaev wrote:
> > That way the scheduler would first pick a "virtual CPU" to schedule, and 
> > then pick a user from that virtual CPU, and then a task from the user. 
> 
> don't you mean the vice versa:
> first user to schedule, then VCPU (which is essentially a runqueue or rbtree),
> then a task from VCPU?
> 
> this is the approach we use in OpenVZ [...]

So is this how it looks in OpenVZ?

	CONTAINER1 => VCPU0 + VCPU1 
	CONTAINER2 => VCPU2 + VCPU3

PCPU0 picks a container first, a vcpu next and then a task in it
PCPU1 also picks a container first, a vcpu next and then a task in it.
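
To make the questions below concrete, here is a rough user-space sketch of
the three-level pick described above. It is not OpenVZ or CFS code: the
struct names, the "runtime" fields, the function names and the "pick
whoever has run the least" policy at each level are all assumptions made
for illustration only.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not OpenVZ or CFS code. */
struct task {
	const char *name;
	unsigned long runtime;	/* cpu time consumed so far */
};

struct vcpu {
	struct task *tasks;	/* tasks queued on this vcpu's runqueue */
	size_t nr_tasks;
	unsigned long runtime;
};

struct container {
	struct vcpu *vcpus;	/* vcpus owned by this container */
	size_t nr_vcpus;
	unsigned long runtime;
};

/* First level (assumed policy): the container that has run the least. */
struct container *pick_container(struct container *c, size_t n)
{
	struct container *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || c[i].runtime < best->runtime)
			best = &c[i];
	return best;
}

/* Second level: the least-run vcpu inside the chosen container. */
struct vcpu *pick_vcpu(struct container *c)
{
	struct vcpu *best = NULL;

	for (size_t i = 0; i < c->nr_vcpus; i++)
		if (!best || c->vcpus[i].runtime < best->runtime)
			best = &c->vcpus[i];
	return best;
}

/* Third level: the least-run task on the chosen vcpu. */
struct task *pick_task(struct vcpu *v)
{
	struct task *best = NULL;

	for (size_t i = 0; i < v->nr_tasks; i++)
		if (!best || v->tasks[i].runtime < best->runtime)
			best = &v->tasks[i];
	return best;
}

/* What one physical cpu does: container first, vcpu next, task last. */
struct task *pick_next_3level(struct container *c, size_t n)
{
	struct container *ct = pick_container(c, n);
	struct vcpu *v = ct ? pick_vcpu(ct) : NULL;

	return v ? pick_task(v) : NULL;
}
```

Note that nothing in this sketch stops two physical cpus from calling
pick_next_3level() at the same time and arriving at the same vcpu, which
is what question 1 is about.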

A few questions:

1. Are VCPU runqueues (on which tasks are present) global queues?
  
   That is, let's say that both PCPU0 and PCPU1 pick CONTAINER1 to schedule
   (first level) at the same time, and next (let's say) they pick the same
   vcpu, VCPU0, to schedule (second level). Will the two pcpus now have to
   be serialized, using a spinlock, while scanning for the task to schedule
   next (third level) within VCPU0? Won't that shoot up scheduling costs
   (esp. on large systems), compared to local scheduling plus balancing
   across cpus once in a while, the way it's done today?

   Or do you require that two pcpus don't schedule the same vcpu at the
   same time (the way hypervisors normally work)? Even then I would
   imagine a fair level of contention to be present in the second step
   (picking a virtual cpu from a container's list of vcpus).

2. How would this load balancing at the virtual cpu level interact with
   sched domain based load balancing?

   The current sched domain based balancing code has many HT/MC/SMT related
   optimizations, which ensure that tasks are spread across physical
   threads/cores/packages in the most efficient manner, so as to utilize
   hardware bandwidth to the maximum. You would now need to introduce
   those optimizations essentially at schedule() time? I don't know
   if that is a wise thing to do.

3. How do you determine the number of VCPUs per container? Is there any
   relation between the number of virtual cpus exposed per user/container
   and the number of available cpus? For example, in the case of
   user-driven scheduling, we would want all users to see the same number
   of cpus (which is the number available in the system).

4. VCPU ids (namespace) - is it different for different containers?

   For example, can the ids of vcpus belonging to different containers (say
   VCPU0 and VCPU2), as seen by users through vgetcpu/smp_processor_id(),
   be the same? If so, then potentially two threads belonging to different
   users may find that they are running -truly simultaneously- on the
   /same/ cpu 0 (one on VCPU0/PCPU0 and another on VCPU2/PCPU1), which
   normally isn't possible!

   This may be ok for containers, with non-overlapping cpu id namespaces,
   but when applied to group scheduling for, say, users, which require a
   global cpu id namespace, I wonder how that would be addressed.
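
For question 1, the second alternative (two pcpus never run the same vcpu
at the same time) could look roughly like the sketch below. This is only an
assumption about how such exclusion might be done, using C11 atomics in
user space; struct vcpu_slot, VCPU_IDLE and the three functions are
hypothetical names, not taken from OpenVZ or any hypervisor.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define VCPU_IDLE (-1)	/* no physical cpu currently runs this vcpu */

/* Hypothetical vcpu descriptor; "owner" is the id of the pcpu running it. */
struct vcpu_slot {
	atomic_int owner;
};

void vcpu_init(struct vcpu_slot *v)
{
	atomic_init(&v->owner, VCPU_IDLE);
}

/* Try to take the vcpu for 'pcpu'; fails if another pcpu already holds it. */
bool vcpu_try_claim(struct vcpu_slot *v, int pcpu)
{
	int expected = VCPU_IDLE;

	return atomic_compare_exchange_strong(&v->owner, &expected, pcpu);
}

/* Give the vcpu back so that any pcpu may claim it again. */
void vcpu_release(struct vcpu_slot *v)
{
	atomic_store(&v->owner, VCPU_IDLE);
}
```

A pcpu whose claim fails has to move on to the next vcpu in the container's
list. Every pcpu scanning that list still hits the same "owner" words, so
the exclusion removes the need for a per-vcpu runqueue lock in the third
step but not the contention in the second step.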


> and if you don't mind I would propose to go this way for fair-scheduling in 
> mainstream.
> It has its own advantages and disadvantages.
> 
> This is not the easy way to go and I can outline the problems/disadvantages
> which appear on this way:
> - tasks which bind to CPU mask will bind to virtual CPUs.
>   no problem with user tasks, [...]

Why is this not a problem for user tasks? Tasks which bind to different
CPUs for performance reasons can now find that they are unknowingly
running on the same (physical) CPU.
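
A tiny sketch of that concern, assuming a hypothetical vcpu-to-pcpu
placement table that the vcpu scheduler is free to rewrite (struct
vcpu_map, NR_VCPUS and share_pcpu() are made-up names for illustration):

```c
#include <stdbool.h>

#define NR_VCPUS 4

/* Hypothetical placement table: pcpu_of[v] is the pcpu running vcpu v. */
struct vcpu_map {
	int pcpu_of[NR_VCPUS];
};

/* True if two tasks pinned to different vcpus share one physical cpu. */
bool share_pcpu(const struct vcpu_map *m, int vcpu_a, int vcpu_b)
{
	return vcpu_a != vcpu_b && m->pcpu_of[vcpu_a] == m->pcpu_of[vcpu_b];
}
```

With a placement of { 0, 0, 1, 1 }, tasks bound to VCPU0 and VCPU1 believe
they sit on different cpus while share_pcpu() reports that they do not,
and nothing visible to the tasks tells them so.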

> but some kernel threads
>   use this to do CPU-related management (like cpufreq).
>   This can be fixed using SMP IPI actually.
> - VCPUs should not change PCPUs very frequently,
>   otherwise there is some overhead. Solvable.
> 
> Advantages:
> - High precision and fairness.

I just don't know if this benefit of a high degree of fairness is worth the
complexity it introduces. Besides, having some data which shows how much
better it is with respect to fairness/overhead when compared with other
approaches (like smpnice) would help, I guess. I will however let experts
like Ingo make the final call here :)

-- 
Regards,
vatsa
