public inbox for linux-kernel@vger.kernel.org
From: Dinakar Guniguntala <dino@in.ibm.com>
To: Paul Jackson <pj@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	mbligh@google.com, akpm@osdl.org, menage@google.com,
	Simon.Derr@bull.net, linux-kernel@vger.kernel.org,
	rohitseth@google.com, holt@sgi.com, dipankar@in.ibm.com,
	suresh.b.siddha@intel.com, clameter@sgi.com
Subject: Re: [RFC] cpuset: remove sched domain hooks from cpusets
Date: Sat, 21 Oct 2006 02:00:16 +0530
Message-ID: <20061020203016.GA26421@in.ibm.com>
In-Reply-To: <20061020120005.61239317.pj@sgi.com>

Hi Paul,

This mail seems to be as good as any to reply to, so here goes

On Fri, Oct 20, 2006 at 12:00:05PM -0700, Paul Jackson wrote:
> > The patch I posted previously should (modulo bugs) only do partitioning
> > in the top-most cpuset. I still need clarification from Paul as to why
> > this is unacceptable, though.
> 
> That patch partitioned on the children of the top cpuset, not the
> top cpuset itself.
> 
> There is only one top cpuset - and that covers the entire system.
> 
> Consider the following example:
> 
> 	/dev/cpuset		cpu_exclusive=1, cpus=0-7, task A
> 	/dev/cpuset/a		cpu_exclusive=1, cpus=0-3, task B
> 	/dev/cpuset/b		cpu_exclusive=1, cpus=4-7, task C
> 
> We have three cpusets - the top cpuset and two children, 'a' and 'b'.
> 
> We have three tasks, A, B and C.  Task A is running in the top cpuset,
> with access to all 8 cpus on the system.  Tasks B and C are each in
> a child cpuset, with access to just 4 cpus.
> 
> By your patch, the cpu_exclusive cpusets 'a' and 'b' partition the
> sched domains in two halves, each covering 4 of the system's 8 cpus.
> (That, or I'm still a sched domain idiot - quite possible.)
> 
> As a result, task A is screwed.  If it happens to be on any of cpus
> 0-3 when the above is set up and the sched domains become partitioned,
> it will never be considered for load balancing on any of cpus 4-7.
> Or vice versa, if it is on any of cpus 4-7, it has no chance of
> subsequently running on cpus 0-3.

Ok, I see the issue here, although the above has been the case all along.
I think the main point is that most users don't have to do more than one
level of partitioning (they are partitioning systems with at most 16-32
cpus, usually fewer), so it is fairly easy to keep track of exclusive
cpusets and task placements, and this is not such a big problem. However,
I can see that with 1024 cpus it is no longer trivial to remember all of
the partitioning, especially if it goes more than 2 levels deep, and
that it gets unwieldy.
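
To make the scenario concrete, the layout in your example corresponds to
roughly the following sequence against the cpuset filesystem. Since the
real interface needs root and a mounted cpuset fs, this sketch does the
same writes under a scratch directory instead; the file names (cpus,
cpu_exclusive, tasks) are the real interface, while the scratch path and
the pids are made up for illustration:

```shell
# On a real system this would be (as root):
#   mount -t cpuset cpuset /dev/cpuset
# and the mkdir/echo below would happen under /dev/cpuset.  Here a
# scratch directory stands in so the sketch runs unprivileged.
root=/tmp/cpuset-demo
rm -rf "$root" && mkdir -p "$root"

# Top cpuset: all 8 cpus, exclusive, with task A attached.
echo 0-7 > "$root/cpus"
echo 1   > "$root/cpu_exclusive"
echo 100 > "$root/tasks"        # task A (made-up pid)

# Children a and b split the machine in half; tasks B and C attached.
mkdir "$root/a" "$root/b"
echo 0-3 > "$root/a/cpus"; echo 1 > "$root/a/cpu_exclusive"
echo 4-7 > "$root/b/cpus"; echo 1 > "$root/b/cpu_exclusive"
echo 101 > "$root/a/tasks"      # task B
echo 102 > "$root/b/tasks"      # task C
```

With a and b both cpu_exclusive, the sched domains get partitioned along
the a/b boundary, and task A in the top cpuset ends up stuck on whichever
half it happened to be on.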

So I propose the following changes to cpusets:

1. Add a new flag that controls sched domains (say, sched_domain).
   Although I still think we could keep hanging sched domains off the
   back of exclusive cpusets, it is best to separate the two, and maybe
   even add a separate CONFIG option for this. That way any complexity
   arising from it, such as the hotplug/sched domain interaction, stays
   under the config option.
2. The main change: don't allow tasks to be attached to a cpuset if it
   has child cpusets that have the sched_domain flag turned on (maybe
   return an EINVAL if the user tries to do that).
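
The rule in (2) could look something like the following. This is only a
shell sketch of the proposed semantics over a mock tree under /tmp: the
sched_domain flag file is part of the proposal, not the existing
interface, and the pids and paths are made up:

```shell
# Proposed rule (2): refuse to attach a task to a cpuset if any child
# cpuset has sched_domain set.  attach_task <pid> <cpuset-dir>
attach_task() {
    pid=$1; cs=$2
    for child in "$cs"/*/; do
        [ -f "$child/sched_domain" ] || continue
        if [ "$(cat "$child/sched_domain")" = "1" ]; then
            echo "attach to $cs refused: child has sched_domain=1" >&2
            return 22   # the proposed -EINVAL
        fi
    done
    echo "$pid" >> "$cs/tasks"
}

root=/tmp/cpuset-rule-demo
rm -rf "$root" && mkdir -p "$root/a" "$root/b"
: > "$root/tasks"
echo 1 > "$root/a/sched_domain"   # child a partitions a sched domain
echo 0 > "$root/b/sched_domain"

attach_task 100 "$root" || echo "pid 100 not attached to the top cpuset"
attach_task 101 "$root/a"         # a has no sched_domain children: ok
```

So attaching into the top cpuset is refused (child a defines a sched
domain partition), while attaching into a leaf still works.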

Clearly one issue remains: tasks that are already running in the top
cpuset. Unless these are manually moved down to the correct place in the
cpuset hierarchy, they will continue to have the problem described
above. I don't have a simple solution for this at the moment, other than
documenting it. But on smaller systems this should be a fairly easy task
for administrators who really know what they are doing, and the fact
that we have a separate flag to indicate sched domain partitioning
should make it harder for them to shoot themselves in the foot.
Maybe there are better ways to resolve this?
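
The manual move I mean above is the usual loop over the top cpuset's
tasks file. A sketch, again with mock files under /tmp standing in for
/dev/cpuset and made-up pids; on a real system each write to a tasks
file moves exactly one pid (rather than appending, so the >> here is
only for the mock), and some pids such as kernel threads may refuse to
move:

```shell
# Drain every pid from the top cpuset into child cpuset a.
top=/tmp/cpuset-move-demo
rm -rf "$top" && mkdir -p "$top/a"
printf '%s\n' 100 101 102 > "$top/tasks"   # made-up pids in the top cpuset

while read -r pid; do
    # Real interface: echo "$pid" > /dev/cpuset/a/tasks  (one pid per write)
    echo "$pid" >> "$top/a/tasks"
done < "$top/tasks"
: > "$top/tasks"    # in the mock, the pids have now "moved" down
```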

One point I would argue against is completely decoupling cpusets and
sched domains. We do need a way to partition sched domains, and doing it
along cpuset boundaries seems the most logical. It is also much simpler
in terms of the additional lines of code needed to support the feature,
compared to adding a whole new API just for this.

	-Dinakar


Thread overview: 34+ messages
2006-10-19  9:23 [RFC] cpuset: remove sched domain hooks from cpusets Paul Jackson
2006-10-19 10:24 ` Nick Piggin
2006-10-19 19:03   ` Paul Jackson
2006-10-19 19:21     ` Nick Piggin
2006-10-19 19:50       ` Martin Bligh
2006-10-20  0:14         ` Paul Jackson
2006-10-20 16:03         ` Nick Piggin
2006-10-20 17:29           ` Siddha, Suresh B
2006-10-20 19:19             ` Paul Jackson
2006-10-20 19:00           ` Paul Jackson
2006-10-20 20:30             ` Dinakar Guniguntala [this message]
2006-10-20 21:41               ` Paul Jackson
2006-10-20 22:35                 ` Dinakar Guniguntala
2006-10-20 23:14                   ` Siddha, Suresh B
2006-10-21  5:37                     ` Paul Jackson
2006-10-23  4:31                       ` Siddha, Suresh B
2006-10-23  5:59                         ` Paul Jackson
2006-10-21 23:05                     ` Paul Jackson
2006-10-22 12:02                   ` Paul Jackson
2006-10-23  3:09                     ` Paul Jackson
2006-10-20 21:46               ` Paul Jackson
2006-10-21 18:23         ` Paul Menage
2006-10-21 20:55           ` Paul Jackson
2006-10-21 20:59             ` Paul Menage
2006-10-22 10:51         ` Paul Jackson
2006-10-23  5:26           ` Siddha, Suresh B
2006-10-23  5:54             ` Paul Jackson
2006-10-23  5:43               ` Siddha, Suresh B
2006-10-23  6:02               ` Nick Piggin
2006-10-23  6:16                 ` Paul Jackson
2006-10-23 16:03                 ` Christoph Lameter
2006-11-09 10:59                   ` Paul Jackson
2006-10-23 16:01               ` Christoph Lameter
