public inbox for linux-kernel@vger.kernel.org
From: "Gregory Haskins" <ghaskins@novell.com>
To: "Paul Jackson" <pj@sgi.com>
Cc: <a.p.zijlstra@chello.nl>, <mingo@elte.hu>,
	<dmitry.adamushko@gmail.com>, <rostedt@goodmis.org>,
	<menage@google.com>, <rientjes@google.com>, <tong.n.li@intel.com>,
	<tglx@linutronix.de>, <akpm@linux-foundation.org>,
	<dhaval@linux.vnet.ibm.com>, <vatsa@linux.vnet.ibm.com>,
	<sgrubb@redhat.com>, <linux-kernel@vger.kernel.org>,
	<ebiederm@xmission.com>, <nickpiggin@yahoo.com.au>
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing
Date: Tue, 29 Jan 2008 09:42:16 -0700	[thread overview]
Message-ID: <479F1118.BA47.005A.0@novell.com> (raw)
In-Reply-To: <20080129102836.be614579.pj@sgi.com>

>>> On Tue, Jan 29, 2008 at 11:28 AM, in message
<20080129102836.be614579.pj@sgi.com>, Paul Jackson <pj@sgi.com> wrote: 
> Gregory wrote:
>>   I am a bit confused as to why you disable load-balancing in the
>>   RT cpuset?  It shouldn't be strictly necessary in order for the
>>   RT scheduler to do its job (unless I am misunderstanding what you
>>   are trying to accomplish?).  Do you do this because you *have*
>>   to in order to make real-time deadlines, or because it's just a
>>   further optimization?
> 
> My primary motivation for cpusets originally, and for the
> sched_load_balance flag now, was not realtime, but "soft partitioning"
> of big NUMA systems, especially for batch schedulers.  They sometimes
> have large cpusets which are only being used to hold smaller, per-job,
> cpusets.  It is a waste of time (CPU cycles in the kernel sched code)
> to load balance those large cpusets.  Load balancing doesn't scale
> easily to high CPU counts, and it's nice to avoid doing that where
> not needed.

Understood, and that makes tons of sense.
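(Purely for illustration, and not something spelled out in Paul's mail: the batch-scheduler layout he describes might be configured roughly as below using the sched_load_balance flag from the patch he references. The paths, CPU/node ranges, and file names are assumptions based on the legacy cpuset-only mount and may differ across kernel versions.)

```shell
# Mount the cpuset pseudo-filesystem (legacy, pre-cgroup-unification style).
mount -t cpuset none /dev/cpuset

# A large "batch" cpuset that only exists to hold per-job children;
# disable load balancing so the kernel never tries to balance across
# the whole (large) set of member CPUs.
mkdir /dev/cpuset/batch
echo 4-63 > /dev/cpuset/batch/cpus
echo 0-3  > /dev/cpuset/batch/mems              # illustrative node list
echo 0    > /dev/cpuset/batch/sched_load_balance

# A per-job child cpuset; each small set can still be balanced internally.
mkdir /dev/cpuset/batch/job1
echo 4-11 > /dev/cpuset/batch/job1/cpus
echo 0    > /dev/cpuset/batch/job1/mems
echo 1    > /dev/cpuset/batch/job1/sched_load_balance
```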

> 
> See the following lkml message for a fuller explanation:
> 
>   http://lkml.org/lkml/2008/1/29/85
> 
> As a secondary motivation, I thought that disabling load balancing on
> the RT cpuset was the right thing to do for RT needs, but I make no
> claim to knowing much about RT.

Well, I make no claim to understand the large batch systems you work on either ;)  Everything you said made a ton of sense other than the RT/load-balance thing, but I think we are on the same page now.

> 
> I just now realized that you added a 'root_domain' in a patch in
> late Nov and early Dec.   I was on the road then, moving from
> California to Texas, and not paying much attention to Linux.

np (though I was wondering why you had no comment before ;)

> 
> A couple of questions on that patch, both involving a comment it adds
> to kernel/sched.c:
> 
> /*
>  * We add the notion of a root-domain which will be used to define per-domain
>  * variables. Each exclusive cpuset essentially defines an island domain by
>  * fully partitioning the member cpus from any other cpuset. Whenever a new
>  * exclusive cpuset is created, we also create and attach a new root-domain
>  * object.
>  */
> 
> 1) What are 'per-domain' variables?

s/per-domain/per-root-domain

> 
> 2) The mention of 'exclusive cpuset' is no longer correct.
> 
>    With the patch 'remove sched domain hooks from cpusets' cpusets
>    no longer defines sched domains using the cpu_exclusive flag.
> 
>    With the subsequent sched_load_balance patch (see
>    http://lkml.org/lkml/2007/10/6/19) cpusets uses a new per-cpuset
>    flag 'sched_load_balance' to define sched domains.

Doh!  Thanks for the heads up.  

> 
> The following revised comment might be more accurate:
> 
> /*
>  * We add the notion of a root-domain which will be used to define per-domain
>  * variables.  Each non-overlapping sched domain defines an island domain by
>  * fully partitioning the member cpus from any other cpuset. Whenever such a
>  * new sched domain is created, we also create and attach a new root-domain
>  * object.  These non-overlapping sched domains are determined by the cpuset
>  * configuration, via a call to partition_sched_domains().
>  */
> 
> It sounds like you (Gregory, others) want your RT CPUs to be in a sched
> domain, unlike the current way things are, where my cpuset code
> carefully avoids setting up a sched domain for those CPUs.  However I
> still have need, in the batch scheduler case explained above, to have
> some CPUs not in any sched domain.
> 
> If you require these RT sched domains to be setup differently somehow,
> in some way that is visible to partition_sched_domains, then that
> apparently means we need a per-cpuset flag to mark those RT cpusets.

I think we only need a plain-vanilla partition, so no flags should be necessary.

-Greg

> 
> If you just want an ordinary sched domain setup (just so long as it
> contains only the intended RT CPUs, not others) then I guess we don't
> technically need any more per-cpuset flags, but I'm worried, because
> the API we're presenting to users for this has just gone from subtle to
> bizarre.  I suspect I'll want to add a flag anyway, if by doing so, I
> can make the kernel-user API, via cpusets, easier to understand.



