From: "Gregory Haskins" <ghaskins@novell.com>
To: "Paul Jackson" <pj@sgi.com>
Cc: <a.p.zijlstra@chello.nl>, <mingo@elte.hu>,
<dmitry.adamushko@gmail.com>, <rostedt@goodmis.org>,
<menage@google.com>, <rientjes@google.com>, <tong.n.li@intel.com>,
<tglx@linutronix.de>, <akpm@linux-foundation.org>,
<dhaval@linux.vnet.ibm.com>, <vatsa@linux.vnet.ibm.com>,
<sgrubb@redhat.com>, <linux-kernel@vger.kernel.org>,
<ebiederm@xmission.com>, <nickpiggin@yahoo.com.au>
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing
Date: Tue, 29 Jan 2008 09:42:16 -0700
Message-ID: <479F1118.BA47.005A.0@novell.com>
In-Reply-To: <20080129102836.be614579.pj@sgi.com>
>>> On Tue, Jan 29, 2008 at 11:28 AM, in message
<20080129102836.be614579.pj@sgi.com>, Paul Jackson <pj@sgi.com> wrote:
> Gregory wrote:
>> I am a bit confused as to why you disable load-balancing in the
>> RT cpuset? It shouldn't be strictly necessary in order for the
>> RT scheduler to do its job (unless I am misunderstanding what you
>> are trying to accomplish?). Do you do this because you *have*
>> to in order to make real-time deadlines, or because it's just a
>> further optimization?
>
> My primary motivation for cpusets originally, and for the
> sched_load_balance flag now, was not realtime, but "soft partitioning"
> of big NUMA systems, especially for batch schedulers. They sometimes
> have large cpusets which are only being used to hold smaller, per-job,
> cpusets. It is a waste of time (CPU cycles in the kernel sched code)
> to load balance those large cpusets. Load balancing doesn't scale
> easily to high CPU counts, and it's nice to avoid doing that where
> not needed.
Understood, and that makes tons of sense.
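For anyone following along, here is a minimal user-space sketch of the batch-scheduler case described above: turning off load balancing on a large holding cpuset. It assumes the cpuset filesystem is mounted and that each cpuset directory exposes a boolean `sched_load_balance` file (with a cgroup-style mount the file may carry a `cpuset.` prefix instead). The helper name `write_cpuset_flag` is hypothetical, not an existing API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a 0/1 value to a per-cpuset flag file such
 * as "sched_load_balance". Returns 0 on success, -1 on failure.
 */
int write_cpuset_flag(const char *cpuset_dir, const char *flag, int value)
{
    char path[512];
    FILE *f;
    int n;

    assert(value == 0 || value == 1);   /* the flag is boolean */

    n = snprintf(path, sizeof(path), "%s/%s", cpuset_dir, flag);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path would have been truncated */

    f = fopen(path, "w");
    if (!f)
        return -1;                      /* no such cpuset, or no permission */

    if (fprintf(f, "%d\n", value) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A batch scheduler might call `write_cpuset_flag("/dev/cpuset/batch", "sched_load_balance", 0)` on the large parent and leave the flag at 1 in the smaller per-job cpusets; the `/dev/cpuset/batch` path is an assumed example, not something taken from this thread.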
>
> See the following lkml message for a fuller explanation:
>
> http://lkml.org/lkml/2008/1/29/85
>
> As a secondary motivation, I thought that disabling load balancing on
> the RT cpuset was the right thing to do for RT needs, but I make no
> claim to knowing much about RT.
Well, I make no claim to understand the large batch systems you work on
either ;) Everything you said made a ton of sense other than the
RT/load-balance thing, but I think we are on the same page now.
>
> I just now realized that you added a 'root_domain' in a patch in
> late Nov and early Dec. I was on the road then, moving from
> California to Texas, and not paying much attention to Linux.
np (though I was wondering why you had no comment before ;)
>
> A couple of questions on that patch, both involving a comment it adds
> to kernel/sched.c:
>
> /*
> * We add the notion of a root-domain which will be used to define per-domain
> * variables. Each exclusive cpuset essentially defines an island domain by
> * fully partitioning the member cpus from any other cpuset. Whenever a new
> * exclusive cpuset is created, we also create and attach a new root-domain
> * object.
> */
>
> 1) What are 'per-domain' variables?
s/per-domain/per-root-domain/
>
> 2) The mention of 'exclusive cpuset' is no longer correct.
>
> With the patch 'remove sched domain hooks from cpusets' cpusets
> no longer defines sched domains using the cpu_exclusive flag.
>
> With the subsequent sched_load_balance patch (see
> http://lkml.org/lkml/2007/10/6/19) cpusets uses a new per-cpuset
> flag 'sched_load_balance' to define sched domains.
Doh! Thanks for the heads up.
>
> The following revised comment might be more accurate:
>
> /*
> * We add the notion of a root-domain which will be used to define per-domain
> * variables. Each non-overlapping sched domain defines an island domain by
> * fully partitioning the member cpus from any other cpuset. Whenever
> * such a sched domain is created, we also create and attach a new
> * root-domain object. These non-overlapping sched domains are determined
> * by the cpuset configuration, via a call to partition_sched_domains().
> */
>
> It sounds like you (Gregory, others) want your RT CPUs to be in a sched
> domain, unlike the current way things are, where my cpuset code
> carefully avoids setting up a sched domain for those CPUs. However I
> still have need, in the batch scheduler case explained above, to have
> some CPUs not in any sched domain.
>
> If you require these RT sched domains to be set up differently somehow,
> in some way that is visible to partition_sched_domains, then that
> apparently means we need a per-cpuset flag to mark those RT cpusets.
I think we only need a plain-vanilla partition, so no flags should be necessary.
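To make "plain-vanilla partition" a bit more concrete, here is a toy user-space model, not the kernel code. It assumes each cpuset is just a CPU bitmask plus a sched_load_balance flag, and that overlapping load-balanced cpusets get merged until the remaining masks are pairwise disjoint. Each resulting mask stands for one sched domain (and so one root-domain); CPUs that appear in no load-balanced cpuset land in no domain at all. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cpuset: bit i of cpus set means CPU i is a member. */
struct cpuset_model {
    uint64_t cpus;
    int sched_load_balance;
};

/*
 * Merge overlapping load-balanced cpusets into pairwise-disjoint CPU masks.
 * Stores the masks in out[] (capacity max_out) and returns how many there are.
 */
size_t build_partitions(const struct cpuset_model *cs, size_t n,
                        uint64_t *out, size_t max_out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        /* Cpusets with the flag cleared (or no CPUs) contribute nothing. */
        if (!cs[i].sched_load_balance || cs[i].cpus == 0)
            continue;

        uint64_t merged = cs[i].cpus;
        size_t k = 0;

        /* Absorb every existing partition that overlaps the new mask. */
        while (k < count) {
            if (out[k] & merged) {
                merged |= out[k];
                out[k] = out[--count];  /* drop slot k; re-check it next pass */
            } else {
                k++;
            }
        }

        assert(count < max_out);        /* caller must size out[] for n masks */
        out[count++] = merged;
    }
    return count;
}
```

In this model an RT cpuset that has the flag set and shares no CPUs with any other load-balanced cpuset simply comes out as its own mask, which is why I don't think an extra per-cpuset flag is needed for it.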
-Greg
>
> If you just want an ordinary sched domain setup (just so long as it
> contains only the intended RT CPUs, not others) then I guess we don't
> technically need any more per-cpuset flags, but I'm worried, because
> the API we're presenting to users for this has just gone from subtle to
> bizarre. I suspect I'll want to add a flag anyway, if by doing so, I
> can make the kernel-user API, via cpusets, easier to understand.