From: Paul Jackson <pj@sgi.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: akpm@linux-foundation.org, menage@google.com,
linux-kernel@vger.kernel.org, dino@in.ibm.com, cpw@sgi.com,
mingo@elte.hu
Subject: Re: [PATCH] cpuset and sched domains: sched_load_balance flag
Date: Sun, 30 Sep 2007 20:42:52 -0700
Message-ID: <20070930204252.45e20bb4.pj@sgi.com>
In-Reply-To: <200709301335.00441.nickpiggin@yahoo.com.au>
Nick wrote:
> Moreover, sched_load_balance doesn't really sound like a good name
> for asking for a partition.
Yup - it's not a good name for asking for a partition.
That's because it isn't asking for a partition.
It's asking for load balancing over the CPUs in the cpuset so marked.
> It's more like you're just asking to have better
> load balancing over that set,
Yup - it's asking for load balancing over that set. That is why it is
called that. There's no idea here of better or worse load balancing,
that's an internal kernel scheduler subtlety -- it's just a request that
load balancing be done.
That is what is visible to user space: whether or not tasks get moved
from overloaded CPUs to underloaded, though still allowed, CPUs.
This is visible to user space in two ways:
1) as task movement, which may or may not be what is desired, and
2) as kernel CPU cycles spent, because load balancing costs CPU cycles
that increase more than linearly with the number of CPUs being
balanced.
The user doesn't give a hoot what a 'sched domain' is. They care to
manage (1) whether their tasks might move under a load imbalance, and
(2) how many CPU cycles the kernel spends providing this service.
> You would do this by creating partitioning cpusets which carve up the
> root cpuset (basically -- have multiple roots).
You would do this with the current, single rooted cpuset (and now
cgroup) mechanism by having multiple immediate child cpusets of the
root cpuset, which partition the system CPUs. There is no need to
invent some bastardized multiple root structure.
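To make "child cpusets which partition the system CPUs" concrete, here is a minimal sketch (user-space Python, not kernel code) that checks whether a set of child CPU lists is pairwise disjoint and covers every CPU of the root. The cpuset names "batch" and "realtime" and the 8-CPU layout are made up for illustration; the "0-3,8" list syntax is the usual cpuset CPU-list format.

```python
def parse_cpu_list(text):
    """Parse a cpuset-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def is_partition(all_cpus, children):
    """Return True if the child CPU lists are pairwise disjoint and
    together cover exactly the CPUs in all_cpus."""
    seen = set()
    for cpu_list in children.values():
        cpus = parse_cpu_list(cpu_list)
        if cpus & seen:
            return False  # two children share a CPU
        seen |= cpus
    return seen == parse_cpu_list(all_cpus)


# Hypothetical 8-CPU system split between two immediate children of the root.
children = {"batch": "0-5", "realtime": "6-7"}
print(is_partition("0-7", children))
```

A layout that passes this check still has a single root cpuset; the partition lives entirely in its immediate children.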
> You can't (easily) do this now because you have so many tasks in the
> root cpuset that it is impossible to know whether or not you
> actually want to load balance them.
I don't know what proposal you are reacting to here. Clearly not this
patch that I have proposed, as it is trivially easy to indicate whether
you want to load balance the root cpuset - by setting or clearing the
'sched_load_balance' flag in the root cpuset.
How could it possibly get any more direct than that?
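As a rough illustration of how direct that is, the sketch below sets or clears the flag by writing "1" or "0" to a per-cpuset file. It assumes the flag shows up as a file named 'sched_load_balance' in the cpuset's directory, and that the cpuset filesystem is mounted somewhere like /dev/cpuset; both are assumptions here (a cgroup mount may prefix the file name, for instance). The demo runs against a scratch directory rather than a real mount, and the helper names are hypothetical.

```python
import os
import tempfile


def set_load_balance(cpuset_dir, enabled):
    """Write 1 or 0 to the (assumed) sched_load_balance file of a cpuset."""
    path = os.path.join(cpuset_dir, "sched_load_balance")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def get_load_balance(cpuset_dir):
    """Read the flag back; True means load balancing is requested."""
    path = os.path.join(cpuset_dir, "sched_load_balance")
    with open(path) as f:
        return f.read().strip() == "1"


# Scratch directory standing in for the root cpuset, so the sketch can be
# run without privileges. On a real system this would be the mount point.
demo_root = tempfile.mkdtemp()
set_load_balance(demo_root, False)   # ask for no balancing across the root
print(get_load_balance(demo_root))
```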
> Neither approach is really fundamentally more or less powerful than
> the other, but what I object to in yours is adding these flags which
> don't allow the admin to specify what they want, but to specify how they
> want it done.
My approach doesn't do that - perhaps we aren't communicating.
We are in complete agreement that the admin should specify what they
want, and leave it to the kernel to figure out how to do it.
> Rather than require the admin to know the intricate details about
> how and why the scheduler load balancing gets broken, and when they
> might or might not need to use this flag, they can just specify what they
> want to be done, and the kernel can choose the optimal strategy.
Excellent -- I'm glad you like my approach </sarcasm>
> No, I'm insisting that *no* single administrative point of control
> determines the sched domains. Not directly. The kernel should.
> cpusets API should be rich enough that the kernel can derive this
> information from what the admin has intended.
We are in complete agreement in insisting on this.
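To show what "the kernel derives it from what the admin asked for" could look like, here is a toy model in Python. It is not the code from the patch, only one plausible derivation under a stated assumption: every cpuset with its load balance flag set wants its CPUs balanced together, so flagged cpusets that share CPUs are merged into one group, and CPUs covered by no flagged cpuset are left unbalanced. The function name and the example layout are hypothetical.

```python
def balance_domains(cpusets):
    """Merge the CPU sets of all load-balanced cpusets that overlap.

    cpusets: iterable of (cpus, load_balance) pairs, cpus being a set of ints.
    Returns a list of pairwise disjoint frozensets, ordered by lowest CPU.
    """
    domains = []  # invariant: pairwise disjoint sets
    for cpus, load_balance in cpusets:
        if not load_balance or not cpus:
            continue
        merged = set(cpus)
        remaining = []
        for d in domains:
            if d & merged:
                merged |= d          # overlaps: fold into the new group
            else:
                remaining.append(d)  # untouched group
        domains = remaining + [merged]
    return sorted((frozenset(d) for d in domains), key=min)


# Hypothetical layout: root flag cleared, two overlapping flagged children,
# and a real-time child on CPUs 6-7 with the flag cleared.
layout = [
    (set(range(8)), False),   # root cpuset
    ({0, 1, 2, 3}, True),
    ({2, 3, 4, 5}, True),
    ({6, 7}, False),
]
print(balance_domains(layout))
```

In this model the admin only says which cpusets want balancing; the grouping of CPUs 0-5 and the exclusion of CPUs 6-7 fall out of that, without the admin ever naming a sched domain.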
In short:
The kernel scheduler's dynamic sched domains are --not-- the service
being provided to the user. "Sched domains" are just the kernel
internal mechanism.
The service being provided is dynamic load balancing of tasks from
overloaded CPUs to underloaded CPUs.
Some users will want to disable load balancing on some cpusets, because
either:
(1) it's too expensive to balance really large cpusets unless really
needed, or
(2) real time users don't want to waste the CPU cycles doing
balancing even on small cpusets.
If you think I repeated everything two or three times above ... good,
you're right - I did.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401