From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Paul Jackson <pj@sgi.com>
Cc: akpm@linux-foundation.org, menage@google.com,
linux-kernel@vger.kernel.org, dino@in.ibm.com, cpw@sgi.com,
mingo@elte.hu
Subject: Re: [PATCH] cpuset and sched domains: sched_load_balance flag
Date: Sun, 30 Sep 2007 13:34:59 +1000
Message-ID: <200709301335.00441.nickpiggin@yahoo.com.au>
In-Reply-To: <20070930110753.19b03388.pj@sgi.com>
On Monday 01 October 2007 04:07, Paul Jackson wrote:
> Nick wrote:
> > The user should just be able to specify exactly the partitioning of
> > tasks required, and cpusets should ask the scheduler to do the best
> > job of load balancing possible.
>
> If the cpusets which have 'sched_load_balance' enabled are disjoint
> (their 'cpus' cpus_allowed masks don't overlap) then you get exactly
> what you're asking for. In that case there is exactly one sched domain
> for the 'cpus' allowed by each cpuset that has sched_load_balance
> enabled.
But you could do that just by having the current cpuset scheme able
to properly partition the system. You can't (easily) do this now because
you have so many tasks in the root cpuset that it is impossible to know
whether or not you actually want to load balance them.
You would do this by creating partitioning cpusets which carve up the
root cpuset (basically -- have multiple roots).
> But there is another case in which one does not want what you ask for.
>
> That case involves the situation where one is running a third party
> batch scheduler on part of one's big system, and doing other stuff
> (perhaps Ingo's realtime stuff) on another part of the system.
In this case the admin would simply not partition the system (they
would retain a single root cpuset).
Neither approach is really fundamentally more or less powerful than
the other, but what I object to in yours is adding these flags, which
don't let the admin specify what they want, only how they want it
done.
Moreover, sched_load_balance doesn't really sound like a good name
for asking for a partition. It's more like you're just asking to have better
load balancing over that set, which you could equally achieve by adding
a second set of sched domains (and the global domains could keep
globally balancing).
Basically: the admin doesn't know best when it comes to how the
scheduler should work; the admin knows best about how they intend
the system to be used.
> In that case, the system admin will be advised to turn off
> sched_load_balance on the top cpuset. But in that case the system
> admin will -not- know from moment to moment what jobs the batch
> scheduler is running on the cpus assigned to its control. Only the
> batch scheduler knows that.
>
> The batch scheduler is code that was written by someone else, in
> some other company, some other time. That code does not get to
> control the overall sched domain partitioning of the entire system.
> The batch scheduler gets to say, in effect:
>
> Here's where I need load balancing to occur, in the normal fashion,
> and here's where I don't need it.
Rather than require the admin to know the intricate details about
how and why the scheduler load balancing gets broken, and when they
might or might not need to use this flag, they can just specify what they
want to be done, and the kernel can choose the optimal strategy.
> In short, you are insisting that only a single administrative point of
> control determine the system's sched domains. Sometimes that fits
> the way the system is managed, and my patch lets you do that. But
> sometimes this is a shared responsibility, between a piece of third
> party software and the system admin, and my patch allows for that
> case as well.
>
> This is a typical sort of situation that arises from having hierarchical
> cpuset definitions, and highlights the reason (and the use case,
> involving third party batch schedulers) that I went with a hierarchical
> cpuset architecture in the first place.
No, I'm insisting that *no* single administrative point of control
determines the sched domains. Not directly. The kernel should.
The cpusets API should be rich enough that the kernel can derive this
information from what the admin has intended.