From: Paul Jackson <pj@sgi.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: akpm@linux-foundation.org, menage@google.com,
linux-kernel@vger.kernel.org, dino@in.ibm.com, cpw@sgi.com,
mingo@elte.hu
Subject: Re: [PATCH] cpuset and sched domains: sched_load_balance flag
Date: Wed, 3 Oct 2007 04:38:18 -0700
Message-ID: <20071003043818.baa0e1ce.pj@sgi.com>
In-Reply-To: <200710030356.18090.nickpiggin@yahoo.com.au>
> OK, so to really do anything different (from a non-partitioned setup),
> you would need to set sched_load_balance=0 for the root cpuset?
Yup - exactly. In fact one code fragment in my patch highlights this:
	/* Special case for the 99% of systems with one, full, sched domain */
	if (is_sched_load_balance(&top_cpuset)) {
		ndoms = 1;
		doms = kmalloc(sizeof(cpumask_t), GFP_KERNEL);
		*doms = top_cpuset.cpus_allowed;
		goto rebuild;
	}
This code says: if the top cpuset is load balanced, you've got one
big fat sched domain covering all (nonisolated) CPUs - end of story.
None of the other 'sched_load_balance' flags matter in this case.
Logically, the above code fragment is not needed. Without it, the
code would still do the same thing, just wasting more CPU cycles doing
it.
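To make that concrete, here is a rough userspace sketch of the idea, not the kernel code from the patch. The struct cpuset_model type, the build_sched_domains() name and the 64-bit CPU masks are all made up for illustration. The general path seeds one domain per load balanced cpuset and merges any that overlap; the shortcut at the top only skips that work when the top cpuset is balanced.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified model: a cpuset is a CPU bitmask plus its flag. */
struct cpuset_model {
	uint64_t cpus_allowed;		/* bit i set => CPU i is in this cpuset */
	int sched_load_balance;		/* non-zero => balance across these CPUs */
};

/*
 * Fill doms[] with the sched domain masks implied by the flags and
 * return how many were written.  sets[0] is the top cpuset, and doms
 * must have room for n entries.
 */
size_t build_sched_domains(const struct cpuset_model *sets, size_t n,
			   uint64_t *doms)
{
	size_t ndoms = 0;
	int merged;

	if (n == 0)
		return 0;

	/* Shortcut: a balanced top cpuset means one domain over everything. */
	if (sets[0].sched_load_balance) {
		doms[0] = sets[0].cpus_allowed;
		return 1;
	}

	/* General path: every balanced, non-empty cpuset seeds a domain. */
	for (size_t i = 0; i < n; i++)
		if (sets[i].sched_load_balance && sets[i].cpus_allowed)
			doms[ndoms++] = sets[i].cpus_allowed;

	/* Merge domains that share a CPU until all are pairwise disjoint. */
	do {
		merged = 0;
		for (size_t i = 0; i < ndoms && !merged; i++) {
			for (size_t j = i + 1; j < ndoms; j++) {
				if (doms[i] & doms[j]) {
					doms[i] |= doms[j];
					doms[j] = doms[--ndoms];
					merged = 1;
					break;
				}
			}
		}
	} while (merged);

	return ndoms;
}
```

In this model, deleting the shortcut changes nothing about the result: a balanced top cpuset would seed a domain that overlaps every other one, and the merge loop would fold them all into it.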
> Suppose you do that to hard partition the machine, what happens to
> newly created tasks like kernel threads or things that aren't in a
> cpuset?
Well ... --every-- task is in a cpuset, always. Newly created tasks
start in the cpuset of their parent. Grep for 'the_top_cpuset_hack'
in kernel/cpuset.c to see the lengths to which we go to ensure that
current->cpuset always resolves somewhere.
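If you want to see this from user space, each task's cpuset is reported in /proc/<pid>/cpuset. The helper below is only a sketch; read_task_cpuset() is a name I made up, and it takes the file path as an argument so it can be pointed at any task.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the cpuset path of a task from a /proc/<pid>/cpuset style file.
 * Returns 0 on success and -1 if the file cannot be read.
 */
int read_task_cpuset(const char *procfile, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(procfile, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* drop the trailing newline */
	return 0;
}
```

Called with "/proc/self/cpuset" before and after a fork(), parent and child should report the same path, since the child starts in its parent's cpuset.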
The usual case on the big systems that I care about the most is
that we move (almost) every task out of the top cpuset, into smaller
cpusets, because we don't want some random thread intruding on the
CPUs dedicated to a particular job. The only threads left in the root
cpuset are pinned kernel threads, such as for thread migration, per-cpu
irq handlers and various per-cpu and per-node disk and file flushers
and such. These threads aren't going anywhere, regardless. But no
thread that is willing to run anywhere is left free to run anywhere.
I will advise my third party batch scheduler developers to turn off
sched_load_balance on their main cpuset, and on any big "holding tank"
cpusets they have which hold only inactive jobs. This way, on big
systems that are managed to optimize for this, the kernel scheduler
won't waste time load balancing the batch scheduler's big cpusets that
don't need it. With the 'sched_load_balance' flag defined the way
it is, the batch scheduler won't have to make system-wide decisions
as to sched domain partitioning. They can just make local 'advisory'
markings on particular cpusets that (1) are or might be big, and (2)
don't hold any active tasks that might need load balancing. The system
will take it from there, providing the finest granularity sched domain
partitioning that will accomplish that.
I will advise the system admins of bigger systems to turn off
sched_load_balance on the top cpuset, as part of the above work
routinely done to get all non-pinned tasks out of the top cpuset.
I will advise the real time developers using cpusets to: (1) turn off
sched_load_balance on their real time cpusets, and (2) insist that
the sys admins using their products turn off sched_load_balance on
the top cpuset, to ensure the expected realtime performance is obtained.
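In practice, all of the advice above amounts to writing 0 to the flag file in the relevant cpuset directory. A minimal sketch follows, assuming the flag appears as a file named sched_load_balance in each directory of the mounted cpuset filesystem (for example under /dev/cpuset, if that is where it is mounted). The set_sched_load_balance() helper is hypothetical, not part of any library.

```c
#include <stdio.h>

/*
 * Write the sched_load_balance flag of one cpuset directory.
 * cpuset_dir is the cpuset's directory in the mounted cpuset
 * filesystem (assumed layout, e.g. "/dev/cpuset" for the top cpuset).
 * Returns 0 on success, -1 on error.
 */
int set_sched_load_balance(const char *cpuset_dir, int enable)
{
	char path[4096];
	FILE *f;
	int n, rc;

	n = snprintf(path, sizeof path, "%s/sched_load_balance", cpuset_dir);
	if (n < 0 || (size_t)n >= sizeof path)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	rc = (fputs(enable ? "1\n" : "0\n", f) == EOF) ? -1 : 0;
	if (fclose(f) != 0)	/* write errors may only show up here */
		rc = -1;
	return rc;
}
```

A batch scheduler would call this with 0 on its main cpuset and its holding tanks, and a sys admin would call it with 0 on the top cpuset once the non-pinned tasks have been moved out.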
Most systems, even medium size ones (for some definition of medium,
perhaps dozens of CPUs?) so long as they aren't running realtime on
some CPUs, can just run with the default - one big fat load balanced
sched domain ... unless of course they have some other need not
considered above.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401