From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: linux-kernel <linux-kernel@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>, vatsa <vatsa@linux.vnet.ibm.com>,
Dhaval Giani <dhaval@linux.vnet.ibm.com>,
Paul Jackson <pj@sgi.com>, Nick Piggin <nickpiggin@yahoo.com.au>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Andrew Morton <akpm@linux-foundation.org>,
Steve Grubb <sgrubb@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Gregory Haskins <ghaskins@novell.com>,
Dmitry Adamushko <dmitry.adamushko@gmail.com>,
"Li, Tong N" <tong.n.li@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Paul Menage <menage@google.com>,
David Rientjes <rientjes@google.com>
Subject: scheduler scalability - cgroups, cpusets and load-balancing
Date: Tue, 29 Jan 2008 10:53:48 +0100
Message-ID: <1201600428.28547.87.camel@lappy>
Hi All,
Some of the fancy new scheduler features such as the cgroup load
balancer (load_balance_monitor) and the real-time load balancer are a
bit of a scalability issue. They all seem to want a rather strong
global bound to maintain global fairness (which is quite understandable).
[ my own interest is currently real-time group scheduling on multiple
cpus, and that seems to require _very_ strong bounds ]
I think the current stuff would scale up to 8, maybe 16 cpus, but beyond
that I'd be really worried.
Now we want distributions to enable most of these features. Distros seem
to want containers, but distros also need to support 128+ cpu machines,
so how are we going to solve this?
My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
cpusets are related, in that cpusets provide a property to a cgroup.
However, load_balance_monitor()'s interaction with sched domains
confuses me - it might DTRT, but I can't tell.
[ It looks to me like it balances a group over the largest SD the current cpu
has access to, even though that might be larger than the SD associated
with the cpuset of that particular cgroup. ]
Also, the RT load balancer needs to become aware of these sets. I
think Paul J and Steven once talked about it, but I can't quite remember
where that ended. From my POV there should be sched-domain based balance
information, not global.
By cutting the problem into smaller pieces, and adding tunables to
weaken the global fairness, I think we can give administrators enough
freedom to make use of these features, even on the largest of machines.
[ so I'd move the load_balance_monitor() tunables into cpusets as well,
I can imagine a smaller cpuset wanting a stronger fairness than a much
larger cpuset. ]
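To illustrate what "cutting the problem into smaller pieces" could mean, here
is a toy model in Python. It is a sketch only, not kernel code: the function
name disjoint_balance_domains() is made up for this example, and cpusets are
modelled as plain sets of cpu ids. Overlapping cpusets are merged until the
remaining partitions are pairwise disjoint, so each partition could carry its
own balance information instead of sharing a global one.

```python
def disjoint_balance_domains(cpusets):
    """Merge overlapping cpusets into pairwise-disjoint cpu partitions.

    Toy model: each cpuset is a set of cpu ids. The result is a list of
    disjoint sets, ordered by their lowest cpu id.
    """
    domains = []  # invariant: the sets in here never overlap each other
    for cpus in cpusets:
        if not cpus:
            continue  # an empty cpuset contributes nothing
        merged = set(cpus)
        untouched = []
        for dom in domains:
            if dom & merged:
                merged |= dom  # overlap: fold this domain into the new one
            else:
                untouched.append(dom)
        domains = untouched + [merged]
    return sorted(domains, key=min)


def balance_scope(cpu, domains):
    """Return the partition a cpu would balance within, or None."""
    for dom in domains:
        if cpu in dom:
            return dom
    return None


if __name__ == "__main__":
    sets = [{0, 1, 2, 3}, {2, 3, 4, 5}, set(range(8, 16))]
    doms = disjoint_balance_domains(sets)
    for dom in doms:
        print(sorted(dom))
    print(balance_scope(9, doms) == set(range(8, 16)))
```

In this model a cpu in the 8-15 partition never needs to look at cpus 0-5
when balancing, which is the kind of bound I would like to see, with the
fairness tunables attached per partition rather than globally.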
I understand it's a somewhat hand-wavy email, but I wanted to start a
discussion on the issue, or have someone show me I'm wrong so I can stop
worrying :-).
Peter