public inbox for linux-kernel@vger.kernel.org
* scheduler scalability - cgroups, cpusets and load-balancing
@ 2008-01-29  9:53 Peter Zijlstra
  2008-01-29 10:01 ` Paul Jackson
  2008-01-29 10:57 ` Peter Zijlstra
  0 siblings, 2 replies; 36+ messages in thread
From: Peter Zijlstra @ 2008-01-29  9:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, vatsa, Dhaval Giani, Paul Jackson, Nick Piggin,
	Eric W. Biederman, Andrew Morton, Steve Grubb, Steven Rostedt,
	Gregory Haskins, Dmitry Adamushko, Li, Tong N, Thomas Gleixner,
	Paul Menage, David Rientjes

Hi All,

Some of the fancy new scheduler features such as the cgroup load
balancer (load_balance_monitor) and the real-time load balancer are a
bit of a scalability issue. They all seem to want a rather strong
global bound to keep a global fairness (which is quite understandable).

[ my own interest is currently real-time group scheduling on multiple
  cpus, and that seems to require _very_ strong bounds ]

I think the current stuff would scale up to 8 maybe 16 cpus, but after
that I'd be real worried.

Now we want distributions to enable most of these features. Distros seem
to want containers, but distros also need to support 128+ cpu machines,
so how are we going to solve this?

My thoughts were to make stronger use of disjoint cpu-sets. cgroups and
cpusets are related, in that cpusets provide a property to a cgroup.
However, load_balance_monitor()'s interaction with sched domains
confuses me - it might DTRT, but I can't tell.

[ It looks to me like it balances a group over the largest SD the current cpu
  has access to, even though that might be larger than the SD associated
  with the cpuset of that particular cgroup. ]
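
The concern in the bracketed note above can be illustrated with a toy
model. This is only a sketch under stated assumptions, not kernel code:
cpu_mask_t, struct toy_domain, struct toy_group, balance_span() and
mask_weight() are all hypothetical names invented here, and CPU sets are
modelled as plain 64-bit masks. It shows one way a group balance pass
could be limited to the CPUs its cpuset actually allows, rather than the
whole span of the sched domain.

```c
#include <stdint.h>

/* Toy CPU mask: bit n set means CPU n is in the set (at most 64 CPUs). */
typedef uint64_t cpu_mask_t;

/* Hypothetical stand-in for a sched domain: only its CPU span. */
struct toy_domain {
	cpu_mask_t span;
};

/* Hypothetical stand-in for a cgroup whose cpuset restricts its CPUs. */
struct toy_group {
	cpu_mask_t cpus_allowed;
};

/*
 * CPUs a group balance pass would walk in this sketch: the domain's
 * span clamped to the CPUs the group's cpuset permits.
 */
static cpu_mask_t balance_span(const struct toy_domain *sd,
			       const struct toy_group *grp)
{
	return sd->span & grp->cpus_allowed;
}

/* Count the CPUs in a mask by clearing the lowest set bit each step. */
static int mask_weight(cpu_mask_t m)
{
	int n = 0;

	while (m) {
		m &= m - 1;
		n++;
	}
	return n;
}
```

With a 16-cpu domain (span 0xFFFF) and a group confined to cpus 4-7
(cpus_allowed 0x00F0), balance_span() yields 0x00F0, so the pass would
touch 4 cpus instead of 16.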

Also the RT load-balance needs to become aware of these sets. I
think Paul J and Steven once talked about it, but I can't quite remember
where that ended. From my POV there should be sched-domain based balance
information, not global.

By cutting the problem into smaller pieces, and adding tunables to
weaken the global fairness, I think we can give administrators enough
freedom to make use of these features, even on the largest of machines.

[ so I'd move the load_balance_monitor() tunables into cpusets as well,
  I can imagine a smaller cpuset wanting a stronger fairness than a much
  larger cpuset. ]

I understand it's a somewhat hand-wavey email, but I wanted to start
discussion on the issue, or have someone show me I'm wrong and can stop
worrying :-).

Peter



Thread overview: 36+ messages
2008-01-29  9:53 scheduler scalability - cgroups, cpusets and load-balancing Peter Zijlstra
2008-01-29 10:01 ` Paul Jackson
2008-01-29 10:50   ` Peter Zijlstra
2008-01-29 11:13     ` Paul Jackson
2008-01-29 11:31       ` Peter Zijlstra
2008-01-29 11:53         ` Paul Jackson
2008-01-29 12:07           ` Peter Zijlstra
2008-01-29 12:36             ` Paul Jackson
2008-01-29 12:03         ` Paul Jackson
2008-01-29 12:30           ` Peter Zijlstra
2008-01-29 12:52             ` Paul Jackson
2008-01-29 13:38               ` Peter Zijlstra
2008-01-29 10:57 ` Peter Zijlstra
2008-01-29 11:30   ` Paul Jackson
2008-01-29 11:34     ` Paul Jackson
2008-01-29 11:50     ` Peter Zijlstra
2008-01-29 12:12       ` Paul Jackson
2008-01-29 15:57         ` Gregory Haskins
2008-01-29 16:33           ` Paul Jackson
2008-01-29 15:50       ` Gregory Haskins
2008-01-29 16:51         ` Paul Jackson
2008-01-29 17:21           ` Gregory Haskins
2008-01-29 19:04             ` Paul Jackson
2008-01-29 20:36               ` Gregory Haskins
2008-01-29 21:02                 ` Paul Jackson
2008-01-29 21:07                   ` Gregory Haskins
2008-01-29 15:36     ` Gregory Haskins
2008-01-29 16:28       ` Paul Jackson
2008-01-29 16:42         ` Gregory Haskins
2008-01-29 19:37           ` Paul Jackson
2008-01-29 20:28             ` Gregory Haskins
2008-01-29 20:56               ` Paul Jackson
2008-01-29 21:02                 ` Gregory Haskins
2008-01-29 22:23                   ` Steven Rostedt
2008-01-29 12:32   ` Srivatsa Vaddagiri
2008-01-29 12:21     ` Paul Jackson
