public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 0/3] sched: core-balancer v2
@ 2008-05-15 18:35 Gregory Haskins
  2008-05-15 18:35 ` [RFC PATCH 1/3] sched: create sched_balancer container for sched_groups Gregory Haskins
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Gregory Haskins @ 2008-05-15 18:35 UTC
  To: Ingo Molnar, Peter Zijlstra
  Cc: Suresh Siddha, Srivatsa Vaddagiri, linux-kernel, Gregory Haskins

Here is an update to the core-balancer series previously discussed here:

http://lkml.org/lkml/2008/5/12/260

Changes since v1:

1) Fixed a memory allocation error pointed out by Suresh Siddha
2) Implemented a suggestion by Suresh to only assign core-balancers to SMP or
NUMA domains (i.e. exclude HT/MC domains)
3) Fixed several small bugs and misc cleanups

Comments/feedback welcome!

Regards,
-Greg

* [RFC PATCH 0/3] sched: core balancer
@ 2008-05-12 18:14 Gregory Haskins
  2008-05-12 18:14 ` [RFC PATCH 2/3] sched: pass specific sched_balancer to functions Gregory Haskins
  0 siblings, 1 reply; 5+ messages in thread
From: Gregory Haskins @ 2008-05-12 18:14 UTC
  To: Ingo Molnar, Peter Zijlstra
  Cc: Srivatsa Vaddagiri, Gregory Haskins, linux-kernel

Hi Ingo, Peter, Srivatsa,

The following series is an RFC for some code I wrote in conjunction with
some rt/cfs load-balancing enhancements.  The enhancements aren't quite
ready to see the light of day yet, but this particular fix is ready for
comment.  It applies to sched-devel.

This series addresses a problem that I discovered while working on the rt/cfs
load-balancer, but it appears it could affect upstream too (though it's much
less likely to ever occur there).

Patches 1&2 move the existing balancer data into a "sched_balancer" container
called "group_balancer".  Patch #3 then adds a new type of balancer called a
"core balancer".

Here is the problem statement (also included in Documentation/scheduler):

	Core Balancing
	----------------------
	
	The standard group_balancer manages SCHED_OTHER tasks based on a
	hierarchy of sched_domains and sched_groups as dictated by the
	physical cache/node topology of the hardware.  Each group may contain
	one or more cores which have a specific relationship to other members
	of the group. Balancing is always performed on an inter-group basis.
	
	For example, consider a quad-core, dual socket Intel Xeon system.  It
	has a total of 8 cores across one logical NUMA node, with a cache
	shared between cores [0,2], [1,3], [4,6], [5,7].  From a
	sched_domain/group perspective on core 0, this looks like the
	following: 
	
	domain-0: (MC)
	  span: 0x5
	  groups = 2 -> [0], [2]
	  domain-1: (SMP)
	    span: 0xff
	    groups = 4 -> [0,2], [1,3], [4,6], [5,7]
	    domain-2: (NUMA)
	      span: 0xff
	      groups = 1 -> [0-7]
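
As a quick sanity check on the span values above, here is a minimal sketch of
how a list of core ids folds into a span bitmask.  The helper name span_of()
is hypothetical and is not part of the series; it only illustrates why cores
[0,2] give 0x5 and cores [0-7] give 0xff.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch series): fold a list of
 * core ids into a span bitmask, one bit per core.
 */
static unsigned long span_of(const int *cores, size_t n)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= 1UL << cores[i];
	return mask;
}
```

Calling span_of() with {0, 2} sets bits 0 and 2, which is the 0x5 shown for
domain-0.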
	
	Recall that balancing is always inter-group, and will get more
	aggressive in the lower domains than the higher ones.  The balancing
	logic will attempt to balance between [0],[2] first, [0,2], [1,3],
	[4,6], [5,7] second, and [0-7] last.  Note that since domain-2 only
	consists of 1 group, it will never result in a balance decision since
	there must be at least two groups to consider.
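
The "at least two groups" rule can be sketched roughly as follows.  This is
only an illustration: struct demo_domain and both functions are made-up
stand-ins, not the kernel's struct sched_domain or any function from the
series.

```c
#include <stddef.h>

/* Illustrative stand-in, not the kernel's struct sched_domain. */
struct demo_domain {
	const char *name;
	unsigned long span;		/* bitmask of cores covered */
	int nr_groups;			/* number of groups in this domain */
	struct demo_domain *parent;	/* next-higher domain, NULL at top */
};

/* Inter-group balancing needs at least two groups to compare. */
static int demo_can_balance(const struct demo_domain *sd)
{
	return sd->nr_groups >= 2;
}

/* Walk from the lowest domain upward, counting usable domains. */
static int demo_count_balanceable(const struct demo_domain *sd)
{
	int n = 0;

	for (; sd; sd = sd->parent)
		n += demo_can_balance(sd);
	return n;
}
```

With the example topology (2, 4 and 1 groups), only domain-0 and domain-1
pass the check; domain-2 is walked but never yields a decision.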
	
	This layout is quite logical.  The idea is that [0] and [2] can
	balance between each other aggressively in a very efficient manner
	since they share a cache.  Once the load is equalized between two
	cache-peers, domain-1 can spread the load out between the other
	peer-groups.  This represents a pretty good way to structure the
	balancing operations.
	
	However, there is one slight problem with the group_balancer: Since we
	always balance inter-group, intra-group imbalances may result in
	suboptimal behavior if we hit the condition where lower-level domains
	(domain-0 in this example) are ineffective.  This condition can arise
	whenever a domain-level imbalance cannot be resolved, leaving the
	group with a high aggregate load rating while some of its cores
	remain relatively idle.
	
	For example, if a core has a large but affined load, or otherwise
	untouchable tasks (e.g. RT tasks), SCHED_OTHER will not be able to
	equalize the load.  The net result is that one or more members of the
	group may remain relatively unloaded, while the load rating for the
	entire group is high.  The higher-level domains will only consider the
	group as a whole, and the lower-level domains are left powerless to
	fill the vacuum.
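
To make the problem concrete, here is a small worked sketch.  The helper
demo_group_load() and the load numbers are invented for illustration; the
real load accounting in the scheduler is more involved.

```c
#include <stddef.h>

/*
 * Hypothetical helper: sum the per-core loads selected by a group's
 * span mask.  The load units are arbitrary.
 */
static unsigned long demo_group_load(const unsigned long *core_load,
				     size_t nr_cores, unsigned long span)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < nr_cores; i++)
		if (span & (1UL << i))
			sum += core_load[i];
	return sum;
}
```

Suppose core 0 carries an unmovable load of 2048, core 2 carries 0, and cores
1 and 3 carry 1024 each.  Groups [0,2] (span 0x5) and [1,3] (span 0xa) both
sum to 2048, so an inter-group comparison at domain-1 sees nothing to fix
even though core 2 is idle.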
	
	To address this concern, core_balancer adds the concept of a new
	grouping of cores at each domain-level: a per-core grouping (each core
	in its own unique group).  This "core_balancer" group is configured to
	run much less aggressively than its topologically relevant brother:
	"group_balancer". Core_balancer will sweep through the cores every so
	often, correcting intra-group vacuums left over from lower level
	domains.  In most cases, the group_balancer should have already
	established equilibrium, therefore benefiting from the hardware's
	natural affinity hierarchy.  In the cases where it cannot achieve
	equilibrium, the core_balancer tries to take it one step closer.
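
The per-core grouping can be pictured as splitting a domain's span into
single-bit masks.  The sketch below is an assumption about the shape of the
idea, not the code in patch 3; demo_per_core_groups() is a hypothetical name.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: split a domain span into one single-core group
 * mask per set bit.  Returns the number of groups written, at most max.
 */
static size_t demo_per_core_groups(unsigned long span,
				   unsigned long *groups, size_t max)
{
	size_t n = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < sizeof(span) * CHAR_BIT && n < max; cpu++)
		if (span & (1UL << cpu))
			groups[n++] = 1UL << cpu;
	return n;
}
```

For the 0xff span of domain-1 this yields eight groups, one per core, so a
comparison between groups becomes a comparison between individual cores.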
	
	By default, group_balancer runs at sd->min_interval, whereas
	core_balancer starts at sd->max_interval (both of which will respond
	to dynamic programming).  Both will employ a multiplicative backoff
	algorithm when faced with repeated migration failure.
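
A rough sketch of the interval handling follows.  The doubling factor, the
upper bound and the reset-on-success behaviour are all assumptions made for
illustration, as are the struct and function names; the text above only
states the starting intervals and that the backoff is multiplicative.

```c
/* Illustrative balancer state; all field names are assumptions. */
struct demo_balancer {
	unsigned long base_interval;	/* starting interval */
	unsigned long interval;		/* current interval */
	unsigned long cap;		/* hypothetical upper bound */
};

static void demo_balancer_init(struct demo_balancer *b,
			       unsigned long base, unsigned long cap)
{
	b->base_interval = base;
	b->interval = base;
	b->cap = cap;
}

/*
 * Multiplicative backoff: double the interval after a failed
 * migration attempt (clamped to cap), return to base on success.
 */
static void demo_balancer_update(struct demo_balancer *b, int migrated)
{
	if (migrated) {
		b->interval = b->base_interval;
		return;
	}
	if (b->interval > b->cap / 2)
		b->interval = b->cap;
	else
		b->interval *= 2;
}
```

In this sketch a group balancer would be initialised with sd->min_interval as
its base and a core balancer with sd->max_interval, so the core balancer
always starts out as the slower of the two.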

---

Regards,
-Greg





Thread overview: 5+ messages
2008-05-15 18:35 [RFC PATCH 0/3] sched: core-balancer v2 Gregory Haskins
2008-05-15 18:35 ` [RFC PATCH 1/3] sched: create sched_balancer container for sched_groups Gregory Haskins
2008-05-15 18:35 ` [RFC PATCH 2/3] sched: pass specific sched_balancer to functions Gregory Haskins
2008-05-15 18:35 ` [RFC PATCH 3/3] sched: add a per-core balancer group Gregory Haskins
  -- strict thread matches above, loose matches on Subject: below --
2008-05-12 18:14 [RFC PATCH 0/3] sched: core balancer Gregory Haskins
2008-05-12 18:14 ` [RFC PATCH 2/3] sched: pass specific sched_balancer to functions Gregory Haskins
