Jesse Barnes wrote:
> On Friday, July 16, 2004 1:53 am, Nick Piggin wrote:

>>Instead of a top level domain spanning all CPUs, have each CPU's top level
>>domain just span all CPUs within a couple of hops (enough to get, say 16 to
>>64 CPUs into each top level domain). I could give you a hand with this if
>>you need.
> 
> 
> Yeah, that's what I had in mind.  I'll wait for the patch you mentioned above 
> and hack on top of that...
> 

The patch is attached, although it needs a bit of commenting and testing.
Also, the init_sched_build_groups helper function in kernel/sched.c probably
wants to be exported for use by architecture code.
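
As a rough illustration of the per-CPU top level span idea quoted above
(this is not taken from the attached patch), here is a small userspace
sketch. The toy topology table, the macros, and the names
cpu_to_node_sketch and top_level_span are all made up for the example;
real architecture code would use its own node distance information and
the kernel's cpumask types rather than a plain 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy topology: 4 nodes in a line, 2 CPUs per node. */
#define NR_NODES      4
#define CPUS_PER_NODE 2
#define NR_CPUS       (NR_NODES * CPUS_PER_NODE)

/* node_hops[a][b]: assumed number of interconnect hops from node a to b. */
static const int node_hops[NR_NODES][NR_NODES] = {
	{ 0, 1, 2, 3 },
	{ 1, 0, 1, 2 },
	{ 2, 1, 0, 1 },
	{ 3, 2, 1, 0 },
};

/* Illustrative mapping only: CPUs are numbered node by node. */
static int cpu_to_node_sketch(int cpu)
{
	return cpu / CPUS_PER_NODE;
}

/*
 * Return a bitmask of the CPUs whose node lies within max_hops of the
 * node that owns 'cpu'. This mask would be the span of that CPU's top
 * level domain, instead of a span covering every CPU in the system.
 */
static uint64_t top_level_span(int cpu, int max_hops)
{
	uint64_t span = 0;
	int home, other;

	assert(cpu >= 0 && cpu < NR_CPUS);
	home = cpu_to_node_sketch(cpu);

	for (other = 0; other < NR_CPUS; other++) {
		if (node_hops[home][cpu_to_node_sketch(other)] <= max_hops)
			span |= (uint64_t)1 << other;
	}
	return span;
}
```

With this table, top_level_span(0, 1) covers CPUs 0-3 and
top_level_span(2, 1) covers CPUs 0-5, so neighbouring CPUs get
overlapping but different top level spans. Raising max_hops is the knob
that would be tuned to land in the 16 to 64 CPU range on a real machine.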

Out of interest, what sort of performance problems are you seeing with
this high rate of global balancing? I have a couple of patches that cut
runqueue locking to almost zero in interrupt paths, but I imagine your
main problem is pulling a cacheline off every remote CPU when calculating
runqueue loads?