* Nick's scheduler v17 @ 2003-10-24 18:10 Nick Piggin 2003-10-24 20:17 ` cliff white 2003-10-24 21:49 ` Andrew Theurer 0 siblings, 2 replies; 7+ messages in thread From: Nick Piggin @ 2003-10-24 18:10 UTC (permalink / raw) To: linux-kernel Hi, http://www.kerneltrap.org/~npiggin/v17/ Still working on SMP and NUMA. Some (maybe) interesting things I put in are - Sequential CPU balancing so you don't get a big storm of balances every 1/4s. - Balancing is trying to err more on the side of caution, I have to start analysing it more thoroughly though. - Attacked the NUMA balancing code. There should now be fewer buslocked ops / cache pingpongs in some fastpaths. Volanomark likes it; more realistic loads won't improve so much http://www.kerneltrap.org/~npiggin/v17/volano.png This improvement is NUMA only. I haven't had time to reproduce Cliff's serious reaim performance dropoffs, so they're probably still there. I couldn't reproduce Martin's kernbench dropoff, but the 16-way I'm using only has 512K cache, which might not show it up. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Nick's scheduler v17 2003-10-24 18:10 Nick's scheduler v17 Nick Piggin @ 2003-10-24 20:17 ` cliff white 2003-10-24 21:49 ` Andrew Theurer 1 sibling, 0 replies; 7+ messages in thread From: cliff white @ 2003-10-24 20:17 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-kernel On Sat, 25 Oct 2003 04:10:24 +1000 Nick Piggin <piggin@cyberone.com.au> wrote: > Hi, > http://www.kerneltrap.org/~npiggin/v17/ > > Still working on SMP and NUMA. Some (maybe) interesting things I put in are > - Sequential CPU balancing so you don't get a big storm of balances > every 1/4s. > - Balancing is trying to err more on the side of caution, I have to start > analysing it more thoroughly though. > - Attacked the NUMA balancing code. There should now be fewer buslocked ops / > cache pingpongs in some fastpaths. Volanomark likes it, more realistic > loads > won't improve so much http://www.kerneltrap.org/~npiggin/v17/volano.png > This improvement is NUMA only. > > I haven't had time to reproduce Cliff's serious reaim performance > dropoffs so > they're probably still there. I couldn't reproduce Martin's kernbench > dropoff, > but the 16-way I'm using only has 512K cache which might not show it up. > Okay, I put both v17 patches in the OSDL Patch Lifecycle Manager ( http://www.osdl.org/plm-cgi/plm/ ) sched-v17 is PLM #2251 sched-nopolicy-v17 is PLM #2252 Results asap. cliffw -- The church is near, but the road is icy. The bar is far, but i will walk carefully. - Russian proverb
* Re: Nick's scheduler v17 2003-10-24 18:10 Nick's scheduler v17 Nick Piggin 2003-10-24 20:17 ` cliff white @ 2003-10-24 21:49 ` Andrew Theurer 2003-10-25 1:12 ` Nick Piggin 1 sibling, 1 reply; 7+ messages in thread From: Andrew Theurer @ 2003-10-24 21:49 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-kernel On Friday 24 October 2003 13:10, Nick Piggin wrote: > Hi, > http://www.kerneltrap.org/~npiggin/v17/ > > Still working on SMP and NUMA. Some (maybe) interesting things I put in are > - Sequential CPU balancing so you don't get a big storm of balances > every 1/4s. > - Balancing is trying to err more on the side of caution, I have to start > analysing it more thoroughly though. + + *imbalance /= 2; + *imbalance = (*imbalance + FPT - 1) / FPT; I think I see what is going on here, but would something like this work out better? *imbalance = min(this_load - load_avg, load_avg - max_load) That way you take just enough to either have busiest_queue or this_rq's length be the load_avg. I suppose you could take even less, but IMO, the /=2 is what I really don't like. Perhaps: *imbalance = min(this_load - load_avg, load_avg - max_load); *imbalance = (*imbalance + FPT - 1) / FPT; This should work well for intranode balances; internode balances may need a little optimization, since the load_avg does not really represent the load avg of the two nodes in question, just one cpu from one of them and all the cpus from another.
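Andrew's proposal can be sketched as a standalone model (the function shape, names, and the value of FPT here are illustrative assumptions, not the actual kernel code; the min() is written with the busiest-vs-this orientation he uses later in the thread):

```c
#include <assert.h>

/* Assumed fixed-point scaling: loads are stored as nr_running * FPT. */
#define FPT 128

/* Hypothetical standalone model of the proposed imbalance calculation:
 * move just enough load that either this runqueue or the busiest
 * runqueue ends up at the average, then round up to whole tasks. */
static unsigned long imbalance_tasks(unsigned long this_load,
                                     unsigned long max_load,
                                     unsigned long avg_load)
{
    unsigned long excess  = max_load - avg_load;  /* surplus on busiest CPU */
    unsigned long deficit = avg_load - this_load; /* shortfall on this CPU */
    unsigned long imbalance = excess < deficit ? excess : deficit;

    /* Ceiling division back from fixed point to a whole task count. */
    return (imbalance + FPT - 1) / FPT;
}
```

For example, with this_load at 1 task, max_load at 3 and the average at 2 (all in fixed point), both the surplus and the deficit are one task, so exactly one task moves and both queues land on the average.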
* Re: Nick's scheduler v17 2003-10-24 21:49 ` Andrew Theurer @ 2003-10-25 1:12 ` Nick Piggin 2003-10-26 6:43 ` Nick Piggin 0 siblings, 1 reply; 7+ messages in thread From: Nick Piggin @ 2003-10-25 1:12 UTC (permalink / raw) To: Andrew Theurer; +Cc: linux-kernel Andrew Theurer wrote: >On Friday 24 October 2003 13:10, Nick Piggin wrote: > >>Hi, >>http://www.kerneltrap.org/~npiggin/v17/ >> >>Still working on SMP and NUMA. Some (maybe) interesting things I put in are >>- Sequential CPU balancing so you don't get a big storm of balances >>every 1/4s. >>- Balancing is trying to err more on the side of caution, I have to start >> analysing it more thoroughly though. >> > >+ >+ *imbalance /= 2; >+ *imbalance = (*imbalance + FPT - 1) / FPT; > >I think I see what is going on here, but would something like this work out >better? > Yeah, sorry it's not well commented. It's still changing quite quickly. > > *imbalance = min(this_load - load_avg, load_avg - max_load) > >That way you take just enough to either have busiest_queue or this_rq's length >be the load_avg. I suppose you could take even less, but IMO, the /=2 is >what I really don't like. Perhaps: > That is _exactly_ what I had before! That's probably the way to go. Thanks for having a look at it. > > >*imbalance = min(this_load - load_avg, load_avg - max_load); >*imbalance = (*imbalance + FPT - 1) / FPT; > >This should work well for intranode balances, internode balances may need a >little optimization, since the load_avg does not really represent the >load avg of the two nodes in question, just one cpu from one of them and all >the cpus from another. > Yeah, that does need a bit of rethinking.
* Re: Nick's scheduler v17 2003-10-25 1:12 ` Nick Piggin @ 2003-10-26 6:43 ` Nick Piggin 2003-10-27 17:02 ` Andrew Theurer 0 siblings, 1 reply; 7+ messages in thread From: Nick Piggin @ 2003-10-26 6:43 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Theurer, linux-kernel Nick Piggin wrote: > > > Andrew Theurer wrote: > >> On Friday 24 October 2003 13:10, Nick Piggin wrote: > >>> Hi, >>> http://www.kerneltrap.org/~npiggin/v17/ >>> >>> Still working on SMP and NUMA. Some (maybe) interesting things I put >>> in are >>> - Sequential CPU balancing so you don't get a big storm of balances >>> every 1/4s. >>> - Balancing is trying to err more on the side of caution, I have to >>> start >>> analysing it more thoroughly though. >>> >> >> + >> + *imbalance /= 2; >> + *imbalance = (*imbalance + FPT - 1) / FPT; >> >> I think I see what is going on here, but would something like this >> work out better? >> > > Yeah, sorry it's not well commented. It's still changing quite quickly. > >> >> *imbalance = min(this_load - load_avg, load_avg - max_load) >> >> That way you take just enough to either have busiest_queue or >> this_rq's length be the load_avg. I suppose you could take even >> less, but IMO, the /=2 is what I really don't like. Perhaps: >> > > That is _exactly_ what I had before! That's probably the way to go. Thanks > for having a look at it. > >> >> >> *imbalance = min(this_load - load_avg, load_avg - max_load); >> *imbalance = (*imbalance + FPT - 1) / FPT; >> >> This should work well for intranode balances, internode balances may >> need a little optimization, since the load_avg does not really >> represent the load avg of the two nodes in question, just one cpu >> from one of them and all the cpus from another. >> Oh, actually, after my patch, load_avg represents the load average of _all_ the nodes. Have a look at find_busiest_node. Which jogs my memory of why it's not always a good idea to do your *imbalance min(...) thing (I actually saw this happening). 
5 CPUs, 4 processes running on one CPU: load_avg would be 0.8 for all CPUs, so balancing doesn't happen. I have to think about this a bit more...
* Re: Nick's scheduler v17 2003-10-26 6:43 ` Nick Piggin @ 2003-10-27 17:02 ` Andrew Theurer 2003-10-27 23:13 ` Nick Piggin 0 siblings, 1 reply; 7+ messages in thread From: Andrew Theurer @ 2003-10-27 17:02 UTC (permalink / raw) To: Nick Piggin; +Cc: linux-kernel > >> *imbalance = min(this_load - load_avg, load_avg - max_load) > >> > >> That way you take just enough to either have busiest_queue or > >> this_rq's length be the load_avg. I suppose you could take even > >> less, but IMO, the /=2 is what I really don't like. Perhaps: > > > > That is _exactly_ what I had before! That's probably the way to go. Thanks > > for having a look at it. > > > >> *imbalance = min(this_load - load_avg, load_avg - max_load); > >> *imbalance = (*imbalance + FPT - 1) / FPT; > >> > >> This should work well for intranode balances, internode balances may > >> need a little optimization, since the load_avg does not really > >> represent the load avg of the two nodes in question, just one cpu > >> from one of them and all the cpus from another. > > Oh, actually, after my patch, load_avg represents the load average of _all_ > the nodes. Have a look at find_busiest_node. Which jogs my memory of why > it's not always a good idea to do your *imbalance min(...) thing (I actually > saw this happening). Oops, I meant avg_load, which you calculate in find_busiest_queue on the fly. > 5 CPUs, 4 processes running on one cpu. load_avg would be 0.8 for all cpus. > balancing doesn't happen. I have to think about this a bit more... Actually, if we use avg_load, I guess it would be 0, since this is an unsigned long. Maybe avg_load needs to have a min value of 1. Then if we apply: *imbalance = min(max_load - avg_load, avg_load - this_load) min(4 - 1, 1 - 0) And imbalance looks a lot better. My only concern would be an idle CPU stealing from another, leaving the other CPU idle. I guess a check could be put there.
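The truncation problem Andrew is describing can be shown numerically (the FPT value and helper names here are illustrative assumptions):

```c
#include <assert.h>

#define FPT 128  /* assumed fixed-point scaling factor */

/* With fixed point the fractional part of the average survives... */
static unsigned long avg_load_fixed(unsigned long nr_tasks,
                                    unsigned long nr_cpus)
{
    return nr_tasks * FPT / nr_cpus;  /* 4 tasks / 5 CPUs -> 102, i.e. ~0.8 */
}

/* ...whereas a plain per-CPU task count truncates straight to zero,
 * which is what happens if avg_load is an unscaled unsigned long. */
static unsigned long avg_load_plain(unsigned long nr_tasks,
                                    unsigned long nr_cpus)
{
    return nr_tasks / nr_cpus;        /* 4 / 5 -> 0 */
}
```

So with an unscaled average of 0 for the 5-CPU example, min(max_load - avg_load, avg_load - this_load) collapses to 0 for every idle CPU, which is why Andrew suggests clamping avg_load to at least 1.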
* Re: Nick's scheduler v17 2003-10-27 17:02 ` Andrew Theurer @ 2003-10-27 23:13 ` Nick Piggin 0 siblings, 0 replies; 7+ messages in thread From: Nick Piggin @ 2003-10-27 23:13 UTC (permalink / raw) To: Andrew Theurer; +Cc: linux-kernel Andrew Theurer wrote: >>>> *imbalance = min(this_load - load_avg, load_avg - max_load) >>>> >>>>That way you take just enough to either have busiest_queue or >>>>this_rq's length be the load_avg. I suppose you could take even >>>>less, but IMO, the /=2 is what I really don't like. Perhaps: >>>> >>>That is _exactly_ what I had before! That's probably the way to go. Thanks >>>for having a look at it. >>> >>> >>>>*imbalance = min(this_load - load_avg, load_avg - max_load); >>>>*imbalance = (*imbalance + FPT - 1) / FPT; >>>> >>>>This should work well for intranode balances, internode balances may >>>>need a little optimization, since the load_avg does not really >>>>represent the load avg of the two nodes in question, just one cpu >>>>from one of them and all the cpus from another. >>>> >>Oh, actually, after my patch, load_avg represents the load average of _all_ >>the nodes. Have a look at find_busiest_node. Which jogs my memory of why >>it's not always a good idea to do your *imbalance min(...) thing (I actually >>saw this happening). >> > >Oops, I meant avg_load, which you calculate in find_busiest_queue on the fly. > OK > >>5 CPUs, 4 processes running on one cpu. load_avg would be 0.8 for all cpus. >>balancing doesn't happen. I have to think about this a bit more... >> > >Actually, if we use avg_load, I guess it would be 0, since this is an unsigned >long. Maybe avg_load needs to have a min value of 1. Then if we apply: > Well, it's got a fixed-point scaling factor. 
> >*imbalance = min(max_load - avg_load, avg_load - this_load) > min(4 - 1, 1 - 0) > > I think you want: *imbalance = min(max_load - avg_load, avg_load - this_load); if (*imbalance < 1*FPT && max_load - this_load > 1*FPT) *imbalance = 1*FPT; So if there is a total imbalance of more than 1 task, at least one will be moved. > >And imbalance looks a lot better. Only concern would be an idle cpu stealing >from another, leaving the other cpu idle. I guess a check could be put >there. > > pull_task won't pull a running task, so you get some protection there.
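Nick's guarded version can be modeled as a standalone sketch (the FPT value and the function wrapper are assumptions for illustration; the real code adjusts *imbalance in place):

```c
#include <assert.h>

#define FPT 128  /* assumed fixed-point scaling factor */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Sketch of the guarded imbalance: if the min() rounds to less than one
 * task but the busiest CPU really has more than one task over us, force
 * at least one task's worth of imbalance so something gets moved. */
static unsigned long guarded_imbalance(unsigned long this_load,
                                       unsigned long max_load,
                                       unsigned long avg_load)
{
    unsigned long imbalance = min_ul(max_load - avg_load,
                                     avg_load - this_load);

    if (imbalance < 1*FPT && max_load - this_load > 1*FPT)
        imbalance = 1*FPT;
    return imbalance;
}
```

For the 5-CPU example (this_load 0, max_load 4*FPT, truncated average 4*FPT/5 = 102): min(410, 102) = 102 is below one task, but max_load - this_load = 512 is well above it, so the guard bumps the imbalance to exactly one task.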