public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
       [not found] <20060606023708.2801.24804.sendpatchset@heathwren.pw.nest>
@ 2006-06-07  8:05 ` MAEDA Naoaki
  2006-06-07 12:44   ` Peter Williams
  2006-06-08  7:50   ` Peter Williams
  0 siblings, 2 replies; 8+ messages in thread
From: MAEDA Naoaki @ 2006-06-07  8:05 UTC (permalink / raw)
  To: Peter Williams
  Cc: Linux Kernel, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Con Kolivas, Sam Vilain, Kingsley Cheung,
	Eric W. Biederman, Ingo Molnar, Rene Herman

Peter Williams wrote:

> 4. Overhead Measurements.  To measure the implications for overhead
> introduced by these patches kernbench was used on a dual 500Mhz
> Centrino SMP system.  Runs were done for a kernel without these
> patches applied, one with the patches applied but no caps being used
> and one with the patches applied and running kernbench with a soft cap
> of zero (which would be inherited by all its children).
> 
> Average Optimal -j 8 Load Run:
> 
>                   Vanilla          Patch Applied    Soft Cap 0%
> 
> Elapsed Time      1056.1   (1.92)  1048.2   (0.62)  1064.1   (1.59)
> User Time         1908.1   (1.09)  1895.2   (1.30)  1926.6   (1.39)
> System Time        181.7   (0.60)   177.5   (0.74)   173.8   (1.07)
> Percent CPU        197.6   (0.55)   197.0   (0)      197.0   (0)
> Context Switches 49253.6 (136.31) 48881.4  (92.03) 92490.8 (163.71)
> Sleeps           28038.8 (228.11) 28136.0 (250.65) 25769.4 (280.40)

I tried to run kernbench with a hard cap, and it spent a very long
time in the "Cleaning source tree..." phase. Because this phase is not
a CPU hog, my expectation was that it would behave as if uncapped.

This can be reproduced by just running "make clean" at the top of a
kernel source tree with a hard cap.

% /usr/bin/time make clean
1.62user 0.29system 0:01.90elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68539minor)pagefaults 0swaps

   # Without cap, it returns almost immediately

% ~/withcap.sh  -C 900 /usr/bin/time make clean
1.61user 0.29system 1:26.17elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68537minor)pagefaults 0swaps

   # With 90% hard cap, it takes about 1.5 minutes.

% ~/withcap.sh  -C 100 /usr/bin/time make clean
1.64user 0.34system 3:31.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68538minor)pagefaults 0swaps

   # It became worse with 10% hard cap.

% ~/withcap.sh  -c 900 /usr/bin/time make clean
1.63user 0.28system 0:01.89elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68537minor)pagefaults 0swaps

   # It doesn't happen with soft cap.

Thanks,
MAEDA Naoaki


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-07  8:05 ` [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved) MAEDA Naoaki
@ 2006-06-07 12:44   ` Peter Williams
  2006-06-08  7:50   ` Peter Williams
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Williams @ 2006-06-07 12:44 UTC (permalink / raw)
  To: MAEDA Naoaki
  Cc: Linux Kernel, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Con Kolivas, Sam Vilain, Kingsley Cheung,
	Eric W. Biederman, Ingo Molnar, Rene Herman

MAEDA Naoaki wrote:
> Peter Williams wrote:
> 
>> 4. Overhead Measurements.  To measure the implications for overhead
>> introduced by these patches kernbench was used on a dual 500Mhz
>> Centrino SMP system.  Runs were done for a kernel without these
>> patches applied, one with the patches applied but no caps being used
>> and one with the patches applied and running kernbench with a soft cap
>> of zero (which would be inherited by all its children).
>>
>> Average Optimal -j 8 Load Run:
>>
>>                   Vanilla          Patch Applied    Soft Cap 0%
>>
>> Elapsed Time      1056.1   (1.92)  1048.2   (0.62)  1064.1   (1.59)
>> User Time         1908.1   (1.09)  1895.2   (1.30)  1926.6   (1.39)
>> System Time        181.7   (0.60)   177.5   (0.74)   173.8   (1.07)
>> Percent CPU        197.6   (0.55)   197.0   (0)      197.0   (0)
>> Context Switches 49253.6 (136.31) 48881.4  (92.03) 92490.8 (163.71)
>> Sleeps           28038.8 (228.11) 28136.0 (250.65) 25769.4 (280.40)
> 
> I tried to run kernbench with a hard cap, and it spent a very long
> time in the "Cleaning source tree..." phase. Because this phase is not
> a CPU hog, my expectation was that it would behave as if uncapped.
> 
> This can be reproduced by just running "make clean" at the top of a
> kernel source tree with a hard cap.
> 
> % /usr/bin/time make clean
> 1.62user 0.29system 0:01.90elapsed 101%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (0major+68539minor)pagefaults 0swaps
> 
>   # Without cap, it returns almost immediately
> 
> % ~/withcap.sh  -C 900 /usr/bin/time make clean
> 1.61user 0.29system 1:26.17elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+68537minor)pagefaults 0swaps
> 
>   # With 90% hard cap, it takes about 1.5 minutes.

This is harder capping than I would expect.  I'll look into probable 
causes.  It could be caused by the simplification I made to the 
calculation of sinbin time.

> 
> % ~/withcap.sh  -C 100 /usr/bin/time make clean
> 1.64user 0.34system 3:31.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+68538minor)pagefaults 0swaps
> 
>   # It became worse with 10% hard cap.

And so it should.

> 
> % ~/withcap.sh  -c 900 /usr/bin/time make clean
> 1.63user 0.28system 0:01.89elapsed 100%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (0major+68537minor)pagefaults 0swaps
> 
>   # It doesn't happen with soft cap.

That's because soft caps allow you to go over the cap if no other tasks 
want the CPU.
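
The distinction can be sketched as a simple run/hold decision. This is a
hypothetical reconstruction of the policy described here (names and the
function shape are illustrative, not the actual scheduler code):

```python
# Sketch of the soft vs hard cap policy described above (assumed logic):
# a task over a hard cap is always held off the CPU; a task over a soft
# cap may still run if the CPU would otherwise sit idle.
def may_run(over_cap: bool, cap_is_hard: bool, cpu_otherwise_idle: bool) -> bool:
    if not over_cap:
        return True            # under its cap: always eligible
    if cap_is_hard:
        return False           # hard cap: never exceed
    return cpu_otherwise_idle  # soft cap: use otherwise-wasted cycles

# "make clean" under a soft cap finishes normally because the CPU is idle:
print(may_run(over_cap=True, cap_is_hard=False, cpu_otherwise_idle=True))  # True
print(may_run(over_cap=True, cap_is_hard=True, cpu_otherwise_idle=True))   # False
```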

Thanks for the feedback,
Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-07  8:05 ` [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved) MAEDA Naoaki
  2006-06-07 12:44   ` Peter Williams
@ 2006-06-08  7:50   ` Peter Williams
  2006-06-09  0:57     ` Peter Williams
  2006-06-09  5:41     ` MAEDA Naoaki
  1 sibling, 2 replies; 8+ messages in thread
From: Peter Williams @ 2006-06-08  7:50 UTC (permalink / raw)
  To: MAEDA Naoaki
  Cc: Linux Kernel, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Con Kolivas, Sam Vilain, Kingsley Cheung,
	Eric W. Biederman, Ingo Molnar, Rene Herman

[-- Attachment #1: Type: text/plain, Size: 5670 bytes --]

MAEDA Naoaki wrote:
> 
> I tried to run kernbench with a hard cap, and it spent a very long
> time in the "Cleaning source tree..." phase. Because this phase is not
> a CPU hog, my expectation was that it would behave as if uncapped.
> 
> This can be reproduced by just running "make clean" at the top of a
> kernel source tree with a hard cap.
> 
> % /usr/bin/time make clean
> 1.62user 0.29system 0:01.90elapsed 101%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (0major+68539minor)pagefaults 0swaps
> 
>   # Without cap, it returns almost immediately
> 
> % ~/withcap.sh  -C 900 /usr/bin/time make clean
> 1.61user 0.29system 1:26.17elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+68537minor)pagefaults 0swaps
> 
>   # With 90% hard cap, it takes about 1.5 minutes.
> 
> % ~/withcap.sh  -C 100 /usr/bin/time make clean
> 1.64user 0.34system 3:31.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+68538minor)pagefaults 0swaps
> 
>   # It became worse with 10% hard cap.
> 
> % ~/withcap.sh  -c 900 /usr/bin/time make clean
> 1.63user 0.28system 0:01.89elapsed 100%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (0major+68537minor)pagefaults 0swaps
> 
>   # It doesn't happen with soft cap.

This behaviour is caused by "make clean" being a short lived CPU 
intensive task.  It was made worse by two things: my simplification of 
the sinbin duration calculation, which assumed a constant CPU burst 
size based on the time slice; and the fact that exiting tasks could 
still have caps enforced.  (The simplification was done to avoid 64 bit 
divides.)
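
The over-cap test and the sinbin time it implies can be sketched in 
userspace terms.  This is a simplified reconstruction from the 
description above, not the kernel code: caps are assumed to be in 
thousandths (matching the -C 900 == 90% usage), and the statistics are 
an average CPU time per cycle and an average cycle length, both in ticks.

```python
CAP_ONE = 1000  # caps in thousandths: -C 900 == a 90% hard cap (assumed)

def over_hard_cap(avg_cpu_per_cycle, avg_cycle_length, hard_cap):
    # Cross-multiplied so the comparison stays in integer arithmetic:
    # avg_cpu_per_cycle / avg_cycle_length > hard_cap / CAP_ONE
    return avg_cpu_per_cycle * CAP_ONE > avg_cycle_length * hard_cap

def sinbin_ticks(avg_cpu_per_cycle, avg_cycle_length, hard_cap):
    """Ticks to hold the task off the CPU so that its usage rate drops
    to the cap: solve cpu / (cycle + t) == hard_cap / CAP_ONE for t."""
    if not over_hard_cap(avg_cpu_per_cycle, avg_cycle_length, hard_cap):
        return 0
    t = avg_cpu_per_cycle * CAP_ONE // hard_cap - avg_cycle_length
    return max(t, 1)

# A task that used 9 ticks of a 10-tick cycle under a 50% cap must sit
# out until 9 / (10 + t) <= 0.5, i.e. t = 8 ticks:
print(sinbin_ticks(9, 10, 500))  # -> 8
```

A constant-burst shortcut, by contrast, would hand out the same sinbin 
time regardless of how far over the cap the task's measured cycle was, 
which is consistent with the over-harsh capping reported above.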

I've put in a more complex sinbin calculation (and don't think the 64 
bit divides will matter too much as they're on an infrequently travelled 
path).  Exiting tasks are now excluded from having caps enforced on the 
grounds that it's best for system performance to let them get out of the 
way as soon as possible.  A patch is attached and I would appreciate it 
if you could see whether it improves the situation you are observing.

These changes don't completely get rid of the phenomenon but I think 
that it's less severe.  I've written a couple of scripts to test this 
behaviour using the wload program from:

<http://prdownloads.sourceforge.net/cpuse/simloads-0.1.1.tar.gz?download>

You run loops.sh with a single argument and it uses asps.sh.  What the 
test does is run a number (specified by the argument to loops.sh) of 
instances of wload in series and use time to get the stats for the 
series to complete.  It does this for a number of different wload 
durations between 0.001 and 10.0 seconds.  Here's an example of 
the output from an uncapped run:

[peterw@heathwren ~]$ ./loops.sh 1
-d=0.001: user = 0.01 system = 0.00 elapsed = 0.00 rate = 133%
-d=0.005: user = 0.01 system = 0.00 elapsed = 0.01 rate = 84%
-d=0.01: user = 0.02 system = 0.00 elapsed = 0.01 rate = 105%
-d=0.05: user = 0.06 system = 0.00 elapsed = 0.05 rate = 103%
-d=0.1: user = 0.10 system = 0.00 elapsed = 0.11 rate = 98%
-d=0.5: user = 0.50 system = 0.00 elapsed = 0.50 rate = 100%
-d=1.0: user = 1.00 system = 0.00 elapsed = 1.01 rate = 99%
-d=5.0: user = 5.00 system = 0.00 elapsed = 5.01 rate = 99%
-d=10.0: user = 10.00 system = 0.00 elapsed = 10.01 rate = 99%

and with a cap of 90%:

[peterw@heathwren ~]$ withcap -C 900 ./loops.sh 1
-d=0.001: user = 0.00 system = 0.00 elapsed = 0.01 rate = 53%
-d=0.005: user = 0.01 system = 0.00 elapsed = 0.02 rate = 61%
-d=0.01: user = 0.01 system = 0.00 elapsed = 0.03 rate = 66%
-d=0.05: user = 0.06 system = 0.00 elapsed = 0.07 rate = 85%
-d=0.1: user = 0.10 system = 0.00 elapsed = 0.11 rate = 91%
-d=0.5: user = 0.50 system = 0.00 elapsed = 0.56 rate = 90%
-d=1.0: user = 1.00 system = 0.00 elapsed = 1.11 rate = 90%
-d=5.0: user = 5.00 system = 0.00 elapsed = 5.54 rate = 90%
-d=10.0: user = 10.00 system = 0.00 elapsed = 11.14 rate = 89%

Notice how the task's usage rate gets closer to the cap the longer the 
task runs and never exceeds the cap.  With smaller caps the effect is 
different, e.g. for a 9% cap we get:

[peterw@heathwren ~]$ withcap -C 90 ./loops.sh 1
-d=0.001: user = 0.00 system = 0.00 elapsed = 0.01 rate = 109%
-d=0.005: user = 0.01 system = 0.00 elapsed = 0.02 rate = 59%
-d=0.01: user = 0.02 system = 0.00 elapsed = 0.05 rate = 35%
-d=0.05: user = 0.05 system = 0.00 elapsed = 0.14 rate = 42%
-d=0.1: user = 0.10 system = 0.00 elapsed = 0.25 rate = 43%
-d=0.5: user = 0.50 system = 0.00 elapsed = 1.87 rate = 27%
-d=1.0: user = 1.00 system = 0.00 elapsed = 5.37 rate = 18%
-d=5.0: user = 5.00 system = 0.00 elapsed = 48.61 rate = 10%
-d=10.0: user = 10.00 system = 0.00 elapsed = 102.22 rate = 9%

and short lived tasks are being under capped.

Bearing in mind that -d=0.01 is the equivalent of a task running for 
just a single tick, and that that's about the shortest cycle length 
we're likely to see for CPU intensive tasks (and then only when the 
capping enforcement kicks in), I think it is unrealistic to expect much 
better for tasks with a life shorter than that.  Further, it takes 
several cycles to gather reasonable statistics on which to base capping 
enforcement, so doing much better than this for short lived tasks is 
unrealistic.

You could also try using a smaller value for CAP_STATS_OFFSET, as this 
will shorten the half life of the Kalman filters and make the capping 
react more quickly to changes in usage rates (which is effectively what 
a task starting up is).  The downside is that it will be less smooth.
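
One plausible reading of these "Kalman filters" is a fixed-gain decaying 
average whose gain is set by the CAP_STATS_OFFSET shift.  Under that 
assumption (this is a sketch, not the kernel's exact filter), the offset 
controls the half life and hence the smoothness/responsiveness trade-off:

```python
import math

# Assumed fixed-gain decaying average: each update moves the estimate
# 1/2**offset of the way towards the new sample, using only shifts.
def update(avg, sample, offset):
    return avg - (avg >> offset) + (sample >> offset)

def half_life(offset):
    """Updates needed for an old value's influence to decay by half."""
    retained = 1 - 1 / (1 << offset)
    return math.log(0.5) / math.log(retained)

# A smaller offset reacts faster (shorter half life) but smooths less:
print(round(half_life(2)))  # -> 2
print(round(half_life(8)))  # -> 177
```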

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: loops.sh --]
[-- Type: application/x-shellscript, Size: 191 bytes --]

[-- Attachment #3: short-lived-tasks-hard-cap-fix --]
[-- Type: text/plain, Size: 1502 bytes --]

---
 kernel/sched.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

Index: MM-2.6.17-rc5-mm3/kernel/sched.c
===================================================================
--- MM-2.6.17-rc5-mm3.orig/kernel/sched.c	2006-06-06 11:29:51.000000000 +1000
+++ MM-2.6.17-rc5-mm3/kernel/sched.c	2006-06-08 14:28:10.000000000 +1000
@@ -216,7 +216,8 @@ static void sinbin_release_fn(unsigned l
 #define cap_load_weight(p) \
 	(max((int)((min_cpu_rate_cap(p) * SCHED_LOAD_SCALE) / CPU_CAP_ONE), 1))
 #define safe_to_enforce_cap(p) \
-	(!((p)->mutexes_held || (p)->flags & (PF_FREEZE | PF_UIWAKE)))
+	(!((p)->mutexes_held || \
+	   (p)->flags & (PF_FREEZE | PF_UIWAKE | PF_EXITING)))
 #define safe_to_sinbin(p) (safe_to_enforce_cap(p) && !signal_pending(p))
 
 static void init_cpu_rate_caps(task_t *p)
@@ -1235,13 +1236,16 @@ static unsigned long reqd_sinbin_ticks(c
 	unsigned long long rhs = p->avg_cycle_length * p->cpu_rate_hard_cap;
 
 	if (lhs > rhs) {
-		unsigned long res;
-
-		res = static_prio_timeslice(p->static_prio);
-		res *= (CPU_CAP_ONE - p->cpu_rate_hard_cap);
-		res /= CPU_CAP_ONE;
+		lhs -= p->avg_cpu_per_cycle;
+		lhs >>= CAP_STATS_OFFSET;
+		/* have to do two divisions because there's no guarantee
+		 * that p->cpu_rate_hard_cap * (1000000000 / HZ) would
+		 * not overflow a 32 bit unsigned integer
+		 */
+		(void)do_div(lhs, p->cpu_rate_hard_cap);
+		(void)do_div(lhs, (1000000000 / HZ));
 
-		return res ? : 1;
+		return lhs ? : 1;
 	}
 
 	return 0;

[-- Attachment #4: asps.sh --]
[-- Type: application/x-shellscript, Size: 92 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-08  7:50   ` Peter Williams
@ 2006-06-09  0:57     ` Peter Williams
  2006-06-09  5:50       ` MAEDA Naoaki
  2006-06-09  5:41     ` MAEDA Naoaki
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Williams @ 2006-06-09  0:57 UTC (permalink / raw)
  To: Peter Williams
  Cc: MAEDA Naoaki, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Linux Kernel, Con Kolivas, Sam Vilain,
	Kingsley Cheung, Eric W. Biederman, Ingo Molnar, Rene Herman

Peter Williams wrote:
> WARNING: This e-mail has been altered to remove unsafe attachments
> and or a potential virus.
> 
> These attachments will be scrutinized by the System Administrators
> and if found to be safe forwarded to the recipient.
> 
> For more information please see the System Administrators
> <sam-members@aurema.com> +61 2 9698 2322.
> 
> An attachment named loops.sh was removed from this document as it
> constituted a security hazard.  If you require this document, please contact
> the sender and arrange an alternate means of receiving it.
> 
> 
> An attachment named asps.sh was removed from this document as it
> constituted a security hazard.  If you require this document, please contact
> the sender and arrange an alternate means of receiving it.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> MAEDA Naoaki wrote:
>>
>> I tried to run kernbench with a hard cap, and it spent a very long
>> time in the "Cleaning source tree..." phase. Because this phase is not
>> a CPU hog, my expectation was that it would behave as if uncapped.
>>
>> This can be reproduced by just running "make clean" at the top of a
>> kernel source tree with a hard cap.
>>
>> % /usr/bin/time make clean
>> 1.62user 0.29system 0:01.90elapsed 101%CPU (0avgtext+0avgdata 
>> 0maxresident)k
>> 0inputs+0outputs (0major+68539minor)pagefaults 0swaps
>>
>>   # Without cap, it returns almost immediately
>>
>> % ~/withcap.sh  -C 900 /usr/bin/time make clean
>> 1.61user 0.29system 1:26.17elapsed 2%CPU (0avgtext+0avgdata 
>> 0maxresident)k
>> 0inputs+0outputs (0major+68537minor)pagefaults 0swaps
>>
>>   # With 90% hard cap, it takes about 1.5 minutes.
>>
>> % ~/withcap.sh  -C 100 /usr/bin/time make clean
>> 1.64user 0.34system 3:31.48elapsed 0%CPU (0avgtext+0avgdata 
>> 0maxresident)k
>> 0inputs+0outputs (0major+68538minor)pagefaults 0swaps
>>
>>   # It became worse with 10% hard cap.
>>
>> % ~/withcap.sh  -c 900 /usr/bin/time make clean
>> 1.63user 0.28system 0:01.89elapsed 100%CPU (0avgtext+0avgdata 
>> 0maxresident)k
>> 0inputs+0outputs (0major+68537minor)pagefaults 0swaps
>>
>>   # It doesn't happen with soft cap.
> 
> This behaviour is caused by "make clean" being a short lived CPU 
> intensive task.  It was made worse by two things: my simplification of 
> the sinbin duration calculation, which assumed a constant CPU burst 
> size based on the time slice; and the fact that exiting tasks could 
> still have caps enforced.  (The simplification was done to avoid 64 bit 
> divides.)
> 
> I've put in a more complex sinbin calculation (and don't think the 64 
> bit divides will matter too much as they're on an infrequently travelled 
> path).  Exiting tasks are now excluded from having caps enforced on the 
> grounds that it's best for system performance to let them get out of the 
> way as soon as possible.  A patch is attached and I would appreciate it 
> if you could see whether it improves the situation you are observing.
> 
> These changes don't completely get rid of the phenomenon but I think 
> that it's less severe.  I've written a couple of scripts to test this 
> behaviour using the wload program from:
> 
> <http://prdownloads.sourceforge.net/cpuse/simloads-0.1.1.tar.gz?download>
> 
> You run loops.sh with a single argument and it uses asps.sh.  What the 
> test does is run a number (specified by the argument to loops.sh) of 
> instances of wload in series and use time to get the stats for the 
> series to complete.  It does this for a number of different wload 
> durations between 0.001 and 10.0 seconds.  Here's an example of 
> the output from an uncapped run:
> 
> [peterw@heathwren ~]$ ./loops.sh 1
> -d=0.001: user = 0.01 system = 0.00 elapsed = 0.00 rate = 133%
> -d=0.005: user = 0.01 system = 0.00 elapsed = 0.01 rate = 84%
> -d=0.01: user = 0.02 system = 0.00 elapsed = 0.01 rate = 105%
> -d=0.05: user = 0.06 system = 0.00 elapsed = 0.05 rate = 103%
> -d=0.1: user = 0.10 system = 0.00 elapsed = 0.11 rate = 98%
> -d=0.5: user = 0.50 system = 0.00 elapsed = 0.50 rate = 100%
> -d=1.0: user = 1.00 system = 0.00 elapsed = 1.01 rate = 99%
> -d=5.0: user = 5.00 system = 0.00 elapsed = 5.01 rate = 99%
> -d=10.0: user = 10.00 system = 0.00 elapsed = 10.01 rate = 99%
> 
> and with a cap of 90%:
> 
> [peterw@heathwren ~]$ withcap -C 900 ./loops.sh 1
> -d=0.001: user = 0.00 system = 0.00 elapsed = 0.01 rate = 53%
> -d=0.005: user = 0.01 system = 0.00 elapsed = 0.02 rate = 61%
> -d=0.01: user = 0.01 system = 0.00 elapsed = 0.03 rate = 66%
> -d=0.05: user = 0.06 system = 0.00 elapsed = 0.07 rate = 85%
> -d=0.1: user = 0.10 system = 0.00 elapsed = 0.11 rate = 91%
> -d=0.5: user = 0.50 system = 0.00 elapsed = 0.56 rate = 90%
> -d=1.0: user = 1.00 system = 0.00 elapsed = 1.11 rate = 90%
> -d=5.0: user = 5.00 system = 0.00 elapsed = 5.54 rate = 90%
> -d=10.0: user = 10.00 system = 0.00 elapsed = 11.14 rate = 89%
> 
> Notice how the task's usage rate gets closer to the cap the longer the 
> task runs and never exceeds the cap.  With smaller caps the effect is 
> different, e.g. for a 9% cap we get:
> 
> [peterw@heathwren ~]$ withcap -C 90 ./loops.sh 1
> -d=0.001: user = 0.00 system = 0.00 elapsed = 0.01 rate = 109%
> -d=0.005: user = 0.01 system = 0.00 elapsed = 0.02 rate = 59%
> -d=0.01: user = 0.02 system = 0.00 elapsed = 0.05 rate = 35%
> -d=0.05: user = 0.05 system = 0.00 elapsed = 0.14 rate = 42%
> -d=0.1: user = 0.10 system = 0.00 elapsed = 0.25 rate = 43%
> -d=0.5: user = 0.50 system = 0.00 elapsed = 1.87 rate = 27%
> -d=1.0: user = 1.00 system = 0.00 elapsed = 5.37 rate = 18%
> -d=5.0: user = 5.00 system = 0.00 elapsed = 48.61 rate = 10%
> -d=10.0: user = 10.00 system = 0.00 elapsed = 102.22 rate = 9%
> 
> and short lived tasks are being under capped.
> 
> Bearing in mind that -d=0.01 is the equivalent of a task running for 
> just a single tick, and that that's about the shortest cycle length 
> we're likely to see for CPU intensive tasks (and then only when the 
> capping enforcement kicks in), I think it is unrealistic to expect much 
> better for tasks with a life shorter than that.  Further, it takes 
> several cycles to gather reasonable statistics on which to base capping 
> enforcement, so doing much better than this for short lived tasks is 
> unrealistic.
> 
> You could also try using a smaller value for CAP_STATS_OFFSET, as this 
> will shorten the half life of the Kalman filters and make the capping 
> react more quickly to changes in usage rates (which is effectively what 
> a task starting up is).  The downside is that it will be less smooth.

I've done some informal testing with smaller values of CAP_STATS_OFFSET 
and there is only a minor improvement.

However, something that does improve behaviour for short lived tasks is 
to increase the value of HZ.  This is because the basic unit of CPU 
allocation by the scheduler is 1/HZ, and this is also the minimum time 
(and granularity) with which sinbinning and other capping measures can 
be implemented.  This is the fundamental limiting factor for the 
accuracy of capping, i.e. if everything worked perfectly, the best 
granularity that can be expected from capping of short lived tasks is 
1000 / (HZ * duration), where duration is in seconds.
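
Assuming the result of this expression is in the same thousandth units 
as the caps themselves (so 100 means 10 percentage points, an assumption 
from the -C 900 == 90% convention), the limit is just one jiffy 
expressed as a fraction of the task's lifetime:

```python
# Best-case capping granularity for a short lived task: one jiffy
# (1/HZ seconds) expressed in cap units (thousandths), per the formula
# 1000 / (HZ * duration) above.
def best_granularity(hz, duration_secs):
    return 1000 / (hz * duration_secs)

# With HZ=1000, a 10 ms task can be capped no finer than 100 cap units
# (10 percentage points); a 1 s task to within 1 cap unit (0.1%):
print(best_granularity(1000, 0.01))  # -> 100.0
print(best_granularity(1000, 1.0))   # -> 1.0
```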

For longer living tasks, once the initial phase has passed, the half life 
of the Kalman filters takes over from "HZ * duration" in the above 
expression.  Reducing CAP_STATS_OFFSET will shorten the half life of the 
filters and this in turn will make capping coarser.  On the other hand, 
if the half lives are too big then capping will be too slow to react 
to changes in a task's CPU usage patterns.  So there's a sweet spot in 
there somewhere.  There's also an upper limit imposed by the likelihood 
of arithmetic overflow during the calculations, which has to take into 
account the fact that the average cycle length (one of the metrics) can 
be quite long.  The current value was based on these considerations.

Peter
-- 
Dr Peter Williams, Chief Scientist         <peterw@aurema.com>
Aurema Pty Limited
Level 2, 130 Elizabeth St, Sydney, NSW 2000, Australia
Tel:+61 2 9698 2322  Fax:+61 2 9699 9174 http://www.aurema.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-08  7:50   ` Peter Williams
  2006-06-09  0:57     ` Peter Williams
@ 2006-06-09  5:41     ` MAEDA Naoaki
  2006-06-09  6:38       ` Peter Williams
  1 sibling, 1 reply; 8+ messages in thread
From: MAEDA Naoaki @ 2006-06-09  5:41 UTC (permalink / raw)
  To: Peter Williams
  Cc: Linux Kernel, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Con Kolivas, Sam Vilain, Kingsley Cheung,
	Eric W. Biederman, Ingo Molnar, Rene Herman, MAEDA Naoaki

Peter Williams wrote:

> This behaviour is caused by "make clean" being a short lived CPU 
> intensive task.  It was made worse by two things: my simplification of 
> the sinbin duration calculation, which assumed a constant CPU burst 
> size based on the time slice; and the fact that exiting tasks could 
> still have caps enforced.  (The simplification was done to avoid 64 bit 
> divides.)
> 
> I've put in a more complex sinbin calculation (and don't think the 64 
> bit divides will matter too much as they're on an infrequently travelled 
> path).  Exiting tasks are now excluded from having caps enforced on the 
> grounds that it's best for system performance to let them get out of the 
> way as soon as possible.  A patch is attached and I would appreciate it 
> if you could see whether it improves the situation you are observing.

Sorry for my late reply.

The following are the results with the patched kernel.  Unfortunately,
the patch doesn't seem to help in my situation.

$ ~/withcap.sh  -C 900 /usr/bin/time make clean
1.61user 0.29system 1:33.94elapsed 2%CPU

$ ~/withcap.sh  -C 100 /usr/bin/time make clean
1.68user 0.27system 3:34.45elapsed 0%CPU

Thanks,
MAEDA Naoaki


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-09  0:57     ` Peter Williams
@ 2006-06-09  5:50       ` MAEDA Naoaki
  2006-06-09  6:05         ` Peter Williams
  0 siblings, 1 reply; 8+ messages in thread
From: MAEDA Naoaki @ 2006-06-09  5:50 UTC (permalink / raw)
  To: Peter Williams
  Cc: Peter Williams, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Linux Kernel, Con Kolivas, Sam Vilain,
	Kingsley Cheung, Eric W. Biederman, Ingo Molnar, Rene Herman,
	MAEDA Naoaki

Peter Williams wrote:
> 
> I've done some informal testing with smaller values of CAP_STATS_OFFSET 
> and there is only a minor improvement.
> 
> However, something that does improve behaviour for short lived tasks is 
> to increase the value of HZ.  This is because the basic unit of CPU
> allocation by the scheduler is 1/HZ and this is also the minimum time 
> (and granularity) with which sinbinning and other capping measures can 
> be implemented.  This is the fundamental limiting factor for the 
> accuracy of capping i.e. if everything worked perfectly the best 
> granularity that can be expected from capping of short lived tasks is 
> 1000 / (HZ * duration) where duration is in seconds.

I already define CONFIG_HZ=1000.  Do you suggest increasing it further?

> For longer living tasks, once the initial phase has passed the half life 
> of the Kalman filters takes over from "HZ * duration" in the above 
> expression.  Reducing CAP_STATS_OFFSET will shorten the half life of the 
> filters and this in turn will make capping coarser.  On the other hand, 
> if the half lives are too big then capping will be too slow in reacting 
> to changes in a task's CPU usage patterns.  So there's a sweet spot in 
> there somewhere.  There's also an upper limit imposed by the likelihood 
> of arithmetic overflow during the calculations and this has to consider 
> the fact that the average cycle length (one of the metrics) can be quite 
> long.  The current value was based on these considerations.
> 
> Peter

Thanks,
MAEDA Naoaki


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-09  5:50       ` MAEDA Naoaki
@ 2006-06-09  6:05         ` Peter Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Williams @ 2006-06-09  6:05 UTC (permalink / raw)
  To: MAEDA Naoaki
  Cc: Kirill Korotaev, Srivatsa, CKRM, Balbir Singh, Mike Galbraith,
	Peter Williams, Con Kolivas, Linux Kernel, Sam Vilain,
	Eric W. Biederman, Kingsley Cheung, Rene Herman, Ingo Molnar

MAEDA Naoaki wrote:
> Peter Williams wrote:
>> I've done some informal testing with smaller values of CAP_STATS_OFFSET 
>> and there is only a minor improvement.
>>
>> However, something that does improve behaviour for short lived tasks is 
>> to increase the value of HZ.  This is because the basic unit of CPU
>> allocation by the scheduler is 1/HZ and this is also the minimum time 
>> (and granularity) with which sinbinning and other capping measures can 
>> be implemented.  This is the fundamental limiting factor for the 
>> accuracy of capping i.e. if everything worked perfectly the best 
>> granularity that can be expected from capping of short lived tasks is 
>> 1000 / (HZ * duration) where duration is in seconds.
> 
> I already define CONFIG_HZ=1000.  Do you suggest increasing it further?

No.

Peter
-- 
Dr Peter Williams, Chief Scientist         <peterw@aurema.com>
Aurema Pty Limited
Level 2, 130 Elizabeth St, Sydney, NSW 2000, Australia
Tel:+61 2 9698 2322  Fax:+61 2 9699 9174 http://www.aurema.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)
  2006-06-09  5:41     ` MAEDA Naoaki
@ 2006-06-09  6:38       ` Peter Williams
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Williams @ 2006-06-09  6:38 UTC (permalink / raw)
  To: MAEDA Naoaki
  Cc: Peter Williams, Kirill Korotaev, Srivatsa, CKRM, Balbir Singh,
	Mike Galbraith, Linux Kernel, Con Kolivas, Sam Vilain,
	Kingsley Cheung, Eric W. Biederman, Ingo Molnar, Rene Herman

MAEDA Naoaki wrote:
> Peter Williams wrote:
> 
>> This behaviour is caused by "make clean" being a short lived CPU 
>> intensive task.  It was made worse by two things: my simplification of 
>> the sinbin duration calculation, which assumed a constant CPU burst 
>> size based on the time slice; and the fact that exiting tasks could 
>> still have caps enforced.  (The simplification was done to avoid 64 bit 
>> divides.)
>>
>> I've put in a more complex sinbin calculation (and don't think the 64 
>> bit divides will matter too much as they're on an infrequently travelled 
>> path).  Exiting tasks are now excluded from having caps enforced on the 
>> grounds that it's best for system performance to let them get out of the 
>> way as soon as possible.  A patch is attached and I would appreciate it 
>> if you could see whether it improves the situation you are observing.
> 
> Sorry for my late reply.
> 
> The following are the results with the patched kernel.  Unfortunately,
> the patch doesn't seem to help in my situation.
> 
> $ ~/withcap.sh  -C 900 /usr/bin/time make clean
> 1.61user 0.29system 1:33.94elapsed 2%CPU
> 
> $ ~/withcap.sh  -C 100 /usr/bin/time make clean
> 1.68user 0.27system 3:34.45elapsed 0%CPU

I don't see anything that bad here.  E.g.

[peterw@heathwren SMP]$ /usr/bin/time make clean
make -C /home/peterw/KERNELS/PlugSched/MM-2.6.17-rc4-mm1 
O=/kbuild/Plugsched/MM-2.6.17-rc4-mm1/SMP clean
1.18user 0.81system 0:01.98elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+140353minor)pagefaults 0swaps
[peterw@heathwren SMP]$ withcap -C 900 /usr/bin/time make clean
make -C /home/peterw/KERNELS/PlugSched/MM-2.6.17-rc4-mm1 
O=/kbuild/Plugsched/MM-2.6.17-rc4-mm1/SMP clean
1.19user 0.80system 0:03.07elapsed 65%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+140322minor)pagefaults 0swaps
[peterw@heathwren SMP]$ withcap -C 100 /usr/bin/time make clean
make -C /home/peterw/KERNELS/PlugSched/MM-2.6.17-rc4-mm1 
O=/kbuild/Plugsched/MM-2.6.17-rc4-mm1/SMP clean
1.21user 0.82system 0:05.03elapsed 40%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+140374minor)pagefaults 0swaps
[peterw@heathwren SMP]$

These are worse than predicted by my tests for single processes, leading 
me to the opinion that the "make clean" actually consists of a number of 
serially executed CPU intensive tasks whose total usage adds up to the 
reported times.  My tests (starting a background task before and another 
after running "make clean" and using the difference in their reported 
pids as an estimate of how many tasks were involved in the "make clean") 
indicate that there were about 660.  This gives them each an average 
duration of about 3 milliseconds (based on a total elapsed time of 1.98 
seconds).  Even with HZ of 1000 that's only about 3 jiffies and is well 
and truly in the "too short to do reasonable capping" basket.
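
The estimate works out as follows (the task count and elapsed time are 
taken from the text above):

```python
# ~660 serially executed tasks sharing 1.98 s of elapsed time gives
# roughly 3 ms, i.e. about 3 jiffies at HZ=1000, per task.
tasks = 660          # estimated from the pid difference
elapsed_secs = 1.98  # uncapped "make clean" elapsed time
HZ = 1000

avg_secs = elapsed_secs / tasks
print(f"{avg_secs * 1000:.1f} ms per task, ~{avg_secs * HZ:.0f} jiffies")
# -> 3.0 ms per task, ~3 jiffies
```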

Peter
-- 
Dr Peter Williams, Chief Scientist         <peterw@aurema.com>
Aurema Pty Limited
Level 2, 130 Elizabeth St, Sydney, NSW 2000, Australia
Tel:+61 2 9698 2322  Fax:+61 2 9699 9174 http://www.aurema.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-06-09  6:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20060606023708.2801.24804.sendpatchset@heathwren.pw.nest>
2006-06-07  8:05 ` [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved) MAEDA Naoaki
2006-06-07 12:44   ` Peter Williams
2006-06-08  7:50   ` Peter Williams
2006-06-09  0:57     ` Peter Williams
2006-06-09  5:50       ` MAEDA Naoaki
2006-06-09  6:05         ` Peter Williams
2006-06-09  5:41     ` MAEDA Naoaki
2006-06-09  6:38       ` Peter Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox