cpufreq Archive on lore.kernel.org
* [PATCH][RFC] ondemand governor automatic downscaling
@ 2005-03-06 23:36 Eric Piel
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Piel @ 2005-03-06 23:36 UTC (permalink / raw)
  To: cpufreq

[-- Attachment #1: Type: text/plain, Size: 1633 bytes --]

Hello,

Here is a new policy for the ondemand governor. The modification 
concerns frequency downscaling. Instead of stepping down to a lower 
frequency when the CPU usage is under 20%, this new policy automatically 
scales to the optimal frequency, that is, the lowest frequency which 
provides enough power to not trigger the upscaling policy.

An example: let's say you are watching a DVD, and this takes 40% of the 
CPU when the frequency is at its maximum. With the current governor, the 
frequency would stay at the maximum, 1 GHz (because when you started the 
player, the CPU usage briefly reached 100%). With this new algorithm, 
the optimal frequency is computed. The goal is to have a usage of 70% 
(== 80% minus a delta). Therefore the optimal frequency is 
1000 MHz * 40 / 70 = 571 MHz. From this request we get the lowest 
hardware frequency at or above it (CPUFREQ_RELATION_L), which on this 
computer is 700 MHz. We've saved 300 MHz.
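
To make the arithmetic above concrete, here is a minimal sketch. The 
helper names (optimal_freq, pick_freq) and the two-entry frequency table 
are made up for illustration and are not part of the patch. It assumes 
that CPU load scales inversely with frequency.

```c
/*
 * Hypothetical helper: frequency (in MHz) at which the measured load
 * would equal target_pct, assuming load scales inversely with frequency.
 */
static unsigned int optimal_freq(unsigned int cur_mhz, unsigned int load_pct,
				 unsigned int target_pct)
{
	if (target_pct == 0)
		return cur_mhz;	/* no meaningful target, stay where we are */
	return (cur_mhz * load_pct) / target_pct;
}

/*
 * Hypothetical helper: lowest entry of an ascending table that is at or
 * above the wanted frequency; falls back to the highest entry.
 * n must be at least 1.
 */
static unsigned int pick_freq(const unsigned int *table, int n,
			      unsigned int wanted)
{
	int i;

	for (i = 0; i < n; i++)
		if (table[i] >= wanted)
			return table[i];
	return table[n - 1];
}
```

With a table of { 700, 1000 }, optimal_freq(1000, 40, 80 - 10) gives 571 
and pick_freq() then selects 700, matching the DVD example.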

The beauty of this approach is that, in addition to having a 
frequency that better fits the CPU usage, the code is simpler than before 
and, as icing on the cake, there is no need for down_threshold anymore!

OK, so as you can tell, I'm happy with this new algorithm. It works 
on my computer (speedstep-ich) and it should be even more useful 
on hardware that supports more than 2 frequencies. Does anyone see any 
disadvantage in this approach? Venkatesh, what do you think of 
incorporating this new algorithm in the ondemand governor?

Eric

PS: This applies to vanilla 2.6.11, on top of my two previous patches 
(although they are not technically required).

--
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>

[-- Attachment #2: ondemand-automatic-downscaling-2.6.11.patch --]
[-- Type: text/x-patch, Size: 4895 bytes --]

--- linux-2.6.11/drivers/cpufreq/cpufreq_ondemand.c.factorise-allcpu	2005-03-06 19:30:29.000000000 +0100
+++ linux-2.6.11/drivers/cpufreq/cpufreq_ondemand.c	2005-03-06 23:41:03.000000000 +0100
@@ -34,13 +34,9 @@
  */
 
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
-#define MIN_FREQUENCY_UP_THRESHOLD		(0)
+#define MIN_FREQUENCY_UP_THRESHOLD		(10)
 #define MAX_FREQUENCY_UP_THRESHOLD		(100)
 
-#define DEF_FREQUENCY_DOWN_THRESHOLD		(20)
-#define MIN_FREQUENCY_DOWN_THRESHOLD		(0)
-#define MAX_FREQUENCY_DOWN_THRESHOLD		(100)
-
 /* 
  * The polling frequency of this governor depends on the capability of 
  * the processor. Default polling frequency is 1000 times the transition
@@ -78,12 +74,10 @@ struct dbs_tuners {
 	unsigned int 		sampling_rate;
 	unsigned int		sampling_down_factor;
 	unsigned int		up_threshold;
-	unsigned int		down_threshold;
 };
 
 static struct dbs_tuners dbs_tuners_ins = {
 	.up_threshold 		= DEF_FREQUENCY_UP_THRESHOLD,
-	.down_threshold 	= DEF_FREQUENCY_DOWN_THRESHOLD,
 	.sampling_down_factor 	= DEF_SAMPLING_DOWN_FACTOR,
 };
 
@@ -115,7 +109,6 @@ static ssize_t show_##file_name						\
 show_one(sampling_rate, sampling_rate);
 show_one(sampling_down_factor, sampling_down_factor);
 show_one(up_threshold, up_threshold);
-show_one(down_threshold, down_threshold);
 
 static ssize_t store_sampling_down_factor(struct cpufreq_policy *unused, 
 		const char *buf, size_t count)
@@ -161,8 +154,7 @@ static ssize_t store_up_threshold(struct
 
 	down(&dbs_sem);
 	if (ret != 1 || input > MAX_FREQUENCY_UP_THRESHOLD || 
-			input < MIN_FREQUENCY_UP_THRESHOLD ||
-			input <= dbs_tuners_ins.down_threshold) {
+			input < MIN_FREQUENCY_UP_THRESHOLD) {
 		up(&dbs_sem);
 		return -EINVAL;
 	}
@@ -173,26 +165,6 @@ static ssize_t store_up_threshold(struct
 	return count;
 }
 
-static ssize_t store_down_threshold(struct cpufreq_policy *unused, 
-		const char *buf, size_t count)
-{
-	unsigned int input;
-	int ret;
-	ret = sscanf (buf, "%u", &input);
-
-	down(&dbs_sem);
-	if (ret != 1 || input > MAX_FREQUENCY_DOWN_THRESHOLD || 
-			input < MIN_FREQUENCY_DOWN_THRESHOLD ||
-			input >= dbs_tuners_ins.up_threshold) {
-		up(&dbs_sem);
-		return -EINVAL;
-	}
-
-	dbs_tuners_ins.down_threshold = input;
-	up(&dbs_sem);
-
-	return count;
-}
 
 #define define_one_rw(_name) \
 static struct freq_attr _name = \
@@ -201,7 +173,6 @@ __ATTR(_name, 0644, show_##_name, store_
 define_one_rw(sampling_rate);
 define_one_rw(sampling_down_factor);
 define_one_rw(up_threshold);
-define_one_rw(down_threshold);
 
 static struct attribute * dbs_attributes[] = {
 	&sampling_rate_max.attr,
@@ -209,7 +180,6 @@ static struct attribute * dbs_attributes
 	&sampling_rate.attr,
 	&sampling_down_factor.attr,
 	&up_threshold.attr,
-	&down_threshold.attr,
 	NULL
 };
 
@@ -222,8 +192,8 @@ static struct attribute_group dbs_attr_g
 
 static void dbs_check_cpu(int cpu)
 {
-	unsigned int idle_ticks, up_idle_ticks, down_idle_ticks;
-	unsigned int freq_down_step;
+	unsigned int idle_ticks, up_idle_ticks, total_ticks;
+	unsigned int freq_next;
 	unsigned int freq_down_sampling_rate;
 	static int down_skip[NR_CPUS];
 	struct cpu_dbs_info_s *this_dbs_info;
@@ -290,7 +261,12 @@ static void dbs_check_cpu(int cpu)
 	down_skip[cpu]++;
 	if (down_skip[cpu] < dbs_tuners_ins.sampling_down_factor)
 		return;
+	down_skip[cpu] = 0;
 
+	/* don't try to decrease the frequency if it's already the min */
+	if (policy->cur == policy->min)
+		return;
+	
 	idle_ticks = UINT_MAX;
 	for_each_cpu_mask(j, policy->cpus) {
 		unsigned int tmp_idle_ticks, total_idle_ticks;
@@ -308,27 +285,23 @@ static void dbs_check_cpu(int cpu)
 			idle_ticks = tmp_idle_ticks;
 	}
 
-	/* Scale idle ticks by 100 and compare with up and down ticks */
-	idle_ticks *= 100;
-	down_skip[cpu] = 0;
-
+	/* Compute how many ticks there are between two measurements */
 	freq_down_sampling_rate = dbs_tuners_ins.sampling_rate *
 		dbs_tuners_ins.sampling_down_factor;
-	down_idle_ticks = (100 - dbs_tuners_ins.down_threshold) *
-			sampling_rate_in_HZ(freq_down_sampling_rate);
-
-	if (idle_ticks > down_idle_ticks) {
-		freq_down_step = (5 * policy->max) / 100;
-
-		/* max freq cannot be less than 100. But who knows.... */
-		if (unlikely(freq_down_step == 0))
-			freq_down_step = 5;
+	total_ticks = sampling_rate_in_HZ(freq_down_sampling_rate);
+	
+	/* 
+	 * The optimal frequency is the frequency that is the lowest that
+	 * can support the current CPU usage without triggering 
+	 * the up policy. To be safe, we focus 10 points under the threshold.
+	 */
+	freq_next = ((total_ticks - idle_ticks) * 100 * policy->cur) /
+	       		(total_ticks * (dbs_tuners_ins.up_threshold - 10));
 
+	if (freq_next <= ((policy->cur * 95) / 100))
 		__cpufreq_driver_target(policy,
-			policy->cur - freq_down_step, 
-			CPUFREQ_RELATION_H);
-		return;
-	}
+			freq_next, 
+			CPUFREQ_RELATION_L);
 }
 
 static void do_dbs_timer(void *data)
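
For readers who want to try the formula outside the kernel, here is a 
user-space sketch of the freq_next computation and the 95% check from 
the last hunk. The function names are hypothetical. Two things differ 
from the patch and are assumptions on my side: the sketch uses 64-bit 
intermediates (assuming policy->cur is in kHz, the 32-bit product of 
busy ticks, 100 and the frequency could wrap), and it refuses an 
up_threshold of 10 or less, since the divisor (up_threshold - 10) would 
otherwise be zero.

```c
#include <stdint.h>

/*
 * Hypothetical user-space version of the freq_next formula.
 * Returns 0 when the inputs cannot produce a valid estimate.
 */
static unsigned int next_freq(unsigned int total_ticks,
			      unsigned int idle_ticks,
			      unsigned int cur_khz,
			      unsigned int up_threshold)
{
	uint64_t busy, num, den;

	if (total_ticks == 0 || idle_ticks > total_ticks || up_threshold <= 10)
		return 0;

	busy = total_ticks - idle_ticks;
	num = busy * 100 * cur_khz;			/* 64-bit product */
	den = (uint64_t)total_ticks * (up_threshold - 10);
	return (unsigned int)(num / den);
}

/* Only switch when the estimate is at least 5% below the current speed. */
static int should_scale_down(unsigned int freq_next, unsigned int cur_khz)
{
	return freq_next <= ((uint64_t)cur_khz * 95) / 100;
}
```

With 100 ticks of which 60 are idle, a current frequency of 1000000 kHz 
and an up_threshold of 80, next_freq() yields 571428 kHz and 
should_scale_down() is true.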


* RE: [PATCH][RFC] ondemand governor automatic downscaling
@ 2005-03-07  0:13 Pallipadi, Venkatesh
  2005-03-07  0:53 ` Eric Piel
  0 siblings, 1 reply; 5+ messages in thread
From: Pallipadi, Venkatesh @ 2005-03-07  0:13 UTC (permalink / raw)
  To: Eric Piel, cpufreq



The approach looks good. I had tried something like this before. And
this works well as long as we assume that the CPU utilization will be
proportional to the current frequency.

In particular, this will not work well in the following situation:
Assume a server with constant reads/writes to disk. Say the CPU computes
something for 40% of the time, then issues an I/O and waits for the
remaining 60% of the time (until the I/O is complete). This wait is
independent of CPU speed. Now if we reduce the frequency to make the CPU
70% busy, the CPU will still wait for the I/O and overall performance
will go down as a result of the frequency slowdown.
This can still happen with the 20-80 policy, but the probability is lower.
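
A rough model of this scenario, as a sketch only: it assumes compute and 
I/O are strictly serialized, that compute time scales inversely with 
frequency and that the I/O wait is fixed. The function names and the 
time units are made up for illustration.

```c
/*
 * Length of one compute-then-wait cycle at freq_mhz, given the compute
 * time measured at max_mhz and an I/O wait that does not depend on the
 * CPU speed. Time units are arbitrary.
 */
static double cycle_time(double compute_at_max, double io_wait,
			 double max_mhz, double freq_mhz)
{
	return compute_at_max * (max_mhz / freq_mhz) + io_wait;
}

/* Share of the cycle (in percent) that the CPU spends computing. */
static double busy_pct(double compute_at_max, double io_wait,
		       double max_mhz, double freq_mhz)
{
	double compute = compute_at_max * (max_mhz / freq_mhz);

	return 100.0 * compute / (compute + io_wait);
}
```

With 40 units of compute and 60 units of wait at 1000 MHz, dropping to 
700 MHz stretches the cycle from 100 to about 117 units, while the CPU 
only becomes about 49% busy, not the 57% that a purely proportional 
estimate would predict.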

I am thinking of two possibilities here:
1) Keep ondemand as it is, with a performance oriented frequency
scaling. And add a new governor with this automatic downscaling and with
other features making it a power oriented frequency scaling.
2) Change the ondemand governor itself with these changes.

At present, I am tending towards option 1. But, I would like to get
other opinions on this.

Thanks,
Venki
 



* Re: [PATCH][RFC] ondemand governor automatic downscaling
  2005-03-07  0:13 [PATCH][RFC] ondemand governor automatic downscaling Pallipadi, Venkatesh
@ 2005-03-07  0:53 ` Eric Piel
  2005-03-08 22:28   ` Stefan Seyfried
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Piel @ 2005-03-07  0:53 UTC (permalink / raw)
  To: Pallipadi, Venkatesh; +Cc: cpufreq

Pallipadi, Venkatesh wrote:
:
:
> In particular, this will not work well in the following situation:
> Assume a server with a constant read/write to disk. Say the CPU computes
> something for 40% of time and then issues a I/O and waits for remaining
> 60% of time (until I/O is complete). This wait is irrespective of CPU
> speed. Now if we reduce the frequency to make CPU 70% busy, and CPU will
> still wait for IO and overall performance will go down as a result of
> the frequency slowdown. 
> This can still happen with 20-80 policy, but the probability is less.
Well, if I understand correctly, we lose time because the hard disk is 
waiting for the CPU to do some computing before it can write anything, 
and the slower the CPU is, the longer the hard disk will have to wait. 
IMHO, this situation seems very unlikely because of all the I/O 
virtualisation done by the kernel, including read-ahead and caching. The 
hard disk will run completely asynchronously, 100% of the time, while 
the CPU will compute during 70% of the time instead of 40%, still 
having idle time. This shouldn't affect overall performance. Still, I may 
be overlooking something; do you have any test case in mind?

In addition, in the very improbable case where the disk would be waiting 
for the CPU (half idle), the 20-80 policy would only have a slight 
advantage, simply because it is less effective at lowering the frequency 
than the automatic downscaling policy. It would be better to address 
this problem directly, for instance by allowing the iowait time to count 
as busy time.
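
As a sketch of what counting iowait as busy time could look like (the 
function, its parameters and the iowait_is_busy switch are hypothetical 
and not taken from the governor source; idle_ticks here means idle time 
excluding iowait):

```c
/*
 * Hypothetical load computation over one sampling period.
 * When iowait_is_busy is non-zero, ticks spent waiting for I/O are
 * treated as busy time; otherwise they are treated as idle time.
 */
static unsigned int load_pct(unsigned int total_ticks,
			     unsigned int idle_ticks,
			     unsigned int iowait_ticks,
			     int iowait_is_busy)
{
	unsigned int not_busy = idle_ticks;

	if (!iowait_is_busy)
		not_busy += iowait_ticks;
	if (total_ticks == 0)
		return 0;
	if (not_busy > total_ticks)
		not_busy = total_ticks;	/* clamp against sampling skew */

	return ((total_ticks - not_busy) * 100) / total_ticks;
}
```

For the server described above (40 ticks computing, 60 ticks in iowait, 
no true idle time), the load reads as 100% when iowait counts as busy 
and as 40% when it does not, so the governor would not downscale in the 
first case.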


> 
> I am thinking of two possiblities here:
> 1) Keep ondemand as it is, with a performance oriented frequency
> scaling. And add a new governor with this automatic downscaling and with
> other features making it a power oriented frequency scaling.
> 2) Change the ondemand governor itself with these changes.
> 
> At present, I am tending towards option 1. But, I would like to get
> other opinions on this.
Well, I'd like to avoid forking the ondemand governor as much as 
possible, so I'd like to try to stick to option 2 :-)

Eric


* RE: [PATCH][RFC] ondemand governor automatic downscaling
@ 2005-03-07  1:51 Pallipadi, Venkatesh
  0 siblings, 0 replies; 5+ messages in thread
From: Pallipadi, Venkatesh @ 2005-03-07  1:51 UTC (permalink / raw)
  To: Eric Piel; +Cc: cpufreq


>-----Original Message-----
>From: Eric Piel [mailto:Eric.Piel@tremplin-utc.net] 
>Sent: Sunday, March 06, 2005 4:54 PM
>To: Pallipadi, Venkatesh
>Cc: cpufreq@zenii.linux.org.uk
>Subject: Re: [PATCH][RFC] ondemand governor automatic downscaling
>
>Pallipadi, Venkatesh wrote:
>:
>:
>> In particular, this will not work well in the following situation:
>> Assume a server with a constant read/write to disk. Say the 
>CPU computes
>> something for 40% of time and then issues a I/O and waits 
>for remaining
>> 60% of time (until I/O is complete). This wait is irrespective of CPU
>> speed. Now if we reduce the frequency to make CPU 70% busy, 
>and CPU will
>> still wait for IO and overall performance will go down as a result of
>> the frequency slowdown. 
>> This can still happen with 20-80 policy, but the probability is less.
>Well, if I understand correctly, we lose time because the harddisk is 
>waiting for the CPU to do some computing before it can write anything 
>and the slowest is the CPU, the longer the harddisk will have to wait. 
>IMHO, this situation seems very unlikely because of all the I/O 
>virtualisation done by the kernel, including read-ahead and cache. The 
>harddisk will completely turn asynchronously, 100% of the time, while 
>the CPU will do computation during 70% of the time instead of 
>40%, still 
>having idle time. This shouldn't affect global performance. 
>Still, I can 
>be overlooking something, do you have any testcase in mind?
>

It was just a theoretical possibility, a sort of corner case.
But, as you said, that can be handled by counting iowait as busy time,
probably with another tuning parameter for iowait time.

I will run this patch on a couple of SMP systems, with some workload. 
I just want to make sure there are no side effects before pushing 
this to the base. I should be able to get back on this in a couple of days. 

Thanks again for the patch,
Venki


* Re: [PATCH][RFC] ondemand governor automatic downscaling
  2005-03-07  0:53 ` Eric Piel
@ 2005-03-08 22:28   ` Stefan Seyfried
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Seyfried @ 2005-03-08 22:28 UTC (permalink / raw)
  To: cpufreq

On Mon, Mar 07, 2005 at 01:53:41AM +0100, Eric Piel wrote:
> Pallipadi, Venkatesh wrote:
> :
> :
> >In particular, this will not work well in the following situation:
> >Assume a server with a constant read/write to disk. Say the CPU computes
> >something for 40% of time and then issues a I/O and waits for remaining
> >60% of time (until I/O is complete). This wait is irrespective of CPU
> >speed. Now if we reduce the frequency to make CPU 70% busy, and CPU will
> >still wait for IO and overall performance will go down as a result of
> >the frequency slowdown. 

I'm not quite sure this is right...

> >This can still happen with 20-80 policy, but the probability is less.
> Well, if I understand correctly, we lose time because the harddisk is 
> waiting for the CPU to do some computing before it can write anything 
> and the slowest is the CPU, the longer the harddisk will have to wait. 
> IMHO, this situation seems very unlikely because of all the I/O 
> virtualisation done by the kernel, including read-ahead and cache. The 
> harddisk will completely turn asynchronously, 100% of the time, while 
> the CPU will do computation during 70% of the time instead of 40%, still 
> having idle time. This shouldn't affect global performance. Still, I can 
> be overlooking something, do you have any testcase in mind?

...and I agree with Eric's thoughts. But today was a long day and I am not
very awake right now.
Anyway.
I have had this "automatic gearshift ;-)" in the SUSE powersaved since at
least SUSE 9.2 (I am not sure if it made it into 9.1 and SLES9 or
if it went in after that) and I have never heard of any performance
problems of this sort (and I have heard a lot of people claim that
dynamic cpufreq policies would slow down their machines, but nobody
provided me with numbers to back this up even remotely). So I think this
is the right thing to do and it makes configuration easier: just an
up_threshold and maybe a hysteresis percentage (default 5% in powersaved)
to avoid continuous switching if the CPU load is near the switching
point.
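
A minimal sketch of such a hysteresis band, assuming a simple three-way 
decision. The names are invented here and this is not how powersaved 
actually implements it, only an illustration of the idea:

```c
enum freq_action { FREQ_KEEP, FREQ_UP, FREQ_DOWN };

/*
 * Hypothetical decision function: go up above up_threshold, consider
 * going down only below (up_threshold - hysteresis), and do nothing in
 * the band between the two so that a load hovering around the
 * switching point does not cause continuous switching.
 */
static enum freq_action decide(unsigned int load_pct,
			       unsigned int up_threshold,
			       unsigned int hysteresis)
{
	unsigned int down_limit = 0;

	if (up_threshold > hysteresis)
		down_limit = up_threshold - hysteresis;

	if (load_pct > up_threshold)
		return FREQ_UP;
	if (load_pct < down_limit)
		return FREQ_DOWN;
	return FREQ_KEEP;
}
```

With up_threshold = 80 and hysteresis = 5, a load of 78% keeps the 
current frequency, 85% switches up and 60% allows a switch down.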

In short: I like it ;-)

> Well, I'd like to avoid as much as possible to have to fork the ondemand 
> governor so I'd like to try to stick to option 2 :-)

Maybe it would be possible to get 2 ondemand governors from 3 source
files:
 - an "ondemand core"
 - an "ondemand simple", probably performance oriented with not too many
   tunables
 - an "ondemand mobile" with more sophisticated switching and optimal
   configuration aimed at power saving, less performance oriented.
The reasoning behind this is that IMO tuning for performance is easy if you
don't mind a few extra mWh spent, but getting it to run at good
performance with as little power consumption as possible is a little bit
more complex (should we consider "niced" CPU time? automatic selection of
"good" up- and downswitching points, etc.). I'm not sure if there would be
enough code shared between the two drivers to justify an extra "core"
module, but it might help.

For me, as a "mobile devices" guy, the more powersaving, the better but
there are different opinions on this ;-)

Best regards,

    Stefan
-- 
Stefan Seyfried

