cpufreq.vger.kernel.org archive mirror
* [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate
@ 2014-05-01 21:00 Stratos Karafotis
  2014-05-01 21:30 ` Dirk Brandewie
  0 siblings, 1 reply; 6+ messages in thread
From: Stratos Karafotis @ 2014-05-01 21:00 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar, Dirk Brandewie
  Cc: cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, LKML,
	dirk.brandewie

Currently, the driver calculates the next pstate proportionally to the
core_busy factor, scaled by the ratio max_pstate / current_pstate.

Using the scaled load (core_busy) to calculate the next pstate
is not always correct, because there are cases where the load is
independent of the current pstate. For example, a tight 'for' loop
running through many sampling intervals will cause a load of 100%
at every pstate.

So, change the above method and calculate the next pstate on the
assumption that the next pstate should not depend on the
current pstate. The next pstate should only be directly
proportional to the measured load.

Tested on an Intel i7-3770 CPU @ 3.40GHz.
The Phoronix Linux Kernel Compilation 3.1 benchmark shows a ~1.5%
increase in performance. Below are the test results using turbostat
(5 iterations):

Without patch:

Ph. avg Time	Total time	PkgWatt		Total Energy
	79.63	266.416		57.74		15382.85984
	79.63	265.609		57.87		15370.79283
	79.57	266.994		57.54		15362.83476
	79.53	265.304		57.83		15342.53032
	79.71	265.977		57.76		15362.83152
avg	79.61	266.06		57.74		15364.36985

With patch:

Ph. avg Time	Total time	PkgWatt		Total Energy
	78.23	258.826		59.14		15306.96964
	78.41	259.110		59.15		15326.35650
	78.40	258.530		59.26		15320.48780
	78.46	258.673		59.20		15313.44160
	78.19	259.075		59.16		15326.87700
avg	78.34	258.842		59.18		15318.82650

The total test time was reduced by ~2.6%, while the total energy
consumption during a test iteration was reduced by ~0.35%.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
---

Changes v1 -> v2
	- Enhance change log as Rafael and Viresh suggested
 

 drivers/cpufreq/intel_pstate.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0999673..8e309db 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
-static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
+static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
 {
-	int32_t core_busy, max_pstate, current_pstate;
+	int32_t core_busy, max_pstate;
 
 	core_busy = cpu->sample.core_pct_busy;
 	max_pstate = int_tofp(cpu->pstate.max_pstate);
-	current_pstate = int_tofp(cpu->pstate.current_pstate);
-	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
+	core_busy = mul_fp(core_busy, max_pstate);
 	return FP_ROUNDUP(core_busy);
 }
 
 static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
 {
-	int32_t busy_scaled;
+	int32_t busy;
 	struct _pid *pid;
 	signed int ctl = 0;
 	int steps;
 
 	pid = &cpu->pid;
-	busy_scaled = intel_pstate_get_scaled_busy(cpu);
+	busy = intel_pstate_get_busy(cpu);
 
-	ctl = pid_calc(pid, busy_scaled);
+	ctl = pid_calc(pid, busy);
 
 	steps = abs(ctl);
 
@@ -651,7 +650,7 @@ static void intel_pstate_timer_func(unsigned long __data)
 	intel_pstate_adjust_busy_pstate(cpu);
 
 	trace_pstate_sample(fp_toint(sample->core_pct_busy),
-			fp_toint(intel_pstate_get_scaled_busy(cpu)),
+			fp_toint(intel_pstate_get_busy(cpu)),
 			cpu->pstate.current_pstate,
 			sample->mperf,
 			sample->aperf,
-- 
1.9.0


* Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate
  2014-05-01 21:00 [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate Stratos Karafotis
@ 2014-05-01 21:30 ` Dirk Brandewie
  2014-05-01 23:18   ` Rafael J. Wysocki
  0 siblings, 1 reply; 6+ messages in thread
From: Dirk Brandewie @ 2014-05-01 21:30 UTC (permalink / raw)
  To: Stratos Karafotis, Rafael J. Wysocki, Viresh Kumar,
	Dirk Brandewie
  Cc: dirk.brandewie, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
	LKML

On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
> Currently the driver calculates the next pstate proportional to
> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>
> Using the scaled load (core_busy) to calculate the next pstate
> is not always correct, because there are cases that the load is
> independent from current pstate. For example, a tight 'for' loop
> through many sampling intervals will cause a load of 100% in
> every pstate.
>
> So, change the above method and calculate the next pstate with
> the assumption that the next pstate should not depend on the
> current pstate. The next pstate should only be directly
> proportional to measured load.
>
> Tested on Intel i7-3770 CPU @ 3.40GHz.
> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> increase ~1.5% in performance. Below the test results using turbostat
> (5 iterations):
>
> Without patch:
>
> Ph. avg Time	Total time	PkgWatt		Total Energy
> 	79.63	266.416		57.74		15382.85984
> 	79.63	265.609		57.87		15370.79283
> 	79.57	266.994		57.54		15362.83476
> 	79.53	265.304		57.83		15342.53032
> 	79.71	265.977		57.76		15362.83152
> avg	79.61	266.06		57.74		15364.36985
>
> With patch:
>
> Ph. avg Time	Total time	PkgWatt		Total Energy
> 	78.23	258.826		59.14		15306.96964
> 	78.41	259.110		59.15		15326.35650
> 	78.40	258.530		59.26		15320.48780
> 	78.46	258.673		59.20		15313.44160
> 	78.19	259.075		59.16		15326.87700
> avg	78.34	258.842		59.18		15318.82650
>
> The total test time reduced by ~2.6%, while the total energy
> consumption during a test iteration reduced by ~0.35%
>
> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
> ---
>
> Changes v1 -> v2
> 	- Enhance change log as Rafael and Viresh suggested
>
>
>   drivers/cpufreq/intel_pstate.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 0999673..8e309db 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
>   	mod_timer_pinned(&cpu->timer, jiffies + delay);
>   }
>
> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
>   {
> -	int32_t core_busy, max_pstate, current_pstate;
> +	int32_t core_busy, max_pstate;
>
>   	core_busy = cpu->sample.core_pct_busy;
>   	max_pstate = int_tofp(cpu->pstate.max_pstate);
> -	current_pstate = int_tofp(cpu->pstate.current_pstate);
> -	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
> +	core_busy = mul_fp(core_busy, max_pstate);

NAK. The goal of this code is to find out how busy the core is at the current
P state. This change will return a value WAY too high.

Assume core_busy is 100 and the max non-turbo P state is 34 (3.4 GHz): this code
would return a busy value of 3400. The PID is trying to keep the busy value
at the setpoint, so any load of ~3% will drive the P state to the highest turbo
P state in this example.


>   	return FP_ROUNDUP(core_busy);
>   }
>
>   static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)
>   {
> -	int32_t busy_scaled;
> +	int32_t busy;
>   	struct _pid *pid;
>   	signed int ctl = 0;
>   	int steps;
>
>   	pid = &cpu->pid;
> -	busy_scaled = intel_pstate_get_scaled_busy(cpu);
> +	busy = intel_pstate_get_busy(cpu);
>
> -	ctl = pid_calc(pid, busy_scaled);
> +	ctl = pid_calc(pid, busy);
>
>   	steps = abs(ctl);
>
> @@ -651,7 +650,7 @@ static void intel_pstate_timer_func(unsigned long __data)
>   	intel_pstate_adjust_busy_pstate(cpu);
>
>   	trace_pstate_sample(fp_toint(sample->core_pct_busy),
> -			fp_toint(intel_pstate_get_scaled_busy(cpu)),
> +			fp_toint(intel_pstate_get_busy(cpu)),
>   			cpu->pstate.current_pstate,
>   			sample->mperf,
>   			sample->aperf,
>



* Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate
  2014-05-01 21:30 ` Dirk Brandewie
@ 2014-05-01 23:18   ` Rafael J. Wysocki
  2014-05-02  1:48     ` Dirk Brandewie
  0 siblings, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2014-05-01 23:18 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Stratos Karafotis, Viresh Kumar, Dirk Brandewie,
	cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, LKML

On Thursday, May 01, 2014 02:30:42 PM Dirk Brandewie wrote:
> On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
> > Currently the driver calculates the next pstate proportional to
> > core_busy factor, scaled by the ratio max_pstate / current_pstate.
> >
> > Using the scaled load (core_busy) to calculate the next pstate
> > is not always correct, because there are cases that the load is
> > independent from current pstate. For example, a tight 'for' loop
> > through many sampling intervals will cause a load of 100% in
> > every pstate.
> >
> > So, change the above method and calculate the next pstate with
> > the assumption that the next pstate should not depend on the
> > current pstate. The next pstate should only be directly
> > proportional to measured load.
> >
> > Tested on Intel i7-3770 CPU @ 3.40GHz.
> > Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> > increase ~1.5% in performance. Below the test results using turbostat
> > (5 iterations):
> >
> > Without patch:
> >
> > Ph. avg Time	Total time	PkgWatt		Total Energy
> > 	79.63	266.416		57.74		15382.85984
> > 	79.63	265.609		57.87		15370.79283
> > 	79.57	266.994		57.54		15362.83476
> > 	79.53	265.304		57.83		15342.53032
> > 	79.71	265.977		57.76		15362.83152
> > avg	79.61	266.06		57.74		15364.36985
> >
> > With patch:
> >
> > Ph. avg Time	Total time	PkgWatt		Total Energy
> > 	78.23	258.826		59.14		15306.96964
> > 	78.41	259.110		59.15		15326.35650
> > 	78.40	258.530		59.26		15320.48780
> > 	78.46	258.673		59.20		15313.44160
> > 	78.19	259.075		59.16		15326.87700
> > avg	78.34	258.842		59.18		15318.82650
> >
> > The total test time reduced by ~2.6%, while the total energy
> > consumption during a test iteration reduced by ~0.35%
> >
> > Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
> > ---
> >
> > Changes v1 -> v2
> > 	- Enhance change log as Rafael and Viresh suggested
> >
> >
> >   drivers/cpufreq/intel_pstate.c | 15 +++++++--------
> >   1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index 0999673..8e309db 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
> >   	mod_timer_pinned(&cpu->timer, jiffies + delay);
> >   }
> >
> > -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> > +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
> >   {
> > -	int32_t core_busy, max_pstate, current_pstate;
> > +	int32_t core_busy, max_pstate;
> >
> >   	core_busy = cpu->sample.core_pct_busy;
> >   	max_pstate = int_tofp(cpu->pstate.max_pstate);
> > -	current_pstate = int_tofp(cpu->pstate.current_pstate);
> > -	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
> > +	core_busy = mul_fp(core_busy, max_pstate);
> 
> NAK. The goal of this code is to find out how busy the core is at the current
> P state. This change will return a value WAY too high.
> 
> Assume core_busy is 100 and the max non-turbo P state is 34 (3.4 GHz): this code
> would return a busy value of 3400. The PID is trying to keep the busy value
> at the setpoint, so any load of ~3% will drive the P state to the highest turbo
> P state in this example.

Well, the problem is that the numbers above indicate an improvement in energy
efficiency as a result of this patch, and we need to explain that result.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate
  2014-05-01 23:18   ` Rafael J. Wysocki
@ 2014-05-02  1:48     ` Dirk Brandewie
  2014-05-02 12:26       ` Rafael J. Wysocki
  0 siblings, 1 reply; 6+ messages in thread
From: Dirk Brandewie @ 2014-05-02  1:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: dirk.brandewie, Stratos Karafotis, Viresh Kumar, Dirk Brandewie,
	cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, LKML

On 05/01/2014 04:18 PM, Rafael J. Wysocki wrote:
> On Thursday, May 01, 2014 02:30:42 PM Dirk Brandewie wrote:
>> On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
>>> Currently the driver calculates the next pstate proportional to
>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>
>>> Using the scaled load (core_busy) to calculate the next pstate
>>> is not always correct, because there are cases that the load is
>>> independent from current pstate. For example, a tight 'for' loop
>>> through many sampling intervals will cause a load of 100% in
>>> every pstate.
>>>
>>> So, change the above method and calculate the next pstate with
>>> the assumption that the next pstate should not depend on the
>>> current pstate. The next pstate should only be directly
>>> proportional to measured load.
>>>
>>> Tested on Intel i7-3770 CPU @ 3.40GHz.
>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>> increase ~1.5% in performance. Below the test results using turbostat
>>> (5 iterations):
>>>
>>> Without patch:
>>>
>>> Ph. avg Time	Total time	PkgWatt		Total Energy
>>> 	79.63	266.416		57.74		15382.85984
>>> 	79.63	265.609		57.87		15370.79283
>>> 	79.57	266.994		57.54		15362.83476
>>> 	79.53	265.304		57.83		15342.53032
>>> 	79.71	265.977		57.76		15362.83152
>>> avg	79.61	266.06		57.74		15364.36985
>>>
>>> With patch:
>>>
>>> Ph. avg Time	Total time	PkgWatt		Total Energy
>>> 	78.23	258.826		59.14		15306.96964
>>> 	78.41	259.110		59.15		15326.35650
>>> 	78.40	258.530		59.26		15320.48780
>>> 	78.46	258.673		59.20		15313.44160
>>> 	78.19	259.075		59.16		15326.87700
>>> avg	78.34	258.842		59.18		15318.82650
>>>
>>> The total test time reduced by ~2.6%, while the total energy
>>> consumption during a test iteration reduced by ~0.35%
>>>
>>> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
>>> ---
>>>
>>> Changes v1 -> v2
>>> 	- Enhance change log as Rafael and Viresh suggested
>>>
>>>
>>>    drivers/cpufreq/intel_pstate.c | 15 +++++++--------
>>>    1 file changed, 7 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>> index 0999673..8e309db 100644
>>> --- a/drivers/cpufreq/intel_pstate.c
>>> +++ b/drivers/cpufreq/intel_pstate.c
>>> @@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
>>>    	mod_timer_pinned(&cpu->timer, jiffies + delay);
>>>    }
>>>
>>> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
>>> +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
>>>    {
>>> -	int32_t core_busy, max_pstate, current_pstate;
>>> +	int32_t core_busy, max_pstate;
>>>
>>>    	core_busy = cpu->sample.core_pct_busy;
>>>    	max_pstate = int_tofp(cpu->pstate.max_pstate);
>>> -	current_pstate = int_tofp(cpu->pstate.current_pstate);
>>> -	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
>>> +	core_busy = mul_fp(core_busy, max_pstate);
>>
>> NAK. The goal of this code is to find out how busy the core is at the current
>> P state. This change will return a value WAY too high.
>>
>> Assume core_busy is 100 and the max non-turbo P state is 34 (3.4 GHz): this code
>> would return a busy value of 3400. The PID is trying to keep the busy value
>> at the setpoint, so any load of ~3% will drive the P state to the highest turbo
>> P state in this example.
>
> Well, the problem is that the numbers above indicate an improvement in energy
> efficiency as a result of this patch and we need to explain that result.
>
The performance governor is the best option for this workload.

This change will give you the highest turbo for all but very idle workloads.

Let's say you have a processor with a max P state of 3.4 GHz and the current
P state is 1.6 GHz, so if the processor was 100% in C0 the core_busy value
would be 47%; scaled, this number would be 100%. With the change above, the
PID would be reacting to a load of 1598%. APERF/MPERF give you the percentage
of the entire core's scaling; it lets you find out how busy you are within
the current P state.

--Dirk




* Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate
  2014-05-02  1:48     ` Dirk Brandewie
@ 2014-05-02 12:26       ` Rafael J. Wysocki
  2014-05-02 13:45         ` Stratos Karafotis
  0 siblings, 1 reply; 6+ messages in thread
From: Rafael J. Wysocki @ 2014-05-02 12:26 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Stratos Karafotis, Viresh Kumar, Dirk Brandewie,
	cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, LKML

On Thursday, May 01, 2014 06:48:08 PM Dirk Brandewie wrote:
> On 05/01/2014 04:18 PM, Rafael J. Wysocki wrote:
> > On Thursday, May 01, 2014 02:30:42 PM Dirk Brandewie wrote:
> >> On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
> >>> Currently the driver calculates the next pstate proportional to
> >>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
> >>>
> >>> Using the scaled load (core_busy) to calculate the next pstate
> >>> is not always correct, because there are cases that the load is
> >>> independent from current pstate. For example, a tight 'for' loop
> >>> through many sampling intervals will cause a load of 100% in
> >>> every pstate.
> >>>
> >>> So, change the above method and calculate the next pstate with
> >>> the assumption that the next pstate should not depend on the
> >>> current pstate. The next pstate should only be directly
> >>> proportional to measured load.
> >>>
> >>> Tested on Intel i7-3770 CPU @ 3.40GHz.
> >>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >>> increase ~1.5% in performance. Below the test results using turbostat
> >>> (5 iterations):
> >>>
> >>> Without patch:
> >>>
> >>> Ph. avg Time	Total time	PkgWatt		Total Energy
> >>> 	79.63	266.416		57.74		15382.85984
> >>> 	79.63	265.609		57.87		15370.79283
> >>> 	79.57	266.994		57.54		15362.83476
> >>> 	79.53	265.304		57.83		15342.53032
> >>> 	79.71	265.977		57.76		15362.83152
> >>> avg	79.61	266.06		57.74		15364.36985
> >>>
> >>> With patch:
> >>>
> >>> Ph. avg Time	Total time	PkgWatt		Total Energy
> >>> 	78.23	258.826		59.14		15306.96964
> >>> 	78.41	259.110		59.15		15326.35650
> >>> 	78.40	258.530		59.26		15320.48780
> >>> 	78.46	258.673		59.20		15313.44160
> >>> 	78.19	259.075		59.16		15326.87700
> >>> avg	78.34	258.842		59.18		15318.82650
> >>>
> >>> The total test time reduced by ~2.6%, while the total energy
> >>> consumption during a test iteration reduced by ~0.35%
> >>>
> >>> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
> >>> ---
> >>>
> >>> Changes v1 -> v2
> >>> 	- Enhance change log as Rafael and Viresh suggested
> >>>
> >>>
> >>>    drivers/cpufreq/intel_pstate.c | 15 +++++++--------
> >>>    1 file changed, 7 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> >>> index 0999673..8e309db 100644
> >>> --- a/drivers/cpufreq/intel_pstate.c
> >>> +++ b/drivers/cpufreq/intel_pstate.c
> >>> @@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
> >>>    	mod_timer_pinned(&cpu->timer, jiffies + delay);
> >>>    }
> >>>
> >>> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
> >>> +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
> >>>    {
> >>> -	int32_t core_busy, max_pstate, current_pstate;
> >>> +	int32_t core_busy, max_pstate;
> >>>
> >>>    	core_busy = cpu->sample.core_pct_busy;
> >>>    	max_pstate = int_tofp(cpu->pstate.max_pstate);
> >>> -	current_pstate = int_tofp(cpu->pstate.current_pstate);
> >>> -	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
> >>> +	core_busy = mul_fp(core_busy, max_pstate);
> >>
> >> NAK. The goal of this code is to find out how busy the core is at the current
> >> P state. This change will return a value WAY too high.
> >>
> >> Assume core_busy is 100 and the max non-turbo P state is 34 (3.4 GHz): this code
> >> would return a busy value of 3400. The PID is trying to keep the busy value
> >> at the setpoint, so any load of ~3% will drive the P state to the highest turbo
> >> P state in this example.
> >
> > Well, the problem is that the numbers above indicate an improvement in energy
> > efficiency as a result of this patch and we need to explain that result.
> >
> The performance governor is the best option for this workload.
> 
> This change will give you the highest turbo for all but very idle workloads.

I see.

> Let's say you have a processor with a max P state of 3.4 GHz and the current
> P state is 1.6 GHz, so if the processor was 100% in C0 the core_busy value
> would be 47%; scaled, this number would be 100%. With the change above, the
> PID would be reacting to a load of 1598%. APERF/MPERF give you the percentage
> of the entire core's scaling; it lets you find out how busy you are within
> the current P state.

OK

Stratos seems to be arguing that we can achieve better results, in terms of
both performance and energy efficiency, if we disregard the history and only
take the current situation into account.  The patch itself may not be correct,
but the idea is worth consideration in my opinion, especially given
that we made a similar change in cpufreq and the results improved.


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v2] cpufreq: intel_pstate: Change the calculation of next pstate
  2014-05-02 12:26       ` Rafael J. Wysocki
@ 2014-05-02 13:45         ` Stratos Karafotis
  0 siblings, 0 replies; 6+ messages in thread
From: Stratos Karafotis @ 2014-05-02 13:45 UTC (permalink / raw)
  To: Rafael J. Wysocki, Dirk Brandewie
  Cc: Viresh Kumar, Dirk Brandewie, cpufreq@vger.kernel.org,
	linux-pm@vger.kernel.org, LKML

On 02/05/2014 03:26 PM, Rafael J. Wysocki wrote:
> On Thursday, May 01, 2014 06:48:08 PM Dirk Brandewie wrote:
>> On 05/01/2014 04:18 PM, Rafael J. Wysocki wrote:
>>> On Thursday, May 01, 2014 02:30:42 PM Dirk Brandewie wrote:
>>>> On 05/01/2014 02:00 PM, Stratos Karafotis wrote:
>>>>> Currently the driver calculates the next pstate proportional to
>>>>> core_busy factor, scaled by the ratio max_pstate / current_pstate.
>>>>>
>>>>> Using the scaled load (core_busy) to calculate the next pstate
>>>>> is not always correct, because there are cases that the load is
>>>>> independent from current pstate. For example, a tight 'for' loop
>>>>> through many sampling intervals will cause a load of 100% in
>>>>> every pstate.
>>>>>
>>>>> So, change the above method and calculate the next pstate with
>>>>> the assumption that the next pstate should not depend on the
>>>>> current pstate. The next pstate should only be directly
>>>>> proportional to measured load.
>>>>>
>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz.
>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>>>> increase ~1.5% in performance. Below the test results using turbostat
>>>>> (5 iterations):
>>>>>
>>>>> Without patch:
>>>>>
>>>>> Ph. avg Time	Total time	PkgWatt		Total Energy
>>>>> 	79.63	266.416		57.74		15382.85984
>>>>> 	79.63	265.609		57.87		15370.79283
>>>>> 	79.57	266.994		57.54		15362.83476
>>>>> 	79.53	265.304		57.83		15342.53032
>>>>> 	79.71	265.977		57.76		15362.83152
>>>>> avg	79.61	266.06		57.74		15364.36985
>>>>>
>>>>> With patch:
>>>>>
>>>>> Ph. avg Time	Total time	PkgWatt		Total Energy
>>>>> 	78.23	258.826		59.14		15306.96964
>>>>> 	78.41	259.110		59.15		15326.35650
>>>>> 	78.40	258.530		59.26		15320.48780
>>>>> 	78.46	258.673		59.20		15313.44160
>>>>> 	78.19	259.075		59.16		15326.87700
>>>>> avg	78.34	258.842		59.18		15318.82650
>>>>>
>>>>> The total test time reduced by ~2.6%, while the total energy
>>>>> consumption during a test iteration reduced by ~0.35%
>>>>>
>>>>> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
>>>>> ---
>>>>>
>>>>> Changes v1 -> v2
>>>>> 	- Enhance change log as Rafael and Viresh suggested
>>>>>
>>>>>
>>>>>    drivers/cpufreq/intel_pstate.c | 15 +++++++--------
>>>>>    1 file changed, 7 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>>>>> index 0999673..8e309db 100644
>>>>> --- a/drivers/cpufreq/intel_pstate.c
>>>>> +++ b/drivers/cpufreq/intel_pstate.c
>>>>> @@ -608,28 +608,27 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
>>>>>    	mod_timer_pinned(&cpu->timer, jiffies + delay);
>>>>>    }
>>>>>
>>>>> -static inline int32_t intel_pstate_get_scaled_busy(struct cpudata *cpu)
>>>>> +static inline int32_t intel_pstate_get_busy(struct cpudata *cpu)
>>>>>    {
>>>>> -	int32_t core_busy, max_pstate, current_pstate;
>>>>> +	int32_t core_busy, max_pstate;
>>>>>
>>>>>    	core_busy = cpu->sample.core_pct_busy;
>>>>>    	max_pstate = int_tofp(cpu->pstate.max_pstate);
>>>>> -	current_pstate = int_tofp(cpu->pstate.current_pstate);
>>>>> -	core_busy = mul_fp(core_busy, div_fp(max_pstate, current_pstate));
>>>>> +	core_busy = mul_fp(core_busy, max_pstate);
>>>>
>>>> NAK. The goal of this code is to find out how busy the core is at the current
>>>> P state. This change will return a value WAY too high.
>>>>
>>>> Assume core_busy is 100 and the max non-turbo P state is 34 (3.4 GHz): this code
>>>> would return a busy value of 3400. The PID is trying to keep the busy value
>>>> at the setpoint, so any load of ~3% will drive the P state to the highest turbo
>>>> P state in this example.
>>>
>>> Well, the problem is that the numbers above indicate an improvement in energy
>>> efficiency as a result of this patch and we need to explain that result.
>>>
>> The performance governor is the best option for this workload.
>>
>> This change will give you the highest turbo for all but very idle workloads.
> 
> I see.
> 
>> Let's say you have a processor with a max P state of 3.4 GHz and the current
>> P state is 1.6 GHz, so if the processor was 100% in C0 the core_busy value
>> would be 47%; scaled, this number would be 100%. With the change above, the
>> PID would be reacting to a load of 1598%. APERF/MPERF give you the percentage
>> of the entire core's scaling; it lets you find out how busy you are within
>> the current P state.
> 
> OK
> 
> Stratos seems to be arguing that we can achieve better results, in terms of
> both performance and energy efficiency, if we disregard the history and only
> take the current situation into account.  The patch itself may not be correct,
> but the idea is worth consideration in my opinion, especially given
> that we made a similar change in cpufreq and the results improved.
> 
> 

First of all, thank you all very much for your time reviewing this patch and
for your help!

Hoping not to waste more of your time, I will try to implement this
assumption correctly and test it.

My first thought, after Dirk's explanation, is to try to fix the logic in the
patch by using c0_pct as core_pct_busy in the intel_pstate_calc_busy function.
Thus:

-	sample->core_pct_busy = mul_fp(core_pct, c0_pct);
+	sample->core_pct_busy = c0_pct;

But I'm not sure whether c0_pct is (always) equivalent to CPU load.
Otherwise, I will try to get the absolute load using get_cpu_idle_time_us()
and calculate the next pstate as directly proportional to this load.


Thanks,
Stratos

