public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/6] cpufreq: Add sampling window to enhance ondemand governor power efficiency
@ 2010-12-23  6:23 Youquan Song
  2010-12-23  6:23 ` [PATCH 1/6] cpufreq: Add sampling window for ondemand governor Youquan Song
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Youquan Song @ 2010-12-23  6:23 UTC
  To: davej, cpufreq
  Cc: venki, arjan, lenb, suresh.b.siddha, kent.liu, chaohong.guo,
	linux-kernel, linux-acpi, Youquan Song, Youquan Song

When running a well-known power/performance benchmark, the current ondemand
governor is not power efficient. Even when the workload is at only 10%~20% of
full capacity, the CPU still spends much of its time at the highest frequency,
although the lowest frequency could often meet the user's requirement. When I
ran this benchmark on a turbo-mode-enabled machine and compared the results of
the different governors, the ondemand and performance governors were the
closest: ondemand saves little power compared with performance. If we ignore
that small power saving, the performance governor is even better than the
ondemand governor, at least because it delivers better performance.

One potential reason the ondemand governor is not power efficient is that it
decides the next target frequency from the instantaneous load seen during a
single sampling interval (10ms, or possibly a little longer because of the
deferrable timer on an idle tickless system). The instantaneous load responds
quickly to workload changes, but it usually does not reflect the workload's
real CPU usage over a slightly longer period, and it can cause frequent
switching between the highest and lowest frequencies.

This patchset adds a sampling window to the per-CPU ondemand thread. Each
sampling window holds at most 150 records and slides forward every sampling
interval, so it tracks the workload over the latest sampling window timeframe.
The average load over the latest sampling window is used to decide the next
target frequency. The sampling window is intended to reflect the workload's
CPU usage requirement more accurately.
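
The sliding window described above can be sketched as a ring buffer with a
running sum. This is only an illustration and is not taken from the patches:
the struct, its fields and the function names are hypothetical; only the limit
of 150 records comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_MAX_RECORDS 150	/* upper bound quoted in the cover letter */

/* Hypothetical per-CPU window; names are illustrative, not from the patches. */
struct load_window {
	unsigned int load[WINDOW_MAX_RECORDS];	/* load in percent, 0..100 */
	size_t size;		/* records in use, 1..WINDOW_MAX_RECORDS */
	size_t count;		/* valid records so far, <= size */
	size_t head;		/* slot that the next sample overwrites */
	unsigned long sum;	/* running sum of the valid records */
};

void window_init(struct load_window *w, size_t size)
{
	assert(size >= 1 && size <= WINDOW_MAX_RECORDS);
	w->size = size;
	w->count = 0;
	w->head = 0;
	w->sum = 0;
}

/* Record one sampling interval and return the average over the window. */
unsigned int window_add(struct load_window *w, unsigned int load)
{
	if (w->count == w->size)
		w->sum -= w->load[w->head];	/* oldest record slides out */
	else
		w->count++;
	w->load[w->head] = load;
	w->sum += load;
	w->head = (w->head + 1) % w->size;
	return (unsigned int)(w->sum / w->count);
}
```

With a window of one record, window_add() simply returns the latest sample,
which corresponds to deciding on the instantaneous load alone.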

The sampling window size can be set by the user; the default maximum sampling
window is one second. When it is set to the default sampling rate, the governor
rolls back to the original behaviour.

The sampling window size can also be changed dynamically according to how busy
the system currently is: the more idle the system, the smaller the sampling
window; the busier the system, the larger the sampling window. Shrinking the
window speeds up the response, while growing it keeps the CPU at high speed
when busy and avoids the inefficient bouncing between the highest and lowest
frequencies seen in the original ondemand.
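
The text does not say how the window size is derived from how busy the system
is. Purely as an assumption, a linear mapping from the average load to a number
of records would behave as described; the function name and its parameters are
hypothetical and the real patch may use a different rule.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed linear mapping: an idle system gets min_records, a fully busy
 * system gets max_records. Not taken from the patches.
 */
size_t window_records_for_load(unsigned int avg_load, size_t min_records,
			       size_t max_records)
{
	assert(avg_load <= 100);
	assert(min_records >= 1 && min_records <= max_records);
	return min_records + (max_records - min_records) * avg_load / 100;
}
```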

We set up_threshold to 80 and down_differential to 20. When the workload
reaches 80% of the current frequency's capacity, the governor raises the
frequency to the highest one. When the workload drops below
(up_threshold - down_differential) = 60% of the current frequency's capacity,
the governor lowers the frequency so that the CPU still works above 60% of its
capacity at the new frequency; otherwise the lowest frequency is used.
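
As a rough sketch of this decision, assuming load is given in percent of the
current frequency's capacity and frequencies in kHz: the function name and
parameters are hypothetical, and rounding the result to an entry of the real
frequency table is left out.

```c
#include <assert.h>

enum { UP_THRESHOLD = 80, DOWN_DIFFERENTIAL = 20 };

unsigned int next_freq_khz(unsigned int load, unsigned int cur_khz,
			   unsigned int min_khz, unsigned int max_khz)
{
	const unsigned int keep = UP_THRESHOLD - DOWN_DIFFERENTIAL; /* 60 */
	unsigned long long target;

	assert(load <= 100 && min_khz <= cur_khz && cur_khz <= max_khz);

	if (load >= UP_THRESHOLD)
		return max_khz;		/* busy enough: go to the highest */
	if (load >= keep)
		return cur_khz;		/* between 60% and 80%: stay put */

	/* Lowest frequency at which the same work is about 60% load. */
	target = (unsigned long long)load * cur_khz / keep;
	return target < min_khz ? min_khz : (unsigned int)target;
}
```

For example, 45% load at 2000000 kHz gives 45 * 2000000 / 60 = 1500000 kHz, at
which the same work would amount to 60% load.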
   
Turbo Mode (P0) consumes much more power than the second-highest frequency
(P1), and P1 is often double, or even more than double, the lowest frequency
(Pn). The current logic jumps straight to the highest frequency, Turbo Mode,
when the workload reaches up_threshold of the current frequency's capacity,
even if the current frequency is the lowest one. This patchset first evaluates
whether P1 is enough to support the current workload before entering Turbo
Mode. If P1 can meet the workload's requirement, it saves power compared with
running in Turbo Mode.
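
How P1 is judged to be "enough" is not spelled out here. One plausible check,
given only as an assumption, is to project the measured load onto P1 and enter
Turbo Mode only if P1 would itself be above up_threshold. The function name and
all parameters are hypothetical.

```c
#include <assert.h>

/*
 * Called once the load has crossed up_threshold at cur_khz.
 * Returns P1 if the projected load on P1 stays below up_threshold,
 * otherwise P0 (Turbo Mode). Assumed rule, not taken from the patches.
 */
unsigned int pick_high_freq_khz(unsigned int load, unsigned int cur_khz,
				unsigned int p1_khz, unsigned int p0_khz,
				unsigned int up_threshold)
{
	unsigned long long load_on_p1;

	assert(load <= 100 && p1_khz > 0);
	assert(cur_khz <= p0_khz && p1_khz <= p0_khz);

	/* Same amount of work, expressed as percent of P1's capacity. */
	load_on_p1 = (unsigned long long)load * cur_khz / p1_khz;
	return load_on_p1 < up_threshold ? p1_khz : p0_khz;
}
```

Under this rule a CPU that is 100% busy at a Pn of half the P1 frequency moves
to P1 first (projected load 50%), and only moves on to Turbo Mode if the load
measured at P1 again reaches up_threshold.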

On my test platform, a two-socket Westmere-EP server running the well-known
power/performance benchmark, the patched governor saves power like the
powersave governor when the workload is low; when the workload is high, it
performs as well as the performance governor while consuming less power.
Together with the other patches in this patchset, the patched governor's power
efficiency improves by about 10%, with no apparent decrease in performance.
Running other benchmarks from Phoronix: kernel building saves 5% power with no
decrease in performance, and compress-7zip saves 2% power, also with no
apparent decrease in performance. However, the apache benchmark saves power but
its performance decreases a lot.




end of thread, other threads:[~2010-12-25  4:54 UTC | newest]

Thread overview: 20+ messages
2010-12-23  6:23 [PATCH 0/6] cpufreq: Add sampling window to enhance ondemand governor power efficiency Youquan Song
2010-12-23  6:23 ` [PATCH 1/6] cpufreq: Add sampling window for ondemand governor Youquan Song
2010-12-23  6:23   ` [PATCH 2/6] cpufreq: Add sampling_window tunable Youquan Song
2010-12-23  6:23     ` [PATCH 3/6] cpufreq: Add roll back non-sampling_window Youquan Song
2010-12-23  6:23       ` [PATCH 4/6] cpufreq: Add dynamic sampling window tunable Youquan Song
2010-12-23  6:23         ` [PATCH 5/6] cpufreq: Add down_differential tunable Youquan Song
2010-12-23  6:23           ` [PATCH 6/6] cpufreq: Evaluate P1 before enter turbo mode Youquan Song
2010-12-23 10:57             ` Dominik Brodowski
2010-12-23 14:38               ` Matthew Garrett
2010-12-23 18:13                 ` Venkatesh Pallipadi
2010-12-23 20:48                   ` Dominik Brodowski
2010-12-23 11:00 ` [PATCH 0/6] cpufreq: Add sampling window to enhance ondemand governor power efficiency Dominik Brodowski
2010-12-23 17:34   ` Dave Jones
2010-12-23 20:51     ` Dominik Brodowski
2010-12-25  4:24       ` James Cloos
2010-12-24  3:06   ` Youquan Song
2010-12-23 14:42 ` Matthew Garrett
2010-12-24  4:28   ` Youquan Song
     [not found] ` <BBBDBC5FD59D92459FAEF7D8EA3361C402A1A424@DUL1WNEXMB05.vcorp.ad.vrsn.com>
2010-12-23 23:26   ` Youquan Song
  -- strict thread matches above, loose matches on Subject: below --
2010-12-23  6:17 Youquan Song
2010-12-23  6:17 ` [PATCH 1/6] cpufreq: Add sampling window for ondemand governor Youquan Song
2010-12-23  6:17   ` [PATCH 2/6] cpufreq: Add sampling_window tunable Youquan Song
2010-12-23  6:17     ` [PATCH 3/6] cpufreq: Add roll back non-sampling_window Youquan Song
2010-12-23  6:17       ` [PATCH 4/6] cpufreq: Add dynamic sampling window tunable Youquan Song
2010-12-23  6:17         ` [PATCH 5/6] cpufreq: Add down_differential tunable Youquan Song
2010-12-23  6:17           ` [PATCH 6/6] cpufreq: Evaluate P1 before enter turbo mode Youquan Song
