* [RFC PATCH] Increase in idle power with schedutil
@ 2016-05-18 12:53 Shilpasri G Bhat
  2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
  2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki
  0 siblings, 2 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2016-05-18 12:53 UTC (permalink / raw)
  To: rjw
  Cc: viresh.kumar, linux-pm, linux-kernel, ego, shreyas, akshay.adiga,
	linuxppc-dev, Shilpasri G Bhat

This patch adds driver callback for fast_switch and below observations
on schedutil governor are done with this patch.

In POWER8 there is a regression observed with schedutil compared to
ondemand. With schedutil the frequency is not ramping down and is
mostly stuck at max frequency during idle . This is because of the
watchdog timer, an RT task which is fired every 4 seconds which
results in requesting max frequency.

In a completely idle system, when there are no processes running apart
from few short running housekeeping tasks (like watchdog) the system is
stuck at max frequency due to 'cpufreq_trigger_update()'

static inline void cpufreq_trigger_update(u64 time)
{
	cpufreq_update_util(time, ULONG_MAX, 0);
}

If there is no noise apart from the watchdog timer the cpu is held at
max frequency for no good reason. On a 16 core system I can see an
increase in 20% idle power with schedutil compared to ondemand
governor.

Below is the trace with 'sched:sched_switch' and 'power:cpu_frequency'
events. Here the watchdog timer that runs for a very small period is
requesting Pmax and this gets triggered regularly.

<idle>-0 19059.992912: sched_switch: prev_comm=swapper/16 prev_state=R
 ==> next_comm=watchdog/16
watchdog/16-107 19059.992914: cpu_frequency: state=4322000 cpu_id=16
watchdog/16-107 19059.992915: sched_switch: prev_comm=watchdog/16 prev_state=S
 ==> next_comm=swapper/16

However adding a cpufreq hook in pick_next_task_idle() to decrease the
frequency helped to reduce the problem.

static inline void cpufreq_trigger_idle(u64 time)
{
	cpufreq_update_util(time, 0, 1);
}

This might not be the right fix for the problem, however this thread
is reporting the other short-comings of cpufreq_trigger_update().

Shilpasri G Bhat (1):
  cpufreq: powernv: Add fast_switch callback

 drivers/cpufreq/powernv-cpufreq.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 12+ messages in thread
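Editorial sketch, not part of the original mail: the two hook calls in the message above differ only in the (util, max) pair they pass to cpufreq_update_util(). The small userspace model below shows how a governor could map that pair to a frequency request. It is a sketch under stated assumptions: the function name model_next_freq, the 25% headroom factor, and the 2061000 kHz floor are illustrative and do not come from the patch. Only the 4322000 kHz value appears in the posted trace. util == ULONG_MAX is treated as "request max", matching cpufreq_trigger_update(), and util == 0 models the proposed cpufreq_trigger_idle().

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical userspace model, not kernel code.
 * util/max is the utilization fraction reported by the scheduler hook;
 * util == ULONG_MAX is the special "run at max" request.
 * The 1.25x headroom is an assumption made for this sketch.
 */
static unsigned long model_next_freq(unsigned long util, unsigned long max,
				     unsigned long min_khz,
				     unsigned long max_khz)
{
	unsigned long long raw;

	if (util == ULONG_MAX)
		return max_khz;		/* e.g. the RT watchdog waking up */

	assert(max > 0);		/* a real fraction needs a non-zero max */

	/* max_khz * 1.25 * util / max, computed in 64 bits */
	raw = ((unsigned long long)max_khz + (max_khz >> 2)) * util / max;

	if (raw < min_khz)
		return min_khz;		/* util == 0 (idle hook) ends up here */
	if (raw > max_khz)
		return max_khz;
	return (unsigned long)raw;
}
```

In this model the watchdog wakeup always yields max_khz, while the idle hook call (util = 0, max = 1) yields the floor, which is the behaviour difference the report describes.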
* [RFC PATCH] cpufreq: powernv: Add fast_switch callback
  2016-05-18 12:53 [RFC PATCH] Increase in idle power with schedutil Shilpasri G Bhat
@ 2016-05-18 12:53 ` Shilpasri G Bhat
  2016-05-18 21:22   ` Rafael J. Wysocki
  2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: Shilpasri G Bhat @ 2016-05-18 12:53 UTC (permalink / raw)
  To: rjw
  Cc: viresh.kumar, linux-pm, linux-kernel, ego, shreyas, akshay.adiga,
	linuxppc-dev, Shilpasri G Bhat

Add fast_switch driver callback to support frequency update in
interrupt context while using schedutil governor. Changing frequency
in interrupt context will remove the jitter on the workloads which can
be seen when a kworker thread is used for the changing the frequency.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 54c4536..4553eb6 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -678,6 +678,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	for (i = 0; i < threads_per_core; i++)
 		cpumask_set_cpu(base + i, policy->cpus);
 
+	policy->fast_switch_possible = true;
+
 	kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
 	if (!kn) {
 		int ret;
@@ -854,6 +856,24 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 	del_timer_sync(&gpstates->timer);
 }
 
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+					unsigned int target_freq)
+{
+	int index;
+	struct powernv_smp_call_data freq_data;
+
+	cpufreq_frequency_table_target(policy, policy->freq_table,
+				       target_freq,
+				       CPUFREQ_RELATION_C, &index);
+	if (index < 0 || index >= powernv_pstate_info.nr_pstates)
+		return CPUFREQ_ENTRY_INVALID;
+	freq_data.pstate_id = powernv_freqs[index].driver_data;
+	freq_data.gpstate_id = powernv_freqs[index].driver_data;
+	set_pstate(&freq_data);
+
+	return pstate_id_to_freq(-index);
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
 	.name		= "powernv-cpufreq",
 	.flags		= CPUFREQ_CONST_LOOPS,
@@ -861,6 +881,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.exit		= powernv_cpufreq_cpu_exit,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= powernv_cpufreq_target_index,
+	.fast_switch	= powernv_fast_switch,
 	.get		= powernv_cpufreq_get,
 	.stop_cpu	= powernv_cpufreq_stop_cpu,
 	.attr		= powernv_cpu_freq_attr,
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread
* Re: [RFC PATCH] cpufreq: powernv: Add fast_switch callback
  2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
@ 2016-05-18 21:22   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 21:22 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Peter Zijlstra

On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:
> Add fast_switch driver callback to support frequency update in
> interrupt context while using schedutil governor. Changing frequency
> in interrupt context will remove the jitter on the workloads which can
> be seen when a kworker thread is used for the changing the frequency.
>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>

This looks simple enough. :-) A couple of comments, though.

> ---
>  drivers/cpufreq/powernv-cpufreq.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 54c4536..4553eb6 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -678,6 +678,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
>         for (i = 0; i < threads_per_core; i++)
>                 cpumask_set_cpu(base + i, policy->cpus);
>
> +       policy->fast_switch_possible = true;
> +
>         kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
>         if (!kn) {
>                 int ret;
> @@ -854,6 +856,24 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>         del_timer_sync(&gpstates->timer);
>  }
>
> +static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
> +                                       unsigned int target_freq)
> +{
> +       int index;
> +       struct powernv_smp_call_data freq_data;
> +
> +       cpufreq_frequency_table_target(policy, policy->freq_table,
> +                                      target_freq,
> +                                      CPUFREQ_RELATION_C, &index);

According to the discussion I had with Peter some time ago, this
should be RELATION_L or you may end up using a frequency that's not
sufficient to meet a deadline somewhere.

Also cpufreq_frequency_table_target() is somewhat heavy-weight
especially if the table is known to be sorted (which I guess is the
case).

> +       if (index < 0 || index >= powernv_pstate_info.nr_pstates)
> +               return CPUFREQ_ENTRY_INVALID;
> +       freq_data.pstate_id = powernv_freqs[index].driver_data;
> +       freq_data.gpstate_id = powernv_freqs[index].driver_data;
> +       set_pstate(&freq_data);
> +
> +       return pstate_id_to_freq(-index);
> +}
> +
>  static struct cpufreq_driver powernv_cpufreq_driver = {
>         .name           = "powernv-cpufreq",
>         .flags          = CPUFREQ_CONST_LOOPS,
> @@ -861,6 +881,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>         .exit           = powernv_cpufreq_cpu_exit,
>         .verify         = cpufreq_generic_frequency_table_verify,
>         .target_index   = powernv_cpufreq_target_index,
> +       .fast_switch    = powernv_fast_switch,
>         .get            = powernv_cpufreq_get,
>         .stop_cpu       = powernv_cpufreq_stop_cpu,
>         .attr           = powernv_cpu_freq_attr,
> --

^ permalink raw reply	[flat|nested] 12+ messages in thread
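Editorial sketch, not part of the original mail: the difference between the two relations named in the review can be shown on a small table. RELATION_C picks the entry closest to the target, which may lie below it, while RELATION_L picks the lowest entry at or above the target. The helper names and the table values below are hypothetical (only 4322000 kHz comes from the posted trace), and the table is assumed to be sorted in descending order, which is what makes the cheap single scan possible.

```c
#include <assert.h>
#include <limits.h>

/* Made-up frequency table in kHz, sorted from highest to lowest. */
static const unsigned int example_freqs_khz[] = {
	4322000, 3500000, 2800000, 2061000,
};

/* "Closest" lookup: may return an entry below the requested target. */
static int table_index_closest(const unsigned int *freqs, int n,
			       unsigned int target)
{
	unsigned int best_diff = UINT_MAX;
	int i, best = 0;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		unsigned int diff = freqs[i] > target ?
				    freqs[i] - target : target - freqs[i];

		if (diff < best_diff) {
			best_diff = diff;
			best = i;
		}
	}
	return best;
}

/*
 * "Lowest at or above" lookup on a descending table: walk up from the
 * slowest entry and stop at the first one that satisfies the target.
 * Falls back to index 0 (the fastest entry) if the target is too high.
 */
static int table_index_lowest_at_or_above(const unsigned int *freqs, int n,
					  unsigned int target)
{
	int i;

	assert(n > 0);
	for (i = n - 1; i > 0; i--) {
		if (freqs[i] >= target)
			break;
	}
	return i;
}
```

For a 3000000 kHz request the closest entry in this table is 2800000 kHz, which is below what was asked for, while the at-or-above lookup returns 3500000 kHz. That gap is the missed-deadline risk the review points at.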
* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-18 12:53 [RFC PATCH] Increase in idle power with schedutil Shilpasri G Bhat
  2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
@ 2016-05-18 21:11 ` Rafael J. Wysocki
  2016-05-19 11:40   ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 21:11 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle, Peter Zijlstra

On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:
> This patch adds driver callback for fast_switch and below observations
> on schedutil governor are done with this patch.
>
> In POWER8 there is a regression observed with schedutil compared to
> ondemand. With schedutil the frequency is not ramping down and is
> mostly stuck at max frequency during idle . This is because of the
> watchdog timer, an RT task which is fired every 4 seconds which
> results in requesting max frequency.

Well, yes, that would be problematic.

I guess Steve Muckle's cross-CPU utilization updates series might
help (you can find it in the linux-pm patchwork).

> In a completely idle system, when there are no processes running apart
> from few short running housekeeping tasks (like watchdog) the system is
> stuck at max frequency due to 'cpufreq_trigger_update()'
>
> static inline void cpufreq_trigger_update(u64 time)
> {
>         cpufreq_update_util(time, ULONG_MAX, 0);
> }
>
> If there is no noise apart from the watchdog timer the cpu is held at
> max frequency for no good reason. On a 16 core system I can see an
> increase in 20% idle power with schedutil compared to ondemand
> governor.
>
> Below is the trace with 'sched:sched_switch' and 'power:cpu_frequency'
> events. Here the watchdog timer that runs for a very small period is
> requesting Pmax and this gets triggered regularly.
>
> <idle>-0 19059.992912: sched_switch: prev_comm=swapper/16 prev_state=R
>  ==> next_comm=watchdog/16
> watchdog/16-107 19059.992914: cpu_frequency: state=4322000 cpu_id=16
> watchdog/16-107 19059.992915: sched_switch: prev_comm=watchdog/16 prev_state=S
>  ==> next_comm=swapper/16
>
> However adding a cpufreq hook in pick_next_task_idle() to decrease the
> frequency helped to reduce the problem.
>
> static inline void cpufreq_trigger_idle(u64 time)
> {
>         cpufreq_update_util(time, 0, 1);
> }
>
> This might not be the right fix for the problem, however this thread
> is reporting the other short-comings of cpufreq_trigger_update().

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil 2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki @ 2016-05-19 11:40 ` Peter Zijlstra 2016-05-19 14:30 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Peter Zijlstra @ 2016-05-19 11:40 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev, Steve Muckle On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote: > On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat > <shilpa.bhat@linux.vnet.ibm.com> wrote: > > This patch adds driver callback for fast_switch and below observations > > on schedutil governor are done with this patch. > > > > In POWER8 there is a regression observed with schedutil compared to > > ondemand. With schedutil the frequency is not ramping down and is > > mostly stuck at max frequency during idle . This is because of the > > watchdog timer, an RT task which is fired every 4 seconds which > > results in requesting max frequency. > > Well, yes, that would be problematic. > Right; we need to come up with something for RT tasks; but what happens if you disable the watchdog? This should be entirely doable and might give a better comparison. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil 2016-05-19 11:40 ` Peter Zijlstra @ 2016-05-19 14:30 ` Rafael J. Wysocki 2016-05-20 12:23 ` Shilpasri G Bhat [not found] ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com> 2 siblings, 0 replies; 12+ messages in thread From: Rafael J. Wysocki @ 2016-05-19 14:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Rafael J. Wysocki, Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev, Steve Muckle On Thu, May 19, 2016 at 1:40 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote: >> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat >> <shilpa.bhat@linux.vnet.ibm.com> wrote: >> > This patch adds driver callback for fast_switch and below observations >> > on schedutil governor are done with this patch. >> > >> > In POWER8 there is a regression observed with schedutil compared to >> > ondemand. With schedutil the frequency is not ramping down and is >> > mostly stuck at max frequency during idle . This is because of the >> > watchdog timer, an RT task which is fired every 4 seconds which >> > results in requesting max frequency. >> >> Well, yes, that would be problematic. >> > > Right; we need to come up with something for RT tasks; I think we need the hints thing for that to be able to distinguish between RT and the rest. Also in this particular case it looks like an RT task is the only task that wakes up often enough and we don't drop the frequency when going idle. Do we need a hook somewhere in the idle path? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-19 11:40   ` Peter Zijlstra
  2016-05-19 14:30     ` Rafael J. Wysocki
@ 2016-05-20 12:23     ` Shilpasri G Bhat
       [not found]     ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>
  2 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2016-05-20 12:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle

Hi,

On 05/19/2016 05:10 PM, Peter Zijlstra wrote:
> On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>>> This patch adds driver callback for fast_switch and below observations
>>> on schedutil governor are done with this patch.
>>>
>>> In POWER8 there is a regression observed with schedutil compared to
>>> ondemand. With schedutil the frequency is not ramping down and is
>>> mostly stuck at max frequency during idle . This is because of the
>>> watchdog timer, an RT task which is fired every 4 seconds which
>>> results in requesting max frequency.
>>
>> Well, yes, that would be problematic.
>>
>
> Right; we need to come up with something for RT tasks; but what happens
> if you disable the watchdog? This should be entirely doable and might
> give a better comparison.
>

Below are the comparisons by disabling watchdog.
Both schedutil and ondemand have a similar ramp-down trend. And in both the
cases I can see that frequency of the cpu is not reduced in deterministic
fashion. In a observation window of 30 seconds after running a workload I can
see that the frequency is not ramped down on some cpus in the system and are
idling at max frequency.

Below is a sample trace showing the frequency requests when the cpu
enters idle with schedutil.

<...>-3528 7650.011010: cpu_frequency: state=4322000 cpu_id=120
<...>-3528 7650.027540: sched_switch: prev_comm=ppc64_cpu prev_state=x ==> next_comm=swapper/120
<idle>-0 7650.035017: cpu_frequency: state=4322000 cpu_id=120
<idle>-0 7729.683536: cpu_frequency: state=4322000 cpu_id=120
<idle>-0 7729.683552: sched_switch: prev_comm=swapper/120 prev_state=R ==> next_comm=kworker/120:1
kworker/120 7729.683565: sched_switch: prev_comm=kworker/120:1 prev_state=S ==> next_comm=swapper/120

However ondemand governor (with watchdog enabled) benefits from the noise
created by watchdog timer and is able to bring down the frequency.

Thanks and Regards,
Shilpa

^ permalink raw reply	[flat|nested] 12+ messages in thread
[parent not found: <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>]
* Re: [RFC PATCH] Increase in idle power with schedutil [not found] ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com> @ 2016-05-22 10:39 ` Peter Zijlstra 2016-05-22 20:42 ` Steve Muckle 0 siblings, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2016-05-22 10:39 UTC (permalink / raw) To: Shilpasri G Bhat Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev, Steve Muckle On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote: > > Below are the comparisons by disabling watchdog. > Both schedutil and ondemand have a similar ramp-down trend. And in both the > cases I can see that frequency of the cpu is not reduced in deterministic > fashion. In a observation window of 30 seconds after running a workload I can > see that the frequency is not ramped down on some cpus in the system and are > idling at max frequency. So does it actually matter what the frequency is when you idle? Isn't the whole thing clock gated anyway? Because this seems to generate contradictory requirements, on the one hand we want to stay idle as long as possible while on the other hand you seem to want to clock down while idle, which requires not being idle. If it matters; should not your idle state muck explicitly set/restore frequency? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil 2016-05-22 10:39 ` Peter Zijlstra @ 2016-05-22 20:42 ` Steve Muckle 2016-05-23 9:00 ` Lorenzo Pieralisi 2016-05-23 9:24 ` Peter Zijlstra 0 siblings, 2 replies; 12+ messages in thread From: Steve Muckle @ 2016-05-22 20:42 UTC (permalink / raw) To: Peter Zijlstra, Daniel Lezcano Cc: Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev, Steve Muckle On Sun, May 22, 2016 at 12:39:12PM +0200, Peter Zijlstra wrote: > On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote: > > > > Below are the comparisons by disabling watchdog. > > Both schedutil and ondemand have a similar ramp-down trend. And in both the > > cases I can see that frequency of the cpu is not reduced in deterministic > > fashion. In a observation window of 30 seconds after running a workload I can > > see that the frequency is not ramped down on some cpus in the system and are > > idling at max frequency. > > So does it actually matter what the frequency is when you idle? Isn't > the whole thing clock gated anyway? > > Because this seems to generate contradictory requirements, on the one > hand we want to stay idle as long as possible while on the other hand > you seem to want to clock down while idle, which requires not being > idle. > > If it matters; should not your idle state muck explicitly set/restore > frequency? AFAIK this is very platform dependent. Some will waste more power than others when a CPU idles above fmin due to things like resource (bus bandwidth, shared cache freq etc) voting. It is also true that there is power spent going to fmin (and then perhaps restoring the frequency when idle ends) which will be in part a function of how slow the frequency change operation is on that platform. 
I think Daniel Lezcano (added) was exploring the idea of having cpuidle drivers take the expected idle duration and potentially communicate to cpufreq to reduce the frequency depending on a platform-specific cost/benefit analysis. ^ permalink raw reply [flat|nested] 12+ messages in thread
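Editorial sketch, not part of the original mail: the cost/benefit analysis mentioned in the message above can be reduced to an energy break-even check. Dropping to fmin is only worthwhile if the energy saved over the expected idle period exceeds the energy spent lowering the frequency on idle entry and restoring it on exit. The struct, the function name and all numbers below are hypothetical. A real platform would have to supply its own measured values, and nothing here corresponds to an existing cpuidle or cpufreq interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-platform figures, invented for this sketch. */
struct idle_freq_cost {
	uint32_t idle_power_fmax_mw;	/* idle power when parked at fmax */
	uint32_t idle_power_fmin_mw;	/* idle power when parked at fmin */
	uint64_t switch_energy_nj;	/* energy of one frequency change */
};

static const struct idle_freq_cost example_cost = {
	.idle_power_fmax_mw = 500,
	.idle_power_fmin_mw = 300,
	.switch_energy_nj   = 1000000,
};

/*
 * Returns 1 if dropping to fmin for the predicted idle period is
 * expected to save energy, 0 otherwise.
 */
static int idle_drop_saves_energy(const struct idle_freq_cost *c,
				  uint64_t predicted_idle_us)
{
	uint64_t saved_nj, spent_nj;

	assert(c != 0);

	/* Platforms where idle power does not depend on frequency. */
	if (c->idle_power_fmax_mw <= c->idle_power_fmin_mw)
		return 0;

	/* mW * us = nJ, so no unit conversion is needed. */
	saved_nj = (uint64_t)(c->idle_power_fmax_mw - c->idle_power_fmin_mw) *
		   predicted_idle_us;

	/* One change down on idle entry, one back up on idle exit. */
	spent_nj = 2 * c->switch_energy_nj;

	return saved_nj > spent_nj;
}
```

With the invented numbers above the break-even idle duration is 10000 us: shorter idle periods cost more in transitions than they save, which is the platform-dependent trade-off the message describes.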
* Re: [RFC PATCH] Increase in idle power with schedutil 2016-05-22 20:42 ` Steve Muckle @ 2016-05-23 9:00 ` Lorenzo Pieralisi 2016-05-23 9:24 ` Peter Zijlstra 2016-05-23 9:24 ` Peter Zijlstra 1 sibling, 1 reply; 12+ messages in thread From: Lorenzo Pieralisi @ 2016-05-23 9:00 UTC (permalink / raw) To: Steve Muckle Cc: Peter Zijlstra, Daniel Lezcano, Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev On Sun, May 22, 2016 at 01:42:52PM -0700, Steve Muckle wrote: > On Sun, May 22, 2016 at 12:39:12PM +0200, Peter Zijlstra wrote: > > On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote: > > > > > > Below are the comparisons by disabling watchdog. > > > Both schedutil and ondemand have a similar ramp-down trend. And in both the > > > cases I can see that frequency of the cpu is not reduced in deterministic > > > fashion. In a observation window of 30 seconds after running a workload I can > > > see that the frequency is not ramped down on some cpus in the system and are > > > idling at max frequency. > > > > So does it actually matter what the frequency is when you idle? Isn't > > the whole thing clock gated anyway? > > > > Because this seems to generate contradictory requirements, on the one > > hand we want to stay idle as long as possible while on the other hand > > you seem to want to clock down while idle, which requires not being > > idle. > > > > If it matters; should not your idle state muck explicitly set/restore > > frequency? > > AFAIK this is very platform dependent. Some will waste more power than > others when a CPU idles above fmin due to things like resource (bus > bandwidth, shared cache freq etc) voting. 
It is also related to static leakage power that depends on the operating voltage (ie higher operating frequencies require higher voltage) so in a way scaling frequency before going idle may not be effective if voltage does not scale too in turn. Lorenzo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil 2016-05-23 9:00 ` Lorenzo Pieralisi @ 2016-05-23 9:24 ` Peter Zijlstra 0 siblings, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2016-05-23 9:24 UTC (permalink / raw) To: Lorenzo Pieralisi Cc: Steve Muckle, Daniel Lezcano, Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev On Mon, May 23, 2016 at 10:00:04AM +0100, Lorenzo Pieralisi wrote: > It is also related to static leakage power that depends on the operating > voltage (ie higher operating frequencies require higher voltage) so in a > way scaling frequency before going idle may not be effective if voltage > does not scale too in turn. Sure, but the platform drivers 'know' all this and can make the right decision. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-22 20:42           ` Steve Muckle
  2016-05-23  9:00             ` Lorenzo Pieralisi
@ 2016-05-23  9:24             ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-05-23 9:24 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Daniel Lezcano, Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar,
	linux-pm@vger.kernel.org, Linux Kernel Mailing List,
	Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev

On Sun, May 22, 2016 at 01:42:52PM -0700, Steve Muckle wrote:
> > So does it actually matter what the frequency is when you idle? Isn't
> > the whole thing clock gated anyway?
> >
> > Because this seems to generate contradictory requirements, on the one
> > hand we want to stay idle as long as possible while on the other hand
> > you seem to want to clock down while idle, which requires not being
> > idle.
> >
> > If it matters; should not your idle state muck explicitly set/restore
> > frequency?
>
> AFAIK this is very platform dependent. Some will waste more power than
> others when a CPU idles above fmin due to things like resource (bus
> bandwidth, shared cache freq etc) voting.

Oh agreed, completely platform dependent. 'Luckily' all this cpuidle
is already very platform dependent.

> It is also true that there is power spent going to fmin (and then
> perhaps restoring the frequency when idle ends) which will be in part a
> function of how slow the frequency change operation is on that platform.

Agreed.

> I think Daniel Lezcano (added) was exploring the idea of having cpuidle
> drivers take the expected idle duration and potentially communicate to
> cpufreq to reduce the frequency depending on a platform-specific
> cost/benefit analysis.

Right; that's along the lines I was thinking. If the idle guestimate
and the idle QoS both allow (ie. it wins on power and doesn't violate
wake-up latency) muck with DVFS on the idle path.

^ permalink raw reply	[flat|nested] 12+ messages in thread
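Editorial sketch, not part of the original mail: the two conditions named in the last paragraph above (the idle estimate must make the change a power win, and the wake-up latency limit must still be met) can be written as a single gate. All names and parameters below are hypothetical. In particular, the break-even duration is assumed to be supplied by the platform, and the check that idle exit latency plus frequency restore latency must fit inside the QoS limit is one possible reading of "doesn't violate wake-up latency", not a description of existing kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if lowering the frequency on the idle path looks acceptable:
 *  - the predicted idle period is longer than the platform's energy
 *    break-even duration, and
 *  - waking up (leaving the idle state and restoring the frequency)
 *    still fits within the wake-up latency limit.
 * All inputs are in microseconds and are assumptions of this sketch.
 */
static int idle_dvfs_allowed(uint64_t predicted_idle_us,
			     uint64_t break_even_us,
			     uint32_t idle_exit_latency_us,
			     uint32_t freq_restore_latency_us,
			     uint32_t qos_latency_limit_us)
{
	uint64_t wakeup_cost_us;

	/* Not idle long enough to win on power. */
	if (predicted_idle_us <= break_even_us)
		return 0;

	/* Would violate the wake-up latency constraint. */
	wakeup_cost_us = (uint64_t)idle_exit_latency_us +
			 freq_restore_latency_us;
	if (wakeup_cost_us > qos_latency_limit_us)
		return 0;

	return 1;
}
```

Keeping both inputs as plain parameters reflects the point made in the thread that the numbers are platform specific and would have to come from the platform's idle driver.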