* Re: [RFC PATCH] Increase in idle power with schedutil
2016-05-19 11:40 ` Peter Zijlstra
@ 2016-05-19 14:30 ` Rafael J. Wysocki
2016-05-20 12:23 ` Shilpasri G Bhat
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 14:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Shilpasri G Bhat, Rafael J. Wysocki,
Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List,
Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev,
Steve Muckle
On Thu, May 19, 2016 at 1:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>> > This patch adds driver callback for fast_switch and below observations
>> > on schedutil governor are done with this patch.
>> >
>> > In POWER8 there is a regression observed with schedutil compared to
>> > ondemand. With schedutil the frequency is not ramping down and is
>> > mostly stuck at max frequency during idle . This is because of the
>> > watchdog timer, an RT task which is fired every 4 seconds which
>> > results in requesting max frequency.
>>
>> Well, yes, that would be problematic.
>>
>
> Right; we need to come up with something for RT tasks;
I think we need the hints thing for that to be able to distinguish
between RT and the rest.
Also in this particular case it looks like an RT task is the only task
that wakes up often enough and we don't drop the frequency when going
idle. Do we need a hook somewhere in the idle path?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
2016-05-19 11:40 ` Peter Zijlstra
2016-05-19 14:30 ` Rafael J. Wysocki
@ 2016-05-20 12:23 ` Shilpasri G Bhat
2016-05-20 12:23 ` Shilpasri G Bhat
` (2 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Shilpasri G Bhat @ 2016-05-20 12:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
akshay.adiga, linuxppc-dev, Steve Muckle
Hi,
On 05/19/2016 05:10 PM, Peter Zijlstra wrote:
> On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>>> This patch adds driver callback for fast_switch and below observations
>>> on schedutil governor are done with this patch.
>>>
>>> In POWER8 there is a regression observed with schedutil compared to
>>> ondemand. With schedutil the frequency is not ramping down and is
>>> mostly stuck at max frequency during idle . This is because of the
>>> watchdog timer, an RT task which is fired every 4 seconds which
>>> results in requesting max frequency.
>>
>> Well, yes, that would be problematic.
>>
>
> Right; we need to come up with something for RT tasks; but what happens
> if you disable the watchdog? This should be entirely doable and might
> give a better comparison.
>
Below are the comparisons with the watchdog disabled.
Both schedutil and ondemand show a similar ramp-down trend. In both
cases the frequency of the cpu is not reduced in a deterministic
fashion. In an observation window of 30 seconds after running a workload, the
frequency is not ramped down on some cpus in the system, and they are left
idling at max frequency.
Below is a sample trace showing the frequency requests when the cpu enters
idle with schedutil.
<...>-3528 7650.011010: cpu_frequency: state=4322000 cpu_id=120
<...>-3528 7650.027540: sched_switch: prev_comm=ppc64_cpu prev_state=x ==>
next_comm=swapper/120
<idle>-0 7650.035017: cpu_frequency: state=4322000 cpu_id=120
<idle>-0 7729.683536: cpu_frequency: state=4322000 cpu_id=120
<idle>-0 7729.683552: sched_switch: prev_comm=swapper/120 prev_state=R ==>
next_comm=kworker/120:1
kworker/120 7729.683565: sched_switch: prev_comm=kworker/120:1 prev_state=S ==>
next_comm=swapper/120
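As a rough way to quantify the trace above, the following sketch extracts the cpu_frequency events and reports the longest interval between two consecutive events. The helper names (`frequency_events`, `longest_gap`) are made up for illustration, the regular expression assumes the line layout shown in the trace, and only the three cpu_frequency lines for cpu 120 are used.

```python
import re

# Frequency events copied from the trace above (cpu 120 only).
TRACE = """\
<...>-3528 7650.011010: cpu_frequency: state=4322000 cpu_id=120
<idle>-0 7650.035017: cpu_frequency: state=4322000 cpu_id=120
<idle>-0 7729.683536: cpu_frequency: state=4322000 cpu_id=120
"""

# Assumed layout: "<timestamp>: cpu_frequency: state=<n> cpu_id=<n>".
EVENT = re.compile(r"(\d+\.\d+): cpu_frequency: state=(\d+) cpu_id=(\d+)")


def frequency_events(trace):
    """Return (timestamp_s, state, cpu_id) for each cpu_frequency line."""
    return [(float(ts), int(state), int(cpu))
            for ts, state, cpu in EVENT.findall(trace)]


def longest_gap(events):
    """Return (gap_s, state) for the longest interval between consecutive events."""
    pairs = zip(events, events[1:])
    return max((later[0] - earlier[0], earlier[1]) for earlier, later in pairs)


events = frequency_events(TRACE)
gap_s, state = longest_gap(events)
print(f"cpu {events[0][2]} held state={state} for {gap_s:.6f} s")
```

For this trace the longest interval is the one between 7650.035017 and 7729.683536, i.e. roughly 79.6 seconds during which cpu 120 stayed idle with state=4322000 requested.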
However, the ondemand governor (with the watchdog enabled) benefits from the noise created
by the watchdog timer and is able to bring down the frequency.
Thanks and Regards,
Shilpa
^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>]
* Re: [RFC PATCH] Increase in idle power with schedutil
[not found] ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>
@ 2016-05-22 10:39 ` Peter Zijlstra
2016-05-22 20:42 ` Steve Muckle
0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2016-05-22 10:39 UTC (permalink / raw)
To: Shilpasri G Bhat
Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
akshay.adiga, linuxppc-dev, Steve Muckle
On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote:
>
> Below are the comparisons by disabling watchdog.
> Both schedutil and ondemand have a similar ramp-down trend. And in both the
> cases I can see that frequency of the cpu is not reduced in deterministic
> fashion. In a observation window of 30 seconds after running a workload I can
> see that the frequency is not ramped down on some cpus in the system and are
> idling at max frequency.
So does it actually matter what the frequency is when you idle? Isn't
the whole thing clock gated anyway?
Because this seems to generate contradictory requirements, on the one
hand we want to stay idle as long as possible while on the other hand
you seem to want to clock down while idle, which requires not being
idle.
If it matters; should not your idle state muck explicitly set/restore
frequency?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
2016-05-22 10:39 ` Peter Zijlstra
@ 2016-05-22 20:42 ` Steve Muckle
2016-05-23 9:00 ` Lorenzo Pieralisi
2016-05-23 9:24 ` Peter Zijlstra
0 siblings, 2 replies; 14+ messages in thread
From: Steve Muckle @ 2016-05-22 20:42 UTC (permalink / raw)
To: Peter Zijlstra, Daniel Lezcano
Cc: Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar,
linux-pm@vger.kernel.org, Linux Kernel Mailing List,
Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev,
Steve Muckle
On Sun, May 22, 2016 at 12:39:12PM +0200, Peter Zijlstra wrote:
> On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote:
> >
> > Below are the comparisons by disabling watchdog.
> > Both schedutil and ondemand have a similar ramp-down trend. And in both the
> > cases I can see that frequency of the cpu is not reduced in deterministic
> > fashion. In a observation window of 30 seconds after running a workload I can
> > see that the frequency is not ramped down on some cpus in the system and are
> > idling at max frequency.
>
> So does it actually matter what the frequency is when you idle? Isn't
> the whole thing clock gated anyway?
>
> Because this seems to generate contradictory requirements, on the one
> hand we want to stay idle as long as possible while on the other hand
> you seem to want to clock down while idle, which requires not being
> idle.
>
> If it matters; should not your idle state muck explicitly set/restore
> frequency?
AFAIK this is very platform dependent. Some will waste more power than
others when a CPU idles above fmin due to things like resource (bus
bandwidth, shared cache freq etc) voting.
It is also true that there is power spent going to fmin (and then
perhaps restoring the frequency when idle ends) which will be in part a
function of how slow the frequency change operation is on that platform.
I think Daniel Lezcano (added) was exploring the idea of having cpuidle
drivers take the expected idle duration and potentially communicate to
cpufreq to reduce the frequency depending on a platform-specific
cost/benefit analysis.
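To make the trade-off concrete, here is a minimal sketch of such a cost/benefit calculation. It is not taken from any cpuidle or cpufreq code: the function name, the parameters and all the numbers are hypothetical, and the model simply assumes a constant idle power at fmax and at fmin plus a fixed energy cost for going down and coming back up.

```python
def break_even_idle_s(p_idle_fmax_w, p_idle_fmin_w, e_transition_j):
    """Idle duration (seconds) beyond which dropping to fmin saves energy.

    p_idle_fmax_w:  idle power when parked at fmax (watts, hypothetical)
    p_idle_fmin_w:  idle power when parked at fmin (watts, hypothetical)
    e_transition_j: energy spent going to fmin and restoring (joules)
    """
    saving_w = p_idle_fmax_w - p_idle_fmin_w
    if saving_w <= 0:
        # Idling at fmin is no cheaper, so the transition never pays off.
        return float("inf")
    return e_transition_j / saving_w


# Two made-up platforms: one with a cheap frequency change, one with a slow one.
fast_switch = break_even_idle_s(0.5, 0.3, e_transition_j=0.002)
slow_switch = break_even_idle_s(0.5, 0.3, e_transition_j=0.2)
clock_gated = break_even_idle_s(0.3, 0.3, e_transition_j=0.002)

print(f"fast switch: worth it beyond {fast_switch:.3f} s of idle")
print(f"slow switch: worth it beyond {slow_switch:.3f} s of idle")
print(f"no idle-power difference: {clock_gated}")
```

The slower (more expensive) the frequency change, the longer the CPU has to stay idle before lowering the frequency wins, and on a platform where the idle power does not depend on frequency it never does.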
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
2016-05-22 20:42 ` Steve Muckle
@ 2016-05-23 9:00 ` Lorenzo Pieralisi
2016-05-23 9:24 ` Peter Zijlstra
2016-05-23 9:24 ` Peter Zijlstra
1 sibling, 1 reply; 14+ messages in thread
From: Lorenzo Pieralisi @ 2016-05-23 9:00 UTC (permalink / raw)
To: Steve Muckle
Cc: Peter Zijlstra, Daniel Lezcano, Shilpasri G Bhat,
Rafael J. Wysocki, Viresh Kumar, linux-pm@vger.kernel.org,
Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
akshay.adiga, linuxppc-dev
On Sun, May 22, 2016 at 01:42:52PM -0700, Steve Muckle wrote:
> On Sun, May 22, 2016 at 12:39:12PM +0200, Peter Zijlstra wrote:
> > On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote:
> > >
> > > Below are the comparisons by disabling watchdog.
> > > Both schedutil and ondemand have a similar ramp-down trend. And in both the
> > > cases I can see that frequency of the cpu is not reduced in deterministic
> > > fashion. In a observation window of 30 seconds after running a workload I can
> > > see that the frequency is not ramped down on some cpus in the system and are
> > > idling at max frequency.
> >
> > So does it actually matter what the frequency is when you idle? Isn't
> > the whole thing clock gated anyway?
> >
> > Because this seems to generate contradictory requirements, on the one
> > hand we want to stay idle as long as possible while on the other hand
> > you seem to want to clock down while idle, which requires not being
> > idle.
> >
> > If it matters; should not your idle state muck explicitly set/restore
> > frequency?
>
> AFAIK this is very platform dependent. Some will waste more power than
> others when a CPU idles above fmin due to things like resource (bus
> bandwidth, shared cache freq etc) voting.
It is also related to static leakage power that depends on the operating
voltage (ie higher operating frequencies require higher voltage) so in a
way scaling frequency before going idle may not be effective if voltage
does not scale too in turn.
Lorenzo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
2016-05-23 9:00 ` Lorenzo Pieralisi
@ 2016-05-23 9:24 ` Peter Zijlstra
0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-05-23 9:24 UTC (permalink / raw)
To: Lorenzo Pieralisi
Cc: Steve Muckle, Daniel Lezcano, Shilpasri G Bhat, Rafael J. Wysocki,
Viresh Kumar, linux-pm@vger.kernel.org, Linux Kernel Mailing List,
Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev
On Mon, May 23, 2016 at 10:00:04AM +0100, Lorenzo Pieralisi wrote:
> It is also related to static leakage power that depends on the operating
> voltage (ie higher operating frequencies require higher voltage) so in a
> way scaling frequency before going idle may not be effective if voltage
> does not scale too in turn.
Sure, but the platform drivers 'know' all this and can make the right
decision.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] Increase in idle power with schedutil
2016-05-22 20:42 ` Steve Muckle
2016-05-23 9:00 ` Lorenzo Pieralisi
@ 2016-05-23 9:24 ` Peter Zijlstra
1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-05-23 9:24 UTC (permalink / raw)
To: Steve Muckle
Cc: Daniel Lezcano, Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar,
linux-pm@vger.kernel.org, Linux Kernel Mailing List,
Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev
On Sun, May 22, 2016 at 01:42:52PM -0700, Steve Muckle wrote:
> > So does it actually matter what the frequency is when you idle? Isn't
> > the whole thing clock gated anyway?
> >
> > Because this seems to generate contradictory requirements, on the one
> > hand we want to stay idle as long as possible while on the other hand
> > you seem to want to clock down while idle, which requires not being
> > idle.
> >
> > If it matters; should not your idle state muck explicitly set/restore
> > frequency?
>
> AFAIK this is very platform dependent. Some will waste more power than
> others when a CPU idles above fmin due to things like resource (bus
> bandwidth, shared cache freq etc) voting.
Oh agreed, completely platform dependent. 'Luckily' all this cpuidle is
already very platform dependent.
> It is also true that there is power spent going to fmin (and then
> perhaps restoring the frequency when idle ends) which will be in part a
> function of how slow the frequency change operation is on that platform.
Agreed.
> I think Daniel Lezcano (added) was exploring the idea of having cpuidle
> drivers take the expected idle duration and potentially communicate to
> cpufreq to reduce the frequency depending on a platform-specific
> cost/benefit analysis.
Right; that's along the lines I was thinking. If the idle guesstimate and
the idle QoS both allow it (i.e. it wins on power and doesn't violate
wake-up latency), muck with DVFS on the idle path.
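The two conditions can be written down as a simple gate. This is only an illustrative sketch: `IdleFreqPolicy`, its fields and the numbers are hypothetical and do not correspond to any existing kernel interface. The break-even time stands for the platform-specific point where lowering the frequency starts to save energy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleFreqPolicy:
    """Hypothetical per-platform parameters for changing frequency on idle entry."""
    break_even_s: float       # idle time needed for the change to win on power
    restore_latency_s: float  # extra wake-up latency from restoring the frequency

    def should_lower(self, predicted_idle_s, latency_limit_s):
        """True only if the idle estimate and the latency constraint both allow it."""
        wins_on_power = predicted_idle_s > self.break_even_s
        meets_latency = self.restore_latency_s <= latency_limit_s
        return wins_on_power and meets_latency


policy = IdleFreqPolicy(break_even_s=0.010, restore_latency_s=0.0002)

# Long predicted idle, relaxed latency limit: lower the frequency.
print(policy.should_lower(predicted_idle_s=4.0, latency_limit_s=0.001))
# Short predicted idle: not worth the transition.
print(policy.should_lower(predicted_idle_s=0.002, latency_limit_s=0.001))
# Tight latency limit: restoring the frequency would violate it.
print(policy.should_lower(predicted_idle_s=4.0, latency_limit_s=0.0001))
```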
^ permalink raw reply [flat|nested] 14+ messages in thread