* Performance regression in v3.14
@ 2014-05-06 16:35 Johan Hovold
2014-05-07 5:40 ` Viresh Kumar
0 siblings, 1 reply; 13+ messages in thread
From: Johan Hovold @ 2014-05-06 16:35 UTC (permalink / raw)
To: Rafael J. Wysocki, Viresh Kumar; +Cc: cpufreq, linux-pm, linux-kernel
After updating my main system from v3.13 to v3.14.2, I found that the
git bash-completion was extremely sluggish. Completing a file name would
take roughly six rather than one second on this Haswell machine
(i7-4770). (Other things, such as git rebase, also felt slower, but
the completion issue was much more obvious and easy to measure).
I managed to reproduce the problem using the following minimal construct
cat dmesg.repeat | while read x; do true; done
where dmesg.repeat is simply dmesg concatenated together to an
equivalent number of lines as produced by git ls-files in the
kernel-source tree root (45k), and where the actual processing of each
line has been removed.
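For anyone wanting to reproduce this, here is a minimal sketch of how such a test file might be built. The helper name make_repeat_file and the file name dmesg.txt are assumptions for illustration, not part of the original report; the helper simply appends a source file to an output file until the output has at least the requested number of lines.

```shell
# Hypothetical helper: repeat SRC into OUT until OUT has at least TARGET lines.
make_repeat_file() {
    local src=$1 target=$2 out=$3
    local src_lines lines=0

    src_lines=$(wc -l < "$src")
    if [ "$src_lines" -eq 0 ]; then
        echo "source file is empty" >&2
        return 1
    fi

    : > "$out"                      # start from an empty output file
    while [ "$lines" -lt "$target" ]; do
        cat "$src" >> "$out"        # append one more full copy of the source
        lines=$((lines + src_lines))
    done
}

# Example usage (commented out: needs dmesg access and a kernel source tree):
#   dmesg > dmesg.txt
#   make_repeat_file dmesg.txt "$(git ls-files | wc -l)" dmesg.repeat
#   time cat dmesg.repeat | while read x; do true; done
```

The resulting file is a whole number of copies of the source, so it can be slightly longer than the target; for this kind of timing test that should not matter.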
Most of the time I get:
$ time cat dmesg.repeat | while read x; do true; done
real 0m6.091s
user 0m3.674s
sys 0m2.447s
but sometimes it only takes one second.
$ time cat dmesg.repeat | while read x; do true; done
real 0m1.100s
user 0m0.544s
sys 0m0.570s
I don't seem to be able to reproduce the problem on 3.13 where the pipe
always takes about one second to finish.
Taking all but one core offline seems to make the problem go away, and so
does using the performance rather than powersave governor of the
intel_pstate cpufreq driver (on at least one of two online cores).
Moving the mouse cursor makes the loop finish faster, and so does
switching to another terminal to print cpufreq/cpuinfo_cur_freq, which
was around cpuinfo_min_freq several times (when tracing, see below).
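The checks and workarounds described above all go through the cpufreq and CPU hotplug files in sysfs. A rough sketch follows; the function names are hypothetical, the sysfs root is a parameter so the helpers can be tried on a scratch directory, and writing to the real tree under /sys/devices/system/cpu needs root.

```shell
# Hypothetical helpers; ROOT defaults to the real sysfs CPU directory.

# Print governor, current and minimum frequency for every CPU that has cpufreq.
show_cpufreq() {
    local root=${1:-/sys/devices/system/cpu}
    local d
    for d in "$root"/cpu[0-9]*; do
        [ -r "$d/cpufreq/scaling_governor" ] || continue
        printf '%s %s cur=%s min=%s\n' "${d##*/}" \
            "$(cat "$d/cpufreq/scaling_governor")" \
            "$(cat "$d/cpufreq/cpuinfo_cur_freq" 2>/dev/null || echo n/a)" \
            "$(cat "$d/cpufreq/cpuinfo_min_freq" 2>/dev/null || echo n/a)"
    done
}

# Select governor GOV (e.g. performance or powersave) on every CPU.
set_governor() {
    local gov=$1 root=${2:-/sys/devices/system/cpu}
    local f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$f" ] || continue
        echo "$gov" > "$f"
    done
}

# Take every CPU except cpu0 offline (cpu0 often has no "online" file).
offline_all_but_first() {
    local root=${1:-/sys/devices/system/cpu}
    local f
    for f in "$root"/cpu[1-9]*/online; do
        [ -w "$f" ] || continue
        echo 0 > "$f"
    done
}
```

Running show_cpufreq in a second terminal while the loop runs is one way to watch whether the cores stay near cpuinfo_min_freq.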
I could not reproduce the problem when using perf record, but I can get
function-profile traces using ftrace (in which case the loop takes about
60 seconds instead of six seconds to finish).
Comparing the traces I see a lot of functions taking ten times longer to
finish, but I guess that's expected if this is indeed a cpufreq issue.
Since this is my main machine (and only multi-core machine at the
moment) I'm not able to bisect this myself. And for the same reason I
have not verified that the problem persists in v3.15-rc.
I don't see any cpufreq patches in the v3.14.3 stable queue nor anything
obviously related and marked for stable in v3.15-rc.
Any ideas about what might be going on?
Thanks,
Johan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14
2014-05-06 16:35 Performance regression in v3.14 Johan Hovold
@ 2014-05-07 5:40 ` Viresh Kumar
2014-05-07 7:35 ` Johan Hovold
2014-05-07 14:10 ` Dirk Brandewie
0 siblings, 2 replies; 13+ messages in thread
From: Viresh Kumar @ 2014-05-07 5:40 UTC (permalink / raw)
To: Johan Hovold, Dirk Brandewie
Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
Cc'ing Dirk, who is taking care of the intel-pstate driver.
On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote:
> After updating my main system from v3.13 to v3.14.2, I found that the
> git bash-completion was extremely sluggish. Completing a file name would
> take roughly six rather than one second on this Haswell machine
> (i7-4770). (Other things, such as git rebase, also felt slower, but
> the completion issue was much more obvious and easy to measure).
>
> I managed to reproduce the problem using the following minimal construct
>
> cat dmesg.repeat | while read x; do true; done
>
> where dmesg.repeat is simply dmesg concatenated together to an
> equivalent number of lines as produced by git ls-files in the
> kernel-source tree root (45k), and where the actual processing of each
> line has been removed.
>
> Most of the time I get:
>
> $ time cat dmesg.repeat | while read x; do true; done
>
> real 0m6.091s
> user 0m3.674s
> sys 0m2.447s
>
> but sometimes it only takes one second.
>
> $ time cat dmesg.repeat | while read x; do true; done
>
> real 0m1.100s
> user 0m0.544s
> sys 0m0.570s
>
> I don't seem to be able to reproduce the problem on 3.13 where the pipe
> always takes about one second to finish.
>
> Taking all but one core offline seems to make the problem go away, and so
> does using the performance rather than powersave governor of the
> intel_pstate cpufreq driver (on at least one of two online cores).
>
> Moving the mouse cursor makes the loop finish faster, and so does
> switching to another terminal to print cpufreq/cpuinfo_cur_freq, which
> was around cpuinfo_min_freq several times (when tracing, see below).
>
> I could not reproduce the problem when using perf record, but I can get
> function-profile traces using ftrace (in which case the loop takes about
> 60 seconds instead of six seconds to finish).
>
> Comparing the traces I see a lot of functions taking ten times longer to
> finish, but I guess that's expected if this is indeed a cpufreq issue.
>
> Since this is my main machine (and only multi-core machine at the
> moment) I'm not able to bisect this myself. And for the same reason I
> have not verified that the problem persists in v3.15-rc.
>
> I don't see any cpufreq patches in the v3.14.3 stable queue nor anything
> obviously related and marked for stable in v3.15-rc.
>
> Any ideas about what might be going on?
I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and
couldn't pinpoint any change which might cause it. Don't have a clue
of what's going on. I don't know how to help you on this.
Normally I test my stuff on an ARM board and I don't remember facing
any such behavior there. There might be something wrong with intel-pstate
as well.
Also, can you try to use acpi-cpufreq instead? And see how that is behaving?
--
viresh
* Re: Performance regression in v3.14
2014-05-07 5:40 ` Viresh Kumar
@ 2014-05-07 7:35 ` Johan Hovold
2014-05-07 8:36 ` Romain Francoise
2014-05-07 14:10 ` Dirk Brandewie
1 sibling, 1 reply; 13+ messages in thread
From: Johan Hovold @ 2014-05-07 7:35 UTC (permalink / raw)
To: Viresh Kumar
Cc: Johan Hovold, Dirk Brandewie, Rafael J. Wysocki,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
On Wed, May 07, 2014 at 11:10:34AM +0530, Viresh Kumar wrote:
> Cc'ing Dirk who is taking care of intel-pstate driver.
>
> On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote:
> > After updating my main system from v3.13 to v3.14.2, I found that the
> > git bash-completion was extremely sluggish. Completing a file name would
> > take roughly six rather than one second on this Haswell machine
> > (i7-4770). (Other things, such as git rebase, also felt slower, but
> > the completion issue was much more obvious and easy to measure).
> >
> > I managed to reproduce the problem using the following minimal construct
> >
> > cat dmesg.repeat | while read x; do true; done
> >
> > where dmesg.repeat is simply dmesg concatenated together to an
> > equivalent number of lines as produced by git ls-files in the
> > kernel-source tree root (45k), and where the actual processing of each
> > line has been removed.
> >
> > Most of the time I get:
> >
> > $ time cat dmesg.repeat | while read x; do true; done
> >
> > real 0m6.091s
> > user 0m3.674s
> > sys 0m2.447s
> >
> > but sometimes it only takes one second.
> >
> > $ time cat dmesg.repeat | while read x; do true; done
> >
> > real 0m1.100s
> > user 0m0.544s
> > sys 0m0.570s
> >
> > I don't seem to be able to reproduce the problem on 3.13 where the pipe
> > always takes about one second to finish.
> >
> > Taking all but one core offline seems to make the problem go away, and so
> > does using the performance rather than powersave governor of the
> > intel_pstate cpufreq driver (on at least one of two online cores).
> >
> > Moving the mouse cursor makes the loop finish faster, and so does
> > switching to another terminal to print cpufreq/cpuinfo_cur_freq, which
> > was around cpuinfo_min_freq several times (when tracing, see below).
<snip>
> I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and
> couldn't pinpoint any change which might cause it. Don't have a clue
> of what's going on. I don't know how to help you on this.
>
> Normally I test my stuff on an ARM board and I don't remember facing
> any such behavior there. There might be something wrong with intel-pstate
> as well..
>
> Also, can you try to use acpi-cpufreq instead? And see how that is behaving?
Using acpi-cpufreq and the ondemand governor (with all 8 cores
online) on 3.14.3 improves the situation somewhat:
$ time cat dmesg.repeat | while read x; do true; done
real 0m1.989s
user 0m1.257s
sys 0m0.747s
when the system is idle, and
$ time cat dmesg.repeat | while read x; do true; done
real 0m1.191s
user 0m0.753s
sys 0m0.449s
when run a second time in immediate succession.
When running the same tests on 3.13.11, the figures are roughly the same
$ time cat dmesg.repeat | while read x; do true; done
real 0m2.075s
user 0m1.276s
sys 0m0.816s
$ time cat dmesg.repeat | while read x; do true; done
real 0m1.291s
user 0m0.800s
sys 0m0.504s
So I guess that idle-active difference is normal for acpi-cpufreq and
that the problem only arises in or with the intel_pstate driver.
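When comparing results like these, it helps to confirm which driver and governor were actually active for each run. A small sketch, assuming the standard scaling_driver and scaling_governor sysfs files; the function name is hypothetical. (How the driver was switched is not stated in this message; booting with intel_pstate=disable on the kernel command line is one documented way to let acpi-cpufreq bind instead.)

```shell
# Hypothetical helper: print "<driver>/<governor>" for cpu0.
active_cpufreq_driver() {
    local root=${1:-/sys/devices/system/cpu}
    local dir=$root/cpu0/cpufreq

    if [ ! -r "$dir/scaling_driver" ]; then
        echo "no cpufreq driver information under $dir" >&2
        return 1
    fi
    printf '%s/%s\n' "$(cat "$dir/scaling_driver")" \
                     "$(cat "$dir/scaling_governor")"
}
```

Recording this output next to each timing makes it harder to mix up intel_pstate and acpi-cpufreq runs.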
Thanks,
Johan
* Re: Performance regression in v3.14
2014-05-07 7:35 ` Johan Hovold
@ 2014-05-07 8:36 ` Romain Francoise
0 siblings, 0 replies; 13+ messages in thread
From: Romain Francoise @ 2014-05-07 8:36 UTC (permalink / raw)
To: Johan Hovold
Cc: Viresh Kumar, Dirk Brandewie, Rafael J. Wysocki,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
Johan Hovold <jhovold@gmail.com> writes:
> So I guess that idle-active difference is normal for acpi-cpufreq and
> that the problem only arises in or with the intel_pstate driver.
I've also noticed some performance issues with intel_pstate in powersave
mode; in my case, playing fullscreen video was very choppy. Switching to
the performance governor fixed things as well. Looking at turbostat, the
cores remain close to their minimal frequency pretty much all the time
in powersave.
Maybe 91a4cd4f3d8169d ("intel_pstate: Remove periodic P state boost")
which went in v3.14 is the culprit?
(I would also be curious to know what the driver authors recommend for
typical laptop usage, performance or powersave, since it defaults to
performance.)
* Re: Performance regression in v3.14
2014-05-07 5:40 ` Viresh Kumar
2014-05-07 7:35 ` Johan Hovold
@ 2014-05-07 14:10 ` Dirk Brandewie
2014-05-21 9:00 ` Johan Hovold
1 sibling, 1 reply; 13+ messages in thread
From: Dirk Brandewie @ 2014-05-07 14:10 UTC (permalink / raw)
To: Viresh Kumar, Johan Hovold
Cc: dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List
On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> Cc'ing Dirk who is taking care of intel-pstate driver.
>
Thanks Viresh, I had seen this thread.
I am looking into it.
--Dirk
> On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote:
>> After updating my main system from v3.13 to v3.14.2, I found that the
>> git bash-completion was extremely sluggish. Completing a file name would
>> take roughly six rather than one second on this Haswell machine
>> (i7-4770). (Other things, such as git rebase, also felt slower, but
>> the completion issue was much more obvious and easy to measure).
>>
>> I managed to reproduce the problem using the following minimal construct
>>
>> cat dmesg.repeat | while read x; do true; done
>>
>> where dmesg.repeat is simply dmesg concatenated together to an
>> equivalent number of lines as produced by git ls-files in the
>> kernel-source tree root (45k), and where the actual processing of each
>> line has been removed.
>>
>> Most of the time I get:
>>
>> $ time cat dmesg.repeat | while read x; do true; done
>>
>> real 0m6.091s
>> user 0m3.674s
>> sys 0m2.447s
>>
>> but sometimes it only takes one second.
>>
>> $ time cat dmesg.repeat | while read x; do true; done
>>
>> real 0m1.100s
>> user 0m0.544s
>> sys 0m0.570s
>>
>> I don't seem to be able to reproduce the problem on 3.13 where the pipe
>> always takes about one second to finish.
>>
>> Taking all but one core offline seems to make the problem go away, and so
>> does using the performance rather than powersave governor of the
>> intel_pstate cpufreq driver (on at least one of two online cores).
>>
>> Moving the mouse cursor makes the loop finish faster, and so does
>> switching to another terminal to print cpufreq/cpuinfo_cur_freq, which
>> was around cpuinfo_min_freq several times (when tracing, see below).
>>
>> I could not reproduce the problem when using perf record, but I can get
>> function-profile traces using ftrace (in which case the loop takes about
>> 60 seconds instead of six seconds to finish).
>>
>> Comparing the traces I see a lot of functions taking ten times longer to
>> finish, but I guess that's expected if this is indeed a cpufreq issue.
>>
>> Since this is my main machine (and only multi-core machine at the
>> moment) I'm not able to bisect this myself. And for the same reason I
>> have not verified that the problem persists in v3.15-rc.
>>
>> I don't see any cpufreq patches in the v3.14.3 stable queue nor anything
>> obviously related and marked for stable in v3.15-rc.
>>
>> Any ideas about what might be going on?
>
> I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and
> couldn't pinpoint any change which might cause it. Don't have a clue
> of what's going on. I don't know how to help you on this.
>
> Normally I test my stuff on an ARM board and I don't remember facing
> any such behavior there. There might be something wrong with intel-pstate
> as well..
>
> Also, can you try to use acpi-cpufreq instead? And see how that is behaving?
>
> --
> viresh
>
* Re: Performance regression in v3.14
2014-05-07 14:10 ` Dirk Brandewie
@ 2014-05-21 9:00 ` Johan Hovold
2014-05-28 7:59 ` Johan Hovold
0 siblings, 1 reply; 13+ messages in thread
From: Johan Hovold @ 2014-05-21 9:00 UTC (permalink / raw)
To: Dirk Brandewie
Cc: Viresh Kumar, Johan Hovold, dirk.j.brandewie, Rafael J. Wysocki,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List
On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote:
> On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> > Cc'ing Dirk who is taking care of intel-pstate driver.
> >
>
> Thanks Viresh I had seen this thread.
>
> I am looking into it
Any updates on this, Dirk? 3.14 is still basically unusable with the
intel_pstate driver.
Any fixes or workarounds posted elsewhere that I can apply in the
meantime?
Thanks,
Johan
> > On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote:
> >> After updating my main system from v3.13 to v3.14.2, I found that the
> >> git bash-completion was extremely sluggish. Completing a file name would
> >> take roughly six rather than one second on this Haswell machine
> >> (i7-4770). (Other things, such as git rebase, also felt slower, but
> >> the completion issue was much more obvious and easy to measure).
> >>
> >> I managed to reproduce the problem using the following minimal construct
> >>
> >> cat dmesg.repeat | while read x; do true; done
> >>
> >> where dmesg.repeat is simply dmesg concatenated together to an
> >> equivalent number of lines as produced by git ls-files in the
> >> kernel-source tree root (45k), and where the actual processing of each
> >> line has been removed.
> >>
> >> Most of the time I get:
> >>
> >> $ time cat dmesg.repeat | while read x; do true; done
> >>
> >> real 0m6.091s
> >> user 0m3.674s
> >> sys 0m2.447s
> >>
> >> but sometimes it only takes one second.
> >>
> >> $ time cat dmesg.repeat | while read x; do true; done
> >>
> >> real 0m1.100s
> >> user 0m0.544s
> >> sys 0m0.570s
> >>
> >> I don't seem to be able to reproduce the problem on 3.13 where the pipe
> >> always takes about one second to finish.
> >>
> >> Taking all but one core offline seems to make the problem go away, and so
> >> does using the performance rather than powersave governor of the
> >> intel_pstate cpufreq driver (on at least one of two online cores).
> >>
> >> Moving the mouse cursor makes the loop finish faster, and so does
> >> switching to another terminal to print cpufreq/cpuinfo_cur_freq, which
> >> was around cpuinfo_min_freq several times (when tracing, see below).
> >>
> >> I could not reproduce the problem when using perf record, but I can get
> >> function-profile traces using ftrace (in which case the loop takes about
> >> 60 seconds instead of six seconds to finish).
> >>
> >> Comparing the traces I see a lot of functions taking ten times longer to
> >> finish, but I guess that's expected if this is indeed a cpufreq issue.
> >>
> >> Since this is my main machine (and only multi-core machine at the
> >> moment) I'm not able to bisect this myself. And for the same reason I
> >> have not verified that the problem persists in v3.15-rc.
> >>
> >> I don't see any cpufreq patches in the v3.14.3 stable queue nor anything
> >> obviously related and marked for stable in v3.15-rc.
> >>
> >> Any ideas about what might be going on?
> >
> > I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and
> > couldn't pinpoint any change which might cause it. Don't have a clue
> > of what's going on. I don't know how to help you on this.
> >
> > Normally I test my stuff on an ARM board and I don't remember facing
> > any such behavior there. There might be something wrong with intel-pstate
> > as well..
> >
> > Also, can you try to use acpi-cpufreq instead? And see how that is behaving?
> >
> > --
> > viresh
* Re: Performance regression in v3.14
2014-05-28 7:59 ` Johan Hovold
@ 2014-05-28 0:35 ` Yuyang Du
2014-05-28 16:00 ` Doug Smythies
2014-05-30 2:27 ` Greg Kroah-Hartman
1 sibling, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2014-05-28 0:35 UTC (permalink / raw)
To: Johan Hovold
Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, Rafael J. Wysocki,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List, Greg Kroah-Hartman, Doug Smythies,
Stratos Karafotis
> I tried applying your (rejected) patch "intel_pstate: Remove C0
> tracking" posted here:
>
> https://lkml.org/lkml/2014/5/8/574
>
> to v3.14.4 and it fixes the problem as expected.
>
> So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
> account for core busy calculation") that went into v3.14-rc2 (and was
> even marked for *stable*) that first broke Greg KH's system:
>
> https://lkml.org/lkml/2014/2/19/626
>
> That was apparently fixed by e66c17683746 ("intel_pstate: Change
> busy calculation to use fixed point math."), but still left v3.14
> basically unusable for lower-intensity workloads such as my
> bash-completion example and other reported regressions:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=75121
>
> Sure there may be issues with v3.13 not hitting the lowest frequencies
> but at least the system was *usable*.
>
> In my opinion there's really no other option than to restore the 3.13
> behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
> core C0 time into account for core busy calculation") until you have
> figured out a way to take C0 into account without breaking things too
> badly.
Hi all,
My posts, earlier and now, are only about why C0 tracking can't be
removed. Maybe I need to elaborate on it a little bit more.
In a nutshell, without C0 tracking, intel_pstate is effectively the performance
governor in terms of frequency control. Why?
Without C0 tracking, the machinery of the freq control is, as I formulated it:
last_freq_average / last_requested_freq ==> setpoint
which can equivalently be written as:
last_freq_average / last_requested_freq * last_C0_pct ==> setpoint * last_C0_pct
That is, the control machinery will increase the frequency at ANY frequency and at
ANY C0_pct (which is the CPU utilization), since the setpoint is less than 100
percent. A few iterations later, we will reach the max (possible) frequency,
and then we are effectively the performance governor (highest frequency all the time).
So, sure, without C0 tracking, the performance issues should be fixed. But
then we could simply set the highest frequency; that should be better.
It is your decision whether we should remove C0 tracking as a no-other-option fix
right now. I am OK either way.
Thanks,
Yuyang
* Re: Performance regression in v3.14
2014-05-21 9:00 ` Johan Hovold
@ 2014-05-28 7:59 ` Johan Hovold
2014-05-28 0:35 ` Yuyang Du
2014-05-30 2:27 ` Greg Kroah-Hartman
0 siblings, 2 replies; 13+ messages in thread
From: Johan Hovold @ 2014-05-28 7:59 UTC (permalink / raw)
To: Dirk Brandewie
Cc: Viresh Kumar, Johan Hovold, dirk.j.brandewie, Rafael J. Wysocki,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List, Greg Kroah-Hartman, Doug Smythies,
Yuyang Du, Stratos Karafotis
[ +CC: Greg, Doug, Stratos, Yuyang ]
On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote:
> On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote:
> > On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> > > Cc'ing Dirk who is taking care of intel-pstate driver.
> > >
> >
> > Thanks Viresh I had seen this thread.
> >
> > I am looking into it
>
> Any updates on this, Dirk? 3.14 is still basically unusable with the
> intel_pstate driver.
>
> Any fixes or workarounds posted elsewhere that I can apply in the
> meantime?
Another week and still no reply, Dirk?
I tried applying your (rejected) patch "intel_pstate: Remove C0
tracking" posted here:
https://lkml.org/lkml/2014/5/8/574
to v3.14.4 and it fixes the problem as expected.
So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
account for core busy calculation") that went into v3.14-rc2 (and was
even marked for *stable*) that first broke Greg KH's system:
https://lkml.org/lkml/2014/2/19/626
That was apparently fixed by e66c17683746 ("intel_pstate: Change
busy calculation to use fixed point math."), but still left v3.14
basically unusable for lower-intensity workloads such as my
bash-completion example and other reported regressions:
https://bugzilla.kernel.org/show_bug.cgi?id=75121
Sure there may be issues with v3.13 not hitting the lowest frequencies
but at least the system was *usable*.
In my opinion there's really no other option than to restore the 3.13
behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
core C0 time into account for core busy calculation") until you have
figured out a way to take C0 into account without breaking things too
badly.
Johan
* RE: Performance regression in v3.14
2014-05-28 0:35 ` Yuyang Du
@ 2014-05-28 16:00 ` Doug Smythies
2014-05-28 16:53 ` Yuyang Du
0 siblings, 1 reply; 13+ messages in thread
From: Doug Smythies @ 2014-05-28 16:00 UTC (permalink / raw)
To: 'Yuyang Du', 'Johan Hovold'
Cc: 'Dirk Brandewie', 'Viresh Kumar',
dirk.j.brandewie, 'Rafael J. Wysocki', cpufreq, linux-pm,
'Linux Kernel Mailing List', 'Greg Kroah-Hartman',
'Stratos Karafotis', Doug Smythies
On 2014.05.27 01:40 Yuyang Du wrote:
>> On 2014.05.27 01:00, Johan Hovold wrote:
>> I tried applying your (rejected) patch "intel_pstate: Remove C0
>> tracking" posted here:
>>
>> https://lkml.org/lkml/2014/5/8/574
>>
>> to v3.14.4 and it fixes the problem as expected.
>>
>> So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
>> account for core busy calculation") that went into v3.14-rc2 (and was
>> even marked for *stable*) that first broke Greg KH's system:
>>
>> https://lkml.org/lkml/2014/2/19/626
>>
>> That was apparently fixed by e66c17683746 ("intel_pstate: Change
>> busy calculation to use fixed point math."), but still left v3.14
>> basically unusable for lower-intensity workloads such as my
>> bash-completion example and other reported regressions:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=75121
>>
>> Sure there may be issues with v3.13 not hitting the lowest frequencies
>> but at least the system was *usable*.
>>
>> In my opinion there's really no other option than to restore the 3.13
>> behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
>> core C0 time into account for core busy calculation") until you have
>> figured out a way to take C0 into account without breaking things too
>> badly.
> Hi all,
> My posts before and now are only relevant to why C0 tracking can't be
> removed. Maybe I need to elaborate on it a little bit more.
> In a nutshell, without C0 tracking, the intel_pstate is effectively
> performance governor in terms of frequency control.
That is not true. The CPU Frequency Versus Load response curve
is different and considerably more aggressive for performance mode
when compared to powersave mode with C0 tracking removed.
I'll add a relevant graph to the bugzilla report referenced above.
(but it will be a few hours before I do.)
> Why? Without C0 tracking, the machinery of the freq control
> is, as I formulated it:
> last_freq_average / last_requested_freq ==> setpoint
> which can be virtually formed into:
> last_freq_average / last_requested_freq * last_C0_pct ==>
> setpoint * last_C0_pct
> which said, the control machinery will increase the frequency
> at ANY frequency at ANY C0_pct (which is the CPU utilization),
> since setpoint is less than 100 percent.
That is not true. Yes, and due to the setpoint being less than
100, which is needed or the driver won't work at all, there is
a tendency to drive the target pstate upwards.
However that is tempered by both the PID proportional gain,
and ultimately integer math. More importantly, the CPU
itself tells the driver when it is operating below the target
pstate and the driver responds.
Additionally, the tendency to drive up the target pstate
too much is exacerbated by some extra rounding up at a
couple of spots. Dirk has a pending fix.
> And a few iterations
> later, we will reach max (possible) frequency,
> then we are effectively performance governor
> (highest frequency all the time).
Please do not confuse highest target pstate with
highest frequency. They are not the same. The processor
itself can back off.
... Doug
* Re: Performance regression in v3.14
2014-05-28 16:00 ` Doug Smythies
@ 2014-05-28 16:53 ` Yuyang Du
0 siblings, 0 replies; 13+ messages in thread
From: Yuyang Du @ 2014-05-28 16:53 UTC (permalink / raw)
To: Doug Smythies
Cc: 'Johan Hovold', 'Dirk Brandewie',
'Viresh Kumar', dirk.j.brandewie,
'Rafael J. Wysocki', cpufreq, linux-pm,
'Linux Kernel Mailing List', 'Greg Kroah-Hartman',
'Stratos Karafotis'
> That is not true. Yes, and due to the setpoint being less than
> 100, which is needed or the driver won't work at all, there is
> a tendency to drive the target pstate upwards.
> However that is tempered by both the PID proportional gain,
> and ultimately integer math. More importantly, the CPU
> itself tells the driver when it is operating below the target
> pstate and the driver responds.
>
> Additionally, the tendency to drive up the target pstate
> too much is exacerbated by some extra rounding up at a
> couple of spots. Dirk has a pending fix.
>
> > And a few iterations
> > later, we will reach max (possible) frequency,
> > then we are effectively performance governor
> > (highest frequency all the time).
>
> Please do not confuse highest target pstate with
> highest frequency. They are not the same. The processor
> itself can back off.
>
Hi Doug,
Everything you said is about the hardware not giving software whatever it
wants (e.g., when the requested freq is too high). Agreed.
But does it matter to this discussion?
Yuyang
* Re: Performance regression in v3.14
2014-05-28 7:59 ` Johan Hovold
2014-05-28 0:35 ` Yuyang Du
@ 2014-05-30 2:27 ` Greg Kroah-Hartman
2014-05-30 8:49 ` Johan Hovold
2014-05-30 12:29 ` Rafael J. Wysocki
1 sibling, 2 replies; 13+ messages in thread
From: Greg Kroah-Hartman @ 2014-05-30 2:27 UTC (permalink / raw)
To: Johan Hovold
Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, Rafael J. Wysocki,
cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
Linux Kernel Mailing List, Doug Smythies, Yuyang Du,
Stratos Karafotis
On Wed, May 28, 2014 at 09:59:45AM +0200, Johan Hovold wrote:
> [ +CC: Greg, Doug, Stratos, Yuyang ]
>
> On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote:
> > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote:
> > > On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> > > > Cc'ing Dirk who is taking care of intel-pstate driver.
> > > >
> > >
> > > Thanks Viresh I had seen this thread.
> > >
> > > I am looking into it
> >
> > Any updates on this, Dirk? 3.14 is still basically unusable with the
> > intel_pstate driver.
> >
> > Any fixes or workarounds posted elsewhere that I can apply in the
> > meantime?
>
> Another week and still no reply, Dirk?
>
> I tried applying your (rejected) patch "intel_pstate: Remove C0
> tracking" posted here:
>
> https://lkml.org/lkml/2014/5/8/574
>
> to v3.14.4 and it fixes the problem as expected.
>
> So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
> account for core busy calculation") that went into v3.14-rc2 (and was
> even marked for *stable*) that first broke Greg KH's system:
>
> https://lkml.org/lkml/2014/2/19/626
>
> That was apparently fixed by e66c17683746 ("intel_pstate: Change
> busy calculation to use fixed point math."), but still left v3.14
> basically unusable for lower-intensity workloads such as my
> bash-completion example and other reported regressions:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=75121
>
> Sure there may be issues with v3.13 not hitting the lowest frequencies
> but at least the system was *usable*.
>
> In my opinion there's really no other option than to restore the 3.13
> behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
> core C0 time into account for core busy calculation") until you have
> figured out a way to take C0 into account without breaking things too
> badly.
Dirk has posted some patches, do they fix the problem for you?
thanks,
greg k-h
* Re: Performance regression in v3.14
2014-05-30 2:27 ` Greg Kroah-Hartman
@ 2014-05-30 8:49 ` Johan Hovold
2014-05-30 12:29 ` Rafael J. Wysocki
1 sibling, 0 replies; 13+ messages in thread
From: Johan Hovold @ 2014-05-30 8:49 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Johan Hovold, Dirk Brandewie, Viresh Kumar, dirk.j.brandewie,
Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List,
Doug Smythies, Yuyang Du, Stratos Karafotis
On Thu, May 29, 2014 at 07:27:34PM -0700, Greg Kroah-Hartman wrote:
> On Wed, May 28, 2014 at 09:59:45AM +0200, Johan Hovold wrote:
> > [ +CC: Greg, Doug, Stratos, Yuyang ]
> >
> > On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote:
> > > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote:
> > > > On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> > > > > Cc'ing Dirk who is taking care of intel-pstate driver.
> > > > >
> > > >
> > > > Thanks Viresh I had seen this thread.
> > > >
> > > > I am looking into it
> > >
> > > Any updates on this, Dirk? 3.14 is still basically unusable with the
> > > intel_pstate driver.
> > >
> > > Any fixes or workarounds posted elsewhere that I can apply in the
> > > meantime?
> >
> > Another week and still no reply, Dirk?
> >
> > I tried applying your (rejected) patch "intel_pstate: Remove C0
> > tracking" posted here:
> >
> > https://lkml.org/lkml/2014/5/8/574
> >
> > to v3.14.4 and it fixes the problem as expected.
> >
> > So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
> > account for core busy calculation") that went into v3.14-rc2 (and was
> > even marked for *stable*) that first broke Greg KH's system:
> >
> > https://lkml.org/lkml/2014/2/19/626
> >
> > That was apparently fixed by e66c17683746 ("intel_pstate: Change
> > busy calculation to use fixed point math."), but still left v3.14
> > basically unusable for lower-intensity workloads such as my
> > bash-completion example and other reported regressions:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=75121
> >
> > Sure there may be issues with v3.13 not hitting the lowest frequencies
> > but at least the system was *usable*.
> >
> > In my opinion there's really no other option than to restore the 3.13
> > behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
> > core C0 time into account for core busy calculation") until you have
> > figured out a way to take C0 into account without breaking things too
> > badly.
>
> Dirk has posted some patches, do they fix the problem for you?
Thanks for letting me know.
As the series posted yesterday includes the effective revert
"intel_pstate: Remove C0 tracking" mentioned above, they do.
The series does not apply at all to v3.14.4, so I chose to include
d37e2b764499 ("intel_pstate: remove unneeded sample buffers") in order
to ease backporting slightly.
I don't see much difference between applying only the revert patch and
applying all of them. I've included turbostat output from running my
bash construct on an idle system (running X, though) below.
With 3.14.4 all cores appear stuck at minimum frequency. With the
revert patch (with or without the remaining patches) frequencies are
around 3.7 GHz for all cores.
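One quick way to check whether cores sit near the minimum frequency is to compare cpuinfo_cur_freq with cpuinfo_min_freq in sysfs. The following is only a minimal sketch: the function name show_freqs and its optional root-directory argument are made up for illustration, and cpuinfo_cur_freq is usually readable by root only.

```shell
# Print "cpuN current_kHz min_kHz" for every CPU that exposes cpufreq.
# show_freqs is a hypothetical helper; the optional argument overrides
# the sysfs root, which is mainly useful for testing.
show_freqs() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        cur=$dir/cpufreq/cpuinfo_cur_freq
        min=$dir/cpufreq/cpuinfo_min_freq
        # Skip CPUs without readable cpufreq files (offline, or not root).
        [ -r "$cur" ] && [ -r "$min" ] || continue
        echo "${dir##*/} $(cat "$cur") $(cat "$min")"
    done
}
```

Running show_freqs in a second terminal while the loop executes would show whether the current frequency stays close to the minimum.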
As at least one of the new patches has already raised some discussion:
http://marc.info/?l=linux-pm&m=140141648726863&w=2
perhaps we should consider simply applying and backporting the revert
(and fixed-point rounding fix) to restore v3.13 behaviour and un-break
v3.14 in the meantime?
Thanks,
Johan
v3.14.4 (with "intel_pstate: Remove C0 tracking" and a version of
"intel_pstate: Fix fixed point rounding macro" from two weeks ago):
cor CPU    %c0  GHz  TSC SMI    %c1   %c3   %c6    %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W  GFX_W
         12.63 3.88 3.39   0  13.32  0.02  0.06  73.96   41   41  0.00  0.00  0.00  0.00  18.69  10.70   0.00
  0   0   0.50 3.88 3.39   0   3.28  0.10  0.24  95.88   31   41  0.00  0.00  0.00  0.00  18.69  10.70   0.00
  0   4   0.57 3.87 3.39   0   3.22
  1   1   0.09 3.61 3.39   0   0.03  0.00  0.00  99.88   31
  1   5   0.01 3.74 3.39   0   0.11
  2   2   0.02 3.61 3.39   0   0.02  0.00  0.00  99.96   31
  2   6   0.01 3.80 3.39   0   0.04
  3   3  99.85 3.89 3.39   0   0.04  0.00  0.00   0.12   41
  3   7   0.01 3.81 3.39   0  99.87
1.194291 sec
v3.14.4 (with patches from yesterday):
cor CPU    %c0  GHz  TSC SMI    %c1   %c3   %c6    %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W  GFX_W
         12.63 3.88 3.39   0  13.33  0.04  0.04  73.95   44   45  0.00  0.00  0.00  0.00  18.65  10.70   0.00
  0   0   0.52 3.88 3.39   0   3.28  0.17  0.16  95.88   33   45  0.00  0.00  0.00  0.00  18.65  10.70   0.00
  0   4   0.58 3.87 3.39   0   3.22
  1   1   0.10 3.45 3.39   0   0.05  0.00  0.00  99.85   34
  1   5   0.01 3.67 3.39   0   0.14
  2   2   0.02 3.56 3.39   0   0.04  0.00  0.00  99.94   34
  2   6   0.01 3.63 3.39   0   0.05
  3   3  99.82 3.89 3.39   0   0.05  0.00  0.00   0.13   44
  3   7   0.01 3.70 3.39   0  99.86
1.197190 sec
v3.14.4 (current stable):
cor CPU    %c0  GHz  TSC SMI    %c1   %c3   %c6    %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W  GFX_W
         12.86 0.82 3.39   0  14.16  0.00  0.00  72.97   32   32  0.00  0.00  0.00  0.00   5.55   0.61   0.00
  0   0   2.26 0.82 3.39   0   5.74  0.01  0.02  91.98   32   32  0.00  0.00  0.00  0.00   5.55   0.61   0.00
  0   4   0.58 0.82 3.39   0   7.42
  1   1   0.09 0.82 3.39   0   0.03  0.00  0.00  99.88   31
  1   5   0.01 0.85 3.39   0   0.10
  2   2   0.02 0.83 3.39   0   0.06  0.00  0.00  99.92   31
  2   6   0.03 0.86 3.39   0   0.05
  3   3  99.89 0.82 3.39   0   0.01  0.00  0.00   0.09   31
  3   7   0.02 0.89 3.39   0  99.89
5.675564 sec
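The timings above come from the empty read loop described at the start of the thread. A rough sketch of how such a run could be scripted follows. The helper names make_input and run_loop, and the use of GNU date for sub-second timing, are assumptions made here and are not taken from the original setup.

```shell
# make_input SEED OUT N: repeat SEED into OUT until OUT has at least N lines.
# (Hypothetical helper; the thread builds dmesg.repeat from dmesg output.)
make_input() {
    seed=$1; out=$2; want=$3
    # Refuse a seed without complete lines, which would loop forever.
    [ "$(wc -l < "$seed")" -gt 0 ] || return 1
    : > "$out"
    while [ "$(wc -l < "$out")" -lt "$want" ]; do
        cat "$seed" >> "$out"
    done
}

# run_loop FILE: run the empty read loop and print elapsed seconds.
# Assumes GNU date, which supports %N (nanoseconds).
run_loop() {
    start=$(date +%s.%N)
    cat "$1" | while read x; do true; done
    end=$(date +%s.%N)
    echo "$end $start" | awk '{ printf "%.3f\n", $1 - $2 }'
}
```

For example, `dmesg > seed; make_input seed dmesg.repeat 45000; run_loop dmesg.repeat` would print the wall-clock time of one run, which can then be compared across kernels or governors.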
* Re: Performance regression in v3.14
2014-05-30 2:27 ` Greg Kroah-Hartman
2014-05-30 8:49 ` Johan Hovold
@ 2014-05-30 12:29 ` Rafael J. Wysocki
1 sibling, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2014-05-30 12:29 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Johan Hovold, Dirk Brandewie, Viresh Kumar, dirk.j.brandewie,
Rafael J. Wysocki, cpufreq@vger.kernel.org,
linux-pm@vger.kernel.org, Linux Kernel Mailing List,
Doug Smythies, Yuyang Du, Stratos Karafotis
On Thursday, May 29, 2014 07:27:34 PM Greg Kroah-Hartman wrote:
> On Wed, May 28, 2014 at 09:59:45AM +0200, Johan Hovold wrote:
> > [ +CC: Greg, Doug, Stratos, Yuyang ]
> >
> > On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote:
> > > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote:
> > > > On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> > > > > Cc'ing Dirk who is taking care of intel-pstate driver.
> > > > >
> > > >
> > > > Thanks Viresh I had seen this thread.
> > > >
> > > > I am looking into it
> > >
> > > Any updates on this, Dirk? 3.14 is still basically unusable with the
> > > intel_pstate driver.
> > >
> > > Any fixes or workarounds posted elsewhere that I can apply in the
> > > meantime?
> >
> > Another week and still no reply, Dirk?
> >
> > I tried applying your (rejected) patch "intel_pstate: Remove C0
> > tracking" posted here:
> >
> > https://lkml.org/lkml/2014/5/8/574
> >
> > to v3.14.4 and it fixes the problem as expected.
> >
> > So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
> > account for core busy calculation") that went into v3.14-rc2 (and was
> > even marked for *stable*) that first broke Greg KH's system:
> >
> > https://lkml.org/lkml/2014/2/19/626
> >
> > That was apparently fixed by e66c17683746 ("intel_pstate: Change
> > busy calculation to use fixed point math."), but still left v3.14
> > basically unusable for lower-intensity workloads such as my
> > bash-completion example and other reported regressions:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=75121
> >
> > Sure there may be issues with v3.13 not hitting the lowest frequencies
> > but at least the system was *usable*.
> >
> > In my opinion there's really no other option than to restore the 3.13
> > behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
> > core C0 time into account for core busy calculation") until you have
> > figured out a way to take C0 into account without breaking things too
> > badly.
>
> Dirk has posted some patches, do they fix the problem for you?
I'm queuing up this series for 3.16, BTW.
Rafael
Thread overview: 13+ messages
2014-05-06 16:35 Performance regression in v3.14 Johan Hovold
2014-05-07 5:40 ` Viresh Kumar
2014-05-07 7:35 ` Johan Hovold
2014-05-07 8:36 ` Romain Francoise
2014-05-07 14:10 ` Dirk Brandewie
2014-05-21 9:00 ` Johan Hovold
2014-05-28 7:59 ` Johan Hovold
2014-05-28 0:35 ` Yuyang Du
2014-05-28 16:00 ` Doug Smythies
2014-05-28 16:53 ` Yuyang Du
2014-05-30 2:27 ` Greg Kroah-Hartman
2014-05-30 8:49 ` Johan Hovold
2014-05-30 12:29 ` Rafael J. Wysocki