* Performance regression in v3.14
@ 2014-05-06 16:35 Johan Hovold
  2014-05-07  5:40 ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Johan Hovold @ 2014-05-06 16:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar; +Cc: cpufreq, linux-pm, linux-kernel

After updating my main system from v3.13 to v3.14.2, I found that the
git bash-completion was extremely sluggish. Completing a file name would
take roughly six seconds rather than one on this Haswell machine
(i7-4770). (Other things, such as git rebase, also felt slower, but
the completion issue was much more obvious and easier to measure.)

I managed to reproduce the problem using the following minimal construct

    cat dmesg.repeat | while read x; do true; done

where dmesg.repeat is simply dmesg output concatenated until it has about
as many lines as git ls-files produces in the kernel-source tree root
(45k), and where the actual processing of each line has been removed.

Most of the time I get:

    $ time cat dmesg.repeat | while read x; do true; done

    real    0m6.091s
    user    0m3.674s
    sys     0m2.447s

but sometimes it only takes one second:

    $ time cat dmesg.repeat | while read x; do true; done

    real    0m1.100s
    user    0m0.544s
    sys     0m0.570s

I don't seem to be able to reproduce the problem on 3.13, where the pipe
always takes about one second to finish.

Taking all but one core offline seems to make the problem go away, and so
does using the performance rather than the powersave governor of the
intel_pstate cpufreq driver (on at least one of two online cores).

Moving the mouse cursor makes the loop finish faster, and so does
switching to another terminal to print cpufreq/cpuinfo_cur_freq, which
was around cpuinfo_min_freq several times (when tracing, see below).

I could not reproduce the problem when using perf record, but I can get
function-profile traces using ftrace (in which case the loop takes about
60 seconds instead of six seconds to finish).

Comparing the traces I see a lot of functions taking ten times longer to
finish, but I guess that's expected if this is indeed a cpufreq issue.

Since this is my main machine (and my only multi-core machine at the
moment) I'm not able to bisect this myself. For the same reason I
have not verified that the problem persists in v3.15-rc.

I don't see any cpufreq patches in the v3.14.3 stable queue, nor anything
obviously related and marked for stable in v3.15-rc.

Any ideas about what might be going on?

Thanks,
Johan

^ permalink raw reply [flat|nested] 13+ messages in thread
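The reproduction described in the report can be sketched roughly as follows. The helper name build_repeat_file and the file dmesg.txt are hypothetical and do not appear in the thread; the sketch only assumes that any text file with complete lines can stand in for the dmesg output.

```shell
# Hypothetical helper (not from the thread): repeat SRC into OUT until OUT
# holds at least TARGET lines (the report used roughly 45k).
build_repeat_file() (
    src=$1
    out=$2
    target=${3:-45000}
    # Refuse a source without a complete line; the loop would never end.
    [ "$(wc -l < "$src")" -gt 0 ] || exit 1
    : > "$out"
    while [ "$(wc -l < "$out")" -lt "$target" ]; do
        cat "$src" >> "$out"
    done
)

# Usage, following the report (dmesg may need root on some systems):
#   dmesg > dmesg.txt
#   build_repeat_file dmesg.txt dmesg.repeat 45000
#   time cat dmesg.repeat | while read x; do true; done
```

Running the timed loop several times in a row is probably worthwhile, since the report notes that the slow and the fast result both occur on the same kernel.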
* Re: Performance regression in v3.14 2014-05-06 16:35 Performance regression in v3.14 Johan Hovold @ 2014-05-07 5:40 ` Viresh Kumar 2014-05-07 7:35 ` Johan Hovold 2014-05-07 14:10 ` Dirk Brandewie 0 siblings, 2 replies; 13+ messages in thread From: Viresh Kumar @ 2014-05-07 5:40 UTC (permalink / raw) To: Johan Hovold, Dirk Brandewie Cc: Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List Cc'ing Dirk who is taking care of intel-pstate driver. On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote: > After updating my main system from v3.13 to v3.14.2, I found that the > git bash-completion was extremely sluggish. Completing a file name would > take roughly six rather than one second on this Haswell machine > (i7-4770). (Other things, such as git rebase, also felt slower, but > the completion issue was much more obvious and easy to measure). > > I managed to reproduce the problem using the following minimal construct > > cat dmesg.repeat | while read x; do true; done > > where dmesg.repeat is simply dmesg concatenated together to an > equivalent number of lines as produced by git ls-files in the > kernel-source tree root (45k), and where the actual processing of each > line has been removed. > > Most of the time I get: > > $ time cat dmesg.repeat | while read x; do true; done > > real 0m6.091s > user 0m3.674s > sys 0m2.447s > > but sometimes it only takes one second. > > $ time cat dmesg.repeat | while read x; do true; done > > real 0m1.100s > user 0m0.544s > sys 0m0.570s > > I don't seem to be able to reproduce the problem on 3.13 where the pipe > always takes about one second to finish. > > Taking all but one core offline seems to make the problem go away, and so > does using the performance rather than powersave governor of the > intel_pstate cpufreq driver (on at least one of two online cores). 
>
> Moving the mouse cursor makes to loop finish faster, and so does
> switching to a another terminal to print cpufreq/cpuinfo_cur_freq which
> was around cpuinfo_min_freq several times (when tracing, see below).
>
> I could not reproduce the problem when using perf record, but I can get
> function-profile traces using ftrace (in which case the loop takes about
> 60 seconds instead of six seconds to finish).
>
> Comparing the traces I see a lot of functions taking ten times longer to
> finish, but I guess that's expected if this is indeed a cpufreq issue.
>
> Since this is my main machine (and only multi-core machine at the
> moment) I'm not able to bisect this myself. And for the same reason I
> have not verified that the problem persists in v3.15-rc.
>
> I don't see any cpufreq patches in the v3.14.3 stable queue nor anything
> obviously related and marked for stable in v3.15-rc.
>
> Any ideas about what might be going on?

I took a look at the cpufreq diff between 3.13 and 3.14.2 and
couldn't pinpoint any change which might cause it. I don't have a clue
what's going on, and I don't know how to help you on this.

Normally I test my stuff on an ARM board and I don't remember facing
any such behavior there. There might be something wrong with intel-pstate
as well.

Also, can you try acpi-cpufreq instead and see how that behaves?

--
viresh

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-07 5:40 ` Viresh Kumar @ 2014-05-07 7:35 ` Johan Hovold 2014-05-07 8:36 ` Romain Francoise 2014-05-07 14:10 ` Dirk Brandewie 1 sibling, 1 reply; 13+ messages in thread From: Johan Hovold @ 2014-05-07 7:35 UTC (permalink / raw) To: Viresh Kumar Cc: Johan Hovold, Dirk Brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List On Wed, May 07, 2014 at 11:10:34AM +0530, Viresh Kumar wrote: > Cc'ing Dirk who is taking care of intel-pstate driver. > > On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote: > > After updating my main system from v3.13 to v3.14.2, I found that the > > git bash-completion was extremely sluggish. Completing a file name would > > take roughly six rather than one second on this Haswell machine > > (i7-4770). (Other things, such as git rebase, also felt slower, but > > the completion issue was much more obvious and easy to measure). > > > > I managed to reproduce the problem using the following minimal construct > > > > cat dmesg.repeat | while read x; do true; done > > > > where dmesg.repeat is simply dmesg concatenated together to an > > equivalent number of lines as produced by git ls-files in the > > kernel-source tree root (45k), and where the actual processing of each > > line has been removed. > > > > Most of the time I get: > > > > $ time cat dmesg.repeat | while read x; do true; done > > > > real 0m6.091s > > user 0m3.674s > > sys 0m2.447s > > > > but sometimes it only takes one second. > > > > $ time cat dmesg.repeat | while read x; do true; done > > > > real 0m1.100s > > user 0m0.544s > > sys 0m0.570s > > > > I don't seem to be able to reproduce the problem on 3.13 where the pipe > > always takes about one second to finish. 
> > > > Taking all but one core offline seems to make the problem go away, and so > > does using the performance rather than powersave governor of the > > intel_pstate cpufreq driver (on at least one of two online cores). > > > > Moving the mouse cursor makes to loop finish faster, and so does > > switching to a another terminal to print cpufreq/cpuinfo_cur_freq which > > was around cpuinfo_min_freq several times (when tracing, see below). <snip> > I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and > couldn't pin point on any change which might cause it. Don't have a clue > of what's going on. I don't know how to help you on this. > > Normally I test my stuff on a ARM board and I don't remember facing > any such behavior there. There might be something wrong with intel-pstate > as well.. > > Also, can you try to use acpi-cpufreq instead? And see how that is behaving? Using acpi-cpufreq and the ondemand governor (with all 8 cores online) on 3.14.3 improves the situation somewhat: $ time cat dmesg.repeat | while read x; do true; done real 0m1.989s user 0m1.257s sys 0m0.747s when the system is idle, and $ time cat dmesg.repeat | while read x; do true; done real 0m1.191s user 0m0.753s sys 0m0.449s when run a second time in immediate succession. When running the same tests on 3.13.11, the figures are roughly the same $ time cat dmesg.repeat | while read x; do true; done real 0m2.075s user 0m1.276s sys 0m0.816s $ time cat dmesg.repeat | while read x; do true; done real 0m1.291s user 0m0.800s sys 0m0.504s So I guess that idle-active difference is normal for acpi-cpufreq and that the problem only arises in or with the intel_pstate driver. Thanks, Johan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-07 7:35 ` Johan Hovold @ 2014-05-07 8:36 ` Romain Francoise 0 siblings, 0 replies; 13+ messages in thread From: Romain Francoise @ 2014-05-07 8:36 UTC (permalink / raw) To: Johan Hovold Cc: Viresh Kumar, Dirk Brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List Johan Hovold <jhovold@gmail.com> writes: > So I guess that idle-active difference is normal for acpi-cpufreq and > that the problem only arises in or with the intel_pstate driver. I've also noticed some performance issues with intel_pstate in powersave mode, in my case playing fullscreen video was very choppy. Switching to the performance governor fixed things as well. Looking at turbostat, the cores remain close to their minimal frequency pretty much all the time in powersave. Maybe 91a4cd4f3d8169d ("intel_pstate: Remove periodic P state boost") which went in v3.14 is the culprit? (I would also be curious to know what the driver authors recommend for typical laptop usage, performance or powersave, since it defaults to performance.) ^ permalink raw reply [flat|nested] 13+ messages in thread
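For anyone wanting to check which governor is active and what frequency the cores report while the loop runs, a minimal sketch follows. The function name show_cpufreq is hypothetical; the sysfs files scaling_governor and cpuinfo_cur_freq are the standard cpufreq attributes, and the sketch assumes they live under /sys/devices/system/cpu unless another root is passed in.

```shell
# Hypothetical helper (not from the thread): print the governor and the
# current frequency in kHz for each CPU that has a cpufreq directory.
# ROOT defaults to the usual sysfs location; another directory can be
# passed in for testing.
show_cpufreq() (
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        cpu=${dir%/cpufreq}
        cpu=${cpu##*/}
        gov=$(cat "$dir/scaling_governor" 2>/dev/null)
        # cpuinfo_cur_freq may be readable by root only; print "?" if not.
        cur=$(cat "$dir/cpuinfo_cur_freq" 2>/dev/null)
        echo "$cpu ${gov:-?} ${cur:-?}"
    done
)

# Switching governors needs root, for example:
#   echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

With the powersave governor on an affected kernel, one would expect the reported frequencies to stay near cpuinfo_min_freq, matching the turbostat observation above.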
* Re: Performance regression in v3.14 2014-05-07 5:40 ` Viresh Kumar 2014-05-07 7:35 ` Johan Hovold @ 2014-05-07 14:10 ` Dirk Brandewie 2014-05-21 9:00 ` Johan Hovold 1 sibling, 1 reply; 13+ messages in thread From: Dirk Brandewie @ 2014-05-07 14:10 UTC (permalink / raw) To: Viresh Kumar, Johan Hovold Cc: dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List On 05/06/2014 10:40 PM, Viresh Kumar wrote: > Cc'ing Dirk who is taking care of intel-pstate driver. > Thanks Viresh I had seen this thread. I am looking into it --Dirk > On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote: >> After updating my main system from v3.13 to v3.14.2, I found that the >> git bash-completion was extremely sluggish. Completing a file name would >> take roughly six rather than one second on this Haswell machine >> (i7-4770). (Other things, such as git rebase, also felt slower, but >> the completion issue was much more obvious and easy to measure). >> >> I managed to reproduce the problem using the following minimal construct >> >> cat dmesg.repeat | while read x; do true; done >> >> where dmesg.repeat is simply dmesg concatenated together to an >> equivalent number of lines as produced by git ls-files in the >> kernel-source tree root (45k), and where the actual processing of each >> line has been removed. >> >> Most of the time I get: >> >> $ time cat dmesg.repeat | while read x; do true; done >> >> real 0m6.091s >> user 0m3.674s >> sys 0m2.447s >> >> but sometimes it only takes one second. >> >> $ time cat dmesg.repeat | while read x; do true; done >> >> real 0m1.100s >> user 0m0.544s >> sys 0m0.570s >> >> I don't seem to be able to reproduce the problem on 3.13 where the pipe >> always takes about one second to finish. 
>> >> Taking all but one core offline seems to make the problem go away, and so >> does using the performance rather than powersave governor of the >> intel_pstate cpufreq driver (on at least one of two online cores). >> >> Moving the mouse cursor makes to loop finish faster, and so does >> switching to a another terminal to print cpufreq/cpuinfo_cur_freq which >> was around cpuinfo_min_freq several times (when tracing, see below). >> >> I could not reproduce the problem when using perf record, but I can get >> function-profile traces using ftrace (in which case the loop takes about >> 60 seconds instead of six seconds to finish). >> >> Comparing the traces I see a lot of functions taking ten times longer to >> finish, but I guess that's expected if this is indeed a cpufreq issue. >> >> Since this is my main machine (and only multi-core machine at the >> moment) I'm not able to bisect this myself. And for the same reason I >> have not verified that the problem persists in v3.15-rc. >> >> I don't see any cpufreq patches in the v3.14.3 stable queue nor anything >> obviously related and marked for stable in v3.15-rc. >> >> Any ideas about what might be going on? > > I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and > couldn't pin point on any change which might cause it. Don't have a clue > of what's going on. I don't know how to help you on this. > > Normally I test my stuff on a ARM board and I don't remember facing > any such behavior there. There might be something wrong with intel-pstate > as well.. > > Also, can you try to use acpi-cpufreq instead? And see how that is behaving? > > -- > viresh > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-07 14:10 ` Dirk Brandewie @ 2014-05-21 9:00 ` Johan Hovold 2014-05-28 7:59 ` Johan Hovold 0 siblings, 1 reply; 13+ messages in thread From: Johan Hovold @ 2014-05-21 9:00 UTC (permalink / raw) To: Dirk Brandewie Cc: Viresh Kumar, Johan Hovold, dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote: > On 05/06/2014 10:40 PM, Viresh Kumar wrote: > > Cc'ing Dirk who is taking care of intel-pstate driver. > > > > Thanks Viresh I had seen this thread. > > I am looking into it Any updates on this, Dirk? 3.14 is still basically unusable with the intel_pstate driver. Any fixes or workarounds posted elsewhere that I can apply in the meantime? Thanks, Johan > > On 6 May 2014 22:05, Johan Hovold <jhovold@gmail.com> wrote: > >> After updating my main system from v3.13 to v3.14.2, I found that the > >> git bash-completion was extremely sluggish. Completing a file name would > >> take roughly six rather than one second on this Haswell machine > >> (i7-4770). (Other things, such as git rebase, also felt slower, but > >> the completion issue was much more obvious and easy to measure). > >> > >> I managed to reproduce the problem using the following minimal construct > >> > >> cat dmesg.repeat | while read x; do true; done > >> > >> where dmesg.repeat is simply dmesg concatenated together to an > >> equivalent number of lines as produced by git ls-files in the > >> kernel-source tree root (45k), and where the actual processing of each > >> line has been removed. > >> > >> Most of the time I get: > >> > >> $ time cat dmesg.repeat | while read x; do true; done > >> > >> real 0m6.091s > >> user 0m3.674s > >> sys 0m2.447s > >> > >> but sometimes it only takes one second. 
> >> > >> $ time cat dmesg.repeat | while read x; do true; done > >> > >> real 0m1.100s > >> user 0m0.544s > >> sys 0m0.570s > >> > >> I don't seem to be able to reproduce the problem on 3.13 where the pipe > >> always takes about one second to finish. > >> > >> Taking all but one core offline seems to make the problem go away, and so > >> does using the performance rather than powersave governor of the > >> intel_pstate cpufreq driver (on at least one of two online cores). > >> > >> Moving the mouse cursor makes to loop finish faster, and so does > >> switching to a another terminal to print cpufreq/cpuinfo_cur_freq which > >> was around cpuinfo_min_freq several times (when tracing, see below). > >> > >> I could not reproduce the problem when using perf record, but I can get > >> function-profile traces using ftrace (in which case the loop takes about > >> 60 seconds instead of six seconds to finish). > >> > >> Comparing the traces I see a lot of functions taking ten times longer to > >> finish, but I guess that's expected if this is indeed a cpufreq issue. > >> > >> Since this is my main machine (and only multi-core machine at the > >> moment) I'm not able to bisect this myself. And for the same reason I > >> have not verified that the problem persists in v3.15-rc. > >> > >> I don't see any cpufreq patches in the v3.14.3 stable queue nor anything > >> obviously related and marked for stable in v3.15-rc. > >> > >> Any ideas about what might be going on? > > > > I tried to take a look at the diff for cpufreq between 3.13 and 3.14.2 and > > couldn't pin point on any change which might cause it. Don't have a clue > > of what's going on. I don't know how to help you on this. > > > > Normally I test my stuff on a ARM board and I don't remember facing > > any such behavior there. There might be something wrong with intel-pstate > > as well.. > > > > Also, can you try to use acpi-cpufreq instead? And see how that is behaving? 
> > > > -- > > viresh ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-21 9:00 ` Johan Hovold @ 2014-05-28 7:59 ` Johan Hovold 2014-05-28 0:35 ` Yuyang Du 2014-05-30 2:27 ` Greg Kroah-Hartman 0 siblings, 2 replies; 13+ messages in thread From: Johan Hovold @ 2014-05-28 7:59 UTC (permalink / raw) To: Dirk Brandewie Cc: Viresh Kumar, Johan Hovold, dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Greg Kroah-Hartman, Doug Smythies, Yuyang Du, Stratos Karafotis [ +CC: Greg, Doug, Stratos, Yuyang ] On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote: > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote: > > On 05/06/2014 10:40 PM, Viresh Kumar wrote: > > > Cc'ing Dirk who is taking care of intel-pstate driver. > > > > > > > Thanks Viresh I had seen this thread. > > > > I am looking into it > > Any updates on this, Dirk? 3.14 is still basically unusable with the > intel_pstate driver. > > Any fixes or workarounds posted elsewhere that I can apply in the > meantime? Another week and still no reply, Dirk? I tried applying your (rejected) patch "intel_pstate: Remove C0 tracking" posted here: https://lkml.org/lkml/2014/5/8/574 to v3.14.4 and it fixes the problem as expected. So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into account for core busy calculation") that went into v3.14-rc2 (and was even marked for *stable*) that first broke Greg KH's system: https://lkml.org/lkml/2014/2/19/626 That was apparently fixed by e66c17683746 ("intel_pstate: Change busy calculation to use fixed point math."), but still left v3.14 basically unusable for lower-intensity workloads such as my bash-completion example and other reported regressions: https://bugzilla.kernel.org/show_bug.cgi?id=75121 Sure there may be issues with v3.13 not hitting the lowest frequencies but at least the system was *usable*. 
In my opinion there's really no other option than to restore the 3.13 behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take core C0 time into account for core busy calculation") until you have figured out a way to take C0 into account without breaking things too badly. Johan ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-28 7:59 ` Johan Hovold @ 2014-05-28 0:35 ` Yuyang Du 2014-05-28 16:00 ` Doug Smythies 2014-05-30 2:27 ` Greg Kroah-Hartman 1 sibling, 1 reply; 13+ messages in thread From: Yuyang Du @ 2014-05-28 0:35 UTC (permalink / raw) To: Johan Hovold Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Greg Kroah-Hartman, Doug Smythies, Stratos Karafotis > I tried applying your (rejected) patch "intel_pstate: Remove C0 > tracking" posted here: > > https://lkml.org/lkml/2014/5/8/574 > > to v3.14.4 and it fixes the problem as expected. > > So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into > account for core busy calculation") that went into v3.14-rc2 (and was > even marked for *stable*) that first broke Greg KH's system: > > https://lkml.org/lkml/2014/2/19/626 > > That was apparently fixed by e66c17683746 ("intel_pstate: Change > busy calculation to use fixed point math."), but still left v3.14 > basically unusable for lower-intensity workloads such as my > bash-completion example and other reported regressions: > > https://bugzilla.kernel.org/show_bug.cgi?id=75121 > > Sure there may be issues with v3.13 not hitting the lowest frequencies > but at least the system was *usable*. > > In my opinion there's really no other option than to restore the 3.13 > behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take > core C0 time into account for core busy calculation") until you have > figured out a way to take C0 into account without breaking things too > badly. Hi all, My posts before and now are only relevant to why C0 tracking can't be removed. Maybe I need to elaborate on it a little bit more. In a nutshell, without C0 tracking, the intel_pstate is effectively performance governor in terms of frequency control. Why? 
Without C0 tracking, the machinery of the freq control is, as I put it:

    last_freq_average / last_requested_freq ==> setpoint

which is virtually equivalent to:

    last_freq_average / last_requested_freq * last_C0_pct ==> setpoint * last_C0_pct

That said, the control machinery will increase the frequency at ANY
frequency and at ANY C0_pct (which is the CPU utilization), since the
setpoint is less than 100 percent. A few iterations later we will reach
the max (possible) frequency, and then we are effectively the performance
governor (highest frequency all the time).

So, sure, without C0 tracking, the performance issues should be fixed. But
then simply setting the highest frequency would be better.

It is your decision whether we should remove C0 tracking as a
no-other-option fix right now. I am OK either way.

Thanks,
Yuyang

^ permalink raw reply [flat|nested] 13+ messages in thread
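The argument in the message above can be illustrated with a toy model. This is only a sketch of the reasoning, not the intel_pstate code: it assumes the hardware always delivers the requested P-state (so the sampled ratio is 100 percent), a purely proportional controller, and fractional steps that carry over between samples. The function name and the numbers in the usage line are illustrative assumptions. It deliberately leaves out integer rounding in the driver and any hardware back-off, both of which come up in the replies.

```shell
# Toy model only, not the driver. With the ratio pinned at 100% and a
# setpoint below 100, the error is always positive, so the requested
# P-state ratchets upwards until it reaches the maximum.
simulate_no_c0() (
    setpoint=$1   # target "busy" percentage, below 100
    gain_pct=$2   # proportional gain, in percent
    pstate=$3     # starting P-state
    max=$4        # highest P-state
    iters=$5      # number of sample periods
    busy=100      # last_freq_average / last_requested_freq, in percent
    acc=0         # pending adjustment, in hundredths of a P-state
    i=0
    while [ "$i" -lt "$iters" ]; do
        acc=$(( acc + (busy - setpoint) * gain_pct ))
        while [ "$acc" -ge 100 ] && [ "$pstate" -lt "$max" ]; do
            pstate=$(( pstate + 1 ))
            acc=$(( acc - 100 ))
        done
        i=$(( i + 1 ))
    done
    echo "$pstate"
)

# Illustrative values: setpoint 97, gain 20%, P-states 8..39, 100 samples.
#   simulate_no_c0 97 20 8 39 100    # prints 39
```

Under these assumptions the request climbs regardless of utilization, which is the claim being made; whether the real driver behaves this way depends on the details the model omits.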
* RE: Performance regression in v3.14 2014-05-28 0:35 ` Yuyang Du @ 2014-05-28 16:00 ` Doug Smythies 2014-05-28 16:53 ` Yuyang Du 0 siblings, 1 reply; 13+ messages in thread From: Doug Smythies @ 2014-05-28 16:00 UTC (permalink / raw) To: 'Yuyang Du', 'Johan Hovold' Cc: 'Dirk Brandewie', 'Viresh Kumar', dirk.j.brandewie, 'Rafael J. Wysocki', cpufreq, linux-pm, 'Linux Kernel Mailing List', 'Greg Kroah-Hartman', 'Stratos Karafotis', Doug Smythies On 2014.05.27 01:40 Yuyang Du wrote: >> On 2014.05.27 01:00, Johan Hovold wrote: >> I tried applying your (rejected) patch "intel_pstate: Remove C0 >> tracking" posted here: >> >> https://lkml.org/lkml/2014/5/8/574 >> >> to v3.14.4 and it fixes the problem as expected. >> >> So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into >> account for core busy calculation") that went into v3.14-rc2 (and was >> even marked for *stable*) that first broke Greg KH's system: >> >> https://lkml.org/lkml/2014/2/19/626 >> >> That was apparently fixed by e66c17683746 ("intel_pstate: Change >> busy calculation to use fixed point math."), but still left v3.14 >> basically unusable for lower-intensity workloads such as my >> bash-completion example and other reported regressions: >> >> https://bugzilla.kernel.org/show_bug.cgi?id=75121 >> >> Sure there may be issues with v3.13 not hitting the lowest frequencies >> but at least the system was *usable*. >> >> In my opinion there's really no other option than to restore the 3.13 >> behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take >> core C0 time into account for core busy calculation") until you have >> figured out a way to take C0 into account without breaking things too >> badly. > Hi all, > My posts before and now are only relevant to why C0 tracking can't be > removed. Maybe I need to elaborate on it a little bit more. > In a nutshell, without C0 tracking, the intel_pstate is effectively > performance governor in terms of frequency control. 
That is not true. The CPU frequency versus load response curve is
different, and considerably more aggressive, for performance mode when
compared to powersave mode with C0 tracking removed.
I'll add a relevant graph to the bugzilla report referenced above
(but it will be a few hours before I do).

> Why? Without C0 trakcing, the machinery of the freq control
> is as I formed:
> last_freq_average / last_requested_freq ==> setpoint
> which can be virtually formed into:
> last_freq_average / last_requested_freq * last_C0_pct ==>
> setpoint * last_C0_pct
> which said, the control machinery will increase the frequency
> at ANY frequency at ANY C0_pct (which is the CPU utilization),
> since setpoint is less then 100 percent.

That is not true. Yes, due to the setpoint being less than
100, which is needed or the driver won't work at all, there is
a tendency to drive the target pstate upwards.
However, that is tempered by both the PID proportional gain
and ultimately integer math. More importantly, the CPU
itself tells the driver when it is operating below the target
pstate, and the driver responds.

Additionally, the tendency to drive up the target pstate
too much is exacerbated by some extra rounding up at a
couple of spots. Dirk has a pending fix.

> And a few iterations
> later, we will reach max (possible) frequency,
> then we are effectively performance governor
> (highest frequency all the time).

Please do not confuse highest target pstate with
highest frequency. They are not the same. The processor
itself can back off.

... Doug

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14
  2014-05-28 16:00 ` Doug Smythies
@ 2014-05-28 16:53 ` Yuyang Du
  0 siblings, 0 replies; 13+ messages in thread
From: Yuyang Du @ 2014-05-28 16:53 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Johan Hovold', 'Dirk Brandewie', 'Viresh Kumar',
	dirk.j.brandewie, 'Rafael J. Wysocki', cpufreq, linux-pm,
	'Linux Kernel Mailing List', 'Greg Kroah-Hartman',
	'Stratos Karafotis'

> That is not true. Yes, and due to the setpoint being less than
> 100, which is needed or the driver won't work at all, there is
> a tendency to drive the target pstate upwards.
> However that is tempered by both the PID proportional gain,
> and ultimately integer math. More importantly, the CPU
> itself tells the driver when it is operating below the target
> pstate and driver responds.
>
> Additionally, the tendency to drive up the target pstate
> too much is exasperated by some extra rounding up at a
> couple of spots. Dirk has a pending fix.
>
> > And a few iterations
> > later, we will reach max (possible) frequency,
> > then we are effectively performance governor
> > (highest frequency all the time).
>
> Please do not confuse highest target pstate with
> highest frequency. They are not the same. The processor
> itself can back off.
>

Hi Doug,

All you said is that the hardware will not necessarily give whatever the
software wants (e.g., when the requested freq is too high). Agreed. But
does it matter to this discussion?

Yuyang

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-28 7:59 ` Johan Hovold 2014-05-28 0:35 ` Yuyang Du @ 2014-05-30 2:27 ` Greg Kroah-Hartman 2014-05-30 8:49 ` Johan Hovold 2014-05-30 12:29 ` Rafael J. Wysocki 1 sibling, 2 replies; 13+ messages in thread From: Greg Kroah-Hartman @ 2014-05-30 2:27 UTC (permalink / raw) To: Johan Hovold Cc: Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Doug Smythies, Yuyang Du, Stratos Karafotis On Wed, May 28, 2014 at 09:59:45AM +0200, Johan Hovold wrote: > [ +CC: Greg, Doug, Stratos, Yuyang ] > > On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote: > > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote: > > > On 05/06/2014 10:40 PM, Viresh Kumar wrote: > > > > Cc'ing Dirk who is taking care of intel-pstate driver. > > > > > > > > > > Thanks Viresh I had seen this thread. > > > > > > I am looking into it > > > > Any updates on this, Dirk? 3.14 is still basically unusable with the > > intel_pstate driver. > > > > Any fixes or workarounds posted elsewhere that I can apply in the > > meantime? > > Another week and still no reply, Dirk? > > I tried applying your (rejected) patch "intel_pstate: Remove C0 > tracking" posted here: > > https://lkml.org/lkml/2014/5/8/574 > > to v3.14.4 and it fixes the problem as expected. 
> > So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into > account for core busy calculation") that went into v3.14-rc2 (and was > even marked for *stable*) that first broke Greg KH's system: > > https://lkml.org/lkml/2014/2/19/626 > > That was apparently fixed by e66c17683746 ("intel_pstate: Change > busy calculation to use fixed point math."), but still left v3.14 > basically unusable for lower-intensity workloads such as my > bash-completion example and other reported regressions: > > https://bugzilla.kernel.org/show_bug.cgi?id=75121 > > Sure there may be issues with v3.13 not hitting the lowest frequencies > but at least the system was *usable*. > > In my opinion there's really no other option than to restore the 3.13 > behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take > core C0 time into account for core busy calculation") until you have > figured out a way to take C0 into account without breaking things too > badly. Dirk has posted some patches, do they fix the problem for you? thanks, greg k-h ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Performance regression in v3.14 2014-05-30 2:27 ` Greg Kroah-Hartman @ 2014-05-30 8:49 ` Johan Hovold 2014-05-30 12:29 ` Rafael J. Wysocki 1 sibling, 0 replies; 13+ messages in thread From: Johan Hovold @ 2014-05-30 8:49 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Johan Hovold, Dirk Brandewie, Viresh Kumar, dirk.j.brandewie, Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org, Linux Kernel Mailing List, Doug Smythies, Yuyang Du, Stratos Karafotis On Thu, May 29, 2014 at 07:27:34PM -0700, Greg Kroah-Hartman wrote: > On Wed, May 28, 2014 at 09:59:45AM +0200, Johan Hovold wrote: > > [ +CC: Greg, Doug, Stratos, Yuyang ] > > > > On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote: > > > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote: > > > > On 05/06/2014 10:40 PM, Viresh Kumar wrote: > > > > > Cc'ing Dirk who is taking care of intel-pstate driver. > > > > > > > > > > > > > Thanks Viresh I had seen this thread. > > > > > > > > I am looking into it > > > > > > Any updates on this, Dirk? 3.14 is still basically unusable with the > > > intel_pstate driver. > > > > > > Any fixes or workarounds posted elsewhere that I can apply in the > > > meantime? > > > > Another week and still no reply, Dirk? > > > > I tried applying your (rejected) patch "intel_pstate: Remove C0 > > tracking" posted here: > > > > https://lkml.org/lkml/2014/5/8/574 > > > > to v3.14.4 and it fixes the problem as expected. 
> >
> > So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
> > account for core busy calculation") that went into v3.14-rc2 (and was
> > even marked for *stable*) that first broke Greg KH's system:
> >
> > https://lkml.org/lkml/2014/2/19/626
> >
> > That was apparently fixed by e66c17683746 ("intel_pstate: Change
> > busy calculation to use fixed point math."), but still left v3.14
> > basically unusable for lower-intensity workloads such as my
> > bash-completion example and other reported regressions:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=75121
> >
> > Sure there may be issues with v3.13 not hitting the lowest frequencies
> > but at least the system was *usable*.
> >
> > In my opinion there's really no other option than to restore the 3.13
> > behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
> > core C0 time into account for core busy calculation") until you have
> > figured out a way to take C0 into account without breaking things too
> > badly.
>
> Dirk has posted some patches, do they fix the problem for you?

Thanks for letting me know. As the series posted yesterday includes the
effective revert "intel_pstate: Remove C0 tracking" mentioned above,
they do.

The series does not apply at all to v3.14.4, and I chose to include
d37e2b764499 ("intel_pstate: remove unneeded sample buffers") in order
to ease backporting slightly. I don't see much difference between
applying only the revert patch and applying all of them.

I've included turbostat output from when running my bash-construct on an
idle system (running X, though) below. With unpatched 3.14.4 all cores
appear stuck at minimum frequency. With the revert patch (with or without
the remaining patches) frequencies are around 3.7 GHz for all cores.

As at least one of the new patches has already raised some discussion:

  http://marc.info/?l=linux-pm&m=140141648726863&w=2

perhaps we should consider simply applying and backporting the revert
(and fixed-point rounding fix) to restore v3.13 behaviour and un-break
v3.14 in the meantime?

Thanks,
Johan


v3.14.4 (with "intel_pstate: Remove C0 tracking" and a version of
"intel_pstate: Fix fixed point rounding macro" from two weeks ago):

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W GFX_W
         12.63 3.88 3.39   0  13.32   0.02   0.06  73.96   41   41  0.00  0.00  0.00  0.00  18.69  10.70  0.00
  0   0   0.50 3.88 3.39   0   3.28   0.10   0.24  95.88   31   41  0.00  0.00  0.00  0.00  18.69  10.70  0.00
  0   4   0.57 3.87 3.39   0   3.22
  1   1   0.09 3.61 3.39   0   0.03   0.00   0.00  99.88   31
  1   5   0.01 3.74 3.39   0   0.11
  2   2   0.02 3.61 3.39   0   0.02   0.00   0.00  99.96   31
  2   6   0.01 3.80 3.39   0   0.04
  3   3  99.85 3.89 3.39   0   0.04   0.00   0.00   0.12   41
  3   7   0.01 3.81 3.39   0  99.87
1.194291 sec


v3.14.4 (with patches from yesterday):

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W GFX_W
         12.63 3.88 3.39   0  13.33   0.04   0.04  73.95   44   45  0.00  0.00  0.00  0.00  18.65  10.70  0.00
  0   0   0.52 3.88 3.39   0   3.28   0.17   0.16  95.88   33   45  0.00  0.00  0.00  0.00  18.65  10.70  0.00
  0   4   0.58 3.87 3.39   0   3.22
  1   1   0.10 3.45 3.39   0   0.05   0.00   0.00  99.85   34
  1   5   0.01 3.67 3.39   0   0.14
  2   2   0.02 3.56 3.39   0   0.04   0.00   0.00  99.94   34
  2   6   0.01 3.63 3.39   0   0.05
  3   3  99.82 3.89 3.39   0   0.05   0.00   0.00   0.13   44
  3   7   0.01 3.70 3.39   0  99.86
1.197190 sec


v3.14.4 (current stable):

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP  %pc2  %pc3  %pc6  %pc7  Pkg_W  Cor_W GFX_W
         12.86 0.82 3.39   0  14.16   0.00   0.00  72.97   32   32  0.00  0.00  0.00  0.00   5.55   0.61  0.00
  0   0   2.26 0.82 3.39   0   5.74   0.01   0.02  91.98   32   32  0.00  0.00  0.00  0.00   5.55   0.61  0.00
  0   4   0.58 0.82 3.39   0   7.42
  1   1   0.09 0.82 3.39   0   0.03   0.00   0.00  99.88   31
  1   5   0.01 0.85 3.39   0   0.10
  2   2   0.02 0.83 3.39   0   0.06   0.00   0.00  99.92   31
  2   6   0.03 0.86 3.39   0   0.05
  3   3  99.89 0.82 3.39   0   0.01   0.00   0.00   0.09   31
  3   7   0.02 0.89 3.39   0  99.89
5.675564 sec

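A rough sanity check on the turbostat figures in the message above: if the loop is CPU-bound and the busy core is held at the minimum frequency, the run-time ratio between current stable and the revert should be close to the ratio of the two core frequencies. The sketch below only does that arithmetic with awk, using the elapsed times and the GHz values of the busy core (CPU 3) from the "current stable" and "Remove C0 tracking" runs. The variable names and the `ratio` helper are illustrative assumptions, not part of turbostat or of the patches.

```shell
# Figures copied from the turbostat runs quoted in the message above.
slow_secs=5.675564   # v3.14.4 (current stable), elapsed time
fast_secs=1.194291   # v3.14.4 with "intel_pstate: Remove C0 tracking"
slow_ghz=0.82        # busy core (CPU 3), current stable
fast_ghz=3.89        # busy core (CPU 3), with the revert

# ratio A B: print A/B rounded to two decimals (hypothetical helper).
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "run-time ratio:  $(ratio "$slow_secs" "$fast_secs")"
echo "frequency ratio: $(ratio "$fast_ghz" "$slow_ghz")"
```

Both ratios come out at roughly 4.7, which is at least consistent with the slowdown being explained by the lower clock rather than by additional work.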
* Re: Performance regression in v3.14
  2014-05-30  2:27 ` Greg Kroah-Hartman
  2014-05-30  8:49 ` Johan Hovold
@ 2014-05-30 12:29 ` Rafael J. Wysocki
  1 sibling, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2014-05-30 12:29 UTC
To: Greg Kroah-Hartman
Cc: Johan Hovold, Dirk Brandewie, Viresh Kumar, dirk.j.brandewie,
    Rafael J. Wysocki, cpufreq@vger.kernel.org, linux-pm@vger.kernel.org,
    Linux Kernel Mailing List, Doug Smythies, Yuyang Du, Stratos Karafotis

On Thursday, May 29, 2014 07:27:34 PM Greg Kroah-Hartman wrote:
> On Wed, May 28, 2014 at 09:59:45AM +0200, Johan Hovold wrote:
> > [ +CC: Greg, Doug, Stratos, Yuyang ]
> >
> > On Wed, May 21, 2014 at 11:00:51AM +0200, Johan Hovold wrote:
> > > On Wed, May 07, 2014 at 07:10:49AM -0700, Dirk Brandewie wrote:
> > > > On 05/06/2014 10:40 PM, Viresh Kumar wrote:
> > > > > Cc'ing Dirk who is taking care of intel-pstate driver.
> > > > >
> > > > Thanks Viresh I had seen this thread.
> > > >
> > > > I am looking into it
> > >
> > > Any updates on this, Dirk? 3.14 is still basically unusable with the
> > > intel_pstate driver.
> > >
> > > Any fixes or workarounds posted elsewhere that I can apply in the
> > > meantime?
> >
> > Another week and still no reply, Dirk?
> >
> > I tried applying your (rejected) patch "intel_pstate: Remove C0
> > tracking" posted here:
> >
> >   https://lkml.org/lkml/2014/5/8/574
> >
> > to v3.14.4 and it fixes the problem as expected.
> >
> > So we have a commit fcb6a15c2e7e ("intel_pstate: Take core C0 time into
> > account for core busy calculation") that went into v3.14-rc2 (and was
> > even marked for *stable*) that first broke Greg KH's system:
> >
> >   https://lkml.org/lkml/2014/2/19/626
> >
> > That was apparently fixed by e66c17683746 ("intel_pstate: Change
> > busy calculation to use fixed point math."), but still left v3.14
> > basically unusable for lower-intensity workloads such as my
> > bash-completion example and other reported regressions:
> >
> >   https://bugzilla.kernel.org/show_bug.cgi?id=75121
> >
> > Sure there may be issues with v3.13 not hitting the lowest frequencies
> > but at least the system was *usable*.
> >
> > In my opinion there's really no other option than to restore the 3.13
> > behaviour by effectively reverting fcb6a15c2e7e ("intel_pstate: Take
> > core C0 time into account for core busy calculation") until you have
> > figured out a way to take C0 into account without breaking things too
> > badly.
>
> Dirk has posted some patches, do they fix the problem for you?

I'm queuing up this series for 3.16, BTW.

Rafael

Thread overview: 13+ messages

2014-05-06 16:35 Performance regression in v3.14 Johan Hovold
2014-05-07  5:40 ` Viresh Kumar
2014-05-07  7:35 ` Johan Hovold
2014-05-07  8:36 ` Romain Francoise
2014-05-07 14:10 ` Dirk Brandewie
2014-05-21  9:00 ` Johan Hovold
2014-05-28  7:59 ` Johan Hovold
2014-05-28  0:35 ` Yuyang Du
2014-05-28 16:00 ` Doug Smythies
2014-05-28 16:53 ` Yuyang Du
2014-05-30  2:27 ` Greg Kroah-Hartman
2014-05-30  8:49 ` Johan Hovold
2014-05-30 12:29 ` Rafael J. Wysocki