From: "Doug Smythies" <dsmythies@telus.net>
To: "'Rafael J. Wysocki'" <rjw@rjwysocki.net>
Cc: "'Srinivas Pandruvada'" <srinivas.pandruvada@linux.intel.com>,
"'Peter Zijlstra'" <peterz@infradead.org>,
"'LKML'" <linux-kernel@vger.kernel.org>,
"'Frederic Weisbecker'" <frederic@kernel.org>,
"'Mel Gorman'" <mgorman@suse.de>,
"'Daniel Lezcano'" <daniel.lezcano@linaro.org>,
"'Chen, Hu'" <hu1.chen@intel.com>,
"'Quentin Perret'" <quentin.perret@arm.com>,
"'Linux PM'" <linux-pm@vger.kernel.org>,
"'Giovanni Gherdovich'" <ggherdovich@suse.cz>
Subject: RE: [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems
Date: Thu, 26 Sep 2019 09:31:36 -0700 [thread overview]
Message-ID: <001601d57487$e1029ef0$a307dcd0$@net> (raw)
In-Reply-To: WgXvgFd5aBdpLWgY0gWTga
All,
Recall the extensive tests and work done between 2018.10.11 and 2019.01.16
on Rafael's teo governor, versions 1 through 11, noting that versions
9, 10, and 11 did not contain functional changes.
Hi Rafael,
If the deepest idle state is disabled, the system
can become somewhat unstable, with anywhere between no problem
at all, to the occasional temporary jump using a lot more
power for a few seconds, to a permanent jump using a lot more
power continuously. I have been unable to isolate the exact
test load conditions under which this will occur. However,
temporarily disabling and then enabling other idle states
seems to make for a somewhat repeatable test. It is important
to note that the issue occurs with only ever disabling the deepest
idle state, just not reliably.
I want to know how you want to proceed before I do a bunch of
regression testing.
On 2018.12.11 03:50 Rafael J. Wysocki wrote:
> v7 -> v8:
> * Apply the selection rules to the idle deepest state as well as to
> the shallower ones (the deepest idle state was treated differently
> before by mistake).
> * Subtract 1/2 of the exit latency from the measured idle duration
> in teo_update() (instead of subtracting the entire exit latency).
> This makes the idle state selection be slightly more performance-
> oriented.
I have isolated the issue to a subset of the v7 to v8 changes, however
it was not the exit latency changes.
The partial revert to V7 changes I made were (on top of 5.3):
diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index 7d05efd..d2892bb 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -283,9 +283,28 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
if (idx < 0)
idx = i; /* first enabled state */
- if (s->target_residency > duration_us)
- break;
+ if (s->target_residency > duration_us) {
+ /*
+ * If the "hits" metric of the state matching the sleep
+ * length is greater than its "misses" metric, that is
+ * the one to use.
+ */
+ if (cpu_data->states[idx].hits > cpu_data->states[idx].misses)
+ break;
+ /*
+ * It is more likely that one of the shallower states
+ * will match the idle duration measured after wakeup,
+ * so take the one with the maximum "early hits" metric,
+ * but if that cannot be determined, just use the state
+ * selected so far.
+ */
+ if (max_early_idx >= 0) {
+ idx = max_early_idx;
+ duration_us = drv->states[idx].target_residency;
+ }
+ break;
+ }
if (s->exit_latency > latency_req) {
/*
* If we break out of the loop for latency reasons, use
@@ -294,7 +313,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
* as long as that target residency is low enough.
*/
duration_us = drv->states[idx].target_residency;
- goto refine;
+ break;
}
idx = i;
@@ -307,21 +326,6 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
}
}
- /*
- * If the "hits" metric of the idle state matching the sleep length is
- * greater than its "misses" metric, that is the one to use. Otherwise,
- * it is more likely that one of the shallower states will match the
- * idle duration observed after wakeup, so take the one with the maximum
- * "early hits" metric, but if that cannot be determined, just use the
- * state selected so far.
- */
- if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses &&
- max_early_idx >= 0) {
- idx = max_early_idx;
- duration_us = drv->states[idx].target_residency;
- }
-
-refine:
if (idx < 0) {
idx = 0; /* No states enabled. Must use 0. */
} else if (idx > 0) {
Test results (note: I use my own monitoring utility, because it feeds my
graphing/html utility, but have used turbostat here in case others
want to repeat the test):
Processor: i7-2600K
Deepest idle state: 4 (C6)
The system is mostly idle for these tests.
turbostat command used:
sudo turbostat --Summary --quiet --hide
IRQ,Avg_MHz,SMI,GFXMHz,TSC_MHz,GFXWatt,CorWatt,CPU%c1,CPU%c7,CoreTmp,GFX%rc6,Pkg%pc2,Pkg%pc3,Pkg%pc6,C1,C1E,,C1%,C1E%,C3%,C6%
--interval 3
Only because it works so reliably, the test involves disabling all idle states
deeper than 0, then enabling all idle states shallower than the deepest.
I have attempted to minimize the number of display columns,
hopefully without hiding useful information.
Kernel 5.3:
Busy% Bzy_MHz POLL C3 C6 POLL% CPU%c3 CPU%c6 PkgTmp PkgWatt
0.04 1600 0 0 232 0.00 0.00 99.85 28 3.73
0.05 1600 0 3 254 0.00 0.00 99.82 28 3.73
0.04 1600 0 1 264 0.00 0.00 99.82 30 3.73
61.08 3493 14735 16 258 60.84 0.04 38.57 45 31.00
100.00 3500 24057 0 0 99.88 0.00 0.00 46 48.73
100.00 3500 24064 0 0 99.86 0.00 0.00 48 48.87
100.00 3500 24057 0 0 99.88 0.00 0.00 49 49.07
100.00 3500 24070 0 0 99.87 0.00 0.00 50 49.21
100.00 3500 24068 0 0 99.87 0.00 0.00 50 49.29
100.00 3500 24063 0 0 99.88 0.00 0.00 52 49.42
100.00 3500 24060 0 0 99.87 0.00 0.00 53 49.58
100.00 3500 24076 0 0 99.87 0.00 0.00 53 49.70
100.00 3500 24067 0 0 99.87 0.00 0.00 55 49.77
28.89 3575 15138 261 0 28.71 59.97 0.00 48 27.93
10.52 3666 11884 379 0 10.43 75.24 0.00 47 20.72
10.80 3680 12141 558 0 10.65 75.13 0.00 46 21.11
11.51 3742 13016 217 0 11.43 74.93 0.00 47 22.48
10.84 3696 12254 294 0 10.76 74.89 0.00 48 21.26
10.34 3652 11683 243 0 10.25 74.93 0.00 47 20.36
10.67 3681 12065 240 0 10.59 74.94 0.00 48 20.96
10.68 3671 12067 278 0 10.59 74.92 0.00 47 20.85
10.84 3690 12259 236 0 10.76 74.94 0.00 47 21.24
11.51 3735 13023 265 0 11.43 74.93 0.00 48 22.43
Notice how once idle state 3 is enabled again (as were states 1 and 2) and we
should be at about 5 watts, there is still a ton of time spent in idle state
0 (POLL).
Kernel 5.3 + above patch:
Busy% Bzy_MHz POLL C3 C6 POLL% CPU%c3 CPU%c6 PkgTmp PkgWatt
0.04 1600 0 3 235 0.00 0.00 99.82 26 3.71
0.04 1600 0 1 252 0.00 0.00 99.79 26 3.71
0.05 1600 0 3 273 0.00 0.00 99.79 26 3.72
0.04 1600 0 3 224 0.00 0.00 99.83 26 3.71
0.05 1600 0 1 252 0.00 0.00 99.81 25 3.71
0.04 1600 0 1 231 0.00 0.00 99.84 26 3.71
0.05 1600 0 3 294 0.00 0.00 99.77 28 3.71
0.04 1600 0 2 259 0.00 0.00 99.81 25 3.70
69.61 3490 16773 15 326 69.33 0.14 29.98 42 34.41
100.00 3500 24079 0 0 99.88 0.00 0.00 45 48.27
100.00 3500 24062 0 0 99.89 0.00 0.00 45 48.41
100.00 3500 24056 0 0 99.88 0.00 0.00 46 48.52
100.00 3500 24064 0 0 99.89 0.00 0.00 47 48.63
27.31 3498 6588 340 0 27.16 72.40 0.00 32 17.13
0.03 1600 1 230 0 0.00 99.91 0.00 30 5.08
0.03 1600 2 234 0 0.00 99.90 0.00 31 5.07
0.03 1600 5 238 0 0.00 99.91 0.00 30 5.07
0.03 1600 3 230 0 0.00 99.91 0.00 30 5.05
0.03 1600 3 246 0 0.00 99.91 0.00 29 5.05
0.03 1600 3 235 0 0.00 99.90 0.00 30 5.04
Notice how once idle state 3 is enabled again (as were states 1 and 2) almost
all time is spent there, as expected, and the power is the expected 5 watts.
... Doug
next reply other threads:[~2019-09-26 16:31 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-26 16:31 Doug Smythies [this message]
2019-09-29 16:04 ` [RFC/RFT][PATCH v8] cpuidle: New timer events oriented governor for tickless systems Doug Smythies
2019-10-01 9:31 ` Rafael J. Wysocki
2019-10-06 14:46 ` Doug Smythies
2019-10-06 15:34 ` Rafael J. Wysocki
2019-10-08 6:20 ` Doug Smythies
2019-10-08 9:51 ` Rafael J. Wysocki
2019-10-08 10:49 ` Rafael J. Wysocki
2019-10-08 23:19 ` Rafael J. Wysocki
2019-10-09 13:36 ` Rafael J. Wysocki
2019-10-10 7:05 ` Doug Smythies
2019-10-10 8:42 ` Rafael J. Wysocki
-- strict thread matches above, loose matches on Subject: below --
2018-12-17 1:53 Doug Smythies
2018-12-17 11:59 ` Rafael J. Wysocki
2018-12-11 11:49 Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='001601d57487$e1029ef0$a307dcd0$@net' \
--to=dsmythies@telus.net \
--cc=daniel.lezcano@linaro.org \
--cc=frederic@kernel.org \
--cc=ggherdovich@suse.cz \
--cc=hu1.chen@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=peterz@infradead.org \
--cc=quentin.perret@arm.com \
--cc=rjw@rjwysocki.net \
--cc=srinivas.pandruvada@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).