From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Doug Smythies"
Subject: RE: SKL BOOT FAILURE unless idle=nomwait (was Re: PROBLEM: Cpufreq
	constantly keeps frequency at maximum on 4.5-rc4)
Date: Sat, 12 Mar 2016 23:46:03 -0800
Message-ID: <001001d17cfc$67721e70$36565b50$@net>
References: <003b01d17bf8$ad214680$0763d380$@net>
	<4779975.cHAts0tdyJ@vostro.rjw.lan>
	<97183685.ubU62sp0PR@vostro.rjw.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from cmta2.telus.net ([209.171.16.75]:38559 "EHLO cmta2.telus.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751479AbcCMHqL (ORCPT );
	Sun, 13 Mar 2016 03:46:11 -0400
In-Reply-To: <97183685.ubU62sp0PR@vostro.rjw.lan>
Content-Language: en-ca
Sender: linux-pm-owner@vger.kernel.org
List-Id: linux-pm@vger.kernel.org
To: "'Rafael J. Wysocki'" , 'Rik van Riel'
Cc: "'Rafael J. Wysocki'" , 'Viresh Kumar' , 'Srinivas Pandruvada' ,
	"'Chen, Yu C'" , linux-pm@vger.kernel.org, 'Arto Jantunen' ,
	'Len Brown'

On 2016.03.11 18:02 Rafael J. Wysocki wrote:

> On Saturday, March 12, 2016 02:45:42 AM Rafael J. Wysocki wrote:
>
> Gosh, I'm too tired. Parens missing and it can be written simpler using <=.
>
> Tentatively-signed-off-by: Rafael J. Wysocki
> ---
>  drivers/cpuidle/governors/menu.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -327,11 +327,13 @@ static int menu_select(struct cpuidle_dr
>  		data->last_state_idx = CPUIDLE_DRIVER_STATE_START - 1;
>  		/*
>  		 * We want to default to C1 (hlt), not to busy polling
> -		 * unless the timer is happening really really soon.
> +		 * unless the timer is happening really really soon. Still, if
> +		 * the exit latency of C1 is too high, we need to poll anyway.
>  		 */
> -		if (interactivity_req > 20 &&
> +		if (data->next_timer_us > 20 &&
> +		    drv->states[CPUIDLE_DRIVER_STATE_START].exit_latency <= latency_req &&
>  		    !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
> -		    dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
> +		    !dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable)
>  			data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
>  	} else {
>  		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;

Note 1: The kernel with the above patch is labelled "rvr3" below (because I already
have a bunch of "rjw" labelled kernels for other stuff).

Note 2: The reference tests were re-done using version 10 of Rafael's 3-patch set,
"cpufreq: Replace timers with utilization update callbacks". Why? Because it was
desirable to eliminate the long durations between intel_pstate calls that were due
to the CPU being idle on jiffy boundaries, but otherwise busy. Why was that
desirable? So that a trace could be acquired where we could be reasonably confident
that most very high CPU loads combined with very long durations were due to long
periods in idle state 0.

Aggregate times in each idle state for the 2000 second test:

State    k45rc7-rjw10 (mins)   k45rc7-rjw10-reverted (mins)   k45rc7-rjw10-rvr3 (mins)
0            18.07                     0.92                         18.38
1            12.35                    19.51                         13.16
2             3.96                     4.28                          2.91
3             1.55                     1.53                          1.00
4           138.96                   141.99                        115.41
total       174.90                   168.24                        150.87

Energy:

Kernel 4.5-rc7-rjw10:          61983 Joules
Kernel 4.5-rc7-rjw10-reverted: 48409 Joules
Kernel 4.5-rc7-rjw10-rvr3:     62938 Joules
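For anyone wanting to reproduce this kind of per-state aggregate, here is a minimal
sketch that sums the cpuidle residency counters from sysfs. It is an illustration
only, not the actual collection script used for the numbers above; it assumes the
standard /sys/devices/system/cpu/cpuN/cpuidle/stateM/time files (cumulative
microseconds), 8 CPUs and 5 idle states, and would be run before and after the test
with the two readings subtracted:

/*
 * Sketch only: sum per-state cpuidle residency across CPUs from sysfs.
 * Assumes 8 CPUs and 5 idle states (adjust for the machine under test).
 */
#include <stdio.h>

#define NR_CPUS    8
#define NR_STATES  5

int main(void)
{
	for (int state = 0; state < NR_STATES; state++) {
		unsigned long long total_us = 0;

		for (int cpu = 0; cpu < NR_CPUS; cpu++) {
			char path[128];
			unsigned long long us;
			FILE *f;

			snprintf(path, sizeof(path),
				 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/time",
				 cpu, state);
			f = fopen(path, "r");
			if (!f)
				continue;
			/* The time file reports cumulative residency in usecs. */
			if (fscanf(f, "%llu", &us) == 1)
				total_us += us;
			fclose(f);
		}
		printf("state %d: %.2f minutes\n", state, total_us / 60e6);
	}
	return 0;
}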
Isn't the issue here just that it can be so very expensive, in terms of energy, when
the decision is made to poll instead of using HLT or a deeper state? It doesn't have
to happen very often, because each time the CPU is effectively abandoned in that
state it can stay there for up to 200,000 times longer than was expected (4 seconds
instead of <20 usecs).

An intel_pstate trace was obtained for the above "k45rc7-rjw10-rvr3" (kernel
4.5-rc7 with Rafael's 3 patch set version 10 and the above suggested patch). In
2000 seconds there were about 3164 long durations at high CPU load (high load in
this context meaning the CPU was actually idle, but was sitting in idle state 0),
accounting for 17.15 of the 18.38 minutes listed above. For example:

CPU 6: mperf: 6672329686; aperf: 6921452881; load: 99.83%; duration: 1.96 seconds.
CPU 5: mperf: 7591407713; aperf: 5651758618; load: 99.87%; duration: 2.23 seconds.

An intel_pstate trace was also obtained for the above "k45rc7-rjw10-reverted"
(kernel 4.5-rc7 with Rafael's 3 patch set version 10 and commits
9c4b2867ed7c8c8784dd417ffd16e705e81eb145 and
a9ceb78bc75ca47972096372ff3d48648b16317a reverted). In 2000 seconds there were
about 237 long durations at high CPU load (again, meaning the CPU was actually idle
but in idle state 0), totaling 3.42 minutes, which is more than the 0.92 minutes of
state 0 time that can be accounted for above. However, if I compensate for the
actual load (which is consistently lower in those 237 samples, meaning the CPU was
not actually in state 0 for all of that time) and take out some of the watchdog
limit hits at the end (the trace ran longer than the actual idle state data
collection), it drops to 0.35 minutes.

... Doug