From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754581Ab2GRONZ (ORCPT ); Wed, 18 Jul 2012 10:13:25 -0400
Received: from toro.web-alm.net ([62.245.132.31]:54232 "EHLO toro.web-alm.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751690Ab2GRONW (ORCPT ); Wed, 18 Jul 2012 10:13:22 -0400
Message-ID: <5006C3A9.7030300@osadl.org>
Date: Wed, 18 Jul 2012 16:09:45 +0200
From: Carsten Emde
Organization: Open Source Automation Development Lab (OSADL)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.24) Gecko/20111108 Fedora/3.1.16-1.fc14 Thunderbird/3.1.16
MIME-Version: 1.0
To: Deepthi Dharwar
CC: Len Brown, Kevin Hilman, Thomas Gleixner, LKML, Linux PM mailing list
Subject: [PATCH 1/1 v2] Honor state disabling in the cpuidle ladder governor - documented
References: <20120717185914.063547728@osadl.org> <20120717190330.700421963@osadl.org> <50065953.9040904@linux.vnet.ibm.com> <500697A9.6070101@osadl.org> <5006A2A6.8030902@linux.vnet.ibm.com>
In-Reply-To: <5006A2A6.8030902@linux.vnet.ibm.com>
Content-Type: multipart/mixed; boundary="------------030006070203080105020006"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This is a multi-part message in MIME format.
--------------030006070203080105020006
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 07/18/2012 01:48 PM, Deepthi Dharwar wrote:
> On 07/18/2012 04:32 PM, Carsten Emde wrote:
>> On 07/18/2012 08:36 AM, Deepthi Dharwar wrote:
>>> On 07/18/2012 12:29 AM, Carsten Emde wrote:
>>>> There are two cpuidle governors ladder and menu. While the ladder
>>>> governor is always available, if CONFIG_CPU_IDLE is selected, the
>>>> menu governor additionally requires CONFIG_NO_HZ.
>>>>
>>>> A particular C state can be disabled by writing to the sysfs file
>>>> /sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
>>>> is only implemented in the menu governor. Thus, in a system where
>>>> CONFIG_NO_HZ is not selected, the ladder governor becomes default and
>>>> always will walk through all sleep states - irrespective of whether the
>>>> C state was disabled via sysfs or not. The only way to select a specific
>>>> C state was to write the related latency to /dev/cpu_dma_latency and
>>>> keep the file open as long as this setting was required - not very
>>>> practical and not suitable for setting a single core in an SMP system.
>>>>
>>>> With this patch, the ladder governor only will promote to the next
>>>> C state, if it has not been disabled, and it will demote, if the
>>>> current C state was disabled.
>>>
>>> Yes, I agree that currently that disabling a particular C-state
>>> is not reflected in working of ladder governor. This patch is needed
>>> to fix it on ladder too.
>>>
>>> Also wanted to clarify on the intended implementation here,
>>> if there are say 5 C-states on a system, disabling 2nd
>>> state would also end by disabling all the remaining 3 deeper states too
>>> as ladder governor enters the lightest state first, and will only move
>>> on to the next deeper state if a idle period was long enough as
>>> per the implementation.
>>> If one is disabling only the deepest state, then it would
>>> work as intended.
>> Yes, the patch does not make the setting of the sysfs variable
>> "disable" coherent, i.e. if one is disabling a light state, then all
>> deeper states are disabled as well, but the "disable" variable does not
>> reflect it. Likewise, if one enables a deep state but a lighter state
>> still is disabled, then this has no effect.
>
> Agree, as per the ladder design.
>
>> I could implement a sanitize mechanism of the ladder governor that
>> takes care the "disable" variables of all deeper states are set to 1,
>> if a state is disabled, and those of all lighter states are set to 0,
>> if a state is enabled. Do you wish me to do that?
>
> No, I dont think thats necessary, current code suffices it.
> The disable flag is knob we are giving to the user . So may be just
> document the intended use of disable flag working
> alongside design of ladder governor.
Here comes v2 with a related section added to the documentation.

	-Carsten.

--------------030006070203080105020006
Content-Type: text/x-patch;
 name="drivers-cpuidle-ladder-honor-disabling-with-doc.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="drivers-cpuidle-ladder-honor-disabling-with-doc.patch"

Subject: Honor state disabling in the cpuidle ladder governor
From: Carsten Emde
Date: Tue, 17 Jul 2012 19:50:13 +0100

There are two cpuidle governors ladder and menu. While the ladder
governor is always available, if CONFIG_CPU_IDLE is selected, the
menu governor additionally requires CONFIG_NO_HZ.

A particular C state can be disabled by writing to the sysfs file
/sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
is only implemented in the menu governor. Thus, in a system where
CONFIG_NO_HZ is not selected, the ladder governor becomes default and
always will walk through all sleep states - irrespective of whether the
C state was disabled via sysfs or not. The only way to select a specific
C state was to write the related latency to /dev/cpu_dma_latency and
keep the file open as long as this setting was required - not very
practical and not suitable for setting a single core in an SMP system.

With this patch, the ladder governor only will promote to the next
C state, if it has not been disabled, and it will demote, if the
current C state was disabled.

Note that the patch does not make the setting of the sysfs variable
"disable" coherent, i.e. if one is disabling a light state, then all
deeper states are disabled as well, but the "disable" variable does not
reflect it. Likewise, if one enables a deep state but a lighter state
still is disabled, then this has no effect. A related section has been
added to the documentation.

Signed-off-by: Carsten Emde

---
 Documentation/cpuidle/sysfs.txt    |   10 +++++++++-
 drivers/cpuidle/governors/ladder.c |    4 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

Index: linux-3.4.4-rt14-rc2-64/Documentation/cpuidle/sysfs.txt
===================================================================
--- linux-3.4.4-rt14-rc2-64.orig/Documentation/cpuidle/sysfs.txt
+++ linux-3.4.4-rt14-rc2-64/Documentation/cpuidle/sysfs.txt
@@ -76,9 +76,17 @@ total 0
 
 
 * desc : Small description about the idle state (string)
-* disable : Option to disable this idle state (bool)
+* disable : Option to disable this idle state (bool) -> see note below
 * latency : Latency to exit out of this idle state (in microseconds)
 * name : Name of the idle state (string)
 * power : Power consumed while in this idle state (in milliwatts)
 * time : Total time spent in this idle state (in microseconds)
 * usage : Number of times this state was entered (count)
+
+Note:
+The behavior and the effect of the disable variable depends on the
+implementation of a particular governor. In the ladder governor, for
+example, it is not coherent, i.e. if one is disabling a light state,
+then all deeper states are disabled as well, but the disable variable
+does not reflect it. Likewise, if one enables a deep state but a lighter
+state still is disabled, then this has no effect.
Index: linux-3.4.4-rt14-rc2-64/drivers/cpuidle/governors/ladder.c
===================================================================
--- linux-3.4.4-rt14-rc2-64.orig/drivers/cpuidle/governors/ladder.c
+++ linux-3.4.4-rt14-rc2-64/drivers/cpuidle/governors/ladder.c
@@ -88,6 +88,7 @@ static int ladder_select_state(struct cp
 
 	/* consider promotion */
 	if (last_idx < drv->state_count - 1 &&
+	    !drv->states[last_idx + 1].disable &&
 	    last_residency > last_state->threshold.promotion_time &&
 	    drv->states[last_idx + 1].exit_latency <= latency_req) {
 		last_state->stats.promotion_count++;
@@ -100,7 +101,8 @@ static int ladder_select_state(struct cp
 
 	/* consider demotion */
 	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
-	    drv->states[last_idx].exit_latency > latency_req) {
+	    (drv->states[last_idx].disable ||
+	    drv->states[last_idx].exit_latency > latency_req)) {
 		int i;
 
 		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {

--------------030006070203080105020006--
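
The non-coherent behaviour described in the documentation note can be
tried out in user space. The following is a minimal sketch, not the
kernel code: struct idle_state, ladder_step() and deepest_reached() are
hypothetical names made up for this illustration, STATE_START stands in
for CPUIDLE_DRIVER_STATE_START, and demotion is simplified to a single
step down instead of the loop in ladder.c. Only the two conditions
touched by the patch are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: stand-in for CPUIDLE_DRIVER_STATE_START. */
#define STATE_START 0

/* Hypothetical, reduced model of one idle state. */
struct idle_state {
	unsigned int exit_latency;	/* microseconds */
	unsigned int promotion_time;	/* residency needed to go deeper */
	bool disable;			/* mirrors the sysfs "disable" flag */
};

/*
 * One simplified ladder decision. Returns the index of the state to
 * enter next, given the state used last and how long the CPU stayed
 * in it.
 */
int ladder_step(const struct idle_state *s, int count, int last_idx,
		unsigned int last_residency, unsigned int latency_req)
{
	assert(count > 0 && last_idx >= 0 && last_idx < count);

	/* consider promotion: the next deeper state must not be disabled */
	if (last_idx < count - 1 &&
	    !s[last_idx + 1].disable &&
	    last_residency > s[last_idx].promotion_time &&
	    s[last_idx + 1].exit_latency <= latency_req)
		return last_idx + 1;

	/* consider demotion: leave a state that is disabled or too slow */
	if (last_idx > STATE_START &&
	    (s[last_idx].disable || s[last_idx].exit_latency > latency_req))
		return last_idx - 1;

	return last_idx;
}

/*
 * Start at the lightest state and feed the model a series of very long
 * idle periods. Returns the state the ladder settles in.
 */
int deepest_reached(const struct idle_state *s, int count,
		    unsigned int latency_req)
{
	int idx = STATE_START;

	for (int i = 0; i < 4 * count; i++)
		idx = ladder_step(s, count, idx, 1000000, latency_req);
	return idx;
}
```

With five states and only long idle periods, deepest_reached() in this
model ends at the deepest state when nothing is disabled, never leaves
the lightest state when the second state is disabled, and stops one
short of the deepest state when only that one is disabled. The
"disable" flags of the unreachable deeper states stay at 0 throughout,
which is the incoherence the note describes.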