public inbox for linux-acpi@vger.kernel.org
* [PATCH] intel_idle: disable HW auto-demotion
@ 2011-01-19  2:52 Len Brown
  2011-01-19  5:40 ` [PATCH] intel_idle: disable HW auto-demotion by default (v2) Len Brown
  2011-01-19 14:41 ` [PATCH] intel_idle: disable HW auto-demotion Matthew Garrett
  0 siblings, 2 replies; 7+ messages in thread
From: Len Brown @ 2011-01-19  2:52 UTC
  To: linux-pm, x86, linux-acpi

From: Len Brown <len.brown@intel.com>

HW auto-demotion is a mechanism where the HW overrides
the OS C-state request, instead choosing a shallower state.

This is a useful feature for legacy Linux, which has clock
ticks in idle and may request states deeper than makes sense.

However, modern Linux should get exactly the states it requests.

In particular, when a processor is taken off-line, it is
important that its request for the deepest available C-state
is honored, else it can disrupt the C-states reached by
the remaining on-line threads.
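
A toy model (plain C, not kernel code) may make the off-line case concrete. It assumes, as a simplification, that a package can only go as deep as the shallowest C-state held by any of its cores, and that enabled auto-demotion turns any deep request into C1. The C-state numbering and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative C-state numbering: larger means deeper (saves more power). */
enum { CSTATE_C1 = 1, CSTATE_C3 = 3, CSTATE_C6 = 6 };

/* Simplified HW policy (assumption): with auto-demotion enabled, any
 * request deeper than C1 is demoted to C1. Real hardware decides
 * heuristically; this models only the worst case. */
static int granted_cstate(int requested, bool auto_demote)
{
	if (auto_demote && requested > CSTATE_C1)
		return CSTATE_C1;
	return requested;
}

/* Simplification: the package reaches only the shallowest state
 * that every one of its cores has reached. */
static int package_cstate(const int *core_cstates, int ncores)
{
	int pkg;

	assert(ncores > 0);
	pkg = core_cstates[0];
	for (int i = 1; i < ncores; i++)
		if (core_cstates[i] < pkg)
			pkg = core_cstates[i];
	return pkg;
}
```

With three idle on-line cores granted C6 and an off-line core whose C6 request was demoted, `package_cstate()` returns C1: the one demoted core holds back the whole package.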

Boot with "intel_idle.auto_demote=1" to disable the effect of this patch.
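
One way to check the effect on a running system is to read MSR 0xE2 on each CPU through the x86 msr driver (/dev/cpu/N/msr, where the file offset selects the MSR) and test bits 25 and 26, the positions defined in the msr-index.h hunk below. This is a userspace sketch, not part of the patch; it assumes root access and a loaded msr module, and the helper names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* MSR index and bit positions as defined in the patch's msr-index.h hunk. */
#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
#define NHM_C3_AUTO_DEMOTE		(1ULL << 25)
#define NHM_C1_AUTO_DEMOTE		(1ULL << 26)

/* Read one 64-bit MSR on one CPU via the msr driver.
 * Returns 0 on success, -1 on any failure. */
static int read_msr(int cpu, uint32_t reg, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd;

	assert(val != NULL);
	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, val, sizeof(*val), reg);	/* offset = MSR index */
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* True if C1 or C3 auto-demotion is still enabled in msr_bits. */
static bool auto_demotion_enabled(uint64_t msr_bits)
{
	return (msr_bits & (NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE)) != 0;
}
```

A caller would loop over the online CPUs, call `read_msr(cpu, MSR_NHM_SNB_PKG_CST_CFG_CTL, &v)`, and report `auto_demotion_enabled(v)` for each.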

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Signed-off-by: Len Brown <len.brown@intel.com>
---
 arch/x86/include/asm/msr-index.h |    4 ++++
 drivers/idle/intel_idle.c        |   30 ++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4d0dfa0..b75eeab 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -36,6 +36,10 @@
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
 
+#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
+#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
+#define NHM_C1_AUTO_DEMOTE		(1UL << 26)
+
 #define MSR_MTRRcap			0x000000fe
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 7acb32e..3ee8c38 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -62,6 +62,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <asm/mwait.h>
+#include <asm/msr.h>
 
 #define INTEL_IDLE_VERSION "0.4"
 #define PREFIX "intel_idle: "
@@ -85,6 +86,16 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state);
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
+ * Disable HW auto demotion on tick-less idle kernels
+ */
+static unsigned int has_nhm_snb_hw_auto_demotion;
+#ifdef CONFIG_NO_HZ
+static unsigned int auto_demotion;
+#else
+static unsigned int auto_demotion = 1;
+#endif
+
+/*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
  * If this flag is set, SW flushes the TLB, so even if the
@@ -285,6 +296,20 @@ static struct notifier_block __cpuinitdata setup_broadcast_notifier = {
 	.notifier_call = setup_broadcast_cpuhp_notify,
 };
 
+static long nhm_snb_auto_demotion_off(void *unused)
+{
+	unsigned long long msr_bits;
+
+	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+
+	msr_bits &= ~(NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE);
+
+	wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+
+	return 0;
+}
+
+
 /*
  * intel_idle_probe()
  */
@@ -328,6 +353,7 @@ static int intel_idle_probe(void)
 	case 0x25:	/* Westmere */
 	case 0x2C:	/* Westmere */
 		cpuidle_state_table = nehalem_cstates;
+		has_nhm_snb_hw_auto_demotion = 1;
 		break;
 
 	case 0x1C:	/* 28 - Atom Processor */
@@ -338,6 +364,7 @@ static int intel_idle_probe(void)
 	case 0x2A:	/* SNB */
 	case 0x2D:	/* SNB Xeon */
 		cpuidle_state_table = snb_cstates;
+		has_nhm_snb_hw_auto_demotion = 1;
 		break;
 
 	default:
@@ -439,6 +466,8 @@ static int intel_idle_cpuidle_devices_init(void)
 			intel_idle_cpuidle_devices_uninit();
 			return -EIO;
 		}
+		if (has_nhm_snb_hw_auto_demotion && (auto_demotion == 0))
+			work_on_cpu(i, nhm_snb_auto_demotion_off, 0);
 	}
 
 	return 0;
@@ -490,6 +519,7 @@ module_init(intel_idle_init);
 module_exit(intel_idle_exit);
 
 module_param(max_cstate, int, 0444);
+module_param(auto_demotion, int, 0444);
 
 MODULE_AUTHOR("Len Brown <len.brown@intel.com>");
 MODULE_DESCRIPTION("Cpuidle driver for Intel Hardware v" INTEL_IDLE_VERSION);
-- 
1.7.4.rc2



* [PATCH] intel_idle: disable HW auto-demotion by default (v2)
  2011-01-19  2:52 [PATCH] intel_idle: disable HW auto-demotion Len Brown
@ 2011-01-19  5:40 ` Len Brown
  2011-02-16  6:22   ` [PATCH] intel_idle: disable NHM/WSM HW C-state auto-demotion (v3) Len Brown
  2011-01-19 14:41 ` [PATCH] intel_idle: disable HW auto-demotion Matthew Garrett
  1 sibling, 1 reply; 7+ messages in thread
From: Len Brown @ 2011-01-19  5:40 UTC
  To: linux-pm, x86, linux-acpi, linux-kernel

From: Len Brown <len.brown@intel.com>

HW auto-demotion is a mechanism where the HW overrides
the OS C-state request, instead choosing a shallower state.
It is a useful feature for legacy Linux, which has clock
ticks in idle and may request states deeper than makes sense.

However, modern Linux should get exactly the states it requests.

In particular, when a CPU is taken off-line, it must
not be demoted, else it can prevent the entire package from
reaching deep C-states.

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Boot with "intel_idle.auto_demote=1" to leave the HW
auto-demotion setting untouched.
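
The v2 default and override reduce to a small truth table. This sketch mirrors the v2 condition `has_nhm_snb_hw_auto_demotion && (auto_demote == 0)` with the default chosen at build time by CONFIG_NO_HZ; the macro and helper names are hypothetical. Because the default is fixed at compile time, it cannot react to a boot-time nohz=off.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the v2 default: tickless builds default auto_demote to 0,
 * periodic-tick builds default it to 1. Decided at build time. */
#ifdef CONFIG_NO_HZ
#define AUTO_DEMOTE_DEFAULT 0
#else
#define AUTO_DEMOTE_DEFAULT 1
#endif

/* True when the driver would clear the HW auto-demotion enable bits.
 * auto_demote is the module parameter value: AUTO_DEMOTE_DEFAULT
 * unless overridden with intel_idle.auto_demote=N. */
static bool should_disable_auto_demotion(bool cpu_has_auto_demotion,
					 unsigned int auto_demote)
{
	return cpu_has_auto_demotion && auto_demote == 0;
}
```

Passing `auto_demote=1` makes the condition false, so the MSR is never written and whatever the firmware configured stays in place.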

Signed-off-by: Len Brown <len.brown@intel.com>
---

(v2): Use smp_call_function() rather than work_on_cpu().
Update the modparam name to match the commit log.
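
For context on the v2 change: work_on_cpu() runs a function on one specified CPU (v1 called it once per CPU inside the device loop), while the kernel's smp_call_function() runs a callback on all other online CPUs, not on the calling CPU; on_each_cpu() covers the caller as well. The toy model below uses plain C with hypothetical names rather than kernel APIs, and only tracks which simulated CPUs had their enable bits cleared under each coverage rule.

```c
#include <assert.h>

#define NR_SIM_CPUS 4

/* Simulated per-CPU copy of the auto-demotion enable bits. */
static unsigned long long sim_msr[NR_SIM_CPUS];

static void sim_clear_bits(int cpu, unsigned long long bits)
{
	sim_msr[cpu] &= ~bits;
}

/* Models smp_call_function() coverage: every CPU except the caller. */
static void sim_call_on_others(int self, unsigned long long bits)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		if (cpu != self)
			sim_clear_bits(cpu, bits);
}

/* Models on_each_cpu() coverage: every CPU, including the caller. */
static void sim_call_on_each(unsigned long long bits)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
		sim_clear_bits(cpu, bits);
}
```

In this model the calling CPU keeps its bits set after the "others" variant, which is the coverage difference to weigh when choosing between the two calls.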

 arch/x86/include/asm/msr-index.h |    4 ++++
 drivers/idle/intel_idle.c        |   30 ++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4d0dfa0..b75eeab 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -36,6 +36,10 @@
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
 
+#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
+#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
+#define NHM_C1_AUTO_DEMOTE		(1UL << 26)
+
 #define MSR_MTRRcap			0x000000fe
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 7acb32e..1290e24 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -62,6 +62,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <asm/mwait.h>
+#include <asm/msr.h>
 
 #define INTEL_IDLE_VERSION "0.4"
 #define PREFIX "intel_idle: "
@@ -85,6 +86,16 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state);
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
+ * Disable HW auto demotion on tick-less idle kernels
+ */
+static unsigned int has_nhm_snb_hw_auto_demotion;
+#ifdef CONFIG_NO_HZ
+static unsigned int auto_demote;
+#else
+static unsigned int auto_demote = 1;
+#endif
+
+/*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
  * If this flag is set, SW flushes the TLB, so even if the
@@ -285,6 +296,20 @@ static struct notifier_block __cpuinitdata setup_broadcast_notifier = {
 	.notifier_call = setup_broadcast_cpuhp_notify,
 };
 
+static long auto_demotion_disable(void *unused)
+{
+	unsigned long long msr_bits;
+
+	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+
+	msr_bits &= ~(NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE);
+
+	wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+
+	return 0;
+}
+
+
 /*
  * intel_idle_probe()
  */
@@ -328,6 +353,7 @@ static int intel_idle_probe(void)
 	case 0x25:	/* Westmere */
 	case 0x2C:	/* Westmere */
 		cpuidle_state_table = nehalem_cstates;
+		has_nhm_snb_hw_auto_demotion = 1;
 		break;
 
 	case 0x1C:	/* 28 - Atom Processor */
@@ -338,6 +364,7 @@ static int intel_idle_probe(void)
 	case 0x2A:	/* SNB */
 	case 0x2D:	/* SNB Xeon */
 		cpuidle_state_table = snb_cstates;
+		has_nhm_snb_hw_auto_demotion = 1;
 		break;
 
 	default:
@@ -440,6 +467,8 @@ static int intel_idle_cpuidle_devices_init(void)
 			return -EIO;
 		}
 	}
+	if (has_nhm_snb_hw_auto_demotion && (auto_demote == 0))
+		smp_call_function(auto_demotion_disable, NULL, 1);
 
 	return 0;
 }
@@ -490,6 +519,7 @@ module_init(intel_idle_init);
 module_exit(intel_idle_exit);
 
 module_param(max_cstate, int, 0444);
+module_param(auto_demote, int, 0444);
 
 MODULE_AUTHOR("Len Brown <len.brown@intel.com>");
 MODULE_DESCRIPTION("Cpuidle driver for Intel Hardware v" INTEL_IDLE_VERSION);
-- 
1.7.4.rc2.3.g60a2e




* Re: [PATCH] intel_idle: disable HW auto-demotion
  2011-01-19  2:52 [PATCH] intel_idle: disable HW auto-demotion Len Brown
  2011-01-19  5:40 ` [PATCH] intel_idle: disable HW auto-demotion by default (v2) Len Brown
@ 2011-01-19 14:41 ` Matthew Garrett
  2011-02-16  6:19   ` Len Brown
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Garrett @ 2011-01-19 14:41 UTC
  To: Len Brown; +Cc: linux-pm, x86, linux-acpi

On Tue, Jan 18, 2011 at 09:52:26PM -0500, Len Brown wrote:
> + * Disable HW auto demotion on tick-less idle kernels
> + */
> +static unsigned int has_nhm_snb_hw_auto_demotion;
> +#ifdef CONFIG_NO_HZ
> +static unsigned int auto_demotion;
> +#else
> +static unsigned int auto_demotion = 1;
> +#endif

What if someone boots with nohz=off?

Just so I'm clear on this - this is unrelated to the auto-popup on DMA, 
right?

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: [PATCH] intel_idle: disable HW auto-demotion
  2011-01-19 14:41 ` [PATCH] intel_idle: disable HW auto-demotion Matthew Garrett
@ 2011-02-16  6:19   ` Len Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Len Brown @ 2011-02-16  6:19 UTC
  To: Matthew Garrett; +Cc: linux-pm, x86, linux-acpi

On Wed, 19 Jan 2011, Matthew Garrett wrote:

> On Tue, Jan 18, 2011 at 09:52:26PM -0500, Len Brown wrote:
> > + * Disable HW auto demotion on tick-less idle kernels
> > + */
> > +static unsigned int has_nhm_snb_hw_auto_demotion;
> > +#ifdef CONFIG_NO_HZ
> > +static unsigned int auto_demotion;
> > +#else
> > +static unsigned int auto_demotion = 1;
> > +#endif
> 
> What if someone boots with nohz=off?

The next version of this patch will disable auto-demotion
independent of HZ, on the assumption that the governor
is smart enough today to get what it asks for, HZ or NO_HZ.

It also now limits the change to NHM/WSM,
as SNB has auto-un-demotion to address this scenario.
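
The v3 policy (clear the bits on NHM/WSM regardless of HZ, leave SNB alone) amounts to a per-model lookup. This sketch uses only the model numbers that appear in this thread's hunks, so it is deliberately partial; the table and lookup function are hypothetical, not the driver's actual structure.

```c
#include <assert.h>
#include <stddef.h>

#define NHM_C3_AUTO_DEMOTE	(1UL << 25)
#define NHM_C1_AUTO_DEMOTE	(1UL << 26)

struct model_policy {
	unsigned int model;		/* CPUID family 6 model number */
	unsigned long disable_flags;	/* enable bits to clear; 0 = leave alone */
};

/* Partial: only models shown in this thread's patches. No CONFIG_NO_HZ
 * dependency, matching the v3 behaviour described above. */
static const struct model_policy policies[] = {
	{ 0x25, NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE },	/* Westmere */
	{ 0x2C, NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE },	/* Westmere */
	{ 0x2A, 0 },	/* SNB: auto-un-demotion, left alone in v3 */
	{ 0x2D, 0 },	/* SNB Xeon */
};

static unsigned long auto_demotion_disable_flags_for(unsigned int model)
{
	for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
		if (policies[i].model == model)
			return policies[i].disable_flags;
	return 0;	/* unknown model: do not touch the MSR */
}
```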

> Just so I'm clear on this - this is unrelated to the auto-popup on DMA, 
> right?

Right, C2 popup is a different feature, and that is smart
enough to return to the power-saving state after the
DMA is done.

cheers,
Len Brown, Intel Open Source Technology Center



* [PATCH] intel_idle: disable NHM/WSM HW C-state auto-demotion (v3)
  2011-01-19  5:40 ` [PATCH] intel_idle: disable HW auto-demotion by default (v2) Len Brown
@ 2011-02-16  6:22   ` Len Brown
  2011-02-27 16:17     ` Pierre Tardy
  0 siblings, 1 reply; 7+ messages in thread
From: Len Brown @ 2011-02-16  6:22 UTC
  To: linux-pm, x86, linux-acpi, linux-kernel

From: Len Brown <len.brown@intel.com>

Hardware C-state auto-demotion is a mechanism where the HW overrides
the OS C-state request, instead demoting to a shallower state,
which is less expensive, but saves less power.

Modern Linux should generally get exactly the states it requests.
In particular, when a CPU is taken off-line, it must not be demoted, else
it can prevent the entire package from reaching deep C-states.
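
v3 passes the flag mask to the callback through its `void *` argument (`(void *)auto_demotion_disable_flags`, recovered with `(unsigned long long)enable_bits`). A portable way to round-trip an integer through a pointer is via `uintptr_t`, sketched below with hypothetical names and a simulated MSR. It assumes the mask fits in a pointer-sized integer, which holds for bits 25 and 26, and it uses a `void (*)(void *)` callback, the shape the kernel's smp_call_function() expects.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*call_fn)(void *info);	/* same shape as smp_call_func_t */

/* Simulated MSR: both enable bits set plus some unrelated low bits. */
static uint64_t sim_msr = (1ULL << 25) | (1ULL << 26) | 0x7;

/* Callback: recover the mask from the pointer-sized argument. */
static void clear_enable_bits(void *info)
{
	uint64_t mask = (uint64_t)(uintptr_t)info;

	sim_msr &= ~mask;
}

/* Hypothetical dispatcher standing in for a cross-CPU call. */
static void run_callback(call_fn fn, unsigned long flags)
{
	fn((void *)(uintptr_t)flags);
}
```

Going through `uintptr_t` in both directions avoids the pointer/integer size-mismatch warnings a direct cast can trigger on 32-bit builds, while leaving unrelated MSR bits intact.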

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Signed-off-by: Len Brown <len.brown@intel.com>
---
 arch/x86/include/asm/msr-index.h |    4 ++++
 drivers/idle/intel_idle.c        |   20 ++++++++++++++++++++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4d0dfa0..b75eeab 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -36,6 +36,10 @@
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
 
+#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
+#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
+#define NHM_C1_AUTO_DEMOTE		(1UL << 26)
+
 #define MSR_MTRRcap			0x000000fe
 #define MSR_IA32_BBL_CR_CTL		0x00000119
 
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 7acb32e..88998ac 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -62,6 +62,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <asm/mwait.h>
+#include <asm/msr.h>
 
 #define INTEL_IDLE_VERSION "0.4"
 #define PREFIX "intel_idle: "
@@ -85,6 +86,12 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state);
 static struct cpuidle_state *cpuidle_state_table;
 
 /*
+ * Hardware C-state auto-demotion may not always be optimal.
+ * Indicate which enable bits to clear here.
+ */
+static unsigned int auto_demotion_disable_flags;
+
+/*
  * Set this flag for states where the HW flushes the TLB for us
  * and so we don't need cross-calls to keep it consistent.
  * If this flag is set, SW flushes the TLB, so even if the
@@ -285,6 +292,16 @@ static struct notifier_block __cpuinitdata setup_broadcast_notifier = {
 	.notifier_call = setup_broadcast_cpuhp_notify,
 };
 
+static long auto_demotion_disable(void *enable_bits)
+{
+	unsigned long long msr_bits;
+
+	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+	msr_bits &= ~((unsigned long long)enable_bits);
+	wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
+	return 0;
+}
+
 /*
  * intel_idle_probe()
  */
@@ -328,6 +345,7 @@ static int intel_idle_probe(void)
 	case 0x25:	/* Westmere */
 	case 0x2C:	/* Westmere */
 		cpuidle_state_table = nehalem_cstates;
+		auto_demotion_disable_flags = NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE;
 		break;
 
 	case 0x1C:	/* 28 - Atom Processor */
@@ -440,6 +458,8 @@ static int intel_idle_cpuidle_devices_init(void)
 			return -EIO;
 		}
 	}
+	if (auto_demotion_disable_flags)
+		smp_call_function(auto_demotion_disable, (void *)auto_demotion_disable_flags, 1);
 
 	return 0;
 }
-- 
1.7.4.1.26.g00e6e



* Re: [PATCH] intel_idle: disable NHM/WSM HW C-state auto-demotion (v3)
  2011-02-16  6:22   ` [PATCH] intel_idle: disable NHM/WSM HW C-state auto-demotion (v3) Len Brown
@ 2011-02-27 16:17     ` Pierre Tardy
  2011-02-28 16:08       ` Len Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Pierre Tardy @ 2011-02-27 16:17 UTC
  To: Len Brown; +Cc: linux-pm, x86, linux-acpi, linux-kernel

On Wed, Feb 16, 2011 at 7:22 AM, Len Brown <lenb@kernel.org> wrote:
> From: Len Brown <len.brown@intel.com>
>
> Hardware C-state auto-demotion is a mechanism where the HW overrides
> the OS C-state request, instead demoting to a shallower state,
> which is less expensive, but saves less power.
I'm interested, for pytimechart, to have trace information of what
actual c-state got reached after each idle request.
Do you have any info on how to get that?

Regards
Pierre


* Re: [PATCH] intel_idle: disable NHM/WSM HW C-state auto-demotion (v3)
  2011-02-27 16:17     ` Pierre Tardy
@ 2011-02-28 16:08       ` Len Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Len Brown @ 2011-02-28 16:08 UTC
  To: Pierre Tardy; +Cc: linux-pm, x86, linux-acpi, linux-kernel

> > Hardware C-state auto-demotion is a mechanism where the HW overrides
> > the OS C-state request, instead demoting to a shallower state,
> > which is less expensive, but saves less power.

> I'm interested, for pytimechart, to have trace information of what
> actual c-state got reached after each idle request.
> Do you have any info on how to get that?

The actual C-state residency can be seen in the hardware residency
counters that turbostat reports.
However, at the time of the request and return, the OS doesn't know
that its request got demoted (or un-demoted).

No, checking the counters is not something we want to
add to the idle entry/exit path -- they are not optimized for speed.
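
A rough sketch of the after-the-fact approach described above: sample a C-state residency counter and the TSC at two points and report the share of elapsed time spent in that state. It assumes the residency counter ticks at the TSC rate, as turbostat's percentage calculation does; which counters to sample is left open, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of the sampling interval spent in one C-state, computed
 * from two snapshots of its residency counter and of the TSC. */
static double residency_pct(uint64_t res_start, uint64_t res_end,
			    uint64_t tsc_start, uint64_t tsc_end)
{
	uint64_t d_res = res_end - res_start;
	uint64_t d_tsc = tsc_end - tsc_start;

	assert(d_tsc != 0);	/* need a non-empty interval */
	return 100.0 * (double)d_res / (double)d_tsc;
}
```

This yields only an aggregate over an interval, so it cannot say which individual idle request was demoted, consistent with the point that the OS does not learn this at request or return time.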

thanks,
Len Brown, Intel Open Source Technology Center


