* [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect
[not found] <20090423072020.428683652@sous-sol.org>
@ 2009-04-23 7:20 ` Chris Wright
2009-04-23 9:24 ` Shi, Alex
2009-04-23 9:40 ` Shi, Alex
2009-04-23 7:20 ` [patch 013/100] ACPI: cap off P-state transition latency from buggy BIOSes Chris Wright
2009-04-23 7:20 ` [patch 014/100] dock: fix dereference after kfree() Chris Wright
2 siblings, 2 replies; 6+ messages in thread
From: Chris Wright @ 2009-04-23 7:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
alan, Len Brown, linux-acpi, alex.shi, Venkatesh Pallipadi,
Yakui.zhao, Len Brown
[-- Attachment #1: acpi-fix-of-pmtimer-overflow-that-make-cx-states-time-incorrect.patch --]
[-- Type: text/plain, Size: 7495 bytes --]
-stable review patch. If anyone has any objections, please let us know.
---------------------
From: alex.shi <alex.shi@intel.com>
upstream commit: ff69f2bba67bd45514923aaedbf40fe351787c59
We found Cx states time abnormal in our some of machines which have 16
LCPUs, the C0 take too many time while system is really idle when kernel
enabled tickless and highres. powertop output is below:
PowerTOP version 1.9 (C) 2007 Intel Corporation
Cn Avg residency P-states (frequencies)
C0 (cpu running) (40.5%) 2.53 Ghz 0.0%
C1 0.0ms ( 0.0%) 2.53 Ghz 0.0%
C2 128.8ms (59.5%) 2.40 Ghz 0.0%
1.60 Ghz 100.0%
Wakeups-from-idle per second : 4.7 interval: 20.0s
no ACPI power usage estimate available
Top causes for wakeups:
41.4% ( 24.9) <interrupt> : extra timer interrupt
20.2% ( 12.2) <kernel core> : usb_hcd_poll_rh_status
(rh_timer_func)
After tacking detailed for this issue, Yakui and I find it is due to 24
bit PM timer overflows when some of cpu sleep more than 4 seconds. With
tickless kernel, the CPU want to sleep as much as possible when system
idle. But the Cx sleep time are recorded by pmtimer which length is
determined by BIOS. The current Cx time was gotten in the following
function from driver/acpi/processor_idle.c:
static inline u32 ticks_elapsed(u32 t1, u32 t2)
{
if (t2 >= t1)
return (t2 - t1);
else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
else
return ((0xFFFFFFFF - t1) + t2);
}
If pmtimer is 24 bits and it take 5 seconds from t1 to t2, in above
function, just about 1 seconds ticks was recorded. So the Cx time will be
reduced about 4 seconds. and this is why we see above powertop output.
To resolve this problem, Yakui and I use ktime_get() to record the Cx
states time instead of PM timer as the following patch. the patch was
tested with i386/x86_64 modes on several platforms.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Tested-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Yakui.zhao <yakui.zhao@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
drivers/acpi/processor_idle.c | 63 ++++++++++++++++++------------------------
1 file changed, 27 insertions(+), 36 deletions(-)
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -64,7 +64,6 @@
#define _COMPONENT ACPI_PROCESSOR_COMPONENT
ACPI_MODULE_NAME("processor_idle");
#define ACPI_PROCESSOR_FILE_POWER "power"
-#define US_TO_PM_TIMER_TICKS(t) ((t * (PM_TIMER_FREQUENCY/1000)) / 1000)
#define PM_TIMER_TICK_NS (1000000000ULL/PM_TIMER_FREQUENCY)
#define C2_OVERHEAD 1 /* 1us */
#define C3_OVERHEAD 1 /* 1us */
@@ -78,6 +77,10 @@ module_param(nocst, uint, 0000);
static unsigned int latency_factor __read_mostly = 2;
module_param(latency_factor, uint, 0644);
+static s64 us_to_pm_timer_ticks(s64 t)
+{
+ return div64_u64(t * PM_TIMER_FREQUENCY, 1000000);
+}
/*
* IBM ThinkPad R40e crashes mysteriously when going into C2 or C3.
* For now disable this. Probably a bug somewhere else.
@@ -159,25 +162,6 @@ static struct dmi_system_id __cpuinitdat
{},
};
-static inline u32 ticks_elapsed(u32 t1, u32 t2)
-{
- if (t2 >= t1)
- return (t2 - t1);
- else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
- return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
- else
- return ((0xFFFFFFFF - t1) + t2);
-}
-
-static inline u32 ticks_elapsed_in_us(u32 t1, u32 t2)
-{
- if (t2 >= t1)
- return PM_TIMER_TICKS_TO_US(t2 - t1);
- else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
- return PM_TIMER_TICKS_TO_US(((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
- else
- return PM_TIMER_TICKS_TO_US((0xFFFFFFFF - t1) + t2);
-}
/*
* Callers should disable interrupts before the call and enable
@@ -853,7 +837,8 @@ static inline void acpi_idle_do_entry(st
static int acpi_idle_enter_c1(struct cpuidle_device *dev,
struct cpuidle_state *state)
{
- u32 t1, t2;
+ ktime_t kt1, kt2;
+ s64 idle_time;
struct acpi_processor *pr;
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
@@ -871,14 +856,15 @@ static int acpi_idle_enter_c1(struct cpu
return 0;
}
- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ kt1 = ktime_get_real();
acpi_idle_do_entry(cx);
- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ kt2 = ktime_get_real();
+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
local_irq_enable();
cx->usage++;
- return ticks_elapsed_in_us(t1, t2);
+ return idle_time;
}
/**
@@ -891,8 +877,9 @@ static int acpi_idle_enter_simple(struct
{
struct acpi_processor *pr;
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
- u32 t1, t2;
- int sleep_ticks = 0;
+ ktime_t kt1, kt2;
+ s64 idle_time;
+ s64 sleep_ticks = 0;
pr = __get_cpu_var(processors);
@@ -925,18 +912,19 @@ static int acpi_idle_enter_simple(struct
if (cx->type == ACPI_STATE_C3)
ACPI_FLUSH_CPU_CACHE();
- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ kt1 = ktime_get_real();
/* Tell the scheduler that we are going deep-idle: */
sched_clock_idle_sleep_event();
acpi_idle_do_entry(cx);
- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ kt2 = ktime_get_real();
+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86)
/* TSC could halt in idle, so notify users */
if (tsc_halts_in_c(cx->type))
mark_tsc_unstable("TSC halts in idle");;
#endif
- sleep_ticks = ticks_elapsed(t1, t2);
+ sleep_ticks = us_to_pm_timer_ticks(idle_time);
/* Tell the scheduler how much we idled: */
sched_clock_idle_wakeup_event(sleep_ticks*PM_TIMER_TICK_NS);
@@ -948,7 +936,7 @@ static int acpi_idle_enter_simple(struct
acpi_state_timer_broadcast(pr, cx, 0);
cx->time += sleep_ticks;
- return ticks_elapsed_in_us(t1, t2);
+ return idle_time;
}
static int c3_cpu_count;
@@ -966,8 +954,10 @@ static int acpi_idle_enter_bm(struct cpu
{
struct acpi_processor *pr;
struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
- u32 t1, t2;
- int sleep_ticks = 0;
+ ktime_t kt1, kt2;
+ s64 idle_time;
+ s64 sleep_ticks = 0;
+
pr = __get_cpu_var(processors);
@@ -1034,9 +1024,10 @@ static int acpi_idle_enter_bm(struct cpu
ACPI_FLUSH_CPU_CACHE();
}
- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ kt1 = ktime_get_real();
acpi_idle_do_entry(cx);
- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
+ kt2 = ktime_get_real();
+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
/* Re-enable bus master arbitration */
if (pr->flags.bm_check && pr->flags.bm_control) {
@@ -1051,7 +1042,7 @@ static int acpi_idle_enter_bm(struct cpu
if (tsc_halts_in_c(ACPI_STATE_C3))
mark_tsc_unstable("TSC halts in idle");
#endif
- sleep_ticks = ticks_elapsed(t1, t2);
+ sleep_ticks = us_to_pm_timer_ticks(idle_time);
/* Tell the scheduler how much we idled: */
sched_clock_idle_wakeup_event(sleep_ticks*PM_TIMER_TICK_NS);
@@ -1062,7 +1053,7 @@ static int acpi_idle_enter_bm(struct cpu
acpi_state_timer_broadcast(pr, cx, 0);
cx->time += sleep_ticks;
- return ticks_elapsed_in_us(t1, t2);
+ return idle_time;
}
struct cpuidle_driver acpi_idle_driver = {
^ permalink raw reply [flat|nested] 6+ messages in thread
* [patch 013/100] ACPI: cap off P-state transition latency from buggy BIOSes
[not found] <20090423072020.428683652@sous-sol.org>
2009-04-23 7:20 ` [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect Chris Wright
@ 2009-04-23 7:20 ` Chris Wright
2009-04-25 15:59 ` Martin Steigerwald
2009-04-23 7:20 ` [patch 014/100] dock: fix dereference after kfree() Chris Wright
2 siblings, 1 reply; 6+ messages in thread
From: Chris Wright @ 2009-04-23 7:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
alan, Len Brown, linux-acpi, Venkatesh Pallipadi, Matthew Garrett,
Len Brown
[-- Attachment #1: acpi-cap-off-p-state-transition-latency-from-buggy-bioses.patch --]
[-- Type: text/plain, Size: 1973 bytes --]
-stable review patch. If anyone has any objections, please let us know.
---------------------
From: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
upstream commit: a59d1637eb0e0a37ee0e5c92800c60abe3624e24
Some BIOSes report very high frequency transition latency which are plainly
wrong on CPus that can change frequency using native MSR interface.
One such system is IBM T42 (2327-8ZU) as reported by Owen Taylor and
Rik van Riel.
cpufreq_ondemand driver uses this transition latency to come up with a
reasonable sampling interval to sample CPU usage and with such high
latency value, ondemand sampling interval ends up being very high
(0.5 sec, in this particular case), resulting in performance impact due to
slow response to increasing frequency.
Fix it by capping-off the transition latency to 20uS for native MSR based
frequency transitions.
mjg: We've confirmed that this also helps on the X31
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -680,6 +680,18 @@ static int acpi_cpufreq_cpu_init(struct
perf->states[i].transition_latency * 1000;
}
+ /* Check for high latency (>20uS) from buggy BIOSes, like on T42 */
+ if (perf->control_register.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE &&
+ policy->cpuinfo.transition_latency > 20 * 1000) {
+ static int print_once;
+ policy->cpuinfo.transition_latency = 20 * 1000;
+ if (!print_once) {
+ print_once = 1;
+ printk(KERN_INFO "Capping off P-state tranision latency"
+ " at 20 uS\n");
+ }
+ }
+
data->max_freq = perf->states[0].core_frequency * 1000;
/* table init */
for (i=0; i<perf->state_count; i++) {
^ permalink raw reply [flat|nested] 6+ messages in thread
* [patch 014/100] dock: fix dereference after kfree()
[not found] <20090423072020.428683652@sous-sol.org>
2009-04-23 7:20 ` [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect Chris Wright
2009-04-23 7:20 ` [patch 013/100] ACPI: cap off P-state transition latency from buggy BIOSes Chris Wright
@ 2009-04-23 7:20 ` Chris Wright
2 siblings, 0 replies; 6+ messages in thread
From: Chris Wright @ 2009-04-23 7:20 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
alan, Len Brown, linux-acpi, Dan Carpenter, Len Brown
[-- Attachment #1: dock-fix-dereference-after-kfree.patch --]
[-- Type: text/plain, Size: 1060 bytes --]
-stable review patch. If anyone has any objections, please let us know.
---------------------
From: Dan Carpenter <error27@gmail.com>
upstream commit: f240729832dff3785104d950dad2d3ced4387f6d
dock_remove() calls kfree() on dock_station so we should use
list_for_each_entry_safe() to avoid dereferencing freed memory.
Found by smatch (http://repo.or.cz/w/smatch.git/). Compile tested.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
drivers/acpi/dock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/acpi/dock.c
+++ b/drivers/acpi/dock.c
@@ -1146,9 +1146,10 @@ static int __init dock_init(void)
static void __exit dock_exit(void)
{
struct dock_station *dock_station;
+ struct dock_station *tmp;
unregister_acpi_bus_notifier(&dock_acpi_notifier);
- list_for_each_entry(dock_station, &dock_stations, sibiling)
+ list_for_each_entry_safe(dock_station, tmp, &dock_stations, sibiling)
dock_remove(dock_station);
}
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect
2009-04-23 7:20 ` [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect Chris Wright
@ 2009-04-23 9:24 ` Shi, Alex
2009-04-23 9:40 ` Shi, Alex
1 sibling, 0 replies; 6+ messages in thread
From: Shi, Alex @ 2009-04-23 9:24 UTC (permalink / raw)
To: Chris Wright, linux-kernel@vger.kernel.org, stable@kernel.org
Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
Rodrigo Rubira Branco, Jake Edge, Eugene Teo,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Len Brown, linux-acpi@vger.kernel.org,
Pallipadi, Venkatesh, Zhao, Yakui
It was reported make some latop booting hang. and is not root caused now. :(
http://bugzilla.kernel.org/show_bug.cgi?id=13087
Alex
>-----Original Message-----
>From: Chris Wright [mailto:chrisw@sous-sol.org]
>Sent: 2009年4月23日 15:21
>To: linux-kernel@vger.kernel.org; stable@kernel.org
>Cc: Justin Forbes; Zwane Mwaikambo; Theodore Ts'o; Randy Dunlap; Dave Jones;
>Chuck Wolber; Chris Wedgwood; Michael Krufky; Chuck Ebbert; Domenico Andreoli;
>Willy Tarreau; Rodrigo Rubira Branco; Jake Edge; Eugene Teo;
>torvalds@linux-foundation.org; akpm@linux-foundation.org;
>alan@lxorguk.ukuu.org.uk; Len Brown; linux-acpi@vger.kernel.org; Shi, Alex;
>Pallipadi, Venkatesh; Zhao, Yakui; Brown, Len
>Subject: [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time
>incorrect
>
>-stable review patch. If anyone has any objections, please let us know.
>---------------------
>
>From: alex.shi <alex.shi@intel.com>
>
>upstream commit: ff69f2bba67bd45514923aaedbf40fe351787c59
>
>We found Cx states time abnormal in our some of machines which have 16
>LCPUs, the C0 take too many time while system is really idle when kernel
>enabled tickless and highres. powertop output is below:
>
> PowerTOP version 1.9 (C) 2007 Intel Corporation
>
>Cn Avg residency P-states (frequencies)
>C0 (cpu running) (40.5%) 2.53 Ghz 0.0%
>C1 0.0ms ( 0.0%) 2.53 Ghz 0.0%
>C2 128.8ms (59.5%) 2.40 Ghz 0.0%
> 1.60 Ghz 100.0%
>
>Wakeups-from-idle per second : 4.7 interval: 20.0s
>no ACPI power usage estimate available
>
>Top causes for wakeups:
> 41.4% ( 24.9) <interrupt> : extra timer interrupt
> 20.2% ( 12.2) <kernel core> : usb_hcd_poll_rh_status
>(rh_timer_func)
>
>After tacking detailed for this issue, Yakui and I find it is due to 24
>bit PM timer overflows when some of cpu sleep more than 4 seconds. With
>tickless kernel, the CPU want to sleep as much as possible when system
>idle. But the Cx sleep time are recorded by pmtimer which length is
>determined by BIOS. The current Cx time was gotten in the following
>function from driver/acpi/processor_idle.c:
>
>static inline u32 ticks_elapsed(u32 t1, u32 t2)
>{
> if (t2 >= t1)
> return (t2 - t1);
> else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
> return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
> else
> return ((0xFFFFFFFF - t1) + t2);
>}
>
>If pmtimer is 24 bits and it take 5 seconds from t1 to t2, in above
>function, just about 1 seconds ticks was recorded. So the Cx time will be
>reduced about 4 seconds. and this is why we see above powertop output.
>
>To resolve this problem, Yakui and I use ktime_get() to record the Cx
>states time instead of PM timer as the following patch. the patch was
>tested with i386/x86_64 modes on several platforms.
>
>Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>Tested-by: Alex Shi <alex.shi@intel.com>
>Signed-off-by: Alex Shi <alex.shi@intel.com>
>Signed-off-by: Yakui.zhao <yakui.zhao@intel.com>
>Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>Signed-off-by: Len Brown <len.brown@intel.com>
>Signed-off-by: Chris Wright <chrisw@sous-sol.org>
>---
> drivers/acpi/processor_idle.c | 63
>++++++++++++++++++------------------------
> 1 file changed, 27 insertions(+), 36 deletions(-)
>
>--- a/drivers/acpi/processor_idle.c
>+++ b/drivers/acpi/processor_idle.c
>@@ -64,7 +64,6 @@
> #define _COMPONENT ACPI_PROCESSOR_COMPONENT
> ACPI_MODULE_NAME("processor_idle");
> #define ACPI_PROCESSOR_FILE_POWER "power"
>-#define US_TO_PM_TIMER_TICKS(t) ((t * (PM_TIMER_FREQUENCY/1000)) /
>1000)
> #define PM_TIMER_TICK_NS (1000000000ULL/PM_TIMER_FREQUENCY)
> #define C2_OVERHEAD 1 /* 1us */
> #define C3_OVERHEAD 1 /* 1us */
>@@ -78,6 +77,10 @@ module_param(nocst, uint, 0000);
> static unsigned int latency_factor __read_mostly = 2;
> module_param(latency_factor, uint, 0644);
>
>+static s64 us_to_pm_timer_ticks(s64 t)
>+{
>+ return div64_u64(t * PM_TIMER_FREQUENCY, 1000000);
>+}
> /*
> * IBM ThinkPad R40e crashes mysteriously when going into C2 or C3.
> * For now disable this. Probably a bug somewhere else.
>@@ -159,25 +162,6 @@ static struct dmi_system_id __cpuinitdat
> {},
> };
>
>-static inline u32 ticks_elapsed(u32 t1, u32 t2)
>-{
>- if (t2 >= t1)
>- return (t2 - t1);
>- else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
>- return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
>- else
>- return ((0xFFFFFFFF - t1) + t2);
>-}
>-
>-static inline u32 ticks_elapsed_in_us(u32 t1, u32 t2)
>-{
>- if (t2 >= t1)
>- return PM_TIMER_TICKS_TO_US(t2 - t1);
>- else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
>- return PM_TIMER_TICKS_TO_US(((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
>- else
>- return PM_TIMER_TICKS_TO_US((0xFFFFFFFF - t1) + t2);
>-}
>
> /*
> * Callers should disable interrupts before the call and enable
>@@ -853,7 +837,8 @@ static inline void acpi_idle_do_entry(st
> static int acpi_idle_enter_c1(struct cpuidle_device *dev,
> struct cpuidle_state *state)
> {
>- u32 t1, t2;
>+ ktime_t kt1, kt2;
>+ s64 idle_time;
> struct acpi_processor *pr;
> struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
>
>@@ -871,14 +856,15 @@ static int acpi_idle_enter_c1(struct cpu
> return 0;
> }
>
>- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>+ kt1 = ktime_get_real();
> acpi_idle_do_entry(cx);
>- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>+ kt2 = ktime_get_real();
>+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
>
> local_irq_enable();
> cx->usage++;
>
>- return ticks_elapsed_in_us(t1, t2);
>+ return idle_time;
> }
>
> /**
>@@ -891,8 +877,9 @@ static int acpi_idle_enter_simple(struct
> {
> struct acpi_processor *pr;
> struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
>- u32 t1, t2;
>- int sleep_ticks = 0;
>+ ktime_t kt1, kt2;
>+ s64 idle_time;
>+ s64 sleep_ticks = 0;
>
> pr = __get_cpu_var(processors);
>
>@@ -925,18 +912,19 @@ static int acpi_idle_enter_simple(struct
> if (cx->type == ACPI_STATE_C3)
> ACPI_FLUSH_CPU_CACHE();
>
>- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>+ kt1 = ktime_get_real();
> /* Tell the scheduler that we are going deep-idle: */
> sched_clock_idle_sleep_event();
> acpi_idle_do_entry(cx);
>- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>+ kt2 = ktime_get_real();
>+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
>
> #if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86)
> /* TSC could halt in idle, so notify users */
> if (tsc_halts_in_c(cx->type))
> mark_tsc_unstable("TSC halts in idle");;
> #endif
>- sleep_ticks = ticks_elapsed(t1, t2);
>+ sleep_ticks = us_to_pm_timer_ticks(idle_time);
>
> /* Tell the scheduler how much we idled: */
> sched_clock_idle_wakeup_event(sleep_ticks*PM_TIMER_TICK_NS);
>@@ -948,7 +936,7 @@ static int acpi_idle_enter_simple(struct
>
> acpi_state_timer_broadcast(pr, cx, 0);
> cx->time += sleep_ticks;
>- return ticks_elapsed_in_us(t1, t2);
>+ return idle_time;
> }
>
> static int c3_cpu_count;
>@@ -966,8 +954,10 @@ static int acpi_idle_enter_bm(struct cpu
> {
> struct acpi_processor *pr;
> struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
>- u32 t1, t2;
>- int sleep_ticks = 0;
>+ ktime_t kt1, kt2;
>+ s64 idle_time;
>+ s64 sleep_ticks = 0;
>+
>
> pr = __get_cpu_var(processors);
>
>@@ -1034,9 +1024,10 @@ static int acpi_idle_enter_bm(struct cpu
> ACPI_FLUSH_CPU_CACHE();
> }
>
>- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>+ kt1 = ktime_get_real();
> acpi_idle_do_entry(cx);
>- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>+ kt2 = ktime_get_real();
>+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
>
> /* Re-enable bus master arbitration */
> if (pr->flags.bm_check && pr->flags.bm_control) {
>@@ -1051,7 +1042,7 @@ static int acpi_idle_enter_bm(struct cpu
> if (tsc_halts_in_c(ACPI_STATE_C3))
> mark_tsc_unstable("TSC halts in idle");
> #endif
>- sleep_ticks = ticks_elapsed(t1, t2);
>+ sleep_ticks = us_to_pm_timer_ticks(idle_time);
> /* Tell the scheduler how much we idled: */
> sched_clock_idle_wakeup_event(sleep_ticks*PM_TIMER_TICK_NS);
>
>@@ -1062,7 +1053,7 @@ static int acpi_idle_enter_bm(struct cpu
>
> acpi_state_timer_broadcast(pr, cx, 0);
> cx->time += sleep_ticks;
>- return ticks_elapsed_in_us(t1, t2);
>+ return idle_time;
> }
>
> struct cpuidle_driver acpi_idle_driver = {
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect
2009-04-23 7:20 ` [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect Chris Wright
2009-04-23 9:24 ` Shi, Alex
@ 2009-04-23 9:40 ` Shi, Alex
1 sibling, 0 replies; 6+ messages in thread
From: Shi, Alex @ 2009-04-23 9:40 UTC (permalink / raw)
To: Shi, Alex, Chris Wright, linux-kernel@vger.kernel.org,
stable@kernel.org
Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
Rodrigo Rubira Branco, Jake Edge, Eugene Teo,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, Len Brown, linux-acpi@vger.kernel.org,
Pallipadi, Venkatesh, Zhao, Yakui
Update:
Len Brown's patch may fix this bug.
http://bugzilla.kernel.org/show_bug.cgi?id=13087#c41
Alex
>-----Original Message-----
>From: Shi, Alex
>Sent: 2009年4月23日 17:25
>To: 'Chris Wright'; linux-kernel@vger.kernel.org; stable@kernel.org
>Cc: Justin Forbes; Zwane Mwaikambo; Theodore Ts'o; Randy Dunlap; Dave Jones;
>Chuck Wolber; Chris Wedgwood; Michael Krufky; Chuck Ebbert; Domenico Andreoli;
>Willy Tarreau; Rodrigo Rubira Branco; Jake Edge; Eugene Teo;
>torvalds@linux-foundation.org; akpm@linux-foundation.org;
>alan@lxorguk.ukuu.org.uk; Len Brown; linux-acpi@vger.kernel.org; Pallipadi,
>Venkatesh; Zhao, Yakui; Brown, Len
>Subject: RE: [patch 012/100] acpi: fix of pmtimer overflow that make Cx states
>time incorrect
>
>It was reported make some latop booting hang. and is not root caused now. :(
>http://bugzilla.kernel.org/show_bug.cgi?id=13087
>
>Alex
>>-----Original Message-----
>>From: Chris Wright [mailto:chrisw@sous-sol.org]
>>Sent: 2009年4月23日 15:21
>>To: linux-kernel@vger.kernel.org; stable@kernel.org
>>Cc: Justin Forbes; Zwane Mwaikambo; Theodore Ts'o; Randy Dunlap; Dave Jones;
>>Chuck Wolber; Chris Wedgwood; Michael Krufky; Chuck Ebbert; Domenico Andreoli;
>>Willy Tarreau; Rodrigo Rubira Branco; Jake Edge; Eugene Teo;
>>torvalds@linux-foundation.org; akpm@linux-foundation.org;
>>alan@lxorguk.ukuu.org.uk; Len Brown; linux-acpi@vger.kernel.org; Shi, Alex;
>>Pallipadi, Venkatesh; Zhao, Yakui; Brown, Len
>>Subject: [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time
>>incorrect
>>
>>-stable review patch. If anyone has any objections, please let us know.
>>---------------------
>>
>>From: alex.shi <alex.shi@intel.com>
>>
>>upstream commit: ff69f2bba67bd45514923aaedbf40fe351787c59
>>
>>We found Cx states time abnormal in our some of machines which have 16
>>LCPUs, the C0 take too many time while system is really idle when kernel
>>enabled tickless and highres. powertop output is below:
>>
>> PowerTOP version 1.9 (C) 2007 Intel Corporation
>>
>>Cn Avg residency P-states (frequencies)
>>C0 (cpu running) (40.5%) 2.53 Ghz 0.0%
>>C1 0.0ms ( 0.0%) 2.53 Ghz 0.0%
>>C2 128.8ms (59.5%) 2.40 Ghz 0.0%
>> 1.60 Ghz 100.0%
>>
>>Wakeups-from-idle per second : 4.7 interval: 20.0s
>>no ACPI power usage estimate available
>>
>>Top causes for wakeups:
>> 41.4% ( 24.9) <interrupt> : extra timer interrupt
>> 20.2% ( 12.2) <kernel core> : usb_hcd_poll_rh_status
>>(rh_timer_func)
>>
>>After tacking detailed for this issue, Yakui and I find it is due to 24
>>bit PM timer overflows when some of cpu sleep more than 4 seconds. With
>>tickless kernel, the CPU want to sleep as much as possible when system
>>idle. But the Cx sleep time are recorded by pmtimer which length is
>>determined by BIOS. The current Cx time was gotten in the following
>>function from driver/acpi/processor_idle.c:
>>
>>static inline u32 ticks_elapsed(u32 t1, u32 t2)
>>{
>> if (t2 >= t1)
>> return (t2 - t1);
>> else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
>> return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
>> else
>> return ((0xFFFFFFFF - t1) + t2);
>>}
>>
>>If pmtimer is 24 bits and it take 5 seconds from t1 to t2, in above
>>function, just about 1 seconds ticks was recorded. So the Cx time will be
>>reduced about 4 seconds. and this is why we see above powertop output.
>>
>>To resolve this problem, Yakui and I use ktime_get() to record the Cx
>>states time instead of PM timer as the following patch. the patch was
>>tested with i386/x86_64 modes on several platforms.
>>
>>Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>>Tested-by: Alex Shi <alex.shi@intel.com>
>>Signed-off-by: Alex Shi <alex.shi@intel.com>
>>Signed-off-by: Yakui.zhao <yakui.zhao@intel.com>
>>Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>>Signed-off-by: Len Brown <len.brown@intel.com>
>>Signed-off-by: Chris Wright <chrisw@sous-sol.org>
>>---
>> drivers/acpi/processor_idle.c | 63
>>++++++++++++++++++------------------------
>> 1 file changed, 27 insertions(+), 36 deletions(-)
>>
>>--- a/drivers/acpi/processor_idle.c
>>+++ b/drivers/acpi/processor_idle.c
>>@@ -64,7 +64,6 @@
>> #define _COMPONENT ACPI_PROCESSOR_COMPONENT
>> ACPI_MODULE_NAME("processor_idle");
>> #define ACPI_PROCESSOR_FILE_POWER "power"
>>-#define US_TO_PM_TIMER_TICKS(t) ((t * (PM_TIMER_FREQUENCY/1000)) /
>>1000)
>> #define PM_TIMER_TICK_NS (1000000000ULL/PM_TIMER_FREQUENCY)
>> #define C2_OVERHEAD 1 /* 1us */
>> #define C3_OVERHEAD 1 /* 1us */
>>@@ -78,6 +77,10 @@ module_param(nocst, uint, 0000);
>> static unsigned int latency_factor __read_mostly = 2;
>> module_param(latency_factor, uint, 0644);
>>
>>+static s64 us_to_pm_timer_ticks(s64 t)
>>+{
>>+ return div64_u64(t * PM_TIMER_FREQUENCY, 1000000);
>>+}
>> /*
>> * IBM ThinkPad R40e crashes mysteriously when going into C2 or C3.
>> * For now disable this. Probably a bug somewhere else.
>>@@ -159,25 +162,6 @@ static struct dmi_system_id __cpuinitdat
>> {},
>> };
>>
>>-static inline u32 ticks_elapsed(u32 t1, u32 t2)
>>-{
>>- if (t2 >= t1)
>>- return (t2 - t1);
>>- else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
>>- return (((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
>>- else
>>- return ((0xFFFFFFFF - t1) + t2);
>>-}
>>-
>>-static inline u32 ticks_elapsed_in_us(u32 t1, u32 t2)
>>-{
>>- if (t2 >= t1)
>>- return PM_TIMER_TICKS_TO_US(t2 - t1);
>>- else if (!(acpi_gbl_FADT.flags & ACPI_FADT_32BIT_TIMER))
>>- return PM_TIMER_TICKS_TO_US(((0x00FFFFFF - t1) + t2) & 0x00FFFFFF);
>>- else
>>- return PM_TIMER_TICKS_TO_US((0xFFFFFFFF - t1) + t2);
>>-}
>>
>> /*
>> * Callers should disable interrupts before the call and enable
>>@@ -853,7 +837,8 @@ static inline void acpi_idle_do_entry(st
>> static int acpi_idle_enter_c1(struct cpuidle_device *dev,
>> struct cpuidle_state *state)
>> {
>>- u32 t1, t2;
>>+ ktime_t kt1, kt2;
>>+ s64 idle_time;
>> struct acpi_processor *pr;
>> struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
>>
>>@@ -871,14 +856,15 @@ static int acpi_idle_enter_c1(struct cpu
>> return 0;
>> }
>>
>>- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>+ kt1 = ktime_get_real();
>> acpi_idle_do_entry(cx);
>>- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>+ kt2 = ktime_get_real();
>>+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
>>
>> local_irq_enable();
>> cx->usage++;
>>
>>- return ticks_elapsed_in_us(t1, t2);
>>+ return idle_time;
>> }
>>
>> /**
>>@@ -891,8 +877,9 @@ static int acpi_idle_enter_simple(struct
>> {
>> struct acpi_processor *pr;
>> struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
>>- u32 t1, t2;
>>- int sleep_ticks = 0;
>>+ ktime_t kt1, kt2;
>>+ s64 idle_time;
>>+ s64 sleep_ticks = 0;
>>
>> pr = __get_cpu_var(processors);
>>
>>@@ -925,18 +912,19 @@ static int acpi_idle_enter_simple(struct
>> if (cx->type == ACPI_STATE_C3)
>> ACPI_FLUSH_CPU_CACHE();
>>
>>- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>+ kt1 = ktime_get_real();
>> /* Tell the scheduler that we are going deep-idle: */
>> sched_clock_idle_sleep_event();
>> acpi_idle_do_entry(cx);
>>- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>+ kt2 = ktime_get_real();
>>+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
>>
>> #if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86)
>> /* TSC could halt in idle, so notify users */
>> if (tsc_halts_in_c(cx->type))
>> mark_tsc_unstable("TSC halts in idle");;
>> #endif
>>- sleep_ticks = ticks_elapsed(t1, t2);
>>+ sleep_ticks = us_to_pm_timer_ticks(idle_time);
>>
>> /* Tell the scheduler how much we idled: */
>> sched_clock_idle_wakeup_event(sleep_ticks*PM_TIMER_TICK_NS);
>>@@ -948,7 +936,7 @@ static int acpi_idle_enter_simple(struct
>>
>> acpi_state_timer_broadcast(pr, cx, 0);
>> cx->time += sleep_ticks;
>>- return ticks_elapsed_in_us(t1, t2);
>>+ return idle_time;
>> }
>>
>> static int c3_cpu_count;
>>@@ -966,8 +954,10 @@ static int acpi_idle_enter_bm(struct cpu
>> {
>> struct acpi_processor *pr;
>> struct acpi_processor_cx *cx = cpuidle_get_statedata(state);
>>- u32 t1, t2;
>>- int sleep_ticks = 0;
>>+ ktime_t kt1, kt2;
>>+ s64 idle_time;
>>+ s64 sleep_ticks = 0;
>>+
>>
>> pr = __get_cpu_var(processors);
>>
>>@@ -1034,9 +1024,10 @@ static int acpi_idle_enter_bm(struct cpu
>> ACPI_FLUSH_CPU_CACHE();
>> }
>>
>>- t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>+ kt1 = ktime_get_real();
>> acpi_idle_do_entry(cx);
>>- t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>>+ kt2 = ktime_get_real();
>>+ idle_time = ktime_to_us(ktime_sub(kt2, kt1));
>>
>> /* Re-enable bus master arbitration */
>> if (pr->flags.bm_check && pr->flags.bm_control) {
>>@@ -1051,7 +1042,7 @@ static int acpi_idle_enter_bm(struct cpu
>> if (tsc_halts_in_c(ACPI_STATE_C3))
>> mark_tsc_unstable("TSC halts in idle");
>> #endif
>>- sleep_ticks = ticks_elapsed(t1, t2);
>>+ sleep_ticks = us_to_pm_timer_ticks(idle_time);
>> /* Tell the scheduler how much we idled: */
>> sched_clock_idle_wakeup_event(sleep_ticks*PM_TIMER_TICK_NS);
>>
>>@@ -1062,7 +1053,7 @@ static int acpi_idle_enter_bm(struct cpu
>>
>> acpi_state_timer_broadcast(pr, cx, 0);
>> cx->time += sleep_ticks;
>>- return ticks_elapsed_in_us(t1, t2);
>>+ return idle_time;
>> }
>>
>> struct cpuidle_driver acpi_idle_driver = {
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 013/100] ACPI: cap off P-state transition latency from buggy BIOSes
2009-04-23 7:20 ` [patch 013/100] ACPI: cap off P-state transition latency from buggy BIOSes Chris Wright
@ 2009-04-25 15:59 ` Martin Steigerwald
0 siblings, 0 replies; 6+ messages in thread
From: Martin Steigerwald @ 2009-04-25 15:59 UTC (permalink / raw)
To: linux-kernel
Cc: Chris Wright, stable, Justin Forbes, Zwane Mwaikambo,
Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
Chris Wedgwood, Michael Krufky, Chuck Ebbert, Domenico Andreoli,
Willy Tarreau, Rodrigo Rubira Branco, Jake Edge, Eugene Teo,
torvalds, akpm, alan, Len Brown, linux-acpi, Venkatesh Pallipadi,
Matthew Garrett, Len Brown
[-- Attachment #1: Type: text/plain, Size: 2327 bytes --]
Am Donnerstag 23 April 2009 schrieb Chris Wright:
> -stable review patch. If anyone has any objections, please let us
> know. ---------------------
>
> From: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>
> upstream commit: a59d1637eb0e0a37ee0e5c92800c60abe3624e24
>
> Some BIOSes report very high frequency transition latency which are
> plainly wrong on CPus that can change frequency using native MSR
> interface.
>
> One such system is IBM T42 (2327-8ZU) as reported by Owen Taylor and
> Rik van Riel.
>
> cpufreq_ondemand driver uses this transition latency to come up with a
> reasonable sampling interval to sample CPU usage and with such high
> latency value, ondemand sampling interval ends up being very high
> (0.5 sec, in this particular case), resulting in performance impact due
> to slow response to increasing frequency.
>
> Fix it by capping-off the transition latency to 20uS for native MSR
> based frequency transitions.
>
> mjg: We've confirmed that this also helps on the X31
>
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> Acked-by: Matthew Garrett <mjg@redhat.com>
> Signed-off-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
> ---
>
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -680,6 +680,18 @@ static int acpi_cpufreq_cpu_init(struct
> perf->states[i].transition_latency * 1000;
> }
>
> + /* Check for high latency (>20uS) from buggy BIOSes, like on T42 */
> + if (perf->control_register.space_id == ACPI_ADR_SPACE_FIXED_HARDWARE
> && + policy->cpuinfo.transition_latency > 20 * 1000) {
> + static int print_once;
> + policy->cpuinfo.transition_latency = 20 * 1000;
> + if (!print_once) {
> + print_once = 1;
> + printk(KERN_INFO "Capping off P-state tranision latency"
typo: tranision => transition
> + " at 20 uS\n");
> + }
> + }
> +
> data->max_freq = perf->states[0].core_frequency * 1000;
> /* table init */
> for (i=0; i<perf->state_count; i++) {
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-04-25 15:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20090423072020.428683652@sous-sol.org>
2009-04-23 7:20 ` [patch 012/100] acpi: fix of pmtimer overflow that make Cx states time incorrect Chris Wright
2009-04-23 9:24 ` Shi, Alex
2009-04-23 9:40 ` Shi, Alex
2009-04-23 7:20 ` [patch 013/100] ACPI: cap off P-state transition latency from buggy BIOSes Chris Wright
2009-04-25 15:59 ` Martin Steigerwald
2009-04-23 7:20 ` [patch 014/100] dock: fix dereference after kfree() Chris Wright
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).