* [patch 1/4] acpi: acpi_cpu_soft_notify() not getting called on resume
2010-01-15 1:39 [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Venkatesh Pallipadi
@ 2010-01-15 1:39 ` Venkatesh Pallipadi
2010-01-15 1:39 ` [patch 2/4] x86: Do not use hpet MSI as clockevent broadcast device Venkatesh Pallipadi
` (3 subsequent siblings)
4 siblings, 0 replies; 8+ messages in thread
From: Venkatesh Pallipadi @ 2010-01-15 1:39 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Len Brown,
Mark Hounschell
Cc: linux-kernel, Rafael J. Wysocki, Alain Knaff, Linus Torvalds,
Li, Shaohua
[-- Attachment #1: 0002-acpi_cpu_soft_notify-seems-to-be-calling-ppc-cst.patch --]
[-- Type: text/plain, Size: 1408 bytes --]
acpi_cpu_soft_notify() seems to be calling ppc, cst and tstate routines
only on CPU_ONLINE and not on CPU_ONLINE_FROZEN, which means these
routines are getting called on logical CPU offline-online but not
getting called for any non-boot CPUs during suspend-resume.
One side effect of this that I noticed is that,
tick_broadcast_mask changes from 0000000f to 00000001 after
suspend-resume. I haven't seen any functionality issue with this. But,
seems wrong. I am not sure whether this can result in any other
bugs/issues. But, here is the fix.
This patch is a pre-requisite for followup patches that tries to fix
clockevent source ratings.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
drivers/acpi/processor_core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 9863c98..4513395 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -695,7 +695,7 @@ static int acpi_cpu_soft_notify(struct notifier_block *nfb,
unsigned int cpu = (unsigned long)hcpu;
struct acpi_processor *pr = per_cpu(processors, cpu);
- if (action == CPU_ONLINE && pr) {
+ if ((action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) && pr) {
acpi_processor_ppc_has_changed(pr, 0);
acpi_processor_cst_has_changed(pr);
acpi_processor_tstate_has_changed(pr);
--
1.6.0.6
--
^ permalink raw reply related [flat|nested] 8+ messages in thread* [patch 2/4] x86: Do not use hpet MSI as clockevent broadcast device
2010-01-15 1:39 [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Venkatesh Pallipadi
2010-01-15 1:39 ` [patch 1/4] acpi: acpi_cpu_soft_notify() not getting called on resume Venkatesh Pallipadi
@ 2010-01-15 1:39 ` Venkatesh Pallipadi
2010-01-15 2:00 ` H. Peter Anvin
2010-01-15 1:40 ` [patch 3/4] clockevent: Add tick_check_fallback_timer to look for fallback timer Venkatesh Pallipadi
` (2 subsequent siblings)
4 siblings, 1 reply; 8+ messages in thread
From: Venkatesh Pallipadi @ 2010-01-15 1:39 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Len Brown,
Mark Hounschell
Cc: linux-kernel, Rafael J. Wysocki, Alain Knaff, Linus Torvalds,
Li, Shaohua
[-- Attachment #1: 0003-x86-Do-not-use-hpet-MSI-as-clockevent-broadcast-dev.patch --]
[-- Type: text/plain, Size: 2171 bytes --]
Current kernel uses hpet MSI interrupts as both percpu clockevent device
and also as clockevent broadcast device in place of IRQ0 timer.
There seems to be issues with HPET MSI usage on some platforms. So,
restrict it to be only used as a percpu clockevent device.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
arch/x86/kernel/hpet.c | 2 +-
include/linux/clockchips.h | 2 ++
kernel/time/tick-broadcast.c | 3 ++-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index ba6e658..dd9370b 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -556,7 +556,7 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
evt->irq = hdev->irq;
evt->rating = 110;
- evt->features = CLOCK_EVT_FEAT_ONESHOT;
+ evt->features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_NO_BROADCAST;
if (hdev->flags & HPET_DEV_PERI_CAP)
evt->features |= CLOCK_EVT_FEAT_PERIODIC;
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0cf725b..5352187 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -50,9 +50,11 @@ enum clock_event_nofitiers {
*
* - Clockevent source stops in C3 State and needs broadcast support.
* - Local APIC timer is used as a dummy device.
+ * - Do not use this device as a broadcast device.
*/
#define CLOCK_EVT_FEAT_C3STOP 0x000004
#define CLOCK_EVT_FEAT_DUMMY 0x000008
+#define CLOCK_EVT_FEAT_NO_BROADCAST 0x000010
/**
* struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index b3bafd5..113f954 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -69,7 +69,8 @@ int tick_check_broadcast_device(struct clock_event_device *dev)
{
if ((tick_broadcast_device.evtdev &&
tick_broadcast_device.evtdev->rating >= dev->rating) ||
- (dev->features & CLOCK_EVT_FEAT_C3STOP))
+ dev->features & CLOCK_EVT_FEAT_C3STOP ||
+ dev->features & CLOCK_EVT_FEAT_NO_BROADCAST)
return 0;
clockevents_exchange_device(NULL, dev);
--
1.6.0.6
--
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [patch 2/4] x86: Do not use hpet MSI as clockevent broadcast device
2010-01-15 1:39 ` [patch 2/4] x86: Do not use hpet MSI as clockevent broadcast device Venkatesh Pallipadi
@ 2010-01-15 2:00 ` H. Peter Anvin
2010-01-15 2:07 ` Pallipadi, Venkatesh
0 siblings, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2010-01-15 2:00 UTC (permalink / raw)
To: Venkatesh Pallipadi
Cc: Ingo Molnar, Thomas Gleixner, Len Brown, Mark Hounschell,
linux-kernel, Rafael J. Wysocki, Alain Knaff, Linus Torvalds,
Li, Shaohua
On 01/14/2010 05:39 PM, Venkatesh Pallipadi wrote:
> Current kernel uses hpet MSI interrupts as both percpu clockevent device
> and also as clockevent broadcast device in place of IRQ0 timer.
> There seems to be issues with HPET MSI usage on some platforms. So,
> restrict it to be only used as a percpu clockevent device.
Any ideas of which platforms? It would help avoiding "cargo cult
programming" in the future.
Other than that, I presume this is a .33 patch?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch 2/4] x86: Do not use hpet MSI as clockevent broadcast device
2010-01-15 2:00 ` H. Peter Anvin
@ 2010-01-15 2:07 ` Pallipadi, Venkatesh
0 siblings, 0 replies; 8+ messages in thread
From: Pallipadi, Venkatesh @ 2010-01-15 2:07 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Ingo Molnar, Thomas Gleixner, Len Brown, Mark Hounschell,
linux-kernel@vger.kernel.org, Rafael J. Wysocki, Alain Knaff,
Linus Torvalds, Li, Shaohua
On Thu, 2010-01-14 at 18:00 -0800, H. Peter Anvin wrote:
> On 01/14/2010 05:39 PM, Venkatesh Pallipadi wrote:
> > Current kernel uses hpet MSI interrupts as both percpu clockevent device
> > and also as clockevent broadcast device in place of IRQ0 timer.
> > There seems to be issues with HPET MSI usage on some platforms. So,
> > restrict it to be only used as a percpu clockevent device.
>
> Any ideas of which platforms? It would help avoiding "cargo cult
> programming" in the future.
>
> Other than that, I presume this is a .33 patch?
>
> -hpa
>
The problem reported by Mark has only been on (AMD) 790x series chip
sets.
I am not quite sure what the quick-fix for .33 would be.
- Either use dmi to stop using HPET MSI on this chipset
- or stop using HPET MSI altogether, until this complete patchset
provides a better alternative for .34.
I think it should be first option.
Thanks,
Venki
^ permalink raw reply [flat|nested] 8+ messages in thread
* [patch 3/4] clockevent: Add tick_check_fallback_timer to look for fallback timer
2010-01-15 1:39 [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Venkatesh Pallipadi
2010-01-15 1:39 ` [patch 1/4] acpi: acpi_cpu_soft_notify() not getting called on resume Venkatesh Pallipadi
2010-01-15 1:39 ` [patch 2/4] x86: Do not use hpet MSI as clockevent broadcast device Venkatesh Pallipadi
@ 2010-01-15 1:40 ` Venkatesh Pallipadi
2010-01-15 1:40 ` [patch 4/4] x86: Adjust HPET MSI timer rating to work with tick_check_fallback_timer Venkatesh Pallipadi
2010-01-15 21:53 ` [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Mark Hounschell
4 siblings, 0 replies; 8+ messages in thread
From: Venkatesh Pallipadi @ 2010-01-15 1:40 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Len Brown,
Mark Hounschell
Cc: linux-kernel, Rafael J. Wysocki, Alain Knaff, Linus Torvalds,
Li, Shaohua
[-- Attachment #1: 0004-clockevent-Add-tick_check_fallback_timer-to-look-fo.patch --]
[-- Type: text/plain, Size: 5089 bytes --]
Before switching to braodcast timer, in the event of C3 timer stoppage,
look for any other per cpu timer as a fallback.
This will be used in the following patch where we use HPET MSI as a
fallback timer for LAPIC timer, when C-states are supported.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
include/linux/clockchips.h | 2 +
kernel/time/clockevents.c | 29 +++++++++++++++++++++++++++-
kernel/time/tick-common.c | 44 +++++++++++++++++++++++++++++++++++++++++++
kernel/time/tick-internal.h | 3 ++
4 files changed, 77 insertions(+), 1 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 5352187..791952f 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -51,10 +51,12 @@ enum clock_event_nofitiers {
* - Clockevent source stops in C3 State and needs broadcast support.
* - Local APIC timer is used as a dummy device.
* - Do not use this device as a broadcast device.
+ * - Rating has been reduced for this device as system is C3 capable.
*/
#define CLOCK_EVT_FEAT_C3STOP 0x000004
#define CLOCK_EVT_FEAT_DUMMY 0x000008
#define CLOCK_EVT_FEAT_NO_BROADCAST 0x000010
+#define CLOCK_EVT_FEAT_RATING_REDUCED 0x000020
/**
* struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 6f740d9..6641c97 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -161,7 +161,7 @@ static void clockevents_do_notify(unsigned long reason, void *dev)
* Called after a notify add to make devices available which were
* released from the notifier call.
*/
-static void clockevents_notify_released(void)
+void clockevents_notify_released(void)
{
struct clock_event_device *dev;
@@ -203,6 +203,33 @@ void clockevents_handle_noop(struct clock_event_device *dev)
}
/**
+ * clockevents_lookup_newdev - lookup for a higher rated device on this cpu
+ * @cpu: CPU that the call corresponds to
+ * @rating: baseline rating (look for higher than this rating)
+ *
+ * Called from the notifier chain. clockevents_lock is held already
+ */
+struct clock_event_device *clockevents_lookup_newdev(int cpu, int rating)
+{
+ struct clock_event_device *dev, *tmp, *found_dev = NULL;
+ int found_rating = 0;
+
+ list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
+ if (cpumask_test_cpu(cpu, dev->cpumask) &&
+ cpumask_weight(dev->cpumask) == 1 &&
+ dev->rating > found_rating) {
+ found_rating = dev->rating;
+ found_dev = dev;
+ }
+ }
+
+ if (found_rating > rating)
+ return found_dev;
+
+ return NULL;
+}
+
+/**
* clockevents_exchange_device - release and request clock devices
* @old: device to release (can be NULL)
* @new: device to request (can be NULL)
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b6b898d..e9626ad 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -284,6 +284,48 @@ out_bc:
}
/*
+ * Try to find an alternate timer which does not need broadcast.
+ */
+static void tick_check_fallback_timer(void)
+{
+ struct clock_event_device *curdev;
+ struct clock_event_device *newdev = NULL;
+ struct tick_device *td;
+ int cpu;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&tick_device_lock, flags);
+
+ cpu = smp_processor_id();
+
+ td = &per_cpu(tick_cpu_device, cpu);
+ curdev = td->evtdev;
+
+ /* Check whether current timer is affected by C3 */
+ if (!curdev || !cpumask_equal(curdev->cpumask, cpumask_of(cpu)) ||
+ !(curdev->features & CLOCK_EVT_FEAT_C3STOP))
+ goto out;
+
+ /* Check if we have already reduced the rating of this device once */
+ if (curdev->features & CLOCK_EVT_FEAT_RATING_REDUCED)
+ goto out;
+
+ curdev->features |= CLOCK_EVT_FEAT_RATING_REDUCED;
+ curdev->rating -= 10;
+ newdev = clockevents_lookup_newdev(cpu, curdev->rating);
+
+out:
+ raw_spin_unlock_irqrestore(&tick_device_lock, flags);
+
+ if (newdev) {
+ tick_check_new_device(newdev);
+ clockevents_notify_released();
+ }
+
+ return;
+}
+
+/*
* Transfer the do_timer job away from a dying cpu.
*
* Called with interrupts disabled.
@@ -365,6 +407,8 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
return tick_check_new_device(dev);
case CLOCK_EVT_NOTIFY_BROADCAST_ON:
+ tick_check_fallback_timer();
+ /* FALLTHRU */
case CLOCK_EVT_NOTIFY_BROADCAST_OFF:
case CLOCK_EVT_NOTIFY_BROADCAST_FORCE:
tick_broadcast_on_off(reason, dev);
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 290eefb..c72703d 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -14,6 +14,9 @@ extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast);
extern void tick_handle_periodic(struct clock_event_device *dev);
extern void clockevents_shutdown(struct clock_event_device *dev);
+extern struct clock_event_device *clockevents_lookup_newdev(int cpu,
+ int rating);
+extern void clockevents_notify_released(void);
/*
* NO_HZ / high resolution timer shared code
--
1.6.0.6
--
^ permalink raw reply related [flat|nested] 8+ messages in thread* [patch 4/4] x86: Adjust HPET MSI timer rating to work with tick_check_fallback_timer
2010-01-15 1:39 [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Venkatesh Pallipadi
` (2 preceding siblings ...)
2010-01-15 1:40 ` [patch 3/4] clockevent: Add tick_check_fallback_timer to look for fallback timer Venkatesh Pallipadi
@ 2010-01-15 1:40 ` Venkatesh Pallipadi
2010-01-15 21:53 ` [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Mark Hounschell
4 siblings, 0 replies; 8+ messages in thread
From: Venkatesh Pallipadi @ 2010-01-15 1:40 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Len Brown,
Mark Hounschell
Cc: linux-kernel, Rafael J. Wysocki, Alain Knaff, Linus Torvalds,
Li, Shaohua
[-- Attachment #1: 0005-x86-Adjust-HPET-MSI-timer-rating-to-work-with-tick_.patch --]
[-- Type: text/plain, Size: 1791 bytes --]
Before this change we had
Normal lapic_clockevent rating 100
hpet MSI (percpu) clockevent rating 110
Always Running lapic_clockevent rating 150
As a result, on systems that support HPET MSI, percpu clockevents got
priority over LAPIC timer. That was ok when systems supported deep
C-state. But, that was sub-optimal on systems that did not support deep
C-states as HPETs are slower than LAPIC.
There was also a functional issue with usage of HPET MSI on some
platforms, which do not support deep C-state as reported here.
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html
After the change,
hpet MSI (percpu) clockevent rating 95
Normal lapic_clockevent rating 100
Always Running lapic_clockevent rating 150
And we reduce the rating of non-Always_Running LAPIC timer (to 90),
when we see that deep C-states are supported and switch to hpet MSI.
This change makes the timer usage optimal in terms of performance and
also eliminates the functionality issue mentioned above.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
---
arch/x86/kernel/hpet.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index dd9370b..ccb3752 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -555,7 +555,11 @@ static void init_one_hpet_msi_clockevent(struct hpet_dev *hdev, int cpu)
hpet_setup_irq(hdev);
evt->irq = hdev->irq;
- evt->rating = 110;
+ /*
+ * Rating should be within 10 less than lapic timer for
+ * timer switch to happen when deep C-states are supported.
+ */
+ evt->rating = 95;
evt->features = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_NO_BROADCAST;
if (hdev->flags & HPET_DEV_PERI_CAP)
evt->features |= CLOCK_EVT_FEAT_PERIODIC;
--
1.6.0.6
--
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [patch 0/4] Only use HPET MSI timers on systems with deep C-state support
2010-01-15 1:39 [patch 0/4] Only use HPET MSI timers on systems with deep C-state support Venkatesh Pallipadi
` (3 preceding siblings ...)
2010-01-15 1:40 ` [patch 4/4] x86: Adjust HPET MSI timer rating to work with tick_check_fallback_timer Venkatesh Pallipadi
@ 2010-01-15 21:53 ` Mark Hounschell
4 siblings, 0 replies; 8+ messages in thread
From: Mark Hounschell @ 2010-01-15 21:53 UTC (permalink / raw)
To: Venkatesh Pallipadi
Cc: Ingo Molnar, H. Peter Anvin, Thomas Gleixner, Len Brown,
linux-kernel, Rafael J. Wysocki, Alain Knaff, Linus Torvalds,
Li, Shaohua
On 01/14/2010 08:39 PM, Venkatesh Pallipadi wrote:
> There is a functionality issue reported on some AMD platforms
> http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html
> wherein, fdformat fails when HPET MSI based percpu timer is used.
>
> We do not have the real root-cause for that problem. But, that
> report exposed an issue with our current usage HPET MSI timers.
> We use HPET MSI timers even on platforms that do not have
> support for C2/C3 states. On those systems we should rather be
> using LAPIC timers.
>
> So, this series of patches does just that.
> * Use LAPIC timer when there is always running APIC timer
> * Use LAPIC timer on platforms that do not have support for deep C states
> * Only use HPET MSI timers as percpu timers on systems that have LAPICs
> that stop in deep C-states _and_ system supports deep C-states
>
> The change turned out to be more than what I expected, due to
> the current static nature of clockevent rating and unrelated
> issues in acpi processor driver resume path. I also ended up
> touching different subsystems to handle this.
>
> If the patchset resolves the issue for Mark and if it looks sane
> can one of the maintainers queue it up for .34
>
> Reported-by: Mark Hounschell <markh@compro.net>
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
>
It does resolve my problem
Mark
^ permalink raw reply [flat|nested] 8+ messages in thread