public inbox for linux-kernel@vger.kernel.org
* [PATCH] [0/10] x86: MCE: machine check bug fix series
@ 2009-02-12 12:37 Andi Kleen
  2009-02-12 12:37 ` [PATCH] [1/10] x86: MCE: Reinitialize per cpu features on resume v3 Andi Kleen
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


This series fixes a couple of bugs that have been 
found in the machine check code in the last few months.

This is currently only the bug fixes, none of the
controversial 32bit<->64bit unification patches.

Nearly all of it has been submitted and reviewed earlier.

The patches should be merged either into 2.6.29
(to get the bug fixes there) or, if that is not possible,
into 2.6.30. They have been tested against 2.6.29-rc4,
but also apply to x86 tip as of 090212.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] [1/10] x86: MCE: Reinitialize per cpu features on resume v3
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [2/10] x86: MCE: Don't disable machine checks during code patching Andi Kleen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: Bug fix

This fixes a long-standing bug in the machine check code. On resume the
boot CPU wouldn't get its vendor-specific state, like thermal handling,
reinitialized. This means the boot CPU would never get any thermal
events reported again.

Call the respective initialization functions on resume.

v2: Remove ancient init because they don't have a resume device anyway.
Pointed out by Thomas Gleixner.
v3: Also fix the Subject to reflect the v2 change

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:22.000000000 +0100
@@ -734,6 +734,7 @@
 static int mce_resume(struct sys_device *dev)
 {
 	mce_init(NULL);
+	mce_cpu_features(&current_cpu_data);
 	return 0;
 }
 


* [PATCH] [2/10] x86: MCE: Don't disable machine checks during code patching
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
  2009-02-12 12:37 ` [PATCH] [1/10] x86: MCE: Reinitialize per cpu features on resume v3 Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [3/10] x86: MCE: Always use separate work queue to run trigger Andi Kleen
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: low priority bug fix

This removes part of a patch I added myself some time ago. After some
consideration I concluded the patch was a bad idea. In particular it stopped
machine check exceptions during code patching.

To quote the comment:

        * MCEs only happen when something got corrupted and in this
        * case we must do something about the corruption.
        * Ignoring it is worse than an unlikely patching race.
        * Also machine checks tend to be broadcast and if one CPU
        * goes into machine check the others follow quickly, so we don't
        * expect a machine check to cause undue problems during code
        * patching.

So undo the machine check related parts of commit
8f4e956b313dcccbc7be6f10808952345e3b638c. NMIs are still disabled.

This only removes code; the only addition is a new comment.


Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/include/asm/mce.h          |    2 --
 arch/x86/kernel/alternative.c       |   17 +++++++++++------
 arch/x86/kernel/cpu/mcheck/mce_32.c |   14 --------------
 arch/x86/kernel/cpu/mcheck/mce_64.c |   14 --------------
 4 files changed, 11 insertions(+), 36 deletions(-)

Index: linux/arch/x86/kernel/alternative.c
===================================================================
--- linux.orig/arch/x86/kernel/alternative.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/alternative.c	2009-02-12 11:30:51.000000000 +0100
@@ -414,9 +414,17 @@
 	   that might execute the to be patched code.
 	   Other CPUs are not running. */
 	stop_nmi();
-#ifdef CONFIG_X86_MCE
-	stop_mce();
-#endif
+
+	/*
+	 * Don't stop machine check exceptions while patching.
+	 * MCEs only happen when something got corrupted and in this
+	 * case we must do something about the corruption.
+	 * Ignoring it is worse than an unlikely patching race.
+	 * Also machine checks tend to be broadcast and if one CPU
+	 * goes into machine check the others follow quickly, so we don't
+	 * expect a machine check to cause undue problems during code
+	 * patching.
+	 */
 
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
@@ -456,9 +464,6 @@
 				(unsigned long)__smp_locks_end);
 
 	restart_nmi();
-#ifdef CONFIG_X86_MCE
-	restart_mce();
-#endif
 }
 
 /**
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:22.000000000 +0100
@@ -680,20 +680,6 @@
 	&mce_chrdev_ops,
 };
 
-static unsigned long old_cr4 __initdata;
-
-void __init stop_mce(void)
-{
-	old_cr4 = read_cr4();
-	clear_in_cr4(X86_CR4_MCE);
-}
-
-void __init restart_mce(void)
-{
-	if (old_cr4 & X86_CR4_MCE)
-		set_in_cr4(X86_CR4_MCE);
-}
-
 /*
  * Old style boot options parsing. Only for compatibility.
  */
Index: linux/arch/x86/include/asm/mce.h
===================================================================
--- linux.orig/arch/x86/include/asm/mce.h	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/include/asm/mce.h	2009-02-12 12:10:19.000000000 +0100
@@ -120,8 +120,6 @@
 #else
 #define mcheck_init(c) do { } while (0)
 #endif
-extern void stop_mce(void);
-extern void restart_mce(void);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_X86_MCE_H */
Index: linux/arch/x86/kernel/cpu/mcheck/mce_32.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_32.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_32.c	2009-02-12 11:30:51.000000000 +0100
@@ -60,20 +60,6 @@
 	}
 }
 
-static unsigned long old_cr4 __initdata;
-
-void __init stop_mce(void)
-{
-	old_cr4 = read_cr4();
-	clear_in_cr4(X86_CR4_MCE);
-}
-
-void __init restart_mce(void)
-{
-	if (old_cr4 & X86_CR4_MCE)
-		set_in_cr4(X86_CR4_MCE);
-}
-
 static int __init mcheck_disable(char *str)
 {
 	mce_disabled = 1;


* [PATCH] [3/10] x86: MCE: Always use separate work queue to run trigger
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
  2009-02-12 12:37 ` [PATCH] [1/10] x86: MCE: Reinitialize per cpu features on resume v3 Andi Kleen
  2009-02-12 12:37 ` [PATCH] [2/10] x86: MCE: Don't disable machine checks during code patching Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3 Andi Kleen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: Needed for bug fix in next patch

This relaxes the requirement that mce_notify_user has to run in process
context. Useful for future changes, but it also leads to cleaner
behaviour now: mce_notify_user can now be called directly
from interrupt (but not NMI) context.

The work queue uses only a single global work struct, which is safe
because the struct is always free for reuse before the trigger function
executes. This way no events can be lost.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |   25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:21.000000000 +0100
@@ -380,11 +380,17 @@
 	schedule_delayed_work(&mcheck_work, next_interval);
 }
 
+static void mce_do_trigger(struct work_struct *work)
+{
+	call_usermodehelper(trigger, trigger_argv, NULL, UMH_NO_WAIT);
+}
+
+static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
+
 /*
- * This is only called from process context.  This is where we do
- * anything we need to alert userspace about new MCEs.  This is called
- * directly from the poller and also from entry.S and idle, thanks to
- * TIF_MCE_NOTIFY.
+ * Notify the user(s) about new machine check events.
+ * Can be called from interrupt context, but not from machine check/NMI
+ * context.
  */
 int mce_notify_user(void)
 {
@@ -394,9 +400,14 @@
 		unsigned long now = jiffies;
 
 		wake_up_interruptible(&mce_wait);
-		if (trigger[0])
-			call_usermodehelper(trigger, trigger_argv, NULL,
-						UMH_NO_WAIT);
+
+		/*
+		 * There is no risk of missing notifications because
+		 * work_pending is always cleared before the function is
+		 * executed.
+		 */
+		if (trigger[0] && !work_pending(&mce_trigger_work))
+			schedule_work(&mce_trigger_work);
 
 		if (time_after_eq(now, last_print + (check_interval*HZ))) {
 			last_print = now;


* [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (2 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [3/10] x86: MCE: Always use separate work queue to run trigger Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-13  5:27   ` Tim Hockin
  2009-02-12 12:37 ` [PATCH] [5/10] x86: MCE: Don't set up mce sysdev devices with mce=off Andi Kleen
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: thockin, akpm, x86, linux-kernel


Impact: Higher priority bug fix

The machine check poller runs a single timer and then broadcasts an
IPI to all CPUs to check them. This leads to unnecessary
synchronization between CPUs. The original CPU running the timer has
to wait, potentially for a long time, for all other CPUs to answer. This is
also real-time unfriendly and in general inefficient.

This was especially a problem on systems with a lot of events, where
the poller runs at a higher frequency after processing some events.
More and more CPU time could be wasted this way, to the point of
significantly slowing down machines.

The machine check polling is actually fully independent per CPU, so
there's no reason not to just do all of this with per CPU timers. This
patch implements that.

Also switch the poller to use standard timers instead of work
queues. It was using work queues to be able to execute a user program
on an event, but mce_notify_user() now handles this case with a
separate callback. So instead always run the poll code in a
standard per CPU timer, which means that in the common case of not
having to execute a trigger there will be less overhead.

This also allows cleaning up the initialization significantly, because
standard timers are already up when machine checks get initialized, so
no multiple initialization functions are needed.

Thanks to Thomas Gleixner for some help.

Cc: thockin@google.com
v2: Use del_timer_sync() on cpu shutdown and don't try to handle
migrated timers.
v3: Add WARN_ON for timer running on unexpected CPU

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |   68 +++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 23 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:21.000000000 +0100
@@ -353,18 +353,17 @@
 
 static int check_interval = 5 * 60; /* 5 minutes */
 static int next_interval; /* in jiffies */
-static void mcheck_timer(struct work_struct *work);
-static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
+static void mcheck_timer(unsigned long);
+static DEFINE_PER_CPU(struct timer_list, mce_timer);
 
-static void mcheck_check_cpu(void *info)
+static void mcheck_timer(unsigned long data)
 {
+	struct timer_list *t = &per_cpu(mce_timer, data);
+
+	WARN_ON(smp_processor_id() != data);
+
 	if (mce_available(&current_cpu_data))
 		do_machine_check(NULL, 0);
-}
-
-static void mcheck_timer(struct work_struct *work)
-{
-	on_each_cpu(mcheck_check_cpu, NULL, 1);
 
 	/*
 	 * Alert userspace if needed.  If we logged an MCE, reduce the
@@ -377,7 +376,8 @@
 				(int)round_jiffies_relative(check_interval*HZ));
 	}
 
-	schedule_delayed_work(&mcheck_work, next_interval);
+	t->expires = jiffies + next_interval;
+	add_timer(t);
 }
 
 static void mce_do_trigger(struct work_struct *work)
@@ -436,16 +436,11 @@
 
 static __init int periodic_mcheck_init(void)
 {
-	next_interval = check_interval * HZ;
-	if (next_interval)
-		schedule_delayed_work(&mcheck_work,
-				      round_jiffies_relative(next_interval));
-	idle_notifier_register(&mce_idle_notifier);
-	return 0;
+       idle_notifier_register(&mce_idle_notifier);
+       return 0;
 }
 __initcall(periodic_mcheck_init);
 
-
 /*
  * Initialize Machine Checks for a CPU.
  */
@@ -515,6 +510,20 @@
 	}
 }
 
+static void mce_init_timer(void)
+{
+	struct timer_list *t = &__get_cpu_var(mce_timer);
+
+	/* data race harmless because everyone sets to the same value */
+	if (!next_interval)
+		next_interval = check_interval * HZ;
+	if (!next_interval)
+		return;
+	setup_timer(t, mcheck_timer, smp_processor_id());
+	t->expires = round_jiffies_relative(jiffies + next_interval);
+	add_timer(t);
+}
+
 /*
  * Called for each booted CPU to set up machine checks.
  * Must be called with preempt off.
@@ -529,6 +538,7 @@
 
 	mce_init(NULL);
 	mce_cpu_features(c);
+	mce_init_timer();
 }
 
 /*
@@ -735,17 +745,19 @@
 	return 0;
 }
 
+static void mce_cpu_restart(void *data)
+{
+	del_timer_sync(&__get_cpu_var(mce_timer));
+	if (mce_available(&current_cpu_data))
+		mce_init(NULL);
+	mce_init_timer();
+}
+
 /* Reinit MCEs after user configuration changes */
 static void mce_restart(void)
 {
-	if (next_interval)
-		cancel_delayed_work(&mcheck_work);
-	/* Timer race is harmless here */
-	on_each_cpu(mce_init, NULL, 1);
 	next_interval = check_interval * HZ;
-	if (next_interval)
-		schedule_delayed_work(&mcheck_work,
-				      round_jiffies_relative(next_interval));
+	on_each_cpu(mce_cpu_restart, NULL, 1);
 }
 
 static struct sysdev_class mce_sysclass = {
@@ -874,6 +886,7 @@
 				      unsigned long action, void *hcpu)
 {
 	unsigned int cpu = (unsigned long)hcpu;
+	struct timer_list *t = &per_cpu(mce_timer, cpu);
 
 	switch (action) {
 	case CPU_ONLINE:
@@ -888,6 +901,15 @@
 			threshold_cpu_callback(action, cpu);
 		mce_remove_device(cpu);
 		break;
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+		del_timer_sync(t);
+		break;
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+		t->expires = round_jiffies_relative(jiffies + next_interval);
+		add_timer_on(t, cpu);
+		break;
 	}
 	return NOTIFY_OK;
 }


* [PATCH] [5/10] x86: MCE: Don't set up mce sysdev devices with mce=off
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (3 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3 Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [6/10] x86: MCE: Disable machine checks on offlined CPUs Andi Kleen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: bug fix; in this case the resume handler shouldn't run, which
	avoids incorrectly reenabling machine checks on resume

When MCEs are completely disabled on the command line don't set 
up the sysdev devices for them either.

Includes a comment fix from Thomas Gleixner.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:21.000000000 +0100
@@ -151,6 +151,8 @@
 
 static int mce_available(struct cpuinfo_x86 *c)
 {
+	if (mce_dont_init)
+		return 0;
 	return cpu_has(c, X86_FEATURE_MCE) && cpu_has(c, X86_FEATURE_MCA);
 }
 
@@ -532,8 +534,7 @@
 {
 	mce_cpu_quirks(c);
 
-	if (mce_dont_init ||
-	    !mce_available(c))
+	if (!mce_available(c))
 		return;
 
 	mce_init(NULL);
@@ -710,8 +711,7 @@
 	return 1;
 }
 
-/* mce=off disables machine check. Note you can re-enable it later
-   using sysfs.
+/* mce=off disables machine check.
    mce=TOLERANCELEVEL (number, see above)
    mce=bootlog Log MCEs from before booting. Disabled by default on AMD.
    mce=nobootlog Don't log MCEs from before booting. */


* [PATCH] [6/10] x86: MCE: Disable machine checks on offlined CPUs.
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (4 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [5/10] x86: MCE: Don't set up mce sysdev devices with mce=off Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [7/10] x86: MCE: Disable machine checks on suspend v2 Andi Kleen
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: Lower priority bug fix

Offlined CPUs could still get machine checks, but the machine check handler
cannot handle them properly, leading to an unconditional crash. Disable 
machine checks on CPUs that are going down.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:54.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:12:26.000000000 +0100
@@ -881,6 +881,27 @@
 	cpu_clear(cpu, mce_device_initialized);
 }
 
+/* Make sure there are no machine checks on offlined CPUs. */
+static void __cpuexit mce_disable_cpu(void *h)
+{
+	int i;
+
+	if (!mce_available(&current_cpu_data))
+		return;
+	for (i = 0; i < banks; i++)
+		wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
+}
+
+static void __cpuexit mce_reenable_cpu(void *h)
+{
+	int i;
+
+	if (!mce_available(&current_cpu_data))
+		return;
+	for (i = 0; i < banks; i++)
+		wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);
+}
+
 /* Get notified when a cpu comes on/off. Be hotplug friendly. */
 static int __cpuinit mce_cpu_callback(struct notifier_block *nfb,
 				      unsigned long action, void *hcpu)
@@ -904,11 +925,13 @@
 	case CPU_DOWN_PREPARE:
 	case CPU_DOWN_PREPARE_FROZEN:
 		del_timer_sync(t);
+		smp_call_function_single(cpu, mce_disable_cpu, NULL, 1);
 		break;
 	case CPU_DOWN_FAILED:
 	case CPU_DOWN_FAILED_FROZEN:
 		t->expires = round_jiffies_relative(jiffies + next_interval);
 		add_timer_on(t, cpu);
+		smp_call_function_single(cpu, mce_reenable_cpu, NULL, 1);
 		break;
 	}
 	return NOTIFY_OK;


* [PATCH] [7/10] x86: MCE: Disable machine checks on suspend v2
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (5 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [6/10] x86: MCE: Disable machine checks on offlined CPUs Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [8/10] x86: MCE: Use force_sig_info to kill process in machine check Andi Kleen
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: Bug fix

During suspend it is not reliable to process machine check
exceptions, because CPUs disappear but can still get machine check
broadcasts.  Also the system is slightly more likely to get
machine checks during suspend, but the handler is typically not in a
position to handle them in a meaningful way.

So disable them during suspend and enable them during resume.

Also make sure they are always disabled on hot-unplugged CPUs.

This new code assumes that suspend always hot-unplugs all
non-boot CPUs.


v2: Remove the WARN_ONs Thomas objected to.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:20.000000000 +0100
@@ -735,6 +735,29 @@
  * Sysfs support
  */
 
+/*
+ * Disable machine checks on suspend and shutdown. We can't really handle
+ * them later.
+ */
+static int mce_disable(void)
+{
+	int i;
+
+	for (i = 0; i < banks; i++)
+		wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
+	return 0;
+}
+
+static int mce_suspend(struct sys_device *dev, pm_message_t state)
+{
+	return mce_disable();
+}
+
+static int mce_shutdown(struct sys_device *dev)
+{
+	return mce_disable();
+}
+
 /* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
    Only one CPU is active at this time, the others get readded later using
    CPU hotplug. */
@@ -761,6 +784,8 @@
 }
 
 static struct sysdev_class mce_sysclass = {
+	.suspend = mce_suspend,
+	.shutdown = mce_shutdown,
 	.resume = mce_resume,
 	.name = "machinecheck",
 };


* [PATCH] [8/10] x86: MCE: Use force_sig_info to kill process in machine check
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (6 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [7/10] x86: MCE: Disable machine checks on suspend v2 Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [9/10] x86: MCE: Fix a race condition in mce_read() Andi Kleen
  2009-02-12 12:37 ` [PATCH] [10/10] x86: MCE: Fix ifdef for 64bit thermal apic vector clear on shutdown Andi Kleen
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: bug fix (with tolerant == 3)

do_exit cannot be called directly from the exception handler because 
it can sleep and the exception handler runs on the exception stack.  
Use force_sig() instead.

Based on an earlier patch by Ying Huang, who debugged the problem.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:20.000000000 +0100
@@ -297,11 +297,11 @@
 		 * If we know that the error was in user space, send a
 		 * SIGBUS.  Otherwise, panic if tolerance is low.
 		 *
-		 * do_exit() takes an awful lot of locks and has a slight
+		 * force_sig() takes an awful lot of locks and has a slight
 		 * risk of deadlocking.
 		 */
 		if (user_space) {
-			do_exit(SIGBUS);
+			force_sig(SIGBUS, current);
 		} else if (panic_on_oops || tolerant < 2) {
 			mce_panic("Uncorrected machine check",
 				&panicm, mcestart);


* [PATCH] [9/10] x86: MCE: Fix a race condition in mce_read().
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (7 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [8/10] x86: MCE: Use force_sig_info to kill process in machine check Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  2009-02-12 12:37 ` [PATCH] [10/10] x86: MCE: Fix ifdef for 64bit thermal apic vector clear on shutdown Andi Kleen
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: ying.huang, akpm, x86, linux-kernel


From: Huang Ying <ying.huang@intel.com>

Impact: bugfix

Consider the following situation:

before: mcelog.next == 1, mcelog.entry[0].finished = 1

+--------------------------------------------------------------------------
R                   W1                  W2                  W3

read mcelog.next (1)
                    mcelog.next++ (2)
                    (working on entry 1,
                    finished == 0)

mcelog.next = 0
                                        mcelog.next++ (1)
                                        (working on entry 0)
                                                           mcelog.next++ (2)
                                                           (working on entry 1)
                        <----------------- race ---------------->
                    (done on entry 1,
                    finished = 1)
                                                           (done on entry 1,
                                                           finished = 1)

To fix the race condition, a cmpxchg loop is added to mce_read() to
ensure no new MCE record can be added between mcelog.next reading and
mcelog.next = 0.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |   41 +++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 17 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:56.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:11:06.000000000 +0100
@@ -595,7 +595,7 @@
 {
 	unsigned long *cpu_tsc;
 	static DEFINE_MUTEX(mce_read_mutex);
-	unsigned next;
+	unsigned prev, next;
 	char __user *buf = ubuf;
 	int i, err;
 
@@ -614,25 +614,32 @@
 	}
 
 	err = 0;
-	for (i = 0; i < next; i++) {
-		unsigned long start = jiffies;
-
-		while (!mcelog.entry[i].finished) {
-			if (time_after_eq(jiffies, start + 2)) {
-				memset(mcelog.entry + i,0, sizeof(struct mce));
-				goto timeout;
+	prev = 0;
+	do {
+		for (i = prev; i < next; i++) {
+			unsigned long start = jiffies;
+
+			while (!mcelog.entry[i].finished) {
+				if (time_after_eq(jiffies, start + 2)) {
+					memset(mcelog.entry + i, 0,
+					       sizeof(struct mce));
+					goto timeout;
+				}
+				cpu_relax();
 			}
-			cpu_relax();
+			smp_rmb();
+			err |= copy_to_user(buf, mcelog.entry + i,
+					    sizeof(struct mce));
+			buf += sizeof(struct mce);
+timeout:
+			;
 		}
-		smp_rmb();
-		err |= copy_to_user(buf, mcelog.entry + i, sizeof(struct mce));
-		buf += sizeof(struct mce);
- timeout:
-		;
-	}
 
-	memset(mcelog.entry, 0, next * sizeof(struct mce));
-	mcelog.next = 0;
+		memset(mcelog.entry + prev, 0,
+		       (next - prev) * sizeof(struct mce));
+		prev = next;
+		next = cmpxchg(&mcelog.next, prev, 0);
+	} while (next != prev);
 
 	synchronize_sched();
 


* [PATCH] [10/10] x86: MCE: Fix ifdef for 64bit thermal apic vector clear on shutdown
  2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
                   ` (8 preceding siblings ...)
  2009-02-12 12:37 ` [PATCH] [9/10] x86: MCE: Fix a race condition in mce_read() Andi Kleen
@ 2009-02-12 12:37 ` Andi Kleen
  9 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:37 UTC (permalink / raw)
  To: akpm, x86, linux-kernel


Impact: Bugfix

The ifdef for the 64bit Intel thermal APIC vector clear on shutdown
was incorrect and never triggered. Fix that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/apic.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/arch/x86/kernel/apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic.c	2009-02-12 11:30:48.000000000 +0100
+++ linux/arch/x86/kernel/apic.c	2009-02-12 12:10:16.000000000 +0100
@@ -862,7 +862,7 @@
 	}
 
 	/* lets not touch this if we didn't frob it */
-#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(X86_MCE_INTEL)
+#if defined(CONFIG_X86_MCE_P4THERMAL) || defined(CONFIG_X86_MCE_INTEL)
 	if (maxlvt >= 5) {
 		v = apic_read(APIC_LVTTHMR);
 		apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);


* [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3
  2009-02-12 12:39 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
@ 2009-02-12 12:39 ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-02-12 12:39 UTC (permalink / raw)
  To: thockin, akpm, mingo, tglx, hpa, linux-kernel


Impact: Higher priority bug fix

The machine check poller runs a single timer and then broadcasts an
IPI to all CPUs to check them. This leads to unnecessary
synchronization between CPUs. The original CPU running the timer has
to wait, potentially for a long time, for all other CPUs to answer. This is
also real-time unfriendly and in general inefficient.

This was especially a problem on systems with a lot of events, where
the poller runs at a higher frequency after processing some events.
More and more CPU time could be wasted this way, to the point of
significantly slowing down machines.

The machine check polling is actually fully independent per CPU, so
there's no reason not to just do all of this with per CPU timers. This
patch implements that.

Also switch the poller to use standard timers instead of work
queues. It was using work queues to be able to execute a user program
on an event, but mce_notify_user() now handles this case with a
separate callback. So instead always run the poll code in a
standard per CPU timer, which means that in the common case of not
having to execute a trigger there will be less overhead.

This also allows cleaning up the initialization significantly, because
standard timers are already up when machine checks get initialized, so
no multiple initialization functions are needed.

Thanks to Thomas Gleixner for some help.

Cc: thockin@google.com
v2: Use del_timer_sync() on cpu shutdown and don't try to handle
migrated timers.
v3: Add WARN_ON for timer running on unexpected CPU

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce_64.c |   68 +++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 23 deletions(-)

Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c	2009-02-12 12:10:21.000000000 +0100
@@ -353,18 +353,17 @@
 
 static int check_interval = 5 * 60; /* 5 minutes */
 static int next_interval; /* in jiffies */
-static void mcheck_timer(struct work_struct *work);
-static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
+static void mcheck_timer(unsigned long);
+static DEFINE_PER_CPU(struct timer_list, mce_timer);
 
-static void mcheck_check_cpu(void *info)
+static void mcheck_timer(unsigned long data)
 {
+	struct timer_list *t = &per_cpu(mce_timer, data);
+
+	WARN_ON(smp_processor_id() != data);
+
 	if (mce_available(&current_cpu_data))
 		do_machine_check(NULL, 0);
-}
-
-static void mcheck_timer(struct work_struct *work)
-{
-	on_each_cpu(mcheck_check_cpu, NULL, 1);
 
 	/*
 	 * Alert userspace if needed.  If we logged an MCE, reduce the
@@ -377,7 +376,8 @@
 				(int)round_jiffies_relative(check_interval*HZ));
 	}
 
-	schedule_delayed_work(&mcheck_work, next_interval);
+	t->expires = jiffies + next_interval;
+	add_timer(t);
 }
 
 static void mce_do_trigger(struct work_struct *work)
@@ -436,16 +436,11 @@
 
 static __init int periodic_mcheck_init(void)
 {
-	next_interval = check_interval * HZ;
-	if (next_interval)
-		schedule_delayed_work(&mcheck_work,
-				      round_jiffies_relative(next_interval));
-	idle_notifier_register(&mce_idle_notifier);
-	return 0;
+       idle_notifier_register(&mce_idle_notifier);
+       return 0;
 }
 __initcall(periodic_mcheck_init);
 
-
 /*
  * Initialize Machine Checks for a CPU.
  */
@@ -515,6 +510,20 @@
 	}
 }
 
+static void mce_init_timer(void)
+{
+	struct timer_list *t = &__get_cpu_var(mce_timer);
+
+	/* data race harmless because everyone sets to the same value */
+	if (!next_interval)
+		next_interval = check_interval * HZ;
+	if (!next_interval)
+		return;
+	setup_timer(t, mcheck_timer, smp_processor_id());
+	t->expires = round_jiffies_relative(jiffies + next_interval);
+	add_timer(t);
+}
+
 /*
  * Called for each booted CPU to set up machine checks.
  * Must be called with preempt off.
@@ -529,6 +538,7 @@
 
 	mce_init(NULL);
 	mce_cpu_features(c);
+	mce_init_timer();
 }
 
 /*
@@ -735,17 +745,19 @@
 	return 0;
 }
 
+static void mce_cpu_restart(void *data)
+{
+	del_timer_sync(&__get_cpu_var(mce_timer));
+	if (mce_available(&current_cpu_data))
+		mce_init(NULL);
+	mce_init_timer();
+}
+
 /* Reinit MCEs after user configuration changes */
 static void mce_restart(void)
 {
-	if (next_interval)
-		cancel_delayed_work(&mcheck_work);
-	/* Timer race is harmless here */
-	on_each_cpu(mce_init, NULL, 1);
 	next_interval = check_interval * HZ;
-	if (next_interval)
-		schedule_delayed_work(&mcheck_work,
-				      round_jiffies_relative(next_interval));
+	on_each_cpu(mce_cpu_restart, NULL, 1);
 }
 
 static struct sysdev_class mce_sysclass = {
@@ -874,6 +886,7 @@
 				      unsigned long action, void *hcpu)
 {
 	unsigned int cpu = (unsigned long)hcpu;
+	struct timer_list *t = &per_cpu(mce_timer, cpu);
 
 	switch (action) {
 	case CPU_ONLINE:
@@ -888,6 +901,15 @@
 			threshold_cpu_callback(action, cpu);
 		mce_remove_device(cpu);
 		break;
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
+		del_timer_sync(t);
+		break;
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
+		t->expires = round_jiffies_relative(jiffies + next_interval);
+		add_timer_on(t, cpu);
+		break;
 	}
 	return NOTIFY_OK;
 }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU  timer v3
  2009-02-12 12:37 ` [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3 Andi Kleen
@ 2009-02-13  5:27   ` Tim Hockin
  0 siblings, 0 replies; 13+ messages in thread
From: Tim Hockin @ 2009-02-13  5:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akpm, x86, linux-kernel

On Thu, Feb 12, 2009 at 4:37 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Impact: Higher priority bug fix
>
> The machine check poller runs a single timer and then broadcasts an
> IPI to all CPUs to check them. This leads to unnecessary
> synchronization between CPUs. The CPU running the timer has
> to wait potentially a long time for all the other CPUs to answer. This
> is also real-time unfriendly and in general inefficient.
>
> This was especially a problem on systems with a lot of events, where
> the poller runs at a higher frequency after processing some events.
> More and more CPU time could be wasted this way, to the point of
> significantly slowing down machines.
>
> The machine check polling is actually fully independent per CPU, so
> there's no reason not to just do it all with per CPU timers.  This
> patch implements that.

Great!  We're going to patch this in and sanity check it here.  We'll
send you info when we have some results.  It looks good to me.

> Also switch the poller to use standard timers instead of work
> queues. It was using work queues to be able to execute a user program
> on an event, but mce_notify_user() handles this case now with a
> separate callback. So instead always run the poll code in a
> standard per CPU timer, which means that in the common case of not
> having to execute a trigger there will be less overhead.
>
> This allows cleaning up the initialization significantly, because
> standard timers are already up when machine checks are initialized.
> No multiple initialization functions are needed.
>
> Thanks to Thomas Gleixner for some help.
>
> Cc: thockin@google.com
> v2: Use del_timer_sync() on cpu shutdown and don't try to handle
> migrated timers.
> v3: Add WARN_ON for timer running on unexpected CPU
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_64.c |   68 +++++++++++++++++++++++-------------
>  1 file changed, 45 insertions(+), 23 deletions(-)
>
> Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c      2009-02-12 11:30:51.000000000 +0100
> +++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c   2009-02-12 12:10:21.000000000 +0100
> @@ -353,18 +353,17 @@
>
>  static int check_interval = 5 * 60; /* 5 minutes */
>  static int next_interval; /* in jiffies */
> -static void mcheck_timer(struct work_struct *work);
> -static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
> +static void mcheck_timer(unsigned long);
> +static DEFINE_PER_CPU(struct timer_list, mce_timer);
>
> -static void mcheck_check_cpu(void *info)
> +static void mcheck_timer(unsigned long data)
>  {
> +       struct timer_list *t = &per_cpu(mce_timer, data);
> +
> +       WARN_ON(smp_processor_id() != data);
> +
>        if (mce_available(&current_cpu_data))
>                do_machine_check(NULL, 0);
> -}
> -
> -static void mcheck_timer(struct work_struct *work)
> -{
> -       on_each_cpu(mcheck_check_cpu, NULL, 1);
>
>        /*
>         * Alert userspace if needed.  If we logged an MCE, reduce the
> @@ -377,7 +376,8 @@
>                                (int)round_jiffies_relative(check_interval*HZ));
>        }
>
> -       schedule_delayed_work(&mcheck_work, next_interval);
> +       t->expires = jiffies + next_interval;
> +       add_timer(t);
>  }
>
>  static void mce_do_trigger(struct work_struct *work)
> @@ -436,16 +436,11 @@
>
>  static __init int periodic_mcheck_init(void)
>  {
> -       next_interval = check_interval * HZ;
> -       if (next_interval)
> -               schedule_delayed_work(&mcheck_work,
> -                                     round_jiffies_relative(next_interval));
> -       idle_notifier_register(&mce_idle_notifier);
> -       return 0;
> +       idle_notifier_register(&mce_idle_notifier);
> +       return 0;
>  }
>  __initcall(periodic_mcheck_init);
>
> -
>  /*
>  * Initialize Machine Checks for a CPU.
>  */
> @@ -515,6 +510,20 @@
>        }
>  }
>
> +static void mce_init_timer(void)
> +{
> +       struct timer_list *t = &__get_cpu_var(mce_timer);
> +
> +       /* data race harmless because everyone sets to the same value */
> +       if (!next_interval)
> +               next_interval = check_interval * HZ;
> +       if (!next_interval)
> +               return;
> +       setup_timer(t, mcheck_timer, smp_processor_id());
> +       t->expires = round_jiffies_relative(jiffies + next_interval);
> +       add_timer(t);
> +}
> +
>  /*
>  * Called for each booted CPU to set up machine checks.
>  * Must be called with preempt off.
> @@ -529,6 +538,7 @@
>
>        mce_init(NULL);
>        mce_cpu_features(c);
> +       mce_init_timer();
>  }
>
>  /*
> @@ -735,17 +745,19 @@
>        return 0;
>  }
>
> +static void mce_cpu_restart(void *data)
> +{
> +       del_timer_sync(&__get_cpu_var(mce_timer));
> +       if (mce_available(&current_cpu_data))
> +               mce_init(NULL);
> +       mce_init_timer();
> +}
> +
>  /* Reinit MCEs after user configuration changes */
>  static void mce_restart(void)
>  {
> -       if (next_interval)
> -               cancel_delayed_work(&mcheck_work);
> -       /* Timer race is harmless here */
> -       on_each_cpu(mce_init, NULL, 1);
>        next_interval = check_interval * HZ;
> -       if (next_interval)
> -               schedule_delayed_work(&mcheck_work,
> -                                     round_jiffies_relative(next_interval));
> +       on_each_cpu(mce_cpu_restart, NULL, 1);
>  }
>
>  static struct sysdev_class mce_sysclass = {
> @@ -874,6 +886,7 @@
>                                      unsigned long action, void *hcpu)
>  {
>        unsigned int cpu = (unsigned long)hcpu;
> +       struct timer_list *t = &per_cpu(mce_timer, cpu);
>
>        switch (action) {
>        case CPU_ONLINE:
> @@ -888,6 +901,15 @@
>                        threshold_cpu_callback(action, cpu);
>                mce_remove_device(cpu);
>                break;
> +       case CPU_DOWN_PREPARE:
> +       case CPU_DOWN_PREPARE_FROZEN:
> +               del_timer_sync(t);
> +               break;
> +       case CPU_DOWN_FAILED:
> +       case CPU_DOWN_FAILED_FROZEN:
> +               t->expires = round_jiffies_relative(jiffies + next_interval);
> +               add_timer_on(t, cpu);
> +               break;
>        }
>        return NOTIFY_OK;
>  }
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-02-13  5:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-12 12:37 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
2009-02-12 12:37 ` [PATCH] [1/10] x86: MCE: Reinitialize per cpu features on resume v3 Andi Kleen
2009-02-12 12:37 ` [PATCH] [2/10] x86: MCE: Don't disable machine checks during code patching Andi Kleen
2009-02-12 12:37 ` [PATCH] [3/10] x86: MCE: Always use separate work queue to run trigger Andi Kleen
2009-02-12 12:37 ` [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3 Andi Kleen
2009-02-13  5:27   ` Tim Hockin
2009-02-12 12:37 ` [PATCH] [5/10] x86: MCE: Don't set up mce sysdev devices with mce=off Andi Kleen
2009-02-12 12:37 ` [PATCH] [6/10] x86: MCE: Disable machine checks on offlined CPUs Andi Kleen
2009-02-12 12:37 ` [PATCH] [7/10] x86: MCE: Disable machine checks on suspend v2 Andi Kleen
2009-02-12 12:37 ` [PATCH] [8/10] x86: MCE: Use force_sig_info to kill process in machine check Andi Kleen
2009-02-12 12:37 ` [PATCH] [9/10] x86: MCE: Fix a race condition in mce_read() Andi Kleen
2009-02-12 12:37 ` [PATCH] [10/10] x86: MCE: Fix ifdef for 64bit thermal apic vector clear on shutdown Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2009-02-12 12:39 [PATCH] [0/10] x86: MCE: machine check bug fix series Andi Kleen
2009-02-12 12:39 ` [PATCH] [4/10] x86: MCE: Switch machine check polling to per CPU timer v3 Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox