* [PATCH] [1/9] x86: CMCI: Export MAX_NR_BANKS
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [2/9] x86: CMCI: Factor out threshold interrupt handler Andi Kleen
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Move MAX_NR_BANKS into mce.h because it's needed there
for followup patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/mce.h | 6 ++++++
arch/x86/kernel/cpu/mcheck/mce_64.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)
Index: linux/arch/x86/include/asm/mce.h
===================================================================
--- linux.orig/arch/x86/include/asm/mce.h 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/include/asm/mce.h 2009-02-12 12:10:18.000000000 +0100
@@ -95,6 +95,12 @@
DECLARE_PER_CPU(struct sys_device, device_mce);
extern void (*threshold_cpu_callback)(unsigned long action, unsigned int cpu);
+/*
+ * To support more than 128 would need to escape the predefined
+ * Linux defined extended banks first.
+ */
+#define MAX_NR_BANKS (MCE_EXTENDED_BANK - 1)
+
#ifdef CONFIG_X86_MCE_INTEL
void mce_intel_feature_init(struct cpuinfo_x86 *c);
#else
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 12:10:18.000000000 +0100
@@ -37,12 +37,6 @@
#define MISC_MCELOG_MINOR 227
-/*
- * To support more than 128 would need to escape the predefined
- * Linux defined extended banks first.
- */
-#define MAX_NR_BANKS (MCE_EXTENDED_BANK - 1)
-
atomic_t mce_entry;
static int mce_dont_init;
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [2/9] x86: CMCI: Factor out threshold interrupt handler
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
2009-02-12 12:49 ` [PATCH] [1/9] x86: CMCI: Export MAX_NR_BANKS Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [3/9] x86: CMCI: Avoid potential reentry of threshold interrupt Andi Kleen
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Impact: cleanup; preparation for feature
The mce_amd_64 code has its own private MC threshold vector with its own
interrupt handler. Since Intel needs a similar handler
it makes sense to share the vector because both cannot
be active at the same time.
I factored the common APIC handler code into a separate file which can
be used by both the Intel or AMD MC code.
This is needed for the next patch which adds an Intel specific
CMCI handler.
This patch should be a nop for AMD, it just moves some code
around.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/Kconfig | 5 +++++
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mcheck/Makefile | 1 +
arch/x86/kernel/cpu/mcheck/mce_amd_64.c | 15 ++++++---------
arch/x86/kernel/cpu/mcheck/threshold.c | 24 ++++++++++++++++++++++++
5 files changed, 38 insertions(+), 9 deletions(-)
Index: linux/arch/x86/kernel/cpu/mcheck/threshold.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/arch/x86/kernel/cpu/mcheck/threshold.c 2009-02-12 12:10:18.000000000 +0100
@@ -0,0 +1,24 @@
+/* Common corrected MCE threshold handler code */
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <asm/mce.h>
+#include <asm/irq_vectors.h>
+#include <asm/idle.h>
+
+static void default_threshold_interrupt(void)
+{
+ printk(KERN_ERR "Unexpected threshold interrupt at vector %x\n",
+ THRESHOLD_APIC_VECTOR);
+}
+
+void (*mce_threshold_vector)(void) = default_threshold_interrupt;
+
+asmlinkage void mce_threshold_interrupt(void)
+{
+ ack_APIC_irq();
+ exit_idle();
+ irq_enter();
+ inc_irq_stat(irq_threshold_count);
+ mce_threshold_vector();
+ irq_exit();
+}
Index: linux/arch/x86/Kconfig
===================================================================
--- linux.orig/arch/x86/Kconfig 2009-02-12 11:30:47.000000000 +0100
+++ linux/arch/x86/Kconfig 2009-02-12 11:30:51.000000000 +0100
@@ -751,6 +751,11 @@
Additional support for AMD specific MCE features such as
the DRAM Error Threshold.
+config X86_MCE_THRESHOLD
+ depends on X86_MCE_AMD || X86_MCE_INTEL
+ bool
+ default y
+
config X86_MCE_NONFATAL
tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
depends on X86_32 && X86_MCE
Index: linux/arch/x86/kernel/cpu/mcheck/Makefile
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/Makefile 2009-02-12 11:30:47.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/Makefile 2009-02-12 11:30:51.000000000 +0100
@@ -4,3 +4,4 @@
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel_64.o
obj-$(CONFIG_X86_MCE_AMD) += mce_amd_64.o
obj-$(CONFIG_X86_MCE_NONFATAL) += non-fatal.o
+obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
Index: linux/arch/x86/include/asm/mce.h
===================================================================
--- linux.orig/arch/x86/include/asm/mce.h 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/include/asm/mce.h 2009-02-12 12:10:17.000000000 +0100
@@ -135,5 +135,7 @@
#define mcheck_init(c) do { } while (0)
#endif
+extern void (*mce_threshold_vector)(void);
+
#endif /* __KERNEL__ */
#endif /* _ASM_X86_MCE_H */
Index: linux/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_amd_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_amd_64.c 2009-02-12 12:10:17.000000000 +0100
@@ -79,6 +79,8 @@
static DEFINE_PER_CPU(unsigned char, bank_map); /* see which banks are on */
+static void amd_threshold_interrupt(void);
+
/*
* CPU Initialization
*/
@@ -174,6 +176,8 @@
tr.reset = 0;
tr.old_limit = 0;
threshold_restart_bank(&tr);
+
+ mce_threshold_vector = amd_threshold_interrupt;
}
}
}
@@ -187,16 +191,12 @@
* the interrupt goes off when error_count reaches threshold_limit.
* the handler will simply log mcelog w/ software defined bank number.
*/
-asmlinkage void mce_threshold_interrupt(void)
+static void amd_threshold_interrupt(void)
{
unsigned int bank, block;
struct mce m;
u32 low = 0, high = 0, address = 0;
- ack_APIC_irq();
- exit_idle();
- irq_enter();
-
mce_setup(&m);
/* assume first bank caused it */
@@ -241,13 +241,10 @@
+ bank * NR_BLOCKS
+ block;
mce_log(&m);
- goto out;
+ return;
}
}
}
-out:
- inc_irq_stat(irq_threshold_count);
- irq_exit();
}
/*
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [3/9] x86: CMCI: Avoid potential reentry of threshold interrupt
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
2009-02-12 12:49 ` [PATCH] [1/9] x86: CMCI: Export MAX_NR_BANKS Andi Kleen
2009-02-12 12:49 ` [PATCH] [2/9] x86: CMCI: Factor out threshold interrupt handler Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [4/9] x86: MCE: Replace machine check events logged interval with ratelimit Andi Kleen
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Impact: minor bugfix
The threshold handler on AMD (and soon on Intel) could be theoretically
reentered by the hardware. This could lead to corrupted events
because the machine check poll code assumes it is not reentered.
Move the APIC ACK to the end of the interrupt handler to let
the hardware avoid that.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/kernel/cpu/mcheck/threshold.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux/arch/x86/kernel/cpu/mcheck/threshold.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/threshold.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/threshold.c 2009-02-12 11:30:51.000000000 +0100
@@ -15,10 +15,11 @@
asmlinkage void mce_threshold_interrupt(void)
{
- ack_APIC_irq();
exit_idle();
irq_enter();
inc_irq_stat(irq_threshold_count);
mce_threshold_vector();
irq_exit();
+ /* Ack only at the end to avoid potential reentry */
+ ack_APIC_irq();
}
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [4/9] x86: MCE: Replace machine check events logged interval with ratelimit
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
` (2 preceding siblings ...)
2009-02-12 12:49 ` [PATCH] [3/9] x86: CMCI: Avoid potential reentry of threshold interrupt Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [5/9] x86: CMCI: Use polled banks bitmap in machine check poller Andi Kleen
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Impact: feature
Use a standard leaky bucket ratelimit for the machine check
warning print interval instead of waiting every check_interval.
Also decrease the limit to twice per minute.
This interacts better with threshold interrupts because
they can happen more often than check_interval.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 12:10:17.000000000 +0100
@@ -28,6 +28,7 @@
#include <linux/kdebug.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>
+#include <linux/ratelimit.h>
#include <asm/processor.h>
#include <asm/msr.h>
#include <asm/mce.h>
@@ -488,11 +489,11 @@
*/
int mce_notify_user(void)
{
+ /* Not more than two messages every minute */
+ static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
+
clear_thread_flag(TIF_MCE_NOTIFY);
if (test_and_clear_bit(0, ¬ify_user)) {
- static unsigned long last_print;
- unsigned long now = jiffies;
-
wake_up_interruptible(&mce_wait);
/*
@@ -503,10 +504,8 @@
if (trigger[0] && !work_pending(&mce_trigger_work))
schedule_work(&mce_trigger_work);
- if (time_after_eq(now, last_print + (check_interval*HZ))) {
- last_print = now;
+ if (__ratelimit(&ratelimit))
printk(KERN_INFO "Machine check events logged\n");
- }
return 1;
}
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [5/9] x86: CMCI: Use polled banks bitmap in machine check poller
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
` (3 preceding siblings ...)
2009-02-12 12:49 ` [PATCH] [4/9] x86: MCE: Replace machine check events logged interval with ratelimit Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [6/9] x86: CMCI: Define MSR names and fields for new CMCI registers Andi Kleen
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Define a per cpu bitmap that contains the banks polled by the machine
check poller. This is needed for the CMCI code in the next patches
to be able to disable polling on specific banks.
The bitmap by default contains all banks, so there is no behaviour
change. Only future code will remove some banks from the polling
set.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/mce.h | 5 ++++-
arch/x86/kernel/cpu/mcheck/mce_64.c | 16 ++++++++++++----
arch/x86/kernel/cpu/mcheck/mce_amd_64.c | 3 ++-
3 files changed, 18 insertions(+), 6 deletions(-)
Index: linux/arch/x86/include/asm/mce.h
===================================================================
--- linux.orig/arch/x86/include/asm/mce.h 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/include/asm/mce.h 2009-02-12 12:10:17.000000000 +0100
@@ -119,11 +119,14 @@
extern void do_machine_check(struct pt_regs *, long);
+typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
+DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
+
enum mcp_flags {
MCP_TIMESTAMP = (1 << 0), /* log time stamp */
MCP_UC = (1 << 1), /* log uncorrected errors */
};
-extern void machine_check_poll(enum mcp_flags flags);
+extern void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
extern int mce_notify_user(void);
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 12:10:17.000000000 +0100
@@ -62,6 +62,11 @@
static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
+/* MCA banks polled by the period polling timer for corrected events */
+DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
+ [0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
+};
+
/* Do initial initialization of a struct mce */
void mce_setup(struct mce *m)
{
@@ -191,7 +196,7 @@
*
* This is executed in standard interrupt context.
*/
-void machine_check_poll(enum mcp_flags flags)
+void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int i;
@@ -200,7 +205,7 @@
rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
for (i = 0; i < banks; i++) {
- if (!bank[i])
+ if (!bank[i] || !test_bit(i, *b))
continue;
m.misc = 0;
@@ -458,7 +463,8 @@
WARN_ON(smp_processor_id() != data);
if (mce_available(¤t_cpu_data))
- machine_check_poll(MCP_TIMESTAMP);
+ machine_check_poll(MCP_TIMESTAMP,
+ &__get_cpu_var(mce_poll_banks));
/*
* Alert userspace if needed. If we logged an MCE, reduce the
@@ -567,11 +573,13 @@
{
u64 cap;
int i;
+ mce_banks_t all_banks;
/*
* Log the machine checks left over from the previous reset.
*/
- machine_check_poll(MCP_UC);
+ bitmap_fill(all_banks, MAX_NR_BANKS);
+ machine_check_poll(MCP_UC, &all_banks);
set_in_cr4(X86_CR4_MCE);
Index: linux/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_amd_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_amd_64.c 2009-02-12 11:30:51.000000000 +0100
@@ -231,7 +231,8 @@
/* Log the machine check that caused the threshold
event. */
- machine_check_poll(MCP_TIMESTAMP);
+ machine_check_poll(MCP_TIMESTAMP,
+ &__get_cpu_var(mce_poll_banks));
if (high & MASK_OVERFLOW_HI) {
rdmsrl(address, m.misc);
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [6/9] x86: CMCI: Define MSR names and fields for new CMCI registers
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
` (4 preceding siblings ...)
2009-02-12 12:49 ` [PATCH] [5/9] x86: CMCI: Use polled banks bitmap in machine check poller Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [7/9] x86: CMCI: Add CMCI support Andi Kleen
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
CMCI means support for raising an interrupt on a corrected machine
check event instead of having to poll for it. It's a new feature in
Intel Nehalem CPUs available on some machine check banks.
For details see the IA32 SDM Vol3a 14.5
Define the registers for it as a preparation for further patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/apicdef.h | 1 +
arch/x86/include/asm/mce.h | 2 ++
arch/x86/include/asm/msr-index.h | 5 +++++
3 files changed, 8 insertions(+)
Index: linux/arch/x86/include/asm/msr-index.h
===================================================================
--- linux.orig/arch/x86/include/asm/msr-index.h 2009-02-12 11:30:45.000000000 +0100
+++ linux/arch/x86/include/asm/msr-index.h 2009-02-12 11:30:51.000000000 +0100
@@ -77,6 +77,11 @@
#define MSR_IA32_MC0_ADDR 0x00000402
#define MSR_IA32_MC0_MISC 0x00000403
+/* These are consecutive and not in the normal 4er MCE bank block */
+#define MSR_IA32_MC0_CTL2 0x00000280
+#define CMCI_EN (1ULL << 30)
+#define CMCI_THRESHOLD_MASK 0xffffULL
+
#define MSR_P6_PERFCTR0 0x000000c1
#define MSR_P6_PERFCTR1 0x000000c2
#define MSR_P6_EVNTSEL0 0x00000186
Index: linux/arch/x86/include/asm/mce.h
===================================================================
--- linux.orig/arch/x86/include/asm/mce.h 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/include/asm/mce.h 2009-02-12 12:10:17.000000000 +0100
@@ -11,6 +11,8 @@
*/
#define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */
+#define MCG_EXT_P (1ULL<<9) /* Extended registers available */
+#define MCG_CMCI_P (1ULL<<10) /* CMCI supported */
#define MCG_STATUS_RIPV (1UL<<0) /* restart ip valid */
#define MCG_STATUS_EIPV (1UL<<1) /* ip points to correct instruction */
Index: linux/arch/x86/include/asm/apicdef.h
===================================================================
--- linux.orig/arch/x86/include/asm/apicdef.h 2009-02-12 11:30:45.000000000 +0100
+++ linux/arch/x86/include/asm/apicdef.h 2009-02-12 11:30:51.000000000 +0100
@@ -53,6 +53,7 @@
#define APIC_ESR_SENDILL 0x00020
#define APIC_ESR_RECVILL 0x00040
#define APIC_ESR_ILLREGA 0x00080
+#define APIC_LVTCMCI 0x2f0
#define APIC_ICR 0x300
#define APIC_DEST_SELF 0x40000
#define APIC_DEST_ALLINC 0x80000
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [7/9] x86: CMCI: Add CMCI support
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
` (5 preceding siblings ...)
2009-02-12 12:49 ` [PATCH] [6/9] x86: CMCI: Define MSR names and fields for new CMCI registers Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [8/9] x86: CMCI: Disable CMCI on rebooting Andi Kleen
2009-02-12 12:49 ` [PATCH] [9/9] x86: CMCI: Recheck CMCI banks after APIC has been enabled on CPU #0 Andi Kleen
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Intel CMCI (Corrected Machine Check Interrupt) is a new
feature on Nehalem CPUs. It allows the CPU to trigger
interrupts on corrected events, which allows faster
reaction to them instead of with the traditional
polling timer.
Also use CMCI to discover shared banks. Machine check banks
can be shared by CPU threads or even cores. Using the CMCI enable
bit it is possible to detect the fact that another CPU already
saw a specific bank. Use this to assign shared banks only
to one CPU to avoid reporting duplicated events.
On CPU hot unplug, bank sharing is rediscovered. This is done
using a thread that cycles through all the CPUs.
To avoid races between the poller and CMCI we only poll
for banks that are not CMCI capable and only check CMCI
owned banks on an interrupt.
The shared banks ownership information is currently only used for
CMCI interrupts, not polled banks.
The sharing discovery code follows the algorithm recommended in the
IA32 SDM Vol3a 14.5.2.1
The CMCI interrupt handler just calls the machine check poller to
pick up the machine check event that caused the interrupt.
I decided not to implement a separate threshold event like
the AMD version has, because the threshold is always one currently
and adding another event didn't seem to add any value.
Some code inspired by Yunhong Jiang's Xen implementation,
which was in turn inspired by an earlier CMCI implementation
by me.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/mce.h | 10 +
arch/x86/kernel/cpu/mcheck/mce_64.c | 16 +-
arch/x86/kernel/cpu/mcheck/mce_intel_64.c | 205 ++++++++++++++++++++++++++++++
3 files changed, 228 insertions(+), 3 deletions(-)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_intel_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_intel_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_intel_64.c 2009-02-12 12:10:16.000000000 +0100
@@ -1,6 +1,8 @@
/*
* Intel specific MCE features.
* Copyright 2004 Zwane Mwaikambo <zwane@linuxpower.ca>
+ * Copyright (C) 2008, 2009 Intel Corporation
+ * Author: Andi Kleen
*/
#include <linux/init.h>
@@ -12,6 +14,7 @@
#include <asm/hw_irq.h>
#include <asm/idle.h>
#include <asm/therm_throt.h>
+#include <asm/apic.h>
asmlinkage void smp_thermal_interrupt(void)
{
@@ -84,7 +87,209 @@
return;
}
+/*
+ * Support for Intel Correct Machine Check Interrupts. This allows
+ * the CPU to raise an interrupt when a corrected machine check happened.
+ * Normally we pick those up using a regular polling timer.
+ * Also supports reliable discovery of shared banks.
+ */
+
+static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
+
+/*
+ * cmci_discover_lock protects against parallel discovery attempts
+ * which could race against each other.
+ */
+static DEFINE_SPINLOCK(cmci_discover_lock);
+
+#define CMCI_THRESHOLD 1
+
+static __cpuinit int cmci_supported(int *banks)
+{
+ u64 cap;
+
+ /*
+ * Vendor check is not strictly needed, but the initial
+ * initialization is vendor keyed and this
+ * makes sure none of the backdoors are entered otherwise.
+ */
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return 0;
+ if (!cpu_has_apic || lapic_get_maxlvt() < 6)
+ return 0;
+ rdmsrl(MSR_IA32_MCG_CAP, cap);
+ *banks = min_t(unsigned, MAX_NR_BANKS, cap & 0xff);
+ return !!(cap & MCG_CMCI_P);
+}
+
+/*
+ * The interrupt handler. This is called on every event.
+ * Just call the poller directly to log any events.
+ * This could in theory increase the threshold under high load,
+ * but doesn't for now.
+ */
+static void intel_threshold_interrupt(void)
+{
+ machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
+ mce_notify_user();
+}
+
+static void print_update(char *type, int *hdr, int num)
+{
+ if (*hdr == 0)
+ printk(KERN_INFO "CPU %d MCA banks", smp_processor_id());
+ *hdr = 1;
+ printk(KERN_CONT " %s:%d", type, num);
+}
+
+/*
+ * Enable CMCI (Corrected Machine Check Interrupt) for available MCE banks
+ * on this CPU. Use the algorithm recommended in the SDM to discover shared
+ * banks.
+ */
+static __cpuinit void cmci_discover(int banks, int boot)
+{
+ unsigned long *owned = (void *)&__get_cpu_var(mce_banks_owned);
+ int hdr = 0;
+ int i;
+
+ spin_lock(&cmci_discover_lock);
+ for (i = 0; i < banks; i++) {
+ u64 val;
+
+ if (test_bit(i, owned))
+ continue;
+
+ rdmsrl(MSR_IA32_MC0_CTL2 + i, val);
+
+ /* Already owned by someone else? */
+ if (val & CMCI_EN) {
+ if (test_and_clear_bit(i, owned) || boot)
+ print_update("SHD", &hdr, i);
+ __clear_bit(i, __get_cpu_var(mce_poll_banks));
+ continue;
+ }
+
+ val |= CMCI_EN | CMCI_THRESHOLD;
+ wrmsrl(MSR_IA32_MC0_CTL2 + i, val);
+ rdmsrl(MSR_IA32_MC0_CTL2 + i, val);
+
+ /* Did the enable bit stick? -- the bank supports CMCI */
+ if (val & CMCI_EN) {
+ if (!test_and_set_bit(i, owned) || boot)
+ print_update("CMCI", &hdr, i);
+ __clear_bit(i, __get_cpu_var(mce_poll_banks));
+ } else {
+ WARN_ON(!test_bit(i, __get_cpu_var(mce_poll_banks)));
+ }
+ }
+ spin_unlock(&cmci_discover_lock);
+ if (hdr)
+ printk(KERN_CONT "\n");
+}
+
+/*
+ * Just in case we missed an event during initialization check
+ * all the CMCI owned banks.
+ */
+__cpuinit void cmci_recheck(void)
+{
+ unsigned long flags;
+ int banks;
+
+ if (!mce_available(¤t_cpu_data) || !cmci_supported(&banks))
+ return;
+ local_irq_save(flags);
+ machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_banks_owned));
+ local_irq_restore(flags);
+}
+
+/*
+ * Disable CMCI on this CPU for all banks it owns when it goes down.
+ * This allows other CPUs to claim the banks on rediscovery.
+ */
+void __cpuexit cmci_clear(void)
+{
+ int i;
+ int banks;
+ u64 val;
+
+ if (!cmci_supported(&banks))
+ return;
+ spin_lock(&cmci_discover_lock);
+ for (i = 0; i < banks; i++) {
+ if (!test_bit(i, __get_cpu_var(mce_banks_owned)))
+ continue;
+ /* Disable CMCI */
+ rdmsrl(MSR_IA32_MC0_CTL2 + i, val);
+ val &= ~(CMCI_EN|CMCI_THRESHOLD_MASK);
+ wrmsrl(MSR_IA32_MC0_CTL2 + i, val);
+ __clear_bit(i, __get_cpu_var(mce_banks_owned));
+ }
+ spin_unlock(&cmci_discover_lock);
+}
+
+/*
+ * After a CPU went down cycle through all the others and rediscover
+ * Must run in process context.
+ */
+void __cpuexit cmci_rediscover(int dying)
+{
+ int banks;
+ int cpu;
+ cpumask_var_t old;
+
+ if (!cmci_supported(&banks))
+ return;
+ if (!alloc_cpumask_var(&old, GFP_KERNEL))
+ return;
+ cpumask_copy(old, ¤t->cpus_allowed);
+
+ for_each_online_cpu (cpu) {
+ if (cpu == dying)
+ continue;
+ if (set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu)))
+ continue;
+ /* Recheck banks in case CPUs don't all have the same */
+ if (cmci_supported(&banks))
+ cmci_discover(banks, 0);
+ }
+
+ set_cpus_allowed_ptr(current, old);
+ free_cpumask_var(old);
+}
+
+/*
+ * Reenable CMCI on this CPU in case a CPU down failed.
+ */
+void cmci_reenable(void)
+{
+ int banks;
+ if (cmci_supported(&banks))
+ cmci_discover(banks, 0);
+}
+
+static __cpuinit void intel_init_cmci(void)
+{
+ int banks;
+
+ if (!cmci_supported(&banks))
+ return;
+
+ mce_threshold_vector = intel_threshold_interrupt;
+ cmci_discover(banks, 1);
+ /*
+ * For CPU #0 this runs with still disabled APIC, but that's
+ * ok because only the vector is set up. We still do another
+ * check for the banks later for CPU #0 just to make sure
+ * to not miss any events.
+ */
+ apic_write(APIC_LVTCMCI, THRESHOLD_APIC_VECTOR|APIC_DM_FIXED);
+ cmci_recheck();
+}
+
void __cpuinit mce_intel_feature_init(struct cpuinfo_x86 *c)
{
intel_init_thermal(c);
+ intel_init_cmci();
}
Index: linux/arch/x86/include/asm/mce.h
===================================================================
--- linux.orig/arch/x86/include/asm/mce.h 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/include/asm/mce.h 2009-02-12 11:30:51.000000000 +0100
@@ -105,8 +105,16 @@
#ifdef CONFIG_X86_MCE_INTEL
void mce_intel_feature_init(struct cpuinfo_x86 *c);
+void cmci_clear(void);
+void cmci_reenable(void);
+void cmci_rediscover(int dying);
+void cmci_recheck(void);
#else
static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
+static inline void cmci_clear(void) {}
+static inline void cmci_reenable(void) {}
+static inline void cmci_rediscover(int dying) {}
+static inline void cmci_recheck(void) {}
#endif
#ifdef CONFIG_X86_MCE_AMD
@@ -115,6 +123,8 @@
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
#endif
+extern int mce_available(struct cpuinfo_x86 *c);
+
void mce_log_therm_throt_event(__u64 status);
extern atomic_t mce_entry;
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 11:30:51.000000000 +0100
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c 2009-02-12 12:10:16.000000000 +0100
@@ -166,7 +166,7 @@
panic(msg);
}
-static int mce_available(struct cpuinfo_x86 *c)
+int mce_available(struct cpuinfo_x86 *c)
{
if (mce_dont_init)
return 0;
@@ -1052,9 +1052,12 @@
static void __cpuexit mce_disable_cpu(void *h)
{
int i;
+ unsigned long action = *(unsigned long *)h;
if (!mce_available(¤t_cpu_data))
return;
+ if (!(action & CPU_TASKS_FROZEN))
+ cmci_clear();
for (i = 0; i < banks; i++)
wrmsrl(MSR_IA32_MC0_CTL + i*4, 0);
}
@@ -1062,9 +1065,12 @@
static void __cpuexit mce_reenable_cpu(void *h)
{
int i;
+ unsigned long action = *(unsigned long *)h;
if (!mce_available(¤t_cpu_data))
return;
+ if (!(action & CPU_TASKS_FROZEN))
+ cmci_reenable();
for (i = 0; i < banks; i++)
wrmsrl(MSR_IA32_MC0_CTL + i*4, bank[i]);
}
@@ -1092,13 +1098,17 @@
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
del_timer_sync(t);
- smp_call_function_single(cpu, mce_disable_cpu, NULL, 1);
+ smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
break;
case CPU_DOWN_FAILED:
case CPU_DOWN_FAILED_FROZEN:
t->expires = round_jiffies_relative(jiffies + next_interval);
add_timer_on(t, cpu);
- smp_call_function_single(cpu, mce_reenable_cpu, NULL, 1);
+ smp_call_function_single(cpu, mce_reenable_cpu, &action, 1);
+ break;
+ case CPU_POST_DEAD:
+ /* intentionally ignoring frozen here */
+ cmci_rediscover(cpu);
break;
}
return NOTIFY_OK;
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [8/9] x86: CMCI: Disable CMCI on rebooting
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
` (6 preceding siblings ...)
2009-02-12 12:49 ` [PATCH] [7/9] x86: CMCI: Add CMCI support Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
2009-02-12 12:49 ` [PATCH] [9/9] x86: CMCI: Recheck CMCI banks after APIC has been enabled on CPU #0 Andi Kleen
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
Disable the CMCI vector on reboot to avoid confusing other OS.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/kernel/apic.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux/arch/x86/kernel/apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic.c 2009-02-12 12:02:27.000000000 +0100
+++ linux/arch/x86/kernel/apic.c 2009-02-12 12:10:16.000000000 +0100
@@ -868,6 +868,14 @@
apic_write(APIC_LVTTHMR, v | APIC_LVT_MASKED);
}
#endif
+#ifdef CONFIG_X86_MCE_INTEL
+ if (maxlvt >= 6) {
+ v = apic_read(APIC_LVTCMCI);
+ if (!(v & APIC_LVT_MASKED))
+ apic_write(APIC_LVTCMCI, v | APIC_LVT_MASKED);
+ }
+#endif
+
/*
* Clean APIC state for other OSs:
*/
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH] [9/9] x86: CMCI: Recheck CMCI banks after APIC has been enabled on CPU #0
2009-02-12 12:49 [PATCH] [0/9] x86: CMCI: Add support for Intel CMCI Andi Kleen
` (7 preceding siblings ...)
2009-02-12 12:49 ` [PATCH] [8/9] x86: CMCI: Disable CMCI on rebooting Andi Kleen
@ 2009-02-12 12:49 ` Andi Kleen
8 siblings, 0 replies; 10+ messages in thread
From: Andi Kleen @ 2009-02-12 12:49 UTC (permalink / raw)
To: akpm, mingo, tglx, hpa, linux-kernel
On the first CPU, machine checks are enabled early, before
the local APIC is enabled. This could in theory lead
to some lost CMCI events very early during boot because
CMCIs cannot be delivered while the LAPIC is disabled.
The poller also doesn't recover from this because it doesn't
check CMCI banks.
Add an explicit CMCI banks check after the LAPIC is enabled.
This is only done for CPU #0, the other CPUs only initialize
machine checks after the LAPIC is on.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/kernel/apic.c | 7 +++++++
1 file changed, 7 insertions(+)
Index: linux/arch/x86/kernel/apic.c
===================================================================
--- linux.orig/arch/x86/kernel/apic.c 2009-02-12 12:03:15.000000000 +0100
+++ linux/arch/x86/kernel/apic.c 2009-02-12 12:03:15.000000000 +0100
@@ -48,6 +48,7 @@
#include <asm/apic.h>
#include <asm/i8259.h>
#include <asm/smp.h>
+#include <asm/mce.h>
#include <mach_apic.h>
#include <mach_apicdef.h>
@@ -1270,6 +1271,12 @@
apic_write(APIC_LVT1, value);
preempt_enable();
+
+#ifdef CONFIG_X86_MCE_INTEL
+ /* Recheck CMCI information after local APIC is up on CPU #0 */
+ if (smp_processor_id() == 0)
+ cmci_recheck();
+#endif
}
void __cpuinit end_local_APIC_setup(void)
^ permalink raw reply [flat|nested] 10+ messages in thread