linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup
@ 2006-05-09 20:50 dzickus
  2006-05-09 20:50 ` [patch 1/8] nmi watchdog header cleanup dzickus
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

A major cleanup of the nmi watchdog on the i386 and x86_64 archs.

This little project started out as a way to get unknown_nmi_panic working on
an x86_64 SMP machine. But after Andi rejected my dirty little hack ;), it
grew into a spring cleaning of the nmi watchdog.

The fix Andi wanted led to a conflict with how oprofile worked.  This led to
a reservation layer around the performance counters and a bunch of changes
in oprofile startup.  All the reservation layer consists of is reserving a
bit that corresponds to a performance counter.  The intent here is to
prevent subsystems like oprofile and nmi watchdog from stomping on each
other when trying to manipulate these counters. Currently, oprofile must
stop the nmi watchdog before using the counters.  This approach does not
encourage any sharing. :) 

During the testing of this reservation layer, I ran into more and more bugs
from old hacks (notably in the SMP cases).  As a result I found myself
rewriting a lot of this code.  I think I straightened everything out, except
for enabling/disabling the nmi watchdog on an ioapic.  The
enable_irq/disable_irq macros don't seem to work for me and I don't know
enough about the subsystem to figure it out either. 

Of course the oprofile code is shared with the i386 arch, so all the changes
I did in x86_64 had to be duplicated over there too.

I tested these changes on a Xeon (UP/SMP), P4 (32/64, UP/SMP) and under
various boot-up defaults (nmi-watchdog=0,1,2).  I also worked on some of
these changes with Andi Kleen and the oprofile folks, so these ideas
shouldn't be too surprising, just perhaps the implementation. :o

Any thoughts/feedback/criticism to clean this up more and probably fix some of my
coding style would be appreciated.

Cheers,
Don

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 1/8] nmi watchdog header cleanup
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 20:50 ` [patch 2/8] Add performance counter reservation framework for UP kernels dzickus
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-header-cleanup.patch --]
[-- Type: text/plain, Size: 8896 bytes --]


Misc header cleanup for nmi watchdog.

Signed-off-by: Don Zickus <dzickus@redhat.com>

Index: linux-don/arch/i386/kernel/apic.c
===================================================================
--- linux-don.orig/arch/i386/kernel/apic.c
+++ linux-don/arch/i386/kernel/apic.c
@@ -36,6 +36,7 @@
 #include <asm/arch_hooks.h>
 #include <asm/hpet.h>
 #include <asm/i8253.h>
+#include <asm/nmi.h>
 
 #include <mach_apic.h>
 #include <mach_apicdef.h>
Index: linux-don/arch/i386/kernel/io_apic.c
===================================================================
--- linux-don.orig/arch/i386/kernel/io_apic.c
+++ linux-don/arch/i386/kernel/io_apic.c
@@ -38,6 +38,7 @@
 #include <asm/desc.h>
 #include <asm/timer.h>
 #include <asm/i8259.h>
+#include <asm/nmi.h>
 
 #include <mach_apic.h>
 
Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -14,20 +14,15 @@
  */
 
 #include <linux/config.h>
-#include <linux/mm.h>
 #include <linux/delay.h>
-#include <linux/bootmem.h>
-#include <linux/smp_lock.h>
 #include <linux/interrupt.h>
-#include <linux/mc146818rtc.h>
-#include <linux/kernel_stat.h>
 #include <linux/module.h>
 #include <linux/nmi.h>
 #include <linux/sysdev.h>
 #include <linux/sysctl.h>
+#include <linux/percpu.h>
 
 #include <asm/smp.h>
-#include <asm/div64.h>
 #include <asm/nmi.h>
 
 #include "mach_traps.h"
Index: linux-don/arch/i386/kernel/smpboot.c
===================================================================
--- linux-don.orig/arch/i386/kernel/smpboot.c
+++ linux-don/arch/i386/kernel/smpboot.c
@@ -52,6 +52,7 @@
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
 #include <asm/arch_hooks.h>
+#include <asm/nmi.h>
 
 #include <mach_apic.h>
 #include <mach_wakecpu.h>
Index: linux-don/arch/i386/oprofile/op_model_athlon.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_model_athlon.c
+++ linux-don/arch/i386/oprofile/op_model_athlon.c
@@ -13,6 +13,7 @@
 #include <linux/oprofile.h>
 #include <asm/ptrace.h>
 #include <asm/msr.h>
+#include <asm/nmi.h>
  
 #include "op_x86_model.h"
 #include "op_counter.h"
Index: linux-don/arch/i386/oprofile/op_model_p4.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_model_p4.c
+++ linux-don/arch/i386/oprofile/op_model_p4.c
@@ -14,6 +14,7 @@
 #include <asm/ptrace.h>
 #include <asm/fixmap.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 
 #include "op_x86_model.h"
 #include "op_counter.h"
Index: linux-don/arch/i386/oprofile/op_model_ppro.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_model_ppro.c
+++ linux-don/arch/i386/oprofile/op_model_ppro.c
@@ -14,6 +14,7 @@
 #include <asm/ptrace.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
  
 #include "op_x86_model.h"
 #include "op_counter.h"
Index: linux-don/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/io_apic.c
+++ linux-don/arch/x86_64/kernel/io_apic.c
@@ -41,6 +41,7 @@
 #include <asm/mach_apic.h>
 #include <asm/acpi.h>
 #include <asm/dma.h>
+#include <asm/nmi.h>
 
 #define __apicdebuginit  __init
 
Index: linux-don/include/asm-i386/apic.h
===================================================================
--- linux-don.orig/include/asm-i386/apic.h
+++ linux-don/include/asm-i386/apic.h
@@ -112,24 +112,12 @@ extern void init_apic_mappings (void);
 extern void smp_local_timer_interrupt (struct pt_regs * regs);
 extern void setup_boot_APIC_clock (void);
 extern void setup_secondary_APIC_clock (void);
-extern void setup_apic_nmi_watchdog (void);
-extern int reserve_lapic_nmi(void);
-extern void release_lapic_nmi(void);
-extern void disable_timer_nmi_watchdog(void);
-extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs);
 extern int APIC_init_uniprocessor (void);
 extern void disable_APIC_timer(void);
 extern void enable_APIC_timer(void);
 
 extern void enable_NMI_through_LVT0 (void * dummy);
 
-extern unsigned int nmi_watchdog;
-#define NMI_NONE	0
-#define NMI_IO_APIC	1
-#define NMI_LOCAL_APIC	2
-#define NMI_INVALID	3
-
 extern int disable_timer_pin_1;
 
 void smp_send_timer_broadcast_ipi(struct pt_regs *regs);
Index: linux-don/include/asm-i386/nmi.h
===================================================================
--- linux-don.orig/include/asm-i386/nmi.h
+++ linux-don/include/asm-i386/nmi.h
@@ -5,24 +5,38 @@
 #define ASM_NMI_H
 
 #include <linux/pm.h>
- 
+
 struct pt_regs;
- 
+
 typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu);
- 
-/** 
+
+/**
  * set_nmi_callback
  *
  * Set a handler for an NMI. Only one handler may be
  * set. Return 1 if the NMI was handled.
  */
 void set_nmi_callback(nmi_callback_t callback);
- 
-/** 
+
+/**
  * unset_nmi_callback
  *
  * Remove the handler previously set.
  */
 void unset_nmi_callback(void);
- 
+
+extern void setup_apic_nmi_watchdog (void);
+extern int reserve_lapic_nmi(void);
+extern void release_lapic_nmi(void);
+extern void disable_timer_nmi_watchdog(void);
+extern void enable_timer_nmi_watchdog(void);
+extern void nmi_watchdog_tick (struct pt_regs * regs);
+
+extern unsigned int nmi_watchdog;
+#define NMI_DEFAULT     -1
+#define NMI_NONE	0
+#define NMI_IO_APIC	1
+#define NMI_LOCAL_APIC	2
+#define NMI_INVALID	3
+
 #endif /* ASM_NMI_H */
Index: linux-don/include/asm-x86_64/apic.h
===================================================================
--- linux-don.orig/include/asm-x86_64/apic.h
+++ linux-don/include/asm-x86_64/apic.h
@@ -80,27 +80,11 @@ extern void init_apic_mappings (void);
 extern void smp_local_timer_interrupt (struct pt_regs * regs);
 extern void setup_boot_APIC_clock (void);
 extern void setup_secondary_APIC_clock (void);
-extern void setup_apic_nmi_watchdog (void);
-extern int reserve_lapic_nmi(void);
-extern void release_lapic_nmi(void);
-extern void disable_timer_nmi_watchdog(void);
-extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
 extern int APIC_init_uniprocessor (void);
 extern void disable_APIC_timer(void);
 extern void enable_APIC_timer(void);
 extern void clustered_apic_check(void);
 
-extern void nmi_watchdog_default(void);
-extern int setup_nmi_watchdog(char *);
-
-extern unsigned int nmi_watchdog;
-#define NMI_DEFAULT	-1
-#define NMI_NONE	0
-#define NMI_IO_APIC	1
-#define NMI_LOCAL_APIC	2
-#define NMI_INVALID	3
-
 extern int disable_timer_pin_1;
 
 extern void setup_threshold_lvt(unsigned long lvt_off);
Index: linux-don/include/asm-x86_64/nmi.h
===================================================================
--- linux-don.orig/include/asm-x86_64/nmi.h
+++ linux-don/include/asm-x86_64/nmi.h
@@ -5,26 +5,27 @@
 #define ASM_NMI_H
 
 #include <linux/pm.h>
+#include <asm/io.h>
  
 struct pt_regs;
- 
+
 typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu);
- 
-/** 
+
+/**
  * set_nmi_callback
  *
  * Set a handler for an NMI. Only one handler may be
  * set. Return 1 if the NMI was handled.
  */
 void set_nmi_callback(nmi_callback_t callback);
- 
-/** 
+
+/**
  * unset_nmi_callback
  *
  * Remove the handler previously set.
  */
 void unset_nmi_callback(void);
- 
+
 #ifdef CONFIG_PM
  
 /** Replace the PM callback routine for NMI. */
@@ -56,4 +57,21 @@ extern int unknown_nmi_panic;
 
 extern int check_nmi_watchdog(void);
  
+extern void setup_apic_nmi_watchdog (void);
+extern int reserve_lapic_nmi(void);
+extern void release_lapic_nmi(void);
+extern void disable_timer_nmi_watchdog(void);
+extern void enable_timer_nmi_watchdog(void);
+extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
+
+extern void nmi_watchdog_default(void);
+extern int setup_nmi_watchdog(char *);
+
+extern unsigned int nmi_watchdog;
+#define NMI_DEFAULT	-1
+#define NMI_NONE	0
+#define NMI_IO_APIC	1
+#define NMI_LOCAL_APIC	2
+#define NMI_INVALID	3
+
 #endif /* ASM_NMI_H */
Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -15,11 +15,7 @@
 #include <linux/config.h>
 #include <linux/mm.h>
 #include <linux/delay.h>
-#include <linux/bootmem.h>
-#include <linux/smp_lock.h>
 #include <linux/interrupt.h>
-#include <linux/mc146818rtc.h>
-#include <linux/kernel_stat.h>
 #include <linux/module.h>
 #include <linux/sysdev.h>
 #include <linux/nmi.h>
@@ -27,13 +23,9 @@
 #include <linux/kprobes.h>
 
 #include <asm/smp.h>
-#include <asm/mtrr.h>
-#include <asm/mpspec.h>
 #include <asm/nmi.h>
-#include <asm/msr.h>
 #include <asm/proto.h>
 #include <asm/kdebug.h>
-#include <asm/local.h>
 #include <asm/mce.h>
 
 /*

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 2/8] Add performance counter reservation framework for UP kernels
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
  2006-05-09 20:50 ` [patch 1/8] nmi watchdog header cleanup dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 20:50 ` [patch 3/8] Utilize performance counter reservation framework in oprofile dzickus
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-reservation-UP.patch --]
[-- Type: text/plain, Size: 16687 bytes --]


Adds basic infrastructure to allow subsystems to reserve performance
counters on the x86 chips.  Only UP kernels are supported in this patch to
make reviewing easier.  The SMP portion makes a lot more changes. 

Think of this as a locking mechanism where each bit represents a different
counter.  In addition, each subsystem should also reserve an appropriate
event selection register that will correspond to the performance counter it
will be using (this is mainly neccessary for the Pentium 4 chips as they
break the 1:1 relationship to performance counters). 

This will help prevent subsystems like oprofile from interfering with the
nmi watchdog. 

Signed-off-by:  Don Zickus <dzickus@redhat.com>

Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -28,6 +28,20 @@
 #include <asm/kdebug.h>
 #include <asm/mce.h>
 
+/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
+ * evtsel_nmi_owner tracks the ownership of the event selection
+ * - different performance counters/ event selection may be reserved for
+ *   different subsystems this reservation system just tries to coordinate
+ *   things a little
+ */
+static DEFINE_PER_CPU(unsigned, perfctr_nmi_owner);
+static DEFINE_PER_CPU(unsigned, evntsel_nmi_owner[2]);
+
+/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
+ * offset from MSR_P4_BSU_ESCR0.  It will be the max for all platforms (for now)
+ */
+#define NMI_MAX_COUNTER_BITS 66
+
 /*
  * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
  * - it may be reserved by some other driver, or not
@@ -91,6 +105,95 @@ static unsigned int nmi_p4_cccr_val;
 	(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT|	\
 	 P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
 
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
+{
+	/* returns the bit offset of the performance counter register */
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		return (msr - MSR_K7_PERFCTR0);
+	case X86_VENDOR_INTEL:
+		return (msr - MSR_P4_BPU_PERFCTR0);
+	}
+	return 0;
+}
+
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
+{
+	/* returns the bit offset of the event selection register */
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		return (msr - MSR_K7_EVNTSEL0);
+	case X86_VENDOR_INTEL:
+		return (msr - MSR_P4_BSU_ESCR0);
+	}
+	return 0;
+}
+
+/* checks for a bit availability (hack for oprofile) */
+int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
+{
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+/* checks the an msr for availability */
+int avail_to_resrv_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner)))
+		return 1;
+	return 0;
+}
+
+void release_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	clear_bit(counter, &__get_cpu_var(perfctr_nmi_owner));
+}
+
+int reserve_evntsel_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_evntsel_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	if (!test_and_set_bit(counter, &__get_cpu_var(evntsel_nmi_owner)))
+		return 1;
+	return 0;
+}
+
+void release_evntsel_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_evntsel_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner));
+}
+
 static __cpuinit inline int nmi_known_cpu(void)
 {
 	switch (boot_cpu_data.x86_vendor) {
@@ -326,34 +429,22 @@ late_initcall(init_lapic_nmi_sysfs);
 
 #endif	/* CONFIG_PM */
 
-/*
- * Activate the NMI watchdog via the local APIC.
- * Original code written by Keith Owens.
- */
-
-static void clear_msr_range(unsigned int base, unsigned int n)
+static int setup_k7_watchdog(void)
 {
-	unsigned int i;
-
-	for(i = 0; i < n; ++i)
-		wrmsr(base+i, 0, 0);
-}
-
-static void setup_k7_watchdog(void)
-{
-	int i;
 	unsigned int evntsel;
 
 	nmi_perfctr_msr = MSR_K7_PERFCTR0;
 
-	for(i = 0; i < 4; ++i) {
-		/* Simulator may not support it */
-		if (checking_wrmsrl(MSR_K7_EVNTSEL0+i, 0UL)) {
-			nmi_perfctr_msr = 0;
-			return;
-		}
-		wrmsrl(MSR_K7_PERFCTR0+i, 0UL);
-	}
+	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+		goto fail;
+
+	if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0))
+		goto fail1;
+
+	/* Simulator may not support it */
+	if (checking_wrmsrl(MSR_K7_EVNTSEL0, 0UL))
+		goto fail2;
+	wrmsrl(MSR_K7_PERFCTR0, 0UL);
 
 	evntsel = K7_EVNTSEL_INT
 		| K7_EVNTSEL_OS
@@ -365,6 +456,13 @@ static void setup_k7_watchdog(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
+	return 1;
+fail2:
+	release_evntsel_nmi(MSR_K7_EVNTSEL0);
+fail1:
+	release_perfctr_nmi(nmi_perfctr_msr);
+fail:
+	return 0;
 }
 
 
@@ -383,22 +481,11 @@ static int setup_p4_watchdog(void)
 		nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1;
 #endif
 
-	if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL))
-		clear_msr_range(0x3F1, 2);
-	/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
-	   docs doesn't fully define it, so leave it alone for now. */
-	if (boot_cpu_data.x86_model >= 0x3) {
-		/* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
-		clear_msr_range(0x3A0, 26);
-		clear_msr_range(0x3BC, 3);
-	} else {
-		clear_msr_range(0x3A0, 31);
-	}
-	clear_msr_range(0x3C0, 6);
-	clear_msr_range(0x3C8, 6);
-	clear_msr_range(0x3E0, 2);
-	clear_msr_range(MSR_P4_CCCR0, 18);
-	clear_msr_range(MSR_P4_PERFCTR0, 18);
+	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+		goto fail;
+
+	if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
+		goto fail1;
 
 	wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
 	wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
@@ -407,6 +494,10 @@ static int setup_p4_watchdog(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
 	return 1;
+fail1:
+	release_perfctr_nmi(nmi_perfctr_msr);
+fail:
+	return 0;
 }
 
 void setup_apic_nmi_watchdog(void)
@@ -417,7 +508,8 @@ void setup_apic_nmi_watchdog(void)
 			return;
 		if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
 			return;
-		setup_k7_watchdog();
+		if (!setup_k7_watchdog())
+			return;
 		break;
 	case X86_VENDOR_INTEL:
 		if (boot_cpu_data.x86 != 15)
@@ -587,6 +679,12 @@ int proc_unknown_nmi_panic(struct ctl_ta
 
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(release_perfctr_nmi);
+EXPORT_SYMBOL(reserve_evntsel_nmi);
+EXPORT_SYMBOL(release_evntsel_nmi);
 EXPORT_SYMBOL(reserve_lapic_nmi);
 EXPORT_SYMBOL(release_lapic_nmi);
 EXPORT_SYMBOL(disable_timer_nmi_watchdog);
Index: linux-don/include/asm-x86_64/nmi.h
===================================================================
--- linux-don.orig/include/asm-x86_64/nmi.h
+++ linux-don/include/asm-x86_64/nmi.h
@@ -56,7 +56,13 @@ extern int panic_on_timeout;
 extern int unknown_nmi_panic;
 
 extern int check_nmi_watchdog(void);
- 
+extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
+extern int avail_to_resrv_perfctr_nmi(unsigned int);
+extern int reserve_perfctr_nmi(unsigned int);
+extern void release_perfctr_nmi(unsigned int);
+extern int reserve_evntsel_nmi(unsigned int);
+extern void release_evntsel_nmi(unsigned int);
+
 extern void setup_apic_nmi_watchdog (void);
 extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -34,6 +34,20 @@ static unsigned int nmi_perfctr_msr;	/* 
 static unsigned int nmi_p4_cccr_val;
 extern void show_registers(struct pt_regs *regs);
 
+/* perfctr_nmi_owner tracks the ownership of the perfctr registers:
+ * evtsel_nmi_owner tracks the ownership of the event selection
+ * - different performance counters/ event selection may be reserved for
+ *   different subsystems this reservation system just tries to coordinate
+ *   things a little
+ */
+static DEFINE_PER_CPU(unsigned long, perfctr_nmi_owner);
+static DEFINE_PER_CPU(unsigned long, evntsel_nmi_owner[3]);
+
+/* this number is calculated from Intel's MSR_P4_CRU_ESCR5 register and it's
+ * offset from MSR_P4_BSU_ESCR0.  It will be the max for all platforms (for now)
+ */
+#define NMI_MAX_COUNTER_BITS 66
+
 /*
  * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
  * - it may be reserved by some other driver, or not
@@ -95,6 +109,105 @@ int nmi_active;
 	(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT|	\
 	 P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
 
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
+{
+	/* returns the bit offset of the performance counter register */
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		return (msr - MSR_K7_PERFCTR0);
+	case X86_VENDOR_INTEL:
+		switch (boot_cpu_data.x86) {
+		case 6:
+			return (msr - MSR_P6_PERFCTR0);
+		case 15:
+			return (msr - MSR_P4_BPU_PERFCTR0);
+		}
+	}
+	return 0;
+}
+
+/* converts an msr to an appropriate reservation bit */
+static inline unsigned int nmi_evntsel_msr_to_bit(unsigned int msr)
+{
+	/* returns the bit offset of the event selection register */
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		return (msr - MSR_K7_EVNTSEL0);
+	case X86_VENDOR_INTEL:
+		switch (boot_cpu_data.x86) {
+		case 6:
+			return (msr - MSR_P6_EVNTSEL0);
+		case 15:
+			return (msr - MSR_P4_BSU_ESCR0);
+		}
+	}
+	return 0;
+}
+
+/* checks for a bit availability (hack for oprofile) */
+int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
+{
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+/* checks the an msr for availability */
+int avail_to_resrv_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	return (!test_bit(counter, &__get_cpu_var(perfctr_nmi_owner)));
+}
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	if (!test_and_set_bit(counter, &__get_cpu_var(perfctr_nmi_owner)))
+		return 1;
+	return 0;
+}
+
+void release_perfctr_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_perfctr_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	clear_bit(counter, &__get_cpu_var(perfctr_nmi_owner));
+}
+
+int reserve_evntsel_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_evntsel_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	if (!test_and_set_bit(counter, &__get_cpu_var(evntsel_nmi_owner[0])))
+		return 1;
+	return 0;
+}
+
+void release_evntsel_nmi(unsigned int msr)
+{
+	unsigned int counter;
+
+	counter = nmi_evntsel_msr_to_bit(msr);
+	BUG_ON(counter > NMI_MAX_COUNTER_BITS);
+
+	clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner[0]));
+}
+
 #ifdef CONFIG_SMP
 /* The performance counters used by NMI_LOCAL_APIC don't trigger when
  * the CPU is idle. To make sure the NMI watchdog really ticks on all
@@ -344,14 +457,6 @@ late_initcall(init_lapic_nmi_sysfs);
  * Original code written by Keith Owens.
  */
 
-static void clear_msr_range(unsigned int base, unsigned int n)
-{
-	unsigned int i;
-
-	for(i = 0; i < n; ++i)
-		wrmsr(base+i, 0, 0);
-}
-
 static void write_watchdog_counter(const char *descr)
 {
 	u64 count = (u64)cpu_khz * 1000;
@@ -362,14 +467,19 @@ static void write_watchdog_counter(const
 	wrmsrl(nmi_perfctr_msr, 0 - count);
 }
 
-static void setup_k7_watchdog(void)
+static int setup_k7_watchdog(void)
 {
 	unsigned int evntsel;
 
 	nmi_perfctr_msr = MSR_K7_PERFCTR0;
 
-	clear_msr_range(MSR_K7_EVNTSEL0, 4);
-	clear_msr_range(MSR_K7_PERFCTR0, 4);
+	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+		goto fail;
+
+	if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0))
+		goto fail1;
+
+	wrmsrl(MSR_K7_PERFCTR0, 0UL);
 
 	evntsel = K7_EVNTSEL_INT
 		| K7_EVNTSEL_OS
@@ -381,16 +491,24 @@ static void setup_k7_watchdog(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
 	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
+	return 1;
+fail1:
+	release_perfctr_nmi(nmi_perfctr_msr);
+fail:
+	return 0;
 }
 
-static void setup_p6_watchdog(void)
+static int setup_p6_watchdog(void)
 {
 	unsigned int evntsel;
 
 	nmi_perfctr_msr = MSR_P6_PERFCTR0;
 
-	clear_msr_range(MSR_P6_EVNTSEL0, 2);
-	clear_msr_range(MSR_P6_PERFCTR0, 2);
+	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+		goto fail;
+
+	if (!reserve_evntsel_nmi(MSR_P6_EVNTSEL0))
+		goto fail1;
 
 	evntsel = P6_EVNTSEL_INT
 		| P6_EVNTSEL_OS
@@ -402,6 +520,11 @@ static void setup_p6_watchdog(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= P6_EVNTSEL0_ENABLE;
 	wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
+	return 1;
+fail1:
+	release_perfctr_nmi(nmi_perfctr_msr);
+fail:
+	return 0;
 }
 
 static int setup_p4_watchdog(void)
@@ -419,22 +542,11 @@ static int setup_p4_watchdog(void)
 		nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1;
 #endif
 
-	if (!(misc_enable & MSR_P4_MISC_ENABLE_PEBS_UNAVAIL))
-		clear_msr_range(0x3F1, 2);
-	/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
-	   docs doesn't fully define it, so leave it alone for now. */
-	if (boot_cpu_data.x86_model >= 0x3) {
-		/* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
-		clear_msr_range(0x3A0, 26);
-		clear_msr_range(0x3BC, 3);
-	} else {
-		clear_msr_range(0x3A0, 31);
-	}
-	clear_msr_range(0x3C0, 6);
-	clear_msr_range(0x3C8, 6);
-	clear_msr_range(0x3E0, 2);
-	clear_msr_range(MSR_P4_CCCR0, 18);
-	clear_msr_range(MSR_P4_PERFCTR0, 18);
+	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+		goto fail;
+
+	if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
+		goto fail1;
 
 	wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
 	wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
@@ -442,6 +554,10 @@ static int setup_p4_watchdog(void)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
 	return 1;
+fail1:
+	release_perfctr_nmi(nmi_perfctr_msr);
+fail:
+	return 0;
 }
 
 void setup_apic_nmi_watchdog (void)
@@ -450,7 +566,8 @@ void setup_apic_nmi_watchdog (void)
 	case X86_VENDOR_AMD:
 		if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
 			return;
-		setup_k7_watchdog();
+		if (!setup_k7_watchdog())
+			return;
 		break;
 	case X86_VENDOR_INTEL:
 		switch (boot_cpu_data.x86) {
@@ -458,7 +575,8 @@ void setup_apic_nmi_watchdog (void)
 			if (boot_cpu_data.x86_model > 0xd)
 				return;
 
-			setup_p6_watchdog();
+			if(!setup_p6_watchdog())
+				return;
 			break;
 		case 15:
 			if (boot_cpu_data.x86_model > 0x4)
@@ -611,6 +729,12 @@ int proc_unknown_nmi_panic(ctl_table *ta
 
 EXPORT_SYMBOL(nmi_active);
 EXPORT_SYMBOL(nmi_watchdog);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi);
+EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(release_perfctr_nmi);
+EXPORT_SYMBOL(reserve_evntsel_nmi);
+EXPORT_SYMBOL(release_evntsel_nmi);
 EXPORT_SYMBOL(reserve_lapic_nmi);
 EXPORT_SYMBOL(release_lapic_nmi);
 EXPORT_SYMBOL(disable_timer_nmi_watchdog);
Index: linux-don/include/asm-i386/nmi.h
===================================================================
--- linux-don.orig/include/asm-i386/nmi.h
+++ linux-don/include/asm-i386/nmi.h
@@ -25,6 +25,13 @@ void set_nmi_callback(nmi_callback_t cal
  */
 void unset_nmi_callback(void);
 
+extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
+extern int avail_to_resrv_perfctr_nmi(unsigned int);
+extern int reserve_perfctr_nmi(unsigned int);
+extern void release_perfctr_nmi(unsigned int);
+extern int reserve_evntsel_nmi(unsigned int);
+extern void release_evntsel_nmi(unsigned int);
+
 extern void setup_apic_nmi_watchdog (void);
 extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 3/8] Utilize performance counter reservation framework in oprofile
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
  2006-05-09 20:50 ` [patch 1/8] nmi watchdog header cleanup dzickus
  2006-05-09 20:50 ` [patch 2/8] Add performance counter reservation framework for UP kernels dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 20:50 ` [patch 4/8] Add SMP support on x86_64 to reservation framework dzickus
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-oprofile-reservation.patch --]
[-- Type: text/plain, Size: 20049 bytes --]


Incorporates the new performance counter reservation system in oprofile.
Also cleans up a lot of the initialization code.  The code original zero'd
out every register associated with performance counters regardless if those
registers were used or not.  This causes issues with the nmi watchdog. 

Signed-off-by:  Don Zickus <dzickus@redhat.com


Index: linux-don/arch/i386/oprofile/nmi_int.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/nmi_int.c
+++ linux-don/arch/i386/oprofile/nmi_int.c
@@ -97,15 +97,19 @@ static void nmi_cpu_save_registers(struc
 	unsigned int i;
 
 	for (i = 0; i < nr_ctrs; ++i) {
-		rdmsr(counters[i].addr,
-			counters[i].saved.low,
-			counters[i].saved.high);
+		if (counters[i].addr){
+			rdmsr(counters[i].addr,
+				counters[i].saved.low,
+				counters[i].saved.high);
+		}
 	}
  
 	for (i = 0; i < nr_ctrls; ++i) {
-		rdmsr(controls[i].addr,
-			controls[i].saved.low,
-			controls[i].saved.high);
+		if (controls[i].addr){
+			rdmsr(controls[i].addr,
+				controls[i].saved.low,
+				controls[i].saved.high);
+		}
 	}
 }
 
@@ -204,15 +208,19 @@ static void nmi_restore_registers(struct
 	unsigned int i;
 
 	for (i = 0; i < nr_ctrls; ++i) {
-		wrmsr(controls[i].addr,
-			controls[i].saved.low,
-			controls[i].saved.high);
+		if (controls[i].addr){
+			wrmsr(controls[i].addr,
+				controls[i].saved.low,
+				controls[i].saved.high);
+		}
 	}
  
 	for (i = 0; i < nr_ctrs; ++i) {
-		wrmsr(counters[i].addr,
-			counters[i].saved.low,
-			counters[i].saved.high);
+		if (counters[i].addr){
+			wrmsr(counters[i].addr,
+				counters[i].saved.low,
+				counters[i].saved.high);
+		}
 	}
 }
  
@@ -233,6 +241,7 @@ static void nmi_cpu_shutdown(void * dumm
 	apic_write(APIC_LVTPC, saved_lvtpc[cpu]);
 	apic_write(APIC_LVTERR, v);
 	nmi_restore_registers(msrs);
+	model->shutdown(msrs);
 }
 
  
@@ -283,6 +292,14 @@ static int nmi_create_files(struct super
 		struct dentry * dir;
 		char buf[2];
  
+ 		/* quick little hack to _not_ expose a counter if it is not
+		 * available for use.  This should protect userspace app.
+		 * NOTE:  assumes 1:1 mapping here (that counters are organized
+		 *        sequentially in their struct assignment).
+		 */
+		if (unlikely(!avail_to_resrv_perfctr_nmi_bit(i)))
+			continue;
+
 		snprintf(buf, 2, "%d", i);
 		dir = oprofilefs_mkdir(sb, root, buf);
 		oprofilefs_create_ulong(sb, dir, "enabled", &counter_config[i].enabled); 
Index: linux-don/arch/i386/oprofile/op_model_athlon.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_model_athlon.c
+++ linux-don/arch/i386/oprofile/op_model_athlon.c
@@ -21,10 +21,12 @@
 #define NUM_COUNTERS 4
 #define NUM_CONTROLS 4
 
+#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0)
 #define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0)
 #define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(unsigned int)(l), -1);} while (0)
 #define CTR_OVERFLOWED(n) (!((n) & (1U<<31)))
 
+#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0)
 #define CTRL_READ(l,h,msrs,c) do {rdmsr(msrs->controls[(c)].addr, (l), (h));} while (0)
 #define CTRL_WRITE(l,h,msrs,c) do {wrmsr(msrs->controls[(c)].addr, (l), (h));} while (0)
 #define CTRL_SET_ACTIVE(n) (n |= (1<<22))
@@ -40,15 +42,21 @@ static unsigned long reset_value[NUM_COU
  
 static void athlon_fill_in_addresses(struct op_msrs * const msrs)
 {
-	msrs->counters[0].addr = MSR_K7_PERFCTR0;
-	msrs->counters[1].addr = MSR_K7_PERFCTR1;
-	msrs->counters[2].addr = MSR_K7_PERFCTR2;
-	msrs->counters[3].addr = MSR_K7_PERFCTR3;
-
-	msrs->controls[0].addr = MSR_K7_EVNTSEL0;
-	msrs->controls[1].addr = MSR_K7_EVNTSEL1;
-	msrs->controls[2].addr = MSR_K7_EVNTSEL2;
-	msrs->controls[3].addr = MSR_K7_EVNTSEL3;
+	int i;
+
+	for (i=0; i < NUM_COUNTERS; i++) {
+		if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i))
+			msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
+		else
+			msrs->counters[i].addr = 0;
+	}
+
+	for (i=0; i < NUM_CONTROLS; i++) {
+		if (reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i))
+			msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
+		else
+			msrs->controls[i].addr = 0;
+	}
 }
 
  
@@ -59,19 +67,23 @@ static void athlon_setup_ctrs(struct op_
  
 	/* clear all counters */
 	for (i = 0 ; i < NUM_CONTROLS; ++i) {
+		if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+			continue;
 		CTRL_READ(low, high, msrs, i);
 		CTRL_CLEAR(low);
 		CTRL_WRITE(low, high, msrs, i);
 	}
-	
+
 	/* avoid a false detection of ctr overflows in NMI handler */
 	for (i = 0; i < NUM_COUNTERS; ++i) {
+		if (unlikely(!CTR_IS_RESERVED(msrs,i)))
+			continue;
 		CTR_WRITE(1, msrs, i);
 	}
 
 	/* enable active counters */
 	for (i = 0; i < NUM_COUNTERS; ++i) {
-		if (counter_config[i].enabled) {
+		if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) {
 			reset_value[i] = counter_config[i].count;
 
 			CTR_WRITE(counter_config[i].count, msrs, i);
@@ -98,6 +110,8 @@ static int athlon_check_ctrs(struct pt_r
 	int i;
 
 	for (i = 0 ; i < NUM_COUNTERS; ++i) {
+		if (!reset_value[i])
+			continue;
 		CTR_READ(low, high, msrs, i);
 		if (CTR_OVERFLOWED(low)) {
 			oprofile_add_sample(regs, i);
@@ -132,12 +146,27 @@ static void athlon_stop(struct op_msrs c
 	/* Subtle: stop on all counters to avoid race with
 	 * setting our pm callback */
 	for (i = 0 ; i < NUM_COUNTERS ; ++i) {
+		if (!reset_value[i])
+			continue;
 		CTRL_READ(low, high, msrs, i);
 		CTRL_SET_INACTIVE(low);
 		CTRL_WRITE(low, high, msrs, i);
 	}
 }
 
+static void athlon_shutdown(struct op_msrs const * const msrs)
+{
+	int i;
+
+	for (i = 0 ; i < NUM_COUNTERS ; ++i) {
+		if (CTR_IS_RESERVED(msrs,i))
+			release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+	}
+	for (i = 0 ; i < NUM_CONTROLS ; ++i) {
+		if (CTRL_IS_RESERVED(msrs,i))
+			release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
+	}
+}
 
 struct op_x86_model_spec const op_athlon_spec = {
 	.num_counters = NUM_COUNTERS,
@@ -146,5 +175,6 @@ struct op_x86_model_spec const op_athlon
 	.setup_ctrs = &athlon_setup_ctrs,
 	.check_ctrs = &athlon_check_ctrs,
 	.start = &athlon_start,
-	.stop = &athlon_stop
+	.stop = &athlon_stop,
+	.shutdown = &athlon_shutdown
 };
Index: linux-don/arch/i386/oprofile/op_model_p4.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_model_p4.c
+++ linux-don/arch/i386/oprofile/op_model_p4.c
@@ -32,7 +32,7 @@
 #define NUM_CONTROLS_HT2 (NUM_ESCRS_HT2 + NUM_CCCRS_HT2)
 
 static unsigned int num_counters = NUM_COUNTERS_NON_HT;
-
+static unsigned int num_controls = NUM_CONTROLS_NON_HT;
 
 /* this has to be checked dynamically since the
    hyper-threadedness of a chip is discovered at
@@ -40,8 +40,10 @@ static unsigned int num_counters = NUM_C
 static inline void setup_num_counters(void)
 {
 #ifdef CONFIG_SMP
-	if (smp_num_siblings == 2)
+	if (smp_num_siblings == 2){
 		num_counters = NUM_COUNTERS_HT2;
+		num_controls = NUM_CONTROLS_HT2;
+	}
 #endif
 }
 
@@ -97,15 +99,6 @@ static struct p4_counter_binding p4_coun
 
 #define NUM_UNUSED_CCCRS	NUM_CCCRS_NON_HT - NUM_COUNTERS_NON_HT
 
-/* All cccr we don't use. */
-static int p4_unused_cccr[NUM_UNUSED_CCCRS] = {
-	MSR_P4_BPU_CCCR1,	MSR_P4_BPU_CCCR3,
-	MSR_P4_MS_CCCR1,	MSR_P4_MS_CCCR3,
-	MSR_P4_FLAME_CCCR1,	MSR_P4_FLAME_CCCR3,
-	MSR_P4_IQ_CCCR0,	MSR_P4_IQ_CCCR1,
-	MSR_P4_IQ_CCCR2,	MSR_P4_IQ_CCCR3
-};
-
 /* p4 event codes in libop/op_event.h are indices into this table. */
 
 static struct p4_event_binding p4_events[NUM_EVENTS] = {
@@ -372,6 +365,8 @@ static struct p4_event_binding p4_events
 #define CCCR_OVF_P(cccr) ((cccr) & (1U<<31))
 #define CCCR_CLEAR_OVF(cccr) ((cccr) &= (~(1U<<31)))
 
+#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0)
+#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0)
 #define CTR_READ(l,h,i) do {rdmsr(p4_counters[(i)].counter_address, (l), (h));} while (0)
 #define CTR_WRITE(l,i) do {wrmsr(p4_counters[(i)].counter_address, -(u32)(l), -1);} while (0)
 #define CTR_OVERFLOW_P(ctr) (!((ctr) & 0x80000000))
@@ -401,29 +396,34 @@ static unsigned long reset_value[NUM_COU
 static void p4_fill_in_addresses(struct op_msrs * const msrs)
 {
 	unsigned int i; 
-	unsigned int addr, stag;
+	unsigned int addr, cccraddr, stag;
 
 	setup_num_counters();
 	stag = get_stagger();
 
-	/* the counter registers we pay attention to */
+	/* initialize some registers */
 	for (i = 0; i < num_counters; ++i) {
-		msrs->counters[i].addr = 
-			p4_counters[VIRT_CTR(stag, i)].counter_address;
+		msrs->counters[i].addr = 0;
 	}
-
-	/* FIXME: bad feeling, we don't save the 10 counters we don't use. */
-
-	/* 18 CCCR registers */
-	for (i = 0, addr = MSR_P4_BPU_CCCR0 + stag;
-	     addr <= MSR_P4_IQ_CCCR5; ++i, addr += addr_increment()) {
-		msrs->controls[i].addr = addr;
+	for (i = 0; i < num_controls; ++i) {
+		msrs->controls[i].addr = 0;
 	}
 	
+	/* the counter & cccr registers we pay attention to */
+	for (i = 0; i < num_counters; ++i) {
+		addr = p4_counters[VIRT_CTR(stag, i)].counter_address;
+		cccraddr = p4_counters[VIRT_CTR(stag, i)].cccr_address;
+		if (reserve_perfctr_nmi(addr)){
+			msrs->counters[i].addr = addr;
+			msrs->controls[i].addr = cccraddr;
+		}
+	}
+
 	/* 43 ESCR registers in three or four discontiguous group */
 	for (addr = MSR_P4_BSU_ESCR0 + stag;
 	     addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
-		msrs->controls[i].addr = addr;
+		if (reserve_evntsel_nmi(addr))
+			msrs->controls[i].addr = addr;
 	}
 
 	/* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1
@@ -431,47 +431,57 @@ static void p4_fill_in_addresses(struct 
 	if (boot_cpu_data.x86_model >= 0x3) {
 		for (addr = MSR_P4_BSU_ESCR0 + stag;
 		     addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
-			msrs->controls[i].addr = addr;
+			if (reserve_evntsel_nmi(addr))
+				msrs->controls[i].addr = addr;
 		}
 	} else {
 		for (addr = MSR_P4_IQ_ESCR0 + stag;
 		     addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
-			msrs->controls[i].addr = addr;
+			if (reserve_evntsel_nmi(addr))
+				msrs->controls[i].addr = addr;
 		}
 	}
 
 	for (addr = MSR_P4_RAT_ESCR0 + stag;
 	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
-		msrs->controls[i].addr = addr;
+		if (reserve_evntsel_nmi(addr))
+			msrs->controls[i].addr = addr;
 	}
 	
 	for (addr = MSR_P4_MS_ESCR0 + stag;
 	     addr <= MSR_P4_TC_ESCR1; ++i, addr += addr_increment()) { 
-		msrs->controls[i].addr = addr;
+		if (reserve_evntsel_nmi(addr))
+			msrs->controls[i].addr = addr;
 	}
 	
 	for (addr = MSR_P4_IX_ESCR0 + stag;
 	     addr <= MSR_P4_CRU_ESCR3; ++i, addr += addr_increment()) { 
-		msrs->controls[i].addr = addr;
+		if (reserve_evntsel_nmi(addr))
+			msrs->controls[i].addr = addr;
 	}
 
 	/* there are 2 remaining non-contiguously located ESCRs */
 
 	if (num_counters == NUM_COUNTERS_NON_HT) {		
 		/* standard non-HT CPUs handle both remaining ESCRs*/
-		msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
-		msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+		if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5))
+			msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+		if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4))
+			msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
 
 	} else if (stag == 0) {
 		/* HT CPUs give the first remainder to the even thread, as
 		   the 32nd control register */
-		msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
+		if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR4))
+			msrs->controls[i++].addr = MSR_P4_CRU_ESCR4;
 
 	} else {
 		/* and two copies of the second to the odd thread,
 		   for the 22st and 23nd control registers */
-		msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
-		msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+		if (reserve_evntsel_nmi(MSR_P4_CRU_ESCR5)) {
+			msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+			msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
+		}
 	}
 }
 
@@ -544,7 +554,6 @@ static void p4_setup_ctrs(struct op_msrs
 {
 	unsigned int i;
 	unsigned int low, high;
-	unsigned int addr;
 	unsigned int stag;
 
 	stag = get_stagger();
@@ -557,59 +566,24 @@ static void p4_setup_ctrs(struct op_msrs
 
 	/* clear the cccrs we will use */
 	for (i = 0 ; i < num_counters ; i++) {
+		if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+			continue;
 		rdmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
 		CCCR_CLEAR(low);
 		CCCR_SET_REQUIRED_BITS(low);
 		wrmsr(p4_counters[VIRT_CTR(stag, i)].cccr_address, low, high);
 	}
 
-	/* clear cccrs outside our concern */
-	for (i = stag ; i < NUM_UNUSED_CCCRS ; i += addr_increment()) {
-		rdmsr(p4_unused_cccr[i], low, high);
-		CCCR_CLEAR(low);
-		CCCR_SET_REQUIRED_BITS(low);
-		wrmsr(p4_unused_cccr[i], low, high);
-	}
-
 	/* clear all escrs (including those outside our concern) */
-	for (addr = MSR_P4_BSU_ESCR0 + stag;
-	     addr <  MSR_P4_IQ_ESCR0; addr += addr_increment()) {
-		wrmsr(addr, 0, 0);
-	}
-
-	/* On older models clear also MSR_P4_IQ_ESCR0/1 */
-	if (boot_cpu_data.x86_model < 0x3) {
-		wrmsr(MSR_P4_IQ_ESCR0, 0, 0);
-		wrmsr(MSR_P4_IQ_ESCR1, 0, 0);
-	}
-
-	for (addr = MSR_P4_RAT_ESCR0 + stag;
-	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
-		wrmsr(addr, 0, 0);
-	}
-	
-	for (addr = MSR_P4_MS_ESCR0 + stag;
-	     addr <= MSR_P4_TC_ESCR1; addr += addr_increment()){ 
-		wrmsr(addr, 0, 0);
-	}
-	
-	for (addr = MSR_P4_IX_ESCR0 + stag;
-	     addr <= MSR_P4_CRU_ESCR3; addr += addr_increment()){ 
-		wrmsr(addr, 0, 0);
+	for (i = num_counters; i < num_controls; i++) {
+		if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+			continue;
+		wrmsr(msrs->controls[i].addr, 0, 0);
 	}
 
-	if (num_counters == NUM_COUNTERS_NON_HT) {		
-		wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
-		wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
-	} else if (stag == 0) {
-		wrmsr(MSR_P4_CRU_ESCR4, 0, 0);
-	} else {
-		wrmsr(MSR_P4_CRU_ESCR5, 0, 0);
-	}		
-	
 	/* setup all counters */
 	for (i = 0 ; i < num_counters ; ++i) {
-		if (counter_config[i].enabled) {
+		if ((counter_config[i].enabled) && (CTRL_IS_RESERVED(msrs,i))) {
 			reset_value[i] = counter_config[i].count;
 			pmc_setup_one_p4_counter(i);
 			CTR_WRITE(counter_config[i].count, VIRT_CTR(stag, i));
@@ -696,12 +670,32 @@ static void p4_stop(struct op_msrs const
 	stag = get_stagger();
 
 	for (i = 0; i < num_counters; ++i) {
+		if (!reset_value[i])
+			continue;
 		CCCR_READ(low, high, VIRT_CTR(stag, i));
 		CCCR_SET_DISABLE(low);
 		CCCR_WRITE(low, high, VIRT_CTR(stag, i));
 	}
 }
 
+static void p4_shutdown(struct op_msrs const * const msrs)
+{
+	int i;
+
+	for (i = 0 ; i < num_counters ; ++i) {
+		if (CTR_IS_RESERVED(msrs,i))
+			release_perfctr_nmi(msrs->counters[i].addr);
+	}
+	/* some of the control registers are specially reserved in
+	 * conjunction with the counter registers (hence the starting offset).
+	 * This saves a few bits.
+	 */
+	for (i = num_counters ; i < num_controls ; ++i) {
+		if (CTRL_IS_RESERVED(msrs,i))
+			release_evntsel_nmi(msrs->controls[i].addr);
+	}
+}
+
 
 #ifdef CONFIG_SMP
 struct op_x86_model_spec const op_p4_ht2_spec = {
@@ -711,7 +705,8 @@ struct op_x86_model_spec const op_p4_ht2
 	.setup_ctrs = &p4_setup_ctrs,
 	.check_ctrs = &p4_check_ctrs,
 	.start = &p4_start,
-	.stop = &p4_stop
+	.stop = &p4_stop,
+	.shutdown = &p4_shutdown
 };
 #endif
 
@@ -722,5 +717,6 @@ struct op_x86_model_spec const op_p4_spe
 	.setup_ctrs = &p4_setup_ctrs,
 	.check_ctrs = &p4_check_ctrs,
 	.start = &p4_start,
-	.stop = &p4_stop
+	.stop = &p4_stop,
+	.shutdown = &p4_shutdown
 };
Index: linux-don/arch/i386/oprofile/op_model_ppro.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_model_ppro.c
+++ linux-don/arch/i386/oprofile/op_model_ppro.c
@@ -22,10 +22,12 @@
 #define NUM_COUNTERS 2
 #define NUM_CONTROLS 2
 
+#define CTR_IS_RESERVED(msrs,c) (msrs->counters[(c)].addr ? 1 : 0)
 #define CTR_READ(l,h,msrs,c) do {rdmsr(msrs->counters[(c)].addr, (l), (h));} while (0)
 #define CTR_WRITE(l,msrs,c) do {wrmsr(msrs->counters[(c)].addr, -(u32)(l), -1);} while (0)
 #define CTR_OVERFLOWED(n) (!((n) & (1U<<31)))
 
+#define CTRL_IS_RESERVED(msrs,c) (msrs->controls[(c)].addr ? 1 : 0)
 #define CTRL_READ(l,h,msrs,c) do {rdmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
 #define CTRL_WRITE(l,h,msrs,c) do {wrmsr((msrs->controls[(c)].addr), (l), (h));} while (0)
 #define CTRL_SET_ACTIVE(n) (n |= (1<<22))
@@ -41,11 +43,21 @@ static unsigned long reset_value[NUM_COU
  
 static void ppro_fill_in_addresses(struct op_msrs * const msrs)
 {
-	msrs->counters[0].addr = MSR_P6_PERFCTR0;
-	msrs->counters[1].addr = MSR_P6_PERFCTR1;
+	int i;
+
+	for (i=0; i < NUM_COUNTERS; i++) {
+		if (reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i))
+			msrs->counters[i].addr = MSR_P6_PERFCTR0 + i;
+		else
+			msrs->counters[i].addr = 0;
+	}
 	
-	msrs->controls[0].addr = MSR_P6_EVNTSEL0;
-	msrs->controls[1].addr = MSR_P6_EVNTSEL1;
+	for (i=0; i < NUM_CONTROLS; i++) {
+		if (reserve_evntsel_nmi(MSR_P6_EVNTSEL0 + i))
+			msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i;
+		else
+			msrs->controls[i].addr = 0;
+	}
 }
 
 
@@ -56,6 +68,8 @@ static void ppro_setup_ctrs(struct op_ms
 
 	/* clear all counters */
 	for (i = 0 ; i < NUM_CONTROLS; ++i) {
+		if (unlikely(!CTRL_IS_RESERVED(msrs,i)))
+			continue;
 		CTRL_READ(low, high, msrs, i);
 		CTRL_CLEAR(low);
 		CTRL_WRITE(low, high, msrs, i);
@@ -63,12 +77,14 @@ static void ppro_setup_ctrs(struct op_ms
 	
 	/* avoid a false detection of ctr overflows in NMI handler */
 	for (i = 0; i < NUM_COUNTERS; ++i) {
+		if (unlikely(!CTR_IS_RESERVED(msrs,i)))
+			continue;
 		CTR_WRITE(1, msrs, i);
 	}
 
 	/* enable active counters */
 	for (i = 0; i < NUM_COUNTERS; ++i) {
-		if (counter_config[i].enabled) {
+		if ((counter_config[i].enabled) && (CTR_IS_RESERVED(msrs,i))) {
 			reset_value[i] = counter_config[i].count;
 
 			CTR_WRITE(counter_config[i].count, msrs, i);
@@ -81,6 +97,8 @@ static void ppro_setup_ctrs(struct op_ms
 			CTRL_SET_UM(low, counter_config[i].unit_mask);
 			CTRL_SET_EVENT(low, counter_config[i].event);
 			CTRL_WRITE(low, high, msrs, i);
+		} else {
+			reset_value[i] = 0;
 		}
 	}
 }
@@ -93,6 +111,8 @@ static int ppro_check_ctrs(struct pt_reg
 	int i;
  
 	for (i = 0 ; i < NUM_COUNTERS; ++i) {
+		if (!reset_value[i])
+			continue;
 		CTR_READ(low, high, msrs, i);
 		if (CTR_OVERFLOWED(low)) {
 			oprofile_add_sample(regs, i);
@@ -118,18 +138,38 @@ static int ppro_check_ctrs(struct pt_reg
 static void ppro_start(struct op_msrs const * const msrs)
 {
 	unsigned int low,high;
-	CTRL_READ(low, high, msrs, 0);
-	CTRL_SET_ACTIVE(low);
-	CTRL_WRITE(low, high, msrs, 0);
+
+	if (reset_value[0]) {
+		CTRL_READ(low, high, msrs, 0);
+		CTRL_SET_ACTIVE(low);
+		CTRL_WRITE(low, high, msrs, 0);
+	}
 }
 
 
 static void ppro_stop(struct op_msrs const * const msrs)
 {
 	unsigned int low,high;
-	CTRL_READ(low, high, msrs, 0);
-	CTRL_SET_INACTIVE(low);
-	CTRL_WRITE(low, high, msrs, 0);
+
+	if (reset_value[0]) {
+		CTRL_READ(low, high, msrs, 0);
+		CTRL_SET_INACTIVE(low);
+		CTRL_WRITE(low, high, msrs, 0);
+	}
+}
+
+static void ppro_shutdown(struct op_msrs const * const msrs)
+{
+	int i;
+
+	for (i = 0 ; i < NUM_COUNTERS ; ++i) {
+		if (CTR_IS_RESERVED(msrs,i))
+			release_perfctr_nmi(MSR_P6_PERFCTR0 + i);
+	}
+	for (i = 0 ; i < NUM_CONTROLS ; ++i) {
+		if (CTRL_IS_RESERVED(msrs,i))
+			release_evntsel_nmi(MSR_P6_EVNTSEL0 + i);
+	}
 }
 
 
@@ -140,5 +180,6 @@ struct op_x86_model_spec const op_ppro_s
 	.setup_ctrs = &ppro_setup_ctrs,
 	.check_ctrs = &ppro_check_ctrs,
 	.start = &ppro_start,
-	.stop = &ppro_stop
+	.stop = &ppro_stop,
+	.shutdown = &ppro_shutdown
 };
Index: linux-don/arch/i386/oprofile/op_x86_model.h
===================================================================
--- linux-don.orig/arch/i386/oprofile/op_x86_model.h
+++ linux-don/arch/i386/oprofile/op_x86_model.h
@@ -40,6 +40,7 @@ struct op_x86_model_spec {
 		struct op_msrs const * const msrs);
 	void (*start)(struct op_msrs const * const msrs);
 	void (*stop)(struct op_msrs const * const msrs);
+	void (*shutdown)(struct op_msrs const * const msrs);
 };
 
 extern struct op_x86_model_spec const op_ppro_spec;

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 4/8] Add SMP support on x86_64 to reservation framework
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
                   ` (2 preceding siblings ...)
  2006-05-09 20:50 ` [patch 3/8] Utilize performance counter reservation framework in oprofile dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 21:42   ` Andi Kleen
  2006-05-09 20:50 ` [patch 5/8] Add SMP support on i386 " dzickus
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-reservation-SMP-x86_64.patch --]
[-- Type: text/plain, Size: 18971 bytes --]


This patch includes the changes to make the nmi watchdog on x86_64 SMP
aware.  A bunch of code was moved around to make it simpler to read.  In
addition, it is now possible to determine if a particular NMI was the result
of the watchdog or not.  This feature allows the kernel to filter out
unknown NMIs easier. 

Signed-off-by:  Don Zickus <dzickus@redhat.com>


Index: linux-don/arch/x86_64/kernel/apic.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/apic.c
+++ linux-don/arch/x86_64/kernel/apic.c
@@ -480,8 +480,7 @@ void __cpuinit setup_local_APIC (void)
 	}
 
 	nmi_watchdog_default();
-	if (nmi_watchdog == NMI_LOCAL_APIC)
-		setup_apic_nmi_watchdog();
+	setup_apic_nmi_watchdog(NULL);
 	apic_pm_activate();
 }
 
Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -57,53 +57,29 @@ static unsigned int lapic_nmi_owner;
 #define LAPIC_NMI_RESERVED	(1<<1)
 
 /* nmi_active:
- * +1: the lapic NMI watchdog is active, but can be disabled
- *  0: the lapic NMI watchdog has not been set up, and cannot
+ * >0: the lapic NMI watchdog is active, but can be disabled
+ * <0: the lapic NMI watchdog has not been set up, and cannot
  *     be enabled
- * -1: the lapic NMI watchdog is disabled, but can be enabled
+ *  0: the lapic NMI watchdog is disabled, but can be enabled
  */
-int nmi_active;		/* oprofile uses this */
+atomic_t nmi_active = ATOMIC_INIT(0);		/* oprofile uses this */
 int panic_on_timeout;
 
 unsigned int nmi_watchdog = NMI_DEFAULT;
 static unsigned int nmi_hz = HZ;
-static unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
-static unsigned int nmi_p4_cccr_val;
 
-/* Note that these events don't tick when the CPU idles. This means
-   the frequency varies with CPU load. */
-
-#define K7_EVNTSEL_ENABLE	(1 << 22)
-#define K7_EVNTSEL_INT		(1 << 20)
-#define K7_EVNTSEL_OS		(1 << 17)
-#define K7_EVNTSEL_USR		(1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
-#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+struct nmi_watchdog_ctlblk {
+	int enabled;
+	u64 check_bit;
+	unsigned int cccr_msr;
+	unsigned int perfctr_msr;  /* the MSR to reset in NMI handler */
+	unsigned int evntsel_msr;  /* the MSR to select the events to handle */
+};
+static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
 
-#define MSR_P4_MISC_ENABLE	0x1A0
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
-#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL	(1<<12)
-#define MSR_P4_PERFCTR0		0x300
-#define MSR_P4_CCCR0		0x360
-#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
-#define P4_ESCR_OS		(1<<3)
-#define P4_ESCR_USR		(1<<2)
-#define P4_CCCR_OVF_PMI0	(1<<26)
-#define P4_CCCR_OVF_PMI1	(1<<27)
-#define P4_CCCR_THRESHOLD(N)	((N)<<20)
-#define P4_CCCR_COMPLEMENT	(1<<19)
-#define P4_CCCR_COMPARE		(1<<18)
-#define P4_CCCR_REQUIRED	(3<<16)
-#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
-#define P4_CCCR_ENABLE		(1<<12)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
-   CRU_ESCR0 (with any non-null event selector) through a complemented
-   max threshold. [IA32-Vol3, Section 14.9.9] */
-#define MSR_P4_IQ_COUNTER0	0x30C
-#define P4_NMI_CRU_ESCR0	(P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR)
-#define P4_NMI_IQ_CCCR0	\
-	(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT|	\
-	 P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
+/* local prototypes */
+static void stop_apic_nmi_watchdog(void *unused);
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
 
 /* converts an msr to an appropriate reservation bit */
 static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
@@ -242,6 +218,12 @@ int __init check_nmi_watchdog (void)
 	int *counts;
 	int cpu;
 
+	if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT))
+		return 0;
+
+	if (!atomic_read(&nmi_active))
+		return 0;
+
 	counts = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL);
 	if (!counts)
 		return -1;
@@ -259,19 +241,22 @@ int __init check_nmi_watchdog (void)
 	mdelay((10*1000)/nmi_hz); // wait 10 ticks
 
 	for_each_online_cpu(cpu) {
+		if (!per_cpu(nmi_watchdog_ctlblk.enabled, cpu))
+			continue;
 		if (cpu_pda(cpu)->__nmi_count - counts[cpu] <= 5) {
-			endflag = 1;
 			printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
 			       cpu,
 			       counts[cpu],
 			       cpu_pda(cpu)->__nmi_count);
-			nmi_active = 0;
-			lapic_nmi_owner &= ~LAPIC_NMI_WATCHDOG;
-			nmi_perfctr_msr = 0;
-			kfree(counts);
-			return -1;
+			per_cpu(nmi_watchdog_ctlblk.enabled, cpu) = 0;
+			atomic_dec(&nmi_active);
 		}
 	}
+	if (!atomic_read(&nmi_active)) {
+		kfree(counts);
+		atomic_set(&nmi_active, -1);
+		return -1;
+	}
 	endflag = 1;
 	printk("OK.\n");
 
@@ -298,8 +283,11 @@ int __init setup_nmi_watchdog(char *str)
 
 	get_option(&str, &nmi);
 
-	if (nmi >= NMI_INVALID)
+	if ((nmi >= NMI_INVALID) || (nmi < NMI_NONE))
 		return 0;
+
+	if ((nmi == NMI_LOCAL_APIC) && (nmi_known_cpu() == 0))
+		return 0;  /* no lapic support */
 	nmi_watchdog = nmi;
 	return 1;
 }
@@ -308,31 +296,30 @@ __setup("nmi_watchdog=", setup_nmi_watch
 
 static void disable_lapic_nmi_watchdog(void)
 {
-	if (nmi_active <= 0)
+	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+	if (atomic_read(&nmi_active) <= 0)
 		return;
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		wrmsr(MSR_K7_EVNTSEL0, 0, 0);
-		break;
-	case X86_VENDOR_INTEL:
-		if (boot_cpu_data.x86 == 15) {
-			wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
-			wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
-		}
-		break;
-	}
-	nmi_active = -1;
-	/* tell do_nmi() and others that we're not active any more */
-	nmi_watchdog = 0;
+
+	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
+
+	BUG_ON(atomic_read(&nmi_active) != 0);
 }
 
 static void enable_lapic_nmi_watchdog(void)
 {
-	if (nmi_active < 0) {
-		nmi_watchdog = NMI_LOCAL_APIC;
-		touch_nmi_watchdog();
-		setup_apic_nmi_watchdog();
-	}
+	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+	/* are we already enabled */
+	if (atomic_read(&nmi_active) != 0)
+		return;
+
+	/* are we lapic aware */
+	if (nmi_known_cpu() <= 0)
+		return;
+
+	on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+	touch_nmi_watchdog();
 }
 
 int reserve_lapic_nmi(void)
@@ -364,21 +351,24 @@ void release_lapic_nmi(void)
 
 void disable_timer_nmi_watchdog(void)
 {
-	if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0))
+	BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+	if (atomic_read(&nmi_active) <= 0)
 		return;
 
 	disable_irq(0);
-	unset_nmi_callback();
-	nmi_active = -1;
-	nmi_watchdog = NMI_NONE;
+	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
+
+	BUG_ON(atomic_read(&nmi_active) != 0);
 }
 
 void enable_timer_nmi_watchdog(void)
 {
-	if (nmi_active < 0) {
-		nmi_watchdog = NMI_IO_APIC;
+	BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+	if (atomic_read(&nmi_active) == 0) {
 		touch_nmi_watchdog();
-		nmi_active = 1;
+		on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
 		enable_irq(0);
 	}
 }
@@ -389,7 +379,7 @@ static int nmi_pm_active; /* nmi_active 
 
 static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state)
 {
-	nmi_pm_active = nmi_active;
+	nmi_pm_active = atomic_read(&nmi_active);
 	disable_lapic_nmi_watchdog();
 	return 0;
 }
@@ -397,7 +387,7 @@ static int lapic_nmi_suspend(struct sys_
 static int lapic_nmi_resume(struct sys_device *dev)
 {
 	if (nmi_pm_active > 0)
-	enable_lapic_nmi_watchdog();
+		enable_lapic_nmi_watchdog();
 	return 0;
 }
 
@@ -416,7 +406,13 @@ static int __init init_lapic_nmi_sysfs(v
 {
 	int error;
 
-	if (nmi_active == 0 || nmi_watchdog != NMI_LOCAL_APIC)
+	/* should really be a BUG_ON but b/c this is an
+	 * init call, it just doesn't work.  -dcz
+	 */
+	if (nmi_watchdog != NMI_LOCAL_APIC)
+		return 0;
+
+	if ( atomic_read(&nmi_active) < 0 )
 		return 0;
 
 	error = sysdev_class_register(&nmi_sysclass);
@@ -429,100 +425,232 @@ late_initcall(init_lapic_nmi_sysfs);
 
 #endif	/* CONFIG_PM */
 
+/*
+ * Activate the NMI watchdog via the local APIC.
+ * Original code written by Keith Owens.
+ */
+
+/* Note that these events don't tick when the CPU idles. This means
+   the frequency varies with CPU load. */
+
+#define K7_EVNTSEL_ENABLE	(1 << 22)
+#define K7_EVNTSEL_INT		(1 << 20)
+#define K7_EVNTSEL_OS		(1 << 17)
+#define K7_EVNTSEL_USR		(1 << 16)
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
+#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+
 static int setup_k7_watchdog(void)
 {
+	unsigned int perfctr_msr, evntsel_msr;
 	unsigned int evntsel;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 
-	nmi_perfctr_msr = MSR_K7_PERFCTR0;
-
-	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+	perfctr_msr = MSR_K7_PERFCTR0;
+	evntsel_msr = MSR_K7_EVNTSEL0;
+	if (!reserve_perfctr_nmi(perfctr_msr))
 		goto fail;
 
-	if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0))
+	if (!reserve_evntsel_nmi(evntsel_msr))
 		goto fail1;
 
 	/* Simulator may not support it */
-	if (checking_wrmsrl(MSR_K7_EVNTSEL0, 0UL))
+	if (checking_wrmsrl(evntsel_msr, 0UL))
 		goto fail2;
-	wrmsrl(MSR_K7_PERFCTR0, 0UL);
+	wrmsrl(perfctr_msr, 0UL);
 
 	evntsel = K7_EVNTSEL_INT
 		| K7_EVNTSEL_OS
 		| K7_EVNTSEL_USR
 		| K7_NMI_EVENT;
 
-	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
-	wrmsrl(MSR_K7_PERFCTR0, -((u64)cpu_khz * 1000 / nmi_hz));
+	/* setup the timer */
+	wrmsr(evntsel_msr, evntsel, 0);
+	wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
-	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
+	wrmsr(evntsel_msr, evntsel, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = 0;  //unused
+	wd->check_bit = 1ULL<<63;
 	return 1;
 fail2:
-	release_evntsel_nmi(MSR_K7_EVNTSEL0);
+	release_evntsel_nmi(evntsel_msr);
 fail1:
-	release_perfctr_nmi(nmi_perfctr_msr);
+	release_perfctr_nmi(perfctr_msr);
 fail:
 	return 0;
 }
 
+static void stop_k7_watchdog(void)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	wrmsr(wd->evntsel_msr, 0, 0);
+
+	release_evntsel_nmi(wd->evntsel_msr);
+	release_perfctr_nmi(wd->perfctr_msr);
+}
+
+/* Note that these events don't tick when the CPU idles. This means
+   the frequency varies with CPU load. */
+
+#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
+#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
+#define P4_ESCR_OS		(1<<3)
+#define P4_ESCR_USR		(1<<2)
+#define P4_CCCR_OVF_PMI0	(1<<26)
+#define P4_CCCR_OVF_PMI1	(1<<27)
+#define P4_CCCR_THRESHOLD(N)	((N)<<20)
+#define P4_CCCR_COMPLEMENT	(1<<19)
+#define P4_CCCR_COMPARE		(1<<18)
+#define P4_CCCR_REQUIRED	(3<<16)
+#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
+#define P4_CCCR_ENABLE		(1<<12)
+#define P4_CCCR_OVF 		(1<<31)
+/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
+   CRU_ESCR0 (with any non-null event selector) through a complemented
+   max threshold. [IA32-Vol3, Section 14.9.9] */
 
 static int setup_p4_watchdog(void)
 {
+	unsigned int perfctr_msr, evntsel_msr, cccr_msr;
+	unsigned int evntsel, cccr_val;
 	unsigned int misc_enable, dummy;
+	unsigned int ht_num;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 
-	rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy);
+	rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
 	if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
 		return 0;
 
-	nmi_perfctr_msr = MSR_P4_IQ_COUNTER0;
-	nmi_p4_cccr_val = P4_NMI_IQ_CCCR0;
 #ifdef CONFIG_SMP
-	if (smp_num_siblings == 2)
-		nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1;
+	/* detect which hyperthread we are on */
+	if (smp_num_siblings == 2) {
+		unsigned int ebx, apicid;
+
+        	ebx = cpuid_ebx(1);
+	        apicid = (ebx >> 24) & 0xff;
+        	ht_num = apicid & 1;
+	} else
 #endif
+		ht_num = 0;
+
+	/* performance counters are shared resources
+	 * assign each hyperthread its own set
+	 * (re-use the ESCR0 register, seems safe
+	 * and keeps the cccr_val the same)
+	 */
+	if (!ht_num) {
+		/* logical cpu 0 */
+		perfctr_msr = MSR_P4_IQ_PERFCTR0;
+		evntsel_msr = MSR_P4_CRU_ESCR0;
+		cccr_msr = MSR_P4_IQ_CCCR0;
+		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+	} else {
+		/* logical cpu 1 */
+		perfctr_msr = MSR_P4_IQ_PERFCTR1;
+		evntsel_msr = MSR_P4_CRU_ESCR0;
+		cccr_msr = MSR_P4_IQ_CCCR1;
+		cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+	}
 
-	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+	if (!reserve_perfctr_nmi(perfctr_msr))
 		goto fail;
 
-	if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
+	if (!reserve_evntsel_nmi(evntsel_msr))
 		goto fail1;
 
-	wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
-	wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
-	Dprintk("setting P4_IQ_COUNTER0 to 0x%08lx\n", -(cpu_khz * 1000UL / nmi_hz));
-	wrmsrl(MSR_P4_IQ_COUNTER0, -((u64)cpu_khz * 1000 / nmi_hz));
+	evntsel = P4_ESCR_EVENT_SELECT(0x3F)
+	 	| P4_ESCR_OS
+		| P4_ESCR_USR;
+
+	cccr_val |= P4_CCCR_THRESHOLD(15)
+		 | P4_CCCR_COMPLEMENT
+		 | P4_CCCR_COMPARE
+		 | P4_CCCR_REQUIRED;
+
+	wrmsr(evntsel_msr, evntsel, 0);
+	wrmsr(cccr_msr, cccr_val, 0);
+	wrmsrl(perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
+	cccr_val |= P4_CCCR_ENABLE;
+	wrmsr(cccr_msr, cccr_val, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = cccr_msr;
+	wd->check_bit = 1ULL<<39;
 	return 1;
 fail1:
-	release_perfctr_nmi(nmi_perfctr_msr);
+	release_perfctr_nmi(perfctr_msr);
 fail:
 	return 0;
 }
 
-void setup_apic_nmi_watchdog(void)
+static void stop_p4_watchdog(void)
 {
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		if (boot_cpu_data.x86 != 15)
-			return;
-		if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
-			return;
-		if (!setup_k7_watchdog())
-			return;
-		break;
-	case X86_VENDOR_INTEL:
-		if (boot_cpu_data.x86 != 15)
-			return;
-		if (!setup_p4_watchdog())
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	wrmsr(wd->cccr_msr, 0, 0);
+	wrmsr(wd->evntsel_msr, 0, 0);
+
+	release_evntsel_nmi(wd->evntsel_msr);
+	release_perfctr_nmi(wd->perfctr_msr);
+}
+
+void setup_apic_nmi_watchdog(void *unused)
+{
+	/* only support LOCAL and IO APICs for now */
+	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+	    (nmi_watchdog != NMI_IO_APIC))
+	    	return;
+
+	if (nmi_watchdog == NMI_LOCAL_APIC) {
+		switch (boot_cpu_data.x86_vendor) {
+		case X86_VENDOR_AMD:
+			if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
+				return;
+			if (!setup_k7_watchdog())
+				return;
+			break;
+		case X86_VENDOR_INTEL:
+			if (!setup_p4_watchdog())
+				return;
+			break;
+		default:
 			return;
-		break;
+		}
+	}
+	__get_cpu_var(nmi_watchdog_ctlblk.enabled) = 1;
+	atomic_inc(&nmi_active);
+}
 
-	default:
-		return;
+static void stop_apic_nmi_watchdog(void *unused)
+{
+	/* only support LOCAL and IO APICs for now */
+	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+	    (nmi_watchdog != NMI_IO_APIC))
+	    	return;
+
+	if (nmi_watchdog == NMI_LOCAL_APIC) {
+		switch (boot_cpu_data.x86_vendor) {
+		case X86_VENDOR_AMD:
+			if (strstr(boot_cpu_data.x86_model_id, "Screwdriver"))
+				return;
+			stop_k7_watchdog();
+			break;
+		case X86_VENDOR_INTEL:
+			stop_p4_watchdog();
+			break;
+		default:
+			return;
+		}
 	}
-	lapic_nmi_owner = LAPIC_NMI_WATCHDOG;
-	nmi_active = 1;
+	__get_cpu_var(nmi_watchdog_ctlblk.enabled) = 0;
+	atomic_dec(&nmi_active);
 }
 
 /*
@@ -559,50 +687,70 @@ void __kprobes nmi_watchdog_tick(struct 
 {
 	int sum;
 	int touched = 0;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+	u64 dummy;
+
+	/* check for other users first */
+	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
+			== NOTIFY_STOP) {
+		touched = 1;
+	}
 
 	sum = read_pda(apic_timer_irqs);
 	if (__get_cpu_var(nmi_touch)) {
 		__get_cpu_var(nmi_touch) = 0;
 		touched = 1;
 	}
+
 #ifdef CONFIG_X86_MCE
 	/* Could check oops_in_progress here too, but it's safer
 	   not too */
 	if (atomic_read(&mce_entry) > 0)
 		touched = 1;
 #endif
+	/* if the apic timer isn't firing, this cpu isn't doing much */
 	if (!touched && __get_cpu_var(last_irq_sum) == sum) {
 		/*
 		 * Ayiee, looks like this CPU is stuck ...
 		 * wait a few IRQs (5 seconds) before doing the oops ...
 		 */
 		local_inc(&__get_cpu_var(alert_counter));
-		if (local_read(&__get_cpu_var(alert_counter)) == 5*nmi_hz) {
-			if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
-							== NOTIFY_STOP) {
-				local_set(&__get_cpu_var(alert_counter), 0);
-				return;
-			}
+		if (local_read(&__get_cpu_var(alert_counter)) == 5*nmi_hz)
 			die_nmi("NMI Watchdog detected LOCKUP on CPU %d\n", regs);
-		}
 	} else {
 		__get_cpu_var(last_irq_sum) = sum;
 		local_set(&__get_cpu_var(alert_counter), 0);
 	}
-	if (nmi_perfctr_msr) {
- 		if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) {
- 			/*
- 			 * P4 quirks:
- 			 * - An overflown perfctr will assert its interrupt
- 			 *   until the OVF flag in its CCCR is cleared.
- 			 * - LVTPC is masked on interrupt and must be
- 			 *   unmasked by the LVTPC handler.
- 			 */
- 			wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
- 			apic_write(APIC_LVTPC, APIC_DM_NMI);
- 		}
-		wrmsrl(nmi_perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
+
+	/* see if the nmi watchdog went off */
+	if (wd->enabled) {
+		if (nmi_watchdog == NMI_LOCAL_APIC) {
+			rdmsrl(wd->perfctr_msr, dummy);
+			if (dummy & wd->check_bit){
+				/* this wasn't a watchdog timer interrupt */
+				goto done;
+			}
+
+			/* only Intel uses the cccr msr */
+	 		if (wd->cccr_msr != 0) {
+	 			/*
+	 			 * P4 quirks:
+	 			 * - An overflown perfctr will assert its interrupt
+	 			 *   until the OVF flag in its CCCR is cleared.
+	 			 * - LVTPC is masked on interrupt and must be
+	 			 *   unmasked by the LVTPC handler.
+	 			 */
+				rdmsrl(wd->cccr_msr, dummy);
+				dummy &= ~P4_CCCR_OVF;
+	 			wrmsrl(wd->cccr_msr, dummy);
+	 			apic_write(APIC_LVTPC, APIC_DM_NMI);
+	 		}
+			/* start the cycle over again */
+			wrmsrl(wd->perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
+		}
 	}
+done:
+	return;
 }
 
 static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu)
Index: linux-don/include/asm-x86_64/nmi.h
===================================================================
--- linux-don.orig/include/asm-x86_64/nmi.h
+++ linux-don/include/asm-x86_64/nmi.h
@@ -63,7 +63,7 @@ extern void release_perfctr_nmi(unsigned
 extern int reserve_evntsel_nmi(unsigned int);
 extern void release_evntsel_nmi(unsigned int);
 
-extern void setup_apic_nmi_watchdog (void);
+extern void setup_apic_nmi_watchdog (void *);
 extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
 extern void disable_timer_nmi_watchdog(void);
@@ -73,6 +73,7 @@ extern void nmi_watchdog_tick (struct pt
 extern void nmi_watchdog_default(void);
 extern int setup_nmi_watchdog(char *);
 
+extern atomic_t nmi_active;
 extern unsigned int nmi_watchdog;
 #define NMI_DEFAULT	-1
 #define NMI_NONE	0

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 5/8] Add SMP support on i386 to reservation framework
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
                   ` (3 preceding siblings ...)
  2006-05-09 20:50 ` [patch 4/8] Add SMP support on x86_64 to reservation framework dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-22 17:31   ` Markus Armbruster
  2006-05-09 20:50 ` [patch 6/8] Cleanup NMI interrupt path dzickus
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-reservation-SMP-i386.patch --]
[-- Type: text/plain, Size: 23922 bytes --]


This patch includes the changes to make the nmi watchdog on i386 SMP aware.
A bunch of code was moved around to make it simpler to read.  In addition,
it is now possible to determine if a particular NMI was the result of the
watchdog or not.  This feature allows the kernel to filter out unknown NMIs
easier.

Signed-off-by:  Don Zickus <dzickus@redhat.com>

Index: linux-don/arch/i386/kernel/apic.c
===================================================================
--- linux-don.orig/arch/i386/kernel/apic.c
+++ linux-don/arch/i386/kernel/apic.c
@@ -587,8 +587,7 @@ void __devinit setup_local_APIC(void)
 			printk("No ESR for 82489DX.\n");
 	}
 
-	if (nmi_watchdog == NMI_LOCAL_APIC)
-		setup_apic_nmi_watchdog();
+	setup_apic_nmi_watchdog(NULL);
 	apic_pm_activate();
 }
 
Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -24,16 +24,10 @@
 
 #include <asm/smp.h>
 #include <asm/nmi.h>
+#include <asm/kdebug.h>
 
 #include "mach_traps.h"
 
-unsigned int nmi_watchdog = NMI_NONE;
-extern int unknown_nmi_panic;
-static unsigned int nmi_hz = HZ;
-static unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
-static unsigned int nmi_p4_cccr_val;
-extern void show_registers(struct pt_regs *regs);
-
 /* perfctr_nmi_owner tracks the ownership of the perfctr registers:
  * evtsel_nmi_owner tracks the ownership of the event selection
  * - different performance counters/ event selection may be reserved for
@@ -63,51 +57,31 @@ static unsigned int lapic_nmi_owner;
 #define LAPIC_NMI_RESERVED	(1<<1)
 
 /* nmi_active:
- * +1: the lapic NMI watchdog is active, but can be disabled
- *  0: the lapic NMI watchdog has not been set up, and cannot
+ * >0: the lapic NMI watchdog is active, but can be disabled
+ * <0: the lapic NMI watchdog has not been set up, and cannot
  *     be enabled
- * -1: the lapic NMI watchdog is disabled, but can be enabled
+ *  0: the lapic NMI watchdog is disabled, but can be enabled
  */
-int nmi_active;
+atomic_t nmi_active = ATOMIC_INIT(0);		/* oprofile uses this */
 
-#define K7_EVNTSEL_ENABLE	(1 << 22)
-#define K7_EVNTSEL_INT		(1 << 20)
-#define K7_EVNTSEL_OS		(1 << 17)
-#define K7_EVNTSEL_USR		(1 << 16)
-#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
-#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+unsigned int nmi_watchdog = NMI_DEFAULT;
+static unsigned int nmi_hz = HZ;
 
-#define P6_EVNTSEL0_ENABLE	(1 << 22)
-#define P6_EVNTSEL_INT		(1 << 20)
-#define P6_EVNTSEL_OS		(1 << 17)
-#define P6_EVNTSEL_USR		(1 << 16)
-#define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
-#define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
+struct nmi_watchdog_ctlblk {
+	int enabled;
+	u64 check_bit;
+	unsigned int cccr_msr;
+	unsigned int perfctr_msr;  /* the MSR to reset in NMI handler */
+	unsigned int evntsel_msr;  /* the MSR to select the events to handle */
+};
+static DEFINE_PER_CPU(struct nmi_watchdog_ctlblk, nmi_watchdog_ctlblk);
 
-#define MSR_P4_MISC_ENABLE	0x1A0
-#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
-#define MSR_P4_MISC_ENABLE_PEBS_UNAVAIL	(1<<12)
-#define MSR_P4_PERFCTR0		0x300
-#define MSR_P4_CCCR0		0x360
-#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
-#define P4_ESCR_OS		(1<<3)
-#define P4_ESCR_USR		(1<<2)
-#define P4_CCCR_OVF_PMI0	(1<<26)
-#define P4_CCCR_OVF_PMI1	(1<<27)
-#define P4_CCCR_THRESHOLD(N)	((N)<<20)
-#define P4_CCCR_COMPLEMENT	(1<<19)
-#define P4_CCCR_COMPARE		(1<<18)
-#define P4_CCCR_REQUIRED	(3<<16)
-#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
-#define P4_CCCR_ENABLE		(1<<12)
-/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
-   CRU_ESCR0 (with any non-null event selector) through a complemented
-   max threshold. [IA32-Vol3, Section 14.9.9] */
-#define MSR_P4_IQ_COUNTER0	0x30C
-#define P4_NMI_CRU_ESCR0	(P4_ESCR_EVENT_SELECT(0x3F)|P4_ESCR_OS|P4_ESCR_USR)
-#define P4_NMI_IQ_CCCR0	\
-	(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT|	\
-	 P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
+/* local prototypes */
+static void stop_apic_nmi_watchdog(void *unused);
+static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu);
+
+extern void show_registers(struct pt_regs *regs);
+extern int unknown_nmi_panic;
 
 /* converts an msr to an appropriate reservation bit */
 static inline unsigned int nmi_perfctr_msr_to_bit(unsigned int msr)
@@ -208,6 +182,17 @@ void release_evntsel_nmi(unsigned int ms
 	clear_bit(counter, &__get_cpu_var(evntsel_nmi_owner[0]));
 }
 
+static __cpuinit inline int nmi_known_cpu(void)
+{
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+		return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6));
+	case X86_VENDOR_INTEL:
+		return ((boot_cpu_data.x86 == 15) || (boot_cpu_data.x86 == 6));
+	}
+	return 0;
+}
+
 #ifdef CONFIG_SMP
 /* The performance counters used by NMI_LOCAL_APIC don't trigger when
  * the CPU is idle. To make sure the NMI watchdog really ticks on all
@@ -234,7 +219,10 @@ static int __init check_nmi_watchdog(voi
 	unsigned int *prev_nmi_count;
 	int cpu;
 
-	if (nmi_watchdog == NMI_NONE)
+	if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT))
+		return 0;
+
+	if (!atomic_read(&nmi_active))
 		return 0;
 
 	prev_nmi_count = kmalloc(NR_CPUS * sizeof(int), GFP_KERNEL);
@@ -258,18 +246,22 @@ static int __init check_nmi_watchdog(voi
 		if (!cpu_isset(cpu, cpu_callin_map))
 			continue;
 #endif
+		if (!per_cpu(nmi_watchdog_ctlblk.enabled, cpu))
+			continue;
 		if (nmi_count(cpu) - prev_nmi_count[cpu] <= 5) {
-			endflag = 1;
 			printk("CPU#%d: NMI appears to be stuck (%d->%d)!\n",
 				cpu,
 				prev_nmi_count[cpu],
 				nmi_count(cpu));
-			nmi_active = 0;
-			lapic_nmi_owner &= ~LAPIC_NMI_WATCHDOG;
-			kfree(prev_nmi_count);
-			return -1;
+			per_cpu(nmi_watchdog_ctlblk.enabled, cpu) = 0;
+			atomic_dec(&nmi_active);
 		}
 	}
+	if (!atomic_read(&nmi_active)) {
+		kfree(prev_nmi_count);
+		atomic_set(&nmi_active, -1);
+		return -1;
+	}
 	endflag = 1;
 	printk("OK.\n");
 
@@ -290,31 +282,16 @@ static int __init setup_nmi_watchdog(cha
 
 	get_option(&str, &nmi);
 
-	if (nmi >= NMI_INVALID)
+	if ((nmi >= NMI_INVALID) || (nmi < NMI_NONE))
 		return 0;
-	if (nmi == NMI_NONE)
-		nmi_watchdog = nmi;
 	/*
 	 * If any other x86 CPU has a local APIC, then
 	 * please test the NMI stuff there and send me the
 	 * missing bits. Right now Intel P6/P4 and AMD K7 only.
 	 */
-	if ((nmi == NMI_LOCAL_APIC) &&
-			(boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
-			(boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15))
-		nmi_watchdog = nmi;
-	if ((nmi == NMI_LOCAL_APIC) &&
-			(boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
-	  		(boot_cpu_data.x86 == 6 || boot_cpu_data.x86 == 15))
-		nmi_watchdog = nmi;
-	/*
-	 * We can enable the IO-APIC watchdog
-	 * unconditionally.
-	 */
-	if (nmi == NMI_IO_APIC) {
-		nmi_active = 1;
-		nmi_watchdog = nmi;
-	}
+	if ((nmi == NMI_LOCAL_APIC) && (nmi_known_cpu() == 0))
+		return 0;  /* no lapic support */
+	nmi_watchdog = nmi;
 	return 1;
 }
 
@@ -322,41 +299,30 @@ __setup("nmi_watchdog=", setup_nmi_watch
 
 static void disable_lapic_nmi_watchdog(void)
 {
-	if (nmi_active <= 0)
+	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+	if (atomic_read(&nmi_active) <= 0)
 		return;
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		wrmsr(MSR_K7_EVNTSEL0, 0, 0);
-		break;
-	case X86_VENDOR_INTEL:
-		switch (boot_cpu_data.x86) {
-		case 6:
-			if (boot_cpu_data.x86_model > 0xd)
-				break;
 
-			wrmsr(MSR_P6_EVNTSEL0, 0, 0);
-			break;
-		case 15:
-			if (boot_cpu_data.x86_model > 0x4)
-				break;
+	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
 
-			wrmsr(MSR_P4_IQ_CCCR0, 0, 0);
-			wrmsr(MSR_P4_CRU_ESCR0, 0, 0);
-			break;
-		}
-		break;
-	}
-	nmi_active = -1;
-	/* tell do_nmi() and others that we're not active any more */
-	nmi_watchdog = 0;
+	BUG_ON(atomic_read(&nmi_active) != 0);
 }
 
 static void enable_lapic_nmi_watchdog(void)
 {
-	if (nmi_active < 0) {
-		nmi_watchdog = NMI_LOCAL_APIC;
-		setup_apic_nmi_watchdog();
-	}
+	BUG_ON(nmi_watchdog != NMI_LOCAL_APIC);
+
+	/* are we already enabled */
+	if (atomic_read(&nmi_active) != 0)
+		return;
+
+	/* are we lapic aware */
+	if (nmi_known_cpu() <= 0)
+		return;
+
+	on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+	touch_nmi_watchdog();
 }
 
 int reserve_lapic_nmi(void)
@@ -388,20 +354,25 @@ void release_lapic_nmi(void)
 
 void disable_timer_nmi_watchdog(void)
 {
-	if ((nmi_watchdog != NMI_IO_APIC) || (nmi_active <= 0))
+	BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+	if (atomic_read(&nmi_active) <= 0)
 		return;
 
-	unset_nmi_callback();
-	nmi_active = -1;
-	nmi_watchdog = NMI_NONE;
+	disable_irq(0);
+	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
+
+	BUG_ON(atomic_read(&nmi_active) != 0);
 }
 
 void enable_timer_nmi_watchdog(void)
 {
-	if (nmi_active < 0) {
-		nmi_watchdog = NMI_IO_APIC;
+	BUG_ON(nmi_watchdog != NMI_IO_APIC);
+
+	if (atomic_read(&nmi_active) == 0) {
 		touch_nmi_watchdog();
-		nmi_active = 1;
+		on_each_cpu(setup_apic_nmi_watchdog, NULL, 0, 1);
+		enable_irq(0);
 	}
 }
 
@@ -411,7 +382,7 @@ static int nmi_pm_active; /* nmi_active 
 
 static int lapic_nmi_suspend(struct sys_device *dev, pm_message_t state)
 {
-	nmi_pm_active = nmi_active;
+	nmi_pm_active = atomic_read(&nmi_active);
 	disable_lapic_nmi_watchdog();
 	return 0;
 }
@@ -439,7 +410,13 @@ static int __init init_lapic_nmi_sysfs(v
 {
 	int error;
 
-	if (nmi_active == 0 || nmi_watchdog != NMI_LOCAL_APIC)
+	/* should really be a BUG_ON but b/c this is an
+	 * init call, it just doesn't work.  -dcz
+	 */
+	if (nmi_watchdog != NMI_LOCAL_APIC)
+		return 0;
+
+	if ( atomic_read(&nmi_active) < 0 )
 		return 0;
 
 	error = sysdev_class_register(&nmi_sysclass);
@@ -457,143 +434,312 @@ late_initcall(init_lapic_nmi_sysfs);
  * Original code written by Keith Owens.
  */
 
-static void write_watchdog_counter(const char *descr)
+static void write_watchdog_counter(unsigned int perfctr_msr, const char *descr)
 {
 	u64 count = (u64)cpu_khz * 1000;
 
 	do_div(count, nmi_hz);
 	if(descr)
 		Dprintk("setting %s to -0x%08Lx\n", descr, count);
-	wrmsrl(nmi_perfctr_msr, 0 - count);
+	wrmsrl(perfctr_msr, 0 - count);
 }
 
+/* Note that these events don't tick when the CPU idles. This means
+   the frequency varies with CPU load. */
+
+#define K7_EVNTSEL_ENABLE	(1 << 22)
+#define K7_EVNTSEL_INT		(1 << 20)
+#define K7_EVNTSEL_OS		(1 << 17)
+#define K7_EVNTSEL_USR		(1 << 16)
+#define K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING	0x76
+#define K7_NMI_EVENT		K7_EVENT_CYCLES_PROCESSOR_IS_RUNNING
+
 static int setup_k7_watchdog(void)
 {
+	unsigned int perfctr_msr, evntsel_msr;
 	unsigned int evntsel;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 
-	nmi_perfctr_msr = MSR_K7_PERFCTR0;
-
-	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+	perfctr_msr = MSR_K7_PERFCTR0;
+	evntsel_msr = MSR_K7_EVNTSEL0;
+	if (!reserve_perfctr_nmi(perfctr_msr))
 		goto fail;
 
-	if (!reserve_evntsel_nmi(MSR_K7_EVNTSEL0))
+	if (!reserve_evntsel_nmi(evntsel_msr))
 		goto fail1;
 
-	wrmsrl(MSR_K7_PERFCTR0, 0UL);
+	wrmsrl(perfctr_msr, 0UL);
 
 	evntsel = K7_EVNTSEL_INT
 		| K7_EVNTSEL_OS
 		| K7_EVNTSEL_USR
 		| K7_NMI_EVENT;
 
-	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
-	write_watchdog_counter("K7_PERFCTR0");
+	/* setup the timer */
+	wrmsr(evntsel_msr, evntsel, 0);
+	write_watchdog_counter(perfctr_msr, "K7_PERFCTR0");
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 	evntsel |= K7_EVNTSEL_ENABLE;
-	wrmsr(MSR_K7_EVNTSEL0, evntsel, 0);
+	wrmsr(evntsel_msr, evntsel, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = 0;  //unused
+	wd->check_bit = 1ULL<<63;
 	return 1;
 fail1:
-	release_perfctr_nmi(nmi_perfctr_msr);
+	release_perfctr_nmi(perfctr_msr);
 fail:
 	return 0;
 }
 
+static void stop_k7_watchdog(void)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	wrmsr(wd->evntsel_msr, 0, 0);
+
+	release_evntsel_nmi(wd->evntsel_msr);
+	release_perfctr_nmi(wd->perfctr_msr);
+}
+
+#define P6_EVNTSEL0_ENABLE	(1 << 22)
+#define P6_EVNTSEL_INT		(1 << 20)
+#define P6_EVNTSEL_OS		(1 << 17)
+#define P6_EVNTSEL_USR		(1 << 16)
+#define P6_EVENT_CPU_CLOCKS_NOT_HALTED	0x79
+#define P6_NMI_EVENT		P6_EVENT_CPU_CLOCKS_NOT_HALTED
+
 static int setup_p6_watchdog(void)
 {
+	unsigned int perfctr_msr, evntsel_msr;
 	unsigned int evntsel;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 
-	nmi_perfctr_msr = MSR_P6_PERFCTR0;
-
-	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+	perfctr_msr = MSR_P6_PERFCTR0;
+	evntsel_msr = MSR_P6_EVNTSEL0;
+	if (!reserve_perfctr_nmi(perfctr_msr))
 		goto fail;
 
-	if (!reserve_evntsel_nmi(MSR_P6_EVNTSEL0))
+	if (!reserve_evntsel_nmi(evntsel_msr))
 		goto fail1;
 
+	wrmsrl(perfctr_msr, 0UL);
+
 	evntsel = P6_EVNTSEL_INT
 		| P6_EVNTSEL_OS
 		| P6_EVNTSEL_USR
 		| P6_NMI_EVENT;
 
-	wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
-	write_watchdog_counter("P6_PERFCTR0");
+	/* setup the timer */
+	wrmsr(evntsel_msr, evntsel, 0);
+	write_watchdog_counter(perfctr_msr, "P6_PERFCTR0");
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= P6_EVNTSEL0_ENABLE;
-	wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
+	evntsel |= K7_EVNTSEL_ENABLE;
+	wrmsr(evntsel_msr, evntsel, 0);
+
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = 0;  //unused
+	wd->check_bit = 1ULL<<39;
 	return 1;
 fail1:
-	release_perfctr_nmi(nmi_perfctr_msr);
+	release_perfctr_nmi(perfctr_msr);
 fail:
 	return 0;
 }
 
+static void stop_p6_watchdog(void)
+{
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	wrmsr(wd->evntsel_msr, 0, 0);
+
+	release_evntsel_nmi(wd->evntsel_msr);
+	release_perfctr_nmi(wd->perfctr_msr);
+}
+
+/* Note that these events don't tick when the CPU idles. This means
+   the frequency varies with CPU load. */
+
+#define MSR_P4_MISC_ENABLE_PERF_AVAIL	(1<<7)
+#define P4_ESCR_EVENT_SELECT(N)	((N)<<25)
+#define P4_ESCR_OS		(1<<3)
+#define P4_ESCR_USR		(1<<2)
+#define P4_CCCR_OVF_PMI0	(1<<26)
+#define P4_CCCR_OVF_PMI1	(1<<27)
+#define P4_CCCR_THRESHOLD(N)	((N)<<20)
+#define P4_CCCR_COMPLEMENT	(1<<19)
+#define P4_CCCR_COMPARE		(1<<18)
+#define P4_CCCR_REQUIRED	(3<<16)
+#define P4_CCCR_ESCR_SELECT(N)	((N)<<13)
+#define P4_CCCR_ENABLE		(1<<12)
+#define P4_CCCR_OVF 		(1<<31)
+/* Set up IQ_COUNTER0 to behave like a clock, by having IQ_CCCR0 filter
+   CRU_ESCR0 (with any non-null event selector) through a complemented
+   max threshold. [IA32-Vol3, Section 14.9.9] */
+
 static int setup_p4_watchdog(void)
 {
+	unsigned int perfctr_msr, evntsel_msr, cccr_msr;
+	unsigned int evntsel, cccr_val;
 	unsigned int misc_enable, dummy;
+	unsigned int ht_num;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 
-	rdmsr(MSR_P4_MISC_ENABLE, misc_enable, dummy);
+	rdmsr(MSR_IA32_MISC_ENABLE, misc_enable, dummy);
 	if (!(misc_enable & MSR_P4_MISC_ENABLE_PERF_AVAIL))
 		return 0;
 
-	nmi_perfctr_msr = MSR_P4_IQ_COUNTER0;
-	nmi_p4_cccr_val = P4_NMI_IQ_CCCR0;
 #ifdef CONFIG_SMP
-	if (smp_num_siblings == 2)
-		nmi_p4_cccr_val |= P4_CCCR_OVF_PMI1;
+	/* detect which hyperthread we are on */
+	if (smp_num_siblings == 2) {
+		unsigned int ebx, apicid;
+
+        	ebx = cpuid_ebx(1);
+	        apicid = (ebx >> 24) & 0xff;
+        	ht_num = apicid & 1;
+	} else
 #endif
+		ht_num = 0;
 
-	if (!reserve_perfctr_nmi(nmi_perfctr_msr))
+	/* performance counters are shared resources
+	 * assign each hyperthread its own set
+	 * (re-use the ESCR0 register, seems safe
+	 * and keeps the cccr_val the same)
+	 */
+	if (!ht_num) {
+		/* logical cpu 0 */
+		perfctr_msr = MSR_P4_IQ_PERFCTR0;
+		evntsel_msr = MSR_P4_CRU_ESCR0;
+		cccr_msr = MSR_P4_IQ_CCCR0;
+		cccr_val = P4_CCCR_OVF_PMI0 | P4_CCCR_ESCR_SELECT(4);
+	} else {
+		/* logical cpu 1 */
+		perfctr_msr = MSR_P4_IQ_PERFCTR1;
+		evntsel_msr = MSR_P4_CRU_ESCR0;
+		cccr_msr = MSR_P4_IQ_CCCR1;
+		cccr_val = P4_CCCR_OVF_PMI1 | P4_CCCR_ESCR_SELECT(4);
+	}
+
+	if (!reserve_perfctr_nmi(perfctr_msr))
 		goto fail;
 
-	if (!reserve_evntsel_nmi(MSR_P4_CRU_ESCR0))
+	if (!reserve_evntsel_nmi(evntsel_msr))
 		goto fail1;
 
-	wrmsr(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0, 0);
-	wrmsr(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE, 0);
-	write_watchdog_counter("P4_IQ_COUNTER0");
+	evntsel = P4_ESCR_EVENT_SELECT(0x3F)
+	 	| P4_ESCR_OS
+		| P4_ESCR_USR;
+
+	cccr_val |= P4_CCCR_THRESHOLD(15)
+		 | P4_CCCR_COMPLEMENT
+		 | P4_CCCR_COMPARE
+		 | P4_CCCR_REQUIRED;
+
+	wrmsr(evntsel_msr, evntsel, 0);
+	wrmsr(cccr_msr, cccr_val, 0);
+	write_watchdog_counter(perfctr_msr, "P4_IQ_COUNTER0");
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
+	cccr_val |= P4_CCCR_ENABLE;
+	wrmsr(cccr_msr, cccr_val, 0);
+	wd->perfctr_msr = perfctr_msr;
+	wd->evntsel_msr = evntsel_msr;
+	wd->cccr_msr = cccr_msr;
+	wd->check_bit = 1ULL<<39;
 	return 1;
 fail1:
-	release_perfctr_nmi(nmi_perfctr_msr);
+	release_perfctr_nmi(perfctr_msr);
 fail:
 	return 0;
 }
 
-void setup_apic_nmi_watchdog (void)
+static void stop_p4_watchdog(void)
 {
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
-		if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
-			return;
-		if (!setup_k7_watchdog())
-			return;
-		break;
-	case X86_VENDOR_INTEL:
-		switch (boot_cpu_data.x86) {
-		case 6:
-			if (boot_cpu_data.x86_model > 0xd)
-				return;
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+
+	wrmsr(wd->cccr_msr, 0, 0);
+	wrmsr(wd->evntsel_msr, 0, 0);
+
+	release_evntsel_nmi(wd->evntsel_msr);
+	release_perfctr_nmi(wd->perfctr_msr);
+}
+
+void setup_apic_nmi_watchdog (void *unused)
+{
+	/* only support LOCAL and IO APICs for now */
+	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+	    (nmi_watchdog != NMI_IO_APIC))
+	    	return;
 
-			if(!setup_p6_watchdog())
+	if (nmi_watchdog == NMI_LOCAL_APIC) {
+		switch (boot_cpu_data.x86_vendor) {
+		case X86_VENDOR_AMD:
+			if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
 				return;
-			break;
-		case 15:
-			if (boot_cpu_data.x86_model > 0x4)
+			if (!setup_k7_watchdog())
 				return;
+			break;
+		case X86_VENDOR_INTEL:
+			switch (boot_cpu_data.x86) {
+			case 6:
+				if (boot_cpu_data.x86_model > 0xd)
+					return;
 
-			if (!setup_p4_watchdog())
+				if (!setup_p6_watchdog())
+					return;
+				break;
+			case 15:
+				if (boot_cpu_data.x86_model > 0x4)
+					return;
+
+				if (!setup_p4_watchdog())
+					return;
+				break;
+			default:
 				return;
+			}
+			break;
+		default:
+			return;
+		}
+	}
+	__get_cpu_var(nmi_watchdog_ctlblk.enabled) = 1;
+	atomic_inc(&nmi_active);
+}
+
+static void stop_apic_nmi_watchdog(void *unused)
+{
+	/* only support LOCAL and IO APICs for now */
+	if ((nmi_watchdog != NMI_LOCAL_APIC) &&
+	    (nmi_watchdog != NMI_IO_APIC))
+	    	return;
+
+	if (nmi_watchdog == NMI_LOCAL_APIC) {
+		switch (boot_cpu_data.x86_vendor) {
+		case X86_VENDOR_AMD:
+			stop_k7_watchdog();
+			break;
+		case X86_VENDOR_INTEL:
+			switch (boot_cpu_data.x86) {
+			case 6:
+				if (boot_cpu_data.x86_model > 0xd)
+					break;
+				stop_p6_watchdog();
+				break;
+			case 15:
+				if (boot_cpu_data.x86_model > 0x4)
+					break;
+				stop_p4_watchdog();
+				break;
+			}
 			break;
 		default:
 			return;
 		}
-		break;
-	default:
-		return;
 	}
-	lapic_nmi_owner = LAPIC_NMI_WATCHDOG;
-	nmi_active = 1;
+	__get_cpu_var(nmi_watchdog_ctlblk.enabled) = 0;
+	atomic_dec(&nmi_active);
 }
 
 /*
@@ -634,7 +780,7 @@ void touch_nmi_watchdog (void)
 
 extern void die_nmi(struct pt_regs *, const char *msg);
 
-void nmi_watchdog_tick (struct pt_regs * regs)
+void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason)
 {
 
 	/*
@@ -643,11 +789,21 @@ void nmi_watchdog_tick (struct pt_regs *
 	 * smp_processor_id().
 	 */
 	unsigned int sum;
+	int touched = 0;
 	int cpu = smp_processor_id();
+	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
+	u64 dummy;
+
+	/* check for other users first */
+	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
+			== NOTIFY_STOP) {
+		touched = 1;
+	}
 
 	sum = per_cpu(irq_stat, cpu).apic_timer_irqs;
 
-	if (last_irq_sums[cpu] == sum) {
+	/* if the apic timer isn't firing, this cpu isn't doing much */
+	if (!touched && last_irq_sums[cpu] == sum) {
 		/*
 		 * Ayiee, looks like this CPU is stuck ...
 		 * wait a few IRQs (5 seconds) before doing the oops ...
@@ -662,26 +818,41 @@ void nmi_watchdog_tick (struct pt_regs *
 		last_irq_sums[cpu] = sum;
 		alert_counter[cpu] = 0;
 	}
-	if (nmi_perfctr_msr) {
-		if (nmi_perfctr_msr == MSR_P4_IQ_COUNTER0) {
-			/*
-			 * P4 quirks:
-			 * - An overflown perfctr will assert its interrupt
-			 *   until the OVF flag in its CCCR is cleared.
-			 * - LVTPC is masked on interrupt and must be
-			 *   unmasked by the LVTPC handler.
-			 */
-			wrmsr(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val, 0);
-			apic_write(APIC_LVTPC, APIC_DM_NMI);
-		}
-		else if (nmi_perfctr_msr == MSR_P6_PERFCTR0) {
-			/* Only P6 based Pentium M need to re-unmask
-			 * the apic vector but it doesn't hurt
-			 * other P6 variant */
-			apic_write(APIC_LVTPC, APIC_DM_NMI);
+	/* see if the nmi watchdog went off */
+	if (wd->enabled) {
+		if (nmi_watchdog == NMI_LOCAL_APIC) {
+			rdmsrl(wd->perfctr_msr, dummy);
+			if (dummy & wd->check_bit){
+				/* this wasn't a watchdog timer interrupt */
+				goto done;
+			}
+
+			/* only Intel P4 uses the cccr msr */
+	 		if (wd->cccr_msr != 0) {
+	 			/*
+	 			 * P4 quirks:
+	 			 * - An overflown perfctr will assert its interrupt
+	 			 *   until the OVF flag in its CCCR is cleared.
+	 			 * - LVTPC is masked on interrupt and must be
+	 			 *   unmasked by the LVTPC handler.
+	 			 */
+				rdmsrl(wd->cccr_msr, dummy);
+				dummy &= ~P4_CCCR_OVF;
+	 			wrmsrl(wd->cccr_msr, dummy);
+	 			apic_write(APIC_LVTPC, APIC_DM_NMI);
+	 		}
+			else if (wd->perfctr_msr == MSR_P6_PERFCTR0) {
+				/* Only P6 based Pentium M need to re-unmask
+				 * the apic vector but it doesn't hurt
+				 * other P6 variant */
+				apic_write(APIC_LVTPC, APIC_DM_NMI);
+			}
+			/* start the cycle over again */
+			write_watchdog_counter(wd->perfctr_msr, NULL);
 		}
-		write_watchdog_counter(NULL);
 	}
+done:
+	return;
 }
 
 #ifdef CONFIG_SYSCTL
Index: linux-don/arch/i386/kernel/traps.c
===================================================================
--- linux-don.orig/arch/i386/kernel/traps.c
+++ linux-don/arch/i386/kernel/traps.c
@@ -691,7 +691,7 @@ static void default_do_nmi(struct pt_reg
 		 * so it must be the NMI watchdog.
 		 */
 		if (nmi_watchdog) {
-			nmi_watchdog_tick(regs);
+			nmi_watchdog_tick(regs, reason);
 			return;
 		}
 #endif
Index: linux-don/include/asm-i386/nmi.h
===================================================================
--- linux-don.orig/include/asm-i386/nmi.h
+++ linux-don/include/asm-i386/nmi.h
@@ -32,13 +32,14 @@ extern void release_perfctr_nmi(unsigned
 extern int reserve_evntsel_nmi(unsigned int);
 extern void release_evntsel_nmi(unsigned int);
 
-extern void setup_apic_nmi_watchdog (void);
+extern void setup_apic_nmi_watchdog (void *);
 extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
 extern void disable_timer_nmi_watchdog(void);
 extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs);
+extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
 
+extern atomic_t nmi_active;
 extern unsigned int nmi_watchdog;
 #define NMI_DEFAULT     -1
 #define NMI_NONE	0
Index: linux-don/arch/i386/oprofile/nmi_timer_int.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/nmi_timer_int.c
+++ linux-don/arch/i386/oprofile/nmi_timer_int.c
@@ -42,9 +42,7 @@ static void timer_stop(void)
 
 int __init op_nmi_timer_init(struct oprofile_operations * ops)
 {
-	extern int nmi_active;
-
-	if (nmi_active <= 0)
+	if (atomic_read(&nmi_active) <= 0)
 		return -ENODEV;
 
 	ops->start = timer_start;

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 6/8] Cleanup NMI interrupt path
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
                   ` (4 preceding siblings ...)
  2006-05-09 20:50 ` [patch 5/8] Add SMP support on i386 " dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 20:50 ` [patch 7/8] Remove un/set_nmi_callback and reserve/release_lapic_nmi functions dzickus
  2006-05-09 20:50 ` [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs dzickus
  7 siblings, 0 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-cleanup-intr-path.patch --]
[-- Type: text/plain, Size: 7308 bytes --]


This patch cleans up the NMI interrupt path.  Instead of being gated by if
the 'nmi callback' is set, the interrupt handler now calls everyone who is
registered on the die_chain and additionally checks the nmi watchdog,
reseting it if enabled.  This allows more subsystems to hook into the NMI if
they need to (without being block by set_nmi_callback). 


Signed-off-by:  Don Zickus <dzickus@redhat.com>

Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -780,7 +780,7 @@ void touch_nmi_watchdog (void)
 
 extern void die_nmi(struct pt_regs *, const char *msg);
 
-void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason)
+int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason)
 {
 
 	/*
@@ -793,10 +793,12 @@ void nmi_watchdog_tick (struct pt_regs *
 	int cpu = smp_processor_id();
 	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 	u64 dummy;
+	int rc=0;
 
 	/* check for other users first */
 	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
 			== NOTIFY_STOP) {
+		rc = 1;
 		touched = 1;
 	}
 
@@ -849,10 +851,18 @@ void nmi_watchdog_tick (struct pt_regs *
 			}
 			/* start the cycle over again */
 			write_watchdog_counter(wd->perfctr_msr, NULL);
-		}
+			rc = 1;
+		} else if (nmi_watchdog == NMI_IO_APIC) {
+			/* don't know how to accurately check for this.
+			 * just assume it was a watchdog timer interrupt
+			 * This matches the old behaviour.
+			 */
+			rc = 1;
+		} else
+			printk(KERN_WARNING "Unknown enabled NMI hardware?!\n");
 	}
 done:
-	return;
+	return rc;
 }
 
 #ifdef CONFIG_SYSCTL
Index: linux-don/arch/i386/kernel/traps.c
===================================================================
--- linux-don.orig/arch/i386/kernel/traps.c
+++ linux-don/arch/i386/kernel/traps.c
@@ -673,6 +673,13 @@ void die_nmi (struct pt_regs *regs, cons
 	do_exit(SIGSEGV);
 }
 
+static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
+{
+	return 0;
+}
+
+static nmi_callback_t nmi_callback = dummy_nmi_callback;
+
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = 0;
@@ -690,12 +697,11 @@ static void default_do_nmi(struct pt_reg
 		 * Ok, so this is none of the documented NMI sources,
 		 * so it must be the NMI watchdog.
 		 */
-		if (nmi_watchdog) {
-			nmi_watchdog_tick(regs, reason);
+		if (nmi_watchdog_tick(regs, reason))
 			return;
-		}
 #endif
-		unknown_nmi_error(reason, regs);
+		if (!rcu_dereference(nmi_callback)(regs, smp_processor_id()))
+			unknown_nmi_error(reason, regs);
 		return;
 	}
 	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
@@ -711,13 +717,6 @@ static void default_do_nmi(struct pt_reg
 	reassert_nmi();
 }
 
-static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
-{
-	return 0;
-}
- 
-static nmi_callback_t nmi_callback = dummy_nmi_callback;
- 
 fastcall void do_nmi(struct pt_regs * regs, long error_code)
 {
 	int cpu;
@@ -728,8 +727,7 @@ fastcall void do_nmi(struct pt_regs * re
 
 	++nmi_count(cpu);
 
-	if (!rcu_dereference(nmi_callback)(regs, cpu))
-		default_do_nmi(regs);
+	default_do_nmi(regs);
 
 	nmi_exit();
 }
Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -683,16 +683,18 @@ void touch_nmi_watchdog (void)
  	touch_softlockup_watchdog();
 }
 
-void __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
+int __kprobes nmi_watchdog_tick(struct pt_regs * regs, unsigned reason)
 {
 	int sum;
 	int touched = 0;
 	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 	u64 dummy;
+	int rc=0;
 
 	/* check for other users first */
 	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT)
 			== NOTIFY_STOP) {
+		rc = 1;
 		touched = 1;
 	}
 
@@ -747,10 +749,18 @@ void __kprobes nmi_watchdog_tick(struct 
 	 		}
 			/* start the cycle over again */
 			wrmsrl(wd->perfctr_msr, -((u64)cpu_khz * 1000 / nmi_hz));
-		}
+			rc = 1;
+		} else 	if (nmi_watchdog == NMI_IO_APIC) {
+			/* don't know how to accurately check for this.
+			 * just assume it was a watchdog timer interrupt
+			 * This matches the old behaviour.
+			 */
+			rc = 1;
+		} else
+			printk(KERN_WARNING "Unknown enabled NMI hardware?!\n");
 	}
 done:
-	return;
+	return rc;
 }
 
 static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu)
@@ -762,15 +772,17 @@ static nmi_callback_t nmi_callback = dum
  
 asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
 {
-	int cpu = safe_smp_processor_id();
-
 	nmi_enter();
 	add_pda(__nmi_count,1);
-	if (!rcu_dereference(nmi_callback)(regs, cpu))
-		default_do_nmi(regs);
+	default_do_nmi(regs);
 	nmi_exit();
 }
 
+int do_nmi_callback(struct pt_regs * regs, int cpu)
+{
+	return rcu_dereference(nmi_callback)(regs, cpu);
+}
+
 void set_nmi_callback(nmi_callback_t callback)
 {
 	vmalloc_sync_all();
Index: linux-don/arch/x86_64/kernel/traps.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/traps.c
+++ linux-don/arch/x86_64/kernel/traps.c
@@ -657,12 +657,12 @@ asmlinkage __kprobes void default_do_nmi
 		 * Ok, so this is none of the documented NMI sources,
 		 * so it must be the NMI watchdog.
 		 */
-		if (nmi_watchdog > 0) {
-			nmi_watchdog_tick(regs,reason);
+		if (nmi_watchdog_tick(regs,reason))
 			return;
-		}
+		if (!do_nmi_callback(regs,cpu))
 #endif
-		unknown_nmi_error(reason, regs);
+			unknown_nmi_error(reason, regs);
+
 		return;
 	}
 	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
Index: linux-don/include/asm-i386/nmi.h
===================================================================
--- linux-don.orig/include/asm-i386/nmi.h
+++ linux-don/include/asm-i386/nmi.h
@@ -37,7 +37,7 @@ extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
 extern void disable_timer_nmi_watchdog(void);
 extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
+extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
 
 extern atomic_t nmi_active;
 extern unsigned int nmi_watchdog;
Index: linux-don/include/asm-x86_64/nmi.h
===================================================================
--- linux-don.orig/include/asm-x86_64/nmi.h
+++ linux-don/include/asm-x86_64/nmi.h
@@ -26,6 +26,14 @@ void set_nmi_callback(nmi_callback_t cal
  */
 void unset_nmi_callback(void);
 
+/**
+ * do_nmi_callback
+ *
+ * Check to see if a callback exists and execute it.  Return 1
+ * if the handler exists and was handled successfully.
+ */
+int do_nmi_callback(struct pt_regs *regs, int cpu);
+
 #ifdef CONFIG_PM
  
 /** Replace the PM callback routine for NMI. */
@@ -68,7 +76,7 @@ extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
 extern void disable_timer_nmi_watchdog(void);
 extern void enable_timer_nmi_watchdog(void);
-extern void nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
+extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
 
 extern void nmi_watchdog_default(void);
 extern int setup_nmi_watchdog(char *);

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 7/8] Remove un/set_nmi_callback and reserve/release_lapic_nmi functions
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
                   ` (5 preceding siblings ...)
  2006-05-09 20:50 ` [patch 6/8] Cleanup NMI interrupt path dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 20:50 ` [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs dzickus
  7 siblings, 0 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-remove-callback.patch --]
[-- Type: text/plain, Size: 16608 bytes --]


Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as
they are no longer needed.  The various subsystems are modified to register
with the die_notifier instead. 

Signed-off-by:  Don Zickus <dzickus@redhat.com>

Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -42,20 +42,6 @@ static DEFINE_PER_CPU(unsigned, evntsel_
  */
 #define NMI_MAX_COUNTER_BITS 66
 
-/*
- * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
- * - it may be reserved by some other driver, or not
- * - when not reserved by some other driver, it may be used for
- *   the NMI watchdog, or not
- *
- * This is maintained separately from nmi_active because the NMI
- * watchdog may also be driven from the I/O APIC timer.
- */
-static DEFINE_SPINLOCK(lapic_nmi_owner_lock);
-static unsigned int lapic_nmi_owner;
-#define LAPIC_NMI_WATCHDOG	(1<<0)
-#define LAPIC_NMI_RESERVED	(1<<1)
-
 /* nmi_active:
  * >0: the lapic NMI watchdog is active, but can be disabled
  * <0: the lapic NMI watchdog has not been set up, and cannot
@@ -322,33 +308,6 @@ static void enable_lapic_nmi_watchdog(vo
 	touch_nmi_watchdog();
 }
 
-int reserve_lapic_nmi(void)
-{
-	unsigned int old_owner;
-
-	spin_lock(&lapic_nmi_owner_lock);
-	old_owner = lapic_nmi_owner;
-	lapic_nmi_owner |= LAPIC_NMI_RESERVED;
-	spin_unlock(&lapic_nmi_owner_lock);
-	if (old_owner & LAPIC_NMI_RESERVED)
-		return -EBUSY;
-	if (old_owner & LAPIC_NMI_WATCHDOG)
-		disable_lapic_nmi_watchdog();
-	return 0;
-}
-
-void release_lapic_nmi(void)
-{
-	unsigned int new_owner;
-
-	spin_lock(&lapic_nmi_owner_lock);
-	new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED;
-	lapic_nmi_owner = new_owner;
-	spin_unlock(&lapic_nmi_owner_lock);
-	if (new_owner & LAPIC_NMI_WATCHDOG)
-		enable_lapic_nmi_watchdog();
-}
-
 void disable_timer_nmi_watchdog(void)
 {
 	BUG_ON(nmi_watchdog != NMI_IO_APIC);
@@ -763,13 +722,6 @@ done:
 	return rc;
 }
 
-static __kprobes int dummy_nmi_callback(struct pt_regs * regs, int cpu)
-{
-	return 0;
-}
- 
-static nmi_callback_t nmi_callback = dummy_nmi_callback;
- 
 asmlinkage __kprobes void do_nmi(struct pt_regs * regs, long error_code)
 {
 	nmi_enter();
@@ -780,18 +732,11 @@ asmlinkage __kprobes void do_nmi(struct 
 
 int do_nmi_callback(struct pt_regs * regs, int cpu)
 {
-	return rcu_dereference(nmi_callback)(regs, cpu);
-}
-
-void set_nmi_callback(nmi_callback_t callback)
-{
-	vmalloc_sync_all();
-	rcu_assign_pointer(nmi_callback, callback);
-}
-
-void unset_nmi_callback(void)
-{
-	nmi_callback = dummy_nmi_callback;
+#ifdef CONFIG_SYSCTL
+	if (unknown_nmi_panic)
+		return unknown_nmi_panic_callback(regs, cpu);
+#endif
+	return 0;
 }
 
 #ifdef CONFIG_SYSCTL
@@ -801,37 +746,8 @@ static int unknown_nmi_panic_callback(st
 	unsigned char reason = get_nmi_reason();
 	char buf[64];
 
-	if (!(reason & 0xc0)) {
-		sprintf(buf, "NMI received for unknown reason %02x\n", reason);
-		die_nmi(buf,regs);
-	}
-	return 0;
-}
-
-/*
- * proc handler for /proc/sys/kernel/unknown_nmi_panic
- */
-int proc_unknown_nmi_panic(struct ctl_table *table, int write, struct file *file,
-			void __user *buffer, size_t *length, loff_t *ppos)
-{
-	int old_state;
-
-	old_state = unknown_nmi_panic;
-	proc_dointvec(table, write, file, buffer, length, ppos);
-	if (!!old_state == !!unknown_nmi_panic)
-		return 0;
-
-	if (unknown_nmi_panic) {
-		if (reserve_lapic_nmi() < 0) {
-			unknown_nmi_panic = 0;
-			return -EBUSY;
-		} else {
-			set_nmi_callback(unknown_nmi_panic_callback);
-		}
-	} else {
-		release_lapic_nmi();
-		unset_nmi_callback();
-	}
+	sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+	die_nmi(buf,regs);
 	return 0;
 }
 
@@ -845,8 +761,6 @@ EXPORT_SYMBOL(reserve_perfctr_nmi);
 EXPORT_SYMBOL(release_perfctr_nmi);
 EXPORT_SYMBOL(reserve_evntsel_nmi);
 EXPORT_SYMBOL(release_evntsel_nmi);
-EXPORT_SYMBOL(reserve_lapic_nmi);
-EXPORT_SYMBOL(release_lapic_nmi);
 EXPORT_SYMBOL(disable_timer_nmi_watchdog);
 EXPORT_SYMBOL(enable_timer_nmi_watchdog);
 EXPORT_SYMBOL(touch_nmi_watchdog);
Index: linux-don/arch/x86_64/kernel/x8664_ksyms.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/x8664_ksyms.c
+++ linux-don/arch/x86_64/kernel/x8664_ksyms.c
@@ -104,9 +104,6 @@ EXPORT_SYMBOL(screen_info);
 
 EXPORT_SYMBOL(rtc_lock);
 
-EXPORT_SYMBOL_GPL(set_nmi_callback);
-EXPORT_SYMBOL_GPL(unset_nmi_callback);
-
 /* Export string functions. We normally rely on gcc builtin for most of these,
    but gcc sometimes decides not to inline them. */    
 #undef memcpy
Index: linux-don/include/asm-x86_64/nmi.h
===================================================================
--- linux-don.orig/include/asm-x86_64/nmi.h
+++ linux-don/include/asm-x86_64/nmi.h
@@ -7,25 +7,6 @@
 #include <linux/pm.h>
 #include <asm/io.h>
  
-struct pt_regs;
-
-typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu);
-
-/**
- * set_nmi_callback
- *
- * Set a handler for an NMI. Only one handler may be
- * set. Return 1 if the NMI was handled.
- */
-void set_nmi_callback(nmi_callback_t callback);
-
-/**
- * unset_nmi_callback
- *
- * Remove the handler previously set.
- */
-void unset_nmi_callback(void);
-
 /**
  * do_nmi_callback
  *
@@ -72,8 +53,6 @@ extern int reserve_evntsel_nmi(unsigned 
 extern void release_evntsel_nmi(unsigned int);
 
 extern void setup_apic_nmi_watchdog (void *);
-extern int reserve_lapic_nmi(void);
-extern void release_lapic_nmi(void);
 extern void disable_timer_nmi_watchdog(void);
 extern void enable_timer_nmi_watchdog(void);
 extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
Index: linux-don/arch/i386/oprofile/nmi_int.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/nmi_int.c
+++ linux-don/arch/i386/oprofile/nmi_int.c
@@ -16,14 +16,15 @@
 #include <asm/nmi.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
+#include <asm/kdebug.h>
  
 #include "op_counter.h"
 #include "op_x86_model.h"
- 
+
 static struct op_x86_model_spec const * model;
 static struct op_msrs cpu_msrs[NR_CPUS];
 static unsigned long saved_lvtpc[NR_CPUS];
- 
+
 static int nmi_start(void);
 static void nmi_stop(void);
 
@@ -81,13 +82,24 @@ static void exit_driverfs(void)
 #define exit_driverfs() do { } while (0)
 #endif /* CONFIG_PM */
 
-
-static int nmi_callback(struct pt_regs * regs, int cpu)
+int profile_exceptions_notify(struct notifier_block *self,
+				unsigned long val, void *data)
 {
-	return model->check_ctrs(regs, &cpu_msrs[cpu]);
+	struct die_args *args = (struct die_args *)data;
+	int ret = NOTIFY_DONE;
+	int cpu = smp_processor_id();
+
+	switch(val) {
+	case DIE_NMI:
+		if (model->check_ctrs(args->regs, &cpu_msrs[cpu]))
+			ret = NOTIFY_STOP;
+		break;
+	default:
+		break;
+	}
+	return ret;
 }
- 
- 
+
 static void nmi_cpu_save_registers(struct op_msrs * msrs)
 {
 	unsigned int const nr_ctrs = model->num_counters;
@@ -173,27 +185,29 @@ static void nmi_cpu_setup(void * dummy)
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
 }
 
+static struct notifier_block profile_exceptions_nb = {
+	.notifier_call = profile_exceptions_notify,
+	.next = NULL,
+	.priority = 0
+};
 
 static int nmi_setup(void)
 {
+	int err=0;
+
 	if (!allocate_msrs())
 		return -ENOMEM;
 
-	/* We walk a thin line between law and rape here.
-	 * We need to be careful to install our NMI handler
-	 * without actually triggering any NMIs as this will
-	 * break the core code horrifically.
-	 */
-	if (reserve_lapic_nmi() < 0) {
+	if ((err = register_die_notifier(&profile_exceptions_nb))){
 		free_msrs();
-		return -EBUSY;
+		return err;
 	}
+
 	/* We need to serialize save and setup for HT because the subset
 	 * of msrs are distinct for save and setup operations
 	 */
 	on_each_cpu(nmi_save_registers, NULL, 0, 1);
 	on_each_cpu(nmi_cpu_setup, NULL, 0, 1);
-	set_nmi_callback(nmi_callback);
 	nmi_enabled = 1;
 	return 0;
 }
@@ -249,8 +263,7 @@ static void nmi_shutdown(void)
 {
 	nmi_enabled = 0;
 	on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
-	unset_nmi_callback();
-	release_lapic_nmi();
+	unregister_die_notifier(&profile_exceptions_nb);
 	free_msrs();
 }
 
Index: linux-don/arch/i386/oprofile/nmi_timer_int.c
===================================================================
--- linux-don.orig/arch/i386/oprofile/nmi_timer_int.c
+++ linux-don/arch/i386/oprofile/nmi_timer_int.c
@@ -17,32 +17,49 @@
 #include <asm/nmi.h>
 #include <asm/apic.h>
 #include <asm/ptrace.h>
+#include <asm/kdebug.h>
  
-static int nmi_timer_callback(struct pt_regs * regs, int cpu)
+int profile_timer_exceptions_notify(struct notifier_block *self,
+				unsigned long val, void *data)
 {
-	oprofile_add_sample(regs, 0);
-	return 1;
+	struct die_args *args = (struct die_args *)data;
+	int ret = NOTIFY_DONE;
+
+	switch(val) {
+	case DIE_NMI:
+		oprofile_add_sample(args->regs, 0);
+		ret = NOTIFY_STOP;
+		break;
+	default:
+		break;
+	}
+	return ret;
 }
 
+static struct notifier_block profile_timer_exceptions_nb = {
+	.notifier_call = profile_timer_exceptions_notify,
+	.next = NULL,
+	.priority = 0
+};
+
 static int timer_start(void)
 {
-	disable_timer_nmi_watchdog();
-	set_nmi_callback(nmi_timer_callback);
+	if (register_die_notifier(&profile_timer_exceptions_nb))
+		return 1;
 	return 0;
 }
 
 
 static void timer_stop(void)
 {
-	enable_timer_nmi_watchdog();
-	unset_nmi_callback();
+	unregister_die_notifier(&profile_timer_exceptions_nb);
 	synchronize_sched();  /* Allow already-started NMIs to complete. */
 }
 
 
 int __init op_nmi_timer_init(struct oprofile_operations * ops)
 {
-	if (atomic_read(&nmi_active) <= 0)
+	if ((nmi_watchdog != NMI_IO_APIC) || (atomic_read(&nmi_active) <= 0))
 		return -ENODEV;
 
 	ops->start = timer_start;
Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -42,20 +42,6 @@ static DEFINE_PER_CPU(unsigned long, evn
  */
 #define NMI_MAX_COUNTER_BITS 66
 
-/*
- * lapic_nmi_owner tracks the ownership of the lapic NMI hardware:
- * - it may be reserved by some other driver, or not
- * - when not reserved by some other driver, it may be used for
- *   the NMI watchdog, or not
- *
- * This is maintained separately from nmi_active because the NMI
- * watchdog may also be driven from the I/O APIC timer.
- */
-static DEFINE_SPINLOCK(lapic_nmi_owner_lock);
-static unsigned int lapic_nmi_owner;
-#define LAPIC_NMI_WATCHDOG	(1<<0)
-#define LAPIC_NMI_RESERVED	(1<<1)
-
 /* nmi_active:
  * >0: the lapic NMI watchdog is active, but can be disabled
  * <0: the lapic NMI watchdog has not been set up, and cannot
@@ -325,33 +311,6 @@ static void enable_lapic_nmi_watchdog(vo
 	touch_nmi_watchdog();
 }
 
-int reserve_lapic_nmi(void)
-{
-	unsigned int old_owner;
-
-	spin_lock(&lapic_nmi_owner_lock);
-	old_owner = lapic_nmi_owner;
-	lapic_nmi_owner |= LAPIC_NMI_RESERVED;
-	spin_unlock(&lapic_nmi_owner_lock);
-	if (old_owner & LAPIC_NMI_RESERVED)
-		return -EBUSY;
-	if (old_owner & LAPIC_NMI_WATCHDOG)
-		disable_lapic_nmi_watchdog();
-	return 0;
-}
-
-void release_lapic_nmi(void)
-{
-	unsigned int new_owner;
-
-	spin_lock(&lapic_nmi_owner_lock);
-	new_owner = lapic_nmi_owner & ~LAPIC_NMI_RESERVED;
-	lapic_nmi_owner = new_owner;
-	spin_unlock(&lapic_nmi_owner_lock);
-	if (new_owner & LAPIC_NMI_WATCHDOG)
-		enable_lapic_nmi_watchdog();
-}
-
 void disable_timer_nmi_watchdog(void)
 {
 	BUG_ON(nmi_watchdog != NMI_IO_APIC);
@@ -865,6 +824,15 @@ done:
 	return rc;
 }
 
+int do_nmi_callback(struct pt_regs * regs, int cpu)
+{
+#ifdef CONFIG_SYSCTL
+	if (unknown_nmi_panic)
+		return unknown_nmi_panic_callback(regs, cpu);
+#endif
+	return 0;
+}
+
 #ifdef CONFIG_SYSCTL
 
 static int unknown_nmi_panic_callback(struct pt_regs *regs, int cpu)
@@ -872,37 +840,8 @@ static int unknown_nmi_panic_callback(st
 	unsigned char reason = get_nmi_reason();
 	char buf[64];
 
-	if (!(reason & 0xc0)) {
-		sprintf(buf, "NMI received for unknown reason %02x\n", reason);
-		die_nmi(regs, buf);
-	}
-	return 0;
-}
-
-/*
- * proc handler for /proc/sys/kernel/unknown_nmi_panic
- */
-int proc_unknown_nmi_panic(ctl_table *table, int write, struct file *file,
-			void __user *buffer, size_t *length, loff_t *ppos)
-{
-	int old_state;
-
-	old_state = unknown_nmi_panic;
-	proc_dointvec(table, write, file, buffer, length, ppos);
-	if (!!old_state == !!unknown_nmi_panic)
-		return 0;
-
-	if (unknown_nmi_panic) {
-		if (reserve_lapic_nmi() < 0) {
-			unknown_nmi_panic = 0;
-			return -EBUSY;
-		} else {
-			set_nmi_callback(unknown_nmi_panic_callback);
-		}
-	} else {
-		release_lapic_nmi();
-		unset_nmi_callback();
-	}
+	sprintf(buf, "NMI received for unknown reason %02x\n", reason);
+	die_nmi(regs, buf);
 	return 0;
 }
 
@@ -916,7 +855,5 @@ EXPORT_SYMBOL(reserve_perfctr_nmi);
 EXPORT_SYMBOL(release_perfctr_nmi);
 EXPORT_SYMBOL(reserve_evntsel_nmi);
 EXPORT_SYMBOL(release_evntsel_nmi);
-EXPORT_SYMBOL(reserve_lapic_nmi);
-EXPORT_SYMBOL(release_lapic_nmi);
 EXPORT_SYMBOL(disable_timer_nmi_watchdog);
 EXPORT_SYMBOL(enable_timer_nmi_watchdog);
Index: linux-don/arch/i386/kernel/traps.c
===================================================================
--- linux-don.orig/arch/i386/kernel/traps.c
+++ linux-don/arch/i386/kernel/traps.c
@@ -673,13 +673,6 @@ void die_nmi (struct pt_regs *regs, cons
 	do_exit(SIGSEGV);
 }
 
-static int dummy_nmi_callback(struct pt_regs * regs, int cpu)
-{
-	return 0;
-}
-
-static nmi_callback_t nmi_callback = dummy_nmi_callback;
-
 static void default_do_nmi(struct pt_regs * regs)
 {
 	unsigned char reason = 0;
@@ -699,9 +692,10 @@ static void default_do_nmi(struct pt_reg
 		 */
 		if (nmi_watchdog_tick(regs, reason))
 			return;
+		if (!do_nmi_callback(regs, smp_processor_id()))
 #endif
-		if (!rcu_dereference(nmi_callback)(regs, smp_processor_id()))
 			unknown_nmi_error(reason, regs);
+
 		return;
 	}
 	if (notify_die(DIE_NMI, "nmi", regs, reason, 2, SIGINT) == NOTIFY_STOP)
@@ -732,19 +726,6 @@ fastcall void do_nmi(struct pt_regs * re
 	nmi_exit();
 }
 
-void set_nmi_callback(nmi_callback_t callback)
-{
-	vmalloc_sync_all();
-	rcu_assign_pointer(nmi_callback, callback);
-}
-EXPORT_SYMBOL_GPL(set_nmi_callback);
-
-void unset_nmi_callback(void)
-{
-	nmi_callback = dummy_nmi_callback;
-}
-EXPORT_SYMBOL_GPL(unset_nmi_callback);
-
 #ifdef CONFIG_KPROBES
 fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code)
 {
Index: linux-don/include/asm-i386/nmi.h
===================================================================
--- linux-don.orig/include/asm-i386/nmi.h
+++ linux-don/include/asm-i386/nmi.h
@@ -6,24 +6,13 @@
 
 #include <linux/pm.h>
 
-struct pt_regs;
-
-typedef int (*nmi_callback_t)(struct pt_regs * regs, int cpu);
-
-/**
- * set_nmi_callback
- *
- * Set a handler for an NMI. Only one handler may be
- * set. Return 1 if the NMI was handled.
- */
-void set_nmi_callback(nmi_callback_t callback);
-
 /**
- * unset_nmi_callback
+ * do_nmi_callback
  *
- * Remove the handler previously set.
+ * Check to see if a callback exists and execute it.  Return 1
+ * if the handler exists and was handled successfully.
  */
-void unset_nmi_callback(void);
+int do_nmi_callback(struct pt_regs *regs, int cpu);
 
 extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
 extern int avail_to_resrv_perfctr_nmi(unsigned int);
@@ -33,8 +22,6 @@ extern int reserve_evntsel_nmi(unsigned 
 extern void release_evntsel_nmi(unsigned int);
 
 extern void setup_apic_nmi_watchdog (void *);
-extern int reserve_lapic_nmi(void);
-extern void release_lapic_nmi(void);
 extern void disable_timer_nmi_watchdog(void);
 extern void enable_timer_nmi_watchdog(void);
 extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
Index: linux-don/kernel/sysctl.c
===================================================================
--- linux-don.orig/kernel/sysctl.c
+++ linux-don/kernel/sysctl.c
@@ -75,8 +75,6 @@ extern int percpu_pagelist_fraction;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
-extern int proc_unknown_nmi_panic(ctl_table *, int, struct file *,
-				  void __user *, size_t *, loff_t *);
 #endif
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
@@ -630,7 +628,7 @@ static ctl_table kern_table[] = {
 		.data           = &unknown_nmi_panic,
 		.maxlen         = sizeof (int),
 		.mode           = 0644,
-		.proc_handler   = &proc_unknown_nmi_panic,
+		.proc_handler   = &proc_dointvec,
 	},
 #endif
 #if defined(CONFIG_X86)

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
                   ` (6 preceding siblings ...)
  2006-05-09 20:50 ` [patch 7/8] Remove un/set_nmi_callback and reserve/release_lapic_nmi functions dzickus
@ 2006-05-09 20:50 ` dzickus
  2006-05-09 21:37   ` Andi Kleen
  2006-05-10  9:10   ` Stephane Eranian
  7 siblings, 2 replies; 18+ messages in thread
From: dzickus @ 2006-05-09 20:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: ak, oprofile-list, dzickus

[-- Attachment #1: nmi-x86-sysctl-nmi-switch.patch --]
[-- Type: text/plain, Size: 5497 bytes --]


Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
watchdog.

Signed-off-by:  Don Zickus <dzickus@redhat.com>

Index: linux-don/arch/i386/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/i386/kernel/nmi.c
+++ linux-don/arch/i386/kernel/nmi.c
@@ -845,6 +845,56 @@ static int unknown_nmi_panic_callback(st
 	return 0;
 }
 
+/*
+ * proc handler for /proc/sys/kernel/nmi
+ */
+int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
+			void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int old_state;
+
+	nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
+	old_state = nmi_watchdog_enabled;
+	proc_dointvec(table, write, file, buffer, length, ppos);
+	if (!!old_state == !!nmi_watchdog_enabled)
+		return 0;
+
+	if (atomic_read(&nmi_active) < 0) {
+		printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
+		return 0;
+	}
+
+	if (nmi_watchdog == NMI_DEFAULT) {
+		if (nmi_known_cpu() > 0)
+			nmi_watchdog = NMI_LOCAL_APIC;
+		else
+			nmi_watchdog = NMI_IO_APIC;
+	}
+
+	if (nmi_watchdog == NMI_LOCAL_APIC)
+	{
+		if (nmi_watchdog_enabled)
+			enable_lapic_nmi_watchdog();
+		else
+			disable_lapic_nmi_watchdog();
+	} else if (nmi_watchdog == NMI_IO_APIC) {
+		/* FIXME
+		 * for some reason these functions don't work
+		 */
+		printk("Can not enable/disable NMI on IO APIC\n");
+		return 0;
+		/*if (nmi_watchdog_enabled)
+			enable_timer_nmi_watchdog();
+		else
+			disable_timer_nmi_watchdog();
+		*/
+	} else {
+		printk( KERN_WARNING
+			"NMI watchdog doesn't know what hardware to touch\n");
+	}
+	return 0;
+}
+
 #endif
 
 EXPORT_SYMBOL(nmi_active);
Index: linux-don/arch/x86_64/kernel/nmi.c
===================================================================
--- linux-don.orig/arch/x86_64/kernel/nmi.c
+++ linux-don/arch/x86_64/kernel/nmi.c
@@ -751,6 +751,52 @@ static int unknown_nmi_panic_callback(st
 	return 0;
 }
 
+/*
+ * proc handler for /proc/sys/kernel/nmi
+ */
+int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
+			void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int old_state;
+
+	nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
+	old_state = nmi_watchdog_enabled;
+	proc_dointvec(table, write, file, buffer, length, ppos);
+	if (!!old_state == !!nmi_watchdog_enabled)
+		return 0;
+
+	if (atomic_read(&nmi_active) < 0) {
+		printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
+		return 0;
+	}
+
+	/* if nmi_watchdog is not set yet, then set it */
+	nmi_watchdog_default();
+
+	if (nmi_watchdog == NMI_LOCAL_APIC)
+	{
+		if (nmi_watchdog_enabled)
+			enable_lapic_nmi_watchdog();
+		else
+			disable_lapic_nmi_watchdog();
+	} else if (nmi_watchdog == NMI_IO_APIC) {
+		/* FIXME
+		 * for some reason these functions don't work
+		 */
+		printk("Can not enable/disable NMI on IO APIC\n");
+		return 0;
+		/*if (nmi_watchdog_enabled)
+			enable_timer_nmi_watchdog();
+		else
+			disable_timer_nmi_watchdog();
+		*/
+	} else {
+		printk( KERN_WARNING
+			"NMI watchdog doesn't know what hardware to touch\n");
+	}
+	return 0;
+}
+
 #endif
 
 EXPORT_SYMBOL(nmi_active);
Index: linux-don/include/asm-i386/nmi.h
===================================================================
--- linux-don.orig/include/asm-i386/nmi.h
+++ linux-don/include/asm-i386/nmi.h
@@ -14,6 +14,7 @@
  */
 int do_nmi_callback(struct pt_regs *regs, int cpu);
 
+extern int nmi_watchdog_enabled;
 extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
 extern int avail_to_resrv_perfctr_nmi(unsigned int);
 extern int reserve_perfctr_nmi(unsigned int);
Index: linux-don/include/asm-x86_64/nmi.h
===================================================================
--- linux-don.orig/include/asm-x86_64/nmi.h
+++ linux-don/include/asm-x86_64/nmi.h
@@ -43,6 +43,7 @@ extern void die_nmi(char *str, struct pt
 
 extern int panic_on_timeout;
 extern int unknown_nmi_panic;
+extern int nmi_watchdog_enabled;
 
 extern int check_nmi_watchdog(void);
 extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
Index: linux-don/include/linux/sysctl.h
===================================================================
--- linux-don.orig/include/linux/sysctl.h
+++ linux-don/include/linux/sysctl.h
@@ -148,6 +148,7 @@ enum
 	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
 	KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
 	KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */
+	KERN_NMI_ENABLED=73, /* int: enable/disable nmi watchdog */
 };
 
 
Index: linux-don/kernel/sysctl.c
===================================================================
--- linux-don.orig/kernel/sysctl.c
+++ linux-don/kernel/sysctl.c
@@ -75,6 +75,9 @@ extern int percpu_pagelist_fraction;
 
 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
 int unknown_nmi_panic;
+int nmi_watchdog_enabled;
+extern int proc_nmi_enabled(struct ctl_table *, int , struct file *,
+			void __user *, size_t *, loff_t *);
 #endif
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
@@ -630,6 +633,14 @@ static ctl_table kern_table[] = {
 		.mode           = 0644,
 		.proc_handler   = &proc_dointvec,
 	},
+	{
+		.ctl_name       = KERN_NMI_ENABLED,
+		.procname       = "nmi",
+		.data           = &nmi_watchdog_enabled,
+		.maxlen         = sizeof (int),
+		.mode           = 0644,
+		.proc_handler   = &proc_nmi_enabled,
+	},
 #endif
 #if defined(CONFIG_X86)
 	{

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-09 20:50 ` [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs dzickus
@ 2006-05-09 21:37   ` Andi Kleen
  2006-05-10  9:10   ` Stephane Eranian
  1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2006-05-09 21:37 UTC (permalink / raw)
  To: dzickus; +Cc: linux-kernel, oprofile-list

On Tuesday 09 May 2006 22:50, dzickus wrote:
> 
> Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
> watchdog.

The subject is wrong i think - sysctl != sysfs.

And shouldn't that be called nmi_watchdog? 

> 
> Signed-off-by:  Don Zickus <dzickus@redhat.com>
> 
> Index: linux-don/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux-don.orig/arch/i386/kernel/nmi.c
> +++ linux-don/arch/i386/kernel/nmi.c
> @@ -845,6 +845,56 @@ static int unknown_nmi_panic_callback(st
>  	return 0;
>  }
>  
> +/*
> + * proc handler for /proc/sys/kernel/nmi
> + */
> +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
> +			void __user *buffer, size_t *length, loff_t *ppos)
> +{
> +	int old_state;
> +
> +	nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
> +	old_state = nmi_watchdog_enabled;
> +	proc_dointvec(table, write, file, buffer, length, ppos);
> +	if (!!old_state == !!nmi_watchdog_enabled)
> +		return 0;
> +
> +	if (atomic_read(&nmi_active) < 0) {
> +		printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
> +		return 0;
> +	}

The returns should be probably -EIO or similar. Same further down.

> +
> +	if (nmi_watchdog == NMI_DEFAULT) {
> +		if (nmi_known_cpu() > 0)
> +			nmi_watchdog = NMI_LOCAL_APIC;
> +		else
> +			nmi_watchdog = NMI_IO_APIC;
> +	}
> +
> +	if (nmi_watchdog == NMI_LOCAL_APIC)
> +	{

Move the { up

Anyways I fixed these up myself, no need to resubmit.

-Andi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 4/8] Add SMP support on x86_64 to reservation framework
  2006-05-09 20:50 ` [patch 4/8] Add SMP support on x86_64 to reservation framework dzickus
@ 2006-05-09 21:42   ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2006-05-09 21:42 UTC (permalink / raw)
  To: dzickus; +Cc: linux-kernel, oprofile-list

On Tuesday 09 May 2006 22:50, dzickus wrote:
> 
> This patch includes the changes to make the nmi watchdog on x86_64 SMP
> aware.  A bunch of code was moved around to make it simpler to read.  In
> addition, it is now possible to determine if a particular NMI was the result
> of the watchdog or not.  This feature allows the kernel to filter out
> unknown NMIs easier. 


Looks all good to me. I merged it all up. Thanks.

Regarding the ioapic watchdog not switching with the sysctl - i guess
we can live with that. It is deprecated anyways.

Thanks for doing all that work.

-Andi


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-09 20:50 ` [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs dzickus
  2006-05-09 21:37   ` Andi Kleen
@ 2006-05-10  9:10   ` Stephane Eranian
  2006-05-10  9:28     ` Andi Kleen
  2006-05-10 14:59     ` Don Zickus
  1 sibling, 2 replies; 18+ messages in thread
From: Stephane Eranian @ 2006-05-10  9:10 UTC (permalink / raw)
  To: dzickus; +Cc: linux-kernel, ak, oprofile-list, perfmon

Don,

Congratulations on the patch. I am glad to see that some of the SMP
issues I reported a long time ago are now fixed.

On Tue, May 09, 2006 at 04:50:43PM -0400, dzickus wrote:
> 
> Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
> watchdog.
> 

This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve
some performance counters on the fly. This gets complicated because now
the perfmon subsystem (and probably oprofile) cannot check register
availability when they are first initialized. Basically each time,
the /sys entry is modified, they would have to scan the list of available
performance counters. I don't know exactly when Oprofile does this checking.
For perfmon, this is done only once, when the PMU description table is loaded.

Also something that I did not see in this code is the error detection in
case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none.
Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and
a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog()
will fail on some CPUs. How is that handled?

Thanks.

> Signed-off-by:  Don Zickus <dzickus@redhat.com>
> 
> Index: linux-don/arch/i386/kernel/nmi.c
> ===================================================================
> --- linux-don.orig/arch/i386/kernel/nmi.c
> +++ linux-don/arch/i386/kernel/nmi.c
> @@ -845,6 +845,56 @@ static int unknown_nmi_panic_callback(st
>  	return 0;
>  }
>  
> +/*
> + * proc handler for /proc/sys/kernel/nmi
> + */
> +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
> +			void __user *buffer, size_t *length, loff_t *ppos)
> +{
> +	int old_state;
> +
> +	nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
> +	old_state = nmi_watchdog_enabled;
> +	proc_dointvec(table, write, file, buffer, length, ppos);
> +	if (!!old_state == !!nmi_watchdog_enabled)
> +		return 0;
> +
> +	if (atomic_read(&nmi_active) < 0) {
> +		printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
> +		return 0;
> +	}
> +
> +	if (nmi_watchdog == NMI_DEFAULT) {
> +		if (nmi_known_cpu() > 0)
> +			nmi_watchdog = NMI_LOCAL_APIC;
> +		else
> +			nmi_watchdog = NMI_IO_APIC;
> +	}
> +
> +	if (nmi_watchdog == NMI_LOCAL_APIC)
> +	{
> +		if (nmi_watchdog_enabled)
> +			enable_lapic_nmi_watchdog();
> +		else
> +			disable_lapic_nmi_watchdog();
> +	} else if (nmi_watchdog == NMI_IO_APIC) {
> +		/* FIXME
> +		 * for some reason these functions don't work
> +		 */
> +		printk("Can not enable/disable NMI on IO APIC\n");
> +		return 0;
> +		/*if (nmi_watchdog_enabled)
> +			enable_timer_nmi_watchdog();
> +		else
> +			disable_timer_nmi_watchdog();
> +		*/
> +	} else {
> +		printk( KERN_WARNING
> +			"NMI watchdog doesn't know what hardware to touch\n");
> +	}
> +	return 0;
> +}
> +
>  #endif
>  
>  EXPORT_SYMBOL(nmi_active);
> Index: linux-don/arch/x86_64/kernel/nmi.c
> ===================================================================
> --- linux-don.orig/arch/x86_64/kernel/nmi.c
> +++ linux-don/arch/x86_64/kernel/nmi.c
> @@ -751,6 +751,52 @@ static int unknown_nmi_panic_callback(st
>  	return 0;
>  }
>  
> +/*
> + * proc handler for /proc/sys/kernel/nmi
> + */
> +int proc_nmi_enabled(struct ctl_table *table, int write, struct file *file,
> +			void __user *buffer, size_t *length, loff_t *ppos)
> +{
> +	int old_state;
> +
> +	nmi_watchdog_enabled = (atomic_read(&nmi_active) > 0) ? 1 : 0;
> +	old_state = nmi_watchdog_enabled;
> +	proc_dointvec(table, write, file, buffer, length, ppos);
> +	if (!!old_state == !!nmi_watchdog_enabled)
> +		return 0;
> +
> +	if (atomic_read(&nmi_active) < 0) {
> +		printk( KERN_WARNING "NMI watchdog is permanently disabled\n");
> +		return 0;
> +	}
> +
> +	/* if nmi_watchdog is not set yet, then set it */
> +	nmi_watchdog_default();
> +
> +	if (nmi_watchdog == NMI_LOCAL_APIC)
> +	{
> +		if (nmi_watchdog_enabled)
> +			enable_lapic_nmi_watchdog();
> +		else
> +			disable_lapic_nmi_watchdog();
> +	} else if (nmi_watchdog == NMI_IO_APIC) {
> +		/* FIXME
> +		 * for some reason these functions don't work
> +		 */
> +		printk("Can not enable/disable NMI on IO APIC\n");
> +		return 0;
> +		/*if (nmi_watchdog_enabled)
> +			enable_timer_nmi_watchdog();
> +		else
> +			disable_timer_nmi_watchdog();
> +		*/
> +	} else {
> +		printk( KERN_WARNING
> +			"NMI watchdog doesn't know what hardware to touch\n");
> +	}
> +	return 0;
> +}
> +
>  #endif
>  
>  EXPORT_SYMBOL(nmi_active);
> Index: linux-don/include/asm-i386/nmi.h
> ===================================================================
> --- linux-don.orig/include/asm-i386/nmi.h
> +++ linux-don/include/asm-i386/nmi.h
> @@ -14,6 +14,7 @@
>   */
>  int do_nmi_callback(struct pt_regs *regs, int cpu);
>  
> +extern int nmi_watchdog_enabled;
>  extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
>  extern int avail_to_resrv_perfctr_nmi(unsigned int);
>  extern int reserve_perfctr_nmi(unsigned int);
> Index: linux-don/include/asm-x86_64/nmi.h
> ===================================================================
> --- linux-don.orig/include/asm-x86_64/nmi.h
> +++ linux-don/include/asm-x86_64/nmi.h
> @@ -43,6 +43,7 @@ extern void die_nmi(char *str, struct pt
>  
>  extern int panic_on_timeout;
>  extern int unknown_nmi_panic;
> +extern int nmi_watchdog_enabled;
>  
>  extern int check_nmi_watchdog(void);
>  extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
> Index: linux-don/include/linux/sysctl.h
> ===================================================================
> --- linux-don.orig/include/linux/sysctl.h
> +++ linux-don/include/linux/sysctl.h
> @@ -148,6 +148,7 @@ enum
>  	KERN_SPIN_RETRY=70,	/* int: number of spinlock retries */
>  	KERN_ACPI_VIDEO_FLAGS=71, /* int: flags for setting up video after ACPI sleep */
>  	KERN_IA64_UNALIGNED=72, /* int: ia64 unaligned userland trap enable */
> +	KERN_NMI_ENABLED=73, /* int: enable/disable nmi watchdog */
>  };
>  
>  
> Index: linux-don/kernel/sysctl.c
> ===================================================================
> --- linux-don.orig/kernel/sysctl.c
> +++ linux-don/kernel/sysctl.c
> @@ -75,6 +75,9 @@ extern int percpu_pagelist_fraction;
>  
>  #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
>  int unknown_nmi_panic;
> +int nmi_watchdog_enabled;
> +extern int proc_nmi_enabled(struct ctl_table *, int , struct file *,
> +			void __user *, size_t *, loff_t *);
>  #endif
>  
>  /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
> @@ -630,6 +633,14 @@ static ctl_table kern_table[] = {
>  		.mode           = 0644,
>  		.proc_handler   = &proc_dointvec,
>  	},
> +	{
> +		.ctl_name       = KERN_NMI_ENABLED,
> +		.procname       = "nmi",
> +		.data           = &nmi_watchdog_enabled,
> +		.maxlen         = sizeof (int),
> +		.mode           = 0644,
> +		.proc_handler   = &proc_nmi_enabled,
> +	},
>  #endif
>  #if defined(CONFIG_X86)
>  	{
> 
> --
> 
> 
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> oprofile-list mailing list
> oprofile-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oprofile-list

-- 

-Stephane

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-10  9:10   ` Stephane Eranian
@ 2006-05-10  9:28     ` Andi Kleen
  2006-05-10 14:59     ` Don Zickus
  1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2006-05-10  9:28 UTC (permalink / raw)
  To: eranian; +Cc: dzickus, linux-kernel, oprofile-list, perfmon


> On Tue, May 09, 2006 at 04:50:43PM -0400, dzickus wrote:
> > 
> > Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
> > watchdog.
> > 
> 
> This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve
> some performance counters on the fly. This gets complicated because now
> the perfmon subsystem 

Right now we don't care about perfmon at all because it's not in (x86) mainline
If you want anybody to care you have to submit and pass review

> (and probably oprofile) cannot check register 
> availability when they are first initialized. Basically each time,
> the /sys entry is modified, they would have to scan the list of available
> performance counters. I don't know exactly when Oprofile does this checking.
> For perfmon, this is done only once, when the PMU description table is loaded.

I think the NMI watchdog will fail if the register is already allocated.
oprofile should check and allocate when it fills in the register. 

-Andi


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-10  9:10   ` Stephane Eranian
  2006-05-10  9:28     ` Andi Kleen
@ 2006-05-10 14:59     ` Don Zickus
  2006-05-10 15:03       ` Stephane Eranian
  1 sibling, 1 reply; 18+ messages in thread
From: Don Zickus @ 2006-05-10 14:59 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, ak, oprofile-list, perfmon

On Wed, May 10, 2006 at 02:10:26AM -0700, Stephane Eranian wrote:
> Don,
> 
> Congratulations on the patch. I am glad to see that some of the SMP
> issues I reported a long time ago are now fixed.

Thanks and glad I could help.

> > Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
> > watchdog.
> > 
> 
> This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve
> some performance counters on the fly. This gets complicated because now
> the perfmon subsystem (and probably oprofile) cannot check register
> availability when they are first initialized. Basically each time,
> the /sys entry is modified, they would have to scan the list of available
> performance counters. I don't know exactly when Oprofile does this checking.
> For perfmon, this is done only once, when the PMU description table is loaded.
How often did you plan on enabling/disabling the nmi_watchdog?  My
understanding was you disable nmi_watchdog, run oprofile/perfmon,
re-enable nmi_watchdog.  I guess I don't understand what type of funky
scenarios you are dealing with.  

> 
> Also something that I did not see in this code is the error detection in
> case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none.
> Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and
> a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog()
> will fail on some CPUs. How is that handled?

It's not.  In fact I wouldn't know what to do in such situations.  Is it
really wrong to only have a subset of cpus being monitored by the
nmi_watchdog?  This seems to be wandering into the area where the user is
looking to do something complicated (profiling a subset of cpus) and as
such might be expected to make sure the nmi_watchdog is properly enabled
on all cpus when they are done.

Cheers,
Don

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-10 14:59     ` Don Zickus
@ 2006-05-10 15:03       ` Stephane Eranian
  2006-05-10 19:06         ` Don Zickus
  0 siblings, 1 reply; 18+ messages in thread
From: Stephane Eranian @ 2006-05-10 15:03 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-kernel, ak, oprofile-list, perfmon

Don,

On Wed, May 10, 2006 at 10:59:53AM -0400, Don Zickus wrote:
> > 
> > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve
> > some performance counters on the fly. This gets complicated because now
> > the perfmon subsystem (and probably oprofile) cannot check register
> > availability when they are first initialized. Basically each time,
> > the /sys entry is modified, they would have to scan the list of available
> > performance counters. I don't know exactly when Oprofile does this checking.
> > For perfmon, this is done only once, when the PMU description table is loaded.

> How often did you plan on enabling/disabling the nmi_watchdog?  My
> understanding was you disable nmi_watchdog, run oprofile/perfmon,
> re-enable nmi_watchdog.  I guess I don't understand what type of funky
> scenarios you are dealing with.  
> 
I thought that part of the exercise was to allow oprofile/perfmon and nmi_watchdog
to work simultaneously, i.e., don't disable watchdog when you monitor. Leave watchdog
on and have oprofile/perfmon run with fewer counters. The alternative would be to do
what you say: disbale nmi, monitor, re-enable nmi.

> > 
> > Also something that I did not see in this code is the error detection in
> > case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none.
> > Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and
> > a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog()
> > will fail on some CPUs. How is that handled?
> 
> It's not.  In fact I wouldn't know what to do in such situations.  Is it
> really wrong to only have a subset of cpus being monitored by the
> nmi_watchdog?  This seems to be wandering into the area where the user is
> looking to do something complicated (profiling a subset of cpus) and as
> such might be expected to make sure the nmi_watchdog is properly enabled
> on all cpus when they are done.
> 
On larger machines (think NUMA-style), it does not always make sense to monitor all CPUs.

-- 

-Stephane

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs
  2006-05-10 15:03       ` Stephane Eranian
@ 2006-05-10 19:06         ` Don Zickus
  0 siblings, 0 replies; 18+ messages in thread
From: Don Zickus @ 2006-05-10 19:06 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, ak, oprofile-list, perfmon

On Wed, May 10, 2006 at 08:03:57AM -0700, Stephane Eranian wrote:
> Don,
> 
> On Wed, May 10, 2006 at 10:59:53AM -0400, Don Zickus wrote:
> > > 
> > > This means you can at runtime enable/disbale nmi_watchdog, i.e., reserve
> > > some performance counters on the fly. This gets complicated because now
> > > the perfmon subsystem (and probably oprofile) cannot check register
> > > availability when they are first initialized. Basically each time,
> > > the /sys entry is modified, they would have to scan the list of available
> > > performance counters. I don't know exactly when Oprofile does this checking.
> > > For perfmon, this is done only once, when the PMU description table is loaded.
> 
> > How often did you plan on enabling/disabling the nmi_watchdog?  My
> > understanding was you disable nmi_watchdog, run oprofile/perfmon,
> > re-enable nmi_watchdog.  I guess I don't understand what type of funky
> > scenarios you are dealing with.  
> > 
> I thought that part of the exercise was to allow oprofile/perfmon and nmi_watchdog
> to work simultaneously, i.e., don't disable watchdog when you monitor. Leave watchdog
> on and have oprofile/perfmon run with fewer counters. The alternative would be to do
> what you say: disbale nmi, monitor, re-enable nmi.

It was and the patch allows one to run the monitor and the nmi_watchdog at
the same time.  I guess I am confused to what your original concern
was.  

> 
> > > 
> > > Also something that I did not see in this code is the error detection in
> > > case enable_lapic_nmi_watchdog() fails. Oprofile runs on all CPUs or none.
> > > Perfmon lets you monitor on subsets on CPUs. In case NMI was disabled and
> > > a monitoring session was active on some CPUs. The enable_lapic_nmi_watchdog()
> > > will fail on some CPUs. How is that handled?
> > 
> > It's not.  In fact I wouldn't know what to do in such situations.  Is it
> > really wrong to only have a subset of cpus being monitored by the
> > nmi_watchdog?  This seems to be wandering into the area where the user is
> > looking to do something complicated (profiling a subset of cpus) and as
> > such might be expected to make sure the nmi_watchdog is properly enabled
> > on all cpus when they are done.
> > 
> On larger machines (think NUMA-style), it does not always make sense to monitor all CPUs.

That makes sense.  Thinking about it, it probably wouldn't be too hard to
add code to try and re-scan the cpus and enable those nmi_watchdogs that
were disabled during your monitoring.  But you would have to manually do
this once you were done testing. 

Cheers,
Don

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 5/8] Add SMP support on i386 to reservation framework
  2006-05-09 20:50 ` [patch 5/8] Add SMP support on i386 " dzickus
@ 2006-05-22 17:31   ` Markus Armbruster
  2006-05-22 17:55     ` Don Zickus
  0 siblings, 1 reply; 18+ messages in thread
From: Markus Armbruster @ 2006-05-22 17:31 UTC (permalink / raw)
  To: dzickus; +Cc: linux-kernel, ak, oprofile-list

@@ -457,143 +434,312 @@ late_initcall(init_lapic_nmi_sysfs);
[...]
 static int setup_p6_watchdog(void)
[...]
 	apic_write(APIC_LVTPC, APIC_DM_NMI);
-	evntsel |= P6_EVNTSEL0_ENABLE;
-	wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
+	evntsel |= K7_EVNTSEL_ENABLE;

Me thinks you want P6_EVNTSEL0_ENABLE here, although the value is the
same.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [patch 5/8] Add SMP support on i386 to reservation framework
  2006-05-22 17:31   ` Markus Armbruster
@ 2006-05-22 17:55     ` Don Zickus
  0 siblings, 0 replies; 18+ messages in thread
From: Don Zickus @ 2006-05-22 17:55 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: linux-kernel, ak, oprofile-list

On Mon, May 22, 2006 at 07:31:41PM +0200, Markus Armbruster wrote:
> @@ -457,143 +434,312 @@ late_initcall(init_lapic_nmi_sysfs);
> [...]
>  static int setup_p6_watchdog(void)
> [...]
>  	apic_write(APIC_LVTPC, APIC_DM_NMI);
> -	evntsel |= P6_EVNTSEL0_ENABLE;
> -	wrmsr(MSR_P6_EVNTSEL0, evntsel, 0);
> +	evntsel |= K7_EVNTSEL_ENABLE;
> 
> Me thinks you want P6_EVNTSEL0_ENABLE here, although the value is the
> same.

Yup, the downside of copying/pasting...

Thanks.



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-05-22 17:51 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-09 20:50 [patch 0/8] [RFC PATCH] nmi watchdog: x86 32/64 cleanup dzickus
2006-05-09 20:50 ` [patch 1/8] nmi watchdog header cleanup dzickus
2006-05-09 20:50 ` [patch 2/8] Add performance counter reservation framework for UP kernels dzickus
2006-05-09 20:50 ` [patch 3/8] Utilize performance counter reservation framework in oprofile dzickus
2006-05-09 20:50 ` [patch 4/8] Add SMP support on x86_64 to reservation framework dzickus
2006-05-09 21:42   ` Andi Kleen
2006-05-09 20:50 ` [patch 5/8] Add SMP support on i386 " dzickus
2006-05-22 17:31   ` Markus Armbruster
2006-05-22 17:55     ` Don Zickus
2006-05-09 20:50 ` [patch 6/8] Cleanup NMI interrupt path dzickus
2006-05-09 20:50 ` [patch 7/8] Remove un/set_nmi_callback and reserve/release_lapic_nmi functions dzickus
2006-05-09 20:50 ` [patch 8/8] Add abilty to enable/disable nmi watchdog from sysfs dzickus
2006-05-09 21:37   ` Andi Kleen
2006-05-10  9:10   ` Stephane Eranian
2006-05-10  9:28     ` Andi Kleen
2006-05-10 14:59     ` Don Zickus
2006-05-10 15:03       ` Stephane Eranian
2006-05-10 19:06         ` Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).