* [PATCH] [2/20] x86: Implement support to synchronize RDTSC through MFENCE on AMD CPUs
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
@ 2008-01-03 0:49 ` Andi Kleen
2008-01-03 0:49 ` [PATCH] [3/20] x86: Implement support to synchronize RDTSC with LFENCE on Intel CPUs Andi Kleen
` (18 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:49 UTC (permalink / raw)
To: andreas.herrmann3, linux-kernel
According to AMD, RDTSC can be synchronized through MFENCE.
Implement the necessary CPUID bit for that.
Cc: andreas.herrmann3@amd.com
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/amd.c | 3 +++
arch/x86/kernel/setup_64.c | 4 ++--
include/asm-x86/cpufeature.h | 1 +
3 files changed, 6 insertions(+), 2 deletions(-)
Index: linux/arch/x86/kernel/setup_64.c
===================================================================
--- linux.orig/arch/x86/kernel/setup_64.c
+++ linux/arch/x86/kernel/setup_64.c
@@ -757,8 +757,8 @@ static void __cpuinit init_amd(struct cp
if (c->x86 == 0xf || c->x86 == 0x10 || c->x86 == 0x11)
set_cpu_cap(c, X86_FEATURE_K8);
- /* RDTSC can be speculated around */
- clear_cpu_cap(c, X86_FEATURE_SYNC_RDTSC);
+ /* MFENCE stops RDTSC speculation */
+ set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
/* Family 10 doesn't support C states in MWAIT so don't use it */
if (c->x86 == 0x10 && !force_mwait)
Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -79,6 +79,7 @@
/* 14 free */
#define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
#define X86_FEATURE_REP_GOOD (3*32+16) /* rep microcode works well on this CPU */
+#define X86_FEATURE_MFENCE_RDTSC (3*32+17) /* Mfence synchronizes RDTSC */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */
Index: linux/arch/x86/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/amd.c
+++ linux/arch/x86/kernel/cpu/amd.c
@@ -301,6 +301,9 @@ static void __cpuinit init_amd(struct cp
/* K6s reports MCEs but don't actually have all the MSRs */
if (c->x86 < 6)
clear_bit(X86_FEATURE_MCE, c->x86_capability);
+
+ if (cpu_has_xmm)
+ set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
}
static unsigned int __cpuinit amd_size_cache(struct cpuinfo_x86 * c, unsigned int size)
^ permalink raw reply [flat|nested] 41+ messages in thread

* [PATCH] [3/20] x86: Implement support to synchronize RDTSC with LFENCE on Intel CPUs
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
2008-01-03 0:49 ` [PATCH] [2/20] x86: Implement support to synchronize RDTSC through MFENCE on AMD CPUs Andi Kleen
@ 2008-01-03 0:49 ` Andi Kleen
2008-01-03 0:49 ` [PATCH] [4/20] x86: Move nop declarations into separate include file Andi Kleen
` (17 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:49 UTC (permalink / raw)
To: asit.k.mallick, linux-kernel
According to Intel, RDTSC can always be synchronized with LFENCE
on all current CPUs. Implement the necessary CPUID bit for that.
It is not yet clear whether this holds for all future CPUs too,
but if there is another way the kernel can always be updated.
Cc: asit.k.mallick@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/intel.c | 3 ++-
arch/x86/kernel/setup_64.c | 5 +----
include/asm-x86/cpufeature.h | 1 +
3 files changed, 4 insertions(+), 5 deletions(-)
Index: linux/arch/x86/kernel/setup_64.c
===================================================================
--- linux.orig/arch/x86/kernel/setup_64.c
+++ linux/arch/x86/kernel/setup_64.c
@@ -899,10 +899,7 @@ static void __cpuinit init_intel(struct
set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
if (c->x86 == 6)
set_cpu_cap(c, X86_FEATURE_REP_GOOD);
- if (c->x86 == 15)
- set_cpu_cap(c, X86_FEATURE_SYNC_RDTSC);
- else
- clear_cpu_cap(c, X86_FEATURE_SYNC_RDTSC);
+ set_cpu_cap(c, X86_FEATURE_LFENCE_RDTSC);
c->x86_max_cores = intel_num_cpu_cores(c);
srat_detect_node();
Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -80,6 +80,7 @@
#define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
#define X86_FEATURE_REP_GOOD (3*32+16) /* rep microcode works well on this CPU */
#define X86_FEATURE_MFENCE_RDTSC (3*32+17) /* Mfence synchronizes RDTSC */
+#define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* Lfence synchronizes RDTSC */
/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
#define X86_FEATURE_XMM3 (4*32+ 0) /* Streaming SIMD Extensions-3 */
Index: linux/arch/x86/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/intel.c
+++ linux/arch/x86/kernel/cpu/intel.c
@@ -203,9 +203,10 @@ static void __cpuinit init_intel(struct
}
#endif
+ if (cpu_has_xmm)
+ set_bit(X86_FEATURE_LFENCE_RDTSC, c->x86_capability);
if (c->x86 == 15) {
set_bit(X86_FEATURE_P4, c->x86_capability);
- set_bit(X86_FEATURE_SYNC_RDTSC, c->x86_capability);
}
if (c->x86 == 6)
set_bit(X86_FEATURE_P3, c->x86_capability);
* [PATCH] [4/20] x86: Move nop declarations into separate include file
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
2008-01-03 0:49 ` [PATCH] [2/20] x86: Implement support to synchronize RDTSC through MFENCE on AMD CPUs Andi Kleen
2008-01-03 0:49 ` [PATCH] [3/20] x86: Implement support to synchronize RDTSC with LFENCE on Intel CPUs Andi Kleen
@ 2008-01-03 0:49 ` Andi Kleen
2008-01-03 0:50 ` [PATCH] [5/20] x86: Introduce nsec_barrier() Andi Kleen
` (16 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:49 UTC (permalink / raw)
To: linux-kernel
Moving things out of processor.h is always a good thing.
This is also needed to avoid an include loop in a later patch.
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-x86/nops.h | 90 ++++++++++++++++++++++++++++++++++++++++++++
include/asm-x86/processor.h | 86 ------------------------------------------
2 files changed, 91 insertions(+), 85 deletions(-)
Index: linux/include/asm-x86/nops.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86/nops.h
@@ -0,0 +1,90 @@
+#ifndef _ASM_NOPS_H
+#define _ASM_NOPS_H 1
+
+/* Define nops for use with alternative() */
+
+/* generic versions from gas */
+#define GENERIC_NOP1 ".byte 0x90\n"
+#define GENERIC_NOP2 ".byte 0x89,0xf6\n"
+#define GENERIC_NOP3 ".byte 0x8d,0x76,0x00\n"
+#define GENERIC_NOP4 ".byte 0x8d,0x74,0x26,0x00\n"
+#define GENERIC_NOP5 GENERIC_NOP1 GENERIC_NOP4
+#define GENERIC_NOP6 ".byte 0x8d,0xb6,0x00,0x00,0x00,0x00\n"
+#define GENERIC_NOP7 ".byte 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00\n"
+#define GENERIC_NOP8 GENERIC_NOP1 GENERIC_NOP7
+
+/* Opteron 64bit nops */
+#define K8_NOP1 GENERIC_NOP1
+#define K8_NOP2 ".byte 0x66,0x90\n"
+#define K8_NOP3 ".byte 0x66,0x66,0x90\n"
+#define K8_NOP4 ".byte 0x66,0x66,0x66,0x90\n"
+#define K8_NOP5 K8_NOP3 K8_NOP2
+#define K8_NOP6 K8_NOP3 K8_NOP3
+#define K8_NOP7 K8_NOP4 K8_NOP3
+#define K8_NOP8 K8_NOP4 K8_NOP4
+
+/* K7 nops */
+/* uses eax dependencies (arbitrary choice) */
+#define K7_NOP1 GENERIC_NOP1
+#define K7_NOP2 ".byte 0x8b,0xc0\n"
+#define K7_NOP3 ".byte 0x8d,0x04,0x20\n"
+#define K7_NOP4 ".byte 0x8d,0x44,0x20,0x00\n"
+#define K7_NOP5 K7_NOP4 ASM_NOP1
+#define K7_NOP6 ".byte 0x8d,0x80,0,0,0,0\n"
+#define K7_NOP7 ".byte 0x8D,0x04,0x05,0,0,0,0\n"
+#define K7_NOP8 K7_NOP7 ASM_NOP1
+
+/* P6 nops */
+/* uses eax dependencies (Intel-recommended choice) */
+#define P6_NOP1 GENERIC_NOP1
+#define P6_NOP2 ".byte 0x66,0x90\n"
+#define P6_NOP3 ".byte 0x0f,0x1f,0x00\n"
+#define P6_NOP4 ".byte 0x0f,0x1f,0x40,0\n"
+#define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"
+#define P6_NOP6 ".byte 0x66,0x0f,0x1f,0x44,0x00,0\n"
+#define P6_NOP7 ".byte 0x0f,0x1f,0x80,0,0,0,0\n"
+#define P6_NOP8 ".byte 0x0f,0x1f,0x84,0x00,0,0,0,0\n"
+
+#if defined(CONFIG_MK8)
+#define ASM_NOP1 K8_NOP1
+#define ASM_NOP2 K8_NOP2
+#define ASM_NOP3 K8_NOP3
+#define ASM_NOP4 K8_NOP4
+#define ASM_NOP5 K8_NOP5
+#define ASM_NOP6 K8_NOP6
+#define ASM_NOP7 K8_NOP7
+#define ASM_NOP8 K8_NOP8
+#elif defined(CONFIG_MK7)
+#define ASM_NOP1 K7_NOP1
+#define ASM_NOP2 K7_NOP2
+#define ASM_NOP3 K7_NOP3
+#define ASM_NOP4 K7_NOP4
+#define ASM_NOP5 K7_NOP5
+#define ASM_NOP6 K7_NOP6
+#define ASM_NOP7 K7_NOP7
+#define ASM_NOP8 K7_NOP8
+#elif defined(CONFIG_M686) || defined(CONFIG_MPENTIUMII) || \
+ defined(CONFIG_MPENTIUMIII) || defined(CONFIG_MPENTIUMM) || \
+ defined(CONFIG_MCORE2) || defined(CONFIG_PENTIUM4)
+#define ASM_NOP1 P6_NOP1
+#define ASM_NOP2 P6_NOP2
+#define ASM_NOP3 P6_NOP3
+#define ASM_NOP4 P6_NOP4
+#define ASM_NOP5 P6_NOP5
+#define ASM_NOP6 P6_NOP6
+#define ASM_NOP7 P6_NOP7
+#define ASM_NOP8 P6_NOP8
+#else
+#define ASM_NOP1 GENERIC_NOP1
+#define ASM_NOP2 GENERIC_NOP2
+#define ASM_NOP3 GENERIC_NOP3
+#define ASM_NOP4 GENERIC_NOP4
+#define ASM_NOP5 GENERIC_NOP5
+#define ASM_NOP6 GENERIC_NOP6
+#define ASM_NOP7 GENERIC_NOP7
+#define ASM_NOP8 GENERIC_NOP8
+#endif
+
+#define ASM_NOP_MAX 8
+
+#endif
Index: linux/include/asm-x86/processor.h
===================================================================
--- linux.orig/include/asm-x86/processor.h
+++ linux/include/asm-x86/processor.h
@@ -20,6 +20,7 @@ struct mm_struct;
#include <asm/percpu.h>
#include <asm/msr.h>
#include <asm/desc_defs.h>
+#include <asm/nops.h>
#include <linux/personality.h>
#include <linux/cpumask.h>
#include <linux/cache.h>
@@ -674,91 +675,6 @@ extern int bootloader_type;
extern char ignore_fpu_irq;
#define cache_line_size() (boot_cpu_data.x86_cache_alignment)
-/* generic versions from gas */
-#define GENERIC_NOP1 ".byte 0x90\n"
-#define GENERIC_NOP2 ".byte 0x89,0xf6\n"
-#define GENERIC_NOP3 ".byte 0x8d,0x76,0x00\n"
-#define GENERIC_NOP4 ".byte 0x8d,0x74,0x26,0x00\n"
-#define GENERIC_NOP5 GENERIC_NOP1 GENERIC_NOP4
-#define GENERIC_NOP6 ".byte 0x8d,0xb6,0x00,0x00,0x00,0x00\n"
-#define GENERIC_NOP7 ".byte 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00\n"
-#define GENERIC_NOP8 GENERIC_NOP1 GENERIC_NOP7
-
-/* Opteron nops */
-#define K8_NOP1 GENERIC_NOP1
-#define K8_NOP2 ".byte 0x66,0x90\n"
-#define K8_NOP3 ".byte 0x66,0x66,0x90\n"
-#define K8_NOP4 ".byte 0x66,0x66,0x66,0x90\n"
-#define K8_NOP5 K8_NOP3 K8_NOP2
-#define K8_NOP6 K8_NOP3 K8_NOP3
-#define K8_NOP7 K8_NOP4 K8_NOP3
-#define K8_NOP8 K8_NOP4 K8_NOP4
-
-/* K7 nops */
-/* uses eax dependencies (arbitary choice) */
-#define K7_NOP1 GENERIC_NOP1
-#define K7_NOP2 ".byte 0x8b,0xc0\n"
-#define K7_NOP3 ".byte 0x8d,0x04,0x20\n"
-#define K7_NOP4 ".byte 0x8d,0x44,0x20,0x00\n"
-#define K7_NOP5 K7_NOP4 ASM_NOP1
-#define K7_NOP6 ".byte 0x8d,0x80,0,0,0,0\n"
-#define K7_NOP7 ".byte 0x8D,0x04,0x05,0,0,0,0\n"
-#define K7_NOP8 K7_NOP7 ASM_NOP1
-
-/* P6 nops */
-/* uses eax dependencies (Intel-recommended choice) */
-#define P6_NOP1 GENERIC_NOP1
-#define P6_NOP2 ".byte 0x66,0x90\n"
-#define P6_NOP3 ".byte 0x0f,0x1f,0x00\n"
-#define P6_NOP4 ".byte 0x0f,0x1f,0x40,0\n"
-#define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"
-#define P6_NOP6 ".byte 0x66,0x0f,0x1f,0x44,0x00,0\n"
-#define P6_NOP7 ".byte 0x0f,0x1f,0x80,0,0,0,0\n"
-#define P6_NOP8 ".byte 0x0f,0x1f,0x84,0x00,0,0,0,0\n"
-
-#ifdef CONFIG_MK7
-#define ASM_NOP1 K7_NOP1
-#define ASM_NOP2 K7_NOP2
-#define ASM_NOP3 K7_NOP3
-#define ASM_NOP4 K7_NOP4
-#define ASM_NOP5 K7_NOP5
-#define ASM_NOP6 K7_NOP6
-#define ASM_NOP7 K7_NOP7
-#define ASM_NOP8 K7_NOP8
-#elif defined(CONFIG_M686) || defined(CONFIG_MPENTIUMII) || \
- defined(CONFIG_MPENTIUMIII) || defined(CONFIG_MPENTIUMM) || \
- defined(CONFIG_MCORE2) || defined(CONFIG_PENTIUM4) || \
- defined(CONFIG_MPSC)
-#define ASM_NOP1 P6_NOP1
-#define ASM_NOP2 P6_NOP2
-#define ASM_NOP3 P6_NOP3
-#define ASM_NOP4 P6_NOP4
-#define ASM_NOP5 P6_NOP5
-#define ASM_NOP6 P6_NOP6
-#define ASM_NOP7 P6_NOP7
-#define ASM_NOP8 P6_NOP8
-#elif defined(CONFIG_MK8) || defined(CONFIG_X86_64)
-#define ASM_NOP1 K8_NOP1
-#define ASM_NOP2 K8_NOP2
-#define ASM_NOP3 K8_NOP3
-#define ASM_NOP4 K8_NOP4
-#define ASM_NOP5 K8_NOP5
-#define ASM_NOP6 K8_NOP6
-#define ASM_NOP7 K8_NOP7
-#define ASM_NOP8 K8_NOP8
-#else
-#define ASM_NOP1 GENERIC_NOP1
-#define ASM_NOP2 GENERIC_NOP2
-#define ASM_NOP3 GENERIC_NOP3
-#define ASM_NOP4 GENERIC_NOP4
-#define ASM_NOP5 GENERIC_NOP5
-#define ASM_NOP6 GENERIC_NOP6
-#define ASM_NOP7 GENERIC_NOP7
-#define ASM_NOP8 GENERIC_NOP8
-#endif
-
-#define ASM_NOP_MAX 8
-
#define HAVE_ARCH_PICK_MMAP_LAYOUT 1
#define ARCH_HAS_PREFETCHW
#define ARCH_HAS_SPINLOCK_PREFETCH
* [PATCH] [5/20] x86: Introduce nsec_barrier()
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (2 preceding siblings ...)
2008-01-03 0:49 ` [PATCH] [4/20] x86: Move nop declarations into separate include file Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 10:47 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [6/20] x86: Remove get_cycles_sync Andi Kleen
` (15 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
nsec_barrier() is a new barrier primitive that stops RDTSC speculation
to avoid races with timer interrupts on other CPUs.
Add it to all architectures. Except for x86 it is a nop right now.
I only tested x86, but it's a very simple change.
On x86 it expands to either LFENCE (for Intel CPUs) or MFENCE (for
AMD CPUs), which stops RDTSC speculation on all currently known
microarchitectures that implement SSE. On CPUs without SSE there is
generally no RDTSC speculation.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/vsyscall_64.c | 2 ++
include/asm-alpha/barrier.h | 2 ++
include/asm-arm/system.h | 1 +
include/asm-avr32/system.h | 1 +
include/asm-blackfin/system.h | 1 +
include/asm-cris/system.h | 1 +
include/asm-frv/system.h | 1 +
include/asm-h8300/system.h | 1 +
include/asm-ia64/system.h | 1 +
include/asm-m32r/system.h | 1 +
include/asm-m68k/system.h | 1 +
include/asm-m68knommu/system.h | 1 +
include/asm-mips/barrier.h | 2 ++
include/asm-parisc/system.h | 1 +
include/asm-powerpc/system.h | 1 +
include/asm-ppc/system.h | 1 +
include/asm-s390/system.h | 1 +
include/asm-sh/system.h | 1 +
include/asm-sparc/system.h | 1 +
include/asm-sparc64/system.h | 2 ++
include/asm-v850/system.h | 2 ++
include/asm-x86/system.h | 11 +++++++++++
include/asm-xtensa/system.h | 1 +
kernel/time/timekeeping.c | 2 ++
24 files changed, 40 insertions(+)
Index: linux/arch/x86/kernel/vsyscall_64.c
===================================================================
--- linux.orig/arch/x86/kernel/vsyscall_64.c
+++ linux/arch/x86/kernel/vsyscall_64.c
@@ -126,6 +126,7 @@ static __always_inline void do_vgettimeo
cycle_t (*vread)(void);
do {
seq = read_seqbegin(&__vsyscall_gtod_data.lock);
+ nsec_barrier();
vread = __vsyscall_gtod_data.clock.vread;
if (unlikely(!__vsyscall_gtod_data.sysctl_enabled || !vread)) {
@@ -140,6 +141,7 @@ static __always_inline void do_vgettimeo
tv->tv_sec = __vsyscall_gtod_data.wall_time_sec;
nsec = __vsyscall_gtod_data.wall_time_nsec;
+ nsec_barrier();
} while (read_seqretry(&__vsyscall_gtod_data.lock, seq));
/* calculate interval: */
Index: linux/kernel/time/timekeeping.c
===================================================================
--- linux.orig/kernel/time/timekeeping.c
+++ linux/kernel/time/timekeeping.c
@@ -94,10 +94,12 @@ void getnstimeofday(struct timespec *ts)
do {
seq = read_seqbegin(&xtime_lock);
+ nsec_barrier();
*ts = xtime;
nsecs = __get_nsec_offset();
+ nsec_barrier();
} while (read_seqretry(&xtime_lock, seq));
timespec_add_ns(ts, nsecs);
Index: linux/include/asm-x86/system.h
===================================================================
--- linux.orig/include/asm-x86/system.h
+++ linux/include/asm-x86/system.h
@@ -5,6 +5,7 @@
#include <asm/segment.h>
#include <asm/cpufeature.h>
#include <asm/cmpxchg.h>
+#include <asm/nops.h>
#include <linux/kernel.h>
#include <linux/irqflags.h>
@@ -395,5 +396,15 @@ void default_idle(void);
#define set_mb(var, value) do { var = value; barrier(); } while (0)
#endif
+/* Stop RDTSC speculation. This is needed when you need to use RDTSC
+ * (or get_cycles or vread that possibly accesses the TSC) in a defined
+ * code region.
+ * Could use a three-way alternative() for this if there was one.
+ */
+static inline void nsec_barrier(void)
+{
+ alternative(ASM_NOP3, "mfence", X86_FEATURE_MFENCE_RDTSC);
+ alternative(ASM_NOP3, "lfence", X86_FEATURE_LFENCE_RDTSC);
+}
#endif
Index: linux/include/asm-alpha/barrier.h
===================================================================
--- linux.orig/include/asm-alpha/barrier.h
+++ linux/include/asm-alpha/barrier.h
@@ -15,6 +15,8 @@ __asm__ __volatile__("wmb": : :"memory")
#define read_barrier_depends() \
__asm__ __volatile__("mb": : :"memory")
+#define nsec_barrier() barrier()
+
#ifdef CONFIG_SMP
#define smp_mb() mb()
#define smp_rmb() rmb()
Index: linux/include/asm-arm/system.h
===================================================================
--- linux.orig/include/asm-arm/system.h
+++ linux/include/asm-arm/system.h
@@ -191,6 +191,7 @@ extern unsigned int user_debug;
#endif
#define read_barrier_depends() do { } while(0)
#define smp_read_barrier_depends() do { } while(0)
+#define nsec_barrier() barrier()
#define set_mb(var, value) do { var = value; smp_mb(); } while (0)
#define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");
Index: linux/include/asm-avr32/system.h
===================================================================
--- linux.orig/include/asm-avr32/system.h
+++ linux/include/asm-avr32/system.h
@@ -25,6 +25,7 @@
#define wmb() asm volatile("sync 0" : : : "memory")
#define read_barrier_depends() do { } while(0)
#define set_mb(var, value) do { var = value; mb(); } while(0)
+#define nsec_barrier() barrier()
/*
* Help PathFinder and other Nexus-compliant debuggers keep track of
Index: linux/include/asm-blackfin/system.h
===================================================================
--- linux.orig/include/asm-blackfin/system.h
+++ linux/include/asm-blackfin/system.h
@@ -128,6 +128,7 @@ extern unsigned long irq_flags;
#define mb() asm volatile ("" : : :"memory")
#define rmb() asm volatile ("" : : :"memory")
#define wmb() asm volatile ("" : : :"memory")
+#define nsec_barrier() barrier()
#define set_mb(var, value) do { (void) xchg(&var, value); } while (0)
#define read_barrier_depends() do { } while(0)
Index: linux/include/asm-cris/system.h
===================================================================
--- linux.orig/include/asm-cris/system.h
+++ linux/include/asm-cris/system.h
@@ -15,6 +15,7 @@ extern struct task_struct *resume(struct
#define mb() barrier()
#define rmb() mb()
#define wmb() mb()
+#define nsec_barrier() barrier()
#define read_barrier_depends() do { } while(0)
#define set_mb(var, value) do { var = value; mb(); } while (0)
Index: linux/include/asm-frv/system.h
===================================================================
--- linux.orig/include/asm-frv/system.h
+++ linux/include/asm-frv/system.h
@@ -179,6 +179,7 @@ do { \
#define rmb() asm volatile ("membar" : : :"memory")
#define wmb() asm volatile ("membar" : : :"memory")
#define set_mb(var, value) do { var = value; mb(); } while (0)
+#define nsec_barrier() barrier()
#define smp_mb() mb()
#define smp_rmb() rmb()
Index: linux/include/asm-h8300/system.h
===================================================================
--- linux.orig/include/asm-h8300/system.h
+++ linux/include/asm-h8300/system.h
@@ -83,6 +83,7 @@ asmlinkage void resume(void);
#define rmb() asm volatile ("" : : :"memory")
#define wmb() asm volatile ("" : : :"memory")
#define set_mb(var, value) do { xchg(&var, value); } while (0)
+#define nsec_barrier() barrier()
#ifdef CONFIG_SMP
#define smp_mb() mb()
Index: linux/include/asm-ia64/system.h
===================================================================
--- linux.orig/include/asm-ia64/system.h
+++ linux/include/asm-ia64/system.h
@@ -86,6 +86,7 @@ extern struct ia64_boot_param {
#define rmb() mb()
#define wmb() mb()
#define read_barrier_depends() do { } while(0)
+#define nsec_barrier() barrier()
#ifdef CONFIG_SMP
# define smp_mb() mb()
Index: linux/include/asm-m32r/system.h
===================================================================
--- linux.orig/include/asm-m32r/system.h
+++ linux/include/asm-m32r/system.h
@@ -267,6 +267,7 @@ __cmpxchg(volatile void *ptr, unsigned l
#define mb() barrier()
#define rmb() mb()
#define wmb() mb()
+#define nsec_barrier() barrier()
/**
* read_barrier_depends - Flush all pending reads that subsequents reads
Index: linux/include/asm-m68k/system.h
===================================================================
--- linux.orig/include/asm-m68k/system.h
+++ linux/include/asm-m68k/system.h
@@ -56,6 +56,7 @@ asmlinkage void resume(void);
#define wmb() barrier()
#define read_barrier_depends() ((void)0)
#define set_mb(var, value) ({ (var) = (value); wmb(); })
+#define nsec_barrier() barrier()
#define smp_mb() barrier()
#define smp_rmb() barrier()
Index: linux/include/asm-m68knommu/system.h
===================================================================
--- linux.orig/include/asm-m68knommu/system.h
+++ linux/include/asm-m68knommu/system.h
@@ -105,6 +105,7 @@ asmlinkage void resume(void);
#define rmb() asm volatile ("" : : :"memory")
#define wmb() asm volatile ("" : : :"memory")
#define set_mb(var, value) do { xchg(&var, value); } while (0)
+#define nsec_barrier() barrier()
#ifdef CONFIG_SMP
#define smp_mb() mb()
Index: linux/include/asm-mips/barrier.h
===================================================================
--- linux.orig/include/asm-mips/barrier.h
+++ linux/include/asm-mips/barrier.h
@@ -138,4 +138,6 @@
#define smp_llsc_rmb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
#define smp_llsc_wmb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
+#define nsec_barrier() barrier()
+
#endif /* __ASM_BARRIER_H */
Index: linux/include/asm-parisc/system.h
===================================================================
--- linux.orig/include/asm-parisc/system.h
+++ linux/include/asm-parisc/system.h
@@ -130,6 +130,7 @@ static inline void set_eiem(unsigned lon
#define smp_wmb() mb()
#define smp_read_barrier_depends() do { } while(0)
#define read_barrier_depends() do { } while(0)
+#define nsec_barrier() barrier()
#define set_mb(var, value) do { var = value; mb(); } while (0)
Index: linux/include/asm-powerpc/system.h
===================================================================
--- linux.orig/include/asm-powerpc/system.h
+++ linux/include/asm-powerpc/system.h
@@ -36,6 +36,7 @@
#define rmb() __asm__ __volatile__ (__stringify(LWSYNC) : : : "memory")
#define wmb() __asm__ __volatile__ ("sync" : : : "memory")
#define read_barrier_depends() do { } while(0)
+#define nsec_barrier() barrier()
#define set_mb(var, value) do { var = value; mb(); } while (0)
Index: linux/include/asm-ppc/system.h
===================================================================
--- linux.orig/include/asm-ppc/system.h
+++ linux/include/asm-ppc/system.h
@@ -30,6 +30,7 @@
#define rmb() __asm__ __volatile__ ("sync" : : : "memory")
#define wmb() __asm__ __volatile__ ("eieio" : : : "memory")
#define read_barrier_depends() do { } while(0)
+#define nsec_barrier() barrier()
#define set_mb(var, value) do { var = value; mb(); } while (0)
Index: linux/include/asm-s390/system.h
===================================================================
--- linux.orig/include/asm-s390/system.h
+++ linux/include/asm-s390/system.h
@@ -297,6 +297,7 @@ __cmpxchg(volatile void *ptr, unsigned l
#define smp_read_barrier_depends() read_barrier_depends()
#define smp_mb__before_clear_bit() smp_mb()
#define smp_mb__after_clear_bit() smp_mb()
+#define nsec_barrier() barrier()
#define set_mb(var, value) do { var = value; mb(); } while (0)
Index: linux/include/asm-sh/system.h
===================================================================
--- linux.orig/include/asm-sh/system.h
+++ linux/include/asm-sh/system.h
@@ -104,6 +104,7 @@ struct task_struct *__switch_to(struct t
#define ctrl_barrier() __asm__ __volatile__ ("nop;nop;nop;nop;nop;nop;nop;nop")
#define read_barrier_depends() do { } while(0)
#endif
+#define nsec_barrier() barrier()
#ifdef CONFIG_SMP
#define smp_mb() mb()
Index: linux/include/asm-sparc/system.h
===================================================================
--- linux.orig/include/asm-sparc/system.h
+++ linux/include/asm-sparc/system.h
@@ -174,6 +174,7 @@ extern void fpsave(unsigned long *fpregs
#define smp_rmb() __asm__ __volatile__("":::"memory")
#define smp_wmb() __asm__ __volatile__("":::"memory")
#define smp_read_barrier_depends() do { } while(0)
+#define nsec_barrier() barrier()
#define nop() __asm__ __volatile__ ("nop")
Index: linux/include/asm-sparc64/system.h
===================================================================
--- linux.orig/include/asm-sparc64/system.h
+++ linux/include/asm-sparc64/system.h
@@ -74,6 +74,8 @@ do { __asm__ __volatile__("ba,pt %%xcc,
#endif
+#define nsec_barrier() barrier()
+
#define nop() __asm__ __volatile__ ("nop")
#define read_barrier_depends() do { } while(0)
Index: linux/include/asm-v850/system.h
===================================================================
--- linux.orig/include/asm-v850/system.h
+++ linux/include/asm-v850/system.h
@@ -73,6 +73,8 @@ static inline int irqs_disabled (void)
#define smp_wmb() wmb ()
#define smp_read_barrier_depends() read_barrier_depends()
+#define nsec_barrier() barrier()
+
#define xchg(ptr, with) \
((__typeof__ (*(ptr)))__xchg ((unsigned long)(with), (ptr), sizeof (*(ptr))))
Index: linux/include/asm-xtensa/system.h
===================================================================
--- linux.orig/include/asm-xtensa/system.h
+++ linux/include/asm-xtensa/system.h
@@ -89,6 +89,7 @@ static inline void disable_coprocessor(i
#define mb() barrier()
#define rmb() mb()
#define wmb() mb()
+#define nsec_barrier() barrier()
#ifdef CONFIG_SMP
#error smp_* not defined
* Re: [PATCH] [5/20] x86: Introduce nsec_barrier()
2008-01-03 0:50 ` [PATCH] [5/20] x86: Introduce nsec_barrier() Andi Kleen
@ 2008-01-03 10:47 ` Ingo Molnar
2008-01-03 12:55 ` Andi Kleen
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2008-01-03 10:47 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, Thomas Gleixner, H. Peter Anvin
* Andi Kleen <ak@suse.de> wrote:
> nsec_barrier() is a new barrier primitive that stops RDTSC speculation
> to avoid races with timer interrupts on other CPUs.
>
> Add it to all architectures. Except for x86 it is a nop right now. I
> only tested x86, but it's a very simple change.
>
> On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD
> CPUs) which stops RDTSC on all currently known microarchitectures that
> implement SSE. On CPUs without SSE there is generally no RDTSC
> speculation.
i've picked up your rdtsc patches into x86.git but have simplified it:
there's no nsec_barrier() anymore - rdtsc() is always synchronous.
MFENCE/LFENCE is fast enough. Open-coding such barriers almost always
leads to needless trouble. Please check the next x86.git tree.
Ingo
* Re: [PATCH] [5/20] x86: Introduce nsec_barrier()
2008-01-03 10:47 ` Ingo Molnar
@ 2008-01-03 12:55 ` Andi Kleen
2008-01-07 20:01 ` [PATCH] [5/20] x86: Introduce nsec_barrier() II Andi Kleen
0 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 12:55 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, H. Peter Anvin
On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote:
>
> * Andi Kleen <ak@suse.de> wrote:
>
> > nsec_barrier() is a new barrier primitive that stops RDTSC speculation
> > to avoid races with timer interrupts on other CPUs.
> >
> > Add it to all architectures. Except for x86 it is a nop right now. I
> > only tested x86, but it's a very simple change.
> >
> > On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD
> > CPUs) which stops RDTSC on all currently known microarchitectures that
> > implement SSE. On CPUs without SSE there is generally no RDTSC
> > speculation.
>
> i've picked up your rdtsc patches into x86.git but have simplified it:
> there's no nsec_barrier() anymore - rdtsc() is always synchronous.
> MFENCE/LFENCE is fast enough. Open-coding such barriers almost always
> leads to needless trouble. Please check the next x86.git tree.
That's most likely wrong unless you added two barriers -- the barriers
strictly need to be both before and after RDTSC.
I still think having the open-coded barrier is the better approach here.
It's also useful for performance measurements because it allows
a cheap way to measure a specific region with RDTSC.
-Andi
* Re: [PATCH] [5/20] x86: Introduce nsec_barrier() II
2008-01-03 12:55 ` Andi Kleen
@ 2008-01-07 20:01 ` Andi Kleen
0 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-07 20:01 UTC (permalink / raw)
To: mingo, tglx; +Cc: linux-kernel
Andi Kleen <ak@suse.de> writes:
> On Thursday 03 January 2008 11:47:54 Ingo Molnar wrote:
>>
>> * Andi Kleen <ak@suse.de> wrote:
>>
>> > nsec_barrier() is a new barrier primitive that stops RDTSC speculation
>> > to avoid races with timer interrupts on other CPUs.
>> >
>> > Add it to all architectures. Except for x86 it is a nop right now. I
>> > only tested x86, but it's a very simple change.
>> >
>> > On x86 it expands either to LFENCE (for Intel CPUs) or MFENCE (for AMD
>> > CPUs) which stops RDTSC on all currently known microarchitectures that
>> > implement SSE. On CPUs without SSE there is generally no RDTSC
>> > speculation.
>>
>> i've picked up your rdtsc patches into x86.git but have simplified it:
>> there's no nsec_barrier() anymore - rdtsc() is always synchronous.
>> MFENCE/LFENCE is fast enough. Open-coding such barriers almost always
>> leads to needless trouble. Please check the next x86.git tree.
>
> That's most likely wrong unless you added two barriers -- the barriers
> strictly need to be both before and after RDTSC.
I have now checked your patches -- 743abf4d987911af1ffce4c96f06cba6ffaa7e88
and 428f309ba5244fa25b44fcdf1d79aa94c8745cfd in x86.git -- and you did
indeed get it wrong.
The problem is that you inserted the barrier only after the RDTSC, but
the instruction can be speculated both forward and backward, and both
directions can cause inconsistencies if the read leaves the critical section.
The minimal fix would be to change native_read_tsc to

	rdtsc_barrier();
	asm volatile("rdtsc" : EAX_EDX_RET(val, low, high));
	rdtsc_barrier();

(i.e. add the missing barrier)
Better still would be to keep the explicit nsec barriers as in
the original patch, for several reasons:
- There are situations where RDTSC without a barrier is ok, and it might
be useful there.
- It is better to give the CPU some room to run uops in parallel,
especially in very performance critical code like gtod(). A wider
barriered area is faster.
- I actually expect that other architectures with aggressive OOO
implementations (like POWER4/5) should make use of
nsec_barrier() too. I don't think it's really an x86 specific concept.
- Explicit barriers can be useful when measuring performance, although
admittedly there an x86 specific barrier is usually ok. However, forcing
the barrier inside RDTSC is not.
If you insist to keep the incorrect patch please drop my name from it.
-Andi
* [PATCH] [6/20] x86: Remove get_cycles_sync
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (3 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [5/20] x86: Introduce nsec_barrier() Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 0:50 ` [PATCH] [7/20] x86: Remove the now unused X86_FEATURE_SYNC_RDTSC Andi Kleen
` (14 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
Replace it with nsec_barrier() as needed, which has the same effect
of preventing unnecessary speculation around RDTSC.
For the standard gtod()-like calls, the previous patch already added the
necessary barriers.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/time_64.c | 4 +++-
arch/x86/kernel/tsc_64.c | 17 +++++++++++------
arch/x86/kernel/tsc_sync.c | 6 ++++--
include/asm-x86/tsc.h | 44 ++++----------------------------------------
4 files changed, 22 insertions(+), 49 deletions(-)
Index: linux/arch/x86/kernel/time_64.c
===================================================================
--- linux.orig/arch/x86/kernel/time_64.c
+++ linux/arch/x86/kernel/time_64.c
@@ -81,9 +81,11 @@ unsigned long __init native_calculate_cp
wrmsrl(MSR_K7_PERFCTR0 + i, 0);
wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
rdtscl(tsc_start);
+ nsec_barrier();
do {
rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
- tsc_now = get_cycles_sync();
+ nsec_barrier();
+ tsc_now = get_cycles();
} while ((tsc_now - tsc_start) < TICK_COUNT);
local_irq_restore(flags);
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -181,12 +181,14 @@ static unsigned long __init tsc_read_ref
int i;
for (i = 0; i < MAX_RETRIES; i++) {
- t1 = get_cycles_sync();
+ nsec_barrier();
+ t1 = get_cycles();
if (hpet)
*hpet = hpet_readl(HPET_COUNTER) & 0xFFFFFFFF;
else
*pm = acpi_pm_read_early();
- t2 = get_cycles_sync();
+ nsec_barrier();
+ t2 = get_cycles();
if ((t2 - t1) < SMI_TRESHOLD)
return t2;
}
@@ -210,9 +212,11 @@ void __init tsc_calibrate(void)
outb(0xb0, 0x43);
outb((CLOCK_TICK_RATE / (1000 / 50)) & 0xff, 0x42);
outb((CLOCK_TICK_RATE / (1000 / 50)) >> 8, 0x42);
- tr1 = get_cycles_sync();
+ tr1 = get_cycles();
+ nsec_barrier();
while ((inb(0x61) & 0x20) == 0);
- tr2 = get_cycles_sync();
+ nsec_barrier();
+ tr2 = get_cycles();
tsc2 = tsc_read_refs(&pm2, hpet ? &hpet2 : NULL);
@@ -298,15 +302,16 @@ __setup("notsc", notsc_setup);
/* clock source code: */
+/* Caller must do nsec_barrier()s */
static cycle_t read_tsc(void)
{
- cycle_t ret = (cycle_t)get_cycles_sync();
+ cycle_t ret = (cycle_t)get_cycles();
return ret;
}
static cycle_t __vsyscall_fn vread_tsc(void)
{
- cycle_t ret = (cycle_t)vget_cycles_sync();
+ cycle_t ret = (cycle_t)vget_cycles();
return ret;
}
Index: linux/include/asm-x86/tsc.h
===================================================================
--- linux.orig/include/asm-x86/tsc.h
+++ linux/include/asm-x86/tsc.h
@@ -36,61 +36,25 @@ static inline cycles_t get_cycles(void)
return ret;
}
-/* Like get_cycles, but make sure the CPU is synchronized. */
-static __always_inline cycles_t __get_cycles_sync(void)
-{
- unsigned long long ret;
- unsigned eax, edx;
-
- /*
- * Use RDTSCP if possible; it is guaranteed to be synchronous
- * and doesn't cause a VMEXIT on Hypervisors
- */
- alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
- ASM_OUTPUT2("=a" (eax), "=d" (edx)),
- "a" (0U), "d" (0U) : "ecx", "memory");
- ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
- if (ret)
- return ret;
-
- /*
- * Don't do an additional sync on CPUs where we know
- * RDTSC is already synchronous:
- */
- alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
- "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
-
- return 0;
-}
-
-static __always_inline cycles_t get_cycles_sync(void)
-{
- unsigned long long ret;
- ret = __get_cycles_sync();
- if (!ret)
- rdtscll(ret);
- return ret;
-}
-
#ifdef CONFIG_PARAVIRT
/*
* For paravirt guests, some functionalities are executed through function
* pointers in the various pvops structures.
* These function pointers exist inside the kernel and can not
* be accessed by user space. To avoid this, we make a copy of the
- * get_cycles_sync (called in kernel) but force the use of native_read_tsc.
+ * get_cycles (called in kernel) but force the use of native_read_tsc.
* Ideally, the guest should set up it's own clock and vread
*/
-static __always_inline long long vget_cycles_sync(void)
+static __always_inline long long vget_cycles(void)
{
unsigned long long ret;
- ret = __get_cycles_sync();
+ ret = get_cycles();
if (!ret)
ret = native_read_tsc();
return ret;
}
#else
-# define vget_cycles_sync() get_cycles_sync()
+# define vget_cycles() get_cycles()
#endif
extern void tsc_init(void);
Index: linux/arch/x86/kernel/tsc_sync.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_sync.c
+++ linux/arch/x86/kernel/tsc_sync.c
@@ -46,7 +46,8 @@ static __cpuinit void check_tsc_warp(voi
cycles_t start, now, prev, end;
int i;
- start = get_cycles_sync();
+ start = get_cycles();
+ nsec_barrier();
/*
* The measurement runs for 20 msecs:
*/
@@ -61,7 +62,8 @@ static __cpuinit void check_tsc_warp(voi
*/
__raw_spin_lock(&sync_lock);
prev = last_tsc;
- now = get_cycles_sync();
+ nsec_barrier();
+ now = get_cycles();
last_tsc = now;
__raw_spin_unlock(&sync_lock);
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [7/20] x86: Remove the now unused X86_FEATURE_SYNC_RDTSC
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (4 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [6/20] x86: Remove get_cycles_sync Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 0:50 ` [PATCH] [8/20] x86: Make TIF_MCE_NOTIFY optional Andi Kleen
` (13 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
Signed-off-by: Andi Kleen <ak@suse.de>
---
include/asm-x86/cpufeature.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/include/asm-x86/cpufeature.h
===================================================================
--- linux.orig/include/asm-x86/cpufeature.h
+++ linux/include/asm-x86/cpufeature.h
@@ -77,7 +77,7 @@
#define X86_FEATURE_PEBS (3*32+12) /* Precise-Event Based Sampling */
#define X86_FEATURE_BTS (3*32+13) /* Branch Trace Store */
/* 14 free */
-#define X86_FEATURE_SYNC_RDTSC (3*32+15) /* RDTSC synchronizes the CPU */
+/* 15 free */
#define X86_FEATURE_REP_GOOD (3*32+16) /* rep microcode works well on this CPU */
#define X86_FEATURE_MFENCE_RDTSC (3*32+17) /* Mfence synchronizes RDTSC */
#define X86_FEATURE_LFENCE_RDTSC (3*32+18) /* Lfence synchronizes RDTSC */
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [8/20] x86: Make TIF_MCE_NOTIFY optional
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (5 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [7/20] x86: Remove the now unused X86_FEATURE_SYNC_RDTSC Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 0:50 ` [PATCH] [9/20] x86: Don't use oops_begin in 64bit mce code Andi Kleen
` (12 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
So that we don't have to implement it on 32bit.
I would actually like to remove it on 64bit too.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/Kconfig | 5 +++++
arch/x86/kernel/cpu/mcheck/mce_64.c | 16 ++++++++++++++--
2 files changed, 19 insertions(+), 2 deletions(-)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -148,6 +148,13 @@ static void mce_panic(char *msg, struct
panic(msg);
}
+static void mce_notify_userspace(void)
+{
+#ifdef CONFIG_MCE_NOTIFY
+ set_thread_flag(TIF_MCE_NOTIFY);
+#endif
+}
+
static int mce_available(struct cpuinfo_x86 *c)
{
return cpu_has(c, X86_FEATURE_MCE) && cpu_has(c, X86_FEATURE_MCA);
@@ -305,8 +312,7 @@ void do_machine_check(struct pt_regs * r
}
}
- /* notify userspace ASAP */
- set_thread_flag(TIF_MCE_NOTIFY);
+ mce_notify_userspace();
out:
/* the last thing we do is clear state */
@@ -390,7 +396,9 @@ static void mcheck_timer(struct work_str
*/
int mce_notify_user(void)
{
+#ifdef CONFIG_MCE_NOTIFY
clear_thread_flag(TIF_MCE_NOTIFY);
+#endif
if (test_and_clear_bit(0, &notify_user)) {
static unsigned long last_print;
unsigned long now = jiffies;
@@ -410,6 +418,7 @@ int mce_notify_user(void)
return 0;
}
+#ifdef CONFIG_MCE_NOTIFY
/* see if the idle task needs to notify userspace */
static int
mce_idle_callback(struct notifier_block *nfb, unsigned long action, void *junk)
@@ -424,6 +433,7 @@ mce_idle_callback(struct notifier_block
static struct notifier_block mce_idle_notifier = {
.notifier_call = mce_idle_callback,
};
+#endif
static __init int periodic_mcheck_init(void)
{
@@ -431,7 +441,9 @@ static __init int periodic_mcheck_init(v
if (next_interval)
schedule_delayed_work(&mcheck_work,
round_jiffies_relative(next_interval));
+#ifdef CONFIG_MCE_NOTIFY
idle_notifier_register(&mce_idle_notifier);
+#endif
return 0;
}
__initcall(periodic_mcheck_init);
Index: linux/arch/x86/Kconfig
===================================================================
--- linux.orig/arch/x86/Kconfig
+++ linux/arch/x86/Kconfig
@@ -551,6 +551,11 @@ config X86_MCE
to disable it. MCE support simply ignores non-MCE processors like
the 386 and 486, so nearly everyone can say Y here.
+config MCE_NOTIFY
+ bool
+ default y
+ depends on X86_64
+
config X86_MCE_INTEL
def_bool y
prompt "Intel MCE features"
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [9/20] x86: Don't use oops_begin in 64bit mce code
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (6 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [8/20] x86: Make TIF_MCE_NOTIFY optional Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 10:39 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [10/20] i386: Move MWAIT idle check to generic CPU initialization Andi Kleen
` (11 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
It is not really useful to lock machine checks against oopses. And
machine checks normally don't nest, so they don't need their
own locking. Just call bust_spinlock/console_verbose directly.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -133,7 +133,8 @@ static void mce_panic(char *msg, struct
{
int i;
- oops_begin();
+ console_verbose();
+ bust_spinlocks(1);
for (i = 0; i < MCE_LOG_LEN; i++) {
unsigned long tsc = mcelog.entry[i].tsc;
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [10/20] i386: Move MWAIT idle check to generic CPU initialization
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (7 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [9/20] x86: Don't use oops_begin in 64bit mce code Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 10:42 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [11/20] x86: Use the correct cpuid method to detect MWAIT support for C states Andi Kleen
` (10 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
Previously it was only run for Intel CPUs, but AMD Fam10h implements MWAIT too.
This matches 64bit behaviour.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/cpu/intel.c | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)
Index: linux/arch/x86/kernel/cpu/common.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/common.c
+++ linux/arch/x86/kernel/cpu/common.c
@@ -510,6 +510,8 @@ void __cpuinit identify_cpu(struct cpuin
/* Init Machine Check Exception if available. */
mcheck_init(c);
+
+ select_idle_routine(c);
}
void __init identify_boot_cpu(void)
Index: linux/arch/x86/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/intel.c
+++ linux/arch/x86/kernel/cpu/intel.c
@@ -134,7 +134,6 @@ static void __cpuinit init_intel(struct
}
#endif
- select_idle_routine(c);
l2 = init_intel_cacheinfo(c);
if (c->cpuid_level > 9 ) {
unsigned eax = cpuid_eax(10);
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [11/20] x86: Use the correct cpuid method to detect MWAIT support for C states
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (8 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [10/20] i386: Move MWAIT idle check to generic CPU initialization Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 10:45 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [12/20] x86: Use a per cpu timer for correctable machine check checking Andi Kleen
` (9 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: andreas.herrmann3, linux-kernel
Previously there was an AMD-specific quirk to handle the case of
AMD Fam10h MWAIT not supporting any C states. But it turns out
that CPUID already has a way to directly detect that without
using special quirks.
The new code simply checks whether MWAIT supports at least C1 and doesn't
use it if it doesn't. No more vendor specific code.
Credit goes to Ben Serebrin for pointing out the (nearly) obvious.
Cc: "Andreas Herrmann" <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/amd.c | 3 ---
arch/x86/kernel/process_32.c | 10 +++++++++-
arch/x86/kernel/process_64.c | 11 ++++++++++-
arch/x86/kernel/setup_64.c | 4 ----
4 files changed, 19 insertions(+), 9 deletions(-)
Index: linux/arch/x86/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/amd.c
+++ linux/arch/x86/kernel/cpu/amd.c
@@ -295,9 +295,6 @@ static void __cpuinit init_amd(struct cp
local_apic_timer_disabled = 1;
#endif
- if (c->x86 == 0x10 && !force_mwait)
- clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
-
/* K6s reports MCEs but don't actually have all the MSRs */
if (c->x86 < 6)
clear_bit(X86_FEATURE_MCE, c->x86_capability);
Index: linux/arch/x86/kernel/process_32.c
===================================================================
--- linux.orig/arch/x86/kernel/process_32.c
+++ linux/arch/x86/kernel/process_32.c
@@ -271,9 +271,17 @@ static void mwait_idle(void)
mwait_idle_with_hints(0, 0);
}
+static int mwait_usable(const struct cpuinfo_x86 *c)
+{
+ if (force_mwait)
+ return 1;
+ /* Any C1 states supported? */
+ return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
+}
+
void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
{
- if (cpu_has(c, X86_FEATURE_MWAIT)) {
+ if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
printk("monitor/mwait feature present.\n");
/*
* Skip, if setup has overridden idle.
Index: linux/arch/x86/kernel/process_64.c
===================================================================
--- linux.orig/arch/x86/kernel/process_64.c
+++ linux/arch/x86/kernel/process_64.c
@@ -270,10 +270,19 @@ static void mwait_idle(void)
}
}
+
+static int mwait_usable(const struct cpuinfo_x86 *c)
+{
+ if (force_mwait)
+ return 1;
+ /* Any C1 states supported? */
+ return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
+}
+
void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
{
static int printed;
- if (cpu_has(c, X86_FEATURE_MWAIT)) {
+ if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
/*
* Skip, if setup has overridden idle.
* One CPU supports mwait => All CPUs supports mwait
Index: linux/arch/x86/kernel/setup_64.c
===================================================================
--- linux.orig/arch/x86/kernel/setup_64.c
+++ linux/arch/x86/kernel/setup_64.c
@@ -760,10 +760,6 @@ static void __cpuinit init_amd(struct cp
/* MFENCE stops RDTSC speculation */
set_cpu_cap(c, X86_FEATURE_MFENCE_RDTSC);
- /* Family 10 doesn't support C states in MWAIT so don't use it */
- if (c->x86 == 0x10 && !force_mwait)
- clear_cpu_cap(c, X86_FEATURE_MWAIT);
-
if (amd_apic_timer_broken())
disable_apic_timer = 1;
}
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [11/20] x86: Use the correct cpuid method to detect MWAIT support for C states
2008-01-03 0:50 ` [PATCH] [11/20] x86: Use the correct cpuid method to detect MWAIT support for C states Andi Kleen
@ 2008-01-03 10:45 ` Ingo Molnar
2008-01-03 12:53 ` Andi Kleen
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2008-01-03 10:45 UTC (permalink / raw)
To: Andi Kleen
Cc: andreas.herrmann3, linux-kernel, Thomas Gleixner, H. Peter Anvin
* Andi Kleen <ak@suse.de> wrote:
> +static int mwait_usable(const struct cpuinfo_x86 *c)
> +{
> + if (force_mwait)
> + return 1;
> + /* Any C1 states supported? */
> + return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
> +}
> +
> void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
> {
> - if (cpu_has(c, X86_FEATURE_MWAIT)) {
> + if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> printk("monitor/mwait feature present.\n");
hm, why not clear FEATURE_MWAIT if it's "not usable"? That's the
standard approach we do for CPU features that do not work.
Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread* Re: [PATCH] [11/20] x86: Use the correct cpuid method to detect MWAIT support for C states
2008-01-03 10:45 ` Ingo Molnar
@ 2008-01-03 12:53 ` Andi Kleen
0 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 12:53 UTC (permalink / raw)
To: Ingo Molnar
Cc: andreas.herrmann3, linux-kernel, Thomas Gleixner, H. Peter Anvin
On Thursday 03 January 2008 11:45:26 Ingo Molnar wrote:
>
> * Andi Kleen <ak@suse.de> wrote:
>
> > +static int mwait_usable(const struct cpuinfo_x86 *c)
> > +{
> > + if (force_mwait)
> > + return 1;
> > + /* Any C1 states supported? */
> > + return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
> > +}
> > +
> > void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
> > {
> > - if (cpu_has(c, X86_FEATURE_MWAIT)) {
> > + if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> > printk("monitor/mwait feature present.\n");
>
> hm, why not clear FEATURE_MWAIT if it's "not usable"? That's the
> standard approach we do for CPU features that do not work.
Well, it works, just in an unexpected way that is not useful to the kernel.
At least on AMD there is a bit to enable it for ring 3 too, so
in theory someone could use it anyway.
-Andi
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [12/20] x86: Use a per cpu timer for correctable machine check checking
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (9 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [11/20] x86: Use the correct cpuid method to detect MWAIT support for C states Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 10:49 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [13/20] x86: Use a deferrable timer for the correctable machine check poller Andi Kleen
` (8 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
Previously the code used a single timer that then used smp_call_function
to interrupt all CPUs while the original CPU was waiting for them.
But it is better / more real-time and more power friendly to simply run
individual timers on each CPU so they all do this independently.
This way no single CPU has to wait for all the others.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 68 +++++++++++++++++++++++++-----------
1 file changed, 48 insertions(+), 20 deletions(-)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -363,17 +363,14 @@ void mce_log_therm_throt_event(unsigned
static int check_interval = 5 * 60; /* 5 minutes */
static int next_interval; /* in jiffies */
static void mcheck_timer(struct work_struct *work);
-static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
+static DEFINE_PER_CPU(struct delayed_work, mcheck_work);
-static void mcheck_check_cpu(void *info)
+static void mcheck_timer(struct work_struct *work)
{
+ int cpu;
+
if (mce_available(&current_cpu_data))
do_machine_check(NULL, 0);
-}
-
-static void mcheck_timer(struct work_struct *work)
-{
- on_each_cpu(mcheck_check_cpu, NULL, 1, 1);
/*
* Alert userspace if needed. If we logged an MCE, reduce the
@@ -386,7 +383,8 @@ static void mcheck_timer(struct work_str
(int)round_jiffies_relative(check_interval*HZ));
}
- schedule_delayed_work(&mcheck_work, next_interval);
+ cpu = smp_processor_id();
+ schedule_delayed_work_on(cpu, &per_cpu(mcheck_work, cpu), next_interval);
}
/*
@@ -436,12 +434,44 @@ static struct notifier_block mce_idle_no
};
#endif
+static void mce_timers(int restart)
+{
+ int i;
+ next_interval = restart ? check_interval * HZ : 0;
+ for_each_online_cpu (i) {
+ struct delayed_work *w = &per_cpu(mcheck_work, i);
+ cancel_delayed_work_sync(w);
+ if (restart)
+ schedule_delayed_work_on(i, w,
+ round_jiffies_relative(next_interval));
+ }
+}
+
+static int __cpuinit
+mce_periodic_cpu_cb(struct notifier_block *b, unsigned long action, void *arg)
+{
+ long cpu = (long)arg;
+ struct delayed_work *w = &per_cpu(mcheck_work, cpu);
+ switch (action) {
+ case CPU_DOWN_PREPARE:
+ cancel_delayed_work_sync(w);
+ break;
+ case CPU_ONLINE:
+ case CPU_DOWN_FAILED:
+ schedule_delayed_work_on(cpu, w, next_interval);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
static __init int periodic_mcheck_init(void)
{
- next_interval = check_interval * HZ;
- if (next_interval)
- schedule_delayed_work(&mcheck_work,
- round_jiffies_relative(next_interval));
+ /* RED-PEN: race here with CPU getting added in parallel. But
+ * if the hotplug lock is acquired here we run into lock ordering
+ * problems with the scheduler code.
+ */
+ hotcpu_notifier(mce_periodic_cpu_cb, 0);
+ mce_timers(1);
#ifdef CONFIG_MCE_NOTIFY
idle_notifier_register(&mce_idle_notifier);
#endif
@@ -520,12 +550,15 @@ static void __cpuinit mce_cpu_features(s
*/
void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
{
+ int cpu = smp_processor_id();
static cpumask_t mce_cpus = CPU_MASK_NONE;
+ INIT_DELAYED_WORK(&per_cpu(mcheck_work, cpu), mcheck_timer);
+
mce_cpu_quirks(c);
if (mce_dont_init ||
- cpu_test_and_set(smp_processor_id(), mce_cpus) ||
+ cpu_test_and_set(cpu, mce_cpus) ||
!mce_available(c))
return;
@@ -751,14 +784,9 @@ static int mce_resume(struct sys_device
/* Reinit MCEs after user configuration changes */
static void mce_restart(void)
{
- if (next_interval)
- cancel_delayed_work(&mcheck_work);
- /* Timer race is harmless here */
+ mce_timers(0);
on_each_cpu(mce_init, NULL, 1, 1);
- next_interval = check_interval * HZ;
- if (next_interval)
- schedule_delayed_work(&mcheck_work,
- round_jiffies_relative(next_interval));
+ mce_timers(1);
}
static struct sysdev_class mce_sysclass = {
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [12/20] x86: Use a per cpu timer for correctable machine check checking
2008-01-03 0:50 ` [PATCH] [12/20] x86: Use a per cpu timer for correctable machine check checking Andi Kleen
@ 2008-01-03 10:49 ` Ingo Molnar
2008-01-03 12:56 ` Andi Kleen
0 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2008-01-03 10:49 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, Thomas Gleixner, H. Peter Anvin
* Andi Kleen <ak@suse.de> wrote:
> Previously the code used a single timer that then used
> smp_call_function to interrupt all CPUs while the original CPU was
> waiting for them.
>
> But it is better / more real time and more power friendly to simply
> run individual timers on each CPU so they all do this independently.
>
> This way no single CPU has to wait for all others.
i think we should unify this code first and provide it on 32-bit as
well.
Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [12/20] x86: Use a per cpu timer for correctable machine check checking
2008-01-03 10:49 ` Ingo Molnar
@ 2008-01-03 12:56 ` Andi Kleen
0 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 12:56 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel, Thomas Gleixner, H. Peter Anvin
On Thursday 03 January 2008 11:49:56 Ingo Molnar wrote:
>
> * Andi Kleen <ak@suse.de> wrote:
>
> > Previously the code used a single timer that then used
> > smp_call_function to interrupt all CPUs while the original CPU was
> > waiting for them.
> >
> > But it is better / more real time and more power friendly to simply
> > run individual timers on each CPU so they all do this independently.
> >
> > This way no single CPU has to wait for all others.
>
> i think we should unify this code first and provide it on 32-bit as
> well.
That's done in another patch that hasn't been posted yet.
-Andi
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [13/20] x86: Use a deferrable timer for the correctable machine check poller
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (10 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [12/20] x86: Use a per cpu timer for correctable machine check checking Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 0:50 ` [PATCH] [14/20] x86: Add per cpu counters for machine check polls / machine check events Andi Kleen
` (7 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
They are not time critical, and delaying them a little until
the next regular wakeup is no problem.
Also, when a CPU is idle it is unlikely to generate
errors anyway, so it is ok to check only when the CPU
is actually doing something.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -379,8 +379,7 @@ static void mcheck_timer(struct work_str
if (mce_notify_user()) {
next_interval = max(next_interval/2, HZ/100);
} else {
- next_interval = min(next_interval * 2,
- (int)round_jiffies_relative(check_interval*HZ));
+ next_interval = min(next_interval * 2, check_interval*HZ);
}
cpu = smp_processor_id();
@@ -442,8 +441,7 @@ static void mce_timers(int restart)
struct delayed_work *w = &per_cpu(mcheck_work, i);
cancel_delayed_work_sync(w);
if (restart)
- schedule_delayed_work_on(i, w,
- round_jiffies_relative(next_interval));
+ schedule_delayed_work_on(i, w, next_interval);
}
}
@@ -553,7 +551,7 @@ void __cpuinit mcheck_init(struct cpuinf
int cpu = smp_processor_id();
static cpumask_t mce_cpus = CPU_MASK_NONE;
- INIT_DELAYED_WORK(&per_cpu(mcheck_work, cpu), mcheck_timer);
+ INIT_DELAYED_WORK_DEFERRABLE(&per_cpu(mcheck_work, cpu), mcheck_timer);
mce_cpu_quirks(c);
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [14/20] x86: Add per cpu counters for machine check polls / machine check events
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (11 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [13/20] x86: Use a deferrable timer for the correctable machine check poller Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 0:50 ` [PATCH] [15/20] x86: Move X86_FEATURE_CONSTANT_TSC into early cpu feature detection Andi Kleen
` (6 subsequent siblings)
19 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
.. and report them in /proc/interrupts
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 6 ++++++
arch/x86/kernel/irq_32.c | 10 ++++++++++
arch/x86/kernel/irq_64.c | 9 +++++++++
include/asm-x86/mce.h | 3 +++
4 files changed, 28 insertions(+)
Index: linux/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -23,6 +23,7 @@
#include <linux/ctype.h>
#include <linux/kmod.h>
#include <linux/kdebug.h>
+#include <linux/percpu.h>
#include <asm/processor.h>
#include <asm/msr.h>
#include <asm/mce.h>
@@ -57,6 +58,9 @@ static char *trigger_argv[2] = { trigger
static DECLARE_WAIT_QUEUE_HEAD(mce_wait);
+DEFINE_PER_CPU(unsigned, mce_checks);
+DEFINE_PER_CPU(unsigned, mce_events);
+
/*
* Lockless MCE logging infrastructure.
* This avoids deadlocks on printk locks without having to break locks. Also
@@ -208,6 +212,7 @@ void do_machine_check(struct pt_regs * r
memset(&m, 0, sizeof(struct mce));
m.cpu = smp_processor_id();
+ __get_cpu_var(mce_checks)++;
rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
/* if the restart IP is not valid, we're done for */
if (!(m.mcgstatus & MCG_STATUS_RIPV))
@@ -263,6 +268,7 @@ void do_machine_check(struct pt_regs * r
panicm_found = 1;
}
+ __get_cpu_var(mce_events)++;
add_taint(TAINT_MACHINE_CHECK);
}
Index: linux/arch/x86/kernel/irq_32.c
===================================================================
--- linux.orig/arch/x86/kernel/irq_32.c
+++ linux/arch/x86/kernel/irq_32.c
@@ -18,6 +18,7 @@
#include <asm/apic.h>
#include <asm/uaccess.h>
+#include <asm/mce.h>
DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_PER_CPU_SYMBOL(irq_stat);
@@ -329,6 +330,15 @@ skip:
#if defined(CONFIG_X86_IO_APIC)
seq_printf(p, "MIS: %10u\n", atomic_read(&irq_mis_count));
#endif
+ seq_printf(p, "MCE: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", per_cpu(mce_events, j));
+ seq_printf(p, " Machine check events\n");
+ seq_printf(p, "MCP: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", per_cpu(mce_checks, j));
+ seq_printf(p, " Machine check state polls\n");
+
}
return 0;
}
Index: linux/include/asm-x86/mce.h
===================================================================
--- linux.orig/include/asm-x86/mce.h
+++ linux/include/asm-x86/mce.h
@@ -115,6 +115,9 @@ extern void mcheck_init(struct cpuinfo_x
extern void stop_mce(void);
extern void restart_mce(void);
+DECLARE_PER_CPU(unsigned, mce_events);
+DECLARE_PER_CPU(unsigned, mce_checks);
+
#endif /* __KERNEL__ */
#endif
Index: linux/arch/x86/kernel/irq_64.c
===================================================================
--- linux.orig/arch/x86/kernel/irq_64.c
+++ linux/arch/x86/kernel/irq_64.c
@@ -17,6 +17,7 @@
#include <asm/io_apic.h>
#include <asm/idle.h>
#include <asm/smp.h>
+#include <asm/mce.h>
DEFINE_PER_CPU(irq_cpustat_t, irq_stat);
@@ -151,6 +152,14 @@ skip:
seq_printf(p, "%10u ", cpu_pda(j)->irq_spurious_count);
seq_printf(p, " Spurious interrupts\n");
seq_printf(p, "ERR: %10u\n", atomic_read(&irq_err_count));
+ seq_printf(p, "MCE: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", per_cpu(mce_events, j));
+ seq_printf(p, " Machine check events\n");
+ seq_printf(p, "MCP: ");
+ for_each_online_cpu(j)
+ seq_printf(p, "%10u ", per_cpu(mce_checks, j));
+ seq_printf(p, " Machine check state polls\n");
}
return 0;
}
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [15/20] x86: Move X86_FEATURE_CONSTANT_TSC into early cpu feature detection
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (12 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [14/20] x86: Add per cpu counters for machine check polls / machine check events Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 11:03 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [16/20] x86: Allow TSC clock source on AMD Fam10h and some cleanup Andi Kleen
` (5 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
The next patch needs this in time_init(), which runs early.
This includes a minor fix on i386 so that early_intel_workarounds()
[now renamed early_init_intel] really executes early,
as its comments claim.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/cpu/amd.c | 17 +++++++++++------
arch/x86/kernel/cpu/common.c | 11 +++++++++--
arch/x86/kernel/cpu/cpu.h | 3 ++-
arch/x86/kernel/cpu/intel.c | 13 ++++++-------
arch/x86/kernel/setup_64.c | 39 +++++++++++++++++++++++++++++++--------
5 files changed, 59 insertions(+), 24 deletions(-)
Index: linux/arch/x86/kernel/setup_64.c
===================================================================
--- linux.orig/arch/x86/kernel/setup_64.c
+++ linux/arch/x86/kernel/setup_64.c
@@ -553,9 +553,6 @@ static void __cpuinit display_cacheinfo(
printk(KERN_INFO "CPU: L2 Cache: %dK (%d bytes/line)\n",
c->x86_cache_size, ecx & 0xFF);
}
-
- if (n >= 0x80000007)
- cpuid(0x80000007, &dummy, &dummy, &dummy, &c->x86_power);
if (n >= 0x80000008) {
cpuid(0x80000008, &eax, &dummy, &dummy, &dummy);
c->x86_virt_bits = (eax >> 8) & 0xff;
@@ -633,7 +630,7 @@ static void __init amd_detect_cmp(struct
#endif
}
-static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
+static void __cpuinit early_init_amd_mc(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
unsigned bits, ecx;
@@ -691,6 +688,15 @@ static __cpuinit int amd_apic_timer_brok
return 0;
}
+static void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
+{
+ early_init_amd_mc(c);
+
+ /* c->x86_power is 8000_0007 edx. Bit 8 is constant TSC */
+ if (c->x86_power & (1<<8))
+ set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
+}
+
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
unsigned level;
@@ -740,10 +746,6 @@ static void __cpuinit init_amd(struct cp
}
display_cacheinfo(c);
- /* c->x86_power is 8000_0007 edx. Bit 8 is constant TSC */
- if (c->x86_power & (1<<8))
- set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
-
/* Multi core CPU? */
if (c->extended_cpuid_level >= 0x80000008)
amd_detect_cmp(c);
@@ -850,6 +852,13 @@ static void srat_detect_node(void)
#endif
}
+static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
+{
+ if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
+ (c->x86 == 0x6 && c->x86_model >= 0x0e))
+ set_bit(X86_FEATURE_CONSTANT_TSC, &c->x86_capability);
+}
+
static void __cpuinit init_intel(struct cpuinfo_x86 *c)
{
/* Cache sizes */
@@ -1061,6 +1070,20 @@ void __cpuinit identify_cpu(struct cpuin
#ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
#endif
+
+ c->extended_cpuid_level = cpuid_eax(0x80000000);
+
+ if (c->extended_cpuid_level >= 0x80000007)
+ c->x86_power = cpuid_edx(0x80000007);
+
+ switch (c->x86_vendor) {
+ case X86_VENDOR_AMD:
+ early_init_amd(c);
+ break;
+ case X86_VENDOR_INTEL:
+ early_init_intel(c);
+ break;
+ }
}
void __cpuinit print_cpu_info(struct cpuinfo_x86 *c)
Index: linux/arch/x86/kernel/cpu/amd.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/amd.c
+++ linux/arch/x86/kernel/cpu/amd.c
@@ -63,6 +63,15 @@ static __cpuinit int amd_apic_timer_brok
int force_mwait __cpuinitdata;
+void __cpuinit early_init_amd(struct cpuinfo_x86 *c)
+{
+ if (cpuid_eax(0x80000000) >= 0x80000007) {
+ c->x86_power = cpuid_edx(0x80000007);
+ if (c->x86_power & (1<<8))
+ set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
+ }
+}
+
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
{
u32 l, h;
@@ -85,6 +94,8 @@ static void __cpuinit init_amd(struct cp
}
#endif
+ early_init_amd(c);
+
/*
* FIXME: We should handle the K5 here. Set up the write
* range and also turn on MSR 83 bits 4 and 31 (write alloc,
@@ -257,12 +268,6 @@ static void __cpuinit init_amd(struct cp
c->x86_max_cores = (cpuid_ecx(0x80000008) & 0xff) + 1;
}
- if (cpuid_eax(0x80000000) >= 0x80000007) {
- c->x86_power = cpuid_edx(0x80000007);
- if (c->x86_power & (1<<8))
- set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
- }
-
#ifdef CONFIG_X86_HT
/*
* On a AMD multi core setup the lower bits of the APIC id
Index: linux/arch/x86/kernel/cpu/common.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/common.c
+++ linux/arch/x86/kernel/cpu/common.c
@@ -307,6 +307,15 @@ static void __init early_cpu_detect(void
cpu_detect(c);
get_cpu_vendor(c, 1);
+
+ switch (c->x86_vendor) {
+ case X86_VENDOR_AMD:
+ early_init_amd(c);
+ break;
+ case X86_VENDOR_INTEL:
+ early_init_intel(c);
+ break;
+ }
}
static void __cpuinit generic_identify(struct cpuinfo_x86 * c)
@@ -364,8 +373,6 @@ static void __cpuinit generic_identify(s
init_scattered_cpuid_features(c);
}
- early_intel_workaround(c);
-
#ifdef CONFIG_X86_HT
c->phys_proc_id = (cpuid_ebx(1) >> 24) & 0xff;
#endif
Index: linux/arch/x86/kernel/cpu/intel.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/intel.c
+++ linux/arch/x86/kernel/cpu/intel.c
@@ -29,13 +29,14 @@
struct movsl_mask movsl_mask __read_mostly;
#endif
-void __cpuinit early_intel_workaround(struct cpuinfo_x86 *c)
+void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
{
- if (c->x86_vendor != X86_VENDOR_INTEL)
- return;
/* Netburst reports 64 bytes clflush size, but does IO in 128 bytes */
if (c->x86 == 15 && c->x86_cache_alignment == 64)
c->x86_cache_alignment = 128;
+ if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
+ (c->x86 == 0x6 && c->x86_model >= 0x0e))
+ set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
}
/*
@@ -115,6 +116,8 @@ static void __cpuinit init_intel(struct
unsigned int l2 = 0;
char *p = NULL;
+ early_init_intel(c);
+
#ifdef CONFIG_X86_F00F_BUG
/*
* All current models of Pentium and Pentium with MMX technology CPUs
@@ -209,10 +212,6 @@ static void __cpuinit init_intel(struct
}
if (c->x86 == 6)
set_bit(X86_FEATURE_P3, c->x86_capability);
- if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
- (c->x86 == 0x6 && c->x86_model >= 0x0e))
- set_bit(X86_FEATURE_CONSTANT_TSC, c->x86_capability);
-
if (cpu_has_ds) {
unsigned int l1;
rdmsr(MSR_IA32_MISC_ENABLE, l1, l2);
Index: linux/arch/x86/kernel/cpu/cpu.h
===================================================================
--- linux.orig/arch/x86/kernel/cpu/cpu.h
+++ linux/arch/x86/kernel/cpu/cpu.h
@@ -24,5 +24,6 @@ extern struct cpu_dev * cpu_devs [X86_VE
extern int get_model_name(struct cpuinfo_x86 *c);
extern void display_cacheinfo(struct cpuinfo_x86 *c);
-extern void early_intel_workaround(struct cpuinfo_x86 *c);
+extern void early_init_intel(struct cpuinfo_x86 *c);
+extern void early_init_amd(struct cpuinfo_x86 *c);
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [16/20] x86: Allow TSC clock source on AMD Fam10h and some cleanup
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (13 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [15/20] x86: Move X86_FEATURE_CONSTANT_TSC into early cpu feature detection Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-04 8:38 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [17/20] x86: Remove explicit C3 TSC check on 64bit Andi Kleen
` (4 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: lenb, linux-kernel
After a lot of discussion with AMD it turns out that the TSC
on Fam10h CPUs is synchronized when the CONSTANT_TSC cpuid bit is set;
or rather, if there are ever systems where that is not
true, it is the BIOS's task to disable the bit.
So finally use TSC-based gettimeofday() on Fam10h by default.
More precisely, it is now always used on CPUs where the AMD-specific
CONSTANT_TSC bit is set.
This gives a nice speed boost for gettimeofday() on these systems,
which tends to be by far the most common v/syscall.
On a Fam10h system here TSC gtod uses about 20% of the CPU time of
acpi_pm based gtod(). This was measured on 32bit, on 64bit
it is even better because TSC gtod() can use a vsyscall
and stay in ring 3, which acpi_pm doesn't.
The Intel check now simply checks for CONSTANT_TSC too, without hardcoding
the Intel vendor. This is equivalent on 64bit because all 64bit-capable Intel
CPUs will have CONSTANT_TSC set.
On Intel there is currently no CPU-supplied CONSTANT_TSC bit,
but we synthesize one based on hardcoded knowledge of which steppings
have p-state invariant TSC.
So the new logic is: on CPUs which have the AMD-specific
CONSTANT_TSC bit set, or on Intel CPUs which are new enough
to be known to have p-state invariant TSC, always use
TSC-based gettimeofday().
Cc: lenb@kernel.org
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/tsc_32.c | 5 +++++
arch/x86/kernel/tsc_64.c | 5 ++---
2 files changed, 7 insertions(+), 3 deletions(-)
Index: linux/arch/x86/kernel/tsc_32.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_32.c
+++ linux/arch/x86/kernel/tsc_32.c
@@ -354,6 +354,11 @@ __cpuinit int unsynchronized_tsc(void)
{
if (!cpu_has_tsc || tsc_unstable)
return 1;
+
+ /* Anything with constant TSC should be synchronized */
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+ return 0;
+
/*
* Intel systems are normally all synchronized.
* Exceptions must mark TSC as unstable:
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -276,9 +276,8 @@ __cpuinit int unsynchronized_tsc(void)
if (apic_is_clustered_box())
return 1;
#endif
- /* Most intel systems have synchronized TSCs except for
- multi node systems */
- if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
+
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
#ifdef CONFIG_ACPI
/* But TSC doesn't tick in C3 so don't use it there */
if (acpi_gbl_FADT.header.length > 0 &&
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [17/20] x86: Remove explicit C3 TSC check on 64bit
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (14 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [16/20] x86: Allow TSC clock source on AMD Fam10h and some cleanup Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-04 8:38 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [18/20] x86: Don't disable TSC in any C states on AMD Fam10h Andi Kleen
` (3 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: lenb, linux-kernel
Trust the ACPI code to disable the TSC instead when C3 is used.
AMD Fam10h does not disable the TSC in any C state, so after
the change to handle AMD like Intel the check was
incorrect there anyway.
This allows using the TSC when C3 is disabled in software
(acpi.max_c_state=2) even though the BIOS supports it.
This matches i386 behaviour.
Cc: lenb@kernel.org
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/tsc_64.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
Index: linux/arch/x86/kernel/tsc_64.c
===================================================================
--- linux.orig/arch/x86/kernel/tsc_64.c
+++ linux/arch/x86/kernel/tsc_64.c
@@ -277,15 +277,8 @@ __cpuinit int unsynchronized_tsc(void)
return 1;
#endif
- if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
-#ifdef CONFIG_ACPI
- /* But TSC doesn't tick in C3 so don't use it there */
- if (acpi_gbl_FADT.header.length > 0 &&
- acpi_gbl_FADT.C3latency < 1000)
- return 1;
-#endif
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
return 0;
- }
/* Assume multi socket systems are not synchronized */
return num_present_cpus() > 1;
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [18/20] x86: Don't disable TSC in any C states on AMD Fam10h
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (15 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [17/20] x86: Remove explicit C3 TSC check on 64bit Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-04 8:40 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [19/20] x86: Use shorter addresses in i386 segfault printks Andi Kleen
` (2 subsequent siblings)
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: lenb, linux-kernel
The ACPI code currently disables TSC use in the C2 and C3
states. But the AMD Fam10h BKDG documents that the TSC
will never stop in any C state when the CONSTANT_TSC bit is
set. Make this disabling conditional on CONSTANT_TSC
not being set on AMD.
I actually think this is true on Intel too for C2 states
on CPUs with p-state invariant TSC, but this needs
further discussion with Len to really confirm :-)
So far it is only enabled on AMD.
Cc: lenb@kernel.org
Signed-off-by: Andi Kleen <ak@suse.de>
---
drivers/acpi/processor_idle.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
Index: linux/drivers/acpi/processor_idle.c
===================================================================
--- linux.orig/drivers/acpi/processor_idle.c
+++ linux/drivers/acpi/processor_idle.c
@@ -353,6 +353,26 @@ int acpi_processor_resume(struct acpi_de
return 0;
}
+#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)
+static int tsc_halts_in_c(int state)
+{
+ switch (boot_cpu_data.x86_vendor) {
+ case X86_VENDOR_AMD:
+ /*
+ * AMD Fam10h TSC will tick in all
+ * C/P/S0/S1 states when this bit is set.
+ */
+ if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
+ return 0;
+ /*FALL THROUGH*/
+ case X86_VENDOR_INTEL:
+ /* Several cases known where TSC halts in C2 too */
+ default:
+ return state > ACPI_STATE_C1;
+ }
+}
+#endif
+
#ifndef CONFIG_CPU_IDLE
static void acpi_processor_idle(void)
{
@@ -512,7 +532,8 @@ static void acpi_processor_idle(void)
#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)
/* TSC halts in C2, so notify users */
- mark_tsc_unstable("possible TSC halt in C2");
+ if (tsc_halts_in_c(ACPI_STATE_C2))
+ mark_tsc_unstable("possible TSC halt in C2");
#endif
/* Compute time (ticks) that we were actually asleep */
sleep_ticks = ticks_elapsed(t1, t2);
@@ -576,7 +597,8 @@ static void acpi_processor_idle(void)
#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)
/* TSC halts in C3, so notify users */
- mark_tsc_unstable("TSC halts in C3");
+ if (tsc_halts_in_c(ACPI_STATE_C3))
+ mark_tsc_unstable("TSC halts in C3");
#endif
/* Compute time (ticks) that we were actually asleep */
sleep_ticks = ticks_elapsed(t1, t2);
@@ -1441,7 +1463,8 @@ static int acpi_idle_enter_simple(struct
#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)
/* TSC could halt in idle, so notify users */
- mark_tsc_unstable("TSC halts in idle");;
+ if (tsc_halts_in_c(cx->type))
+ mark_tsc_unstable("TSC halts in idle");
#endif
sleep_ticks = ticks_elapsed(t1, t2);
@@ -1552,7 +1575,8 @@ static int acpi_idle_enter_bm(struct cpu
#if defined (CONFIG_GENERIC_TIME) && defined (CONFIG_X86_TSC)
/* TSC could halt in idle, so notify users */
- mark_tsc_unstable("TSC halts in idle");
+ if (tsc_halts_in_c(ACPI_STATE_C3))
+ mark_tsc_unstable("TSC halts in idle");
#endif
sleep_ticks = ticks_elapsed(t1, t2);
/* Tell the scheduler how much we idled: */
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [18/20] x86: Don't disable TSC in any C states on AMD Fam10h
2008-01-03 0:50 ` [PATCH] [18/20] x86: Don't disable TSC in any C states on AMD Fam10h Andi Kleen
@ 2008-01-04 8:40 ` Ingo Molnar
0 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2008-01-04 8:40 UTC (permalink / raw)
To: Andi Kleen; +Cc: lenb, linux-kernel, Thomas Gleixner, H. Peter Anvin
* Andi Kleen <ak@suse.de> wrote:
> The ACPI code currently disables TSC use in any C2 and C3 states. But
> the AMD Fam10h BKDG documents that the TSC will never stop in any C
> states when the CONSTANT_TSC bit is set. Make this disabling
> conditional on CONSTANT_TSC not set on AMD.
>
> I actually think this is true on Intel too for C2 states on CPUs with
> p-state invariant TSC, but this needs further discussions with Len to
> really confirm :-)
>
> So far it is only enabled on AMD.
thanks Andi - i've picked this up for x86.git, to get it tested. Len,
what do you think? If/when you pick it up into the ACPI tree i'll drop
it from x86.git.
Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [19/20] x86: Use shorter addresses in i386 segfault printks
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (16 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [18/20] x86: Don't disable TSC in any C states on AMD Fam10h Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 10:56 ` Ingo Molnar
2008-01-03 0:50 ` [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages Andi Kleen
2008-01-03 9:54 ` [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Ingo Molnar
19 siblings, 1 reply; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/mm/fault_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux/arch/x86/mm/fault_32.c
===================================================================
--- linux.orig/arch/x86/mm/fault_32.c
+++ linux/arch/x86/mm/fault_32.c
@@ -549,7 +549,7 @@ bad_area_nosemaphore:
if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
printk_ratelimit()) {
- printk("%s%s[%d]: segfault at %08lx ip %08lx "
+ printk("%s%s[%d]: segfault at %lx ip %08lx "
"sp %08lx error %lx\n",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address, regs->ip,
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (17 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [19/20] x86: Use shorter addresses in i386 segfault printks Andi Kleen
@ 2008-01-03 0:50 ` Andi Kleen
2008-01-03 6:28 ` Eric Dumazet
2008-01-03 11:00 ` Ingo Molnar
2008-01-03 9:54 ` [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Ingo Molnar
19 siblings, 2 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 0:50 UTC (permalink / raw)
To: linux-kernel
They now look like
hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30 error 4 in libacl.so.1.1.0[2b9c8caea000+6000]
This makes it easier to pinpoint bugs to specific libraries.
And printing the offset into the mapping also always makes it possible
to find the correct fault point in a library even with randomized
mappings. Previously there was no way to actually find the correct
code address inside the randomized mapping.
This relies on the earlier patch that shortens the printk formats.
The messages are now often longer than 80 characters, but I think
that's worth it.
Patch for i386 and x86-64.
Signed-off-by: Andi Kleen <ak@suse.de>
---
arch/x86/kernel/signal_32.c | 7 +++++--
arch/x86/kernel/signal_64.c | 7 +++++--
arch/x86/kernel/traps_32.c | 7 +++++--
arch/x86/mm/fault_32.c | 4 +++-
include/linux/mm.h | 1 +
mm/memory.c | 27 +++++++++++++++++++++++++++
6 files changed, 46 insertions(+), 7 deletions(-)
Index: linux/include/linux/mm.h
===================================================================
--- linux.orig/include/linux/mm.h
+++ linux/include/linux/mm.h
@@ -1145,6 +1145,7 @@ extern int randomize_va_space;
#endif
const char * arch_vma_name(struct vm_area_struct *vma);
+void print_vma_addr(char *prefix, unsigned long rip);
struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c
+++ linux/mm/memory.c
@@ -2746,3 +2746,30 @@ int access_process_vm(struct task_struct
return buf - old_buf;
}
+
+/*
+ * Print the name of a VMA.
+ */
+void print_vma_addr(char *prefix, unsigned long ip)
+{
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, ip);
+ if (vma && vma->vm_file) {
+ struct file *f = vma->vm_file;
+ char *buf = (char *)__get_free_page(GFP_KERNEL);
+ if (buf) {
+ char *p, *s;
+ p = d_path(f->f_dentry, f->f_vfsmnt, buf, PAGE_SIZE);
+ s = strrchr(p, '/');
+ if (s)
+ p = s+1;
+ printk("%s%s[%lx+%lx]", prefix, p,
+ vma->vm_start,
+ vma->vm_end - vma->vm_start);
+ free_page((unsigned long)buf);
+ }
+ }
+ up_read(&current->mm->mmap_sem);
+}
Index: linux/arch/x86/kernel/signal_32.c
===================================================================
--- linux.orig/arch/x86/kernel/signal_32.c
+++ linux/arch/x86/kernel/signal_32.c
@@ -198,12 +198,15 @@ asmlinkage int sys_sigreturn(unsigned lo
return ax;
badframe:
- if (show_unhandled_signals && printk_ratelimit())
+ if (show_unhandled_signals && printk_ratelimit()) {
printk("%s%s[%d] bad frame in sigreturn frame:%p ip:%lx"
- " sp:%lx oeax:%lx\n",
+ " sp:%lx oeax:%lx",
task_pid_nr(current) > 1 ? KERN_INFO : KERN_EMERG,
current->comm, task_pid_nr(current), frame, regs->ip,
regs->sp, regs->orig_ax);
+ print_vma_addr(" in ", regs->ip);
+ printk("\n");
+ }
force_sig(SIGSEGV, current);
return 0;
Index: linux/arch/x86/kernel/signal_64.c
===================================================================
--- linux.orig/arch/x86/kernel/signal_64.c
+++ linux/arch/x86/kernel/signal_64.c
@@ -481,9 +481,12 @@ do_notify_resume(struct pt_regs *regs, v
void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
{
struct task_struct *me = current;
- if (show_unhandled_signals && printk_ratelimit())
- printk("%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx\n",
+ if (show_unhandled_signals && printk_ratelimit()) {
+ printk("%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx",
me->comm,me->pid,where,frame,regs->ip,regs->sp,regs->orig_ax);
+ print_vma_addr(" in ", regs->ip);
+ printk("\n");
+ }
force_sig(SIGSEGV, me);
}
Index: linux/arch/x86/kernel/traps_32.c
===================================================================
--- linux.orig/arch/x86/kernel/traps_32.c
+++ linux/arch/x86/kernel/traps_32.c
@@ -673,11 +673,14 @@ void __kprobes do_general_protection(str
current->thread.error_code = error_code;
current->thread.trap_no = 13;
if (show_unhandled_signals && unhandled_signal(current, SIGSEGV) &&
- printk_ratelimit())
+ printk_ratelimit()) {
printk(KERN_INFO
- "%s[%d] general protection ip:%lx sp:%lx error:%lx\n",
+ "%s[%d] general protection ip:%lx sp:%lx error:%lx",
current->comm, task_pid_nr(current),
regs->ip, regs->sp, error_code);
+ print_vma_addr(" in ", regs->ip);
+ printk("\n");
+ }
force_sig(SIGSEGV, current);
return;
Index: linux/arch/x86/mm/fault_32.c
===================================================================
--- linux.orig/arch/x86/mm/fault_32.c
+++ linux/arch/x86/mm/fault_32.c
@@ -550,10 +550,12 @@ bad_area_nosemaphore:
if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
printk_ratelimit()) {
printk("%s%s[%d]: segfault at %lx ip %08lx "
- "sp %08lx error %lx\n",
+ "sp %08lx error %lx",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), address, regs->ip,
regs->sp, error_code);
+ print_vma_addr(" in ", regs->ip);
+ printk("\n");
}
tsk->thread.cr2 = address;
/* Kernel addresses are always protection faults */
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages
2008-01-03 0:50 ` [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages Andi Kleen
@ 2008-01-03 6:28 ` Eric Dumazet
2008-01-03 11:00 ` Ingo Molnar
1 sibling, 0 replies; 41+ messages in thread
From: Eric Dumazet @ 2008-01-03 6:28 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
Andi Kleen wrote:
> They now look like
>
> hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30 error 4 in libacl.so.1.1.0[2b9c8caea000+6000]
>
> This makes it easier to pinpoint bugs to specific libraries.
>
> And printing the offset into a mapping also always allows to find the
> correct fault point in a library even with randomized mappings. Previously
> there was no way to actually find the correct code address inside
> the randomized mapping.
>
> Relies on earlier patch to shorten the printk formats.
>
> They are often now longer than 80 characters, but I think that's worth
> it.
>
> Patch for i386 and x86-64.
>
> Signed-off-by: Andi Kleen <ak@suse.de>
>
> ---
> arch/x86/kernel/signal_32.c | 7 +++++--
> arch/x86/kernel/signal_64.c | 7 +++++--
> arch/x86/kernel/traps_32.c | 7 +++++--
> arch/x86/mm/fault_32.c | 4 +++-
> include/linux/mm.h | 1 +
> mm/memory.c | 27 +++++++++++++++++++++++++++
> 6 files changed, 46 insertions(+), 7 deletions(-)
>
> Index: linux/include/linux/mm.h
> ===================================================================
> --- linux.orig/include/linux/mm.h
> +++ linux/include/linux/mm.h
> @@ -1145,6 +1145,7 @@ extern int randomize_va_space;
> #endif
>
> const char * arch_vma_name(struct vm_area_struct *vma);
> +void print_vma_addr(char *prefix, unsigned long rip);
>
> struct page *sparse_mem_map_populate(unsigned long pnum, int nid);
> pgd_t *vmemmap_pgd_populate(unsigned long addr, int node);
> Index: linux/mm/memory.c
> ===================================================================
> --- linux.orig/mm/memory.c
> +++ linux/mm/memory.c
> @@ -2746,3 +2746,30 @@ int access_process_vm(struct task_struct
>
> return buf - old_buf;
> }
> +
> +/*
> + * Print the name of a VMA.
> + */
> +void print_vma_addr(char *prefix, unsigned long ip)
> +{
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> + down_read(&mm->mmap_sem);
> + vma = find_vma(mm, ip);
> + if (vma && vma->vm_file) {
> + struct file *f = vma->vm_file;
> + char *buf = (char *)__get_free_page(GFP_KERNEL);
> + if (buf) {
> + char *p, *s;
> + p = d_path(f->f_dentry, f->f_vfsmnt, buf, PAGE_SIZE);
d_path() can return an error. You should add:
if (IS_ERR(p))
p = "?";
> + s = strrchr(p, '/');
> + if (s)
> + p = s+1;
> + printk("%s%s[%lx+%lx]", prefix, p,
> + vma->vm_start,
> + vma->vm_end - vma->vm_start);
> + free_page((unsigned long)buf);
> + }
> + }
> + up_read(&current->mm->mmap_sem);
> +}
Thank you
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages
2008-01-03 0:50 ` [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages Andi Kleen
2008-01-03 6:28 ` Eric Dumazet
@ 2008-01-03 11:00 ` Ingo Molnar
2008-01-03 13:06 ` Andi Kleen
1 sibling, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2008-01-03 11:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
* Andi Kleen <ak@suse.de> wrote:
> They now look like
>
> hal-resmgr[13791]: segfault at 3c rip 2b9c8caec182 rsp 7fff1e825d30
> error 4 in libacl.so.1.1.0[2b9c8caea000+6000]
>
> This makes it easier to pinpoint bugs to specific libraries.
yep, that's really useful.
I think the patch needs one more iteration though:
> And printing the offset into a mapping also always allows to find the
> correct fault point in a library even with randomized mappings. Previously
> there was no way to actually find the correct code address inside
> the randomized mapping.
>
> Relies on earlier patch to shorten the printk formats.
>
> They are often now longer than 80 characters, but I think that's worth
> it.
why not make it multi-line? that way the %lx hack wouldn't be needed
either.
> +void print_vma_addr(char *prefix, unsigned long ip)
> +{
> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> + down_read(&mm->mmap_sem);
> + vma = find_vma(mm, ip);
grumble. Proper CodingStyle please.
> + if (buf) {
> + char *p, *s;
> + p = d_path(f->f_dentry, f->f_vfsmnt, buf, PAGE_SIZE);
this one too.
> + if (show_unhandled_signals && printk_ratelimit()) {
> + printk("%s[%d] bad frame in %s frame:%p ip:%lx sp:%lx orax:%lx",
> me->comm,me->pid,where,frame,regs->ip,regs->sp,regs->orig_ax);
and this.
Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages
2008-01-03 11:00 ` Ingo Molnar
@ 2008-01-03 13:06 ` Andi Kleen
0 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 13:06 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
> > And printing the offset into a mapping also always allows to find the
> > correct fault point in a library even with randomized mappings. Previously
> > there was no way to actually find the correct code address inside
> > the randomized mapping.
> >
> > Relies on earlier patch to shorten the printk formats.
> >
> > They are often now longer than 80 characters, but I think that's worth
> > it.
>
> why not make it multi-line? that way the %lx hack wouldnt be needed
> either.
I prefer it single-line. I also disagree on %lx being a hack.
>
> > +void print_vma_addr(char *prefix, unsigned long ip)
> > +{
> > + struct mm_struct *mm = current->mm;
> > + struct vm_area_struct *vma;
> > + down_read(&mm->mmap_sem);
> > + vma = find_vma(mm, ip);
>
> grumble. Proper CodingStyle please.
Looks fine to me. If you mean the new line after variables -- that was always optional.
Anyway, I'll repost with the error check.
Also it seems you applied only parts of the patchkit. If you do that, can
you send a list of the patches you didn't add? Otherwise it'll be messy to
figure this out from here.
-Andi
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code
2008-01-03 0:49 [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Andi Kleen
` (18 preceding siblings ...)
2008-01-03 0:50 ` [PATCH] [20/20] x86: Print which shared library/executable faulted in segfault etc. messages Andi Kleen
@ 2008-01-03 9:54 ` Ingo Molnar
2008-01-03 12:57 ` Andi Kleen
19 siblings, 1 reply; 41+ messages in thread
From: Ingo Molnar @ 2008-01-03 9:54 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
* Andi Kleen <ak@suse.de> wrote:
> Index: linux/include/asm-x86/ptrace-abi.h
> ===================================================================
> --- linux.orig/include/asm-x86/ptrace-abi.h
> +++ linux/include/asm-x86/ptrace-abi.h
> @@ -80,6 +80,7 @@
>
> #define PTRACE_SINGLEBLOCK 33 /* resume execution until next branch */
>
> +#ifndef __ASSEMBLY__
hm, this patch misses a rationale - what assembly code includes
ptrace-abi.h directly or indirectly? Did you see any build breakage with
x86.git that requires this? (if yes then please send me the .config)
Ingo
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code
2008-01-03 9:54 ` [PATCH] [1/20] x86: Make ptrace.h safe to include from assembler code Ingo Molnar
@ 2008-01-03 12:57 ` Andi Kleen
0 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2008-01-03 12:57 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel
On Thursday 03 January 2008 10:54:52 Ingo Molnar wrote:
>
> * Andi Kleen <ak@suse.de> wrote:
>
> > Index: linux/include/asm-x86/ptrace-abi.h
> > ===================================================================
> > --- linux.orig/include/asm-x86/ptrace-abi.h
> > +++ linux/include/asm-x86/ptrace-abi.h
> > @@ -80,6 +80,7 @@
> >
> > #define PTRACE_SINGLEBLOCK 33 /* resume execution until next branch */
> >
> > +#ifndef __ASSEMBLY__
>
> hm, this patch misses a rationale - what assembly code includes
> ptrace-abi.h directly or indirectly? Did you see any build breakage with
> x86.git that requires this? (if yes then please send me the .config)
It's needed for the dwarf2 unwinder, but imho useful on its own.
-Andi
^ permalink raw reply [flat|nested] 41+ messages in thread