* [PATCH v2 0/2] x86: disable IRQs during CR4 changes
@ 2017-11-25 3:29 Nadav Amit
2017-11-25 3:29 ` [PATCH v2 1/2] x86: refactor CR4 setting and shadow write Nadav Amit
2017-11-25 3:29 ` [PATCH v2 2/2] x86: disable IRQs before changing CR4 Nadav Amit
0 siblings, 2 replies; 9+ messages in thread
From: Nadav Amit @ 2017-11-25 3:29 UTC (permalink / raw)
To: linux-kernel, linux-edac
Cc: nadav.amit, Nadav Amit, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, Tony Luck, Borislav Petkov,
Paolo Bonzini, Radim Krčmář
CR4 needs to be updated atomically with its shadow value, as CR4 updates are
performed in a read-modify-write fashion which is based on the shadow value. If
CR4 is changed between the read and the write, CR4 might not be updated
correctly.
For this to happen, CR4 needs to be rewritten by an interrupt handler.
[Presumably, writes to CR4 take place while preemption is disabled, although
due to the experience with CR3 - who knows.] CR4.PGE can be updated by an
interrupt handler, but it is restored to its previous value, so it should not
introduce a race. However, allowing CR4 updates without disabling IRQs leaves
the door open for future bugs.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
v1 -> v2:
- Break into two patches (Andy)
- Rename refactored function to __cr4_set() (Andy)
Nadav Amit (2):
x86: refactor CR4 setting and shadow write
x86: disable IRQs before changing CR4
arch/x86/include/asm/mmu_context.h | 4 ++--
arch/x86/include/asm/tlbflush.h | 40 +++++++++++++++++++++---------------
arch/x86/include/asm/virtext.h | 2 +-
arch/x86/kernel/cpu/common.c | 38 +++++++++++++++++++++++++---------
arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
arch/x86/kernel/cpu/mcheck/p5.c | 6 +++++-
arch/x86/kernel/cpu/mcheck/winchip.c | 5 ++++-
arch/x86/kernel/fpu/init.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 4 ++--
arch/x86/kernel/process.c | 20 +++++++++++++-----
arch/x86/kernel/reboot.c | 2 +-
arch/x86/kvm/vmx.c | 13 ++++++++++--
arch/x86/mm/init.c | 6 +++++-
13 files changed, 102 insertions(+), 45 deletions(-)
--
2.14.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 1/2] x86: refactor CR4 setting and shadow write
2017-11-25 3:29 [PATCH v2 0/2] x86: disable IRQs during CR4 changes Nadav Amit
@ 2017-11-25 3:29 ` Nadav Amit
2017-11-25 12:37 ` [tip:x86/urgent] x86/tlb: Refactor " tip-bot for Nadav Amit
2017-11-25 3:29 ` [PATCH v2 2/2] x86: disable IRQs before changing CR4 Nadav Amit
1 sibling, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2017-11-25 3:29 UTC (permalink / raw)
To: linux-kernel, linux-edac
Cc: nadav.amit, Nadav Amit, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86
Refactor the write to CR4 and its shadow value. This is done in
preparation for the addition of an assertion to check that IRQs are
disabled during CR4 update.
No functional change.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
arch/x86/include/asm/tlbflush.h | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 509046cfa5ce..e736f7f0ba92 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -173,17 +173,20 @@ static inline void cr4_init_shadow(void)
this_cpu_write(cpu_tlbstate.cr4, __read_cr4());
}
+static inline void __cr4_set(unsigned long cr4)
+{
+ this_cpu_write(cpu_tlbstate.cr4, cr4);
+ __write_cr4(cr4);
+}
+
/* Set in this cpu's CR4. */
static inline void cr4_set_bits(unsigned long mask)
{
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- if ((cr4 | mask) != cr4) {
- cr4 |= mask;
- this_cpu_write(cpu_tlbstate.cr4, cr4);
- __write_cr4(cr4);
- }
+ if ((cr4 | mask) != cr4)
+ __cr4_set(cr4 | mask);
}
/* Clear in this cpu's CR4. */
@@ -192,11 +195,8 @@ static inline void cr4_clear_bits(unsigned long mask)
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- if ((cr4 & ~mask) != cr4) {
- cr4 &= ~mask;
- this_cpu_write(cpu_tlbstate.cr4, cr4);
- __write_cr4(cr4);
- }
+ if ((cr4 & ~mask) != cr4)
+ __cr4_set(cr4 & ~mask);
}
static inline void cr4_toggle_bits(unsigned long mask)
@@ -204,9 +204,7 @@ static inline void cr4_toggle_bits(unsigned long mask)
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- cr4 ^= mask;
- this_cpu_write(cpu_tlbstate.cr4, cr4);
- __write_cr4(cr4);
+ __cr4_set(cr4 ^ mask);
}
/* Read the CR4 shadow. */
--
2.14.1
* [PATCH v2 2/2] x86: disable IRQs before changing CR4
2017-11-25 3:29 [PATCH v2 0/2] x86: disable IRQs during CR4 changes Nadav Amit
2017-11-25 3:29 ` [PATCH v2 1/2] x86: refactor CR4 setting and shadow write Nadav Amit
@ 2017-11-25 3:29 ` Nadav Amit
2017-11-25 10:36 ` Thomas Gleixner
2017-11-25 12:38 ` [tip:x86/urgent] x86/tlb: Disable interrupts when " tip-bot for Nadav Amit
1 sibling, 2 replies; 9+ messages in thread
From: Nadav Amit @ 2017-11-25 3:29 UTC (permalink / raw)
To: linux-kernel, linux-edac
Cc: nadav.amit, Nadav Amit, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, x86, Tony Luck, Borislav Petkov,
Paolo Bonzini, Radim Krčmář
CR4 changes need to be performed while IRQs are disabled in order to
update the CR4 shadow and the actual register atomically. In fact, this
is needed regardless of CR4 shadowing, since CR4 changes are performed
in a read-modify-write manner.
Currently, however, this is not the case, as can be observed by adding
warnings when CR4 updates are performed while interrupts are enabled. It
also appears that CR4 changes with interrupts enabled can be triggered
by the user (PR_SET_TSC).
If CR4 updates are done while interrupts are enabled, an interrupt can
be delivered between the CR4 read and the corresponding write of the
modified value. If the interrupt handler changes CR4 in that window, its
change is lost when the modified value is written back.
It is not clear whether there are currently interrupt handlers that
modify CR4 and do not immediately restore the original value, but there
is an interrupt handler that changes CR4 when global PTEs are
invalidated. Moreover, a recent patch considered changing CR4 inside an
interrupt handler, emphasizing that the current scheme is error-prone.
Prevent the issue by adding a debug warning if CR4 is updated with
interrupts enabled, renaming the CR4 manipulation functions to reflect
the fact that they need IRQs disabled, and fixing the callers.
Note that in some cases, e.g., kvm_cpu_vmxon(), it appears that saving
and restoring the IRQs could have been avoided. Yet, since these calls
do not appear to be on a hot path, the IRQs are saved and restored to be
on the safe side.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
arch/x86/include/asm/mmu_context.h | 4 ++--
arch/x86/include/asm/tlbflush.h | 16 +++++++++++----
arch/x86/include/asm/virtext.h | 2 +-
arch/x86/kernel/cpu/common.c | 38 ++++++++++++++++++++++++++----------
arch/x86/kernel/cpu/mcheck/mce.c | 5 ++++-
arch/x86/kernel/cpu/mcheck/p5.c | 6 +++++-
arch/x86/kernel/cpu/mcheck/winchip.c | 5 ++++-
arch/x86/kernel/fpu/init.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 4 ++--
arch/x86/kernel/process.c | 20 ++++++++++++++-----
arch/x86/kernel/reboot.c | 2 +-
arch/x86/kvm/vmx.c | 13 ++++++++++--
arch/x86/mm/init.c | 6 +++++-
13 files changed, 91 insertions(+), 32 deletions(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 6699fc441644..637cbda77eda 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -30,9 +30,9 @@ static inline void load_mm_cr4(struct mm_struct *mm)
{
if (static_key_false(&rdpmc_always_available) ||
atomic_read(&mm->context.perf_rdpmc_allowed))
- cr4_set_bits(X86_CR4_PCE);
+ cr4_set_bits_irqs_off(X86_CR4_PCE);
else
- cr4_clear_bits(X86_CR4_PCE);
+ cr4_clear_bits_irqs_off(X86_CR4_PCE);
}
#else
static inline void load_mm_cr4(struct mm_struct *mm) {}
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index e736f7f0ba92..6b94dcf8c500 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -175,12 +175,15 @@ static inline void cr4_init_shadow(void)
static inline void __cr4_set(unsigned long cr4)
{
+#ifdef CONFIG_LOCKDEP
+ WARN_ON(!irqs_disabled());
+#endif
this_cpu_write(cpu_tlbstate.cr4, cr4);
__write_cr4(cr4);
}
/* Set in this cpu's CR4. */
-static inline void cr4_set_bits(unsigned long mask)
+static inline void cr4_set_bits_irqs_off(unsigned long mask)
{
unsigned long cr4;
@@ -190,7 +193,7 @@ static inline void cr4_set_bits(unsigned long mask)
}
/* Clear in this cpu's CR4. */
-static inline void cr4_clear_bits(unsigned long mask)
+static inline void cr4_clear_bits_irqs_off(unsigned long mask)
{
unsigned long cr4;
@@ -199,7 +202,7 @@ static inline void cr4_clear_bits(unsigned long mask)
__cr4_set(cr4 & ~mask);
}
-static inline void cr4_toggle_bits(unsigned long mask)
+static inline void cr4_toggle_bits_irqs_off(unsigned long mask)
{
unsigned long cr4;
@@ -224,10 +227,15 @@ extern u32 *trampoline_cr4_features;
static inline void cr4_set_bits_and_update_boot(unsigned long mask)
{
+ unsigned long flags;
+
mmu_cr4_features |= mask;
if (trampoline_cr4_features)
*trampoline_cr4_features = mmu_cr4_features;
- cr4_set_bits(mask);
+
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(mask);
+ local_irq_restore(flags);
}
extern void initialize_tlbstate_and_flush(void);
diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 0116b2ee9e64..b403ca417b7d 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -41,7 +41,7 @@ static inline int cpu_has_vmx(void)
static inline void cpu_vmxoff(void)
{
asm volatile (ASM_VMX_VMXOFF : : : "cc");
- cr4_clear_bits(X86_CR4_VMXE);
+ cr4_clear_bits_irqs_off(X86_CR4_VMXE);
}
static inline int cpu_vmx_enabled(void)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c9176bae7fd8..f31890787d01 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -302,8 +302,14 @@ __setup("nosmep", setup_disable_smep);
static __always_inline void setup_smep(struct cpuinfo_x86 *c)
{
- if (cpu_has(c, X86_FEATURE_SMEP))
- cr4_set_bits(X86_CR4_SMEP);
+ unsigned long flags;
+
+ if (!cpu_has(c, X86_FEATURE_SMEP))
+ return;
+
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_SMEP);
+ local_irq_restore(flags);
}
static __init int setup_disable_smap(char *arg)
@@ -320,13 +326,16 @@ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
/* This should have been cleared long ago */
BUG_ON(eflags & X86_EFLAGS_AC);
- if (cpu_has(c, X86_FEATURE_SMAP)) {
+ if (!cpu_has(c, X86_FEATURE_SMAP))
+ return;
+
+ local_irq_save(eflags);
#ifdef CONFIG_X86_SMAP
- cr4_set_bits(X86_CR4_SMAP);
+ cr4_set_bits_irqs_off(X86_CR4_SMAP);
#else
- cr4_clear_bits(X86_CR4_SMAP);
+ cr4_clear_bits_irqs_off(X86_CR4_SMAP);
#endif
- }
+ local_irq_restore(eflags);
}
/*
@@ -336,6 +345,8 @@ static bool pku_disabled;
static __always_inline void setup_pku(struct cpuinfo_x86 *c)
{
+ unsigned long flags;
+
/* check the boot processor, plus compile options for PKU: */
if (!cpu_feature_enabled(X86_FEATURE_PKU))
return;
@@ -345,7 +356,10 @@ static __always_inline void setup_pku(struct cpuinfo_x86 *c)
if (pku_disabled)
return;
- cr4_set_bits(X86_CR4_PKE);
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_PKE);
+ local_irq_restore(flags);
+
/*
* Seting X86_CR4_PKE will cause the X86_FEATURE_OSPKE
* cpuid bit to be set. We need to ensure that we
@@ -1147,7 +1161,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP */
+ /*
+ * Set up SMEP/SMAP. Disable interrupts to prevent triggering a warning
+ * as CR4 changes must be done with disabled interrupts.
+ */
setup_smep(c);
setup_smap(c);
@@ -1520,7 +1537,7 @@ void cpu_init(void)
pr_debug("Initializing CPU#%d\n", cpu);
- cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
+ cr4_clear_bits_irqs_off(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
/*
* Initialize the per-CPU GDT with the boot GDT,
@@ -1613,7 +1630,8 @@ void cpu_init(void)
if (cpu_feature_enabled(X86_FEATURE_VME) ||
boot_cpu_has(X86_FEATURE_TSC) ||
boot_cpu_has(X86_FEATURE_DE))
- cr4_clear_bits(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
+ cr4_clear_bits_irqs_off(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|
+ X86_CR4_DE);
load_current_idt();
switch_to_new_gdt(cpu);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3b413065c613..1aac196eb145 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1508,6 +1508,7 @@ static void __mcheck_cpu_init_generic(void)
{
enum mcp_flags m_fl = 0;
mce_banks_t all_banks;
+ unsigned long flags;
u64 cap;
if (!mca_cfg.bootlog)
@@ -1519,7 +1520,9 @@ static void __mcheck_cpu_init_generic(void)
bitmap_fill(all_banks, MAX_NR_BANKS);
machine_check_poll(MCP_UC | m_fl, &all_banks);
- cr4_set_bits(X86_CR4_MCE);
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_MCE);
+ local_irq_restore(flags);
rdmsrl(MSR_IA32_MCG_CAP, cap);
if (cap & MCG_CTL_P)
diff --git a/arch/x86/kernel/cpu/mcheck/p5.c b/arch/x86/kernel/cpu/mcheck/p5.c
index 5cddf831720f..e4bb9573cfe5 100644
--- a/arch/x86/kernel/cpu/mcheck/p5.c
+++ b/arch/x86/kernel/cpu/mcheck/p5.c
@@ -43,6 +43,7 @@ static void pentium_machine_check(struct pt_regs *regs, long error_code)
/* Set up machine check reporting for processors with Intel style MCE: */
void intel_p5_mcheck_init(struct cpuinfo_x86 *c)
{
+ unsigned long flags;
u32 l, h;
/* Default P5 to off as its often misconnected: */
@@ -63,7 +64,10 @@ void intel_p5_mcheck_init(struct cpuinfo_x86 *c)
pr_info("Intel old style machine check architecture supported.\n");
/* Enable MCE: */
- cr4_set_bits(X86_CR4_MCE);
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_MCE);
+ local_irq_restore(flags);
+
pr_info("Intel old style machine check reporting enabled on CPU#%d.\n",
smp_processor_id());
}
diff --git a/arch/x86/kernel/cpu/mcheck/winchip.c b/arch/x86/kernel/cpu/mcheck/winchip.c
index 3b45b270a865..72213d75c865 100644
--- a/arch/x86/kernel/cpu/mcheck/winchip.c
+++ b/arch/x86/kernel/cpu/mcheck/winchip.c
@@ -27,6 +27,7 @@ static void winchip_machine_check(struct pt_regs *regs, long error_code)
/* Set up machine check reporting on the Winchip C6 series */
void winchip_mcheck_init(struct cpuinfo_x86 *c)
{
+ unsigned long flags;
u32 lo, hi;
machine_check_vector = winchip_machine_check;
@@ -38,7 +39,9 @@ void winchip_mcheck_init(struct cpuinfo_x86 *c)
lo &= ~(1<<4); /* Enable MCE */
wrmsr(MSR_IDT_FCR1, lo, hi);
- cr4_set_bits(X86_CR4_MCE);
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_MCE);
+ local_irq_restore(flags);
pr_info("Winchip machine check reporting enabled on CPU#0.\n");
}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 7affb7e3d9a5..db57b217e123 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -23,7 +23,7 @@ static void fpu__init_cpu_generic(void)
if (boot_cpu_has(X86_FEATURE_XMM))
cr4_mask |= X86_CR4_OSXMMEXCPT;
if (cr4_mask)
- cr4_set_bits(cr4_mask);
+ cr4_set_bits_irqs_off(cr4_mask);
cr0 = read_cr0();
cr0 &= ~(X86_CR0_TS|X86_CR0_EM); /* clear TS and EM */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f1d5476c9022..9d3c7a1a4ce5 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -237,7 +237,7 @@ void fpu__init_cpu_xstate(void)
xfeatures_mask &= ~XFEATURE_MASK_SUPERVISOR;
- cr4_set_bits(X86_CR4_OSXSAVE);
+ cr4_set_bits_irqs_off(X86_CR4_OSXSAVE);
xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
}
@@ -713,7 +713,7 @@ static int init_xstate_size(void)
static void fpu__init_disable_system_xstate(void)
{
xfeatures_mask = 0;
- cr4_clear_bits(X86_CR4_OSXSAVE);
+ cr4_clear_bits_irqs_off(X86_CR4_OSXSAVE);
fpu__xstate_clear_all_cpu_caps();
}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c67685337c5a..412265fe14df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -128,25 +128,35 @@ void flush_thread(void)
void disable_TSC(void)
{
+ unsigned long flags;
+
preempt_disable();
- if (!test_and_set_thread_flag(TIF_NOTSC))
+ if (!test_and_set_thread_flag(TIF_NOTSC)) {
/*
* Must flip the CPU state synchronously with
* TIF_NOTSC in the current running context.
*/
- cr4_set_bits(X86_CR4_TSD);
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_TSD);
+ local_irq_restore(flags);
+ }
preempt_enable();
}
static void enable_TSC(void)
{
+ unsigned long flags;
+
preempt_disable();
- if (test_and_clear_thread_flag(TIF_NOTSC))
+ if (test_and_clear_thread_flag(TIF_NOTSC)) {
/*
* Must flip the CPU state synchronously with
* TIF_NOTSC in the current running context.
*/
- cr4_clear_bits(X86_CR4_TSD);
+ local_irq_save(flags);
+ cr4_clear_bits_irqs_off(X86_CR4_TSD);
+ local_irq_restore(flags);
+ }
preempt_enable();
}
@@ -293,7 +303,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
}
if ((tifp ^ tifn) & _TIF_NOTSC)
- cr4_toggle_bits(X86_CR4_TSD);
+ cr4_toggle_bits_irqs_off(X86_CR4_TSD);
if ((tifp ^ tifn) & _TIF_NOCPUID)
set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 2126b9d27c34..86ad70c02607 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -109,7 +109,7 @@ void __noreturn machine_real_restart(unsigned int type)
/* Exiting long mode will fail if CR4.PCIDE is set. */
if (static_cpu_has(X86_FEATURE_PCID))
- cr4_clear_bits(X86_CR4_PCIDE);
+ cr4_clear_bits_irqs_off(X86_CR4_PCIDE);
#endif
/* Jump to the identity-mapped low memory code */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a6f4f095f8f4..a0b387dfbf5c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3500,7 +3500,12 @@ static __init int vmx_disabled_by_bios(void)
static void kvm_cpu_vmxon(u64 addr)
{
- cr4_set_bits(X86_CR4_VMXE);
+ unsigned long flags;
+
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_VMXE);
+ local_irq_restore(flags);
+
intel_pt_handle_vmx(1);
asm volatile (ASM_VMX_VMXON_RAX
@@ -3565,10 +3570,14 @@ static void vmclear_local_loaded_vmcss(void)
*/
static void kvm_cpu_vmxoff(void)
{
+ unsigned long flags;
+
asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
intel_pt_handle_vmx(0);
- cr4_clear_bits(X86_CR4_VMXE);
+ local_irq_save(flags);
+ cr4_clear_bits_irqs_off(X86_CR4_VMXE);
+ local_irq_restore(flags);
}
static void hardware_disable(void)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index af5c1ed21d43..ddd248acab8e 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -199,6 +199,8 @@ static void setup_pcid(void)
#ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCID)) {
if (boot_cpu_has(X86_FEATURE_PGE)) {
+ unsigned long flags;
+
/*
* This can't be cr4_set_bits_and_update_boot() --
* the trampoline code can't handle CR4.PCIDE and
@@ -210,7 +212,9 @@ static void setup_pcid(void)
* Instead, we brute-force it and set CR4.PCIDE
* manually in start_secondary().
*/
- cr4_set_bits(X86_CR4_PCIDE);
+ local_irq_save(flags);
+ cr4_set_bits_irqs_off(X86_CR4_PCIDE);
+ local_irq_restore(flags);
} else {
/*
* flush_tlb_all(), as currently implemented, won't
--
2.14.1
* Re: [PATCH v2 2/2] x86: disable IRQs before changing CR4
2017-11-25 3:29 ` [PATCH v2 2/2] x86: disable IRQs before changing CR4 Nadav Amit
@ 2017-11-25 10:36 ` Thomas Gleixner
2017-11-25 17:20 ` Nadav Amit
2017-11-25 12:38 ` [tip:x86/urgent] x86/tlb: Disable interrupts when " tip-bot for Nadav Amit
1 sibling, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2017-11-25 10:36 UTC (permalink / raw)
To: Nadav Amit
Cc: linux-kernel, linux-edac, nadav.amit, Andy Lutomirski,
Ingo Molnar, H. Peter Anvin, x86, Tony Luck, Borislav Petkov,
Paolo Bonzini, Radim Krčmář
On Fri, 24 Nov 2017, Nadav Amit wrote:
> /* Set in this cpu's CR4. */
> -static inline void cr4_set_bits(unsigned long mask)
> +static inline void cr4_set_bits_irqs_off(unsigned long mask)
This change is kinda weird. I'd expect that there is a corresponding
function cr4_set_bits() which takes care of disabling interrupts. But there
is not. All it does is creating a lot of pointless churn.
> static __always_inline void setup_smep(struct cpuinfo_x86 *c)
> static __init int setup_disable_smap(char *arg)
> static __always_inline void setup_pku(struct cpuinfo_x86 *c)
Why are you not doing this at the call site around all calls which fiddle
with cr4, i.e. in identify_cpu() ?
identify_cpu() is called from two places:
identify_boot_cpu() and identify_secondary_cpu()
identify_secondary_cpu is called with interrupts disabled anyway and there
is no reason why we can't enforce interrupts being disabled around
identify_cpu() completely.
But if we actually do the right thing, i.e. having cr4_set_bit() and
cr4_set_bit_irqsoff() all of this churn goes away magically.
Then the only place which needs to be changed is the context switch because
here interrupts are already disabled and we really care about performance.
> @@ -293,7 +303,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> }
>
> if ((tifp ^ tifn) & _TIF_NOTSC)
> - cr4_toggle_bits(X86_CR4_TSD);
> + cr4_toggle_bits_irqs_off(X86_CR4_TSD);
>
Thanks,
tglx
* [tip:x86/urgent] x86/tlb: Refactor CR4 setting and shadow write
2017-11-25 3:29 ` [PATCH v2 1/2] x86: refactor CR4 setting and shadow write Nadav Amit
@ 2017-11-25 12:37 ` tip-bot for Nadav Amit
0 siblings, 0 replies; 9+ messages in thread
From: tip-bot for Nadav Amit @ 2017-11-25 12:37 UTC (permalink / raw)
To: linux-tip-commits; +Cc: tglx, mingo, luto, hpa, linux-kernel, namit
Commit-ID: 0c3292ca8025c5aef44dc389ac3a6bf4a325e0be
Gitweb: https://git.kernel.org/tip/0c3292ca8025c5aef44dc389ac3a6bf4a325e0be
Author: Nadav Amit <namit@vmware.com>
AuthorDate: Fri, 24 Nov 2017 19:29:06 -0800
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 25 Nov 2017 13:28:43 +0100
x86/tlb: Refactor CR4 setting and shadow write
Refactor the write to CR4 and its shadow value. This is done in
preparation for the addition of an assertion to check that IRQs are
disabled during CR4 update.
No functional change.
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: nadav.amit@gmail.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20171125032907.2241-2-namit@vmware.com
---
arch/x86/include/asm/tlbflush.h | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 509046c..e736f7f 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -173,17 +173,20 @@ static inline void cr4_init_shadow(void)
this_cpu_write(cpu_tlbstate.cr4, __read_cr4());
}
+static inline void __cr4_set(unsigned long cr4)
+{
+ this_cpu_write(cpu_tlbstate.cr4, cr4);
+ __write_cr4(cr4);
+}
+
/* Set in this cpu's CR4. */
static inline void cr4_set_bits(unsigned long mask)
{
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- if ((cr4 | mask) != cr4) {
- cr4 |= mask;
- this_cpu_write(cpu_tlbstate.cr4, cr4);
- __write_cr4(cr4);
- }
+ if ((cr4 | mask) != cr4)
+ __cr4_set(cr4 | mask);
}
/* Clear in this cpu's CR4. */
@@ -192,11 +195,8 @@ static inline void cr4_clear_bits(unsigned long mask)
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- if ((cr4 & ~mask) != cr4) {
- cr4 &= ~mask;
- this_cpu_write(cpu_tlbstate.cr4, cr4);
- __write_cr4(cr4);
- }
+ if ((cr4 & ~mask) != cr4)
+ __cr4_set(cr4 & ~mask);
}
static inline void cr4_toggle_bits(unsigned long mask)
@@ -204,9 +204,7 @@ static inline void cr4_toggle_bits(unsigned long mask)
unsigned long cr4;
cr4 = this_cpu_read(cpu_tlbstate.cr4);
- cr4 ^= mask;
- this_cpu_write(cpu_tlbstate.cr4, cr4);
- __write_cr4(cr4);
+ __cr4_set(cr4 ^ mask);
}
/* Read the CR4 shadow. */
* [tip:x86/urgent] x86/tlb: Disable interrupts when changing CR4
2017-11-25 3:29 ` [PATCH v2 2/2] x86: disable IRQs before changing CR4 Nadav Amit
2017-11-25 10:36 ` Thomas Gleixner
@ 2017-11-25 12:38 ` tip-bot for Nadav Amit
1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Nadav Amit @ 2017-11-25 12:38 UTC (permalink / raw)
To: linux-tip-commits
Cc: namit, mingo, rkrcmar, luto, pbonzini, tony.luck, tglx, bp, hpa,
linux-kernel
Commit-ID: 9d0b62328d34c7044114d4f4281981d4c537c4ba
Gitweb: https://git.kernel.org/tip/9d0b62328d34c7044114d4f4281981d4c537c4ba
Author: Nadav Amit <namit@vmware.com>
AuthorDate: Fri, 24 Nov 2017 19:29:07 -0800
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 25 Nov 2017 13:28:43 +0100
x86/tlb: Disable interrupts when changing CR4
CR4 modifications are implemented as RMW operations which update a shadow
variable and write the result to CR4. The RMW operation is protected by
preemption disable, but there is no enforcement or debugging mechanism.
CR4 modifications happen also in interrupt context via
__native_flush_tlb_global(). This implementation does not affect an
interrupted thread context's CR4 operation, because the CR4 toggle restores
the original content and does not modify the shadow variable.
So the current situation seems to be safe, but a recent patch tried to add
an actual RMW operation in interrupt context, which will cause subtle
corruptions.
To prevent that and make the CR4 handling future proof:
- Add a lockdep assertion to __cr4_set() which will catch interrupt
enabled invocations
- Disable interrupts in the cr4 manipulator inlines
- Rename cr4_toggle_bits() to cr4_toggle_bits_irqsoff(). This is called
from __switch_to_xtra() where interrupts are already disabled and
performance matters.
All other call sites are not performance critical, so the extra overhead of
an additional local_irq_save/restore() pair is not a problem. If new call
sites care about performance then the necessary _irqsoff() variants can be
added.
[ tglx: Condensed the patch by moving the irq protection inside the
manipulator functions. Updated changelog ]
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: nadav.amit@gmail.com
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/20171125032907.2241-3-namit@vmware.com
---
arch/x86/include/asm/tlbflush.h | 11 ++++++++---
arch/x86/kernel/process.c | 2 +-
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index e736f7f..877b5c1 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -175,6 +175,7 @@ static inline void cr4_init_shadow(void)
static inline void __cr4_set(unsigned long cr4)
{
+ lockdep_assert_irqs_disabled();
this_cpu_write(cpu_tlbstate.cr4, cr4);
__write_cr4(cr4);
}
@@ -182,24 +183,28 @@ static inline void __cr4_set(unsigned long cr4)
/* Set in this cpu's CR4. */
static inline void cr4_set_bits(unsigned long mask)
{
- unsigned long cr4;
+ unsigned long cr4, flags;
+ local_irq_save(flags);
cr4 = this_cpu_read(cpu_tlbstate.cr4);
if ((cr4 | mask) != cr4)
__cr4_set(cr4 | mask);
+ local_irq_restore(flags);
}
/* Clear in this cpu's CR4. */
static inline void cr4_clear_bits(unsigned long mask)
{
- unsigned long cr4;
+ unsigned long cr4, flags;
+ local_irq_save(flags);
cr4 = this_cpu_read(cpu_tlbstate.cr4);
if ((cr4 & ~mask) != cr4)
__cr4_set(cr4 & ~mask);
+ local_irq_restore(flags);
}
-static inline void cr4_toggle_bits(unsigned long mask)
+static inline void cr4_toggle_bits_irqsoff(unsigned long mask)
{
unsigned long cr4;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 97fb3e5..bb988a2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -299,7 +299,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
}
if ((tifp ^ tifn) & _TIF_NOTSC)
- cr4_toggle_bits(X86_CR4_TSD);
+ cr4_toggle_bits_irqsoff(X86_CR4_TSD);
if ((tifp ^ tifn) & _TIF_NOCPUID)
set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
* Re: [PATCH v2 2/2] x86: disable IRQs before changing CR4
2017-11-25 10:36 ` Thomas Gleixner
@ 2017-11-25 17:20 ` Nadav Amit
2017-11-25 17:25 ` Thomas Gleixner
0 siblings, 1 reply; 9+ messages in thread
From: Nadav Amit @ 2017-11-25 17:20 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, linux-edac, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
x86, Tony Luck, Borislav Petkov, Paolo Bonzini,
Radim Krčmář
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 24 Nov 2017, Nadav Amit wrote:
>> /* Set in this cpu's CR4. */
>> -static inline void cr4_set_bits(unsigned long mask)
>> +static inline void cr4_set_bits_irqs_off(unsigned long mask)
>
> This change is kinda weird. I'd expect that there is a corresponding
> function cr4_set_bits() which takes care of disabling interrupts. But there
> is not. All it does is creating a lot of pointless churn.
>
>> static __always_inline void setup_smep(struct cpuinfo_x86 *c)
>> static __init int setup_disable_smap(char *arg)
>> static __always_inline void setup_pku(struct cpuinfo_x86 *c)
>
> Why are you not doing this at the call site around all calls which fiddle
> with cr4, i.e. in identify_cpu() ?
>
> identify_cpu() is called from two places:
>
> identify_boot_cpu() and identify_secondary_cpu()
>
> identify_secondary_cpu is called with interrupts disabled anyway and there
> is no reason why we can't enforce interrupts being disabled around
> identify_cpu() completely.
>
> But if we actually do the right thing, i.e. having cr4_set_bit() and
> cr4_set_bit_irqsoff() all of this churn goes away magically.
>
> Then the only place which needs to be changed is the context switch because
> here interrupts are already disabled and we really care about performance.
>
>> @@ -293,7 +303,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>> }
>>
>> if ((tifp ^ tifn) & _TIF_NOTSC)
>> - cr4_toggle_bits(X86_CR4_TSD);
>> + cr4_toggle_bits_irqs_off(X86_CR4_TSD);
You make a good point. I will add cr4_set_bit(). I will leave identify_cpu()
as is, since it is rather hard to maintain code that enables/disables irqs
at one point and relies on these operations at a completely different place.
As you said, it is less of an issue once cr4_set_bit() and friends are
introduced.
Thanks,
Nadav
* Re: [PATCH v2 2/2] x86: disable IRQs before changing CR4
2017-11-25 17:20 ` Nadav Amit
@ 2017-11-25 17:25 ` Thomas Gleixner
2017-11-25 17:31 ` Nadav Amit
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2017-11-25 17:25 UTC (permalink / raw)
To: Nadav Amit
Cc: LKML, linux-edac, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
x86, Tony Luck, Borislav Petkov, Paolo Bonzini,
Radim Krčmář
On Sat, 25 Nov 2017, Nadav Amit wrote:
> Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Fri, 24 Nov 2017, Nadav Amit wrote:
> >> /* Set in this cpu's CR4. */
> >> -static inline void cr4_set_bits(unsigned long mask)
> >> +static inline void cr4_set_bits_irqs_off(unsigned long mask)
> >
> > This change is kinda weird. I'd expect that there is a corresponding
> > function cr4_set_bits() which takes care of disabling interrupts. But there
> > is not. All it does is create a lot of pointless churn.
> >
> >> static __always_inline void setup_smep(struct cpuinfo_x86 *c)
> >> static __init int setup_disable_smap(char *arg)
> >> static __always_inline void setup_pku(struct cpuinfo_x86 *c)
> >
> > Why are you not doing this at the call site around all calls which fiddle
> > with cr4, i.e. in identify_cpu() ?
> >
> > identify_cpu() is called from two places:
> >
> > identify_boot_cpu() and identify_secondary_cpu()
> >
> > identify_secondary_cpu is called with interrupts disabled anyway and there
> > is no reason why we can't enforce interrupts being disabled around
> > identify_cpu() completely.
> >
> > But if we actually do the right thing, i.e. having cr4_set_bit() and
> > cr4_set_bit_irqsoff() all of this churn goes away magically.
> >
> > Then the only place which needs to be changed is the context switch because
> > here interrupts are already disabled and we really care about performance.
> >
> >> @@ -293,7 +303,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> >> }
> >>
> >> if ((tifp ^ tifn) & _TIF_NOTSC)
> >> - cr4_toggle_bits(X86_CR4_TSD);
> >> + cr4_toggle_bits_irqs_off(X86_CR4_TSD);
>
> You make a good point. I will add cr4_set_bit(). I will leave identify_cpu()
> as is, since it is rather hard to maintain code that enables/disables irqs
> at one point and relies on these operations at a completely different place.
> As you said, it is less of an issue once cr4_set_bit() and friends are
> introduced.
I fixed that up already as I wanted to have it done, see the tip-bot mail
in your inbox.
Thanks,
tglx
* Re: [PATCH v2 2/2] x86: disable IRQs before changing CR4
2017-11-25 17:25 ` Thomas Gleixner
@ 2017-11-25 17:31 ` Nadav Amit
0 siblings, 0 replies; 9+ messages in thread
From: Nadav Amit @ 2017-11-25 17:31 UTC (permalink / raw)
To: Thomas Gleixner
Cc: LKML, linux-edac, Andy Lutomirski, Ingo Molnar, H. Peter Anvin,
x86, Tony Luck, Borislav Petkov, Paolo Bonzini,
Radim Krčmář
Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 25 Nov 2017, Nadav Amit wrote:
>> Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>>> On Fri, 24 Nov 2017, Nadav Amit wrote:
>>>> /* Set in this cpu's CR4. */
>>>> -static inline void cr4_set_bits(unsigned long mask)
>>>> +static inline void cr4_set_bits_irqs_off(unsigned long mask)
>>>
>>> This change is kinda weird. I'd expect that there is a corresponding
>>> function cr4_set_bits() which takes care of disabling interrupts. But there
>>> is not. All it does is create a lot of pointless churn.
>>>
>>>> static __always_inline void setup_smep(struct cpuinfo_x86 *c)
>>>> static __init int setup_disable_smap(char *arg)
>>>> static __always_inline void setup_pku(struct cpuinfo_x86 *c)
>>>
>>> Why are you not doing this at the call site around all calls which fiddle
>>> with cr4, i.e. in identify_cpu() ?
>>>
>>> identify_cpu() is called from two places:
>>>
>>> identify_boot_cpu() and identify_secondary_cpu()
>>>
>>> identify_secondary_cpu is called with interrupts disabled anyway and there
>>> is no reason why we can't enforce interrupts being disabled around
>>> identify_cpu() completely.
>>>
>>> But if we actually do the right thing, i.e. having cr4_set_bit() and
>>> cr4_set_bit_irqsoff() all of this churn goes away magically.
>>>
>>> Then the only place which needs to be changed is the context switch because
>>> here interrupts are already disabled and we really care about performance.
>>>
>>>> @@ -293,7 +303,7 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>>>> }
>>>>
>>>> if ((tifp ^ tifn) & _TIF_NOTSC)
>>>> - cr4_toggle_bits(X86_CR4_TSD);
>>>> + cr4_toggle_bits_irqs_off(X86_CR4_TSD);
>>
>> You make a good point. I will add cr4_set_bit(). I will leave identify_cpu()
>> as is, since it is rather hard to maintain code that enables/disables irqs
>> at one point and relies on these operations at a completely different place.
>> As you said, it is less of an issue once cr4_set_bit() and friends are
>> introduced.
>
> I fixed that up already as I wanted to have it done, see the tip-bot mail
> in your inbox.
Thanks, your changes made it much better. At some point it might be worthwhile
to make the MTRR code use these interfaces too instead of meddling with CR4
directly. Anyhow, that is a different story.
Regards,
Nadav
Thread overview: 9+ messages
2017-11-25 3:29 [PATCH v2 0/2] x86: disable IRQs during CR4 changes Nadav Amit
2017-11-25 3:29 ` [PATCH v2 1/2] x86: refactor CR4 setting and shadow write Nadav Amit
2017-11-25 12:37 ` [tip:x86/urgent] x86/tlb: Refactor " tip-bot for Nadav Amit
2017-11-25 3:29 ` [PATCH v2 2/2] x86: disable IRQs before changing CR4 Nadav Amit
2017-11-25 10:36 ` Thomas Gleixner
2017-11-25 17:20 ` Nadav Amit
2017-11-25 17:25 ` Thomas Gleixner
2017-11-25 17:31 ` Nadav Amit
2017-11-25 12:38 ` [tip:x86/urgent] x86/tlb: Disable interrupts when " tip-bot for Nadav Amit