* [PATCH 1/2] x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
From: Andy Lutomirski @ 2016-02-29 23:45 UTC
To: Luis R. Rodriguez
Cc: lguest-uLR06cmDAlY/bJ5BZ2RsiQ, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
Andrew Cooper, x86-DgEjT+Ai2ygdnm+yROfE0A,
xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, Borislav Petkov,
david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA, Andy Lutomirski,
boris.ostrovsky-QHcLZuEGTsvQT0dZR+AlfA
x86_64 has very clean espfix handling on paravirt: espfix64 is set
up in native_iret, so paravirt systems that override iret bypass
espfix64 automatically. This is robust and straightforward.
x86_32 is messier. espfix is set up before the IRET paravirt patch
point, so it can't be directly conditionalized on whether we use
native_iret. We also can't easily move it into native_iret without
regressing performance due to a bizarre consideration. Specifically,
on 64-bit kernels, the logic is:

    if (regs->ss & 0x4)
        setup_espfix;

On 32-bit kernels, the logic is:

    if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
        (regs->flags & X86_EFLAGS_VM) == 0)
        setup_espfix;

The performance of setup_espfix itself is essentially irrelevant, but
the comparison happens on every IRET so its performance matters. On
x86_64, there's no need for any registers except flags to implement
the comparison, so we fold the whole thing into native_iret. On
x86_32, we don't do that because we need a free register to
implement the comparison efficiently. We therefore do espfix setup
before restoring registers on x86_32.
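As a concrete illustration (an editorial sketch paraphrasing 64-bit
native_iret, not part of this patch), the 64-bit check needs nothing
but a flags-clobbering test against the stack image:

    testb   $4, 4*8(%rsp)           # TI bit of saved SS: from the LDT?
    jnz     native_irq_return_ldt

The 32-bit condition also has to inspect CS and EFLAGS, which can't be
done this cheaply without a scratch register.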
This patch gets rid of the explicit paravirt_enabled check by
introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
to skip espfix on paravirt systems where iret != native_iret. This is
also messy, but it's at least in line with other things we do.
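The ALTERNATIVE reads, in pseudocode (a sketch of the patching
semantics, not literal kernel code):

    if (boot_cpu_has_bug(X86_BUG_ESPFIX))
        ;                       /* jmp patched to NOPs: fall into SS check */
    else
        goto restore_nocheck;   /* paravirt iret in use: skip espfix */

The branch is resolved once at boot by the alternatives patcher rather
than being taken on every IRET.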
This improves espfix performance by removing a branch, but no one
cares. More importantly, it removes a paravirt_enabled user, which is
good because paravirt_enabled is ill-defined and is going away.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/entry/entry_32.S | 15 ++-------------
arch/x86/include/asm/cpufeatures.h | 8 ++++++++
arch/x86/kernel/cpu/common.c | 25 +++++++++++++++++++++++++
3 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6a792603f9f3..4f7615e405a4 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -418,6 +418,8 @@ restore_all:
TRACE_IRQS_IRET
restore_all_notrace:
#ifdef CONFIG_X86_ESPFIX32
+ ALTERNATIVE "jmp restore_nocheck", "", X86_BUG_ESPFIX
+
movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
/*
* Warning: PT_OLDSS(%esp) contains the wrong/random values if we
@@ -444,19 +446,6 @@ ENTRY(iret_exc )
#ifdef CONFIG_X86_ESPFIX32
ldt_ss:
-#ifdef CONFIG_PARAVIRT
- /*
- * The kernel can't run on a non-flat stack if paravirt mode
- * is active. Rather than try to fixup the high bits of
- * ESP, bypass this code entirely. This may break DOSemu
- * and/or Wine support in a paravirt VM, although the option
- * is still available to implement the setting of the high
- * 16-bits in the INTERRUPT_RETURN paravirt-op.
- */
- cmpl $0, pv_info+PARAVIRT_enabled
- jne restore_nocheck
-#endif
-
/*
* Setup and switch to ESPFIX stack
*
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6663fae71b12..d11a3aaafd96 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -286,4 +286,12 @@
#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
+#ifdef CONFIG_X86_32
+/*
+ * 64-bit kernels don't use X86_BUG_ESPFIX. Make the define conditional
+ * to avoid confusion.
+ */
+#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */
+#endif
+
#endif /* _ASM_X86_CPUFEATURES_H */
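(Aside: X86_BUG(x) just indexes a synthetic "bug" word placed past the
normal capability words; it is defined earlier in this same header as

    #define X86_BUG(x)              (NCAPINTS*32 + (x))

so set_cpu_bug() and static_cpu_has_bug() reuse the ordinary
cpufeature machinery.)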
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 91ddae732a36..f398dc3f6ad4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,6 +979,31 @@ static void identify_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
#endif
+
+ /*
+ * ESPFIX is a strange bug. All real CPUs have it. Paravirt
+ * systems that run Linux at CPL > 0 may or may not have the
+ * issue, but, even if they have the issue, there's absolutely
+ * nothing we can do about it because we can't use the real IRET
+ * instruction.
+ *
+ * NB: For the time being, only 32-bit kernels support
+ * X86_BUG_ESPFIX as such. 64-bit kernels directly choose
+ * whether to apply espfix using paravirt hooks. If any
+ * non-paravirt system ever shows up that does *not* have the
+ * ESPFIX issue, we can change this.
+ */
+#ifdef CONFIG_X86_32
+#ifdef CONFIG_PARAVIRT
+ do {
+ extern void native_iret(void);
+ if (pv_cpu_ops.iret == native_iret)
+ set_cpu_bug(c, X86_BUG_ESPFIX);
+ } while (0);
+#else
+ set_cpu_bug(c, X86_BUG_ESPFIX);
+#endif
+#endif
}
/*
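How the bit then gets consumed, as an illustrative sketch of the
alternatives patcher's per-site decision (not the actual code; field
and helper names are approximate):

    if (boot_cpu_has(alt->cpuid))   /* here: X86_BUG_ESPFIX */
        apply_replacement(alt);     /* "" replacement: NOP out the jmp */
    /* else the original "jmp restore_nocheck" stays in place */

A CPU flagged with the bug thus falls through into the espfix check,
while a paravirt guest keeps the jump and bypasses it.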
--
2.5.0
* [PATCH 1/2] x86/nmi/64: Giant debugging hack
From: Andy Lutomirski @ 2016-02-29 23:45 UTC
To: Luis R. Rodriguez
Cc: lguest-uLR06cmDAlY/bJ5BZ2RsiQ, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
Andrew Cooper, x86-DgEjT+Ai2ygdnm+yROfE0A,
xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, Borislav Petkov,
david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA, Andy Lutomirski,
boris.ostrovsky-QHcLZuEGTsvQT0dZR+AlfA
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/entry/entry_64.S | 34 ++++++++++++++++++++++++++++++----
arch/x86/include/asm/segment.h | 5 ++++-
arch/x86/kernel/cpu/common.c | 2 ++
3 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 70eadb0ea5fa..2f61059dacc3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,9 @@
#include <asm/pgtable_types.h>
#include <linux/err.h>
+#define NMI_NORECURSE_SS (GDT_ENTRY_FOO << 3)
+#define NMI_RECURSIVE_SS (GDT_ENTRY_BAR << 3)
+
/* Avoid __ASSEMBLER__'ifying <linux/audit.h> just for this. */
#include <linux/elf-em.h>
#define AUDIT_ARCH_X86_64 (EM_X86_64|__AUDIT_ARCH_64BIT|__AUDIT_ARCH_LE)
@@ -523,6 +526,10 @@ restore_c_regs_and_iret:
INTERRUPT_RETURN
ENTRY(native_iret)
+ cmp $NMI_NORECURSE_SS, 4*8(%rsp)
+ jne 1f
+ ud2
+1:
/*
* Are we returning to a stack segment from the LDT? Note: in
* 64-bit mode SS:RSP on the exception stack is always valid.
@@ -1113,6 +1120,13 @@ ENTRY(nmi)
/* Use %rdx as our temp variable throughout */
pushq %rdx
+ cmpl $NMI_NORECURSE_SS, 4*8(%rsp)
+ jne 1f
+ ud2
+1:
+ movl $NMI_NORECURSE_SS, %edx
+ mov %dx, %ss
+
testb $3, CS-RIP+8(%rsp)
jz .Lnmi_from_kernel
@@ -1158,6 +1172,8 @@ ENTRY(nmi)
* due to nesting -- we're on the normal thread stack and we're
* done with the NMI stack.
*/
+ mov $__KERNEL_DS, %edx
+ mov %dx, %ss
movq %rsp, %rdi
movq $-1, %rsi
@@ -1240,6 +1256,11 @@ ENTRY(nmi)
cmpl $1, -8(%rsp)
je nested_nmi
+ cmpl $0, -8(%rsp)
+ je 1f
+ ud2 /* corrupt */
+1:
+
/*
* Now test if the previous stack was an NMI stack. This covers
* the case where we interrupt an outer NMI after it clears
@@ -1277,7 +1298,7 @@ nested_nmi:
*/
subq $8, %rsp
leaq -10*8(%rsp), %rdx
- pushq $__KERNEL_DS
+ pushq $NMI_RECURSIVE_SS
pushq %rdx
pushfq
pushq $__KERNEL_CS
@@ -1293,6 +1314,11 @@ nested_nmi_out:
INTERRUPT_RETURN
first_nmi:
+ cmpl $NMI_RECURSIVE_SS, 4*8(%rsp)
+ jne 1f
+ ud2
+1:
+
/* Restore rdx. */
movq (%rsp), %rdx
@@ -1309,12 +1335,12 @@ first_nmi:
/* Everything up to here is safe from nested NMIs */
-#ifdef CONFIG_DEBUG_ENTRY
+/*#ifdef CONFIG_DEBUG_ENTRY*/
/*
* For ease of testing, unmask NMIs right away. Disabled by
* default because IRET is very expensive.
*/
- pushq $0 /* SS */
+ pushq $NMI_RECURSIVE_SS /* SS */
pushq %rsp /* RSP (minus 8 because of the previous push) */
addq $8, (%rsp) /* Fix up RSP */
pushfq /* RFLAGS */
@@ -1322,7 +1348,7 @@ first_nmi:
pushq $1f /* RIP */
INTERRUPT_RETURN /* continues at repeat_nmi below */
1:
-#endif
+/*#endif*/
repeat_nmi:
/*
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 7d5a1929d76b..58473396f290 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -176,6 +176,9 @@
#define GDT_ENTRY_DEFAULT_USER_DS 5
#define GDT_ENTRY_DEFAULT_USER_CS 6
+#define GDT_ENTRY_FOO 7
+#define GDT_ENTRY_BAR 16
+
/* Needs two entries */
#define GDT_ENTRY_TSS 8
/* Needs two entries */
@@ -190,7 +193,7 @@
/*
* Number of entries in the GDT table:
*/
-#define GDT_ENTRIES 16
+#define GDT_ENTRIES 17
/*
* Segment selector values corresponding to the above entries:
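Quick selector arithmetic for the two new entries (editorial note, not
part of the patch):

    NMI_NORECURSE_SS = GDT_ENTRY_FOO << 3 =  7 << 3 = 0x38
    NMI_RECURSIVE_SS = GDT_ENTRY_BAR << 3 = 16 << 3 = 0x80

Both are ring-0 GDT selectors, and entry 16 is what forces GDT_ENTRIES
up from 16 to 17 in the hunk above.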
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 079d83fc6488..91ddae732a36 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -108,6 +108,8 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
[GDT_ENTRY_DEFAULT_USER32_CS] = GDT_ENTRY_INIT(0xc0fb, 0, 0xfffff),
[GDT_ENTRY_DEFAULT_USER_DS] = GDT_ENTRY_INIT(0xc0f3, 0, 0xfffff),
[GDT_ENTRY_DEFAULT_USER_CS] = GDT_ENTRY_INIT(0xa0fb, 0, 0xfffff),
+ [GDT_ENTRY_FOO] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
+ [GDT_ENTRY_BAR] = GDT_ENTRY_INIT(0xc093, 0, 0xfffff),
#else
[GDT_ENTRY_KERNEL_CS] = GDT_ENTRY_INIT(0xc09a, 0, 0xfffff),
[GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(0xc092, 0, 0xfffff),
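Decoding GDT_ENTRY_INIT(0xc093, 0, 0xfffff), for reference:

    0x93 access byte = present | DPL 0 | writable data | accessed
    0xc  flags       = G=1 (4K granularity) | D/B=1 (32-bit)

That matches the 64-bit kernel's __KERNEL_DS descriptor, so the two
new stack segments behave exactly like kernel data; only their
selector values differ, which is what the SS sanity checks added to
the entry code key off of.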
--
2.5.0
* [PATCH 2/2] x86/asm-offsets: Remove PARAVIRT_enabled
From: Andy Lutomirski @ 2016-02-29 23:45 UTC
To: Luis R. Rodriguez
Cc: lguest-uLR06cmDAlY/bJ5BZ2RsiQ, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
Andrew Cooper, x86-DgEjT+Ai2ygdnm+yROfE0A,
xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, Borislav Petkov,
david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA, Andy Lutomirski,
boris.ostrovsky-QHcLZuEGTsvQT0dZR+AlfA
It no longer has any users.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/kernel/asm-offsets.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 84a7524b202c..5c042466f274 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -59,7 +59,6 @@ void common(void) {
#ifdef CONFIG_PARAVIRT
BLANK();
- OFFSET(PARAVIRT_enabled, pv_info, paravirt_enabled);
OFFSET(PARAVIRT_PATCH_pv_cpu_ops, paravirt_patch_template, pv_cpu_ops);
OFFSET(PARAVIRT_PATCH_pv_irq_ops, paravirt_patch_template, pv_irq_ops);
OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
--
2.5.0
* [PATCH 2/2] x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
From: Andy Lutomirski @ 2016-02-29 23:45 UTC
To: Luis R. Rodriguez
Cc: lguest-uLR06cmDAlY/bJ5BZ2RsiQ, konrad.wilk-QHcLZuEGTsvQT0dZR+AlfA,
Andrew Cooper, x86-DgEjT+Ai2ygdnm+yROfE0A,
xen-devel-GuqFBffKawuULHF6PoxzQEEOCMrvLtNR, Borislav Petkov,
david.vrabel-Sxgqhf6Nn4DQT0dZR+AlfA, Andy Lutomirski,
boris.ostrovsky-QHcLZuEGTsvQT0dZR+AlfA
x86_64 has very clean espfix handling on paravirt: espfix64 is set
up in native_iret, so paravirt systems that override iret bypass
espfix64 automatically. This is robust and straightforward.
x86_32 is messier. espfix is set up before the IRET paravirt patch
point, so it can't be directly conditionalized on whether we use
native_iret. We also can't easily move it into native_iret without
regressing performance due to a bizarre consideration. Specifically,
on 64-bit kernels, the logic is:

    if (regs->ss & 0x4)
        setup_espfix;

On 32-bit kernels, the logic is:

    if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
        (regs->flags & X86_EFLAGS_VM) == 0)
        setup_espfix;

The performance of setup_espfix itself is essentially irrelevant, but
the comparison happens on every IRET so its performance matters. On
x86_64, there's no need for any registers except flags to implement
the comparison, so we fold the whole thing into native_iret. On
x86_32, we don't do that because we need a free register to
implement the comparison efficiently. We therefore do espfix setup
before restoring registers on x86_32.
This patch gets rid of the explicit paravirt_enabled check by
introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
to skip espfix on paravirt systems where iret != native_iret. This is
also messy, but it's at least in line with other things we do.
This improves espfix performance by removing a branch, but no one
cares. More importantly, it removes a paravirt_enabled user, which is
good because paravirt_enabled is ill-defined and is going away.
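For a concrete case (an illustrative sketch, not part of the patch),
the effect on a Xen PV guest works out as:

    /* Xen PV: pv_cpu_ops.iret == xen_iret, which != native_iret,
     * so identify_cpu() leaves X86_BUG_ESPFIX clear, the ALTERNATIVE
     * keeps "jmp restore_nocheck", and the espfix path is skipped,
     * matching the end state of the old paravirt_enabled test.
     */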
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
arch/x86/entry/entry_32.S | 15 ++-------------
arch/x86/include/asm/cpufeatures.h | 8 ++++++++
arch/x86/kernel/cpu/common.c | 25 +++++++++++++++++++++++++
3 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6a792603f9f3..4f7615e405a4 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -418,6 +418,8 @@ restore_all:
TRACE_IRQS_IRET
restore_all_notrace:
#ifdef CONFIG_X86_ESPFIX32
+ ALTERNATIVE "jmp restore_nocheck", "", X86_BUG_ESPFIX
+
movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS
/*
* Warning: PT_OLDSS(%esp) contains the wrong/random values if we
@@ -444,19 +446,6 @@ ENTRY(iret_exc )
#ifdef CONFIG_X86_ESPFIX32
ldt_ss:
-#ifdef CONFIG_PARAVIRT
- /*
- * The kernel can't run on a non-flat stack if paravirt mode
- * is active. Rather than try to fixup the high bits of
- * ESP, bypass this code entirely. This may break DOSemu
- * and/or Wine support in a paravirt VM, although the option
- * is still available to implement the setting of the high
- * 16-bits in the INTERRUPT_RETURN paravirt-op.
- */
- cmpl $0, pv_info+PARAVIRT_enabled
- jne restore_nocheck
-#endif
-
/*
* Setup and switch to ESPFIX stack
*
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6663fae71b12..d11a3aaafd96 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -286,4 +286,12 @@
#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
+#ifdef CONFIG_X86_32
+/*
+ * 64-bit kernels don't use X86_BUG_ESPFIX. Make the define conditional
+ * to avoid confusion.
+ */
+#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */
+#endif
+
#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 91ddae732a36..c6ef4da8e4f4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,6 +979,31 @@ static void identify_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
#endif
+
+ /*
+ * ESPFIX is a strange bug. All real CPUs have it. Paravirt
+ * systems that run Linux at CPL > 0 may or may not have the
+ * issue, but, even if they have the issue, there's absolutely
+ * nothing we can do about it because we can't use the real IRET
+ * instruction.
+ *
+ * NB: For the time being, only 32-bit kernels support
+ * X86_BUG_ESPFIX as such. 64-bit kernels directly choose
+ * whether to apply espfix using paravirt hooks. If any
+ * non-paravirt system ever shows up that does *not* have the
+ * ESPFIX issue, we can change this.
+ */
+#ifdef CONFIG_X86_32
+#ifdef CONFIG_PARAVIRT
+ do {
+ extern void native_iret(void);
+ if (pv_cpu_ops.iret == native_iret)
+ set_cpu_bug(c, X86_BUG_ESPFIX);
+ } while (0);
+#else
+ set_cpu_bug(c, X86_BUG_ESPFIX);
+#endif
+#endif
}
/*
--
2.5.0