* [PATCH v3 00/17] Alternative Meltdown mitigation
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
This patch series is meant to be used instead of Jan's "XPTI-light"
Meltdown mitigation. It takes a different approach: the guest's L4 page
tables are shadowed, and the shadows are kept in a cache in order to
avoid having to create them multiple times. I'll call my approach
"XPTI" in the following.
The shadow L4 page table used for running in guest mode maps only the
guest (of course) and those parts of the hypervisor memory which are
needed for entering and leaving the hypervisor: IDT, GDT, TSS, stacks
and early interrupt handling code.
To keep a guest from reading other domains' data via the interrupt
stacks of other cpus, a guest subject to XPTI doesn't use the normal
stacks for early interrupt handling, but per-vcpu stacks. This allows
the per-vcpu stacks to be mapped only while the guest is running.
For each guest L4 page table there is exactly one shadow L4 page table.
This avoids the need for complicated synchronization between L4 page
tables, as the guest already has to synchronize multiple cpus when it
uses the same address space on multiple processors concurrently.
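To make the caching idea more concrete, here is a minimal sketch in
plain C. It is not taken from the series; the type names, the fixed-size
cache and shadow_for() are purely illustrative, and the real
implementation (patches 13-15) additionally has to deal with locking,
eviction and reference counting.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified stand-ins for Xen's types; illustration only. */
    typedef uint64_t mfn_t;
    #define L4_ENTRIES       512
    #define XPTI_CACHE_SLOTS  32

    struct shadow_l4 {
        mfn_t guest_l4;                /* guest L4 this shadow belongs to (0 = free) */
        uint64_t entries[L4_ENTRIES];  /* shadow copies of the guest's L4 entries */
    };

    static struct shadow_l4 cache[XPTI_CACHE_SLOTS];

    /*
     * Return the shadow for a guest L4, creating one on first use.  As there
     * is exactly one shadow per guest L4, no cross-shadow synchronization is
     * required: a guest using the same address space on several vcpus already
     * has to coordinate its own page table updates.
     */
    static struct shadow_l4 *shadow_for(mfn_t guest_l4)
    {
        struct shadow_l4 *free_slot = NULL;
        size_t i;

        for ( i = 0; i < XPTI_CACHE_SLOTS; i++ )
        {
            if ( cache[i].guest_l4 == guest_l4 )
                return &cache[i];          /* cache hit: shadow already exists */
            if ( !cache[i].guest_l4 && !free_slot )
                free_slot = &cache[i];
        }

        if ( !free_slot )
            return NULL;                   /* the real cache would evict an entry here */

        free_slot->guest_l4 = guest_l4;
        /*
         * The hypervisor slots (per-vcpu stacks, IDT, GDT, TSS, early entry
         * code) would be filled in here; guest slots are copied from the
         * guest L4 as needed.
         */
        return free_slot;
    }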
Without any further measures it will still be possible for e.g. a
guest's user program to read stack data of another vcpu of the same
domain, but this can easily be avoided by a small PV-ABI modification
introducing per-cpu user address spaces. I'm planning to add that once
the Linux kernel learns to use per-cpu address spaces.
This series is available via github:
https://github.com/jgross1/xen.git xpti
Dario wants to run some performance tests on this series in order to
compare it with Jan's series with all posted optimizations applied.
Patch 1 is just (IMHO) a bugfix for guest stack dumping.
Patches 2 - 3 revert Jan's XPTI-light patches.
Patch 4 modifies the trap handling to use %r12 for addressing the
guest's saved registers instead of using %rsp. This is a prerequisite
for being able to switch the stacks in early trap handling.
Patch 5 adds the xpti command line parameter and some basic
infrastructure for the XPTI framework.
Patches 6 - 8 modify existing infrastructure to support the XPTI
functionality added by the following patches.
Patch 9 adds syscall stubs for XPTI as the current stubs can't be used.
Patch 10 allocates the per-vcpu stacks and initializes them.
Patch 11 modifies interrupt handling to support stack switching in
case of XPTI.
Patch 12 adds activation of the per-vcpu stacks for domains subject to
XPTI.
Patch 13 adds the L4 page table shadowing including the L4 shadow
cache.
Patch 14 makes some further modifications needed to keep the L4 shadows
up to date.
Patch 15 populates the L4 shadow tables with the guest's L4
entries.
Patch 16 adds switching between hypervisor and guest L4 page tables
when entering/leaving the hypervisor.
Patch 17 removes all the hypervisor mappings not needed in the shadow
L4 page table.
Juergen Gross (17):
x86: don't use hypervisor stack size for dumping guest stacks
x86: do a revert of e871e80c38547d9faefc6604532ba3e985e65873
x86: revert 5784de3e2067ed73efc2fe42e62831e8ae7f46c4
x86: don't access saved user regs via rsp in trap handlers
x86: add a xpti command line parameter
x86: allow per-domain mappings without NX bit or with specific mfn
xen/x86: split _set_tssldt_desc() into ldt and tss specific functions
x86: add support for spectre mitigation with local thunk
x86: create syscall stub for per-domain mapping
x86: allocate per-vcpu stacks for interrupt entries
x86: modify interrupt handlers to support stack switching
x86: activate per-vcpu stacks in case of xpti
x86: allocate hypervisor L4 page table for XPTI
xen: add domain pointer to fill_ro_mpt() and zap_ro_mpt() functions
x86: fill XPTI shadow pages and keep them in sync with guest L4
x86: do page table switching when entering/leaving hypervisor
x86: hide most hypervisor mappings in XPTI shadow page tables
docs/misc/xen-command-line.markdown | 16 +-
xen/arch/x86/cpu/common.c | 4 +-
xen/arch/x86/domain.c | 113 +++-
xen/arch/x86/domctl.c | 4 +
xen/arch/x86/indirect-thunk.S | 23 +-
xen/arch/x86/mm.c | 92 +--
xen/arch/x86/mm/shadow/multi.c | 9 +-
xen/arch/x86/pv/Makefile | 2 +
xen/arch/x86/pv/dom0_build.c | 6 +
xen/arch/x86/pv/domain.c | 5 +
xen/arch/x86/pv/xpti-stub.S | 61 ++
xen/arch/x86/pv/xpti.c | 1028 ++++++++++++++++++++++++++++++
xen/arch/x86/smpboot.c | 211 ------
xen/arch/x86/traps.c | 35 +-
xen/arch/x86/x86_64/asm-offsets.c | 6 +-
xen/arch/x86/x86_64/compat/entry.S | 27 +-
xen/arch/x86/x86_64/entry.S | 315 +++------
xen/arch/x86/x86_64/traps.c | 3 +-
xen/common/wait.c | 8 +-
xen/include/asm-x86/asm_defns.h | 68 +-
xen/include/asm-x86/config.h | 13 +-
xen/include/asm-x86/current.h | 86 ++-
xen/include/asm-x86/desc.h | 14 +-
xen/include/asm-x86/domain.h | 8 +
xen/include/asm-x86/indirect_thunk_asm.h | 8 +-
xen/include/asm-x86/ldt.h | 2 +-
xen/include/asm-x86/mm.h | 4 +-
xen/include/asm-x86/nops.h | 2 +-
xen/include/asm-x86/processor.h | 13 +-
xen/include/asm-x86/pv/mm.h | 35 +
xen/include/asm-x86/regs.h | 2 +
xen/include/asm-x86/spec_ctrl_asm.h | 13 +-
xen/include/asm-x86/system.h | 5 +
xen/include/asm-x86/x86_64/page.h | 5 +-
34 files changed, 1632 insertions(+), 614 deletions(-)
create mode 100644 xen/arch/x86/pv/xpti-stub.S
create mode 100644 xen/arch/x86/pv/xpti.c
--
2.13.6
* [PATCH v3 01/17] x86: don't use hypervisor stack size for dumping guest stacks
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
show_guest_stack() and compat_show_guest_stack() stop dumping the
guest's stack as soon as its virtual address crosses a boundary aligned
to the hypervisor stack size.
Remove this arbitrary limit and try to dump a fixed number of lines
instead.
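As an illustration of the new termination condition (not the actual Xen
code; dump_words() and the simplified round_pgup() below are
hypothetical), the dump loop now stops either after a fixed number of
lines or, if the stack had to be mapped page by page, at the end of the
mapped page:

    #include <stdio.h>

    #define PAGE_SIZE 4096UL
    #define round_pgup(a) (((a) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

    /*
     * Dump up to lines * words_per_line words starting at 'stack'.  If the
     * stack was mapped page by page (last_addr != -1UL), stop at the end of
     * the mapped page; otherwise only the line count limits the dump.
     */
    static void dump_words(const unsigned long *stack, unsigned int lines,
                           unsigned int words_per_line, unsigned long last_addr)
    {
        unsigned int i;

        for ( i = 0; i < lines * words_per_line; i++ )
        {
            if ( (unsigned long)stack >= last_addr )
                break;                     /* reached the end of the mapped page */
            printf(" %lx", *stack);
            stack++;
        }
        if ( i == 0 )
            printf("Stack empty.");
        printf("\n");
    }

    int main(void)
    {
        unsigned long buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

        /* Directly accessible stack: no page limit, only the line count. */
        dump_words(buf, 1, 8, -1UL);
        /* Mapped stack: additionally bounded by the page the mapping covers. */
        dump_words(buf, 1, 8, round_pgup((unsigned long)buf));
        return 0;
    }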
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/traps.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2e022b09b8..13a852ca4e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -219,7 +219,8 @@ static void compat_show_guest_stack(struct vcpu *v,
const struct cpu_user_regs *regs,
int debug_stack_lines)
{
- unsigned int i, *stack, addr, mask = STACK_SIZE;
+ unsigned int i, *stack, addr;
+ unsigned long last_addr = -1L;
stack = (unsigned int *)(unsigned long)regs->esp;
printk("Guest stack trace from esp=%08lx:\n ", (unsigned long)stack);
@@ -248,13 +249,13 @@ static void compat_show_guest_stack(struct vcpu *v,
printk("Inaccessible guest memory.\n");
return;
}
- mask = PAGE_SIZE;
+ last_addr = round_pgup((unsigned long)stack);
}
}
for ( i = 0; i < debug_stack_lines * 8; i++ )
{
- if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
+ if ( (unsigned long)stack >= last_addr )
break;
if ( __get_user(addr, stack) )
{
@@ -269,11 +270,9 @@ static void compat_show_guest_stack(struct vcpu *v,
printk(" %08x", addr);
stack++;
}
- if ( mask == PAGE_SIZE )
- {
- BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
+ if ( last_addr != -1L )
unmap_domain_page(stack);
- }
+
if ( i == 0 )
printk("Stack empty.");
printk("\n");
@@ -282,8 +281,7 @@ static void compat_show_guest_stack(struct vcpu *v,
static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
{
int i;
- unsigned long *stack, addr;
- unsigned long mask = STACK_SIZE;
+ unsigned long *stack, addr, last_addr = -1L;
/* Avoid HVM as we don't know what the stack looks like. */
if ( is_hvm_vcpu(v) )
@@ -318,13 +316,13 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
printk("Inaccessible guest memory.\n");
return;
}
- mask = PAGE_SIZE;
+ last_addr = round_pgup((unsigned long)stack);
}
}
for ( i = 0; i < (debug_stack_lines*stack_words_per_line); i++ )
{
- if ( (((long)stack - 1) ^ ((long)(stack + 1) - 1)) & mask )
+ if ( (unsigned long)stack >= last_addr )
break;
if ( __get_user(addr, stack) )
{
@@ -339,11 +337,9 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
printk(" %p", _p(addr));
stack++;
}
- if ( mask == PAGE_SIZE )
- {
- BUILD_BUG_ON(PAGE_SIZE == STACK_SIZE);
+ if ( last_addr != -1L )
unmap_domain_page(stack);
- }
+
if ( i == 0 )
printk("Stack empty.");
printk("\n");
--
2.13.6
* [PATCH v3 02/17] x86: do a revert of e871e80c38547d9faefc6604532ba3e985e65873
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
Revert "x86: allow Meltdown band-aid to be disabled" in order to
prepare for a final Meltdown mitigation.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
docs/misc/xen-command-line.markdown | 12 ------------
xen/arch/x86/domain.c | 7 ++-----
xen/arch/x86/mm.c | 12 +-----------
xen/arch/x86/smpboot.c | 17 +++--------------
xen/arch/x86/x86_64/entry.S | 2 --
5 files changed, 6 insertions(+), 44 deletions(-)
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 79feba6bcd..6df39dae0b 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1926,18 +1926,6 @@ In the case that x2apic is in use, this option switches between physical and
clustered mode. The default, given no hint from the **FADT**, is cluster
mode.
-### xpti
-> `= <boolean>`
-
-> Default: `false` on AMD hardware
-> Default: `true` everywhere else
-
-Override default selection of whether to isolate 64-bit PV guest page
-tables.
-
-** WARNING: Not yet a complete isolation implementation, but better than
-nothing. **
-
### xsave
> `= <boolean>`
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f93327b0a2..752e0fafee 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1518,13 +1518,10 @@ void paravirt_ctxt_switch_from(struct vcpu *v)
void paravirt_ctxt_switch_to(struct vcpu *v)
{
- root_pgentry_t *root_pgt = this_cpu(root_pgt);
unsigned long cr4;
- if ( root_pgt )
- root_pgt[root_table_offset(PERDOMAIN_VIRT_START)] =
- l4e_from_page(v->domain->arch.perdomain_l3_pg,
- __PAGE_HYPERVISOR_RW);
+ this_cpu(root_pgt)[root_table_offset(PERDOMAIN_VIRT_START)] =
+ l4e_from_page(v->domain->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
cr4 = pv_guest_cr4_to_real_cr4(v);
if ( unlikely(cr4 != read_cr4()) )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 35f204369b..fa0da7b0ff 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3706,17 +3706,7 @@ long do_mmu_update(
break;
rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
- /*
- * No need to sync if all uses of the page can be accounted
- * to the page lock we hold, its pinned status, and uses on
- * this (v)CPU.
- */
- if ( !rc && this_cpu(root_pgt) &&
- ((page->u.inuse.type_info & PGT_count_mask) >
- (1 + !!(page->u.inuse.type_info & PGT_pinned) +
- (pagetable_get_pfn(curr->arch.guest_table) == mfn) +
- (pagetable_get_pfn(curr->arch.guest_table_user) ==
- mfn))) )
+ if ( !rc )
sync_guest = true;
break;
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2ebef03027..49978b3697 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -331,7 +331,7 @@ void start_secondary(void *unused)
spin_debug_disable();
get_cpu_info()->xen_cr3 = 0;
- get_cpu_info()->pv_cr3 = this_cpu(root_pgt) ? __pa(this_cpu(root_pgt)) : 0;
+ get_cpu_info()->pv_cr3 = __pa(this_cpu(root_pgt));
load_system_tables();
@@ -740,20 +740,14 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
return 0;
}
-static __read_mostly int8_t opt_xpti = -1;
-boolean_param("xpti", opt_xpti);
DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
static int setup_cpu_root_pgt(unsigned int cpu)
{
- root_pgentry_t *rpt;
+ root_pgentry_t *rpt = alloc_xen_pagetable();
unsigned int off;
int rc;
- if ( !opt_xpti )
- return 0;
-
- rpt = alloc_xen_pagetable();
if ( !rpt )
return -ENOMEM;
@@ -1002,14 +996,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
stack_base[0] = stack_start;
- if ( opt_xpti < 0 )
- opt_xpti = boot_cpu_data.x86_vendor != X86_VENDOR_AMD;
-
rc = setup_cpu_root_pgt(0);
if ( rc )
panic("Error %d setting up PV root page table\n", rc);
- if ( per_cpu(root_pgt, 0) )
- get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
+ get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
set_nr_sockets();
@@ -1081,7 +1071,6 @@ void __init smp_prepare_boot_cpu(void)
#endif
get_cpu_info()->xen_cr3 = 0;
- get_cpu_info()->pv_cr3 = 0;
}
static void
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 58f652d010..52f64cceda 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -50,7 +50,6 @@ restore_all_guest:
movabs $DIRECTMAP_VIRT_START, %rcx
mov %rdi, %rax
and %rsi, %rdi
- jz .Lrag_keep_cr3
and %r9, %rsi
add %rcx, %rdi
add %rcx, %rsi
@@ -67,7 +66,6 @@ restore_all_guest:
rep movsq
mov %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
write_cr3 rax, rdi, rsi
-.Lrag_keep_cr3:
/* Restore stashed SPEC_CTRL value. */
mov %r15d, %eax
--
2.13.6
* [PATCH v3 03/17] x86: revert 5784de3e2067ed73efc2fe42e62831e8ae7f46c4
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
Revert patch "x86: Meltdown band-aid against malicious 64-bit PV
guests" in order to prepare for a final Meltdown mitigation.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/domain.c | 5 -
xen/arch/x86/mm.c | 21 ----
xen/arch/x86/smpboot.c | 200 -------------------------------------
xen/arch/x86/x86_64/asm-offsets.c | 2 -
xen/arch/x86/x86_64/compat/entry.S | 12 +--
xen/arch/x86/x86_64/entry.S | 142 +-------------------------
xen/include/asm-x86/asm_defns.h | 9 --
xen/include/asm-x86/current.h | 12 ---
xen/include/asm-x86/processor.h | 1 -
xen/include/asm-x86/x86_64/page.h | 5 +-
10 files changed, 8 insertions(+), 401 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 752e0fafee..6dd47bb2bb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1520,9 +1520,6 @@ void paravirt_ctxt_switch_to(struct vcpu *v)
{
unsigned long cr4;
- this_cpu(root_pgt)[root_table_offset(PERDOMAIN_VIRT_START)] =
- l4e_from_page(v->domain->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
-
cr4 = pv_guest_cr4_to_real_cr4(v);
if ( unlikely(cr4 != read_cr4()) )
write_cr4(cr4);
@@ -1694,8 +1691,6 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
ASSERT(local_irq_is_enabled());
- get_cpu_info()->xen_cr3 = 0;
-
if ( unlikely(dirty_cpu != cpu) && dirty_cpu != VCPU_CPU_CLEAN )
{
/* Remote CPU calls __sync_local_execstate() from flush IPI handler. */
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fa0da7b0ff..e795239829 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3542,7 +3542,6 @@ long do_mmu_update(
struct vcpu *curr = current, *v = curr;
struct domain *d = v->domain, *pt_owner = d, *pg_owner;
mfn_t map_mfn = INVALID_MFN;
- bool sync_guest = false;
uint32_t xsm_needed = 0;
uint32_t xsm_checked = 0;
int rc = put_old_guest_table(curr);
@@ -3706,8 +3705,6 @@ long do_mmu_update(
break;
rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
- if ( !rc )
- sync_guest = true;
break;
case PGT_writable_page:
@@ -3812,24 +3809,6 @@ long do_mmu_update(
if ( va )
unmap_domain_page(va);
- if ( sync_guest )
- {
- /*
- * Force other vCPU-s of the affected guest to pick up L4 entry
- * changes (if any). Issue a flush IPI with empty operation mask to
- * facilitate this (including ourselves waiting for the IPI to
- * actually have arrived). Utilize the fact that FLUSH_VA_VALID is
- * meaningless without FLUSH_CACHE, but will allow to pass the no-op
- * check in flush_area_mask().
- */
- unsigned int cpu = smp_processor_id();
- cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
-
- cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
- if ( !cpumask_empty(mask) )
- flush_area_mask(mask, ZERO_BLOCK_PTR, FLUSH_VA_VALID);
- }
-
perfc_add(num_page_updates, i);
out:
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 49978b3697..9d346e54f6 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -330,9 +330,6 @@ void start_secondary(void *unused)
*/
spin_debug_disable();
- get_cpu_info()->xen_cr3 = 0;
- get_cpu_info()->pv_cr3 = __pa(this_cpu(root_pgt));
-
load_system_tables();
/* Full exception support from here on in. */
@@ -642,187 +639,6 @@ void cpu_exit_clear(unsigned int cpu)
set_cpu_state(CPU_STATE_DEAD);
}
-static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
-{
- unsigned long linear = (unsigned long)ptr, pfn;
- unsigned int flags;
- l3_pgentry_t *pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
- l3_table_offset(linear);
- l2_pgentry_t *pl2e;
- l1_pgentry_t *pl1e;
-
- if ( linear < DIRECTMAP_VIRT_START )
- return 0;
-
- flags = l3e_get_flags(*pl3e);
- ASSERT(flags & _PAGE_PRESENT);
- if ( flags & _PAGE_PSE )
- {
- pfn = (l3e_get_pfn(*pl3e) & ~((1UL << (2 * PAGETABLE_ORDER)) - 1)) |
- (PFN_DOWN(linear) & ((1UL << (2 * PAGETABLE_ORDER)) - 1));
- flags &= ~_PAGE_PSE;
- }
- else
- {
- pl2e = l3e_to_l2e(*pl3e) + l2_table_offset(linear);
- flags = l2e_get_flags(*pl2e);
- ASSERT(flags & _PAGE_PRESENT);
- if ( flags & _PAGE_PSE )
- {
- pfn = (l2e_get_pfn(*pl2e) & ~((1UL << PAGETABLE_ORDER) - 1)) |
- (PFN_DOWN(linear) & ((1UL << PAGETABLE_ORDER) - 1));
- flags &= ~_PAGE_PSE;
- }
- else
- {
- pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(linear);
- flags = l1e_get_flags(*pl1e);
- if ( !(flags & _PAGE_PRESENT) )
- return 0;
- pfn = l1e_get_pfn(*pl1e);
- }
- }
-
- if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
- {
- pl3e = alloc_xen_pagetable();
- if ( !pl3e )
- return -ENOMEM;
- clear_page(pl3e);
- l4e_write(&rpt[root_table_offset(linear)],
- l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
- }
- else
- pl3e = l4e_to_l3e(rpt[root_table_offset(linear)]);
-
- pl3e += l3_table_offset(linear);
-
- if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
- {
- pl2e = alloc_xen_pagetable();
- if ( !pl2e )
- return -ENOMEM;
- clear_page(pl2e);
- l3e_write(pl3e, l3e_from_paddr(__pa(pl2e), __PAGE_HYPERVISOR));
- }
- else
- {
- ASSERT(!(l3e_get_flags(*pl3e) & _PAGE_PSE));
- pl2e = l3e_to_l2e(*pl3e);
- }
-
- pl2e += l2_table_offset(linear);
-
- if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
- {
- pl1e = alloc_xen_pagetable();
- if ( !pl1e )
- return -ENOMEM;
- clear_page(pl1e);
- l2e_write(pl2e, l2e_from_paddr(__pa(pl1e), __PAGE_HYPERVISOR));
- }
- else
- {
- ASSERT(!(l2e_get_flags(*pl2e) & _PAGE_PSE));
- pl1e = l2e_to_l1e(*pl2e);
- }
-
- pl1e += l1_table_offset(linear);
-
- if ( l1e_get_flags(*pl1e) & _PAGE_PRESENT )
- {
- ASSERT(l1e_get_pfn(*pl1e) == pfn);
- ASSERT(l1e_get_flags(*pl1e) == flags);
- }
- else
- l1e_write(pl1e, l1e_from_pfn(pfn, flags));
-
- return 0;
-}
-
-DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
-
-static int setup_cpu_root_pgt(unsigned int cpu)
-{
- root_pgentry_t *rpt = alloc_xen_pagetable();
- unsigned int off;
- int rc;
-
- if ( !rpt )
- return -ENOMEM;
-
- clear_page(rpt);
- per_cpu(root_pgt, cpu) = rpt;
-
- rpt[root_table_offset(RO_MPT_VIRT_START)] =
- idle_pg_table[root_table_offset(RO_MPT_VIRT_START)];
- /* SH_LINEAR_PT inserted together with guest mappings. */
- /* PERDOMAIN inserted during context switch. */
- rpt[root_table_offset(XEN_VIRT_START)] =
- idle_pg_table[root_table_offset(XEN_VIRT_START)];
-
- /* Install direct map page table entries for stack, IDT, and TSS. */
- for ( off = rc = 0; !rc && off < STACK_SIZE; off += PAGE_SIZE )
- rc = clone_mapping(__va(__pa(stack_base[cpu])) + off, rpt);
-
- if ( !rc )
- rc = clone_mapping(idt_tables[cpu], rpt);
- if ( !rc )
- rc = clone_mapping(&per_cpu(init_tss, cpu), rpt);
-
- return rc;
-}
-
-static void cleanup_cpu_root_pgt(unsigned int cpu)
-{
- root_pgentry_t *rpt = per_cpu(root_pgt, cpu);
- unsigned int r;
-
- if ( !rpt )
- return;
-
- per_cpu(root_pgt, cpu) = NULL;
-
- for ( r = root_table_offset(DIRECTMAP_VIRT_START);
- r < root_table_offset(HYPERVISOR_VIRT_END); ++r )
- {
- l3_pgentry_t *l3t;
- unsigned int i3;
-
- if ( !(root_get_flags(rpt[r]) & _PAGE_PRESENT) )
- continue;
-
- l3t = l4e_to_l3e(rpt[r]);
-
- for ( i3 = 0; i3 < L3_PAGETABLE_ENTRIES; ++i3 )
- {
- l2_pgentry_t *l2t;
- unsigned int i2;
-
- if ( !(l3e_get_flags(l3t[i3]) & _PAGE_PRESENT) )
- continue;
-
- ASSERT(!(l3e_get_flags(l3t[i3]) & _PAGE_PSE));
- l2t = l3e_to_l2e(l3t[i3]);
-
- for ( i2 = 0; i2 < L2_PAGETABLE_ENTRIES; ++i2 )
- {
- if ( !(l2e_get_flags(l2t[i2]) & _PAGE_PRESENT) )
- continue;
-
- ASSERT(!(l2e_get_flags(l2t[i2]) & _PAGE_PSE));
- free_xen_pagetable(l2e_to_l1e(l2t[i2]));
- }
-
- free_xen_pagetable(l2t);
- }
-
- free_xen_pagetable(l3t);
- }
-
- free_xen_pagetable(rpt);
-}
-
static void cpu_smpboot_free(unsigned int cpu)
{
unsigned int order, socket = cpu_to_socket(cpu);
@@ -861,8 +677,6 @@ static void cpu_smpboot_free(unsigned int cpu)
free_domheap_page(mfn_to_page(mfn));
}
- cleanup_cpu_root_pgt(cpu);
-
order = get_order_from_pages(NR_RESERVED_GDT_PAGES);
free_xenheap_pages(per_cpu(gdt_table, cpu), order);
@@ -917,11 +731,6 @@ static int cpu_smpboot_alloc(unsigned int cpu)
memcpy(idt_tables[cpu], idt_table, IDT_ENTRIES * sizeof(idt_entry_t));
disable_each_ist(idt_tables[cpu]);
- rc = setup_cpu_root_pgt(cpu);
- if ( rc )
- goto out;
- rc = -ENOMEM;
-
for ( stub_page = 0, i = cpu & ~(STUBS_PER_PAGE - 1);
i < nr_cpu_ids && i <= (cpu | (STUBS_PER_PAGE - 1)); ++i )
if ( cpu_online(i) && cpu_to_node(i) == node )
@@ -981,8 +790,6 @@ static struct notifier_block cpu_smpboot_nfb = {
void __init smp_prepare_cpus(unsigned int max_cpus)
{
- int rc;
-
register_cpu_notifier(&cpu_smpboot_nfb);
mtrr_aps_sync_begin();
@@ -996,11 +803,6 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
stack_base[0] = stack_start;
- rc = setup_cpu_root_pgt(0);
- if ( rc )
- panic("Error %d setting up PV root page table\n", rc);
- get_cpu_info()->pv_cr3 = __pa(per_cpu(root_pgt, 0));
-
set_nr_sockets();
socket_cpumask = xzalloc_array(cpumask_t *, nr_sockets);
@@ -1069,8 +871,6 @@ void __init smp_prepare_boot_cpu(void)
#if NR_CPUS > 2 * BITS_PER_LONG
per_cpu(scratch_cpumask, cpu) = &scratch_cpu0mask;
#endif
-
- get_cpu_info()->xen_cr3 = 0;
}
static void
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index 51be528f89..cc7753c0a9 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -138,8 +138,6 @@ void __dummy__(void)
OFFSET(CPUINFO_processor_id, struct cpu_info, processor_id);
OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
OFFSET(CPUINFO_cr4, struct cpu_info, cr4);
- OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
- OFFSET(CPUINFO_pv_cr3, struct cpu_info, pv_cr3);
OFFSET(CPUINFO_shadow_spec_ctrl, struct cpu_info, shadow_spec_ctrl);
OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 707c74621b..8fac5d304d 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -214,17 +214,7 @@ ENTRY(cstar_enter)
SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- GET_STACK_END(bx)
- mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
- neg %rcx
- jz .Lcstar_cr3_okay
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
- neg %rcx
- write_cr3 rcx, rdi, rsi
- movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-.Lcstar_cr3_okay:
-
- movq STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
+ GET_CURRENT(bx)
movq VCPU_domain(%rbx),%rcx
cmpb $0,DOMAIN_is_32bit_pv(%rcx)
je switch_to_kernel
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 52f64cceda..a078ad8979 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -40,35 +40,7 @@ restore_all_guest:
/* Stash guest SPEC_CTRL value while we can read struct vcpu. */
mov VCPU_arch_msr(%rbx), %rdx
- mov VCPUMSR_spec_ctrl_raw(%rdx), %r15d
-
- /* Copy guest mappings and switch to per-CPU root page table. */
- mov %cr3, %r9
- GET_STACK_END(dx)
- mov STACK_CPUINFO_FIELD(pv_cr3)(%rdx), %rdi
- movabs $PADDR_MASK & PAGE_MASK, %rsi
- movabs $DIRECTMAP_VIRT_START, %rcx
- mov %rdi, %rax
- and %rsi, %rdi
- and %r9, %rsi
- add %rcx, %rdi
- add %rcx, %rsi
- mov $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
- mov root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
- mov %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
- rep movsq
- mov $ROOT_PAGETABLE_ENTRIES - \
- ROOT_PAGETABLE_LAST_XEN_SLOT - 1, %ecx
- sub $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
- ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rsi
- sub $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
- ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
- rep movsq
- mov %r9, STACK_CPUINFO_FIELD(xen_cr3)(%rdx)
- write_cr3 rax, rdi, rsi
-
- /* Restore stashed SPEC_CTRL value. */
- mov %r15d, %eax
+ mov VCPUMSR_spec_ctrl_raw(%rdx), %eax
/* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
SPEC_CTRL_EXIT_TO_GUEST /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
@@ -107,22 +79,7 @@ iret_exit_to_guest:
ALIGN
/* No special register assumptions. */
restore_all_xen:
- /*
- * Check whether we need to switch to the per-CPU page tables, in
- * case we return to late PV exit code (from an NMI or #MC).
- */
GET_STACK_END(bx)
- mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rdx
- mov STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax
- test %rdx, %rdx
- /*
- * Ideally the condition would be "nsz", but such doesn't exist,
- * so "g" will have to do.
- */
-UNLIKELY_START(g, exit_cr3)
- write_cr3 rax, rdi, rsi
-UNLIKELY_END(exit_cr3)
-
/* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
SPEC_CTRL_EXIT_TO_XEN_IST /* Req: %rbx=end, Clob: acd */
@@ -159,17 +116,7 @@ ENTRY(lstar_enter)
SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- GET_STACK_END(bx)
- mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
- neg %rcx
- jz .Llstar_cr3_okay
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
- neg %rcx
- write_cr3 rcx, rdi, rsi
- movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-.Llstar_cr3_okay:
-
- movq STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
+ GET_CURRENT(bx)
testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
jz switch_to_kernel
@@ -265,17 +212,7 @@ GLOBAL(sysenter_eflags_saved)
SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- GET_STACK_END(bx)
- mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
- neg %rcx
- jz .Lsyse_cr3_okay
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
- neg %rcx
- write_cr3 rcx, rdi, rsi
- movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-.Lsyse_cr3_okay:
-
- movq STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
+ GET_CURRENT(bx)
cmpb $0,VCPU_sysenter_disables_events(%rbx)
movq VCPU_sysenter_addr(%rbx),%rax
setne %cl
@@ -314,23 +251,13 @@ ENTRY(int80_direct_trap)
SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- GET_STACK_END(bx)
- mov STACK_CPUINFO_FIELD(xen_cr3)(%rbx), %rcx
- neg %rcx
- jz .Lint80_cr3_okay
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
- neg %rcx
- write_cr3 rcx, rdi, rsi
- movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%rbx)
-.Lint80_cr3_okay:
-
cmpb $0,untrusted_msi(%rip)
UNLIKELY_START(ne, msi_check)
movl $0x80,%edi
call check_for_unexpected_msi
UNLIKELY_END(msi_check)
- movq STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
+ GET_CURRENT(bx)
/* Check that the callback is non-null. */
leaq VCPU_int80_bounce(%rbx),%rdx
@@ -493,25 +420,9 @@ ENTRY(common_interrupt)
SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
- mov %rcx, %r15
- neg %rcx
- jz .Lintr_cr3_okay
- jns .Lintr_cr3_load
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
- neg %rcx
-.Lintr_cr3_load:
- write_cr3 rcx, rdi, rsi
- xor %ecx, %ecx
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
- testb $3, UREGS_cs(%rsp)
- cmovnz %rcx, %r15
-.Lintr_cr3_okay:
-
CR4_PV32_RESTORE
movq %rsp,%rdi
callq do_IRQ
- mov %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
jmp ret_from_intr
/* No special register assumptions. */
@@ -535,21 +446,6 @@ GLOBAL(handle_exception)
SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
- mov %rcx, %r15
- neg %rcx
- jz .Lxcpt_cr3_okay
- jns .Lxcpt_cr3_load
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
- neg %rcx
-.Lxcpt_cr3_load:
- write_cr3 rcx, rdi, rsi
- xor %ecx, %ecx
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
- testb $3, UREGS_cs(%rsp)
- cmovnz %rcx, %r15
-.Lxcpt_cr3_okay:
-
handle_exception_saved:
GET_CURRENT(bx)
testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
@@ -615,7 +511,6 @@ handle_exception_saved:
PERFC_INCR(exceptions, %rax, %rbx)
mov (%rdx, %rax, 8), %rdx
INDIRECT_CALL %rdx
- mov %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
testb $3,UREGS_cs(%rsp)
jz restore_all_xen
leaq VCPU_trap_bounce(%rbx),%rdx
@@ -648,7 +543,6 @@ exception_with_ints_disabled:
rep; movsq # make room for ec/ev
1: movq UREGS_error_code(%rsp),%rax # ec/ev
movq %rax,UREGS_kernel_sizeof(%rsp)
- mov %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
jmp restore_all_xen # return to fixup code
/* No special register assumptions. */
@@ -733,15 +627,6 @@ ENTRY(double_fault)
SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
- test %rbx, %rbx
- jz .Ldblf_cr3_okay
- jns .Ldblf_cr3_load
- neg %rbx
-.Ldblf_cr3_load:
- write_cr3 rbx, rdi, rsi
-.Ldblf_cr3_okay:
-
movq %rsp,%rdi
call do_double_fault
BUG /* do_double_fault() shouldn't return. */
@@ -766,26 +651,10 @@ handle_ist_exception:
SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
- mov %rcx, %r15
- neg %rcx
- jz .List_cr3_okay
- jns .List_cr3_load
- mov %rcx, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
- neg %rcx
-.List_cr3_load:
- write_cr3 rcx, rdi, rsi
- movq $0, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
-.List_cr3_okay:
-
CR4_PV32_RESTORE
testb $3,UREGS_cs(%rsp)
jz 1f
- /*
- * Interrupted guest context. Clear the restore value for xen_cr3
- * and copy the context to stack bottom.
- */
- xor %r15, %r15
+ /* Interrupted guest context. Copy the context to stack bottom. */
GET_CPUINFO_FIELD(guest_cpu_user_regs,di)
movq %rsp,%rsi
movl $UREGS_kernel_sizeof/8,%ecx
@@ -796,7 +665,6 @@ handle_ist_exception:
leaq exception_table(%rip),%rdx
mov (%rdx, %rax, 8), %rdx
INDIRECT_CALL %rdx
- mov %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
cmpb $TRAP_nmi,UREGS_entry_vector(%rsp)
jne ret_from_intr
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index aee14ba007..2a79e8a7f4 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -205,15 +205,6 @@ void ret_from_intr(void);
#define ASM_STAC ASM_AC(STAC)
#define ASM_CLAC ASM_AC(CLAC)
-.macro write_cr3 val:req, tmp1:req, tmp2:req
- mov %cr4, %\tmp1
- mov %\tmp1, %\tmp2
- and $~X86_CR4_PGE, %\tmp1
- mov %\tmp1, %cr4
- mov %\val, %cr3
- mov %\tmp2, %cr4
-.endm
-
#define CR4_PV32_RESTORE \
667: ASM_NOP5; \
.pushsection .altinstr_replacement, "ax"; \
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 4678a0fcf5..1087239357 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -41,18 +41,6 @@ struct cpu_info {
struct vcpu *current_vcpu;
unsigned long per_cpu_offset;
unsigned long cr4;
- /*
- * Of the two following fields the latter is being set to the CR3 value
- * to be used on the given pCPU for loading whenever 64-bit PV guest
- * context is being entered. The value never changes once set.
- * The former is the value to restore when re-entering Xen, if any. IOW
- * its value being zero means there's nothing to restore. However, its
- * value can also be negative, indicating to the exit-to-Xen code that
- * restoring is not necessary, but allowing any nested entry code paths
- * to still know the value to put back into CR3.
- */
- unsigned long xen_cr3;
- unsigned long pv_cr3;
/* See asm-x86/spec_ctrl_asm.h for usage. */
unsigned int shadow_spec_ctrl;
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 9c70a98aef..625f6e9f69 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -437,7 +437,6 @@ extern idt_entry_t idt_table[];
extern idt_entry_t *idt_tables[];
DECLARE_PER_CPU(struct tss_struct, init_tss);
-DECLARE_PER_CPU(root_pgentry_t *, root_pgt);
extern void init_int80_direct_trap(struct vcpu *v);
diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 05a0334893..6fb7cd5553 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -24,8 +24,8 @@
/* These are architectural limits. Current CPUs support only 40-bit phys. */
#define PADDR_BITS 52
#define VADDR_BITS 48
-#define PADDR_MASK ((_AC(1,UL) << PADDR_BITS) - 1)
-#define VADDR_MASK ((_AC(1,UL) << VADDR_BITS) - 1)
+#define PADDR_MASK ((1UL << PADDR_BITS)-1)
+#define VADDR_MASK ((1UL << VADDR_BITS)-1)
#define VADDR_TOP_BIT (1UL << (VADDR_BITS - 1))
#define CANONICAL_MASK (~0UL & ~VADDR_MASK)
@@ -107,7 +107,6 @@ typedef l4_pgentry_t root_pgentry_t;
: (((_s) < ROOT_PAGETABLE_FIRST_XEN_SLOT) || \
((_s) > ROOT_PAGETABLE_LAST_XEN_SLOT)))
-#define root_table_offset l4_table_offset
#define root_get_pfn l4e_get_pfn
#define root_get_flags l4e_get_flags
#define root_get_intpte l4e_get_intpte
--
2.13.6
* [PATCH v3 04/17] x86: don't access saved user regs via rsp in trap handlers
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
In order to support switching stacks when entering the hypervisor for
page table isolation, don't access the saved user registers via %rsp,
but via %r12.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- use %r12 instead of %rdi (Jan Beulich)
- remove some compat changes (Jan Beulich)
---
xen/arch/x86/x86_64/compat/entry.S | 10 ++-
xen/arch/x86/x86_64/entry.S | 152 ++++++++++++++++++++----------------
xen/include/asm-x86/current.h | 8 +-
xen/include/asm-x86/nops.h | 2 +-
xen/include/asm-x86/spec_ctrl_asm.h | 13 +--
5 files changed, 102 insertions(+), 83 deletions(-)
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 8fac5d304d..eced1475b7 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -18,15 +18,16 @@ ENTRY(entry_int82)
pushq $0
movl $HYPERCALL_VECTOR, 4(%rsp)
SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
+ mov %rsp, %r12
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
CR4_PV32_RESTORE
GET_CURRENT(bx)
- mov %rsp, %rdi
+ mov %r12, %rdi
call do_entry_int82
/* %rbx: struct vcpu */
@@ -201,7 +202,6 @@ ENTRY(compat_post_handle_exception)
/* See lstar_enter for entry register state. */
ENTRY(cstar_enter)
sti
- CR4_PV32_RESTORE
movq 8(%rsp),%rax /* Restore %rax. */
movq $FLAT_KERNEL_SS,8(%rsp)
pushq %r11
@@ -210,10 +210,12 @@ ENTRY(cstar_enter)
pushq $0
movl $TRAP_syscall, 4(%rsp)
SAVE_ALL
+ movq %rsp, %r12
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+ CR4_PV32_RESTORE
GET_CURRENT(bx)
movq VCPU_domain(%rbx),%rcx
cmpb $0,DOMAIN_is_32bit_pv(%rcx)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index a078ad8979..f067a74b0f 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -14,13 +14,13 @@
#include <public/xen.h>
#include <irq_vectors.h>
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %r12: user_regs */
ENTRY(switch_to_kernel)
leaq VCPU_trap_bounce(%rbx),%rdx
/* TB_eip = (32-bit syscall && syscall32_addr) ?
* syscall32_addr : syscall_addr */
xor %eax,%eax
- cmpw $FLAT_USER_CS32,UREGS_cs(%rsp)
+ cmpw $FLAT_USER_CS32,UREGS_cs(%r12)
cmoveq VCPU_syscall32_addr(%rbx),%rax
testq %rax,%rax
cmovzq VCPU_syscall_addr(%rbx),%rax
@@ -31,7 +31,7 @@ ENTRY(switch_to_kernel)
leal (,%rcx,TBF_INTERRUPT),%ecx
movb %cl,TRAPBOUNCE_flags(%rdx)
call create_bounce_frame
- andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp)
+ andl $~X86_EFLAGS_DF,UREGS_eflags(%r12)
jmp test_all_events
/* %rbx: struct vcpu, interrupts disabled */
@@ -43,7 +43,7 @@ restore_all_guest:
mov VCPUMSR_spec_ctrl_raw(%rdx), %eax
/* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
- SPEC_CTRL_EXIT_TO_GUEST /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+ SPEC_CTRL_EXIT_TO_GUEST /* Req: a=spec_ctrl %rsp=cpuinfo, Clob: cd */
RESTORE_ALL
testw $TRAP_syscall,4(%rsp)
@@ -77,6 +77,9 @@ iret_exit_to_guest:
_ASM_PRE_EXTABLE(.Lft0, handle_exception)
ALIGN
+/* %r12: context to return to. */
+restore_all_xen_r12:
+ mov %r12, %rsp
/* No special register assumptions. */
restore_all_xen:
GET_STACK_END(bx)
@@ -112,18 +115,19 @@ ENTRY(lstar_enter)
pushq $0
movl $TRAP_syscall, 4(%rsp)
SAVE_ALL
+ mov %rsp, %r12
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
GET_CURRENT(bx)
testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
jz switch_to_kernel
- mov %rsp, %rdi
+ mov %r12, %rdi
call pv_hypercall
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %r12: user_regs */
test_all_events:
ASSERT_NOT_IN_ATOMIC
cli # tests must not race interrupts
@@ -154,14 +158,14 @@ test_guest_events:
jmp test_all_events
ALIGN
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %r12: user_regs */
process_softirqs:
sti
call do_softirq
jmp test_all_events
ALIGN
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %r12: user_regs */
process_mce:
testb $1 << VCPU_TRAP_MCE,VCPU_async_exception_mask(%rbx)
jnz .Ltest_guest_nmi
@@ -177,7 +181,7 @@ process_mce:
jmp process_trap
ALIGN
-/* %rbx: struct vcpu */
+/* %rbx: struct vcpu, %r12: user_regs */
process_nmi:
testb $1 << VCPU_TRAP_NMI,VCPU_async_exception_mask(%rbx)
jnz test_guest_events
@@ -208,15 +212,16 @@ GLOBAL(sysenter_eflags_saved)
pushq $0
movl $TRAP_syscall, 4(%rsp)
SAVE_ALL
+ mov %rsp, %r12
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
GET_CURRENT(bx)
cmpb $0,VCPU_sysenter_disables_events(%rbx)
movq VCPU_sysenter_addr(%rbx),%rax
setne %cl
- testl $X86_EFLAGS_NT,UREGS_eflags(%rsp)
+ testl $X86_EFLAGS_NT,UREGS_eflags(%r12)
leaq VCPU_trap_bounce(%rbx),%rdx
UNLIKELY_START(nz, sysenter_nt_set)
pushfq
@@ -228,7 +233,7 @@ UNLIKELY_END(sysenter_nt_set)
leal (,%rcx,TBF_INTERRUPT),%ecx
UNLIKELY_START(z, sysenter_gpf)
movq VCPU_trap_ctxt(%rbx),%rsi
- movl $TRAP_gp_fault,UREGS_entry_vector(%rsp)
+ movl $TRAP_gp_fault,UREGS_entry_vector(%r12)
movl %eax,TRAPBOUNCE_error_code(%rdx)
movq TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_eip(%rsi),%rax
testb $4,TRAP_gp_fault * TRAPINFO_sizeof + TRAPINFO_flags(%rsi)
@@ -247,8 +252,9 @@ ENTRY(int80_direct_trap)
pushq $0
movl $0x80, 4(%rsp)
SAVE_ALL
+ mov %rsp, %r12
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
cmpb $0,untrusted_msi(%rip)
@@ -276,16 +282,16 @@ int80_slow_path:
* Setup entry vector and error code as if this was a GPF caused by an
* IDT entry with DPL==0.
*/
- movl $((0x80 << 3) | X86_XEC_IDT),UREGS_error_code(%rsp)
- movl $TRAP_gp_fault,UREGS_entry_vector(%rsp)
+ movl $((0x80 << 3) | X86_XEC_IDT),UREGS_error_code(%r12)
+ movl $TRAP_gp_fault,UREGS_entry_vector(%r12)
/* A GPF wouldn't have incremented the instruction pointer. */
- subq $2,UREGS_rip(%rsp)
+ subq $2,UREGS_rip(%r12)
jmp handle_exception_saved
/* CREATE A BASIC EXCEPTION FRAME ON GUEST OS STACK: */
/* { RCX, R11, [ERRCODE,] RIP, CS, RFLAGS, RSP, SS } */
-/* %rdx: trap_bounce, %rbx: struct vcpu */
-/* On return only %rbx and %rdx are guaranteed non-clobbered. */
+/* %rdx: trap_bounce, %rbx: struct vcpu, %r12: user_regs */
+/* On return only %r12, %rbx and %rdx are guaranteed non-clobbered. */
create_bounce_frame:
ASSERT_INTERRUPTS_ENABLED
testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
@@ -298,8 +304,8 @@ create_bounce_frame:
movq VCPU_kernel_sp(%rbx),%rsi
jmp 2f
1: /* In kernel context already: push new frame at existing %rsp. */
- movq UREGS_rsp+8(%rsp),%rsi
- andb $0xfc,UREGS_cs+8(%rsp) # Indicate kernel context to guest.
+ movq UREGS_rsp(%r12),%rsi
+ andb $0xfc,UREGS_cs(%r12) # Indicate kernel context to guest.
2: andq $~0xf,%rsi # Stack frames are 16-byte aligned.
movq $HYPERVISOR_VIRT_START+1,%rax
cmpq %rax,%rsi
@@ -317,11 +323,11 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
_ASM_EXTABLE(0b, domain_crash_page_fault_ ## n ## x8)
subq $7*8,%rsi
- movq UREGS_ss+8(%rsp),%rax
+ movq UREGS_ss(%r12),%rax
ASM_STAC
movq VCPU_domain(%rbx),%rdi
STORE_GUEST_STACK(rax,6) # SS
- movq UREGS_rsp+8(%rsp),%rax
+ movq UREGS_rsp(%r12),%rax
STORE_GUEST_STACK(rax,5) # RSP
movq VCPU_vcpu_info(%rbx),%rax
pushq VCPUINFO_upcall_mask(%rax)
@@ -330,12 +336,12 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
orb %ch,VCPUINFO_upcall_mask(%rax)
popq %rax
shlq $32,%rax # Bits 32-39: saved_upcall_mask
- movw UREGS_cs+8(%rsp),%ax # Bits 0-15: CS
+ movw UREGS_cs(%r12),%ax # Bits 0-15: CS
STORE_GUEST_STACK(rax,3) # CS / saved_upcall_mask
shrq $32,%rax
testb $0xFF,%al # Bits 0-7: saved_upcall_mask
setz %ch # %ch == !saved_upcall_mask
- movl UREGS_eflags+8(%rsp),%eax
+ movl UREGS_eflags(%r12),%eax
andl $~(X86_EFLAGS_IF|X86_EFLAGS_IOPL),%eax
addb %ch,%ch # Bit 9 (EFLAGS.IF)
orb %ch,%ah # Fold EFLAGS.IF into %eax
@@ -344,7 +350,7 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
cmovnzl VCPU_iopl(%rbx),%ecx # Bits 13:12 (EFLAGS.IOPL)
orl %ecx,%eax # Fold EFLAGS.IOPL into %eax
STORE_GUEST_STACK(rax,4) # RFLAGS
- movq UREGS_rip+8(%rsp),%rax
+ movq UREGS_rip(%r12),%rax
STORE_GUEST_STACK(rax,2) # RIP
testb $TBF_EXCEPTION_ERRCODE,TRAPBOUNCE_flags(%rdx)
jz 1f
@@ -352,9 +358,9 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
movl TRAPBOUNCE_error_code(%rdx),%eax
STORE_GUEST_STACK(rax,2) # ERROR CODE
1:
- movq UREGS_r11+8(%rsp),%rax
+ movq UREGS_r11(%r12),%rax
STORE_GUEST_STACK(rax,1) # R11
- movq UREGS_rcx+8(%rsp),%rax
+ movq UREGS_rcx(%r12),%rax
STORE_GUEST_STACK(rax,0) # RCX
ASM_CLAC
@@ -363,19 +369,19 @@ __UNLIKELY_END(create_bounce_frame_bad_sp)
/* Rewrite our stack frame and return to guest-OS mode. */
/* IA32 Ref. Vol. 3: TF, VM, RF and NT flags are cleared on trap. */
/* Also clear AC: alignment checks shouldn't trigger in kernel mode. */
- orl $TRAP_syscall,UREGS_entry_vector+8(%rsp)
+ orl $TRAP_syscall,UREGS_entry_vector(%r12)
andl $~(X86_EFLAGS_AC|X86_EFLAGS_VM|X86_EFLAGS_RF|\
- X86_EFLAGS_NT|X86_EFLAGS_TF),UREGS_eflags+8(%rsp)
- movq $FLAT_KERNEL_SS,UREGS_ss+8(%rsp)
- movq %rsi,UREGS_rsp+8(%rsp)
- movq $FLAT_KERNEL_CS,UREGS_cs+8(%rsp)
+ X86_EFLAGS_NT|X86_EFLAGS_TF),UREGS_eflags(%r12)
+ movq $FLAT_KERNEL_SS,UREGS_ss(%r12)
+ movq %rsi,UREGS_rsp(%r12)
+ movq $FLAT_KERNEL_CS,UREGS_cs(%r12)
movq TRAPBOUNCE_eip(%rdx),%rax
testq %rax,%rax
UNLIKELY_START(z, create_bounce_frame_bad_bounce_ip)
lea UNLIKELY_DISPATCH_LABEL(create_bounce_frame_bad_bounce_ip)(%rip), %rdi
jmp asm_domain_crash_synchronous /* Does not return */
__UNLIKELY_END(create_bounce_frame_bad_bounce_ip)
- movq %rax,UREGS_rip+8(%rsp)
+ movq %rax,UREGS_rip(%r12)
ret
.pushsection .fixup, "ax", @progbits
@@ -414,22 +420,23 @@ ENTRY(dom_crash_sync_extable)
ENTRY(common_interrupt)
SAVE_ALL CLAC
+ mov %rsp, %r12
GET_STACK_END(14)
- SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_INTR /* Req: %r12=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
CR4_PV32_RESTORE
- movq %rsp,%rdi
+ mov %r12, %rdi
callq do_IRQ
jmp ret_from_intr
/* No special register assumptions. */
ENTRY(ret_from_intr)
GET_CURRENT(bx)
- testb $3,UREGS_cs(%rsp)
- jz restore_all_xen
+ testb $3,UREGS_cs(%r12)
+ jz restore_all_xen_r12
movq VCPU_domain(%rbx),%rax
testb $1,DOMAIN_is_32bit_pv(%rax)
jz test_all_events
@@ -440,15 +447,16 @@ ENTRY(page_fault)
/* No special register assumptions. */
GLOBAL(handle_exception)
SAVE_ALL CLAC
+ mov %rsp, %r12
GET_STACK_END(14)
- SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_INTR /* Req: %r12=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
handle_exception_saved:
GET_CURRENT(bx)
- testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%rsp)
+ testb $X86_EFLAGS_IF>>8,UREGS_eflags+1(%r12)
jz exception_with_ints_disabled
.Lcr4_pv32_orig:
@@ -469,7 +477,7 @@ handle_exception_saved:
(.Lcr4_pv32_alt_end - .Lcr4_pv32_alt)
.popsection
- testb $3,UREGS_cs(%rsp)
+ testb $3,UREGS_cs(%r12)
jz .Lcr4_pv32_done
cmpb $0,DOMAIN_is_32bit_pv(%rax)
je .Lcr4_pv32_done
@@ -498,21 +506,21 @@ handle_exception_saved:
* goto compat_test_all_events;
*/
mov $PFEC_page_present,%al
- cmpb $TRAP_page_fault,UREGS_entry_vector(%rsp)
+ cmpb $TRAP_page_fault,UREGS_entry_vector(%r12)
jne .Lcr4_pv32_done
- xor UREGS_error_code(%rsp),%eax
+ xor UREGS_error_code(%r12),%eax
test $~(PFEC_write_access|PFEC_insn_fetch),%eax
jz compat_test_all_events
.Lcr4_pv32_done:
sti
-1: movq %rsp,%rdi
- movzbl UREGS_entry_vector(%rsp),%eax
+1: mov %r12,%rdi
+ movzbl UREGS_entry_vector(%r12),%eax
leaq exception_table(%rip),%rdx
PERFC_INCR(exceptions, %rax, %rbx)
mov (%rdx, %rax, 8), %rdx
INDIRECT_CALL %rdx
- testb $3,UREGS_cs(%rsp)
- jz restore_all_xen
+ testb $3,UREGS_cs(%r12)
+ jz restore_all_xen_r12
leaq VCPU_trap_bounce(%rbx),%rdx
movq VCPU_domain(%rbx),%rax
testb $1,DOMAIN_is_32bit_pv(%rax)
@@ -526,29 +534,29 @@ handle_exception_saved:
/* No special register assumptions. */
exception_with_ints_disabled:
- testb $3,UREGS_cs(%rsp) # interrupts disabled outside Xen?
+ testb $3,UREGS_cs(%r12) # interrupts disabled outside Xen?
jnz FATAL_exception_with_ints_disabled
- movq %rsp,%rdi
+ mov %r12,%rdi
call search_pre_exception_table
testq %rax,%rax # no fixup code for faulting EIP?
jz 1b
- movq %rax,UREGS_rip(%rsp)
- subq $8,UREGS_rsp(%rsp) # add ec/ev to previous stack frame
- testb $15,UREGS_rsp(%rsp) # return %rsp is now aligned?
+ movq %rax,UREGS_rip(%r12)
+ subq $8,UREGS_rsp(%r12) # add ec/ev to previous stack frame
+ testb $15,UREGS_rsp(%r12) # return %rsp is now aligned?
jz 1f # then there is a pad quadword already
- movq %rsp,%rsi
- subq $8,%rsp
- movq %rsp,%rdi
+ movq %r12,%rsi
+ subq $8,%r12
+ movq %r12,%rdi
movq $UREGS_kernel_sizeof/8,%rcx
rep; movsq # make room for ec/ev
-1: movq UREGS_error_code(%rsp),%rax # ec/ev
- movq %rax,UREGS_kernel_sizeof(%rsp)
- jmp restore_all_xen # return to fixup code
+1: movq UREGS_error_code(%r12),%rax # ec/ev
+ movq %rax,UREGS_kernel_sizeof(%r12)
+ jmp restore_all_xen_r12 # return to fixup code
/* No special register assumptions. */
FATAL_exception_with_ints_disabled:
xorl %esi,%esi
- movq %rsp,%rdi
+ mov %r12,%rdi
call fatal_trap
BUG /* fatal_trap() shouldn't return. */
@@ -621,13 +629,14 @@ ENTRY(double_fault)
movl $TRAP_double_fault,4(%rsp)
/* Set AC to reduce chance of further SMAP faults */
SAVE_ALL STAC
+ movq %rsp, %r12
GET_STACK_END(14)
- SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %r12=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
- movq %rsp,%rdi
+ mov %r12,%rdi
call do_double_fault
BUG /* do_double_fault() shouldn't return. */
@@ -645,32 +654,37 @@ ENTRY(nmi)
movl $TRAP_nmi,4(%rsp)
handle_ist_exception:
SAVE_ALL CLAC
+ mov %rsp, %r12
GET_STACK_END(14)
- SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+ SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %r12=regs, %r14=end, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
CR4_PV32_RESTORE
- testb $3,UREGS_cs(%rsp)
+ movq %r12,%rbx
+ subq %rsp,%rbx
+ testb $3,UREGS_cs(%r12)
jz 1f
/* Interrupted guest context. Copy the context to stack bottom. */
GET_CPUINFO_FIELD(guest_cpu_user_regs,di)
- movq %rsp,%rsi
+ addq %rbx,%rdi
+ movq %r12,%rsi
movl $UREGS_kernel_sizeof/8,%ecx
movq %rdi,%rsp
+ movq %rdi,%r12
rep movsq
-1: movq %rsp,%rdi
- movzbl UREGS_entry_vector(%rsp),%eax
+1: movzbl UREGS_entry_vector(%r12),%eax
leaq exception_table(%rip),%rdx
+ mov %r12,%rdi
mov (%rdx, %rax, 8), %rdx
INDIRECT_CALL %rdx
- cmpb $TRAP_nmi,UREGS_entry_vector(%rsp)
+ cmpb $TRAP_nmi,UREGS_entry_vector(%r12)
jne ret_from_intr
/* We want to get straight to the IRET on the NMI exit path. */
- testb $3,UREGS_cs(%rsp)
- jz restore_all_xen
+ testb $3,UREGS_cs(%r12)
+ jz restore_all_xen_r12
GET_CURRENT(bx)
/* Send an IPI to ourselves to cover for the lack of event checking. */
movl VCPU_processor(%rbx),%eax
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 1087239357..83d226a1ba 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -102,9 +102,11 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
({ \
__asm__ __volatile__ ( \
"mov %0,%%"__OP"sp;" \
- CHECK_FOR_LIVEPATCH_WORK \
- "jmp %c1" \
- : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" ); \
+ "mov %1,%%r12;" \
+ CHECK_FOR_LIVEPATCH_WORK \
+ "jmp %c2" \
+ : : "r" (get_cpu_info()), "r" (guest_cpu_user_regs()), \
+ "i" (__fn) : "memory" ); \
unreachable(); \
})
diff --git a/xen/include/asm-x86/nops.h b/xen/include/asm-x86/nops.h
index 61319ccfba..daf95f7147 100644
--- a/xen/include/asm-x86/nops.h
+++ b/xen/include/asm-x86/nops.h
@@ -68,7 +68,7 @@
#define ASM_NOP17 ASM_NOP8; ASM_NOP7; ASM_NOP2
#define ASM_NOP21 ASM_NOP8; ASM_NOP8; ASM_NOP5
#define ASM_NOP24 ASM_NOP8; ASM_NOP8; ASM_NOP8
-#define ASM_NOP29 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP5
+#define ASM_NOP30 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP6
#define ASM_NOP32 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8
#define ASM_NOP40 ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8; ASM_NOP8
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 814f53dffc..5868db8db2 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -144,7 +144,8 @@
.macro DO_SPEC_CTRL_ENTRY maybexen:req ibrs_val:req
/*
- * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Requires %r12=regs
+ * Requires %rsp=stack_end (if !maybexen)
* Requires %r14=stack_end (if maybexen)
* Clobbers %rax, %rcx, %rdx
*
@@ -162,7 +163,7 @@
*/
.if \maybexen
/* Branchless `if ( !xen ) clear_shadowing` */
- testb $3, UREGS_cs(%rsp)
+ testb $3, UREGS_cs(%r12)
setz %al
and %al, STACK_CPUINFO_FIELD(use_shadow_spec_ctrl)(%r14)
.else
@@ -197,7 +198,7 @@
.macro DO_SPEC_CTRL_EXIT_TO_GUEST
/*
- * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
+ * Requires %eax=spec_ctrl, %rsp=cpuinfo
* Clobbers %rcx, %rdx
*
* When returning to guest context, set up SPEC_CTRL shadowing and load the
@@ -241,7 +242,7 @@
#define SPEC_CTRL_ENTRY_FROM_INTR \
ALTERNATIVE __stringify(ASM_NOP40), \
DO_OVERWRITE_RSB, X86_FEATURE_RSB_NATIVE; \
- ALTERNATIVE_2 __stringify(ASM_NOP29), \
+ ALTERNATIVE_2 __stringify(ASM_NOP30), \
__stringify(DO_SPEC_CTRL_ENTRY maybexen=1 \
ibrs_val=SPEC_CTRL_IBRS), \
X86_FEATURE_XEN_IBRS_SET, \
@@ -263,7 +264,7 @@
/* TODO: Drop these when the alternatives infrastructure is NMI/#MC safe. */
.macro SPEC_CTRL_ENTRY_FROM_INTR_IST
/*
- * Requires %rsp=regs, %r14=stack_end
+ * Requires %r12=regs, %r14=stack_end
* Clobbers %rax, %rcx, %rdx
*
* This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
@@ -282,7 +283,7 @@
jz .L\@_skip_wrmsr
xor %edx, %edx
- testb $3, UREGS_cs(%rsp)
+ testb $3, UREGS_cs(%r12)
setz %dl
and %dl, STACK_CPUINFO_FIELD(use_shadow_spec_ctrl)(%r14)
--
2.13.6
* [PATCH v3 05/17] x86: add a xpti command line parameter
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
Add a command line parameter for controlling Xen page table isolation
(XPTI): by default it is enabled for 64-bit PV domains on non-AMD
systems. Possible settings are:
- true: switched on even on AMD systems
- false: switched off for all domains
- nodom0: switched off for dom0
- default: use the default setting
As we don't want to enable XPTI for 32-bit PV domains, XPTI
initialization has to be delayed until XEN_DOMCTL_set_address_size has
been called for the domain. Dom0 needs a specific init call.
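To pin down the semantics of the accepted values, here is a small
hypothetical sketch in plain C. parse_xpti(), want_xpti() and enum
xpti_mode are illustrative names only (the real code is in
xen/arch/x86/pv/xpti.c below), and the interpretation of nodom0 for
non-dom0 domains is an assumption based on the description above:

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical representation of the xpti= setting. */
    enum xpti_mode { XPTI_DEFAULT, XPTI_ON, XPTI_OFF, XPTI_NODOM0 };

    static enum xpti_mode opt_xpti = XPTI_DEFAULT;

    /* Parse "xpti=<val>"; returns 0 on success, -1 on an unknown value. */
    static int parse_xpti(const char *s)
    {
        if ( !strcmp(s, "default") )
            opt_xpti = XPTI_DEFAULT;
        else if ( !strcmp(s, "nodom0") )
            opt_xpti = XPTI_NODOM0;
        else if ( !strcmp(s, "true") )
            opt_xpti = XPTI_ON;
        else if ( !strcmp(s, "false") )
            opt_xpti = XPTI_OFF;
        else
            return -1;

        return 0;
    }

    /*
     * Decide whether a domain gets XPTI.  This can only be called once the
     * domain's bitness is known, which is why the series defers
     * xpti_domain_init() until XEN_DOMCTL_set_address_size (or the dom0
     * build) has run.
     */
    static bool want_xpti(bool is_dom0, bool amd_hardware, bool is_pv_64bit)
    {
        if ( !is_pv_64bit )
            return false;             /* 32-bit PV domains never get XPTI */

        switch ( opt_xpti )
        {
        case XPTI_ON:     return true;
        case XPTI_OFF:    return false;
        case XPTI_NODOM0: return !is_dom0 && !amd_hardware;
        default:          return !amd_hardware;   /* XPTI_DEFAULT */
        }
    }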
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- move XPTI initialization call to XEN_DOMCTL_set_address_size handling
- move XPTI code into arch/x86/pv/xpti.c
- replace xpti flag in struct domain by pointer
- add is_*_xpti_active() helpers
---
docs/misc/xen-command-line.markdown | 18 ++++++
xen/arch/x86/domctl.c | 4 ++
xen/arch/x86/pv/Makefile | 1 +
xen/arch/x86/pv/dom0_build.c | 3 +
xen/arch/x86/pv/domain.c | 3 +
xen/arch/x86/pv/xpti.c | 111 ++++++++++++++++++++++++++++++++++++
xen/include/asm-x86/domain.h | 4 ++
xen/include/asm-x86/pv/mm.h | 20 +++++++
8 files changed, 164 insertions(+)
create mode 100644 xen/arch/x86/pv/xpti.c
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 6df39dae0b..f96bb6342d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1926,6 +1926,24 @@ In the case that x2apic is in use, this option switches between physical and
clustered mode. The default, given no hint from the **FADT**, is cluster
mode.
+### xpti
+> `= nodom0 | default | <boolean>`
+
+> Default: `false` on AMD hardware, `true` everywhere else.
+
+> Can be modified at runtime
+
+Override default selection of whether to isolate 64-bit PV guest page
+tables.
+
+`true` activates page table isolation even on AMD hardware.
+
+`false` deactivates page table isolation on all systems.
+
+`nodom0` deactivates page table isolation for dom0.
+
+`default` switches back to the default selection.
+
### xsave
> `= <boolean>`
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 8fbbf3aeb3..0b448e411d 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -31,6 +31,7 @@
#include <xen/vm_event.h>
#include <public/vm_event.h>
#include <asm/mem_sharing.h>
+#include <asm/pv/mm.h>
#include <asm/xstate.h>
#include <asm/debugger.h>
#include <asm/psr.h>
@@ -610,6 +611,9 @@ long arch_do_domctl(
ret = switch_compat(d);
else
ret = -EINVAL;
+
+ if ( ret == 0 )
+ ret = xpti_domain_init(d);
break;
case XEN_DOMCTL_get_address_size:
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 65bca04175..a12e4fbd1a 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -13,6 +13,7 @@ obj-y += mm.o
obj-y += ro-page-fault.o
obj-$(CONFIG_PV_SHIM) += shim.o
obj-y += traps.o
+obj-y += xpti.o
obj-bin-y += dom0_build.init.o
obj-bin-y += gpr_switch.o
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 0bd2f1bf90..6e7bc435ab 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -707,6 +707,9 @@ int __init dom0_construct_pv(struct domain *d,
cpu = p->processor;
}
+ if ( !is_pv_32bit_domain(d) )
+ xpti_domain_init(d);
+
d->arch.paging.mode = 0;
/* Set up CR3 value for write_ptbase */
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 2c784fb3cc..a007af94dd 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -10,6 +10,7 @@
#include <xen/sched.h>
#include <asm/pv/domain.h>
+#include <asm/pv/mm.h>
/* Override macros from asm/page.h to make them work with mfn_t */
#undef mfn_to_page
@@ -174,6 +175,8 @@ void pv_domain_destroy(struct domain *d)
free_xenheap_page(d->arch.pv_domain.gdt_ldt_l1tab);
d->arch.pv_domain.gdt_ldt_l1tab = NULL;
+
+ xpti_domain_destroy(d);
}
diff --git a/xen/arch/x86/pv/xpti.c b/xen/arch/x86/pv/xpti.c
new file mode 100644
index 0000000000..0b17d77d74
--- /dev/null
+++ b/xen/arch/x86/pv/xpti.c
@@ -0,0 +1,111 @@
+/******************************************************************************
+ * arch/x86/pv/xpti.c
+ *
+ * Xen Page Table Isolation support.
+ *
+ * Copyright (c) 2018 SUSE Linux GmbH (Juergen Gross)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/sched.h>
+
+struct xpti_domain {
+ int pad;
+};
+
+static __read_mostly enum {
+ XPTI_DEFAULT,
+ XPTI_ON,
+ XPTI_OFF,
+ XPTI_NODOM0
+} opt_xpti = XPTI_DEFAULT;
+
+static int parse_xpti(const char *s)
+{
+ int rc = 0;
+
+ switch ( parse_bool(s, NULL) )
+ {
+ case 0:
+ opt_xpti = XPTI_OFF;
+ break;
+ case 1:
+ opt_xpti = XPTI_ON;
+ break;
+ default:
+ if ( !strcmp(s, "default") )
+ opt_xpti = XPTI_DEFAULT;
+ else if ( !strcmp(s, "nodom0") )
+ opt_xpti = XPTI_NODOM0;
+ else
+ rc = -EINVAL;
+ break;
+ }
+
+ return rc;
+}
+
+custom_runtime_param("xpti", parse_xpti);
+
+void xpti_domain_destroy(struct domain *d)
+{
+ xfree(d->arch.pv_domain.xpti);
+ d->arch.pv_domain.xpti = NULL;
+}
+
+int xpti_domain_init(struct domain *d)
+{
+ bool xpti = false;
+ int ret = 0;
+
+ if ( !is_pv_domain(d) || is_pv_32bit_domain(d) )
+ return 0;
+
+ switch ( opt_xpti )
+ {
+ case XPTI_OFF:
+ xpti = false;
+ break;
+ case XPTI_ON:
+ xpti = true;
+ break;
+ case XPTI_NODOM0:
+ xpti = boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
+ d->domain_id != 0 && d->domain_id != hardware_domid;
+ break;
+ case XPTI_DEFAULT:
+ xpti = boot_cpu_data.x86_vendor != X86_VENDOR_AMD;
+ break;
+ }
+
+ if ( !xpti )
+ return 0;
+
+ d->arch.pv_domain.xpti = xmalloc(struct xpti_domain);
+ if ( !d->arch.pv_domain.xpti )
+ {
+ ret = -ENOMEM;
+ goto done;
+ }
+
+ printk("Enabling Xen Pagetable protection (XPTI) for Domain %d\n",
+ d->domain_id);
+
+ done:
+ return ret;
+}
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4679d5477d..b33c286807 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -247,6 +247,8 @@ struct time_scale {
u32 mul_frac;
};
+struct xpti_domain;
+
struct pv_domain
{
l1_pgentry_t **gdt_ldt_l1tab;
@@ -257,6 +259,8 @@ struct pv_domain
struct mapcache_domain mapcache;
struct cpuidmasks *cpuidmasks;
+
+ struct xpti_domain *xpti;
};
struct monitor_write_data {
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 246b99014c..dfac89df0b 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -31,6 +31,19 @@ void pv_destroy_gdt(struct vcpu *v);
bool pv_map_ldt_shadow_page(unsigned int off);
bool pv_destroy_ldt(struct vcpu *v);
+int xpti_domain_init(struct domain *d);
+void xpti_domain_destroy(struct domain *d);
+
+static inline bool is_domain_xpti_active(const struct domain *d)
+{
+ return is_pv_domain(d) && d->arch.pv_domain.xpti;
+}
+
+static inline bool is_vcpu_xpti_active(const struct vcpu *v)
+{
+ return is_domain_xpti_active(v->domain);
+}
+
#else
#include <xen/errno.h>
@@ -52,6 +65,13 @@ static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
static inline bool pv_destroy_ldt(struct vcpu *v)
{ ASSERT_UNREACHABLE(); return false; }
+static inline int xpti_domain_init(struct domain *d) { return 0; }
+static inline void xpti_domain_destroy(struct domain *d) { }
+
+static inline bool is_domain_xpti_active(const struct domain *d)
+{ return false; }
+static inline bool is_vcpu_xpti_active(const struct vcpu *v) { return false; }
+
#endif
#endif /* __X86_PV_MM_H__ */
--
2.13.6
* [PATCH v3 06/17] x86: allow per-domain mappings without NX bit or with specific mfn
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (4 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 05/17] x86: add a xpti command line parameter Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 07/17] xen/x86: split _set_tssldt_desc() into ldt and tss specific functions Juergen Gross
` (11 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
For supporting per-vcpu stacks we need per-vcpu trampolines. To be
able to put those into the per-domain mappings the upper level page
tables must not have NX set for the per-domain area.
As create_perdomain_mapping() creates L1 mappings with
__PAGE_HYPERVISOR_RW, this won't make any of the current per-domain
mappings executable.
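To make the intent concrete, a minimal sketch (taken from the stub and
stack mapping code added later in this series): with NX gone from the
upper levels, executability is decided purely by the L1 entry, so an
executable trampoline page and a plain data page can sit next to each
other in the per-domain area.

    /* Sketch only: leaf entries decide NX, upper levels no longer force it. */
    l1e_write(pl1e + STACK_PAGES - 2,        /* trampoline page, executable */
              l1e_from_pfn(virt_to_mfn(xpti_lstar), __PAGE_HYPERVISOR_RX));
    l1e_write(pl1e + STACK_PAGES - 1,        /* stack/TSS page, data only */
              l1e_from_pfn(virt_to_mfn(ptr), __PAGE_HYPERVISOR_RW));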
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- remove functions for modifying per-domain mappings (Jan Beulich)
---
xen/arch/x86/mm.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e795239829..d86e07e9f8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1568,7 +1568,7 @@ void init_xen_l4_slots(l4_pgentry_t *l4t, mfn_t l4mfn,
/* Slot 260: Per-domain mappings (if applicable). */
l4t[l4_table_offset(PERDOMAIN_VIRT_START)] =
- d ? l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW)
+ d ? l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR)
: l4e_empty();
/* Slot 261-: text/data/bss, RW M2P, vmap, frametable, directmap. */
@@ -5319,7 +5319,7 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
}
l2tab = __map_domain_page(pg);
clear_page(l2tab);
- l3tab[l3_table_offset(va)] = l3e_from_page(pg, __PAGE_HYPERVISOR_RW);
+ l3tab[l3_table_offset(va)] = l3e_from_page(pg, __PAGE_HYPERVISOR);
}
else
l2tab = map_l2t_from_l3e(l3tab[l3_table_offset(va)]);
@@ -5361,7 +5361,7 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
l1tab = __map_domain_page(pg);
}
clear_page(l1tab);
- *pl2e = l2e_from_page(pg, __PAGE_HYPERVISOR_RW);
+ *pl2e = l2e_from_page(pg, __PAGE_HYPERVISOR);
}
else if ( !l1tab )
l1tab = map_l1t_from_l2e(*pl2e);
--
2.13.6
* [PATCH v3 07/17] xen/x86: split _set_tssldt_desc() into ldt and tss specific functions
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (5 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 06/17] x86: allow per-domain mappings without NX bit or with specific mfn Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 08/17] x86: add support for spectre mitigation with local thunk Juergen Gross
` (10 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
_set_tssldt_desc() is used to set LDT or TSS descriptors in the GDT.
As LDT descriptors might be shared across cpus, care is taken not to
create a temporarily invalid descriptor.
Split _set_tssldt_desc() into dedicated functions for setting either
an LDT or a TSS descriptor. For LDT descriptors this is basically the
same as today, while TSS descriptors can be written without barriers
as they are only ever written for the local cpu.
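For reference, a self-contained sketch of the 16-byte descriptor
encoding the new _set_tss_desc() produces (field packing copied from
the macro in the diff; the struct and function names here are
illustrative only, not part of the patch):

    #include <stdint.h>

    struct sysdesc { uint32_t a, b; };   /* one 8-byte GDT slot */

    /* Pack base/limit/type the same way _set_tss_desc() does. */
    static void pack_tss_desc(struct sysdesc d[2], uint64_t base,
                              uint32_t limit, uint32_t type)
    {
        d[0].a = ((uint32_t)base << 16) | (limit & 0xFFFFu);  /* base[15:0], limit[15:0] */
        d[0].b = ((uint32_t)base & 0xFF000000u) |             /* base[31:24] */
                 (type << 8) | 0x8000u |                      /* type, present bit */
                 (((uint32_t)base & 0x00FF0000u) >> 16);      /* base[23:16] */
        d[1].a = (uint32_t)(base >> 32);                      /* base[63:32] */
        d[1].b = 0;                                           /* reserved */
    }

As such a GDT slot is only consumed by the local cpu, no intermediate
invalidation of the entry and no write barrier is needed, unlike the
LDT variant.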
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- new patch
---
xen/arch/x86/cpu/common.c | 4 ++--
xen/arch/x86/traps.c | 4 ++--
xen/include/asm-x86/desc.h | 14 +++++++++++++-
xen/include/asm-x86/ldt.h | 2 +-
4 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4306e59650..e0ae8120a6 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -755,12 +755,12 @@ void load_system_tables(void)
.bitmap = IOBMP_INVALID_OFFSET,
};
- _set_tssldt_desc(
+ _set_tss_desc(
gdt + TSS_ENTRY,
(unsigned long)tss,
offsetof(struct tss_struct, __cacheline_filler) - 1,
SYS_DESC_tss_avail);
- _set_tssldt_desc(
+ _set_tss_desc(
compat_gdt + TSS_ENTRY,
(unsigned long)tss,
offsetof(struct tss_struct, __cacheline_filler) - 1,
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 13a852ca4e..9b29014e2c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1845,12 +1845,12 @@ void load_TR(void)
.limit = LAST_RESERVED_GDT_BYTE
};
- _set_tssldt_desc(
+ _set_tss_desc(
this_cpu(gdt_table) + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
(unsigned long)tss,
offsetof(struct tss_struct, __cacheline_filler) - 1,
SYS_DESC_tss_avail);
- _set_tssldt_desc(
+ _set_tss_desc(
this_cpu(compat_gdt_table) + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
(unsigned long)tss,
offsetof(struct tss_struct, __cacheline_filler) - 1,
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index 4093c65faa..6ec515582d 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -171,7 +171,7 @@ static inline void _update_gate_addr_lower(idt_entry_t *gate, void *addr)
_write_gate_lower(gate, &idte);
}
-#define _set_tssldt_desc(desc,addr,limit,type) \
+#define _set_ldt_desc(desc,addr,limit,type) \
do { \
(desc)[0].b = (desc)[1].b = 0; \
smp_wmb(); /* disable entry /then/ rewrite */ \
@@ -185,6 +185,18 @@ do { \
(((u32)(addr) & 0x00FF0000U) >> 16); \
} while (0)
+#define _set_tss_desc(desc,addr,limit,type) \
+do { \
+ (desc)[0].a = \
+ ((u32)(addr) << 16) | ((u32)(limit) & 0xFFFF); \
+ (desc)[0].b = \
+ ((u32)(addr) & 0xFF000000U) | \
+ ((u32)(type) << 8) | 0x8000U | \
+ (((u32)(addr) & 0x00FF0000U) >> 16); \
+ (desc)[1].a = (u32)(((unsigned long)(addr)) >> 32); \
+ (desc)[1].b = 0; \
+} while (0)
+
struct __packed desc_ptr {
unsigned short limit;
unsigned long base;
diff --git a/xen/include/asm-x86/ldt.h b/xen/include/asm-x86/ldt.h
index 589daf83c6..6179cef9e9 100644
--- a/xen/include/asm-x86/ldt.h
+++ b/xen/include/asm-x86/ldt.h
@@ -16,7 +16,7 @@ static inline void load_LDT(struct vcpu *v)
desc = (!is_pv_32bit_vcpu(v)
? this_cpu(gdt_table) : this_cpu(compat_gdt_table))
+ LDT_ENTRY - FIRST_RESERVED_GDT_ENTRY;
- _set_tssldt_desc(desc, LDT_VIRT_START(v), ents*8-1, SYS_DESC_ldt);
+ _set_ldt_desc(desc, LDT_VIRT_START(v), ents*8-1, SYS_DESC_ldt);
lldt(LDT_ENTRY << 3);
}
}
--
2.13.6
* [PATCH v3 08/17] x86: add support for spectre mitigation with local thunk
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (6 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 07/17] xen/x86: split _set_tssldt_desc() into ldt and tss specific functions Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 09/17] x86: create syscall stub for per-domain mapping Juergen Gross
` (9 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
Right now an indirect jump might use a relative jump to a retpoline
thunk in order to mitigate the Spectre vulnerability.
In case the code using the indirect jump is remapped to another
virtual address this won't work any longer, as the relative branch
would then resolve to the wrong target. So add support for indirect
jumps using a local thunk, which is remapped together with the code
using it.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- new patch
---
xen/arch/x86/indirect-thunk.S | 23 +----------------------
xen/include/asm-x86/asm_defns.h | 25 +++++++++++++++++++++++++
xen/include/asm-x86/indirect_thunk_asm.h | 8 ++++++--
3 files changed, 32 insertions(+), 24 deletions(-)
diff --git a/xen/arch/x86/indirect-thunk.S b/xen/arch/x86/indirect-thunk.S
index e03fc14c73..b4d3e4cec4 100644
--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -11,25 +11,6 @@
#include <asm/asm_defns.h>
-.macro IND_THUNK_RETPOLINE reg:req
- call 2f
-1:
- lfence
- jmp 1b
-2:
- mov %\reg, (%rsp)
- ret
-.endm
-
-.macro IND_THUNK_LFENCE reg:req
- lfence
- jmp *%\reg
-.endm
-
-.macro IND_THUNK_JMP reg:req
- jmp *%\reg
-.endm
-
/*
* Build the __x86_indirect_thunk_* symbols. Execution lands on an
* alternative patch point which implements one of the above THUNK_*'s
@@ -38,9 +19,7 @@
.section .text.__x86_indirect_thunk_\reg, "ax", @progbits
ENTRY(__x86_indirect_thunk_\reg)
- ALTERNATIVE_2 __stringify(IND_THUNK_RETPOLINE \reg), \
- __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
- __stringify(IND_THUNK_JMP \reg), X86_FEATURE_IND_THUNK_JMP
+ GEN_INDIRECT_THUNK_BODY \reg
.endm
/* Instantiate GEN_INDIRECT_THUNK for each register except %rsp. */
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index 2a79e8a7f4..7d26391be8 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -334,6 +334,31 @@ static always_inline void stac(void)
subq $-(UREGS_error_code-UREGS_r15+\adj), %rsp
.endm
+.macro IND_THUNK_RETPOLINE reg:req
+ call 2f
+1:
+ lfence
+ jmp 1b
+2:
+ mov %\reg, (%rsp)
+ ret
+.endm
+
+.macro IND_THUNK_LFENCE reg:req
+ lfence
+ jmp *%\reg
+.endm
+
+.macro IND_THUNK_JMP reg:req
+ jmp *%\reg
+.endm
+
+.macro GEN_INDIRECT_THUNK_BODY reg:req
+ ALTERNATIVE_2 __stringify(IND_THUNK_RETPOLINE \reg), \
+ __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
+ __stringify(IND_THUNK_JMP \reg), X86_FEATURE_IND_THUNK_JMP
+.endm
+
#endif
#ifdef CONFIG_PERF_COUNTERS
diff --git a/xen/include/asm-x86/indirect_thunk_asm.h b/xen/include/asm-x86/indirect_thunk_asm.h
index 96bcc25497..3abb32caee 100644
--- a/xen/include/asm-x86/indirect_thunk_asm.h
+++ b/xen/include/asm-x86/indirect_thunk_asm.h
@@ -3,7 +3,7 @@
* usual #ifdef'ary to turn into comments.
*/
-.macro INDIRECT_BRANCH insn:req arg:req
+.macro INDIRECT_BRANCH insn:req arg:req label=__x86_indirect_thunk_r
/*
* Create an indirect branch. insn is one of call/jmp, arg is a single
* register.
@@ -16,7 +16,7 @@
$done = 0
.irp reg, ax, cx, dx, bx, bp, si, di, 8, 9, 10, 11, 12, 13, 14, 15
.ifeqs "\arg", "%r\reg"
- \insn __x86_indirect_thunk_r\reg
+ \insn \label\reg
$done = 1
.exitm
.endif
@@ -39,3 +39,7 @@
.macro INDIRECT_JMP arg:req
INDIRECT_BRANCH jmp \arg
.endm
+
+.macro INDIRECT_LOCAL_JMP arg:req
+ INDIRECT_BRANCH jmp \arg local__x86_indirect_thunk_r
+.endm
--
2.13.6
* [PATCH v3 09/17] x86: create syscall stub for per-domain mapping
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (7 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 08/17] x86: add support for spectre mitigation with local thunk Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 10/17] x86: allocate per-vcpu stacks for interrupt entries Juergen Gross
` (8 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
The current syscall stub can't be used when mapped into the per-domain
area as required by XPTI, because the distance for jumping into the
common interrupt handling code is larger than 2GB. Using just an
indirect jump isn't going to work either, as this would require
mitigations against Spectre.
So use a new trampoline which is no longer unique to a (v)cpu, but can
be mapped into the per-domain area as needed. For addressing the stack
use the knowledge that the primary stack will be in the page directly
after the trampoline code, so %rsp can be saved via a %rip-relative
access without needing any further register.
To be able to easily switch between per-cpu and per-vcpu stubs add a
macro for the per-cpu stub size and add the prototypes of
[cl]star_enter() to a header.
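The %rip-relative trick relies on the fixed layout established by the
following patch: the stub page is mapped directly below the per-vcpu
stack/TSS page, so from the start of the stub page the expression
". + 2 * PAGE_SIZE - CPUINFO_sizeof" is exactly the struct cpu_info
sitting at the top of that stack page. Expressed in C (a sketch using
the XPTI_* macros introduced later in the series):

    /* What the stub's ".equ xpti_regs, ..." computes, expressed in C. */
    struct cpu_info *info =
        (struct cpu_info *)(XPTI_TRAMPOLINE(v) + 2 * PAGE_SIZE) - 1;
    struct cpu_user_regs *xpti_regs = &info->guest_cpu_user_regs;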
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- completely new per-vcpu stub containing Spectre mitigation
---
xen/arch/x86/pv/Makefile | 1 +
xen/arch/x86/pv/xpti-stub.S | 61 ++++++++++++++++++++++++++++++++++++++
xen/arch/x86/x86_64/compat/entry.S | 1 +
xen/arch/x86/x86_64/entry.S | 1 +
xen/arch/x86/x86_64/traps.c | 3 +-
xen/include/asm-x86/system.h | 5 ++++
6 files changed, 70 insertions(+), 2 deletions(-)
create mode 100644 xen/arch/x86/pv/xpti-stub.S
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index a12e4fbd1a..3f6b5506dc 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -17,3 +17,4 @@ obj-y += xpti.o
obj-bin-y += dom0_build.init.o
obj-bin-y += gpr_switch.o
+obj-bin-y += xpti-stub.o
diff --git a/xen/arch/x86/pv/xpti-stub.S b/xen/arch/x86/pv/xpti-stub.S
new file mode 100644
index 0000000000..efa1e3f661
--- /dev/null
+++ b/xen/arch/x86/pv/xpti-stub.S
@@ -0,0 +1,61 @@
+/*
+ * Syscall stubs mappable to per-vcpu area in order to mitigate Meltdown attack.
+ * The stack page will be mapped just after the stub page, so its distance
+ * is well known.
+ *
+ * Copyright (c) 2018, Juergen Gross
+ */
+
+ .file "pv/xpti-stub.S"
+
+#include <asm/asm_defns.h>
+#include <public/xen.h>
+
+ .align PAGE_SIZE
+
+ .equ xpti_regs, . + 2 * PAGE_SIZE - CPUINFO_sizeof
+
+ENTRY(xpti_lstar)
+ mov %rsp, xpti_regs+UREGS_rsp(%rip)
+ lea xpti_regs+UREGS_rsp(%rip), %rsp
+ movq $FLAT_KERNEL_SS, 8(%rsp)
+ pushq %r11
+ pushq $FLAT_KERNEL_CS64
+ pushq %rcx
+ pushq $0
+ movl $TRAP_syscall, 4(%rsp)
+ SAVE_ALL
+ mov %rsp, %r12
+
+ sti
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ movabsq $lstar_common, %rax
+ INDIRECT_LOCAL_JMP %rax
+
+ENTRY(xpti_cstar)
+ mov %rsp, xpti_regs+UREGS_rsp(%rip)
+ lea xpti_regs+UREGS_rsp(%rip), %rsp
+ movq $FLAT_KERNEL_SS, 8(%rsp)
+ pushq %r11
+ pushq $FLAT_USER_CS32
+ pushq %rcx
+ pushq $0
+ movl $TRAP_syscall, 4(%rsp)
+ SAVE_ALL
+ movq %rsp, %r12
+
+ sti
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ movabsq $cstar_common, %rax
+ INDIRECT_LOCAL_JMP %rax
+
+local__x86_indirect_thunk_rax:
+ GEN_INDIRECT_THUNK_BODY rax
+
+ .align PAGE_SIZE
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index eced1475b7..206bc9a05a 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -215,6 +215,7 @@ ENTRY(cstar_enter)
SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+GLOBAL(cstar_common)
CR4_PV32_RESTORE
GET_CURRENT(bx)
movq VCPU_domain(%rbx),%rcx
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index f067a74b0f..69590d0b17 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -120,6 +120,7 @@ ENTRY(lstar_enter)
SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+GLOBAL(lstar_common)
GET_CURRENT(bx)
testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
jz switch_to_kernel
diff --git a/xen/arch/x86/x86_64/traps.c b/xen/arch/x86/x86_64/traps.c
index 3652f5ff21..bd4d37c2ad 100644
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -291,8 +291,6 @@ static unsigned int write_stub_trampoline(
}
DEFINE_PER_CPU(struct stubs, stubs);
-void lstar_enter(void);
-void cstar_enter(void);
void subarch_percpu_traps_init(void)
{
@@ -315,6 +313,7 @@ void subarch_percpu_traps_init(void)
offset = write_stub_trampoline(stub_page + (stub_va & ~PAGE_MASK),
stub_va, stack_bottom,
(unsigned long)lstar_enter);
+ ASSERT(offset == STUB_TRAMPOLINE_SIZE_PERCPU);
stub_va += offset;
if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 8ac170371b..06afc59822 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -230,6 +230,11 @@ static inline int local_irq_is_enabled(void)
void trap_init(void);
void init_idt_traps(void);
+#define STUB_TRAMPOLINE_SIZE_PERCPU 32
+void lstar_enter(void);
+void cstar_enter(void);
+void xpti_lstar(void);
+void xpti_cstar(void);
void load_system_tables(void);
void percpu_traps_init(void);
void subarch_percpu_traps_init(void);
--
2.13.6
* [PATCH v3 10/17] x86: allocate per-vcpu stacks for interrupt entries
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (8 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 09/17] x86: create syscall stub for per-domain mapping Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 11/17] x86: modify interrupt handlers to support stack switching Juergen Gross
` (7 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
In case XPTI is active for a pv-domain, allocate and initialize
per-vcpu stacks. The stacks are added to the per-domain mappings of
the pv-domain.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- move xpti code to xpti.c
- directly modify page table entries as needed for stub and stack
page (Jan Beulich)
- use one page for all stacks and TSS
- remap global stub instead allocating one for each vcpu
---
xen/arch/x86/pv/domain.c | 2 +
xen/arch/x86/pv/xpti.c | 117 +++++++++++++++++++++++++++++++++++++++---
xen/include/asm-x86/config.h | 13 ++++-
xen/include/asm-x86/current.h | 49 +++++++++++++-----
xen/include/asm-x86/domain.h | 3 ++
xen/include/asm-x86/pv/mm.h | 2 +
6 files changed, 166 insertions(+), 20 deletions(-)
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index a007af94dd..550fbbf0fe 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -120,6 +120,8 @@ void pv_vcpu_destroy(struct vcpu *v)
pv_destroy_gdt_ldt_l1tab(v);
xfree(v->arch.pv_vcpu.trap_ctxt);
v->arch.pv_vcpu.trap_ctxt = NULL;
+
+ xpti_vcpu_destroy(v);
}
int pv_vcpu_initialise(struct vcpu *v)
diff --git a/xen/arch/x86/pv/xpti.c b/xen/arch/x86/pv/xpti.c
index 0b17d77d74..1356541804 100644
--- a/xen/arch/x86/pv/xpti.c
+++ b/xen/arch/x86/pv/xpti.c
@@ -19,13 +19,28 @@
* along with this program; If not, see <http://www.gnu.org/licenses/>.
*/
+#include <xen/domain_page.h>
#include <xen/errno.h>
#include <xen/init.h>
#include <xen/lib.h>
#include <xen/sched.h>
+#define XPTI_STACK_SIZE 512
+#define XPTI_STACK_N (XPTI_STACK_SIZE / 8)
+
+struct xpti_stack {
+ struct tss_struct tss;
+ char pad[PAGE_SIZE - sizeof(struct cpu_info) - sizeof(struct tss_struct) -
+ XPTI_STACK_SIZE * 4];
+ uint64_t df_stack[XPTI_STACK_N];
+ uint64_t nmi_stack[XPTI_STACK_N];
+ uint64_t mce_stack[XPTI_STACK_N];
+ uint64_t primary_stack[XPTI_STACK_N];
+ struct cpu_info cpu_info;
+};
+
struct xpti_domain {
- int pad;
+ l1_pgentry_t **perdom_l1tab;
};
static __read_mostly enum {
@@ -64,14 +79,92 @@ custom_runtime_param("xpti", parse_xpti);
void xpti_domain_destroy(struct domain *d)
{
- xfree(d->arch.pv_domain.xpti);
+ struct xpti_domain *xd = d->arch.pv_domain.xpti;
+
+ if ( !xd )
+ return;
+
+ xfree(xd->perdom_l1tab);
+ xfree(xd);
d->arch.pv_domain.xpti = NULL;
}
+void xpti_vcpu_destroy(struct vcpu *v)
+{
+ if ( v->domain->arch.pv_domain.xpti )
+ {
+ free_xenheap_page(v->arch.pv_vcpu.stack_regs);
+ v->arch.pv_vcpu.stack_regs = NULL;
+ destroy_perdomain_mapping(v->domain, XPTI_START(v), STACK_PAGES);
+ }
+}
+
+static int xpti_vcpu_init(struct vcpu *v)
+{
+ struct domain *d = v->domain;
+ struct xpti_domain *xd = d->arch.pv_domain.xpti;
+ void *ptr;
+ struct cpu_info *info;
+ struct xpti_stack *stack;
+ struct tss_struct *tss;
+ l1_pgentry_t *pl1e;
+ unsigned int i;
+ int rc;
+
+ /* Populate page tables. */
+ rc = create_perdomain_mapping(d, XPTI_START(v), STACK_PAGES,
+ xd->perdom_l1tab, NULL);
+ if ( rc )
+ goto done;
+ pl1e = xd->perdom_l1tab[l2_table_offset(XPTI_START(v))] +
+ l1_table_offset(XPTI_START(v));
+
+ /* Map stacks and TSS. */
+ rc = create_perdomain_mapping(d, XPTI_TSS(v), 1,
+ NULL, NIL(struct page_info *));
+ if ( rc )
+ goto done;
+
+ ptr = alloc_xenheap_page();
+ if ( !ptr )
+ {
+ rc = -ENOMEM;
+ goto done;
+ }
+ clear_page(ptr);
+ l1e_write(pl1e + STACK_PAGES - 1,
+ l1e_from_pfn(virt_to_mfn(ptr), __PAGE_HYPERVISOR_RW));
+ info = (struct cpu_info *)((unsigned long)ptr + PAGE_SIZE) - 1;
+ info->flags = ON_VCPUSTACK;
+ v->arch.pv_vcpu.stack_regs = &info->guest_cpu_user_regs;
+
+ /* stack just used for generating the correct addresses. */
+ stack = (struct xpti_stack *)XPTI_TSS(v);
+ tss = ptr;
+ tss->rsp0 = (unsigned long)&stack->cpu_info.guest_cpu_user_regs.es;
+ tss->rsp1 = 0x8600111111111111ul; /* poison */
+ tss->rsp2 = 0x8600111111111111ul; /* poison */
+ tss->ist[IST_MCE - 1] = (unsigned long)&stack->mce_stack[XPTI_STACK_N];
+ tss->ist[IST_DF - 1] = (unsigned long)&stack->df_stack[XPTI_STACK_N];
+ tss->ist[IST_NMI - 1] = (unsigned long)&stack->nmi_stack[XPTI_STACK_N];
+ for ( i = IST_MAX; i < ARRAY_SIZE(tss->ist); i++ )
+ tss->ist[i] = 0x8600111111111111ul; /* poison */
+ tss->bitmap = IOBMP_INVALID_OFFSET;
+
+ /* Map stub trampolines. */
+ l1e_write(pl1e + STACK_PAGES - 2,
+ l1e_from_pfn(virt_to_mfn(xpti_lstar), __PAGE_HYPERVISOR_RX));
+
+ done:
+ return rc;
+}
+
int xpti_domain_init(struct domain *d)
{
bool xpti = false;
- int ret = 0;
+ int ret = -ENOMEM;
+ struct vcpu *v;
+ struct xpti_domain *xd;
if ( !is_pv_domain(d) || is_pv_32bit_domain(d) )
return 0;
@@ -96,11 +189,21 @@ int xpti_domain_init(struct domain *d)
if ( !xpti )
return 0;
- d->arch.pv_domain.xpti = xmalloc(struct xpti_domain);
- if ( !d->arch.pv_domain.xpti )
- {
- ret = -ENOMEM;
+ xd = xzalloc(struct xpti_domain);
+ if ( !xd )
goto done;
+ d->arch.pv_domain.xpti = xd;
+
+ xd->perdom_l1tab = xzalloc_array(l1_pgentry_t *,
+ l2_table_offset((d->max_vcpus - 1) << XPTI_VA_SHIFT) + 1);
+ if ( !xd->perdom_l1tab )
+ goto done;
+
+ for_each_vcpu( d, v )
+ {
+ ret = xpti_vcpu_init(v);
+ if ( ret )
+ goto done;
}
printk("Enabling Xen Pagetable protection (XPTI) for Domain %d\n",
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 9ef9d03ca7..b563a2f85b 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -66,6 +66,7 @@
#endif
#define STACK_ORDER 3
+#define STACK_PAGES (1 << STACK_ORDER)
#define STACK_SIZE (PAGE_SIZE << STACK_ORDER)
#define TRAMPOLINE_STACK_SPACE PAGE_SIZE
@@ -202,7 +203,7 @@ extern unsigned char boot_edid_info[128];
/* Slot 260: per-domain mappings (including map cache). */
#define PERDOMAIN_VIRT_START (PML4_ADDR(260))
#define PERDOMAIN_SLOT_MBYTES (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS 3
+#define PERDOMAIN_SLOTS 4
#define PERDOMAIN_VIRT_SLOT(s) (PERDOMAIN_VIRT_START + (s) * \
(PERDOMAIN_SLOT_MBYTES << 20))
/* Slot 261: machine-to-phys conversion table (256GB). */
@@ -310,6 +311,16 @@ extern unsigned long xen_phys_start;
#define ARG_XLAT_START(v) \
(ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))
+/* Per-vcpu XPTI pages. The fourth per-domain-mapping sub-area. */
+#define XPTI_VIRT_START PERDOMAIN_VIRT_SLOT(3)
+#define XPTI_VA_SHIFT (PAGE_SHIFT + STACK_ORDER)
+#define XPTI_TRAMPOLINE_OFF ((STACK_PAGES - 2) << PAGE_SHIFT)
+#define XPTI_TSS_OFF ((STACK_PAGES - 1) << PAGE_SHIFT)
+#define XPTI_START(v) (XPTI_VIRT_START + \
+ ((v)->vcpu_id << XPTI_VA_SHIFT))
+#define XPTI_TRAMPOLINE(v) (XPTI_START(v) + XPTI_TRAMPOLINE_OFF)
+#define XPTI_TSS(v) (XPTI_START(v) + XPTI_TSS_OFF)
+
#define NATIVE_VM_ASSIST_VALID ((1UL << VMASST_TYPE_4gb_segments) | \
(1UL << VMASST_TYPE_4gb_segments_notify) | \
(1UL << VMASST_TYPE_writable_pagetables) | \
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 83d226a1ba..5963114e08 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -12,7 +12,7 @@
#include <asm/page.h>
/*
- * Xen's cpu stacks are 8 pages (8-page aligned), arranged as:
+ * Xen's physical cpu stacks are 8 pages (8-page aligned), arranged as:
*
* 7 - Primary stack (with a struct cpu_info at the top)
* 6 - Primary stack
@@ -25,6 +25,19 @@
*/
/*
+ * The vcpu stacks used for XPTI are 8-page aligned in virtual address space
+ * like the physical cpu stacks, but most of that area is unpopulated.
+ * As each stack needs only space for the interrupted context and (in case
+ * of the primary stack) maybe a cpu_info structure, all stacks can be put
+ * into a single page. The Syscall trampolines are mapped directly below the
+ * stack page.
+ *
+ * 7 - Primary stack (with a struct cpu_info at the top), IST stacks and TSS
+ * 6 - Syscall trampolines
+ * 0 - 5 unused
+ */
+
+/*
* Identify which stack page the stack pointer is on. Returns an index
* as per the comment above.
*/
@@ -37,17 +50,29 @@ struct vcpu;
struct cpu_info {
struct cpu_user_regs guest_cpu_user_regs;
- unsigned int processor_id;
- struct vcpu *current_vcpu;
- unsigned long per_cpu_offset;
- unsigned long cr4;
-
- /* See asm-x86/spec_ctrl_asm.h for usage. */
- unsigned int shadow_spec_ctrl;
- bool use_shadow_spec_ctrl;
- uint8_t bti_ist_info;
-
- unsigned long __pad;
+ union {
+ /* per physical cpu mapping */
+ struct {
+ struct vcpu *current_vcpu;
+ unsigned long per_cpu_offset;
+ unsigned long cr4;
+
+ /* See asm-x86/spec_ctrl_asm.h for usage. */
+ unsigned int shadow_spec_ctrl;
+ bool use_shadow_spec_ctrl;
+ uint8_t bti_ist_info;
+ unsigned long p_pad;
+ };
+ /* per vcpu mapping (xpti) */
+ struct {
+ unsigned long v_pad[4];
+ unsigned long stack_bottom_cpu;
+ };
+ };
+ unsigned int processor_id; /* per physical cpu mapping only */
+ unsigned int flags;
+#define ON_VCPUSTACK 0x00000001
+#define VCPUSTACK_ACTIVE 0x00000002
/* get_stack_bottom() must be 16-byte aligned */
};
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index b33c286807..1a4e92481c 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -505,6 +505,9 @@ struct pv_vcpu
/* Deferred VA-based update state. */
bool_t need_update_runstate_area;
struct vcpu_time_info pending_system_time;
+
+ /* If XPTI is active: pointer to user regs on stack. */
+ struct cpu_user_regs *stack_regs;
};
typedef enum __packed {
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index dfac89df0b..34c51bcfba 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -31,6 +31,7 @@ void pv_destroy_gdt(struct vcpu *v);
bool pv_map_ldt_shadow_page(unsigned int off);
bool pv_destroy_ldt(struct vcpu *v);
+void xpti_vcpu_destroy(struct vcpu *v);
int xpti_domain_init(struct domain *d);
void xpti_domain_destroy(struct domain *d);
@@ -65,6 +66,7 @@ static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
static inline bool pv_destroy_ldt(struct vcpu *v)
{ ASSERT_UNREACHABLE(); return false; }
+static inline void xpti_vcpu_init(struct vcpu *v) { }
static inline int xpti_domain_init(struct domain *d) { return 0; }
static inline void xpti_domain_destroy(struct domain *d) { }
--
2.13.6
* [PATCH v3 11/17] x86: modify interrupt handlers to support stack switching
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (9 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 10/17] x86: allocate per-vcpu stacks for interrupt entries Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 12/17] x86: activate per-vcpu stacks in case of xpti Juergen Gross
` (6 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
Modify the interrupt handlers to switch stacks on interrupt entry in
case they are running on a per-vcpu stack. The same applies to
returning to the guest: in case the context to be loaded is located on
a per-vcpu stack, switch to that stack before returning to the guest.
The NMI and MCE interrupt handlers share most of their code today. Use
the common part only after switching stacks, as this enables
calculating the correct stack address mostly at build time instead of
doing it all at runtime.
guest_cpu_user_regs() is modified to always return the correct address
of the user registers: either, as today, the one on the per physical
cpu stack, or the one on the per-vcpu stack. Depending on the usage
some callers of guest_cpu_user_regs() need to be adapted to use
get_cpu_info() instead, as sketched below.
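As a small illustration of the distinction made here (a sketch, not
part of the patch): after this change callers have to pick the frame
they really mean.

    /* Guest register frame; with XPTI active this may live on the
     * per-vcpu stack. */
    struct cpu_user_regs *gregs = guest_cpu_user_regs();

    /* Frame at the top of the per physical cpu stack, e.g. needed when
     * recording the physical stack bottom during context switch (see
     * the following patches). */
    struct cpu_user_regs *pregs = &get_cpu_info()->guest_cpu_user_regs;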
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- rework SWITCH_FROM_VCPU_STACK_IST to take the ist as parameter
---
xen/arch/x86/pv/xpti-stub.S | 4 ++--
xen/arch/x86/x86_64/asm-offsets.c | 2 ++
xen/arch/x86/x86_64/entry.S | 14 ++++++++++++--
xen/common/wait.c | 8 ++++----
xen/include/asm-x86/asm_defns.h | 19 +++++++++++++++++++
xen/include/asm-x86/current.h | 15 ++++++++++++++-
xen/include/asm-x86/processor.h | 12 ++++++------
7 files changed, 59 insertions(+), 15 deletions(-)
diff --git a/xen/arch/x86/pv/xpti-stub.S b/xen/arch/x86/pv/xpti-stub.S
index efa1e3f661..92f2ef6dac 100644
--- a/xen/arch/x86/pv/xpti-stub.S
+++ b/xen/arch/x86/pv/xpti-stub.S
@@ -26,7 +26,7 @@ ENTRY(xpti_lstar)
movl $TRAP_syscall, 4(%rsp)
SAVE_ALL
mov %rsp, %r12
-
+ SWITCH_FROM_VCPU_STACK
sti
SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
@@ -46,7 +46,7 @@ ENTRY(xpti_cstar)
movl $TRAP_syscall, 4(%rsp)
SAVE_ALL
movq %rsp, %r12
-
+ SWITCH_FROM_VCPU_STACK
sti
SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index cc7753c0a9..b0060be261 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -141,6 +141,8 @@ void __dummy__(void)
OFFSET(CPUINFO_shadow_spec_ctrl, struct cpu_info, shadow_spec_ctrl);
OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
+ OFFSET(CPUINFO_stack_bottom_cpu, struct cpu_info, stack_bottom_cpu);
+ OFFSET(CPUINFO_flags, struct cpu_info, flags);
DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
BLANK();
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 69590d0b17..909f6eea66 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -45,6 +45,7 @@ restore_all_guest:
/* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
SPEC_CTRL_EXIT_TO_GUEST /* Req: a=spec_ctrl %rsp=cpuinfo, Clob: cd */
+ SWITCH_TO_VCPU_STACK
RESTORE_ALL
testw $TRAP_syscall,4(%rsp)
jz iret_exit_to_guest
@@ -202,7 +203,6 @@ process_trap:
jmp test_all_events
ENTRY(sysenter_entry)
- sti
pushq $FLAT_USER_SS
pushq $0
pushfq
@@ -214,6 +214,8 @@ GLOBAL(sysenter_eflags_saved)
movl $TRAP_syscall, 4(%rsp)
SAVE_ALL
mov %rsp, %r12
+ SWITCH_FROM_VCPU_STACK
+ sti
SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
@@ -254,6 +256,7 @@ ENTRY(int80_direct_trap)
movl $0x80, 4(%rsp)
SAVE_ALL
mov %rsp, %r12
+ SWITCH_FROM_VCPU_STACK
SPEC_CTRL_ENTRY_FROM_PV /* Req: %r12=regs, %rsp=cpuinfo, Clob: acd */
/* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
@@ -422,6 +425,7 @@ ENTRY(dom_crash_sync_extable)
ENTRY(common_interrupt)
SAVE_ALL CLAC
mov %rsp, %r12
+ SWITCH_FROM_VCPU_STACK
GET_STACK_END(14)
@@ -449,6 +453,7 @@ ENTRY(page_fault)
GLOBAL(handle_exception)
SAVE_ALL CLAC
mov %rsp, %r12
+ SWITCH_FROM_VCPU_STACK
GET_STACK_END(14)
@@ -631,6 +636,7 @@ ENTRY(double_fault)
/* Set AC to reduce chance of further SMAP faults */
SAVE_ALL STAC
movq %rsp, %r12
+ SWITCH_FROM_VCPU_STACK_IST(IST_DF)
GET_STACK_END(14)
@@ -653,10 +659,11 @@ ENTRY(early_page_fault)
ENTRY(nmi)
pushq $0
movl $TRAP_nmi,4(%rsp)
-handle_ist_exception:
SAVE_ALL CLAC
mov %rsp, %r12
+ SWITCH_FROM_VCPU_STACK_IST(IST_NMI)
+handle_ist_exception:
GET_STACK_END(14)
SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %r12=regs, %r14=end, Clob: acd */
@@ -703,6 +710,9 @@ handle_ist_exception:
ENTRY(machine_check)
pushq $0
movl $TRAP_machine_check,4(%rsp)
+ SAVE_ALL CLAC
+ mov %rsp, %r12
+ SWITCH_FROM_VCPU_STACK_IST(IST_MCE)
jmp handle_ist_exception
/* Enable NMIs. No special register assumptions. Only %rax is not preserved. */
diff --git a/xen/common/wait.c b/xen/common/wait.c
index a57bc10d61..fbb5d996e5 100644
--- a/xen/common/wait.c
+++ b/xen/common/wait.c
@@ -122,10 +122,10 @@ void wake_up_all(struct waitqueue_head *wq)
static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
{
- struct cpu_info *cpu_info = get_cpu_info();
+ struct cpu_user_regs *user_regs = guest_cpu_user_regs();
struct vcpu *curr = current;
unsigned long dummy;
- u32 entry_vector = cpu_info->guest_cpu_user_regs.entry_vector;
+ u32 entry_vector = user_regs->entry_vector;
ASSERT(wqv->esp == 0);
@@ -160,7 +160,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
"pop %%r11; pop %%r10; pop %%r9; pop %%r8;"
"pop %%rbp; pop %%rdx; pop %%rbx; pop %%rax"
: "=&S" (wqv->esp), "=&c" (dummy), "=&D" (dummy)
- : "i" (PAGE_SIZE), "0" (0), "1" (cpu_info), "2" (wqv->stack)
+ : "i" (PAGE_SIZE), "0" (0), "1" (user_regs), "2" (wqv->stack)
: "memory" );
if ( unlikely(wqv->esp == 0) )
@@ -169,7 +169,7 @@ static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
domain_crash_synchronous();
}
- cpu_info->guest_cpu_user_regs.entry_vector = entry_vector;
+ user_regs->entry_vector = entry_vector;
}
static void __finish_wait(struct waitqueue_vcpu *wqv)
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index 7d26391be8..f626cc6134 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -7,6 +7,7 @@
#include <asm/asm-offsets.h>
#endif
#include <asm/bug.h>
+#include <asm/current.h>
#include <asm/page.h>
#include <asm/processor.h>
#include <asm/percpu.h>
@@ -136,6 +137,24 @@ void ret_from_intr(void);
GET_STACK_END(reg); \
movq STACK_CPUINFO_FIELD(current_vcpu)(%r##reg), %r##reg
+#define SWITCH_FROM_VCPU_STACK \
+ GET_STACK_END(ax); \
+ testb $ON_VCPUSTACK, STACK_CPUINFO_FIELD(flags)(%rax); \
+ jz 1f; \
+ movq STACK_CPUINFO_FIELD(stack_bottom_cpu)(%rax), %rsp; \
+1:
+
+#define SWITCH_FROM_VCPU_STACK_IST(ist) \
+ GET_STACK_END(ax); \
+ testb $ON_VCPUSTACK, STACK_CPUINFO_FIELD(flags)(%rax); \
+ jz 1f; \
+ sub $(STACK_SIZE - 1 - ist * PAGE_SIZE), %rax; \
+ mov %rax, %rsp; \
+1:
+
+#define SWITCH_TO_VCPU_STACK \
+ mov %r12, %rsp
+
#ifndef NDEBUG
#define ASSERT_NOT_IN_ATOMIC \
sti; /* sometimes called with interrupts disabled: safe to enable */ \
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 5963114e08..e128c13a1e 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -9,8 +9,10 @@
#include <xen/percpu.h>
#include <public/xen.h>
+#include <asm/config.h>
#include <asm/page.h>
+#ifndef __ASSEMBLY__
/*
* Xen's physical cpu stacks are 8 pages (8-page aligned), arranged as:
*
@@ -71,8 +73,10 @@ struct cpu_info {
};
unsigned int processor_id; /* per physical cpu mapping only */
unsigned int flags;
+#endif /* !__ASSEMBLY__ */
#define ON_VCPUSTACK 0x00000001
#define VCPUSTACK_ACTIVE 0x00000002
+#ifndef __ASSEMBLY__
/* get_stack_bottom() must be 16-byte aligned */
};
@@ -97,9 +101,16 @@ static inline struct cpu_info *get_cpu_info(void)
#define set_processor_id(id) do { \
struct cpu_info *ci__ = get_cpu_info(); \
ci__->per_cpu_offset = __per_cpu_offset[ci__->processor_id = (id)]; \
+ ci__->flags = 0; \
} while (0)
-#define guest_cpu_user_regs() (&get_cpu_info()->guest_cpu_user_regs)
+#define guest_cpu_user_regs() ({ \
+ struct cpu_info *info = get_cpu_info(); \
+ if ( info->flags & VCPUSTACK_ACTIVE ) \
+ info = (struct cpu_info *)(XPTI_START(info->current_vcpu) + \
+ STACK_SIZE) - 1; \
+ &info->guest_cpu_user_regs; \
+})
/*
* Get the bottom-of-stack, as stored in the per-CPU TSS. This actually points
@@ -142,4 +153,6 @@ unsigned long get_stack_dump_bottom (unsigned long sp);
*/
DECLARE_PER_CPU(struct vcpu *, curr_vcpu);
+#endif /* !__ASSEMBLY__ */
+
#endif /* __X86_CURRENT_H__ */
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 625f6e9f69..58e47bf6e1 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -97,6 +97,12 @@
X86_EFLAGS_NT|X86_EFLAGS_DF|X86_EFLAGS_IF| \
X86_EFLAGS_TF)
+#define IST_NONE _AC(0,UL)
+#define IST_DF _AC(1,UL)
+#define IST_NMI _AC(2,UL)
+#define IST_MCE _AC(3,UL)
+#define IST_MAX _AC(3,UL)
+
#ifndef __ASSEMBLY__
struct domain;
@@ -400,12 +406,6 @@ struct __packed __cacheline_aligned tss_struct {
uint8_t __cacheline_filler[24];
};
-#define IST_NONE 0UL
-#define IST_DF 1UL
-#define IST_NMI 2UL
-#define IST_MCE 3UL
-#define IST_MAX 3UL
-
/* Set the interrupt stack table used by a particular interrupt
* descriptor table entry. */
static always_inline void set_ist(idt_entry_t *idt, unsigned long ist)
--
2.13.6
* [PATCH v3 12/17] x86: activate per-vcpu stacks in case of xpti
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (10 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 11/17] x86: modify interrupt handlers to support stack switching Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 13/17] x86: allocate hypervisor L4 page table for XPTI Juergen Gross
` (5 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
When scheduling a vcpu subject to XPTI, activate the per-vcpu stacks
by loading the vcpu specific gdt and tss. When de-scheduling such a
vcpu, switch back to the per physical cpu gdt and tss.
Accessing the user registers on the stack is done via helpers, as
depending on whether XPTI is active the registers are located either
on the per-vcpu stack or on the default stack.
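Condensed to its essence, activation on the way into a XPTI vcpu means
pointing the hardware at the per-vcpu mappings instead of the per
physical cpu ones (a sketch distilled from the context switch hunk
below):

    /* The TSS now lives in the per-vcpu area ... */
    _set_tss_desc(gdt + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY, XPTI_TSS(n),
                  offsetof(struct tss_struct, __cacheline_filler) - 1,
                  SYS_DESC_tss_avail);
    ltr(TSS_ENTRY << 3);

    /* ... and the syscall entry points are the per-vcpu stub mappings. */
    wrmsrl(MSR_LSTAR, XPTI_TRAMPOLINE(n) +
                      ((unsigned long)&xpti_lstar & ~PAGE_MASK));
    wrmsrl(MSR_CSTAR, XPTI_TRAMPOLINE(n) +
                      ((unsigned long)&xpti_cstar & ~PAGE_MASK));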
Signed-off-by: Juergen Gross <jgross@suse.com>
---
V3:
- moved some code to xpti.c
- fix error for HVM domains: reset MSRs of XPTI domain before setting
them for HVM
- fix error for 32 bit pv domains: setup GDT in case it was reset after
descheduling a XPTI domain
- reuse percpu GDT for XPTI domains by writing TSS entry (Jan Beulich)
- avoid LTRs and WRMSRs if possible (Jan Beulich)
---
xen/arch/x86/domain.c | 99 +++++++++++++++++++++++++++++++++++++++++++---
xen/include/asm-x86/regs.h | 2 +
2 files changed, 95 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 6dd47bb2bb..8d6dc73881 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1594,24 +1594,55 @@ static inline bool need_full_gdt(const struct domain *d)
return is_pv_domain(d) && !is_idle_domain(d);
}
+/*
+ * Get address of registers on stack: this is either on the per physical cpu
+ * stack (XPTI is off) or on the per-vcpu stack (XPTI is on)
+ */
+static inline struct cpu_user_regs *get_regs_on_stack(const struct vcpu *v)
+{
+ return is_vcpu_xpti_active(v) ? v->arch.pv_vcpu.stack_regs
+ : &get_cpu_info()->guest_cpu_user_regs;
+}
+
+static inline void copy_user_regs_from_stack(struct vcpu *v)
+{
+ memcpy(&v->arch.user_regs, get_regs_on_stack(v), CTXT_SWITCH_STACK_BYTES);
+}
+
+static inline void copy_user_regs_to_stack(const struct vcpu *v)
+{
+ memcpy(get_regs_on_stack(v), &v->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
+}
+
static void __context_switch(void)
{
- struct cpu_user_regs *stack_regs = guest_cpu_user_regs();
unsigned int cpu = smp_processor_id();
struct vcpu *p = per_cpu(curr_vcpu, cpu);
struct vcpu *n = current;
struct domain *pd = p->domain, *nd = n->domain;
struct desc_struct *gdt;
struct desc_ptr gdt_desc;
+ bool is_pv_gdt;
ASSERT(p != n);
ASSERT(!vcpu_cpu_dirty(n));
if ( !is_idle_domain(pd) )
{
- memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
+ copy_user_regs_from_stack(p);
vcpu_save_fpu(p);
pd->arch.ctxt_switch->from(p);
+ if ( is_domain_xpti_active(pd) && !is_domain_xpti_active(nd) )
+ {
+ unsigned long stub_va = this_cpu(stubs.addr);
+
+ wrmsrl(MSR_LSTAR, stub_va);
+ wrmsrl(MSR_CSTAR, stub_va + STUB_TRAMPOLINE_SIZE_PERCPU);
+ if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR )
+ wrmsrl(MSR_IA32_SYSENTER_ESP,
+ (unsigned long)&get_cpu_info()->guest_cpu_user_regs.es);
+ }
}
/*
@@ -1625,7 +1656,7 @@ static void __context_switch(void)
if ( !is_idle_domain(nd) )
{
- memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
+ copy_user_regs_to_stack(n);
if ( cpu_has_xsave )
{
u64 xcr0 = n->arch.xcr0 ?: XSTATE_FP_SSE;
@@ -1655,24 +1686,80 @@ static void __context_switch(void)
l1e_from_pfn(mfn + i, __PAGE_HYPERVISOR_RW));
}
- if ( need_full_gdt(pd) &&
- ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd)) )
+ is_pv_gdt = need_full_gdt(pd);
+ if ( is_pv_gdt &&
+ ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(nd) ||
+ (pd->arch.pv_domain.xpti && !is_domain_xpti_active(nd))) )
{
gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
gdt_desc.base = (unsigned long)(gdt - FIRST_RESERVED_GDT_ENTRY);
lgdt(&gdt_desc);
+ is_pv_gdt = false;
+
+ /*
+ * When switching from XPTI domain to non-XPTI domain or when changing
+ * vcpu_id of XPTI domains we need to switch to the per physical cpu
+ * TSS in order to avoid either unmapped stacks or stacks being in use
+ * on multiple cpus at the same time.
+ */
+ if ( pd->arch.pv_domain.xpti )
+ {
+ _set_tss_desc(gdt + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
+ (unsigned long)&this_cpu(init_tss),
+ offsetof(struct tss_struct, __cacheline_filler) - 1,
+ SYS_DESC_tss_avail);
+ ltr(TSS_ENTRY << 3);
+ get_cpu_info()->flags &= ~VCPUSTACK_ACTIVE;
+ }
}
write_ptbase(n);
+ if ( is_domain_xpti_active(nd) )
+ {
+ struct cpu_info *info;
+
+ /* Don't use guest_cpu_user_regs(), might point to vcpu stack. */
+ info = (struct cpu_info *)(XPTI_START(n) + STACK_SIZE) - 1;
+ info->stack_bottom_cpu =
+ (unsigned long)&get_cpu_info()->guest_cpu_user_regs;
+ }
+
if ( need_full_gdt(nd) &&
- ((p->vcpu_id != n->vcpu_id) || !need_full_gdt(pd)) )
+ (!is_pv_gdt ||
+ (nd->arch.pv_domain.xpti && !is_domain_xpti_active(pd))) )
{
gdt_desc.limit = LAST_RESERVED_GDT_BYTE;
gdt_desc.base = GDT_VIRT_START(n);
lgdt(&gdt_desc);
+
+ /*
+ * Either we are currently on physical cpu TSS and stacks, or we have
+ * switched vcpu_id. In both cases we need to reload TSS and MSRs with
+ * stub addresses when we enter a XPTI domain.
+ */
+ if ( nd->arch.pv_domain.xpti )
+ {
+ unsigned long stub_va = XPTI_TRAMPOLINE(n);
+
+ _set_tss_desc(gdt + TSS_ENTRY - FIRST_RESERVED_GDT_ENTRY,
+ XPTI_TSS(n),
+ offsetof(struct tss_struct, __cacheline_filler) - 1,
+ SYS_DESC_tss_avail);
+
+ ltr(TSS_ENTRY << 3);
+ get_cpu_info()->flags |= VCPUSTACK_ACTIVE;
+ wrmsrl(MSR_LSTAR,
+ stub_va + ((unsigned long)&xpti_lstar & ~PAGE_MASK));
+ wrmsrl(MSR_CSTAR,
+ stub_va + ((unsigned long)&xpti_cstar & ~PAGE_MASK));
+ if ( boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
+ boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR )
+ wrmsrl(MSR_IA32_SYSENTER_ESP,
+ (unsigned long)&guest_cpu_user_regs()->es);
+ }
}
if ( pd != nd )
diff --git a/xen/include/asm-x86/regs.h b/xen/include/asm-x86/regs.h
index 725a664e0a..361de4c54e 100644
--- a/xen/include/asm-x86/regs.h
+++ b/xen/include/asm-x86/regs.h
@@ -7,6 +7,8 @@
#define guest_mode(r) \
({ \
unsigned long diff = (char *)guest_cpu_user_regs() - (char *)(r); \
+ if ( diff >= STACK_SIZE ) \
+ diff = (char *)&get_cpu_info()->guest_cpu_user_regs - (char *)(r); \
/* Frame pointer must point into current CPU stack. */ \
ASSERT(diff < STACK_SIZE); \
/* If not a guest frame, it must be a hypervisor frame. */ \
--
2.13.6
* [PATCH v3 13/17] x86: allocate hypervisor L4 page table for XPTI
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (11 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 12/17] x86: activate per-vcpu stacks in case of xpti Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 14/17] xen: add domain pointer to fill_ro_mpt() and zap_ro_mpt() functions Juergen Gross
` (4 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
When XPTI is switched on for a domain, allocate a hypervisor L4 page
table for each guest L4 page table. For performance reasons keep a
cache of the most recently used hypervisor L4 pages, with the maximum
number of cached pages depending on the number of vcpus of the guest.
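To give a feel for the sizing, a worked example derived from the
constants in xpti.c below (no new tunables involved), for a guest with
4 vcpus:

    max_l4(d)    = 4 * N_L4_PERVCPU      = 4 * 64 = 256  shadows at most
    max_free(xd) = 4 * N_L4_FREE_PERVCPU = 4 * 16 =  64  shadows kept on the free list
    min_free(xd) = 4 * N_L4_MIN_PERVCPU  = 4 *  2 =   8  shadows pre-allocated up front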
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/mm.c | 4 +
xen/arch/x86/mm/shadow/multi.c | 3 +
xen/arch/x86/pv/xpti.c | 549 ++++++++++++++++++++++++++++++++++++++++-
xen/include/asm-x86/domain.h | 1 +
xen/include/asm-x86/pv/mm.h | 4 +
5 files changed, 560 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d86e07e9f8..f615204dbb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -504,6 +504,8 @@ void free_shared_domheap_page(struct page_info *page)
void make_cr3(struct vcpu *v, mfn_t mfn)
{
+ if ( is_vcpu_xpti_active(v) )
+ xpti_make_cr3(v, mfn_x(mfn));
v->arch.cr3 = mfn_x(mfn) << PAGE_SHIFT;
}
@@ -1807,6 +1809,8 @@ static int free_l4_table(struct page_info *page)
if ( rc >= 0 )
{
atomic_dec(&d->arch.pv_domain.nr_l4_pages);
+ if ( d->arch.pv_domain.xpti )
+ xpti_free_l4(d, pfn);
rc = 0;
}
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index a6372e3a02..2d42959f53 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -38,6 +38,7 @@ asm(".file \"" __OBJECT_FILE__ "\"");
#include <asm/hvm/hvm.h>
#include <asm/hvm/cacheattr.h>
#include <asm/mtrr.h>
+#include <asm/pv/mm.h>
#include <asm/guest_pt.h>
#include <public/sched.h>
#include "private.h"
@@ -1895,6 +1896,8 @@ void sh_destroy_l4_shadow(struct domain *d, mfn_t smfn)
/* Put the memory back in the pool */
shadow_free(d, smfn);
+ if ( is_domain_xpti_active(d) )
+ xpti_free_l4(d, mfn_x(smfn));
}
void sh_destroy_l3_shadow(struct domain *d, mfn_t smfn)
diff --git a/xen/arch/x86/pv/xpti.c b/xen/arch/x86/pv/xpti.c
index 1356541804..f663fae806 100644
--- a/xen/arch/x86/pv/xpti.c
+++ b/xen/arch/x86/pv/xpti.c
@@ -22,8 +22,75 @@
#include <xen/domain_page.h>
#include <xen/errno.h>
#include <xen/init.h>
+#include <xen/keyhandler.h>
#include <xen/lib.h>
#include <xen/sched.h>
+#include <asm/bitops.h>
+
+/*
+ * For each L4 page table of the guest we need a shadow for the hypervisor.
+ *
+ * Such a shadow is considered to be active when the guest has loaded its
+ * %cr3 on any vcpu with the MFN of the L4 page the shadow is associated
+ * with.
+ *
+ * The shadows are referenced via an array of struct xpti_l4pg. This array
+ * is set up at domain creation time and sized by number of vcpus of the
+ * domain (N_L4_PERVCPU * max_vcpus). The index into this array is used to
+ * address the single xpti_l4pg instances.
+ *
+ * A xpti_l4pg associated to a guest's L4 page table is put in a linked list
+ * anchored in the xpti_l4ref hash array. The index into this array is taken
+ * from the lower bits of the guest's L4 page table MFN.
+ * The hash lists are sorted by a LRU mechanism. Additionally all xpti_l4pg's
+ * in any hash list but those currently being active are in a single big
+ * LRU list.
+ *
+ * Whenever a guest L4 page table is being unpinned its associated shadow is
+ * put in a free list to avoid the need to allocate a new shadow from the heap
+ * when a new L4 page is being pinned. This free list is limited in its length
+ * to (N_L4_FREE_PERVCPU * max_vcpus).
+ *
+ * New shadows are obtained from the following resources (first hit wins):
+ * - from the free list
+ * - if maximum number of shadows not yet reached allocation from the heap
+ * - from the end of the global lru list
+ *
+ * At domain creation the free list is initialized with N_L4_MIN_PERVCPU per
+ * vcpu free shadows in order to have a minimal working set technically not
+ * requiring additional allocations.
+ */
+
+#define XPTI_DEBUG
+
+#define N_L4_PERVCPU 64
+#define N_L4_FREE_PERVCPU 16
+#define N_L4_MIN_PERVCPU 2
+
+#define L4_INVALID ~0
+
+#define max_l4(d) ((d)->max_vcpus * N_L4_PERVCPU)
+#define max_free(xd) (N_L4_FREE_PERVCPU * (xd)->domain->max_vcpus)
+#define min_free(xd) (N_L4_MIN_PERVCPU * (xd)->domain->max_vcpus)
+
+#ifdef XPTI_DEBUG
+#define XPTI_CNT(what) xd->what++
+#else
+#define XPTI_CNT(what)
+#endif
+
+struct xpti_l4ref {
+ unsigned int idx; /* First shadow */
+};
+
+struct xpti_l4pg {
+ unsigned long guest_mfn; /* MFN of guest L4 page */
+ unsigned long xen_mfn; /* MFN of associated shadow */
+ unsigned int ref_next; /* Next shadow, anchored in xpti_l4ref */
+ unsigned int lru_next; /* Global LRU list */
+ unsigned int lru_prev;
+ unsigned int active_cnt; /* Number of vcpus the shadow is active on */
+};
#define XPTI_STACK_SIZE 512
#define XPTI_STACK_N (XPTI_STACK_SIZE / 8)
@@ -40,7 +107,30 @@ struct xpti_stack {
};
struct xpti_domain {
+ struct xpti_l4ref *l4ref; /* Hash array */
+ struct xpti_l4pg *l4pg; /* Shadow admin array */
+ unsigned int l4ref_size; /* Hash size */
+ unsigned int n_alloc; /* Number of allocated shadows */
+ unsigned int n_free; /* Number of free shadows */
+ unsigned int lru_first; /* LRU list of associated shadows */
+ unsigned int lru_last;
+ unsigned int free_first; /* List of free shadows */
+ unsigned int unused_first; /* List of unused slots */
+ spinlock_t lock; /* Protects all shadow lists */
+ struct domain *domain;
+ struct tasklet tasklet;
l1_pgentry_t **perdom_l1tab;
+#ifdef XPTI_DEBUG
+ unsigned int cnt_alloc;
+ unsigned int cnt_free;
+ unsigned int cnt_getfree;
+ unsigned int cnt_putfree;
+ unsigned int cnt_getforce;
+ unsigned int cnt_activate;
+ unsigned int cnt_deactivate;
+ unsigned int cnt_newl4;
+ unsigned int cnt_freel4;
+#endif
};
static __read_mostly enum {
@@ -77,22 +167,371 @@ static int parse_xpti(const char *s)
custom_runtime_param("xpti", parse_xpti);
+static unsigned int xpti_shadow_add(struct xpti_domain *xd, unsigned long mfn)
+{
+ unsigned int new = xd->unused_first;
+ struct xpti_l4pg *l4pg = xd->l4pg + new;
+
+ if ( xd->n_alloc >= max_l4(xd->domain) )
+ new = L4_INVALID;
+ if ( new != L4_INVALID )
+ {
+ XPTI_CNT(cnt_alloc);
+ xd->unused_first = l4pg->lru_next;
+ l4pg->xen_mfn = mfn;
+ xd->n_alloc++;
+ }
+
+ return new;
+}
+
+static void *xpti_shadow_free(struct xpti_domain *xd, unsigned int free)
+{
+ struct xpti_l4pg *l4pg = xd->l4pg + free;
+ void *virt;
+
+ XPTI_CNT(cnt_free);
+ ASSERT(xd->n_alloc);
+ virt = mfn_to_virt(l4pg->xen_mfn);
+ l4pg->lru_next = xd->unused_first;
+ xd->unused_first = free;
+ xd->n_alloc--;
+
+ return virt;
+}
+
+static unsigned int xpti_shadow_getfree(struct xpti_domain *xd)
+{
+ unsigned free = xd->free_first;
+ struct xpti_l4pg *l4pg = xd->l4pg + free;
+
+ if ( free != L4_INVALID )
+ {
+ XPTI_CNT(cnt_getfree);
+ xd->free_first = l4pg->lru_next;
+ ASSERT(xd->n_free);
+ xd->n_free--;
+ l4pg->lru_next = L4_INVALID;
+
+ if ( !xd->n_free && xd->n_alloc < max_l4(xd->domain) &&
+ !xd->domain->is_dying )
+ tasklet_schedule(&xd->tasklet);
+ }
+
+ return free;
+}
+
+static void xpti_shadow_putfree(struct xpti_domain *xd, unsigned int free)
+{
+ struct xpti_l4pg *l4pg = xd->l4pg + free;
+
+ ASSERT(free != L4_INVALID);
+ XPTI_CNT(cnt_putfree);
+ l4pg->lru_prev = L4_INVALID;
+ l4pg->lru_next = xd->free_first;
+ xd->free_first = free;
+ xd->n_free++;
+
+ if ( xd->n_free > max_free(xd) && !xd->domain->is_dying )
+ tasklet_schedule(&xd->tasklet);
+}
+
+static struct xpti_l4ref *xpti_get_hashentry_mfn(struct xpti_domain *xd,
+ unsigned long mfn)
+{
+ return xd->l4ref + (mfn & (xd->l4ref_size - 1));
+}
+
+static struct xpti_l4ref *xpti_get_hashentry(struct xpti_domain *xd,
+ unsigned int idx)
+{
+ struct xpti_l4pg *l4pg = xd->l4pg + idx;
+
+ return xpti_get_hashentry_mfn(xd, l4pg->guest_mfn);
+}
+
+static unsigned int xpti_shadow_from_hashlist(struct xpti_domain *xd,
+ unsigned long mfn)
+{
+ struct xpti_l4ref *l4ref;
+ unsigned int ref_idx;
+
+ l4ref = xpti_get_hashentry_mfn(xd, mfn);
+ ref_idx = l4ref->idx;
+ while ( ref_idx != L4_INVALID && xd->l4pg[ref_idx].guest_mfn != mfn )
+ ref_idx = xd->l4pg[ref_idx].ref_next;
+
+ return ref_idx;
+}
+
+static void xpti_shadow_deactivate(struct xpti_domain *xd, unsigned int idx)
+{
+ struct xpti_l4pg *l4pg = xd->l4pg + idx;
+ struct xpti_l4ref *l4ref;
+ unsigned int ref_idx;
+
+ /* Decrement active count. If still > 0 we are done. */
+ XPTI_CNT(cnt_deactivate);
+ ASSERT(l4pg->active_cnt > 0);
+ l4pg->active_cnt--;
+ if ( l4pg->active_cnt )
+ return;
+
+ /* Put in hash list at first position for its hash entry. */
+ l4ref = xpti_get_hashentry(xd, idx);
+ ref_idx = l4ref->idx;
+ ASSERT(ref_idx != L4_INVALID);
+ /* Only need to do something if not already in front. */
+ if ( ref_idx != idx )
+ {
+ /* Search for entry referencing our element. */
+ while ( xd->l4pg[ref_idx].ref_next != idx )
+ ref_idx = xd->l4pg[ref_idx].ref_next;
+
+ /* Dequeue and put to front of list. */
+ xd->l4pg[ref_idx].ref_next = l4pg->ref_next;
+ l4pg->ref_next = l4ref->idx;
+ l4ref->idx = idx;
+ }
+
+ /* Put into LRU list at first position. */
+ l4pg->lru_next = xd->lru_first;
+ l4pg->lru_prev = L4_INVALID;
+ xd->lru_first = idx;
+ if ( xd->lru_last == L4_INVALID )
+ xd->lru_last = idx;
+ else if ( l4pg->lru_next != L4_INVALID )
+ xd->l4pg[l4pg->lru_next].lru_prev = idx;
+}
+
+static void xpti_shadow_lru_remove(struct xpti_domain *xd, unsigned int idx)
+{
+ struct xpti_l4pg *l4pg = xd->l4pg + idx;
+ unsigned int prev = l4pg->lru_prev;
+ unsigned int next = l4pg->lru_next;
+
+ if ( prev != L4_INVALID )
+ xd->l4pg[prev].lru_next = next;
+ else if ( xd->lru_first == idx )
+ xd->lru_first = next;
+ if ( next != L4_INVALID )
+ xd->l4pg[next].lru_prev = prev;
+ else if ( xd->lru_last == idx )
+ xd->lru_last = prev;
+ l4pg->lru_prev = L4_INVALID;
+ l4pg->lru_next = L4_INVALID;
+}
+
+static void xpti_shadow_hash_remove(struct xpti_domain *xd, unsigned int idx)
+{
+ struct xpti_l4pg *l4pg = xd->l4pg + idx;
+ struct xpti_l4ref *l4ref;
+ unsigned int ref_idx;
+
+ l4ref = xpti_get_hashentry(xd, idx);
+ ref_idx = l4ref->idx;
+ ASSERT(ref_idx != L4_INVALID);
+ if ( ref_idx == idx )
+ {
+ l4ref->idx = l4pg->ref_next;
+ }
+ else
+ {
+ while ( xd->l4pg[ref_idx].ref_next != idx )
+ ref_idx = xd->l4pg[ref_idx].ref_next;
+ xd->l4pg[ref_idx].ref_next = l4pg->ref_next;
+ }
+}
+
+static unsigned int xpti_shadow_getforce(struct xpti_domain *xd)
+{
+ unsigned int idx = xd->lru_last;
+
+ XPTI_CNT(cnt_getforce);
+ ASSERT(idx != L4_INVALID);
+ ASSERT(!xd->l4pg[idx].active_cnt);
+
+ xpti_shadow_hash_remove(xd, idx);
+ xpti_shadow_lru_remove(xd, idx);
+
+ return idx;
+}
+
+static unsigned int xpti_shadow_get(struct xpti_domain *xd, unsigned long mfn)
+{
+ unsigned int idx;
+ struct xpti_l4ref *l4ref;
+ struct xpti_l4pg *l4pg;
+
+ idx = xpti_shadow_from_hashlist(xd, mfn);
+ if ( idx != L4_INVALID )
+ {
+ /* Remove from LRU list if currently not active. */
+ if ( !xd->l4pg[idx].active_cnt )
+ xpti_shadow_lru_remove(xd, idx);
+
+ return idx;
+ }
+
+ XPTI_CNT(cnt_newl4);
+ idx = xpti_shadow_getfree(xd);
+ if ( idx == L4_INVALID )
+ idx = xpti_shadow_getforce(xd);
+
+ /* Set mfn and insert in hash list. */
+ l4ref = xpti_get_hashentry_mfn(xd, mfn);
+ l4pg = xd->l4pg + idx;
+ l4pg->guest_mfn = mfn;
+ l4pg->ref_next = l4ref->idx;
+ l4ref->idx = idx;
+
+ return idx;
+}
+
+static unsigned int xpti_shadow_activate(struct xpti_domain *xd,
+ unsigned long mfn)
+{
+ unsigned int idx;
+ struct xpti_l4pg *l4pg;
+
+ XPTI_CNT(cnt_activate);
+ idx = xpti_shadow_get(xd, mfn);
+ l4pg = xd->l4pg + idx;
+
+ l4pg->active_cnt++;
+
+ return idx;
+}
+
+void xpti_make_cr3(struct vcpu *v, unsigned long mfn)
+{
+ struct xpti_domain *xd = v->domain->arch.pv_domain.xpti;
+ unsigned long flags;
+ unsigned int idx;
+
+ spin_lock_irqsave(&xd->lock, flags);
+
+ idx = v->arch.pv_vcpu.xen_cr3_shadow;
+
+ /* First activate new shadow. */
+ v->arch.pv_vcpu.xen_cr3_shadow = xpti_shadow_activate(xd, mfn);
+
+ /* Deactivate old shadow if applicable. */
+ if ( idx != L4_INVALID )
+ xpti_shadow_deactivate(xd, idx);
+
+ spin_unlock_irqrestore(&xd->lock, flags);
+}
+
+void xpti_free_l4(struct domain *d, unsigned long mfn)
+{
+ struct xpti_domain *xd = d->arch.pv_domain.xpti;
+ unsigned long flags;
+ unsigned int idx;
+
+ spin_lock_irqsave(&xd->lock, flags);
+
+ idx = xpti_shadow_from_hashlist(xd, mfn);
+ if ( idx != L4_INVALID )
+ {
+ XPTI_CNT(cnt_freel4);
+ /* Might still be active in a vcpu to be destroyed. */
+ if ( !xd->l4pg[idx].active_cnt )
+ {
+ xpti_shadow_lru_remove(xd, idx);
+ xpti_shadow_hash_remove(xd, idx);
+ xpti_shadow_putfree(xd, idx);
+ }
+ }
+
+ spin_unlock_irqrestore(&xd->lock, flags);
+}
+
+static void xpti_tasklet(unsigned long _xd)
+{
+ struct xpti_domain *xd = (struct xpti_domain *)_xd;
+ void *virt;
+ unsigned long flags;
+ unsigned int free;
+
+ spin_lock_irqsave(&xd->lock, flags);
+
+ while ( xd->n_free < min_free(xd) && xd->n_alloc < max_l4(xd->domain) )
+ {
+ spin_unlock_irqrestore(&xd->lock, flags);
+ virt = alloc_xenheap_pages(0, MEMF_node(domain_to_node(xd->domain)));
+ spin_lock_irqsave(&xd->lock, flags);
+ if ( !virt )
+ break;
+ free = xpti_shadow_add(xd, virt_to_mfn(virt));
+ if ( free == L4_INVALID )
+ {
+ spin_unlock_irqrestore(&xd->lock, flags);
+ free_xenheap_page(virt);
+ spin_lock_irqsave(&xd->lock, flags);
+ break;
+ }
+ xpti_shadow_putfree(xd, free);
+ }
+
+ while ( xd->n_free > max_free(xd) )
+ {
+ free = xpti_shadow_getfree(xd);
+ ASSERT(free != L4_INVALID);
+ virt = xpti_shadow_free(xd, free);
+ spin_unlock_irqrestore(&xd->lock, flags);
+ free_xenheap_page(virt);
+ spin_lock_irqsave(&xd->lock, flags);
+ }
+
+ spin_unlock_irqrestore(&xd->lock, flags);
+}
+
void xpti_domain_destroy(struct domain *d)
{
struct xpti_domain *xd = d->arch.pv_domain.xpti;
+ unsigned int idx;
if ( !xd )
return;
+ tasklet_kill(&xd->tasklet);
+
+ while ( xd->lru_first != L4_INVALID ) {
+ idx = xd->lru_first;
+ xpti_shadow_lru_remove(xd, idx);
+ free_xenheap_page(xpti_shadow_free(xd, idx));
+ }
+
+ while ( xd->n_free ) {
+ idx = xpti_shadow_getfree(xd);
+ free_xenheap_page(xpti_shadow_free(xd, idx));
+ }
+
xfree(xd->perdom_l1tab);
+ xfree(xd->l4pg);
+ xfree(xd->l4ref);
xfree(xd);
d->arch.pv_domain.xpti = NULL;
}
void xpti_vcpu_destroy(struct vcpu *v)
{
- if ( v->domain->arch.pv_domain.xpti )
+ struct xpti_domain *xd = v->domain->arch.pv_domain.xpti;
+ unsigned long flags;
+
+ if ( xd )
{
+ spin_lock_irqsave(&xd->lock, flags);
+
+ if ( v->arch.pv_vcpu.xen_cr3_shadow != L4_INVALID )
+ {
+ xpti_shadow_deactivate(xd, v->arch.pv_vcpu.xen_cr3_shadow);
+ v->arch.pv_vcpu.xen_cr3_shadow = L4_INVALID;
+ }
+
+ spin_unlock_irqrestore(&xd->lock, flags);
+
free_xenheap_page(v->arch.pv_vcpu.stack_regs);
v->arch.pv_vcpu.stack_regs = NULL;
destroy_perdomain_mapping(v->domain, XPTI_START(v), STACK_PAGES);
@@ -155,6 +594,8 @@ static int xpti_vcpu_init(struct vcpu *v)
l1e_write(pl1e + STACK_PAGES - 2,
l1e_from_pfn(virt_to_mfn(xpti_lstar), __PAGE_HYPERVISOR_RX));
+ v->arch.pv_vcpu.xen_cr3_shadow = L4_INVALID;
+
done:
return rc;
}
@@ -165,6 +606,8 @@ int xpti_domain_init(struct domain *d)
int ret = -ENOMEM;
struct vcpu *v;
struct xpti_domain *xd;
+ void *virt;
+ unsigned int i, new;
if ( !is_pv_domain(d) || is_pv_32bit_domain(d) )
return 0;
@@ -193,6 +636,40 @@ int xpti_domain_init(struct domain *d)
if ( !xd )
goto done;
d->arch.pv_domain.xpti = xd;
+ xd->domain = d;
+ xd->lru_first = L4_INVALID;
+ xd->lru_last = L4_INVALID;
+ xd->free_first = L4_INVALID;
+
+ spin_lock_init(&xd->lock);
+ tasklet_init(&xd->tasklet, xpti_tasklet, (unsigned long)xd);
+
+ xd->l4ref_size = 1 << (fls(max_l4(d)) - 1);
+ xd->l4ref = xzalloc_array(struct xpti_l4ref, xd->l4ref_size);
+ if ( !xd->l4ref )
+ goto done;
+ for ( i = 0; i < xd->l4ref_size; i++ )
+ xd->l4ref[i].idx = L4_INVALID;
+
+ xd->l4pg = xzalloc_array(struct xpti_l4pg, max_l4(d));
+ if ( !xd->l4pg )
+ goto done;
+ for ( i = 0; i < max_l4(d) - 1; i++ )
+ {
+ xd->l4pg[i].lru_next = i + 1;
+ }
+ xd->l4pg[i].lru_next = L4_INVALID;
+ xd->unused_first = 0;
+
+ for ( i = 0; i < min_free(xd); i++ )
+ {
+ virt = alloc_xenheap_pages(0, MEMF_node(domain_to_node(d)));
+ if ( !virt )
+ goto done;
+ new = xpti_shadow_add(xd, virt_to_mfn(virt));
+ ASSERT(new != L4_INVALID);
+ xpti_shadow_putfree(xd, new);
+ }
xd->perdom_l1tab = xzalloc_array(l1_pgentry_t *,
l2_table_offset((d->max_vcpus - 1) << XPTI_VA_SHIFT) + 1);
@@ -206,9 +683,79 @@ int xpti_domain_init(struct domain *d)
goto done;
}
+ ret = 0;
+
printk("Enabling Xen Pagetable protection (XPTI) for Domain %d\n",
d->domain_id);
done:
return ret;
}
+
+static void xpti_dump_domain_info(struct domain *d)
+{
+ struct xpti_domain *xd = d->arch.pv_domain.xpti;
+ unsigned long flags;
+
+ if ( !is_pv_domain(d) || !xd )
+ return;
+
+ spin_lock_irqsave(&xd->lock, flags);
+
+ printk("Domain %d XPTI shadow pages: %u allocated, %u max, %u free\n",
+ d->domain_id, xd->n_alloc, max_l4(d), xd->n_free);
+
+#ifdef XPTI_DEBUG
+ printk(" alloc: %d, free: %d, getfree: %d, putfree: %d, getforce: %d\n",
+ xd->cnt_alloc, xd->cnt_free, xd->cnt_getfree, xd->cnt_putfree,
+ xd->cnt_getforce);
+ printk(" activate: %d, deactivate: %d, newl4: %d, freel4: %d\n",
+ xd->cnt_activate, xd->cnt_deactivate, xd->cnt_newl4,
+ xd->cnt_freel4);
+#endif
+
+ spin_unlock_irqrestore(&xd->lock, flags);
+}
+
+static void xpti_dump_info(unsigned char key)
+{
+ struct domain *d;
+ char *opt;
+
+ printk("'%c' pressed -> dumping XPTI info\n", key);
+
+ switch ( opt_xpti )
+ {
+ case XPTI_DEFAULT:
+ opt = "default";
+ break;
+ case XPTI_ON:
+ opt = "on";
+ break;
+ case XPTI_OFF:
+ opt = "off";
+ break;
+ case XPTI_NODOM0:
+ opt = "nodom0";
+ break;
+ default:
+ opt = "???";
+ break;
+ }
+
+ printk("XPTI global setting: %s\n", opt);
+
+ rcu_read_lock(&domlist_read_lock);
+
+ for_each_domain ( d )
+ xpti_dump_domain_info(d);
+
+ rcu_read_unlock(&domlist_read_lock);
+}
+
+static int __init xpti_key_init(void)
+{
+ register_keyhandler('X', xpti_dump_info, "dump XPTI info", 1);
+ return 0;
+}
+__initcall(xpti_key_init);
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 1a4e92481c..5d14631272 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -508,6 +508,7 @@ struct pv_vcpu
/* If XPTI is active: pointer to user regs on stack. */
struct cpu_user_regs *stack_regs;
+ unsigned xen_cr3_shadow; /* XPTI: index of current shadow L4 */
};
typedef enum __packed {
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 34c51bcfba..25c035988c 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -34,6 +34,8 @@ bool pv_destroy_ldt(struct vcpu *v);
void xpti_vcpu_destroy(struct vcpu *v);
int xpti_domain_init(struct domain *d);
void xpti_domain_destroy(struct domain *d);
+void xpti_make_cr3(struct vcpu *v, unsigned long mfn);
+void xpti_free_l4(struct domain *d, unsigned long mfn);
static inline bool is_domain_xpti_active(const struct domain *d)
{
@@ -69,6 +71,8 @@ static inline bool pv_destroy_ldt(struct vcpu *v)
static inline void xpti_vcpu_init(struct vcpu *v) { }
static inline int xpti_domain_init(struct domain *d) { return 0; }
static inline void xpti_domain_destroy(struct domain *d) { }
+static inline void xpti_make_cr3(struct vcpu *v, unsigned long mfn) { }
+static inline void xpti_free_l4(struct domain *d, unsigned long mfn) { }
static inline bool is_domain_xpti_active(const struct domain *d)
{ return false; }
--
2.13.6
* [PATCH v3 14/17] xen: add domain pointer to fill_ro_mpt() and zap_ro_mpt() functions
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (12 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 13/17] x86: allocate hypervisor L4 page table for XPTI Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 15/17] x86: fill XPTI shadow pages and keep them in sync with guest L4 Juergen Gross
` (3 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
In order to be able to sync L4 page table modifications with XPTI we
need the domain pointer in fill_ro_mpt() and zap_ro_mpt().
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/domain.c | 6 +++---
xen/arch/x86/mm.c | 8 ++++----
xen/arch/x86/mm/shadow/multi.c | 4 ++--
xen/include/asm-x86/mm.h | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8d6dc73881..0a6a94d2e1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1002,7 +1002,7 @@ int arch_set_info_guest(
{
if ( (page->u.inuse.type_info & PGT_type_mask) ==
PGT_l4_page_table )
- done = !fill_ro_mpt(_mfn(page_to_mfn(page)));
+ done = !fill_ro_mpt(d, _mfn(page_to_mfn(page)));
page_unlock(page);
}
@@ -1078,7 +1078,7 @@ int arch_set_info_guest(
case 0:
if ( !compat && !VM_ASSIST(d, m2p_strict) &&
!paging_mode_refcounts(d) )
- fill_ro_mpt(_mfn(cr3_gfn));
+ fill_ro_mpt(d, _mfn(cr3_gfn));
break;
default:
if ( cr3_page == current->arch.old_guest_table )
@@ -1118,7 +1118,7 @@ int arch_set_info_guest(
break;
case 0:
if ( VM_ASSIST(d, m2p_strict) )
- zap_ro_mpt(_mfn(cr3_gfn));
+ zap_ro_mpt(d, _mfn(cr3_gfn));
break;
}
}
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index f615204dbb..16b004abe6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1611,7 +1611,7 @@ void init_xen_l4_slots(l4_pgentry_t *l4t, mfn_t l4mfn,
}
}
-bool fill_ro_mpt(mfn_t mfn)
+bool fill_ro_mpt(const struct domain *d, mfn_t mfn)
{
l4_pgentry_t *l4tab = map_domain_page(mfn);
bool ret = false;
@@ -1627,7 +1627,7 @@ bool fill_ro_mpt(mfn_t mfn)
return ret;
}
-void zap_ro_mpt(mfn_t mfn)
+void zap_ro_mpt(const struct domain *d, mfn_t mfn)
{
l4_pgentry_t *l4tab = map_domain_page(mfn);
@@ -2891,7 +2891,7 @@ int new_guest_cr3(mfn_t mfn)
pv_destroy_ldt(curr); /* Unconditional TLB flush later. */
if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
- fill_ro_mpt(mfn);
+ fill_ro_mpt(d, mfn);
curr->arch.guest_table = pagetable_from_mfn(mfn);
update_cr3(curr);
@@ -3270,7 +3270,7 @@ long do_mmuext_op(
}
if ( VM_ASSIST(currd, m2p_strict) )
- zap_ro_mpt(_mfn(op.arg1.mfn));
+ zap_ro_mpt(currd, _mfn(op.arg1.mfn));
}
curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 2d42959f53..170163fbcf 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4168,10 +4168,10 @@ sh_update_cr3(struct vcpu *v, int do_locking)
mfn_t smfn = pagetable_get_mfn(v->arch.shadow_table[0]);
if ( !(v->arch.flags & TF_kernel_mode) && VM_ASSIST(d, m2p_strict) )
- zap_ro_mpt(smfn);
+ zap_ro_mpt(d, smfn);
else if ( (v->arch.flags & TF_kernel_mode) &&
!VM_ASSIST(d, m2p_strict) )
- fill_ro_mpt(smfn);
+ fill_ro_mpt(d, smfn);
}
#else
#error This should never happen
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3013c266fe..446d8584c0 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -344,8 +344,8 @@ int free_page_type(struct page_info *page, unsigned long type,
void init_xen_pae_l2_slots(l2_pgentry_t *l2t, const struct domain *d);
void init_xen_l4_slots(l4_pgentry_t *l4t, mfn_t l4mfn,
const struct domain *d, mfn_t sl4mfn, bool ro_mpt);
-bool fill_ro_mpt(mfn_t mfn);
-void zap_ro_mpt(mfn_t mfn);
+bool fill_ro_mpt(const struct domain *d, mfn_t mfn);
+void zap_ro_mpt(const struct domain *d, mfn_t mfn);
bool is_iomem_page(mfn_t mfn);
--
2.13.6
* [PATCH v3 15/17] x86: fill XPTI shadow pages and keep them in sync with guest L4
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (13 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 14/17] xen: add domain pointer to fill_ro_mpt() and zap_ro_mpt() functions Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 16/17] x86: do page table switching when entering/leaving hypervisor Juergen Gross
` (2 subsequent siblings)
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
To be able to use the XPTI shadow L4 page tables in the hypervisor,
fill them with the related entries of their master L4 tables and keep
them in sync when updates are done by the guest.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/mm.c | 43 ++++++++++++++++++++++++++++++++++++++----
xen/arch/x86/mm/shadow/multi.c | 2 ++
xen/arch/x86/pv/dom0_build.c | 3 +++
xen/arch/x86/pv/xpti.c | 35 ++++++++++++++++++++++++++++++++++
xen/include/asm-x86/pv/mm.h | 4 ++++
5 files changed, 83 insertions(+), 4 deletions(-)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 16b004abe6..14dc776a52 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1609,6 +1609,18 @@ void init_xen_l4_slots(l4_pgentry_t *l4t, mfn_t l4mfn,
(ROOT_PAGETABLE_FIRST_XEN_SLOT + slots -
l4_table_offset(XEN_VIRT_START)) * sizeof(*l4t));
}
+
+ if ( is_domain_xpti_active(d) )
+ {
+ unsigned int slot;
+
+ for ( slot = ROOT_PAGETABLE_FIRST_XEN_SLOT;
+ slot <= ROOT_PAGETABLE_LAST_XEN_SLOT;
+ slot++ )
+ xpti_update_l4(d,
+ mfn_x(mfn_eq(sl4mfn, INVALID_MFN) ? l4mfn : sl4mfn),
+ slot, l4t[slot]);
+ }
}
bool fill_ro_mpt(const struct domain *d, mfn_t mfn)
@@ -1621,6 +1633,9 @@ bool fill_ro_mpt(const struct domain *d, mfn_t mfn)
l4tab[l4_table_offset(RO_MPT_VIRT_START)] =
idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)];
ret = true;
+ if ( is_domain_xpti_active(d) )
+ xpti_update_l4(d, mfn_x(mfn), l4_table_offset(RO_MPT_VIRT_START),
+ idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]);
}
unmap_domain_page(l4tab);
@@ -1632,6 +1647,11 @@ void zap_ro_mpt(const struct domain *d, mfn_t mfn)
l4_pgentry_t *l4tab = map_domain_page(mfn);
l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+
+ if ( is_domain_xpti_active(d) )
+ xpti_update_l4(d, mfn_x(mfn), l4_table_offset(RO_MPT_VIRT_START),
+ l4e_empty());
+
unmap_domain_page(l4tab);
}
@@ -1682,6 +1702,8 @@ static int alloc_l4_table(struct page_info *page)
}
pl4e[i] = adjust_guest_l4e(pl4e[i], d);
+ if ( is_domain_xpti_active(d) )
+ xpti_update_l4(d, pfn, i, pl4e[i]);
}
if ( rc >= 0 )
@@ -2141,6 +2163,20 @@ static int mod_l3_entry(l3_pgentry_t *pl3e,
return rc;
}
+static bool update_l4pte(l4_pgentry_t *pl4e, l4_pgentry_t ol4e,
+ l4_pgentry_t nl4e, unsigned long pfn,
+ struct vcpu *v, bool preserve_ad)
+{
+ bool rc;
+
+ rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, v, preserve_ad);
+ if ( rc && is_vcpu_xpti_active(v) &&
+ (!paging_mode_shadow(v->domain) || !paging_get_hostmode(v)) )
+ xpti_update_l4(v->domain, pfn, pgentry_ptr_to_slot(pl4e), nl4e);
+
+ return rc;
+}
+
/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
static int mod_l4_entry(l4_pgentry_t *pl4e,
l4_pgentry_t nl4e,
@@ -2175,7 +2211,7 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
{
nl4e = adjust_guest_l4e(nl4e, d);
- rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+ rc = update_l4pte(pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
return rc ? 0 : -EFAULT;
}
@@ -2185,14 +2221,13 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
rc = 0;
nl4e = adjust_guest_l4e(nl4e, d);
- if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
- preserve_ad)) )
+ if ( unlikely(!update_l4pte(pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad)) )
{
ol4e = nl4e;
rc = -EFAULT;
}
}
- else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+ else if ( unlikely(!update_l4pte(pl4e, ol4e, nl4e, pfn, vcpu,
preserve_ad)) )
{
return -EFAULT;
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 170163fbcf..110a5449a6 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -954,6 +954,8 @@ static int shadow_set_l4e(struct domain *d,
/* Write the new entry */
shadow_write_entries(sl4e, &new_sl4e, 1, sl4mfn);
flags |= SHADOW_SET_CHANGED;
+ if ( is_domain_xpti_active(d) )
+ xpti_update_l4(d, mfn_x(sl4mfn), pgentry_ptr_to_slot(sl4e), new_sl4e);
if ( shadow_l4e_get_flags(old_sl4e) & _PAGE_PRESENT )
{
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 6e7bc435ab..8ef9c87845 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -142,6 +142,9 @@ static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
pl3e = __map_domain_page(page);
clear_page(pl3e);
*pl4e = l4e_from_page(page, L4_PROT);
+ if ( is_domain_xpti_active(d) )
+ xpti_update_l4(d, pgtbl_pfn, l4_table_offset(vphysmap_start),
+ *pl4e);
} else
pl3e = map_l3t_from_l4e(*pl4e);
diff --git a/xen/arch/x86/pv/xpti.c b/xen/arch/x86/pv/xpti.c
index f663fae806..da83339563 100644
--- a/xen/arch/x86/pv/xpti.c
+++ b/xen/arch/x86/pv/xpti.c
@@ -357,6 +357,18 @@ static unsigned int xpti_shadow_getforce(struct xpti_domain *xd)
return idx;
}
+static void xpti_init_xen_l4(struct xpti_domain *xd, struct xpti_l4pg *l4pg)
+{
+ unsigned int i;
+ l4_pgentry_t *src, *dest;
+
+ src = map_domain_page(_mfn(l4pg->guest_mfn));
+ dest = mfn_to_virt(l4pg->xen_mfn);
+ for ( i = 0; i < L4_PAGETABLE_ENTRIES; i++ )
+ dest[i] = src[i];
+ unmap_domain_page(src);
+}
+
static unsigned int xpti_shadow_get(struct xpti_domain *xd, unsigned long mfn)
{
unsigned int idx;
@@ -385,6 +397,9 @@ static unsigned int xpti_shadow_get(struct xpti_domain *xd, unsigned long mfn)
l4pg->ref_next = l4ref->idx;
l4ref->idx = idx;
+ /* Fill the shadow page table entries. */
+ xpti_init_xen_l4(xd, l4pg);
+
return idx;
}
@@ -403,6 +418,26 @@ static unsigned int xpti_shadow_activate(struct xpti_domain *xd,
return idx;
}
+void xpti_update_l4(const struct domain *d, unsigned long mfn,
+ unsigned int slot, l4_pgentry_t e)
+{
+ struct xpti_domain *xd = d->arch.pv_domain.xpti;
+ unsigned long flags;
+ unsigned int idx;
+ l4_pgentry_t *l4;
+
+ spin_lock_irqsave(&xd->lock, flags);
+
+ idx = xpti_shadow_from_hashlist(xd, mfn);
+ if ( idx != L4_INVALID )
+ {
+ l4 = mfn_to_virt(xd->l4pg[idx].xen_mfn);
+ l4[slot] = e;
+ }
+
+ spin_unlock_irqrestore(&xd->lock, flags);
+}
+
void xpti_make_cr3(struct vcpu *v, unsigned long mfn)
{
struct xpti_domain *xd = v->domain->arch.pv_domain.xpti;
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 25c035988c..8a90af1084 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -36,6 +36,8 @@ int xpti_domain_init(struct domain *d);
void xpti_domain_destroy(struct domain *d);
void xpti_make_cr3(struct vcpu *v, unsigned long mfn);
void xpti_free_l4(struct domain *d, unsigned long mfn);
+void xpti_update_l4(const struct domain *d, unsigned long mfn,
+ unsigned int slot, l4_pgentry_t e);
static inline bool is_domain_xpti_active(const struct domain *d)
{
@@ -73,6 +75,8 @@ static inline int xpti_domain_init(struct domain *d) { return 0; }
static inline void xpti_domain_destroy(struct domain *d) { }
static inline void xpti_make_cr3(struct vcpu *v, unsigned long mfn) { }
static inline void xpti_free_l4(struct domain *d, unsigned long mfn) { }
+static inline void xpti_update_l4(const struct domain *d, unsigned long mfn,
+ unsigned int slot, l4_pgentry_t e) { }
static inline bool is_domain_xpti_active(const struct domain *d)
{ return false; }
--
2.13.6
* [PATCH v3 16/17] x86: do page table switching when entering/leaving hypervisor
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (14 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 15/17] x86: fill XPTI shadow pages and keep them in sync with guest L4 Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-09 14:01 ` [PATCH v3 17/17] x86: hide most hypervisor mappings in XPTI shadow page tables Juergen Gross
2018-02-12 17:54 ` [PATCH v3 00/17] Alternative Meltdown mitigation Dario Faggioli
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
For XPTI-enabled domains, do page table switching when entering or
leaving the hypervisor. This requires storing both %cr3 values in the
per-vcpu stack regions and adding the switching code to the macros used
to switch stacks.
The hypervisor will run on the original L4 page table supplied by the
guest, while the guest will use the shadow. As a plain %cr3 write does
not flush global TLB entries, the switch to the shadow L4 temporarily
clears CR4.PGE so no stale hypervisor mappings survive in the TLB.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/pv/xpti.c | 17 ++++++++++++-----
xen/arch/x86/traps.c | 3 ++-
xen/arch/x86/x86_64/asm-offsets.c | 2 ++
xen/include/asm-x86/asm_defns.h | 17 ++++++++++++++++-
xen/include/asm-x86/current.h | 4 +++-
5 files changed, 35 insertions(+), 8 deletions(-)
diff --git a/xen/arch/x86/pv/xpti.c b/xen/arch/x86/pv/xpti.c
index da83339563..e08aa782bf 100644
--- a/xen/arch/x86/pv/xpti.c
+++ b/xen/arch/x86/pv/xpti.c
@@ -441,19 +441,26 @@ void xpti_update_l4(const struct domain *d, unsigned long mfn,
void xpti_make_cr3(struct vcpu *v, unsigned long mfn)
{
struct xpti_domain *xd = v->domain->arch.pv_domain.xpti;
+ struct cpu_info *cpu_info;
unsigned long flags;
- unsigned int idx;
+ unsigned int old, new;
+
+ cpu_info = (struct cpu_info *)v->arch.pv_vcpu.stack_regs;
spin_lock_irqsave(&xd->lock, flags);
- idx = v->arch.pv_vcpu.xen_cr3_shadow;
+ old = v->arch.pv_vcpu.xen_cr3_shadow;
/* First activate new shadow. */
- v->arch.pv_vcpu.xen_cr3_shadow = xpti_shadow_activate(xd, mfn);
+ new = xpti_shadow_activate(xd, mfn);
+ v->arch.pv_vcpu.xen_cr3_shadow = new;
/* Deactivate old shadow if applicable. */
- if ( idx != L4_INVALID )
- xpti_shadow_deactivate(xd, idx);
+ if ( old != L4_INVALID )
+ xpti_shadow_deactivate(xd, old);
+
+ cpu_info->xen_cr3 = mfn << PAGE_SHIFT;
+ cpu_info->guest_cr3 = xd->l4pg[new].xen_mfn << PAGE_SHIFT;
spin_unlock_irqrestore(&xd->lock, flags);
}
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 9b29014e2c..93b228dced 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -305,9 +305,10 @@ static void show_guest_stack(struct vcpu *v, const struct cpu_user_regs *regs)
if ( v != current )
{
struct vcpu *vcpu;
+ unsigned long cr3 = read_cr3();
ASSERT(guest_kernel_mode(v, regs));
- vcpu = maddr_get_owner(read_cr3()) == v->domain ? v : NULL;
+ vcpu = maddr_get_owner(cr3) == v->domain ? v : NULL;
if ( !vcpu )
{
stack = do_page_walk(v, (unsigned long)stack);
diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
index b0060be261..2855feafa3 100644
--- a/xen/arch/x86/x86_64/asm-offsets.c
+++ b/xen/arch/x86/x86_64/asm-offsets.c
@@ -141,6 +141,8 @@ void __dummy__(void)
OFFSET(CPUINFO_shadow_spec_ctrl, struct cpu_info, shadow_spec_ctrl);
OFFSET(CPUINFO_use_shadow_spec_ctrl, struct cpu_info, use_shadow_spec_ctrl);
OFFSET(CPUINFO_bti_ist_info, struct cpu_info, bti_ist_info);
+ OFFSET(CPUINFO_guest_cr3, struct cpu_info, guest_cr3);
+ OFFSET(CPUINFO_xen_cr3, struct cpu_info, xen_cr3);
OFFSET(CPUINFO_stack_bottom_cpu, struct cpu_info, stack_bottom_cpu);
OFFSET(CPUINFO_flags, struct cpu_info, flags);
DEFINE(CPUINFO_sizeof, sizeof(struct cpu_info));
diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h
index f626cc6134..f69d1501fb 100644
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -141,6 +141,8 @@ void ret_from_intr(void);
GET_STACK_END(ax); \
testb $ON_VCPUSTACK, STACK_CPUINFO_FIELD(flags)(%rax); \
jz 1f; \
+ movq STACK_CPUINFO_FIELD(xen_cr3)(%rax), %rcx; \
+ mov %rcx, %cr3; \
movq STACK_CPUINFO_FIELD(stack_bottom_cpu)(%rax), %rsp; \
1:
@@ -148,12 +150,25 @@ void ret_from_intr(void);
GET_STACK_END(ax); \
testb $ON_VCPUSTACK, STACK_CPUINFO_FIELD(flags)(%rax); \
jz 1f; \
+ movq STACK_CPUINFO_FIELD(xen_cr3)(%rax), %rcx; \
+ mov %rcx, %cr3; \
sub $(STACK_SIZE - 1 - ist * PAGE_SIZE), %rax; \
mov %rax, %rsp; \
1:
#define SWITCH_TO_VCPU_STACK \
- mov %r12, %rsp
+ mov %r12, %rsp; \
+ GET_STACK_END(ax); \
+ testb $ON_VCPUSTACK, STACK_CPUINFO_FIELD(flags)(%rax); \
+ jz 1f; \
+ mov %cr4, %r8; \
+ mov %r8, %r9; \
+ and $~X86_CR4_PGE, %r8; \
+ mov %r8, %cr4; \
+ movq STACK_CPUINFO_FIELD(guest_cr3)(%rax), %rcx; \
+ mov %rcx, %cr3; \
+ mov %r9, %cr4; \
+1:
#ifndef NDEBUG
#define ASSERT_NOT_IN_ATOMIC \
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index e128c13a1e..82d76a3746 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -67,7 +67,9 @@ struct cpu_info {
};
/* per vcpu mapping (xpti) */
struct {
- unsigned long v_pad[4];
+ unsigned long v_pad[2];
+ unsigned long guest_cr3;
+ unsigned long xen_cr3;
unsigned long stack_bottom_cpu;
};
};
--
2.13.6
* [PATCH v3 17/17] x86: hide most hypervisor mappings in XPTI shadow page tables
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (15 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 16/17] x86: do page table switching when entering/leaving hypervisor Juergen Gross
@ 2018-02-09 14:01 ` Juergen Gross
2018-02-12 17:54 ` [PATCH v3 00/17] Alternative Meltdown mitigation Dario Faggioli
17 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-09 14:01 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, andrew.cooper3, dfaggioli, jbeulich
Hide all but the absolutely necessary hypervisor mappings in the XPTI
shadow page tables. The following mappings are needed:
- guest-accessible areas, e.g. the RO M2P table
- IDT, TSS, GDT
- interrupt entry stacks
- interrupt handling code
For some of those mappings we need to set up lower level page tables
with just some entries populated.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/arch/x86/pv/xpti.c | 229 ++++++++++++++++++++++++++++++++++++-
xen/arch/x86/traps.c | 2 +-
xen/arch/x86/x86_64/compat/entry.S | 4 +
xen/arch/x86/x86_64/entry.S | 4 +
xen/include/asm-x86/pv/mm.h | 5 +
5 files changed, 241 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/pv/xpti.c b/xen/arch/x86/pv/xpti.c
index e08aa782bf..dea34322d7 100644
--- a/xen/arch/x86/pv/xpti.c
+++ b/xen/arch/x86/pv/xpti.c
@@ -19,13 +19,16 @@
* along with this program; If not, see <http://www.gnu.org/licenses/>.
*/
+#include <xen/cpu.h>
#include <xen/domain_page.h>
#include <xen/errno.h>
#include <xen/init.h>
#include <xen/keyhandler.h>
#include <xen/lib.h>
+#include <xen/notifier.h>
#include <xen/sched.h>
#include <asm/bitops.h>
+#include <asm/pv/mm.h>
/*
* For each L4 page table of the guest we need a shadow for the hypervisor.
@@ -118,6 +121,7 @@ struct xpti_domain {
unsigned int unused_first; /* List of unused slots */
spinlock_t lock; /* Protects all shadow lists */
struct domain *domain;
+ struct page_info *l3_shadow;
struct tasklet tasklet;
l1_pgentry_t **perdom_l1tab;
#ifdef XPTI_DEBUG
@@ -140,6 +144,9 @@ static __read_mostly enum {
XPTI_NODOM0
} opt_xpti = XPTI_DEFAULT;
+static bool xpti_l3_shadow = false;
+static l3_pgentry_t *xpti_l3_shadows[11];
+
static int parse_xpti(const char *s)
{
int rc = 0;
@@ -357,6 +364,34 @@ static unsigned int xpti_shadow_getforce(struct xpti_domain *xd)
return idx;
}
+static void xpti_update_l4_entry(struct xpti_domain *xd, l4_pgentry_t *dest,
+ l4_pgentry_t entry, unsigned int slot)
+{
+ l3_pgentry_t *l3pg;
+
+ switch ( slot )
+ {
+ case 257: /* ioremap area. */
+ case 258: /* linear page table (guest table). */
+ case 259: /* linear page table (shadow table). */
+ dest[slot] = l4e_empty();
+ break;
+ case 260: /* per-domain mappings. */
+ dest[slot] = l4e_from_page(xd->l3_shadow, __PAGE_HYPERVISOR);
+ break;
+ case 261 ... 271: /* hypervisor text and data, direct phys mapping. */
+ l3pg = xpti_l3_shadows[slot - 261];
+ dest[slot] = l3pg
+ ? l4e_from_mfn(_mfn(virt_to_mfn(l3pg)), __PAGE_HYPERVISOR)
+ : l4e_empty();
+ break;
+ case 256: /* read-only guest accessible m2p table. */
+ default:
+ dest[slot] = entry;
+ break;
+ }
+}
+
static void xpti_init_xen_l4(struct xpti_domain *xd, struct xpti_l4pg *l4pg)
{
unsigned int i;
@@ -365,7 +400,7 @@ static void xpti_init_xen_l4(struct xpti_domain *xd, struct xpti_l4pg *l4pg)
src = map_domain_page(_mfn(l4pg->guest_mfn));
dest = mfn_to_virt(l4pg->xen_mfn);
for ( i = 0; i < L4_PAGETABLE_ENTRIES; i++ )
- dest[i] = src[i];
+ xpti_update_l4_entry(xd, dest, src[i], i);
unmap_domain_page(src);
}
@@ -432,7 +467,7 @@ void xpti_update_l4(const struct domain *d, unsigned long mfn,
if ( idx != L4_INVALID )
{
l4 = mfn_to_virt(xd->l4pg[idx].xen_mfn);
- l4[slot] = e;
+ xpti_update_l4_entry(xd, l4, e, slot);
}
spin_unlock_irqrestore(&xd->lock, flags);
@@ -550,6 +585,8 @@ void xpti_domain_destroy(struct domain *d)
free_xenheap_page(xpti_shadow_free(xd, idx));
}
+ if ( xd->l3_shadow )
+ free_domheap_page(xd->l3_shadow);
xfree(xd->perdom_l1tab);
xfree(xd->l4pg);
xfree(xd->l4ref);
@@ -642,6 +679,125 @@ static int xpti_vcpu_init(struct vcpu *v)
return rc;
}
+static int xpti_add_mapping(unsigned long addr)
+{
+ unsigned int slot, flags, mapflags;
+ unsigned long mfn;
+ l3_pgentry_t *pl3e;
+ l2_pgentry_t *pl2e;
+ l1_pgentry_t *pl1e;
+
+ slot = l4_table_offset(addr);
+ pl3e = l4e_to_l3e(idle_pg_table[slot]);
+
+ slot = l3_table_offset(addr);
+ mapflags = l3e_get_flags(pl3e[slot]);
+ ASSERT(mapflags & _PAGE_PRESENT);
+ if ( mapflags & _PAGE_PSE )
+ {
+ mapflags &= ~_PAGE_PSE;
+ mfn = l3e_get_pfn(pl3e[slot]) & ~((1UL << (2 * PAGETABLE_ORDER)) - 1);
+ mfn |= PFN_DOWN(addr) & ((1UL << (2 * PAGETABLE_ORDER)) - 1);
+ }
+ else
+ {
+ pl2e = l3e_to_l2e(pl3e[slot]);
+ slot = l2_table_offset(addr);
+ mapflags = l2e_get_flags(pl2e[slot]);
+ ASSERT(mapflags & _PAGE_PRESENT);
+ if ( mapflags & _PAGE_PSE )
+ {
+ mapflags &= ~_PAGE_PSE;
+ mfn = l2e_get_pfn(pl2e[slot]) & ~((1UL << PAGETABLE_ORDER) - 1);
+ mfn |= PFN_DOWN(addr) & ((1UL << PAGETABLE_ORDER) - 1);
+ }
+ else
+ {
+ pl1e = l2e_to_l1e(pl2e[slot]);
+ slot = l1_table_offset(addr);
+ mapflags = l1e_get_flags(pl1e[slot]);
+ ASSERT(mapflags & _PAGE_PRESENT);
+ mfn = l1e_get_pfn(pl1e[slot]);
+ }
+ }
+
+ slot = l4_table_offset(addr);
+ ASSERT(slot >= 261 && slot <= 271);
+ pl3e = xpti_l3_shadows[slot - 261];
+ if ( !pl3e )
+ {
+ pl3e = alloc_xen_pagetable();
+ if ( !pl3e )
+ return -ENOMEM;
+ clear_page(pl3e);
+ xpti_l3_shadows[slot - 261] = pl3e;
+ }
+
+ slot = l3_table_offset(addr);
+ flags = l3e_get_flags(pl3e[slot]);
+ if ( !(flags & _PAGE_PRESENT) )
+ {
+ pl2e = alloc_xen_pagetable();
+ if ( !pl2e )
+ return -ENOMEM;
+ clear_page(pl2e);
+ pl3e[slot] = l3e_from_mfn(_mfn(virt_to_mfn(pl2e)), __PAGE_HYPERVISOR);
+ }
+ else
+ {
+ pl2e = l3e_to_l2e(pl3e[slot]);
+ }
+
+ slot = l2_table_offset(addr);
+ flags = l2e_get_flags(pl2e[slot]);
+ if ( !(flags & _PAGE_PRESENT) )
+ {
+ pl1e = alloc_xen_pagetable();
+ if ( !pl1e )
+ return -ENOMEM;
+ clear_page(pl1e);
+ pl2e[slot] = l2e_from_mfn(_mfn(virt_to_mfn(pl1e)), __PAGE_HYPERVISOR);
+ }
+ else
+ {
+ pl1e = l2e_to_l1e(pl2e[slot]);
+ }
+
+ slot = l1_table_offset(addr);
+ pl1e[slot] = l1e_from_mfn(_mfn(mfn), mapflags);
+
+ return 0;
+}
+
+static void xpti_rm_mapping(unsigned long addr)
+{
+ unsigned int slot, flags;
+ l3_pgentry_t *pl3e;
+ l2_pgentry_t *pl2e;
+ l1_pgentry_t *pl1e;
+
+ slot = l4_table_offset(addr);
+ ASSERT(slot >= 261 && slot <= 271);
+ pl3e = xpti_l3_shadows[slot - 261];
+ if ( !pl3e )
+ return;
+
+ slot = l3_table_offset(addr);
+ flags = l3e_get_flags(pl3e[slot]);
+ if ( !(flags & _PAGE_PRESENT) )
+ return;
+
+ pl2e = l3e_to_l2e(pl3e[slot]);
+ slot = l2_table_offset(addr);
+ flags = l2e_get_flags(pl2e[slot]);
+ if ( !(flags & _PAGE_PRESENT) )
+ return;
+
+ pl1e = l2e_to_l1e(pl2e[slot]);
+ slot = l1_table_offset(addr);
+ pl1e[slot] = l1e_empty();
+}
+
int xpti_domain_init(struct domain *d)
{
bool xpti = false;
@@ -649,7 +805,9 @@ int xpti_domain_init(struct domain *d)
struct vcpu *v;
struct xpti_domain *xd;
void *virt;
+ unsigned long addr;
unsigned int i, new;
+ l3_pgentry_t *l3tab, *l3shadow;
if ( !is_pv_domain(d) || is_pv_32bit_domain(d) )
return 0;
@@ -683,6 +841,27 @@ int xpti_domain_init(struct domain *d)
xd->lru_last = L4_INVALID;
xd->free_first = L4_INVALID;
+ if ( !xpti_l3_shadow )
+ {
+ xpti_l3_shadow = true;
+
+ for_each_online_cpu ( i )
+ if ( xpti_add_mapping((unsigned long)idt_tables[i]) )
+ goto done;
+
+ for ( addr = round_pgdown((unsigned long)&xpti_map_start);
+ addr <= round_pgdown((unsigned long)&xpti_map_end - 1);
+ addr += PAGE_SIZE )
+ if ( xpti_add_mapping(addr) )
+ goto done;
+
+ for ( addr = round_pgdown((unsigned long)&xpti_map_start_compat);
+ addr <= round_pgdown((unsigned long)&xpti_map_end_compat - 1);
+ addr += PAGE_SIZE )
+ if ( xpti_add_mapping(addr) )
+ goto done;
+ }
+
spin_lock_init(&xd->lock);
tasklet_init(&xd->tasklet, xpti_tasklet, (unsigned long)xd);
@@ -725,6 +904,16 @@ int xpti_domain_init(struct domain *d)
goto done;
}
+ xd->l3_shadow = alloc_domheap_page(d, MEMF_no_owner);
+ if ( !xd->l3_shadow )
+ goto done;
+ l3tab = __map_domain_page(d->arch.perdomain_l3_pg);
+ l3shadow = __map_domain_page(xd->l3_shadow);
+ clear_page(l3shadow);
+ l3shadow[0] = l3tab[0]; /* GDT/LDT shadow mapping. */
+ l3shadow[3] = l3tab[3]; /* XPTI mappings. */
+ unmap_domain_page(l3shadow);
+ unmap_domain_page(l3tab);
ret = 0;
printk("Enabling Xen Pagetable protection (XPTI) for Domain %d\n",
@@ -801,3 +990,39 @@ static int __init xpti_key_init(void)
return 0;
}
__initcall(xpti_key_init);
+
+static int xpti_cpu_callback(struct notifier_block *nfb, unsigned long action,
+ void *hcpu)
+{
+ unsigned int cpu = (unsigned long)hcpu;
+ int rc = 0;
+
+ if ( !xpti_l3_shadow )
+ return NOTIFY_DONE;
+
+ switch ( action )
+ {
+ case CPU_DOWN_FAILED:
+ case CPU_ONLINE:
+ rc = xpti_add_mapping((unsigned long)idt_tables[cpu]);
+ break;
+ case CPU_DOWN_PREPARE:
+ xpti_rm_mapping((unsigned long)idt_tables[cpu]);
+ break;
+ default:
+ break;
+ }
+
+ return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
+}
+
+static struct notifier_block xpti_cpu_nfb = {
+ .notifier_call = xpti_cpu_callback
+};
+
+static int __init xpti_presmp_init(void)
+{
+ register_cpu_notifier(&xpti_cpu_nfb);
+ return 0;
+}
+presmp_initcall(xpti_presmp_init);
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 93b228dced..00cc7cd9d7 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -102,7 +102,7 @@ DEFINE_PER_CPU_READ_MOSTLY(struct desc_struct *, gdt_table);
DEFINE_PER_CPU_READ_MOSTLY(struct desc_struct *, compat_gdt_table);
/* Master table, used by CPU0. */
-idt_entry_t idt_table[IDT_ENTRIES];
+idt_entry_t idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE);
/* Pointer to the IDT of every CPU. */
idt_entry_t *idt_tables[NR_CPUS] __read_mostly;
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 206bc9a05a..575a3e5d8e 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -13,6 +13,8 @@
#include <public/xen.h>
#include <irq_vectors.h>
+ENTRY(xpti_map_start_compat)
+
ENTRY(entry_int82)
ASM_CLAC
pushq $0
@@ -367,3 +369,5 @@ compat_crash_page_fault:
jmp .Lft14
.previous
_ASM_EXTABLE(.Lft14, .Lfx14)
+
+ENTRY(xpti_map_end_compat)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 909f6eea66..d1cb355044 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -14,6 +14,8 @@
#include <public/xen.h>
#include <irq_vectors.h>
+ENTRY(xpti_map_start)
+
/* %rbx: struct vcpu, %r12: user_regs */
ENTRY(switch_to_kernel)
leaq VCPU_trap_bounce(%rbx),%rdx
@@ -735,6 +737,8 @@ ENTRY(enable_nmis)
GLOBAL(trap_nop)
iretq
+ENTRY(xpti_map_end)
+
/* Table of automatically generated entry points. One per vector. */
.section .init.rodata, "a", @progbits
GLOBAL(autogen_entrypoints)
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 8a90af1084..36e1856b8d 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -23,6 +23,11 @@
#ifdef CONFIG_PV
+extern void *xpti_map_start;
+extern void *xpti_map_end;
+extern void *xpti_map_start_compat;
+extern void *xpti_map_end_compat;
+
int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs);
long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries);
--
2.13.6
* Re: [PATCH v3 00/17] Alternative Meltdown mitigation
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
` (16 preceding siblings ...)
2018-02-09 14:01 ` [PATCH v3 17/17] x86: hide most hypervisor mappings in XPTI shadow page tables Juergen Gross
@ 2018-02-12 17:54 ` Dario Faggioli
2018-02-13 11:36 ` Juergen Gross
17 siblings, 1 reply; 23+ messages in thread
From: Dario Faggioli @ 2018-02-12 17:54 UTC (permalink / raw)
To: Juergen Gross, xen-devel; +Cc: andrew.cooper3, jbeulich
On Fri, 2018-02-09 at 15:01 +0100, Juergen Gross wrote:
> This series is available via github:
>
> https://github.com/jgross1/xen.git xpti
>
> Dario wants to do some performance tests for this series to compare
> performance with Jan's series with all optimizations posted.
>
And some of this is indeed ready.
So, this is again on my testbox, with 16 pCPUs and 12GB of RAM, and I
used a guest with 16 vCPUs and 10GB of RAM.
I benchmarked Jan's patch *plus* all the optimizations and overhead
mitigation patches he posted on xen-devel (the ones that are already in
staging, and also the ones that are not yet there). That's "XPTI-Light"
in the table and in the graphs. Booting this with 'xpti=false' is
considered the baseline, while booting with 'xpti=true' is the actual
thing we want to measure. :-)
Then I ran the same benchmarks on Juergen's branch above, enabled at
boot. That's "XPYI" in the table and graphs (yes, I know, sorry for the
typo!).
http://openbenchmarking.org/result/1802125-DARI-180211144
http://openbenchmarking.org/result/1802125-DARI-180211144&obr_hgv=XPTI-Light+xpti%3Dfalse&obr_nor=y&obr_hgv=XPTI-Light+xpti%3Dfalse
As far as the following benchmarks go:
- [disk] I/O benchmarks (like aio-stress, fio, iozone)
- compress/uncompress benchmarks
- sw building benchmarks
- system benchmarks (pgbench, nginx, most of the stress-ng cases)
- scheduling latency benchmarks (schbench)
the two approaches are very close. It may be said that 'XPTI-Light
optimized' has, overall, still a little bit of an edge. But really,
that varies from test to test, and most of the time it is marginal
(either way).
System-V message passing and semaphores, as well as socket activity
tests, together with the hackbench ones, seem to cause Juergen's XPTI
serious problems, though.
With Juergen, we decided to dig into this a bit more. He hypothesized
that, currently, (vCPU) context switching costs are high in his
solution. Therefore, I went and checked (roughly) how many context
switches occur in Xen during a few of the benchmarks.
Here's a summary.
******** stress-ng CPU ********
== XPTI
stress-ng: info: cpu 1795.71 bogo ops/s
sched: runs through scheduler 29822
sched: context switches 14391
== XPTI-Light
stress-ng: info: cpu 1821.60 bogo ops/s
sched: runs through scheduler 24544
sched: context switches 9128
******** stress-ng Memory Copying ********
== XPTI
stress-ng: info: memcpy 831.79 bogo ops/s
sched: runs through scheduler 22875
sched: context switches 8230
== XPTI-Light
stress-ng: info: memcpy 827.68
sched: runs through scheduler 23142
sched: context switches 8279
******** schbench ********
== XPTI
Latency percentiles (usec)
50.0000th: 36672
75.0000th: 79488
90.0000th: 124032
95.0000th: 154880
*99.0000th: 232192
99.5000th: 259328
99.9000th: 332288
min=0, max=568244
sched: runs through scheduler 25736
sched: context switches 10622
== XPTI-Light
Latency percentiles (usec)
50.0000th: 37824
75.0000th: 81024
90.0000th: 127872
95.0000th: 156416
*99.0000th: 235776
99.5000th: 271872
99.9000th: 348672
min=0, max=643999
sched: runs through scheduler 25604
sched: context switches 10741
******** hackbench ********
== XPTI
Running with 4*40 (== 160) tasks 250.707 s
sched: runs through scheduler 1322606
sched: context switches 1208853
== XPTI-Light
Running with 4*40 (== 160) tasks 60.961 s
sched: runs through scheduler 1680535
sched: context switches 1668358
******** stress-ng SysV Msg Passing ********
== XPTI
stress-ng: info: msg 276321.24 bogo ops/s
sched: runs through scheduler 25144
sched: context switches 10391
== XPTI-Light
stress-ng: info: msg 1775035.18 bogo ops/s
sched: runs through scheduler 33453
sched: context switches 18566
******** schbench -p *********
== XPTI
Latency percentiles (usec)
50.0000th: 53
75.0000th: 56
90.0000th: 103
95.0000th: 161
*99.0000th: 1326
99.5000th: 2172
99.9000th: 4760
min=0, max=124594
avg worker transfer: 478.63 ops/sec 1.87KB/s
sched: runs through scheduler 34161
sched: context switches 19556
== XPTI-Light
Latency percentiles (usec)
50.0000th: 16
75.0000th: 17
90.0000th: 18
95.0000th: 35
*99.0000th: 258
99.5000th: 424
99.9000th: 1005
min=0, max=110505
avg worker transfer: 1791.82 ops/sec 7.00KB/s
sched: runs through scheduler 41905
sched: context switches 27013
So, basically, the intuition seems to be confirmed. In fact, we see
that as long as the number of context switches happening during the
specific benchmark stays below ~10k, Juergen's XPTI is fine, and on par
with or better than Jan's XPTI-Light (see stress-ng:cpu, stress-
ng:memorycopying, schbench).
Above 10k, XPTI begins to suffer; and the more context switches there
are, the worse it gets (e.g., see how badly it does in the hackbench
case).
Note that, in the stress-ng:sysvmsg case, the XPTI-Light run shows ~20k
context switches; I believe we only see ~10k of them in the XPTI case
because, with context switches being slower, the benchmark did less
work in its 30s of execution.
We can confirm that by looking at the schbench -p case, where the
slowdown is evident from the average data transferred by the workers.
So, that's it for now. Thoughts are welcome. :-)
...
Or, actually, that's not it! :-O In fact, right while I was writing
this report, it came out on IRC that something can be done, on
Juergen's XPTI series, to mitigate the performance impact a bit.
Juergen sent me a patch already, and I'm re-running the benchmarks with
that applied. I'll let you know how the results end up looking.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
* Re: [PATCH v3 02/17] x86: do a revert of e871e80c38547d9faefc6604532ba3e985e65873
2018-02-09 14:01 ` [PATCH v3 02/17] x86: do a revert of e871e80c38547d9faefc6604532ba3e985e65873 Juergen Gross
@ 2018-02-13 10:14 ` Jan Beulich
0 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2018-02-13 10:14 UTC (permalink / raw)
To: Juergen Gross; +Cc: andrew.cooper3, Dario Faggioli, xen-devel
>>> On 09.02.18 at 15:01, <jgross@suse.com> wrote:
> Revert "x86: allow Meltdown band-aid to be disabled" in order to
> prepare for a final Meltdown mitigation.
This now also reverts a22320e32dca0918ed23799583f470afe4c24330,
afaict. I think it would be better to revert the whole thing in a
single patch anyway (i.e. also fold patch 3 into here).
Jan
* Re: [PATCH v3 00/17] Alternative Meltdown mitigation
2018-02-12 17:54 ` [PATCH v3 00/17] Alternative Meltdown mitigation Dario Faggioli
@ 2018-02-13 11:36 ` Juergen Gross
2018-02-13 14:16 ` Jan Beulich
[not found] ` <5A83014E02000078001A7619@suse.com>
0 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-13 11:36 UTC (permalink / raw)
To: Dario Faggioli, xen-devel; +Cc: andrew.cooper3, jbeulich
On 12/02/18 18:54, Dario Faggioli wrote:
> On Fri, 2018-02-09 at 15:01 +0100, Juergen Gross wrote:
>> This series is available via github:
>>
>> https://github.com/jgross1/xen.git xpti
>>
>> Dario wants to do some performance tests for this series to compare
>> performance with Jan's series with all optimizations posted.
>>
> And some of this is indeed ready.
>
> So, this is again on my testbox, with 16 pCPUs and 12GB of RAM, and I
> used a guest with 16 vCPUs and 10GB of RAM.
>
> I benchmarked Jan's patch *plus* all the optimizations and overhead
> mitigation patches he posted on xen-devel (the ones that are already in
> staging, and also the ones that are not yet there). That's "XPTI-Light"
> in the table and in the graphs. Booting this with 'xpti=false' is
> considered the baseline, while booting with 'xpti=true' is the actual
> thing we want to measure. :-)
>
> Then I ran the same benchmarks on Juergen's branch above, enabled at
> boot. That's "XPYI" in the table and graphs (yes, I know, sorry for the
> typo!).
>
> http://openbenchmarking.org/result/1802125-DARI-180211144
> http://openbenchmarking.org/result/1802125-DARI-180211144&obr_hgv=XPTI-Light+xpti%3Dfalse&obr_nor=y&obr_hgv=XPTI-Light+xpti%3Dfalse
...
> Or, actually, that's not it! :-O In fact, right while I was writing
> this report, it came out on IRC that something can be done, on
> Juergen's XPTI series, to mitigate the performance impact a bit.
>
> Juergen sent me a patch already, and I'm re-running the benchmarks with
> that applied. I'll let know how the results ends up looking like.
It turned out the results are not fundamentally different. So the
general problem with context switches is still there (which I expected,
BTW).
So I guess the really bad results with benchmarks triggering a lot of
vcpu scheduling show that my approach isn't going to fly, as the most
probable cause for the slow context switches is the introduced
serializing instructions (LTR, WRMSRs) which can't be avoided when we
want to use per-vcpu stacks.
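(To illustrate what I mean - hand-wavy pseudo code only, this is not
the actual code from patch 12, and the per_vcpu_*()/write_tss_*()
helper names below are made up: with per-vcpu stacks, switching to
such a vcpu has to do roughly

    /* Illustration only: serializing work done per context switch. */
    write_tss_descriptor(per_vcpu_tss(next));              /* made-up helper */
    asm volatile ( "ltr %w0" : : "rm" (TSS_ENTRY << 3) );  /* serializing */
    wrmsrl(MSR_LSTAR, per_vcpu_lstar_stub(next));          /* serializing */
    wrmsrl(MSR_CSTAR, per_vcpu_cstar_stub(next));          /* serializing */

and it is exactly those serializing instructions which I suspect to be
the main cost.)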
OTOH the results of the other benchmarks showing some advantage over
Jan's solution indicate there is indeed an aspect which can be improved.
Instead of preferring one approach over the other I have thought about
a way to use the best parts of each solution in a combined variant. In
case nobody feels strongly about pursuing my current approach further
I'd like to suggest the following scheme (a rough sketch of the first
two points follows below the list):
- Whenever a L4 page table of the guest is in use on only one physical
cpu, use the L4 shadow cache of my series in order to avoid having to
copy the L4 contents each time the hypervisor is left.
- As soon as a L4 page table is being activated on a second cpu, fall
back to using the per-cpu page table on that cpu (the cpu already using
the L4 page table can continue doing so).
- Before activation of a L4 shadow page table it is modified to map the
per-cpu data needed in guest mode for the local cpu only.
- Use INVPCID instead of %cr4 PGE toggling to speed up purging global
TLB entries (depending on the availability of the feature, of course).
- Use the PCID feature to avoid purging TLB entries which might be
needed later (depending on hardware again). I expect this
will help especially for cases where the guest often switches between
kernel and user mode. Whether we want 3 or 4 PCID values for each
guest address space has to be discussed: do we need 2 different Xen
variants for guest user and guest kernel (IOW: are there any problems
possible when the hypervisor is using a guest kernel's permission to
access guest data when the guest was running in user mode before
entering the hypervisor)?
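Very roughly, and in pseudo code only (nothing of this exists yet, all
function and field names below are made up), the first two points would
make the shadow selection look something like this:

    /* Sketch only: decide which L4 a vcpu runs on when loading %cr3. */
    static void xpti_activate_l4(struct vcpu *v, unsigned long guest_l4_mfn)
    {
        /* Made-up per-L4 tracking structure and lookup helper. */
        struct l4_track *t = l4_track_lookup(v->domain, guest_l4_mfn);

        if ( !t->n_pcpus_using )
            /* Only user of this L4: run on its cached shadow L4. */
            use_shadow_l4(v, guest_l4_mfn);
        else
            /*
             * L4 already active on another pcpu: fall back to the per-cpu
             * L4 of this cpu, patched to map only the local per-cpu data.
             */
            use_percpu_l4(v, guest_l4_mfn);

        t->n_pcpus_using++;
    }

with the obvious counterpart decrementing n_pcpus_using when the L4 is
deactivated again.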
Thoughts?
Juergen
* Re: [PATCH v3 00/17] Alternative Meltdown mitigation
2018-02-13 11:36 ` Juergen Gross
@ 2018-02-13 14:16 ` Jan Beulich
[not found] ` <5A83014E02000078001A7619@suse.com>
1 sibling, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2018-02-13 14:16 UTC (permalink / raw)
To: Juergen Gross; +Cc: andrew.cooper3, Dario Faggioli, xen-devel
>>> On 13.02.18 at 12:36, <jgross@suse.com> wrote:
> On 12/02/18 18:54, Dario Faggioli wrote:
>> On Fri, 2018-02-09 at 15:01 +0100, Juergen Gross wrote:
>>> This series is available via github:
>>>
>>> https://github.com/jgross1/xen.git xpti
>>>
>>> Dario wants to do some performance tests for this series to compare
>>> performance with Jan's series with all optimizations posted.
>>>
>> And some of this is indeed ready.
>>
>> So, this is again on my testbox, with 16 pCPUs and 12GB of RAM, and I
>> used a guest with 16 vCPUs and 10GB of RAM.
>>
>> I benchmarked Jan's patch *plus* all the optimizations and overhead
>> mitigation patches he posted on xen-devel (the ones that are already in
>> staging, and also the ones that are not yet there). That's "XPTI-Light"
>> in the table and in the graphs. Booting this with 'xpti=false' is
>> considered the baseline, while booting with 'xpti=true' is the actual
>> thing we want to measure. :-)
>>
>> Then I ran the same benchmarks on Juergen's branch above, enabled at
>> boot. That's "XPYI" in the table and graphs (yes, I know, sorry for the
>> typo!).
>>
>> http://openbenchmarking.org/result/1802125-DARI-180211144
>>
>> http://openbenchmarking.org/result/1802125-DARI-180211144&obr_hgv=XPTI-Light+xpti%3Dfalse&obr_nor=y&obr_hgv=XPTI-Light+xpti%3Dfalse
>
> ...
>
>> Or, actually, that's not it! :-O In fact, right while I was writing
>> this report, it came out on IRC that something can be done, on
>> Juergen's XPTI series, to mitigate the performance impact a bit.
>>
>> Juergen sent me a patch already, and I'm re-running the benchmarks with
>> that applied. I'll let you know how the results end up looking.
>
> It turned out the results are not substantially different. So the general
> problem with context switches is still there (which I expected, BTW).
>
> So I guess the really bad results with benchmarks triggering a lot of
> vcpu scheduling show that my approach isn't going to fly, as the most
> probable cause for the slow context switches is the introduced
> serializing instructions (LTR, WRMSRs), which can't be avoided when we
> want to use per-vcpu stacks.
>
> OTOH the results of the other benchmarks showing some advantage over
> Jan's solution indicate there is indeed an aspect which can be improved.
>
> Instead of preferring one approach over the other I have thought about
> a way to use the best parts of each solution in a combined variant. In
> case nobody feels strongly about pursuing my current approach further,
> I'd like to suggest the following scheme:
>
> - Whenever a L4 page table of the guest is in use on one physical cpu
> only use the L4 shadow cache of my series in order to avoid having to
> copy the L4 contents each time the hypervisor is left.
>
> - As soon as a L4 page table is being activated on a second cpu, fall
> back to using the per-cpu page table on that cpu (the cpu already using
> the L4 page table can continue doing so).
Would the first of these CPUs continue to run on the shadow L4 in
that case? If so, would there be no synchronization issues? If not,
how do you envision "telling" it to move to the per-CPU L4 (which,
afaict, includes knowing which vCPU / pCPU that is)?
> - Before activation of a L4 shadow page table it is modified to map the
> per-cpu data needed in guest mode for the local cpu only.
I had been considering doing this in XPTI light for other purposes
too (for example it might be possible to short-circuit the guest
system call path to get away without multiple page table switches).
We really first need to settle on how much we feel is safe to expose
while the guest is running. So far I've been under the impression
that people actually think we should further reduce exposed pieces
of code/data, rather than widen the "window".
> - Use INVPCID instead of %cr4 PGE toggling to speed up purging global
> TLB entries (depending on the availability of the feature, of course).
That's something we should do independent of what XPTI model
we'd like to retain long term.
> - Use the PCID feature to avoid purging TLB entries which might be
> needed later (depending on hardware again).
Which first of all raises the question: Does PCID (other than the U
bit) prevent use of TLB entries in the wrong context? IOW is the
PCID check done early (during TLB lookup) rather than late (during
insn retirement)?
Jan
* Re: [PATCH v3 00/17] Alternative Meltdown mitigation
[not found] ` <5A83014E02000078001A7619@suse.com>
@ 2018-02-13 14:29 ` Juergen Gross
0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2018-02-13 14:29 UTC (permalink / raw)
To: Jan Beulich; +Cc: andrew.cooper3, xen-devel, Dario Faggioli
On 13/02/18 15:16, Jan Beulich wrote:
>>>> On 13.02.18 at 12:36, <jgross@suse.com> wrote:
>> On 12/02/18 18:54, Dario Faggioli wrote:
>>> On Fri, 2018-02-09 at 15:01 +0100, Juergen Gross wrote:
>>>> This series is available via github:
>>>>
>>>> https://github.com/jgross1/xen.git xpti
>>>>
>>>> Dario wants to do some performance tests for this series to compare
>>>> performance with Jan's series with all optimizations posted.
>>>>
>>> And some of this is indeed ready.
>>>
>>> So, this is again on my testbox, with 16 pCPUs and 12GB of RAM, and I
>>> used a guest with 16 vCPUs and 10GB of RAM.
>>>
>>> I benchmarked Jan's patch *plus* all the optimizations and overhead
>>> mitigation patches he posted on xen-devel (the ones that are already in
>>> staging, and also the ones that are not yet there). That's "XPTI-Light"
>>> in the table and in the graphs. Booting this with 'xpti=false' is
>>> considered the baseline, while booting with 'xpti=true' is the actual
>>> thing we want to measure. :-)
>>>
>>> Then I ran the same benchmarks on Juergen's branch above, enabled at
>>> boot. That's "XPYI" in the table and graphs (yes, I know, sorry for the
>>> typo!).
>>>
>>> http://openbenchmarking.org/result/1802125-DARI-180211144
>>>
>>> http://openbenchmarking.org/result/1802125-DARI-180211144&obr_hgv=XPTI-Light+xpti%3Dfalse&obr_nor=y&obr_hgv=XPTI-Light+xpti%3Dfalse
>>
>> ...
>>
>>> Or, actually, that's not it! :-O In fact, right while I was writing
>>> this report, it came out on IRC that something can be done, on
>>> Juergen's XPTI series, to mitigate the performance impact a bit.
>>>
>>> Juergen sent me a patch already, and I'm re-running the benchmarks with
>>> that applied. I'll let you know how the results end up looking.
>>
>> It turned out the results are not substantially different. So the general
>> problem with context switches is still there (which I expected, BTW).
>>
>> So I guess the really bad results with benchmarks triggering a lot of
>> vcpu scheduling show that my approach isn't going to fly, as the most
>> probable cause for the slow context switches is the introduced
>> serializing instructions (LTR, WRMSRs), which can't be avoided when we
>> want to use per-vcpu stacks.
>>
>> OTOH the results of the other benchmarks showing some advantage over
>> Jan's solution indicate there is indeed an aspect which can be improved.
>>
>> Instead of preferring one approach over the other I have thought about
>> a way to use the best parts of each solution in a combined variant. In
>> case nobody feels strongly about pursuing my current approach further,
>> I'd like to suggest the following scheme:
>>
>> - Whenever a L4 page table of the guest is in use on one physical cpu
>> only use the L4 shadow cache of my series in order to avoid having to
>> copy the L4 contents each time the hypervisor is left.
>>
>> - As soon as a L4 page table is being activated on a second cpu, fall
>> back to using the per-cpu page table on that cpu (the cpu already using
>> the L4 page table can continue doing so).
>
> Would the first of these CPUs continue to run on the shadow L4 in
> that case? If so, would there be no synchronization issues? If not,
> how do you envision "telling" it to move to the per-CPU L4 (which,
> afaict, includes knowing which vCPU / pCPU that is)?
I thought to let that CPU keep running on the shadow L4. This L4 is
already configured for the CPU it is being used on, so we just have to
avoid activating it on a second CPU.
I don't see synchronization issues, as all guest L4 modifications would
be mirrored in the shadow, as is done in my series already.
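To make it a bit more concrete, the selection on guest entry could look
roughly like this (pure sketch, all names are invented and locking of
the bookkeeping is omitted):

/*
 * Keep using the cached shadow L4 as long as the guest L4 is active on
 * at most one pcpu; any further pcpu falls back to its private per-cpu
 * L4 which is refilled from the guest L4 (XPTI-light style).
 */
static unsigned long xpti_choose_l4(unsigned long guest_l4_mfn)
{
    unsigned int cpu = smp_processor_id();
    struct shadow_l4 *s = shadow_l4_lookup(guest_l4_mfn);

    if ( s && (s->active_cpu == cpu || s->active_cpu == NR_CPUS) )
    {
        /* First (or same) pcpu: take ownership, run on the shadow L4. */
        s->active_cpu = cpu;
        return s->shadow_mfn;
    }

    /* Guest L4 already active on another pcpu: use the private L4. */
    percpu_l4_refill(cpu, guest_l4_mfn);
    return this_cpu(xpti_l4_mfn);
}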
>> - Before activation of a L4 shadow page table it is modified to map the
>> per-cpu data needed in guest mode for the local cpu only.
>
> I had been considering to do this in XPTI light for other purposes
> too (for example it might be possible to short circuit the guest
> system call path to get away without multiple page table switches).
> We really first need to settle on how much we feel is safe to expose
> while the guest is running. So far I've been under the impression
> that people actually think we should further reduce exposed pieces
> of code/data, rather than widen the "window".
I would like to have some prepared L3 page tables for each cpu, meant to
be hooked into the correct shadow L4 slots. The shadow L4 should map as
few hypervisor parts as possible (again like in my current series).
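Roughly like this (sketch only; the slot and the per-cpu variable are
invented, the real slot would of course have to be one we can dedicate
to this purpose):

/*
 * Before a shadow L4 is activated on this cpu, point its Xen private
 * slot at this cpu's pre-built L3, so that only the per-cpu stacks,
 * IDT, GDT and TSS of the local cpu become visible in guest context.
 */
static void shadow_l4_hook_local_cpu(l4_pgentry_t *shadow_l4)
{
    unsigned int cpu = smp_processor_id();

    shadow_l4[XPTI_PERCPU_L4_SLOT] =
        l4e_from_pfn(per_cpu(xpti_l3_pfn, cpu), __PAGE_HYPERVISOR_RW);
}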
>> - Use INVPCID instead of %cr4 PGE toggling to speed up purging global
>> TLB entries (depending on the availability of the feature, of course).
>
> That's something we should do independent of what XPTI model
> we'd like to retain long term.
Right. That was just for completeness.
>> - Use the PCID feature to avoid purging TLB entries which might be
>> needed later (depending on hardware again).
>
> Which first of all raises the question: Does PCID (other than the U
> bit) prevent use of TLB entries in the wrong context? IOW is the
> PCID check done early (during TLB lookup) rather than late (during
> insn retirement)?
We can test this easily. As the Linux kernel is already using this
mechanism for Meltdown mitigation I assume it is safe, or the Linux
kernel's way of avoiding Meltdown attacks wouldn't work.
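For reference, the kind of setup I'm thinking of (values and helper are
purely hypothetical, and whether the fourth PCID is needed is exactly
the open question above):

#define PCID_GUEST_KERNEL  1   /* guest running in kernel mode */
#define PCID_GUEST_USER    2   /* guest running in user mode */
#define PCID_XEN_KERNEL    3   /* Xen, entered from guest kernel mode */
#define PCID_XEN_USER      4   /* Xen, entered from guest user mode */

#define X86_CR3_NOFLUSH    (1UL << 63)  /* retain this PCID's TLB entries */

static inline void write_cr3_pcid(unsigned long mfn, unsigned long pcid)
{
    unsigned long cr3 = (mfn << PAGE_SHIFT) | pcid | X86_CR3_NOFLUSH;

    asm volatile ( "mov %0, %%cr3" :: "r" (cr3) : "memory" );
}

With CR4.PCIDE set and the no-flush bit used on the switches, entering
and leaving the hypervisor would no longer throw away the other
context's TLB entries.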
Juergen
Thread overview: 23+ messages
2018-02-09 14:01 [PATCH v3 00/17] Alternative Meltdown mitigation Juergen Gross
2018-02-09 14:01 ` [PATCH v3 01/17] x86: don't use hypervisor stack size for dumping guest stacks Juergen Gross
2018-02-09 14:01 ` [PATCH v3 02/17] x86: do a revert of e871e80c38547d9faefc6604532ba3e985e65873 Juergen Gross
2018-02-13 10:14 ` Jan Beulich
2018-02-09 14:01 ` [PATCH v3 03/17] x86: revert 5784de3e2067ed73efc2fe42e62831e8ae7f46c4 Juergen Gross
2018-02-09 14:01 ` [PATCH v3 04/17] x86: don't access saved user regs via rsp in trap handlers Juergen Gross
2018-02-09 14:01 ` [PATCH v3 05/17] x86: add a xpti command line parameter Juergen Gross
2018-02-09 14:01 ` [PATCH v3 06/17] x86: allow per-domain mappings without NX bit or with specific mfn Juergen Gross
2018-02-09 14:01 ` [PATCH v3 07/17] xen/x86: split _set_tssldt_desc() into ldt and tss specific functions Juergen Gross
2018-02-09 14:01 ` [PATCH v3 08/17] x86: add support for spectre mitigation with local thunk Juergen Gross
2018-02-09 14:01 ` [PATCH v3 09/17] x86: create syscall stub for per-domain mapping Juergen Gross
2018-02-09 14:01 ` [PATCH v3 10/17] x86: allocate per-vcpu stacks for interrupt entries Juergen Gross
2018-02-09 14:01 ` [PATCH v3 11/17] x86: modify interrupt handlers to support stack switching Juergen Gross
2018-02-09 14:01 ` [PATCH v3 12/17] x86: activate per-vcpu stacks in case of xpti Juergen Gross
2018-02-09 14:01 ` [PATCH v3 13/17] x86: allocate hypervisor L4 page table for XPTI Juergen Gross
2018-02-09 14:01 ` [PATCH v3 14/17] xen: add domain pointer to fill_ro_mpt() and zap_ro_mpt() functions Juergen Gross
2018-02-09 14:01 ` [PATCH v3 15/17] x86: fill XPTI shadow pages and keep them in sync with guest L4 Juergen Gross
2018-02-09 14:01 ` [PATCH v3 16/17] x86: do page table switching when entering/leaving hypervisor Juergen Gross
2018-02-09 14:01 ` [PATCH v3 17/17] x86: hide most hypervisor mappings in XPTI shadow page tables Juergen Gross
2018-02-12 17:54 ` [PATCH v3 00/17] Alternative Meltdown mitigation Dario Faggioli
2018-02-13 11:36 ` Juergen Gross
2018-02-13 14:16 ` Jan Beulich
[not found] ` <5A83014E02000078001A7619@suse.com>
2018-02-13 14:29 ` Juergen Gross