From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Shuai Ruan <shuai.ruan@linux.intel.com>, xen-devel@lists.xen.org
Cc: kevin.tian@intel.com, wei.liu2@citrix.com,
Ian.Campbell@citrix.com, stefano.stabellini@eu.citrix.com,
eddie.dong@intel.com, ian.jackson@eu.citrix.com,
jbeulich@suse.com, jun.nakajima@intel.com, keir@xen.org
Subject: Re: [PATCH V3 1/6] x86/xsaves: enable xsaves/xrstors for pv guest
Date: Wed, 5 Aug 2015 18:51:29 +0100
Message-ID: <55C24D21.50209@citrix.com>
In-Reply-To: <1438739842-31658-2-git-send-email-shuai.ruan@linux.intel.com>
On 05/08/15 02:57, Shuai Ruan wrote:
> This patch emulates the xsaves/xrstors instructions and
> XSS MSR accesses.
>
> As the xsaves/xrstors instructions and XSS MSR accesses
> may only be executed in ring 0, emulation is needed when
> a PV guest uses these instructions.
>
> Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
> ---
> xen/arch/x86/domain.c | 3 +
> xen/arch/x86/traps.c | 138 ++++++++++++++++++++++++++++++++++++++++
> xen/arch/x86/x86_64/mm.c | 52 +++++++++++++++
> xen/arch/x86/xstate.c | 39 ++++++++++++
> xen/include/asm-x86/domain.h | 1 +
> xen/include/asm-x86/mm.h | 1 +
> xen/include/asm-x86/msr-index.h | 2 +
> xen/include/asm-x86/xstate.h | 3 +
> 8 files changed, 239 insertions(+)
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 045f6ff..e8b8d67 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v)
>
> /* By default, do not emulate */
> v->arch.vm_event.emulate_flags = 0;
> + v->arch.msr_ia32_xss = 0;
The backing memory for struct vcpu is zeroed when allocated. There is
no need to explicitly zero this field here.
>
> rc = mapcache_vcpu_init(v);
> if ( rc )
> @@ -1529,6 +1530,8 @@ static void __context_switch(void)
> if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
> BUG();
> }
> + if ( cpu_has_xsaves )
> + wrmsr_safe(MSR_IA32_XSS, n->arch.msr_ia32_xss);
This mustn't throw away potential errors. It should not be possible for
n->arch.msr_ia32_xss to be invalid by this point, so a straight wrmsr()
would be correct.
However, you will want to implement lazy context switching, exactly like
get/set_xcr0().
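Something along these lines would do (a rough sketch; the helper and
per-cpu variable names are illustrative):

    static DEFINE_PER_CPU(uint64_t, xss);

    void set_msr_xss(uint64_t xss)
    {
        uint64_t *this_xss = &this_cpu(xss);

        if ( *this_xss != xss )
        {
            wrmsrl(MSR_IA32_XSS, xss);
            *this_xss = xss;
        }
    }

after which the hunk above reduces to

    if ( cpu_has_xsaves )
        set_msr_xss(n->arch.msr_ia32_xss);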
> vcpu_restore_fpu_eager(n);
> n->arch.ctxt_switch_to(n);
> }
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 6a03582..c1fea77 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -2353,6 +2353,131 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
> }
> break;
>
> + case 0xc7:
This case should be sorted numerically, so should be between CPUID and
default.
> + {
> + void *xsave_addr;
> + int not_page_aligned = 0;
bool_t
> + u32 guest_xsaves_size = xstate_ctxt_size_compact(v->arch.xcr0);
> +
> + switch ( insn_fetch(u8, code_base, eip, code_limit) )
> + {
> + case 0x2f:/* XSAVES */
> + {
> + if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> + X86_CR4_OSXSAVE))
I would format this as
if ( !cpu_has_xsaves ||
!(v->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
to associate the bit test more clearly.
> + {
> + do_guest_trap(TRAP_invalid_op, regs, 0);
> + goto skip;
> + }
> +
> + if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> + {
> + do_guest_trap(TRAP_nmi, regs, 0);
TRAP_no_device surely?
> + goto skip;
> + }
> +
> + if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
What does edi have to do with xsaves? Only edx:eax are special
according to the manual.
> + goto fail;
> +
> + if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> + {
> + mfn_t mfn_list[2];
> + void *va;
This fails to account for the xsaves size growing beyond PAGE_SIZE on
future processors.

    /* TODO - expand for future processors. */
    BUG_ON(guest_xsaves_size > PAGE_SIZE);

might be acceptable. However, it is better fixed by...
> +
> + not_page_aligned = 1;
> + mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> + mfn_list[1] = _mfn(do_page_walk_mfn(v,
> + PAGE_ALIGN(regs->edi)));
> +
> + va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> + xsave_addr = (void *)((unsigned long)va +
> + (regs->edi & ~PAGE_MASK));
... introducing a new helper such as

    void *vmap_guest_linear(void *va, size_t bytes, uint32_t pfec);

which takes care of the internal details of making a linear area of
guest virtual address space appear linear in Xen virtual address space
as well.
You also need to take care to respect non-writable pages, and have an
access_ok() check or a guest could (ab)use this emulation to write to
Xen code/data areas.
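A rough sketch of the shape I have in mind (taking the vcpu explicitly;
names are illustrative, the pfec/_PAGE_RW auditing is elided, and
unmapping is left to the caller):

    void *vmap_guest_linear(struct vcpu *v, unsigned long va,
                            size_t bytes, uint32_t pfec)
    {
        unsigned int i, nr = PFN_UP((va & ~PAGE_MASK) + bytes);
        mfn_t mfn[nr]; /* nr is hypervisor-controlled and small here */
        void *map;

        if ( !access_ok(va, bytes) )
            return NULL;

        for ( i = 0; i < nr; ++i )
        {
            /* A real implementation must audit pte flags against pfec. */
            mfn[i] = _mfn(do_page_walk_mfn(v, va + i * PAGE_SIZE));
            if ( !mfn_x(mfn[i]) )
                return NULL;
        }

        map = __vmap(mfn, 1, nr, PAGE_SIZE, PAGE_HYPERVISOR);

        return map ? map + (va & ~PAGE_MASK) : NULL;
    }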
> + }
> + else
> + xsave_addr = do_page_walk(v, regs->edi);
> +
> + if ( !xsave_addr )
> + goto fail;
> +
> + xsaves(regs->eax, regs->edx, xsave_addr);
> +
> + if ( not_page_aligned )
> + vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> + else
> + unmap_domain_page(xsave_addr);
> + break;
> + }
Blank line here please.
> + case 0x1f:/* XRSTORS */
> + {
> + struct xsave_struct guest_xsave_area;
Not on the stack please, but I don't believe you need this.
> +
> + if ( !cpu_has_xsaves || !(v->arch.pv_vcpu.ctrlreg[4] &
> + X86_CR4_OSXSAVE))
> + {
> + do_guest_trap(TRAP_invalid_op, regs, 0);
> + goto skip;
> + }
> +
> + if ( v->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
> + {
> + do_guest_trap(TRAP_nmi, regs, 0);
> + goto skip;
> + }
> +
> + if ( !guest_kernel_mode(v, regs) || (regs->edi & 0x3f) )
> + goto fail;
> +
> + if ( (rc = copy_from_user(&guest_xsave_area, (void *) regs->edi,
> + sizeof(struct xsave_struct))) !=0 )
> + {
> + propagate_page_fault(regs->edi +
> + sizeof(struct xsave_struct) - rc, 0);
> + goto skip;
Surely you just need the xstate_bv and xcomp_bv fields?
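i.e. something like (the xsave header sits at architectural offset 512,
with xstate_bv followed immediately by xcomp_bv):

    uint64_t xstate_bv, xcomp_bv;

    if ( copy_from_user(&xstate_bv, (void *)(regs->edi + 512),
                        sizeof(xstate_bv)) ||
         copy_from_user(&xcomp_bv, (void *)(regs->edi + 520),
                        sizeof(xcomp_bv)) )
        /* propagate_page_fault() and goto skip, as above. */;

which avoids putting a large structure on the stack entirely.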
> + }
> + else
> + if ( !(guest_xsave_area.xsave_hdr.xcomp_bv & 1l << 63) ||
1l << 63 is undefined behaviour, as it shifts into the sign bit of a
signed type. Perhaps you meant 1ul << 63?
> + (guest_xsave_area.xsave_hdr.xstate_bv |
> + guest_xsave_area.xsave_hdr.xcomp_bv) !=
> + guest_xsave_area.xsave_hdr.xcomp_bv ||
xcomp_bv is not introduced until patch 2, which means that this patch
doesn't build. Please make absolutely sure that the entire series is
bisectable.
> + ((guest_xsave_area.xsave_hdr.xcomp_bv & ~(1l << 63)) |
> + v->arch.xcr0 | v->arch.msr_ia32_xss ) !=
> + (v->arch.xcr0 | v->arch.msr_ia32_xss) )
> + goto fail;
> +
> + if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
> + {
> + mfn_t mfn_list[2];
> + void *va;
> +
> + not_page_aligned = 1;
> + mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
> + mfn_list[1] = _mfn(do_page_walk_mfn(v,
> + PAGE_ALIGN(regs->edi)));
> +
> + va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
> + xsave_addr = (void *)((unsigned long)va +
> + (regs->edi & ~PAGE_MASK));
> + }
> + else
> + xsave_addr = do_page_walk(v, regs->edi);
> +
> + if ( !xsave_addr )
> + goto fail;
> +
> + xrstors(regs->eax, regs->edx, xsave_addr);
> +
> + if ( not_page_aligned )
> + vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
> + else
> + unmap_domain_page(xsave_addr);
> + break;
> + }
> + default:
> + goto fail;
> + }
> + break;
> + }
> +
> case 0x06: /* CLTS */
> (void)do_fpu_taskswitch(0);
> break;
> @@ -2663,6 +2788,13 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
> }
> break;
> }
> + case MSR_IA32_XSS:
> + if ( !cpu_has_xsaves )
> + goto fail;
> + if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
> + goto fail;
You can join these two if() statements together. Also, it is
technically regs->_ecx.
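i.e. something like:

    case MSR_IA32_XSS:
        if ( !cpu_has_xsaves || wrmsr_safe(regs->_ecx, msr_content) )
            goto fail;
        v->arch.msr_ia32_xss = msr_content;
        break;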
> + v->arch.msr_ia32_xss = msr_content;
> + break;
> /*FALLTHROUGH*/
Observe the comment, which indicates that the previous case deliberately
falls through into default:, so your new case must not be inserted
between them.
>
> default:
> @@ -2798,6 +2930,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
> }
> break;
> }
> + case MSR_IA32_XSS:
> + if ( !cpu_has_xsaves )
> + goto fail;
> + regs->eax = v->arch.msr_ia32_xss;
> + regs->edx = v->arch.msr_ia32_xss >> 32;
val = v->arch.msr_ia32_xss;
goto rdmsr_writeback;
This code currently incorrectly clobbers the upper 32 bits of rax.
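making the whole case (reusing the existing writeback path):

    case MSR_IA32_XSS:
        if ( !cpu_has_xsaves )
            goto fail;
        val = v->arch.msr_ia32_xss;
        goto rdmsr_writeback;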
> + break;
> /*FALLTHROUGH*/
Again, another bad fall-through case.
>
> default:
> diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
> index 98310f3..de94ac1 100644
> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -48,6 +48,58 @@ l2_pgentry_t __section(".bss.page_aligned") l2_bootmap[L2_PAGETABLE_ENTRIES];
>
> l2_pgentry_t *compat_idle_pg_table_l2;
>
> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr)
What is this function? Why is it useful? Something like this belongs
in its own patch along with a description of why it is being introduced.
> +{
> + unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> + l4_pgentry_t l4e, *l4t;
> + l3_pgentry_t l3e, *l3t;
> + l2_pgentry_t l2e, *l2t;
> + l1_pgentry_t l1e, *l1t;
> +
> + if ( !is_pv_vcpu(v) || !is_canonical_address(addr) )
> + return 0;
> +
> + l4t = map_domain_page(_mfn(mfn));
> + l4e = l4t[l4_table_offset(addr)];
> + unmap_domain_page(l4t);
> + if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
> + return 0;
> +
> + l3t = map_l3t_from_l4e(l4e);
> + l3e = l3t[l3_table_offset(addr)];
> + unmap_domain_page(l3t);
> + mfn = l3e_get_pfn(l3e);
> + if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> + return 0;
> + if ( (l3e_get_flags(l3e) & _PAGE_PSE) )
> + {
> + mfn += PFN_DOWN(addr & ((1UL << L3_PAGETABLE_SHIFT) - 1));
> + goto ret;
> + }
> +
> + l2t = map_domain_page(_mfn(mfn));
> + l2e = l2t[l2_table_offset(addr)];
> + unmap_domain_page(l2t);
> + mfn = l2e_get_pfn(l2e);
> + if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> + return 0;
> + if ( (l2e_get_flags(l2e) & _PAGE_PSE) )
> + {
> + mfn += PFN_DOWN(addr & ((1UL << L2_PAGETABLE_SHIFT) - 1));
> + goto ret;
> + }
> +
> + l1t = map_domain_page(_mfn(mfn));
> + l1e = l1t[l1_table_offset(addr)];
> + unmap_domain_page(l1t);
> + mfn = l1e_get_pfn(l1e);
> + if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || !mfn_valid(mfn) )
> + return 0;
> +
> + ret:
> + return mfn;
> +}
> +
> void *do_page_walk(struct vcpu *v, unsigned long addr)
> {
> unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index d5f5e3b..e34eda3 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -65,6 +65,31 @@ uint64_t get_xcr0(void)
> return this_cpu(xcr0);
> }
>
> +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
This should take a 64-bit mask.
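e.g. keeping the 32-bit split internal to the helper:

    void xsaves(uint64_t mask, struct xsave_struct *ptr)
    {
        uint32_t lmask = mask, hmask = mask >> 32;

        asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
                       : "=m" (*ptr)
                       : "a" (lmask), "d" (hmask), "D" (ptr) );
    }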
> +{
> + asm volatile ( ".byte 0x48,0x0f,0xc7,0x2f"
> + : "=m" (*ptr)
> + : "a" (lmask), "d" (hmask), "D" (ptr) );
> +}
> +
> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr)
> +{
> + asm volatile ( "1: .byte 0x48,0x0f,0xc7,0x1f\n"
> + ".section .fixup,\"ax\" \n"
> + "2: mov %5,%%ecx \n"
> + " xor %1,%1 \n"
> + " rep stosb \n"
> + " lea %2,%0 \n"
> + " mov %3,%1 \n"
> + " jmp 1b \n"
> + ".previous \n"
> + _ASM_EXTABLE(1b, 2b)
> + : "+&D" (ptr), "+&a" (lmask)
> + : "m" (*ptr), "g" (lmask), "d" (hmask),
> + "m" (xsave_cntxt_size)
> + : "ecx" );
> +}
> +
Neither of these two helpers has anything like sufficient checking to
be safely used on guest state.
> void xsave(struct vcpu *v, uint64_t mask)
> {
> struct xsave_struct *ptr = v->arch.xsave_area;
> @@ -268,6 +293,20 @@ static unsigned int _xstate_ctxt_size(u64 xcr0)
> return ebx;
> }
>
> +unsigned int xstate_ctxt_size_compact(u64 xcr0)
> +{
> + u64 act_xcr0 = get_xcr0();
> + u32 eax, ebx = 0, ecx, edx;
> + bool_t ok = set_xcr0(xcr0);
> +
> + ASSERT(ok);
> + cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
> + ok = set_xcr0(act_xcr0);
> + ASSERT(ok);
> +
> + return ebx;
> +}
> +
> /* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
> unsigned int xstate_ctxt_size(u64 xcr0)
> {
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 7a9e96f..aee781b 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -496,6 +496,7 @@ struct arch_vcpu
> */
> struct xsave_struct *xsave_area;
> uint64_t xcr0;
> + u64 msr_ia32_xss;
uint64_t please, for consistency.
~Andrew
> /* Accumulated eXtended features mask for using XSAVE/XRESTORE by Xen
> * itself, as we can never know whether guest OS depends on content
> * preservation whenever guest OS clears one feature flag (for example,
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index 8595c38..94a590e 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -524,6 +524,7 @@ void make_cr3(struct vcpu *v, unsigned long mfn);
> void update_cr3(struct vcpu *v);
> int vcpu_destroy_pagetables(struct vcpu *);
> struct trap_bounce *propagate_page_fault(unsigned long addr, u16 error_code);
> +unsigned long do_page_walk_mfn(struct vcpu *v, unsigned long addr);
> void *do_page_walk(struct vcpu *v, unsigned long addr);
>
> int __sync_local_execstate(void);
> diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
> index 5425f77..365d995 100644
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -58,6 +58,8 @@
>
> #define MSR_IA32_BNDCFGS 0x00000D90
>
> +#define MSR_IA32_XSS 0x00000da0
> +
> #define MSR_MTRRfix64K_00000 0x00000250
> #define MSR_MTRRfix16K_80000 0x00000258
> #define MSR_MTRRfix16K_A0000 0x00000259
> diff --git a/xen/include/asm-x86/xstate.h b/xen/include/asm-x86/xstate.h
> index 4c690db..59c7156 100644
> --- a/xen/include/asm-x86/xstate.h
> +++ b/xen/include/asm-x86/xstate.h
> @@ -82,6 +82,8 @@ struct __packed __attribute__((aligned (64))) xsave_struct
> /* extended state operations */
> bool_t __must_check set_xcr0(u64 xfeatures);
> uint64_t get_xcr0(void);
> +void xsaves(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
> +void xrstors(uint32_t lmask, uint32_t hmask, struct xsave_struct *ptr);
> void xsave(struct vcpu *v, uint64_t mask);
> void xrstor(struct vcpu *v, uint64_t mask);
> bool_t xsave_enabled(const struct vcpu *v);
> @@ -92,6 +94,7 @@ int __must_check handle_xsetbv(u32 index, u64 new_bv);
> void xstate_free_save_area(struct vcpu *v);
> int xstate_alloc_save_area(struct vcpu *v);
> void xstate_init(bool_t bsp);
> +unsigned int xstate_ctxt_size_compact(u64 xcr0);
> unsigned int xstate_ctxt_size(u64 xcr0);
>
> #endif /* __ASM_XSTATE_H */