* [PATCH 0 of 8] FPU LWP: patch description
From: Huang2, Wei @ 2011-05-07  5:39 UTC
  To: xen-devel@lists.xensource.com

This patch set adds support for AMD lightweight profiling (LWP) in SVM guests. Because LWP state isn't tracked by the CR0.TS bit, we clean up the FPU code to handle lazy and non-lazy FPU states differently. Lazy state (such as SSE and YMM) is handled when #NM is triggered. Non-lazy state, such as LWP, is saved and restored on each vcpu context switch.
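
As a rough sketch, the resulting call pattern looks like this (condensed from patches 3, 4 and 6; the call sites shown are illustrative):

  /* Context switch: save everything, then eagerly restore the non-lazy
   * state of the incoming vcpu. */
  vcpu_save_fpu(prev);             /* xsave/fxsave/fnsave, as available */
  vcpu_restore_fpu_eager(next);    /* xrstor with XSTATE_NONLAZY, e.g. LWP */

  /* #NM fault path: restore the CR0.TS-tracked (lazy) state on demand. */
  vcpu_restore_fpu_lazy(current);  /* xrstor with XSTATE_LAZY, or fxrstor/frstor */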

Per Keir's comments, I moved the extended state related code into xstate.c and xstate.h. The FPU related code in i387.c was also cleaned up and now uses consistent names. Jan Beulich's comments were addressed as well. I added a new variable, nonlazy_xstate_used, to control whether non-lazy state is saved and restored. The resulting entry points are listed below, followed by a short usage sketch.

====== i387.c ======
* void vcpu_restore_fpu_eager(struct vcpu *v);
* void vcpu_restore_fpu_lazy(struct vcpu *v);
* void vcpu_save_fpu(struct vcpu *v);
* int vcpu_init_fpu(struct vcpu *v);
* void vcpu_destroy_fpu(struct vcpu *v);

====== xstate.c ======
* void set_xcr0(u64 xfeatures);
* uint64_t get_xcr0(void);
* void xsave(struct vcpu *v, uint64_t mask);
* void xrstor(struct vcpu *v, uint64_t mask);
* bool_t xsave_enabled(const struct vcpu *v);
* void xstate_free_save_area(struct vcpu *v);
* int xstate_alloc_save_area(struct vcpu *v);
* void xstate_init(void);
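
A minimal usage sketch of these entry points (the call sites shown are illustrative; the real ones are in the individual patches):

  xstate_init();                     /* boot: probe CPUID leaf 0xd, set up XCR0 */

  rc = xstate_alloc_save_area(v);    /* vcpu creation (via vcpu_init_fpu()) */

  set_xcr0(v->arch.xcr0_accum);      /* around any save/restore */
  xsave(v, mask);                    /* or xrstor(v, mask) */
  set_xcr0(v->arch.xcr0);

  xstate_free_save_area(v);          /* vcpu destruction (via vcpu_destroy_fpu()) */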


This code has been tested on real hardware. Please comment.

-Wei


 b/xen/arch/x86/xstate.c            |  188 +++++++++++++++
 b/xen/include/asm-x86/xstate.h     |   72 +++++
 tools/libxc/xc_cpuid_x86.c         |    6
 xen/arch/x86/Makefile              |    1
 xen/arch/x86/acpi/suspend.c        |    2
 xen/arch/x86/cpu/common.c          |    4
 xen/arch/x86/domain.c              |   29 --
 xen/arch/x86/domctl.c              |    2
 xen/arch/x86/hvm/hvm.c             |    3
 xen/arch/x86/hvm/svm/svm.c         |   75 ++++++
 xen/arch/x86/hvm/svm/vmcb.c        |    5
 xen/arch/x86/hvm/vmx/vmcs.c        |    1
 xen/arch/x86/hvm/vmx/vmx.c         |    2
 xen/arch/x86/i387.c                |  450 ++++++++++++++++---------------------
 xen/arch/x86/traps.c               |    3
 xen/include/asm-x86/cpufeature.h   |    2
 xen/include/asm-x86/domain.h       |    5
 xen/include/asm-x86/hvm/svm/vmcb.h |    3
 xen/include/asm-x86/i387.h         |   71 -----
 xen/include/asm-x86/msr-index.h    |    4
 20 files changed, 573 insertions(+), 355 deletions(-)


* [PATCH 1 of 8] FPU: extract extended state code into xstate.h and xstate.c
From: Huang2, Wei @ 2011-05-07  5:40 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304447651 18000
# Node ID 439311a108d71aba1580177543c7d4320b012721
# Parent  fac1411505d6e56cd6745d29682bcc02fd043649
FPU: extract extended state code into xstate.h and xstate.c

The extended state code is currently mixed with the FPU code in i387.c. As part of the FPU code cleanup, this patch moves all extended state code into separate files. The semantics are largely unchanged and most function names are kept, except that xsave() and xsaveopt() are combined into a single function.

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/Makefile
--- a/xen/arch/x86/Makefile	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/Makefile	Tue May 03 13:34:11 2011 -0500
@@ -56,6 +56,7 @@
 obj-y += crash.o
 obj-y += tboot.o
 obj-y += hpet.o
+obj-y += xstate.o
 
 obj-$(crash_debug) += gdbstub.o
 
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/cpu/common.c
--- a/xen/arch/x86/cpu/common.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/cpu/common.c	Tue May 03 13:34:11 2011 -0500
@@ -5,7 +5,7 @@
 #include <xen/smp.h>
 #include <asm/current.h>
 #include <asm/processor.h>
-#include <asm/i387.h>
+#include <asm/xstate.h>
 #include <asm/msr.h>
 #include <asm/io.h>
 #include <asm/mpspec.h>
@@ -354,7 +354,7 @@
 		clear_bit(X86_FEATURE_XSAVE, boot_cpu_data.x86_capability);
 
 	if ( cpu_has_xsave )
-		xsave_init();
+		xstate_init();
 
 	/*
 	 * The vendor-specific functions might have changed features.  Now
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/domain.c	Tue May 03 13:34:11 2011 -0500
@@ -42,6 +42,7 @@
 #include <asm/processor.h>
 #include <asm/desc.h>
 #include <asm/i387.h>
+#include <asm/xstate.h>
 #include <asm/mpspec.h>
 #include <asm/ldt.h>
 #include <asm/hypercall.h>
@@ -419,7 +420,7 @@
 
     v->arch.perdomain_ptes = perdomain_ptes(d, v);
 
-    if ( (rc = xsave_alloc_save_area(v)) != 0 )
+    if ( (rc = xstate_alloc_save_area(v)) != 0 )
         return rc;
     if ( v->arch.xsave_area )
         v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
@@ -485,7 +486,7 @@
     if ( rc )
     {
         if ( v->arch.xsave_area )
-            xsave_free_save_area(v);
+            xstate_free_save_area(v);
         else
             xfree(v->arch.fpu_ctxt);
         if ( !is_hvm_domain(d) && standalone_trap_ctxt(v) )
@@ -501,7 +502,7 @@
         release_compat_l4(v);
 
     if ( v->arch.xsave_area )
-        xsave_free_save_area(v);
+        xstate_free_save_area(v);
     else
         xfree(v->arch.fpu_ctxt);
 
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/domctl.c
--- a/xen/arch/x86/domctl.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/domctl.c	Tue May 03 13:34:11 2011 -0500
@@ -33,7 +33,7 @@
 #include <asm/mem_event.h>
 #include <public/mem_event.h>
 #include <asm/mem_sharing.h>
-#include <asm/i387.h>
+#include <asm/xstate.h>
 
 #ifdef XEN_KDB_CONFIG
 #include "../kdb/include/kdbdefs.h"
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/hvm/hvm.c	Tue May 03 13:34:11 2011 -0500
@@ -46,6 +46,7 @@
 #include <asm/types.h>
 #include <asm/msr.h>
 #include <asm/i387.h>
+#include <asm/xstate.h>
 #include <asm/traps.h>
 #include <asm/mc146818rtc.h>
 #include <asm/spinlock.h>
@@ -2427,7 +2428,7 @@
         if ( count == 0 && v->arch.xcr0 ) 
         {
             /* reset EBX to default value first */
-            *ebx = XSAVE_AREA_MIN_SIZE; 
+            *ebx = XSTATE_AREA_MIN_SIZE; 
             for ( sub_leaf = 2; sub_leaf < 64; sub_leaf++ )
             {
                 if ( !(v->arch.xcr0 & (1ULL << sub_leaf)) )
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen/arch/x86/hvm/vmx/vmcs.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/hvm/vmx/vmcs.c	Tue May 03 13:34:11 2011 -0500
@@ -26,6 +26,7 @@
 #include <asm/cpufeature.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
+#include <asm/xstate.h>
 #include <asm/hvm/hvm.h>
 #include <asm/hvm/io.h>
 #include <asm/hvm/support.h>
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/i387.c	Tue May 03 13:34:11 2011 -0500
@@ -14,41 +14,8 @@
 #include <asm/processor.h>
 #include <asm/hvm/support.h>
 #include <asm/i387.h>
+#include <asm/xstate.h>
 #include <asm/asm_defns.h>
-
-static bool_t __read_mostly cpu_has_xsaveopt;
-
-static void xsave(struct vcpu *v)
-{
-    struct xsave_struct *ptr = v->arch.xsave_area;
-
-    asm volatile (
-        ".byte " REX_PREFIX "0x0f,0xae,0x27"
-        :
-        : "a" (-1), "d" (-1), "D"(ptr)
-        : "memory" );
-}
-
-static void xsaveopt(struct vcpu *v)
-{
-    struct xsave_struct *ptr = v->arch.xsave_area;
-
-    asm volatile (
-        ".byte " REX_PREFIX "0x0f,0xae,0x37"
-        :
-        : "a" (-1), "d" (-1), "D"(ptr)
-        : "memory" );
-}
-
-static void xrstor(struct vcpu *v)
-{
-    struct xsave_struct *ptr = v->arch.xsave_area;
-
-    asm volatile (
-        ".byte " REX_PREFIX "0x0f,0xae,0x2f"
-        :
-        : "m" (*ptr), "a" (-1), "d" (-1), "D"(ptr) );
-}
 
 static void load_mxcsr(unsigned long val)
 {
@@ -122,10 +89,7 @@
          * we set all accumulated feature mask before doing save/restore.
          */
         set_xcr0(v->arch.xcr0_accum);
-        if ( cpu_has_xsaveopt )
-            xsaveopt(v);
-        else
-            xsave(v);
+        xsave(v);
         set_xcr0(v->arch.xcr0);
     }
     else if ( cpu_has_fxsr )
@@ -220,113 +184,6 @@
     }
 }
 
-#define XSTATE_CPUID 0xd
-
-/*
- * Maximum size (in byte) of the XSAVE/XRSTOR save area required by all
- * the supported and enabled features on the processor, including the
- * XSAVE.HEADER. We only enable XCNTXT_MASK that we have known.
- */
-u32 xsave_cntxt_size;
-
-/* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
-u64 xfeature_mask;
-
-/* Cached xcr0 for fast read */
-DEFINE_PER_CPU(uint64_t, xcr0);
-
-void xsave_init(void)
-{
-    u32 eax, ebx, ecx, edx;
-    int cpu = smp_processor_id();
-    u32 min_size;
-
-    if ( boot_cpu_data.cpuid_level < XSTATE_CPUID )
-        return;
-
-    cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
-
-    BUG_ON((eax & XSTATE_FP_SSE) != XSTATE_FP_SSE);
-    BUG_ON((eax & XSTATE_YMM) && !(eax & XSTATE_SSE));
-
-    /* FP/SSE, XSAVE.HEADER, YMM */
-    min_size =  XSAVE_AREA_MIN_SIZE;
-    if ( eax & XSTATE_YMM )
-        min_size += XSTATE_YMM_SIZE;
-    BUG_ON(ecx < min_size);
-
-    /*
-     * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
-     */
-    set_in_cr4(X86_CR4_OSXSAVE);
-    set_xcr0((((u64)edx << 32) | eax) & XCNTXT_MASK);
-    cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
-
-    if ( cpu == 0 )
-    {
-        /*
-         * xsave_cntxt_size is the max size required by enabled features.
-         * We know FP/SSE and YMM about eax, and nothing about edx at present.
-         */
-        xsave_cntxt_size = ebx;
-        xfeature_mask = eax + ((u64)edx << 32);
-        xfeature_mask &= XCNTXT_MASK;
-        printk("%s: using cntxt_size: 0x%x and states: 0x%"PRIx64"\n",
-            __func__, xsave_cntxt_size, xfeature_mask);
-
-        /* Check XSAVEOPT feature. */
-        cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
-        cpu_has_xsaveopt = !!(eax & XSAVEOPT);
-    }
-    else
-    {
-        BUG_ON(xsave_cntxt_size != ebx);
-        BUG_ON(xfeature_mask != (xfeature_mask & XCNTXT_MASK));
-    }
-}
-
-int xsave_alloc_save_area(struct vcpu *v)
-{
-    void *save_area;
-
-    if ( !cpu_has_xsave || is_idle_vcpu(v) )
-        return 0;
-
-    BUG_ON(xsave_cntxt_size < XSAVE_AREA_MIN_SIZE);
-
-    /* XSAVE/XRSTOR requires the save area be 64-byte-boundary aligned. */
-    save_area = _xmalloc(xsave_cntxt_size, 64);
-    if ( save_area == NULL )
-        return -ENOMEM;
-
-    memset(save_area, 0, xsave_cntxt_size);
-    ((u32 *)save_area)[6] = 0x1f80;  /* MXCSR */
-    *(uint64_t *)(save_area + 512) = XSTATE_FP_SSE;  /* XSETBV */
-
-    v->arch.xsave_area = save_area;
-    v->arch.xcr0 = XSTATE_FP_SSE;
-    v->arch.xcr0_accum = XSTATE_FP_SSE;
-
-    return 0;
-}
-
-void xsave_free_save_area(struct vcpu *v)
-{
-    xfree(v->arch.xsave_area);
-    v->arch.xsave_area = NULL;
-}
-
-bool_t xsave_enabled(const struct vcpu *v)
-{
-    if ( cpu_has_xsave )
-    {
-        ASSERT(xsave_cntxt_size >= XSAVE_AREA_MIN_SIZE);
-        ASSERT(v->arch.xsave_area);
-    }
-
-    return cpu_has_xsave;	
-}
-
 /*
  * Local variables:
  * mode: C
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Sat May 07 00:39:45 2011 -0500
+++ b/xen/arch/x86/traps.c	Tue May 03 13:34:11 2011 -0500
@@ -58,6 +58,7 @@
 #include <asm/flushtlb.h>
 #include <asm/uaccess.h>
 #include <asm/i387.h>
+#include <asm/xstate.h>
 #include <asm/debugger.h>
 #include <asm/msr.h>
 #include <asm/shared.h>
diff -r fac1411505d6 -r 439311a108d7 xen/arch/x86/xstate.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/arch/x86/xstate.c	Tue May 03 13:34:11 2011 -0500
@@ -0,0 +1,183 @@
+/*
+ *  arch/x86/xstate.c
+ *
+ *  x86 extended state operations
+ *
+ */
+
+#include <xen/config.h>
+#include <xen/sched.h>
+#include <asm/current.h>
+#include <asm/processor.h>
+#include <asm/hvm/support.h>
+#include <asm/xstate.h>
+#include <asm/asm_defns.h>
+
+bool_t __read_mostly cpu_has_xsaveopt;
+
+/*
+ * Maximum size (in byte) of the XSAVE/XRSTOR save area required by all
+ * the supported and enabled features on the processor, including the
+ * XSAVE.HEADER. We only enable XCNTXT_MASK that we have known.
+ */
+u32 xsave_cntxt_size;
+
+/* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
+u64 xfeature_mask;
+
+/* Cached xcr0 for fast read */
+DEFINE_PER_CPU(uint64_t, xcr0);
+
+/* Because XCR0 is cached for each CPU, xsetbv() is not exposed. Users should 
+ * use set_xcr0() instead.
+ */
+static inline void xsetbv(u32 index, u64 xfeatures)
+{
+    u32 hi = xfeatures >> 32;
+    u32 lo = (u32)xfeatures;
+
+    asm volatile (".byte 0x0f,0x01,0xd1" :: "c" (index),
+            "a" (lo), "d" (hi));
+}
+
+inline void set_xcr0(u64 xfeatures)
+{
+    this_cpu(xcr0) = xfeatures;
+    xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures);
+}
+
+inline uint64_t get_xcr0(void)
+{
+    return this_cpu(xcr0);
+}
+
+void xsave(struct vcpu *v)
+{
+    struct xsave_struct *ptr = v->arch.xsave_area;
+
+    if ( cpu_has_xsaveopt )
+        asm volatile (
+            ".byte " REX_PREFIX "0x0f,0xae,0x37"
+            :
+            : "a" (-1), "d" (-1), "D"(ptr)
+            : "memory" );
+    else
+        asm volatile (
+            ".byte " REX_PREFIX "0x0f,0xae,0x27"
+            :
+            : "a" (-1), "d" (-1), "D"(ptr)
+            : "memory" );
+}
+
+void xrstor(struct vcpu *v)
+{
+    struct xsave_struct *ptr = v->arch.xsave_area;
+
+    asm volatile (
+        ".byte " REX_PREFIX "0x0f,0xae,0x2f"
+        :
+        : "m" (*ptr), "a" (-1), "d" (-1), "D"(ptr) );
+}
+
+bool_t xsave_enabled(const struct vcpu *v)
+{
+    if ( cpu_has_xsave )
+    {
+        ASSERT(xsave_cntxt_size >= XSTATE_AREA_MIN_SIZE);
+        ASSERT(v->arch.xsave_area);
+    }
+
+    return cpu_has_xsave;	
+}
+
+int xstate_alloc_save_area(struct vcpu *v)
+{
+    void *save_area;
+
+    if ( !cpu_has_xsave || is_idle_vcpu(v) )
+        return 0;
+
+    BUG_ON(xsave_cntxt_size < XSTATE_AREA_MIN_SIZE);
+
+    /* XSAVE/XRSTOR requires the save area be 64-byte-boundary aligned. */
+    save_area = _xmalloc(xsave_cntxt_size, 64);
+    if ( save_area == NULL )
+        return -ENOMEM;
+
+    memset(save_area, 0, xsave_cntxt_size);
+    ((u32 *)save_area)[6] = 0x1f80;  /* MXCSR */
+    *(uint64_t *)(save_area + 512) = XSTATE_FP_SSE;  /* XSETBV */
+
+    v->arch.xsave_area = save_area;
+    v->arch.xcr0 = XSTATE_FP_SSE;
+    v->arch.xcr0_accum = XSTATE_FP_SSE;
+
+    return 0;
+}
+
+void xstate_free_save_area(struct vcpu *v)
+{
+    xfree(v->arch.xsave_area);
+    v->arch.xsave_area = NULL;
+}
+
+/* Collect the information of processor's extended state */
+void xstate_init(void)
+{
+    u32 eax, ebx, ecx, edx;
+    int cpu = smp_processor_id();
+    u32 min_size;
+
+    if ( boot_cpu_data.cpuid_level < XSTATE_CPUID )
+        return;
+
+    cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
+
+    BUG_ON((eax & XSTATE_FP_SSE) != XSTATE_FP_SSE);
+    BUG_ON((eax & XSTATE_YMM) && !(eax & XSTATE_SSE));
+
+    /* FP/SSE, XSAVE.HEADER, YMM */
+    min_size =  XSTATE_AREA_MIN_SIZE;
+    if ( eax & XSTATE_YMM )
+        min_size += XSTATE_YMM_SIZE;
+    BUG_ON(ecx < min_size);
+
+    /*
+     * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
+     */
+    set_in_cr4(X86_CR4_OSXSAVE);
+    set_xcr0((((u64)edx << 32) | eax) & XCNTXT_MASK);
+    cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
+
+    if ( cpu == 0 )
+    {
+        /*
+         * xsave_cntxt_size is the max size required by enabled features.
+         * We know FP/SSE and YMM about eax, and nothing about edx at present.
+         */
+        xsave_cntxt_size = ebx;
+        xfeature_mask = eax + ((u64)edx << 32);
+        xfeature_mask &= XCNTXT_MASK;
+        printk("%s: using cntxt_size: 0x%x and states: 0x%"PRIx64"\n",
+            __func__, xsave_cntxt_size, xfeature_mask);
+
+        /* Check XSAVEOPT feature. */
+        cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
+        cpu_has_xsaveopt = !!(eax & XSTATE_FEATURE_XSAVEOPT);
+    }
+    else
+    {
+        BUG_ON(xsave_cntxt_size != ebx);
+        BUG_ON(xfeature_mask != (xfeature_mask & XCNTXT_MASK));
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff -r fac1411505d6 -r 439311a108d7 xen/include/asm-x86/i387.h
--- a/xen/include/asm-x86/i387.h	Sat May 07 00:39:45 2011 -0500
+++ b/xen/include/asm-x86/i387.h	Tue May 03 13:34:11 2011 -0500
@@ -14,71 +14,7 @@
 #include <xen/types.h>
 #include <xen/percpu.h>
 
-struct vcpu;
-
-extern unsigned int xsave_cntxt_size;
-extern u64 xfeature_mask;
-
-void xsave_init(void);
-int xsave_alloc_save_area(struct vcpu *v);
-void xsave_free_save_area(struct vcpu *v);
-bool_t xsave_enabled(const struct vcpu *v);
-
-#define XSAVE_AREA_MIN_SIZE (512 + 64) /* FP/SSE + XSAVE.HEADER */
-#define XSTATE_FP       (1ULL << 0)
-#define XSTATE_SSE      (1ULL << 1)
-#define XSTATE_YMM      (1ULL << 2)
-#define XSTATE_LWP      (1ULL << 62) /* AMD lightweight profiling */
-#define XSTATE_FP_SSE   (XSTATE_FP | XSTATE_SSE)
-#define XCNTXT_MASK     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_LWP)
-#define XSTATE_YMM_OFFSET  XSAVE_AREA_MIN_SIZE
-#define XSTATE_YMM_SIZE    256
-#define XSAVEOPT        (1 << 0)
-
-struct xsave_struct
-{
-    struct { char x[512]; } fpu_sse;         /* FPU/MMX, SSE */
-
-    struct {
-        u64 xstate_bv;
-        u64 reserved[7];
-    } xsave_hdr;                            /* The 64-byte header */
-
-    struct { char x[XSTATE_YMM_SIZE]; } ymm; /* YMM */
-    char   data[];                           /* Future new states */
-} __attribute__ ((packed, aligned (64)));
-
-#define XCR_XFEATURE_ENABLED_MASK   0
-
-#ifdef CONFIG_X86_64
-#define REX_PREFIX "0x48, "
-#else
-#define REX_PREFIX
-#endif
-
-DECLARE_PER_CPU(uint64_t, xcr0);
-
-static inline void xsetbv(u32 index, u64 xfeatures)
-{
-    u32 hi = xfeatures >> 32;
-    u32 lo = (u32)xfeatures;
-
-    asm volatile (".byte 0x0f,0x01,0xd1" :: "c" (index),
-            "a" (lo), "d" (hi));
-}
-
-static inline void set_xcr0(u64 xfeatures)
-{
-    this_cpu(xcr0) = xfeatures;
-    xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures);
-}
-
-static inline uint64_t get_xcr0(void)
-{
-    return this_cpu(xcr0);
-}
-
-extern void setup_fpu(struct vcpu *v);
-extern void save_init_fpu(struct vcpu *v);
+void setup_fpu(struct vcpu *v);
+void save_init_fpu(struct vcpu *v);
 
 #endif /* __ASM_I386_I387_H */
diff -r fac1411505d6 -r 439311a108d7 xen/include/asm-x86/xstate.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/asm-x86/xstate.h	Tue May 03 13:34:11 2011 -0500
@@ -0,0 +1,68 @@
+/*
+ * include/asm-i386/xstate.h
+ *
+ * x86 extended state (xsave/xrstor) related definitions
+ * 
+ */
+
+#ifndef __ASM_XSTATE_H
+#define __ASM_XSTATE_H
+
+#include <xen/types.h>
+#include <xen/percpu.h>
+
+#define XSTATE_CPUID              0x0000000d
+#define XSTATE_FEATURE_XSAVEOPT   (1 << 0)    /* sub-leaf 1, eax[bit 0] */
+
+#define XCR_XFEATURE_ENABLED_MASK 0x00000000  /* index of XCR0 */
+
+#define XSTATE_YMM_SIZE           256
+#define XSTATE_YMM_OFFSET         XSAVE_AREA_MIN_SIZE
+#define XSTATE_AREA_MIN_SIZE      (512 + 64)  /* FP/SSE + XSAVE.HEADER */
+
+#define XSTATE_FP      (1ULL << 0)
+#define XSTATE_SSE     (1ULL << 1)
+#define XSTATE_YMM     (1ULL << 2)
+#define XSTATE_LWP     (1ULL << 62) /* AMD lightweight profiling */
+#define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
+#define XCNTXT_MASK    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_LWP)
+
+#ifdef CONFIG_X86_64
+#define REX_PREFIX     "0x48, "
+#else
+#define REX_PREFIX
+#endif
+
+/* extended state variables */
+DECLARE_PER_CPU(uint64_t, xcr0);
+
+extern unsigned int xsave_cntxt_size;
+extern u64 xfeature_mask;
+
+/* extended state save area */
+struct xsave_struct
+{
+    struct { char x[512]; } fpu_sse;         /* FPU/MMX, SSE */
+
+    struct {
+        u64 xstate_bv;
+        u64 reserved[7];
+    } xsave_hdr;                             /* The 64-byte header */
+
+    struct { char x[XSTATE_YMM_SIZE]; } ymm; /* YMM */
+    char   data[];                           /* Future new states */
+} __attribute__ ((packed, aligned (64)));
+
+/* extended state operations */
+void set_xcr0(u64 xfeatures);
+uint64_t get_xcr0(void);
+void xsave(struct vcpu *v);
+void xrstor(struct vcpu *v);
+bool_t xsave_enabled(const struct vcpu *v);
+
+/* extended state init and cleanup functions */
+void xstate_free_save_area(struct vcpu *v);
+int xstate_alloc_save_area(struct vcpu *v);
+void xstate_init(void);
+
+#endif /* __ASM_XSTATE_H */


* [PATCH 2 of 8] FPU: create FPU init and destroy functions
From: Huang2, Wei @ 2011-05-07  5:40 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304447821 18000
# Node ID eb6f1c5d0c7e688df083708348892ec5ba45c41a
# Parent  439311a108d71aba1580177543c7d4320b012721
FPU: create FPU init and destroy functions

Extract the FPU initialization and destruction code into two functions that handle memory allocation and deallocation for the FPU context.

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r 439311a108d7 -r eb6f1c5d0c7e xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Tue May 03 13:34:11 2011 -0500
+++ b/xen/arch/x86/domain.c	Tue May 03 13:37:01 2011 -0500
@@ -420,20 +420,8 @@
 
     v->arch.perdomain_ptes = perdomain_ptes(d, v);
 
-    if ( (rc = xstate_alloc_save_area(v)) != 0 )
+    if ( (rc = vcpu_init_fpu(v)) != 0 )
         return rc;
-    if ( v->arch.xsave_area )
-        v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
-    else if ( !is_idle_domain(d) )
-    {
-        v->arch.fpu_ctxt = _xmalloc(sizeof(v->arch.xsave_area->fpu_sse), 16);
-        if ( !v->arch.fpu_ctxt )
-        {
-            rc = -ENOMEM;
-            goto done;
-        }
-        memset(v->arch.fpu_ctxt, 0, sizeof(v->arch.xsave_area->fpu_sse));
-    }
 
     if ( is_hvm_domain(d) )
     {
@@ -485,10 +473,8 @@
  done:
     if ( rc )
     {
-        if ( v->arch.xsave_area )
-            xstate_free_save_area(v);
-        else
-            xfree(v->arch.fpu_ctxt);
+        vcpu_destroy_fpu(v);
+
         if ( !is_hvm_domain(d) && standalone_trap_ctxt(v) )
             free_xenheap_page(v->arch.pv_vcpu.trap_ctxt);
     }
@@ -501,10 +487,7 @@
     if ( is_pv_32on64_vcpu(v) )
         release_compat_l4(v);
 
-    if ( v->arch.xsave_area )
-        xstate_free_save_area(v);
-    else
-        xfree(v->arch.fpu_ctxt);
+    vcpu_destroy_fpu(v);
 
     if ( is_hvm_vcpu(v) )
         hvm_vcpu_destroy(v);
diff -r 439311a108d7 -r eb6f1c5d0c7e xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Tue May 03 13:34:11 2011 -0500
+++ b/xen/arch/x86/i387.c	Tue May 03 13:37:01 2011 -0500
@@ -184,6 +184,47 @@
     }
 }
 
+/*******************************/
+/*       VCPU FPU Functions    */
+/*******************************/
+/* Initialize FPU's context save area */
+int vcpu_init_fpu(struct vcpu *v)
+{
+    int rc = 0;
+    
+    /* Idle domain doesn't have FPU state allocated */
+    if ( is_idle_vcpu(v) )
+        goto done;
+
+    if ( (rc = xstate_alloc_save_area(v)) != 0 )
+        return rc;
+
+    if ( v->arch.xsave_area )
+        v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
+    else
+    {
+        v->arch.fpu_ctxt = _xmalloc(sizeof(v->arch.xsave_area->fpu_sse), 16);
+        if ( !v->arch.fpu_ctxt )
+        {
+            rc = -ENOMEM;
+            goto done;
+        }
+        memset(v->arch.fpu_ctxt, 0, sizeof(v->arch.xsave_area->fpu_sse));
+    }
+
+done:
+    return rc;
+}
+
+/* Free FPU's context save area */
+void vcpu_destroy_fpu(struct vcpu *v)
+{
+    if ( v->arch.xsave_area )
+        xstate_free_save_area(v);
+    else
+        xfree(v->arch.fpu_ctxt);
+}
+
 /*
  * Local variables:
  * mode: C
diff -r 439311a108d7 -r eb6f1c5d0c7e xen/include/asm-x86/i387.h
--- a/xen/include/asm-x86/i387.h	Tue May 03 13:34:11 2011 -0500
+++ b/xen/include/asm-x86/i387.h	Tue May 03 13:37:01 2011 -0500
@@ -17,4 +17,6 @@
 void setup_fpu(struct vcpu *v);
 void save_init_fpu(struct vcpu *v);
 
+int vcpu_init_fpu(struct vcpu *v);
+void vcpu_destroy_fpu(struct vcpu *v);
 #endif /* __ASM_I386_I387_H */


* [PATCH 3 of 8] FPU: clean up FPU context save function
From: Huang2, Wei @ 2011-05-07  5:41 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304448193 18000
# Node ID 5560826d130f143c8aecbf4abbcc59cef587f29c
# Parent  eb6f1c5d0c7e688df083708348892ec5ba45c41a
FPU: clean up FPU context save function

This patch cleans up the context save function. It renames the function to vcpu_save_fpu(), because the existing name was confusing. It also extracts the FPU context save code (fsave, fxsave, xsave) into separate functions; vcpu_save_fpu() dispatches to the appropriate sub-function based on the CPU's capabilities.

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r eb6f1c5d0c7e -r 5560826d130f xen/arch/x86/acpi/suspend.c
--- a/xen/arch/x86/acpi/suspend.c	Tue May 03 13:37:01 2011 -0500
+++ b/xen/arch/x86/acpi/suspend.c	Tue May 03 13:43:13 2011 -0500
@@ -24,7 +24,7 @@
 
 void save_rest_processor_state(void)
 {
-    save_init_fpu(current);
+    vcpu_save_fpu(current);
 
 #if defined(CONFIG_X86_64)
     asm volatile (
diff -r eb6f1c5d0c7e -r 5560826d130f xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Tue May 03 13:37:01 2011 -0500
+++ b/xen/arch/x86/domain.c	Tue May 03 13:43:13 2011 -0500
@@ -1560,7 +1560,7 @@
     if ( !is_idle_vcpu(p) )
     {
         memcpy(&p->arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES);
-        save_init_fpu(p);
+        vcpu_save_fpu(p);
         p->arch.ctxt_switch_from(p);
     }
 
diff -r eb6f1c5d0c7e -r 5560826d130f xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Tue May 03 13:37:01 2011 -0500
+++ b/xen/arch/x86/i387.c	Tue May 03 13:43:13 2011 -0500
@@ -66,78 +66,6 @@
         load_mxcsr(0x1f80);
 }
 
-void save_init_fpu(struct vcpu *v)
-{
-    unsigned long cr0;
-    char *fpu_ctxt;
-
-    if ( !v->fpu_dirtied )
-        return;
-
-    ASSERT(!is_idle_vcpu(v));
-
-    cr0 = read_cr0();
-    fpu_ctxt = v->arch.fpu_ctxt;
-
-    /* This can happen, if a paravirtualised guest OS has set its CR0.TS. */
-    if ( cr0 & X86_CR0_TS )
-        clts();
-
-    if ( xsave_enabled(v) )
-    {
-        /* XCR0 normally represents what guest OS set. In case of Xen itself,
-         * we set all accumulated feature mask before doing save/restore.
-         */
-        set_xcr0(v->arch.xcr0_accum);
-        xsave(v);
-        set_xcr0(v->arch.xcr0);
-    }
-    else if ( cpu_has_fxsr )
-    {
-#ifdef __i386__
-        asm volatile (
-            "fxsave %0"
-            : "=m" (*fpu_ctxt) );
-#else /* __x86_64__ */
-        /*
-         * The only way to force fxsaveq on a wide range of gas versions. On 
-         * older versions the rex64 prefix works only if we force an
-         * addressing mode that doesn't require extended registers.
-         */
-        asm volatile (
-            REX64_PREFIX "fxsave (%1)"
-            : "=m" (*fpu_ctxt) : "cdaSDb" (fpu_ctxt) );
-#endif
-
-        /* Clear exception flags if FSW.ES is set. */
-        if ( unlikely(fpu_ctxt[2] & 0x80) )
-            asm volatile ("fnclex");
-
-        /*
-         * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
-         * is pending. Clear the x87 state here by setting it to fixed
-         * values. The hypervisor data segment can be sometimes 0 and
-         * sometimes new user value. Both should be ok. Use the FPU saved
-         * data block as a safe address because it should be in L1.
-         */
-        if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        {
-            asm volatile (
-                "emms\n\t"  /* clear stack tags */
-                "fildl %0"  /* load to clear state */
-                : : "m" (*fpu_ctxt) );
-        }
-    }
-    else
-    {
-        /* FWAIT is required to make FNSAVE synchronous. */
-        asm volatile ( "fnsave %0 ; fwait" : "=m" (*fpu_ctxt) );
-    }
-
-    v->fpu_dirtied = 0;
-    write_cr0(cr0|X86_CR0_TS);
-}
-
 static void restore_fpu(struct vcpu *v)
 {
     const char *fpu_ctxt = v->arch.fpu_ctxt;
@@ -185,8 +113,96 @@
 }
 
 /*******************************/
+/*      FPU Save Functions     */
+/*******************************/
+/* Save x87 extended state */
+static inline void fpu_xsave(struct vcpu *v)
+{
+    /* XCR0 normally represents what guest OS set. In case of Xen itself,
+     * we set all accumulated feature mask before doing save/restore.
+     */
+    set_xcr0(v->arch.xcr0_accum);
+    xsave(v);
+    set_xcr0(v->arch.xcr0);    
+}
+
+/* Save x87 FPU, MMX, SSE and SSE2 state */
+static inline void fpu_fxsave(struct vcpu *v)
+{
+    char *fpu_ctxt = v->arch.fpu_ctxt;
+
+#ifdef __i386__
+    asm volatile (
+        "fxsave %0"
+        : "=m" (*fpu_ctxt) );
+#else /* __x86_64__ */
+    /*
+     * The only way to force fxsaveq on a wide range of gas versions. On 
+     * older versions the rex64 prefix works only if we force an
+     * addressing mode that doesn't require extended registers.
+     */
+    asm volatile (
+        REX64_PREFIX "fxsave (%1)"
+        : "=m" (*fpu_ctxt) : "cdaSDb" (fpu_ctxt) );
+#endif
+    
+    /* Clear exception flags if FSW.ES is set. */
+    if ( unlikely(fpu_ctxt[2] & 0x80) )
+        asm volatile ("fnclex");
+    
+    /*
+     * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
+     * is pending. Clear the x87 state here by setting it to fixed
+     * values. The hypervisor data segment can be sometimes 0 and
+     * sometimes new user value. Both should be ok. Use the FPU saved
+     * data block as a safe address because it should be in L1.
+     */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+    {
+        asm volatile (
+            "emms\n\t"  /* clear stack tags */
+            "fildl %0"  /* load to clear state */
+            : : "m" (*fpu_ctxt) );
+    }
+}
+
+/* Save x87 FPU state */
+static inline void fpu_fsave(struct vcpu *v)
+{
+    char *fpu_ctxt = v->arch.fpu_ctxt;
+
+    /* FWAIT is required to make FNSAVE synchronous. */
+    asm volatile ( "fnsave %0 ; fwait" : "=m" (*fpu_ctxt) );
+}
+
+/*******************************/
 /*       VCPU FPU Functions    */
 /*******************************/
+/* 
+ * On each context switch, save the necessary FPU info of VCPU being switch 
+ * out. It dispatches saving operation based on CPU's capability.
+ */
+void vcpu_save_fpu(struct vcpu *v)
+{
+    if ( !v->fpu_dirtied )
+        return;
+
+    ASSERT(!is_idle_vcpu(v));
+
+    /* This can happen, if a paravirtualised guest OS has set its CR0.TS. */
+    clts();
+
+    if ( xsave_enabled(v) )
+        fpu_xsave(v);
+    else if ( cpu_has_fxsr )
+        fpu_fxsave(v);
+    else
+        fpu_fsave(v);
+
+    v->fpu_dirtied = 0;
+    stts();
+}
+
 /* Initialize FPU's context save area */
 int vcpu_init_fpu(struct vcpu *v)
 {
diff -r eb6f1c5d0c7e -r 5560826d130f xen/include/asm-x86/i387.h
--- a/xen/include/asm-x86/i387.h	Tue May 03 13:37:01 2011 -0500
+++ b/xen/include/asm-x86/i387.h	Tue May 03 13:43:13 2011 -0500
@@ -15,7 +15,7 @@
 #include <xen/percpu.h>
 
 void setup_fpu(struct vcpu *v);
-void save_init_fpu(struct vcpu *v);
+void vcpu_save_fpu(struct vcpu *v);
 
 int vcpu_init_fpu(struct vcpu *v);
 void vcpu_destroy_fpu(struct vcpu *v);


* [PATCH 4 of 8] FPU: clean up FPU context restore function
From: Huang2, Wei @ 2011-05-07  5:42 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304448326 18000
# Node ID 5badff7cda6b4af71991d8150803ac2f7c4a27f5
# Parent  5560826d130f143c8aecbf4abbcc59cef587f29c
FPU: clean up FPU context restore function

This patch cleans up the context restore function. It renames the function to vcpu_restore_fpu(). It also extracts the FPU restore code (frstor, fxrstor, xrstor) into separate functions; vcpu_restore_fpu() dispatches to them depending on the CPU's capabilities.

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r 5560826d130f -r 5badff7cda6b xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Tue May 03 13:43:13 2011 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Tue May 03 13:45:26 2011 -0500
@@ -348,7 +348,7 @@
 {
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
 
-    setup_fpu(v);
+    vcpu_restore_fpu(v);
     vmcb_set_exception_intercepts(
         vmcb, vmcb_get_exception_intercepts(vmcb) & ~(1U << TRAP_no_device));
 }
diff -r 5560826d130f -r 5badff7cda6b xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c	Tue May 03 13:43:13 2011 -0500
+++ b/xen/arch/x86/hvm/vmx/vmx.c	Tue May 03 13:45:26 2011 -0500
@@ -612,7 +612,7 @@
 
 static void vmx_fpu_enter(struct vcpu *v)
 {
-    setup_fpu(v);
+    vcpu_restore_fpu(v);
     v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device);
     vmx_update_exception_bitmap(v);
     v->arch.hvm_vmx.host_cr0 &= ~X86_CR0_TS;
diff -r 5560826d130f -r 5badff7cda6b xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Tue May 03 13:43:13 2011 -0500
+++ b/xen/arch/x86/i387.c	Tue May 03 13:45:26 2011 -0500
@@ -17,56 +17,37 @@
 #include <asm/xstate.h>
 #include <asm/asm_defns.h>
 
-static void load_mxcsr(unsigned long val)
+#define MXCSR_DEFAULT 0x1f80
+static void fpu_init(void)
 {
-    val &= 0xffbf;
-    asm volatile ( "ldmxcsr %0" : : "m" (val) );
+    unsigned long val;
+    
+    asm volatile ( "fninit" );
+    if ( cpu_has_xmm )
+    {
+        /* load default value into MXCSR control/status register */
+        val = MXCSR_DEFAULT;
+        asm volatile ( "ldmxcsr %0" : : "m" (val) );
+    }
 }
 
-static void init_fpu(void);
-static void restore_fpu(struct vcpu *v);
-
-void setup_fpu(struct vcpu *v)
+/*******************************/
+/*     FPU Restore Functions   */
+/*******************************/
+/* Restore x87 extended state */
+static inline void fpu_xrstor(struct vcpu *v)
 {
-    ASSERT(!is_idle_vcpu(v));
-
-    /* Avoid recursion. */
-    clts();
-
-    if ( v->fpu_dirtied )
-        return;
-
-    if ( xsave_enabled(v) )
-    {
-        /*
-         * XCR0 normally represents what guest OS set. In case of Xen itself, 
-         * we set all supported feature mask before doing save/restore.
-         */
-        set_xcr0(v->arch.xcr0_accum);
-        xrstor(v);
-        set_xcr0(v->arch.xcr0);
-    }
-    else if ( v->fpu_initialised )
-    {
-        restore_fpu(v);
-    }
-    else
-    {
-        init_fpu();
-    }
-
-    v->fpu_initialised = 1;
-    v->fpu_dirtied = 1;
+    /*
+     * XCR0 normally represents what guest OS set. In case of Xen itself, 
+     * we set all supported feature mask before doing save/restore.
+     */
+    set_xcr0(v->arch.xcr0_accum);
+    xrstor(v);
+    set_xcr0(v->arch.xcr0);
 }
 
-static void init_fpu(void)
-{
-    asm volatile ( "fninit" );
-    if ( cpu_has_xmm )
-        load_mxcsr(0x1f80);
-}
-
-static void restore_fpu(struct vcpu *v)
+/* Restore x87 FPU, MMX, SSE and SSE2 state */
+static inline void fpu_fxrstor(struct vcpu *v)
 {
     const char *fpu_ctxt = v->arch.fpu_ctxt;
 
@@ -75,41 +56,42 @@
      * possibility, which may occur if the block was passed to us by control
      * tools, by silently clearing the block.
      */
-    if ( cpu_has_fxsr )
-    {
-        asm volatile (
+    asm volatile (
 #ifdef __i386__
-            "1: fxrstor %0            \n"
+        "1: fxrstor %0            \n"
 #else /* __x86_64__ */
-            /* See above for why the operands/constraints are this way. */
-            "1: " REX64_PREFIX "fxrstor (%2)\n"
+        /* See above for why the operands/constraints are this way. */
+        "1: " REX64_PREFIX "fxrstor (%2)\n"
 #endif
-            ".section .fixup,\"ax\"   \n"
-            "2: push %%"__OP"ax       \n"
-            "   push %%"__OP"cx       \n"
-            "   push %%"__OP"di       \n"
-            "   lea  %0,%%"__OP"di    \n"
-            "   mov  %1,%%ecx         \n"
-            "   xor  %%eax,%%eax      \n"
-            "   rep ; stosl           \n"
-            "   pop  %%"__OP"di       \n"
-            "   pop  %%"__OP"cx       \n"
-            "   pop  %%"__OP"ax       \n"
-            "   jmp  1b               \n"
-            ".previous                \n"
-            _ASM_EXTABLE(1b, 2b)
-            : 
-            : "m" (*fpu_ctxt),
-              "i" (sizeof(v->arch.xsave_area->fpu_sse)/4)
+        ".section .fixup,\"ax\"   \n"
+        "2: push %%"__OP"ax       \n"
+        "   push %%"__OP"cx       \n"
+        "   push %%"__OP"di       \n"
+        "   lea  %0,%%"__OP"di    \n"
+        "   mov  %1,%%ecx         \n"
+        "   xor  %%eax,%%eax      \n"
+        "   rep ; stosl           \n"
+        "   pop  %%"__OP"di       \n"
+        "   pop  %%"__OP"cx       \n"
+        "   pop  %%"__OP"ax       \n"
+        "   jmp  1b               \n"
+        ".previous                \n"
+        _ASM_EXTABLE(1b, 2b)
+        : 
+        : "m" (*fpu_ctxt),
+          "i" (sizeof(v->arch.xsave_area->fpu_sse)/4)
 #ifdef __x86_64__
-             ,"cdaSDb" (fpu_ctxt)
+          ,"cdaSDb" (fpu_ctxt)
 #endif
-            );
-    }
-    else
-    {
-        asm volatile ( "frstor %0" : : "m" (*fpu_ctxt) );
-    }
+        );
+}
+
+/* Restore x87 FPU state */
+static inline void fpu_frstor(struct vcpu *v)
+{
+    const char *fpu_ctxt = v->arch.fpu_ctxt;
+
+    asm volatile ( "frstor %0" : : "m" (*fpu_ctxt) );
 }
 
 /*******************************/
@@ -178,6 +160,35 @@
 /*******************************/
 /*       VCPU FPU Functions    */
 /*******************************/
+/* 
+ * Restore FPU state when #NM is triggered.
+ */
+void vcpu_restore_fpu(struct vcpu *v)
+{
+    ASSERT(!is_idle_vcpu(v));
+
+    /* Avoid recursion. */
+    clts();
+
+    if ( v->fpu_dirtied )
+        return;
+
+    if ( xsave_enabled(v) )
+        fpu_xrstor(v);
+    else if ( v->fpu_initialised )
+    {
+        if ( cpu_has_fxsr )
+            fpu_fxrstor(v);
+        else
+            fpu_frstor(v);
+    }
+    else
+        fpu_init();
+
+    v->fpu_initialised = 1;
+    v->fpu_dirtied = 1;
+}
+
 /* 
  * On each context switch, save the necessary FPU info of VCPU being switch 
  * out. It dispatches saving operation based on CPU's capability.
diff -r 5560826d130f -r 5badff7cda6b xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Tue May 03 13:43:13 2011 -0500
+++ b/xen/arch/x86/traps.c	Tue May 03 13:45:26 2011 -0500
@@ -3198,7 +3198,7 @@
 
     BUG_ON(!guest_mode(regs));
 
-    setup_fpu(curr);
+    vcpu_restore_fpu(curr);
 
     if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
     {
diff -r 5560826d130f -r 5badff7cda6b xen/include/asm-x86/i387.h
--- a/xen/include/asm-x86/i387.h	Tue May 03 13:43:13 2011 -0500
+++ b/xen/include/asm-x86/i387.h	Tue May 03 13:45:26 2011 -0500
@@ -14,7 +14,7 @@
 #include <xen/types.h>
 #include <xen/percpu.h>
 
-void setup_fpu(struct vcpu *v);
+void vcpu_restore_fpu(struct vcpu *v);
 void vcpu_save_fpu(struct vcpu *v);
 
 int vcpu_init_fpu(struct vcpu *v);


* [PATCH 5 of 8] FPU: add mask parameter to xsave and xrstor
From: Huang2, Wei @ 2011-05-07  5:42 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304696422 18000
# Node ID ab7d2191421fbf95313e1109de3ced5e2a094424
# Parent  5badff7cda6b4af71991d8150803ac2f7c4a27f5
FPU: add mask parameter to xsave and xrstor

Xen currently hardcodes the mask bits of xsave() and xrstor() to all 1's. This patch adds a mask parameter to xsave() and xrstor().
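
With this in place, callers can operate on a subset of the extended state, for example (the XSTATE_LAZY/XSTATE_NONLAZY constants are introduced in the xstate.h hunk below):

  xsave(v, XSTATE_LAZY);       /* skip non-lazy components such as LWP */
  xrstor(v, XSTATE_NONLAZY);   /* restore only the non-lazy components */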

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r 5badff7cda6b -r ab7d2191421f xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Tue May 03 13:45:26 2011 -0500
+++ b/xen/arch/x86/i387.c	Fri May 06 10:40:22 2011 -0500
@@ -35,14 +35,14 @@
 /*     FPU Restore Functions   */
 /*******************************/
 /* Restore x87 extended state */
-static inline void fpu_xrstor(struct vcpu *v)
+static inline void fpu_xrstor(struct vcpu *v, uint64_t mask)
 {
     /*
      * XCR0 normally represents what guest OS set. In case of Xen itself, 
      * we set all supported feature mask before doing save/restore.
      */
     set_xcr0(v->arch.xcr0_accum);
-    xrstor(v);
+    xrstor(v, mask);
     set_xcr0(v->arch.xcr0);
 }
 
@@ -98,13 +98,13 @@
 /*      FPU Save Functions     */
 /*******************************/
 /* Save x87 extended state */
-static inline void fpu_xsave(struct vcpu *v)
+static inline void fpu_xsave(struct vcpu *v, uint64_t mask)
 {
     /* XCR0 normally represents what guest OS set. In case of Xen itself,
      * we set all accumulated feature mask before doing save/restore.
      */
     set_xcr0(v->arch.xcr0_accum);
-    xsave(v);
+    xsave(v, mask);
     set_xcr0(v->arch.xcr0);    
 }
 
@@ -174,7 +174,7 @@
         return;
 
     if ( xsave_enabled(v) )
-        fpu_xrstor(v);
+        fpu_xrstor(v, XSTATE_ALL);
     else if ( v->fpu_initialised )
     {
         if ( cpu_has_fxsr )
@@ -204,7 +204,7 @@
     clts();
 
     if ( xsave_enabled(v) )
-        fpu_xsave(v);
+        fpu_xsave(v, XSTATE_ALL);
     else if ( cpu_has_fxsr )
         fpu_fxsave(v);
     else
diff -r 5badff7cda6b -r ab7d2191421f xen/arch/x86/xstate.c
--- a/xen/arch/x86/xstate.c	Tue May 03 13:45:26 2011 -0500
+++ b/xen/arch/x86/xstate.c	Fri May 06 10:40:22 2011 -0500
@@ -51,32 +51,37 @@
     return this_cpu(xcr0);
 }
 
-void xsave(struct vcpu *v)
+void xsave(struct vcpu *v, uint64_t mask)
 {
     struct xsave_struct *ptr = v->arch.xsave_area;
+    uint32_t hmask = mask >> 32;
+    uint32_t lmask = mask;
 
     if ( cpu_has_xsaveopt )
         asm volatile (
             ".byte " REX_PREFIX "0x0f,0xae,0x37"
             :
-            : "a" (-1), "d" (-1), "D"(ptr)
+            : "a" (lmask), "d" (hmask), "D"(ptr)
             : "memory" );
     else
         asm volatile (
             ".byte " REX_PREFIX "0x0f,0xae,0x27"
             :
-            : "a" (-1), "d" (-1), "D"(ptr)
+            : "a" (lmask), "d" (hmask), "D"(ptr)
             : "memory" );
 }
 
-void xrstor(struct vcpu *v)
+void xrstor(struct vcpu *v, uint64_t mask)
 {
+    uint32_t hmask = mask >> 32;
+    uint32_t lmask = mask;
+
     struct xsave_struct *ptr = v->arch.xsave_area;
 
     asm volatile (
         ".byte " REX_PREFIX "0x0f,0xae,0x2f"
         :
-        : "m" (*ptr), "a" (-1), "d" (-1), "D"(ptr) );
+        : "m" (*ptr), "a" (lmask), "d" (hmask), "D"(ptr) );
 }
 
 bool_t xsave_enabled(const struct vcpu *v)
diff -r 5badff7cda6b -r ab7d2191421f xen/include/asm-x86/xstate.h
--- a/xen/include/asm-x86/xstate.h	Tue May 03 13:45:26 2011 -0500
+++ b/xen/include/asm-x86/xstate.h	Fri May 06 10:40:22 2011 -0500
@@ -26,6 +26,10 @@
 #define XSTATE_LWP     (1ULL << 62) /* AMD lightweight profiling */
 #define XSTATE_FP_SSE  (XSTATE_FP | XSTATE_SSE)
 #define XCNTXT_MASK    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | XSTATE_LWP)
+
+#define XSTATE_ALL     (~0)
+#define XSTATE_NONLAZY (XSTATE_LWP)
+#define XSTATE_LAZY    (XSTATE_ALL & ~XSTATE_NONLAZY)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX     "0x48, "
@@ -56,8 +60,8 @@
 /* extended state operations */
 void set_xcr0(u64 xfeatures);
 uint64_t get_xcr0(void);
-void xsave(struct vcpu *v);
-void xrstor(struct vcpu *v);
+void xsave(struct vcpu *v, uint64_t mask);
+void xrstor(struct vcpu *v, uint64_t mask);
 bool_t xsave_enabled(const struct vcpu *v);
 
 /* extended state init and cleanup functions */


* [PATCH 6 of 8] FPU: create lazy and non-lazy FPU restore functions
From: Huang2, Wei @ 2011-05-07  5:42 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304697215 18000
# Node ID e58b1d06aabe5c0aa26f2bd29b607a1c337d6c80
# Parent  ab7d2191421fbf95313e1109de3ced5e2a094424
FPU: create lazy and non-lazy FPU restore functions

Currently Xen relies on #NM (via CR0.TS) to trigger FPU context restore, but not all FPU state is tracked by the TS bit. This patch creates two FPU restore functions: vcpu_restore_fpu_lazy() and vcpu_restore_fpu_eager(). vcpu_restore_fpu_lazy() is still used when #NM is triggered. vcpu_restore_fpu_eager(), by contrast, is called for the vcpu being scheduled in on every context switch. To minimize restore overhead, a new flag, nonlazy_xstate_used, controls non-lazy restore.
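
Correspondingly, the save path widens the mask only when non-lazy state is actually in use (from the fpu_xsave() hunk below):

  xsave(v, v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY);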

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r ab7d2191421f -r e58b1d06aabe xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Fri May 06 10:40:22 2011 -0500
+++ b/xen/arch/x86/domain.c	Fri May 06 10:53:35 2011 -0500
@@ -1578,6 +1578,7 @@
         memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES);
         if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() )
             set_xcr0(n->arch.xcr0);
+        vcpu_restore_fpu_eager(n);
         n->arch.ctxt_switch_to(n);
     }
 
diff -r ab7d2191421f -r e58b1d06aabe xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Fri May 06 10:40:22 2011 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Fri May 06 10:53:35 2011 -0500
@@ -348,7 +348,7 @@
 {
     struct vmcb_struct *vmcb = v->arch.hvm_svm.vmcb;
 
-    vcpu_restore_fpu(v);
+    vcpu_restore_fpu_lazy(v);
     vmcb_set_exception_intercepts(
         vmcb, vmcb_get_exception_intercepts(vmcb) & ~(1U << TRAP_no_device));
 }
diff -r ab7d2191421f -r e58b1d06aabe xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c	Fri May 06 10:40:22 2011 -0500
+++ b/xen/arch/x86/hvm/vmx/vmx.c	Fri May 06 10:53:35 2011 -0500
@@ -612,7 +612,7 @@
 
 static void vmx_fpu_enter(struct vcpu *v)
 {
-    vcpu_restore_fpu(v);
+    vcpu_restore_fpu_lazy(v);
     v->arch.hvm_vmx.exception_bitmap &= ~(1u << TRAP_no_device);
     vmx_update_exception_bitmap(v);
     v->arch.hvm_vmx.host_cr0 &= ~X86_CR0_TS;
diff -r ab7d2191421f -r e58b1d06aabe xen/arch/x86/i387.c
--- a/xen/arch/x86/i387.c	Fri May 06 10:40:22 2011 -0500
+++ b/xen/arch/x86/i387.c	Fri May 06 10:53:35 2011 -0500
@@ -98,13 +98,13 @@
 /*      FPU Save Functions     */
 /*******************************/
 /* Save x87 extended state */
-static inline void fpu_xsave(struct vcpu *v, uint64_t mask)
+static inline void fpu_xsave(struct vcpu *v)
 {
     /* XCR0 normally represents what guest OS set. In case of Xen itself,
      * we set all accumulated feature mask before doing save/restore.
      */
     set_xcr0(v->arch.xcr0_accum);
-    xsave(v, mask);
+    xsave(v, v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY);
     set_xcr0(v->arch.xcr0);    
 }
 
@@ -160,10 +160,25 @@
 /*******************************/
 /*       VCPU FPU Functions    */
 /*******************************/
+/* Restore FPU state whenever VCPU is scheduled in. */
+void vcpu_restore_fpu_eager(struct vcpu *v)
+{
+    ASSERT(!is_idle_vcpu(v));
+    
+    /* restore the nonlazy extended state which is not tracked by the CR0.TS bit */
+    if ( v->arch.nonlazy_xstate_used )
+    {
+        /* Avoid recursion */
+        clts();        
+        fpu_xrstor(v, XSTATE_NONLAZY);
+        stts();
+    }
+}
+
 /* 
  * Restore FPU state when #NM is triggered.
  */
-void vcpu_restore_fpu(struct vcpu *v)
+void vcpu_restore_fpu_lazy(struct vcpu *v)
 {
     ASSERT(!is_idle_vcpu(v));
 
@@ -174,7 +189,7 @@
         return;
 
     if ( xsave_enabled(v) )
-        fpu_xrstor(v, XSTATE_ALL);
+        fpu_xrstor(v, XSTATE_LAZY);
     else if ( v->fpu_initialised )
     {
         if ( cpu_has_fxsr )
@@ -204,7 +219,7 @@
     clts();
 
     if ( xsave_enabled(v) )
-        fpu_xsave(v, XSTATE_ALL);
+        fpu_xsave(v);
     else if ( cpu_has_fxsr )
         fpu_fxsave(v);
     else
diff -r ab7d2191421f -r e58b1d06aabe xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c	Fri May 06 10:40:22 2011 -0500
+++ b/xen/arch/x86/traps.c	Fri May 06 10:53:35 2011 -0500
@@ -3198,7 +3198,7 @@
 
     BUG_ON(!guest_mode(regs));
 
-    vcpu_restore_fpu(curr);
+    vcpu_restore_fpu_lazy(curr);
 
     if ( curr->arch.pv_vcpu.ctrlreg[0] & X86_CR0_TS )
     {
diff -r ab7d2191421f -r e58b1d06aabe xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Fri May 06 10:40:22 2011 -0500
+++ b/xen/include/asm-x86/domain.h	Fri May 06 10:53:35 2011 -0500
@@ -492,7 +492,10 @@
      * it explicitly enables it via xcr0.
      */
     uint64_t xcr0_accum;
-
+    /* This variable determines whether nonlazy extended state has been used,
+     * and thus should be saved/restored. */
+    bool_t nonlazy_xstate_used;
+    
     struct paging_vcpu paging;
 
 #ifdef CONFIG_X86_32
diff -r ab7d2191421f -r e58b1d06aabe xen/include/asm-x86/i387.h
--- a/xen/include/asm-x86/i387.h	Fri May 06 10:40:22 2011 -0500
+++ b/xen/include/asm-x86/i387.h	Fri May 06 10:53:35 2011 -0500
@@ -14,7 +14,8 @@
 #include <xen/types.h>
 #include <xen/percpu.h>
 
-void vcpu_restore_fpu(struct vcpu *v);
+void vcpu_restore_fpu_eager(struct vcpu *v);
+void vcpu_restore_fpu_lazy(struct vcpu *v);
 void vcpu_save_fpu(struct vcpu *v);
 
 int vcpu_init_fpu(struct vcpu *v);


* [PATCH 7 of 8] LWP: export LWP related CPUID to AMD SVM guest
From: Huang2, Wei @ 2011-05-07  5:43 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304449290 18000
# Node ID 92b0ed16c09da1fb24f5c21ed06ed1c99709b6d7
# Parent  e58b1d06aabe5c0aa26f2bd29b607a1c337d6c80
LWP: export LWP related CPUID to AMD SVM guest

This patch exposes LWP CPUID 0x8000001C to SVM guests.

Signed-off-by: Wei Huang  <wei.huang2@amd.com>

diff -r e58b1d06aabe -r 92b0ed16c09d tools/libxc/xc_cpuid_x86.c
--- a/tools/libxc/xc_cpuid_x86.c	Fri May 06 10:53:35 2011 -0500
+++ b/tools/libxc/xc_cpuid_x86.c	Tue May 03 14:01:30 2011 -0500
@@ -31,7 +31,7 @@
 
 #define DEF_MAX_BASE 0x0000000du
 #define DEF_MAX_INTELEXT  0x80000008u
-#define DEF_MAX_AMDEXT    0x8000000au
+#define DEF_MAX_AMDEXT    0x8000001cu
 
 static int hypervisor_is_64bit(xc_interface *xch)
 {
@@ -111,7 +111,8 @@
                     bitmaskof(X86_FEATURE_3DNOWPREFETCH) |
                     bitmaskof(X86_FEATURE_XOP) |
                     bitmaskof(X86_FEATURE_FMA4) |
-                    bitmaskof(X86_FEATURE_TBM));
+                    bitmaskof(X86_FEATURE_TBM) |
+                    bitmaskof(X86_FEATURE_LWP));
         regs[3] &= (0x0183f3ff | /* features shared with 0x00000001:EDX */
                     (is_pae ? bitmaskof(X86_FEATURE_NX) : 0) |
                     (is_64bit ? bitmaskof(X86_FEATURE_LM) : 0) |
@@ -385,6 +386,7 @@
     case 0x80000005: /* AMD L1 cache/TLB info (dumped by Intel policy) */
     case 0x80000006: /* AMD L2/3 cache/TLB info ; Intel L2 cache features */
     case 0x8000000a: /* AMD SVM feature bits */
+    case 0x8000001c: /* AMD lightweight profiling */
         break;
 
     default:


* [PATCH 8 of 8] LWP: Add LWP support for SVM guests
From: Huang2, Wei @ 2011-05-07  5:43 UTC
  To: xen-devel@lists.xensource.com


# HG changeset patch
# User Wei Huang <wei.huang2@amd.com>
# Date 1304746997 18000
# Node ID ea027ce41fb45a619d39f7bdd2bd94e233fe2030
# Parent  92b0ed16c09da1fb24f5c21ed06ed1c99709b6d7
LWP: Add LWP support for SVM guests

This patch enables SVM to handle LWP-related MSRs and CPUID. It intercepts guest reads/writes to LWP_CFG and saves/restores LWP_CFG when the guest touches this MSR. The LWP_CBADDR MSR is not intercepted because it is handled by xsave/xrstor.
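
Roughly, the flow for a guest wrmsr to LWP_CFG is (condensed from svm_update_lwp_cfg() below; error handling abbreviated):

  hvm_cpuid(0x8000001c, &eax, &ebx, &ecx, &edx);
  if ( (uint32_t)msr_content & ~edx )       /* unsupported feature bits? */
      return -1;                            /* caller injects #GP */
  wrmsrl(MSR_AMD64_LWP_CFG, msr_content);
  v->arch.hvm_svm.guest_lwp_cfg = msr_content;
  v->arch.nonlazy_xstate_used = !!msr_content;  /* enables eager xrstor */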

Signed-off-by: Wei Huang <wei.huang2@amd.com>

diff -r 92b0ed16c09d -r ea027ce41fb4 xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c	Tue May 03 14:01:30 2011 -0500
+++ b/xen/arch/x86/hvm/svm/svm.c	Sat May 07 00:43:17 2011 -0500
@@ -58,7 +58,8 @@
 #include <asm/hvm/trace.h>
 #include <asm/hap.h>
 #include <asm/apic.h>
-#include <asm/debugger.h>       
+#include <asm/debugger.h>
+#include <asm/xstate.h>
 
 u32 svm_feature_flags;
 
@@ -695,6 +696,50 @@
     *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
 }
 
+static inline void svm_lwp_save(struct vcpu *v)
+{
+    /* Don't mess with other guests: disable LWP for the next VCPU. */
+    if ( v->arch.hvm_svm.guest_lwp_cfg )
+    {
+        wrmsrl(MSR_AMD64_LWP_CFG, 0x0);
+        wrmsrl(MSR_AMD64_LWP_CBADDR, 0x0);
+    }
+}
+
+static inline void svm_lwp_load(struct vcpu *v)
+{
+    /* Only LWP_CFG is reloaded. LWP_CBADDR will be reloaded via xrstor. */
+    if ( v->arch.hvm_svm.guest_lwp_cfg )
+        wrmsrl(MSR_AMD64_LWP_CFG, v->arch.hvm_svm.guest_lwp_cfg);
+}
+
+/* Update LWP_CFG MSR (0xc0000105). Return -1 if error; otherwise returns 0. */
+static int svm_update_lwp_cfg(struct vcpu *v, uint64_t msr_content)
+{
+    unsigned int eax, ebx, ecx, edx;
+    uint32_t msr_low;
+    
+    if ( xsave_enabled(v) && cpu_has_lwp )
+    {
+        hvm_cpuid(0x8000001c, &eax, &ebx, &ecx, &edx);
+        msr_low = (uint32_t)msr_content;
+        
+        /* Generate #GP if the guest tries to turn on unsupported features. */
+        if ( msr_low & ~edx )
+            return -1;
+        
+        wrmsrl(MSR_AMD64_LWP_CFG, msr_content);
+        /* CPU might automatically correct reserved bits. So read it back. */
+        rdmsrl(MSR_AMD64_LWP_CFG, msr_content);
+        v->arch.hvm_svm.guest_lwp_cfg = msr_content;
+
+        /* Track nonlazy state if LWP_CFG is non-zero. */
+        v->arch.nonlazy_xstate_used = !!(msr_content);
+    }
+
+    return 0;
+}
+
 static void svm_ctxt_switch_from(struct vcpu *v)
 {
     int cpu = smp_processor_id();
@@ -703,6 +748,7 @@
 
     svm_save_dr(v);
     vpmu_save(v);
+    svm_lwp_save(v);
 
     svm_sync_vmcb(v);
     svm_vmload(per_cpu(root_vmcb, cpu));
@@ -746,6 +792,7 @@
     svm_vmload(vmcb);
     vmcb->cleanbits.bytes = 0;
     vpmu_load(v);
+    svm_lwp_load(v);
 
     if ( cpu_has_rdtscp )
         wrmsrl(MSR_TSC_AUX, hvm_msr_tsc_aux(v));
@@ -1120,6 +1167,24 @@
         if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
             __clear_bit(X86_FEATURE_APIC & 31, edx);
         break;
+    case 0x8000001c: 
+    {
+        /* LWP capability CPUID */
+        uint64_t lwp_cfg = v->arch.hvm_svm.guest_lwp_cfg;
+
+        if ( cpu_has_lwp )
+        {
+            if ( !(v->arch.xcr0 & XSTATE_LWP) )
+            {
+                *eax = 0x0;
+                break;
+            }
+
+            /* Turn on the available bit and other features specified in lwp_cfg. */
+            *eax = (*edx & lwp_cfg) | 0x00000001;
+        }
+        break;
+    }
     default:
         break;
     }
@@ -1227,6 +1292,10 @@
         *msr_content = vmcb_get_lastinttoip(vmcb);
         break;
 
+    case MSR_AMD64_LWP_CFG:
+        *msr_content = v->arch.hvm_svm.guest_lwp_cfg;
+        break;
+
     case MSR_K7_PERFCTR0:
     case MSR_K7_PERFCTR1:
     case MSR_K7_PERFCTR2:
@@ -1323,6 +1392,11 @@
 
     case MSR_IA32_LASTINTTOIP:
         vmcb_set_lastinttoip(vmcb, msr_content);
+        break;
+
+    case MSR_AMD64_LWP_CFG:
+        if ( svm_update_lwp_cfg(v, msr_content) < 0 )
+            goto gpf;
         break;
 
     case MSR_K7_PERFCTR0:
diff -r 92b0ed16c09d -r ea027ce41fb4 xen/arch/x86/hvm/svm/vmcb.c
--- a/xen/arch/x86/hvm/svm/vmcb.c	Tue May 03 14:01:30 2011 -0500
+++ b/xen/arch/x86/hvm/svm/vmcb.c	Sat May 07 00:43:17 2011 -0500
@@ -120,6 +120,11 @@
     svm_disable_intercept_for_msr(v, MSR_LSTAR);
     svm_disable_intercept_for_msr(v, MSR_STAR);
     svm_disable_intercept_for_msr(v, MSR_SYSCALL_MASK);
+
+    /* LWP_CBADDR MSR is saved and restored by FPU code. So SVM doesn't need to
+     * intercept it. */
+    if ( cpu_has_lwp )
+        svm_disable_intercept_for_msr(v, MSR_AMD64_LWP_CBADDR);
 
     vmcb->_msrpm_base_pa = (u64)virt_to_maddr(arch_svm->msrpm);
     vmcb->_iopm_base_pa  = (u64)virt_to_maddr(hvm_io_bitmap);
diff -r 92b0ed16c09d -r ea027ce41fb4 xen/include/asm-x86/cpufeature.h
--- a/xen/include/asm-x86/cpufeature.h	Tue May 03 14:01:30 2011 -0500
+++ b/xen/include/asm-x86/cpufeature.h	Sat May 07 00:43:17 2011 -0500
@@ -208,6 +208,8 @@
 
 #define cpu_has_xsave           boot_cpu_has(X86_FEATURE_XSAVE)
 
+#define cpu_has_lwp             boot_cpu_has(X86_FEATURE_LWP)
+
 #define cpu_has_arch_perfmon    boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
 
 #define cpu_has_rdtscp          boot_cpu_has(X86_FEATURE_RDTSCP)
diff -r 92b0ed16c09d -r ea027ce41fb4 xen/include/asm-x86/hvm/svm/vmcb.h
--- a/xen/include/asm-x86/hvm/svm/vmcb.h	Tue May 03 14:01:30 2011 -0500
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h	Sat May 07 00:43:17 2011 -0500
@@ -512,6 +512,9 @@
     uint64_t guest_sysenter_cs;
     uint64_t guest_sysenter_esp;
     uint64_t guest_sysenter_eip;
+    
+    /* AMD lightweight profiling MSR */
+    uint64_t guest_lwp_cfg;
 };
 
 struct vmcb_struct *alloc_vmcb(void);
diff -r 92b0ed16c09d -r ea027ce41fb4 xen/include/asm-x86/msr-index.h
--- a/xen/include/asm-x86/msr-index.h	Tue May 03 14:01:30 2011 -0500
+++ b/xen/include/asm-x86/msr-index.h	Sat May 07 00:43:17 2011 -0500
@@ -253,6 +253,10 @@
 #define MSR_AMD_PATCHLEVEL		0x0000008b
 #define MSR_AMD_PATCHLOADER		0xc0010020
 
+/* AMD Lightweight Profiling MSRs */
+#define MSR_AMD64_LWP_CFG		0xc0000105
+#define MSR_AMD64_LWP_CBADDR		0xc0000106
+
 /* AMD OS Visible Workaround MSRs */
 #define MSR_AMD_OSVW_ID_LENGTH          0xc0010140
 #define MSR_AMD_OSVW_STATUS             0xc0010141

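The central guard in svm_update_lwp_cfg() above is the check of the
guest-written value against the feature bits that CPUID 0x8000001c:EDX
advertises. A self-contained model of just that validation step, with
hypothetical names (supported_edx stands in for the hvm_cpuid() result):

#include <stdint.h>

/* Illustrative model of the LWP_CFG validation above, not the Xen code:
 * the low 32 bits written by the guest may only enable features that
 * CPUID 0x8000001c:EDX reports as supported; anything else must fault. */
static int validate_lwp_cfg(uint64_t msr_content, uint32_t supported_edx)
{
    uint32_t msr_low = (uint32_t)msr_content;

    if ( msr_low & ~supported_edx )
        return -1;   /* caller injects #GP into the guest */

    return 0;
}

The read-back after the real wrmsrl() matters as well: as the patch
comment notes, the CPU may correct reserved bits on its own, so the
cached guest_lwp_cfg has to reflect what the MSR actually contains.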
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 6 of 8] FPU: create lazy and non-lazy FPU restore functions
  2011-05-07  5:42   ` [PATCH 6 of 8] FPU: create lazy and non-lazy FPU restore functions Huang2, Wei
@ 2011-05-09  8:37     ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2011-05-09  8:37 UTC (permalink / raw)
  To: Wei Huang2; +Cc: xen-devel@lists.xensource.com

>>> On 07.05.11 at 07:42, "Huang2, Wei" <Wei.Huang2@amd.com> wrote:
>--- a/xen/arch/x86/i387.c	Fri May 06 10:40:22 2011 -0500
>+++ b/xen/arch/x86/i387.c	Fri May 06 10:53:35 2011 -0500
>@@ -98,13 +98,13 @@
> /*      FPU Save Functions     */
> /*******************************/
> /* Save x87 extended state */
>-static inline void fpu_xsave(struct vcpu *v, uint64_t mask)
>+static inline void fpu_xsave(struct vcpu *v)

This looks okay now to me, only a cosmetic comment: You add the
"mask" parameter in patch 5, just to remove it here again.

Jan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0 of 8] FPU LWP: patch description
  2011-05-07  5:39 ` [PATCH 0 of 8] FPU LWP: patch description Huang2, Wei
@ 2011-05-09  8:39   ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2011-05-09  8:39 UTC (permalink / raw)
  To: Wei Huang2, xen-devel@lists.xensource.com

>>> On 07.05.11 at 07:39, "Huang2, Wei" <Wei.Huang2@amd.com> wrote:
> This patch set supports AMD lightweight profiling for SVM guests. Because LWP 
> isn't tracked by CR0.TS bit, we clean up the FPU code to handle lazy and 
> unlazy FPU states differently. Lazy FPU state (such as SSE, YMM) is handled 
> when #NM is triggered. Unlazy state, such as LWP, is saved and restored on 
> each vcpu context switch.
> 
> Per Keir's comments, I moved extended state related code into xstate.c and 
> xstate.h. The FPU related code in i387.c was also cleaned up and has 
> consistent names now. The comments from Jan Beulich were also taken. I added 
> a new variable, nonlazy_xstate_used, to control whether save/restore nonlazy 
> state.
> 
> ====== i387.c ======
> * void vcpu_restore_fpu_eager(struct vcpu *v);
> * void vcpu_restore_fpu_lazy(struct vcpu *v);
> * void vcpu_save_fpu(struct vcpu *v);
> * int vcpu_init_fpu(struct vcpu *v);
> * void vcpu_destroy_fpu(struct vcpu *v);
> 
> ====== xstate.c ======
> * void set_xcr0(u64 xfeatures);
> * uint64_t get_xcr0(void);
> * void xsave(struct vcpu *v, uint64_t mask);
> * void xrstor(struct vcpu *v, uint64_t mask);
> * bool_t xsave_enabled(const struct vcpu *v);
> * void xstate_free_save_area(struct vcpu *v);
> * int xstate_alloc_save_area(struct vcpu *v);
> * void xstate_init(void);
> 
> 
> This code has been tested on real hardware. Please comment.

Thanks, Wei - this looks good now to me.

Jan

> 
> -Wei
> 
> 
>  b/xen/arch/x86/xstate.c            |  188 +++++++++++++++
>  b/xen/include/asm-x86/xstate.h     |   72 +++++
>  tools/libxc/xc_cpuid_x86.c         |    6
>  xen/arch/x86/Makefile              |    1
>  xen/arch/x86/acpi/suspend.c        |    2
>  xen/arch/x86/cpu/common.c          |    4
>  xen/arch/x86/domain.c              |   29 --
>  xen/arch/x86/domctl.c              |    2
>  xen/arch/x86/hvm/hvm.c             |    3
>  xen/arch/x86/hvm/svm/svm.c         |   75 ++++++
>  xen/arch/x86/hvm/svm/vmcb.c        |    5
>  xen/arch/x86/hvm/vmx/vmcs.c        |    1
>  xen/arch/x86/hvm/vmx/vmx.c         |    2
>  xen/arch/x86/i387.c                |  450 ++++++++++++++++---------------------
>  xen/arch/x86/traps.c               |    3
>  xen/include/asm-x86/cpufeature.h   |    2
>  xen/include/asm-x86/domain.h       |    5
>  xen/include/asm-x86/hvm/svm/vmcb.h |    3
>  xen/include/asm-x86/i387.h         |   71 -----
>  xen/include/asm-x86/msr-index.h    |    4
>  20 files changed, 573 insertions(+), 355 deletions(-)
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 11+ messages in thread
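
To make the lazy/unlazy split in the quoted overview concrete: lazy
components stay unrestored until the guest faults with #NM, while nonlazy
components such as LWP must be restored on every context switch, since no
fault announces their first use. A simplified, self-contained model of the
two restore paths follows (stand-in types and masks, not the Xen source;
LWP is XCR0 bit 62):

#include <stdint.h>
#include <stdbool.h>

#define XSTATE_LAZY_MASK    0x7ULL        /* x87/SSE/YMM, for example */
#define XSTATE_NONLAZY_MASK (1ULL << 62)  /* LWP */

struct vcpu_model {
    bool nonlazy_xstate_used;  /* set when the guest enables LWP_CFG   */
    bool fpu_dirtied;          /* lazy state restored since last save? */
};

static void xrstor_model(struct vcpu_model *v, uint64_t mask)
{
    (void)v; (void)mask;       /* stands in for the real xrstor() */
}

/* Runs on every switch into the vcpu, like vcpu_restore_fpu_eager(). */
static void restore_fpu_eager_model(struct vcpu_model *v)
{
    if ( v->nonlazy_xstate_used )
        xrstor_model(v, XSTATE_NONLAZY_MASK);
}

/* Runs only from the #NM handler, like vcpu_restore_fpu_lazy(). */
static void restore_fpu_lazy_model(struct vcpu_model *v)
{
    if ( !v->fpu_dirtied )
    {
        xrstor_model(v, XSTATE_LAZY_MASK);
        v->fpu_dirtied = true;
    }
}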

end of thread, other threads:[~2011-05-09  8:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <patchbomb.1304747132@weisles1164.amd.com>
2011-05-07  5:39 ` [PATCH 0 of 8] FPU LWP: patch description Huang2, Wei
2011-05-09  8:39   ` Jan Beulich
     [not found] ` <439311a108d71aba1580.1304747133@weisles1164.amd.com>
2011-05-07  5:40   ` [PATCH 1 of 8] FPU: extract extended related code into xstate.h and xstate.c Huang2, Wei
     [not found] ` <eb6f1c5d0c7e688df083.1304747134@weisles1164.amd.com>
2011-05-07  5:40   ` [PATCH 2 of 8] FPU: create FPU init and destroy functions Huang2, Wei
     [not found] ` <5560826d130f143c8aec.1304747135@weisles1164.amd.com>
2011-05-07  5:41   ` [PATCH 3 of 8] FPU: clean up FPU context save function Huang2, Wei
     [not found] ` <5badff7cda6b4af71991.1304747136@weisles1164.amd.com>
2011-05-07  5:42   ` [PATCH 4 of 8] FPU: clean up FPU context restore function Huang2, Wei
     [not found] ` <ab7d2191421fbf95313e.1304747137@weisles1164.amd.com>
2011-05-07  5:42   ` [PATCH 5 of 8] FPU: add mask parameter to xsave and xrstor Huang2, Wei
     [not found] ` <e58b1d06aabe5c0aa26f.1304747138@weisles1164.amd.com>
2011-05-07  5:42   ` [PATCH 6 of 8] FPU: create lazy and non-lazy FPU restore functions Huang2, Wei
2011-05-09  8:37     ` Jan Beulich
     [not found] ` <92b0ed16c09da1fb24f5.1304747139@weisles1164.amd.com>
2011-05-07  5:43   ` [PATCH 7 of 8] LWP: export LWP related CPUID to AMD SVM guest Huang2, Wei
     [not found] ` <ea027ce41fb45a619d39.1304747140@weisles1164.amd.com>
2011-05-07  5:43   ` [PATCH 8 of 8] LWP: Add LWP support for SVM guests Huang2, Wei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).