All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/2] x86: FPU handling cleanup
@ 2024-10-07 15:52 Alejandro Vallejo
  2024-10-07 15:52 ` [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
  2024-10-07 15:52 ` [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two Alejandro Vallejo
  0 siblings, 2 replies; 7+ messages in thread
From: Alejandro Vallejo @ 2024-10-07 15:52 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné

v3: https://lore.kernel.org/xen-devel/20240813142119.29012-1-alejandro.vallejo@cloud.com/
v3 -> v4: Removal vcpu_default_fpu() + style changes

v2: https://lore.kernel.org/xen-devel/20240808134150.29927-1-alejandro.vallejo@cloud.com/
v2 -> v3: Cosmetic changes and wiped big comment about missing data in the
          migration stream. Details in each patch.

v1: https://lore.kernel.org/xen-devel/cover.1720538832.git.alejandro.vallejo@cloud.com/
v1 -> v2: v1/patch1 and v1/patch2 are already in staging.

=============================== Original cover letter =========================
I want to eventually reach a position in which the FPU state can be allocated
from the domheap and hidden via the same core mechanism proposed in Elias'
directmap removal series. Doing so is complicated by the presence of 2 aliased
pointers (v->arch.fpu_ctxt and v->arch.xsave_area) and the rather complicated
semantics of vcpu_setup_fpu(). This series tries to simplify the code so moving
to a "map/modify/unmap" model is more tractable.

Patches 1 and 2 are trivial refactors.

Patch 3 unifies FPU state so an XSAVE area is allocated per vCPU regardless of
the host supporting it or not. The rationale is that the memory savings are
negligible and not worth the extra complexity.

Patch 4 is a non-trivial split of the vcpu_setup_fpu() into 2 separate
functions. One to override x87/SSE state, and another to set a reset state.
===============================================================================

Alejandro Vallejo (3):
  x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
  x86/fpu: Rework fpu_setup_fpu() uses to split it in two
  x86/fpu: Remove remaining uses of FCW_DEFAULT

 xen/arch/x86/domain.c             |  7 ++-
 xen/arch/x86/domctl.c             |  6 +-
 xen/arch/x86/hvm/emulate.c        |  4 +-
 xen/arch/x86/hvm/hvm.c            | 18 +++---
 xen/arch/x86/i387.c               | 94 ++++++++-----------------------
 xen/arch/x86/include/asm/domain.h |  6 --
 xen/arch/x86/include/asm/i387.h   | 21 +++++--
 xen/arch/x86/include/asm/xstate.h |  2 +-
 xen/arch/x86/x86_emulate/blk.c    |  2 +-
 xen/arch/x86/xstate.c             | 14 +++--
 xen/common/efi/runtime.c          |  2 +-
 11 files changed, 74 insertions(+), 102 deletions(-)

-- 
2.46.0



^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
  2024-10-07 15:52 [PATCH v4 0/2] x86: FPU handling cleanup Alejandro Vallejo
@ 2024-10-07 15:52 ` Alejandro Vallejo
  2024-10-08  7:47   ` Frediano Ziglio
  2024-10-07 15:52 ` [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two Alejandro Vallejo
  1 sibling, 1 reply; 7+ messages in thread
From: Alejandro Vallejo @ 2024-10-07 15:52 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné

fpu_ctxt is either a pointer to the legacy x87/SSE save area (used by FXSAVE) or
a pointer aliased with xsave_area that points to its fpu_sse subfield. Such
subfield is at the base and is identical in size and layout to the legacy
buffer.

This patch merges the 2 pointers in the arch_vcpu into a single XSAVE area. In
the very rare case in which the host doesn't support XSAVE all we're doing is
wasting a tiny amount of memory and trading those for a lot more simplicity in
the code.

While at it, dedup the setup logic in vcpu_init_fpu() and integrate it
into xstate_alloc_save_area().

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
--
v4:
  * Amend commit message with extra note about deduping vcpu_init_fpu()
  * Remove comment on top of cpu_user_regs (though I really think there
    ought to be a credible one, in one form or another).
  * Remove cast from blk.c so FXSAVE_AREA is "void *"
  * Simplify comment in xstate_alloc_save_area() for the "host has no
    XSAVE" case.
---
 xen/arch/x86/domctl.c             |  6 ++++-
 xen/arch/x86/hvm/emulate.c        |  4 +--
 xen/arch/x86/hvm/hvm.c            |  6 ++++-
 xen/arch/x86/i387.c               | 45 +++++--------------------------
 xen/arch/x86/include/asm/domain.h |  6 -----
 xen/arch/x86/x86_emulate/blk.c    |  2 +-
 xen/arch/x86/xstate.c             | 12 ++++++---
 7 files changed, 28 insertions(+), 53 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 96d816cf1a7d..2d115395da90 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1379,7 +1379,11 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
 #define c(fld) (c.nat->fld)
 #endif
 
-    memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
+    BUILD_BUG_ON(sizeof(c.nat->fpu_ctxt) !=
+                 sizeof(v->arch.xsave_area->fpu_sse));
+    memcpy(&c.nat->fpu_ctxt, &v->arch.xsave_area->fpu_sse,
+           sizeof(c.nat->fpu_ctxt));
+
     if ( is_pv_domain(d) )
         c(flags = v->arch.pv.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
     else
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index aa97ca1cbffd..f2bc6967dfcb 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2371,7 +2371,7 @@ static int cf_check hvmemul_get_fpu(
         alternative_vcall(hvm_funcs.fpu_dirty_intercept);
     else if ( type == X86EMUL_FPU_fpu )
     {
-        const fpusse_t *fpu_ctxt = curr->arch.fpu_ctxt;
+        const fpusse_t *fpu_ctxt = &curr->arch.xsave_area->fpu_sse;
 
         /*
          * Latch current register state so that we can back out changes
@@ -2411,7 +2411,7 @@ static void cf_check hvmemul_put_fpu(
 
     if ( aux )
     {
-        fpusse_t *fpu_ctxt = curr->arch.fpu_ctxt;
+        fpusse_t *fpu_ctxt = &curr->arch.xsave_area->fpu_sse;
         bool dval = aux->dval;
         int mode = hvm_guest_x86_mode(curr);
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 7b2e1c9813d6..77fe282118f7 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -914,7 +914,11 @@ static int cf_check hvm_save_cpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
 
     if ( v->fpu_initialised )
     {
-        memcpy(ctxt.fpu_regs, v->arch.fpu_ctxt, sizeof(ctxt.fpu_regs));
+        BUILD_BUG_ON(sizeof(ctxt.fpu_regs) !=
+                     sizeof(v->arch.xsave_area->fpu_sse));
+        memcpy(ctxt.fpu_regs, &v->arch.xsave_area->fpu_sse,
+               sizeof(ctxt.fpu_regs));
+
         ctxt.flags = XEN_X86_FPU_INITIALISED;
     }
 
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 134e0bece519..fbb9d3584a3d 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -39,7 +39,7 @@ static inline void fpu_xrstor(struct vcpu *v, uint64_t mask)
 /* Restore x87 FPU, MMX, SSE and SSE2 state */
 static inline void fpu_fxrstor(struct vcpu *v)
 {
-    const fpusse_t *fpu_ctxt = v->arch.fpu_ctxt;
+    const fpusse_t *fpu_ctxt = &v->arch.xsave_area->fpu_sse;
 
     /*
      * Some CPUs don't save/restore FDP/FIP/FOP unless an exception
@@ -151,7 +151,7 @@ static inline void fpu_xsave(struct vcpu *v)
 /* Save x87 FPU, MMX, SSE and SSE2 state */
 static inline void fpu_fxsave(struct vcpu *v)
 {
-    fpusse_t *fpu_ctxt = v->arch.fpu_ctxt;
+    fpusse_t *fpu_ctxt = &v->arch.xsave_area->fpu_sse;
     unsigned int fip_width = v->domain->arch.x87_fip_width;
 
     if ( fip_width != 4 )
@@ -212,7 +212,7 @@ void vcpu_restore_fpu_nonlazy(struct vcpu *v, bool need_stts)
      * above) we also need to restore full state, to prevent subsequently
      * saving state belonging to another vCPU.
      */
-    if ( v->arch.fully_eager_fpu || (v->arch.xsave_area && xstate_all(v)) )
+    if ( v->arch.fully_eager_fpu || xstate_all(v) )
     {
         if ( cpu_has_xsave )
             fpu_xrstor(v, XSTATE_ALL);
@@ -299,44 +299,14 @@ void save_fpu_enable(void)
 /* Initialize FPU's context save area */
 int vcpu_init_fpu(struct vcpu *v)
 {
-    int rc;
-
     v->arch.fully_eager_fpu = opt_eager_fpu;
-
-    if ( (rc = xstate_alloc_save_area(v)) != 0 )
-        return rc;
-
-    if ( v->arch.xsave_area )
-        v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
-    else
-    {
-        BUILD_BUG_ON(__alignof(v->arch.xsave_area->fpu_sse) < 16);
-        v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse),
-                                    __alignof(v->arch.xsave_area->fpu_sse));
-        if ( v->arch.fpu_ctxt )
-        {
-            fpusse_t *fpu_sse = v->arch.fpu_ctxt;
-
-            fpu_sse->fcw = FCW_DEFAULT;
-            fpu_sse->mxcsr = MXCSR_DEFAULT;
-        }
-        else
-            rc = -ENOMEM;
-    }
-
-    return rc;
+    return xstate_alloc_save_area(v);
 }
 
 void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
                     const void *data, unsigned int fcw_default)
 {
-    /*
-     * For the entire function please note that vcpu_init_fpu() (above) points
-     * v->arch.fpu_ctxt into v->arch.xsave_area when XSAVE is available. Hence
-     * accesses through both pointers alias one another, and the shorter form
-     * is used here.
-     */
-    fpusse_t *fpu_sse = v->arch.fpu_ctxt;
+    fpusse_t *fpu_sse = &v->arch.xsave_area->fpu_sse;
 
     ASSERT(!xsave_area || xsave_area == v->arch.xsave_area);
 
@@ -373,10 +343,7 @@ void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
 /* Free FPU's context save area */
 void vcpu_destroy_fpu(struct vcpu *v)
 {
-    if ( v->arch.xsave_area )
-        xstate_free_save_area(v);
-    else
-        xfree(v->arch.fpu_ctxt);
+    xstate_free_save_area(v);
 }
 
 /*
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index 5219c4fb0f69..b79d6badd71c 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -591,12 +591,6 @@ struct pv_vcpu
 
 struct arch_vcpu
 {
-    /*
-     * guest context (mirroring struct vcpu_guest_context) common
-     * between pv and hvm guests
-     */
-
-    void              *fpu_ctxt;
     struct cpu_user_regs user_regs;
 
     /* Debug registers. */
diff --git a/xen/arch/x86/x86_emulate/blk.c b/xen/arch/x86/x86_emulate/blk.c
index e790f4f90056..08a05f8453f7 100644
--- a/xen/arch/x86/x86_emulate/blk.c
+++ b/xen/arch/x86/x86_emulate/blk.c
@@ -11,7 +11,7 @@
     !defined(X86EMUL_NO_SIMD)
 # ifdef __XEN__
 #  include <asm/xstate.h>
-#  define FXSAVE_AREA current->arch.fpu_ctxt
+#  define FXSAVE_AREA ((void *)&current->arch.xsave_area->fpu_sse)
 # else
 #  define FXSAVE_AREA get_fpu_save_area()
 # endif
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 57a0749f0d54..af9e345a7ace 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -508,9 +508,15 @@ int xstate_alloc_save_area(struct vcpu *v)
     unsigned int size;
 
     if ( !cpu_has_xsave )
-        return 0;
-
-    if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
+    {
+        /*
+         * On non-XSAVE systems, we allocate an XSTATE buffer for simplicity.
+         * XSTATE is backwards compatible to FXSAVE, and only one cacheline
+         * larger.
+         */
+        size = XSTATE_AREA_MIN_SIZE;
+    }
+    else if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
     {
         size = xsave_cntxt_size;
         BUG_ON(size < XSTATE_AREA_MIN_SIZE);
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two
  2024-10-07 15:52 [PATCH v4 0/2] x86: FPU handling cleanup Alejandro Vallejo
  2024-10-07 15:52 ` [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
@ 2024-10-07 15:52 ` Alejandro Vallejo
  2024-10-08  6:37   ` Jan Beulich
  1 sibling, 1 reply; 7+ messages in thread
From: Alejandro Vallejo @ 2024-10-07 15:52 UTC (permalink / raw)
  To: Xen-devel
  Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
	Roger Pau Monné

It was trying to do too many things at once and there was no clear way of
defining what it was meant to do. This commit splits the function in two.

  1. A function to return the FPU to power-on reset values.
  2. A x87/SSE state loader (equivalent to the old function when it took
     a data pointer).

The old function also had a concept of "default" values that the FPU
would be configured for in some cases but not others. This patch removes
that 3rd vague initial state and replaces it with power-on reset.

While doing this make sure the abridged control tag is consistent with the
manuals and starts as 0xFF

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
--
@Jan: The patch changed substantially. Are you still ok with this R-by?

v4:
  * Reworded commit message and title
  * Remove vcpu_default_fpu() and replaced its uses with vcpu_reset_fpu()
  * s/FTW_RESET/FXSAVE_FTW_RESET/ (plus comment)
  * Remove FCW_DEFAULT, as it's the leftover reset value from the 80287
    (which we largely don't care about anymore).
---
 xen/arch/x86/domain.c             |  7 +++--
 xen/arch/x86/hvm/hvm.c            | 12 +++-----
 xen/arch/x86/i387.c               | 51 +++++++++++--------------------
 xen/arch/x86/include/asm/i387.h   | 21 ++++++++++---
 xen/arch/x86/include/asm/xstate.h |  1 +
 5 files changed, 45 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 89aad7e8978f..78a13e6812c9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1186,9 +1186,10 @@ int arch_set_info_guest(
          is_pv_64bit_domain(d) )
         v->arch.flags &= ~TF_kernel_mode;
 
-    vcpu_setup_fpu(v, v->arch.xsave_area,
-                   flags & VGCF_I387_VALID ? &c.nat->fpu_ctxt : NULL,
-                   FCW_DEFAULT);
+    if ( flags & VGCF_I387_VALID )
+        vcpu_setup_fpu(v, &c.nat->fpu_ctxt);
+    else
+        vcpu_reset_fpu(v);
 
     if ( !compat )
     {
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 77fe282118f7..44f4964aa036 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1163,10 +1163,10 @@ static int cf_check hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
     seg.attr = ctxt.ldtr_arbytes;
     hvm_set_segment_register(v, x86_seg_ldtr, &seg);
 
-    /* Cover xsave-absent save file restoration on xsave-capable host. */
-    vcpu_setup_fpu(v, xsave_enabled(v) ? NULL : v->arch.xsave_area,
-                   ctxt.flags & XEN_X86_FPU_INITIALISED ? ctxt.fpu_regs : NULL,
-                   FCW_RESET);
+    if ( ctxt.flags & XEN_X86_FPU_INITIALISED )
+        vcpu_setup_fpu(v, &ctxt.fpu_regs);
+    else
+        vcpu_reset_fpu(v);
 
     v->arch.user_regs.rax = ctxt.rax;
     v->arch.user_regs.rbx = ctxt.rbx;
@@ -4006,9 +4006,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
         v->arch.guest_table = pagetable_null();
     }
 
-    if ( v->arch.xsave_area )
-        v->arch.xsave_area->xsave_hdr.xstate_bv = 0;
-    vcpu_setup_fpu(v, v->arch.xsave_area, NULL, FCW_RESET);
+    vcpu_reset_fpu(v);
 
     arch_vcpu_regs_init(v);
     v->arch.user_regs.rip = ip;
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index fbb9d3584a3d..916d9b572598 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -303,41 +303,26 @@ int vcpu_init_fpu(struct vcpu *v)
     return xstate_alloc_save_area(v);
 }
 
-void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
-                    const void *data, unsigned int fcw_default)
+void vcpu_reset_fpu(struct vcpu *v)
 {
-    fpusse_t *fpu_sse = &v->arch.xsave_area->fpu_sse;
-
-    ASSERT(!xsave_area || xsave_area == v->arch.xsave_area);
-
-    v->fpu_initialised = !!data;
-
-    if ( data )
-    {
-        memcpy(fpu_sse, data, sizeof(*fpu_sse));
-        if ( xsave_area )
-            xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE;
-    }
-    else if ( xsave_area && fcw_default == FCW_DEFAULT )
-    {
-        xsave_area->xsave_hdr.xstate_bv = 0;
-        fpu_sse->mxcsr = MXCSR_DEFAULT;
-    }
-    else
-    {
-        memset(fpu_sse, 0, sizeof(*fpu_sse));
-        fpu_sse->fcw = fcw_default;
-        fpu_sse->mxcsr = MXCSR_DEFAULT;
-        if ( v->arch.xsave_area )
-        {
-            v->arch.xsave_area->xsave_hdr.xstate_bv &= ~XSTATE_FP_SSE;
-            if ( fcw_default != FCW_DEFAULT )
-                v->arch.xsave_area->xsave_hdr.xstate_bv |= X86_XCR0_X87;
-        }
-    }
+    v->fpu_initialised = false;
+    *v->arch.xsave_area = (struct xsave_struct) {
+        .fpu_sse = {
+            .mxcsr = MXCSR_DEFAULT,
+            .fcw = FCW_RESET,
+            .ftw = FXSAVE_FTW_RESET,
+        },
+        .xsave_hdr.xstate_bv = X86_XCR0_X87,
+    };
+}
 
-    if ( xsave_area )
-        xsave_area->xsave_hdr.xcomp_bv = 0;
+void vcpu_setup_fpu(struct vcpu *v, const void *data)
+{
+    v->fpu_initialised = true;
+    *v->arch.xsave_area = (struct xsave_struct) {
+        .fpu_sse = *(const fpusse_t*)data,
+        .xsave_hdr.xstate_bv = XSTATE_FP_SSE,
+    };
 }
 
 /* Free FPU's context save area */
diff --git a/xen/arch/x86/include/asm/i387.h b/xen/arch/x86/include/asm/i387.h
index a783549db991..652d7ad2deb6 100644
--- a/xen/arch/x86/include/asm/i387.h
+++ b/xen/arch/x86/include/asm/i387.h
@@ -31,10 +31,23 @@ void vcpu_restore_fpu_nonlazy(struct vcpu *v, bool need_stts);
 void vcpu_restore_fpu_lazy(struct vcpu *v);
 void vcpu_save_fpu(struct vcpu *v);
 void save_fpu_enable(void);
-
 int vcpu_init_fpu(struct vcpu *v);
-struct xsave_struct;
-void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
-                    const void *data, unsigned int fcw_default);
 void vcpu_destroy_fpu(struct vcpu *v);
+
+/*
+ * Restore v's FPU to power-on reset values
+ *
+ * @param v vCPU containing the FPU
+ */
+void vcpu_reset_fpu(struct vcpu *v);
+
+/*
+ * Load x87/SSE state into v's FPU
+ *
+ * Overrides the XSAVE header to set the state components to be x87 and SSE.
+ *
+ * @param v    vCPU containing the FPU
+ * @param data 512-octet blob for x87/SSE state
+ */
+void vcpu_setup_fpu(struct vcpu *v, const void *data);
 #endif /* __ASM_I386_I387_H */
diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h
index b4ee5559534a..07017cc4edfd 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -14,6 +14,7 @@
 
 #define FCW_DEFAULT               0x037f
 #define FCW_RESET                 0x0040
+#define FXSAVE_FTW_RESET          0xFF /* Abridged Tag Word format */
 #define MXCSR_DEFAULT             0x1f80
 
 extern uint32_t mxcsr_mask;
-- 
2.46.0



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two
  2024-10-07 15:52 ` [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two Alejandro Vallejo
@ 2024-10-08  6:37   ` Jan Beulich
  2024-10-08  9:38     ` Alejandro Vallejo
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2024-10-08  6:37 UTC (permalink / raw)
  To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel

On 07.10.2024 17:52, Alejandro Vallejo wrote:
> It was trying to do too many things at once and there was no clear way of
> defining what it was meant to do. This commit splits the function in two.
> 
>   1. A function to return the FPU to power-on reset values.
>   2. A x87/SSE state loader (equivalent to the old function when it took
>      a data pointer).
> 
> The old function also had a concept of "default" values that the FPU
> would be configured for in some cases but not others. This patch removes
> that 3rd vague initial state and replaces it with power-on reset.
> 
> While doing this make sure the abridged control tag is consistent with the
> manuals and starts as 0xFF
> 
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> --
> @Jan: The patch changed substantially. Are you still ok with this R-by?

I am. However in such a situation imo you'd better drop the tag, for it to
be re-offered (if desired). It can very well happen that the person simply
doesn't notice the question pointed at them.

Jan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
  2024-10-07 15:52 ` [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
@ 2024-10-08  7:47   ` Frediano Ziglio
  2024-10-08  9:41     ` Alejandro Vallejo
  0 siblings, 1 reply; 7+ messages in thread
From: Frediano Ziglio @ 2024-10-08  7:47 UTC (permalink / raw)
  To: Alejandro Vallejo
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné

On Mon, Oct 7, 2024 at 4:52 PM Alejandro Vallejo
<alejandro.vallejo@cloud.com> wrote:
>
> fpu_ctxt is either a pointer to the legacy x87/SSE save area (used by FXSAVE) or
> a pointer aliased with xsave_area that points to its fpu_sse subfield. Such
> subfield is at the base and is identical in size and layout to the legacy
> buffer.
>
> This patch merges the 2 pointers in the arch_vcpu into a single XSAVE area. In
> the very rare case in which the host doesn't support XSAVE all we're doing is
> wasting a tiny amount of memory and trading those for a lot more simplicity in
> the code.
>
> While at it, dedup the setup logic in vcpu_init_fpu() and integrate it
> into xstate_alloc_save_area().
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> --
> v4:
>   * Amend commit message with extra note about deduping vcpu_init_fpu()
>   * Remove comment on top of cpu_user_regs (though I really think there
>     ought to be a credible one, in one form or another).
>   * Remove cast from blk.c so FXSAVE_AREA is "void *"
>   * Simplify comment in xstate_alloc_save_area() for the "host has no
>     XSAVE" case.
> ---
>  xen/arch/x86/domctl.c             |  6 ++++-
>  xen/arch/x86/hvm/emulate.c        |  4 +--
>  xen/arch/x86/hvm/hvm.c            |  6 ++++-
>  xen/arch/x86/i387.c               | 45 +++++--------------------------
>  xen/arch/x86/include/asm/domain.h |  6 -----
>  xen/arch/x86/x86_emulate/blk.c    |  2 +-
>  xen/arch/x86/xstate.c             | 12 ++++++---
>  7 files changed, 28 insertions(+), 53 deletions(-)
>
> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
> index 96d816cf1a7d..2d115395da90 100644
> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -1379,7 +1379,11 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
>  #define c(fld) (c.nat->fld)
>  #endif
>
> -    memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
> +    BUILD_BUG_ON(sizeof(c.nat->fpu_ctxt) !=
> +                 sizeof(v->arch.xsave_area->fpu_sse));
> +    memcpy(&c.nat->fpu_ctxt, &v->arch.xsave_area->fpu_sse,
> +           sizeof(c.nat->fpu_ctxt));
> +
>      if ( is_pv_domain(d) )
>          c(flags = v->arch.pv.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
>      else
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index aa97ca1cbffd..f2bc6967dfcb 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2371,7 +2371,7 @@ static int cf_check hvmemul_get_fpu(
>          alternative_vcall(hvm_funcs.fpu_dirty_intercept);
>      else if ( type == X86EMUL_FPU_fpu )
>      {
> -        const fpusse_t *fpu_ctxt = curr->arch.fpu_ctxt;
> +        const fpusse_t *fpu_ctxt = &curr->arch.xsave_area->fpu_sse;
>
>          /*
>           * Latch current register state so that we can back out changes
> @@ -2411,7 +2411,7 @@ static void cf_check hvmemul_put_fpu(
>
>      if ( aux )
>      {
> -        fpusse_t *fpu_ctxt = curr->arch.fpu_ctxt;
> +        fpusse_t *fpu_ctxt = &curr->arch.xsave_area->fpu_sse;
>          bool dval = aux->dval;
>          int mode = hvm_guest_x86_mode(curr);
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 7b2e1c9813d6..77fe282118f7 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -914,7 +914,11 @@ static int cf_check hvm_save_cpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
>
>      if ( v->fpu_initialised )
>      {
> -        memcpy(ctxt.fpu_regs, v->arch.fpu_ctxt, sizeof(ctxt.fpu_regs));
> +        BUILD_BUG_ON(sizeof(ctxt.fpu_regs) !=
> +                     sizeof(v->arch.xsave_area->fpu_sse));
> +        memcpy(ctxt.fpu_regs, &v->arch.xsave_area->fpu_sse,
> +               sizeof(ctxt.fpu_regs));
> +
>          ctxt.flags = XEN_X86_FPU_INITIALISED;
>      }
>
> diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
> index 134e0bece519..fbb9d3584a3d 100644
> --- a/xen/arch/x86/i387.c
> +++ b/xen/arch/x86/i387.c
> @@ -39,7 +39,7 @@ static inline void fpu_xrstor(struct vcpu *v, uint64_t mask)
>  /* Restore x87 FPU, MMX, SSE and SSE2 state */
>  static inline void fpu_fxrstor(struct vcpu *v)
>  {
> -    const fpusse_t *fpu_ctxt = v->arch.fpu_ctxt;
> +    const fpusse_t *fpu_ctxt = &v->arch.xsave_area->fpu_sse;
>
>      /*
>       * Some CPUs don't save/restore FDP/FIP/FOP unless an exception
> @@ -151,7 +151,7 @@ static inline void fpu_xsave(struct vcpu *v)
>  /* Save x87 FPU, MMX, SSE and SSE2 state */
>  static inline void fpu_fxsave(struct vcpu *v)
>  {
> -    fpusse_t *fpu_ctxt = v->arch.fpu_ctxt;
> +    fpusse_t *fpu_ctxt = &v->arch.xsave_area->fpu_sse;
>      unsigned int fip_width = v->domain->arch.x87_fip_width;
>
>      if ( fip_width != 4 )
> @@ -212,7 +212,7 @@ void vcpu_restore_fpu_nonlazy(struct vcpu *v, bool need_stts)
>       * above) we also need to restore full state, to prevent subsequently
>       * saving state belonging to another vCPU.
>       */
> -    if ( v->arch.fully_eager_fpu || (v->arch.xsave_area && xstate_all(v)) )
> +    if ( v->arch.fully_eager_fpu || xstate_all(v) )
>      {
>          if ( cpu_has_xsave )
>              fpu_xrstor(v, XSTATE_ALL);
> @@ -299,44 +299,14 @@ void save_fpu_enable(void)
>  /* Initialize FPU's context save area */
>  int vcpu_init_fpu(struct vcpu *v)
>  {
> -    int rc;
> -
>      v->arch.fully_eager_fpu = opt_eager_fpu;
> -
> -    if ( (rc = xstate_alloc_save_area(v)) != 0 )
> -        return rc;
> -
> -    if ( v->arch.xsave_area )
> -        v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
> -    else
> -    {
> -        BUILD_BUG_ON(__alignof(v->arch.xsave_area->fpu_sse) < 16);
> -        v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse),
> -                                    __alignof(v->arch.xsave_area->fpu_sse));
> -        if ( v->arch.fpu_ctxt )
> -        {
> -            fpusse_t *fpu_sse = v->arch.fpu_ctxt;
> -
> -            fpu_sse->fcw = FCW_DEFAULT;
> -            fpu_sse->mxcsr = MXCSR_DEFAULT;
> -        }
> -        else
> -            rc = -ENOMEM;
> -    }
> -
> -    return rc;
> +    return xstate_alloc_save_area(v);
>  }
>
>  void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
>                      const void *data, unsigned int fcw_default)
>  {
> -    /*
> -     * For the entire function please note that vcpu_init_fpu() (above) points
> -     * v->arch.fpu_ctxt into v->arch.xsave_area when XSAVE is available. Hence
> -     * accesses through both pointers alias one another, and the shorter form
> -     * is used here.
> -     */
> -    fpusse_t *fpu_sse = v->arch.fpu_ctxt;
> +    fpusse_t *fpu_sse = &v->arch.xsave_area->fpu_sse;
>
>      ASSERT(!xsave_area || xsave_area == v->arch.xsave_area);
>
> @@ -373,10 +343,7 @@ void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
>  /* Free FPU's context save area */
>  void vcpu_destroy_fpu(struct vcpu *v)
>  {
> -    if ( v->arch.xsave_area )
> -        xstate_free_save_area(v);
> -    else
> -        xfree(v->arch.fpu_ctxt);
> +    xstate_free_save_area(v);
>  }
>
>  /*
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index 5219c4fb0f69..b79d6badd71c 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -591,12 +591,6 @@ struct pv_vcpu
>
>  struct arch_vcpu
>  {
> -    /*
> -     * guest context (mirroring struct vcpu_guest_context) common
> -     * between pv and hvm guests
> -     */
> -
> -    void              *fpu_ctxt;
>      struct cpu_user_regs user_regs;
>
>      /* Debug registers. */
> diff --git a/xen/arch/x86/x86_emulate/blk.c b/xen/arch/x86/x86_emulate/blk.c
> index e790f4f90056..08a05f8453f7 100644
> --- a/xen/arch/x86/x86_emulate/blk.c
> +++ b/xen/arch/x86/x86_emulate/blk.c
> @@ -11,7 +11,7 @@
>      !defined(X86EMUL_NO_SIMD)
>  # ifdef __XEN__
>  #  include <asm/xstate.h>
> -#  define FXSAVE_AREA current->arch.fpu_ctxt
> +#  define FXSAVE_AREA ((void *)&current->arch.xsave_area->fpu_sse)

Could you use "struct x86_fxsr *" instead of "void*" ?
Maybe adding another "struct x86_fxsr fxsr" inside the anonymous
fpu_sse union would help here.

>  # else
>  #  define FXSAVE_AREA get_fpu_save_area()
>  # endif
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index 57a0749f0d54..af9e345a7ace 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -508,9 +508,15 @@ int xstate_alloc_save_area(struct vcpu *v)
>      unsigned int size;
>
>      if ( !cpu_has_xsave )
> -        return 0;
> -
> -    if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
> +    {
> +        /*
> +         * On non-XSAVE systems, we allocate an XSTATE buffer for simplicity.
> +         * XSTATE is backwards compatible to FXSAVE, and only one cacheline
> +         * larger.
> +         */
> +        size = XSTATE_AREA_MIN_SIZE;
> +    }
> +    else if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
>      {
>          size = xsave_cntxt_size;
>          BUG_ON(size < XSTATE_AREA_MIN_SIZE);

Frediano


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two
  2024-10-08  6:37   ` Jan Beulich
@ 2024-10-08  9:38     ` Alejandro Vallejo
  0 siblings, 0 replies; 7+ messages in thread
From: Alejandro Vallejo @ 2024-10-08  9:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel

On Tue Oct 8, 2024 at 7:37 AM BST, Jan Beulich wrote:
> On 07.10.2024 17:52, Alejandro Vallejo wrote:
> > It was trying to do too many things at once and there was no clear way of
> > defining what it was meant to do. This commit splits the function in two.
> > 
> >   1. A function to return the FPU to power-on reset values.
> >   2. A x87/SSE state loader (equivalent to the old function when it took
> >      a data pointer).
> > 
> > The old function also had a concept of "default" values that the FPU
> > would be configured for in some cases but not others. This patch removes
> > that 3rd vague initial state and replaces it with power-on reset.
> > 
> > While doing this make sure the abridged control tag is consistent with the
> > manuals and starts as 0xFF
> > 
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > --
> > @Jan: The patch changed substantially. Are you still ok with this R-by?
>
> I am. However in such a situation imo you'd better drop the tag, for it to
> be re-offered (if desired). It can very well happen that the person simply
> doesn't notice the question pointed at them.
>
> Jan

Noted for next time. Thanks for the promptness!

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
  2024-10-08  7:47   ` Frediano Ziglio
@ 2024-10-08  9:41     ` Alejandro Vallejo
  0 siblings, 0 replies; 7+ messages in thread
From: Alejandro Vallejo @ 2024-10-08  9:41 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: Xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné

On Tue Oct 8, 2024 at 8:47 AM BST, Frediano Ziglio wrote:
> On Mon, Oct 7, 2024 at 4:52 PM Alejandro Vallejo
> <alejandro.vallejo@cloud.com> wrote:
> >
> > fpu_ctxt is either a pointer to the legacy x87/SSE save area (used by FXSAVE) or
> > a pointer aliased with xsave_area that points to its fpu_sse subfield. Such
> > subfield is at the base and is identical in size and layout to the legacy
> > buffer.
> >
> > This patch merges the 2 pointers in the arch_vcpu into a single XSAVE area. In
> > the very rare case in which the host doesn't support XSAVE all we're doing is
> > wasting a tiny amount of memory and trading those for a lot more simplicity in
> > the code.
> >
> > While at it, dedup the setup logic in vcpu_init_fpu() and integrate it
> > into xstate_alloc_save_area().
> >
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> > --
> > v4:
> >   * Amend commit message with extra note about deduping vcpu_init_fpu()
> >   * Remove comment on top of cpu_user_regs (though I really think there
> >     ought to be a credible one, in one form or another).
> >   * Remove cast from blk.c so FXSAVE_AREA is "void *"
> >   * Simplify comment in xstate_alloc_save_area() for the "host has no
> >     XSAVE" case.
> > ---
> >  xen/arch/x86/domctl.c             |  6 ++++-
> >  xen/arch/x86/hvm/emulate.c        |  4 +--
> >  xen/arch/x86/hvm/hvm.c            |  6 ++++-
> >  xen/arch/x86/i387.c               | 45 +++++--------------------------
> >  xen/arch/x86/include/asm/domain.h |  6 -----
> >  xen/arch/x86/x86_emulate/blk.c    |  2 +-
> >  xen/arch/x86/xstate.c             | 12 ++++++---
> >  7 files changed, 28 insertions(+), 53 deletions(-)
> >
> > diff --git a/xen/arch/x86/x86_emulate/blk.c b/xen/arch/x86/x86_emulate/blk.c
> > index e790f4f90056..08a05f8453f7 100644
> > --- a/xen/arch/x86/x86_emulate/blk.c
> > +++ b/xen/arch/x86/x86_emulate/blk.c
> > @@ -11,7 +11,7 @@
> >      !defined(X86EMUL_NO_SIMD)
> >  # ifdef __XEN__
> >  #  include <asm/xstate.h>
> > -#  define FXSAVE_AREA current->arch.fpu_ctxt
> > +#  define FXSAVE_AREA ((void *)&current->arch.xsave_area->fpu_sse)
>
> Could you use "struct x86_fxsr *" instead of "void*" ?
> Maybe adding another "struct x86_fxsr fxsr" inside the anonymous
> fpu_sse union would help here.
>

I did in v3, and Andrew suggested to keep the (void *). See:

  https://lore.kernel.org/xen-devel/2b42323a-961a-4dd8-8cde-f4b19eac0dc5@citrix.com/

Cheers,
Alejandro


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-10-08  9:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-07 15:52 [PATCH v4 0/2] x86: FPU handling cleanup Alejandro Vallejo
2024-10-07 15:52 ` [PATCH v4 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
2024-10-08  7:47   ` Frediano Ziglio
2024-10-08  9:41     ` Alejandro Vallejo
2024-10-07 15:52 ` [PATCH v4 2/2] x86/fpu: Rework fpu_setup_fpu() uses to split it in two Alejandro Vallejo
2024-10-08  6:37   ` Jan Beulich
2024-10-08  9:38     ` Alejandro Vallejo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.