* [PATCH v3 0/2] x86: FPU handling cleanup
@ 2024-08-13 14:21 Alejandro Vallejo
2024-08-13 14:21 ` [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
2024-08-13 14:21 ` [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three Alejandro Vallejo
0 siblings, 2 replies; 12+ messages in thread
From: Alejandro Vallejo @ 2024-08-13 14:21 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
v2: https://lore.kernel.org/xen-devel/20240808134150.29927-1-alejandro.vallejo@cloud.com/T/#t
v2 -> v3: Cosmetic changes and wiped big comment about missing data in the
migration stream. Details in each patch.
v1: https://lore.kernel.org/xen-devel/cover.1720538832.git.alejandro.vallejo@cloud.com/T/#t
v1 -> v2: v1/patch1 and v1/patch2 are already in staging.
=============================== Original cover letter =========================
I want to eventually reach a position in which the FPU state can be allocated
from the domheap and hidden via the same core mechanism proposed in Elias'
directmap removal series. Doing so is complicated by the presence of 2 aliased
pointers (v->arch.fpu_ctxt and v->arch.xsave_area) and the rather complicated
semantics of vcpu_setup_fpu(). This series tries to simplify the code so moving
to a "map/modify/unmap" model is more tractable.
Patches 1 and 2 are trivial refactors.
Patch 3 unifies FPU state so an XSAVE area is allocated per vCPU regardless of
the host supporting it or not. The rationale is that the memory savings are
negligible and not worth the extra complexity.
Patch 4 is a non-trivial split of the vcpu_setup_fpu() into 2 separate
functions. One to override x87/SSE state, and another to set a reset state.
===============================================================================
Alejandro Vallejo (2):
x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
x86/fpu: Split fpu_setup_fpu() in three
xen/arch/x86/domain.c | 7 +-
xen/arch/x86/domctl.c | 6 +-
xen/arch/x86/hvm/emulate.c | 4 +-
xen/arch/x86/hvm/hvm.c | 18 +++---
xen/arch/x86/i387.c | 103 ++++++++++--------------------
xen/arch/x86/include/asm/domain.h | 8 +--
xen/arch/x86/include/asm/i387.h | 28 ++++++--
xen/arch/x86/include/asm/xstate.h | 1 +
xen/arch/x86/x86_emulate/blk.c | 3 +-
xen/arch/x86/xstate.c | 13 +++-
10 files changed, 95 insertions(+), 96 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
2024-08-13 14:21 [PATCH v3 0/2] x86: FPU handling cleanup Alejandro Vallejo
@ 2024-08-13 14:21 ` Alejandro Vallejo
2024-10-03 19:38 ` Andrew Cooper
2024-08-13 14:21 ` [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three Alejandro Vallejo
1 sibling, 1 reply; 12+ messages in thread
From: Alejandro Vallejo @ 2024-08-13 14:21 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
fpu_ctxt is either a pointer to the legacy x87/SSE save area (used by FXSAVE) or
a pointer aliased with xsave_area that points to its fpu_sse subfield. Such
subfield is at the base and is identical in size and layout to the legacy
buffer.
This patch merges the 2 pointers in the arch_vcpu into a single XSAVE area. In
the very rare case in which the host doesn't support XSAVE all we're doing is
wasting a tiny amount of memory and trading those for a lot more simplicity in
the code.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
* Reverse memcpy() and BUILD_BUG_ON()
* Use sizeof(arg) rather than sizeof(type)
---
xen/arch/x86/domctl.c | 6 ++++-
xen/arch/x86/hvm/emulate.c | 4 +--
xen/arch/x86/hvm/hvm.c | 6 ++++-
xen/arch/x86/i387.c | 45 +++++--------------------------
xen/arch/x86/include/asm/domain.h | 8 +++---
xen/arch/x86/x86_emulate/blk.c | 3 ++-
xen/arch/x86/xstate.c | 13 ++++++---
7 files changed, 34 insertions(+), 51 deletions(-)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 68b5b46d1a83..63fbceac0911 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -1344,7 +1344,11 @@ void arch_get_info_guest(struct vcpu *v, vcpu_guest_context_u c)
#define c(fld) (c.nat->fld)
#endif
- memcpy(&c.nat->fpu_ctxt, v->arch.fpu_ctxt, sizeof(c.nat->fpu_ctxt));
+ BUILD_BUG_ON(sizeof(c.nat->fpu_ctxt) !=
+ sizeof(v->arch.xsave_area->fpu_sse));
+ memcpy(&c.nat->fpu_ctxt, &v->arch.xsave_area->fpu_sse,
+ sizeof(c.nat->fpu_ctxt));
+
if ( is_pv_domain(d) )
c(flags = v->arch.pv.vgc_flags & ~(VGCF_i387_valid|VGCF_in_kernel));
else
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index feb4792cc567..03020542c3ba 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -2363,7 +2363,7 @@ static int cf_check hvmemul_get_fpu(
alternative_vcall(hvm_funcs.fpu_dirty_intercept);
else if ( type == X86EMUL_FPU_fpu )
{
- const fpusse_t *fpu_ctxt = curr->arch.fpu_ctxt;
+ const fpusse_t *fpu_ctxt = &curr->arch.xsave_area->fpu_sse;
/*
* Latch current register state so that we can back out changes
@@ -2403,7 +2403,7 @@ static void cf_check hvmemul_put_fpu(
if ( aux )
{
- fpusse_t *fpu_ctxt = curr->arch.fpu_ctxt;
+ fpusse_t *fpu_ctxt = &curr->arch.xsave_area->fpu_sse;
bool dval = aux->dval;
int mode = hvm_guest_x86_mode(curr);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f49e29faf753..76bbb645b77a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -916,7 +916,11 @@ static int cf_check hvm_save_cpu_ctxt(struct vcpu *v, hvm_domain_context_t *h)
if ( v->fpu_initialised )
{
- memcpy(ctxt.fpu_regs, v->arch.fpu_ctxt, sizeof(ctxt.fpu_regs));
+ BUILD_BUG_ON(sizeof(ctxt.fpu_regs) !=
+ sizeof(v->arch.xsave_area->fpu_sse));
+ memcpy(ctxt.fpu_regs, &v->arch.xsave_area->fpu_sse,
+ sizeof(ctxt.fpu_regs));
+
ctxt.flags = XEN_X86_FPU_INITIALISED;
}
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 134e0bece519..fbb9d3584a3d 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -39,7 +39,7 @@ static inline void fpu_xrstor(struct vcpu *v, uint64_t mask)
/* Restore x87 FPU, MMX, SSE and SSE2 state */
static inline void fpu_fxrstor(struct vcpu *v)
{
- const fpusse_t *fpu_ctxt = v->arch.fpu_ctxt;
+ const fpusse_t *fpu_ctxt = &v->arch.xsave_area->fpu_sse;
/*
* Some CPUs don't save/restore FDP/FIP/FOP unless an exception
@@ -151,7 +151,7 @@ static inline void fpu_xsave(struct vcpu *v)
/* Save x87 FPU, MMX, SSE and SSE2 state */
static inline void fpu_fxsave(struct vcpu *v)
{
- fpusse_t *fpu_ctxt = v->arch.fpu_ctxt;
+ fpusse_t *fpu_ctxt = &v->arch.xsave_area->fpu_sse;
unsigned int fip_width = v->domain->arch.x87_fip_width;
if ( fip_width != 4 )
@@ -212,7 +212,7 @@ void vcpu_restore_fpu_nonlazy(struct vcpu *v, bool need_stts)
* above) we also need to restore full state, to prevent subsequently
* saving state belonging to another vCPU.
*/
- if ( v->arch.fully_eager_fpu || (v->arch.xsave_area && xstate_all(v)) )
+ if ( v->arch.fully_eager_fpu || xstate_all(v) )
{
if ( cpu_has_xsave )
fpu_xrstor(v, XSTATE_ALL);
@@ -299,44 +299,14 @@ void save_fpu_enable(void)
/* Initialize FPU's context save area */
int vcpu_init_fpu(struct vcpu *v)
{
- int rc;
-
v->arch.fully_eager_fpu = opt_eager_fpu;
-
- if ( (rc = xstate_alloc_save_area(v)) != 0 )
- return rc;
-
- if ( v->arch.xsave_area )
- v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
- else
- {
- BUILD_BUG_ON(__alignof(v->arch.xsave_area->fpu_sse) < 16);
- v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse),
- __alignof(v->arch.xsave_area->fpu_sse));
- if ( v->arch.fpu_ctxt )
- {
- fpusse_t *fpu_sse = v->arch.fpu_ctxt;
-
- fpu_sse->fcw = FCW_DEFAULT;
- fpu_sse->mxcsr = MXCSR_DEFAULT;
- }
- else
- rc = -ENOMEM;
- }
-
- return rc;
+ return xstate_alloc_save_area(v);
}
void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
const void *data, unsigned int fcw_default)
{
- /*
- * For the entire function please note that vcpu_init_fpu() (above) points
- * v->arch.fpu_ctxt into v->arch.xsave_area when XSAVE is available. Hence
- * accesses through both pointers alias one another, and the shorter form
- * is used here.
- */
- fpusse_t *fpu_sse = v->arch.fpu_ctxt;
+ fpusse_t *fpu_sse = &v->arch.xsave_area->fpu_sse;
ASSERT(!xsave_area || xsave_area == v->arch.xsave_area);
@@ -373,10 +343,7 @@ void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
/* Free FPU's context save area */
void vcpu_destroy_fpu(struct vcpu *v)
{
- if ( v->arch.xsave_area )
- xstate_free_save_area(v);
- else
- xfree(v->arch.fpu_ctxt);
+ xstate_free_save_area(v);
}
/*
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index bca3258d69ac..3da60af2a44a 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -592,11 +592,11 @@ struct pv_vcpu
struct arch_vcpu
{
/*
- * guest context (mirroring struct vcpu_guest_context) common
- * between pv and hvm guests
+ * Guest context common between PV and HVM guests. Includes general purpose
+ * registers, segment registers and other parts of the exception frame.
+ *
+ * It doesn't contain FPU state, as that lives in xsave_area instead.
*/
-
- void *fpu_ctxt;
struct cpu_user_regs user_regs;
/* Debug registers. */
diff --git a/xen/arch/x86/x86_emulate/blk.c b/xen/arch/x86/x86_emulate/blk.c
index e790f4f90056..28b54f26fe29 100644
--- a/xen/arch/x86/x86_emulate/blk.c
+++ b/xen/arch/x86/x86_emulate/blk.c
@@ -11,7 +11,8 @@
!defined(X86EMUL_NO_SIMD)
# ifdef __XEN__
# include <asm/xstate.h>
-# define FXSAVE_AREA current->arch.fpu_ctxt
+# define FXSAVE_AREA ((struct x86_fxsr *) \
+ (void *)¤t->arch.xsave_area->fpu_sse)
# else
# define FXSAVE_AREA get_fpu_save_area()
# endif
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 5c4144d55e89..850ee31bd18c 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -507,9 +507,16 @@ int xstate_alloc_save_area(struct vcpu *v)
unsigned int size;
if ( !cpu_has_xsave )
- return 0;
-
- if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
+ {
+ /*
+ * This is bigger than FXSAVE_SIZE by 64 bytes, but it helps treating
+ * the FPU state uniformly as an XSAVE buffer even if XSAVE is not
+ * available in the host. Note the alignment restriction of the XSAVE
+ * area are stricter than those of the FXSAVE area.
+ */
+ size = XSTATE_AREA_MIN_SIZE;
+ }
+ else if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
{
size = xsave_cntxt_size;
BUG_ON(size < XSTATE_AREA_MIN_SIZE);
--
2.45.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-08-13 14:21 [PATCH v3 0/2] x86: FPU handling cleanup Alejandro Vallejo
2024-08-13 14:21 ` [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
@ 2024-08-13 14:21 ` Alejandro Vallejo
2024-08-13 14:32 ` Jan Beulich
2024-10-04 0:04 ` Andrew Cooper
1 sibling, 2 replies; 12+ messages in thread
From: Alejandro Vallejo @ 2024-08-13 14:21 UTC (permalink / raw)
To: Xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Andrew Cooper,
Roger Pau Monné
It was trying to do too many things at once and there was no clear way of
defining what it was meant to do. This commit splits the function in three.
1. A function to return the FPU to power-on reset values.
2. A function to return the FPU to default values.
3. A x87/SSE state loader (equivalent to the old function when it took a data
pointer).
While at it, make sure the abridged tag is consistent with the manuals and
start as 0xFF.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
---
v3:
* Adjust commit message, as the split is now in 3.
* Remove bulky comment, as the rationale for it turned out to be
unsubstantiated. I can't find proof in xen-devel of the stream
operating the way I claimed, and at that point having the comment
at all is pointless
I suspect the rationale for xsave_vcpu(v) was merely to skip writing the XSAVE
header when it would be rewritten later on. Whatever it might be the current
logic does the right thing and is several orders of magnitude clearer about its
objective and its intent.
---
xen/arch/x86/domain.c | 7 ++--
xen/arch/x86/hvm/hvm.c | 12 +++----
xen/arch/x86/i387.c | 60 +++++++++++++++----------------
xen/arch/x86/include/asm/i387.h | 28 ++++++++++++---
xen/arch/x86/include/asm/xstate.h | 1 +
5 files changed, 62 insertions(+), 46 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d977ec71ca20..5af9e3e7a8b4 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1186,9 +1186,10 @@ int arch_set_info_guest(
is_pv_64bit_domain(d) )
v->arch.flags &= ~TF_kernel_mode;
- vcpu_setup_fpu(v, v->arch.xsave_area,
- flags & VGCF_I387_VALID ? &c.nat->fpu_ctxt : NULL,
- FCW_DEFAULT);
+ if ( flags & VGCF_I387_VALID )
+ vcpu_setup_fpu(v, &c.nat->fpu_ctxt);
+ else
+ vcpu_default_fpu(v);
if ( !compat )
{
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 76bbb645b77a..95d66e68a849 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1165,10 +1165,10 @@ static int cf_check hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
seg.attr = ctxt.ldtr_arbytes;
hvm_set_segment_register(v, x86_seg_ldtr, &seg);
- /* Cover xsave-absent save file restoration on xsave-capable host. */
- vcpu_setup_fpu(v, xsave_enabled(v) ? NULL : v->arch.xsave_area,
- ctxt.flags & XEN_X86_FPU_INITIALISED ? ctxt.fpu_regs : NULL,
- FCW_RESET);
+ if ( ctxt.flags & XEN_X86_FPU_INITIALISED )
+ vcpu_setup_fpu(v, &ctxt.fpu_regs);
+ else
+ vcpu_reset_fpu(v);
v->arch.user_regs.rax = ctxt.rax;
v->arch.user_regs.rbx = ctxt.rbx;
@@ -4008,9 +4008,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, uint16_t ip)
v->arch.guest_table = pagetable_null();
}
- if ( v->arch.xsave_area )
- v->arch.xsave_area->xsave_hdr.xstate_bv = 0;
- vcpu_setup_fpu(v, v->arch.xsave_area, NULL, FCW_RESET);
+ vcpu_reset_fpu(v);
arch_vcpu_regs_init(v);
v->arch.user_regs.rip = ip;
diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index fbb9d3584a3d..f7a9dcd162ba 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -303,41 +303,37 @@ int vcpu_init_fpu(struct vcpu *v)
return xstate_alloc_save_area(v);
}
-void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
- const void *data, unsigned int fcw_default)
+void vcpu_reset_fpu(struct vcpu *v)
{
- fpusse_t *fpu_sse = &v->arch.xsave_area->fpu_sse;
-
- ASSERT(!xsave_area || xsave_area == v->arch.xsave_area);
-
- v->fpu_initialised = !!data;
+ v->fpu_initialised = false;
+ *v->arch.xsave_area = (struct xsave_struct) {
+ .fpu_sse = {
+ .mxcsr = MXCSR_DEFAULT,
+ .fcw = FCW_RESET,
+ .ftw = FTW_RESET,
+ },
+ .xsave_hdr.xstate_bv = X86_XCR0_X87,
+ };
+}
- if ( data )
- {
- memcpy(fpu_sse, data, sizeof(*fpu_sse));
- if ( xsave_area )
- xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE;
- }
- else if ( xsave_area && fcw_default == FCW_DEFAULT )
- {
- xsave_area->xsave_hdr.xstate_bv = 0;
- fpu_sse->mxcsr = MXCSR_DEFAULT;
- }
- else
- {
- memset(fpu_sse, 0, sizeof(*fpu_sse));
- fpu_sse->fcw = fcw_default;
- fpu_sse->mxcsr = MXCSR_DEFAULT;
- if ( v->arch.xsave_area )
- {
- v->arch.xsave_area->xsave_hdr.xstate_bv &= ~XSTATE_FP_SSE;
- if ( fcw_default != FCW_DEFAULT )
- v->arch.xsave_area->xsave_hdr.xstate_bv |= X86_XCR0_X87;
- }
- }
+void vcpu_default_fpu(struct vcpu *v)
+{
+ v->fpu_initialised = false;
+ *v->arch.xsave_area = (struct xsave_struct) {
+ .fpu_sse = {
+ .mxcsr = MXCSR_DEFAULT,
+ .fcw = FCW_DEFAULT,
+ },
+ };
+}
- if ( xsave_area )
- xsave_area->xsave_hdr.xcomp_bv = 0;
+void vcpu_setup_fpu(struct vcpu *v, const void *data)
+{
+ v->fpu_initialised = true;
+ *v->arch.xsave_area = (struct xsave_struct) {
+ .fpu_sse = *(const fpusse_t*)data,
+ .xsave_hdr.xstate_bv = XSTATE_FP_SSE,
+ };
}
/* Free FPU's context save area */
diff --git a/xen/arch/x86/include/asm/i387.h b/xen/arch/x86/include/asm/i387.h
index a783549db991..7a69577de45b 100644
--- a/xen/arch/x86/include/asm/i387.h
+++ b/xen/arch/x86/include/asm/i387.h
@@ -31,10 +31,30 @@ void vcpu_restore_fpu_nonlazy(struct vcpu *v, bool need_stts);
void vcpu_restore_fpu_lazy(struct vcpu *v);
void vcpu_save_fpu(struct vcpu *v);
void save_fpu_enable(void);
-
int vcpu_init_fpu(struct vcpu *v);
-struct xsave_struct;
-void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
- const void *data, unsigned int fcw_default);
void vcpu_destroy_fpu(struct vcpu *v);
+
+/*
+ * Restore v's FPU to power-on reset values
+ *
+ * @param v vCPU containing the FPU
+ */
+void vcpu_reset_fpu(struct vcpu *v);
+
+/*
+ * Restore v's FPU to default values
+ *
+ * @param v vCPU containing the FPU
+ */
+void vcpu_default_fpu(struct vcpu *v);
+
+/*
+ * Load x87/SSE state into v's FPU
+ *
+ * Overrides the XSAVE header to set the state components to be x87 and SSE.
+ *
+ * @param v vCPU containing the FPU
+ * @param data 512-octet blob for x87/SSE state
+ */
+void vcpu_setup_fpu(struct vcpu *v, const void *data);
#endif /* __ASM_I386_I387_H */
diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h
index ebeb2a3dcaf9..6144ed6f8551 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -14,6 +14,7 @@
#define FCW_DEFAULT 0x037f
#define FCW_RESET 0x0040
+#define FTW_RESET 0xFF
#define MXCSR_DEFAULT 0x1f80
extern uint32_t mxcsr_mask;
--
2.45.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-08-13 14:21 ` [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three Alejandro Vallejo
@ 2024-08-13 14:32 ` Jan Beulich
2024-08-13 16:33 ` Alejandro Vallejo
2024-10-04 0:04 ` Andrew Cooper
1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-08-13 14:32 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
On 13.08.2024 16:21, Alejandro Vallejo wrote:
> It was trying to do too many things at once and there was no clear way of
> defining what it was meant to do. This commit splits the function in three.
>
> 1. A function to return the FPU to power-on reset values.
> 2. A function to return the FPU to default values.
> 3. A x87/SSE state loader (equivalent to the old function when it took a data
> pointer).
>
> While at it, make sure the abridged tag is consistent with the manuals and
> start as 0xFF.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> v3:
> * Adjust commit message, as the split is now in 3.
> * Remove bulky comment, as the rationale for it turned out to be
> unsubstantiated. I can't find proof in xen-devel of the stream
> operating the way I claimed, and at that point having the comment
> at all is pointless
So you deliberately removed the comment altogether, not just point 3 of it?
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-08-13 14:32 ` Jan Beulich
@ 2024-08-13 16:33 ` Alejandro Vallejo
2024-10-03 13:54 ` Alejandro Vallejo
0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Vallejo @ 2024-08-13 16:33 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
On Tue Aug 13, 2024 at 3:32 PM BST, Jan Beulich wrote:
> On 13.08.2024 16:21, Alejandro Vallejo wrote:
> > It was trying to do too many things at once and there was no clear way of
> > defining what it was meant to do. This commit splits the function in three.
> >
> > 1. A function to return the FPU to power-on reset values.
> > 2. A function to return the FPU to default values.
> > 3. A x87/SSE state loader (equivalent to the old function when it took a data
> > pointer).
> >
> > While at it, make sure the abridged tag is consistent with the manuals and
> > start as 0xFF.
> >
> > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> > ---
> > v3:
> > * Adjust commit message, as the split is now in 3.
> > * Remove bulky comment, as the rationale for it turned out to be
> > unsubstantiated. I can't find proof in xen-devel of the stream
> > operating the way I claimed, and at that point having the comment
> > at all is pointless
>
> So you deliberately removed the comment altogether, not just point 3 of it?
>
> Jan
Yes. The other two cases can be deduced pretty trivially from the conditional,
I reckon. I commented them more heavily in order to properly introduce (3), but
seeing how it was all a midsummer dream might as well reduce clutter.
I got as far as the original implementation of XSAVE in Xen and it seems to
have been tested against many combinations of src and dst, none of which was
that ficticious "xsave enabled + xsave context missing". I suspect the
xsave_enabled(v) was merely avoiding writing to the XSAVE buffer just for
efficiency (however minor effect it might have had). I just reverse engineering
it wrong.
Which reminds me. Thanks for mentioning that, because it was really just
guesswork on my part.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-08-13 16:33 ` Alejandro Vallejo
@ 2024-10-03 13:54 ` Alejandro Vallejo
2024-10-04 6:08 ` Jan Beulich
0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Vallejo @ 2024-10-03 13:54 UTC (permalink / raw)
To: Alejandro Vallejo, Jan Beulich
Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
Hi,
On Tue Aug 13, 2024 at 5:33 PM BST, Alejandro Vallejo wrote:
> On Tue Aug 13, 2024 at 3:32 PM BST, Jan Beulich wrote:
> > On 13.08.2024 16:21, Alejandro Vallejo wrote:
> > > It was trying to do too many things at once and there was no clear way of
> > > defining what it was meant to do. This commit splits the function in three.
> > >
> > > 1. A function to return the FPU to power-on reset values.
> > > 2. A function to return the FPU to default values.
> > > 3. A x87/SSE state loader (equivalent to the old function when it took a data
> > > pointer).
> > >
> > > While at it, make sure the abridged tag is consistent with the manuals and
> > > start as 0xFF.
> > >
> > > Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >
> > > ---
> > > v3:
> > > * Adjust commit message, as the split is now in 3.
> > > * Remove bulky comment, as the rationale for it turned out to be
> > > unsubstantiated. I can't find proof in xen-devel of the stream
> > > operating the way I claimed, and at that point having the comment
> > > at all is pointless
> >
> > So you deliberately removed the comment altogether, not just point 3 of it?
> >
> > Jan
>
> Yes. The other two cases can be deduced pretty trivially from the conditional,
> I reckon. I commented them more heavily in order to properly introduce (3), but
> seeing how it was all a midsummer dream might as well reduce clutter.
>
> I got as far as the original implementation of XSAVE in Xen and it seems to
> have been tested against many combinations of src and dst, none of which was
> that ficticious "xsave enabled + xsave context missing". I suspect the
> xsave_enabled(v) was merely avoiding writing to the XSAVE buffer just for
> efficiency (however minor effect it might have had). I just reverse engineering
> it wrong.
>
> Which reminds me. Thanks for mentioning that, because it was really just
> guesswork on my part.
>
> Cheers,
> Alejandro
Playing around with the FPU I noticed this patch wasn't committed, did it fall
under the cracks or is there a specific reason?
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
2024-08-13 14:21 ` [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
@ 2024-10-03 19:38 ` Andrew Cooper
2024-10-04 16:15 ` Alejandro Vallejo
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2024-10-03 19:38 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné
On 13/08/2024 3:21 pm, Alejandro Vallejo wrote:
> @@ -299,44 +299,14 @@ void save_fpu_enable(void)
> /* Initialize FPU's context save area */
> int vcpu_init_fpu(struct vcpu *v)
> {
> - int rc;
> -
> v->arch.fully_eager_fpu = opt_eager_fpu;
> -
> - if ( (rc = xstate_alloc_save_area(v)) != 0 )
> - return rc;
> -
> - if ( v->arch.xsave_area )
> - v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
> - else
> - {
> - BUILD_BUG_ON(__alignof(v->arch.xsave_area->fpu_sse) < 16);
> - v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse),
> - __alignof(v->arch.xsave_area->fpu_sse));
> - if ( v->arch.fpu_ctxt )
> - {
> - fpusse_t *fpu_sse = v->arch.fpu_ctxt;
> -
> - fpu_sse->fcw = FCW_DEFAULT;
> - fpu_sse->mxcsr = MXCSR_DEFAULT;
> - }
> - else
> - rc = -ENOMEM;
This looks wonky. It's not, because xstate_alloc_save_area() contains
the same logic for setting up FCW/MXCSR.
It would be helpful to note this in the commit message. Something about
deduplicating the setup alongside deduplicating the pointer.
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index bca3258d69ac..3da60af2a44a 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -592,11 +592,11 @@ struct pv_vcpu
> struct arch_vcpu
> {
> /*
> - * guest context (mirroring struct vcpu_guest_context) common
> - * between pv and hvm guests
> + * Guest context common between PV and HVM guests. Includes general purpose
> + * registers, segment registers and other parts of the exception frame.
> + *
> + * It doesn't contain FPU state, as that lives in xsave_area instead.
> */
This new comment isn't really correct either. arch_vcpu contains the
PV/HVM union, so it not only things which are common between the two.
I'd either leave it alone, or delete it entirely. It doesn't serve much
purpose IMO, and it is going to bitrot very quickly (FRED alone will
change two of the state groups you mention).
> -
> - void *fpu_ctxt;
> struct cpu_user_regs user_regs;
>
> /* Debug registers. */
> diff --git a/xen/arch/x86/x86_emulate/blk.c b/xen/arch/x86/x86_emulate/blk.c
> index e790f4f90056..28b54f26fe29 100644
> --- a/xen/arch/x86/x86_emulate/blk.c
> +++ b/xen/arch/x86/x86_emulate/blk.c
> @@ -11,7 +11,8 @@
> !defined(X86EMUL_NO_SIMD)
> # ifdef __XEN__
> # include <asm/xstate.h>
> -# define FXSAVE_AREA current->arch.fpu_ctxt
> +# define FXSAVE_AREA ((struct x86_fxsr *) \
> + (void *)¤t->arch.xsave_area->fpu_sse)
This isn't a like-for-like replacement.
Previously FXSAVE_AREA's type was void *. I'd leave the expression as just
(void *)¤t->arch.xsave_area->fpu_sse
because struct x86_fxsr is not the only type needing to be used here in
due course. (There are 8 variations of data layout for older
instructions.)
> # else
> # define FXSAVE_AREA get_fpu_save_area()
> # endif
> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> index 5c4144d55e89..850ee31bd18c 100644
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -507,9 +507,16 @@ int xstate_alloc_save_area(struct vcpu *v)
> unsigned int size;
>
> if ( !cpu_has_xsave )
> - return 0;
> -
> - if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
> + {
> + /*
> + * This is bigger than FXSAVE_SIZE by 64 bytes, but it helps treating
> + * the FPU state uniformly as an XSAVE buffer even if XSAVE is not
> + * available in the host. Note the alignment restriction of the XSAVE
> + * area are stricter than those of the FXSAVE area.
> + */
Can I suggest the following?
"On non-XSAVE systems, we allocate an XSTATE buffer for simplicity.
XSTATE is backwards compatible to FXSAVE, and only one cacheline larger."
It's rather more concise.
~Andrew
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-08-13 14:21 ` [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three Alejandro Vallejo
2024-08-13 14:32 ` Jan Beulich
@ 2024-10-04 0:04 ` Andrew Cooper
1 sibling, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2024-10-04 0:04 UTC (permalink / raw)
To: Alejandro Vallejo, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné
On 13/08/2024 3:21 pm, Alejandro Vallejo wrote:
> It was trying to do too many things at once and there was no clear way of
> defining what it was meant to do. This commit splits the function in three.
>
> 1. A function to return the FPU to power-on reset values.
> 2. A function to return the FPU to default values.
> 3. A x87/SSE state loader (equivalent to the old function when it took a data
> pointer).
>
> While at it, make sure the abridged tag is consistent with the manuals and
> start as 0xFF.
>
> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
fpu_setup_fpu() definitely needs splitting, and it was doing many things
before, but...
> diff --git a/xen/arch/x86/include/asm/i387.h b/xen/arch/x86/include/asm/i387.h
> index a783549db991..7a69577de45b 100644
> --- a/xen/arch/x86/include/asm/i387.h
> +++ b/xen/arch/x86/include/asm/i387.h
> @@ -31,10 +31,30 @@ void vcpu_restore_fpu_nonlazy(struct vcpu *v, bool need_stts);
> void vcpu_restore_fpu_lazy(struct vcpu *v);
> void vcpu_save_fpu(struct vcpu *v);
> void save_fpu_enable(void);
> -
> int vcpu_init_fpu(struct vcpu *v);
> -struct xsave_struct;
> -void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct *xsave_area,
> - const void *data, unsigned int fcw_default);
> void vcpu_destroy_fpu(struct vcpu *v);
> +
> +/*
> + * Restore v's FPU to power-on reset values
> + *
> + * @param v vCPU containing the FPU
> + */
> +void vcpu_reset_fpu(struct vcpu *v);
> +
> +/*
> + * Restore v's FPU to default values
> + *
> + * @param v vCPU containing the FPU
> + */
> +void vcpu_default_fpu(struct vcpu *v);
... whether this is an accurate transform of the logic or not, we
oughtn't to have both.
AFAICT, these two functions differ only in the choice of FCW constant,
and whether FTW gets a nonzero value or not.
The x86 architecture has #RESET and #INIT states (and Xen is especially
bad at these right now). They're well defined in the SDM/APM; #RESET
zeroes most things but leaves FCW=0x40 and FTW=0x5555 (for which the
FXSAVE mapping is 0xFF). #INIT leaves everything unmodified.
[After a very long manual-diving session. Thankyou Christian]
The 80287's #RESET and F(N)INIT instructions were broadly similar. They
differed on whether they changed the addressing mode[1] but both cases
set FCW=0x37f and FTW=0x5555.
The 80387 intentionally diverged #RESET and FNINIT, with #RESET setting
FCW=0x40. This had a side effect of asserting #ERROR, and software is
required to execute FNINIT which sets FCW=0x37f and de-asserts #ERROR
Why? From the 387's programmers reference:
---%<---
6.2.2 Hardware Recognition of the NPX
The 80386 identifies the type of its coprocessor (80287 or 80387) by
sampling its ERROR# input some time after the falling edge of RESET and
before executing the first ESC instruction. The 80287 keeps its ERROR#
output in inactive state after hardware reset; the 80387 keeps its
ERROR# output in active state after hardware reset. The 80386 records
this difference in the ET bit of control register zero (CR0). The 80386
subsequently uses ET to control its interface with the coprocessor. If
ET is set, it employs the 32-bit protocol of the 80387; if ET is not
set, it employs the 16-bit protocol of the 80287.
<snip>
6.2.5 Initializing the 80387
Initializing the 80387 NPX simply means placing the NPX in a known state
unaffected by any activity performed earlier. A single FNINIT
instruction performs this initialization. All the error masks are set,
all registers are tagged empty, TOP is set to zero, and default rounding
and precision controls are set. Table 6-1 shows the state of the 80387
NPX following FINIT or FNINIT. This state is compatible with that of the
80287 after FINIT or after hardware RESET.
---%<---
The 486 gets even more complicated, but I've been writing this email for
long enough.
So, 0x40 is the correct reset value for the 386/387 and later, hence why
that's what the SDM/APM say these days.
As for Xen. For HVM guests, setting FCW=0x37f is definitely wrong.
For PV guests, things are ill-defined, but software has been required to
issue FNINIT for 4 decades already (even MiniOS does!) so I find myself
dis-interested in trying to maintain compatibility for coprocessor which
predates Xen by ~15y.
You're already playing with FTW too (vs the old logic), so please drop
vcpu_default_fpu() and just use vcpu_reset_fpu() instead.
As a bonus, it lets you delete yet-more code, but it also needs to come
with a rework of the commit message. I'd aim for less "split in 3" and
more "delete and write something sane".
> +
> +/*
> + * Load x87/SSE state into v's FPU
> + *
> + * Overrides the XSAVE header to set the state components to be x87 and SSE.
> + *
> + * @param v vCPU containing the FPU
> + * @param data 512-octet blob for x87/SSE state
> + */
> +void vcpu_setup_fpu(struct vcpu *v, const void *data);
> #endif /* __ASM_I386_I387_H */
> diff --git a/xen/arch/x86/include/asm/xstate.h b/xen/arch/x86/include/asm/xstate.h
> index ebeb2a3dcaf9..6144ed6f8551 100644
> --- a/xen/arch/x86/include/asm/xstate.h
> +++ b/xen/arch/x86/include/asm/xstate.h
> @@ -14,6 +14,7 @@
>
> #define FCW_DEFAULT 0x037f
> #define FCW_RESET 0x0040
> +#define FTW_RESET 0xFF
This isn't a tag word reset value. The value for that would be 0x5555.
I'd suggest:
#define FXSAVE_FTW_RESET 0xff /* Abridged Tag Word format */
for want of anything better. At least this name makes it clear it's
specifically for the FXSAVE format.
~Andrew
[1] the 286, (in)famously couldn't leave protected mode, and neither
could the 287 it seems.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-10-03 13:54 ` Alejandro Vallejo
@ 2024-10-04 6:08 ` Jan Beulich
2024-10-07 15:59 ` Alejandro Vallejo
0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2024-10-04 6:08 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
On 03.10.2024 15:54, Alejandro Vallejo wrote:
> On Tue Aug 13, 2024 at 5:33 PM BST, Alejandro Vallejo wrote:
>> On Tue Aug 13, 2024 at 3:32 PM BST, Jan Beulich wrote:
>>> On 13.08.2024 16:21, Alejandro Vallejo wrote:
>>>> It was trying to do too many things at once and there was no clear way of
>>>> defining what it was meant to do. This commit splits the function in three.
>>>>
>>>> 1. A function to return the FPU to power-on reset values.
>>>> 2. A function to return the FPU to default values.
>>>> 3. A x87/SSE state loader (equivalent to the old function when it took a data
>>>> pointer).
>>>>
>>>> While at it, make sure the abridged tag is consistent with the manuals and
>>>> start as 0xFF.
>>>>
>>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>>>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>>> ---
>>>> v3:
>>>> * Adjust commit message, as the split is now in 3.
>>>> * Remove bulky comment, as the rationale for it turned out to be
>>>> unsubstantiated. I can't find proof in xen-devel of the stream
>>>> operating the way I claimed, and at that point having the comment
>>>> at all is pointless
>>>
>>> So you deliberately removed the comment altogether, not just point 3 of it?
>>>
>>> Jan
>>
>> Yes. The other two cases can be deduced pretty trivially from the conditional,
>> I reckon. I commented them more heavily in order to properly introduce (3), but
>> seeing how it was all a midsummer dream might as well reduce clutter.
>>
>> I got as far as the original implementation of XSAVE in Xen and it seems to
>> have been tested against many combinations of src and dst, none of which was
>> that ficticious "xsave enabled + xsave context missing". I suspect the
>> xsave_enabled(v) was merely avoiding writing to the XSAVE buffer just for
>> efficiency (however minor effect it might have had). I just reverse engineering
>> it wrong.
>>
>> Which reminds me. Thanks for mentioning that, because it was really just
>> guesswork on my part.
>>
>> Cheers,
>> Alejandro
>
> Playing around with the FPU I noticed this patch wasn't committed, did it fall
> under the cracks or is there a specific reason?
Well, it's patch 2 in a series with no statement that it's independent of patch
1, and patch 1 continues to lack an ack (based on earlier comments of mine you
probably have inferred that I'm not intending to ack it in this shape, while at
the same time - considering the arguments you gave - I also don't mean to stand
in the way of it going in with someone else's ack).
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu
2024-10-03 19:38 ` Andrew Cooper
@ 2024-10-04 16:15 ` Alejandro Vallejo
0 siblings, 0 replies; 12+ messages in thread
From: Alejandro Vallejo @ 2024-10-04 16:15 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich, Roger Pau Monné
Hi,
On Thu Oct 3, 2024 at 8:38 PM BST, Andrew Cooper wrote:
> On 13/08/2024 3:21 pm, Alejandro Vallejo wrote:
> > @@ -299,44 +299,14 @@ void save_fpu_enable(void)
> > /* Initialize FPU's context save area */
> > int vcpu_init_fpu(struct vcpu *v)
> > {
> > - int rc;
> > -
> > v->arch.fully_eager_fpu = opt_eager_fpu;
> > -
> > - if ( (rc = xstate_alloc_save_area(v)) != 0 )
> > - return rc;
> > -
> > - if ( v->arch.xsave_area )
> > - v->arch.fpu_ctxt = &v->arch.xsave_area->fpu_sse;
> > - else
> > - {
> > - BUILD_BUG_ON(__alignof(v->arch.xsave_area->fpu_sse) < 16);
> > - v->arch.fpu_ctxt = _xzalloc(sizeof(v->arch.xsave_area->fpu_sse),
> > - __alignof(v->arch.xsave_area->fpu_sse));
> > - if ( v->arch.fpu_ctxt )
> > - {
> > - fpusse_t *fpu_sse = v->arch.fpu_ctxt;
> > -
> > - fpu_sse->fcw = FCW_DEFAULT;
> > - fpu_sse->mxcsr = MXCSR_DEFAULT;
> > - }
> > - else
> > - rc = -ENOMEM;
>
> This looks wonky. It's not, because xstate_alloc_save_area() contains
> the same logic for setting up FCW/MXCSR.
>
> It would be helpful to note this in the commit message. Something about
> deduplicating the setup alongside deduplicating the pointer.
>
Sure
> > diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> > index bca3258d69ac..3da60af2a44a 100644
> > --- a/xen/arch/x86/include/asm/domain.h
> > +++ b/xen/arch/x86/include/asm/domain.h
> > @@ -592,11 +592,11 @@ struct pv_vcpu
> > struct arch_vcpu
> > {
> > /*
> > - * guest context (mirroring struct vcpu_guest_context) common
> > - * between pv and hvm guests
> > + * Guest context common between PV and HVM guests. Includes general purpose
> > + * registers, segment registers and other parts of the exception frame.
> > + *
> > + * It doesn't contain FPU state, as that lives in xsave_area instead.
> > */
>
> This new comment isn't really correct either. arch_vcpu contains the
> PV/HVM union, so it not only things which are common between the two.
It's about cpu_user_regs though, not arch_vcpu?
>
> I'd either leave it alone, or delete it entirely. It doesn't serve much
> purpose IMO, and it is going to bitrot very quickly (FRED alone will
> change two of the state groups you mention).
>
I'm happy getting rid of it because it's actively confusing in its current
form. That said, I can't possibly believe there's not a single simple
description of cpu_user_regs that everyone can agree on.
> > -
> > - void *fpu_ctxt;
> > struct cpu_user_regs user_regs;
> >
> > /* Debug registers. */
> > diff --git a/xen/arch/x86/x86_emulate/blk.c b/xen/arch/x86/x86_emulate/blk.c
> > index e790f4f90056..28b54f26fe29 100644
> > --- a/xen/arch/x86/x86_emulate/blk.c
> > +++ b/xen/arch/x86/x86_emulate/blk.c
> > @@ -11,7 +11,8 @@
> > !defined(X86EMUL_NO_SIMD)
> > # ifdef __XEN__
> > # include <asm/xstate.h>
> > -# define FXSAVE_AREA current->arch.fpu_ctxt
> > +# define FXSAVE_AREA ((struct x86_fxsr *) \
> > + (void *)¤t->arch.xsave_area->fpu_sse)
>
> This isn't a like-for-like replacement.
>
> Previously FXSAVE_AREA's type was void *. I'd leave the expression as just
>
> (void *)¤t->arch.xsave_area->fpu_sse
>
> because struct x86_fxsr is not the only type needing to be used here in
> due course. (There are 8 variations of data layout for older
> instructions.)
>
Sure
> > # else
> > # define FXSAVE_AREA get_fpu_save_area()
> > # endif
> > diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> > index 5c4144d55e89..850ee31bd18c 100644
> > --- a/xen/arch/x86/xstate.c
> > +++ b/xen/arch/x86/xstate.c
> > @@ -507,9 +507,16 @@ int xstate_alloc_save_area(struct vcpu *v)
> > unsigned int size;
> >
> > if ( !cpu_has_xsave )
> > - return 0;
> > -
> > - if ( !is_idle_vcpu(v) || !cpu_has_xsavec )
> > + {
> > + /*
> > + * This is bigger than FXSAVE_SIZE by 64 bytes, but it helps treating
> > + * the FPU state uniformly as an XSAVE buffer even if XSAVE is not
> > + * available in the host. Note the alignment restriction of the XSAVE
> > + * area are stricter than those of the FXSAVE area.
> > + */
>
> Can I suggest the following?
>
> "On non-XSAVE systems, we allocate an XSTATE buffer for simplicity.
> XSTATE is backwards compatible to FXSAVE, and only one cacheline larger."
>
> It's rather more concise.
>
> ~Andrew
Sure.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-10-04 6:08 ` Jan Beulich
@ 2024-10-07 15:59 ` Alejandro Vallejo
2024-10-08 7:52 ` Jan Beulich
0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Vallejo @ 2024-10-07 15:59 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
Hi,
On Fri Oct 4, 2024 at 7:08 AM BST, Jan Beulich wrote:
> On 03.10.2024 15:54, Alejandro Vallejo wrote:
> > On Tue Aug 13, 2024 at 5:33 PM BST, Alejandro Vallejo wrote:
> >> On Tue Aug 13, 2024 at 3:32 PM BST, Jan Beulich wrote:
> >>> On 13.08.2024 16:21, Alejandro Vallejo wrote:
> >>>> It was trying to do too many things at once and there was no clear way of
> >>>> defining what it was meant to do. This commit splits the function in three.
> >>>>
> >>>> 1. A function to return the FPU to power-on reset values.
> >>>> 2. A function to return the FPU to default values.
> >>>> 3. A x87/SSE state loader (equivalent to the old function when it took a data
> >>>> pointer).
> >>>>
> >>>> While at it, make sure the abridged tag is consistent with the manuals and
> >>>> start as 0xFF.
> >>>>
> >>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
> >>>
> >>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >>>
> >>>> ---
> >>>> v3:
> >>>> * Adjust commit message, as the split is now in 3.
> >>>> * Remove bulky comment, as the rationale for it turned out to be
> >>>> unsubstantiated. I can't find proof in xen-devel of the stream
> >>>> operating the way I claimed, and at that point having the comment
> >>>> at all is pointless
> >>>
> >>> So you deliberately removed the comment altogether, not just point 3 of it?
> >>>
> >>> Jan
> >>
> >> Yes. The other two cases can be deduced pretty trivially from the conditional,
> >> I reckon. I commented them more heavily in order to properly introduce (3), but
> >> seeing how it was all a midsummer dream might as well reduce clutter.
> >>
> >> I got as far as the original implementation of XSAVE in Xen and it seems to
> >> have been tested against many combinations of src and dst, none of which was
> >> that ficticious "xsave enabled + xsave context missing". I suspect the
> >> xsave_enabled(v) was merely avoiding writing to the XSAVE buffer just for
> >> efficiency (however minor effect it might have had). I just reverse engineering
> >> it wrong.
> >>
> >> Which reminds me. Thanks for mentioning that, because it was really just
> >> guesswork on my part.
> >>
> >> Cheers,
> >> Alejandro
> >
> > Playing around with the FPU I noticed this patch wasn't committed, did it fall
> > under the cracks or is there a specific reason?
>
> Well, it's patch 2 in a series with no statement that it's independent of patch
I meant the series as a whole, rather than this specific patch. They are indeed
not independent.
> 1, and patch 1 continues to lack an ack (based on earlier comments of mine you
> probably have inferred that I'm not intending to ack it in this shape, while at
> the same time - considering the arguments you gave - I also don't mean to stand
> in the way of it going in with someone else's ack).
I didn't infer that at all, I'm afraid. I merely thought you had been busy and
forgot about it. Is the "in this shape" about the overallocation that you
mentioned in v1?
>
> Jan
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three
2024-10-07 15:59 ` Alejandro Vallejo
@ 2024-10-08 7:52 ` Jan Beulich
0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2024-10-08 7:52 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Andrew Cooper, Roger Pau Monné, Xen-devel
On 07.10.2024 17:59, Alejandro Vallejo wrote:
> On Fri Oct 4, 2024 at 7:08 AM BST, Jan Beulich wrote:
>> On 03.10.2024 15:54, Alejandro Vallejo wrote:
>>> On Tue Aug 13, 2024 at 5:33 PM BST, Alejandro Vallejo wrote:
>>>> On Tue Aug 13, 2024 at 3:32 PM BST, Jan Beulich wrote:
>>>>> On 13.08.2024 16:21, Alejandro Vallejo wrote:
>>>>>> It was trying to do too many things at once and there was no clear way of
>>>>>> defining what it was meant to do. This commit splits the function in three.
>>>>>>
>>>>>> 1. A function to return the FPU to power-on reset values.
>>>>>> 2. A function to return the FPU to default values.
>>>>>> 3. A x87/SSE state loader (equivalent to the old function when it took a data
>>>>>> pointer).
>>>>>>
>>>>>> While at it, make sure the abridged tag is consistent with the manuals and
>>>>>> start as 0xFF.
>>>>>>
>>>>>> Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
>>>>>
>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>>> ---
>>>>>> v3:
>>>>>> * Adjust commit message, as the split is now in 3.
>>>>>> * Remove bulky comment, as the rationale for it turned out to be
>>>>>> unsubstantiated. I can't find proof in xen-devel of the stream
>>>>>> operating the way I claimed, and at that point having the comment
>>>>>> at all is pointless
>>>>>
>>>>> So you deliberately removed the comment altogether, not just point 3 of it?
>>>>>
>>>>> Jan
>>>>
>>>> Yes. The other two cases can be deduced pretty trivially from the conditional,
>>>> I reckon. I commented them more heavily in order to properly introduce (3), but
>>>> seeing how it was all a midsummer dream might as well reduce clutter.
>>>>
>>>> I got as far as the original implementation of XSAVE in Xen and it seems to
>>>> have been tested against many combinations of src and dst, none of which was
>>>> that ficticious "xsave enabled + xsave context missing". I suspect the
>>>> xsave_enabled(v) was merely avoiding writing to the XSAVE buffer just for
>>>> efficiency (however minor effect it might have had). I just reverse engineering
>>>> it wrong.
>>>>
>>>> Which reminds me. Thanks for mentioning that, because it was really just
>>>> guesswork on my part.
>>>>
>>>> Cheers,
>>>> Alejandro
>>>
>>> Playing around with the FPU I noticed this patch wasn't committed, did it fall
>>> under the cracks or is there a specific reason?
>>
>> Well, it's patch 2 in a series with no statement that it's independent of patch
>
> I meant the series as a whole, rather than this specific patch. They are indeed
> not independent.
>
>> 1, and patch 1 continues to lack an ack (based on earlier comments of mine you
>> probably have inferred that I'm not intending to ack it in this shape, while at
>> the same time - considering the arguments you gave - I also don't mean to stand
>> in the way of it going in with someone else's ack).
>
> I didn't infer that at all, I'm afraid. I merely thought you had been busy and
> forgot about it. Is the "in this shape" about the overallocation that you
> mentioned in v1?
Yes.
Jan
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2024-10-08 7:52 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 14:21 [PATCH v3 0/2] x86: FPU handling cleanup Alejandro Vallejo
2024-08-13 14:21 ` [PATCH v3 1/2] x86/fpu: Combine fpu_ctxt and xsave_area in arch_vcpu Alejandro Vallejo
2024-10-03 19:38 ` Andrew Cooper
2024-10-04 16:15 ` Alejandro Vallejo
2024-08-13 14:21 ` [PATCH v3 2/2] x86/fpu: Split fpu_setup_fpu() in three Alejandro Vallejo
2024-08-13 14:32 ` Jan Beulich
2024-08-13 16:33 ` Alejandro Vallejo
2024-10-03 13:54 ` Alejandro Vallejo
2024-10-04 6:08 ` Jan Beulich
2024-10-07 15:59 ` Alejandro Vallejo
2024-10-08 7:52 ` Jan Beulich
2024-10-04 0:04 ` Andrew Cooper
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.