From: Mohamed Mediouni <mohamed@unpredictable.fr>
To: qemu-devel@nongnu.org
Cc: Pedro Barbuda <pbarbuda@microsoft.com>,
	qemu-arm@nongnu.org,
	Pierrick Bouvier <pierrick.bouvier@linaro.org>,
	Mohamed Mediouni <mohamed@unpredictable.fr>,
	Roman Bolshakov <rbolshakov@ddn.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Wei Liu <wei.liu@kernel.org>,
	Phil Dennis-Jordan <phil@philjordan.eu>,
	Peter Maydell <peter.maydell@linaro.org>,
	Zhao Liu <zhao1.liu@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Magnus Kulke <magnuskulke@linux.microsoft.com>
Subject: [PATCH v3 28/37] target/i386: add de/compaction to xsave_helper
Date: Wed, 22 Apr 2026 23:42:16 +0200
Message-ID: <20260422214225.2242-29-mohamed@unpredictable.fr>
In-Reply-To: <20260422214225.2242-1-mohamed@unpredictable.fr>

From: Magnus Kulke <magnuskulke@linux.microsoft.com>

Hyper-V uses XSAVES, which stores extended state in the compacted format:
components are packed contiguously. QEMU's internal XSAVE representation
uses the standard format, where each component is placed at a fixed
offset. We therefore add two conversion functions to the xsave helper to
round-trip XSAVE state during migration.
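
To make the difference between the two layouts concrete, here is a
minimal sketch (not code from this patch) of how compacted offsets
follow from XCOMP_BV: the first extended component starts right after
the 576-byte legacy region and header, each further component set in
XCOMP_BV follows the previous one directly, and a component whose CPUID
alignment bit is set is first rounded up to a 64-byte boundary. The
names struct xcomp_info, compacted_offsets() and example_info are
hypothetical, and the table values are illustrative, not read from real
hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of CPUID.(EAX=0xD, ECX=i) data for component i */
struct xcomp_info {
    uint32_t size;    /* EAX: component size in bytes */
    uint32_t std_off; /* EBX: fixed offset in the standard format */
    int align64;      /* ECX bit 1: 64-byte aligned when compacted */
};

/* 512-byte legacy region + 64-byte XSAVE header */
#define XSAVE_EXT_OFFSET 576u
#define ALIGN_UP_64(x) (((x) + 63u) & ~(size_t)63u)
#define NCOMP 18

/* Hypothetical values chosen for the example, not from real hardware */
static const struct xcomp_info example_info[NCOMP] = {
    [2]  = { 256,  576, 0 },
    [9]  = {   8, 2688, 0 },
    [17] = {  64, 2752, 1 },
};

/*
 * For each component i >= 2, store its compacted offset in offsets[i],
 * or 0 if its bit is clear in xcomp_bv. Components 0 and 1 live in the
 * legacy region and are not touched. Returns the compacted area size.
 */
static size_t compacted_offsets(uint64_t xcomp_bv,
                                const struct xcomp_info *info, size_t n,
                                size_t *offsets)
{
    size_t off = XSAVE_EXT_OFFSET;

    assert(n <= 63); /* bit 63 is the compaction flag, not a component */
    for (size_t i = 2; i < n; i++) {
        offsets[i] = 0;
        if (!((xcomp_bv >> i) & 1)) {
            continue; /* not in the layout: occupies no space */
        }
        if (info[i].align64) {
            off = ALIGN_UP_64(off);
        }
        offsets[i] = off;
        off += info[i].size;
    }
    return off;
}
```

With XCOMP_BV covering components 0, 1, 2, 9 and 17, this places
component 2 at 576, component 9 at 832 and component 17 at 896 (840
rounded up to 64), for a 960-byte compacted area, whereas the same
table needs 2816 bytes (2752 + 64) in the standard format.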

- decompact_xsave_area(): converts compacted format to standard.
  XSTATE_BV is masked to the guest's supported XCR0 features, since
  IA32_XSS state is managed by the hypervisor.

- compact_xsave_area(): converts standard format back to compacted
  format. XCOMP_BV is set from the host's CPUID 0xD.0, intersected with
  the guest's supported XCR0 features, rather than from the guest's
  current XCR0, as this is what the hypervisor expects.
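
The two header adjustments above reduce to simple bit operations. The
sketch below mirrors the masking done in the diff, which uses the
guest's supported XCR0 features (env->features[FEAT_XSAVE_XCR0_*]) as
the guest mask; the helper names join_mask(), decompacted_xstate_bv()
and compacted_xcomp_bv() are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 63 of XCOMP_BV marks the compacted format */
#define XCOMP_BV_COMPACTED (1ULL << 63)

/* Join 32-bit halves of an XCR0-style mask (e.g. CPUID 0xD.0 EAX/EDX) */
static uint64_t join_mask(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Decompaction: drop components the guest model does not support */
static uint64_t decompacted_xstate_bv(uint64_t xstate_bv,
                                      uint64_t guest_xcr0)
{
    return xstate_bv & guest_xcr0;
}

/* Compaction: host capability minus guest-disabled features, plus bit 63 */
static uint64_t compacted_xcomp_bv(uint64_t host_xcr0, uint64_t guest_xcr0)
{
    /* XCR0 bit 63 is reserved and never names a state component */
    assert(!(host_xcr0 & XCOMP_BV_COMPACTED));
    return (host_xcr0 & guest_xcr0) | XCOMP_BV_COMPACTED;
}
```

For example, a host reporting 0x602e7 in CPUID 0xD.0 EAX (x87, SSE,
AVX, the three AVX-512 components, PKRU and the two AMX components)
with AMX disabled for the guest (0x2e7) yields an XCOMP_BV of
0x80000000000002e7, and a compacted XSTATE_BV of 0x20207 is reduced to
0x207 when decompacted.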

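Both functions use CPUID leaf 0xD subleaves to determine component
sizes, offsets, and alignment requirements. compact_xsave_area() queries
the host; decompact_xsave_area() queries the guest CPUID model and falls
back to the host for components the model does not describe.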
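
Each CPUID.(EAX=0xD, ECX=i) subleaf for i >= 2 reports the component's
size in EAX, its standard-format offset in EBX, and in ECX bit 1
whether it is 64-byte aligned when compacted. decompact_xsave_area() in
the diff below consults the guest CPUID model first and, if that
reports size 0, uses the host only to learn how many bytes to skip. A
minimal sketch of that decision, with hypothetical names (struct
xsave_comp_info, decode_cpuid_0xd(), resolve_component()) and register
values passed in rather than read via CPUID:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=0xD, ECX=i) fields for an extended component i >= 2 */
struct xsave_comp_info {
    uint32_t size;    /* EAX: size in bytes (0 if unknown) */
    uint32_t std_off; /* EBX: offset in the standard format */
    bool align64;     /* ECX bit 1: 64-byte aligned when compacted */
};

static struct xsave_comp_info decode_cpuid_0xd(uint32_t eax, uint32_t ebx,
                                               uint32_t ecx)
{
    struct xsave_comp_info info = {
        .size = eax,
        .std_off = ebx,
        .align64 = ((ecx >> 1) & 1) != 0,
    };
    return info;
}

enum comp_action {
    COMP_COPY,    /* guest model knows it: copy if set in XSTATE_BV */
    COMP_SKIP,    /* only the host knows it: skip over its bytes */
    COMP_UNKNOWN, /* size unknown everywhere: layout cannot be parsed */
};

static enum comp_action resolve_component(struct xsave_comp_info guest,
                                          struct xsave_comp_info host,
                                          struct xsave_comp_info *out)
{
    if (guest.size != 0) {
        *out = guest;
        return COMP_COPY;
    }
    if (host.size != 0) {
        *out = host; /* only size and alignment matter when skipping */
        return COMP_SKIP;
    }
    return COMP_UNKNOWN;
}
```

In the patch, the COMP_UNKNOWN case corresponds to the -EINVAL error
path, and both the copy and skip cases advance the source offset by the
component size after any 64-byte alignment.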

There are situations when the host advertises features that we want to
disable for the guest, e.g. AMX TILE. In this case we cannot rely on the
host's XCR0; instead we use the feature mask that is generated as part
of the CPU realization process (x86_cpu_expand_features).

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>

[Fixup: made xsave_offset a size_t to fix macOS and OpenBSD builds]

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>
---
 target/i386/cpu.h          |   2 +
 target/i386/xsave_helper.c | 256 +++++++++++++++++++++++++++++++++++++
 2 files changed, 258 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0af7bdf85a..80cdc1cb2a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -3023,6 +3023,8 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen);
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen);
 uint32_t xsave_area_size(uint64_t mask, bool compacted);
 void x86_update_hflags(CPUX86State* env);
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env);
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen);
 
 static inline bool hyperv_feat_enabled(X86CPU *cpu, int feat)
 {
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index bab2258732..625bae103a 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -3,6 +3,7 @@
  * See the COPYING file in the top-level directory.
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 
 #include "cpu.h"
 
@@ -293,3 +294,258 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
     }
 #endif
 }
+
+#define XSTATE_BV_IN_HDR  offsetof(X86XSaveHeader, xstate_bv)
+#define XCOMP_BV_IN_HDR   offsetof(X86XSaveHeader, xcomp_bv)
+
+typedef struct X86XSaveAreaView {
+    /* 512 bytes */
+    X86LegacyXSaveArea legacy;
+    /* 64 bytes */
+    X86XSaveHeader     header;
+    /* ...followed by individual xsave areas */
+} X86XSaveAreaView;
+
+#define XSAVE_XSTATE_BV_OFFSET  offsetof(X86XSaveAreaView, header.xstate_bv)
+#define XSAVE_XCOMP_BV_OFFSET   offsetof(X86XSaveAreaView, header.xcomp_bv)
+#define XSAVE_EXT_OFFSET        (sizeof(X86LegacyXSaveArea) + \
+                                 sizeof(X86XSaveHeader))
+
+/**
+ * decompact_xsave_area - Convert compacted XSAVE format to standard format
+ * @buf: Source buffer containing compacted XSAVE data
+ * @buflen: Size of source buffer
+ * @env: CPU state whose standard-format xsave_buf receives the result
+ *
+ * Accelerator backends like MSHV may return XSAVE state in compacted format
+ * (XSAVEC/XSAVES), where components are packed contiguously apart from
+ * optional 64-byte alignment. QEMU's XSAVE buffers use the standard format,
+ * in which each component sits at a fixed offset.
+ *
+ * Returns: 0 on success, negative errno on failure
+ */
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env)
+{
+    uint64_t compacted_xstate_bv, compacted_xcomp_bv, compacted_layout_bv;
+    size_t xsave_offset;
+    uint64_t *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, dst_off;
+    bool align64;
+    uint64_t guest_xcr0, *xstate_bv;
+
+    assert(buflen >= XSAVE_EXT_OFFSET);   /* header must be readable */
+    compacted_xstate_bv = *(const uint64_t *)(buf + XSAVE_XSTATE_BV_OFFSET);
+    compacted_xcomp_bv  = *(const uint64_t *)(buf + XSAVE_XCOMP_BV_OFFSET);
+    /* This function only handles compacted format (bit 63 set) */
+    assert((compacted_xcomp_bv >> 63) & 1);
+
+    /* Low bits of XCOMP_BV describe which components are in the layout */
+    compacted_layout_bv = compacted_xcomp_bv & ~(1ULL << 63);
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+    memcpy(env->xsave_buf, buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * We mask XSTATE_BV with the guest's supported XCR0 because:
+     * 1. Supervisor state (IA32_XSS) is managed by the hypervisor and is
+     *    not migrated.
+     * 2. Features disabled at partition creation (e.g. AMX) must be excluded.
+     */
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    xstate_bv = (uint64_t *)(env->xsave_buf + XSAVE_XSTATE_BV_OFFSET);
+    *xstate_bv &= guest_xcr0;
+
+    /* Output is standard format, which requires XCOMP_BV to be all zero */
+    xcomp_bv = (uint64_t *)(env->xsave_buf + XSAVE_XCOMP_BV_OFFSET);
+    *xcomp_bv = 0;
+
+    /*
+     * Process each extended state component in the compacted layout.
+     * Components 0 and 1 (FP and SSE) are in the legacy region, so we
+     * start at component 2. For each component:
+     * - Calculate its offset in the compacted source (contiguous layout)
+     * - Get its fixed offset in the standard destination from CPUID
+     * - Copy if the component has non-init state (bit set in XSTATE_BV)
+     */
+    xsave_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (((compacted_layout_bv >> i) & 1) == 0) {
+            continue;
+        }
+
+        /* Query guest CPUID for this component's size and standard offset */
+        cpu_x86_cpuid(env, 0xD, i, &eax, &ebx, &ecx, &edx);
+
+        size = eax;
+        dst_off = ebx;
+        align64 = (ecx & (1u << 1)) != 0;
+
+        /* Component is in the layout but unknown to the guest CPUID model */
+        if (size == 0) {
+            /*
+             * The hypervisor might expose a component that has no
+             * representation in the guest CPUID model. We query the host to
+             * retrieve the size of the component, so we can skip over it.
+             */
+            host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+            size = eax;
+            align64 = (ecx & (1u << 1)) != 0;
+            if (size == 0) {
+                error_report("xsave component %zu: size unknown to both "
+                             "guest and host CPUID", i);
+                return -EINVAL;
+            }
+
+            if (align64) {
+                xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+            }
+
+            if (xsave_offset + size > buflen) {
+                error_report("xsave component %zu overruns source buffer: "
+                             "offset=%zu size=%u buflen=%zu",
+                             i, xsave_offset, size, buflen);
+                return -E2BIG;
+            }
+
+            xsave_offset += size;
+            continue;
+        }
+
+        if (align64) {
+            xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+        }
+
+        if ((xsave_offset + size) > buflen) {
+            error_report("xsave component %zu overruns source buffer: "
+                         "offset=%zu size=%u buflen=%zu",
+                         i, xsave_offset, size, buflen);
+            return -E2BIG;
+        }
+
+        if ((dst_off + size) > env->xsave_buf_len) {
+            error_report("xsave component %zu overruns destination buffer: "
+                         "offset=%u size=%u buflen=%zu",
+                         i, dst_off, size, (size_t)env->xsave_buf_len);
+            return -E2BIG;
+        }
+
+        /* Copy only components in non-init state (bit set in XSTATE_BV) */
+        if (((compacted_xstate_bv >> i) & 1) != 0) {
+            memcpy(env->xsave_buf + dst_off, buf + xsave_offset, size);
+        }
+
+        xsave_offset += size;
+    }
+
+    return 0;
+}
+
+/**
+ * compact_xsave_area - Convert standard XSAVE format to compacted format
+ * @env: CPU state containing the standard format XSAVE buffer
+ * @buf: Destination buffer for compacted XSAVE data (to send to hypervisor)
+ * @buflen: Size of destination buffer
+ *
+ * Accelerator backends like MSHV may expect XSAVE state in compacted format
+ * (XSAVEC/XSAVES), where components are packed contiguously apart from
+ * optional 64-byte alignment. QEMU's XSAVE buffers use the standard format,
+ * in which each component sits at a fixed offset.
+ *
+ * This function converts from standard to compacted format. It writes into
+ * a pre-allocated destination buffer; the caller is responsible for making
+ * sure @buflen is large enough.
+ *
+ * Returns: total size of compacted XSAVE data written to @buf
+ */
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen)
+{
+    uint64_t *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, src_off;
+    bool align64;
+    size_t compact_offset;
+    uint64_t host_xcr0_mask, guest_xcr0;
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(buf, 0, buflen);
+    memcpy(buf, env->xsave_buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * Set XCOMP_BV to indicate compacted format (bit 63) and which
+     * components are in the layout.
+     *
+     * We must explicitly set XCOMP_BV because x86_cpu_xsave_all_areas()
+     * produces standard format with XCOMP_BV=0 (buffer is zeroed and only
+     * XSTATE_BV is set in the header).
+     *
+     * XCOMP_BV must reflect the partition's XSAVE capability, not the
+     * guest's current XCR0 (env->xcr0). These differ because:
+     * - A guest's XCR0 is what the guest OS has enabled via XSETBV
+     * - The partition's XCR0 mask is the hypervisor's save/restore capability
+     *
+     * The hypervisor uses XSAVES which saves based on its capability, so the
+     * XCOMP_BV value in the buffer we send back must match that capability.
+     *
+     * We intersect the host XCR0 with the guest's supported XCR0 features
+     * (FEAT_XSAVE_XCR0_*) so that features disabled at partition creation
+     * (e.g. AMX) are excluded from the compacted layout.
+     */
+    host_cpuid(0xD, 0, &eax, &ebx, &ecx, &edx);
+    host_xcr0_mask = ((uint64_t)edx << 32) | eax;
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    host_xcr0_mask &= guest_xcr0;
+    xcomp_bv = buf + XSAVE_XCOMP_BV_OFFSET;
+    *xcomp_bv = host_xcr0_mask | (1ULL << 63);
+
+    /*
+     * Process each extended state component in the host's XCR0.
+     * The compacted layout must match XCOMP_BV (host capability).
+     *
+     * For each component:
+     * - Get its size and standard offset from host CPUID
+     * - Apply 64-byte alignment if required
+     * - Copy data only if guest has this component (bit set in env->xcr0)
+     * - Always advance offset to maintain correct layout
+     */
+    compact_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (!((host_xcr0_mask >> i) & 1)) {
+            continue;
+        }
+
+        /* Query host CPUID for this component's size and standard offset */
+        host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+        size = eax;
+        src_off = ebx;
+        align64 = (ecx >> 1) & 1;
+
+        if (size == 0) {
+            /* Component in host xcr0 but unknown - shouldn't happen */
+            continue;
+        }
+
+        /* Apply 64-byte alignment if required by this component */
+        if (align64) {
+            compact_offset = QEMU_ALIGN_UP(compact_offset, 64);
+        }
+
+        /*
+         * Only copy data if guest has this component enabled in XCR0.
+         * Otherwise the component remains zeroed (init state), but we
+         * still advance the offset to maintain the correct layout.
+         */
+        if ((env->xcr0 >> i) & 1) {
+            memcpy(buf + compact_offset, env->xsave_buf + src_off, size);
+        }
+
+        compact_offset += size;
+    }
+
+    return compact_offset;
+}
-- 
2.50.1 (Apple Git-155)




Thread overview: 45+ messages
2026-04-22 21:41 [PATCH v3 00/37] WHPX x86 updates for QEMU 11.1 Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 01/37] target/i386: emulate: include name of unhandled instruction Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 02/37] whpx: i386: x2apic emulation Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 03/37] whpx: i386: wire up feature probing Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 04/37] whpx: i386: disable TbFlushHypercalls for emulated LAPIC Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 05/37] whpx: i386: enable x2apic by default for user-mode LAPIC Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 06/37] whpx: i386: reintroduce enlightenments for Windows 10 Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 07/37] whpx: i386: introduce proper cpuid support Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 08/37] whpx: i386: kernel-irqchip=off fixes Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 09/37] whpx: i386: use WHvX64RegisterCr8 only when kernel-irqchip=off Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 10/37] whpx: i386: disable kernel-irqchip on Windows 10 when PIC enabled Mohamed Mediouni
2026-04-22 21:41 ` [PATCH v3 11/37] whpx: i386: IO port fast path cleanup Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 12/37] whpx: i386: disable enlightenments and LAPIC for isapc Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 13/37] whpx: i386: interrupt priority support Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 14/37] hw/intc: apic: disallow APIC reads when disabled Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 15/37] whpx: i386: fix CPUID[1:EDX].APIC reporting Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 16/37] whpx: i386: set apicbase value only on success Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 17/37] whpx: i386: unknown MSR configurability Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 18/37] whpx: i386: enable GuestIdleReg enlightenment Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 19/37] whpx: i386: tighten APIC base validity check Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 20/37] whpx: i386: ignore vpassist when kernel-irqchip=off Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 21/37] target: i386: HLT type that ignores EFLAGS.IF Mohamed Mediouni
2026-04-30 13:43   ` Paolo Bonzini
2026-04-22 21:42 ` [PATCH v3 22/37] whpx: i386: add HV_X64_MSR_GUEST_IDLE when !kernel-irqchip Mohamed Mediouni
2026-04-30 13:21   ` Paolo Bonzini
2026-04-22 21:42 ` [PATCH v3 23/37] whpx: i386: some x2APIC awareness Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 24/37] whpx: i386: set WHvX64RegisterInitialApicId Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 25/37] whpx: i386: Pause VM on fatal exception to be able to inspect state Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 26/37] target/i386: emulate: use exception_payload for fault address Mohamed Mediouni
2026-04-30 13:24   ` Paolo Bonzini
2026-04-22 21:42 ` [PATCH v3 27/37] target/i386: make xsave_buf present unconditionally Mohamed Mediouni
2026-04-22 21:42 ` Mohamed Mediouni [this message]
2026-04-30 13:31   ` [PATCH v3 28/37] target/i386: add de/compaction to xsave_helper Paolo Bonzini
2026-04-22 21:42 ` [PATCH v3 29/37] whpx: xsave support Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 30/37] whpx: i386: set APIC ID only when APIC present Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 31/37] whpx: i386: update migration blocker message Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 32/37] whpx: i386: don't increment eip on MSR access raising GPF Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 33/37] target/i386: emulate, hvf: rdmsr/wrmsr GPF handling Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 34/37] whpx: i386: add feature to intercept #GP MSR accesses Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 35/37] whpx: i386: nested virt settings Mohamed Mediouni
2026-04-30 13:44   ` Paolo Bonzini
2026-04-30 17:52     ` Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 36/37] whpx: i386: add SeparateSecurityDomain flag and make default Mohamed Mediouni
2026-04-22 21:42 ` [PATCH v3 37/37] whpx: i386: documentation update Mohamed Mediouni
2026-04-23 11:10 ` [PATCH v3 00/37] WHPX x86 updates for QEMU 11.1 Paolo Bonzini
