public inbox for kvm@vger.kernel.org
From: Magnus Kulke <magnuskulke@linux.microsoft.com>
To: qemu-devel@nongnu.org
Cc: kvm@vger.kernel.org, "Magnus Kulke" <magnuskulke@microsoft.com>,
	"Wei Liu" <liuwe@microsoft.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Zhao Liu" <zhao1.liu@intel.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Wei Liu" <wei.liu@kernel.org>,
	"Magnus Kulke" <magnuskulke@linux.microsoft.com>,
	"Alex Williamson" <alex@shazbot.org>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Marcelo Tosatti" <mtosatti@redhat.com>
Subject: [PATCH 30/34] target/i386: add de/compaction to xsave_helper
Date: Fri, 17 Apr 2026 12:56:14 +0200
Message-ID: <20260417105618.3621-31-magnuskulke@linux.microsoft.com>
In-Reply-To: <20260417105618.3621-1-magnuskulke@linux.microsoft.com>

Hyper-V uses XSAVES, which stores extended state in compacted format, where
components are packed contiguously, while QEMU's internal XSAVE
representation uses the standard format, where each component is placed
at a fixed offset. We therefore add two conversion functions to the xsave
helper so that XSAVE state can make the round trip in a migration.

- decompact_xsave_area(): converts compacted format to standard.
  XSTATE_BV is masked to the guest's supported XCR0 features, since
  IA32_XSS is managed by the hypervisor.

- compact_xsave_area(): converts standard format back to compacted
  format. XCOMP_BV is set from the host's CPUID 0xD.0, restricted to the
  guest's supported features, rather than from the guest's current XCR0,
  as this is what the hypervisor expects.

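Both functions use CPUID leaf 0xD subleaves to determine component sizes,
offsets, and alignment requirements. compact_xsave_area() queries the host,
while decompact_xsave_area() queries the guest CPUID model and falls back to
the host for components that the guest model does not know.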
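
For illustration only, the offset arithmetic described above can be sketched
in isolation. The XCompInfo table, EXT_OFFSET and compacted_offset() below are
hypothetical names that are not part of this patch; the table stands in for
the EAX/EBX/ECX values that CPUID.(EAX=0xD,ECX=i) would report:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-component record, mirroring CPUID.(EAX=0xD,ECX=i). */
typedef struct {
    uint32_t size;     /* EAX: component size in bytes */
    uint32_t std_off;  /* EBX: offset in the standard format */
    int align64;       /* ECX bit 1: 64-byte alignment when compacted */
} XCompInfo;

/* 512-byte legacy area followed by the 64-byte XSAVE header. */
#define EXT_OFFSET 576u

static size_t align_up64(size_t v)
{
    return (v + 63) & ~(size_t)63;
}

/*
 * Return the offset of component idx in a compacted area whose layout
 * bitmap (XCOMP_BV without bit 63) is layout_bv. info must have 63
 * entries. Returns 0 if the component is not part of the layout.
 */
static size_t compacted_offset(const XCompInfo *info, uint64_t layout_bv,
                               unsigned idx)
{
    size_t off = EXT_OFFSET;
    unsigned i;

    assert(idx >= 2 && idx < 63);

    for (i = 2; i < 63; i++) {
        if (!((layout_bv >> i) & 1)) {
            continue;
        }
        if (info[i].align64) {
            off = align_up64(off);
        }
        if (i == idx) {
            return off;
        }
        /* Components in the layout occupy space even in init state. */
        off += info[i].size;
    }
    return 0;
}
```

In the standard format the same component would simply live at
info[idx].std_off, independent of which other components are present.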

There are situations in which the host advertises features that we want to
disable for the guest, e.g. AMX TILE. In this case we cannot rely on the
host's XCR0; instead we use the feature mask that has been generated
as part of the CPU realization process (x86_cpu_expand_features).
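
As a rough sketch of the masking described above (the helper names and the
example bit patterns are illustrative assumptions, not code from this patch),
the layout mask is the intersection of the host's XCR0 capability with the
guest's supported features, and XCOMP_BV is that mask with bit 63 set:

```c
#include <stdint.h>

/* Bit 63 of XCOMP_BV marks the area as compacted. */
#define XCOMP_BV_COMPACTED (1ULL << 63)

/* Combine EDX:EAX style register halves into one 64-bit mask. */
static uint64_t make_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Components that are both host-capable and enabled for the guest. */
static uint64_t layout_mask(uint64_t host_xcr0, uint64_t guest_supported)
{
    return host_xcr0 & guest_supported;
}

/* XCOMP_BV for a compacted area with the given component layout. */
static uint64_t make_xcomp_bv(uint64_t mask)
{
    return mask | XCOMP_BV_COMPACTED;
}
```

For example, a host mask of 0x602e7 (which includes the AMX bits 17 and 18)
intersected with a guest mask of 0x2e7 yields 0x2e7, so the AMX components
are left out of the compacted layout.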

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/cpu.h          |   2 +
 target/i386/xsave_helper.c | 255 +++++++++++++++++++++++++++++++++++++
 2 files changed, 257 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4ad4a35ce9..cd5d5a5369 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -3033,6 +3033,8 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen);
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen);
 uint32_t xsave_area_size(uint64_t mask, bool compacted);
 void x86_update_hflags(CPUX86State* env);
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env);
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen);
 
 static inline bool hyperv_feat_enabled(X86CPU *cpu, int feat)
 {
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index bab2258732..2272b83f5f 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -3,6 +3,7 @@
  * See the COPYING file in the top-level directory.
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 
 #include "cpu.h"
 
@@ -293,3 +294,257 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
     }
 #endif
 }
+
+#define XSTATE_BV_IN_HDR  offsetof(X86XSaveHeader, xstate_bv)
+#define XCOMP_BV_IN_HDR   offsetof(X86XSaveHeader, xcomp_bv)
+
+typedef struct X86XSaveAreaView {
+    /* 512 bytes */
+    X86LegacyXSaveArea legacy;
+    /* 64 bytes */
+    X86XSaveHeader     header;
+    /* ...followed by individual xsave areas */
+} X86XSaveAreaView;
+
+#define XSAVE_XSTATE_BV_OFFSET  offsetof(X86XSaveAreaView, header.xstate_bv)
+#define XSAVE_XCOMP_BV_OFFSET   offsetof(X86XSaveAreaView, header.xcomp_bv)
+#define XSAVE_EXT_OFFSET        (sizeof(X86LegacyXSaveArea) + \
+                                 sizeof(X86XSaveHeader))
+
+/**
+ * decompact_xsave_area - Convert compacted XSAVE format to standard format
+ * @buf: Source buffer containing compacted XSAVE data
+ * @buflen: Size of source buffer
+ * @env: CPU state where the standard format buffer will be written to
+ *
+ * Accelerator backends like MSHV might return XSAVE state in compacted format
+ * (XSAVEC), in which the state components are packed contiguously without
+ * gaps. QEMU's XSAVE buffers are in standard format, where each component has
+ * a fixed offset.
+ *
+ * Returns: 0 on success, negative errno on failure
+ */
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env)
+{
+    uint64_t compacted_xstate_bv, compacted_xcomp_bv, compacted_layout_bv;
+    uint64_t xsave_offset, *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, dst_off;
+    bool align64;
+    uint64_t guest_xcr0, *xstate_bv;
+
+    compacted_xstate_bv = *(uint64_t *)(buf + XSAVE_XSTATE_BV_OFFSET);
+    compacted_xcomp_bv  = *(uint64_t *)(buf + XSAVE_XCOMP_BV_OFFSET);
+
+    /* This function only handles compacted format (bit 63 set) */
+    assert((compacted_xcomp_bv >> 63) & 1);
+
+    /* Low bits of XCOMP_BV describe which components are in the layout */
+    compacted_layout_bv = compacted_xcomp_bv & ~(1ULL << 63);
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+    memcpy(env->xsave_buf, buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * We mask XSTATE_BV with the guest's supported XCR0 because:
+     * 1. Supervisor state (IA32_XSS) is hypervisor-managed, we don't use
+     *    this state for migration.
+     * 2. Features disabled at partition creation (e.g. AMX) must be excluded
+     */
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    xstate_bv = (uint64_t *)(env->xsave_buf + XSAVE_XSTATE_BV_OFFSET);
+    *xstate_bv &= guest_xcr0;
+
+    /* Clear bit 63 - output is standard format, not compacted */
+    xcomp_bv = (uint64_t *)(env->xsave_buf + XSAVE_XCOMP_BV_OFFSET);
+    *xcomp_bv = *xcomp_bv & ~(1ULL << 63);
+
+    /*
+     * Process each extended state component in the compacted layout.
+     * Components 0 and 1 (FP and SSE) are in the legacy region, so we
+     * start at component 2. For each component:
+     * - Calculate its offset in the compacted source (contiguous layout)
+     * - Get its fixed offset in the standard destination from CPUID
+     * - Copy if the component has non-init state (bit set in XSTATE_BV)
+     */
+    xsave_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (((compacted_layout_bv >> i) & 1) == 0) {
+            continue;
+        }
+
+        /* Query guest CPUID for this component's size and standard offset */
+        cpu_x86_cpuid(env, 0xD, i, &eax, &ebx, &ecx, &edx);
+
+        size = eax;
+        dst_off = ebx;
+        align64 = (ecx & (1u << 1)) != 0;
+
+        /* Component is in the layout but unknown to the guest CPUID model */
+        if (size == 0) {
+            /*
+             * The hypervisor might expose a component that has no
+             * representation in the guest CPUID model. We query the host to
+             * retrieve the size of the component, so we can skip over it.
+             */
+            host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+            size = eax;
+            align64 = (ecx & (1u << 1)) != 0;
+            if (size == 0) {
+                error_report("xsave component %zu: size unknown to both "
+                             "guest and host CPUID", i);
+                return -EINVAL;
+            }
+
+            if (align64) {
+                xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+            }
+
+            if (xsave_offset + size > buflen) {
+                error_report("xsave component %zu overruns source buffer: "
+                             "offset=%" PRIu64 " size=%u buflen=%zu",
+                             i, xsave_offset, size, buflen);
+                return -E2BIG;
+            }
+
+            xsave_offset += size;
+            continue;
+        }
+
+        if (align64) {
+            xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+        }
+
+        if ((xsave_offset + size) > buflen) {
+            error_report("xsave component %zu overruns source buffer: "
+                         "offset=%" PRIu64 " size=%u buflen=%zu",
+                         i, xsave_offset, size, buflen);
+            return -E2BIG;
+        }
+
+        if ((dst_off + size) > env->xsave_buf_len) {
+            error_report("xsave component %zu overruns destination buffer: "
+                         "offset=%u size=%u buflen=%zu",
+                         i, dst_off, size, (size_t)env->xsave_buf_len);
+            return -E2BIG;
+        }
+
+        /* Copy components marked present in XSTATE_BV to guest model */
+        if (((compacted_xstate_bv >> i) & 1) != 0) {
+            memcpy(env->xsave_buf + dst_off, buf + xsave_offset, size);
+        }
+
+        xsave_offset += size;
+    }
+
+    return 0;
+}
+
+/**
+ * compact_xsave_area - Convert standard XSAVE format to compacted format
+ * @env: CPU state containing the standard format XSAVE buffer
+ * @buf: Destination buffer for compacted XSAVE data (to send to hypervisor)
+ * @buflen: Size of destination buffer
+ *
+ * Accelerator backends like MSHV might expect XSAVE state in compacted format
+ * (XSAVEC), in which the state components are packed contiguously without
+ * gaps. QEMU's XSAVE buffers are in standard format, where each component has
+ * a fixed offset.
+ *
+ * This function converts from standard to compacted format. It accepts a
+ * pre-allocated destination buffer; it is the responsibility of the caller
+ * to ensure that the buffer is big enough.
+ *
+ * Returns: total size of compacted XSAVE data written to @buf
+ */
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen)
+{
+    uint64_t *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, src_off;
+    bool align64;
+    size_t compact_offset;
+    uint64_t host_xcr0_mask, guest_xcr0;
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(buf, 0, buflen);
+    memcpy(buf, env->xsave_buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * Set XCOMP_BV to indicate compacted format (bit 63) and which
+     * components are in the layout.
+     *
+     * We must explicitly set XCOMP_BV because x86_cpu_xsave_all_areas()
+     * produces standard format with XCOMP_BV=0 (buffer is zeroed and only
+     * XSTATE_BV is set in the header).
+     *
+     * XCOMP_BV must reflect the partition's XSAVE capability, not the
+     * guest's current XCR0 (env->xcr0). These differ because:
+     * - A guest's XCR0 is what the guest OS has enabled via XSETBV
+     * - The partition's XCR0 mask is the hypervisor's save/restore capability
+     *
+     * The hypervisor uses XSAVES which saves based on its capability, so the
+     * XCOMP_BV value in the buffer we send back must match that capability.
+     *
+     * We intersect the host XCR0 with the guest's supported XCR0 features
+     * (FEAT_XSAVE_XCR0_*) so that features disabled at partition creation
+     * (e.g. AMX) are excluded from the compacted layout.
+     */
+    host_cpuid(0xD, 0, &eax, &ebx, &ecx, &edx);
+    host_xcr0_mask = ((uint64_t)edx << 32) | eax;
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    host_xcr0_mask &= guest_xcr0;
+    xcomp_bv = buf + XSAVE_XCOMP_BV_OFFSET;
+    *xcomp_bv = host_xcr0_mask | (1ULL << 63);
+
+    /*
+     * Process each extended state component in the host's XCR0.
+     * The compacted layout must match XCOMP_BV (host capability).
+     *
+     * For each component:
+     * - Get its size and standard offset from host CPUID
+     * - Apply 64-byte alignment if required
+     * - Copy data only if guest has this component (bit set in env->xcr0)
+     * - Always advance offset to maintain correct layout
+     */
+    compact_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (!((host_xcr0_mask >> i) & 1)) {
+            continue;
+        }
+
+        /* Query host CPUID for this component's size and standard offset */
+        host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+        size = eax;
+        src_off = ebx;
+        align64 = (ecx >> 1) & 1;
+
+        if (size == 0) {
+            /* Component in host xcr0 but unknown - shouldn't happen */
+            continue;
+        }
+
+        /* Apply 64-byte alignment if required by this component */
+        if (align64) {
+            compact_offset = QEMU_ALIGN_UP(compact_offset, 64);
+        }
+
+        /*
+         * Only copy data if guest has this component enabled in XCR0.
+         * Otherwise the component remains zeroed (init state), but we
+         * still advance the offset to maintain the correct layout.
+         */
+        if ((env->xcr0 >> i) & 1) {
+            memcpy(buf + compact_offset, env->xsave_buf + src_off, size);
+        }
+
+        compact_offset += size;
+    }
+
+    return compact_offset;
+}
-- 
2.34.1


