public inbox for qemu-devel@nongnu.org
* [RFC 00/32] Add migration support to the MSHV accelerator
@ 2026-03-23 13:57 Magnus Kulke
  2026-03-23 13:57 ` [RFC 01/32] target/i386/mshv: use arch_load/store_reg fns Magnus Kulke
                   ` (31 more replies)
  0 siblings, 32 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Hey all, this is a draft patch series for live migration support in
the MSHV accelerator. Since it is somewhat invasive and touches
various parts outside of the accelerator's own directories, I'm sending
an RFC series to collect early feedback.

Note: these patches are sent in isolation, but for LM to be
functional we will require the recently submitted patch series
"Support QEMU cpu models in MSHV accelerator" (at v3 currently) to be
merged, since on some hosts we will need the CPUID infra to disable
features that we do not yet migrate (e.g. AMX tiles).

In this series we perform some preparatory refactorings and introduce
new abstractions where required, particularly for irqchips and MSR
logic. We also have to introduce some generic logic for XSAVE
de/compaction to allow migration of XSAVE state.

Note 2: I did already receive some feedback offline on the new XSAVE
de/compaction handlers. We want to avoid the buffer copy and rework
existing code to be generic over a compacted or standard layout. The v1
submission will contain this change.
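
For readers unfamiliar with the two XSAVE layouts, here is a minimal
sketch of what de-compaction involves. It is not the code from this
series: XSaveComponent, compacted_offset() and xsave_decompact() are
hypothetical names, and the component table is assumed to be filled
from CPUID leaf 0xD by the caller. In the standard layout each extended
component sits at a fixed offset; in the compacted layout the enabled
components are packed back to back after the 576-byte legacy area and
header, optionally aligned to 64 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XSAVE_LEGACY_AND_HEADER 576 /* 512-byte legacy area + 64-byte header */
#define XSAVE_XCOMP_BV_OFFSET   520 /* second quadword of the header */
#define XSAVE_MAX_COMPONENTS    19

/* Hypothetical description of one component, as reported by CPUID 0xD. */
typedef struct XSaveComponent {
    uint32_t size;            /* CPUID.(EAX=0xD, ECX=i).EAX */
    uint32_t standard_offset; /* CPUID.(EAX=0xD, ECX=i).EBX */
    bool align64;             /* CPUID.(EAX=0xD, ECX=i).ECX bit 1 */
} XSaveComponent;

/*
 * Offset of component idx in a compacted buffer whose component mask is
 * `mask` (XCOMP_BV without bit 63). Returns 0 if idx is not in the mask.
 */
static uint32_t compacted_offset(const XSaveComponent *c, uint64_t mask,
                                 int idx)
{
    uint32_t off = XSAVE_LEGACY_AND_HEADER;

    assert(idx >= 2 && idx < XSAVE_MAX_COMPONENTS);
    for (int i = 2; i < XSAVE_MAX_COMPONENTS; i++) {
        if (!(mask & (1ULL << i))) {
            continue;
        }
        if (c[i].align64) {
            off = (off + 63) & ~63u;
        }
        if (i == idx) {
            return off;
        }
        off += c[i].size;
    }
    return 0;
}

/* Expand a compacted buffer `src` into the standard layout in `dst`. */
static void xsave_decompact(uint8_t *dst, size_t dst_len, const uint8_t *src,
                            const XSaveComponent *c, uint64_t mask)
{
    assert(dst_len >= XSAVE_LEGACY_AND_HEADER);
    memset(dst, 0, dst_len);
    /* x87/SSE state and the header live at the same place in both forms */
    memcpy(dst, src, XSAVE_LEGACY_AND_HEADER);
    /* the standard form expects XCOMP_BV to be zero */
    memset(dst + XSAVE_XCOMP_BV_OFFSET, 0, 8);

    for (int i = 2; i < XSAVE_MAX_COMPONENTS; i++) {
        if (!(mask & (1ULL << i))) {
            continue;
        }
        assert((size_t)c[i].standard_offset + c[i].size <= dst_len);
        memcpy(dst + c[i].standard_offset,
               src + compacted_offset(c, mask, i), c[i].size);
    }
}
```

The intermediate copy into dst is exactly the buffer copy that the
offline feedback asks to avoid; code that is generic over both layouts
would only need the offset lookup.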

The guest state components that are covered by migration are:

- standard regs
- special regs
- xcr0
- (legacy) FPU regs
- XSAVE
- LAPIC
- MSRs
- SynIC state (SIMP, SIEFP, STIMER)
- pending interrupts/exceptions
- MP state (AP cpu modes)

Finally, routines for dirty-page tracking to reduce migration downtime
have been added and integrated in the respective hooks.
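
As a rough illustration of why dirty-page tracking shortens downtime:
each pre-copy round only resends the pages the guest touched since the
previous round. The sketch below is not the MSHV interface and does not
reflect the code in accel/mshv/mem.c; DirtyRange and
collect_dirty_ranges() are hypothetical names, and a bitmap with one
bit per 4 KiB page is an assumption. It coalesces set bits into
contiguous guest-physical ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Hypothetical contiguous range of dirty guest-physical memory. */
typedef struct DirtyRange {
    uint64_t gpa;
    uint64_t len;
} DirtyRange;

static bool page_is_dirty(const uint64_t *bitmap, size_t page)
{
    return (bitmap[page / 64] >> (page % 64)) & 1;
}

/*
 * Walk a dirty bitmap covering n_pages pages starting at base_gpa and
 * merge runs of dirty pages into ranges. Returns the number of ranges
 * written to out (at most max_out).
 */
static size_t collect_dirty_ranges(const uint64_t *bitmap, size_t n_pages,
                                   uint64_t base_gpa, DirtyRange *out,
                                   size_t max_out)
{
    size_t n = 0;
    size_t page = 0;

    assert(bitmap != NULL && out != NULL);
    while (page < n_pages && n < max_out) {
        if (!page_is_dirty(bitmap, page)) {
            page++;
            continue;
        }
        size_t start = page;
        while (page < n_pages && page_is_dirty(bitmap, page)) {
            page++;
        }
        out[n].gpa = base_gpa + ((uint64_t)start << SKETCH_PAGE_SHIFT);
        out[n].len = (uint64_t)(page - start) << SKETCH_PAGE_SHIFT;
        n++;
    }
    return n;
}
```

Merging adjacent pages keeps the number of transfer requests small when
the guest dirties memory in larger contiguous chunks.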

best,

magnus

Magnus Kulke (32):
  target/i386/mshv: use arch_load/store_reg fns
  target/i386/mshv: use generic FPU/xcr0 state
  target/i386/mshv: impl init/load/store_vcpu_state
  accel/accel-irq: add AccelRouteChange abstraction
  accel/accel-irq: add generic begin_route_changes
  accel/accel-irq: add generic commit_route_changes
  accel/mshv: add irq_routes to state
  accel/mshv: update s->irq_routes in add_msi_route
  accel/mshv: update s->irq_routes in update_msi_route
  accel/mshv: update s->irq_routes in release_virq
  accel/mshv: use s->irq_routes in commit_routes
  accel/mshv: reserve ioapic routes on s->irq_routes
  accel/mshv: remove redundant msi controller
  target/i386/mshv: move apic logic into own file
  target/i386/mshv: migrate LAPIC state
  target/i386/mshv: move msr code to arch
  accel/mshv: store partition proc features
  target/i386/mshv: expose msvh_get_generic_regs
  target/i386/mshv: migrate MSRs
  target/i386/mshv: migrate MTRR MSRs
  target/i386/mshv: migrate Synic SINT MSRs
  target/i386/mshv: migrate SIMP and SIEFP state
  target/i386/mshv: migrate STIMER state
  accel/mshv: introduce SaveVMHandler
  accel/mshv: write synthetic MSRs after migration
  accel/mshv: migrate REFERENCE_TIME
  target/i386/mshv: migrate pending ints/excs
  target/i386: add de/compaction to xsave_helper
  target/i386/mshv: migrate XSAVE state
  target/i386/mshv: reconstruct hflags after load
  target/i386/mshv: migrate MP_STATE
  accel/mshv: enable dirty page tracking

 accel/accel-irq.c               |  41 +-
 accel/kvm/kvm-all.c             |   6 +-
 accel/mshv/irq.c                | 360 ++++++------
 accel/mshv/mem.c                | 211 +++++++
 accel/mshv/meson.build          |   1 -
 accel/mshv/mshv-all.c           | 243 ++++++++-
 accel/mshv/msr.c                | 375 -------------
 accel/stubs/kvm-stub.c          |   2 +-
 accel/stubs/mshv-stub.c         |   6 +-
 hw/intc/apic_common.c           |   3 +
 hw/misc/ivshmem-pci.c           |   8 +-
 hw/vfio/pci.c                   |  11 +-
 hw/virtio/virtio-pci.c          |   3 +-
 include/accel/accel-route.h     |  17 +
 include/hw/hyperv/hvgdk_mini.h  |  22 +
 include/hw/hyperv/hvhdk.h       | 149 +++++
 include/hw/i386/apic_internal.h |   5 +
 include/system/accel-irq.h      |   6 +-
 include/system/kvm.h            |  23 +-
 include/system/mshv.h           |  15 +-
 include/system/mshv_int.h       |  89 +--
 target/i386/cpu.h               |  14 +-
 target/i386/kvm/kvm.c           |   5 +-
 target/i386/machine.c           |  46 ++
 target/i386/mshv/meson.build    |   3 +
 target/i386/mshv/mshv-apic.c    |  78 +++
 target/i386/mshv/mshv-cpu.c     | 940 +++++++++++++++++++++++---------
 target/i386/mshv/msr.c          | 432 +++++++++++++++
 target/i386/mshv/synic.c        | 206 +++++++
 target/i386/xsave_helper.c      | 255 +++++++++
 30 files changed, 2641 insertions(+), 934 deletions(-)
 delete mode 100644 accel/mshv/msr.c
 create mode 100644 include/accel/accel-route.h
 create mode 100644 target/i386/mshv/mshv-apic.c
 create mode 100644 target/i386/mshv/msr.c
 create mode 100644 target/i386/mshv/synic.c

-- 
2.34.1




* [RFC 01/32] target/i386/mshv: use arch_load/store_reg fns
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 02/32] target/i386/mshv: use generic FPU/xcr0 state Magnus Kulke
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Improve consistency around the naming of the load/store register
functions. This is required since we want to round-trip more registers
in a migration than what's currently required for MMIO emulation.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c       |  2 +-
 include/system/mshv_int.h   |  6 ++---
 target/i386/mshv/mshv-cpu.c | 52 ++++++++++++++-----------------------
 3 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index d4cc7f5371..7c0eb68a5b 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -650,7 +650,7 @@ static void mshv_cpu_synchronize_pre_loadvm(CPUState *cpu)
 static void do_mshv_cpu_synchronize(CPUState *cpu, run_on_cpu_data arg)
 {
     if (!cpu->accel->dirty) {
-        int ret = mshv_load_regs(cpu);
+        int ret = mshv_arch_load_regs(cpu);
         if (ret < 0) {
             error_report("Failed to load registers for vcpu %d",
                          cpu->cpu_index);
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 35386c422f..a142dd241a 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -82,11 +82,9 @@ void mshv_init_mmio_emu(void);
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
 int mshv_configure_vcpu(const CPUState *cpu, const MshvFPU *fpu, uint64_t xcr0);
-int mshv_get_standard_regs(CPUState *cpu);
-int mshv_get_special_regs(CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
-int mshv_load_regs(CPUState *cpu);
-int mshv_store_regs(CPUState *cpu);
+int mshv_arch_load_regs(CPUState *cpu);
+int mshv_arch_store_regs(CPUState *cpu);
 int mshv_set_generic_regs(const CPUState *cpu, const hv_register_assoc *assocs,
                           size_t n_regs);
 int mshv_arch_put_registers(const CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 2bc978deb2..9456e75277 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -107,6 +107,8 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
     HV_X64_REGISTER_XMM_CONTROL_STATUS,
 };
 
+static int set_special_regs(const CPUState *cpu);
+
 static int translate_gva(const CPUState *cpu, uint64_t gva, uint64_t *gpa,
                          uint64_t flags)
 {
@@ -285,7 +287,7 @@ static int set_standard_regs(const CPUState *cpu)
     return 0;
 }
 
-int mshv_store_regs(CPUState *cpu)
+int mshv_arch_store_regs(CPUState *cpu)
 {
     int ret;
 
@@ -295,6 +297,12 @@ int mshv_store_regs(CPUState *cpu)
         return -1;
     }
 
+    ret = set_special_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to store special registers");
+        return ret;
+    }
+
     return 0;
 }
 
@@ -323,7 +331,7 @@ static void populate_standard_regs(const hv_register_assoc *assocs,
     rflags_to_lflags(env);
 }
 
-int mshv_get_standard_regs(CPUState *cpu)
+static int get_standard_regs(CPUState *cpu)
 {
     struct hv_register_assoc assocs[ARRAY_SIZE(STANDARD_REGISTER_NAMES)];
     int ret;
@@ -401,8 +409,7 @@ static void populate_special_regs(const hv_register_assoc *assocs,
     cpu_set_apic_base(x86cpu->apic_state, assocs[16].value.reg64);
 }
 
-
-int mshv_get_special_regs(CPUState *cpu)
+static int get_special_regs(CPUState *cpu)
 {
     struct hv_register_assoc assocs[ARRAY_SIZE(SPECIAL_REGISTER_NAMES)];
     int ret;
@@ -422,17 +429,17 @@ int mshv_get_special_regs(CPUState *cpu)
     return 0;
 }
 
-int mshv_load_regs(CPUState *cpu)
+int mshv_arch_load_regs(CPUState *cpu)
 {
     int ret;
 
-    ret = mshv_get_standard_regs(cpu);
+    ret = get_standard_regs(cpu);
     if (ret < 0) {
         error_report("Failed to load standard registers");
         return -1;
     }
 
-    ret = mshv_get_special_regs(cpu);
+    ret = get_special_regs(cpu);
     if (ret < 0) {
         error_report("Failed to load special registers");
         return -1;
@@ -1103,16 +1110,16 @@ static int emulate_instruction(CPUState *cpu,
     int ret;
     x86_insn_stream stream = { .bytes = insn_bytes, .len = insn_len };
 
-    ret = mshv_load_regs(cpu);
+    ret = mshv_arch_load_regs(cpu);
     if (ret < 0) {
-        error_report("failed to load registers");
+        error_report("Failed to load registers");
         return -1;
     }
 
     decode_instruction_stream(env, &decode, &stream);
     exec_instruction(env, &decode);
 
-    ret = mshv_store_regs(cpu);
+    ret = mshv_arch_store_regs(cpu);
     if (ret < 0) {
         error_report("failed to store registers");
         return -1;
@@ -1291,25 +1298,6 @@ static int handle_pio_non_str(const CPUState *cpu,
     return 0;
 }
 
-static int fetch_guest_state(CPUState *cpu)
-{
-    int ret;
-
-    ret = mshv_get_standard_regs(cpu);
-    if (ret < 0) {
-        error_report("Failed to get standard registers");
-        return -1;
-    }
-
-    ret = mshv_get_special_regs(cpu);
-    if (ret < 0) {
-        error_report("Failed to get special registers");
-        return -1;
-    }
-
-    return 0;
-}
-
 static int read_memory(const CPUState *cpu, uint64_t initial_gva,
                        uint64_t initial_gpa, uint64_t gva, uint8_t *data,
                        size_t len)
@@ -1429,9 +1417,9 @@ static int handle_pio_str(CPUState *cpu, hv_x64_io_port_intercept_message *info)
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    ret = fetch_guest_state(cpu);
+    ret = mshv_arch_load_regs(cpu);
     if (ret < 0) {
-        error_report("Failed to fetch guest state");
+        error_report("Failed to load registers");
         return -1;
     }
 
@@ -1462,7 +1450,7 @@ static int handle_pio_str(CPUState *cpu, hv_x64_io_port_intercept_message *info)
 
     ret = set_x64_registers(cpu, reg_names, reg_values);
     if (ret < 0) {
-        error_report("Failed to set x64 registers");
+        error_report("Failed to set RIP and RAX registers");
         return -1;
     }
 
-- 
2.34.1




* [RFC 02/32] target/i386/mshv: use generic FPU/xcr0 state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
  2026-03-23 13:57 ` [RFC 01/32] target/i386/mshv: use arch_load/store_reg fns Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 03/32] target/i386/mshv: impl init/load/store_vcpu_state Magnus Kulke
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Instead of using an mshv-specific FPU state representation we switch to
the generic i386 representation of the registers.
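
The generic representation splits some x87 state that the hardware
format packs together, so the conversion is not a plain copy. The
following is a minimal standalone sketch of that mapping, mirroring
the arithmetic used in set_fpu(). FpuSplit is a hypothetical stand-in
for the relevant CPUX86State fields, and the function names are made
up for illustration. The top-of-stack pointer occupies bits 11-13 of
the status word, and the abridged tag uses one bit per register
(1 = valid), whereas the generic state stores 0 for valid and 1 for
empty.

```c
#include <assert.h>
#include <stdint.h>

#define FPUS_TOP_MASK  0x3800 /* bits 11-13 of the x87 status word */
#define FPUS_TOP_SHIFT 11

/* Hypothetical stand-in for the x87 fields of the generic CPU state. */
typedef struct FpuSplit {
    uint16_t fpus;     /* status word without a meaningful TOP field */
    unsigned fpstt;    /* top-of-stack pointer, 0..7 */
    uint8_t fptags[8]; /* 0 = valid, 1 = empty */
} FpuSplit;

/* Merge the separately stored top-of-stack pointer into the status word. */
static uint16_t pack_fp_status(const FpuSplit *s)
{
    assert(s->fpstt < 8);
    return (uint16_t)((s->fpus & ~FPUS_TOP_MASK) |
                      ((s->fpstt & 0x7) << FPUS_TOP_SHIFT));
}

/* Build the abridged tag: bit i is set when register i holds a value. */
static uint8_t pack_fp_tag(const FpuSplit *s)
{
    uint8_t tag = 0;

    for (int i = 0; i < 8; i++) {
        if (s->fptags[i] == 0) {
            tag |= (uint8_t)(1u << i);
        }
    }
    return tag;
}

/* Inverse direction: split packed words back into the generic fields. */
static void unpack_fpu(FpuSplit *s, uint16_t status, uint8_t tag)
{
    s->fpus = status & ~FPUS_TOP_MASK;
    s->fpstt = (status >> FPUS_TOP_SHIFT) & 0x7;
    for (int i = 0; i < 8; i++) {
        s->fptags[i] = (tag & (1u << i)) ? 0 : 1;
    }
}
```

A migration round trip relies on pack and unpack being exact inverses
of each other, which is easy to check in isolation with a sketch like
this.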

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv_int.h   | 15 +-------
 target/i386/mshv/mshv-cpu.c | 76 ++++++++++++++++++++++---------------
 2 files changed, 47 insertions(+), 44 deletions(-)

diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index a142dd241a..e3d1867a77 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -58,19 +58,6 @@ typedef struct MshvMsiControl {
 #define mshv_vcpufd(cpu) (cpu->accel->cpufd)
 
 /* cpu */
-typedef struct MshvFPU {
-    uint8_t fpr[8][16];
-    uint16_t fcw;
-    uint16_t fsw;
-    uint8_t ftwx;
-    uint8_t pad1;
-    uint16_t last_opcode;
-    uint64_t last_ip;
-    uint64_t last_dp;
-    uint8_t xmm[16][16];
-    uint32_t mxcsr;
-    uint32_t pad2;
-} MshvFPU;
 
 typedef enum MshvVmExit {
     MshvVmExitIgnore   = 0,
@@ -81,7 +68,7 @@ typedef enum MshvVmExit {
 void mshv_init_mmio_emu(void);
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
-int mshv_configure_vcpu(const CPUState *cpu, const MshvFPU *fpu, uint64_t xcr0);
+int mshv_configure_vcpu(const CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
 int mshv_arch_load_regs(CPUState *cpu);
 int mshv_arch_store_regs(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 9456e75277..78b218e596 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -108,6 +108,9 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
 };
 
 static int set_special_regs(const CPUState *cpu);
+static int get_generic_regs(CPUState *cpu,
+                            struct hv_register_assoc *assocs,
+                            size_t n_regs);
 
 static int translate_gva(const CPUState *cpu, uint64_t gva, uint64_t *gpa,
                          uint64_t flags)
@@ -717,48 +720,65 @@ static int set_special_regs(const CPUState *cpu)
     return 0;
 }
 
-static int set_fpu(const CPUState *cpu, const struct MshvFPU *regs)
+static int set_fpu(const CPUState *cpu)
 {
     struct hv_register_assoc assocs[ARRAY_SIZE(FPU_REGISTER_NAMES)];
     union hv_register_value *value;
-    size_t fp_i;
     union hv_x64_fp_control_status_register *ctrl_status;
     union hv_x64_xmm_control_status_register *xmm_ctrl_status;
     int ret;
     size_t n_regs = ARRAY_SIZE(FPU_REGISTER_NAMES);
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    size_t i, fp_i;
+    bool valid;
 
     /* first 16 registers are xmm0-xmm15 */
-    for (size_t i = 0; i < 16; i++) {
+    for (i = 0; i < 16; i++) {
         assocs[i].name = FPU_REGISTER_NAMES[i];
         value = &assocs[i].value;
-        memcpy(&value->reg128, &regs->xmm[i], 16);
+        value->reg128.low_part  = env->xmm_regs[i].ZMM_Q(0);
+        value->reg128.high_part = env->xmm_regs[i].ZMM_Q(1);
     }
 
     /* next 8 registers are fp_mmx0-fp_mmx7 */
-    for (size_t i = 16; i < 24; i++) {
-        assocs[i].name = FPU_REGISTER_NAMES[i];
+    for (i = 16; i < 24; i++) {
         fp_i = (i - 16);
+        assocs[i].name = FPU_REGISTER_NAMES[i];
         value = &assocs[i].value;
-        memcpy(&value->reg128, &regs->fpr[fp_i], 16);
+        value->fp.mantissa        = env->fpregs[fp_i].d.low;
+        value->fp.biased_exponent = env->fpregs[fp_i].d.high & 0x7FFF;
+        value->fp.sign            = (env->fpregs[fp_i].d.high >> 15) & 0x1;
+        value->fp.reserved        = 0;
     }
 
     /* last two registers are fp_control_status and xmm_control_status */
     assocs[24].name = FPU_REGISTER_NAMES[24];
     value = &assocs[24].value;
     ctrl_status = &value->fp_control_status;
-    ctrl_status->fp_control = regs->fcw;
-    ctrl_status->fp_status = regs->fsw;
-    ctrl_status->fp_tag = regs->ftwx;
+
+    ctrl_status->fp_control = env->fpuc;
+    /* bits 11,12,13 are the top of stack pointer */
+    ctrl_status->fp_status = (env->fpus & ~0x3800) | ((env->fpstt & 0x7) << 11);
+
+    ctrl_status->fp_tag = 0;
+    for (i = 0; i < 8; i++) {
+        valid = (env->fptags[i] == 0);
+        if (valid) {
+            ctrl_status->fp_tag |= (1u << i);
+        }
+    }
+
     ctrl_status->reserved = 0;
-    ctrl_status->last_fp_op = regs->last_opcode;
-    ctrl_status->last_fp_rip = regs->last_ip;
+    ctrl_status->last_fp_op = env->fpop;
+    ctrl_status->last_fp_rip = env->fpip;
 
     assocs[25].name = FPU_REGISTER_NAMES[25];
     value = &assocs[25].value;
     xmm_ctrl_status = &value->xmm_control_status;
-    xmm_ctrl_status->xmm_status_control = regs->mxcsr;
-    xmm_ctrl_status->xmm_status_control_mask = 0;
-    xmm_ctrl_status->last_fp_rdp = regs->last_dp;
+    xmm_ctrl_status->xmm_status_control = env->mxcsr;
+    xmm_ctrl_status->xmm_status_control_mask = 0x0000ffff;
+    xmm_ctrl_status->last_fp_rdp = env->fpdp;
 
     ret = mshv_set_generic_regs(cpu, assocs, n_regs);
     if (ret < 0) {
@@ -769,12 +789,15 @@ static int set_fpu(const CPUState *cpu, const struct MshvFPU *regs)
     return 0;
 }
 
-static int set_xc_reg(const CPUState *cpu, uint64_t xcr0)
+static int set_xc_reg(const CPUState *cpu)
 {
     int ret;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+
     struct hv_register_assoc assoc = {
         .name = HV_X64_REGISTER_XFEM,
-        .value.reg64 = xcr0,
+        .value.reg64 = env->xcr0,
     };
 
     ret = mshv_set_generic_regs(cpu, &assoc, 1);
@@ -785,8 +808,7 @@ static int set_xc_reg(const CPUState *cpu, uint64_t xcr0)
     return 0;
 }
 
-static int set_cpu_state(const CPUState *cpu, const MshvFPU *fpu_regs,
-                         uint64_t xcr0)
+static int set_cpu_state(const CPUState *cpu)
 {
     int ret;
 
@@ -798,11 +820,11 @@ static int set_cpu_state(const CPUState *cpu, const MshvFPU *fpu_regs,
     if (ret < 0) {
         return ret;
     }
-    ret = set_fpu(cpu, fpu_regs);
+    ret = set_fpu(cpu);
     if (ret < 0) {
         return ret;
     }
-    ret = set_xc_reg(cpu, xcr0);
+    ret = set_xc_reg(cpu);
     if (ret < 0) {
         return ret;
     }
@@ -951,8 +973,7 @@ static int setup_msrs(const CPUState *cpu)
  * CPUX86State *env = &x86cpu->env;
  * X86CPUTopoInfo *topo_info = &env->topo_info;
  */
-int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
-                        uint64_t xcr0)
+int mshv_configure_vcpu(const CPUState *cpu)
 {
     int ret;
     int cpu_fd = mshv_vcpufd(cpu);
@@ -969,7 +990,7 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
         return -1;
     }
 
-    ret = set_cpu_state(cpu, fpu, xcr0);
+    ret = set_cpu_state(cpu);
     if (ret < 0) {
         error_report("failed to set cpu state");
         return -1;
@@ -986,14 +1007,9 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
 
 static int put_regs(const CPUState *cpu)
 {
-    X86CPU *x86cpu = X86_CPU(cpu);
-    CPUX86State *env = &x86cpu->env;
-    MshvFPU fpu = {0};
     int ret;
 
-    memset(&fpu, 0, sizeof(fpu));
-
-    ret = mshv_configure_vcpu(cpu, &fpu, env->xcr0);
+    ret = mshv_configure_vcpu(cpu);
     if (ret < 0) {
         error_report("failed to configure vcpu");
         return ret;
-- 
2.34.1




* [RFC 03/32] target/i386/mshv: impl init/load/store_vcpu_state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
  2026-03-23 13:57 ` [RFC 01/32] target/i386/mshv: use arch_load/store_reg fns Magnus Kulke
  2026-03-23 13:57 ` [RFC 02/32] target/i386/mshv: use generic FPU/xcr0 state Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 04/32] accel/accel-irq: add AccelRouteChange abstraction Magnus Kulke
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

In migration we will handle more than registers, so we rework the
routines that were used to load & store CPU registers from/to the
hypervisor into more explicit init/load/store_vcpu_state() functions
that can be called from the appropriate hooks.

load/store_regs() still exists for the purpose of MMIO emulation, but it
will only address standard and special x86 registers.

Functions to retrieve FPU and XCR0 state from the hypervisor have been
introduced.

MSR and APIC state are covered only as part of init_vcpu(). They
are not yet part of the load/store routines.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c       |  10 +-
 include/system/mshv_int.h   |   5 +-
 target/i386/mshv/mshv-cpu.c | 354 ++++++++++++++++++------------------
 3 files changed, 185 insertions(+), 184 deletions(-)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 7c0eb68a5b..04d248fe1d 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -400,13 +400,13 @@ static int mshv_init_vcpu(CPUState *cpu)
     int ret;
 
     cpu->accel = g_new0(AccelCPUState, 1);
-    mshv_arch_init_vcpu(cpu);
 
     ret = mshv_create_vcpu(vm_fd, vp_index, &cpu->accel->cpufd);
     if (ret < 0) {
         return -1;
     }
 
+    mshv_arch_init_vcpu(cpu);
     cpu->accel->dirty = true;
 
     return 0;
@@ -488,7 +488,7 @@ static int mshv_cpu_exec(CPUState *cpu)
 
     do {
         if (cpu->accel->dirty) {
-            ret = mshv_arch_put_registers(cpu);
+            ret = mshv_arch_store_vcpu_state(cpu);
             if (ret) {
                 error_report("Failed to put registers after init: %s",
                               strerror(-ret));
@@ -610,7 +610,7 @@ static void mshv_start_vcpu_thread(CPUState *cpu)
 static void do_mshv_cpu_synchronize_post_init(CPUState *cpu,
                                               run_on_cpu_data arg)
 {
-    int ret = mshv_arch_put_registers(cpu);
+    int ret = mshv_arch_store_vcpu_state(cpu);
     if (ret < 0) {
         error_report("Failed to put registers after init: %s", strerror(-ret));
         abort();
@@ -626,7 +626,7 @@ static void mshv_cpu_synchronize_post_init(CPUState *cpu)
 
 static void mshv_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    int ret = mshv_arch_put_registers(cpu);
+    int ret = mshv_arch_store_vcpu_state(cpu);
     if (ret) {
         error_report("Failed to put registers after reset: %s",
                      strerror(-ret));
@@ -650,7 +650,7 @@ static void mshv_cpu_synchronize_pre_loadvm(CPUState *cpu)
 static void do_mshv_cpu_synchronize(CPUState *cpu, run_on_cpu_data arg)
 {
     if (!cpu->accel->dirty) {
-        int ret = mshv_arch_load_regs(cpu);
+        int ret = mshv_arch_load_vcpu_state(cpu);
         if (ret < 0) {
             error_report("Failed to load registers for vcpu %d",
                          cpu->cpu_index);
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index e3d1867a77..70631ca6ba 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -70,11 +70,10 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
 int mshv_configure_vcpu(const CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
-int mshv_arch_load_regs(CPUState *cpu);
-int mshv_arch_store_regs(CPUState *cpu);
 int mshv_set_generic_regs(const CPUState *cpu, const hv_register_assoc *assocs,
                           size_t n_regs);
-int mshv_arch_put_registers(const CPUState *cpu);
+int mshv_arch_store_vcpu_state(const CPUState *cpu);
+int mshv_arch_load_vcpu_state(CPUState *cpu);
 void mshv_arch_init_vcpu(CPUState *cpu);
 void mshv_arch_destroy_vcpu(CPUState *cpu);
 void mshv_arch_amend_proc_features(
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 78b218e596..56656ac0b0 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -112,6 +112,92 @@ static int get_generic_regs(CPUState *cpu,
                             struct hv_register_assoc *assocs,
                             size_t n_regs);
 
+static void populate_fpu(const hv_register_assoc *assocs, X86CPU *x86cpu)
+{
+    union hv_register_value value;
+    const union hv_x64_fp_control_status_register *ctrl_status;
+    const union hv_x64_xmm_control_status_register *xmm_ctrl;
+    CPUX86State *env = &x86cpu->env;
+    size_t i, fp_i;
+    bool valid;
+
+    /* first 16 registers are xmm0-xmm15 */
+    for (i = 0; i < 16; i++) {
+        value = assocs[i].value;
+        env->xmm_regs[i].ZMM_Q(0) = value.reg128.low_part;
+        env->xmm_regs[i].ZMM_Q(1) = value.reg128.high_part;
+    }
+
+    /* next 8 registers are fp_mmx0-fp_mmx7 */
+    for (i = 16; i < 24; i++) {
+        fp_i = i - 16;
+        value = assocs[i].value;
+        env->fpregs[fp_i].d.low = value.fp.mantissa;
+        env->fpregs[fp_i].d.high = (value.fp.sign << 15)
+                                 | (value.fp.biased_exponent & 0x7FFF);
+    }
+
+    /* last two registers are fp_control_status and xmm_control_status */
+    ctrl_status = &assocs[24].value.fp_control_status;
+    env->fpuc = ctrl_status->fp_control;
+
+    env->fpus = ctrl_status->fp_status & ~0x3800;
+    /* bits 11,12,13 are the top of stack pointer */
+    env->fpstt = (ctrl_status->fp_status >> 11) & 0x7;
+
+    for (i = 0; i < 8; i++) {
+        valid = ctrl_status->fp_tag & (1 << i);
+        env->fptags[i] = valid ? 0 : 1;
+    }
+
+    env->fpop = ctrl_status->last_fp_op;
+    env->fpip = ctrl_status->last_fp_rip;
+
+    xmm_ctrl = &assocs[25].value.xmm_control_status;
+    env->mxcsr = xmm_ctrl->xmm_status_control;
+    env->fpdp = xmm_ctrl->last_fp_rdp;
+}
+
+static int get_fpu(CPUState *cpu)
+{
+    struct hv_register_assoc assocs[ARRAY_SIZE(FPU_REGISTER_NAMES)];
+    int ret;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    size_t n_regs = ARRAY_SIZE(FPU_REGISTER_NAMES);
+
+    for (size_t i = 0; i < n_regs; i++) {
+        assocs[i].name = FPU_REGISTER_NAMES[i];
+    }
+    ret = get_generic_regs(cpu, assocs, n_regs);
+    if (ret < 0) {
+        error_report("failed to get FPU registers");
+        return -errno;
+    }
+
+    populate_fpu(assocs, x86cpu);
+
+    return 0;
+}
+
+static int get_xc_reg(CPUState *cpu)
+{
+    int ret;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct hv_register_assoc assocs[1];
+
+    assocs[0].name = HV_X64_REGISTER_XFEM;
+
+    ret = get_generic_regs(cpu, assocs, 1);
+    if (ret < 0) {
+        error_report("failed to get xcr0");
+        return -1;
+    }
+    env->xcr0 = assocs[0].value.reg64;
+
+    return 0;
+}
+
 static int translate_gva(const CPUState *cpu, uint64_t gva, uint64_t *gpa,
                          uint64_t flags)
 {
@@ -290,7 +376,7 @@ static int set_standard_regs(const CPUState *cpu)
     return 0;
 }
 
-int mshv_arch_store_regs(CPUState *cpu)
+static int store_regs(CPUState *cpu)
 {
     int ret;
 
@@ -432,20 +518,45 @@ static int get_special_regs(CPUState *cpu)
     return 0;
 }
 
-int mshv_arch_load_regs(CPUState *cpu)
+static int load_regs(CPUState *cpu)
 {
     int ret;
 
     ret = get_standard_regs(cpu);
     if (ret < 0) {
-        error_report("Failed to load standard registers");
-        return -1;
+        return ret;
     }
 
     ret = get_special_regs(cpu);
     if (ret < 0) {
-        error_report("Failed to load special registers");
-        return -1;
+        return ret;
+    }
+
+    return 0;
+}
+
+int mshv_arch_load_vcpu_state(CPUState *cpu)
+{
+    int ret;
+
+    ret = get_standard_regs(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = get_special_regs(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = get_xc_reg(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = get_fpu(cpu);
+    if (ret < 0) {
+        return ret;
     }
 
     return 0;
@@ -617,7 +728,7 @@ static int register_intercept_result_cpuid(const CPUState *cpu,
     return ret;
 }
 
-static int set_cpuid2(const CPUState *cpu)
+static int init_cpuid2(const CPUState *cpu)
 {
     int ret;
     size_t n_entries, cpuid_size;
@@ -808,29 +919,6 @@ static int set_xc_reg(const CPUState *cpu)
     return 0;
 }
 
-static int set_cpu_state(const CPUState *cpu)
-{
-    int ret;
-
-    ret = set_standard_regs(cpu);
-    if (ret < 0) {
-        return ret;
-    }
-    ret = set_special_regs(cpu);
-    if (ret < 0) {
-        return ret;
-    }
-    ret = set_fpu(cpu);
-    if (ret < 0) {
-        return ret;
-    }
-    ret = set_xc_reg(cpu);
-    if (ret < 0) {
-        return ret;
-    }
-    return 0;
-}
-
 static int get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
 {
     int ret;
@@ -844,7 +932,7 @@ static int get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
     return 0;
 }
 
-static int get_lapic(int cpu_fd,
+static int get_lapic(const CPUState *cpu,
                      struct hv_local_interrupt_controller_state *state)
 {
     int ret;
@@ -852,6 +940,7 @@ static int get_lapic(int cpu_fd,
     /* buffer aligned to 4k, as *state requires that */
     void *buffer = qemu_memalign(size, size);
     struct mshv_get_set_vp_state mshv_state = { 0 };
+    int cpu_fd = mshv_vcpufd(cpu);
 
     mshv_state.buf_ptr = (uint64_t) buffer;
     mshv_state.buf_sz = size;
@@ -888,7 +977,7 @@ static int set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
     return 0;
 }
 
-static int set_lapic(int cpu_fd,
+static int set_lapic(const CPUState *cpu,
                      const struct hv_local_interrupt_controller_state *state)
 {
     int ret;
@@ -896,6 +985,7 @@ static int set_lapic(int cpu_fd,
     /* buffer aligned to 4k, as *state requires that */
     void *buffer = qemu_memalign(size, size);
     struct mshv_get_set_vp_state mshv_state = { 0 };
+    int cpu_fd = mshv_vcpufd(cpu);
 
     if (!state) {
         error_report("lapic state is NULL");
@@ -917,13 +1007,13 @@ static int set_lapic(int cpu_fd,
     return 0;
 }
 
-static int set_lint(int cpu_fd)
+static int init_lint(const CPUState *cpu)
 {
     int ret;
     uint32_t *lvt_lint0, *lvt_lint1;
 
     struct hv_local_interrupt_controller_state lapic_state = { 0 };
-    ret = get_lapic(cpu_fd, &lapic_state);
+    ret = get_lapic(cpu, &lapic_state);
     if (ret < 0) {
         return ret;
     }
@@ -936,161 +1026,31 @@ static int set_lint(int cpu_fd)
 
     /* TODO: should we skip setting lapic if the values are the same? */
 
-    return set_lapic(cpu_fd, &lapic_state);
+    return set_lapic(cpu, &lapic_state);
 }
 
-static int setup_msrs(const CPUState *cpu)
-{
-    int ret;
-    uint64_t default_type = MSR_MTRR_ENABLE | MSR_MTRR_MEM_TYPE_WB;
-
-    /* boot msr entries */
-    MshvMsrEntry msrs[9] = {
-        { .index = IA32_MSR_SYSENTER_CS, .data = 0x0, },
-        { .index = IA32_MSR_SYSENTER_ESP, .data = 0x0, },
-        { .index = IA32_MSR_SYSENTER_EIP, .data = 0x0, },
-        { .index = IA32_MSR_STAR, .data = 0x0, },
-        { .index = IA32_MSR_CSTAR, .data = 0x0, },
-        { .index = IA32_MSR_LSTAR, .data = 0x0, },
-        { .index = IA32_MSR_KERNEL_GS_BASE, .data = 0x0, },
-        { .index = IA32_MSR_SFMASK, .data = 0x0, },
-        { .index = IA32_MSR_MTRR_DEF_TYPE, .data = default_type, },
-    };
-
-    ret = mshv_configure_msr(cpu, msrs, 9);
-    if (ret < 0) {
-        error_report("failed to setup msrs");
-        return -1;
-    }
-
-    return 0;
-}
-
-/*
- * TODO: populate topology info:
- *
- * X86CPU *x86cpu = X86_CPU(cpu);
- * CPUX86State *env = &x86cpu->env;
- * X86CPUTopoInfo *topo_info = &env->topo_info;
- */
-int mshv_configure_vcpu(const CPUState *cpu)
+int mshv_arch_store_vcpu_state(const CPUState *cpu)
 {
     int ret;
-    int cpu_fd = mshv_vcpufd(cpu);
-
-    ret = set_cpuid2(cpu);
-    if (ret < 0) {
-        error_report("failed to set cpuid");
-        return -1;
-    }
-
-    ret = setup_msrs(cpu);
-    if (ret < 0) {
-        error_report("failed to setup msrs");
-        return -1;
-    }
-
-    ret = set_cpu_state(cpu);
-    if (ret < 0) {
-        error_report("failed to set cpu state");
-        return -1;
-    }
 
-    ret = set_lint(cpu_fd);
+    ret = set_standard_regs(cpu);
     if (ret < 0) {
-        error_report("failed to set lpic int");
-        return -1;
+        return ret;
     }
 
-    return 0;
-}
-
-static int put_regs(const CPUState *cpu)
-{
-    int ret;
-
-    ret = mshv_configure_vcpu(cpu);
+    ret = set_special_regs(cpu);
     if (ret < 0) {
-        error_report("failed to configure vcpu");
         return ret;
     }
 
-    return 0;
-}
-
-struct MsrPair {
-    uint32_t index;
-    uint64_t value;
-};
-
-static int put_msrs(const CPUState *cpu)
-{
-    int ret = 0;
-    X86CPU *x86cpu = X86_CPU(cpu);
-    CPUX86State *env = &x86cpu->env;
-    MshvMsrEntries *msrs = g_malloc0(sizeof(MshvMsrEntries));
-
-    struct MsrPair pairs[] = {
-        { MSR_IA32_SYSENTER_CS,    env->sysenter_cs },
-        { MSR_IA32_SYSENTER_ESP,   env->sysenter_esp },
-        { MSR_IA32_SYSENTER_EIP,   env->sysenter_eip },
-        { MSR_EFER,                env->efer },
-        { MSR_PAT,                 env->pat },
-        { MSR_STAR,                env->star },
-        { MSR_CSTAR,               env->cstar },
-        { MSR_LSTAR,               env->lstar },
-        { MSR_KERNELGSBASE,        env->kernelgsbase },
-        { MSR_FMASK,               env->fmask },
-        { MSR_MTRRdefType,         env->mtrr_deftype },
-        { MSR_VM_HSAVE_PA,         env->vm_hsave },
-        { MSR_SMI_COUNT,           env->msr_smi_count },
-        { MSR_IA32_PKRS,           env->pkrs },
-        { MSR_IA32_BNDCFGS,        env->msr_bndcfgs },
-        { MSR_IA32_XSS,            env->xss },
-        { MSR_IA32_UMWAIT_CONTROL, env->umwait },
-        { MSR_IA32_TSX_CTRL,       env->tsx_ctrl },
-        { MSR_AMD64_TSC_RATIO,     env->amd_tsc_scale_msr },
-        { MSR_TSC_AUX,             env->tsc_aux },
-        { MSR_TSC_ADJUST,          env->tsc_adjust },
-        { MSR_IA32_SMBASE,         env->smbase },
-        { MSR_IA32_SPEC_CTRL,      env->spec_ctrl },
-        { MSR_VIRT_SSBD,           env->virt_ssbd },
-    };
-
-    if (ARRAY_SIZE(pairs) > MSHV_MSR_ENTRIES_COUNT) {
-        error_report("MSR entries exceed maximum size");
-        g_free(msrs);
-        return -1;
-    }
-
-    for (size_t i = 0; i < ARRAY_SIZE(pairs); i++) {
-        MshvMsrEntry *entry = &msrs->entries[i];
-        entry->index = pairs[i].index;
-        entry->reserved = 0;
-        entry->data = pairs[i].value;
-        msrs->nmsrs++;
-    }
-
-    ret = mshv_configure_msr(cpu, &msrs->entries[0], msrs->nmsrs);
-    g_free(msrs);
-    return ret;
-}
-
-
-int mshv_arch_put_registers(const CPUState *cpu)
-{
-    int ret;
-
-    ret = put_regs(cpu);
+    ret = set_xc_reg(cpu);
     if (ret < 0) {
-        error_report("Failed to put registers");
-        return -1;
+        return ret;
     }
 
-    ret = put_msrs(cpu);
+    ret = set_fpu(cpu);
     if (ret < 0) {
-        error_report("Failed to put msrs");
-        return -1;
+        return ret;
     }
 
     return 0;
@@ -1126,7 +1086,7 @@ static int emulate_instruction(CPUState *cpu,
     int ret;
     x86_insn_stream stream = { .bytes = insn_bytes, .len = insn_len };
 
-    ret = mshv_arch_load_regs(cpu);
+    ret = load_regs(cpu);
     if (ret < 0) {
         error_report("Failed to load registers");
         return -1;
@@ -1135,7 +1095,7 @@ static int emulate_instruction(CPUState *cpu,
     decode_instruction_stream(env, &decode, &stream);
     exec_instruction(env, &decode);
 
-    ret = mshv_arch_store_regs(cpu);
+    ret = store_regs(cpu);
     if (ret < 0) {
         error_report("failed to store registers");
         return -1;
@@ -1433,7 +1393,7 @@ static int handle_pio_str(CPUState *cpu, hv_x64_io_port_intercept_message *info)
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    ret = mshv_arch_load_regs(cpu);
+    ret = load_regs(cpu);
     if (ret < 0) {
         error_report("Failed to load registers");
         return -1;
@@ -1579,6 +1539,33 @@ void mshv_init_mmio_emu(void)
     init_emu(&mshv_x86_emul_ops);
 }
 
+static int init_msrs(const CPUState *cpu)
+{
+    int ret;
+    uint64_t d_t = MSR_MTRR_ENABLE | MSR_MTRR_MEM_TYPE_WB;
+
+    const struct hv_register_assoc assocs[] = {
+        { .name = HV_X64_REGISTER_SYSENTER_CS,       .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_SYSENTER_ESP,      .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_SYSENTER_EIP,      .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_STAR,              .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_CSTAR,             .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_LSTAR,             .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_KERNEL_GS_BASE,    .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_SFMASK,            .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_MSR_MTRR_DEF_TYPE, .value.reg64 = d_t },
+    };
+    QEMU_BUILD_BUG_ON(ARRAY_SIZE(assocs) > MSHV_MSR_ENTRIES_COUNT);
+
+    ret = mshv_set_generic_regs(cpu, assocs, ARRAY_SIZE(assocs));
+    if (ret < 0) {
+        error_report("failed to put msrs");
+        return -1;
+    }
+
+    return 0;
+}
+
 void mshv_arch_init_vcpu(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -1586,6 +1573,7 @@ void mshv_arch_init_vcpu(CPUState *cpu)
     AccelCPUState *state = cpu->accel;
     size_t page = HV_HYP_PAGE_SIZE;
     void *mem = qemu_memalign(page, 2 * page);
+    int ret;
 
     /* sanity check, to make sure we don't overflow the page */
     QEMU_BUILD_BUG_ON((MAX_REGISTER_COUNT
@@ -1598,6 +1586,20 @@ void mshv_arch_init_vcpu(CPUState *cpu)
     state->hvcall_args.output_page = (uint8_t *)mem + page;
 
     env->emu_mmio_buf = g_new(char, 4096);
+
+    /*
+     * TODO: populate topology info:
+     * X86CPUTopoInfo *topo_info = &env->topo_info;
+     */
+
+    ret = init_cpuid2(cpu);
+    assert(ret == 0);
+
+    ret = init_msrs(cpu);
+    assert(ret == 0);
+
+    ret = init_lint(cpu);
+    assert(ret == 0);
 }
 
 void mshv_arch_destroy_vcpu(CPUState *cpu)
-- 
2.34.1




* [RFC 04/32] accel/accel-irq: add AccelRouteChange abstraction
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (2 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 03/32] target/i386/mshv: impl init/load/store_vcpu_state Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 05/32] accel/accel-irq: add generic begin_route_changes Magnus Kulke
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

The accelerated irqchip routines use a change record to batch updates
when programming routes.

Currently this mechanism is coupled to the KVM accelerator. This change
introduces an abstraction that replaces KVMRouteChange and keeps a
pointer to an abstract AccelState instead of the concrete type,
converting the state where necessary.

This is done to further align the irqchip programming in the MSHV
accelerator with the existing KVM code in QEMU. Subsequent commits will
introduce AccelRouteChange to the MSHV accelerator code.
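
For illustration only, the pattern described above can be sketched in
standalone C. All Demo* and demo_* names are hypothetical stand-ins and
not QEMU types: the series itself relies on the ACCEL() and KVM_STATE()
cast macros, whereas this sketch uses a simple tag check to play the
role of the checked downcast.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the abstract AccelState. */
typedef struct DemoAccelState {
    int kind;                   /* tag used by the checked downcast */
} DemoAccelState;

enum { DEMO_ACCEL_KVM = 1, DEMO_ACCEL_MSHV = 2 };

/* Hypothetical concrete state; the abstract part is the first member. */
typedef struct DemoKvmState {
    DemoAccelState parent;
    int pending_routes;
} DemoKvmState;

/* Same shape as AccelRouteChange: abstract pointer plus change counter. */
typedef struct DemoRouteChange {
    DemoAccelState *accel;
    int changes;
} DemoRouteChange;

/* Checked downcast, playing the role of KVM_STATE(c->accel). */
static DemoKvmState *demo_kvm_state(DemoAccelState *as)
{
    assert(as != NULL && as->kind == DEMO_ACCEL_KVM);
    return (DemoKvmState *)as;
}

static DemoRouteChange demo_begin_route_changes(DemoAccelState *as)
{
    return (DemoRouteChange) { .accel = as, .changes = 0 };
}

/*
 * Accelerator-specific code converts the abstract pointer back to its
 * concrete type only where it needs the concrete fields.
 * Returns the index of the newly recorded route.
 */
static int demo_kvm_add_route(DemoRouteChange *c)
{
    DemoKvmState *s = demo_kvm_state(c->accel);
    int idx = s->pending_routes++;

    c->changes++;
    return idx;
}
```

Callers such as device models only ever see DemoRouteChange, so they do
not need to know which accelerator backs the record.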

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/accel-irq.c           |  4 ++--
 accel/kvm/kvm-all.c         |  6 +++---
 accel/stubs/kvm-stub.c      |  2 +-
 hw/misc/ivshmem-pci.c       |  2 +-
 hw/vfio/pci.c               |  2 +-
 hw/virtio/virtio-pci.c      |  3 +--
 include/accel/accel-route.h | 17 +++++++++++++++++
 include/system/accel-irq.h  |  5 +++--
 include/system/kvm.h        | 21 ++++++++++-----------
 include/system/mshv.h       |  1 +
 target/i386/kvm/kvm.c       |  2 +-
 11 files changed, 41 insertions(+), 24 deletions(-)
 create mode 100644 include/accel/accel-route.h

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
index 7f864e35c4..0aa04c033d 100644
--- a/accel/accel-irq.c
+++ b/accel/accel-irq.c
@@ -16,7 +16,7 @@
 #include "system/mshv.h"
 #include "system/accel-irq.h"
 
-int accel_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev)
+int accel_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
 #ifdef CONFIG_MSHV_IS_POSSIBLE
     if (mshv_msi_via_irqfd_enabled()) {
@@ -42,7 +42,7 @@ int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev)
     return -ENOSYS;
 }
 
-void accel_irqchip_commit_route_changes(KVMRouteChange *c)
+void accel_irqchip_commit_route_changes(AccelRouteChange *c)
 {
 #ifdef CONFIG_MSHV_IS_POSSIBLE
     if (mshv_msi_via_irqfd_enabled()) {
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 774499d34f..0979545744 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2359,11 +2359,11 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
     return kvm_vm_ioctl(s, KVM_SIGNAL_MSI, &msi);
 }
 
-int kvm_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
     struct kvm_irq_routing_entry kroute = {};
     int virq;
-    KVMState *s = c->s;
+    KVMState *s = KVM_STATE(c->accel);
     MSIMessage msg = {0, 0};
 
     if (pci_available && dev) {
@@ -2506,7 +2506,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
     abort();
 }
 
-int kvm_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
     return -ENOSYS;
 }
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index c4617caac6..32b4b07403 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -44,7 +44,7 @@ int kvm_on_sigbus(int code, void *addr)
     return 1;
 }
 
-int kvm_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
     return -ENOSYS;
 }
diff --git a/hw/misc/ivshmem-pci.c b/hw/misc/ivshmem-pci.c
index c987eebb98..aa8f271755 100644
--- a/hw/misc/ivshmem-pci.c
+++ b/hw/misc/ivshmem-pci.c
@@ -424,7 +424,7 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
                                      Error **errp)
 {
     PCIDevice *pdev = PCI_DEVICE(s);
-    KVMRouteChange c;
+    AccelRouteChange c;
     int ret;
 
     IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 94c174a773..e48f4add4e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -51,7 +51,7 @@
 #include "vfio-helpers.h"
 
 /* Protected by BQL */
-static KVMRouteChange vfio_route_change;
+static AccelRouteChange vfio_route_change;
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index bcab2d18b8..5010572784 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -869,7 +869,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
     int ret;
 
     if (irqfd->users == 0) {
-        KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+        AccelRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
         ret = accel_irqchip_add_msi_route(&c, vector, &proxy->pci_dev);
         if (ret < 0) {
             return ret;
@@ -2695,4 +2695,3 @@ static void virtio_pci_register_types(void)
 }
 
 type_init(virtio_pci_register_types)
-
diff --git a/include/accel/accel-route.h b/include/accel/accel-route.h
new file mode 100644
index 0000000000..07fac27e2a
--- /dev/null
+++ b/include/accel/accel-route.h
@@ -0,0 +1,17 @@
+/*
+ * Accelerator MSI route change tracking
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef ACCEL_ROUTE_H
+#define ACCEL_ROUTE_H
+
+#include "qemu/accel.h"
+
+typedef struct AccelRouteChange {
+    AccelState *accel;
+    int changes;
+} AccelRouteChange;
+
+#endif /* ACCEL_ROUTE_H */
diff --git a/include/system/accel-irq.h b/include/system/accel-irq.h
index a2caa06f54..a148920711 100644
--- a/include/system/accel-irq.h
+++ b/include/system/accel-irq.h
@@ -25,9 +25,10 @@ static inline bool accel_irqchip_is_split(void)
     return mshv_msi_via_irqfd_enabled() || kvm_irqchip_is_split();
 }
 
-int accel_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev);
+int accel_irqchip_add_msi_route(AccelRouteChange *c, int vector,
+                                PCIDevice *dev);
 int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev);
-void accel_irqchip_commit_route_changes(KVMRouteChange *c);
+void accel_irqchip_commit_route_changes(AccelRouteChange *c);
 void accel_irqchip_commit_routes(void);
 void accel_irqchip_release_virq(int virq);
 int accel_irqchip_add_irqfd_notifier_gsi(EventNotifier *n, EventNotifier *rn,
diff --git a/include/system/kvm.h b/include/system/kvm.h
index 5fa33eddda..ccf90b8341 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -18,6 +18,7 @@
 
 #include "exec/memattrs.h"
 #include "qemu/accel.h"
+#include "accel/accel-route.h"
 #include "qom/object.h"
 
 #ifdef COMPILING_PER_TARGET
@@ -183,11 +184,6 @@ extern KVMState *kvm_state;
 typedef struct Notifier Notifier;
 typedef struct NotifierWithReturn NotifierWithReturn;
 
-typedef struct KVMRouteChange {
-     KVMState *s;
-     int changes;
-} KVMRouteChange;
-
 /* external API */
 
 unsigned int kvm_get_max_memslots(void);
@@ -466,7 +462,7 @@ void kvm_init_cpu_signals(CPUState *cpu);
 
 /**
  * kvm_irqchip_add_msi_route - Add MSI route for specific vector
- * @c:      KVMRouteChange instance.
+ * @c:      AccelRouteChange instance.
  * @vector: which vector to add. This can be either MSI/MSIX
  *          vector. The function will automatically detect whether
  *          MSI/MSIX is enabled, and fetch corresponding MSI
@@ -475,20 +471,23 @@ void kvm_init_cpu_signals(CPUState *cpu);
  *          as @NULL, an empty MSI message will be inited.
  * @return: virq (>=0) when success, errno (<0) when failed.
  */
-int kvm_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev);
+int kvm_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
                                  PCIDevice *dev);
 void kvm_irqchip_commit_routes(KVMState *s);
 
-static inline KVMRouteChange kvm_irqchip_begin_route_changes(KVMState *s)
+static inline AccelRouteChange kvm_irqchip_begin_route_changes(KVMState *s)
 {
-    return (KVMRouteChange) { .s = s, .changes = 0 };
+    return (AccelRouteChange) {
+        .accel = ACCEL(s),
+        .changes = 0,
+    };
 }
 
-static inline void kvm_irqchip_commit_route_changes(KVMRouteChange *c)
+static inline void kvm_irqchip_commit_route_changes(AccelRouteChange *c)
 {
     if (c->changes) {
-        kvm_irqchip_commit_routes(c->s);
+        kvm_irqchip_commit_routes(KVM_STATE(c->accel));
         c->changes = 0;
     }
 }
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 75286baf16..1e96b3a606 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -21,6 +21,7 @@
 #include "qapi/qapi-types-common.h"
 #include "system/memory.h"
 #include "accel/accel-ops.h"
+#include "accel/accel-route.h"
 
 #ifdef COMPILING_PER_TARGET
 #ifdef CONFIG_MSHV
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a29f757c16..9cc41758d7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6680,7 +6680,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
     kvm_gsi_routing_allowed = true;
 
     if (kvm_irqchip_is_split()) {
-        KVMRouteChange c = kvm_irqchip_begin_route_changes(s);
+        AccelRouteChange c = kvm_irqchip_begin_route_changes(s);
         int i;
 
         /* If the ioapic is in QEMU and the lapics are in KVM, reserve
-- 
2.34.1




* [RFC 05/32] accel/accel-irq: add generic begin_route_changes
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (3 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 04/32] accel/accel-irq: add AccelRouteChange abstraction Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 06/32] accel/accel-irq: add generic commit_route_changes Magnus Kulke
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

A generic accel_irqchip_begin_route_changes() fn has been introduced for
usage in the MSHV accelerator. It replaces the respective kvm_ fn.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/accel-irq.c          | 21 +++++++++++++++++++++
 hw/misc/ivshmem-pci.c      |  4 ++--
 hw/vfio/pci.c              |  5 +++--
 hw/virtio/virtio-pci.c     |  2 +-
 include/system/accel-irq.h |  1 +
 include/system/kvm.h       |  8 --------
 target/i386/kvm/kvm.c      |  3 ++-
 7 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
index 0aa04c033d..3815f6727c 100644
--- a/accel/accel-irq.c
+++ b/accel/accel-irq.c
@@ -10,6 +10,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "hw/pci/msi.h"
 
 #include "system/kvm.h"
@@ -104,3 +105,23 @@ int accel_irqchip_remove_irqfd_notifier_gsi(EventNotifier *n, int virq)
     }
     return -ENOSYS;
 }
+
+inline AccelRouteChange accel_irqchip_begin_route_changes(void)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        return (AccelRouteChange) {
+            .accel = ACCEL(mshv_state),
+            .changes = 0,
+        };
+    }
+#endif
+    if (kvm_enabled()) {
+        return (AccelRouteChange) {
+            .accel = ACCEL(kvm_state),
+            .changes = 0,
+        };
+    }
+    error_report("can't initiate route change, no accel irqchip available");
+    abort();
+}
diff --git a/hw/misc/ivshmem-pci.c b/hw/misc/ivshmem-pci.c
index aa8f271755..fa23562886 100644
--- a/hw/misc/ivshmem-pci.c
+++ b/hw/misc/ivshmem-pci.c
@@ -26,7 +26,7 @@
 #include "hw/core/qdev-properties-system.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
-#include "system/kvm.h"
+#include "system/accel-irq.h"
 #include "migration/blocker.h"
 #include "migration/vmstate.h"
 #include "qemu/error-report.h"
@@ -430,7 +430,7 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
     IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
     assert(!s->msi_vectors[vector].pdev);
 
-    c = kvm_irqchip_begin_route_changes(kvm_state);
+    c = accel_irqchip_begin_route_changes();
     ret = kvm_irqchip_add_msi_route(&c, vector, pdev);
     if (ret < 0) {
         error_setg(errp, "kvm_irqchip_add_msi_route failed");
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e48f4add4e..6768523147 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -39,6 +39,7 @@
 #include "qemu/module.h"
 #include "qemu/range.h"
 #include "qemu/units.h"
+#include "system/accel-irq.h"
 #include "system/kvm.h"
 #include "system/runstate.h"
 #include "pci.h"
@@ -692,7 +693,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             if (vdev->defer_kvm_irq_routing) {
                 vfio_pci_add_kvm_msi_virq(vdev, vector, nr, true);
             } else {
-                vfio_route_change = kvm_irqchip_begin_route_changes(kvm_state);
+                vfio_route_change = accel_irqchip_begin_route_changes();
                 vfio_pci_add_kvm_msi_virq(vdev, vector, nr, true);
                 kvm_irqchip_commit_route_changes(&vfio_route_change);
                 vfio_connect_kvm_msi_virq(vector, nr);
@@ -793,7 +794,7 @@ void vfio_pci_prepare_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
 {
     assert(!vdev->defer_kvm_irq_routing);
     vdev->defer_kvm_irq_routing = true;
-    vfio_route_change = kvm_irqchip_begin_route_changes(kvm_state);
+    vfio_route_change = accel_irqchip_begin_route_changes();
 }
 
 void vfio_pci_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 5010572784..faa4a41cca 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -869,7 +869,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
     int ret;
 
     if (irqfd->users == 0) {
-        AccelRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
+        AccelRouteChange c = accel_irqchip_begin_route_changes();
         ret = accel_irqchip_add_msi_route(&c, vector, &proxy->pci_dev);
         if (ret < 0) {
             return ret;
diff --git a/include/system/accel-irq.h b/include/system/accel-irq.h
index a148920711..fc94c54264 100644
--- a/include/system/accel-irq.h
+++ b/include/system/accel-irq.h
@@ -25,6 +25,7 @@ static inline bool accel_irqchip_is_split(void)
     return mshv_msi_via_irqfd_enabled() || kvm_irqchip_is_split();
 }
 
+AccelRouteChange accel_irqchip_begin_route_changes(void);
 int accel_irqchip_add_msi_route(AccelRouteChange *c, int vector,
                                 PCIDevice *dev);
 int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev);
diff --git a/include/system/kvm.h b/include/system/kvm.h
index ccf90b8341..fec24d2135 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -476,14 +476,6 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
                                  PCIDevice *dev);
 void kvm_irqchip_commit_routes(KVMState *s);
 
-static inline AccelRouteChange kvm_irqchip_begin_route_changes(KVMState *s)
-{
-    return (AccelRouteChange) {
-        .accel = ACCEL(s),
-        .changes = 0,
-    };
-}
-
 static inline void kvm_irqchip_commit_route_changes(AccelRouteChange *c)
 {
     if (c->changes) {
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 9cc41758d7..dc7c495f9c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -32,6 +32,7 @@
 #include "vmsr_energy.h"
 #include "system/system.h"
 #include "system/hw_accel.h"
+#include "system/accel-irq.h"
 #include "system/kvm_int.h"
 #include "system/runstate.h"
 #include "system/ramblock.h"
@@ -6680,7 +6681,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
     kvm_gsi_routing_allowed = true;
 
     if (kvm_irqchip_is_split()) {
-        AccelRouteChange c = kvm_irqchip_begin_route_changes(s);
+        AccelRouteChange c = accel_irqchip_begin_route_changes();
         int i;
 
         /* If the ioapic is in QEMU and the lapics are in KVM, reserve
-- 
2.34.1




* [RFC 06/32] accel/accel-irq: add generic commit_route_changes
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (4 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 05/32] accel/accel-irq: add generic begin_route_changes Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 07/32] accel/mshv: add irq_routes to state Magnus Kulke
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

A generic accel_irqchip_commit_route_changes() fn has been introduced for
usage in the MSHV accelerator. The respective kvm_ fn can be removed
since we handle the commit op in a generic way.
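
As a rough sketch of the commit semantics (not the QEMU code itself):
the batch is flushed only if at least one change was recorded, and the
counter is reset afterwards so a second commit is a no-op. DemoRouteBatch,
demo_commit_routes() and demo_commit_calls are hypothetical names; the
counter merely simulates the cost of one routing table update.

```c
#include <assert.h>

/* Hypothetical, reduced version of the change record. */
typedef struct DemoRouteBatch {
    int changes;
} DemoRouteBatch;

/* Counts simulated routing table updates sent to the hypervisor. */
static int demo_commit_calls;

/* Stand-in for the accelerator-independent commit_routes call. */
static void demo_commit_routes(void)
{
    demo_commit_calls++;
}

/* Flush the batch only when something was recorded, then reset it. */
static void demo_commit_route_changes(DemoRouteBatch *c)
{
    if (c->changes) {
        demo_commit_routes();
        c->changes = 0;
    }
}
```

With this shape, adding N routes and committing once costs a single
table update rather than N.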

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/accel-irq.c     | 10 +++-------
 hw/misc/ivshmem-pci.c |  2 +-
 hw/vfio/pci.c         |  4 ++--
 include/system/kvm.h  |  8 --------
 target/i386/kvm/kvm.c |  2 +-
 5 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
index 3815f6727c..7e71b52555 100644
--- a/accel/accel-irq.c
+++ b/accel/accel-irq.c
@@ -45,13 +45,9 @@ int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev)
 
 void accel_irqchip_commit_route_changes(AccelRouteChange *c)
 {
-#ifdef CONFIG_MSHV_IS_POSSIBLE
-    if (mshv_msi_via_irqfd_enabled()) {
-        mshv_irqchip_commit_routes();
-    }
-#endif
-    if (kvm_enabled()) {
-        kvm_irqchip_commit_route_changes(c);
+    if (c->changes) {
+        accel_irqchip_commit_routes();
+        c->changes = 0;
     }
 }
 
diff --git a/hw/misc/ivshmem-pci.c b/hw/misc/ivshmem-pci.c
index fa23562886..536475e9de 100644
--- a/hw/misc/ivshmem-pci.c
+++ b/hw/misc/ivshmem-pci.c
@@ -436,7 +436,7 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
         error_setg(errp, "kvm_irqchip_add_msi_route failed");
         return;
     }
-    kvm_irqchip_commit_route_changes(&c);
+    accel_irqchip_commit_route_changes(&c);
 
     s->msi_vectors[vector].virq = ret;
     s->msi_vectors[vector].pdev = pdev;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6768523147..d1807888d5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -695,7 +695,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             } else {
                 vfio_route_change = accel_irqchip_begin_route_changes();
                 vfio_pci_add_kvm_msi_virq(vdev, vector, nr, true);
-                kvm_irqchip_commit_route_changes(&vfio_route_change);
+                accel_irqchip_commit_route_changes(&vfio_route_change);
                 vfio_connect_kvm_msi_virq(vector, nr);
             }
         }
@@ -804,7 +804,7 @@ void vfio_pci_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
     assert(vdev->defer_kvm_irq_routing);
     vdev->defer_kvm_irq_routing = false;
 
-    kvm_irqchip_commit_route_changes(&vfio_route_change);
+    accel_irqchip_commit_route_changes(&vfio_route_change);
 
     for (i = 0; i < vdev->nr_vectors; i++) {
         vfio_connect_kvm_msi_virq(&vdev->msi_vectors[i], i);
diff --git a/include/system/kvm.h b/include/system/kvm.h
index fec24d2135..cdd1856ac5 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -476,14 +476,6 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
                                  PCIDevice *dev);
 void kvm_irqchip_commit_routes(KVMState *s);
 
-static inline void kvm_irqchip_commit_route_changes(AccelRouteChange *c)
-{
-    if (c->changes) {
-        kvm_irqchip_commit_routes(KVM_STATE(c->accel));
-        c->changes = 0;
-    }
-}
-
 int kvm_irqchip_get_virq(KVMState *s);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index dc7c495f9c..b132e986f9 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -6692,7 +6692,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
                 exit(1);
             }
         }
-        kvm_irqchip_commit_route_changes(&c);
+        accel_irqchip_commit_route_changes(&c);
     }
 }
 
-- 
2.34.1




* [RFC 07/32] accel/mshv: add irq_routes to state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (5 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 06/32] accel/accel-irq: add generic commit_route_changes Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 08/32] accel/mshv: update s->irq_routes in add_msi_route Magnus Kulke
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This change adds fields related to irq routing to the MSHV state, following
similar fields in the KVM implementation.

So far the fields are only initialized. They will be used in subsequent
commits for bookkeeping purposes and storing uncommitted interrupt routes.

The TYPE_MSHV_ACCEL defines have been moved to the header.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/irq.c          | 10 ++++++++++
 accel/mshv/mshv-all.c     |  6 ++----
 include/system/mshv.h     |  7 +++++++
 include/system/mshv_int.h |  5 +++++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index 3c238c33c3..82f2022c7c 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -396,3 +396,13 @@ int mshv_reserve_ioapic_msi_routes(int vm_fd)
 
     return 0;
 }
+
+void mshv_init_irq_routing(MshvState *s)
+{
+    int gsi_count = MSHV_MAX_MSI_ROUTES;
+
+    s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
+    s->nr_allocated_irq_routes = 0;
+    s->gsi_count = gsi_count;
+    s->used_gsi_bitmap = bitmap_new(gsi_count);
+}
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 04d248fe1d..8acb080db1 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -43,10 +43,6 @@
 #include <err.h>
 #include <sys/ioctl.h>
 
-#define TYPE_MSHV_ACCEL ACCEL_CLASS_NAME("mshv")
-
-DECLARE_INSTANCE_CHECKER(MshvState, MSHV_STATE, TYPE_MSHV_ACCEL)
-
 bool mshv_allowed;
 
 MshvState *mshv_state;
@@ -457,6 +453,8 @@ static int mshv_init(AccelState *as, MachineState *ms)
 
     mshv_state = s;
 
+    mshv_init_irq_routing(s);
+
     register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
                                   0, "mshv-memory");
     memory_listener_register(&mshv_io_listener, &address_space_io);
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 1e96b3a606..0d1745315b 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -45,7 +45,13 @@ extern bool mshv_allowed;
 #define mshv_msi_via_irqfd_enabled() mshv_enabled()
 #endif
 
+#define TYPE_MSHV_ACCEL ACCEL_CLASS_NAME("mshv")
+
 typedef struct MshvState MshvState;
+
+DECLARE_INSTANCE_CHECKER(MshvState, MSHV_STATE,
+                         TYPE_MSHV_ACCEL)
+
 extern MshvState *mshv_state;
 
 /* interrupt */
@@ -60,5 +66,6 @@ void mshv_irqchip_release_virq(int virq);
 int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *n,
                                         const EventNotifier *rn, int virq);
 int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *n, int virq);
+void mshv_init_irq_routing(MshvState *s);
 
 #endif
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 70631ca6ba..56fda76a9c 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -48,6 +48,11 @@ struct MshvState {
     int nr_as;
     MshvAddressSpace *as;
     int fd;
+    /* irqchip routing */
+    struct mshv_user_irq_table *irq_routes;
+    int nr_allocated_irq_routes;
+    unsigned long *used_gsi_bitmap;
+    unsigned int gsi_count;
 };
 
 typedef struct MshvMsiControl {
-- 
2.34.1




* [RFC 08/32] accel/mshv: update s->irq_routes in add_msi_route
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (6 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 07/32] accel/mshv: add irq_routes to state Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 09/32] accel/mshv: update s->irq_routes in update_msi_route Magnus Kulke
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

The irq_routes field of the state is populated with native mshv irq
route entries. The allocation logic is modelled after the KVM
implementation: we always allocate a minimum of 64 entries and use
a bitmap to find/set/clear GSIs.

The old implementation, add_msi_routing(), will be removed in a
subsequent commit.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
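For illustration only, the allocation scheme described in the commit
message can be sketched in standalone C. This is a minimal sketch under
assumptions, not the code in this patch: all sketch_* names are
hypothetical, the entry layout is assumed to have the four fields used
in the diff, and plain realloc() plus a hand-rolled bitmap stand in for
g_realloc() and QEMU's bitmap helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ROUTES 4096  /* mirrors MSHV_MAX_MSI_ROUTES */
#define SKETCH_MIN_ALLOC  64    /* mirrors MSHV_MIN_ALLOCATED_MSI_ROUTES */
#define BITS_PER_WORD     (sizeof(unsigned long) * CHAR_BIT)

/* Assumed entry layout (hypothetical stand-in for mshv_user_irq_entry). */
struct sketch_entry {
    uint32_t gsi, address_lo, address_hi, data;
};

/* Hypothetical table: a growable array plus a "GSI in use" bitmap. */
struct sketch_table {
    uint32_t nr;            /* entries in use */
    uint32_t nr_allocated;  /* entries allocated */
    struct sketch_entry *entries;
    unsigned long used[SKETCH_MAX_ROUTES / BITS_PER_WORD];
};

/* Return the lowest GSI whose bit is clear, or -1 if all are taken. */
static int sketch_find_free_gsi(const struct sketch_table *t)
{
    for (int gsi = 0; gsi < SKETCH_MAX_ROUTES; gsi++) {
        unsigned long mask = 1UL << (gsi % BITS_PER_WORD);

        if (!(t->used[gsi / BITS_PER_WORD] & mask)) {
            return gsi;
        }
    }
    return -1;
}

/* Append an entry, doubling capacity (at least 64) when the array is full. */
static void sketch_add_entry(struct sketch_table *t, struct sketch_entry e)
{
    if (t->nr == t->nr_allocated) {
        uint32_t n = t->nr_allocated * 2;

        if (n < SKETCH_MIN_ALLOC) {
            n = SKETCH_MIN_ALLOC;
        }
        t->entries = realloc(t->entries, n * sizeof(*t->entries));
        assert(t->entries != NULL);
        t->nr_allocated = n;
    }
    t->entries[t->nr++] = e;
    /* The GSI is marked used only once its entry is stored. */
    t->used[e.gsi / BITS_PER_WORD] |= 1UL << (e.gsi % BITS_PER_WORD);
}
```

Starting from a zeroed table, the first add allocates 64 slots, the
65th add grows the array to 128, and the lookup always returns the
lowest GSI that is still free.
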
 accel/accel-irq.c       |  2 +-
 accel/mshv/irq.c        | 87 +++++++++++++++++++++++++++++++++++++----
 accel/stubs/mshv-stub.c |  2 +-
 include/system/mshv.h   |  3 +-
 4 files changed, 84 insertions(+), 10 deletions(-)

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
index 7e71b52555..5a97a345b2 100644
--- a/accel/accel-irq.c
+++ b/accel/accel-irq.c
@@ -21,7 +21,7 @@ int accel_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
 #ifdef CONFIG_MSHV_IS_POSSIBLE
     if (mshv_msi_via_irqfd_enabled()) {
-        return mshv_irqchip_add_msi_route(vector, dev);
+        return mshv_irqchip_add_msi_route(c, vector, dev);
     }
 #endif
     if (kvm_enabled()) {
diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index 82f2022c7c..9d6bdde27a 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -278,18 +278,91 @@ static int irqchip_update_irqfd_notifier_gsi(const EventNotifier *event,
     return register_irqfd(vm_fd, fd, virq);
 }
 
+static int irqchip_allocate_gsi(MshvState *s, int *gsi)
+{
+    int next_gsi;
+
+    /* Return the lowest unused GSI in the bitmap */
+    next_gsi = find_first_zero_bit(s->used_gsi_bitmap, s->gsi_count);
+    if (next_gsi >= s->gsi_count) {
+        return -ENOSPC;
+    }
+
+    *gsi = next_gsi;
+
+    return 0;
+}
+
+static void irqchip_release_gsi(MshvState *s, int gsi)
+{
+    clear_bit(gsi, s->used_gsi_bitmap);
+}
+
+static void add_routing_entry(MshvState *s, struct mshv_user_irq_entry *entry)
+{
+    struct mshv_user_irq_entry *new;
+    int n, size;
+
+    if (s->irq_routes->nr == s->nr_allocated_irq_routes) {
+        n = s->nr_allocated_irq_routes * 2;
+        if (n < MSHV_MIN_ALLOCATED_MSI_ROUTES) {
+            n = MSHV_MIN_ALLOCATED_MSI_ROUTES;
+        }
+        size = sizeof(struct mshv_user_irq_table);
+        size += n * sizeof(*new);
+        s->irq_routes = g_realloc(s->irq_routes, size);
+        s->nr_allocated_irq_routes = n;
+    }
+
+    n = s->irq_routes->nr;
+    s->irq_routes->nr++;
+    new = &s->irq_routes->entries[n];
+
+    *new = *entry;
+
+    set_bit(entry->gsi, s->used_gsi_bitmap);
 
-int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev)
+    trace_mshv_add_msi_routing(((uint64_t)entry->address_hi << 32) |
+                               entry->address_lo, entry->data);
+}
+
+int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
-    MSIMessage msg = { 0, 0 };
-    int virq = 0;
+    struct mshv_user_irq_entry entry = { 0 };
+    MSIMessage msg = { 0 };
+    uint32_t data, high_addr, low_addr;
+    int gsi, ret;
+    MshvState *s = MSHV_STATE(c->accel);
+
+    if (!pci_available || !dev) {
+        return 0;
+    }
 
-    if (pci_available && dev) {
-        msg = pci_get_msi_message(dev, vector);
-        virq = add_msi_routing(msg.address, le32_to_cpu(msg.data));
+    msg = pci_get_msi_message(dev, vector);
+
+    ret = irqchip_allocate_gsi(s, &gsi);
+    if (ret < 0) {
+        error_report("Could not allocate GSI for MSI route");
+        return -1;
+    }
+    high_addr = msg.address >> 32;
+    low_addr = msg.address & 0xFFFFFFFF;
+    data = le32_to_cpu(msg.data);
+
+    entry.gsi = gsi;
+    entry.address_hi = high_addr;
+    entry.address_lo = low_addr;
+    entry.data = data;
+
+    if (s->irq_routes->nr < s->gsi_count) {
+        add_routing_entry(s, &entry);
+        c->changes++;
+    } else {
+        irqchip_release_gsi(s, gsi);
+        return -ENOSPC;
     }
 
-    return virq;
+    return gsi;
 }
 
 void mshv_irqchip_release_virq(int virq)
diff --git a/accel/stubs/mshv-stub.c b/accel/stubs/mshv-stub.c
index e499b199d9..998c9e2fc6 100644
--- a/accel/stubs/mshv-stub.c
+++ b/accel/stubs/mshv-stub.c
@@ -14,7 +14,7 @@
 
 bool mshv_allowed;
 
-int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev)
+int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
 {
     return -ENOSYS;
 }
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 0d1745315b..7f60aba308 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -33,6 +33,7 @@
 #endif
 
 #define MSHV_MAX_MSI_ROUTES 4096
+#define MSHV_MIN_ALLOCATED_MSI_ROUTES 64
 
 #define MSHV_PAGE_SHIFT 12
 
@@ -59,7 +60,7 @@ int mshv_request_interrupt(MshvState *mshv_state, uint32_t interrupt_type, uint3
                            uint32_t vp_index, bool logical_destination_mode,
                            bool level_triggered);
 
-int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
+int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev);
 int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
 void mshv_irqchip_commit_routes(void);
 void mshv_irqchip_release_virq(int virq);
-- 
2.34.1




* [RFC 09/32] accel/mshv: update s->irq_routes in update_msi_route
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (7 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 08/32] accel/mshv: update s->irq_routes in add_msi_route Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 10/32] accel/mshv: update s->irq_routes in release_virq Magnus Kulke
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

The state's irq_routes field is updated when a new destination/vector
is requested for an irqchip's GSI.

The old set_msi_routing() fn is redundant and can be removed.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
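For illustration only, the in-place update can be sketched in
standalone C. This is a minimal sketch under assumptions: the sketch_*
names are hypothetical and the entry layout is assumed to have the four
fields used in the diff. It shows the 64-bit MSI address being split
into its low and high halves and the existing entry for a GSI being
overwritten, with -ESRCH returned when no entry for that GSI exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout (hypothetical stand-in for mshv_user_irq_entry). */
struct sketch_entry {
    uint32_t gsi, address_lo, address_hi, data;
};

/* Build an entry from a GSI, a 64-bit MSI address and the MSI data. */
static struct sketch_entry sketch_make_entry(uint32_t gsi, uint64_t addr,
                                             uint32_t data)
{
    struct sketch_entry e = {
        .gsi = gsi,
        .address_lo = (uint32_t)(addr & 0xFFFFFFFF),
        .address_hi = (uint32_t)(addr >> 32),
        .data = data,
    };

    return e;
}

/* Overwrite the entry with a matching GSI; -ESRCH if there is none. */
static int sketch_update_entry(struct sketch_entry *entries, size_t nr,
                               const struct sketch_entry *new_entry)
{
    assert(entries != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++) {
        if (entries[i].gsi == new_entry->gsi) {
            entries[i] = *new_entry;
            return 0;
        }
    }
    return -ESRCH;
}
```

The lookup is linear in the number of routes, which matches the loop in
update_routing_entry() below.
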
 accel/mshv/irq.c | 87 +++++++++++++++++++++---------------------------
 1 file changed, 38 insertions(+), 49 deletions(-)

diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index 9d6bdde27a..b5a047b367 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -36,52 +36,6 @@ void mshv_init_msicontrol(void)
     msi_control->updated = false;
 }
 
-static int set_msi_routing(uint32_t gsi, uint64_t addr, uint32_t data)
-{
-    struct mshv_user_irq_entry *entry;
-    uint32_t high_addr = addr >> 32;
-    uint32_t low_addr = addr & 0xFFFFFFFF;
-    GHashTable *gsi_routes;
-
-    trace_mshv_set_msi_routing(gsi, addr, data);
-
-    if (gsi >= MSHV_MAX_MSI_ROUTES) {
-        error_report("gsi >= MSHV_MAX_MSI_ROUTES");
-        return -1;
-    }
-
-    assert(msi_control);
-
-    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
-        gsi_routes = msi_control->gsi_routes;
-        entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
-
-        if (entry
-            && entry->address_hi == high_addr
-            && entry->address_lo == low_addr
-            && entry->data == data)
-        {
-            /* nothing to update */
-            return 0;
-        }
-
-        /* free old entry */
-        g_free(entry);
-
-        /* create new entry */
-        entry = g_new0(struct mshv_user_irq_entry, 1);
-        entry->gsi = gsi;
-        entry->address_hi = high_addr;
-        entry->address_lo = low_addr;
-        entry->data = data;
-
-        g_hash_table_insert(gsi_routes, GINT_TO_POINTER(gsi), entry);
-        msi_control->updated = true;
-    }
-
-    return 0;
-}
-
 static int add_msi_routing(uint64_t addr, uint32_t data)
 {
     struct mshv_user_irq_entry *route_entry;
@@ -370,16 +324,51 @@ void mshv_irqchip_release_virq(int virq)
     remove_msi_routing(virq);
 }
 
+static int update_routing_entry(MshvState *s,
+                                struct mshv_user_irq_entry *new_entry)
+{
+    struct mshv_user_irq_entry *entry;
+    int n;
+
+    for (n = 0; n < s->irq_routes->nr; n++) {
+        entry = &s->irq_routes->entries[n];
+        if (entry->gsi != new_entry->gsi) {
+            continue;
+        }
+
+        if (!memcmp(entry, new_entry, sizeof(*entry))) {
+            return 0;
+        }
+
+        *entry = *new_entry;
+
+        return 0;
+    }
+
+    return -ESRCH;
+}
+
 int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev)
 {
+    uint32_t addr_hi = msg.address >> 32;
+    uint32_t addr_lo = msg.address & 0xFFFFFFFF;
+    uint32_t data = le32_to_cpu(msg.data);
+    struct mshv_user_irq_entry entry = {
+        .gsi = virq,
+        .address_hi = addr_hi,
+        .address_lo = addr_lo,
+        .data = data,
+    };
     int ret;
 
-    ret = set_msi_routing(virq, msg.address, le32_to_cpu(msg.data));
+    ret = update_routing_entry(mshv_state, &entry);
     if (ret < 0) {
-        error_report("Failed to set msi routing");
-        return -1;
+        error_report("Failed to set msi routing for gsi %d", virq);
+        abort();
     }
 
+    trace_mshv_set_msi_routing(virq, msg.address, data);
+
     return 0;
 }
 
-- 
2.34.1




* [RFC 10/32] accel/mshv: update s->irq_routes in release_virq
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (8 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 09/32] accel/mshv: update s->irq_routes in update_msi_route Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 11/32] accel/mshv: use s->irq_routes in commit_routes Magnus Kulke
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

The state's irq_routes field will be updated when an irqchip's gsi
is requested to be released.

The old remove_msi_routing() fn is redundant and can be removed.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
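For illustration only, the removal strategy (overwrite the released
slot with the last entry and shrink the count) can be sketched in
standalone C. This is a minimal sketch under assumptions: the sketch_*
names are hypothetical, the entry layout is assumed to have the four
fields used in the diff, and each GSI is assumed to appear at most once
in the table, so the sketch returns after the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout (hypothetical stand-in for mshv_user_irq_entry). */
struct sketch_entry {
    uint32_t gsi, address_lo, address_hi, data;
};

/*
 * Remove the entry for `gsi` by moving the last entry into its slot.
 * Returns the new number of entries; entry order is not preserved.
 */
static size_t sketch_remove_gsi(struct sketch_entry *entries, size_t nr,
                                uint32_t gsi)
{
    assert(entries != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++) {
        if (entries[i].gsi == gsi) {
            entries[i] = entries[nr - 1];
            return nr - 1;
        }
    }
    return nr; /* nothing to remove */
}
```

Removal is O(n) for the lookup and O(1) for the delete itself, at the
cost of reordering the table.
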
 accel/accel-irq.c       |  2 +-
 accel/mshv/irq.c        | 43 ++++++++++++++---------------------------
 accel/stubs/mshv-stub.c |  2 +-
 include/system/mshv.h   |  2 +-
 4 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
index 5a97a345b2..0cb3ef78d3 100644
--- a/accel/accel-irq.c
+++ b/accel/accel-irq.c
@@ -67,7 +67,7 @@ void accel_irqchip_release_virq(int virq)
 {
 #ifdef CONFIG_MSHV_IS_POSSIBLE
     if (mshv_msi_via_irqfd_enabled()) {
-        mshv_irqchip_release_virq(virq);
+        mshv_irqchip_release_virq(mshv_state, virq);
     }
 #endif
     if (kvm_enabled()) {
diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index b5a047b367..990ce34620 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -123,33 +123,6 @@ static int commit_msi_routing_table(int vm_fd)
     return 0;
 }
 
-static int remove_msi_routing(uint32_t gsi)
-{
-    struct mshv_user_irq_entry *route_entry;
-    GHashTable *gsi_routes;
-
-    trace_mshv_remove_msi_routing(gsi);
-
-    if (gsi >= MSHV_MAX_MSI_ROUTES) {
-        error_report("Invalid GSI: %u", gsi);
-        return -1;
-    }
-
-    assert(msi_control);
-
-    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
-        gsi_routes = msi_control->gsi_routes;
-        route_entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
-        if (route_entry) {
-            g_hash_table_remove(gsi_routes, GINT_TO_POINTER(gsi));
-            g_free(route_entry);
-            msi_control->updated = true;
-        }
-    }
-
-    return 0;
-}
-
 /* Pass an eventfd which is to be used for injecting interrupts from userland */
 static int irqfd(int vm_fd, int fd, int resample_fd, uint32_t gsi,
                  uint32_t flags)
@@ -319,9 +292,21 @@ int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
     return gsi;
 }
 
-void mshv_irqchip_release_virq(int virq)
+void mshv_irqchip_release_virq(MshvState *s, int virq)
 {
-    remove_msi_routing(virq);
+    struct mshv_user_irq_entry *e;
+    int i;
+
+    for (i = 0; i < s->irq_routes->nr; i++) {
+        e = &s->irq_routes->entries[i];
+        if (e->gsi == virq) {
+            s->irq_routes->nr--;
+            *e = s->irq_routes->entries[s->irq_routes->nr];
+        }
+    }
+    irqchip_release_gsi(s, virq);
+
+    trace_mshv_remove_msi_routing(virq);
 }
 
 static int update_routing_entry(MshvState *s,
diff --git a/accel/stubs/mshv-stub.c b/accel/stubs/mshv-stub.c
index 998c9e2fc6..dadf05511a 100644
--- a/accel/stubs/mshv-stub.c
+++ b/accel/stubs/mshv-stub.c
@@ -19,7 +19,7 @@ int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev)
     return -ENOSYS;
 }
 
-void mshv_irqchip_release_virq(int virq)
+void mshv_irqchip_release_virq(MshvState *s, int virq)
 {
 }
 
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 7f60aba308..2033beed70 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -63,7 +63,7 @@ int mshv_request_interrupt(MshvState *mshv_state, uint32_t interrupt_type, uint3
 int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev);
 int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
 void mshv_irqchip_commit_routes(void);
-void mshv_irqchip_release_virq(int virq);
+void mshv_irqchip_release_virq(MshvState *s, int virq);
 int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *n,
                                         const EventNotifier *rn, int virq);
 int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *n, int virq);
-- 
2.34.1




* [RFC 11/32] accel/mshv: use s->irq_routes in commit_routes
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (9 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 10/32] accel/mshv: update s->irq_routes in release_virq Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 12/32] accel/mshv: reserve ioapic routes on s->irq_routes Magnus Kulke
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

In mshv_irqchip_commit_routes() the entries that have been accumulated
in s->irq_routes are committed directly to MSHV's irqchip.

The old commit_msi_routing_table() fn will be removed in a subsequent
commit.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
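For illustration only, the reason the accumulated table can be handed
to the ioctl as one buffer is that header and entries share a single
allocation via a flexible array member. This is a minimal sketch under
assumptions: the sketch_* names are hypothetical, and the layout (a
32-bit count, a 32-bit reserved field, then the entries) is an assumed
stand-in for struct mshv_user_irq_table, not copied from the uapi
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed entry layout (hypothetical stand-in for mshv_user_irq_entry). */
struct sketch_entry {
    uint32_t gsi, address_lo, address_hi, data;
};

/* Assumed table layout: header followed directly by the entries. */
struct sketch_table {
    uint32_t nr;
    uint32_t rsvd;
    struct sketch_entry entries[];
};

/* Bytes needed for a table that can hold `n` entries. */
static size_t sketch_table_bytes(size_t n)
{
    return sizeof(struct sketch_table) + n * sizeof(struct sketch_entry);
}

/* Allocate a zeroed table with room for `capacity` entries. */
static struct sketch_table *sketch_table_new(size_t capacity)
{
    struct sketch_table *t = calloc(1, sketch_table_bytes(capacity));

    assert(t != NULL);
    return t;
}
```

With this layout no temporary copy is needed at commit time: the
pointer to the table is the pointer to the buffer, and nr tells the
consumer how many of the allocated entries are valid.
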
 accel/accel-irq.c       | 2 +-
 accel/mshv/irq.c        | 7 ++++---
 accel/stubs/mshv-stub.c | 2 +-
 include/system/mshv.h   | 2 +-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
index 0cb3ef78d3..4303e10e40 100644
--- a/accel/accel-irq.c
+++ b/accel/accel-irq.c
@@ -55,7 +55,7 @@ void accel_irqchip_commit_routes(void)
 {
 #ifdef CONFIG_MSHV_IS_POSSIBLE
     if (mshv_msi_via_irqfd_enabled()) {
-        mshv_irqchip_commit_routes();
+        mshv_irqchip_commit_routes(mshv_state);
     }
 #endif
     if (kvm_enabled()) {
diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index 990ce34620..9ba837f0e2 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -394,16 +394,17 @@ int mshv_request_interrupt(MshvState *mshv_state, uint32_t interrupt_type, uint3
     return 0;
 }
 
-void mshv_irqchip_commit_routes(void)
+void mshv_irqchip_commit_routes(MshvState *s)
 {
     int ret;
-    int vm_fd = mshv_state->vm;
+    int vm_fd = s->vm;
 
-    ret = commit_msi_routing_table(vm_fd);
+    ret = ioctl(vm_fd, MSHV_SET_MSI_ROUTING, s->irq_routes);
     if (ret < 0) {
         error_report("Failed to commit msi routing table");
         abort();
     }
+    trace_mshv_commit_msi_routing_table(vm_fd, s->irq_routes->nr);
 }
 
 int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *event,
diff --git a/accel/stubs/mshv-stub.c b/accel/stubs/mshv-stub.c
index dadf05511a..8b69539a85 100644
--- a/accel/stubs/mshv-stub.c
+++ b/accel/stubs/mshv-stub.c
@@ -28,7 +28,7 @@ int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev)
     return -ENOSYS;
 }
 
-void mshv_irqchip_commit_routes(void)
+void mshv_irqchip_commit_routes(MshvState *s)
 {
 }
 
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 2033beed70..0c9a4f927c 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -62,8 +62,8 @@ int mshv_request_interrupt(MshvState *mshv_state, uint32_t interrupt_type, uint3
 
 int mshv_irqchip_add_msi_route(AccelRouteChange *c, int vector, PCIDevice *dev);
 int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
-void mshv_irqchip_commit_routes(void);
 void mshv_irqchip_release_virq(MshvState *s, int virq);
+void mshv_irqchip_commit_routes(MshvState *s);
 int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *n,
                                         const EventNotifier *rn, int virq);
 int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *n, int virq);
-- 
2.34.1




* [RFC 12/32] accel/mshv: reserve ioapic routes on s->irq_routes
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (10 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 11/32] accel/mshv: use s->irq_routes in commit_routes Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 13/32] accel/mshv: remove redundant msi controller Magnus Kulke
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

We reserve 24 IOAPIC routes using the new functions that operate on
s->irq_routes.

The commit_msi_routing_table() and add_msi_routing() fns can be removed
now.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/irq.c          | 115 ++++++--------------------------------
 accel/mshv/mshv-all.c     |   5 --
 include/system/mshv_int.h |   1 -
 3 files changed, 18 insertions(+), 103 deletions(-)

diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index 9ba837f0e2..52b8ac9479 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -36,93 +36,6 @@ void mshv_init_msicontrol(void)
     msi_control->updated = false;
 }
 
-static int add_msi_routing(uint64_t addr, uint32_t data)
-{
-    struct mshv_user_irq_entry *route_entry;
-    uint32_t high_addr = addr >> 32;
-    uint32_t low_addr = addr & 0xFFFFFFFF;
-    int gsi;
-    GHashTable *gsi_routes;
-
-    trace_mshv_add_msi_routing(addr, data);
-
-    assert(msi_control);
-
-    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
-        /* find an empty slot */
-        gsi = 0;
-        gsi_routes = msi_control->gsi_routes;
-        while (gsi < MSHV_MAX_MSI_ROUTES) {
-            route_entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
-            if (!route_entry) {
-                break;
-            }
-            gsi++;
-        }
-        if (gsi >= MSHV_MAX_MSI_ROUTES) {
-            error_report("No empty gsi slot available");
-            return -1;
-        }
-
-        /* create new entry */
-        route_entry = g_new0(struct mshv_user_irq_entry, 1);
-        route_entry->gsi = gsi;
-        route_entry->address_hi = high_addr;
-        route_entry->address_lo = low_addr;
-        route_entry->data = data;
-
-        g_hash_table_insert(gsi_routes, GINT_TO_POINTER(gsi), route_entry);
-        msi_control->updated = true;
-    }
-
-    return gsi;
-}
-
-static int commit_msi_routing_table(int vm_fd)
-{
-    guint len;
-    int i, ret;
-    size_t table_size;
-    struct mshv_user_irq_table *table;
-    GHashTableIter iter;
-    gpointer key, value;
-
-    assert(msi_control);
-
-    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
-        if (!msi_control->updated) {
-            /* nothing to update */
-            return 0;
-        }
-
-        /* Calculate the size of the table */
-        len = g_hash_table_size(msi_control->gsi_routes);
-        table_size = sizeof(struct mshv_user_irq_table)
-                     + len * sizeof(struct mshv_user_irq_entry);
-        table = g_malloc0(table_size);
-
-        g_hash_table_iter_init(&iter, msi_control->gsi_routes);
-        i = 0;
-        while (g_hash_table_iter_next(&iter, &key, &value)) {
-            struct mshv_user_irq_entry *entry = value;
-            table->entries[i] = *entry;
-            i++;
-        }
-        table->nr = i;
-
-        trace_mshv_commit_msi_routing_table(vm_fd, len);
-
-        ret = ioctl(vm_fd, MSHV_SET_MSI_ROUTING, table);
-        g_free(table);
-        if (ret < 0) {
-            error_report("Failed to commit msi routing table");
-            return -1;
-        }
-        msi_control->updated = false;
-    }
-    return 0;
-}
-
 /* Pass an eventfd which is to be used for injecting interrupts from userland */
 static int irqfd(int vm_fd, int fd, int resample_fd, uint32_t gsi,
                  uint32_t flags)
@@ -420,37 +333,45 @@ int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *event,
     return irqchip_update_irqfd_notifier_gsi(event, NULL, virq, false);
 }
 
-int mshv_reserve_ioapic_msi_routes(int vm_fd)
+static int mshv_reserve_ioapic_msi_routes(MshvState *s)
 {
-    int ret, gsi;
+    int ret, i;
+    int gsi = 0;
+    struct mshv_user_irq_entry blank_entry = { 0 };
 
     /*
      * Reserve GSI 0-23 for IOAPIC pins, to avoid conflicts of legacy
      * peripherals with MSI-X devices
      */
-    for (gsi = 0; gsi < IOAPIC_NUM_PINS; gsi++) {
-        ret = add_msi_routing(0, 0);
+    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+        /* take the lowest free GSI; add_routing_entry() marks it used */
+        ret = irqchip_allocate_gsi(s, &gsi);
         if (ret < 0) {
-            error_report("Failed to reserve GSI %d", gsi);
+            error_report("Failed to reserve GSI %d: %s", gsi, strerror(-ret));
             return -1;
         }
+        blank_entry.gsi = gsi;
+        add_routing_entry(s, &blank_entry);
     }
 
-    ret = commit_msi_routing_table(vm_fd);
-    if (ret < 0) {
-        error_report("Failed to commit reserved IOAPIC MSI routes");
-        return -1;
-    }
+    mshv_irqchip_commit_routes(s);
 
     return 0;
 }
 
 void mshv_init_irq_routing(MshvState *s)
 {
+    int ret;
     int gsi_count = MSHV_MAX_MSI_ROUTES;
 
     s->irq_routes = g_malloc0(sizeof(*s->irq_routes));
     s->nr_allocated_irq_routes = 0;
     s->gsi_count = gsi_count;
     s->used_gsi_bitmap = bitmap_new(gsi_count);
+
+    ret = mshv_reserve_ioapic_msi_routes(s);
+    if (ret < 0) {
+        error_report("Failed to reserve IOAPIC MSI routes");
+        abort();
+    }
 }
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 8acb080db1..08bc26713f 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -200,11 +200,6 @@ static int create_vm(int mshv_fd, int *vm_fd)
         return -1;
     }
 
-    ret = mshv_reserve_ioapic_msi_routes(*vm_fd);
-    if (ret < 0) {
-        return -1;
-    }
-
     ret = mshv_arch_post_init_vm(*vm_fd);
     if (ret < 0) {
         return -1;
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 56fda76a9c..9bc56e70cf 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -120,6 +120,5 @@ int mshv_configure_msr(const CPUState *cpu, const MshvMsrEntry *msrs,
 
 /* interrupt */
 void mshv_init_msicontrol(void);
-int mshv_reserve_ioapic_msi_routes(int vm_fd);
 
 #endif
-- 
2.34.1




* [RFC 13/32] accel/mshv: remove redundant msi controller
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (11 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 12/32] accel/mshv: reserve ioapic routes on s->irq_routes Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 14/32] target/i386/mshv: move apic logic into own file Magnus Kulke
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

The remaining MsiControl infrastructure can be removed now.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/irq.c          | 11 -----------
 accel/mshv/mshv-all.c     |  2 --
 include/system/mshv_int.h |  3 ---
 3 files changed, 16 deletions(-)

diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
index 52b8ac9479..4828ac51ac 100644
--- a/accel/mshv/irq.c
+++ b/accel/mshv/irq.c
@@ -25,17 +25,6 @@
 #define MSHV_IRQFD_RESAMPLE_FLAG (1 << MSHV_IRQFD_BIT_RESAMPLE)
 #define MSHV_IRQFD_BIT_DEASSIGN_FLAG (1 << MSHV_IRQFD_BIT_DEASSIGN)
 
-static MshvMsiControl *msi_control;
-static QemuMutex msi_control_mutex;
-
-void mshv_init_msicontrol(void)
-{
-    qemu_mutex_init(&msi_control_mutex);
-    msi_control = g_new0(MshvMsiControl, 1);
-    msi_control->gsi_routes = g_hash_table_new(g_direct_hash, g_direct_equal);
-    msi_control->updated = false;
-}
-
 /* Pass an eventfd which is to be used for injecting interrupts from userland */
 static int irqfd(int vm_fd, int fd, int resample_fd, uint32_t gsi,
                  uint32_t flags)
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 08bc26713f..056b19b3b8 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -426,8 +426,6 @@ static int mshv_init(AccelState *as, MachineState *ms)
 
     mshv_init_mmio_emu();
 
-    mshv_init_msicontrol();
-
     ret = create_vm(mshv_fd, &vm_fd);
     if (ret < 0) {
         close(mshv_fd);
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 9bc56e70cf..7af5bcf022 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -118,7 +118,4 @@ typedef struct MshvMsrEntries {
 int mshv_configure_msr(const CPUState *cpu, const MshvMsrEntry *msrs,
                        size_t n_msrs);
 
-/* interrupt */
-void mshv_init_msicontrol(void);
-
 #endif
-- 
2.34.1




* [RFC 14/32] target/i386/mshv: move apic logic into own file
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (12 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 13/32] accel/mshv: remove redundant msi controller Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 15/32] target/i386/mshv: migrate LAPIC state Magnus Kulke
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Move the APIC-related logic into a new file to gradually unclutter the
vcpu code.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv_int.h    |  8 ++++
 target/i386/mshv/meson.build |  1 +
 target/i386/mshv/mshv-apic.c | 78 ++++++++++++++++++++++++++++++++++++
 target/i386/mshv/mshv-cpu.c  | 66 +++---------------------------
 4 files changed, 92 insertions(+), 61 deletions(-)
 create mode 100644 target/i386/mshv/mshv-apic.c

diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 7af5bcf022..f86c7a3be6 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -16,6 +16,8 @@
 
 #define MSHV_MSR_ENTRIES_COUNT 64
 
+struct mshv_get_set_vp_state;
+
 typedef struct hyperv_message hv_message;
 
 typedef struct MshvHvCallArgs {
@@ -84,6 +86,12 @@ void mshv_arch_destroy_vcpu(CPUState *cpu);
 void mshv_arch_amend_proc_features(
     union hv_partition_synthetic_processor_features *features);
 int mshv_arch_post_init_vm(int vm_fd);
+int mshv_set_lapic(int cpu_fd,
+                   const struct hv_local_interrupt_controller_state *state);
+int mshv_get_lapic(int cpu_fd,
+                   struct hv_local_interrupt_controller_state *state);
+int mshv_get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state);
+int mshv_set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state);
 
 typedef struct mshv_root_hvcall mshv_root_hvcall;
 int mshv_hvcall(int fd, const mshv_root_hvcall *args);
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
index 49f28d4a5b..5eb6e833a6 100644
--- a/target/i386/mshv/meson.build
+++ b/target/i386/mshv/meson.build
@@ -1,6 +1,7 @@
 i386_mshv_ss = ss.source_set()
 
 i386_mshv_ss.add(files(
+  'mshv-apic.c',
   'mshv-cpu.c',
 ))
 
diff --git a/target/i386/mshv/mshv-apic.c b/target/i386/mshv/mshv-apic.c
new file mode 100644
index 0000000000..e0e8b8cacf
--- /dev/null
+++ b/target/i386/mshv/mshv-apic.c
@@ -0,0 +1,78 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2026
+ *
+ * Authors: Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/memalign.h"
+#include "qemu/error-report.h"
+
+#include "system/mshv.h"
+#include "system/mshv_int.h"
+
+#include "linux/mshv.h"
+#include "hw/hyperv/hvgdk.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "hw/hyperv/hvgdk_mini.h"
+
+#include <sys/ioctl.h>
+
+int mshv_get_lapic(int cpu_fd,
+                   struct hv_local_interrupt_controller_state *state)
+{
+    int ret;
+    size_t size = 4096;
+    /* buffer aligned to 4k, as *state requires that */
+    void *buffer = qemu_memalign(size, size);
+    struct mshv_get_set_vp_state mshv_state = { 0 };
+
+    mshv_state.buf_ptr = (uint64_t) buffer;
+    mshv_state.buf_sz = size;
+    mshv_state.type = MSHV_VP_STATE_LAPIC;
+
+    ret = mshv_get_vp_state(cpu_fd, &mshv_state);
+    if (ret == 0) {
+        memcpy(state, buffer, sizeof(*state));
+    }
+    qemu_vfree(buffer);
+    if (ret < 0) {
+        error_report("failed to get lapic");
+        return -1;
+    }
+
+    return 0;
+}
+
+int mshv_set_lapic(int cpu_fd,
+                   const struct hv_local_interrupt_controller_state *state)
+{
+    int ret;
+    size_t size = 4096;
+    void *buffer; /* aligned to 4k, as *state requires that */
+    struct mshv_get_set_vp_state mshv_state = { 0 };
+
+    if (!state) {
+        error_report("lapic state is NULL");
+        return -1;
+    }
+    buffer = qemu_memalign(size, size); /* after the NULL check: no leak */
+    memcpy(buffer, state, sizeof(*state));
+
+    mshv_state.buf_ptr = (uint64_t) buffer;
+    mshv_state.buf_sz = size;
+    mshv_state.type = MSHV_VP_STATE_LAPIC;
+
+    ret = mshv_set_vp_state(cpu_fd, &mshv_state);
+    qemu_vfree(buffer);
+    if (ret < 0) {
+        error_report("failed to set lapic: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 56656ac0b0..54c262d8bc 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -919,7 +919,7 @@ static int set_xc_reg(const CPUState *cpu)
     return 0;
 }
 
-static int get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
+int mshv_get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
 {
     int ret;
 
@@ -932,39 +932,12 @@ static int get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
     return 0;
 }
 
-static int get_lapic(const CPUState *cpu,
-                     struct hv_local_interrupt_controller_state *state)
-{
-    int ret;
-    size_t size = 4096;
-    /* buffer aligned to 4k, as *state requires that */
-    void *buffer = qemu_memalign(size, size);
-    struct mshv_get_set_vp_state mshv_state = { 0 };
-    int cpu_fd = mshv_vcpufd(cpu);
-
-    mshv_state.buf_ptr = (uint64_t) buffer;
-    mshv_state.buf_sz = size;
-    mshv_state.type = MSHV_VP_STATE_LAPIC;
-
-    ret = get_vp_state(cpu_fd, &mshv_state);
-    if (ret == 0) {
-        memcpy(state, buffer, sizeof(*state));
-    }
-    qemu_vfree(buffer);
-    if (ret < 0) {
-        error_report("failed to get lapic");
-        return -1;
-    }
-
-    return 0;
-}
-
 static uint32_t set_apic_delivery_mode(uint32_t reg, uint32_t mode)
 {
     return ((reg) & ~0x700) | ((mode) << 8);
 }
 
-static int set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
+int mshv_set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
 {
     int ret;
 
@@ -977,43 +950,14 @@ static int set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
     return 0;
 }
 
-static int set_lapic(const CPUState *cpu,
-                     const struct hv_local_interrupt_controller_state *state)
-{
-    int ret;
-    size_t size = 4096;
-    /* buffer aligned to 4k, as *state requires that */
-    void *buffer = qemu_memalign(size, size);
-    struct mshv_get_set_vp_state mshv_state = { 0 };
-    int cpu_fd = mshv_vcpufd(cpu);
-
-    if (!state) {
-        error_report("lapic state is NULL");
-        return -1;
-    }
-    memcpy(buffer, state, sizeof(*state));
-
-    mshv_state.buf_ptr = (uint64_t) buffer;
-    mshv_state.buf_sz = size;
-    mshv_state.type = MSHV_VP_STATE_LAPIC;
-
-    ret = set_vp_state(cpu_fd, &mshv_state);
-    qemu_vfree(buffer);
-    if (ret < 0) {
-        error_report("failed to set lapic: %s", strerror(errno));
-        return -1;
-    }
-
-    return 0;
-}
-
 static int init_lint(const CPUState *cpu)
 {
     int ret;
     uint32_t *lvt_lint0, *lvt_lint1;
+    int cpu_fd = mshv_vcpufd(cpu);
 
     struct hv_local_interrupt_controller_state lapic_state = { 0 };
-    ret = get_lapic(cpu, &lapic_state);
+    ret = mshv_get_lapic(cpu_fd, &lapic_state);
     if (ret < 0) {
         return ret;
     }
@@ -1026,7 +970,7 @@ static int init_lint(const CPUState *cpu)
 
     /* TODO: should we skip setting lapic if the values are the same? */
 
-    return set_lapic(cpu, &lapic_state);
+    return mshv_set_lapic(cpu_fd, &lapic_state);
 }
 
 int mshv_arch_store_vcpu_state(const CPUState *cpu)
-- 
2.34.1




* [RFC 15/32] target/i386/mshv: migrate LAPIC state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (13 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 14/32] target/i386/mshv: move apic logic into own file Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 16/32] target/i386/mshv: move msr code to arch Magnus Kulke
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This change implements loading and storing the hyperv lapic state as
part of the load/store routines for a vcpu.

The Hyper-V LAPIC is similar to the split-irqchip in KVM: it only
handles MSI/MSI-X interrupts. PIC and IOAPIC have to be handled in
userland.

An opaque blob, guarded by CONFIG_MSHV, is added to APICCommonState. It
is covered by migration because we declare VMSTATE_BUFFER for the
hv_lapic_state field.

In the future we might want to introduce a dedicated class for MSHV.
That would require us to wire up an IOAPIC delivery path to QEMU's
userland emulation.
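
To illustrate the opaque-blob approach described above, here is a minimal,
self-contained sketch. It is not the code from this patch: the struct
demo_lapic_state, its fields, struct demo_apic_device and the two helper
functions are all hypothetical stand-ins. The only point is that a typed
hypervisor struct can be carried as a fixed-size byte buffer and copied back
out unchanged on the destination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct hv_local_interrupt_controller_state. */
struct demo_lapic_state {
    uint32_t apic_id;
    uint32_t apic_lvt_lint0;
    uint32_t apic_lvt_lint1;
};

/*
 * Hypothetical stand-in for the device state covered by the migration
 * stream. The blob is sized from the typed struct, so the two cannot drift.
 */
struct demo_apic_device {
    uint8_t hv_lapic_state[sizeof(struct demo_lapic_state)];
};

/* Save path: copy the hypervisor's view into the opaque byte buffer. */
static void demo_save_lapic(struct demo_apic_device *dev,
                            const struct demo_lapic_state *state)
{
    memcpy(dev->hv_lapic_state, state, sizeof(*state));
}

/* Restore path: rebuild the typed struct from the opaque byte buffer. */
static void demo_restore_lapic(const struct demo_apic_device *dev,
                               struct demo_lapic_state *state)
{
    memcpy(state, dev->hv_lapic_state, sizeof(*state));
}
```

Because the blob is an uninterpreted copy, this sketch assumes that source and
destination agree on the layout of the typed struct.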

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 hw/intc/apic_common.c           |  3 ++
 include/hw/i386/apic_internal.h |  5 +++
 target/i386/mshv/mshv-cpu.c     | 61 +++++++++++++++++++++++++++++++--
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index bf4abc21d7..a7df870f1a 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -380,6 +380,9 @@ static const VMStateDescription vmstate_apic_common = {
         VMSTATE_INT64(next_time, APICCommonState),
         VMSTATE_INT64(timer_expiry,
                       APICCommonState), /* open-coded timer state */
+#ifdef CONFIG_MSHV
+        VMSTATE_BUFFER(hv_lapic_state, APICCommonState),
+#endif
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * const []) {
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 0cb06bbc76..6d4ccca4e8 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -23,6 +23,7 @@
 
 #include "cpu.h"
 #include "hw/i386/apic.h"
+#include "hw/hyperv/hvgdk_mini.h"
 #include "system/memory.h"
 #include "qemu/timer.h"
 #include "target/i386/cpu-qom.h"
@@ -188,6 +189,10 @@ struct APICCommonState {
     DeviceState *vapic;
     hwaddr vapic_paddr; /* note: persistence via kvmvapic */
     uint32_t extended_log_dest;
+
+#ifdef CONFIG_MSHV
+    uint8_t hv_lapic_state[sizeof(struct hv_local_interrupt_controller_state)];
+#endif
 };
 
 typedef struct VAPICState {
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 54c262d8bc..906f5b0c3d 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -112,6 +112,25 @@ static int get_generic_regs(CPUState *cpu,
                             struct hv_register_assoc *assocs,
                             size_t n_regs);
 
+static int get_lapic(CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    APICCommonState *apic = APIC_COMMON(x86cpu->apic_state);
+    int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
+    struct hv_local_interrupt_controller_state lapic_state = { 0 };
+
+    ret = mshv_get_lapic(cpu_fd, &lapic_state);
+    if (ret < 0) {
+        error_report("failed to get lapic state");
+        return -1;
+    }
+
+    memcpy(&apic->hv_lapic_state, &lapic_state, sizeof(lapic_state));
+
+    return 0;
+}
+
 static void populate_fpu(const hv_register_assoc *assocs, X86CPU *x86cpu)
 {
     union hv_register_value value;
@@ -559,6 +578,11 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    ret = get_lapic(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
@@ -952,9 +976,11 @@ int mshv_set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
 
 static int init_lint(const CPUState *cpu)
 {
-    int ret;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    APICCommonState *apic = APIC_COMMON(x86cpu->apic_state);
     uint32_t *lvt_lint0, *lvt_lint1;
     int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
 
     struct hv_local_interrupt_controller_state lapic_state = { 0 };
     ret = mshv_get_lapic(cpu_fd, &lapic_state);
@@ -970,7 +996,32 @@ static int init_lint(const CPUState *cpu)
 
     /* TODO: should we skip setting lapic if the values are the same? */
 
-    return mshv_set_lapic(cpu_fd, &lapic_state);
+    ret = mshv_set_lapic(cpu_fd, &lapic_state);
+    if (ret < 0) {
+        return -1;
+    }
+
+    memcpy(apic->hv_lapic_state, &lapic_state, sizeof(lapic_state));
+
+    return 0;
+}
+
+static int set_lapic(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    APICCommonState *apic = APIC_COMMON(x86cpu->apic_state);
+    int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
+
+    struct hv_local_interrupt_controller_state lapic_state = { 0 };
+    memcpy(&lapic_state, &apic->hv_lapic_state, sizeof(lapic_state));
+    ret = mshv_set_lapic(cpu_fd, &lapic_state);
+    if (ret < 0) {
+        error_report("failed to set lapic");
+        return -1;
+    }
+
+    return 0;
 }
 
 int mshv_arch_store_vcpu_state(const CPUState *cpu)
@@ -997,6 +1048,12 @@ int mshv_arch_store_vcpu_state(const CPUState *cpu)
         return ret;
     }
 
+    /* INVARIANT: special regs (APIC_BASE) must be restored before LAPIC */
+    ret = set_lapic(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
-- 
2.34.1




* [RFC 16/32] target/i386/mshv: move msr code to arch
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (14 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 15/32] target/i386/mshv: migrate LAPIC state Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 17/32] accel/mshv: store partition proc features Magnus Kulke
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

The MSR code is x86 specific, hence it's better suited in the arch
tree.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/meson.build            | 1 -
 target/i386/mshv/meson.build      | 1 +
 {accel => target/i386}/mshv/msr.c | 0
 3 files changed, 1 insertion(+), 1 deletion(-)
 rename {accel => target/i386}/mshv/msr.c (100%)

diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index c1b1787c5e..e433187cde 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -1,6 +1,5 @@
 system_ss.add(when: 'CONFIG_MSHV', if_true: files(
   'irq.c',
   'mem.c',
-  'msr.c',
   'mshv-all.c'
 ))
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
index 5eb6e833a6..f44e84688d 100644
--- a/target/i386/mshv/meson.build
+++ b/target/i386/mshv/meson.build
@@ -3,6 +3,7 @@ i386_mshv_ss = ss.source_set()
 i386_mshv_ss.add(files(
   'mshv-apic.c',
   'mshv-cpu.c',
+  'msr.c',
 ))
 
 i386_system_ss.add_all(when: 'CONFIG_MSHV', if_true: i386_mshv_ss)
diff --git a/accel/mshv/msr.c b/target/i386/mshv/msr.c
similarity index 100%
rename from accel/mshv/msr.c
rename to target/i386/mshv/msr.c
-- 
2.34.1




* [RFC 17/32] accel/mshv: store partition proc features
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (15 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 16/32] target/i386/mshv: move msr code to arch Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 18/32] target/i386/mshv: expose mshv_get_generic_regs Magnus Kulke
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

We retrieve and store processor features on the state, so we can query
them later when deciding which MSRs to migrate.
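
As a rough sketch of the idea (not the code from this patch), the features can
be kept as an array of 64-bit banks, filled one bank at a time and queried by
bit position later. Everything here is hypothetical: union demo_proc_features,
demo_get_property() (which stands in for the hypercall and returns made-up
values) and demo_has_feature(). Note that each bank is written through a
pointer to its own array slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FEATURE_BANKS 2

/* Hypothetical, much-reduced stand-in for hv_partition_processor_features. */
union demo_proc_features {
    uint64_t as_uint64[DEMO_FEATURE_BANKS];
};

/* Hypothetical property source; a real one would issue a hypercall. */
static int demo_get_property(unsigned bank, uint64_t *value)
{
    /* Made-up values: bank 0 has bits 0 and 2 set, bank 1 has bit 1 set. */
    static const uint64_t banks[DEMO_FEATURE_BANKS] = { 0x5, 0x2 };

    if (bank >= DEMO_FEATURE_BANKS) {
        return -1;
    }
    *value = banks[bank];
    return 0;
}

/* Fill every bank, passing the address of that bank's own slot. */
static int demo_get_proc_features(union demo_proc_features *features)
{
    for (unsigned bank = 0; bank < DEMO_FEATURE_BANKS; bank++) {
        if (demo_get_property(bank, &features->as_uint64[bank]) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Query a feature by its global bit position (bank * 64 + bit in bank). */
static bool demo_has_feature(const union demo_proc_features *features,
                             unsigned bit)
{
    if (bit >= 64 * DEMO_FEATURE_BANKS) {
        return false;
    }
    return (features->as_uint64[bit / 64] >> (bit % 64)) & 1;
}
```

With the made-up values above, demo_has_feature() reports bits 0, 2 and 65 as
set once demo_get_proc_features() has run.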

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c     |  57 +++++++++++++++
 include/hw/hyperv/hvhdk.h | 149 ++++++++++++++++++++++++++++++++++++++
 include/system/mshv_int.h |   2 +
 3 files changed, 208 insertions(+)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 056b19b3b8..d1bf148bb8 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -106,6 +106,57 @@ static int resume_vm(int vm_fd)
     return 0;
 }
 
+static int get_partition_property(int vm_fd, uint32_t property_code,
+                                  uint64_t *value)
+{
+    struct hv_input_get_partition_property in = {0};
+    struct hv_output_get_partition_property out = {0};
+    struct mshv_root_hvcall args = {0};
+    int ret;
+
+    in.property_code = property_code;
+
+    args.code    = HVCALL_GET_PARTITION_PROPERTY;
+    args.in_sz   = sizeof(in);
+    args.in_ptr  = (uint64_t)&in;
+    args.out_sz  = sizeof(out);
+    args.out_ptr = (uint64_t)&out;
+
+    ret = ioctl(vm_fd, MSHV_ROOT_HVCALL, &args);
+    if (ret < 0) {
+        error_report("Failed to get guest partition property bank: %s",
+                     strerror(errno));
+        return -1;
+    }
+
+    *value = out.property_value;
+    return 0;
+}
+
+static int get_proc_features(int vm_fd,
+                             union hv_partition_processor_features *features)
+{
+    int ret;
+
+    ret = get_partition_property(vm_fd,
+                                 HV_PARTITION_PROPERTY_PROCESSOR_FEATURES0,
+                                 features[0].as_uint64);
+    if (ret < 0) {
+        error_report("Failed to get processor features bank 0");
+        return -1;
+    }
+
+    ret = get_partition_property(vm_fd,
+                                 HV_PARTITION_PROPERTY_PROCESSOR_FEATURES1,
+                                 &features->as_uint64[1]);
+    if (ret < 0) {
+        error_report("Failed to get processor features bank 1");
+        return -1;
+    }
+
+    return 0;
+}
+
 static int create_partition(int mshv_fd, int *vm_fd)
 {
     int ret;
@@ -441,6 +492,12 @@ static int mshv_init(AccelState *as, MachineState *ms)
 
     s->vm = vm_fd;
     s->fd = mshv_fd;
+
+    ret = get_proc_features(vm_fd, &s->processor_features);
+    if (ret < 0) {
+        return -1;
+    }
+
     s->nr_as = 1;
     s->as = g_new0(MshvAddressSpace, s->nr_as);
 
diff --git a/include/hw/hyperv/hvhdk.h b/include/hw/hyperv/hvhdk.h
index 866c8211bf..5177cfa7e5 100644
--- a/include/hw/hyperv/hvhdk.h
+++ b/include/hw/hyperv/hvhdk.h
@@ -11,6 +11,16 @@
 
 #define HV_PARTITION_SYNTHETIC_PROCESSOR_FEATURES_BANKS 1
 
+struct hv_input_get_partition_property {
+    uint64_t partition_id;
+    uint32_t property_code; /* enum hv_partition_property_code */
+    uint32_t padding;
+} QEMU_PACKED;
+
+struct hv_output_get_partition_property {
+    uint64_t property_value;
+} QEMU_PACKED;
+
 struct hv_input_set_partition_property {
     uint64_t partition_id;
     uint32_t property_code; /* enum hv_partition_property_code */
@@ -246,4 +256,143 @@ typedef struct hv_input_register_intercept_result {
     union hv_register_intercept_result_parameters parameters;
 } hv_input_register_intercept_result;
 
+#define HV_PARTITION_PROCESSOR_FEATURES_BANKS 2
+#define HV_PARTITION_PROCESSOR_FEATURES_RESERVEDBANK1_BITFIELD_COUNT 4
+
+union hv_partition_processor_features {
+    uint64_t as_uint64[HV_PARTITION_PROCESSOR_FEATURES_BANKS];
+    struct {
+        uint64_t sse3_support:1;
+        uint64_t lahf_sahf_support:1;
+        uint64_t ssse3_support:1;
+        uint64_t sse4_1_support:1;
+        uint64_t sse4_2_support:1;
+        uint64_t sse4a_support:1;
+        uint64_t xop_support:1;
+        uint64_t pop_cnt_support:1;
+        uint64_t cmpxchg16b_support:1;
+        uint64_t altmovcr8_support:1;
+        uint64_t lzcnt_support:1;
+        uint64_t mis_align_sse_support:1;
+        uint64_t mmx_ext_support:1;
+        uint64_t amd3dnow_support:1;
+        uint64_t extended_amd3dnow_support:1;
+        uint64_t page_1gb_support:1;
+        uint64_t aes_support:1;
+        uint64_t pclmulqdq_support:1;
+        uint64_t pcid_support:1;
+        uint64_t fma4_support:1;
+        uint64_t f16c_support:1;
+        uint64_t rd_rand_support:1;
+        uint64_t rd_wr_fs_gs_support:1;
+        uint64_t smep_support:1;
+        uint64_t enhanced_fast_string_support:1;
+        uint64_t bmi1_support:1;
+        uint64_t bmi2_support:1;
+        uint64_t hle_support_deprecated:1;
+        uint64_t rtm_support_deprecated:1;
+        uint64_t movbe_support:1;
+        uint64_t npiep1_support:1;
+        uint64_t dep_x87_fpu_save_support:1;
+        uint64_t rd_seed_support:1;
+        uint64_t adx_support:1;
+        uint64_t intel_prefetch_support:1;
+        uint64_t smap_support:1;
+        uint64_t hle_support:1;
+        uint64_t rtm_support:1;
+        uint64_t rdtscp_support:1;
+        uint64_t clflushopt_support:1;
+        uint64_t clwb_support:1;
+        uint64_t sha_support:1;
+        uint64_t x87_pointers_saved_support:1;
+        uint64_t invpcid_support:1;
+        uint64_t ibrs_support:1;
+        uint64_t stibp_support:1;
+        uint64_t ibpb_support:1;
+        uint64_t unrestricted_guest_support:1;
+        uint64_t mdd_support:1;
+        uint64_t fast_short_rep_mov_support:1;
+        uint64_t l1dcache_flush_support:1;
+        uint64_t rdcl_no_support:1;
+        uint64_t ibrs_all_support:1;
+        uint64_t skip_l1df_support:1;
+        uint64_t ssb_no_support:1;
+        uint64_t rsb_a_no_support:1;
+        uint64_t virt_spec_ctrl_support:1;
+        uint64_t rd_pid_support:1;
+        uint64_t umip_support:1;
+        uint64_t mbs_no_support:1;
+        uint64_t mb_clear_support:1;
+        uint64_t taa_no_support:1;
+        uint64_t tsx_ctrl_support:1;
+        uint64_t reserved_bank0:1;
+
+        /* N.B. Begin bank 1 processor features. */
+        uint64_t a_count_m_count_support:1;
+        uint64_t tsc_invariant_support:1;
+        uint64_t cl_zero_support:1;
+        uint64_t rdpru_support:1;
+        uint64_t la57_support:1;
+        uint64_t mbec_support:1;
+        uint64_t nested_virt_support:1;
+        uint64_t psfd_support:1;
+        uint64_t cet_ss_support:1;
+        uint64_t cet_ibt_support:1;
+        uint64_t vmx_exception_inject_support:1;
+        uint64_t enqcmd_support:1;
+        uint64_t umwait_tpause_support:1;
+        uint64_t movdiri_support:1;
+        uint64_t movdir64b_support:1;
+        uint64_t cldemote_support:1;
+        uint64_t serialize_support:1;
+        uint64_t tsc_deadline_tmr_support:1;
+        uint64_t tsc_adjust_support:1;
+        uint64_t fzl_rep_movsb:1;
+        uint64_t fs_rep_stosb:1;
+        uint64_t fs_rep_cmpsb:1;
+        uint64_t tsx_ld_trk_support:1;
+        uint64_t vmx_ins_outs_exit_info_support:1;
+        uint64_t hlat_support:1;
+        uint64_t sbdr_ssdp_no_support:1;
+        uint64_t fbsdp_no_support:1;
+        uint64_t psdp_no_support:1;
+        uint64_t fb_clear_support:1;
+        uint64_t btc_no_support:1;
+        uint64_t ibpb_rsb_flush_support:1;
+        uint64_t stibp_always_on_support:1;
+        uint64_t perf_global_ctrl_support:1;
+        uint64_t npt_execute_only_support:1;
+        uint64_t npt_ad_flags_support:1;
+        uint64_t npt1_gb_page_support:1;
+        uint64_t amd_processor_topology_node_id_support:1;
+        uint64_t local_machine_check_support:1;
+        uint64_t extended_topology_leaf_fp256_amd_support:1;
+        uint64_t gds_no_support:1;
+        uint64_t cmpccxadd_support:1;
+        uint64_t tsc_aux_virtualization_support:1;
+        uint64_t rmp_query_support:1;
+        uint64_t bhi_no_support:1;
+        uint64_t bhi_dis_support:1;
+        uint64_t prefetch_i_support:1;
+        uint64_t sha512_support:1;
+        uint64_t mitigation_ctrl_support:1;
+        uint64_t rfds_no_support:1;
+        uint64_t rfds_clear_support:1;
+        uint64_t sm3_support:1;
+        uint64_t sm4_support:1;
+        uint64_t secure_avic_support:1;
+        uint64_t guest_intercept_ctrl_support:1;
+        uint64_t sbpb_supported:1;
+        uint64_t ibpb_br_type_supported:1;
+        uint64_t srso_no_supported:1;
+        uint64_t srso_user_kernel_no_supported:1;
+        uint64_t vrew_clear_supported:1;
+        uint64_t tsa_l1_no_supported:1;
+        uint64_t tsa_sq_no_supported:1;
+        uint64_t lass_support:1;
+        uint64_t idle_hlt_intercept_support:1;
+        uint64_t msr_list_support:1;
+    } QEMU_PACKED;
+};
+
 #endif /* HW_HYPERV_HVHDK_H */
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index f86c7a3be6..2b6d7b2f35 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -15,6 +15,7 @@
 #define QEMU_MSHV_INT_H
 
 #define MSHV_MSR_ENTRIES_COUNT 64
+#include "hw/hyperv/hvhdk.h"
 
 struct mshv_get_set_vp_state;
 
@@ -55,6 +56,7 @@ struct MshvState {
     int nr_allocated_irq_routes;
     unsigned long *used_gsi_bitmap;
     unsigned int gsi_count;
+    union hv_partition_processor_features processor_features;
 };
 
 typedef struct MshvMsiControl {
-- 
2.34.1




* [RFC 18/32] target/i386/mshv: expose mshv_get_generic_regs
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (16 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 17/32] accel/mshv: store partition proc features Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:57 ` [RFC 19/32] target/i386/mshv: migrate MSRs Magnus Kulke
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

We expose the function so that we can call it from other source files
(msr.c).

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv_int.h   |  2 ++
 target/i386/mshv/mshv-cpu.c | 15 ++++++---------
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 2b6d7b2f35..2c5d16bf9a 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -81,6 +81,8 @@ int mshv_configure_vcpu(const CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
 int mshv_set_generic_regs(const CPUState *cpu, const hv_register_assoc *assocs,
                           size_t n_regs);
+int mshv_get_generic_regs(CPUState *cpu, hv_register_assoc *assocs,
+                          size_t n_regs);
 int mshv_arch_store_vcpu_state(const CPUState *cpu);
 int mshv_arch_load_vcpu_state(CPUState *cpu);
 void mshv_arch_init_vcpu(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 906f5b0c3d..ecb4711b95 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -108,9 +108,6 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
 };
 
 static int set_special_regs(const CPUState *cpu);
-static int get_generic_regs(CPUState *cpu,
-                            struct hv_register_assoc *assocs,
-                            size_t n_regs);
 
 static int get_lapic(CPUState *cpu)
 {
@@ -187,7 +184,7 @@ static int get_fpu(CPUState *cpu)
     for (size_t i = 0; i < n_regs; i++) {
         assocs[i].name = FPU_REGISTER_NAMES[i];
     }
-    ret = get_generic_regs(cpu, assocs, n_regs);
+    ret = mshv_get_generic_regs(cpu, assocs, n_regs);
     if (ret < 0) {
         error_report("failed to get special registers");
         return -errno;
@@ -207,7 +204,7 @@ static int get_xc_reg(CPUState *cpu)
 
     assocs[0].name = HV_X64_REGISTER_XFEM;
 
-    ret = get_generic_regs(cpu, assocs, 1);
+    ret = mshv_get_generic_regs(cpu, assocs, 1);
     if (ret < 0) {
         error_report("failed to get xcr0");
         return -1;
@@ -300,8 +297,8 @@ int mshv_set_generic_regs(const CPUState *cpu, const hv_register_assoc *assocs,
     return 0;
 }
 
-static int get_generic_regs(CPUState *cpu, hv_register_assoc *assocs,
-                            size_t n_regs)
+int mshv_get_generic_regs(CPUState *cpu, hv_register_assoc *assocs,
+                          size_t n_regs)
 {
     int cpu_fd = mshv_vcpufd(cpu);
     int vp_index = cpu->cpu_index;
@@ -450,7 +447,7 @@ static int get_standard_regs(CPUState *cpu)
     for (size_t i = 0; i < n_regs; i++) {
         assocs[i].name = STANDARD_REGISTER_NAMES[i];
     }
-    ret = get_generic_regs(cpu, assocs, n_regs);
+    ret = mshv_get_generic_regs(cpu, assocs, n_regs);
     if (ret < 0) {
         error_report("failed to get standard registers");
         return -1;
@@ -527,7 +524,7 @@ static int get_special_regs(CPUState *cpu)
     for (size_t i = 0; i < n_regs; i++) {
         assocs[i].name = SPECIAL_REGISTER_NAMES[i];
     }
-    ret = get_generic_regs(cpu, assocs, n_regs);
+    ret = mshv_get_generic_regs(cpu, assocs, n_regs);
     if (ret < 0) {
         error_report("failed to get special registers");
         return -errno;
-- 
2.34.1




* [RFC 19/32] target/i386/mshv: migrate MSRs
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (17 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 18/32] target/i386/mshv: expose mshv_get_generic_regs Magnus Kulke
@ 2026-03-23 13:57 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 20/32] target/i386/mshv: migrate MTRR MSRs Magnus Kulke
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

In this change we rewrite the existing MSR logic to make MSRs
migratable:

- We map them onto existing QEMU fields in the CPU state. A table and a
  macro, MSHV_ENV_FIELD, associate an HV register name with its MSR
  index and its offset in the CPU state struct. The list is not
  exhaustive and will be extended in follow-up commits.
- The mshv_set/get_msrs() fns are called from the
  arch_load/store_vcpu_state() fns. They use the generic register
  ioctls and map between the HV register content and the CPU state
  representation via load/store_to/from_env().
- init_msrs() has been moved from mshv-cpu to the msr source file.
- We need to filter MSRs before writing and reading, because the
  hvcalls will fail if the partition doesn't support a given MSR.
- Some MSRs are partition-wide, so we only write them on the BSP.
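
The table-driven mapping in the list above could look roughly like the
following sketch. It is an illustration under assumptions, not the code in
msr.c: struct demo_env, struct demo_msr_map, the DEMO_ENV_FIELD macro, the
helper functions and all register name and MSR index values are hypothetical
placeholders. It shows one table entry tying a register name to an MSR index
and an offsetof() into the CPU state, plus a flag for partition-wide MSRs that
are skipped on non-BSP vCPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the CPU state struct. */
struct demo_env {
    uint64_t sysenter_cs;
    uint64_t star;
    uint64_t guest_os_id;
};

/* One table row: register name, MSR index, offset into the CPU state. */
struct demo_msr_map {
    uint32_t hv_name;        /* placeholder value, not a real HV name */
    uint32_t msr_index;      /* placeholder value, not a real MSR index */
    size_t   env_offset;
    bool     partition_wide; /* written on the BSP only */
};

#define DEMO_ENV_FIELD(hv, msr, field, wide) \
    { (hv), (msr), offsetof(struct demo_env, field), (wide) }

static const struct demo_msr_map demo_msr_table[] = {
    DEMO_ENV_FIELD(1, 0x10, sysenter_cs, false),
    DEMO_ENV_FIELD(2, 0x20, star,        false),
    DEMO_ENV_FIELD(3, 0x30, guest_os_id, true),
};

#define DEMO_MSR_COUNT (sizeof(demo_msr_table) / sizeof(demo_msr_table[0]))

/* Copy a register value into the CPU state field named by the table row. */
static void demo_store_to_env(struct demo_env *env,
                              const struct demo_msr_map *m, uint64_t value)
{
    memcpy((uint8_t *)env + m->env_offset, &value, sizeof(value));
}

/* Read the CPU state field named by the table row. */
static uint64_t demo_load_from_env(const struct demo_env *env,
                                   const struct demo_msr_map *m)
{
    uint64_t value;

    memcpy(&value, (const uint8_t *)env + m->env_offset, sizeof(value));
    return value;
}

/* Count the rows that would be written for a vCPU; APs skip wide MSRs. */
static size_t demo_count_writable(bool is_bsp)
{
    size_t n = 0;

    for (size_t i = 0; i < DEMO_MSR_COUNT; i++) {
        if (demo_msr_table[i].partition_wide && !is_bsp) {
            continue;
        }
        n++;
    }
    return n;
}
```

A feature-based filter would fit in the same loop as the partition-wide
check. It is left out here because the exact feature-to-MSR mapping is not
shown in this excerpt.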

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/hw/hyperv/hvgdk_mini.h |  16 +
 include/system/mshv_int.h      |  17 +-
 target/i386/mshv/mshv-cpu.c    |  40 +--
 target/i386/mshv/msr.c         | 549 +++++++++++++--------------------
 4 files changed, 254 insertions(+), 368 deletions(-)

diff --git a/include/hw/hyperv/hvgdk_mini.h b/include/hw/hyperv/hvgdk_mini.h
index cb52cc9de2..a47bc6212e 100644
--- a/include/hw/hyperv/hvgdk_mini.h
+++ b/include/hw/hyperv/hvgdk_mini.h
@@ -9,6 +9,19 @@
 
 #define MSHV_IOCTL  0xB8
 
+/* Hyper-V specific model specific registers (MSRs) */
+
+/* HV_X64_SYNTHETIC_MSR */
+#define HV_X64_MSR_GUEST_OS_ID      0x40000000
+#define HV_X64_MSR_HYPERCALL        0x40000001
+#define HV_X64_MSR_VP_INDEX         0x40000002
+#define HV_X64_MSR_RESET            0x40000003
+#define HV_X64_MSR_VP_RUNTIME       0x40000010
+#define HV_X64_MSR_TIME_REF_COUNT   0x40000020
+#define HV_X64_MSR_REFERENCE_TSC    0x40000021
+#define HV_X64_MSR_TSC_FREQUENCY    0x40000022
+#define HV_X64_MSR_APIC_FREQUENCY   0x40000023
+
 typedef enum hv_register_name {
     /* Pending Interruption Register */
     HV_REGISTER_PENDING_INTERRUPTION = 0x00010002,
@@ -152,12 +165,14 @@ typedef enum hv_register_name {
     /* Available */
 
     HV_X64_REGISTER_SPEC_CTRL       = 0x00080084,
+    HV_X64_REGISTER_TSC_DEADLINE    = 0x00080095,
     HV_X64_REGISTER_TSC_ADJUST      = 0x00080096,
 
     /* Other MSRs */
     HV_X64_REGISTER_MSR_IA32_MISC_ENABLE = 0x000800A0,
 
     /* Misc */
+    HV_X64_REGISTER_HYPERCALL       = 0x00090001,
     HV_REGISTER_GUEST_OS_ID         = 0x00090002,
     HV_REGISTER_REFERENCE_TSC       = 0x00090017,
 
@@ -788,6 +803,7 @@ struct hv_cpuid {
 #define IA32_MSR_DEBUG_CTL        0x1D9
 #define IA32_MSR_SPEC_CTRL        0x00000048
 #define IA32_MSR_TSC_ADJUST       0x0000003b
+#define IA32_MSR_TSC_DEADLINE     0x000006e0
 
 #define IA32_MSR_MISC_ENABLE 0x000001a0
 
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 2c5d16bf9a..29b363e73e 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -14,7 +14,6 @@
 #ifndef QEMU_MSHV_INT_H
 #define QEMU_MSHV_INT_H
 
-#define MSHV_MSR_ENTRIES_COUNT 64
 #include "hw/hyperv/hvhdk.h"
 
 struct mshv_get_set_vp_state;
@@ -116,18 +115,8 @@ void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
                        bool add);
 
 /* msr */
-typedef struct MshvMsrEntry {
-  uint32_t index;
-  uint32_t reserved;
-  uint64_t data;
-} MshvMsrEntry;
-
-typedef struct MshvMsrEntries {
-    MshvMsrEntry entries[MSHV_MSR_ENTRIES_COUNT];
-    uint32_t nmsrs;
-} MshvMsrEntries;
-
-int mshv_configure_msr(const CPUState *cpu, const MshvMsrEntry *msrs,
-                       size_t n_msrs);
+int mshv_init_msrs(const CPUState *cpu);
+int mshv_get_msrs(CPUState *cpu);
+int mshv_set_msrs(const CPUState *cpu);
 
 #endif
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index ecb4711b95..0d4721582a 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -580,6 +580,11 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    ret = mshv_get_msrs(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
@@ -1051,6 +1056,12 @@ int mshv_arch_store_vcpu_state(const CPUState *cpu)
         return ret;
     }
 
+    /* INVARIANT: LAPIC must be restored before MSRs (TSC_DEADLINE) */
+    ret = mshv_set_msrs(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
@@ -1537,33 +1548,6 @@ void mshv_init_mmio_emu(void)
     init_emu(&mshv_x86_emul_ops);
 }
 
-static int init_msrs(const CPUState *cpu)
-{
-    int ret;
-    uint64_t d_t = MSR_MTRR_ENABLE | MSR_MTRR_MEM_TYPE_WB;
-
-    const struct hv_register_assoc assocs[] = {
-        { .name = HV_X64_REGISTER_SYSENTER_CS,       .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_SYSENTER_ESP,      .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_SYSENTER_EIP,      .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_STAR,              .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_CSTAR,             .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_LSTAR,             .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_KERNEL_GS_BASE,    .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_SFMASK,            .value.reg64 = 0x0 },
-        { .name = HV_X64_REGISTER_MSR_MTRR_DEF_TYPE, .value.reg64 = d_t },
-    };
-    QEMU_BUILD_BUG_ON(ARRAY_SIZE(assocs) > MSHV_MSR_ENTRIES_COUNT);
-
-    ret = mshv_set_generic_regs(cpu, assocs, ARRAY_SIZE(assocs));
-    if (ret < 0) {
-        error_report("failed to put msrs");
-        return -1;
-    }
-
-    return 0;
-}
-
 void mshv_arch_init_vcpu(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -1593,7 +1577,7 @@ void mshv_arch_init_vcpu(CPUState *cpu)
     ret = init_cpuid2(cpu);
     assert(ret == 0);
 
-    ret = init_msrs(cpu);
+    ret = mshv_init_msrs(cpu);
     assert(ret == 0);
 
     ret = init_lint(cpu);
diff --git a/target/i386/mshv/msr.c b/target/i386/mshv/msr.c
index e6e5baef50..6e53874787 100644
--- a/target/i386/mshv/msr.c
+++ b/target/i386/mshv/msr.c
@@ -14,362 +14,259 @@
 #include "hw/hyperv/hvgdk_mini.h"
 #include "linux/mshv.h"
 #include "qemu/error-report.h"
+#include "cpu.h"
 
-static uint32_t supported_msrs[64] = {
-    IA32_MSR_TSC,
-    IA32_MSR_EFER,
-    IA32_MSR_KERNEL_GS_BASE,
-    IA32_MSR_APIC_BASE,
-    IA32_MSR_PAT,
-    IA32_MSR_SYSENTER_CS,
-    IA32_MSR_SYSENTER_ESP,
-    IA32_MSR_SYSENTER_EIP,
-    IA32_MSR_STAR,
-    IA32_MSR_LSTAR,
-    IA32_MSR_CSTAR,
-    IA32_MSR_SFMASK,
-    IA32_MSR_MTRR_DEF_TYPE,
-    IA32_MSR_MTRR_PHYSBASE0,
-    IA32_MSR_MTRR_PHYSMASK0,
-    IA32_MSR_MTRR_PHYSBASE1,
-    IA32_MSR_MTRR_PHYSMASK1,
-    IA32_MSR_MTRR_PHYSBASE2,
-    IA32_MSR_MTRR_PHYSMASK2,
-    IA32_MSR_MTRR_PHYSBASE3,
-    IA32_MSR_MTRR_PHYSMASK3,
-    IA32_MSR_MTRR_PHYSBASE4,
-    IA32_MSR_MTRR_PHYSMASK4,
-    IA32_MSR_MTRR_PHYSBASE5,
-    IA32_MSR_MTRR_PHYSMASK5,
-    IA32_MSR_MTRR_PHYSBASE6,
-    IA32_MSR_MTRR_PHYSMASK6,
-    IA32_MSR_MTRR_PHYSBASE7,
-    IA32_MSR_MTRR_PHYSMASK7,
-    IA32_MSR_MTRR_FIX64K_00000,
-    IA32_MSR_MTRR_FIX16K_80000,
-    IA32_MSR_MTRR_FIX16K_A0000,
-    IA32_MSR_MTRR_FIX4K_C0000,
-    IA32_MSR_MTRR_FIX4K_C8000,
-    IA32_MSR_MTRR_FIX4K_D0000,
-    IA32_MSR_MTRR_FIX4K_D8000,
-    IA32_MSR_MTRR_FIX4K_E0000,
-    IA32_MSR_MTRR_FIX4K_E8000,
-    IA32_MSR_MTRR_FIX4K_F0000,
-    IA32_MSR_MTRR_FIX4K_F8000,
-    IA32_MSR_TSC_AUX,
-    IA32_MSR_DEBUG_CTL,
-    HV_X64_MSR_GUEST_OS_ID,
-    HV_X64_MSR_SINT0,
-    HV_X64_MSR_SINT1,
-    HV_X64_MSR_SINT2,
-    HV_X64_MSR_SINT3,
-    HV_X64_MSR_SINT4,
-    HV_X64_MSR_SINT5,
-    HV_X64_MSR_SINT6,
-    HV_X64_MSR_SINT7,
-    HV_X64_MSR_SINT8,
-    HV_X64_MSR_SINT9,
-    HV_X64_MSR_SINT10,
-    HV_X64_MSR_SINT11,
-    HV_X64_MSR_SINT12,
-    HV_X64_MSR_SINT13,
-    HV_X64_MSR_SINT14,
-    HV_X64_MSR_SINT15,
-    HV_X64_MSR_SCONTROL,
-    HV_X64_MSR_SIEFP,
-    HV_X64_MSR_SIMP,
-    HV_X64_MSR_REFERENCE_TSC,
-    HV_X64_MSR_EOM,
+#define MSHV_ENV_FIELD(env, offset) (*(uint64_t *)((char *)(env) + (offset)))
+
+typedef struct MshvMsrEnvMap {
+    uint32_t msr_index;
+    uint32_t hv_name;
+    ptrdiff_t env_offset;
+} MshvMsrEnvMap;
+
+/* These MSRs have a direct mapping to fields in CPUX86State */
+static const MshvMsrEnvMap msr_env_map[] = {
+    /* Architectural */
+    { IA32_MSR_EFER, HV_X64_REGISTER_EFER, offsetof(CPUX86State, efer) },
+    { IA32_MSR_PAT,  HV_X64_REGISTER_PAT,  offsetof(CPUX86State, pat) },
+
+    /* Syscall */
+    { IA32_MSR_SYSENTER_CS,    HV_X64_REGISTER_SYSENTER_CS,
+                               offsetof(CPUX86State, sysenter_cs) },
+    { IA32_MSR_SYSENTER_ESP,   HV_X64_REGISTER_SYSENTER_ESP,
+                               offsetof(CPUX86State, sysenter_esp) },
+    { IA32_MSR_SYSENTER_EIP,   HV_X64_REGISTER_SYSENTER_EIP,
+                               offsetof(CPUX86State, sysenter_eip) },
+    { IA32_MSR_STAR,           HV_X64_REGISTER_STAR,
+                               offsetof(CPUX86State, star) },
+    { IA32_MSR_LSTAR,          HV_X64_REGISTER_LSTAR,
+                               offsetof(CPUX86State, lstar) },
+    { IA32_MSR_CSTAR,          HV_X64_REGISTER_CSTAR,
+                               offsetof(CPUX86State, cstar) },
+    { IA32_MSR_SFMASK,         HV_X64_REGISTER_SFMASK,
+                               offsetof(CPUX86State, fmask) },
+    { IA32_MSR_KERNEL_GS_BASE, HV_X64_REGISTER_KERNEL_GS_BASE,
+                               offsetof(CPUX86State, kernelgsbase) },
+
+    /* TSC-related */
+    { IA32_MSR_TSC,          HV_X64_REGISTER_TSC,
+                             offsetof(CPUX86State, tsc) },
+    { IA32_MSR_TSC_AUX,      HV_X64_REGISTER_TSC_AUX,
+                             offsetof(CPUX86State, tsc_aux) },
+    { IA32_MSR_TSC_ADJUST,   HV_X64_REGISTER_TSC_ADJUST,
+                             offsetof(CPUX86State, tsc_adjust) },
+    { IA32_MSR_TSC_DEADLINE, HV_X64_REGISTER_TSC_DEADLINE,
+                             offsetof(CPUX86State, tsc_deadline) },
+
+    /* Hyper-V per-partition MSRs */
+    { HV_X64_MSR_HYPERCALL,     HV_X64_REGISTER_HYPERCALL,
+                                offsetof(CPUX86State, msr_hv_hypercall) },
+    { HV_X64_MSR_GUEST_OS_ID,   HV_REGISTER_GUEST_OS_ID,
+                                offsetof(CPUX86State, msr_hv_guest_os_id) },
+    { HV_X64_MSR_REFERENCE_TSC, HV_REGISTER_REFERENCE_TSC,
+                                offsetof(CPUX86State, msr_hv_tsc) },
+
+    /* Hyper-V MSRs (non-SINT) */
+    { HV_X64_MSR_SCONTROL,  HV_REGISTER_SCONTROL,
+                            offsetof(CPUX86State, msr_hv_synic_control) },
+    { HV_X64_MSR_SIEFP,     HV_REGISTER_SIEFP,
+                            offsetof(CPUX86State, msr_hv_synic_evt_page) },
+    { HV_X64_MSR_SIMP,      HV_REGISTER_SIMP,
+                            offsetof(CPUX86State, msr_hv_synic_msg_page) },
+
+    /* Other */
+
+    /* TODO: find out processor features that correlate to unsupported MSRs. */
+    /* { IA32_MSR_MISC_ENABLE, HV_X64_REGISTER_MSR_IA32_MISC_ENABLE, */
+    /*                         offsetof(CPUX86State, msr_ia32_misc_enable) }, */
+    /* { IA32_MSR_BNDCFGS,     HV_X64_REGISTER_BNDCFGS, */
+    /*                         offsetof(CPUX86State, msr_bndcfgs) }, */
+    { IA32_MSR_SPEC_CTRL,   HV_X64_REGISTER_SPEC_CTRL,
+                            offsetof(CPUX86State, spec_ctrl) },
 };
-static const size_t msr_count = ARRAY_SIZE(supported_msrs);
 
-static int compare_msr_index(const void *a, const void *b)
+int mshv_init_msrs(const CPUState *cpu)
 {
-    return *(uint32_t *)a - *(uint32_t *)b;
+    int ret;
+    uint64_t d_t = MSR_MTRR_ENABLE | MSR_MTRR_MEM_TYPE_WB;
+
+    const struct hv_register_assoc assocs[] = {
+        { .name = HV_X64_REGISTER_SYSENTER_CS,       .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_SYSENTER_ESP,      .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_SYSENTER_EIP,      .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_STAR,              .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_CSTAR,             .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_LSTAR,             .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_KERNEL_GS_BASE,    .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_SFMASK,            .value.reg64 = 0x0 },
+        { .name = HV_X64_REGISTER_MSR_MTRR_DEF_TYPE, .value.reg64 = d_t },
+    };
+
+    ret = mshv_set_generic_regs(cpu, assocs, ARRAY_SIZE(assocs));
+    if (ret < 0) {
+        error_report("failed to put msrs");
+        return -1;
+    }
+
+    return 0;
 }
 
-__attribute__((constructor))
-static void init_sorted_msr_map(void)
+
+/*
+ * INVARIANT: this fn expects assocs in the same order as they appear in
+ * msr_env_map.
+ */
+static void store_in_env(CPUState *cpu, const struct hv_register_assoc *assocs,
+                         size_t n_assocs)
 {
-    qsort(supported_msrs, msr_count, sizeof(uint32_t), compare_msr_index);
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    size_t i, j;
+    const MshvMsrEnvMap *mapping;
+    union hv_register_value hv_value;
+    ptrdiff_t offset;
+    uint32_t hv_name;
+
+    assert(n_assocs <= (ARRAY_SIZE(msr_env_map)));
+
+    for (i = 0, j = 0; i < ARRAY_SIZE(msr_env_map); i++) {
+        hv_name = assocs[j].name;
+        mapping = &msr_env_map[i];
+        if (hv_name != mapping->hv_name) {
+            continue;
+        }
+
+        hv_value = assocs[j].value;
+        offset = mapping->env_offset;
+        MSHV_ENV_FIELD(env, offset) = hv_value.reg64;
+        j++;
+    }
 }
 
-static int mshv_is_supported_msr(uint32_t msr)
+static void set_hv_name_in_assocs(struct hv_register_assoc *assocs,
+                                  size_t n_assocs)
 {
-    return bsearch(&msr, supported_msrs, msr_count, sizeof(uint32_t),
-                   compare_msr_index) != NULL;
+    size_t i;
+
+    assert(n_assocs == ARRAY_SIZE(msr_env_map));
+    for (i = 0; i < ARRAY_SIZE(msr_env_map); i++) {
+        assocs[i].name = msr_env_map[i].hv_name;
+    }
 }
 
-static int mshv_msr_to_hv_reg_name(uint32_t msr, uint32_t *hv_reg)
+static bool msr_supported(uint32_t name)
 {
-    switch (msr) {
-    case IA32_MSR_TSC:
-        *hv_reg = HV_X64_REGISTER_TSC;
-        return 0;
-    case IA32_MSR_EFER:
-        *hv_reg = HV_X64_REGISTER_EFER;
-        return 0;
-    case IA32_MSR_KERNEL_GS_BASE:
-        *hv_reg = HV_X64_REGISTER_KERNEL_GS_BASE;
-        return 0;
-    case IA32_MSR_APIC_BASE:
-        *hv_reg = HV_X64_REGISTER_APIC_BASE;
-        return 0;
-    case IA32_MSR_PAT:
-        *hv_reg = HV_X64_REGISTER_PAT;
-        return 0;
-    case IA32_MSR_SYSENTER_CS:
-        *hv_reg = HV_X64_REGISTER_SYSENTER_CS;
-        return 0;
-    case IA32_MSR_SYSENTER_ESP:
-        *hv_reg = HV_X64_REGISTER_SYSENTER_ESP;
-        return 0;
-    case IA32_MSR_SYSENTER_EIP:
-        *hv_reg = HV_X64_REGISTER_SYSENTER_EIP;
-        return 0;
-    case IA32_MSR_STAR:
-        *hv_reg = HV_X64_REGISTER_STAR;
-        return 0;
-    case IA32_MSR_LSTAR:
-        *hv_reg = HV_X64_REGISTER_LSTAR;
-        return 0;
-    case IA32_MSR_CSTAR:
-        *hv_reg = HV_X64_REGISTER_CSTAR;
-        return 0;
-    case IA32_MSR_SFMASK:
-        *hv_reg = HV_X64_REGISTER_SFMASK;
-        return 0;
-    case IA32_MSR_MTRR_CAP:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_CAP;
-        return 0;
-    case IA32_MSR_MTRR_DEF_TYPE:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_DEF_TYPE;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE0:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK0:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE1:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE1;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK1:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK1;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE2:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE2;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK2:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK2;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE3:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE3;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK3:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK3;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE4:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE4;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK4:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK4;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE5:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE5;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK5:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK5;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE6:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE6;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK6:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK6;
-        return 0;
-    case IA32_MSR_MTRR_PHYSBASE7:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE7;
-        return 0;
-    case IA32_MSR_MTRR_PHYSMASK7:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK7;
-        return 0;
-    case IA32_MSR_MTRR_FIX64K_00000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX64K00000;
-        return 0;
-    case IA32_MSR_MTRR_FIX16K_80000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX16K80000;
-        return 0;
-    case IA32_MSR_MTRR_FIX16K_A0000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX16KA0000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_C0000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KC0000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_C8000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KC8000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_D0000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KD0000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_D8000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KD8000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_E0000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KE0000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_E8000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KE8000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_F0000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KF0000;
-        return 0;
-    case IA32_MSR_MTRR_FIX4K_F8000:
-        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KF8000;
-        return 0;
-    case IA32_MSR_TSC_AUX:
-        *hv_reg = HV_X64_REGISTER_TSC_AUX;
-        return 0;
-    case IA32_MSR_BNDCFGS:
-        *hv_reg = HV_X64_REGISTER_BNDCFGS;
-        return 0;
-    case IA32_MSR_DEBUG_CTL:
-        *hv_reg = HV_X64_REGISTER_DEBUG_CTL;
-        return 0;
-    case IA32_MSR_TSC_ADJUST:
-        *hv_reg = HV_X64_REGISTER_TSC_ADJUST;
-        return 0;
-    case IA32_MSR_SPEC_CTRL:
-        *hv_reg = HV_X64_REGISTER_SPEC_CTRL;
-        return 0;
-    case HV_X64_MSR_GUEST_OS_ID:
-        *hv_reg = HV_REGISTER_GUEST_OS_ID;
-        return 0;
-    case HV_X64_MSR_SINT0:
-        *hv_reg = HV_REGISTER_SINT0;
-        return 0;
-    case HV_X64_MSR_SINT1:
-        *hv_reg = HV_REGISTER_SINT1;
-        return 0;
-    case HV_X64_MSR_SINT2:
-        *hv_reg = HV_REGISTER_SINT2;
-        return 0;
-    case HV_X64_MSR_SINT3:
-        *hv_reg = HV_REGISTER_SINT3;
-        return 0;
-    case HV_X64_MSR_SINT4:
-        *hv_reg = HV_REGISTER_SINT4;
-        return 0;
-    case HV_X64_MSR_SINT5:
-        *hv_reg = HV_REGISTER_SINT5;
-        return 0;
-    case HV_X64_MSR_SINT6:
-        *hv_reg = HV_REGISTER_SINT6;
-        return 0;
-    case HV_X64_MSR_SINT7:
-        *hv_reg = HV_REGISTER_SINT7;
-        return 0;
-    case HV_X64_MSR_SINT8:
-        *hv_reg = HV_REGISTER_SINT8;
-        return 0;
-    case HV_X64_MSR_SINT9:
-        *hv_reg = HV_REGISTER_SINT9;
-        return 0;
-    case HV_X64_MSR_SINT10:
-        *hv_reg = HV_REGISTER_SINT10;
-        return 0;
-    case HV_X64_MSR_SINT11:
-        *hv_reg = HV_REGISTER_SINT11;
-        return 0;
-    case HV_X64_MSR_SINT12:
-        *hv_reg = HV_REGISTER_SINT12;
-        return 0;
-    case HV_X64_MSR_SINT13:
-        *hv_reg = HV_REGISTER_SINT13;
-        return 0;
-    case HV_X64_MSR_SINT14:
-        *hv_reg = HV_REGISTER_SINT14;
-        return 0;
-    case HV_X64_MSR_SINT15:
-        *hv_reg = HV_REGISTER_SINT15;
-        return 0;
-    case IA32_MSR_MISC_ENABLE:
-        *hv_reg = HV_X64_REGISTER_MSR_IA32_MISC_ENABLE;
-        return 0;
-    case HV_X64_MSR_SCONTROL:
-        *hv_reg = HV_REGISTER_SCONTROL;
-        return 0;
-    case HV_X64_MSR_SIEFP:
-        *hv_reg = HV_REGISTER_SIEFP;
-        return 0;
-    case HV_X64_MSR_SIMP:
-        *hv_reg = HV_REGISTER_SIMP;
-        return 0;
-    case HV_X64_MSR_REFERENCE_TSC:
-        *hv_reg = HV_REGISTER_REFERENCE_TSC;
-        return 0;
-    case HV_X64_MSR_EOM:
-        *hv_reg = HV_REGISTER_EOM;
-        return 0;
-    default:
-        error_report("failed to map MSR %u to HV register name", msr);
-        return -1;
+    /*
+     * This check is not comprehensive; it is meant to avoid hvcall
+     * failures for certain MSRs on architectures that don't support them.
+     */
+
+    switch (name) {
+    case HV_X64_REGISTER_SPEC_CTRL:
+        return mshv_state->processor_features.ibrs_support;
+    case HV_X64_REGISTER_TSC_ADJUST:
+        return mshv_state->processor_features.tsc_adjust_support;
+    case HV_X64_REGISTER_TSC_DEADLINE:
+        return mshv_state->processor_features.tsc_deadline_tmr_support;
     }
+
+    return true;
 }
 
-static int set_msrs(const CPUState *cpu, GList *msrs)
+int mshv_get_msrs(CPUState *cpu)
 {
-    size_t n_msrs;
-    GList *entries;
-    MshvMsrEntry *entry;
-    enum hv_register_name name;
-    struct hv_register_assoc *assoc;
-    int ret;
-    size_t i = 0;
-
-    n_msrs = g_list_length(msrs);
-    hv_register_assoc *assocs = g_new0(hv_register_assoc, n_msrs);
-
-    entries = msrs;
-    for (const GList *elem = entries; elem != NULL; elem = elem->next) {
-        entry = elem->data;
-        ret = mshv_msr_to_hv_reg_name(entry->index, &name);
-        if (ret < 0) {
-            g_free(assocs);
-            return ret;
+    int ret = 0;
+    size_t n_assocs = ARRAY_SIZE(msr_env_map);
+    struct hv_register_assoc assocs[ARRAY_SIZE(msr_env_map)];
+    size_t i, j;
+    uint32_t name;
+
+    set_hv_name_in_assocs(assocs, n_assocs);
+
+    /* Filter out MSRs that cannot be read */
+    for (i = 0, j = 0; i < n_assocs; i++) {
+        name = assocs[i].name;
+
+        if (!msr_supported(name)) {
+            continue;
+        }
+
+        if (j != i) {
+            assocs[j] = assocs[i];
         }
-        assoc = &assocs[i];
-        assoc->name = name;
-        /* the union has been initialized to 0 */
-        assoc->value.reg64 = entry->data;
-        i++;
+        j++;
     }
-    ret = mshv_set_generic_regs(cpu, assocs, n_msrs);
-    g_free(assocs);
+    n_assocs = j;
+
+    ret = mshv_get_generic_regs(cpu, assocs, n_assocs);
     if (ret < 0) {
-        error_report("failed to set msrs");
-        return -1;
+        error_report("Failed to get MSRs");
+        return -errno;
     }
+
+    store_in_env(cpu, assocs, n_assocs);
+
     return 0;
 }
 
+static void load_from_env(const CPUState *cpu, struct hv_register_assoc *assocs,
+                          size_t n_assocs)
+{
+    size_t i;
+    const MshvMsrEnvMap *mapping;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    ptrdiff_t offset;
+    union hv_register_value *hv_value;
+
+    assert(n_assocs == ARRAY_SIZE(msr_env_map));
 
-int mshv_configure_msr(const CPUState *cpu, const MshvMsrEntry *msrs,
-                       size_t n_msrs)
+    for (i = 0; i < ARRAY_SIZE(msr_env_map); i++) {
+        mapping = &msr_env_map[i];
+        offset = mapping->env_offset;
+        assocs[i].name = mapping->hv_name;
+        hv_value = &assocs[i].value;
+        hv_value->reg64 = MSHV_ENV_FIELD(env, offset);
+    }
+}
+
+int mshv_set_msrs(const CPUState *cpu)
 {
-    GList *valid_msrs = NULL;
-    uint32_t msr_index;
+    size_t n_assocs = ARRAY_SIZE(msr_env_map);
+    struct hv_register_assoc assocs[ARRAY_SIZE(msr_env_map)];
     int ret;
+    size_t i, j;
 
-    for (size_t i = 0; i < n_msrs; i++) {
-        msr_index = msrs[i].index;
-        /* check whether index of msrs is in SUPPORTED_MSRS */
-        if (mshv_is_supported_msr(msr_index)) {
-            valid_msrs = g_list_append(valid_msrs, (void *) &msrs[i]);
+    load_from_env(cpu, assocs, n_assocs);
+
+    /* Filter out MSRs that cannot be written */
+    for (i = 0, j = 0; i < n_assocs; i++) {
+        uint32_t name = assocs[i].name;
+
+        /* Partition-wide MSRs: only write on vCPU 0 */
+        if (cpu->cpu_index != 0 &&
+            (name == HV_X64_REGISTER_HYPERCALL ||
+             name == HV_REGISTER_GUEST_OS_ID ||
+             name == HV_REGISTER_REFERENCE_TSC)) {
+            continue;
         }
+
+        if (!msr_supported(name)) {
+            continue;
+        }
+
+        if (j != i) {
+            assocs[j] = assocs[i];
+        }
+        j++;
     }
+    n_assocs = j;
 
-    ret = set_msrs(cpu, valid_msrs);
-    g_list_free(valid_msrs);
+    ret = mshv_set_generic_regs(cpu, assocs, n_assocs);
+    if (ret < 0) {
+        error_report("Failed to set MSRs");
+        return -errno;
+    }
 
-    return ret;
+    return 0;
 }
-- 
2.34.1




* [RFC 20/32] target/i386/mshv: migrate MTRR MSRs
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (18 preceding siblings ...)
  2026-03-23 13:57 ` [RFC 19/32] target/i386/mshv: migrate MSRs Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 21/32] target/i386/mshv: migrate Synic SINT MSRs Magnus Kulke
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This change round-trips the MTRR MSRs, which control memory access and
caching. Their mapping scheme is more elaborate, so they get dedicated
handling instead of individual entries in the MSR mapping table.
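
As a rough illustration of the layout this patch uses for the MTRR block
(8 variable-range bases, then 8 masks, then 11 fixed-range registers, 27
entries in total), here is a minimal standalone sketch. All sketch_* and
SKETCH_* names are hypothetical, and the register-name values are
placeholders rather than the real HV_X64_REGISTER_MSR_MTRR_* encodings. Like
the patch, it assumes the names within each group are consecutive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder register names; NOT the real hypervisor encodings. */
enum {
    SKETCH_MTRR_PHYS_BASE0  = 0x100,
    SKETCH_MTRR_PHYS_MASK0  = 0x200,
    SKETCH_MTRR_FIX64K00000 = 0x300,
};

#define SKETCH_MTRR_VAR_COUNT   8
#define SKETCH_MTRR_FIXED_COUNT 11
#define SKETCH_MTRR_TOTAL \
    (2 * SKETCH_MTRR_VAR_COUNT + SKETCH_MTRR_FIXED_COUNT)

/* Simplified stand-in for struct hv_register_assoc. */
struct sketch_assoc {
    uint32_t name;
    uint64_t value;
};

/* Simplified stand-in for the MTRR fields of CPUX86State. */
struct sketch_mtrr_state {
    struct {
        uint64_t base;
        uint64_t mask;
    } var[SKETCH_MTRR_VAR_COUNT];
    uint64_t fixed[SKETCH_MTRR_FIXED_COUNT];
};

/* Pack: bases at [0, 8), masks at [8, 16), fixed ranges at [16, 27). */
static void sketch_mtrr_pack(const struct sketch_mtrr_state *s,
                             struct sketch_assoc *assocs)
{
    size_t i, fixed_offset = 2 * SKETCH_MTRR_VAR_COUNT;

    for (i = 0; i < SKETCH_MTRR_VAR_COUNT; i++) {
        assocs[i].name = SKETCH_MTRR_PHYS_BASE0 + i;
        assocs[i].value = s->var[i].base;
        assocs[i + SKETCH_MTRR_VAR_COUNT].name = SKETCH_MTRR_PHYS_MASK0 + i;
        assocs[i + SKETCH_MTRR_VAR_COUNT].value = s->var[i].mask;
    }
    for (i = 0; i < SKETCH_MTRR_FIXED_COUNT; i++) {
        assocs[fixed_offset + i].name = SKETCH_MTRR_FIX64K00000 + i;
        assocs[fixed_offset + i].value = s->fixed[i];
    }
}

/* Unpack: the inverse of sketch_mtrr_pack(), relying on the same order. */
static void sketch_mtrr_unpack(const struct sketch_assoc *assocs,
                               struct sketch_mtrr_state *s)
{
    size_t i, fixed_offset = 2 * SKETCH_MTRR_VAR_COUNT;

    for (i = 0; i < SKETCH_MTRR_VAR_COUNT; i++) {
        assert(assocs[i].name == SKETCH_MTRR_PHYS_BASE0 + i);
        s->var[i].base = assocs[i].value;
        s->var[i].mask = assocs[i + SKETCH_MTRR_VAR_COUNT].value;
    }
    for (i = 0; i < SKETCH_MTRR_FIXED_COUNT; i++) {
        s->fixed[i] = assocs[fixed_offset + i].value;
    }
}
```

Because both directions depend on position only, anything that drops or
reorders entries between pack and unpack would have to keep the MTRR block
contiguous.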

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/msr.c | 136 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 129 insertions(+), 7 deletions(-)

diff --git a/target/i386/mshv/msr.c b/target/i386/mshv/msr.c
index 6e53874787..240ee84447 100644
--- a/target/i386/mshv/msr.c
+++ b/target/i386/mshv/msr.c
@@ -74,6 +74,10 @@ static const MshvMsrEnvMap msr_env_map[] = {
     { HV_X64_MSR_SIMP,      HV_REGISTER_SIMP,
                             offsetof(CPUX86State, msr_hv_synic_msg_page) },
 
+    /* MTRR default type */
+    { IA32_MSR_MTRR_DEF_TYPE, HV_X64_REGISTER_MSR_MTRR_DEF_TYPE,
+                              offsetof(CPUX86State, mtrr_deftype) },
+
     /* Other */
 
     /* TODO: find out processor features that correlate to unsupported MSRs. */
@@ -85,6 +89,98 @@ static const MshvMsrEnvMap msr_env_map[] = {
                             offsetof(CPUX86State, spec_ctrl) },
 };
 
+/*
+ * The assocs have to be set according to this schema:
+ *      8  entries for mtrr_base 0-7
+ *      8  entries for mtrr_mask 0-7
+ *      11 entries for 1 x 64k, 2 x 16k, 8 x 4k fixed MTRR
+ *      27 total entries
+ */
+
+#define MSHV_MTRR_MSR_COUNT 27
+#define MSHV_MSR_TOTAL_COUNT (ARRAY_SIZE(msr_env_map) + MSHV_MTRR_MSR_COUNT)
+
+static void store_in_env_mtrr_phys(CPUState *cpu,
+                                   const struct hv_register_assoc *assocs,
+                                   size_t n_assocs)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    size_t i, fixed_offset;
+    hv_register_name hv_name;
+    uint64_t base, mask;
+
+    assert(n_assocs == MSHV_MTRR_MSR_COUNT);
+
+    for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+        hv_name = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0 + i;
+        assert(assocs[i].name == hv_name);
+        hv_name = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0 + i;
+        assert(assocs[i + MSR_MTRRcap_VCNT].name == hv_name);
+
+        base = assocs[i].value.reg64;
+        mask = assocs[i + MSR_MTRRcap_VCNT].value.reg64;
+        env->mtrr_var[i].base = base;
+        env->mtrr_var[i].mask = mask;
+    }
+
+    /* fixed 1x 64, 2x 16, 8x 4 kB */
+    fixed_offset = MSR_MTRRcap_VCNT * 2;
+    for (i = 0; i < 11; i++) {
+        hv_name = HV_X64_REGISTER_MSR_MTRR_FIX64K00000 + i;
+        assert(assocs[fixed_offset + i].name == hv_name);
+        env->mtrr_fixed[i] = assocs[fixed_offset + i].value.reg64;
+    }
+}
+
+/*
+ * The assocs have to be set according to this schema:
+ *      8  entries for mtrr_base 0-7
+ *      8  entries for mtrr_mask 0-7
+ *      11 entries for 1 x 64k, 2 x 16k, 8 x 4k fixed MTRR
+ *      27 total entries
+ */
+static void load_from_env_mtrr_phys(const CPUState *cpu,
+                                    struct hv_register_assoc *assocs,
+                                    size_t n_assocs)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    size_t i, fixed_offset;
+    uint64_t base, mask, fixed_value;
+    hv_register_name base_name, mask_name, fixed_name;
+    hv_register_assoc *assoc;
+
+    assert(n_assocs == MSHV_MTRR_MSR_COUNT);
+
+    for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+        base = env->mtrr_var[i].base;
+        mask = env->mtrr_var[i].mask;
+
+        base_name = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0 + i;
+        mask_name = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0 + i;
+
+        assoc = &assocs[i];
+        assoc->name = base_name;
+        assoc->value.reg64 = base;
+
+        assoc = &assocs[i + MSR_MTRRcap_VCNT];
+        assoc->name = mask_name;
+        assoc->value.reg64 = mask;
+    }
+
+    /* fixed 1x 64, 2x 16, 8x 4 kB */
+    fixed_offset = MSR_MTRRcap_VCNT * 2;
+    for (i = 0; i < 11; i++) {
+        fixed_name = HV_X64_REGISTER_MSR_MTRR_FIX64K00000 + i;
+        fixed_value = env->mtrr_fixed[i];
+
+        assoc = &assocs[fixed_offset + i];
+        assoc->name = fixed_name;
+        assoc->value.reg64 = fixed_value;
+    }
+}
+
 int mshv_init_msrs(const CPUState *cpu)
 {
     int ret;
@@ -126,8 +222,9 @@ static void store_in_env(CPUState *cpu, const struct hv_register_assoc *assocs,
     union hv_register_value hv_value;
     ptrdiff_t offset;
     uint32_t hv_name;
+    size_t mtrr_index;
 
-    assert(n_assocs <= (ARRAY_SIZE(msr_env_map)));
+    assert(n_assocs <= MSHV_MSR_TOTAL_COUNT);
 
     for (i = 0, j = 0; i < ARRAY_SIZE(msr_env_map); i++) {
         hv_name = assocs[j].name;
@@ -141,17 +238,38 @@ static void store_in_env(CPUState *cpu, const struct hv_register_assoc *assocs,
         MSHV_ENV_FIELD(env, offset) = hv_value.reg64;
         j++;
     }
+
+    mtrr_index = j;
+    store_in_env_mtrr_phys(cpu, &assocs[mtrr_index], MSHV_MTRR_MSR_COUNT);
 }
 
 static void set_hv_name_in_assocs(struct hv_register_assoc *assocs,
                                   size_t n_assocs)
 {
     size_t i;
+    size_t mtrr_offset, mtrr_fixed_offset;
+    hv_register_name hv_name;
+
+    assert(n_assocs == MSHV_MSR_TOTAL_COUNT);
 
-    assert(n_assocs == ARRAY_SIZE(msr_env_map));
     for (i = 0; i < ARRAY_SIZE(msr_env_map); i++) {
         assocs[i].name = msr_env_map[i].hv_name;
     }
+
+    mtrr_offset = ARRAY_SIZE(msr_env_map);
+    for (i = 0; i < MSR_MTRRcap_VCNT; i++) {
+        hv_name = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0 + i;
+        assocs[mtrr_offset + i].name = hv_name;
+        hv_name = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0 + i;
+        assocs[mtrr_offset + MSR_MTRRcap_VCNT + i].name = hv_name;
+    }
+
+    /* fixed 1x 64, 2x 16, 8x 4 kB */
+    mtrr_fixed_offset = mtrr_offset + MSR_MTRRcap_VCNT * 2;
+    for (i = 0; i < 11; i++) {
+        hv_name = HV_X64_REGISTER_MSR_MTRR_FIX64K00000 + i;
+        assocs[mtrr_fixed_offset + i].name = hv_name;
+    }
 }
 
 static bool msr_supported(uint32_t name)
@@ -176,8 +294,8 @@ static bool msr_supported(uint32_t name)
 int mshv_get_msrs(CPUState *cpu)
 {
     int ret = 0;
-    size_t n_assocs = ARRAY_SIZE(msr_env_map);
-    struct hv_register_assoc assocs[ARRAY_SIZE(msr_env_map)];
+    size_t n_assocs = MSHV_MSR_TOTAL_COUNT;
+    struct hv_register_assoc assocs[MSHV_MSR_TOTAL_COUNT];
     size_t i, j;
     uint32_t name;
 
@@ -218,8 +336,9 @@ static void load_from_env(const CPUState *cpu, struct hv_register_assoc *assocs,
     CPUX86State *env = &x86_cpu->env;
     ptrdiff_t offset;
     union hv_register_value *hv_value;
+    size_t mtrr_offset;
 
-    assert(n_assocs == ARRAY_SIZE(msr_env_map));
+    assert(n_assocs == MSHV_MSR_TOTAL_COUNT);
 
     for (i = 0; i < ARRAY_SIZE(msr_env_map); i++) {
         mapping = &msr_env_map[i];
@@ -228,12 +347,15 @@ static void load_from_env(const CPUState *cpu, struct hv_register_assoc *assocs,
         hv_value = &assocs[i].value;
         hv_value->reg64 = MSHV_ENV_FIELD(env, offset);
     }
+
+    mtrr_offset = ARRAY_SIZE(msr_env_map);
+    load_from_env_mtrr_phys(cpu, &assocs[mtrr_offset], MSHV_MTRR_MSR_COUNT);
 }
 
 int mshv_set_msrs(const CPUState *cpu)
 {
-    size_t n_assocs = ARRAY_SIZE(msr_env_map);
-    struct hv_register_assoc assocs[ARRAY_SIZE(msr_env_map)];
+    size_t n_assocs = MSHV_MSR_TOTAL_COUNT;
+    struct hv_register_assoc assocs[MSHV_MSR_TOTAL_COUNT];
     int ret;
     size_t i, j;
 
-- 
2.34.1




* [RFC 21/32] target/i386/mshv: migrate Synic SINT MSRs
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (19 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 20/32] target/i386/mshv: migrate MTRR MSRs Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 22/32] target/i386/mshv: migrate SIMP and SIEFP state Magnus Kulke
                   ` (10 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Migrate Hyper-V SynIC SINT MSRs. They can only be read or written if
SCONTROL is enabled in the guest, so we split the SINT MSRs out and make
reading/writing them dependent on that MSR.
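
The two-phase ordering described above can be sketched in isolation as
follows. This is only a toy model: the sketch_* types and functions are
hypothetical stand-ins, not the MSHV or QEMU API, and the "hypervisor" here
is a plain struct that rejects SINT writes while the SCONTROL enable bit
(bit 0) is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SINT_COUNT      16
#define SKETCH_SCONTROL_ENABLE 0x1ULL

/* Toy stand-in for the hypervisor-side vCPU state. */
struct sketch_hv_vcpu {
    uint64_t scontrol;
    uint64_t sint[SKETCH_SINT_COUNT];
};

/* Toy stand-in for the QEMU-side copy of the same registers. */
struct sketch_env {
    uint64_t scontrol;
    uint64_t sint[SKETCH_SINT_COUNT];
};

/* Models the constraint: SINT writes fail unless SynIC is enabled. */
static int sketch_hv_set_sints(struct sketch_hv_vcpu *hv,
                               const uint64_t *values)
{
    size_t i;

    if (!(hv->scontrol & SKETCH_SCONTROL_ENABLE)) {
        return -1;
    }
    for (i = 0; i < SKETCH_SINT_COUNT; i++) {
        hv->sint[i] = values[i];
    }
    return 0;
}

static int sketch_restore_synic(struct sketch_hv_vcpu *hv,
                                const struct sketch_env *env)
{
    bool synic_enabled = env->scontrol & SKETCH_SCONTROL_ENABLE;

    /* Phase 1: SCONTROL is written with the general MSR batch. */
    hv->scontrol = env->scontrol;

    /* Phase 2: SINTs are written afterwards, and only if enabled. */
    if (!synic_enabled) {
        return 0;
    }
    return sketch_hv_set_sints(hv, env->sint);
}
```

Skipping the second phase when SynIC is disabled avoids turning an expected
rejection into a migration failure; the SINT values then simply stay at
whatever the destination already holds.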

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/msr.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/target/i386/mshv/msr.c b/target/i386/mshv/msr.c
index 240ee84447..d19b79d729 100644
--- a/target/i386/mshv/msr.c
+++ b/target/i386/mshv/msr.c
@@ -298,6 +298,8 @@ int mshv_get_msrs(CPUState *cpu)
     struct hv_register_assoc assocs[MSHV_MSR_TOTAL_COUNT];
     size_t i, j;
     uint32_t name;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    bool synic_enabled;
 
     set_hv_name_in_assocs(assocs, n_assocs);
 
@@ -324,6 +326,27 @@ int mshv_get_msrs(CPUState *cpu)
 
     store_in_env(cpu, assocs, n_assocs);
 
+    /* Read SINT MSRs only if SynIC is enabled */
+    synic_enabled = x86cpu->env.msr_hv_synic_control & 1;
+    if (synic_enabled) {
+        QEMU_BUILD_BUG_ON(MSHV_MSR_TOTAL_COUNT < HV_SINT_COUNT);
+
+        for (i = 0; i < HV_SINT_COUNT; i++) {
+            assocs[i].name = HV_REGISTER_SINT0 + i;
+        }
+
+        ret = mshv_get_generic_regs(cpu, assocs, HV_SINT_COUNT);
+        if (ret < 0) {
+            error_report("Failed to get SynIC SINT MSRs");
+            return -errno;
+        }
+
+        for (i = 0; i < HV_SINT_COUNT; i++) {
+            uint64_t hv_sint_value = assocs[i].value.reg64;
+            x86cpu->env.msr_hv_synic_sint[i] = hv_sint_value;
+        }
+    }
+
     return 0;
 }
 
@@ -358,6 +381,8 @@ int mshv_set_msrs(const CPUState *cpu)
     struct hv_register_assoc assocs[MSHV_MSR_TOTAL_COUNT];
     int ret;
     size_t i, j;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    bool synic_enabled = x86cpu->env.msr_hv_synic_control & 1;
 
     load_from_env(cpu, assocs, n_assocs);
 
@@ -390,5 +415,21 @@ int mshv_set_msrs(const CPUState *cpu)
         return -errno;
     }
 
+    /* SINT MSRs can only be written if SCONTROL has been set, so we split */
+    if (synic_enabled) {
+        QEMU_BUILD_BUG_ON(MSHV_MSR_TOTAL_COUNT < HV_SINT_COUNT);
+
+        for (i = 0; i < HV_SINT_COUNT; i++) {
+            assocs[i].name        = HV_REGISTER_SINT0 + i;
+            assocs[i].value.reg64 = x86cpu->env.msr_hv_synic_sint[i];
+        }
+
+        ret = mshv_set_generic_regs(cpu, assocs, HV_SINT_COUNT);
+        if (ret < 0) {
+            error_report("Failed to set SynIC SINT MSRs");
+            return -errno;
+        }
+    }
+
     return 0;
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 22/32] target/i386/mshv: migrate SIMP and SIEFP state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (20 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 21/32] target/i386/mshv: migrate Synic SINT MSRs Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 23/32] target/i386/mshv: migrate STIMER state Magnus Kulke
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This part of the SynIC state is retrieved from the hypervisor via aligned
state pages:

- Add new synic source file
- Centralize the SynIC enabled check in mshv_synic_enabled()
- Read/write pages from/to the hypervisor via aligned buffers
- Only handle pages when SynIC is enabled
- Add buffers for migration to VM state

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
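Note for readers: the aligned bounce-buffer pattern used by the get/set
helpers can be sketched standalone as below. This is a simplified
illustration under assumptions: PAGE_SZ is assumed to match
HV_HYP_PAGE_SIZE, C11 aligned_alloc() stands in for qemu_memalign(), and
the fill_page_fn callback is a hypothetical replacement for the
MSHV_GET_VP_STATE ioctl.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096 /* assumed to match HV_HYP_PAGE_SIZE */

/* Hypothetical callback that fills a page-aligned buffer (ioctl stand-in) */
typedef int (*fill_page_fn)(void *aligned_buf, size_t len);

/* Fetch one state page through a temporary page-aligned buffer */
static int get_state_page(fill_page_fn fill, uint8_t *page)
{
    void *buf = aligned_alloc(PAGE_SZ, PAGE_SZ);
    int ret;

    if (!buf) {
        return -1;
    }
    /* The consumer requires page alignment, so verify it */
    assert(((uintptr_t)buf % PAGE_SZ) == 0);

    ret = fill(buf, PAGE_SZ);
    if (ret < 0) {
        free(buf);
        return -1;
    }

    /* Copy into the caller's buffer, which need not be aligned */
    memcpy(page, buf, PAGE_SZ);
    free(buf);

    return 0;
}

/* Test double: pretend the hypervisor returned a page full of 0xab */
static int fake_fill(void *aligned_buf, size_t len)
{
    memset(aligned_buf, 0xab, len);
    return 0;
}
```

The extra copy keeps the alignment requirement local to the helper, so the
buffers embedded in the VM state struct do not need to be page-aligned
themselves.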
 include/system/mshv_int.h    |   7 ++
 target/i386/cpu.h            |   5 ++
 target/i386/machine.c        |  26 ++++++
 target/i386/mshv/meson.build |   1 +
 target/i386/mshv/mshv-cpu.c  |  64 +++++++++++++++
 target/i386/mshv/msr.c       |   7 +-
 target/i386/mshv/synic.c     | 155 +++++++++++++++++++++++++++++++++++
 7 files changed, 260 insertions(+), 5 deletions(-)
 create mode 100644 target/i386/mshv/synic.c

diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 29b363e73e..80df4030c5 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -119,4 +119,11 @@ int mshv_init_msrs(const CPUState *cpu);
 int mshv_get_msrs(CPUState *cpu);
 int mshv_set_msrs(const CPUState *cpu);
 
+/* synic */
+int mshv_get_simp(int cpu_fd, uint8_t *page);
+int mshv_set_simp(int cpu_fd, const uint8_t *page);
+int mshv_get_siefp(int cpu_fd, uint8_t *page);
+int mshv_set_siefp(int cpu_fd, const uint8_t *page);
+bool mshv_synic_enabled(const CPUState *cpu);
+
 #endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0b539155c4..d010d26146 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -33,6 +33,7 @@
 #include "qemu/cpu-float.h"
 #include "qemu/timer.h"
 #include "standard-headers/asm-x86/kvm_para.h"
+#include "hw/hyperv/hvgdk_mini.h"
 
 #define XEN_NR_VIRQS 24
 
@@ -2291,6 +2292,10 @@ typedef struct CPUArchState {
 #if defined(CONFIG_HVF) || defined(CONFIG_MSHV) || defined(CONFIG_WHPX)
     void *emu_mmio_buf;
 #endif
+#if defined(CONFIG_MSHV)
+    uint8_t hv_simp_page[HV_HYP_PAGE_SIZE];
+    uint8_t hv_siefp_page[HV_HYP_PAGE_SIZE];
+#endif
 
     uint64_t mcg_cap;
     uint64_t mcg_ctl;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 48a2a4b319..f94cc544b3 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -952,6 +952,29 @@ static const VMStateDescription vmstate_msr_hyperv_reenlightenment = {
     }
 };
 
+#ifdef CONFIG_MSHV
+static bool mshv_synic_vp_state_needed(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    /* Only migrate SIMP/SIEFP if SynIC is enabled */
+    return env->msr_hv_synic_control & 1;
+}
+
+static const VMStateDescription vmstate_mshv_synic_vp_state = {
+    .name = "cpu/mshv_synic_vp_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mshv_synic_vp_state_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_BUFFER(env.hv_simp_page, X86CPU),
+        VMSTATE_BUFFER(env.hv_siefp_page, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif
+
 static bool avx512_needed(void *opaque)
 {
     X86CPU *cpu = opaque;
@@ -1916,6 +1939,9 @@ const VMStateDescription vmstate_x86_cpu = {
         &vmstate_cet,
 #ifdef TARGET_X86_64
         &vmstate_apx,
+#endif
+#ifdef CONFIG_MSHV
+        &vmstate_mshv_synic_vp_state,
 #endif
         NULL
     }
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
index f44e84688d..a847a6c74c 100644
--- a/target/i386/mshv/meson.build
+++ b/target/i386/mshv/meson.build
@@ -4,6 +4,7 @@ i386_mshv_ss.add(files(
   'mshv-apic.c',
   'mshv-cpu.c',
   'msr.c',
+  'synic.c',
 ))
 
 i386_system_ss.add_all(when: 'CONFIG_MSHV', if_true: i386_mshv_ss)
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 0d4721582a..49f3f9c090 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -128,6 +128,33 @@ static int get_lapic(CPUState *cpu)
     return 0;
 }
 
+static int get_synic_state(CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
+
+    /* SIMP/SIEFP can only be read when SynIC is enabled */
+    if (!mshv_synic_enabled(cpu)) {
+        return 0;
+    }
+
+    ret = mshv_get_simp(cpu_fd, env->hv_simp_page);
+    if (ret < 0) {
+        error_report("failed to get simp state");
+        return -1;
+    }
+
+    ret = mshv_get_siefp(cpu_fd, env->hv_siefp_page);
+    if (ret < 0) {
+        error_report("failed to get siefp state");
+        return -1;
+    }
+
+    return 0;
+}
+
 static void populate_fpu(const hv_register_assoc *assocs, X86CPU *x86cpu)
 {
     union hv_register_value value;
@@ -585,6 +612,11 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    ret = get_synic_state(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
@@ -1026,6 +1058,33 @@ static int set_lapic(const CPUState *cpu)
     return 0;
 }
 
+static int set_synic_state(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
+
+    /* SIMP/SIEFP can only be written when SynIC is enabled */
+    if (!mshv_synic_enabled(cpu)) {
+        return 0;
+    }
+
+    ret = mshv_set_simp(cpu_fd, env->hv_simp_page);
+    if (ret < 0) {
+        error_report("failed to set simp state");
+        return -1;
+    }
+
+    ret = mshv_set_siefp(cpu_fd, env->hv_siefp_page);
+    if (ret < 0) {
+        error_report("failed to set siefp state");
+        return -1;
+    }
+
+    return 0;
+}
+
 int mshv_arch_store_vcpu_state(const CPUState *cpu)
 {
     int ret;
@@ -1062,6 +1121,11 @@ int mshv_arch_store_vcpu_state(const CPUState *cpu)
         return ret;
     }
 
+    ret = set_synic_state(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
diff --git a/target/i386/mshv/msr.c b/target/i386/mshv/msr.c
index d19b79d729..bfae4ed0d8 100644
--- a/target/i386/mshv/msr.c
+++ b/target/i386/mshv/msr.c
@@ -299,7 +299,6 @@ int mshv_get_msrs(CPUState *cpu)
     size_t i, j;
     uint32_t name;
     X86CPU *x86cpu = X86_CPU(cpu);
-    bool synic_enabled;
 
     set_hv_name_in_assocs(assocs, n_assocs);
 
@@ -327,8 +326,7 @@ int mshv_get_msrs(CPUState *cpu)
     store_in_env(cpu, assocs, n_assocs);
 
     /* Read SINT MSRs only if SynIC is enabled */
-    synic_enabled = x86cpu->env.msr_hv_synic_control & 1;
-    if (synic_enabled) {
+    if (mshv_synic_enabled(cpu)) {
         QEMU_BUILD_BUG_ON(MSHV_MSR_TOTAL_COUNT < HV_SINT_COUNT);
 
         for (i = 0; i < HV_SINT_COUNT; i++) {
@@ -382,7 +380,6 @@ int mshv_set_msrs(const CPUState *cpu)
     int ret;
     size_t i, j;
     X86CPU *x86cpu = X86_CPU(cpu);
-    bool synic_enabled = x86cpu->env.msr_hv_synic_control & 1;
 
     load_from_env(cpu, assocs, n_assocs);
 
@@ -416,7 +413,7 @@ int mshv_set_msrs(const CPUState *cpu)
     }
 
     /* SINT MSRs can only be written if SCONTROL has been set, so we split */
-    if (synic_enabled) {
+    if (mshv_synic_enabled(cpu)) {
         QEMU_BUILD_BUG_ON(MSHV_MSR_TOTAL_COUNT < HV_SINT_COUNT);
 
         for (i = 0; i < HV_SINT_COUNT; i++) {
diff --git a/target/i386/mshv/synic.c b/target/i386/mshv/synic.c
new file mode 100644
index 0000000000..8f9fee6ed7
--- /dev/null
+++ b/target/i386/mshv/synic.c
@@ -0,0 +1,155 @@
+/*
+ * QEMU MSHV SynIC support
+ *
+ * Copyright Microsoft, Corp. 2026
+ *
+ * Authors: Magnus Kulke  <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/memalign.h"
+#include "qemu/error-report.h"
+
+#include "system/mshv.h"
+#include "system/mshv_int.h"
+
+#include "linux/mshv.h"
+#include "hw/hyperv/hvgdk_mini.h"
+#include "cpu.h"
+
+#include <sys/ioctl.h>
+
+bool mshv_synic_enabled(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+
+    return x86cpu->env.msr_hv_synic_control & 1;
+}
+
+static int get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
+{
+    int ret;
+
+    ret = ioctl(cpu_fd, MSHV_GET_VP_STATE, state);
+    if (ret < 0) {
+        error_report("failed to get vp state: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
+{
+    int ret;
+
+    ret = ioctl(cpu_fd, MSHV_SET_VP_STATE, state);
+    if (ret < 0) {
+        error_report("failed to set vp state: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+int mshv_get_simp(int cpu_fd, uint8_t *page)
+{
+    int ret;
+    void *buffer;
+    struct mshv_get_set_vp_state args = {0};
+
+    buffer = qemu_memalign(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
+    args.buf_ptr = (uint64_t)buffer;
+    args.buf_sz = HV_HYP_PAGE_SIZE;
+    args.type = MSHV_VP_STATE_SIMP;
+
+    ret = get_vp_state(cpu_fd, &args);
+
+    if (ret < 0) {
+        qemu_vfree(buffer);
+        error_report("failed to get simp");
+        return -1;
+    }
+
+    memcpy(page, buffer, HV_HYP_PAGE_SIZE);
+    qemu_vfree(buffer);
+
+    return 0;
+}
+
+int mshv_set_simp(int cpu_fd, const uint8_t *page)
+{
+    int ret;
+    void *buffer;
+    struct mshv_get_set_vp_state args = {0};
+
+    buffer = qemu_memalign(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
+    args.buf_ptr = (uint64_t)buffer;
+    args.buf_sz = HV_HYP_PAGE_SIZE;
+    args.type = MSHV_VP_STATE_SIMP;
+
+    assert(page);
+    memcpy(buffer, page, HV_HYP_PAGE_SIZE);
+
+    ret = set_vp_state(cpu_fd, &args);
+    qemu_vfree(buffer);
+
+    if (ret < 0) {
+        error_report("failed to set simp");
+        return -1;
+    }
+
+    return 0;
+}
+
+int mshv_get_siefp(int cpu_fd, uint8_t *page)
+{
+    int ret;
+    void *buffer;
+    struct mshv_get_set_vp_state args = {0};
+
+    buffer = qemu_memalign(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
+    args.buf_ptr = (uint64_t)buffer;
+    args.buf_sz = HV_HYP_PAGE_SIZE;
+    args.type = MSHV_VP_STATE_SIEFP;
+
+    ret = get_vp_state(cpu_fd, &args);
+
+    if (ret < 0) {
+        qemu_vfree(buffer);
+        error_report("failed to get siefp");
+        return -1;
+    }
+
+    memcpy(page, buffer, HV_HYP_PAGE_SIZE);
+    qemu_vfree(buffer);
+
+    return 0;
+}
+
+int mshv_set_siefp(int cpu_fd, const uint8_t *page)
+{
+    int ret;
+    void *buffer;
+    struct mshv_get_set_vp_state args = {0};
+
+    buffer = qemu_memalign(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
+    args.buf_ptr = (uint64_t)buffer;
+    args.buf_sz = HV_HYP_PAGE_SIZE;
+    args.type = MSHV_VP_STATE_SIEFP;
+
+    assert(page);
+    memcpy(buffer, page, HV_HYP_PAGE_SIZE);
+
+    ret = set_vp_state(cpu_fd, &args);
+    qemu_vfree(buffer);
+
+    if (ret < 0) {
+        error_report("failed to set siefp");
+        return -1;
+    }
+
+    return 0;
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 23/32] target/i386/mshv: migrate STIMER state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (21 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 22/32] target/i386/mshv: migrate SIMP and SIEFP state Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 24/32] accel/mshv: introduce SaveVMHandler Magnus Kulke
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This part of the SynIC state is retrieved via a mem-aligned page. We declare
the required space (size reference: rust-vmm/mshv) as a buffer on the VM
state struct for inclusion in a migration.

Unlike other SynIC features, STIMER doesn't depend on SCONTROL being
set.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
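Note for readers: the timer state is smaller than the transfer page, so
only a prefix of the page is kept. A minimal standalone sketch of that
packing, assuming PAGE_SZ matches HV_HYP_PAGE_SIZE and using hypothetical
helper names (stimers_to_page, stimers_from_page) that do not exist in the
patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 4096           /* assumed to match HV_HYP_PAGE_SIZE */
#define STIMERS_STATE_SIZE 200 /* MSHV_STIMERS_STATE_SIZE in the patch */

/* Place the saved timer blob at the start of a zeroed transfer page */
static void stimers_to_page(uint8_t *page, const uint8_t *state)
{
    memset(page, 0, PAGE_SZ);
    memcpy(page, state, STIMERS_STATE_SIZE);
}

/* Keep only the meaningful prefix of a transfer page */
static void stimers_from_page(uint8_t *state, const uint8_t *page)
{
    memcpy(state, page, STIMERS_STATE_SIZE);
}
```

Zeroing the page before the copy means the bytes past the 200-byte blob are
well-defined when the page is handed back to the hypervisor.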
 include/system/mshv_int.h   |  2 ++
 target/i386/cpu.h           |  5 ++++
 target/i386/machine.c       | 20 +++++++++++++++
 target/i386/mshv/mshv-cpu.c | 12 +++++++++
 target/i386/mshv/synic.c    | 51 +++++++++++++++++++++++++++++++++++++
 5 files changed, 90 insertions(+)

diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 80df4030c5..7d685fc647 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -125,5 +125,7 @@ int mshv_set_simp(int cpu_fd, const uint8_t *page);
 int mshv_get_siefp(int cpu_fd, uint8_t *page);
 int mshv_set_siefp(int cpu_fd, const uint8_t *page);
 bool mshv_synic_enabled(const CPUState *cpu);
+int mshv_get_synthetic_timers(int cpu_fd, uint8_t *state);
+int mshv_set_synthetic_timers(int cpu_fd, const uint8_t *state);
 
 #endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d010d26146..4ad4a35ce9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -45,6 +45,10 @@
 #define ELF_MACHINE_UNAME "i686"
 #endif
 
+#ifdef CONFIG_MSHV
+#define MSHV_STIMERS_STATE_SIZE 200
+#endif
+
 enum {
     R_EAX = 0,
     R_ECX = 1,
@@ -2295,6 +2299,7 @@ typedef struct CPUArchState {
 #if defined(CONFIG_MSHV)
     uint8_t hv_simp_page[HV_HYP_PAGE_SIZE];
     uint8_t hv_siefp_page[HV_HYP_PAGE_SIZE];
+    uint8_t hv_synthetic_timers_state[MSHV_STIMERS_STATE_SIZE];
 #endif
 
     uint64_t mcg_cap;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index f94cc544b3..38ccbbe19d 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -10,6 +10,7 @@
 #include "exec/watchpoint.h"
 #include "system/kvm.h"
 #include "system/kvm_xen.h"
+#include "system/mshv.h"
 #include "system/tcg.h"
 
 #include "qemu/error-report.h"
@@ -953,6 +954,24 @@ static const VMStateDescription vmstate_msr_hyperv_reenlightenment = {
 };
 
 #ifdef CONFIG_MSHV
+
+static bool mshv_synthetic_timers_needed(void *opaque)
+{
+    /* Always migrate synthetic timers */
+    return mshv_enabled();
+}
+
+static const VMStateDescription vmstate_mshv_synthetic_timers = {
+    .name = "cpu/mshv_synthetic_timers",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mshv_synthetic_timers_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_BUFFER(env.hv_synthetic_timers_state, X86CPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static bool mshv_synic_vp_state_needed(void *opaque)
 {
     X86CPU *cpu = opaque;
@@ -1942,6 +1961,7 @@ const VMStateDescription vmstate_x86_cpu = {
 #endif
 #ifdef CONFIG_MSHV
         &vmstate_mshv_synic_vp_state,
+        &vmstate_mshv_synthetic_timers,
 #endif
         NULL
     }
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 49f3f9c090..ec1caf4e7a 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -135,6 +135,12 @@ static int get_synic_state(CPUState *cpu)
     int cpu_fd = mshv_vcpufd(cpu);
     int ret;
 
+    ret = mshv_get_synthetic_timers(cpu_fd, env->hv_synthetic_timers_state);
+    if (ret < 0) {
+        error_report("failed to get synthetic timers");
+        return -1;
+    }
+
     /* SIMP/SIEFP can only be read when SynIC is enabled */
     if (!mshv_synic_enabled(cpu)) {
         return 0;
@@ -1065,6 +1071,12 @@ static int set_synic_state(const CPUState *cpu)
     int cpu_fd = mshv_vcpufd(cpu);
     int ret;
 
+    ret = mshv_set_synthetic_timers(cpu_fd, env->hv_synthetic_timers_state);
+    if (ret < 0) {
+        error_report("failed to set synthetic timers state");
+        return -1;
+    }
+
     /* SIMP/SIEFP can only be written when SynIC is enabled */
     if (!mshv_synic_enabled(cpu)) {
         return 0;
diff --git a/target/i386/mshv/synic.c b/target/i386/mshv/synic.c
index 8f9fee6ed7..4c629adc3a 100644
--- a/target/i386/mshv/synic.c
+++ b/target/i386/mshv/synic.c
@@ -54,6 +54,57 @@ static int set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
     return 0;
 }
 
+int mshv_get_synthetic_timers(int cpu_fd, uint8_t *state)
+{
+    int ret;
+    void *buffer;
+    struct mshv_get_set_vp_state args = {0};
+
+    buffer = qemu_memalign(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
+    args.buf_ptr = (uint64_t)buffer;
+    args.buf_sz = HV_HYP_PAGE_SIZE;
+    args.type = MSHV_VP_STATE_SYNTHETIC_TIMERS;
+
+    ret = get_vp_state(cpu_fd, &args);
+
+    if (ret < 0) {
+        qemu_vfree(buffer);
+        error_report("failed to get synthetic timers");
+        return -1;
+    }
+
+    memcpy(state, buffer, MSHV_STIMERS_STATE_SIZE);
+    qemu_vfree(buffer);
+
+    return 0;
+}
+
+int mshv_set_synthetic_timers(int cpu_fd, const uint8_t *state)
+{
+    int ret;
+    void *buffer;
+    struct mshv_get_set_vp_state args = {0};
+
+    buffer = qemu_memalign(HV_HYP_PAGE_SIZE, HV_HYP_PAGE_SIZE);
+    memset(buffer, 0, HV_HYP_PAGE_SIZE);
+    args.buf_ptr = (uint64_t)buffer;
+    args.buf_sz = HV_HYP_PAGE_SIZE;
+    args.type = MSHV_VP_STATE_SYNTHETIC_TIMERS;
+
+    assert(state);
+    memcpy(buffer, state, MSHV_STIMERS_STATE_SIZE);
+
+    ret = set_vp_state(cpu_fd, &args);
+    qemu_vfree(buffer);
+
+    if (ret < 0) {
+        error_report("failed to set synthetic timers");
+        return -1;
+    }
+
+    return 0;
+}
+
 int mshv_get_simp(int cpu_fd, uint8_t *page)
 {
     int ret;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 24/32] accel/mshv: introduce SaveVMHandler
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (22 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 23/32] target/i386/mshv: migrate STIMER state Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 25/32] accel/mshv: write synthetic MSRs after migration Magnus Kulke
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This mechanism is used to handle more imperative partition-wide steps
that have to be taken as part of a migration routine.

Currently it's a skeleton that will just pause/resume the partition as
part of a migration. It will later be extended with more specific steps
like reference time and hypercall pages.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
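Note for readers: the pause/resume pairing of the skeleton can be sketched
without any QEMU types. The following is a toy model only. struct
partition, struct save_handlers and run_save() are hypothetical and are not
QEMU's SaveVMHandlers API; they only show the intended ordering of
prepare, save and cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical partition handle that only tracks whether it is paused */
struct partition {
    bool paused;
};

/* Hypothetical handler table, loosely shaped like prepare/save/cleanup */
struct save_handlers {
    int (*prepare)(struct partition *p);
    int (*save)(struct partition *p);
    void (*cleanup)(struct partition *p);
};

static int prepare_pause(struct partition *p)
{
    p->paused = true;
    return 0;
}

/* State may only be captured while the partition is paused */
static int save_paused_only(struct partition *p)
{
    return p->paused ? 0 : -1;
}

static void cleanup_resume(struct partition *p)
{
    p->paused = false;
}

/* Run one save cycle; cleanup runs whether or not the save succeeded */
static int run_save(const struct save_handlers *h, struct partition *p)
{
    int ret = h->prepare(p);

    if (ret < 0) {
        return ret;
    }

    ret = h->save(p);
    h->cleanup(p);

    return ret;
}
```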
 accel/mshv/mshv-all.c | 75 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index d1bf148bb8..365288b901 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -39,6 +39,8 @@
 #include "system/mshv.h"
 #include "system/mshv_int.h"
 #include "system/reset.h"
+#include "migration/qemu-file-types.h"
+#include "migration/register.h"
 #include "trace.h"
 #include <err.h>
 #include <sys/ioctl.h>
@@ -47,6 +49,9 @@ bool mshv_allowed;
 
 MshvState *mshv_state;
 
+static int pause_vm(int vm_fd);
+static int resume_vm(int vm_fd);
+
 static int init_mshv(int *mshv_fd)
 {
     int fd = open("/dev/mshv", O_RDWR | O_CLOEXEC);
@@ -80,6 +85,64 @@ static int set_time_freeze(int vm_fd, int freeze)
     return 0;
 }
 
+static void mshv_save_state(QEMUFile *f, void *opaque)
+{
+    return;
+}
+
+static int mshv_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+    return 0;
+}
+
+static int mshv_save_prepare(void *opaque, Error **errp)
+{
+    MshvState *s = opaque;
+    int ret;
+
+    ret = pause_vm(s->vm);
+    if (ret < 0) {
+        error_setg(errp, "Failed to pause VM for migration");
+        return -1;
+    }
+
+    return 0;
+}
+
+static void mshv_save_cleanup(void *opaque)
+{
+    MshvState *s = opaque;
+    int ret;
+
+    ret = resume_vm(s->vm);
+    if (ret < 0) {
+        error_report("Failed to resume VM after migration");
+    }
+}
+
+static int mshv_load_setup(QEMUFile *f, void *opaque, Error **errp)
+{
+    MshvState *s = opaque;
+    int ret;
+
+    ret = pause_vm(s->vm);
+    if (ret < 0) {
+        error_setg(errp, "Failed to pause VM for migration restore");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int mshv_load_cleanup(void *opaque)
+{
+    MshvState *s = opaque;
+
+    resume_vm(s->vm);
+
+    return 0;
+}
+
 static int pause_vm(int vm_fd)
 {
     int ret;
@@ -454,6 +517,15 @@ static int mshv_init_vcpu(CPUState *cpu)
     return 0;
 }
 
+static SaveVMHandlers savevm_mshv = {
+    .save_prepare = mshv_save_prepare,
+    .save_state = mshv_save_state,
+    .save_cleanup = mshv_save_cleanup,
+    .load_setup = mshv_load_setup,
+    .load_state = mshv_load_state,
+    .load_cleanup = mshv_load_cleanup,
+};
+
 static int mshv_init(AccelState *as, MachineState *ms)
 {
     MshvState *s;
@@ -509,6 +581,9 @@ static int mshv_init(AccelState *as, MachineState *ms)
                                   0, "mshv-memory");
     memory_listener_register(&mshv_io_listener, &address_space_io);
 
+    /* register custom handlers for migration events */
+    register_savevm_live("mshv", 0, 1, &savevm_mshv, s);
+
     return 0;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 25/32] accel/mshv: write synthetic MSRs after migration
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (23 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 24/32] accel/mshv: introduce SaveVMHandler Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 26/32] accel/mshv: migrate REFERENCE_TIME Magnus Kulke
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Write partition-wide synthetic MSRs. This ensures the hypercall page and
SynIC facilities are set up before vCPUs attempt to use them.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c          |  7 +++++++
 include/hw/hyperv/hvgdk_mini.h |  3 +++
 include/system/mshv_int.h      |  1 +
 target/i386/mshv/mshv-cpu.c    | 15 +++++++++++++++
 4 files changed, 26 insertions(+)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 365288b901..22e838ede6 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -137,6 +137,13 @@ static int mshv_load_setup(QEMUFile *f, void *opaque, Error **errp)
 static int mshv_load_cleanup(void *opaque)
 {
     MshvState *s = opaque;
+    int ret;
+
+    ret = mshv_arch_set_partition_msrs(first_cpu);
+    if (ret < 0) {
+        error_report("Failed to set partition MSRs: %s", strerror(-ret));
+        return -1;
+    }
 
     resume_vm(s->vm);
 
diff --git a/include/hw/hyperv/hvgdk_mini.h b/include/hw/hyperv/hvgdk_mini.h
index a47bc6212e..00daac0431 100644
--- a/include/hw/hyperv/hvgdk_mini.h
+++ b/include/hw/hyperv/hvgdk_mini.h
@@ -23,6 +23,9 @@
 #define HV_X64_MSR_APIC_FREQUENCY   0x40000023
 
 typedef enum hv_register_name {
+    /* VP Management Registers */
+    HV_REGISTER_INTERNAL_ACTIVITY_STATE = 0x00000004,
+
     /* Pending Interruption Register */
     HV_REGISTER_PENDING_INTERRUPTION = 0x00010002,
 
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 7d685fc647..7052f20a00 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -84,6 +84,7 @@ int mshv_get_generic_regs(CPUState *cpu, hv_register_assoc *assocs,
                           size_t n_regs);
 int mshv_arch_store_vcpu_state(const CPUState *cpu);
 int mshv_arch_load_vcpu_state(CPUState *cpu);
+int mshv_arch_set_partition_msrs(const CPUState *cpu);
 void mshv_arch_init_vcpu(CPUState *cpu);
 void mshv_arch_destroy_vcpu(CPUState *cpu);
 void mshv_arch_amend_proc_features(
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index ec1caf4e7a..0b08f478ce 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -1141,6 +1141,21 @@ int mshv_arch_store_vcpu_state(const CPUState *cpu)
     return 0;
 }
 
+int mshv_arch_set_partition_msrs(const CPUState *cpu)
+{
+    CPUX86State *env = &X86_CPU(cpu)->env;
+    struct hv_register_assoc assocs[] = {
+        { .name = HV_REGISTER_GUEST_OS_ID,
+          .value.reg64 = env->msr_hv_guest_os_id },
+        { .name = HV_REGISTER_REFERENCE_TSC,
+          .value.reg64 = env->msr_hv_tsc },
+        { .name = HV_X64_REGISTER_HYPERCALL,
+          .value.reg64 = env->msr_hv_hypercall },
+    };
+
+    return mshv_set_generic_regs(cpu, assocs, ARRAY_SIZE(assocs));
+}
+
 void mshv_arch_amend_proc_features(
     union hv_partition_synthetic_processor_features *features)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 26/32] accel/mshv: migrate REFERENCE_TIME
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (24 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 25/32] accel/mshv: write synthetic MSRs after migration Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 27/32] target/i386/mshv: migrate pending ints/excs Magnus Kulke
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This is partition-wide state that we use a dedicated handler for. It
is stored as-is in the "mshv" savevm_live section. We might have to extend
this later if we have more partition-wide state to roundtrip.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
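Note for readers: the reference time travels through the migration stream
as a single big-endian 64-bit value (qemu_put_be64/qemu_get_be64 in the
diff). A standalone sketch of that encoding, using hypothetical put_be64()
and get_be64() helpers that operate on a plain byte buffer instead of a
QEMUFile:

```c
#include <assert.h>
#include <stdint.h>

/* Serialize v most-significant byte first */
static void put_be64(uint8_t *out, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Reassemble a value written by put_be64() */
static uint64_t get_be64(const uint8_t *in)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }

    return v;
}
```

A fixed byte order keeps the stream independent of the endianness of the
source and destination hosts.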
 accel/mshv/mshv-all.c | 70 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 22e838ede6..3927a82925 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -85,13 +85,81 @@ static int set_time_freeze(int vm_fd, int freeze)
     return 0;
 }
 
+static int get_reference_time(int vm_fd, uint64_t *ref_time)
+{
+    int ret;
+    struct hv_input_get_partition_property in = {0};
+    struct hv_output_get_partition_property out = {0};
+    struct mshv_root_hvcall args = {0};
+
+    in.property_code = HV_PARTITION_PROPERTY_REFERENCE_TIME;
+
+    args.code = HVCALL_GET_PARTITION_PROPERTY;
+    args.in_sz = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+    args.out_sz = sizeof(out);
+    args.out_ptr = (uint64_t)&out;
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to get reference time");
+        return -1;
+    }
+
+    *ref_time = out.property_value;
+    return 0;
+}
+
 static void mshv_save_state(QEMUFile *f, void *opaque)
 {
-    return;
+    MshvState *s = opaque;
+    uint64_t ref_time;
+    int ret;
+
+    ret = get_reference_time(s->vm, &ref_time);
+    if (ret < 0) {
+        error_report("Failed to get reference time for migration");
+        abort();
+    }
+
+    qemu_put_be64(f, ref_time);
+}
+
+static int set_reference_time(int vm_fd, uint64_t ref_time)
+{
+    int ret;
+    struct hv_input_set_partition_property in = {0};
+    struct mshv_root_hvcall args = {0};
+
+    in.property_code = HV_PARTITION_PROPERTY_REFERENCE_TIME;
+    in.property_value = ref_time;
+
+    args.code = HVCALL_SET_PARTITION_PROPERTY;
+    args.in_sz = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to set reference time");
+        return -1;
+    }
+
+    return 0;
 }
 
 static int mshv_load_state(QEMUFile *f, void *opaque, int version_id)
 {
+    MshvState *s = opaque;
+    uint64_t ref_time;
+    int ret;
+
+    ref_time = qemu_get_be64(f);
+
+    ret = set_reference_time(s->vm, ref_time);
+    if (ret < 0) {
+        error_report("Failed to set reference time after migration");
+        return -1;
+    }
     return 0;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 27/32] target/i386/mshv: migrate pending ints/excs
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (25 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 26/32] accel/mshv: migrate REFERENCE_TIME Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 28/32] target/i386: add de/compaction to xsave_helper Magnus Kulke
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

We use the PENDING_INTERRUPTION, INTERRUPT_STATE and PENDING_EVENT0 hv
registers to map and roundtrip from/to CPUX86State.

We ignore HV_REGISTER_PENDING_EVENT1, which represents events for nested
virt contexts, as we don't support nested virt with MSHV currently.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
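Note for readers: decoding the pending interruption register can be
sketched with plain shifts and masks. The bit positions below are an
assumption based on the layout of hv_x64_pending_interruption_register in
the Linux hvgdk_mini.h header (pending: bit 0, type: bits 1-3, deliver
error code: bit 4, vector: bits 16-31, error code: bits 32-63); please
check them against the TLFS before relying on them. struct
pending_interruption and decode_pending() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interruption-type encoding, same values as the defines in this patch */
enum {
    INT_TYPE_EXT_INT     = 0,
    INT_TYPE_NMI         = 2,
    INT_TYPE_HW_EXC      = 3,
    INT_TYPE_SW_INT      = 4,
    INT_TYPE_PRIV_SW_EXC = 5,
    INT_TYPE_SW_EXC      = 6,
};

/* Hypothetical decoded view of the raw 64-bit register value */
struct pending_interruption {
    bool pending;
    unsigned type;
    bool deliver_error_code;
    unsigned vector;
    uint32_t error_code;
};

/* Split the raw register into fields (bit layout is an assumption) */
static struct pending_interruption decode_pending(uint64_t raw)
{
    struct pending_interruption p;

    p.pending = raw & 1;
    p.type = (raw >> 1) & 0x7;
    p.deliver_error_code = (raw >> 4) & 1;
    p.vector = (raw >> 16) & 0xffff;
    p.error_code = (uint32_t)(raw >> 32);

    return p;
}
```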
 include/hw/hyperv/hvgdk_mini.h |   3 +
 include/system/mshv_int.h      |  13 +++
 target/i386/mshv/mshv-cpu.c    | 168 +++++++++++++++++++++++++++++++++
 3 files changed, 184 insertions(+)

diff --git a/include/hw/hyperv/hvgdk_mini.h b/include/hw/hyperv/hvgdk_mini.h
index 00daac0431..a88420fafe 100644
--- a/include/hw/hyperv/hvgdk_mini.h
+++ b/include/hw/hyperv/hvgdk_mini.h
@@ -28,6 +28,9 @@ typedef enum hv_register_name {
 
     /* Pending Interruption Register */
     HV_REGISTER_PENDING_INTERRUPTION = 0x00010002,
+    HV_REGISTER_INTERRUPT_STATE      = 0x00010003,
+    HV_REGISTER_PENDING_EVENT0       = 0x00010004,
+    HV_REGISTER_PENDING_EVENT1       = 0x00010005,
 
     /* X64 User-Mode Registers */
     HV_X64_REGISTER_RAX     = 0x00020000,
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index 7052f20a00..bc16b794b2 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -18,6 +18,19 @@
 
 struct mshv_get_set_vp_state;
 
+/*
+ * Interruption-type encoding, used by the hypervisor in
+ * hv_x64_pending_interruption_register.interruption_type
+ * See TLFS 6.0 section 7.9.2, p55
+ * https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
+ */
+#define MSHV_HV_INTERRUPTION_TYPE_EXT_INT     0
+#define MSHV_HV_INTERRUPTION_TYPE_NMI         2
+#define MSHV_HV_INTERRUPTION_TYPE_HW_EXC      3
+#define MSHV_HV_INTERRUPTION_TYPE_SW_INT      4
+#define MSHV_HV_INTERRUPTION_TYPE_PRIV_SW_EXC 5
+#define MSHV_HV_INTERRUPTION_TYPE_SW_EXC      6
+
 typedef struct hyperv_message hv_message;
 
 typedef struct MshvHvCallArgs {
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 0b08f478ce..746987d62b 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -584,6 +584,164 @@ static int load_regs(CPUState *cpu)
     return 0;
 }
 
+static int get_vcpu_events(CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct hv_register_assoc assocs[] = {
+        { .name = HV_REGISTER_PENDING_INTERRUPTION },
+        { .name = HV_REGISTER_INTERRUPT_STATE },
+        { .name = HV_REGISTER_PENDING_EVENT0 },
+    };
+    union hv_x64_pending_interruption_register pending_int;
+    union hv_x64_interrupt_state_register int_state;
+    union hv_x64_pending_exception_event pending_exc;
+    int ret;
+
+    ret = mshv_get_generic_regs(cpu, assocs, ARRAY_SIZE(assocs));
+    if (ret < 0) {
+        error_report("failed to get vcpu event registers");
+        return -1;
+    }
+
+    pending_int.as_uint64 = assocs[0].value.reg64;
+    int_state.as_uint64 = assocs[1].value.reg64;
+    pending_exc = assocs[2].value.pending_exception_event;
+
+    /* Clear previous state. injected ints/excs are blanked w/ -1 */
+    env->interrupt_injected    = -1;
+    env->soft_interrupt        = 0;
+    env->exception_injected    = 0;
+    env->exception_pending     = 0;
+    env->exception_nr          = -1;
+    env->has_error_code        = 0;
+    env->error_code            = 0;
+    env->exception_has_payload = 0;
+    env->exception_payload     = 0;
+    env->nmi_injected          = 0;
+
+    if (pending_int.interruption_pending) {
+        switch (pending_int.interruption_type) {
+        case MSHV_HV_INTERRUPTION_TYPE_EXT_INT:
+            env->interrupt_injected = pending_int.interruption_vector;
+            break;
+        case MSHV_HV_INTERRUPTION_TYPE_NMI:
+            env->nmi_injected = 1;
+            break;
+        case MSHV_HV_INTERRUPTION_TYPE_HW_EXC:
+            env->exception_injected = 1;
+            env->exception_nr       = pending_int.interruption_vector;
+            env->has_error_code     = pending_int.deliver_error_code;
+            env->error_code         = pending_int.error_code;
+            break;
+        case MSHV_HV_INTERRUPTION_TYPE_SW_INT:
+            env->interrupt_injected = pending_int.interruption_vector;
+            env->soft_interrupt     = 1;
+            break;
+        case MSHV_HV_INTERRUPTION_TYPE_SW_EXC:
+        case MSHV_HV_INTERRUPTION_TYPE_PRIV_SW_EXC:
+            env->exception_injected = 1;
+            env->exception_nr       = pending_int.interruption_vector;
+            env->has_error_code     = pending_int.deliver_error_code;
+            env->error_code         = pending_int.error_code;
+            break;
+        default:
+            error_report("unknown interruption type %u",
+                         pending_int.interruption_type);
+            return -EINVAL;
+        }
+    }
+
+    /* disabled for one instr after STI, MOV/POP SS, see hvf_store_events() */
+    if (int_state.interrupt_shadow) {
+        env->hflags |= HF_INHIBIT_IRQ_MASK;
+    } else {
+        env->hflags &= ~HF_INHIBIT_IRQ_MASK;
+    }
+
+    /* see kvm_get_vcpu_events(), hvf_store_events() */
+    if (int_state.nmi_masked) {
+        env->hflags2 |= HF2_NMI_MASK;
+    } else {
+        env->hflags2 &= ~HF2_NMI_MASK;
+    }
+
+    /* HV_REGISTER_PENDING_EVENT0: pending exception not yet injected */
+    if (pending_exc.event_pending) {
+        env->exception_pending     = 1;
+        env->exception_nr          = pending_exc.vector;
+        env->has_error_code        = pending_exc.deliver_error_code;
+        env->error_code            = pending_exc.error_code;
+        env->exception_has_payload = (pending_exc.exception_parameter != 0);
+        env->exception_payload     = pending_exc.exception_parameter;
+    }
+
+    /*
+     * Ignoring HV_REGISTER_PENDING_EVENT1, virtualization fault events, MSHV
+     * does not support nested virtualization.
+     */
+
+    return 0;
+}
+
+static int set_vcpu_events(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    union hv_x64_pending_interruption_register pending_int = { 0 };
+    union hv_x64_interrupt_state_register int_state = { 0 };
+    union hv_x64_pending_exception_event pending_exc = { 0 };
+    struct hv_register_assoc assocs[3];
+    int ret;
+
+    /* build pending_int from CPUX86State */
+    if (env->exception_injected) {
+        pending_int.interruption_pending = 1;
+        pending_int.interruption_type    = MSHV_HV_INTERRUPTION_TYPE_HW_EXC;
+        pending_int.interruption_vector  = env->exception_nr;
+        pending_int.deliver_error_code   = env->has_error_code;
+        pending_int.error_code           = env->error_code;
+    } else if (env->nmi_injected) {
+        pending_int.interruption_pending = 1;
+        pending_int.interruption_type    = MSHV_HV_INTERRUPTION_TYPE_NMI;
+        pending_int.interruption_vector  = EXCP02_NMI;
+    } else if (env->interrupt_injected >= 0) {
+        pending_int.interruption_pending = 1;
+        pending_int.interruption_type    = env->soft_interrupt
+            ? MSHV_HV_INTERRUPTION_TYPE_SW_INT
+            : MSHV_HV_INTERRUPTION_TYPE_EXT_INT;
+        pending_int.interruption_vector  = env->interrupt_injected;
+    }
+
+    /* build int_state, normalize to bool */
+    int_state.interrupt_shadow = !!(env->hflags  & HF_INHIBIT_IRQ_MASK);
+    int_state.nmi_masked       = !!(env->hflags2 & HF2_NMI_MASK);
+
+    /* build pending_exc */
+    if (env->exception_pending) {
+        pending_exc.event_pending       = 1;
+        pending_exc.vector              = env->exception_nr;
+        pending_exc.deliver_error_code  = env->has_error_code;
+        pending_exc.error_code          = env->error_code;
+        pending_exc.exception_parameter = env->exception_payload;
+    }
+
+    assocs[0].name = HV_REGISTER_PENDING_INTERRUPTION;
+    assocs[0].value.reg64 = pending_int.as_uint64;
+    assocs[1].name = HV_REGISTER_INTERRUPT_STATE;
+    assocs[1].value.reg64 = int_state.as_uint64;
+    assocs[2].name = HV_REGISTER_PENDING_EVENT0;
+    assocs[2].value.pending_exception_event = pending_exc;
+
+    ret = mshv_set_generic_regs(cpu, assocs, ARRAY_SIZE(assocs));
+    if (ret < 0) {
+        error_report("failed to set vcpu event registers");
+        return -1;
+    }
+
+    return 0;
+}
+
 int mshv_arch_load_vcpu_state(CPUState *cpu)
 {
     int ret;
@@ -623,6 +781,11 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    ret = get_vcpu_events(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
@@ -1138,6 +1301,11 @@ int mshv_arch_store_vcpu_state(const CPUState *cpu)
         return ret;
     }
 
+    ret = set_vcpu_events(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 28/32] target/i386: add de/compaction to xsave_helper
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (26 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 27/32] target/i386/mshv: migrate pending ints/excs Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 29/32] target/i386/mshv: migrate XSAVE state Magnus Kulke
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

Hyper-V uses XSAVES, which stores extended state in compacted format, in
which components are packed contiguously, while QEMU's internal XSAVE
representation uses the standard format, in which each component is placed
at a fixed offset. Hence we add two conversion fn's to the xsave helper to
roundtrip XSAVE state in a migration.

- decompact_xsave_area(): converts compacted format to standard.
  XSTATE_BV is masked to the guest's supported XCR0 features, since
  IA32_XSS is managed by the hypervisor.

- compact_xsave_area(): converts standard format back to compacted
  format. XCOMP_BV is set from the host's CPUID 0xD.0, intersected with
  the guest's supported XCR0 features, rather than from the guest's
  current XCR0, as this is what the hypervisor expects.

Both functions use the host's CPUID leaf 0xD subleaves to determine component
sizes, offsets, and alignment requirements.
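
As an illustration of how the compacted offsets follow from those CPUID
values, here is a minimal standalone sketch. The table and the demo_*
names are hypothetical; in the patch the values come from CPUID leaf 0xD
subleaf i (EAX = size, EBX = standard offset, ECX bit 1 = 64-byte
alignment in compacted format).

```c
#include <stddef.h>
#include <stdint.h>

/* 512-byte legacy area followed by the 64-byte XSAVE header. */
#define DEMO_EXT_OFFSET 576u

/* Hypothetical per-component description, indexed by component number. */
struct demo_component {
    uint32_t size;      /* EAX: size in bytes */
    uint32_t std_off;   /* EBX: offset in the standard layout */
    uint32_t flags;     /* ECX: bit 1 = align to 64 bytes when compacted */
};

static size_t demo_align_up(size_t v, size_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/*
 * Walk the layout mask in component order and return the compacted
 * offset of component idx, or 0 if idx is not part of the layout.
 * tbl must have at least 63 entries.
 */
static size_t demo_compacted_offset(const struct demo_component *tbl,
                                    uint64_t layout_bv, unsigned idx)
{
    size_t off = DEMO_EXT_OFFSET;

    for (unsigned i = 2; i < 63; i++) {
        if (!((layout_bv >> i) & 1)) {
            continue;
        }
        if (tbl[i].flags & (1u << 1)) {
            off = demo_align_up(off, 64);
        }
        if (i == idx) {
            return off;
        }
        /* Every component in the layout occupies space, copied or not. */
        off += tbl[i].size;
    }
    return 0;
}
```

The standard-format offset (std_off) plays no role in the compacted
walk; it is only needed as the copy source or destination on the QEMU
side.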

There are situations when the host advertises features that we want to
disable for the guest, e.g. AMX TILE. In this case we cannot rely on the
host's xcr0, but instead we use the feature mask that has been generated
as part of the CPU realization process (x86_cpu_expand_features).
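
The resulting XCOMP_BV computation boils down to the following sketch
(demo_xcomp_bv() is a hypothetical helper, not part of the patch; the
two masks correspond to the host's CPUID 0xD.0 EDX:EAX and the
FEAT_XSAVE_XCR0_* feature words):

```c
#include <stdint.h>

/* Bit 63 of XCOMP_BV marks the buffer as compacted format. */
#define DEMO_XCOMP_BV_COMPACTED (1ULL << 63)

/*
 * Keep only components that the host offers and the guest model has
 * not disabled, then flag the result as a compacted layout.
 */
static uint64_t demo_xcomp_bv(uint64_t host_xcr0, uint64_t guest_supported)
{
    return (host_xcr0 & guest_supported) | DEMO_XCOMP_BV_COMPACTED;
}
```

With a host mask that includes the AMX components (bits 17 and 18) and
a guest mask of 0x7, the AMX bits drop out of the layout.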

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/cpu.h          |   2 +
 target/i386/xsave_helper.c | 255 +++++++++++++++++++++++++++++++++++++
 2 files changed, 257 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4ad4a35ce9..cd5d5a5369 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -3033,6 +3033,8 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen);
 void x86_cpu_xsave_all_areas(X86CPU *cpu, void *buf, uint32_t buflen);
 uint32_t xsave_area_size(uint64_t mask, bool compacted);
 void x86_update_hflags(CPUX86State* env);
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env);
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen);
 
 static inline bool hyperv_feat_enabled(X86CPU *cpu, int feat)
 {
diff --git a/target/i386/xsave_helper.c b/target/i386/xsave_helper.c
index bab2258732..2272b83f5f 100644
--- a/target/i386/xsave_helper.c
+++ b/target/i386/xsave_helper.c
@@ -3,6 +3,7 @@
  * See the COPYING file in the top-level directory.
  */
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 
 #include "cpu.h"
 
@@ -293,3 +294,257 @@ void x86_cpu_xrstor_all_areas(X86CPU *cpu, const void *buf, uint32_t buflen)
     }
 #endif
 }
+
+#define XSTATE_BV_IN_HDR  offsetof(X86XSaveHeader, xstate_bv)
+#define XCOMP_BV_IN_HDR   offsetof(X86XSaveHeader, xcomp_bv)
+
+typedef struct X86XSaveAreaView {
+    /* 512 bytes */
+    X86LegacyXSaveArea legacy;
+    /* 64 bytes */
+    X86XSaveHeader     header;
+    /* ...followed by individual xsave areas */
+} X86XSaveAreaView;
+
+#define XSAVE_XSTATE_BV_OFFSET  offsetof(X86XSaveAreaView, header.xstate_bv)
+#define XSAVE_XCOMP_BV_OFFSET   offsetof(X86XSaveAreaView, header.xcomp_bv)
+#define XSAVE_EXT_OFFSET        (sizeof(X86LegacyXSaveArea) + \
+                                 sizeof(X86XSaveHeader))
+
+/**
+ * decompact_xsave_area - Convert compacted XSAVE format to standard format
+ * @buf: Source buffer containing compacted XSAVE data
+ * @buflen: Size of source buffer
+ * @env: CPU state where the standard format buffer will be written to
+ *
+ * Accelerator backends like MSHV might return XSAVE state in compacted format
+ * (XSAVEC). The state components have to be packed contiguously without gaps.
+ * The XSAVE qemu buffers are in standard format where each component has a
+ * fixed offset.
+ *
+ * Returns: 0 on success, negative errno on failure
+ */
+int decompact_xsave_area(const void *buf, size_t buflen, CPUX86State *env)
+{
+    uint64_t compacted_xstate_bv, compacted_xcomp_bv, compacted_layout_bv;
+    uint64_t xsave_offset, *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, dst_off;
+    bool align64;
+    uint64_t guest_xcr0, *xstate_bv;
+
+    compacted_xstate_bv = *(uint64_t *)(buf + XSAVE_XSTATE_BV_OFFSET);
+    compacted_xcomp_bv  = *(uint64_t *)(buf + XSAVE_XCOMP_BV_OFFSET);
+
+    /* This function only handles compacted format (bit 63 set) */
+    assert((compacted_xcomp_bv >> 63) & 1);
+
+    /* Low bits of XCOMP_BV describe which components are in the layout */
+    compacted_layout_bv = compacted_xcomp_bv & ~(1ULL << 63);
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+    memcpy(env->xsave_buf, buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * We mask XSTATE_BV with the guest's supported XCR0 because:
+     * 1. Supervisor state (IA32_XSS) is hypervisor-managed, we don't use
+     *    this state for migration.
+     * 2. Features disabled at partition creation (e.g. AMX) must be excluded
+     */
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    xstate_bv = (uint64_t *)(env->xsave_buf + XSAVE_XSTATE_BV_OFFSET);
+    *xstate_bv &= guest_xcr0;
+
+    /* Clear bit 63 - output is standard format, not compacted */
+    xcomp_bv = (uint64_t *)(env->xsave_buf + XSAVE_XCOMP_BV_OFFSET);
+    *xcomp_bv = *xcomp_bv & ~(1ULL << 63);
+
+    /*
+     * Process each extended state component in the compacted layout.
+     * Components 0 and 1 (FP and SSE) are in the legacy region, so we
+     * start at component 2. For each component:
+     * - Calculate its offset in the compacted source (contiguous layout)
+     * - Get its fixed offset in the standard destination from CPUID
+     * - Copy if the component has non-init state (bit set in XSTATE_BV)
+     */
+    xsave_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (((compacted_layout_bv >> i) & 1) == 0) {
+            continue;
+        }
+
+        /* Query guest CPUID for this component's size and standard offset */
+        cpu_x86_cpuid(env, 0xD, i, &eax, &ebx, &ecx, &edx);
+
+        size = eax;
+        dst_off = ebx;
+        align64 = (ecx & (1u << 1)) != 0;
+
+        /* Component is in the layout but unknown to the guest CPUID model */
+        if (size == 0) {
+            /*
+             * The hypervisor might expose a component that has no
+             * representation in the guest CPUID model. We query the host to
+             * retrieve the size of the component, so we can skip over it.
+             */
+            host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+            size = eax;
+            align64 = (ecx & (1u << 1)) != 0;
+            if (size == 0) {
+                error_report("xsave component %zu: size unknown to both "
+                             "guest and host CPUID", i);
+                return -EINVAL;
+            }
+
+            if (align64) {
+                xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+            }
+
+            if (xsave_offset + size > buflen) {
+                error_report("xsave component %zu overruns source buffer: "
+                             "offset=%" PRIu64 " size=%u buflen=%zu",
+                             i, xsave_offset, size, buflen);
+                return -E2BIG;
+            }
+
+            xsave_offset += size;
+            continue;
+        }
+
+        if (align64) {
+            xsave_offset = QEMU_ALIGN_UP(xsave_offset, 64);
+        }
+
+        if ((xsave_offset + size) > buflen) {
+            error_report("xsave component %zu overruns source buffer: "
+                         "offset=%" PRIu64 " size=%u buflen=%zu",
+                         i, xsave_offset, size, buflen);
+            return -E2BIG;
+        }
+
+        if ((dst_off + size) > env->xsave_buf_len) {
+            error_report("xsave component %zu overruns destination buffer: "
+                         "offset=%u size=%u buflen=%zu",
+                         i, dst_off, size, (size_t)env->xsave_buf_len);
+            return -E2BIG;
+        }
+
+        /* Copy components marked present in XSTATE_BV to guest model */
+        if (((compacted_xstate_bv >> i) & 1) != 0) {
+            memcpy(env->xsave_buf + dst_off, buf + xsave_offset, size);
+        }
+
+        xsave_offset += size;
+    }
+
+    return 0;
+}
+
+/**
+ * compact_xsave_area - Convert standard XSAVE format to compacted format
+ * @env: CPU state containing the standard format XSAVE buffer
+ * @buf: Destination buffer for compacted XSAVE data (to send to hypervisor)
+ * @buflen: Size of destination buffer
+ *
+ * Accelerator backends like MSHV might expect XSAVE state in compacted format
+ * (XSAVEC). The state components are packed contiguously without gaps.
+ * The XSAVE qemu buffers are in standard format where each component has a
+ * fixed offset.
+ *
+ * This function converts from standard to compacted format, it accepts a
+ * pre-allocated destination buffer of sufficient size, it is the
+ * responsibility of the caller to ensure the buffer is big enough.
+ *
+ * Returns: total size of compacted XSAVE data written to @buf
+ */
+int compact_xsave_area(CPUX86State *env, void *buf, size_t buflen)
+{
+    uint64_t *xcomp_bv;
+    size_t i;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t size, src_off;
+    bool align64;
+    size_t compact_offset;
+    uint64_t host_xcr0_mask, guest_xcr0;
+
+    /* Zero out buffer, then copy legacy region (FP + SSE) and header as-is */
+    memset(buf, 0, buflen);
+    memcpy(buf, env->xsave_buf, XSAVE_EXT_OFFSET);
+
+    /*
+     * Set XCOMP_BV to indicate compacted format (bit 63) and which
+     * components are in the layout.
+     *
+     * We must explicitly set XCOMP_BV because x86_cpu_xsave_all_areas()
+     * produces standard format with XCOMP_BV=0 (buffer is zeroed and only
+     * XSTATE_BV is set in the header).
+     *
+     * XCOMP_BV must reflect the partition's XSAVE capability, not the
+     * guest's current XCR0 (env->xcr0). These differ b/c:
+     * - A guest's XCR0 is what the guest OS has enabled via XSETBV
+     * - The partition's XCR0 mask is the hypervisor's save/restore capability
+     *
+     * The hypervisor uses XSAVES which saves based on its capability, so the
+     * XCOMP_BV value in the buffer we send back must match that capability.
+     *
+     * We intersect the host XCR0 with the guest's supported XCR0 features
+     * (FEAT_XSAVE_XCR0_*) so that features disabled at partition creation
+     * (e.g. AMX) are excluded from the compacted layout.
+     */
+    host_cpuid(0xD, 0, &eax, &ebx, &ecx, &edx);
+    host_xcr0_mask = ((uint64_t)edx << 32) | eax;
+    guest_xcr0 = ((uint64_t)env->features[FEAT_XSAVE_XCR0_HI] << 32) |
+                 env->features[FEAT_XSAVE_XCR0_LO];
+    host_xcr0_mask &= guest_xcr0;
+    xcomp_bv = buf + XSAVE_XCOMP_BV_OFFSET;
+    *xcomp_bv = host_xcr0_mask | (1ULL << 63);
+
+    /*
+     * Process each extended state component in the host's XCR0.
+     * The compacted layout must match XCOMP_BV (host capability).
+     *
+     * For each component:
+     * - Get its size and standard offset from host CPUID
+     * - Apply 64-byte alignment if required
+     * - Copy data only if guest has this component (bit set in env->xcr0)
+     * - Always advance offset to maintain correct layout
+     */
+    compact_offset = XSAVE_EXT_OFFSET;
+    for (i = 2; i < 63; i++) {
+        if (!((host_xcr0_mask >> i) & 1)) {
+            continue;
+        }
+
+        /* Query host CPUID for this component's size and standard offset */
+        host_cpuid(0xD, i, &eax, &ebx, &ecx, &edx);
+        size = eax;
+        src_off = ebx;
+        align64 = (ecx >> 1) & 1;
+
+        if (size == 0) {
+            /* Component in host xcr0 but unknown - shouldn't happen */
+            continue;
+        }
+
+        /* Apply 64-byte alignment if required by this component */
+        if (align64) {
+            compact_offset = QEMU_ALIGN_UP(compact_offset, 64);
+        }
+
+        /*
+         * Only copy data if guest has this component enabled in XCR0.
+         * Otherwise the component remains zeroed (init state), but we
+         * still advance the offset to maintain the correct layout.
+         */
+        if ((env->xcr0 >> i) & 1) {
+            memcpy(buf + compact_offset, env->xsave_buf + src_off, size);
+        }
+
+        compact_offset += size;
+    }
+
+    return compact_offset;
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 29/32] target/i386/mshv: migrate XSAVE state
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (27 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 28/32] target/i386: add de/compaction to xsave_helper Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 30/32] target/i386/mshv: reconstruct hflags after load Magnus Kulke
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

We implement fn's that roundtrip XSAVE state in migration. We are using
the xsave_helper routines to move individual components from CPUX86State
to an xsave_buf and then we have to compact the buffer to XSAVEC format,
which is what the hypervisor expects.

The same applies in the other direction when restoring state from the
hypervisor.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
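Note for reviewers: in get_xsave_state() the error return after the
ioctl currently skips releasing xsavec_buf. One possible shape for a
single exit path is sketched below. This is not QEMU code:
aligned_alloc()/free() stand in for qemu_memalign()/qemu_vfree(), and
the demo_* callbacks are hypothetical placeholders for the ioctl and
for decompact_xsave_area().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callbacks standing in for the ioctl and the decompaction. */
typedef int (*demo_fetch_fn)(void *buf, size_t len);
typedef int (*demo_consume_fn)(const void *buf, size_t len);

static int demo_consume_calls;

static int demo_fetch_ok(void *buf, size_t len)
{
    memset(buf, 0xab, len);
    return 0;
}

static int demo_fetch_fail(void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -EIO;
}

static int demo_consume_ok(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    demo_consume_calls++;
    return 0;
}

/* Allocate a page, fetch into it, consume it, free it on every path. */
static int demo_get_state(demo_fetch_fn fetch, demo_consume_fn consume)
{
    const size_t page = 4096;
    void *buf = aligned_alloc(page, page);
    int ret;

    if (!buf) {
        return -ENOMEM;
    }
    memset(buf, 0, page);

    ret = fetch(buf, page);
    if (ret < 0) {
        goto out;
    }

    ret = consume(buf, page);

out:
    free(buf);
    return ret;
}
```

The same shape would apply to set_xsave_state(), where the buffer is
already released before the error check.
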
 target/i386/cpu.h           |   2 +-
 target/i386/mshv/mshv-cpu.c | 100 +++++++++++++++++++++++++++++++++++-
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cd5d5a5369..0f30f0dd5b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2272,7 +2272,7 @@ typedef struct CPUArchState {
     int64_t user_tsc_khz; /* for sanity check only */
     uint64_t apic_bus_freq;
     uint64_t tsc;
-#if defined(CONFIG_KVM) || defined(CONFIG_HVF)
+#if defined(CONFIG_KVM) || defined(CONFIG_HVF) || defined(CONFIG_MSHV)
     void *xsave_buf;
     uint32_t xsave_buf_len;
 #endif
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 746987d62b..dacc33674c 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -109,6 +109,78 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
 
 static int set_special_regs(const CPUState *cpu);
 
+static int get_xsave_state(CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
+    void *xsavec_buf;
+    const size_t page = HV_HYP_PAGE_SIZE;
+    size_t xsavec_buf_len = page;
+
+    /* TODO: should properly determine xsavec size based on CPUID */
+    xsavec_buf = qemu_memalign(page, xsavec_buf_len);
+    memset(xsavec_buf, 0, xsavec_buf_len);
+
+    struct mshv_get_set_vp_state args = {
+        .type = MSHV_VP_STATE_XSAVE,
+        .buf_sz = xsavec_buf_len,
+        .buf_ptr = (uintptr_t)xsavec_buf,
+    };
+
+    ret = ioctl(cpu_fd, MSHV_GET_VP_STATE, &args);
+    if (ret < 0) {
+        error_report("failed to get xsave state: %s", strerror(errno));
+        return -errno;
+    }
+
+    ret = decompact_xsave_area(xsavec_buf, xsavec_buf_len, env);
+    qemu_vfree(xsavec_buf);
+    if (ret < 0) {
+        error_report("failed to decompact xsave area");
+        return ret;
+    }
+    x86_cpu_xrstor_all_areas(x86cpu, env->xsave_buf, env->xsave_buf_len);
+
+    return 0;
+}
+
+static int set_xsave_state(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+    int ret;
+    void *xsavec_buf;
+    size_t page = HV_HYP_PAGE_SIZE, xsavec_buf_len;
+
+    /* allocate and populate compacted buffer */
+    xsavec_buf = qemu_memalign(page, page);
+    xsavec_buf_len = page;
+
+    /* save registers to standard format buffer */
+    x86_cpu_xsave_all_areas(x86cpu, env->xsave_buf, env->xsave_buf_len);
+
+    /* store compacted version of xsave area in xsavec_buf */
+    compact_xsave_area(env, xsavec_buf, xsavec_buf_len);
+
+    struct mshv_get_set_vp_state args = {
+        .type = MSHV_VP_STATE_XSAVE,
+        .buf_sz = xsavec_buf_len,
+        .buf_ptr = (uintptr_t)xsavec_buf,
+    };
+
+    ret = ioctl(cpu_fd, MSHV_SET_VP_STATE, &args);
+    qemu_vfree(xsavec_buf);
+    if (ret < 0) {
+        error_report("failed to set xsave state: %s", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+
 static int get_lapic(CPUState *cpu)
 {
     X86CPU *x86cpu = X86_CPU(cpu);
@@ -766,6 +838,11 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    ret = get_xsave_state(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     ret = get_lapic(cpu);
     if (ret < 0) {
         return ret;
@@ -1284,6 +1361,11 @@ int mshv_arch_store_vcpu_state(const CPUState *cpu)
         return ret;
     }
 
+    ret = set_xsave_state(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     /* INVARIANT: special regs (APIC_BASE) must be restored before LAPIC */
     ret = set_lapic(cpu);
     if (ret < 0) {
@@ -1812,9 +1894,10 @@ void mshv_arch_init_vcpu(CPUState *cpu)
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
     AccelCPUState *state = cpu->accel;
-    size_t page = HV_HYP_PAGE_SIZE;
+    size_t page = HV_HYP_PAGE_SIZE, xsave_len;
     void *mem = qemu_memalign(page, 2 * page);
     int ret;
+    X86XSaveHeader *header;
 
     /* sanity check, to make sure we don't overflow the page */
     QEMU_BUILD_BUG_ON((MAX_REGISTER_COUNT
@@ -1828,6 +1911,17 @@ void mshv_arch_init_vcpu(CPUState *cpu)
 
     env->emu_mmio_buf = g_new(char, 4096);
 
+    /* Initialize XSAVE buffer page-aligned */
+    /* TODO: pick proper size based on CPUID */
+    xsave_len = page;
+    env->xsave_buf = qemu_memalign(page, xsave_len);
+    env->xsave_buf_len = xsave_len;
+    memset(env->xsave_buf, 0, env->xsave_buf_len);
+
+    /* we need to set the compacted format bit in xsave header for mshv */
+    header = (X86XSaveHeader *)(env->xsave_buf + sizeof(X86LegacyXSaveArea));
+    header->xcomp_bv = header->xstate_bv | (1ULL << 63);
+
     /*
      * TODO: populate topology info:
      * X86CPUTopoInfo *topo_info = &env->topo_info;
@@ -1852,6 +1946,10 @@ void mshv_arch_destroy_vcpu(CPUState *cpu)
     g_free(state->hvcall_args.base);
     state->hvcall_args = (MshvHvCallArgs){0};
     g_clear_pointer(&env->emu_mmio_buf, g_free);
+
+    qemu_vfree(env->xsave_buf);
+    env->xsave_buf = NULL;
+    env->xsave_buf_len = 0;
 }
 
 /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 30/32] target/i386/mshv: reconstruct hflags after load
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (28 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 29/32] target/i386/mshv: migrate XSAVE state Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 31/32] target/i386/mshv: migrate MP_STATE Magnus Kulke
  2026-03-23 13:58 ` [RFC 32/32] accel/mshv: enable dirty page tracking Magnus Kulke
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

hflags is a cached bitmap derived from standard and special regs. We
want to reconstruct this state after regs and sregs have been read from
the hypervisor, similar to how it's done in other accelerators.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
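Note for reviewers: to illustrate what "derived from regs and sregs"
means, here is a heavily reduced sketch. It is not the logic of
x86_update_hflags(); the DEMO_HF_* bit positions and struct demo_regs
are assumptions made for this example, and only three of the cached
bits are shown.

```c
#include <stdint.h>

/* Illustrative flag positions; the real HF_* values live in cpu.h. */
#define DEMO_HF_CPL_MASK  0x3u
#define DEMO_HF_PE        (1u << 6)
#define DEMO_HF_LMA       (1u << 14)

/* Architectural bits: CR0.PE is bit 0, EFER.LMA is bit 10. */
#define DEMO_CR0_PE       (1ULL << 0)
#define DEMO_EFER_LMA     (1ULL << 10)

/* Hypothetical subset of the register state fetched from the hypervisor. */
struct demo_regs {
    uint64_t cr0;
    uint64_t efer;
    uint8_t  ss_dpl;   /* DPL of the SS segment, i.e. the current CPL */
};

/* Recompute the cached flags purely from register contents. */
static uint32_t demo_derive_hflags(const struct demo_regs *r)
{
    uint32_t hf = r->ss_dpl & DEMO_HF_CPL_MASK;

    if (r->cr0 & DEMO_CR0_PE) {
        hf |= DEMO_HF_PE;
    }
    if (r->efer & DEMO_EFER_LMA) {
        hf |= DEMO_HF_LMA;
    }
    return hf;
}
```

Because every input is a standard or special register, the call has to
come after both have been loaded, which is what the INVARIANT comment
in the hunk below documents.
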
 target/i386/mshv/mshv-cpu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index dacc33674c..80e5dd5a4b 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -814,6 +814,16 @@ static int set_vcpu_events(const CPUState *cpu)
     return 0;
 }
 
+static int update_hflags(CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+
+    x86_update_hflags(env);
+
+    return 0;
+}
+
 int mshv_arch_load_vcpu_state(CPUState *cpu)
 {
     int ret;
@@ -828,6 +838,9 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    /* INVARIANT: hflags are derived from regs+sregs, need to get both first */
+    update_hflags(cpu);
+
     ret = get_xc_reg(cpu);
     if (ret < 0) {
         return ret;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC 31/32] target/i386/mshv: migrate MP_STATE
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (29 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 30/32] target/i386/mshv: reconstruct hflags after load Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  2026-03-23 13:58 ` [RFC 32/32] accel/mshv: enable dirty page tracking Magnus Kulke
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

MSHV's "internal activity state" roughly maps to QEMU's env->mp_state
and cpu->halted states that describe state of APs in a guest.

We don't invoke set_mp_state as part of store_vcpu_state() b/c we would
put all BSP + APs in a RUNNABLE (0) state immediately, breaking SMP boot.

Instead we store the mp state as part of the load_cleanup() routine
after a migration.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
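Note for reviewers: the load direction of the mapping in get_mp_state()
is small enough to show as a standalone sketch. The enum values mirror
the MSHV_MP_STATE_* defines added by this patch; demo_activity_to_mp_state()
is a hypothetical helper that takes the two activity bits as plain
booleans.

```c
#include <stdbool.h>

/* Same numbering as the MSHV_MP_STATE_* defines in this patch. */
enum demo_mp_state {
    DEMO_MP_RUNNABLE      = 0,
    DEMO_MP_UNINITIALIZED = 1,
    DEMO_MP_INIT_RECEIVED = 2,
    DEMO_MP_HALTED        = 3,
};

/* startup_suspend takes precedence over halt_suspend. */
static enum demo_mp_state demo_activity_to_mp_state(bool startup_suspend,
                                                    bool halt_suspend)
{
    if (startup_suspend) {
        return DEMO_MP_UNINITIALIZED;
    }
    if (halt_suspend) {
        return DEMO_MP_HALTED;
    }
    return DEMO_MP_RUNNABLE;
}
```

Note that the load direction shown here never yields INIT_RECEIVED, and
cpu->halted is simply set when the result is HALTED.
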
 accel/mshv/mshv-all.c       | 10 +++++
 include/system/mshv_int.h   |  1 +
 target/i386/mshv/mshv-cpu.c | 80 +++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 3927a82925..45fe1ef468 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -205,6 +205,7 @@ static int mshv_load_setup(QEMUFile *f, void *opaque, Error **errp)
 static int mshv_load_cleanup(void *opaque)
 {
     MshvState *s = opaque;
+    CPUState *cpu;
     int ret;
 
     ret = mshv_arch_set_partition_msrs(first_cpu);
@@ -213,6 +214,15 @@ static int mshv_load_cleanup(void *opaque)
         return -1;
     }
 
+    CPU_FOREACH(cpu) {
+        ret = mshv_arch_set_mp_state(cpu);
+        if (ret < 0) {
+            error_report("Failed to set mp state for vCPU %d: %s",
+                         cpu->cpu_index, strerror(-ret));
+            return -1;
+        }
+    }
+
     resume_vm(s->vm);
 
     return 0;
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index bc16b794b2..c24efc8675 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -98,6 +98,7 @@ int mshv_get_generic_regs(CPUState *cpu, hv_register_assoc *assocs,
 int mshv_arch_store_vcpu_state(const CPUState *cpu);
 int mshv_arch_load_vcpu_state(CPUState *cpu);
 int mshv_arch_set_partition_msrs(const CPUState *cpu);
+int mshv_arch_set_mp_state(const CPUState *cpu);
 void mshv_arch_init_vcpu(CPUState *cpu);
 void mshv_arch_destroy_vcpu(CPUState *cpu);
 void mshv_arch_amend_proc_features(
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 80e5dd5a4b..c2c217372a 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -33,6 +33,11 @@
 
 #include <sys/ioctl.h>
 
+#define MSHV_MP_STATE_RUNNABLE       0
+#define MSHV_MP_STATE_UNINITIALIZED  1
+#define MSHV_MP_STATE_INIT_RECEIVED  2
+#define MSHV_MP_STATE_HALTED         3
+
 #define MAX_REGISTER_COUNT (MAX_CONST(ARRAY_SIZE(STANDARD_REGISTER_NAMES), \
                             MAX_CONST(ARRAY_SIZE(SPECIAL_REGISTER_NAMES), \
                                       ARRAY_SIZE(FPU_REGISTER_NAMES))))
@@ -814,6 +819,76 @@ static int set_vcpu_events(const CPUState *cpu)
     return 0;
 }
 
+static int get_mp_state(CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct hv_register_assoc assoc = {
+        .name = HV_REGISTER_INTERNAL_ACTIVITY_STATE,
+    };
+    union hv_internal_activity_register activity;
+    int ret;
+
+    ret = mshv_get_generic_regs(cpu, &assoc, 1);
+    if (ret < 0) {
+        error_report("failed to get internal activity state");
+        return -1;
+    }
+
+    activity.as_uint64 = assoc.value.reg64;
+
+    /*
+     * map MSHV activity state to KVM mp_state values, which are used as the
+     * shared representation in env->mp_state and serialized by vmstate_x86_cpu.
+     */
+
+    if (activity.startup_suspend) {
+        env->mp_state = MSHV_MP_STATE_UNINITIALIZED;
+    } else if (activity.halt_suspend) {
+        env->mp_state = MSHV_MP_STATE_HALTED;
+    } else {
+        env->mp_state = MSHV_MP_STATE_RUNNABLE;
+    }
+
+    cpu->halted = (env->mp_state == MSHV_MP_STATE_HALTED);
+
+    return 0;
+}
+
+int mshv_arch_set_mp_state(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    union hv_internal_activity_register activity = { 0 };
+    struct hv_register_assoc assoc = {
+        .name = HV_REGISTER_INTERNAL_ACTIVITY_STATE,
+    };
+    int ret;
+
+    switch (env->mp_state) {
+    case MSHV_MP_STATE_HALTED:
+        activity.halt_suspend = 1;
+        break;
+    case MSHV_MP_STATE_UNINITIALIZED:
+    case MSHV_MP_STATE_INIT_RECEIVED:
+        activity.startup_suspend = 1;
+        break;
+    case MSHV_MP_STATE_RUNNABLE:
+    default:
+        break;
+    }
+
+    assoc.value.reg64 = activity.as_uint64;
+
+    ret = mshv_set_generic_regs(cpu, &assoc, 1);
+    if (ret < 0) {
+        error_report("failed to set internal activity state");
+        return -1;
+    }
+
+    return 0;
+}
+
 static int update_hflags(CPUState *cpu)
 {
     X86CPU *x86cpu = X86_CPU(cpu);
@@ -876,6 +951,11 @@ int mshv_arch_load_vcpu_state(CPUState *cpu)
         return ret;
     }
 
+    ret = get_mp_state(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+
     return 0;
 }
 
-- 
2.34.1




* [RFC 32/32] accel/mshv: enable dirty page tracking
  2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
                   ` (30 preceding siblings ...)
  2026-03-23 13:58 ` [RFC 31/32] target/i386/mshv: migrate MP_STATE Magnus Kulke
@ 2026-03-23 13:58 ` Magnus Kulke
  31 siblings, 0 replies; 33+ messages in thread
From: Magnus Kulke @ 2026-03-23 13:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: kvm, Wei Liu, Richard Henderson, Marcelo Tosatti,
	Marcel Apfelbaum, Wei Liu, Alex Williamson, Paolo Bonzini,
	Zhao Liu, Philippe Mathieu-Daudé, Cédric Le Goater,
	Magnus Kulke, Magnus Kulke, Michael S. Tsirkin

This change introduces the functions required to perform dirty page
tracking to speed up migrations. We are using the log_sync,
log_global_start, and log_global_stop hooks of the MemoryListener.

The sync fetches the dirty bitmap in batches of
MSHV_DIRTY_PAGES_BATCH_SIZE pages.

MSHV requires all dirty bits to be set before dirty page tracking can
be disabled, so log_global_stop sets them first.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
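The batching used by get_dirty_log() in the diff below comes down to a
bit of index arithmetic. The following is a minimal sketch of that
arithmetic under stated assumptions, not the patch's code: walk_batches(),
bitmap_bytes(), batch_fn and fake_get_dirty() are hypothetical names, and
the callback stands in for the MSHV_GET_GPAP_ACCESS_BITMAP ioctl. Only the
batch size value (0x10000) is taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BATCH_SIZE    0x10000u /* value of MSHV_DIRTY_PAGES_BATCH_SIZE */

/* Hypothetical callback standing in for the per-batch ioctl. */
typedef int (*batch_fn)(uint64_t first_pfn, uint64_t count,
                        unsigned long *bitmap, void *opaque);

/* Bytes needed for one bit per page, rounded up to whole longs. */
size_t bitmap_bytes(uint64_t page_count)
{
    return (size_t)((page_count + BITS_PER_LONG - 1) / BITS_PER_LONG)
           * sizeof(unsigned long);
}

/* Call fn once per batch, handing it the matching bitmap window. */
int walk_batches(uint64_t base_pfn, uint64_t page_count,
                 unsigned long *bitmap, batch_fn fn, void *opaque)
{
    uint64_t completed = 0;

    /* Every batch must start on a word boundary of the bitmap. */
    assert(BATCH_SIZE % BITS_PER_LONG == 0);

    while (completed < page_count) {
        uint64_t left = page_count - completed;
        uint64_t batch = left < BATCH_SIZE ? left : BATCH_SIZE;
        int ret = fn(base_pfn + completed, batch,
                     bitmap + completed / BITS_PER_LONG, opaque);
        if (ret < 0) {
            return ret;
        }
        completed += batch;
    }
    return 0;
}

struct fake_stats {
    unsigned calls;
    uint64_t pages;
};

/* Test double: marks the first page of each batch dirty and counts calls. */
int fake_get_dirty(uint64_t first_pfn, uint64_t count,
                   unsigned long *bitmap, void *opaque)
{
    struct fake_stats *st = opaque;

    (void)first_pfn;
    bitmap[0] |= 1UL;
    st->calls++;
    st->pages += count;
    return 0;
}
```

Because the batch size is a multiple of BITS_PER_LONG, each batch writes
to a disjoint, word-aligned window of the bitmap, which is why the patch
can pass bitmap + bitmap_offset directly to the ioctl.
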
 accel/mshv/mem.c          | 211 ++++++++++++++++++++++++++++++++++++++
 accel/mshv/mshv-all.c     |   3 +
 include/system/mshv_int.h |   5 +
 3 files changed, 219 insertions(+)

diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index e55c38d4db..820f87ef0c 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -12,10 +12,13 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "linux/mshv.h"
 #include "system/address-spaces.h"
 #include "system/mshv.h"
 #include "system/mshv_int.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "system/physmem.h"
 #include "exec/memattrs.h"
 #include <sys/ioctl.h>
 #include "trace.h"
@@ -211,3 +214,211 @@ void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
         abort();
     }
 }
+
+static int enable_dirty_page_tracking(int vm_fd)
+{
+    int ret;
+    struct hv_input_set_partition_property in = {0};
+    struct mshv_root_hvcall args = {0};
+
+    in.property_code = HV_PARTITION_PROPERTY_GPA_PAGE_ACCESS_TRACKING;
+    in.property_value = 1;
+
+    args.code = HVCALL_SET_PARTITION_PROPERTY;
+    args.in_sz = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to enable dirty page tracking: %s",
+                     strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * Retrieve dirty page bitmap for a GPA range, clearing the dirty bits
+ * atomically. Large ranges are handled in batches.
+ */
+static int get_dirty_log(int vm_fd, uint64_t base_pfn, uint64_t page_count,
+                         unsigned long *bitmap, size_t bitmap_size)
+{
+    uint64_t batch, bitmap_offset, completed = 0;
+    struct mshv_gpap_access_bitmap args = {0};
+    int ret;
+
+    QEMU_BUILD_BUG_ON(MSHV_DIRTY_PAGES_BATCH_SIZE % BITS_PER_LONG != 0);
+    assert(bitmap_size >= ROUND_UP(page_count, BITS_PER_LONG) / 8);
+
+    while (completed < page_count) {
+        batch = MIN(MSHV_DIRTY_PAGES_BATCH_SIZE, page_count - completed);
+        bitmap_offset = completed / BITS_PER_LONG;
+
+        args.access_type = MSHV_GPAP_ACCESS_TYPE_DIRTY;
+        args.access_op   = MSHV_GPAP_ACCESS_OP_CLEAR;
+        args.page_count  = batch;
+        args.gpap_base   = base_pfn + completed;
+        args.bitmap_ptr  = (uint64_t)(bitmap + bitmap_offset);
+
+        ret = ioctl(vm_fd, MSHV_GET_GPAP_ACCESS_BITMAP, &args);
+        if (ret < 0) {
+            error_report("Failed to get dirty log (base_pfn=0x%" PRIx64
+                         " batch=%" PRIu64 "): %s",
+                         base_pfn + completed, batch, strerror(errno));
+            return -1;
+        }
+        completed += batch;
+    }
+
+    return 0;
+}
+
+bool mshv_log_global_start(MemoryListener *listener, Error **errp)
+{
+    int ret;
+
+    ret = enable_dirty_page_tracking(mshv_state->vm);
+    if (ret < 0) {
+        error_setg(errp, "Failed to enable dirty page tracking");
+        return false;
+    }
+    return true;
+}
+
+static int disable_dirty_page_tracking(int vm_fd)
+{
+    int ret;
+    struct hv_input_set_partition_property in = {0};
+    struct mshv_root_hvcall args = {0};
+
+    in.property_code = HV_PARTITION_PROPERTY_GPA_PAGE_ACCESS_TRACKING;
+    in.property_value = 0;
+
+    args.code = HVCALL_SET_PARTITION_PROPERTY;
+    args.in_sz = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to disable dirty page tracking: %s",
+                     strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_dirty_pages(int vm_fd, uint64_t base_pfn, uint64_t page_count)
+{
+    uint64_t batch, completed = 0;
+    unsigned long bitmap[MSHV_DIRTY_PAGES_BATCH_SIZE / BITS_PER_LONG];
+    struct mshv_gpap_access_bitmap args = {0};
+    int ret;
+
+    while (completed < page_count) {
+        batch = MIN(MSHV_DIRTY_PAGES_BATCH_SIZE, page_count - completed);
+
+        args.access_type = MSHV_GPAP_ACCESS_TYPE_DIRTY;
+        args.access_op   = MSHV_GPAP_ACCESS_OP_SET;
+        args.page_count  = batch;
+        args.gpap_base   = base_pfn + completed;
+        args.bitmap_ptr  = (uint64_t)bitmap;
+
+        ret = ioctl(vm_fd, MSHV_GET_GPAP_ACCESS_BITMAP, &args);
+        if (ret < 0) {
+            error_report("Failed to set dirty pages (base_pfn=0x%" PRIx64
+                         " batch=%" PRIu64 "): %s",
+                         base_pfn + completed, batch, strerror(errno));
+            return -1;
+        }
+        completed += batch;
+    }
+
+    return 0;
+}
+
+static bool set_dirty_bits_cb(Int128 start, Int128 len, const MemoryRegion *mr,
+                              hwaddr offset_in_region, void *opaque)
+{
+    int ret, *errp = opaque;
+    hwaddr gpa, size;
+    uint64_t page_count, base_pfn;
+
+    gpa = int128_get64(start);
+    size = int128_get64(len);
+    page_count = size >> MSHV_PAGE_SHIFT;
+    base_pfn = gpa >> MSHV_PAGE_SHIFT;
+
+    if (!mr->ram || mr->readonly) {
+        return false;
+    }
+
+    if (page_count == 0) {
+        return false;
+    }
+
+    ret = set_dirty_pages(mshv_state->vm, base_pfn, page_count);
+
+    /* true aborts the iteration, which is what we want if there's an error */
+    if (ret < 0) {
+        *errp = ret;
+        return true;
+    }
+
+    return false;
+}
+
+void mshv_log_global_stop(MemoryListener *listener)
+{
+    int err = 0;
+    /* MSHV requires all dirty bits to be set before disabling tracking. */
+    FlatView *fv = address_space_to_flatview(&address_space_memory);
+    flatview_for_each_range(fv, set_dirty_bits_cb, &err);
+
+    if (err < 0) {
+        error_report("Failed to set dirty bits before disabling tracking");
+    }
+
+    disable_dirty_page_tracking(mshv_state->vm);
+}
+
+void mshv_log_sync(MemoryListener *listener, MemoryRegionSection *section)
+{
+    hwaddr size, start_addr, mr_offset;
+    uint64_t page_count, base_pfn;
+    size_t bitmap_size;
+    unsigned long *bitmap;
+    ram_addr_t ram_addr;
+    int ret;
+    MemoryRegion *mr = section->mr;
+
+    if (!memory_region_is_ram(mr) || memory_region_is_rom(mr)) {
+        return;
+    }
+
+    size = align_section(section, &start_addr);
+    if (!size) {
+        return;
+    }
+
+    page_count = size >> MSHV_PAGE_SHIFT;
+    base_pfn = start_addr >> MSHV_PAGE_SHIFT;
+    bitmap_size = ROUND_UP(page_count, BITS_PER_LONG) / 8;
+    bitmap = g_malloc0(bitmap_size);
+
+    ret = get_dirty_log(mshv_state->vm, base_pfn, page_count, bitmap,
+                        bitmap_size);
+    if (ret < 0) {
+        g_free(bitmap);
+        return;
+    }
+
+    mr_offset = section->offset_within_region + start_addr -
+                section->offset_within_address_space;
+    ram_addr = memory_region_get_ram_addr(mr) + mr_offset;
+
+    physical_memory_set_dirty_lebitmap(bitmap, ram_addr, page_count);
+    g_free(bitmap);
+}
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 45fe1ef468..87ed785302 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -546,6 +546,9 @@ static MemoryListener mshv_memory_listener = {
     .region_del = mem_region_del,
     .eventfd_add = mem_ioeventfd_add,
     .eventfd_del = mem_ioeventfd_del,
+    .log_sync = mshv_log_sync,
+    .log_global_start = mshv_log_global_start,
+    .log_global_stop = mshv_log_global_stop,
 };
 
 static MemoryListener mshv_io_listener = {
diff --git a/include/system/mshv_int.h b/include/system/mshv_int.h
index c24efc8675..ddbdd76076 100644
--- a/include/system/mshv_int.h
+++ b/include/system/mshv_int.h
@@ -31,6 +31,8 @@ struct mshv_get_set_vp_state;
 #define MSHV_HV_INTERRUPTION_TYPE_PRIV_SW_EXC 5
 #define MSHV_HV_INTERRUPTION_TYPE_SW_EXC      6
 
+#define MSHV_DIRTY_PAGES_BATCH_SIZE 0x10000
+
 typedef struct hyperv_message hv_message;
 
 typedef struct MshvHvCallArgs {
@@ -128,6 +130,9 @@ int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
                          bool is_secure_mode);
 void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
                        bool add);
+void mshv_log_sync(MemoryListener *listener, MemoryRegionSection *section);
+bool mshv_log_global_start(MemoryListener *listener, Error **errp);
+void mshv_log_global_stop(MemoryListener *listener);
 
 /* msr */
 int mshv_init_msrs(const CPUState *cpu);
-- 
2.34.1




Thread overview: 33+ messages
2026-03-23 13:57 [RFC 00/32] Add migration support to the MSHV accelerator Magnus Kulke
2026-03-23 13:57 ` [RFC 01/32] target/i386/mshv: use arch_load/store_reg fns Magnus Kulke
2026-03-23 13:57 ` [RFC 02/32] target/i386/mshv: use generic FPU/xcr0 state Magnus Kulke
2026-03-23 13:57 ` [RFC 03/32] target/i386/mshv: impl init/load/store_vcpu_state Magnus Kulke
2026-03-23 13:57 ` [RFC 04/32] accel/accel-irq: add AccelRouteChange abstraction Magnus Kulke
2026-03-23 13:57 ` [RFC 05/32] accel/accel-irq: add generic begin_route_changes Magnus Kulke
2026-03-23 13:57 ` [RFC 06/32] accel/accel-irq: add generic commit_route_changes Magnus Kulke
2026-03-23 13:57 ` [RFC 07/32] accel/mshv: add irq_routes to state Magnus Kulke
2026-03-23 13:57 ` [RFC 08/32] accel/mshv: update s->irq_routes in add_msi_route Magnus Kulke
2026-03-23 13:57 ` [RFC 09/32] accel/mshv: update s->irq_routes in update_msi_route Magnus Kulke
2026-03-23 13:57 ` [RFC 10/32] accel/mshv: update s->irq_routes in release_virq Magnus Kulke
2026-03-23 13:57 ` [RFC 11/32] accel/mshv: use s->irq_routes in commit_routes Magnus Kulke
2026-03-23 13:57 ` [RFC 12/32] accel/mshv: reserve ioapic routes on s->irq_routes Magnus Kulke
2026-03-23 13:57 ` [RFC 13/32] accel/mshv: remove redundant msi controller Magnus Kulke
2026-03-23 13:57 ` [RFC 14/32] target/i386/mshv: move apic logic into own file Magnus Kulke
2026-03-23 13:57 ` [RFC 15/32] target/i386/mshv: migrate LAPIC state Magnus Kulke
2026-03-23 13:57 ` [RFC 16/32] target/i386/mshv: move msr code to arch Magnus Kulke
2026-03-23 13:57 ` [RFC 17/32] accel/mshv: store partition proc features Magnus Kulke
2026-03-23 13:57 ` [RFC 18/32] target/i386/mshv: expose msvh_get_generic_regs Magnus Kulke
2026-03-23 13:57 ` [RFC 19/32] target/i386/mshv: migrate MSRs Magnus Kulke
2026-03-23 13:58 ` [RFC 20/32] target/i386/mshv: migrate MTRR MSRs Magnus Kulke
2026-03-23 13:58 ` [RFC 21/32] target/i386/mshv: migrate Synic SINT MSRs Magnus Kulke
2026-03-23 13:58 ` [RFC 22/32] target/i386/mshv: migrate SIMP and SIEFP state Magnus Kulke
2026-03-23 13:58 ` [RFC 23/32] target/i386/mshv: migrate STIMER state Magnus Kulke
2026-03-23 13:58 ` [RFC 24/32] accel/mshv: introduce SaveVMHandler Magnus Kulke
2026-03-23 13:58 ` [RFC 25/32] accel/mshv: write synthetic MSRs after migration Magnus Kulke
2026-03-23 13:58 ` [RFC 26/32] accel/mshv: migrate REFERENCE_TIME Magnus Kulke
2026-03-23 13:58 ` [RFC 27/32] target/i386/mshv: migrate pending ints/excs Magnus Kulke
2026-03-23 13:58 ` [RFC 28/32] target/i386: add de/compaction to xsave_helper Magnus Kulke
2026-03-23 13:58 ` [RFC 29/32] target/i386/mshv: migrate XSAVE state Magnus Kulke
2026-03-23 13:58 ` [RFC 30/32] target/i386/mshv: reconstruct hflags after load Magnus Kulke
2026-03-23 13:58 ` [RFC 31/32] target/i386/mshv: migrate MP_STATE Magnus Kulke
2026-03-23 13:58 ` [RFC 32/32] accel/mshv: enable dirty page tracking Magnus Kulke
