[RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator
@ 2025-05-20 11:29 Magnus Kulke
  2025-05-20 11:29 ` [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
                   ` (25 more replies)
  0 siblings, 26 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Hello all,

as previously announced here, we are working on an integration that will
expose the HyperV hypervisor to QEMU on Linux hosts. HyperV is a Type 1
hypervisor with a layered architecture that features a "root partition"
alongside VMs as "child partitions" that will interface with the
hypervisor and has access to the hardware. (https://aka.ms/hypervarch)

The effort to run Linux on such a Root Partition and expose HyperV to
such a management partition is called "MSHV". Sometimes we refer to the
root partition as "Dom0 Linux". Today we are targetting nested
virtualization, that is: the creation + management of L2 VMs on an L1
VM (L0 would indicate bare metal).

+-------------+ +----------------+ +--------------+
|             | |                | |              |
| Azure Host  | | L1 Linux Dom0  | | L2 Guest VM  |
|             | |                | |              |
|     OS      | |                | |              |
|             | | +------------+ | |              |
|             | | |  Qemu VMM  | | |              |
|             | | +------------+ | |              |
|             | | +------------+ | |              |
|             | | |   Kernel   | | |              |
|             | | +-----+------+ | |              |
|             | +-------|--------+ +--------------+
|             | +-------v-------------------------+
|             | |    Microsoft Hypervisor (L1)    |
+-------------+ +-------+-------------------------+
                        |
+-----------------------v-------------------------+
|            Microsoft Hypervisor (L0)            |
+-------------------------------------------------+

+-------------------------------------------------+
|                                                 |
|                    Hardware                     |
|                                                 |
+-------------------------------------------------+

This submission is a port of the existing MSHV integration that is
shipped in Cloud-Hypervisor and MSHV-specific rust-crates in rust-vmm.
There are various products like AKS Pod Sandboxing and AKS Confidential
Pods built on MSHV and Cloud-Hypervisor. We hope to achieve a seamless
integration into the QEMU accelerator framework, similar to existing
integrations like KVM, HVF or WHPX.

The patch set has been split into chunks that should be applicable and
buildable individually, but only the full set of commits will allow
launching MSHV-accelerated guests on supported kernels and environments.

The toggle to enable the feature at build time would be: `./configure
--enable-mshv`.

When launching a VM, the accelerator `mshv` can be enabled via

`-accel mshv` or `-machine q35,accel=mshv`.

We concluded the porting, but we haven't performed any comprehensive
testing yet. We opted to send our submission early to receive feedback
about the general structure and potential problems of our integration.
Most likely we will uncover problems during testing and address those in
upcoming revisions of the patch set.

The configuration we are using during development:

machine q35 + OVMF + various recent linux distros as guests (fedora
42, ubuntu 22.04)

We would welcome any feedback around the structure and integration
points that we chose, so we can incorporate them into upcoming
revisions.

Some notes/caveats about the initial submission:

- The relevant MSHV kernel code has been accepted for inclusion in the
  upcoming 6.15 release, which should be released shortly. To allow
  building it on older kernel we vendored the kernel headers that define
  the MSHV ABI into the patch set. We might remove it in later
  revisions of the patch set, or put it behind a feature toggle. Once
  the kernel is released we plan to published a preconfigured Azure
  image, which can be used to test the MSHV accelerator.

- QEMU is mapping regions into the guest that might partially overlap in
  their userspace_addr range (e.g. for ROMs in early boot). Currently
  MSHV will reject such overlaps. We are looking into whether we
  can/want to relax that restriction. To work around this we maintain a
  list of mapping references and swap in/out regions if there's a GPA
  fault and we find a valid candidate region in our list. (see last
  commit). Maybe there are alternative, less invasive, suggestions. We'd
  be happy to hear those.

- We noticed that when using SeaBIOS, in certain permutations of guest
  configuration (> 2GB ram & >1 virtio-blk-pci devices), we run into
  unmapped GPA errors. We suspect it has to do with SeaBIOS addressing
  memory in the 4GB+ region in those cases. We are investigating, and
  will hopefully be able to issue a fix soon. For the time being this
  can be worked around by using OVMF as firmware:

- Since the MHSV accelerator requires a HyperV hypervisor to be present,
  it would make sense to provide testing infrastructure for integration
  testing on Azure. We are looking into options how to implement that.

best,

magnus

Magnus Kulke (25):
  accel: Add Meson and config support for MSHV accelerator
  target/i386/emulate: allow instruction decoding from stream
  target/i386/mshv: Add x86 decoder/emu implementation
  hw/intc: Generalize APIC helper names from kvm_* to accel_*
  include/hw/hyperv: Add MSHV ABI header definitions
  accel/mshv: Add accelerator skeleton
  accel/mshv: Register memory region listeners
  accel/mshv: Initialize VM partition
  accel/mshv: Register guest memory regions with hypervisor
  accel/mshv: Add ioeventfd support
  accel/mshv: Add basic interrupt injection support
  accel/mshv: Add vCPU creation and execution loop
  accel/mshv: Add vCPU signal handling
  target/i386/mshv: Add CPU create and remove logic
  target/i386/mshv: Implement mshv_store_regs()
  target/i386/mshv: Implement mshv_get_standard_regs()
  target/i386/mshv: Implement mshv_get_special_regs()
  target/i386/mshv: Implement mshv_arch_put_registers()
  target/i386/mshv: Set local interrupt controller state
  target/i386/mshv: Register CPUID entries with MSHV
  target/i386/mshv: Register MSRs with MSHV
  target/i386/mshv: Integrate x86 instruction decoder/emulator
  target/i386/mshv: Write MSRs to the hypervisor
  target/i386/mshv: Implement mshv_vcpu_run()
  accel/mshv: Add memory remapping workaround

 accel/Kconfig                    |    3 +
 accel/accel-irq.c                |   95 ++
 accel/meson.build                |    3 +-
 accel/mshv/irq.c                 |  370 +++++++
 accel/mshv/mem.c                 |  434 ++++++++
 accel/mshv/meson.build           |    9 +
 accel/mshv/mshv-all.c            |  731 ++++++++++++
 accel/mshv/msr.c                 |  375 +++++++
 accel/mshv/trace-events          |   20 +
 accel/mshv/trace.h               |    1 +
 hw/intc/apic.c                   |    9 +
 hw/intc/ioapic.c                 |   20 +-
 hw/virtio/virtio-pci.c           |   19 +-
 include/hw/hyperv/hvgdk.h        |   20 +
 include/hw/hyperv/hvhdk.h        |  165 +++
 include/hw/hyperv/hvhdk_mini.h   |  106 ++
 include/hw/hyperv/linux-mshv.h   | 1038 ++++++++++++++++++
 include/system/accel-irq.h       |   26 +
 include/system/mshv.h            |  237 ++++
 meson.build                      |   17 +
 meson_options.txt                |    2 +
 scripts/meson-buildoptions.sh    |    3 +
 target/i386/cpu.h                |    2 +-
 target/i386/emulate/meson.build  |    7 +-
 target/i386/emulate/x86_decode.c |   32 +-
 target/i386/emulate/x86_decode.h |   11 +
 target/i386/emulate/x86_emu.c    |    3 +-
 target/i386/emulate/x86_emu.h    |    1 +
 target/i386/meson.build          |    2 +
 target/i386/mshv/meson.build     |    8 +
 target/i386/mshv/mshv-cpu.c      | 1768 ++++++++++++++++++++++++++++++
 target/i386/mshv/x86.c           |  330 ++++++
 32 files changed, 5841 insertions(+), 26 deletions(-)
 create mode 100644 accel/accel-irq.c
 create mode 100644 accel/mshv/irq.c
 create mode 100644 accel/mshv/mem.c
 create mode 100644 accel/mshv/meson.build
 create mode 100644 accel/mshv/mshv-all.c
 create mode 100644 accel/mshv/msr.c
 create mode 100644 accel/mshv/trace-events
 create mode 100644 accel/mshv/trace.h
 create mode 100644 include/hw/hyperv/hvgdk.h
 create mode 100644 include/hw/hyperv/hvhdk.h
 create mode 100644 include/hw/hyperv/hvhdk_mini.h
 create mode 100644 include/hw/hyperv/linux-mshv.h
 create mode 100644 include/system/accel-irq.h
 create mode 100644 include/system/mshv.h
 create mode 100644 target/i386/mshv/meson.build
 create mode 100644 target/i386/mshv/mshv-cpu.c
 create mode 100644 target/i386/mshv/x86.c

-- 
2.34.1



^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
@ 2025-05-20 11:29 ` Magnus Kulke
  2025-05-20 11:50   ` Daniel P. Berrangé
  2025-05-20 11:29 ` [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream Magnus Kulke
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Introduce a Meson feature option and default-config entry to allow
building QEMU with MSHV (Microsoft Hypervisor) acceleration support.

This is the first step toward implementing an MSHV backend in QEMU.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/Kconfig                 |  3 +++
 meson.build                   | 16 ++++++++++++++++
 meson_options.txt             |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 24 insertions(+)

diff --git a/accel/Kconfig b/accel/Kconfig
index 4263cab722..a60f114923 100644
--- a/accel/Kconfig
+++ b/accel/Kconfig
@@ -13,6 +13,9 @@ config TCG
 config KVM
     bool
 
+config MSHV
+    bool
+
 config XEN
     bool
     select FSDEV_9P if VIRTFS
diff --git a/meson.build b/meson.build
index e819a7084c..a4269b816b 100644
--- a/meson.build
+++ b/meson.build
@@ -322,6 +322,13 @@ else
 endif
 accelerator_targets += { 'CONFIG_XEN': xen_targets }
 
+if cpu == 'x86_64'
+  mshv_targets = ['x86_64-softmmu']
+else
+  mshv_targets = []
+endif
+accelerator_targets += { 'CONFIG_MSHV': mshv_targets }
+
 if cpu == 'aarch64'
   accelerator_targets += {
     'CONFIG_HVF': ['aarch64-softmmu']
@@ -877,6 +884,14 @@ accelerators = []
 if get_option('kvm').allowed() and host_os == 'linux'
   accelerators += 'CONFIG_KVM'
 endif
+
+if get_option('mshv').allowed() and host_os == 'linux'
+  if get_option('mshv').enabled() and host_machine.cpu() != 'x86_64'
+    error('mshv accelerator requires x64_64 host')
+  endif
+  accelerators += 'CONFIG_MSHV'
+endif
+
 if get_option('whpx').allowed() and host_os == 'windows'
   if get_option('whpx').enabled() and host_machine.cpu() != 'x86_64'
     error('WHPX requires 64-bit host')
@@ -4747,6 +4762,7 @@ if have_system
   summary_info += {'HVF support':       config_all_accel.has_key('CONFIG_HVF')}
   summary_info += {'WHPX support':      config_all_accel.has_key('CONFIG_WHPX')}
   summary_info += {'NVMM support':      config_all_accel.has_key('CONFIG_NVMM')}
+  summary_info += {'MSHV support':       config_all_accel.has_key('CONFIG_MSHV')}
   summary_info += {'Xen support':       xen.found()}
   if xen.found()
     summary_info += {'xen ctrl version':  xen.version()}
diff --git a/meson_options.txt b/meson_options.txt
index cc66b46c63..e5671884b8 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -71,6 +71,8 @@ option('malloc', type : 'combo', choices : ['system', 'tcmalloc', 'jemalloc'],
 
 option('kvm', type: 'feature', value: 'auto',
        description: 'KVM acceleration support')
+option('mshv', type: 'feature', value: 'auto',
+       description: 'MSHV acceleration support')
 option('whpx', type: 'feature', value: 'auto',
        description: 'WHPX acceleration support')
 option('hvf', type: 'feature', value: 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 8a67a14e2e..cfd767a425 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -155,6 +155,7 @@ meson_options_help() {
   printf "%s\n" '  membarrier      membarrier system call (for Linux 4.14+ or Windows'
   printf "%s\n" '  modules         modules support (non Windows)'
   printf "%s\n" '  mpath           Multipath persistent reservation passthrough'
+  printf "%s\n" '  mshv            MSHV acceleration support'
   printf "%s\n" '  multiprocess    Out of process device emulation support'
   printf "%s\n" '  netmap          netmap network backend support'
   printf "%s\n" '  nettle          nettle cryptography support'
@@ -410,6 +411,8 @@ _meson_option_parse() {
     --disable-modules) printf "%s" -Dmodules=disabled ;;
     --enable-mpath) printf "%s" -Dmpath=enabled ;;
     --disable-mpath) printf "%s" -Dmpath=disabled ;;
+    --enable-mshv) printf "%s" -Dmshv=enabled ;;
+    --disable-mshv) printf "%s" -Dmshv=disabled ;;
     --enable-multiprocess) printf "%s" -Dmultiprocess=enabled ;;
     --disable-multiprocess) printf "%s" -Dmultiprocess=disabled ;;
     --enable-netmap) printf "%s" -Dnetmap=enabled ;;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
  2025-05-20 11:29 ` [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
@ 2025-05-20 11:29 ` Magnus Kulke
  2025-05-20 12:42   ` Paolo Bonzini
  2025-05-20 17:29   ` Wei Liu
  2025-05-20 11:29 ` [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
                   ` (23 subsequent siblings)
  25 siblings, 2 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Introduce a new helper function to decode x86 instructions from a
raw instruction byte stream. MSHV delivers an instruction stream in a
buffer of the vm_exit message. It can be used to speed up MMIO
emulation, since instructions do not have to be fetched and translated.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/emulate/x86_decode.c | 32 +++++++++++++++++++++++++++-----
 target/i386/emulate/x86_decode.h | 11 +++++++++++
 target/i386/emulate/x86_emu.c    |  3 ++-
 target/i386/emulate/x86_emu.h    |  1 +
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/target/i386/emulate/x86_decode.c b/target/i386/emulate/x86_decode.c
index 88be9479a8..7a862b976e 100644
--- a/target/i386/emulate/x86_decode.c
+++ b/target/i386/emulate/x86_decode.c
@@ -60,6 +60,7 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
                                     int size)
 {
     uint64_t val = 0;
+    target_ulong va;
 
     switch (size) {
     case 1:
@@ -71,10 +72,16 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
         VM_PANIC_EX("%s invalid size %d\n", __func__, size);
         break;
     }
-    target_ulong va  = linear_rip(env_cpu(env), env->eip) + decode->len;
-    emul_ops->read_mem(env_cpu(env), &val, va, size);
+
+	/* copy the bytes from the instruction stream, if available */
+	if (decode->stream && decode->len + size <= decode->stream->len) {
+		memcpy(&val, decode->stream->bytes + decode->len, size);
+	} else {
+		va = linear_rip(env_cpu(env), env->eip) + decode->len;
+		emul_ops->fetch_instruction(env_cpu(env), &val, va, size);
+	}
     decode->len += size;
-    
+
     return val;
 }
 
@@ -2076,9 +2083,8 @@ static void decode_opcodes(CPUX86State *env, struct x86_decode *decode)
     }
 }
 
-uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
+static uint32_t decode_opcode(CPUX86State *env, struct x86_decode *decode)
 {
-    memset(decode, 0, sizeof(*decode));
     decode_prefix(env, decode);
     set_addressing_size(env, decode);
     set_operand_size(env, decode);
@@ -2088,6 +2094,22 @@ uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
     return decode->len;
 }
 
+uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
+{
+	memset(decode, 0, sizeof(*decode));
+	return decode_opcode(env, decode);
+}
+
+uint32_t decode_instruction_stream(CPUX86State *env, struct x86_decode *decode,
+		                           struct x86_insn_stream *stream)
+{
+	memset(decode, 0, sizeof(*decode));
+	if (stream != NULL) {
+		decode->stream = stream;
+	}
+	return decode_opcode(env, decode);
+}
+
 void init_decoder(void)
 {
     int i;
diff --git a/target/i386/emulate/x86_decode.h b/target/i386/emulate/x86_decode.h
index 87cc728598..9bc7d6cc49 100644
--- a/target/i386/emulate/x86_decode.h
+++ b/target/i386/emulate/x86_decode.h
@@ -269,6 +269,11 @@ typedef struct x86_decode_op {
     target_ulong ptr;
 } x86_decode_op;
 
+typedef struct x86_insn_stream {
+	const uint8_t *bytes;
+	size_t len;
+} x86_insn_stream;
+
 typedef struct x86_decode {
     int len;
     uint8_t opcode[4];
@@ -295,12 +300,18 @@ typedef struct x86_decode {
     struct x86_modrm modrm;
     struct x86_decode_op op[4];
     bool is_fpu;
+
+	x86_insn_stream *stream;
 } x86_decode;
 
 uint64_t sign(uint64_t val, int size);
 
 uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode);
 
+uint32_t decode_instruction_stream(CPUX86State *env,
+								   struct x86_decode *decode,
+		                           struct x86_insn_stream *stream);
+
 target_ulong get_reg_ref(CPUX86State *env, int reg, int rex_present,
                          int is_extended, int size);
 target_ulong get_reg_val(CPUX86State *env, int reg, int rex_present,
diff --git a/target/i386/emulate/x86_emu.c b/target/i386/emulate/x86_emu.c
index 7773b51b95..73c9eb41d1 100644
--- a/target/i386/emulate/x86_emu.c
+++ b/target/i386/emulate/x86_emu.c
@@ -1241,7 +1241,8 @@ static void init_cmd_handler(void)
 bool exec_instruction(CPUX86State *env, struct x86_decode *ins)
 {
     if (!_cmd_handler[ins->cmd].handler) {
-        printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n", env->eip,
+        printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n",
+                env->eip,
                 ins->cmd, ins->opcode[0],
                 ins->opcode_len > 1 ? ins->opcode[1] : 0);
         env->eip += ins->len;
diff --git a/target/i386/emulate/x86_emu.h b/target/i386/emulate/x86_emu.h
index 555b567e2c..761e83fd6b 100644
--- a/target/i386/emulate/x86_emu.h
+++ b/target/i386/emulate/x86_emu.h
@@ -24,6 +24,7 @@
 #include "cpu.h"
 
 struct x86_emul_ops {
+    void (*fetch_instruction)(CPUState *cpu, void *data, target_ulong addr, int bytes);
     void (*read_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
     void (*write_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
     void (*read_segment_descriptor)(CPUState *cpu, struct x86_segment_descriptor *desc,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
  2025-05-20 11:29 ` [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
  2025-05-20 11:29 ` [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream Magnus Kulke
@ 2025-05-20 11:29 ` Magnus Kulke
  2025-05-20 11:54   ` Daniel P. Berrangé
                     ` (2 more replies)
  2025-05-20 11:29 ` [RFC PATCH 04/25] hw/intc: Generalize APIC helper names from kvm_* to accel_* Magnus Kulke
                   ` (22 subsequent siblings)
  25 siblings, 3 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

The MSHV accelerator requires a x86 decoder/emulator in userland to
emulate MMIO instructions. This change contains the implementations for
the generalized i386 instruction decoder/emulator.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv.h           |  32 ++++
 target/i386/cpu.h               |   2 +-
 target/i386/emulate/meson.build |   7 +-
 target/i386/meson.build         |   2 +
 target/i386/mshv/meson.build    |   7 +
 target/i386/mshv/x86.c          | 330 ++++++++++++++++++++++++++++++++
 6 files changed, 377 insertions(+), 3 deletions(-)
 create mode 100644 include/system/mshv.h
 create mode 100644 target/i386/mshv/meson.build
 create mode 100644 target/i386/mshv/x86.c

diff --git a/include/system/mshv.h b/include/system/mshv.h
new file mode 100644
index 0000000000..8380b92da2
--- /dev/null
+++ b/include/system/mshv.h
@@ -0,0 +1,32 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Ziqiao Zhou       <ziqiaozhou@microsoft.com>
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *  Jinank Jain       <jinankjain@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_MSHV_INT_H
+#define QEMU_MSHV_INT_H
+
+#ifdef COMPILING_PER_TARGET
+#ifdef CONFIG_MSHV
+#define CONFIG_MSHV_IS_POSSIBLE
+#endif
+#else
+#define CONFIG_MSHV_IS_POSSIBLE
+#endif
+
+/* cpu */
+/* EFER (technically not a register) bits */
+#define EFER_LMA   ((uint64_t)0x400)
+#define EFER_LME   ((uint64_t)0x100)
+
+#endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 4f8ed8868e..db6a37b271 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2101,7 +2101,7 @@ typedef struct CPUArchState {
     QEMUTimer *xen_periodic_timer;
     QemuMutex xen_timers_lock;
 #endif
-#if defined(CONFIG_HVF)
+#if defined(CONFIG_HVF) || defined(CONFIG_MSHV)
     X86LazyFlags lflags;
     void *emu_mmio_buf;
 #endif
diff --git a/target/i386/emulate/meson.build b/target/i386/emulate/meson.build
index 4edd4f462f..b6dafb6a5b 100644
--- a/target/i386/emulate/meson.build
+++ b/target/i386/emulate/meson.build
@@ -1,5 +1,8 @@
-i386_system_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
+emulator_files = files(
   'x86_decode.c',
   'x86_emu.c',
   'x86_flags.c',
-))
+)
+
+i386_system_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: emulator_files)
+i386_system_ss.add(when: 'CONFIG_MSHV', if_true: emulator_files)
diff --git a/target/i386/meson.build b/target/i386/meson.build
index c1aacea613..6097e5c427 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -11,6 +11,7 @@ i386_ss.add(when: 'CONFIG_SEV', if_true: files('host-cpu.c', 'confidential-guest
 # x86 cpu type
 i386_ss.add(when: 'CONFIG_KVM', if_true: files('host-cpu.c'))
 i386_ss.add(when: 'CONFIG_HVF', if_true: files('host-cpu.c'))
+i386_ss.add(when: 'CONFIG_MSHV', if_true: files('host-cpu.c'))
 
 i386_system_ss = ss.source_set()
 i386_system_ss.add(files(
@@ -32,6 +33,7 @@ subdir('nvmm')
 subdir('hvf')
 subdir('tcg')
 subdir('emulate')
+subdir('mshv')
 
 target_arch += {'i386': i386_ss}
 target_system_arch += {'i386': i386_system_ss}
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
new file mode 100644
index 0000000000..8ddaa7c11d
--- /dev/null
+++ b/target/i386/mshv/meson.build
@@ -0,0 +1,7 @@
+i386_mshv_ss = ss.source_set()
+
+i386_mshv_ss.add(files(
+  'x86.c',
+))
+
+i386_system_ss.add_all(when: 'CONFIG_MSHV', if_true: i386_mshv_ss)
diff --git a/target/i386/mshv/x86.c b/target/i386/mshv/x86.c
new file mode 100644
index 0000000000..581710fd06
--- /dev/null
+++ b/target/i386/mshv/x86.c
@@ -0,0 +1,330 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "emulate/x86_decode.h"
+#include "emulate/x86_emu.h"
+#include "qemu/typedefs.h"
+#include "qemu/error-report.h"
+#include "system/mshv.h"
+
+/* RW or Exec segment */
+static const uint8_t RWRX_SEGMENT_TYPE        = 0x2;
+static const uint8_t CODE_SEGMENT_TYPE        = 0x8;
+static const uint8_t EXPAND_DOWN_SEGMENT_TYPE = 0x4;
+
+typedef enum CpuMode {
+    REAL_MODE,
+    PROTECTED_MODE,
+    LONG_MODE,
+} CpuMode;
+
+static CpuMode cpu_mode(CPUState *cpu)
+{
+    enum CpuMode m = REAL_MODE;
+
+    if (x86_is_protected(cpu)) {
+        m = PROTECTED_MODE;
+
+        if (x86_is_long_mode(cpu)) {
+            m = LONG_MODE;
+        }
+    }
+
+    return m;
+}
+
+static bool segment_type_ro(const SegmentCache *seg)
+{
+    uint32_t type_ = (seg->flags >> DESC_TYPE_SHIFT) & 15;
+    return (type_ & (~RWRX_SEGMENT_TYPE)) == 0;
+}
+
+static bool segment_type_code(const SegmentCache *seg)
+{
+    uint32_t type_ = (seg->flags >> DESC_TYPE_SHIFT) & 15;
+    return (type_ & CODE_SEGMENT_TYPE) != 0;
+}
+
+static bool segment_expands_down(const SegmentCache *seg)
+{
+    uint32_t type_ = (seg->flags >> DESC_TYPE_SHIFT) & 15;
+
+    if (segment_type_code(seg)) {
+        return false;
+    }
+
+    return (type_ & EXPAND_DOWN_SEGMENT_TYPE) != 0;
+}
+
+static uint32_t segment_limit(const SegmentCache *seg)
+{
+    uint32_t limit = seg->limit;
+    uint32_t granularity = (seg->flags & DESC_G_MASK) != 0;
+
+    if (granularity != 0) {
+        limit = (limit << 12) | 0xFFF;
+    }
+
+    return limit;
+}
+
+static uint8_t segment_db(const SegmentCache *seg)
+{
+    return (seg->flags >> DESC_B_SHIFT) & 1;
+}
+
+static uint32_t segment_max_limit(const SegmentCache *seg)
+{
+    if (segment_db(seg) != 0) {
+        return 0xFFFFFFFF;
+    }
+    return 0xFFFF;
+}
+
+static int linearize(CPUState *cpu,
+                     target_ulong logical_addr, target_ulong *linear_addr,
+                     X86Seg seg_idx)
+{
+    enum CpuMode mode;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    SegmentCache *seg = &env->segs[seg_idx];
+    target_ulong base = seg->base;
+    target_ulong logical_addr_32b;
+    uint32_t limit;
+    /* TODO: the emulator will not pass us "write" indicator yet */
+    bool write = false;
+
+    mode = cpu_mode(cpu);
+
+    switch (mode) {
+    case LONG_MODE:
+        if (__builtin_add_overflow(logical_addr, base, linear_addr)) {
+            error_report("Address overflow");
+            return -1;
+        }
+        break;
+    case PROTECTED_MODE:
+    case REAL_MODE:
+        if (segment_type_ro(seg) && write) {
+            error_report("Cannot write to read-only segment");
+            return -1;
+        }
+
+        logical_addr_32b = logical_addr & 0xFFFFFFFF;
+        limit = segment_limit(seg);
+
+        if (segment_expands_down(seg)) {
+            if (logical_addr_32b >= limit) {
+                error_report("Address exceeds limit (expands down)");
+                return -1;
+            }
+
+            limit = segment_max_limit(seg);
+        }
+
+        if (logical_addr_32b > limit) {
+            error_report("Address exceeds limit %u", limit);
+            return -1;
+        }
+        *linear_addr = logical_addr_32b + base;
+        break;
+    default:
+        error_report("Unknown cpu mode: %d", mode);
+        return -1;
+    }
+
+    return 0;
+}
+
+bool x86_read_segment_descriptor(CPUState *cpu,
+                                 struct x86_segment_descriptor *desc,
+                                 x86_segment_selector sel)
+{
+    target_ulong base;
+    uint32_t limit;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    target_ulong gva;
+    /* int ret; */
+
+    memset(desc, 0, sizeof(*desc));
+
+    /* valid gdt descriptors start from index 1 */
+    if (!sel.index && GDT_SEL == sel.ti) {
+        return false;
+    }
+
+    if (GDT_SEL == sel.ti) {
+        base = env->gdt.base;
+        limit = env->gdt.limit;
+    } else {
+        base = env->ldt.base;
+        limit = env->ldt.limit;
+    }
+
+    if (sel.index * 8 >= limit) {
+        return false;
+    }
+
+    gva = base + sel.index * 8;
+    emul_ops->read_mem(cpu, desc, gva, sizeof(*desc));
+
+    return true;
+}
+
+bool x86_write_segment_descriptor(CPUState *cpu,
+                                  struct x86_segment_descriptor *desc,
+                                  x86_segment_selector sel)
+{
+    target_ulong base;
+    uint32_t limit;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    /* int ret; */
+    target_ulong gva;
+
+    if (GDT_SEL == sel.ti) {
+        base = env->gdt.base;
+        limit = env->gdt.limit;
+    } else {
+        base = env->ldt.base;
+        limit = env->ldt.limit;
+    }
+
+    if (sel.index * 8 >= limit) {
+        return false;
+    }
+
+    gva = base + sel.index * 8;
+    emul_ops->write_mem(cpu, desc, gva, sizeof(*desc));
+
+    return true;
+}
+
+bool x86_read_call_gate(CPUState *cpu, struct x86_call_gate *idt_desc,
+                        int gate)
+{
+    target_ulong base;
+    uint32_t limit;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    target_ulong gva;
+
+    base = env->idt.base;
+    limit = env->idt.limit;
+
+    memset(idt_desc, 0, sizeof(*idt_desc));
+    if (gate * 8 >= limit) {
+        perror("call gate exceeds idt limit");
+        return false;
+    }
+
+    gva = base + gate * 8;
+    emul_ops->read_mem(cpu, idt_desc, gva, sizeof(*idt_desc));
+
+    return true;
+}
+
+bool x86_is_protected(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    uint64_t cr0 = env->cr[0];
+
+    return cr0 & CR0_PE_MASK;
+}
+
+bool x86_is_real(CPUState *cpu)
+{
+    return !x86_is_protected(cpu);
+}
+
+bool x86_is_v8086(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    return x86_is_protected(cpu) && (env->eflags & VM_MASK);
+}
+
+bool x86_is_long_mode(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    uint64_t efer = env->efer;
+
+    return ((efer & (EFER_LME | EFER_LMA)) == (EFER_LME | EFER_LMA));
+}
+
+bool x86_is_long64_mode(CPUState *cpu)
+{
+    error_report("unimplemented: is_long64_mode()");
+    abort();
+}
+
+bool x86_is_paging_mode(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    uint64_t cr0 = env->cr[0];
+
+    return cr0 & CR0_PG_MASK;
+}
+
+bool x86_is_pae_enabled(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    uint64_t cr4 = env->cr[4];
+
+    return cr4 & CR4_PAE_MASK;
+}
+
+target_ulong linear_addr(CPUState *cpu, target_ulong addr, X86Seg seg)
+{
+    int ret;
+    target_ulong linear_addr;
+
+    /* return vmx_read_segment_base(cpu, seg) + addr; */
+    ret = linearize(cpu, addr, &linear_addr, seg);
+    if (ret < 0) {
+        error_report("failed to linearize address");
+        abort();
+    }
+
+    return linear_addr;
+}
+
+target_ulong linear_addr_size(CPUState *cpu, target_ulong addr, int size,
+                              X86Seg seg)
+{
+    switch (size) {
+    case 2:
+        addr = (uint16_t)addr;
+        break;
+    case 4:
+        addr = (uint32_t)addr;
+        break;
+    default:
+        break;
+    }
+    return linear_addr(cpu, addr, seg);
+}
+
+target_ulong linear_rip(CPUState *cpu, target_ulong rip)
+{
+    return linear_addr(cpu, rip, R_CS);
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 04/25] hw/intc: Generalize APIC helper names from kvm_* to accel_*
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (2 preceding siblings ...)
  2025-05-20 11:29 ` [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
@ 2025-05-20 11:29 ` Magnus Kulke
  2025-05-20 11:29 ` [RFC PATCH 05/25] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Rename APIC helper functions to use an accel_* prefix instead of kvm_*
to support use by accelerators other than KVM. This is a preparatory
step for integrating MSHV support with common APIC logic.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/accel-irq.c          | 95 ++++++++++++++++++++++++++++++++++++++
 accel/meson.build          |  2 +-
 hw/intc/ioapic.c           | 20 +++++---
 hw/virtio/virtio-pci.c     | 19 ++++----
 include/system/accel-irq.h | 26 +++++++++++
 include/system/mshv.h      | 22 +++++++++
 6 files changed, 167 insertions(+), 17 deletions(-)
 create mode 100644 accel/accel-irq.c
 create mode 100644 include/system/accel-irq.h

diff --git a/accel/accel-irq.c b/accel/accel-irq.c
new file mode 100644
index 0000000000..63f8ed260a
--- /dev/null
+++ b/accel/accel-irq.c
@@ -0,0 +1,95 @@
+#include "qemu/osdep.h"
+#include "hw/pci/msi.h"
+
+#include "system/kvm.h"
+#include "system/mshv.h"
+#include "system/accel-irq.h"
+
+int accel_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        return mshv_irqchip_add_msi_route(vector, dev);
+    }
+#endif
+    if (kvm_enabled()) {
+        return kvm_irqchip_add_msi_route(c, vector, dev);
+    }
+    return -ENOSYS;
+}
+
+int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        return mshv_irqchip_update_msi_route(vector, msg, dev);
+    }
+#endif
+    if (kvm_enabled()) {
+        return kvm_irqchip_update_msi_route(kvm_state, vector, msg, dev);
+    }
+    return -ENOSYS;
+}
+
+void accel_irqchip_commit_route_changes(KVMRouteChange *c)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        mshv_irqchip_commit_routes();
+    }
+#endif
+    if (kvm_enabled()) {
+        kvm_irqchip_commit_route_changes(c);
+    }
+}
+
+void accel_irqchip_commit_routes(void)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        mshv_irqchip_commit_routes();
+    }
+#endif
+    if (kvm_enabled()) {
+        kvm_irqchip_commit_routes(kvm_state);
+    }
+}
+
+void accel_irqchip_release_virq(int virq)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        mshv_irqchip_release_virq(virq);
+    }
+#endif
+    if (kvm_enabled()) {
+        kvm_irqchip_release_virq(kvm_state, virq);
+    }
+}
+
+int accel_irqchip_add_irqfd_notifier_gsi(EventNotifier *n, EventNotifier *rn,
+                                         int virq)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        return mshv_irqchip_add_irqfd_notifier_gsi(n, rn, virq);
+    }
+#endif
+    if (kvm_enabled()) {
+        return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, rn, virq);
+    }
+    return -ENOSYS;
+}
+
+int accel_irqchip_remove_irqfd_notifier_gsi(EventNotifier *n, int virq)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+    if (mshv_msi_via_irqfd_enabled()) {
+        return mshv_irqchip_remove_irqfd_notifier_gsi(n, virq);
+    }
+#endif
+    if (kvm_enabled()) {
+        return kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n, virq);
+    }
+    return -ENOSYS;
+}
diff --git a/accel/meson.build b/accel/meson.build
index 52909314bf..d5e982d152 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -1,6 +1,6 @@
 common_ss.add(files('accel-common.c'))
 specific_ss.add(files('accel-target.c'))
-system_ss.add(files('accel-system.c', 'accel-blocker.c'))
+system_ss.add(files('accel-system.c', 'accel-blocker.c', 'accel-irq.c'))
 user_ss.add(files('accel-user.c'))
 
 subdir('tcg')
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 133bef852d..e431d00311 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -30,12 +30,18 @@
 #include "hw/intc/ioapic_internal.h"
 #include "hw/pci/msi.h"
 #include "hw/qdev-properties.h"
+#include "system/accel-irq.h"
 #include "system/kvm.h"
 #include "system/system.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/i386/x86-iommu.h"
 #include "trace.h"
 
+
+#if defined(CONFIG_KVM) || defined(CONFIG_MSHV)
+#define ACCEL_GSI_IRQFD_POSSIBLE
+#endif
+
 #define APIC_DELIVERY_MODE_SHIFT 8
 #define APIC_POLARITY_SHIFT 14
 #define APIC_TRIG_MODE_SHIFT 15
@@ -191,10 +197,10 @@ static void ioapic_set_irq(void *opaque, int vector, int level)
 
 static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 {
-#ifdef CONFIG_KVM
+#ifdef ACCEL_GSI_IRQFD_POSSIBLE
     int i;
 
-    if (kvm_irqchip_is_split()) {
+    if (accel_irqchip_is_split()) {
         for (i = 0; i < IOAPIC_NUM_PINS; i++) {
             MSIMessage msg;
             struct ioapic_entry_info info;
@@ -202,15 +208,15 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
             if (!info.masked) {
                 msg.address = info.addr;
                 msg.data = info.data;
-                kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+                accel_irqchip_update_msi_route(i, msg, NULL);
             }
         }
-        kvm_irqchip_commit_routes(kvm_state);
+        accel_irqchip_commit_routes();
     }
 #endif
 }
 
-#ifdef CONFIG_KVM
+#ifdef ACCEL_KERNEL_GSI_IRQFD_POSSIBLE
 static void ioapic_iec_notifier(void *private, bool global,
                                 uint32_t index, uint32_t mask)
 {
@@ -428,11 +434,11 @@ static const MemoryRegionOps ioapic_io_ops = {
 
 static void ioapic_machine_done_notify(Notifier *notifier, void *data)
 {
-#ifdef CONFIG_KVM
+#ifdef ACCEL_KERNEL_GSI_IRQFD_POSSIBLE
     IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
                                         machine_done);
 
-    if (kvm_irqchip_is_split()) {
+    if (accel_irqchip_is_split()) {
         X86IOMMUState *iommu = x86_iommu_get_default();
         if (iommu) {
             /* Register this IOAPIC with IOMMU IEC notifier, so that
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 0fa8fe4955..fac54e793e 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -33,6 +33,7 @@
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "hw/loader.h"
+#include "system/accel-irq.h"
 #include "system/kvm.h"
 #include "hw/virtio/virtio-pci.h"
 #include "qemu/range.h"
@@ -826,7 +827,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
 
     if (irqfd->users == 0) {
         KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
-        ret = kvm_irqchip_add_msi_route(&c, vector, &proxy->pci_dev);
+        ret = accel_irqchip_add_msi_route(&c, vector, &proxy->pci_dev);
         if (ret < 0) {
             return ret;
         }
@@ -842,7 +843,7 @@ static void kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
 {
     VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
     if (--irqfd->users == 0) {
-        kvm_irqchip_release_virq(kvm_state, irqfd->virq);
+        accel_irqchip_release_virq(irqfd->virq);
     }
 }
 
@@ -851,7 +852,7 @@ static int kvm_virtio_pci_irqfd_use(VirtIOPCIProxy *proxy,
                                  unsigned int vector)
 {
     VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
-    return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, irqfd->virq);
+    return accel_irqchip_add_irqfd_notifier_gsi(n, NULL, irqfd->virq);
 }
 
 static void kvm_virtio_pci_irqfd_release(VirtIOPCIProxy *proxy,
@@ -861,7 +862,7 @@ static void kvm_virtio_pci_irqfd_release(VirtIOPCIProxy *proxy,
     VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
     int ret;
 
-    ret = kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n, irqfd->virq);
+    ret = accel_irqchip_remove_irqfd_notifier_gsi(n, irqfd->virq);
     assert(ret == 0);
 }
 static int virtio_pci_get_notifier(VirtIOPCIProxy *proxy, int queue_no,
@@ -996,12 +997,12 @@ static int virtio_pci_one_vector_unmask(VirtIOPCIProxy *proxy,
     if (proxy->vector_irqfd) {
         irqfd = &proxy->vector_irqfd[vector];
         if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
-            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg,
-                                               &proxy->pci_dev);
+            ret = accel_irqchip_update_msi_route(irqfd->virq, msg,
+                                                 &proxy->pci_dev);
             if (ret < 0) {
                 return ret;
             }
-            kvm_irqchip_commit_routes(kvm_state);
+            accel_irqchip_commit_routes();
         }
     }
 
@@ -1225,7 +1226,7 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, int nvqs, bool assign)
     VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
     int r, n;
     bool with_irqfd = msix_enabled(&proxy->pci_dev) &&
-        kvm_msi_via_irqfd_enabled();
+        accel_msi_via_irqfd_enabled() ;
 
     nvqs = MIN(nvqs, VIRTIO_QUEUE_MAX);
 
@@ -1429,7 +1430,7 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
                                   uint16_t new_vector)
 {
     bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
-        msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();
+        msix_enabled(&proxy->pci_dev) && accel_msi_via_irqfd_enabled();
 
     if (new_vector == old_vector) {
         return;
diff --git a/include/system/accel-irq.h b/include/system/accel-irq.h
new file mode 100644
index 0000000000..8d17fe45ea
--- /dev/null
+++ b/include/system/accel-irq.h
@@ -0,0 +1,26 @@
+#ifndef SYSEMU_ACCEL_H
+#define SYSEMU_ACCEL_H
+#include "hw/pci/msi.h"
+#include "qemu/osdep.h"
+#include "system/kvm.h"
+#include "system/mshv.h"
+
+static inline bool accel_msi_via_irqfd_enabled(void)
+{
+  return mshv_msi_via_irqfd_enabled() || kvm_msi_via_irqfd_enabled();
+}
+
+static inline bool accel_irqchip_is_split(void)
+{
+  return mshv_msi_via_irqfd_enabled() || kvm_irqchip_is_split();
+}
+
+int accel_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev);
+int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev);
+void accel_irqchip_commit_route_changes(KVMRouteChange *c);
+void accel_irqchip_commit_routes(void);
+void accel_irqchip_release_virq(int virq);
+int accel_irqchip_add_irqfd_notifier_gsi(EventNotifier *n, EventNotifier *rn,
+                                         int virq);
+int accel_irqchip_remove_irqfd_notifier_gsi(EventNotifier *n, int virq);
+#endif
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 8380b92da2..bc8f2c228a 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -24,9 +24,31 @@
 #define CONFIG_MSHV_IS_POSSIBLE
 #endif
 
+
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+extern bool mshv_allowed;
+#define mshv_enabled() (mshv_allowed)
+#else /* CONFIG_MSHV_IS_POSSIBLE */
+#define mshv_enabled() false
+#endif
+#ifdef MSHV_USE_KERNEL_GSI_IRQFD
+#define mshv_msi_via_irqfd_enabled() mshv_enabled()
+#else
+#define mshv_msi_via_irqfd_enabled() false
+#endif
+
 /* cpu */
 /* EFER (technically not a register) bits */
 #define EFER_LMA   ((uint64_t)0x400)
 #define EFER_LME   ((uint64_t)0x100)
 
+/* interrupt */
+int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
+int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
+void mshv_irqchip_commit_routes(void);
+void mshv_irqchip_release_virq(int virq);
+int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *n,
+                                        const EventNotifier *rn, int virq);
+int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *n, int virq);
+
 #endif
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 05/25] include/hw/hyperv: Add MSHV ABI header definitions
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (3 preceding siblings ...)
  2025-05-20 11:29 ` [RFC PATCH 04/25] hw/intc: Generalize APIC helper names from kvm_* to accel_* Magnus Kulke
@ 2025-05-20 11:29 ` Magnus Kulke
  2025-05-20 14:24   ` Paolo Bonzini
  2025-05-20 11:29 ` [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton Magnus Kulke
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Introduce headers for the Microsoft Hypervisor (MSHV) userspace ABI,
including IOCTLs and structures used to interface with the hypervisor.

These definitions are based on the upstream Linux MSHV interface and
will be used by the MSHV accelerator backend in later patches.

Note that for the time being the header `linux-mshv.h` is also being
included to allow building on machines that do not ship the header yet.
The header will be available in kernel 6.15 (at the time of writing
we're at -rc6) we will probably drop it in later revisions of the
patch set.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/hw/hyperv/hvgdk.h      |   20 +
 include/hw/hyperv/hvhdk.h      |  165 +++++
 include/hw/hyperv/hvhdk_mini.h |  106 ++++
 include/hw/hyperv/linux-mshv.h | 1038 ++++++++++++++++++++++++++++++++
 4 files changed, 1329 insertions(+)
 create mode 100644 include/hw/hyperv/hvgdk.h
 create mode 100644 include/hw/hyperv/hvhdk.h
 create mode 100644 include/hw/hyperv/hvhdk_mini.h
 create mode 100644 include/hw/hyperv/linux-mshv.h

diff --git a/include/hw/hyperv/hvgdk.h b/include/hw/hyperv/hvgdk.h
new file mode 100644
index 0000000000..b03868d152
--- /dev/null
+++ b/include/hw/hyperv/hvgdk.h
@@ -0,0 +1,20 @@
+/*
+ * Type definitions for the mshv guest interface.
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef _HVGDK_H
+#define _HVGDK_H
+
+#define HVGDK_H_VERSION         (25125)
+
+enum hv_unimplemented_msr_action {
+    HV_UNIMPLEMENTED_MSR_ACTION_FAULT = 0,
+    HV_UNIMPLEMENTED_MSR_ACTION_IGNORE_WRITE_READ_ZERO = 1,
+    HV_UNIMPLEMENTED_MSR_ACTION_COUNT = 2,
+};
+
+#endif /* _HVGDK_H */
diff --git a/include/hw/hyperv/hvhdk.h b/include/hw/hyperv/hvhdk.h
new file mode 100644
index 0000000000..7896f49d10
--- /dev/null
+++ b/include/hw/hyperv/hvhdk.h
@@ -0,0 +1,165 @@
+/*
+ * Type definitions for the mshv host.
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_HYPERV_HVHDK_H
+#define HW_HYPERV_HVHDK_H
+
+#define HV_PARTITION_SYNTHETIC_PROCESSOR_FEATURES_BANKS 1
+
+struct hv_input_set_partition_property {
+    __u64 partition_id;
+    __u32 property_code; /* enum hv_partition_property_code */
+    __u32 padding;
+    __u64 property_value;
+};
+
+union hv_partition_synthetic_processor_features {
+    __u64 as_uint64[HV_PARTITION_SYNTHETIC_PROCESSOR_FEATURES_BANKS];
+
+    struct {
+        /*
+         * Report a hypervisor is present. CPUID leaves
+         * 0x40000000 and 0x40000001 are supported.
+         */
+        __u64 hypervisor_present:1;
+
+        /*
+         * Features associated with HV#1:
+         */
+
+        /* Report support for Hv1 (CPUID leaves 0x40000000 - 0x40000006). */
+        __u64 hv1:1;
+
+        /*
+         * Access to HV_X64_MSR_VP_RUNTIME.
+         * Corresponds to access_vp_run_time_reg privilege.
+         */
+        __u64 access_vp_run_time_reg:1;
+
+        /*
+         * Access to HV_X64_MSR_TIME_REF_COUNT.
+         * Corresponds to access_partition_reference_counter privilege.
+         */
+        __u64 access_partition_reference_counter:1;
+
+        /*
+         * Access to SINT-related registers (HV_X64_MSR_SCONTROL through
+         * HV_X64_MSR_EOM and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15).
+         * Corresponds to access_synic_regs privilege.
+         */
+        __u64 access_synic_regs:1;
+
+        /*
+         * Access to synthetic timers and associated MSRs
+         * (HV_X64_MSR_STIMER0_CONFIG through HV_X64_MSR_STIMER3_COUNT).
+         * Corresponds to access_synthetic_timer_regs privilege.
+         */
+        __u64 access_synthetic_timer_regs:1;
+
+        /*
+         * Access to APIC MSRs (HV_X64_MSR_EOI, HV_X64_MSR_ICR and
+         * HV_X64_MSR_TPR) as well as the VP assist page.
+         * Corresponds to access_intr_ctrl_regs privilege.
+         */
+        __u64 access_intr_ctrl_regs:1;
+
+        /*
+         * Access to registers associated with hypercalls
+         * (HV_X64_MSR_GUEST_OS_ID and HV_X64_MSR_HYPERCALL).
+         * Corresponds to access_hypercall_msrs privilege.
+         */
+        __u64 access_hypercall_regs:1;
+
+        /* VP index can be queried. corresponds to access_vp_index privilege. */
+        __u64 access_vp_index:1;
+
+        /*
+         * Access to the reference TSC. Corresponds to
+         * access_partition_reference_tsc privilege.
+         */
+        __u64 access_partition_reference_tsc:1;
+
+        /*
+         * Partition has access to the guest idle reg. Corresponds to
+         * access_guest_idle_reg privilege.
+         */
+        __u64 access_guest_idle_reg:1;
+
+        /*
+         * Partition has access to frequency regs. corresponds to
+         * access_frequency_regs privilege.
+         */
+        __u64 access_frequency_regs:1;
+
+        __u64 reserved_z12:1; /* Reserved for access_reenlightenment_controls */
+        __u64 reserved_z13:1; /* Reserved for access_root_scheduler_reg */
+        __u64 reserved_z14:1; /* Reserved for access_tsc_invariant_controls */
+
+        /*
+         * Extended GVA ranges for HvCallFlushVirtualAddressList hypercall.
+         * Corresponds to privilege.
+         */
+        __u64 enable_extended_gva_ranges_for_flush_virtual_address_list:1;
+
+        __u64 reserved_z16:1; /* Reserved for access_vsm. */
+        __u64 reserved_z17:1; /* Reserved for access_vp_registers. */
+
+        /* Use fast hypercall output. Corresponds to privilege. */
+        __u64 fast_hypercall_output:1;
+
+        __u64 reserved_z19:1; /* Reserved for enable_extended_hypercalls. */
+
+        /*
+         * HvStartVirtualProcessor can be used to start virtual processors.
+         * Corresponds to privilege.
+         */
+        __u64 start_virtual_processor:1;
+
+        __u64 reserved_z21:1; /* Reserved for Isolation. */
+
+        /* Synthetic timers in direct mode. */
+        __u64 direct_synthetic_timers:1;
+
+        __u64 reserved_z23:1; /* Reserved for synthetic time unhalted timer */
+
+        /* Use extended processor masks. */
+        __u64 extended_processor_masks:1;
+
+        /*
+         * HvCallFlushVirtualAddressSpace / HvCallFlushVirtualAddressList are
+         * supported.
+         */
+        __u64 tb_flush_hypercalls:1;
+
+        /* HvCallSendSyntheticClusterIpi is supported. */
+        __u64 synthetic_cluster_ipi:1;
+
+        /* HvCallNotifyLongSpinWait is supported. */
+        __u64 notify_long_spin_wait:1;
+
+        /* HvCallQueryNumaDistance is supported. */
+        __u64 query_numa_distance:1;
+
+        /* HvCallSignalEvent is supported. Corresponds to privilege. */
+        __u64 signal_events:1;
+
+        /* HvCallRetargetDeviceInterrupt is supported. */
+        __u64 retarget_device_interrupt:1;
+
+        /* HvCallRestorePartitionTime is supported. */
+        __u64 restore_time:1;
+
+        /* EnlightenedVmcs nested enlightenment is supported. */
+        __u64 enlightened_vmcs:1;
+
+        __u64 reserved:30;
+    };
+};
+
+#endif
diff --git a/include/hw/hyperv/hvhdk_mini.h b/include/hw/hyperv/hvhdk_mini.h
new file mode 100644
index 0000000000..3231af1f5a
--- /dev/null
+++ b/include/hw/hyperv/hvhdk_mini.h
@@ -0,0 +1,106 @@
+/*
+ * Type definitions for the mshv host interface.
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef _HVHDK_MINI_H
+#define _HVHDK_MINI_H
+
+#define HVHVK_MINI_VERSION (25294)
+
+/* Each generic set contains 64 elements */
+#define HV_GENERIC_SET_SHIFT (6)
+#define HV_GENERIC_SET_MASK  (63)
+
+#define HVCALL_GET_PARTITION_PROPERTY   0x0044
+#define HVCALL_SET_PARTITION_PROPERTY   0x0045
+#define HVCALL_ASSERT_VIRTUAL_INTERRUPT 0x0094
+
+enum hv_generic_set_format {
+    HV_GENERIC_SET_SPARSE_4K,
+    HV_GENERIC_SET_ALL,
+};
+
+enum hv_partition_property_code {
+    /* Privilege properties */
+    HV_PARTITION_PROPERTY_PRIVILEGE_FLAGS         = 0x00010000,
+    HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES = 0x00010001,
+
+    /* Scheduling properties */
+    HV_PARTITION_PROPERTY_SUSPEND      = 0x00020000,
+    HV_PARTITION_PROPERTY_CPU_RESERVE  = 0x00020001,
+    HV_PARTITION_PROPERTY_CPU_CAP      = 0x00020002,
+    HV_PARTITION_PROPERTY_CPU_WEIGHT   = 0x00020003,
+    HV_PARTITION_PROPERTY_CPU_GROUP_ID = 0x00020004,
+
+    /* Time properties */
+    HV_PARTITION_PROPERTY_TIME_FREEZE    = 0x00030003,
+    HV_PARTITION_PROPERTY_REFERENCE_TIME = 0x00030005,
+
+    /* Debugging properties */
+    HV_PARTITION_PROPERTY_DEBUG_CHANNEL_ID = 0x00040000,
+
+    /* Resource properties */
+    HV_PARTITION_PROPERTY_VIRTUAL_TLB_PAGE_COUNT                 = 0x00050000,
+    HV_PARTITION_PROPERTY_VSM_CONFIG                             = 0x00050001,
+    HV_PARTITION_PROPERTY_ZERO_MEMORY_ON_RESET                   = 0x00050002,
+    HV_PARTITION_PROPERTY_PROCESSORS_PER_SOCKET                  = 0x00050003,
+    HV_PARTITION_PROPERTY_NESTED_TLB_SIZE                        = 0x00050004,
+    HV_PARTITION_PROPERTY_GPA_PAGE_ACCESS_TRACKING               = 0x00050005,
+    HV_PARTITION_PROPERTY_VSM_PERMISSIONS_DIRTY_SINCE_LAST_QUERY = 0x00050006,
+    HV_PARTITION_PROPERTY_SGX_LAUNCH_CONTROL_CONFIG              = 0x00050007,
+    HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL0            = 0x00050008,
+    HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL1            = 0x00050009,
+    HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL2            = 0x0005000a,
+    HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL3            = 0x0005000b,
+    HV_PARTITION_PROPERTY_ISOLATION_STATE                        = 0x0005000c,
+    HV_PARTITION_PROPERTY_ISOLATION_CONTROL                      = 0x0005000d,
+    HV_PARTITION_PROPERTY_ALLOCATION_ID                          = 0x0005000e,
+    HV_PARTITION_PROPERTY_MONITORING_ID                          = 0x0005000f,
+    HV_PARTITION_PROPERTY_IMPLEMENTED_PHYSICAL_ADDRESS_BITS      = 0x00050010,
+    HV_PARTITION_PROPERTY_NON_ARCHITECTURAL_CORE_SHARING         = 0x00050011,
+    HV_PARTITION_PROPERTY_HYPERCALL_DOORBELL_PAGE                = 0x00050012,
+    HV_PARTITION_PROPERTY_ISOLATION_POLICY                       = 0x00050014,
+    HV_PARTITION_PROPERTY_UNIMPLEMENTED_MSR_ACTION               = 0x00050017,
+    HV_PARTITION_PROPERTY_SEV_VMGEXIT_OFFLOADS                   = 0x00050022,
+
+    /* Compatibility properties */
+    HV_PARTITION_PROPERTY_PROCESSOR_VENDOR              = 0x00060000,
+    HV_PARTITION_PROPERTY_PROCESSOR_FEATURES_DEPRECATED = 0x00060001,
+    HV_PARTITION_PROPERTY_PROCESSOR_XSAVE_FEATURES      = 0x00060002,
+    HV_PARTITION_PROPERTY_PROCESSOR_CL_FLUSH_SIZE       = 0x00060003,
+    HV_PARTITION_PROPERTY_ENLIGHTENMENT_MODIFICATIONS   = 0x00060004,
+    HV_PARTITION_PROPERTY_COMPATIBILITY_VERSION         = 0x00060005,
+    HV_PARTITION_PROPERTY_PHYSICAL_ADDRESS_WIDTH        = 0x00060006,
+    HV_PARTITION_PROPERTY_XSAVE_STATES                  = 0x00060007,
+    HV_PARTITION_PROPERTY_MAX_XSAVE_DATA_SIZE           = 0x00060008,
+    HV_PARTITION_PROPERTY_PROCESSOR_CLOCK_FREQUENCY     = 0x00060009,
+    HV_PARTITION_PROPERTY_PROCESSOR_FEATURES0           = 0x0006000a,
+    HV_PARTITION_PROPERTY_PROCESSOR_FEATURES1           = 0x0006000b,
+
+    /* Guest software properties */
+    HV_PARTITION_PROPERTY_GUEST_OS_ID = 0x00070000,
+
+    /* Nested virtualization properties */
+    HV_PARTITION_PROPERTY_PROCESSOR_VIRTUALIZATION_FEATURES = 0x00080000,
+};
+
+/* HV Map GPA (Guest Physical Address) Flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE        0x0
+#define HV_MAP_GPA_READABLE                0x1
+#define HV_MAP_GPA_WRITABLE                0x2
+#define HV_MAP_GPA_KERNEL_EXECUTABLE       0x4
+#define HV_MAP_GPA_USER_EXECUTABLE         0x8
+#define HV_MAP_GPA_EXECUTABLE              0xC
+#define HV_MAP_GPA_PERMISSIONS_MASK        0xF
+#define HV_MAP_GPA_ADJUSTABLE           0x8000
+#define HV_MAP_GPA_NO_ACCESS           0x10000
+#define HV_MAP_GPA_NOT_CACHED         0x200000
+#define HV_MAP_GPA_LARGE_PAGE       0x80000000
+
+#define HV_PFN_RNG_PAGEBITS 24  /* HV_SPA_PAGE_RANGE_ADDITIONAL_PAGES_BITS */
+
+#endif /* _HVHDK_MINI_H */
diff --git a/include/hw/hyperv/linux-mshv.h b/include/hw/hyperv/linux-mshv.h
new file mode 100644
index 0000000000..9b1e1f7ce1
--- /dev/null
+++ b/include/hw/hyperv/linux-mshv.h
@@ -0,0 +1,1038 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Userspace interfaces for /dev/mshv* devices and derived fds
+ * Includes:
+ * - VMM APIs for parent (nested/baremetal root) partition APIs
+ * - VMM APIs for VTL0 APIs
+ * - Debug and performance metrics APIs
+ *
+ * This file is divided into sections containing data structures and IOCTLs for
+ * a particular set of related devices or derived file descriptors.
+ *
+ * The IOCTL definitions are at the end of each section. They are grouped by
+ * device/fd, so that new IOCTLs can easily be added with a monotonically
+ * increasing number.
+ */
+
+#ifndef HW_HYPERV_LINUX_MSHV_H
+#define HW_HYPERV_LINUX_MSHV_H
+
+#include <linux/types.h>
+#include <stdint.h>
+
+#define MSHV_IOCTL	0xB8
+
+typedef enum hv_register_name {
+	/* Pending Interruption Register */
+	HV_REGISTER_PENDING_INTERRUPTION = 0x00010002,
+
+	/* X64 User-Mode Registers */
+	HV_X64_REGISTER_RAX		= 0x00020000,
+	HV_X64_REGISTER_RCX		= 0x00020001,
+	HV_X64_REGISTER_RDX		= 0x00020002,
+	HV_X64_REGISTER_RBX		= 0x00020003,
+	HV_X64_REGISTER_RSP		= 0x00020004,
+	HV_X64_REGISTER_RBP		= 0x00020005,
+	HV_X64_REGISTER_RSI		= 0x00020006,
+	HV_X64_REGISTER_RDI		= 0x00020007,
+	HV_X64_REGISTER_R8		= 0x00020008,
+	HV_X64_REGISTER_R9		= 0x00020009,
+	HV_X64_REGISTER_R10		= 0x0002000A,
+	HV_X64_REGISTER_R11		= 0x0002000B,
+	HV_X64_REGISTER_R12		= 0x0002000C,
+	HV_X64_REGISTER_R13		= 0x0002000D,
+	HV_X64_REGISTER_R14		= 0x0002000E,
+	HV_X64_REGISTER_R15		= 0x0002000F,
+	HV_X64_REGISTER_RIP		= 0x00020010,
+	HV_X64_REGISTER_RFLAGS	= 0x00020011,
+
+	/* X64 Floating Point and Vector Registers */
+	HV_X64_REGISTER_XMM0				= 0x00030000,
+	HV_X64_REGISTER_XMM1				= 0x00030001,
+	HV_X64_REGISTER_XMM2				= 0x00030002,
+	HV_X64_REGISTER_XMM3				= 0x00030003,
+	HV_X64_REGISTER_XMM4				= 0x00030004,
+	HV_X64_REGISTER_XMM5				= 0x00030005,
+	HV_X64_REGISTER_XMM6				= 0x00030006,
+	HV_X64_REGISTER_XMM7				= 0x00030007,
+	HV_X64_REGISTER_XMM8				= 0x00030008,
+	HV_X64_REGISTER_XMM9				= 0x00030009,
+	HV_X64_REGISTER_XMM10				= 0x0003000A,
+	HV_X64_REGISTER_XMM11				= 0x0003000B,
+	HV_X64_REGISTER_XMM12				= 0x0003000C,
+	HV_X64_REGISTER_XMM13				= 0x0003000D,
+	HV_X64_REGISTER_XMM14				= 0x0003000E,
+	HV_X64_REGISTER_XMM15				= 0x0003000F,
+	HV_X64_REGISTER_FP_MMX0				= 0x00030010,
+	HV_X64_REGISTER_FP_MMX1				= 0x00030011,
+	HV_X64_REGISTER_FP_MMX2				= 0x00030012,
+	HV_X64_REGISTER_FP_MMX3				= 0x00030013,
+	HV_X64_REGISTER_FP_MMX4				= 0x00030014,
+	HV_X64_REGISTER_FP_MMX5				= 0x00030015,
+	HV_X64_REGISTER_FP_MMX6				= 0x00030016,
+	HV_X64_REGISTER_FP_MMX7				= 0x00030017,
+	HV_X64_REGISTER_FP_CONTROL_STATUS	= 0x00030018,
+	HV_X64_REGISTER_XMM_CONTROL_STATUS	= 0x00030019,
+
+	/* X64 Control Registers */
+	HV_X64_REGISTER_CR0		= 0x00040000,
+	HV_X64_REGISTER_CR2		= 0x00040001,
+	HV_X64_REGISTER_CR3		= 0x00040002,
+	HV_X64_REGISTER_CR4		= 0x00040003,
+	HV_X64_REGISTER_CR8		= 0x00040004,
+	HV_X64_REGISTER_XFEM	= 0x00040005,
+
+	/* X64 Segment Registers */
+	HV_X64_REGISTER_ES		= 0x00060000,
+	HV_X64_REGISTER_CS		= 0x00060001,
+	HV_X64_REGISTER_SS		= 0x00060002,
+	HV_X64_REGISTER_DS		= 0x00060003,
+	HV_X64_REGISTER_FS		= 0x00060004,
+	HV_X64_REGISTER_GS		= 0x00060005,
+	HV_X64_REGISTER_LDTR	= 0x00060006,
+	HV_X64_REGISTER_TR		= 0x00060007,
+
+	/* X64 Table Registers */
+	HV_X64_REGISTER_IDTR	= 0x00070000,
+	HV_X64_REGISTER_GDTR	= 0x00070001,
+
+	/* X64 Virtualized MSRs */
+	HV_X64_REGISTER_TSC				= 0x00080000,
+	HV_X64_REGISTER_EFER			= 0x00080001,
+	HV_X64_REGISTER_KERNEL_GS_BASE	= 0x00080002,
+	HV_X64_REGISTER_APIC_BASE		= 0x00080003,
+	HV_X64_REGISTER_PAT				= 0x00080004,
+	HV_X64_REGISTER_SYSENTER_CS		= 0x00080005,
+	HV_X64_REGISTER_SYSENTER_EIP	= 0x00080006,
+	HV_X64_REGISTER_SYSENTER_ESP	= 0x00080007,
+	HV_X64_REGISTER_STAR			= 0x00080008,
+	HV_X64_REGISTER_LSTAR			= 0x00080009,
+	HV_X64_REGISTER_CSTAR			= 0x0008000A,
+	HV_X64_REGISTER_SFMASK			= 0x0008000B,
+	HV_X64_REGISTER_INITIAL_APIC_ID	= 0x0008000C,
+
+	/* X64 Cache control MSRs */
+	HV_X64_REGISTER_MSR_MTRR_CAP			= 0x0008000D,
+	HV_X64_REGISTER_MSR_MTRR_DEF_TYPE		= 0x0008000E,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0		= 0x00080010,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE1		= 0x00080011,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE2		= 0x00080012,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE3		= 0x00080013,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE4		= 0x00080014,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE5		= 0x00080015,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE6		= 0x00080016,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE7		= 0x00080017,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE8		= 0x00080018,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASE9		= 0x00080019,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASEA		= 0x0008001A,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASEB		= 0x0008001B,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASEC		= 0x0008001C,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASED		= 0x0008001D,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASEE		= 0x0008001E,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_BASEF		= 0x0008001F,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0		= 0x00080040,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK1		= 0x00080041,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK2		= 0x00080042,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK3		= 0x00080043,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK4		= 0x00080044,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK5		= 0x00080045,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK6		= 0x00080046,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK7		= 0x00080047,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK8		= 0x00080048,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASK9		= 0x00080049,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASKA		= 0x0008004A,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASKB		= 0x0008004B,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASKC		= 0x0008004C,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASKD		= 0x0008004D,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASKE		= 0x0008004E,
+	HV_X64_REGISTER_MSR_MTRR_PHYS_MASKF		= 0x0008004F,
+	HV_X64_REGISTER_MSR_MTRR_FIX64K00000	= 0x00080070,
+	HV_X64_REGISTER_MSR_MTRR_FIX16K80000	= 0x00080071,
+	HV_X64_REGISTER_MSR_MTRR_FIX16KA0000	= 0x00080072,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KC0000		= 0x00080073,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KC8000		= 0x00080074,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KD0000		= 0x00080075,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KD8000		= 0x00080076,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KE0000		= 0x00080077,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KE8000		= 0x00080078,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KF0000		= 0x00080079,
+	HV_X64_REGISTER_MSR_MTRR_FIX4KF8000		= 0x0008007A,
+
+	HV_X64_REGISTER_TSC_AUX		= 0x0008007B,
+	HV_X64_REGISTER_BNDCFGS		= 0x0008007C,
+	HV_X64_REGISTER_DEBUG_CTL	= 0x0008007D,
+
+	/* Available */
+
+	HV_X64_REGISTER_SPEC_CTRL		= 0x00080084,
+	HV_X64_REGISTER_TSC_ADJUST	    = 0x00080096,
+
+	/* Other MSRs */
+	HV_X64_REGISTER_MSR_IA32_MISC_ENABLE = 0x000800A0,
+
+	/* Misc */
+	HV_REGISTER_GUEST_OS_ID			= 0x00090002,
+	HV_REGISTER_REFERENCE_TSC		= 0x00090017,
+
+	/* Hypervisor-defined Registers (Synic) */
+	HV_REGISTER_SINT0		= 0x000A0000,
+	HV_REGISTER_SINT1		= 0x000A0001,
+	HV_REGISTER_SINT2		= 0x000A0002,
+	HV_REGISTER_SINT3		= 0x000A0003,
+	HV_REGISTER_SINT4		= 0x000A0004,
+	HV_REGISTER_SINT5		= 0x000A0005,
+	HV_REGISTER_SINT6		= 0x000A0006,
+	HV_REGISTER_SINT7		= 0x000A0007,
+	HV_REGISTER_SINT8		= 0x000A0008,
+	HV_REGISTER_SINT9		= 0x000A0009,
+	HV_REGISTER_SINT10		= 0x000A000A,
+	HV_REGISTER_SINT11		= 0x000A000B,
+	HV_REGISTER_SINT12		= 0x000A000C,
+	HV_REGISTER_SINT13		= 0x000A000D,
+	HV_REGISTER_SINT14		= 0x000A000E,
+	HV_REGISTER_SINT15		= 0x000A000F,
+	HV_REGISTER_SCONTROL	= 0x000A0010,
+	HV_REGISTER_SVERSION	= 0x000A0011,
+	HV_REGISTER_SIEFP		= 0x000A0012,
+	HV_REGISTER_SIMP		= 0x000A0013,
+	HV_REGISTER_EOM			= 0x000A0014,
+	HV_REGISTER_SIRBP		= 0x000A0015,
+} hv_register_name;
+
+enum hv_intercept_type {
+	HV_INTERCEPT_TYPE_X64_IO_PORT		= 0X00000000,
+	HV_INTERCEPT_TYPE_X64_MSR			= 0X00000001,
+	HV_INTERCEPT_TYPE_X64_CPUID			= 0X00000002,
+	HV_INTERCEPT_TYPE_EXCEPTION			= 0X00000003,
+
+	/* Used to be HV_INTERCEPT_TYPE_REGISTER */
+	HV_INTERCEPT_TYPE_RESERVED0			= 0X00000004,
+	HV_INTERCEPT_TYPE_MMIO				= 0X00000005,
+	HV_INTERCEPT_TYPE_X64_GLOBAL_CPUID	= 0X00000006,
+	HV_INTERCEPT_TYPE_X64_APIC_SMI		= 0X00000007,
+	HV_INTERCEPT_TYPE_HYPERCALL			= 0X00000008,
+
+	HV_INTERCEPT_TYPE_X64_APIC_INIT_SIPI		= 0X00000009,
+	HV_INTERCEPT_MC_UPDATE_PATCH_LEVEL_MSR_READ	= 0X0000000A,
+
+	HV_INTERCEPT_TYPE_X64_APIC_WRITE		= 0X0000000B,
+	HV_INTERCEPT_TYPE_X64_MSR_INDEX			= 0X0000000C,
+	HV_INTERCEPT_TYPE_MAX,
+	HV_INTERCEPT_TYPE_INVALID			    = 0XFFFFFFFF,
+};
+
+struct hv_u128 {
+	__u64 low_part;
+	__u64 high_part;
+};
+
+union hv_x64_xmm_control_status_register {
+	struct hv_u128 as_uint128;
+	struct {
+		union {
+			/* long mode */
+			__u64 last_fp_rdp;
+			/* 32 bit mode */
+			struct {
+				__u32 last_fp_dp;
+				__u16 last_fp_ds;
+				__u16 padding;
+			};
+		};
+		__u32 xmm_status_control;
+		__u32 xmm_status_control_mask;
+	};
+};
+
+union hv_x64_fp_register {
+	struct hv_u128 as_uint128;
+	struct {
+		__u64 mantissa;
+		__u64 biased_exponent : 15;
+		__u64 sign : 1;
+		__u64 reserved : 48;
+	};
+};
+
+union hv_x64_pending_exception_event {
+	__u64 as_uint64[2];
+	struct {
+		__u32 event_pending : 1;
+		__u32 event_type : 3;
+		__u32 reserved0 : 4;
+		__u32 deliver_error_code : 1;
+		__u32 reserved1 : 7;
+		__u32 vector : 16;
+		__u32 error_code;
+		__u64 exception_parameter;
+	};
+};
+
+union hv_x64_pending_virtualization_fault_event {
+	__u64 as_uint64[2];
+	struct {
+		__u32 event_pending : 1;
+		__u32 event_type : 3;
+		__u32 reserved0 : 4;
+		__u32 reserved1 : 8;
+		__u32 parameter0 : 16;
+		__u32 code;
+		__u64 parameter1;
+	};
+};
+
+union hv_x64_pending_interruption_register {
+	__u64 as_uint64;
+	struct {
+		__u32 interruption_pending : 1;
+		__u32 interruption_type : 3;
+		__u32 deliver_error_code : 1;
+		__u32 instruction_length : 4;
+		__u32 nested_event : 1;
+		__u32 reserved : 6;
+		__u32 interruption_vector : 16;
+		__u32 error_code;
+	};
+};
+
+union hv_x64_register_sev_control {
+	__u64 as_uint64;
+	struct {
+		__u64 enable_encrypted_state : 1;
+		__u64 reserved_z : 11;
+		__u64 vmsa_gpa_page_number : 52;
+	};
+};
+
+union hv_x64_msr_npiep_config_contents {
+	__u64 as_uint64;
+	struct {
+		/*
+		 * These bits enable instruction execution prevention for
+		 * specific instructions.
+		 */
+		__u64 prevents_gdt : 1;
+		__u64 prevents_idt : 1;
+		__u64 prevents_ldt : 1;
+		__u64 prevents_tr : 1;
+
+		/* The reserved bits must always be 0. */
+		__u64 reserved : 60;
+	};
+};
+
+typedef struct hv_x64_segment_register {
+	__u64 base;
+	__u32 limit;
+	__u16 selector;
+	union {
+		struct {
+			__u16 segment_type : 4;
+			__u16 non_system_segment : 1;
+			__u16 descriptor_privilege_level : 2;
+			__u16 present : 1;
+			__u16 reserved : 4;
+			__u16 available : 1;
+			__u16 _long : 1;
+			__u16 _default : 1;
+			__u16 granularity : 1;
+		};
+		__u16 attributes;
+	};
+} hv_x64_segment_register;
+
+typedef struct hv_x64_table_register {
+	__u16 pad[3];
+	__u16 limit;
+	__u64 base;
+} hv_x64_table_register;
+
+union hv_x64_fp_control_status_register {
+	struct hv_u128 as_uint128;
+	struct {
+		__u16 fp_control;
+		__u16 fp_status;
+		__u8 fp_tag;
+		__u8 reserved;
+		__u16 last_fp_op;
+		union {
+			/* long mode */
+			__u64 last_fp_rip;
+			/* 32 bit mode */
+			struct {
+				__u32 last_fp_eip;
+				__u16 last_fp_cs;
+				__u16 padding;
+			};
+		};
+	};
+};
+
+/* General Hypervisor Register Content Definitions */
+
+union hv_explicit_suspend_register {
+	__u64 as_uint64;
+	struct {
+		__u64 suspended : 1;
+		__u64 reserved : 63;
+	};
+};
+
+union hv_internal_activity_register {
+	__u64 as_uint64;
+
+	struct {
+		__u64 startup_suspend : 1;
+		__u64 halt_suspend : 1;
+		__u64 idle_suspend : 1;
+		__u64 rsvd_z : 61;
+	};
+};
+
+union hv_x64_interrupt_state_register {
+	__u64 as_uint64;
+	struct {
+		__u64 interrupt_shadow : 1;
+		__u64 nmi_masked : 1;
+		__u64 reserved : 62;
+	};
+};
+
+union hv_intercept_suspend_register {
+	__u64 as_uint64;
+	struct {
+		__u64 suspended : 1;
+		__u64 reserved : 63;
+	};
+};
+
+union hv_register_value {
+	struct hv_u128 reg128;
+	__u64 reg64;
+	__u32 reg32;
+	__u16 reg16;
+	__u8 reg8;
+	union hv_x64_fp_register fp;
+	union hv_x64_fp_control_status_register fp_control_status;
+	union hv_x64_xmm_control_status_register xmm_control_status;
+	struct hv_x64_segment_register segment;
+	struct hv_x64_table_register table;
+	union hv_explicit_suspend_register explicit_suspend;
+	union hv_intercept_suspend_register intercept_suspend;
+	union hv_internal_activity_register internal_activity;
+	union hv_x64_interrupt_state_register interrupt_state;
+	union hv_x64_pending_interruption_register pending_interruption;
+	union hv_x64_msr_npiep_config_contents npiep_config;
+	union hv_x64_pending_exception_event pending_exception_event;
+	union hv_x64_pending_virtualization_fault_event
+		pending_virtualization_fault_event;
+	union hv_x64_register_sev_control sev_control;
+};
+
+typedef struct hv_register_assoc {
+	__u32 name;			/* enum hv_register_name */
+	__u32 reserved1;
+	__u64 reserved2;
+	union hv_register_value value;
+} hv_register_assoc;
+
+#define MSHV_VP_MAX_REGISTERS	128
+
+struct mshv_vp_registers {
+	int count; /* at most MSHV_VP_MAX_REGISTERS */
+	struct hv_register_assoc *regs;
+};
+
+/**
+ * struct mshv_user_mem_region - arguments for MSHV_SET_GUEST_MEMORY
+ * @size: Size of the memory region (bytes). Must be aligned to PAGE_SIZE
+ * @guest_pfn: Base guest page number to map
+ * @userspace_addr: Base address of userspace memory. Must be aligned to
+ *                  PAGE_SIZE
+ * @flags: Bitmask of 1 << MSHV_SET_MEM_BIT_*. If (1 << MSHV_SET_MEM_BIT_UNMAP)
+ *         is set, ignore other bits.
+ * @rsvd: MBZ
+ *
+ * Map or unmap a region of userspace memory to Guest Physical Addresses (GPA).
+ * Mappings can't overlap in GPA space or userspace.
+ * To unmap, these fields must match an existing mapping.
+ */
+typedef struct mshv_user_mem_region {
+	__u64 size;
+	__u64 guest_pfn;
+	__u64 userspace_addr;
+	__u8 flags;
+	__u8 rsvd[7];
+} mshv_user_mem_region;
+
+enum {
+	MSHV_SET_MEM_BIT_WRITABLE,
+	MSHV_SET_MEM_BIT_EXECUTABLE,
+	MSHV_SET_MEM_BIT_UNMAP,
+	MSHV_SET_MEM_BIT_COUNT
+};
+#define MSHV_SET_MEM_FLAGS_MASK ((1 << MSHV_SET_MEM_BIT_COUNT) - 1)
+
+enum {
+	MSHV_PT_BIT_LAPIC,
+	MSHV_PT_BIT_X2APIC,
+	MSHV_PT_BIT_GPA_SUPER_PAGES,
+	MSHV_PT_BIT_COUNT,
+};
+#define MSHV_PT_FLAGS_MASK ((1 << MSHV_PT_BIT_COUNT) - 1)
+
+enum {
+	MSHV_PT_ISOLATION_NONE,
+	MSHV_PT_ISOLATION_SNP,
+	MSHV_PT_ISOLATION_COUNT,
+};
+
+enum {
+	MSHV_IOEVENTFD_BIT_DATAMATCH,
+	MSHV_IOEVENTFD_BIT_PIO,
+	MSHV_IOEVENTFD_BIT_DEASSIGN,
+	MSHV_IOEVENTFD_BIT_COUNT,
+};
+#define MSHV_IOEVENTFD_FLAGS_MASK	((1 << MSHV_IOEVENTFD_BIT_COUNT) - 1)
+
+union hv_interrupt_control {
+	__u64 as_uint64;
+	struct {
+		__u32 interrupt_type; /* enum hv_interrupt type */
+		__u32 level_triggered : 1;
+		__u32 logical_dest_mode : 1;
+		__u32 rsvd : 30;
+	};
+};
+
+struct hv_input_assert_virtual_interrupt {
+	__u64 partition_id;
+	union hv_interrupt_control control;
+	__u64 dest_addr; /* cpu's apic id */
+	__u32 vector;
+	__u8 target_vtl;
+	__u8 rsvd_z0;
+	__u16 rsvd_z1;
+};
+
+struct hv_register_x64_cpuid_result_parameters {
+	struct {
+		__u32 eax;
+		__u32 ecx;
+		__u8 subleaf_specific;
+		__u8 always_override;
+		__u16 padding;
+	} input;
+	struct {
+		__u32 eax;
+		__u32 eax_mask;
+		__u32 ebx;
+		__u32 ebx_mask;
+		__u32 ecx;
+		__u32 ecx_mask;
+		__u32 edx;
+		__u32 edx_mask;
+	} result;
+};
+
+struct hv_register_x64_msr_result_parameters {
+	__u32 msr_index;
+	__u32 access_type;
+	__u32 action; /* enum hv_unimplemented_msr_action */
+};
+
+union hv_register_intercept_result_parameters {
+	struct hv_register_x64_cpuid_result_parameters cpuid;
+	struct hv_register_x64_msr_result_parameters msr;
+};
+
+struct mshv_register_intercept_result {
+	__u32 intercept_type; /* enum hv_intercept_type */
+	union hv_register_intercept_result_parameters parameters;
+};
+
+typedef struct mshv_user_ioeventfd {
+	__u64 datamatch;
+	__u64 addr;	   /* legal pio/mmio address */
+	__u32 len;	   /* 1, 2, 4, or 8 bytes    */
+	__s32 fd;
+	__u32 flags;
+	__u8  rsvd[4];
+} mshv_user_ioeventfd;
+
+typedef struct mshv_user_irq_entry {
+	__u32 gsi;
+	__u32 address_lo;
+	__u32 address_hi;
+	__u32 data;
+} mshv_user_irq_entry;
+
+struct mshv_user_irq_table {
+	__u32 nr;
+	__u32 rsvd; /* MBZ */
+	struct mshv_user_irq_entry entries[0];
+};
+
+enum {
+	MSHV_IRQFD_BIT_DEASSIGN,
+	MSHV_IRQFD_BIT_RESAMPLE,
+	MSHV_IRQFD_BIT_COUNT,
+};
+#define MSHV_IRQFD_FLAGS_MASK	((1 << MSHV_IRQFD_BIT_COUNT) - 1)
+
+struct mshv_user_irqfd {
+	__s32 fd;
+	__s32 resamplefd;
+	__u32 gsi;
+	__u32 flags;
+};
+
+/**
+ * struct mshv_create_partition - arguments for MSHV_CREATE_PARTITION
+ * @pt_flags: Bitmask of 1 << MSHV_PT_BIT_*
+ * @pt_isolation: MSHV_PT_ISOLATION_*
+ *
+ * Returns a file descriptor to act as a handle to a guest partition.
+ * At this point the partition is not yet initialized in the hypervisor.
+ * Some operations must be done with the partition in this state, e.g. setting
+ * so-called "early" partition properties. The partition can then be
+ * initialized with MSHV_INITIALIZE_PARTITION.
+ */
+struct mshv_create_partition {
+	__u64 pt_flags;
+	__u64 pt_isolation;
+};
+
+struct mshv_create_vp {
+	__u32 vp_index;
+};
+
+enum hv_translate_gva_result_code {
+	HV_TRANSLATE_GVA_SUCCESS					= 0,
+
+	/* Translation failures. */
+	HV_TRANSLATE_GVA_PAGE_NOT_PRESENT			= 1,
+	HV_TRANSLATE_GVA_PRIVILEGE_VIOLATION		= 2,
+	HV_TRANSLATE_GVA_INVALIDE_PAGE_TABLE_FLAGS	= 3,
+
+	/* GPA access failures. */
+	HV_TRANSLATE_GVA_GPA_UNMAPPED				= 4,
+	HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS			= 5,
+	HV_TRANSLATE_GVA_GPA_NO_WRITE_ACCESS		= 6,
+	HV_TRANSLATE_GVA_GPA_ILLEGAL_OVERLAY_ACCESS	= 7,
+
+	/*
+	 * Intercept for memory access by either
+	 *  - a higher VTL
+	 *  - a nested hypervisor (due to a violation of the nested page table)
+	 */
+	HV_TRANSLATE_GVA_INTERCEPT					= 8,
+
+	HV_TRANSLATE_GVA_GPA_UNACCEPTED				= 9,
+};
+
+union hv_translate_gva_result {
+	__u64 as_uint64;
+	struct {
+		__u32 result_code; /* enum hv_translate_hva_result_code */
+		__u32 cache_type : 8;
+		__u32 overlay_page : 1;
+		__u32 reserved : 23;
+	};
+};
+
+typedef struct mshv_translate_gva {
+	__u64 gva;
+	__u64 flags;
+	union hv_translate_gva_result *result;
+	__u64 *gpa;
+} mshv_translate_gva;
+
+/* /dev/mshv */
+#define MSHV_CREATE_PARTITION	_IOW(MSHV_IOCTL, 0x00, struct mshv_create_partition)
+#define MSHV_CREATE_VP			_IOW(MSHV_IOCTL, 0x01, struct mshv_create_vp)
+
+/* Partition fds created with MSHV_CREATE_PARTITION */
+#define MSHV_INITIALIZE_PARTITION	_IO(MSHV_IOCTL, 0x00)
+#define MSHV_SET_GUEST_MEMORY		_IOW(MSHV_IOCTL, 0x02, struct mshv_user_mem_region)
+#define MSHV_IRQFD					_IOW(MSHV_IOCTL, 0x03, struct mshv_user_irqfd)
+#define MSHV_IOEVENTFD			    _IOW(MSHV_IOCTL, 0x04, struct mshv_user_ioeventfd)
+#define MSHV_SET_MSI_ROUTING		_IOW(MSHV_IOCTL, 0x05, struct mshv_user_irq_table)
+
+/* TODO: replace with ROOT_HVCALL */
+#define MSHV_GET_VP_REGISTERS		_IOWR(MSHV_IOCTL, 0xF0, struct mshv_vp_registers)
+#define MSHV_SET_VP_REGISTERS		_IOW(MSHV_IOCTL, 0xF1, struct mshv_vp_registers)
+#define MSHV_TRANSLATE_GVA			_IOWR(MSHV_IOCTL, 0xF2, struct mshv_translate_gva)
+
+#define MSHV_VP_REGISTER_INTERCEPT_RESULT _IOW(MSHV_IOCTL, 0xF3, struct mshv_register_intercept_result)
+
+/*
+ ********************************
+ * VP APIs for child partitions *
+ ********************************
+ */
+
+enum {
+	MSHV_VP_STATE_LAPIC = 0,
+	MSHV_VP_STATE_XSAVE, /* XSAVE data in compacted form */
+	MSHV_VP_STATE_SIMP,
+	MSHV_VP_STATE_SIEFP,
+	MSHV_VP_STATE_SYNTHETIC_TIMERS,
+	MSHV_VP_STATE_COUNT,
+};
+
+typedef struct mshv_get_set_vp_state {
+	__u8 type;	/* MSHV_VP_STATE_* */
+	__u8 rsvd[3];	/* MBZ */
+	__u32 buf_sz;	/* in - 4k page-aligned size of buffer.
+			 * out - actual size of data.
+			 * On EINVAL, check this to see if buffer was too small
+			 */
+	__u64 buf_ptr;	/* 4k page-aligned data buffer. */
+} mshv_get_set_vp_state;
+
+struct hv_local_interrupt_controller_state {
+	/* HV_X64_INTERRUPT_CONTROLLER_STATE */
+	__u32 apic_id;
+	__u32 apic_version;
+	__u32 apic_ldr;
+	__u32 apic_dfr;
+	__u32 apic_spurious;
+	__u32 apic_isr[8];
+	__u32 apic_tmr[8];
+	__u32 apic_irr[8];
+	__u32 apic_esr;
+	__u32 apic_icr_high;
+	__u32 apic_icr_low;
+	__u32 apic_lvt_timer;
+	__u32 apic_lvt_thermal;
+	__u32 apic_lvt_perfmon;
+	__u32 apic_lvt_lint0;
+	__u32 apic_lvt_lint1;
+	__u32 apic_lvt_error;
+	__u32 apic_lvt_cmci;
+	__u32 apic_error_status;
+	__u32 apic_initial_count;
+	__u32 apic_counter_value;
+	__u32 apic_divide_configuration;
+	__u32 apic_remote_read;
+};
+
+#define MSHV_RUN_VP_BUF_SZ 256
+
+struct mshv_run_vp {
+	__u8 msg_buf[MSHV_RUN_VP_BUF_SZ];
+};
+
+#define MSHV_RUN_VP			    _IOR(MSHV_IOCTL, 0x00, struct mshv_run_vp)
+#define MSHV_GET_VP_STATE		_IOWR(MSHV_IOCTL, 0x01, struct mshv_get_set_vp_state)
+#define MSHV_SET_VP_STATE		_IOWR(MSHV_IOCTL, 0x02, struct mshv_get_set_vp_state)
+
+/**
+ * struct mshv_root_hvcall - arguments for MSHV_ROOT_HVCALL
+ * @code: Hypercall code (HVCALL_*)
+ * @reps: in: Rep count ('repcount')
+ *	  out: Reps completed ('repcomp'). MBZ unless rep hvcall
+ * @in_sz: Size of input incl rep data. <= HV_HYP_PAGE_SIZE
+ * @out_sz: Size of output buffer. <= HV_HYP_PAGE_SIZE. MBZ if out_ptr is 0
+ * @status: in: MBZ
+ *	    out: HV_STATUS_* from hypercall
+ * @rsvd: MBZ
+ * @in_ptr: Input data buffer (struct hv_input_*). If used with partition or
+ *	    vp fd, partition id field is added by kernel.
+ * @out_ptr: Output data buffer (optional)
+ */
+struct mshv_root_hvcall {
+	__u16 code;
+	__u16 reps;
+	__u16 in_sz;
+	__u16 out_sz;
+	__u16 status;
+	__u8 rsvd[6];
+	__u64 in_ptr;
+	__u64 out_ptr;
+};
+
+/* Generic hypercall */
+#define MSHV_ROOT_HVCALL		_IOWR(MSHV_IOCTL, 0x07, struct mshv_root_hvcall)
+
+/* From hvgdk_mini.h */
+
+#define HV_X64_MSR_GUEST_OS_ID		0x40000000
+#define HV_X64_MSR_SINT0			0x40000090
+#define HV_X64_MSR_SINT1			0x40000091
+#define HV_X64_MSR_SINT2			0x40000092
+#define HV_X64_MSR_SINT3			0x40000093
+#define HV_X64_MSR_SINT4			0x40000094
+#define HV_X64_MSR_SINT5			0x40000095
+#define HV_X64_MSR_SINT6			0x40000096
+#define HV_X64_MSR_SINT7			0x40000097
+#define HV_X64_MSR_SINT8			0x40000098
+#define HV_X64_MSR_SINT9			0x40000099
+#define HV_X64_MSR_SINT10			0x4000009A
+#define HV_X64_MSR_SINT11			0x4000009B
+#define HV_X64_MSR_SINT12			0x4000009C
+#define HV_X64_MSR_SINT13			0x4000009D
+#define HV_X64_MSR_SINT14			0x4000009E
+#define HV_X64_MSR_SINT15			0x4000009F
+#define HV_X64_MSR_SCONTROL			0x40000080
+#define HV_X64_MSR_SIEFP			0x40000082
+#define HV_X64_MSR_SIMP				0x40000083
+#define HV_X64_MSR_REFERENCE_TSC	0x40000021
+#define HV_X64_MSR_EOM				0x40000084
+
+/* Define port identifier type. */
+union hv_port_id {
+	__u32 as__u32;
+	struct {
+		__u32 id : 24;
+		__u32 reserved : 8;
+	};
+};
+
+#define HV_MESSAGE_SIZE			        (256)
+#define HV_MESSAGE_PAYLOAD_BYTE_COUNT	(240)
+#define HV_MESSAGE_PAYLOAD_QWORD_COUNT	(30)
+
+/* Define hypervisor message types. */
+enum hv_message_type {
+	HVMSG_NONE							= 0x00000000,
+
+	/* Memory access messages. */
+	HVMSG_UNMAPPED_GPA					= 0x80000000,
+	HVMSG_GPA_INTERCEPT					= 0x80000001,
+	HVMSG_UNACCEPTED_GPA				= 0x80000003,
+	HVMSG_GPA_ATTRIBUTE_INTERCEPT		= 0x80000004,
+
+	/* Timer notification messages. */
+	HVMSG_TIMER_EXPIRED					= 0x80000010,
+
+	/* Error messages. */
+	HVMSG_INVALID_VP_REGISTER_VALUE		= 0x80000020,
+	HVMSG_UNRECOVERABLE_EXCEPTION		= 0x80000021,
+	HVMSG_UNSUPPORTED_FEATURE			= 0x80000022,
+
+	/*
+	 * Opaque intercept message. The original intercept message is only
+	 * accessible from the mapped intercept message page.
+	 */
+	HVMSG_OPAQUE_INTERCEPT				= 0x8000003F,
+
+	/* Trace buffer complete messages. */
+	HVMSG_EVENTLOG_BUFFERCOMPLETE		= 0x80000040,
+
+	/* Hypercall intercept */
+	HVMSG_HYPERCALL_INTERCEPT			= 0x80000050,
+
+	/* SynIC intercepts */
+	HVMSG_SYNIC_EVENT_INTERCEPT			= 0x80000060,
+	HVMSG_SYNIC_SINT_INTERCEPT			= 0x80000061,
+	HVMSG_SYNIC_SINT_DELIVERABLE		= 0x80000062,
+
+	/* Async call completion intercept */
+	HVMSG_ASYNC_CALL_COMPLETION			= 0x80000070,
+
+	/* Root scheduler messages */
+	HVMSG_SCHEDULER_VP_SIGNAL_BITSE		= 0x80000100,
+	HVMSG_SCHEDULER_VP_SIGNAL_PAIR		= 0x80000101,
+
+	/* Platform-specific processor intercept messages. */
+	HVMSG_X64_IO_PORT_INTERCEPT			= 0x80010000,
+	HVMSG_X64_MSR_INTERCEPT				= 0x80010001,
+	HVMSG_X64_CPUID_INTERCEPT			= 0x80010002,
+	HVMSG_X64_EXCEPTION_INTERCEPT		= 0x80010003,
+	HVMSG_X64_APIC_EOI					= 0x80010004,
+	HVMSG_X64_LEGACY_FP_ERROR			= 0x80010005,
+	HVMSG_X64_IOMMU_PRQ					= 0x80010006,
+	HVMSG_X64_HALT						= 0x80010007,
+	HVMSG_X64_INTERRUPTION_DELIVERABLE	= 0x80010008,
+	HVMSG_X64_SIPI_INTERCEPT			= 0x80010009,
+	HVMSG_X64_SEV_VMGEXIT_INTERCEPT		= 0x80010013,
+};
+
+union hv_x64_vp_execution_state {
+	__u16 as_uint16;
+	struct {
+		__u16 cpl:2;
+		__u16 cr0_pe:1;
+		__u16 cr0_am:1;
+		__u16 efer_lma:1;
+		__u16 debug_active:1;
+		__u16 interruption_pending:1;
+		__u16 vtl:4;
+		__u16 enclave_mode:1;
+		__u16 interrupt_shadow:1;
+		__u16 virtualization_fault_active:1;
+		__u16 reserved:2;
+	};
+};
+
+/* From openvmm::hvdef */
+enum hv_x64_intercept_access_type {
+	HV_X64_INTERCEPT_ACCESS_TYPE_READ = 0,
+	HV_X64_INTERCEPT_ACCESS_TYPE_WRITE = 1,
+	HV_X64_INTERCEPT_ACCESS_TYPE_EXECUTE = 2,
+};
+
+struct hv_x64_intercept_message_header {
+	__u32 vp_index;
+	__u8 instruction_length:4;
+	__u8 cr8:4; /* Only set for exo partitions */
+	__u8 intercept_access_type;
+	union hv_x64_vp_execution_state execution_state;
+	struct hv_x64_segment_register cs_segment;
+	__u64 rip;
+	__u64 rflags;
+};
+
+union hv_x64_io_port_access_info {
+	__u8 as_uint8;
+	struct {
+		__u8 access_size:3;
+		__u8 string_op:1;
+		__u8 rep_prefix:1;
+		__u8 reserved:3;
+	};
+};
+
+typedef struct hv_x64_io_port_intercept_message {
+	struct hv_x64_intercept_message_header header;
+	__u16 port_number;
+	union hv_x64_io_port_access_info access_info;
+	__u8 instruction_byte_count;
+	__u32 reserved;
+	__u64 rax;
+	__u8 instruction_bytes[16];
+	struct hv_x64_segment_register ds_segment;
+	struct hv_x64_segment_register es_segment;
+	__u64 rcx;
+	__u64 rsi;
+	__u64 rdi;
+} hv_x64_io_port_intercept_message;
+
+union hv_x64_memory_access_info {
+	__u8 as_uint8;
+	struct {
+		__u8 gva_valid:1;
+		__u8 gva_gpa_valid:1;
+		__u8 hypercall_output_pending:1;
+		__u8 tlb_locked_no_overlay:1;
+		__u8 reserved:4;
+	};
+};
+
+struct hv_x64_memory_intercept_message {
+	struct hv_x64_intercept_message_header header;
+	__u32 cache_type; /* enum hv_cache_type */
+	__u8 instruction_byte_count;
+	union hv_x64_memory_access_info memory_access_info;
+	__u8 tpr_priority;
+	__u8 reserved1;
+	__u64 guest_virtual_address;
+	__u64 guest_physical_address;
+	__u8 instruction_bytes[16];
+};
+
+union hv_message_flags {
+	__u8 asu8;
+	struct {
+		__u8 msg_pending : 1;
+		__u8 reserved : 7;
+	};
+};
+
+struct hv_message_header {
+	__u32 message_type;
+	__u8 payload_size;
+	union hv_message_flags message_flags;
+	__u8 reserved[2];
+	union {
+		__u64 sender;
+		union hv_port_id port;
+	};
+};
+
+struct hv_message {
+	struct hv_message_header header;
+	union {
+		__u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
+	} u;
+};
+
+/* From  github.com/rust-vmm/mshv-bindings/src/x86_64/regs.rs */
+
+struct hv_cpuid_entry {
+	uint32_t function;
+	uint32_t index;
+	uint32_t flags;
+	uint32_t eax;
+	uint32_t ebx;
+	uint32_t ecx;
+	uint32_t edx;
+	uint32_t padding[3];
+};
+
+struct hv_cpuid {
+	uint32_t nent;
+	uint32_t padding;
+	struct hv_cpuid_entry entries[0];
+};
+
+#define IA32_MSR_TSC 			0x00000010
+#define IA32_MSR_EFER 			0xC0000080
+#define IA32_MSR_KERNEL_GS_BASE 0xC0000102
+#define IA32_MSR_APIC_BASE 		0x0000001B
+#define IA32_MSR_PAT 			0x0277
+#define IA32_MSR_SYSENTER_CS 	0x00000174
+#define IA32_MSR_SYSENTER_ESP 	0x00000175
+#define IA32_MSR_SYSENTER_EIP 	0x00000176
+#define IA32_MSR_STAR 			0xC0000081
+#define IA32_MSR_LSTAR 			0xC0000082
+#define IA32_MSR_CSTAR 			0xC0000083
+#define IA32_MSR_SFMASK 		0xC0000084
+
+#define IA32_MSR_MTRR_CAP 		0x00FE
+#define IA32_MSR_MTRR_DEF_TYPE 	0x02FF
+#define IA32_MSR_MTRR_PHYSBASE0 0x0200
+#define IA32_MSR_MTRR_PHYSMASK0 0x0201
+#define IA32_MSR_MTRR_PHYSBASE1 0x0202
+#define IA32_MSR_MTRR_PHYSMASK1 0x0203
+#define IA32_MSR_MTRR_PHYSBASE2 0x0204
+#define IA32_MSR_MTRR_PHYSMASK2 0x0205
+#define IA32_MSR_MTRR_PHYSBASE3 0x0206
+#define IA32_MSR_MTRR_PHYSMASK3 0x0207
+#define IA32_MSR_MTRR_PHYSBASE4 0x0208
+#define IA32_MSR_MTRR_PHYSMASK4 0x0209
+#define IA32_MSR_MTRR_PHYSBASE5 0x020A
+#define IA32_MSR_MTRR_PHYSMASK5 0x020B
+#define IA32_MSR_MTRR_PHYSBASE6 0x020C
+#define IA32_MSR_MTRR_PHYSMASK6 0x020D
+#define IA32_MSR_MTRR_PHYSBASE7 0x020E
+#define IA32_MSR_MTRR_PHYSMASK7 0x020F
+
+#define IA32_MSR_MTRR_FIX64K_00000 0x0250
+#define IA32_MSR_MTRR_FIX16K_80000 0x0258
+#define IA32_MSR_MTRR_FIX16K_A0000 0x0259
+#define IA32_MSR_MTRR_FIX4K_C0000 0x0268
+#define IA32_MSR_MTRR_FIX4K_C8000 0x0269
+#define IA32_MSR_MTRR_FIX4K_D0000 0x026A
+#define IA32_MSR_MTRR_FIX4K_D8000 0x026B
+#define IA32_MSR_MTRR_FIX4K_E0000 0x026C
+#define IA32_MSR_MTRR_FIX4K_E8000 0x026D
+#define IA32_MSR_MTRR_FIX4K_F0000 0x026E
+#define IA32_MSR_MTRR_FIX4K_F8000 0x026F
+
+#define IA32_MSR_TSC_AUX 		  0xC0000103
+#define IA32_MSR_BNDCFGS 		  0x00000d90
+#define IA32_MSR_DEBUG_CTL 		  0x1D9
+#define IA32_MSR_SPEC_CTRL        0x00000048
+#define IA32_MSR_TSC_ADJUST 	  0x0000003b
+
+#define IA32_MSR_MISC_ENABLE 0x000001a0
+
+
+#define HV_TRANSLATE_GVA_VALIDATE_READ	     (0x0001)
+#define HV_TRANSLATE_GVA_VALIDATE_WRITE      (0x0002)
+#define HV_TRANSLATE_GVA_VALIDATE_EXECUTE    (0x0004)
+
+#endif
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (4 preceding siblings ...)
  2025-05-20 11:29 ` [RFC PATCH 05/25] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
@ 2025-05-20 11:29 ` Magnus Kulke
  2025-05-20 12:02   ` Daniel P. Berrangé
  2025-05-20 11:30 ` [RFC PATCH 07/25] accel/mshv: Register memory region listeners Magnus Kulke
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:29 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Introduce the initial scaffold for the MSHV (Microsoft Hypervisor)
accelerator backend. This includes the basic directory structure and
stub implementations needed to integrate with QEMU's accelerator
framework.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/meson.build      |   1 +
 accel/mshv/meson.build |   6 ++
 accel/mshv/mshv-all.c  | 143 +++++++++++++++++++++++++++++++++++++++++
 include/system/mshv.h  |  34 ++++++++++
 4 files changed, 184 insertions(+)
 create mode 100644 accel/mshv/meson.build
 create mode 100644 accel/mshv/mshv-all.c

diff --git a/accel/meson.build b/accel/meson.build
index d5e982d152..efa62879b6 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -10,6 +10,7 @@ if have_system
   subdir('kvm')
   subdir('xen')
   subdir('stubs')
+  subdir('mshv')
 endif
 
 # qtest
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
new file mode 100644
index 0000000000..4c03ac7921
--- /dev/null
+++ b/accel/mshv/meson.build
@@ -0,0 +1,6 @@
+mshv_ss = ss.source_set()
+mshv_ss.add(if_true: files(
+  'mshv-all.c'
+))
+
+specific_ss.add_all(when: 'CONFIG_MSHV', if_true: mshv_ss)
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
new file mode 100644
index 0000000000..44605adf94
--- /dev/null
+++ b/accel/mshv/mshv-all.c
@@ -0,0 +1,143 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Ziqiao Zhou       <ziqiaozhou@microsoft.com>
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *  Jinank Jain       <jinankjain@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "hw/boards.h"
+
+#include "hw/hyperv/hvhdk.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "hw/hyperv/hvgdk.h"
+#include "hw/hyperv/linux-mshv.h"
+
+#include "qemu/accel.h"
+#include "qemu/guest-random.h"
+#include "system/accel-ops.h"
+#include "system/cpus.h"
+#include "system/runstate.h"
+#include "system/accel-blocker.h"
+#include "system/address-spaces.h"
+#include "system/mshv.h"
+#include "system/reset.h"
+#include "trace.h"
+#include <err.h>
+#include <stdint.h>
+#include <sys/ioctl.h>
+
+#define TYPE_MSHV_ACCEL ACCEL_CLASS_NAME("mshv")
+
+DECLARE_INSTANCE_CHECKER(MshvState, MSHV_STATE, TYPE_MSHV_ACCEL)
+
+bool mshv_allowed;
+
+MshvState *mshv_state;
+
+
+static int mshv_init(MachineState *ms)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static void mshv_start_vcpu_thread(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static void mshv_cpu_synchronize_post_init(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static void mshv_cpu_synchronize_post_reset(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static void mshv_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static void mshv_cpu_synchronize(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static bool mshv_cpus_are_resettable(void)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+static void mshv_accel_class_init(ObjectClass *oc, const void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+
+    ac->name = "MSHV";
+    ac->init_machine = mshv_init;
+    ac->allowed = &mshv_allowed;
+}
+
+static void mshv_accel_instance_init(Object *obj)
+{
+    MshvState *s = MSHV_STATE(obj);
+
+    s->vm = 0;
+}
+
+static const TypeInfo mshv_accel_type = {
+    .name = TYPE_MSHV_ACCEL,
+    .parent = TYPE_ACCEL,
+    .instance_init = mshv_accel_instance_init,
+    .class_init = mshv_accel_class_init,
+    .instance_size = sizeof(MshvState),
+};
+
+static void mshv_accel_ops_class_init(ObjectClass *oc, const void *data)
+{
+    AccelOpsClass *ops = ACCEL_OPS_CLASS(oc);
+
+    ops->create_vcpu_thread = mshv_start_vcpu_thread;
+    ops->synchronize_post_init = mshv_cpu_synchronize_post_init;
+    ops->synchronize_post_reset = mshv_cpu_synchronize_post_reset;
+    ops->synchronize_state = mshv_cpu_synchronize;
+    ops->synchronize_pre_loadvm = mshv_cpu_synchronize_pre_loadvm;
+    ops->cpus_are_resettable = mshv_cpus_are_resettable;
+}
+
+static const TypeInfo mshv_accel_ops_type = {
+    .name = ACCEL_OPS_NAME("mshv"),
+    .parent = TYPE_ACCEL_OPS,
+    .class_init = mshv_accel_ops_class_init,
+    .abstract = true,
+};
+
+static void mshv_type_init(void)
+{
+    type_register_static(&mshv_accel_type);
+    type_register_static(&mshv_accel_ops_type);
+}
+
+type_init(mshv_type_init);
diff --git a/include/system/mshv.h b/include/system/mshv.h
index bc8f2c228a..0858e47def 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -16,6 +16,14 @@
 #ifndef QEMU_MSHV_INT_H
 #define QEMU_MSHV_INT_H
 
+#include "qemu/osdep.h"
+#include "qemu/accel.h"
+#include "hw/hyperv/hyperv-proto.h"
+#include "hw/hyperv/linux-mshv.h"
+#include "hw/hyperv/hvhdk.h"
+#include "qapi/qapi-types-common.h"
+#include "system/memory.h"
+
 #ifdef COMPILING_PER_TARGET
 #ifdef CONFIG_MSHV
 #define CONFIG_MSHV_IS_POSSIBLE
@@ -28,6 +36,32 @@
 #ifdef CONFIG_MSHV_IS_POSSIBLE
 extern bool mshv_allowed;
 #define mshv_enabled() (mshv_allowed)
+
+typedef struct MshvMemoryListener {
+  MemoryListener listener;
+  int as_id;
+} MshvMemoryListener;
+
+typedef struct MshvAddressSpace {
+    MshvMemoryListener *ml;
+    AddressSpace *as;
+} MshvAddressSpace;
+
+typedef struct MshvState {
+  AccelState parent_obj;
+  int vm;
+  MshvMemoryListener memory_listener;
+  /* number of listeners */
+  int nr_as;
+  MshvAddressSpace *as;
+} MshvState;
+extern MshvState *mshv_state;
+
+struct AccelCPUState {
+  int cpufd;
+  bool dirty;
+};
+
 #else /* CONFIG_MSHV_IS_POSSIBLE */
 #define mshv_enabled() false
 #endif
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 07/25] accel/mshv: Register memory region listeners
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (5 preceding siblings ...)
  2025-05-20 11:29 ` [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 11:30 ` [RFC PATCH 08/25] accel/mshv: Initialize VM partition Magnus Kulke
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Add memory listener hooks for the MSHV accelerator to track guest
memory regions. This enables the backend to respond to region
additions, removals and will be used to manage guest memory mappings
inside the hypervisor.

Actually registering physical memory in the hypervisor is still stubbed
out.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mem.c       | 25 ++++++++++++++++
 accel/mshv/meson.build |  1 +
 accel/mshv/mshv-all.c  | 68 ++++++++++++++++++++++++++++++++++++++++--
 include/system/mshv.h  |  4 +++
 4 files changed, 96 insertions(+), 2 deletions(-)
 create mode 100644 accel/mshv/mem.c

diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
new file mode 100644
index 0000000000..eddd83ae83
--- /dev/null
+++ b/accel/mshv/mem.c
@@ -0,0 +1,25 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "system/address-spaces.h"
+#include "system/mshv.h"
+
+void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
+                       bool add)
+{
+	error_report("unimplemented");
+	abort();
+}
+
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index 4c03ac7921..8a6beb3fb1 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -1,5 +1,6 @@
 mshv_ss = ss.source_set()
 mshv_ss.add(if_true: files(
+  'mem.c',
   'mshv-all.c'
 ))
 
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 44605adf94..63b0eca1fc 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -49,10 +49,74 @@ bool mshv_allowed;
 MshvState *mshv_state;
 
 
+static void mem_region_add(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    MshvMemoryListener *mml;
+    mml = container_of(listener, MshvMemoryListener, listener);
+    memory_region_ref(section->mr);
+    mshv_set_phys_mem(mml, section, true);
+}
+
+static void mem_region_del(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    MshvMemoryListener *mml;
+    mml = container_of(listener, MshvMemoryListener, listener);
+    mshv_set_phys_mem(mml, section, false);
+    memory_region_unref(section->mr);
+}
+
+static MemoryListener mshv_memory_listener = {
+    .name = "mshv",
+    .priority = MEMORY_LISTENER_PRIORITY_ACCEL,
+    .region_add = mem_region_add,
+    .region_del = mem_region_del,
+};
+
+static MemoryListener mshv_io_listener = {
+    .name = "mshv", .priority = MEMORY_LISTENER_PRIORITY_DEV_BACKEND,
+    /* MSHV does not support PIO eventfd */
+};
+
+static void register_mshv_memory_listener(MshvState *s, MshvMemoryListener *mml,
+                                          AddressSpace *as, int as_id,
+                                          const char *name)
+{
+    int i;
+
+    mml->listener = mshv_memory_listener;
+    mml->listener.name = name;
+    memory_listener_register(&mml->listener, as);
+    for (i = 0; i < s->nr_as; ++i) {
+        if (!s->as[i].as) {
+            s->as[i].as = as;
+            s->as[i].ml = mml;
+            break;
+        }
+    }
+}
+
+
 static int mshv_init(MachineState *ms)
 {
-	error_report("unimplemented");
-	abort();
+    MshvState *s;
+    s = MSHV_STATE(ms->accelerator);
+
+    accel_blocker_init();
+
+    s->vm = 0;
+
+    s->nr_as = 1;
+    s->as = g_new0(MshvAddressSpace, s->nr_as);
+
+    mshv_state = s;
+
+    register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
+                                  0, "mshv-memory");
+    memory_listener_register(&mshv_io_listener, &address_space_io);
+
+    return 0;
 }
 
 static void mshv_start_vcpu_thread(CPUState *cpu)
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 0858e47def..b93cf027d8 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -76,6 +76,10 @@ struct AccelCPUState {
 #define EFER_LMA   ((uint64_t)0x400)
 #define EFER_LME   ((uint64_t)0x100)
 
+
+/* memory */
+void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
+                       bool add);
 /* interrupt */
 int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
 int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (6 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 07/25] accel/mshv: Register memory region listeners Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 19:07   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor Magnus Kulke
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Create the MSHV virtual machine by opening a partition and issuing
the necessary ioctl to initialize it. This sets up the basic VM
structure and initial configuration used by MSHV to manage guest state.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c        | 204 +++++++++++++++++++++++++++++++++++
 accel/mshv/trace-events      |   3 +
 accel/mshv/trace.h           |   1 +
 include/system/mshv.h        |   6 ++
 meson.build                  |   1 +
 target/i386/mshv/meson.build |   1 +
 target/i386/mshv/mshv-cpu.c  |  73 +++++++++++++
 7 files changed, 289 insertions(+)
 create mode 100644 accel/mshv/trace-events
 create mode 100644 accel/mshv/trace.h
 create mode 100644 target/i386/mshv/mshv-cpu.c

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 63b0eca1fc..95f1008a48 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -48,6 +48,178 @@ bool mshv_allowed;
 
 MshvState *mshv_state;
 
+static int init_mshv(int *mshv_fd)
+{
+    int fd = open("/dev/mshv", O_RDWR | O_CLOEXEC);
+    if (fd < 0) {
+        error_report("Failed to open /dev/mshv: %s", strerror(errno));
+        return -1;
+    }
+	*mshv_fd = fd;
+	return 0;
+}
+
+/* freeze 1 to pause, 0 to resume */
+static int set_time_freeze(int vm_fd, int freeze)
+{
+    int ret;
+
+    if (freeze != 0 && freeze != 1) {
+        error_report("Invalid time freeze value");
+        return -1;
+    }
+
+    struct hv_input_set_partition_property in = {0};
+    in.property_code = HV_PARTITION_PROPERTY_TIME_FREEZE;
+    in.property_value = freeze;
+
+    struct mshv_root_hvcall args = {0};
+    args.code = HVCALL_SET_PARTITION_PROPERTY;
+    args.in_sz = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to set time freeze");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int pause_vm(int vm_fd)
+{
+    int ret;
+
+    ret = set_time_freeze(vm_fd, 1);
+    if (ret < 0) {
+        error_report("Failed to pause partition: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int resume_vm(int vm_fd)
+{
+    int ret;
+
+    ret = set_time_freeze(vm_fd, 0);
+    if (ret < 0) {
+        error_report("Failed to resume partition: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int create_partition(int mshv_fd, int *vm_fd)
+{
+    int ret;
+    struct mshv_create_partition args = {0};
+
+    /* Initialize pt_flags with the desired features */
+    uint64_t pt_flags = (1ULL << MSHV_PT_BIT_LAPIC) |
+                        (1ULL << MSHV_PT_BIT_X2APIC) |
+                        (1ULL << MSHV_PT_BIT_GPA_SUPER_PAGES);
+
+    /* Set default isolation type */
+    uint64_t pt_isolation = MSHV_PT_ISOLATION_NONE;
+
+    args.pt_flags = pt_flags;
+    args.pt_isolation = pt_isolation;
+
+    ret = ioctl(mshv_fd, MSHV_CREATE_PARTITION, &args);
+    if (ret < 0) {
+        error_report("Failed to create partition: %s", strerror(errno));
+        return -1;
+    }
+
+    *vm_fd = ret;
+    return 0;
+}
+
+static int set_synthetic_proc_features(int vm_fd)
+{
+    int ret;
+    struct hv_input_set_partition_property in = {0};
+    union hv_partition_synthetic_processor_features features = {0};
+
+    /* Access the bitfield and set the desired features */
+    features.hypervisor_present = 1;
+    features.hv1 = 1;
+    features.access_partition_reference_counter = 1;
+    features.access_synic_regs = 1;
+    features.access_synthetic_timer_regs = 1;
+    features.access_partition_reference_tsc = 1;
+    features.access_frequency_regs = 1;
+    features.access_intr_ctrl_regs = 1;
+    features.access_vp_index = 1;
+    features.access_hypercall_regs = 1;
+    features.tb_flush_hypercalls = 1;
+    features.synthetic_cluster_ipi = 1;
+    features.direct_synthetic_timers = 1;
+
+    mshv_arch_amend_proc_features(&features);
+
+    in.property_code = HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES;
+    in.property_value = features.as_uint64[0];
+
+    struct mshv_root_hvcall args = {0};
+    args.code = HVCALL_SET_PARTITION_PROPERTY;
+    args.in_sz = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+
+    trace_mshv_hvcall_args("synthetic_proc_features", args.code, args.in_sz);
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to set synthethic proc features");
+        return -errno;
+    }
+    return 0;
+}
+
+static int initialize_vm(int vm_fd)
+{
+    int ret = ioctl(vm_fd, MSHV_INITIALIZE_PARTITION);
+    if (ret < 0) {
+        error_report("Failed to initialize partition: %s", strerror(errno));
+        return -1;
+    }
+    return 0;
+}
+
+static int create_vm(int mshv_fd)
+{
+    int vm_fd;
+
+    int ret = create_partition(mshv_fd, &vm_fd);
+    if (ret < 0) {
+        close(mshv_fd);
+        return -errno;
+    }
+
+    ret = set_synthetic_proc_features(vm_fd);
+    if (ret < 0) {
+        return -errno;
+    }
+
+    ret = initialize_vm(vm_fd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    ret = mshv_arch_post_init_vm(vm_fd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    /* Always create a frozen partition */
+    pause_vm(vm_fd);
+
+    return vm_fd;
+}
 
 static void mem_region_add(MemoryListener *listener,
                            MemoryRegionSection *section)
@@ -96,22 +268,54 @@ static void register_mshv_memory_listener(MshvState *s, MshvMemoryListener *mml,
         }
     }
 }
+static void mshv_reset(void *param)
+{
+    warn_report("mshv reset");
+}
+
+int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args)
+{
+    int ret = 0;
+
+    ret = ioctl(mshv_fd, MSHV_ROOT_HVCALL, args);
+    if (ret < 0) {
+        error_report("Failed to perform hvcall: %s", strerror(errno));
+        return -1;
+    }
+    return ret;
+}
 
 
 static int mshv_init(MachineState *ms)
 {
     MshvState *s;
+    int mshv_fd, ret;
+
     s = MSHV_STATE(ms->accelerator);
 
     accel_blocker_init();
 
     s->vm = 0;
 
+    ret = init_mshv(&mshv_fd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    do {
+        int vm_fd = create_vm(mshv_fd);
+        s->vm = vm_fd;
+    } while (!s->vm);
+
+    resume_vm(s->vm);
+
     s->nr_as = 1;
     s->as = g_new0(MshvAddressSpace, s->nr_as);
 
     mshv_state = s;
 
+    qemu_register_reset(mshv_reset, NULL);
+
     register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
                                   0, "mshv-memory");
     memory_listener_register(&mshv_io_listener, &address_space_io);
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
new file mode 100644
index 0000000000..f99e8c5a41
--- /dev/null
+++ b/accel/mshv/trace-events
@@ -0,0 +1,3 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
+mshv_hvcall_args(const char* hvcall, uint16_t code, uint16_t in_sz) "built args for '%s' code: %d in_sz: %d"
diff --git a/accel/mshv/trace.h b/accel/mshv/trace.h
new file mode 100644
index 0000000000..da5b40cd24
--- /dev/null
+++ b/accel/mshv/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-accel_mshv.h"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index b93cf027d8..398cda3254 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -76,6 +76,12 @@ struct AccelCPUState {
 #define EFER_LMA   ((uint64_t)0x400)
 #define EFER_LME   ((uint64_t)0x100)
 
+void mshv_arch_amend_proc_features(
+    union hv_partition_synthetic_processor_features *features);
+int mshv_arch_post_init_vm(int vm_fd);
+
+int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args);
+
 
 /* memory */
 void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
diff --git a/meson.build b/meson.build
index a4269b816b..6cd3e26e39 100644
--- a/meson.build
+++ b/meson.build
@@ -3595,6 +3595,7 @@ endif
 if have_system
   trace_events_subdirs += [
     'accel/kvm',
+    'accel/mshv',
     'audio',
     'backends',
     'backends/tpm',
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
index 8ddaa7c11d..647e5dafb7 100644
--- a/target/i386/mshv/meson.build
+++ b/target/i386/mshv/meson.build
@@ -1,6 +1,7 @@
 i386_mshv_ss = ss.source_set()
 
 i386_mshv_ss.add(files(
+  'mshv-cpu.c',
   'x86.c',
 ))
 
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
new file mode 100644
index 0000000000..b36f8904fb
--- /dev/null
+++ b/target/i386/mshv/mshv-cpu.c
@@ -0,0 +1,73 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Ziqiao Zhou       <ziqiaozhou@microsoft.com>
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *  Jinank Jain       <jinankjain@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/typedefs.h"
+
+#include "system/mshv.h"
+#include "hw/hyperv/linux-mshv.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "hw/hyperv/hvgdk.h"
+
+
+#include "trace-accel_mshv.h"
+#include "trace.h"
+
+void mshv_arch_amend_proc_features(
+    union hv_partition_synthetic_processor_features *features)
+{
+    features->access_guest_idle_reg = 1;
+}
+
+/*
+ * Default Microsoft Hypervisor behavior for unimplemented MSR is to send a
+ * fault to the guest if it tries to access it. It is possible to override
+ * this behavior with a more suitable option i.e., ignore writes from the guest
+ * and return zero in attempt to read unimplemented.
+ */
+static int set_unimplemented_msr_action(int vm_fd)
+{
+    struct hv_input_set_partition_property in = {0};
+    struct mshv_root_hvcall args = {0};
+
+    in.property_code  = HV_PARTITION_PROPERTY_UNIMPLEMENTED_MSR_ACTION;
+    in.property_value = HV_UNIMPLEMENTED_MSR_ACTION_IGNORE_WRITE_READ_ZERO;
+
+    args.code   = HVCALL_SET_PARTITION_PROPERTY;
+    args.in_sz  = sizeof(in);
+    args.in_ptr = (uint64_t)&in;
+
+    trace_mshv_hvcall_args("unimplemented_msr_action", args.code, args.in_sz);
+
+    int ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to set unimplemented MSR action");
+        return -1;
+    }
+    return 0;
+}
+
+int mshv_arch_post_init_vm(int vm_fd)
+{
+    int ret;
+
+    ret = set_unimplemented_msr_action(vm_fd);
+    if (ret < 0) {
+        error_report("Failed to set unimplemented MSR action");
+    }
+
+    return ret;
+}
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (7 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 08/25] accel/mshv: Initialize VM partition Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 20:07   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 10/25] accel/mshv: Add ioeventfd support Magnus Kulke
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Handle region_add events by invoking the MSHV memory registration
ioctl to map guest memory into the hypervisor partition. This allows
the guest to access memory through MSHV-managed mappings.

Note that this assumes the hypervisor will accept regions that overlap
in userspace_addr. Currently that's not the case, it will be addressed
in a later commit in the series.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mem.c        | 116 ++++++++++++++++++++++++++++++++++++++--
 accel/mshv/trace-events |   1 +
 include/system/mshv.h   |  11 ++++
 3 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index eddd83ae83..2bbeae4f4a 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -13,13 +13,123 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "hw/hyperv/linux-mshv.h"
 #include "system/address-spaces.h"
 #include "system/mshv.h"
+#include "exec/memattrs.h"
+#include <sys/ioctl.h>
+#include "trace.h"
+
+static int set_guest_memory(int vm_fd, const mshv_user_mem_region *region)
+{
+    int ret;
+
+    ret = ioctl(vm_fd, MSHV_SET_GUEST_MEMORY, region);
+    if (ret < 0) {
+        error_report("failed to set guest memory");
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool add)
+{
+    struct mshv_user_mem_region region = {0};
+
+    region.guest_pfn = mr->guest_phys_addr >> MSHV_PAGE_SHIFT;
+    region.size = mr->memory_size;
+    region.userspace_addr = mr->userspace_addr;
+
+    if (!add) {
+        region.flags |= (1 << MSHV_SET_MEM_BIT_UNMAP);
+        return set_guest_memory(vm_fd, &region);
+    }
+
+    region.flags = (1 << MSHV_SET_MEM_BIT_EXECUTABLE);
+    if (!mr->readonly) {
+        region.flags |= (1 << MSHV_SET_MEM_BIT_WRITABLE);
+    }
+
+    return set_guest_memory(vm_fd, &region);
+}
+
+static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
+{
+    int ret = 0;
+
+    if (!mshv_mr) {
+        error_report("Invalid mshv_mr");
+        return -1;
+    }
+
+    trace_mshv_set_memory(add, mshv_mr->guest_phys_addr,
+                          mshv_mr->memory_size,
+                          mshv_mr->userspace_addr, mshv_mr->readonly,
+                          ret);
+    return map_or_unmap(mshv_state->vm, mshv_mr, add);
+}
+
+/*
+ * Calculate and align the start address and the size of the section.
+ * Return the size. If the size is 0, the aligned section is empty.
+ */
+static hwaddr align_section(MemoryRegionSection *section, hwaddr *start)
+{
+    hwaddr size = int128_get64(section->size);
+    hwaddr delta, aligned;
+
+    /*
+     * works in page size chunks, but the function may be called
+     * with sub-page size and unaligned start address. Pad the start
+     * address to next and truncate size to previous page boundary.
+     */
+    aligned = ROUND_UP(section->offset_within_address_space,
+                       qemu_real_host_page_size());
+    delta = aligned - section->offset_within_address_space;
+    *start = aligned;
+    if (delta > size) {
+        return 0;
+    }
+
+    return (size - delta) & qemu_real_host_page_mask();
+}
 
 void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
                        bool add)
 {
-	error_report("unimplemented");
-	abort();
-}
+    int ret = 0;
+    MemoryRegion *area = section->mr;
+    bool writable = !area->readonly && !area->rom_device;
+    hwaddr start_addr, mr_offset, size;
+    void *ram;
+    MshvMemoryRegion tmp, *mshv_mr = &tmp;
+
+    if (!memory_region_is_ram(area)) {
+        if (writable) {
+            return;
+        }
+    }
+
+    size = align_section(section, &start_addr);
+    if (!size) {
+        return;
+    }
+
+    mr_offset = section->offset_within_region + start_addr -
+                section->offset_within_address_space;
 
+    ram = memory_region_get_ram_ptr(area) + mr_offset;
+
+    memset(mshv_mr, 0, sizeof(*mshv_mr));
+    mshv_mr->guest_phys_addr = start_addr;
+    mshv_mr->memory_size = size;
+    mshv_mr->readonly = !writable;
+    mshv_mr->userspace_addr = (uint64_t)ram;
+
+    ret = set_memory(mshv_mr, add);
+    if (ret < 0) {
+        error_report("Failed to set memory region");
+        abort();
+    }
+}
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index f99e8c5a41..63625192ec 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -1,3 +1,4 @@
 # See docs/devel/tracing.rst for syntax documentation.
 
+mshv_set_memory(bool add, uint64_t gpa, uint64_t size, uint64_t user_addr, bool readonly, int ret) "[add = %d] gpa = %lx size = %lx user = %lx readonly = %d result = %d"
 mshv_hvcall_args(const char* hvcall, uint16_t code, uint16_t in_sz) "built args for '%s' code: %d in_sz: %d"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 398cda3254..bed28b48a9 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -32,6 +32,8 @@
 #define CONFIG_MSHV_IS_POSSIBLE
 #endif
 
+#define MSHV_PAGE_SHIFT 12
+
 
 #ifdef CONFIG_MSHV_IS_POSSIBLE
 extern bool mshv_allowed;
@@ -84,6 +86,15 @@ int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args);
 
 
 /* memory */
+typedef struct MshvMemoryRegion {
+    uint64_t guest_phys_addr;
+    uint64_t memory_size;
+    uint64_t userspace_addr;
+    bool readonly;
+} MshvMemoryRegion;
+
+int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
+int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
 void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
                        bool add);
 /* interrupt */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 10/25] accel/mshv: Add ioeventfd support
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (8 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 11:30 ` [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support Magnus Kulke
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Implement ioeventfd registration in the MSHV accelerator backend to
handle guest-triggered events. This enables integration with QEMU's
eventfd-based I/O mechanism.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c   | 115 ++++++++++++++++++++++++++++++++++++++++
 accel/mshv/trace-events |   3 ++
 include/system/mshv.h   |   8 +++
 3 files changed, 126 insertions(+)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 95f1008a48..e4085b216d 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -239,11 +239,126 @@ static void mem_region_del(MemoryListener *listener,
     memory_region_unref(section->mr);
 }
 
+typedef enum {
+    DATAMATCH_NONE,
+    DATAMATCH_U32,
+    DATAMATCH_U64,
+} DatamatchTag;
+
+typedef struct {
+    DatamatchTag tag;
+    union {
+        uint32_t u32;
+        uint64_t u64;
+    } value;
+} Datamatch;
+
+/* flags: determine whether to de/assign */
+static int ioeventfd(int vm_fd, int event_fd, uint64_t addr, Datamatch dm,
+                     uint32_t flags)
+{
+    mshv_user_ioeventfd args = {0};
+    args.fd = event_fd;
+    args.addr = addr;
+    args.flags = flags;
+
+    if (dm.tag == DATAMATCH_NONE) {
+        args.datamatch = 0;
+    } else {
+        flags |= BIT(MSHV_IOEVENTFD_BIT_DATAMATCH);
+        args.flags = flags;
+        if (dm.tag == DATAMATCH_U64) {
+            args.len = sizeof(uint64_t);
+            args.datamatch = dm.value.u64;
+        } else {
+            args.len = sizeof(uint32_t);
+            args.datamatch = dm.value.u32;
+        }
+    }
+
+    return ioctl(vm_fd, MSHV_IOEVENTFD, &args);
+}
+
+static int unregister_ioevent(int vm_fd, int event_fd, uint64_t mmio_addr)
+{
+    uint32_t flags = 0;
+    Datamatch dm = {0};
+
+    flags |= BIT(MSHV_IOEVENTFD_BIT_DEASSIGN);
+    dm.tag = DATAMATCH_NONE;
+
+    return ioeventfd(vm_fd, event_fd, mmio_addr, dm, flags);
+}
+
+static int register_ioevent(int vm_fd, int event_fd, uint64_t mmio_addr,
+                            uint64_t val, bool is_64bit, bool is_datamatch)
+{
+    uint32_t flags = 0;
+    Datamatch dm = {0};
+
+    if (!is_datamatch) {
+        dm.tag = DATAMATCH_NONE;
+    } else if (is_64bit) {
+        dm.tag = DATAMATCH_U64;
+        dm.value.u64 = val;
+    } else {
+        dm.tag = DATAMATCH_U32;
+        dm.value.u32 = val;
+    }
+
+    return ioeventfd(vm_fd, event_fd, mmio_addr, dm, flags);
+}
+
+static void mem_ioeventfd_add(MemoryListener *listener,
+                              MemoryRegionSection *section,
+                              bool match_data, uint64_t data,
+                              EventNotifier *e)
+{
+    int fd = event_notifier_get_fd(e);
+    int ret;
+    bool is_64 = int128_get64(section->size) == 8;
+    uint64_t addr = section->offset_within_address_space & 0xffffffff;
+
+    trace_mshv_mem_ioeventfd_add(addr, int128_get64(section->size), data);
+
+    ret = register_ioevent(mshv_state->vm, fd, addr, data, is_64, match_data);
+
+    if (ret < 0) {
+        error_report("Failed to register ioeventfd: %s (%d)", strerror(-ret),
+                     -ret);
+        abort();
+    }
+}
+
+static void mem_ioeventfd_del(MemoryListener *listener,
+                              MemoryRegionSection *section,
+                              bool match_data, uint64_t data,
+                              EventNotifier *e)
+{
+    int fd = event_notifier_get_fd(e);
+    int ret;
+    uint64_t addr = section->offset_within_address_space & 0xffffffff;
+
+    trace_mshv_mem_ioeventfd_del(section->offset_within_address_space,
+                                 int128_get64(section->size), data);
+
+    ret = unregister_ioevent(mshv_state->vm, fd, addr);
+    if (ret < 0) {
+        error_report("Failed to unregister ioeventfd: %s (%d)", strerror(-ret),
+                     -ret);
+        abort();
+    }
+}
+
 static MemoryListener mshv_memory_listener = {
     .name = "mshv",
     .priority = MEMORY_LISTENER_PRIORITY_ACCEL,
     .region_add = mem_region_add,
     .region_del = mem_region_del,
+#ifdef MSHV_USE_IOEVENTFD
+    .eventfd_add = mem_ioeventfd_add,
+    .eventfd_del = mem_ioeventfd_del,
+#endif
 };
 
 static MemoryListener mshv_io_listener = {
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index 63625192ec..5929cb45a5 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -1,4 +1,7 @@
 # See docs/devel/tracing.rst for syntax documentation.
 
 mshv_set_memory(bool add, uint64_t gpa, uint64_t size, uint64_t user_addr, bool readonly, int ret) "[add = %d] gpa = %lx size = %lx user = %lx readonly = %d result = %d"
+mshv_mem_ioeventfd_add(uint64_t addr, uint32_t size, uint32_t data) "addr %lx size %d data %x"
+mshv_mem_ioeventfd_del(uint64_t addr, uint32_t size, uint32_t data) "addr %lx size %d data %x"
+
 mshv_hvcall_args(const char* hvcall, uint16_t code, uint16_t in_sz) "built args for '%s' code: %d in_sz: %d"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index bed28b48a9..c7ee4f0cc1 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -32,6 +32,14 @@
 #define CONFIG_MSHV_IS_POSSIBLE
 #endif
 
+/*
+ * Set to 0 if we do not want to use eventfd to optimize the MMIO events.
+ * Set to 1 so that mshv kernel driver receives doorbell when the VM access
+ * MMIO memory and then signal eventfd to notify the qemu device
+ * without extra switching to qemu to emulate mmio access.
+ */
+#define MSHV_USE_IOEVENTFD 1
+
 #define MSHV_PAGE_SHIFT 12
 
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (9 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 10/25] accel/mshv: Add ioeventfd support Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 14:18   ` Paolo Bonzini
  2025-05-20 20:15   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
                   ` (14 subsequent siblings)
  25 siblings, 2 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Implement initial interrupt handling logic in the MSHV backend. This
includes management of MSI and un/registering of irqfd mechanisms.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/irq.c        | 370 ++++++++++++++++++++++++++++++++++++++++
 accel/mshv/meson.build  |   1 +
 accel/mshv/mshv-all.c   |   2 +
 accel/mshv/trace-events |   9 +
 hw/intc/apic.c          |   9 +
 include/system/mshv.h   |  14 ++
 6 files changed, 405 insertions(+)
 create mode 100644 accel/mshv/irq.c

diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
new file mode 100644
index 0000000000..74f0bb62db
--- /dev/null
+++ b/accel/mshv/irq.c
@@ -0,0 +1,370 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Ziqiao Zhou       <ziqiaozhou@microsoft.com>
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/hyperv/linux-mshv.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "qemu/osdep.h"
+#include "hw/pci/msi.h"
+#include "system/mshv.h"
+#include "trace.h"
+#include <stdint.h>
+#include <sys/ioctl.h>
+
+#define MSHV_IRQFD_RESAMPLE_FLAG (1 << MSHV_IRQFD_BIT_RESAMPLE)
+#define MSHV_IRQFD_BIT_DEASSIGN_FLAG (1 << MSHV_IRQFD_BIT_DEASSIGN)
+
+static MshvMsiControl *msi_control;
+static QemuMutex msi_control_mutex;
+
+void mshv_init_msicontrol(void)
+{
+    qemu_mutex_init(&msi_control_mutex);
+    msi_control = g_new0(MshvMsiControl, 1);
+    msi_control->gsi_routes = g_hash_table_new(g_direct_hash, g_direct_equal);
+    msi_control->updated = false;
+}
+
+static int set_msi_routing(uint32_t gsi, uint64_t addr, uint32_t data)
+{
+    struct mshv_user_irq_entry *entry;
+    uint32_t high_addr = addr >> 32;
+    uint32_t low_addr = addr & 0xFFFFFFFF;
+    GHashTable *gsi_routes;
+
+    trace_mshv_set_msi_routing(gsi, addr, data);
+
+    if (gsi >= MSHV_MAX_MSI_ROUTES) {
+        error_report("gsi >= MSHV_MAX_MSI_ROUTES");
+        return -1;
+    }
+
+    assert(msi_control);
+
+    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+        gsi_routes = msi_control->gsi_routes;
+        entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
+
+        if (entry
+            && entry->address_hi == high_addr
+            && entry->address_lo == low_addr
+            && entry->data == data)
+        {
+            /* nothing to update */
+            return 0;
+        }
+
+        /* free old entry */
+        g_free(entry);
+
+        /* create new entry */
+        entry = g_new0(mshv_user_irq_entry, 1);
+        entry->gsi = gsi;
+        entry->address_hi = high_addr;
+        entry->address_lo = low_addr;
+        entry->data = data;
+
+        g_hash_table_insert(gsi_routes, GINT_TO_POINTER(gsi), entry);
+        msi_control->updated = true;
+    }
+
+    return 0;
+}
+
+static int add_msi_routing(uint64_t addr, uint32_t data)
+{
+    struct mshv_user_irq_entry *route_entry;
+    uint32_t high_addr = addr >> 32;
+    uint32_t low_addr = addr & 0xFFFFFFFF;
+    int gsi;
+    GHashTable *gsi_routes;
+
+    trace_mshv_add_msi_routing(addr, data);
+
+    assert(msi_control);
+
+    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+        /* find an empty slot */
+        gsi = 0;
+        gsi_routes = msi_control->gsi_routes;
+        while (gsi < MSHV_MAX_MSI_ROUTES) {
+            route_entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
+            if (!route_entry) {
+                break;
+            }
+            gsi++;
+        }
+        if (gsi >= MSHV_MAX_MSI_ROUTES) {
+            error_report("No empty gsi slot available");
+            return -1;
+        }
+
+        /* create new entry */
+        route_entry = g_new0(struct mshv_user_irq_entry, 1);
+        route_entry->gsi = gsi;
+        route_entry->address_hi = high_addr;
+        route_entry->address_lo = low_addr;
+        route_entry->data = data;
+
+        g_hash_table_insert(gsi_routes, GINT_TO_POINTER(gsi), route_entry);
+        msi_control->updated = true;
+    }
+
+    return gsi;
+}
+
+static int commit_msi_routing_table(void)
+{
+    guint len;
+    int i, ret;
+    size_t table_size;
+    struct mshv_user_irq_table *table;
+    GHashTableIter iter;
+    gpointer key, value;
+    int vm_fd = mshv_state->vm;
+
+    assert(msi_control);
+
+    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+        if (!msi_control->updated) {
+            /* nothing to update */
+            return 0;
+        }
+
+        /* Calculate the size of the table */
+        len = g_hash_table_size(msi_control->gsi_routes);
+        table_size = sizeof(struct mshv_user_irq_table)
+                     + len * sizeof(struct mshv_user_irq_entry);
+        table = g_malloc0(table_size);
+
+        g_hash_table_iter_init(&iter, msi_control->gsi_routes);
+        i = 0;
+        while (g_hash_table_iter_next(&iter, &key, &value)) {
+            struct mshv_user_irq_entry *entry = value;
+            table->entries[i] = *entry;
+            i++;
+        }
+
+        trace_mshv_commit_msi_routing_table(vm_fd, len);
+
+        ret = ioctl(vm_fd, MSHV_SET_MSI_ROUTING, table);
+        g_free(table);
+        if (ret < 0) {
+            error_report("Failed to commit msi routing table");
+            return -1;
+        }
+        msi_control->updated = false;
+    }
+    return 0;
+}
+
+static int remove_msi_routing(uint32_t gsi)
+{
+    struct mshv_user_irq_entry *route_entry;
+    GHashTable *gsi_routes;
+
+    trace_mshv_remove_msi_routing(gsi);
+
+    if (gsi >= MSHV_MAX_MSI_ROUTES) {
+        error_report("Invalid GSI: %u", gsi);
+        return -1;
+    }
+
+    assert(msi_control);
+
+    WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+        gsi_routes = msi_control->gsi_routes;
+        route_entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
+        if (route_entry) {
+            g_hash_table_remove(gsi_routes, GINT_TO_POINTER(gsi));
+            g_free(route_entry);
+            msi_control->updated = true;
+        }
+    }
+
+    return 0;
+}
+
+/* Pass an eventfd which is to be used for injecting interrupts from userland */
+static int irqfd(int vm_fd, int fd, int resample_fd, uint32_t gsi,
+                 uint32_t flags)
+{
+    int ret;
+    struct mshv_user_irqfd arg = {
+        .fd = fd,
+        .resamplefd = resample_fd,
+        .gsi = gsi,
+        .flags = flags,
+    };
+
+    ret = ioctl(vm_fd, MSHV_IRQFD, &arg);
+    if (ret < 0) {
+        error_report("Failed to set irqfd: gsi=%u, fd=%d", gsi, fd);
+        return -1;
+    }
+    return ret;
+}
+
+static int register_irqfd(int vm_fd, int event_fd, uint32_t gsi)
+{
+    int ret;
+
+    trace_mshv_register_irqfd(vm_fd, event_fd, gsi);
+
+    ret = irqfd(vm_fd, event_fd, 0, gsi, 0);
+    if (ret < 0) {
+        error_report("Failed to register irqfd: gsi=%u", gsi);
+        return -1;
+    }
+    return 0;
+}
+
+static int register_irqfd_with_resample(int vm_fd, int event_fd,
+                                        int resample_fd, uint32_t gsi)
+{
+    int ret;
+    uint32_t flags = MSHV_IRQFD_RESAMPLE_FLAG;
+
+    ret = irqfd(vm_fd, event_fd, resample_fd, gsi, flags);
+    if (ret < 0) {
+        error_report("Failed to register irqfd with resample: gsi=%u", gsi);
+        return -errno;
+    }
+    return 0;
+}
+
+static int unregister_irqfd(int vm_fd, int event_fd, uint32_t gsi)
+{
+    int ret;
+    uint32_t flags = MSHV_IRQFD_BIT_DEASSIGN_FLAG;
+
+    ret = irqfd(vm_fd, event_fd, 0, gsi, flags);
+    if (ret < 0) {
+        error_report("Failed to unregister irqfd: gsi=%u", gsi);
+        return -errno;
+    }
+    return 0;
+}
+
+static int irqchip_update_irqfd_notifier_gsi(const EventNotifier *event,
+                                             const EventNotifier *resample,
+                                             int virq, bool add)
+{
+    int fd = event_notifier_get_fd(event);
+    int rfd = resample ? event_notifier_get_fd(resample) : -1;
+    int vm_fd = mshv_state->vm;
+
+    trace_mshv_irqchip_update_irqfd_notifier_gsi(fd, rfd, virq, add);
+
+    if (!add) {
+        return unregister_irqfd(vm_fd, fd, virq);
+    }
+
+    if (rfd > 0) {
+        return register_irqfd_with_resample(vm_fd, fd, rfd, virq);
+    }
+
+    return register_irqfd(vm_fd, fd, virq);
+}
+
+
+int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev)
+{
+    MSIMessage msg = { 0, 0 };
+    int virq = 0;
+
+    if (pci_available && dev) {
+        msg = pci_get_msi_message(dev, vector);
+        virq = add_msi_routing(msg.address, le32_to_cpu(msg.data));
+    }
+
+    return virq;
+}
+
+void mshv_irqchip_release_virq(int virq)
+{
+    remove_msi_routing(virq);
+}
+
+int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev)
+{
+    int ret;
+
+    ret = set_msi_routing(virq, msg.address, le32_to_cpu(msg.data));
+    if (ret < 0) {
+        error_report("Failed to set msi routing");
+        return -1;
+    }
+
+    return 0;
+}
+
+int mshv_request_interrupt(int vm_fd, uint32_t interrupt_type, uint32_t vector,
+                           uint32_t vp_index, bool logical_dest_mode,
+                           bool level_triggered)
+{
+    int ret;
+
+    if (vector == 0) {
+        /* TODO: why do we receive this? */
+        return 0;
+    }
+
+    union hv_interrupt_control control = {
+        .interrupt_type = interrupt_type,
+        .level_triggered = level_triggered,
+        .logical_dest_mode = logical_dest_mode,
+        .rsvd = 0,
+    };
+
+    struct hv_input_assert_virtual_interrupt arg = {0};
+    arg.control = control;
+    arg.dest_addr = (uint64_t)vp_index;
+    arg.vector = vector;
+
+    struct mshv_root_hvcall args = {0};
+    args.code   = HVCALL_ASSERT_VIRTUAL_INTERRUPT;
+    args.in_sz  = sizeof(arg);
+    args.in_ptr = (uint64_t)&arg;
+
+    ret = mshv_hvcall(vm_fd, &args);
+    if (ret < 0) {
+        error_report("Failed to request interrupt");
+        return -errno;
+    }
+    return 0;
+}
+
+void mshv_irqchip_commit_routes(void)
+{
+    int ret;
+
+    ret = commit_msi_routing_table();
+    if (ret < 0) {
+        error_report("Failed to commit msi routing table");
+        abort();
+    }
+}
+
+int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *event,
+                                        const EventNotifier *resample,
+                                        int virq)
+{
+    return irqchip_update_irqfd_notifier_gsi(event, resample, virq, true);
+}
+
+int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *event,
+                                           int virq)
+{
+    return irqchip_update_irqfd_notifier_gsi(event, NULL, virq, false);
+}
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index 8a6beb3fb1..f88fc8678c 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -1,5 +1,6 @@
 mshv_ss = ss.source_set()
 mshv_ss.add(if_true: files(
+  'irq.c',
   'mem.c',
   'mshv-all.c'
 ))
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index e4085b216d..a29e356ba0 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -417,6 +417,8 @@ static int mshv_init(MachineState *ms)
         return -1;
     }
 
+    mshv_init_msicontrol();
+
     do {
         int vm_fd = create_vm(mshv_fd);
         s->vm = vm_fd;
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index 5929cb45a5..beb5be7b73 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -1,7 +1,16 @@
 # See docs/devel/tracing.rst for syntax documentation.
 
+mshv_handle_interrupt(uint32_t cpu, int mask) "cpu_index %d mask %x"
 mshv_set_memory(bool add, uint64_t gpa, uint64_t size, uint64_t user_addr, bool readonly, int ret) "[add = %d] gpa = %lx size = %lx user = %lx readonly = %d result = %d"
 mshv_mem_ioeventfd_add(uint64_t addr, uint32_t size, uint32_t data) "addr %lx size %d data %x"
 mshv_mem_ioeventfd_del(uint64_t addr, uint32_t size, uint32_t data) "addr %lx size %d data %x"
 
 mshv_hvcall_args(const char* hvcall, uint16_t code, uint16_t in_sz) "built args for '%s' code: %d in_sz: %d"
+
+mshv_set_msi_routing(uint32_t gsi, uint64_t addr, uint32_t data) "gsi %d addr %lx data %x"
+mshv_remove_msi_routing(uint32_t gsi) "gsi %d"
+mshv_add_msi_routing(uint64_t addr, uint32_t data) "addr %lx data %x"
+mshv_commit_msi_routing_table(int vm_fd, int len) "vm_fd %d table_size %d"
+mshv_register_irqfd(int vm_fd, int event_fd, uint32_t gsi) "vm_fd %d event_fd %d gsi %d"
+mshv_irqchip_update_irqfd_notifier_gsi(int event_fd, int resample_fd, int virq, bool add) "event_fd %d resample_fd %d virq %d add %d"
+
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index bcb103560c..4d1fe7cdd1 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -27,6 +27,7 @@
 #include "hw/pci/msi.h"
 #include "qemu/host-utils.h"
 #include "system/kvm.h"
+#include "system/mshv.h"
 #include "trace.h"
 #include "hw/i386/apic-msidef.h"
 #include "qapi/error.h"
@@ -932,6 +933,14 @@ static void apic_send_msi(MSIMessage *msi)
     uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
     uint8_t delivery = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
     /* XXX: Ignore redirection hint. */
+#ifdef CONFIG_MSHV
+    if (mshv_enabled()) {
+		/* TODO: error handling? */
+        mshv_request_interrupt(mshv_state->vm, delivery, vector, dest,
+                               dest_mode, trigger_mode);
+        return;
+    }
+#endif
     apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
diff --git a/include/system/mshv.h b/include/system/mshv.h
index c7ee4f0cc1..4c1e901835 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -40,6 +40,10 @@
  */
 #define MSHV_USE_IOEVENTFD 1
 
+#define MSHV_USE_KERNEL_GSI_IRQFD 1
+
+#define MSHV_MAX_MSI_ROUTES 4096
+
 #define MSHV_PAGE_SHIFT 12
 
 
@@ -72,6 +76,11 @@ struct AccelCPUState {
   bool dirty;
 };
 
+typedef struct MshvMsiControl {
+    bool updated;
+    GHashTable *gsi_routes;
+} MshvMsiControl;
+
 #else /* CONFIG_MSHV_IS_POSSIBLE */
 #define mshv_enabled() false
 #endif
@@ -106,6 +115,11 @@ int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
 void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
                        bool add);
 /* interrupt */
+void mshv_init_msicontrol(void);
+int mshv_request_interrupt(int vm_fd, uint32_t interrupt_type, uint32_t vector,
+                           uint32_t vp_index, bool logical_destination_mode,
+                           bool level_triggered);
+
 int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
 int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
 void mshv_irqchip_commit_routes(void);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (10 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 13:50   ` Paolo Bonzini
  2025-05-20 11:30 ` [RFC PATCH 13/25] accel/mshv: Add vCPU signal handling Magnus Kulke
                   ` (13 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Create MSHV vCPUs using MSHV_CREATE_VP and initialize their state.
Register the MSHV CPU execution loop loop with the QEMU accelerator
framework to enable guest code execution.

The target/i386 functionality is still mostly stubbed out and will be
populated in a later commit in this series.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c       | 197 +++++++++++++++++++++++++++++++++---
 accel/mshv/trace-events     |   1 +
 include/system/mshv.h       |  19 ++++
 target/i386/mshv/mshv-cpu.c |  63 ++++++++++++
 4 files changed, 268 insertions(+), 12 deletions(-)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index a29e356ba0..71fedc9137 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -400,6 +400,24 @@ int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args)
     return ret;
 }
 
+static int mshv_init_vcpu(CPUState *cpu)
+{
+    int vm_fd = mshv_state->vm;
+    uint8_t vp_index = cpu->cpu_index;
+    int ret;
+
+    mshv_arch_init_vcpu(cpu);
+    cpu->accel = g_new0(AccelCPUState, 1);
+
+    ret = mshv_create_vcpu(vm_fd, vp_index, &cpu->accel->cpufd);
+    if (ret < 0) {
+        return -1;
+    }
+
+    cpu->accel->dirty = true;
+
+    return 0;
+}
 
 static int mshv_init(MachineState *ms)
 {
@@ -417,6 +435,8 @@ static int mshv_init(MachineState *ms)
         return -1;
     }
 
+    mshv_init_cpu_logic();
+
     mshv_init_msicontrol();
 
     do {
@@ -440,40 +460,193 @@ static int mshv_init(MachineState *ms)
     return 0;
 }
 
+static int mshv_destroy_vcpu(CPUState *cpu)
+{
+    int cpu_fd = mshv_vcpufd(cpu);
+    int vm_fd = mshv_state->vm;
+
+    mshv_remove_vcpu(vm_fd, cpu_fd);
+    mshv_vcpufd(cpu) = 0;
+
+    mshv_arch_destroy_vcpu(cpu);
+    g_free(cpu->accel);
+    return 0;
+}
+
+static int mshv_cpu_exec(CPUState *cpu)
+{
+    hv_message mshv_msg;
+    enum MshvVmExit exit_reason;
+    int ret = 0;
+
+    bql_unlock();
+    cpu_exec_start(cpu);
+
+    do {
+        if (cpu->accel->dirty) {
+            ret = mshv_arch_put_registers(cpu);
+            if (ret) {
+                error_report("Failed to put registers after init: %s",
+                              strerror(-ret));
+                ret = -1;
+                break;
+            }
+            cpu->accel->dirty = false;
+        }
+
+        if (qatomic_read(&cpu->exit_request)) {
+            qemu_cpu_kick_self();
+        }
+
+        /*
+         * Read cpu->exit_request before KVM_RUN reads run->immediate_exit.
+         * Matching barrier in kvm_eat_signals.
+         */
+        smp_rmb();
+
+        ret = mshv_run_vcpu(mshv_state->vm, cpu, &mshv_msg, &exit_reason);
+        if (ret < 0) {
+            error_report("Failed to run on vcpu %d", cpu->cpu_index);
+            abort();
+        }
+
+        switch (exit_reason) {
+        case MshvVmExitIgnore:
+            break;
+        default:
+            ret = EXCP_INTERRUPT;
+            break;
+        }
+    } while (ret == 0);
+
+    cpu_exec_end(cpu);
+    bql_lock();
+
+    if (ret < 0) {
+        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+        vm_stop(RUN_STATE_INTERNAL_ERROR);
+    }
+
+    qatomic_set(&cpu->exit_request, 0);
+    return ret;
+}
+
+static void *mshv_vcpu_thread(void *arg)
+{
+    CPUState *cpu = arg;
+    int ret;
+
+    rcu_register_thread();
+
+    bql_lock();
+    qemu_thread_get_self(cpu->thread);
+    cpu->thread_id = qemu_get_thread_id();
+    current_cpu = cpu;
+    ret = mshv_init_vcpu(cpu);
+    if (ret < 0) {
+        error_report("Failed to init vcpu %d", cpu->cpu_index);
+        goto cleanup;
+    }
+
+    /* signal CPU creation */
+    cpu_thread_signal_created(cpu);
+    qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+    do {
+        if (cpu_can_run(cpu)) {
+            mshv_cpu_exec(cpu);
+        }
+        qemu_wait_io_event(cpu);
+    } while (!cpu->unplug || cpu_can_run(cpu));
+
+    mshv_destroy_vcpu(cpu);
+cleanup:
+    cpu_thread_signal_destroyed(cpu);
+    bql_unlock();
+    rcu_unregister_thread();
+    return NULL;
+}
+
 static void mshv_start_vcpu_thread(CPUState *cpu)
 {
-	error_report("unimplemented");
-	abort();
+    char thread_name[VCPU_THREAD_NAME_SIZE];
+
+    cpu->thread = g_malloc0(sizeof(QemuThread));
+    cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+
+    qemu_cond_init(cpu->halt_cond);
+
+    trace_mshv_start_vcpu_thread(thread_name, cpu->cpu_index);
+    qemu_thread_create(cpu->thread, thread_name, mshv_vcpu_thread, cpu,
+                       QEMU_THREAD_JOINABLE);
+}
+
+static void do_mshv_cpu_synchronize_post_init(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    int ret = mshv_arch_put_registers(cpu);
+    if (ret < 0) {
+        error_report("Failed to put registers after init: %s", strerror(-ret));
+        abort();
+    }
+
+    cpu->accel->dirty = false;
 }
 
 static void mshv_cpu_synchronize_post_init(CPUState *cpu)
 {
-	error_report("unimplemented");
-	abort();
+    run_on_cpu(cpu, do_mshv_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 }
 
 static void mshv_cpu_synchronize_post_reset(CPUState *cpu)
 {
-	error_report("unimplemented");
-	abort();
+    int ret = mshv_arch_put_registers(cpu);
+    if (ret) {
+        error_report("Failed to put registers after reset: %s",
+                     strerror(-ret));
+        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+        vm_stop(RUN_STATE_INTERNAL_ERROR);
+    }
+    cpu->accel->dirty = false;
+}
+
+static void do_mshv_cpu_synchronize_pre_loadvm(CPUState *cpu,
+                                               run_on_cpu_data arg)
+{
+    cpu->accel->dirty = true;
 }
 
 static void mshv_cpu_synchronize_pre_loadvm(CPUState *cpu)
 {
-	error_report("unimplemented");
-	abort();
+    run_on_cpu(cpu, do_mshv_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+static void do_mshv_cpu_synchronize(CPUState *cpu, run_on_cpu_data arg)
+{
+    if (!cpu->accel->dirty) {
+        int ret = mshv_load_regs(cpu);
+        if (ret < 0) {
+            error_report("Failed to load registers for vcpu %d",
+                         cpu->cpu_index);
+
+            cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+            vm_stop(RUN_STATE_INTERNAL_ERROR);
+        }
+
+        cpu->accel->dirty = true;
+    }
 }
 
 static void mshv_cpu_synchronize(CPUState *cpu)
 {
-	error_report("unimplemented");
-	abort();
+    if (!cpu->accel->dirty) {
+        run_on_cpu(cpu, do_mshv_cpu_synchronize, RUN_ON_CPU_NULL);
+    }
 }
 
 static bool mshv_cpus_are_resettable(void)
 {
-	error_report("unimplemented");
-	abort();
+    return false;
 }
 
 static void mshv_accel_class_init(ObjectClass *oc, const void *data)
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index beb5be7b73..06aa27ef67 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -1,5 +1,6 @@
 # See docs/devel/tracing.rst for syntax documentation.
 
+mshv_start_vcpu_thread(const char* thread, uint32_t cpu) "thread %s cpu_index %d"
 mshv_handle_interrupt(uint32_t cpu, int mask) "cpu_index %d mask %x"
 mshv_set_memory(bool add, uint64_t gpa, uint64_t size, uint64_t user_addr, bool readonly, int ret) "[add = %d] gpa = %lx size = %lx user = %lx readonly = %d result = %d"
 mshv_mem_ioeventfd_add(uint64_t addr, uint32_t size, uint32_t data) "addr %lx size %d data %x"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 4c1e901835..458b182077 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -32,6 +32,8 @@
 #define CONFIG_MSHV_IS_POSSIBLE
 #endif
 
+typedef struct hyperv_message hv_message;
+
 /*
  * Set to 0 if we do not want to use eventfd to optimize the MMIO events.
  * Set to 1 so that mshv kernel driver receives doorbell when the VM access
@@ -81,6 +83,8 @@ typedef struct MshvMsiControl {
     GHashTable *gsi_routes;
 } MshvMsiControl;
 
+#define mshv_vcpufd(cpu) (cpu->accel->cpufd)
+
 #else /* CONFIG_MSHV_IS_POSSIBLE */
 #define mshv_enabled() false
 #endif
@@ -95,6 +99,21 @@ typedef struct MshvMsiControl {
 #define EFER_LMA   ((uint64_t)0x400)
 #define EFER_LME   ((uint64_t)0x100)
 
+typedef enum MshvVmExit {
+    MshvVmExitIgnore   = 0,
+    MshvVmExitShutdown = 1,
+    MshvVmExitSpecial  = 2,
+} MshvVmExit;
+
+void mshv_init_cpu_logic(void);
+int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
+void mshv_remove_vcpu(int vm_fd, int cpu_fd);
+int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
+int mshv_load_regs(CPUState *cpu);
+int mshv_store_regs(CPUState *cpu);
+int mshv_arch_put_registers(const CPUState *cpu);
+void mshv_arch_init_vcpu(CPUState *cpu);
+void mshv_arch_destroy_vcpu(CPUState *cpu);
 void mshv_arch_amend_proc_features(
     union hv_partition_synthetic_processor_features *features);
 int mshv_arch_post_init_vm(int vm_fd);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index b36f8904fb..c4b2c297e2 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -22,16 +22,79 @@
 #include "hw/hyperv/hvhdk_mini.h"
 #include "hw/hyperv/hvgdk.h"
 
+#include "cpu.h"
+#include "emulate/x86_decode.h"
+#include "emulate/x86_emu.h"
+#include "emulate/x86_flags.h"
 
 #include "trace-accel_mshv.h"
 #include "trace.h"
 
+int mshv_store_regs(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+int mshv_load_regs(CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+int mshv_arch_put_registers(const CPUState *cpu)
+{
+	error_report("unimplemented");
+	abort();
+}
+
 void mshv_arch_amend_proc_features(
     union hv_partition_synthetic_processor_features *features)
 {
     features->access_guest_idle_reg = 1;
 }
 
+int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+void mshv_remove_vcpu(int vm_fd, int cpu_fd)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+void mshv_init_cpu_logic(void)
+{
+	error_report("unimplemented");
+	abort();
+}
+
+void mshv_arch_init_vcpu(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+
+    env->emu_mmio_buf = g_new(char, 4096);
+}
+
+void mshv_arch_destroy_vcpu(CPUState *cpu)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+
+    g_free(env->emu_mmio_buf);
+    env->emu_mmio_buf = NULL;
+}
+
 /*
  * Default Microsoft Hypervisor behavior for unimplemented MSR is to send a
  * fault to the guest if it tries to access it. It is possible to override
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 13/25] accel/mshv: Add vCPU signal handling
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (11 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 11:30 ` [RFC PATCH 14/25] target/i386/mshv: Add CPU create and remove logic Magnus Kulke
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Implement signal handling for MSHV vCPUs to support asynchronous
interrupts from the main thread.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mshv-all.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 71fedc9137..97212c54f1 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -531,6 +531,33 @@ static int mshv_cpu_exec(CPUState *cpu)
     return ret;
 }
 
+/*
+ * The signal handler is triggered when QEMU's main thread receives a SIG_IPI
+ * (SIGUSR1). This signal causes the current CPU thread to be kicked, forcing a
+ * VM exit on the CPU. The VM exit generates an exit reason that breaks the loop
+ * (see mshv_cpu_exec). If the exit is due to a Ctrl+A+x command, the system
+ * will shut down. For other cases, the system will continue running.
+ */
+static void sa_ipi_handler(int sig)
+{
+    qemu_cpu_kick_self();
+}
+
+static void init_signal(CPUState *cpu)
+{
+    /* init cpu signals */
+    struct sigaction sigact;
+    sigset_t set;
+
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = sa_ipi_handler;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+    pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
 static void *mshv_vcpu_thread(void *arg)
 {
     CPUState *cpu = arg;
@@ -547,6 +574,7 @@ static void *mshv_vcpu_thread(void *arg)
         error_report("Failed to init vcpu %d", cpu->cpu_index);
         goto cleanup;
     }
+    init_signal(cpu);
 
     /* signal CPU creation */
     cpu_thread_signal_created(cpu);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 14/25] target/i386/mshv: Add CPU create and remove logic
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (12 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 13/25] accel/mshv: Add vCPU signal handling Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 21:50   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 15/25] target/i386/mshv: Implement mshv_store_regs() Magnus Kulke
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Implement MSHV-specific hooks for vCPU creation and teardown in the
i386 target. A list of locks per vCPU is maintained to lock CPU state in
MMIO operations.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/mshv-cpu.c | 61 +++++++++++++++++++++++++++++++++----
 1 file changed, 55 insertions(+), 6 deletions(-)

diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index c4b2c297e2..0ba1dacaed 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -14,6 +14,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/atomic.h"
+#include "qemu/lockable.h"
 #include "qemu/error-report.h"
 #include "qemu/typedefs.h"
 
@@ -30,6 +32,36 @@
 #include "trace-accel_mshv.h"
 #include "trace.h"
 
+#include <sys/ioctl.h>
+
+static QemuMutex *cpu_guards_lock;
+static GHashTable *cpu_guards;
+
+static void add_cpu_guard(int cpu_fd)
+{
+    QemuMutex *guard;
+
+    WITH_QEMU_LOCK_GUARD(cpu_guards_lock) {
+        guard = g_new0(QemuMutex, 1);
+        qemu_mutex_init(guard);
+        g_hash_table_insert(cpu_guards, GUINT_TO_POINTER(cpu_fd), guard);
+    }
+}
+
+static void remove_cpu_guard(int cpu_fd)
+{
+    QemuMutex *guard;
+
+    WITH_QEMU_LOCK_GUARD(cpu_guards_lock) {
+        guard = g_hash_table_lookup(cpu_guards, GUINT_TO_POINTER(cpu_fd));
+        if (guard) {
+            qemu_mutex_destroy(guard);
+            g_free(guard);
+            g_hash_table_remove(cpu_guards, GUINT_TO_POINTER(cpu_fd));
+        }
+    }
+}
+
 int mshv_store_regs(CPUState *cpu)
 {
 	error_report("unimplemented");
@@ -62,20 +94,37 @@ int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
 
 void mshv_remove_vcpu(int vm_fd, int cpu_fd)
 {
-	error_report("unimplemented");
-	abort();
+    /*
+     * TODO: don't we have to perform an ioctl to remove the vcpu?
+     * there is WHvDeleteVirtualProcessor in the WHV api
+     */
+    remove_cpu_guard(cpu_fd);
 }
 
+
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
 {
-	error_report("unimplemented");
-	abort();
+    int ret;
+    struct mshv_create_vp vp_arg = {
+        .vp_index = vp_index,
+    };
+    ret = ioctl(vm_fd, MSHV_CREATE_VP, &vp_arg);
+    if (ret < 0) {
+        error_report("failed to create mshv vcpu: %s", strerror(errno));
+        return -1;
+    }
+
+    add_cpu_guard(ret);
+    *cpu_fd = ret;
+
+    return 0;
 }
 
 void mshv_init_cpu_logic(void)
 {
-	error_report("unimplemented");
-	abort();
+    cpu_guards_lock = g_new0(QemuMutex, 1);
+    qemu_mutex_init(cpu_guards_lock);
+    cpu_guards = g_hash_table_new(g_direct_hash, g_direct_equal);
 }
 
 void mshv_arch_init_vcpu(CPUState *cpu)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 15/25] target/i386/mshv: Implement mshv_store_regs()
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (13 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 14/25] target/i386/mshv: Add CPU create and remove logic Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 22:07   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 16/25] target/i386/mshv: Implement mshv_get_standard_regs() Magnus Kulke
                   ` (10 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Add support for writing general-purpose registers to MSHV vCPUs
during initialization or migration using the MSHV register interface. A
generic set_register call is introduced to abstract the HV call over
the various register types.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv.h       |  1 +
 target/i386/mshv/mshv-cpu.c | 89 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/include/system/mshv.h b/include/system/mshv.h
index 458b182077..b2dec5a7ec 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -111,6 +111,7 @@ void mshv_remove_vcpu(int vm_fd, int cpu_fd);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
 int mshv_load_regs(CPUState *cpu);
 int mshv_store_regs(CPUState *cpu);
+int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs);
 int mshv_arch_put_registers(const CPUState *cpu);
 void mshv_arch_init_vcpu(CPUState *cpu);
 void mshv_arch_destroy_vcpu(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 0ba1dacaed..83dcdc7b70 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -37,6 +37,27 @@
 static QemuMutex *cpu_guards_lock;
 static GHashTable *cpu_guards;
 
+static enum hv_register_name STANDARD_REGISTER_NAMES[18] = {
+    HV_X64_REGISTER_RAX,
+    HV_X64_REGISTER_RBX,
+    HV_X64_REGISTER_RCX,
+    HV_X64_REGISTER_RDX,
+    HV_X64_REGISTER_RSI,
+    HV_X64_REGISTER_RDI,
+    HV_X64_REGISTER_RSP,
+    HV_X64_REGISTER_RBP,
+    HV_X64_REGISTER_R8,
+    HV_X64_REGISTER_R9,
+    HV_X64_REGISTER_R10,
+    HV_X64_REGISTER_R11,
+    HV_X64_REGISTER_R12,
+    HV_X64_REGISTER_R13,
+    HV_X64_REGISTER_R14,
+    HV_X64_REGISTER_R15,
+    HV_X64_REGISTER_RIP,
+    HV_X64_REGISTER_RFLAGS,
+};
+
 static void add_cpu_guard(int cpu_fd)
 {
     QemuMutex *guard;
@@ -62,12 +83,76 @@ static void remove_cpu_guard(int cpu_fd)
     }
 }
 
+int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
+{
+    struct mshv_vp_registers input = {
+        .count = n_regs,
+        .regs = assocs,
+    };
+
+    return ioctl(cpu_fd, MSHV_SET_VP_REGISTERS, &input);
+}
+
+static int set_standard_regs(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    struct hv_register_assoc *assocs;
+    size_t n_regs = sizeof(STANDARD_REGISTER_NAMES) / sizeof(hv_register_name);
+    int ret;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    assocs = g_new0(hv_register_assoc, n_regs);
+
+    /* set names */
+    for (size_t i = 0; i < n_regs; i++) {
+        assocs[i].name = STANDARD_REGISTER_NAMES[i];
+    }
+    assocs[0].value.reg64 = env->regs[R_EAX];
+    assocs[1].value.reg64 = env->regs[R_EBX];
+    assocs[2].value.reg64 = env->regs[R_ECX];
+    assocs[3].value.reg64 = env->regs[R_EDX];
+    assocs[4].value.reg64 = env->regs[R_ESI];
+    assocs[5].value.reg64 = env->regs[R_EDI];
+    assocs[6].value.reg64 = env->regs[R_ESP];
+    assocs[7].value.reg64 = env->regs[R_EBP];
+    assocs[8].value.reg64 = env->regs[R_R8];
+    assocs[9].value.reg64 = env->regs[R_R9];
+    assocs[10].value.reg64 = env->regs[R_R10];
+    assocs[11].value.reg64 = env->regs[R_R11];
+    assocs[12].value.reg64 = env->regs[R_R12];
+    assocs[13].value.reg64 = env->regs[R_R13];
+    assocs[14].value.reg64 = env->regs[R_R14];
+    assocs[15].value.reg64 = env->regs[R_R15];
+    assocs[16].value.reg64 = env->eip;
+    lflags_to_rflags(env);
+    assocs[17].value.reg64 = env->eflags;
+
+    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+    g_free(assocs);
+    if (ret < 0) {
+        error_report("failed to set standard registers");
+        return -errno;
+    }
+    return 0;
+}
+
 int mshv_store_regs(CPUState *cpu)
 {
-	error_report("unimplemented");
-	abort();
+    int ret;
+
+    ret = set_standard_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to store standard registers");
+        return -1;
+    }
+
+    /* TODO: should store special registers? the equivalent hvf code doesn't */
+
+    return 0;
 }
 
+
 int mshv_load_regs(CPUState *cpu)
 {
 	error_report("unimplemented");
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 16/25] target/i386/mshv: Implement mshv_get_standard_regs()
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (14 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 15/25] target/i386/mshv: Implement mshv_store_regs() Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 22:09   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
                   ` (9 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Fetch standard register state from MSHV vCPUs to support debugging,
migration, and other introspection features in QEMU.

Fetch standard register state from a MHSV vCPU's. A generic get_regs()
function and a mapper to map the different register representations are
introduced.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv.h       |  1 +
 target/i386/mshv/mshv-cpu.c | 70 +++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/include/system/mshv.h b/include/system/mshv.h
index b2dec5a7ec..9b78b66a24 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -108,6 +108,7 @@ typedef enum MshvVmExit {
 void mshv_init_cpu_logic(void);
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
+int mshv_get_standard_regs(CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
 int mshv_load_regs(CPUState *cpu);
 int mshv_store_regs(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 83dcdc7b70..41584c3f8e 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -93,6 +93,18 @@ int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
     return ioctl(cpu_fd, MSHV_SET_VP_REGISTERS, &input);
 }
 
+static int get_generic_regs(int cpu_fd, struct hv_register_assoc *assocs,
+                            size_t n_regs)
+{
+    struct mshv_vp_registers input = {
+        .count = n_regs,
+        .regs = assocs,
+    };
+
+    return ioctl(cpu_fd, MSHV_GET_VP_REGISTERS, &input);
+}
+
+
 static int set_standard_regs(const CPUState *cpu)
 {
     X86CPU *x86cpu = X86_CPU(cpu);
@@ -152,9 +164,67 @@ int mshv_store_regs(CPUState *cpu)
     return 0;
 }
 
+static void populate_standard_regs(const hv_register_assoc *assocs,
+                                   CPUX86State *env)
+{
+    env->regs[R_EAX] = assocs[0].value.reg64;
+    env->regs[R_EBX] = assocs[1].value.reg64;
+    env->regs[R_ECX] = assocs[2].value.reg64;
+    env->regs[R_EDX] = assocs[3].value.reg64;
+    env->regs[R_ESI] = assocs[4].value.reg64;
+    env->regs[R_EDI] = assocs[5].value.reg64;
+    env->regs[R_ESP] = assocs[6].value.reg64;
+    env->regs[R_EBP] = assocs[7].value.reg64;
+    env->regs[R_R8]  = assocs[8].value.reg64;
+    env->regs[R_R9]  = assocs[9].value.reg64;
+    env->regs[R_R10] = assocs[10].value.reg64;
+    env->regs[R_R11] = assocs[11].value.reg64;
+    env->regs[R_R12] = assocs[12].value.reg64;
+    env->regs[R_R13] = assocs[13].value.reg64;
+    env->regs[R_R14] = assocs[14].value.reg64;
+    env->regs[R_R15] = assocs[15].value.reg64;
+
+    env->eip = assocs[16].value.reg64;
+    env->eflags = assocs[17].value.reg64;
+    rflags_to_lflags(env);
+}
+
+int mshv_get_standard_regs(CPUState *cpu)
+{
+    size_t n_regs = sizeof(STANDARD_REGISTER_NAMES) / sizeof(hv_register_name);
+    struct hv_register_assoc *assocs;
+    int ret;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    assocs = g_new0(hv_register_assoc, n_regs);
+    for (size_t i = 0; i < n_regs; i++) {
+        assocs[i].name = STANDARD_REGISTER_NAMES[i];
+    }
+    ret = get_generic_regs(cpu_fd, assocs, n_regs);
+    if (ret < 0) {
+        error_report("failed to get standard registers");
+        g_free(assocs);
+        return -1;
+    }
+
+    populate_standard_regs(assocs, env);
+
+    g_free(assocs);
+    return 0;
+}
 
 int mshv_load_regs(CPUState *cpu)
 {
+    int ret;
+
+    ret = mshv_get_standard_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to load standard registers");
+        return -1;
+    }
+
 	error_report("unimplemented");
 	abort();
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs()
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (15 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 16/25] target/i386/mshv: Implement mshv_get_standard_regs() Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 14:05   ` Paolo Bonzini
  2025-05-20 22:15   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
                   ` (8 subsequent siblings)
  25 siblings, 2 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Retrieve special registers (e.g. segment, control, and descriptor
table registers) from MSHV vCPUs.

Various helper functions to map register state representations between
Qemu and MSHV are introduced.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv.h       |   1 +
 target/i386/mshv/mshv-cpu.c | 118 +++++++++++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/include/system/mshv.h b/include/system/mshv.h
index 9b78b66a24..055489a6f3 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -109,6 +109,7 @@ void mshv_init_cpu_logic(void);
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
 int mshv_get_standard_regs(CPUState *cpu);
+int mshv_get_special_regs(CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
 int mshv_load_regs(CPUState *cpu);
 int mshv_store_regs(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 41584c3f8e..979ee5b8c3 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -58,6 +58,27 @@ static enum hv_register_name STANDARD_REGISTER_NAMES[18] = {
     HV_X64_REGISTER_RFLAGS,
 };
 
+static enum hv_register_name SPECIAL_REGISTER_NAMES[18] = {
+    HV_X64_REGISTER_CS,
+    HV_X64_REGISTER_DS,
+    HV_X64_REGISTER_ES,
+    HV_X64_REGISTER_FS,
+    HV_X64_REGISTER_GS,
+    HV_X64_REGISTER_SS,
+    HV_X64_REGISTER_TR,
+    HV_X64_REGISTER_LDTR,
+    HV_X64_REGISTER_GDTR,
+    HV_X64_REGISTER_IDTR,
+    HV_X64_REGISTER_CR0,
+    HV_X64_REGISTER_CR2,
+    HV_X64_REGISTER_CR3,
+    HV_X64_REGISTER_CR4,
+    HV_X64_REGISTER_CR8,
+    HV_X64_REGISTER_EFER,
+    HV_X64_REGISTER_APIC_BASE,
+    HV_REGISTER_PENDING_INTERRUPTION,
+};
+
 static void add_cpu_guard(int cpu_fd)
 {
     QemuMutex *guard;
@@ -215,6 +236,94 @@ int mshv_get_standard_regs(CPUState *cpu)
     return 0;
 }
 
+static void populate_segment_reg(const hv_x64_segment_register *hv_seg,
+                                 SegmentCache *seg)
+{
+    memset(seg, 0, sizeof(SegmentCache));
+
+    seg->base = hv_seg->base;
+    seg->limit = hv_seg->limit;
+    seg->selector = hv_seg->selector;
+
+    seg->flags = (hv_seg->segment_type << DESC_TYPE_SHIFT)
+                 | (hv_seg->present * DESC_P_MASK)
+                 | (hv_seg->descriptor_privilege_level << DESC_DPL_SHIFT)
+                 | (hv_seg->_default << DESC_B_SHIFT)
+                 | (hv_seg->non_system_segment * DESC_S_MASK)
+                 | (hv_seg->_long << DESC_L_SHIFT)
+                 | (hv_seg->granularity * DESC_G_MASK)
+                 | (hv_seg->available * DESC_AVL_MASK);
+
+}
+
+static void populate_table_reg(const hv_x64_table_register *hv_seg,
+                               SegmentCache *tbl)
+{
+    memset(tbl, 0, sizeof(SegmentCache));
+
+    tbl->base = hv_seg->base;
+    tbl->limit = hv_seg->limit;
+}
+
+static void populate_special_regs(const hv_register_assoc *assocs,
+                                  X86CPU *x86cpu)
+{
+    CPUX86State *env = &x86cpu->env;
+
+    populate_segment_reg(&assocs[0].value.segment, &env->segs[R_CS]);
+    populate_segment_reg(&assocs[1].value.segment, &env->segs[R_DS]);
+    populate_segment_reg(&assocs[2].value.segment, &env->segs[R_ES]);
+    populate_segment_reg(&assocs[3].value.segment, &env->segs[R_FS]);
+    populate_segment_reg(&assocs[4].value.segment, &env->segs[R_GS]);
+    populate_segment_reg(&assocs[5].value.segment, &env->segs[R_SS]);
+
+    /* TODO: should we set TR + LDT? */
+    /* populate_segment_reg(&assocs[6].value.segment, &regs->tr); */
+    /* populate_segment_reg(&assocs[7].value.segment, &regs->ldt); */
+
+    populate_table_reg(&assocs[8].value.table, &env->gdt);
+    populate_table_reg(&assocs[9].value.table, &env->idt);
+
+    env->cr[0] = assocs[10].value.reg64;
+    env->cr[2] = assocs[11].value.reg64;
+    env->cr[3] = assocs[12].value.reg64;
+    env->cr[4] = assocs[13].value.reg64;
+
+    cpu_set_apic_tpr(x86cpu->apic_state, assocs[14].value.reg64);
+    env->efer = assocs[15].value.reg64;
+    cpu_set_apic_base(x86cpu->apic_state, assocs[16].value.reg64);
+
+    /* TODO: should we set those? */
+    /* pending_reg = assocs[17].value.pending_interruption.as_uint64; */
+    /* populate_interrupt_bitmap(pending_reg, regs->interrupt_bitmap); */
+}
+
+
+int mshv_get_special_regs(CPUState *cpu)
+{
+    size_t n_regs = sizeof(SPECIAL_REGISTER_NAMES) / sizeof(hv_register_name);
+    struct hv_register_assoc *assocs;
+    int ret;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    assocs = g_new0(hv_register_assoc, n_regs);
+    for (size_t i = 0; i < n_regs; i++) {
+        assocs[i].name = SPECIAL_REGISTER_NAMES[i];
+    }
+    ret = get_generic_regs(cpu_fd, assocs, n_regs);
+    if (ret < 0) {
+        error_report("failed to get special registers");
+        g_free(assocs);
+        return -errno;
+    }
+
+    populate_special_regs(assocs, x86cpu);
+
+    g_free(assocs);
+    return 0;
+}
+
 int mshv_load_regs(CPUState *cpu)
 {
     int ret;
@@ -225,8 +334,13 @@ int mshv_load_regs(CPUState *cpu)
         return -1;
     }
 
-	error_report("unimplemented");
-	abort();
+    ret = mshv_get_special_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to load special registers");
+        return -1;
+    }
+
+    return 0;
 }
 
 int mshv_arch_put_registers(const CPUState *cpu)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers()
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (16 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 14:33   ` Paolo Bonzini
                     ` (2 more replies)
  2025-05-20 11:30 ` [RFC PATCH 19/25] target/i386/mshv: Set local interrupt controller state Magnus Kulke
                   ` (7 subsequent siblings)
  25 siblings, 3 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Write CPU register state to MSHV vCPUs. Various mapping functions to
prepare the payload for the HV call have been implemented.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 include/system/mshv.h       |  41 ++++++
 target/i386/mshv/mshv-cpu.c | 249 ++++++++++++++++++++++++++++++++++++
 2 files changed, 290 insertions(+)

diff --git a/include/system/mshv.h b/include/system/mshv.h
index 055489a6f3..76a3b0010e 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -99,6 +99,46 @@ typedef struct MshvMsiControl {
 #define EFER_LMA   ((uint64_t)0x400)
 #define EFER_LME   ((uint64_t)0x100)
 
+/* CR0 bits */
+#define CR0_PE     ((uint64_t)0x1)
+#define CR0_PG     ((uint64_t)0x80000000)
+
+/* CR4 bits */
+#define CR4_PAE    ((uint64_t)0x20)
+#define CR4_LA57   ((uint64_t)0x1000)
+
+/* rflags bits (shift values) */
+#define CF_SHIFT   0
+#define PF_SHIFT   2
+#define AF_SHIFT   4
+#define ZF_SHIFT   6
+#define SF_SHIFT   7
+#define DF_SHIFT   10
+#define OF_SHIFT   11
+
+/* rflags bits (bit masks) */
+#define CF         ((uint64_t)1 << CF_SHIFT)
+#define PF         ((uint64_t)1 << PF_SHIFT)
+#define AF         ((uint64_t)1 << AF_SHIFT)
+#define ZF         ((uint64_t)1 << ZF_SHIFT)
+#define SF         ((uint64_t)1 << SF_SHIFT)
+#define DF         ((uint64_t)1 << DF_SHIFT)
+#define OF         ((uint64_t)1 << OF_SHIFT)
+
+typedef struct MshvFPU {
+  uint8_t fpr[8][16];
+  uint16_t fcw;
+  uint16_t fsw;
+  uint8_t ftwx;
+  uint8_t pad1;
+  uint16_t last_opcode;
+  uint64_t last_ip;
+  uint64_t last_dp;
+  uint8_t xmm[16][16];
+  uint32_t mxcsr;
+  uint32_t pad2;
+} MshvFPU;
+
 typedef enum MshvVmExit {
     MshvVmExitIgnore   = 0,
     MshvVmExitShutdown = 1,
@@ -108,6 +148,7 @@ typedef enum MshvVmExit {
 void mshv_init_cpu_logic(void);
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
+int mshv_configure_vcpu(const CPUState *cpu, const MshvFPU *fpu, uint64_t xcr0);
 int mshv_get_standard_regs(CPUState *cpu);
 int mshv_get_special_regs(CPUState *cpu);
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 979ee5b8c3..ad42a09b99 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -79,6 +79,35 @@ static enum hv_register_name SPECIAL_REGISTER_NAMES[18] = {
     HV_REGISTER_PENDING_INTERRUPTION,
 };
 
+static enum hv_register_name FPU_REGISTER_NAMES[26] = {
+    HV_X64_REGISTER_XMM0,
+    HV_X64_REGISTER_XMM1,
+    HV_X64_REGISTER_XMM2,
+    HV_X64_REGISTER_XMM3,
+    HV_X64_REGISTER_XMM4,
+    HV_X64_REGISTER_XMM5,
+    HV_X64_REGISTER_XMM6,
+    HV_X64_REGISTER_XMM7,
+    HV_X64_REGISTER_XMM8,
+    HV_X64_REGISTER_XMM9,
+    HV_X64_REGISTER_XMM10,
+    HV_X64_REGISTER_XMM11,
+    HV_X64_REGISTER_XMM12,
+    HV_X64_REGISTER_XMM13,
+    HV_X64_REGISTER_XMM14,
+    HV_X64_REGISTER_XMM15,
+    HV_X64_REGISTER_FP_MMX0,
+    HV_X64_REGISTER_FP_MMX1,
+    HV_X64_REGISTER_FP_MMX2,
+    HV_X64_REGISTER_FP_MMX3,
+    HV_X64_REGISTER_FP_MMX4,
+    HV_X64_REGISTER_FP_MMX5,
+    HV_X64_REGISTER_FP_MMX6,
+    HV_X64_REGISTER_FP_MMX7,
+    HV_X64_REGISTER_FP_CONTROL_STATUS,
+    HV_X64_REGISTER_XMM_CONTROL_STATUS,
+};
+
 static void add_cpu_guard(int cpu_fd)
 {
     QemuMutex *guard;
@@ -343,8 +372,228 @@ int mshv_load_regs(CPUState *cpu)
     return 0;
 }
 
+static void populate_hv_segment_reg(SegmentCache *seg,
+                                    hv_x64_segment_register *hv_reg)
+{
+    uint32_t flags = seg->flags;
+
+    hv_reg->base = seg->base;
+    hv_reg->limit = seg->limit;
+    hv_reg->selector = seg->selector;
+    hv_reg->segment_type = (flags >> DESC_TYPE_SHIFT) & 0xF;
+    hv_reg->non_system_segment = (flags & DESC_S_MASK) != 0;
+    hv_reg->descriptor_privilege_level = (flags >> DESC_DPL_SHIFT) & 0x3;
+    hv_reg->present = (flags & DESC_P_MASK) != 0;
+    hv_reg->reserved = 0;
+    hv_reg->available = (flags & DESC_AVL_MASK) != 0;
+    hv_reg->_long = (flags >> DESC_L_SHIFT) & 0x1;
+    hv_reg->_default = (flags >> DESC_B_SHIFT) & 0x1;
+    hv_reg->granularity = (flags & DESC_G_MASK) != 0;
+}
+
+static void populate_hv_table_reg(const struct SegmentCache *seg,
+                                  hv_x64_table_register *hv_reg)
+{
+    hv_reg->base = seg->base;
+    hv_reg->limit = seg->limit;
+    memset(hv_reg->pad, 0, sizeof(hv_reg->pad));
+}
+
+static int set_special_regs(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+    struct hv_register_assoc *assocs;
+    size_t n_regs = sizeof(SPECIAL_REGISTER_NAMES) / sizeof(hv_register_name);
+    int ret;
+
+    assocs = g_new0(struct hv_register_assoc, n_regs);
+
+    /* set names */
+    for (size_t i = 0; i < n_regs; i++) {
+        assocs[i].name = SPECIAL_REGISTER_NAMES[i];
+    }
+    populate_hv_segment_reg(&env->segs[R_CS], &assocs[0].value.segment);
+    populate_hv_segment_reg(&env->segs[R_DS], &assocs[1].value.segment);
+    populate_hv_segment_reg(&env->segs[R_ES], &assocs[2].value.segment);
+    populate_hv_segment_reg(&env->segs[R_FS], &assocs[3].value.segment);
+    populate_hv_segment_reg(&env->segs[R_GS], &assocs[4].value.segment);
+    populate_hv_segment_reg(&env->segs[R_SS], &assocs[5].value.segment);
+    populate_hv_segment_reg(&env->tr, &assocs[6].value.segment);
+    populate_hv_segment_reg(&env->ldt, &assocs[7].value.segment);
+
+    populate_hv_table_reg(&env->gdt, &assocs[8].value.table);
+    populate_hv_table_reg(&env->idt, &assocs[9].value.table);
+
+    assocs[10].value.reg64 = env->cr[0];
+    assocs[11].value.reg64 = env->cr[2];
+    assocs[12].value.reg64 = env->cr[3];
+    assocs[13].value.reg64 = env->cr[4];
+    assocs[14].value.reg64 = cpu_get_apic_tpr(x86cpu->apic_state);
+    assocs[15].value.reg64 = env->efer;
+    assocs[16].value.reg64 = cpu_get_apic_base(x86cpu->apic_state);
+
+    /*
+     * TODO: support asserting an interrupt using interrup_bitmap
+     * it should be possible if we use the vm_fd
+     */
+
+    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+    g_free(assocs);
+    if (ret < 0) {
+        error_report("failed to set special registers");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_fpu_regs(int cpu_fd, const struct MshvFPU *regs)
+{
+    struct hv_register_assoc *assocs;
+    union hv_register_value *value;
+    size_t n_regs = sizeof(FPU_REGISTER_NAMES) / sizeof(enum hv_register_name);
+    size_t fp_i;
+    union hv_x64_fp_control_status_register *ctrl_status;
+    union hv_x64_xmm_control_status_register *xmm_ctrl_status;
+    int ret;
+
+    assocs = g_new0(struct hv_register_assoc, n_regs);
+
+    /* first 16 registers are xmm0-xmm15 */
+    for (size_t i = 0; i < 16; i++) {
+        assocs[i].name = FPU_REGISTER_NAMES[i];
+        value = &assocs[i].value;
+        memcpy(&value->reg128, &regs->xmm[i], 16);
+    }
+
+    /* next 8 registers are fp_mmx0-fp_mmx7 */
+    for (size_t i = 16; i < 24; i++) {
+        assocs[i].name = FPU_REGISTER_NAMES[i];
+        fp_i = (i - 16);
+        value = &assocs[i].value;
+        memcpy(&value->reg128, &regs->fpr[fp_i], 16);
+    }
+
+    /* last two registers are fp_control_status and xmm_control_status */
+    assocs[24].name = FPU_REGISTER_NAMES[24];
+    value = &assocs[24].value;
+    ctrl_status = &value->fp_control_status;
+    ctrl_status->fp_control = regs->fcw;
+    ctrl_status->fp_status = regs->fsw;
+    ctrl_status->fp_tag = regs->ftwx;
+    ctrl_status->reserved = 0;
+    ctrl_status->last_fp_op = regs->last_opcode;
+    ctrl_status->last_fp_rip = regs->last_ip;
+
+    assocs[25].name = FPU_REGISTER_NAMES[25];
+    value = &assocs[25].value;
+    xmm_ctrl_status = &value->xmm_control_status;
+    xmm_ctrl_status->xmm_status_control = regs->mxcsr;
+    xmm_ctrl_status->xmm_status_control_mask = 0;
+    xmm_ctrl_status->last_fp_rdp = regs->last_dp;
+
+    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+    g_free(assocs);
+    if (ret < 0) {
+        error_report("failed to set fpu registers");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_xc_reg(int cpu_fd, uint64_t xcr0)
+{
+    int ret;
+    struct hv_register_assoc assoc = {
+        .name = HV_X64_REGISTER_XFEM,
+        .value.reg64 = xcr0,
+    };
+
+    ret = mshv_set_generic_regs(cpu_fd, &assoc, 1);
+    if (ret < 0) {
+        error_report("failed to set xcr0");
+        return -errno;
+    }
+    return 0;
+}
+
+static int set_cpu_state(const CPUState *cpu, const MshvFPU *fpu_regs,
+                         uint64_t xcr0)
+{
+    int ret;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    ret = set_standard_regs(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = set_special_regs(cpu);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = set_fpu_regs(cpu_fd, fpu_regs);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = set_xc_reg(cpu_fd, xcr0);
+    if (ret < 0) {
+        return ret;
+    }
+    return 0;
+}
+
+/*
+ * TODO: populate topology info:
+ *
+ * X86CPU *x86cpu = X86_CPU(cpu);
+ * CPUX86State *env = &x86cpu->env;
+ * X86CPUTopoInfo *topo_info = &env->topo_info;
+ */
+int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
+                        uint64_t xcr0)
+{
+    int ret;
+
+    ret = set_cpu_state(cpu, fpu, xcr0);
+    if (ret < 0) {
+        error_report("failed to set cpu state");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int put_regs(const CPUState *cpu)
+{
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    MshvFPU fpu = {0};
+    int ret;
+
+    memset(&fpu, 0, sizeof(fpu));
+
+    ret = mshv_configure_vcpu(cpu, &fpu, env->xcr0);
+    if (ret < 0) {
+        error_report("failed to configure vcpu");
+        return ret;
+    }
+
+    return 0;
+}
+
 int mshv_arch_put_registers(const CPUState *cpu)
 {
+    int ret;
+
+    ret = put_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to put registers");
+        return -1;
+    }
+
 	error_report("unimplemented");
 	abort();
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 19/25] target/i386/mshv: Set local interrupt controller state
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (17 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 14:03   ` Paolo Bonzini
  2025-05-20 11:30 ` [RFC PATCH 20/25] target/i386/mshv: Register CPUID entries with MSHV Magnus Kulke
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

To set the local interrupt controller state, perform hv calls retrieving
partition state from the hypervisor.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/mshv-cpu.c | 120 ++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index ad42a09b99..dd856a2242 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -17,6 +17,7 @@
 #include "qemu/atomic.h"
 #include "qemu/lockable.h"
 #include "qemu/error-report.h"
+#include "qemu/memalign.h"
 #include "qemu/typedefs.h"
 
 #include "system/mshv.h"
@@ -108,6 +109,10 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
     HV_X64_REGISTER_XMM_CONTROL_STATUS,
 };
 
+/* Defines poached from apicdef.h kernel header. */
+static u_int32_t APIC_MODE_NMI = 0x4;
+static u_int32_t APIC_MODE_EXTINT = 0x7;
+
 static void add_cpu_guard(int cpu_fd)
 {
     QemuMutex *guard;
@@ -545,6 +550,114 @@ static int set_cpu_state(const CPUState *cpu, const MshvFPU *fpu_regs,
     return 0;
 }
 
+static int get_vp_state(int cpu_fd, mshv_get_set_vp_state *state)
+{
+    int ret;
+
+    ret = ioctl(cpu_fd, MSHV_GET_VP_STATE, state);
+    if (ret < 0) {
+        error_report("failed to get partition state: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int get_lapic(int cpu_fd,
+                     struct hv_local_interrupt_controller_state *state)
+{
+    int ret;
+    size_t size = 4096;
+    /* buffer aligned to 4k, as *state requires that */
+    void *buffer = qemu_memalign(size, size);
+    struct mshv_get_set_vp_state mshv_state = { 0 };
+
+    mshv_state.buf_ptr = (uint64_t) buffer;
+    mshv_state.buf_sz = size;
+    mshv_state.type = MSHV_VP_STATE_LAPIC;
+
+    ret = get_vp_state(cpu_fd, &mshv_state);
+    if (ret == 0) {
+        memcpy(state, buffer, sizeof(*state));
+    }
+    qemu_vfree(buffer);
+    if (ret < 0) {
+        error_report("failed to get lapic");
+        return -1;
+    }
+
+    return 0;
+}
+
+static uint32_t set_apic_delivery_mode(uint32_t reg, uint32_t mode)
+{
+    return ((reg) & ~0x700) | ((mode) << 8);
+}
+
+static int set_vp_state(int cpu_fd, const mshv_get_set_vp_state *state)
+{
+    int ret;
+
+    ret = ioctl(cpu_fd, MSHV_SET_VP_STATE, state);
+    if (ret < 0) {
+        error_report("failed to set partition state: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_lapic(int cpu_fd,
+                     const struct hv_local_interrupt_controller_state *state)
+{
+    int ret;
+    size_t size = 4096;
+    /* buffer aligned to 4k, as *state requires that */
+    void *buffer = qemu_memalign(size, size);
+    struct mshv_get_set_vp_state mshv_state = { 0 };
+
+    if (!state) {
+        error_report("lapic state is NULL");
+        return -1;
+    }
+    memcpy(buffer, state, sizeof(*state));
+
+    mshv_state.buf_ptr = (uint64_t) buffer;
+    mshv_state.buf_sz = size;
+    mshv_state.type = MSHV_VP_STATE_LAPIC;
+
+    ret = set_vp_state(cpu_fd, &mshv_state);
+    qemu_vfree(buffer);
+    if (ret < 0) {
+        error_report("failed to set lapic: %s", strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int set_lint(int cpu_fd)
+{
+    int ret;
+    uint32_t *lvt_lint0, *lvt_lint1;
+
+    struct hv_local_interrupt_controller_state lapic_state = { 0 };
+    ret = get_lapic(cpu_fd, &lapic_state);
+    if (ret < 0) {
+        return ret;
+    }
+
+    lvt_lint0 = &lapic_state.apic_lvt_lint0;
+    *lvt_lint0 = set_apic_delivery_mode(*lvt_lint0, APIC_MODE_EXTINT);
+
+    lvt_lint1 = &lapic_state.apic_lvt_lint1;
+    *lvt_lint1 = set_apic_delivery_mode(*lvt_lint1, APIC_MODE_NMI);
+
+    /* TODO: should we skip setting lapic if the values are the same? */
+
+    return set_lapic(cpu_fd, &lapic_state);
+}
+
 /*
  * TODO: populate topology info:
  *
@@ -556,6 +669,7 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
                         uint64_t xcr0)
 {
     int ret;
+    int cpu_fd = mshv_vcpufd(cpu);
 
     ret = set_cpu_state(cpu, fpu, xcr0);
     if (ret < 0) {
@@ -563,6 +677,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
         return -1;
     }
 
+    ret = set_lint(cpu_fd);
+    if (ret < 0) {
+        error_report("failed to set lpic int");
+        return -1;
+    }
+
     return 0;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 20/25] target/i386/mshv: Register CPUID entries with MSHV
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (18 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 19/25] target/i386/mshv: Set local interrupt controller state Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 11:30 ` [RFC PATCH 21/25] target/i386/mshv: Register MSRs " Magnus Kulke
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Convert the guest CPU's CPUID model into MSHV's format and register it
with the hypervisor. This ensures that the guest observes the correct
CPU feature set during CPUID instructions.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/mshv-cpu.c | 199 ++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)

diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index dd856a2242..4208f498cd 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -377,6 +377,199 @@ int mshv_load_regs(CPUState *cpu)
     return 0;
 }
 
+static void add_cpuid_entry(GList *cpuid_entries,
+                            uint32_t function, uint32_t index,
+                            uint32_t eax, uint32_t ebx,
+                            uint32_t ecx, uint32_t edx)
+{
+    struct hv_cpuid_entry *entry;
+
+    entry = g_malloc0(sizeof(struct hv_cpuid_entry));
+    entry->function = function;
+    entry->index = index;
+    entry->eax = eax;
+    entry->ebx = ebx;
+    entry->ecx = ecx;
+    entry->edx = edx;
+
+    cpuid_entries = g_list_append(cpuid_entries, entry);
+}
+
+static void collect_cpuid_entries(const CPUState *cpu, GList *cpuid_entries)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    uint32_t eax, ebx, ecx, edx;
+    uint32_t leaf, subleaf;
+    size_t max_leaf = 0x1F;
+    size_t max_subleaf = 0x20;
+
+    uint32_t leaves_with_subleaves[] = {0x4, 0x7, 0xD, 0xF, 0x10};
+    int n_subleaf_leaves = ARRAY_SIZE(leaves_with_subleaves);
+
+    /* Regular leaves without subleaves */
+    for (leaf = 0; leaf <= max_leaf; leaf++) {
+        bool has_subleaves = false;
+        for (int i = 0; i < n_subleaf_leaves; i++) {
+            if (leaf == leaves_with_subleaves[i]) {
+                has_subleaves = true;
+                break;
+            }
+        }
+
+        if (!has_subleaves) {
+            cpu_x86_cpuid(env, leaf, 0, &eax, &ebx, &ecx, &edx);
+            if (eax == 0 && ebx == 0 && ecx == 0 && edx == 0) {
+                /* all zeroes indicates no more leaves */
+                continue;
+            }
+
+            add_cpuid_entry(cpuid_entries, leaf, 0, eax, ebx, ecx, edx);
+            continue;
+        }
+
+        subleaf = 0;
+        while (subleaf < max_subleaf) {
+            cpu_x86_cpuid(env, leaf, subleaf, &eax, &ebx, &ecx, &edx);
+
+            if (eax == 0 && ebx == 0 && ecx == 0 && edx == 0) {
+                /* all zeroes indicates no more leaves */
+                break;
+            }
+            add_cpuid_entry(cpuid_entries, leaf, 0, eax, ebx, ecx, edx);
+            subleaf++;
+        }
+    }
+}
+
+static int register_intercept_result_cpuid_entry(int cpu_fd,
+                                                 uint8_t subleaf_specific,
+                                                 uint8_t always_override,
+                                                 struct hv_cpuid_entry *entry)
+{
+    struct hv_register_x64_cpuid_result_parameters cpuid_params = {
+        .input.eax = entry->function,
+        .input.ecx = entry->index,
+        .input.subleaf_specific = subleaf_specific,
+        .input.always_override = always_override,
+        .input.padding = 0,
+        /*
+         * With regard to masks - these are to specify bits to be overwritten
+         * The current CpuidEntry structure wouldn't allow to carry the masks
+         * in addition to the actual register values. For this reason, the
+         * masks are set to the exact values of the corresponding register bits
+         * to be registered for an overwrite. To view resulting values the
+         * hypervisor would return, HvCallGetVpCpuidValues hypercall can be
+         * used.
+         */
+        .result.eax = entry->eax,
+        .result.eax_mask = entry->eax,
+        .result.ebx = entry->ebx,
+        .result.ebx_mask = entry->ebx,
+        .result.ecx = entry->ecx,
+        .result.ecx_mask = entry->ecx,
+        .result.edx = entry->edx,
+        .result.edx_mask = entry->edx,
+    };
+    union hv_register_intercept_result_parameters parameters = {
+        .cpuid = cpuid_params,
+    };
+    struct mshv_register_intercept_result args = {
+        .intercept_type = HV_INTERCEPT_TYPE_X64_CPUID,
+        .parameters = parameters,
+    };
+    int ret;
+
+    ret = ioctl(cpu_fd, MSHV_VP_REGISTER_INTERCEPT_RESULT, &args);
+    if (ret < 0) {
+        error_report("failed to register intercept result for cpuid: %s",
+                     strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int register_intercept_result_cpuid(int cpu_fd, struct hv_cpuid *cpuid)
+{
+    int ret = 0, entry_ret;
+    struct hv_cpuid_entry *entry;
+    uint8_t subleaf_specific, always_override;
+
+    for (size_t i = 0; i < cpuid->nent; i++) {
+        entry = &cpuid->entries[i];
+
+        /* set defaults */
+        subleaf_specific = 0;
+        always_override = 1;
+
+        /* Intel */
+        /* 0xb - Extended Topology Enumeration Leaf */
+        /* 0x1f - V2 Extended Topology Enumeration Leaf */
+        /* AMD */
+        /* 0x8000_001e - Processor Topology Information */
+        /* 0x8000_0026 - Extended CPU Topology */
+        if (entry->function == 0xb
+            || entry->function == 0x1f
+            || entry->function == 0x8000001e
+            || entry->function == 0x80000026) {
+            subleaf_specific = 1;
+            always_override = 1;
+        } else if (entry->function == 0x00000001
+            || entry->function == 0x80000000
+            || entry->function == 0x80000001
+            || entry->function == 0x80000008) {
+            subleaf_specific = 0;
+            always_override = 1;
+        }
+
+        entry_ret = register_intercept_result_cpuid_entry(cpu_fd,
+                                                          subleaf_specific,
+                                                          always_override,
+                                                          entry);
+        if ((entry_ret < 0) && (ret == 0)) {
+            ret = entry_ret;
+        }
+    }
+
+    return ret;
+}
+
+static int set_cpuid2(const CPUState *cpu)
+{
+    int ret;
+    size_t n_entries, cpuid_size;
+    struct hv_cpuid *cpuid;
+    struct hv_cpuid_entry *entry;
+    GList *entries = NULL;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    collect_cpuid_entries(cpu, entries);
+    n_entries = g_list_length(entries);
+
+    cpuid_size = sizeof(struct hv_cpuid)
+        + n_entries * sizeof(struct hv_cpuid_entry);
+
+    cpuid = g_malloc0(cpuid_size);
+    cpuid->nent = n_entries;
+    cpuid->padding = 0;
+
+    for (size_t i = 0; i < n_entries; i++) {
+        entry = g_list_nth_data(entries, i);
+        cpuid->entries[i] = *entry;
+        g_free(entry);
+    }
+    g_list_free(entries);
+
+    ret = register_intercept_result_cpuid(cpu_fd, cpuid);
+    g_free(cpuid);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return 0;
+}
+
 static void populate_hv_segment_reg(SegmentCache *seg,
                                     hv_x64_segment_register *hv_reg)
 {
@@ -671,6 +864,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
     int ret;
     int cpu_fd = mshv_vcpufd(cpu);
 
+    ret = set_cpuid2(cpu);
+    if (ret < 0) {
+        error_report("failed to set cpuid");
+        return -1;
+    }
+
     ret = set_cpu_state(cpu, fpu, xcr0);
     if (ret < 0) {
         error_report("failed to set cpu state");
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 21/25] target/i386/mshv: Register MSRs with MSHV
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (19 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 20/25] target/i386/mshv: Register CPUID entries with MSHV Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 11:30 ` [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator Magnus Kulke
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Build and register the guest vCPU's model-specific registers using
the MSHV interface.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/meson.build      |   1 +
 accel/mshv/msr.c            | 375 ++++++++++++++++++++++++++++++++++++
 include/system/mshv.h       |  26 +++
 target/i386/mshv/mshv-cpu.c |  38 ++++
 4 files changed, 440 insertions(+)
 create mode 100644 accel/mshv/msr.c

diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index f88fc8678c..d3a2b32581 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -2,6 +2,7 @@ mshv_ss = ss.source_set()
 mshv_ss.add(if_true: files(
   'irq.c',
   'mem.c',
+  'msr.c',
   'mshv-all.c'
 ))
 
diff --git a/accel/mshv/msr.c b/accel/mshv/msr.c
new file mode 100644
index 0000000000..9967254aba
--- /dev/null
+++ b/accel/mshv/msr.c
@@ -0,0 +1,375 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ *  Magnus Kulke      <magnuskulke@microsoft.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "system/mshv.h"
+#include "hw/hyperv/linux-mshv.h"
+#include "qemu/error-report.h"
+
+static uint32_t supported_msrs[64] = {
+    IA32_MSR_TSC,
+    IA32_MSR_EFER,
+    IA32_MSR_KERNEL_GS_BASE,
+    IA32_MSR_APIC_BASE,
+    IA32_MSR_PAT,
+    IA32_MSR_SYSENTER_CS,
+    IA32_MSR_SYSENTER_ESP,
+    IA32_MSR_SYSENTER_EIP,
+    IA32_MSR_STAR,
+    IA32_MSR_LSTAR,
+    IA32_MSR_CSTAR,
+    IA32_MSR_SFMASK,
+    IA32_MSR_MTRR_DEF_TYPE,
+    IA32_MSR_MTRR_PHYSBASE0,
+    IA32_MSR_MTRR_PHYSMASK0,
+    IA32_MSR_MTRR_PHYSBASE1,
+    IA32_MSR_MTRR_PHYSMASK1,
+    IA32_MSR_MTRR_PHYSBASE2,
+    IA32_MSR_MTRR_PHYSMASK2,
+    IA32_MSR_MTRR_PHYSBASE3,
+    IA32_MSR_MTRR_PHYSMASK3,
+    IA32_MSR_MTRR_PHYSBASE4,
+    IA32_MSR_MTRR_PHYSMASK4,
+    IA32_MSR_MTRR_PHYSBASE5,
+    IA32_MSR_MTRR_PHYSMASK5,
+    IA32_MSR_MTRR_PHYSBASE6,
+    IA32_MSR_MTRR_PHYSMASK6,
+    IA32_MSR_MTRR_PHYSBASE7,
+    IA32_MSR_MTRR_PHYSMASK7,
+    IA32_MSR_MTRR_FIX64K_00000,
+    IA32_MSR_MTRR_FIX16K_80000,
+    IA32_MSR_MTRR_FIX16K_A0000,
+    IA32_MSR_MTRR_FIX4K_C0000,
+    IA32_MSR_MTRR_FIX4K_C8000,
+    IA32_MSR_MTRR_FIX4K_D0000,
+    IA32_MSR_MTRR_FIX4K_D8000,
+    IA32_MSR_MTRR_FIX4K_E0000,
+    IA32_MSR_MTRR_FIX4K_E8000,
+    IA32_MSR_MTRR_FIX4K_F0000,
+    IA32_MSR_MTRR_FIX4K_F8000,
+    IA32_MSR_TSC_AUX,
+    IA32_MSR_DEBUG_CTL,
+    HV_X64_MSR_GUEST_OS_ID,
+    HV_X64_MSR_SINT0,
+    HV_X64_MSR_SINT1,
+    HV_X64_MSR_SINT2,
+    HV_X64_MSR_SINT3,
+    HV_X64_MSR_SINT4,
+    HV_X64_MSR_SINT5,
+    HV_X64_MSR_SINT6,
+    HV_X64_MSR_SINT7,
+    HV_X64_MSR_SINT8,
+    HV_X64_MSR_SINT9,
+    HV_X64_MSR_SINT10,
+    HV_X64_MSR_SINT11,
+    HV_X64_MSR_SINT12,
+    HV_X64_MSR_SINT13,
+    HV_X64_MSR_SINT14,
+    HV_X64_MSR_SINT15,
+    HV_X64_MSR_SCONTROL,
+    HV_X64_MSR_SIEFP,
+    HV_X64_MSR_SIMP,
+    HV_X64_MSR_REFERENCE_TSC,
+    HV_X64_MSR_EOM,
+};
+static const size_t msr_count = ARRAY_SIZE(supported_msrs);
+
+static int compare_msr_index(const void *a, const void *b)
+{
+    return *(uint32_t *)a - *(uint32_t *)b;
+}
+
+__attribute__((constructor))
+static void init_sorted_msr_map(void)
+{
+    qsort(supported_msrs, msr_count, sizeof(uint32_t), compare_msr_index);
+}
+
+static int mshv_is_supported_msr(uint32_t msr)
+{
+    return bsearch(&msr, supported_msrs, msr_count, sizeof(uint32_t),
+                   compare_msr_index) != NULL;
+}
+
+static int mshv_msr_to_hv_reg_name(uint32_t msr, uint32_t *hv_reg)
+{
+    switch (msr) {
+    case IA32_MSR_TSC:
+        *hv_reg = HV_X64_REGISTER_TSC;
+        return 0;
+    case IA32_MSR_EFER:
+        *hv_reg = HV_X64_REGISTER_EFER;
+        return 0;
+    case IA32_MSR_KERNEL_GS_BASE:
+        *hv_reg = HV_X64_REGISTER_KERNEL_GS_BASE;
+        return 0;
+    case IA32_MSR_APIC_BASE:
+        *hv_reg = HV_X64_REGISTER_APIC_BASE;
+        return 0;
+    case IA32_MSR_PAT:
+        *hv_reg = HV_X64_REGISTER_PAT;
+        return 0;
+    case IA32_MSR_SYSENTER_CS:
+        *hv_reg = HV_X64_REGISTER_SYSENTER_CS;
+        return 0;
+    case IA32_MSR_SYSENTER_ESP:
+        *hv_reg = HV_X64_REGISTER_SYSENTER_ESP;
+        return 0;
+    case IA32_MSR_SYSENTER_EIP:
+        *hv_reg = HV_X64_REGISTER_SYSENTER_EIP;
+        return 0;
+    case IA32_MSR_STAR:
+        *hv_reg = HV_X64_REGISTER_STAR;
+        return 0;
+    case IA32_MSR_LSTAR:
+        *hv_reg = HV_X64_REGISTER_LSTAR;
+        return 0;
+    case IA32_MSR_CSTAR:
+        *hv_reg = HV_X64_REGISTER_CSTAR;
+        return 0;
+    case IA32_MSR_SFMASK:
+        *hv_reg = HV_X64_REGISTER_SFMASK;
+        return 0;
+    case IA32_MSR_MTRR_CAP:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_CAP;
+        return 0;
+    case IA32_MSR_MTRR_DEF_TYPE:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_DEF_TYPE;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE0:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK0:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE1:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE1;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK1:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK1;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE2:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE2;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK2:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK2;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE3:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE3;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK3:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK3;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE4:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE4;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK4:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK4;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE5:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE5;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK5:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK5;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE6:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE6;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK6:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK6;
+        return 0;
+    case IA32_MSR_MTRR_PHYSBASE7:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE7;
+        return 0;
+    case IA32_MSR_MTRR_PHYSMASK7:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK7;
+        return 0;
+    case IA32_MSR_MTRR_FIX64K_00000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX64K00000;
+        return 0;
+    case IA32_MSR_MTRR_FIX16K_80000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX16K80000;
+        return 0;
+    case IA32_MSR_MTRR_FIX16K_A0000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX16KA0000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_C0000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KC0000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_C8000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KC8000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_D0000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KD0000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_D8000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KD8000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_E0000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KE0000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_E8000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KE8000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_F0000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KF0000;
+        return 0;
+    case IA32_MSR_MTRR_FIX4K_F8000:
+        *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KF8000;
+        return 0;
+    case IA32_MSR_TSC_AUX:
+        *hv_reg = HV_X64_REGISTER_TSC_AUX;
+        return 0;
+    case IA32_MSR_BNDCFGS:
+        *hv_reg = HV_X64_REGISTER_BNDCFGS;
+        return 0;
+    case IA32_MSR_DEBUG_CTL:
+        *hv_reg = HV_X64_REGISTER_DEBUG_CTL;
+        return 0;
+    case IA32_MSR_TSC_ADJUST:
+        *hv_reg = HV_X64_REGISTER_TSC_ADJUST;
+        return 0;
+    case IA32_MSR_SPEC_CTRL:
+        *hv_reg = HV_X64_REGISTER_SPEC_CTRL;
+        return 0;
+    case HV_X64_MSR_GUEST_OS_ID:
+        *hv_reg = HV_REGISTER_GUEST_OS_ID;
+        return 0;
+    case HV_X64_MSR_SINT0:
+        *hv_reg = HV_REGISTER_SINT0;
+        return 0;
+    case HV_X64_MSR_SINT1:
+        *hv_reg = HV_REGISTER_SINT1;
+        return 0;
+    case HV_X64_MSR_SINT2:
+        *hv_reg = HV_REGISTER_SINT2;
+        return 0;
+    case HV_X64_MSR_SINT3:
+        *hv_reg = HV_REGISTER_SINT3;
+        return 0;
+    case HV_X64_MSR_SINT4:
+        *hv_reg = HV_REGISTER_SINT4;
+        return 0;
+    case HV_X64_MSR_SINT5:
+        *hv_reg = HV_REGISTER_SINT5;
+        return 0;
+    case HV_X64_MSR_SINT6:
+        *hv_reg = HV_REGISTER_SINT6;
+        return 0;
+    case HV_X64_MSR_SINT7:
+        *hv_reg = HV_REGISTER_SINT7;
+        return 0;
+    case HV_X64_MSR_SINT8:
+        *hv_reg = HV_REGISTER_SINT8;
+        return 0;
+    case HV_X64_MSR_SINT9:
+        *hv_reg = HV_REGISTER_SINT9;
+        return 0;
+    case HV_X64_MSR_SINT10:
+        *hv_reg = HV_REGISTER_SINT10;
+        return 0;
+    case HV_X64_MSR_SINT11:
+        *hv_reg = HV_REGISTER_SINT11;
+        return 0;
+    case HV_X64_MSR_SINT12:
+        *hv_reg = HV_REGISTER_SINT12;
+        return 0;
+    case HV_X64_MSR_SINT13:
+        *hv_reg = HV_REGISTER_SINT13;
+        return 0;
+    case HV_X64_MSR_SINT14:
+        *hv_reg = HV_REGISTER_SINT14;
+        return 0;
+    case HV_X64_MSR_SINT15:
+        *hv_reg = HV_REGISTER_SINT15;
+        return 0;
+    case IA32_MSR_MISC_ENABLE:
+        *hv_reg = HV_X64_REGISTER_MSR_IA32_MISC_ENABLE;
+        return 0;
+    case HV_X64_MSR_SCONTROL:
+        *hv_reg = HV_REGISTER_SCONTROL;
+        return 0;
+    case HV_X64_MSR_SIEFP:
+        *hv_reg = HV_REGISTER_SIEFP;
+        return 0;
+    case HV_X64_MSR_SIMP:
+        *hv_reg = HV_REGISTER_SIMP;
+        return 0;
+    case HV_X64_MSR_REFERENCE_TSC:
+        *hv_reg = HV_REGISTER_REFERENCE_TSC;
+        return 0;
+    case HV_X64_MSR_EOM:
+        *hv_reg = HV_REGISTER_EOM;
+        return 0;
+    default:
+        error_report("failed to map MSR %u to HV register name", msr);
+        return -1;
+    }
+}
+
+static int set_msrs(int cpu_fd, GList *msrs)
+{
+    size_t n_msrs;
+    GList *entries;
+    MshvMsrEntry *entry;
+    enum hv_register_name name;
+    struct hv_register_assoc *assoc;
+    int ret;
+    size_t i = 0;
+
+    n_msrs = g_list_length(msrs);
+    hv_register_assoc *assocs = g_new0(hv_register_assoc, n_msrs);
+
+    entries = msrs;
+    for (const GList *elem = entries; elem != NULL; elem = elem->next) {
+        entry = elem->data;
+        ret = mshv_msr_to_hv_reg_name(entry->index, &name);
+        if (ret < 0) {
+            g_free(assocs);
+            return ret;
+        }
+        assoc = &assocs[i];
+        assoc->name = name;
+        /* the union has been initialized to 0 */
+        assoc->value.reg64 = entry->data;
+        i++;
+    }
+    ret = mshv_set_generic_regs(cpu_fd, assocs, n_msrs);
+    g_free(assocs);
+    if (ret < 0) {
+        error_report("failed to set msrs");
+        return -1;
+    }
+    return 0;
+}
+
+
+int mshv_configure_msr(int cpu_fd, const MshvMsrEntry *msrs, size_t n_msrs)
+{
+    GList *valid_msrs = NULL;
+    uint32_t msr_index;
+    int ret;
+
+    for (size_t i = 0; i < n_msrs; i++) {
+        msr_index = msrs[i].index;
+        /* check whether index of msrs is in SUPPORTED_MSRS */
+        if (mshv_is_supported_msr(msr_index)) {
+            valid_msrs = g_list_append(valid_msrs, (void *) &msrs[i]);
+        }
+    }
+
+    ret = set_msrs(cpu_fd, valid_msrs);
+    g_list_free(valid_msrs);
+
+    return ret;
+}
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 76a3b0010e..f854f9b77d 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -48,6 +48,8 @@ typedef struct hyperv_message hv_message;
 
 #define MSHV_PAGE_SHIFT 12
 
+#define MSHV_MSR_ENTRIES_COUNT 64
+
 
 #ifdef CONFIG_MSHV_IS_POSSIBLE
 extern bool mshv_allowed;
@@ -162,8 +164,32 @@ void mshv_arch_amend_proc_features(
     union hv_partition_synthetic_processor_features *features);
 int mshv_arch_post_init_vm(int vm_fd);
 
+/* pio */
+int mshv_pio_write(uint64_t port, const uint8_t *data, uintptr_t size,
+                   bool is_secure_mode);
+void mshv_pio_read(uint64_t port, uint8_t *data, uintptr_t size,
+                   bool is_secure_mode);
+
+/* generic */
+enum MshvMiscError {
+    MSHV_USERSPACE_ADDR_REMAP_ERROR = 2001,
+};
+
 int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args);
 
+/* msr */
+typedef struct MshvMsrEntry {
+  uint32_t index;
+  uint32_t reserved;
+  uint64_t data;
+} MshvMsrEntry;
+
+typedef struct MshvMsrEntries {
+    MshvMsrEntry entries[MSHV_MSR_ENTRIES_COUNT];
+    uint32_t nmsrs;
+} MshvMsrEntries;
+
+int mshv_configure_msr(int cpu_fd, const MshvMsrEntry *msrs, size_t n_msrs);
 
 /* memory */
 typedef struct MshvMemoryRegion {
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 4208f498cd..081132e0c9 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -109,6 +109,11 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
     HV_X64_REGISTER_XMM_CONTROL_STATUS,
 };
 
+/* MTRR constants */
+/* IA32_MTRR_DEF_TYPE MSR: E (MTRRs enabled) flag, bit 11 */
+static u_int64_t MTRR_ENABLE = 0x800;
+static u_int64_t MTRR_MEM_TYPE_WB = 0x6;
+
 /* Defines poached from apicdef.h kernel header. */
 static u_int32_t APIC_MODE_NMI = 0x4;
 static u_int32_t APIC_MODE_EXTINT = 0x7;
@@ -851,6 +856,33 @@ static int set_lint(int cpu_fd)
     return set_lapic(cpu_fd, &lapic_state);
 }
 
+static int setup_msrs(int cpu_fd)
+{
+    int ret;
+    uint64_t default_type = MTRR_ENABLE | MTRR_MEM_TYPE_WB;
+
+    /* boot msr entries */
+    MshvMsrEntry msrs[9] = {
+        { .index = IA32_MSR_SYSENTER_CS, .data = 0x0, },
+        { .index = IA32_MSR_SYSENTER_ESP, .data = 0x0, },
+        { .index = IA32_MSR_SYSENTER_EIP, .data = 0x0, },
+        { .index = IA32_MSR_STAR, .data = 0x0, },
+        { .index = IA32_MSR_CSTAR, .data = 0x0, },
+        { .index = IA32_MSR_LSTAR, .data = 0x0, },
+        { .index = IA32_MSR_KERNEL_GS_BASE, .data = 0x0, },
+        { .index = IA32_MSR_SFMASK, .data = 0x0, },
+        { .index = IA32_MSR_MTRR_DEF_TYPE, .data = default_type, },
+    };
+
+    ret = mshv_configure_msr(cpu_fd, msrs, 9);
+    if (ret < 0) {
+        error_report("failed to setup msrs");
+        return -1;
+    }
+
+    return 0;
+}
+
 /*
  * TODO: populate topology info:
  *
@@ -870,6 +902,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
         return -1;
     }
 
+    ret = setup_msrs(cpu_fd);
+    if (ret < 0) {
+        error_report("failed to setup msrs");
+        return -1;
+    }
+
     ret = set_cpu_state(cpu, fpu, xcr0);
     if (ret < 0) {
         error_report("failed to set cpu state");
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (20 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 21/25] target/i386/mshv: Register MSRs " Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 22:38   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 23/25] target/i386/mshv: Write MSRs to the hypervisor Magnus Kulke
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Connect the x86 instruction decoder and emulator to the MSHV backend
to handle intercepted instructions. This enables software emulation
of MMIO operations in MSHV guests. MSHV has a translate_gva hypercall
that is used to accessing the physical guest memory.

A guest might read from unmapped memory regions (e.g. OVMF will probe
0xfed40000 for a vTPM). In those cases 0xFF bytes is returned instead of
aborting the execution.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mem.c            |  72 ++++++++++++++++++++
 accel/mshv/trace-events     |   3 +
 include/system/mshv.h       |   4 ++
 target/i386/mshv/mshv-cpu.c | 127 ++++++++++++++++++++++++++++++++++++
 4 files changed, 206 insertions(+)

diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index 2bbeae4f4a..ee627e7bd6 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -54,6 +54,78 @@ static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool add)
     return set_guest_memory(vm_fd, &region);
 }
 
+static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
+{
+    MemTxAttrs memattr = {0};
+    memattr.secure = is_secure_mode;
+    return memattr;
+}
+
+static int handle_unmapped_mmio_region_read(uint64_t gpa, uint64_t size,
+                                            uint8_t *data)
+{
+    warn_report("read from unmapped mmio region gpa=0x%lx size=%lu", gpa, size);
+
+    if (size == 0 || size > 8) {
+        error_report("invalid size %lu for reading from unmapped mmio region",
+                     size);
+        return -1;
+    }
+
+    memset(data, 0xFF, size);
+
+    return 0;
+}
+
+int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
+                        bool is_secure_mode, bool instruction_fetch)
+{
+    int ret;
+    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
+
+    if (instruction_fetch) {
+        trace_mshv_insn_fetch(gpa, size);
+    } else {
+        trace_mshv_mem_read(gpa, size);
+    }
+
+    ret = address_space_rw(&address_space_memory, gpa, memattr, (void *)data,
+                           size, false);
+    if (ret == MEMTX_OK) {
+        return 0;
+    }
+
+    if (ret == MEMTX_DECODE_ERROR) {
+        return handle_unmapped_mmio_region_read(gpa, size, data);
+    }
+
+    error_report("failed to read guest memory at 0x%lx", gpa);
+    return -1;
+}
+
+int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
+                         bool is_secure_mode)
+{
+    int ret;
+
+    trace_mshv_mem_write(gpa, size);
+    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
+    ret = address_space_rw(&address_space_memory, gpa, memattr, (void *)data,
+                           size, true);
+    if (ret == MEMTX_OK) {
+        return 0;
+    }
+
+    if (ret == MEMTX_DECODE_ERROR) {
+        warn_report("write to unmapped mmio region gpa=0x%lx size=%lu", gpa,
+                    size);
+        return 0;
+    }
+
+    error_report("Failed to write guest memory");
+    return -1;
+}
+
 static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
 {
     int ret = 0;
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index 06aa27ef67..686d89e084 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -15,3 +15,6 @@ mshv_commit_msi_routing_table(int vm_fd, int len) "vm_fd %d table_size %d"
 mshv_register_irqfd(int vm_fd, int event_fd, uint32_t gsi) "vm_fd %d event_fd %d gsi %d"
 mshv_irqchip_update_irqfd_notifier_gsi(int event_fd, int resample_fd, int virq, bool add) "event_fd %d resample_fd %d virq %d add %d"
 
+mshv_insn_fetch(uint64_t addr, size_t size) "gpa=%lx size=%lu"
+mshv_mem_write(uint64_t addr, size_t size) "\tgpa=%lx size=%lu"
+mshv_mem_read(uint64_t addr, size_t size) "\tgpa=%lx size=%lu"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index f854f9b77d..622b3db540 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -201,6 +201,10 @@ typedef struct MshvMemoryRegion {
 
 int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
 int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
+int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
+                        bool is_secure_mode, bool instruction_fetch);
+int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
+                         bool is_secure_mode);
 void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
                        bool add);
 /* interrupt */
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 081132e0c9..a7ee5ebb2a 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -995,11 +995,138 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
     return 0;
 }
 
+static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
+                         uint64_t flags)
+{
+    int ret;
+    union hv_translate_gva_result result = { 0 };
+
+    *gpa = 0;
+    mshv_translate_gva args = {
+        .gva = gva,
+        .flags = flags,
+        .gpa = (__u64 *)gpa,
+        .result = &result,
+    };
+
+    ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
+    if (ret < 0) {
+        error_report("failed to invoke gpa->gva translation");
+        return -errno;
+    }
+    if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
+        error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
+        return -1;
+
+    }
+
+    return 0;
+}
+
+static int guest_mem_read_with_gva(const CPUState *cpu, uint64_t gva,
+                                   uint8_t *data, uintptr_t size,
+                                   bool fetch_instruction)
+{
+    int ret;
+    uint64_t gpa, flags;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    flags = HV_TRANSLATE_GVA_VALIDATE_READ;
+    ret = translate_gva(cpu_fd, gva, &gpa, flags);
+    if (ret < 0) {
+        error_report("failed to translate gva to gpa");
+        return -1;
+    }
+
+    ret = mshv_guest_mem_read(gpa, data, size, false, fetch_instruction);
+    if (ret < 0) {
+        error_report("failed to read from guest memory");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int guest_mem_write_with_gva(const CPUState *cpu, uint64_t gva,
+                                    const uint8_t *data, uintptr_t size)
+{
+    int ret;
+    uint64_t gpa, flags;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
+    ret = translate_gva(cpu_fd, gva, &gpa, flags);
+    if (ret < 0) {
+        error_report("failed to translate gva to gpa");
+        return -1;
+    }
+    ret = mshv_guest_mem_write(gpa, data, size, false);
+    if (ret < 0) {
+        error_report("failed to write to guest memory");
+        return -1;
+    }
+    return 0;
+}
+
+static void write_mem_emu(CPUState *cpu, void *data, target_ulong addr,
+                          int bytes)
+{
+    if (guest_mem_write_with_gva(cpu, addr, data, bytes) < 0) {
+        error_report("failed to write memory");
+        abort();
+    }
+}
+
+static void read_mem_emu(CPUState *cpu, void *data, target_ulong addr,
+                         int bytes)
+{
+    if (guest_mem_read_with_gva(cpu, addr, data, bytes, false) < 0) {
+        error_report("failed to read memory");
+        abort();
+    }
+}
+
+static void fetch_instruction_emu(CPUState *cpu, void *data,
+                                  target_ulong addr, int bytes)
+{
+    if (guest_mem_read_with_gva(cpu, addr, data, bytes, true) < 0) {
+        error_report("failed to fetch instruction");
+        abort();
+    }
+}
+
+static void read_segment_descriptor_emu(CPUState *cpu,
+                                        struct x86_segment_descriptor *desc,
+                                        enum X86Seg seg_idx)
+{
+    bool ret;
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    SegmentCache *seg = &env->segs[seg_idx];
+    x86_segment_selector sel = { .sel = seg->selector & 0xFFFF };
+
+    ret = x86_read_segment_descriptor(cpu, desc, sel);
+    if (ret == false) {
+        error_report("failed to read segment descriptor");
+        abort();
+    }
+}
+
+static const struct x86_emul_ops mshv_x86_emul_ops = {
+    .fetch_instruction = fetch_instruction_emu,
+    .read_mem = read_mem_emu,
+    .write_mem = write_mem_emu,
+    .read_segment_descriptor = read_segment_descriptor_emu,
+};
+
 void mshv_init_cpu_logic(void)
 {
     cpu_guards_lock = g_new0(QemuMutex, 1);
     qemu_mutex_init(cpu_guards_lock);
     cpu_guards = g_hash_table_new(g_direct_hash, g_direct_equal);
+
+    init_decoder();
+    init_emu(&mshv_x86_emul_ops);
 }
 
 void mshv_arch_init_vcpu(CPUState *cpu)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 23/25] target/i386/mshv: Write MSRs to the hypervisor
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (21 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 11:30 ` [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Push current model-specific register (MSR) values to MSHV's vCPUs as
part of setting state to the hypervisor.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/mshv-cpu.c | 70 +++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index a7ee5ebb2a..fdc7e5e019 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -118,6 +118,8 @@ static u_int64_t MTRR_MEM_TYPE_WB = 0x6;
 static u_int32_t APIC_MODE_NMI = 0x4;
 static u_int32_t APIC_MODE_EXTINT = 0x7;
 
+#define MSR_ENTRIES_COUNT 64
+
 static void add_cpu_guard(int cpu_fd)
 {
     QemuMutex *guard;
@@ -941,6 +943,65 @@ static int put_regs(const CPUState *cpu)
     return 0;
 }
 
+struct MsrPair {
+    uint32_t index;
+    uint64_t value;
+};
+
+static int put_msrs(const CPUState *cpu)
+{
+    int ret = 0;
+    X86CPU *x86cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86cpu->env;
+    MshvMsrEntries *msrs = g_malloc0(sizeof(MshvMsrEntries));
+
+    struct MsrPair pairs[] = {
+        { MSR_IA32_SYSENTER_CS,    env->sysenter_cs },
+        { MSR_IA32_SYSENTER_ESP,   env->sysenter_esp },
+        { MSR_IA32_SYSENTER_EIP,   env->sysenter_eip },
+        { MSR_EFER,                env->efer },
+        { MSR_PAT,                 env->pat },
+        { MSR_STAR,                env->star },
+        { MSR_CSTAR,               env->cstar },
+        { MSR_LSTAR,               env->lstar },
+        { MSR_KERNELGSBASE,        env->kernelgsbase },
+        { MSR_FMASK,               env->fmask },
+        { MSR_MTRRdefType,         env->mtrr_deftype },
+        { MSR_VM_HSAVE_PA,         env->vm_hsave },
+        { MSR_SMI_COUNT,           env->msr_smi_count },
+        { MSR_IA32_PKRS,           env->pkrs },
+        { MSR_IA32_BNDCFGS,        env->msr_bndcfgs },
+        { MSR_IA32_XSS,            env->xss },
+        { MSR_IA32_UMWAIT_CONTROL, env->umwait },
+        { MSR_IA32_TSX_CTRL,       env->tsx_ctrl },
+        { MSR_AMD64_TSC_RATIO,     env->amd_tsc_scale_msr },
+        { MSR_TSC_AUX,             env->tsc_aux },
+        { MSR_TSC_ADJUST,          env->tsc_adjust },
+        { MSR_IA32_SMBASE,         env->smbase },
+        { MSR_IA32_SPEC_CTRL,      env->spec_ctrl },
+        { MSR_VIRT_SSBD,           env->virt_ssbd },
+    };
+
+    if (ARRAY_SIZE(pairs) > MSR_ENTRIES_COUNT) {
+        error_report("MSR entries exceed maximum size");
+        g_free(msrs);
+        return -1;
+    }
+
+    for (size_t i = 0; i < ARRAY_SIZE(pairs); i++) {
+        MshvMsrEntry *entry = &msrs->entries[i];
+        entry->index = pairs[i].index;
+        entry->reserved = 0;
+        entry->data = pairs[i].value;
+        msrs->nmsrs++;
+    }
+
+    ret = mshv_configure_msr(mshv_vcpufd(cpu), &msrs->entries[0], msrs->nmsrs);
+    g_free(msrs);
+    return ret;
+}
+
+
 int mshv_arch_put_registers(const CPUState *cpu)
 {
     int ret;
@@ -951,8 +1012,13 @@ int mshv_arch_put_registers(const CPUState *cpu)
         return -1;
     }
 
-	error_report("unimplemented");
-	abort();
+    ret = put_msrs(cpu);
+    if (ret < 0) {
+        error_report("Failed to put msrs");
+        return -1;
+    }
+
+    return 0;
 }
 
 void mshv_arch_amend_proc_features(
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (22 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 23/25] target/i386/mshv: Write MSRs to the hypervisor Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 13:21   ` Paolo Bonzini
  2025-05-20 22:52   ` Wei Liu
  2025-05-20 11:30 ` [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround Magnus Kulke
  2025-05-20 14:25 ` [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Paolo Bonzini
  25 siblings, 2 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Add the main vCPU execution loop for MSHV using the MSHV_RUN_VP ioctl.

A translate_gva() hypercall is implemented. The execution loop handles
guest entry and VM exits. There are handlers for memory r/w, PIO and
MMIO to which the exit events are dispatched.

In case of MMIO the i386 instruction decoder/emulator is invoked to
perform the operation in user space.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 target/i386/mshv/mshv-cpu.c | 554 ++++++++++++++++++++++++++++++++++--
 1 file changed, 524 insertions(+), 30 deletions(-)

diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index fdc7e5e019..27c6cd6138 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -21,6 +21,7 @@
 #include "qemu/typedefs.h"
 
 #include "system/mshv.h"
+#include "system/address-spaces.h"
 #include "hw/hyperv/linux-mshv.h"
 #include "hw/hyperv/hvhdk_mini.h"
 #include "hw/hyperv/hvgdk.h"
@@ -145,6 +146,34 @@ static void remove_cpu_guard(int cpu_fd)
     }
 }
 
+static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
+                         uint64_t flags)
+{
+    int ret;
+    union hv_translate_gva_result result = { 0 };
+
+    *gpa = 0;
+    mshv_translate_gva args = {
+        .gva = gva,
+        .flags = flags,
+        .gpa = (__u64 *)gpa,
+        .result = &result,
+    };
+
+    ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
+    if (ret < 0) {
+        error_report("failed to invoke gpa->gva translation");
+        return -errno;
+    }
+    if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
+        error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
+        return -1;
+
+    }
+
+    return 0;
+}
+
 int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
 {
     struct mshv_vp_registers input = {
@@ -1027,10 +1056,503 @@ void mshv_arch_amend_proc_features(
     features->access_guest_idle_reg = 1;
 }
 
+static int set_memory_info(const struct hyperv_message *msg,
+                           struct hv_x64_memory_intercept_message *info)
+{
+    if (msg->header.message_type != HVMSG_GPA_INTERCEPT
+            && msg->header.message_type != HVMSG_UNMAPPED_GPA
+            && msg->header.message_type != HVMSG_UNACCEPTED_GPA) {
+        error_report("invalid message type");
+        return -1;
+    }
+    memcpy(info, msg->payload, sizeof(*info));
+
+    return 0;
+}
+
+static int emulate_instruction(CPUState *cpu,
+                               const uint8_t *insn_bytes, size_t insn_len,
+                               uint64_t gva, uint64_t gpa)
+{
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    struct x86_decode decode = { 0 };
+    int ret;
+    int cpu_fd = mshv_vcpufd(cpu);
+    QemuMutex *guard;
+    x86_insn_stream stream = { .bytes = insn_bytes, .len = insn_len };
+
+    guard = g_hash_table_lookup(cpu_guards, GUINT_TO_POINTER(cpu_fd));
+    if (!guard) {
+        error_report("failed to get cpu guard");
+        return -1;
+    }
+
+    WITH_QEMU_LOCK_GUARD(guard) {
+        ret = mshv_load_regs(cpu);
+        if (ret < 0) {
+            error_report("failed to load registers");
+            return -1;
+        }
+
+        decode_instruction_stream(env, &decode, &stream);
+        exec_instruction(env, &decode);
+
+        ret = mshv_store_regs(cpu);
+        if (ret < 0) {
+            error_report("failed to store registers");
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static int handle_mmio(CPUState *cpu, const struct hyperv_message *msg,
+                       MshvVmExit *exit_reason)
+{
+    struct hv_x64_memory_intercept_message info = { 0 };
+    size_t insn_len;
+    uint8_t access_type;
+    uint8_t *instruction_bytes;
+    int ret;
+
+    ret = set_memory_info(msg, &info);
+    if (ret < 0) {
+        error_report("failed to convert message to memory info");
+        return -1;
+    }
+    insn_len = info.instruction_byte_count;
+    access_type = info.header.intercept_access_type;
+
+    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_EXECUTE) {
+        error_report("invalid intercept access type: execute");
+        return -1;
+    }
+
+    if (insn_len > 16) {
+        error_report("invalid mmio instruction length: %zu", insn_len);
+        return -1;
+    }
+
+    if (insn_len == 0) {
+        warn_report("mmio instruction buffer empty");
+    }
+
+    instruction_bytes = info.instruction_bytes;
+
+    ret = emulate_instruction(cpu, instruction_bytes, insn_len,
+                              info.guest_virtual_address,
+                              info.guest_physical_address);
+    if (ret < 0) {
+        error_report("failed to emulate mmio");
+        return -1;
+    }
+
+    *exit_reason = MshvVmExitIgnore;
+
+    return 0;
+}
+
+static int handle_unmapped_mem(int vm_fd, CPUState *cpu,
+                               const struct hyperv_message *msg,
+                               MshvVmExit *exit_reason)
+{
+    struct hv_x64_memory_intercept_message info = { 0 };
+    int ret;
+
+    ret = set_memory_info(msg, &info);
+    if (ret < 0) {
+        error_report("failed to convert message to memory info");
+        return -1;
+    }
+
+    return handle_mmio(cpu, msg, exit_reason);
+}
+
+static int set_ioport_info(const struct hyperv_message *msg,
+                           hv_x64_io_port_intercept_message *info)
+{
+    if (msg->header.message_type != HVMSG_X64_IO_PORT_INTERCEPT) {
+        error_report("Invalid message type");
+        return -1;
+    }
+    memcpy(info, msg->payload, sizeof(*info));
+
+    return 0;
+}
+
+typedef struct X64Registers {
+  const uint32_t *names;
+  const uint64_t *values;
+  uintptr_t count;
+} X64Registers;
+
+static int set_x64_registers(int cpu_fd, const X64Registers *regs)
+{
+    size_t n_regs = regs->count;
+    struct hv_register_assoc *assocs;
+
+    assocs = g_new0(hv_register_assoc, n_regs);
+    for (size_t i = 0; i < n_regs; i++) {
+        assocs[i].name = regs->names[i];
+        assocs[i].value.reg64 = regs->values[i];
+    }
+    int ret;
+
+    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+    g_free(assocs);
+    if (ret < 0) {
+        error_report("failed to set x64 registers");
+        return -1;
+    }
+
+    return 0;
+}
+
+static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
+{
+    MemTxAttrs memattr = {0};
+    memattr.secure = is_secure_mode;
+    return memattr;
+}
+
+static void pio_read(uint64_t port, uint8_t *data, uintptr_t size,
+                     bool is_secure_mode)
+{
+    int ret = 0;
+    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
+    ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
+                           false);
+    if (ret != MEMTX_OK) {
+        error_report("Failed to read from port %lx: %d", port, ret);
+        abort();
+    }
+}
+
+static int pio_write(uint64_t port, const uint8_t *data, uintptr_t size,
+                     bool is_secure_mode)
+{
+    int ret = 0;
+    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
+    ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
+                           true);
+    return ret;
+}
+
+static int handle_pio_non_str(const CPUState *cpu,
+                              hv_x64_io_port_intercept_message *info) {
+    size_t len = info->access_info.access_size;
+    uint8_t access_type = info->header.intercept_access_type;
+    int ret;
+    uint32_t val, eax;
+    const uint32_t eax_mask =  0xffffffffu >> (32 - len * 8);
+    size_t insn_len;
+    uint64_t rip, rax;
+    uint32_t reg_names[2];
+    uint64_t reg_values[2];
+    struct X64Registers x64_regs = { 0 };
+    uint16_t port = info->port_number;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
+        union {
+            uint32_t u32;
+            uint8_t bytes[4];
+        } conv;
+
+        /* convert the first 4 bytes of rax to bytes */
+        conv.u32 = (uint32_t)info->rax;
+        /* secure mode is set to false */
+        ret = pio_write(port, conv.bytes, len, false);
+        if (ret < 0) {
+            error_report("Failed to write to io port");
+            return -1;
+        }
+    } else {
+        uint8_t data[4] = { 0 };
+        /* secure mode is set to false */
+        pio_read(info->port_number, data, len, false);
+
+        /* Preserve high bits in EAX, but clear out high bits in RAX */
+        val = *(uint32_t *)data;
+        eax = (((uint32_t)info->rax) & ~eax_mask) | (val & eax_mask);
+        info->rax = (uint64_t)eax;
+    }
+
+    insn_len = info->header.instruction_length;
+
+    /* Advance RIP and update RAX */
+    rip = info->header.rip + insn_len;
+    rax = info->rax;
+
+    reg_names[0] = HV_X64_REGISTER_RIP;
+    reg_values[0] = rip;
+    reg_names[1] = HV_X64_REGISTER_RAX;
+    reg_values[1] = rax;
+
+    x64_regs.names = reg_names;
+    x64_regs.values = reg_values;
+    x64_regs.count = 2;
+
+    ret = set_x64_registers(cpu_fd, &x64_regs);
+    if (ret < 0) {
+        error_report("Failed to set x64 registers");
+        return -1;
+    }
+
+    cpu->accel->dirty = false;
+
+    return 0;
+}
+
+static int fetch_guest_state(CPUState *cpu)
+{
+    int ret;
+
+    ret = mshv_get_standard_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to get standard registers");
+        return -1;
+    }
+
+    ret = mshv_get_special_regs(cpu);
+    if (ret < 0) {
+        error_report("Failed to get special registers");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int read_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
+                       uint64_t gva, uint8_t *data, size_t len)
+{
+    int ret;
+    uint64_t gpa, flags;
+
+    if (gva == initial_gva) {
+        gpa = initial_gpa;
+    } else {
+        flags = HV_TRANSLATE_GVA_VALIDATE_READ;
+        ret = translate_gva(cpu_fd, gva, &gpa, flags);
+        if (ret < 0) {
+            return -1;
+        }
+
+        ret = mshv_guest_mem_read(gpa, data, len, false, false);
+        if (ret < 0) {
+            error_report("failed to read guest mem");
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static int write_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
+                        uint64_t gva, const uint8_t *data, size_t len)
+{
+    int ret;
+    uint64_t gpa, flags;
+
+    if (gva == initial_gva) {
+        gpa = initial_gpa;
+    } else {
+        flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
+        ret = translate_gva(cpu_fd, gva, &gpa, flags);
+        if (ret < 0) {
+            error_report("failed to translate gva to gpa");
+            return -1;
+        }
+    }
+    ret = mshv_guest_mem_write(gpa, data, len, false);
+    if (ret != MEMTX_OK) {
+        error_report("failed to write to mmio");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int handle_pio_str_write(CPUState *cpu,
+                                hv_x64_io_port_intercept_message *info,
+                                size_t repeat, uint16_t port,
+                                bool direction_flag)
+{
+    int ret;
+    uint64_t src;
+    uint8_t data[4] = { 0 };
+    size_t len = info->access_info.access_size;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    src = linear_addr(cpu, info->rsi, R_DS);
+
+    for (size_t i = 0; i < repeat; i++) {
+        ret = read_memory(cpu_fd, 0, 0, src, data, len);
+        if (ret < 0) {
+            error_report("Failed to read memory");
+            return -1;
+        }
+        ret = pio_write(port, data, len, false);
+        if (ret < 0) {
+            error_report("Failed to write to io port");
+            return -1;
+        }
+        src += direction_flag ? -len : len;
+        info->rsi += direction_flag ? -len : len;
+    }
+
+    return 0;
+}
+
+static int handle_pio_str_read(CPUState *cpu,
+                                hv_x64_io_port_intercept_message *info,
+                                size_t repeat, uint16_t port,
+                                bool direction_flag)
+{
+    int ret;
+    uint64_t dst;
+    size_t len = info->access_info.access_size;
+    uint8_t data[4] = { 0 };
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    dst = linear_addr(cpu, info->rdi, R_ES);
+
+    for (size_t i = 0; i < repeat; i++) {
+        pio_read(port, data, len, false);
+
+        ret = write_memory(cpu_fd, 0, 0, dst, data, len);
+        if (ret < 0) {
+            error_report("Failed to write memory");
+            return -1;
+        }
+        dst += direction_flag ? -len : len;
+        info->rdi += direction_flag ? -len : len;
+    }
+
+    return 0;
+}
+
+static int handle_pio_str(CPUState *cpu,
+                          hv_x64_io_port_intercept_message *info)
+{
+    uint8_t access_type = info->header.intercept_access_type;
+    uint16_t port = info->port_number;
+    bool repop = info->access_info.rep_prefix == 1;
+    size_t repeat = repop ? info->rcx : 1;
+    size_t insn_len = info->header.instruction_length;
+    bool direction_flag;
+    uint32_t reg_names[3];
+    uint64_t reg_values[3];
+    int ret;
+    struct X64Registers x64_regs = { 0 };
+    X86CPU *x86_cpu = X86_CPU(cpu);
+    CPUX86State *env = &x86_cpu->env;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    ret = fetch_guest_state(cpu);
+    if (ret < 0) {
+        error_report("Failed to fetch guest state");
+        return -1;
+    }
+
+    direction_flag = (env->eflags & DF) != 0;
+
+    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
+        ret = handle_pio_str_write(cpu, info, repeat, port, direction_flag);
+        if (ret < 0) {
+            error_report("Failed to handle pio str write");
+            return -1;
+        }
+        reg_names[0] = HV_X64_REGISTER_RSI;
+        reg_values[0] = info->rsi;
+    } else {
+        ret = handle_pio_str_read(cpu, info, repeat, port, direction_flag);
+        reg_names[0] = HV_X64_REGISTER_RDI;
+        reg_values[0] = info->rdi;
+    }
+
+    reg_names[1] = HV_X64_REGISTER_RIP;
+    reg_values[1] = info->header.rip + insn_len;
+    reg_names[2] = HV_X64_REGISTER_RAX;
+    reg_values[2] = info->rax;
+
+    x64_regs.names = reg_names;
+    x64_regs.values = reg_values;
+    x64_regs.count = 2;
+
+    ret = set_x64_registers(cpu_fd, &x64_regs);
+    if (ret < 0) {
+        error_report("Failed to set x64 registers");
+        return -1;
+    }
+
+    cpu->accel->dirty = false;
+
+    return 0;
+}
+
+static int handle_pio(CPUState *cpu, const struct hyperv_message *msg)
+{
+    struct hv_x64_io_port_intercept_message info = { 0 };
+    int ret;
+
+    ret = set_ioport_info(msg, &info);
+    if (ret < 0) {
+        error_report("Failed to convert message to ioport info");
+        return -1;
+    }
+
+    if (info.access_info.string_op) {
+        return handle_pio_str(cpu, &info);
+    }
+
+    return handle_pio_non_str(cpu, &info);
+}
+
 int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
 {
-	error_report("unimplemented");
-	abort();
+    int ret;
+    hv_message exit_msg = { 0 };
+    enum MshvVmExit exit_reason;
+    int cpu_fd = mshv_vcpufd(cpu);
+
+    ret = ioctl(cpu_fd, MSHV_RUN_VP, &exit_msg);
+    if (ret < 0) {
+        return MshvVmExitShutdown;
+    }
+
+    switch (exit_msg.header.message_type) {
+    case HVMSG_UNRECOVERABLE_EXCEPTION:
+        *msg = exit_msg;
+        return MshvVmExitShutdown;
+    case HVMSG_UNMAPPED_GPA:
+        ret = handle_unmapped_mem(vm_fd, cpu, &exit_msg, &exit_reason);
+        if (ret < 0) {
+            error_report("failed to handle unmapped memory");
+            return -1;
+        }
+        return exit_reason;
+    case HVMSG_GPA_INTERCEPT:
+        ret = handle_mmio(cpu, &exit_msg, &exit_reason);
+        if (ret < 0) {
+            error_report("failed to handle mmio");
+            return -1;
+        }
+        return exit_reason;
+    case HVMSG_X64_IO_PORT_INTERCEPT:
+        ret = handle_pio(cpu, &exit_msg);
+        if (ret < 0) {
+            return MshvVmExitSpecial;
+        }
+        return MshvVmExitIgnore;
+    default:
+        msg = &exit_msg;
+    }
+
+    *exit = MshvVmExitIgnore;
+    return 0;
 }
 
 void mshv_remove_vcpu(int vm_fd, int cpu_fd)
@@ -1061,34 +1583,6 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
     return 0;
 }
 
-static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
-                         uint64_t flags)
-{
-    int ret;
-    union hv_translate_gva_result result = { 0 };
-
-    *gpa = 0;
-    mshv_translate_gva args = {
-        .gva = gva,
-        .flags = flags,
-        .gpa = (__u64 *)gpa,
-        .result = &result,
-    };
-
-    ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
-    if (ret < 0) {
-        error_report("failed to invoke gpa->gva translation");
-        return -errno;
-    }
-    if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
-        error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
-        return -1;
-
-    }
-
-    return 0;
-}
-
 static int guest_mem_read_with_gva(const CPUState *cpu, uint64_t gva,
                                    uint8_t *data, uintptr_t size,
                                    bool fetch_instruction)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (23 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
@ 2025-05-20 11:30 ` Magnus Kulke
  2025-05-20 13:53   ` Paolo Bonzini
  2025-05-20 14:25 ` [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Paolo Bonzini
  25 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-20 11:30 UTC (permalink / raw)
  To: magnuskulke, qemu-devel, liuwe
  Cc: Paolo Bonzini, Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan,
	Roman Bolshakov, Philippe Mathieu-Daudé, Zhao Liu,
	Richard Henderson, Cameron Esfahani, Marc-André Lureau,
	Daniel P. Berrangé

Qemu maps regions of userland multiple times into the guest. The MSHV
kernel driver detects those overlapping regions and rejects those
mappings.

A logic is introduced to track all mappings and replace a region on the
fly if an unmapped gpa is encountered. If there is a region in the list
that would qualify and is currently unmapped, the current region is
unmapped and the requested region is mapped in.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 accel/mshv/mem.c            | 229 +++++++++++++++++++++++++++++++++++-
 accel/mshv/mshv-all.c       |   2 +
 include/system/mshv.h       |  13 ++
 target/i386/mshv/mshv-cpu.c |  23 +++-
 4 files changed, 265 insertions(+), 2 deletions(-)

diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index ee627e7bd6..53e43873dc 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -12,7 +12,9 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/lockable.h"
 #include "qemu/error-report.h"
+#include "qemu/rcu.h"
 #include "hw/hyperv/linux-mshv.h"
 #include "system/address-spaces.h"
 #include "system/mshv.h"
@@ -20,12 +22,101 @@
 #include <sys/ioctl.h>
 #include "trace.h"
 
+static GList *mem_entries;
+
+/* We need this, because call_rcu1 won't operate on empty lists (NULL) */
+typedef struct {
+    struct rcu_head rcu;
+    GList *list;
+} FreeMemEntriesJob;
+
+static inline void free_mem_entries(struct rcu_head *rh)
+{
+    FreeMemEntriesJob *job = container_of(rh, FreeMemEntriesJob, rcu);
+    g_list_free(job->list);
+    g_free(job);
+}
+
+static void add_mem_entry(MshvMemoryEntry *entry)
+{
+    GList *old = qatomic_rcu_read(&mem_entries);
+    GList *new = g_list_copy(old);
+    new = g_list_prepend(new, entry);
+
+    qatomic_rcu_set(&mem_entries, new);
+
+    /* defer freeing of an obsolete snapshot */
+    FreeMemEntriesJob *job = g_new(FreeMemEntriesJob, 1);
+    job->list = old;
+    call_rcu1(&job->rcu, free_mem_entries);
+}
+
+static void remove_mem_entry(MshvMemoryEntry *entry)
+{
+    GList *old = qatomic_rcu_read(&mem_entries);
+    GList *new = g_list_copy(old);
+    new = g_list_remove(new, entry);
+
+    qatomic_rcu_set(&mem_entries, new);
+
+    /* Defer freeing of an obsolete snapshot */
+    FreeMemEntriesJob *job = g_new(FreeMemEntriesJob, 1);
+    job->list = old;
+    call_rcu1((struct rcu_head *)old, free_mem_entries);
+}
+
+/* Find _currently mapped_ memory entry, that is overlapping in userspace */
+static MshvMemoryEntry *find_overlap_mem_entry(const MshvMemoryEntry *entry_1)
+{
+    uint64_t start_1 = entry_1->mr.userspace_addr, start_2;
+    size_t len_1 = entry_1->mr.memory_size, len_2;
+
+    WITH_RCU_READ_LOCK_GUARD() {
+        GList *entries = qatomic_rcu_read(&mem_entries);
+        bool overlaps;
+        MshvMemoryEntry *entry_2;
+
+        for (GList *l = entries; l != NULL; l = l->next) {
+            entry_2 = l->data;
+            assert(entry_2);
+
+            if (entry_2 == entry_1) {
+                continue;
+            }
+
+            start_2 = entry_2->mr.userspace_addr;
+            len_2 = entry_2->mr.memory_size;
+
+            overlaps = ranges_overlap(start_1, len_1, start_2, len_2);
+            if (entry_2 != entry_1 && entry_2->mapped && overlaps) {
+                return entry_2;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+void mshv_init_mem_manager(void)
+{
+    mem_entries = NULL;
+}
+
 static int set_guest_memory(int vm_fd, const mshv_user_mem_region *region)
 {
     int ret;
+    MshvMemoryEntry *overlap_entry, entry = { .mr = { 0 }, .mapped = false };
 
     ret = ioctl(vm_fd, MSHV_SET_GUEST_MEMORY, region);
     if (ret < 0) {
+        entry.mr.userspace_addr = region->userspace_addr;
+        entry.mr.memory_size = region->size;
+
+        overlap_entry = find_overlap_mem_entry(&entry);
+        if (overlap_entry != NULL) {
+            return -MSHV_USERSPACE_ADDR_REMAP_ERROR;
+        }
+
         error_report("failed to set guest memory");
         return -errno;
     }
@@ -54,6 +145,142 @@ static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool add)
     return set_guest_memory(vm_fd, &region);
 }
 
+static MshvMemoryEntry *find_mem_entry_by_region(const MshvMemoryRegion *mr)
+{
+    WITH_RCU_READ_LOCK_GUARD() {
+        GList *entries = qatomic_rcu_read(&mem_entries);
+        MshvMemoryEntry *entry;
+
+        for (GList *l = entries; l != NULL; l = l->next) {
+            entry = l->data;
+            assert(entry);
+            if (memcmp(mr, &entry->mr, sizeof(MshvMemoryRegion)) == 0) {
+                return entry;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+static inline int tracked_map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool add)
+{
+    MshvMemoryEntry *entry;
+    int ret;
+
+    entry = find_mem_entry_by_region(mr);
+
+    if (!entry) {
+        /* delete */
+        if (!add) {
+            error_report("mem entry selected for removal does not exist");
+            return -1;
+        }
+
+        /* add */
+        ret = map_or_unmap(vm_fd, mr, true);
+        entry = g_new0(MshvMemoryEntry, 1);
+        entry->mr = *mr;
+        /* set depending on success */
+        entry->mapped = (ret == 0);
+        add_mem_entry(entry);
+
+        if (ret == -MSHV_USERSPACE_ADDR_REMAP_ERROR) {
+            warn_report(
+                "ignoring failed remapping userspace_addr=0x%016lx "
+                "gpa=0x%08lx size=0x%lx", mr->userspace_addr,
+                mr->guest_phys_addr, mr->memory_size);
+            ret = 0;
+        }
+
+        return ret;
+    }
+
+    /* entry exists */
+
+    /* delete */
+    if (!add) {
+        ret = 0;
+        if (entry->mapped) {
+            ret = map_or_unmap(vm_fd, mr, false);
+        }
+        remove_mem_entry(entry);
+        g_free(entry);
+        return ret;
+    }
+
+    /* add */
+    ret = map_or_unmap(vm_fd, mr, true);
+
+    /* set depending on success */
+    entry->mapped = (ret == 0);
+    return ret;
+}
+
+static MshvMemoryEntry* find_mem_entry_by_gpa(uint64_t gpa)
+{
+    WITH_RCU_READ_LOCK_GUARD() {
+        GList *entries = qatomic_rcu_read(&mem_entries);
+        MshvMemoryEntry *entry;
+        uint64_t gpa_offset;
+
+        for (GList *l = entries; l != NULL; l = l->next) {
+            entry = l->data;
+            assert(entry);
+            gpa_offset = gpa - entry->mr.guest_phys_addr;
+            if (entry->mr.guest_phys_addr <= gpa
+                && gpa_offset < entry->mr.memory_size) {
+                return entry;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+MshvRemapResult mshv_remap_overlapped_region(int vm_fd, uint64_t gpa)
+{
+    MshvMemoryEntry *gpa_entry, *overlap_entry;
+    int ret;
+
+    /* return early if no entry is found */
+    gpa_entry = find_mem_entry_by_gpa(gpa);
+    if (gpa_entry == NULL) {
+        return MshvRemapNoMapping;
+    }
+
+    overlap_entry = find_overlap_mem_entry(gpa_entry);
+    if (overlap_entry == NULL) {
+        return MshvRemapNoOverlap;
+    }
+
+    /* unmap overlapping region */
+    ret = map_or_unmap(vm_fd, &overlap_entry->mr, false);
+    if (ret < 0) {
+        error_report("failed to unmap overlap region");
+        abort();
+    }
+    overlap_entry->mapped = false;
+    warn_report("mapped out userspace_addr=0x%016lx gpa=0x%010lx size=0x%lx",
+                overlap_entry->mr.userspace_addr,
+                overlap_entry->mr.guest_phys_addr,
+                overlap_entry->mr.memory_size);
+
+    /* map region for gpa */
+    ret = map_or_unmap(vm_fd, &gpa_entry->mr, true);
+    if (ret < 0) {
+        error_report("failed to map new region");
+        abort();
+    }
+    gpa_entry->mapped = true;
+    warn_report("mapped in  userspace_addr=0x%016lx gpa=0x%010lx size=0x%lx",
+                gpa_entry->mr.userspace_addr,
+                gpa_entry->mr.guest_phys_addr,
+                gpa_entry->mr.memory_size);
+
+    return MshvRemapOk;
+}
+
 static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
 {
     MemTxAttrs memattr = {0};
@@ -139,7 +366,7 @@ static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
                           mshv_mr->memory_size,
                           mshv_mr->userspace_addr, mshv_mr->readonly,
                           ret);
-    return map_or_unmap(mshv_state->vm, mshv_mr, add);
+    return tracked_map_or_unmap(mshv_state->vm, mshv_mr, add);
 }
 
 /*
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 97212c54f1..bf30c968ce 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -439,6 +439,8 @@ static int mshv_init(MachineState *ms)
 
     mshv_init_msicontrol();
 
+    mshv_init_mem_manager();
+
     do {
         int vm_fd = create_vm(mshv_fd);
         s->vm = vm_fd;
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 622b3db540..c4072b980f 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -147,6 +147,12 @@ typedef enum MshvVmExit {
     MshvVmExitSpecial  = 2,
 } MshvVmExit;
 
+typedef enum MshvRemapResult {
+    MshvRemapOk = 0,
+    MshvRemapNoMapping = 1,
+    MshvRemapNoOverlap = 2,
+} MshvRemapResult;
+
 void mshv_init_cpu_logic(void);
 int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
 void mshv_remove_vcpu(int vm_fd, int cpu_fd);
@@ -199,8 +205,15 @@ typedef struct MshvMemoryRegion {
     bool readonly;
 } MshvMemoryRegion;
 
+typedef struct MshvMemoryEntry {
+    MshvMemoryRegion mr;
+    bool mapped;
+} MshvMemoryEntry;
+
+void mshv_init_mem_manager(void);
 int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
 int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
+MshvRemapResult mshv_remap_overlapped_region(int vm_fd, uint64_t gpa);
 int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
                         bool is_secure_mode, bool instruction_fetch);
 int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 27c6cd6138..4c74081968 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -1159,7 +1159,9 @@ static int handle_unmapped_mem(int vm_fd, CPUState *cpu,
                                MshvVmExit *exit_reason)
 {
     struct hv_x64_memory_intercept_message info = { 0 };
+    uint64_t gpa;
     int ret;
+    enum MshvRemapResult remap_result;
 
     ret = set_memory_info(msg, &info);
     if (ret < 0) {
@@ -1167,7 +1169,26 @@ static int handle_unmapped_mem(int vm_fd, CPUState *cpu,
         return -1;
     }
 
-    return handle_mmio(cpu, msg, exit_reason);
+    gpa = info.guest_physical_address;
+
+    /* attempt to remap the region, in case of overlapping userspase mappings */
+    remap_result = mshv_remap_overlapped_region(vm_fd, gpa);
+    *exit_reason = MshvVmExitIgnore;
+
+    switch (remap_result) {
+    case MshvRemapNoMapping:
+        /* if we didn't find a mapping, it is probably mmio */
+        return handle_mmio(cpu, msg, exit_reason);
+    case MshvRemapOk:
+        break;
+    case MshvRemapNoOverlap:
+        /* This should not happen, but we are forgiving it */
+        warn_report("found no overlap for unmapped region");
+        *exit_reason = MshvVmExitSpecial;
+        break;
+    }
+
+    return 0;
 }
 
 static int set_ioport_info(const struct hyperv_message *msg,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator
  2025-05-20 11:29 ` [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
@ 2025-05-20 11:50   ` Daniel P. Berrangé
  2025-05-20 14:16     ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel P. Berrangé @ 2025-05-20 11:50 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau

On Tue, May 20, 2025 at 01:29:54PM +0200, Magnus Kulke wrote:
> Introduce a Meson feature option and default-config entry to allow
> building QEMU with MSHV (Microsoft Hypervisor) acceleration support.
> 
> This is the first step toward implementing an MSHV backend in QEMU.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  accel/Kconfig                 |  3 +++
>  meson.build                   | 16 ++++++++++++++++
>  meson_options.txt             |  2 ++
>  scripts/meson-buildoptions.sh |  3 +++
>  4 files changed, 24 insertions(+)
> 
> diff --git a/accel/Kconfig b/accel/Kconfig
> index 4263cab722..a60f114923 100644
> --- a/accel/Kconfig
> +++ b/accel/Kconfig
> @@ -13,6 +13,9 @@ config TCG
>  config KVM
>      bool
>  
> +config MSHV
> +    bool
> +
>  config XEN
>      bool
>      select FSDEV_9P if VIRTFS
> diff --git a/meson.build b/meson.build
> index e819a7084c..a4269b816b 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -322,6 +322,13 @@ else
>  endif
>  accelerator_targets += { 'CONFIG_XEN': xen_targets }
>  
> +if cpu == 'x86_64'
> +  mshv_targets = ['x86_64-softmmu']
> +else
> +  mshv_targets = []
> +endif
> +accelerator_targets += { 'CONFIG_MSHV': mshv_targets }
> +
>  if cpu == 'aarch64'
>    accelerator_targets += {
>      'CONFIG_HVF': ['aarch64-softmmu']
> @@ -877,6 +884,14 @@ accelerators = []
>  if get_option('kvm').allowed() and host_os == 'linux'
>    accelerators += 'CONFIG_KVM'
>  endif
> +
> +if get_option('mshv').allowed() and host_os == 'linux'
> +  if get_option('mshv').enabled() and host_machine.cpu() != 'x86_64'
> +    error('mshv accelerator requires x64_64 host')
> +  endif
> +  accelerators += 'CONFIG_MSHV'

This enables MSHV for non-x86 when the option is left on 'auto'.

You would need something more like this:

  if host_machine.cpu() != 'x86_64'
    if get_option('mshv').enabled()
      error('mshv accelerator requires x64_64 host')
    endif
  else
    accelerators += 'CONFIG_MSHV'
  endif

> +endif
> +
>  if get_option('whpx').allowed() and host_os == 'windows'
>    if get_option('whpx').enabled() and host_machine.cpu() != 'x86_64'
>      error('WHPX requires 64-bit host')

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation
  2025-05-20 11:29 ` [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
@ 2025-05-20 11:54   ` Daniel P. Berrangé
  2025-05-20 13:17   ` Paolo Bonzini
  2025-05-20 17:36   ` Wei Liu
  2 siblings, 0 replies; 76+ messages in thread
From: Daniel P. Berrangé @ 2025-05-20 11:54 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau

On Tue, May 20, 2025 at 01:29:56PM +0200, Magnus Kulke wrote:
> The MSHV accelerator requires a x86 decoder/emulator in userland to
> emulate MMIO instructions. This change contains the implementations for
> the generalized i386 instruction decoder/emulator.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  include/system/mshv.h           |  32 ++++
>  target/i386/cpu.h               |   2 +-
>  target/i386/emulate/meson.build |   7 +-
>  target/i386/meson.build         |   2 +
>  target/i386/mshv/meson.build    |   7 +
>  target/i386/mshv/x86.c          | 330 ++++++++++++++++++++++++++++++++
>  6 files changed, 377 insertions(+), 3 deletions(-)
>  create mode 100644 include/system/mshv.h
>  create mode 100644 target/i386/mshv/meson.build
>  create mode 100644 target/i386/mshv/x86.c
> 
> diff --git a/include/system/mshv.h b/include/system/mshv.h
> new file mode 100644
> index 0000000000..8380b92da2
> --- /dev/null
> +++ b/include/system/mshv.h
> @@ -0,0 +1,32 @@
> +/*
> + * QEMU MSHV support
> + *
> + * Copyright Microsoft, Corp. 2025
> + *
> + * Authors:
> + *  Ziqiao Zhou       <ziqiaozhou@microsoft.com>
> + *  Magnus Kulke      <magnuskulke@microsoft.com>
> + *  Jinank Jain       <jinankjain@microsoft.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.

FYI, for new files we now require use of SPDX-License-Identifier,
and omission of any manually written license boilerplate text.

checkpatch.pl is supposed to be warning about this, but it is
buggy & incomplete right now, but fixes for that are pending
to correctly warn about this.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton
  2025-05-20 11:29 ` [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton Magnus Kulke
@ 2025-05-20 12:02   ` Daniel P. Berrangé
  2025-05-20 12:38     ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Daniel P. Berrangé @ 2025-05-20 12:02 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau

On Tue, May 20, 2025 at 01:29:59PM +0200, Magnus Kulke wrote:
> Introduce the initial scaffold for the MSHV (Microsoft Hypervisor)
> accelerator backend. This includes the basic directory structure and
> stub implementations needed to integrate with QEMU's accelerator
> framework.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  accel/meson.build      |   1 +
>  accel/mshv/meson.build |   6 ++
>  accel/mshv/mshv-all.c  | 143 +++++++++++++++++++++++++++++++++++++++++
>  include/system/mshv.h  |  34 ++++++++++
>  4 files changed, 184 insertions(+)
>  create mode 100644 accel/mshv/meson.build
>  create mode 100644 accel/mshv/mshv-all.c
> 

> diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
> new file mode 100644
> index 0000000000..44605adf94
> --- /dev/null
> +++ b/accel/mshv/mshv-all.c

> +
> +static int mshv_init(MachineState *ms)
> +{
> +	error_report("unimplemented");
> +	abort();
> +}

Nit-picking - although you remove these lines in later patches,
lets remove the tabs from these lines.

> diff --git a/include/system/mshv.h b/include/system/mshv.h
> index bc8f2c228a..0858e47def 100644
> --- a/include/system/mshv.h
> +++ b/include/system/mshv.h
> @@ -16,6 +16,14 @@
>  #ifndef QEMU_MSHV_INT_H
>  #define QEMU_MSHV_INT_H
>  
> +#include "qemu/osdep.h"
> +#include "qemu/accel.h"
> +#include "hw/hyperv/hyperv-proto.h"
> +#include "hw/hyperv/linux-mshv.h"
> +#include "hw/hyperv/hvhdk.h"
> +#include "qapi/qapi-types-common.h"
> +#include "system/memory.h"
> +
>  #ifdef COMPILING_PER_TARGET
>  #ifdef CONFIG_MSHV
>  #define CONFIG_MSHV_IS_POSSIBLE
> @@ -28,6 +36,32 @@
>  #ifdef CONFIG_MSHV_IS_POSSIBLE
>  extern bool mshv_allowed;
>  #define mshv_enabled() (mshv_allowed)
> +
> +typedef struct MshvMemoryListener {
> +  MemoryListener listener;
> +  int as_id;
> +} MshvMemoryListener;
> +
> +typedef struct MshvAddressSpace {
> +    MshvMemoryListener *ml;
> +    AddressSpace *as;
> +} MshvAddressSpace;

Inconsistent mix of 2-space and 4-space
indents - stick with 4-space throughout

> +
> +typedef struct MshvState {
> +  AccelState parent_obj;
> +  int vm;
> +  MshvMemoryListener memory_listener;
> +  /* number of listeners */
> +  int nr_as;
> +  MshvAddressSpace *as;
> +} MshvState;
> +extern MshvState *mshv_state;
> +
> +struct AccelCPUState {
> +  int cpufd;
> +  bool dirty;
> +};
> +
>  #else /* CONFIG_MSHV_IS_POSSIBLE */
>  #define mshv_enabled() false
>  #endif
> -- 
> 2.34.1
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton
  2025-05-20 12:02   ` Daniel P. Berrangé
@ 2025-05-20 12:38     ` Paolo Bonzini
  0 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 12:38 UTC (permalink / raw)
  To: Daniel P. Berrangé, Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Michael S. Tsirkin, Wei Liu,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau

On 5/20/25 14:02, Daniel P. Berrangé wrote:
> On Tue, May 20, 2025 at 01:29:59PM +0200, Magnus Kulke wrote:
>> Introduce the initial scaffold for the MSHV (Microsoft Hypervisor)
>> accelerator backend. This includes the basic directory structure and
>> stub implementations needed to integrate with QEMU's accelerator
>> framework.
>>
>> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
>> ---
>>   accel/meson.build      |   1 +
>>   accel/mshv/meson.build |   6 ++
>>   accel/mshv/mshv-all.c  | 143 +++++++++++++++++++++++++++++++++++++++++
>>   include/system/mshv.h  |  34 ++++++++++
>>   4 files changed, 184 insertions(+)
>>   create mode 100644 accel/mshv/meson.build
>>   create mode 100644 accel/mshv/mshv-all.c
>>
> 
>> diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
>> new file mode 100644
>> index 0000000000..44605adf94
>> --- /dev/null
>> +++ b/accel/mshv/mshv-all.c
> 
>> +
>> +static int mshv_init(MachineState *ms)
>> +{
>> +	error_report("unimplemented");
>> +	abort();
>> +}
> 
> Nit-picking - although you remove these lines in later patches,
> lets remove the tabs from these lines.

Indent is a bit messy throughout the whole series, in some files there 
are also TAB characters that are supposed to expand to 4.

Probably easiest to run all new files through autoindent, there must be 
some VSCode magic for that. :)

Paolo

>> diff --git a/include/system/mshv.h b/include/system/mshv.h
>> index bc8f2c228a..0858e47def 100644
>> --- a/include/system/mshv.h
>> +++ b/include/system/mshv.h
>> @@ -16,6 +16,14 @@
>>   #ifndef QEMU_MSHV_INT_H
>>   #define QEMU_MSHV_INT_H
>>   
>> +#include "qemu/osdep.h"
>> +#include "qemu/accel.h"
>> +#include "hw/hyperv/hyperv-proto.h"
>> +#include "hw/hyperv/linux-mshv.h"
>> +#include "hw/hyperv/hvhdk.h"
>> +#include "qapi/qapi-types-common.h"
>> +#include "system/memory.h"
>> +
>>   #ifdef COMPILING_PER_TARGET
>>   #ifdef CONFIG_MSHV
>>   #define CONFIG_MSHV_IS_POSSIBLE
>> @@ -28,6 +36,32 @@
>>   #ifdef CONFIG_MSHV_IS_POSSIBLE
>>   extern bool mshv_allowed;
>>   #define mshv_enabled() (mshv_allowed)
>> +
>> +typedef struct MshvMemoryListener {
>> +  MemoryListener listener;
>> +  int as_id;
>> +} MshvMemoryListener;
>> +
>> +typedef struct MshvAddressSpace {
>> +    MshvMemoryListener *ml;
>> +    AddressSpace *as;
>> +} MshvAddressSpace;
> 
> Inconsistent mix of 2-space and 4-space
> indents - stick with 4-space throughout



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream
  2025-05-20 11:29 ` [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream Magnus Kulke
@ 2025-05-20 12:42   ` Paolo Bonzini
  2025-05-20 17:29   ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 12:42 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:29, Magnus Kulke wrote:
> Introduce a new helper function to decode x86 instructions from a
> raw instruction byte stream. MSHV delivers an instruction stream in a
> buffer of the vm_exit message. It can be used to speed up MMIO
> emulation, since instructions do not have to be fetched and translated.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>

Missing update of hvf_x86_emul_ops.

Paolo

> ---
>   target/i386/emulate/x86_decode.c | 32 +++++++++++++++++++++++++++-----
>   target/i386/emulate/x86_decode.h | 11 +++++++++++
>   target/i386/emulate/x86_emu.c    |  3 ++-
>   target/i386/emulate/x86_emu.h    |  1 +
>   4 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/target/i386/emulate/x86_decode.c b/target/i386/emulate/x86_decode.c
> index 88be9479a8..7a862b976e 100644
> --- a/target/i386/emulate/x86_decode.c
> +++ b/target/i386/emulate/x86_decode.c
> @@ -60,6 +60,7 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
>                                       int size)
>   {
>       uint64_t val = 0;
> +    target_ulong va;
>   
>       switch (size) {
>       case 1:
> @@ -71,10 +72,16 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
>           VM_PANIC_EX("%s invalid size %d\n", __func__, size);
>           break;
>       }
> -    target_ulong va  = linear_rip(env_cpu(env), env->eip) + decode->len;
> -    emul_ops->read_mem(env_cpu(env), &val, va, size);
> +
> +	/* copy the bytes from the instruction stream, if available */
> +	if (decode->stream && decode->len + size <= decode->stream->len) {
> +		memcpy(&val, decode->stream->bytes + decode->len, size);
> +	} else {
> +		va = linear_rip(env_cpu(env), env->eip) + decode->len;
> +		emul_ops->fetch_instruction(env_cpu(env), &val, va, size);
> +	}
>       decode->len += size;
> -
> +
>       return val;
>   }
>   
> @@ -2076,9 +2083,8 @@ static void decode_opcodes(CPUX86State *env, struct x86_decode *decode)
>       }
>   }
>   
> -uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
> +static uint32_t decode_opcode(CPUX86State *env, struct x86_decode *decode)
>   {
> -    memset(decode, 0, sizeof(*decode));
>       decode_prefix(env, decode);
>       set_addressing_size(env, decode);
>       set_operand_size(env, decode);
> @@ -2088,6 +2094,22 @@ uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
>       return decode->len;
>   }
>   
> +uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
> +{
> +	memset(decode, 0, sizeof(*decode));
> +	return decode_opcode(env, decode);
> +}
> +
> +uint32_t decode_instruction_stream(CPUX86State *env, struct x86_decode *decode,
> +		                           struct x86_insn_stream *stream)
> +{
> +	memset(decode, 0, sizeof(*decode));
> +	if (stream != NULL) {
> +		decode->stream = stream;
> +	}
> +	return decode_opcode(env, decode);
> +}
> +
>   void init_decoder(void)
>   {
>       int i;
> diff --git a/target/i386/emulate/x86_decode.h b/target/i386/emulate/x86_decode.h
> index 87cc728598..9bc7d6cc49 100644
> --- a/target/i386/emulate/x86_decode.h
> +++ b/target/i386/emulate/x86_decode.h
> @@ -269,6 +269,11 @@ typedef struct x86_decode_op {
>       target_ulong ptr;
>   } x86_decode_op;
>   
> +typedef struct x86_insn_stream {
> +	const uint8_t *bytes;
> +	size_t len;
> +} x86_insn_stream;
> +
>   typedef struct x86_decode {
>       int len;
>       uint8_t opcode[4];
> @@ -295,12 +300,18 @@ typedef struct x86_decode {
>       struct x86_modrm modrm;
>       struct x86_decode_op op[4];
>       bool is_fpu;
> +
> +	x86_insn_stream *stream;
>   } x86_decode;
>   
>   uint64_t sign(uint64_t val, int size);
>   
>   uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode);
>   
> +uint32_t decode_instruction_stream(CPUX86State *env,
> +								   struct x86_decode *decode,
> +		                           struct x86_insn_stream *stream);
> +
>   target_ulong get_reg_ref(CPUX86State *env, int reg, int rex_present,
>                            int is_extended, int size);
>   target_ulong get_reg_val(CPUX86State *env, int reg, int rex_present,
> diff --git a/target/i386/emulate/x86_emu.c b/target/i386/emulate/x86_emu.c
> index 7773b51b95..73c9eb41d1 100644
> --- a/target/i386/emulate/x86_emu.c
> +++ b/target/i386/emulate/x86_emu.c
> @@ -1241,7 +1241,8 @@ static void init_cmd_handler(void)
>   bool exec_instruction(CPUX86State *env, struct x86_decode *ins)
>   {
>       if (!_cmd_handler[ins->cmd].handler) {
> -        printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n", env->eip,
> +        printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n",
> +                env->eip,
>                   ins->cmd, ins->opcode[0],
>                   ins->opcode_len > 1 ? ins->opcode[1] : 0);
>           env->eip += ins->len;
> diff --git a/target/i386/emulate/x86_emu.h b/target/i386/emulate/x86_emu.h
> index 555b567e2c..761e83fd6b 100644
> --- a/target/i386/emulate/x86_emu.h
> +++ b/target/i386/emulate/x86_emu.h
> @@ -24,6 +24,7 @@
>   #include "cpu.h"
>   
>   struct x86_emul_ops {
> +    void (*fetch_instruction)(CPUState *cpu, void *data, target_ulong addr, int bytes);
>       void (*read_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
>       void (*write_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
>       void (*read_segment_descriptor)(CPUState *cpu, struct x86_segment_descriptor *desc,



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation
  2025-05-20 11:29 ` [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
  2025-05-20 11:54   ` Daniel P. Berrangé
@ 2025-05-20 13:17   ` Paolo Bonzini
  2025-05-20 17:36   ` Wei Liu
  2 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 13:17 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:29, Magnus Kulke wrote:
> +/* cpu */
> +/* EFER (technically not a register) bits */
> +#define EFER_LMA   ((uint64_t)0x400)
> +#define EFER_LME   ((uint64_t)0x100)

There's already MSR_EFER_LMA and MSR_EFER_LME, please use them.
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 4f8ed8868e..db6a37b271 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -2101,7 +2101,7 @@ typedef struct CPUArchState {
>      QEMUTimer *xen_periodic_timer;
>      QemuMutex xen_timers_lock;
>  #endif
> -#if defined(CONFIG_HVF)
> +#if defined(CONFIG_HVF) || defined(CONFIG_MSHV)
>      X86LazyFlags lflags;
>      void *emu_mmio_buf;
>  #endif

Please rebase since the lflags member was removed.

> +bool x86_write_segment_descriptor(CPUState *cpu,
> +                                  struct x86_segment_descriptor *desc,
> +                                  x86_segment_selector sel)

There are a bunch of unused functions such as this one.

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-05-20 11:30 ` [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
@ 2025-05-20 13:21   ` Paolo Bonzini
  2025-05-20 22:52   ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 13:21 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:30, Magnus Kulke wrote:
> +static int emulate_instruction(CPUState *cpu,
> +                               const uint8_t *insn_bytes, size_t insn_len,
> +                               uint64_t gva, uint64_t gpa)
> +{
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86_cpu->env;
> +    struct x86_decode decode = { 0 };
> +    int ret;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +    QemuMutex *guard;
> +    x86_insn_stream stream = { .bytes = insn_bytes, .len = insn_len };
> +
> +    guard = g_hash_table_lookup(cpu_guards, GUINT_TO_POINTER(cpu_fd));

mshv_cpu_exec() will always run in the vCPU thread, so you don't need a 
mutex.  All of patch 14 can go, in fact.

Paolo

> +    if (!guard) {
> +        error_report("failed to get cpu guard");
> +        return -1;
> +    }
> +
> +    WITH_QEMU_LOCK_GUARD(guard) {
> +        ret = mshv_load_regs(cpu);
> +        if (ret < 0) {
> +            error_report("failed to load registers");
> +            return -1;
> +        }
> +
> +        decode_instruction_stream(env, &decode, &stream);
> +        exec_instruction(env, &decode);
> +
> +        ret = mshv_store_regs(cpu);
> +        if (ret < 0) {
> +            error_report("failed to store registers");
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_mmio(CPUState *cpu, const struct hyperv_message *msg,
> +                       MshvVmExit *exit_reason)
> +{
> +    struct hv_x64_memory_intercept_message info = { 0 };
> +    size_t insn_len;
> +    uint8_t access_type;
> +    uint8_t *instruction_bytes;
> +    int ret;
> +
> +    ret = set_memory_info(msg, &info);
> +    if (ret < 0) {
> +        error_report("failed to convert message to memory info");
> +        return -1;
> +    }
> +    insn_len = info.instruction_byte_count;
> +    access_type = info.header.intercept_access_type;
> +
> +    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_EXECUTE) {
> +        error_report("invalid intercept access type: execute");
> +        return -1;
> +    }
> +
> +    if (insn_len > 16) {
> +        error_report("invalid mmio instruction length: %zu", insn_len);
> +        return -1;
> +    }
> +
> +    if (insn_len == 0) {
> +        warn_report("mmio instruction buffer empty");
> +    }
> +
> +    instruction_bytes = info.instruction_bytes;
> +
> +    ret = emulate_instruction(cpu, instruction_bytes, insn_len,
> +                              info.guest_virtual_address,
> +                              info.guest_physical_address);
> +    if (ret < 0) {
> +        error_report("failed to emulate mmio");
> +        return -1;
> +    }
> +
> +    *exit_reason = MshvVmExitIgnore;
> +
> +    return 0;
> +}
> +
> +static int handle_unmapped_mem(int vm_fd, CPUState *cpu,
> +                               const struct hyperv_message *msg,
> +                               MshvVmExit *exit_reason)
> +{
> +    struct hv_x64_memory_intercept_message info = { 0 };
> +    int ret;
> +
> +    ret = set_memory_info(msg, &info);
> +    if (ret < 0) {
> +        error_report("failed to convert message to memory info");
> +        return -1;
> +    }
> +
> +    return handle_mmio(cpu, msg, exit_reason);
> +}
> +
> +static int set_ioport_info(const struct hyperv_message *msg,
> +                           hv_x64_io_port_intercept_message *info)
> +{
> +    if (msg->header.message_type != HVMSG_X64_IO_PORT_INTERCEPT) {
> +        error_report("Invalid message type");
> +        return -1;
> +    }
> +    memcpy(info, msg->payload, sizeof(*info));
> +
> +    return 0;
> +}
> +
> +typedef struct X64Registers {
> +  const uint32_t *names;
> +  const uint64_t *values;
> +  uintptr_t count;
> +} X64Registers;
> +
> +static int set_x64_registers(int cpu_fd, const X64Registers *regs)
> +{
> +    size_t n_regs = regs->count;
> +    struct hv_register_assoc *assocs;
> +
> +    assocs = g_new0(hv_register_assoc, n_regs);
> +    for (size_t i = 0; i < n_regs; i++) {
> +        assocs[i].name = regs->names[i];
> +        assocs[i].value.reg64 = regs->values[i];
> +    }
> +    int ret;
> +
> +    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
> +    g_free(assocs);
> +    if (ret < 0) {
> +        error_report("failed to set x64 registers");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
> +{
> +    MemTxAttrs memattr = {0};
> +    memattr.secure = is_secure_mode;
> +    return memattr;
> +}
> +
> +static void pio_read(uint64_t port, uint8_t *data, uintptr_t size,
> +                     bool is_secure_mode)
> +{
> +    int ret = 0;
> +    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
> +    ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
> +                           false);
> +    if (ret != MEMTX_OK) {
> +        error_report("Failed to read from port %lx: %d", port, ret);
> +        abort();
> +    }
> +}
> +
> +static int pio_write(uint64_t port, const uint8_t *data, uintptr_t size,
> +                     bool is_secure_mode)
> +{
> +    int ret = 0;
> +    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
> +    ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
> +                           true);
> +    return ret;
> +}
> +
> +static int handle_pio_non_str(const CPUState *cpu,
> +                              hv_x64_io_port_intercept_message *info) {
> +    size_t len = info->access_info.access_size;
> +    uint8_t access_type = info->header.intercept_access_type;
> +    int ret;
> +    uint32_t val, eax;
> +    const uint32_t eax_mask =  0xffffffffu >> (32 - len * 8);
> +    size_t insn_len;
> +    uint64_t rip, rax;
> +    uint32_t reg_names[2];
> +    uint64_t reg_values[2];
> +    struct X64Registers x64_regs = { 0 };
> +    uint16_t port = info->port_number;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
> +        union {
> +            uint32_t u32;
> +            uint8_t bytes[4];
> +        } conv;
> +
> +        /* convert the first 4 bytes of rax to bytes */
> +        conv.u32 = (uint32_t)info->rax;
> +        /* secure mode is set to false */
> +        ret = pio_write(port, conv.bytes, len, false);
> +        if (ret < 0) {
> +            error_report("Failed to write to io port");
> +            return -1;
> +        }
> +    } else {
> +        uint8_t data[4] = { 0 };
> +        /* secure mode is set to false */
> +        pio_read(info->port_number, data, len, false);
> +
> +        /* Preserve high bits in EAX, but clear out high bits in RAX */
> +        val = *(uint32_t *)data;
> +        eax = (((uint32_t)info->rax) & ~eax_mask) | (val & eax_mask);
> +        info->rax = (uint64_t)eax;
> +    }
> +
> +    insn_len = info->header.instruction_length;
> +
> +    /* Advance RIP and update RAX */
> +    rip = info->header.rip + insn_len;
> +    rax = info->rax;
> +
> +    reg_names[0] = HV_X64_REGISTER_RIP;
> +    reg_values[0] = rip;
> +    reg_names[1] = HV_X64_REGISTER_RAX;
> +    reg_values[1] = rax;
> +
> +    x64_regs.names = reg_names;
> +    x64_regs.values = reg_values;
> +    x64_regs.count = 2;
> +
> +    ret = set_x64_registers(cpu_fd, &x64_regs);
> +    if (ret < 0) {
> +        error_report("Failed to set x64 registers");
> +        return -1;
> +    }
> +
> +    cpu->accel->dirty = false;
> +
> +    return 0;
> +}
> +
> +static int fetch_guest_state(CPUState *cpu)
> +{
> +    int ret;
> +
> +    ret = mshv_get_standard_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to get standard registers");
> +        return -1;
> +    }
> +
> +    ret = mshv_get_special_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to get special registers");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int read_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
> +                       uint64_t gva, uint8_t *data, size_t len)
> +{
> +    int ret;
> +    uint64_t gpa, flags;
> +
> +    if (gva == initial_gva) {
> +        gpa = initial_gpa;
> +    } else {
> +        flags = HV_TRANSLATE_GVA_VALIDATE_READ;
> +        ret = translate_gva(cpu_fd, gva, &gpa, flags);
> +        if (ret < 0) {
> +            return -1;
> +        }
> +
> +        ret = mshv_guest_mem_read(gpa, data, len, false, false);
> +        if (ret < 0) {
> +            error_report("failed to read guest mem");
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int write_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
> +                        uint64_t gva, const uint8_t *data, size_t len)
> +{
> +    int ret;
> +    uint64_t gpa, flags;
> +
> +    if (gva == initial_gva) {
> +        gpa = initial_gpa;
> +    } else {
> +        flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
> +        ret = translate_gva(cpu_fd, gva, &gpa, flags);
> +        if (ret < 0) {
> +            error_report("failed to translate gva to gpa");
> +            return -1;
> +        }
> +    }
> +    ret = mshv_guest_mem_write(gpa, data, len, false);
> +    if (ret != MEMTX_OK) {
> +        error_report("failed to write to mmio");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_pio_str_write(CPUState *cpu,
> +                                hv_x64_io_port_intercept_message *info,
> +                                size_t repeat, uint16_t port,
> +                                bool direction_flag)
> +{
> +    int ret;
> +    uint64_t src;
> +    uint8_t data[4] = { 0 };
> +    size_t len = info->access_info.access_size;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    src = linear_addr(cpu, info->rsi, R_DS);
> +
> +    for (size_t i = 0; i < repeat; i++) {
> +        ret = read_memory(cpu_fd, 0, 0, src, data, len);
> +        if (ret < 0) {
> +            error_report("Failed to read memory");
> +            return -1;
> +        }
> +        ret = pio_write(port, data, len, false);
> +        if (ret < 0) {
> +            error_report("Failed to write to io port");
> +            return -1;
> +        }
> +        src += direction_flag ? -len : len;
> +        info->rsi += direction_flag ? -len : len;
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_pio_str_read(CPUState *cpu,
> +                                hv_x64_io_port_intercept_message *info,
> +                                size_t repeat, uint16_t port,
> +                                bool direction_flag)
> +{
> +    int ret;
> +    uint64_t dst;
> +    size_t len = info->access_info.access_size;
> +    uint8_t data[4] = { 0 };
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    dst = linear_addr(cpu, info->rdi, R_ES);
> +
> +    for (size_t i = 0; i < repeat; i++) {
> +        pio_read(port, data, len, false);
> +
> +        ret = write_memory(cpu_fd, 0, 0, dst, data, len);
> +        if (ret < 0) {
> +            error_report("Failed to write memory");
> +            return -1;
> +        }
> +        dst += direction_flag ? -len : len;
> +        info->rdi += direction_flag ? -len : len;
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_pio_str(CPUState *cpu,
> +                          hv_x64_io_port_intercept_message *info)
> +{
> +    uint8_t access_type = info->header.intercept_access_type;
> +    uint16_t port = info->port_number;
> +    bool repop = info->access_info.rep_prefix == 1;
> +    size_t repeat = repop ? info->rcx : 1;
> +    size_t insn_len = info->header.instruction_length;
> +    bool direction_flag;
> +    uint32_t reg_names[3];
> +    uint64_t reg_values[3];
> +    int ret;
> +    struct X64Registers x64_regs = { 0 };
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86_cpu->env;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    ret = fetch_guest_state(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to fetch guest state");
> +        return -1;
> +    }
> +
> +    direction_flag = (env->eflags & DF) != 0;
> +
> +    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
> +        ret = handle_pio_str_write(cpu, info, repeat, port, direction_flag);
> +        if (ret < 0) {
> +            error_report("Failed to handle pio str write");
> +            return -1;
> +        }
> +        reg_names[0] = HV_X64_REGISTER_RSI;
> +        reg_values[0] = info->rsi;
> +    } else {
> +        ret = handle_pio_str_read(cpu, info, repeat, port, direction_flag);
> +        reg_names[0] = HV_X64_REGISTER_RDI;
> +        reg_values[0] = info->rdi;
> +    }
> +
> +    reg_names[1] = HV_X64_REGISTER_RIP;
> +    reg_values[1] = info->header.rip + insn_len;
> +    reg_names[2] = HV_X64_REGISTER_RAX;
> +    reg_values[2] = info->rax;
> +
> +    x64_regs.names = reg_names;
> +    x64_regs.values = reg_values;
> +    x64_regs.count = 2;
> +
> +    ret = set_x64_registers(cpu_fd, &x64_regs);
> +    if (ret < 0) {
> +        error_report("Failed to set x64 registers");
> +        return -1;
> +    }
> +
> +    cpu->accel->dirty = false;
> +
> +    return 0;
> +}
> +
> +static int handle_pio(CPUState *cpu, const struct hyperv_message *msg)
> +{
> +    struct hv_x64_io_port_intercept_message info = { 0 };
> +    int ret;
> +
> +    ret = set_ioport_info(msg, &info);
> +    if (ret < 0) {
> +        error_report("Failed to convert message to ioport info");
> +        return -1;
> +    }
> +
> +    if (info.access_info.string_op) {
> +        return handle_pio_str(cpu, &info);
> +    }
> +
> +    return handle_pio_non_str(cpu, &info);
> +}
> +
>   int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
>   {
> -	error_report("unimplemented");
> -	abort();
> +    int ret;
> +    hv_message exit_msg = { 0 };
> +    enum MshvVmExit exit_reason;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    ret = ioctl(cpu_fd, MSHV_RUN_VP, &exit_msg);
> +    if (ret < 0) {
> +        return MshvVmExitShutdown;
> +    }
> +
> +    switch (exit_msg.header.message_type) {
> +    case HVMSG_UNRECOVERABLE_EXCEPTION:
> +        *msg = exit_msg;
> +        return MshvVmExitShutdown;
> +    case HVMSG_UNMAPPED_GPA:
> +        ret = handle_unmapped_mem(vm_fd, cpu, &exit_msg, &exit_reason);
> +        if (ret < 0) {
> +            error_report("failed to handle unmapped memory");
> +            return -1;
> +        }
> +        return exit_reason;
> +    case HVMSG_GPA_INTERCEPT:
> +        ret = handle_mmio(cpu, &exit_msg, &exit_reason);
> +        if (ret < 0) {
> +            error_report("failed to handle mmio");
> +            return -1;
> +        }
> +        return exit_reason;
> +    case HVMSG_X64_IO_PORT_INTERCEPT:
> +        ret = handle_pio(cpu, &exit_msg);
> +        if (ret < 0) {
> +            return MshvVmExitSpecial;
> +        }
> +        return MshvVmExitIgnore;
> +    default:
> +        msg = &exit_msg;
> +    }
> +
> +    *exit = MshvVmExitIgnore;
> +    return 0;
>   }
>   
>   void mshv_remove_vcpu(int vm_fd, int cpu_fd)
> @@ -1061,34 +1583,6 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
>       return 0;
>   }
>   
> -static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
> -                         uint64_t flags)
> -{
> -    int ret;
> -    union hv_translate_gva_result result = { 0 };
> -
> -    *gpa = 0;
> -    mshv_translate_gva args = {
> -        .gva = gva,
> -        .flags = flags,
> -        .gpa = (__u64 *)gpa,
> -        .result = &result,
> -    };
> -
> -    ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
> -    if (ret < 0) {
> -        error_report("failed to invoke gpa->gva translation");
> -        return -errno;
> -    }
> -    if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
> -        error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
> -        return -1;
> -
> -    }
> -
> -    return 0;
> -}
> -
>   static int guest_mem_read_with_gva(const CPUState *cpu, uint64_t gva,
>                                      uint8_t *data, uintptr_t size,
>                                      bool fetch_instruction)



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop
  2025-05-20 11:30 ` [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
@ 2025-05-20 13:50   ` Paolo Bonzini
  2025-05-20 13:54     ` Paolo Bonzini
  2025-06-06 23:06     ` Nuno Das Neves
  0 siblings, 2 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 13:50 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:30, Magnus Kulke wrote:
> +    int ret;
> +    hv_message exit_msg = { 0 };

You probably don't want to fill 512 bytes on every vmentry.  Maybe pass 
&exit_msg up from mshv_cpu_exec()?

> +        /*
> +         * Read cpu->exit_request before KVM_RUN reads run->immediate_exit.
> +         * Matching barrier in kvm_eat_signals.
> +         */
> +        smp_rmb();

The comment is obviously wrong; unfortunately, the code is wrong too:

1) qemu_cpu_kick_self() is only needed for an old KVM API.  In that API 
the signal handler is blocked while QEMU runs.  In your case, 
qemu_cpu_kick_self() is an expensive way to do nothing.

2) Because of this, there's a race condition between delivering the 
signal and entering MSHV_RUN_VP

You need support in the hypervisor for this: KVM and HVF both have it.

There are two ways to do it, for both cases the hypervisor side for the 
latter can be something like this:

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 72df774e410a..627afece4046 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -530,7 +530,7 @@ static long mshv_run_vp_with_root_scheduler(
  		struct hv_output_dispatch_vp output;

  		ret = mshv_pre_guest_mode_work(vp);
-		if (ret)
+		if (ret || vp->run.flags.immediate_exit)
  			break;

  		if (vp->run.flags.intercept_suspend)
@@ -585,6 +585,7 @@
  		}
  	} while (!vp->run.flags.intercept_suspend);

+	vp->run.flags.immediate_exit = 0;
  	return ret;
  }

Instead of calling qemu_cpu_kick_self(), your signal handler would 
invoke a new MSHV ioctl that sets vp->run.flags.immediate_exit = 1.

And then you also don't need the barrier, by the way, because all 
inter-thread communication is mediated by the signal handler.

Paolo

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround
  2025-05-20 11:30 ` [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround Magnus Kulke
@ 2025-05-20 13:53   ` Paolo Bonzini
  2025-05-22 12:51     ` Magnus Kulke
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 13:53 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:30, Magnus Kulke wrote:
> Qemu maps regions of userland multiple times into the guest. The MSHV
> kernel driver detects those overlapping regions and rejects those
> mappings.

Can you explain what you see?  QEMU doesn't do that, just look at KVM code:

static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
                                       gfn_t start, gfn_t end)
{
         struct kvm_memslot_iter iter;

         kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
                 if (iter.slot->id != id)
                         return true;
         }

         return false;
}

...

         if ((change == KVM_MR_CREATE || change == KVM_MR_MOVE) &&
             kvm_check_memslot_overlap(slots, id, base_gfn, base_gfn + npages))
                 return -EEXIST;


Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop
  2025-05-20 13:50   ` Paolo Bonzini
@ 2025-05-20 13:54     ` Paolo Bonzini
  2025-05-23 17:05       ` Wei Liu
  2025-06-06 23:06     ` Nuno Das Neves
  1 sibling, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 13:54 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 15:50, Paolo Bonzini wrote:
> You need support in the hypervisor for this: KVM and HVF both have it.
> 
> There are two ways to do it

Sorry - I left out the other way which is to pass something *into* 
MSHV_RUN_VP since only half of it is currently used (I think).  But 
that's more complicated; the advantage would be to avoid the ioctl in 
the signal handler but it's not a fast path.  I would just do it the 
easy way.

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 19/25] target/i386/mshv: Set local interrupt controller state
  2025-05-20 11:30 ` [RFC PATCH 19/25] target/i386/mshv: Set local interrupt controller state Magnus Kulke
@ 2025-05-20 14:03   ` Paolo Bonzini
  0 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:03 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

These:

> +/* Defines poached from apicdef.h kernel header. */
> +static u_int32_t APIC_MODE_NMI = 0x4;
> +static u_int32_t APIC_MODE_EXTINT = 0x7;

are available in other QEMU headers (search for APIC_DM_NMI, 
APIC_DM_EXTINT).

Also they should have been #defines rather than statics; and also these 
nearby ones from patch 21 should be #defines in cpu.h instead of being here:

> +/* IA32_MTRR_DEF_TYPE MSR: E (MTRRs enabled) flag, bit 11 */
> +static u_int64_t MTRR_ENABLE = 0x800;
> +static u_int64_t MTRR_MEM_TYPE_WB = 0x6;
> +


Thanks,

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs()
  2025-05-20 11:30 ` [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
@ 2025-05-20 14:05   ` Paolo Bonzini
  2025-05-20 22:15   ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:05 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:30, Magnus Kulke wrote:
> +static void populate_special_regs(const hv_register_assoc *assocs,
> +                                  X86CPU *x86cpu)
> +{
> +    CPUX86State *env = &x86cpu->env;
> +
> +    populate_segment_reg(&assocs[0].value.segment, &env->segs[R_CS]);
> +    populate_segment_reg(&assocs[1].value.segment, &env->segs[R_DS]);
> +    populate_segment_reg(&assocs[2].value.segment, &env->segs[R_ES]);
> +    populate_segment_reg(&assocs[3].value.segment, &env->segs[R_FS]);
> +    populate_segment_reg(&assocs[4].value.segment, &env->segs[R_GS]);
> +    populate_segment_reg(&assocs[5].value.segment, &env->segs[R_SS]);
> +
> +    /* TODO: should we set TR + LDT? */
> +    /* populate_segment_reg(&assocs[6].value.segment, &regs->tr); */
> +    /* populate_segment_reg(&assocs[7].value.segment, &regs->ldt); */

Yes :)

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator
  2025-05-20 11:50   ` Daniel P. Berrangé
@ 2025-05-20 14:16     ` Paolo Bonzini
  0 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:16 UTC (permalink / raw)
  To: Daniel P. Berrangé, Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Michael S. Tsirkin, Wei Liu,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau

On 5/20/25 13:50, Daniel P. Berrangé wrote:
>> +if get_option('mshv').allowed() and host_os == 'linux'
>> +  if get_option('mshv').enabled() and host_machine.cpu() != 'x86_64'
>> +    error('mshv accelerator requires x64_64 host')
>> +  endif
>> +  accelerators += 'CONFIG_MSHV'
> 
> This enables MSHV for non-x86 when the option is left on 'auto'.

This is similar to what other accelerators do.  The idea is that 
--enable-kvm will give an error on Windows, but not (say) on 
SPARC/Linux.  It was done this way to simplify packaging and let distros 
use --enable-kvm unconditionally; and now --enable-mshv should probably 
behave the same way.

The "requires x86_64 host" was copied from whpx, but is really 
unnecessary there because above you have

elif cpu == 'x86_64'
   accelerator_targets += {
     'CONFIG_HVF': ['x86_64-softmmu'],
     'CONFIG_NVMM': ['i386-softmmu', 'x86_64-softmmu'],
     'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
   }
endif

So the patch is mostly okay, however I'd replace:

>> +if cpu == 'x86_64'
>> +  mshv_targets = ['x86_64-softmmu']
>> +else
>> +  mshv_targets = []
>> +endif
>> +accelerator_targets += { 'CONFIG_MSHV': mshv_targets }
>> +

with the simpler

  elif cpu == 'x86_64'
     accelerator_targets += {
       'CONFIG_HVF': ['x86_64-softmmu'],
       'CONFIG_NVMM': ['i386-softmmu', 'x86_64-softmmu'],
       'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
+     'CONFIG_MSHV': ['x86_64-softmmu'],
     }
  endif

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support
  2025-05-20 11:30 ` [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support Magnus Kulke
@ 2025-05-20 14:18   ` Paolo Bonzini
  2025-05-20 20:15   ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:18 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:30, Magnus Kulke wrote:
> diff --git a/include/system/mshv.h b/include/system/mshv.h
> index c7ee4f0cc1..4c1e901835 100644
> --- a/include/system/mshv.h
> +++ b/include/system/mshv.h
> @@ -40,6 +40,10 @@
>    */
>   #define MSHV_USE_IOEVENTFD 1
>   
> +#define MSHV_USE_KERNEL_GSI_IRQFD 1

Please make this code unconditional - same for MSHV_USE_IOEVENTFD.

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 05/25] include/hw/hyperv: Add MSHV ABI header definitions
  2025-05-20 11:29 ` [RFC PATCH 05/25] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
@ 2025-05-20 14:24   ` Paolo Bonzini
  0 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:24 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:29, Magnus Kulke wrote:
> Introduce headers for the Microsoft Hypervisor (MSHV) userspace ABI,
> including IOCTLs and structures used to interface with the hypervisor.
> 
> These definitions are based on the upstream Linux MSHV interface and
> will be used by the MSHV accelerator backend in later patches.
> 
> Note that for the time being the header `linux-mshv.h` is also being
> included to allow building on machines that do not ship the header yet.
> The header will be available in kernel 6.15 (at the time of writing
> we're at -rc6) we will probably drop it in later revisions of the
> patch set.

We do ship headers copied from Linux in QEMU; please modify 
scripts/update-linux-headers.sh to include linux/mshv.h as 
linux-headers/linux/mshv.h.

The other three can remain in include/hw/hyperv, since the Linux 
versions are not intended for consumption outside the kernel (they're 
not in include/uapi/).  But when you copy them...

> +#ifndef HW_HYPERV_HVHDK_H
> +#define HW_HYPERV_HVHDK_H
> +
> +#define HV_PARTITION_SYNTHETIC_PROCESSOR_FEATURES_BANKS 1
> +
> +struct hv_input_set_partition_property {
> +    __u64 partition_id;
> +    __u32 property_code; /* enum hv_partition_property_code */
> +    __u32 padding;
> +    __u64 property_value;

... please change the types to uintNN_t and drop <linux/types.h>.

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator
  2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
                   ` (24 preceding siblings ...)
  2025-05-20 11:30 ` [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround Magnus Kulke
@ 2025-05-20 14:25 ` Paolo Bonzini
  25 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:25 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:29, Magnus Kulke wrote:
> Hello all,
> 
> as previously announced here, we are working on an integration that will
> expose the HyperV hypervisor to QEMU on Linux hosts. HyperV is a Type 1
> hypervisor with a layered architecture that features a "root partition"
> alongside VMs as "child partitions" that will interface with the
> hypervisor and has access to the hardware. (https://aka.ms/hypervarch)

I gave a look at stuff that is usually incorrect in newly submitted 
accelerators. :)  The two main thing to cover are:

- the signal handlers, which require a kernel change

- the memory region issue that isn't clear

Everything else is just small things that need to be cleaned up.

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers()
  2025-05-20 11:30 ` [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
@ 2025-05-20 14:33   ` Paolo Bonzini
  2025-05-20 22:22   ` Wei Liu
  2025-06-06 19:11   ` Wei Liu
  2 siblings, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2025-05-20 14:33 UTC (permalink / raw)
  To: Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/25 13:30, Magnus Kulke wrote:
> Write CPU register state to MSHV vCPUs. Various mapping functions to
> prepare the payload for the HV call have been implemented.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>   include/system/mshv.h       |  41 ++++++
>   target/i386/mshv/mshv-cpu.c | 249 ++++++++++++++++++++++++++++++++++++
>   2 files changed, 290 insertions(+)
> 
> diff --git a/include/system/mshv.h b/include/system/mshv.h
> index 055489a6f3..76a3b0010e 100644
> --- a/include/system/mshv.h
> +++ b/include/system/mshv.h
> @@ -99,6 +99,46 @@ typedef struct MshvMsiControl {
>   #define EFER_LMA   ((uint64_t)0x400)
>   #define EFER_LME   ((uint64_t)0x100)
>   
> +/* CR0 bits */
> +#define CR0_PE     ((uint64_t)0x1)
> +#define CR0_PG     ((uint64_t)0x80000000)
> +
> +/* CR4 bits */
> +#define CR4_PAE    ((uint64_t)0x20)
> +#define CR4_LA57   ((uint64_t)0x1000)
> +
> +/* rflags bits (shift values) */
> +#define CF_SHIFT   0
> +#define PF_SHIFT   2
> +#define AF_SHIFT   4
> +#define ZF_SHIFT   6
> +#define SF_SHIFT   7
> +#define DF_SHIFT   10
> +#define OF_SHIFT   11
> +
> +/* rflags bits (bit masks) */
> +#define CF         ((uint64_t)1 << CF_SHIFT)
> +#define PF         ((uint64_t)1 << PF_SHIFT)
> +#define AF         ((uint64_t)1 << AF_SHIFT)
> +#define ZF         ((uint64_t)1 << ZF_SHIFT)
> +#define SF         ((uint64_t)1 << SF_SHIFT)
> +#define DF         ((uint64_t)1 << DF_SHIFT)
> +#define OF         ((uint64_t)1 << OF_SHIFT)

All of these are either duplicate or unused.

Paolo



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream
  2025-05-20 11:29 ` [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream Magnus Kulke
  2025-05-20 12:42   ` Paolo Bonzini
@ 2025-05-20 17:29   ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 17:29 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:29:55PM +0200, Magnus Kulke wrote:
> Introduce a new helper function to decode x86 instructions from a
> raw instruction byte stream. MSHV delivers an instruction stream in a
> buffer of the vm_exit message. It can be used to speed up MMIO
> emulation, since instructions do not have to be fetched and translated.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  target/i386/emulate/x86_decode.c | 32 +++++++++++++++++++++++++++-----
>  target/i386/emulate/x86_decode.h | 11 +++++++++++
>  target/i386/emulate/x86_emu.c    |  3 ++-
>  target/i386/emulate/x86_emu.h    |  1 +
>  4 files changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/target/i386/emulate/x86_decode.c b/target/i386/emulate/x86_decode.c
> index 88be9479a8..7a862b976e 100644
> --- a/target/i386/emulate/x86_decode.c
> +++ b/target/i386/emulate/x86_decode.c
> @@ -60,6 +60,7 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
>                                      int size)
>  {
>      uint64_t val = 0;
> +    target_ulong va;
>  
>      switch (size) {
>      case 1:
> @@ -71,10 +72,16 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
>          VM_PANIC_EX("%s invalid size %d\n", __func__, size);
>          break;
>      }
> -    target_ulong va  = linear_rip(env_cpu(env), env->eip) + decode->len;
> -    emul_ops->read_mem(env_cpu(env), &val, va, size);
> +
> +	/* copy the bytes from the instruction stream, if available */
> +	if (decode->stream && decode->len + size <= decode->stream->len) {
> +		memcpy(&val, decode->stream->bytes + decode->len, size);
> +	} else {
> +		va = linear_rip(env_cpu(env), env->eip) + decode->len;
> +		emul_ops->fetch_instruction(env_cpu(env), &val, va, size);
> +	}

You're using tabs here.

>      decode->len += size;
> -    
> +

Unrelated whitespace change.

>      return val;
>  }
>  
> @@ -2076,9 +2083,8 @@ static void decode_opcodes(CPUX86State *env, struct x86_decode *decode)
>      }
>  }
>  
> -uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
> +static uint32_t decode_opcode(CPUX86State *env, struct x86_decode *decode)
>  {
> -    memset(decode, 0, sizeof(*decode));

Why lift this out to its callers when in both cases they need to call
memset anyway?

>      decode_prefix(env, decode);
>      set_addressing_size(env, decode);
>      set_operand_size(env, decode);
> @@ -2088,6 +2094,22 @@ uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
>      return decode->len;
>  }
>  
> +uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
> +{
> +	memset(decode, 0, sizeof(*decode));
> +	return decode_opcode(env, decode);
> +}
> +
> +uint32_t decode_instruction_stream(CPUX86State *env, struct x86_decode *decode,
> +		                           struct x86_insn_stream *stream)
> +{
> +	memset(decode, 0, sizeof(*decode));
> +	if (stream != NULL) {
> +		decode->stream = stream;
> +	}
> +	return decode_opcode(env, decode);
> +}
> +
>  void init_decoder(void)
>  {
>      int i;
> diff --git a/target/i386/emulate/x86_decode.h b/target/i386/emulate/x86_decode.h
> index 87cc728598..9bc7d6cc49 100644
> --- a/target/i386/emulate/x86_decode.h
> +++ b/target/i386/emulate/x86_decode.h
> @@ -269,6 +269,11 @@ typedef struct x86_decode_op {
>      target_ulong ptr;
>  } x86_decode_op;
>  
> +typedef struct x86_insn_stream {
> +	const uint8_t *bytes;
> +	size_t len;
> +} x86_insn_stream;
> +
>  typedef struct x86_decode {
>      int len;
>      uint8_t opcode[4];
> @@ -295,12 +300,18 @@ typedef struct x86_decode {
>      struct x86_modrm modrm;
>      struct x86_decode_op op[4];
>      bool is_fpu;
> +
> +	x86_insn_stream *stream;

Tab here.

>  } x86_decode;
>  
>  uint64_t sign(uint64_t val, int size);
>  
>  uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode);
>  
> +uint32_t decode_instruction_stream(CPUX86State *env,
> +								   struct x86_decode *decode,

Tabs here again.

I suppose you will need to configure your editor correctly and change
from tabs to spaces.

Thanks,
Wei.

> +		                           struct x86_insn_stream *stream);
> +
>  target_ulong get_reg_ref(CPUX86State *env, int reg, int rex_present,
>                           int is_extended, int size);
>  target_ulong get_reg_val(CPUX86State *env, int reg, int rex_present,
> diff --git a/target/i386/emulate/x86_emu.c b/target/i386/emulate/x86_emu.c
> index 7773b51b95..73c9eb41d1 100644
> --- a/target/i386/emulate/x86_emu.c
> +++ b/target/i386/emulate/x86_emu.c
> @@ -1241,7 +1241,8 @@ static void init_cmd_handler(void)
>  bool exec_instruction(CPUX86State *env, struct x86_decode *ins)
>  {
>      if (!_cmd_handler[ins->cmd].handler) {
> -        printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n", env->eip,
> +        printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n",
> +                env->eip,
>                  ins->cmd, ins->opcode[0],
>                  ins->opcode_len > 1 ? ins->opcode[1] : 0);
>          env->eip += ins->len;
> diff --git a/target/i386/emulate/x86_emu.h b/target/i386/emulate/x86_emu.h
> index 555b567e2c..761e83fd6b 100644
> --- a/target/i386/emulate/x86_emu.h
> +++ b/target/i386/emulate/x86_emu.h
> @@ -24,6 +24,7 @@
>  #include "cpu.h"
>  
>  struct x86_emul_ops {
> +    void (*fetch_instruction)(CPUState *cpu, void *data, target_ulong addr, int bytes);
>      void (*read_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
>      void (*write_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
>      void (*read_segment_descriptor)(CPUState *cpu, struct x86_segment_descriptor *desc,
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation
  2025-05-20 11:29 ` [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
  2025-05-20 11:54   ` Daniel P. Berrangé
  2025-05-20 13:17   ` Paolo Bonzini
@ 2025-05-20 17:36   ` Wei Liu
  2 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 17:36 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:29:56PM +0200, Magnus Kulke wrote:
> The MSHV accelerator requires a x86 decoder/emulator in userland to
> emulate MMIO instructions. This change contains the implementations for
> the generalized i386 instruction decoder/emulator.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
> +bool x86_read_segment_descriptor(CPUState *cpu,
> +                                 struct x86_segment_descriptor *desc,
> +                                 x86_segment_selector sel)
> +{
> +    target_ulong base;
> +    uint32_t limit;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86_cpu->env;
> +    target_ulong gva;
> +    /* int ret; */

Unused code. Please drop.

> +
> +    memset(desc, 0, sizeof(*desc));
> +
> +    /* valid gdt descriptors start from index 1 */
> +    if (!sel.index && GDT_SEL == sel.ti) {
> +        return false;
> +    }
> +
> +    if (GDT_SEL == sel.ti) {
> +        base = env->gdt.base;
> +        limit = env->gdt.limit;
> +    } else {
> +        base = env->ldt.base;
> +        limit = env->ldt.limit;
> +    }
> +
> +    if (sel.index * 8 >= limit) {
> +        return false;
> +    }
> +
> +    gva = base + sel.index * 8;
> +    emul_ops->read_mem(cpu, desc, gva, sizeof(*desc));
> +
> +    return true;
> +}
> +
> +bool x86_write_segment_descriptor(CPUState *cpu,
> +                                  struct x86_segment_descriptor *desc,
> +                                  x86_segment_selector sel)
> +{
> +    target_ulong base;
> +    uint32_t limit;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86_cpu->env;
> +    /* int ret; */

Unused code. Please drop.

> +    target_ulong gva;
> +
> +    if (GDT_SEL == sel.ti) {
> +        base = env->gdt.base;
> +        limit = env->gdt.limit;
> +    } else {
> +        base = env->ldt.base;
> +        limit = env->ldt.limit;
> +    }
> +
> +    if (sel.index * 8 >= limit) {
> +        return false;
> +    }
> +
> +    gva = base + sel.index * 8;
> +    emul_ops->write_mem(cpu, desc, gva, sizeof(*desc));
> +
> +    return true;
> +}
> +
[...]
> +
> +target_ulong linear_addr(CPUState *cpu, target_ulong addr, X86Seg seg)
> +{
> +    int ret;
> +    target_ulong linear_addr;
> +
> +    /* return vmx_read_segment_base(cpu, seg) + addr; */

Unused code.

Thanks,
Wei.

> +    ret = linearize(cpu, addr, &linear_addr, seg);
> +    if (ret < 0) {
> +        error_report("failed to linearize address");
> +        abort();
> +    }
> +
> +    return linear_addr;
> +}
> +
> +target_ulong linear_addr_size(CPUState *cpu, target_ulong addr, int size,
> +                              X86Seg seg)
> +{
> +    switch (size) {
> +    case 2:
> +        addr = (uint16_t)addr;
> +        break;
> +    case 4:
> +        addr = (uint32_t)addr;
> +        break;
> +    default:
> +        break;
> +    }
> +    return linear_addr(cpu, addr, seg);
> +}
> +
> +target_ulong linear_rip(CPUState *cpu, target_ulong rip)
> +{
> +    return linear_addr(cpu, rip, R_CS);
> +}
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-20 11:30 ` [RFC PATCH 08/25] accel/mshv: Initialize VM partition Magnus Kulke
@ 2025-05-20 19:07   ` Wei Liu
  2025-05-22 15:42     ` Magnus Kulke
  2025-05-23  8:23     ` Magnus Kulke
  0 siblings, 2 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 19:07 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:01PM +0200, Magnus Kulke wrote:
> Create the MSHV virtual machine by opening a partition and issuing
> the necessary ioctl to initialize it. This sets up the basic VM
> structure and initial configuration used by MSHV to manage guest state.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
> diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
> index 63b0eca1fc..95f1008a48 100644
> --- a/accel/mshv/mshv-all.c
> +++ b/accel/mshv/mshv-all.c
> @@ -48,6 +48,178 @@ bool mshv_allowed;
>  
>  MshvState *mshv_state;
>  
> +static int init_mshv(int *mshv_fd)
> +{
> +    int fd = open("/dev/mshv", O_RDWR | O_CLOEXEC);
> +    if (fd < 0) {
> +        error_report("Failed to open /dev/mshv: %s", strerror(errno));
> +        return -1;
> +    }
> +	*mshv_fd = fd;
> +	return 0;
> +}
> +
> +/* freeze 1 to pause, 0 to resume */
> +static int set_time_freeze(int vm_fd, int freeze)
> +{
> +    int ret;
> +
> +    if (freeze != 0 && freeze != 1) {
> +        error_report("Invalid time freeze value");
> +        return -1;
> +    }
> +
> +    struct hv_input_set_partition_property in = {0};
> +    in.property_code = HV_PARTITION_PROPERTY_TIME_FREEZE;
> +    in.property_value = freeze;
> +
> +    struct mshv_root_hvcall args = {0};
> +    args.code = HVCALL_SET_PARTITION_PROPERTY;
> +    args.in_sz = sizeof(in);
> +    args.in_ptr = (uint64_t)&in;
> +
> +    ret = mshv_hvcall(vm_fd, &args);
> +    if (ret < 0) {
> +        error_report("Failed to set time freeze");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int pause_vm(int vm_fd)

I feel like the name of this function and its counter part is too broad.
Pausing the VM to me means not only freezing the time, but also making
sure VCPUs are no longer scheduled, all I/O are quiescent.

This is not blocking. Luckily we can always change the name later.

> +{
> +    int ret;
> +
> +    ret = set_time_freeze(vm_fd, 1);
> +    if (ret < 0) {
> +        error_report("Failed to pause partition: %s", strerror(errno));
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int resume_vm(int vm_fd)
> +{
[...]
> +
> +static int create_vm(int mshv_fd)
> +{
> +    int vm_fd;
> +
> +    int ret = create_partition(mshv_fd, &vm_fd);
> +    if (ret < 0) {
> +        close(mshv_fd);

No please don't close it here. This fd is not created by this function.

> +        return -errno;
> +    }
> +
> +    ret = set_synthetic_proc_features(vm_fd);
> +    if (ret < 0) {
> +        return -errno;
> +    }
> +
> +    ret = initialize_vm(vm_fd);
> +    if (ret < 0) {
> +        return -1;
> +    }
> +
> +    ret = mshv_arch_post_init_vm(vm_fd);
> +    if (ret < 0) {
> +        return -1;
> +    }
> +
> +    /* Always create a frozen partition */
> +    pause_vm(vm_fd);
> +
> +    return vm_fd;
> +}
>  
>  static void mem_region_add(MemoryListener *listener,
>                             MemoryRegionSection *section)
> @@ -96,22 +268,54 @@ static void register_mshv_memory_listener(MshvState *s, MshvMemoryListener *mml,
>          }
>      }
>  }
> +static void mshv_reset(void *param)
> +{
> +    warn_report("mshv reset");

What's missing for this hook?

> +}
> +
> +int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args)

Please be consistent and change mshv_fd to vm_fd.

> +{
> +    int ret = 0;
> +
> +    ret = ioctl(mshv_fd, MSHV_ROOT_HVCALL, args);
> +    if (ret < 0) {
> +        error_report("Failed to perform hvcall: %s", strerror(errno));
> +        return -1;
> +    }
> +    return ret;
> +}
>  
>  
>  static int mshv_init(MachineState *ms)
>  {
>      MshvState *s;
> +    int mshv_fd, ret;
> +
>      s = MSHV_STATE(ms->accelerator);
>  
>      accel_blocker_init();
>  
>      s->vm = 0;
>  
> +    ret = init_mshv(&mshv_fd);
> +    if (ret < 0) {
> +        return -1;
> +    }
> +
> +    do {
> +        int vm_fd = create_vm(mshv_fd);
> +        s->vm = vm_fd;
> +    } while (!s->vm);
> +

This loop doesn't make sense to me. The create_vm function doesn't
return 0 as "try again".

> +    resume_vm(s->vm);
> +

mshv_fd is neither stashed into a state structure nor freed after this
point.  Is it leaked?

Thanks,
Wei.

>      s->nr_as = 1;
>      s->as = g_new0(MshvAddressSpace, s->nr_as);
>  
>      mshv_state = s;
>  
> +    qemu_register_reset(mshv_reset, NULL);
> +
>      register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
>                                    0, "mshv-memory");
>      memory_listener_register(&mshv_io_listener, &address_space_io);


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor
  2025-05-20 11:30 ` [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor Magnus Kulke
@ 2025-05-20 20:07   ` Wei Liu
  2025-05-23 14:17     ` Magnus Kulke
  0 siblings, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-05-20 20:07 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:02PM +0200, Magnus Kulke wrote:
> Handle region_add events by invoking the MSHV memory registration
> ioctl to map guest memory into the hypervisor partition. This allows
> the guest to access memory through MSHV-managed mappings.
> 
> Note that this assumes the hypervisor will accept regions that overlap
> in userspace_addr. Currently that's not the case, it will be addressed
> in a later commit in the series.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  accel/mshv/mem.c        | 116 ++++++++++++++++++++++++++++++++++++++--
>  accel/mshv/trace-events |   1 +
>  include/system/mshv.h   |  11 ++++
>  3 files changed, 125 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
> index eddd83ae83..2bbeae4f4a 100644
> --- a/accel/mshv/mem.c
> +++ b/accel/mshv/mem.c
> @@ -13,13 +13,123 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/error-report.h"
> +#include "hw/hyperv/linux-mshv.h"
>  #include "system/address-spaces.h"
>  #include "system/mshv.h"
> +#include "exec/memattrs.h"
> +#include <sys/ioctl.h>
> +#include "trace.h"
> +
> +static int set_guest_memory(int vm_fd, const mshv_user_mem_region *region)
> +{
> +    int ret;
> +
> +    ret = ioctl(vm_fd, MSHV_SET_GUEST_MEMORY, region);
> +    if (ret < 0) {
> +        error_report("failed to set guest memory");
> +        return -errno;
> +    }
> +
> +    return 0;
> +}
> +
> +static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool add)

Change "add" to "map" to match the name of the function.

> +{
> +    struct mshv_user_mem_region region = {0};
> +
> +    region.guest_pfn = mr->guest_phys_addr >> MSHV_PAGE_SHIFT;
> +    region.size = mr->memory_size;
> +    region.userspace_addr = mr->userspace_addr;
> +
> +    if (!add) {
> +        region.flags |= (1 << MSHV_SET_MEM_BIT_UNMAP);

Use BIT() like you did in other places?

> +        return set_guest_memory(vm_fd, &region);
> +    }
> +
> +    region.flags = (1 << MSHV_SET_MEM_BIT_EXECUTABLE);

Should this be always set? Is there a way to get more information from
the caller or QEMU's core memory region management logic?

> +    if (!mr->readonly) {
> +        region.flags |= (1 << MSHV_SET_MEM_BIT_WRITABLE);
> +    }
> +
> +    return set_guest_memory(vm_fd, &region);
> +}
> +
> +static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
> +{
> +    int ret = 0;
> +
> +    if (!mshv_mr) {
> +        error_report("Invalid mshv_mr");
> +        return -1;
> +    }
> +
> +    trace_mshv_set_memory(add, mshv_mr->guest_phys_addr,
> +                          mshv_mr->memory_size,
> +                          mshv_mr->userspace_addr, mshv_mr->readonly,
> +                          ret);
> +    return map_or_unmap(mshv_state->vm, mshv_mr, add);
> +}
> +
> +/*
> + * Calculate and align the start address and the size of the section.
> + * Return the size. If the size is 0, the aligned section is empty.
> + */
> +static hwaddr align_section(MemoryRegionSection *section, hwaddr *start)
> +{
> +    hwaddr size = int128_get64(section->size);
> +    hwaddr delta, aligned;
> +
> +    /*
> +     * works in page size chunks, but the function may be called
> +     * with sub-page size and unaligned start address. Pad the start
> +     * address to next and truncate size to previous page boundary.
> +     */
> +    aligned = ROUND_UP(section->offset_within_address_space,
> +                       qemu_real_host_page_size());
> +    delta = aligned - section->offset_within_address_space;
> +    *start = aligned;
> +    if (delta > size) {
> +        return 0;
> +    }
> +
> +    return (size - delta) & qemu_real_host_page_mask();
> +}
>  
>  void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
>                         bool add)
>  {
> -	error_report("unimplemented");
> -	abort();
> -}
> +    int ret = 0;
> +    MemoryRegion *area = section->mr;
> +    bool writable = !area->readonly && !area->rom_device;
> +    hwaddr start_addr, mr_offset, size;
> +    void *ram;
> +    MshvMemoryRegion tmp, *mshv_mr = &tmp;
> +
> +    if (!memory_region_is_ram(area)) {
> +        if (writable) {
> +            return;
> +        }
> +    }
> +

I don't follow the check here. Can you put in a comment?

Thanks,
Wei.

> +    size = align_section(section, &start_addr);
> +    if (!size) {
> +        return;
> +    }
> +
> +    mr_offset = section->offset_within_region + start_addr -
> +                section->offset_within_address_space;
>  
> +    ram = memory_region_get_ram_ptr(area) + mr_offset;
> +
> +    memset(mshv_mr, 0, sizeof(*mshv_mr));
> +    mshv_mr->guest_phys_addr = start_addr;
> +    mshv_mr->memory_size = size;
> +    mshv_mr->readonly = !writable;
> +    mshv_mr->userspace_addr = (uint64_t)ram;
> +
> +    ret = set_memory(mshv_mr, add);
> +    if (ret < 0) {
> +        error_report("Failed to set memory region");
> +        abort();
> +    }
> +}


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support
  2025-05-20 11:30 ` [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support Magnus Kulke
  2025-05-20 14:18   ` Paolo Bonzini
@ 2025-05-20 20:15   ` Wei Liu
  2025-05-27 16:27     ` Magnus Kulke
  1 sibling, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-05-20 20:15 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:04PM +0200, Magnus Kulke wrote:
> Implement initial interrupt handling logic in the MSHV backend. This
> includes management of MSI and un/registering of irqfd mechanisms.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
> +int mshv_request_interrupt(int vm_fd, uint32_t interrupt_type, uint32_t vector,
> +                           uint32_t vp_index, bool logical_dest_mode,
> +                           bool level_triggered)
> +{
> +    int ret;
> +
> +    if (vector == 0) {
> +        /* TODO: why do we receive this? */

You must have seen this in real life, right? We need to convince
ourselves why this is okay.

Thanks,
Wei.

> +        return 0;
> +    }
> +
> +    union hv_interrupt_control control = {
> +        .interrupt_type = interrupt_type,
> +        .level_triggered = level_triggered,
> +        .logical_dest_mode = logical_dest_mode,
> +        .rsvd = 0,
> +    };
> +
> +    struct hv_input_assert_virtual_interrupt arg = {0};
> +    arg.control = control;
> +    arg.dest_addr = (uint64_t)vp_index;
> +    arg.vector = vector;
> +
> +    struct mshv_root_hvcall args = {0};
> +    args.code   = HVCALL_ASSERT_VIRTUAL_INTERRUPT;
> +    args.in_sz  = sizeof(arg);
> +    args.in_ptr = (uint64_t)&arg;
> +
> +    ret = mshv_hvcall(vm_fd, &args);
> +    if (ret < 0) {
> +        error_report("Failed to request interrupt");
> +        return -errno;
> +    }
> +    return 0;
> +}
> +


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 14/25] target/i386/mshv: Add CPU create and remove logic
  2025-05-20 11:30 ` [RFC PATCH 14/25] target/i386/mshv: Add CPU create and remove logic Magnus Kulke
@ 2025-05-20 21:50   ` Wei Liu
  0 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 21:50 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:07PM +0200, Magnus Kulke wrote:
> Implement MSHV-specific hooks for vCPU creation and teardown in the
> i386 target. A list of locks per vCPU is maintained to lock CPU state in
> MMIO operations.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  target/i386/mshv/mshv-cpu.c | 61 +++++++++++++++++++++++++++++++++----
>  1 file changed, 55 insertions(+), 6 deletions(-)
> 
> diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
> index c4b2c297e2..0ba1dacaed 100644
> --- a/target/i386/mshv/mshv-cpu.c
> +++ b/target/i386/mshv/mshv-cpu.c
> @@ -14,6 +14,8 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/atomic.h"
> +#include "qemu/lockable.h"
>  #include "qemu/error-report.h"
>  #include "qemu/typedefs.h"
>  
> @@ -30,6 +32,36 @@
>  #include "trace-accel_mshv.h"
>  #include "trace.h"
>  
> +#include <sys/ioctl.h>
> +
> +static QemuMutex *cpu_guards_lock;
> +static GHashTable *cpu_guards;
> +
> +static void add_cpu_guard(int cpu_fd)
> +{
> +    QemuMutex *guard;
> +
> +    WITH_QEMU_LOCK_GUARD(cpu_guards_lock) {
> +        guard = g_new0(QemuMutex, 1);
> +        qemu_mutex_init(guard);
> +        g_hash_table_insert(cpu_guards, GUINT_TO_POINTER(cpu_fd), guard);
> +    }
> +}
> +
> +static void remove_cpu_guard(int cpu_fd)
> +{
> +    QemuMutex *guard;
> +
> +    WITH_QEMU_LOCK_GUARD(cpu_guards_lock) {
> +        guard = g_hash_table_lookup(cpu_guards, GUINT_TO_POINTER(cpu_fd));
> +        if (guard) {
> +            qemu_mutex_destroy(guard);
> +            g_free(guard);
> +            g_hash_table_remove(cpu_guards, GUINT_TO_POINTER(cpu_fd));
> +        }
> +    }
> +}
> +
>  int mshv_store_regs(CPUState *cpu)
>  {
>  	error_report("unimplemented");
> @@ -62,20 +94,37 @@ int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
>  
>  void mshv_remove_vcpu(int vm_fd, int cpu_fd)
>  {
> -	error_report("unimplemented");
> -	abort();
> +    /*
> +     * TODO: don't we have to perform an ioctl to remove the vcpu?
> +     * there is WHvDeleteVirtualProcessor in the WHV api
> +     */
> +    remove_cpu_guard(cpu_fd);

Can you just park that CPU and never schedule it again?

There is a DELETE_VP call but we may not have exposed that to user
space.

The code as-is seems to be leaking the cpu_fd. If it is handled
elsewhere you can ignore this comment.

Thanks,
Wei.

>  }
>  
> +
>  int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
>  {
> -	error_report("unimplemented");
> -	abort();
> +    int ret;
> +    struct mshv_create_vp vp_arg = {
> +        .vp_index = vp_index,
> +    };
> +    ret = ioctl(vm_fd, MSHV_CREATE_VP, &vp_arg);
> +    if (ret < 0) {
> +        error_report("failed to create mshv vcpu: %s", strerror(errno));
> +        return -1;
> +    }
> +
> +    add_cpu_guard(ret);
> +    *cpu_fd = ret;
> +
> +    return 0;
>  }
>  
>  void mshv_init_cpu_logic(void)
>  {
> -	error_report("unimplemented");
> -	abort();
> +    cpu_guards_lock = g_new0(QemuMutex, 1);
> +    qemu_mutex_init(cpu_guards_lock);
> +    cpu_guards = g_hash_table_new(g_direct_hash, g_direct_equal);
>  }
>  
>  void mshv_arch_init_vcpu(CPUState *cpu)
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 15/25] target/i386/mshv: Implement mshv_store_regs()
  2025-05-20 11:30 ` [RFC PATCH 15/25] target/i386/mshv: Implement mshv_store_regs() Magnus Kulke
@ 2025-05-20 22:07   ` Wei Liu
  0 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 22:07 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:08PM +0200, Magnus Kulke wrote:
> Add support for writing general-purpose registers to MSHV vCPUs
> during initialization or migration using the MSHV register interface. A
> generic set_register call is introduced to abstract the HV call over
> the various register types.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
>  int mshv_store_regs(CPUState *cpu)
>  {
> -	error_report("unimplemented");
> -	abort();
> +    int ret;
> +
> +    ret = set_standard_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to store standard registers");
> +        return -1;
> +    }
> +
> +    /* TODO: should store special registers? the equivalent hvf code doesn't */

(I'm just using x86 KVM's special registers as a reference)

We should not need to store them every time we refresh the CPU state,
unless we know some of them are dirtied by QEMU.

Thanks,
Wei.

> +
> +    return 0;
>  }
>  
> +
>  int mshv_load_regs(CPUState *cpu)
>  {
>  	error_report("unimplemented");
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 16/25] target/i386/mshv: Implement mshv_get_standard_regs()
  2025-05-20 11:30 ` [RFC PATCH 16/25] target/i386/mshv: Implement mshv_get_standard_regs() Magnus Kulke
@ 2025-05-20 22:09   ` Wei Liu
  0 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 22:09 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:09PM +0200, Magnus Kulke wrote:
> Fetch standard register state from MSHV vCPUs to support debugging,
> migration, and other introspection features in QEMU.
> 
> Fetch standard register state from a MHSV vCPU's. A generic get_regs()
> function and a mapper to map the different register representations are
> introduced.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
>  int mshv_load_regs(CPUState *cpu)
>  {
> +    int ret;
> +
> +    ret = mshv_get_standard_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to load standard registers");
> +        return -1;
> +    }
> +
>  	error_report("unimplemented");
>  	abort();

This part looks wrong. It should be "return 0;" instead.

Thanks,
Wei.

>  }

> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs()
  2025-05-20 11:30 ` [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
  2025-05-20 14:05   ` Paolo Bonzini
@ 2025-05-20 22:15   ` Wei Liu
  2025-05-28 13:55     ` Magnus Kulke
  1 sibling, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-05-20 22:15 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:10PM +0200, Magnus Kulke wrote:
> Retrieve special registers (e.g. segment, control, and descriptor
> table registers) from MSHV vCPUs.
> 
> Various helper functions to map register state representations between
> Qemu and MSHV are introduced.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  include/system/mshv.h       |   1 +
>  target/i386/mshv/mshv-cpu.c | 118 +++++++++++++++++++++++++++++++++++-
>  2 files changed, 117 insertions(+), 2 deletions(-)
> 
> diff --git a/include/system/mshv.h b/include/system/mshv.h
> index 9b78b66a24..055489a6f3 100644
> --- a/include/system/mshv.h
> +++ b/include/system/mshv.h
> @@ -109,6 +109,7 @@ void mshv_init_cpu_logic(void);
>  int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
>  void mshv_remove_vcpu(int vm_fd, int cpu_fd);
>  int mshv_get_standard_regs(CPUState *cpu);
> +int mshv_get_special_regs(CPUState *cpu);
>  int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
>  int mshv_load_regs(CPUState *cpu);
>  int mshv_store_regs(CPUState *cpu);
> diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
> index 41584c3f8e..979ee5b8c3 100644
> --- a/target/i386/mshv/mshv-cpu.c
> +++ b/target/i386/mshv/mshv-cpu.c
> @@ -58,6 +58,27 @@ static enum hv_register_name STANDARD_REGISTER_NAMES[18] = {
>      HV_X64_REGISTER_RFLAGS,
>  };
>  
> +static enum hv_register_name SPECIAL_REGISTER_NAMES[18] = {
[...]
> +    HV_REGISTER_PENDING_INTERRUPTION,

Why do you think this is needed?

> +};
> +
>  static void add_cpu_guard(int cpu_fd)
>  {
>      QemuMutex *guard;
> @@ -215,6 +236,94 @@ int mshv_get_standard_regs(CPUState *cpu)
>      return 0;
>  }
>  
> +static void populate_segment_reg(const hv_x64_segment_register *hv_seg,
> +                                 SegmentCache *seg)
> +{
> +    memset(seg, 0, sizeof(SegmentCache));
> +
> +    seg->base = hv_seg->base;
> +    seg->limit = hv_seg->limit;
> +    seg->selector = hv_seg->selector;
> +
> +    seg->flags = (hv_seg->segment_type << DESC_TYPE_SHIFT)
> +                 | (hv_seg->present * DESC_P_MASK)
> +                 | (hv_seg->descriptor_privilege_level << DESC_DPL_SHIFT)
> +                 | (hv_seg->_default << DESC_B_SHIFT)
> +                 | (hv_seg->non_system_segment * DESC_S_MASK)
> +                 | (hv_seg->_long << DESC_L_SHIFT)
> +                 | (hv_seg->granularity * DESC_G_MASK)
> +                 | (hv_seg->available * DESC_AVL_MASK);
> +
> +}
> +
> +static void populate_table_reg(const hv_x64_table_register *hv_seg,
> +                               SegmentCache *tbl)
> +{
> +    memset(tbl, 0, sizeof(SegmentCache));
> +
> +    tbl->base = hv_seg->base;
> +    tbl->limit = hv_seg->limit;
> +}
> +
> +static void populate_special_regs(const hv_register_assoc *assocs,
> +                                  X86CPU *x86cpu)
> +{
> +    CPUX86State *env = &x86cpu->env;
> +
> +    populate_segment_reg(&assocs[0].value.segment, &env->segs[R_CS]);
> +    populate_segment_reg(&assocs[1].value.segment, &env->segs[R_DS]);
> +    populate_segment_reg(&assocs[2].value.segment, &env->segs[R_ES]);
> +    populate_segment_reg(&assocs[3].value.segment, &env->segs[R_FS]);
> +    populate_segment_reg(&assocs[4].value.segment, &env->segs[R_GS]);
> +    populate_segment_reg(&assocs[5].value.segment, &env->segs[R_SS]);
> +
> +    /* TODO: should we set TR + LDT? */
> +    /* populate_segment_reg(&assocs[6].value.segment, &regs->tr); */
> +    /* populate_segment_reg(&assocs[7].value.segment, &regs->ldt); */
> +
> +    populate_table_reg(&assocs[8].value.table, &env->gdt);
> +    populate_table_reg(&assocs[9].value.table, &env->idt);
> +
> +    env->cr[0] = assocs[10].value.reg64;
> +    env->cr[2] = assocs[11].value.reg64;
> +    env->cr[3] = assocs[12].value.reg64;
> +    env->cr[4] = assocs[13].value.reg64;
> +
> +    cpu_set_apic_tpr(x86cpu->apic_state, assocs[14].value.reg64);
> +    env->efer = assocs[15].value.reg64;
> +    cpu_set_apic_base(x86cpu->apic_state, assocs[16].value.reg64);
> +
> +    /* TODO: should we set those? */
> +    /* pending_reg = assocs[17].value.pending_interruption.as_uint64; */
> +    /* populate_interrupt_bitmap(pending_reg, regs->interrupt_bitmap); */

If QEMU never touches it, then there is no need to set it.

> +}
> +
> +
> +int mshv_get_special_regs(CPUState *cpu)
> +{
> +    size_t n_regs = sizeof(SPECIAL_REGISTER_NAMES) / sizeof(hv_register_name);
> +    struct hv_register_assoc *assocs;
> +    int ret;
> +    X86CPU *x86cpu = X86_CPU(cpu);
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    assocs = g_new0(hv_register_assoc, n_regs);
> +    for (size_t i = 0; i < n_regs; i++) {
> +        assocs[i].name = SPECIAL_REGISTER_NAMES[i];
> +    }
> +    ret = get_generic_regs(cpu_fd, assocs, n_regs);
> +    if (ret < 0) {
> +        error_report("failed to get special registers");
> +        g_free(assocs);
> +        return -errno;
> +    }
> +
> +    populate_special_regs(assocs, x86cpu);
> +
> +    g_free(assocs);
> +    return 0;
> +}
> +
>  int mshv_load_regs(CPUState *cpu)
>  {
>      int ret;
> @@ -225,8 +334,13 @@ int mshv_load_regs(CPUState *cpu)
>          return -1;
>      }
>  
> -	error_report("unimplemented");
> -	abort();
> +    ret = mshv_get_special_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to load special registers");
> +        return -1;
> +    }
> +
> +    return 0;

Ah so you changed the code in this patch.

Thanks,
Wei.

>  }
>  
>  int mshv_arch_put_registers(const CPUState *cpu)
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers()
  2025-05-20 11:30 ` [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
  2025-05-20 14:33   ` Paolo Bonzini
@ 2025-05-20 22:22   ` Wei Liu
  2025-05-28 14:30     ` Magnus Kulke
  2025-06-06 19:11   ` Wei Liu
  2 siblings, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-05-20 22:22 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:11PM +0200, Magnus Kulke wrote:
> Write CPU register state to MSHV vCPUs. Various mapping functions to
> prepare the payload for the HV call have been implemented.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
> +static int set_special_regs(const CPUState *cpu)
> +{
> +    X86CPU *x86cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86cpu->env;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +    struct hv_register_assoc *assocs;
> +    size_t n_regs = sizeof(SPECIAL_REGISTER_NAMES) / sizeof(hv_register_name);
> +    int ret;
> +
> +    assocs = g_new0(struct hv_register_assoc, n_regs);
> +
> +    /* set names */
> +    for (size_t i = 0; i < n_regs; i++) {
> +        assocs[i].name = SPECIAL_REGISTER_NAMES[i];
> +    }
> +    populate_hv_segment_reg(&env->segs[R_CS], &assocs[0].value.segment);
> +    populate_hv_segment_reg(&env->segs[R_DS], &assocs[1].value.segment);
> +    populate_hv_segment_reg(&env->segs[R_ES], &assocs[2].value.segment);
> +    populate_hv_segment_reg(&env->segs[R_FS], &assocs[3].value.segment);
> +    populate_hv_segment_reg(&env->segs[R_GS], &assocs[4].value.segment);
> +    populate_hv_segment_reg(&env->segs[R_SS], &assocs[5].value.segment);
> +    populate_hv_segment_reg(&env->tr, &assocs[6].value.segment);
> +    populate_hv_segment_reg(&env->ldt, &assocs[7].value.segment);
> +
> +    populate_hv_table_reg(&env->gdt, &assocs[8].value.table);
> +    populate_hv_table_reg(&env->idt, &assocs[9].value.table);
> +
> +    assocs[10].value.reg64 = env->cr[0];
> +    assocs[11].value.reg64 = env->cr[2];
> +    assocs[12].value.reg64 = env->cr[3];
> +    assocs[13].value.reg64 = env->cr[4];
> +    assocs[14].value.reg64 = cpu_get_apic_tpr(x86cpu->apic_state);
> +    assocs[15].value.reg64 = env->efer;
> +    assocs[16].value.reg64 = cpu_get_apic_base(x86cpu->apic_state);
> +
> +    /*
> +     * TODO: support asserting an interrupt using interrup_bitmap
> +     * it should be possible if we use the vm_fd
> +     */
> +

Why is there a need to assert an interrupt here?

> +    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
> +    g_free(assocs);
> +    if (ret < 0) {
> +        error_report("failed to set special registers");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int set_fpu_regs(int cpu_fd, const struct MshvFPU *regs)

Please change regs to fpu.

Thanks,
Wei.

> +{
> +    struct hv_register_assoc *assocs;
> +    union hv_register_value *value;
> +    size_t n_regs = sizeof(FPU_REGISTER_NAMES) / sizeof(enum hv_register_name);
> +    size_t fp_i;
> +    union hv_x64_fp_control_status_register *ctrl_status;
> +    union hv_x64_xmm_control_status_register *xmm_ctrl_status;
> +    int ret;
> +


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator
  2025-05-20 11:30 ` [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator Magnus Kulke
@ 2025-05-20 22:38   ` Wei Liu
  2025-05-28 15:10     ` Magnus Kulke
  0 siblings, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-05-20 22:38 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:15PM +0200, Magnus Kulke wrote:
> Connect the x86 instruction decoder and emulator to the MSHV backend
> to handle intercepted instructions. This enables software emulation
> of MMIO operations in MSHV guests. MSHV has a translate_gva hypercall
> that is used to accessing the physical guest memory.
> 
> A guest might read from unmapped memory regions (e.g. OVMF will probe
> 0xfed40000 for a vTPM). In those cases 0xFF bytes is returned instead of
> aborting the execution.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
>  accel/mshv/mem.c            |  72 ++++++++++++++++++++
>  accel/mshv/trace-events     |   3 +
>  include/system/mshv.h       |   4 ++
>  target/i386/mshv/mshv-cpu.c | 127 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 206 insertions(+)
> 
> diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
> index 2bbeae4f4a..ee627e7bd6 100644
> --- a/accel/mshv/mem.c
> +++ b/accel/mshv/mem.c
> @@ -54,6 +54,78 @@ static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool add)
>      return set_guest_memory(vm_fd, &region);
>  }
>  
> +static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
> +{
> +    MemTxAttrs memattr = {0};
> +    memattr.secure = is_secure_mode;
> +    return memattr;
> +}

This function can be dropped. Its call site can be written as

      MemTxAttrs memattrs = { .secure = is_secure_mode };

> +
> +static int handle_unmapped_mmio_region_read(uint64_t gpa, uint64_t size,
> +                                            uint8_t *data)
> +{
> +    warn_report("read from unmapped mmio region gpa=0x%lx size=%lu", gpa, size);
> +
> +    if (size == 0 || size > 8) {
> +        error_report("invalid size %lu for reading from unmapped mmio region",
> +                     size);
> +        return -1;
> +    }
> +
> +    memset(data, 0xFF, size);
> +
> +    return 0;
> +}
> +
> +int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
> +                        bool is_secure_mode, bool instruction_fetch)
> +{
> +    int ret;
> +    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
> +
> +    if (instruction_fetch) {
> +        trace_mshv_insn_fetch(gpa, size);
> +    } else {
> +        trace_mshv_mem_read(gpa, size);
> +    }
> +
> +    ret = address_space_rw(&address_space_memory, gpa, memattr, (void *)data,
> +                           size, false);
> +    if (ret == MEMTX_OK) {
> +        return 0;
> +    }
> +
> +    if (ret == MEMTX_DECODE_ERROR) {
> +        return handle_unmapped_mmio_region_read(gpa, size, data);
> +    }
> +
> +    error_report("failed to read guest memory at 0x%lx", gpa);
> +    return -1;
> +}
> +
> +int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
> +                         bool is_secure_mode)
> +{
> +    int ret;
> +
> +    trace_mshv_mem_write(gpa, size);
> +    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
> +    ret = address_space_rw(&address_space_memory, gpa, memattr, (void *)data,
> +                           size, true);
> +    if (ret == MEMTX_OK) {
> +        return 0;
> +    }
> +
> +    if (ret == MEMTX_DECODE_ERROR) {
> +        warn_report("write to unmapped mmio region gpa=0x%lx size=%lu", gpa,
> +                    size);
> +        return 0;
> +    }
> +
> +    error_report("Failed to write guest memory");
> +    return -1;
> +}
> +
>  static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
>  {
>      int ret = 0;
> diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
> index 06aa27ef67..686d89e084 100644
> --- a/accel/mshv/trace-events
> +++ b/accel/mshv/trace-events
> @@ -15,3 +15,6 @@ mshv_commit_msi_routing_table(int vm_fd, int len) "vm_fd %d table_size %d"
>  mshv_register_irqfd(int vm_fd, int event_fd, uint32_t gsi) "vm_fd %d event_fd %d gsi %d"
>  mshv_irqchip_update_irqfd_notifier_gsi(int event_fd, int resample_fd, int virq, bool add) "event_fd %d resample_fd %d virq %d add %d"
>  
> +mshv_insn_fetch(uint64_t addr, size_t size) "gpa=%lx size=%lu"
> +mshv_mem_write(uint64_t addr, size_t size) "\tgpa=%lx size=%lu"
> +mshv_mem_read(uint64_t addr, size_t size) "\tgpa=%lx size=%lu"
> diff --git a/include/system/mshv.h b/include/system/mshv.h
> index f854f9b77d..622b3db540 100644
> --- a/include/system/mshv.h
> +++ b/include/system/mshv.h
> @@ -201,6 +201,10 @@ typedef struct MshvMemoryRegion {
>  
>  int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
>  int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
> +int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
> +                        bool is_secure_mode, bool instruction_fetch);
> +int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
> +                         bool is_secure_mode);
>  void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
>                         bool add);
>  /* interrupt */
> diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
> index 081132e0c9..a7ee5ebb2a 100644
> --- a/target/i386/mshv/mshv-cpu.c
> +++ b/target/i386/mshv/mshv-cpu.c
> @@ -995,11 +995,138 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
>      return 0;
>  }
>  
> +static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
> +                         uint64_t flags)
> +{
> +    int ret;
> +    union hv_translate_gva_result result = { 0 };
> +
> +    *gpa = 0;
> +    mshv_translate_gva args = {
> +        .gva = gva,
> +        .flags = flags,
> +        .gpa = (__u64 *)gpa,
> +        .result = &result,
> +    };
> +
> +    ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
> +    if (ret < 0) {
> +        error_report("failed to invoke gpa->gva translation");
> +        return -errno;
> +    }
> +    if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
> +        error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
> +        return -1;
> +
> +    }
> +
> +    return 0;
> +}
> +
> +static int guest_mem_read_with_gva(const CPUState *cpu, uint64_t gva,
> +                                   uint8_t *data, uintptr_t size,
> +                                   bool fetch_instruction)
> +{
> +    int ret;
> +    uint64_t gpa, flags;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    flags = HV_TRANSLATE_GVA_VALIDATE_READ;
> +    ret = translate_gva(cpu_fd, gva, &gpa, flags);

Please check the exit structure. IIRC it provides an initial
translation. If you don't cross a page boundary, that's probably enough
for what you need. This is going to save you a lot of transitions across
two boundaries (user -> kernel, kernel -> hypervisor).

> +    if (ret < 0) {
> +        error_report("failed to translate gva to gpa");
> +        return -1;
> +    }
> +
> +    ret = mshv_guest_mem_read(gpa, data, size, false, fetch_instruction);
> +    if (ret < 0) {
> +        error_report("failed to read from guest memory");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int guest_mem_write_with_gva(const CPUState *cpu, uint64_t gva,
> +                                    const uint8_t *data, uintptr_t size)
> +{
> +    int ret;
> +    uint64_t gpa, flags;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
> +    ret = translate_gva(cpu_fd, gva, &gpa, flags);
> +    if (ret < 0) {
> +        error_report("failed to translate gva to gpa");
> +        return -1;
> +    }
> +    ret = mshv_guest_mem_write(gpa, data, size, false);
> +    if (ret < 0) {
> +        error_report("failed to write to guest memory");
> +        return -1;
> +    }
> +    return 0;
> +}
> +
> +static void write_mem_emu(CPUState *cpu, void *data, target_ulong addr,
> +                          int bytes)
> +{
> +    if (guest_mem_write_with_gva(cpu, addr, data, bytes) < 0) {
> +        error_report("failed to write memory");
> +        abort();
> +    }
> +}
> +
> +static void read_mem_emu(CPUState *cpu, void *data, target_ulong addr,
> +                         int bytes)
> +{
> +    if (guest_mem_read_with_gva(cpu, addr, data, bytes, false) < 0) {
> +        error_report("failed to read memory");
> +        abort();
> +    }
> +}
> +
> +static void fetch_instruction_emu(CPUState *cpu, void *data,
> +                                  target_ulong addr, int bytes)
> +{
> +    if (guest_mem_read_with_gva(cpu, addr, data, bytes, true) < 0) {
> +        error_report("failed to fetch instruction");
> +        abort();
> +    }
> +}
> +
> +static void read_segment_descriptor_emu(CPUState *cpu,
> +                                        struct x86_segment_descriptor *desc,
> +                                        enum X86Seg seg_idx)
> +{
> +    bool ret;
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86_cpu->env;
> +    SegmentCache *seg = &env->segs[seg_idx];
> +    x86_segment_selector sel = { .sel = seg->selector & 0xFFFF };
> +
> +    ret = x86_read_segment_descriptor(cpu, desc, sel);
> +    if (ret == false) {
> +        error_report("failed to read segment descriptor");
> +        abort();
> +    }
> +}
> +
> +static const struct x86_emul_ops mshv_x86_emul_ops = {
> +    .fetch_instruction = fetch_instruction_emu,
> +    .read_mem = read_mem_emu,
> +    .write_mem = write_mem_emu,
> +    .read_segment_descriptor = read_segment_descriptor_emu,

You can remove the _emu suffix from all the handler functions.

> +};
> +
>  void mshv_init_cpu_logic(void)
>  {
>      cpu_guards_lock = g_new0(QemuMutex, 1);
>      qemu_mutex_init(cpu_guards_lock);
>      cpu_guards = g_hash_table_new(g_direct_hash, g_direct_equal);
> +
> +    init_decoder();
> +    init_emu(&mshv_x86_emul_ops);

If I'm not mistaken, the name mshv_init_cpu_logic suggests this function
is called every time a CPU is initialized. There is no need to
repeatedly initialize the emulator.

The code snippet should be moved to either the initialization function
of the accelerator or the initialization function of the VM object.

Thanks,
Wei.

>  }
>  
>  void mshv_arch_init_vcpu(CPUState *cpu)
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-05-20 11:30 ` [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
  2025-05-20 13:21   ` Paolo Bonzini
@ 2025-05-20 22:52   ` Wei Liu
  2025-06-03 15:40     ` Magnus Kulke
  2025-07-01  8:35     ` Magnus Kulke
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-20 22:52 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:17PM +0200, Magnus Kulke wrote:
> Add the main vCPU execution loop for MSHV using the MSHV_RUN_VP ioctl.
> 
> A translate_gva() hypercall is implemented. The execution loop handles
> guest entry and VM exits. There are handlers for memory r/w, PIO and
> MMIO to which the exit events are dispatched.
> 
> In case of MMIO the i386 instruction decoder/emulator is invoked to
> perform the operation in user space.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
> +
> +static int handle_mmio(CPUState *cpu, const struct hyperv_message *msg,
> +                       MshvVmExit *exit_reason)
> +{
> +    struct hv_x64_memory_intercept_message info = { 0 };
> +    size_t insn_len;
> +    uint8_t access_type;
> +    uint8_t *instruction_bytes;
> +    int ret;
> +
> +    ret = set_memory_info(msg, &info);
> +    if (ret < 0) {
> +        error_report("failed to convert message to memory info");
> +        return -1;
> +    }
> +    insn_len = info.instruction_byte_count;
> +    access_type = info.header.intercept_access_type;
> +
> +    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_EXECUTE) {
> +        error_report("invalid intercept access type: execute");
> +        return -1;
> +    }
> +

You can assert(insn_len <= 16) here to simplify the code.

> +    if (insn_len > 16) {
> +        error_report("invalid mmio instruction length: %zu", insn_len);
> +        return -1;
> +    }
> +
> +    if (insn_len == 0) {
> +        warn_report("mmio instruction buffer empty");

This is a valid state so there is no need to warn.

> +    }
> +
> +    instruction_bytes = info.instruction_bytes;
> +
> +    ret = emulate_instruction(cpu, instruction_bytes, insn_len,
> +                              info.guest_virtual_address,
> +                              info.guest_physical_address);
> +    if (ret < 0) {
> +        error_report("failed to emulate mmio");
> +        return -1;
> +    }
> +
> +    *exit_reason = MshvVmExitIgnore;
> +
> +    return 0;
> +}
> +
> +static int handle_unmapped_mem(int vm_fd, CPUState *cpu,
> +                               const struct hyperv_message *msg,
> +                               MshvVmExit *exit_reason)
> +{
> +    struct hv_x64_memory_intercept_message info = { 0 };
> +    int ret;
> +
> +    ret = set_memory_info(msg, &info);
> +    if (ret < 0) {
> +        error_report("failed to convert message to memory info");
> +        return -1;
> +    }
> +
> +    return handle_mmio(cpu, msg, exit_reason);
> +}
> +
> +static int set_ioport_info(const struct hyperv_message *msg,
> +                           hv_x64_io_port_intercept_message *info)
> +{
> +    if (msg->header.message_type != HVMSG_X64_IO_PORT_INTERCEPT) {
> +        error_report("Invalid message type");
> +        return -1;
> +    }
> +    memcpy(info, msg->payload, sizeof(*info));
> +
> +    return 0;
> +}
> +
> +typedef struct X64Registers {
> +  const uint32_t *names;
> +  const uint64_t *values;
> +  uintptr_t count;
> +} X64Registers;
> +
> +static int set_x64_registers(int cpu_fd, const X64Registers *regs)
> +{
> +    size_t n_regs = regs->count;
> +    struct hv_register_assoc *assocs;
> +
> +    assocs = g_new0(hv_register_assoc, n_regs);
> +    for (size_t i = 0; i < n_regs; i++) {
> +        assocs[i].name = regs->names[i];
> +        assocs[i].value.reg64 = regs->values[i];
> +    }
> +    int ret;
> +
> +    ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
> +    g_free(assocs);
> +    if (ret < 0) {
> +        error_report("failed to set x64 registers");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
> +{
> +    MemTxAttrs memattr = {0};
> +    memattr.secure = is_secure_mode;
> +    return memattr;
> +}
> +
> +static void pio_read(uint64_t port, uint8_t *data, uintptr_t size,
> +                     bool is_secure_mode)
> +{
> +    int ret = 0;
> +    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
> +    ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
> +                           false);
> +    if (ret != MEMTX_OK) {
> +        error_report("Failed to read from port %lx: %d", port, ret);
> +        abort();
> +    }
> +}
> +
> +static int pio_write(uint64_t port, const uint8_t *data, uintptr_t size,
> +                     bool is_secure_mode)
> +{
> +    int ret = 0;
> +    MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
> +    ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
> +                           true);
> +    return ret;
> +}
> +
> +static int handle_pio_non_str(const CPUState *cpu,
> +                              hv_x64_io_port_intercept_message *info) {
> +    size_t len = info->access_info.access_size;
> +    uint8_t access_type = info->header.intercept_access_type;
> +    int ret;
> +    uint32_t val, eax;
> +    const uint32_t eax_mask =  0xffffffffu >> (32 - len * 8);
> +    size_t insn_len;
> +    uint64_t rip, rax;
> +    uint32_t reg_names[2];
> +    uint64_t reg_values[2];
> +    struct X64Registers x64_regs = { 0 };
> +    uint16_t port = info->port_number;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
> +        union {
> +            uint32_t u32;
> +            uint8_t bytes[4];
> +        } conv;
> +
> +        /* convert the first 4 bytes of rax to bytes */
> +        conv.u32 = (uint32_t)info->rax;
> +        /* secure mode is set to false */
> +        ret = pio_write(port, conv.bytes, len, false);
> +        if (ret < 0) {
> +            error_report("Failed to write to io port");
> +            return -1;
> +        }
> +    } else {
> +        uint8_t data[4] = { 0 };
> +        /* secure mode is set to false */
> +        pio_read(info->port_number, data, len, false);
> +
> +        /* Preserve high bits in EAX, but clear out high bits in RAX */
> +        val = *(uint32_t *)data;
> +        eax = (((uint32_t)info->rax) & ~eax_mask) | (val & eax_mask);
> +        info->rax = (uint64_t)eax;
> +    }
> +
> +    insn_len = info->header.instruction_length;
> +
> +    /* Advance RIP and update RAX */
> +    rip = info->header.rip + insn_len;
> +    rax = info->rax;
> +
> +    reg_names[0] = HV_X64_REGISTER_RIP;
> +    reg_values[0] = rip;
> +    reg_names[1] = HV_X64_REGISTER_RAX;
> +    reg_values[1] = rax;
> +
> +    x64_regs.names = reg_names;
> +    x64_regs.values = reg_values;
> +    x64_regs.count = 2;
> +
> +    ret = set_x64_registers(cpu_fd, &x64_regs);
> +    if (ret < 0) {
> +        error_report("Failed to set x64 registers");
> +        return -1;
> +    }
> +
> +    cpu->accel->dirty = false;
> +
> +    return 0;
> +}
> +
> +static int fetch_guest_state(CPUState *cpu)
> +{
> +    int ret;
> +
> +    ret = mshv_get_standard_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to get standard registers");
> +        return -1;
> +    }
> +
> +    ret = mshv_get_special_regs(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to get special registers");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int read_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
> +                       uint64_t gva, uint8_t *data, size_t len)
> +{
> +    int ret;
> +    uint64_t gpa, flags;
> +
> +    if (gva == initial_gva) {
> +        gpa = initial_gpa;
> +    } else {
> +        flags = HV_TRANSLATE_GVA_VALIDATE_READ;
> +        ret = translate_gva(cpu_fd, gva, &gpa, flags);
> +        if (ret < 0) {
> +            return -1;
> +        }
> +
> +        ret = mshv_guest_mem_read(gpa, data, len, false, false);
> +        if (ret < 0) {
> +            error_report("failed to read guest mem");
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static int write_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
> +                        uint64_t gva, const uint8_t *data, size_t len)
> +{
> +    int ret;
> +    uint64_t gpa, flags;
> +
> +    if (gva == initial_gva) {
> +        gpa = initial_gpa;
> +    } else {
> +        flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
> +        ret = translate_gva(cpu_fd, gva, &gpa, flags);
> +        if (ret < 0) {
> +            error_report("failed to translate gva to gpa");
> +            return -1;
> +        }
> +    }
> +    ret = mshv_guest_mem_write(gpa, data, len, false);
> +    if (ret != MEMTX_OK) {
> +        error_report("failed to write to mmio");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_pio_str_write(CPUState *cpu,
> +                                hv_x64_io_port_intercept_message *info,
> +                                size_t repeat, uint16_t port,
> +                                bool direction_flag)
> +{
> +    int ret;
> +    uint64_t src;
> +    uint8_t data[4] = { 0 };
> +    size_t len = info->access_info.access_size;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    src = linear_addr(cpu, info->rsi, R_DS);
> +
> +    for (size_t i = 0; i < repeat; i++) {
> +        ret = read_memory(cpu_fd, 0, 0, src, data, len);
> +        if (ret < 0) {
> +            error_report("Failed to read memory");
> +            return -1;
> +        }
> +        ret = pio_write(port, data, len, false);
> +        if (ret < 0) {
> +            error_report("Failed to write to io port");
> +            return -1;
> +        }
> +        src += direction_flag ? -len : len;
> +        info->rsi += direction_flag ? -len : len;
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_pio_str_read(CPUState *cpu,
> +                                hv_x64_io_port_intercept_message *info,
> +                                size_t repeat, uint16_t port,
> +                                bool direction_flag)
> +{
> +    int ret;
> +    uint64_t dst;
> +    size_t len = info->access_info.access_size;
> +    uint8_t data[4] = { 0 };
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    dst = linear_addr(cpu, info->rdi, R_ES);
> +
> +    for (size_t i = 0; i < repeat; i++) {
> +        pio_read(port, data, len, false);
> +
> +        ret = write_memory(cpu_fd, 0, 0, dst, data, len);
> +        if (ret < 0) {
> +            error_report("Failed to write memory");
> +            return -1;
> +        }
> +        dst += direction_flag ? -len : len;
> +        info->rdi += direction_flag ? -len : len;
> +    }
> +
> +    return 0;
> +}
> +
> +static int handle_pio_str(CPUState *cpu,
> +                          hv_x64_io_port_intercept_message *info)
> +{
> +    uint8_t access_type = info->header.intercept_access_type;
> +    uint16_t port = info->port_number;
> +    bool repop = info->access_info.rep_prefix == 1;
> +    size_t repeat = repop ? info->rcx : 1;
> +    size_t insn_len = info->header.instruction_length;
> +    bool direction_flag;
> +    uint32_t reg_names[3];
> +    uint64_t reg_values[3];
> +    int ret;
> +    struct X64Registers x64_regs = { 0 };
> +    X86CPU *x86_cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86_cpu->env;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    ret = fetch_guest_state(cpu);
> +    if (ret < 0) {
> +        error_report("Failed to fetch guest state");
> +        return -1;
> +    }
> +
> +    direction_flag = (env->eflags & DF) != 0;
> +
> +    if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
> +        ret = handle_pio_str_write(cpu, info, repeat, port, direction_flag);
> +        if (ret < 0) {
> +            error_report("Failed to handle pio str write");
> +            return -1;
> +        }
> +        reg_names[0] = HV_X64_REGISTER_RSI;
> +        reg_values[0] = info->rsi;
> +    } else {
> +        ret = handle_pio_str_read(cpu, info, repeat, port, direction_flag);
> +        reg_names[0] = HV_X64_REGISTER_RDI;
> +        reg_values[0] = info->rdi;
> +    }
> +
> +    reg_names[1] = HV_X64_REGISTER_RIP;
> +    reg_values[1] = info->header.rip + insn_len;
> +    reg_names[2] = HV_X64_REGISTER_RAX;
> +    reg_values[2] = info->rax;
> +
> +    x64_regs.names = reg_names;
> +    x64_regs.values = reg_values;
> +    x64_regs.count = 2;
> +
> +    ret = set_x64_registers(cpu_fd, &x64_regs);
> +    if (ret < 0) {
> +        error_report("Failed to set x64 registers");
> +        return -1;
> +    }
> +
> +    cpu->accel->dirty = false;
> +
> +    return 0;
> +}
> +
> +static int handle_pio(CPUState *cpu, const struct hyperv_message *msg)
> +{
> +    struct hv_x64_io_port_intercept_message info = { 0 };
> +    int ret;
> +
> +    ret = set_ioport_info(msg, &info);
> +    if (ret < 0) {
> +        error_report("Failed to convert message to ioport info");
> +        return -1;
> +    }
> +
> +    if (info.access_info.string_op) {
> +        return handle_pio_str(cpu, &info);
> +    }
> +
> +    return handle_pio_non_str(cpu, &info);
> +}
> +
>  int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
>  {
> -	error_report("unimplemented");
> -	abort();
> +    int ret;
> +    hv_message exit_msg = { 0 };
> +    enum MshvVmExit exit_reason;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +
> +    ret = ioctl(cpu_fd, MSHV_RUN_VP, &exit_msg);
> +    if (ret < 0) {
> +        return MshvVmExitShutdown;
> +    }
> +
> +    switch (exit_msg.header.message_type) {
> +    case HVMSG_UNRECOVERABLE_EXCEPTION:
> +        *msg = exit_msg;
> +        return MshvVmExitShutdown;
> +    case HVMSG_UNMAPPED_GPA:
> +        ret = handle_unmapped_mem(vm_fd, cpu, &exit_msg, &exit_reason);
> +        if (ret < 0) {
> +            error_report("failed to handle unmapped memory");
> +            return -1;
> +        }
> +        return exit_reason;
> +    case HVMSG_GPA_INTERCEPT:

I'm not sure why you want to handle UNMAPPED_GPA and GPA_INTERCEPT
separately. In Cloud Hypervisor there is one code path for both.

Is this due to how the memory address space is set up in QEMU?

> +        ret = handle_mmio(cpu, &exit_msg, &exit_reason);
> +        if (ret < 0) {
> +            error_report("failed to handle mmio");
> +            return -1;
> +        }
> +        return exit_reason;
> +    case HVMSG_X64_IO_PORT_INTERCEPT:
> +        ret = handle_pio(cpu, &exit_msg);
> +        if (ret < 0) {
> +            return MshvVmExitSpecial;
> +        }
> +        return MshvVmExitIgnore;
> +    default:
> +        msg = &exit_msg;

Do you not get any HALT exit? How are you going to shut down the VM?

> +    }
> +
> +    *exit = MshvVmExitIgnore;
> +    return 0;
>  }
>  
>  void mshv_remove_vcpu(int vm_fd, int cpu_fd)
> @@ -1061,34 +1583,6 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
>      return 0;
>  }
>  
> -static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
> -                         uint64_t flags)
> -{
> -    int ret;
> -    union hv_translate_gva_result result = { 0 };
> -
> -    *gpa = 0;
> -    mshv_translate_gva args = {
> -        .gva = gva,
> -        .flags = flags,
> -        .gpa = (__u64 *)gpa,
> -        .result = &result,
> -    };
> -
> -    ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
> -    if (ret < 0) {
> -        error_report("failed to invoke gpa->gva translation");
> -        return -errno;
> -    }
> -    if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
> -        error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
> -        return -1;
> -
> -    }
> -
> -    return 0;
> -}
> -

Why not put this function in the correct location in the previous patch
to begin with?

Thanks,
Wei.

>  static int guest_mem_read_with_gva(const CPUState *cpu, uint64_t gva,
>                                     uint8_t *data, uintptr_t size,
>                                     bool fetch_instruction)
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround
  2025-05-20 13:53   ` Paolo Bonzini
@ 2025-05-22 12:51     ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-22 12:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: magnuskulke, qemu-devel, liuwe, Michael S. Tsirkin, Wei Liu,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 03:53:10PM +0200, Paolo Bonzini wrote:
> On 5/20/25 13:30, Magnus Kulke wrote:
> > Qemu maps regions of userland multiple times into the guest. The MSHV
> > kernel driver detects those overlapping regions and rejects those
> > mappings.
> 
> Can you explain what you see?  QEMU doesn't do that, just look at KVM code:

Hey Paolo, I appreciate that you took a look so swiftly, we'll try to
accomodate and post a fixed series up soon.

I think what I am referring to is a "memory region alias", e.g. in this
mtree output (machine q35 + seabios):

00000000000e0000-00000000000fffff (prio 1, rom): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
...
00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios

parts of the bios are mapped into different regions on the guest. a
code path for such a mapping that is refused by the MSHV kernel driver
would start in hw/i386/pc.c:894

memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", machine->ram,
                         0, x86ms->below_4g_mem_size);
memory_region_add_subregion(system_memory, 0, ram_below_4g);

eventually that ends up in a hv call that registers a region, but the
userspace_addr of pc.bios is already registered, so the mapping of an alias
slice is rejected by the kernel driver.

best,

magnus

> 
> static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
>                                       gfn_t start, gfn_t end)
> {
>         struct kvm_memslot_iter iter;
> 
>         kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
>                 if (iter.slot->id != id)
>                         return true;
>         }
> 
>         return false;
> }
> 
> ...
> 
>         if ((change == KVM_MR_CREATE || change == KVM_MR_MOVE) &&
>             kvm_check_memslot_overlap(slots, id, base_gfn, base_gfn + npages))
>                 return -EEXIST;
> 
> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-20 19:07   ` Wei Liu
@ 2025-05-22 15:42     ` Magnus Kulke
  2025-05-22 17:46       ` Wei Liu
  2025-05-23  8:23     ` Magnus Kulke
  1 sibling, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-22 15:42 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 07:07:06PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:01PM +0200, Magnus Kulke wrote:
> > +static void mshv_reset(void *param)
> > +{
> > +    warn_report("mshv reset");
> 
> What's missing for this hook?
> 

Ah, I suppose this was inspired by the KVM accel. The hook is called for
cleanups that should occur on a reset. At the moment we don't have state
that we want cleanup during reset afaik. So we can remove it here.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-22 15:42     ` Magnus Kulke
@ 2025-05-22 17:46       ` Wei Liu
  0 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-22 17:46 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: Wei Liu, magnuskulke, qemu-devel, liuwe, Paolo Bonzini,
	Michael S. Tsirkin, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Thu, May 22, 2025 at 05:42:48PM +0200, Magnus Kulke wrote:
> On Tue, May 20, 2025 at 07:07:06PM +0000, Wei Liu wrote:
> > On Tue, May 20, 2025 at 01:30:01PM +0200, Magnus Kulke wrote:
> > > +static void mshv_reset(void *param)
> > > +{
> > > +    warn_report("mshv reset");
> > 
> > What's missing for this hook?
> > 
> 
> Ah, I suppose this was inspired by the KVM accel. The hook is called for
> cleanups that should occur on a reset. At the moment we don't have state
> that we want cleanup during reset afaik. So we can remove it here.

Right, please leave it out if it is not needed. We can add that hook
once we have something to clean up.

Thanks,
Wei.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-20 19:07   ` Wei Liu
  2025-05-22 15:42     ` Magnus Kulke
@ 2025-05-23  8:23     ` Magnus Kulke
  2025-05-23 15:37       ` Wei Liu
  1 sibling, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-23  8:23 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 07:07:06PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:01PM +0200, Magnus Kulke wrote:
> > Create the MSHV virtual machine by opening a partition and issuing
> > the necessary ioctl to initialize it. This sets up the basic VM
> > structure and initial configuration used by MSHV to manage guest state.
> > 
> > Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> > ---
> [...]
> 
> mshv_fd is neither stashed into a state structure nor freed after this
> point.  Is it leaked?
> 
> Thanks,
> Wei.
> 

AFAIK the accelerator should not be initialized multiple times at runtime,
so under normal circumstances the fd wouldn't leak. But in certain debug
scenarios that would be the case. So, yes, we should make this more solid
and exit early if MSHV_STATE has been previously initialized.

> >      s->nr_as = 1;
> >      s->as = g_new0(MshvAddressSpace, s->nr_as);
> >  
> >      mshv_state = s;
> >  
> > +    qemu_register_reset(mshv_reset, NULL);
> > +
> >      register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
> >                                    0, "mshv-memory");
> >      memory_listener_register(&mshv_io_listener, &address_space_io);


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor
  2025-05-20 20:07   ` Wei Liu
@ 2025-05-23 14:17     ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-23 14:17 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 08:07:27PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:02PM +0200, Magnus Kulke wrote:
> > Handle region_add events by invoking the MSHV memory registration
> > +        return set_guest_memory(vm_fd, &region);
> > +    }
> > +
> > +    region.flags = (1 << MSHV_SET_MEM_BIT_EXECUTABLE);
> 
> Should this be always set? Is there a way to get more information from
> the caller or QEMU's core memory region management logic?
> 

HVF always sets the bit and as far as I can tell KVM doesn't have a
KVM_MEM_EXECUTE flag, so it's implied.

Still, there might be some criteria to determine whether a region is
executable or not, I'll look further into that.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-23  8:23     ` Magnus Kulke
@ 2025-05-23 15:37       ` Wei Liu
  2025-05-23 16:13         ` Magnus Kulke
  0 siblings, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-05-23 15:37 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: Wei Liu, magnuskulke, qemu-devel, liuwe, Paolo Bonzini,
	Michael S. Tsirkin, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Fri, May 23, 2025 at 10:23:58AM +0200, Magnus Kulke wrote:
> On Tue, May 20, 2025 at 07:07:06PM +0000, Wei Liu wrote:
> > On Tue, May 20, 2025 at 01:30:01PM +0200, Magnus Kulke wrote:
> > > Create the MSHV virtual machine by opening a partition and issuing
> > > the necessary ioctl to initialize it. This sets up the basic VM
> > > structure and initial configuration used by MSHV to manage guest state.
> > > 
> > > Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> > > ---
> > [...]
> > 
> > mshv_fd is neither stashed into a state structure nor freed after this
> > point.  Is it leaked?
> > 
> > Thanks,
> > Wei.
> > 
> 
> AFAIK the accelerator should not be initialized multiple times at runtime,
> so under normal circumstances the fd wouldn't leak. But in certain debug
> scenarios that would be the case. So, yes, we should make this more solid
> and exit early if MSHV_STATE has been previously initialized.
> 

I'm not talking about initialization specifically. I don't think QEMU
calls the initialization function of an accelerator multiple times.

What I mean is that after this point, the fd is neither closed nor
tracked. There is no way to cleanly handle it other than waiting for the
process to exist. One fd may not seem a lot, but it takes up precise
space in the file descriptor table in the kernel and is counted against
the fd limit.

My suggestion would be if this fd is no longer needed, it can be closed
in this same function.

If it is needed throughout the life cycle of the VM, we put it in a
either a global variable or (better) the accelerator state structure. If
we do the latter, we should also close it when we deinitialize the
accelerator if we have such a phase.

Thanks,
Wei.

> > >      s->nr_as = 1;
> > >      s->as = g_new0(MshvAddressSpace, s->nr_as);
> > >  
> > >      mshv_state = s;
> > >  
> > > +    qemu_register_reset(mshv_reset, NULL);
> > > +
> > >      register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
> > >                                    0, "mshv-memory");
> > >      memory_listener_register(&mshv_io_listener, &address_space_io);


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 08/25] accel/mshv: Initialize VM partition
  2025-05-23 15:37       ` Wei Liu
@ 2025-05-23 16:13         ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-23 16:13 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Fri, May 23, 2025 at 03:37:02PM +0000, Wei Liu wrote:
> On Fri, May 23, 2025 at 10:23:58AM +0200, Magnus Kulke wrote:
> > On Tue, May 20, 2025 at 07:07:06PM +0000, Wei Liu wrote:
> > > On Tue, May 20, 2025 at 01:30:01PM +0200, Magnus Kulke wrote:
> > > > Create the MSHV virtual machine by opening a partition and issuing
> > > > the necessary ioctl to initialize it. This sets up the basic VM
> > > > structure and initial configuration used by MSHV to manage guest state.
> > > > 
> > > > Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> > > > ---
> > > [...]
> I'm not talking about initialization specifically. I don't think QEMU
> calls the initialization function of an accelerator multiple times.
> 
> What I mean is that after this point, the fd is neither closed nor
> tracked. There is no way to cleanly handle it other than waiting for the
> process to exist. One fd may not seem a lot, but it takes up precise
> space in the file descriptor table in the kernel and is counted against
> the fd limit.
> 
> My suggestion would be if this fd is no longer needed, it can be closed
> in this same function.
> 
> If it is needed throughout the life cycle of the VM, we put it in a
> either a global variable or (better) the accelerator state structure. If
> we do the latter, we should also close it when we deinitialize the
> accelerator if we have such a phase.
> 
> Thanks,
> Wei.
> 

oh yes, that's right. we wouldn't use the mshv_fd anywhere else for the
time being, so we can close it immediately after the create_vm ioctl.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop
  2025-05-20 13:54     ` Paolo Bonzini
@ 2025-05-23 17:05       ` Wei Liu
  0 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-05-23 17:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Magnus Kulke, magnuskulke, qemu-devel, liuwe, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 03:54:57PM +0200, Paolo Bonzini wrote:
> On 5/20/25 15:50, Paolo Bonzini wrote:
> > You need support in the hypervisor for this: KVM and HVF both have it.
> > 
> > There are two ways to do it
> 
> Sorry - I left out the other way which is to pass something *into*
> MSHV_RUN_VP since only half of it is currently used (I think).  But that's
> more complicated; the advantage would be to avoid the ioctl in the signal
> handler but it's not a fast path.  I would just do it the easy way.

Thank you for the suggestions. We need some time to discuss kernel side
changes.

Thanks,
Wei.

> 
> Paolo
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support
  2025-05-20 20:15   ` Wei Liu
@ 2025-05-27 16:27     ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-27 16:27 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 08:15:20PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:04PM +0200, Magnus Kulke wrote:
> > Implement initial interrupt handling logic in the MSHV backend. This
> > includes management of MSI and un/registering of irqfd mechanisms.
> > 
> > Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> > ---
> [...]
> > +int mshv_request_interrupt(int vm_fd, uint32_t interrupt_type, uint32_t vector,
> > +                           uint32_t vp_index, bool logical_dest_mode,
> > +                           bool level_triggered)
> > +{
> > +    int ret;
> > +
> > +    if (vector == 0) {
> > +        /* TODO: why do we receive this? */
> 
> You must have seen this in real life, right? We need to convince
> ourselves why this is okay.
> 
> Thanks,
> Wei.
> 

I haven't seen this in real use, I spotted it in the mshvc library and
wondered why we have this clause at this point. We can log a warning if
that occurs.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs()
  2025-05-20 22:15   ` Wei Liu
@ 2025-05-28 13:55     ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-28 13:55 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 10:15:23PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:10PM +0200, Magnus Kulke wrote:
> >  
> > +static enum hv_register_name SPECIAL_REGISTER_NAMES[18] = {
> [...]
> > +    HV_REGISTER_PENDING_INTERRUPTION,
> 
> Why do you think this is needed?
> 

It's not, that's a leftover. we can remove it, since we don't use it in
QEMU currently.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers()
  2025-05-20 22:22   ` Wei Liu
@ 2025-05-28 14:30     ` Magnus Kulke
  2025-06-06 19:16       ` Wei Liu
  0 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-05-28 14:30 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 10:22:27PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:11PM +0200, Magnus Kulke wrote:
> > +    /*
> > +     * TODO: support asserting an interrupt using interrup_bitmap
> > +     * it should be possible if we use the vm_fd
> > +     */
> > +
> 
> Why is there a need to assert an interrupt here?
> 

The comment has been carried over from the mshv-ioctl crate:

https://github.com/rust-vmm/mshv/blob/main/mshv-ioctls/src/ioctls/vcpu.rs#L778

I was wondering whether we can/want to set the bitmap here, since we do
have access to the vm_fd, but I didn't follow up on that yet.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator
  2025-05-20 22:38   ` Wei Liu
@ 2025-05-28 15:10     ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-05-28 15:10 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 10:38:28PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:15PM +0200, Magnus Kulke wrote:
> > +    init_emu(&mshv_x86_emul_ops);
> 
> If I'm not mistaken, the name mshv_init_cpu_logic suggests this function
> is called every time a CPU is initialized. There is no need to
> repeatedly initialize the emulator.
> 
> The code snippet should be moved to either the initialization function
> of the accelerator or the initialization function of the VM object.

It's called as part of the accelerator initialization, but it's a
misnomer, I agree. We'll see whether we require the guards, so this is
is really just an mmio emu init.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-05-20 22:52   ` Wei Liu
@ 2025-06-03 15:40     ` Magnus Kulke
  2025-07-01  8:35     ` Magnus Kulke
  1 sibling, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-06-03 15:40 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 10:52:39PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:17PM +0200, Magnus Kulke wrote:
> > +    case HVMSG_GPA_INTERCEPT:
> 
> I'm not sure why you want to handle UNMAPPED_GPA and GPA_INTERCEPT
> separately. In Cloud Hypervisor there is one code path for both.
> 
> Is this due to how the memory address space is set up in QEMU?
> 

yes, indeed. this is a provisiong for the dynamic re-mapping of
overlapping userspace addresses. We can handle both together in this
commit, though.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers()
  2025-05-20 11:30 ` [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
  2025-05-20 14:33   ` Paolo Bonzini
  2025-05-20 22:22   ` Wei Liu
@ 2025-06-06 19:11   ` Wei Liu
  2 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-06-06 19:11 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 01:30:11PM +0200, Magnus Kulke wrote:
> Write CPU register state to MSHV vCPUs. Various mapping functions to
> prepare the payload for the HV call have been implemented.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
[...]
> +
> +static void populate_hv_table_reg(const struct SegmentCache *seg,
> +                                  hv_x64_table_register *hv_reg)
> +{
> +    hv_reg->base = seg->base;
> +    hv_reg->limit = seg->limit;
> +    memset(hv_reg->pad, 0, sizeof(hv_reg->pad));

I'm not sure if the compiler will optimize this function call out.

It is straightforward to write

       *hv_reg = { .base = seg->base, .limit = seg->limit };

> +}
> +
> +static int set_special_regs(const CPUState *cpu)
> +{
> +    X86CPU *x86cpu = X86_CPU(cpu);
> +    CPUX86State *env = &x86cpu->env;
> +    int cpu_fd = mshv_vcpufd(cpu);
> +    struct hv_register_assoc *assocs;
> +    size_t n_regs = sizeof(SPECIAL_REGISTER_NAMES) / sizeof(hv_register_name);
> +    int ret;
> +
> +    assocs = g_new0(struct hv_register_assoc, n_regs);

The allocation here can be removed, since we know for sure how many
elements are in `SPECIAL_REGISTER_NAMES`. It should be fine to use an
on-stack array.

There are probably other places you can optimize.

Thanks,
Wei.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers()
  2025-05-28 14:30     ` Magnus Kulke
@ 2025-06-06 19:16       ` Wei Liu
  0 siblings, 0 replies; 76+ messages in thread
From: Wei Liu @ 2025-06-06 19:16 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: Wei Liu, magnuskulke, qemu-devel, liuwe, Paolo Bonzini,
	Michael S. Tsirkin, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé,
	jinankjain, muislam

On Wed, May 28, 2025 at 04:30:55PM +0200, Magnus Kulke wrote:
> On Tue, May 20, 2025 at 10:22:27PM +0000, Wei Liu wrote:
> > On Tue, May 20, 2025 at 01:30:11PM +0200, Magnus Kulke wrote:
> > > +    /*
> > > +     * TODO: support asserting an interrupt using interrup_bitmap
> > > +     * it should be possible if we use the vm_fd
> > > +     */
> > > +
> > 
> > Why is there a need to assert an interrupt here?
> > 
> 
> The comment has been carried over from the mshv-ioctl crate:
> 
> https://github.com/rust-vmm/mshv/blob/main/mshv-ioctls/src/ioctls/vcpu.rs#L778
> 
> I was wondering whether we can/want to set the bitmap here, since we do
> have access to the vm_fd, but I didn't follow up on that yet.

In the code snippet you quoted, an error is returned if the bitmap is
not empty.

Please at least print a warning if the bitmap is not empty to catch any
issues. Debugging lost interrupts is hard enough as it is.

CC the Rust-VMM code co-owners for awareness.

Thanks,
Wei.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop
  2025-05-20 13:50   ` Paolo Bonzini
  2025-05-20 13:54     ` Paolo Bonzini
@ 2025-06-06 23:06     ` Nuno Das Neves
  1 sibling, 0 replies; 76+ messages in thread
From: Nuno Das Neves @ 2025-06-06 23:06 UTC (permalink / raw)
  To: Paolo Bonzini, Magnus Kulke, magnuskulke, qemu-devel, liuwe
  Cc: Michael S. Tsirkin, Wei Liu, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On 5/20/2025 6:50 AM, Paolo Bonzini wrote:
> On 5/20/25 13:30, Magnus Kulke wrote:
>> +    int ret;
>> +    hv_message exit_msg = { 0 };
> 
> You probably don't want to fill 512 bytes on every vmentry.  Maybe pass &exit_msg up from mshv_cpu_exec()?
> 
>> +        /*
>> +         * Read cpu->exit_request before KVM_RUN reads run->immediate_exit.
>> +         * Matching barrier in kvm_eat_signals.
>> +         */
>> +        smp_rmb();
> 
> The comment is obviously wrong; unfortunately, the code is wrong too:
> 
> 1) qemu_cpu_kick_self() is only needed for an old KVM API.  In that API the signal handler is blocked while QEMU runs.  In your case, qemu_cpu_kick_self() is an expensive way to do nothing.
> 
> 2) Because of this, there's a race condition between delivering the signal and entering MSHV_RUN_VP
> 

Hi Paolo,

I might be misunderstanding something here, but isn't there a race condition regardless of where this check is made?
i.e., checking a flag in userspace, like the above:

if (qatomic_read(&cpu->exit_request)) {

vs checking the flag in the kernel, are effectively doing the same thing.
The signal can still come just after the check is made (in the kernel) and the VP will dispatch.

The virtual "explicit suspend" register in the VP seems to solve this problem - it can be used for manually kicking the VP
while it is running. But, it can also be set before dispatching the VP, and the dispatch hypercall will return immediately
in that case.

Thanks
Nuno

> You need support in the hypervisor for this: KVM and HVF both have it.
> 
> There are two ways to do it, for both cases the hypervisor side for the latter can be something like this:
> 
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 72df774e410a..627afece4046 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -530,7 +530,7 @@ static long mshv_run_vp_with_root_scheduler(
>          struct hv_output_dispatch_vp output;
> 
>          ret = mshv_pre_guest_mode_work(vp);
> -        if (ret)
> +        if (ret || vp->run.flags.immediate_exit)
>              break;
> 
>          if (vp->run.flags.intercept_suspend)
> @@ -585,6 +585,7 @@
>          }
>      } while (!vp->run.flags.intercept_suspend);
> 
> +    vp->run.flags.immediate_exit = 0;
>      return ret;
>  }
> 
> 
> Instead of calling qemu_cpu_kick_self(), your signal handler would invoke a new MSHV ioctl that sets vp->run.flags.immediate_exit = 1.
> 
> And then you also don't need the barrier, by the way, because all inter-thread communication is mediated by the signal handler.
> 
> Paolo
> 
> 
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-05-20 22:52   ` Wei Liu
  2025-06-03 15:40     ` Magnus Kulke
@ 2025-07-01  8:35     ` Magnus Kulke
  2025-07-01 15:11       ` Wei Liu
  1 sibling, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-07-01  8:35 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, May 20, 2025 at 10:52:39PM +0000, Wei Liu wrote:
> On Tue, May 20, 2025 at 01:30:17PM +0200, Magnus Kulke wrote:
> > +    default:
> > +        msg = &exit_msg;
> 
> Do you not get any HALT exit? How are you going to shut down the VM?
> 

In the WHPX accelerator there is this comment:

	case WHvRunVpExitReasonX64Halt:
		/*
		 * WARNING: as of build 19043.1526 (21H1), this exit reason is no
		 * longer used.
		 */
		ret = whpx_handle_halt(cpu);
		break;

I wonder if this also applies to HVMSG_X64_HALT from the MSHV driver?


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-07-01  8:35     ` Magnus Kulke
@ 2025-07-01 15:11       ` Wei Liu
  2025-07-01 15:45         ` Magnus Kulke
  0 siblings, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-07-01 15:11 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: Wei Liu, magnuskulke, qemu-devel, liuwe, Paolo Bonzini,
	Michael S. Tsirkin, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, Jul 01, 2025 at 10:35:34AM +0200, Magnus Kulke wrote:
> On Tue, May 20, 2025 at 10:52:39PM +0000, Wei Liu wrote:
> > On Tue, May 20, 2025 at 01:30:17PM +0200, Magnus Kulke wrote:
> > > +    default:
> > > +        msg = &exit_msg;
> > 
> > Do you not get any HALT exit? How are you going to shut down the VM?
> > 
> 
> In the WHPX accelerator there is this comment:
> 
> 	case WHvRunVpExitReasonX64Halt:
> 		/*
> 		 * WARNING: as of build 19043.1526 (21H1), this exit reason is no
> 		 * longer used.
> 		 */
> 		ret = whpx_handle_halt(cpu);
> 		break;
> 
> I wonder if this also applies to HVMSG_X64_HALT from the MSHV driver?

IIRC that's still used in our driver.

You can try shutting down the VM with `poweroff` or `halt` and see if
you get the exit.

Wei


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-07-01 15:11       ` Wei Liu
@ 2025-07-01 15:45         ` Magnus Kulke
  2025-07-01 15:47           ` Wei Liu
  0 siblings, 1 reply; 76+ messages in thread
From: Magnus Kulke @ 2025-07-01 15:45 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, Jul 01, 2025 at 03:11:39PM +0000, Wei Liu wrote:
> On Tue, Jul 01, 2025 at 10:35:34AM +0200, Magnus Kulke wrote:
> > On Tue, May 20, 2025 at 10:52:39PM +0000, Wei Liu wrote:
> > > On Tue, May 20, 2025 at 01:30:17PM +0200, Magnus Kulke wrote:
> > > > +    default:
> > > > +        msg = &exit_msg;
> > > 
> > > Do you not get any HALT exit? How are you going to shut down the VM?
> > > 
> > 
> > In the WHPX accelerator there is this comment:
> > 
> > 	case WHvRunVpExitReasonX64Halt:
> > 		/*
> > 		 * WARNING: as of build 19043.1526 (21H1), this exit reason is no
> > 		 * longer used.
> > 		 */
> > 		ret = whpx_handle_halt(cpu);
> > 		break;
> > 
> > I wonder if this also applies to HVMSG_X64_HALT from the MSHV driver?
> 
> IIRC that's still used in our driver.
> 
> You can try shutting down the VM with `poweroff` or `halt` and see if
> you get the exit.
> 
> Wei

I wasn't able to trigger the exit with `poweroff` or `halt -p`. Or a
kernel module that performs:

```
local_irq_disable();
__asm__("hlt");
```

(it will just hang the guest).

I have added the handler, but it looks like it's dead code currently.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-07-01 15:45         ` Magnus Kulke
@ 2025-07-01 15:47           ` Wei Liu
  2025-07-01 15:51             ` Magnus Kulke
  0 siblings, 1 reply; 76+ messages in thread
From: Wei Liu @ 2025-07-01 15:47 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: Wei Liu, magnuskulke, qemu-devel, liuwe, Paolo Bonzini,
	Michael S. Tsirkin, Phil Dennis-Jordan, Roman Bolshakov,
	Philippe Mathieu-Daudé, Zhao Liu, Richard Henderson,
	Cameron Esfahani, Marc-André Lureau, Daniel P. Berrangé

On Tue, Jul 01, 2025 at 05:45:07PM +0200, Magnus Kulke wrote:
> On Tue, Jul 01, 2025 at 03:11:39PM +0000, Wei Liu wrote:
> > On Tue, Jul 01, 2025 at 10:35:34AM +0200, Magnus Kulke wrote:
> > > On Tue, May 20, 2025 at 10:52:39PM +0000, Wei Liu wrote:
> > > > On Tue, May 20, 2025 at 01:30:17PM +0200, Magnus Kulke wrote:
> > > > > +    default:
> > > > > +        msg = &exit_msg;
> > > > 
> > > > Do you not get any HALT exit? How are you going to shut down the VM?
> > > > 
> > > 
> > > In the WHPX accelerator there is this comment:
> > > 
> > > 	case WHvRunVpExitReasonX64Halt:
> > > 		/*
> > > 		 * WARNING: as of build 19043.1526 (21H1), this exit reason is no
> > > 		 * longer used.
> > > 		 */
> > > 		ret = whpx_handle_halt(cpu);
> > > 		break;
> > > 
> > > I wonder if this also applies to HVMSG_X64_HALT from the MSHV driver?
> > 
> > IIRC that's still used in our driver.
> > 
> > You can try shutting down the VM with `poweroff` or `halt` and see if
> > you get the exit.
> > 
> > Wei
> 
> I wasn't able to trigger the exit with `poweroff` or `halt -p`. Or a
> kernel module that performs:
> 
> ```
> local_irq_disable();
> __asm__("hlt");
> ```
> 
> (it will just hang the guest).
> 
> I have added the handler, but it looks like it's dead code currently.

We can leave the out for now as long as the guest shutdown works.

Wei.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run()
  2025-07-01 15:47           ` Wei Liu
@ 2025-07-01 15:51             ` Magnus Kulke
  0 siblings, 0 replies; 76+ messages in thread
From: Magnus Kulke @ 2025-07-01 15:51 UTC (permalink / raw)
  To: Wei Liu
  Cc: magnuskulke, qemu-devel, liuwe, Paolo Bonzini, Michael S. Tsirkin,
	Phil Dennis-Jordan, Roman Bolshakov, Philippe Mathieu-Daudé,
	Zhao Liu, Richard Henderson, Cameron Esfahani,
	Marc-André Lureau, Daniel P. Berrangé

On Tue, Jul 01, 2025 at 03:47:40PM +0000, Wei Liu wrote:
> 
> We can leave the out for now as long as the guest shutdown works.
> 
> Wei.

yup, shutdown works fine, so I will drop the commit from the next patch
set, thanks!


^ permalink raw reply	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2025-07-01 15:51 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-20 11:29 [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
2025-05-20 11:29 ` [RFC PATCH 01/25] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
2025-05-20 11:50   ` Daniel P. Berrangé
2025-05-20 14:16     ` Paolo Bonzini
2025-05-20 11:29 ` [RFC PATCH 02/25] target/i386/emulate: allow instruction decoding from stream Magnus Kulke
2025-05-20 12:42   ` Paolo Bonzini
2025-05-20 17:29   ` Wei Liu
2025-05-20 11:29 ` [RFC PATCH 03/25] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
2025-05-20 11:54   ` Daniel P. Berrangé
2025-05-20 13:17   ` Paolo Bonzini
2025-05-20 17:36   ` Wei Liu
2025-05-20 11:29 ` [RFC PATCH 04/25] hw/intc: Generalize APIC helper names from kvm_* to accel_* Magnus Kulke
2025-05-20 11:29 ` [RFC PATCH 05/25] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
2025-05-20 14:24   ` Paolo Bonzini
2025-05-20 11:29 ` [RFC PATCH 06/25] accel/mshv: Add accelerator skeleton Magnus Kulke
2025-05-20 12:02   ` Daniel P. Berrangé
2025-05-20 12:38     ` Paolo Bonzini
2025-05-20 11:30 ` [RFC PATCH 07/25] accel/mshv: Register memory region listeners Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 08/25] accel/mshv: Initialize VM partition Magnus Kulke
2025-05-20 19:07   ` Wei Liu
2025-05-22 15:42     ` Magnus Kulke
2025-05-22 17:46       ` Wei Liu
2025-05-23  8:23     ` Magnus Kulke
2025-05-23 15:37       ` Wei Liu
2025-05-23 16:13         ` Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 09/25] accel/mshv: Register guest memory regions with hypervisor Magnus Kulke
2025-05-20 20:07   ` Wei Liu
2025-05-23 14:17     ` Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 10/25] accel/mshv: Add ioeventfd support Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 11/25] accel/mshv: Add basic interrupt injection support Magnus Kulke
2025-05-20 14:18   ` Paolo Bonzini
2025-05-20 20:15   ` Wei Liu
2025-05-27 16:27     ` Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 12/25] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
2025-05-20 13:50   ` Paolo Bonzini
2025-05-20 13:54     ` Paolo Bonzini
2025-05-23 17:05       ` Wei Liu
2025-06-06 23:06     ` Nuno Das Neves
2025-05-20 11:30 ` [RFC PATCH 13/25] accel/mshv: Add vCPU signal handling Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 14/25] target/i386/mshv: Add CPU create and remove logic Magnus Kulke
2025-05-20 21:50   ` Wei Liu
2025-05-20 11:30 ` [RFC PATCH 15/25] target/i386/mshv: Implement mshv_store_regs() Magnus Kulke
2025-05-20 22:07   ` Wei Liu
2025-05-20 11:30 ` [RFC PATCH 16/25] target/i386/mshv: Implement mshv_get_standard_regs() Magnus Kulke
2025-05-20 22:09   ` Wei Liu
2025-05-20 11:30 ` [RFC PATCH 17/25] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
2025-05-20 14:05   ` Paolo Bonzini
2025-05-20 22:15   ` Wei Liu
2025-05-28 13:55     ` Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 18/25] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
2025-05-20 14:33   ` Paolo Bonzini
2025-05-20 22:22   ` Wei Liu
2025-05-28 14:30     ` Magnus Kulke
2025-06-06 19:16       ` Wei Liu
2025-06-06 19:11   ` Wei Liu
2025-05-20 11:30 ` [RFC PATCH 19/25] target/i386/mshv: Set local interrupt controller state Magnus Kulke
2025-05-20 14:03   ` Paolo Bonzini
2025-05-20 11:30 ` [RFC PATCH 20/25] target/i386/mshv: Register CPUID entries with MSHV Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 21/25] target/i386/mshv: Register MSRs " Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 22/25] target/i386/mshv: Integrate x86 instruction decoder/emulator Magnus Kulke
2025-05-20 22:38   ` Wei Liu
2025-05-28 15:10     ` Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 23/25] target/i386/mshv: Write MSRs to the hypervisor Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 24/25] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
2025-05-20 13:21   ` Paolo Bonzini
2025-05-20 22:52   ` Wei Liu
2025-06-03 15:40     ` Magnus Kulke
2025-07-01  8:35     ` Magnus Kulke
2025-07-01 15:11       ` Wei Liu
2025-07-01 15:45         ` Magnus Kulke
2025-07-01 15:47           ` Wei Liu
2025-07-01 15:51             ` Magnus Kulke
2025-05-20 11:30 ` [RFC PATCH 25/25] accel/mshv: Add memory remapping workaround Magnus Kulke
2025-05-20 13:53   ` Paolo Bonzini
2025-05-22 12:51     ` Magnus Kulke
2025-05-20 14:25 ` [RFC PATCH 00/25] Implementing a MSHV (Microsoft Hypervisor) accelerator Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).