* [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-11 17:51 ` Wei Liu
2025-08-27 10:27 ` Daniel P. Berrangé
2025-08-07 14:39 ` [PATCH v3 02/26] target/i386/emulate: Allow instruction decoding from stream Magnus Kulke
` (26 subsequent siblings)
27 siblings, 2 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Introduce a Meson feature option and default-config entry to allow
building QEMU with MSHV (Microsoft Hypervisor) acceleration support.
This is the first step toward implementing an MSHV backend in QEMU.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/Kconfig | 3 +++
meson.build | 10 ++++++++++
meson_options.txt | 2 ++
scripts/meson-buildoptions.sh | 3 +++
4 files changed, 18 insertions(+)
diff --git a/accel/Kconfig b/accel/Kconfig
index 4263cab722..a60f114923 100644
--- a/accel/Kconfig
+++ b/accel/Kconfig
@@ -13,6 +13,9 @@ config TCG
config KVM
bool
+config MSHV
+ bool
+
config XEN
bool
select FSDEV_9P if VIRTFS
diff --git a/meson.build b/meson.build
index e53cd5b413..b6e70714f1 100644
--- a/meson.build
+++ b/meson.build
@@ -334,6 +334,7 @@ elif cpu == 'x86_64'
'CONFIG_HVF': ['x86_64-softmmu'],
'CONFIG_NVMM': ['i386-softmmu', 'x86_64-softmmu'],
'CONFIG_WHPX': ['i386-softmmu', 'x86_64-softmmu'],
+ 'CONFIG_MSHV': ['x86_64-softmmu'],
}
endif
@@ -884,6 +885,14 @@ accelerators = []
if get_option('kvm').allowed() and host_os == 'linux'
accelerators += 'CONFIG_KVM'
endif
+
+if get_option('mshv').allowed() and host_os == 'linux'
+ if get_option('mshv').enabled() and host_machine.cpu() != 'x86_64'
+ error('mshv accelerator requires x86_64 host')
+ endif
+ accelerators += 'CONFIG_MSHV'
+endif
+
if get_option('whpx').allowed() and host_os == 'windows'
if get_option('whpx').enabled() and host_machine.cpu() != 'x86_64'
error('WHPX requires 64-bit host')
@@ -4818,6 +4827,7 @@ if have_system
summary_info += {'HVF support': config_all_accel.has_key('CONFIG_HVF')}
summary_info += {'WHPX support': config_all_accel.has_key('CONFIG_WHPX')}
summary_info += {'NVMM support': config_all_accel.has_key('CONFIG_NVMM')}
+ summary_info += {'MSHV support': config_all_accel.has_key('CONFIG_MSHV')}
summary_info += {'Xen support': xen.found()}
if xen.found()
summary_info += {'xen ctrl version': xen.version()}
diff --git a/meson_options.txt b/meson_options.txt
index dd33530750..2a6e8dd950 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -71,6 +71,8 @@ option('malloc', type : 'combo', choices : ['system', 'tcmalloc', 'jemalloc'],
option('kvm', type: 'feature', value: 'auto',
description: 'KVM acceleration support')
+option('mshv', type: 'feature', value: 'auto',
+ description: 'MSHV acceleration support')
option('whpx', type: 'feature', value: 'auto',
description: 'WHPX acceleration support')
option('hvf', type: 'feature', value: 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index d559e260ed..a3bc3d195e 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -157,6 +157,7 @@ meson_options_help() {
printf "%s\n" ' membarrier membarrier system call (for Linux 4.14+ or Windows'
printf "%s\n" ' modules modules support (non Windows)'
printf "%s\n" ' mpath Multipath persistent reservation passthrough'
+ printf "%s\n" ' mshv MSHV acceleration support'
printf "%s\n" ' multiprocess Out of process device emulation support'
printf "%s\n" ' netmap netmap network backend support'
printf "%s\n" ' nettle nettle cryptography support'
@@ -413,6 +414,8 @@ _meson_option_parse() {
--disable-modules) printf "%s" -Dmodules=disabled ;;
--enable-mpath) printf "%s" -Dmpath=enabled ;;
--disable-mpath) printf "%s" -Dmpath=disabled ;;
+ --enable-mshv) printf "%s" -Dmshv=enabled ;;
+ --disable-mshv) printf "%s" -Dmshv=disabled ;;
--enable-multiprocess) printf "%s" -Dmultiprocess=enabled ;;
--disable-multiprocess) printf "%s" -Dmultiprocess=disabled ;;
--enable-netmap) printf "%s" -Dnetmap=enabled ;;
--
2.34.1
* Re: [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator
2025-08-07 14:39 ` [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
@ 2025-08-11 17:51 ` Wei Liu
2025-08-27 10:27 ` Daniel P. Berrangé
1 sibling, 0 replies; 46+ messages in thread
From: Wei Liu @ 2025-08-11 17:51 UTC
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:26PM +0200, Magnus Kulke wrote:
> Introduce a Meson feature option and default-config entry to allow
> building QEMU with MSHV (Microsoft Hypervisor) acceleration support.
>
> This is the first step toward implementing an MSHV backend in QEMU.
>
[...]
> if get_option('whpx').allowed() and host_os == 'windows'
> if get_option('whpx').enabled() and host_machine.cpu() != 'x86_64'
> error('WHPX requires 64-bit host')
> @@ -4818,6 +4827,7 @@ if have_system
> summary_info += {'HVF support': config_all_accel.has_key('CONFIG_HVF')}
> summary_info += {'WHPX support': config_all_accel.has_key('CONFIG_WHPX')}
> summary_info += {'NVMM support': config_all_accel.has_key('CONFIG_NVMM')}
> + summary_info += {'MSHV support': config_all_accel.has_key('CONFIG_MSHV')}
Minor nit, one space too many here.
Wei
* Re: [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator
2025-08-07 14:39 ` [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
2025-08-11 17:51 ` Wei Liu
@ 2025-08-27 10:27 ` Daniel P. Berrangé
1 sibling, 0 replies; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 10:27 UTC
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:26PM +0200, Magnus Kulke wrote:
> Introduce a Meson feature option and default-config entry to allow
> building QEMU with MSHV (Microsoft Hypervisor) acceleration support.
>
> This is the first step toward implementing an MSHV backend in QEMU.
>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> accel/Kconfig | 3 +++
> meson.build | 10 ++++++++++
> meson_options.txt | 2 ++
> scripts/meson-buildoptions.sh | 3 +++
> 4 files changed, 18 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v3 02/26] target/i386/emulate: Allow instruction decoding from stream
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 03/26] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
` (25 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Introduce a new helper function to decode x86 instructions from a
raw instruction byte stream. MSHV delivers an instruction stream in a
buffer of the vm_exit message. It can be used to speed up MMIO
emulation, since instructions do not have to be fetched and translated.
Also add a fetch_instruction() op to struct x86_emul_ops, so that
instruction fetches can be told apart from regular memory reads when
tracing.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
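A standalone sketch of the fetch path described above, for readers who want
to try the idea outside of QEMU. It is not the code in this patch: the names
insn_stream, fetch_fn, fetch_nops and read_insn_bytes are made up for
illustration, and the bytes are assembled in little-endian order explicitly
rather than with a memcpy() into the result as decode_bytes() does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's x86_insn_stream. */
struct insn_stream {
    const uint8_t *bytes;
    size_t len;
};

/* Hypothetical fallback; the patch uses emul_ops->fetch_instruction(). */
typedef void (*fetch_fn)(uint8_t *dst, uint64_t addr, size_t size);

/* Stub fallback for illustration: pretends guest memory is all NOPs. */
static void fetch_nops(uint8_t *dst, uint64_t addr, size_t size)
{
    (void)addr;
    memset(dst, 0x90, size);
}

/*
 * Read `size` bytes (at most 8) at decode offset `*off`. The bytes come
 * from the prefetched stream when the whole read fits inside it; otherwise
 * the fallback fetches them from guest memory at `rip + *off`.
 */
static uint64_t read_insn_bytes(const struct insn_stream *s, size_t *off,
                                size_t size, uint64_t rip, fetch_fn fallback)
{
    uint8_t tmp[8] = {0};
    uint64_t val = 0;
    size_t i;

    if (size > sizeof(tmp)) {
        return 0; /* unsupported width: nothing is consumed */
    }

    if (s && *off + size <= s->len) {
        memcpy(tmp, s->bytes + *off, size);
    } else {
        fallback(tmp, rip + *off, size);
    }
    *off += size;

    /* x86 encodes immediates and displacements in little-endian order */
    for (i = size; i-- > 0;) {
        val = (val << 8) | tmp[i];
    }
    return val;
}
```

A read that would run past the end of the stream is served entirely by the
fallback, so a single value is never stitched together from two sources.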
target/i386/emulate/x86_decode.c | 27 +++++++++++++++++++++++----
target/i386/emulate/x86_decode.h | 9 +++++++++
target/i386/emulate/x86_emu.c | 3 ++-
target/i386/emulate/x86_emu.h | 2 ++
4 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/target/i386/emulate/x86_decode.c b/target/i386/emulate/x86_decode.c
index 2eca39802e..97bd6f1a3b 100644
--- a/target/i386/emulate/x86_decode.c
+++ b/target/i386/emulate/x86_decode.c
@@ -71,10 +71,16 @@ static inline uint64_t decode_bytes(CPUX86State *env, struct x86_decode *decode,
VM_PANIC_EX("%s invalid size %d\n", __func__, size);
break;
}
- target_ulong va = linear_rip(env_cpu(env), env->eip) + decode->len;
- emul_ops->read_mem(env_cpu(env), &val, va, size);
+
+ /* copy the bytes from the instruction stream, if available */
+ if (decode->stream && decode->len + size <= decode->stream->len) {
+ memcpy(&val, decode->stream->bytes + decode->len, size);
+ } else {
+ target_ulong va = linear_rip(env_cpu(env), env->eip) + decode->len;
+ emul_ops->fetch_instruction(env_cpu(env), &val, va, size);
+ }
decode->len += size;
-
+
return val;
}
@@ -2076,9 +2082,10 @@ static void decode_opcodes(CPUX86State *env, struct x86_decode *decode)
}
}
-uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
+static uint32_t decode_opcode(CPUX86State *env, struct x86_decode *decode)
{
memset(decode, 0, sizeof(*decode));
+
decode_prefix(env, decode);
set_addressing_size(env, decode);
set_operand_size(env, decode);
@@ -2088,6 +2095,18 @@ uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
return decode->len;
}
+uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode)
+{
+ return decode_opcode(env, decode);
+}
+
+uint32_t decode_instruction_stream(CPUX86State *env, struct x86_decode *decode,
+ struct x86_insn_stream *stream)
+{
+ decode->stream = stream;
+ return decode_opcode(env, decode);
+}
+
void init_decoder(void)
{
int i;
diff --git a/target/i386/emulate/x86_decode.h b/target/i386/emulate/x86_decode.h
index 927645af1a..1cadf3694f 100644
--- a/target/i386/emulate/x86_decode.h
+++ b/target/i386/emulate/x86_decode.h
@@ -272,6 +272,11 @@ typedef struct x86_decode_op {
};
} x86_decode_op;
+typedef struct x86_insn_stream {
+ const uint8_t *bytes;
+ size_t len;
+} x86_insn_stream;
+
typedef struct x86_decode {
int len;
uint8_t opcode[4];
@@ -298,11 +303,15 @@ typedef struct x86_decode {
struct x86_modrm modrm;
struct x86_decode_op op[4];
bool is_fpu;
+
+ x86_insn_stream *stream;
} x86_decode;
uint64_t sign(uint64_t val, int size);
uint32_t decode_instruction(CPUX86State *env, struct x86_decode *decode);
+uint32_t decode_instruction_stream(CPUX86State *env, struct x86_decode *decode,
+ struct x86_insn_stream *stream);
void *get_reg_ref(CPUX86State *env, int reg, int rex_present,
int is_extended, int size);
diff --git a/target/i386/emulate/x86_emu.c b/target/i386/emulate/x86_emu.c
index db7a7f7437..4409f7bc13 100644
--- a/target/i386/emulate/x86_emu.c
+++ b/target/i386/emulate/x86_emu.c
@@ -1246,7 +1246,8 @@ static void init_cmd_handler(void)
bool exec_instruction(CPUX86State *env, struct x86_decode *ins)
{
if (!_cmd_handler[ins->cmd].handler) {
- printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x) \n", env->eip,
+ printf("Unimplemented handler (" TARGET_FMT_lx ") for %d (%x %x)\n",
+ env->eip,
ins->cmd, ins->opcode[0],
ins->opcode_len > 1 ? ins->opcode[1] : 0);
env->eip += ins->len;
diff --git a/target/i386/emulate/x86_emu.h b/target/i386/emulate/x86_emu.h
index a1a961284b..05686b162f 100644
--- a/target/i386/emulate/x86_emu.h
+++ b/target/i386/emulate/x86_emu.h
@@ -24,6 +24,8 @@
#include "cpu.h"
struct x86_emul_ops {
+ void (*fetch_instruction)(CPUState *cpu, void *data, target_ulong addr,
+ int bytes);
void (*read_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
void (*write_mem)(CPUState *cpu, void *data, target_ulong addr, int bytes);
void (*read_segment_descriptor)(CPUState *cpu, struct x86_segment_descriptor *desc,
--
2.34.1
* [PATCH v3 03/26] target/i386/mshv: Add x86 decoder/emu implementation
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 01/26] accel: Add Meson and config support for MSHV accelerator Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 02/26] target/i386/emulate: Allow instruction decoding from stream Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 04/26] hw/intc: Generalize APIC helper names from kvm_* to accel_* Magnus Kulke
` (24 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
The MSHV accelerator requires an x86 decoder/emulator in userland to
emulate MMIO instructions. This change adds the MSHV-specific helpers
that the generalized i386 instruction decoder/emulator depends on.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
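A standalone sketch of the limit check that address linearization performs
for a plain (expand-up) segment outside long mode. It is a simplification
for illustration, not the code in this patch: effective_limit and
linearize_expand_up are hypothetical names, and the sketch assumes the raw
limit is the unscaled 20-bit descriptor value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Effective byte limit of a segment. With the granularity (G) bit set the
 * raw limit counts 4 KiB pages, so it is scaled and the low 12 bits are
 * filled with ones.
 */
static uint32_t effective_limit(uint32_t raw_limit, bool granularity)
{
    return granularity ? (raw_limit << 12) | 0xFFFu : raw_limit;
}

/*
 * Translate a segment offset into a linear address for an expand-up
 * segment. Returns false when the offset lies beyond the segment limit.
 */
static bool linearize_expand_up(uint32_t base, uint32_t raw_limit,
                                bool granularity, uint32_t offset,
                                uint64_t *linear)
{
    if (offset > effective_limit(raw_limit, granularity)) {
        return false;
    }
    *linear = (uint64_t)base + offset;
    return true;
}
```

For example, a raw limit of 0xFFFFF with G set covers the full 4 GiB range,
while a raw limit of 0xFFFF without G rejects any offset above 0xFFFF.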
include/system/mshv.h | 25 +++
target/i386/cpu.h | 2 +-
target/i386/emulate/meson.build | 7 +-
target/i386/meson.build | 2 +
target/i386/mshv/meson.build | 7 +
target/i386/mshv/x86.c | 297 ++++++++++++++++++++++++++++++++
6 files changed, 337 insertions(+), 3 deletions(-)
create mode 100644 include/system/mshv.h
create mode 100644 target/i386/mshv/meson.build
create mode 100644 target/i386/mshv/x86.c
diff --git a/include/system/mshv.h b/include/system/mshv.h
new file mode 100644
index 0000000000..a971982b52
--- /dev/null
+++ b/include/system/mshv.h
@@ -0,0 +1,25 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ * Jinank Jain <jinankjain@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ */
+
+#ifndef QEMU_MSHV_INT_H
+#define QEMU_MSHV_INT_H
+
+#ifdef COMPILING_PER_TARGET
+#ifdef CONFIG_MSHV
+#define CONFIG_MSHV_IS_POSSIBLE
+#endif
+#else
+#define CONFIG_MSHV_IS_POSSIBLE
+#endif
+
+#endif
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f977fc49a7..6d3d2b1440 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2126,7 +2126,7 @@ typedef struct CPUArchState {
QEMUTimer *xen_periodic_timer;
QemuMutex xen_timers_lock;
#endif
-#if defined(CONFIG_HVF)
+#if defined(CONFIG_HVF) || defined(CONFIG_MSHV)
void *emu_mmio_buf;
#endif
diff --git a/target/i386/emulate/meson.build b/target/i386/emulate/meson.build
index 4edd4f462f..b6dafb6a5b 100644
--- a/target/i386/emulate/meson.build
+++ b/target/i386/emulate/meson.build
@@ -1,5 +1,8 @@
-i386_system_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
+emulator_files = files(
'x86_decode.c',
'x86_emu.c',
'x86_flags.c',
-))
+)
+
+i386_system_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: emulator_files)
+i386_system_ss.add(when: 'CONFIG_MSHV', if_true: emulator_files)
diff --git a/target/i386/meson.build b/target/i386/meson.build
index 092af34e2d..89ba4912aa 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -13,6 +13,7 @@ i386_ss.add(when: 'CONFIG_KVM', if_true: files('host-cpu.c'))
i386_ss.add(when: 'CONFIG_HVF', if_true: files('host-cpu.c'))
i386_ss.add(when: 'CONFIG_WHPX', if_true: files('host-cpu.c'))
i386_ss.add(when: 'CONFIG_NVMM', if_true: files('host-cpu.c'))
+i386_ss.add(when: 'CONFIG_MSHV', if_true: files('host-cpu.c'))
i386_system_ss = ss.source_set()
i386_system_ss.add(files(
@@ -34,6 +35,7 @@ subdir('nvmm')
subdir('hvf')
subdir('tcg')
subdir('emulate')
+subdir('mshv')
target_arch += {'i386': i386_ss}
target_system_arch += {'i386': i386_system_ss}
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
new file mode 100644
index 0000000000..8ddaa7c11d
--- /dev/null
+++ b/target/i386/mshv/meson.build
@@ -0,0 +1,7 @@
+i386_mshv_ss = ss.source_set()
+
+i386_mshv_ss.add(files(
+ 'x86.c',
+))
+
+i386_system_ss.add_all(when: 'CONFIG_MSHV', if_true: i386_mshv_ss)
diff --git a/target/i386/mshv/x86.c b/target/i386/mshv/x86.c
new file mode 100644
index 0000000000..d574b3bc52
--- /dev/null
+++ b/target/i386/mshv/x86.c
@@ -0,0 +1,297 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#include "emulate/x86_decode.h"
+#include "emulate/x86_emu.h"
+#include "qemu/typedefs.h"
+#include "qemu/error-report.h"
+#include "system/mshv.h"
+
+/* RW or Exec segment */
+static const uint8_t RWRX_SEGMENT_TYPE = 0x2;
+static const uint8_t CODE_SEGMENT_TYPE = 0x8;
+static const uint8_t EXPAND_DOWN_SEGMENT_TYPE = 0x4;
+
+typedef enum CpuMode {
+ REAL_MODE,
+ PROTECTED_MODE,
+ LONG_MODE,
+} CpuMode;
+
+static CpuMode cpu_mode(CPUState *cpu)
+{
+ enum CpuMode m = REAL_MODE;
+
+ if (x86_is_protected(cpu)) {
+ m = PROTECTED_MODE;
+
+ if (x86_is_long_mode(cpu)) {
+ m = LONG_MODE;
+ }
+ }
+
+ return m;
+}
+
+static bool segment_type_ro(const SegmentCache *seg)
+{
+ uint32_t type_ = (seg->flags >> DESC_TYPE_SHIFT) & 15;
+ return (type_ & (~RWRX_SEGMENT_TYPE)) == 0;
+}
+
+static bool segment_type_code(const SegmentCache *seg)
+{
+ uint32_t type_ = (seg->flags >> DESC_TYPE_SHIFT) & 15;
+ return (type_ & CODE_SEGMENT_TYPE) != 0;
+}
+
+static bool segment_expands_down(const SegmentCache *seg)
+{
+ uint32_t type_ = (seg->flags >> DESC_TYPE_SHIFT) & 15;
+
+ if (segment_type_code(seg)) {
+ return false;
+ }
+
+ return (type_ & EXPAND_DOWN_SEGMENT_TYPE) != 0;
+}
+
+static uint32_t segment_limit(const SegmentCache *seg)
+{
+ uint32_t limit = seg->limit;
+ uint32_t granularity = (seg->flags & DESC_G_MASK) != 0;
+
+ if (granularity != 0) {
+ limit = (limit << 12) | 0xFFF;
+ }
+
+ return limit;
+}
+
+static uint8_t segment_db(const SegmentCache *seg)
+{
+ return (seg->flags >> DESC_B_SHIFT) & 1;
+}
+
+static uint32_t segment_max_limit(const SegmentCache *seg)
+{
+ if (segment_db(seg) != 0) {
+ return 0xFFFFFFFF;
+ }
+ return 0xFFFF;
+}
+
+static int linearize(CPUState *cpu,
+ target_ulong logical_addr, target_ulong *linear_addr,
+ X86Seg seg_idx)
+{
+ enum CpuMode mode;
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ SegmentCache *seg = &env->segs[seg_idx];
+ target_ulong base = seg->base;
+ target_ulong logical_addr_32b;
+ uint32_t limit;
+ /* TODO: the emulator will not pass us "write" indicator yet */
+ bool write = false;
+
+ mode = cpu_mode(cpu);
+
+ switch (mode) {
+ case LONG_MODE:
+ if (__builtin_add_overflow(logical_addr, base, linear_addr)) {
+ error_report("Address overflow");
+ return -1;
+ }
+ break;
+ case PROTECTED_MODE:
+ case REAL_MODE:
+ if (segment_type_ro(seg) && write) {
+ error_report("Cannot write to read-only segment");
+ return -1;
+ }
+
+ logical_addr_32b = logical_addr & 0xFFFFFFFF;
+ limit = segment_limit(seg);
+
+ if (segment_expands_down(seg)) {
+ if (logical_addr_32b >= limit) {
+ error_report("Address exceeds limit (expands down)");
+ return -1;
+ }
+
+ limit = segment_max_limit(seg);
+ }
+
+ if (logical_addr_32b > limit) {
+ error_report("Address exceeds limit %u", limit);
+ return -1;
+ }
+ *linear_addr = logical_addr_32b + base;
+ break;
+ default:
+ error_report("Unknown cpu mode: %d", mode);
+ return -1;
+ }
+
+ return 0;
+}
+
+bool x86_read_segment_descriptor(CPUState *cpu,
+ struct x86_segment_descriptor *desc,
+ x86_segment_selector sel)
+{
+ target_ulong base;
+ uint32_t limit;
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ target_ulong gva;
+
+ memset(desc, 0, sizeof(*desc));
+
+ /* valid gdt descriptors start from index 1 */
+ if (!sel.index && GDT_SEL == sel.ti) {
+ return false;
+ }
+
+ if (GDT_SEL == sel.ti) {
+ base = env->gdt.base;
+ limit = env->gdt.limit;
+ } else {
+ base = env->ldt.base;
+ limit = env->ldt.limit;
+ }
+
+ if (sel.index * 8 >= limit) {
+ return false;
+ }
+
+ gva = base + sel.index * 8;
+ emul_ops->read_mem(cpu, desc, gva, sizeof(*desc));
+
+ return true;
+}
+
+bool x86_read_call_gate(CPUState *cpu, struct x86_call_gate *idt_desc,
+ int gate)
+{
+ target_ulong base;
+ uint32_t limit;
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ target_ulong gva;
+
+ base = env->idt.base;
+ limit = env->idt.limit;
+
+ memset(idt_desc, 0, sizeof(*idt_desc));
+ if (gate * 8 >= limit) {
+ error_report("call gate exceeds idt limit");
+ return false;
+ }
+
+ gva = base + gate * 8;
+ emul_ops->read_mem(cpu, idt_desc, gva, sizeof(*idt_desc));
+
+ return true;
+}
+
+bool x86_is_protected(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ uint64_t cr0 = env->cr[0];
+
+ return cr0 & CR0_PE_MASK;
+}
+
+bool x86_is_real(CPUState *cpu)
+{
+ return !x86_is_protected(cpu);
+}
+
+bool x86_is_v8086(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ return x86_is_protected(cpu) && (env->eflags & VM_MASK);
+}
+
+bool x86_is_long_mode(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ uint64_t efer = env->efer;
+ uint64_t lme_lma = (MSR_EFER_LME | MSR_EFER_LMA);
+
+ return ((efer & lme_lma) == lme_lma);
+}
+
+bool x86_is_long64_mode(CPUState *cpu)
+{
+ error_report("unimplemented: is_long64_mode()");
+ abort();
+}
+
+bool x86_is_paging_mode(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ uint64_t cr0 = env->cr[0];
+
+ return cr0 & CR0_PG_MASK;
+}
+
+bool x86_is_pae_enabled(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ uint64_t cr4 = env->cr[4];
+
+ return cr4 & CR4_PAE_MASK;
+}
+
+target_ulong linear_addr(CPUState *cpu, target_ulong addr, X86Seg seg)
+{
+ int ret;
+ target_ulong linear_addr;
+
+ ret = linearize(cpu, addr, &linear_addr, seg);
+ if (ret < 0) {
+ error_report("failed to linearize address");
+ abort();
+ }
+
+ return linear_addr;
+}
+
+target_ulong linear_addr_size(CPUState *cpu, target_ulong addr, int size,
+ X86Seg seg)
+{
+ switch (size) {
+ case 2:
+ addr = (uint16_t)addr;
+ break;
+ case 4:
+ addr = (uint32_t)addr;
+ break;
+ default:
+ break;
+ }
+ return linear_addr(cpu, addr, seg);
+}
+
+target_ulong linear_rip(CPUState *cpu, target_ulong rip)
+{
+ return linear_addr(cpu, rip, R_CS);
+}
--
2.34.1
* [PATCH v3 04/26] hw/intc: Generalize APIC helper names from kvm_* to accel_*
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (2 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 03/26] target/i386/mshv: Add x86 decoder/emu implementation Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 05/26] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
` (23 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Rename APIC helper functions to use an accel_* prefix instead of kvm_*
to support use by accelerators other than KVM. This is a preparatory
step for integrating MSHV support with common APIC logic.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
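A standalone sketch of the dispatch pattern the accel_* wrappers follow:
try MSHV first, then KVM, and report -ENOSYS when neither accelerator is
active. Every demo_* name below is a hypothetical stand-in; the real
wrappers call the mshv_* and kvm_* helpers and take different arguments.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the accelerator probes. */
static bool demo_mshv_active;
static bool demo_kvm_active;

/* Hypothetical backends; the return values only mark which one ran. */
static int demo_mshv_update_route(int vector)
{
    return 1000 + vector;
}

static int demo_kvm_update_route(int vector)
{
    return 2000 + vector;
}

/* Accelerator-neutral entry point: the first active backend handles it. */
static int demo_accel_update_route(int vector)
{
    if (demo_mshv_active) {
        return demo_mshv_update_route(vector);
    }
    if (demo_kvm_active) {
        return demo_kvm_update_route(vector);
    }
    return -ENOSYS;
}
```

Callers such as the IOAPIC or virtio-pci code then only need the neutral
entry point and no longer have to pass a KVM state handle around.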
accel/accel-irq.c | 106 +++++++++++++++++++++++++++++++++++++
accel/meson.build | 2 +-
hw/intc/ioapic.c | 20 ++++---
hw/virtio/virtio-pci.c | 21 ++++----
include/system/accel-irq.h | 37 +++++++++++++
include/system/mshv.h | 21 ++++++++
6 files changed, 189 insertions(+), 18 deletions(-)
create mode 100644 accel/accel-irq.c
create mode 100644 include/system/accel-irq.h
diff --git a/accel/accel-irq.c b/accel/accel-irq.c
new file mode 100644
index 0000000000..7f864e35c4
--- /dev/null
+++ b/accel/accel-irq.c
@@ -0,0 +1,106 @@
+/*
+ * Accelerated irqchip abstraction
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/msi.h"
+
+#include "system/kvm.h"
+#include "system/mshv.h"
+#include "system/accel-irq.h"
+
+int accel_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ return mshv_irqchip_add_msi_route(vector, dev);
+ }
+#endif
+ if (kvm_enabled()) {
+ return kvm_irqchip_add_msi_route(c, vector, dev);
+ }
+ return -ENOSYS;
+}
+
+int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ return mshv_irqchip_update_msi_route(vector, msg, dev);
+ }
+#endif
+ if (kvm_enabled()) {
+ return kvm_irqchip_update_msi_route(kvm_state, vector, msg, dev);
+ }
+ return -ENOSYS;
+}
+
+void accel_irqchip_commit_route_changes(KVMRouteChange *c)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ mshv_irqchip_commit_routes();
+ }
+#endif
+ if (kvm_enabled()) {
+ kvm_irqchip_commit_route_changes(c);
+ }
+}
+
+void accel_irqchip_commit_routes(void)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ mshv_irqchip_commit_routes();
+ }
+#endif
+ if (kvm_enabled()) {
+ kvm_irqchip_commit_routes(kvm_state);
+ }
+}
+
+void accel_irqchip_release_virq(int virq)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ mshv_irqchip_release_virq(virq);
+ }
+#endif
+ if (kvm_enabled()) {
+ kvm_irqchip_release_virq(kvm_state, virq);
+ }
+}
+
+int accel_irqchip_add_irqfd_notifier_gsi(EventNotifier *n, EventNotifier *rn,
+ int virq)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ return mshv_irqchip_add_irqfd_notifier_gsi(n, rn, virq);
+ }
+#endif
+ if (kvm_enabled()) {
+ return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, rn, virq);
+ }
+ return -ENOSYS;
+}
+
+int accel_irqchip_remove_irqfd_notifier_gsi(EventNotifier *n, int virq)
+{
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+ if (mshv_msi_via_irqfd_enabled()) {
+ return mshv_irqchip_remove_irqfd_notifier_gsi(n, virq);
+ }
+#endif
+ if (kvm_enabled()) {
+ return kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n, virq);
+ }
+ return -ENOSYS;
+}
diff --git a/accel/meson.build b/accel/meson.build
index 25b0f100b5..6349efe682 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -1,6 +1,6 @@
common_ss.add(files('accel-common.c'))
specific_ss.add(files('accel-target.c'))
-system_ss.add(files('accel-system.c', 'accel-blocker.c', 'accel-qmp.c'))
+system_ss.add(files('accel-system.c', 'accel-blocker.c', 'accel-qmp.c', 'accel-irq.c'))
user_ss.add(files('accel-user.c'))
subdir('tcg')
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 133bef852d..e431d00311 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -30,12 +30,18 @@
#include "hw/intc/ioapic_internal.h"
#include "hw/pci/msi.h"
#include "hw/qdev-properties.h"
+#include "system/accel-irq.h"
#include "system/kvm.h"
#include "system/system.h"
#include "hw/i386/apic-msidef.h"
#include "hw/i386/x86-iommu.h"
#include "trace.h"
+
+#if defined(CONFIG_KVM) || defined(CONFIG_MSHV)
+#define ACCEL_GSI_IRQFD_POSSIBLE
+#endif
+
#define APIC_DELIVERY_MODE_SHIFT 8
#define APIC_POLARITY_SHIFT 14
#define APIC_TRIG_MODE_SHIFT 15
@@ -191,10 +197,10 @@ static void ioapic_set_irq(void *opaque, int vector, int level)
static void ioapic_update_kvm_routes(IOAPICCommonState *s)
{
-#ifdef CONFIG_KVM
+#ifdef ACCEL_GSI_IRQFD_POSSIBLE
int i;
- if (kvm_irqchip_is_split()) {
+ if (accel_irqchip_is_split()) {
for (i = 0; i < IOAPIC_NUM_PINS; i++) {
MSIMessage msg;
struct ioapic_entry_info info;
@@ -202,15 +208,15 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
if (!info.masked) {
msg.address = info.addr;
msg.data = info.data;
- kvm_irqchip_update_msi_route(kvm_state, i, msg, NULL);
+ accel_irqchip_update_msi_route(i, msg, NULL);
}
}
- kvm_irqchip_commit_routes(kvm_state);
+ accel_irqchip_commit_routes();
}
#endif
}
-#ifdef CONFIG_KVM
+#ifdef ACCEL_GSI_IRQFD_POSSIBLE
static void ioapic_iec_notifier(void *private, bool global,
uint32_t index, uint32_t mask)
{
@@ -428,11 +434,11 @@ static const MemoryRegionOps ioapic_io_ops = {
static void ioapic_machine_done_notify(Notifier *notifier, void *data)
{
-#ifdef CONFIG_KVM
+#ifdef ACCEL_GSI_IRQFD_POSSIBLE
IOAPICCommonState *s = container_of(notifier, IOAPICCommonState,
machine_done);
- if (kvm_irqchip_is_split()) {
+ if (accel_irqchip_is_split()) {
X86IOMMUState *iommu = x86_iommu_get_default();
if (iommu) {
/* Register this IOAPIC with IOMMU IEC notifier, so that
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 767216d795..0cdc16217f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -34,6 +34,7 @@
#include "hw/pci/msi.h"
#include "hw/pci/msix.h"
#include "hw/loader.h"
+#include "system/accel-irq.h"
#include "system/kvm.h"
#include "hw/virtio/virtio-pci.h"
#include "qemu/range.h"
@@ -825,11 +826,11 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
if (irqfd->users == 0) {
KVMRouteChange c = kvm_irqchip_begin_route_changes(kvm_state);
- ret = kvm_irqchip_add_msi_route(&c, vector, &proxy->pci_dev);
+ ret = accel_irqchip_add_msi_route(&c, vector, &proxy->pci_dev);
if (ret < 0) {
return ret;
}
- kvm_irqchip_commit_route_changes(&c);
+ accel_irqchip_commit_route_changes(&c);
irqfd->virq = ret;
}
irqfd->users++;
@@ -841,7 +842,7 @@ static void kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
{
VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
if (--irqfd->users == 0) {
- kvm_irqchip_release_virq(kvm_state, irqfd->virq);
+ accel_irqchip_release_virq(irqfd->virq);
}
}
@@ -850,7 +851,7 @@ static int kvm_virtio_pci_irqfd_use(VirtIOPCIProxy *proxy,
unsigned int vector)
{
VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
- return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, irqfd->virq);
+ return accel_irqchip_add_irqfd_notifier_gsi(n, NULL, irqfd->virq);
}
static void kvm_virtio_pci_irqfd_release(VirtIOPCIProxy *proxy,
@@ -860,7 +861,7 @@ static void kvm_virtio_pci_irqfd_release(VirtIOPCIProxy *proxy,
VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
int ret;
- ret = kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n, irqfd->virq);
+ ret = accel_irqchip_remove_irqfd_notifier_gsi(n, irqfd->virq);
assert(ret == 0);
}
static int virtio_pci_get_notifier(VirtIOPCIProxy *proxy, int queue_no,
@@ -995,12 +996,12 @@ static int virtio_pci_one_vector_unmask(VirtIOPCIProxy *proxy,
if (proxy->vector_irqfd) {
irqfd = &proxy->vector_irqfd[vector];
if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
- ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg,
- &proxy->pci_dev);
+ ret = accel_irqchip_update_msi_route(irqfd->virq, msg,
+ &proxy->pci_dev);
if (ret < 0) {
return ret;
}
- kvm_irqchip_commit_routes(kvm_state);
+ accel_irqchip_commit_routes();
}
}
@@ -1229,7 +1230,7 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, int nvqs, bool assign)
VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
int r, n;
bool with_irqfd = msix_enabled(&proxy->pci_dev) &&
- kvm_msi_via_irqfd_enabled();
+ accel_msi_via_irqfd_enabled();
nvqs = MIN(nvqs, VIRTIO_QUEUE_MAX);
@@ -1433,7 +1434,7 @@ static void virtio_pci_set_vector(VirtIODevice *vdev,
uint16_t new_vector)
{
bool kvm_irqfd = (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) &&
- msix_enabled(&proxy->pci_dev) && kvm_msi_via_irqfd_enabled();
+ msix_enabled(&proxy->pci_dev) && accel_msi_via_irqfd_enabled();
if (new_vector == old_vector) {
return;
diff --git a/include/system/accel-irq.h b/include/system/accel-irq.h
new file mode 100644
index 0000000000..671fb7dfdb
--- /dev/null
+++ b/include/system/accel-irq.h
@@ -0,0 +1,37 @@
+/*
+ * Accelerated irqchip abstraction
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef SYSTEM_ACCEL_IRQ_H
+#define SYSTEM_ACCEL_IRQ_H
+#include "hw/pci/msi.h"
+#include "qemu/osdep.h"
+#include "system/kvm.h"
+#include "system/mshv.h"
+
+static inline bool accel_msi_via_irqfd_enabled(void)
+{
+ return mshv_msi_via_irqfd_enabled() || kvm_msi_via_irqfd_enabled();
+}
+
+static inline bool accel_irqchip_is_split(void)
+{
+ return mshv_msi_via_irqfd_enabled() || kvm_irqchip_is_split();
+}
+
+int accel_irqchip_add_msi_route(KVMRouteChange *c, int vector, PCIDevice *dev);
+int accel_irqchip_update_msi_route(int vector, MSIMessage msg, PCIDevice *dev);
+void accel_irqchip_commit_route_changes(KVMRouteChange *c);
+void accel_irqchip_commit_routes(void);
+void accel_irqchip_release_virq(int virq);
+int accel_irqchip_add_irqfd_notifier_gsi(EventNotifier *n, EventNotifier *rn,
+ int virq);
+int accel_irqchip_remove_irqfd_notifier_gsi(EventNotifier *n, int virq);
+#endif
diff --git a/include/system/mshv.h b/include/system/mshv.h
index a971982b52..a358691428 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -22,4 +22,25 @@
#define CONFIG_MSHV_IS_POSSIBLE
#endif
+#ifdef CONFIG_MSHV_IS_POSSIBLE
+extern bool mshv_allowed;
+#define mshv_enabled() (mshv_allowed)
+#else /* CONFIG_MSHV_IS_POSSIBLE */
+#define mshv_enabled() false
+#endif
+#ifdef MSHV_USE_KERNEL_GSI_IRQFD
+#define mshv_msi_via_irqfd_enabled() mshv_enabled()
+#else
+#define mshv_msi_via_irqfd_enabled() false
+#endif
+
+/* interrupt */
+int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
+int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
+void mshv_irqchip_commit_routes(void);
+void mshv_irqchip_release_virq(int virq);
+int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *n,
+ const EventNotifier *rn, int virq);
+int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *n, int virq);
+
#endif
--
2.34.1
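
The accel-irq.h header above declares the accel_irqchip_* wrappers that replace the direct kvm_irqchip_* calls in virtio-pci.c. As a rough illustration of the dispatch such a wrapper might perform, here is a minimal standalone sketch. It is not the implementation from this series: the stub_* backends, the sketch_* function name, the MSHV-before-KVM ordering and the -ENOSYS fallback are all assumptions made for the example.

```c
/*
 * Standalone sketch of an accelerator dispatch wrapper.
 * All names below are hypothetical stand-ins, not QEMU symbols.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for mshv_msi_via_irqfd_enabled() and kvm_enabled(). */
static bool stub_mshv_irqfd_enabled;
static bool stub_kvm_enabled;

/* Stub backends: return a fake virq so the chosen path is observable. */
static int stub_mshv_add_msi_route(int vector)
{
    return 1000 + vector;
}

static int stub_kvm_add_msi_route(int vector)
{
    return 2000 + vector;
}

/*
 * Try the MSHV backend first, then KVM (assumed ordering).
 * Returns -ENOSYS when no accelerator with irqfd support is active.
 */
static int sketch_accel_add_msi_route(int vector)
{
    assert(vector >= 0);

    if (stub_mshv_irqfd_enabled) {
        return stub_mshv_add_msi_route(vector);
    }
    if (stub_kvm_enabled) {
        return stub_kvm_add_msi_route(vector);
    }
    return -ENOSYS;
}
```

With a wrapper of this shape, callers such as kvm_virtio_pci_vq_vector_use() no longer need to pass kvm_state or know which accelerator is in use.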
* [PATCH v3 05/26] include/hw/hyperv: Add MSHV ABI header definitions
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (3 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 04/26] hw/intc: Generalize APIC helper names from kvm_* to accel_* Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-27 10:44 ` Daniel P. Berrangé
2025-08-07 14:39 ` [PATCH v3 06/26] linux-headers/linux: Add mshv.h headers Magnus Kulke
` (22 subsequent siblings)
27 siblings, 1 reply; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Introduce headers for the Microsoft Hypervisor (MSHV) userspace ABI,
including IOCTLs and structures used to interface with the hypervisor.
These definitions are based on the upstream Linux MSHV interface and
will be used by the MSHV accelerator backend in later patches.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
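
Because these headers describe a fixed userspace ABI, layout assumptions can be checked at compile time. The following is a minimal sketch, assuming the message structures are copied verbatim from hvgdk_mini.h as added by this patch. The static assertions and the sketch_payload_size_valid() helper are hypothetical additions for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_MESSAGE_SIZE                 (256)
#define HV_MESSAGE_PAYLOAD_BYTE_COUNT   (240)
#define HV_MESSAGE_PAYLOAD_QWORD_COUNT  (30)

/* Definitions copied from hvgdk_mini.h in this patch. */
union hv_port_id {
    uint32_t asuint32_t;
    struct {
        uint32_t id:24;
        uint32_t reserved:8;
    };
};

union hv_message_flags {
    uint8_t asu8;
    struct {
        uint8_t msg_pending:1;
        uint8_t reserved:7;
    };
};

struct hv_message_header {
    uint32_t message_type;
    uint8_t payload_size;
    union hv_message_flags message_flags;
    uint8_t reserved[2];
    union {
        uint64_t sender;
        union hv_port_id port;
    };
};

struct hv_message {
    struct hv_message_header header;
    union {
        uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
    } u;
};

/* Compile-time layout checks: 16-byte header plus 240-byte payload. */
_Static_assert(sizeof(struct hv_message_header) == 16,
               "hv_message_header must be 16 bytes");
_Static_assert(offsetof(struct hv_message, u) == 16,
               "payload must start right after the header");
_Static_assert(HV_MESSAGE_PAYLOAD_QWORD_COUNT * sizeof(uint64_t) ==
               HV_MESSAGE_PAYLOAD_BYTE_COUNT,
               "qword and byte payload counts must agree");
_Static_assert(sizeof(struct hv_message) == HV_MESSAGE_SIZE,
               "hv_message must be HV_MESSAGE_SIZE bytes");

/* Hypothetical helper: reject headers whose payload_size is out of range. */
static int sketch_payload_size_valid(const struct hv_message *msg)
{
    return msg->header.payload_size <= HV_MESSAGE_PAYLOAD_BYTE_COUNT;
}
```

Checks like these fail the build if a later edit to the header changes padding or field order.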
include/hw/hyperv/hvgdk.h | 19 +
include/hw/hyperv/hvgdk_mini.h | 864 ++++++++++++++++++++++++++++++++
include/hw/hyperv/hvhdk.h | 164 ++++++
include/hw/hyperv/hvhdk_mini.h | 105 ++++
| 2 +-
5 files changed, 1153 insertions(+), 1 deletion(-)
create mode 100644 include/hw/hyperv/hvgdk.h
create mode 100644 include/hw/hyperv/hvgdk_mini.h
create mode 100644 include/hw/hyperv/hvhdk.h
create mode 100644 include/hw/hyperv/hvhdk_mini.h
diff --git a/include/hw/hyperv/hvgdk.h b/include/hw/hyperv/hvgdk.h
new file mode 100644
index 0000000000..d37c2b188d
--- /dev/null
+++ b/include/hw/hyperv/hvgdk.h
@@ -0,0 +1,19 @@
+/*
+ * Type definitions for the mshv guest interface.
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef _HVGDK_H
+#define _HVGDK_H
+
+#define HVGDK_H_VERSION (25125)
+
+enum hv_unimplemented_msr_action {
+ HV_UNIMPLEMENTED_MSR_ACTION_FAULT = 0,
+ HV_UNIMPLEMENTED_MSR_ACTION_IGNORE_WRITE_READ_ZERO = 1,
+ HV_UNIMPLEMENTED_MSR_ACTION_COUNT = 2,
+};
+
+#endif /* _HVGDK_H */
diff --git a/include/hw/hyperv/hvgdk_mini.h b/include/hw/hyperv/hvgdk_mini.h
new file mode 100644
index 0000000000..83f44fd5fa
--- /dev/null
+++ b/include/hw/hyperv/hvgdk_mini.h
@@ -0,0 +1,864 @@
+/*
+ * Userspace interfaces for /dev/mshv* devices and derived fds
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_HYPERV_LINUX_MSHV_H
+#define HW_HYPERV_LINUX_MSHV_H
+
+#define MSHV_IOCTL 0xB8
+
+typedef enum hv_register_name {
+ /* Pending Interruption Register */
+ HV_REGISTER_PENDING_INTERRUPTION = 0x00010002,
+
+ /* X64 User-Mode Registers */
+ HV_X64_REGISTER_RAX = 0x00020000,
+ HV_X64_REGISTER_RCX = 0x00020001,
+ HV_X64_REGISTER_RDX = 0x00020002,
+ HV_X64_REGISTER_RBX = 0x00020003,
+ HV_X64_REGISTER_RSP = 0x00020004,
+ HV_X64_REGISTER_RBP = 0x00020005,
+ HV_X64_REGISTER_RSI = 0x00020006,
+ HV_X64_REGISTER_RDI = 0x00020007,
+ HV_X64_REGISTER_R8 = 0x00020008,
+ HV_X64_REGISTER_R9 = 0x00020009,
+ HV_X64_REGISTER_R10 = 0x0002000A,
+ HV_X64_REGISTER_R11 = 0x0002000B,
+ HV_X64_REGISTER_R12 = 0x0002000C,
+ HV_X64_REGISTER_R13 = 0x0002000D,
+ HV_X64_REGISTER_R14 = 0x0002000E,
+ HV_X64_REGISTER_R15 = 0x0002000F,
+ HV_X64_REGISTER_RIP = 0x00020010,
+ HV_X64_REGISTER_RFLAGS = 0x00020011,
+
+ /* X64 Floating Point and Vector Registers */
+ HV_X64_REGISTER_XMM0 = 0x00030000,
+ HV_X64_REGISTER_XMM1 = 0x00030001,
+ HV_X64_REGISTER_XMM2 = 0x00030002,
+ HV_X64_REGISTER_XMM3 = 0x00030003,
+ HV_X64_REGISTER_XMM4 = 0x00030004,
+ HV_X64_REGISTER_XMM5 = 0x00030005,
+ HV_X64_REGISTER_XMM6 = 0x00030006,
+ HV_X64_REGISTER_XMM7 = 0x00030007,
+ HV_X64_REGISTER_XMM8 = 0x00030008,
+ HV_X64_REGISTER_XMM9 = 0x00030009,
+ HV_X64_REGISTER_XMM10 = 0x0003000A,
+ HV_X64_REGISTER_XMM11 = 0x0003000B,
+ HV_X64_REGISTER_XMM12 = 0x0003000C,
+ HV_X64_REGISTER_XMM13 = 0x0003000D,
+ HV_X64_REGISTER_XMM14 = 0x0003000E,
+ HV_X64_REGISTER_XMM15 = 0x0003000F,
+ HV_X64_REGISTER_FP_MMX0 = 0x00030010,
+ HV_X64_REGISTER_FP_MMX1 = 0x00030011,
+ HV_X64_REGISTER_FP_MMX2 = 0x00030012,
+ HV_X64_REGISTER_FP_MMX3 = 0x00030013,
+ HV_X64_REGISTER_FP_MMX4 = 0x00030014,
+ HV_X64_REGISTER_FP_MMX5 = 0x00030015,
+ HV_X64_REGISTER_FP_MMX6 = 0x00030016,
+ HV_X64_REGISTER_FP_MMX7 = 0x00030017,
+ HV_X64_REGISTER_FP_CONTROL_STATUS = 0x00030018,
+ HV_X64_REGISTER_XMM_CONTROL_STATUS = 0x00030019,
+
+ /* X64 Control Registers */
+ HV_X64_REGISTER_CR0 = 0x00040000,
+ HV_X64_REGISTER_CR2 = 0x00040001,
+ HV_X64_REGISTER_CR3 = 0x00040002,
+ HV_X64_REGISTER_CR4 = 0x00040003,
+ HV_X64_REGISTER_CR8 = 0x00040004,
+ HV_X64_REGISTER_XFEM = 0x00040005,
+
+ /* X64 Segment Registers */
+ HV_X64_REGISTER_ES = 0x00060000,
+ HV_X64_REGISTER_CS = 0x00060001,
+ HV_X64_REGISTER_SS = 0x00060002,
+ HV_X64_REGISTER_DS = 0x00060003,
+ HV_X64_REGISTER_FS = 0x00060004,
+ HV_X64_REGISTER_GS = 0x00060005,
+ HV_X64_REGISTER_LDTR = 0x00060006,
+ HV_X64_REGISTER_TR = 0x00060007,
+
+ /* X64 Table Registers */
+ HV_X64_REGISTER_IDTR = 0x00070000,
+ HV_X64_REGISTER_GDTR = 0x00070001,
+
+ /* X64 Virtualized MSRs */
+ HV_X64_REGISTER_TSC = 0x00080000,
+ HV_X64_REGISTER_EFER = 0x00080001,
+ HV_X64_REGISTER_KERNEL_GS_BASE = 0x00080002,
+ HV_X64_REGISTER_APIC_BASE = 0x00080003,
+ HV_X64_REGISTER_PAT = 0x00080004,
+ HV_X64_REGISTER_SYSENTER_CS = 0x00080005,
+ HV_X64_REGISTER_SYSENTER_EIP = 0x00080006,
+ HV_X64_REGISTER_SYSENTER_ESP = 0x00080007,
+ HV_X64_REGISTER_STAR = 0x00080008,
+ HV_X64_REGISTER_LSTAR = 0x00080009,
+ HV_X64_REGISTER_CSTAR = 0x0008000A,
+ HV_X64_REGISTER_SFMASK = 0x0008000B,
+ HV_X64_REGISTER_INITIAL_APIC_ID = 0x0008000C,
+
+ /* X64 Cache control MSRs */
+ HV_X64_REGISTER_MSR_MTRR_CAP = 0x0008000D,
+ HV_X64_REGISTER_MSR_MTRR_DEF_TYPE = 0x0008000E,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0 = 0x00080010,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE1 = 0x00080011,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE2 = 0x00080012,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE3 = 0x00080013,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE4 = 0x00080014,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE5 = 0x00080015,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE6 = 0x00080016,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE7 = 0x00080017,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE8 = 0x00080018,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASE9 = 0x00080019,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASEA = 0x0008001A,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASEB = 0x0008001B,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASEC = 0x0008001C,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASED = 0x0008001D,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASEE = 0x0008001E,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_BASEF = 0x0008001F,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0 = 0x00080040,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK1 = 0x00080041,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK2 = 0x00080042,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK3 = 0x00080043,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK4 = 0x00080044,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK5 = 0x00080045,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK6 = 0x00080046,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK7 = 0x00080047,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK8 = 0x00080048,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASK9 = 0x00080049,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASKA = 0x0008004A,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASKB = 0x0008004B,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASKC = 0x0008004C,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASKD = 0x0008004D,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASKE = 0x0008004E,
+ HV_X64_REGISTER_MSR_MTRR_PHYS_MASKF = 0x0008004F,
+ HV_X64_REGISTER_MSR_MTRR_FIX64K00000 = 0x00080070,
+ HV_X64_REGISTER_MSR_MTRR_FIX16K80000 = 0x00080071,
+ HV_X64_REGISTER_MSR_MTRR_FIX16KA0000 = 0x00080072,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KC0000 = 0x00080073,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KC8000 = 0x00080074,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KD0000 = 0x00080075,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KD8000 = 0x00080076,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KE0000 = 0x00080077,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KE8000 = 0x00080078,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KF0000 = 0x00080079,
+ HV_X64_REGISTER_MSR_MTRR_FIX4KF8000 = 0x0008007A,
+
+ HV_X64_REGISTER_TSC_AUX = 0x0008007B,
+ HV_X64_REGISTER_BNDCFGS = 0x0008007C,
+ HV_X64_REGISTER_DEBUG_CTL = 0x0008007D,
+
+ /* Available */
+
+ HV_X64_REGISTER_SPEC_CTRL = 0x00080084,
+ HV_X64_REGISTER_TSC_ADJUST = 0x00080096,
+
+ /* Other MSRs */
+ HV_X64_REGISTER_MSR_IA32_MISC_ENABLE = 0x000800A0,
+
+ /* Misc */
+ HV_REGISTER_GUEST_OS_ID = 0x00090002,
+ HV_REGISTER_REFERENCE_TSC = 0x00090017,
+
+ /* Hypervisor-defined Registers (Synic) */
+ HV_REGISTER_SINT0 = 0x000A0000,
+ HV_REGISTER_SINT1 = 0x000A0001,
+ HV_REGISTER_SINT2 = 0x000A0002,
+ HV_REGISTER_SINT3 = 0x000A0003,
+ HV_REGISTER_SINT4 = 0x000A0004,
+ HV_REGISTER_SINT5 = 0x000A0005,
+ HV_REGISTER_SINT6 = 0x000A0006,
+ HV_REGISTER_SINT7 = 0x000A0007,
+ HV_REGISTER_SINT8 = 0x000A0008,
+ HV_REGISTER_SINT9 = 0x000A0009,
+ HV_REGISTER_SINT10 = 0x000A000A,
+ HV_REGISTER_SINT11 = 0x000A000B,
+ HV_REGISTER_SINT12 = 0x000A000C,
+ HV_REGISTER_SINT13 = 0x000A000D,
+ HV_REGISTER_SINT14 = 0x000A000E,
+ HV_REGISTER_SINT15 = 0x000A000F,
+ HV_REGISTER_SCONTROL = 0x000A0010,
+ HV_REGISTER_SVERSION = 0x000A0011,
+ HV_REGISTER_SIEFP = 0x000A0012,
+ HV_REGISTER_SIMP = 0x000A0013,
+ HV_REGISTER_EOM = 0x000A0014,
+ HV_REGISTER_SIRBP = 0x000A0015,
+} hv_register_name;
+
+enum hv_intercept_type {
+ HV_INTERCEPT_TYPE_X64_IO_PORT = 0x00000000,
+ HV_INTERCEPT_TYPE_X64_MSR = 0x00000001,
+ HV_INTERCEPT_TYPE_X64_CPUID = 0x00000002,
+ HV_INTERCEPT_TYPE_EXCEPTION = 0x00000003,
+
+ /* Used to be HV_INTERCEPT_TYPE_REGISTER */
+ HV_INTERCEPT_TYPE_RESERVED0 = 0x00000004,
+ HV_INTERCEPT_TYPE_MMIO = 0x00000005,
+ HV_INTERCEPT_TYPE_X64_GLOBAL_CPUID = 0x00000006,
+ HV_INTERCEPT_TYPE_X64_APIC_SMI = 0x00000007,
+ HV_INTERCEPT_TYPE_HYPERCALL = 0x00000008,
+
+ HV_INTERCEPT_TYPE_X64_APIC_INIT_SIPI = 0x00000009,
+ HV_INTERCEPT_MC_UPDATE_PATCH_LEVEL_MSR_READ = 0x0000000A,
+
+ HV_INTERCEPT_TYPE_X64_APIC_WRITE = 0x0000000B,
+ HV_INTERCEPT_TYPE_X64_MSR_INDEX = 0x0000000C,
+ HV_INTERCEPT_TYPE_MAX,
+ HV_INTERCEPT_TYPE_INVALID = 0xFFFFFFFF,
+};
+
+struct hv_u128 {
+ uint64_t low_part;
+ uint64_t high_part;
+};
+
+union hv_x64_xmm_control_status_register {
+ struct hv_u128 as_uint128;
+ struct {
+ union {
+ /* long mode */
+ uint64_t last_fp_rdp;
+ /* 32 bit mode */
+ struct {
+ uint32_t last_fp_dp;
+ uint16_t last_fp_ds;
+ uint16_t padding;
+ };
+ };
+ uint32_t xmm_status_control;
+ uint32_t xmm_status_control_mask;
+ };
+};
+
+union hv_x64_fp_register {
+ struct hv_u128 as_uint128;
+ struct {
+ uint64_t mantissa;
+ uint64_t biased_exponent:15;
+ uint64_t sign:1;
+ uint64_t reserved:48;
+ };
+};
+
+union hv_x64_pending_exception_event {
+ uint64_t as_uint64[2];
+ struct {
+ uint32_t event_pending:1;
+ uint32_t event_type:3;
+ uint32_t reserved0:4;
+ uint32_t deliver_error_code:1;
+ uint32_t reserved1:7;
+ uint32_t vector:16;
+ uint32_t error_code;
+ uint64_t exception_parameter;
+ };
+};
+
+union hv_x64_pending_virtualization_fault_event {
+ uint64_t as_uint64[2];
+ struct {
+ uint32_t event_pending:1;
+ uint32_t event_type:3;
+ uint32_t reserved0:4;
+ uint32_t reserved1:8;
+ uint32_t parameter0:16;
+ uint32_t code;
+ uint64_t parameter1;
+ };
+};
+
+union hv_x64_pending_interruption_register {
+ uint64_t as_uint64;
+ struct {
+ uint32_t interruption_pending:1;
+ uint32_t interruption_type:3;
+ uint32_t deliver_error_code:1;
+ uint32_t instruction_length:4;
+ uint32_t nested_event:1;
+ uint32_t reserved:6;
+ uint32_t interruption_vector:16;
+ uint32_t error_code;
+ };
+};
+
+union hv_x64_register_sev_control {
+ uint64_t as_uint64;
+ struct {
+ uint64_t enable_encrypted_state:1;
+ uint64_t reserved_z:11;
+ uint64_t vmsa_gpa_page_number:52;
+ };
+};
+
+union hv_x64_msr_npiep_config_contents {
+ uint64_t as_uint64;
+ struct {
+ /*
+ * These bits enable instruction execution prevention for
+ * specific instructions.
+ */
+ uint64_t prevents_gdt:1;
+ uint64_t prevents_idt:1;
+ uint64_t prevents_ldt:1;
+ uint64_t prevents_tr:1;
+
+ /* The reserved bits must always be 0. */
+ uint64_t reserved:60;
+ };
+};
+
+typedef struct hv_x64_segment_register {
+ uint64_t base;
+ uint32_t limit;
+ uint16_t selector;
+ union {
+ struct {
+ uint16_t segment_type:4;
+ uint16_t non_system_segment:1;
+ uint16_t descriptor_privilege_level:2;
+ uint16_t present:1;
+ uint16_t reserved:4;
+ uint16_t available:1;
+ uint16_t _long:1;
+ uint16_t _default:1;
+ uint16_t granularity:1;
+ };
+ uint16_t attributes;
+ };
+} hv_x64_segment_register;
+
+typedef struct hv_x64_table_register {
+ uint16_t pad[3];
+ uint16_t limit;
+ uint64_t base;
+} hv_x64_table_register;
+
+union hv_x64_fp_control_status_register {
+ struct hv_u128 as_uint128;
+ struct {
+ uint16_t fp_control;
+ uint16_t fp_status;
+ uint8_t fp_tag;
+ uint8_t reserved;
+ uint16_t last_fp_op;
+ union {
+ /* long mode */
+ uint64_t last_fp_rip;
+ /* 32 bit mode */
+ struct {
+ uint32_t last_fp_eip;
+ uint16_t last_fp_cs;
+ uint16_t padding;
+ };
+ };
+ };
+};
+
+/* General Hypervisor Register Content Definitions */
+
+union hv_explicit_suspend_register {
+ uint64_t as_uint64;
+ struct {
+ uint64_t suspended:1;
+ uint64_t reserved:63;
+ };
+};
+
+union hv_internal_activity_register {
+ uint64_t as_uint64;
+
+ struct {
+ uint64_t startup_suspend:1;
+ uint64_t halt_suspend:1;
+ uint64_t idle_suspend:1;
+ uint64_t rsvd_z:61;
+ };
+};
+
+union hv_x64_interrupt_state_register {
+ uint64_t as_uint64;
+ struct {
+ uint64_t interrupt_shadow:1;
+ uint64_t nmi_masked:1;
+ uint64_t reserved:62;
+ };
+};
+
+union hv_intercept_suspend_register {
+ uint64_t as_uint64;
+ struct {
+ uint64_t suspended:1;
+ uint64_t reserved:63;
+ };
+};
+
+union hv_register_value {
+ struct hv_u128 reg128;
+ uint64_t reg64;
+ uint32_t reg32;
+ uint16_t reg16;
+ uint8_t reg8;
+ union hv_x64_fp_register fp;
+ union hv_x64_fp_control_status_register fp_control_status;
+ union hv_x64_xmm_control_status_register xmm_control_status;
+ struct hv_x64_segment_register segment;
+ struct hv_x64_table_register table;
+ union hv_explicit_suspend_register explicit_suspend;
+ union hv_intercept_suspend_register intercept_suspend;
+ union hv_internal_activity_register internal_activity;
+ union hv_x64_interrupt_state_register interrupt_state;
+ union hv_x64_pending_interruption_register pending_interruption;
+ union hv_x64_msr_npiep_config_contents npiep_config;
+ union hv_x64_pending_exception_event pending_exception_event;
+ union hv_x64_pending_virtualization_fault_event
+ pending_virtualization_fault_event;
+ union hv_x64_register_sev_control sev_control;
+};
+
+typedef struct hv_register_assoc {
+ uint32_t name; /* enum hv_register_name */
+ uint32_t reserved1;
+ uint64_t reserved2;
+ union hv_register_value value;
+} hv_register_assoc;
+
+#define MSHV_VP_MAX_REGISTERS 128
+
+struct mshv_vp_registers {
+ int count; /* at most MSHV_VP_MAX_REGISTERS */
+ struct hv_register_assoc *regs;
+};
+
+union hv_interrupt_control {
+ uint64_t as_uint64;
+ struct {
+ uint32_t interrupt_type; /* enum hv_interrupt_type */
+ uint32_t level_triggered:1;
+ uint32_t logical_dest_mode:1;
+ uint32_t rsvd:30;
+ };
+};
+
+struct hv_input_assert_virtual_interrupt {
+ uint64_t partition_id;
+ union hv_interrupt_control control;
+ uint64_t dest_addr; /* cpu's apic id */
+ uint32_t vector;
+ uint8_t target_vtl;
+ uint8_t rsvd_z0;
+ uint16_t rsvd_z1;
+};
+
+struct hv_register_x64_cpuid_result_parameters {
+ struct {
+ uint32_t eax;
+ uint32_t ecx;
+ uint8_t subleaf_specific;
+ uint8_t always_override;
+ uint16_t padding;
+ } input;
+ struct {
+ uint32_t eax;
+ uint32_t eax_mask;
+ uint32_t ebx;
+ uint32_t ebx_mask;
+ uint32_t ecx;
+ uint32_t ecx_mask;
+ uint32_t edx;
+ uint32_t edx_mask;
+ } result;
+};
+
+struct hv_register_x64_msr_result_parameters {
+ uint32_t msr_index;
+ uint32_t access_type;
+ uint32_t action; /* enum hv_unimplemented_msr_action */
+};
+
+union hv_register_intercept_result_parameters {
+ struct hv_register_x64_cpuid_result_parameters cpuid;
+ struct hv_register_x64_msr_result_parameters msr;
+};
+
+struct mshv_register_intercept_result {
+ uint32_t intercept_type; /* enum hv_intercept_type */
+ union hv_register_intercept_result_parameters parameters;
+};
+
+enum hv_translate_gva_result_code {
+ HV_TRANSLATE_GVA_SUCCESS = 0,
+
+ /* Translation failures. */
+ HV_TRANSLATE_GVA_PAGE_NOT_PRESENT = 1,
+ HV_TRANSLATE_GVA_PRIVILEGE_VIOLATION = 2,
+ HV_TRANSLATE_GVA_INVALIDE_PAGE_TABLE_FLAGS = 3,
+
+ /* GPA access failures. */
+ HV_TRANSLATE_GVA_GPA_UNMAPPED = 4,
+ HV_TRANSLATE_GVA_GPA_NO_READ_ACCESS = 5,
+ HV_TRANSLATE_GVA_GPA_NO_WRITE_ACCESS = 6,
+ HV_TRANSLATE_GVA_GPA_ILLEGAL_OVERLAY_ACCESS = 7,
+
+ /*
+ * Intercept for memory access by either
+ * - a higher VTL
+ * - a nested hypervisor (due to a violation of the nested page table)
+ */
+ HV_TRANSLATE_GVA_INTERCEPT = 8,
+
+ HV_TRANSLATE_GVA_GPA_UNACCEPTED = 9,
+};
+
+union hv_translate_gva_result {
+ uint64_t as_uint64;
+ struct {
+ uint32_t result_code; /* enum hv_translate_gva_result_code */
+ uint32_t cache_type:8;
+ uint32_t overlay_page:1;
+ uint32_t reserved:23;
+ };
+};
+
+typedef struct mshv_translate_gva {
+ uint64_t gva;
+ uint64_t flags;
+ union hv_translate_gva_result *result;
+ uint64_t *gpa;
+} mshv_translate_gva;
+
+/* /dev/mshv */
+#define MSHV_CREATE_PARTITION _IOW(MSHV_IOCTL, 0x00, struct mshv_create_partition)
+#define MSHV_CREATE_VP _IOW(MSHV_IOCTL, 0x01, struct mshv_create_vp)
+
+/* Partition fds created with MSHV_CREATE_PARTITION */
+#define MSHV_INITIALIZE_PARTITION _IO(MSHV_IOCTL, 0x00)
+#define MSHV_SET_GUEST_MEMORY _IOW(MSHV_IOCTL, 0x02, struct mshv_user_mem_region)
+#define MSHV_IRQFD _IOW(MSHV_IOCTL, 0x03, struct mshv_user_irqfd)
+#define MSHV_IOEVENTFD _IOW(MSHV_IOCTL, 0x04, struct mshv_user_ioeventfd)
+#define MSHV_SET_MSI_ROUTING _IOW(MSHV_IOCTL, 0x05, struct mshv_user_irq_table)
+
+/* TODO: replace with ROOT_HVCALL */
+#define MSHV_GET_VP_REGISTERS _IOWR(MSHV_IOCTL, 0xF0, struct mshv_vp_registers)
+#define MSHV_SET_VP_REGISTERS _IOW(MSHV_IOCTL, 0xF1, struct mshv_vp_registers)
+#define MSHV_TRANSLATE_GVA _IOWR(MSHV_IOCTL, 0xF2, struct mshv_translate_gva)
+
+#define MSHV_VP_REGISTER_INTERCEPT_RESULT \
+ _IOW(MSHV_IOCTL, 0xF3, struct mshv_register_intercept_result)
+
+/*
+ ********************************
+ * VP APIs for child partitions *
+ ********************************
+ */
+
+struct hv_local_interrupt_controller_state {
+ /* HV_X64_INTERRUPT_CONTROLLER_STATE */
+ uint32_t apic_id;
+ uint32_t apic_version;
+ uint32_t apic_ldr;
+ uint32_t apic_dfr;
+ uint32_t apic_spurious;
+ uint32_t apic_isr[8];
+ uint32_t apic_tmr[8];
+ uint32_t apic_irr[8];
+ uint32_t apic_esr;
+ uint32_t apic_icr_high;
+ uint32_t apic_icr_low;
+ uint32_t apic_lvt_timer;
+ uint32_t apic_lvt_thermal;
+ uint32_t apic_lvt_perfmon;
+ uint32_t apic_lvt_lint0;
+ uint32_t apic_lvt_lint1;
+ uint32_t apic_lvt_error;
+ uint32_t apic_lvt_cmci;
+ uint32_t apic_error_status;
+ uint32_t apic_initial_count;
+ uint32_t apic_counter_value;
+ uint32_t apic_divide_configuration;
+ uint32_t apic_remote_read;
+};
+
+/* Generic hypercall */
+#define MSHV_ROOT_HVCALL _IOWR(MSHV_IOCTL, 0x07, struct mshv_root_hvcall)
+
+/* From hvgdk_mini.h */
+
+#define HV_X64_MSR_GUEST_OS_ID 0x40000000
+#define HV_X64_MSR_SINT0 0x40000090
+#define HV_X64_MSR_SINT1 0x40000091
+#define HV_X64_MSR_SINT2 0x40000092
+#define HV_X64_MSR_SINT3 0x40000093
+#define HV_X64_MSR_SINT4 0x40000094
+#define HV_X64_MSR_SINT5 0x40000095
+#define HV_X64_MSR_SINT6 0x40000096
+#define HV_X64_MSR_SINT7 0x40000097
+#define HV_X64_MSR_SINT8 0x40000098
+#define HV_X64_MSR_SINT9 0x40000099
+#define HV_X64_MSR_SINT10 0x4000009A
+#define HV_X64_MSR_SINT11 0x4000009B
+#define HV_X64_MSR_SINT12 0x4000009C
+#define HV_X64_MSR_SINT13 0x4000009D
+#define HV_X64_MSR_SINT14 0x4000009E
+#define HV_X64_MSR_SINT15 0x4000009F
+#define HV_X64_MSR_SCONTROL 0x40000080
+#define HV_X64_MSR_SIEFP 0x40000082
+#define HV_X64_MSR_SIMP 0x40000083
+#define HV_X64_MSR_REFERENCE_TSC 0x40000021
+#define HV_X64_MSR_EOM 0x40000084
+
+/* Define port identifier type. */
+union hv_port_id {
+ uint32_t asuint32_t;
+ struct {
+ uint32_t id:24;
+ uint32_t reserved:8;
+ };
+};
+
+#define HV_MESSAGE_SIZE (256)
+#define HV_MESSAGE_PAYLOAD_BYTE_COUNT (240)
+#define HV_MESSAGE_PAYLOAD_QWORD_COUNT (30)
+
+/* Define hypervisor message types. */
+enum hv_message_type {
+ HVMSG_NONE = 0x00000000,
+
+ /* Memory access messages. */
+ HVMSG_UNMAPPED_GPA = 0x80000000,
+ HVMSG_GPA_INTERCEPT = 0x80000001,
+ HVMSG_UNACCEPTED_GPA = 0x80000003,
+ HVMSG_GPA_ATTRIBUTE_INTERCEPT = 0x80000004,
+
+ /* Timer notification messages. */
+ HVMSG_TIMER_EXPIRED = 0x80000010,
+
+ /* Error messages. */
+ HVMSG_INVALID_VP_REGISTER_VALUE = 0x80000020,
+ HVMSG_UNRECOVERABLE_EXCEPTION = 0x80000021,
+ HVMSG_UNSUPPORTED_FEATURE = 0x80000022,
+
+ /*
+ * Opaque intercept message. The original intercept message is only
+ * accessible from the mapped intercept message page.
+ */
+ HVMSG_OPAQUE_INTERCEPT = 0x8000003F,
+
+ /* Trace buffer complete messages. */
+ HVMSG_EVENTLOG_BUFFERCOMPLETE = 0x80000040,
+
+ /* Hypercall intercept */
+ HVMSG_HYPERCALL_INTERCEPT = 0x80000050,
+
+ /* SynIC intercepts */
+ HVMSG_SYNIC_EVENT_INTERCEPT = 0x80000060,
+ HVMSG_SYNIC_SINT_INTERCEPT = 0x80000061,
+ HVMSG_SYNIC_SINT_DELIVERABLE = 0x80000062,
+
+ /* Async call completion intercept */
+ HVMSG_ASYNC_CALL_COMPLETION = 0x80000070,
+
+ /* Root scheduler messages */
+ HVMSG_SCHEDULER_VP_SIGNAL_BITSET = 0x80000100,
+ HVMSG_SCHEDULER_VP_SIGNAL_PAIR = 0x80000101,
+
+ /* Platform-specific processor intercept messages. */
+ HVMSG_X64_IO_PORT_INTERCEPT = 0x80010000,
+ HVMSG_X64_MSR_INTERCEPT = 0x80010001,
+ HVMSG_X64_CPUID_INTERCEPT = 0x80010002,
+ HVMSG_X64_EXCEPTION_INTERCEPT = 0x80010003,
+ HVMSG_X64_APIC_EOI = 0x80010004,
+ HVMSG_X64_LEGACY_FP_ERROR = 0x80010005,
+ HVMSG_X64_IOMMU_PRQ = 0x80010006,
+ HVMSG_X64_HALT = 0x80010007,
+ HVMSG_X64_INTERRUPTION_DELIVERABLE = 0x80010008,
+ HVMSG_X64_SIPI_INTERCEPT = 0x80010009,
+ HVMSG_X64_SEV_VMGEXIT_INTERCEPT = 0x80010013,
+};
+
+union hv_x64_vp_execution_state {
+ uint16_t as_uint16;
+ struct {
+ uint16_t cpl:2;
+ uint16_t cr0_pe:1;
+ uint16_t cr0_am:1;
+ uint16_t efer_lma:1;
+ uint16_t debug_active:1;
+ uint16_t interruption_pending:1;
+ uint16_t vtl:4;
+ uint16_t enclave_mode:1;
+ uint16_t interrupt_shadow:1;
+ uint16_t virtualization_fault_active:1;
+ uint16_t reserved:2;
+ };
+};
+
+/* From openvmm::hvdef */
+enum hv_x64_intercept_access_type {
+ HV_X64_INTERCEPT_ACCESS_TYPE_READ = 0,
+ HV_X64_INTERCEPT_ACCESS_TYPE_WRITE = 1,
+ HV_X64_INTERCEPT_ACCESS_TYPE_EXECUTE = 2,
+};
+
+struct hv_x64_intercept_message_header {
+ uint32_t vp_index;
+ uint8_t instruction_length:4;
+ uint8_t cr8:4; /* Only set for exo partitions */
+ uint8_t intercept_access_type;
+ union hv_x64_vp_execution_state execution_state;
+ struct hv_x64_segment_register cs_segment;
+ uint64_t rip;
+ uint64_t rflags;
+};
+
+union hv_x64_io_port_access_info {
+ uint8_t as_uint8;
+ struct {
+ uint8_t access_size:3;
+ uint8_t string_op:1;
+ uint8_t rep_prefix:1;
+ uint8_t reserved:3;
+ };
+};
+
+typedef struct hv_x64_io_port_intercept_message {
+ struct hv_x64_intercept_message_header header;
+ uint16_t port_number;
+ union hv_x64_io_port_access_info access_info;
+ uint8_t instruction_byte_count;
+ uint32_t reserved;
+ uint64_t rax;
+ uint8_t instruction_bytes[16];
+ struct hv_x64_segment_register ds_segment;
+ struct hv_x64_segment_register es_segment;
+ uint64_t rcx;
+ uint64_t rsi;
+ uint64_t rdi;
+} hv_x64_io_port_intercept_message;
+
+union hv_x64_memory_access_info {
+ uint8_t as_uint8;
+ struct {
+ uint8_t gva_valid:1;
+ uint8_t gva_gpa_valid:1;
+ uint8_t hypercall_output_pending:1;
+ uint8_t tlb_locked_no_overlay:1;
+ uint8_t reserved:4;
+ };
+};
+
+struct hv_x64_memory_intercept_message {
+ struct hv_x64_intercept_message_header header;
+ uint32_t cache_type; /* enum hv_cache_type */
+ uint8_t instruction_byte_count;
+ union hv_x64_memory_access_info memory_access_info;
+ uint8_t tpr_priority;
+ uint8_t reserved1;
+ uint64_t guest_virtual_address;
+ uint64_t guest_physical_address;
+ uint8_t instruction_bytes[16];
+};
+
+union hv_message_flags {
+ uint8_t asu8;
+ struct {
+ uint8_t msg_pending:1;
+ uint8_t reserved:7;
+ };
+};
+
+struct hv_message_header {
+ uint32_t message_type;
+ uint8_t payload_size;
+ union hv_message_flags message_flags;
+ uint8_t reserved[2];
+ union {
+ uint64_t sender;
+ union hv_port_id port;
+ };
+};
+
+struct hv_message {
+ struct hv_message_header header;
+ union {
+ uint64_t payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
+ } u;
+};
+
+/* From github.com/rust-vmm/mshv-bindings/src/x86_64/regs.rs */
+
+struct hv_cpuid_entry {
+ uint32_t function;
+ uint32_t index;
+ uint32_t flags;
+ uint32_t eax;
+ uint32_t ebx;
+ uint32_t ecx;
+ uint32_t edx;
+ uint32_t padding[3];
+};
+
+struct hv_cpuid {
+ uint32_t nent;
+ uint32_t padding;
+ struct hv_cpuid_entry entries[0];
+};
+
+#define IA32_MSR_TSC 0x00000010
+#define IA32_MSR_EFER 0xC0000080
+#define IA32_MSR_KERNEL_GS_BASE 0xC0000102
+#define IA32_MSR_APIC_BASE 0x0000001B
+#define IA32_MSR_PAT 0x0277
+#define IA32_MSR_SYSENTER_CS 0x00000174
+#define IA32_MSR_SYSENTER_ESP 0x00000175
+#define IA32_MSR_SYSENTER_EIP 0x00000176
+#define IA32_MSR_STAR 0xC0000081
+#define IA32_MSR_LSTAR 0xC0000082
+#define IA32_MSR_CSTAR 0xC0000083
+#define IA32_MSR_SFMASK 0xC0000084
+
+#define IA32_MSR_MTRR_CAP 0x00FE
+#define IA32_MSR_MTRR_DEF_TYPE 0x02FF
+#define IA32_MSR_MTRR_PHYSBASE0 0x0200
+#define IA32_MSR_MTRR_PHYSMASK0 0x0201
+#define IA32_MSR_MTRR_PHYSBASE1 0x0202
+#define IA32_MSR_MTRR_PHYSMASK1 0x0203
+#define IA32_MSR_MTRR_PHYSBASE2 0x0204
+#define IA32_MSR_MTRR_PHYSMASK2 0x0205
+#define IA32_MSR_MTRR_PHYSBASE3 0x0206
+#define IA32_MSR_MTRR_PHYSMASK3 0x0207
+#define IA32_MSR_MTRR_PHYSBASE4 0x0208
+#define IA32_MSR_MTRR_PHYSMASK4 0x0209
+#define IA32_MSR_MTRR_PHYSBASE5 0x020A
+#define IA32_MSR_MTRR_PHYSMASK5 0x020B
+#define IA32_MSR_MTRR_PHYSBASE6 0x020C
+#define IA32_MSR_MTRR_PHYSMASK6 0x020D
+#define IA32_MSR_MTRR_PHYSBASE7 0x020E
+#define IA32_MSR_MTRR_PHYSMASK7 0x020F
+
+#define IA32_MSR_MTRR_FIX64K_00000 0x0250
+#define IA32_MSR_MTRR_FIX16K_80000 0x0258
+#define IA32_MSR_MTRR_FIX16K_A0000 0x0259
+#define IA32_MSR_MTRR_FIX4K_C0000 0x0268
+#define IA32_MSR_MTRR_FIX4K_C8000 0x0269
+#define IA32_MSR_MTRR_FIX4K_D0000 0x026A
+#define IA32_MSR_MTRR_FIX4K_D8000 0x026B
+#define IA32_MSR_MTRR_FIX4K_E0000 0x026C
+#define IA32_MSR_MTRR_FIX4K_E8000 0x026D
+#define IA32_MSR_MTRR_FIX4K_F0000 0x026E
+#define IA32_MSR_MTRR_FIX4K_F8000 0x026F
+
+#define IA32_MSR_TSC_AUX 0xC0000103
+#define IA32_MSR_BNDCFGS 0x00000d90
+#define IA32_MSR_DEBUG_CTL 0x1D9
+#define IA32_MSR_SPEC_CTRL 0x00000048
+#define IA32_MSR_TSC_ADJUST 0x0000003b
+
+#define IA32_MSR_MISC_ENABLE 0x000001a0
+
+
+#define HV_TRANSLATE_GVA_VALIDATE_READ (0x0001)
+#define HV_TRANSLATE_GVA_VALIDATE_WRITE (0x0002)
+#define HV_TRANSLATE_GVA_VALIDATE_EXECUTE (0x0004)
+
+#endif
diff --git a/include/hw/hyperv/hvhdk.h b/include/hw/hyperv/hvhdk.h
new file mode 100644
index 0000000000..d22cc49742
--- /dev/null
+++ b/include/hw/hyperv/hvhdk.h
@@ -0,0 +1,164 @@
+/*
+ * Type definitions for the mshv host.
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_HYPERV_HVHDK_H
+#define HW_HYPERV_HVHDK_H
+
+#define HV_PARTITION_SYNTHETIC_PROCESSOR_FEATURES_BANKS 1
+
+struct hv_input_set_partition_property {
+ uint64_t partition_id;
+ uint32_t property_code; /* enum hv_partition_property_code */
+ uint32_t padding;
+ uint64_t property_value;
+};
+
+union hv_partition_synthetic_processor_features {
+ uint64_t as_uint64[HV_PARTITION_SYNTHETIC_PROCESSOR_FEATURES_BANKS];
+
+ struct {
+ /*
+ * Report a hypervisor is present. CPUID leaves
+ * 0x40000000 and 0x40000001 are supported.
+ */
+ uint64_t hypervisor_present:1;
+
+ /*
+ * Features associated with HV#1:
+ */
+
+ /* Report support for Hv1 (CPUID leaves 0x40000000 - 0x40000006). */
+ uint64_t hv1:1;
+
+ /*
+ * Access to HV_X64_MSR_VP_RUNTIME.
+ * Corresponds to access_vp_run_time_reg privilege.
+ */
+ uint64_t access_vp_run_time_reg:1;
+
+ /*
+ * Access to HV_X64_MSR_TIME_REF_COUNT.
+ * Corresponds to access_partition_reference_counter privilege.
+ */
+ uint64_t access_partition_reference_counter:1;
+
+ /*
+ * Access to SINT-related registers (HV_X64_MSR_SCONTROL through
+ * HV_X64_MSR_EOM and HV_X64_MSR_SINT0 through HV_X64_MSR_SINT15).
+ * Corresponds to access_synic_regs privilege.
+ */
+ uint64_t access_synic_regs:1;
+
+ /*
+ * Access to synthetic timers and associated MSRs
+ * (HV_X64_MSR_STIMER0_CONFIG through HV_X64_MSR_STIMER3_COUNT).
+ * Corresponds to access_synthetic_timer_regs privilege.
+ */
+ uint64_t access_synthetic_timer_regs:1;
+
+ /*
+ * Access to APIC MSRs (HV_X64_MSR_EOI, HV_X64_MSR_ICR and
+ * HV_X64_MSR_TPR) as well as the VP assist page.
+ * Corresponds to access_intr_ctrl_regs privilege.
+ */
+ uint64_t access_intr_ctrl_regs:1;
+
+ /*
+ * Access to registers associated with hypercalls
+ * (HV_X64_MSR_GUEST_OS_ID and HV_X64_MSR_HYPERCALL).
+ * Corresponds to access_hypercall_msrs privilege.
+ */
+ uint64_t access_hypercall_regs:1;
+
+ /* VP index can be queried. corresponds to access_vp_index privilege. */
+ uint64_t access_vp_index:1;
+
+ /*
+ * Access to the reference TSC. Corresponds to
+ * access_partition_reference_tsc privilege.
+ */
+ uint64_t access_partition_reference_tsc:1;
+
+ /*
+ * Partition has access to the guest idle reg. Corresponds to
+ * access_guest_idle_reg privilege.
+ */
+ uint64_t access_guest_idle_reg:1;
+
+ /*
+ * Partition has access to frequency regs. corresponds to
+ * access_frequency_regs privilege.
+ */
+ uint64_t access_frequency_regs:1;
+
+ uint64_t reserved_z12:1; /* Reserved for access_reenlightenment_controls */
+ uint64_t reserved_z13:1; /* Reserved for access_root_scheduler_reg */
+ uint64_t reserved_z14:1; /* Reserved for access_tsc_invariant_controls */
+
+ /*
+ * Extended GVA ranges for HvCallFlushVirtualAddressList hypercall.
+ * Corresponds to privilege.
+ */
+ uint64_t enable_extended_gva_ranges_for_flush_virtual_address_list:1;
+
+ uint64_t reserved_z16:1; /* Reserved for access_vsm. */
+ uint64_t reserved_z17:1; /* Reserved for access_vp_registers. */
+
+ /* Use fast hypercall output. Corresponds to privilege. */
+ uint64_t fast_hypercall_output:1;
+
+ uint64_t reserved_z19:1; /* Reserved for enable_extended_hypercalls. */
+
+ /*
+ * HvStartVirtualProcessor can be used to start virtual processors.
+ * Corresponds to privilege.
+ */
+ uint64_t start_virtual_processor:1;
+
+ uint64_t reserved_z21:1; /* Reserved for Isolation. */
+
+ /* Synthetic timers in direct mode. */
+ uint64_t direct_synthetic_timers:1;
+
+ uint64_t reserved_z23:1; /* Reserved for synthetic time unhalted timer */
+
+ /* Use extended processor masks. */
+ uint64_t extended_processor_masks:1;
+
+ /*
+ * HvCallFlushVirtualAddressSpace / HvCallFlushVirtualAddressList are
+ * supported.
+ */
+ uint64_t tb_flush_hypercalls:1;
+
+ /* HvCallSendSyntheticClusterIpi is supported. */
+ uint64_t synthetic_cluster_ipi:1;
+
+ /* HvCallNotifyLongSpinWait is supported. */
+ uint64_t notify_long_spin_wait:1;
+
+ /* HvCallQueryNumaDistance is supported. */
+ uint64_t query_numa_distance:1;
+
+ /* HvCallSignalEvent is supported. Corresponds to privilege. */
+ uint64_t signal_events:1;
+
+ /* HvCallRetargetDeviceInterrupt is supported. */
+ uint64_t retarget_device_interrupt:1;
+
+ /* HvCallRestorePartitionTime is supported. */
+ uint64_t restore_time:1;
+
+ /* EnlightenedVmcs nested enlightenment is supported. */
+ uint64_t enlightened_vmcs:1;
+
+ uint64_t reserved:30;
+ };
+};
+
+#endif
diff --git a/include/hw/hyperv/hvhdk_mini.h b/include/hw/hyperv/hvhdk_mini.h
new file mode 100644
index 0000000000..cffd16e0de
--- /dev/null
+++ b/include/hw/hyperv/hvhdk_mini.h
@@ -0,0 +1,105 @@
+/*
+ * Type definitions for the mshv host interface.
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef _HVHDK_MINI_H
+#define _HVHDK_MINI_H
+
+#define HVHVK_MINI_VERSION (25294)
+
+/* Each generic set contains 64 elements */
+#define HV_GENERIC_SET_SHIFT (6)
+#define HV_GENERIC_SET_MASK (63)
+
+#define HVCALL_GET_PARTITION_PROPERTY 0x0044
+#define HVCALL_SET_PARTITION_PROPERTY 0x0045
+#define HVCALL_ASSERT_VIRTUAL_INTERRUPT 0x0094
+
+enum hv_generic_set_format {
+ HV_GENERIC_SET_SPARSE_4K,
+ HV_GENERIC_SET_ALL,
+};
+
+enum hv_partition_property_code {
+ /* Privilege properties */
+ HV_PARTITION_PROPERTY_PRIVILEGE_FLAGS = 0x00010000,
+ HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES = 0x00010001,
+
+ /* Scheduling properties */
+ HV_PARTITION_PROPERTY_SUSPEND = 0x00020000,
+ HV_PARTITION_PROPERTY_CPU_RESERVE = 0x00020001,
+ HV_PARTITION_PROPERTY_CPU_CAP = 0x00020002,
+ HV_PARTITION_PROPERTY_CPU_WEIGHT = 0x00020003,
+ HV_PARTITION_PROPERTY_CPU_GROUP_ID = 0x00020004,
+
+ /* Time properties */
+ HV_PARTITION_PROPERTY_TIME_FREEZE = 0x00030003,
+ HV_PARTITION_PROPERTY_REFERENCE_TIME = 0x00030005,
+
+ /* Debugging properties */
+ HV_PARTITION_PROPERTY_DEBUG_CHANNEL_ID = 0x00040000,
+
+ /* Resource properties */
+ HV_PARTITION_PROPERTY_VIRTUAL_TLB_PAGE_COUNT = 0x00050000,
+ HV_PARTITION_PROPERTY_VSM_CONFIG = 0x00050001,
+ HV_PARTITION_PROPERTY_ZERO_MEMORY_ON_RESET = 0x00050002,
+ HV_PARTITION_PROPERTY_PROCESSORS_PER_SOCKET = 0x00050003,
+ HV_PARTITION_PROPERTY_NESTED_TLB_SIZE = 0x00050004,
+ HV_PARTITION_PROPERTY_GPA_PAGE_ACCESS_TRACKING = 0x00050005,
+ HV_PARTITION_PROPERTY_VSM_PERMISSIONS_DIRTY_SINCE_LAST_QUERY = 0x00050006,
+ HV_PARTITION_PROPERTY_SGX_LAUNCH_CONTROL_CONFIG = 0x00050007,
+ HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL0 = 0x00050008,
+ HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL1 = 0x00050009,
+ HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL2 = 0x0005000a,
+ HV_PARTITION_PROPERTY_DEFAULT_SGX_LAUNCH_CONTROL3 = 0x0005000b,
+ HV_PARTITION_PROPERTY_ISOLATION_STATE = 0x0005000c,
+ HV_PARTITION_PROPERTY_ISOLATION_CONTROL = 0x0005000d,
+ HV_PARTITION_PROPERTY_ALLOCATION_ID = 0x0005000e,
+ HV_PARTITION_PROPERTY_MONITORING_ID = 0x0005000f,
+ HV_PARTITION_PROPERTY_IMPLEMENTED_PHYSICAL_ADDRESS_BITS = 0x00050010,
+ HV_PARTITION_PROPERTY_NON_ARCHITECTURAL_CORE_SHARING = 0x00050011,
+ HV_PARTITION_PROPERTY_HYPERCALL_DOORBELL_PAGE = 0x00050012,
+ HV_PARTITION_PROPERTY_ISOLATION_POLICY = 0x00050014,
+ HV_PARTITION_PROPERTY_UNIMPLEMENTED_MSR_ACTION = 0x00050017,
+ HV_PARTITION_PROPERTY_SEV_VMGEXIT_OFFLOADS = 0x00050022,
+
+ /* Compatibility properties */
+ HV_PARTITION_PROPERTY_PROCESSOR_VENDOR = 0x00060000,
+ HV_PARTITION_PROPERTY_PROCESSOR_FEATURES_DEPRECATED = 0x00060001,
+ HV_PARTITION_PROPERTY_PROCESSOR_XSAVE_FEATURES = 0x00060002,
+ HV_PARTITION_PROPERTY_PROCESSOR_CL_FLUSH_SIZE = 0x00060003,
+ HV_PARTITION_PROPERTY_ENLIGHTENMENT_MODIFICATIONS = 0x00060004,
+ HV_PARTITION_PROPERTY_COMPATIBILITY_VERSION = 0x00060005,
+ HV_PARTITION_PROPERTY_PHYSICAL_ADDRESS_WIDTH = 0x00060006,
+ HV_PARTITION_PROPERTY_XSAVE_STATES = 0x00060007,
+ HV_PARTITION_PROPERTY_MAX_XSAVE_DATA_SIZE = 0x00060008,
+ HV_PARTITION_PROPERTY_PROCESSOR_CLOCK_FREQUENCY = 0x00060009,
+ HV_PARTITION_PROPERTY_PROCESSOR_FEATURES0 = 0x0006000a,
+ HV_PARTITION_PROPERTY_PROCESSOR_FEATURES1 = 0x0006000b,
+
+ /* Guest software properties */
+ HV_PARTITION_PROPERTY_GUEST_OS_ID = 0x00070000,
+
+ /* Nested virtualization properties */
+ HV_PARTITION_PROPERTY_PROCESSOR_VIRTUALIZATION_FEATURES = 0x00080000,
+};
+
+/* HV Map GPA (Guest Physical Address) Flags */
+#define HV_MAP_GPA_PERMISSIONS_NONE 0x0
+#define HV_MAP_GPA_READABLE 0x1
+#define HV_MAP_GPA_WRITABLE 0x2
+#define HV_MAP_GPA_KERNEL_EXECUTABLE 0x4
+#define HV_MAP_GPA_USER_EXECUTABLE 0x8
+#define HV_MAP_GPA_EXECUTABLE 0xC
+#define HV_MAP_GPA_PERMISSIONS_MASK 0xF
+#define HV_MAP_GPA_ADJUSTABLE 0x8000
+#define HV_MAP_GPA_NO_ACCESS 0x10000
+#define HV_MAP_GPA_NOT_CACHED 0x200000
+#define HV_MAP_GPA_LARGE_PAGE 0x80000000
+
+#define HV_PFN_RNG_PAGEBITS 24 /* HV_SPA_PAGE_RANGE_ADDITIONAL_PAGES_BITS */
+
+#endif /* _HVHDK_MINI_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 717c379f9e..828a7809f7 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -195,7 +195,7 @@ rm -rf "$output/linux-headers/linux"
mkdir -p "$output/linux-headers/linux"
for header in const.h stddef.h kvm.h vfio.h vfio_ccw.h vfio_zdev.h vhost.h \
psci.h psp-sev.h userfaultfd.h memfd.h mman.h nvme_ioctl.h \
- vduse.h iommufd.h bits.h; do
+ vduse.h iommufd.h bits.h mshv.h; do
cp "$hdrdir/include/linux/$header" "$output/linux-headers/linux"
done
--
2.34.1
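As a rough illustration of how the two hvhdk definitions above fit
together, the sketch below packs a feature selection into the union and
copies its 64-bit view into the property_value field of the
set-partition-property input. The sketch_* names, the abbreviated
union, and the choice of features are assumptions made for this
example; they are not taken from the series.

```c
#include <stdint.h>

/*
 * Stand-ins mirroring the definitions quoted in this patch so the sketch
 * compiles on its own. Real code would include hw/hyperv/hvhdk.h and
 * hw/hyperv/hvhdk_mini.h instead.
 */
#define SKETCH_PROPERTY_SYNTHETIC_PROC_FEATURES 0x00010001u

struct sketch_input_set_partition_property {
    uint64_t partition_id;
    uint32_t property_code;
    uint32_t padding;
    uint64_t property_value;
};

/* Abbreviated: only the first two feature bits of the union are named. */
union sketch_synthetic_processor_features {
    uint64_t as_uint64[1];
    struct {
        uint64_t hypervisor_present:1;
        uint64_t hv1:1;
        uint64_t remaining:62;
    };
};

/* Build the hypercall input for a hypothetical feature selection. */
struct sketch_input_set_partition_property sketch_build_features_input(void)
{
    union sketch_synthetic_processor_features features = { .as_uint64 = { 0 } };
    struct sketch_input_set_partition_property in = { 0 };

    /* Select features through the bit-field view ... */
    features.hypervisor_present = 1;
    features.hv1 = 1;

    /* ... and hand them over through the as_uint64 view of the same bits. */
    in.property_code = SKETCH_PROPERTY_SYNTHETIC_PROC_FEATURES;
    in.property_value = features.as_uint64[0];
    return in;
}
```

Bit-field layout is implementation-defined in C, so the exact value of
property_value depends on the compiler and target ABI.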
* Re: [PATCH v3 05/26] include/hw/hyperv: Add MSHV ABI header definitions
2025-08-07 14:39 ` [PATCH v3 05/26] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
@ 2025-08-27 10:44 ` Daniel P. Berrangé
0 siblings, 0 replies; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 10:44 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:30PM +0200, Magnus Kulke wrote:
> Introduce headers for the Microsoft Hypervisor (MSHV) userspace ABI,
> including IOCTLs and structures used to interface with the hypervisor.
>
> These definitions are based on the upstream Linux MSHV interface and
> will be used by the MSHV accelerator backend in later patches.
>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> include/hw/hyperv/hvgdk.h | 19 +
> include/hw/hyperv/hvgdk_mini.h | 864 ++++++++++++++++++++++++++++++++
> include/hw/hyperv/hvhdk.h | 164 ++++++
> include/hw/hyperv/hvhdk_mini.h | 105 ++++
> scripts/update-linux-headers.sh | 2 +-
> 5 files changed, 1153 insertions(+), 1 deletion(-)
> create mode 100644 include/hw/hyperv/hvgdk.h
> create mode 100644 include/hw/hyperv/hvgdk_mini.h
> create mode 100644 include/hw/hyperv/hvhdk.h
> create mode 100644 include/hw/hyperv/hvhdk_mini.h
>
> diff --git a/include/hw/hyperv/hvgdk.h b/include/hw/hyperv/hvgdk.h
> new file mode 100644
> index 0000000000..d37c2b188d
> --- /dev/null
> +++ b/include/hw/hyperv/hvgdk.h
> @@ -0,0 +1,19 @@
> +#ifndef _HVGDK_H
> +#define _HVGDK_H
The naming scheme for these #ifndef guards is a bit inconsistent
throughout this patch and, nit-picking, names with a leading _ are
reserved IIRC.
> diff --git a/include/hw/hyperv/hvgdk_mini.h b/include/hw/hyperv/hvgdk_mini.h
> new file mode 100644
> index 0000000000..83f44fd5fa
> --- /dev/null
> +++ b/include/hw/hyperv/hvgdk_mini.h
> +#ifndef HW_HYPERV_LINUX_MSHV_H
> +#define HW_HYPERV_LINUX_MSHV_H
> diff --git a/include/hw/hyperv/hvhdk.h b/include/hw/hyperv/hvhdk.h
> new file mode 100644
> index 0000000000..d22cc49742
> --- /dev/null
> +++ b/include/hw/hyperv/hvhdk.h
> +#ifndef HW_HYPERV_HVHDK_H
> +#define HW_HYPERV_HVHDK_H
> diff --git a/include/hw/hyperv/hvhdk_mini.h b/include/hw/hyperv/hvhdk_mini.h
> new file mode 100644
> index 0000000000..cffd16e0de
> --- /dev/null
> +++ b/include/hw/hyperv/hvhdk_mini.h
> +#ifndef _HVHDK_MINI_H
> +#define _HVHDK_MINI_H
Can we just make all these header guards match the relative path under
'include/', as you did for the hvhdk.h file?
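To make the suggested convention concrete, here is a minimal sketch
that derives a guard name from a header's path relative to 'include/'.
The helper name guard_from_include_path is hypothetical and not part of
QEMU; it only illustrates the mapping that hvhdk.h already follows.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of QEMU): derive an include-guard name
 * from a header path relative to include/. Alphanumeric characters are
 * upper-cased, every other character becomes '_'.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int guard_from_include_path(const char *rel_path, char *out, size_t out_sz)
{
    size_t i;

    if (out_sz == 0) {
        return -1;
    }
    for (i = 0; rel_path[i] != '\0'; i++) {
        unsigned char c = (unsigned char)rel_path[i];

        /* Keep room for this character and the terminating NUL. */
        if (i + 1 >= out_sz) {
            return -1;
        }
        out[i] = isalnum(c) ? (char)toupper(c) : '_';
    }
    out[i] = '\0';
    return 0;
}
```

Under this mapping "hw/hyperv/hvgdk.h" becomes HW_HYPERV_HVGDK_H,
"hw/hyperv/hvgdk_mini.h" becomes HW_HYPERV_HVGDK_MINI_H and
"hw/hyperv/hvhdk_mini.h" becomes HW_HYPERV_HVHDK_MINI_H, matching the
existing HW_HYPERV_HVHDK_H.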
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v3 06/26] linux-headers/linux: Add mshv.h headers
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (4 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 05/26] include/hw/hyperv: Add MSHV ABI header definitions Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 07/26] accel/mshv: Add accelerator skeleton Magnus Kulke
` (21 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
This file has been added to the tree by running `update-linux-headers.sh`
on Linux v6.16.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
linux-headers/linux/mshv.h | 291 +++++++++++++++++++++++++++++++++++++
1 file changed, 291 insertions(+)
create mode 100644 linux-headers/linux/mshv.h
diff --git a/linux-headers/linux/mshv.h b/linux-headers/linux/mshv.h
new file mode 100644
index 0000000000..5bc83db6a3
--- /dev/null
+++ b/linux-headers/linux/mshv.h
@@ -0,0 +1,291 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Userspace interfaces for /dev/mshv* devices and derived fds
+ *
+ * This file is divided into sections containing data structures and IOCTLs for
+ * a particular set of related devices or derived file descriptors.
+ *
+ * The IOCTL definitions are at the end of each section. They are grouped by
+ * device/fd, so that new IOCTLs can easily be added with a monotonically
+ * increasing number.
+ */
+#ifndef _LINUX_MSHV_H
+#define _LINUX_MSHV_H
+
+#include <linux/types.h>
+
+#define MSHV_IOCTL 0xB8
+
+/*
+ *******************************************
+ * Entry point to main VMM APIs: /dev/mshv *
+ *******************************************
+ */
+
+enum {
+ MSHV_PT_BIT_LAPIC,
+ MSHV_PT_BIT_X2APIC,
+ MSHV_PT_BIT_GPA_SUPER_PAGES,
+ MSHV_PT_BIT_COUNT,
+};
+
+#define MSHV_PT_FLAGS_MASK ((1 << MSHV_PT_BIT_COUNT) - 1)
+
+enum {
+ MSHV_PT_ISOLATION_NONE,
+ MSHV_PT_ISOLATION_COUNT,
+};
+
+/**
+ * struct mshv_create_partition - arguments for MSHV_CREATE_PARTITION
+ * @pt_flags: Bitmask of 1 << MSHV_PT_BIT_*
+ * @pt_isolation: MSHV_PT_ISOLATION_*
+ *
+ * Returns a file descriptor to act as a handle to a guest partition.
+ * At this point the partition is not yet initialized in the hypervisor.
+ * Some operations must be done with the partition in this state, e.g. setting
+ * so-called "early" partition properties. The partition can then be
+ * initialized with MSHV_INITIALIZE_PARTITION.
+ */
+struct mshv_create_partition {
+ __u64 pt_flags;
+ __u64 pt_isolation;
+};
+
+/* /dev/mshv */
+#define MSHV_CREATE_PARTITION _IOW(MSHV_IOCTL, 0x00, struct mshv_create_partition)
+
+/*
+ ************************
+ * Child partition APIs *
+ ************************
+ */
+
+struct mshv_create_vp {
+ __u32 vp_index;
+};
+
+enum {
+ MSHV_SET_MEM_BIT_WRITABLE,
+ MSHV_SET_MEM_BIT_EXECUTABLE,
+ MSHV_SET_MEM_BIT_UNMAP,
+ MSHV_SET_MEM_BIT_COUNT
+};
+
+#define MSHV_SET_MEM_FLAGS_MASK ((1 << MSHV_SET_MEM_BIT_COUNT) - 1)
+
+/* The hypervisor's "native" page size */
+#define MSHV_HV_PAGE_SIZE 0x1000
+
+/**
+ * struct mshv_user_mem_region - arguments for MSHV_SET_GUEST_MEMORY
+ * @size: Size of the memory region (bytes). Must be aligned to
+ * MSHV_HV_PAGE_SIZE
+ * @guest_pfn: Base guest page number to map
+ * @userspace_addr: Base address of userspace memory. Must be aligned to
+ * MSHV_HV_PAGE_SIZE
+ * @flags: Bitmask of 1 << MSHV_SET_MEM_BIT_*. If (1 << MSHV_SET_MEM_BIT_UNMAP)
+ * is set, ignore other bits.
+ * @rsvd: MBZ
+ *
+ * Map or unmap a region of userspace memory to Guest Physical Addresses (GPA).
+ * Mappings can't overlap in GPA space or userspace.
+ * To unmap, these fields must match an existing mapping.
+ */
+struct mshv_user_mem_region {
+ __u64 size;
+ __u64 guest_pfn;
+ __u64 userspace_addr;
+ __u8 flags;
+ __u8 rsvd[7];
+};
+
+enum {
+ MSHV_IRQFD_BIT_DEASSIGN,
+ MSHV_IRQFD_BIT_RESAMPLE,
+ MSHV_IRQFD_BIT_COUNT,
+};
+
+#define MSHV_IRQFD_FLAGS_MASK ((1 << MSHV_IRQFD_BIT_COUNT) - 1)
+
+struct mshv_user_irqfd {
+ __s32 fd;
+ __s32 resamplefd;
+ __u32 gsi;
+ __u32 flags;
+};
+
+enum {
+ MSHV_IOEVENTFD_BIT_DATAMATCH,
+ MSHV_IOEVENTFD_BIT_PIO,
+ MSHV_IOEVENTFD_BIT_DEASSIGN,
+ MSHV_IOEVENTFD_BIT_COUNT,
+};
+
+#define MSHV_IOEVENTFD_FLAGS_MASK ((1 << MSHV_IOEVENTFD_BIT_COUNT) - 1)
+
+struct mshv_user_ioeventfd {
+ __u64 datamatch;
+ __u64 addr; /* legal pio/mmio address */
+ __u32 len; /* 1, 2, 4, or 8 bytes */
+ __s32 fd;
+ __u32 flags;
+ __u8 rsvd[4];
+};
+
+struct mshv_user_irq_entry {
+ __u32 gsi;
+ __u32 address_lo;
+ __u32 address_hi;
+ __u32 data;
+};
+
+struct mshv_user_irq_table {
+ __u32 nr;
+ __u32 rsvd; /* MBZ */
+ struct mshv_user_irq_entry entries[];
+};
+
+enum {
+ MSHV_GPAP_ACCESS_TYPE_ACCESSED,
+ MSHV_GPAP_ACCESS_TYPE_DIRTY,
+ MSHV_GPAP_ACCESS_TYPE_COUNT /* Count of enum members */
+};
+
+enum {
+ MSHV_GPAP_ACCESS_OP_NOOP,
+ MSHV_GPAP_ACCESS_OP_CLEAR,
+ MSHV_GPAP_ACCESS_OP_SET,
+ MSHV_GPAP_ACCESS_OP_COUNT /* Count of enum members */
+};
+
+/**
+ * struct mshv_gpap_access_bitmap - arguments for MSHV_GET_GPAP_ACCESS_BITMAP
+ * @access_type: MSHV_GPAP_ACCESS_TYPE_* - The type of access to record in the
+ * bitmap
+ * @access_op: MSHV_GPAP_ACCESS_OP_* - Allows an optional clear or set of all
+ * the access states in the range, after retrieving the current
+ * states.
+ * @rsvd: MBZ
+ * @page_count: Number of pages
+ * @gpap_base: Base gpa page number
+ * @bitmap_ptr: Output buffer for bitmap, at least (page_count + 7) / 8 bytes
+ *
+ * Retrieve a bitmap of either ACCESSED or DIRTY bits for a given range of guest
+ * memory, and optionally clear or set the bits.
+ */
+struct mshv_gpap_access_bitmap {
+ __u8 access_type;
+ __u8 access_op;
+ __u8 rsvd[6];
+ __u64 page_count;
+ __u64 gpap_base;
+ __u64 bitmap_ptr;
+};
+
+/**
+ * struct mshv_root_hvcall - arguments for MSHV_ROOT_HVCALL
+ * @code: Hypercall code (HVCALL_*)
+ * @reps: in: Rep count ('repcount')
+ * out: Reps completed ('repcomp'). MBZ unless rep hvcall
+ * @in_sz: Size of input incl rep data. <= MSHV_HV_PAGE_SIZE
+ * @out_sz: Size of output buffer. <= MSHV_HV_PAGE_SIZE. MBZ if out_ptr is 0
+ * @status: in: MBZ
+ * out: HV_STATUS_* from hypercall
+ * @rsvd: MBZ
+ * @in_ptr: Input data buffer (struct hv_input_*). If used with partition or
+ * vp fd, partition id field is populated by kernel.
+ * @out_ptr: Output data buffer (optional)
+ */
+struct mshv_root_hvcall {
+ __u16 code;
+ __u16 reps;
+ __u16 in_sz;
+ __u16 out_sz;
+ __u16 status;
+ __u8 rsvd[6];
+ __u64 in_ptr;
+ __u64 out_ptr;
+};
+
+/* Partition fds created with MSHV_CREATE_PARTITION */
+#define MSHV_INITIALIZE_PARTITION _IO(MSHV_IOCTL, 0x00)
+#define MSHV_CREATE_VP _IOW(MSHV_IOCTL, 0x01, struct mshv_create_vp)
+#define MSHV_SET_GUEST_MEMORY _IOW(MSHV_IOCTL, 0x02, struct mshv_user_mem_region)
+#define MSHV_IRQFD _IOW(MSHV_IOCTL, 0x03, struct mshv_user_irqfd)
+#define MSHV_IOEVENTFD _IOW(MSHV_IOCTL, 0x04, struct mshv_user_ioeventfd)
+#define MSHV_SET_MSI_ROUTING _IOW(MSHV_IOCTL, 0x05, struct mshv_user_irq_table)
+#define MSHV_GET_GPAP_ACCESS_BITMAP _IOWR(MSHV_IOCTL, 0x06, struct mshv_gpap_access_bitmap)
+/* Generic hypercall */
+#define MSHV_ROOT_HVCALL _IOWR(MSHV_IOCTL, 0x07, struct mshv_root_hvcall)
+
+/*
+ ********************************
+ * VP APIs for child partitions *
+ ********************************
+ */
+
+#define MSHV_RUN_VP_BUF_SZ 256
+
+/*
+ * VP state pages may be mapped to userspace via mmap().
+ * To specify which state page, use MSHV_VP_MMAP_OFFSET_ values multiplied by
+ * the system page size.
+ * e.g.
+ * long page_size = sysconf(_SC_PAGE_SIZE);
+ * void *reg_page = mmap(NULL, MSHV_HV_PAGE_SIZE, PROT_READ|PROT_WRITE,
+ * MAP_SHARED, vp_fd,
+ * MSHV_VP_MMAP_OFFSET_REGISTERS * page_size);
+ */
+enum {
+ MSHV_VP_MMAP_OFFSET_REGISTERS,
+ MSHV_VP_MMAP_OFFSET_INTERCEPT_MESSAGE,
+ MSHV_VP_MMAP_OFFSET_GHCB,
+ MSHV_VP_MMAP_OFFSET_COUNT
+};
+
+/**
+ * struct mshv_run_vp - argument for MSHV_RUN_VP
+ * @msg_buf: On success, the intercept message is copied here. It can be
+ * interpreted using the relevant hypervisor definitions.
+ */
+struct mshv_run_vp {
+ __u8 msg_buf[MSHV_RUN_VP_BUF_SZ];
+};
+
+enum {
+ MSHV_VP_STATE_LAPIC, /* Local interrupt controller state (either arch) */
+ MSHV_VP_STATE_XSAVE, /* XSAVE data in compacted form (x86_64) */
+ MSHV_VP_STATE_SIMP,
+ MSHV_VP_STATE_SIEFP,
+ MSHV_VP_STATE_SYNTHETIC_TIMERS,
+ MSHV_VP_STATE_COUNT,
+};
+
+/**
+ * struct mshv_get_set_vp_state - arguments for MSHV_[GET,SET]_VP_STATE
+ * @type: MSHV_VP_STATE_*
+ * @rsvd: MBZ
+ * @buf_sz: in: 4k page-aligned size of buffer
+ * out: Actual size of data (on EINVAL, check this to see if buffer
+ * was too small)
+ * @buf_ptr: 4k page-aligned data buffer
+ */
+struct mshv_get_set_vp_state {
+ __u8 type;
+ __u8 rsvd[3];
+ __u32 buf_sz;
+ __u64 buf_ptr;
+};
+
+/* VP fds created with MSHV_CREATE_VP */
+#define MSHV_RUN_VP _IOR(MSHV_IOCTL, 0x00, struct mshv_run_vp)
+#define MSHV_GET_VP_STATE _IOWR(MSHV_IOCTL, 0x01, struct mshv_get_set_vp_state)
+#define MSHV_SET_VP_STATE _IOWR(MSHV_IOCTL, 0x02, struct mshv_get_set_vp_state)
+/*
+ * Generic hypercall
+ * Defined above in partition IOCTLs, avoid redefining it here
+ * #define MSHV_ROOT_HVCALL _IOWR(MSHV_IOCTL, 0x07, struct mshv_root_hvcall)
+ */
+
+#endif
--
2.34.1
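The kernel-doc comments in mshv.h above state the alignment rules for
MSHV_SET_GUEST_MEMORY and the buffer size for
MSHV_GET_GPAP_ACCESS_BITMAP. A minimal sketch of both follows. The
sketch_* names are stand-ins that mirror the header so the example
builds without the v6.16 UAPI headers. Requiring a page-aligned guest
physical address is an assumption that follows from guest_pfn being a
page number.

```c
#include <stdint.h>

/* Mirrors MSHV_HV_PAGE_SIZE and the MSHV_SET_MEM_BIT_* enum from mshv.h. */
#define SKETCH_HV_PAGE_SIZE 0x1000ULL

enum {
    SKETCH_SET_MEM_BIT_WRITABLE,
    SKETCH_SET_MEM_BIT_EXECUTABLE,
    SKETCH_SET_MEM_BIT_UNMAP,
};

/* Same layout as struct mshv_user_mem_region, using <stdint.h> types. */
struct sketch_user_mem_region {
    uint64_t size;
    uint64_t guest_pfn;
    uint64_t userspace_addr;
    uint8_t flags;
    uint8_t rsvd[7];
};

/*
 * Fill a map request. Returns -1 if any argument violates the
 * page-alignment rules, 0 otherwise.
 */
int sketch_fill_mem_region(struct sketch_user_mem_region *region,
                           uint64_t gpa, uint64_t userspace_addr,
                           uint64_t size, int writable, int executable)
{
    const uint64_t mask = SKETCH_HV_PAGE_SIZE - 1;

    if ((gpa & mask) || (userspace_addr & mask) || (size & mask)) {
        return -1;
    }

    /* Zero everything first so that rsvd[] satisfies the MBZ rule. */
    *region = (struct sketch_user_mem_region){ 0 };
    region->size = size;
    region->guest_pfn = gpa / SKETCH_HV_PAGE_SIZE;
    region->userspace_addr = userspace_addr;
    if (writable) {
        region->flags |= 1u << SKETCH_SET_MEM_BIT_WRITABLE;
    }
    if (executable) {
        region->flags |= 1u << SKETCH_SET_MEM_BIT_EXECUTABLE;
    }
    return 0;
}

/* Minimum output buffer size for an access bitmap covering page_count pages. */
uint64_t sketch_access_bitmap_bytes(uint64_t page_count)
{
    return (page_count + 7) / 8;
}
```

With the real header, the filled structure would be passed to
ioctl(partition_fd, MSHV_SET_GUEST_MEMORY, &region) on a partition fd
obtained from MSHV_CREATE_PARTITION.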
* [PATCH v3 07/26] accel/mshv: Add accelerator skeleton
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (5 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 06/26] linux-headers/linux: Add mshv.h headers Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 08/26] accel/mshv: Register memory region listeners Magnus Kulke
` (20 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Introduce the initial scaffold for the MSHV (Microsoft Hypervisor)
accelerator backend. This includes the basic directory structure and
stub implementations needed to integrate with QEMU's accelerator
framework.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/meson.build | 1 +
accel/mshv/meson.build | 6 ++
accel/mshv/mshv-all.c | 143 +++++++++++++++++++++++++++++++++++++++++
include/system/mshv.h | 35 ++++++++++
4 files changed, 185 insertions(+)
create mode 100644 accel/mshv/meson.build
create mode 100644 accel/mshv/mshv-all.c
diff --git a/accel/meson.build b/accel/meson.build
index 6349efe682..983dfd0bd5 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -10,6 +10,7 @@ if have_system
subdir('kvm')
subdir('xen')
subdir('stubs')
+ subdir('mshv')
endif
# qtest
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
new file mode 100644
index 0000000000..4c03ac7921
--- /dev/null
+++ b/accel/mshv/meson.build
@@ -0,0 +1,6 @@
+mshv_ss = ss.source_set()
+mshv_ss.add(if_true: files(
+ 'mshv-all.c'
+))
+
+specific_ss.add_all(when: 'CONFIG_MSHV', if_true: mshv_ss)
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
new file mode 100644
index 0000000000..f548b1187b
--- /dev/null
+++ b/accel/mshv/mshv-all.c
@@ -0,0 +1,143 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ * Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ * Jinank Jain <jinankjain@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "hw/boards.h"
+
+#include "hw/hyperv/hvhdk.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "hw/hyperv/hvgdk.h"
+#include "linux/mshv.h"
+
+#include "qemu/accel.h"
+#include "qemu/guest-random.h"
+#include "accel/accel-ops.h"
+#include "accel/accel-cpu-ops.h"
+#include "system/cpus.h"
+#include "system/runstate.h"
+#include "system/accel-blocker.h"
+#include "system/address-spaces.h"
+#include "system/mshv.h"
+#include "system/reset.h"
+#include "trace.h"
+#include <err.h>
+#include <stdint.h>
+#include <sys/ioctl.h>
+
+#define TYPE_MSHV_ACCEL ACCEL_CLASS_NAME("mshv")
+
+DECLARE_INSTANCE_CHECKER(MshvState, MSHV_STATE, TYPE_MSHV_ACCEL)
+
+bool mshv_allowed;
+
+MshvState *mshv_state;
+
+static int mshv_init(AccelState *as, MachineState *ms)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static void mshv_start_vcpu_thread(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static void mshv_cpu_synchronize_post_init(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static void mshv_cpu_synchronize_post_reset(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static void mshv_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static void mshv_cpu_synchronize(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static bool mshv_cpus_are_resettable(void)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+static void mshv_accel_class_init(ObjectClass *oc, const void *data)
+{
+ AccelClass *ac = ACCEL_CLASS(oc);
+
+ ac->name = "MSHV";
+ ac->init_machine = mshv_init;
+ ac->allowed = &mshv_allowed;
+}
+
+static void mshv_accel_instance_init(Object *obj)
+{
+ MshvState *s = MSHV_STATE(obj);
+
+ s->vm = 0;
+}
+
+static const TypeInfo mshv_accel_type = {
+ .name = TYPE_MSHV_ACCEL,
+ .parent = TYPE_ACCEL,
+ .instance_init = mshv_accel_instance_init,
+ .class_init = mshv_accel_class_init,
+ .instance_size = sizeof(MshvState),
+};
+
+static void mshv_accel_ops_class_init(ObjectClass *oc, const void *data)
+{
+ AccelOpsClass *ops = ACCEL_OPS_CLASS(oc);
+
+ ops->create_vcpu_thread = mshv_start_vcpu_thread;
+ ops->synchronize_post_init = mshv_cpu_synchronize_post_init;
+ ops->synchronize_post_reset = mshv_cpu_synchronize_post_reset;
+ ops->synchronize_state = mshv_cpu_synchronize;
+ ops->synchronize_pre_loadvm = mshv_cpu_synchronize_pre_loadvm;
+ ops->cpus_are_resettable = mshv_cpus_are_resettable;
+ ops->handle_interrupt = generic_handle_interrupt;
+}
+
+static const TypeInfo mshv_accel_ops_type = {
+ .name = ACCEL_OPS_NAME("mshv"),
+ .parent = TYPE_ACCEL_OPS,
+ .class_init = mshv_accel_ops_class_init,
+ .abstract = true,
+};
+
+static void mshv_type_init(void)
+{
+ type_register_static(&mshv_accel_type);
+ type_register_static(&mshv_accel_ops_type);
+}
+
+type_init(mshv_type_init);
diff --git a/include/system/mshv.h b/include/system/mshv.h
index a358691428..45808c5c50 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -14,6 +14,15 @@
#ifndef QEMU_MSHV_INT_H
#define QEMU_MSHV_INT_H
+#include "qemu/osdep.h"
+#include "qemu/accel.h"
+#include "hw/hyperv/hyperv-proto.h"
+#include "linux/mshv.h"
+#include "hw/hyperv/hvhdk.h"
+#include "qapi/qapi-types-common.h"
+#include "system/memory.h"
+#include "accel/accel-ops.h"
+
#ifdef COMPILING_PER_TARGET
#ifdef CONFIG_MSHV
#define CONFIG_MSHV_IS_POSSIBLE
@@ -25,6 +34,32 @@
#ifdef CONFIG_MSHV_IS_POSSIBLE
extern bool mshv_allowed;
#define mshv_enabled() (mshv_allowed)
+
+typedef struct MshvMemoryListener {
+ MemoryListener listener;
+ int as_id;
+} MshvMemoryListener;
+
+typedef struct MshvAddressSpace {
+ MshvMemoryListener *ml;
+ AddressSpace *as;
+} MshvAddressSpace;
+
+typedef struct MshvState {
+ AccelState parent_obj;
+ int vm;
+ MshvMemoryListener memory_listener;
+ /* number of listeners */
+ int nr_as;
+ MshvAddressSpace *as;
+} MshvState;
+extern MshvState *mshv_state;
+
+struct AccelCPUState {
+ int cpufd;
+ bool dirty;
+};
+
#else /* CONFIG_MSHV_IS_POSSIBLE */
#define mshv_enabled() false
#endif
--
2.34.1
* [PATCH v3 08/26] accel/mshv: Register memory region listeners
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (6 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 07/26] accel/mshv: Add accelerator skeleton Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 09/26] accel/mshv: Initialize VM partition Magnus Kulke
` (19 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Add memory listener hooks for the MSHV accelerator to track guest
memory regions. This enables the backend to respond to region
additions and removals, and will be used to manage guest memory
mappings inside the hypervisor.
Actually registering physical memory in the hypervisor is still stubbed
out.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/mshv/mem.c | 24 +++++++++++++++
accel/mshv/meson.build | 1 +
accel/mshv/mshv-all.c | 67 ++++++++++++++++++++++++++++++++++++++++--
include/system/mshv.h | 4 +++
4 files changed, 94 insertions(+), 2 deletions(-)
create mode 100644 accel/mshv/mem.c
diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
new file mode 100644
index 0000000000..ad5e62c89c
--- /dev/null
+++ b/accel/mshv/mem.c
@@ -0,0 +1,24 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "system/address-spaces.h"
+#include "system/mshv.h"
+
+void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
+ bool add)
+{
+ error_report("unimplemented");
+ abort();
+}
+
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index 4c03ac7921..8a6beb3fb1 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -1,5 +1,6 @@
mshv_ss = ss.source_set()
mshv_ss.add(if_true: files(
+ 'mem.c',
'mshv-all.c'
))
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index f548b1187b..2094966c8c 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -48,10 +48,73 @@ bool mshv_allowed;
MshvState *mshv_state;
+static void mem_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+ MshvMemoryListener *mml;
+ mml = container_of(listener, MshvMemoryListener, listener);
+ memory_region_ref(section->mr);
+ mshv_set_phys_mem(mml, section, true);
+}
+
+static void mem_region_del(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+ MshvMemoryListener *mml;
+ mml = container_of(listener, MshvMemoryListener, listener);
+ mshv_set_phys_mem(mml, section, false);
+ memory_region_unref(section->mr);
+}
+
+static MemoryListener mshv_memory_listener = {
+ .name = "mshv",
+ .priority = MEMORY_LISTENER_PRIORITY_ACCEL,
+ .region_add = mem_region_add,
+ .region_del = mem_region_del,
+};
+
+static MemoryListener mshv_io_listener = {
+ .name = "mshv", .priority = MEMORY_LISTENER_PRIORITY_DEV_BACKEND,
+ /* MSHV does not support PIO eventfd */
+};
+
+static void register_mshv_memory_listener(MshvState *s, MshvMemoryListener *mml,
+ AddressSpace *as, int as_id,
+ const char *name)
+{
+ int i;
+
+ mml->listener = mshv_memory_listener;
+ mml->listener.name = name;
+ memory_listener_register(&mml->listener, as);
+ for (i = 0; i < s->nr_as; ++i) {
+ if (!s->as[i].as) {
+ s->as[i].as = as;
+ s->as[i].ml = mml;
+ break;
+ }
+ }
+}
+
static int mshv_init(AccelState *as, MachineState *ms)
{
- error_report("unimplemented");
- abort();
+ MshvState *s;
+ s = MSHV_STATE(as);
+
+ accel_blocker_init();
+
+ s->vm = 0;
+
+ s->nr_as = 1;
+ s->as = g_new0(MshvAddressSpace, s->nr_as);
+
+ mshv_state = s;
+
+ register_mshv_memory_listener(s, &s->memory_listener, &address_space_memory,
+ 0, "mshv-memory");
+ memory_listener_register(&mshv_io_listener, &address_space_io);
+
+ return 0;
}
static void mshv_start_vcpu_thread(CPUState *cpu)
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 45808c5c50..c5d33cd990 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -69,6 +69,10 @@ struct AccelCPUState {
#define mshv_msi_via_irqfd_enabled() false
#endif
+/* memory */
+void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
+ bool add);
+
/* interrupt */
int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
--
2.34.1
* [PATCH v3 09/26] accel/mshv: Initialize VM partition
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (7 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 08/26] accel/mshv: Register memory region listeners Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-27 11:15 ` Daniel P. Berrangé
2025-08-07 14:39 ` [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
` (18 subsequent siblings)
27 siblings, 1 reply; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Create the MSHV virtual machine by opening a partition and issuing
the necessary ioctl to initialize it. This sets up the basic VM
structure and initial configuration used by MSHV to manage guest state.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/mshv/irq.c | 397 +++++++++++++++++++++++++++++++++++
accel/mshv/mem.c | 129 +++++++++++-
accel/mshv/meson.build | 1 +
accel/mshv/mshv-all.c | 325 ++++++++++++++++++++++++++++
accel/mshv/trace-events | 23 ++
accel/mshv/trace.h | 14 ++
hw/intc/apic.c | 8 +
include/system/mshv.h | 38 +++-
meson.build | 1 +
target/i386/mshv/meson.build | 1 +
target/i386/mshv/mshv-cpu.c | 71 +++++++
11 files changed, 1001 insertions(+), 7 deletions(-)
create mode 100644 accel/mshv/irq.c
create mode 100644 accel/mshv/trace-events
create mode 100644 accel/mshv/trace.h
create mode 100644 target/i386/mshv/mshv-cpu.c
diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
new file mode 100644
index 0000000000..d528af5ff3
--- /dev/null
+++ b/accel/mshv/irq.c
@@ -0,0 +1,397 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ * Stanislav Kinsburskii <skinsburskii@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "linux/mshv.h"
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "hw/hyperv/hvgdk_mini.h"
+#include "hw/intc/ioapic.h"
+#include "hw/pci/msi.h"
+#include "system/mshv.h"
+#include "trace.h"
+#include <stdint.h>
+#include <sys/ioctl.h>
+
+#define MSHV_IRQFD_RESAMPLE_FLAG (1 << MSHV_IRQFD_BIT_RESAMPLE)
+#define MSHV_IRQFD_BIT_DEASSIGN_FLAG (1 << MSHV_IRQFD_BIT_DEASSIGN)
+
+static MshvMsiControl *msi_control;
+static QemuMutex msi_control_mutex;
+
+void mshv_init_msicontrol(void)
+{
+ qemu_mutex_init(&msi_control_mutex);
+ msi_control = g_new0(MshvMsiControl, 1);
+ msi_control->gsi_routes = g_hash_table_new(g_direct_hash, g_direct_equal);
+ msi_control->updated = false;
+}
+
+static int set_msi_routing(uint32_t gsi, uint64_t addr, uint32_t data)
+{
+ struct mshv_user_irq_entry *entry;
+ uint32_t high_addr = addr >> 32;
+ uint32_t low_addr = addr & 0xFFFFFFFF;
+ GHashTable *gsi_routes;
+
+ trace_mshv_set_msi_routing(gsi, addr, data);
+
+ if (gsi >= MSHV_MAX_MSI_ROUTES) {
+ error_report("gsi >= MSHV_MAX_MSI_ROUTES");
+ return -1;
+ }
+
+ assert(msi_control);
+
+ WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+ gsi_routes = msi_control->gsi_routes;
+ entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
+
+ if (entry
+ && entry->address_hi == high_addr
+ && entry->address_lo == low_addr
+ && entry->data == data)
+ {
+ /* nothing to update */
+ return 0;
+ }
+
+ /* free old entry */
+ g_free(entry);
+
+ /* create new entry */
+ entry = g_new0(struct mshv_user_irq_entry, 1);
+ entry->gsi = gsi;
+ entry->address_hi = high_addr;
+ entry->address_lo = low_addr;
+ entry->data = data;
+
+ g_hash_table_insert(gsi_routes, GINT_TO_POINTER(gsi), entry);
+ msi_control->updated = true;
+ }
+
+ return 0;
+}
+
+static int add_msi_routing(uint64_t addr, uint32_t data)
+{
+ struct mshv_user_irq_entry *route_entry;
+ uint32_t high_addr = addr >> 32;
+ uint32_t low_addr = addr & 0xFFFFFFFF;
+ int gsi;
+ GHashTable *gsi_routes;
+
+ trace_mshv_add_msi_routing(addr, data);
+
+ assert(msi_control);
+
+ WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+ /* find an empty slot */
+ gsi = 0;
+ gsi_routes = msi_control->gsi_routes;
+ while (gsi < MSHV_MAX_MSI_ROUTES) {
+ route_entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
+ if (!route_entry) {
+ break;
+ }
+ gsi++;
+ }
+ if (gsi >= MSHV_MAX_MSI_ROUTES) {
+ error_report("No empty gsi slot available");
+ return -1;
+ }
+
+ /* create new entry */
+ route_entry = g_new0(struct mshv_user_irq_entry, 1);
+ route_entry->gsi = gsi;
+ route_entry->address_hi = high_addr;
+ route_entry->address_lo = low_addr;
+ route_entry->data = data;
+
+ g_hash_table_insert(gsi_routes, GINT_TO_POINTER(gsi), route_entry);
+ msi_control->updated = true;
+ }
+
+ return gsi;
+}
+
+static int commit_msi_routing_table(int vm_fd)
+{
+ guint len;
+ int i, ret;
+ size_t table_size;
+ struct mshv_user_irq_table *table;
+ GHashTableIter iter;
+ gpointer key, value;
+
+ assert(msi_control);
+
+ WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+ if (!msi_control->updated) {
+ /* nothing to update */
+ return 0;
+ }
+
+ /* Calculate the size of the table */
+ len = g_hash_table_size(msi_control->gsi_routes);
+ table_size = sizeof(struct mshv_user_irq_table)
+ + len * sizeof(struct mshv_user_irq_entry);
+ table = g_malloc0(table_size);
+
+ g_hash_table_iter_init(&iter, msi_control->gsi_routes);
+ i = 0;
+ while (g_hash_table_iter_next(&iter, &key, &value)) {
+ struct mshv_user_irq_entry *entry = value;
+ table->entries[i] = *entry;
+ i++;
+ }
+ table->nr = i;
+
+ trace_mshv_commit_msi_routing_table(vm_fd, len);
+
+ ret = ioctl(vm_fd, MSHV_SET_MSI_ROUTING, table);
+ g_free(table);
+ if (ret < 0) {
+ error_report("Failed to commit msi routing table");
+ return -1;
+ }
+ msi_control->updated = false;
+ }
+ return 0;
+}
+
+static int remove_msi_routing(uint32_t gsi)
+{
+ struct mshv_user_irq_entry *route_entry;
+ GHashTable *gsi_routes;
+
+ trace_mshv_remove_msi_routing(gsi);
+
+ if (gsi >= MSHV_MAX_MSI_ROUTES) {
+ error_report("Invalid GSI: %u", gsi);
+ return -1;
+ }
+
+ assert(msi_control);
+
+ WITH_QEMU_LOCK_GUARD(&msi_control_mutex) {
+ gsi_routes = msi_control->gsi_routes;
+ route_entry = g_hash_table_lookup(gsi_routes, GINT_TO_POINTER(gsi));
+ if (route_entry) {
+ g_hash_table_remove(gsi_routes, GINT_TO_POINTER(gsi));
+ g_free(route_entry);
+ msi_control->updated = true;
+ }
+ }
+
+ return 0;
+}
+
+/* Pass an eventfd which is to be used for injecting interrupts from userland */
+static int irqfd(int vm_fd, int fd, int resample_fd, uint32_t gsi,
+ uint32_t flags)
+{
+ int ret;
+ struct mshv_user_irqfd arg = {
+ .fd = fd,
+ .resamplefd = resample_fd,
+ .gsi = gsi,
+ .flags = flags,
+ };
+
+ ret = ioctl(vm_fd, MSHV_IRQFD, &arg);
+ if (ret < 0) {
+ error_report("Failed to set irqfd: gsi=%u, fd=%d", gsi, fd);
+ return -1;
+ }
+ return ret;
+}
+
+static int register_irqfd(int vm_fd, int event_fd, uint32_t gsi)
+{
+ int ret;
+
+ trace_mshv_register_irqfd(vm_fd, event_fd, gsi);
+
+ ret = irqfd(vm_fd, event_fd, 0, gsi, 0);
+ if (ret < 0) {
+ error_report("Failed to register irqfd: gsi=%u", gsi);
+ return -1;
+ }
+ return 0;
+}
+
+static int register_irqfd_with_resample(int vm_fd, int event_fd,
+ int resample_fd, uint32_t gsi)
+{
+ int ret;
+ uint32_t flags = MSHV_IRQFD_RESAMPLE_FLAG;
+
+ ret = irqfd(vm_fd, event_fd, resample_fd, gsi, flags);
+ if (ret < 0) {
+ error_report("Failed to register irqfd with resample: gsi=%u", gsi);
+ return -errno;
+ }
+ return 0;
+}
+
+static int unregister_irqfd(int vm_fd, int event_fd, uint32_t gsi)
+{
+ int ret;
+ uint32_t flags = MSHV_IRQFD_BIT_DEASSIGN_FLAG;
+
+ ret = irqfd(vm_fd, event_fd, 0, gsi, flags);
+ if (ret < 0) {
+ error_report("Failed to unregister irqfd: gsi=%u", gsi);
+ return -errno;
+ }
+ return 0;
+}
+
+static int irqchip_update_irqfd_notifier_gsi(const EventNotifier *event,
+ const EventNotifier *resample,
+ int virq, bool add)
+{
+ int fd = event_notifier_get_fd(event);
+ int rfd = resample ? event_notifier_get_fd(resample) : -1;
+ int vm_fd = mshv_state->vm;
+
+ trace_mshv_irqchip_update_irqfd_notifier_gsi(fd, rfd, virq, add);
+
+ if (!add) {
+ return unregister_irqfd(vm_fd, fd, virq);
+ }
+
+ if (rfd > 0) {
+ return register_irqfd_with_resample(vm_fd, fd, rfd, virq);
+ }
+
+ return register_irqfd(vm_fd, fd, virq);
+}
+
+
+int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev)
+{
+ MSIMessage msg = { 0, 0 };
+ int virq = 0;
+
+ if (pci_available && dev) {
+ msg = pci_get_msi_message(dev, vector);
+ virq = add_msi_routing(msg.address, le32_to_cpu(msg.data));
+ }
+
+ return virq;
+}
+
+void mshv_irqchip_release_virq(int virq)
+{
+ remove_msi_routing(virq);
+}
+
+int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev)
+{
+ int ret;
+
+ ret = set_msi_routing(virq, msg.address, le32_to_cpu(msg.data));
+ if (ret < 0) {
+ error_report("Failed to set msi routing");
+ return -1;
+ }
+
+ return 0;
+}
+
+int mshv_request_interrupt(int vm_fd, uint32_t interrupt_type, uint32_t vector,
+ uint32_t vp_index, bool logical_dest_mode,
+ bool level_triggered)
+{
+ int ret;
+
+ if (vector == 0) {
+ warn_report("Ignoring request for interrupt vector 0");
+ return 0;
+ }
+
+ union hv_interrupt_control control = {
+ .interrupt_type = interrupt_type,
+ .level_triggered = level_triggered,
+ .logical_dest_mode = logical_dest_mode,
+ .rsvd = 0,
+ };
+
+ struct hv_input_assert_virtual_interrupt arg = {0};
+ arg.control = control;
+ arg.dest_addr = (uint64_t)vp_index;
+ arg.vector = vector;
+
+ struct mshv_root_hvcall args = {0};
+ args.code = HVCALL_ASSERT_VIRTUAL_INTERRUPT;
+ args.in_sz = sizeof(arg);
+ args.in_ptr = (uint64_t)&arg;
+
+ ret = mshv_hvcall(vm_fd, &args);
+ if (ret < 0) {
+ error_report("Failed to request interrupt");
+ return -errno;
+ }
+ return 0;
+}
+
+void mshv_irqchip_commit_routes(void)
+{
+ int ret;
+ int vm_fd = mshv_state->vm;
+
+ ret = commit_msi_routing_table(vm_fd);
+ if (ret < 0) {
+ error_report("Failed to commit msi routing table");
+ abort();
+ }
+}
+
+int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *event,
+ const EventNotifier *resample,
+ int virq)
+{
+ return irqchip_update_irqfd_notifier_gsi(event, resample, virq, true);
+}
+
+int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *event,
+ int virq)
+{
+ return irqchip_update_irqfd_notifier_gsi(event, NULL, virq, false);
+}
+
+int mshv_reserve_ioapic_msi_routes(int vm_fd)
+{
+ int ret, gsi;
+
+ /*
+ * Reserve GSI 0-23 for IOAPIC pins, to avoid conflicts of legacy
+ * peripherals with MSI-X devices
+ */
+ for (gsi = 0; gsi < IOAPIC_NUM_PINS; gsi++) {
+ ret = add_msi_routing(0, 0);
+ if (ret < 0) {
+ error_report("Failed to reserve GSI %d", gsi);
+ return -1;
+ }
+ }
+
+ ret = commit_msi_routing_table(vm_fd);
+ if (ret < 0) {
+ error_report("Failed to commit reserved IOAPIC MSI routes");
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index ad5e62c89c..8039f35680 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -12,13 +12,136 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "linux/mshv.h"
#include "system/address-spaces.h"
#include "system/mshv.h"
+#include "exec/memattrs.h"
+#include <sys/ioctl.h>
+#include "trace.h"
+
+static int set_guest_memory(int vm_fd,
+ const struct mshv_user_mem_region *region)
+{
+ int ret;
+
+ ret = ioctl(vm_fd, MSHV_SET_GUEST_MEMORY, region);
+ if (ret < 0) {
+ error_report("failed to set guest memory");
+ return -errno;
+ }
+
+ return 0;
+}
+
+static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool map)
+{
+ struct mshv_user_mem_region region = {0};
+
+ region.guest_pfn = mr->guest_phys_addr >> MSHV_PAGE_SHIFT;
+ region.size = mr->memory_size;
+ region.userspace_addr = mr->userspace_addr;
+
+ if (!map) {
+ region.flags |= (1 << MSHV_SET_MEM_BIT_UNMAP);
+ trace_mshv_unmap_memory(mr->userspace_addr, mr->guest_phys_addr,
+ mr->memory_size);
+ return set_guest_memory(vm_fd, &region);
+ }
+
+ region.flags = BIT(MSHV_SET_MEM_BIT_EXECUTABLE);
+ if (!mr->readonly) {
+ region.flags |= BIT(MSHV_SET_MEM_BIT_WRITABLE);
+ }
+
+ trace_mshv_map_memory(mr->userspace_addr, mr->guest_phys_addr,
+ mr->memory_size);
+ return set_guest_memory(vm_fd, &region);
+}
+
+static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
+{
+ int ret = 0;
+
+ if (!mshv_mr) {
+ error_report("Invalid mshv_mr");
+ return -1;
+ }
+
+ trace_mshv_set_memory(add, mshv_mr->guest_phys_addr,
+ mshv_mr->memory_size,
+ mshv_mr->userspace_addr, mshv_mr->readonly,
+ ret);
+ return map_or_unmap(mshv_state->vm, mshv_mr, add);
+}
+
+/*
+ * Calculate and align the start address and the size of the section.
+ * Return the size. If the size is 0, the aligned section is empty.
+ */
+static hwaddr align_section(MemoryRegionSection *section, hwaddr *start)
+{
+ hwaddr size = int128_get64(section->size);
+ hwaddr delta, aligned;
+
+ /*
+ * MSHV works in page size chunks, but the function may be called
+ * with sub-page size and unaligned start address. Pad the start
+ * address to next and truncate size to previous page boundary.
+ */
+ aligned = ROUND_UP(section->offset_within_address_space,
+ qemu_real_host_page_size());
+ delta = aligned - section->offset_within_address_space;
+ *start = aligned;
+ if (delta > size) {
+ return 0;
+ }
+
+ return (size - delta) & qemu_real_host_page_mask();
+}
void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
bool add)
{
- error_report("unimplemented");
- abort();
-}
+ int ret = 0;
+ MemoryRegion *area = section->mr;
+ bool writable = !area->readonly && !area->rom_device;
+ hwaddr start_addr, mr_offset, size;
+ void *ram;
+ MshvMemoryRegion mshv_mr = {0};
+
+ size = align_section(section, &start_addr);
+ trace_mshv_set_phys_mem(add, section->mr->name, start_addr);
+
+ /*
+ * If the memory device is a writable non-ram area, we do not
+ * want to map it into the guest memory. If it is not a ROM device,
+ * we want to remove mshv memory mapping, so accesses will trap.
+ */
+ if (!memory_region_is_ram(area)) {
+ if (writable) {
+ return;
+ } else if (!area->romd_mode) {
+ add = false;
+ }
+ }
+
+ if (!size) {
+ return;
+ }
+ mr_offset = section->offset_within_region + start_addr -
+ section->offset_within_address_space;
+
+ ram = memory_region_get_ram_ptr(area) + mr_offset;
+
+ mshv_mr.guest_phys_addr = start_addr;
+ mshv_mr.memory_size = size;
+ mshv_mr.readonly = !writable;
+ mshv_mr.userspace_addr = (uint64_t)ram;
+
+ ret = set_memory(&mshv_mr, add);
+ if (ret < 0) {
+ error_report("Failed to set memory region");
+ abort();
+ }
+}
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index 8a6beb3fb1..f88fc8678c 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -1,5 +1,6 @@
mshv_ss = ss.source_set()
mshv_ss.add(if_true: files(
+ 'irq.c',
'mem.c',
'mshv-all.c'
))
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 2094966c8c..54c32d6252 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -7,6 +7,7 @@
* Ziqiao Zhou <ziqiaozhou@microsoft.com>
* Magnus Kulke <magnuskulke@microsoft.com>
* Jinank Jain <jinankjain@microsoft.com>
+ * Wei Liu <liuwe@microsoft.com>
*
* SPDX-License-Identifier: GPL-2.0-or-later
*
@@ -48,6 +49,175 @@ bool mshv_allowed;
MshvState *mshv_state;
+static int init_mshv(int *mshv_fd)
+{
+ int fd = open("/dev/mshv", O_RDWR | O_CLOEXEC);
+ if (fd < 0) {
+ error_report("Failed to open /dev/mshv: %s", strerror(errno));
+ return -1;
+ }
+ *mshv_fd = fd;
+ return 0;
+}
+
+/* freeze 1 to pause, 0 to resume */
+static int set_time_freeze(int vm_fd, int freeze)
+{
+ int ret;
+ struct hv_input_set_partition_property in = {0};
+ in.property_code = HV_PARTITION_PROPERTY_TIME_FREEZE;
+ in.property_value = freeze;
+
+ struct mshv_root_hvcall args = {0};
+ args.code = HVCALL_SET_PARTITION_PROPERTY;
+ args.in_sz = sizeof(in);
+ args.in_ptr = (uint64_t)&in;
+
+ ret = mshv_hvcall(vm_fd, &args);
+ if (ret < 0) {
+ error_report("Failed to set time freeze");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int pause_vm(int vm_fd)
+{
+ int ret;
+
+ ret = set_time_freeze(vm_fd, 1);
+ if (ret < 0) {
+ error_report("Failed to pause partition: %s", strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int resume_vm(int vm_fd)
+{
+ int ret;
+
+ ret = set_time_freeze(vm_fd, 0);
+ if (ret < 0) {
+ error_report("Failed to resume partition: %s", strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int create_partition(int mshv_fd, int *vm_fd)
+{
+ int ret;
+ struct mshv_create_partition args = {0};
+
+ /* Initialize pt_flags with the desired features */
+ uint64_t pt_flags = (1ULL << MSHV_PT_BIT_LAPIC) |
+ (1ULL << MSHV_PT_BIT_X2APIC) |
+ (1ULL << MSHV_PT_BIT_GPA_SUPER_PAGES);
+
+ /* Set default isolation type */
+ uint64_t pt_isolation = MSHV_PT_ISOLATION_NONE;
+
+ args.pt_flags = pt_flags;
+ args.pt_isolation = pt_isolation;
+
+ ret = ioctl(mshv_fd, MSHV_CREATE_PARTITION, &args);
+ if (ret < 0) {
+ error_report("Failed to create partition: %s", strerror(errno));
+ return -1;
+ }
+
+ *vm_fd = ret;
+ return 0;
+}
+
+static int set_synthetic_proc_features(int vm_fd)
+{
+ int ret;
+ struct hv_input_set_partition_property in = {0};
+ union hv_partition_synthetic_processor_features features = {0};
+
+ /* Access the bitfield and set the desired features */
+ features.hypervisor_present = 1;
+ features.hv1 = 1;
+ features.access_partition_reference_counter = 1;
+ features.access_synic_regs = 1;
+ features.access_synthetic_timer_regs = 1;
+ features.access_partition_reference_tsc = 1;
+ features.access_frequency_regs = 1;
+ features.access_intr_ctrl_regs = 1;
+ features.access_vp_index = 1;
+ features.access_hypercall_regs = 1;
+ features.tb_flush_hypercalls = 1;
+ features.synthetic_cluster_ipi = 1;
+ features.direct_synthetic_timers = 1;
+
+ mshv_arch_amend_proc_features(&features);
+
+ in.property_code = HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES;
+ in.property_value = features.as_uint64[0];
+
+ struct mshv_root_hvcall args = {0};
+ args.code = HVCALL_SET_PARTITION_PROPERTY;
+ args.in_sz = sizeof(in);
+ args.in_ptr = (uint64_t)&in;
+
+ trace_mshv_hvcall_args("synthetic_proc_features", args.code, args.in_sz);
+
+ ret = mshv_hvcall(vm_fd, &args);
+ if (ret < 0) {
+ error_report("Failed to set synthetic proc features");
+ return -errno;
+ }
+ return 0;
+}
+
+static int initialize_vm(int vm_fd)
+{
+ int ret = ioctl(vm_fd, MSHV_INITIALIZE_PARTITION);
+ if (ret < 0) {
+ error_report("Failed to initialize partition: %s", strerror(errno));
+ return -1;
+ }
+ return 0;
+}
+
+static int create_vm(int mshv_fd, int *vm_fd)
+{
+ int ret = create_partition(mshv_fd, vm_fd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ ret = set_synthetic_proc_features(*vm_fd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ ret = initialize_vm(*vm_fd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ ret = mshv_reserve_ioapic_msi_routes(*vm_fd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ ret = mshv_arch_post_init_vm(*vm_fd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ /* Always create a frozen partition */
+ pause_vm(*vm_fd);
+
+ return 0;
+}
+
static void mem_region_add(MemoryListener *listener,
MemoryRegionSection *section)
{
@@ -66,11 +236,124 @@ static void mem_region_del(MemoryListener *listener,
memory_region_unref(section->mr);
}
+typedef enum {
+ DATAMATCH_NONE,
+ DATAMATCH_U32,
+ DATAMATCH_U64,
+} DatamatchTag;
+
+typedef struct {
+ DatamatchTag tag;
+ union {
+ uint32_t u32;
+ uint64_t u64;
+ } value;
+} Datamatch;
+
+/* flags: determine whether to de/assign */
+static int ioeventfd(int vm_fd, int event_fd, uint64_t addr, Datamatch dm,
+ uint32_t flags)
+{
+ struct mshv_user_ioeventfd args = {0};
+ args.fd = event_fd;
+ args.addr = addr;
+ args.flags = flags;
+
+ if (dm.tag == DATAMATCH_NONE) {
+ args.datamatch = 0;
+ } else {
+ flags |= BIT(MSHV_IOEVENTFD_BIT_DATAMATCH);
+ args.flags = flags;
+ if (dm.tag == DATAMATCH_U64) {
+ args.len = sizeof(uint64_t);
+ args.datamatch = dm.value.u64;
+ } else {
+ args.len = sizeof(uint32_t);
+ args.datamatch = dm.value.u32;
+ }
+ }
+
+ return ioctl(vm_fd, MSHV_IOEVENTFD, &args);
+}
+
+static int unregister_ioevent(int vm_fd, int event_fd, uint64_t mmio_addr)
+{
+ uint32_t flags = 0;
+ Datamatch dm = {0};
+
+ flags |= BIT(MSHV_IOEVENTFD_BIT_DEASSIGN);
+ dm.tag = DATAMATCH_NONE;
+
+ return ioeventfd(vm_fd, event_fd, mmio_addr, dm, flags);
+}
+
+static int register_ioevent(int vm_fd, int event_fd, uint64_t mmio_addr,
+ uint64_t val, bool is_64bit, bool is_datamatch)
+{
+ uint32_t flags = 0;
+ Datamatch dm = {0};
+
+ if (!is_datamatch) {
+ dm.tag = DATAMATCH_NONE;
+ } else if (is_64bit) {
+ dm.tag = DATAMATCH_U64;
+ dm.value.u64 = val;
+ } else {
+ dm.tag = DATAMATCH_U32;
+ dm.value.u32 = val;
+ }
+
+ return ioeventfd(vm_fd, event_fd, mmio_addr, dm, flags);
+}
+
+static void mem_ioeventfd_add(MemoryListener *listener,
+ MemoryRegionSection *section,
+ bool match_data, uint64_t data,
+ EventNotifier *e)
+{
+ int fd = event_notifier_get_fd(e);
+ int ret;
+ bool is_64 = int128_get64(section->size) == 8;
+ uint64_t addr = section->offset_within_address_space;
+
+ trace_mshv_mem_ioeventfd_add(addr, int128_get64(section->size), data);
+
+ ret = register_ioevent(mshv_state->vm, fd, addr, data, is_64, match_data);
+
+ if (ret < 0) {
+ error_report("Failed to register ioeventfd: %s (%d)", strerror(-ret),
+ -ret);
+ abort();
+ }
+}
+
+static void mem_ioeventfd_del(MemoryListener *listener,
+ MemoryRegionSection *section,
+ bool match_data, uint64_t data,
+ EventNotifier *e)
+{
+ int fd = event_notifier_get_fd(e);
+ int ret;
+ uint64_t addr = section->offset_within_address_space;
+
+ trace_mshv_mem_ioeventfd_del(section->offset_within_address_space,
+ int128_get64(section->size), data);
+
+ ret = unregister_ioevent(mshv_state->vm, fd, addr);
+ if (ret < 0) {
+ error_report("Failed to unregister ioeventfd: %s (%d)", strerror(-ret),
+ -ret);
+ abort();
+ }
+}
+
static MemoryListener mshv_memory_listener = {
.name = "mshv",
.priority = MEMORY_LISTENER_PRIORITY_ACCEL,
.region_add = mem_region_add,
.region_del = mem_region_del,
+ .eventfd_add = mem_ioeventfd_add,
+ .eventfd_del = mem_ioeventfd_del,
};
static MemoryListener mshv_io_listener = {
@@ -96,15 +379,57 @@ static void register_mshv_memory_listener(MshvState *s, MshvMemoryListener *mml,
}
}
+int mshv_hvcall(int vm_fd, const struct mshv_root_hvcall *args)
+{
+ int ret = 0;
+
+ ret = ioctl(vm_fd, MSHV_ROOT_HVCALL, args);
+ if (ret < 0) {
+ error_report("Failed to perform hvcall: %s", strerror(errno));
+ return -1;
+ }
+ return ret;
+}
+
+
static int mshv_init(AccelState *as, MachineState *ms)
{
MshvState *s;
+ int mshv_fd, vm_fd, ret;
+
+ if (mshv_state) {
+ warn_report("MSHV accelerator already initialized");
+ return 0;
+ }
+
s = MSHV_STATE(as);
accel_blocker_init();
s->vm = 0;
+ ret = init_mshv(&mshv_fd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ mshv_init_msicontrol();
+
+ ret = create_vm(mshv_fd, &vm_fd);
+ if (ret < 0) {
+ close(mshv_fd);
+ return -1;
+ }
+
+ ret = resume_vm(vm_fd);
+ if (ret < 0) {
+ close(mshv_fd);
+ close(vm_fd);
+ return -1;
+ }
+
+ s->vm = vm_fd;
+ s->fd = mshv_fd;
s->nr_as = 1;
s->as = g_new0(MshvAddressSpace, s->nr_as);
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
new file mode 100644
index 0000000000..5ea5e74722
--- /dev/null
+++ b/accel/mshv/trace-events
@@ -0,0 +1,23 @@
+# See docs/devel/tracing.rst for syntax documentation.
+
+mshv_set_memory(bool add, uint64_t gpa, uint64_t size, uint64_t user_addr, bool readonly, int ret) "add=%d gpa=0x%lx size=0x%lx user=0x%lx readonly=%d result=%d"
+mshv_mem_ioeventfd_add(uint64_t addr, uint32_t size, uint32_t data) "addr=0x%lx size=%d data=0x%x"
+mshv_mem_ioeventfd_del(uint64_t addr, uint32_t size, uint32_t data) "addr=0x%lx size=%d data=0x%x"
+
+mshv_hvcall_args(const char* hvcall, uint16_t code, uint16_t in_sz) "built args for '%s' code: %d in_sz: %d"
+
+mshv_handle_interrupt(uint32_t cpu, int mask) "cpu_index=%d mask=0x%x"
+mshv_set_msi_routing(uint32_t gsi, uint64_t addr, uint32_t data) "gsi=%d addr=0x%lx data=0x%x"
+mshv_remove_msi_routing(uint32_t gsi) "gsi=%d"
+mshv_add_msi_routing(uint64_t addr, uint32_t data) "addr=0x%lx data=0x%x"
+mshv_commit_msi_routing_table(int vm_fd, int len) "vm_fd=%d table_size=%d"
+mshv_register_irqfd(int vm_fd, int event_fd, uint32_t gsi) "vm_fd=%d event_fd=%d gsi=%d"
+mshv_irqchip_update_irqfd_notifier_gsi(int event_fd, int resample_fd, int virq, bool add) "event_fd=%d resample_fd=%d virq=%d add=%d"
+
+mshv_insn_fetch(uint64_t addr, size_t size) "gpa=0x%lx size=%lu"
+mshv_mem_write(uint64_t addr, size_t size) "\tgpa=0x%lx size=%lu"
+mshv_mem_read(uint64_t addr, size_t size) "\tgpa=0x%lx size=%lu"
+mshv_map_memory(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%lx gpa=0x%010lx size=0x%08lx"
+mshv_unmap_memory(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%lx gpa=0x%010lx size=0x%08lx"
+mshv_set_phys_mem(bool add, const char *name, uint64_t gpa) "\tadd=%d name=%s gpa=0x%010lx"
+mshv_handle_mmio(uint64_t gva, uint64_t gpa, uint64_t size, uint8_t access_type) "\tgva=0x%lx gpa=0x%010lx size=0x%lx access_type=%d"
diff --git a/accel/mshv/trace.h b/accel/mshv/trace.h
new file mode 100644
index 0000000000..0dca48f917
--- /dev/null
+++ b/accel/mshv/trace.h
@@ -0,0 +1,14 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors:
+ * Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ */
+
+#include "trace/trace-accel_mshv.h"
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index bcb103560c..beba8c62a0 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -27,6 +27,7 @@
#include "hw/pci/msi.h"
#include "qemu/host-utils.h"
#include "system/kvm.h"
+#include "system/mshv.h"
#include "trace.h"
#include "hw/i386/apic-msidef.h"
#include "qapi/error.h"
@@ -932,6 +933,13 @@ static void apic_send_msi(MSIMessage *msi)
uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
uint8_t delivery = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
/* XXX: Ignore redirection hint. */
+#ifdef CONFIG_MSHV
+ if (mshv_enabled()) {
+ mshv_request_interrupt(mshv_state->vm, delivery, vector, dest,
+ dest_mode, trigger_mode);
+ return;
+ }
+#endif
apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
}
diff --git a/include/system/mshv.h b/include/system/mshv.h
index c5d33cd990..f2ffbe4ace 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -31,6 +31,12 @@
#define CONFIG_MSHV_IS_POSSIBLE
#endif
+typedef struct hyperv_message hv_message;
+
+#define MSHV_MAX_MSI_ROUTES 4096
+
+#define MSHV_PAGE_SHIFT 12
+
#ifdef CONFIG_MSHV_IS_POSSIBLE
extern bool mshv_allowed;
#define mshv_enabled() (mshv_allowed)
@@ -52,6 +58,7 @@ typedef struct MshvState {
/* number of listeners */
int nr_as;
MshvAddressSpace *as;
+ int fd;
} MshvState;
extern MshvState *mshv_state;
@@ -60,20 +67,42 @@ struct AccelCPUState {
bool dirty;
};
+typedef struct MshvMsiControl {
+ bool updated;
+ GHashTable *gsi_routes;
+} MshvMsiControl;
+
#else /* CONFIG_MSHV_IS_POSSIBLE */
#define mshv_enabled() false
#endif
-#ifdef MSHV_USE_KERNEL_GSI_IRQFD
#define mshv_msi_via_irqfd_enabled() mshv_enabled()
-#else
-#define mshv_msi_via_irqfd_enabled() false
-#endif
+
+/* cpu */
+void mshv_arch_amend_proc_features(
+ union hv_partition_synthetic_processor_features *features);
+int mshv_arch_post_init_vm(int vm_fd);
+
+int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args);
/* memory */
+typedef struct MshvMemoryRegion {
+ uint64_t guest_phys_addr;
+ uint64_t memory_size;
+ uint64_t userspace_addr;
+ bool readonly;
+} MshvMemoryRegion;
+
+int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
+int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
bool add);
/* interrupt */
+void mshv_init_msicontrol(void);
+int mshv_request_interrupt(int vm_fd, uint32_t interrupt_type, uint32_t vector,
+ uint32_t vp_index, bool logical_destination_mode,
+ bool level_triggered);
+
int mshv_irqchip_add_msi_route(int vector, PCIDevice *dev);
int mshv_irqchip_update_msi_route(int virq, MSIMessage msg, PCIDevice *dev);
void mshv_irqchip_commit_routes(void);
@@ -81,5 +110,6 @@ void mshv_irqchip_release_virq(int virq);
int mshv_irqchip_add_irqfd_notifier_gsi(const EventNotifier *n,
const EventNotifier *rn, int virq);
int mshv_irqchip_remove_irqfd_notifier_gsi(const EventNotifier *n, int virq);
+int mshv_reserve_ioapic_msi_routes(int vm_fd);
#endif
diff --git a/meson.build b/meson.build
index b6e70714f1..ed454aa42c 100644
--- a/meson.build
+++ b/meson.build
@@ -3661,6 +3661,7 @@ if have_system
trace_events_subdirs += [
'accel/hvf',
'accel/kvm',
+ 'accel/mshv',
'audio',
'backends',
'backends/tpm',
diff --git a/target/i386/mshv/meson.build b/target/i386/mshv/meson.build
index 8ddaa7c11d..647e5dafb7 100644
--- a/target/i386/mshv/meson.build
+++ b/target/i386/mshv/meson.build
@@ -1,6 +1,7 @@
i386_mshv_ss = ss.source_set()
i386_mshv_ss.add(files(
+ 'mshv-cpu.c',
'x86.c',
))
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
new file mode 100644
index 0000000000..c00e98dfba
--- /dev/null
+++ b/target/i386/mshv/mshv-cpu.c
@@ -0,0 +1,71 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Ziqiao Zhou <ziqiaozhou@microsoft.com>
+ * Magnus Kulke <magnuskulke@microsoft.com>
+ * Jinank Jain <jinankjain@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/typedefs.h"
+
+#include "system/mshv.h"
+#include "system/address-spaces.h"
+#include "linux/mshv.h"
+#include "hw/hyperv/hvhdk_mini.h"
+#include "hw/hyperv/hvgdk.h"
+
+
+#include "trace-accel_mshv.h"
+#include "trace.h"
+
+void mshv_arch_amend_proc_features(
+ union hv_partition_synthetic_processor_features *features)
+{
+ features->access_guest_idle_reg = 1;
+}
+
+/*
+ * Default Microsoft Hypervisor behavior for unimplemented MSR is to send a
+ * fault to the guest if it tries to access it. It is possible to override
+ * this behavior with a more suitable option i.e., ignore writes from the guest
+ * and return zero in attempt to read unimplemented.
+ */
+static int set_unimplemented_msr_action(int vm_fd)
+{
+ struct hv_input_set_partition_property in = {0};
+ struct mshv_root_hvcall args = {0};
+
+ in.property_code = HV_PARTITION_PROPERTY_UNIMPLEMENTED_MSR_ACTION;
+ in.property_value = HV_UNIMPLEMENTED_MSR_ACTION_IGNORE_WRITE_READ_ZERO;
+
+ args.code = HVCALL_SET_PARTITION_PROPERTY;
+ args.in_sz = sizeof(in);
+ args.in_ptr = (uint64_t)&in;
+
+ trace_mshv_hvcall_args("unimplemented_msr_action", args.code, args.in_sz);
+
+ int ret = mshv_hvcall(vm_fd, &args);
+ if (ret < 0) {
+ error_report("Failed to set unimplemented MSR action");
+ return -1;
+ }
+ return 0;
+}
+
+int mshv_arch_post_init_vm(int vm_fd)
+{
+ int ret;
+
+ ret = set_unimplemented_msr_action(vm_fd);
+ if (ret < 0) {
+ error_report("Failed to set unimplemented MSR action");
+ }
+
+ return ret;
+}
--
2.34.1
* Re: [PATCH v3 09/26] accel/mshv: Initialize VM partition
2025-08-07 14:39 ` [PATCH v3 09/26] accel/mshv: Initialize VM partition Magnus Kulke
@ 2025-08-27 11:15 ` Daniel P. Berrangé
0 siblings, 0 replies; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 11:15 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:34PM +0200, Magnus Kulke wrote:
> Create the MSHV virtual machine by opening a partition and issuing
> the necessary ioctl to initialize it. This sets up the basic VM
> structure and initial configuration used by MSHV to manage guest state.
>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> accel/mshv/irq.c | 397 +++++++++++++++++++++++++++++++++++
> accel/mshv/mem.c | 129 +++++++++++-
> accel/mshv/meson.build | 1 +
> accel/mshv/mshv-all.c | 325 ++++++++++++++++++++++++++++
> accel/mshv/trace-events | 23 ++
> accel/mshv/trace.h | 14 ++
> hw/intc/apic.c | 8 +
> include/system/mshv.h | 38 +++-
> meson.build | 1 +
> target/i386/mshv/meson.build | 1 +
> target/i386/mshv/mshv-cpu.c | 71 +++++++
> 11 files changed, 1001 insertions(+), 7 deletions(-)
> create mode 100644 accel/mshv/irq.c
> create mode 100644 accel/mshv/trace-events
> create mode 100644 accel/mshv/trace.h
> create mode 100644 target/i386/mshv/mshv-cpu.c
>
> diff --git a/accel/mshv/irq.c b/accel/mshv/irq.c
> diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
> new file mode 100644
> index 0000000000..5ea5e74722
> --- /dev/null
> +++ b/accel/mshv/trace-events
> @@ -0,0 +1,23 @@
> +# See docs/devel/tracing.rst for syntax documentation.
This will need the SPDX-License-Identifier tag too these
days.
> diff --git a/accel/mshv/trace.h b/accel/mshv/trace.h
> new file mode 100644
> index 0000000000..0dca48f917
> --- /dev/null
> +++ b/accel/mshv/trace.h
> @@ -0,0 +1,14 @@
> +/*
> + * QEMU MSHV support
> + *
> + * Copyright Microsoft, Corp. 2025
> + *
> + * Authors:
> + * Ziqiao Zhou <ziqiaozhou@microsoft.com>
> + * Magnus Kulke <magnuskulke@microsoft.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + */
> +
> +#include "trace/trace-accel_mshv.h"
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (8 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 09/26] accel/mshv: Initialize VM partition Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-27 11:24 ` Daniel P. Berrangé
2025-08-07 14:39 ` [PATCH v3 11/26] accel/mshv: Add vCPU signal handling Magnus Kulke
` (17 subsequent siblings)
27 siblings, 1 reply; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Create MSHV vCPUs using MSHV_CREATE_VP and initialize their state.
Register the MSHV CPU execution loop with the QEMU accelerator
framework to enable guest code execution.
The target/i386 functionality is still mostly stubbed out and will be
populated in a later commit in this series.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/mshv/mshv-all.c | 187 +++++++++++++++++++++++++++++++++---
accel/mshv/trace-events | 2 +
include/system/mshv.h | 17 ++++
target/i386/mshv/mshv-cpu.c | 63 ++++++++++++
4 files changed, 257 insertions(+), 12 deletions(-)
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 54c32d6252..a4eeaeec76 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -391,6 +391,24 @@ int mshv_hvcall(int vm_fd, const struct mshv_root_hvcall *args)
return ret;
}
+static int mshv_init_vcpu(CPUState *cpu)
+{
+ int vm_fd = mshv_state->vm;
+ uint8_t vp_index = cpu->cpu_index;
+ int ret;
+
+ mshv_arch_init_vcpu(cpu);
+ cpu->accel = g_new0(AccelCPUState, 1);
+
+ ret = mshv_create_vcpu(vm_fd, vp_index, &cpu->accel->cpufd);
+ if (ret < 0) {
+ return -1;
+ }
+
+ cpu->accel->dirty = true;
+
+ return 0;
+}
static int mshv_init(AccelState *as, MachineState *ms)
{
@@ -413,6 +431,8 @@ static int mshv_init(AccelState *as, MachineState *ms)
return -1;
}
+ mshv_init_cpu_logic();
+
mshv_init_msicontrol();
ret = create_vm(mshv_fd, &vm_fd);
@@ -442,40 +462,183 @@ static int mshv_init(AccelState *as, MachineState *ms)
return 0;
}
+static int mshv_destroy_vcpu(CPUState *cpu)
+{
+ int cpu_fd = mshv_vcpufd(cpu);
+ int vm_fd = mshv_state->vm;
+
+ mshv_remove_vcpu(vm_fd, cpu_fd);
+ mshv_vcpufd(cpu) = 0;
+
+ mshv_arch_destroy_vcpu(cpu);
+ g_free(cpu->accel);
+ return 0;
+}
+
+static int mshv_cpu_exec(CPUState *cpu)
+{
+ hv_message mshv_msg;
+ enum MshvVmExit exit_reason;
+ int ret = 0;
+
+ bql_unlock();
+ cpu_exec_start(cpu);
+
+ do {
+ if (cpu->accel->dirty) {
+ ret = mshv_arch_put_registers(cpu);
+ if (ret) {
+ error_report("Failed to put registers after init: %s",
+ strerror(-ret));
+ ret = -1;
+ break;
+ }
+ cpu->accel->dirty = false;
+ }
+
+ ret = mshv_run_vcpu(mshv_state->vm, cpu, &mshv_msg, &exit_reason);
+ if (ret < 0) {
+ error_report("Failed to run on vcpu %d", cpu->cpu_index);
+ abort();
+ }
+
+ switch (exit_reason) {
+ case MshvVmExitIgnore:
+ break;
+ default:
+ ret = EXCP_INTERRUPT;
+ break;
+ }
+ } while (ret == 0);
+
+ cpu_exec_end(cpu);
+ bql_lock();
+
+ if (ret < 0) {
+ cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+ vm_stop(RUN_STATE_INTERNAL_ERROR);
+ }
+
+ qatomic_set(&cpu->exit_request, 0);
+ return ret;
+}
+
+static void *mshv_vcpu_thread(void *arg)
+{
+ CPUState *cpu = arg;
+ int ret;
+
+ rcu_register_thread();
+
+ bql_lock();
+ qemu_thread_get_self(cpu->thread);
+ cpu->thread_id = qemu_get_thread_id();
+ current_cpu = cpu;
+ ret = mshv_init_vcpu(cpu);
+ if (ret < 0) {
+ error_report("Failed to init vcpu %d", cpu->cpu_index);
+ goto cleanup;
+ }
+
+ /* signal CPU creation */
+ cpu_thread_signal_created(cpu);
+ qemu_guest_random_seed_thread_part2(cpu->random_seed);
+
+ do {
+ if (cpu_can_run(cpu)) {
+ mshv_cpu_exec(cpu);
+ }
+ qemu_wait_io_event(cpu);
+ } while (!cpu->unplug || cpu_can_run(cpu));
+
+ mshv_destroy_vcpu(cpu);
+cleanup:
+ cpu_thread_signal_destroyed(cpu);
+ bql_unlock();
+ rcu_unregister_thread();
+ return NULL;
+}
+
static void mshv_start_vcpu_thread(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ char thread_name[VCPU_THREAD_NAME_SIZE];
+
+ cpu->thread = g_malloc0(sizeof(QemuThread));
+ cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+
+ qemu_cond_init(cpu->halt_cond);
+
+ trace_mshv_start_vcpu_thread(thread_name, cpu->cpu_index);
+ qemu_thread_create(cpu->thread, thread_name, mshv_vcpu_thread, cpu,
+ QEMU_THREAD_JOINABLE);
+}
+
+static void do_mshv_cpu_synchronize_post_init(CPUState *cpu,
+ run_on_cpu_data arg)
+{
+ int ret = mshv_arch_put_registers(cpu);
+ if (ret < 0) {
+ error_report("Failed to put registers after init: %s", strerror(-ret));
+ abort();
+ }
+
+ cpu->accel->dirty = false;
}
static void mshv_cpu_synchronize_post_init(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ run_on_cpu(cpu, do_mshv_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
}
static void mshv_cpu_synchronize_post_reset(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ int ret = mshv_arch_put_registers(cpu);
+ if (ret) {
+ error_report("Failed to put registers after reset: %s",
+ strerror(-ret));
+ cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+ vm_stop(RUN_STATE_INTERNAL_ERROR);
+ }
+ cpu->accel->dirty = false;
+}
+
+static void do_mshv_cpu_synchronize_pre_loadvm(CPUState *cpu,
+ run_on_cpu_data arg)
+{
+ cpu->accel->dirty = true;
}
static void mshv_cpu_synchronize_pre_loadvm(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ run_on_cpu(cpu, do_mshv_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+static void do_mshv_cpu_synchronize(CPUState *cpu, run_on_cpu_data arg)
+{
+ if (!cpu->accel->dirty) {
+ int ret = mshv_load_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to load registers for vcpu %d",
+ cpu->cpu_index);
+
+ cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+ vm_stop(RUN_STATE_INTERNAL_ERROR);
+ }
+
+ cpu->accel->dirty = true;
+ }
}
static void mshv_cpu_synchronize(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ if (!cpu->accel->dirty) {
+ run_on_cpu(cpu, do_mshv_cpu_synchronize, RUN_ON_CPU_NULL);
+ }
}
static bool mshv_cpus_are_resettable(void)
{
- error_report("unimplemented");
- abort();
+ return false;
}
static void mshv_accel_class_init(ObjectClass *oc, const void *data)
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index 5ea5e74722..1b1b43a1e8 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -1,5 +1,7 @@
# See docs/devel/tracing.rst for syntax documentation.
+mshv_start_vcpu_thread(const char* thread, uint32_t cpu) "thread=%s cpu_index=%d"
+
mshv_set_memory(bool add, uint64_t gpa, uint64_t size, uint64_t user_addr, bool readonly, int ret) "add=%d gpa=0x%lx size=0x%lx user=0x%lx readonly=%d result=%d"
mshv_mem_ioeventfd_add(uint64_t addr, uint32_t size, uint32_t data) "addr=0x%lx size=%d data=0x%x"
mshv_mem_ioeventfd_del(uint64_t addr, uint32_t size, uint32_t data) "addr=0x%lx size=%d data=0x%x"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index f2ffbe4ace..301228a813 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -72,12 +72,29 @@ typedef struct MshvMsiControl {
GHashTable *gsi_routes;
} MshvMsiControl;
+#define mshv_vcpufd(cpu) (cpu->accel->cpufd)
+
#else /* CONFIG_MSHV_IS_POSSIBLE */
#define mshv_enabled() false
#endif
#define mshv_msi_via_irqfd_enabled() mshv_enabled()
/* cpu */
+typedef enum MshvVmExit {
+ MshvVmExitIgnore = 0,
+ MshvVmExitShutdown = 1,
+ MshvVmExitSpecial = 2,
+} MshvVmExit;
+
+void mshv_init_cpu_logic(void);
+int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
+void mshv_remove_vcpu(int vm_fd, int cpu_fd);
+int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
+int mshv_load_regs(CPUState *cpu);
+int mshv_store_regs(CPUState *cpu);
+int mshv_arch_put_registers(const CPUState *cpu);
+void mshv_arch_init_vcpu(CPUState *cpu);
+void mshv_arch_destroy_vcpu(CPUState *cpu);
void mshv_arch_amend_proc_features(
union hv_partition_synthetic_processor_features *features);
int mshv_arch_post_init_vm(int vm_fd);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index c00e98dfba..2fe5319201 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -20,16 +20,79 @@
#include "hw/hyperv/hvhdk_mini.h"
#include "hw/hyperv/hvgdk.h"
+#include "cpu.h"
+#include "emulate/x86_decode.h"
+#include "emulate/x86_emu.h"
+#include "emulate/x86_flags.h"
#include "trace-accel_mshv.h"
#include "trace.h"
+int mshv_store_regs(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+int mshv_load_regs(CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+int mshv_arch_put_registers(const CPUState *cpu)
+{
+ error_report("unimplemented");
+ abort();
+}
+
void mshv_arch_amend_proc_features(
union hv_partition_synthetic_processor_features *features)
{
features->access_guest_idle_reg = 1;
}
+int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+void mshv_remove_vcpu(int vm_fd, int cpu_fd)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+void mshv_init_cpu_logic(void)
+{
+ error_report("unimplemented");
+ abort();
+}
+
+void mshv_arch_init_vcpu(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+
+ env->emu_mmio_buf = g_new(char, 4096);
+}
+
+void mshv_arch_destroy_vcpu(CPUState *cpu)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+
+ g_free(env->emu_mmio_buf);
+ env->emu_mmio_buf = NULL;
+}
+
/*
* Default Microsoft Hypervisor behavior for unimplemented MSR is to send a
* fault to the guest if it tries to access it. It is possible to override
--
2.34.1
* Re: [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop
2025-08-07 14:39 ` [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
@ 2025-08-27 11:24 ` Daniel P. Berrangé
2025-08-27 17:39 ` Wei Liu
2025-09-16 9:33 ` Magnus Kulke
0 siblings, 2 replies; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 11:24 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:35PM +0200, Magnus Kulke wrote:
> Create MSHV vCPUs using MSHV_CREATE_VP and initialize their state.
> Register the MSHV CPU execution loop with the QEMU accelerator
> framework to enable guest code execution.
>
> The target/i386 functionality is still mostly stubbed out and will be
> populated in a later commit in this series.
>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> accel/mshv/mshv-all.c | 187 +++++++++++++++++++++++++++++++++---
> accel/mshv/trace-events | 2 +
> include/system/mshv.h | 17 ++++
> target/i386/mshv/mshv-cpu.c | 63 ++++++++++++
> 4 files changed, 257 insertions(+), 12 deletions(-)
>
> diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
> index 54c32d6252..a4eeaeec76 100644
> --- a/accel/mshv/mshv-all.c
> +++ b/accel/mshv/mshv-all.c
> @@ -391,6 +391,24 @@ int mshv_hvcall(int vm_fd, const struct mshv_root_hvcall *args)
> return ret;
> }
>
> +static int mshv_init_vcpu(CPUState *cpu)
> +{
> + int vm_fd = mshv_state->vm;
> + uint8_t vp_index = cpu->cpu_index;
> + int ret;
> +
> + mshv_arch_init_vcpu(cpu);
> + cpu->accel = g_new0(AccelCPUState, 1);
> +
> + ret = mshv_create_vcpu(vm_fd, vp_index, &cpu->accel->cpufd);
> + if (ret < 0) {
> + return -1;
> + }
> +
> + cpu->accel->dirty = true;
> +
> + return 0;
> +}
>
> static int mshv_init(AccelState *as, MachineState *ms)
> {
> @@ -413,6 +431,8 @@ static int mshv_init(AccelState *as, MachineState *ms)
> return -1;
> }
>
> + mshv_init_cpu_logic();
> +
> mshv_init_msicontrol();
>
> ret = create_vm(mshv_fd, &vm_fd);
> @@ -442,40 +462,183 @@ static int mshv_init(AccelState *as, MachineState *ms)
> return 0;
> }
>
> +static int mshv_destroy_vcpu(CPUState *cpu)
> +{
> + int cpu_fd = mshv_vcpufd(cpu);
> + int vm_fd = mshv_state->vm;
> +
> + mshv_remove_vcpu(vm_fd, cpu_fd);
> + mshv_vcpufd(cpu) = 0;
> +
> + mshv_arch_destroy_vcpu(cpu);
> + g_free(cpu->accel);
Since the lifetime of the CPUState is not tightly
tied to the cpu->accel, I'd suggest that here
should use
g_clear_pointer(&cpu->accel, g_free);
so that if there is any race with code accessing
cpu->accel after it is free'd, we'll get a clear
NULL de-reference, rather than use-after-free which
is harder to diagnose.
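To illustrate the idiom in isolation, here is a minimal sketch in plain C.
It does not use GLib: CLEAR_POINTER is a hypothetical macro that only
approximates what g_clear_pointer() does, and struct cpu, struct
accel_state, cpu_init_accel() and cpu_destroy_accel() are made-up stand-ins
for the QEMU types and functions, not the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for AccelCPUState; fields are illustrative only. */
struct accel_state {
    int cpufd;
    int dirty;
};

/* Hypothetical stand-in for CPUState. */
struct cpu {
    struct accel_state *accel;
};

/*
 * Approximation of GLib's g_clear_pointer(): NULL the pointer first,
 * then destroy the old value. A later stray access through the pointer
 * dereferences NULL instead of touching freed memory.
 */
#define CLEAR_POINTER(pp, destroy)      \
    do {                                \
        void *old_ = *(pp);             \
        *(pp) = NULL;                   \
        if (old_) {                     \
            (destroy)(old_);            \
        }                               \
    } while (0)

/* Allocate zeroed per-CPU accelerator state; returns 0 on success. */
static int cpu_init_accel(struct cpu *cpu)
{
    assert(cpu != NULL);
    cpu->accel = calloc(1, sizeof(*cpu->accel));
    return cpu->accel ? 0 : -1;
}

/* Free the accelerator state and leave cpu->accel as NULL. */
static void cpu_destroy_accel(struct cpu *cpu)
{
    assert(cpu != NULL);
    CLEAR_POINTER(&cpu->accel, free);
}
```

With this shape, calling cpu_destroy_accel() twice is harmless, and any
code that checks cpu->accel for NULL after teardown sees a consistent
answer.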
> +static void *mshv_vcpu_thread(void *arg)
> +{
> + CPUState *cpu = arg;
> + int ret;
> +
> + rcu_register_thread();
> +
> + bql_lock();
> + qemu_thread_get_self(cpu->thread);
> + cpu->thread_id = qemu_get_thread_id();
So every MSHV vCPU has a corresponding Linux thread, similar
to the model with KVM. In libvirt we rely on the vCPU thread
being controllable with all the normal Linux process related
APIs. For example, setting thread CPU affinity, setting NUMA
memory policy, setting scheduler priorities, putting threads
into cgroups and applying a wide variety of cgroup controls.
Will there be any significant "gotchas" with the threads for
MSHV vCPUs, that would mean the above libvirt controls would
either raise errors, or silently not have any effect ?
> + current_cpu = cpu;
> + ret = mshv_init_vcpu(cpu);
> + if (ret < 0) {
> + error_report("Failed to init vcpu %d", cpu->cpu_index);
> + goto cleanup;
> + }
> +
> + /* signal CPU creation */
> + cpu_thread_signal_created(cpu);
> + qemu_guest_random_seed_thread_part2(cpu->random_seed);
> +
> + do {
> + if (cpu_can_run(cpu)) {
> + mshv_cpu_exec(cpu);
> + }
> + qemu_wait_io_event(cpu);
> + } while (!cpu->unplug || cpu_can_run(cpu));
> +
> + mshv_destroy_vcpu(cpu);
> +cleanup:
> + cpu_thread_signal_destroyed(cpu);
> + bql_unlock();
> + rcu_unregister_thread();
> + return NULL;
> +}
> +
> static void mshv_start_vcpu_thread(CPUState *cpu)
> {
> - error_report("unimplemented");
> - abort();
> + char thread_name[VCPU_THREAD_NAME_SIZE];
> +
> + cpu->thread = g_malloc0(sizeof(QemuThread));
= g_new0(QemuThread, 1);
> + cpu->halt_cond = g_malloc0(sizeof(QemuCond));
= g_new0(QemuCond, 1);
> +
> + qemu_cond_init(cpu->halt_cond);
> +
> + trace_mshv_start_vcpu_thread(thread_name, cpu->cpu_index);
> + qemu_thread_create(cpu->thread, thread_name, mshv_vcpu_thread, cpu,
> + QEMU_THREAD_JOINABLE);
> +}
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop
2025-08-27 11:24 ` Daniel P. Berrangé
@ 2025-08-27 17:39 ` Wei Liu
2025-09-16 9:33 ` Magnus Kulke
1 sibling, 0 replies; 46+ messages in thread
From: Wei Liu @ 2025-08-27 17:39 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Magnus Kulke, qemu-devel, Eric Blake, Eduardo Habkost,
Michael S. Tsirkin, Markus Armbruster, Magnus Kulke,
Paolo Bonzini, Richard Henderson, Phil Dennis-Jordan,
Marcel Apfelbaum, Alex Bennée, Magnus Kulke, Cornelia Huck,
Zhao Liu, Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu,
Wei Liu, Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Wed, Aug 27, 2025 at 12:24:39PM +0100, Daniel P. Berrangé wrote:
[...]
>
> > +static void *mshv_vcpu_thread(void *arg)
> > +{
> > + CPUState *cpu = arg;
> > + int ret;
> > +
> > + rcu_register_thread();
> > +
> > + bql_lock();
> > + qemu_thread_get_self(cpu->thread);
> > + cpu->thread_id = qemu_get_thread_id();
>
> So every MSHV vCPU has a corresponding Linux thread, similar
> to the model with KVM. In libvirt we rely on the vCPU thread
> being controllable with all the normal Linux process related
> APIs. For example, setting thread CPU affinity, setting NUMA
> memory policy, setting scheduler priorities, putting threads
> into cgroups and applying a wide variety of cgroup controls.
>
> Will there be any significant "gotchas" with the threads for
> MSHV vCPUs, that would mean the above libvirt controls would
> either raise errors, or silently not have any effect ?
>
It depends on the scheduling model of the host.
MSHV supports two scheduling models: hypervisor-based and root
partition. Root partition is the term we use to describe the host VM --
think of it like the Dom0 VM in Xen.
In the hypervisor-based scheduling model, the VCPUs are scheduled by the
hypervisor. The root partition merely tells the hypervisor "this VCPU is
ready to run", and the hypervisor decides when and where to actually run
it. In this model, the VCPU threads, when scheduled, are shown as
blocked. Libvirt controls over the threads won't fail, but they will have
no effect.
The root partition scheduling model is where the root (Linux) can decide
where and when to run the VCPUs. Everything you mentioned should work
as expected.
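As a rough illustration of the kind of per-thread control under discussion,
the sketch below pins a thread to one CPU using the Linux
sched_getaffinity()/sched_setaffinity() calls and then reads the mask back.
The helper names pin_to_first_allowed_cpu() and is_pinned_to() are
hypothetical, and this is not taken from libvirt or QEMU; it only shows the
generic Linux mechanism that would apply to a vCPU thread under the root
scheduling model.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Pin the thread identified by tid (0 means the calling thread) to the
 * lowest-numbered CPU it is currently allowed to run on.
 * Returns the chosen CPU number, or -1 on error.
 */
static int pin_to_first_allowed_cpu(pid_t tid)
{
    cpu_set_t allowed, target;

    if (sched_getaffinity(tid, sizeof(allowed), &allowed) < 0) {
        return -1;
    }

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&target);
            CPU_SET(cpu, &target);
            if (sched_setaffinity(tid, sizeof(target), &target) < 0) {
                return -1;
            }
            return cpu;
        }
    }
    return -1;
}

/* Returns 1 if tid's affinity mask is exactly {cpu}, 0 if not, -1 on error. */
static int is_pinned_to(pid_t tid, int cpu)
{
    cpu_set_t cur;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    if (sched_getaffinity(tid, sizeof(cur), &cur) < 0) {
        return -1;
    }
    return CPU_COUNT(&cur) == 1 && CPU_ISSET(cpu, &cur);
}
```

A management tool would pass the vCPU thread's TID (the value QEMU records
in cpu->thread_id) instead of 0. In the hypervisor-based model the same
calls would still succeed, but placement of the vCPU would not follow the
mask.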
For the upcoming project, we are going to use the root scheduling model.
Thanks,
Wei
P.S. In the hypervisor-based scheduling model, the hypervisor does allow
us to set CPU affinity for VCPUs or group them (similar to cgroup but
not the same) by making some hypercalls. We've been thinking about
mapping those into libvirt controls, but haven't made good progress on
that front. It deserves its own discussion.
* Re: [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop
2025-08-27 11:24 ` Daniel P. Berrangé
2025-08-27 17:39 ` Wei Liu
@ 2025-09-16 9:33 ` Magnus Kulke
1 sibling, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-09-16 9:33 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Wed, Aug 27, 2025 at 12:24:39PM +0100, Daniel P. Berrangé wrote:
> So every MSHV vCPU has a corresponding Linux thread, similar
> to the model with KVM. In libvirt we rely on the vCPU thread
> being controllable with all the normal Linux process related
> APIs. For example, setting thread CPU affinity, setting NUMA
> memory policy, setting scheduler priorities, putting threads
> into cgroups and applying a wide variety of cgroup controls.
>
> Will there be any significant "gotchas" with the threads for
> MSHV vCPUs, that would mean the above libvirt controls would
> either raise errors, or silently not have any effect ?
>
Hi Daniel,
I am not aware of any such gotchas. The MSHV vCPU threads should
be regular threads that spend most of their time blocked in
ioctl(MSHV_RUN) calls, and as such they should be controllable by
the facilities you mentioned. I know that folks who tested this
code have been using numactl for reliable performance assessments
without running into issues.
best,
magnus
* [PATCH v3 11/26] accel/mshv: Add vCPU signal handling
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (9 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 10/26] accel/mshv: Add vCPU creation and execution loop Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 12/26] target/i386/mshv: Add CPU create and remove logic Magnus Kulke
` (16 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Implement signal handling for MSHV vCPUs to support asynchronous
interrupts from the main thread.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/mshv/mshv-all.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index a4eeaeec76..65166e82b0 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -523,6 +523,35 @@ static int mshv_cpu_exec(CPUState *cpu)
return ret;
}
+/*
+ * The signal handler is triggered when QEMU's main thread receives a SIG_IPI
+ * (SIGUSR1). This signal causes the current CPU thread to be kicked, forcing a
+ * VM exit on the CPU. The VM exit generates an exit reason that breaks the loop
+ * (see mshv_cpu_exec). If the exit is due to a Ctrl+A+x command, the system
+ * will shut down. For other cases, the system will continue running.
+ */
+static void sa_ipi_handler(int sig)
+{
+ /* TODO: call IOCTL to set_immediate_exit, once implemented. */
+
+ qemu_cpu_kick_self();
+}
+
+static void init_signal(CPUState *cpu)
+{
+ /* init cpu signals */
+ struct sigaction sigact;
+ sigset_t set;
+
+ memset(&sigact, 0, sizeof(sigact));
+ sigact.sa_handler = sa_ipi_handler;
+ sigaction(SIG_IPI, &sigact, NULL);
+
+ pthread_sigmask(SIG_BLOCK, NULL, &set);
+ sigdelset(&set, SIG_IPI);
+ pthread_sigmask(SIG_SETMASK, &set, NULL);
+}
+
static void *mshv_vcpu_thread(void *arg)
{
CPUState *cpu = arg;
@@ -539,6 +568,7 @@ static void *mshv_vcpu_thread(void *arg)
error_report("Failed to init vcpu %d", cpu->cpu_index);
goto cleanup;
}
+ init_signal(cpu);
/* signal CPU creation */
cpu_thread_signal_created(cpu);
--
2.34.1
* [PATCH v3 12/26] target/i386/mshv: Add CPU create and remove logic
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (10 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 11/26] accel/mshv: Add vCPU signal handling Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 13/26] target/i386/mshv: Implement mshv_store_regs() Magnus Kulke
` (15 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Implement MSHV-specific hooks for vCPU creation and teardown in the
i386 target.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
target/i386/mshv/mshv-cpu.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 2fe5319201..7a6965d7fb 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -28,6 +28,8 @@
#include "trace-accel_mshv.h"
#include "trace.h"
+#include <sys/ioctl.h>
+
int mshv_store_regs(CPUState *cpu)
{
error_report("unimplemented");
@@ -60,20 +62,29 @@ int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
void mshv_remove_vcpu(int vm_fd, int cpu_fd)
{
- error_report("unimplemented");
- abort();
+ close(cpu_fd);
}
+
int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
{
- error_report("unimplemented");
- abort();
+ int ret;
+ struct mshv_create_vp vp_arg = {
+ .vp_index = vp_index,
+ };
+ ret = ioctl(vm_fd, MSHV_CREATE_VP, &vp_arg);
+ if (ret < 0) {
+ error_report("failed to create mshv vcpu: %s", strerror(errno));
+ return -1;
+ }
+
+ *cpu_fd = ret;
+
+ return 0;
}
void mshv_init_cpu_logic(void)
{
- error_report("unimplemented");
- abort();
}
void mshv_arch_init_vcpu(CPUState *cpu)
--
2.34.1
* [PATCH v3 13/26] target/i386/mshv: Implement mshv_store_regs()
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (11 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 12/26] target/i386/mshv: Add CPU create and remove logic Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 14/26] target/i386/mshv: Implement mshv_get_standard_regs() Magnus Kulke
` (14 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Add support for writing general-purpose registers to MSHV vCPUs
during initialization or migration using the MSHV register interface. A
generic set_register call is introduced to abstract the HV call over
the various register types.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
include/system/mshv.h | 3 ++
target/i386/mshv/mshv-cpu.c | 84 ++++++++++++++++++++++++++++++++++++-
2 files changed, 85 insertions(+), 2 deletions(-)
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 301228a813..d1a119c3e7 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -19,6 +19,7 @@
#include "hw/hyperv/hyperv-proto.h"
#include "linux/mshv.h"
#include "hw/hyperv/hvhdk.h"
+#include "hw/hyperv/hvgdk_mini.h"
#include "qapi/qapi-types-common.h"
#include "system/memory.h"
#include "accel/accel-ops.h"
@@ -92,6 +93,8 @@ void mshv_remove_vcpu(int vm_fd, int cpu_fd);
int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
int mshv_load_regs(CPUState *cpu);
int mshv_store_regs(CPUState *cpu);
+int mshv_set_generic_regs(int cpu_fd, struct hv_register_assoc *assocs,
+ size_t n_regs);
int mshv_arch_put_registers(const CPUState *cpu);
void mshv_arch_init_vcpu(CPUState *cpu);
void mshv_arch_destroy_vcpu(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 7a6965d7fb..4bd4e29b72 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -30,12 +30,92 @@
#include <sys/ioctl.h>
+static enum hv_register_name STANDARD_REGISTER_NAMES[18] = {
+ HV_X64_REGISTER_RAX,
+ HV_X64_REGISTER_RBX,
+ HV_X64_REGISTER_RCX,
+ HV_X64_REGISTER_RDX,
+ HV_X64_REGISTER_RSI,
+ HV_X64_REGISTER_RDI,
+ HV_X64_REGISTER_RSP,
+ HV_X64_REGISTER_RBP,
+ HV_X64_REGISTER_R8,
+ HV_X64_REGISTER_R9,
+ HV_X64_REGISTER_R10,
+ HV_X64_REGISTER_R11,
+ HV_X64_REGISTER_R12,
+ HV_X64_REGISTER_R13,
+ HV_X64_REGISTER_R14,
+ HV_X64_REGISTER_R15,
+ HV_X64_REGISTER_RIP,
+ HV_X64_REGISTER_RFLAGS,
+};
+
+int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
+{
+ struct mshv_vp_registers input = {
+ .count = n_regs,
+ .regs = assocs,
+ };
+
+ return ioctl(cpu_fd, MSHV_SET_VP_REGISTERS, &input);
+}
+
+static int set_standard_regs(const CPUState *cpu)
+{
+ X86CPU *x86cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86cpu->env;
+ hv_register_assoc assocs[ARRAY_SIZE(STANDARD_REGISTER_NAMES)];
+ int ret;
+ int cpu_fd = mshv_vcpufd(cpu);
+ size_t n_regs = ARRAY_SIZE(STANDARD_REGISTER_NAMES);
+
+ /* set names */
+ for (size_t i = 0; i < ARRAY_SIZE(STANDARD_REGISTER_NAMES); i++) {
+ assocs[i].name = STANDARD_REGISTER_NAMES[i];
+ }
+ assocs[0].value.reg64 = env->regs[R_EAX];
+ assocs[1].value.reg64 = env->regs[R_EBX];
+ assocs[2].value.reg64 = env->regs[R_ECX];
+ assocs[3].value.reg64 = env->regs[R_EDX];
+ assocs[4].value.reg64 = env->regs[R_ESI];
+ assocs[5].value.reg64 = env->regs[R_EDI];
+ assocs[6].value.reg64 = env->regs[R_ESP];
+ assocs[7].value.reg64 = env->regs[R_EBP];
+ assocs[8].value.reg64 = env->regs[R_R8];
+ assocs[9].value.reg64 = env->regs[R_R9];
+ assocs[10].value.reg64 = env->regs[R_R10];
+ assocs[11].value.reg64 = env->regs[R_R11];
+ assocs[12].value.reg64 = env->regs[R_R12];
+ assocs[13].value.reg64 = env->regs[R_R13];
+ assocs[14].value.reg64 = env->regs[R_R14];
+ assocs[15].value.reg64 = env->regs[R_R15];
+ assocs[16].value.reg64 = env->eip;
+ lflags_to_rflags(env);
+ assocs[17].value.reg64 = env->eflags;
+
+ ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+ if (ret < 0) {
+ error_report("failed to set standard registers");
+ return -errno;
+ }
+ return 0;
+}
+
int mshv_store_regs(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ int ret;
+
+ ret = set_standard_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to store standard registers");
+ return -1;
+ }
+
+ return 0;
}
+
int mshv_load_regs(CPUState *cpu)
{
error_report("unimplemented");
--
2.34.1
* [PATCH v3 14/26] target/i386/mshv: Implement mshv_get_standard_regs()
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (12 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 13/26] target/i386/mshv: Implement mshv_store_regs() Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 15/26] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
` (13 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Fetch standard register state from MSHV vCPUs to support debugging,
migration, and other introspection features in QEMU. A generic
get_regs() function and a mapper that translates between the two
register representations are introduced.
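As a rough illustration of the index-parallel layout used here (a name
table plus an array of name/value pairs that is mapped back by position),
a minimal standalone sketch follows. All demo_* names and the backing[]
array are hypothetical stand-ins for the MSHV types and the ioctl; this
is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for hv_register_name / hv_register_assoc. */
enum demo_reg_name { DEMO_REG_RAX, DEMO_REG_RBX, DEMO_REG_RIP, DEMO_REG_COUNT };

struct demo_assoc {
    enum demo_reg_name name;
    uint64_t value;
};

/* Hypothetical stand-in for the relevant part of CPUX86State. */
struct demo_env {
    uint64_t rax;
    uint64_t rbx;
    uint64_t rip;
};

/* The order of this table defines the meaning of each assocs[] index. */
static const enum demo_reg_name DEMO_NAMES[] = {
    DEMO_REG_RAX,
    DEMO_REG_RBX,
    DEMO_REG_RIP,
};

#define DEMO_N_REGS (sizeof(DEMO_NAMES) / sizeof(DEMO_NAMES[0]))

/* Pretend hypervisor call: looks up each requested name in backing[]. */
static int demo_get_regs(const uint64_t *backing, struct demo_assoc *assocs,
                         size_t n_regs)
{
    for (size_t i = 0; i < n_regs; i++) {
        if (assocs[i].name >= DEMO_REG_COUNT) {
            return -1;
        }
        assocs[i].value = backing[assocs[i].name];
    }
    return 0;
}

/* Mapper: relies on assocs[] being in the same order as DEMO_NAMES. */
static void demo_populate(const struct demo_assoc *assocs,
                          struct demo_env *env)
{
    env->rax = assocs[0].value;
    env->rbx = assocs[1].value;
    env->rip = assocs[2].value;
}

/* Fill in the names, fetch the values, then map them into env. */
static int demo_load_regs(const uint64_t *backing, struct demo_env *env)
{
    struct demo_assoc assocs[DEMO_N_REGS];

    for (size_t i = 0; i < DEMO_N_REGS; i++) {
        assocs[i].name = DEMO_NAMES[i];
    }
    if (demo_get_regs(backing, assocs, DEMO_N_REGS) < 0) {
        return -1;
    }
    demo_populate(assocs, env);
    return 0;
}
```

The fixed indices in the mapper are only valid while the name table keeps
its order, so any reordering has to touch both places.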
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
include/system/mshv.h | 1 +
target/i386/mshv/mshv-cpu.c | 69 +++++++++++++++++++++++++++++++++++--
2 files changed, 68 insertions(+), 2 deletions(-)
diff --git a/include/system/mshv.h b/include/system/mshv.h
index d1a119c3e7..43e2619ddc 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -90,6 +90,7 @@ typedef enum MshvVmExit {
void mshv_init_cpu_logic(void);
int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
void mshv_remove_vcpu(int vm_fd, int cpu_fd);
+int mshv_get_standard_regs(CPUState *cpu);
int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
int mshv_load_regs(CPUState *cpu);
int mshv_store_regs(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 4bd4e29b72..cb59d74eb4 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -61,6 +61,18 @@ int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
return ioctl(cpu_fd, MSHV_SET_VP_REGISTERS, &input);
}
+static int get_generic_regs(int cpu_fd, struct hv_register_assoc *assocs,
+ size_t n_regs)
+{
+ struct mshv_vp_registers input = {
+ .count = n_regs,
+ .regs = assocs,
+ };
+
+ return ioctl(cpu_fd, MSHV_GET_VP_REGISTERS, &input);
+}
+
+
static int set_standard_regs(const CPUState *cpu)
{
X86CPU *x86cpu = X86_CPU(cpu);
@@ -115,11 +127,64 @@ int mshv_store_regs(CPUState *cpu)
return 0;
}
+static void populate_standard_regs(const hv_register_assoc *assocs,
+ CPUX86State *env)
+{
+ env->regs[R_EAX] = assocs[0].value.reg64;
+ env->regs[R_EBX] = assocs[1].value.reg64;
+ env->regs[R_ECX] = assocs[2].value.reg64;
+ env->regs[R_EDX] = assocs[3].value.reg64;
+ env->regs[R_ESI] = assocs[4].value.reg64;
+ env->regs[R_EDI] = assocs[5].value.reg64;
+ env->regs[R_ESP] = assocs[6].value.reg64;
+ env->regs[R_EBP] = assocs[7].value.reg64;
+ env->regs[R_R8] = assocs[8].value.reg64;
+ env->regs[R_R9] = assocs[9].value.reg64;
+ env->regs[R_R10] = assocs[10].value.reg64;
+ env->regs[R_R11] = assocs[11].value.reg64;
+ env->regs[R_R12] = assocs[12].value.reg64;
+ env->regs[R_R13] = assocs[13].value.reg64;
+ env->regs[R_R14] = assocs[14].value.reg64;
+ env->regs[R_R15] = assocs[15].value.reg64;
+
+ env->eip = assocs[16].value.reg64;
+ env->eflags = assocs[17].value.reg64;
+ rflags_to_lflags(env);
+}
+
+int mshv_get_standard_regs(CPUState *cpu)
+{
+ struct hv_register_assoc assocs[ARRAY_SIZE(STANDARD_REGISTER_NAMES)];
+ int ret;
+ X86CPU *x86cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86cpu->env;
+ int cpu_fd = mshv_vcpufd(cpu);
+ size_t n_regs = ARRAY_SIZE(STANDARD_REGISTER_NAMES);
+
+ for (size_t i = 0; i < n_regs; i++) {
+ assocs[i].name = STANDARD_REGISTER_NAMES[i];
+ }
+ ret = get_generic_regs(cpu_fd, assocs, n_regs);
+ if (ret < 0) {
+ error_report("failed to get standard registers");
+ return -1;
+ }
+
+ populate_standard_regs(assocs, env);
+ return 0;
+}
int mshv_load_regs(CPUState *cpu)
{
- error_report("unimplemented");
- abort();
+ int ret;
+
+ ret = mshv_get_standard_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to load standard registers");
+ return -1;
+ }
+
+ return 0;
}
int mshv_arch_put_registers(const CPUState *cpu)
--
2.34.1
* [PATCH v3 15/26] target/i386/mshv: Implement mshv_get_special_regs()
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (13 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 14/26] target/i386/mshv: Implement mshv_get_standard_regs() Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 16/26] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
` (12 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Retrieve special registers (e.g. segment, control, and descriptor
table registers) from MSHV vCPUs.
Various helper functions to map register state representations between
QEMU and MSHV are introduced.
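To make the segment mapping more concrete, here is a small standalone
sketch of packing per-field segment attributes into a descriptor-style
flags word and unpacking them again. The DEMO_* shift values are assumed
to mirror the DESC_* definitions in QEMU's target/i386/cpu.h (please
verify against your tree), and struct demo_hv_segment is a hypothetical
stand-in for hv_x64_segment_register, not the real layout.

```c
#include <stdint.h>

/* Assumed to match QEMU's DESC_*_SHIFT values; the DEMO_ prefix is ours. */
#define DEMO_DESC_TYPE_SHIFT 8
#define DEMO_DESC_S_SHIFT    12
#define DEMO_DESC_DPL_SHIFT  13
#define DEMO_DESC_P_SHIFT    15
#define DEMO_DESC_AVL_SHIFT  20
#define DEMO_DESC_L_SHIFT    21
#define DEMO_DESC_B_SHIFT    22
#define DEMO_DESC_G_SHIFT    23

/* Hypothetical stand-in for the attribute bits of a hypervisor segment. */
struct demo_hv_segment {
    unsigned segment_type:4;
    unsigned non_system_segment:1;
    unsigned dpl:2;
    unsigned present:1;
    unsigned available:1;
    unsigned long_mode:1;
    unsigned default_size:1;
    unsigned granularity:1;
};

/* Hypervisor representation -> flags word. */
static uint32_t demo_pack_flags(const struct demo_hv_segment *s)
{
    return ((uint32_t)s->segment_type << DEMO_DESC_TYPE_SHIFT)
         | ((uint32_t)s->non_system_segment << DEMO_DESC_S_SHIFT)
         | ((uint32_t)s->dpl << DEMO_DESC_DPL_SHIFT)
         | ((uint32_t)s->present << DEMO_DESC_P_SHIFT)
         | ((uint32_t)s->available << DEMO_DESC_AVL_SHIFT)
         | ((uint32_t)s->long_mode << DEMO_DESC_L_SHIFT)
         | ((uint32_t)s->default_size << DEMO_DESC_B_SHIFT)
         | ((uint32_t)s->granularity << DEMO_DESC_G_SHIFT);
}

/* Flags word -> hypervisor representation; every field is written. */
static void demo_unpack_flags(uint32_t flags, struct demo_hv_segment *s)
{
    s->segment_type = (flags >> DEMO_DESC_TYPE_SHIFT) & 0xF;
    s->non_system_segment = (flags >> DEMO_DESC_S_SHIFT) & 0x1;
    s->dpl = (flags >> DEMO_DESC_DPL_SHIFT) & 0x3;
    s->present = (flags >> DEMO_DESC_P_SHIFT) & 0x1;
    s->available = (flags >> DEMO_DESC_AVL_SHIFT) & 0x1;
    s->long_mode = (flags >> DEMO_DESC_L_SHIFT) & 0x1;
    s->default_size = (flags >> DEMO_DESC_B_SHIFT) & 0x1;
    s->granularity = (flags >> DEMO_DESC_G_SHIFT) & 0x1;
}
```

Keeping both directions next to each other makes it easy to check that a
pack followed by an unpack yields the same fields.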
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
include/system/mshv.h | 1 +
target/i386/mshv/mshv-cpu.c | 105 ++++++++++++++++++++++++++++++++++++
2 files changed, 106 insertions(+)
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 43e2619ddc..6ca82d367b 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -91,6 +91,7 @@ void mshv_init_cpu_logic(void);
int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
void mshv_remove_vcpu(int vm_fd, int cpu_fd);
int mshv_get_standard_regs(CPUState *cpu);
+int mshv_get_special_regs(CPUState *cpu);
int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
int mshv_load_regs(CPUState *cpu);
int mshv_store_regs(CPUState *cpu);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index cb59d74eb4..53b6722af4 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -51,6 +51,26 @@ static enum hv_register_name STANDARD_REGISTER_NAMES[18] = {
HV_X64_REGISTER_RFLAGS,
};
+static enum hv_register_name SPECIAL_REGISTER_NAMES[17] = {
+ HV_X64_REGISTER_CS,
+ HV_X64_REGISTER_DS,
+ HV_X64_REGISTER_ES,
+ HV_X64_REGISTER_FS,
+ HV_X64_REGISTER_GS,
+ HV_X64_REGISTER_SS,
+ HV_X64_REGISTER_TR,
+ HV_X64_REGISTER_LDTR,
+ HV_X64_REGISTER_GDTR,
+ HV_X64_REGISTER_IDTR,
+ HV_X64_REGISTER_CR0,
+ HV_X64_REGISTER_CR2,
+ HV_X64_REGISTER_CR3,
+ HV_X64_REGISTER_CR4,
+ HV_X64_REGISTER_CR8,
+ HV_X64_REGISTER_EFER,
+ HV_X64_REGISTER_APIC_BASE,
+};
+
int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
{
struct mshv_vp_registers input = {
@@ -174,6 +194,85 @@ int mshv_get_standard_regs(CPUState *cpu)
return 0;
}
+static inline void populate_segment_reg(const hv_x64_segment_register *hv_seg,
+ SegmentCache *seg)
+{
+ memset(seg, 0, sizeof(SegmentCache));
+
+ seg->base = hv_seg->base;
+ seg->limit = hv_seg->limit;
+ seg->selector = hv_seg->selector;
+
+ seg->flags = (hv_seg->segment_type << DESC_TYPE_SHIFT)
+ | (hv_seg->present * DESC_P_MASK)
+ | (hv_seg->descriptor_privilege_level << DESC_DPL_SHIFT)
+ | (hv_seg->_default << DESC_B_SHIFT)
+ | (hv_seg->non_system_segment * DESC_S_MASK)
+ | (hv_seg->_long << DESC_L_SHIFT)
+ | (hv_seg->granularity * DESC_G_MASK)
+ | (hv_seg->available * DESC_AVL_MASK);
+
+}
+
+static inline void populate_table_reg(const hv_x64_table_register *hv_seg,
+ SegmentCache *tbl)
+{
+ memset(tbl, 0, sizeof(SegmentCache));
+
+ tbl->base = hv_seg->base;
+ tbl->limit = hv_seg->limit;
+}
+
+static void populate_special_regs(const hv_register_assoc *assocs,
+ X86CPU *x86cpu)
+{
+ CPUX86State *env = &x86cpu->env;
+
+ populate_segment_reg(&assocs[0].value.segment, &env->segs[R_CS]);
+ populate_segment_reg(&assocs[1].value.segment, &env->segs[R_DS]);
+ populate_segment_reg(&assocs[2].value.segment, &env->segs[R_ES]);
+ populate_segment_reg(&assocs[3].value.segment, &env->segs[R_FS]);
+ populate_segment_reg(&assocs[4].value.segment, &env->segs[R_GS]);
+ populate_segment_reg(&assocs[5].value.segment, &env->segs[R_SS]);
+
+ populate_segment_reg(&assocs[6].value.segment, &env->tr);
+ populate_segment_reg(&assocs[7].value.segment, &env->ldt);
+
+ populate_table_reg(&assocs[8].value.table, &env->gdt);
+ populate_table_reg(&assocs[9].value.table, &env->idt);
+
+ env->cr[0] = assocs[10].value.reg64;
+ env->cr[2] = assocs[11].value.reg64;
+ env->cr[3] = assocs[12].value.reg64;
+ env->cr[4] = assocs[13].value.reg64;
+
+ cpu_set_apic_tpr(x86cpu->apic_state, assocs[14].value.reg64);
+ env->efer = assocs[15].value.reg64;
+ cpu_set_apic_base(x86cpu->apic_state, assocs[16].value.reg64);
+}
+
+
+int mshv_get_special_regs(CPUState *cpu)
+{
+ struct hv_register_assoc assocs[ARRAY_SIZE(SPECIAL_REGISTER_NAMES)];
+ int ret;
+ X86CPU *x86cpu = X86_CPU(cpu);
+ int cpu_fd = mshv_vcpufd(cpu);
+ size_t n_regs = ARRAY_SIZE(SPECIAL_REGISTER_NAMES);
+
+ for (size_t i = 0; i < n_regs; i++) {
+ assocs[i].name = SPECIAL_REGISTER_NAMES[i];
+ }
+ ret = get_generic_regs(cpu_fd, assocs, n_regs);
+ if (ret < 0) {
+ error_report("failed to get special registers");
+ return -errno;
+ }
+
+ populate_special_regs(assocs, x86cpu);
+ return 0;
+}
+
int mshv_load_regs(CPUState *cpu)
{
int ret;
@@ -184,6 +283,12 @@ int mshv_load_regs(CPUState *cpu)
return -1;
}
+ ret = mshv_get_special_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to load special registers");
+ return -1;
+ }
+
return 0;
}
--
2.34.1
* [PATCH v3 16/26] target/i386/mshv: Implement mshv_arch_put_registers()
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (14 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 15/26] target/i386/mshv: Implement mshv_get_special_regs() Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 17/26] target/i386/mshv: Set local interrupt controller state Magnus Kulke
` (11 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Write CPU register state to MSHV vCPUs. Various mapping functions to
prepare the payload for the HV call have been implemented.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
include/system/mshv.h | 15 +++
target/i386/mshv/mshv-cpu.c | 239 ++++++++++++++++++++++++++++++++++++
2 files changed, 254 insertions(+)
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 6ca82d367b..ec41e62315 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -81,6 +81,20 @@ typedef struct MshvMsiControl {
#define mshv_msi_via_irqfd_enabled() mshv_enabled()
/* cpu */
+typedef struct MshvFPU {
+ uint8_t fpr[8][16];
+ uint16_t fcw;
+ uint16_t fsw;
+ uint8_t ftwx;
+ uint8_t pad1;
+ uint16_t last_opcode;
+ uint64_t last_ip;
+ uint64_t last_dp;
+ uint8_t xmm[16][16];
+ uint32_t mxcsr;
+ uint32_t pad2;
+} MshvFPU;
+
typedef enum MshvVmExit {
MshvVmExitIgnore = 0,
MshvVmExitShutdown = 1,
@@ -90,6 +104,7 @@ typedef enum MshvVmExit {
void mshv_init_cpu_logic(void);
int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
void mshv_remove_vcpu(int vm_fd, int cpu_fd);
+int mshv_configure_vcpu(const CPUState *cpu, const MshvFPU *fpu, uint64_t xcr0);
int mshv_get_standard_regs(CPUState *cpu);
int mshv_get_special_regs(CPUState *cpu);
int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 53b6722af4..dddb2da428 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -71,6 +71,35 @@ static enum hv_register_name SPECIAL_REGISTER_NAMES[17] = {
HV_X64_REGISTER_APIC_BASE,
};
+static enum hv_register_name FPU_REGISTER_NAMES[26] = {
+ HV_X64_REGISTER_XMM0,
+ HV_X64_REGISTER_XMM1,
+ HV_X64_REGISTER_XMM2,
+ HV_X64_REGISTER_XMM3,
+ HV_X64_REGISTER_XMM4,
+ HV_X64_REGISTER_XMM5,
+ HV_X64_REGISTER_XMM6,
+ HV_X64_REGISTER_XMM7,
+ HV_X64_REGISTER_XMM8,
+ HV_X64_REGISTER_XMM9,
+ HV_X64_REGISTER_XMM10,
+ HV_X64_REGISTER_XMM11,
+ HV_X64_REGISTER_XMM12,
+ HV_X64_REGISTER_XMM13,
+ HV_X64_REGISTER_XMM14,
+ HV_X64_REGISTER_XMM15,
+ HV_X64_REGISTER_FP_MMX0,
+ HV_X64_REGISTER_FP_MMX1,
+ HV_X64_REGISTER_FP_MMX2,
+ HV_X64_REGISTER_FP_MMX3,
+ HV_X64_REGISTER_FP_MMX4,
+ HV_X64_REGISTER_FP_MMX5,
+ HV_X64_REGISTER_FP_MMX6,
+ HV_X64_REGISTER_FP_MMX7,
+ HV_X64_REGISTER_FP_CONTROL_STATUS,
+ HV_X64_REGISTER_XMM_CONTROL_STATUS,
+};
+
int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
{
struct mshv_vp_registers input = {
@@ -292,8 +321,218 @@ int mshv_load_regs(CPUState *cpu)
return 0;
}
+static inline void populate_hv_segment_reg(SegmentCache *seg,
+ hv_x64_segment_register *hv_reg)
+{
+ uint32_t flags = seg->flags;
+
+ hv_reg->base = seg->base;
+ hv_reg->limit = seg->limit;
+ hv_reg->selector = seg->selector;
+ hv_reg->segment_type = (flags >> DESC_TYPE_SHIFT) & 0xF;
+ hv_reg->non_system_segment = (flags & DESC_S_MASK) != 0;
+ hv_reg->descriptor_privilege_level = (flags >> DESC_DPL_SHIFT) & 0x3;
+ hv_reg->present = (flags & DESC_P_MASK) != 0;
+ hv_reg->reserved = 0;
+ hv_reg->available = (flags & DESC_AVL_MASK) != 0;
+ hv_reg->_long = (flags >> DESC_L_SHIFT) & 0x1;
+ hv_reg->_default = (flags >> DESC_B_SHIFT) & 0x1;
+ hv_reg->granularity = (flags & DESC_G_MASK) != 0;
+}
+
+static inline void populate_hv_table_reg(const struct SegmentCache *seg,
+ hv_x64_table_register *hv_reg)
+{
+ memset(hv_reg, 0, sizeof(*hv_reg));
+
+ hv_reg->base = seg->base;
+ hv_reg->limit = seg->limit;
+}
+
+static int set_special_regs(const CPUState *cpu)
+{
+ X86CPU *x86cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86cpu->env;
+ int cpu_fd = mshv_vcpufd(cpu);
+ struct hv_register_assoc assocs[ARRAY_SIZE(SPECIAL_REGISTER_NAMES)];
+ size_t n_regs = ARRAY_SIZE(SPECIAL_REGISTER_NAMES);
+ int ret;
+
+ /* set names */
+ for (size_t i = 0; i < n_regs; i++) {
+ assocs[i].name = SPECIAL_REGISTER_NAMES[i];
+ }
+ populate_hv_segment_reg(&env->segs[R_CS], &assocs[0].value.segment);
+ populate_hv_segment_reg(&env->segs[R_DS], &assocs[1].value.segment);
+ populate_hv_segment_reg(&env->segs[R_ES], &assocs[2].value.segment);
+ populate_hv_segment_reg(&env->segs[R_FS], &assocs[3].value.segment);
+ populate_hv_segment_reg(&env->segs[R_GS], &assocs[4].value.segment);
+ populate_hv_segment_reg(&env->segs[R_SS], &assocs[5].value.segment);
+ populate_hv_segment_reg(&env->tr, &assocs[6].value.segment);
+ populate_hv_segment_reg(&env->ldt, &assocs[7].value.segment);
+
+ populate_hv_table_reg(&env->gdt, &assocs[8].value.table);
+ populate_hv_table_reg(&env->idt, &assocs[9].value.table);
+
+ assocs[10].value.reg64 = env->cr[0];
+ assocs[11].value.reg64 = env->cr[2];
+ assocs[12].value.reg64 = env->cr[3];
+ assocs[13].value.reg64 = env->cr[4];
+ assocs[14].value.reg64 = cpu_get_apic_tpr(x86cpu->apic_state);
+ assocs[15].value.reg64 = env->efer;
+ assocs[16].value.reg64 = cpu_get_apic_base(x86cpu->apic_state);
+
+ ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+ if (ret < 0) {
+ error_report("failed to set special registers");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int set_fpu(int cpu_fd, const struct MshvFPU *regs)
+{
+ struct hv_register_assoc assocs[ARRAY_SIZE(FPU_REGISTER_NAMES)];
+ union hv_register_value *value;
+ size_t fp_i;
+ union hv_x64_fp_control_status_register *ctrl_status;
+ union hv_x64_xmm_control_status_register *xmm_ctrl_status;
+ int ret;
+ size_t n_regs = ARRAY_SIZE(FPU_REGISTER_NAMES);
+
+ /* first 16 registers are xmm0-xmm15 */
+ for (size_t i = 0; i < 16; i++) {
+ assocs[i].name = FPU_REGISTER_NAMES[i];
+ value = &assocs[i].value;
+ memcpy(&value->reg128, &regs->xmm[i], 16);
+ }
+
+ /* next 8 registers are fp_mmx0-fp_mmx7 */
+ for (size_t i = 16; i < 24; i++) {
+ assocs[i].name = FPU_REGISTER_NAMES[i];
+ fp_i = (i - 16);
+ value = &assocs[i].value;
+ memcpy(&value->reg128, &regs->fpr[fp_i], 16);
+ }
+
+ /* last two registers are fp_control_status and xmm_control_status */
+ assocs[24].name = FPU_REGISTER_NAMES[24];
+ value = &assocs[24].value;
+ ctrl_status = &value->fp_control_status;
+ ctrl_status->fp_control = regs->fcw;
+ ctrl_status->fp_status = regs->fsw;
+ ctrl_status->fp_tag = regs->ftwx;
+ ctrl_status->reserved = 0;
+ ctrl_status->last_fp_op = regs->last_opcode;
+ ctrl_status->last_fp_rip = regs->last_ip;
+
+ assocs[25].name = FPU_REGISTER_NAMES[25];
+ value = &assocs[25].value;
+ xmm_ctrl_status = &value->xmm_control_status;
+ xmm_ctrl_status->xmm_status_control = regs->mxcsr;
+ xmm_ctrl_status->xmm_status_control_mask = 0;
+ xmm_ctrl_status->last_fp_rdp = regs->last_dp;
+
+ ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+ if (ret < 0) {
+ error_report("failed to set fpu registers");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int set_xc_reg(int cpu_fd, uint64_t xcr0)
+{
+ int ret;
+ struct hv_register_assoc assoc = {
+ .name = HV_X64_REGISTER_XFEM,
+ .value.reg64 = xcr0,
+ };
+
+ ret = mshv_set_generic_regs(cpu_fd, &assoc, 1);
+ if (ret < 0) {
+ error_report("failed to set xcr0");
+ return -errno;
+ }
+ return 0;
+}
+
+static int set_cpu_state(const CPUState *cpu, const MshvFPU *fpu_regs,
+ uint64_t xcr0)
+{
+ int ret;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ ret = set_standard_regs(cpu);
+ if (ret < 0) {
+ return ret;
+ }
+ ret = set_special_regs(cpu);
+ if (ret < 0) {
+ return ret;
+ }
+ ret = set_fpu(cpu_fd, fpu_regs);
+ if (ret < 0) {
+ return ret;
+ }
+ ret = set_xc_reg(cpu_fd, xcr0);
+ if (ret < 0) {
+ return ret;
+ }
+ return 0;
+}
+
+/*
+ * TODO: populate topology info:
+ *
+ * X86CPU *x86cpu = X86_CPU(cpu);
+ * CPUX86State *env = &x86cpu->env;
+ * X86CPUTopoInfo *topo_info = &env->topo_info;
+ */
+int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
+ uint64_t xcr0)
+{
+ int ret;
+
+ ret = set_cpu_state(cpu, fpu, xcr0);
+ if (ret < 0) {
+ error_report("failed to set cpu state");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int put_regs(const CPUState *cpu)
+{
+ X86CPU *x86cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86cpu->env;
+ MshvFPU fpu = {0};
+ int ret;
+
+ memset(&fpu, 0, sizeof(fpu));
+
+ ret = mshv_configure_vcpu(cpu, &fpu, env->xcr0);
+ if (ret < 0) {
+ error_report("failed to configure vcpu");
+ return ret;
+ }
+
+ return 0;
+}
+
int mshv_arch_put_registers(const CPUState *cpu)
{
+ int ret;
+
+ ret = put_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to put registers");
+ return -1;
+ }
+
error_report("unimplemented");
abort();
}
--
2.34.1
* [PATCH v3 17/26] target/i386/mshv: Set local interrupt controller state
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (15 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 16/26] target/i386/mshv: Implement mshv_arch_put_registers() Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 18/26] target/i386/mshv: Register CPUID entries with MSHV Magnus Kulke
` (10 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Configure the LINT0 and LINT1 entries of the local interrupt controller:
read the LAPIC state from the hypervisor, set the delivery modes to
ExtINT and NMI respectively, and write the state back.
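The LINT setup essentially rewrites the delivery-mode field (bits 8..10)
of an LVT entry while leaving the other bits alone. A minimal standalone
sketch of that read-modify-write step follows; the DEMO_* constants are
assumed to match the APIC_DM_* values in QEMU's apic_internal.h, and the
demo_* helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Delivery mode encodings (assumed to match QEMU's APIC_DM_* values). */
#define DEMO_APIC_DM_NMI    4u
#define DEMO_APIC_DM_EXTINT 7u

/* The delivery mode occupies bits 8..10 of an LVT entry. */
#define DEMO_LVT_DM_SHIFT 8
#define DEMO_LVT_DM_MASK  (0x7u << DEMO_LVT_DM_SHIFT)

/* Replace the delivery mode, preserving all other bits of the entry. */
static uint32_t demo_set_delivery_mode(uint32_t lvt, uint32_t mode)
{
    return (lvt & ~DEMO_LVT_DM_MASK)
         | ((mode << DEMO_LVT_DM_SHIFT) & DEMO_LVT_DM_MASK);
}

/* Extract the delivery mode from an LVT entry. */
static uint32_t demo_get_delivery_mode(uint32_t lvt)
{
    return (lvt & DEMO_LVT_DM_MASK) >> DEMO_LVT_DM_SHIFT;
}
```

Comparing demo_get_delivery_mode() against the desired mode would be one
way to skip the write-back when nothing changes.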
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
target/i386/mshv/mshv-cpu.c | 118 ++++++++++++++++++++++++++++++++++++
1 file changed, 118 insertions(+)
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index dddb2da428..c233d4af70 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -12,6 +12,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "qemu/memalign.h"
#include "qemu/typedefs.h"
#include "system/mshv.h"
@@ -19,6 +20,8 @@
#include "linux/mshv.h"
#include "hw/hyperv/hvhdk_mini.h"
#include "hw/hyperv/hvgdk.h"
+#include "hw/hyperv/hvgdk_mini.h"
+#include "hw/i386/apic_internal.h"
#include "cpu.h"
#include "emulate/x86_decode.h"
@@ -484,6 +487,114 @@ static int set_cpu_state(const CPUState *cpu, const MshvFPU *fpu_regs,
return 0;
}
+static int get_vp_state(int cpu_fd, struct mshv_get_set_vp_state *state)
+{
+ int ret;
+
+ ret = ioctl(cpu_fd, MSHV_GET_VP_STATE, state);
+ if (ret < 0) {
+ error_report("failed to get partition state: %s", strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int get_lapic(int cpu_fd,
+ struct hv_local_interrupt_controller_state *state)
+{
+ int ret;
+ size_t size = 4096;
+ /* buffer aligned to 4k, as *state requires that */
+ void *buffer = qemu_memalign(size, size);
+ struct mshv_get_set_vp_state mshv_state = { 0 };
+
+ mshv_state.buf_ptr = (uint64_t) buffer;
+ mshv_state.buf_sz = size;
+ mshv_state.type = MSHV_VP_STATE_LAPIC;
+
+ ret = get_vp_state(cpu_fd, &mshv_state);
+ if (ret == 0) {
+ memcpy(state, buffer, sizeof(*state));
+ }
+ qemu_vfree(buffer);
+ if (ret < 0) {
+ error_report("failed to get lapic");
+ return -1;
+ }
+
+ return 0;
+}
+
+static uint32_t set_apic_delivery_mode(uint32_t reg, uint32_t mode)
+{
+ return ((reg) & ~0x700) | ((mode) << 8);
+}
+
+static int set_vp_state(int cpu_fd, const struct mshv_get_set_vp_state *state)
+{
+ int ret;
+
+ ret = ioctl(cpu_fd, MSHV_SET_VP_STATE, state);
+ if (ret < 0) {
+ error_report("failed to set partition state: %s", strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int set_lapic(int cpu_fd,
+ const struct hv_local_interrupt_controller_state *state)
+{
+ int ret;
+ size_t size = 4096;
+ void *buffer;
+ struct mshv_get_set_vp_state mshv_state = { 0 };
+ if (!state) {
+ error_report("lapic state is NULL");
+ return -1;
+ }
+ /* buffer aligned to 4k, as *state requires that */
+ buffer = qemu_memalign(size, size);
+ memcpy(buffer, state, sizeof(*state));
+
+ mshv_state.buf_ptr = (uint64_t) buffer;
+ mshv_state.buf_sz = size;
+ mshv_state.type = MSHV_VP_STATE_LAPIC;
+
+ ret = set_vp_state(cpu_fd, &mshv_state);
+ qemu_vfree(buffer);
+ if (ret < 0) {
+ error_report("failed to set lapic: %s", strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int set_lint(int cpu_fd)
+{
+ int ret;
+ uint32_t *lvt_lint0, *lvt_lint1;
+
+ struct hv_local_interrupt_controller_state lapic_state = { 0 };
+ ret = get_lapic(cpu_fd, &lapic_state);
+ if (ret < 0) {
+ return ret;
+ }
+
+ lvt_lint0 = &lapic_state.apic_lvt_lint0;
+ *lvt_lint0 = set_apic_delivery_mode(*lvt_lint0, APIC_DM_EXTINT);
+
+ lvt_lint1 = &lapic_state.apic_lvt_lint1;
+ *lvt_lint1 = set_apic_delivery_mode(*lvt_lint1, APIC_DM_NMI);
+
+ /* TODO: should we skip setting lapic if the values are the same? */
+
+ return set_lapic(cpu_fd, &lapic_state);
+}
+
/*
* TODO: populate topology info:
*
@@ -495,6 +606,7 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
uint64_t xcr0)
{
int ret;
+ int cpu_fd = mshv_vcpufd(cpu);
ret = set_cpu_state(cpu, fpu, xcr0);
if (ret < 0) {
@@ -502,6 +614,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
return -1;
}
+ ret = set_lint(cpu_fd);
+ if (ret < 0) {
+ error_report("failed to set lapic lint");
+ return -1;
+ }
+
return 0;
}
--
2.34.1
* [PATCH v3 18/26] target/i386/mshv: Register CPUID entries with MSHV
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (16 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 17/26] target/i386/mshv: Set local interrupt controller state Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-27 11:29 ` Daniel P. Berrangé
2025-08-07 14:39 ` [PATCH v3 19/26] target/i386/mshv: Register MSRs " Magnus Kulke
` (9 subsequent siblings)
27 siblings, 1 reply; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Convert the guest CPU's CPUID model into MSHV's format and register it
with the hypervisor. This ensures that the guest observes the correct
CPU feature set during CPUID instructions.
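The collection step can be pictured as walking leaves and, for leaves
that have subleaves, walking subleaf indices until an all-zero result is
seen. Below is a minimal standalone sketch of that idea using a growable
array and a fake CPUID source. Every demo_* name is hypothetical, the
fake data is invented for illustration, and the sketch records the
subleaf index in each entry; it is not the implementation in this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_cpuid_entry {
    uint32_t function, index;
    uint32_t eax, ebx, ecx, edx;
};

struct demo_cpuid_list {
    struct demo_cpuid_entry *entries;
    size_t len, cap;
};

/* Append by pointer so the caller observes the grown list. */
static int demo_list_push(struct demo_cpuid_list *list,
                          struct demo_cpuid_entry e)
{
    if (list->len == list->cap) {
        size_t cap = list->cap ? list->cap * 2 : 8;
        struct demo_cpuid_entry *p = realloc(list->entries,
                                             cap * sizeof(*p));
        if (!p) {
            return -1;
        }
        list->entries = p;
        list->cap = cap;
    }
    list->entries[list->len++] = e;
    return 0;
}

/* Fake CPUID: leaves 0 and 1 are flat, leaf 4 has three subleaves. */
static void demo_cpuid(uint32_t leaf, uint32_t subleaf, uint32_t out[4])
{
    out[0] = out[1] = out[2] = out[3] = 0;
    if (leaf == 0) {
        out[0] = 4;
    } else if (leaf == 1) {
        out[0] = 0x600;
    } else if (leaf == 4 && subleaf < 3) {
        out[0] = 0x20 + subleaf;
    }
}

static bool demo_all_zero(const uint32_t r[4])
{
    return r[0] == 0 && r[1] == 0 && r[2] == 0 && r[3] == 0;
}

static int demo_collect(struct demo_cpuid_list *list, uint32_t max_leaf)
{
    uint32_t r[4];

    for (uint32_t leaf = 0; leaf <= max_leaf; leaf++) {
        bool has_subleaves = (leaf == 4);
        uint32_t max_subleaf = has_subleaves ? 0x20 : 1;

        for (uint32_t sub = 0; sub < max_subleaf; sub++) {
            demo_cpuid(leaf, sub, r);
            if (demo_all_zero(r)) {
                break; /* empty leaf, or no more subleaves */
            }
            struct demo_cpuid_entry e = {
                .function = leaf, .index = sub,
                .eax = r[0], .ebx = r[1], .ecx = r[2], .edx = r[3],
            };
            if (demo_list_push(list, e) < 0) {
                return -1;
            }
        }
    }
    return 0;
}
```

Passing the list by pointer is a deliberate choice in this sketch: an
append that may reallocate has to hand the new storage back to the
caller somehow.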
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
target/i386/mshv/mshv-cpu.c | 199 ++++++++++++++++++++++++++++++++++++
1 file changed, 199 insertions(+)
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index c233d4af70..0b7350877d 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -324,6 +324,199 @@ int mshv_load_regs(CPUState *cpu)
return 0;
}
+static void add_cpuid_entry(GList *cpuid_entries,
+ uint32_t function, uint32_t index,
+ uint32_t eax, uint32_t ebx,
+ uint32_t ecx, uint32_t edx)
+{
+ struct hv_cpuid_entry *entry;
+
+ entry = g_malloc0(sizeof(struct hv_cpuid_entry));
+ entry->function = function;
+ entry->index = index;
+ entry->eax = eax;
+ entry->ebx = ebx;
+ entry->ecx = ecx;
+ entry->edx = edx;
+
+ cpuid_entries = g_list_append(cpuid_entries, entry);
+}
+
+static void collect_cpuid_entries(const CPUState *cpu, GList *cpuid_entries)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ uint32_t eax, ebx, ecx, edx;
+ uint32_t leaf, subleaf;
+ size_t max_leaf = 0x1F;
+ size_t max_subleaf = 0x20;
+
+ uint32_t leaves_with_subleaves[] = {0x4, 0x7, 0xD, 0xF, 0x10};
+ int n_subleaf_leaves = ARRAY_SIZE(leaves_with_subleaves);
+
+ /* Regular leaves without subleaves */
+ for (leaf = 0; leaf <= max_leaf; leaf++) {
+ bool has_subleaves = false;
+ for (int i = 0; i < n_subleaf_leaves; i++) {
+ if (leaf == leaves_with_subleaves[i]) {
+ has_subleaves = true;
+ break;
+ }
+ }
+
+ if (!has_subleaves) {
+ cpu_x86_cpuid(env, leaf, 0, &eax, &ebx, &ecx, &edx);
+ if (eax == 0 && ebx == 0 && ecx == 0 && edx == 0) {
+ /* all zeroes: leaf is not populated, skip it */
+ continue;
+ }
+
+ add_cpuid_entry(cpuid_entries, leaf, 0, eax, ebx, ecx, edx);
+ continue;
+ }
+
+ subleaf = 0;
+ while (subleaf < max_subleaf) {
+ cpu_x86_cpuid(env, leaf, subleaf, &eax, &ebx, &ecx, &edx);
+
+ if (eax == 0 && ebx == 0 && ecx == 0 && edx == 0) {
+ /* all zeroes indicates no more subleaves */
+ break;
+ }
+ add_cpuid_entry(cpuid_entries, leaf, subleaf, eax, ebx, ecx, edx);
+ subleaf++;
+ }
+ }
+}
+
+static int register_intercept_result_cpuid_entry(int cpu_fd,
+ uint8_t subleaf_specific,
+ uint8_t always_override,
+ struct hv_cpuid_entry *entry)
+{
+ struct hv_register_x64_cpuid_result_parameters cpuid_params = {
+ .input.eax = entry->function,
+ .input.ecx = entry->index,
+ .input.subleaf_specific = subleaf_specific,
+ .input.always_override = always_override,
+ .input.padding = 0,
+ /*
+ * With regard to masks - these are to specify bits to be overwritten
+ * The current CpuidEntry structure wouldn't allow to carry the masks
+ * in addition to the actual register values. For this reason, the
+ * masks are set to the exact values of the corresponding register bits
+ * to be registered for an overwrite. To view resulting values the
+ * hypervisor would return, HvCallGetVpCpuidValues hypercall can be
+ * used.
+ */
+ .result.eax = entry->eax,
+ .result.eax_mask = entry->eax,
+ .result.ebx = entry->ebx,
+ .result.ebx_mask = entry->ebx,
+ .result.ecx = entry->ecx,
+ .result.ecx_mask = entry->ecx,
+ .result.edx = entry->edx,
+ .result.edx_mask = entry->edx,
+ };
+ union hv_register_intercept_result_parameters parameters = {
+ .cpuid = cpuid_params,
+ };
+ struct mshv_register_intercept_result args = {
+ .intercept_type = HV_INTERCEPT_TYPE_X64_CPUID,
+ .parameters = parameters,
+ };
+ int ret;
+
+ ret = ioctl(cpu_fd, MSHV_VP_REGISTER_INTERCEPT_RESULT, &args);
+ if (ret < 0) {
+ error_report("failed to register intercept result for cpuid: %s",
+ strerror(errno));
+ return -1;
+ }
+
+ return 0;
+}
+
+static int register_intercept_result_cpuid(int cpu_fd, struct hv_cpuid *cpuid)
+{
+ int ret = 0, entry_ret;
+ struct hv_cpuid_entry *entry;
+ uint8_t subleaf_specific, always_override;
+
+ for (size_t i = 0; i < cpuid->nent; i++) {
+ entry = &cpuid->entries[i];
+
+ /* set defaults */
+ subleaf_specific = 0;
+ always_override = 1;
+
+ /* Intel */
+ /* 0xb - Extended Topology Enumeration Leaf */
+ /* 0x1f - V2 Extended Topology Enumeration Leaf */
+ /* AMD */
+ /* 0x8000_001e - Processor Topology Information */
+ /* 0x8000_0026 - Extended CPU Topology */
+ if (entry->function == 0xb
+ || entry->function == 0x1f
+ || entry->function == 0x8000001e
+ || entry->function == 0x80000026) {
+ subleaf_specific = 1;
+ always_override = 1;
+ } else if (entry->function == 0x00000001
+ || entry->function == 0x80000000
+ || entry->function == 0x80000001
+ || entry->function == 0x80000008) {
+ subleaf_specific = 0;
+ always_override = 1;
+ }
+
+ entry_ret = register_intercept_result_cpuid_entry(cpu_fd,
+ subleaf_specific,
+ always_override,
+ entry);
+ if ((entry_ret < 0) && (ret == 0)) {
+ ret = entry_ret;
+ }
+ }
+
+ return ret;
+}
+
+static int set_cpuid2(const CPUState *cpu)
+{
+ int ret;
+ size_t n_entries, cpuid_size;
+ struct hv_cpuid *cpuid;
+ struct hv_cpuid_entry *entry;
+ GList *entries = NULL;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ collect_cpuid_entries(cpu, entries);
+ n_entries = g_list_length(entries);
+
+ cpuid_size = sizeof(struct hv_cpuid)
+ + n_entries * sizeof(struct hv_cpuid_entry);
+
+ cpuid = g_malloc0(cpuid_size);
+ cpuid->nent = n_entries;
+ cpuid->padding = 0;
+
+ for (size_t i = 0; i < n_entries; i++) {
+ entry = g_list_nth_data(entries, i);
+ cpuid->entries[i] = *entry;
+ g_free(entry);
+ }
+ g_list_free(entries);
+
+ ret = register_intercept_result_cpuid(cpu_fd, cpuid);
+ g_free(cpuid);
+ if (ret < 0) {
+ return ret;
+ }
+
+ return 0;
+}
+
static inline void populate_hv_segment_reg(SegmentCache *seg,
hv_x64_segment_register *hv_reg)
{
@@ -608,6 +801,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
int ret;
int cpu_fd = mshv_vcpufd(cpu);
+ ret = set_cpuid2(cpu);
+ if (ret < 0) {
+ error_report("failed to set cpuid");
+ return -1;
+ }
+
ret = set_cpu_state(cpu, fpu, xcr0);
if (ret < 0) {
error_report("failed to set cpu state");
--
2.34.1
* Re: [PATCH v3 18/26] target/i386/mshv: Register CPUID entries with MSHV
2025-08-07 14:39 ` [PATCH v3 18/26] target/i386/mshv: Register CPUID entries with MSHV Magnus Kulke
@ 2025-08-27 11:29 ` Daniel P. Berrangé
2025-09-16 10:49 ` Magnus Kulke
0 siblings, 1 reply; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 11:29 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:43PM +0200, Magnus Kulke wrote:
> Convert the guest CPU's CPUID model into MSHV's format and register it
> with the hypervisor. This ensures that the guest observes the correct
> CPU feature set during CPUID instructions.
QEMU supports a variety of CPU models. '-cpu host' is intended to
expose every possible feature that the underlying hypervisor can
support, while '-cpu $NAME' exposes certain named CPU models.
Also KVM will force enable certain features that it can either
unconditionally emulate, or requires to always be present.
Are you aware if there are any noteworthy differences / restrictions
in the use of CPU models for MSHV that would not be present for
KVM, or vice versa? I'm particularly wondering if there is
anything special libvirt needs to be aware of - most of what
libvirt does it gets via the QMP query-cpu-XXXX commands.
>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> target/i386/mshv/mshv-cpu.c | 199 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 199 insertions(+)
>
> diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
> index c233d4af70..0b7350877d 100644
> --- a/target/i386/mshv/mshv-cpu.c
> +++ b/target/i386/mshv/mshv-cpu.c
> @@ -324,6 +324,199 @@ int mshv_load_regs(CPUState *cpu)
> return 0;
> }
>
> +static void add_cpuid_entry(GList *cpuid_entries,
> + uint32_t function, uint32_t index,
> + uint32_t eax, uint32_t ebx,
> + uint32_t ecx, uint32_t edx)
> +{
> + struct hv_cpuid_entry *entry;
> +
> + entry = g_malloc0(sizeof(struct hv_cpuid_entry));
> + entry->function = function;
> + entry->index = index;
> + entry->eax = eax;
> + entry->ebx = ebx;
> + entry->ecx = ecx;
> + entry->edx = edx;
> +
> + cpuid_entries = g_list_append(cpuid_entries, entry);
> +}
> +
> +static void collect_cpuid_entries(const CPUState *cpu, GList *cpuid_entries)
> +{
> + X86CPU *x86_cpu = X86_CPU(cpu);
> + CPUX86State *env = &x86_cpu->env;
> + uint32_t eax, ebx, ecx, edx;
> + uint32_t leaf, subleaf;
> + size_t max_leaf = 0x1F;
> + size_t max_subleaf = 0x20;
> +
> + uint32_t leaves_with_subleaves[] = {0x4, 0x7, 0xD, 0xF, 0x10};
> + int n_subleaf_leaves = ARRAY_SIZE(leaves_with_subleaves);
> +
> + /* Regular leaves without subleaves */
> + for (leaf = 0; leaf <= max_leaf; leaf++) {
> + bool has_subleaves = false;
> + for (int i = 0; i < n_subleaf_leaves; i++) {
> + if (leaf == leaves_with_subleaves[i]) {
> + has_subleaves = true;
> + break;
> + }
> + }
> +
> + if (!has_subleaves) {
> + cpu_x86_cpuid(env, leaf, 0, &eax, &ebx, &ecx, &edx);
> + if (eax == 0 && ebx == 0 && ecx == 0 && edx == 0) {
> + /* all zeroes indicates no more leaves */
> + continue;
> + }
> +
> + add_cpuid_entry(cpuid_entries, leaf, 0, eax, ebx, ecx, edx);
> + continue;
> + }
> +
> + subleaf = 0;
> + while (subleaf < max_subleaf) {
> + cpu_x86_cpuid(env, leaf, subleaf, &eax, &ebx, &ecx, &edx);
> +
> + if (eax == 0 && ebx == 0 && ecx == 0 && edx == 0) {
> + /* all zeroes indicates no more leaves */
> + break;
> + }
> + add_cpuid_entry(cpuid_entries, leaf, 0, eax, ebx, ecx, edx);
> + subleaf++;
> + }
> + }
> +}
> +
> +static int register_intercept_result_cpuid_entry(int cpu_fd,
> + uint8_t subleaf_specific,
> + uint8_t always_override,
> + struct hv_cpuid_entry *entry)
> +{
> + struct hv_register_x64_cpuid_result_parameters cpuid_params = {
> + .input.eax = entry->function,
> + .input.ecx = entry->index,
> + .input.subleaf_specific = subleaf_specific,
> + .input.always_override = always_override,
> + .input.padding = 0,
> + /*
> + * With regard to masks - these are to specify bits to be overwritten
> + * The current CpuidEntry structure wouldn't allow to carry the masks
> + * in addition to the actual register values. For this reason, the
> + * masks are set to the exact values of the corresponding register bits
> + * to be registered for an overwrite. To view resulting values the
> + * hypervisor would return, HvCallGetVpCpuidValues hypercall can be
> + * used.
> + */
> + .result.eax = entry->eax,
> + .result.eax_mask = entry->eax,
> + .result.ebx = entry->ebx,
> + .result.ebx_mask = entry->ebx,
> + .result.ecx = entry->ecx,
> + .result.ecx_mask = entry->ecx,
> + .result.edx = entry->edx,
> + .result.edx_mask = entry->edx,
> + };
> + union hv_register_intercept_result_parameters parameters = {
> + .cpuid = cpuid_params,
> + };
> + struct mshv_register_intercept_result args = {
> + .intercept_type = HV_INTERCEPT_TYPE_X64_CPUID,
> + .parameters = parameters,
> + };
> + int ret;
> +
> + ret = ioctl(cpu_fd, MSHV_VP_REGISTER_INTERCEPT_RESULT, &args);
> + if (ret < 0) {
> + error_report("failed to register intercept result for cpuid: %s",
> + strerror(errno));
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int register_intercept_result_cpuid(int cpu_fd, struct hv_cpuid *cpuid)
> +{
> + int ret = 0, entry_ret;
> + struct hv_cpuid_entry *entry;
> + uint8_t subleaf_specific, always_override;
> +
> + for (size_t i = 0; i < cpuid->nent; i++) {
> + entry = &cpuid->entries[i];
> +
> + /* set defaults */
> + subleaf_specific = 0;
> + always_override = 1;
> +
> + /* Intel */
> + /* 0xb - Extended Topology Enumeration Leaf */
> + /* 0x1f - V2 Extended Topology Enumeration Leaf */
> + /* AMD */
> + /* 0x8000_001e - Processor Topology Information */
> + /* 0x8000_0026 - Extended CPU Topology */
> + if (entry->function == 0xb
> + || entry->function == 0x1f
> + || entry->function == 0x8000001e
> + || entry->function == 0x80000026) {
> + subleaf_specific = 1;
> + always_override = 1;
> + } else if (entry->function == 0x00000001
> + || entry->function == 0x80000000
> + || entry->function == 0x80000001
> + || entry->function == 0x80000008) {
> + subleaf_specific = 0;
> + always_override = 1;
> + }
> +
> + entry_ret = register_intercept_result_cpuid_entry(cpu_fd,
> + subleaf_specific,
> + always_override,
> + entry);
> + if ((entry_ret < 0) && (ret == 0)) {
> + ret = entry_ret;
> + }
> + }
> +
> + return ret;
> +}
> +
> +static int set_cpuid2(const CPUState *cpu)
> +{
> + int ret;
> + size_t n_entries, cpuid_size;
> + struct hv_cpuid *cpuid;
> + struct hv_cpuid_entry *entry;
> + GList *entries = NULL;
> + int cpu_fd = mshv_vcpufd(cpu);
> +
> + collect_cpuid_entries(cpu, entries);
> + n_entries = g_list_length(entries);
> +
> + cpuid_size = sizeof(struct hv_cpuid)
> + + n_entries * sizeof(struct hv_cpuid_entry);
> +
> + cpuid = g_malloc0(cpuid_size);
> + cpuid->nent = n_entries;
> + cpuid->padding = 0;
> +
> + for (size_t i = 0; i < n_entries; i++) {
> + entry = g_list_nth_data(entries, i);
> + cpuid->entries[i] = *entry;
> + g_free(entry);
> + }
> + g_list_free(entries);
> +
> + ret = register_intercept_result_cpuid(cpu_fd, cpuid);
> + g_free(cpuid);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> static inline void populate_hv_segment_reg(SegmentCache *seg,
> hv_x64_segment_register *hv_reg)
> {
> @@ -608,6 +801,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
> int ret;
> int cpu_fd = mshv_vcpufd(cpu);
>
> + ret = set_cpuid2(cpu);
> + if (ret < 0) {
> + error_report("failed to set cpuid");
> + return -1;
> + }
> +
> ret = set_cpu_state(cpu, fpu, xcr0);
> if (ret < 0) {
> error_report("failed to set cpu state");
> --
> 2.34.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v3 18/26] target/i386/mshv: Register CPUID entries with MSHV
2025-08-27 11:29 ` Daniel P. Berrangé
@ 2025-09-16 10:49 ` Magnus Kulke
0 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-09-16 10:49 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Wed, Aug 27, 2025 at 12:29:25PM +0100, Daniel P. Berrangé wrote:
> QEMU supports a variety of CPU models. '-cpu host' is intended to
> expose every possible feature that the underlying hypervisor can
> support, while '-cpu $NAME' exposes certain named CPU models.
>
> Also KVM will force enable certain features that it can either
> unconditionally emulate, or requires to always be present.
>
> Are you aware if there are any noteworthy differences / restrictions
> in the use of CPU models for MSHV that would not be present for
> KVM, or vice versa? I'm particularly wondering if there is
> anything special libvirt needs to be aware of - most of what
> libvirt does it gets via the QMP query-cpu-XXXX commands.
>
The current cpuid impl is rather simple/unopinionated at this point. We
will probably iterate on it in the future (e.g. include synthetic
responses). In principle it should behave similarly to the KVM accel:
-cpu host reflects the cpuid of the host CPU (i.e. dom0/root
partition running on Hyper-V). We gather those values from
QEMU and register them with the hypervisor.
-cpu $MODEL should work similarly. The CPUID/MSR values of the
QEMU-supplied model definitions are registered with Hyper-V. In case of
an unsupported feature the registration would fail.
What the MSHV driver currently doesn't provide is something similar to
KVM's KVM_GET_SUPPORTED_CPUID ioctl, so we do not currently force-enable
or silently mask cpuid bits beyond what the CPU model requests.
I'm not aware of any implications for libvirt and QMP that we would need
to take into account wrt cpuid.
* [PATCH v3 19/26] target/i386/mshv: Register MSRs with MSHV
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (17 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 18/26] target/i386/mshv: Register CPUID entries with MSHV Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 20/26] target/i386/mshv: Integrate x86 instruction decoder/emulator Magnus Kulke
` (8 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Build and register the guest vCPU's model-specific registers using
the MSHV interface.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
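Note on the sorted lookup in msr.c: the table is sorted once and then
searched with bsearch(). A comparator that returns the difference of two
uint32_t values can report the wrong sign when the indices are far apart
(for example an index in the 0xC0000000 range against a small one),
because the unsigned difference is converted to int. The following is a
minimal sketch of a subtraction-free comparator, not the code in this
patch. The table contents and the names example_msrs, compare_u32,
sort_example_msrs and is_example_msr are hypothetical and exist only for
the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: arbitrary MSR-like indices, deliberately unsorted. */
static uint32_t example_msrs[] = {
    0xC0000080u, 0x00000010u, 0x40000000u, 0x00000277u, 0xC0000102u,
};
static const size_t example_count =
    sizeof(example_msrs) / sizeof(example_msrs[0]);

/* Three-way compare without subtraction, so the result cannot wrap. */
static int compare_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Sort the table once; bsearch() requires ascending order. */
void sort_example_msrs(void)
{
    qsort(example_msrs, example_count, sizeof(uint32_t), compare_u32);
}

/* Returns non-zero if msr is present in the (already sorted) table. */
int is_example_msr(uint32_t msr)
{
    return bsearch(&msr, example_msrs, example_count, sizeof(uint32_t),
                   compare_u32) != NULL;
}
```

In this sketch sort_example_msrs() has to run once before the first call
to is_example_msr(), mirroring the constructor-based sort in the patch.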
accel/mshv/meson.build | 1 +
accel/mshv/msr.c | 373 ++++++++++++++++++++++++++++++++++++
include/system/mshv.h | 23 +++
target/i386/cpu.h | 2 +
target/i386/mshv/mshv-cpu.c | 33 ++++
5 files changed, 432 insertions(+)
create mode 100644 accel/mshv/msr.c
diff --git a/accel/mshv/meson.build b/accel/mshv/meson.build
index f88fc8678c..d3a2b32581 100644
--- a/accel/mshv/meson.build
+++ b/accel/mshv/meson.build
@@ -2,6 +2,7 @@ mshv_ss = ss.source_set()
mshv_ss.add(if_true: files(
'irq.c',
'mem.c',
+ 'msr.c',
'mshv-all.c'
))
diff --git a/accel/mshv/msr.c b/accel/mshv/msr.c
new file mode 100644
index 0000000000..b477eac14c
--- /dev/null
+++ b/accel/mshv/msr.c
@@ -0,0 +1,373 @@
+/*
+ * QEMU MSHV support
+ *
+ * Copyright Microsoft, Corp. 2025
+ *
+ * Authors: Magnus Kulke <magnuskulke@microsoft.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "system/mshv.h"
+#include "hw/hyperv/hvgdk_mini.h"
+#include "linux/mshv.h"
+#include "qemu/error-report.h"
+
+static uint32_t supported_msrs[64] = {
+ IA32_MSR_TSC,
+ IA32_MSR_EFER,
+ IA32_MSR_KERNEL_GS_BASE,
+ IA32_MSR_APIC_BASE,
+ IA32_MSR_PAT,
+ IA32_MSR_SYSENTER_CS,
+ IA32_MSR_SYSENTER_ESP,
+ IA32_MSR_SYSENTER_EIP,
+ IA32_MSR_STAR,
+ IA32_MSR_LSTAR,
+ IA32_MSR_CSTAR,
+ IA32_MSR_SFMASK,
+ IA32_MSR_MTRR_DEF_TYPE,
+ IA32_MSR_MTRR_PHYSBASE0,
+ IA32_MSR_MTRR_PHYSMASK0,
+ IA32_MSR_MTRR_PHYSBASE1,
+ IA32_MSR_MTRR_PHYSMASK1,
+ IA32_MSR_MTRR_PHYSBASE2,
+ IA32_MSR_MTRR_PHYSMASK2,
+ IA32_MSR_MTRR_PHYSBASE3,
+ IA32_MSR_MTRR_PHYSMASK3,
+ IA32_MSR_MTRR_PHYSBASE4,
+ IA32_MSR_MTRR_PHYSMASK4,
+ IA32_MSR_MTRR_PHYSBASE5,
+ IA32_MSR_MTRR_PHYSMASK5,
+ IA32_MSR_MTRR_PHYSBASE6,
+ IA32_MSR_MTRR_PHYSMASK6,
+ IA32_MSR_MTRR_PHYSBASE7,
+ IA32_MSR_MTRR_PHYSMASK7,
+ IA32_MSR_MTRR_FIX64K_00000,
+ IA32_MSR_MTRR_FIX16K_80000,
+ IA32_MSR_MTRR_FIX16K_A0000,
+ IA32_MSR_MTRR_FIX4K_C0000,
+ IA32_MSR_MTRR_FIX4K_C8000,
+ IA32_MSR_MTRR_FIX4K_D0000,
+ IA32_MSR_MTRR_FIX4K_D8000,
+ IA32_MSR_MTRR_FIX4K_E0000,
+ IA32_MSR_MTRR_FIX4K_E8000,
+ IA32_MSR_MTRR_FIX4K_F0000,
+ IA32_MSR_MTRR_FIX4K_F8000,
+ IA32_MSR_TSC_AUX,
+ IA32_MSR_DEBUG_CTL,
+ HV_X64_MSR_GUEST_OS_ID,
+ HV_X64_MSR_SINT0,
+ HV_X64_MSR_SINT1,
+ HV_X64_MSR_SINT2,
+ HV_X64_MSR_SINT3,
+ HV_X64_MSR_SINT4,
+ HV_X64_MSR_SINT5,
+ HV_X64_MSR_SINT6,
+ HV_X64_MSR_SINT7,
+ HV_X64_MSR_SINT8,
+ HV_X64_MSR_SINT9,
+ HV_X64_MSR_SINT10,
+ HV_X64_MSR_SINT11,
+ HV_X64_MSR_SINT12,
+ HV_X64_MSR_SINT13,
+ HV_X64_MSR_SINT14,
+ HV_X64_MSR_SINT15,
+ HV_X64_MSR_SCONTROL,
+ HV_X64_MSR_SIEFP,
+ HV_X64_MSR_SIMP,
+ HV_X64_MSR_REFERENCE_TSC,
+ HV_X64_MSR_EOM,
+};
+static const size_t msr_count = ARRAY_SIZE(supported_msrs);
+
+static int compare_msr_index(const void *a, const void *b)
+{
+ return *(uint32_t *)a - *(uint32_t *)b;
+}
+
+__attribute__((constructor))
+static void init_sorted_msr_map(void)
+{
+ qsort(supported_msrs, msr_count, sizeof(uint32_t), compare_msr_index);
+}
+
+static int mshv_is_supported_msr(uint32_t msr)
+{
+ return bsearch(&msr, supported_msrs, msr_count, sizeof(uint32_t),
+ compare_msr_index) != NULL;
+}
+
+static int mshv_msr_to_hv_reg_name(uint32_t msr, uint32_t *hv_reg)
+{
+ switch (msr) {
+ case IA32_MSR_TSC:
+ *hv_reg = HV_X64_REGISTER_TSC;
+ return 0;
+ case IA32_MSR_EFER:
+ *hv_reg = HV_X64_REGISTER_EFER;
+ return 0;
+ case IA32_MSR_KERNEL_GS_BASE:
+ *hv_reg = HV_X64_REGISTER_KERNEL_GS_BASE;
+ return 0;
+ case IA32_MSR_APIC_BASE:
+ *hv_reg = HV_X64_REGISTER_APIC_BASE;
+ return 0;
+ case IA32_MSR_PAT:
+ *hv_reg = HV_X64_REGISTER_PAT;
+ return 0;
+ case IA32_MSR_SYSENTER_CS:
+ *hv_reg = HV_X64_REGISTER_SYSENTER_CS;
+ return 0;
+ case IA32_MSR_SYSENTER_ESP:
+ *hv_reg = HV_X64_REGISTER_SYSENTER_ESP;
+ return 0;
+ case IA32_MSR_SYSENTER_EIP:
+ *hv_reg = HV_X64_REGISTER_SYSENTER_EIP;
+ return 0;
+ case IA32_MSR_STAR:
+ *hv_reg = HV_X64_REGISTER_STAR;
+ return 0;
+ case IA32_MSR_LSTAR:
+ *hv_reg = HV_X64_REGISTER_LSTAR;
+ return 0;
+ case IA32_MSR_CSTAR:
+ *hv_reg = HV_X64_REGISTER_CSTAR;
+ return 0;
+ case IA32_MSR_SFMASK:
+ *hv_reg = HV_X64_REGISTER_SFMASK;
+ return 0;
+ case IA32_MSR_MTRR_CAP:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_CAP;
+ return 0;
+ case IA32_MSR_MTRR_DEF_TYPE:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_DEF_TYPE;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE0:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE0;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK0:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK0;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE1:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE1;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK1:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK1;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE2:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE2;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK2:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK2;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE3:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE3;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK3:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK3;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE4:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE4;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK4:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK4;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE5:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE5;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK5:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK5;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE6:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE6;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK6:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK6;
+ return 0;
+ case IA32_MSR_MTRR_PHYSBASE7:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_BASE7;
+ return 0;
+ case IA32_MSR_MTRR_PHYSMASK7:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_PHYS_MASK7;
+ return 0;
+ case IA32_MSR_MTRR_FIX64K_00000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX64K00000;
+ return 0;
+ case IA32_MSR_MTRR_FIX16K_80000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX16K80000;
+ return 0;
+ case IA32_MSR_MTRR_FIX16K_A0000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX16KA0000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_C0000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KC0000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_C8000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KC8000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_D0000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KD0000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_D8000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KD8000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_E0000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KE0000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_E8000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KE8000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_F0000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KF0000;
+ return 0;
+ case IA32_MSR_MTRR_FIX4K_F8000:
+ *hv_reg = HV_X64_REGISTER_MSR_MTRR_FIX4KF8000;
+ return 0;
+ case IA32_MSR_TSC_AUX:
+ *hv_reg = HV_X64_REGISTER_TSC_AUX;
+ return 0;
+ case IA32_MSR_BNDCFGS:
+ *hv_reg = HV_X64_REGISTER_BNDCFGS;
+ return 0;
+ case IA32_MSR_DEBUG_CTL:
+ *hv_reg = HV_X64_REGISTER_DEBUG_CTL;
+ return 0;
+ case IA32_MSR_TSC_ADJUST:
+ *hv_reg = HV_X64_REGISTER_TSC_ADJUST;
+ return 0;
+ case IA32_MSR_SPEC_CTRL:
+ *hv_reg = HV_X64_REGISTER_SPEC_CTRL;
+ return 0;
+ case HV_X64_MSR_GUEST_OS_ID:
+ *hv_reg = HV_REGISTER_GUEST_OS_ID;
+ return 0;
+ case HV_X64_MSR_SINT0:
+ *hv_reg = HV_REGISTER_SINT0;
+ return 0;
+ case HV_X64_MSR_SINT1:
+ *hv_reg = HV_REGISTER_SINT1;
+ return 0;
+ case HV_X64_MSR_SINT2:
+ *hv_reg = HV_REGISTER_SINT2;
+ return 0;
+ case HV_X64_MSR_SINT3:
+ *hv_reg = HV_REGISTER_SINT3;
+ return 0;
+ case HV_X64_MSR_SINT4:
+ *hv_reg = HV_REGISTER_SINT4;
+ return 0;
+ case HV_X64_MSR_SINT5:
+ *hv_reg = HV_REGISTER_SINT5;
+ return 0;
+ case HV_X64_MSR_SINT6:
+ *hv_reg = HV_REGISTER_SINT6;
+ return 0;
+ case HV_X64_MSR_SINT7:
+ *hv_reg = HV_REGISTER_SINT7;
+ return 0;
+ case HV_X64_MSR_SINT8:
+ *hv_reg = HV_REGISTER_SINT8;
+ return 0;
+ case HV_X64_MSR_SINT9:
+ *hv_reg = HV_REGISTER_SINT9;
+ return 0;
+ case HV_X64_MSR_SINT10:
+ *hv_reg = HV_REGISTER_SINT10;
+ return 0;
+ case HV_X64_MSR_SINT11:
+ *hv_reg = HV_REGISTER_SINT11;
+ return 0;
+ case HV_X64_MSR_SINT12:
+ *hv_reg = HV_REGISTER_SINT12;
+ return 0;
+ case HV_X64_MSR_SINT13:
+ *hv_reg = HV_REGISTER_SINT13;
+ return 0;
+ case HV_X64_MSR_SINT14:
+ *hv_reg = HV_REGISTER_SINT14;
+ return 0;
+ case HV_X64_MSR_SINT15:
+ *hv_reg = HV_REGISTER_SINT15;
+ return 0;
+ case IA32_MSR_MISC_ENABLE:
+ *hv_reg = HV_X64_REGISTER_MSR_IA32_MISC_ENABLE;
+ return 0;
+ case HV_X64_MSR_SCONTROL:
+ *hv_reg = HV_REGISTER_SCONTROL;
+ return 0;
+ case HV_X64_MSR_SIEFP:
+ *hv_reg = HV_REGISTER_SIEFP;
+ return 0;
+ case HV_X64_MSR_SIMP:
+ *hv_reg = HV_REGISTER_SIMP;
+ return 0;
+ case HV_X64_MSR_REFERENCE_TSC:
+ *hv_reg = HV_REGISTER_REFERENCE_TSC;
+ return 0;
+ case HV_X64_MSR_EOM:
+ *hv_reg = HV_REGISTER_EOM;
+ return 0;
+ default:
+ error_report("failed to map MSR %u to HV register name", msr);
+ return -1;
+ }
+}
+
+static int set_msrs(int cpu_fd, GList *msrs)
+{
+ size_t n_msrs;
+ GList *entries;
+ MshvMsrEntry *entry;
+ enum hv_register_name name;
+ struct hv_register_assoc *assoc;
+ int ret;
+ size_t i = 0;
+
+ n_msrs = g_list_length(msrs);
+ hv_register_assoc *assocs = g_new0(hv_register_assoc, n_msrs);
+
+ entries = msrs;
+ for (const GList *elem = entries; elem != NULL; elem = elem->next) {
+ entry = elem->data;
+ ret = mshv_msr_to_hv_reg_name(entry->index, &name);
+ if (ret < 0) {
+ g_free(assocs);
+ return ret;
+ }
+ assoc = &assocs[i];
+ assoc->name = name;
+ /* the union has been initialized to 0 */
+ assoc->value.reg64 = entry->data;
+ i++;
+ }
+ ret = mshv_set_generic_regs(cpu_fd, assocs, n_msrs);
+ g_free(assocs);
+ if (ret < 0) {
+ error_report("failed to set msrs");
+ return -1;
+ }
+ return 0;
+}
+
+
+int mshv_configure_msr(int cpu_fd, const MshvMsrEntry *msrs, size_t n_msrs)
+{
+ GList *valid_msrs = NULL;
+ uint32_t msr_index;
+ int ret;
+
+ for (size_t i = 0; i < n_msrs; i++) {
+ msr_index = msrs[i].index;
+ /* check whether index of msrs is in SUPPORTED_MSRS */
+ if (mshv_is_supported_msr(msr_index)) {
+ valid_msrs = g_list_append(valid_msrs, (void *) &msrs[i]);
+ }
+ }
+
+ ret = set_msrs(cpu_fd, valid_msrs);
+ g_list_free(valid_msrs);
+
+ return ret;
+}
diff --git a/include/system/mshv.h b/include/system/mshv.h
index ec41e62315..7f2a7dcb8a 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -38,6 +38,8 @@ typedef struct hyperv_message hv_message;
#define MSHV_PAGE_SHIFT 12
+#define MSHV_MSR_ENTRIES_COUNT 64
+
#ifdef CONFIG_MSHV_IS_POSSIBLE
extern bool mshv_allowed;
#define mshv_enabled() (mshv_allowed)
@@ -119,8 +121,29 @@ void mshv_arch_amend_proc_features(
union hv_partition_synthetic_processor_features *features);
int mshv_arch_post_init_vm(int vm_fd);
+/* pio */
+int mshv_pio_write(uint64_t port, const uint8_t *data, uintptr_t size,
+ bool is_secure_mode);
+void mshv_pio_read(uint64_t port, uint8_t *data, uintptr_t size,
+ bool is_secure_mode);
+
+/* generic */
int mshv_hvcall(int mshv_fd, const struct mshv_root_hvcall *args);
+/* msr */
+typedef struct MshvMsrEntry {
+ uint32_t index;
+ uint32_t reserved;
+ uint64_t data;
+} MshvMsrEntry;
+
+typedef struct MshvMsrEntries {
+ MshvMsrEntry entries[MSHV_MSR_ENTRIES_COUNT];
+ uint32_t nmsrs;
+} MshvMsrEntries;
+
+int mshv_configure_msr(int cpu_fd, const MshvMsrEntry *msrs, size_t n_msrs);
+
/* memory */
typedef struct MshvMemoryRegion {
uint64_t guest_phys_addr;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 6d3d2b1440..05197ab0b3 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -435,9 +435,11 @@ typedef enum X86Seg {
#define MSR_SMI_COUNT 0x34
#define MSR_CORE_THREAD_COUNT 0x35
#define MSR_MTRRcap 0xfe
+#define MSR_MTRR_MEM_TYPE_WB 0x06
#define MSR_MTRRcap_VCNT 8
#define MSR_MTRRcap_FIXRANGE_SUPPORT (1 << 8)
#define MSR_MTRRcap_WC_SUPPORTED (1 << 10)
+#define MSR_MTRR_ENABLE (1 << 11)
#define MSR_IA32_SYSENTER_CS 0x174
#define MSR_IA32_SYSENTER_ESP 0x175
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 0b7350877d..c2c7934343 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -788,6 +788,33 @@ static int set_lint(int cpu_fd)
return set_lapic(cpu_fd, &lapic_state);
}
+static int setup_msrs(int cpu_fd)
+{
+ int ret;
+ uint64_t default_type = MSR_MTRR_ENABLE | MSR_MTRR_MEM_TYPE_WB;
+
+ /* boot msr entries */
+ MshvMsrEntry msrs[9] = {
+ { .index = IA32_MSR_SYSENTER_CS, .data = 0x0, },
+ { .index = IA32_MSR_SYSENTER_ESP, .data = 0x0, },
+ { .index = IA32_MSR_SYSENTER_EIP, .data = 0x0, },
+ { .index = IA32_MSR_STAR, .data = 0x0, },
+ { .index = IA32_MSR_CSTAR, .data = 0x0, },
+ { .index = IA32_MSR_LSTAR, .data = 0x0, },
+ { .index = IA32_MSR_KERNEL_GS_BASE, .data = 0x0, },
+ { .index = IA32_MSR_SFMASK, .data = 0x0, },
+ { .index = IA32_MSR_MTRR_DEF_TYPE, .data = default_type, },
+ };
+
+ ret = mshv_configure_msr(cpu_fd, msrs, 9);
+ if (ret < 0) {
+ error_report("failed to setup msrs");
+ return -1;
+ }
+
+ return 0;
+}
+
/*
* TODO: populate topology info:
*
@@ -807,6 +834,12 @@ int mshv_configure_vcpu(const CPUState *cpu, const struct MshvFPU *fpu,
return -1;
}
+ ret = setup_msrs(cpu_fd);
+ if (ret < 0) {
+ error_report("failed to setup msrs");
+ return -1;
+ }
+
ret = set_cpu_state(cpu, fpu, xcr0);
if (ret < 0) {
error_report("failed to set cpu state");
--
2.34.1
* [PATCH v3 20/26] target/i386/mshv: Integrate x86 instruction decoder/emulator
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (18 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 19/26] target/i386/mshv: Register MSRs " Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 21/26] target/i386/mshv: Write MSRs to the hypervisor Magnus Kulke
` (7 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Connect the x86 instruction decoder and emulator to the MSHV backend
to handle intercepted instructions. This enables software emulation
of MMIO operations in MSHV guests. MSHV has a translate_gva hypercall
that is used to access the guest's physical memory.
A guest might read from unmapped memory regions (e.g. OVMF will probe
0xfed40000 for a vTPM). In those cases 0xFF bytes are returned instead of
aborting the execution.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
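Note on the unmapped-read fallback: the commit message says that reads
from unmapped regions yield 0xFF bytes instead of aborting. The following
is a minimal standalone sketch of that behaviour, assuming the same 1 to
8 byte bound that the patch checks. The function name fill_unmapped_read
is hypothetical, and the sketch leaves out the QEMU reporting and tracing
calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration: satisfy a read that no device
 * or RAM region claimed by returning all-ones bytes to the caller.
 */
int fill_unmapped_read(uint8_t *data, size_t size)
{
    /* Reject empty accesses and anything wider than 8 bytes. */
    if (size == 0 || size > 8) {
        return -1;
    }

    /* Every requested byte reads back as 0xFF. */
    memset(data, 0xFF, size);
    return 0;
}
```

A guest probing an absent device therefore sees all ones, which firmware
can treat as "nothing here", rather than the VM being torn down.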
accel/mshv/mem.c | 65 +++++++++++++++++++
accel/mshv/mshv-all.c | 2 +-
include/system/mshv.h | 6 +-
target/i386/mshv/mshv-cpu.c | 126 +++++++++++++++++++++++++++++++++++-
4 files changed, 196 insertions(+), 3 deletions(-)
diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index 8039f35680..150fb723af 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -58,6 +58,71 @@ static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool map)
 return set_guest_memory(vm_fd, &region);
}
+static int handle_unmapped_mmio_region_read(uint64_t gpa, uint64_t size,
+ uint8_t *data)
+{
+ warn_report("read from unmapped mmio region gpa=0x%lx size=%lu", gpa, size);
+
+ if (size == 0 || size > 8) {
+ error_report("invalid size %lu for reading from unmapped mmio region",
+ size);
+ return -1;
+ }
+
+ memset(data, 0xFF, size);
+
+ return 0;
+}
+
+int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
+ bool is_secure_mode, bool instruction_fetch)
+{
+ int ret;
+ MemTxAttrs memattr = { .secure = is_secure_mode };
+
+ if (instruction_fetch) {
+ trace_mshv_insn_fetch(gpa, size);
+ } else {
+ trace_mshv_mem_read(gpa, size);
+ }
+
+ ret = address_space_rw(&address_space_memory, gpa, memattr, (void *)data,
+ size, false);
+ if (ret == MEMTX_OK) {
+ return 0;
+ }
+
+ if (ret == MEMTX_DECODE_ERROR) {
+ return handle_unmapped_mmio_region_read(gpa, size, data);
+ }
+
+ error_report("failed to read guest memory at 0x%lx", gpa);
+ return -1;
+}
+
+int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
+ bool is_secure_mode)
+{
+ int ret;
+ MemTxAttrs memattr = { .secure = is_secure_mode };
+
+ trace_mshv_mem_write(gpa, size);
+ ret = address_space_rw(&address_space_memory, gpa, memattr, (void *)data,
+ size, true);
+ if (ret == MEMTX_OK) {
+ return 0;
+ }
+
+ if (ret == MEMTX_DECODE_ERROR) {
+ warn_report("write to unmapped mmio region gpa=0x%lx size=%lu", gpa,
+ size);
+ return 0;
+ }
+
+ error_report("Failed to write guest memory");
+ return -1;
+}
+
static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
{
int ret = 0;
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 65166e82b0..4f4c4b9639 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -431,7 +431,7 @@ static int mshv_init(AccelState *as, MachineState *ms)
return -1;
}
- mshv_init_cpu_logic();
+ mshv_init_mmio_emu();
mshv_init_msicontrol();
diff --git a/include/system/mshv.h b/include/system/mshv.h
index 7f2a7dcb8a..c527acc08c 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -103,7 +103,7 @@ typedef enum MshvVmExit {
MshvVmExitSpecial = 2,
} MshvVmExit;
-void mshv_init_cpu_logic(void);
+void mshv_init_mmio_emu(void);
int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
void mshv_remove_vcpu(int vm_fd, int cpu_fd);
int mshv_configure_vcpu(const CPUState *cpu, const MshvFPU *fpu, uint64_t xcr0);
@@ -154,6 +154,10 @@ typedef struct MshvMemoryRegion {
int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
+int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
+ bool is_secure_mode, bool instruction_fetch);
+int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
+ bool is_secure_mode);
void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
bool add);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index c2c7934343..673c90f865 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -103,6 +103,34 @@ static enum hv_register_name FPU_REGISTER_NAMES[26] = {
HV_X64_REGISTER_XMM_CONTROL_STATUS,
};
+static int translate_gva(int cpu_fd, uint64_t gva, uint64_t *gpa,
+ uint64_t flags)
+{
+ int ret;
+ union hv_translate_gva_result result = { 0 };
+
+ *gpa = 0;
+ mshv_translate_gva args = {
+ .gva = gva,
+ .flags = flags,
+ .gpa = gpa,
+ .result = &result,
+ };
+
+ ret = ioctl(cpu_fd, MSHV_TRANSLATE_GVA, &args);
+ if (ret < 0) {
+ error_report("failed to invoke gva->gpa translation");
+ return -errno;
+ }
+ if (result.result_code != HV_TRANSLATE_GVA_SUCCESS) {
+ error_report("failed to translate gva (" TARGET_FMT_lx ") to gpa", gva);
+ return -1;
+
+ }
+
+ return 0;
+}
+
int mshv_set_generic_regs(int cpu_fd, hv_register_assoc *assocs, size_t n_regs)
{
struct mshv_vp_registers input = {
@@ -922,8 +950,104 @@ int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd)
return 0;
}
-void mshv_init_cpu_logic(void)
+static int guest_mem_read_with_gva(const CPUState *cpu, uint64_t gva,
+ uint8_t *data, uintptr_t size,
+ bool fetch_instruction)
+{
+ int ret;
+ uint64_t gpa, flags;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ flags = HV_TRANSLATE_GVA_VALIDATE_READ;
+ ret = translate_gva(cpu_fd, gva, &gpa, flags);
+ if (ret < 0) {
+ error_report("failed to translate gva to gpa");
+ return -1;
+ }
+
+ ret = mshv_guest_mem_read(gpa, data, size, false, fetch_instruction);
+ if (ret < 0) {
+ error_report("failed to read from guest memory");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int guest_mem_write_with_gva(const CPUState *cpu, uint64_t gva,
+ const uint8_t *data, uintptr_t size)
+{
+ int ret;
+ uint64_t gpa, flags;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
+ ret = translate_gva(cpu_fd, gva, &gpa, flags);
+ if (ret < 0) {
+ error_report("failed to translate gva to gpa");
+ return -1;
+ }
+ ret = mshv_guest_mem_write(gpa, data, size, false);
+ if (ret < 0) {
+ error_report("failed to write to guest memory");
+ return -1;
+ }
+ return 0;
+}
+
+static void write_mem(CPUState *cpu, void *data, target_ulong addr, int bytes)
+{
+ if (guest_mem_write_with_gva(cpu, addr, data, bytes) < 0) {
+ error_report("failed to write memory");
+ abort();
+ }
+}
+
+static void read_mem(CPUState *cpu, void *data, target_ulong addr, int bytes)
+{
+ if (guest_mem_read_with_gva(cpu, addr, data, bytes, false) < 0) {
+ error_report("failed to read memory");
+ abort();
+ }
+}
+
+static void fetch_instruction(CPUState *cpu, void *data,
+ target_ulong addr, int bytes)
+{
+ if (guest_mem_read_with_gva(cpu, addr, data, bytes, true) < 0) {
+ error_report("failed to fetch instruction");
+ abort();
+ }
+}
+
+static void read_segment_descriptor(CPUState *cpu,
+ struct x86_segment_descriptor *desc,
+ enum X86Seg seg_idx)
+{
+ bool ret;
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ SegmentCache *seg = &env->segs[seg_idx];
+ x86_segment_selector sel = { .sel = seg->selector & 0xFFFF };
+
+ ret = x86_read_segment_descriptor(cpu, desc, sel);
+ if (ret == false) {
+ error_report("failed to read segment descriptor");
+ abort();
+ }
+}
+
+static const struct x86_emul_ops mshv_x86_emul_ops = {
+ .fetch_instruction = fetch_instruction,
+ .read_mem = read_mem,
+ .write_mem = write_mem,
+ .read_segment_descriptor = read_segment_descriptor,
+};
+
+void mshv_init_mmio_emu(void)
{
+ init_decoder();
+ init_emu(&mshv_x86_emul_ops);
}
void mshv_arch_init_vcpu(CPUState *cpu)
--
2.34.1
* [PATCH v3 21/26] target/i386/mshv: Write MSRs to the hypervisor
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (19 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 20/26] target/i386/mshv: Integrate x86 instruction decoder/emulator Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 22/26] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
` (6 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Push current model-specific register (MSR) values to MSHV's vCPUs as
part of setting state to the hypervisor.
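As a rough illustration of the table-driven approach this patch takes, the
following self-contained sketch copies (index, value) pairs into
hypervisor-style entries. The Sketch* types and sketch_fill_msr_entries() are
hypothetical stand-ins, not the MshvMsrEntry/MshvMsrEntries definitions or any
MSHV API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (MSR index, value) pairs built from env. */
typedef struct SketchMsrPair {
    uint32_t index;
    uint64_t value;
} SketchMsrPair;

/* Hypothetical stand-in for the entry layout handed to the hypervisor. */
typedef struct SketchMsrEntry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
} SketchMsrEntry;

/*
 * Copy pairs into entries. Returns the number of entries written, or -1
 * if the pairs do not fit into the destination (the capacity check that
 * the patch performs against MSHV_MSR_ENTRIES_COUNT).
 */
static int sketch_fill_msr_entries(const SketchMsrPair *pairs, size_t n_pairs,
                                   SketchMsrEntry *entries, size_t capacity)
{
    assert(pairs && entries);

    if (n_pairs > capacity) {
        return -1;
    }
    for (size_t i = 0; i < n_pairs; i++) {
        entries[i].index = pairs[i].index;
        entries[i].reserved = 0;
        entries[i].data = pairs[i].value;
    }
    return (int)n_pairs;
}
```

Keeping the MSR list in one table means adding a register is a one-line
change, and the bounds check stays in a single place.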
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
target/i386/mshv/mshv-cpu.c | 68 +++++++++++++++++++++++++++++++++++--
1 file changed, 66 insertions(+), 2 deletions(-)
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 673c90f865..431bf83ff9 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -901,6 +901,65 @@ static int put_regs(const CPUState *cpu)
return 0;
}
+struct MsrPair {
+ uint32_t index;
+ uint64_t value;
+};
+
+static int put_msrs(const CPUState *cpu)
+{
+ int ret = 0;
+ X86CPU *x86cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86cpu->env;
+ MshvMsrEntries *msrs = g_malloc0(sizeof(MshvMsrEntries));
+
+ struct MsrPair pairs[] = {
+ { MSR_IA32_SYSENTER_CS, env->sysenter_cs },
+ { MSR_IA32_SYSENTER_ESP, env->sysenter_esp },
+ { MSR_IA32_SYSENTER_EIP, env->sysenter_eip },
+ { MSR_EFER, env->efer },
+ { MSR_PAT, env->pat },
+ { MSR_STAR, env->star },
+ { MSR_CSTAR, env->cstar },
+ { MSR_LSTAR, env->lstar },
+ { MSR_KERNELGSBASE, env->kernelgsbase },
+ { MSR_FMASK, env->fmask },
+ { MSR_MTRRdefType, env->mtrr_deftype },
+ { MSR_VM_HSAVE_PA, env->vm_hsave },
+ { MSR_SMI_COUNT, env->msr_smi_count },
+ { MSR_IA32_PKRS, env->pkrs },
+ { MSR_IA32_BNDCFGS, env->msr_bndcfgs },
+ { MSR_IA32_XSS, env->xss },
+ { MSR_IA32_UMWAIT_CONTROL, env->umwait },
+ { MSR_IA32_TSX_CTRL, env->tsx_ctrl },
+ { MSR_AMD64_TSC_RATIO, env->amd_tsc_scale_msr },
+ { MSR_TSC_AUX, env->tsc_aux },
+ { MSR_TSC_ADJUST, env->tsc_adjust },
+ { MSR_IA32_SMBASE, env->smbase },
+ { MSR_IA32_SPEC_CTRL, env->spec_ctrl },
+ { MSR_VIRT_SSBD, env->virt_ssbd },
+ };
+
+ if (ARRAY_SIZE(pairs) > MSHV_MSR_ENTRIES_COUNT) {
+ error_report("MSR entries exceed maximum size");
+ g_free(msrs);
+ return -1;
+ }
+
+ for (size_t i = 0; i < ARRAY_SIZE(pairs); i++) {
+ MshvMsrEntry *entry = &msrs->entries[i];
+ entry->index = pairs[i].index;
+ entry->reserved = 0;
+ entry->data = pairs[i].value;
+ msrs->nmsrs++;
+ }
+
+ ret = mshv_configure_msr(mshv_vcpufd(cpu), &msrs->entries[0], msrs->nmsrs);
+ g_free(msrs);
+ return ret;
+}
+
+
int mshv_arch_put_registers(const CPUState *cpu)
{
int ret;
@@ -911,8 +970,13 @@ int mshv_arch_put_registers(const CPUState *cpu)
return -1;
}
- error_report("unimplemented");
- abort();
+ ret = put_msrs(cpu);
+ if (ret < 0) {
+ error_report("Failed to put msrs");
+ return -1;
+ }
+
+ return 0;
}
void mshv_arch_amend_proc_features(
--
2.34.1
* [PATCH v3 22/26] target/i386/mshv: Implement mshv_vcpu_run()
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (20 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 21/26] target/i386/mshv: Write MSRs to the hypervisor Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 23/26] accel/mshv: Handle overlapping mem mappings Magnus Kulke
` (5 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Add the main vCPU execution loop for MSHV using the MSHV_RUN_VP ioctl.
A translate_gva() hypercall is implemented. The execution loop handles
guest entry and VM exits. There are handlers for memory r/w, PIO and
MMIO to which the exit events are dispatched.
In case of MMIO the i386 instruction decoder/emulator is invoked to
perform the operation in user space.
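To make the PIO handling more concrete, here is a minimal sketch of the mask
arithmetic that handle_pio_non_str() applies when a non-string IN completes.
The function name sketch_merge_port_in() is hypothetical; the behaviour
(low bytes taken from the port, remaining EAX bits preserved, upper half of
RAX cleared) mirrors the code in this patch rather than any architectural
reference.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Merge `len` bytes read from an I/O port into RAX:
 *  - the low `len` bytes come from port_val,
 *  - the remaining bits of EAX are preserved,
 *  - bits 63:32 of RAX end up cleared.
 * `len` must be 1, 2 or 4, matching the access sizes of IN.
 */
static uint64_t sketch_merge_port_in(uint64_t rax, uint32_t port_val,
                                     unsigned len)
{
    assert(len == 1 || len == 2 || len == 4);

    const uint32_t mask = 0xffffffffu >> (32 - len * 8);
    uint32_t eax = ((uint32_t)rax & ~mask) | (port_val & mask);

    return (uint64_t)eax;
}
```

For example, a one-byte read of 0xAB with RAX = 0x1122334455667788 yields
0x556677AB under these rules.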
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
target/i386/mshv/mshv-cpu.c | 463 +++++++++++++++++++++++++++++++++++-
1 file changed, 461 insertions(+), 2 deletions(-)
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 431bf83ff9..81e9176164 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -985,10 +985,469 @@ void mshv_arch_amend_proc_features(
features->access_guest_idle_reg = 1;
}
+static int set_memory_info(const struct hyperv_message *msg,
+ struct hv_x64_memory_intercept_message *info)
+{
+ if (msg->header.message_type != HVMSG_GPA_INTERCEPT
+ && msg->header.message_type != HVMSG_UNMAPPED_GPA
+ && msg->header.message_type != HVMSG_UNACCEPTED_GPA) {
+ error_report("invalid message type");
+ return -1;
+ }
+ memcpy(info, msg->payload, sizeof(*info));
+
+ return 0;
+}
+
+static int emulate_instruction(CPUState *cpu,
+ const uint8_t *insn_bytes, size_t insn_len,
+ uint64_t gva, uint64_t gpa)
+{
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ struct x86_decode decode = { 0 };
+ int ret;
+ x86_insn_stream stream = { .bytes = insn_bytes, .len = insn_len };
+
+ ret = mshv_load_regs(cpu);
+ if (ret < 0) {
+ error_report("failed to load registers");
+ return -1;
+ }
+
+ decode_instruction_stream(env, &decode, &stream);
+ exec_instruction(env, &decode);
+
+ ret = mshv_store_regs(cpu);
+ if (ret < 0) {
+ error_report("failed to store registers");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int handle_mmio(CPUState *cpu, const struct hyperv_message *msg,
+ MshvVmExit *exit_reason)
+{
+ struct hv_x64_memory_intercept_message info = { 0 };
+ size_t insn_len;
+ uint8_t access_type;
+ uint8_t *instruction_bytes;
+ int ret;
+
+ ret = set_memory_info(msg, &info);
+ if (ret < 0) {
+ error_report("failed to convert message to memory info");
+ return -1;
+ }
+ insn_len = info.instruction_byte_count;
+ access_type = info.header.intercept_access_type;
+
+ if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_EXECUTE) {
+ error_report("invalid intercept access type: execute");
+ return -1;
+ }
+
+ if (insn_len > 16) {
+ error_report("invalid mmio instruction length: %zu", insn_len);
+ return -1;
+ }
+
+ trace_mshv_handle_mmio(info.guest_virtual_address,
+ info.guest_physical_address,
+ info.instruction_byte_count, access_type);
+
+ instruction_bytes = info.instruction_bytes;
+
+ ret = emulate_instruction(cpu, instruction_bytes, insn_len,
+ info.guest_virtual_address,
+ info.guest_physical_address);
+ if (ret < 0) {
+ error_report("failed to emulate mmio");
+ return -1;
+ }
+
+ *exit_reason = MshvVmExitIgnore;
+
+ return 0;
+}
+
+static int set_ioport_info(const struct hyperv_message *msg,
+ hv_x64_io_port_intercept_message *info)
+{
+ if (msg->header.message_type != HVMSG_X64_IO_PORT_INTERCEPT) {
+ error_report("Invalid message type");
+ return -1;
+ }
+ memcpy(info, msg->payload, sizeof(*info));
+
+ return 0;
+}
+
+typedef struct X64Registers {
+ const uint32_t *names;
+ const uint64_t *values;
+ uintptr_t count;
+} X64Registers;
+
+static int set_x64_registers(int cpu_fd, const X64Registers *regs)
+{
+ size_t n_regs = regs->count;
+ struct hv_register_assoc *assocs;
+ int ret;
+
+ assocs = g_new0(hv_register_assoc, n_regs);
+ for (size_t i = 0; i < n_regs; i++) {
+ assocs[i].name = regs->names[i];
+ assocs[i].value.reg64 = regs->values[i];
+ }
+
+ ret = mshv_set_generic_regs(cpu_fd, assocs, n_regs);
+ g_free(assocs);
+ if (ret < 0) {
+ error_report("failed to set x64 registers");
+ return -1;
+ }
+
+ return 0;
+}
+
+static inline MemTxAttrs get_mem_attrs(bool is_secure_mode)
+{
+ MemTxAttrs memattr = {0};
+ memattr.secure = is_secure_mode;
+ return memattr;
+}
+
+static void pio_read(uint64_t port, uint8_t *data, uintptr_t size,
+ bool is_secure_mode)
+{
+ int ret = 0;
+ MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
+ ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
+ false);
+ if (ret != MEMTX_OK) {
+ error_report("Failed to read from port %" PRIx64 ": %d", port, ret);
+ abort();
+ }
+}
+
+static int pio_write(uint64_t port, const uint8_t *data, uintptr_t size,
+ bool is_secure_mode)
+{
+ int ret = 0;
+ MemTxAttrs memattr = get_mem_attrs(is_secure_mode);
+ ret = address_space_rw(&address_space_io, port, memattr, (void *)data, size,
+ true);
+ return ret;
+}
+
+static int handle_pio_non_str(const CPUState *cpu,
+ hv_x64_io_port_intercept_message *info) {
+ size_t len = info->access_info.access_size;
+ uint8_t access_type = info->header.intercept_access_type;
+ int ret;
+ uint32_t val, eax;
+ const uint32_t eax_mask = 0xffffffffu >> (32 - len * 8);
+ size_t insn_len;
+ uint64_t rip, rax;
+ uint32_t reg_names[2];
+ uint64_t reg_values[2];
+ struct X64Registers x64_regs = { 0 };
+ uint16_t port = info->port_number;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
+ union {
+ uint32_t u32;
+ uint8_t bytes[4];
+ } conv;
+
+ /* convert the first 4 bytes of rax to bytes */
+ conv.u32 = (uint32_t)info->rax;
+ /* secure mode is set to false */
+ ret = pio_write(port, conv.bytes, len, false);
+ if (ret != MEMTX_OK) {
+ error_report("Failed to write to io port");
+ return -1;
+ }
+ } else {
+ uint8_t data[4] = { 0 };
+ /* secure mode is set to false */
+ pio_read(info->port_number, data, len, false);
+
+ /* Preserve high bits in EAX, but clear out high bits in RAX */
+ val = *(uint32_t *)data;
+ eax = (((uint32_t)info->rax) & ~eax_mask) | (val & eax_mask);
+ info->rax = (uint64_t)eax;
+ }
+
+ insn_len = info->header.instruction_length;
+
+ /* Advance RIP and update RAX */
+ rip = info->header.rip + insn_len;
+ rax = info->rax;
+
+ reg_names[0] = HV_X64_REGISTER_RIP;
+ reg_values[0] = rip;
+ reg_names[1] = HV_X64_REGISTER_RAX;
+ reg_values[1] = rax;
+
+ x64_regs.names = reg_names;
+ x64_regs.values = reg_values;
+ x64_regs.count = 2;
+
+ ret = set_x64_registers(cpu_fd, &x64_regs);
+ if (ret < 0) {
+ error_report("Failed to set x64 registers");
+ return -1;
+ }
+
+ cpu->accel->dirty = false;
+
+ return 0;
+}
+
+static int fetch_guest_state(CPUState *cpu)
+{
+ int ret;
+
+ ret = mshv_get_standard_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to get standard registers");
+ return -1;
+ }
+
+ ret = mshv_get_special_regs(cpu);
+ if (ret < 0) {
+ error_report("Failed to get special registers");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int read_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
+ uint64_t gva, uint8_t *data, size_t len)
+{
+ int ret;
+ uint64_t gpa, flags;
+
+ if (gva == initial_gva) {
+ gpa = initial_gpa;
+ } else {
+ flags = HV_TRANSLATE_GVA_VALIDATE_READ;
+ ret = translate_gva(cpu_fd, gva, &gpa, flags);
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ ret = mshv_guest_mem_read(gpa, data, len, false, false);
+ if (ret < 0) {
+ error_report("failed to read guest mem");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_memory(int cpu_fd, uint64_t initial_gva, uint64_t initial_gpa,
+ uint64_t gva, const uint8_t *data, size_t len)
+{
+ int ret;
+ uint64_t gpa, flags;
+
+ if (gva == initial_gva) {
+ gpa = initial_gpa;
+ } else {
+ flags = HV_TRANSLATE_GVA_VALIDATE_WRITE;
+ ret = translate_gva(cpu_fd, gva, &gpa, flags);
+ if (ret < 0) {
+ error_report("failed to translate gva to gpa");
+ return -1;
+ }
+ }
+ ret = mshv_guest_mem_write(gpa, data, len, false);
+ if (ret != MEMTX_OK) {
+ error_report("failed to write to mmio");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int handle_pio_str_write(CPUState *cpu,
+ hv_x64_io_port_intercept_message *info,
+ size_t repeat, uint16_t port,
+ bool direction_flag)
+{
+ int ret;
+ uint64_t src;
+ uint8_t data[4] = { 0 };
+ size_t len = info->access_info.access_size;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ src = linear_addr(cpu, info->rsi, R_DS);
+
+ for (size_t i = 0; i < repeat; i++) {
+ ret = read_memory(cpu_fd, 0, 0, src, data, len);
+ if (ret < 0) {
+ error_report("Failed to read memory");
+ return -1;
+ }
+ ret = pio_write(port, data, len, false);
+ if (ret != MEMTX_OK) {
+ error_report("Failed to write to io port");
+ return -1;
+ }
+ src += direction_flag ? -len : len;
+ info->rsi += direction_flag ? -len : len;
+ }
+
+ return 0;
+}
+
+static int handle_pio_str_read(CPUState *cpu,
+ hv_x64_io_port_intercept_message *info,
+ size_t repeat, uint16_t port,
+ bool direction_flag)
+{
+ int ret;
+ uint64_t dst;
+ size_t len = info->access_info.access_size;
+ uint8_t data[4] = { 0 };
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ dst = linear_addr(cpu, info->rdi, R_ES);
+
+ for (size_t i = 0; i < repeat; i++) {
+ pio_read(port, data, len, false);
+
+ ret = write_memory(cpu_fd, 0, 0, dst, data, len);
+ if (ret < 0) {
+ error_report("Failed to write memory");
+ return -1;
+ }
+ dst += direction_flag ? -len : len;
+ info->rdi += direction_flag ? -len : len;
+ }
+
+ return 0;
+}
+
+static int handle_pio_str(CPUState *cpu,
+ hv_x64_io_port_intercept_message *info)
+{
+ uint8_t access_type = info->header.intercept_access_type;
+ uint16_t port = info->port_number;
+ bool repop = info->access_info.rep_prefix == 1;
+ size_t repeat = repop ? info->rcx : 1;
+ size_t insn_len = info->header.instruction_length;
+ bool direction_flag;
+ uint32_t reg_names[3];
+ uint64_t reg_values[3];
+ int ret;
+ struct X64Registers x64_regs = { 0 };
+ X86CPU *x86_cpu = X86_CPU(cpu);
+ CPUX86State *env = &x86_cpu->env;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ ret = fetch_guest_state(cpu);
+ if (ret < 0) {
+ error_report("Failed to fetch guest state");
+ return -1;
+ }
+
+ direction_flag = (env->eflags & DF_MASK) != 0;
+
+ if (access_type == HV_X64_INTERCEPT_ACCESS_TYPE_WRITE) {
+ ret = handle_pio_str_write(cpu, info, repeat, port, direction_flag);
+ if (ret < 0) {
+ error_report("Failed to handle pio str write");
+ return -1;
+ }
+ reg_names[0] = HV_X64_REGISTER_RSI;
+ reg_values[0] = info->rsi;
+ } else {
+ ret = handle_pio_str_read(cpu, info, repeat, port, direction_flag);
+ reg_names[0] = HV_X64_REGISTER_RDI;
+ reg_values[0] = info->rdi;
+ }
+
+ reg_names[1] = HV_X64_REGISTER_RIP;
+ reg_values[1] = info->header.rip + insn_len;
+ reg_names[2] = HV_X64_REGISTER_RAX;
+ reg_values[2] = info->rax;
+
+ x64_regs.names = reg_names;
+ x64_regs.values = reg_values;
+ x64_regs.count = 3;
+
+ ret = set_x64_registers(cpu_fd, &x64_regs);
+ if (ret < 0) {
+ error_report("Failed to set x64 registers");
+ return -1;
+ }
+
+ cpu->accel->dirty = false;
+
+ return 0;
+}
+
+static int handle_pio(CPUState *cpu, const struct hyperv_message *msg)
+{
+ struct hv_x64_io_port_intercept_message info = { 0 };
+ int ret;
+
+ ret = set_ioport_info(msg, &info);
+ if (ret < 0) {
+ error_report("Failed to convert message to ioport info");
+ return -1;
+ }
+
+ if (info.access_info.string_op) {
+ return handle_pio_str(cpu, &info);
+ }
+
+ return handle_pio_non_str(cpu, &info);
+}
+
int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
{
- error_report("unimplemented");
- abort();
+ int ret;
+ enum MshvVmExit exit_reason;
+ int cpu_fd = mshv_vcpufd(cpu);
+
+ ret = ioctl(cpu_fd, MSHV_RUN_VP, msg);
+ if (ret < 0) {
+ return MshvVmExitShutdown;
+ }
+
+ switch (msg->header.message_type) {
+ case HVMSG_UNRECOVERABLE_EXCEPTION:
+ return MshvVmExitShutdown;
+ case HVMSG_UNMAPPED_GPA:
+ case HVMSG_GPA_INTERCEPT:
+ ret = handle_mmio(cpu, msg, &exit_reason);
+ if (ret < 0) {
+ error_report("failed to handle mmio");
+ return -1;
+ }
+ return exit_reason;
+ case HVMSG_X64_IO_PORT_INTERCEPT:
+ ret = handle_pio(cpu, msg);
+ if (ret < 0) {
+ return MshvVmExitSpecial;
+ }
+ return MshvVmExitIgnore;
+ default:
+ break;
+ }
+
+ *exit = MshvVmExitIgnore;
+ return 0;
}
void mshv_remove_vcpu(int vm_fd, int cpu_fd)
--
2.34.1
* [PATCH v3 23/26] accel/mshv: Handle overlapping mem mappings
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (21 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 22/26] target/i386/mshv: Implement mshv_vcpu_run() Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 24/26] docs: Add mshv to documentation Magnus Kulke
` (4 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
QEMU maps certain regions into the guest multiple times, as seen in the
trace below. Currently the MSHV kernel driver will reject those
mappings. To work around this, a record is kept (a static global list of
"slots", inspired by what the HVF accelerator has implemented). An
overlapping region is not registered with the hypervisor and is marked as
mapped=false. If there is an UNMAPPED_GPA exit, we can look for a slot
that is unmapped and would cover the GPA. In this case we map out the
conflicting slot and map in the requested region.
mshv_set_phys_mem add=1 name=pc.bios
mshv_map_memory => u_a=7ffff4e00000 gpa=00fffc0000 size=00040000
mshv_set_phys_mem add=1 name=ioapic
mshv_set_phys_mem add=1 name=hpet
mshv_set_phys_mem add=0 name=pc.ram
mshv_unmap_memory u_a=7fff67e00000 gpa=0000000000 size=80000000
mshv_set_phys_mem add=1 name=pc.ram
mshv_map_memory u_a=7fff67e00000 gpa=0000000000 size=000c0000
mshv_set_phys_mem add=1 name=pc.rom
mshv_map_memory u_a=7ffff4c00000 gpa=00000c0000 size=00020000
mshv_set_phys_mem add=1 name=pc.bios
mshv_remap_attempt => u_a=7ffff4e20000 gpa=00000e0000 size=00020000
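The bookkeeping described above can be sketched in a few lines of plain C.
This is a simplified model under stated assumptions: a fixed array instead of
the GList used in the patch, no locking, and no hypervisor calls. SketchSlot
and the sketch_* helpers are hypothetical names, and unlike
find_overlap_mem_slot() the lookup here skips unmapped slots while scanning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchSlot {
    uint64_t gpa;     /* guest physical address */
    uint64_t uaddr;   /* userspace (host virtual) address */
    uint64_t size;
    bool mapped;      /* currently registered with the hypervisor? */
} SketchSlot;

/* Two distinct slots overlap if their userspace ranges intersect. */
static bool sketch_overlaps(const SketchSlot *a, const SketchSlot *b)
{
    return a != b &&
           a->uaddr < b->uaddr + b->size &&
           b->uaddr < a->uaddr + a->size;
}

static SketchSlot *sketch_find_mapped_overlap(SketchSlot *slots, size_t n,
                                              const SketchSlot *slot)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].mapped && sketch_overlaps(&slots[i], slot)) {
            return &slots[i];
        }
    }
    return NULL;
}

/* Record a slot; it is only "mapped" if no mapped slot overlaps it. */
static void sketch_add(SketchSlot *slots, size_t n, SketchSlot *slot)
{
    slot->mapped = (sketch_find_mapped_overlap(slots, n, slot) == NULL);
}

/*
 * On an unmapped-GPA exit: find an unmapped slot covering gpa, map out
 * the conflicting slot and map in the requested one.
 * Returns true if a swap took place.
 */
static bool sketch_remap(SketchSlot *slots, size_t n, uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        SketchSlot *s = &slots[i];
        SketchSlot *conflict;

        if (s->mapped || gpa < s->gpa || gpa - s->gpa >= s->size) {
            continue;
        }
        conflict = sketch_find_mapped_overlap(slots, n, s);
        if (!conflict) {
            return false;
        }
        conflict->mapped = false;
        s->mapped = true;
        return true;
    }
    return false;
}
```

Fed with the two pc.bios entries from the trace, the second registration is
recorded as unmapped and a later access to gpa 0xe0000 swaps the two slots.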
The mapping table is guarded by a mutex for concurrent modification and
RCU mechanisms for concurrent reads. Writes occur rarely, but we'll have
to verify whether an unmapped region exists for each UNMAPPED_GPA exit,
which happens frequently.
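The writer/reader split can be pictured with a copy-and-publish sketch using
C11 atomics. This is not QEMU's RCU API: SketchList, SketchManager and the
sketch_* functions are hypothetical, the writer is assumed to already hold the
writer mutex, and reclamation is left to the caller (the patch defers it
through call_rcu1()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchList {
    size_t n;
    int items[];                  /* immutable once published */
} SketchList;

typedef struct SketchManager {
    _Atomic(SketchList *) head;   /* readers load this without a mutex */
} SketchManager;

/* Reader side: grab the currently published snapshot. */
static const SketchList *sketch_read(SketchManager *m)
{
    return atomic_load_explicit(&m->head, memory_order_acquire);
}

/*
 * Writer side: must be called with the writer mutex held.
 * Copies the current list, appends one item and publishes the copy.
 * Returns the old snapshot; it may only be freed once no reader can
 * still be using it (after a grace period in RCU terms).
 */
static SketchList *sketch_append(SketchManager *m, int item)
{
    SketchList *old = atomic_load_explicit(&m->head, memory_order_relaxed);
    size_t n = old ? old->n : 0;
    SketchList *copy = malloc(sizeof(*copy) + (n + 1) * sizeof(int));

    assert(copy);
    if (n) {
        memcpy(copy->items, old->items, n * sizeof(int));
    }
    copy->items[n] = item;
    copy->n = n + 1;

    /* Release ordering: the copy is fully built before readers see it. */
    atomic_store_explicit(&m->head, copy, memory_order_release);
    return old;
}
```

Because readers only ever see complete snapshots, the frequent lookup path
needs no mutex, while the rare modification pays for the copy.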
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
accel/mshv/mem.c | 406 +++++++++++++++++++++++++++++++++---
accel/mshv/mshv-all.c | 2 +
accel/mshv/trace-events | 5 +
include/system/mshv.h | 24 ++-
target/i386/mshv/mshv-cpu.c | 43 ++++
5 files changed, 448 insertions(+), 32 deletions(-)
diff --git a/accel/mshv/mem.c b/accel/mshv/mem.c
index 150fb723af..c56e2c077f 100644
--- a/accel/mshv/mem.c
+++ b/accel/mshv/mem.c
@@ -11,7 +11,9 @@
*/
#include "qemu/osdep.h"
+#include "qemu/lockable.h"
#include "qemu/error-report.h"
+#include "qemu/rcu.h"
#include "linux/mshv.h"
#include "system/address-spaces.h"
#include "system/mshv.h"
@@ -19,6 +21,137 @@
#include <sys/ioctl.h>
#include "trace.h"
+typedef struct SlotsRCUReclaim {
+ struct rcu_head rcu;
+ GList *old_head;
+ MshvMemorySlot *removed_slot;
+} SlotsRCUReclaim;
+
+static void rcu_reclaim_slotlist(struct rcu_head *rcu)
+{
+ SlotsRCUReclaim *r = container_of(rcu, SlotsRCUReclaim, rcu);
+ g_list_free(r->old_head);
+ g_free(r->removed_slot);
+ g_free(r);
+}
+
+static void publish_slots(GList *new_head, GList *old_head,
+ MshvMemorySlot *removed_slot)
+{
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ assert(manager);
+ qatomic_store_release(&manager->slots, new_head);
+
+ SlotsRCUReclaim *r = g_new(SlotsRCUReclaim, 1);
+ r->old_head = old_head;
+ r->removed_slot = removed_slot;
+
+ call_rcu1(&r->rcu, rcu_reclaim_slotlist);
+}
+
+/* Needs to be called with mshv_state->msm.mutex held */
+static int remove_slot(MshvMemorySlot *slot)
+{
+ GList *old_head, *new_head;
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ assert(manager);
+ old_head = qatomic_load_acquire(&manager->slots);
+
+ if (!g_list_find(old_head, slot)) {
+ error_report("slot requested for removal not found");
+ return -1;
+ }
+
+ new_head = g_list_copy(old_head);
+ new_head = g_list_remove(new_head, slot);
+ manager->n_slots--;
+
+ publish_slots(new_head, old_head, slot);
+
+ return 0;
+}
+
+/* Needs to be called with mshv_state->msm.mutex held */
+static MshvMemorySlot *append_slot(uint64_t gpa, uint64_t userspace_addr,
+ uint64_t size, bool readonly)
+{
+ GList *old_head, *new_head;
+ MshvMemorySlot *slot;
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ assert(manager);
+
+ old_head = qatomic_load_acquire(&manager->slots);
+
+ if (manager->n_slots >= MSHV_MAX_MEM_SLOTS) {
+ error_report("no free memory slots available");
+ return NULL;
+ }
+
+ slot = g_new0(MshvMemorySlot, 1);
+ slot->guest_phys_addr = gpa;
+ slot->userspace_addr = userspace_addr;
+ slot->memory_size = size;
+ slot->readonly = readonly;
+
+ new_head = g_list_copy(old_head);
+ new_head = g_list_append(new_head, slot);
+ manager->n_slots++;
+
+ publish_slots(new_head, old_head, NULL);
+
+ return slot;
+}
+
+static int slot_overlaps(const MshvMemorySlot *slot1,
+ const MshvMemorySlot *slot2)
+{
+ uint64_t start_1 = slot1->userspace_addr,
+ start_2 = slot2->userspace_addr;
+ size_t len_1 = slot1->memory_size,
+ len_2 = slot2->memory_size;
+
+ if (slot1 == slot2) {
+ return -1;
+ }
+
+ return ranges_overlap(start_1, len_1, start_2, len_2) ? 0 : -1;
+}
+
+static bool is_mapped(MshvMemorySlot *slot)
+{
+ /* Subsequent reads of mapped field see a fully-initialized slot */
+ return qatomic_load_acquire(&slot->mapped);
+}
+
+/*
+ * Find slot that is:
+ * - overlapping in userspace
+ * - currently mapped in the guest
+ *
+ * Needs to be called with mshv_state->msm.mutex or RCU read lock held.
+ */
+static MshvMemorySlot *find_overlap_mem_slot(GList *head, MshvMemorySlot *slot)
+{
+ GList *found;
+ MshvMemorySlot *overlap_slot;
+
+ found = g_list_find_custom(head, slot, (GCompareFunc) slot_overlaps);
+
+ if (!found) {
+ return NULL;
+ }
+
+ overlap_slot = found->data;
+ if (!overlap_slot || !is_mapped(overlap_slot)) {
+ return NULL;
+ }
+
+ return overlap_slot;
+}
+
static int set_guest_memory(int vm_fd,
const struct mshv_user_mem_region *region)
{
@@ -26,38 +159,169 @@ static int set_guest_memory(int vm_fd,
ret = ioctl(vm_fd, MSHV_SET_GUEST_MEMORY, region);
if (ret < 0) {
- error_report("failed to set guest memory");
- return -errno;
+ error_report("failed to set guest memory: %s", strerror(errno));
+ return -1;
}
return 0;
}
-static int map_or_unmap(int vm_fd, const MshvMemoryRegion *mr, bool map)
+static int map_or_unmap(int vm_fd, const MshvMemorySlot *slot, bool map)
{
struct mshv_user_mem_region region = {0};
- region.guest_pfn = mr->guest_phys_addr >> MSHV_PAGE_SHIFT;
- region.size = mr->memory_size;
- region.userspace_addr = mr->userspace_addr;
+ region.guest_pfn = slot->guest_phys_addr >> MSHV_PAGE_SHIFT;
+ region.size = slot->memory_size;
+ region.userspace_addr = slot->userspace_addr;
if (!map) {
region.flags |= (1 << MSHV_SET_MEM_BIT_UNMAP);
- trace_mshv_unmap_memory(mr->userspace_addr, mr->guest_phys_addr,
- mr->memory_size);
+ trace_mshv_unmap_memory(slot->userspace_addr, slot->guest_phys_addr,
+ slot->memory_size);
return set_guest_memory(vm_fd, &region);
}
region.flags = BIT(MSHV_SET_MEM_BIT_EXECUTABLE);
- if (!mr->readonly) {
+ if (!slot->readonly) {
region.flags |= BIT(MSHV_SET_MEM_BIT_WRITABLE);
}
- trace_mshv_map_memory(mr->userspace_addr, mr->guest_phys_addr,
- mr->memory_size);
+ trace_mshv_map_memory(slot->userspace_addr, slot->guest_phys_addr,
+ slot->memory_size);
return set_guest_memory(vm_fd, &region);
}
+static int slot_matches_region(const MshvMemorySlot *slot1,
+ const MshvMemorySlot *slot2)
+{
+ return (slot1->guest_phys_addr == slot2->guest_phys_addr &&
+ slot1->userspace_addr == slot2->userspace_addr &&
+ slot1->memory_size == slot2->memory_size) ? 0 : -1;
+}
+
+/* Needs to be called with mshv_state->msm.mutex held */
+static MshvMemorySlot *find_mem_slot_by_region(uint64_t gpa, uint64_t size,
+ uint64_t userspace_addr)
+{
+ MshvMemorySlot ref_slot = {
+ .guest_phys_addr = gpa,
+ .userspace_addr = userspace_addr,
+ .memory_size = size,
+ };
+ GList *found;
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ assert(manager);
+ found = g_list_find_custom(manager->slots, &ref_slot,
+ (GCompareFunc) slot_matches_region);
+
+ return found ? found->data : NULL;
+}
+
+static int slot_covers_gpa(const MshvMemorySlot *slot, uint64_t *gpa_p)
+{
+ uint64_t gpa_offset, gpa = *gpa_p;
+
+ gpa_offset = gpa - slot->guest_phys_addr;
+ return (slot->guest_phys_addr <= gpa && gpa_offset < slot->memory_size)
+ ? 0 : -1;
+}
+
+/* Needs to be called with mshv_state->msm.mutex or RCU read lock held */
+static MshvMemorySlot *find_mem_slot_by_gpa(GList *head, uint64_t gpa)
+{
+ GList *found;
+ MshvMemorySlot *slot;
+
+ trace_mshv_find_slot_by_gpa(gpa);
+
+ found = g_list_find_custom(head, &gpa, (GCompareFunc) slot_covers_gpa);
+ if (found) {
+ slot = found->data;
+ trace_mshv_found_slot(slot->userspace_addr, slot->guest_phys_addr,
+ slot->memory_size);
+ return slot;
+ }
+
+ return NULL;
+}
+
+/* Needs to be called with mshv_state->msm.mutex held */
+static void set_mapped(MshvMemorySlot *slot, bool mapped)
+{
+ /* prior writes to mapped field become visible before readers see slot */
+ qatomic_store_release(&slot->mapped, mapped);
+}
+
+MshvRemapResult mshv_remap_overlap_region(int vm_fd, uint64_t gpa)
+{
+ MshvMemorySlot *gpa_slot, *overlap_slot;
+ GList *head;
+ int ret;
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ /* fast path, called often by unmapped_gpa vm exit */
+ WITH_RCU_READ_LOCK_GUARD() {
+ assert(manager);
+ head = qatomic_load_acquire(&manager->slots);
+ /* return early if no slot is found */
+ gpa_slot = find_mem_slot_by_gpa(head, gpa);
+ if (gpa_slot == NULL) {
+ return MshvRemapNoMapping;
+ }
+
+ /* return early if no overlapping slot is found */
+ overlap_slot = find_overlap_mem_slot(head, gpa_slot);
+ if (overlap_slot == NULL) {
+ return MshvRemapNoOverlap;
+ }
+ }
+
+ /*
+ * We'll modify the mapping list, so we need to upgrade to mutex and
+ * recheck.
+ */
+ assert(manager);
+ QEMU_LOCK_GUARD(&manager->mutex);
+
+ /* return early if no slot is found */
+ gpa_slot = find_mem_slot_by_gpa(manager->slots, gpa);
+ if (gpa_slot == NULL) {
+ return MshvRemapNoMapping;
+ }
+
+ /* return early if no overlapping slot is found */
+ overlap_slot = find_overlap_mem_slot(manager->slots, gpa_slot);
+ if (overlap_slot == NULL) {
+ return MshvRemapNoOverlap;
+ }
+
+ /* unmap overlapping slot */
+ ret = map_or_unmap(vm_fd, overlap_slot, false);
+ if (ret < 0) {
+ error_report("failed to unmap overlap region");
+ abort();
+ }
+ set_mapped(overlap_slot, false);
+ warn_report("mapped out userspace_addr=0x%016lx gpa=0x%010lx size=0x%lx",
+ overlap_slot->userspace_addr,
+ overlap_slot->guest_phys_addr,
+ overlap_slot->memory_size);
+
+ /* map region for gpa */
+ ret = map_or_unmap(vm_fd, gpa_slot, true);
+ if (ret < 0) {
+ error_report("failed to map new region");
+ abort();
+ }
+ set_mapped(gpa_slot, true);
+ warn_report("mapped in userspace_addr=0x%016lx gpa=0x%010lx size=0x%lx",
+ gpa_slot->userspace_addr, gpa_slot->guest_phys_addr,
+ gpa_slot->memory_size);
+
+ return MshvRemapOk;
+}
+
static int handle_unmapped_mmio_region_read(uint64_t gpa, uint64_t size,
uint8_t *data)
{
@@ -123,20 +387,97 @@ int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
return -1;
}
-static int set_memory(const MshvMemoryRegion *mshv_mr, bool add)
+static int tracked_unmap(int vm_fd, uint64_t gpa, uint64_t size,
+ uint64_t userspace_addr)
{
- int ret = 0;
+ int ret;
+ MshvMemorySlot *slot;
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ assert(manager);
+
+ QEMU_LOCK_GUARD(&manager->mutex);
+
+ slot = find_mem_slot_by_region(gpa, size, userspace_addr);
+ if (!slot) {
+ trace_mshv_skip_unset_mem(userspace_addr, gpa, size);
+ /* no work to do */
+ return 0;
+ }
+
+ if (!is_mapped(slot)) {
+ /* remove slot, no need to unmap */
+ return remove_slot(slot);
+ }
- if (!mshv_mr) {
- error_report("Invalid mshv_mr");
+ ret = map_or_unmap(vm_fd, slot, false);
+ if (ret < 0) {
+ error_report("failed to unmap memory region");
+ return ret;
+ }
+ return remove_slot(slot);
+}
+
+static int tracked_map(int vm_fd, uint64_t gpa, uint64_t size, bool readonly,
+ uint64_t userspace_addr)
+{
+ MshvMemorySlot *slot, *overlap_slot;
+ int ret;
+ MshvMemorySlotManager *manager = &mshv_state->msm;
+
+ assert(manager);
+
+ QEMU_LOCK_GUARD(&manager->mutex);
+
+ slot = find_mem_slot_by_region(gpa, size, userspace_addr);
+ if (slot) {
+ error_report("memory region already mapped at gpa=0x%lx, "
+ "userspace_addr=0x%lx, size=0x%lx",
+ slot->guest_phys_addr, slot->userspace_addr,
+ slot->memory_size);
return -1;
}
- trace_mshv_set_memory(add, mshv_mr->guest_phys_addr,
- mshv_mr->memory_size,
- mshv_mr->userspace_addr, mshv_mr->readonly,
- ret);
- return map_or_unmap(mshv_state->vm, mshv_mr, add);
+ slot = append_slot(gpa, userspace_addr, size, readonly);
+
+ overlap_slot = find_overlap_mem_slot(manager->slots, slot);
+ if (overlap_slot) {
+ trace_mshv_remap_attempt(slot->userspace_addr,
+ slot->guest_phys_addr,
+ slot->memory_size);
+ warn_report("attempt to map region [0x%lx-0x%lx], while "
+ "[0x%lx-0x%lx] is already mapped in the guest",
+ userspace_addr, userspace_addr + size - 1,
+ overlap_slot->userspace_addr,
+ overlap_slot->userspace_addr +
+ overlap_slot->memory_size - 1);
+
+ /* do not register mem slot in hv, but record for later swap-in */
+ set_mapped(slot, false);
+
+ return 0;
+ }
+
+ ret = map_or_unmap(vm_fd, slot, true);
+ if (ret < 0) {
+ error_report("failed to map memory region");
+ return -1;
+ }
+ set_mapped(slot, true);
+
+ return 0;
+}
+
+static int set_memory(uint64_t gpa, uint64_t size, bool readonly,
+ uint64_t userspace_addr, bool add)
+{
+ int vm_fd = mshv_state->vm;
+
+ if (add) {
+ return tracked_map(vm_fd, gpa, size, readonly, userspace_addr);
+ }
+
+ return tracked_unmap(vm_fd, gpa, size, userspace_addr);
}
/*
@@ -172,7 +513,9 @@ void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
bool writable = !area->readonly && !area->rom_device;
hwaddr start_addr, mr_offset, size;
void *ram;
- MshvMemoryRegion mshv_mr = {0};
+
+ size = align_section(section, &start_addr);
+ trace_mshv_set_phys_mem(add, section->mr->name, start_addr);
size = align_section(section, &start_addr);
trace_mshv_set_phys_mem(add, section->mr->name, start_addr);
@@ -199,14 +542,21 @@ void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
ram = memory_region_get_ram_ptr(area) + mr_offset;
- mshv_mr.guest_phys_addr = start_addr;
- mshv_mr.memory_size = size;
- mshv_mr.readonly = !writable;
- mshv_mr.userspace_addr = (uint64_t)ram;
-
- ret = set_memory(&mshv_mr, add);
+ ret = set_memory(start_addr, size, !writable, (uint64_t)ram, add);
if (ret < 0) {
- error_report("Failed to set memory region");
+ error_report("failed to set memory region");
abort();
}
}
+
+void mshv_init_memory_slot_manager(MshvState *mshv_state)
+{
+ MshvMemorySlotManager *manager;
+
+ assert(mshv_state);
+ manager = &mshv_state->msm;
+
+ manager->n_slots = 0;
+ manager->slots = NULL;
+ qemu_mutex_init(&manager->mutex);
+}
diff --git a/accel/mshv/mshv-all.c b/accel/mshv/mshv-all.c
index 4f4c4b9639..d859d08fb9 100644
--- a/accel/mshv/mshv-all.c
+++ b/accel/mshv/mshv-all.c
@@ -435,6 +435,8 @@ static int mshv_init(AccelState *as, MachineState *ms)
mshv_init_msicontrol();
+ mshv_init_memory_slot_manager(s);
+
ret = create_vm(mshv_fd, &vm_fd);
if (ret < 0) {
close(mshv_fd);
diff --git a/accel/mshv/trace-events b/accel/mshv/trace-events
index 1b1b43a1e8..b30f963445 100644
--- a/accel/mshv/trace-events
+++ b/accel/mshv/trace-events
@@ -23,3 +23,8 @@ mshv_map_memory(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%
mshv_unmap_memory(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%lx gpa=0x%010lx size=0x%08lx"
mshv_set_phys_mem(bool add, const char *name, uint64_t gpa) "\tadd=%d name=%s gpa=0x%010lx"
mshv_handle_mmio(uint64_t gva, uint64_t gpa, uint64_t size, uint8_t access_type) "\tgva=0x%lx gpa=0x%010lx size=0x%lx access_type=%d"
+
+mshv_found_slot(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%lx gpa=0x%010lx size=0x%08lx"
+mshv_skip_unset_mem(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%lx gpa=0x%010lx size=0x%08lx"
+mshv_remap_attempt(uint64_t userspace_addr, uint64_t gpa, uint64_t size) "\tu_a=0x%lx gpa=0x%010lx size=0x%08lx"
+mshv_find_slot_by_gpa(uint64_t gpa) "\tgpa=0x%010lx"
diff --git a/include/system/mshv.h b/include/system/mshv.h
index c527acc08c..3fccb9645a 100644
--- a/include/system/mshv.h
+++ b/include/system/mshv.h
@@ -40,6 +40,8 @@ typedef struct hyperv_message hv_message;
#define MSHV_MSR_ENTRIES_COUNT 64
+#define MSHV_MAX_MEM_SLOTS 32
+
#ifdef CONFIG_MSHV_IS_POSSIBLE
extern bool mshv_allowed;
#define mshv_enabled() (mshv_allowed)
@@ -54,6 +56,12 @@ typedef struct MshvAddressSpace {
AddressSpace *as;
} MshvAddressSpace;
+typedef struct MshvMemorySlotManager {
+ size_t n_slots;
+ GList *slots;
+ QemuMutex mutex;
+} MshvMemorySlotManager;
+
typedef struct MshvState {
AccelState parent_obj;
int vm;
@@ -62,6 +70,7 @@ typedef struct MshvState {
int nr_as;
MshvAddressSpace *as;
int fd;
+ MshvMemorySlotManager msm;
} MshvState;
extern MshvState *mshv_state;
@@ -103,6 +112,12 @@ typedef enum MshvVmExit {
MshvVmExitSpecial = 2,
} MshvVmExit;
+typedef enum MshvRemapResult {
+ MshvRemapOk = 0,
+ MshvRemapNoMapping = 1,
+ MshvRemapNoOverlap = 2,
+} MshvRemapResult;
+
void mshv_init_mmio_emu(void);
int mshv_create_vcpu(int vm_fd, uint8_t vp_index, int *cpu_fd);
void mshv_remove_vcpu(int vm_fd, int cpu_fd);
@@ -145,21 +160,22 @@ typedef struct MshvMsrEntries {
int mshv_configure_msr(int cpu_fd, const MshvMsrEntry *msrs, size_t n_msrs);
/* memory */
-typedef struct MshvMemoryRegion {
+typedef struct MshvMemorySlot {
uint64_t guest_phys_addr;
uint64_t memory_size;
uint64_t userspace_addr;
bool readonly;
-} MshvMemoryRegion;
+ bool mapped;
+} MshvMemorySlot;
-int mshv_add_mem(int vm_fd, const MshvMemoryRegion *mr);
-int mshv_remove_mem(int vm_fd, const MshvMemoryRegion *mr);
+MshvRemapResult mshv_remap_overlap_region(int vm_fd, uint64_t gpa);
int mshv_guest_mem_read(uint64_t gpa, uint8_t *data, uintptr_t size,
bool is_secure_mode, bool instruction_fetch);
int mshv_guest_mem_write(uint64_t gpa, const uint8_t *data, uintptr_t size,
bool is_secure_mode);
void mshv_set_phys_mem(MshvMemoryListener *mml, MemoryRegionSection *section,
bool add);
+void mshv_init_memory_slot_manager(MshvState *mshv_state);
/* interrupt */
void mshv_init_msicontrol(void);
diff --git a/target/i386/mshv/mshv-cpu.c b/target/i386/mshv/mshv-cpu.c
index 81e9176164..8dff75a19f 100644
--- a/target/i386/mshv/mshv-cpu.c
+++ b/target/i386/mshv/mshv-cpu.c
@@ -1073,6 +1073,43 @@ static int handle_mmio(CPUState *cpu, const struct hyperv_message *msg,
return 0;
}
+static int handle_unmapped_mem(int vm_fd, CPUState *cpu,
+ const struct hyperv_message *msg,
+ MshvVmExit *exit_reason)
+{
+ struct hv_x64_memory_intercept_message info = { 0 };
+ uint64_t gpa;
+ int ret;
+ enum MshvRemapResult remap_result;
+
+ ret = set_memory_info(msg, &info);
+ if (ret < 0) {
+ error_report("failed to convert message to memory info");
+ return -1;
+ }
+
+ gpa = info.guest_physical_address;
+
+ /* attempt to remap the region, in case of overlapping userspace mappings */
+ remap_result = mshv_remap_overlap_region(vm_fd, gpa);
+ *exit_reason = MshvVmExitIgnore;
+
+ switch (remap_result) {
+ case MshvRemapNoMapping:
+ /* if we didn't find a mapping, it is probably mmio */
+ return handle_mmio(cpu, msg, exit_reason);
+ case MshvRemapOk:
+ break;
+ case MshvRemapNoOverlap:
+ /* This should not happen, but we are forgiving it */
+ warn_report("found no overlap for unmapped region");
+ *exit_reason = MshvVmExitSpecial;
+ break;
+ }
+
+ return 0;
+}
+
static int set_ioport_info(const struct hyperv_message *msg,
hv_x64_io_port_intercept_message *info)
{
@@ -1429,6 +1466,12 @@ int mshv_run_vcpu(int vm_fd, CPUState *cpu, hv_message *msg, MshvVmExit *exit)
case HVMSG_UNRECOVERABLE_EXCEPTION:
return MshvVmExitShutdown;
case HVMSG_UNMAPPED_GPA:
+ ret = handle_unmapped_mem(vm_fd, cpu, msg, &exit_reason);
+ if (ret < 0) {
+ error_report("failed to handle unmapped memory");
+ return -1;
+ }
+ return exit_reason;
case HVMSG_GPA_INTERCEPT:
ret = handle_mmio(cpu, msg, &exit_reason);
if (ret < 0) {
--
2.34.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
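Editorial sketch for patch 23/26: the handler above dispatches on the three
MshvRemapResult values, but the body of mshv_remap_overlap_region() is not
shown in this part of the thread. The following is a minimal, self-contained
illustration of how such a swap could work, under stated assumptions: a plain
array instead of the GList guarded by a mutex, no hypervisor calls, and an
overlap test on the userspace range as suggested by the warn_report() message.
All Demo* and demo_* names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for MshvMemorySlot. */
typedef struct DemoSlot {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    bool mapped;
} DemoSlot;

/* Mirrors the three outcomes of MshvRemapResult in the patch. */
typedef enum DemoRemapResult {
    DemoRemapOk = 0,        /* an overlapping slot was swapped out */
    DemoRemapNoMapping = 1, /* no recorded slot covers the GPA (likely MMIO) */
    DemoRemapNoOverlap = 2, /* slot found, but nothing mapped overlaps it */
} DemoRemapResult;

/* True if the half-open userspace ranges of both slots intersect. */
static bool demo_userspace_overlap(const DemoSlot *a, const DemoSlot *b)
{
    uint64_t a_end = a->userspace_addr + a->memory_size;
    uint64_t b_end = b->userspace_addr + b->memory_size;

    return a->userspace_addr < b_end && b->userspace_addr < a_end;
}

/* Find a recorded but currently unmapped slot whose GPA range contains gpa. */
static DemoSlot *demo_find_unmapped_by_gpa(DemoSlot *slots, size_t n,
                                           uint64_t gpa)
{
    for (size_t i = 0; i < n; i++) {
        DemoSlot *s = &slots[i];

        if (!s->mapped && gpa >= s->guest_phys_addr &&
            gpa - s->guest_phys_addr < s->memory_size) {
            return s;
        }
    }
    return NULL;
}

/*
 * On an unmapped-GPA exit: swap the slot covering gpa in and the mapped slot
 * that overlaps it in userspace out. Real code would issue the unmap and map
 * requests to the hypervisor where this sketch only flips the flags.
 */
static DemoRemapResult demo_remap(DemoSlot *slots, size_t n, uint64_t gpa)
{
    DemoSlot *wanted;

    assert(slots);

    wanted = demo_find_unmapped_by_gpa(slots, n, gpa);
    if (!wanted) {
        return DemoRemapNoMapping;
    }

    for (size_t i = 0; i < n; i++) {
        if (slots[i].mapped && demo_userspace_overlap(&slots[i], wanted)) {
            slots[i].mapped = false;
            wanted->mapped = true;
            return DemoRemapOk;
        }
    }
    return DemoRemapNoOverlap;
}
```

With two slots that share part of a userspace range, a fault on the unmapped
slot's GPA makes the two slots trade places, while a GPA outside every
recorded slot yields the "no mapping" outcome that the handler treats as MMIO.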
* [PATCH v3 24/26] docs: Add mshv to documentation
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (22 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 23/26] accel/mshv: Handle overlapping mem mappings Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-27 11:33 ` Daniel P. Berrangé
2025-08-07 14:39 ` [PATCH v3 25/26] MAINTAINERS: Add maintainers for mshv accelerator Magnus Kulke
` (3 subsequent siblings)
27 siblings, 1 reply; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Added mshv to the list of accelerators in doc text.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
docs/devel/codebase.rst | 2 +-
qemu-options.hx | 16 ++++++++--------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/docs/devel/codebase.rst b/docs/devel/codebase.rst
index 2a3143787a..69d8827117 100644
--- a/docs/devel/codebase.rst
+++ b/docs/devel/codebase.rst
@@ -48,7 +48,7 @@ yet, so sometimes the source code is all you have.
* `accel <https://gitlab.com/qemu-project/qemu/-/tree/master/accel>`_:
Infrastructure and architecture agnostic code related to the various
`accelerators <Accelerators>` supported by QEMU
- (TCG, KVM, hvf, whpx, xen, nvmm).
+ (TCG, KVM, hvf, whpx, xen, nvmm, mshv).
Contains interfaces for operations that will be implemented per
`target <https://gitlab.com/qemu-project/qemu/-/tree/master/target>`_.
* `audio <https://gitlab.com/qemu-project/qemu/-/tree/master/audio>`_:
diff --git a/qemu-options.hx b/qemu-options.hx
index ab23f14d21..ad747eb154 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -28,7 +28,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
"-machine [type=]name[,prop[=value][,...]]\n"
" selects emulated machine ('-machine help' for list)\n"
" property accel=accel1[:accel2[:...]] selects accelerator\n"
- " supported accelerators are kvm, xen, hvf, nvmm, whpx or tcg (default: tcg)\n"
+ " supported accelerators are kvm, xen, hvf, nvmm, whpx, mshv or tcg (default: tcg)\n"
" vmport=on|off|auto controls emulation of vmport (default: auto)\n"
" dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
" mem-merge=on|off controls memory merge support (default: on)\n"
@@ -66,10 +66,10 @@ SRST
``accel=accels1[:accels2[:...]]``
This is used to enable an accelerator. Depending on the target
- architecture, kvm, xen, hvf, nvmm, whpx or tcg can be available.
- By default, tcg is used. If there is more than one accelerator
- specified, the next one is used if the previous one fails to
- initialize.
+ architecture, kvm, xen, hvf, nvmm, whpx, mshv or tcg can be
+ available. By default, tcg is used. If there is more than one
+ accelerator specified, the next one is used if the previous one
+ fails to initialize.
``vmport=on|off|auto``
Enables emulation of VMWare IO port, for vmmouse etc. auto says
@@ -226,7 +226,7 @@ ERST
DEF("accel", HAS_ARG, QEMU_OPTION_accel,
"-accel [accel=]accelerator[,prop[=value][,...]]\n"
- " select accelerator (kvm, xen, hvf, nvmm, whpx or tcg; use 'help' for a list)\n"
+ " select accelerator (kvm, xen, hvf, nvmm, whpx, mshv or tcg; use 'help' for a list)\n"
" igd-passthru=on|off (enable Xen integrated Intel graphics passthrough, default=off)\n"
" kernel-irqchip=on|off|split controls accelerated irqchip support (default=on)\n"
" kvm-shadow-mem=size of KVM shadow MMU in bytes\n"
@@ -241,8 +241,8 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
SRST
``-accel name[,prop=value[,...]]``
This is used to enable an accelerator. Depending on the target
- architecture, kvm, xen, hvf, nvmm, whpx or tcg can be available. By
- default, tcg is used. If there is more than one accelerator
+ architecture, kvm, xen, hvf, nvmm, whpx, mshv or tcg can be available.
+ By default, tcg is used. If there is more than one accelerator
specified, the next one is used if the previous one fails to
initialize.
--
2.34.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v3 24/26] docs: Add mshv to documentation
2025-08-07 14:39 ` [PATCH v3 24/26] docs: Add mshv to documentation Magnus Kulke
@ 2025-08-27 11:33 ` Daniel P. Berrangé
0 siblings, 0 replies; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 11:33 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:49PM +0200, Magnus Kulke wrote:
> Added mshv to the list of accelerators in doc text.
>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> docs/devel/codebase.rst | 2 +-
Probably also sensible to mention it in
docs/about/build-platforms.rst
docs/glossary.rst
docs/system/introduction.rst
as those files tend to mention other accelerators too.
> qemu-options.hx | 16 ++++++++--------
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/docs/devel/codebase.rst b/docs/devel/codebase.rst
> index 2a3143787a..69d8827117 100644
> --- a/docs/devel/codebase.rst
> +++ b/docs/devel/codebase.rst
> @@ -48,7 +48,7 @@ yet, so sometimes the source code is all you have.
> * `accel <https://gitlab.com/qemu-project/qemu/-/tree/master/accel>`_:
> Infrastructure and architecture agnostic code related to the various
> `accelerators <Accelerators>` supported by QEMU
> - (TCG, KVM, hvf, whpx, xen, nvmm).
> + (TCG, KVM, hvf, whpx, xen, nvmm, mshv).
> Contains interfaces for operations that will be implemented per
> `target <https://gitlab.com/qemu-project/qemu/-/tree/master/target>`_.
> * `audio <https://gitlab.com/qemu-project/qemu/-/tree/master/audio>`_:
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v3 25/26] MAINTAINERS: Add maintainers for mshv accelerator
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (23 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 24/26] docs: Add mshv to documentation Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 14:39 ` [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities Magnus Kulke
` (2 subsequent siblings)
27 siblings, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
Adding Magnus Kulke and Wei Liu to the maintainers file for the
respective folders/files.
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
MAINTAINERS | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index a07086ed76..7527264f30 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -546,6 +546,21 @@ F: target/i386/whpx/
F: accel/stubs/whpx-stub.c
F: include/system/whpx.h
+MSHV
+M: Magnus Kulke <magnus.kulke@linux.microsoft.com>
+R: Wei Liu <wei.liu@kernel.org>
+S: Supported
+F: accel/mshv/
+F: include/system/mshv.h
+F: include/hw/hyperv/hvgdk*.h
+F: include/hw/hyperv/hvhdk*.h
+
+MSHV CPUs
+M: Magnus Kulke <magnus.kulke@linux.microsoft.com>
+R: Wei Liu <wei.liu@kernel.org>
+S: Supported
+F: target/i386/mshv/
+
X86 Instruction Emulator
M: Cameron Esfahani <dirty@apple.com>
M: Roman Bolshakov <rbolshakov@ddn.com>
--
2.34.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (24 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 25/26] MAINTAINERS: Add maintainers for mshv accelerator Magnus Kulke
@ 2025-08-07 14:39 ` Magnus Kulke
2025-08-07 19:22 ` Wei Liu
2025-08-27 11:39 ` Daniel P. Berrangé
2025-08-11 17:59 ` [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Wei Liu
2025-09-11 6:59 ` Michael S. Tsirkin
27 siblings, 2 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-08-07 14:39 UTC (permalink / raw)
To: qemu-devel
Cc: Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
From: Praveen K Paladugu <prapal@microsoft.com>
Allow to query mshv capabilities via query-mshv QMP command.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
hw/core/machine-qmp-cmds.c | 14 ++++++++++++++
qapi/accelerator.json | 29 +++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 6aca1a626e..024ddb8d2d 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -28,6 +28,20 @@
#include "system/runstate.h"
#include "system/system.h"
#include "hw/s390x/storage-keys.h"
+#include <sys/stat.h>
+
+/*
+ * QMP query for MSHV
+ */
+MshvInfo *qmp_query_mshv(Error **errp)
+{
+ MshvInfo *info = g_malloc0(sizeof(*info));
+ struct stat st;
+
+ info->present = accel_find("mshv");
+ info->enabled = (stat("/dev/mshv", &st) == 0);
+ return info;
+}
/*
* fast means: we NEVER interrupt vCPU threads to retrieve
diff --git a/qapi/accelerator.json b/qapi/accelerator.json
index fb28c8d920..c2bfbc507f 100644
--- a/qapi/accelerator.json
+++ b/qapi/accelerator.json
@@ -54,3 +54,32 @@
{ 'command': 'x-accel-stats',
'returns': 'HumanReadableText',
'features': [ 'unstable' ] }
+
+##
+# @MshvInfo:
+#
+# Information about support for MSHV acceleration
+#
+# @enabled: true if MSHV acceleration is active
+#
+# @present: true if MSHV acceleration is built into this executable
+#
+# Since: 10.0.92
+##
+{ 'struct': 'MshvInfo', 'data': {'enabled': 'bool', 'present': 'bool'} }
+
+##
+# @query-mshv:
+#
+# Return information about MSHV acceleration
+#
+# Returns: @MshvInfo
+#
+# Since: 10.0.92
+#
+# .. qmp-example::
+#
+# -> { "execute": "query-mshv" }
+# <- { "return": { "enabled": true, "present": true } }
+##
+{ 'command': 'query-mshv', 'returns': 'MshvInfo' }
--
2.34.1
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities
2025-08-07 14:39 ` [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities Magnus Kulke
@ 2025-08-07 19:22 ` Wei Liu
2025-08-13 0:37 ` Wei Liu
2025-08-27 11:39 ` Daniel P. Berrangé
1 sibling, 1 reply; 46+ messages in thread
From: Wei Liu @ 2025-08-07 19:22 UTC (permalink / raw)
To: Magnus Kulke, prapal
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:51PM +0200, Magnus Kulke wrote:
> From: Praveen K Paladugu <prapal@microsoft.com>
>
> Allow to query mshv capabilities via query-mshv QMP command.
>
> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> hw/core/machine-qmp-cmds.c | 14 ++++++++++++++
> qapi/accelerator.json | 29 +++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
> index 6aca1a626e..024ddb8d2d 100644
> --- a/hw/core/machine-qmp-cmds.c
> +++ b/hw/core/machine-qmp-cmds.c
> @@ -28,6 +28,20 @@
> #include "system/runstate.h"
> #include "system/system.h"
> #include "hw/s390x/storage-keys.h"
> +#include <sys/stat.h>
> +
> +/*
> + * QMP query for MSHV
> + */
> +MshvInfo *qmp_query_mshv(Error **errp)
> +{
> + MshvInfo *info = g_malloc0(sizeof(*info));
> + struct stat st;
> +
> + info->present = accel_find("mshv");
> + info->enabled = (stat("/dev/mshv", &st) == 0);
I don't think this is the right way to check if MSHV is _enabled_. The
device node being around doesn't necessarily mean that QEMU is using it.
You can refer to kvm_enabled() to see how it is implemented.
Some functions that are of interest:
do_configure_accelerator
accel_init_machine
Thanks,
Wei
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities
2025-08-07 19:22 ` Wei Liu
@ 2025-08-13 0:37 ` Wei Liu
0 siblings, 0 replies; 46+ messages in thread
From: Wei Liu @ 2025-08-13 0:37 UTC (permalink / raw)
To: Magnus Kulke, prapal
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 07:22:21PM +0000, Wei Liu wrote:
> On Thu, Aug 07, 2025 at 04:39:51PM +0200, Magnus Kulke wrote:
> > From: Praveen K Paladugu <prapal@microsoft.com>
> >
> > Allow to query mshv capabilities via query-mshv QMP command.
> >
> > Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> > Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> > ---
> > hw/core/machine-qmp-cmds.c | 14 ++++++++++++++
> > qapi/accelerator.json | 29 +++++++++++++++++++++++++++++
> > 2 files changed, 43 insertions(+)
> >
> > diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
> > index 6aca1a626e..024ddb8d2d 100644
> > --- a/hw/core/machine-qmp-cmds.c
> > +++ b/hw/core/machine-qmp-cmds.c
> > @@ -28,6 +28,20 @@
> > #include "system/runstate.h"
> > #include "system/system.h"
> > #include "hw/s390x/storage-keys.h"
> > +#include <sys/stat.h>
> > +
> > +/*
> > + * QMP query for MSHV
> > + */
> > +MshvInfo *qmp_query_mshv(Error **errp)
> > +{
> > + MshvInfo *info = g_malloc0(sizeof(*info));
> > + struct stat st;
> > +
> > + info->present = accel_find("mshv");
> > + info->enabled = (stat("/dev/mshv", &st) == 0);
>
> I don't think this is the right way to check if MSHV is _enabled_. The
> device node being around doesn't necessarily mean that QEMU is using it.
>
> You can refer to kvm_enabled() to see how it is implemented.
>
> Some functions that are of interest:
> do_configure_accelerator
> accel_init_machine
This is likely as simple as squashing in the following diff.
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 024ddb8d2d7c..1b520599972a 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -39,7 +39,7 @@ MshvInfo *qmp_query_mshv(Error **errp)
struct stat st;
info->present = accel_find("mshv");
- info->enabled = (stat("/dev/mshv", &st) == 0);
+ info->enabled = mshv_enabled();
return info;
}
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities
2025-08-07 14:39 ` [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities Magnus Kulke
2025-08-07 19:22 ` Wei Liu
@ 2025-08-27 11:39 ` Daniel P. Berrangé
1 sibling, 0 replies; 46+ messages in thread
From: Daniel P. Berrangé @ 2025-08-27 11:39 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:51PM +0200, Magnus Kulke wrote:
> From: Praveen K Paladugu <prapal@microsoft.com>
>
> Allow to query mshv capabilities via query-mshv QMP command.
>
> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
> ---
> hw/core/machine-qmp-cmds.c | 14 ++++++++++++++
> qapi/accelerator.json | 29 +++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
> diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
> index 6aca1a626e..024ddb8d2d 100644
> --- a/hw/core/machine-qmp-cmds.c
> +++ b/hw/core/machine-qmp-cmds.c
> @@ -28,6 +28,20 @@
> #include "system/runstate.h"
> #include "system/system.h"
> #include "hw/s390x/storage-keys.h"
> +#include <sys/stat.h>
> +
> +/*
> + * QMP query for MSHV
> + */
> +MshvInfo *qmp_query_mshv(Error **errp)
> +{
> + MshvInfo *info = g_malloc0(sizeof(*info));
> + struct stat st;
> +
> + info->present = accel_find("mshv");
> + info->enabled = (stat("/dev/mshv", &st) == 0);
This doesn't have the right semantics
$ sudo touch /dev/mshv
$ ./scripts/qmp/qmp-shell-wrap ./build/qemu-system-x86_64 -display none -accel tcg
(QEMU) query-mshv
{"return": {"enabled": true, "present": true}}
It cannot be enabled, since I asked for 'tcg' as the accelerator
IIUC, it should be
info->enabled = mshv_enabled();
to match what KVM does.
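Editorial sketch of the distinction raised in this review: the existence of a
device node is a property of the host, whereas "enabled" is a property of the
running process, recorded when the accelerator is initialized. The snippet
below is a self-contained illustration only. demo_accel_allowed is a
hypothetical stand-in for the mshv_allowed flag behind the mshv_enabled()
macro in include/system/mshv.h, and demo_node_exists() mimics the stat()
check from the patch.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for the per-process flag that accelerator
 * initialization would set; false means another accelerator is in use.
 */
static bool demo_accel_allowed;

/* Analogous to mshv_enabled(): reports what this process is actually using. */
static bool demo_accel_enabled(void)
{
    return demo_accel_allowed;
}

/* Analogous to the stat() check: only reports that a path exists. */
static bool demo_node_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0;
}
```

A path can exist (for example after "touch") while the flag is still false,
which is the situation shown in the qmp-shell session above.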
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (25 preceding siblings ...)
2025-08-07 14:39 ` [PATCH v3 26/26] qapi/accel: Allow to query mshv capabilities Magnus Kulke
@ 2025-08-11 17:59 ` Wei Liu
2025-09-11 6:59 ` Michael S. Tsirkin
27 siblings, 0 replies; 46+ messages in thread
From: Wei Liu @ 2025-08-11 17:59 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Michael S. Tsirkin,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:25PM +0200, Magnus Kulke wrote:
> Hey all,
>
[...]
>
> - A discrete kernel ioctl "set_immediate_exit" (to avoid a race condition
> when handling termination signals like ctrl-a x) has been tested and
> proven to mitigate the problem. Since other consumers of /dev/mshv have
> similar requirements as QEMU, we opted to iterate a bit more on the
> respective kernel interface.
>
> Magnus Kulke (25):
> accel: Add Meson and config support for MSHV accelerator
> target/i386/emulate: Allow instruction decoding from stream
> target/i386/mshv: Add x86 decoder/emu implementation
> hw/intc: Generalize APIC helper names from kvm_* to accel_*
This needs acks from KVM maintainers.
> include/hw/hyperv: Add MSHV ABI header definitions
> linux-headers/linux: Add mshv.h headers
> accel/mshv: Add accelerator skeleton
> accel/mshv: Register memory region listeners
> accel/mshv: Initialize VM partition
> accel/mshv: Add vCPU creation and execution loop
> accel/mshv: Add vCPU signal handling
> target/i386/mshv: Add CPU create and remove logic
> target/i386/mshv: Implement mshv_store_regs()
> target/i386/mshv: Implement mshv_get_standard_regs()
> target/i386/mshv: Implement mshv_get_special_regs()
> target/i386/mshv: Implement mshv_arch_put_registers()
> target/i386/mshv: Set local interrupt controller state
> target/i386/mshv: Register CPUID entries with MSHV
> target/i386/mshv: Register MSRs with MSHV
> target/i386/mshv: Integrate x86 instruction decoder/emulator
> target/i386/mshv: Write MSRs to the hypervisor
> target/i386/mshv: Implement mshv_vcpu_run()
> accel/mshv: Handle overlapping mem mappings
I only had a cursory look at this. I'm definitely not an expert on
RCU, so the more reviews we can get the better. To the best of my
(limited) knowledge, the code looks reasonable.
> docs: Add mshv to documentation
> MAINTAINERS: Add maintainers for mshv accelerator
>
The rest looks okay.
> Praveen K Paladugu (1):
> qapi/accel: Allow to query mshv capabilities
>
This looks problematic and probably needs to be changed.
I really hope that we can commit as many patches as possible to QEMU
tree, so that we don't need to keep rebasing.
Thanks,
Wei
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator
2025-08-07 14:39 [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Magnus Kulke
` (26 preceding siblings ...)
2025-08-11 17:59 ` [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator Wei Liu
@ 2025-09-11 6:59 ` Michael S. Tsirkin
2025-09-11 15:21 ` Paolo Bonzini
2025-09-18 4:08 ` Mohamed Mediouni
27 siblings, 2 replies; 46+ messages in thread
From: Michael S. Tsirkin @ 2025-09-11 6:59 UTC (permalink / raw)
To: Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Markus Armbruster,
Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Thu, Aug 07, 2025 at 04:39:25PM +0200, Magnus Kulke wrote:
> Hey all,
>
> This is the third revision of a patch set implementing an accelerator
> for the MSHV kernel driver, exposing HyperV to Linux "Dom0" hosts in various
> scenarios. Thank you for the feedback so far. Since the last revision we
> incorporated feedback from the last review and identified further areas for
> performance fixes, notably in the irqfd handling. I'm looking forward to your
> comments.
>
> Best regards,
>
> magnus
So regarding merging plans. Did you guys get in touch
with Sunil? That's the easiest smoothest path, through
an existing maintainer.
> Changelog:
>
> v2 => v3
>
> - Addressed code review comments (style)
> - Reserve GSI 01-23 for IO-APIC pins (this resolved a problem in which MSI
> routes would be overwritten with interrupts from legacy devices, breaking
> irqfd notification for virtio-blk queues)
> - Guard memory slot table with mutex and RCU mechanism (multiple threads
> might access the memory slot table, and in the event of an UNMAPPED_GPA
> exit we need to query the table for an unmapped region covering that GPA)
> - Include memory slot manager in MshvState
> - Produce mshv.h kernel header with ./scripts/update-linux-headers.sh from
> linux 6.16 (not all UAPI definitions are defined in the upstream kernel,
> hence we ship hw/hyperv/hvgdk*.h and hw/hyperv/hvhdk*.h headers)
> - Added a QMP command query-mshv (a requirement for integration into
> higher-level tooling)
> - Removed handling of HALT vm exit, since this is not a supported HV
> message any more.
> - Added 2 maintainers from Microsoft for the respective file hierarchy
> - Added mshv as accelerator option in the documentation
>
> RFC (v1) => v2
>
> - Addressed code review comments (style, consolidation).
> - Rewrote the logic that handles overlap-in-userspace mappings to use
> a static list of slots, inspired by the HVF accelerator code.
> - Fixed a bug that wrote corrupt payload in a MSHV_SET_MSI_ROUTING
> call, preventing vhost=on to work on tap network devices.
> - Removed an erroneous truncation of guest addresses to 32bit when
> registering ioeventfd's using MSHV_IOEVENTFD. This resulted in
> shadowing of low memory when ioevents were registered with
> addresses beyond the 4gb barrier and thus unexpected "unmapped gpa"
> vm exits in lower mem regions (impacting io performance).
> - Fixed problem in which the MSI routing table was committed for KVM
> instead of MSHV in virtio-pci bus initialization.
> - Added handler for HLT vm exits.
> - The above fixes removed a few limitations present in the previous
> revision:
> - Guests with machine type "pc" are booting (testing is still mostly
> performed with q35)
> - Tap network devices can be used with vhost=on option.
> - Seabios can be used with >2.75G memory and multiple virtio-pci
> devices
> - I/O performance improvement as extraneous MMIO vm exits are avoided
> by registering ioevents with a correct address.
>
> Notes:
>
> - A discrete kernel ioctl "set_immediate_exit" (to avoid a race condition
> when handling termination signals like ctrl-a x) has been tested and
> proven to mitigate the problem. Since other consumers of /dev/mshv have
> similar requirements as QEMU, we opted to iterate a bit more on the
> respective kernel interface.
>
> Magnus Kulke (25):
> accel: Add Meson and config support for MSHV accelerator
> target/i386/emulate: Allow instruction decoding from stream
> target/i386/mshv: Add x86 decoder/emu implementation
> hw/intc: Generalize APIC helper names from kvm_* to accel_*
> include/hw/hyperv: Add MSHV ABI header definitions
> linux-headers/linux: Add mshv.h headers
> accel/mshv: Add accelerator skeleton
> accel/mshv: Register memory region listeners
> accel/mshv: Initialize VM partition
> accel/mshv: Add vCPU creation and execution loop
> accel/mshv: Add vCPU signal handling
> target/i386/mshv: Add CPU create and remove logic
> target/i386/mshv: Implement mshv_store_regs()
> target/i386/mshv: Implement mshv_get_standard_regs()
> target/i386/mshv: Implement mshv_get_special_regs()
> target/i386/mshv: Implement mshv_arch_put_registers()
> target/i386/mshv: Set local interrupt controller state
> target/i386/mshv: Register CPUID entries with MSHV
> target/i386/mshv: Register MSRs with MSHV
> target/i386/mshv: Integrate x86 instruction decoder/emulator
> target/i386/mshv: Write MSRs to the hypervisor
> target/i386/mshv: Implement mshv_vcpu_run()
> accel/mshv: Handle overlapping mem mappings
> docs: Add mshv to documentation
> MAINTAINERS: Add maintainers for mshv accelerator
>
> Praveen K Paladugu (1):
> qapi/accel: Allow to query mshv capabilities
>
> MAINTAINERS | 15 +
> accel/Kconfig | 3 +
> accel/accel-irq.c | 106 ++
> accel/meson.build | 3 +-
> accel/mshv/irq.c | 397 +++++++
> accel/mshv/mem.c | 562 ++++++++++
> accel/mshv/meson.build | 9 +
> accel/mshv/mshv-all.c | 726 +++++++++++++
> accel/mshv/msr.c | 373 +++++++
> accel/mshv/trace-events | 30 +
> accel/mshv/trace.h | 14 +
> docs/devel/codebase.rst | 2 +-
> hw/core/machine-qmp-cmds.c | 14 +
> hw/intc/apic.c | 8 +
> hw/intc/ioapic.c | 20 +-
> hw/virtio/virtio-pci.c | 21 +-
> include/hw/hyperv/hvgdk.h | 19 +
> include/hw/hyperv/hvgdk_mini.h | 864 +++++++++++++++
> include/hw/hyperv/hvhdk.h | 164 +++
> include/hw/hyperv/hvhdk_mini.h | 105 ++
> include/system/accel-irq.h | 37 +
> include/system/mshv.h | 195 ++++
> linux-headers/linux/mshv.h | 291 ++++++
> meson.build | 11 +
> meson_options.txt | 2 +
> qapi/accelerator.json | 29 +
> qemu-options.hx | 16 +-
> scripts/meson-buildoptions.sh | 3 +
> scripts/update-linux-headers.sh | 2 +-
> target/i386/cpu.h | 4 +-
> target/i386/emulate/meson.build | 7 +-
> target/i386/emulate/x86_decode.c | 27 +-
> target/i386/emulate/x86_decode.h | 9 +
> target/i386/emulate/x86_emu.c | 3 +-
> target/i386/emulate/x86_emu.h | 2 +
> target/i386/meson.build | 2 +
> target/i386/mshv/meson.build | 8 +
> target/i386/mshv/mshv-cpu.c | 1674 ++++++++++++++++++++++++++++++
> target/i386/mshv/x86.c | 297 ++++++
> 39 files changed, 6038 insertions(+), 36 deletions(-)
> create mode 100644 accel/accel-irq.c
> create mode 100644 accel/mshv/irq.c
> create mode 100644 accel/mshv/mem.c
> create mode 100644 accel/mshv/meson.build
> create mode 100644 accel/mshv/mshv-all.c
> create mode 100644 accel/mshv/msr.c
> create mode 100644 accel/mshv/trace-events
> create mode 100644 accel/mshv/trace.h
> create mode 100644 include/hw/hyperv/hvgdk.h
> create mode 100644 include/hw/hyperv/hvgdk_mini.h
> create mode 100644 include/hw/hyperv/hvhdk.h
> create mode 100644 include/hw/hyperv/hvhdk_mini.h
> create mode 100644 include/system/accel-irq.h
> create mode 100644 include/system/mshv.h
> create mode 100644 linux-headers/linux/mshv.h
> create mode 100644 target/i386/mshv/meson.build
> create mode 100644 target/i386/mshv/mshv-cpu.c
> create mode 100644 target/i386/mshv/x86.c
>
> --
> 2.34.1
* Re: [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator
2025-09-11 6:59 ` Michael S. Tsirkin
@ 2025-09-11 15:21 ` Paolo Bonzini
2025-09-11 16:07 ` Wei Liu
2025-09-11 16:26 ` Magnus Kulke
2025-09-18 4:08 ` Mohamed Mediouni
1 sibling, 2 replies; 46+ messages in thread
From: Paolo Bonzini @ 2025-09-11 15:21 UTC (permalink / raw)
To: Michael S. Tsirkin, Magnus Kulke
Cc: qemu-devel, Eric Blake, Eduardo Habkost, Markus Armbruster,
Magnus Kulke, Richard Henderson, Phil Dennis-Jordan,
Marcel Apfelbaum, Alex Bennée, Daniel P. Berrangé,
Magnus Kulke, Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On 9/11/25 08:59, Michael S. Tsirkin wrote:
> On Thu, Aug 07, 2025 at 04:39:25PM +0200, Magnus Kulke wrote:
>> Hey all,
>>
>> This is the third revision of a patch set implementing an accelerator
>> for the MSHV kernel driver, exposing HyperV to Linux "Dom0" hosts in various
>> scenarios. Thank you for the feedback so far. Since the last revision we
>> incorporated feedback from the last review and identified further areas for
>> performance fixes, notably in the irqfd handling. I'm looking forward to your
>> comments.
>>
>> Best regards,
>>
>> magnus
>
>
> So regarding merging plans. Did you guys get in touch
> with Sunil? That's the easiest smoothest path, through
> an existing maintainer.
There's hardly any code shared with WHPX; I am on vacation this week but
I'll do a final review and merge it soon.
Paolo
* Re: [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator
2025-09-11 15:21 ` Paolo Bonzini
@ 2025-09-11 16:07 ` Wei Liu
2025-09-11 16:26 ` Magnus Kulke
1 sibling, 0 replies; 46+ messages in thread
From: Wei Liu @ 2025-09-11 16:07 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael S. Tsirkin, Magnus Kulke, qemu-devel, Eric Blake,
Eduardo Habkost, Markus Armbruster, Magnus Kulke,
Richard Henderson, Phil Dennis-Jordan, Marcel Apfelbaum,
Alex Bennée, Daniel P. Berrangé, Magnus Kulke,
Cornelia Huck, Zhao Liu, Thomas Huth, Yanan Wang,
Cameron Esfahani, Wei Liu, Wei Liu, Marc-André Lureau,
Roman Bolshakov, Philippe Mathieu-Daudé
On Thu, Sep 11, 2025 at 05:21:44PM +0200, Paolo Bonzini wrote:
> On 9/11/25 08:59, Michael S. Tsirkin wrote:
> > On Thu, Aug 07, 2025 at 04:39:25PM +0200, Magnus Kulke wrote:
> > > Hey all,
> > >
> > > This is the third revision of a patch set implementing an accelerator
> > > for the MSHV kernel driver, exposing HyperV to Linux "Dom0" hosts in various
> > > scenarios. Thank you for the feedback so far. Since the last revision we
> > > incorporated feedback from the last review and identified further areas for
> > > performance fixes, notably in the irqfd handling. I'm looking forward to your
> > > comments.
> > >
> > > Best regards,
> > >
> > > magnus
> >
> >
> > So regarding merging plans. Did you guys get in touch
> > with Sunil? That's the easiest smoothest path, through
> > an existing maintainer.
>
> There's hardly any code shared with WHPX; I am on vacation this week but
> I'll do a final review and merge it soon.
+1 on this. Nothing's shared with WHPX.
Magnus, can you confirm this is ready to be merged? Is there another
version brewing?
Thanks,
Wei
>
> Paolo
>
* Re: [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator
2025-09-11 15:21 ` Paolo Bonzini
2025-09-11 16:07 ` Wei Liu
@ 2025-09-11 16:26 ` Magnus Kulke
1 sibling, 0 replies; 46+ messages in thread
From: Magnus Kulke @ 2025-09-11 16:26 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Michael S. Tsirkin, qemu-devel, Eric Blake, Eduardo Habkost,
Markus Armbruster, Magnus Kulke, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
Daniel P. Berrangé, Magnus Kulke, Cornelia Huck, Zhao Liu,
Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu, Wei Liu,
Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
On Thu, Sep 11, 2025 at 05:21:44PM +0200, Paolo Bonzini wrote:
> On 9/11/25 08:59, Michael S. Tsirkin wrote:
> > On Thu, Aug 07, 2025 at 04:39:25PM +0200, Magnus Kulke wrote:
> > > Hey all,
> > >
> > > This is the third revision of a patch set implementing an accelerator
> > > for the MSHV kernel driver, exposing HyperV to Linux "Dom0" hosts in various
> > > scenarios. Thank you for the feedback so far. Since the last revision we
> > > incorporated feedback from the last review and identified further areas for
> > > performance fixes, notably in the irqfd handling. I'm looking forward to your
> > > comments.
> > >
> > > Best regards,
> > >
> > > magnus
> >
> >
> > So regarding merging plans. Did you guys get in touch
> > with Sunil? That's the easiest smoothest path, through
> > an existing maintainer.
>
> There's hardly any code shared with WHPX; I am on vacation this week but
> I'll do a final review and merge it soon.
>
> Paolo
>
Hello Paolo, we intend to post a v4 soon-ish (next week probably), which
will include a necessary update to some ioctl calls, along with some minor
changes due to review feedback and a rebase, since this is a couple of
weeks old already. It probably makes sense to wait for that.
best,
magnus
* Re: [PATCH v3 00/26] Implementing a MSHV (Microsoft Hypervisor) accelerator
2025-09-11 6:59 ` Michael S. Tsirkin
2025-09-11 15:21 ` Paolo Bonzini
@ 2025-09-18 4:08 ` Mohamed Mediouni
1 sibling, 0 replies; 46+ messages in thread
From: Mohamed Mediouni @ 2025-09-18 4:08 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Magnus Kulke, qemu-devel, Eric Blake, Eduardo Habkost,
Markus Armbruster, Magnus Kulke, Paolo Bonzini, Richard Henderson,
Phil Dennis-Jordan, Marcel Apfelbaum, Alex Bennée,
"Daniel P. Berrangé", Magnus Kulke, Cornelia Huck,
Zhao Liu, Thomas Huth, Yanan Wang, Cameron Esfahani, Wei Liu,
Wei Liu, Marc-André Lureau, Roman Bolshakov,
Philippe Mathieu-Daudé
> On 11. Sep 2025, at 08:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> So regarding merging plans. Did you guys get in touch
> with Sunil? That's the easiest smoothest path, through
> an existing maintainer.
On that bit:
Didn’t submit this yet but this is coming in the next WHPX arm64 patch revision:
MAINTAINERS: update maintainers for WHPX
And add arm64 files.
From Pedro Barbuda (on Teams):
> we meant to have that switched a while back. you can add me as the maintainer. Pedro Barbuda (pbarbuda@microsoft.com)
Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>
diff --git a/MAINTAINERS b/MAINTAINERS
index ece8624d01..6b1764ccf0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -544,11 +544,14 @@ F: accel/stubs/hvf-stub.c
F: include/system/hvf.h
F: include/system/hvf_int.h
-WHPX CPUs
-M: Sunil Muthuswamy <sunilmut@microsoft.com>
+WHPX
+M: Pedro Barbuda <pbarbuda@microsoft.com>
+M: Mohamed Mediouni <mohamed@unpredictable.fr>
S: Supported
F: accel/whpx/
F: target/i386/whpx/
+F: target/arm/whpx_arm.h
+F: target/arm/whpx/
F: accel/stubs/whpx-stub.c
F: include/system/whpx.h
F: include/system/whpx-accel-ops.h