* [PATCH 0/6] Hyper-V: kexec fixes for L1VH (mshv)
@ 2026-03-27 20:19 Jork Loeser
2026-03-27 20:19 ` [PATCH 1/6] Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing Jork Loeser
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser
This series fixes kexec support when Linux runs as an L1 Virtual Host
(L1VH) under Hyper-V, using the MSHV driver to manage child VMs. The
patches cover:
1. Fix a variable shadowing bug in vmbus that hides the cpuhp state
used for teardown.
2. Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to
hv_machine_shutdown(). This ensures stimer cleanup happens before
the vmbus unload.
3. LP/VP re-creation: after kexec, logical processors and virtual
processors already exist in the hypervisor. Detect this and skip
re-adding them.
4-5. SynIC cleanup: the MSHV driver manages its own SynIC resources
separately from vmbus. Add proper teardown of MSHV-owned SINTs,
SIMP, and SIEFP on kexec, scoped to only the resources MSHV
owns.
6. Debugfs stats pages: unmap the VP statistics overlay pages before
kexec to avoid stale mappings in the new kernel.
Jork Loeser (6):
Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing
x86/hyperv: move stimer cleanup to hv_machine_shutdown()
x86/hyperv: Skip LP/VP creation on kexec
mshv: limit SynIC management to MSHV-owned resources
mshv: clean up SynIC state on kexec for L1VH
mshv: unmap debugfs stats pages on kexec
arch/x86/kernel/cpu/mshyperv.c | 15 +++-
drivers/hv/hv_proc.c | 47 +++++++++++
drivers/hv/mshv_debugfs.c | 7 +-
drivers/hv/mshv_root_main.c | 22 ++---
drivers/hv/mshv_synic.c | 144 ++++++++++++++++++++++-----------
drivers/hv/vmbus_drv.c | 2 -
include/asm-generic/mshyperv.h | 10 +++
include/hyperv/hvgdk_mini.h | 1 +
include/hyperv/hvhdk_mini.h | 12 +++
9 files changed, 190 insertions(+), 70 deletions(-)
--
2.43.0
* [PATCH 1/6] Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser
vmbus_alloc_synic_and_connect() declares a local 'int
hyperv_cpuhp_online' that shadows the file-scope global of the same
name. The cpuhp state returned by cpuhp_setup_state() is stored in
the local, leaving the global at 0 (CPUHP_OFFLINE). When
hv_kexec_handler() or hv_machine_shutdown() later calls
cpuhp_remove_state(hyperv_cpuhp_online), it passes 0, which hits the
BUG_ON in __cpuhp_remove_state_cpuslocked().
Remove the local declaration so the cpuhp state is stored in the
file-scope global where hv_kexec_handler() and hv_machine_shutdown()
expect it.
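The shadowing pattern can be reproduced in a few lines of standalone C.
This is only an illustrative sketch, not the vmbus code: state_id,
fake_setup_state(), init_shadowed(), init_fixed() and
state_for_teardown() are hypothetical names, and the value 42 merely
stands in for a dynamically allocated cpuhp state.

```c
#include <assert.h>

/* File-scope variable, playing the role of the global hyperv_cpuhp_online. */
static int state_id;

/* Hypothetical stand-in for a setup call that returns a dynamic state ID. */
static int fake_setup_state(void)
{
	return 42;
}

/* Buggy variant: the local declaration shadows the file-scope variable. */
static int init_shadowed(void)
{
	int state_id;	/* shadows the global, which is never written */

	state_id = fake_setup_state();
	return state_id < 0 ? state_id : 0;
}

/* Fixed variant: no local declaration, so the global receives the ID. */
static int init_fixed(void)
{
	int ret = fake_setup_state();

	if (ret < 0)
		return ret;
	state_id = ret;
	return 0;
}

/* Teardown code only ever sees the file-scope variable. */
static int state_for_teardown(void)
{
	assert(state_id >= 0);
	return state_id;
}
```

After init_shadowed() the value seen by state_for_teardown() is still 0,
while after init_fixed() it is the ID returned by the setup call.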
Fixes: 2647c96649ba ("Drivers: hv: Support establishing the confidential VMBus connection")
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
drivers/hv/vmbus_drv.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 3e7a52918ce0..301273d61892 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1430,7 +1430,6 @@ static int vmbus_alloc_synic_and_connect(void)
{
int ret, cpu;
struct work_struct __percpu *works;
- int hyperv_cpuhp_online;
ret = hv_synic_alloc();
if (ret < 0)
--
2.43.0
* [PATCH 2/6] x86/hyperv: move stimer cleanup to hv_machine_shutdown()
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser,
Anirudh Rayabharam
Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to
hv_machine_shutdown() in the platform code. This ensures stimer cleanup
happens before the vmbus unload, which is required for root partition
kexec to work correctly.
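The resulting call order can be modelled outside the kernel. The sketch
below is an assumption-laden toy, not the platform code: call_log,
record(), stimer_cleanup(), bus_unload(), kexec_handler and
machine_shutdown() are hypothetical stand-ins. It shows that the timer
cleanup step runs first, and that it runs even when no bus handler is
registered.

```c
#include <assert.h>
#include <stddef.h>

enum step { STEP_STIMER_CLEANUP, STEP_BUS_UNLOAD };

/* Records the order in which the shutdown steps were executed. */
static enum step call_log[4];
static size_t call_count;

static void record(enum step s)
{
	assert(call_count < sizeof(call_log) / sizeof(call_log[0]));
	call_log[call_count++] = s;
}

static void reset_log(void)
{
	call_count = 0;
}

static void stimer_cleanup(void)
{
	record(STEP_STIMER_CLEANUP);
}

static void bus_unload(void)
{
	record(STEP_BUS_UNLOAD);
}

/* Optional hook; stays NULL when no bus driver registered a handler. */
static void (*kexec_handler)(void);

/* Timer cleanup is unconditional on kexec; the hook runs afterwards. */
static void machine_shutdown(int kexec_in_progress)
{
	if (kexec_in_progress) {
		stimer_cleanup();
		if (kexec_handler)
			kexec_handler();
	}
}
```

With kexec_handler left NULL the log contains only the timer cleanup
step; with kexec_handler set to bus_unload the log shows the cleanup
followed by the unload.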
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
arch/x86/kernel/cpu/mshyperv.c | 8 ++++++--
drivers/hv/vmbus_drv.c | 1 -
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 89a2eb8a0722..235087456bdf 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -235,8 +235,12 @@ void hv_remove_crash_handler(void)
#ifdef CONFIG_KEXEC_CORE
static void hv_machine_shutdown(void)
{
- if (kexec_in_progress && hv_kexec_handler)
- hv_kexec_handler();
+ if (kexec_in_progress) {
+ hv_stimer_global_cleanup();
+
+ if (hv_kexec_handler)
+ hv_kexec_handler();
+ }
/*
* Call hv_cpu_die() on all the CPUs, otherwise later the hypervisor
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 301273d61892..5d1449f8c6ea 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2892,7 +2892,6 @@ static struct platform_driver vmbus_platform_driver = {
static void hv_kexec_handler(void)
{
- hv_stimer_global_cleanup();
vmbus_initiate_unload(false);
/* Make sure conn_state is set as hv_synic_cleanup checks for it */
mb();
--
2.43.0
* [PATCH 3/6] x86/hyperv: Skip LP/VP creation on kexec
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser,
Anirudh Rayabharam, Stanislav Kinsburskii, Mukesh Rathor
After a kexec, the logical processors and virtual processors already
exist in the hypervisor because they were created by the previous
kernel. Attempting to add them again causes either a BUG_ON or
corrupted VP state leading to MCEs in the new kernel.
Add hv_lp_exists() to probe whether an LP is already present by
calling HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME. When the call succeeds,
the LP exists and we skip the add-LP and create-VP loops entirely.
Also add hv_call_notify_all_processors_started(), which informs the
hypervisor that all processors are online. This is required after
adding LPs (fresh boot) and is not issued on kexec, since that path
is skipped.
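The probe-and-skip decision can be sketched as a small classifier. The
names and values below (probe_status, probe_result, classify_probe(),
should_create_processors()) are hypothetical and do not correspond to
real hypervisor status codes. In the patch an unexpected status is
fatal (BUG()); the sketch reports it as a distinct result and asserts
on it in the decision helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values; the real hypervisor codes differ. */
enum probe_status {
	PROBE_SUCCESS,
	PROBE_INVALID_LP_INDEX,
	PROBE_OTHER_ERROR,
};

enum probe_result { LP_PRESENT, LP_ABSENT, LP_PROBE_FAILED };

/* Map the outcome of the run-time query onto "exists / does not exist". */
static enum probe_result classify_probe(enum probe_status status)
{
	switch (status) {
	case PROBE_SUCCESS:
		return LP_PRESENT;	/* query worked: LP was already added */
	case PROBE_INVALID_LP_INDEX:
		return LP_ABSENT;	/* expected answer on a fresh boot */
	default:
		return LP_PROBE_FAILED;	/* anything else is unexpected */
	}
}

/* Decide whether the add-LP and create-VP loops should run at all. */
static bool should_create_processors(unsigned int present_cpus,
				     enum probe_result lp1)
{
	if (present_cpus == 1)
		return false;	/* only the boot CPU, nothing to add */

	assert(lp1 != LP_PROBE_FAILED);	/* treated as fatal in the patch */
	return lp1 == LP_ABSENT;
}
```

Only a multi-CPU system whose first AP is reported absent takes the
creation path; a kexec'd kernel sees LP_PRESENT and skips it.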
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Co-developed-by: Stanislav Kinsburskii <stanislav.kinsburski@gmail.com>
Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburski@gmail.com>
Co-developed-by: Mukesh Rathor <mukeshrathor@microsoft.com>
Signed-off-by: Mukesh Rathor <mukeshrathor@microsoft.com>
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
arch/x86/kernel/cpu/mshyperv.c | 7 +++++
drivers/hv/hv_proc.c | 47 ++++++++++++++++++++++++++++++++++
include/asm-generic/mshyperv.h | 10 ++++++++
include/hyperv/hvgdk_mini.h | 1 +
include/hyperv/hvhdk_mini.h | 12 +++++++++
5 files changed, 77 insertions(+)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 235087456bdf..f653feea880b 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -429,6 +429,10 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
}
#ifdef CONFIG_X86_64
+ /* If AP LPs exist, we are in a kexec'd kernel and VPs already exist */
+ if (num_present_cpus() == 1 || hv_lp_exists(1))
+ return;
+
for_each_present_cpu(i) {
if (i == 0)
continue;
@@ -436,6 +440,9 @@ static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
BUG_ON(ret);
}
+ ret = hv_call_notify_all_processors_started();
+ WARN_ON(ret);
+
for_each_present_cpu(i) {
if (i == 0)
continue;
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 5f4fd9c3231c..63a48e5a02c5 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -239,3 +239,50 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
return ret;
}
EXPORT_SYMBOL_GPL(hv_call_create_vp);
+
+int hv_call_notify_all_processors_started(void)
+{
+ struct hv_input_notify_partition_event *input;
+ u64 status;
+ unsigned long irq_flags;
+ int ret = 0;
+
+ local_irq_save(irq_flags);
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ memset(input, 0, sizeof(*input));
+ input->event = HV_PARTITION_ALL_LOGICAL_PROCESSORS_STARTED;
+ status = hv_do_hypercall(HVCALL_NOTIFY_PARTITION_EVENT,
+ input, NULL);
+ local_irq_restore(irq_flags);
+
+ if (!hv_result_success(status)) {
+ hv_status_err(status, "\n");
+ ret = hv_result_to_errno(status);
+ }
+ return ret;
+}
+
+bool hv_lp_exists(u32 lp_index)
+{
+ struct hv_input_get_logical_processor_run_time *input;
+ struct hv_output_get_logical_processor_run_time *output;
+ unsigned long flags;
+ u64 status;
+
+ local_irq_save(flags);
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+ input->lp_index = lp_index;
+ status = hv_do_hypercall(HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME,
+ input, output);
+ local_irq_restore(flags);
+
+ if (!hv_result_success(status) &&
+ hv_result(status) != HV_STATUS_INVALID_LP_INDEX) {
+ hv_status_err(status, "\n");
+ BUG();
+ }
+
+ return hv_result_success(status);
+}
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index d37b68238c97..bf601d67cecb 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -347,6 +347,8 @@ bool hv_result_needs_memory(u64 status);
int hv_deposit_memory_node(int node, u64 partition_id, u64 status);
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
+int hv_call_notify_all_processors_started(void);
+bool hv_lp_exists(u32 lp_index);
int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
#else /* CONFIG_MSHV_ROOT */
@@ -366,6 +368,14 @@ static inline int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id)
{
return -EOPNOTSUPP;
}
+static inline int hv_call_notify_all_processors_started(void)
+{
+ return -EOPNOTSUPP;
+}
+static inline bool hv_lp_exists(u32 lp_index)
+{
+ return false;
+}
static inline int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
{
return -EOPNOTSUPP;
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 056ef7b6b360..f2598e186550 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -435,6 +435,7 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_ASSIST_PAGE */
/* HV_CALL_CODE */
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
+#define HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME 0x0004
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
#define HVCALL_ENABLE_VP_VTL 0x000f
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index 091c03e26046..b4cb2fa26e9b 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -362,6 +362,7 @@ union hv_partition_event_input {
enum hv_partition_event {
HV_PARTITION_EVENT_ROOT_CRASHDUMP = 2,
+ HV_PARTITION_ALL_LOGICAL_PROCESSORS_STARTED = 4,
};
struct hv_input_notify_partition_event {
@@ -369,6 +370,17 @@ struct hv_input_notify_partition_event {
union hv_partition_event_input input;
} __packed;
+struct hv_input_get_logical_processor_run_time {
+ u32 lp_index;
+} __packed;
+
+struct hv_output_get_logical_processor_run_time {
+ u64 global_time;
+ u64 local_run_time;
+ u64 rsvdz0;
+ u64 hypervisor_time;
+} __packed;
+
struct hv_lp_startup_status {
u64 hv_status;
u64 substatus1;
--
2.43.0
* [PATCH 4/6] mshv: limit SynIC management to MSHV-owned resources
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser
The SynIC is shared between VMBus and MSHV. VMBus owns the message
page (SIMP), event flags page (SIEFP), global enable (SCONTROL), and
SINT2. MSHV adds SINT0, SINT5, and the event ring page (SIRBP).
Currently mshv_synic_init() redundantly enables SIMP, SIEFP, and
SCONTROL that VMBus already configured, and mshv_synic_cleanup()
disables all of them. This is wrong because MSHV can be torn down
while VMBus is still active. In particular, a kexec reboot notifier
tears down MSHV first. Disabling SCONTROL, SIMP, and SIEFP out from
under VMBus causes its later cleanup to write SynIC MSRs while SynIC
is disabled, which the hypervisor does not tolerate.
Restrict MSHV to managing only the resources it owns:
- SINT0, SINT5: mask on cleanup, unmask on init
- SIRBP: enable/disable as before
- SIMP, SIEFP, SCONTROL: on L1VH leave entirely to VMBus (it
already enabled them); on root partition VMBus doesn't run, so
MSHV must enable/disable them
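The ownership rule above can be expressed as a bitmask, which makes it
easy to see what survives an MSHV teardown. This is a simplified model
under stated assumptions: the RES_* flags, mshv_owned() and
mshv_cleanup() are hypothetical, the model tracks only "enabled or
not", and it leaves out SINT2, which VMBus owns in either case.

```c
#include <assert.h>
#include <stdbool.h>

/* One flag per SynIC resource mentioned in the commit message. */
enum synic_res {
	RES_SINT0    = 1u << 0,
	RES_SINT5    = 1u << 1,
	RES_SIRBP    = 1u << 2,
	RES_SIMP     = 1u << 3,
	RES_SIEFP    = 1u << 4,
	RES_SCONTROL = 1u << 5,
	RES_ALL      = (1u << 6) - 1,
};

/* Resources MSHV is responsible for, depending on the partition type. */
static unsigned int mshv_owned(bool root_partition)
{
	unsigned int owned = RES_SINT0 | RES_SINT5 | RES_SIRBP;

	/* On root, VMBus does not run, so MSHV owns the shared ones too. */
	if (root_partition)
		owned |= RES_SIMP | RES_SIEFP | RES_SCONTROL;
	return owned;
}

/* Return the set of still-enabled resources after MSHV cleanup. */
static unsigned int mshv_cleanup(unsigned int enabled, bool root_partition)
{
	assert((enabled & ~(unsigned int)RES_ALL) == 0);
	return enabled & ~mshv_owned(root_partition);
}
```

On L1VH the cleanup leaves SIMP, SIEFP and SCONTROL enabled for VMBus;
on the root partition nothing remains enabled.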
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
drivers/hv/mshv_synic.c | 109 ++++++++++++++++++++++++----------------
1 file changed, 67 insertions(+), 42 deletions(-)
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index f8b0337cdc82..8a7d76a10dc3 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -454,7 +454,6 @@ int mshv_synic_init(unsigned int cpu)
#ifdef HYPERVISOR_CALLBACK_VECTOR
union hv_synic_sint sint;
#endif
- union hv_synic_scontrol sctrl;
struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
struct hv_synic_event_flags_page **event_flags_page =
@@ -462,28 +461,37 @@ int mshv_synic_init(unsigned int cpu)
struct hv_synic_event_ring_page **event_ring_page =
&spages->synic_event_ring_page;
- /* Setup the Synic's message page */
+ /*
+ * Map the SYNIC message page. On root partition the hypervisor
+ * pre-provisions the SIMP GPA but may not set simp_enabled;
+ * on L1VH, VMBus already fully set it up. Enable it on root.
+ */
simp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIMP);
- simp.simp_enabled = true;
+ if (hv_root_partition()) {
+ simp.simp_enabled = true;
+ hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
+ }
*msg_page = memremap(simp.base_simp_gpa << HV_HYP_PAGE_SHIFT,
HV_HYP_PAGE_SIZE,
MEMREMAP_WB);
if (!(*msg_page))
- return -EFAULT;
-
- hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
+ goto cleanup_simp;
- /* Setup the Synic's event flags page */
+ /*
+ * Map the event flags page. Same as SIMP: enable on root,
+ * already enabled by VMBus on L1VH.
+ */
siefp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIEFP);
- siefp.siefp_enabled = true;
+ if (hv_root_partition()) {
+ siefp.siefp_enabled = true;
+ hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
+ }
*event_flags_page = memremap(siefp.base_siefp_gpa << PAGE_SHIFT,
PAGE_SIZE, MEMREMAP_WB);
if (!(*event_flags_page))
- goto cleanup;
-
- hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
+ goto cleanup_siefp;
/* Setup the Synic's event ring page */
sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
@@ -492,7 +500,7 @@ int mshv_synic_init(unsigned int cpu)
PAGE_SIZE, MEMREMAP_WB);
if (!(*event_ring_page))
- goto cleanup;
+ goto cleanup_siefp;
hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
@@ -515,28 +523,33 @@ int mshv_synic_init(unsigned int cpu)
sint.as_uint64);
#endif
- /* Enable global synic bit */
- sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
- sctrl.enable = 1;
- hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
+ /*
+ * On L1VH, VMBus owns SCONTROL and has already enabled it.
+ * On root partition, VMBus doesn't run so we must enable it.
+ */
+ if (hv_root_partition()) {
+ union hv_synic_scontrol sctrl;
+
+ sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
+ sctrl.enable = 1;
+ hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
+ }
return 0;
-cleanup:
- if (*event_ring_page) {
- sirbp.sirbp_enabled = false;
- hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
- memunmap(*event_ring_page);
- }
- if (*event_flags_page) {
+cleanup_siefp:
+ if (*event_flags_page)
+ memunmap(*event_flags_page);
+ if (hv_root_partition()) {
siefp.siefp_enabled = false;
hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
- memunmap(*event_flags_page);
}
- if (*msg_page) {
+cleanup_simp:
+ if (*msg_page)
+ memunmap(*msg_page);
+ if (hv_root_partition()) {
simp.simp_enabled = false;
hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
- memunmap(*msg_page);
}
return -EFAULT;
@@ -545,10 +558,7 @@ int mshv_synic_init(unsigned int cpu)
int mshv_synic_cleanup(unsigned int cpu)
{
union hv_synic_sint sint;
- union hv_synic_simp simp;
- union hv_synic_siefp siefp;
union hv_synic_sirbp sirbp;
- union hv_synic_scontrol sctrl;
struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
struct hv_synic_event_flags_page **event_flags_page =
@@ -568,28 +578,43 @@ int mshv_synic_cleanup(unsigned int cpu)
hv_set_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_DOORBELL_SINT_INDEX,
sint.as_uint64);
- /* Disable Synic's event ring page */
+ /* Disable SYNIC event ring page owned by MSHV */
sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
sirbp.sirbp_enabled = false;
hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
memunmap(*event_ring_page);
- /* Disable Synic's event flags page */
- siefp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIEFP);
- siefp.siefp_enabled = false;
- hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
+ /*
+ * Release our mappings of the message and event flags pages.
+ * On root partition, we enabled SIMP/SIEFP — disable them.
+ * On L1VH, VMBus owns the MSRs, leave them alone.
+ */
memunmap(*event_flags_page);
+ if (hv_root_partition()) {
+ union hv_synic_simp simp;
+ union hv_synic_siefp siefp;
- /* Disable Synic's message page */
- simp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIMP);
- simp.simp_enabled = false;
- hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
+ siefp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIEFP);
+ siefp.siefp_enabled = false;
+ hv_set_non_nested_msr(HV_MSR_SIEFP, siefp.as_uint64);
+
+ simp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIMP);
+ simp.simp_enabled = false;
+ hv_set_non_nested_msr(HV_MSR_SIMP, simp.as_uint64);
+ }
memunmap(*msg_page);
- /* Disable global synic bit */
- sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
- sctrl.enable = 0;
- hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
+ /*
+ * On root partition, we enabled SCONTROL in init — disable it.
+ * On L1VH, VMBus owns SCONTROL, leave it alone.
+ */
+ if (hv_root_partition()) {
+ union hv_synic_scontrol sctrl;
+
+ sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
+ sctrl.enable = 0;
+ hv_set_non_nested_msr(HV_MSR_SCONTROL, sctrl.as_uint64);
+ }
return 0;
}
--
2.43.0
* [PATCH 5/6] mshv: clean up SynIC state on kexec for L1VH
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser
Register the mshv reboot notifier for all parent partitions, not just
root. Previously the notifier was gated on hv_root_partition(), so on
L1VH (where hv_root_partition() is false) SINT0, SINT5, and SIRBP were
never cleaned up before kexec. The kexec'd kernel then inherited stale
unmasked SINTs and an enabled SIRBP pointing to freed memory.
The L1VH SIRBP also needs special handling: unlike the root partition
where the hypervisor provides the SIRBP page, L1VH must allocate its
own page and program the GPA into the MSR. Add this allocation to
mshv_synic_init() and the corresponding free to mshv_synic_cleanup().
Remove the unnecessary mshv_root_partition_init/exit wrappers and
register the reboot notifier directly in mshv_parent_partition_init().
Make mshv_reboot_nb static since it no longer needs external linkage.
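Programming a self-allocated page into the SIRBP register amounts to
storing its page frame number next to the enable bit. The sketch below
assumes a layout of bit 0 = enable, bits 1-11 = preserved and bits
12-63 = page frame number; that layout, the SKETCH_* macros and the
reg_set_page()/reg_clear_page() helpers are assumptions made for
illustration, not definitions taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_ENABLE_BIT 1ull
#define SKETCH_LOW_MASK   ((1ull << SKETCH_PAGE_SHIFT) - 1)

/* Point the register at a page-aligned physical address and enable it. */
static uint64_t reg_set_page(uint64_t reg, uint64_t phys_addr)
{
	/* The page frame number field cannot express an unaligned address. */
	assert((phys_addr & SKETCH_LOW_MASK) == 0);

	/* Keep the low 12 bits, replace the page frame number field. */
	reg = (reg & SKETCH_LOW_MASK) | phys_addr;
	return reg | SKETCH_ENABLE_BIT;
}

/* Disable the page and clear the address before the page is freed. */
static uint64_t reg_clear_page(uint64_t reg)
{
	/* Drop the frame number and the enable bit, keep preserved bits. */
	return reg & SKETCH_LOW_MASK & ~SKETCH_ENABLE_BIT;
}
```

Clearing the address as well as the enable bit mirrors the L1VH cleanup
path, where the page is returned to the allocator and must no longer be
referenced by the register.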
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
drivers/hv/mshv_root_main.c | 21 ++++-----------------
drivers/hv/mshv_synic.c | 37 ++++++++++++++++++++++++++++++-------
2 files changed, 34 insertions(+), 24 deletions(-)
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index e6509c980763..281f530b68a9 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2256,20 +2256,10 @@ static int mshv_reboot_notify(struct notifier_block *nb,
return 0;
}
-struct notifier_block mshv_reboot_nb = {
+static struct notifier_block mshv_reboot_nb = {
.notifier_call = mshv_reboot_notify,
};
-static void mshv_root_partition_exit(void)
-{
- unregister_reboot_notifier(&mshv_reboot_nb);
-}
-
-static int __init mshv_root_partition_init(struct device *dev)
-{
- return register_reboot_notifier(&mshv_reboot_nb);
-}
-
static int __init mshv_init_vmm_caps(struct device *dev)
{
int ret;
@@ -2339,8 +2329,7 @@ static int __init mshv_parent_partition_init(void)
if (ret)
goto remove_cpu_state;
- if (hv_root_partition())
- ret = mshv_root_partition_init(dev);
+ ret = register_reboot_notifier(&mshv_reboot_nb);
if (ret)
goto remove_cpu_state;
@@ -2368,8 +2357,7 @@ static int __init mshv_parent_partition_init(void)
deinit_root_scheduler:
root_scheduler_deinit();
exit_partition:
- if (hv_root_partition())
- mshv_root_partition_exit();
+ unregister_reboot_notifier(&mshv_reboot_nb);
remove_cpu_state:
cpuhp_remove_state(mshv_cpuhp_online);
free_synic_pages:
@@ -2387,8 +2375,7 @@ static void __exit mshv_parent_partition_exit(void)
misc_deregister(&mshv_dev);
mshv_irqfd_wq_cleanup();
root_scheduler_deinit();
- if (hv_root_partition())
- mshv_root_partition_exit();
+ unregister_reboot_notifier(&mshv_reboot_nb);
cpuhp_remove_state(mshv_cpuhp_online);
free_percpu(mshv_root.synic_pages);
}
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index 8a7d76a10dc3..32f91a714c97 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -495,13 +495,29 @@ int mshv_synic_init(unsigned int cpu)
/* Setup the Synic's event ring page */
sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
- sirbp.sirbp_enabled = true;
- *event_ring_page = memremap(sirbp.base_sirbp_gpa << PAGE_SHIFT,
- PAGE_SIZE, MEMREMAP_WB);
- if (!(*event_ring_page))
- goto cleanup_siefp;
+ if (hv_root_partition()) {
+ *event_ring_page = memremap(sirbp.base_sirbp_gpa << PAGE_SHIFT,
+ PAGE_SIZE, MEMREMAP_WB);
+
+ if (!(*event_ring_page))
+ goto cleanup_siefp;
+ } else {
+ /*
+ * On L1VH the hypervisor does not provide a SIRBP page.
+ * Allocate one and program its GPA into the MSR.
+ */
+ *event_ring_page = (struct hv_synic_event_ring_page *)
+ get_zeroed_page(GFP_KERNEL);
+
+ if (!(*event_ring_page))
+ goto cleanup_siefp;
+ sirbp.base_sirbp_gpa = virt_to_phys(*event_ring_page)
+ >> PAGE_SHIFT;
+ }
+
+ sirbp.sirbp_enabled = true;
hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
#ifdef HYPERVISOR_CALLBACK_VECTOR
@@ -581,8 +597,15 @@ int mshv_synic_cleanup(unsigned int cpu)
/* Disable SYNIC event ring page owned by MSHV */
sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
sirbp.sirbp_enabled = false;
- hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
- memunmap(*event_ring_page);
+
+ if (hv_root_partition()) {
+ hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
+ memunmap(*event_ring_page);
+ } else {
+ sirbp.base_sirbp_gpa = 0;
+ hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
+ free_page((unsigned long)*event_ring_page);
+ }
/*
* Release our mappings of the message and event flags pages.
--
2.43.0
* [PATCH 6/6] mshv: unmap debugfs stats pages on kexec
From: Jork Loeser @ 2026-03-27 20:19 UTC
To: linux-hyperv
Cc: x86, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H . Peter Anvin, Arnd Bergmann, Roman Kisel,
Michael Kelley, linux-kernel, linux-arch, Jork Loeser
On L1VH, debugfs stats pages are overlay pages: the kernel allocates
them and registers the GPAs with the hypervisor via
HVCALL_MAP_STATS_PAGE2. These overlay mappings persist in the
hypervisor across kexec. If the kexec'd kernel reuses those physical
pages, the hypervisor's overlay semantics cause a machine check
exception.
Fix this by calling mshv_debugfs_exit() from the reboot notifier,
which issues HVCALL_UNMAP_STATS_PAGE for each mapped stats page before
kexec. This releases the overlay bindings so the physical pages can be
safely reused. Guard mshv_debugfs_exit() against being called when
init failed.
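The guard follows a common pattern: never leave an error value in the
handle, and let teardown return early when the handle is NULL. The
sketch below illustrates that pattern with hypothetical names (struct
dir, root_dir, sketch_init(), sketch_exit(), unmap_calls). Resetting
the handle at the end of sketch_exit() is an addition of this sketch
to make repeated calls harmless; the patch itself only adds the NULL
check and the reset on the init failure path.

```c
#include <assert.h>
#include <stddef.h>

struct dir { int unused; };

static struct dir the_dir;
static struct dir *root_dir;	/* NULL means "not initialized" */
static int unmap_calls;		/* counts how often teardown did real work */

/* Init: on failure the handle is reset instead of keeping an error value. */
static int sketch_init(int fail)
{
	if (fail) {
		root_dir = NULL;
		return -1;
	}
	root_dir = &the_dir;
	return 0;
}

/* Exit: safe to call after a failed init, and safe to call twice. */
static void sketch_exit(void)
{
	if (!root_dir)
		return;

	assert(root_dir == &the_dir);
	unmap_calls++;		/* stands in for unmapping the stats pages */
	root_dir = NULL;	/* sketch-only: make a second call a no-op */
}
```

Calling sketch_exit() after a failed sketch_init() does nothing, and
calling it twice after a successful init performs the teardown once.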
Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com>
---
drivers/hv/mshv_debugfs.c | 7 ++++++-
drivers/hv/mshv_root_main.c | 1 +
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/mshv_debugfs.c b/drivers/hv/mshv_debugfs.c
index ebf2549eb44d..f9a4499cf8f3 100644
--- a/drivers/hv/mshv_debugfs.c
+++ b/drivers/hv/mshv_debugfs.c
@@ -676,8 +676,10 @@ int __init mshv_debugfs_init(void)
mshv_debugfs = debugfs_create_dir("mshv", NULL);
if (IS_ERR(mshv_debugfs)) {
+ err = PTR_ERR(mshv_debugfs);
+ mshv_debugfs = NULL;
pr_err("%s: failed to create debugfs directory\n", __func__);
- return PTR_ERR(mshv_debugfs);
+ return err;
}
if (hv_root_partition()) {
@@ -712,6 +714,9 @@ int __init mshv_debugfs_init(void)
void mshv_debugfs_exit(void)
{
+ if (!mshv_debugfs)
+ return;
+
mshv_debugfs_parent_partition_remove();
if (hv_root_partition()) {
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 281f530b68a9..7038fd830646 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2252,6 +2252,7 @@ root_scheduler_deinit(void)
static int mshv_reboot_notify(struct notifier_block *nb,
unsigned long code, void *unused)
{
+ mshv_debugfs_exit();
cpuhp_remove_state(mshv_cpuhp_online);
return 0;
}
--
2.43.0