* [PATCH v2 0/2] Add support for clean shutdown with MSHV
@ 2025-10-14 16:41 Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
0 siblings, 2 replies; 10+ messages in thread
From: Praveen K Paladugu @ 2025-10-14 16:41 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, linux-hyperv,
linux-kernel, bp, dave.hansen, x86, hpa, arnd
Cc: anbelski, prapal, easwar.hariharan, nunodasneves, skinsburskii
Add support for clean shutdown of the root partition when running on MSHV
hypervisor.
v2:
- Addressed review comments from v1.
- Moved all sleep state handling methods under CONFIG_ACPI stub
- - This fixes build issues on non-x86 architectures.
Moving the declartion of `hv_sleep_notifiers_register` to
arch/x86/include/asm/mshyperv.h, required the use of CONFIG_X86 stub
within hv_common.c. As this sleep configuration is dependent on ACPI,
I moved all the methods under CONFIG_ACPI to keep the code cleaner,
without introducing CONFIG_X86 stub.
Praveen K Paladugu (2):
hyperv: Add definitions for MSHV sleep state configuration
hyperv: Enable clean shutdown for root partition with MSHV
arch/x86/hyperv/hv_init.c | 7 ++
arch/x86/include/asm/mshyperv.h | 1 +
drivers/hv/hv_common.c | 119 ++++++++++++++++++++++++++++++++
include/hyperv/hvgdk_mini.h | 4 +-
include/hyperv/hvhdk_mini.h | 33 +++++++++
5 files changed, 163 insertions(+), 1 deletion(-)
--
2.51.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration
2025-10-14 16:41 [PATCH v2 0/2] Add support for clean shutdown with MSHV Praveen K Paladugu
@ 2025-10-14 16:41 ` Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
1 sibling, 0 replies; 10+ messages in thread
From: Praveen K Paladugu @ 2025-10-14 16:41 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, linux-hyperv,
linux-kernel, bp, dave.hansen, x86, hpa, arnd
Cc: anbelski, prapal, easwar.hariharan, nunodasneves, skinsburskii
Add the definitions required to configure sleep states in mshv hypervsior.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
---
include/hyperv/hvgdk_mini.h | 4 +++-
include/hyperv/hvhdk_mini.h | 33 +++++++++++++++++++++++++++++++++
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 1be7f6a02304..25041417b54b 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -464,18 +464,20 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_ASSIST_PAGE */
#define HVCALL_RESET_DEBUG_SESSION 0x006b
#define HVCALL_MAP_STATS_PAGE 0x006c
#define HVCALL_UNMAP_STATS_PAGE 0x006d
+#define HVCALL_SET_SYSTEM_PROPERTY 0x006f
#define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076
#define HVCALL_GET_SYSTEM_PROPERTY 0x007b
#define HVCALL_MAP_DEVICE_INTERRUPT 0x007c
#define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d
#define HVCALL_RETARGET_INTERRUPT 0x007e
+#define HVCALL_ENTER_SLEEP_STATE 0x0084
#define HVCALL_NOTIFY_PORT_RING_EMPTY 0x008b
#define HVCALL_REGISTER_INTERCEPT_RESULT 0x0091
#define HVCALL_ASSERT_VIRTUAL_INTERRUPT 0x0094
#define HVCALL_CREATE_PORT 0x0095
#define HVCALL_CONNECT_PORT 0x0096
#define HVCALL_START_VP 0x0099
-#define HVCALL_GET_VP_INDEX_FROM_APIC_ID 0x009a
+#define HVCALL_GET_VP_INDEX_FROM_APIC_ID 0x009a
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
#define HVCALL_SIGNAL_EVENT_DIRECT 0x00c0
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index 858f6a3925b3..8fa86c014c25 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -114,10 +114,24 @@ enum hv_snp_status {
enum hv_system_property {
/* Add more values when needed */
+ HV_SYSTEM_PROPERTY_SLEEP_STATE = 3,
HV_SYSTEM_PROPERTY_SCHEDULER_TYPE = 15,
HV_DYNAMIC_PROCESSOR_FEATURE_PROPERTY = 21,
};
+enum hv_sleep_state {
+ HV_SLEEP_STATE_S1 = 1,
+ HV_SLEEP_STATE_S2 = 2,
+ HV_SLEEP_STATE_S3 = 3,
+ HV_SLEEP_STATE_S4 = 4,
+ HV_SLEEP_STATE_S5 = 5,
+ /*
+ * After hypervisor has received this, any follow up sleep
+ * state registration requests will be rejected.
+ */
+ HV_SLEEP_STATE_LOCK = 6
+};
+
enum hv_dynamic_processor_feature_property {
/* Add more values when needed */
HV_X64_DYNAMIC_PROCESSOR_FEATURE_MAX_ENCRYPTED_PARTITIONS = 13,
@@ -145,6 +159,25 @@ struct hv_output_get_system_property {
};
} __packed;
+struct hv_sleep_state_info {
+ u32 sleep_state; /* enum hv_sleep_state */
+ u8 pm1a_slp_typ;
+ u8 pm1b_slp_typ;
+} __packed;
+
+struct hv_input_set_system_property {
+ u32 property_id; /* enum hv_system_property */
+ u32 reserved;
+ union {
+ /* More fields to be filled in when needed */
+ struct hv_sleep_state_info set_sleep_state_info;
+ };
+} __packed;
+
+struct hv_input_enter_sleep_state { /* HV_INPUT_ENTER_SLEEP_STATE */
+ u32 sleep_state; /* enum hv_sleep_state */
+} __packed;
+
struct hv_input_map_stats_page {
u32 type; /* enum hv_stats_object_type */
u32 padding;
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-14 16:41 [PATCH v2 0/2] Add support for clean shutdown with MSHV Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration Praveen K Paladugu
@ 2025-10-14 16:41 ` Praveen K Paladugu
2025-10-15 5:23 ` kernel test robot
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Praveen K Paladugu @ 2025-10-14 16:41 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, linux-hyperv,
linux-kernel, bp, dave.hansen, x86, hpa, arnd
Cc: anbelski, prapal, easwar.hariharan, nunodasneves, skinsburskii
When a shutdown is initiated in the root partition without configuring
sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
the root falls back to using legacy ACPI mechanisms to poweroff. This call
is intercepted by MSHV and will result in a Machine Check Exception (MCE).
Root panics with a trace similar to:
[ 81.306348] reboot: Power down
[ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
[ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
[ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
[ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 81.314717] Kernel panic - not syncing: Fatal machine check
To prevent this, properly configure sleep states within MSHV, allowing
the root partition to shut down cleanly without triggering a panic.
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
---
arch/x86/hyperv/hv_init.c | 7 ++
arch/x86/include/asm/mshyperv.h | 1 +
drivers/hv/hv_common.c | 119 ++++++++++++++++++++++++++++++++
3 files changed, 127 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index afdbda2dd7b7..57bd96671ead 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -510,6 +510,13 @@ void __init hyperv_init(void)
memunmap(src);
hv_remap_tsc_clocksource();
+ /*
+ * The notifier registration might fail at various hops.
+ * Corresponding error messages will land in dmesg. There is
+ * otherwise nothing that can be specifically done to handle
+ * failures here.
+ */
+ (void)hv_sleep_notifiers_register();
} else {
hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index abc4659f5809..fb8d691193df 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
void hv_apic_init(void);
void __init hv_init_spinlocks(void);
bool hv_vcpu_is_preempted(int vcpu);
+int hv_sleep_notifiers_register(void);
#else
static inline void hv_apic_init(void) {}
#endif
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index e109a620c83f..cfba9ded7bcb 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
return "Unknown";
}
EXPORT_SYMBOL_GPL(hv_result_to_string);
+
+#if IS_ENABLED(CONFIG_ACPI)
+/*
+ * Corresponding sleep states have to be initialized in order for a subsequent
+ * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
+ * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
+ *
+ * ACPI should be initialized and should support S5 sleep state when this method
+ * is called, so that it can extract correct PM values and pass them to hv.
+ */
+static int hv_initialize_sleep_states(void)
+{
+ u64 status;
+ unsigned long flags;
+ struct hv_input_set_system_property *in;
+ acpi_status acpi_status;
+ u8 sleep_type_a, sleep_type_b;
+
+ if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
+ pr_err("%s: S5 sleep state not supported.\n", __func__);
+ return -ENODEV;
+ }
+
+ acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
+ &sleep_type_a, &sleep_type_b);
+ if (ACPI_FAILURE(acpi_status))
+ return -ENODEV;
+
+ local_irq_save(flags);
+ in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ memset(in, 0, sizeof(*in));
+
+ in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
+ in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
+ in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
+ in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
+
+ status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
+ local_irq_restore(flags);
+
+ if (!hv_result_success(status)) {
+ hv_status_err(status, "\n");
+ return hv_result_to_errno(status);
+ }
+
+ return 0;
+}
+
+static int hv_call_enter_sleep_state(u32 sleep_state)
+{
+ u64 status;
+ int ret;
+ unsigned long flags;
+ struct hv_input_enter_sleep_state *in;
+
+ ret = hv_initialize_sleep_states();
+ if (ret)
+ return ret;
+
+ local_irq_save(flags);
+ in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ in->sleep_state = sleep_state;
+
+ status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
+ local_irq_restore(flags);
+
+ if (!hv_result_success(status)) {
+ hv_status_err(status, "\n");
+ return hv_result_to_errno(status);
+ }
+
+ return 0;
+}
+
+static int hv_reboot_notifier_handler(struct notifier_block *this,
+ unsigned long code, void *another)
+{
+ int ret = 0;
+
+ if (code == SYS_HALT || code == SYS_POWER_OFF)
+ ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
+
+ return ret ? NOTIFY_DONE : NOTIFY_OK;
+}
+
+static struct notifier_block hv_reboot_notifier = {
+ .notifier_call = hv_reboot_notifier_handler,
+};
+
+static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
+{
+ int ret = 0;
+
+ if (sleep_state == ACPI_STATE_S5)
+ ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
+
+ return ret == 0 ? 1 : -1;
+}
+
+static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
+{
+ return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
+}
+
+int hv_sleep_notifiers_register(void)
+{
+ int ret;
+
+ acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
+ acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
+
+ ret = register_reboot_notifier(&hv_reboot_notifier);
+ if (ret)
+ pr_err("%s: cannot register reboot notifier %d\n",
+ __func__, ret);
+
+ return ret;
+}
+#endif
--
2.51.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
@ 2025-10-15 5:23 ` kernel test robot
2025-10-15 8:01 ` kernel test robot
2025-10-16 19:29 ` Michael Kelley
2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-10-15 5:23 UTC (permalink / raw)
To: Praveen K Paladugu, kys, haiyangz, wei.liu, decui, tglx, mingo,
linux-hyperv, linux-kernel, bp, dave.hansen, x86, hpa, arnd
Cc: llvm, oe-kbuild-all, anbelski, prapal, easwar.hariharan,
nunodasneves, skinsburskii
Hi Praveen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tip/x86/core]
[also build test WARNING on linus/master v6.18-rc1 next-20251014]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Praveen-K-Paladugu/hyperv-Add-definitions-for-MSHV-sleep-state-configuration/20251015-004650
base: tip/x86/core
patch link: https://lore.kernel.org/r/20251014164150.6935-3-prapal%40linux.microsoft.com
patch subject: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
config: i386-randconfig-006-20251015 (https://download.01.org/0day-ci/archive/20251015/202510151359.vRXcys2P-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251015/202510151359.vRXcys2P-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510151359.vRXcys2P-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/hv/hv_common.c:944:5: warning: no previous prototype for function 'hv_sleep_notifiers_register' [-Wmissing-prototypes]
944 | int hv_sleep_notifiers_register(void)
| ^
drivers/hv/hv_common.c:944:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
944 | int hv_sleep_notifiers_register(void)
| ^
| static
1 warning generated.
vim +/hv_sleep_notifiers_register +944 drivers/hv/hv_common.c
943
> 944 int hv_sleep_notifiers_register(void)
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
2025-10-15 5:23 ` kernel test robot
@ 2025-10-15 8:01 ` kernel test robot
2025-10-16 19:29 ` Michael Kelley
2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-10-15 8:01 UTC (permalink / raw)
To: Praveen K Paladugu, kys, haiyangz, wei.liu, decui, tglx, mingo,
linux-hyperv, linux-kernel, bp, dave.hansen, x86, hpa, arnd
Cc: oe-kbuild-all, anbelski, prapal, easwar.hariharan, nunodasneves,
skinsburskii
Hi Praveen,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on linus/master v6.18-rc1 next-20251014]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Praveen-K-Paladugu/hyperv-Add-definitions-for-MSHV-sleep-state-configuration/20251015-004650
base: tip/x86/core
patch link: https://lore.kernel.org/r/20251014164150.6935-3-prapal%40linux.microsoft.com
patch subject: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
config: i386-buildonly-randconfig-002-20251015 (https://download.01.org/0day-ci/archive/20251015/202510151539.w9IHg8lU-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251015/202510151539.w9IHg8lU-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510151539.w9IHg8lU-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
arch/x86/hyperv/hv_init.c: In function 'hyperv_init':
>> arch/x86/hyperv/hv_init.c:556:23: error: implicit declaration of function 'hv_sleep_notifiers_register'; did you mean 'preempt_notifier_register'? [-Wimplicit-function-declaration]
556 | (void)hv_sleep_notifiers_register();
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
| preempt_notifier_register
--
>> drivers/hv/hv_common.c:944:5: warning: no previous prototype for 'hv_sleep_notifiers_register' [-Wmissing-prototypes]
944 | int hv_sleep_notifiers_register(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +556 arch/x86/hyperv/hv_init.c
431
432 /*
433 * This function is to be invoked early in the boot sequence after the
434 * hypervisor has been detected.
435 *
436 * 1. Setup the hypercall page.
437 * 2. Register Hyper-V specific clocksource.
438 * 3. Setup Hyper-V specific APIC entry points.
439 */
440 void __init hyperv_init(void)
441 {
442 u64 guest_id;
443 union hv_x64_msr_hypercall_contents hypercall_msr;
444 int cpuhp;
445
446 if (x86_hyper_type != X86_HYPER_MS_HYPERV)
447 return;
448
449 if (hv_common_init())
450 return;
451
452 /*
453 * The VP assist page is useless to a TDX guest: the only use we
454 * would have for it is lazy EOI, which can not be used with TDX.
455 */
456 if (hv_isolation_type_tdx())
457 hv_vp_assist_page = NULL;
458 else
459 hv_vp_assist_page = kcalloc(nr_cpu_ids,
460 sizeof(*hv_vp_assist_page),
461 GFP_KERNEL);
462 if (!hv_vp_assist_page) {
463 ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
464
465 if (!hv_isolation_type_tdx())
466 goto common_free;
467 }
468
469 if (ms_hyperv.paravisor_present && hv_isolation_type_snp()) {
470 /* Negotiate GHCB Version. */
471 if (!hv_ghcb_negotiate_protocol())
472 hv_ghcb_terminate(SEV_TERM_SET_GEN,
473 GHCB_SEV_ES_PROT_UNSUPPORTED);
474
475 hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
476 if (!hv_ghcb_pg)
477 goto free_vp_assist_page;
478 }
479
480 cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
481 hv_cpu_init, hv_cpu_die);
482 if (cpuhp < 0)
483 goto free_ghcb_page;
484
485 /*
486 * Setup the hypercall page and enable hypercalls.
487 * 1. Register the guest ID
488 * 2. Enable the hypercall and register the hypercall page
489 *
490 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
491 * when the hypercall input is a page, such a VM must pass a decrypted
492 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
493 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
494 *
495 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
496 * which are handled by the paravisor and the VM must use an encrypted
497 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
498 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
499 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
500 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
501 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
502 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
503 * instead, hv_post_message() uses the post_msg_page, which is decrypted
504 * in such a VM and is only used in such a VM.
505 */
506 guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
507 wrmsrq(HV_X64_MSR_GUEST_OS_ID, guest_id);
508
509 /* With the paravisor, the VM must also write the ID via GHCB/GHCI */
510 hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
511
512 /* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg */
513 if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
514 goto skip_hypercall_pg_init;
515
516 hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
517 MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
518 VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
519 __builtin_return_address(0));
520 if (hv_hypercall_pg == NULL)
521 goto clean_guest_os_id;
522
523 rdmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
524 hypercall_msr.enable = 1;
525
526 if (hv_root_partition()) {
527 struct page *pg;
528 void *src;
529
530 /*
531 * For the root partition, the hypervisor will set up its
532 * hypercall page. The hypervisor guarantees it will not show
533 * up in the root's address space. The root can't change the
534 * location of the hypercall page.
535 *
536 * Order is important here. We must enable the hypercall page
537 * so it is populated with code, then copy the code to an
538 * executable page.
539 */
540 wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
541
542 pg = vmalloc_to_page(hv_hypercall_pg);
543 src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
544 MEMREMAP_WB);
545 BUG_ON(!src);
546 memcpy_to_page(pg, 0, src, HV_HYP_PAGE_SIZE);
547 memunmap(src);
548
549 hv_remap_tsc_clocksource();
550 /*
551 * The notifier registration might fail at various hops.
552 * Corresponding error messages will land in dmesg. There is
553 * otherwise nothing that can be specifically done to handle
554 * failures here.
555 */
> 556 (void)hv_sleep_notifiers_register();
557 } else {
558 hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
559 wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
560 }
561
562 hv_set_hypercall_pg(hv_hypercall_pg);
563
564 skip_hypercall_pg_init:
565 /*
566 * hyperv_init() is called before LAPIC is initialized: see
567 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
568 * apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
569 * depends on LAPIC, so hv_stimer_alloc() should be called from
570 * x86_init.timers.setup_percpu_clockev.
571 */
572 old_setup_percpu_clockev = x86_init.timers.setup_percpu_clockev;
573 x86_init.timers.setup_percpu_clockev = hv_stimer_setup_percpu_clockev;
574
575 hv_apic_init();
576
577 x86_init.pci.arch_init = hv_pci_init;
578
579 register_syscore_ops(&hv_syscore_ops);
580
581 if (ms_hyperv.priv_high & HV_ACCESS_PARTITION_ID)
582 hv_get_partition_id();
583
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
2025-10-15 5:23 ` kernel test robot
2025-10-15 8:01 ` kernel test robot
@ 2025-10-16 19:29 ` Michael Kelley
2025-10-20 15:59 ` Praveen Paladugu
2025-10-22 21:30 ` Praveen Paladugu
2 siblings, 2 replies; 10+ messages in thread
From: Michael Kelley @ 2025-10-16 19:29 UTC (permalink / raw)
To: Praveen K Paladugu, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
mingo@redhat.com, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
arnd@arndb.de
Cc: anbelski@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com,
nunodasneves@linux.microsoft.com,
skinsburskii@linux.microsoft.com
From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
>
> When a shutdown is initiated in the root partition without configuring
> sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> the root falls back to using legacy ACPI mechanisms to poweroff. This call
> is intercepted by MSHV and will result in a Machine Check Exception (MCE).
>
> Root panics with a trace similar to:
>
> [ 81.306348] reboot: Power down
> [ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> [ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> [ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> [ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> [ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> [ 81.314717] Kernel panic - not syncing: Fatal machine check
>
> To prevent this, properly configure sleep states within MSHV, allowing
> the root partition to shut down cleanly without triggering a panic.
>
> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
> ---
> arch/x86/hyperv/hv_init.c | 7 ++
> arch/x86/include/asm/mshyperv.h | 1 +
> drivers/hv/hv_common.c | 119 ++++++++++++++++++++++++++++++++
> 3 files changed, 127 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index afdbda2dd7b7..57bd96671ead 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -510,6 +510,13 @@ void __init hyperv_init(void)
> memunmap(src);
>
> hv_remap_tsc_clocksource();
> + /*
> + * The notifier registration might fail at various hops.
> + * Corresponding error messages will land in dmesg. There is
> + * otherwise nothing that can be specifically done to handle
> + * failures here.
> + */
> + (void)hv_sleep_notifiers_register();
> } else {
> hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index abc4659f5809..fb8d691193df 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
> void hv_apic_init(void);
> void __init hv_init_spinlocks(void);
> bool hv_vcpu_is_preempted(int vcpu);
> +int hv_sleep_notifiers_register(void);
> #else
> static inline void hv_apic_init(void) {}
> #endif
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index e109a620c83f..cfba9ded7bcb 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
> return "Unknown";
> }
> EXPORT_SYMBOL_GPL(hv_result_to_string);
> +
> +#if IS_ENABLED(CONFIG_ACPI)
> +/*
> + * Corresponding sleep states have to be initialized in order for a subsequent
> + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> + *
> + * ACPI should be initialized and should support S5 sleep state when this method
> + * is called, so that it can extract correct PM values and pass them to hv.
> + */
> +static int hv_initialize_sleep_states(void)
> +{
> + u64 status;
> + unsigned long flags;
> + struct hv_input_set_system_property *in;
> + acpi_status acpi_status;
> + u8 sleep_type_a, sleep_type_b;
> +
> + if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> + pr_err("%s: S5 sleep state not supported.\n", __func__);
> + return -ENODEV;
> + }
> +
> + acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> + &sleep_type_a, &sleep_type_b);
> + if (ACPI_FAILURE(acpi_status))
> + return -ENODEV;
> +
> + local_irq_save(flags);
> + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + memset(in, 0, sizeof(*in));
> +
> + in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> + in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> + in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> + in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> +
> + status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> + local_irq_restore(flags);
> +
> + if (!hv_result_success(status)) {
> + hv_status_err(status, "\n");
> + return hv_result_to_errno(status);
> + }
> +
> + return 0;
> +}
> +
> +static int hv_call_enter_sleep_state(u32 sleep_state)
> +{
> + u64 status;
> + int ret;
> + unsigned long flags;
> + struct hv_input_enter_sleep_state *in;
> +
> + ret = hv_initialize_sleep_states();
> + if (ret)
> + return ret;
> +
> + local_irq_save(flags);
> + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + in->sleep_state = sleep_state;
> +
> + status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
If this hypercall succeeds, does the root partition (which is the caller) go
to sleep in S5, such that the hypercall never returns? If that's not the case,
what is the behavior of this hypercall?
> + local_irq_restore(flags);
> +
> + if (!hv_result_success(status)) {
> + hv_status_err(status, "\n");
> + return hv_result_to_errno(status);
> + }
> +
> + return 0;
> +}
> +
> +static int hv_reboot_notifier_handler(struct notifier_block *this,
> + unsigned long code, void *another)
> +{
> + int ret = 0;
> +
> + if (code == SYS_HALT || code == SYS_POWER_OFF)
> + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
If hv_call_enter_sleep_state() never returns, here's an issue. There may be
multiple entries on the reboot notifier chain. For example,
mshv_root_partition_init() puts an entry on the reboot notifier chain. At
reboot time, the entries are executed in some order, with the expectation
that all entries will be executed prior to the reboot actually happening. But
if this hypercall never returns, some entries may never be executed.
Notifier chains support a notion of priority to control the order in
which they are executed, but that priority isn't set in hv_reboot_notifier
below, or in mshv_reboot_nb. And most other reboot notifiers throughout
Linux appear to not set it. So the ordering is unspecified, and having
this notifier never return may be problematic.
> +
> + return ret ? NOTIFY_DONE : NOTIFY_OK;
> +}
> +
> +static struct notifier_block hv_reboot_notifier = {
> + .notifier_call = hv_reboot_notifier_handler,
> +};
> +
> +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> +{
> + int ret = 0;
> +
> + if (sleep_state == ACPI_STATE_S5)
> + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> +
> + return ret == 0 ? 1 : -1;
> +}
> +
> +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> +{
> + return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> +}
Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
So it seems like acpi_os_set_prepare_extended_sleep() could just use
hv_acpi_sleep_handler() directly.
> +
> +int hv_sleep_notifiers_register(void)
> +{
> + int ret;
> +
> + acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> + acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
I'm not clear on why these handlers are set. If the hv_reboot_notifier is
called, are these ACPI handlers ever called? Or are these to catch any cases
where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
not understanding something .... :-)
> +
> + ret = register_reboot_notifier(&hv_reboot_notifier);
> + if (ret)
> + pr_err("%s: cannot register reboot notifier %d\n",
> + __func__, ret);
> +
> + return ret;
> +}
> +#endif
I'm wondering if all this code belongs in hv_common.c, since it is only needed
for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
is set.
Michael
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-16 19:29 ` Michael Kelley
@ 2025-10-20 15:59 ` Praveen Paladugu
2025-10-20 17:30 ` Michael Kelley
2025-10-22 21:30 ` Praveen Paladugu
1 sibling, 1 reply; 10+ messages in thread
From: Praveen Paladugu @ 2025-10-20 15:59 UTC (permalink / raw)
To: Michael Kelley
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com,
nunodasneves@linux.microsoft.com,
skinsburskii@linux.microsoft.com
On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> >
> > When a shutdown is initiated in the root partition without configuring
> > sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> > the root falls back to using legacy ACPI mechanisms to poweroff. This call
> > is intercepted by MSHV and will result in a Machine Check Exception (MCE).
> >
> > Root panics with a trace similar to:
> >
> > [ 81.306348] reboot: Power down
> > [ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> > [ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> > [ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> > [ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > [ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> > [ 81.314717] Kernel panic - not syncing: Fatal machine check
> >
> > To prevent this, properly configure sleep states within MSHV, allowing
> > the root partition to shut down cleanly without triggering a panic.
> >
> > Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> > Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
> > Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
> > ---
> > arch/x86/hyperv/hv_init.c | 7 ++
> > arch/x86/include/asm/mshyperv.h | 1 +
> > drivers/hv/hv_common.c | 119 ++++++++++++++++++++++++++++++++
> > 3 files changed, 127 insertions(+)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index afdbda2dd7b7..57bd96671ead 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -510,6 +510,13 @@ void __init hyperv_init(void)
> > memunmap(src);
> >
> > hv_remap_tsc_clocksource();
> > + /*
> > + * The notifier registration might fail at various hops.
> > + * Corresponding error messages will land in dmesg. There is
> > + * otherwise nothing that can be specifically done to handle
> > + * failures here.
> > + */
> > + (void)hv_sleep_notifiers_register();
> > } else {
> > hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> > wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index abc4659f5809..fb8d691193df 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
> > void hv_apic_init(void);
> > void __init hv_init_spinlocks(void);
> > bool hv_vcpu_is_preempted(int vcpu);
> > +int hv_sleep_notifiers_register(void);
> > #else
> > static inline void hv_apic_init(void) {}
> > #endif
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index e109a620c83f..cfba9ded7bcb 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
> > return "Unknown";
> > }
> > EXPORT_SYMBOL_GPL(hv_result_to_string);
> > +
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +/*
> > + * Corresponding sleep states have to be initialized in order for a subsequent
> > + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> > + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> > + *
> > + * ACPI should be initialized and should support S5 sleep state when this method
> > + * is called, so that it can extract correct PM values and pass them to hv.
> > + */
> > +static int hv_initialize_sleep_states(void)
> > +{
> > + u64 status;
> > + unsigned long flags;
> > + struct hv_input_set_system_property *in;
> > + acpi_status acpi_status;
> > + u8 sleep_type_a, sleep_type_b;
> > +
> > + if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> > + pr_err("%s: S5 sleep state not supported.\n", __func__);
> > + return -ENODEV;
> > + }
> > +
> > + acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> > + &sleep_type_a, &sleep_type_b);
> > + if (ACPI_FAILURE(acpi_status))
> > + return -ENODEV;
> > +
> > + local_irq_save(flags);
> > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > + memset(in, 0, sizeof(*in));
> > +
> > + in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> > + in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> > + in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> > + in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> > +
> > + status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> > + local_irq_restore(flags);
> > +
> > + if (!hv_result_success(status)) {
> > + hv_status_err(status, "\n");
> > + return hv_result_to_errno(status);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > +{
> > + u64 status;
> > + int ret;
> > + unsigned long flags;
> > + struct hv_input_enter_sleep_state *in;
> > +
> > + ret = hv_initialize_sleep_states();
> > + if (ret)
> > + return ret;
> > +
> > + local_irq_save(flags);
> > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > + in->sleep_state = sleep_state;
> > +
> > + status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
>
> If this hypercall succeeds, does the root partition (which is the caller) go
> to sleep in S5, such that the hypercall never returns? If that's not the case,
> what is the behavior of this hypercall?
>
This hypercall returns to the kernel when the CPU wakes up the next
time.
> > + local_irq_restore(flags);
> > +
> > + if (!hv_result_success(status)) {
> > + hv_status_err(status, "\n");
> > + return hv_result_to_errno(status);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > + unsigned long code, void *another)
> > +{
> > + int ret = 0;
> > +
> > + if (code == SYS_HALT || code == SYS_POWER_OFF)
> > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
>
> If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> multiple entries on the reboot notifier chain. For example,
> mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> reboot time, the entries are executed in some order, with the expectation
> that all entries will be executed prior to the reboot actually happening. But
> if this hypercall never returns, some entries may never be executed.
>
> Notifier chains support a notion of priority to control the order in
> which they are executed, but that priority isn't set in hv_reboot_notifier
> below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> Linux appear to not set it. So the ordering is unspecified, and having
> this notifier never return may be problematic.
>
Thanks for the detailed explanation Michael!
As I mentioned above, this hypercall returns to the kernel, so the rest
of the entries in the notifier chain should continue to execute.
> > +
> > + return ret ? NOTIFY_DONE : NOTIFY_OK;
> > +}
> > +
> > +static struct notifier_block hv_reboot_notifier = {
> > + .notifier_call = hv_reboot_notifier_handler,
> > +};
> > +
> > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > +{
> > + int ret = 0;
> > +
> > + if (sleep_state == ACPI_STATE_S5)
> > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > +
> > + return ret == 0 ? 1 : -1;
> > +}
> > +
> > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > +{
> > + return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > +}
>
> Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> So it seems like acpi_os_set_prepare_extended_sleep() could just use
> hv_acpi_sleep_handler() directly.
>
Upon further investigation, I discovered that extended sleep is only
supported on platforms with ACPI_REDUCED_HARDWARE.
As these patches are targetted at X86, above does not really apply. I
will drop this handler in next version.
> > +
> > +int hv_sleep_notifiers_register(void)
> > +{
> > + int ret;
> > +
> > + acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > + acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
>
> I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> called, are these ACPI handlers ever called? Or are these to catch any cases
> where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> not understanding something .... :-)
>
I am trying to trace these calls. I will keep you posted with my
findings.
> > +
> > + ret = register_reboot_notifier(&hv_reboot_notifier);
> > + if (ret)
> > + pr_err("%s: cannot register reboot notifier %d\n",
> > + __func__, ret);
> > +
> > + return ret;
> > +}
> > +#endif
>
> I'm wondering if all this code belongs in hv_common.c, since it is only needed
> for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> is set.
>
This sounds reasonable. I will discuss this internally and get back you.
> Michael
>
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-20 15:59 ` Praveen Paladugu
@ 2025-10-20 17:30 ` Michael Kelley
2025-10-22 21:26 ` Praveen Paladugu
0 siblings, 1 reply; 10+ messages in thread
From: Michael Kelley @ 2025-10-20 17:30 UTC (permalink / raw)
To: Praveen Paladugu
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com,
nunodasneves@linux.microsoft.com,
skinsburskii@linux.microsoft.com
From: Praveen Paladugu <prapal@linux.microsoft.com> Sent: Monday, October 20, 2025 9:00 AM
>
> On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> > From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > >
[snip]
> > > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > > +{
> > > + u64 status;
> > > + int ret;
> > > + unsigned long flags;
> > > + struct hv_input_enter_sleep_state *in;
> > > +
> > > + ret = hv_initialize_sleep_states();
> > > + if (ret)
> > > + return ret;
> > > +
> > > + local_irq_save(flags);
> > > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + in->sleep_state = sleep_state;
> > > +
> > > + status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> >
> > If this hypercall succeeds, does the root partition (which is the caller) go
> > to sleep in S5, such that the hypercall never returns? If that's not the case,
> > what is the behavior of this hypercall?
> >
> This hypercall returns to the kernel when the CPU wakes up the next
> time.
I must be missing something about the big picture, because "returns to
the kernel when the CPU wakes up" doesn't fit my mental model of what's
going on. I thought this function would be called, and the hypercall made,
when Linux in the root partition is shutting down. So if a CPU makes this
hypercall and goes to sleep, what wakes it up? And when it wakes up, is it
still running the same Linux instance that was shutting down, or has it
rebooted into new Linux instance? In the latter case, returning from
the hypercall doesn't make sense.
Can you explain further how this all works?
Michael
>
> > > + local_irq_restore(flags);
> > > +
> > > + if (!hv_result_success(status)) {
> > > + hv_status_err(status, "\n");
> > > + return hv_result_to_errno(status);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > > + unsigned long code, void *another)
> > > +{
> > > + int ret = 0;
> > > +
> > > + if (code == SYS_HALT || code == SYS_POWER_OFF)
> > > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> >
> > If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> > multiple entries on the reboot notifier chain. For example,
> > mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> > reboot time, the entries are executed in some order, with the expectation
> > that all entries will be executed prior to the reboot actually happening. But
> > if this hypercall never returns, some entries may never be executed.
> >
> > Notifier chains support a notion of priority to control the order in
> > which they are executed, but that priority isn't set in hv_reboot_notifier
> > below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> > Linux appear to not set it. So the ordering is unspecified, and having
> > this notifier never return may be problematic.
> >
> Thanks for the detailed explanation Michael!
>
> As I mentioned above, this hypercall returns to the kernel, so the rest
> of the entries in the notifier chain should continue to execute.
>
> > > +
> > > + return ret ? NOTIFY_DONE : NOTIFY_OK;
> > > +}
> > > +
> > > +static struct notifier_block hv_reboot_notifier = {
> > > + .notifier_call = hv_reboot_notifier_handler,
> > > +};
> > > +
> > > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > > +{
> > > + int ret = 0;
> > > +
> > > + if (sleep_state == ACPI_STATE_S5)
> > > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > > +
> > > + return ret == 0 ? 1 : -1;
> > > +}
> > > +
> > > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > > +{
> > > + return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > > +}
> >
> > Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> > So it seems like acpi_os_set_prepare_extended_sleep() could just use
> > hv_acpi_sleep_handler() directly.
> >
> Upon further investigation, I discovered that extended sleep is only
> supported on platforms with ACPI_REDUCED_HARDWARE.
>
> As these patches are targetted at X86, above does not really apply. I
> will drop this handler in next version.
>
> > > +
> > > +int hv_sleep_notifiers_register(void)
> > > +{
> > > + int ret;
> > > +
> > > + acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > > + acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> >
> > I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> > called, are these ACPI handlers ever called? Or are these to catch any cases
> > where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> > not understanding something .... :-)
> >
>
> I am trying to trace these calls. I will keep you posted with my
> findings.
>
> > > +
> > > + ret = register_reboot_notifier(&hv_reboot_notifier);
> > > + if (ret)
> > > + pr_err("%s: cannot register reboot notifier %d\n",
> > > + __func__, ret);
> > > +
> > > + return ret;
> > > +}
> > > +#endif
> >
> > I'm wondering if all this code belongs in hv_common.c, since it is only needed
> > for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> > be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> > is set.
> >
>
> This sounds reasonable. I will discuss this internally and get back you.
>
> > Michael
> >
> > > --
> > > 2.51.0
> > >
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-20 17:30 ` Michael Kelley
@ 2025-10-22 21:26 ` Praveen Paladugu
0 siblings, 0 replies; 10+ messages in thread
From: Praveen Paladugu @ 2025-10-22 21:26 UTC (permalink / raw)
To: Michael Kelley
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com,
nunodasneves@linux.microsoft.com,
skinsburskii@linux.microsoft.com
On Mon, Oct 20, 2025 at 05:30:44PM +0000, Michael Kelley wrote:
> From: Praveen Paladugu <prapal@linux.microsoft.com> Sent: Monday, October 20, 2025 9:00 AM
> >
> > On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> > > From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > > >
>
> [snip]
>
> > > > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > > > +{
> > > > + u64 status;
> > > > + int ret;
> > > > + unsigned long flags;
> > > > + struct hv_input_enter_sleep_state *in;
> > > > +
> > > > + ret = hv_initialize_sleep_states();
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + local_irq_save(flags);
> > > > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > > + in->sleep_state = sleep_state;
> > > > +
> > > > + status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> > >
> > > If this hypercall succeeds, does the root partition (which is the caller) go
> > > to sleep in S5, such that the hypercall never returns? If that's not the case,
> > > what is the behavior of this hypercall?
> > >
> > This hypercall returns to the kernel when the CPU wakes up the next
> > time.
>
> I must be missing something about the big picture, because "returns to
> the kernel when the CPU wakes up" doesn't fit my mental model of what's
> going on. I thought this function would be called, and the hypercall made,
> when Linux in the root partition is shutting down. So if a CPU makes this
> hypercall and goes to sleep, what wakes it up? And when it wakes up, is it
> still running the same Linux instance that was shutting down, or has it
> rebooted into new Linux instance? In the latter case, returning from
> the hypercall doesn't make sense.
>
> Can you explain further how this all works?
>
Sorry for the confusion here. I mis-understood what happens while
entering the sleep state here. I will clarify below:
Sleep state S5 refers to shutdown/poweroff. Although other sleep states
are handled, they are not properly applied while running mshv on
servers. Non-S5 sleep states are only applied on non-server hosts.
This patch only handles S5 sleep state.
If a hypercall for non-S5 sleep state is invoked, the control
is returned back to the kernel. Usually non-S5 sleep state do things
like suspend to memory/hibernate etc.
If the hypecall is for S5 sleep state, then Hypervisor does poweroff the
host and the control does not return back to the kernel.
Now that is clarified, your previous comments about the order of the
reboot_notifiers is relevant now. I will investigate how to apply a
priority/order so that hv_reboot_notifier_handler will be called last.
Praveen
> Michael
>
> >
> > > > + local_irq_restore(flags);
> > > > +
> > > > + if (!hv_result_success(status)) {
> > > > + hv_status_err(status, "\n");
> > > > + return hv_result_to_errno(status);
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > > > + unsigned long code, void *another)
> > > > +{
> > > > + int ret = 0;
> > > > +
> > > > + if (code == SYS_HALT || code == SYS_POWER_OFF)
> > > > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > >
> > > If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> > > multiple entries on the reboot notifier chain. For example,
> > > mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> > > reboot time, the entries are executed in some order, with the expectation
> > > that all entries will be executed prior to the reboot actually happening. But
> > > if this hypercall never returns, some entries may never be executed.
> > >
> > > Notifier chains support a notion of priority to control the order in
> > > which they are executed, but that priority isn't set in hv_reboot_notifier
> > > below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> > > Linux appear to not set it. So the ordering is unspecified, and having
> > > this notifier never return may be problematic.
> > >
> > Thanks for the detailed explanation Michael!
> >
> > As I mentioned above, this hypercall returns to the kernel, so the rest
> > of the entries in the notifier chain should continue to execute.
> >
> > > > +
> > > > + return ret ? NOTIFY_DONE : NOTIFY_OK;
> > > > +}
> > > > +
> > > > +static struct notifier_block hv_reboot_notifier = {
> > > > + .notifier_call = hv_reboot_notifier_handler,
> > > > +};
> > > > +
> > > > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > > > +{
> > > > + int ret = 0;
> > > > +
> > > > + if (sleep_state == ACPI_STATE_S5)
> > > > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > > > +
> > > > + return ret == 0 ? 1 : -1;
> > > > +}
> > > > +
> > > > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > > > +{
> > > > + return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > > > +}
> > >
> > > Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> > > So it seems like acpi_os_set_prepare_extended_sleep() could just use
> > > hv_acpi_sleep_handler() directly.
> > >
> > Upon further investigation, I discovered that extended sleep is only
> > supported on platforms with ACPI_REDUCED_HARDWARE.
> >
> > As these patches are targetted at X86, above does not really apply. I
> > will drop this handler in next version.
> >
> > > > +
> > > > +int hv_sleep_notifiers_register(void)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > > > + acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> > >
> > > I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> > > called, are these ACPI handlers ever called? Or are these to catch any cases
> > > where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> > > not understanding something .... :-)
> > >
> >
> > I am trying to trace these calls. I will keep you posted with my
> > findings.
> >
> > > > +
> > > > + ret = register_reboot_notifier(&hv_reboot_notifier);
> > > > + if (ret)
> > > > + pr_err("%s: cannot register reboot notifier %d\n",
> > > > + __func__, ret);
> > > > +
> > > > + return ret;
> > > > +}
> > > > +#endif
> > >
> > > I'm wondering if all this code belongs in hv_common.c, since it is only needed
> > > for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> > > be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> > > is set.
> > >
> >
> > This sounds reasonable. I will discuss this internally and get back you.
> >
> > > Michael
> > >
> > > > --
> > > > 2.51.0
> > > >
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
2025-10-16 19:29 ` Michael Kelley
2025-10-20 15:59 ` Praveen Paladugu
@ 2025-10-22 21:30 ` Praveen Paladugu
1 sibling, 0 replies; 10+ messages in thread
From: Praveen Paladugu @ 2025-10-22 21:30 UTC (permalink / raw)
To: Michael Kelley
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com,
nunodasneves@linux.microsoft.com,
skinsburskii@linux.microsoft.com
On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> >
> > When a shutdown is initiated in the root partition without configuring
> > sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> > the root falls back to using legacy ACPI mechanisms to poweroff. This call
> > is intercepted by MSHV and will result in a Machine Check Exception (MCE).
> >
> > Root panics with a trace similar to:
> >
> > [ 81.306348] reboot: Power down
> > [ 81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> > [ 81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> > [ 81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> > [ 81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > [ 81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> > [ 81.314717] Kernel panic - not syncing: Fatal machine check
> >
> > To prevent this, properly configure sleep states within MSHV, allowing
> > the root partition to shut down cleanly without triggering a panic.
> >
> > Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> > Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
> > Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
> > ---
> > arch/x86/hyperv/hv_init.c | 7 ++
> > arch/x86/include/asm/mshyperv.h | 1 +
> > drivers/hv/hv_common.c | 119 ++++++++++++++++++++++++++++++++
> > 3 files changed, 127 insertions(+)
> >
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index afdbda2dd7b7..57bd96671ead 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -510,6 +510,13 @@ void __init hyperv_init(void)
> > memunmap(src);
> >
> > hv_remap_tsc_clocksource();
> > + /*
> > + * The notifier registration might fail at various hops.
> > + * Corresponding error messages will land in dmesg. There is
> > + * otherwise nothing that can be specifically done to handle
> > + * failures here.
> > + */
> > + (void)hv_sleep_notifiers_register();
> > } else {
> > hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> > wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index abc4659f5809..fb8d691193df 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
> > void hv_apic_init(void);
> > void __init hv_init_spinlocks(void);
> > bool hv_vcpu_is_preempted(int vcpu);
> > +int hv_sleep_notifiers_register(void);
> > #else
> > static inline void hv_apic_init(void) {}
> > #endif
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index e109a620c83f..cfba9ded7bcb 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
> > return "Unknown";
> > }
> > EXPORT_SYMBOL_GPL(hv_result_to_string);
> > +
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +/*
> > + * Corresponding sleep states have to be initialized in order for a subsequent
> > + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> > + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> > + *
> > + * ACPI should be initialized and should support S5 sleep state when this method
> > + * is called, so that it can extract correct PM values and pass them to hv.
> > + */
> > +static int hv_initialize_sleep_states(void)
> > +{
> > + u64 status;
> > + unsigned long flags;
> > + struct hv_input_set_system_property *in;
> > + acpi_status acpi_status;
> > + u8 sleep_type_a, sleep_type_b;
> > +
> > + if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> > + pr_err("%s: S5 sleep state not supported.\n", __func__);
> > + return -ENODEV;
> > + }
> > +
> > + acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> > + &sleep_type_a, &sleep_type_b);
> > + if (ACPI_FAILURE(acpi_status))
> > + return -ENODEV;
> > +
> > + local_irq_save(flags);
> > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > + memset(in, 0, sizeof(*in));
> > +
> > + in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> > + in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> > + in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> > + in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> > +
> > + status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> > + local_irq_restore(flags);
> > +
> > + if (!hv_result_success(status)) {
> > + hv_status_err(status, "\n");
> > + return hv_result_to_errno(status);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > +{
> > + u64 status;
> > + int ret;
> > + unsigned long flags;
> > + struct hv_input_enter_sleep_state *in;
> > +
> > + ret = hv_initialize_sleep_states();
> > + if (ret)
> > + return ret;
> > +
> > + local_irq_save(flags);
> > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > + in->sleep_state = sleep_state;
> > +
> > + status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
>
> If this hypercall succeeds, does the root partition (which is the caller) go
> to sleep in S5, such that the hypercall never returns? If that's not the case,
> what is the behavior of this hypercall?
>
> > + local_irq_restore(flags);
> > +
> > + if (!hv_result_success(status)) {
> > + hv_status_err(status, "\n");
> > + return hv_result_to_errno(status);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > + unsigned long code, void *another)
> > +{
> > + int ret = 0;
> > +
> > + if (code == SYS_HALT || code == SYS_POWER_OFF)
> > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
>
> If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> multiple entries on the reboot notifier chain. For example,
> mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> reboot time, the entries are executed in some order, with the expectation
> that all entries will be executed prior to the reboot actually happening. But
> if this hypercall never returns, some entries may never be executed.
>
> Notifier chains support a notion of priority to control the order in
> which they are executed, but that priority isn't set in hv_reboot_notifier
> below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> Linux appear to not set it. So the ordering is unspecified, and having
> this notifier never return may be problematic.
>
> > +
> > + return ret ? NOTIFY_DONE : NOTIFY_OK;
> > +}
> > +
> > +static struct notifier_block hv_reboot_notifier = {
> > + .notifier_call = hv_reboot_notifier_handler,
> > +};
> > +
> > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > +{
> > + int ret = 0;
> > +
> > + if (sleep_state == ACPI_STATE_S5)
> > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > +
> > + return ret == 0 ? 1 : -1;
> > +}
> > +
> > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > +{
> > + return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > +}
>
> Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> So it seems like acpi_os_set_prepare_extended_sleep() could just use
> hv_acpi_sleep_handler() directly.
>
I confirmed that hv_acpi_xxx_handler methods are not really necessary.
They are usually invoked by `pm_suspend`, when the host is be put into a
non-S5 sleep state. As non-S5 sleep states are not supported by mshv, I
will drop these handlers in next revision.
> > +
> > +int hv_sleep_notifiers_register(void)
> > +{
> > + int ret;
> > +
> > + acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > + acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
>
> I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> called, are these ACPI handlers ever called? Or are these to catch any cases
> where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> not understanding something .... :-)
>
> > +
> > + ret = register_reboot_notifier(&hv_reboot_notifier);
> > + if (ret)
> > + pr_err("%s: cannot register reboot notifier %d\n",
> > + __func__, ret);
> > +
> > + return ret;
> > +}
> > +#endif
>
> I'm wondering if all this code belongs in hv_common.c, since it is only needed
> for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> is set.
>
> Michael
>
> > --
> > 2.51.0
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-10-22 21:30 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-14 16:41 [PATCH v2 0/2] Add support for clean shutdown with MSHV Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
2025-10-15 5:23 ` kernel test robot
2025-10-15 8:01 ` kernel test robot
2025-10-16 19:29 ` Michael Kelley
2025-10-20 15:59 ` Praveen Paladugu
2025-10-20 17:30 ` Michael Kelley
2025-10-22 21:26 ` Praveen Paladugu
2025-10-22 21:30 ` Praveen Paladugu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).