linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Add support for clean shutdown with MSHV
@ 2025-10-14 16:41 Praveen K Paladugu
  2025-10-14 16:41 ` [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration Praveen K Paladugu
  2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
  0 siblings, 2 replies; 10+ messages in thread
From: Praveen K Paladugu @ 2025-10-14 16:41 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, linux-hyperv,
	linux-kernel, bp, dave.hansen, x86, hpa, arnd
  Cc: anbelski, prapal, easwar.hariharan, nunodasneves, skinsburskii

Add support for clean shutdown of the root partition when running on MSHV
hypervisor.

v2:
  - Addressed review comments from v1.
  - Moved all sleep state handling methods under CONFIG_ACPI stub
  - - This fixes build issues on non-x86 architectures.

Moving the declartion of `hv_sleep_notifiers_register` to
arch/x86/include/asm/mshyperv.h, required the use of CONFIG_X86 stub
within hv_common.c. As this sleep configuration is dependent on ACPI,
I moved all the methods under CONFIG_ACPI to keep the code cleaner,
without introducing CONFIG_X86 stub.


Praveen K Paladugu (2):
  hyperv: Add definitions for MSHV sleep state configuration
  hyperv: Enable clean shutdown for root partition with MSHV

 arch/x86/hyperv/hv_init.c       |   7 ++
 arch/x86/include/asm/mshyperv.h |   1 +
 drivers/hv/hv_common.c          | 119 ++++++++++++++++++++++++++++++++
 include/hyperv/hvgdk_mini.h     |   4 +-
 include/hyperv/hvhdk_mini.h     |  33 +++++++++
 5 files changed, 163 insertions(+), 1 deletion(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration
  2025-10-14 16:41 [PATCH v2 0/2] Add support for clean shutdown with MSHV Praveen K Paladugu
@ 2025-10-14 16:41 ` Praveen K Paladugu
  2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
  1 sibling, 0 replies; 10+ messages in thread
From: Praveen K Paladugu @ 2025-10-14 16:41 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, linux-hyperv,
	linux-kernel, bp, dave.hansen, x86, hpa, arnd
  Cc: anbelski, prapal, easwar.hariharan, nunodasneves, skinsburskii

Add the definitions required to configure sleep states in mshv hypervsior.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
---
 include/hyperv/hvgdk_mini.h |  4 +++-
 include/hyperv/hvhdk_mini.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 1be7f6a02304..25041417b54b 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -464,18 +464,20 @@ union hv_vp_assist_msr_contents {	 /* HV_REGISTER_VP_ASSIST_PAGE */
 #define HVCALL_RESET_DEBUG_SESSION			0x006b
 #define HVCALL_MAP_STATS_PAGE				0x006c
 #define HVCALL_UNMAP_STATS_PAGE				0x006d
+#define HVCALL_SET_SYSTEM_PROPERTY			0x006f
 #define HVCALL_ADD_LOGICAL_PROCESSOR			0x0076
 #define HVCALL_GET_SYSTEM_PROPERTY			0x007b
 #define HVCALL_MAP_DEVICE_INTERRUPT			0x007c
 #define HVCALL_UNMAP_DEVICE_INTERRUPT			0x007d
 #define HVCALL_RETARGET_INTERRUPT			0x007e
+#define HVCALL_ENTER_SLEEP_STATE			0x0084
 #define HVCALL_NOTIFY_PORT_RING_EMPTY			0x008b
 #define HVCALL_REGISTER_INTERCEPT_RESULT		0x0091
 #define HVCALL_ASSERT_VIRTUAL_INTERRUPT			0x0094
 #define HVCALL_CREATE_PORT				0x0095
 #define HVCALL_CONNECT_PORT				0x0096
 #define HVCALL_START_VP					0x0099
-#define HVCALL_GET_VP_INDEX_FROM_APIC_ID			0x009a
+#define HVCALL_GET_VP_INDEX_FROM_APIC_ID		0x009a
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE	0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST	0x00b0
 #define HVCALL_SIGNAL_EVENT_DIRECT			0x00c0
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index 858f6a3925b3..8fa86c014c25 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -114,10 +114,24 @@ enum hv_snp_status {
 
 enum hv_system_property {
 	/* Add more values when needed */
+	HV_SYSTEM_PROPERTY_SLEEP_STATE = 3,
 	HV_SYSTEM_PROPERTY_SCHEDULER_TYPE = 15,
 	HV_DYNAMIC_PROCESSOR_FEATURE_PROPERTY = 21,
 };
 
+enum hv_sleep_state {
+	HV_SLEEP_STATE_S1 = 1,
+	HV_SLEEP_STATE_S2 = 2,
+	HV_SLEEP_STATE_S3 = 3,
+	HV_SLEEP_STATE_S4 = 4,
+	HV_SLEEP_STATE_S5 = 5,
+	/*
+	 * After hypervisor has received this, any follow up sleep
+	 * state registration requests will be rejected.
+	 */
+	HV_SLEEP_STATE_LOCK = 6
+};
+
 enum hv_dynamic_processor_feature_property {
 	/* Add more values when needed */
 	HV_X64_DYNAMIC_PROCESSOR_FEATURE_MAX_ENCRYPTED_PARTITIONS = 13,
@@ -145,6 +159,25 @@ struct hv_output_get_system_property {
 	};
 } __packed;
 
+struct hv_sleep_state_info {
+	u32 sleep_state; /* enum hv_sleep_state */
+	u8 pm1a_slp_typ;
+	u8 pm1b_slp_typ;
+} __packed;
+
+struct hv_input_set_system_property {
+	u32 property_id; /* enum hv_system_property */
+	u32 reserved;
+	union {
+		/* More fields to be filled in when needed */
+		struct hv_sleep_state_info set_sleep_state_info;
+	};
+} __packed;
+
+struct hv_input_enter_sleep_state {     /* HV_INPUT_ENTER_SLEEP_STATE */
+	u32 sleep_state;        /* enum hv_sleep_state */
+} __packed;
+
 struct hv_input_map_stats_page {
 	u32 type; /* enum hv_stats_object_type */
 	u32 padding;
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-14 16:41 [PATCH v2 0/2] Add support for clean shutdown with MSHV Praveen K Paladugu
  2025-10-14 16:41 ` [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration Praveen K Paladugu
@ 2025-10-14 16:41 ` Praveen K Paladugu
  2025-10-15  5:23   ` kernel test robot
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Praveen K Paladugu @ 2025-10-14 16:41 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, linux-hyperv,
	linux-kernel, bp, dave.hansen, x86, hpa, arnd
  Cc: anbelski, prapal, easwar.hariharan, nunodasneves, skinsburskii

When a shutdown is initiated in the root partition without configuring
sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
the root falls back to using legacy ACPI mechanisms to poweroff. This call
is intercepted by MSHV and will result in a Machine Check Exception (MCE).

Root panics with a trace similar to:

[   81.306348] reboot: Power down
[   81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
[   81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
[   81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
[   81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[   81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
[   81.314717] Kernel panic - not syncing: Fatal machine check

To prevent this, properly configure sleep states within MSHV, allowing
the root partition to shut down cleanly without triggering a panic.

Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
---
 arch/x86/hyperv/hv_init.c       |   7 ++
 arch/x86/include/asm/mshyperv.h |   1 +
 drivers/hv/hv_common.c          | 119 ++++++++++++++++++++++++++++++++
 3 files changed, 127 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index afdbda2dd7b7..57bd96671ead 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -510,6 +510,13 @@ void __init hyperv_init(void)
 		memunmap(src);
 
 		hv_remap_tsc_clocksource();
+		/*
+		 * The notifier registration might fail at various hops.
+		 * Corresponding error messages will land in dmesg. There is
+		 * otherwise nothing that can be specifically done to handle
+		 * failures here.
+		 */
+		(void)hv_sleep_notifiers_register();
 	} else {
 		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
 		wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index abc4659f5809..fb8d691193df 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
 void hv_apic_init(void);
 void __init hv_init_spinlocks(void);
 bool hv_vcpu_is_preempted(int vcpu);
+int hv_sleep_notifiers_register(void);
 #else
 static inline void hv_apic_init(void) {}
 #endif
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index e109a620c83f..cfba9ded7bcb 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
 	return "Unknown";
 }
 EXPORT_SYMBOL_GPL(hv_result_to_string);
+
+#if IS_ENABLED(CONFIG_ACPI)
+/*
+ * Corresponding sleep states have to be initialized in order for a subsequent
+ * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
+ * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
+ *
+ * ACPI should be initialized and should support S5 sleep state when this method
+ * is called, so that it can extract correct PM values and pass them to hv.
+ */
+static int hv_initialize_sleep_states(void)
+{
+	u64 status;
+	unsigned long flags;
+	struct hv_input_set_system_property *in;
+	acpi_status acpi_status;
+	u8 sleep_type_a, sleep_type_b;
+
+	if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
+		pr_err("%s: S5 sleep state not supported.\n", __func__);
+		return -ENODEV;
+	}
+
+	acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
+						&sleep_type_a, &sleep_type_b);
+	if (ACPI_FAILURE(acpi_status))
+		return -ENODEV;
+
+	local_irq_save(flags);
+	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	memset(in, 0, sizeof(*in));
+
+	in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
+	in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
+	in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
+	in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
+
+	status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
+	local_irq_restore(flags);
+
+	if (!hv_result_success(status)) {
+		hv_status_err(status, "\n");
+		return hv_result_to_errno(status);
+	}
+
+	return 0;
+}
+
+static int hv_call_enter_sleep_state(u32 sleep_state)
+{
+	u64 status;
+	int ret;
+	unsigned long flags;
+	struct hv_input_enter_sleep_state *in;
+
+	ret = hv_initialize_sleep_states();
+	if (ret)
+		return ret;
+
+	local_irq_save(flags);
+	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	in->sleep_state = sleep_state;
+
+	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
+	local_irq_restore(flags);
+
+	if (!hv_result_success(status)) {
+		hv_status_err(status, "\n");
+		return hv_result_to_errno(status);
+	}
+
+	return 0;
+}
+
+static int hv_reboot_notifier_handler(struct notifier_block *this,
+				      unsigned long code, void *another)
+{
+	int ret = 0;
+
+	if (code == SYS_HALT || code == SYS_POWER_OFF)
+		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
+
+	return ret ? NOTIFY_DONE : NOTIFY_OK;
+}
+
+static struct notifier_block hv_reboot_notifier = {
+	.notifier_call  = hv_reboot_notifier_handler,
+};
+
+static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
+{
+	int ret = 0;
+
+	if (sleep_state == ACPI_STATE_S5)
+		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
+
+	return ret == 0 ? 1 : -1;
+}
+
+static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
+{
+	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
+}
+
+int hv_sleep_notifiers_register(void)
+{
+	int ret;
+
+	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
+	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
+
+	ret = register_reboot_notifier(&hv_reboot_notifier);
+	if (ret)
+		pr_err("%s: cannot register reboot notifier %d\n",
+			__func__, ret);
+
+	return ret;
+}
+#endif
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
@ 2025-10-15  5:23   ` kernel test robot
  2025-10-15  8:01   ` kernel test robot
  2025-10-16 19:29   ` Michael Kelley
  2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-10-15  5:23 UTC (permalink / raw)
  To: Praveen K Paladugu, kys, haiyangz, wei.liu, decui, tglx, mingo,
	linux-hyperv, linux-kernel, bp, dave.hansen, x86, hpa, arnd
  Cc: llvm, oe-kbuild-all, anbelski, prapal, easwar.hariharan,
	nunodasneves, skinsburskii

Hi Praveen,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on linus/master v6.18-rc1 next-20251014]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Praveen-K-Paladugu/hyperv-Add-definitions-for-MSHV-sleep-state-configuration/20251015-004650
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20251014164150.6935-3-prapal%40linux.microsoft.com
patch subject: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
config: i386-randconfig-006-20251015 (https://download.01.org/0day-ci/archive/20251015/202510151359.vRXcys2P-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251015/202510151359.vRXcys2P-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510151359.vRXcys2P-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/hv/hv_common.c:944:5: warning: no previous prototype for function 'hv_sleep_notifiers_register' [-Wmissing-prototypes]
     944 | int hv_sleep_notifiers_register(void)
         |     ^
   drivers/hv/hv_common.c:944:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
     944 | int hv_sleep_notifiers_register(void)
         | ^
         | static 
   1 warning generated.


vim +/hv_sleep_notifiers_register +944 drivers/hv/hv_common.c

   943	
 > 944	int hv_sleep_notifiers_register(void)

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
  2025-10-15  5:23   ` kernel test robot
@ 2025-10-15  8:01   ` kernel test robot
  2025-10-16 19:29   ` Michael Kelley
  2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2025-10-15  8:01 UTC (permalink / raw)
  To: Praveen K Paladugu, kys, haiyangz, wei.liu, decui, tglx, mingo,
	linux-hyperv, linux-kernel, bp, dave.hansen, x86, hpa, arnd
  Cc: oe-kbuild-all, anbelski, prapal, easwar.hariharan, nunodasneves,
	skinsburskii

Hi Praveen,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on linus/master v6.18-rc1 next-20251014]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Praveen-K-Paladugu/hyperv-Add-definitions-for-MSHV-sleep-state-configuration/20251015-004650
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20251014164150.6935-3-prapal%40linux.microsoft.com
patch subject: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
config: i386-buildonly-randconfig-002-20251015 (https://download.01.org/0day-ci/archive/20251015/202510151539.w9IHg8lU-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251015/202510151539.w9IHg8lU-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510151539.w9IHg8lU-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   arch/x86/hyperv/hv_init.c: In function 'hyperv_init':
>> arch/x86/hyperv/hv_init.c:556:23: error: implicit declaration of function 'hv_sleep_notifiers_register'; did you mean 'preempt_notifier_register'? [-Wimplicit-function-declaration]
     556 |                 (void)hv_sleep_notifiers_register();
         |                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                       preempt_notifier_register
--
>> drivers/hv/hv_common.c:944:5: warning: no previous prototype for 'hv_sleep_notifiers_register' [-Wmissing-prototypes]
     944 | int hv_sleep_notifiers_register(void)
         |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +556 arch/x86/hyperv/hv_init.c

   431	
   432	/*
   433	 * This function is to be invoked early in the boot sequence after the
   434	 * hypervisor has been detected.
   435	 *
   436	 * 1. Setup the hypercall page.
   437	 * 2. Register Hyper-V specific clocksource.
   438	 * 3. Setup Hyper-V specific APIC entry points.
   439	 */
   440	void __init hyperv_init(void)
   441	{
   442		u64 guest_id;
   443		union hv_x64_msr_hypercall_contents hypercall_msr;
   444		int cpuhp;
   445	
   446		if (x86_hyper_type != X86_HYPER_MS_HYPERV)
   447			return;
   448	
   449		if (hv_common_init())
   450			return;
   451	
   452		/*
   453		 * The VP assist page is useless to a TDX guest: the only use we
   454		 * would have for it is lazy EOI, which can not be used with TDX.
   455		 */
   456		if (hv_isolation_type_tdx())
   457			hv_vp_assist_page = NULL;
   458		else
   459			hv_vp_assist_page = kcalloc(nr_cpu_ids,
   460						    sizeof(*hv_vp_assist_page),
   461						    GFP_KERNEL);
   462		if (!hv_vp_assist_page) {
   463			ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
   464	
   465			if (!hv_isolation_type_tdx())
   466				goto common_free;
   467		}
   468	
   469		if (ms_hyperv.paravisor_present && hv_isolation_type_snp()) {
   470			/* Negotiate GHCB Version. */
   471			if (!hv_ghcb_negotiate_protocol())
   472				hv_ghcb_terminate(SEV_TERM_SET_GEN,
   473						  GHCB_SEV_ES_PROT_UNSUPPORTED);
   474	
   475			hv_ghcb_pg = alloc_percpu(union hv_ghcb *);
   476			if (!hv_ghcb_pg)
   477				goto free_vp_assist_page;
   478		}
   479	
   480		cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
   481					  hv_cpu_init, hv_cpu_die);
   482		if (cpuhp < 0)
   483			goto free_ghcb_page;
   484	
   485		/*
   486		 * Setup the hypercall page and enable hypercalls.
   487		 * 1. Register the guest ID
   488		 * 2. Enable the hypercall and register the hypercall page
   489		 *
   490		 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
   491		 * when the hypercall input is a page, such a VM must pass a decrypted
   492		 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
   493		 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
   494		 *
   495		 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
   496		 * which are handled by the paravisor and the VM must use an encrypted
   497		 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
   498		 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
   499		 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
   500		 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
   501		 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
   502		 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
   503		 * instead, hv_post_message() uses the post_msg_page, which is decrypted
   504		 * in such a VM and is only used in such a VM.
   505		 */
   506		guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
   507		wrmsrq(HV_X64_MSR_GUEST_OS_ID, guest_id);
   508	
   509		/* With the paravisor, the VM must also write the ID via GHCB/GHCI */
   510		hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
   511	
   512		/* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg */
   513		if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
   514			goto skip_hypercall_pg_init;
   515	
   516		hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
   517				MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
   518				VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
   519				__builtin_return_address(0));
   520		if (hv_hypercall_pg == NULL)
   521			goto clean_guest_os_id;
   522	
   523		rdmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   524		hypercall_msr.enable = 1;
   525	
   526		if (hv_root_partition()) {
   527			struct page *pg;
   528			void *src;
   529	
   530			/*
   531			 * For the root partition, the hypervisor will set up its
   532			 * hypercall page. The hypervisor guarantees it will not show
   533			 * up in the root's address space. The root can't change the
   534			 * location of the hypercall page.
   535			 *
   536			 * Order is important here. We must enable the hypercall page
   537			 * so it is populated with code, then copy the code to an
   538			 * executable page.
   539			 */
   540			wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   541	
   542			pg = vmalloc_to_page(hv_hypercall_pg);
   543			src = memremap(hypercall_msr.guest_physical_address << PAGE_SHIFT, PAGE_SIZE,
   544					MEMREMAP_WB);
   545			BUG_ON(!src);
   546			memcpy_to_page(pg, 0, src, HV_HYP_PAGE_SIZE);
   547			memunmap(src);
   548	
   549			hv_remap_tsc_clocksource();
   550			/*
   551			 * The notifier registration might fail at various hops.
   552			 * Corresponding error messages will land in dmesg. There is
   553			 * otherwise nothing that can be specifically done to handle
   554			 * failures here.
   555			 */
 > 556			(void)hv_sleep_notifiers_register();
   557		} else {
   558			hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
   559			wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
   560		}
   561	
   562		hv_set_hypercall_pg(hv_hypercall_pg);
   563	
   564	skip_hypercall_pg_init:
   565		/*
   566		 * hyperv_init() is called before LAPIC is initialized: see
   567		 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
   568		 * apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
   569		 * depends on LAPIC, so hv_stimer_alloc() should be called from
   570		 * x86_init.timers.setup_percpu_clockev.
   571		 */
   572		old_setup_percpu_clockev = x86_init.timers.setup_percpu_clockev;
   573		x86_init.timers.setup_percpu_clockev = hv_stimer_setup_percpu_clockev;
   574	
   575		hv_apic_init();
   576	
   577		x86_init.pci.arch_init = hv_pci_init;
   578	
   579		register_syscore_ops(&hv_syscore_ops);
   580	
   581		if (ms_hyperv.priv_high & HV_ACCESS_PARTITION_ID)
   582			hv_get_partition_id();
   583	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
  2025-10-15  5:23   ` kernel test robot
  2025-10-15  8:01   ` kernel test robot
@ 2025-10-16 19:29   ` Michael Kelley
  2025-10-20 15:59     ` Praveen Paladugu
  2025-10-22 21:30     ` Praveen Paladugu
  2 siblings, 2 replies; 10+ messages in thread
From: Michael Kelley @ 2025-10-16 19:29 UTC (permalink / raw)
  To: Praveen K Paladugu, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	arnd@arndb.de
  Cc: anbelski@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com,
	nunodasneves@linux.microsoft.com,
	skinsburskii@linux.microsoft.com

From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> 
> When a shutdown is initiated in the root partition without configuring
> sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> the root falls back to using legacy ACPI mechanisms to poweroff. This call
> is intercepted by MSHV and will result in a Machine Check Exception (MCE).
> 
> Root panics with a trace similar to:
> 
> [   81.306348] reboot: Power down
> [   81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> [   81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> [   81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> [   81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> [   81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> [   81.314717] Kernel panic - not syncing: Fatal machine check
> 
> To prevent this, properly configure sleep states within MSHV, allowing
> the root partition to shut down cleanly without triggering a panic.
> 
> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
> Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c       |   7 ++
>  arch/x86/include/asm/mshyperv.h |   1 +
>  drivers/hv/hv_common.c          | 119 ++++++++++++++++++++++++++++++++
>  3 files changed, 127 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index afdbda2dd7b7..57bd96671ead 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -510,6 +510,13 @@ void __init hyperv_init(void)
>  		memunmap(src);
> 
>  		hv_remap_tsc_clocksource();
> +		/*
> +		 * The notifier registration might fail at various hops.
> +		 * Corresponding error messages will land in dmesg. There is
> +		 * otherwise nothing that can be specifically done to handle
> +		 * failures here.
> +		 */
> +		(void)hv_sleep_notifiers_register();
>  	} else {
>  		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>  		wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index abc4659f5809..fb8d691193df 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
>  bool hv_vcpu_is_preempted(int vcpu);
> +int hv_sleep_notifiers_register(void);
>  #else
>  static inline void hv_apic_init(void) {}
>  #endif
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index e109a620c83f..cfba9ded7bcb 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
>  	return "Unknown";
>  }
>  EXPORT_SYMBOL_GPL(hv_result_to_string);
> +
> +#if IS_ENABLED(CONFIG_ACPI)
> +/*
> + * Corresponding sleep states have to be initialized in order for a subsequent
> + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> + *
> + * ACPI should be initialized and should support S5 sleep state when this method
> + * is called, so that it can extract correct PM values and pass them to hv.
> + */
> +static int hv_initialize_sleep_states(void)
> +{
> +	u64 status;
> +	unsigned long flags;
> +	struct hv_input_set_system_property *in;
> +	acpi_status acpi_status;
> +	u8 sleep_type_a, sleep_type_b;
> +
> +	if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> +		pr_err("%s: S5 sleep state not supported.\n", __func__);
> +		return -ENODEV;
> +	}
> +
> +	acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> +						&sleep_type_a, &sleep_type_b);
> +	if (ACPI_FAILURE(acpi_status))
> +		return -ENODEV;
> +
> +	local_irq_save(flags);
> +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	memset(in, 0, sizeof(*in));
> +
> +	in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> +	in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> +	in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> +	in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> +
> +	status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> +	local_irq_restore(flags);
> +
> +	if (!hv_result_success(status)) {
> +		hv_status_err(status, "\n");
> +		return hv_result_to_errno(status);
> +	}
> +
> +	return 0;
> +}
> +
> +static int hv_call_enter_sleep_state(u32 sleep_state)
> +{
> +	u64 status;
> +	int ret;
> +	unsigned long flags;
> +	struct hv_input_enter_sleep_state *in;
> +
> +	ret = hv_initialize_sleep_states();
> +	if (ret)
> +		return ret;
> +
> +	local_irq_save(flags);
> +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	in->sleep_state = sleep_state;
> +
> +	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);

If this hypercall succeeds, does the root partition (which is the caller) go
to sleep in S5, such that the hypercall never returns? If that's not the case,
what is the behavior of this hypercall?

> +	local_irq_restore(flags);
> +
> +	if (!hv_result_success(status)) {
> +		hv_status_err(status, "\n");
> +		return hv_result_to_errno(status);
> +	}
> +
> +	return 0;
> +}
> +
> +static int hv_reboot_notifier_handler(struct notifier_block *this,
> +				      unsigned long code, void *another)
> +{
> +	int ret = 0;
> +
> +	if (code == SYS_HALT || code == SYS_POWER_OFF)
> +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);

If hv_call_enter_sleep_state() never returns, here's an issue. There may be
multiple entries on the reboot notifier chain. For example,
mshv_root_partition_init() puts an entry on the reboot notifier chain. At
reboot time, the entries are executed in some order, with the expectation
that all entries will be executed prior to the reboot actually happening. But
if this hypercall never returns, some entries may never be executed.

Notifier chains support a notion of priority to control the order in
which they are executed, but that priority isn't set in hv_reboot_notifier
below, or in mshv_reboot_nb. And most other reboot notifiers throughout
Linux appear to not set it. So the ordering is unspecified, and having
this notifier never return may be problematic.

> +
> +	return ret ? NOTIFY_DONE : NOTIFY_OK;
> +}
> +
> +static struct notifier_block hv_reboot_notifier = {
> +	.notifier_call  = hv_reboot_notifier_handler,
> +};
> +
> +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> +{
> +	int ret = 0;
> +
> +	if (sleep_state == ACPI_STATE_S5)
> +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> +
> +	return ret == 0 ? 1 : -1;
> +}
> +
> +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> +{
> +	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> +}

Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
So it seems like acpi_os_set_prepare_extended_sleep() could just use
hv_acpi_sleep_handler() directly.

> +
> +int hv_sleep_notifiers_register(void)
> +{
> +	int ret;
> +
> +	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> +	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);

I'm not clear on why these handlers are set. If the hv_reboot_notifier is
called, are these ACPI handlers ever called? Or are these to catch any cases
where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
not understanding something .... :-)

> +
> +	ret = register_reboot_notifier(&hv_reboot_notifier);
> +	if (ret)
> +		pr_err("%s: cannot register reboot notifier %d\n",
> +			__func__, ret);
> +
> +	return ret;
> +}
> +#endif

I'm wondering if all this code belongs in hv_common.c, since it is only needed
for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
is set.

Michael

> --
> 2.51.0
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-16 19:29   ` Michael Kelley
@ 2025-10-20 15:59     ` Praveen Paladugu
  2025-10-20 17:30       ` Michael Kelley
  2025-10-22 21:30     ` Praveen Paladugu
  1 sibling, 1 reply; 10+ messages in thread
From: Praveen Paladugu @ 2025-10-20 15:59 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com,
	nunodasneves@linux.microsoft.com,
	skinsburskii@linux.microsoft.com

On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > 
> > When a shutdown is initiated in the root partition without configuring
> > sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> > the root falls back to using legacy ACPI mechanisms to poweroff. This call
> > is intercepted by MSHV and will result in a Machine Check Exception (MCE).
> > 
> > Root panics with a trace similar to:
> > 
> > [   81.306348] reboot: Power down
> > [   81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> > [   81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> > [   81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> > [   81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > [   81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> > [   81.314717] Kernel panic - not syncing: Fatal machine check
> > 
> > To prevent this, properly configure sleep states within MSHV, allowing
> > the root partition to shut down cleanly without triggering a panic.
> > 
> > Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> > Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
> > Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
> > ---
> >  arch/x86/hyperv/hv_init.c       |   7 ++
> >  arch/x86/include/asm/mshyperv.h |   1 +
> >  drivers/hv/hv_common.c          | 119 ++++++++++++++++++++++++++++++++
> >  3 files changed, 127 insertions(+)
> > 
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index afdbda2dd7b7..57bd96671ead 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -510,6 +510,13 @@ void __init hyperv_init(void)
> >  		memunmap(src);
> > 
> >  		hv_remap_tsc_clocksource();
> > +		/*
> > +		 * The notifier registration might fail at various hops.
> > +		 * Corresponding error messages will land in dmesg. There is
> > +		 * otherwise nothing that can be specifically done to handle
> > +		 * failures here.
> > +		 */
> > +		(void)hv_sleep_notifiers_register();
> >  	} else {
> >  		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> >  		wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index abc4659f5809..fb8d691193df 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
> >  void hv_apic_init(void);
> >  void __init hv_init_spinlocks(void);
> >  bool hv_vcpu_is_preempted(int vcpu);
> > +int hv_sleep_notifiers_register(void);
> >  #else
> >  static inline void hv_apic_init(void) {}
> >  #endif
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index e109a620c83f..cfba9ded7bcb 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
> >  	return "Unknown";
> >  }
> >  EXPORT_SYMBOL_GPL(hv_result_to_string);
> > +
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +/*
> > + * Corresponding sleep states have to be initialized in order for a subsequent
> > + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> > + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> > + *
> > + * ACPI should be initialized and should support S5 sleep state when this method
> > + * is called, so that it can extract correct PM values and pass them to hv.
> > + */
> > +static int hv_initialize_sleep_states(void)
> > +{
> > +	u64 status;
> > +	unsigned long flags;
> > +	struct hv_input_set_system_property *in;
> > +	acpi_status acpi_status;
> > +	u8 sleep_type_a, sleep_type_b;
> > +
> > +	if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> > +		pr_err("%s: S5 sleep state not supported.\n", __func__);
> > +		return -ENODEV;
> > +	}
> > +
> > +	acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> > +						&sleep_type_a, &sleep_type_b);
> > +	if (ACPI_FAILURE(acpi_status))
> > +		return -ENODEV;
> > +
> > +	local_irq_save(flags);
> > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	memset(in, 0, sizeof(*in));
> > +
> > +	in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> > +	in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> > +	in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> > +	in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> > +
> > +	status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> > +	local_irq_restore(flags);
> > +
> > +	if (!hv_result_success(status)) {
> > +		hv_status_err(status, "\n");
> > +		return hv_result_to_errno(status);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > +{
> > +	u64 status;
> > +	int ret;
> > +	unsigned long flags;
> > +	struct hv_input_enter_sleep_state *in;
> > +
> > +	ret = hv_initialize_sleep_states();
> > +	if (ret)
> > +		return ret;
> > +
> > +	local_irq_save(flags);
> > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	in->sleep_state = sleep_state;
> > +
> > +	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> 
> If this hypercall succeeds, does the root partition (which is the caller) go
> to sleep in S5, such that the hypercall never returns? If that's not the case,
> what is the behavior of this hypercall?
>
This hypercall returns to the kernel when the CPU wakes up the next
time.

> > +	local_irq_restore(flags);
> > +
> > +	if (!hv_result_success(status)) {
> > +		hv_status_err(status, "\n");
> > +		return hv_result_to_errno(status);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > +				      unsigned long code, void *another)
> > +{
> > +	int ret = 0;
> > +
> > +	if (code == SYS_HALT || code == SYS_POWER_OFF)
> > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> 
> If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> multiple entries on the reboot notifier chain. For example,
> mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> reboot time, the entries are executed in some order, with the expectation
> that all entries will be executed prior to the reboot actually happening. But
> if this hypercall never returns, some entries may never be executed.
> 
> Notifier chains support a notion of priority to control the order in
> which they are executed, but that priority isn't set in hv_reboot_notifier
> below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> Linux appear to not set it. So the ordering is unspecified, and having
> this notifier never return may be problematic.
> 
Thanks for the detailed explanation Michael!

As I mentioned above, this hypercall returns to the kernel, so the rest
of the entries in the notifier chain should continue to execute.

> > +
> > +	return ret ? NOTIFY_DONE : NOTIFY_OK;
> > +}
> > +
> > +static struct notifier_block hv_reboot_notifier = {
> > +	.notifier_call  = hv_reboot_notifier_handler,
> > +};
> > +
> > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > +{
> > +	int ret = 0;
> > +
> > +	if (sleep_state == ACPI_STATE_S5)
> > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > +
> > +	return ret == 0 ? 1 : -1;
> > +}
> > +
> > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > +{
> > +	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > +}
> 
> Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> So it seems like acpi_os_set_prepare_extended_sleep() could just use
> hv_acpi_sleep_handler() directly.
> 
Upon further investigation, I discovered that extended sleep is only
supported on platforms with ACPI_REDUCED_HARDWARE.

As these patches are targetted at X86, above does not really apply. I
will drop this handler in next version.

> > +
> > +int hv_sleep_notifiers_register(void)
> > +{
> > +	int ret;
> > +
> > +	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > +	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> 
> I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> called, are these ACPI handlers ever called? Or are these to catch any cases
> where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> not understanding something .... :-)
>

I am trying to trace these calls. I will keep you posted with my
findings.

> > +
> > +	ret = register_reboot_notifier(&hv_reboot_notifier);
> > +	if (ret)
> > +		pr_err("%s: cannot register reboot notifier %d\n",
> > +			__func__, ret);
> > +
> > +	return ret;
> > +}
> > +#endif
> 
> I'm wondering if all this code belongs in hv_common.c, since it is only needed
> for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> is set.
>

This sounds reasonable. I will discuss this internally and get back you.

> Michael
> 
> > --
> > 2.51.0
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-20 15:59     ` Praveen Paladugu
@ 2025-10-20 17:30       ` Michael Kelley
  2025-10-22 21:26         ` Praveen Paladugu
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Kelley @ 2025-10-20 17:30 UTC (permalink / raw)
  To: Praveen Paladugu
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com,
	nunodasneves@linux.microsoft.com,
	skinsburskii@linux.microsoft.com

From: Praveen Paladugu <prapal@linux.microsoft.com> Sent: Monday, October 20, 2025 9:00 AM
> 
> On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> > From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > >

[snip]

> > > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > > +{
> > > +	u64 status;
> > > +	int ret;
> > > +	unsigned long flags;
> > > +	struct hv_input_enter_sleep_state *in;
> > > +
> > > +	ret = hv_initialize_sleep_states();
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	local_irq_save(flags);
> > > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > +	in->sleep_state = sleep_state;
> > > +
> > > +	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> >
> > If this hypercall succeeds, does the root partition (which is the caller) go
> > to sleep in S5, such that the hypercall never returns? If that's not the case,
> > what is the behavior of this hypercall?
> >
> This hypercall returns to the kernel when the CPU wakes up the next
> time.

I must be missing something about the big picture, because "returns to
the kernel when the CPU wakes up" doesn't fit my mental model of what's
going on. I thought this function would be called, and the hypercall made,
when Linux in the root partition is shutting down. So if a CPU makes this
hypercall and goes to sleep, what wakes it up? And when it wakes up, is it
still running the same Linux instance that was shutting down, or has it
rebooted into new Linux instance? In the latter case, returning from
the hypercall doesn't make sense.

Can you explain further how this all works?

Michael

> 
> > > +	local_irq_restore(flags);
> > > +
> > > +	if (!hv_result_success(status)) {
> > > +		hv_status_err(status, "\n");
> > > +		return hv_result_to_errno(status);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > > +				      unsigned long code, void *another)
> > > +{
> > > +	int ret = 0;
> > > +
> > > +	if (code == SYS_HALT || code == SYS_POWER_OFF)
> > > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> >
> > If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> > multiple entries on the reboot notifier chain. For example,
> > mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> > reboot time, the entries are executed in some order, with the expectation
> > that all entries will be executed prior to the reboot actually happening. But
> > if this hypercall never returns, some entries may never be executed.
> >
> > Notifier chains support a notion of priority to control the order in
> > which they are executed, but that priority isn't set in hv_reboot_notifier
> > below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> > Linux appear to not set it. So the ordering is unspecified, and having
> > this notifier never return may be problematic.
> >
> Thanks for the detailed explanation Michael!
> 
> As I mentioned above, this hypercall returns to the kernel, so the rest
> of the entries in the notifier chain should continue to execute.
> 
> > > +
> > > +	return ret ? NOTIFY_DONE : NOTIFY_OK;
> > > +}
> > > +
> > > +static struct notifier_block hv_reboot_notifier = {
> > > +	.notifier_call  = hv_reboot_notifier_handler,
> > > +};
> > > +
> > > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > > +{
> > > +	int ret = 0;
> > > +
> > > +	if (sleep_state == ACPI_STATE_S5)
> > > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > > +
> > > +	return ret == 0 ? 1 : -1;
> > > +}
> > > +
> > > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > > +{
> > > +	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > > +}
> >
> > Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> > So it seems like acpi_os_set_prepare_extended_sleep() could just use
> > hv_acpi_sleep_handler() directly.
> >
> Upon further investigation, I discovered that extended sleep is only
> supported on platforms with ACPI_REDUCED_HARDWARE.
> 
> As these patches are targetted at X86, above does not really apply. I
> will drop this handler in next version.
> 
> > > +
> > > +int hv_sleep_notifiers_register(void)
> > > +{
> > > +	int ret;
> > > +
> > > +	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > > +	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> >
> > I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> > called, are these ACPI handlers ever called? Or are these to catch any cases
> > where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> > not understanding something .... :-)
> >
> 
> I am trying to trace these calls. I will keep you posted with my
> findings.
> 
> > > +
> > > +	ret = register_reboot_notifier(&hv_reboot_notifier);
> > > +	if (ret)
> > > +		pr_err("%s: cannot register reboot notifier %d\n",
> > > +			__func__, ret);
> > > +
> > > +	return ret;
> > > +}
> > > +#endif
> >
> > I'm wondering if all this code belongs in hv_common.c, since it is only needed
> > for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> > be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> > is set.
> >
> 
> This sounds reasonable. I will discuss this internally and get back you.
> 
> > Michael
> >
> > > --
> > > 2.51.0
> > >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-20 17:30       ` Michael Kelley
@ 2025-10-22 21:26         ` Praveen Paladugu
  0 siblings, 0 replies; 10+ messages in thread
From: Praveen Paladugu @ 2025-10-22 21:26 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com,
	nunodasneves@linux.microsoft.com,
	skinsburskii@linux.microsoft.com

On Mon, Oct 20, 2025 at 05:30:44PM +0000, Michael Kelley wrote:
> From: Praveen Paladugu <prapal@linux.microsoft.com> Sent: Monday, October 20, 2025 9:00 AM
> > 
> > On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> > > From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > > >
> 
> [snip]
> 
> > > > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > > > +{
> > > > +	u64 status;
> > > > +	int ret;
> > > > +	unsigned long flags;
> > > > +	struct hv_input_enter_sleep_state *in;
> > > > +
> > > > +	ret = hv_initialize_sleep_states();
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	local_irq_save(flags);
> > > > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > > +	in->sleep_state = sleep_state;
> > > > +
> > > > +	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> > >
> > > If this hypercall succeeds, does the root partition (which is the caller) go
> > > to sleep in S5, such that the hypercall never returns? If that's not the case,
> > > what is the behavior of this hypercall?
> > >
> > This hypercall returns to the kernel when the CPU wakes up the next
> > time.
> 
> I must be missing something about the big picture, because "returns to
> the kernel when the CPU wakes up" doesn't fit my mental model of what's
> going on. I thought this function would be called, and the hypercall made,
> when Linux in the root partition is shutting down. So if a CPU makes this
> hypercall and goes to sleep, what wakes it up? And when it wakes up, is it
> still running the same Linux instance that was shutting down, or has it
> rebooted into new Linux instance? In the latter case, returning from
> the hypercall doesn't make sense.
> 
> Can you explain further how this all works?
>
Sorry for the confusion here. I mis-understood what happens while
entering the sleep state here. I will clarify below:

Sleep state S5 refers to shutdown/poweroff. Although other sleep states
are handled, they are not properly applied while running mshv on
servers. Non-S5 sleep states are only applied on non-server hosts.
This patch only handles S5 sleep state.

If a hypercall for non-S5 sleep state is invoked, the control
is returned back to the kernel. Usually non-S5 sleep state do things
like suspend to memory/hibernate etc.

If the hypecall is for S5 sleep state, then Hypervisor does poweroff the
host and the control does not return back to the kernel.


Now that is clarified, your previous comments about the order of the
reboot_notifiers is relevant now. I will investigate how to apply a
priority/order so that hv_reboot_notifier_handler will be called last.


Praveen

> Michael
> 
> > 
> > > > +	local_irq_restore(flags);
> > > > +
> > > > +	if (!hv_result_success(status)) {
> > > > +		hv_status_err(status, "\n");
> > > > +		return hv_result_to_errno(status);
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > > > +				      unsigned long code, void *another)
> > > > +{
> > > > +	int ret = 0;
> > > > +
> > > > +	if (code == SYS_HALT || code == SYS_POWER_OFF)
> > > > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > >
> > > If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> > > multiple entries on the reboot notifier chain. For example,
> > > mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> > > reboot time, the entries are executed in some order, with the expectation
> > > that all entries will be executed prior to the reboot actually happening. But
> > > if this hypercall never returns, some entries may never be executed.
> > >
> > > Notifier chains support a notion of priority to control the order in
> > > which they are executed, but that priority isn't set in hv_reboot_notifier
> > > below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> > > Linux appear to not set it. So the ordering is unspecified, and having
> > > this notifier never return may be problematic.
> > >
> > Thanks for the detailed explanation Michael!
> > 
> > As I mentioned above, this hypercall returns to the kernel, so the rest
> > of the entries in the notifier chain should continue to execute.
> > 
> > > > +
> > > > +	return ret ? NOTIFY_DONE : NOTIFY_OK;
> > > > +}
> > > > +
> > > > +static struct notifier_block hv_reboot_notifier = {
> > > > +	.notifier_call  = hv_reboot_notifier_handler,
> > > > +};
> > > > +
> > > > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > > > +{
> > > > +	int ret = 0;
> > > > +
> > > > +	if (sleep_state == ACPI_STATE_S5)
> > > > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > > > +
> > > > +	return ret == 0 ? 1 : -1;
> > > > +}
> > > > +
> > > > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > > > +{
> > > > +	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > > > +}
> > >
> > > Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> > > So it seems like acpi_os_set_prepare_extended_sleep() could just use
> > > hv_acpi_sleep_handler() directly.
> > >
> > Upon further investigation, I discovered that extended sleep is only
> > supported on platforms with ACPI_REDUCED_HARDWARE.
> > 
> > As these patches are targetted at X86, above does not really apply. I
> > will drop this handler in next version.
> > 
> > > > +
> > > > +int hv_sleep_notifiers_register(void)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > > > +	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> > >
> > > I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> > > called, are these ACPI handlers ever called? Or are these to catch any cases
> > > where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> > > not understanding something .... :-)
> > >
> > 
> > I am trying to trace these calls. I will keep you posted with my
> > findings.
> > 
> > > > +
> > > > +	ret = register_reboot_notifier(&hv_reboot_notifier);
> > > > +	if (ret)
> > > > +		pr_err("%s: cannot register reboot notifier %d\n",
> > > > +			__func__, ret);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +#endif
> > >
> > > I'm wondering if all this code belongs in hv_common.c, since it is only needed
> > > for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> > > be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> > > is set.
> > >
> > 
> > This sounds reasonable. I will discuss this internally and get back you.
> > 
> > > Michael
> > >
> > > > --
> > > > 2.51.0
> > > >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV
  2025-10-16 19:29   ` Michael Kelley
  2025-10-20 15:59     ` Praveen Paladugu
@ 2025-10-22 21:30     ` Praveen Paladugu
  1 sibling, 0 replies; 10+ messages in thread
From: Praveen Paladugu @ 2025-10-22 21:30 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
	hpa@zytor.com, arnd@arndb.de, anbelski@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com,
	nunodasneves@linux.microsoft.com,
	skinsburskii@linux.microsoft.com

On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> From: Praveen K Paladugu <prapal@linux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > 
> > When a shutdown is initiated in the root partition without configuring
> > sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> > the root falls back to using legacy ACPI mechanisms to poweroff. This call
> > is intercepted by MSHV and will result in a Machine Check Exception (MCE).
> > 
> > Root panics with a trace similar to:
> > 
> > [   81.306348] reboot: Power down
> > [   81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> > [   81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> > [   81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> > [   81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > [   81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> > [   81.314717] Kernel panic - not syncing: Fatal machine check
> > 
> > To prevent this, properly configure sleep states within MSHV, allowing
> > the root partition to shut down cleanly without triggering a panic.
> > 
> > Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> > Co-developed-by: Anatol Belski <anbelski@linux.microsoft.com>
> > Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
> > ---
> >  arch/x86/hyperv/hv_init.c       |   7 ++
> >  arch/x86/include/asm/mshyperv.h |   1 +
> >  drivers/hv/hv_common.c          | 119 ++++++++++++++++++++++++++++++++
> >  3 files changed, 127 insertions(+)
> > 
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index afdbda2dd7b7..57bd96671ead 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -510,6 +510,13 @@ void __init hyperv_init(void)
> >  		memunmap(src);
> > 
> >  		hv_remap_tsc_clocksource();
> > +		/*
> > +		 * The notifier registration might fail at various hops.
> > +		 * Corresponding error messages will land in dmesg. There is
> > +		 * otherwise nothing that can be specifically done to handle
> > +		 * failures here.
> > +		 */
> > +		(void)hv_sleep_notifiers_register();
> >  	} else {
> >  		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> >  		wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> > index abc4659f5809..fb8d691193df 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
> >  void hv_apic_init(void);
> >  void __init hv_init_spinlocks(void);
> >  bool hv_vcpu_is_preempted(int vcpu);
> > +int hv_sleep_notifiers_register(void);
> >  #else
> >  static inline void hv_apic_init(void) {}
> >  #endif
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index e109a620c83f..cfba9ded7bcb 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
> >  	return "Unknown";
> >  }
> >  EXPORT_SYMBOL_GPL(hv_result_to_string);
> > +
> > +#if IS_ENABLED(CONFIG_ACPI)
> > +/*
> > + * Corresponding sleep states have to be initialized in order for a subsequent
> > + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> > + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> > + *
> > + * ACPI should be initialized and should support S5 sleep state when this method
> > + * is called, so that it can extract correct PM values and pass them to hv.
> > + */
> > +static int hv_initialize_sleep_states(void)
> > +{
> > +	u64 status;
> > +	unsigned long flags;
> > +	struct hv_input_set_system_property *in;
> > +	acpi_status acpi_status;
> > +	u8 sleep_type_a, sleep_type_b;
> > +
> > +	if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> > +		pr_err("%s: S5 sleep state not supported.\n", __func__);
> > +		return -ENODEV;
> > +	}
> > +
> > +	acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> > +						&sleep_type_a, &sleep_type_b);
> > +	if (ACPI_FAILURE(acpi_status))
> > +		return -ENODEV;
> > +
> > +	local_irq_save(flags);
> > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	memset(in, 0, sizeof(*in));
> > +
> > +	in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> > +	in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> > +	in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> > +	in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> > +
> > +	status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> > +	local_irq_restore(flags);
> > +
> > +	if (!hv_result_success(status)) {
> > +		hv_status_err(status, "\n");
> > +		return hv_result_to_errno(status);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > +{
> > +	u64 status;
> > +	int ret;
> > +	unsigned long flags;
> > +	struct hv_input_enter_sleep_state *in;
> > +
> > +	ret = hv_initialize_sleep_states();
> > +	if (ret)
> > +		return ret;
> > +
> > +	local_irq_save(flags);
> > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	in->sleep_state = sleep_state;
> > +
> > +	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> 
> If this hypercall succeeds, does the root partition (which is the caller) go
> to sleep in S5, such that the hypercall never returns? If that's not the case,
> what is the behavior of this hypercall?
> 
> > +	local_irq_restore(flags);
> > +
> > +	if (!hv_result_success(status)) {
> > +		hv_status_err(status, "\n");
> > +		return hv_result_to_errno(status);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > +				      unsigned long code, void *another)
> > +{
> > +	int ret = 0;
> > +
> > +	if (code == SYS_HALT || code == SYS_POWER_OFF)
> > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> 
> If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> multiple entries on the reboot notifier chain. For example,
> mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> reboot time, the entries are executed in some order, with the expectation
> that all entries will be executed prior to the reboot actually happening. But
> if this hypercall never returns, some entries may never be executed.
> 
> Notifier chains support a notion of priority to control the order in
> which they are executed, but that priority isn't set in hv_reboot_notifier
> below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> Linux appear to not set it. So the ordering is unspecified, and having
> this notifier never return may be problematic.
> 
> > +
> > +	return ret ? NOTIFY_DONE : NOTIFY_OK;
> > +}
> > +
> > +static struct notifier_block hv_reboot_notifier = {
> > +	.notifier_call  = hv_reboot_notifier_handler,
> > +};
> > +
> > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > +{
> > +	int ret = 0;
> > +
> > +	if (sleep_state == ACPI_STATE_S5)
> > +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > +
> > +	return ret == 0 ? 1 : -1;
> > +}
> > +
> > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > +{
> > +	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > +}
> 
> Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> So it seems like acpi_os_set_prepare_extended_sleep() could just use
> hv_acpi_sleep_handler() directly.
>

I confirmed that hv_acpi_xxx_handler methods are not really necessary.
They are usually invoked by `pm_suspend`, when the host is be put into a
non-S5 sleep state. As non-S5 sleep states are not supported by mshv, I
will drop these handlers in next revision.

> > +
> > +int hv_sleep_notifiers_register(void)
> > +{
> > +	int ret;
> > +
> > +	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > +	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> 
> I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> called, are these ACPI handlers ever called? Or are these to catch any cases
> where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> not understanding something .... :-)
> 
> > +
> > +	ret = register_reboot_notifier(&hv_reboot_notifier);
> > +	if (ret)
> > +		pr_err("%s: cannot register reboot notifier %d\n",
> > +			__func__, ret);
> > +
> > +	return ret;
> > +}
> > +#endif
> 
> I'm wondering if all this code belongs in hv_common.c, since it is only needed
> for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> is set.
> 
> Michael
> 
> > --
> > 2.51.0
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-10-22 21:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-14 16:41 [PATCH v2 0/2] Add support for clean shutdown with MSHV Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 1/2] hyperv: Add definitions for MSHV sleep state configuration Praveen K Paladugu
2025-10-14 16:41 ` [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition with MSHV Praveen K Paladugu
2025-10-15  5:23   ` kernel test robot
2025-10-15  8:01   ` kernel test robot
2025-10-16 19:29   ` Michael Kelley
2025-10-20 15:59     ` Praveen Paladugu
2025-10-20 17:30       ` Michael Kelley
2025-10-22 21:26         ` Praveen Paladugu
2025-10-22 21:30     ` Praveen Paladugu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).