* [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings
@ 2025-09-26 16:23 Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 1/5] mshv: Only map vp->vp_stats_pages if on root scheduler Nuno Das Neves
` (5 more replies)
0 siblings, 6 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-26 16:23 UTC (permalink / raw)
To: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii
Cc: kys, haiyangz, wei.liu, decui, Nuno Das Neves
There are some differences in how L1VH partitions must map stats and vp
state pages, some of which are due to differences across hypervisor
versions. Detect and handle these cases.
Patch 1:
Fix for the logic of when to map the vp stats page for the root scheduler.
Patch 2-3:
Add HVCALL_GET_PARTITION_PROPERTY_EX and use it to query "vmm capabilities" on
module init.
Patches 4-5:
Check a feature bit in vmm capabilities, to take a new code path for mapping
stats and vp state pages. In this case, the stats and vp state pages must be
allocated by Linux, and a new hypercall HVCALL_MAP_VP_STATS_PAGE2 must be used
to map the stats page.
---
v4:
- Fixed some __packed attributes on unions [Stanislav]
- Cleaned up mshv_init_vmm_caps() [Stanislav]
- Cleaned up loop in hv_call_map_stats_page2() [Stanislav]
v3:
https://lore.kernel.org/linux-hyperv/1758066262-15477-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
- Fix bug in patch 4, in mshv_partition_ioctl_create_vp() cleanup path
[kernel test robot]
- Make hv_unmap_vp_state_page() use struct page to match hv_map_vp_state_page()
- Remove SELF == PARENT check which doesn't belong in patch 5 [Easwar]
v2:
https://lore.kernel.org/linux-hyperv/1757546089-2002-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
- Remove patch falling back to SELF page if PARENT mapping fails [Easwar]
(To be included in a future series)
- Fix formatting of function definitions [Easwar]
- Fix some wording in commit messages [Praveen]
- Proceed with driver init even if getting vmm capabilities fails [Anirudh]
v1:
https://lore.kernel.org/linux-hyperv/1756428230-3599-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
---
Jinank Jain (2):
mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
mshv: Introduce new hypercall to map stats page for L1VH partitions
Nuno Das Neves (1):
mshv: Only map vp->vp_stats_pages if on root scheduler
Purna Pavan Chandra Aekkaladevi (2):
mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
mshv: Get the vmm capabilities offered by the hypervisor
drivers/hv/mshv_root.h | 24 +++--
drivers/hv/mshv_root_hv_call.c | 185 +++++++++++++++++++++++++++++++--
drivers/hv/mshv_root_main.c | 127 ++++++++++++----------
include/hyperv/hvgdk_mini.h | 2 +
include/hyperv/hvhdk.h | 40 +++++++
include/hyperv/hvhdk_mini.h | 33 ++++++
6 files changed, 337 insertions(+), 74 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v4 1/5] mshv: Only map vp->vp_stats_pages if on root scheduler
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
@ 2025-09-26 16:23 ` Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall Nuno Das Neves
` (4 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-26 16:23 UTC (permalink / raw)
To: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii
Cc: kys, haiyangz, wei.liu, decui, Nuno Das Neves
This mapping is only used for checking if the dispatch thread is
blocked. This is only relevant for the root scheduler, so check the
scheduler type to determine whether to map/unmap these pages, instead of
the current check, which is incorrect.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root_main.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index e3b2bd417c46..24df47726363 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -934,7 +934,11 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
goto unmap_register_page;
}
- if (hv_parent_partition()) {
+ /*
+ * This mapping of the stats page is for detecting if dispatch thread
+ * is blocked - only relevant for root scheduler
+ */
+ if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT) {
ret = mshv_vp_stats_map(partition->pt_id, args.vp_index,
stats_pages);
if (ret)
@@ -963,7 +967,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
vp->vp_ghcb_page = page_to_virt(ghcb_page);
- if (hv_parent_partition())
+ if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
memcpy(vp->vp_stats_pages, stats_pages, sizeof(stats_pages));
/*
@@ -986,7 +990,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
free_vp:
kfree(vp);
unmap_stats_pages:
- if (hv_parent_partition())
+ if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
unmap_ghcb_page:
if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
@@ -1740,7 +1744,7 @@ static void destroy_partition(struct mshv_partition *partition)
if (!vp)
continue;
- if (hv_parent_partition())
+ if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
if (vp->vp_register_page) {
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 1/5] mshv: Only map vp->vp_stats_pages if on root scheduler Nuno Das Neves
@ 2025-09-26 16:23 ` Nuno Das Neves
2025-09-29 5:31 ` Tianyu Lan
2025-09-29 17:55 ` Michael Kelley
2025-09-26 16:23 ` [PATCH v4 3/5] mshv: Get the vmm capabilities offered by the hypervisor Nuno Das Neves
` (3 subsequent siblings)
5 siblings, 2 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-26 16:23 UTC (permalink / raw)
To: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii
Cc: kys, haiyangz, wei.liu, decui, Nuno Das Neves
From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
This hypercall can be used to fetch extended properties of a
partition. Extended properties are properties with values larger than
a u64. Some of these also need additional input arguments.
Add helper function for using the hypercall in the mshv_root driver.
Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
---
drivers/hv/mshv_root.h | 2 ++
drivers/hv/mshv_root_hv_call.c | 31 ++++++++++++++++++++++++++
include/hyperv/hvgdk_mini.h | 1 +
include/hyperv/hvhdk.h | 40 ++++++++++++++++++++++++++++++++++
include/hyperv/hvhdk_mini.h | 26 ++++++++++++++++++++++
5 files changed, 100 insertions(+)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index e3931b0f1269..4aeb03bea6b6 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -303,6 +303,8 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
u64 page_struct_count, u32 host_access,
u32 flags, u8 acquire);
+int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
+ void *property_value, size_t property_value_sz);
extern struct mshv_root mshv_root;
extern enum hv_scheduler_type hv_scheduler_type;
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index c9c274f29c3c..3fd3cce23f69 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -590,6 +590,37 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
return hv_result_to_errno(status);
}
+int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
+ u64 arg, void *property_value,
+ size_t property_value_sz)
+{
+ u64 status;
+ unsigned long flags;
+ struct hv_input_get_partition_property_ex *input;
+ struct hv_output_get_partition_property_ex *output;
+
+ local_irq_save(flags);
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+
+ memset(input, 0, sizeof(*input));
+ input->partition_id = partition_id;
+ input->property_code = property_code;
+ input->arg = arg;
+ status = hv_do_hypercall(HVCALL_GET_PARTITION_PROPERTY_EX, input, output);
+
+ if (!hv_result_success(status)) {
+ hv_status_debug(status, "\n");
+ local_irq_restore(flags);
+ return hv_result_to_errno(status);
+ }
+ memcpy(property_value, &output->property_value, property_value_sz);
+
+ local_irq_restore(flags);
+
+ return 0;
+}
+
int
hv_call_clear_virtual_interrupt(u64 partition_id)
{
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 1be7f6a02304..ff4325fb623a 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -490,6 +490,7 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_ASSIST_PAGE */
#define HVCALL_GET_VP_STATE 0x00e3
#define HVCALL_SET_VP_STATE 0x00e4
#define HVCALL_GET_VP_CPUID_VALUES 0x00f4
+#define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
#define HVCALL_MMIO_READ 0x0106
#define HVCALL_MMIO_WRITE 0x0107
diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
index b4067ada02cf..416c0d45b793 100644
--- a/include/hyperv/hvhdk.h
+++ b/include/hyperv/hvhdk.h
@@ -376,6 +376,46 @@ struct hv_input_set_partition_property {
u64 property_value;
} __packed;
+union hv_partition_property_arg {
+ u64 as_uint64;
+ struct {
+ union {
+ u32 arg;
+ u32 vp_index;
+ };
+ u16 reserved0;
+ u8 reserved1;
+ u8 object_type;
+ } __packed;
+};
+
+struct hv_input_get_partition_property_ex {
+ u64 partition_id;
+ u32 property_code; /* enum hv_partition_property_code */
+ u32 padding;
+ union {
+ union hv_partition_property_arg arg_data;
+ u64 arg;
+ };
+} __packed;
+
+/*
+ * NOTE: Should use hv_input_set_partition_property_ex_header to compute this
+ * size, but hv_input_get_partition_property_ex is identical so it suffices
+ */
+#define HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE \
+ (HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_partition_property_ex))
+
+union hv_partition_property_ex {
+ u8 buffer[HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE];
+ struct hv_partition_property_vmm_capabilities vmm_capabilities;
+ /* More fields to be filled in when needed */
+};
+
+struct hv_output_get_partition_property_ex {
+ union hv_partition_property_ex property_value;
+} __packed;
+
enum hv_vp_state_page_type {
HV_VP_STATE_PAGE_REGISTERS = 0,
HV_VP_STATE_PAGE_INTERCEPT_MESSAGE = 1,
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index 858f6a3925b3..bf2ce27dfcc5 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -96,8 +96,34 @@ enum hv_partition_property_code {
HV_PARTITION_PROPERTY_XSAVE_STATES = 0x00060007,
HV_PARTITION_PROPERTY_MAX_XSAVE_DATA_SIZE = 0x00060008,
HV_PARTITION_PROPERTY_PROCESSOR_CLOCK_FREQUENCY = 0x00060009,
+
+ /* Extended properties with larger property values */
+ HV_PARTITION_PROPERTY_VMM_CAPABILITIES = 0x00090007,
};
+#define HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT 1
+#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT 59
+
+struct hv_partition_property_vmm_capabilities {
+ u16 bank_count;
+ u16 reserved[3];
+ union {
+ u64 as_uint64[HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT];
+ struct {
+ u64 map_gpa_preserve_adjustable: 1;
+ u64 vmm_can_provide_overlay_gpfn: 1;
+ u64 vp_affinity_property: 1;
+#if IS_ENABLED(CONFIG_ARM64)
+ u64 vmm_can_provide_gic_overlay_locations: 1;
+#else
+ u64 reservedbit3: 1;
+#endif
+ u64 assignable_synthetic_proc_features: 1;
+ u64 reserved0: HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT;
+ } __packed;
+ };
+} __packed;
+
enum hv_snp_status {
HV_SNP_STATUS_NONE = 0,
HV_SNP_STATUS_AVAILABLE = 1,
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v4 3/5] mshv: Get the vmm capabilities offered by the hypervisor
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 1/5] mshv: Only map vp->vp_stats_pages if on root scheduler Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall Nuno Das Neves
@ 2025-09-26 16:23 ` Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Nuno Das Neves
` (2 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-26 16:23 UTC (permalink / raw)
To: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii
Cc: kys, haiyangz, wei.liu, decui, Nuno Das Neves
From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
Some hypervisor APIs are gated by feature bits in the
"vmm capabilities" partition property. Store the capabilities on
mshv_root module init, using HVCALL_GET_PARTITION_PROPERTY_EX.
This is not supported on all hypervisors. In that case, just set the
capabilities to 0 and proceed as normal.
Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
---
drivers/hv/mshv_root.h | 1 +
drivers/hv/mshv_root_main.c | 17 +++++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 4aeb03bea6b6..0cb1e2589fe1 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -178,6 +178,7 @@ struct mshv_root {
struct hv_synic_pages __percpu *synic_pages;
spinlock_t pt_ht_lock;
DECLARE_HASHTABLE(pt_htable, MSHV_PARTITIONS_HASH_BITS);
+ struct hv_partition_property_vmm_capabilities vmm_caps;
};
/*
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 24df47726363..e199770ecdfa 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2201,6 +2201,21 @@ static int __init mshv_root_partition_init(struct device *dev)
return err;
}
+static void mshv_init_vmm_caps(struct device *dev)
+{
+ /*
+ * HVCALL_GET_PARTITION_PROPERTY_EX or HV_PARTITION_PROPERTY_VMM_CAPABILITIES
+ * may not be supported. Leave them as 0 in that case.
+ */
+ if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
+ HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
+ 0, &mshv_root.vmm_caps,
+ sizeof(mshv_root.vmm_caps)))
+ dev_warn(dev, "Unable to get VMM capabilities\n");
+
+ dev_dbg(dev, "vmm_caps=0x%llx\n", mshv_root.vmm_caps.as_uint64[0]);
+}
+
static int __init mshv_parent_partition_init(void)
{
int ret;
@@ -2253,6 +2268,8 @@ static int __init mshv_parent_partition_init(void)
if (ret)
goto remove_cpu_state;
+ mshv_init_vmm_caps(dev);
+
ret = mshv_irqfd_wq_init();
if (ret)
goto exit_partition;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
` (2 preceding siblings ...)
2025-09-26 16:23 ` [PATCH v4 3/5] mshv: Get the vmm capabilities offered by the hypervisor Nuno Das Neves
@ 2025-09-26 16:23 ` Nuno Das Neves
2025-09-29 5:31 ` Tianyu Lan
2025-09-29 17:56 ` Michael Kelley
2025-09-26 16:23 ` [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions Nuno Das Neves
2025-09-26 23:12 ` [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Stanislav Kinsburskii
5 siblings, 2 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-26 16:23 UTC (permalink / raw)
To: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii
Cc: kys, haiyangz, wei.liu, decui, Jinank Jain, Nuno Das Neves
From: Jinank Jain <jinankjain@linux.microsoft.com>
Introduce mshv_use_overlay_gpfn() to check if a page needs to be
allocated and passed to the hypervisor to map VP state pages. This is
only needed on L1VH, and only on some (newer) versions of the
hypervisor, hence the need to check vmm_capabilities.
Introduce functions hv_map/unmap_vp_state_page() to handle the
allocation and freeing.
Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
---
drivers/hv/mshv_root.h | 11 ++---
drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
3 files changed, 98 insertions(+), 50 deletions(-)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 0cb1e2589fe1..dbe2d1d0b22f 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
/* Choose between pages and bytes */
struct hv_vp_state_data state_data, u64 page_count,
struct page **pages, u32 num_bytes, u8 *bytes);
-int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
- union hv_input_vtl input_vtl,
- struct page **state_page);
-int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
- union hv_input_vtl input_vtl);
+int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
+ union hv_input_vtl input_vtl,
+ struct page **state_page);
+int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
+ struct page *state_page,
+ union hv_input_vtl input_vtl);
int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
u64 connection_partition_id, struct hv_port_info *port_info,
u8 port_vtl, u8 min_connection_vtl, int node);
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index 3fd3cce23f69..98c6278ff151 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
return ret;
}
-int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
- union hv_input_vtl input_vtl,
- struct page **state_page)
+static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
+ union hv_input_vtl input_vtl,
+ struct page **state_page)
{
struct hv_input_map_vp_state_page *input;
struct hv_output_map_vp_state_page *output;
@@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
input->type = type;
input->input_vtl = input_vtl;
- status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
+ if (*state_page) {
+ input->flags.map_location_provided = 1;
+ input->requested_map_location =
+ page_to_pfn(*state_page);
+ }
+
+ status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
+ output);
if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
if (hv_result_success(status))
@@ -565,8 +572,39 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
return ret;
}
-int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
- union hv_input_vtl input_vtl)
+static bool mshv_use_overlay_gpfn(void)
+{
+ return hv_l1vh_partition() &&
+ mshv_root.vmm_caps.vmm_can_provide_overlay_gpfn;
+}
+
+int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
+ union hv_input_vtl input_vtl,
+ struct page **state_page)
+{
+ int ret = 0;
+ struct page *allocated_page = NULL;
+
+ if (mshv_use_overlay_gpfn()) {
+ allocated_page = alloc_page(GFP_KERNEL);
+ if (!allocated_page)
+ return -ENOMEM;
+ *state_page = allocated_page;
+ } else {
+ *state_page = NULL;
+ }
+
+ ret = hv_call_map_vp_state_page(partition_id, vp_index, type, input_vtl,
+ state_page);
+
+ if (ret && allocated_page)
+ __free_page(allocated_page);
+
+ return ret;
+}
+
+static int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
+ union hv_input_vtl input_vtl)
{
unsigned long flags;
u64 status;
@@ -590,6 +628,17 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
return hv_result_to_errno(status);
}
+int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
+ struct page *state_page, union hv_input_vtl input_vtl)
+{
+ int ret = hv_call_unmap_vp_state_page(partition_id, vp_index, type, input_vtl);
+
+ if (mshv_use_overlay_gpfn() && state_page)
+ __free_page(state_page);
+
+ return ret;
+}
+
int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
u64 arg, void *property_value,
size_t property_value_sz)
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index e199770ecdfa..2d0ad17acde6 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -890,7 +890,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
{
struct mshv_create_vp args;
struct mshv_vp *vp;
- struct page *intercept_message_page, *register_page, *ghcb_page;
+ struct page *intercept_msg_page, *register_page, *ghcb_page;
void *stats_pages[2];
long ret;
@@ -908,28 +908,25 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
if (ret)
return ret;
- ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
- HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
- input_vtl_zero,
- &intercept_message_page);
+ ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
+ HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
+ input_vtl_zero, &intercept_msg_page);
if (ret)
goto destroy_vp;
if (!mshv_partition_encrypted(partition)) {
- ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
- HV_VP_STATE_PAGE_REGISTERS,
- input_vtl_zero,
- ®ister_page);
+ ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
+ HV_VP_STATE_PAGE_REGISTERS,
+ input_vtl_zero, ®ister_page);
if (ret)
goto unmap_intercept_message_page;
}
if (mshv_partition_encrypted(partition) &&
is_ghcb_mapping_available()) {
- ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
- HV_VP_STATE_PAGE_GHCB,
- input_vtl_normal,
- &ghcb_page);
+ ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
+ HV_VP_STATE_PAGE_GHCB,
+ input_vtl_normal, &ghcb_page);
if (ret)
goto unmap_register_page;
}
@@ -960,7 +957,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
atomic64_set(&vp->run.vp_signaled_count, 0);
vp->vp_index = args.vp_index;
- vp->vp_intercept_msg_page = page_to_virt(intercept_message_page);
+ vp->vp_intercept_msg_page = page_to_virt(intercept_msg_page);
if (!mshv_partition_encrypted(partition))
vp->vp_register_page = page_to_virt(register_page);
@@ -993,21 +990,19 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
unmap_ghcb_page:
- if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
- hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
- HV_VP_STATE_PAGE_GHCB,
- input_vtl_normal);
- }
+ if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
+ hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
+ HV_VP_STATE_PAGE_GHCB, ghcb_page,
+ input_vtl_normal);
unmap_register_page:
- if (!mshv_partition_encrypted(partition)) {
- hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
- HV_VP_STATE_PAGE_REGISTERS,
- input_vtl_zero);
- }
+ if (!mshv_partition_encrypted(partition))
+ hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
+ HV_VP_STATE_PAGE_REGISTERS,
+ register_page, input_vtl_zero);
unmap_intercept_message_page:
- hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
- HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
- input_vtl_zero);
+ hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
+ HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
+ intercept_msg_page, input_vtl_zero);
destroy_vp:
hv_call_delete_vp(partition->pt_id, args.vp_index);
return ret;
@@ -1748,24 +1743,27 @@ static void destroy_partition(struct mshv_partition *partition)
mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
if (vp->vp_register_page) {
- (void)hv_call_unmap_vp_state_page(partition->pt_id,
- vp->vp_index,
- HV_VP_STATE_PAGE_REGISTERS,
- input_vtl_zero);
+ (void)hv_unmap_vp_state_page(partition->pt_id,
+ vp->vp_index,
+ HV_VP_STATE_PAGE_REGISTERS,
+ virt_to_page(vp->vp_register_page),
+ input_vtl_zero);
vp->vp_register_page = NULL;
}
- (void)hv_call_unmap_vp_state_page(partition->pt_id,
- vp->vp_index,
- HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
- input_vtl_zero);
+ (void)hv_unmap_vp_state_page(partition->pt_id,
+ vp->vp_index,
+ HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
+ virt_to_page(vp->vp_intercept_msg_page),
+ input_vtl_zero);
vp->vp_intercept_msg_page = NULL;
if (vp->vp_ghcb_page) {
- (void)hv_call_unmap_vp_state_page(partition->pt_id,
- vp->vp_index,
- HV_VP_STATE_PAGE_GHCB,
- input_vtl_normal);
+ (void)hv_unmap_vp_state_page(partition->pt_id,
+ vp->vp_index,
+ HV_VP_STATE_PAGE_GHCB,
+ virt_to_page(vp->vp_ghcb_page),
+ input_vtl_normal);
vp->vp_ghcb_page = NULL;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
` (3 preceding siblings ...)
2025-09-26 16:23 ` [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Nuno Das Neves
@ 2025-09-26 16:23 ` Nuno Das Neves
2025-09-29 17:57 ` Michael Kelley
2025-09-26 23:12 ` [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Stanislav Kinsburskii
5 siblings, 1 reply; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-26 16:23 UTC (permalink / raw)
To: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii
Cc: kys, haiyangz, wei.liu, decui, Jinank Jain, Nuno Das Neves
From: Jinank Jain <jinankjain@linux.microsoft.com>
Introduce HVCALL_MAP_STATS_PAGE2 which provides a map location (GPFN)
to map the stats to. This hypercall is required for L1VH partitions,
depending on the hypervisor version. This uses the same check as the
state page map location; mshv_use_overlay_gpfn().
Add mshv_map_vp_state_page() helpers to use this new hypercall or the
old one depending on availability.
For unmapping, the original HVCALL_UNMAP_STATS_PAGE works for both
cases.
Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
---
drivers/hv/mshv_root.h | 10 ++--
drivers/hv/mshv_root_hv_call.c | 93 ++++++++++++++++++++++++++++++++--
drivers/hv/mshv_root_main.c | 22 ++++----
include/hyperv/hvgdk_mini.h | 1 +
include/hyperv/hvhdk_mini.h | 7 +++
5 files changed, 113 insertions(+), 20 deletions(-)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index dbe2d1d0b22f..0dfccfbe6123 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -297,11 +297,11 @@ int hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
int hv_call_disconnect_port(u64 connection_partition_id,
union hv_connection_id connection_id);
int hv_call_notify_port_ring_empty(u32 sint_index);
-int hv_call_map_stat_page(enum hv_stats_object_type type,
- const union hv_stats_object_identity *identity,
- void **addr);
-int hv_call_unmap_stat_page(enum hv_stats_object_type type,
- const union hv_stats_object_identity *identity);
+int hv_map_stats_page(enum hv_stats_object_type type,
+ const union hv_stats_object_identity *identity,
+ void **addr);
+int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
+ const union hv_stats_object_identity *identity);
int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
u64 page_struct_count, u32 host_access,
u32 flags, u8 acquire);
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index 98c6278ff151..5a805b3dec0b 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -804,9 +804,51 @@ hv_call_notify_port_ring_empty(u32 sint_index)
return hv_result_to_errno(status);
}
-int hv_call_map_stat_page(enum hv_stats_object_type type,
- const union hv_stats_object_identity *identity,
- void **addr)
+static int hv_call_map_stats_page2(enum hv_stats_object_type type,
+ const union hv_stats_object_identity *identity,
+ u64 map_location)
+{
+ unsigned long flags;
+ struct hv_input_map_stats_page2 *input;
+ u64 status;
+ int ret;
+
+ if (!map_location || !mshv_use_overlay_gpfn())
+ return -EINVAL;
+
+ do {
+ local_irq_save(flags);
+ input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ memset(input, 0, sizeof(*input));
+ input->type = type;
+ input->identity = *identity;
+ input->map_location = map_location;
+
+ status = hv_do_hypercall(HVCALL_MAP_STATS_PAGE2, input, NULL);
+
+ local_irq_restore(flags);
+
+ ret = hv_result_to_errno(status);
+
+ if (!ret)
+ break;
+
+ if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+ hv_status_debug(status, "\n");
+ break;
+ }
+
+ ret = hv_call_deposit_pages(NUMA_NO_NODE,
+ hv_current_partition_id, 1);
+ } while (!ret);
+
+ return ret;
+}
+
+static int hv_call_map_stats_page(enum hv_stats_object_type type,
+ const union hv_stats_object_identity *identity,
+ void **addr)
{
unsigned long flags;
struct hv_input_map_stats_page *input;
@@ -845,8 +887,36 @@ int hv_call_map_stat_page(enum hv_stats_object_type type,
return ret;
}
-int hv_call_unmap_stat_page(enum hv_stats_object_type type,
- const union hv_stats_object_identity *identity)
+int hv_map_stats_page(enum hv_stats_object_type type,
+ const union hv_stats_object_identity *identity,
+ void **addr)
+{
+ int ret;
+ struct page *allocated_page = NULL;
+
+ if (!addr)
+ return -EINVAL;
+
+ if (mshv_use_overlay_gpfn()) {
+ allocated_page = alloc_page(GFP_KERNEL);
+ if (!allocated_page)
+ return -ENOMEM;
+
+ ret = hv_call_map_stats_page2(type, identity,
+ page_to_pfn(allocated_page));
+ *addr = page_address(allocated_page);
+ } else {
+ ret = hv_call_map_stats_page(type, identity, addr);
+ }
+
+ if (ret && allocated_page)
+ __free_page(allocated_page);
+
+ return ret;
+}
+
+static int hv_call_unmap_stats_page(enum hv_stats_object_type type,
+ const union hv_stats_object_identity *identity)
{
unsigned long flags;
struct hv_input_unmap_stats_page *input;
@@ -865,6 +935,19 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
return hv_result_to_errno(status);
}
+int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
+ const union hv_stats_object_identity *identity)
+{
+ int ret;
+
+ ret = hv_call_unmap_stats_page(type, identity);
+
+ if (mshv_use_overlay_gpfn() && page_addr)
+ __free_page(virt_to_page(page_addr));
+
+ return ret;
+}
+
int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
u64 page_struct_count, u32 host_access,
u32 flags, u8 acquire)
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 2d0ad17acde6..71a8ab5db3b8 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -841,7 +841,8 @@ mshv_vp_release(struct inode *inode, struct file *filp)
return 0;
}
-static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
+static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index,
+ void *stats_pages[])
{
union hv_stats_object_identity identity = {
.vp.partition_id = partition_id,
@@ -849,10 +850,10 @@ static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
};
identity.vp.stats_area_type = HV_STATS_AREA_SELF;
- hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
+ hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
- hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
+ hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
}
static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
@@ -865,14 +866,14 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
int err;
identity.vp.stats_area_type = HV_STATS_AREA_SELF;
- err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
- &stats_pages[HV_STATS_AREA_SELF]);
+ err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
+ &stats_pages[HV_STATS_AREA_SELF]);
if (err)
return err;
identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
- err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
- &stats_pages[HV_STATS_AREA_PARENT]);
+ err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
+ &stats_pages[HV_STATS_AREA_PARENT]);
if (err)
goto unmap_self;
@@ -880,7 +881,7 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
unmap_self:
identity.vp.stats_area_type = HV_STATS_AREA_SELF;
- hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
+ hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
return err;
}
@@ -988,7 +989,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
kfree(vp);
unmap_stats_pages:
if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
- mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
+ mshv_vp_stats_unmap(partition->pt_id, args.vp_index, stats_pages);
unmap_ghcb_page:
if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
@@ -1740,7 +1741,8 @@ static void destroy_partition(struct mshv_partition *partition)
continue;
if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
- mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
+ mshv_vp_stats_unmap(partition->pt_id, vp->vp_index,
+ (void **)vp->vp_stats_pages);
if (vp->vp_register_page) {
(void)hv_unmap_vp_state_page(partition->pt_id,
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index ff4325fb623a..f66565106d21 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -493,6 +493,7 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_ASSIST_PAGE */
#define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
#define HVCALL_MMIO_READ 0x0106
#define HVCALL_MMIO_WRITE 0x0107
+#define HVCALL_MAP_STATS_PAGE2 0x0131
/* HV_HYPERCALL_INPUT */
#define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index bf2ce27dfcc5..064bf735cab6 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -177,6 +177,13 @@ struct hv_input_map_stats_page {
union hv_stats_object_identity identity;
} __packed;
+struct hv_input_map_stats_page2 {
+ u32 type; /* enum hv_stats_object_type */
+ u32 padding;
+ union hv_stats_object_identity identity;
+ u64 map_location;
+} __packed;
+
struct hv_output_map_stats_page {
u64 map_location;
} __packed;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
` (4 preceding siblings ...)
2025-09-26 16:23 ` [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions Nuno Das Neves
@ 2025-09-26 23:12 ` Stanislav Kinsburskii
2025-09-29 18:19 ` Nuno Das Neves
5 siblings, 1 reply; 24+ messages in thread
From: Stanislav Kinsburskii @ 2025-09-26 23:12 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, kys, haiyangz, wei.liu, decui
On Fri, Sep 26, 2025 at 09:23:10AM -0700, Nuno Das Neves wrote:
> There are some differences in how L1VH partitions must map stats and vp
> state pages, some of which are due to differences across hypervisor
> versions. Detect and handle these cases.
>
I'm not sure that support for older and actully broken versions on
hypervisor need to be usptreamed, as these versions will go away sooner
or later and this support will become dead weight.
I think we should upstrem only the changes needed for the new versiong
of hypervisors instead and carry legacy support out of tree until it
becomes obsoleted.
Thanks,
Stanislav
> Patch 1:
> Fix for the logic of when to map the vp stats page for the root scheduler.
>
> Patch 2-3:
> Add HVCALL_GET_PARTITION_PROPERTY_EX and use it to query "vmm capabilities" on
> module init.
>
> Patches 4-5:
> Check a feature bit in vmm capabilities, to take a new code path for mapping
> stats and vp state pages. In this case, the stats and vp state pages must be
> allocated by Linux, and a new hypercall HVCALL_MAP_VP_STATS_PAGE2 must be used
> to map the stats page.
>
> ---
> v4:
> - Fixed some __packed attributes on unions [Stanislav]
> - Cleaned up mshv_init_vmm_caps() [Stanislav]
> - Cleaned up loop in hv_call_map_stats_page2() [Stanislav]
>
> v3:
> https://lore.kernel.org/linux-hyperv/1758066262-15477-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
> - Fix bug in patch 4, in mshv_partition_ioctl_create_vp() cleanup path
> [kernel test robot]
> - Make hv_unmap_vp_state_page() use struct page to match hv_map_vp_state_page()
> - Remove SELF == PARENT check which doesn't belong in patch 5 [Easwar]
>
> v2:
> https://lore.kernel.org/linux-hyperv/1757546089-2002-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
> - Remove patch falling back to SELF page if PARENT mapping fails [Easwar]
> (To be included in a future series)
> - Fix formatting of function definitions [Easwar]
> - Fix some wording in commit messages [Praveen]
> - Proceed with driver init even if getting vmm capabilities fails [Anirudh]
>
> v1:
> https://lore.kernel.org/linux-hyperv/1756428230-3599-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
>
> ---
> Jinank Jain (2):
> mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
> mshv: Introduce new hypercall to map stats page for L1VH partitions
>
> Nuno Das Neves (1):
> mshv: Only map vp->vp_stats_pages if on root scheduler
>
> Purna Pavan Chandra Aekkaladevi (2):
> mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
> mshv: Get the vmm capabilities offered by the hypervisor
>
> drivers/hv/mshv_root.h | 24 +++--
> drivers/hv/mshv_root_hv_call.c | 185 +++++++++++++++++++++++++++++++--
> drivers/hv/mshv_root_main.c | 127 ++++++++++++----------
> include/hyperv/hvgdk_mini.h | 2 +
> include/hyperv/hvhdk.h | 40 +++++++
> include/hyperv/hvhdk_mini.h | 33 ++++++
> 6 files changed, 337 insertions(+), 74 deletions(-)
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
2025-09-26 16:23 ` [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall Nuno Das Neves
@ 2025-09-29 5:31 ` Tianyu Lan
2025-09-29 17:55 ` Michael Kelley
1 sibling, 0 replies; 24+ messages in thread
From: Tianyu Lan @ 2025-09-29 5:31 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii, kys, haiyangz, wei.liu,
decui
On Sat, Sep 27, 2025 at 12:23 AM Nuno Das Neves
<nunodasneves@linux.microsoft.com> wrote:
>
> From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
>
> This hypercall can be used to fetch extended properties of a
> partition. Extended properties are properties with values larger than
> a u64. Some of these also need additional input arguments.
>
> Add helper function for using the hypercall in the mshv_root driver.
>
> Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> ---
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
> drivers/hv/mshv_root.h | 2 ++
> drivers/hv/mshv_root_hv_call.c | 31 ++++++++++++++++++++++++++
> include/hyperv/hvgdk_mini.h | 1 +
> include/hyperv/hvhdk.h | 40 ++++++++++++++++++++++++++++++++++
> include/hyperv/hvhdk_mini.h | 26 ++++++++++++++++++++++
> 5 files changed, 100 insertions(+)
>
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index e3931b0f1269..4aeb03bea6b6 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -303,6 +303,8 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> u64 page_struct_count, u32 host_access,
> u32 flags, u8 acquire);
> +int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
> + void *property_value, size_t property_value_sz);
>
> extern struct mshv_root mshv_root;
> extern enum hv_scheduler_type hv_scheduler_type;
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index c9c274f29c3c..3fd3cce23f69 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -590,6 +590,37 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> return hv_result_to_errno(status);
> }
>
> +int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
> + u64 arg, void *property_value,
> + size_t property_value_sz)
> +{
> + u64 status;
> + unsigned long flags;
> + struct hv_input_get_partition_property_ex *input;
> + struct hv_output_get_partition_property_ex *output;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + memset(input, 0, sizeof(*input));
> + input->partition_id = partition_id;
> + input->property_code = property_code;
> + input->arg = arg;
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_PROPERTY_EX, input, output);
> +
> + if (!hv_result_success(status)) {
> + hv_status_debug(status, "\n");
> + local_irq_restore(flags);
> + return hv_result_to_errno(status);
> + }
> + memcpy(property_value, &output->property_value, property_value_sz);
> +
> + local_irq_restore(flags);
> +
> + return 0;
> +}
> +
> int
> hv_call_clear_virtual_interrupt(u64 partition_id)
> {
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index 1be7f6a02304..ff4325fb623a 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -490,6 +490,7 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_ASSIST_PAGE */
> #define HVCALL_GET_VP_STATE 0x00e3
> #define HVCALL_SET_VP_STATE 0x00e4
> #define HVCALL_GET_VP_CPUID_VALUES 0x00f4
> +#define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
> #define HVCALL_MMIO_READ 0x0106
> #define HVCALL_MMIO_WRITE 0x0107
>
> diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
> index b4067ada02cf..416c0d45b793 100644
> --- a/include/hyperv/hvhdk.h
> +++ b/include/hyperv/hvhdk.h
> @@ -376,6 +376,46 @@ struct hv_input_set_partition_property {
> u64 property_value;
> } __packed;
>
> +union hv_partition_property_arg {
> + u64 as_uint64;
> + struct {
> + union {
> + u32 arg;
> + u32 vp_index;
> + };
> + u16 reserved0;
> + u8 reserved1;
> + u8 object_type;
> + } __packed;
> +};
> +
> +struct hv_input_get_partition_property_ex {
> + u64 partition_id;
> + u32 property_code; /* enum hv_partition_property_code */
> + u32 padding;
> + union {
> + union hv_partition_property_arg arg_data;
> + u64 arg;
> + };
> +} __packed;
> +
> +/*
> + * NOTE: Should use hv_input_set_partition_property_ex_header to compute this
> + * size, but hv_input_get_partition_property_ex is identical so it suffices
> + */
> +#define HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE \
> + (HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_partition_property_ex))
> +
> +union hv_partition_property_ex {
> + u8 buffer[HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE];
> + struct hv_partition_property_vmm_capabilities vmm_capabilities;
> + /* More fields to be filled in when needed */
> +};
> +
> +struct hv_output_get_partition_property_ex {
> + union hv_partition_property_ex property_value;
> +} __packed;
> +
> enum hv_vp_state_page_type {
> HV_VP_STATE_PAGE_REGISTERS = 0,
> HV_VP_STATE_PAGE_INTERCEPT_MESSAGE = 1,
> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
> index 858f6a3925b3..bf2ce27dfcc5 100644
> --- a/include/hyperv/hvhdk_mini.h
> +++ b/include/hyperv/hvhdk_mini.h
> @@ -96,8 +96,34 @@ enum hv_partition_property_code {
> HV_PARTITION_PROPERTY_XSAVE_STATES = 0x00060007,
> HV_PARTITION_PROPERTY_MAX_XSAVE_DATA_SIZE = 0x00060008,
> HV_PARTITION_PROPERTY_PROCESSOR_CLOCK_FREQUENCY = 0x00060009,
> +
> + /* Extended properties with larger property values */
> + HV_PARTITION_PROPERTY_VMM_CAPABILITIES = 0x00090007,
> };
>
> +#define HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT 1
> +#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT 59
> +
> +struct hv_partition_property_vmm_capabilities {
> + u16 bank_count;
> + u16 reserved[3];
> + union {
> + u64 as_uint64[HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT];
> + struct {
> + u64 map_gpa_preserve_adjustable: 1;
> + u64 vmm_can_provide_overlay_gpfn: 1;
> + u64 vp_affinity_property: 1;
> +#if IS_ENABLED(CONFIG_ARM64)
> + u64 vmm_can_provide_gic_overlay_locations: 1;
> +#else
> + u64 reservedbit3: 1;
> +#endif
> + u64 assignable_synthetic_proc_features: 1;
> + u64 reserved0: HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT;
> + } __packed;
> + };
> +} __packed;
> +
> enum hv_snp_status {
> HV_SNP_STATUS_NONE = 0,
> HV_SNP_STATUS_AVAILABLE = 1,
> --
> 2.34.1
>
>
--
Thanks
Tianyu Lan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-09-26 16:23 ` [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Nuno Das Neves
@ 2025-09-29 5:31 ` Tianyu Lan
2025-09-29 17:56 ` Michael Kelley
1 sibling, 0 replies; 24+ messages in thread
From: Tianyu Lan @ 2025-09-29 5:31 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, skinsburskii, kys, haiyangz, wei.liu,
decui, Jinank Jain
On Sat, Sep 27, 2025 at 12:48 AM Nuno Das Neves
<nunodasneves@linux.microsoft.com> wrote:
>
> From: Jinank Jain <jinankjain@linux.microsoft.com>
>
> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
> allocated and passed to the hypervisor to map VP state pages. This is
> only needed on L1VH, and only on some (newer) versions of the
> hypervisor, hence the need to check vmm_capabilities.
>
> Introduce functions hv_map/unmap_vp_state_page() to handle the
> allocation and freeing.
>
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
> ---
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
> drivers/hv/mshv_root.h | 11 ++---
> drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
> drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
> 3 files changed, 98 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 0cb1e2589fe1..dbe2d1d0b22f 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> /* Choose between pages and bytes */
> struct hv_vp_state_data state_data, u64 page_count,
> struct page **pages, u32 num_bytes, u8 *bytes);
> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl,
> - struct page **state_page);
> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl);
> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl,
> + struct page **state_page);
> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + struct page *state_page,
> + union hv_input_vtl input_vtl);
> int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
> u64 connection_partition_id, struct hv_port_info *port_info,
> u8 port_vtl, u8 min_connection_vtl, int node);
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index 3fd3cce23f69..98c6278ff151 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> return ret;
> }
>
> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl,
> - struct page **state_page)
> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl,
> + struct page **state_page)
> {
> struct hv_input_map_vp_state_page *input;
> struct hv_output_map_vp_state_page *output;
> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> input->type = type;
> input->input_vtl = input_vtl;
>
> - status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
> + if (*state_page) {
> + input->flags.map_location_provided = 1;
> + input->requested_map_location =
> + page_to_pfn(*state_page);
> + }
> +
> + status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
> + output);
>
> if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
> if (hv_result_success(status))
> @@ -565,8 +572,39 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> return ret;
> }
>
> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl)
> +static bool mshv_use_overlay_gpfn(void)
> +{
> + return hv_l1vh_partition() &&
> + mshv_root.vmm_caps.vmm_can_provide_overlay_gpfn;
> +}
> +
> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl,
> + struct page **state_page)
> +{
> + int ret = 0;
> + struct page *allocated_page = NULL;
> +
> + if (mshv_use_overlay_gpfn()) {
> + allocated_page = alloc_page(GFP_KERNEL);
> + if (!allocated_page)
> + return -ENOMEM;
> + *state_page = allocated_page;
> + } else {
> + *state_page = NULL;
> + }
> +
> + ret = hv_call_map_vp_state_page(partition_id, vp_index, type, input_vtl,
> + state_page);
> +
> + if (ret && allocated_page)
> + __free_page(allocated_page);
> +
> + return ret;
> +}
> +
> +static int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl)
> {
> unsigned long flags;
> u64 status;
> @@ -590,6 +628,17 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> return hv_result_to_errno(status);
> }
>
> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + struct page *state_page, union hv_input_vtl input_vtl)
> +{
> + int ret = hv_call_unmap_vp_state_page(partition_id, vp_index, type, input_vtl);
> +
> + if (mshv_use_overlay_gpfn() && state_page)
> + __free_page(state_page);
> +
> + return ret;
> +}
> +
> int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
> u64 arg, void *property_value,
> size_t property_value_sz)
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index e199770ecdfa..2d0ad17acde6 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -890,7 +890,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> {
> struct mshv_create_vp args;
> struct mshv_vp *vp;
> - struct page *intercept_message_page, *register_page, *ghcb_page;
> + struct page *intercept_msg_page, *register_page, *ghcb_page;
> void *stats_pages[2];
> long ret;
>
> @@ -908,28 +908,25 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> if (ret)
> return ret;
>
> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> - input_vtl_zero,
> - &intercept_message_page);
> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> + input_vtl_zero, &intercept_msg_page);
> if (ret)
> goto destroy_vp;
>
> if (!mshv_partition_encrypted(partition)) {
> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_REGISTERS,
> - input_vtl_zero,
> - ®ister_page);
> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_REGISTERS,
> + input_vtl_zero, ®ister_page);
> if (ret)
> goto unmap_intercept_message_page;
> }
>
> if (mshv_partition_encrypted(partition) &&
> is_ghcb_mapping_available()) {
> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_GHCB,
> - input_vtl_normal,
> - &ghcb_page);
> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_GHCB,
> + input_vtl_normal, &ghcb_page);
> if (ret)
> goto unmap_register_page;
> }
> @@ -960,7 +957,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> atomic64_set(&vp->run.vp_signaled_count, 0);
>
> vp->vp_index = args.vp_index;
> - vp->vp_intercept_msg_page = page_to_virt(intercept_message_page);
> + vp->vp_intercept_msg_page = page_to_virt(intercept_msg_page);
> if (!mshv_partition_encrypted(partition))
> vp->vp_register_page = page_to_virt(register_page);
>
> @@ -993,21 +990,19 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
> unmap_ghcb_page:
> - if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_GHCB,
> - input_vtl_normal);
> - }
> + if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_GHCB, ghcb_page,
> + input_vtl_normal);
> unmap_register_page:
> - if (!mshv_partition_encrypted(partition)) {
> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_REGISTERS,
> - input_vtl_zero);
> - }
> + if (!mshv_partition_encrypted(partition))
> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_REGISTERS,
> + register_page, input_vtl_zero);
> unmap_intercept_message_page:
> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> - input_vtl_zero);
> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> + intercept_msg_page, input_vtl_zero);
> destroy_vp:
> hv_call_delete_vp(partition->pt_id, args.vp_index);
> return ret;
> @@ -1748,24 +1743,27 @@ static void destroy_partition(struct mshv_partition *partition)
> mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
>
> if (vp->vp_register_page) {
> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> - vp->vp_index,
> - HV_VP_STATE_PAGE_REGISTERS,
> - input_vtl_zero);
> + (void)hv_unmap_vp_state_page(partition->pt_id,
> + vp->vp_index,
> + HV_VP_STATE_PAGE_REGISTERS,
> + virt_to_page(vp->vp_register_page),
> + input_vtl_zero);
> vp->vp_register_page = NULL;
> }
>
> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> - vp->vp_index,
> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> - input_vtl_zero);
> + (void)hv_unmap_vp_state_page(partition->pt_id,
> + vp->vp_index,
> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> + virt_to_page(vp->vp_intercept_msg_page),
> + input_vtl_zero);
> vp->vp_intercept_msg_page = NULL;
>
> if (vp->vp_ghcb_page) {
> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> - vp->vp_index,
> - HV_VP_STATE_PAGE_GHCB,
> - input_vtl_normal);
> + (void)hv_unmap_vp_state_page(partition->pt_id,
> + vp->vp_index,
> + HV_VP_STATE_PAGE_GHCB,
> + virt_to_page(vp->vp_ghcb_page),
> + input_vtl_normal);
> vp->vp_ghcb_page = NULL;
> }
>
> --
> 2.34.1
>
>
--
Thanks
Tianyu Lan
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
2025-09-26 16:23 ` [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall Nuno Das Neves
2025-09-29 5:31 ` Tianyu Lan
@ 2025-09-29 17:55 ` Michael Kelley
2025-10-01 22:32 ` Nuno Das Neves
1 sibling, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-09-29 17:55 UTC (permalink / raw)
To: Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>
> From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
>
> This hypercall can be used to fetch extended properties of a
> partition. Extended properties are properties with values larger than
> a u64. Some of these also need additional input arguments.
>
> Add helper function for using the hypercall in the mshv_root driver.
>
> Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> ---
> drivers/hv/mshv_root.h | 2 ++
> drivers/hv/mshv_root_hv_call.c | 31 ++++++++++++++++++++++++++
> include/hyperv/hvgdk_mini.h | 1 +
> include/hyperv/hvhdk.h | 40 ++++++++++++++++++++++++++++++++++
> include/hyperv/hvhdk_mini.h | 26 ++++++++++++++++++++++
> 5 files changed, 100 insertions(+)
>
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index e3931b0f1269..4aeb03bea6b6 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -303,6 +303,8 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> u64 page_struct_count, u32 host_access,
> u32 flags, u8 acquire);
> +int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
> + void *property_value, size_t property_value_sz);
>
> extern struct mshv_root mshv_root;
> extern enum hv_scheduler_type hv_scheduler_type;
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index c9c274f29c3c..3fd3cce23f69 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -590,6 +590,37 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> return hv_result_to_errno(status);
> }
>
> +int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
> + u64 arg, void *property_value,
> + size_t property_value_sz)
> +{
> + u64 status;
> + unsigned long flags;
> + struct hv_input_get_partition_property_ex *input;
> + struct hv_output_get_partition_property_ex *output;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + memset(input, 0, sizeof(*input));
> + input->partition_id = partition_id;
> + input->property_code = property_code;
> + input->arg = arg;
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_PROPERTY_EX, input, output);
> +
> + if (!hv_result_success(status)) {
> + hv_status_debug(status, "\n");
> + local_irq_restore(flags);
Nit: It would be marginally better to do the local_irq_restore() first, and then
output the error message so that interrupts are not disabled any longer than
needed.
> + return hv_result_to_errno(status);
> + }
> + memcpy(property_value, &output->property_value, property_value_sz);
> +
> + local_irq_restore(flags);
> +
> + return 0;
> +}
> +
> int
> hv_call_clear_virtual_interrupt(u64 partition_id)
> {
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index 1be7f6a02304..ff4325fb623a 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -490,6 +490,7 @@ union hv_vp_assist_msr_contents { /*
> HV_REGISTER_VP_ASSIST_PAGE */
> #define HVCALL_GET_VP_STATE 0x00e3
> #define HVCALL_SET_VP_STATE 0x00e4
> #define HVCALL_GET_VP_CPUID_VALUES 0x00f4
> +#define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
> #define HVCALL_MMIO_READ 0x0106
> #define HVCALL_MMIO_WRITE 0x0107
>
> diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
> index b4067ada02cf..416c0d45b793 100644
> --- a/include/hyperv/hvhdk.h
> +++ b/include/hyperv/hvhdk.h
> @@ -376,6 +376,46 @@ struct hv_input_set_partition_property {
> u64 property_value;
> } __packed;
>
> +union hv_partition_property_arg {
> + u64 as_uint64;
> + struct {
> + union {
> + u32 arg;
> + u32 vp_index;
> + };
> + u16 reserved0;
> + u8 reserved1;
> + u8 object_type;
> + } __packed;
> +};
> +
> +struct hv_input_get_partition_property_ex {
> + u64 partition_id;
> + u32 property_code; /* enum hv_partition_property_code */
> + u32 padding;
> + union {
> + union hv_partition_property_arg arg_data;
> + u64 arg;
This union, and the "u64 arg" member, seems redundant since
union hv_partition_property_arg already has a member "as_uint64".
Maybe this is just being copied from the Windows versions of these
structures, in which case I realize there are constraints on what you
want to change or fix, and you can ignore my comment.
> + };
> +} __packed;
> +
> +/*
> + * NOTE: Should use hv_input_set_partition_property_ex_header to compute this
> + * size, but hv_input_get_partition_property_ex is identical so it suffices
> + */
> +#define HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE \
> + (HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_partition_property_ex))
> +
> +union hv_partition_property_ex {
> + u8 buffer[HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE];
It's unclear what this "buffer" field is trying to do, and particularly its size.
The comment above references hv_input_set_partition_property_ex_header,
which doesn't exist in the Linux code. And why is the max size of the output
buffer reduced by the size of the header of the input to "set partition property"?
> + struct hv_partition_property_vmm_capabilities vmm_capabilities;
> + /* More fields to be filled in when needed */
> +};
> +
> +struct hv_output_get_partition_property_ex {
> + union hv_partition_property_ex property_value;
> +} __packed;
> +
> enum hv_vp_state_page_type {
> HV_VP_STATE_PAGE_REGISTERS = 0,
> HV_VP_STATE_PAGE_INTERCEPT_MESSAGE = 1,
> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
> index 858f6a3925b3..bf2ce27dfcc5 100644
> --- a/include/hyperv/hvhdk_mini.h
> +++ b/include/hyperv/hvhdk_mini.h
> @@ -96,8 +96,34 @@ enum hv_partition_property_code {
> HV_PARTITION_PROPERTY_XSAVE_STATES = 0x00060007,
> HV_PARTITION_PROPERTY_MAX_XSAVE_DATA_SIZE = 0x00060008,
> HV_PARTITION_PROPERTY_PROCESSOR_CLOCK_FREQUENCY = 0x00060009,
> +
> + /* Extended properties with larger property values */
> + HV_PARTITION_PROPERTY_VMM_CAPABILITIES = 0x00090007,
> };
>
> +#define HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT 1
> +#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT 59
> +
> +struct hv_partition_property_vmm_capabilities {
> + u16 bank_count;
> + u16 reserved[3];
> + union {
> + u64 as_uint64[HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT];
> + struct {
> + u64 map_gpa_preserve_adjustable: 1;
> + u64 vmm_can_provide_overlay_gpfn: 1;
> + u64 vp_affinity_property: 1;
> +#if IS_ENABLED(CONFIG_ARM64)
> + u64 vmm_can_provide_gic_overlay_locations: 1;
> +#else
> + u64 reservedbit3: 1;
> +#endif
> + u64 assignable_synthetic_proc_features: 1;
> + u64 reserved0: HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT;
> + } __packed;
> + };
> +} __packed;
> +
> enum hv_snp_status {
> HV_SNP_STATUS_NONE = 0,
> HV_SNP_STATUS_AVAILABLE = 1,
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-09-26 16:23 ` [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Nuno Das Neves
2025-09-29 5:31 ` Tianyu Lan
@ 2025-09-29 17:56 ` Michael Kelley
2025-10-01 22:56 ` Nuno Das Neves
1 sibling, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-09-29 17:56 UTC (permalink / raw)
To: Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>
> From: Jinank Jain <jinankjain@linux.microsoft.com>
>
> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
> allocated and passed to the hypervisor to map VP state pages. This is
> only needed on L1VH, and only on some (newer) versions of the
> hypervisor, hence the need to check vmm_capabilities.
>
> Introduce functions hv_map/unmap_vp_state_page() to handle the
> allocation and freeing.
>
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
> ---
> drivers/hv/mshv_root.h | 11 ++---
> drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
> drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
> 3 files changed, 98 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index 0cb1e2589fe1..dbe2d1d0b22f 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> /* Choose between pages and bytes */
> struct hv_vp_state_data state_data, u64 page_count,
> struct page **pages, u32 num_bytes, u8 *bytes);
> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl,
> - struct page **state_page);
> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl);
> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl,
> + struct page **state_page);
> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + struct page *state_page,
> + union hv_input_vtl input_vtl);
> int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
> u64 connection_partition_id, struct hv_port_info *port_info,
> u8 port_vtl, u8 min_connection_vtl, int node);
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index 3fd3cce23f69..98c6278ff151 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> return ret;
> }
>
> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl,
> - struct page **state_page)
> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl,
> + struct page **state_page)
> {
> struct hv_input_map_vp_state_page *input;
> struct hv_output_map_vp_state_page *output;
> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> input->type = type;
> input->input_vtl = input_vtl;
>
> - status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
This function must zero the input area before using it. Otherwise,
flags.map_location_provided is uninitialized when *state_page is NULL. It will
have whatever value was left by the previous user of hyperv_pcpu_input_arg,
potentially producing bizarre results. And there's a reserved field that won't be
set to zero.
> + if (*state_page) {
> + input->flags.map_location_provided = 1;
> + input->requested_map_location =
> + page_to_pfn(*state_page);
Technically, this should be page_to_hvpfn() since the PFN value is being sent to
Hyper-V. I know root (and L1VH?) partitions must run with the same page size
as the Hyper-V host, but it's better to not leave code buried here that will blow
up if the "same page size requirement" should ever change.
And after making the hypercall, there's an invocation of pfn_to_page(), which
should account for the same. Unfortunately, there's not an existing hvpfn_to_page()
function.
> + }
> +
> + status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
> + output);
>
> if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
> if (hv_result_success(status))
> @@ -565,8 +572,39 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> return ret;
> }
>
> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> - union hv_input_vtl input_vtl)
> +static bool mshv_use_overlay_gpfn(void)
> +{
> + return hv_l1vh_partition() &&
> + mshv_root.vmm_caps.vmm_can_provide_overlay_gpfn;
> +}
> +
> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl,
> + struct page **state_page)
> +{
> + int ret = 0;
> + struct page *allocated_page = NULL;
> +
> + if (mshv_use_overlay_gpfn()) {
> + allocated_page = alloc_page(GFP_KERNEL);
> + if (!allocated_page)
> + return -ENOMEM;
> + *state_page = allocated_page;
> + } else {
> + *state_page = NULL;
> + }
> +
> + ret = hv_call_map_vp_state_page(partition_id, vp_index, type, input_vtl,
> + state_page);
> +
> + if (ret && allocated_page)
> + __free_page(allocated_page);
For robustness, you might want to set *state_page = NULL here so the
caller doesn't have a reference to the page that has been freed. I didn't
see any cases where the caller incorrectly checks the returned
*state_page value after an error, so the current code isn't broken.
> +
> + return ret;
> +}
> +
> +static int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + union hv_input_vtl input_vtl)
> {
> unsigned long flags;
> u64 status;
> @@ -590,6 +628,17 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> return hv_result_to_errno(status);
> }
>
> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> + struct page *state_page, union hv_input_vtl input_vtl)
> +{
> + int ret = hv_call_unmap_vp_state_page(partition_id, vp_index, type, input_vtl);
> +
> + if (mshv_use_overlay_gpfn() && state_page)
> + __free_page(state_page);
> +
> + return ret;
> +}
> +
> int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
> u64 arg, void *property_value,
> size_t property_value_sz)
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index e199770ecdfa..2d0ad17acde6 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -890,7 +890,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition
> *partition,
> {
> struct mshv_create_vp args;
> struct mshv_vp *vp;
> - struct page *intercept_message_page, *register_page, *ghcb_page;
> + struct page *intercept_msg_page, *register_page, *ghcb_page;
> void *stats_pages[2];
> long ret;
>
> @@ -908,28 +908,25 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> if (ret)
> return ret;
>
> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> - input_vtl_zero,
> - &intercept_message_page);
> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> + input_vtl_zero, &intercept_msg_page);
> if (ret)
> goto destroy_vp;
>
> if (!mshv_partition_encrypted(partition)) {
> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_REGISTERS,
> - input_vtl_zero,
> - ®ister_page);
> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_REGISTERS,
> + input_vtl_zero, ®ister_page);
> if (ret)
> goto unmap_intercept_message_page;
> }
>
> if (mshv_partition_encrypted(partition) &&
> is_ghcb_mapping_available()) {
> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_GHCB,
> - input_vtl_normal,
> - &ghcb_page);
> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_GHCB,
> + input_vtl_normal, &ghcb_page);
> if (ret)
> goto unmap_register_page;
> }
> @@ -960,7 +957,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> atomic64_set(&vp->run.vp_signaled_count, 0);
>
> vp->vp_index = args.vp_index;
> - vp->vp_intercept_msg_page = page_to_virt(intercept_message_page);
> + vp->vp_intercept_msg_page = page_to_virt(intercept_msg_page);
> if (!mshv_partition_encrypted(partition))
> vp->vp_register_page = page_to_virt(register_page);
>
> @@ -993,21 +990,19 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
> unmap_ghcb_page:
> - if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_GHCB,
> - input_vtl_normal);
> - }
> + if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_GHCB, ghcb_page,
> + input_vtl_normal);
> unmap_register_page:
> - if (!mshv_partition_encrypted(partition)) {
> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_REGISTERS,
> - input_vtl_zero);
> - }
> + if (!mshv_partition_encrypted(partition))
> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_REGISTERS,
> + register_page, input_vtl_zero);
> unmap_intercept_message_page:
> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> - input_vtl_zero);
> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> + intercept_msg_page, input_vtl_zero);
> destroy_vp:
> hv_call_delete_vp(partition->pt_id, args.vp_index);
> return ret;
> @@ -1748,24 +1743,27 @@ static void destroy_partition(struct mshv_partition *partition)
> mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
>
> if (vp->vp_register_page) {
> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> - vp->vp_index,
> - HV_VP_STATE_PAGE_REGISTERS,
> - input_vtl_zero);
> + (void)hv_unmap_vp_state_page(partition->pt_id,
> + vp->vp_index,
> + HV_VP_STATE_PAGE_REGISTERS,
> + virt_to_page(vp->vp_register_page),
> + input_vtl_zero);
> vp->vp_register_page = NULL;
> }
>
> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> - vp->vp_index,
> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> - input_vtl_zero);
> + (void)hv_unmap_vp_state_page(partition->pt_id,
> + vp->vp_index,
> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> + virt_to_page(vp->vp_intercept_msg_page),
> + input_vtl_zero);
> vp->vp_intercept_msg_page = NULL;
>
> if (vp->vp_ghcb_page) {
> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> - vp->vp_index,
> - HV_VP_STATE_PAGE_GHCB,
> - input_vtl_normal);
> + (void)hv_unmap_vp_state_page(partition->pt_id,
> + vp->vp_index,
> + HV_VP_STATE_PAGE_GHCB,
> + virt_to_page(vp->vp_ghcb_page),
> + input_vtl_normal);
> vp->vp_ghcb_page = NULL;
> }
>
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions
2025-09-26 16:23 ` [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions Nuno Das Neves
@ 2025-09-29 17:57 ` Michael Kelley
2025-10-01 23:03 ` Nuno Das Neves
0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-09-29 17:57 UTC (permalink / raw)
To: Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>
> From: Jinank Jain <jinankjain@linux.microsoft.com>
>
> Introduce HVCALL_MAP_STATS_PAGE2 which provides a map location (GPFN)
> to map the stats to. This hypercall is required for L1VH partitions,
> depending on the hypervisor version. This uses the same check as the
> state page map location; mshv_use_overlay_gpfn().
>
> Add mshv_map_vp_state_page() helpers to use this new hypercall or the
> old one depending on availability.
>
> For unmapping, the original HVCALL_UNMAP_STATS_PAGE works for both
> cases.
>
> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> ---
> drivers/hv/mshv_root.h | 10 ++--
> drivers/hv/mshv_root_hv_call.c | 93 ++++++++++++++++++++++++++++++++--
> drivers/hv/mshv_root_main.c | 22 ++++----
> include/hyperv/hvgdk_mini.h | 1 +
> include/hyperv/hvhdk_mini.h | 7 +++
> 5 files changed, 113 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index dbe2d1d0b22f..0dfccfbe6123 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -297,11 +297,11 @@ int hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
> int hv_call_disconnect_port(u64 connection_partition_id,
> union hv_connection_id connection_id);
> int hv_call_notify_port_ring_empty(u32 sint_index);
> -int hv_call_map_stat_page(enum hv_stats_object_type type,
> - const union hv_stats_object_identity *identity,
> - void **addr);
> -int hv_call_unmap_stat_page(enum hv_stats_object_type type,
> - const union hv_stats_object_identity *identity);
> +int hv_map_stats_page(enum hv_stats_object_type type,
> + const union hv_stats_object_identity *identity,
> + void **addr);
> +int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
> + const union hv_stats_object_identity *identity);
> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> u64 page_struct_count, u32 host_access,
> u32 flags, u8 acquire);
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index 98c6278ff151..5a805b3dec0b 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -804,9 +804,51 @@ hv_call_notify_port_ring_empty(u32 sint_index)
> return hv_result_to_errno(status);
> }
>
> -int hv_call_map_stat_page(enum hv_stats_object_type type,
> - const union hv_stats_object_identity *identity,
> - void **addr)
> +static int hv_call_map_stats_page2(enum hv_stats_object_type type,
> + const union hv_stats_object_identity *identity,
> + u64 map_location)
> +{
> + unsigned long flags;
> + struct hv_input_map_stats_page2 *input;
> + u64 status;
> + int ret;
> +
> + if (!map_location || !mshv_use_overlay_gpfn())
> + return -EINVAL;
> +
> + do {
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + memset(input, 0, sizeof(*input));
> + input->type = type;
> + input->identity = *identity;
> + input->map_location = map_location;
> +
> + status = hv_do_hypercall(HVCALL_MAP_STATS_PAGE2, input, NULL);
> +
> + local_irq_restore(flags);
> +
> + ret = hv_result_to_errno(status);
> +
> + if (!ret)
> + break;
This logic is incorrect. If the hypercall returns
HV_STATUS_INSUFFICIENT_MEMORY, then errno -ENOMEM is immediately
returned to the caller without any opportunity to do hv_call_deposit_pages().
> +
> + if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
> + hv_status_debug(status, "\n");
> + break;
> + }
> +
> + ret = hv_call_deposit_pages(NUMA_NO_NODE,
> + hv_current_partition_id, 1);
> + } while (!ret);
> +
> + return ret;
> +}
> +
> +static int hv_call_map_stats_page(enum hv_stats_object_type type,
> + const union hv_stats_object_identity *identity,
> + void **addr)
> {
> unsigned long flags;
> struct hv_input_map_stats_page *input;
> @@ -845,8 +887,36 @@ int hv_call_map_stat_page(enum hv_stats_object_type type,
> return ret;
> }
>
> -int hv_call_unmap_stat_page(enum hv_stats_object_type type,
> - const union hv_stats_object_identity *identity)
> +int hv_map_stats_page(enum hv_stats_object_type type,
> + const union hv_stats_object_identity *identity,
> + void **addr)
> +{
> + int ret;
> + struct page *allocated_page = NULL;
> +
> + if (!addr)
> + return -EINVAL;
> +
> + if (mshv_use_overlay_gpfn()) {
> + allocated_page = alloc_page(GFP_KERNEL);
> + if (!allocated_page)
> + return -ENOMEM;
> +
> + ret = hv_call_map_stats_page2(type, identity,
> + page_to_pfn(allocated_page));
This should use page_to_hvpfn() per my comments in Patch 4 of this series.
> + *addr = page_address(allocated_page);
> + } else {
> + ret = hv_call_map_stats_page(type, identity, addr);
> + }
> +
> + if (ret && allocated_page)
> + __free_page(allocated_page);
Might want to do *addr = NULL after freeing the page so that the caller
can't erroneously reference the free page. Again, the current caller doesn't
do that, so current code isn't broken.
> +
> + return ret;
> +}
> +
> +static int hv_call_unmap_stats_page(enum hv_stats_object_type type,
> + const union hv_stats_object_identity *identity)
> {
> unsigned long flags;
> struct hv_input_unmap_stats_page *input;
> @@ -865,6 +935,19 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
> return hv_result_to_errno(status);
> }
>
> +int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
> + const union hv_stats_object_identity *identity)
> +{
> + int ret;
> +
> + ret = hv_call_unmap_stats_page(type, identity);
> +
> + if (mshv_use_overlay_gpfn() && page_addr)
> + __free_page(virt_to_page(page_addr));
> +
> + return ret;
> +}
> +
> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> u64 page_struct_count, u32 host_access,
> u32 flags, u8 acquire)
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index 2d0ad17acde6..71a8ab5db3b8 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -841,7 +841,8 @@ mshv_vp_release(struct inode *inode, struct file *filp)
> return 0;
> }
>
> -static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
> +static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index,
> + void *stats_pages[])
> {
> union hv_stats_object_identity identity = {
> .vp.partition_id = partition_id,
> @@ -849,10 +850,10 @@ static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
> };
>
> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
>
> identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
> }
>
> static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
> @@ -865,14 +866,14 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
> int err;
>
> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
> - err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
> - &stats_pages[HV_STATS_AREA_SELF]);
> + err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
> + &stats_pages[HV_STATS_AREA_SELF]);
> if (err)
> return err;
>
> identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
> - err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
> - &stats_pages[HV_STATS_AREA_PARENT]);
> + err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
> + &stats_pages[HV_STATS_AREA_PARENT]);
> if (err)
> goto unmap_self;
>
> @@ -880,7 +881,7 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
>
> unmap_self:
> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
> return err;
> }
>
> @@ -988,7 +989,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> kfree(vp);
> unmap_stats_pages:
> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> - mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
> + mshv_vp_stats_unmap(partition->pt_id, args.vp_index, stats_pages);
> unmap_ghcb_page:
> if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
> hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> @@ -1740,7 +1741,8 @@ static void destroy_partition(struct mshv_partition *partition)
> continue;
>
> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> - mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
> + mshv_vp_stats_unmap(partition->pt_id, vp->vp_index,
> + (void **)vp->vp_stats_pages);
>
> if (vp->vp_register_page) {
> (void)hv_unmap_vp_state_page(partition->pt_id,
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index ff4325fb623a..f66565106d21 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -493,6 +493,7 @@ union hv_vp_assist_msr_contents { /*
> HV_REGISTER_VP_ASSIST_PAGE */
> #define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
> #define HVCALL_MMIO_READ 0x0106
> #define HVCALL_MMIO_WRITE 0x0107
> +#define HVCALL_MAP_STATS_PAGE2 0x0131
>
> /* HV_HYPERCALL_INPUT */
> #define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
> index bf2ce27dfcc5..064bf735cab6 100644
> --- a/include/hyperv/hvhdk_mini.h
> +++ b/include/hyperv/hvhdk_mini.h
> @@ -177,6 +177,13 @@ struct hv_input_map_stats_page {
> union hv_stats_object_identity identity;
> } __packed;
>
> +struct hv_input_map_stats_page2 {
> + u32 type; /* enum hv_stats_object_type */
> + u32 padding;
> + union hv_stats_object_identity identity;
> + u64 map_location;
> +} __packed;
> +
> struct hv_output_map_stats_page {
> u64 map_location;
> } __packed;
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings
2025-09-26 23:12 ` [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Stanislav Kinsburskii
@ 2025-09-29 18:19 ` Nuno Das Neves
2025-09-30 23:14 ` Wei Liu
2025-10-03 16:31 ` Stanislav Kinsburskii
0 siblings, 2 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-09-29 18:19 UTC (permalink / raw)
To: Stanislav Kinsburskii
Cc: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, kys, haiyangz, wei.liu, decui
On 9/26/2025 4:12 PM, Stanislav Kinsburskii wrote:
> On Fri, Sep 26, 2025 at 09:23:10AM -0700, Nuno Das Neves wrote:
>> There are some differences in how L1VH partitions must map stats and vp
>> state pages, some of which are due to differences across hypervisor
>> versions. Detect and handle these cases.
>>
>
> I'm not sure that support for older and actully broken versions on
> hypervisor need to be usptreamed, as these versions will go away sooner
> or later and this support will become dead weight.
>
As far as I know, these changes are relevant for shipped versions of the
hypervisor - they are not 'broken' except in some very specific cases
(live migration on L1VH, I think?)
The hypervisor team added a feature bit for these changes so that both old
and new versions of these APIs can be supported.
> I think we should upstrem only the changes needed for the new versiong
> of hypervisors instead and carry legacy support out of tree until it
> becomes obsoleted.
>
Which version do you suggest to be the cutoff?
I'd prefer to support as many versions of the hypervisor as we can, as
long as they are at all relevant. We can remove the support later.
Removing prematurely just creates friction. Inevitably some users will
find themselves running on an older hypervisor and then it just fails
with a cryptic error. This includes myself, since I test L1VH on Azure
which typically has older hypervisor versions.
Nuno
> Thanks,
> Stanislav
>
>
>> Patch 1:
>> Fix for the logic of when to map the vp stats page for the root scheduler.
>>
>> Patch 2-3:
>> Add HVCALL_GET_PARTITION_PROPERTY_EX and use it to query "vmm capabilities" on
>> module init.
>>
>> Patches 4-5:
>> Check a feature bit in vmm capabilities, to take a new code path for mapping
>> stats and vp state pages. In this case, the stats and vp state pages must be
>> allocated by Linux, and a new hypercall HVCALL_MAP_VP_STATS_PAGE2 must be used
>> to map the stats page.
>>
>> ---
>> v4:
>> - Fixed some __packed attributes on unions [Stanislav]
>> - Cleaned up mshv_init_vmm_caps() [Stanislav]
>> - Cleaned up loop in hv_call_map_stats_page2() [Stanislav]
>>
>> v3:
>> https://lore.kernel.org/linux-hyperv/1758066262-15477-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
>> - Fix bug in patch 4, in mshv_partition_ioctl_create_vp() cleanup path
>> [kernel test robot]
>> - Make hv_unmap_vp_state_page() use struct page to match hv_map_vp_state_page()
>> - Remove SELF == PARENT check which doesn't belong in patch 5 [Easwar]
>>
>> v2:
>> https://lore.kernel.org/linux-hyperv/1757546089-2002-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
>> - Remove patch falling back to SELF page if PARENT mapping fails [Easwar]
>> (To be included in a future series)
>> - Fix formatting of function definitions [Easwar]
>> - Fix some wording in commit messages [Praveen]
>> - Proceed with driver init even if getting vmm capabilities fails [Anirudh]
>>
>> v1:
>> https://lore.kernel.org/linux-hyperv/1756428230-3599-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
>>
>> ---
>> Jinank Jain (2):
>> mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
>> mshv: Introduce new hypercall to map stats page for L1VH partitions
>>
>> Nuno Das Neves (1):
>> mshv: Only map vp->vp_stats_pages if on root scheduler
>>
>> Purna Pavan Chandra Aekkaladevi (2):
>> mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
>> mshv: Get the vmm capabilities offered by the hypervisor
>>
>> drivers/hv/mshv_root.h | 24 +++--
>> drivers/hv/mshv_root_hv_call.c | 185 +++++++++++++++++++++++++++++++--
>> drivers/hv/mshv_root_main.c | 127 ++++++++++++----------
>> include/hyperv/hvgdk_mini.h | 2 +
>> include/hyperv/hvhdk.h | 40 +++++++
>> include/hyperv/hvhdk_mini.h | 33 ++++++
>> 6 files changed, 337 insertions(+), 74 deletions(-)
>>
>> --
>> 2.34.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings
2025-09-29 18:19 ` Nuno Das Neves
@ 2025-09-30 23:14 ` Wei Liu
2025-10-03 16:31 ` Stanislav Kinsburskii
1 sibling, 0 replies; 24+ messages in thread
From: Wei Liu @ 2025-09-30 23:14 UTC (permalink / raw)
To: Nuno Das Neves
Cc: Stanislav Kinsburskii, linux-hyperv, linux-kernel, prapal,
easwar.hariharan, tiala, anirudh, paekkaladevi, kys, haiyangz,
wei.liu, decui
On Mon, Sep 29, 2025 at 11:19:51AM -0700, Nuno Das Neves wrote:
> On 9/26/2025 4:12 PM, Stanislav Kinsburskii wrote:
> > On Fri, Sep 26, 2025 at 09:23:10AM -0700, Nuno Das Neves wrote:
> >> There are some differences in how L1VH partitions must map stats and vp
> >> state pages, some of which are due to differences across hypervisor
> >> versions. Detect and handle these cases.
> >>
> >
> > I'm not sure that support for older and actully broken versions on
> > hypervisor need to be usptreamed, as these versions will go away sooner
> > or later and this support will become dead weight.
> >
> As far as I know, these changes are relevant for shipped versions of the
> hypervisor - they are not 'broken' except in some very specific cases
> (live migration on L1VH, I think?)
>
Right, they are not broken, just have more limitations.
> The hypervisor team added a feature bit for these changes so that both old
> and new versions of these APIs can be supported.
>
> > I think we should upstrem only the changes needed for the new versiong
> > of hypervisors instead and carry legacy support out of tree until it
> > becomes obsoleted.
> >
> Which version do you suggest to be the cutoff?
>
> I'd prefer to support as many versions of the hypervisor as we can, as
> long as they are at all relevant. We can remove the support later.
> Removing prematurely just creates friction. Inevitably some users will
> find themselves running on an older hypervisor and then it just fails
> with a cryptic error. This includes myself, since I test L1VH on Azure
> which typically has older hypervisor versions.
This. It takes a long time to saturate the fleet with a new hypervisor.
Realistically I am looking at 2+ years before we can drop the
compatibility code, if ever.
Another reason to upstream the compatibility code is because partners
will want to pick up our code, so hiding this code from them makes both
our and their life harder.
Wei
>
> Nuno
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
2025-09-29 17:55 ` Michael Kelley
@ 2025-10-01 22:32 ` Nuno Das Neves
2025-10-02 0:01 ` Michael Kelley
0 siblings, 1 reply; 24+ messages in thread
From: Nuno Das Neves @ 2025-10-01 22:32 UTC (permalink / raw)
To: Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com
On 9/29/2025 10:55 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>>
>> From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
>>
>> This hypercall can be used to fetch extended properties of a
>> partition. Extended properties are properties with values larger than
>> a u64. Some of these also need additional input arguments.
>>
>> Add helper function for using the hypercall in the mshv_root driver.
>>
>> Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
>> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>> ---
>> drivers/hv/mshv_root.h | 2 ++
>> drivers/hv/mshv_root_hv_call.c | 31 ++++++++++++++++++++++++++
>> include/hyperv/hvgdk_mini.h | 1 +
>> include/hyperv/hvhdk.h | 40 ++++++++++++++++++++++++++++++++++
>> include/hyperv/hvhdk_mini.h | 26 ++++++++++++++++++++++
>> 5 files changed, 100 insertions(+)
>>
>> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
>> index e3931b0f1269..4aeb03bea6b6 100644
>> --- a/drivers/hv/mshv_root.h
>> +++ b/drivers/hv/mshv_root.h
>> @@ -303,6 +303,8 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
>> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>> u64 page_struct_count, u32 host_access,
>> u32 flags, u8 acquire);
>> +int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code, u64 arg,
>> + void *property_value, size_t property_value_sz);
>>
>> extern struct mshv_root mshv_root;
>> extern enum hv_scheduler_type hv_scheduler_type;
>> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
>> index c9c274f29c3c..3fd3cce23f69 100644
>> --- a/drivers/hv/mshv_root_hv_call.c
>> +++ b/drivers/hv/mshv_root_hv_call.c
>> @@ -590,6 +590,37 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> return hv_result_to_errno(status);
>> }
>>
>> +int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
>> + u64 arg, void *property_value,
>> + size_t property_value_sz)
>> +{
>> + u64 status;
>> + unsigned long flags;
>> + struct hv_input_get_partition_property_ex *input;
>> + struct hv_output_get_partition_property_ex *output;
>> +
>> + local_irq_save(flags);
>> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
>> +
>> + memset(input, 0, sizeof(*input));
>> + input->partition_id = partition_id;
>> + input->property_code = property_code;
>> + input->arg = arg;
>> + status = hv_do_hypercall(HVCALL_GET_PARTITION_PROPERTY_EX, input, output);
>> +
>> + if (!hv_result_success(status)) {
>> + hv_status_debug(status, "\n");
>> + local_irq_restore(flags);
>
> Nit: It would be marginally better to do the local_irq_restore() first, and then
> output the error message so that interrupts are not disabled any longer than
> needed.
>
Good point
>> + return hv_result_to_errno(status);
>> + }
>> + memcpy(property_value, &output->property_value, property_value_sz);
>> +
>> + local_irq_restore(flags);
>> +
>> + return 0;
>> +}
>> +
>> int
>> hv_call_clear_virtual_interrupt(u64 partition_id)
>> {
>> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
>> index 1be7f6a02304..ff4325fb623a 100644
>> --- a/include/hyperv/hvgdk_mini.h
>> +++ b/include/hyperv/hvgdk_mini.h
>> @@ -490,6 +490,7 @@ union hv_vp_assist_msr_contents { /*
>> HV_REGISTER_VP_ASSIST_PAGE */
>> #define HVCALL_GET_VP_STATE 0x00e3
>> #define HVCALL_SET_VP_STATE 0x00e4
>> #define HVCALL_GET_VP_CPUID_VALUES 0x00f4
>> +#define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
>> #define HVCALL_MMIO_READ 0x0106
>> #define HVCALL_MMIO_WRITE 0x0107
>>
>> diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
>> index b4067ada02cf..416c0d45b793 100644
>> --- a/include/hyperv/hvhdk.h
>> +++ b/include/hyperv/hvhdk.h
>> @@ -376,6 +376,46 @@ struct hv_input_set_partition_property {
>> u64 property_value;
>> } __packed;
>>
>> +union hv_partition_property_arg {
>> + u64 as_uint64;
>> + struct {
>> + union {
>> + u32 arg;
>> + u32 vp_index;
>> + };
>> + u16 reserved0;
>> + u8 reserved1;
>> + u8 object_type;
>> + } __packed;
>> +};
>> +
>> +struct hv_input_get_partition_property_ex {
>> + u64 partition_id;
>> + u32 property_code; /* enum hv_partition_property_code */
>> + u32 padding;
>> + union {
>> + union hv_partition_property_arg arg_data;
>> + u64 arg;
>
> This union, and the "u64 arg" member, seems redundant since
> union hv_partition_property_arg already has a member "as_uint64".
>
> Maybe this is just being copied from the Windows versions of these
> structures, in which case I realize there are constraints on what you
> want to change or fix, and you can ignore my comment.
>
Yeah, this is just due to it coming from the Windows code. I prefer to
keep it as close as possible unless it is actually a bug (like missing
padding which has happened a couple of times).
>> + };
>> +} __packed;
>> +
>> +/*
>> + * NOTE: Should use hv_input_set_partition_property_ex_header to compute this
>> + * size, but hv_input_get_partition_property_ex is identical so it suffices
>> + */
>> +#define HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE \
>> + (HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_partition_property_ex))
>> +
>> +union hv_partition_property_ex {
>> + u8 buffer[HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE];
>
> It's unclear what this "buffer" field is trying to do, and particularly its size.
> The comment above references hv_input_set_partition_property_ex_header,
> which doesn't exist in the Linux code. And why is the max size of the output
> buffer reduced by the size of the header of the input to "set partition property"?
>
Yes, hv_input_set_partition_property_ex_header doesn't exist here, it's from the
Windows code (although in CAPs). It's identical to hv_input_get_partition_property_ex.
I decided to just use hv_input_get_partition_property_ex instead of introducing the
unused header struct, for now.
I'm not certain what the buffer is used for, but I suspect it's for data that doesn't
fit with the other partition property structs. Maybe some raw/unstructured data.
hv_partition_property_ex has to be that size because it is also used (not yet, in linux)
for setting properties, and the set hypercall has a header as mentioned above, so the
entire struct mustn't exceed a page in size - i.e.:
struct hv_input_set_partition_property_ex {
hv_input_set_partition_property_ex_header header;
hv_partition_property_ex property_value;
};
But we don't need this yet.
Thanks
Nuno
>> + struct hv_partition_property_vmm_capabilities vmm_capabilities;
>> + /* More fields to be filled in when needed */
>> +};
>> +
>> +struct hv_output_get_partition_property_ex {
>> + union hv_partition_property_ex property_value;
>> +} __packed;
>> +
>> enum hv_vp_state_page_type {
>> HV_VP_STATE_PAGE_REGISTERS = 0,
>> HV_VP_STATE_PAGE_INTERCEPT_MESSAGE = 1,
>> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
>> index 858f6a3925b3..bf2ce27dfcc5 100644
>> --- a/include/hyperv/hvhdk_mini.h
>> +++ b/include/hyperv/hvhdk_mini.h
>> @@ -96,8 +96,34 @@ enum hv_partition_property_code {
>> HV_PARTITION_PROPERTY_XSAVE_STATES = 0x00060007,
>> HV_PARTITION_PROPERTY_MAX_XSAVE_DATA_SIZE = 0x00060008,
>> HV_PARTITION_PROPERTY_PROCESSOR_CLOCK_FREQUENCY = 0x00060009,
>> +
>> + /* Extended properties with larger property values */
>> + HV_PARTITION_PROPERTY_VMM_CAPABILITIES = 0x00090007,
>> };
>>
>> +#define HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT 1
>> +#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT 59
>> +
>> +struct hv_partition_property_vmm_capabilities {
>> + u16 bank_count;
>> + u16 reserved[3];
>> + union {
>> + u64 as_uint64[HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT];
>> + struct {
>> + u64 map_gpa_preserve_adjustable: 1;
>> + u64 vmm_can_provide_overlay_gpfn: 1;
>> + u64 vp_affinity_property: 1;
>> +#if IS_ENABLED(CONFIG_ARM64)
>> + u64 vmm_can_provide_gic_overlay_locations: 1;
>> +#else
>> + u64 reservedbit3: 1;
>> +#endif
>> + u64 assignable_synthetic_proc_features: 1;
>> + u64 reserved0: HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT;
>> + } __packed;
>> + };
>> +} __packed;
>> +
>> enum hv_snp_status {
>> HV_SNP_STATUS_NONE = 0,
>> HV_SNP_STATUS_AVAILABLE = 1,
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-09-29 17:56 ` Michael Kelley
@ 2025-10-01 22:56 ` Nuno Das Neves
2025-10-02 0:02 ` Michael Kelley
0 siblings, 1 reply; 24+ messages in thread
From: Nuno Das Neves @ 2025-10-01 22:56 UTC (permalink / raw)
To: Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
On 9/29/2025 10:56 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>>
>> From: Jinank Jain <jinankjain@linux.microsoft.com>
>>
>> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
>> allocated and passed to the hypervisor to map VP state pages. This is
>> only needed on L1VH, and only on some (newer) versions of the
>> hypervisor, hence the need to check vmm_capabilities.
>>
>> Introduce functions hv_map/unmap_vp_state_page() to handle the
>> allocation and freeing.
>>
>> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
>> ---
>> drivers/hv/mshv_root.h | 11 ++---
>> drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
>> drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
>> 3 files changed, 98 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
>> index 0cb1e2589fe1..dbe2d1d0b22f 100644
>> --- a/drivers/hv/mshv_root.h
>> +++ b/drivers/hv/mshv_root.h
>> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
>> /* Choose between pages and bytes */
>> struct hv_vp_state_data state_data, u64 page_count,
>> struct page **pages, u32 num_bytes, u8 *bytes);
>> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> - union hv_input_vtl input_vtl,
>> - struct page **state_page);
>> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> - union hv_input_vtl input_vtl);
>> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> + union hv_input_vtl input_vtl,
>> + struct page **state_page);
>> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> + struct page *state_page,
>> + union hv_input_vtl input_vtl);
>> int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
>> u64 connection_partition_id, struct hv_port_info *port_info,
>> u8 port_vtl, u8 min_connection_vtl, int node);
>> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
>> index 3fd3cce23f69..98c6278ff151 100644
>> --- a/drivers/hv/mshv_root_hv_call.c
>> +++ b/drivers/hv/mshv_root_hv_call.c
>> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
>> return ret;
>> }
>>
>> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> - union hv_input_vtl input_vtl,
>> - struct page **state_page)
>> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> + union hv_input_vtl input_vtl,
>> + struct page **state_page)
>> {
>> struct hv_input_map_vp_state_page *input;
>> struct hv_output_map_vp_state_page *output;
>> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> input->type = type;
>> input->input_vtl = input_vtl;
>>
>> - status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
>
> This function must zero the input area before using it. Otherwise,
> flags.map_location_provided is uninitialized when *state_page is NULL. It will
> have whatever value was left by the previous user of hyperv_pcpu_input_arg,
> potentially producing bizarre results. And there's a reserved field that won't be
> set to zero.
>
Good catch, will add a memset().
>> + if (*state_page) {
>> + input->flags.map_location_provided = 1;
>> + input->requested_map_location =
>> + page_to_pfn(*state_page);
>
> Technically, this should be page_to_hvpfn() since the PFN value is being sent to
> Hyper-V. I know root (and L1VH?) partitions must run with the same page size
> as the Hyper-V host, but it's better to not leave code buried here that will blow
> up if the "same page size requirement" should ever change.
>
Good point...I could change these calls, but the other way doesn't work, see below.
> And after making the hypercall, there's an invocation of pfn_to_page(), which
> should account for the same. Unfortunately, there's not an existing hvpfn_to_page()
> function.
>
This seems like a tricky scenario to get right. In the root partition case, the
hypervisor allocates the page. That pfn could be some page within a larger Linux page.
Converting that to a Linux pfn (naively) means losing the original hvpfn since it gets
truncated, which is no good if we want to unmap it later. Also page_address() would
give the wrong virtual address.
In other words, we'd have to completely change how we track these pages in order to
support this scenario, and the same goes for various other hypervisor APIs where the
hypervisor does the allocating. I think it's out of scope to try and address that
here, even in part, especially since we will be making assumptions about something
that may never happen.
>> + }
>> +
>> + status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
>> + output);
>>
>> if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
>> if (hv_result_success(status))
>> @@ -565,8 +572,39 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> return ret;
>> }
>>
>> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> - union hv_input_vtl input_vtl)
>> +static bool mshv_use_overlay_gpfn(void)
>> +{
>> + return hv_l1vh_partition() &&
>> + mshv_root.vmm_caps.vmm_can_provide_overlay_gpfn;
>> +}
>> +
>> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> + union hv_input_vtl input_vtl,
>> + struct page **state_page)
>> +{
>> + int ret = 0;
>> + struct page *allocated_page = NULL;
>> +
>> + if (mshv_use_overlay_gpfn()) {
>> + allocated_page = alloc_page(GFP_KERNEL);
>> + if (!allocated_page)
>> + return -ENOMEM;
>> + *state_page = allocated_page;
>> + } else {
>> + *state_page = NULL;
>> + }
>> +
>> + ret = hv_call_map_vp_state_page(partition_id, vp_index, type, input_vtl,
>> + state_page);
>> +
>> + if (ret && allocated_page)
>> + __free_page(allocated_page);
>
> For robustness, you might want to set *state_page = NULL here so the
> caller doesn't have a reference to the page that has been freed. I didn't
> see any cases where the caller incorrectly checks the returned
> *state_page value after an error, so the current code isn't broken.
>
Sure, I can add it.
>> +
>> + return ret;
>> +}
>> +
>> +static int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> + union hv_input_vtl input_vtl)
>> {
>> unsigned long flags;
>> u64 status;
>> @@ -590,6 +628,17 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> return hv_result_to_errno(status);
>> }
>>
>> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> + struct page *state_page, union hv_input_vtl input_vtl)
>> +{
>> + int ret = hv_call_unmap_vp_state_page(partition_id, vp_index, type, input_vtl);
>> +
>> + if (mshv_use_overlay_gpfn() && state_page)
>> + __free_page(state_page);
>> +
>> + return ret;
>> +}
>> +
>> int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
>> u64 arg, void *property_value,
>> size_t property_value_sz)
>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>> index e199770ecdfa..2d0ad17acde6 100644
>> --- a/drivers/hv/mshv_root_main.c
>> +++ b/drivers/hv/mshv_root_main.c
>> @@ -890,7 +890,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition
>> *partition,
>> {
>> struct mshv_create_vp args;
>> struct mshv_vp *vp;
>> - struct page *intercept_message_page, *register_page, *ghcb_page;
>> + struct page *intercept_msg_page, *register_page, *ghcb_page;
>> void *stats_pages[2];
>> long ret;
>>
>> @@ -908,28 +908,25 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>> if (ret)
>> return ret;
>>
>> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
>> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> - input_vtl_zero,
>> - &intercept_message_page);
>> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> + input_vtl_zero, &intercept_msg_page);
>> if (ret)
>> goto destroy_vp;
>>
>> if (!mshv_partition_encrypted(partition)) {
>> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
>> - HV_VP_STATE_PAGE_REGISTERS,
>> - input_vtl_zero,
>> - ®ister_page);
>> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>> + HV_VP_STATE_PAGE_REGISTERS,
>> + input_vtl_zero, ®ister_page);
>> if (ret)
>> goto unmap_intercept_message_page;
>> }
>>
>> if (mshv_partition_encrypted(partition) &&
>> is_ghcb_mapping_available()) {
>> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
>> - HV_VP_STATE_PAGE_GHCB,
>> - input_vtl_normal,
>> - &ghcb_page);
>> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>> + HV_VP_STATE_PAGE_GHCB,
>> + input_vtl_normal, &ghcb_page);
>> if (ret)
>> goto unmap_register_page;
>> }
>> @@ -960,7 +957,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>> atomic64_set(&vp->run.vp_signaled_count, 0);
>>
>> vp->vp_index = args.vp_index;
>> - vp->vp_intercept_msg_page = page_to_virt(intercept_message_page);
>> + vp->vp_intercept_msg_page = page_to_virt(intercept_msg_page);
>> if (!mshv_partition_encrypted(partition))
>> vp->vp_register_page = page_to_virt(register_page);
>>
>> @@ -993,21 +990,19 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
>> mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
>> unmap_ghcb_page:
>> - if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
>> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> - HV_VP_STATE_PAGE_GHCB,
>> - input_vtl_normal);
>> - }
>> + if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
>> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> + HV_VP_STATE_PAGE_GHCB, ghcb_page,
>> + input_vtl_normal);
>> unmap_register_page:
>> - if (!mshv_partition_encrypted(partition)) {
>> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> - HV_VP_STATE_PAGE_REGISTERS,
>> - input_vtl_zero);
>> - }
>> + if (!mshv_partition_encrypted(partition))
>> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> + HV_VP_STATE_PAGE_REGISTERS,
>> + register_page, input_vtl_zero);
>> unmap_intercept_message_page:
>> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> - input_vtl_zero);
>> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> + intercept_msg_page, input_vtl_zero);
>> destroy_vp:
>> hv_call_delete_vp(partition->pt_id, args.vp_index);
>> return ret;
>> @@ -1748,24 +1743,27 @@ static void destroy_partition(struct mshv_partition *partition)
>> mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
>>
>> if (vp->vp_register_page) {
>> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
>> - vp->vp_index,
>> - HV_VP_STATE_PAGE_REGISTERS,
>> - input_vtl_zero);
>> + (void)hv_unmap_vp_state_page(partition->pt_id,
>> + vp->vp_index,
>> + HV_VP_STATE_PAGE_REGISTERS,
>> + virt_to_page(vp->vp_register_page),
>> + input_vtl_zero);
>> vp->vp_register_page = NULL;
>> }
>>
>> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
>> - vp->vp_index,
>> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> - input_vtl_zero);
>> + (void)hv_unmap_vp_state_page(partition->pt_id,
>> + vp->vp_index,
>> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> + virt_to_page(vp->vp_intercept_msg_page),
>> + input_vtl_zero);
>> vp->vp_intercept_msg_page = NULL;
>>
>> if (vp->vp_ghcb_page) {
>> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
>> - vp->vp_index,
>> - HV_VP_STATE_PAGE_GHCB,
>> - input_vtl_normal);
>> + (void)hv_unmap_vp_state_page(partition->pt_id,
>> + vp->vp_index,
>> + HV_VP_STATE_PAGE_GHCB,
>> + virt_to_page(vp->vp_ghcb_page),
>> + input_vtl_normal);
>> vp->vp_ghcb_page = NULL;
>> }
>>
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions
2025-09-29 17:57 ` Michael Kelley
@ 2025-10-01 23:03 ` Nuno Das Neves
2025-10-02 0:16 ` Michael Kelley
0 siblings, 1 reply; 24+ messages in thread
From: Nuno Das Neves @ 2025-10-01 23:03 UTC (permalink / raw)
To: Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
On 9/29/2025 10:57 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>>
>> From: Jinank Jain <jinankjain@linux.microsoft.com>
>>
>> Introduce HVCALL_MAP_STATS_PAGE2 which provides a map location (GPFN)
>> to map the stats to. This hypercall is required for L1VH partitions,
>> depending on the hypervisor version. This uses the same check as the
>> state page map location; mshv_use_overlay_gpfn().
>>
>> Add mshv_map_vp_state_page() helpers to use this new hypercall or the
>> old one depending on availability.
>>
>> For unmapping, the original HVCALL_UNMAP_STATS_PAGE works for both
>> cases.
>>
>> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>> ---
>> drivers/hv/mshv_root.h | 10 ++--
>> drivers/hv/mshv_root_hv_call.c | 93 ++++++++++++++++++++++++++++++++--
>> drivers/hv/mshv_root_main.c | 22 ++++----
>> include/hyperv/hvgdk_mini.h | 1 +
>> include/hyperv/hvhdk_mini.h | 7 +++
>> 5 files changed, 113 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
>> index dbe2d1d0b22f..0dfccfbe6123 100644
>> --- a/drivers/hv/mshv_root.h
>> +++ b/drivers/hv/mshv_root.h
>> @@ -297,11 +297,11 @@ int hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
>> int hv_call_disconnect_port(u64 connection_partition_id,
>> union hv_connection_id connection_id);
>> int hv_call_notify_port_ring_empty(u32 sint_index);
>> -int hv_call_map_stat_page(enum hv_stats_object_type type,
>> - const union hv_stats_object_identity *identity,
>> - void **addr);
>> -int hv_call_unmap_stat_page(enum hv_stats_object_type type,
>> - const union hv_stats_object_identity *identity);
>> +int hv_map_stats_page(enum hv_stats_object_type type,
>> + const union hv_stats_object_identity *identity,
>> + void **addr);
>> +int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
>> + const union hv_stats_object_identity *identity);
>> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>> u64 page_struct_count, u32 host_access,
>> u32 flags, u8 acquire);
>> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
>> index 98c6278ff151..5a805b3dec0b 100644
>> --- a/drivers/hv/mshv_root_hv_call.c
>> +++ b/drivers/hv/mshv_root_hv_call.c
>> @@ -804,9 +804,51 @@ hv_call_notify_port_ring_empty(u32 sint_index)
>> return hv_result_to_errno(status);
>> }
>>
>> -int hv_call_map_stat_page(enum hv_stats_object_type type,
>> - const union hv_stats_object_identity *identity,
>> - void **addr)
>> +static int hv_call_map_stats_page2(enum hv_stats_object_type type,
>> + const union hv_stats_object_identity *identity,
>> + u64 map_location)
>> +{
>> + unsigned long flags;
>> + struct hv_input_map_stats_page2 *input;
>> + u64 status;
>> + int ret;
>> +
>> + if (!map_location || !mshv_use_overlay_gpfn())
>> + return -EINVAL;
>> +
>> + do {
>> + local_irq_save(flags);
>> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> +
>> + memset(input, 0, sizeof(*input));
>> + input->type = type;
>> + input->identity = *identity;
>> + input->map_location = map_location;
>> +
>> + status = hv_do_hypercall(HVCALL_MAP_STATS_PAGE2, input, NULL);
>> +
>> + local_irq_restore(flags);
>> +
>> + ret = hv_result_to_errno(status);
>> +
>> + if (!ret)
>> + break;
>
> This logic is incorrect. If the hypercall returns
> HV_STATUS_INSUFFICIENT_MEMORY, then errno -ENOMEM is immediately
> returned to the caller without any opportunity to do hv_call_deposit_pages().
>
The loop breaks if (!ret), i.e. on success. Maybe you misread it as `if (ret)`?
>> +
>> + if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
>> + hv_status_debug(status, "\n");
>> + break;
>> + }
>> +
>> + ret = hv_call_deposit_pages(NUMA_NO_NODE,
>> + hv_current_partition_id, 1);
>> + } while (!ret);
>> +
>> + return ret;
>> +}
>> +
>> +static int hv_call_map_stats_page(enum hv_stats_object_type type,
>> + const union hv_stats_object_identity *identity,
>> + void **addr)
>> {
>> unsigned long flags;
>> struct hv_input_map_stats_page *input;
>> @@ -845,8 +887,36 @@ int hv_call_map_stat_page(enum hv_stats_object_type type,
>> return ret;
>> }
>>
>> -int hv_call_unmap_stat_page(enum hv_stats_object_type type,
>> - const union hv_stats_object_identity *identity)
>> +int hv_map_stats_page(enum hv_stats_object_type type,
>> + const union hv_stats_object_identity *identity,
>> + void **addr)
>> +{
>> + int ret;
>> + struct page *allocated_page = NULL;
>> +
>> + if (!addr)
>> + return -EINVAL;
>> +
>> + if (mshv_use_overlay_gpfn()) {
>> + allocated_page = alloc_page(GFP_KERNEL);
>> + if (!allocated_page)
>> + return -ENOMEM;
>> +
>> + ret = hv_call_map_stats_page2(type, identity,
>> + page_to_pfn(allocated_page));
>
> This should use page_to_hvpfn() per my comments in Patch 4 of this series.
>
I could change it, but as mentioned in my reply to Patch 4 I'm not sure it's
worth doing without addressing the whole issue which is a much bigger ask.
>> + *addr = page_address(allocated_page);
>> + } else {
>> + ret = hv_call_map_stats_page(type, identity, addr);
>> + }
>> +
>> + if (ret && allocated_page)
>> + __free_page(allocated_page);
>
> Might want to do *addr = NULL after freeing the page so that the caller
> can't erroneously reference the free page. Again, the current caller doesn't
> do that, so current code isn't broken.
>
Yep, I can add it to provide a little more robustness against bugs in callers.
>> +
>> + return ret;
>> +}
>> +
>> +static int hv_call_unmap_stats_page(enum hv_stats_object_type type,
>> + const union hv_stats_object_identity *identity)
>> {
>> unsigned long flags;
>> struct hv_input_unmap_stats_page *input;
>> @@ -865,6 +935,19 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
>> return hv_result_to_errno(status);
>> }
>>
>> +int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
>> + const union hv_stats_object_identity *identity)
>> +{
>> + int ret;
>> +
>> + ret = hv_call_unmap_stats_page(type, identity);
>> +
>> + if (mshv_use_overlay_gpfn() && page_addr)
>> + __free_page(virt_to_page(page_addr));
>> +
>> + return ret;
>> +}
>> +
>> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>> u64 page_struct_count, u32 host_access,
>> u32 flags, u8 acquire)
>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>> index 2d0ad17acde6..71a8ab5db3b8 100644
>> --- a/drivers/hv/mshv_root_main.c
>> +++ b/drivers/hv/mshv_root_main.c
>> @@ -841,7 +841,8 @@ mshv_vp_release(struct inode *inode, struct file *filp)
>> return 0;
>> }
>>
>> -static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
>> +static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index,
>> + void *stats_pages[])
>> {
>> union hv_stats_object_identity identity = {
>> .vp.partition_id = partition_id,
>> @@ -849,10 +850,10 @@ static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
>> };
>>
>> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
>> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
>> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
>>
>> identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
>> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
>> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
>> }
>>
>> static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
>> @@ -865,14 +866,14 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
>> int err;
>>
>> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
>> - err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
>> - &stats_pages[HV_STATS_AREA_SELF]);
>> + err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
>> + &stats_pages[HV_STATS_AREA_SELF]);
>> if (err)
>> return err;
>>
>> identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
>> - err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
>> - &stats_pages[HV_STATS_AREA_PARENT]);
>> + err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
>> + &stats_pages[HV_STATS_AREA_PARENT]);
>> if (err)
>> goto unmap_self;
>>
>> @@ -880,7 +881,7 @@ static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
>>
>> unmap_self:
>> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
>> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
>> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
>> return err;
>> }
>>
>> @@ -988,7 +989,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>> kfree(vp);
>> unmap_stats_pages:
>> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
>> - mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
>> + mshv_vp_stats_unmap(partition->pt_id, args.vp_index, stats_pages);
>> unmap_ghcb_page:
>> if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
>> hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> @@ -1740,7 +1741,8 @@ static void destroy_partition(struct mshv_partition *partition)
>> continue;
>>
>> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
>> - mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
>> + mshv_vp_stats_unmap(partition->pt_id, vp->vp_index,
>> + (void **)vp->vp_stats_pages);
>>
>> if (vp->vp_register_page) {
>> (void)hv_unmap_vp_state_page(partition->pt_id,
>> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
>> index ff4325fb623a..f66565106d21 100644
>> --- a/include/hyperv/hvgdk_mini.h
>> +++ b/include/hyperv/hvgdk_mini.h
>> @@ -493,6 +493,7 @@ union hv_vp_assist_msr_contents { /*
>> HV_REGISTER_VP_ASSIST_PAGE */
>> #define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
>> #define HVCALL_MMIO_READ 0x0106
>> #define HVCALL_MMIO_WRITE 0x0107
>> +#define HVCALL_MAP_STATS_PAGE2 0x0131
>>
>> /* HV_HYPERCALL_INPUT */
>> #define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
>> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
>> index bf2ce27dfcc5..064bf735cab6 100644
>> --- a/include/hyperv/hvhdk_mini.h
>> +++ b/include/hyperv/hvhdk_mini.h
>> @@ -177,6 +177,13 @@ struct hv_input_map_stats_page {
>> union hv_stats_object_identity identity;
>> } __packed;
>>
>> +struct hv_input_map_stats_page2 {
>> + u32 type; /* enum hv_stats_object_type */
>> + u32 padding;
>> + union hv_stats_object_identity identity;
>> + u64 map_location;
>> +} __packed;
>> +
>> struct hv_output_map_stats_page {
>> u64 map_location;
>> } __packed;
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
2025-10-01 22:32 ` Nuno Das Neves
@ 2025-10-02 0:01 ` Michael Kelley
0 siblings, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2025-10-02 0:01 UTC (permalink / raw)
To: Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, October 1, 2025 3:33 PM
>
> On 9/29/2025 10:55 AM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
> >>
> >> From: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com>
> >>
[snip]
> >> diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
> >> index b4067ada02cf..416c0d45b793 100644
> >> --- a/include/hyperv/hvhdk.h
> >> +++ b/include/hyperv/hvhdk.h
> >> @@ -376,6 +376,46 @@ struct hv_input_set_partition_property {
> >> u64 property_value;
> >> } __packed;
> >>
> >> +union hv_partition_property_arg {
> >> + u64 as_uint64;
> >> + struct {
> >> + union {
> >> + u32 arg;
> >> + u32 vp_index;
> >> + };
> >> + u16 reserved0;
> >> + u8 reserved1;
> >> + u8 object_type;
> >> + } __packed;
> >> +};
> >> +
> >> +struct hv_input_get_partition_property_ex {
> >> + u64 partition_id;
> >> + u32 property_code; /* enum hv_partition_property_code */
> >> + u32 padding;
> >> + union {
> >> + union hv_partition_property_arg arg_data;
> >> + u64 arg;
> >
> > This union, and the "u64 arg" member, seems redundant since
> > union hv_partition_property_arg already has a member "as_uint64".
> >
> > Maybe this is just being copied from the Windows versions of these
> > structures, in which case I realize there are constraints on what you
> > want to change or fix, and you can ignore my comment.
> >
> Yeah, this is just due to it coming from the Windows code. I prefer to
> keep it as close as possible unless it is actually a bug (like missing
> padding which has happened a couple of times).
Makes sense.
>
> >> + };
> >> +} __packed;
> >> +
> >> +/*
> >> + * NOTE: Should use hv_input_set_partition_property_ex_header to compute this
> >> + * size, but hv_input_get_partition_property_ex is identical so it suffices
> >> + */
> >> +#define HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE \
> >> + (HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_partition_property_ex))
> >> +
> >> +union hv_partition_property_ex {
> >> + u8 buffer[HV_PARTITION_PROPERTY_EX_MAX_VAR_SIZE];
> >
> > It's unclear what this "buffer" field is trying to do, and particularly its size.
> > The comment above references hv_input_set_partition_property_ex_header,
> > which doesn't exist in the Linux code. And why is the max size of the output
> > buffer reduced by the size of the header of the input to "set partition property"?
> >
> Yes, hv_input_set_partition_property_ex_header doesn't exist here, it's from the
> Windows code (although in CAPs). It's identical to hv_input_get_partition_property_ex.
> I decided to just use hv_input_get_partition_property_ex instead of introducing the
> unused header struct, for now.
>
> I'm not certain what the buffer is used for, but I suspect it's for data that doesn't
> fit with the other partition property structs. Maybe some raw/unstructured data.
>
> hv_partition_property_ex has to be that size because it is also used (not yet, in linux)
> for setting properties, and the set hypercall has a header as mentioned above, so the
> entire struct mustn't exceed a page in size - i.e.:
>
> struct hv_input_set_partition_property_ex {
> hv_input_set_partition_property_ex_header header;
> hv_partition_property_ex property_value;
> };
OK, that makes sense. Thanks.
Michael
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-10-01 22:56 ` Nuno Das Neves
@ 2025-10-02 0:02 ` Michael Kelley
2025-10-04 15:25 ` Michael Kelley
0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-10-02 0:02 UTC (permalink / raw)
To: Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, October 1, 2025 3:56 PM
>
> On 9/29/2025 10:56 AM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
> >>
> >> From: Jinank Jain <jinankjain@linux.microsoft.com>
> >>
> >> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
> >> allocated and passed to the hypervisor to map VP state pages. This is
> >> only needed on L1VH, and only on some (newer) versions of the
> >> hypervisor, hence the need to check vmm_capabilities.
> >>
> >> Introduce functions hv_map/unmap_vp_state_page() to handle the
> >> allocation and freeing.
> >>
> >> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> >> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> >> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> >> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> >> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> >> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
> >> ---
> >> drivers/hv/mshv_root.h | 11 ++---
> >> drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
> >> drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
> >> 3 files changed, 98 insertions(+), 50 deletions(-)
> >>
> >> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> >> index 0cb1e2589fe1..dbe2d1d0b22f 100644
> >> --- a/drivers/hv/mshv_root.h
> >> +++ b/drivers/hv/mshv_root.h
> >> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> >> /* Choose between pages and bytes */
> >> struct hv_vp_state_data state_data, u64 page_count,
> >> struct page **pages, u32 num_bytes, u8 *bytes);
> >> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> - union hv_input_vtl input_vtl,
> >> - struct page **state_page);
> >> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> - union hv_input_vtl input_vtl);
> >> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> + union hv_input_vtl input_vtl,
> >> + struct page **state_page);
> >> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> + struct page *state_page,
> >> + union hv_input_vtl input_vtl);
> >> int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
> >> u64 connection_partition_id, struct hv_port_info *port_info,
> >> u8 port_vtl, u8 min_connection_vtl, int node);
> >> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> >> index 3fd3cce23f69..98c6278ff151 100644
> >> --- a/drivers/hv/mshv_root_hv_call.c
> >> +++ b/drivers/hv/mshv_root_hv_call.c
> >> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> >> return ret;
> >> }
> >>
> >> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> - union hv_input_vtl input_vtl,
> >> - struct page **state_page)
> >> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> + union hv_input_vtl input_vtl,
> >> + struct page **state_page)
> >> {
> >> struct hv_input_map_vp_state_page *input;
> >> struct hv_output_map_vp_state_page *output;
> >> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32
> vp_index, u32 type,
> >> input->type = type;
> >> input->input_vtl = input_vtl;
> >>
> >> - status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
> >
> > This function must zero the input area before using it. Otherwise,
> > flags.map_location_provided is uninitialized when *state_page is NULL. It will
> > have whatever value was left by the previous user of hyperv_pcpu_input_arg,
> > potentially producing bizarre results. And there's a reserved field that won't be
> > set to zero.
> >
> Good catch, will add a memset().
>
> >> + if (*state_page) {
> >> + input->flags.map_location_provided = 1;
> >> + input->requested_map_location =
> >> + page_to_pfn(*state_page);
> >
> > Technically, this should be page_to_hvpfn() since the PFN value is being sent to
> > Hyper-V. I know root (and L1VH?) partitions must run with the same page size
> > as the Hyper-V host, but it's better to not leave code buried here that will blow
> > up if the "same page size requirement" should ever change.
> >
> Good point...I could change these calls, but the other way doesn't work, see below.
>
> > And after making the hypercall, there's an invocation of pfn_to_page(), which
> > should account for the same. Unfortunately, there's not an existing hvpfn_to_page()
> > function.
> >
> This seems like a tricky scenario to get right. In the root partition case, the
> hypervisor allocates the page. That pfn could be some page within a larger Linux page.
> Converting that to a Linux pfn (naively) means losing the original hvpfn since it gets
> truncated, which is no good if we want to unmap it later. Also page_address() would
> give the wrong virtual address.
>
> In other words, we'd have to completely change how we track these pages in order to
> support this scenario, and the same goes for various other hypervisor APIs where the
> hypervisor does the allocating. I think it's out of scope to try and address that
> here, even in part, especially since we will be making assumptions about something
> that may never happen.
OK, yes the hypervisor allocating the page is a problem when Linux tracks it
as a struct page. I'll agree it's out of current scope to change this.
It makes me think about hv_synic_enable_regs() where the paravisor or hypervisor
allocates the synic_message_page and synic_event_page. But that case should work
OK with a regular guest with page size greater than 4K because the pages are tracked
based on the guest kernel virtual address, not the PFN. So hv_synic_enable_regs()
should work on ARM64 Linux guests with 64K page size and a paravisor, as well as
for my postulated root partition with page size greater than 4K.
When it matters, cases where the hypervisor or paravisor allocate pages to give
to the guest will require careful handling to ensure they work for guest page sizes
greater than 4K. That's useful information for future consideration. Thanks for the
discussion.
Michael
>
> >> + }
> >> +
> >> + status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
> >> + output);
> >>
> >> if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
> >> if (hv_result_success(status))
> >> @@ -565,8 +572,39 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> return ret;
> >> }
> >>
> >> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> - union hv_input_vtl input_vtl)
> >> +static bool mshv_use_overlay_gpfn(void)
> >> +{
> >> + return hv_l1vh_partition() &&
> >> + mshv_root.vmm_caps.vmm_can_provide_overlay_gpfn;
> >> +}
> >> +
> >> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> + union hv_input_vtl input_vtl,
> >> + struct page **state_page)
> >> +{
> >> + int ret = 0;
> >> + struct page *allocated_page = NULL;
> >> +
> >> + if (mshv_use_overlay_gpfn()) {
> >> + allocated_page = alloc_page(GFP_KERNEL);
> >> + if (!allocated_page)
> >> + return -ENOMEM;
> >> + *state_page = allocated_page;
> >> + } else {
> >> + *state_page = NULL;
> >> + }
> >> +
> >> + ret = hv_call_map_vp_state_page(partition_id, vp_index, type, input_vtl,
> >> + state_page);
> >> +
> >> + if (ret && allocated_page)
> >> + __free_page(allocated_page);
> >
> > For robustness, you might want to set *state_page = NULL here so the
> > caller doesn't have a reference to the page that has been freed. I didn't
> > see any cases where the caller incorrectly checks the returned
> > *state_page value after an error, so the current code isn't broken.
> >
> Sure, I can add it.
>
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> + union hv_input_vtl input_vtl)
> >> {
> >> unsigned long flags;
> >> u64 status;
> >> @@ -590,6 +628,17 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> return hv_result_to_errno(status);
> >> }
> >>
> >> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> >> + struct page *state_page, union hv_input_vtl input_vtl)
> >> +{
> >> + int ret = hv_call_unmap_vp_state_page(partition_id, vp_index, type, input_vtl);
> >> +
> >> + if (mshv_use_overlay_gpfn() && state_page)
> >> + __free_page(state_page);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
> >> u64 arg, void *property_value,
> >> size_t property_value_sz)
> >> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> >> index e199770ecdfa..2d0ad17acde6 100644
> >> --- a/drivers/hv/mshv_root_main.c
> >> +++ b/drivers/hv/mshv_root_main.c
> >> @@ -890,7 +890,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition
> >> *partition,
> >> {
> >> struct mshv_create_vp args;
> >> struct mshv_vp *vp;
> >> - struct page *intercept_message_page, *register_page, *ghcb_page;
> >> + struct page *intercept_msg_page, *register_page, *ghcb_page;
> >> void *stats_pages[2];
> >> long ret;
> >>
> >> @@ -908,28 +908,25 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> >> if (ret)
> >> return ret;
> >>
> >> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> >> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> >> - input_vtl_zero,
> >> - &intercept_message_page);
> >> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> >> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> >> + input_vtl_zero, &intercept_msg_page);
> >> if (ret)
> >> goto destroy_vp;
> >>
> >> if (!mshv_partition_encrypted(partition)) {
> >> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> >> - HV_VP_STATE_PAGE_REGISTERS,
> >> - input_vtl_zero,
> >> - ®ister_page);
> >> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> >> + HV_VP_STATE_PAGE_REGISTERS,
> >> + input_vtl_zero, ®ister_page);
> >> if (ret)
> >> goto unmap_intercept_message_page;
> >> }
> >>
> >> if (mshv_partition_encrypted(partition) &&
> >> is_ghcb_mapping_available()) {
> >> - ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
> >> - HV_VP_STATE_PAGE_GHCB,
> >> - input_vtl_normal,
> >> - &ghcb_page);
> >> + ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
> >> + HV_VP_STATE_PAGE_GHCB,
> >> + input_vtl_normal, &ghcb_page);
> >> if (ret)
> >> goto unmap_register_page;
> >> }
> >> @@ -960,7 +957,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> >> atomic64_set(&vp->run.vp_signaled_count, 0);
> >>
> >> vp->vp_index = args.vp_index;
> >> - vp->vp_intercept_msg_page = page_to_virt(intercept_message_page);
> >> + vp->vp_intercept_msg_page = page_to_virt(intercept_msg_page);
> >> if (!mshv_partition_encrypted(partition))
> >> vp->vp_register_page = page_to_virt(register_page);
> >>
> >> @@ -993,21 +990,19 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
> >> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> >> mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
> >> unmap_ghcb_page:
> >> - if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
> >> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> - HV_VP_STATE_PAGE_GHCB,
> >> - input_vtl_normal);
> >> - }
> >> + if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
> >> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> + HV_VP_STATE_PAGE_GHCB, ghcb_page,
> >> + input_vtl_normal);
> >> unmap_register_page:
> >> - if (!mshv_partition_encrypted(partition)) {
> >> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> - HV_VP_STATE_PAGE_REGISTERS,
> >> - input_vtl_zero);
> >> - }
> >> + if (!mshv_partition_encrypted(partition))
> >> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> + HV_VP_STATE_PAGE_REGISTERS,
> >> + register_page, input_vtl_zero);
> >> unmap_intercept_message_page:
> >> - hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> >> - input_vtl_zero);
> >> + hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> >> + intercept_msg_page, input_vtl_zero);
> >> destroy_vp:
> >> hv_call_delete_vp(partition->pt_id, args.vp_index);
> >> return ret;
> >> @@ -1748,24 +1743,27 @@ static void destroy_partition(struct mshv_partition *partition)
> >> mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
> >>
> >> if (vp->vp_register_page) {
> >> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> >> - vp->vp_index,
> >> - HV_VP_STATE_PAGE_REGISTERS,
> >> - input_vtl_zero);
> >> + (void)hv_unmap_vp_state_page(partition->pt_id,
> >> + vp->vp_index,
> >> + HV_VP_STATE_PAGE_REGISTERS,
> >> + virt_to_page(vp->vp_register_page),
> >> + input_vtl_zero);
> >> vp->vp_register_page = NULL;
> >> }
> >>
> >> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> >> - vp->vp_index,
> >> - HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> >> - input_vtl_zero);
> >> + (void)hv_unmap_vp_state_page(partition->pt_id,
> >> + vp->vp_index,
> >> + HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
> >> + virt_to_page(vp->vp_intercept_msg_page),
> >> + input_vtl_zero);
> >> vp->vp_intercept_msg_page = NULL;
> >>
> >> if (vp->vp_ghcb_page) {
> >> - (void)hv_call_unmap_vp_state_page(partition->pt_id,
> >> - vp->vp_index,
> >> - HV_VP_STATE_PAGE_GHCB,
> >> - input_vtl_normal);
> >> + (void)hv_unmap_vp_state_page(partition->pt_id,
> >> + vp->vp_index,
> >> + HV_VP_STATE_PAGE_GHCB,
> >> + virt_to_page(vp->vp_ghcb_page),
> >> + input_vtl_normal);
> >> vp->vp_ghcb_page = NULL;
> >> }
> >>
> >> --
> >> 2.34.1
> >>
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions
2025-10-01 23:03 ` Nuno Das Neves
@ 2025-10-02 0:16 ` Michael Kelley
0 siblings, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2025-10-02 0:16 UTC (permalink / raw)
To: Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, October 1, 2025 4:03 PM
>
> On 9/29/2025 10:57 AM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
> >>
> >> From: Jinank Jain <jinankjain@linux.microsoft.com>
> >>
[snip]
> >> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> >> index 98c6278ff151..5a805b3dec0b 100644
> >> --- a/drivers/hv/mshv_root_hv_call.c
> >> +++ b/drivers/hv/mshv_root_hv_call.c
> >> @@ -804,9 +804,51 @@ hv_call_notify_port_ring_empty(u32 sint_index)
> >> return hv_result_to_errno(status);
> >> }
> >>
> >> -int hv_call_map_stat_page(enum hv_stats_object_type type,
> >> - const union hv_stats_object_identity *identity,
> >> - void **addr)
> >> +static int hv_call_map_stats_page2(enum hv_stats_object_type type,
> >> + const union hv_stats_object_identity *identity,
> >> + u64 map_location)
> >> +{
> >> + unsigned long flags;
> >> + struct hv_input_map_stats_page2 *input;
> >> + u64 status;
> >> + int ret;
> >> +
> >> + if (!map_location || !mshv_use_overlay_gpfn())
> >> + return -EINVAL;
> >> +
> >> + do {
> >> + local_irq_save(flags);
> >> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >> +
> >> + memset(input, 0, sizeof(*input));
> >> + input->type = type;
> >> + input->identity = *identity;
> >> + input->map_location = map_location;
> >> +
> >> + status = hv_do_hypercall(HVCALL_MAP_STATS_PAGE2, input, NULL);
> >> +
> >> + local_irq_restore(flags);
> >> +
> >> + ret = hv_result_to_errno(status);
> >> +
> >> + if (!ret)
> >> + break;
> >
> > This logic is incorrect. If the hypercall returns
> > HV_STATUS_INSUFFICIENT_MEMORY, then errno -ENOMEM is immediately
> > returned to the caller without any opportunity to do hv_call_deposit_pages().
> >
> The loop breaks if (!ret), i.e. on success. Maybe you misread it as `if (ret)`?
Pfft! Indeed I had it wrong.
I'll note that hv_call_map_stats_page2() and the existing hv_call_map_stats_page()
have different code for sorting out success vs. insufficient memory vs. other errors.
And there are additional code variations in other MSHV places that must check for
insufficient memory. Some consistency might be nice, but that's out of scope
for this patch set.
> >> +
> >> + if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
> >> + hv_status_debug(status, "\n");
> >> + break;
> >> + }
> >> +
> >> + ret = hv_call_deposit_pages(NUMA_NO_NODE,
> >> + hv_current_partition_id, 1);
> >> + } while (!ret);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int hv_call_map_stats_page(enum hv_stats_object_type type,
> >> + const union hv_stats_object_identity *identity,
> >> + void **addr)
> >> {
> >> unsigned long flags;
> >> struct hv_input_map_stats_page *input;
> >> @@ -845,8 +887,36 @@ int hv_call_map_stat_page(enum hv_stats_object_type type,
> >> return ret;
> >> }
> >>
> >> -int hv_call_unmap_stat_page(enum hv_stats_object_type type,
> >> - const union hv_stats_object_identity *identity)
> >> +int hv_map_stats_page(enum hv_stats_object_type type,
> >> + const union hv_stats_object_identity *identity,
> >> + void **addr)
> >> +{
> >> + int ret;
> >> + struct page *allocated_page = NULL;
> >> +
> >> + if (!addr)
> >> + return -EINVAL;
> >> +
> >> + if (mshv_use_overlay_gpfn()) {
> >> + allocated_page = alloc_page(GFP_KERNEL);
> >> + if (!allocated_page)
> >> + return -ENOMEM;
> >> +
> >> + ret = hv_call_map_stats_page2(type, identity,
> >> + page_to_pfn(allocated_page));
> >
> > This should use page_to_hvpfn() per my comments in Patch 4 of this series.
> >
> I could change it, but as mentioned in my reply to Patch 4 I'm not sure it's
> worth doing without addressing the whole issue which is a much bigger ask.
Agreed.
Michael
>
> >> + *addr = page_address(allocated_page);
> >> + } else {
> >> + ret = hv_call_map_stats_page(type, identity, addr);
> >> + }
> >> +
> >> + if (ret && allocated_page)
> >> + __free_page(allocated_page);
> >
> > Might want to do *addr = NULL after freeing the page so that the caller
> > can't erroneously reference the free page. Again, the current caller doesn't
> > do that, so current code isn't broken.
> >
> Yep, I can add it to provide a little more robustness against bugs in callers.
>
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int hv_call_unmap_stats_page(enum hv_stats_object_type type,
> >> + const union hv_stats_object_identity *identity)
> >> {
> >> unsigned long flags;
> >> struct hv_input_unmap_stats_page *input;
> >> @@ -865,6 +935,19 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type
> type,
> >> return hv_result_to_errno(status);
> >> }
> >>
> >> +int hv_unmap_stats_page(enum hv_stats_object_type type, void *page_addr,
> >> + const union hv_stats_object_identity *identity)
> >> +{
> >> + int ret;
> >> +
> >> + ret = hv_call_unmap_stats_page(type, identity);
> >> +
> >> + if (mshv_use_overlay_gpfn() && page_addr)
> >> + __free_page(virt_to_page(page_addr));
> >> +
> >> + return ret;
> >> +}
> >> +
> >> int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
> >> u64 page_struct_count, u32 host_access,
> >> u32 flags, u8 acquire)
> >> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> >> index 2d0ad17acde6..71a8ab5db3b8 100644
> >> --- a/drivers/hv/mshv_root_main.c
> >> +++ b/drivers/hv/mshv_root_main.c
> >> @@ -841,7 +841,8 @@ mshv_vp_release(struct inode *inode, struct file *filp)
> >> return 0;
> >> }
> >>
> >> -static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index)
> >> +static void mshv_vp_stats_unmap(u64 partition_id, u32 vp_index,
> >> + void *stats_pages[])
> >> {
> >> union hv_stats_object_identity identity = {
> >> .vp.partition_id = partition_id,
> >> @@ -849,10 +850,10 @@ static void mshv_vp_stats_unmap(u64 partition_id, u32
> vp_index)
> >> };
> >>
> >> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
> >> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
> >> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
> >>
> >> identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
> >> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
> >> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
> >> }
> >>
> >> static int mshv_vp_stats_map(u64 partition_id, u32 vp_index,
> >> @@ -865,14 +866,14 @@ static int mshv_vp_stats_map(u64 partition_id, u32
> vp_index,
> >> int err;
> >>
> >> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
> >> - err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
> >> - &stats_pages[HV_STATS_AREA_SELF]);
> >> + err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
> >> + &stats_pages[HV_STATS_AREA_SELF]);
> >> if (err)
> >> return err;
> >>
> >> identity.vp.stats_area_type = HV_STATS_AREA_PARENT;
> >> - err = hv_call_map_stat_page(HV_STATS_OBJECT_VP, &identity,
> >> - &stats_pages[HV_STATS_AREA_PARENT]);
> >> + err = hv_map_stats_page(HV_STATS_OBJECT_VP, &identity,
> >> + &stats_pages[HV_STATS_AREA_PARENT]);
> >> if (err)
> >> goto unmap_self;
> >>
> >> @@ -880,7 +881,7 @@ static int mshv_vp_stats_map(u64 partition_id, u32
> vp_index,
> >>
> >> unmap_self:
> >> identity.vp.stats_area_type = HV_STATS_AREA_SELF;
> >> - hv_call_unmap_stat_page(HV_STATS_OBJECT_VP, &identity);
> >> + hv_unmap_stats_page(HV_STATS_OBJECT_VP, NULL, &identity);
> >> return err;
> >> }
> >>
> >> @@ -988,7 +989,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition
> *partition,
> >> kfree(vp);
> >> unmap_stats_pages:
> >> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> >> - mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
> >> + mshv_vp_stats_unmap(partition->pt_id, args.vp_index, stats_pages);
> >> unmap_ghcb_page:
> >> if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
> >> hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
> >> @@ -1740,7 +1741,8 @@ static void destroy_partition(struct mshv_partition
> *partition)
> >> continue;
> >>
> >> if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
> >> - mshv_vp_stats_unmap(partition->pt_id, vp-
> >vp_index);
> >> + mshv_vp_stats_unmap(partition->pt_id, vp->vp_index,
> >> + (void **)vp->vp_stats_pages);
> >>
> >> if (vp->vp_register_page) {
> >> (void)hv_unmap_vp_state_page(partition->pt_id,
> >> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> >> index ff4325fb623a..f66565106d21 100644
> >> --- a/include/hyperv/hvgdk_mini.h
> >> +++ b/include/hyperv/hvgdk_mini.h
> >> @@ -493,6 +493,7 @@ union hv_vp_assist_msr_contents { /*
> >> HV_REGISTER_VP_ASSIST_PAGE */
> >> #define HVCALL_GET_PARTITION_PROPERTY_EX 0x0101
> >> #define HVCALL_MMIO_READ 0x0106
> >> #define HVCALL_MMIO_WRITE 0x0107
> >> +#define HVCALL_MAP_STATS_PAGE2 0x0131
> >>
> >> /* HV_HYPERCALL_INPUT */
> >> #define HV_HYPERCALL_RESULT_MASK GENMASK_ULL(15, 0)
> >> diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
> >> index bf2ce27dfcc5..064bf735cab6 100644
> >> --- a/include/hyperv/hvhdk_mini.h
> >> +++ b/include/hyperv/hvhdk_mini.h
> >> @@ -177,6 +177,13 @@ struct hv_input_map_stats_page {
> >> union hv_stats_object_identity identity;
> >> } __packed;
> >>
> >> +struct hv_input_map_stats_page2 {
> >> + u32 type; /* enum hv_stats_object_type */
> >> + u32 padding;
> >> + union hv_stats_object_identity identity;
> >> + u64 map_location;
> >> +} __packed;
> >> +
> >> struct hv_output_map_stats_page {
> >> u64 map_location;
> >> } __packed;
> >> --
> >> 2.34.1
> >>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings
2025-09-29 18:19 ` Nuno Das Neves
2025-09-30 23:14 ` Wei Liu
@ 2025-10-03 16:31 ` Stanislav Kinsburskii
2025-10-03 17:30 ` Wei Liu
1 sibling, 1 reply; 24+ messages in thread
From: Stanislav Kinsburskii @ 2025-10-03 16:31 UTC (permalink / raw)
To: Nuno Das Neves
Cc: linux-hyperv, linux-kernel, prapal, easwar.hariharan, tiala,
anirudh, paekkaladevi, kys, haiyangz, wei.liu, decui
On Mon, Sep 29, 2025 at 11:19:51AM -0700, Nuno Das Neves wrote:
> On 9/26/2025 4:12 PM, Stanislav Kinsburskii wrote:
> > On Fri, Sep 26, 2025 at 09:23:10AM -0700, Nuno Das Neves wrote:
> >> There are some differences in how L1VH partitions must map stats and vp
> >> state pages, some of which are due to differences across hypervisor
> >> versions. Detect and handle these cases.
> >>
> >
> > I'm not sure that support for older and actully broken versions on
> > hypervisor need to be usptreamed, as these versions will go away sooner
> > or later and this support will become dead weight.
> >
> As far as I know, these changes are relevant for shipped versions of the
> hypervisor - they are not 'broken' except in some very specific cases
> (live migration on L1VH, I think?)
>
I'm not sure I understand what "shipped version" of hypervisor actually
is.
As of today, the hypervisor is close source and the only product where
it's used is Azure. In Azure, the older versions of hypervisor are
replaced with newer on regular basis.
> The hypervisor team added a feature bit for these changes so that both old
> and new versions of these APIs can be supported.
>
> > I think we should upstrem only the changes needed for the new versiong
> > of hypervisors instead and carry legacy support out of tree until it
> > becomes obsoleted.
> >
> Which version do you suggest to be the cutoff?
>
> I'd prefer to support as many versions of the hypervisor as we can, as
> long as they are at all relevant. We can remove the support later.
> Removing prematurely just creates friction. Inevitably some users will
> find themselves running on an older hypervisor and then it just fails
> with a cryptic error. This includes myself, since I test L1VH on Azure
> which typically has older hypervisor versions.
>
Given that these changes are expected to land to a newly released
kernel, it will take time until this kernel gets to production. At that
moment it's highly likley that the older versions of hypervisor you are
trying to support here will be gone for good.
Even if they won't be gone, they will be obsoleted and intended to be
replaced which effecitively makes this support of older versions a
dead weight, which - if needed to be caried - is cleaner to keep in house
and drop when apporiate than keeping in the upstream code base.
Thanks,
Stas
> Nuno
>
> > Thanks,
> > Stanislav
> >
> >
> >> Patch 1:
> >> Fix for the logic of when to map the vp stats page for the root scheduler.
> >>
> >> Patch 2-3:
> >> Add HVCALL_GET_PARTITION_PROPERTY_EX and use it to query "vmm capabilities" on
> >> module init.
> >>
> >> Patches 4-5:
> >> Check a feature bit in vmm capabilities, to take a new code path for mapping
> >> stats and vp state pages. In this case, the stats and vp state pages must be
> >> allocated by Linux, and a new hypercall HVCALL_MAP_VP_STATS_PAGE2 must be used
> >> to map the stats page.
> >>
> >> ---
> >> v4:
> >> - Fixed some __packed attributes on unions [Stanislav]
> >> - Cleaned up mshv_init_vmm_caps() [Stanislav]
> >> - Cleaned up loop in hv_call_map_stats_page2() [Stanislav]
> >>
> >> v3:
> >> https://lore.kernel.org/linux-hyperv/1758066262-15477-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
> >> - Fix bug in patch 4, in mshv_partition_ioctl_create_vp() cleanup path
> >> [kernel test robot]
> >> - Make hv_unmap_vp_state_page() use struct page to match hv_map_vp_state_page()
> >> - Remove SELF == PARENT check which doesn't belong in patch 5 [Easwar]
> >>
> >> v2:
> >> https://lore.kernel.org/linux-hyperv/1757546089-2002-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
> >> - Remove patch falling back to SELF page if PARENT mapping fails [Easwar]
> >> (To be included in a future series)
> >> - Fix formatting of function definitions [Easwar]
> >> - Fix some wording in commit messages [Praveen]
> >> - Proceed with driver init even if getting vmm capabilities fails [Anirudh]
> >>
> >> v1:
> >> https://lore.kernel.org/linux-hyperv/1756428230-3599-1-git-send-email-nunodasneves@linux.microsoft.com/T/#t
> >>
> >> ---
> >> Jinank Jain (2):
> >> mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
> >> mshv: Introduce new hypercall to map stats page for L1VH partitions
> >>
> >> Nuno Das Neves (1):
> >> mshv: Only map vp->vp_stats_pages if on root scheduler
> >>
> >> Purna Pavan Chandra Aekkaladevi (2):
> >> mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall
> >> mshv: Get the vmm capabilities offered by the hypervisor
> >>
> >> drivers/hv/mshv_root.h | 24 +++--
> >> drivers/hv/mshv_root_hv_call.c | 185 +++++++++++++++++++++++++++++++--
> >> drivers/hv/mshv_root_main.c | 127 ++++++++++++----------
> >> include/hyperv/hvgdk_mini.h | 2 +
> >> include/hyperv/hvhdk.h | 40 +++++++
> >> include/hyperv/hvhdk_mini.h | 33 ++++++
> >> 6 files changed, 337 insertions(+), 74 deletions(-)
> >>
> >> --
> >> 2.34.1
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings
2025-10-03 16:31 ` Stanislav Kinsburskii
@ 2025-10-03 17:30 ` Wei Liu
0 siblings, 0 replies; 24+ messages in thread
From: Wei Liu @ 2025-10-03 17:30 UTC (permalink / raw)
To: Stanislav Kinsburskii
Cc: Nuno Das Neves, linux-hyperv, linux-kernel, prapal,
easwar.hariharan, tiala, anirudh, paekkaladevi, kys, haiyangz,
wei.liu, decui
On Fri, Oct 03, 2025 at 09:31:41AM -0700, Stanislav Kinsburskii wrote:
> On Mon, Sep 29, 2025 at 11:19:51AM -0700, Nuno Das Neves wrote:
> > On 9/26/2025 4:12 PM, Stanislav Kinsburskii wrote:
> > > On Fri, Sep 26, 2025 at 09:23:10AM -0700, Nuno Das Neves wrote:
> > >> There are some differences in how L1VH partitions must map stats and vp
> > >> state pages, some of which are due to differences across hypervisor
> > >> versions. Detect and handle these cases.
> > >>
> > >
> > > I'm not sure that support for older and actully broken versions on
> > > hypervisor need to be usptreamed, as these versions will go away sooner
> > > or later and this support will become dead weight.
> > >
> > As far as I know, these changes are relevant for shipped versions of the
> > hypervisor - they are not 'broken' except in some very specific cases
> > (live migration on L1VH, I think?)
> >
>
> I'm not sure I understand what "shipped version" of hypervisor actually
> is.
> As of today, the hypervisor is close source and the only product where
> it's used is Azure. In Azure, the older versions of hypervisor are
> replaced with newer on regular basis.
>
> > The hypervisor team added a feature bit for these changes so that both old
> > and new versions of these APIs can be supported.
> >
> > > I think we should upstrem only the changes needed for the new versiong
> > > of hypervisors instead and carry legacy support out of tree until it
> > > becomes obsoleted.
> > >
> > Which version do you suggest to be the cutoff?
> >
> > I'd prefer to support as many versions of the hypervisor as we can, as
> > long as they are at all relevant. We can remove the support later.
> > Removing prematurely just creates friction. Inevitably some users will
> > find themselves running on an older hypervisor and then it just fails
> > with a cryptic error. This includes myself, since I test L1VH on Azure
> > which typically has older hypervisor versions.
> >
>
> Given that these changes are expected to land to a newly released
> kernel, it will take time until this kernel gets to production. At that
> moment it's highly likley that the older versions of hypervisor you are
> trying to support here will be gone for good.
No. This is not 100% certain. The schedule is outside of our control.
Who knows if there is a specialized Azure cluster that is not updated
for some reason.
Interested parties have already started experimenting with this new
setup. I just point them to the upstream tree whenever I can.
> Even if they won't be gone, they will be obsoleted and intended to be
> replaced which effecitively makes this support of older versions a
> dead weight, which - if needed to be caried - is cleaner to keep in house
> and drop when apporiate than keeping in the upstream code base.
>
While the maintenance burden argument generally applies, the burden in
this case is not big. It is totally fine to have the code.
Wei
> Thanks,
> Stas
>
> > Nuno
> >
> > > Thanks,
> > > Stanislav
> > >
> > >
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-10-02 0:02 ` Michael Kelley
@ 2025-10-04 15:25 ` Michael Kelley
2025-10-10 19:08 ` Nuno Das Neves
0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-10-04 15:25 UTC (permalink / raw)
To: Michael Kelley, Nuno Das Neves, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
From: Michael Kelley <mhklinux@outlook.com> Sent: Wednesday, October 1, 2025 5:03 PM
>
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, October 1, 2025 3:56 PM
> >
> > On 9/29/2025 10:56 AM, Michael Kelley wrote:
> > > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
> > >>
> > >> From: Jinank Jain <jinankjain@linux.microsoft.com>
> > >>
> > >> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
> > >> allocated and passed to the hypervisor to map VP state pages. This is
> > >> only needed on L1VH, and only on some (newer) versions of the
> > >> hypervisor, hence the need to check vmm_capabilities.
> > >>
> > >> Introduce functions hv_map/unmap_vp_state_page() to handle the
> > >> allocation and freeing.
> > >>
> > >> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
> > >> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> > >> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
> > >> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
> > >> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > >> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
> > >> ---
> > >> drivers/hv/mshv_root.h | 11 ++---
> > >> drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
> > >> drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
> > >> 3 files changed, 98 insertions(+), 50 deletions(-)
> > >>
> > >> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > >> index 0cb1e2589fe1..dbe2d1d0b22f 100644
> > >> --- a/drivers/hv/mshv_root.h
> > >> +++ b/drivers/hv/mshv_root.h
> > >> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> > >> /* Choose between pages and bytes */
> > >> struct hv_vp_state_data state_data, u64 page_count,
> > >> struct page **pages, u32 num_bytes, u8 *bytes);
> > >> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> - union hv_input_vtl input_vtl,
> > >> - struct page **state_page);
> > >> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> - union hv_input_vtl input_vtl);
> > >> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> + union hv_input_vtl input_vtl,
> > >> + struct page **state_page);
> > >> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> + struct page *state_page,
> > >> + union hv_input_vtl input_vtl);
> > >> int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
> > >> u64 connection_partition_id, struct hv_port_info *port_info,
> > >> u8 port_vtl, u8 min_connection_vtl, int node);
> > >> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > >> index 3fd3cce23f69..98c6278ff151 100644
> > >> --- a/drivers/hv/mshv_root_hv_call.c
> > >> +++ b/drivers/hv/mshv_root_hv_call.c
> > >> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
> > >> return ret;
> > >> }
> > >>
> > >> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> - union hv_input_vtl input_vtl,
> > >> - struct page **state_page)
> > >> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> + union hv_input_vtl input_vtl,
> > >> + struct page **state_page)
> > >> {
> > >> struct hv_input_map_vp_state_page *input;
> > >> struct hv_output_map_vp_state_page *output;
> > >> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
> > >> input->type = type;
> > >> input->input_vtl = input_vtl;
> > >>
> > >> - status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
> > >
> > > This function must zero the input area before using it. Otherwise,
> > > flags.map_location_provided is uninitialized when *state_page is NULL. It will
> > > have whatever value was left by the previous user of hyperv_pcpu_input_arg,
> > > potentially producing bizarre results. And there's a reserved field that won't be
> > > set to zero.
> > >
> > Good catch, will add a memset().
> >
> > >> + if (*state_page) {
> > >> + input->flags.map_location_provided = 1;
> > >> + input->requested_map_location =
> > >> + page_to_pfn(*state_page);
> > >
> > > Technically, this should be page_to_hvpfn() since the PFN value is being sent to
> > > Hyper-V. I know root (and L1VH?) partitions must run with the same page size
> > > as the Hyper-V host, but it's better to not leave code buried here that will blow
> > > up if the "same page size requirement" should ever change.
> > >
> > Good point...I could change these calls, but the other way doesn't work, see below.
> >
> > > And after making the hypercall, there's an invocation of pfn_to_page(), which
> > > should account for the same. Unfortunately, there's not an existing hvpfn_to_page()
> > > function.
> > >
> > This seems like a tricky scenario to get right. In the root partition case, the
> > hypervisor allocates the page. That pfn could be some page within a larger Linux page.
> > Converting that to a Linux pfn (naively) means losing the original hvpfn since it gets
> > truncated, which is no good if we want to unmap it later. Also page_address() would
> > give the wrong virtual address.
> >
> > In other words, we'd have to completely change how we track these pages in order to
> > support this scenario, and the same goes for various other hypervisor APIs where the
> > hypervisor does the allocating. I think it's out of scope to try and address that
> > here, even in part, especially since we will be making assumptions about something
> > that may never happen.
>
> OK, yes the hypervisor allocating the page is a problem when Linux tracks it
> as a struct page. I'll agree it's out of current scope to change this.
>
> It makes me think about hv_synic_enable_regs() where the paravisor or hypervisor
> allocates the synic_message_page and synic_event_page. But that case should work
> OK with a regular guest with page size greater than 4K because the pages are tracked
> based on the guest kernel virtual address, not the PFN. So hv_synic_enable_regs()
> should work on ARM64 Linux guests with 64K page size and a paravisor, as well as
> for my postulated root partition with page size greater than 4K.
>
> When it matters, cases where the hypervisor or paravisor allocate pages to give
> to the guest will require careful handling to ensure they work for guest page sizes
> greater than 4K. That's useful information for future consideration. Thanks for the
> discussion.
>
Upon further reflection, I think there's a subtle bug in the case where the
hypervisor supplies the state page. This bug is present in the existing code, and
persists after these patches.
I'm assuming that the hypervisor supplied page is *not* part of the memory
assigned to the root partition when it was created. I.e., the supplied page is part
of the hypervisor's private memory. Or does the root partition Linux "give"
the hypervisor some memory for it to later return as one of these state pages?
Assuming the page did *not* originate in Linux, then the page provided by the
hypervisor doesn't actually have a "struct page" entry in the root partition Linux
instance. Doing pfn_to_page() works because it just does some address
arithmetic, but the resulting "struct page *" value points somewhere off the
end of the root partition's "struct page" array.
Then page_to_virt() is done on this somewhat bogus "struct page *" value.
page_to_virt() also just does some address arithmetic, so it "works". But it
assumes that the "struct page *" input value is good, and that Linux has a valid
virtual-to-physical direct mapping for the physical memory represented by
that input value. Unfortunately, that assumption might not always be true. I
think it works most of the time because Linux uses huge page mappings for
the direct map, and they may extend far enough beyond the root partition's
actual memory to cover this hypervisor page. Or maybe there's something
else going on that I'm not aware of and that allows the current code to work.
So please check my thinking.
The robust fix is to do like hv_synic_enable_regs(), where a page returned
by the hypervisor/paravisor is explicitly mapped in order to have a valid
virtual address in the root partition Linux.
Note that on ARM64, when CONFIG_DEBUG_VIRTUAL is set to catch errors
like this, page_to_virt() does additional checks on its input value, and would
generate a WARN_ON() in this case. x86/x64 does not do the additional checks.
Michael
>
> >
> > >> + }
> > >> +
> > >> + status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
> > >> + output);
> > >>
> > >> if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
> > >> if (hv_result_success(status))
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
2025-10-04 15:25 ` Michael Kelley
@ 2025-10-10 19:08 ` Nuno Das Neves
0 siblings, 0 replies; 24+ messages in thread
From: Nuno Das Neves @ 2025-10-10 19:08 UTC (permalink / raw)
To: Michael Kelley, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, prapal@linux.microsoft.com,
easwar.hariharan@linux.microsoft.com, tiala@microsoft.com,
anirudh@anirudhrb.com, paekkaladevi@linux.microsoft.com,
skinsburskii@linux.microsoft.com
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, Jinank Jain
On 10/4/2025 8:25 AM, Michael Kelley wrote:
> From: Michael Kelley <mhklinux@outlook.com> Sent: Wednesday, October 1, 2025 5:03 PM
>>
>> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, October 1, 2025 3:56 PM
>>>
>>> On 9/29/2025 10:56 AM, Michael Kelley wrote:
>>>> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>>>>>
>>>>> From: Jinank Jain <jinankjain@linux.microsoft.com>
>>>>>
>>>>> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
>>>>> allocated and passed to the hypervisor to map VP state pages. This is
>>>>> only needed on L1VH, and only on some (newer) versions of the
>>>>> hypervisor, hence the need to check vmm_capabilities.
>>>>>
>>>>> Introduce functions hv_map/unmap_vp_state_page() to handle the
>>>>> allocation and freeing.
>>>>>
>>>>> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
>>>>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>>>>> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
>>>>> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>>>>> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
>>>>> ---
>>>>> drivers/hv/mshv_root.h | 11 ++---
>>>>> drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
>>>>> drivers/hv/mshv_root_main.c | 76 +++++++++++++++++-----------------
>>>>> 3 files changed, 98 insertions(+), 50 deletions(-)
>>>>>
>>>>> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
>>>>> index 0cb1e2589fe1..dbe2d1d0b22f 100644
>>>>> --- a/drivers/hv/mshv_root.h
>>>>> +++ b/drivers/hv/mshv_root.h
>>>>> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
>>>>> /* Choose between pages and bytes */
>>>>> struct hv_vp_state_data state_data, u64 page_count,
>>>>> struct page **pages, u32 num_bytes, u8 *bytes);
>>>>> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> - union hv_input_vtl input_vtl,
>>>>> - struct page **state_page);
>>>>> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> - union hv_input_vtl input_vtl);
>>>>> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> + union hv_input_vtl input_vtl,
>>>>> + struct page **state_page);
>>>>> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> + struct page *state_page,
>>>>> + union hv_input_vtl input_vtl);
>>>>> int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
>>>>> u64 connection_partition_id, struct hv_port_info *port_info,
>>>>> u8 port_vtl, u8 min_connection_vtl, int node);
>>>>> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
>>>>> index 3fd3cce23f69..98c6278ff151 100644
>>>>> --- a/drivers/hv/mshv_root_hv_call.c
>>>>> +++ b/drivers/hv/mshv_root_hv_call.c
>>>>> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
>>>>> return ret;
>>>>> }
>>>>>
>>>>> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> - union hv_input_vtl input_vtl,
>>>>> - struct page **state_page)
>>>>> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> + union hv_input_vtl input_vtl,
>>>>> + struct page **state_page)
>>>>> {
>>>>> struct hv_input_map_vp_state_page *input;
>>>>> struct hv_output_map_vp_state_page *output;
>>>>> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>>>> input->type = type;
>>>>> input->input_vtl = input_vtl;
>>>>>
>>>>> - status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
>>>>
>>>> This function must zero the input area before using it. Otherwise,
>>>> flags.map_location_provided is uninitialized when *state_page is NULL. It will
>>>> have whatever value was left by the previous user of hyperv_pcpu_input_arg,
>>>> potentially producing bizarre results. And there's a reserved field that won't be
>>>> set to zero.
>>>>
>>> Good catch, will add a memset().
>>>
>>>>> + if (*state_page) {
>>>>> + input->flags.map_location_provided = 1;
>>>>> + input->requested_map_location =
>>>>> + page_to_pfn(*state_page);
>>>>
>>>> Technically, this should be page_to_hvpfn() since the PFN value is being sent to
>>>> Hyper-V. I know root (and L1VH?) partitions must run with the same page size
>>>> as the Hyper-V host, but it's better to not leave code buried here that will blow
>>>> up if the "same page size requirement" should ever change.
>>>>
>>> Good point...I could change these calls, but the other way doesn't work, see below.
>>>
>>>> And after making the hypercall, there's an invocation of pfn_to_page(), which
>>>> should account for the same. Unfortunately, there's not an existing hvpfn_to_page()
>>>> function.
>>>>
>>> This seems like a tricky scenario to get right. In the root partition case, the
>>> hypervisor allocates the page. That pfn could be some page within a larger Linux page.
>>> Converting that to a Linux pfn (naively) means losing the original hvpfn since it gets
>>> truncated, which is no good if we want to unmap it later. Also page_address() would
>>> give the wrong virtual address.
>>>
>>> In other words, we'd have to completely change how we track these pages in order to
>>> support this scenario, and the same goes for various other hypervisor APIs where the
>>> hypervisor does the allocating. I think it's out of scope to try and address that
>>> here, even in part, especially since we will be making assumptions about something
>>> that may never happen.
>>
>> OK, yes the hypervisor allocating the page is a problem when Linux tracks it
>> as a struct page. I'll agree it's out of current scope to change this.
>>
>> It makes me think about hv_synic_enable_regs() where the paravisor or hypervisor
>> allocates the synic_message_page and synic_event_page. But that case should work
>> OK with a regular guest with page size greater than 4K because the pages are tracked
>> based on the guest kernel virtual address, not the PFN. So hv_synic_enable_regs()
>> should work on ARM64 Linux guests with 64K page size and a paravisor, as well as
>> for my postulated root partition with page size greater than 4K.
>>
>> When it matters, cases where the hypervisor or paravisor allocate pages to give
>> to the guest will require careful handling to ensure they work for guest page sizes
>> greater than 4K. That's useful information for future consideration. Thanks for the
>> discussion.
>>
>
> Upon further reflection, I think there's a subtle bug in the case where the
> hypervisor supplies the state page. This bug is present in the existing code, and
> persists after these patches.
>
> I'm assuming that the hypervisor supplied page is *not* part of the memory
> assigned to the root partition when it was created. I.e., the supplied page is part
> of the hypervisor's private memory. Or does the root partition Linux "give"
> the hypervisor some memory for it to later return as one of these state pages?
>
It is a regular page that Linux deposits to that particular guest partition so it
can be used for this or other purposes (bookkeeping, page tables, stats pages,
etc...). It would have been deposited earlier, e.g. one of the pages deposited
just before HVCALL_INITIALIZE_PARTITION.
The hypervisor's pre-reserved pages which Linux doesn't touch are only used for a
few things, and never for guests as far as I know.
> Assuming the page did *not* originate in Linux, then the page provided by the
> hypervisor doesn't actually have a "struct page" entry in the root partition Linux
> instance. Doing pfn_to_page() works because it just does some address
> arithmetic, but the resulting "struct page *" value points somewhere off the
> end of the root partition's "struct page" array.
>
> Then page_to_virt() is done on this somewhat bogus "struct page *" value.
> page_to_virt() also just does some address arithmetic, so it "works". But it
> assumes that the "struct page *" input value is good, and that Linux has a valid
> virtual-to-physical direct mapping for the physical memory represented by
> that input value. Unfortunately, that assumption might not always be true. I
> think it works most of the time because Linux uses huge page mappings for
> the direct map, and they may extend far enough beyond the root partition's
> actual memory to cover this hypervisor page. Or maybe there's something
> else going on that I'm not aware of and that allows the current code to work.
> So please check my thinking.
>
> The robust fix is to do like hv_synic_enable_regs(), where a page returned
> by the hypervisor/paravisor is explicitly mapped in order to have a valid
> virtual address in the root partition Linux.
>
> Note that on ARM64, when CONFIG_DEBUG_VIRTUAL is set to catch errors
> like this, page_to_virt() does additional checks on its input value, and would
> generate a WARN_ON() in this case. x86/x64 does not do the additional checks.
>
> Michael
>
>>
>>>
>>>>> + }
>>>>> +
>>>>> + status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
>>>>> + output);
>>>>>
>>>>> if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
>>>>> if (hv_result_success(status))
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2025-10-10 19:08 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 1/5] mshv: Only map vp->vp_stats_pages if on root scheduler Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall Nuno Das Neves
2025-09-29 5:31 ` Tianyu Lan
2025-09-29 17:55 ` Michael Kelley
2025-10-01 22:32 ` Nuno Das Neves
2025-10-02 0:01 ` Michael Kelley
2025-09-26 16:23 ` [PATCH v4 3/5] mshv: Get the vmm capabilities offered by the hypervisor Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Nuno Das Neves
2025-09-29 5:31 ` Tianyu Lan
2025-09-29 17:56 ` Michael Kelley
2025-10-01 22:56 ` Nuno Das Neves
2025-10-02 0:02 ` Michael Kelley
2025-10-04 15:25 ` Michael Kelley
2025-10-10 19:08 ` Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions Nuno Das Neves
2025-09-29 17:57 ` Michael Kelley
2025-10-01 23:03 ` Nuno Das Neves
2025-10-02 0:16 ` Michael Kelley
2025-09-26 23:12 ` [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Stanislav Kinsburskii
2025-09-29 18:19 ` Nuno Das Neves
2025-09-30 23:14 ` Wei Liu
2025-10-03 16:31 ` Stanislav Kinsburskii
2025-10-03 17:30 ` Wei Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox