public inbox for linux-hyperv@vger.kernel.org
 help / color / mirror / Atom feed
From: Nuno Das Neves <nunodasneves@linux.microsoft.com>
To: Michael Kelley <mhklinux@outlook.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"prapal@linux.microsoft.com" <prapal@linux.microsoft.com>,
	"easwar.hariharan@linux.microsoft.com"
	<easwar.hariharan@linux.microsoft.com>,
	"tiala@microsoft.com" <tiala@microsoft.com>,
	"anirudh@anirudhrb.com" <anirudh@anirudhrb.com>,
	"paekkaladevi@linux.microsoft.com"
	<paekkaladevi@linux.microsoft.com>,
	"skinsburskii@linux.microsoft.com"
	<skinsburskii@linux.microsoft.com>
Cc: "kys@microsoft.com" <kys@microsoft.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"decui@microsoft.com" <decui@microsoft.com>,
	Jinank Jain <jinankjain@linux.microsoft.com>
Subject: Re: [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH
Date: Wed, 1 Oct 2025 15:56:29 -0700	[thread overview]
Message-ID: <8bacde85-6406-4b47-b11d-ed5054b270f2@linux.microsoft.com> (raw)
In-Reply-To: <SN6PR02MB4157E95BDC70A1F91228DAF7D41BA@SN6PR02MB4157.namprd02.prod.outlook.com>

On 9/29/2025 10:56 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Friday, September 26, 2025 9:23 AM
>>
>> From: Jinank Jain <jinankjain@linux.microsoft.com>
>>
>> Introduce mshv_use_overlay_gpfn() to check if a page needs to be
>> allocated and passed to the hypervisor to map VP state pages. This is
>> only needed on L1VH, and only on some (newer) versions of the
>> hypervisor, hence the need to check vmm_capabilities.
>>
>> Introduce functions hv_map/unmap_vp_state_page() to handle the
>> allocation and freeing.
>>
>> Signed-off-by: Jinank Jain <jinankjain@linux.microsoft.com>
>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>> Reviewed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
>> Reviewed-by: Easwar Hariharan <easwar.hariharan@linux.microsoft.com>
>> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>> Reviewed-by: Anirudh Rayabharam <anirudh@anirudhrb.com>
>> ---
>>  drivers/hv/mshv_root.h         | 11 ++---
>>  drivers/hv/mshv_root_hv_call.c | 61 ++++++++++++++++++++++++---
>>  drivers/hv/mshv_root_main.c    | 76 +++++++++++++++++-----------------
>>  3 files changed, 98 insertions(+), 50 deletions(-)
>>
>> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
>> index 0cb1e2589fe1..dbe2d1d0b22f 100644
>> --- a/drivers/hv/mshv_root.h
>> +++ b/drivers/hv/mshv_root.h
>> @@ -279,11 +279,12 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
>>  			 /* Choose between pages and bytes */
>>  			 struct hv_vp_state_data state_data, u64 page_count,
>>  			 struct page **pages, u32 num_bytes, u8 *bytes);
>> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> -			      union hv_input_vtl input_vtl,
>> -			      struct page **state_page);
>> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> -				union hv_input_vtl input_vtl);
>> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> +			 union hv_input_vtl input_vtl,
>> +			 struct page **state_page);
>> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> +			   struct page *state_page,
>> +			   union hv_input_vtl input_vtl);
>>  int hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
>>  			u64 connection_partition_id, struct hv_port_info *port_info,
>>  			u8 port_vtl, u8 min_connection_vtl, int node);
>> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
>> index 3fd3cce23f69..98c6278ff151 100644
>> --- a/drivers/hv/mshv_root_hv_call.c
>> +++ b/drivers/hv/mshv_root_hv_call.c
>> @@ -526,9 +526,9 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
>>  	return ret;
>>  }
>>
>> -int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> -			      union hv_input_vtl input_vtl,
>> -			      struct page **state_page)
>> +static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> +				     union hv_input_vtl input_vtl,
>> +				     struct page **state_page)
>>  {
>>  	struct hv_input_map_vp_state_page *input;
>>  	struct hv_output_map_vp_state_page *output;
>> @@ -547,7 +547,14 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>  		input->type = type;
>>  		input->input_vtl = input_vtl;
>>
>> -		status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input, output);
> 
> This function must zero the input area before using it. Otherwise,
> flags.map_location_provided is uninitialized when *state_page is NULL. It will
> have whatever value was left by the previous user of hyperv_pcpu_input_arg,
> potentially producing bizarre results. And there's a reserved field that won't be
> set to zero.
> 
Good catch, will add a memset().

>> +		if (*state_page) {
>> +			input->flags.map_location_provided = 1;
>> +			input->requested_map_location =
>> +				page_to_pfn(*state_page);
> 
> Technically, this should be page_to_hvpfn() since the PFN value is being sent to
> Hyper-V. I know root (and L1VH?) partitions must run with the same page size
> as the Hyper-V host, but it's better to not leave code buried here that will blow
> up if the "same page size requirement" should ever change.
> 
Good point...I could change these calls, but the other way doesn't work, see below.

> And after making the hypercall, there's an invocation of pfn_to_page(), which
> should account for the same. Unfortunately, there's not an existing hvpfn_to_page()
> function.
> 
This seems like a tricky scenario to get right. In the root partition case, the
hypervisor allocates the page. That pfn could be some page within a larger Linux page.
Converting that to a Linux pfn (naively) means losing the original hvpfn since it gets
truncated, which is no good if we want to unmap it later. Also page_address() would
give the wrong virtual address.

In other words, we'd have to completely change how we track these pages in order to
support this scenario, and the same goes for various other hypervisor APIs where the
hypervisor does the allocating. I think it's out of scope to try and address that
here, even in part, especially since we will be making assumptions about something
that may never happen.

>> +		}
>> +
>> +		status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
>> +					 output);
>>
>>  		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
>>  			if (hv_result_success(status))
>> @@ -565,8 +572,39 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>  	return ret;
>>  }
>>
>> -int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> -				union hv_input_vtl input_vtl)
>> +static bool mshv_use_overlay_gpfn(void)
>> +{
>> +	return hv_l1vh_partition() &&
>> +	       mshv_root.vmm_caps.vmm_can_provide_overlay_gpfn;
>> +}
>> +
>> +int hv_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> +			 union hv_input_vtl input_vtl,
>> +			 struct page **state_page)
>> +{
>> +	int ret = 0;
>> +	struct page *allocated_page = NULL;
>> +
>> +	if (mshv_use_overlay_gpfn()) {
>> +		allocated_page = alloc_page(GFP_KERNEL);
>> +		if (!allocated_page)
>> +			return -ENOMEM;
>> +		*state_page = allocated_page;
>> +	} else {
>> +		*state_page = NULL;
>> +	}
>> +
>> +	ret = hv_call_map_vp_state_page(partition_id, vp_index, type, input_vtl,
>> +					state_page);
>> +
>> +	if (ret && allocated_page)
>> +		__free_page(allocated_page);
> 
> For robustness, you might want to set *state_page = NULL here so the
> caller doesn't have a reference to the page that has been freed. I didn't
> see any cases where the caller incorrectly checks the returned
> *state_page value after an error, so the current code isn't broken.
> 
Sure, I can add it.

>> +
>> +	return ret;
>> +}
>> +
>> +static int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> +				       union hv_input_vtl input_vtl)
>>  {
>>  	unsigned long flags;
>>  	u64 status;
>> @@ -590,6 +628,17 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>>  	return hv_result_to_errno(status);
>>  }
>>
>> +int hv_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
>> +			   struct page *state_page, union hv_input_vtl input_vtl)
>> +{
>> +	int ret = hv_call_unmap_vp_state_page(partition_id, vp_index, type, input_vtl);
>> +
>> +	if (mshv_use_overlay_gpfn() && state_page)
>> +		__free_page(state_page);
>> +
>> +	return ret;
>> +}
>> +
>>  int hv_call_get_partition_property_ex(u64 partition_id, u64 property_code,
>>  				      u64 arg, void *property_value,
>>  				      size_t property_value_sz)
>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
>> index e199770ecdfa..2d0ad17acde6 100644
>> --- a/drivers/hv/mshv_root_main.c
>> +++ b/drivers/hv/mshv_root_main.c
>> @@ -890,7 +890,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition
>> *partition,
>>  {
>>  	struct mshv_create_vp args;
>>  	struct mshv_vp *vp;
>> -	struct page *intercept_message_page, *register_page, *ghcb_page;
>> +	struct page *intercept_msg_page, *register_page, *ghcb_page;
>>  	void *stats_pages[2];
>>  	long ret;
>>
>> @@ -908,28 +908,25 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>>  	if (ret)
>>  		return ret;
>>
>> -	ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
>> -					HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> -					input_vtl_zero,
>> -					&intercept_message_page);
>> +	ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>> +				   HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> +				   input_vtl_zero, &intercept_msg_page);
>>  	if (ret)
>>  		goto destroy_vp;
>>
>>  	if (!mshv_partition_encrypted(partition)) {
>> -		ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
>> -						HV_VP_STATE_PAGE_REGISTERS,
>> -						input_vtl_zero,
>> -						&register_page);
>> +		ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>> +					   HV_VP_STATE_PAGE_REGISTERS,
>> +					   input_vtl_zero, &register_page);
>>  		if (ret)
>>  			goto unmap_intercept_message_page;
>>  	}
>>
>>  	if (mshv_partition_encrypted(partition) &&
>>  	    is_ghcb_mapping_available()) {
>> -		ret = hv_call_map_vp_state_page(partition->pt_id, args.vp_index,
>> -						HV_VP_STATE_PAGE_GHCB,
>> -						input_vtl_normal,
>> -						&ghcb_page);
>> +		ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
>> +					   HV_VP_STATE_PAGE_GHCB,
>> +					   input_vtl_normal, &ghcb_page);
>>  		if (ret)
>>  			goto unmap_register_page;
>>  	}
>> @@ -960,7 +957,7 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>>  	atomic64_set(&vp->run.vp_signaled_count, 0);
>>
>>  	vp->vp_index = args.vp_index;
>> -	vp->vp_intercept_msg_page = page_to_virt(intercept_message_page);
>> +	vp->vp_intercept_msg_page = page_to_virt(intercept_msg_page);
>>  	if (!mshv_partition_encrypted(partition))
>>  		vp->vp_register_page = page_to_virt(register_page);
>>
>> @@ -993,21 +990,19 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
>>  	if (hv_scheduler_type == HV_SCHEDULER_TYPE_ROOT)
>>  		mshv_vp_stats_unmap(partition->pt_id, args.vp_index);
>>  unmap_ghcb_page:
>> -	if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available()) {
>> -		hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> -					    HV_VP_STATE_PAGE_GHCB,
>> -					    input_vtl_normal);
>> -	}
>> +	if (mshv_partition_encrypted(partition) && is_ghcb_mapping_available())
>> +		hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> +				       HV_VP_STATE_PAGE_GHCB, ghcb_page,
>> +				       input_vtl_normal);
>>  unmap_register_page:
>> -	if (!mshv_partition_encrypted(partition)) {
>> -		hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> -					    HV_VP_STATE_PAGE_REGISTERS,
>> -					    input_vtl_zero);
>> -	}
>> +	if (!mshv_partition_encrypted(partition))
>> +		hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> +				       HV_VP_STATE_PAGE_REGISTERS,
>> +				       register_page, input_vtl_zero);
>>  unmap_intercept_message_page:
>> -	hv_call_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> -				    HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> -				    input_vtl_zero);
>> +	hv_unmap_vp_state_page(partition->pt_id, args.vp_index,
>> +			       HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> +			       intercept_msg_page, input_vtl_zero);
>>  destroy_vp:
>>  	hv_call_delete_vp(partition->pt_id, args.vp_index);
>>  	return ret;
>> @@ -1748,24 +1743,27 @@ static void destroy_partition(struct mshv_partition *partition)
>>  				mshv_vp_stats_unmap(partition->pt_id, vp->vp_index);
>>
>>  			if (vp->vp_register_page) {
>> -				(void)hv_call_unmap_vp_state_page(partition->pt_id,
>> -								  vp->vp_index,
>> - 								  HV_VP_STATE_PAGE_REGISTERS,
>> -								  input_vtl_zero);
>> +				(void)hv_unmap_vp_state_page(partition->pt_id,
>> +							     vp->vp_index,
>> +							     HV_VP_STATE_PAGE_REGISTERS,
>> +							     virt_to_page(vp->vp_register_page),
>> +							     input_vtl_zero);
>>  				vp->vp_register_page = NULL;
>>  			}
>>
>> -			(void)hv_call_unmap_vp_state_page(partition->pt_id,
>> -							  vp->vp_index,
>> -							  HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> -							  input_vtl_zero);
>> +			(void)hv_unmap_vp_state_page(partition->pt_id,
>> +						     vp->vp_index,
>> +						     HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
>> +						     virt_to_page(vp->vp_intercept_msg_page),
>> +						     input_vtl_zero);
>>  			vp->vp_intercept_msg_page = NULL;
>>
>>  			if (vp->vp_ghcb_page) {
>> -				(void)hv_call_unmap_vp_state_page(partition->pt_id,
>> -								  vp->vp_index,
>> -								  HV_VP_STATE_PAGE_GHCB,
>> -								  input_vtl_normal);
>> +				(void)hv_unmap_vp_state_page(partition->pt_id,
>> +							     vp->vp_index,
>> +							     HV_VP_STATE_PAGE_GHCB,
>> +							     virt_to_page(vp->vp_ghcb_page),
>> +							     input_vtl_normal);
>>  				vp->vp_ghcb_page = NULL;
>>  			}
>>
>> --
>> 2.34.1
>>


  reply	other threads:[~2025-10-01 22:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-26 16:23 [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 1/5] mshv: Only map vp->vp_stats_pages if on root scheduler Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 2/5] mshv: Add the HVCALL_GET_PARTITION_PROPERTY_EX hypercall Nuno Das Neves
2025-09-29  5:31   ` Tianyu Lan
2025-09-29 17:55   ` Michael Kelley
2025-10-01 22:32     ` Nuno Das Neves
2025-10-02  0:01       ` Michael Kelley
2025-09-26 16:23 ` [PATCH v4 3/5] mshv: Get the vmm capabilities offered by the hypervisor Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 4/5] mshv: Allocate vp state page for HVCALL_MAP_VP_STATE_PAGE on L1VH Nuno Das Neves
2025-09-29  5:31   ` Tianyu Lan
2025-09-29 17:56   ` Michael Kelley
2025-10-01 22:56     ` Nuno Das Neves [this message]
2025-10-02  0:02       ` Michael Kelley
2025-10-04 15:25         ` Michael Kelley
2025-10-10 19:08           ` Nuno Das Neves
2025-09-26 16:23 ` [PATCH v4 5/5] mshv: Introduce new hypercall to map stats page for L1VH partitions Nuno Das Neves
2025-09-29 17:57   ` Michael Kelley
2025-10-01 23:03     ` Nuno Das Neves
2025-10-02  0:16       ` Michael Kelley
2025-09-26 23:12 ` [PATCH v4 0/5] mshv: Fixes for stats and vp state page mappings Stanislav Kinsburskii
2025-09-29 18:19   ` Nuno Das Neves
2025-09-30 23:14     ` Wei Liu
2025-10-03 16:31     ` Stanislav Kinsburskii
2025-10-03 17:30       ` Wei Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8bacde85-6406-4b47-b11d-ed5054b270f2@linux.microsoft.com \
    --to=nunodasneves@linux.microsoft.com \
    --cc=anirudh@anirudhrb.com \
    --cc=decui@microsoft.com \
    --cc=easwar.hariharan@linux.microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=jinankjain@linux.microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhklinux@outlook.com \
    --cc=paekkaladevi@linux.microsoft.com \
    --cc=prapal@linux.microsoft.com \
    --cc=skinsburskii@linux.microsoft.com \
    --cc=tiala@microsoft.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox