linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] hyperv: Fixes for get_vtl(void)
@ 2024-12-18 20:54 Roman Kisel
  2024-12-18 20:54 ` [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void) Roman Kisel
  2024-12-18 20:54 ` [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas " Roman Kisel
  0 siblings, 2 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-18 20:54 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

The get_vtl(void) function

* has got one bug when the code started using a wrong pointer type after
  refactoring, and also
* it doesn't adhere to the requirements of the Hypervisor Top-Level Funactional
  Specification[1, 2] as the code overlaps the input and output areas for a hypercall.

The first issue leads to a wrong 100% reproducible computation due to reading
a byte worth of data at a wrong offset. That in turn leads to using a nonsensical
value ("fortunately", could catch it easily!) for the current VTL when initiating
VMBus communications. As a repercussion from that, the system wouldn't boot. The
fix is straightforward: use the correct pointer type.

The second issue doesn't seem to lead to any reproducible breakage just yet. It is
fixed with using the output hypercall pages allocated per-CPU, and that isn't the
only or the most obvious choice so let me elaborate why that fix appears to be the
best one in my opinion out of the options I could conceive of.

An alternative approach could be to use an appropriately aligned space within the
input page that doesn't overlap with the input data, as a memory optimization.
Indeed, when considering a 1,000+ vCPU VM, allocating one page per-CPU makes the
system spend more time when booting and more space during its lifetime, let alone
that the kernel running in VTL2 is expected to be CPU and memory frugal. Although
saving on the hypercall output page allocation in just that function, we'd still
need to allocate the output pages later for other hypercalls to provide services
for the VTL0 guest (VTL2 works as an exclave of the hypervisor) so here we wouldn't
really save any memory in the long run.

One could also consider passing the input and output parameters in the registers
to get the function be faster potentially, and/or to avoid allocations altogether to
be able to call it from any context at any stage of booting the system. This function
is not in a hot path though, and if it ever is, adding support for the extended
fast hypercalls (that pass input and output in the XMM registers, here, due to
the size of output parameters for the GetVpregisters hypercall) will both make
the function faster and allocation-less. Then again, if either of that ever becomes
a concern.

I have validated the fixes by booting the fixed kernel in VTL2 up using OpenVMM and
OpenHCL[3, 4].

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
[2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
[3] https://openvmm.dev/guide/user_guide/openhcl.html
[4] https://github.com/microsoft/OpenVMM

Roman Kisel (2):
  hyperv: Fix pointer type for the output of the hypercall in
    get_vtl(void)
  hyperv: Do not overlap the input and output hypercall areas in
    get_vtl(void)

 arch/x86/hyperv/hv_init.c   | 6 +++---
 drivers/hv/hv_common.c      | 6 +++---
 include/hyperv/hvgdk_mini.h | 3 ---
 3 files changed, 6 insertions(+), 9 deletions(-)


base-commit: 4d4ace979a3066e5c940331571e6c1c3f280d1d3
-- 
2.34.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-18 20:54 [PATCH 0/2] hyperv: Fixes for get_vtl(void) Roman Kisel
@ 2024-12-18 20:54 ` Roman Kisel
  2024-12-19  2:45   ` Wei Liu
  2024-12-19 18:40   ` Nuno Das Neves
  2024-12-18 20:54 ` [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas " Roman Kisel
  1 sibling, 2 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-18 20:54 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
changed the type of the output pointer to `struct hv_register_assoc` from
`struct hv_get_vp_registers_output`. That leads to an incorrect computation,
and leaves the system broken.

Use the correct pointer type for the output of the GetVpRegisters hypercall.

Fixes: bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/hyperv/hv_init.c   | 6 +++---
 include/hyperv/hvgdk_mini.h | 3 ---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 3cf2a227d666..c7185c6a290b 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
 {
 	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
 	struct hv_input_get_vp_registers *input;
-	struct hv_register_assoc *output;
+	struct hv_get_vp_registers_output *output;
 	unsigned long flags;
 	u64 ret;
 
 	local_irq_save(flags);
 	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = (struct hv_register_assoc *)input;
+	output = (struct hv_get_vp_registers_output *)input;
 
 	memset(input, 0, struct_size(input, names, 1));
 	input->partition_id = HV_PARTITION_ID_SELF;
@@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
 
 	ret = hv_do_hypercall(control, input, output);
 	if (hv_result_success(ret)) {
-		ret = output->value.reg8 & HV_X64_VTL_MASK;
+		ret = output->as64.low & HV_X64_VTL_MASK;
 	} else {
 		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
 		BUG();
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index db3d1aaf7330..0b1a10828f33 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -1107,7 +1107,6 @@ union hv_register_value {
 	union hv_x64_pending_interruption_register pending_interruption;
 };
 
-#if defined(CONFIG_ARM64)
 /* HvGetVpRegisters returns an array of these output elements */
 struct hv_get_vp_registers_output {
 	union {
@@ -1124,8 +1123,6 @@ struct hv_get_vp_registers_output {
 	};
 };
 
-#endif /* CONFIG_ARM64 */
-
 struct hv_register_assoc {
 	u32 name;			/* enum hv_register_name */
 	u32 reserved1;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-18 20:54 [PATCH 0/2] hyperv: Fixes for get_vtl(void) Roman Kisel
  2024-12-18 20:54 ` [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void) Roman Kisel
@ 2024-12-18 20:54 ` Roman Kisel
  2024-12-19  2:42   ` Wei Liu
  1 sibling, 1 reply; 27+ messages in thread
From: Roman Kisel @ 2024-12-18 20:54 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2], disallows
overlapping of the input and output hypercall areas, and get_vtl(void) does
overlap them.

To fix this, enable allocation of the output hypercall pages when running in
the VTL mode and use the output hypercall page of the current vCPU for the
hypercall.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
[2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs

Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 2 +-
 drivers/hv/hv_common.c    | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index c7185c6a290b..90c9ea00273e 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
 
 	local_irq_save(flags);
 	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = (struct hv_get_vp_registers_output *)input;
+	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
 
 	memset(input, 0, struct_size(input, names, 1));
 	input->partition_id = HV_PARTITION_ID_SELF;
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c4fd07d9bf1a..5178beed6ca8 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -340,7 +340,7 @@ int __init hv_common_init(void)
 	BUG_ON(!hyperv_pcpu_input_arg);
 
 	/* Allocate the per-CPU state for output arg for root */
-	if (hv_root_partition) {
+	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
 		hyperv_pcpu_output_arg = alloc_percpu(void *);
 		BUG_ON(!hyperv_pcpu_output_arg);
 	}
@@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
 	void **inputarg, **outputarg;
 	u64 msr_vp_index;
 	gfp_t flags;
-	int pgcount = hv_root_partition ? 2 : 1;
+	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
 	void *mem;
 	int ret;
 
@@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
 		if (!mem)
 			return -ENOMEM;
 
-		if (hv_root_partition) {
+		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
 			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
 			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
 		}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-18 20:54 ` [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas " Roman Kisel
@ 2024-12-19  2:42   ` Wei Liu
  2024-12-19 18:19     ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Wei Liu @ 2024-12-19  2:42 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Wed, Dec 18, 2024 at 12:54:21PM -0800, Roman Kisel wrote:
> The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2], disallows
> overlapping of the input and output hypercall areas, and get_vtl(void) does
> overlap them.
> 
> To fix this, enable allocation of the output hypercall pages when running in
> the VTL mode and use the output hypercall page of the current vCPU for the
> hypercall.
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
> [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
> 
> Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 2 +-
>  drivers/hv/hv_common.c    | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index c7185c6a290b..90c9ea00273e 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
>  
>  	local_irq_save(flags);
>  	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = (struct hv_get_vp_registers_output *)input;
> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);

You can do

	output = (char *)input + HV_HYP_PAGE_SIZE / 2;

to avoid the extra allocation.

The input and output structures surely won't take up half of the page.

Thanks,
Wei.

>  
>  	memset(input, 0, struct_size(input, names, 1));
>  	input->partition_id = HV_PARTITION_ID_SELF;
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index c4fd07d9bf1a..5178beed6ca8 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -340,7 +340,7 @@ int __init hv_common_init(void)
>  	BUG_ON(!hyperv_pcpu_input_arg);
>  
>  	/* Allocate the per-CPU state for output arg for root */
> -	if (hv_root_partition) {
> +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>  		hyperv_pcpu_output_arg = alloc_percpu(void *);
>  		BUG_ON(!hyperv_pcpu_output_arg);
>  	}
> @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  	void **inputarg, **outputarg;
>  	u64 msr_vp_index;
>  	gfp_t flags;
> -	int pgcount = hv_root_partition ? 2 : 1;
> +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>  	void *mem;
>  	int ret;
>  
> @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  		if (!mem)
>  			return -ENOMEM;
>  
> -		if (hv_root_partition) {
> +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>  		}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-18 20:54 ` [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void) Roman Kisel
@ 2024-12-19  2:45   ` Wei Liu
  2024-12-19 17:26     ` Roman Kisel
  2024-12-19 18:40   ` Nuno Das Neves
  1 sibling, 1 reply; 27+ messages in thread
From: Wei Liu @ 2024-12-19  2:45 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Wed, Dec 18, 2024 at 12:54:20PM -0800, Roman Kisel wrote:
> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
> changed the type of the output pointer to `struct hv_register_assoc` from
> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
> and leaves the system broken.
> 
> Use the correct pointer type for the output of the GetVpRegisters hypercall.
> 
> Fixes: bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")

This commit is not in the mainline kernel yet, so this tag is not
needed.

It will most likely to be wrong since I will need to rebase the
hyperv-next branch.

I can fold this patch into the original patch and leave your
Signed-off-by there.

Thanks,
Wei.

> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c   | 6 +++---
>  include/hyperv/hvgdk_mini.h | 3 ---
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 3cf2a227d666..c7185c6a290b 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
>  {
>  	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
>  	struct hv_input_get_vp_registers *input;
> -	struct hv_register_assoc *output;
> +	struct hv_get_vp_registers_output *output;
>  	unsigned long flags;
>  	u64 ret;
>  
>  	local_irq_save(flags);
>  	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = (struct hv_register_assoc *)input;
> +	output = (struct hv_get_vp_registers_output *)input;
>  
>  	memset(input, 0, struct_size(input, names, 1));
>  	input->partition_id = HV_PARTITION_ID_SELF;
> @@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
>  
>  	ret = hv_do_hypercall(control, input, output);
>  	if (hv_result_success(ret)) {
> -		ret = output->value.reg8 & HV_X64_VTL_MASK;
> +		ret = output->as64.low & HV_X64_VTL_MASK;
>  	} else {
>  		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>  		BUG();
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index db3d1aaf7330..0b1a10828f33 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -1107,7 +1107,6 @@ union hv_register_value {
>  	union hv_x64_pending_interruption_register pending_interruption;
>  };
>  
> -#if defined(CONFIG_ARM64)
>  /* HvGetVpRegisters returns an array of these output elements */
>  struct hv_get_vp_registers_output {
>  	union {
> @@ -1124,8 +1123,6 @@ struct hv_get_vp_registers_output {
>  	};
>  };
>  
> -#endif /* CONFIG_ARM64 */
> -
>  struct hv_register_assoc {
>  	u32 name;			/* enum hv_register_name */
>  	u32 reserved1;
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19  2:45   ` Wei Liu
@ 2024-12-19 17:26     ` Roman Kisel
  0 siblings, 0 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 17:26 UTC (permalink / raw)
  To: Wei Liu
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, linux-hyperv, linux-kernel,
	x86, apais, benhill, ssengar, sunilmut, vdso



On 12/18/2024 6:45 PM, Wei Liu wrote:
> On Wed, Dec 18, 2024 at 12:54:20PM -0800, Roman Kisel wrote:
>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>> changed the type of the output pointer to `struct hv_register_assoc` from
>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>> and leaves the system broken.
>>
>> Use the correct pointer type for the output of the GetVpRegisters hypercall.
>>
>> Fixes: bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
> 
> This commit is not in the mainline kernel yet, so this tag is not
> needed.
Got it, thanks for the explanation!

> 
> It will most likely to be wrong since I will need to rebase the
> hyperv-next branch.
> 
> I can fold this patch into the original patch and leave your
> Signed-off-by there.
That would be great and appreciated very much, thank you!

> 
> Thanks,
> Wei.
Thank you,
Roman

> 
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   arch/x86/hyperv/hv_init.c   | 6 +++---
>>   include/hyperv/hvgdk_mini.h | 3 ---
>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 3cf2a227d666..c7185c6a290b 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
>>   {
>>   	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
>>   	struct hv_input_get_vp_registers *input;
>> -	struct hv_register_assoc *output;
>> +	struct hv_get_vp_registers_output *output;
>>   	unsigned long flags;
>>   	u64 ret;
>>   
>>   	local_irq_save(flags);
>>   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> -	output = (struct hv_register_assoc *)input;
>> +	output = (struct hv_get_vp_registers_output *)input;
>>   
>>   	memset(input, 0, struct_size(input, names, 1));
>>   	input->partition_id = HV_PARTITION_ID_SELF;
>> @@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
>>   
>>   	ret = hv_do_hypercall(control, input, output);
>>   	if (hv_result_success(ret)) {
>> -		ret = output->value.reg8 & HV_X64_VTL_MASK;
>> +		ret = output->as64.low & HV_X64_VTL_MASK;
>>   	} else {
>>   		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>>   		BUG();
>> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
>> index db3d1aaf7330..0b1a10828f33 100644
>> --- a/include/hyperv/hvgdk_mini.h
>> +++ b/include/hyperv/hvgdk_mini.h
>> @@ -1107,7 +1107,6 @@ union hv_register_value {
>>   	union hv_x64_pending_interruption_register pending_interruption;
>>   };
>>   
>> -#if defined(CONFIG_ARM64)
>>   /* HvGetVpRegisters returns an array of these output elements */
>>   struct hv_get_vp_registers_output {
>>   	union {
>> @@ -1124,8 +1123,6 @@ struct hv_get_vp_registers_output {
>>   	};
>>   };
>>   
>> -#endif /* CONFIG_ARM64 */
>> -
>>   struct hv_register_assoc {
>>   	u32 name;			/* enum hv_register_name */
>>   	u32 reserved1;
>> -- 
>> 2.34.1
>>

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-19  2:42   ` Wei Liu
@ 2024-12-19 18:19     ` Roman Kisel
  2024-12-19 21:35       ` Wei Liu
  2024-12-19 21:37       ` Michael Kelley
  0 siblings, 2 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 18:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, linux-hyperv, linux-kernel,
	x86, apais, benhill, ssengar, sunilmut, vdso



On 12/18/2024 6:42 PM, Wei Liu wrote:
> On Wed, Dec 18, 2024 at 12:54:21PM -0800, Roman Kisel wrote:
>> The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2], disallows
>> overlapping of the input and output hypercall areas, and get_vtl(void) does
>> overlap them.
>>
>> To fix this, enable allocation of the output hypercall pages when running in
>> the VTL mode and use the output hypercall page of the current vCPU for the
>> hypercall.
>>
>> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
>> [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
>>
>> Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   arch/x86/hyperv/hv_init.c | 2 +-
>>   drivers/hv/hv_common.c    | 6 +++---
>>   2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index c7185c6a290b..90c9ea00273e 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
>>   
>>   	local_irq_save(flags);
>>   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> -	output = (struct hv_get_vp_registers_output *)input;
>> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> 
> You can do
> 
> 	output = (char *)input + HV_HYP_PAGE_SIZE / 2;
> 
> to avoid the extra allocation.
> 
> The input and output structures surely won't take up half of the page.
Agreed on the both counts! I do think that the attempt to save here
won't help much: the hypercall output per-CPU pages in the VTL mode are
needed just as in the dom0/root partition mode because this hypercall
isn't going to be the only one required.

In other words, we will have to allocate these pages anyway as we evolve
the code; we are trying to save here what is going to be spent anyway. 
Sort of, kicking the can down the road as the saying goes :)

I do understand that within the code that is already merged, there is
just one this place in this function where the hypercall that returns
data is used. And the proposed approach makes the code self-explanatory:
```
output = *this_cpu_ptr(hyperv_pcpu_output_arg);
```

as opposed to

```
output = (char *)input + HV_HYP_PAGE_SIZE / 2;
```

or, as it existed,

```
output = (struct hv_get_vp_registers_output *)input;
```

which both do require a good comment I believe.

There will surely be more hypercall usage in the VTL mode that return
data and require the output pages as we progress with upstreaming the
VTL patches. Enabling the hypercall output pages allows to fix the
function in question in a very natural way, making it possible to
replace with some future `hv_get_vp_register` that would work for both
dom0 and VTL mode just the same.

All told, if you believe that we should make this patch a one-liner, 
I'll do as you suggested.

> 
> Thanks,
> Wei.
Thank you,
Roman

> 
>>   
>>   	memset(input, 0, struct_size(input, names, 1));
>>   	input->partition_id = HV_PARTITION_ID_SELF;
>> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
>> index c4fd07d9bf1a..5178beed6ca8 100644
>> --- a/drivers/hv/hv_common.c
>> +++ b/drivers/hv/hv_common.c
>> @@ -340,7 +340,7 @@ int __init hv_common_init(void)
>>   	BUG_ON(!hyperv_pcpu_input_arg);
>>   
>>   	/* Allocate the per-CPU state for output arg for root */
>> -	if (hv_root_partition) {
>> +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>>   		hyperv_pcpu_output_arg = alloc_percpu(void *);
>>   		BUG_ON(!hyperv_pcpu_output_arg);
>>   	}
>> @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>   	void **inputarg, **outputarg;
>>   	u64 msr_vp_index;
>>   	gfp_t flags;
>> -	int pgcount = hv_root_partition ? 2 : 1;
>> +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>>   	void *mem;
>>   	int ret;
>>   
>> @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>   		if (!mem)
>>   			return -ENOMEM;
>>   
>> -		if (hv_root_partition) {
>> +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>>   			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>>   			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>>   		}
>> -- 
>> 2.34.1
>>

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-18 20:54 ` [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void) Roman Kisel
  2024-12-19  2:45   ` Wei Liu
@ 2024-12-19 18:40   ` Nuno Das Neves
  2024-12-19 19:11     ` Roman Kisel
  2024-12-19 19:13     ` Easwar Hariharan
  1 sibling, 2 replies; 27+ messages in thread
From: Nuno Das Neves @ 2024-12-19 18:40 UTC (permalink / raw)
  To: Roman Kisel, hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz,
	mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel,
	x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

On 12/18/2024 12:54 PM, Roman Kisel wrote:
> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
> changed the type of the output pointer to `struct hv_register_assoc` from
> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
> and leaves the system broken.
> 
My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
`struct hv_register_value`, since that is the more complete definition of a
register value. The output of the get_vp_registers hypercall is just an array
of these values.

Ideally we remove `struct hv_get_vp_registers_output` at some point, since
it serves the same role as `struct hv_register_value` but in a more limited
capacity.

Thanks
Nuno

> Use the correct pointer type for the output of the GetVpRegisters hypercall.
> 
> Fixes: bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c   | 6 +++---
>  include/hyperv/hvgdk_mini.h | 3 ---
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 3cf2a227d666..c7185c6a290b 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
>  {
>  	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
>  	struct hv_input_get_vp_registers *input;
> -	struct hv_register_assoc *output;
> +	struct hv_get_vp_registers_output *output;
>  	unsigned long flags;
>  	u64 ret;
>  
>  	local_irq_save(flags);
>  	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = (struct hv_register_assoc *)input;
> +	output = (struct hv_get_vp_registers_output *)input;
>  
>  	memset(input, 0, struct_size(input, names, 1));
>  	input->partition_id = HV_PARTITION_ID_SELF;
> @@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
>  
>  	ret = hv_do_hypercall(control, input, output);
>  	if (hv_result_success(ret)) {
> -		ret = output->value.reg8 & HV_X64_VTL_MASK;
> +		ret = output->as64.low & HV_X64_VTL_MASK;
>  	} else {
>  		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>  		BUG();
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index db3d1aaf7330..0b1a10828f33 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -1107,7 +1107,6 @@ union hv_register_value {
>  	union hv_x64_pending_interruption_register pending_interruption;
>  };
>  
> -#if defined(CONFIG_ARM64)
>  /* HvGetVpRegisters returns an array of these output elements */
>  struct hv_get_vp_registers_output {
>  	union {
> @@ -1124,8 +1123,6 @@ struct hv_get_vp_registers_output {
>  	};
>  };
>  
> -#endif /* CONFIG_ARM64 */
> -
>  struct hv_register_assoc {
>  	u32 name;			/* enum hv_register_name */
>  	u32 reserved1;


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 18:40   ` Nuno Das Neves
@ 2024-12-19 19:11     ` Roman Kisel
  2024-12-19 19:13     ` Easwar Hariharan
  1 sibling, 0 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 19:11 UTC (permalink / raw)
  To: Nuno Das Neves, hpa, kys, bp, dave.hansen, decui, eahariha,
	haiyangz, mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso



On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>> changed the type of the output pointer to `struct hv_register_assoc` from
>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>> and leaves the system broken.
>>
> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
> `struct hv_register_value`, since that is the more complete definition of a
> register value. The output of the get_vp_registers hypercall is just an array
> of these values.
> 
> Ideally we remove `struct hv_get_vp_registers_output` at some point, since
> it serves the same role as `struct hv_register_value` but in a more limited
> capacity.
Nuno, thank you very much for your help! Agreed, I will use `struct
  hv_register_value` in v2.

> 
> Thanks
> Nuno
Thank you,
Roman

> 
>> Use the correct pointer type for the output of the GetVpRegisters hypercall.
>>
>> Fixes: bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   arch/x86/hyperv/hv_init.c   | 6 +++---
>>   include/hyperv/hvgdk_mini.h | 3 ---
>>   2 files changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 3cf2a227d666..c7185c6a290b 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
>>   {
>>   	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
>>   	struct hv_input_get_vp_registers *input;
>> -	struct hv_register_assoc *output;
>> +	struct hv_get_vp_registers_output *output;
>>   	unsigned long flags;
>>   	u64 ret;
>>   
>>   	local_irq_save(flags);
>>   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> -	output = (struct hv_register_assoc *)input;
>> +	output = (struct hv_get_vp_registers_output *)input;
>>   
>>   	memset(input, 0, struct_size(input, names, 1));
>>   	input->partition_id = HV_PARTITION_ID_SELF;
>> @@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
>>   
>>   	ret = hv_do_hypercall(control, input, output);
>>   	if (hv_result_success(ret)) {
>> -		ret = output->value.reg8 & HV_X64_VTL_MASK;
>> +		ret = output->as64.low & HV_X64_VTL_MASK;
>>   	} else {
>>   		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>>   		BUG();
>> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
>> index db3d1aaf7330..0b1a10828f33 100644
>> --- a/include/hyperv/hvgdk_mini.h
>> +++ b/include/hyperv/hvgdk_mini.h
>> @@ -1107,7 +1107,6 @@ union hv_register_value {
>>   	union hv_x64_pending_interruption_register pending_interruption;
>>   };
>>   
>> -#if defined(CONFIG_ARM64)
>>   /* HvGetVpRegisters returns an array of these output elements */
>>   struct hv_get_vp_registers_output {
>>   	union {
>> @@ -1124,8 +1123,6 @@ struct hv_get_vp_registers_output {
>>   	};
>>   };
>>   
>> -#endif /* CONFIG_ARM64 */
>> -
>>   struct hv_register_assoc {
>>   	u32 name;			/* enum hv_register_name */
>>   	u32 reserved1;

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 18:40   ` Nuno Das Neves
  2024-12-19 19:11     ` Roman Kisel
@ 2024-12-19 19:13     ` Easwar Hariharan
  2024-12-19 19:23       ` Nuno Das Neves
  2024-12-19 20:00       ` Roman Kisel
  1 sibling, 2 replies; 27+ messages in thread
From: Easwar Hariharan @ 2024-12-19 19:13 UTC (permalink / raw)
  To: Nuno Das Neves
  Cc: Roman Kisel, hpa, kys, bp, dave.hansen, decui, haiyangz, mingo,
	mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel, x86,
	eahariha, apais, benhill, ssengar, sunilmut, vdso

On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>> changed the type of the output pointer to `struct hv_register_assoc` from
>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>> and leaves the system broken.
>>
> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
> `struct hv_register_value`, since that is the more complete definition of a
> register value. The output of the get_vp_registers hypercall is just an array
> of these values.
> 
> Ideally we remove `struct hv_get_vp_registers_output` at some point, since
> it serves the same role as `struct hv_register_value` but in a more limited
> capacity.
> 
> Thanks
> Nuno
> 

I had much the same conversation with Roman off-list yesterday.

The choice is between using hv_get_registers_output which is clearly the
output of the GetVpRegisters hypercall by name, albeit limited as you
said, and hv_register_value which is the more complete definition and
what the hypervisor actually returns, but does not currently include the
arm64 definitions in our copy of hvgdk_mini.h. hv_get_registers_output
and hv_register_value overlap in layout for Roman's purposes.

FWIW, I'm in favor of adding the arm64 definitions to hv_register_value
and using it for this get_vtl() patch.

This could be accompanied with migration of hv_get_vpreg128 in arm64/
and removal of struct hv_get_registers_output, or that could be deferred
to a later patch.

- Easwar


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 19:13     ` Easwar Hariharan
@ 2024-12-19 19:23       ` Nuno Das Neves
  2024-12-19 19:32         ` Easwar Hariharan
  2024-12-19 20:00       ` Roman Kisel
  1 sibling, 1 reply; 27+ messages in thread
From: Nuno Das Neves @ 2024-12-19 19:23 UTC (permalink / raw)
  To: Easwar Hariharan
  Cc: Roman Kisel, hpa, kys, bp, dave.hansen, decui, haiyangz, mingo,
	mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel, x86,
	apais, benhill, ssengar, sunilmut, vdso

On 12/19/2024 11:13 AM, Easwar Hariharan wrote:
> On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
>> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>>> changed the type of the output pointer to `struct hv_register_assoc` from
>>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>>> and leaves the system broken.
>>>
>> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
>> `struct hv_register_value`, since that is the more complete definition of a
>> register value. The output of the get_vp_registers hypercall is just an array
>> of these values.
>>
>> Ideally we remove `struct hv_get_vp_registers_output` at some point, since
>> it serves the same role as `struct hv_register_value` but in a more limited
>> capacity.
>>
>> Thanks
>> Nuno
>>
> 
> I had much the same conversation with Roman off-list yesterday.
> 
> The choice is between using hv_get_registers_output which is clearly the
> output of the GetVpRegisters hypercall by name, albeit limited as you

If it's desirable to have a more 'friendly' naming here, then I'd be okay with:
```
/* NOTE: Linux helper struct - NOT from Hyper-V code */
struct hv_output_get_vp_registers {
	DECLARE_FLEX_ARRAY(struct hv_register_value, values);
}
```
Note also the name is prefixed with "hv_output_" to match other hypercall outputs.

> said, and hv_register_value which is the more complete definition and
> what the hypervisor actually returns, but does not currently include the
> arm64 definitions in our copy of hvgdk_mini.h. hv_get_registers_output
> and hv_register_value overlap in layout for Roman's purposes.
> 
> FWIW, I'm in favor of adding the arm64 definitions to hv_register_value
> and using it for this get_vtl() patch.
> 

> This could be accompanied with migration of hv_get_vpreg128 in arm64/
> and removal of struct hv_get_registers_output, or that could be deferred
> to a later patch.

I'd be happy to submit a followup patch to update the arm64 code to use
hv_register_value, or a new struct as outlined above.

It is a pretty small change though, it might be easier to just include it in
this series.

Nuno

> 
> - Easwar


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 19:23       ` Nuno Das Neves
@ 2024-12-19 19:32         ` Easwar Hariharan
  2024-12-19 20:03           ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Easwar Hariharan @ 2024-12-19 19:32 UTC (permalink / raw)
  To: Nuno Das Neves
  Cc: eahariha, Roman Kisel, hpa, kys, bp, dave.hansen, decui, haiyangz,
	mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel,
	x86, apais, benhill, ssengar, sunilmut, vdso

On 12/19/2024 11:23 AM, Nuno Das Neves wrote:
> On 12/19/2024 11:13 AM, Easwar Hariharan wrote:
>> On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
>>> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>>>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>>>> changed the type of the output pointer to `struct hv_register_assoc` from
>>>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>>>> and leaves the system broken.
>>>>
>>> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
>>> `struct hv_register_value`, since that is the more complete definition of a
>>> register value. The output of the get_vp_registers hypercall is just an array
>>> of these values.
>>>
>>> Ideally we remove `struct hv_get_vp_registers_output` at some point, since
>>> it serves the same role as `struct hv_register_value` but in a more limited
>>> capacity.
>>>
>>> Thanks
>>> Nuno
>>>
>>
>> I had much the same conversation with Roman off-list yesterday.
>>
>> The choice is between using hv_get_registers_output which is clearly the
>> output of the GetVpRegisters hypercall by name, albeit limited as you
> 
> If it's desirable to have a more 'friendly' naming here, then I'd be okay with:
> ```
> /* NOTE: Linux helper struct - NOT from Hyper-V code */
> struct hv_output_get_vp_registers {
> 	DECLARE_FLEX_ARRAY(struct hv_register_value, values);
> }
> ```
> Note also the name is prefixed with "hv_output_" to match other hypercall outputs.

I like the idea of improving our code readability and consistency in
interface naming independently of the hypervisor. That comment should
allow for clarity when new definitions are imported.

> 
>> said, and hv_register_value which is the more complete definition and
>> what the hypervisor actually returns, but does not currently include the
>> arm64 definitions in our copy of hvgdk_mini.h. hv_get_registers_output
>> and hv_register_value overlap in layout for Roman's purposes.
>>
>> FWIW, I'm in favor of adding the arm64 definitions to hv_register_value
>> and using it for this get_vtl() patch.
>>
>> This could be accompanied with migration of hv_get_vpreg128 in arm64/
>> and removal of struct hv_get_registers_output, or that could be deferred
>> to a later patch.
> 
> I'd be happy to submit a followup patch to update the arm64 code to use
> hv_register_value, or a new struct as outlined above.
> 
> It is a pretty small change though, it might be easier to just include it in
> this series.
> 

Thank you! I'll leave it you and Roman to decide how you go about that.

- Easwar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 19:13     ` Easwar Hariharan
  2024-12-19 19:23       ` Nuno Das Neves
@ 2024-12-19 20:00       ` Roman Kisel
  2024-12-19 20:04         ` Easwar Hariharan
  1 sibling, 1 reply; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 20:00 UTC (permalink / raw)
  To: Easwar Hariharan, Nuno Das Neves
  Cc: hpa, kys, bp, dave.hansen, decui, haiyangz, mingo, mhklinux, tglx,
	tiala, wei.liu, linux-hyperv, linux-kernel, x86, apais, benhill,
	ssengar, sunilmut, vdso



On 12/19/2024 11:13 AM, Easwar Hariharan wrote:
> On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
>> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>>> changed the type of the output pointer to `struct hv_register_assoc` from
>>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>>> and leaves the system broken.
>>>
>> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
>> `struct hv_register_value`, since that is the more complete definition of a
>> register value. The output of the get_vp_registers hypercall is just an array
>> of these values.
>>
>> Ideally we remove `struct hv_get_vp_registers_output` at some point, since
>> it serves the same role as `struct hv_register_value` but in a more limited
>> capacity.
>>
>> Thanks
>> Nuno
>>
> 
> I had much the same conversation with Roman off-list yesterday.
Appreciate your help, Easwar!

> 
> The choice is between using hv_get_registers_output which is clearly the
> output of the GetVpRegisters hypercall by name, albeit limited as you
> said, and hv_register_value which is the more complete definition and
> what the hypervisor actually returns, but does not currently include the
> arm64 definitions in our copy of hvgdk_mini.h. hv_get_registers_output
> and hv_register_value overlap in layout for Roman's purposes.
> 
> FWIW, I'm in favor of adding the arm64 definitions to hv_register_value
> and using it for this get_vtl() patch.
I could do that, will wait to see if no objections.

> 
> This could be accompanied with migration of hv_get_vpreg128 in arm64/
> and removal of struct hv_get_registers_output, or that could be deferred
> to a later patch.
Having an equivalent of `hv_get_vpreg128` would be awesome on x64!
I'd bet something like that should exist in dom0/mshv already if you'd
like to have a dedicated function. So perhaps we could let mshv take
the lead with that?

If the desire is also to use the fast hypercall technique as
`hv_get_vpreg128` does, then we'd need fast hypercalls on x64 that
can use XMM registers (aka the fast extended hypercalls) that aren't
implemented in the hyperv drivers from my read of the code (although
it seems KVM knows how to do that when it emulates Hyper-V,
arch/x86/kvm/hyperv.c, struct kvm_hv_hcall *hc).

These considerations make me lean towards deferring hv_get_vpreg128-like
implementation to a later time.

> 
> - Easwar
> 

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 19:32         ` Easwar Hariharan
@ 2024-12-19 20:03           ` Roman Kisel
  0 siblings, 0 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 20:03 UTC (permalink / raw)
  To: Easwar Hariharan, Nuno Das Neves
  Cc: hpa, kys, bp, dave.hansen, decui, haiyangz, mingo, mhklinux, tglx,
	tiala, wei.liu, linux-hyperv, linux-kernel, x86, apais, benhill,
	ssengar, sunilmut, vdso



On 12/19/2024 11:32 AM, Easwar Hariharan wrote:
> On 12/19/2024 11:23 AM, Nuno Das Neves wrote:
>> On 12/19/2024 11:13 AM, Easwar Hariharan wrote:
>>> On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
>>>> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>>>>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
>>>>> changed the type of the output pointer to `struct hv_register_assoc` from
>>>>> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
>>>>> and leaves the system broken.
>>>>>
>>>> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
>>>> `struct hv_register_value`, since that is the more complete definition of a
>>>> register value. The output of the get_vp_registers hypercall is just an array
>>>> of these values.
>>>>
>>>> Ideally we remove `struct hv_get_vp_registers_output` at some point, since
>>>> it serves the same role as `struct hv_register_value` but in a more limited
>>>> capacity.
>>>>
>>>> Thanks
>>>> Nuno
>>>>
>>>
>>> I had much the same conversation with Roman off-list yesterday.
>>>
>>> The choice is between using hv_get_registers_output which is clearly the
>>> output of the GetVpRegisters hypercall by name, albeit limited as you
>>
>> If it's desirable to have a more 'friendly' naming here, then I'd be okay with:
>> ```
>> /* NOTE: Linux helper struct - NOT from Hyper-V code */
>> struct hv_output_get_vp_registers {
>> 	DECLARE_FLEX_ARRAY(struct hv_register_value, values);
>> }
>> ```
>> Note also the name is prefixed with "hv_output_" to match other hypercall outputs.
> 
> I like the idea of improving our code readability and consistency in
> interface naming independently of the hypervisor. That comment should
> allow for clarity when new definitions are imported.
> 
Thank you, Easwar and Nuno, I'll use that in v2!

>>
>>> said, and hv_register_value which is the more complete definition and
>>> what the hypervisor actually returns, but does not currently include the
>>> arm64 definitions in our copy of hvgdk_mini.h. hv_get_registers_output
>>> and hv_register_value overlap in layout for Roman's purposes.
>>>
>>> FWIW, I'm in favor of adding the arm64 definitions to hv_register_value
>>> and using it for this get_vtl() patch.
>>>
>>> This could be accompanied with migration of hv_get_vpreg128 in arm64/
>>> and removal of struct hv_get_registers_output, or that could be deferred
>>> to a later patch.
>>
>> I'd be happy to submit a followup patch to update the arm64 code to use
>> hv_register_value, or a new struct as outlined above.
>>
>> It is a pretty small change though, it might be easier to just include it in
>> this series.
>>
> 
> Thank you! I'll leave it you and Roman to decide how you go about that.
> 
> - Easwar

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 20:00       ` Roman Kisel
@ 2024-12-19 20:04         ` Easwar Hariharan
  2024-12-19 20:18           ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Easwar Hariharan @ 2024-12-19 20:04 UTC (permalink / raw)
  To: Roman Kisel
  Cc: Nuno Das Neves, eahariha, hpa, kys, bp, dave.hansen, decui,
	haiyangz, mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On 12/19/2024 12:00 PM, Roman Kisel wrote:
> 
> 
> On 12/19/2024 11:13 AM, Easwar Hariharan wrote:
>> On 12/19/2024 10:40 AM, Nuno Das Neves wrote:
>>> On 12/18/2024 12:54 PM, Roman Kisel wrote:
>>>> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/
>>>> hvhdk.h")
>>>> changed the type of the output pointer to `struct hv_register_assoc`
>>>> from
>>>> `struct hv_get_vp_registers_output`. That leads to an incorrect
>>>> computation,
>>>> and leaves the system broken.
>>>>
>>> My bad! But, lets not use `struct hv_get_registers_output`. Instead, use
>>> `struct hv_register_value`, since that is the more complete
>>> definition of a
>>> register value. The output of the get_vp_registers hypercall is just
>>> an array
>>> of these values.
>>>
>>> Ideally we remove `struct hv_get_vp_registers_output` at some point,
>>> since
>>> it serves the same role as `struct hv_register_value` but in a more
>>> limited
>>> capacity.
>>>
>>> Thanks
>>> Nuno
>>>
>>
>> I had much the same conversation with Roman off-list yesterday.
> Appreciate your help, Easwar!
> 
>>
>> The choice is between using hv_get_registers_output which is clearly the
>> output of the GetVpRegisters hypercall by name, albeit limited as you
>> said, and hv_register_value which is the more complete definition and
>> what the hypervisor actually returns, but does not currently include the
>> arm64 definitions in our copy of hvgdk_mini.h. hv_get_registers_output
>> and hv_register_value overlap in layout for Roman's purposes.
>>
>> FWIW, I'm in favor of adding the arm64 definitions to hv_register_value
>> and using it for this get_vtl() patch.
> I could do that, will wait to see if no objections.
> 
>>
>> This could be accompanied with migration of hv_get_vpreg128 in arm64/
>> and removal of struct hv_get_registers_output, or that could be deferred
>> to a later patch.
> Having an equivalent of `hv_get_vpreg128` would be awesome on x64!
> I'd bet something like that should exist in dom0/mshv already if you'd
> like to have a dedicated function. So perhaps we could let mshv take
> the lead with that?
> 
> If the desire is also to use the fast hypercall technique as
> `hv_get_vpreg128` does, then we'd need fast hypercalls on x64 that
> can use XMM registers (aka the fast extended hypercalls) that aren't
> implemented in the hyperv drivers from my read of the code (although
> it seems KVM knows how to do that when it emulates Hyper-V,
> arch/x86/kvm/hyperv.c, struct kvm_hv_hcall *hc).
> 
> These considerations make me lean towards deferring hv_get_vpreg128-like
> implementation to a later time.
>

To clarify, I didn't mean to include implementing extended fast
hypercalls with the XMM registers, or an implementation of
hv_get_vpreg128() for x64 in my suggestion.

Just to fix up the existing hv_get_vpreg128 in arch/arm64 to use
hv_register_value (or its Linux-specific helper as Nuno suggested).

Thanks,
Easwar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void)
  2024-12-19 20:04         ` Easwar Hariharan
@ 2024-12-19 20:18           ` Roman Kisel
  0 siblings, 0 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 20:18 UTC (permalink / raw)
  To: Easwar Hariharan
  Cc: Nuno Das Neves, hpa, kys, bp, dave.hansen, decui, haiyangz, mingo,
	mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel, x86,
	apais, benhill, ssengar, sunilmut, vdso



On 12/19/2024 12:04 PM, Easwar Hariharan wrote:

[snip]

>>
>>>
>>> This could be accompanied with migration of hv_get_vpreg128 in arm64/
>>> and removal of struct hv_get_registers_output, or that could be deferred
>>> to a later patch.

[snip]

> 
> To clarify, I didn't mean to include implementing extended fast
> hypercalls with the XMM registers, or an implementation of
> hv_get_vpreg128() for x64 in my suggestion.
> 
> Just to fix up the existing hv_get_vpreg128 in arch/arm64 to use
> hv_register_value (or its Linux-specific helper as Nuno suggested).
Oooh, I see, thanks! I'll try that, looks like that should be pretty
straightforward, and hopefully, it will :)

> 
> Thanks,
> Easwar

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-19 18:19     ` Roman Kisel
@ 2024-12-19 21:35       ` Wei Liu
  2024-12-19 21:37       ` Michael Kelley
  1 sibling, 0 replies; 27+ messages in thread
From: Wei Liu @ 2024-12-19 21:35 UTC (permalink / raw)
  To: Roman Kisel
  Cc: Wei Liu, hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz,
	mingo, mhklinux, nunodasneves, tglx, tiala, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Thu, Dec 19, 2024 at 10:19:07AM -0800, Roman Kisel wrote:
> 
> 
> On 12/18/2024 6:42 PM, Wei Liu wrote:
> > On Wed, Dec 18, 2024 at 12:54:21PM -0800, Roman Kisel wrote:
> > > The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2], disallows
> > > overlapping of the input and output hypercall areas, and get_vtl(void) does
> > > overlap them.
> > > 
> > > To fix this, enable allocation of the output hypercall pages when running in
> > > the VTL mode and use the output hypercall page of the current vCPU for the
> > > hypercall.
> > > 
> > > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
> > > [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
> > > 
> > > Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
> > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > ---
> > >   arch/x86/hyperv/hv_init.c | 2 +-
> > >   drivers/hv/hv_common.c    | 6 +++---
> > >   2 files changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > > index c7185c6a290b..90c9ea00273e 100644
> > > --- a/arch/x86/hyperv/hv_init.c
> > > +++ b/arch/x86/hyperv/hv_init.c
> > > @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
> > >   	local_irq_save(flags);
> > >   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > -	output = (struct hv_get_vp_registers_output *)input;
> > > +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > 
> > You can do
> > 
> > 	output = (char *)input + HV_HYP_PAGE_SIZE / 2;
> > 
> > to avoid the extra allocation.
> > 
> > The input and output structures surely won't take up half of the page.
> Agreed on the both counts! I do think that the attempt to save here
> won't help much: the hypercall output per-CPU pages in the VTL mode are
> needed just as in the dom0/root partition mode because this hypercall
> isn't going to be the only one required.
> 
> In other words, we will have to allocate these pages anyway as we evolve
> the code; we are trying to save here what is going to be spent anyway. Sort
> of, kicking the can down the road as the saying goes :)
> 

If you want this patch to be backported, then the smaller the change the
better.

In this particular case, I don't have a strong opinion. Your original
patch is small enough to be backported easily.

You can keep the patch as-is.

Thanks,
Wei.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-19 18:19     ` Roman Kisel
  2024-12-19 21:35       ` Wei Liu
@ 2024-12-19 21:37       ` Michael Kelley
  2024-12-19 23:39         ` Roman Kisel
  1 sibling, 1 reply; 27+ messages in thread
From: Michael Kelley @ 2024-12-19 21:37 UTC (permalink / raw)
  To: Roman Kisel, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 19, 2024 10:19 AM
> 
> On 12/18/2024 6:42 PM, Wei Liu wrote:
> > On Wed, Dec 18, 2024 at 12:54:21PM -0800, Roman Kisel wrote:

[...]

> >> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> >> index c7185c6a290b..90c9ea00273e 100644
> >> --- a/arch/x86/hyperv/hv_init.c
> >> +++ b/arch/x86/hyperv/hv_init.c
> >> @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
> >>
> >>   	local_irq_save(flags);
> >>   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >> -	output = (struct hv_get_vp_registers_output *)input;
> >> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> >
> > You can do
> >
> > 	output = (char *)input + HV_HYP_PAGE_SIZE / 2;
> >
> > to avoid the extra allocation.
> >
> > The input and output structures surely won't take up half of the page.
> Agreed on the both counts! I do think that the attempt to save here
> won't help much: the hypercall output per-CPU pages in the VTL mode are
> needed just as in the dom0/root partition mode because this hypercall
> isn't going to be the only one required.
> 
> In other words, we will have to allocate these pages anyway as we evolve
> the code; we are trying to save here what is going to be spent anyway.
> Sort of, kicking the can down the road as the saying goes :)
> 
> I do understand that within the code that is already merged, there is
> just one this place in this function where the hypercall that returns
> data is used. And the proposed approach makes the code self-explanatory:
> ```
> output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> ```
> 
> as opposed to
> 
> ```
> output = (char *)input + HV_HYP_PAGE_SIZE / 2;
> ```
> 
> or, as it existed,
> 
> ```
> output = (struct hv_get_vp_registers_output *)input;
> ```
> 
> which both do require a good comment I believe.
> 
> There will surely be more hypercall usage in the VTL mode that return
> data and require the output pages as we progress with upstreaming the
> VTL patches. Enabling the hypercall output pages allows to fix the
> function in question in a very natural way, making it possible to
> replace with some future `hv_get_vp_register` that would work for both
> dom0 and VTL mode just the same.
> 
> All told, if you believe that we should make this patch a one-liner,
> I'll do as you suggested.
> 

FWIW, Roman and I had this same discussion back in August. See [1].
I'll add one new thought that wasn't part of the August discussion.

To my knowledge, the hypercalls that *may* use a full page for input
and/or a full page for output don't actually *require* a full page. The
size of the input and output depends on how many "entries" the
hypercall is specified to process, where "entries" could be registers,
memory PFNs, or whatever. I would expect the code to invoke these
hypercalls must already deal with the case where the requested number
of entries causes the input or output size to exceed one page, so the
code just iterates making multiple invocations of the hypercall with
a "batch size" that fits in one page.

It would be perfectly reasonable to limit the batch size so that a
"batch" of input or output fits in a half page instead of a full page,
avoiding the need to allocate hyperv_pcpu_output_arg. Or if the
input and output sizes are not equal, use whatever input vs. output
partitioning of a single page make sense for that hypercall. The
tradeoff, of course, is having to make the hypercall more times
with smaller batches. But if the hypercall is not in a hot path, this
might be a reasonable tradeoff to avoid the additional memory
allocation. Or if the hypercall processing time per "entry" is high,
the added overhead of more invocations with smaller batches is
probably negligible compared to the time processing the entries.

This scheme could also be used in the existing root partition code
that is currently the only user of the hyperv_pcpu_output_arg.
I could see a valid argument being made to drop
hyperv_pcpu_output_arg entirely and just use smaller batches.

Or are there hypercalls where a smaller batch size doesn't work
at all or is a bad tradeoff for performance reasons? I know I'm not
familiar with *all* the hypercalls that might be used in root
partition or VTL code. If there are such hypercalls, I would be curious
to know about them.

Michael

[1] https://lore.kernel.org/linux-hyperv/SN6PR02MB415759676AEF931F030430FDD4BE2@SN6PR02MB4157.namprd02.prod.outlook.com/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-19 21:37       ` Michael Kelley
@ 2024-12-19 23:39         ` Roman Kisel
  2024-12-20  2:01           ` Michael Kelley
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Kisel @ 2024-12-19 23:39 UTC (permalink / raw)
  To: Michael Kelley, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev



On 12/19/2024 1:37 PM, Michael Kelley wrote:
> From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 19, 2024 10:19 AM

[...]

>>
>> There will surely be more hypercall usage in the VTL mode that return
>> data and require the output pages as we progress with upstreaming the
>> VTL patches. Enabling the hypercall output pages allows to fix the
>> function in question in a very natural way, making it possible to
>> replace with some future `hv_get_vp_register` that would work for both
>> dom0 and VTL mode just the same.
>>
>> All told, if you believe that we should make this patch a one-liner,
>> I'll do as you suggested.
>>
> 
> FWIW, Roman and I had this same discussion back in August. See [1].
> I'll add one new thought that wasn't part of the August discussion.
Michael, thank you very much for helping out in finding a better
solution!

> 
> To my knowledge, the hypercalls that *may* use a full page for input
> and/or a full page for output don't actually *require* a full page. The
> size of the input and output depends on how many "entries" the
> hypercall is specified to process, where "entries" could be registers,
> memory PFNs, or whatever. I would expect the code to invoke these
> hypercalls must already deal with the case where the requested number
> of entries causes the input or output size to exceed one page, so the
> code just iterates making multiple invocations of the hypercall with
> a "batch size" that fits in one page.
That is what I see in the code, too. The TLFS requires to use a page
worth of data maximum ("cannot overlap or cross page boundaries"), hence
the hypercall code shall chunk up the input data appropriately.

> 
> It would be perfectly reasonable to limit the batch size so that a
> "batch" of input or output fits in a half page instead of a full page,
> avoiding the need to allocate hyperv_pcpu_output_arg. Or if the
> input and output sizes are not equal, use whatever input vs. output
> partitioning of a single page make sense for that hypercall. The
> tradeoff, of course, is having to make the hypercall more times
> with smaller batches. But if the hypercall is not in a hot path, this
> might be a reasonable tradeoff to avoid the additional memory
> allocation. Or if the hypercall processing time per "entry" is high,
> the added overhead of more invocations with smaller batches is
> probably negligible compared to the time processing the entries.
The hypervisor yields control back to the guest if the hypervisor
spends more than ~a dozen 1e-6 sec in the hypercall processing, and
the processing isn't done yet. When yielding the control back, the
hypervisor doesn't advance the instruction pointer so the guest can
process interrupts on the vCPU (if the guest didn't mask them), and
get back to processing the hypercall. That helps the guest stay
responsive if the guest chose to send larger batches. For smaller
batches, have to consider the cost of invoking the hypercall as you
are pointing out. On the topic of saving CPU time, there are also fast
hypercalls that pass data in the CPU registers.

> 
> This scheme could also be used in the existing root partition code
> that is currently the only user of the hyperv_pcpu_output_arg.
> I could see a valid argument being made to drop
> hyperv_pcpu_output_arg entirely and just use smaller batches.
In my view, that is a reasonable idea to cut down on memory usage.
Here is what I get applying that general idea to this specific
situation (and sticking to 4KiB as the page size).

We are going to save 4KiB * N of memory where N is the number of cores
allocated to the root partition. Let us also introduce M that denotes
the amount of memory in KiB allocated to the root partition.

Given that the order of magnitude for N is 1 (log_10(N) ~= 1), and the
order of magnitude for M is 6..7, the savings (~=10^(N-M)=1e-5) look
rather meager to my eye. That might be a judgement call, I wouldn't
argue that.

What is worrisome is that the guest goes against the specification. The
specification decrees: the input and output areas for the hypercall
shall not cross page boundaries and shall not overlap.
Hence, the hypervisor is within its right to produce up to 4KiB of
output in response to up to 4KiBs of input, and we have:

```
sizeof(input) + sizeof(output) <= 2*sizeof(page)
```

But when the guest doesn't use the output page, we obviously have

```
sizeof(input) + sizeof(output) <= sizeof(page)
```
on the guest side so the contract is broken.

The hypervisor would need to know that the guest optimizes its memory
usage in this way, limiting what is allowed by the specification when
implementing any new hypercalls.

> 
> Or are there hypercalls where a smaller batch size doesn't work
> at all or is a bad tradeoff for performance reasons? I know I'm not
> familiar with *all* the hypercalls that might be used in root
> partition or VTL code. If there are such hypercalls, I would be curious
> to know about them.
Nothing that I could find in the specification. I wouldn't think that
justifies investing in creating specialized/special-cased functions
on the guest side without solid evidence that more code is needed. In
this particular case, one day, I'd love to replace `get_vtl` with
one generic function `hv_get_vp_registers` that works both for dom0 and
VTLs, plays by the book and does not require much/any explaining what is
going on inside it and why. I believe this will make maintenance easier.


> 
> Michael
> 
> [1] https://lore.kernel.org/linux-hyperv/SN6PR02MB415759676AEF931F030430FDD4BE2@SN6PR02MB4157.namprd02.prod.outlook.com/

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-19 23:39         ` Roman Kisel
@ 2024-12-20  2:01           ` Michael Kelley
  2024-12-20 19:13             ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Kelley @ 2024-12-20  2:01 UTC (permalink / raw)
  To: Roman Kisel, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 19, 2024 3:39 PM
> 
> On 12/19/2024 1:37 PM, Michael Kelley wrote:
> > From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 19,
> 2024 10:19 AM
> 
> [...]
> 
> >>
> >> There will surely be more hypercall usage in the VTL mode that return
> >> data and require the output pages as we progress with upstreaming the
> >> VTL patches. Enabling the hypercall output pages allows to fix the
> >> function in question in a very natural way, making it possible to
> >> replace with some future `hv_get_vp_register` that would work for both
> >> dom0 and VTL mode just the same.
> >>
> >> All told, if you believe that we should make this patch a one-liner,
> >> I'll do as you suggested.
> >>
> >
> > FWIW, Roman and I had this same discussion back in August. See [1].
> > I'll add one new thought that wasn't part of the August discussion.
> Michael, thank you very much for helping out in finding a better
> solution!
> 
> >
> > To my knowledge, the hypercalls that *may* use a full page for input
> > and/or a full page for output don't actually *require* a full page. The
> > size of the input and output depends on how many "entries" the
> > hypercall is specified to process, where "entries" could be registers,
> > memory PFNs, or whatever. I would expect the code to invoke these
> > hypercalls must already deal with the case where the requested number
> > of entries causes the input or output size to exceed one page, so the
> > code just iterates making multiple invocations of the hypercall with
> > a "batch size" that fits in one page.
> That is what I see in the code, too. The TLFS requires to use a page
> worth of data maximum ("cannot overlap or cross page boundaries"), hence
> the hypercall code shall chunk up the input data appropriately.
> 
> >
> > It would be perfectly reasonable to limit the batch size so that a
> > "batch" of input or output fits in a half page instead of a full page,
> > avoiding the need to allocate hyperv_pcpu_output_arg. Or if the
> > input and output sizes are not equal, use whatever input vs. output
> > partitioning of a single page make sense for that hypercall. The
> > tradeoff, of course, is having to make the hypercall more times
> > with smaller batches. But if the hypercall is not in a hot path, this
> > might be a reasonable tradeoff to avoid the additional memory
> > allocation. Or if the hypercall processing time per "entry" is high,
> > the added overhead of more invocations with smaller batches is
> > probably negligible compared to the time processing the entries.
> The hypervisor yields control back to the guest if the hypervisor
> spends more than ~a dozen 1e-6 sec in the hypercall processing, and
> the processing isn't done yet. When yielding the control back, the
> hypervisor doesn't advance the instruction pointer so the guest can
> process interrupts on the vCPU (if the guest didn't mask them), and
> get back to processing the hypercall. That helps the guest stay
> responsive if the guest chose to send larger batches. For smaller
> batches, have to consider the cost of invoking the hypercall as you
> are pointing out. On the topic of saving CPU time, there are also fast
> hypercalls that pass data in the CPU registers.
> 
> >
> > This scheme could also be used in the existing root partition code
> > that is currently the only user of the hyperv_pcpu_output_arg.
> > I could see a valid argument being made to drop
> > hyperv_pcpu_output_arg entirely and just use smaller batches.
> In my view, that is a reasonable idea to cut down on memory usage.
> Here is what I get applying that general idea to this specific
> situation (and sticking to 4KiB as the page size).
> 
> We are going to save 4KiB * N of memory where N is the number of cores
> allocated to the root partition. Let us also introduce M that denotes
> the amount of memory in KiB allocated to the root partition.
> 
> Given that the order of magnitude for N is 1 (log_10(N) ~= 1), and the
> order of magnitude for M is 6..7, the savings (~=10^(N-M)=1e-5) look
> rather meager to my eye. That might be a judgement call, I wouldn't
> argue that.

I would agree that the percentage savings is small. VMs often have
several hundred MiB to a few GiB of memory per vCPU. Saving a
4K page out of that amount of memory is a small percentage. The
thing to guard against, though, is applying that logic in many different
places in Linux kernel code. :-)  The Hyper-V support in Linux already
has multiple pre-allocated per-vCPU pages, and by being a little bit
clever we might be able to avoid another one.

> 
> What is worrisome is that the guest goes against the specification. The
> specification decrees: the input and output areas for the hypercall
> shall not cross page boundaries and shall not overlap.
> Hence, the hypervisor is within its right to produce up to 4KiB of
> output in response to up to 4KiBs of input, and we have:
> 
> ```
> sizeof(input) + sizeof(output) <= 2*sizeof(page)
> ```
> 
> But when the guest doesn't use the output page, we obviously have
> 
> ```
> sizeof(input) + sizeof(output) <= sizeof(page)
> ```
> on the guest side so the contract is broken.

I agree that a hypercall could produce up to 4 KiB of output in
response to up to 4 KiB of input. But the guest controls how much
input it passes. Furthermore, for all the hypercalls I'm familiar with,
the specification of the hypercall tells the max amount of output it
will produce in response to the input. That allows the guest to
know how much output space it needs to allocate and provide to
the hypercall.

I will concede that it's possible to envision a hypercall with a
specification that says "May produce up to 4 KiB of output. A header
at the beginning of the output says how much output was produced."
In that case, the guest indeed must supply a full page of output space.
But I don't think we have any such hypercalls now, and adding such a 
hypercall in the future seems unlikely. Of course, if such a hypercall
did get added and Linux used that hypercall, Linux would need to
supply a full page for the hypercall output. That page wouldn't
necessarily need to be a pre-allocated per-vCPU hypercall output
page. Depending on the usage circumstances, that full page might be
able to be allocated on-demand. 

But assume things proceed as they are today where Linux can limit
the amount of hypercall output based on the input. Then I don't
see a violation of the contract if Linux limits the output and fits
it within a page that is also being shared in a non-overlapping
way with any hypercall input. I wouldn't allocate a per-vCPU
hypercall output page now for a theoretically possible
hypercall that doesn't exist yet.

Michael

> 
> The hypervisor would need to know that the guest optimizes its memory
> usage in this way, limiting what is allowed by the specification when
> implementing any new hypercalls.
> 
> >
> > Or are there hypercalls where a smaller batch size doesn't work
> > at all or is a bad tradeoff for performance reasons? I know I'm not
> > familiar with *all* the hypercalls that might be used in root
> > partition or VTL code. If there are such hypercalls, I would be curious
> > to know about them.
> Nothing that I could find in the specification. I wouldn't think that
> justifies investing in creating specialized/special-cased functions
> on the guest side without solid evidence that more code is needed. In
> this particular case, one day, I'd love to replace `get_vtl` with
> one generic function `hv_get_vp_registers` that works both for dom0 and
> VTLs, plays by the book and does not require much/any explaining what is
> going on inside it and why. I believe this will make maintenance easier.
> 
> 
> >
> > Michael
> >
> > [1] https://lore.kernel.org/linux-hyperv/SN6PR02MB415759676AEF931F030430FDD4BE2@SN6PR02MB4157.namprd02.prod.outlook.com/
> 
> --
> Thank you,
> Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-20  2:01           ` Michael Kelley
@ 2024-12-20 19:13             ` Roman Kisel
  2024-12-20 22:42               ` Michael Kelley
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Kisel @ 2024-12-20 19:13 UTC (permalink / raw)
  To: Michael Kelley, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev



On 12/19/2024 6:01 PM, Michael Kelley wrote:
> From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 19, 2024 3:39 PM
>>
>> On 12/19/2024 1:37 PM, Michael Kelley wrote:

[...]

> I would agree that the percentage savings is small. VMs often have
> several hundred MiB to a few GiB of memory per vCPU. Saving a
> 4K page out of that amount of memory is a small percentage. The
> thing to guard against, though, is applying that logic in many different
> places in Linux kernel code. :-)  The Hyper-V support in Linux already
> has multiple pre-allocated per-vCPU pages, and by being a little bit
> clever we might be able to avoid another one.
> 

[...]

We will also need the vCPU assist pages and the context pages for the
lower VTLs, and the data don't currently occupy the entire pages. Yet
imho it is prudent to leave some wiggle room instead of painting
ourselves into the corner. We're not writing the code for MCUs, are
working under different constraints, and, yes, reaching to use a page as
that's the hard currency of virtualization imho. The numbers show that
savings are negligible per-CPU but these savings come at a cost of
making assumptions what will not happen in the future thus placing a bet
against what the specification says.

It's not even the hyperv code that is the largest consumer of the per-
CPU data, not even close. Looking at the `vmlinux`'es `.data..percpu`
section, there are almost 200 entries, and roughly one quarter is
pointer-sized so who really knows how much is going to be allocated per-
CPU. The top ten statically allocated are

nm -rS --size-sort ./vmlinux | grep -vF ffffffff8 | sed 10q

000000000000c000 0000000000008000 d exception_stacks
0000000000006000 0000000000005000 D cpu_tss_rw
0000000000002000 0000000000004000 D irq_stack_backing_store
000000000001b5c0 0000000000003180 d timer_bases
0000000000017000 0000000000003000 d bts_ctx
0000000000015520 0000000000001450 D cpu_hw_events
000000000000b000 0000000000001000 D gdt_page
0000000000014000 0000000000001000 d entry_stack_storage
0000000000001000 0000000000001000 D cpu_debug_store
0000000000021d80 0000000000000c40 D runqueues

on a configuration that is the bare minimum, no fluff. We could invest
into looking what would be the cost of compiling out `bts_ctx` or
`cpu_debug_store` instead of adding more if statements and making the
code look tricky.

> 
> I agree that a hypercall could produce up to 4 KiB of output in
> response to up to 4 KiB of input. But the guest controls how much
> input it passes. Furthermore, for all the hypercalls I'm familiar with,
> the specification of the hypercall tells the max amount of output it
> will produce in response to the input. That allows the guest to
> know how much output space it needs to allocate and provide to
> the hypercall.
> 

> I will concede that it's possible to envision a hypercall with a
> specification that says "May produce up to 4 KiB of output. A header
> at the beginning of the output says how much output was produced."
> In that case, the guest indeed must supply a full page of output space.
> But I don't think we have any such hypercalls now, and adding such a
> hypercall in the future seems unlikely. Of course, if such a hypercall
> did get added and Linux used that hypercall, Linux would need to
> supply a full page for the hypercall output. That page wouldn't
> necessarily need to be a pre-allocated per-vCPU hypercall output
> page. Depending on the usage circumstances, that full page might be
> able to be allocated on-demand.
> 
> But assume things proceed as they are today where Linux can limit
> the amount of hypercall output based on the input. Then I don't
> see a violation of the contract if Linux limits the output and fits
> it within a page that is also being shared in a non-overlapping
> way with any hypercall input. I wouldn't allocate a per-vCPU
> hypercall output page now for a theoretically possible
> hypercall that doesn't exist yet.

Given that:

* Using the hypercall output page is plumbed throughout the dom0 code,
* dom0 and the VTL mode can share code quite naturally,
* Not using the output page cannot acccount for all cases permitted by
   the TLFS clause "don't overlap input and output, and don't cross page
   boundaries",
* Not using the output page everywhere for consistency requires updating
   call sites of `hv_do_hypercall`​ and friends, i.e. every place where a
   hypercall is made needs to incorporate the offset/pointer at which the
   output should start, or perhaps do some shenanigans with macro's,
* Not using the output page leads to negligible savings,

it is hard to see for me how to make not using the hypercall output page 
look as a profitable enginnering endeavor, really comes off as dancing 
to the perf scare/feature creep tune for peanuts.

In my opinion, we should use the hypercall output page for the VTL mode
as dom0 does to share code and reduce code churn. Had we used some
`hv_get_registers` in both​ instead of the specialized for no compelling
imo practical reason `get_vtl`​ (as it just gets a vCPU register, nothing
more, nothing less), this code path would've been better tested, and any 
of this patching would not have been needed.

I'd wait for few days and then would likely prefer to run with Wei's
permission to send this patch in v2 as-is unless some damning evidence 
presents itself.

> 
> Michael
> 

[...]

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-20 19:13             ` Roman Kisel
@ 2024-12-20 22:42               ` Michael Kelley
  2024-12-23 20:30                 ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Kelley @ 2024-12-20 22:42 UTC (permalink / raw)
  To: Roman Kisel, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Roman Kisel <romank@linux.microsoft.com> Sent: Friday, December 20, 2024 11:14 AM
> 
> On 12/19/2024 6:01 PM, Michael Kelley wrote:
> > From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 19,
> 2024 3:39 PM
> >>
> >> On 12/19/2024 1:37 PM, Michael Kelley wrote:
> 
> [...]
> 
> > I would agree that the percentage savings is small. VMs often have
> > several hundred MiB to a few GiB of memory per vCPU. Saving a
> > 4K page out of that amount of memory is a small percentage. The
> > thing to guard against, though, is applying that logic in many different
> > places in Linux kernel code. :-)  The Hyper-V support in Linux already
> > has multiple pre-allocated per-vCPU pages, and by being a little bit
> > clever we might be able to avoid another one.
> >
> 
> [...]
> 
> We will also need the vCPU assist pages and the context pages for the
> lower VTLs, and the data don't currently occupy the entire pages. Yet
> imho it is prudent to leave some wiggle room instead of painting
> ourselves into the corner. We're not writing the code for MCUs, are
> working under different constraints, and, yes, reaching to use a page as
> that's the hard currency of virtualization imho. The numbers show that
> savings are negligible per-CPU but these savings come at a cost of
> making assumptions what will not happen in the future thus placing a bet
> against what the specification says.

I will not object to your preferred path of allocating the
hyperv_pcpu_output_arg when CONFIG_HYPERV_VTL_MODE is set. For
me, it has been a worthwhile discussion to explore the alternatives. I do,
however, want to get further clarification about the spec requirements
and whether our current Linux code is meeting those requirements. More
on that below.  And allow me to continue the discussion about a couple
of points you raised.

> 
> It's not even the hyperv code that is the largest consumer of the per-
> CPU data, not even close. Looking at the `vmlinux`'es `.data..percpu`
> section, there are almost 200 entries, and roughly one quarter is
> pointer-sized so who really knows how much is going to be allocated per-
> CPU. The top ten statically allocated are
> 
> nm -rS --size-sort ./vmlinux | grep -vF ffffffff8 | sed 10q
> 
> 000000000000c000 0000000000008000 d exception_stacks
> 0000000000006000 0000000000005000 D cpu_tss_rw
> 0000000000002000 0000000000004000 D irq_stack_backing_store
> 000000000001b5c0 0000000000003180 d timer_bases
> 0000000000017000 0000000000003000 d bts_ctx
> 0000000000015520 0000000000001450 D cpu_hw_events
> 000000000000b000 0000000000001000 D gdt_page
> 0000000000014000 0000000000001000 d entry_stack_storage
> 0000000000001000 0000000000001000 D cpu_debug_store
> 0000000000021d80 0000000000000c40 D runqueues
> 
> on a configuration that is the bare minimum, no fluff. We could invest
> into looking what would be the cost of compiling out `bts_ctx` or
> `cpu_debug_store` instead of adding more if statements and making the
> code look tricky.
> 
> >
> > I agree that a hypercall could produce up to 4 KiB of output in
> > response to up to 4 KiB of input. But the guest controls how much
> > input it passes. Furthermore, for all the hypercalls I'm familiar with,
> > the specification of the hypercall tells the max amount of output it
> > will produce in response to the input. That allows the guest to
> > know how much output space it needs to allocate and provide to
> > the hypercall.
> >
> 
> > I will concede that it's possible to envision a hypercall with a
> > specification that says "May produce up to 4 KiB of output. A header
> > at the beginning of the output says how much output was produced."
> > In that case, the guest indeed must supply a full page of output space.
> > But I don't think we have any such hypercalls now, and adding such a
> > hypercall in the future seems unlikely. Of course, if such a hypercall
> > did get added and Linux used that hypercall, Linux would need to
> > supply a full page for the hypercall output. That page wouldn't
> > necessarily need to be a pre-allocated per-vCPU hypercall output
> > page. Depending on the usage circumstances, that full page might be
> > able to be allocated on-demand.
> >
> > But assume things proceed as they are today where Linux can limit
> > the amount of hypercall output based on the input. Then I don't
> > see a violation of the contract if Linux limits the output and fits
> > it within a page that is also being shared in a non-overlapping
> > way with any hypercall input. I wouldn't allocate a per-vCPU
> > hypercall output page now for a theoretically possible
> > hypercall that doesn't exist yet.
> 
> Given that:
> 
> * Using the hypercall output page is plumbed throughout the dom0 code,

I see three uses in current upstream code. And then there are close to
a dozen more uses in Nuno's year-old patch set for /dev/mshv that hasn't
been accepted upstream yet. Does that roughly match what you are referring
to?

> * dom0 and the VTL mode can share code quite naturally,

Yep.

> * Not using the output page cannot acccount for all cases permitted by
>    the TLFS clause "don't overlap input and output, and don't cross page
>    boundaries",

This is what I don’t understand.

> * Not using the output page everywhere for consistency requires updating
>    call sites of `hv_do_hypercall`​ and friends, i.e. every place where a
>    hypercall is made needs to incorporate the offset/pointer at which the
>    output should start, or perhaps do some shenanigans with macro's,

In my musing about getting rid of hyperv_pcpu_output_arg entirely, the
call signature of hv_do_hypercall() and friends would remain unchanged.
There would still be a virtual address passed as the output argument (which
might be NULL, and is *not* required to be page aligned). The caller of
hv_do_hypercall() must only ensure that the virtual address points to an
output area that is large enough to contain the output without crossing a
page boundary. The area must also not overlap with the input area. Given
that we currently have only 3 uses of hyperv_pcpu_output_arg in upstream
code, only those would need to be tweaked to split the hyperv_pcpu_input_arg
page into an input and output area. All other callers of hv_do_hypercall()
would be unchanged.

> * Not using the output page leads to negligible savings,
> 
> it is hard to see for me how to make not using the hypercall output page
> look as a profitable enginnering endeavor, really comes off as dancing
> to the perf scare/feature creep tune for peanuts.

There would be some simplification in hv_common_free(), hv_common_init(),
and hv_common_cpu_init() if there was no hyperv_pcpu_output_arg.  :-)

> 
> In my opinion, we should use the hypercall output page for the VTL mode
> as dom0 does to share code and reduce code churn. Had we used some
> `hv_get_registers` in both​ instead of the specialized for no compelling
> imo practical reason `get_vtl`​ (as it just gets a vCPU register, nothing
> more, nothing less), this code path would've been better tested, and any
> of this patching would not have been needed.

FWIW, the historical context is that at the time get_vtl() was implemented,
all VP registers that Linux needed to access were available as synthetic
MSRs on the x86 side. Going back to its origins 15 years ago, the Linux code
for Hyper-V always just used the synthetic MSRs. get_vtl() was the first time
that x86 code needed the HVCALL_GET_VP_REGISTERS hypercall because
HV_REGISTER_VSM_VP_STATUS has no synthetic MSR equivalent. There still
is no 'hv_get_registers' function in upstream code on the x86 side.
hv_get_vpreg() exists only on the arm64 side where there are no synthetic
MSRs or equivalent.

In hindsight, hv_get_vpreg() could have been added on the x86 side, and
then get_vtl() implemented on top of that. And I made the get_vtl() situation
worse by giving Tianyu bad advice about overlapping the output area with
the input area. :-(

I once had a manager at Microsoft who said "He reserved the right
to wake up smarter every day."  I've had to claim that right as well
from time-to-time. :-)

> 
> I'd wait for few days and then would likely prefer to run with Wei's
> permission to send this patch in v2 as-is unless some damning evidence
> presents itself.

Again, I won't object to your taking that path.

But regarding compliance with the TLFS, take a look at the code for
hv_pci_read_mmio() and hv_pci_write_mmio().  Do you think that code
violates the spec?  If not, what would a violation look like in the context
of this discussion?  If current code is in violation, then we should fix that.

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-20 22:42               ` Michael Kelley
@ 2024-12-23 20:30                 ` Roman Kisel
  2024-12-24 16:45                   ` Michael Kelley
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Kisel @ 2024-12-23 20:30 UTC (permalink / raw)
  To: Michael Kelley, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev



On 12/20/2024 2:42 PM, Michael Kelley wrote:
> From: Roman Kisel <romank@linux.microsoft.com> Sent: Friday, December 20, 2024 11:14 AM
>>

[...]

> 
> I will not object to your preferred path of allocating the
> hyperv_pcpu_output_arg when CONFIG_HYPERV_VTL_MODE is set. For
> me, it has been a worthwhile discussion to explore the alternatives. I do,
> however, want to get further clarification about the spec requirements
> and whether our current Linux code is meeting those requirements. More
> on that below.  And allow me to continue the discussion about a couple
> of points you raised.
> 
Thank you, Michael, for helping me explore the options! From where I
sit, we both seem to agree that an update is needed, and where our
opinions diverge is whether we need to spend another per-CPU page or
not. If at any point, you start believing that your "Nack" is the best
option going forward, I'll put this patch aside.

[...]

> I see three uses in current upstream code. And then there are close to
> a dozen more uses in Nuno's year-old patch set for /dev/mshv that hasn't
> been accepted upstream yet. Does that roughly match what you are referring
> to?
> 
While compiling my arguments and making sure they're worth the readers'
time, I did miss to provide references to the code. Yes, that patch set
might be a good illustration of the direction, thank you very much for
pointing that out!

Another reference might have been the kernel used with OpenHCL (6.6.x):
https://github.com/microsoft/OHCL-Linux-Kernel. Azure Boost runs that, 
and the kernel is delighting our customers with the rock-solid
foundation.

It contains lots of mshv code, and it is the code run in Azure as the
VTL2 kernel; we're working towards upstreaming, e.g. here :) It uses the
hypercall output page, and the performance folks beat the hell out of
the code, and found it being up to snuff. There were few findings about
memory consumption out of which I remember the perf folks tuning SWIOTLB
size and tweaking the kernel memory layout to save on padding and
alignment. As for the per-CPU data, nothing allowing to save that much,
perhaps would be nice to be able to compile out more. Yet, as I
understand, x86_64 systems might not always be envisioned as the
commonplace foundation for building embedded systems so while we're
looking to save few KiBs in hyperv, there's likely much more to save
elsewhere.

>> * dom0 and the VTL mode can share code quite naturally,
> 
> Yep.
> 
To me, this has been the ever-shining light in the dark maze of other
arguments. It provides a corner stone to building the generic well-
tested code, hence developers will rest assured when introducing changes
that the changes are robust even when not testing the myriad ways the
hyperv code is used in. That is, the validation wouldn't need all the
permutations of conditional statements and all #ifdef's that would be
needed when special-casing.

>> * Not using the output page cannot acccount for all cases permitted by
>>     the TLFS clause "don't overlap input and output, and don't cross page
>>     boundaries",
> 
> This is what I don’t understand.
> 
I meant that when implementing a generic case, not using the output page
prohibits from being able to receive a page of output when supplying a
page of input. I agree that we can special-case to save a bit of memory
as it appears we don't need that ability (4KiB of output in response to
4KiB input) right now. What I question though if that specialization is
needed at all since it will lead to more code, more parameters for the
functions, more comments and if statements therefore more validation
work and more cognitive load, and more maintenance costs.

>> * Not using the output page everywhere for consistency requires updating
>>     call sites of `hv_do_hypercall`​ and friends, i.e. every place where a
>>     hypercall is made needs to incorporate the offset/pointer at which the
>>     output should start, or perhaps do some shenanigans with macro's,
> 
> In my musing about getting rid of hyperv_pcpu_output_arg entirely, the
> call signature of hv_do_hypercall() and friends would remain unchanged.
> There would still be a virtual address passed as the output argument (which
> might be NULL, and is *not* required to be page aligned). The caller of
> hv_do_hypercall() must only ensure that the virtual address points to an
> output area that is large enough to contain the output without crossing a
> page boundary. The area must also not overlap with the input area. Given
> that we currently have only 3 uses of hyperv_pcpu_output_arg in upstream
> code, only those would need to be tweaked to split the hyperv_pcpu_input_arg
> page into an input and output area. All other callers of hv_do_hypercall()
> would be unchanged.
> 
Your approach appears being solid to me! Again, going to be beating on
my drum :) I believe here we are deciding not only on the small
optimization which is a clever way of saving a bit of memory, and it did
bring a satisfying "wow, nice" to my mind.

We are drafting the constrains for the future code as it appears;
without knowing better, perhaps it is the wisest not to constrain. As
the quote goes:

"... We should forget about small efficiencies, say about 97% of the
time: premature optimization is the root of all evil. Yet we should not
pass up our opportunities in that critical 3%. A good programmer ...
will be wise to look carefully at the critical code; but only after that
code has been identified. ..."


>> * Not using the output page leads to negligible savings,
>>
>> it is hard to see for me how to make not using the hypercall output page
>> look as a profitable enginnering endeavor, really comes off as dancing
>> to the perf scare/feature creep tune for peanuts.
> 
> There would be some simplification in hv_common_free(), hv_common_init(),
> and hv_common_cpu_init() if there was no hyperv_pcpu_output_arg.  :-)
> 
Agreed!

>>
>> In my opinion, we should use the hypercall output page for the VTL mode
>> as dom0 does to share code and reduce code churn. Had we used some
>> `hv_get_registers` in both​ instead of the specialized for no compelling
>> imo practical reason `get_vtl`​ (as it just gets a vCPU register, nothing
>> more, nothing less), this code path would've been better tested, and any
>> of this patching would not have been needed.
> 
> FWIW, the historical context is that at the time get_vtl() was implemented,
> all VP registers that Linux needed to access were available as synthetic
> MSRs on the x86 side. Going back to its origins 15 years ago, the Linux code
> for Hyper-V always just used the synthetic MSRs. get_vtl() was the first time
> that x86 code needed the HVCALL_GET_VP_REGISTERS hypercall because
> HV_REGISTER_VSM_VP_STATUS has no synthetic MSR equivalent. There still
> is no 'hv_get_registers' function in upstream code on the x86 side.
> hv_get_vpreg() exists only on the arm64 side where there are no synthetic
> MSRs or equivalent.


> 
> In hindsight, hv_get_vpreg() could have been added on the x86 side, and
> then get_vtl() implemented on top of that. And I made the get_vtl() situation
> worse by giving Tianyu bad advice about overlapping the output area with
> the input area. :-(
> 
> I once had a manager at Microsoft who said "He reserved the right
> to wake up smarter every day."  I've had to claim that right as well
> from time-to-time. :-)
> 
Michael, thank you very much for your efforts! At that time, no one
knew otherwise iiuc. You had the willpower to make things happen and the
vision to bring the solution over the finish line, and imo the Hyper-V
community just cannot possibly thank you enough for your incredible
work. It is my genuine hope that putting the Fixes tag didn't overshadow
that fact for anyone as these five letters don't tell any of the story,
most assuredly.

>>
>> I'd wait for few days and then would likely prefer to run with Wei's
>> permission to send this patch in v2 as-is unless some damning evidence
>> presents itself.
> 
> Again, I won't object to your taking that path.
> 
> But regarding compliance with the TLFS, take a look at the code for
> hv_pci_read_mmio() and hv_pci_write_mmio().  Do you think that code
> violates the spec?  If not, what would a violation look like in the context
> of this discussion?  If current code is in violation, then we should fix that.
> 
Appreciate your advice! I don't think they do. The code doesn't overlap 
input and output, and to my eye, the arguments are sized and aligned 
appropriately. That does not violate TLFS. I'd think the
`hv_pci_read_mmio` might've used the output page as it is supposed to be
used for output to stick to one pattern, much similar to the `get_vtl`
in question.

Now, it seems the functions run in the VTL0 only (but I'd leave that to
someone else to judge) where we currently don't allocate the output
page, and `hv_pci_read_mmio` is the only function doing that input page
split so special-casing might be justified, perhaps adding a comment
would seem appropriate. Unless there are prospects to merge that with
dom0, I'd leave the code be.

> Michael

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-23 20:30                 ` Roman Kisel
@ 2024-12-24 16:45                   ` Michael Kelley
  2024-12-26 16:45                     ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Kelley @ 2024-12-24 16:45 UTC (permalink / raw)
  To: Roman Kisel, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 23, 2024 12:30 PM
> 
> On 12/20/2024 2:42 PM, Michael Kelley wrote:
> > From: Roman Kisel <romank@linux.microsoft.com> Sent: Friday, December 20, 2024 11:14 AM
> >>
> 
> [...]
> 
> >
> > I will not object to your preferred path of allocating the
> > hyperv_pcpu_output_arg when CONFIG_HYPERV_VTL_MODE is set. For
> > me, it has been a worthwhile discussion to explore the alternatives. I do,
> > however, want to get further clarification about the spec requirements
> > and whether our current Linux code is meeting those requirements. More
> > on that below.  And allow me to continue the discussion about a couple
> > of points you raised.
> >
> Thank you, Michael, for helping me explore the options! From where I
> sit, we both seem to agree that an update is needed, and where our
> opinions diverge is whether we need to spend another per-CPU page or
> not. If at any point, you start believing that your "Nack" is the best
> option going forward, I'll put this patch aside.
> 
> [...]
> 
> > I see three uses in current upstream code. And then there are close to
> > a dozen more uses in Nuno's year-old patch set for /dev/mshv that hasn't
> > been accepted upstream yet. Does that roughly match what you are referring
> > to?
> >
> While compiling my arguments and making sure they're worth the readers'
> time, I did miss to provide references to the code. Yes, that patch set
> might be a good illustration of the direction, thank you very much for
> pointing that out!
> 
> Another reference might have been the kernel used with OpenHCL (6.6.x):
> https://github.com/microsoft/OHCL-Linux-Kernel. Azure Boost runs that,
> and the kernel is delighting our customers with the rock-solid
> foundation.

This is good. I'll refer to this code base as background as more patches
come through to upstream the deltas. I had seen the blog post about
OpenHCL but didn’t mentally connect the patches as coming from
that code base.

> 
> It contains lots of mshv code, and it is the code run in Azure as the
> VTL2 kernel; we're working towards upstreaming, e.g. here :) It uses the
> hypercall output page, and the performance folks beat the hell out of
> the code, and found it being up to snuff. There were few findings about
> memory consumption out of which I remember the perf folks tuning SWIOTLB
> size and tweaking the kernel memory layout to save on padding and
> alignment. As for the per-CPU data, nothing allowing to save that much,
> perhaps would be nice to be able to compile out more. Yet, as I
> understand, x86_64 systems might not always be envisioned as the
> commonplace foundation for building embedded systems so while we're
> looking to save few KiBs in hyperv, there's likely much more to save
> elsewhere.
> 
> >> * dom0 and the VTL mode can share code quite naturally,
> >
> > Yep.
> >
> To me, this has been the ever-shining light in the dark maze of other
> arguments. It provides a corner stone to building the generic well-
> tested code, hence developers will rest assured when introducing changes
> that the changes are robust even when not testing the myriad ways the
> hyperv code is used in. That is, the validation wouldn't need all the
> permutations of conditional statements and all #ifdef's that would be
> needed when special-casing.
> 
> >> * Not using the output page cannot acccount for all cases permitted by
> >>     the TLFS clause "don't overlap input and output, and don't cross page
> >>     boundaries",
> >
> > This is what I don’t understand.
> >
> I meant that when implementing a generic case, not using the output page
> prohibits from being able to receive a page of output when supplying a
> page of input. I agree that we can special-case to save a bit of memory
> as it appears we don't need that ability (4KiB of output in response to
> 4KiB input) right now. What I question though if that specialization is
> needed at all since it will lead to more code, more parameters for the
> functions, more comments and if statements therefore more validation
> work and more cognitive load, and more maintenance costs.
> 
> >> * Not using the output page everywhere for consistency requires updating
> >>     call sites of `hv_do_hypercall`​ and friends, i.e. every place where a
> >>     hypercall is made needs to incorporate the offset/pointer at which the
> >>     output should start, or perhaps do some shenanigans with macro's,
> >
> > In my musing about getting rid of hyperv_pcpu_output_arg entirely, the
> > call signature of hv_do_hypercall() and friends would remain unchanged.
> > There would still be a virtual address passed as the output argument (which
> > might be NULL, and is *not* required to be page aligned). The caller of
> > hv_do_hypercall() must only ensure that the virtual address points to an
> > output area that is large enough to contain the output without crossing a
> > page boundary. The area must also not overlap with the input area. Given
> > that we currently have only 3 uses of hyperv_pcpu_output_arg in upstream
> > code, only those would need to be tweaked to split the hyperv_pcpu_input_arg
> > page into an input and output area. All other callers of hv_do_hypercall()
> > would be unchanged.
> >
> Your approach appears being solid to me! Again, going to be beating on
> my drum :) I believe here we are deciding not only on the small
> optimization which is a clever way of saving a bit of memory, and it did
> bring a satisfying "wow, nice" to my mind.
> 
> We are drafting the constrains for the future code as it appears;
> without knowing better, perhaps it is the wisest not to constrain. As
> the quote goes:
> 
> "... We should forget about small efficiencies, say about 97% of the
> time: premature optimization is the root of all evil. Yet we should not
> pass up our opportunities in that critical 3%. A good programmer ...
> will be wise to look carefully at the critical code; but only after that
> code has been identified. ..."
> 
> 
> >> * Not using the output page leads to negligible savings,
> >>
> >> it is hard to see for me how to make not using the hypercall output page
> >> look as a profitable enginnering endeavor, really comes off as dancing
> >> to the perf scare/feature creep tune for peanuts.
> >
> > There would be some simplification in hv_common_free(), hv_common_init(),
> > and hv_common_cpu_init() if there was no hyperv_pcpu_output_arg.  :-)
> >
> Agreed!
> 
> >>
> >> In my opinion, we should use the hypercall output page for the VTL mode
> >> as dom0 does to share code and reduce code churn. Had we used some
> >> `hv_get_registers` in both​ instead of the specialized for no compelling
> >> imo practical reason `get_vtl`​ (as it just gets a vCPU register, nothing
> >> more, nothing less), this code path would've been better tested, and any
> >> of this patching would not have been needed.
> >
> > FWIW, the historical context is that at the time get_vtl() was implemented,
> > all VP registers that Linux needed to access were available as synthetic
> > MSRs on the x86 side. Going back to its origins 15 years ago, the Linux code
> > for Hyper-V always just used the synthetic MSRs. get_vtl() was the first time
> > that x86 code needed the HVCALL_GET_VP_REGISTERS hypercall because
> > HV_REGISTER_VSM_VP_STATUS has no synthetic MSR equivalent. There still
> > is no 'hv_get_registers' function in upstream code on the x86 side.
> > hv_get_vpreg() exists only on the arm64 side where there are no synthetic
> > MSRs or equivalent.
> 
> 
> >
> > In hindsight, hv_get_vpreg() could have been added on the x86 side, and
> > then get_vtl() implemented on top of that. And I made the get_vtl() situation
> > worse by giving Tianyu bad advice about overlapping the output area with
> > the input area. :-(
> >
> > I once had a manager at Microsoft who said "He reserved the right
> > to wake up smarter every day."  I've had to claim that right as well
> > from time-to-time. :-)
> >
> Michael, thank you very much for your efforts! At that time, no one
> knew otherwise iiuc. You had the willpower to make things happen and the
> vision to bring the solution over the finish line, and imo the Hyper-V
> community just cannot possibly thank you enough for your incredible
> work. It is my genuine hope that putting the Fixes tag didn't overshadow
> that fact for anyone as these five letters don't tell any of the story,
> most assuredly.
> 
> >>
> >> I'd wait for few days and then would likely prefer to run with Wei's
> >> permission to send this patch in v2 as-is unless some damning evidence
> >> presents itself.
> >
> > Again, I won't object to your taking that path.
> >
> > But regarding compliance with the TLFS, take a look at the code for
> > hv_pci_read_mmio() and hv_pci_write_mmio().  Do you think that code
> > violates the spec?  If not, what would a violation look like in the context
> > of this discussion?  If current code is in violation, then we should fix that.
> >
> Appreciate your advice! I don't think they do. The code doesn't overlap
> input and output, and to my eye, the arguments are sized and aligned
> appropriately. That does not violate TLFS. I'd think the
> `hv_pci_read_mmio` might've used the output page as it is supposed to be
> used for output to stick to one pattern, much similar to the `get_vtl`
> in question.
> 
> Now, it seems the functions run in the VTL0 only (but I'd leave that to
> someone else to judge) where we currently don't allocate the output
> page, and `hv_pci_read_mmio` is the only function doing that input page
> split so special-casing might be justified, perhaps adding a comment
> would seem appropriate. Unless there are prospects to merge that with
> dom0, I'd leave the code be.

OK, my understanding is that your concern about spec conformance is
just that Linux should be able to allocate enough input and output space
for the maximum case, which is 4KiB of input *plus* 4KiB of output. If
the total amount of input plus output for a particular hypercall is less
than 4KiB, then there's no conformance problem with having the input
and output share a page, as long as the "no overlap" rule is observed.

There's an idea kicking around in my head about a different way to
handle all this that might be cleaner and less code all-around. If I
get motivated, I may code it up and see if it really works. If so,
I'll run it by you to see what you think.

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-24 16:45                   ` Michael Kelley
@ 2024-12-26 16:45                     ` Roman Kisel
  2024-12-26 20:04                       ` Michael Kelley
  0 siblings, 1 reply; 27+ messages in thread
From: Roman Kisel @ 2024-12-26 16:45 UTC (permalink / raw)
  To: Michael Kelley, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev



On 12/24/2024 8:45 AM, Michael Kelley wrote:

[...]

> 
> OK, my understanding is that your concern about spec conformance is
> just that Linux should be able to allocate enough input and output space
> for the maximum case, which is 4KiB of input *plus* 4KiB of output. If
> the total amount of input plus output for a particular hypercall is less
> than 4KiB, then there's no conformance problem with having the input
> and output share a page, as long as the "no overlap" rule is observed.
>
Appreciate bearing with me and guiding me towards expressing the intent
clearer :) Yes, the logic chain has been:

* can't overlap input and output due to TLFS req's =>
* need to fix get_vtl() *&&* dom0 uses the output page *&&* VTLs use
   the output page =>
* let us fix the overlap *&&* make get_vtl() look as get_vp_register()
   as this is what it actually is so soon we should be able to have less
   code.

Getting rid of the hypercall output page feels like too much as if the
code base is dovetailed to that and the hypervisor gets a hypercall
whose output is as large as a page (however unlikely that sounds, but
first there was an opinion that 640KiB is plenty, then 32 address lines,
then 48 bits in the PA and 4 level pages, then 57 bits and 5 levels,
...), we'd need to fix the code or allocate and deallocate on demand.
That tradeoff b/w saving a page and adding special cases makes me lean
to just allocate the page as it is allocated anyway.

> There's an idea kicking around in my head about a different way to
> handle all this that might be cleaner and less code all-around. If I
> get motivated, I may code it up and see if it really works. If so,
> I'll run it by you to see what you think.
MUCH appreciated!! The complexity appears to be increasing over time,
and it would be incredible to pack all we got into less code without
constraining ourselves too much :)

> 
> Michael

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-26 16:45                     ` Roman Kisel
@ 2024-12-26 20:04                       ` Michael Kelley
  2024-12-26 20:42                         ` Roman Kisel
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Kelley @ 2024-12-26 20:04 UTC (permalink / raw)
  To: Roman Kisel, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Roman Kisel <romank@linux.microsoft.com> Sent: Thursday, December 26, 2024 8:46 AM
> 
> On 12/24/2024 8:45 AM, Michael Kelley wrote:
> 
> [...]
> 
> >
> > OK, my understanding is that your concern about spec conformance is
> > just that Linux should be able to allocate enough input and output space
> > for the maximum case, which is 4KiB of input *plus* 4KiB of output. If
> > the total amount of input plus output for a particular hypercall is less
> > than 4KiB, then there's no conformance problem with having the input
> > and output share a page, as long as the "no overlap" rule is observed.
> >
> Appreciate bearing with me and guiding me towards expressing the intent
> clearer :) Yes, the logic chain has been:
> 
> * can't overlap input and output due to TLFS req's =>
> * need to fix get_vtl() *&&* dom0 uses the output page *&&* VTLs use
>    the output page =>
> * let us fix the overlap *&&* make get_vtl() look as get_vp_register()
>    as this is what it actually is so soon we should be able to have less
>    code.
> 
> Getting rid of the hypercall output page feels like too much as if the
> code base is dovetailed to that and the hypervisor gets a hypercall
> whose output is as large as a page (however unlikely that sounds, but
> first there was an opinion that 640KiB is plenty, then 32 address lines,
> then 48 bits in the PA and 4 level pages, then 57 bits and 5 levels,
> ...), we'd need to fix the code or allocate and deallocate on demand.
> That tradeoff b/w saving a page and adding special cases makes me lean
> to just allocate the page as it is allocated anyway.
> 
> > There's an idea kicking around in my head about a different way to
> > handle all this that might be cleaner and less code all-around. If I
> > get motivated, I may code it up and see if it really works. If so,
> > I'll run it by you to see what you think.
> MUCH appreciated!! The complexity appears to be increasing over time,
> and it would be incredible to pack all we got into less code without
> constraining ourselves too much :)
> 

As I was looking at how hypercall input and output arguments are
managed in upstream code and in the OHCL-Linux-Kernel repo,
I noticed two things:

1) There's a bug in mshv_vtl_hvcall_call() in the OHCL-Linux-Kernel
repo, for which I filed a github issue. [1]

2) hv_vtl_apicid_to_vp_id() also has the overlapping hypercall input
and output spec violation. You might want to fix that occurrence as
well in this patch set.

Michael

[1] https://github.com/microsoft/OHCL-Linux-Kernel/issues/33

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)
  2024-12-26 20:04                       ` Michael Kelley
@ 2024-12-26 20:42                         ` Roman Kisel
  0 siblings, 0 replies; 27+ messages in thread
From: Roman Kisel @ 2024-12-26 20:42 UTC (permalink / raw)
  To: Michael Kelley, Wei Liu
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev



On 12/26/2024 12:04 PM, Michael Kelley wrote:
[...]

> 
> As I was looking at how hypercall input and output arguments are
> managed in upstream code and in the OHCL-Linux-Kernel repo,
> I noticed two things:
> 
> 1) There's a bug in mshv_vtl_hvcall_call() in the OHCL-Linux-Kernel
> repo, for which I filed a github issue. [1]
Appreciated very much!

> 
> 2) hv_vtl_apicid_to_vp_id() also has the overlapping hypercall input
> and output spec violation. You might want to fix that occurrence as
> well in this patch set.
Let me do that, thank you!

> 
> Michael
> 
> [1] https://github.com/microsoft/OHCL-Linux-Kernel/issues/33

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-12-26 20:42 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18 20:54 [PATCH 0/2] hyperv: Fixes for get_vtl(void) Roman Kisel
2024-12-18 20:54 ` [PATCH 1/2] hyperv: Fix pointer type for the output of the hypercall in get_vtl(void) Roman Kisel
2024-12-19  2:45   ` Wei Liu
2024-12-19 17:26     ` Roman Kisel
2024-12-19 18:40   ` Nuno Das Neves
2024-12-19 19:11     ` Roman Kisel
2024-12-19 19:13     ` Easwar Hariharan
2024-12-19 19:23       ` Nuno Das Neves
2024-12-19 19:32         ` Easwar Hariharan
2024-12-19 20:03           ` Roman Kisel
2024-12-19 20:00       ` Roman Kisel
2024-12-19 20:04         ` Easwar Hariharan
2024-12-19 20:18           ` Roman Kisel
2024-12-18 20:54 ` [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas " Roman Kisel
2024-12-19  2:42   ` Wei Liu
2024-12-19 18:19     ` Roman Kisel
2024-12-19 21:35       ` Wei Liu
2024-12-19 21:37       ` Michael Kelley
2024-12-19 23:39         ` Roman Kisel
2024-12-20  2:01           ` Michael Kelley
2024-12-20 19:13             ` Roman Kisel
2024-12-20 22:42               ` Michael Kelley
2024-12-23 20:30                 ` Roman Kisel
2024-12-24 16:45                   ` Michael Kelley
2024-12-26 16:45                     ` Roman Kisel
2024-12-26 20:04                       ` Michael Kelley
2024-12-26 20:42                         ` Roman Kisel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).