[PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp

linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id()
@ 2024-12-30 18:09 Roman Kisel
  2024-12-30 18:09 ` [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers Roman Kisel
                   ` (5 more replies)
  0 siblings, 6 replies; 40+ messages in thread
From: Roman Kisel @ 2024-12-30 18:09 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

The get_vtl(void) function

* has got one bug when the code started using a wrong pointer type after
  refactoring, and also
* the both function in question don't adhere to the requirements of
  the Hypervisor Top-Level Funactional Specification[1, 2] as the code overlaps
  the input and output areas for a hypercall.

The first issue leads to a wrong 100% reproducible computation due to reading
a byte worth of data at a wrong offset. That in turn leads to using a nonsensical
value ("fortunately", could catch it easily!) for the current VTL when initiating
VMBus communications. As a repercussion from that, the system wouldn't boot. The
fix is straightforward: use the correct pointer type.

The second issue doesn't seem to lead to any reproducible breakage just yet. It is
fixed with using the output hypercall pages allocated per-CPU, and that isn't the
only or the most obvious choice so let me elaborate why that fix appears to be the
best one in my opinion out of the options I could conceive of.

The approach chosen for fixing the second issue makes two things shine through:

* these functions just get a vCPU register, no special treatment needs to be
  involved,
* VTLs and dom0 can and should share code as both exist to provide services to
  a guest(s), be that from within the partition or from outside of it.

The projected benefits include replacing the functions in question with a future
`hv_get_vp_registers` one shared between dom0 and VTLs to allow for a better test
coverage.

I have validated the fixes by booting the fixed kernel in VTL2 up using OpenVMM and
OpenHCL[3, 4].

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
[2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
[3] https://openvmm.dev/guide/user_guide/openhcl.html
[4] https://github.com/microsoft/OpenVMM

[v5]
  - In the first patch, removed some arch-specific #ifdef guards to fix the
    arm64 build and stick to the direction chosen for the Hyper-V header files.
    I could not remove all of them as some interrupt state structures
    are defined differently for x64 and arm64 and are found in the same
    enclosing `union hv_register_value`.

    No changes to other patches (approved in v4).

[v4]: https://lore.kernel.org/lkml/20241227183155.122827-1-romank@linux.microsoft.com/
  - Wrapped DECLARE_FLEX_ARRAY into a struct and added one more
    member as the documentation requires,
  - Removed superfluous type coercion,
  - Fixed tags,
  - Rebased onto the latest hyperv-next branch.

[v3]: https://lore.kernel.org/lkml/20241226213110.899497-1-romank@linux.microsoft.com/
  - Added a fix for hv_vtl_apicid_to_vp_id(),
  - Split out the patch for enabling the hypercall output page,
  - Updated the title of the patch series,

[v2]: https://lore.kernel.org/lkml/20241226203050.800524-1-romank@linux.microsoft.com/
  - Used the suggestions to define an additional structure to improve code readability,
  - Split out the patch with that definition.

[v1]: https://lore.kernel.org/lkml/20241218205421.319969-1-romank@linux.microsoft.com/

Roman Kisel (5):
  hyperv: Define struct hv_output_get_vp_registers
  hyperv: Fix pointer type in get_vtl(void)
  hyperv: Enable the hypercall output page for the VTL mode
  hyperv: Do not overlap the hvcall IO areas in get_vtl()
  hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()

 arch/x86/hyperv/hv_init.c   |  6 ++---
 arch/x86/hyperv/hv_vtl.c    |  2 +-
 drivers/hv/hv_common.c      |  6 ++---
 include/hyperv/hvgdk_mini.h | 49 +++++++++++++++++++++++++++++++++++++
 4 files changed, 56 insertions(+), 7 deletions(-)

base-commit: 26e1b813fcd02984b1cac5f3decdf4b0bb56fe02
-- 
2.34.1

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
@ 2024-12-30 18:09 ` Roman Kisel
  2025-01-06 17:37   ` Michael Kelley
  2024-12-30 18:09 ` [PATCH v5 2/5] hyperv: Fix pointer type in get_vtl(void) Roman Kisel
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2024-12-30 18:09 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

There is no definition of the output structure for the
GetVpRegisters hypercall. Hence, using the hypercall
is not possible when the output value has some structure
to it. Even getting a datum of a primitive type reads
as ad-hoc without that definition.

Define struct hv_output_get_vp_registers to enable using
the GetVpRegisters hypercall. Make provisions for all
supported architectures. No functional changes.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 include/hyperv/hvgdk_mini.h | 49 +++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index db3d1aaf7330..e8e3faa78e15 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -1068,6 +1068,35 @@ union hv_dispatch_suspend_register {
 	} __packed;
 };
 
+union hv_arm64_pending_interruption_register {
+	u64 as_uint64;
+	struct {
+		u64 interruption_pending : 1;
+		u64 interruption_type : 1;
+		u64 reserved : 30;
+		u32 error_code;
+	} __packed;
+};
+
+union hv_arm64_interrupt_state_register {
+	u64 as_uint64;
+	struct {
+		u64 interrupt_shadow : 1;
+		u64 reserved : 63;
+	} __packed;
+};
+
+union hv_arm64_pending_synthetic_exception_event {
+	u64 as_uint64[2];
+	struct {
+		u32 event_pending : 1;
+		u32 event_type : 3;
+		u32 reserved : 4;
+		u32 exception_type;
+		u64 context;
+	} __packed;
+};
+
 union hv_x64_interrupt_state_register {
 	u64 as_uint64;
 	struct {
@@ -1103,8 +1132,28 @@ union hv_register_value {
 	union hv_explicit_suspend_register explicit_suspend;
 	union hv_intercept_suspend_register intercept_suspend;
 	union hv_dispatch_suspend_register dispatch_suspend;
+#ifdef CONFIG_ARM64
+	union hv_arm64_interrupt_state_register interrupt_state;
+	union hv_arm64_pending_interruption_register pending_interruption;
+#endif
+#ifdef CONFIG_X86
 	union hv_x64_interrupt_state_register interrupt_state;
 	union hv_x64_pending_interruption_register pending_interruption;
+#endif
+	union hv_arm64_pending_synthetic_exception_event pending_synthetic_exception_event;
+};
+
+/*
+ * NOTE: Linux helper struct - NOT from Hyper-V code.
+ * DECLARE_FLEX_ARRAY() needs to be wrapped into
+ * a structure and have at least one more member besides
+ * DECLARE_FLEX_ARRAY.
+ */
+struct hv_output_get_vp_registers {
+	struct {
+		DECLARE_FLEX_ARRAY(union hv_register_value, values);
+		struct {} values_end;
+	};
 };
 
 #if defined(CONFIG_ARM64)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 2/5] hyperv: Fix pointer type in get_vtl(void)
  2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
  2024-12-30 18:09 ` [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers Roman Kisel
@ 2024-12-30 18:09 ` Roman Kisel
  2025-01-08 17:59   ` Nuno Das Neves
  2024-12-30 18:09 ` [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode Roman Kisel
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2024-12-30 18:09 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
changed the type of the output pointer to `struct hv_register_assoc` from
`struct hv_get_vp_registers_output`. That leads to an incorrect computation,
and leaves the system broken.

Use the correct pointer type for the output of the GetVpRegisters hypercall.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 9e5e8328df6b..f82d1aefaa8a 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
 {
 	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
 	struct hv_input_get_vp_registers *input;
-	struct hv_register_assoc *output;
+	struct hv_output_get_vp_registers *output;
 	unsigned long flags;
 	u64 ret;
 
 	local_irq_save(flags);
 	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = (struct hv_register_assoc *)input;
+	output = (struct hv_output_get_vp_registers *)input;
 
 	memset(input, 0, struct_size(input, names, 1));
 	input->partition_id = HV_PARTITION_ID_SELF;
@@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
 
 	ret = hv_do_hypercall(control, input, output);
 	if (hv_result_success(ret)) {
-		ret = output->value.reg8 & HV_X64_VTL_MASK;
+		ret = output->values[0].reg8 & HV_X64_VTL_MASK;
 	} else {
 		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
 		BUG();
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
  2024-12-30 18:09 ` [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers Roman Kisel
  2024-12-30 18:09 ` [PATCH v5 2/5] hyperv: Fix pointer type in get_vtl(void) Roman Kisel
@ 2024-12-30 18:09 ` Roman Kisel
  2025-01-03 19:20   ` Stanislav Kinsburskii
  2025-01-08 21:08   ` Nuno Das Neves
  2024-12-30 18:09 ` [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl() Roman Kisel
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 40+ messages in thread
From: Roman Kisel @ 2024-12-30 18:09 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

Due to the hypercall page not being allocated in the VTL mode,
the code resorts to using a part of the input page.

Allocate the hypercall output page in the VTL mode thus enabling
it to use it for output and share code with dom0.

Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 drivers/hv/hv_common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index c6ed3ba4bf61..c983cfd4d6c0 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -340,7 +340,7 @@ int __init hv_common_init(void)
 	BUG_ON(!hyperv_pcpu_input_arg);
 
 	/* Allocate the per-CPU state for output arg for root */
-	if (hv_root_partition) {
+	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
 		hyperv_pcpu_output_arg = alloc_percpu(void *);
 		BUG_ON(!hyperv_pcpu_output_arg);
 	}
@@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
 	void **inputarg, **outputarg;
 	u64 msr_vp_index;
 	gfp_t flags;
-	int pgcount = hv_root_partition ? 2 : 1;
+	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
 	void *mem;
 	int ret;
 
@@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
 		if (!mem)
 			return -ENOMEM;
 
-		if (hv_root_partition) {
+		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
 			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
 			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
 		}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl()
  2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
                   ` (2 preceding siblings ...)
  2024-12-30 18:09 ` [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode Roman Kisel
@ 2024-12-30 18:09 ` Roman Kisel
  2025-01-08  7:36   ` Wei Liu
  2024-12-30 18:09 ` [PATCH v5 5/5] hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() Roman Kisel
  2024-12-30 18:16 ` [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Borislav Petkov
  5 siblings, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2024-12-30 18:09 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2],
disallows overlapping of the input and output hypercall areas, and
get_vtl(void) does overlap them.

Use the output hypercall page of the current vCPU for the hypercall.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
[2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs

Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index f82d1aefaa8a..173005e6a95d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
 
 	local_irq_save(flags);
 	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = (struct hv_output_get_vp_registers *)input;
+	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
 
 	memset(input, 0, struct_size(input, names, 1));
 	input->partition_id = HV_PARTITION_ID_SELF;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH v5 5/5] hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id()
  2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
                   ` (3 preceding siblings ...)
  2024-12-30 18:09 ` [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl() Roman Kisel
@ 2024-12-30 18:09 ` Roman Kisel
  2024-12-30 18:16 ` [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Borislav Petkov
  5 siblings, 0 replies; 40+ messages in thread
From: Roman Kisel @ 2024-12-30 18:09 UTC (permalink / raw)
  To: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2],
disallows overlapping of the input and output hypercall areas, and
hv_vtl_apicid_to_vp_id() overlaps them.

Use the output hypercall page of the current vCPU for the hypercall.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
[2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs

Reported-by: Michael Kelley <mhklinux@outlook.com>
Closes: https://lore.kernel.org/lkml/SN6PR02MB4157B98CD34781CC87A9D921D40D2@SN6PR02MB4157.namprd02.prod.outlook.com/
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
 arch/x86/hyperv/hv_vtl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 04775346369c..4e1b1e3b5658 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -189,7 +189,7 @@ static int hv_vtl_apicid_to_vp_id(u32 apic_id)
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->apic_ids[0] = apic_id;
 
-	output = (u32 *)input;
+	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
 
 	control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_ID_FROM_APIC_ID;
 	status = hv_do_hypercall(control, input, output);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id()
  2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
                   ` (4 preceding siblings ...)
  2024-12-30 18:09 ` [PATCH v5 5/5] hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() Roman Kisel
@ 2024-12-30 18:16 ` Borislav Petkov
  5 siblings, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2024-12-30 18:16 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, dave.hansen, decui, eahariha, haiyangz, mingo, mhklinux,
	nunodasneves, tglx, tiala, wei.liu, linux-hyperv, linux-kernel,
	x86, apais, benhill, ssengar, sunilmut, vdso

On Mon, Dec 30, 2024 at 10:09:36AM -0800, Roman Kisel wrote:
> [v5]
>   - In the first patch, removed some arch-specific #ifdef guards to fix the
>     arm64 build and stick to the direction chosen for the Hyper-V header files.
>     I could not remove all of them as some interrupt state structures
>     are defined differently for x64 and arm64 and are found in the same
>     enclosing `union hv_register_value`.
> 
>     No changes to other patches (approved in v4).

From: Documentation/process/submitting-patches.rst

Don't get discouraged - or impatient
------------------------------------

After you have submitted your change, be patient and wait.  Reviewers are
busy people and may not get to your patch right away.

Once upon a time, patches used to disappear into the void without comment,
but the development process works more smoothly than that now.  You should
receive comments within a week or so; if that does not happen, make sure
that you have sent your patches to the right place.  Wait for a minimum of
one week before resubmitting or pinging reviewers - possibly longer during
busy times like merge windows.

T Dec 18     Roman Kisel ( :8.6K|) [PATCH 0/2] hyperv: Fixes for get_vtl(void)
T Dec 26     Roman Kisel ( :8.1K|) [PATCH v2 0/3] hyperv: Fixes for get_vtl(void)
T Dec 26     Roman Kisel ( :8.6K|) [PATCH v3 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id()
T Dec 27     Roman Kisel ( :8.8K|) [PATCH v4 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id()
T Dec 30     Roman Kisel ( :  84|) [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id()

This patchset gets sent almost every day. How about you slow-down and read the
docs while waiting?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2024-12-30 18:09 ` [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode Roman Kisel
@ 2025-01-03 19:20   ` Stanislav Kinsburskii
  2025-01-03 21:39     ` Roman Kisel
  2025-01-03 22:08     ` Michael Kelley
  2025-01-08 21:08   ` Nuno Das Neves
  1 sibling, 2 replies; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-03 19:20 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> Due to the hypercall page not being allocated in the VTL mode,
> the code resorts to using a part of the input page.
> 
> Allocate the hypercall output page in the VTL mode thus enabling
> it to use it for output and share code with dom0.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/hv_common.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index c6ed3ba4bf61..c983cfd4d6c0 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -340,7 +340,7 @@ int __init hv_common_init(void)
>  	BUG_ON(!hyperv_pcpu_input_arg);
>  
>  	/* Allocate the per-CPU state for output arg for root */
> -	if (hv_root_partition) {
> +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {

This check doesn't look nice.
First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
particular kernel is being booted in VTL other that VTL0.

Second, current approach is that a VTL1+ kernel is a different build from VTL0
kernel and thus relying on the config option looks reasonable. However,
this is not true in general case.

I'd suggest one of the following three options:

1. Always allocate per-cpu output pages. This is wasteful for the VTL0
guest case, but it might worth it for overall simplification.

2. Don't touch this code and keep the cnage in the Underhill tree.

3. Introduce a configuration option (command line or device tree or
both) and use it instead of the kernel config option.

Thanks,
Stas

>  		hyperv_pcpu_output_arg = alloc_percpu(void *);
>  		BUG_ON(!hyperv_pcpu_output_arg);
>  	}
> @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  	void **inputarg, **outputarg;
>  	u64 msr_vp_index;
>  	gfp_t flags;
> -	int pgcount = hv_root_partition ? 2 : 1;
> +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>  	void *mem;
>  	int ret;
>  
> @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  		if (!mem)
>  			return -ENOMEM;
>  
> -		if (hv_root_partition) {
> +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>  		}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-03 19:20   ` Stanislav Kinsburskii
@ 2025-01-03 21:39     ` Roman Kisel
  2025-01-06 17:11       ` Stanislav Kinsburskii
  2025-01-03 22:08     ` Michael Kelley
  1 sibling, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2025-01-03 21:39 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso



On 1/3/2025 11:20 AM, Stanislav Kinsburskii wrote:
> On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
>> Due to the hypercall page not being allocated in the VTL mode,
>> the code resorts to using a part of the input page.
>>
>> Allocate the hypercall output page in the VTL mode thus enabling
>> it to use it for output and share code with dom0.
>>
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>   drivers/hv/hv_common.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
>> index c6ed3ba4bf61..c983cfd4d6c0 100644
>> --- a/drivers/hv/hv_common.c
>> +++ b/drivers/hv/hv_common.c
>> @@ -340,7 +340,7 @@ int __init hv_common_init(void)
>>   	BUG_ON(!hyperv_pcpu_input_arg);
>>   
>>   	/* Allocate the per-CPU state for output arg for root */
>> -	if (hv_root_partition) {
>> +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> 
> This check doesn't look nice.
I read that as you don't like it. Neither do I (see below), the change
enables what's needed for the rest, and poses no harm imo.

> First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> particular kernel is being booted in VTL other that VTL0. > Second, current approach is that a VTL1+ kernel is a different build 
from VTL0
> kernel and thus relying on the config option looks reasonable. However,
> this is not true in general case.
"First" and "Second" appear to be saying that the approach is good in
your opinion. What is that general case you're alluding to which is
going to be broken by adding IS_ENABLED() here, how do I repro the
possible borkage?

> 
> I'd suggest one of the following three options:
> 
> 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
> guest case, but it might worth it for overall simplification.
As outlined above, the justification for the changes you're requesting
isn't clear. Yet, if no objections from others, I'd happily weed out
these if's and #ifdef's, on that we're in agreement.

> 
> 2. Don't touch this code and keep the cnage in the Underhill tree.
So, leave get_vtl() broken iiuc? Please suggest what would be the fix
you prefer more. The patch set regularizes the common case and makes
get_vtl() look as hv_get_vp_register which it is so get_vtl() can be
replaced with it once it appears in the tree.

All in all, strong disagree. I cannot seem to see how "don't touch"
and "keep" is going to work with upstreaming the VTL mode patches.

> 
> 3. Introduce a configuration option (command line or device tree or
> both) and use it instead of the kernel config option.
That looks to me as messy and complicated compared to adding
IS_ENABLED(). Why defer the decision to runtime, what are the benefits
in your opinion?

> 
> Thanks,
> Stas
> 
>>   		hyperv_pcpu_output_arg = alloc_percpu(void *);
>>   		BUG_ON(!hyperv_pcpu_output_arg);
>>   	}
>> @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>   	void **inputarg, **outputarg;
>>   	u64 msr_vp_index;
>>   	gfp_t flags;
>> -	int pgcount = hv_root_partition ? 2 : 1;
>> +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>>   	void *mem;
>>   	int ret;
>>   
>> @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>   		if (!mem)
>>   			return -ENOMEM;
>>   
>> -		if (hv_root_partition) {
>> +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>>   			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>>   			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>>   		}
>> -- 
>> 2.34.1
>>

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-03 19:20   ` Stanislav Kinsburskii
  2025-01-03 21:39     ` Roman Kisel
@ 2025-01-03 22:08     ` Michael Kelley
       [not found]       ` <CAJ-90NKKfF-KcWJ7sdMCXK9fWiXwMG-9xtjQn9fVhXgjRinZbA@mail.gmail.com>
  2025-01-06 17:23       ` Stanislav Kinsburskii
  1 sibling, 2 replies; 40+ messages in thread
From: Michael Kelley @ 2025-01-03 22:08 UTC (permalink / raw)
  To: Stanislav Kinsburskii, Roman Kisel
  Cc: hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January 3, 2025 11:20 AM
> 
> On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > Due to the hypercall page not being allocated in the VTL mode,
> > the code resorts to using a part of the input page.
> >
> > Allocate the hypercall output page in the VTL mode thus enabling
> > it to use it for output and share code with dom0.
> >
> > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > ---
> >  drivers/hv/hv_common.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> >  	BUG_ON(!hyperv_pcpu_input_arg);
> >
> >  	/* Allocate the per-CPU state for output arg for root */
> > -	if (hv_root_partition) {
> > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> 
> This check doesn't look nice.
> First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> particular kernel is being booted in VTL other that VTL0.

Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
will not run as a normal guest in VTL 0. See the third paragraph of the
"help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.

Michael

> 
> Second, current approach is that a VTL1+ kernel is a different build from VTL0
> kernel and thus relying on the config option looks reasonable. However,
> this is not true in general case.
> 
> I'd suggest one of the following three options:
> 
> 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
> guest case, but it might worth it for overall simplification.
> 
> 2. Don't touch this code and keep the cnage in the Underhill tree.
> 
> 3. Introduce a configuration option (command line or device tree or
> both) and use it instead of the kernel config option.
> 
> Thanks,
> Stas
> 
> >  		hyperv_pcpu_output_arg = alloc_percpu(void *);
> >  		BUG_ON(!hyperv_pcpu_output_arg);
> >  	}
> > @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
> >  	void **inputarg, **outputarg;
> >  	u64 msr_vp_index;
> >  	gfp_t flags;
> > -	int pgcount = hv_root_partition ? 2 : 1;
> > +	const int pgcount = (hv_root_partition ||
> IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
> >  	void *mem;
> >  	int ret;
> >
> > @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
> >  		if (!mem)
> >  			return -ENOMEM;
> >
> > -		if (hv_root_partition) {
> > +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> >  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> >  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
> >  		}
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
       [not found]       ` <CAJ-90NKKfF-KcWJ7sdMCXK9fWiXwMG-9xtjQn9fVhXgjRinZbA@mail.gmail.com>
@ 2025-01-06 14:53         ` Alex Ionescu
  2025-01-06 16:10           ` Michael Kelley
  0 siblings, 1 reply; 40+ messages in thread
From: Alex Ionescu @ 2025-01-06 14:53 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Stanislav Kinsburskii, Roman Kisel, hpa@zytor.com,
	kys@microsoft.com, bp@alien8.de, dave.hansen@linux.intel.com,
	decui@microsoft.com, eahariha@linux.microsoft.com,
	haiyangz@microsoft.com, mingo@redhat.com,
	nunodasneves@linux.microsoft.com, tglx@linutronix.de,
	tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

For another 2c worth, I had previously requested #1 (always allocate
output page) as this would simplify some further work I was interested
in at some point to provide VSM-like functionality to a VTL 0 Linux
guest, which would, at that point, not make it wasteful for VTL 0 any
longer (since some required hypercalls for VSM support need an output
page).

I realize this is "future-proofing" something that doesn't exist yet
but it would avoid a further change down the road.

Best regards,
Alex Ionescu

Best regards,
Alex Ionescu


On Mon, Jan 6, 2025 at 9:45 AM Alex Ionescu <aionescu@gmail.com> wrote:
>
> For another 2c worth, I had previously requested #1 (always allocate output page) as this would simplify some further work I was interested in at some point to provide VSM-like functionality to a VTL 0 Linux guest, which would, at that point, not make it wasteful for VTL 0 any longer (since some required hypercalls for VSM support need an output page).
>
> I realize this is "future-proofing" something that doesn't exist yet but it would avoid a further change down the road.
>
> Best regards,
> Alex Ionescu
>
>
> On Fri, Jan 3, 2025 at 5:08 PM Michael Kelley <mhklinux@outlook.com> wrote:
>>
>> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January 3, 2025 11:20 AM
>> >
>> > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
>> > > Due to the hypercall page not being allocated in the VTL mode,
>> > > the code resorts to using a part of the input page.
>> > >
>> > > Allocate the hypercall output page in the VTL mode thus enabling
>> > > it to use it for output and share code with dom0.
>> > >
>> > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> > > ---
>> > >  drivers/hv/hv_common.c | 6 +++---
>> > >  1 file changed, 3 insertions(+), 3 deletions(-)
>> > >
>> > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
>> > > index c6ed3ba4bf61..c983cfd4d6c0 100644
>> > > --- a/drivers/hv/hv_common.c
>> > > +++ b/drivers/hv/hv_common.c
>> > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
>> > >     BUG_ON(!hyperv_pcpu_input_arg);
>> > >
>> > >     /* Allocate the per-CPU state for output arg for root */
>> > > -   if (hv_root_partition) {
>> > > +   if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>> >
>> > This check doesn't look nice.
>> > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
>> > particular kernel is being booted in VTL other that VTL0.
>>
>> Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
>> will not run as a normal guest in VTL 0. See the third paragraph of the
>> "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
>>
>> Michael
>>
>> >
>> > Second, current approach is that a VTL1+ kernel is a different build from VTL0
>> > kernel and thus relying on the config option looks reasonable. However,
>> > this is not true in general case.
>> >
>> > I'd suggest one of the following three options:
>> >
>> > 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
>> > guest case, but it might worth it for overall simplification.
>> >
>> > 2. Don't touch this code and keep the cnage in the Underhill tree.
>> >
>> > 3. Introduce a configuration option (command line or device tree or
>> > both) and use it instead of the kernel config option.
>> >
>> > Thanks,
>> > Stas
>> >
>> > >             hyperv_pcpu_output_arg = alloc_percpu(void *);
>> > >             BUG_ON(!hyperv_pcpu_output_arg);
>> > >     }
>> > > @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>> > >     void **inputarg, **outputarg;
>> > >     u64 msr_vp_index;
>> > >     gfp_t flags;
>> > > -   int pgcount = hv_root_partition ? 2 : 1;
>> > > +   const int pgcount = (hv_root_partition ||
>> > IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>> > >     void *mem;
>> > >     int ret;
>> > >
>> > > @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>> > >             if (!mem)
>> > >                     return -ENOMEM;
>> > >
>> > > -           if (hv_root_partition) {
>> > > +           if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>> > >                     outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>> > >                     *outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>> > >             }
>> > > --
>> > > 2.34.1
>> > >
>>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 14:53         ` Alex Ionescu
@ 2025-01-06 16:10           ` Michael Kelley
  0 siblings, 0 replies; 40+ messages in thread
From: Michael Kelley @ 2025-01-06 16:10 UTC (permalink / raw)
  To: Alex Ionescu
  Cc: Stanislav Kinsburskii, Roman Kisel, hpa@zytor.com,
	kys@microsoft.com, bp@alien8.de, dave.hansen@linux.intel.com,
	decui@microsoft.com, eahariha@linux.microsoft.com,
	haiyangz@microsoft.com, mingo@redhat.com,
	nunodasneves@linux.microsoft.com, tglx@linutronix.de,
	tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Alex Ionescu <aionescu@gmail.com> Sent: Monday, January 6, 2025 6:54 AM
> 
> For another 2c worth, I had previously requested #1 (always allocate
> output page) as this would simplify some further work I was interested
> in at some point to provide VSM-like functionality to a VTL 0 Linux
> guest, which would, at that point, not make it wasteful for VTL 0 any
> longer (since some required hypercalls for VSM support need an output
> page).
> 
> I realize this is "future-proofing" something that doesn't exist yet
> but it would avoid a further change down the road.
> 

Alex --

I'm prototyping a new approach to managing per-cpu memory for
Hyper-V hypercall inputs and outputs. This new approach doesn’t have
explicit "input" and "output" pages, but may share a single page for
both input and output (without overlapping, as required by the TLFS).
The goal is to abstract and "regularize" the way hypercall input and
output space is set up, so that we avoid the current ad hoc approach
that is open coded for each hypercall invocation and is subject to errors.

I want to make sure my approach would handle your case. Could you
elaborate on the nature of the input and output from the VSM-related
hypercalls? Is it some fixed size structure? Or perhaps a fixed size
structure plus an array? Or something else?

Thanks,

Michael

> >
> > On Fri, Jan 3, 2025 at 5:08 PM Michael Kelley <mhklinux@outlook.com> wrote:
> >>
> >> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January 3, 2025 11:20 AM
> >> >
> >> > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> >> > > Due to the hypercall page not being allocated in the VTL mode,
> >> > > the code resorts to using a part of the input page.
> >> > >
> >> > > Allocate the hypercall output page in the VTL mode thus enabling
> >> > > it to use it for output and share code with dom0.
> >> > >
> >> > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> >> > > ---
> >> > >  drivers/hv/hv_common.c | 6 +++---
> >> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> >> > >
> >> > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> >> > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> >> > > --- a/drivers/hv/hv_common.c
> >> > > +++ b/drivers/hv/hv_common.c
> >> > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> >> > >     BUG_ON(!hyperv_pcpu_input_arg);
> >> > >
> >> > >     /* Allocate the per-CPU state for output arg for root */
> >> > > -   if (hv_root_partition) {
> >> > > +   if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> >> >
> >> > This check doesn't look nice.
> >> > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> >> > particular kernel is being booted in VTL other that VTL0.
> >>
> >> Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
> >> will not run as a normal guest in VTL 0. See the third paragraph of the
> >> "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
> >>
> >> Michael
> >>
> >> >
> >> > Second, current approach is that a VTL1+ kernel is a different build from VTL0
> >> > kernel and thus relying on the config option looks reasonable. However,
> >> > this is not true in general case.
> >> >
> >> > I'd suggest one of the following three options:
> >> >
> >> > 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
> >> > guest case, but it might worth it for overall simplification.
> >> >
> >> > 2. Don't touch this code and keep the cnage in the Underhill tree.
> >> >
> >> > 3. Introduce a configuration option (command line or device tree or
> >> > both) and use it instead of the kernel config option.
> >> >
> >> > Thanks,
> >> > Stas
> >> >
> >> > >             hyperv_pcpu_output_arg = alloc_percpu(void *);
> >> > >             BUG_ON(!hyperv_pcpu_output_arg);
> >> > >     }
> >> > > @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
> >> > >     void **inputarg, **outputarg;
> >> > >     u64 msr_vp_index;
> >> > >     gfp_t flags;
> >> > > -   int pgcount = hv_root_partition ? 2 : 1;
> >> > > +   const int pgcount = (hv_root_partition ||
> >> > IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
> >> > >     void *mem;
> >> > >     int ret;
> >> > >
> >> > > @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
> >> > >             if (!mem)
> >> > >                     return -ENOMEM;
> >> > >
> >> > > -           if (hv_root_partition) {
> >> > > +           if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> >> > >                     outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> >> > >                     *outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
> >> > >             }
> >> > > --
> >> > > 2.34.1
> >> > >
> >>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-03 21:39     ` Roman Kisel
@ 2025-01-06 17:11       ` Stanislav Kinsburskii
  2025-01-06 18:11         ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-06 17:11 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Fri, Jan 03, 2025 at 01:39:29PM -0800, Roman Kisel wrote:
> 
> 
> On 1/3/2025 11:20 AM, Stanislav Kinsburskii wrote:
> > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > > Due to the hypercall page not being allocated in the VTL mode,
> > > the code resorts to using a part of the input page.
> > > 
> > > Allocate the hypercall output page in the VTL mode thus enabling
> > > it to use it for output and share code with dom0.
> > > 
> > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > ---
> > >   drivers/hv/hv_common.c | 6 +++---
> > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > > --- a/drivers/hv/hv_common.c
> > > +++ b/drivers/hv/hv_common.c
> > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> > >   	BUG_ON(!hyperv_pcpu_input_arg);
> > >   	/* Allocate the per-CPU state for output arg for root */
> > > -	if (hv_root_partition) {
> > > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > 
> > This check doesn't look nice.
> I read that as you don't like it. Neither do I (see below), the change
> enables what's needed for the rest, and poses no harm imo.
> 
> > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> > particular kernel is being booted in VTL other that VTL0. > Second,
> > current approach is that a VTL1+ kernel is a different build
> from VTL0
> > kernel and thus relying on the config option looks reasonable. However,
> > this is not true in general case.
> "First" and "Second" appear to be saying that the approach is good in
> your opinion. What is that general case you're alluding to which is
> going to be broken by adding IS_ENABLED() here, how do I repro the
> possible borkage?
> 
> > 
> > I'd suggest one of the following three options:
> > 
> > 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
> > guest case, but it might worth it for overall simplification.
> As outlined above, the justification for the changes you're requesting
> isn't clear. Yet, if no objections from others, I'd happily weed out
> these if's and #ifdef's, on that we're in agreement.
> 
> > 
> > 2. Don't touch this code and keep the cnage in the Underhill tree.
> So, leave get_vtl() broken iiuc? Please suggest what would be the fix
> you prefer more. The patch set regularizes the common case and makes
> get_vtl() look as hv_get_vp_register which it is so get_vtl() can be
> replaced with it once it appears in the tree.
> 
> All in all, strong disagree. I cannot seem to see how "don't touch"
> and "keep" is going to work with upstreaming the VTL mode patches.
> 
> > 
> > 3. Introduce a configuration option (command line or device tree or
> > both) and use it instead of the kernel config option.
> That looks to me as messy and complicated compared to adding
> IS_ENABLED(). Why defer the decision to runtime, what are the benefits
> in your opinion?
> 

The issue is that when you boot the same kernel in both VTL0 and VTL1+,
the pages will be allocated in any case (root or guest, VTL0 or VTL1+).

Also, there are other places in the code, where braching needs to be done
differently depending in on which VTL the execution is happening in.

I think, there are two possible paths we can take here.

The first one is when the checks are done during compilation. In this case
the kernel should explicitly fail to boot in VTL0 if compiled for VTL1+
and vice versa.

The second one if to make checks in runtime and let the same kernel boot
differently in different VTLs.

Thoughts?

Stas

> > 
> > Thanks,
> > Stas
> > 
> > >   		hyperv_pcpu_output_arg = alloc_percpu(void *);
> > >   		BUG_ON(!hyperv_pcpu_output_arg);
> > >   	}
> > > @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
> > >   	void **inputarg, **outputarg;
> > >   	u64 msr_vp_index;
> > >   	gfp_t flags;
> > > -	int pgcount = hv_root_partition ? 2 : 1;
> > > +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
> > >   	void *mem;
> > >   	int ret;
> > > @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
> > >   		if (!mem)
> > >   			return -ENOMEM;
> > > -		if (hv_root_partition) {
> > > +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > >   			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> > >   			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
> > >   		}
> > > -- 
> > > 2.34.1
> > > 
> 
> -- 
> Thank you,
> Roman
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-03 22:08     ` Michael Kelley
       [not found]       ` <CAJ-90NKKfF-KcWJ7sdMCXK9fWiXwMG-9xtjQn9fVhXgjRinZbA@mail.gmail.com>
@ 2025-01-06 17:23       ` Stanislav Kinsburskii
  2025-01-06 18:18         ` Michael Kelley
  1 sibling, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-06 17:23 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Roman Kisel, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

On Fri, Jan 03, 2025 at 10:08:05PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January 3, 2025 11:20 AM
> > 
> > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > > Due to the hypercall page not being allocated in the VTL mode,
> > > the code resorts to using a part of the input page.
> > >
> > > Allocate the hypercall output page in the VTL mode thus enabling
> > > it to use it for output and share code with dom0.
> > >
> > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > ---
> > >  drivers/hv/hv_common.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > > --- a/drivers/hv/hv_common.c
> > > +++ b/drivers/hv/hv_common.c
> > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> > >  	BUG_ON(!hyperv_pcpu_input_arg);
> > >
> > >  	/* Allocate the per-CPU state for output arg for root */
> > > -	if (hv_root_partition) {
> > > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > 
> > This check doesn't look nice.
> > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> > particular kernel is being booted in VTL other that VTL0.
> 
> Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
> will not run as a normal guest in VTL 0. See the third paragraph of the
> "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
> 

Thanks for pointing to this piece.

This limitation looks aritificial to me and as VTL support in Linux is
currently being extended beyond Underhill support, keeping this
restriction makes some further development in scope of LVBS support
complicated and error prone due to potential ABI mismatches between
Linux kernels in different VTLs.

IOW, making the same kernel properly bootable (or - worse - explicitly
un-bootable) in different VTLs is a more robust way in the long run.

Thanks,
Stas

> Michael
> 
> > 
> > Second, current approach is that a VTL1+ kernel is a different build from VTL0
> > kernel and thus relying on the config option looks reasonable. However,
> > this is not true in general case.
> > 
> > I'd suggest one of the following three options:
> > 
> > 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
> > guest case, but it might worth it for overall simplification.
> > 
> > 2. Don't touch this code and keep the cnage in the Underhill tree.
> > 
> > 3. Introduce a configuration option (command line or device tree or
> > both) and use it instead of the kernel config option.
> > 
> > Thanks,
> > Stas
> > 
> > >  		hyperv_pcpu_output_arg = alloc_percpu(void *);
> > >  		BUG_ON(!hyperv_pcpu_output_arg);
> > >  	}
> > > @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
> > >  	void **inputarg, **outputarg;
> > >  	u64 msr_vp_index;
> > >  	gfp_t flags;
> > > -	int pgcount = hv_root_partition ? 2 : 1;
> > > +	const int pgcount = (hv_root_partition ||
> > IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
> > >  	void *mem;
> > >  	int ret;
> > >
> > > @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
> > >  		if (!mem)
> > >  			return -ENOMEM;
> > >
> > > -		if (hv_root_partition) {
> > > +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > >  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> > >  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
> > >  		}
> > > --
> > > 2.34.1
> > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2024-12-30 18:09 ` [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers Roman Kisel
@ 2025-01-06 17:37   ` Michael Kelley
  2025-01-06 20:24     ` Roman Kisel
  2025-01-08 17:58     ` Nuno Das Neves
  0 siblings, 2 replies; 40+ messages in thread
From: Michael Kelley @ 2025-01-06 17:37 UTC (permalink / raw)
  To: Roman Kisel, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com, ssengar@microsoft.com,
	sunilmut@microsoft.com, vdso@hexbites.dev

From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30, 2024 10:10 AM
> 
> There is no definition of the output structure for the
> GetVpRegisters hypercall. Hence, using the hypercall
> is not possible when the output value has some structure
> to it. Even getting a datum of a primitive type reads
> as ad-hoc without that definition.
> 
> Define struct hv_output_get_vp_registers to enable using
> the GetVpRegisters hypercall. Make provisions for all
> supported architectures. No functional changes.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  include/hyperv/hvgdk_mini.h | 49 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 49 insertions(+)
> 
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index db3d1aaf7330..e8e3faa78e15 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -1068,6 +1068,35 @@ union hv_dispatch_suspend_register {
>  	} __packed;
>  };
> 
> +union hv_arm64_pending_interruption_register {
> +	u64 as_uint64;
> +	struct {
> +		u64 interruption_pending : 1;
> +		u64 interruption_type : 1;
> +		u64 reserved : 30;
> +		u32 error_code;

These bit field definitions don't look right. We want to "fill up"
the field size, so that we're explicit about each bit, and not leave
it to the compiler to add padding (which __packed tells the
compiler not to do). So in aggregate, the "u64" bit fields should
account for all 64 bits, but here you account for only 32 bits.
There are two ways to fix this:

		u32 interruption_pending : 1;
		u32 interruption_type: 1;
		u32 reserved : 30;
		u32 error_code;
Or
		u64 interruption_pending : 1;
		u64 interruption_type: 1;
		u64 reserved : 30;
		u64 error_code : 32;

> +	} __packed;
> +};
> +
> +union hv_arm64_interrupt_state_register {
> +	u64 as_uint64;
> +	struct {
> +		u64 interrupt_shadow : 1;
> +		u64 reserved : 63;
> +	} __packed;
> +};
> +
> +union hv_arm64_pending_synthetic_exception_event {
> +	u64 as_uint64[2];
> +	struct {
> +		u32 event_pending : 1;
> +		u32 event_type : 3;
> +		u32 reserved : 4;

Same here. Expand the "reserved" field to 28 bits?  Or maybe
there's a reason to have two separate reserved fields of 4 bits
and 24 bits. I'm not sure what the register layout is supposed to
be. Looking at hv_arm64_pending_synthetic_exception_event
in the OHCL-Linux-Kernel github tree shows the same gap of
24 bits, so that doesn't provide any guidance.

> +		u32 exception_type;
> +		u64 context;
> +	} __packed;
> +};
> +
>  union hv_x64_interrupt_state_register {
>  	u64 as_uint64;
>  	struct {
> @@ -1103,8 +1132,28 @@ union hv_register_value {
>  	union hv_explicit_suspend_register explicit_suspend;
>  	union hv_intercept_suspend_register intercept_suspend;
>  	union hv_dispatch_suspend_register dispatch_suspend;
> +#ifdef CONFIG_ARM64
> +	union hv_arm64_interrupt_state_register interrupt_state;
> +	union hv_arm64_pending_interruption_register pending_interruption;
> +#endif
> +#ifdef CONFIG_X86
>  	union hv_x64_interrupt_state_register interrupt_state;
>  	union hv_x64_pending_interruption_register pending_interruption;
> +#endif
> +	union hv_arm64_pending_synthetic_exception_event pending_synthetic_exception_event;
> +};

Per the previous discussion, I can see that the #ifdef's are needed
here to disambiguate the field names that are the same, but have
different unions on x86 and arm64.

But on the flip side, I wonder if the field names should really be the
same. Because of the different unions, it seems like they couldn't be
accessed by architecture neutral code (unless the access is just using
the "as_uint64" option?). So giving the fields names like
"x86_interrupt_state" and "arm64_interrupt_state" instead of just
"interrupt_state" might be more consistent with how the rest of this
file handles architecture differences. But I don't know all the implications
of making such a change.

Nuno -- your thoughts?

Michael

> +
> +/*
> + * NOTE: Linux helper struct - NOT from Hyper-V code.
> + * DECLARE_FLEX_ARRAY() needs to be wrapped into
> + * a structure and have at least one more member besides
> + * DECLARE_FLEX_ARRAY.
> + */
> +struct hv_output_get_vp_registers {
> +	struct {
> +		DECLARE_FLEX_ARRAY(union hv_register_value, values);
> +		struct {} values_end;
> +	};
>  };
> 
>  #if defined(CONFIG_ARM64)
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 17:11       ` Stanislav Kinsburskii
@ 2025-01-06 18:11         ` Roman Kisel
  2025-01-06 19:32           ` Stanislav Kinsburskii
  0 siblings, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2025-01-06 18:11 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On 1/6/2025 9:11 AM, Stanislav Kinsburskii wrote:
> On Fri, Jan 03, 2025 at 01:39:29PM -0800, Roman Kisel wrote:
>>

[...]

>>
> 
> The issue is that when you boot the same kernel in both VTL0 and VTL1+,
> the pages will be allocated in any case (root or guest, VTL0 or VTL1+).

I think we share we same beliefs: use common code as much as possible.
Strategically, one day, there will be the kernel being able to boot
(or at the very minimum share the Hyper-V code for) VTL0, VTL1 (LVBS)
and VTL2 (OpenHCL). It is not today, though: VTL0 relies on ACPI|BIOS,
VTL2 relies on DeviceTree, and VTL1 boot configuration comes off as
a bit ad-hoc from my read of https://github.com/heki-linux/lvbs-linux,
and working with the LVBS folks on debugging that.

Can that day of the grand VTL code unification be tomorrow, or next
week, or next month, or maybe next year, what is the option you leaning
towards?

To me, it seems, that's not even the next month. Let us take a look
at how much ink is being spent to just fix a garden variety function.
On the meta-level that might mean some _fundamental work_ is needed to
provide a _robust foundation_ to built upon, such as removing the if
statements and #ifdef's we're debating about to let the general case
shine through.

Tactically, imo, a staged approach might give more velocity and
coverage instead of fixing the world in this small patch set. I would
not want to increase the potential "blast radius" of the change.
As it stands, it is pretty well-contained.

All told, it might be prudent to focus on the task at hand - fix the
function in question to enable building on that, e.g. proposing the v4
of the ARM64 VTL mode patches, and more of what we have in
https://github.com/microsoft/OHCL-Linux-Kernel.

Once we take that small step to fix the hyperv-next tree, someone could
propose removing the conditions for allocating the output page —-- or,
perhaps, suggest an entirely new & vastly better solution to handling
the hypercall output page. IMHO, that would enable adding features via
relying on more generic code rather than further complicating the web
of conditional statements and conditional compilation.

> 
> Also, there are other places in the code, where braching needs to be done
> differently depending in on which VTL the execution is happening in.
> 
> I think, there are two possible paths we can take here.
> 
> The first one is when the checks are done during compilation. In this case
> the kernel should explicitly fail to boot in VTL0 if compiled for VTL1+
> and vice versa.
> 
> The second one if to make checks in runtime and let the same kernel boot
> differently in different VTLs.
> 
> Thoughts?
> 
> Stas
> 
>>>
>>> Thanks,
>>> Stas
>>>
>>>>    		hyperv_pcpu_output_arg = alloc_percpu(void *);
>>>>    		BUG_ON(!hyperv_pcpu_output_arg);
>>>>    	}
>>>> @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>>>    	void **inputarg, **outputarg;
>>>>    	u64 msr_vp_index;
>>>>    	gfp_t flags;
>>>> -	int pgcount = hv_root_partition ? 2 : 1;
>>>> +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>>>>    	void *mem;
>>>>    	int ret;
>>>> @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>>>    		if (!mem)
>>>>    			return -ENOMEM;
>>>> -		if (hv_root_partition) {
>>>> +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>>>>    			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>>>>    			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>>>>    		}
>>>> -- 
>>>> 2.34.1
>>>>
>>
>> -- 
>> Thank you,
>> Roman
>>

-- 
Thank you,
Roman

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 17:23       ` Stanislav Kinsburskii
@ 2025-01-06 18:18         ` Michael Kelley
  2025-01-06 19:19           ` Stanislav Kinsburskii
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Kelley @ 2025-01-06 18:18 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: Roman Kisel, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, January 6, 2025 9:23 AM
> 
> On Fri, Jan 03, 2025 at 10:08:05PM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January
> 3, 2025 11:20 AM
> > >
> > > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > > > Due to the hypercall page not being allocated in the VTL mode,
> > > > the code resorts to using a part of the input page.
> > > >
> > > > Allocate the hypercall output page in the VTL mode thus enabling
> > > > it to use it for output and share code with dom0.
> > > >
> > > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > > ---
> > > >  drivers/hv/hv_common.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > > > --- a/drivers/hv/hv_common.c
> > > > +++ b/drivers/hv/hv_common.c
> > > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> > > >  	BUG_ON(!hyperv_pcpu_input_arg);
> > > >
> > > >  	/* Allocate the per-CPU state for output arg for root */
> > > > -	if (hv_root_partition) {
> > > > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > >
> > > This check doesn't look nice.
> > > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> > > particular kernel is being booted in VTL other that VTL0.
> >
> > Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
> > will not run as a normal guest in VTL 0. See the third paragraph of the
> > "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
> >
> 
> Thanks for pointing to this piece.
> 
> This limitation looks aritificial to me and as VTL support in Linux is
> currently being extended beyond Underhill support, keeping this
> restriction makes some further development in scope of LVBS support
> complicated and error prone due to potential ABI mismatches between
> Linux kernels in different VTLs.
> 
> IOW, making the same kernel properly bootable (or - worse - explicitly
> un-bootable) in different VTLs is a more robust way in the long run.

The reason for the limitation is the sequencing of early Hyper-V-related
initialization steps. Knowing at runtime whether you are running at
VTL0 or some other VTL requires making a hypercall in get_vtl().
Unfortunately, the machinery for making a hypercall (setting the guest
OS ID, and allocating the x86 hypercall page) is established relatively late
during initialization, in hyperv_init(). But running in other than VTL0
requires the initializations that are done in hv_vtl_init_platform(), which
must be done much earlier. There's no clear way out of this conundrum
purely on the Linux guest side.

To solve the conundrum on x86, one possibility to consider is having
Hyper-V make HV_REGISTER_VSM_VP_STATUS available as a synthetic
MSR, which can be read without making a hypercall. This register could be
read in ms_hyperv_init_platform() to know if running at VTL0 or not.
Using synthetic MSRs is how other aspects of early Hyper-V-related
initialization is done in Linux on x86.

I think there's some discussion on the x86 sequencing issues on LKML
from when the VTL code was first added. I was part of that discussion, but
don't remember all the details. There might additional issues raised in
that discussion.

The sequencing issues would also need to be sorted out on the arm64
side, as they are different from x86. We don't have an early Hyper-V
specific hook like ms_hyperv_init_platform() on the arm64 side, so
that might be problem. But on the flip side, we also don't have the
x86-specific messiness that hv_vtl_init_platform() handles. Also,
there are no synthetic MSRs on arm64, so register accesses always
use hypercalls, but there's no hypercall page needed. On balance, I
think getting VTL stuff initialized on arm64 will be easier, but I'm not sure.

Michael

> 
> Thanks,
> Stas
> 
> > Michael
> >
> > >
> > > Second, current approach is that a VTL1+ kernel is a different build from VTL0
> > > kernel and thus relying on the config option looks reasonable. However,
> > > this is not true in general case.
> > >
> > > I'd suggest one of the following three options:
> > >
> > > 1. Always allocate per-cpu output pages. This is wasteful for the VTL0
> > > guest case, but it might worth it for overall simplification.
> > >
> > > 2. Don't touch this code and keep the cnage in the Underhill tree.
> > >
> > > 3. Introduce a configuration option (command line or device tree or
> > > both) and use it instead of the kernel config option.
> > >
> > > Thanks,
> > > Stas
> > >
> > > >  		hyperv_pcpu_output_arg = alloc_percpu(void *);
> > > >  		BUG_ON(!hyperv_pcpu_output_arg);
> > > >  	}
> > > > @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
> > > >  	void **inputarg, **outputarg;
> > > >  	u64 msr_vp_index;
> > > >  	gfp_t flags;
> > > > -	int pgcount = hv_root_partition ? 2 : 1;
> > > > +	const int pgcount = (hv_root_partition ||
> > > IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
> > > >  	void *mem;
> > > >  	int ret;
> > > >
> > > > @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
> > > >  		if (!mem)
> > > >  			return -ENOMEM;
> > > >
> > > > -		if (hv_root_partition) {
> > > > +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > > >  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> > > >  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
> > > >  		}
> > > > --
> > > > 2.34.1
> > > >

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 18:18         ` Michael Kelley
@ 2025-01-06 19:19           ` Stanislav Kinsburskii
  2025-01-06 19:49             ` Michael Kelley
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-06 19:19 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Roman Kisel, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

On Mon, Jan 06, 2025 at 06:18:51PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, January 6, 2025 9:23 AM
> > 
> > On Fri, Jan 03, 2025 at 10:08:05PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January
> > 3, 2025 11:20 AM
> > > >
> > > > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > > > > Due to the hypercall page not being allocated in the VTL mode,
> > > > > the code resorts to using a part of the input page.
> > > > >
> > > > > Allocate the hypercall output page in the VTL mode thus enabling
> > > > > it to use it for output and share code with dom0.
> > > > >
> > > > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > > > ---
> > > > >  drivers/hv/hv_common.c | 6 +++---
> > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > > > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > > > > --- a/drivers/hv/hv_common.c
> > > > > +++ b/drivers/hv/hv_common.c
> > > > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> > > > >  	BUG_ON(!hyperv_pcpu_input_arg);
> > > > >
> > > > >  	/* Allocate the per-CPU state for output arg for root */
> > > > > -	if (hv_root_partition) {
> > > > > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > > >
> > > > This check doesn't look nice.
> > > > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> > > > particular kernel is being booted in VTL other that VTL0.
> > >
> > > Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
> > > will not run as a normal guest in VTL 0. See the third paragraph of the
> > > "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
> > >
> > 
> > Thanks for pointing to this piece.
> > 
> > This limitation looks aritificial to me and as VTL support in Linux is
> > currently being extended beyond Underhill support, keeping this
> > restriction makes some further development in scope of LVBS support
> > complicated and error prone due to potential ABI mismatches between
> > Linux kernels in different VTLs.
> > 
> > IOW, making the same kernel properly bootable (or - worse - explicitly
> > un-bootable) in different VTLs is a more robust way in the long run.
> 
> The reason for the limitation is the sequencing of early Hyper-V-related
> initialization steps. Knowing at runtime whether you are running at
> VTL0 or some other VTL requires making a hypercall in get_vtl().
> Unfortunately, the machinery for making a hypercall (setting the guest
> OS ID, and allocating the x86 hypercall page) is established relatively late
> during initialization, in hyperv_init(). But running in other than VTL0
> requires the initializations that are done in hv_vtl_init_platform(), which
> must be done much earlier. There's no clear way out of this conundrum
> purely on the Linux guest side.
> 
> To solve the conundrum on x86, one possibility to consider is having
> Hyper-V make HV_REGISTER_VSM_VP_STATUS available as a synthetic
> MSR, which can be read without making a hypercall. This register could be
> read in ms_hyperv_init_platform() to know if running at VTL0 or not.
> Using synthetic MSRs is how other aspects of early Hyper-V-related
> initialization is done in Linux on x86.
> 
> I think there's some discussion on the x86 sequencing issues on LKML
> from when the VTL code was first added. I was part of that discussion, but
> don't remember all the details. There might additional issues raised in
> that discussion.
> 
> The sequencing issues would also need to be sorted out on the arm64
> side, as they are different from x86. We don't have an early Hyper-V
> specific hook like ms_hyperv_init_platform() on the arm64 side, so
> that might be problem. But on the flip side, we also don't have the
> x86-specific messiness that hv_vtl_init_platform() handles. Also,
> there are no synthetic MSRs on arm64, so register accesses always
> use hypercalls, but there's no hypercall page needed. On balance, I
> think getting VTL stuff initialized on arm64 will be easier, but I'm not sure.
> 
> Michael

Thank you for summarizing this up. This aligns with my understanding.

Since VTL1 firmware is a payload for VTL0 kernel, one of my proposals to
the original message was to explicitly notify the kernel it's running in
VTL1 via command line argument and/or DT.

Thanks,
Stas

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 18:11         ` Roman Kisel
@ 2025-01-06 19:32           ` Stanislav Kinsburskii
  2025-01-06 21:07             ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-06 19:32 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Mon, Jan 06, 2025 at 10:11:16AM -0800, Roman Kisel wrote:
> 
> 
> On 1/6/2025 9:11 AM, Stanislav Kinsburskii wrote:
> > On Fri, Jan 03, 2025 at 01:39:29PM -0800, Roman Kisel wrote:
> > > 
> 
> [...]
> 
> > > 
> > 
> > The issue is that when you boot the same kernel in both VTL0 and VTL1+,
> > the pages will be allocated in any case (root or guest, VTL0 or VTL1+).
> 
> I think we share we same beliefs: use common code as much as possible.
> Strategically, one day, there will be the kernel being able to boot
> (or at the very minimum share the Hyper-V code for) VTL0, VTL1 (LVBS)
> and VTL2 (OpenHCL). It is not today, though: VTL0 relies on ACPI|BIOS,
> VTL2 relies on DeviceTree, and VTL1 boot configuration comes off as
> a bit ad-hoc from my read of https://github.com/heki-linux/lvbs-linux,
> and working with the LVBS folks on debugging that.
> 
> Can that day of the grand VTL code unification be tomorrow, or next
> week, or next month, or maybe next year, what is the option you leaning
> towards?
> 
> To me, it seems, that's not even the next month. Let us take a look
> at how much ink is being spent to just fix a garden variety function.
> On the meta-level that might mean some _fundamental work_ is needed to
> provide a _robust foundation_ to built upon, such as removing the if
> statements and #ifdef's we're debating about to let the general case
> shine through.
> 
> Tactically, imo, a staged approach might give more velocity and
> coverage instead of fixing the world in this small patch set. I would
> not want to increase the potential "blast radius" of the change.
> As it stands, it is pretty well-contained.
> 
> All told, it might be prudent to focus on the task at hand - fix the
> function in question to enable building on that, e.g. proposing the v4
> of the ARM64 VTL mode patches, and more of what we have in
> https://github.com/microsoft/OHCL-Linux-Kernel.
> 
> Once we take that small step to fix the hyperv-next tree, someone could
> propose removing the conditions for allocating the output page —-- or,
> perhaps, suggest an entirely new & vastly better solution to handling
> the hypercall output page. IMHO, that would enable adding features via
> relying on more generic code rather than further complicating the web
> of conditional statements and conditional compilation.
> 

From my POV a decision between a unified approach and interim solutions
in upstream should usually be resolved in favor of the former.
Given there are different stake holders in VTL code integration, I'd
suggest we step back a bit and think about how to proceed with the
overall design.

In my opinion, although I undestand why Underhill project decided to
come up with the original VTL kernels separation during build time, it's
time to reconsider this approach and to come up with a more generic
design, supporting booting the same kernel in different VTLs.

The major reason for this is that LVBS project relies on binary
compatibility of the kernels running in different VTLs.
The simplest way to provide such a guarantee in both development and
deployment is to run the same kernel in both VTLs.
Not having this ability will require carefull crafting of both the
kernels upon build, making kexec servicing of such kernels in production
complicated and error prone.

Thanks,
Stas

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 19:19           ` Stanislav Kinsburskii
@ 2025-01-06 19:49             ` Michael Kelley
  2025-01-06 21:12               ` Stanislav Kinsburskii
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Kelley @ 2025-01-06 19:49 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: Roman Kisel, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, January 6, 2025 11:19 AM
> 
> On Mon, Jan 06, 2025 at 06:18:51PM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, January 6, 2025 9:23 AM
> > >
> > > On Fri, Jan 03, 2025 at 10:08:05PM +0000, Michael Kelley wrote:
> > > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January
> > > 3, 2025 11:20 AM
> > > > >
> > > > > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > > > > > Due to the hypercall page not being allocated in the VTL mode,
> > > > > > the code resorts to using a part of the input page.
> > > > > >
> > > > > > Allocate the hypercall output page in the VTL mode thus enabling
> > > > > > it to use it for output and share code with dom0.
> > > > > >
> > > > > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > > > > ---
> > > > > >  drivers/hv/hv_common.c | 6 +++---
> > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > > > > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > > > > > --- a/drivers/hv/hv_common.c
> > > > > > +++ b/drivers/hv/hv_common.c
> > > > > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> > > > > >  	BUG_ON(!hyperv_pcpu_input_arg);
> > > > > >
> > > > > >  	/* Allocate the per-CPU state for output arg for root */
> > > > > > -	if (hv_root_partition) {
> > > > > > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > > > >
> > > > > This check doesn't look nice.
> > > > > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> > > > > particular kernel is being booted in VTL other that VTL0.
> > > >
> > > > Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
> > > > will not run as a normal guest in VTL 0. See the third paragraph of the
> > > > "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
> > > >
> > >
> > > Thanks for pointing to this piece.
> > >
> > > This limitation looks aritificial to me and as VTL support in Linux is
> > > currently being extended beyond Underhill support, keeping this
> > > restriction makes some further development in scope of LVBS support
> > > complicated and error prone due to potential ABI mismatches between
> > > Linux kernels in different VTLs.
> > >
> > > IOW, making the same kernel properly bootable (or - worse - explicitly
> > > un-bootable) in different VTLs is a more robust way in the long run.
> >
> > The reason for the limitation is the sequencing of early Hyper-V-related
> > initialization steps. Knowing at runtime whether you are running at
> > VTL0 or some other VTL requires making a hypercall in get_vtl().
> > Unfortunately, the machinery for making a hypercall (setting the guest
> > OS ID, and allocating the x86 hypercall page) is established relatively late
> > during initialization, in hyperv_init(). But running in other than VTL0
> > requires the initializations that are done in hv_vtl_init_platform(), which
> > must be done much earlier. There's no clear way out of this conundrum
> > purely on the Linux guest side.
> >
> > To solve the conundrum on x86, one possibility to consider is having
> > Hyper-V make HV_REGISTER_VSM_VP_STATUS available as a synthetic
> > MSR, which can be read without making a hypercall. This register could be
> > read in ms_hyperv_init_platform() to know if running at VTL0 or not.
> > Using synthetic MSRs is how other aspects of early Hyper-V-related
> > initialization is done in Linux on x86.
> >
> > I think there's some discussion on the x86 sequencing issues on LKML
> > from when the VTL code was first added. I was part of that discussion, but
> > don't remember all the details. There might additional issues raised in
> > that discussion.
> >
> > The sequencing issues would also need to be sorted out on the arm64
> > side, as they are different from x86. We don't have an early Hyper-V
> > specific hook like ms_hyperv_init_platform() on the arm64 side, so
> > that might be problem. But on the flip side, we also don't have the
> > x86-specific messiness that hv_vtl_init_platform() handles. Also,
> > there are no synthetic MSRs on arm64, so register accesses always
> > use hypercalls, but there's no hypercall page needed. On balance, I
> > think getting VTL stuff initialized on arm64 will be easier, but I'm not sure.
> >
> > Michael
> 
> Thank you for summarizing this up. This aligns with my understanding.
> 
> Since VTL1 firmware is a payload for VTL0 kernel, one of my proposals to
> the original message was to explicitly notify the kernel it's running in
> VTL1 via command line argument and/or DT.
> 

OK, yes.  Rather than non-VTL0 behavior being a kernel build-time decision,
make it a kernel boot-line based decision. A runtime decision based on
detecting the VTL is hard as described above.

I don't immediately see an initialization sequence problem in using a kernel
boot line parameter to specify running in non-VTL0. I'm unsure if DT would
be available at the time ms_hyperv_init_platform() runs and decides to call
hv_vtl_init_platform(). Have you looked to see?

Michael

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2025-01-06 17:37   ` Michael Kelley
@ 2025-01-06 20:24     ` Roman Kisel
  2025-01-08  7:34       ` Wei Liu
  2025-01-08 17:58     ` Nuno Das Neves
  1 sibling, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2025-01-06 20:24 UTC (permalink / raw)
  To: Michael Kelley, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com, ssengar@microsoft.com,
	sunilmut@microsoft.com, vdso@hexbites.dev



On 1/6/2025 9:37 AM, Michael Kelley wrote:
> From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30, 2024 10:10 AM
[...]
> 
> These bit field definitions don't look right. We want to "fill up"
> the field size, so that we're explicit about each bit, and not leave
> it to the compiler to add padding (which __packed tells the
> compiler not to do). So in aggregate, the "u64" bit fields should
> account for all 64 bits, but here you account for only 32 bits.
> There are two ways to fix this:
> 
> 		u32 interruption_pending : 1;
> 		u32 interruption_type: 1;
> 		u32 reserved : 30;
> 		u32 error_code;
> Or
> 		u64 interruption_pending : 1;
> 		u64 interruption_type: 1;
> 		u64 reserved : 30;
> 		u64 error_code : 32;
> 
> 
> Nuno -- your thoughts?

Quite a blunder on my side! Thank you very much for your help :)
GDB is my witness:

(gdb) ptype /o union hv_arm64_pending_synthetic_exception_event
/* offset      |    size */  type = union 
hv_arm64_pending_synthetic_exception_event {
/*                    16 */    u64 as_uint64[2];
/*                    13 */    struct {
/*      0: 0   |       4 */        u32 event_pending : 1;
/*      0: 1   |       4 */        u32 event_type : 3;
/*      0: 4   |       4 */        u32 reserved : 4;
/*      1      |       4 */        u32 exception_type;
/*      5      |       8 */        u64 context;

                                    /* total size (bytes):   13 */
                                };

                                /* total size (bytes):   16 */
                              }

which looks messed up to me to put it mildly. Will fix next time.

[...]

> 
> Michael
> 
>> +
>> +/*
>> + * NOTE: Linux helper struct - NOT from Hyper-V code.
>> + * DECLARE_FLEX_ARRAY() needs to be wrapped into
>> + * a structure and have at least one more member besides
>> + * DECLARE_FLEX_ARRAY.
>> + */
>> +struct hv_output_get_vp_registers {
>> +	struct {
>> +		DECLARE_FLEX_ARRAY(union hv_register_value, values);
>> +		struct {} values_end;
>> +	};
>>   };
>>
>>   #if defined(CONFIG_ARM64)
>> --
>> 2.34.1

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 19:32           ` Stanislav Kinsburskii
@ 2025-01-06 21:07             ` Roman Kisel
  2025-01-07 19:18               ` Stanislav Kinsburskii
  0 siblings, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2025-01-06 21:07 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso



On 1/6/2025 11:32 AM, Stanislav Kinsburskii wrote:
> On Mon, Jan 06, 2025 at 10:11:16AM -0800, Roman Kisel wrote:
[...]s

> From my POV a decision between a unified approach and interim solutions
> in upstream should usually be resolved in favor of the former.
> Given there are different stake holders in VTL code integration, I'd
> suggest we step back a bit and think about how to proceed with the
> overall design.
I don't see any compelling reason for stalling this fix and think for
no one knows how long. What is going to be fixed is clear, and it has
not been demonstrated what is going to be broken when this change is
merged.

> 
> In my opinion, although I undestand why Underhill project decided to
> come up with the original VTL kernels separation during build time, it's
> time to reconsider this approach and to come up with a more generic
> design, supporting booting the same kernel in different VTLs.
"The same kernel" means supporting ACPI/UEFI for VTL0, VTL1, VTL2 as
otherwise VTL0 won't boot. But why would VTL>0 even consider relying on
ACPI or compiling it in? I would fix your broad assertion with adding
one constraint: share as much Hyper-V code as possible, booting is not
expected, rather refuse building.

> 
> The major reason for this is that LVBS project relies on binary
> compatibility of the kernels running in different VTLs.
> The simplest way to provide such a guarantee in both development and
> deployment is to run the same kernel in both VTLs.

OpenHCL aka Underhill is the only shipping product relying on
the code in question (others might be dom0 and LVBS). What the LVBS
project relies on might change any number of times any day before it
ships. OpenHCL works for customers and slicing and dicing it ought to
be well thought through.

> Not having this ability will require carefull crafting of both the
> kernels upon build, making kexec servicing of such kernels in production
> complicated and error prone.

Too vague, imho. I'd respond that "careful crafting" shouldn't lead to
being complicated and error prone as long as it's automatic and, well,
careful.

It is harder and harder to me to see how enabling the hypercall output
page is related to where the discussion has drifted. My claims are:

* enabling the hypercall output page poses no harm (doesn't break hyper-
next),
* allows to bring the code to the clear and concise form of getting a VP
register.

Can you refute any of that?

> 
> Thanks,
> Stas

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 19:49             ` Michael Kelley
@ 2025-01-06 21:12               ` Stanislav Kinsburskii
  0 siblings, 0 replies; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-06 21:12 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Roman Kisel, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

On Mon, Jan 06, 2025 at 07:49:15PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, January 6, 2025 11:19 AM
> > 
> > On Mon, Jan 06, 2025 at 06:18:51PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, January 6, 2025 9:23 AM
> > > >
> > > > On Fri, Jan 03, 2025 at 10:08:05PM +0000, Michael Kelley wrote:
> > > > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Friday, January
> > > > 3, 2025 11:20 AM
> > > > > >
> > > > > > On Mon, Dec 30, 2024 at 10:09:39AM -0800, Roman Kisel wrote:
> > > > > > > Due to the hypercall page not being allocated in the VTL mode,
> > > > > > > the code resorts to using a part of the input page.
> > > > > > >
> > > > > > > Allocate the hypercall output page in the VTL mode thus enabling
> > > > > > > it to use it for output and share code with dom0.
> > > > > > >
> > > > > > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> > > > > > > ---
> > > > > > >  drivers/hv/hv_common.c | 6 +++---
> > > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > > > > > > index c6ed3ba4bf61..c983cfd4d6c0 100644
> > > > > > > --- a/drivers/hv/hv_common.c
> > > > > > > +++ b/drivers/hv/hv_common.c
> > > > > > > @@ -340,7 +340,7 @@ int __init hv_common_init(void)
> > > > > > >  	BUG_ON(!hyperv_pcpu_input_arg);
> > > > > > >
> > > > > > >  	/* Allocate the per-CPU state for output arg for root */
> > > > > > > -	if (hv_root_partition) {
> > > > > > > +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
> > > > > >
> > > > > > This check doesn't look nice.
> > > > > > First of all, IS_ENABLED(CONFIG_HYPERV_VTL_MODE) doesn't mean that this
> > > > > > particular kernel is being booted in VTL other that VTL0.
> > > > >
> > > > > Actually, it does mean that. Kernels built with CONFIG_HYPERV_VTL_MODE=y
> > > > > will not run as a normal guest in VTL 0. See the third paragraph of the
> > > > > "help" section for HYPERV_VTL_MODE in drivers/hv/Kconfig.
> > > > >
> > > >
> > > > Thanks for pointing to this piece.
> > > >
> > > > This limitation looks aritificial to me and as VTL support in Linux is
> > > > currently being extended beyond Underhill support, keeping this
> > > > restriction makes some further development in scope of LVBS support
> > > > complicated and error prone due to potential ABI mismatches between
> > > > Linux kernels in different VTLs.
> > > >
> > > > IOW, making the same kernel properly bootable (or - worse - explicitly
> > > > un-bootable) in different VTLs is a more robust way in the long run.
> > >
> > > The reason for the limitation is the sequencing of early Hyper-V-related
> > > initialization steps. Knowing at runtime whether you are running at
> > > VTL0 or some other VTL requires making a hypercall in get_vtl().
> > > Unfortunately, the machinery for making a hypercall (setting the guest
> > > OS ID, and allocating the x86 hypercall page) is established relatively late
> > > during initialization, in hyperv_init(). But running in other than VTL0
> > > requires the initializations that are done in hv_vtl_init_platform(), which
> > > must be done much earlier. There's no clear way out of this conundrum
> > > purely on the Linux guest side.
> > >
> > > To solve the conundrum on x86, one possibility to consider is having
> > > Hyper-V make HV_REGISTER_VSM_VP_STATUS available as a synthetic
> > > MSR, which can be read without making a hypercall. This register could be
> > > read in ms_hyperv_init_platform() to know if running at VTL0 or not.
> > > Using synthetic MSRs is how other aspects of early Hyper-V-related
> > > initialization is done in Linux on x86.
> > >
> > > I think there's some discussion on the x86 sequencing issues on LKML
> > > from when the VTL code was first added. I was part of that discussion, but
> > > don't remember all the details. There might additional issues raised in
> > > that discussion.
> > >
> > > The sequencing issues would also need to be sorted out on the arm64
> > > side, as they are different from x86. We don't have an early Hyper-V
> > > specific hook like ms_hyperv_init_platform() on the arm64 side, so
> > > that might be problem. But on the flip side, we also don't have the
> > > x86-specific messiness that hv_vtl_init_platform() handles. Also,
> > > there are no synthetic MSRs on arm64, so register accesses always
> > > use hypercalls, but there's no hypercall page needed. On balance, I
> > > think getting VTL stuff initialized on arm64 will be easier, but I'm not sure.
> > >
> > > Michael
> > 
> > Thank you for summarizing this up. This aligns with my understanding.
> > 
> > Since VTL1 firmware is a payload for VTL0 kernel, one of my proposals to
> > the original message was to explicitly notify the kernel it's running in
> > VTL1 via command line argument and/or DT.
> > 
> 
> OK, yes.  Rather than non-VTL0 behavior being a kernel build-time decision,
> make it a kernel boot-line based decision. A runtime decision based on
> detecting the VTL is hard as described above.
> 
> I don't immediately see an initialization sequence problem in using a kernel
> boot line parameter to specify running in non-VTL0. I'm unsure if DT would
> be available at the time ms_hyperv_init_platform() runs and decides to call
> hv_vtl_init_platform(). Have you looked to see?
> 

In current implementation, DT is not available at the moment of
hypervisor platform init on x86, which leaves only the command line
option for now.

But I guess given the overall DT struggle on x86, command line option is
a least intrusive solution out of these two.

Thanks,
Stas

> Michael

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-06 21:07             ` Roman Kisel
@ 2025-01-07 19:18               ` Stanislav Kinsburskii
  2025-01-07 23:11                 ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-07 19:18 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Mon, Jan 06, 2025 at 01:07:25PM -0800, Roman Kisel wrote:
> 
> 
> On 1/6/2025 11:32 AM, Stanislav Kinsburskii wrote:
> > On Mon, Jan 06, 2025 at 10:11:16AM -0800, Roman Kisel wrote:
> [...]s
> 
> > From my POV a decision between a unified approach and interim solutions
> > in upstream should usually be resolved in favor of the former.
> > Given there are different stake holders in VTL code integration, I'd
> > suggest we step back a bit and think about how to proceed with the
> > overall design.
> I don't see any compelling reason for stalling this fix and think for
> no one knows how long. What is going to be fixed is clear, and it has
> not been demonstrated what is going to be broken when this change is
> merged.
> 
> > 
> > In my opinion, although I undestand why Underhill project decided to
> > come up with the original VTL kernels separation during build time, it's
> > time to reconsider this approach and to come up with a more generic
> > design, supporting booting the same kernel in different VTLs.
> "The same kernel" means supporting ACPI/UEFI for VTL0, VTL1, VTL2 as
> otherwise VTL0 won't boot. But why would VTL>0 even consider relying on
> ACPI or compiling it in? I would fix your broad assertion with adding
> one constraint: share as much Hyper-V code as possible, booting is not
> expected, rather refuse building.
> 
> > 
> > The major reason for this is that LVBS project relies on binary
> > compatibility of the kernels running in different VTLs.
> > The simplest way to provide such a guarantee in both development and
> > deployment is to run the same kernel in both VTLs.
> 
> OpenHCL aka Underhill is the only shipping product relying on
> the code in question (others might be dom0 and LVBS). What the LVBS
> project relies on might change any number of times any day before it
> ships. OpenHCL works for customers and slicing and dicing it ought to
> be well thought through.
> 
> > Not having this ability will require carefull crafting of both the
> > kernels upon build, making kexec servicing of such kernels in production
> > complicated and error prone.
> 
> Too vague, imho. I'd respond that "careful crafting" shouldn't lead to
> being complicated and error prone as long as it's automatic and, well,
> careful.
> 
> It is harder and harder to me to see how enabling the hypercall output
> page is related to where the discussion has drifted. My claims are:
> 
> * enabling the hypercall output page poses no harm (doesn't break hyper-
> next),
> * allows to bring the code to the clear and concise form of getting a VP
> register.
> 
> Can you refute any of that?
> 

My point is that the proposed fix looks more like an Underhill-tailored
bandage and doesn't take the needs of other stake holders into
consideration.

What is the urgency in merging of this particular change?

Thanks,
Stas

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-07 19:18               ` Stanislav Kinsburskii
@ 2025-01-07 23:11                 ` Roman Kisel
  2025-01-08  8:04                   ` Wei Liu
  2025-01-08 19:17                   ` Stanislav Kinsburskii
  0 siblings, 2 replies; 40+ messages in thread
From: Roman Kisel @ 2025-01-07 23:11 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso



On 1/7/2025 11:18 AM, Stanislav Kinsburskii wrote:
> On Mon, Jan 06, 2025 at 01:07:25PM -0800, Roman Kisel wrote:
> 

[...]

> My point is that the proposed fix looks more like an Underhill-tailored
> bandage and doesn't take the needs of other stake holders into
> consideration.
The patch takes as much into consideration as present in the hyperv-next
tree. Working on the open-source project seems to be harder otherwise.
A bandage, or not, that's a matter of opinion. There's a been a break,
here's the bandage.

> 
> What is the urgency in merging of this particular change?

The get_vtl function is broken thus blocking any further work on
upstreaming VTL mode patches, ARM64 and more. That's not an urgent
urgency where customers are in pain, more like the urgency of needing
to take the trash out, and until that happens, continuing inhaling the
fumes.

The urgency of unblocking is to continue work on proposing VTL mode
patches not to carry lots of out-of-tree code in the fork.

There might be a future where the Hyper-V code offers an API surface
covering needs of consumers like dom0 and VTLs whereby they maybe can
be built as an out-of-tree modules so the opinions wouldn't clash as
much.

Avoiding using the output hypercall page leads to something like[1]
and it looks quite complicated although that's the bare bones, lots
of notes.

[1]

/*
  * Fast extended hypercall with 20 bytes of input and 16 bytes of
  * output for getting a VP register.
  *
  * NOTES:
  *  1. The function is __init only atm, so the XMM context isn't
  *     used by the user mode.
  *  2. X86_64 only.
  *  3. Fast extended hypercalls may use XMM0..XMM6, and XMM is
  *     architerctural on X86_64 yet the support should be enabled
  *     in the CR's. Here, need RDX, R8 and XMM0 for input and RDX,
  *     R8 for output
  *  4. No provisions for TDX and SEV-SNP for the sake of simplicity
  *     (the hypervisor cannot see the guest registers in the
  *     confidential VM), would need to fallback.
  *  5. The robust implementation would need to check if fast extended
  *     hypercalls are available by checking the synthehtic CPUID leaves.
  *     A separate leaf indicates fast output support.
  *     It _almost_ certainly has to be, unless somehow disabled, hard
  *     to see why that would be needed.
  */
struct hv_u128 {
	u64 low_part;
	u64 high_part;
} __packed;

static __init u64 hv_vp_get_register_xfast(u32 reg_name,
		struct hv_u128 *value)
{
	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS |
					HV_HYPERCALL_FAST_BIT;
	unsigned long flags;
	u64 hv_status;

	union {
		struct hv_get_vp_registers_input input;
		struct {
			u64 lo;
			u64 hi;
		} __packed as_u128;
	} hv_input;
	register u64 rdx asm("rdx");
	register u64 r8 asm("r8");
	register u64 r12 asm("r12");

	local_irq_save(flags);

	hv_input.as_u128.lo = hv_input.as_u128.hi = 0;
	hv_input.input.header.partitionid = HV_PARTITION_ID_SELF;
	hv_input.input.header.vpindex = HV_VP_INDEX_SELF;
	hv_input.input.header.inputvtl = 0;

	rdx = hv_input.as_u128.lo;
	r8 = hv_input.as_u128.hi;
	r12 = reg_name;

	__asm__ __volatile__(
			"subq		$16, %%rsp\n"
			"movups		%%xmm0, 16(%%rsp)\n"
			"movd		%%r12, %%xmm0\n"
			CALL_NOSPEC
			"movups		16(%%rsp), %%xmm0\n"
			"addq		$16, %%rsp\n"
			: "=a" (hv_status), ASM_CALL_CONSTRAINT,
			"+c" (control), "+r" (rdx), "+r" (r8)
			: THUNK_TARGET(hv_hypercall_pg), "r"(r12)
			: "cc", "memory", "r9", "r10", "r11");

	if (hv_result_success(hv_status)) {
		value->low_part = rdx;
		value->high_part = r8;
	}

	local_irq_restore(flags);
	return hv_status;
}

#if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
u8 __init get_vtl(void)
{
	struct hv_u128 reg_value;
	u64 ret = hv_vp_get_register_xfast(HV_REGISTER_VSM_VP_STATUS, &reg_value);

	if (hv_result_success(ret)) {
		ret = reg_value.low_part & HV_VTL_MASK;
	} else {
		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
		BUG();
	}

	return ret;
}
#endif

> 
> Thanks,
> Stas

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2025-01-06 20:24     ` Roman Kisel
@ 2025-01-08  7:34       ` Wei Liu
  2025-01-08 17:48         ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Wei Liu @ 2025-01-08  7:34 UTC (permalink / raw)
  To: Roman Kisel
  Cc: Michael Kelley, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com, wei.liu@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev

On Mon, Jan 06, 2025 at 12:24:32PM -0800, Roman Kisel wrote:
> 
> 
> On 1/6/2025 9:37 AM, Michael Kelley wrote:
> > From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30, 2024 10:10 AM
> [...]
> > 
> > These bit field definitions don't look right. We want to "fill up"
> > the field size, so that we're explicit about each bit, and not leave
> > it to the compiler to add padding (which __packed tells the
> > compiler not to do). So in aggregate, the "u64" bit fields should
> > account for all 64 bits, but here you account for only 32 bits.
> > There are two ways to fix this:
> > 
> > 		u32 interruption_pending : 1;
> > 		u32 interruption_type: 1;
> > 		u32 reserved : 30;
> > 		u32 error_code;
> > Or
> > 		u64 interruption_pending : 1;
> > 		u64 interruption_type: 1;
> > 		u64 reserved : 30;
> > 		u64 error_code : 32;
> > 
> > 
> > Nuno -- your thoughts?
> 
> Quite a blunder on my side! Thank you very much for your help :)
> GDB is my witness:
> 
> (gdb) ptype /o union hv_arm64_pending_synthetic_exception_event
> /* offset      |    size */  type = union
> hv_arm64_pending_synthetic_exception_event {
> /*                    16 */    u64 as_uint64[2];
> /*                    13 */    struct {
> /*      0: 0   |       4 */        u32 event_pending : 1;
> /*      0: 1   |       4 */        u32 event_type : 3;
> /*      0: 4   |       4 */        u32 reserved : 4;
> /*      1      |       4 */        u32 exception_type;
> /*      5      |       8 */        u64 context;
> 
>                                    /* total size (bytes):   13 */
>                                };
> 
>                                /* total size (bytes):   16 */
>                              }
> 
> which looks messed up to me to put it mildly. Will fix next time.

We are pretty close to the next merge window (~2 weeks). Can you please
resend a version this week or early next week?

Nuno, if you have any comments on this series, now it is the time to
share them.

Thanks,
Wei.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl()
  2024-12-30 18:09 ` [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl() Roman Kisel
@ 2025-01-08  7:36   ` Wei Liu
  2025-01-08 17:47     ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Wei Liu @ 2025-01-08  7:36 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Mon, Dec 30, 2024 at 10:09:40AM -0800, Roman Kisel wrote:
> The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2],
> disallows overlapping of the input and output hypercall areas, and
> get_vtl(void) does overlap them.
> 
> Use the output hypercall page of the current vCPU for the hypercall.
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
> [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
> 
> Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>

You forgot to pick up Tianyu's Reviewed-by tag in the previous version.
In the future please make sure to collect all the tags you get from
previous review rounds.

Thanks,
Wei.

> ---
>  arch/x86/hyperv/hv_init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f82d1aefaa8a..173005e6a95d 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
>  
>  	local_irq_save(flags);
>  	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = (struct hv_output_get_vp_registers *)input;
> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
>  
>  	memset(input, 0, struct_size(input, names, 1));
>  	input->partition_id = HV_PARTITION_ID_SELF;
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-07 23:11                 ` Roman Kisel
@ 2025-01-08  8:04                   ` Wei Liu
  2025-01-08 19:17                   ` Stanislav Kinsburskii
  1 sibling, 0 replies; 40+ messages in thread
From: Wei Liu @ 2025-01-08  8:04 UTC (permalink / raw)
  To: Roman Kisel
  Cc: Stanislav Kinsburskii, hpa, kys, bp, dave.hansen, decui, eahariha,
	haiyangz, mingo, mhklinux, nunodasneves, tglx, tiala, wei.liu,
	linux-hyperv, linux-kernel, x86, apais, benhill, ssengar,
	sunilmut, vdso

On Tue, Jan 07, 2025 at 03:11:15PM -0800, Roman Kisel wrote:
> 
> 
> On 1/7/2025 11:18 AM, Stanislav Kinsburskii wrote:
> > On Mon, Jan 06, 2025 at 01:07:25PM -0800, Roman Kisel wrote:
> > 
> 
> [...]
> 
> > My point is that the proposed fix looks more like an Underhill-tailored
> > bandage and doesn't take the needs of other stake holders into
> > consideration.
> The patch takes as much into consideration as present in the hyperv-next
> tree. Working on the open-source project seems to be harder otherwise.
> A bandage, or not, that's a matter of opinion. There's a been a break,
> here's the bandage.
> 
> > 
> > What is the urgency in merging of this particular change?
> 
> The get_vtl function is broken thus blocking any further work on
> upstreaming VTL mode patches, ARM64 and more. That's not an urgent
> urgency where customers are in pain, more like the urgency of needing
> to take the trash out, and until that happens, continuing inhaling the
> fumes.
> 
> The urgency of unblocking is to continue work on proposing VTL mode
> patches not to carry lots of out-of-tree code in the fork.
> 
> There might be a future where the Hyper-V code offers an API surface
> covering needs of consumers like dom0 and VTLs whereby they maybe can
> be built as an out-of-tree modules so the opinions wouldn't clash as
> much.
> 
> Avoiding using the output hypercall page leads to something like[1]
> and it looks quite complicated although that's the bare bones, lots
> of notes.
> 
> [1]
> 
> /*
>  * Fast extended hypercall with 20 bytes of input and 16 bytes of
>  * output for getting a VP register.
>  *
>  * NOTES:
>  *  1. The function is __init only atm, so the XMM context isn't
>  *     used by the user mode.
>  *  2. X86_64 only.
>  *  3. Fast extended hypercalls may use XMM0..XMM6, and XMM is
>  *     architerctural on X86_64 yet the support should be enabled
>  *     in the CR's. Here, need RDX, R8 and XMM0 for input and RDX,
>  *     R8 for output
>  *  4. No provisions for TDX and SEV-SNP for the sake of simplicity
>  *     (the hypervisor cannot see the guest registers in the
>  *     confidential VM), would need to fallback.

I am not worried about this point. There are architectural defined ways
to handle this.

>  *  5. The robust implementation would need to check if fast extended
>  *     hypercalls are available by checking the synthehtic CPUID leaves.
>  *     A separate leaf indicates fast output support.
>  *     It _almost_ certainly has to be, unless somehow disabled, hard
>  *     to see why that would be needed.
>  */

The rest I agree. Not worth the effort just to add that support here for
a single user.

I've been thinking about adding the extended hypercall support for a
while, but I'm not sure if it's worth the effort overall.

An aspiring developer who's interested in this area is building a
prototype to see if extended fast hypercall can give a boost to some of
the frequent hypercalls.

In any case, I think this patch is fine.

Thanks,
Wei.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl()
  2025-01-08  7:36   ` Wei Liu
@ 2025-01-08 17:47     ` Roman Kisel
  0 siblings, 0 replies; 40+ messages in thread
From: Roman Kisel @ 2025-01-08 17:47 UTC (permalink / raw)
  To: Wei Liu
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, linux-hyperv, linux-kernel,
	x86, apais, benhill, ssengar, sunilmut, vdso



On 1/7/2025 11:36 PM, Wei Liu wrote:
> On Mon, Dec 30, 2024 at 10:09:40AM -0800, Roman Kisel wrote:
>> The Top-Level Functional Specification for Hyper-V, Section 3.6 [1, 2],
>> disallows overlapping of the input and output hypercall areas, and
>> get_vtl(void) does overlap them.
>>
>> Use the output hypercall page of the current vCPU for the hypercall.
>>
>> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
>> [2] https://github.com/MicrosoftDocs/Virtualization-Documentation/tree/main/tlfs
>>
>> Fixes: 8387ce06d70b ("x86/hyperv: Set Virtual Trust Level in VMBus init message")
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> 
> You forgot to pick up Tianyu's Reviewed-by tag in the previous version.
> In the future please make sure to collect all the tags you get from
> previous review rounds.

Apologies, my bad! Will collect tags in the future.

> 
> Thanks,
> Wei.
> 
>> ---
>>   arch/x86/hyperv/hv_init.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index f82d1aefaa8a..173005e6a95d 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
>>   
>>   	local_irq_save(flags);
>>   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> -	output = (struct hv_output_get_vp_registers *)input;
>> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
>>   
>>   	memset(input, 0, struct_size(input, names, 1));
>>   	input->partition_id = HV_PARTITION_ID_SELF;
>> -- 
>> 2.34.1
>>

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2025-01-08  7:34       ` Wei Liu
@ 2025-01-08 17:48         ` Roman Kisel
  0 siblings, 0 replies; 40+ messages in thread
From: Roman Kisel @ 2025-01-08 17:48 UTC (permalink / raw)
  To: Wei Liu
  Cc: Michael Kelley, hpa@zytor.com, kys@microsoft.com, bp@alien8.de,
	dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, nunodasneves@linux.microsoft.com,
	tglx@linutronix.de, tiala@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	x86@kernel.org, apais@microsoft.com, benhill@microsoft.com,
	ssengar@microsoft.com, sunilmut@microsoft.com, vdso@hexbites.dev



On 1/7/2025 11:34 PM, Wei Liu wrote:
> On Mon, Jan 06, 2025 at 12:24:32PM -0800, Roman Kisel wrote:
>>
>>
>> On 1/6/2025 9:37 AM, Michael Kelley wrote:
>>> From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30, 2024 10:10 AM
>> [...]
>>>
>>> These bit field definitions don't look right. We want to "fill up"
>>> the field size, so that we're explicit about each bit, and not leave
>>> it to the compiler to add padding (which __packed tells the
>>> compiler not to do). So in aggregate, the "u64" bit fields should
>>> account for all 64 bits, but here you account for only 32 bits.
>>> There are two ways to fix this:
>>>
>>> 		u32 interruption_pending : 1;
>>> 		u32 interruption_type: 1;
>>> 		u32 reserved : 30;
>>> 		u32 error_code;
>>> Or
>>> 		u64 interruption_pending : 1;
>>> 		u64 interruption_type: 1;
>>> 		u64 reserved : 30;
>>> 		u64 error_code : 32;
>>>
>>>
>>> Nuno -- your thoughts?
>>
>> Quite a blunder on my side! Thank you very much for your help :)
>> GDB is my witness:
>>
>> (gdb) ptype /o union hv_arm64_pending_synthetic_exception_event
>> /* offset      |    size */  type = union
>> hv_arm64_pending_synthetic_exception_event {
>> /*                    16 */    u64 as_uint64[2];
>> /*                    13 */    struct {
>> /*      0: 0   |       4 */        u32 event_pending : 1;
>> /*      0: 1   |       4 */        u32 event_type : 3;
>> /*      0: 4   |       4 */        u32 reserved : 4;
>> /*      1      |       4 */        u32 exception_type;
>> /*      5      |       8 */        u64 context;
>>
>>                                     /* total size (bytes):   13 */
>>                                 };
>>
>>                                 /* total size (bytes):   16 */
>>                               }
>>
>> which looks messed up to me to put it mildly. Will fix next time.
> 
> We are pretty close to the next merge window (~2 weeks). Can you please
> resend a version this week or early next week?

Understood, will do!

> 
> Nuno, if you have any comments on this series, now it is the time to
> share them.
> 
> Thanks,
> Wei.

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2025-01-06 17:37   ` Michael Kelley
  2025-01-06 20:24     ` Roman Kisel
@ 2025-01-08 17:58     ` Nuno Das Neves
  2025-01-08 18:22       ` Michael Kelley
  1 sibling, 1 reply; 40+ messages in thread
From: Nuno Das Neves @ 2025-01-08 17:58 UTC (permalink / raw)
  To: Michael Kelley, Roman Kisel, hpa@zytor.com, kys@microsoft.com,
	bp@alien8.de, dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, tglx@linutronix.de, tiala@microsoft.com,
	wei.liu@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com, ssengar@microsoft.com,
	sunilmut@microsoft.com, vdso@hexbites.dev

On 1/6/2025 9:37 AM, Michael Kelley wrote:
> From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30, 2024 10:10 AM
>>
>> There is no definition of the output structure for the
>> GetVpRegisters hypercall. Hence, using the hypercall
>> is not possible when the output value has some structure
>> to it. Even getting a datum of a primitive type reads
>> as ad-hoc without that definition.
>>
>> Define struct hv_output_get_vp_registers to enable using
>> the GetVpRegisters hypercall. Make provisions for all
>> supported architectures. No functional changes.
>>
>> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
>> ---
>>  include/hyperv/hvgdk_mini.h | 49 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 49 insertions(+)
>>
>> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
>> index db3d1aaf7330..e8e3faa78e15 100644
>> --- a/include/hyperv/hvgdk_mini.h
>> +++ b/include/hyperv/hvgdk_mini.h
>> @@ -1068,6 +1068,35 @@ union hv_dispatch_suspend_register {
>>  	} __packed;
>>  };
>>
>> +union hv_arm64_pending_interruption_register {
>> +	u64 as_uint64;
>> +	struct {
>> +		u64 interruption_pending : 1;
>> +		u64 interruption_type : 1;
>> +		u64 reserved : 30;
>> +		u32 error_code;
> 
> These bit field definitions don't look right. We want to "fill up"
> the field size, so that we're explicit about each bit, and not leave
> it to the compiler to add padding (which __packed tells the
> compiler not to do). So in aggregate, the "u64" bit fields should
> account for all 64 bits, but here you account for only 32 bits.
> There are two ways to fix this:
> 
> 		u32 interruption_pending : 1;
> 		u32 interruption_type: 1;
> 		u32 reserved : 30;
> 		u32 error_code;
> Or
> 		u64 interruption_pending : 1;
> 		u64 interruption_type: 1;
> 		u64 reserved : 30;
> 		u64 error_code : 32;
> 

Agreed. In the spirit of matching the original headers, I'd prefer
the second one. But either will work.

>> +	} __packed;
>> +};
>> +
>> +union hv_arm64_interrupt_state_register {
>> +	u64 as_uint64;
>> +	struct {
>> +		u64 interrupt_shadow : 1;
>> +		u64 reserved : 63;
>> +	} __packed;
>> +};
>> +
>> +union hv_arm64_pending_synthetic_exception_event {
>> +	u64 as_uint64[2];
>> +	struct {
>> +		u32 event_pending : 1;
>> +		u32 event_type : 3;
>> +		u32 reserved : 4;
> 
> Same here. Expand the "reserved" field to 28 bits?  Or maybe
> there's a reason to have two separate reserved fields of 4 bits
> and 24 bits. I'm not sure what the register layout is supposed to
> be. Looking at hv_arm64_pending_synthetic_exception_event
> in the OHCL-Linux-Kernel github tree shows the same gap of
> 24 bits, so that doesn't provide any guidance.
> 

Hmm..these should be u8 bitfields according to the Hyper-V code.
However that leaves a 24 bit gap as you pointed out.

In the Hyper-V code, these structures aren't actually packed,
which means sometimes the explicit padding is left out
(unintentionally).

Please add the 24 bits of padding to make it explicit here. I
suggest making the bitfields u8 as in the original code, and adding
another padding field after, like:

u8 event_pending : 1;
u8 event_type : 3;
u8 reserved : 4;
u8 rsvd[3];

>> +		u32 exception_type;
>> +		u64 context;
>> +	} __packed;
>> +};
>> +
>>  union hv_x64_interrupt_state_register {
>>  	u64 as_uint64;
>>  	struct {
>> @@ -1103,8 +1132,28 @@ union hv_register_value {
>>  	union hv_explicit_suspend_register explicit_suspend;
>>  	union hv_intercept_suspend_register intercept_suspend;
>>  	union hv_dispatch_suspend_register dispatch_suspend;
>> +#ifdef CONFIG_ARM64
>> +	union hv_arm64_interrupt_state_register interrupt_state;
>> +	union hv_arm64_pending_interruption_register pending_interruption;
>> +#endif
>> +#ifdef CONFIG_X86
>>  	union hv_x64_interrupt_state_register interrupt_state;
>>  	union hv_x64_pending_interruption_register pending_interruption;
>> +#endif
>> +	union hv_arm64_pending_synthetic_exception_event pending_synthetic_exception_event;
>> +};
> 
> Per the previous discussion, I can see that the #ifdef's are needed
> here to disambiguate the field names that are the same, but have
> different unions on x86 and arm64.
> 
> But on the flip side, I wonder if the field names should really be the
> same. Because of the different unions, it seems like they couldn't be
> accessed by architecture neutral code (unless the access is just using
> the "as_uint64" option?). So giving the fields names like
> "x86_interrupt_state" and "arm64_interrupt_state" instead of just
> "interrupt_state" might be more consistent with how the rest of this
> file handles architecture differences. But I don't know all the implications
> of making such a change.
> 
> Nuno -- your thoughts?

My main preference is to match with the original code unless there are *serious*
clarity, style or incompatibility issues. I don't see a big problem with gating
or not gating these. As you pointed out, it *may* make arch-neutral code a little
more cumbersome. But it's hard to say if that will actually be a problem.

Right now it seems to match the Hyper-V code and seems fine to me!

> 
> Michael
> 
>> +
>> +/*
>> + * NOTE: Linux helper struct - NOT from Hyper-V code.
>> + * DECLARE_FLEX_ARRAY() needs to be wrapped into
>> + * a structure and have at least one more member besides
>> + * DECLARE_FLEX_ARRAY.
>> + */

See below - you can remove the second part of this comment and just
leave the first line clarifying this is a Linux-only helper.

>> +struct hv_output_get_vp_registers {
>> +	struct {
>> +		DECLARE_FLEX_ARRAY(union hv_register_value, values);
>> +		struct {} values_end;
>> +	};
>>  };

I missed this change from a previous version - the additional empty struct
isn't needed here.

Michael - The documentation comment you mentioned previously[1] is just
describing how the DECLARE_FLEX_ARRAY() macro works - it actually adds
the empty struct to placate the compiler.

See include/uapi/linux/stddef.h:47:

#define __DECLARE_FLEX_ARRAY(TYPE, NAME)	\
	struct { \
		struct { } __empty_ ## NAME; \
		TYPE NAME[]; \
	}
#endif

So the definition should just look like:

struct hv_output_get_vp_registers {
	DECLARE_FLEX_ARRAY(union hv_register_value, values);
};


Thanks
Nuno


[1] https://lore.kernel.org/linux-hyperv/1bf0ce72-a377-4c3f-b68a-0f890f8b5d09@linux.microsoft.com/

>>
>>  #if defined(CONFIG_ARM64)
>> --
>> 2.34.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 2/5] hyperv: Fix pointer type in get_vtl(void)
  2024-12-30 18:09 ` [PATCH v5 2/5] hyperv: Fix pointer type in get_vtl(void) Roman Kisel
@ 2025-01-08 17:59   ` Nuno Das Neves
  0 siblings, 0 replies; 40+ messages in thread
From: Nuno Das Neves @ 2025-01-08 17:59 UTC (permalink / raw)
  To: Roman Kisel, hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz,
	mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel,
	x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

On 12/30/2024 10:09 AM, Roman Kisel wrote:
> Commit bc905fa8b633 ("hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h")
> changed the type of the output pointer to `struct hv_register_assoc` from
> `struct hv_get_vp_registers_output`. That leads to an incorrect computation,
> and leaves the system broken.
> 
> Use the correct pointer type for the output of the GetVpRegisters hypercall.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 9e5e8328df6b..f82d1aefaa8a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -416,13 +416,13 @@ static u8 __init get_vtl(void)
>  {
>  	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS;
>  	struct hv_input_get_vp_registers *input;
> -	struct hv_register_assoc *output;
> +	struct hv_output_get_vp_registers *output;
>  	unsigned long flags;
>  	u64 ret;
>  
>  	local_irq_save(flags);
>  	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = (struct hv_register_assoc *)input;
> +	output = (struct hv_output_get_vp_registers *)input;
>  
>  	memset(input, 0, struct_size(input, names, 1));
>  	input->partition_id = HV_PARTITION_ID_SELF;
> @@ -432,7 +432,7 @@ static u8 __init get_vtl(void)
>  
>  	ret = hv_do_hypercall(control, input, output);
>  	if (hv_result_success(ret)) {
> -		ret = output->value.reg8 & HV_X64_VTL_MASK;
> +		ret = output->values[0].reg8 & HV_X64_VTL_MASK;
>  	} else {
>  		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>  		BUG();

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2025-01-08 17:58     ` Nuno Das Neves
@ 2025-01-08 18:22       ` Michael Kelley
  2025-01-08 18:29         ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Michael Kelley @ 2025-01-08 18:22 UTC (permalink / raw)
  To: Nuno Das Neves, Roman Kisel, hpa@zytor.com, kys@microsoft.com,
	bp@alien8.de, dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, tglx@linutronix.de, tiala@microsoft.com,
	wei.liu@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com, ssengar@microsoft.com,
	sunilmut@microsoft.com, vdso@hexbites.dev

From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, January 8, 2025 9:58 AM
> 
> On 1/6/2025 9:37 AM, Michael Kelley wrote:
> > From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30,
> 2024 10:10 AM
> >>
> >> There is no definition of the output structure for the
> >> GetVpRegisters hypercall. Hence, using the hypercall
> >> is not possible when the output value has some structure
> >> to it. Even getting a datum of a primitive type reads
> >> as ad-hoc without that definition.
> >>
> >> Define struct hv_output_get_vp_registers to enable using
> >> the GetVpRegisters hypercall. Make provisions for all
> >> supported architectures. No functional changes.
> >>
> >> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> >> ---
> >>  include/hyperv/hvgdk_mini.h | 49 +++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 49 insertions(+)
> >>
> >> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> >> index db3d1aaf7330..e8e3faa78e15 100644
> >> --- a/include/hyperv/hvgdk_mini.h
> >> +++ b/include/hyperv/hvgdk_mini.h
> >> @@ -1068,6 +1068,35 @@ union hv_dispatch_suspend_register {
> >>  	} __packed;
> >>  };
> >>
> >> +union hv_arm64_pending_interruption_register {
> >> +	u64 as_uint64;
> >> +	struct {
> >> +		u64 interruption_pending : 1;
> >> +		u64 interruption_type : 1;
> >> +		u64 reserved : 30;
> >> +		u32 error_code;
> >
> > These bit field definitions don't look right. We want to "fill up"
> > the field size, so that we're explicit about each bit, and not leave
> > it to the compiler to add padding (which __packed tells the
> > compiler not to do). So in aggregate, the "u64" bit fields should
> > account for all 64 bits, but here you account for only 32 bits.
> > There are two ways to fix this:
> >
> > 		u32 interruption_pending : 1;
> > 		u32 interruption_type: 1;
> > 		u32 reserved : 30;
> > 		u32 error_code;
> > Or
> > 		u64 interruption_pending : 1;
> > 		u64 interruption_type: 1;
> > 		u64 reserved : 30;
> > 		u64 error_code : 32;
> >
> 
> Agreed. In the spirit of matching the original headers, I'd prefer
> the second one. But either will work.

Matching the original headers by using the second one is
fine with me.

> 
> >> +	} __packed;
> >> +};
> >> +
> >> +union hv_arm64_interrupt_state_register {
> >> +	u64 as_uint64;
> >> +	struct {
> >> +		u64 interrupt_shadow : 1;
> >> +		u64 reserved : 63;
> >> +	} __packed;
> >> +};
> >> +
> >> +union hv_arm64_pending_synthetic_exception_event {
> >> +	u64 as_uint64[2];
> >> +	struct {
> >> +		u32 event_pending : 1;
> >> +		u32 event_type : 3;
> >> +		u32 reserved : 4;
> >
> > Same here. Expand the "reserved" field to 28 bits?  Or maybe
> > there's a reason to have two separate reserved fields of 4 bits
> > and 24 bits. I'm not sure what the register layout is supposed to
> > be. Looking at hv_arm64_pending_synthetic_exception_event
> > in the OHCL-Linux-Kernel github tree shows the same gap of
> > 24 bits, so that doesn't provide any guidance.
> >
> 
> Hmm..these should be u8 bitfields according to the Hyper-V code.
> However that leaves a 24 bit gap as you pointed out.
> 
> In the Hyper-V code, these structures aren't actually packed,
> which means sometimes the explicit padding is left out
> (unintentionally).
> 
> Please add the 24 bits of padding to make it explicit here. I
> suggest making the bitfields u8 as in the original code, and adding
> another padding field after, like:
> 
> u8 event_pending : 1;
> u8 event_type : 3;
> u8 reserved : 4;
> u8 rsvd[3];

I'm good with that. For the ABI between the host and guest, we
*do* want to make all the padding explicit.

> 
> >> +		u32 exception_type;
> >> +		u64 context;
> >> +	} __packed;
> >> +};
> >> +
> >>  union hv_x64_interrupt_state_register {
> >>  	u64 as_uint64;
> >>  	struct {
> >> @@ -1103,8 +1132,28 @@ union hv_register_value {
> >>  	union hv_explicit_suspend_register explicit_suspend;
> >>  	union hv_intercept_suspend_register intercept_suspend;
> >>  	union hv_dispatch_suspend_register dispatch_suspend;
> >> +#ifdef CONFIG_ARM64
> >> +	union hv_arm64_interrupt_state_register interrupt_state;
> >> +	union hv_arm64_pending_interruption_register pending_interruption;
> >> +#endif
> >> +#ifdef CONFIG_X86
> >>  	union hv_x64_interrupt_state_register interrupt_state;
> >>  	union hv_x64_pending_interruption_register pending_interruption;
> >> +#endif
> >> +	union hv_arm64_pending_synthetic_exception_event pending_synthetic_exception_event;
> >> +};
> >
> > Per the previous discussion, I can see that the #ifdef's are needed
> > here to disambiguate the field names that are the same, but have
> > different unions on x86 and arm64.
> >
> > But on the flip side, I wonder if the field names should really be the
> > same. Because of the different unions, it seems like they couldn't be
> > accessed by architecture neutral code (unless the access is just using
> > the "as_uint64" option?). So giving the fields names like
> > "x86_interrupt_state" and "arm64_interrupt_state" instead of just
> > "interrupt_state" might be more consistent with how the rest of this
> > file handles architecture differences. But I don't know all the implications
> > of making such a change.
> >
> > Nuno -- your thoughts?
> 
> My main preference is to match with the original code unless there are *serious*
> clarity, style or incompatibility issues. I don't see a big problem with gating
> or not gating these. As you pointed out, it *may* make arch-neutral code a little
> more cumbersome. But it's hard to say if that will actually be a problem.
> 
> Right now it seems to match the Hyper-V code and seems fine to me!

OK by me as well.

> 
> >
> > Michael
> >
> >> +
> >> +/*
> >> + * NOTE: Linux helper struct - NOT from Hyper-V code.
> >> + * DECLARE_FLEX_ARRAY() needs to be wrapped into
> >> + * a structure and have at least one more member besides
> >> + * DECLARE_FLEX_ARRAY.
> >> + */
> 
> See below - you can remove the second part of this comment and just
> leave the first line clarifying this is a Linux-only helper.
> 
> >> +struct hv_output_get_vp_registers {
> >> +	struct {
> >> +		DECLARE_FLEX_ARRAY(union hv_register_value, values);
> >> +		struct {} values_end;
> >> +	};
> >>  };
> 
> I missed this change from a previous version - the additional empty struct
> isn't needed here.
> 
> Michael - 
> The documentation comment you mentioned previously[1] is just
> describing how the DECLARE_FLEX_ARRAY() macro works - it actually adds
> the empty struct to placate the compiler.
> 
> See include/uapi/linux/stddef.h:47:
> 
> #define __DECLARE_FLEX_ARRAY(TYPE, NAME)	\
> 	struct { \
> 		struct { } __empty_ ## NAME; \
> 		TYPE NAME[]; \
> 	}
> #endif
> 
> So the definition should just look like:
> 
> struct hv_output_get_vp_registers {
> 	DECLARE_FLEX_ARRAY(union hv_register_value, values);
> };

It was actually Easwar who mentioned this. But regardless, I'm glad
the simpler definition works!

Michael

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers
  2025-01-08 18:22       ` Michael Kelley
@ 2025-01-08 18:29         ` Roman Kisel
  0 siblings, 0 replies; 40+ messages in thread
From: Roman Kisel @ 2025-01-08 18:29 UTC (permalink / raw)
  To: Michael Kelley, Nuno Das Neves, hpa@zytor.com, kys@microsoft.com,
	bp@alien8.de, dave.hansen@linux.intel.com, decui@microsoft.com,
	eahariha@linux.microsoft.com, haiyangz@microsoft.com,
	mingo@redhat.com, tglx@linutronix.de, tiala@microsoft.com,
	wei.liu@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
  Cc: apais@microsoft.com, benhill@microsoft.com, ssengar@microsoft.com,
	sunilmut@microsoft.com, vdso@hexbites.dev



On 1/8/2025 10:22 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, January 8, 2025 9:58 AM
>>
>> On 1/6/2025 9:37 AM, Michael Kelley wrote:
>>> From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, December 30,

Thanks Michael and Nuno for your indispensable help! I'll fix the
the patch as suggested then.

> 
> Michael

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-07 23:11                 ` Roman Kisel
  2025-01-08  8:04                   ` Wei Liu
@ 2025-01-08 19:17                   ` Stanislav Kinsburskii
  2025-01-08 20:37                     ` Roman Kisel
  1 sibling, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-08 19:17 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Tue, Jan 07, 2025 at 03:11:15PM -0800, Roman Kisel wrote:
> 
> 
> On 1/7/2025 11:18 AM, Stanislav Kinsburskii wrote:
> > On Mon, Jan 06, 2025 at 01:07:25PM -0800, Roman Kisel wrote:
> > 
> 
> [...]
> 
> > My point is that the proposed fix looks more like an Underhill-tailored
> > bandage and doesn't take the needs of other stake holders into
> > consideration.
> The patch takes as much into consideration as present in the hyperv-next
> tree. Working on the open-source project seems to be harder otherwise.
> A bandage, or not, that's a matter of opinion. There's a been a break,
> here's the bandage.
> 
> > 
> > What is the urgency in merging of this particular change?
> 
> The get_vtl function is broken thus blocking any further work on
> upstreaming VTL mode patches, ARM64 and more. That's not an urgent
> urgency where customers are in pain, more like the urgency of needing
> to take the trash out, and until that happens, continuing inhaling the
> fumes.
> 
> The urgency of unblocking is to continue work on proposing VTL mode
> patches not to carry lots of out-of-tree code in the fork.
> 
> There might be a future where the Hyper-V code offers an API surface
> covering needs of consumers like dom0 and VTLs whereby they maybe can
> be built as an out-of-tree modules so the opinions wouldn't clash as
> much.
> 
> Avoiding using the output hypercall page leads to something like[1]
> and it looks quite complicated although that's the bare bones, lots
> of notes.
> 

How is this related to the original discussion?
My concern was about the piece allocating of the output page guarded by
the VTL config option.

Thanks,
Stas

> [1]
> 
> /*
>  * Fast extended hypercall with 20 bytes of input and 16 bytes of
>  * output for getting a VP register.
>  *
>  * NOTES:
>  *  1. The function is __init only atm, so the XMM context isn't
>  *     used by the user mode.
>  *  2. X86_64 only.
>  *  3. Fast extended hypercalls may use XMM0..XMM6, and XMM is
>  *     architerctural on X86_64 yet the support should be enabled
>  *     in the CR's. Here, need RDX, R8 and XMM0 for input and RDX,
>  *     R8 for output
>  *  4. No provisions for TDX and SEV-SNP for the sake of simplicity
>  *     (the hypervisor cannot see the guest registers in the
>  *     confidential VM), would need to fallback.
>  *  5. The robust implementation would need to check if fast extended
>  *     hypercalls are available by checking the synthehtic CPUID leaves.
>  *     A separate leaf indicates fast output support.
>  *     It _almost_ certainly has to be, unless somehow disabled, hard
>  *     to see why that would be needed.
>  */
> struct hv_u128 {
> 	u64 low_part;
> 	u64 high_part;
> } __packed;
> 
> static __init u64 hv_vp_get_register_xfast(u32 reg_name,
> 		struct hv_u128 *value)
> {
> 	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS |
> 					HV_HYPERCALL_FAST_BIT;
> 	unsigned long flags;
> 	u64 hv_status;
> 
> 	union {
> 		struct hv_get_vp_registers_input input;
> 		struct {
> 			u64 lo;
> 			u64 hi;
> 		} __packed as_u128;
> 	} hv_input;
> 	register u64 rdx asm("rdx");
> 	register u64 r8 asm("r8");
> 	register u64 r12 asm("r12");
> 
> 	local_irq_save(flags);
> 
> 	hv_input.as_u128.lo = hv_input.as_u128.hi = 0;
> 	hv_input.input.header.partitionid = HV_PARTITION_ID_SELF;
> 	hv_input.input.header.vpindex = HV_VP_INDEX_SELF;
> 	hv_input.input.header.inputvtl = 0;
> 
> 	rdx = hv_input.as_u128.lo;
> 	r8 = hv_input.as_u128.hi;
> 	r12 = reg_name;
> 
> 	__asm__ __volatile__(
> 			"subq		$16, %%rsp\n"
> 			"movups		%%xmm0, 16(%%rsp)\n"
> 			"movd		%%r12, %%xmm0\n"
> 			CALL_NOSPEC
> 			"movups		16(%%rsp), %%xmm0\n"
> 			"addq		$16, %%rsp\n"
> 			: "=a" (hv_status), ASM_CALL_CONSTRAINT,
> 			"+c" (control), "+r" (rdx), "+r" (r8)
> 			: THUNK_TARGET(hv_hypercall_pg), "r"(r12)
> 			: "cc", "memory", "r9", "r10", "r11");
> 
> 	if (hv_result_success(hv_status)) {
> 		value->low_part = rdx;
> 		value->high_part = r8;
> 	}
> 
> 	local_irq_restore(flags);
> 	return hv_status;
> }
> 
> #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
> u8 __init get_vtl(void)
> {
> 	struct hv_u128 reg_value;
> 	u64 ret = hv_vp_get_register_xfast(HV_REGISTER_VSM_VP_STATUS, &reg_value);
> 
> 	if (hv_result_success(ret)) {
> 		ret = reg_value.low_part & HV_VTL_MASK;
> 	} else {
> 		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
> 		BUG();
> 	}
> 
> 	return ret;
> }
> #endif
> 
> > 
> > Thanks,
> > Stas
> 
> -- 
> Thank you,
> Roman
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-08 19:17                   ` Stanislav Kinsburskii
@ 2025-01-08 20:37                     ` Roman Kisel
  2025-01-08 22:19                       ` Stanislav Kinsburskii
  0 siblings, 1 reply; 40+ messages in thread
From: Roman Kisel @ 2025-01-08 20:37 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso



On 1/8/2025 11:17 AM, Stanislav Kinsburskii wrote:
> On Tue, Jan 07, 2025 at 03:11:15PM -0800, Roman Kisel wrote:

[...]

>>
>> Avoiding using the output hypercall page leads to something like[1]
>> and it looks quite complicated although that's the bare bones, lots
>> of notes.
>>
> 
> How is this related to the original discussion?

I was looking for ways to eliminate what I perceived as the source of
friction in the discussion -- allocating the hypercall output page.

> My concern was about the piece allocating of the output page guarded by
> the VTL config option.>> Thanks,
> Stas
> 
>> [1]
>>
>> /*
>>   * Fast extended hypercall with 20 bytes of input and 16 bytes of
>>   * output for getting a VP register.
>>   *
>>   * NOTES:
>>   *  1. The function is __init only atm, so the XMM context isn't
>>   *     used by the user mode.
>>   *  2. X86_64 only.
>>   *  3. Fast extended hypercalls may use XMM0..XMM6, and XMM is
>>   *     architerctural on X86_64 yet the support should be enabled
>>   *     in the CR's. Here, need RDX, R8 and XMM0 for input and RDX,
>>   *     R8 for output
>>   *  4. No provisions for TDX and SEV-SNP for the sake of simplicity
>>   *     (the hypervisor cannot see the guest registers in the
>>   *     confidential VM), would need to fallback.
>>   *  5. The robust implementation would need to check if fast extended
>>   *     hypercalls are available by checking the synthehtic CPUID leaves.
>>   *     A separate leaf indicates fast output support.
>>   *     It _almost_ certainly has to be, unless somehow disabled, hard
>>   *     to see why that would be needed.
>>   */
>> struct hv_u128 {
>> 	u64 low_part;
>> 	u64 high_part;
>> } __packed;
>>
>> static __init u64 hv_vp_get_register_xfast(u32 reg_name,
>> 		struct hv_u128 *value)
>> {
>> 	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS |
>> 					HV_HYPERCALL_FAST_BIT;
>> 	unsigned long flags;
>> 	u64 hv_status;
>>
>> 	union {
>> 		struct hv_get_vp_registers_input input;
>> 		struct {
>> 			u64 lo;
>> 			u64 hi;
>> 		} __packed as_u128;
>> 	} hv_input;
>> 	register u64 rdx asm("rdx");
>> 	register u64 r8 asm("r8");
>> 	register u64 r12 asm("r12");
>>
>> 	local_irq_save(flags);
>>
>> 	hv_input.as_u128.lo = hv_input.as_u128.hi = 0;
>> 	hv_input.input.header.partitionid = HV_PARTITION_ID_SELF;
>> 	hv_input.input.header.vpindex = HV_VP_INDEX_SELF;
>> 	hv_input.input.header.inputvtl = 0;
>>
>> 	rdx = hv_input.as_u128.lo;
>> 	r8 = hv_input.as_u128.hi;
>> 	r12 = reg_name;
>>
>> 	__asm__ __volatile__(
>> 			"subq		$16, %%rsp\n"
>> 			"movups		%%xmm0, 16(%%rsp)\n"
>> 			"movd		%%r12, %%xmm0\n"
>> 			CALL_NOSPEC
>> 			"movups		16(%%rsp), %%xmm0\n"
>> 			"addq		$16, %%rsp\n"
>> 			: "=a" (hv_status), ASM_CALL_CONSTRAINT,
>> 			"+c" (control), "+r" (rdx), "+r" (r8)
>> 			: THUNK_TARGET(hv_hypercall_pg), "r"(r12)
>> 			: "cc", "memory", "r9", "r10", "r11");
>>
>> 	if (hv_result_success(hv_status)) {
>> 		value->low_part = rdx;
>> 		value->high_part = r8;
>> 	}
>>
>> 	local_irq_restore(flags);
>> 	return hv_status;
>> }
>>
>> #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
>> u8 __init get_vtl(void)
>> {
>> 	struct hv_u128 reg_value;
>> 	u64 ret = hv_vp_get_register_xfast(HV_REGISTER_VSM_VP_STATUS, &reg_value);
>>
>> 	if (hv_result_success(ret)) {
>> 		ret = reg_value.low_part & HV_VTL_MASK;
>> 	} else {
>> 		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>> 		BUG();
>> 	}
>>
>> 	return ret;
>> }
>> #endif
>>
>>>
>>> Thanks,
>>> Stas
>>
>> -- 
>> Thank you,
>> Roman
>>

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2024-12-30 18:09 ` [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode Roman Kisel
  2025-01-03 19:20   ` Stanislav Kinsburskii
@ 2025-01-08 21:08   ` Nuno Das Neves
  2025-01-08 21:22     ` Roman Kisel
  1 sibling, 1 reply; 40+ messages in thread
From: Nuno Das Neves @ 2025-01-08 21:08 UTC (permalink / raw)
  To: Roman Kisel, hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz,
	mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv, linux-kernel,
	x86
  Cc: apais, benhill, ssengar, sunilmut, vdso

On 12/30/2024 10:09 AM, Roman Kisel wrote:
> Due to the hypercall page not being allocated in the VTL mode,
> the code resorts to using a part of the input page.
> 
> Allocate the hypercall output page in the VTL mode thus enabling
> it to use it for output and share code with dom0.
> 
> Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
> ---
>  drivers/hv/hv_common.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index c6ed3ba4bf61..c983cfd4d6c0 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -340,7 +340,7 @@ int __init hv_common_init(void)
>  	BUG_ON(!hyperv_pcpu_input_arg);
>  
>  	/* Allocate the per-CPU state for output arg for root */
> -	if (hv_root_partition) {
> +	if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>  		hyperv_pcpu_output_arg = alloc_percpu(void *);
>  		BUG_ON(!hyperv_pcpu_output_arg);
>  	}
> @@ -435,7 +435,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  	void **inputarg, **outputarg;
>  	u64 msr_vp_index;
>  	gfp_t flags;
> -	int pgcount = hv_root_partition ? 2 : 1;
> +	const int pgcount = (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) ? 2 : 1;
>  	void *mem;
>  	int ret;
>  
> @@ -453,7 +453,7 @@ int hv_common_cpu_init(unsigned int cpu)
>  		if (!mem)
>  			return -ENOMEM;
>  
> -		if (hv_root_partition) {
> +		if (hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE)) {
>  			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>  			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>  		}

Replying in a new thread since I don't have all the context about the different
VTL code paths that may need to converge. To my perspective, the approach in this
patch seems perfectly reasonable to fix the current issue.

One small improvement might be to put this check into a helper function. That would
make it easier to amend later when/if a clearly better solution is proposed.

Something like:

static inline bool hv_output_page_exists()
{
	return hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE);
}

Thanks
Nuno

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-08 21:08   ` Nuno Das Neves
@ 2025-01-08 21:22     ` Roman Kisel
  0 siblings, 0 replies; 40+ messages in thread
From: Roman Kisel @ 2025-01-08 21:22 UTC (permalink / raw)
  To: Nuno Das Neves, hpa, kys, bp, dave.hansen, decui, eahariha,
	haiyangz, mingo, mhklinux, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86
  Cc: apais, benhill, ssengar, sunilmut, vdso



On 1/8/2025 1:08 PM, Nuno Das Neves wrote:
> On 12/30/2024 10:09 AM, Roman Kisel wrote:

[...]

> Replying in a new thread since I don't have all the context about the different
> VTL code paths that may need to converge. To my perspective, the approach in this
> patch seems perfectly reasonable to fix the current issue.
> 
> One small improvement might be to put this check into a helper function. That would
> make it easier to amend later when/if a clearly better solution is proposed.
> 
> Something like:
> 
> static inline bool hv_output_page_exists()
> {
> 	return hv_root_partition || IS_ENABLED(CONFIG_HYPERV_VTL_MODE);
> }

Will apply, thank you for the suggestion, Nuno!

> 
> Thanks
> Nuno

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-08 20:37                     ` Roman Kisel
@ 2025-01-08 22:19                       ` Stanislav Kinsburskii
  2025-01-08 23:04                         ` Roman Kisel
  0 siblings, 1 reply; 40+ messages in thread
From: Stanislav Kinsburskii @ 2025-01-08 22:19 UTC (permalink / raw)
  To: Roman Kisel
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso

On Wed, Jan 08, 2025 at 12:37:17PM -0800, Roman Kisel wrote:
> 
> 
> On 1/8/2025 11:17 AM, Stanislav Kinsburskii wrote:
> > On Tue, Jan 07, 2025 at 03:11:15PM -0800, Roman Kisel wrote:
> 
> [...]
> 
> > > 
> > > Avoiding using the output hypercall page leads to something like[1]
> > > and it looks quite complicated although that's the bare bones, lots
> > > of notes.
> > > 
> > 
> > How is this related to the original discussion?
> 
> I was looking for ways to eliminate what I perceived as the source of
> friction in the discussion -- allocating the hypercall output page.
> 

No, output page allocation is the current solution and it is fine.
The source of friction is allocation of this page under config option in
runtime.

Thanks,
Stas

> > My concern was about the piece allocating of the output page guarded by
> > the VTL config option.>> Thanks,
> > Stas
> > 
> > > [1]
> > > 
> > > /*
> > >   * Fast extended hypercall with 20 bytes of input and 16 bytes of
> > >   * output for getting a VP register.
> > >   *
> > >   * NOTES:
> > >   *  1. The function is __init only atm, so the XMM context isn't
> > >   *     used by the user mode.
> > >   *  2. X86_64 only.
> > >   *  3. Fast extended hypercalls may use XMM0..XMM6, and XMM is
> > >   *     architerctural on X86_64 yet the support should be enabled
> > >   *     in the CR's. Here, need RDX, R8 and XMM0 for input and RDX,
> > >   *     R8 for output
> > >   *  4. No provisions for TDX and SEV-SNP for the sake of simplicity
> > >   *     (the hypervisor cannot see the guest registers in the
> > >   *     confidential VM), would need to fallback.
> > >   *  5. The robust implementation would need to check if fast extended
> > >   *     hypercalls are available by checking the synthehtic CPUID leaves.
> > >   *     A separate leaf indicates fast output support.
> > >   *     It _almost_ certainly has to be, unless somehow disabled, hard
> > >   *     to see why that would be needed.
> > >   */
> > > struct hv_u128 {
> > > 	u64 low_part;
> > > 	u64 high_part;
> > > } __packed;
> > > 
> > > static __init u64 hv_vp_get_register_xfast(u32 reg_name,
> > > 		struct hv_u128 *value)
> > > {
> > > 	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS |
> > > 					HV_HYPERCALL_FAST_BIT;
> > > 	unsigned long flags;
> > > 	u64 hv_status;
> > > 
> > > 	union {
> > > 		struct hv_get_vp_registers_input input;
> > > 		struct {
> > > 			u64 lo;
> > > 			u64 hi;
> > > 		} __packed as_u128;
> > > 	} hv_input;
> > > 	register u64 rdx asm("rdx");
> > > 	register u64 r8 asm("r8");
> > > 	register u64 r12 asm("r12");
> > > 
> > > 	local_irq_save(flags);
> > > 
> > > 	hv_input.as_u128.lo = hv_input.as_u128.hi = 0;
> > > 	hv_input.input.header.partitionid = HV_PARTITION_ID_SELF;
> > > 	hv_input.input.header.vpindex = HV_VP_INDEX_SELF;
> > > 	hv_input.input.header.inputvtl = 0;
> > > 
> > > 	rdx = hv_input.as_u128.lo;
> > > 	r8 = hv_input.as_u128.hi;
> > > 	r12 = reg_name;
> > > 
> > > 	__asm__ __volatile__(
> > > 			"subq		$16, %%rsp\n"
> > > 			"movups		%%xmm0, 16(%%rsp)\n"
> > > 			"movd		%%r12, %%xmm0\n"
> > > 			CALL_NOSPEC
> > > 			"movups		16(%%rsp), %%xmm0\n"
> > > 			"addq		$16, %%rsp\n"
> > > 			: "=a" (hv_status), ASM_CALL_CONSTRAINT,
> > > 			"+c" (control), "+r" (rdx), "+r" (r8)
> > > 			: THUNK_TARGET(hv_hypercall_pg), "r"(r12)
> > > 			: "cc", "memory", "r9", "r10", "r11");
> > > 
> > > 	if (hv_result_success(hv_status)) {
> > > 		value->low_part = rdx;
> > > 		value->high_part = r8;
> > > 	}
> > > 
> > > 	local_irq_restore(flags);
> > > 	return hv_status;
> > > }
> > > 
> > > #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
> > > u8 __init get_vtl(void)
> > > {
> > > 	struct hv_u128 reg_value;
> > > 	u64 ret = hv_vp_get_register_xfast(HV_REGISTER_VSM_VP_STATUS, &reg_value);
> > > 
> > > 	if (hv_result_success(ret)) {
> > > 		ret = reg_value.low_part & HV_VTL_MASK;
> > > 	} else {
> > > 		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
> > > 		BUG();
> > > 	}
> > > 
> > > 	return ret;
> > > }
> > > #endif
> > > 
> > > > 
> > > > Thanks,
> > > > Stas
> > > 
> > > -- 
> > > Thank you,
> > > Roman
> > > 
> 
> -- 
> Thank you,
> Roman
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode
  2025-01-08 22:19                       ` Stanislav Kinsburskii
@ 2025-01-08 23:04                         ` Roman Kisel
  0 siblings, 0 replies; 40+ messages in thread
From: Roman Kisel @ 2025-01-08 23:04 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: hpa, kys, bp, dave.hansen, decui, eahariha, haiyangz, mingo,
	mhklinux, nunodasneves, tglx, tiala, wei.liu, linux-hyperv,
	linux-kernel, x86, apais, benhill, ssengar, sunilmut, vdso



On 1/8/2025 2:19 PM, Stanislav Kinsburskii wrote:
> On Wed, Jan 08, 2025 at 12:37:17PM -0800, Roman Kisel wrote:
>>
>>
>> On 1/8/2025 11:17 AM, Stanislav Kinsburskii wrote:
>>> On Tue, Jan 07, 2025 at 03:11:15PM -0800, Roman Kisel wrote:
>>
>> [...]
>>
>>>>
>>>> Avoiding using the output hypercall page leads to something like[1]
>>>> and it looks quite complicated although that's the bare bones, lots
>>>> of notes.
>>>>
>>>
>>> How is this related to the original discussion?
>>
>> I was looking for ways to eliminate what I perceived as the source of
>> friction in the discussion -- allocating the hypercall output page.
>>
> 
> No, output page allocation is the current solution and it is fine.
> The source of friction is allocation of this page under config option in
> runtime.

Same difference: the proposed fix makes sense in the hyperv-next tree.

The change is a well-contained, the minimal, very economical code motion
to fix the issue at hand, no risk to the tree was shown. Why predicate
worthiness of the patch on a prototype LVBS kernel fork you've referred
to being built outside of the LKML? When the time comes to take LVBS to
the LKML, can easily rehash any of this due to the minuscule patch size,
and until then it's just bets in what shape the future comes and when it
does. Instead of betting and waiting, fixing the break is more
beneficial so I've sent out v6.

> 
> Thanks,
> Stas
> 
>>> My concern was about the piece allocating of the output page guarded by
>>> the VTL config option.>> Thanks,
>>> Stas
>>>
>>>> [1]
>>>>
>>>> /*
>>>>    * Fast extended hypercall with 20 bytes of input and 16 bytes of
>>>>    * output for getting a VP register.
>>>>    *
>>>>    * NOTES:
>>>>    *  1. The function is __init only atm, so the XMM context isn't
>>>>    *     used by the user mode.
>>>>    *  2. X86_64 only.
>>>>    *  3. Fast extended hypercalls may use XMM0..XMM6, and XMM is
>>>>    *     architerctural on X86_64 yet the support should be enabled
>>>>    *     in the CR's. Here, need RDX, R8 and XMM0 for input and RDX,
>>>>    *     R8 for output
>>>>    *  4. No provisions for TDX and SEV-SNP for the sake of simplicity
>>>>    *     (the hypervisor cannot see the guest registers in the
>>>>    *     confidential VM), would need to fallback.
>>>>    *  5. The robust implementation would need to check if fast extended
>>>>    *     hypercalls are available by checking the synthehtic CPUID leaves.
>>>>    *     A separate leaf indicates fast output support.
>>>>    *     It _almost_ certainly has to be, unless somehow disabled, hard
>>>>    *     to see why that would be needed.
>>>>    */
>>>> struct hv_u128 {
>>>> 	u64 low_part;
>>>> 	u64 high_part;
>>>> } __packed;
>>>>
>>>> static __init u64 hv_vp_get_register_xfast(u32 reg_name,
>>>> 		struct hv_u128 *value)
>>>> {
>>>> 	u64 control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_REGISTERS |
>>>> 					HV_HYPERCALL_FAST_BIT;
>>>> 	unsigned long flags;
>>>> 	u64 hv_status;
>>>>
>>>> 	union {
>>>> 		struct hv_get_vp_registers_input input;
>>>> 		struct {
>>>> 			u64 lo;
>>>> 			u64 hi;
>>>> 		} __packed as_u128;
>>>> 	} hv_input;
>>>> 	register u64 rdx asm("rdx");
>>>> 	register u64 r8 asm("r8");
>>>> 	register u64 r12 asm("r12");
>>>>
>>>> 	local_irq_save(flags);
>>>>
>>>> 	hv_input.as_u128.lo = hv_input.as_u128.hi = 0;
>>>> 	hv_input.input.header.partitionid = HV_PARTITION_ID_SELF;
>>>> 	hv_input.input.header.vpindex = HV_VP_INDEX_SELF;
>>>> 	hv_input.input.header.inputvtl = 0;
>>>>
>>>> 	rdx = hv_input.as_u128.lo;
>>>> 	r8 = hv_input.as_u128.hi;
>>>> 	r12 = reg_name;
>>>>
>>>> 	__asm__ __volatile__(
>>>> 			"subq		$16, %%rsp\n"
>>>> 			"movups		%%xmm0, 16(%%rsp)\n"
>>>> 			"movd		%%r12, %%xmm0\n"
>>>> 			CALL_NOSPEC
>>>> 			"movups		16(%%rsp), %%xmm0\n"
>>>> 			"addq		$16, %%rsp\n"
>>>> 			: "=a" (hv_status), ASM_CALL_CONSTRAINT,
>>>> 			"+c" (control), "+r" (rdx), "+r" (r8)
>>>> 			: THUNK_TARGET(hv_hypercall_pg), "r"(r12)
>>>> 			: "cc", "memory", "r9", "r10", "r11");
>>>>
>>>> 	if (hv_result_success(hv_status)) {
>>>> 		value->low_part = rdx;
>>>> 		value->high_part = r8;
>>>> 	}
>>>>
>>>> 	local_irq_restore(flags);
>>>> 	return hv_status;
>>>> }
>>>>
>>>> #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
>>>> u8 __init get_vtl(void)
>>>> {
>>>> 	struct hv_u128 reg_value;
>>>> 	u64 ret = hv_vp_get_register_xfast(HV_REGISTER_VSM_VP_STATUS, &reg_value);
>>>>
>>>> 	if (hv_result_success(ret)) {
>>>> 		ret = reg_value.low_part & HV_VTL_MASK;
>>>> 	} else {
>>>> 		pr_err("Failed to get VTL(error: %lld) exiting...\n", ret);
>>>> 		BUG();
>>>> 	}
>>>>
>>>> 	return ret;
>>>> }
>>>> #endif
>>>>
>>>>>
>>>>> Thanks,
>>>>> Stas
>>>>
>>>> -- 
>>>> Thank you,
>>>> Roman
>>>>
>>
>> -- 
>> Thank you,
>> Roman
>>

-- 
Thank you,
Roman


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2025-01-08 23:04 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-30 18:09 [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Roman Kisel
2024-12-30 18:09 ` [PATCH v5 1/5] hyperv: Define struct hv_output_get_vp_registers Roman Kisel
2025-01-06 17:37   ` Michael Kelley
2025-01-06 20:24     ` Roman Kisel
2025-01-08  7:34       ` Wei Liu
2025-01-08 17:48         ` Roman Kisel
2025-01-08 17:58     ` Nuno Das Neves
2025-01-08 18:22       ` Michael Kelley
2025-01-08 18:29         ` Roman Kisel
2024-12-30 18:09 ` [PATCH v5 2/5] hyperv: Fix pointer type in get_vtl(void) Roman Kisel
2025-01-08 17:59   ` Nuno Das Neves
2024-12-30 18:09 ` [PATCH v5 3/5] hyperv: Enable the hypercall output page for the VTL mode Roman Kisel
2025-01-03 19:20   ` Stanislav Kinsburskii
2025-01-03 21:39     ` Roman Kisel
2025-01-06 17:11       ` Stanislav Kinsburskii
2025-01-06 18:11         ` Roman Kisel
2025-01-06 19:32           ` Stanislav Kinsburskii
2025-01-06 21:07             ` Roman Kisel
2025-01-07 19:18               ` Stanislav Kinsburskii
2025-01-07 23:11                 ` Roman Kisel
2025-01-08  8:04                   ` Wei Liu
2025-01-08 19:17                   ` Stanislav Kinsburskii
2025-01-08 20:37                     ` Roman Kisel
2025-01-08 22:19                       ` Stanislav Kinsburskii
2025-01-08 23:04                         ` Roman Kisel
2025-01-03 22:08     ` Michael Kelley
     [not found]       ` <CAJ-90NKKfF-KcWJ7sdMCXK9fWiXwMG-9xtjQn9fVhXgjRinZbA@mail.gmail.com>
2025-01-06 14:53         ` Alex Ionescu
2025-01-06 16:10           ` Michael Kelley
2025-01-06 17:23       ` Stanislav Kinsburskii
2025-01-06 18:18         ` Michael Kelley
2025-01-06 19:19           ` Stanislav Kinsburskii
2025-01-06 19:49             ` Michael Kelley
2025-01-06 21:12               ` Stanislav Kinsburskii
2025-01-08 21:08   ` Nuno Das Neves
2025-01-08 21:22     ` Roman Kisel
2024-12-30 18:09 ` [PATCH v5 4/5] hyperv: Do not overlap the hvcall IO areas in get_vtl() Roman Kisel
2025-01-08  7:36   ` Wei Liu
2025-01-08 17:47     ` Roman Kisel
2024-12-30 18:09 ` [PATCH v5 5/5] hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() Roman Kisel
2024-12-30 18:16 ` [PATCH v5 0/5] hyperv: Fixes for get_vtl(), hv_vtl_apicid_to_vp_id() Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).