[PATCH 0/7] hyperv: Introduce new way to manage hypercall args

linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/7] hyperv: Introduce new way to manage hypercall args
@ 2025-02-26 20:06 mhkelley58
  2025-02-26 20:06 ` [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility mhkelley58
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

This patch set introduces a new way to manage the use of the per-cpu
memory that is usually the input and output arguments to Hyper-V
hypercalls. Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, then uses this memory to set up the input parameters for
the hypercall, read the output results after hypercall execution, and
re-enable interrupts. The open coding of these steps has led to
inconsistencies, and in some cases, violation of the generic
requirements for the hypercall input and output as described in the
Hyper-V Top Level Functional Spec (TLFS)[1]. This patch set introduces
a new family of inline functions to replace the open coding. The new
functions encapsulate key aspects of the use of per-vCPU memory for
hypercall input and output,and ensure that the TLFS requirements are
met (max size of 1 page each for input and output, no overlap of input
and output, aligned to 8 bytes, etc.).

With this change, hypercall call sites no longer directly access
"hyperv_pcpu_input_arg" and "hyperv_pcpu_output_arg". Instead, one of
a family of new functions provides the per-cpu memory that a hypercall
call site uses to set up hypercall input and output areas.
Conceptually, there is no longer a difference between the "per-vCPU
input page" and "per-vCPU output page". Only a single per-vCPU page is
allocated, and it is used to provide both hypercall input and output.
All current hypercalls can fit their input and output within that single
page, though the new code allows easy changing to two pages should a
future hypercall require a full page for each of the input and output.

The new functions always zero the fixed-size portion of the hypercall
input area (but not any array portion -- see below) so that
uninitialized memory isn't inadvertently passed to the hypercall.
Current open-coded hypercall call sites are inconsistent on this point,
and use of the new functions addresses that inconsistency. The output
area is not zero'ed by the new code as it is Hyper-V's responsibility
to provide legal output.

When the input or output (or both) contain an array, the new code
calculates and returns how many array entries fit within the per-cpu
memory page, which is effectively the "batch size" for the hypercall
processing multiple entries. This batch size can then be used in the
hypercall control word to specify the repetition count. This
calculation of the batch size replaces current open coding of the
batch size, which is prone to errors. Note that the array portion of
the input area is *not* zero'ed. The arrays are almost always 64-bit
GPAs or something similar, and zero'ing that much memory seems
wasteful at runtime when it will all be overwritten. The hypercall
call site is responsible for ensuring that no part of the array is
left uninitialized (just as with current code).

The new family of functions is realized as a single inline function
that handles the most complex case, which is a hypercall with input
and output, both of which contain arrays. Simpler cases are mapped to
this most complex case with #define wrappers that provide zero or NULL
for some arguments. Several of the arguments to this new function are
expected to be compile-time constants generated by "sizeof()"
expressions. As such, most of the code in the new function is
evaluated by the compiler, with the result that the runtime code paths
are no longer than with the current open coding. An exception is the
new code generated to zero the fixed-size portion of the input area
in cases where it was not previously done.

Use of the new function typically (but not always) saves a few lines
of code at each hypercall call site. This is traded off against the
lines of code added for the new functions. With code currently
upstream, the net is an add of about 50 lines of code and comments.
However, as additional hypercall call sites are upstreamed from the
OpenHCL project[2] in support of Linux running in the Hyper-V root
partition and in VTLs other than VTL 0, the net lines of code added is
nearly zero.

A couple hypercall call sites have requirements that are not 100%
handled by the new function. These still require some manual open-
coded adjustment or open-coded batch size calculations -- see the
individual patches in this series. Suggestions on how to do better
are welcome.

The patches in the series do the following:

Patch 1: Fix a hypercall argument error discovered in one hypercall
         call site while researching this project.

Patch 2: Introduce the new family of functions for assigning hypercall
         input and output arguments.

Patch 3 to 6: Change existing hypercall call sites to use one of the new
         functions. In some cases, tweaks to the hypercall argument data
         structures are necessary, but these tweaks are making the data
         structures more consistent with the overall pattern. These
         four patches are independent of each other, and can go in any
         order. The breakup into 4 patches is for ease of review.

Patch 7: Update the name of the variable used to hold the per-cpu memory
         used for hypercall arguments. Remove code for managing the
	 per-cpu output page.

The new code compiles and runs successfully on x86 and arm64. Separate
from this patch set, for evaluation purposes I also applied the
changes to the additional hypercall call sites in the OpenHCL
project[2]. However, I don't have the hardware or Hyper-V
configurations needed to test running in the Hyper-V root partition or
in a VTL other than VTL 0. So the related hypercall call sites still
need to be tested to make sure I didn t break anything. Hopefully
someone with the necessary configurations and Hyper-V versions can
help with that testing.

For gcc 9.4.0, I've looked at the generated code for a couple of
hypercall call sites on both x86 and arm64 to ensure that it boils
down to the equivalent of the current open coding. I have not looked
at the generated code for later gcc versions or for Clang/LLVM, but
there's no reason to expect something worse as the code isn't doing
anything tricky.

This patch set is built against linux-next20250222.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
[2] https://github.com/microsoft/OHCL-Linux-Kernel

Michael Kelley (7):
  x86/hyperv: Fix output argument to hypercall that changes page
    visibility
  Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
  x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2
  Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments
  PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
  Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg

 arch/x86/hyperv/hv_apic.c           |   6 +-
 arch/x86/hyperv/hv_init.c           |  11 ++-
 arch/x86/hyperv/hv_vtl.c            |   8 +--
 arch/x86/hyperv/irqdomain.c         |  10 ++-
 arch/x86/hyperv/ivm.c               |  21 +++---
 arch/x86/hyperv/mmu.c               |  17 +----
 arch/x86/hyperv/nested.c            |  14 ++--
 drivers/hv/hv.c                     |  10 +--
 drivers/hv/hv_balloon.c             |   4 +-
 drivers/hv/hv_common.c              |  57 +++++-----------
 drivers/hv/hv_proc.c                |   8 +--
 drivers/hv/hyperv_vmbus.h           |   2 +-
 drivers/pci/controller/pci-hyperv.c |  14 ++--
 include/asm-generic/mshyperv.h      | 102 +++++++++++++++++++++++++++-
 include/hyperv/hvgdk_mini.h         |   6 +-
 15 files changed, 169 insertions(+), 121 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  2025-02-26 22:59   ` Nuno Das Neves
  2025-02-26 20:06 ` [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

The hypercall in hv_mark_gpa_visibility() is invoked with an input
argument and an output argument. The output argument ostensibly returns
the number of pages that were processed. But in fact, the hypercall does
not provide any output, so the output argument is spurious.

The spurious argument is harmless because Hyper-V ignores it, but in the
interest of correctness and to avoid the potential for future problems,
remove it.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
I have not provided a "Fixes:" tag because the error causes no impact.
There's no value in backporting the fix.

 arch/x86/hyperv/ivm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index dd68d9ad9b22..ec7880271cf9 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -464,7 +464,6 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 			   enum hv_mem_host_visibility visibility)
 {
 	struct hv_gpa_range_for_visibility *input;
-	u16 pages_processed;
 	u64 hv_status;
 	unsigned long flags;
 
@@ -493,7 +492,7 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
 	hv_status = hv_do_rep_hypercall(
 			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
-			0, input, &pages_processed);
+			0, input, NULL);
 	local_irq_restore(flags);
 
 	if (hv_result_success(hv_status))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility
  2025-02-26 20:06 ` [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility mhkelley58
@ 2025-02-26 22:59   ` Nuno Das Neves
  2025-03-10  0:22     ` Wei Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Nuno Das Neves @ 2025-02-26 22:59 UTC (permalink / raw)
  To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> The hypercall in hv_mark_gpa_visibility() is invoked with an input
> argument and an output argument. The output argument ostensibly returns
> the number of pages that were processed. But in fact, the hypercall does
> not provide any output, so the output argument is spurious.
> 
> The spurious argument is harmless because Hyper-V ignores it, but in the
> interest of correctness and to avoid the potential for future problems,
> remove it.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> I have not provided a "Fixes:" tag because the error causes no impact.
> There's no value in backporting the fix.
> 
>  arch/x86/hyperv/ivm.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index dd68d9ad9b22..ec7880271cf9 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -464,7 +464,6 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  			   enum hv_mem_host_visibility visibility)
>  {
>  	struct hv_gpa_range_for_visibility *input;
> -	u16 pages_processed;
>  	u64 hv_status;
>  	unsigned long flags;
>  
> @@ -493,7 +492,7 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
>  	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
>  	hv_status = hv_do_rep_hypercall(
>  			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> -			0, input, &pages_processed);
> +			0, input, NULL);
>  	local_irq_restore(flags);
>  
>  	if (hv_result_success(hv_status))

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility
  2025-02-26 22:59   ` Nuno Das Neves
@ 2025-03-10  0:22     ` Wei Liu
  0 siblings, 0 replies; 17+ messages in thread
From: Wei Liu @ 2025-03-10  0:22 UTC (permalink / raw)
  To: Nuno Das Neves
  Cc: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd, x86, linux-hyperv, linux-kernel, linux-pci,
	linux-arch

On Wed, Feb 26, 2025 at 02:59:53PM -0800, Nuno Das Neves wrote:
> On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> > 
> > The hypercall in hv_mark_gpa_visibility() is invoked with an input
> > argument and an output argument. The output argument ostensibly returns
> > the number of pages that were processed. But in fact, the hypercall does
> > not provide any output, so the output argument is spurious.
> > 
> > The spurious argument is harmless because Hyper-V ignores it, but in the
> > interest of correctness and to avoid the potential for future problems,
> > remove it.
> > 
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > ---
> > I have not provided a "Fixes:" tag because the error causes no impact.
> > There's no value in backporting the fix.
> > 
> >  arch/x86/hyperv/ivm.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> > index dd68d9ad9b22..ec7880271cf9 100644
> > --- a/arch/x86/hyperv/ivm.c
> > +++ b/arch/x86/hyperv/ivm.c
> > @@ -464,7 +464,6 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> >  			   enum hv_mem_host_visibility visibility)
> >  {
> >  	struct hv_gpa_range_for_visibility *input;
> > -	u16 pages_processed;
> >  	u64 hv_status;
> >  	unsigned long flags;
> >  
> > @@ -493,7 +492,7 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> >  	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> >  	hv_status = hv_do_rep_hypercall(
> >  			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> > -			0, input, &pages_processed);
> > +			0, input, NULL);
> >  	local_irq_restore(flags);
> >  
> >  	if (hv_result_success(hv_status))
> 
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

Applied to hyperv-fixes, thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
  2025-02-26 20:06 ` [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  2025-02-27  0:14   ` Nuno Das Neves
  2025-02-26 20:06 ` [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, then uses this memory to set up the input parameters for
the hypercall, read the output results after hypercall execution, and
re-enable interrupts. The open coding of these steps leads to
inconsistencies, and in some cases, violation of the generic
requirements for the hypercall input and output as described in the
Hyper-V Top Level Functional Spec (TLFS)[1].

To reduce these kinds of problems, introduce a family of inline
functions to replace the open coding. The functions provide a new way
to manage the use of this per-vCPU memory that is usually the input and
output arguments to Hyper-V hypercalls. The functions encapsulate
key aspects of the usage and ensure that the TLFS requirements are
met (max size of 1 page each for input and output, no overlap of
input and output, aligned to 8 bytes, etc.). Conceptually, there
is no longer a difference between the "per-vCPU input page" and
"per-vCPU output page". Only a single per-vCPU page is allocated, and
it provides both hypercall input and output memory. All current
hypercalls can fit their input and output within that single page,
though the new code allows easy changing to two pages should a future
hypercall require a full page for each of the input and output.

The new functions always zero the fixed-size portion of the hypercall
input area so that uninitialized memory is not inadvertently passed
to the hypercall. Current open-coded hypercall call sites are
inconsistent on this point, and use of the new functions addresses
that inconsistency. The output area is not zero'ed by the new code
as it is Hyper-V's responsibility to provide legal output.

When the input or output (or both) contain an array, the new functions
calculate and return how many array entries fit within the per-cpu
memory page, which is effectively the "batch size" for the hypercall
processing multiple entries. This batch size can then be used in the
hypercall control word to specify the repetition count. This
calculation of the batch size replaces current open coding of the
batch size, which is prone to errors. Note that the array portion of
the input area is *not* zero'ed. The arrays are almost always 64-bit
GPAs or something similar, and zero'ing that much memory seems
wasteful at runtime when it will all be overwritten. The hypercall
call site is responsible for ensuring that no part of the array is
left uninitialized (just as with current code).

The new functions are realized as a single inline function that
handles the most complex case, which is a hypercall with input
and output, both of which contain arrays. Simpler cases are mapped to
this most complex case with #define wrappers that provide zero or NULL
for some arguments. Several of the arguments to this new function are
expected to be compile-time constants generated by "sizeof()"
expressions. As such, most of the code in the new function can be
evaluated by the compiler, with the result that the code paths are
no longer than with the current open coding. The one exception is
new code generated to zero the fixed-size portion of the input area
in cases where it is not currently done.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index b13b0cda4ac8..0c8a9133cf1a 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
 	return status;
 }

+/*
+ * Hypercall input and output argument setup
+ */
+
+/* Temporary mapping to be removed at the end of the patch series */
+#define hyperv_pcpu_arg hyperv_pcpu_input_arg
+
+/*
+ * Allocate one page that is shared between input and output args, which is
+ * sufficient for all current hypercalls. If a future hypercall requires
+ * more space, change this value to "2" and everything will work.
+ */
+#define HV_HVCALL_ARG_PAGES 1
+
+/*
+ * Allocate space for hypercall input and output arguments from the
+ * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
+ * or output indicates to allocate no space for that argument. For input and
+ * for output, specify the size of the fixed portion, and the size of an
+ * element in a variable size array. A zero value for entry_size indicates
+ * there is no array. The fixed size space for the input is zero'ed.
+ *
+ * When variable size arrays are present, the function returns the number of
+ * elements (i.e, the batch size) that fit in the available space.
+ *
+ * The four "size" arguments are expected to be constants, in which case the
+ * compiler does most of the calculations. Then the generated inline code is no
+ * larger than if open coding the access to the hyperv_pcpu_arg and doing
+ * memset() on the input.
+ */
+static inline int hv_hvcall_inout_array(
+			void *input, u32 in_size, u32 in_entry_size,
+			void *output, u32 out_size, u32 out_entry_size)
+{
+	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
+	u32 in_total_size, out_total_size, offset;
+	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
+	void *space;
+
+	/*
+	 * If input and output have arrays, allocate half the space to input
+	 * and half to output. If only input has an array, the array can use
+	 * all the space except for the fixed size output (but not to exceed
+	 * one page), and vice versa.
+	 */
+	if (in_entry_size && out_entry_size)
+		batch_space = batch_space / 2;
+	else if (in_entry_size)
+		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
+	else if (out_entry_size)
+		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
+
+	if (in_entry_size)
+		in_batch_count = (batch_space - in_size) / in_entry_size;
+	if (out_entry_size)
+		out_batch_count = (batch_space - out_size) / out_entry_size;
+
+	/*
+	 * If input and output have arrays, use the smaller of the two batch
+	 * counts, in case they are different. If only one has an array, use
+	 * that batch count. batch_count will be zero if neither has an array.
+	 */
+	if (in_batch_count && out_batch_count)
+		batch_count = min(in_batch_count, out_batch_count);
+	else
+		batch_count = in_batch_count | out_batch_count;
+
+	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
+	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
+
+	space = *this_cpu_ptr(hyperv_pcpu_arg);
+	if (input) {
+		*(void **)input = space;
+		if (space)
+			/* Zero the fixed size portion, not any array portion */
+			memset(space, 0, ALIGN(in_size, 8));
+	}
+
+	if (output) {
+		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
+			offset = in_total_size;
+		} else {
+			offset = HV_HYP_PAGE_SIZE;
+			/* Need more than 1 page, but only 1 was allocated */
+			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
+		}
+		*(void **)output = space + offset;
+	}
+
+	return batch_count;
+}
+
+/* Wrappers for some of the simpler cases with only input, or with no arrays */
+#define hv_hvcall_in(input, in_size) \
+	hv_hvcall_inout_array(input, in_size, 0, NULL, 0, 0)
+
+#define hv_hvcall_inout(input, in_size, output, out_size) \
+	hv_hvcall_inout_array(input, in_size, 0, output, out_size, 0)
+
+#define hv_hvcall_in_array(input, in_size, in_entry_size) \
+	hv_hvcall_inout_array(input, in_size, in_entry_size, NULL, 0, 0)
+
 /* Generate the guest OS identifier as described in the Hyper-V TLFS */
 static inline u64 hv_generate_guest_id(u64 kernel_version)
 {
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-02-26 20:06 ` [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
@ 2025-02-27  0:14   ` Nuno Das Neves
  2025-02-27  1:50     ` Michael Kelley
  0 siblings, 1 reply; 17+ messages in thread
From: Nuno Das Neves @ 2025-02-27  0:14 UTC (permalink / raw)
  To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses this memory to set up the input parameters for
> the hypercall, read the output results after hypercall execution, and
> re-enable interrupts. The open coding of these steps leads to
> inconsistencies, and in some cases, violation of the generic
> requirements for the hypercall input and output as described in the
> Hyper-V Top Level Functional Spec (TLFS)[1].
> 
> To reduce these kinds of problems, introduce a family of inline
> functions to replace the open coding. The functions provide a new way
> to manage the use of this per-vCPU memory that is usually the input and
> output arguments to Hyper-V hypercalls. The functions encapsulate
> key aspects of the usage and ensure that the TLFS requirements are
> met (max size of 1 page each for input and output, no overlap of
> input and output, aligned to 8 bytes, etc.). Conceptually, there
> is no longer a difference between the "per-vCPU input page" and
> "per-vCPU output page". Only a single per-vCPU page is allocated, and
> it provides both hypercall input and output memory. All current
> hypercalls can fit their input and output within that single page,
> though the new code allows easy changing to two pages should a future
> hypercall require a full page for each of the input and output.
> 
> The new functions always zero the fixed-size portion of the hypercall
> input area so that uninitialized memory is not inadvertently passed
> to the hypercall. Current open-coded hypercall call sites are
> inconsistent on this point, and use of the new functions addresses
> that inconsistency. The output area is not zero'ed by the new code
> as it is Hyper-V's responsibility to provide legal output.
> 
> When the input or output (or both) contain an array, the new functions
> calculate and return how many array entries fit within the per-cpu
> memory page, which is effectively the "batch size" for the hypercall
> processing multiple entries. This batch size can then be used in the
> hypercall control word to specify the repetition count. This
> calculation of the batch size replaces current open coding of the
> batch size, which is prone to errors. Note that the array portion of
> the input area is *not* zero'ed. The arrays are almost always 64-bit
> GPAs or something similar, and zero'ing that much memory seems
> wasteful at runtime when it will all be overwritten. The hypercall
> call site is responsible for ensuring that no part of the array is
> left uninitialized (just as with current code).
> 
> The new functions are realized as a single inline function that
> handles the most complex case, which is a hypercall with input
> and output, both of which contain arrays. Simpler cases are mapped to
> this most complex case with #define wrappers that provide zero or NULL
> for some arguments. Several of the arguments to this new function are
> expected to be compile-time constants generated by "sizeof()"
> expressions. As such, most of the code in the new function can be
> evaluated by the compiler, with the result that the code paths are
> no longer than with the current open coding. The one exception is
> new code generated to zero the fixed-size portion of the input area
> in cases where it is not currently done.
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
>  include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++
>  1 file changed, 102 insertions(+)
> 
> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> index b13b0cda4ac8..0c8a9133cf1a 100644
> --- a/include/asm-generic/mshyperv.h
> +++ b/include/asm-generic/mshyperv.h
> @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
>  	return status;
>  }
>  
> +/*
> + * Hypercall input and output argument setup
> + */
> +
> +/* Temporary mapping to be removed at the end of the patch series */
> +#define hyperv_pcpu_arg hyperv_pcpu_input_arg
> +
> +/*
> + * Allocate one page that is shared between input and output args, which is
> + * sufficient for all current hypercalls. If a future hypercall requires
> + * more space, change this value to "2" and everything will work.
> + */
> +#define HV_HVCALL_ARG_PAGES 1
> +
> +/*
> + * Allocate space for hypercall input and output arguments from the
> + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
> + * or output indicates to allocate no space for that argument. For input and
> + * for output, specify the size of the fixed portion, and the size of an
> + * element in a variable size array. A zero value for entry_size indicates
> + * there is no array. The fixed size space for the input is zero'ed.
> + *
It might be worth explicitly mentioning that interrupts should be disabled when
calling this function.

> + * When variable size arrays are present, the function returns the number of
> + * elements (i.e, the batch size) that fit in the available space.
> + *
> + * The four "size" arguments are expected to be constants, in which case the
> + * compiler does most of the calculations. Then the generated inline code is no
> + * larger than if open coding the access to the hyperv_pcpu_arg and doing
> + * memset() on the input.
> + */
> +static inline int hv_hvcall_inout_array(
> +			void *input, u32 in_size, u32 in_entry_size,
> +			void *output, u32 out_size, u32 out_entry_size)
Is there a reason input and output are void * instead of void ** here?

> +{
> +	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
> +	u32 in_total_size, out_total_size, offset;
> +	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
> +	void *space;
> +
> +	/*
> +	 * If input and output have arrays, allocate half the space to input
> +	 * and half to output. If only input has an array, the array can use
> +	 * all the space except for the fixed size output (but not to exceed
> +	 * one page), and vice versa.
> +	 */
> +	if (in_entry_size && out_entry_size)
> +		batch_space = batch_space / 2;
> +	else if (in_entry_size)
> +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
> +	else if (out_entry_size)
> +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
> +
> +	if (in_entry_size)
> +		in_batch_count = (batch_space - in_size) / in_entry_size;
> +	if (out_entry_size)
> +		out_batch_count = (batch_space - out_size) / out_entry_size;
> +
> +	/*
> +	 * If input and output have arrays, use the smaller of the two batch
> +	 * counts, in case they are different. If only one has an array, use
> +	 * that batch count. batch_count will be zero if neither has an array.
> +	 */
> +	if (in_batch_count && out_batch_count)
> +		batch_count = min(in_batch_count, out_batch_count);
> +	else
> +		batch_count = in_batch_count | out_batch_count;
> +
> +	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
> +	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
> +
> +	space = *this_cpu_ptr(hyperv_pcpu_arg);
> +	if (input) {
> +		*(void **)input = space;
> +		if (space)
> +			/* Zero the fixed size portion, not any array portion */
> +			memset(space, 0, ALIGN(in_size, 8));
> +	}
> +
> +	if (output) {
> +		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
> +			offset = in_total_size;
> +		} else {
> +			offset = HV_HYP_PAGE_SIZE;
> +			/* Need more than 1 page, but only 1 was allocated */
> +			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset
because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at
compile-time?
So will this also fail if any of the args in_size, in_entry_size, out_size,
out_entry_size are runtime-known?

Nuno

> +		}
> +		*(void **)output = space + offset;
> +	}
> +
> +	return batch_count;
> +}
> +
> +/* Wrappers for some of the simpler cases with only input, or with no arrays */
> +#define hv_hvcall_in(input, in_size) \
> +	hv_hvcall_inout_array(input, in_size, 0, NULL, 0, 0)
> +
> +#define hv_hvcall_inout(input, in_size, output, out_size) \
> +	hv_hvcall_inout_array(input, in_size, 0, output, out_size, 0)
> +
> +#define hv_hvcall_in_array(input, in_size, in_entry_size) \
> +	hv_hvcall_inout_array(input, in_size, in_entry_size, NULL, 0, 0)
> +
>  /* Generate the guest OS identifier as described in the Hyper-V TLFS */
>  static inline u64 hv_generate_guest_id(u64 kernel_version)
>  {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-02-27  0:14   ` Nuno Das Neves
@ 2025-02-27  1:50     ` Michael Kelley
  2025-02-27 20:09       ` Nuno Das Neves
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Kelley @ 2025-02-27  1:50 UTC (permalink / raw)
  To: Nuno Das Neves, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, February 26, 2025 4:15 PM
> 
> On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > Current code allocates the "hyperv_pcpu_input_arg", and in
> > some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> > page of memory allocated per-vCPU. A hypercall call site disables
> > interrupts, then uses this memory to set up the input parameters for
> > the hypercall, read the output results after hypercall execution, and
> > re-enable interrupts. The open coding of these steps leads to
> > inconsistencies, and in some cases, violation of the generic
> > requirements for the hypercall input and output as described in the
> > Hyper-V Top Level Functional Spec (TLFS)[1].
> >
> > To reduce these kinds of problems, introduce a family of inline
> > functions to replace the open coding. The functions provide a new way
> > to manage the use of this per-vCPU memory that is usually the input and
> > output arguments to Hyper-V hypercalls. The functions encapsulate
> > key aspects of the usage and ensure that the TLFS requirements are
> > met (max size of 1 page each for input and output, no overlap of
> > input and output, aligned to 8 bytes, etc.). Conceptually, there
> > is no longer a difference between the "per-vCPU input page" and
> > "per-vCPU output page". Only a single per-vCPU page is allocated, and
> > it provides both hypercall input and output memory. All current
> > hypercalls can fit their input and output within that single page,
> > though the new code allows easy changing to two pages should a future
> > hypercall require a full page for each of the input and output.
> >
> > The new functions always zero the fixed-size portion of the hypercall
> > input area so that uninitialized memory is not inadvertently passed
> > to the hypercall. Current open-coded hypercall call sites are
> > inconsistent on this point, and use of the new functions addresses
> > that inconsistency. The output area is not zero'ed by the new code
> > as it is Hyper-V's responsibility to provide legal output.
> >
> > When the input or output (or both) contain an array, the new functions
> > calculate and return how many array entries fit within the per-cpu
> > memory page, which is effectively the "batch size" for the hypercall
> > processing multiple entries. This batch size can then be used in the
> > hypercall control word to specify the repetition count. This
> > calculation of the batch size replaces current open coding of the
> > batch size, which is prone to errors. Note that the array portion of
> > the input area is *not* zero'ed. The arrays are almost always 64-bit
> > GPAs or something similar, and zero'ing that much memory seems
> > wasteful at runtime when it will all be overwritten. The hypercall
> > call site is responsible for ensuring that no part of the array is
> > left uninitialized (just as with current code).
> >
> > The new functions are realized as a single inline function that
> > handles the most complex case, which is a hypercall with input
> > and output, both of which contain arrays. Simpler cases are mapped to
> > this most complex case with #define wrappers that provide zero or NULL
> > for some arguments. Several of the arguments to this new function are
> > expected to be compile-time constants generated by "sizeof()"
> > expressions. As such, most of the code in the new function can be
> > evaluated by the compiler, with the result that the code paths are
> > no longer than with the current open coding. The one exception is
> > new code generated to zero the fixed-size portion of the input area
> > in cases where it is not currently done.
> >
> > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > ---
> >  include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++
> >  1 file changed, 102 insertions(+)
> >
> > diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
> > index b13b0cda4ac8..0c8a9133cf1a 100644
> > --- a/include/asm-generic/mshyperv.h
> > +++ b/include/asm-generic/mshyperv.h
> > @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
> >  	return status;
> >  }
> >
> > +/*
> > + * Hypercall input and output argument setup
> > + */
> > +
> > +/* Temporary mapping to be removed at the end of the patch series */
> > +#define hyperv_pcpu_arg hyperv_pcpu_input_arg
> > +
> > +/*
> > + * Allocate one page that is shared between input and output args, which is
> > + * sufficient for all current hypercalls. If a future hypercall requires
> > + * more space, change this value to "2" and everything will work.
> > + */
> > +#define HV_HVCALL_ARG_PAGES 1
> > +
> > +/*
> > + * Allocate space for hypercall input and output arguments from the
> > + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
> > + * or output indicates to allocate no space for that argument. For input and
> > + * for output, specify the size of the fixed portion, and the size of an
> > + * element in a variable size array. A zero value for entry_size indicates
> > + * there is no array. The fixed size space for the input is zero'ed.
> > + *
> It might be worth explicitly mentioning that interrupts should be disabled when
> calling this function.

Agreed.

> 
> > + * When variable size arrays are present, the function returns the number of
> > + * elements (i.e, the batch size) that fit in the available space.
> > + *
> > + * The four "size" arguments are expected to be constants, in which case the
> > + * compiler does most of the calculations. Then the generated inline code is no
> > + * larger than if open coding the access to the hyperv_pcpu_arg and doing
> > + * memset() on the input.
> > + */
> > +static inline int hv_hvcall_inout_array(
> > +			void *input, u32 in_size, u32 in_entry_size,
> > +			void *output, u32 out_size, u32 out_entry_size)
> Is there a reason input and output are void * instead of void ** here?

Yes -- it must be void *, and not void **.  Consider a function like get_vtl()
in Patch 3 of this series where local variable "input" is declared as:

struct hv_input_get_vp_registers *input;

Then the first argument to hv_hvcall_inout() will be of type
struct hv_input_get_vp_registers **. The compiler does not consider
this to match void ** and would generate an error. But
struct hv_input_get_vp_registers ** _does_ match void * and compiles
with no error. It makes sense when you think about it, though it
isn't obvious. I tried to use void ** initially and had to figure out
why the code wouldn't compile. :-)

> 
> > +{
> > +	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
> > +	u32 in_total_size, out_total_size, offset;
> > +	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
> > +	void *space;
> > +
> > +	/*
> > +	 * If input and output have arrays, allocate half the space to input
> > +	 * and half to output. If only input has an array, the array can use
> > +	 * all the space except for the fixed size output (but not to exceed
> > +	 * one page), and vice versa.
> > +	 */
> > +	if (in_entry_size && out_entry_size)
> > +		batch_space = batch_space / 2;
> > +	else if (in_entry_size)
> > +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
> > +	else if (out_entry_size)
> > +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
> > +
> > +	if (in_entry_size)
> > +		in_batch_count = (batch_space - in_size) / in_entry_size;
> > +	if (out_entry_size)
> > +		out_batch_count = (batch_space - out_size) / out_entry_size;
> > +
> > +	/*
> > +	 * If input and output have arrays, use the smaller of the two batch
> > +	 * counts, in case they are different. If only one has an array, use
> > +	 * that batch count. batch_count will be zero if neither has an array.
> > +	 */
> > +	if (in_batch_count && out_batch_count)
> > +		batch_count = min(in_batch_count, out_batch_count);
> > +	else
> > +		batch_count = in_batch_count | out_batch_count;
> > +
> > +	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
> > +	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
> > +
> > +	space = *this_cpu_ptr(hyperv_pcpu_arg);
> > +	if (input) {
> > +		*(void **)input = space;
> > +		if (space)
> > +			/* Zero the fixed size portion, not any array portion */
> > +			memset(space, 0, ALIGN(in_size, 8));
> > +	}
> > +
> > +	if (output) {
> > +		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
> > +			offset = in_total_size;
> > +		} else {
> > +			offset = HV_HYP_PAGE_SIZE;
> > +			/* Need more than 1 page, but only 1 was allocated */
> > +			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
> Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset
> because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at
> compile-time?

Correct. And if for some future hypercall in_total_size + out_total_size is
*not* <= HV_HYP_PAGE_SIZE, the BUILD_BUG_ON() will alert the developer
to the problem. Depending on the argument requirements of this future
hypercall, the solution might require changing HV_HVCALL_ARG_PAGES to 2.

> So will this also fail if any of the args in_size, in_entry_size, out_size,
> out_entry_size are runtime-known?

Correct.  I should change my wording about "The four 'size' arguments are
expected to be constants" to ". . . must be constants". OTOH, if there's a need
to support non-constants for any of these arguments, some additional code
could handle that case. But the overall hv_hvcall_inout_array() function will
end up generating a lot of code to execute at runtime. I looked at the hypercall
call sites in the OHCL kernel tree, and didn't see any need for non-constants,
but I haven't looked yet at the v4 patch series you just posted.  Let me know
if you have a case requiring non-constants.

Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-02-27  1:50     ` Michael Kelley
@ 2025-02-27 20:09       ` Nuno Das Neves
  0 siblings, 0 replies; 17+ messages in thread
From: Nuno Das Neves @ 2025-02-27 20:09 UTC (permalink / raw)
  To: Michael Kelley, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

On 2/26/2025 5:50 PM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, February 26, 2025 4:15 PM
>>
>> On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
>>> From: Michael Kelley <mhklinux@outlook.com>
>>>
>>> Current code allocates the "hyperv_pcpu_input_arg", and in
>>> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
>>> page of memory allocated per-vCPU. A hypercall call site disables
>>> interrupts, then uses this memory to set up the input parameters for
>>> the hypercall, read the output results after hypercall execution, and
>>> re-enable interrupts. The open coding of these steps leads to
>>> inconsistencies, and in some cases, violation of the generic
>>> requirements for the hypercall input and output as described in the
>>> Hyper-V Top Level Functional Spec (TLFS)[1].
>>>
>>> To reduce these kinds of problems, introduce a family of inline
>>> functions to replace the open coding. The functions provide a new way
>>> to manage the use of this per-vCPU memory that is usually the input and
>>> output arguments to Hyper-V hypercalls. The functions encapsulate
>>> key aspects of the usage and ensure that the TLFS requirements are
>>> met (max size of 1 page each for input and output, no overlap of
>>> input and output, aligned to 8 bytes, etc.). Conceptually, there
>>> is no longer a difference between the "per-vCPU input page" and
>>> "per-vCPU output page". Only a single per-vCPU page is allocated, and
>>> it provides both hypercall input and output memory. All current
>>> hypercalls can fit their input and output within that single page,
>>> though the new code allows easy changing to two pages should a future
>>> hypercall require a full page for each of the input and output.
>>>
>>> The new functions always zero the fixed-size portion of the hypercall
>>> input area so that uninitialized memory is not inadvertently passed
>>> to the hypercall. Current open-coded hypercall call sites are
>>> inconsistent on this point, and use of the new functions addresses
>>> that inconsistency. The output area is not zero'ed by the new code
>>> as it is Hyper-V's responsibility to provide legal output.
>>>
>>> When the input or output (or both) contain an array, the new functions
>>> calculate and return how many array entries fit within the per-cpu
>>> memory page, which is effectively the "batch size" for the hypercall
>>> processing multiple entries. This batch size can then be used in the
>>> hypercall control word to specify the repetition count. This
>>> calculation of the batch size replaces current open coding of the
>>> batch size, which is prone to errors. Note that the array portion of
>>> the input area is *not* zero'ed. The arrays are almost always 64-bit
>>> GPAs or something similar, and zero'ing that much memory seems
>>> wasteful at runtime when it will all be overwritten. The hypercall
>>> call site is responsible for ensuring that no part of the array is
>>> left uninitialized (just as with current code).
>>>
>>> The new functions are realized as a single inline function that
>>> handles the most complex case, which is a hypercall with input
>>> and output, both of which contain arrays. Simpler cases are mapped to
>>> this most complex case with #define wrappers that provide zero or NULL
>>> for some arguments. Several of the arguments to this new function are
>>> expected to be compile-time constants generated by "sizeof()"
>>> expressions. As such, most of the code in the new function can be
>>> evaluated by the compiler, with the result that the code paths are
>>> no longer than with the current open coding. The one exception is
>>> new code generated to zero the fixed-size portion of the input area
>>> in cases where it is not currently done.
>>>
>>> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
>>>
>>> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
>>> ---
>>>  include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++
>>>  1 file changed, 102 insertions(+)
>>>
>>> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
>>> index b13b0cda4ac8..0c8a9133cf1a 100644
>>> --- a/include/asm-generic/mshyperv.h
>>> +++ b/include/asm-generic/mshyperv.h
>>> @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
>>>  	return status;
>>>  }
>>>
>>> +/*
>>> + * Hypercall input and output argument setup
>>> + */
>>> +
>>> +/* Temporary mapping to be removed at the end of the patch series */
>>> +#define hyperv_pcpu_arg hyperv_pcpu_input_arg
>>> +
>>> +/*
>>> + * Allocate one page that is shared between input and output args, which is
>>> + * sufficient for all current hypercalls. If a future hypercall requires
>>> + * more space, change this value to "2" and everything will work.
>>> + */
>>> +#define HV_HVCALL_ARG_PAGES 1
>>> +
>>> +/*
>>> + * Allocate space for hypercall input and output arguments from the
>>> + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
>>> + * or output indicates to allocate no space for that argument. For input and
>>> + * for output, specify the size of the fixed portion, and the size of an
>>> + * element in a variable size array. A zero value for entry_size indicates
>>> + * there is no array. The fixed size space for the input is zero'ed.
>>> + *
>> It might be worth explicitly mentioning that interrupts should be disabled when
>> calling this function.
> 
> Agreed.
> 
>>
>>> + * When variable size arrays are present, the function returns the number of
>>> + * elements (i.e, the batch size) that fit in the available space.
>>> + *
>>> + * The four "size" arguments are expected to be constants, in which case the
>>> + * compiler does most of the calculations. Then the generated inline code is no
>>> + * larger than if open coding the access to the hyperv_pcpu_arg and doing
>>> + * memset() on the input.
>>> + */
>>> +static inline int hv_hvcall_inout_array(
>>> +			void *input, u32 in_size, u32 in_entry_size,
>>> +			void *output, u32 out_size, u32 out_entry_size)
>> Is there a reason input and output are void * instead of void ** here?
> 
> Yes -- it must be void *, and not void **.  Consider a function like get_vtl()
> in Patch 3 of this series where local variable "input" is declared as:
> 
> struct hv_input_get_vp_registers *input;
> 
> Then the first argument to hv_hvcall_inout() will be of type
> struct hv_input_get_vp_registers **. The compiler does not consider
> this to match void ** and would generate an error. But
> struct hv_input_get_vp_registers ** _does_ match void * and compiles
> with no error. It makes sense when you think about it, though it
> isn't obvious. I tried to use void ** initially and had to figure out
> why the code wouldn't compile. :-)
> 
Ah, that does make sense. Okay then, fair enough!

>>
>>> +{
>>> +	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
>>> +	u32 in_total_size, out_total_size, offset;
>>> +	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
>>> +	void *space;
>>> +
>>> +	/*
>>> +	 * If input and output have arrays, allocate half the space to input
>>> +	 * and half to output. If only input has an array, the array can use
>>> +	 * all the space except for the fixed size output (but not to exceed
>>> +	 * one page), and vice versa.
>>> +	 */
>>> +	if (in_entry_size && out_entry_size)
>>> +		batch_space = batch_space / 2;
>>> +	else if (in_entry_size)
>>> +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
>>> +	else if (out_entry_size)
>>> +		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
>>> +
>>> +	if (in_entry_size)
>>> +		in_batch_count = (batch_space - in_size) / in_entry_size;
>>> +	if (out_entry_size)
>>> +		out_batch_count = (batch_space - out_size) / out_entry_size;
>>> +
>>> +	/*
>>> +	 * If input and output have arrays, use the smaller of the two batch
>>> +	 * counts, in case they are different. If only one has an array, use
>>> +	 * that batch count. batch_count will be zero if neither has an array.
>>> +	 */
>>> +	if (in_batch_count && out_batch_count)
>>> +		batch_count = min(in_batch_count, out_batch_count);
>>> +	else
>>> +		batch_count = in_batch_count | out_batch_count;
>>> +
>>> +	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
>>> +	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
>>> +
>>> +	space = *this_cpu_ptr(hyperv_pcpu_arg);
>>> +	if (input) {
>>> +		*(void **)input = space;
>>> +		if (space)
>>> +			/* Zero the fixed size portion, not any array portion */
>>> +			memset(space, 0, ALIGN(in_size, 8));
>>> +	}
>>> +
>>> +	if (output) {
>>> +		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
>>> +			offset = in_total_size;
>>> +		} else {
>>> +			offset = HV_HYP_PAGE_SIZE;
>>> +			/* Need more than 1 page, but only 1 was allocated */
>>> +			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
>> Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset
>> because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at
>> compile-time?
> 
> Correct. And if for some future hypercall in_total_size + out_total_size is
> *not* <= HV_HYP_PAGE_SIZE, the BUILD_BUG_ON() will alert the developer
> to the problem. Depending on the argument requirements of this future
> hypercall, the solution might require changing HV_HVCALL_ARG_PAGES to 2.
> 
>> So will this also fail if any of the args in_size, in_entry_size, out_size,
>> out_entry_size are runtime-known?
> 
> Correct.  I should change my wording about "The four 'size' arguments are
> expected to be constants" to ". . . must be constants". OTOH, if there's a need
> to support non-constants for any of these arguments, some additional code
> could handle that case. But the overall hv_hvcall_inout_array() function will
> end up generating a lot of code to execute at runtime. I looked at the hypercall
> call sites in the OHCL kernel tree, and didn't see any need for non-constants,
> but I haven't looked yet at the v4 patch series you just posted.  Let me know
> if you have a case requiring non-constants.
> 
I don't think we ever use non-constants. In fact I can't think of a case where these
values aren't the result of a sizeof() (or 0).

Overall I think this approach is looking quite nice and I'd be happy to adopt it
in the mshv driver code when this is merged.

With the comment change mentioned above:

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

> Michael


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
  2025-02-26 20:06 ` [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility mhkelley58
  2025-02-26 20:06 ` [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  2025-02-27 21:07   ` Nuno Das Neves
  2025-02-26 20:06 ` [PATCH 4/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2 mhkelley58
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/hyperv/hv_apic.c   |  6 ++----
 arch/x86/hyperv/hv_init.c   |  5 +----
 arch/x86/hyperv/hv_vtl.c    |  8 ++------
 arch/x86/hyperv/irqdomain.c | 10 ++++------
 4 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index f022d5f64fb6..c16f81dd36fc 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -115,14 +115,12 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
 		return false;
 
 	local_irq_save(flags);
-	ipi_arg = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	hv_hvcall_in_array(&ipi_arg, sizeof(*ipi_arg),
+				sizeof(ipi_arg->vp_set.bank_contents[0]));
 	if (unlikely(!ipi_arg))
 		goto ipi_mask_ex_done;
 
 	ipi_arg->vector = vector;
-	ipi_arg->reserved = 0;
-	ipi_arg->vp_set.valid_bank_mask = 0;
 
 	/*
 	 * Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index ddeb40930bc8..c5c9511cb7ed 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -400,13 +400,10 @@ static u8 __init get_vtl(void)
 	u64 ret;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
 
-	memset(input, 0, struct_size(input, names, 1));
+	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->vp_index = HV_VP_INDEX_SELF;
-	input->input_vtl.as_uint8 = 0;
 	input->names[0] = HV_REGISTER_VSM_VP_STATUS;
 
 	ret = hv_do_hypercall(control, input, output);
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 3f4e20d7b724..3dd27d548db6 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -94,8 +94,7 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
 
 	local_irq_save(irq_flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->vp_index = target_vp_index;
@@ -185,13 +184,10 @@ static int hv_vtl_apicid_to_vp_id(u32 apic_id)
 
 	local_irq_save(irq_flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->apic_ids[0] = apic_id;
 
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
 	control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_ID_FROM_APIC_ID;
 	status = hv_do_hypercall(control, input, output);
 	ret = output[0];
diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 64b921360b0f..803b1a945c5c 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -24,11 +24,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 
 	local_irq_save(flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+	hv_hvcall_inout_array(&input, sizeof(*input),
+			sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
+			&output, sizeof(*output), 0);
 
 	intr_desc = &input->interrupt_descriptor;
-	memset(input, 0, sizeof(*input));
 	input->partition_id = hv_current_partition_id;
 	input->device_id = device_id.as_uint64;
 	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
@@ -40,7 +40,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 	else
 		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
 
-	intr_desc->target.vp_set.valid_bank_mask = 0;
 	intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
 	nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
 	if (nr_bank < 0) {
@@ -77,9 +76,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
 	u64 status;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
 
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 	intr_entry = &input->interrupt_entry;
 	input->partition_id = hv_current_partition_id;
 	input->device_id = id;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
  2025-02-26 20:06 ` [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
@ 2025-02-27 21:07   ` Nuno Das Neves
  2025-02-28  0:50     ` Michael Kelley
  0 siblings, 1 reply; 17+ messages in thread
From: Nuno Das Neves @ 2025-02-27 21:07 UTC (permalink / raw)
  To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> Update hypercall call sites to use the new hv_hvcall_*() functions
> to set up hypercall arguments. Since these functions zero the
> fixed portion of input memory, remove now redundant calls to memset()
> and explicit zero'ing of input fields.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
>  arch/x86/hyperv/hv_apic.c   |  6 ++----
>  arch/x86/hyperv/hv_init.c   |  5 +----
>  arch/x86/hyperv/hv_vtl.c    |  8 ++------
>  arch/x86/hyperv/irqdomain.c | 10 ++++------
>  4 files changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index f022d5f64fb6..c16f81dd36fc 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -115,14 +115,12 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
>  		return false;
>  
>  	local_irq_save(flags);
> -	ipi_arg = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -
> +	hv_hvcall_in_array(&ipi_arg, sizeof(*ipi_arg),
> +				sizeof(ipi_arg->vp_set.bank_contents[0]));
I think the returned "batch size" should be checked to ensure it is not too small to hold the
variable-sized part of the header.

>  	if (unlikely(!ipi_arg))
>  		goto ipi_mask_ex_done;
>  
While here, is this check really needed? If so, maybe a check for the percpu page(s) could be
baked into hv_hvcall_inout_array()?

>  	ipi_arg->vector = vector;
> -	ipi_arg->reserved = 0;
> -	ipi_arg->vp_set.valid_bank_mask = 0;
>  
>  	/*
>  	 * Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ddeb40930bc8..c5c9511cb7ed 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -400,13 +400,10 @@ static u8 __init get_vtl(void)
>  	u64 ret;
>  
>  	local_irq_save(flags);
> -	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
>  
> -	memset(input, 0, struct_size(input, names, 1));
> +	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));

This doesn't look right, this is a rep hypercall taking an array of register names
and outputting an array of register values.

hv_hvcall_inout_array() should be fully utilized (input and output arrays) here.

The current code may actually work, but it will overlap the input and output!

>  	input->partition_id = HV_PARTITION_ID_SELF;
>  	input->vp_index = HV_VP_INDEX_SELF;
> -	input->input_vtl.as_uint8 = 0;
>  	input->names[0] = HV_REGISTER_VSM_VP_STATUS;
>  
>  	ret = hv_do_hypercall(control, input, output);
> diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
> index 3f4e20d7b724..3dd27d548db6 100644
> --- a/arch/x86/hyperv/hv_vtl.c
> +++ b/arch/x86/hyperv/hv_vtl.c
<snip>
> @@ -185,13 +184,10 @@ static int hv_vtl_apicid_to_vp_id(u32 apic_id)
>  
>  	local_irq_save(irq_flags);
>  
> -	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	memset(input, 0, sizeof(*input));
> +	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
This has the same issue as above - it is a rep hypercall so needs hv_hvcall_inout_array()

>  	input->partition_id = HV_PARTITION_ID_SELF;
>  	input->apic_ids[0] = apic_id;
>  
> -	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> -
>  	control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_ID_FROM_APIC_ID;
>  	status = hv_do_hypercall(control, input, output);
>  	ret = output[0];
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 64b921360b0f..803b1a945c5c 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -24,11 +24,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>  
>  	local_irq_save(flags);
>  
> -	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +	hv_hvcall_inout_array(&input, sizeof(*input),
> +			sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
> +			&output, sizeof(*output), 0);
As noted before I think the batch size should be checked to ensure it is large enough.

Side note - it seems in this hypercall, nr_banks + 1 is used as the varhead size, which
counts the vp valid mask, but this is not the case in __send_ipi_mask_ex(). Do you happen
to know why that might be?

Thanks,
Nuno

>  
>  	intr_desc = &input->interrupt_descriptor;
> -	memset(input, 0, sizeof(*input));
>  	input->partition_id = hv_current_partition_id;
>  	input->device_id = device_id.as_uint64;
>  	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
<snip>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
  2025-02-27 21:07   ` Nuno Das Neves
@ 2025-02-28  0:50     ` Michael Kelley
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Kelley @ 2025-02-28  0:50 UTC (permalink / raw)
  To: Nuno Das Neves, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Thursday, February 27, 2025 1:08 PM
> 
> On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > Update hypercall call sites to use the new hv_hvcall_*() functions
> > to set up hypercall arguments. Since these functions zero the
> > fixed portion of input memory, remove now redundant calls to memset()
> > and explicit zero'ing of input fields.
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > ---
> >  arch/x86/hyperv/hv_apic.c   |  6 ++----
> >  arch/x86/hyperv/hv_init.c   |  5 +----
> >  arch/x86/hyperv/hv_vtl.c    |  8 ++------
> >  arch/x86/hyperv/irqdomain.c | 10 ++++------
> >  4 files changed, 9 insertions(+), 20 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> > index f022d5f64fb6..c16f81dd36fc 100644
> > --- a/arch/x86/hyperv/hv_apic.c
> > +++ b/arch/x86/hyperv/hv_apic.c
> > @@ -115,14 +115,12 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
> >  		return false;
> >
> >  	local_irq_save(flags);
> > -	ipi_arg = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > -
> > +	hv_hvcall_in_array(&ipi_arg, sizeof(*ipi_arg),
> > +				sizeof(ipi_arg->vp_set.bank_contents[0]));
> I think the returned "batch size" should be checked to ensure it is not too small to hold the
> variable-sized part of the header.

Are you saying to check the batch_size against nr_bank (which is the
size of the array embedded in vp_set)?  

> 
> >  	if (unlikely(!ipi_arg))
> >  		goto ipi_mask_ex_done;
> >
> While here, is this check really needed? If so, maybe a check for the percpu page(s) could be
> baked into hv_hvcall_inout_array()?

Yes, I wanted to bake the check into hv_hvcall_inout_array(). But this
check really is needed.  hv_send_ipi(), which can propagate down to
__send_ipi_mask_ex(), can get called early during boot before the per-cpu
hypercall argument page is allocated. The lack of the hypercall argument
page must cleanly propagate back to hv_send_ipi() so it can use the native
APIC "send IPI" function that works without a hypercall, but is slower.

> 
> >  	ipi_arg->vector = vector;
> > -	ipi_arg->reserved = 0;
> > -	ipi_arg->vp_set.valid_bank_mask = 0;
> >
> >  	/*
> >  	 * Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index ddeb40930bc8..c5c9511cb7ed 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -400,13 +400,10 @@ static u8 __init get_vtl(void)
> >  	u64 ret;
> >
> >  	local_irq_save(flags);
> > -	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> >
> > -	memset(input, 0, struct_size(input, names, 1));
> > +	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
> 
> This doesn't look right, this is a rep hypercall taking an array of register names
> and outputting an array of register values.
> 
> hv_hvcall_inout_array() should be fully utilized (input and output arrays) here.
> 
> The current code may actually work, but it will overlap the input and output!

Yep.  I messed this up.  Not sure why. :-(   Will fix.

> 
> >  	input->partition_id = HV_PARTITION_ID_SELF;
> >  	input->vp_index = HV_VP_INDEX_SELF;
> > -	input->input_vtl.as_uint8 = 0;
> >  	input->names[0] = HV_REGISTER_VSM_VP_STATUS;
> >
> >  	ret = hv_do_hypercall(control, input, output);
> > diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
> > index 3f4e20d7b724..3dd27d548db6 100644
> > --- a/arch/x86/hyperv/hv_vtl.c
> > +++ b/arch/x86/hyperv/hv_vtl.c
> <snip>
> > @@ -185,13 +184,10 @@ static int hv_vtl_apicid_to_vp_id(u32 apic_id)
> >
> >  	local_irq_save(irq_flags);
> >
> > -	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	memset(input, 0, sizeof(*input));
> > +	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
> This has the same issue as above - it is a rep hypercall so needs hv_hvcall_inout_array()

Agreed.  Will fix.

> 
> >  	input->partition_id = HV_PARTITION_ID_SELF;
> >  	input->apic_ids[0] = apic_id;
> >
> > -	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > -
> >  	control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_ID_FROM_APIC_ID;
> >  	status = hv_do_hypercall(control, input, output);
> >  	ret = output[0];
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 64b921360b0f..803b1a945c5c 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -24,11 +24,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> >
> >  	local_irq_save(flags);
> >
> > -	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +	hv_hvcall_inout_array(&input, sizeof(*input),
> > +			sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
> > +			&output, sizeof(*output), 0);
> As noted before I think the batch size should be checked to ensure it is large enough.
> 
> Side note - it seems in this hypercall, nr_banks + 1 is used as the varhead size, which
> counts the vp valid mask, but this is not the case in __send_ipi_mask_ex(). Do you happen
> to know why that might be?

Interesting discrepancy.  Right off the bat, I don't know why.  The comment
in hv_map_interrupt() is very specific and sounds like it knows what it is
talking about. hyperv_flush_tlb_others_ex() does it like __send_ipi_mask_ex(),
but hv_arch_irq_unmask() does it like hv_map_interrupt(), and with the same
comment.  So two votes for each way. :-(  I'll research further.

Thanks for the careful review and flagging the mistakes!

Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (2 preceding siblings ...)
  2025-02-26 20:06 ` [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  2025-02-26 20:06 ` [PATCH 5/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments mhkelley58
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.

For hv_mark_gpa_visibility(), use the computed batch_size instead
of HV_MAX_MODIFY_GPA_REP_COUNT. Also update the associated gpa_page_list[]
field to have zero size, which is more consistent with other array
arguments to hypercalls. Due to the interaction with the calling
hv_vtom_set_host_visibility(), HV_MAX_MODIFY_GPA_REP_COUNT cannot be
completely eliminated without some further restructuring, but that's
for another patch set.

Similarly, for the nested flush functions, update the gpa_list[] to
have zero size. Again, separate restructuring would be required to
completely eliminate the need for HV_MAX_FLUSH_REP_COUNT.

Finally, hyperv_flush_tlb_others_ex() requires special handling
because the input consists of two arrays -- one for the hv_vp_set and
another for the gva list. The batch_size computed by hv_hvcall_in_array()
is adjusted to account for the number of entries in the hv_vp_set.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/hyperv/ivm.c       | 18 +++++++++---------
 arch/x86/hyperv/mmu.c       | 17 +++--------------
 arch/x86/hyperv/nested.c    | 14 +++++---------
 include/hyperv/hvgdk_mini.h |  4 ++--
 4 files changed, 19 insertions(+), 34 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index ec7880271cf9..1e4f65aef09b 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -465,30 +465,30 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 {
 	struct hv_gpa_range_for_visibility *input;
 	u64 hv_status;
+	int batch_size;
 	unsigned long flags;
 
 	/* no-op if partition isolation is not enabled */
 	if (!hv_is_isolation_supported())
 		return 0;
 
-	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
-		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
-			HV_MAX_MODIFY_GPA_REP_COUNT);
+	local_irq_save(flags);
+	batch_size = hv_hvcall_in_array(&input, sizeof(*input),
+					sizeof(input->gpa_page_list[0]));
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
 		return -EINVAL;
 	}
 
-	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
-	if (unlikely(!input)) {
+	if (count > batch_size) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
+			batch_size);
 		local_irq_restore(flags);
 		return -EINVAL;
 	}
 
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->host_visibility = visibility;
-	input->reserved0 = 0;
-	input->reserved1 = 0;
 	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
 	hv_status = hv_do_rep_hypercall(
 			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index 1f7c3082a36d..ab9db23247c1 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -72,7 +72,7 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 
 	local_irq_save(flags);
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	max_gvas = hv_hvcall_in_array(&flush, sizeof(*flush), sizeof(flush->gva_list[0]));
 
 	if (unlikely(!flush)) {
 		local_irq_restore(flags);
@@ -86,13 +86,10 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 		 */
 		flush->address_space = virt_to_phys(info->mm->pgd);
 		flush->address_space &= CR3_ADDR_MASK;
-		flush->flags = 0;
 	} else {
-		flush->address_space = 0;
 		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
 	}
 
-	flush->processor_mask = 0;
 	if (cpumask_equal(cpus, cpu_present_mask)) {
 		flush->flags |= HV_FLUSH_ALL_PROCESSORS;
 	} else {
@@ -139,8 +136,6 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 	 * We can flush not more than max_gvas with one hypercall. Flush the
 	 * whole address space if we were asked to do more.
 	 */
-	max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
-
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
 		status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
@@ -179,7 +174,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
 		return HV_STATUS_INVALID_PARAMETER;
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	max_gvas = hv_hvcall_in_array(&flush, sizeof(*flush), sizeof(flush->gva_list[0]));
 
 	if (info->mm) {
 		/*
@@ -188,14 +183,10 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 		 */
 		flush->address_space = virt_to_phys(info->mm->pgd);
 		flush->address_space &= CR3_ADDR_MASK;
-		flush->flags = 0;
 	} else {
-		flush->address_space = 0;
 		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
 	}
 
-	flush->hv_vp_set.valid_bank_mask = 0;
-
 	flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
 	nr_bank = cpumask_to_vpset_skip(&flush->hv_vp_set, cpus,
 			info->freed_tables ? NULL : cpu_is_lazy);
@@ -206,9 +197,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	 * We can flush not more than max_gvas with one hypercall. Flush the
 	 * whole address space if we were asked to do more.
 	 */
-	max_gvas =
-		(PAGE_SIZE - sizeof(*flush) - nr_bank *
-		 sizeof(flush->hv_vp_set.bank_contents[0])) /
+	max_gvas -= (nr_bank * sizeof(flush->hv_vp_set.bank_contents[0])) /
 		sizeof(flush->gva_list[0]);
 
 	if (info->end == TLB_FLUSH_ALL) {
diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index 1083dc8646f9..88c39ac8d0aa 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -29,15 +29,13 @@ int hyperv_flush_guest_mapping(u64 as)
 
 	local_irq_save(flags);
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	hv_hvcall_in(&flush, sizeof(*flush));
 	if (unlikely(!flush)) {
 		local_irq_restore(flags);
 		goto fault;
 	}
 
 	flush->address_space = as;
-	flush->flags = 0;
 
 	status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
 				 flush, NULL);
@@ -90,25 +88,23 @@ int hyperv_flush_guest_mapping_range(u64 as,
 	u64 status;
 	unsigned long flags;
 	int ret = -ENOTSUPP;
-	int gpa_n = 0;
+	int batch_size, gpa_n = 0;
 
 	if (!hv_hypercall_pg || !fill_flush_list_func)
 		goto fault;
 
 	local_irq_save(flags);
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	batch_size = hv_hvcall_in_array(&flush, sizeof(*flush),
+					sizeof(flush->gpa_list[0]));
 	if (unlikely(!flush)) {
 		local_irq_restore(flags);
 		goto fault;
 	}
 
 	flush->address_space = as;
-	flush->flags = 0;
-
 	gpa_n = fill_flush_list_func(flush, data);
-	if (gpa_n < 0) {
+	if (gpa_n < 0 || gpa_n > batch_size) {
 		local_irq_restore(flags);
 		goto fault;
 	}
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 58895883f636..70e5d7ee40c8 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -533,7 +533,7 @@ union hv_gpa_page_range {
 struct hv_guest_mapping_flush_list {
 	u64 address_space;
 	u64 flags;
-	union hv_gpa_page_range gpa_list[HV_MAX_FLUSH_REP_COUNT];
+	union hv_gpa_page_range gpa_list[];
 };
 
 struct hv_tlb_flush {	 /* HV_INPUT_FLUSH_VIRTUAL_ADDRESS_LIST */
@@ -1218,7 +1218,7 @@ struct hv_gpa_range_for_visibility {
 	u32 host_visibility : 2;
 	u32 reserved0 : 30;
 	u32 reserved1;
-	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+	u64 gpa_page_list[];
 } __packed;
 
 #if defined(CONFIG_X86)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (3 preceding siblings ...)
  2025-02-26 20:06 ` [PATCH 4/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2 mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  2025-02-26 20:06 ` [PATCH 6/7] PCI: " mhkelley58
  2025-02-26 20:06 ` [PATCH 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
  6 siblings, 0 replies; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant zero'ing of
input fields.

hv_post_message() requires additional updates. The payload area is
treated as an array to avoid wasting cycles on zero'ing it and
then overwriting with memcpy(). To allow treatment as an array,
the corresponding payload[] field is updated to have zero size.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/hv/hv.c           | 8 +++++---
 drivers/hv/hv_balloon.c   | 4 ++--
 drivers/hv/hv_common.c    | 2 +-
 drivers/hv/hv_proc.c      | 8 ++++----
 drivers/hv/hyperv_vmbus.h | 2 +-
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index a38f84548bc2..d98e83b4a6fd 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -66,7 +66,8 @@ int hv_post_message(union hv_connection_id connection_id,
 	if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
 		aligned_msg = this_cpu_ptr(hv_context.cpu_context)->post_msg_page;
 	else
-		aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		hv_hvcall_in_array(&aligned_msg, sizeof(*aligned_msg),
+					sizeof(aligned_msg->payload[0]));
 
 	aligned_msg->connectionid = connection_id;
 	aligned_msg->reserved = 0;
@@ -80,8 +81,9 @@ int hv_post_message(union hv_connection_id connection_id,
 						  virt_to_phys(aligned_msg), 0);
 		else if (hv_isolation_type_snp())
 			status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
-						   aligned_msg, NULL,
-						   sizeof(*aligned_msg));
+					aligned_msg, NULL,
+					struct_size(aligned_msg, payload,
+					      HV_MESSAGE_PAYLOAD_QWORD_COUNT));
 		else
 			status = HV_STATUS_INVALID_PARAMETER;
 	} else {
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index fec2f18679e3..2def8b8794ee 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1582,14 +1582,14 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
 	WARN_ON_ONCE(nents > HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES);
 	WARN_ON_ONCE(sgl->length < (HV_HYP_PAGE_SIZE << page_reporting_order));
 	local_irq_save(flags);
-	hint = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+	hv_hvcall_in_array(&hint, sizeof(*hint), sizeof(hint->ranges[0]));
 	if (!hint) {
 		local_irq_restore(flags);
 		return -ENOSPC;
 	}
 
 	hint->heat_type = HV_EXTMEM_HEAT_HINT_COLD_DISCARD;
-	hint->reserved = 0;
 	for_each_sg(sgl, sg, nents, i) {
 		union hv_gpa_page_range *range;
 
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 9804adb4cc56..a6b1cdfbc8d4 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -293,7 +293,7 @@ void __init hv_get_partition_id(void)
 	u64 status, pt_id;
 
 	local_irq_save(flags);
-	output = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	hv_hvcall_inout(NULL, 0, &output, sizeof(*output));
 	status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, &output);
 	pt_id = output->partition_id;
 	local_irq_restore(flags);
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 2fae18e4f7d2..d8f3b74d5306 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -73,7 +73,8 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 
 	local_irq_save(flags);
 
-	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	hv_hvcall_in_array(&input_page, sizeof(*input_page),
+			sizeof(input_page->gpa_page_list[0]));
 
 	input_page->partition_id = partition_id;
 
@@ -124,9 +125,8 @@ int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
 	do {
 		local_irq_save(flags);
 
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
 		/* We don't do anything with the output right now */
-		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+		hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 
 		input->lp_index = lp_index;
 		input->apic_id = apic_id;
@@ -167,7 +167,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
 	do {
 		local_irq_save(irq_flags);
 
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		hv_hvcall_in(&input, sizeof(*input));
 
 		input->partition_id = partition_id;
 		input->vp_index = vp_index;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..44b5e8330d9d 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -101,7 +101,7 @@ struct hv_input_post_message {
 	u32 reserved;
 	u32 message_type;
 	u32 payload_size;
-	u64 payload[HV_MESSAGE_PAYLOAD_QWORD_COUNT];
+	u64 payload[];
 };
 
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (4 preceding siblings ...)
  2025-02-26 20:06 ` [PATCH 5/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  2025-02-27 20:29   ` Bjorn Helgaas
  2025-02-26 20:06 ` [PATCH 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
  6 siblings, 1 reply; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset().

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/pci/controller/pci-hyperv.c | 14 ++++++--------
 include/hyperv/hvgdk_mini.h         |  2 +-
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 44d7f4339306..b7bfda00544d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -638,8 +638,8 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 
 	local_irq_save(flags);
 
-	params = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(params, 0, sizeof(*params));
+	hv_hvcall_in_array(&params, sizeof(*params),
+			sizeof(params->int_target.vp_set.bank_contents[0]));
 	params->partition_id = HV_PARTITION_ID_SELF;
 	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
 	params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff;
@@ -1034,11 +1034,9 @@ static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
 
 	/*
 	 * Must be called with interrupts disabled so it is safe
-	 * to use the per-cpu input argument page.  Use it for
-	 * both input and output.
+	 * to use the per-cpu argument page.
 	 */
-	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
+	hv_hvcall_inout(&in, sizeof(*in), &out, sizeof(*out));
 	in->gpa = gpa;
 	in->size = size;
 
@@ -1067,9 +1065,9 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
 
 	/*
 	 * Must be called with interrupts disabled so it is safe
-	 * to use the per-cpu input argument memory.
+	 * to use the per-cpu argument page.
 	 */
-	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	hv_hvcall_in_array(&in, sizeof(*in), sizeof(in->data[0]));
 	in->gpa = gpa;
 	in->size = size;
 	switch (size) {
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 70e5d7ee40c8..cb25ac1e3ac5 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -1342,7 +1342,7 @@ struct hv_mmio_write_input {
 	u64 gpa;
 	u32 size;
 	u32 reserved;
-	u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
+	u8 data[];
 } __packed;
 
 #endif /* _HV_HVGDK_MINI_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/7] PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
  2025-02-26 20:06 ` [PATCH 6/7] PCI: " mhkelley58
@ 2025-02-27 20:29   ` Bjorn Helgaas
  2025-03-05 17:24     ` Wei Liu
  0 siblings, 1 reply; 17+ messages in thread
From: Bjorn Helgaas @ 2025-02-27 20:29 UTC (permalink / raw)
  To: mhklinux
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd, x86,
	linux-hyperv, linux-kernel, linux-pci, linux-arch

On Wed, Feb 26, 2025 at 12:06:11PM -0800, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> Update hypercall call sites to use the new hv_hvcall_*() functions
> to set up hypercall arguments. Since these functions zero the
> fixed portion of input memory, remove now redundant calls to memset().
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Since most of this series touches arch/x86/hyperv/ and drivers/hv/, I
assume this will be merged via some non-PCI tree.

> ---
>  drivers/pci/controller/pci-hyperv.c | 14 ++++++--------
>  include/hyperv/hvgdk_mini.h         |  2 +-
>  2 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 44d7f4339306..b7bfda00544d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -638,8 +638,8 @@ static void hv_arch_irq_unmask(struct irq_data *data)
>  
>  	local_irq_save(flags);
>  
> -	params = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	memset(params, 0, sizeof(*params));
> +	hv_hvcall_in_array(&params, sizeof(*params),
> +			sizeof(params->int_target.vp_set.bank_contents[0]));
>  	params->partition_id = HV_PARTITION_ID_SELF;
>  	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
>  	params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff;
> @@ -1034,11 +1034,9 @@ static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
>  
>  	/*
>  	 * Must be called with interrupts disabled so it is safe
> -	 * to use the per-cpu input argument page.  Use it for
> -	 * both input and output.
> +	 * to use the per-cpu argument page.
>  	 */
> -	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> -	out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
> +	hv_hvcall_inout(&in, sizeof(*in), &out, sizeof(*out));
>  	in->gpa = gpa;
>  	in->size = size;
>  
> @@ -1067,9 +1065,9 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
>  
>  	/*
>  	 * Must be called with interrupts disabled so it is safe
> -	 * to use the per-cpu input argument memory.
> +	 * to use the per-cpu argument page.
>  	 */
> -	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	hv_hvcall_in_array(&in, sizeof(*in), sizeof(in->data[0]));
>  	in->gpa = gpa;
>  	in->size = size;
>  	switch (size) {
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index 70e5d7ee40c8..cb25ac1e3ac5 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -1342,7 +1342,7 @@ struct hv_mmio_write_input {
>  	u64 gpa;
>  	u32 size;
>  	u32 reserved;
> -	u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
> +	u8 data[];
>  } __packed;
>  
>  #endif /* _HV_HVGDK_MINI_H */
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/7] PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
  2025-02-27 20:29   ` Bjorn Helgaas
@ 2025-03-05 17:24     ` Wei Liu
  0 siblings, 0 replies; 17+ messages in thread
From: Wei Liu @ 2025-03-05 17:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd, x86, linux-hyperv, linux-kernel, linux-pci,
	linux-arch

On Thu, Feb 27, 2025 at 02:29:51PM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 26, 2025 at 12:06:11PM -0800, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> > 
> > Update hypercall call sites to use the new hv_hvcall_*() functions
> > to set up hypercall arguments. Since these functions zero the
> > fixed portion of input memory, remove now redundant calls to memset().
> > 
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> Since most of this series touches arch/x86/hyperv/ and drivers/hv/, I
> assume this will be merged via some non-PCI tree.

Yes, I will pick this up after the whole series is reviewed and acked.

Thanks,
Wei.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
  2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (5 preceding siblings ...)
  2025-02-26 20:06 ` [PATCH 6/7] PCI: " mhkelley58
@ 2025-02-26 20:06 ` mhkelley58
  6 siblings, 0 replies; 17+ messages in thread
From: mhkelley58 @ 2025-02-26 20:06 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

All open coded uses of hyperv_pcpu_input_arg and hyperv_pcpu_ouput_arg
have been replaced by hv_hvcall_*() functions. So combine
hyperv_pcpu_input_arg and hyperv_pcpu_output_arg in a single
hyperv_pcpu_arg. Remove logic for managing a separate output arg. Fixup
comment references to the old variable names.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 arch/x86/hyperv/hv_init.c      |  6 ++--
 drivers/hv/hv.c                |  2 +-
 drivers/hv/hv_common.c         | 55 ++++++++++------------------------
 include/asm-generic/mshyperv.h |  6 +---
 4 files changed, 21 insertions(+), 48 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index c5c9511cb7ed..c701d7601f48 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -482,16 +482,16 @@ void __init hyperv_init(void)
 	 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
 	 * when the hypercall input is a page, such a VM must pass a decrypted
 	 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
-	 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
+	 * hyperv_pcpu_arg, which is decrypted if no paravisor is present.
 	 *
 	 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
 	 * which are handled by the paravisor and the VM must use an encrypted
-	 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
+	 * input page: in such a VM, the hyperv_pcpu_arg is encrypted and
 	 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
 	 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
 	 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
 	 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
-	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
+	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_arg;
 	 * instead, hv_post_message() uses the post_msg_page, which is decrypted
 	 * in such a VM and is only used in such a VM.
 	 */
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index d98e83b4a6fd..cdd5dcbdee4e 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,7 @@ int hv_post_message(union hv_connection_id connection_id,
 	/*
 	 * A TDX VM with the paravisor must use the decrypted post_msg_page: see
 	 * the comment in struct hv_per_cpu_context. A SNP VM with the paravisor
-	 * can use the encrypted hyperv_pcpu_input_arg because it copies the
+	 * can use the encrypted hyperv_pcpu_arg because it copies the
 	 * input into the GHCB page, which has been decrypted by the paravisor.
 	 */
 	if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index a6b1cdfbc8d4..f3e219daf9fe 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -58,11 +58,8 @@ EXPORT_SYMBOL_GPL(hv_vp_index);
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
-void * __percpu *hyperv_pcpu_input_arg;
-EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
-
-void * __percpu *hyperv_pcpu_output_arg;
-EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+void * __percpu *hyperv_pcpu_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_arg);
 
 static void hv_kmsg_dump_unregister(void);
 
@@ -85,11 +82,8 @@ void __init hv_common_free(void)
 	kfree(hv_vp_index);
 	hv_vp_index = NULL;
 
-	free_percpu(hyperv_pcpu_output_arg);
-	hyperv_pcpu_output_arg = NULL;
-
-	free_percpu(hyperv_pcpu_input_arg);
-	hyperv_pcpu_input_arg = NULL;
+	free_percpu(hyperv_pcpu_arg);
+	hyperv_pcpu_arg = NULL;
 }
 
 /*
@@ -281,11 +275,6 @@ static void hv_kmsg_dump_register(void)
 	}
 }
 
-static inline bool hv_output_page_exists(void)
-{
-	return hv_root_partition() || IS_ENABLED(CONFIG_HYPERV_VTL_MODE);
-}
-
 void __init hv_get_partition_id(void)
 {
 	struct hv_output_get_partition_id *output;
@@ -363,14 +352,8 @@ int __init hv_common_init(void)
 	 * (per-CPU) hypercall input page and thus this failure is
 	 * fatal on Hyper-V.
 	 */
-	hyperv_pcpu_input_arg = alloc_percpu(void  *);
-	BUG_ON(!hyperv_pcpu_input_arg);
-
-	/* Allocate the per-CPU state for output arg for root */
-	if (hv_output_page_exists()) {
-		hyperv_pcpu_output_arg = alloc_percpu(void *);
-		BUG_ON(!hyperv_pcpu_output_arg);
-	}
+	hyperv_pcpu_arg = alloc_percpu(void  *);
+	BUG_ON(!hyperv_pcpu_arg);
 
 	hv_vp_index = kmalloc_array(nr_cpu_ids, sizeof(*hv_vp_index),
 				    GFP_KERNEL);
@@ -459,32 +442,27 @@ void __init ms_hyperv_late_init(void)
 
 int hv_common_cpu_init(unsigned int cpu)
 {
-	void **inputarg, **outputarg;
+	void **inputarg;
 	u64 msr_vp_index;
 	gfp_t flags;
-	const int pgcount = hv_output_page_exists() ? 2 : 1;
+	const int pgcount = HV_HVCALL_ARG_PAGES;
 	void *mem;
 	int ret;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
 	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
 
-	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_arg);
 
 	/*
-	 * hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory is already
-	 * allocated if this CPU was previously online and then taken offline
+	 * hyperv_pcpu_arg memory is already allocated if this CPU was
+	 * previously online and then taken offline
 	 */
 	if (!*inputarg) {
 		mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
 		if (!mem)
 			return -ENOMEM;
 
-		if (hv_output_page_exists()) {
-			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
-		}
-
 		if (!ms_hyperv.paravisor_present &&
 		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)mem, pgcount);
@@ -498,13 +476,13 @@ int hv_common_cpu_init(unsigned int cpu)
 
 		/*
 		 * In a fully enlightened TDX/SNP VM with more than 64 VPs, if
-		 * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() ->
+		 * hyperv_pcpu_arg is not NULL, set_memory_decrypted() ->
 		 * ... -> cpa_flush()-> ... -> __send_ipi_mask_ex() tries to
-		 * use hyperv_pcpu_input_arg as the hypercall input page, which
+		 * use hyperv_pcpu_arg as the hypercall input page, which
 		 * must be a decrypted page in such a VM, but the page is still
 		 * encrypted before set_memory_decrypted() returns. Fix this by
 		 * setting *inputarg after the above set_memory_decrypted(): if
-		 * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns
+		 * hyperv_pcpu_arg is NULL, __send_ipi_mask_ex() returns
 		 * HV_STATUS_INVALID_PARAMETER immediately, and the function
 		 * hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
 		 * which may be slightly slower than the hypercall, but still
@@ -526,9 +504,8 @@ int hv_common_cpu_init(unsigned int cpu)
 int hv_common_cpu_die(unsigned int cpu)
 {
 	/*
-	 * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
-	 * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
-	 * may be used by the Hyper-V vPCI driver in reassigning interrupts
+	 * The hyperv_pcpu_arg memory is not freed when the CPU goes offline as
+	 * it may be used by the Hyper-V vPCI driver in reassigning interrupts
 	 * as part of the offlining process.  The interrupt reassignment
 	 * happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
 	 * called this function.
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0c8a9133cf1a..2f94020b4633 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -66,8 +66,7 @@ extern bool hv_nested;
 extern u64 hv_current_partition_id;
 extern enum hv_partition_type hv_curr_partition_type;
 
-extern void * __percpu *hyperv_pcpu_input_arg;
-extern void * __percpu *hyperv_pcpu_output_arg;
+extern void * __percpu *hyperv_pcpu_arg;
 
 extern u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 extern u64 hv_do_fast_hypercall8(u16 control, u64 input8);
@@ -139,9 +138,6 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
  * Hypercall input and output argument setup
  */
 
-/* Temporary mapping to be removed at the end of the patch series */
-#define hyperv_pcpu_arg hyperv_pcpu_input_arg
-
 /*
  * Allocate one page that is shared between input and output args, which is
  * sufficient for all current hypercalls. If a future hypercall requires
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-03-10  0:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-26 20:06 [PATCH 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
2025-02-26 20:06 ` [PATCH 1/7] x86/hyperv: Fix output argument to hypercall that changes page visibility mhkelley58
2025-02-26 22:59   ` Nuno Das Neves
2025-03-10  0:22     ` Wei Liu
2025-02-26 20:06 ` [PATCH 2/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
2025-02-27  0:14   ` Nuno Das Neves
2025-02-27  1:50     ` Michael Kelley
2025-02-27 20:09       ` Nuno Das Neves
2025-02-26 20:06 ` [PATCH 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
2025-02-27 21:07   ` Nuno Das Neves
2025-02-28  0:50     ` Michael Kelley
2025-02-26 20:06 ` [PATCH 4/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2 mhkelley58
2025-02-26 20:06 ` [PATCH 5/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments mhkelley58
2025-02-26 20:06 ` [PATCH 6/7] PCI: " mhkelley58
2025-02-27 20:29   ` Bjorn Helgaas
2025-03-05 17:24     ` Wei Liu
2025-02-26 20:06 ` [PATCH 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).