[PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args

* [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args
@ 2025-04-15 18:07 mhkelley58
  2025-04-15 18:07 ` [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

This patch set introduces a new way to manage the use of the per-cpu
memory that usually holds the input and output arguments to Hyper-V
hypercalls. Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, uses this memory to set up the input parameters for the
hypercall, reads the output results after hypercall execution, and
re-enables interrupts. The open coding of these steps has led to
inconsistencies, and in some cases, violation of the generic
requirements for the hypercall input and output as described in the
Hyper-V Top Level Functional Spec (TLFS)[1]. This patch set introduces
a new family of inline functions to replace the open coding. The new
functions encapsulate key aspects of the use of per-vCPU memory for
hypercall input and output, and ensure that the TLFS requirements are
met (max size of 1 page each for input and output, no overlap of input
and output, aligned to 8 bytes, etc.).
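
As an illustration only, the requirements listed above can be written
as a small user-space check. This is a sketch and not code from this
series: hvcall_args_ok() and PAGE_BYTES are hypothetical names, and
only the three constraints named in the text are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u	/* stand-in for HV_HYP_PAGE_SIZE */

static_assert(PAGE_BYTES % 8 == 0, "page size must be a multiple of 8");

/* Hypothetical checker for the constraints listed in the text. */
static bool hvcall_args_ok(uintptr_t in, size_t in_len,
			   uintptr_t out, size_t out_len)
{
	/* Input and output are each limited to one page. */
	if (in_len > PAGE_BYTES || out_len > PAGE_BYTES)
		return false;
	/* Both areas must start on an 8-byte boundary. */
	if (in % 8 || out % 8)
		return false;
	/* The two areas must not overlap. */
	if (in_len && out_len && in < out + out_len && out < in + in_len)
		return false;
	return true;
}
```

For example, an input of 64 bytes at 0x1000 with a 16-byte output at
0x1040 passes, while the same output at 0x1020 fails the overlap test.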

With this change, hypercall call sites no longer directly access
"hyperv_pcpu_input_arg" and "hyperv_pcpu_output_arg". Instead, one of
a family of new functions provides the per-cpu memory that a hypercall
call site uses to set up hypercall input and output areas.
Conceptually, there is no longer a difference between the "per-vCPU
input page" and "per-vCPU output page". Only a single per-vCPU page is
allocated, and it is used to provide both hypercall input and output.
All current hypercalls can fit their input and output within that single
page, though the new code allows easy changing to two pages should a
future hypercall require a full page for each of the input and output.
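
A rough model of how the output area might be placed within the shared
space follows. It is a user-space sketch under assumptions: out_offset()
is a hypothetical name, the "pages" argument stands in for the number
of pages allocated per vCPU, and the real code resolves the two-page
case at build time rather than with a runtime assertion.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u			/* stand-in for HV_HYP_PAGE_SIZE */
#define ALIGN8(x) (((x) + 7) & ~(size_t)7)	/* round up to 8 bytes */

/*
 * Hypothetical model: byte offset of the output area within the
 * per-vCPU argument space, given the total input and output sizes.
 */
static size_t out_offset(size_t in_total, size_t out_total, unsigned int pages)
{
	size_t in_aligned = ALIGN8(in_total);
	size_t out_aligned = ALIGN8(out_total);

	/* Each area is limited to one page. */
	assert(in_aligned <= PAGE_BYTES && out_aligned <= PAGE_BYTES);

	/* Both fit in one page: output follows the 8-byte aligned input. */
	if (in_aligned + out_aligned <= PAGE_BYTES)
		return in_aligned;

	/* Otherwise the output needs a second page, which must exist. */
	assert(pages >= 2);
	return PAGE_BYTES;
}
```

With a 20-byte input and an 8-byte output, the output starts at offset
24. A full-page input pushes the output to the start of a second page.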

The new functions always zero the fixed-size portion of the hypercall
input area (but not any array portion -- see below) so that
uninitialized memory isn't inadvertently passed to the hypercall.
Current open-coded hypercall call sites are inconsistent on this point,
and use of the new functions addresses that inconsistency. The output
area is not zero'ed by the new code as it is Hyper-V's responsibility
to provide legal output.

When the input or output (or both) contain an array, the new code
calculates and returns how many array entries fit within the per-cpu
memory page, which is effectively the "batch size" for the hypercall
processing multiple entries. This batch size can then be used in the
hypercall control word to specify the repetition count. This
calculation of the batch size replaces current open coding of the
batch size, which is prone to errors. Note that the array portion of
the input area is *not* zero'ed. The arrays are almost always 64-bit
GPAs or something similar, and zero'ing that much memory seems
wasteful at runtime when it will all be overwritten. The hypercall
call site is responsible for ensuring that no part of the array is
left uninitialized (just as with current code).
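
The batch size arithmetic can be sketched in user space as shown here.
The structure layout and the names batch_size() and calls_needed() are
hypothetical and chosen only to make the numbers concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u	/* stand-in for HV_HYP_PAGE_SIZE */

/* Hypothetical input: 16-byte fixed header followed by 64-bit GPAs. */
struct demo_gpa_input {
	uint64_t partition_id;
	uint64_t flags;
	uint64_t gpa_list[];	/* array portion, filled by the call site */
};

/* Number of array entries that fit after the fixed part in "space" bytes. */
static unsigned int batch_size(size_t space, size_t fixed, size_t entry)
{
	assert(entry != 0 && fixed <= space);
	return (unsigned int)((space - fixed) / entry);
}

/* Number of hypercalls needed to process "total" entries in batches. */
static unsigned int calls_needed(unsigned int total, unsigned int batch)
{
	assert(batch != 0);
	return (total + batch - 1) / batch;
}
```

With a full page available, the hypothetical layout above yields
(4096 - 16) / 8 = 510 entries per hypercall, so 1200 entries would take
three hypercalls. If only half the page were available to the input,
the same layout would yield 254 entries.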

The new family of functions is realized as a single inline function
that handles the most complex case, which is a hypercall with input
and output, both of which contain arrays. Simpler cases are mapped to
this most complex case with #define wrappers that provide zero or NULL
for some arguments. Several of the arguments to this new function
must be compile-time constants generated by "sizeof()" expressions.
As such, most of the code in the new function is evaluated by the
compiler, with the result that the runtime code paths are no longer
than with the current open coding. An exception is the new code
generated to zero the fixed-size portion of the input area in cases
where it was not previously done.
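
The "one general function plus #define wrappers" structure can be
modeled in user space roughly as follows. This is a simplified sketch,
not the code in Patch 1: the demo_* names are hypothetical, a static
buffer stands in for the per-vCPU page, and the size checks happen at
run time instead of build time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096u			/* stand-in for HV_HYP_PAGE_SIZE */
#define ALIGN8(x) (((x) + 7) & ~(size_t)7)	/* round up to 8 bytes */

/* Stand-in for the per-vCPU argument page; the kernel uses per-cpu memory. */
static _Alignas(8) unsigned char demo_page[PAGE_BYTES];

/*
 * Simplified model of the general case. NULL for in/out means "no such
 * area"; an entry size of zero means "no array". Returns the batch size.
 */
static size_t demo_inout_array(void *in, size_t in_size, size_t in_entry,
			       void *out, size_t out_size, size_t out_entry)
{
	size_t space = PAGE_BYTES, in_n = 0, out_n = 0, n;

	/* Split the page when both sides carry an array. */
	if (in_entry && out_entry)
		space /= 2;
	else if (in_entry)
		space -= ALIGN8(out_size);
	else if (out_entry)
		space -= ALIGN8(in_size);

	if (in_entry)
		in_n = (space - in_size) / in_entry;
	if (out_entry)
		out_n = (space - out_size) / out_entry;

	/* With two arrays use the smaller count, else whichever is set. */
	n = (in_n && out_n) ? (in_n < out_n ? in_n : out_n) : (in_n | out_n);

	if (in) {
		*(void **)in = demo_page;
		/* Zero the fixed-size portion only, not the array. */
		memset(demo_page, 0, ALIGN8(in_size));
	}
	if (out) {
		size_t off = ALIGN8(in_size + in_entry * n);

		assert(off + ALIGN8(out_size + out_entry * n) <= PAGE_BYTES);
		*(void **)out = demo_page + off;
	}
	return n;
}

/* Simpler cases pass zero or NULL for the parts they do not use. */
#define demo_in(in, in_size) \
	demo_inout_array(in, in_size, 0, NULL, 0, 0)
#define demo_inout(in, in_size, out, out_size) \
	demo_inout_array(in, in_size, 0, out, out_size, 0)
#define demo_in_array(in, in_size, in_entry) \
	demo_inout_array(in, in_size, in_entry, NULL, 0, 0)
```

Because every size argument at a call site is a sizeof() expression, a
compiler that inlines the general function can fold the branches and
divisions into constants, leaving little more than the pointer
assignments and the memset().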

Use of the new function typically (but not always) saves a few lines
of code at each hypercall call site. This is traded off against the
lines of code added for the new functions. With code currently
upstream, the net is an addition of about 20 lines of code and comments.

A couple of hypercall call sites have requirements that are not fully
handled by the new function. These still require some manual
open-coded adjustment or open-coded batch size calculations -- see the
individual patches in this series. Suggestions on how to do better
are welcome.

The patches in the series do the following:

Patch 1: Introduce the new family of functions for assigning hypercall
         input and output arguments.

Patches 2 to 6: Change existing hypercall call sites to use one of the new
         functions. In some cases, tweaks to the hypercall argument data
         structures are necessary, but these tweaks make the data
         structures more consistent with the overall pattern. These
         5 patches are independent of each other, and can go in any
         order. The breakup into 5 patches is for ease of review.

Patch 7: Update the name of the variable used to hold the per-vCPU memory
         used for hypercall arguments. Remove code for managing the
         per-vCPU output page.

Patch 6 is new in v3 of the patch set. It updates the new hypercall
call sites added as part of the mshv code in the 6.15-rc1 kernel.

The new code compiles and runs successfully on x86 and arm64. However,
basic smoke tests cover only a limited number of hypercall call sites
that have been modified. I don't have the hardware or Hyper-V
configurations needed to test running in the Hyper-V root partition
or running in a VTL other than VTL 0. The related hypercall call sites
still need to be tested to make sure I didn't break anything. Hopefully
someone with the necessary configurations and Hyper-V versions can
help with that testing.

For gcc 9.4.0, I've looked at the generated code for a couple of
hypercall call sites on both x86 and arm64 to ensure that it boils
down to the equivalent of the current open coding. I have not looked
at the generated code for later gcc versions or for Clang/LLVM, but
there's no reason to expect something worse as the code isn't doing
anything tricky.

This patch set is built against linux-next-20250411.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs

Michael Kelley (7):
  Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
  x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2
  Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments
  PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
  Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments for mshv
    code
  Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg

 arch/x86/hyperv/hv_apic.c           |  10 +--
 arch/x86/hyperv/hv_init.c           |  12 ++-
 arch/x86/hyperv/hv_vtl.c            |   9 +--
 arch/x86/hyperv/irqdomain.c         |  17 ++--
 arch/x86/hyperv/ivm.c               |  18 ++---
 arch/x86/hyperv/mmu.c               |  19 ++---
 arch/x86/hyperv/nested.c            |  14 ++--
 drivers/hv/hv.c                     |   6 +-
 drivers/hv/hv_balloon.c             |   8 +-
 drivers/hv/hv_common.c              |  57 ++++---------
 drivers/hv/hv_proc.c                |  23 +++---
 drivers/hv/mshv_common.c            |  31 +++----
 drivers/hv/mshv_root_hv_call.c      | 121 +++++++++++-----------------
 drivers/hv/mshv_root_main.c         |   5 +-
 drivers/pci/controller/pci-hyperv.c |  18 ++---
 include/asm-generic/mshyperv.h      | 106 +++++++++++++++++++++++-
 include/hyperv/hvgdk_mini.h         |   4 +-
 17 files changed, 250 insertions(+), 228 deletions(-)

-- 
2.25.1


* [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-04-21 20:41   ` Easwar Hariharan
  2025-08-21  0:31   ` Mukesh R
  2025-04-15 18:07 ` [PATCH v3 2/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, uses this memory to set up the input parameters for the
hypercall, reads the output results after hypercall execution, and
re-enables interrupts. The open coding of these steps leads to
inconsistencies, and in some cases, violation of the generic
requirements for the hypercall input and output as described in the
Hyper-V Top Level Functional Spec (TLFS)[1].

To reduce these kinds of problems, introduce a family of inline
functions to replace the open coding. The functions provide a new way
to manage the use of this per-vCPU memory, which usually holds the input
and output arguments to Hyper-V hypercalls. The functions encapsulate
key aspects of the usage and ensure that the TLFS requirements are
met (max size of 1 page each for input and output, no overlap of
input and output, aligned to 8 bytes, etc.). Conceptually, there
is no longer a difference between the "per-vCPU input page" and
"per-vCPU output page". Only a single per-vCPU page is allocated, and
it provides both hypercall input and output memory. All current
hypercalls can fit their input and output within that single page,
though the new code allows easy changing to two pages should a future
hypercall require a full page for each of the input and output.

The new functions always zero the fixed-size portion of the hypercall
input area so that uninitialized memory is not inadvertently passed
to the hypercall. Current open-coded hypercall call sites are
inconsistent on this point, and use of the new functions addresses
that inconsistency. The output area is not zero'ed by the new code
as it is Hyper-V's responsibility to provide legal output.

When the input or output (or both) contain an array, the new functions
calculate and return how many array entries fit within the per-vCPU
memory page, which is effectively the "batch size" for the hypercall
processing multiple entries. This batch size can then be used in the
hypercall control word to specify the repetition count. This
calculation of the batch size replaces current open coding of the
batch size, which is prone to errors. Note that the array portion of
the input area is *not* zero'ed. The arrays are almost always 64-bit
GPAs or something similar, and zero'ing that much memory seems
wasteful at runtime when it will all be overwritten. The hypercall
call site is responsible for ensuring that no part of the array is
left uninitialized (just as with current code).

The new functions are realized as a single inline function that
handles the most complex case, which is a hypercall with input
and output, both of which contain arrays. Simpler cases are mapped to
this most complex case with #define wrappers that provide zero or NULL
for some arguments. Several of the arguments to this new function
must be compile-time constants generated by "sizeof()"
expressions. As such, most of the code in the new function can be
evaluated by the compiler, with the result that the code paths are
no longer than with the current open coding. The one exception is
new code generated to zero the fixed-size portion of the input area
in cases where it is not currently done.

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---

Notes:
    Changes in v3:
    * Added wrapper #define hv_hvcall_in_batch_size() to get the batch size
      without setting up hypercall input/output parameters. This call can be
      used when the batch size is needed for validation checks or memory
      allocations prior to disabling interrupts.

    Changes in v2:
    * Added comment that hv_hvcall_inout_array() should always be called with
      interrupts disabled because it is returning pointers to per-cpu memory
      [Nuno Das Neves]

 include/asm-generic/mshyperv.h | 106 +++++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)

diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index ccccb1cbf7df..504c44b1ab9e 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -151,6 +151,112 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
 	return status;
 }

+/*
+ * Hypercall input and output argument setup
+ */
+
+/* Temporary mapping to be removed at the end of the patch series */
+#define hyperv_pcpu_arg hyperv_pcpu_input_arg
+
+/*
+ * Allocate one page that is shared between input and output args, which is
+ * sufficient for all current hypercalls. If a future hypercall requires
+ * more space, change this value to "2" and everything will work.
+ */
+#define HV_HVCALL_ARG_PAGES 1
+
+/*
+ * Allocate space for hypercall input and output arguments from the
+ * pre-allocated per-cpu hyperv_pcpu_arg page(s). A NULL value for the input
+ * or output indicates to allocate no space for that argument. For input and
+ * for output, specify the size of the fixed portion, and the size of an
+ * element in a variable size array. A zero value for entry_size indicates
+ * there is no array. The fixed size space for the input is zero'ed.
+ *
+ * When variable size arrays are present, the function returns the number of
+ * elements (i.e., the batch size) that fit in the available space.
+ *
+ * The four "size" arguments must be constants so the compiler can do most of
+ * the calculations. Then the generated inline code is no larger than if open
+ * coding the access to the hyperv_pcpu_arg and doing memset() on the input.
+ *
+ * This function must be called with interrupts disabled so the thread is not
+ * rescheduled onto another vCPU while accessing the per-cpu args page.
+ */
+static inline int hv_hvcall_inout_array(void *input, u32 in_size, u32 in_entry_size,
+					void *output, u32 out_size, u32 out_entry_size)
+{
+	u32 in_batch_count = 0, out_batch_count = 0, batch_count;
+	u32 in_total_size, out_total_size, offset;
+	u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
+	void *space;
+
+	/*
+	 * If input and output have arrays, allocate half the space to input
+	 * and half to output. If only input has an array, the array can use
+	 * all the space except for the fixed size output (but not to exceed
+	 * one page), and vice versa.
+	 */
+	if (in_entry_size && out_entry_size)
+		batch_space = batch_space / 2;
+	else if (in_entry_size)
+		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
+	else if (out_entry_size)
+		batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
+
+	if (in_entry_size)
+		in_batch_count = (batch_space - in_size) / in_entry_size;
+	if (out_entry_size)
+		out_batch_count = (batch_space - out_size) / out_entry_size;
+
+	/*
+	 * If input and output have arrays, use the smaller of the two batch
+	 * counts, in case they are different. If only one has an array, use
+	 * that batch count. batch_count will be zero if neither has an array.
+	 */
+	if (in_batch_count && out_batch_count)
+		batch_count = min(in_batch_count, out_batch_count);
+	else
+		batch_count = in_batch_count | out_batch_count;
+
+	in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
+	out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
+
+	space = *this_cpu_ptr(hyperv_pcpu_arg);
+	if (input) {
+		*(void **)input = space;
+		if (space)
+			/* Zero the fixed size portion, not any array portion */
+			memset(space, 0, ALIGN(in_size, 8));
+	}
+
+	if (output) {
+		if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
+			offset = in_total_size;
+		} else {
+			offset = HV_HYP_PAGE_SIZE;
+			/* Need more than 1 page, but only 1 was allocated */
+			BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
+		}
+		*(void **)output = space + offset;
+	}
+
+	return batch_count;
+}
+
+/* Wrappers for some of the simpler cases with only input, or with no arrays */
+#define hv_hvcall_in(input, in_size) \
+	hv_hvcall_inout_array(input, in_size, 0, NULL, 0, 0)
+
+#define hv_hvcall_inout(input, in_size, output, out_size) \
+	hv_hvcall_inout_array(input, in_size, 0, output, out_size, 0)
+
+#define hv_hvcall_in_array(input, in_size, in_entry_size) \
+	hv_hvcall_inout_array(input, in_size, in_entry_size, NULL, 0, 0)
+
+#define hv_hvcall_in_batch_size(in_size, in_entry_size) \
+	hv_hvcall_inout_array(NULL, in_size, in_entry_size, NULL, 0, 0)
+
 /* Generate the guest OS identifier as described in the Hyper-V TLFS */
 static inline u64 hv_generate_guest_id(u64 kernel_version)
 {
-- 
2.25.1


* [PATCH v3 2/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
  2025-04-15 18:07 ` [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-04-15 18:07 ` [PATCH v3 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2 mhkelley58
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---

Notes:
    Changes in v2:
    * Fixed get_vtl() and hv_vtl_apicid_to_vp_id() to properly treat the input
      and output arguments as arrays [Nuno Das Neves]
    * Enhanced __send_ipi_mask_ex() and hv_map_interrupt() to check the number
      of computed banks in the hv_vpset against the batch_size. Since an
      hv_vpset currently represents a maximum of 4096 CPUs, the hv_vpset size
      does not exceed 512 bytes and there should always be sufficient space. But
      do the check just in case something changes. [Nuno Das Neves]

 arch/x86/hyperv/hv_apic.c   | 10 ++++------
 arch/x86/hyperv/hv_init.c   |  6 ++----
 arch/x86/hyperv/hv_vtl.c    |  9 +++------
 arch/x86/hyperv/irqdomain.c | 17 ++++++++++-------
 4 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 6d91ac5f9836..cd794baaa636 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -108,21 +108,19 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
 {
 	struct hv_send_ipi_ex *ipi_arg;
 	unsigned long flags;
-	int nr_bank = 0;
+	int batch_size, nr_bank = 0;
 	u64 status = HV_STATUS_INVALID_PARAMETER;
 
 	if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
 		return false;
 
 	local_irq_save(flags);
-	ipi_arg = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	batch_size = hv_hvcall_in_array(&ipi_arg, sizeof(*ipi_arg),
+					sizeof(ipi_arg->vp_set.bank_contents[0]));
 	if (unlikely(!ipi_arg))
 		goto ipi_mask_ex_done;
 
 	ipi_arg->vector = vector;
-	ipi_arg->reserved = 0;
-	ipi_arg->vp_set.valid_bank_mask = 0;
 
 	/*
 	 * Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
@@ -139,7 +137,7 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
 		 * represented in VP_SET. Return an error and fall back to
 		 * native (architectural) method of sending IPIs.
 		 */
-		if (nr_bank <= 0)
+		if (nr_bank <= 0 || nr_bank > batch_size)
 			goto ipi_mask_ex_done;
 	} else {
 		ipi_arg->vp_set.format = HV_GENERIC_SET_ALL;
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index ddeb40930bc8..cc843905c23a 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -400,13 +400,11 @@ static u8 __init get_vtl(void)
 	u64 ret;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
 
-	memset(input, 0, struct_size(input, names, 1));
+	hv_hvcall_inout_array(&input, sizeof(*input), sizeof(input->names[0]),
+			      &output, sizeof(*output), sizeof(output->values[0]));
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->vp_index = HV_VP_INDEX_SELF;
-	input->input_vtl.as_uint8 = 0;
 	input->names[0] = HV_REGISTER_VSM_VP_STATUS;
 
 	ret = hv_do_hypercall(control, input, output);
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 13242ed8ff16..ccd9c24722f9 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -125,8 +125,7 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
 
 	local_irq_save(irq_flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->vp_index = target_vp_index;
@@ -216,13 +215,11 @@ static int hv_vtl_apicid_to_vp_id(u32 apic_id)
 
 	local_irq_save(irq_flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_inout_array(&input, sizeof(*input), sizeof(input->apic_ids[0]),
+			      &output, 0, sizeof(*output));
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->apic_ids[0] = apic_id;
 
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
 	control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_ID_FROM_APIC_ID;
 	status = hv_do_hypercall(control, input, output);
 	ret = output[0];
diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 31f0d29cbc5e..82c4e84541ab 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -20,15 +20,15 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 	struct hv_device_interrupt_descriptor *intr_desc;
 	unsigned long flags;
 	u64 status;
-	int nr_bank, var_size;
+	int batch_size, nr_bank, var_size;
 
 	local_irq_save(flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+	batch_size = hv_hvcall_inout_array(&input, sizeof(*input),
+			sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
+			&output, sizeof(*output), 0);
 
 	intr_desc = &input->interrupt_descriptor;
-	memset(input, 0, sizeof(*input));
 	input->partition_id = hv_current_partition_id;
 	input->device_id = device_id.as_uint64;
 	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
@@ -40,7 +40,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 	else
 		intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
 
-	intr_desc->target.vp_set.valid_bank_mask = 0;
 	intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
 	nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
 	if (nr_bank < 0) {
@@ -48,6 +47,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 		pr_err("%s: unable to generate VP set\n", __func__);
 		return EINVAL;
 	}
+	if (nr_bank > batch_size) {
+		local_irq_restore(flags);
+		pr_err("%s: nr_bank too large\n", __func__);
+		return EINVAL;
+	}
 	intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
 
 	/*
@@ -77,9 +81,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
 	u64 status;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
 
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 	intr_entry = &input->interrupt_entry;
 	input->partition_id = hv_current_partition_id;
 	input->device_id = id;
-- 
2.25.1



* [PATCH v3 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
  2025-04-15 18:07 ` [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
  2025-04-15 18:07 ` [PATCH v3 2/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-04-15 18:07 ` [PATCH v3 4/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments mhkelley58
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.

For hv_mark_gpa_visibility(), use the computed batch_size instead
of HV_MAX_MODIFY_GPA_REP_COUNT. Also update the associated gpa_page_list[]
field to have zero size, which is more consistent with other array
arguments to hypercalls. Due to the interaction with the calling
hv_vtom_set_host_visibility(), HV_MAX_MODIFY_GPA_REP_COUNT cannot be
completely eliminated without some further restructuring, but that's
for another patch set.

Similarly, for the nested flush functions, update the gpa_list[] to
have zero size. Again, separate restructuring would be required to
completely eliminate the need for HV_MAX_FLUSH_REP_COUNT.

Finally, hyperv_flush_tlb_others_ex() requires special handling
because the input consists of two arrays -- one for the hv_vp_set and
another for the gva list. The batch_size computed by hv_hvcall_in_array()
is adjusted to account for the number of entries in the hv_vp_set.
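
The adjustment for a second array can be sketched as below. This is a
user-space illustration with a hypothetical helper name, adjust_batch().
Unlike the patch, it rounds the displaced space up to whole entries,
which gives the same result when both arrays have 8-byte entries, as
they do here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: reduce the batch size computed for the second
 * array by the space that nr_first entries of the first array occupy.
 * Returns 0 when no room is left for even one entry.
 */
static size_t adjust_batch(size_t batch, size_t nr_first,
			   size_t first_entry, size_t second_entry)
{
	size_t displaced;

	assert(second_entry != 0);

	/* Second-array entries displaced by the first array, rounded up. */
	displaced = (nr_first * first_entry + second_entry - 1) / second_entry;

	return displaced < batch ? batch - displaced : 0;
}
```

For instance, starting from an assumed batch size of 500 and a full
hv_vpset of 64 banks of 8 bytes each (512 bytes), 436 gva entries
remain. A result of zero corresponds to the error case that the new
check in hyperv_flush_tlb_others_ex() guards against.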

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---

Notes:
    Changes in v2:
    * In hyperv_flush_tlb_others_ex(), added check of the adjusted
      max_gvas to make sure it doesn't go to zero or negative, which would
      happen if there is insufficient space to hold the hv_vpset and have
      at least one entry in the gva list. Since an hv_vpset currently
      represents a maximum of 4096 CPUs, the hv_vpset size does not exceed
      512 bytes and there should always be sufficient space. But do the
      check just in case something changes. [Nuno Das Neves]

 arch/x86/hyperv/ivm.c       | 18 +++++++++---------
 arch/x86/hyperv/mmu.c       | 19 +++++--------------
 arch/x86/hyperv/nested.c    | 14 +++++---------
 include/hyperv/hvgdk_mini.h |  4 ++--
 4 files changed, 21 insertions(+), 34 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 77bf05f06b9e..f99b7f4482d3 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -465,30 +465,30 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 {
 	struct hv_gpa_range_for_visibility *input;
 	u64 hv_status;
+	int batch_size;
 	unsigned long flags;
 
 	/* no-op if partition isolation is not enabled */
 	if (!hv_is_isolation_supported())
 		return 0;
 
-	if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
-		pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
-			HV_MAX_MODIFY_GPA_REP_COUNT);
+	local_irq_save(flags);
+	batch_size = hv_hvcall_in_array(&input, sizeof(*input),
+					sizeof(input->gpa_page_list[0]));
+	if (unlikely(!input)) {
+		local_irq_restore(flags);
 		return -EINVAL;
 	}
 
-	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
-	if (unlikely(!input)) {
+	if (count > batch_size) {
+		pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
+		       batch_size);
 		local_irq_restore(flags);
 		return -EINVAL;
 	}
 
 	input->partition_id = HV_PARTITION_ID_SELF;
 	input->host_visibility = visibility;
-	input->reserved0 = 0;
-	input->reserved1 = 0;
 	memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
 	hv_status = hv_do_rep_hypercall(
 			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index cfcb60468b01..7eaa34ce2f5f 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -72,7 +72,7 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 
 	local_irq_save(flags);
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	max_gvas = hv_hvcall_in_array(&flush, sizeof(*flush), sizeof(flush->gva_list[0]));
 
 	if (unlikely(!flush)) {
 		local_irq_restore(flags);
@@ -86,13 +86,10 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 		 */
 		flush->address_space = virt_to_phys(info->mm->pgd);
 		flush->address_space &= CR3_ADDR_MASK;
-		flush->flags = 0;
 	} else {
-		flush->address_space = 0;
 		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
 	}
 
-	flush->processor_mask = 0;
 	if (cpumask_equal(cpus, cpu_present_mask)) {
 		flush->flags |= HV_FLUSH_ALL_PROCESSORS;
 	} else {
@@ -139,8 +136,6 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
 	 * We can flush not more than max_gvas with one hypercall. Flush the
 	 * whole address space if we were asked to do more.
 	 */
-	max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
-
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
 		status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
@@ -179,7 +174,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
 		return HV_STATUS_INVALID_PARAMETER;
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	max_gvas = hv_hvcall_in_array(&flush, sizeof(*flush), sizeof(flush->gva_list[0]));
 
 	if (info->mm) {
 		/*
@@ -188,14 +183,10 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 		 */
 		flush->address_space = virt_to_phys(info->mm->pgd);
 		flush->address_space &= CR3_ADDR_MASK;
-		flush->flags = 0;
 	} else {
-		flush->address_space = 0;
 		flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
 	}
 
-	flush->hv_vp_set.valid_bank_mask = 0;
-
 	flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
 	nr_bank = cpumask_to_vpset_skip(&flush->hv_vp_set, cpus,
 			info->freed_tables ? NULL : cpu_is_lazy);
@@ -210,10 +201,10 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	 * of flush->hv_vp_set as part of the fixed size input header.
 	 * So the variable input header size is equal to nr_bank.
 	 */
-	max_gvas =
-		(PAGE_SIZE - sizeof(*flush) - nr_bank *
-		 sizeof(flush->hv_vp_set.bank_contents[0])) /
+	max_gvas -= (nr_bank * sizeof(flush->hv_vp_set.bank_contents[0])) /
 		sizeof(flush->gva_list[0]);
+	if (max_gvas <= 0)
+		return HV_STATUS_INVALID_PARAMETER;
 
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index 1083dc8646f9..88c39ac8d0aa 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -29,15 +29,13 @@ int hyperv_flush_guest_mapping(u64 as)
 
 	local_irq_save(flags);
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	hv_hvcall_in(&flush, sizeof(*flush));
 	if (unlikely(!flush)) {
 		local_irq_restore(flags);
 		goto fault;
 	}
 
 	flush->address_space = as;
-	flush->flags = 0;
 
 	status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
 				 flush, NULL);
@@ -90,25 +88,23 @@ int hyperv_flush_guest_mapping_range(u64 as,
 	u64 status;
 	unsigned long flags;
 	int ret = -ENOTSUPP;
-	int gpa_n = 0;
+	int batch_size, gpa_n = 0;
 
 	if (!hv_hypercall_pg || !fill_flush_list_func)
 		goto fault;
 
 	local_irq_save(flags);
 
-	flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	batch_size = hv_hvcall_in_array(&flush, sizeof(*flush),
+					sizeof(flush->gpa_list[0]));
 	if (unlikely(!flush)) {
 		local_irq_restore(flags);
 		goto fault;
 	}
 
 	flush->address_space = as;
-	flush->flags = 0;
-
 	gpa_n = fill_flush_list_func(flush, data);
-	if (gpa_n < 0) {
+	if (gpa_n < 0 || gpa_n > batch_size) {
 		local_irq_restore(flags);
 		goto fault;
 	}
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index abf0bd76e370..5a89120ba1a6 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -557,7 +557,7 @@ union hv_gpa_page_range {
 struct hv_guest_mapping_flush_list {
 	u64 address_space;
 	u64 flags;
-	union hv_gpa_page_range gpa_list[HV_MAX_FLUSH_REP_COUNT];
+	union hv_gpa_page_range gpa_list[];
 };
 
 struct hv_tlb_flush {	 /* HV_INPUT_FLUSH_VIRTUAL_ADDRESS_LIST */
@@ -1244,7 +1244,7 @@ struct hv_gpa_range_for_visibility {
 	u32 host_visibility : 2;
 	u32 reserved0 : 30;
 	u32 reserved1;
-	u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+	u64 gpa_page_list[];
 } __packed;
 
 #if defined(CONFIG_X86)
-- 
2.25.1



* [PATCH v3 4/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (2 preceding siblings ...)
  2025-04-15 18:07 ` [PATCH v3 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2 mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-04-15 18:07 ` [PATCH v3 5/7] PCI: " mhkelley58
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant zero'ing of
input fields.

In hv_post_message(), the payload area is treated as an array to
avoid wasting cycles on zero'ing it and then overwriting with
memcpy().

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---

Notes:
    Changes in v3:
    * Removed change to definition of struct hv_input_post_message so the
      'payload' remains a fixed size array. Adjust hv_post_message() so
      that the 'payload' array is not zero'ed. [Nuno Das Neves]
    * Added check of the batch size in hv_free_page_report(). [Nuno Das Neves]
    * In hv_call_deposit_pages(), use the new hv_hvcall_in_batch_size() to
      get the batch size at the start of the function, and check the
      'num_pages' input parameter against that batch size instead of against
      a separately defined constant. Also use the batch size to compute the
      size of the memory allocation. [Nuno Das Neves]
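
The point about not zero'ing the 'payload' can be illustrated with a
small userspace sketch. It assumes a hypothetical demo_in_array() helper
and a made-up struct demo_msg; neither is the kernel code, and the real
hv_hvcall_in_array() may differ in detail. Passing offsetof() of the
array member as the fixed size means only the header is cleared, and the
array is left for the caller to fill.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical message layout: a 16-byte fixed header, then a payload. */
struct demo_msg {
	uint32_t connectionid;
	uint32_t reserved;
	uint32_t message_type;
	uint32_t payload_size;
	uint64_t payload[30];
};

/*
 * Zero only the first fixed_size bytes of the page and return how many
 * entries of entry_size fit in the remainder of the page.
 * Illustration only; not the kernel implementation.
 */
static size_t demo_in_array(void *page, size_t fixed_size, size_t entry_size)
{
	assert(fixed_size <= DEMO_PAGE_SIZE);
	memset(page, 0, fixed_size);
	if (entry_size == 0)
		return 0;
	return (DEMO_PAGE_SIZE - fixed_size) / entry_size;
}

/* Returns 1 if the header was zeroed and the payload area was left alone. */
static int demo_header_only_zeroed(void)
{
	static unsigned char page[DEMO_PAGE_SIZE];
	size_t hdr = offsetof(struct demo_msg, payload);
	size_t i;

	memset(page, 0xff, sizeof(page));	/* simulate stale contents */
	demo_in_array(page, hdr, sizeof(uint64_t));

	for (i = 0; i < hdr; i++)
		if (page[i] != 0)
			return 0;
	return page[hdr] == 0xff;	/* first payload byte is untouched */
}
```

With this layout the header is 16 bytes, so the helper clears 16 bytes
instead of the whole structure, and the payload is written exactly once
by the later memcpy().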

 drivers/hv/hv.c         |  4 +++-
 drivers/hv/hv_balloon.c |  8 ++++----
 drivers/hv/hv_common.c  |  2 +-
 drivers/hv/hv_proc.c    | 23 ++++++++++-------------
 4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 308c8f279df8..3e7d681ff2b7 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -66,7 +66,9 @@ int hv_post_message(union hv_connection_id connection_id,
 	if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
 		aligned_msg = this_cpu_ptr(hv_context.cpu_context)->post_msg_page;
 	else
-		aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		hv_hvcall_in_array(&aligned_msg,
+				   offsetof(typeof(*aligned_msg), payload),
+				   sizeof(aligned_msg->payload[0]));
 
 	aligned_msg->connectionid = connection_id;
 	aligned_msg->reserved = 0;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 2b4080e51f97..801c03fe10f8 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1577,21 +1577,21 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
 {
 	unsigned long flags;
 	struct hv_memory_hint *hint;
-	int i, order;
+	int i, order, batch_size;
 	u64 status;
 	struct scatterlist *sg;
 
-	WARN_ON_ONCE(nents > HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES);
 	WARN_ON_ONCE(sgl->length < (HV_HYP_PAGE_SIZE << page_reporting_order));
 	local_irq_save(flags);
-	hint = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+	batch_size = hv_hvcall_in_array(&hint, sizeof(*hint), sizeof(hint->ranges[0]));
 	if (!hint) {
 		local_irq_restore(flags);
 		return -ENOSPC;
 	}
+	WARN_ON_ONCE(nents > batch_size);
 
 	hint->heat_type = HV_EXTMEM_HEAT_HINT_COLD_DISCARD;
-	hint->reserved = 0;
 	for_each_sg(sgl, sg, nents, i) {
 		union hv_gpa_page_range *range;
 
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index a7d7494feaca..895448954f37 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -306,7 +306,7 @@ void __init hv_get_partition_id(void)
 	u64 status, pt_id;
 
 	local_irq_save(flags);
-	output = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	hv_hvcall_inout(NULL, 0, &output, sizeof(*output));
 	status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output);
 	pt_id = output->partition_id;
 	local_irq_restore(flags);
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 7d7ecb6f6137..e85d9dd08a9d 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -8,12 +8,6 @@
 #include <linux/minmax.h>
 #include <asm/mshyperv.h>
 
-/*
- * See struct hv_deposit_memory. The first u64 is partition ID, the rest
- * are GPAs.
- */
-#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
-
 /* Deposits exact number of pages. Must be called with interrupts enabled.  */
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 {
@@ -24,11 +18,13 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 	int order;
 	u64 status;
 	int ret;
-	u64 base_pfn;
+	u64 base_pfn, batch_size;
 	struct hv_deposit_memory *input_page;
 	unsigned long flags;
 
-	if (num_pages > HV_DEPOSIT_MAX)
+	batch_size = hv_hvcall_in_batch_size(sizeof(*input_page),
+			   sizeof(input_page->gpa_page_list[0]));
+	if (num_pages > batch_size)
 		return -E2BIG;
 	if (!num_pages)
 		return 0;
@@ -39,7 +35,7 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 		return -ENOMEM;
 	pages = page_address(page);
 
-	counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
+	counts = kcalloc(batch_size, sizeof(int), GFP_KERNEL);
 	if (!counts) {
 		free_page((unsigned long)pages);
 		return -ENOMEM;
@@ -73,7 +69,9 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 
 	local_irq_save(flags);
 
-	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	/* Batch size is checked at the start of function; no need to repeat */
+	hv_hvcall_in_array(&input_page, sizeof(*input_page),
+			   sizeof(input_page->gpa_page_list[0]));
 
 	input_page->partition_id = partition_id;
 
@@ -125,9 +123,8 @@ int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
 	do {
 		local_irq_save(flags);
 
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
 		/* We don't do anything with the output right now */
-		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+		hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 
 		input->lp_index = lp_index;
 		input->apic_id = apic_id;
@@ -168,7 +165,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
 	do {
 		local_irq_save(irq_flags);
 
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		hv_hvcall_in(&input, sizeof(*input));
 
 		input->partition_id = partition_id;
 		input->vp_index = vp_index;
-- 
2.25.1



* [PATCH v3 5/7] PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (3 preceding siblings ...)
  2025-04-15 18:07 ` [PATCH v3 4/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-04-15 18:07 ` [PATCH v3 6/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments for mshv code mhkelley58
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset().

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---

Notes:
    Changes in v3:
    * Removed change to definition of struct hv_mmio_write_input so it remains
      consistent with original Hyper-V definitions. Adjusted argument to
      hv_hvcall_in_array() accordingly so that the 64 byte 'data' array is
      not zero'ed. [Nuno Das Neves]
    
    Changes in v2:
    * In hv_arch_irq_unmask(), added check of the number of computed banks
      in the hv_vpset against the batch_size. Since an hv_vpset currently
      represents a maximum of 4096 CPUs, the hv_vpset size does not exceed
      512 bytes and there should always be sufficient space. But do the
      check just in case something changes. [Nuno Das Neves]
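
The 512 byte figure in the v2 note can be checked with a little
arithmetic. The sketch below assumes each bank is a 64-bit bitmap
covering 64 CPUs, which is consistent with the numbers quoted above; the
demo_* names are made up for illustration and are not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_VPS_PER_BANK 64u	/* assumption: one bit per VP in a u64 bank */

/* Number of banks needed to describe nr_cpus processors. */
static size_t demo_vpset_banks(unsigned int nr_cpus)
{
	return (nr_cpus + DEMO_VPS_PER_BANK - 1) / DEMO_VPS_PER_BANK;
}

/* Bytes of bank contents needed for nr_cpus processors. */
static size_t demo_vpset_bank_bytes(unsigned int nr_cpus)
{
	return demo_vpset_banks(nr_cpus) * sizeof(uint64_t);
}
```

For 4096 CPUs this gives 64 banks of 8 bytes each, i.e. 512 bytes, which
is well under the space left in a 4 KiB page after the fixed header.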

 drivers/pci/controller/pci-hyperv.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index e1eaa24559a2..32cceceff062 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -622,7 +622,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 	struct pci_dev *pdev;
 	unsigned long flags;
 	u32 var_size = 0;
-	int cpu, nr_bank;
+	int cpu, nr_bank, batch_size;
 	u64 res;
 
 	dest = irq_data_get_effective_affinity_mask(data);
@@ -638,8 +638,8 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 
 	local_irq_save(flags);
 
-	params = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(params, 0, sizeof(*params));
+	batch_size = hv_hvcall_in_array(&params, sizeof(*params),
+					sizeof(params->int_target.vp_set.bank_contents[0]));
 	params->partition_id = HV_PARTITION_ID_SELF;
 	params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
 	params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff;
@@ -671,7 +671,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 		nr_bank = cpumask_to_vpset(&params->int_target.vp_set, tmp);
 		free_cpumask_var(tmp);
 
-		if (nr_bank <= 0) {
+		if (nr_bank <= 0 || nr_bank > batch_size) {
 			res = 1;
 			goto out;
 		}
@@ -1034,11 +1034,9 @@ static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
 
 	/*
 	 * Must be called with interrupts disabled so it is safe
-	 * to use the per-cpu input argument page.  Use it for
-	 * both input and output.
+	 * to use the per-cpu argument page.
 	 */
-	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
+	hv_hvcall_inout(&in, sizeof(*in), &out, sizeof(*out));
 	in->gpa = gpa;
 	in->size = size;
 
@@ -1067,9 +1065,9 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
 
 	/*
 	 * Must be called with interrupts disabled so it is safe
-	 * to use the per-cpu input argument memory.
+	 * to use the per-cpu argument page.
 	 */
-	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	hv_hvcall_in_array(&in, offsetof(typeof(*in), data), sizeof(in->data[0]));
 	in->gpa = gpa;
 	in->size = size;
 	switch (size) {
-- 
2.25.1



* [PATCH v3 6/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments for mshv code
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (4 preceding siblings ...)
  2025-04-15 18:07 ` [PATCH v3 5/7] PCI: " mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-04-15 18:07 ` [PATCH v3 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
  2025-08-25 21:39 ` [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args Wei Liu
  7 siblings, 0 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

Update hypercall call sites to use the new hv_hvcall_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields. Where feasible, use the batch size
returned by hv_hvcall_inout_array() instead of a separate #define value.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---

Notes:
    Changes in v3:
    * This patch is new in v3 due to rebasing on 6.15-rc1, which has new
      mshv-related hypercalls.
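
The rep hypercall loops in this patch all follow the same shape: take
the batch size once, then process min(remaining, batch_size) entries per
call. A userspace sketch of that loop, with a hypothetical
demo_count_batches() that only counts the calls instead of issuing
them, might look like the following. It assumes every call completes all
of its entries, which a real caller cannot assume.

```c
/*
 * Count how many batched calls are needed to process 'count' entries
 * when at most 'batch_size' entries fit in one call.
 * Illustration only; a real loop would advance by the number of
 * entries the hypercall reports as completed.
 */
static unsigned long demo_count_batches(unsigned long count,
					unsigned long batch_size)
{
	unsigned long remaining = count;
	unsigned long calls = 0;

	if (batch_size == 0)
		return 0;	/* nothing can be sent; avoid looping forever */

	while (remaining) {
		unsigned long rep_count =
			remaining < batch_size ? remaining : batch_size;

		/* one rep hypercall for rep_count entries would go here */
		remaining -= rep_count;
		calls++;
	}
	return calls;
}
```

Because the batch size comes from the same call that sets up the
arguments, it stays in step with the structure layout without a
separately maintained constant.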

 drivers/hv/mshv_common.c       |  31 +++------
 drivers/hv/mshv_root_hv_call.c | 121 +++++++++++++--------------------
 drivers/hv/mshv_root_main.c    |   5 +-
 3 files changed, 60 insertions(+), 97 deletions(-)

diff --git a/drivers/hv/mshv_common.c b/drivers/hv/mshv_common.c
index 2575e6d7a71f..2ad36cc7a329 100644
--- a/drivers/hv/mshv_common.c
+++ b/drivers/hv/mshv_common.c
@@ -16,12 +16,6 @@
 
 #include "mshv.h"
 
-#define HV_GET_REGISTER_BATCH_SIZE	\
-	(HV_HYP_PAGE_SIZE / sizeof(union hv_register_value))
-#define HV_SET_REGISTER_BATCH_SIZE	\
-	((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_set_vp_registers)) \
-		/ sizeof(struct hv_register_assoc))
-
 int hv_call_get_vp_registers(u32 vp_index, u64 partition_id, u16 count,
 			     union hv_input_vtl input_vtl,
 			     struct hv_register_assoc *registers)
@@ -29,24 +23,23 @@ int hv_call_get_vp_registers(u32 vp_index, u64 partition_id, u16 count,
 	struct hv_input_get_vp_registers *input_page;
 	union hv_register_value *output_page;
 	u16 completed = 0;
-	unsigned long remaining = count;
+	unsigned long batch_size, remaining = count;
 	int rep_count, i;
 	u64 status = HV_STATUS_SUCCESS;
 	unsigned long flags;
 
 	local_irq_save(flags);
 
-	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
+	batch_size = hv_hvcall_inout_array(&input_page, sizeof(*input_page),
+			      sizeof(input_page->names[0]),
+			      &output_page, 0, sizeof(*output_page));
 
 	input_page->partition_id = partition_id;
 	input_page->vp_index = vp_index;
 	input_page->input_vtl.as_uint8 = input_vtl.as_uint8;
-	input_page->rsvd_z8 = 0;
-	input_page->rsvd_z16 = 0;
 
 	while (remaining) {
-		rep_count = min(remaining, HV_GET_REGISTER_BATCH_SIZE);
+		rep_count = min(remaining, batch_size);
 		for (i = 0; i < rep_count; ++i)
 			input_page->names[i] = registers[i].name;
 
@@ -75,21 +68,19 @@ int hv_call_set_vp_registers(u32 vp_index, u64 partition_id, u16 count,
 	struct hv_input_set_vp_registers *input_page;
 	u16 completed = 0;
 	unsigned long remaining = count;
-	int rep_count;
+	unsigned long rep_count, batch_size;
 	u64 status = HV_STATUS_SUCCESS;
 	unsigned long flags;
 
 	local_irq_save(flags);
-	input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+	batch_size = hv_hvcall_in_array(&input_page, sizeof(*input_page),
+			sizeof(input_page->elements[0]));
 	input_page->partition_id = partition_id;
 	input_page->vp_index = vp_index;
 	input_page->input_vtl.as_uint8 = input_vtl.as_uint8;
-	input_page->rsvd_z8 = 0;
-	input_page->rsvd_z16 = 0;
 
 	while (remaining) {
-		rep_count = min(remaining, HV_SET_REGISTER_BATCH_SIZE);
+		rep_count = min(remaining, batch_size);
 		memcpy(input_page->elements, registers,
 		       sizeof(struct hv_register_assoc) * rep_count);
 
@@ -119,9 +110,7 @@ int hv_call_get_partition_property(u64 partition_id,
 	struct hv_output_get_partition_property *output;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 	input->partition_id = partition_id;
 	input->property_code = property_code;
 	status = hv_do_hypercall(HVCALL_GET_PARTITION_PROPERTY, input, output);
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index a222a16107f6..f14720de3248 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -21,22 +21,6 @@
 #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
 
 #define HV_WITHDRAW_BATCH_SIZE	(HV_HYP_PAGE_SIZE / sizeof(u64))
-#define HV_MAP_GPA_BATCH_SIZE	\
-	((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_map_gpa_pages)) \
-		/ sizeof(u64))
-#define HV_GET_VP_STATE_BATCH_SIZE	\
-	((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_vp_state)) \
-		/ sizeof(u64))
-#define HV_SET_VP_STATE_BATCH_SIZE	\
-	((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_set_vp_state)) \
-		/ sizeof(u64))
-#define HV_GET_GPA_ACCESS_STATES_BATCH_SIZE	\
-	((HV_HYP_PAGE_SIZE - sizeof(union hv_gpa_page_access_state)) \
-		/ sizeof(union hv_gpa_page_access_state))
-#define HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT		       \
-	((HV_HYP_PAGE_SIZE -						       \
-	  sizeof(struct hv_input_modify_sparse_spa_page_host_access)) /        \
-	 sizeof(u64))
 
 int hv_call_withdraw_memory(u64 count, int node, u64 partition_id)
 {
@@ -57,9 +41,7 @@ int hv_call_withdraw_memory(u64 count, int node, u64 partition_id)
 	while (remaining) {
 		local_irq_save(flags);
 
-		input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
-		memset(input_page, 0, sizeof(*input_page));
+		hv_hvcall_in(&input_page, sizeof(*input_page));
 		input_page->partition_id = partition_id;
 		status = hv_do_rep_hypercall(HVCALL_WITHDRAW_MEMORY,
 					     min(remaining, HV_WITHDRAW_BATCH_SIZE),
@@ -98,10 +80,7 @@ int hv_call_create_partition(u64 flags,
 
 	do {
 		local_irq_save(irq_flags);
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
-		memset(input, 0, sizeof(*input));
+		hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 		input->flags = flags;
 		input->compatibility_version = HV_COMPATIBILITY_21_H2;
 
@@ -205,11 +184,12 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
 
 	while (done < page_count) {
 		ulong i, completed, remain = page_count - done;
-		int rep_count = min(remain, HV_MAP_GPA_BATCH_SIZE);
+		ulong rep_count, batch_size;
 
 		local_irq_save(irq_flags);
-		input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+		batch_size = hv_hvcall_in_array(&input_page, sizeof(*input_page),
+				   sizeof(input_page->source_gpa_page_list[0]));
+		rep_count = min(remain, batch_size);
 		input_page->target_partition_id = partition_id;
 		input_page->target_gpa_base = gfn + (done << large_shift);
 		input_page->map_flags = flags;
@@ -310,7 +290,7 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
 		int rep_count = min(remain, HV_UMAP_GPA_PAGES);
 
 		local_irq_save(irq_flags);
-		input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		hv_hvcall_in(&input_page, sizeof(*input_page));
 
 		input_page->target_partition_id = partition_id;
 		input_page->target_gpa_base = gfn + (done << large_shift);
@@ -339,7 +319,7 @@ int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
 	struct hv_input_get_gpa_pages_access_state *input_page;
 	union hv_gpa_page_access_state *output_page;
 	int completed = 0;
-	unsigned long remaining = count;
+	unsigned long batch_size, remaining = count;
 	int rep_count, i;
 	u64 status = 0;
 	unsigned long flags;
@@ -347,13 +327,13 @@ int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
 	*written_total = 0;
 	while (remaining) {
 		local_irq_save(flags);
-		input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
+		batch_size = hv_hvcall_inout_array(&input_page, sizeof(*input_page),
+					0, &output_page, 0, sizeof(*output_page));
 
 		input_page->partition_id = partition_id;
 		input_page->hv_gpa_page_number = gpa_base_pfn + *written_total;
 		input_page->flags = state_flags;
-		rep_count = min(remaining, HV_GET_GPA_ACCESS_STATES_BATCH_SIZE);
+		rep_count = min(remaining, batch_size);
 
 		status = hv_do_rep_hypercall(HVCALL_GET_GPA_PAGES_ACCESS_STATES, rep_count,
 					     0, input_page, output_page);
@@ -383,8 +363,7 @@ int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
 	u64 status;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 	input->partition_id = partition_id;
 	input->vector = vector;
 	input->dest_addr = dest_addr;
@@ -421,21 +400,21 @@ int hv_call_get_vp_state(u32 vp_index, u64 partition_id,
 	u64 status;
 	int i;
 	u64 control;
-	unsigned long flags;
+	unsigned long flags, batch_size;
 	int ret = 0;
 
-	if (page_count > HV_GET_VP_STATE_BATCH_SIZE)
-		return -EINVAL;
-
 	if (!page_count && !ret_output)
 		return -EINVAL;
 
 	do {
 		local_irq_save(flags);
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-		memset(input, 0, sizeof(*input));
-		memset(output, 0, sizeof(*output));
+		batch_size = hv_hvcall_inout_array(&input, sizeof(*input),
+				sizeof(input->output_data_pfns[0]),
+				&output, sizeof(*output), 0);
+		if (page_count > batch_size) {
+			local_irq_restore(flags);
+			return -EINVAL;
+		}
 
 		input->partition_id = partition_id;
 		input->vp_index = vp_index;
@@ -477,11 +456,7 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
 	unsigned long flags;
 	int ret = 0;
 	u16 varhead_sz;
-
-	if (page_count > HV_SET_VP_STATE_BATCH_SIZE)
-		return -EINVAL;
-	if (sizeof(*input) + num_bytes > HV_HYP_PAGE_SIZE)
-		return -EINVAL;
+	u64 batch_size;
 
 	if (num_bytes)
 		/* round up to 8 and divide by 8 */
@@ -493,18 +468,26 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
 
 	do {
 		local_irq_save(flags);
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		memset(input, 0, sizeof(*input));
 
-		input->partition_id = partition_id;
-		input->vp_index = vp_index;
-		input->state_data = state_data;
 		if (num_bytes) {
+			batch_size = hv_hvcall_in_array(&input, sizeof(*input),
+						sizeof(input->data[0].bytes));
+			if (num_bytes > batch_size)
+				goto size_error;
+
 			memcpy((u8 *)input->data, bytes, num_bytes);
 		} else {
+			batch_size = hv_hvcall_in_array(&input, sizeof(*input),
+						sizeof(input->data[0].pfns));
+			if (page_count > batch_size)
+				goto size_error;
+
 			for (i = 0; i < page_count; i++)
 				input->data[i].pfns = page_to_pfn(pages[i]);
 		}
+		input->partition_id = partition_id;
+		input->vp_index = vp_index;
+		input->state_data = state_data;
 
 		control = (HVCALL_SET_VP_STATE) |
 			  (varhead_sz << HV_HYPERCALL_VARHEAD_OFFSET);
@@ -523,6 +506,10 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
 	} while (!ret);
 
 	return ret;
+
+size_error:
+	local_irq_restore(flags);
+	return -EINVAL;
 }
 
 int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
@@ -538,8 +525,7 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
 	do {
 		local_irq_save(flags);
 
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+		hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 
 		input->partition_id = partition_id;
 		input->vp_index = vp_index;
@@ -573,9 +559,7 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
 
 	local_irq_save(flags);
 
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 
 	input->partition_id = partition_id;
 	input->vp_index = vp_index;
@@ -613,8 +597,7 @@ hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
 
 	do {
 		local_irq_save(flags);
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		memset(input, 0, sizeof(*input));
+		hv_hvcall_in(&input, sizeof(*input));
 
 		input->port_partition_id = port_partition_id;
 		input->port_id = port_id;
@@ -667,8 +650,7 @@ hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
 
 	do {
 		local_irq_save(flags);
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		memset(input, 0, sizeof(*input));
+		hv_hvcall_in(&input, sizeof(*input));
 		input->port_partition_id = port_partition_id;
 		input->port_id = port_id;
 		input->connection_partition_id = connection_partition_id;
@@ -735,10 +717,7 @@ int hv_call_map_stat_page(enum hv_stats_object_type type,
 
 	do {
 		local_irq_save(flags);
-		input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-		output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
-		memset(input, 0, sizeof(*input));
+		hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 		input->type = type;
 		input->identity = *identity;
 
@@ -772,9 +751,7 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
 	u64 status;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
-	memset(input, 0, sizeof(*input));
+	hv_hvcall_in(&input, sizeof(*input));
 	input->type = type;
 	input->identity = *identity;
 
@@ -807,14 +784,14 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
 	}
 
 	while (done < page_count) {
-		ulong i, completed, remain = page_count - done;
-		int rep_count = min(remain,
-				    HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT);
+		ulong i, batch_size, completed, remain = page_count - done;
+		ulong rep_count;
 
 		local_irq_save(irq_flags);
-		input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+		batch_size = hv_hvcall_in_array(&input_page, sizeof(*input_page),
+						sizeof(input_page->spa_page_list[0]));
+		rep_count = min(remain, batch_size);
 
-		memset(input_page, 0, sizeof(*input_page));
 		/* Only set the partition id if you are making the pages
 		 * exclusive
 		 */
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 72df774e410a..df6b0da4a9a8 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2051,11 +2051,8 @@ static int __init hv_retrieve_scheduler_type(enum hv_scheduler_type *out)
 	u64 status;
 
 	local_irq_save(flags);
-	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
 
-	memset(input, 0, sizeof(*input));
-	memset(output, 0, sizeof(*output));
+	hv_hvcall_inout(&input, sizeof(*input), &output, sizeof(*output));
 	input->property_id = HV_SYSTEM_PROPERTY_SCHEDULER_TYPE;
 
 	status = hv_do_hypercall(HVCALL_GET_SYSTEM_PROPERTY, input, output);
-- 
2.25.1



* [PATCH v3 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (5 preceding siblings ...)
  2025-04-15 18:07 ` [PATCH v3 6/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments for mshv code mhkelley58
@ 2025-04-15 18:07 ` mhkelley58
  2025-08-25 21:39 ` [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args Wei Liu
  7 siblings, 0 replies; 24+ messages in thread
From: mhkelley58 @ 2025-04-15 18:07 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

From: Michael Kelley <mhklinux@outlook.com>

All open coded uses of hyperv_pcpu_input_arg and hyperv_pcpu_output_arg
have been replaced by hv_hvcall_*() functions. So combine
hyperv_pcpu_input_arg and hyperv_pcpu_output_arg into a single
hyperv_pcpu_arg. Remove logic for managing a separate output arg. Fix up
comment references to the old variable names.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
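
As a rough illustration of how input and output can share one per-CPU
area without overlapping, the sketch below picks an 8-byte aligned
offset for the output. The placement policy (output directly after the
input when both fit in the first page, otherwise at the start of a
second page) and the demo_* names are assumptions made for this example;
they are not taken from the hv_hvcall_*() implementation.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_ALIGN 8u

/* Round n up to the next multiple of 8. */
static size_t demo_align8(size_t n)
{
	return (n + DEMO_ALIGN - 1) & ~(size_t)(DEMO_ALIGN - 1);
}

/*
 * Return the offset of the output within a two-page argument area, or
 * (size_t)-1 if input or output is larger than one page.
 * Hypothetical policy, for illustration only.
 */
static size_t demo_output_offset(size_t in_size, size_t out_size)
{
	size_t off;

	if (in_size > DEMO_PAGE_SIZE || out_size > DEMO_PAGE_SIZE)
		return (size_t)-1;

	off = demo_align8(in_size);
	if (off + out_size <= DEMO_PAGE_SIZE)
		return off;		/* both fit in the first page */
	return DEMO_PAGE_SIZE;		/* output gets the second page */
}
```

Under this policy neither region crosses a page boundary, and the two
regions never overlap because the output always starts at or after the
aligned end of the input.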
 arch/x86/hyperv/hv_init.c      |  6 ++--
 drivers/hv/hv.c                |  2 +-
 drivers/hv/hv_common.c         | 55 ++++++++++------------------------
 include/asm-generic/mshyperv.h |  6 +---
 4 files changed, 21 insertions(+), 48 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index cc843905c23a..e930fe75f2ca 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -483,16 +483,16 @@ void __init hyperv_init(void)
 	 * A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
 	 * when the hypercall input is a page, such a VM must pass a decrypted
 	 * page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
-	 * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
+	 * hyperv_pcpu_arg, which is decrypted if no paravisor is present.
 	 *
 	 * A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
 	 * which are handled by the paravisor and the VM must use an encrypted
-	 * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
+	 * input page: in such a VM, the hyperv_pcpu_arg is encrypted and
 	 * used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
 	 * hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
 	 * 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
 	 * 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
-	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
+	 * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_arg;
 	 * instead, hv_post_message() uses the post_msg_page, which is decrypted
 	 * in such a VM and is only used in such a VM.
 	 */
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 3e7d681ff2b7..c8bd40b797ba 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,7 @@ int hv_post_message(union hv_connection_id connection_id,
 	/*
 	 * A TDX VM with the paravisor must use the decrypted post_msg_page: see
 	 * the comment in struct hv_per_cpu_context. A SNP VM with the paravisor
-	 * can use the encrypted hyperv_pcpu_input_arg because it copies the
+	 * can use the encrypted hyperv_pcpu_arg because it copies the
 	 * input into the GHCB page, which has been decrypted by the paravisor.
 	 */
 	if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 895448954f37..712937c97fee 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -58,11 +58,8 @@ EXPORT_SYMBOL_GPL(hv_vp_index);
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
-void * __percpu *hyperv_pcpu_input_arg;
-EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
-
-void * __percpu *hyperv_pcpu_output_arg;
-EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+void * __percpu *hyperv_pcpu_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_arg);
 
 static void hv_kmsg_dump_unregister(void);
 
@@ -95,11 +92,8 @@ void __init hv_common_free(void)
 	kfree(hv_vp_index);
 	hv_vp_index = NULL;
 
-	free_percpu(hyperv_pcpu_output_arg);
-	hyperv_pcpu_output_arg = NULL;
-
-	free_percpu(hyperv_pcpu_input_arg);
-	hyperv_pcpu_input_arg = NULL;
+	free_percpu(hyperv_pcpu_arg);
+	hyperv_pcpu_arg = NULL;
 
 	free_percpu(hv_synic_eventring_tail);
 	hv_synic_eventring_tail = NULL;
@@ -294,11 +288,6 @@ static void hv_kmsg_dump_register(void)
 	}
 }
 
-static inline bool hv_output_page_exists(void)
-{
-	return hv_root_partition() || IS_ENABLED(CONFIG_HYPERV_VTL_MODE);
-}
-
 void __init hv_get_partition_id(void)
 {
 	struct hv_output_get_partition_id *output;
@@ -376,14 +365,8 @@ int __init hv_common_init(void)
 	 * (per-CPU) hypercall input page and thus this failure is
 	 * fatal on Hyper-V.
 	 */
-	hyperv_pcpu_input_arg = alloc_percpu(void  *);
-	BUG_ON(!hyperv_pcpu_input_arg);
-
-	/* Allocate the per-CPU state for output arg for root */
-	if (hv_output_page_exists()) {
-		hyperv_pcpu_output_arg = alloc_percpu(void *);
-		BUG_ON(!hyperv_pcpu_output_arg);
-	}
+	hyperv_pcpu_arg = alloc_percpu(void  *);
+	BUG_ON(!hyperv_pcpu_arg);
 
 	if (hv_root_partition()) {
 		hv_synic_eventring_tail = alloc_percpu(u8 *);
@@ -477,33 +460,28 @@ void __init ms_hyperv_late_init(void)
 
 int hv_common_cpu_init(unsigned int cpu)
 {
-	void **inputarg, **outputarg;
+	void **inputarg;
 	u8 **synic_eventring_tail;
 	u64 msr_vp_index;
 	gfp_t flags;
-	const int pgcount = hv_output_page_exists() ? 2 : 1;
+	const int pgcount = HV_HVCALL_ARG_PAGES;
 	void *mem;
 	int ret = 0;
 
 	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
 	flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
 
-	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+	inputarg = (void **)this_cpu_ptr(hyperv_pcpu_arg);
 
 	/*
-	 * The per-cpu memory is already allocated if this CPU was previously
-	 * online and then taken offline
+	 * hyperv_pcpu_arg memory is already allocated if this CPU was
+	 * previously online and then taken offline
 	 */
 	if (!*inputarg) {
 		mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
 		if (!mem)
 			return -ENOMEM;
 
-		if (hv_output_page_exists()) {
-			outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
-			*outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
-		}
-
 		if (!ms_hyperv.paravisor_present &&
 		    (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
 			ret = set_memory_decrypted((unsigned long)mem, pgcount);
@@ -517,13 +495,13 @@ int hv_common_cpu_init(unsigned int cpu)
 
 		/*
 		 * In a fully enlightened TDX/SNP VM with more than 64 VPs, if
-		 * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() ->
+		 * hyperv_pcpu_arg is not NULL, set_memory_decrypted() ->
 		 * ... -> cpa_flush()-> ... -> __send_ipi_mask_ex() tries to
-		 * use hyperv_pcpu_input_arg as the hypercall input page, which
+		 * use hyperv_pcpu_arg as the hypercall input page, which
 		 * must be a decrypted page in such a VM, but the page is still
 		 * encrypted before set_memory_decrypted() returns. Fix this by
 		 * setting *inputarg after the above set_memory_decrypted(): if
-		 * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns
+		 * hyperv_pcpu_arg is NULL, __send_ipi_mask_ex() returns
 		 * HV_STATUS_INVALID_PARAMETER immediately, and the function
 		 * hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
 		 * which may be slightly slower than the hypercall, but still
@@ -555,9 +533,8 @@ int hv_common_cpu_die(unsigned int cpu)
 {
 	u8 **synic_eventring_tail;
 	/*
-	 * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
-	 * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
-	 * may be used by the Hyper-V vPCI driver in reassigning interrupts
+	 * The hyperv_pcpu_arg memory is not freed when the CPU goes offline as
+	 * it may be used by the Hyper-V vPCI driver in reassigning interrupts
 	 * as part of the offlining process.  The interrupt reassignment
 	 * happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
 	 * called this function.
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 504c44b1ab9e..a73ddee6d322 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -67,8 +67,7 @@ extern bool hv_nested;
 extern u64 hv_current_partition_id;
 extern enum hv_partition_type hv_curr_partition_type;
 
-extern void * __percpu *hyperv_pcpu_input_arg;
-extern void * __percpu *hyperv_pcpu_output_arg;
+extern void * __percpu *hyperv_pcpu_arg;
 
 u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
 u64 hv_do_fast_hypercall8(u16 control, u64 input8);
@@ -155,9 +154,6 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
  * Hypercall input and output argument setup
  */
 
-/* Temporary mapping to be removed at the end of the patch series */
-#define hyperv_pcpu_arg hyperv_pcpu_input_arg
-
 /*
  * Allocate one page that is shared between input and output args, which is
  * sufficient for all current hypercalls. If a future hypercall requires
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-04-15 18:07 ` [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
@ 2025-04-21 20:41   ` Easwar Hariharan
  2025-04-21 21:24     ` Michael Kelley
  2025-08-21  0:31   ` Mukesh R
  1 sibling, 1 reply; 24+ messages in thread
From: Easwar Hariharan @ 2025-04-21 20:41 UTC (permalink / raw)
  To: mhklinux
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd,
	eahariha, x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

On 4/15/2025 11:07 AM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses this memory to set up the input parameters for
> the hypercall, read the output results after hypercall execution, and
> re-enable interrupts. The open coding of these steps leads to
> inconsistencies, and in some cases, violation of the generic
> requirements for the hypercall input and output as described in the
> Hyper-V Top Level Functional Spec (TLFS)[1].
> 
> To reduce these kinds of problems, introduce a family of inline
> functions to replace the open coding. The functions provide a new way
> to manage the use of this per-vCPU memory that is usually the input and
> output arguments to Hyper-V hypercalls. The functions encapsulate
> key aspects of the usage and ensure that the TLFS requirements are
> met (max size of 1 page each for input and output, no overlap of
> input and output, aligned to 8 bytes, etc.). Conceptually, there
> is no longer a difference between the "per-vCPU input page" and
> "per-vCPU output page". Only a single per-vCPU page is allocated, and
> it provides both hypercall input and output memory. All current
> hypercalls can fit their input and output within that single page,
> though the new code allows easy changing to two pages should a future
> hypercall require a full page for each of the input and output.
> 
> The new functions always zero the fixed-size portion of the hypercall
> input area so that uninitialized memory is not inadvertently passed
> to the hypercall. Current open-coded hypercall call sites are
> inconsistent on this point, and use of the new functions addresses
> that inconsistency. The output area is not zero'ed by the new code
> as it is Hyper-V's responsibility to provide legal output.
> 
> When the input or output (or both) contain an array, the new functions
> calculate and return how many array entries fit within the per-vCPU
> memory page, which is effectively the "batch size" for the hypercall
> processing multiple entries. This batch size can then be used in the
> hypercall control word to specify the repetition count. This
> calculation of the batch size replaces current open coding of the
> batch size, which is prone to errors. Note that the array portion of
> the input area is *not* zero'ed. The arrays are almost always 64-bit
> GPAs or something similar, and zero'ing that much memory seems
> wasteful at runtime when it will all be overwritten. The hypercall
> call site is responsible for ensuring that no part of the array is
> left uninitialized (just as with current code).
> 
> The new functions are realized as a single inline function that
> handles the most complex case, which is a hypercall with input
> and output, both of which contain arrays. Simpler cases are mapped to
> this most complex case with #define wrappers that provide zero or NULL
> for some arguments. Several of the arguments to this new function
> must be compile-time constants generated by "sizeof()"
> expressions. As such, most of the code in the new function can be
> evaluated by the compiler, with the result that the code paths are
> no longer than with the current open coding. The one exception is
> new code generated to zero the fixed-size portion of the input area
> in cases where it is not currently done.
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
> 
> Notes:
>     Changes in v3:
>     * Added wrapper #define hv_hvcall_in_batch_size() to get the batch size
>       without setting up hypercall input/output parameters. This call can be
>       used when the batch size is needed for validation checks or memory
>       allocations prior to disabling interrupts.
>     
>     Changes in v2:
>     * Added comment that hv_hvcall_inout_array() should always be called with
>       interrupts disabled because it is returning pointers to per-cpu memory
>       [Nuno Das Neves]
> 
>  include/asm-generic/mshyperv.h | 106 +++++++++++++++++++++++++++++++++
>  1 file changed, 106 insertions(+)
>
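
For readers following the thread, the zeroing and batch-size behavior described in the commit message above can be sketched in isolation. This is only an illustration under stated assumptions, not the code from the patch: setup_in_array(), arg_align_up(), ARG_PAGE_SIZE and ARG_ALIGN are hypothetical names, a single 4 KiB argument page is assumed, and the array is assumed to start at the next 8-byte boundary after the fixed-size header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARG_PAGE_SIZE 4096u	/* assumed size of the per-vCPU argument page */
#define ARG_ALIGN     8u	/* input and output must be 8-byte aligned */

/* Round n up to the next multiple of ARG_ALIGN. */
static size_t arg_align_up(size_t n)
{
	return (n + ARG_ALIGN - 1) & ~(size_t)(ARG_ALIGN - 1);
}

/*
 * Hypothetical helper: zero only the fixed-size input header at the
 * start of 'page', point '*array_out' at the array area that follows,
 * and return how many entries of 'entry_size' bytes fit in the rest of
 * the page. The array area itself is deliberately left untouched; the
 * caller must fill every entry it passes to the hypercall.
 */
static size_t setup_in_array(void *page, size_t fixed_size, size_t entry_size,
			     void **array_out)
{
	size_t array_off = arg_align_up(fixed_size);

	assert(entry_size > 0 && array_off <= ARG_PAGE_SIZE);
	memset(page, 0, fixed_size);
	*array_out = (uint8_t *)page + array_off;
	return (ARG_PAGE_SIZE - array_off) / entry_size;
}
```

The returned value plays the role of the "batch size": a caller would cap its repetition count at that number and loop over batches if it has more entries than fit in one page.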

This is very cool, thanks for taking the time! I think the function naming
could be more intuitive, e.g. hv_setup_*_args(). I'd not block it for that reason,
but would be super happy if you would update it. What do you think?

Thanks,
Easwar (he/him)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-04-21 20:41   ` Easwar Hariharan
@ 2025-04-21 21:24     ` Michael Kelley
  2025-04-21 23:27       ` Easwar Hariharan
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-04-21 21:24 UTC (permalink / raw)
  To: Easwar Hariharan
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-arch@vger.kernel.org

From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Monday, April 21, 2025 1:41 PM
> >
> > Current code allocates the "hyperv_pcpu_input_arg", and in
> > some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> > page of memory allocated per-vCPU. A hypercall call site disables
> > interrupts, then uses this memory to set up the input parameters for
> > the hypercall, read the output results after hypercall execution, and
> > re-enable interrupts. The open coding of these steps leads to
> > inconsistencies, and in some cases, violation of the generic
> > requirements for the hypercall input and output as described in the
> > Hyper-V Top Level Functional Spec (TLFS)[1].
> >
> > To reduce these kinds of problems, introduce a family of inline
> > functions to replace the open coding. The functions provide a new way
> > to manage the use of this per-vCPU memory that is usually the input and
> > output arguments to Hyper-V hypercalls. The functions encapsulate
> > key aspects of the usage and ensure that the TLFS requirements are
> > met (max size of 1 page each for input and output, no overlap of
> > input and output, aligned to 8 bytes, etc.). Conceptually, there
> > is no longer a difference between the "per-vCPU input page" and
> > "per-vCPU output page". Only a single per-vCPU page is allocated, and
> > it provides both hypercall input and output memory. All current
> > hypercalls can fit their input and output within that single page,
> > though the new code allows easy changing to two pages should a future
> > hypercall require a full page for each of the input and output.
> >
> > The new functions always zero the fixed-size portion of the hypercall
> > input area so that uninitialized memory is not inadvertently passed
> > to the hypercall. Current open-coded hypercall call sites are
> > inconsistent on this point, and use of the new functions addresses
> > that inconsistency. The output area is not zero'ed by the new code
> > as it is Hyper-V's responsibility to provide legal output.
> >
> > When the input or output (or both) contain an array, the new functions
> > calculate and return how many array entries fit within the per-vCPU
> > memory page, which is effectively the "batch size" for the hypercall
> > processing multiple entries. This batch size can then be used in the
> > hypercall control word to specify the repetition count. This
> > calculation of the batch size replaces current open coding of the
> > batch size, which is prone to errors. Note that the array portion of
> > the input area is *not* zero'ed. The arrays are almost always 64-bit
> > GPAs or something similar, and zero'ing that much memory seems
> > wasteful at runtime when it will all be overwritten. The hypercall
> > call site is responsible for ensuring that no part of the array is
> > left uninitialized (just as with current code).
> >
> > The new functions are realized as a single inline function that
> > handles the most complex case, which is a hypercall with input
> > and output, both of which contain arrays. Simpler cases are mapped to
> > this most complex case with #define wrappers that provide zero or NULL
> > for some arguments. Several of the arguments to this new function
> > must be compile-time constants generated by "sizeof()"
> > expressions. As such, most of the code in the new function can be
> > evaluated by the compiler, with the result that the code paths are
> > no longer than with the current open coding. The one exception is
> > new code generated to zero the fixed-size portion of the input area
> > in cases where it is not currently done.
> >
> > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> > ---
> >
> > Notes:
> >     Changes in v3:
> >     * Added wrapper #define hv_hvcall_in_batch_size() to get the batch size
> >       without setting up hypercall input/output parameters. This call can be
> >       used when the batch size is needed for validation checks or memory
> >       allocations prior to disabling interrupts.
> >
> >     Changes in v2:
> >     * Added comment that hv_hvcall_inout_array() should always be called with
> >       interrupts disabled because it is returning pointers to per-cpu memory
> >       [Nuno Das Neves]
> >
> >  include/asm-generic/mshyperv.h | 106 +++++++++++++++++++++++++++++++++
> >  1 file changed, 106 insertions(+)
> >
>
> This is very cool, thanks for taking the time! I think the function naming
> could be more intuitive, e.g. hv_setup_*_args(). I'd not block it for that reason,
> but would be super happy if you would update it. What do you think?
>

I'm not particularly enamored with my naming scheme, but it was the
best I could come up with. My criteria were:

* Keep the length reasonably short to not make line length problems
   any worse
* Distinguish the input args only, input & output args, and array versions
* Use the standard "hv_" prefix for Hyper-V related code

Using "setup" instead of "hvcall" seems like an improvement to me, and
it is 1 character shorter.  The "hv" prefix would be there, but they wouldn't
refer specifically to hypercalls. I would not add "_args" on the end because
that's another 5 characters in length. So we would have:

* hv_setup_in()
* hv_setup_inout()
* hv_setup_in_array()
* hv_setup_inout_array()
* hv_setup_in_batch_size() [??]

Or maybe, something like this, or similar, which picks up the "args" string,
but not "setup":

* hv_hcargs_in()
* hv_hcargs_inout()
* hv_hcargs_in_array()
* hv_hcargs_inout_array()
* hv_hcargs_in_batch_size() [??]

I'm very open to any other ideas because I'm not particularly
happy with the hv_hvcall_* approach.

Michael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-04-21 21:24     ` Michael Kelley
@ 2025-04-21 23:27       ` Easwar Hariharan
  2025-06-04 17:41         ` Easwar Hariharan
  0 siblings, 1 reply; 24+ messages in thread
From: Easwar Hariharan @ 2025-04-21 23:27 UTC (permalink / raw)
  To: Michael Kelley
  Cc: eahariha, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-arch@vger.kernel.org

On 4/21/2025 2:24 PM, Michael Kelley wrote:
> From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Monday, April 21, 2025 1:41 PM
>>>

<snip>

>>>
>>
>> This is very cool, thanks for taking the time! I think the function naming
>> could be more intuitive, e.g. hv_setup_*_args(). I'd not block it for that reason,
>> but would be super happy if you would update it. What do you think?
>>
> 
> I'm not particularly enamored with my naming scheme, but it was the
> best I could come up with. My criteria were:
> 
> * Keep the length reasonably short to not make line length problems
>    any worse
> * Distinguish the input args only, input & output args, and array versions

I think the in/inout/array scheme you have does this nicely

> * Use the standard "hv_" prefix for Hyper-V related code
> 
> Using "setup" instead of "hvcall" seems like an improvement to me, and
> it is 1 character shorter.  The "hv" prefix would be there, but they wouldn't
> refer specifically to hypercalls. I would not add "_args" on the end because
> that's another 5 characters in length. So we would have:
> 
> * hv_setup_in()
> * hv_setup_inout()
> * hv_setup_in_array()
> * hv_setup_inout_array()
> * hv_setup_in_batch_size() [??]
> 
> Or maybe, something like this, or similar, which picks up the "args" string,
> but not "setup":
> 
> * hv_hcargs_in()
> * hv_hcargs_inout()
> * hv_hcargs_in_array()
> * hv_hcargs_inout_array()
> * hv_hcargs_in_batch_size() [??]
> 
> I'm very open to any other ideas because I'm not particularly
> happy with the hv_hvcall_* approach.

Between the two presented here, I prefer option 1, with the "setup" verb because it tells you
inline what the function will do. I agree that the "args" is unnecessary because most
hypercall args are named hv_{input, output}_* and are clearly arguments to hv_do_hypercall()
and friends.

Since hv_setup*() will normally be followed shortly after by hv_do_hypercall(), I don't
see a problem with not referring specifically to hypercalls, it should be clear in context.

For hv_hvcall_in_batch_size(), I think it serves a fundamentally different function than the
other wrappers and doesn't need to follow the "setup" pattern. Instead it could be named 
hv_get_input_batch_size() for the same length and similarly tell you its purpose inline.

I am continuing to review the rest of the series, sorry for the delay, and thank you for your
patience!

Thanks,
Easwar (he/him)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-04-21 23:27       ` Easwar Hariharan
@ 2025-06-04 17:41         ` Easwar Hariharan
  0 siblings, 0 replies; 24+ messages in thread
From: Easwar Hariharan @ 2025-06-04 17:41 UTC (permalink / raw)
  To: Michael Kelley
  Cc: eahariha, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de, x86@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-arch@vger.kernel.org

Hi Michael,

On 4/21/2025 4:27 PM, Easwar Hariharan wrote:
> On 4/21/2025 2:24 PM, Michael Kelley wrote:
>> From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Monday, April 21, 2025 1:41 PM
>>>>
> 
> <snip>
> 
>>>>
>>>
>>> This is very cool, thanks for taking the time! I think the function naming
>>> could be more intuitive, e.g. hv_setup_*_args(). I'd not block it for that reason,
>>> but would be super happy if you would update it. What do you think?
>>>
>>
>> I'm not particularly enamored with my naming scheme, but it was the
>> best I could come up with. My criteria were:
>>
>> * Keep the length reasonably short to not make line length problems
>>    any worse
>> * Distinguish the input args only, input & output args, and array versions
> 
> I think the in/inout/array scheme you have does this nicely
> 
>> * Use the standard "hv_" prefix for Hyper-V related code
>>
>> Using "setup" instead of "hvcall" seems like an improvement to me, and
>> it is 1 character shorter.  The "hv" prefix would be there, but they wouldn't
>> refer specifically to hypercalls. I would not add "_args" on the end because
>> that's another 5 characters in length. So we would have:
>>
>> * hv_setup_in()
>> * hv_setup_inout()
>> * hv_setup_in_array()
>> * hv_setup_inout_array()
>> * hv_setup_in_batch_size() [??]
>>
>> Or maybe, something like this, or similar, which picks up the "args" string,
>> but not "setup":
>>
>> * hv_hcargs_in()
>> * hv_hcargs_inout()
>> * hv_hcargs_in_array()
>> * hv_hcargs_inout_array()
>> * hv_hcargs_in_batch_size() [??]
>>
>> I'm very open to any other ideas because I'm not particularly
>> happy with the hv_hvcall_* approach.
> 
> Between the two presented here, I prefer option 1, with the "setup" verb because it tells you
> inline what the function will do. I agree that the "args" is unnecessary because most
> hypercall args are named hv_{input, output}_* and are clearly arguments to hv_do_hypercall()
> and friends.
> 
> Since hv_setup*() will normally be followed shortly after by hv_do_hypercall(), I don't
> see a problem with not referring specifically to hypercalls, it should be clear in context.
> 
> For hv_hvcall_in_batch_size(), I think it serves a fundamentally different function than the
> other wrappers and doesn't need to follow the "setup" pattern. Instead it could be named 
> hv_get_input_batch_size() for the same length and similarly tell you its purpose inline.
> 
> I am continuing to review the rest of the series, sorry for the delay, and thank you for your
> patience!
> 

Sorry this took so long, commercial work took up all of my focus the past few weeks. The rest
of the patches look good to me. If you could follow up with a v4 for the function naming,
Wei can pick this up for the 6.17 merge window.

Thanks,
Easwar (he/him)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-04-15 18:07 ` [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
  2025-04-21 20:41   ` Easwar Hariharan
@ 2025-08-21  0:31   ` Mukesh R
  2025-08-21  2:58     ` Mukesh R
  1 sibling, 1 reply; 24+ messages in thread
From: Mukesh R @ 2025-08-21  0:31 UTC (permalink / raw)
  To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

On 4/15/25 11:07, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
> 
> Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses this memory to set up the input parameters for
> the hypercall, read the output results after hypercall execution, and
> re-enable interrupts. The open coding of these steps leads to
> inconsistencies, and in some cases, violation of the generic
> requirements for the hypercall input and output as described in the
> Hyper-V Top Level Functional Spec (TLFS)[1].
> 
<snip>

> The new functions are realized as a single inline function that
> handles the most complex case, which is a hypercall with input
> and output, both of which contain arrays. Simpler cases are mapped to
> this most complex case with #define wrappers that provide zero or NULL
> for some arguments. Several of the arguments to this new function
> must be compile-time constants generated by "sizeof()"
> expressions. As such, most of the code in the new function can be
> evaluated by the compiler, with the result that the code paths are
> no longer than with the current open coding. The one exception is
> new code generated to zero the fixed-size portion of the input area
> in cases where it is not currently done.

IMHO, this is an unnecessary change that just obfuscates code. With status quo
one has the advantage of seeing what exactly is going on, one can use the
args any which way, change batch size any which way, and is thus flexible.
With time these functions only get more complicated and error prone. The
saving of ram is very minimal, this makes analyzing crash dumps harder,
and in some cases, like in your patch 3/7, disables interrupts unnecessarily in the error case:

- if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
-  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
-   HV_MAX_MODIFY_GPA_REP_COUNT);
+ local_irq_save(flags);      <<<<<<<
...

So, this is a nak from me. sorry.

<snip>

> +/*
> + * Allocate one page that is shared between input and output args, which is
> + * sufficient for all current hypercalls. If a future hypercall requires

That is incorrect. We have iommu map hypercalls that will use up an entire page
for input. More are coming as we implement ram withdrawal from the hypervisor.

Thanks,
-Mukesh
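
The ordering question raised above (whether a count check has to run with interrupts disabled) can be sketched independently of the patch. The v3 notes mention a way to get the batch size before disabling interrupts; the following is a minimal illustration of that idea only. All names here (in_batch_size(), modify_entries(), fake_irq_save(), fake_irq_restore()) are hypothetical stand-ins, a 4 KiB argument page is assumed, and the hypercall itself is elided.

```c
#include <assert.h>
#include <stddef.h>

#define ARG_PAGE_SIZE 4096u	/* assumed size of the per-vCPU argument page */

static int irq_disable_count;	/* counts calls to the stand-in irq-save helper */

/* Stand-ins for local_irq_save()/local_irq_restore() in this sketch. */
static void fake_irq_save(void)    { irq_disable_count++; }
static void fake_irq_restore(void) { }

/*
 * Pure computation: number of entries that fit after a fixed-size
 * header. It touches no per-CPU memory, so it is safe to call with
 * interrupts enabled.
 */
static size_t in_batch_size(size_t fixed_size, size_t entry_size)
{
	assert(entry_size > 0 && fixed_size <= ARG_PAGE_SIZE);
	return (ARG_PAGE_SIZE - fixed_size) / entry_size;
}

/*
 * Returns 0 on success, -1 if 'count' exceeds one batch. The error
 * path returns before interrupts are ever disabled.
 */
static int modify_entries(size_t count, size_t fixed_size, size_t entry_size)
{
	if (count > in_batch_size(fixed_size, entry_size))
		return -1;

	fake_irq_save();
	/* ... fill the per-CPU argument page and issue the hypercall here ... */
	fake_irq_restore();
	return 0;
}
```

With a 16-byte header and 8-byte entries, the limit in this sketch is 510 entries; a request for more fails without the irq-save stand-in being called.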

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-21  0:31   ` Mukesh R
@ 2025-08-21  2:58     ` Mukesh R
  2025-08-21 19:24       ` Michael Kelley
  0 siblings, 1 reply; 24+ messages in thread
From: Mukesh R @ 2025-08-21  2:58 UTC (permalink / raw)
  To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
	dave.hansen, hpa, lpieralisi, kw, manivannan.sadhasivam, robh,
	bhelgaas, arnd
  Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch

On 8/20/25 17:31, Mukesh R wrote:
> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
>> From: Michael Kelley <mhklinux@outlook.com>
>>
>> Current code allocates the "hyperv_pcpu_input_arg", and in
>> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
>> page of memory allocated per-vCPU. A hypercall call site disables
>> interrupts, then uses this memory to set up the input parameters for
>> the hypercall, read the output results after hypercall execution, and
>> re-enable interrupts. The open coding of these steps leads to
>> inconsistencies, and in some cases, violation of the generic
>> requirements for the hypercall input and output as described in the
>> Hyper-V Top Level Functional Spec (TLFS)[1].
>>
> <snip>
> 
>> The new functions are realized as a single inline function that
>> handles the most complex case, which is a hypercall with input
>> and output, both of which contain arrays. Simpler cases are mapped to
>> this most complex case with #define wrappers that provide zero or NULL
>> for some arguments. Several of the arguments to this new function
>> must be compile-time constants generated by "sizeof()"
>> expressions. As such, most of the code in the new function can be
>> evaluated by the compiler, with the result that the code paths are
>> no longer than with the current open coding. The one exception is
>> new code generated to zero the fixed-size portion of the input area
>> in cases where it is not currently done.
> 
> IMHO, this is an unnecessary change that just obfuscates code. With status quo
> one has the advantage of seeing what exactly is going on, one can use the
> args any which way, change batch size any which way, and is thus flexible.
> With time these functions only get more complicated and error prone. The
> saving of ram is very minimal, this makes analyzing crash dumps harder,
> and in some cases, like in your patch 3/7, disables interrupts unnecessarily in the error case:
> 
> - if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> -  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> -   HV_MAX_MODIFY_GPA_REP_COUNT);
> + local_irq_save(flags);      <<<<<<<
> ...
> 
> So, this is a nak from me. sorry.
> 

Furthermore, this makes us lose the ability to permanently map
input/output pages in the hypervisor. So, Wei kindly undo.

Thanks,
-Mukesh



> <snip>
> 
>> +/*
>> + * Allocate one page that is shared between input and output args, which is
>> + * sufficient for all current hypercalls. If a future hypercall requires
> 
> That is incorrect. We have iommu map hypercalls that will use up an entire page
> for input. More are coming as we implement ram withdrawal from the hypervisor.
> 
> Thanks,
> -Mukesh


^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-21  2:58     ` Mukesh R
@ 2025-08-21 19:24       ` Michael Kelley
  2025-08-21 20:49         ` Mukesh R
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-08-21 19:24 UTC (permalink / raw)
  To: Mukesh R, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
> 
> On 8/20/25 17:31, Mukesh R wrote:
> > On 4/15/25 11:07, mhkelley58@gmail.com wrote:
> >> From: Michael Kelley <mhklinux@outlook.com>
> >>
> >> Current code allocates the "hyperv_pcpu_input_arg", and in
> >> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> >> page of memory allocated per-vCPU. A hypercall call site disables
> >> interrupts, then uses this memory to set up the input parameters for
> >> the hypercall, read the output results after hypercall execution, and
> >> re-enable interrupts. The open coding of these steps leads to
> >> inconsistencies, and in some cases, violation of the generic
> >> requirements for the hypercall input and output as described in the
> >> Hyper-V Top Level Functional Spec (TLFS)[1].
> >>
> > <snip>
> >
> >> The new functions are realized as a single inline function that
> >> handles the most complex case, which is a hypercall with input
> >> and output, both of which contain arrays. Simpler cases are mapped to
> >> this most complex case with #define wrappers that provide zero or NULL
> >> for some arguments. Several of the arguments to this new function
> >> must be compile-time constants generated by "sizeof()"
> >> expressions. As such, most of the code in the new function can be
> >> evaluated by the compiler, with the result that the code paths are
> >> no longer than with the current open coding. The one exception is
> >> new code generated to zero the fixed-size portion of the input area
> >> in cases where it is not currently done.
> >
> > IMHO, this is unnecessary change that just obfuscates code. With status quo
> > one has the advantage of seeing what exactly is going on, one can use the
> > args any which way, change batch size any which way, and is thus flexible.

I started this patch set in response to some errors in open coding the
use of hyperv_pcpu_input/output_arg, to see if helper functions could
regularize the usage and reduce the likelihood of future errors. Balancing
the pluses and minuses of the result, in my view the helper functions are
an improvement, though not overwhelmingly so. Others may see the
tradeoffs differently, and as such I would not go to the mat in arguing the
patches must be taken. But if we don't take them, we need to go back and
clean up minor errors and inconsistencies in the open coding at some
existing hypercall call sites.

> > With time these functions only get more complicated and error prone. The
> > saving of ram is very minimal, this makes analyzing crash dumps harder,
> > and in some cases like in your patch 3/7 disables unnecessarily in error case:
> >
> > - if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> > -  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> > -   HV_MAX_MODIFY_GPA_REP_COUNT);
> > + local_irq_save(flags);      <<<<<<<
> > ...

FWIW, this error case is not disabled. It is checked a few lines further down as:

+       if (count > batch_size) {
+               pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
+                      batch_size);

> >
> > So, this is a nak from me. sorry.
> >
> 
> Furthermore, this makes us lose the ability to permanently map
> input/output pages in the hypervisor. So, Wei kindly undo.
> 

Could you elaborate on "lose the ability to permanently map
input/output pages in the hypervisor"? What specifically can't be
done and why?

<snip>

> >
> >> +/*
> >> + * Allocate one page that is shared between input and output args, which is
> >> + * sufficient for all current hypercalls. If a future hypercall requires
> >
> > That is incorrect. We've iommu map hypercalls that will use up entire page
> > for input. More coming as we implement ram withdrawal from the hypervisor.

At least some form of ram withdrawal is already implemented upstream as
hv_call_withdraw_memory(). The hypercall has a very small input using the
hv_setup_in() helper, but the output list of PFNs must go to a separately
allocated page so it can be retained with interrupts enabled while
__free_page() is called. The use of this separate output page predates the
introduction of the hv_setup_in() helper.

For iommu map hypercalls, what do the input and output look like? Is the
paradigm different from the typical small fixed portion plus a variable size
array of values that are fed into a rep hypercall? Is there also a large amount
of output from the hypercall? Just curious if there's a case that's fundamentally
different from the current set of hypercalls.

Michael


* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-21 19:24       ` Michael Kelley
@ 2025-08-21 20:49         ` Mukesh R
  2025-08-21 21:15           ` Mukesh R
  2025-08-22  2:10           ` Michael Kelley
  0 siblings, 2 replies; 24+ messages in thread
From: Mukesh R @ 2025-08-21 20:49 UTC (permalink / raw)
  To: Michael Kelley, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

On 8/21/25 12:24, Michael Kelley wrote:
> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
>>
>> On 8/20/25 17:31, Mukesh R wrote:
>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
>>>> From: Michael Kelley <mhklinux@outlook.com>
>>>>
>>>>
<snip>
>>>
>>>
>>> IMHO, this is unnecessary change that just obfuscates code. With status quo
>>> one has the advantage of seeing what exactly is going on, one can use the
>>> args any which way, change batch size any which way, and is thus flexible.
> 
> I started this patch set in response to some errors in open coding the
> use of hyperv_pcpu_input/output_arg, to see if helper functions could
> regularize the usage and reduce the likelihood of future errors. Balancing
> the pluses and minuses of the result, in my view the helper functions are
> an improvement, though not overwhelmingly so. Others may see the
> tradeoffs differently, and as such I would not go to the mat in arguing the
> patches must be taken. But if we don't take them, we need to go back and
> clean up minor errors and inconsistencies in the open coding at some
> existing hypercall call sites.

Yes, definitely. Assuming Nuno knows what issues you are referring to,
I'll work with him to get them addressed asap. Thanks for noticing them.
If Nuno is not aware, I'll ping you for more info.


>>> With time these functions only get more complicated and error prone. The
>>> saving of ram is very minimal, this makes analyzing crash dumps harder,
>>> and in some cases like in your patch 3/7 disables unnecessarily in error case:
>>>
>>> - if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
>>> -  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
>>> -   HV_MAX_MODIFY_GPA_REP_COUNT);
>>> + local_irq_save(flags);      <<<<<<<
>>> ...
> 
> FWIW, this error case is not disabled. It is checked a few lines further down as:

I meant disabled interrupts. The check moves after disabling interrupts, so
it runs "disabled" in traditional OS terminology :).

> 
> +       if (count > batch_size) {
> +               pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
> +                      batch_size);
> 
>>>
>>> So, this is a nak from me. sorry.
>>>
>>
>> Furthermore, this makes us lose the ability to permanently map
>> input/output pages in the hypervisor. So, Wei kindly undo.
>>
> 
> Could you elaborate on "lose the ability to permanently map
> input/output pages in the hypervisor"? What specifically can't be
> done and why?

Input and output are always mapped at a fixed GPA/SPA, to avoid the
hypervisor having to map/unmap them every time.

> <snip>
> 
>>>
>>>> +/*
>>>> + * Allocate one page that is shared between input and output args, which is
>>>> + * sufficient for all current hypercalls. If a future hypercall requires
>>>
>>> That is incorrect. We've iommu map hypercalls that will use up entire page
>>> for input. More coming as we implement ram withdrawal from the hypervisor.
> 
> At least some form of ram withdrawal is already implemented upstream as
> hv_call_withdraw_memory(). The hypercall has a very small input using the
> hv_setup_in() helper, but the output list of PFNs must go to a separately
> allocated page so it can be retained with interrupts enabled while
> __free_page() is called. The use of this separate output page predates the
> introduction of the hv_setup_in() helper.

Yeah, I am talking about the memory that the loader gives the hypervisor,
and that it accumulates for VMs during its lifetime. We are opening the
floodgates, so you will see lots of patches very soon.


> For iommu map hypercalls, what do the input and output look like? Is the
> paradigm different from the typical small fixed portion plus a variable size
> array of values that are fed into a rep hypercall? Is there also a large amount
> of output from the hypercall? Just curious if there's a case that's fundamentally
> different from the current set of hypercalls.

Patches coming soon, but at a high level, the hypercall includes a list of
SPAs that the hypervisor will map into the iommu. These can get large. We
will be exploring what we can do better to pass them, perhaps multiple
pages, not sure yet, but for now it's a single page.

Thanks,
-Mukesh



* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-21 20:49         ` Mukesh R
@ 2025-08-21 21:15           ` Mukesh R
  2025-08-22  2:16             ` Michael Kelley
  2025-08-22  2:10           ` Michael Kelley
  1 sibling, 1 reply; 24+ messages in thread
From: Mukesh R @ 2025-08-21 21:15 UTC (permalink / raw)
  To: Michael Kelley, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

On 8/21/25 13:49, Mukesh R wrote:
> On 8/21/25 12:24, Michael Kelley wrote:
>> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
>>>
>>> On 8/20/25 17:31, Mukesh R wrote:
>>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
>>>>> From: Michael Kelley <mhklinux@outlook.com>
>>>>>
>>>>>
> <snip>
>>>>
>>>>
>>>> IMHO, this is unnecessary change that just obfuscates code. With status quo
>>>> one has the advantage of seeing what exactly is going on, one can use the
>>>> args any which way, change batch size any which way, and is thus flexible.
>>
>> I started this patch set in response to some errors in open coding the
>> use of hyperv_pcpu_input/output_arg, to see if helper functions could
>> regularize the usage and reduce the likelihood of future errors. Balancing
>> the pluses and minuses of the result, in my view the helper functions are
>> an improvement, though not overwhelmingly so. Others may see the
>> tradeoffs differently, and as such I would not go to the mat in arguing the
>> patches must be taken. But if we don't take them, we need to go back and
>> clean up minor errors and inconsistencies in the open coding at some
>> existing hypercall call sites.
> 
> Yes, definitely. Assuming Nuno knows what issues you are referring to,
> I'll work with him to get them addressed asap. Thanks for noticing them.
> If Nuno is not aware, I'll ping you for more info.

Talked to Nuno, he's not aware of anything pending or details. So if you
can kindly list them out here, I will make sure it gets addressed right
away.

Thanks,
-Mukesh


>>
<deleted>
>>


* RE: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-21 20:49         ` Mukesh R
  2025-08-21 21:15           ` Mukesh R
@ 2025-08-22  2:10           ` Michael Kelley
  2025-08-23  2:25             ` Mukesh R
  1 sibling, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-08-22  2:10 UTC (permalink / raw)
  To: Mukesh R, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

From: Mukesh R <mrathor@linux.microsoft.com> Sent: Thursday, August 21, 2025 1:50 PM
> 
> On 8/21/25 12:24, Michael Kelley wrote:
> > From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
> >>
> >> On 8/20/25 17:31, Mukesh R wrote:
> >>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
> >>>> From: Michael Kelley <mhklinux@outlook.com>
> >>>>
> >>>>
> <snip>
> >>>
> >>>
> >>> IMHO, this is unnecessary change that just obfuscates code. With status quo
> >>> one has the advantage of seeing what exactly is going on, one can use the
> >>> args any which way, change batch size any which way, and is thus flexible.
> >
> > I started this patch set in response to some errors in open coding the
> > use of hyperv_pcpu_input/output_arg, to see if helper functions could
> > regularize the usage and reduce the likelihood of future errors. Balancing
> > the pluses and minuses of the result, in my view the helper functions are
> > an improvement, though not overwhelmingly so. Others may see the
> > tradeoffs differently, and as such I would not go to the mat in arguing the
> > patches must be taken. But if we don't take them, we need to go back and
> > clean up minor errors and inconsistencies in the open coding at some
> > existing hypercall call sites.
> 
> Yes, definitely. Assuming Nuno knows what issues you are referring to,
> I'll work with him to get them addressed asap. Thanks for noticing them.
> If Nuno is not aware, I'll ping you for more info.
> 
> 
> >>> With time these functions only get more complicated and error prone. The
> >>> saving of ram is very minimal, this makes analyzing crash dumps harder,
> >>> and in some cases like in your patch 3/7 disables unnecessarily in error case:
> >>>
> >>> - if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> >>> -  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> >>> -   HV_MAX_MODIFY_GPA_REP_COUNT);
> >>> + local_irq_save(flags);      <<<<<<<
> >>> ...
> >
> > FWIW, this error case is not disabled. It is checked a few lines further down as:
> 
> I meant disabled interrupts. The check moves after disabling interrupts, so
> it runs "disabled" in traditional OS terminology :).

Got it. But why is it a problem to make this check with interrupts disabled?
The check is just for robustness and should never be true since
hv_mark_gpa_visibility() is called from only one place that already ensures
the PFN count won't overflow a single page.

> 
> >
> > +       if (count > batch_size) {
> > +               pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
> > +                      batch_size);
> >
> >>>
> >>> So, this is a nak from me. sorry.
> >>>
> >>
> >> Furthermore, this makes us lose the ability to permanently map
> >> input/output pages in the hypervisor. So, Wei kindly undo.
> >>
> >
> > Could you elaborate on "lose the ability to permanently map
> > input/output pages in the hypervisor"? What specifically can't be
> > done and why?
> 
> Input and output are mapped at fixed GPA/SPA always to avoid hyp
> having to map/unmap every time.

OK. But how does this patch set impede doing a fixed mapping?
Wouldn't that fixed mapping be done at the time the page or pages
are allocated, and then be transparent to hypercall call sites?

> 
> > <snip>
> >
> >>>
> >>>> +/*
> >>>> + * Allocate one page that is shared between input and output args, which is
> >>>> + * sufficient for all current hypercalls. If a future hypercall requires
> >>>
> >>> That is incorrect. We've iommu map hypercalls that will use up entire page
> >>> for input. More coming as we implement ram withdrawal from the hypervisor.
> >
> > At least some form of ram withdrawal is already implemented upstream as
> > hv_call_withdraw_memory(). The hypercall has a very small input using the
> > hv_setup_in() helper, but the output list of PFNs must go to a separately
> > allocated page so it can be retained with interrupts enabled while
> > __free_page() is called. The use of this separate output page predates the
> > introduction of the hv_setup_in() helper.
> 
> Yeah, I am talking about hyp memory that loader gives it, and during the
> lifetime it accumulates for VMs. We are opening the flood gates, so you
> will see lots patches very soon.
> 
> 
> > For iommu map hypercalls, what do the input and output look like? Is the
> > paradigm different from the typical small fixed portion plus a variable size
> > array of values that are fed into a rep hypercall? Is there also a large amount
> > of output from the hypercall? Just curious if there's a case that's fundamentally
> > different from the current set of hypercalls.
> 
> Patches coming soon, but at high level, hypercall includes list of SPAs
> that hypervisor will map into the iommu. These can get large. We will be
> exploring what we can do better to pass them, perhaps multiple pages, not
> sure yet, but for now it's single page.

To be clear, if the iommu hypercall does not produce any output, this patch
set uses the entire per-cpu hypercall arg page for input. For example,
hv_mark_gpa_visibility() uses the entire page for input, which is mostly an
array of PFNs.
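
As a rough illustration of the arithmetic (a user-space sketch, not the
helper from the patch): assuming a 4 KiB page, a hypothetical 16-byte fixed
header, and 8-byte array entries, the number of entries per hypercall
follows from whatever is left of the page. The struct layout and every
demo_* name here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HV_PAGE_SIZE 4096u	/* assumed 4 KiB Hyper-V page */

/* Hypothetical input layout: small fixed header, then an array of PFNs. */
struct demo_gpa_input {
	uint64_t partition_id;
	uint32_t visibility;
	uint32_t reserved;
	uint64_t pfn_array[];
};

/* Round a size up to the 8-byte alignment required for hypercall args. */
static size_t demo_align8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Entries per hypercall when input and output share one page. Pass 0 for
 * the output sizes when the hypercall produces no output. Simplification:
 * entry sizes are assumed to be multiples of 8, so no padding is needed
 * between the input array and the output area.
 */
static unsigned int demo_batch_size(size_t in_fixed, size_t in_entry,
				    size_t out_fixed, size_t out_entry)
{
	size_t fixed = demo_align8(in_fixed) + demo_align8(out_fixed);
	size_t per_entry = in_entry + out_entry;

	assert(fixed <= DEMO_HV_PAGE_SIZE);
	if (per_entry == 0)
		return 0;
	return (unsigned int)((DEMO_HV_PAGE_SIZE - fixed) / per_entry);
}
```

With the 16-byte header and no output, 510 eight-byte PFNs fit in the page;
adding an 8-byte output entry per input entry halves that to 255.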

Using multiple input pages is definitely a new paradigm, on both the
hypervisor and guest sides, and that will need additional infrastructure,
with or without this patch set.

I'm just trying to understand where there are real technical blockers vs.
concern about the style and the encapsulation the helpers impose.

Michael


* RE: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-21 21:15           ` Mukesh R
@ 2025-08-22  2:16             ` Michael Kelley
  2025-08-26  0:13               ` Nuno Das Neves
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Kelley @ 2025-08-22  2:16 UTC (permalink / raw)
  To: Mukesh R, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

From: Mukesh R <mrathor@linux.microsoft.com> Sent: Thursday, August 21, 2025 2:16 PM
> 
> On 8/21/25 13:49, Mukesh R wrote:
> > On 8/21/25 12:24, Michael Kelley wrote:
> >> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
> >>>
> >>> On 8/20/25 17:31, Mukesh R wrote:
> >>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
> >>>>> From: Michael Kelley <mhklinux@outlook.com>
> >>>>>
> >>>>>
> > <snip>
> >>>>
> >>>>
> >>>> IMHO, this is unnecessary change that just obfuscates code. With status quo
> >>>> one has the advantage of seeing what exactly is going on, one can use the
> >>>> args any which way, change batch size any which way, and is thus flexible.
> >>
> >> I started this patch set in response to some errors in open coding the
> >> use of hyperv_pcpu_input/output_arg, to see if helper functions could
> >> regularize the usage and reduce the likelihood of future errors. Balancing
> >> the pluses and minuses of the result, in my view the helper functions are
> >> an improvement, though not overwhelmingly so. Others may see the
> >> tradeoffs differently, and as such I would not go to the mat in arguing the
> >> patches must be taken. But if we don't take them, we need to go back and
> >> clean up minor errors and inconsistencies in the open coding at some
> >> existing hypercall call sites.
> >
> > Yes, definitely. Assuming Nuno knows what issues you are referring to,
> > I'll work with him to get them addressed asap. Thanks for noticing them.
> > If Nuno is not aware, I'll ping you for more info.
> 
> Talked to Nuno, he's not aware of anything pending or details. So if you
> can kindly list them out here, I will make sure it gets addressed right
> away.
> 

I didn't catalog the issues as I came across them when doing this patch
set. :-(   I don't think any are bugs that could break things now. They were
things like not ensuring that all hypercall input fields are initialized to zero,
duplicate initialization to zero, and unnecessary initialization of hypercall
output memory. In general, how the hypercall args are set up is inconsistent
across different hypercall call sites, and that inconsistency can lead to errors,
which is what I was trying to address.
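
To illustrate the first kind of issue (a user-space sketch with made-up
names, not code from any existing call site): the per-cpu arg page is
reused, so a field the caller forgets to set keeps whatever the previous
hypercall left there unless the fixed-size input is zeroed first. The
struct and the demo_* functions below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hypercall input with a reserved field that must be zero. */
struct demo_hv_input {
	uint64_t partition_id;
	uint32_t flags;
	uint32_t reserved;
};

static_assert(sizeof(struct demo_hv_input) % 8 == 0,
	      "input size should be a multiple of 8 bytes");

/* Stand-in for the per-cpu arg page, which is reused across hypercalls. */
static union {
	struct demo_hv_input in;
	unsigned char bytes[4096];
} demo_arg_page;

/* Simulate stale contents left behind by a previous hypercall. */
static void demo_leave_stale_data(void)
{
	memset(demo_arg_page.bytes, 0xff, sizeof(demo_arg_page.bytes));
}

/* Zero the fixed-size input once, in one place, before the caller fills it. */
static struct demo_hv_input *demo_setup_in(void)
{
	memset(&demo_arg_page.in, 0, sizeof(demo_arg_page.in));
	return &demo_arg_page.in;
}
```

A caller that sets only partition_id still hands the hypervisor zeroed
flags and reserved fields, and no call site needs its own memset().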

But I can go back and come up with a list if that's where we're headed.

Michael


* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-22  2:10           ` Michael Kelley
@ 2025-08-23  2:25             ` Mukesh R
  2025-08-25 21:01               ` Michael Kelley
  0 siblings, 1 reply; 24+ messages in thread
From: Mukesh R @ 2025-08-23  2:25 UTC (permalink / raw)
  To: Michael Kelley, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

On 8/21/25 19:10, Michael Kelley wrote:
> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Thursday, August 21, 2025 1:50 PM
>>
>> On 8/21/25 12:24, Michael Kelley wrote:
>>> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
>>>>
>>>> On 8/20/25 17:31, Mukesh R wrote:
>>>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
>>>>>> From: Michael Kelley <mhklinux@outlook.com>
>>>>>>
>>>>>>
>> <snip>
>>>>>
>>>>>
>>>>> IMHO, this is unnecessary change that just obfuscates code. With status quo
>>>>> one has the advantage of seeing what exactly is going on, one can use the
>>>>> args any which way, change batch size any which way, and is thus flexible.
>>>
>>> I started this patch set in response to some errors in open coding the
>>> use of hyperv_pcpu_input/output_arg, to see if helper functions could
>>> regularize the usage and reduce the likelihood of future errors. Balancing
>>> the pluses and minuses of the result, in my view the helper functions are
>>> an improvement, though not overwhelmingly so. Others may see the
>>> tradeoffs differently, and as such I would not go to the mat in arguing the
>>> patches must be taken. But if we don't take them, we need to go back and
>>> clean up minor errors and inconsistencies in the open coding at some
>>> existing hypercall call sites.
>>
>> Yes, definitely. Assuming Nuno knows what issues you are referring to,
>> I'll work with him to get them addressed asap. Thanks for noticing them.
>> If Nuno is not aware, I'll ping you for more info.
>>
>>
>>>>> With time these functions only get more complicated and error prone. The
>>>>> saving of ram is very minimal, this makes analyzing crash dumps harder,
>>>>> and in some cases like in your patch 3/7 disables unnecessarily in error case:
>>>>>
>>>>> - if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
>>>>> -  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
>>>>> -   HV_MAX_MODIFY_GPA_REP_COUNT);
>>>>> + local_irq_save(flags);      <<<<<<<
>>>>> ...
>>>
>>> FWIW, this error case is not disabled. It is checked a few lines further down as:
>>
>> I meant disabled interrupts. The check moves after disabling interrupts, so
>> it runs "disabled" in traditional OS terminology :).
> 
> Got it. But why is it problem to make this check with interrupts disabled?

You are creating disabling overhead where that overhead previously
did not exist.


> The check is just for robustness and should never be true since
> hv_mark_gpa_visibility() is called from only one place that already ensures
> the PFN count won't overflow a single page.
> 
>>
>>>
>>> +       if (count > batch_size) {
>>> +               pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
>>> +                      batch_size);
>>>
>>>>>
>>>>> So, this is a nak from me. sorry.
>>>>>
>>>>
>>>> Furthermore, this makes us lose the ability to permanently map
>>>> input/output pages in the hypervisor. So, Wei kindly undo.
>>>>
>>>
>>> Could you elaborate on "lose the ability to permanently map
>>> input/output pages in the hypervisor"? What specifically can't be
>>> done and why?
>>
>> Input and output are mapped at fixed GPA/SPA always to avoid hyp
>> having to map/unmap every time.
> 
> OK. But how does this patch set impede doing a fixed mapping?

The output address can vary depending on the hypercall, instead of
always being at one fixed address:

          *(void **)output = space + offset; <<<<<<
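
To make that line concrete, here is a user-space sketch (hypothetical
demo_* names, an assumed 4 KiB page, and an assumed rule that the output
starts right after the input rounded up to 8 bytes). It only shows how the
output pointer moves with the input size and which page it lands in; it
says nothing about how the hypervisor maps that page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Stand-in for the page-aligned per-cpu hypercall arg page. */
static _Alignas(DEMO_PAGE_SIZE) unsigned char demo_space[DEMO_PAGE_SIZE];

/* Output starts right after the input, rounded up to 8 bytes. */
static void *demo_output_addr(size_t in_size)
{
	size_t offset = (in_size + 7) & ~(size_t)7;

	assert(offset < DEMO_PAGE_SIZE);	/* output must start inside the page */
	return demo_space + offset;
}

/* Page frame number of an address, i.e. the page that contains it. */
static uintptr_t demo_pfn(const void *addr)
{
	return (uintptr_t)addr / DEMO_PAGE_SIZE;
}
```

Two hypercalls with different input sizes get different output addresses
here, but both addresses fall in the same page frame as the input.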

> Wouldn't that fixed mapping be done at the time the page or pages
> are allocated, and then be transparent to hypercall call sites?
> 
>>
>>> <snip>
>>>
>>>>>
>>>>>> +/*
>>>>>> + * Allocate one page that is shared between input and output args, which is
>>>>>> + * sufficient for all current hypercalls. If a future hypercall requires
>>>>>
>>>>> That is incorrect. We've iommu map hypercalls that will use up entire page
>>>>> for input. More coming as we implement ram withdrawal from the hypervisor.
>>>
>>> At least some form of ram withdrawal is already implemented upstream as
>>> hv_call_withdraw_memory(). The hypercall has a very small input using the
>>> hv_setup_in() helper, but the output list of PFNs must go to a separately
>>> allocated page so it can be retained with interrupts enabled while
>>> __free_page() is called. The use of this separate output page predates the
>>> introduction of the hv_setup_in() helper.
>>
>> Yeah, I am talking about hyp memory that loader gives it, and during the
>> lifetime it accumulates for VMs. We are opening the flood gates, so you
>> will see lots patches very soon.
>>
>>
>>> For iommu map hypercalls, what do the input and output look like? Is the
>>> paradigm different from the typical small fixed portion plus a variable size
>>> array of values that are fed into a rep hypercall? Is there also a large amount
>>> of output from the hypercall? Just curious if there's a case that's fundamentally
>>> different from the current set of hypercalls.
>>
>> Patches coming soon, but at high level, hypercall includes list of SPAs
>> that hypervisor will map into the iommu. These can get large. We will be
>> exploring what we can do better to pass them, perhaps multiple pages, not
>> sure yet, but for now it's single page.
> 
> To be clear, if the iommu hypercall does not produce any output, this patch
> set uses the entire per-cpu hypercall arg page for input. For example,

Good

> hv_mark_gpa_visibility() uses the entire page for input, which is mostly an
> array of PFNs.
> 
> Using multiple input pages is definitely a new paradigm, on both the
> hypervisor and guest sides, and that will need additional infrastructure,
> with or without this patch set.

Right. With this patch set, every hcall is affected rather than just one
when code is modified to support that. That means one must test every
hypercall.

> I'm just trying to understand where there are real technical blockers vs.
> concern about the style and the encapsulation the helpers impose.

Well, no technical blockers that can't be resolved, but the style and
obfuscation that the helpers impose are a big concern.

Thanks,
-Mukesh



* RE: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-23  2:25             ` Mukesh R
@ 2025-08-25 21:01               ` Michael Kelley
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Kelley @ 2025-08-25 21:01 UTC (permalink / raw)
  To: Mukesh R, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
	manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

From: Mukesh R <mrathor@linux.microsoft.com> Sent: Friday, August 22, 2025 7:25 PM
> 
> On 8/21/25 19:10, Michael Kelley wrote:
> > From: Mukesh R <mrathor@linux.microsoft.com> Sent: Thursday, August 21, 2025 1:50 PM
> >>
> >> On 8/21/25 12:24, Michael Kelley wrote:
> >>> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
> >>>>
> >>>> On 8/20/25 17:31, Mukesh R wrote:
> >>>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
> >>>>>> From: Michael Kelley <mhklinux@outlook.com>
> >>>>>>
> >>>>>>
> >> <snip>
> >>>>>
> >>>>>
> >>>>> IMHO, this is unnecessary change that just obfuscates code. With status quo
> >>>>> one has the advantage of seeing what exactly is going on, one can use the
> >>>>> args any which way, change batch size any which way, and is thus flexible.
> >>>
> >>> I started this patch set in response to some errors in open coding the
> >>> use of hyperv_pcpu_input/output_arg, to see if helper functions could
> >>> regularize the usage and reduce the likelihood of future errors. Balancing
> >>> the pluses and minuses of the result, in my view the helper functions are
> >>> an improvement, though not overwhelmingly so. Others may see the
> >>> tradeoffs differently, and as such I would not go to the mat in arguing the
> >>> patches must be taken. But if we don't take them, we need to go back and
> >>> clean up minor errors and inconsistencies in the open coding at some
> >>> existing hypercall call sites.
> >>
> >> Yes, definitely. Assuming Nuno knows what issues you are referring to,
> >> I'll work with him to get them addressed asap. Thanks for noticing them.
> >> If Nuno is not aware, I'll ping you for more info.
> >>
> >>
> >>>>> With time these functions only get more complicated and error prone. The
> >>>>> saving of ram is very minimal, this makes analyzing crash dumps harder,
> >>>>> and in some cases like in your patch 3/7 disables unnecessarily in error case:
> >>>>>
> >>>>> - if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> >>>>> -  pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> >>>>> -   HV_MAX_MODIFY_GPA_REP_COUNT);
> >>>>> + local_irq_save(flags);      <<<<<<<
> >>>>> ...
> >>>
> >>> FWIW, this error case is not disabled. It is checked a few lines further down as:
> >>
> >> I meant disabled interrupts. The check moves after disabling interrupts, so
> >> it runs "disabled" in traditional OS terminology :).
> >
> > Got it. But why is it problem to make this check with interrupts disabled?
> 
> You are creating disabling overhead where that overhead previously
> did not exist.

I'm not clear on what you mean by "disabling overhead". The existing code
does the following:

1) Validate that "count" is not too big, and return an error if it is.
2) Disable interrupts
3) Populate the per-cpu hypercall input arg
4) Make the hypercall
5) Re-enable interrupts

With the patch, steps 1 and 2 are done in a different order:

2) Disable interrupts
1) Validate that "count" is not too big. Re-enable interrupts and return an error if it is.
3) Populate the per-cpu hypercall input arg
4) Make the hypercall
5) Re-enable interrupts

Validating "count" with interrupts disabled is probably an additional
2 or 3 instructions executed with interrupts disabled, which is negligible
compared to the thousands (or more) of instructions the hypercall will
execute with interrupts disabled.
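
To make the comparison concrete, here is a small user-space sketch (not
kernel code, and not part of the patch set). The fake_irq_*() helpers,
step(), and MAX_COUNT are made-up stand-ins; each step() call represents
one unit of work and is counted only when it runs with the simulated
interrupts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore(). */
static bool irqs_disabled;
static int irq_off_steps;	/* units of work executed with interrupts off */

static void fake_irq_save(void)    { irqs_disabled = true; }
static void fake_irq_restore(void) { irqs_disabled = false; }
static void step(void)             { if (irqs_disabled) irq_off_steps++; }

#define MAX_COUNT 8	/* illustrative limit, not the real batch size */

/* Existing order: validate "count" first, then disable interrupts. */
static int check_then_disable(int count)
{
	step();				/* 1) validate "count" */
	if (count > MAX_COUNT)
		return -1;
	fake_irq_save();		/* 2) disable interrupts */
	step();				/* 3) populate the input arg */
	step();				/* 4) make the hypercall */
	fake_irq_restore();		/* 5) re-enable interrupts */
	return 0;
}

/* Patched order: disable interrupts first, then validate "count". */
static int disable_then_check(int count)
{
	fake_irq_save();		/* 2) disable interrupts */
	step();				/* 1) validate "count" */
	if (count > MAX_COUNT) {
		fake_irq_restore();	/* error path re-enables interrupts */
		return -1;
	}
	step();				/* 3) populate the input arg */
	step();				/* 4) make the hypercall */
	fake_irq_restore();		/* 5) re-enable interrupts */
	return 0;
}
```

With a valid count, the first ordering does 2 units of work with
interrupts off and the second does 3; the only difference is the one
validation step.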

Or are you referring to something else as "disabling overhead"?

> 
> 
> > The check is just for robustness and should never be true since
> > hv_mark_gpa_visibility() is called from only one place that already ensures
> > the PFN count won't overflow a single page.
> >
> >>
> >>>
> >>> +       if (count > batch_size) {
> >>> +               pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
> >>> +                      batch_size);
> >>>
> >>>>>
> >>>>> So, this is a nak from me. sorry.
> >>>>>
> >>>>
> >>>> Furthermore, this makes us lose the ability to permanently map
> >>>> input/output pages in the hypervisor. So, Wei kindly undo.
> >>>>
> >>>
> >>> Could you elaborate on "lose the ability to permanently map
> >>> input/output pages in the hypervisor"? What specifically can't be
> >>> done and why?
> >>
> >> Input and output are mapped at fixed GPA/SPA always to avoid hyp
> >> having to map/unmap every time.
> >
> > OK. But how does this patch set impede doing a fixed mapping?
> 
> The output address can be varied depending on the hypercall, instead
> of it being fixed always at fixed address:
> 
>           *(void **)output = space + offset; <<<<<<

Agreed. But since mappings from GPA to SPA are page granular, having
such a fixed mapping means that there's a mapping for every byte in
the page containing the GPA to the corresponding byte in the SPA,
right? So even though the offset above may vary across hypercalls,
the output GPA still refers to the same page (since the offset is always
less than 4096), and that page has a fixed mapping. I would expect the
hypercall code in the hypervisor to look for an existing mapping based
on the output page, not the output address that includes the offset.
But I haven't looked at the hypervisor code. If the Hyper-V folks say
that a non-zero offset thwarts finding the existing mapping, what does
the hypervisor end up doing? Creating a 2nd mapping wouldn't seem
to make sense. So I'm really curious about what's going on ....
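
The page-granularity argument can be sketched in a few lines of
user-space C. This only illustrates the arithmetic and says nothing
about how the hypervisor actually looks up mappings; gpa_to_pfn() and
same_page() are hypothetical helpers, and the page constants are
defined locally for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT 12	/* 4 KiB pages, defined locally for this sketch */
#define HV_HYP_PAGE_SIZE  (1ULL << HV_HYP_PAGE_SHIFT)

/* Page frame number of a guest physical address. */
static uint64_t gpa_to_pfn(uint64_t gpa)
{
	return gpa >> HV_HYP_PAGE_SHIFT;
}

/*
 * Nonzero if page_base + offset falls in the same page as page_base.
 * page_base is assumed to be page aligned.
 */
static int same_page(uint64_t page_base, uint64_t offset)
{
	return gpa_to_pfn(page_base) == gpa_to_pfn(page_base + offset);
}
```

Any offset below HV_HYP_PAGE_SIZE leaves the page frame number
unchanged, so a lookup keyed on the page would find the same mapping
regardless of the offset.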

Michael

> 
> > Wouldn't that fixed mapping be done at the time the page or pages
> > are allocated, and then be transparent to hypercall call sites?
> >
> >>
> >>> <snip>
> >>>
> >>>>>
> >>>>>> +/*
> >>>>>> + * Allocate one page that is shared between input and output args, which is
> >>>>>> + * sufficient for all current hypercalls. If a future hypercall requires
> >>>>>
> >>>>> That is incorrect. We have iommu map hypercalls that will use up an entire page
> >>>>> for input. More coming as we implement ram withdrawal from the hypervisor.
> >>>
> >>> At least some form of ram withdrawal is already implemented upstream as
> >>> hv_call_withdraw_memory(). The hypercall has a very small input using the
> >>> hv_setup_in() helper, but the output list of PFNs must go to a separately
> >>> allocated page so it can be retained with interrupts enabled while
> >>> __free_page() is called. The use of this separate output page predates the
> >>> introduction of the hv_setup_in() helper.
> >>
> >> Yeah, I am talking about hyp memory that loader gives it, and during the
> >> lifetime it accumulates for VMs. We are opening the flood gates, so you
> >> will see lots of patches very soon.
> >>
> >>
> >>> For iommu map hypercalls, what do the input and output look like? Is the
> >>> paradigm different from the typical small fixed portion plus a variable size
> >>> array of values that are fed into a rep hypercall? Is there also a large amount
> >>> of output from the hypercall? Just curious if there's a case that's fundamentally
> >>> different from the current set of hypercalls.
> >>
> >> Patches coming soon, but at high level, hypercall includes list of SPAs
> >> that hypervisor will map into the iommu. These can get large. We will be
> >> exploring what we can do better to pass them, perhaps multiple pages, not
> >> sure yet, but for now it's single page.
> >
> > To be clear, if the iommu hypercall does not produce any output, this patch
> > set uses the entire per-cpu hypercall arg page for input. For example,
> 
> Good
> 
> > hv_mark_gpa_visibility() uses the entire page for input, which is mostly an
> > array of PFNs.
> >
> > Using multiple input pages is definitely a new paradigm, on both the
> > hypervisor and guest sides, and that will need additional infrastructure,
> > with or without this patch set.
> 
> Right. With this patch set, every hcall is affected rather than just one
> when code is modified to support that. That means one must test every
> hypercall.
> 
> > I'm just trying to understand where there are real technical blockers vs.
> > concern about the style and the encapsulation the helpers impose.
> 
> Well, no technical blockers that can't be resolved, but the style and obfuscation
> that helpers impose are a big concern.
> 
> Thanks,
> -Mukesh



* Re: [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args
  2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
                   ` (6 preceding siblings ...)
  2025-04-15 18:07 ` [PATCH v3 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
@ 2025-08-25 21:39 ` Wei Liu
  7 siblings, 0 replies; 24+ messages in thread
From: Wei Liu @ 2025-08-25 21:39 UTC (permalink / raw)
  To: mhklinux
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	lpieralisi, kw, manivannan.sadhasivam, robh, bhelgaas, arnd, x86,
	linux-hyperv, linux-kernel, linux-pci, linux-arch

On Tue, Apr 15, 2025 at 11:07:21AM -0700, mhkelley58@gmail.com wrote:
[...]
> 
> This patch set is built against linux-next20250411.
> 
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
> 
> Michael Kelley (7):
>   Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
>   x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1
>   x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2
>   Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments
>   PCI: hv: Use hv_hvcall_*() to set up hypercall arguments
>   Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments for mshv
>     code
>   Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg

I applied this series before but then there is a new discussion ongoing,
so I've dropped it from my tree for now until the discussion settles.

Wei


* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-22  2:16             ` Michael Kelley
@ 2025-08-26  0:13               ` Nuno Das Neves
  2025-08-26  1:46                 ` Mukesh R
  0 siblings, 1 reply; 24+ messages in thread
From: Nuno Das Neves @ 2025-08-26  0:13 UTC (permalink / raw)
  To: Michael Kelley, Mukesh R, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, lpieralisi@kernel.org,
	kw@linux.com, manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

On 8/21/2025 7:16 PM, Michael Kelley wrote:
> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Thursday, August 21, 2025 2:16 PM
>>
>> On 8/21/25 13:49, Mukesh R wrote:
>>> On 8/21/25 12:24, Michael Kelley wrote:
>>>> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
>>>>>
>>>>> On 8/20/25 17:31, Mukesh R wrote:
>>>>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
>>>>>>> From: Michael Kelley <mhklinux@outlook.com>
>>>>>>>
>>>>>>>
>>> <snip>
>>>>>>
>>>>>>
>>>>>> IMHO, this is an unnecessary change that just obfuscates code. With the status quo
>>>>>> one has the advantage of seeing what exactly is going on, one can use the
>>>>>> args any which way, change batch size any which way, and is thus flexible.
>>>>
>>>> I started this patch set in response to some errors in open coding the
>>>> use of hyperv_pcpu_input/output_arg, to see if helper functions could
>>>> regularize the usage and reduce the likelihood of future errors. Balancing
>>>> the pluses and minuses of the result, in my view the helper functions are
>>>> an improvement, though not overwhelmingly so. Others may see the
>>>> tradeoffs differently, and as such I would not go to the mat in arguing the
>>>> patches must be taken. But if we don't take them, we need to go back and
>>>> clean up minor errors and inconsistencies in the open coding at some
>>>> existing hypercall call sites.
>>>
>>> Yes, definitely. Assuming Nuno knows what issues you are referring to,
>>> I'll work with him to get them addressed asap. Thanks for noticing them.
>>> If Nuno is not aware, I'll ping you for more info.
>>
>> Talked to Nuno, he's not aware of anything pending or details. So if you
>> can kindly list them out here, I will make sure it gets addressed right
>> away.
>>
> 
> I didn't catalog the issues as I came across them when doing this patch
> set. :-(   I don't think any are bugs that could break things now. They were
> things like not ensuring that all hypercall input fields are initialized to zero,
> duplicate initialization to zero, and unnecessary initialization of hypercall
> output memory. In general, how the hypercall args are set up is inconsistent
> across different hypercall call sites, and that inconsistency can lead to errors,
> which is what I was trying to address.
> 
> But I can go back and come up with a list if that's where we're headed.

Hi Michael and Mukesh,

Just a suggestion, how about a simpler set of macros that doesn't really change
the existing paradigm, but can be used to improve the consistency across the
various hypercall sites.

e.g. for getting and zeroing the input page:

#define hv_get_input_ptr(in_ptr) \
({ \
        static_assert(sizeof(*in_ptr) <= HV_HYP_PAGE_SIZE); \
        void *__arg = *this_cpu_ptr(hyperv_pcpu_input_arg); \
        memset(__arg, 0, sizeof(*in_ptr)); \
        __arg; \
})

(And something similar for the output arg which doesn't need memset())

And for batch size, it can be very simple, although there's both the case
of argument + array elements, and just array elements:

#define hv_arg_get_batch_size(arg_ptr, element_ptr) \
        ((HV_HYP_PAGE_SIZE - sizeof(*arg_ptr)) / sizeof(*element_ptr))

#define hv_get_batch_size(element_ptr) (HV_HYP_PAGE_SIZE / sizeof(*element_ptr))

Usage:

struct hv_input_map_gpa_pages *input_page = hv_get_input_ptr(input_page);
int batch_size = hv_arg_get_batch_size(input_page, &input_page->source_gpa_page_list[0]);
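
As a worked example of the arithmetic, here is a user-space sketch that
reuses the two batch-size macros above unchanged. struct demo_input and
the demo_*() functions are hypothetical, chosen only so the numbers are
easy to check: a 16-byte fixed header followed by an array of 64-bit
PFNs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HV_HYP_PAGE_SIZE 4096	/* defined locally for this sketch */

#define hv_arg_get_batch_size(arg_ptr, element_ptr) \
	((HV_HYP_PAGE_SIZE - sizeof(*arg_ptr)) / sizeof(*element_ptr))

#define hv_get_batch_size(element_ptr) (HV_HYP_PAGE_SIZE / sizeof(*element_ptr))

/* Hypothetical input layout: 16-byte fixed header plus a PFN array. */
struct demo_input {
	uint64_t target_partition_id;
	uint32_t flags;
	uint32_t reserved;
	uint64_t pfn_list[];
};

/* Fixed header plus array: (4096 - 16) / 8 elements fit in one page. */
static size_t demo_batch_with_header(void)
{
	struct demo_input *in = NULL;	/* only used inside sizeof, never dereferenced */

	return hv_arg_get_batch_size(in, &in->pfn_list[0]);
}

/* Array only: 4096 / 8 elements fit in one page. */
static size_t demo_batch_array_only(void)
{
	uint64_t *pfn = NULL;		/* only used inside sizeof */

	return hv_get_batch_size(pfn);
}
```

For this layout the header case yields 510 elements and the array-only
case yields 512.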



Nuno

> 
> Michael



* Re: [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments
  2025-08-26  0:13               ` Nuno Das Neves
@ 2025-08-26  1:46                 ` Mukesh R
  0 siblings, 0 replies; 24+ messages in thread
From: Mukesh R @ 2025-08-26  1:46 UTC (permalink / raw)
  To: Nuno Das Neves, Michael Kelley, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, lpieralisi@kernel.org,
	kw@linux.com, manivannan.sadhasivam@linaro.org, robh@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org

On 8/25/25 17:13, Nuno Das Neves wrote:
> On 8/21/2025 7:16 PM, Michael Kelley wrote:
>> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Thursday, August 21, 2025 2:16 PM
>>>
>>> On 8/21/25 13:49, Mukesh R wrote:
>>>> On 8/21/25 12:24, Michael Kelley wrote:
>>>>> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Wednesday, August 20, 2025 7:58 PM
>>>>>>
>>>>>> On 8/20/25 17:31, Mukesh R wrote:
>>>>>>> On 4/15/25 11:07, mhkelley58@gmail.com wrote:
>>>>>>>> From: Michael Kelley <mhklinux@outlook.com>
>>>>>>>>
>>>>>>>>
>>>> <snip>
>>>>>>>
>>>>>>>
>>>>>>> IMHO, this is an unnecessary change that just obfuscates code. With the status quo
>>>>>>> one has the advantage of seeing what exactly is going on, one can use the
>>>>>>> args any which way, change batch size any which way, and is thus flexible.
>>>>>
>>>>> I started this patch set in response to some errors in open coding the
>>>>> use of hyperv_pcpu_input/output_arg, to see if helper functions could
>>>>> regularize the usage and reduce the likelihood of future errors. Balancing
>>>>> the pluses and minuses of the result, in my view the helper functions are
>>>>> an improvement, though not overwhelmingly so. Others may see the
>>>>> tradeoffs differently, and as such I would not go to the mat in arguing the
>>>>> patches must be taken. But if we don't take them, we need to go back and
>>>>> clean up minor errors and inconsistencies in the open coding at some
>>>>> existing hypercall call sites.
>>>>
>>>> Yes, definitely. Assuming Nuno knows what issues you are referring to,
>>>> I'll work with him to get them addressed asap. Thanks for noticing them.
>>>> If Nuno is not aware, I'll ping you for more info.
>>>
>>> Talked to Nuno, he's not aware of anything pending or details. So if you
>>> can kindly list them out here, I will make sure it gets addressed right
>>> away.
>>>
>>
>> I didn't catalog the issues as I came across them when doing this patch
>> set. :-(   I don't think any are bugs that could break things now. They were
>> things like not ensuring that all hypercall input fields are initialized to zero,
>> duplicate initialization to zero, and unnecessary initialization of hypercall
>> output memory. In general, how the hypercall args are set up is inconsistent
>> across different hypercall call sites, and that inconsistency can lead to errors,
>> which is what I was trying to address.
>>
>> But I can go back and come up with a list if that's where we're headed.
> 
> Hi Michael and Mukesh,
> 
> Just a suggestion, how about a simpler set of macros that doesn't really change
> the existing paradigm, but can be used to improve the consistency across the
> various hypercall sites.
> 
> e.g. for getting and zeroing the input page:
> 
> #define hv_get_input_ptr(in_ptr) \
> ({ \
>          static_assert(sizeof(*in_ptr) <= HV_HYP_PAGE_SIZE); \
>          void *__arg = *this_cpu_ptr(hyperv_pcpu_input_arg); \
>          memset(__arg, 0, sizeof(*in_ptr)); \
>          __arg; \
> })

Ugh! What is the problem that we are trying to solve? The code is
simple and clear today, tells the reader exactly what is being used and
for how many bytes etc. What if the input to hyp is a list of pfns, maybe
a void *? And if we want to do any complex stuff, we'll just keep adding
parameters to the macro. IMO, complex macros just obfuscate code! I think
this is just not worth it right now. We'll look at ways to enhance hcall params
in the future; perhaps we can address it then if there are any real issues.

Thanks,
-Mukesh


> (And something similar for the output arg which doesn't need memset())
> 
> And for batch size, it can be very simple, although there's both the case
> of argument + array elements, and just array elements:
> 
> #define hv_arg_get_batch_size(arg_ptr, element_ptr) \
>          ((HV_HYP_PAGE_SIZE - sizeof(*arg_ptr)) / sizeof(*element_ptr))
> 
> #define hv_get_batch_size(element_ptr) (HV_HYP_PAGE_SIZE / sizeof(*element_ptr))
> 
> Usage:
> 
> struct hv_input_map_gpa_pages *input_page = hv_get_input_ptr(input_page);
> int batch_size = hv_arg_get_batch_size(input_page, &input_page->source_gpa_page_list[0]);
> 
> 
> 
> Nuno
> 
>>
>> Michael



end of thread, other threads:[~2025-08-26  1:46 UTC | newest]

Thread overview: 24+ messages
2025-04-15 18:07 [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
2025-04-15 18:07 ` [PATCH v3 1/7] Drivers: hv: Introduce hv_hvcall_*() functions for hypercall arguments mhkelley58
2025-04-21 20:41   ` Easwar Hariharan
2025-04-21 21:24     ` Michael Kelley
2025-04-21 23:27       ` Easwar Hariharan
2025-06-04 17:41         ` Easwar Hariharan
2025-08-21  0:31   ` Mukesh R
2025-08-21  2:58     ` Mukesh R
2025-08-21 19:24       ` Michael Kelley
2025-08-21 20:49         ` Mukesh R
2025-08-21 21:15           ` Mukesh R
2025-08-22  2:16             ` Michael Kelley
2025-08-26  0:13               ` Nuno Das Neves
2025-08-26  1:46                 ` Mukesh R
2025-08-22  2:10           ` Michael Kelley
2025-08-23  2:25             ` Mukesh R
2025-08-25 21:01               ` Michael Kelley
2025-04-15 18:07 ` [PATCH v3 2/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 1 mhkelley58
2025-04-15 18:07 ` [PATCH v3 3/7] x86/hyperv: Use hv_hvcall_*() to set up hypercall arguments -- part 2 mhkelley58
2025-04-15 18:07 ` [PATCH v3 4/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments mhkelley58
2025-04-15 18:07 ` [PATCH v3 5/7] PCI: " mhkelley58
2025-04-15 18:07 ` [PATCH v3 6/7] Drivers: hv: Use hv_hvcall_*() to set up hypercall arguments for mshv code mhkelley58
2025-04-15 18:07 ` [PATCH v3 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
2025-08-25 21:39 ` [PATCH v3 0/7] hyperv: Introduce new way to manage hypercall args Wei Liu
