* [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
@ 2025-07-18 4:55 mhkelley58
2025-07-18 4:55 ` [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments mhkelley58
` (8 more replies)
0 siblings, 9 replies; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
This patch set introduces a new way to manage the use of the per-vCPU
memory that is usually the input and output arguments to Hyper-V
hypercalls. Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, then uses this memory to set up the input parameters for
the hypercall, read the output results after hypercall execution, and
re-enable interrupts. The open coding of these steps has led to
inconsistencies, and in some cases, violation of the generic
requirements for the hypercall input and output as described in the
Hyper-V Top Level Functional Spec (TLFS)[1]. This patch set introduces
a new family of inline functions to replace the open coding. The new
functions encapsulate key aspects of the use of per-vCPU memory for
hypercall input and output, and ensure that the TLFS requirements are
met (max size of 1 page each for input and output, no overlap of input
and output, aligned to 8 bytes, etc.).
With this change, hypercall call sites no longer directly access
"hyperv_pcpu_input_arg" and "hyperv_pcpu_output_arg". Instead, one of
a family of new functions provides the per-vCPU memory that a hypercall
call site uses to set up hypercall input and output areas.
Conceptually, there is no longer a difference between the "per-vCPU
input page" and "per-vCPU output page". Only a single per-vCPU page is
allocated, and it is used to provide both hypercall input and output.
All current hypercalls can fit their input and output within that single
page, though the new code allows easy changing to two pages should a
future hypercall require a full page for each of the input and output.
The new functions always zero the fixed-size portion of the hypercall
input area (but not any array portion -- see below) so that
uninitialized memory isn't inadvertently passed to the hypercall.
Current open-coded hypercall call sites are inconsistent on this point,
and use of the new functions addresses that inconsistency. The output
area is not zero'ed by the new code as it is Hyper-V's responsibility
to provide legal output.
When the input or output (or both) contain an array, the new code
calculates and returns how many array entries fit within the per-vCPU
memory page, which is effectively the "batch size" for the hypercall
processing multiple entries. This batch size can then be used in the
hypercall control word to specify the repetition count. This
calculation of the batch size replaces current open coding of the
batch size, which is prone to errors. Note that the array portion of
the input area is *not* zero'ed. The arrays are almost always 64-bit
GPAs or something similar, and zero'ing that much memory seems
wasteful at runtime when it will all be overwritten. The hypercall
call site is responsible for ensuring that no part of the array is
left uninitialized (just as with current code).
The new family of functions is realized as a single inline function
that handles the most complex case, which is a hypercall with input
and output, both of which contain arrays. Simpler cases are mapped to
this most complex case with #define wrappers that provide zero or NULL
for some arguments. Several of the arguments to this new function
must be compile-time constants generated by "sizeof()" expressions.
As such, most of the code in the new function is evaluated by the
compiler, with the result that the runtime code paths are no longer
than with the current open coding. An exception is the new code
generated to zero the fixed-size portion of the input area in cases
where it was not previously done.
Use of the new function typically (but not always) saves a few lines
of code at each hypercall call site. This is traded off against the
lines of code added for the new functions. With code currently
upstream, the net is an add of about 20 lines of code and comments.
A couple hypercall call sites have requirements that are not 100%
handled by the new function. These still require some manual open-
coded adjustment or open-coded batch size calculations -- see the
individual patches in this series. Suggestions on how to do better
are welcome.
The patches in the series do the following:
Patch 1: Introduce the new family of functions for assigning hypercall
input and output arguments.
Patch 2 to 6: Change existing hypercall call sites to use one of the new
functions. In some cases, tweaks to the hypercall argument data
structures are necessary, but these tweaks are making the data
structures more consistent with the overall pattern. These
5 patches are independent of each other, and can go in any
order. The breakup into 5 patches is for ease of review.
Patch 7: Update the name of the variable used to hold the per-vCPU memory
used for hypercall arguments. Remove code for managing the
per-vCPU output page.
The new code compiles and runs successfully on x86 and arm64. However,
basic smoke tests cover only a limited number of hypercall call sites
that have been modified. I don't have the hardware or Hyper-V
configurations needed to test running in the Hyper-V root partition
or running in a VTL other than VTL 0. The related hypercall call sites
still need to be tested to make sure I didn't break anything. Hopefully
someone with the necessary configurations and Hyper-V versions can
help with that testing.
For gcc 9.4.0, I've looked at the generated code for a couple of
hypercall call sites on both x86 and arm64 to ensure that it boils
down to the equivalent of the current open coding. I have not looked
at the generated code for later gcc versions or for Clang/LLVM, but
there's no reason to expect something worse as the code isn't doing
anything tricky.
This patch set is built against linux-next20250716.
[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
Michael Kelley (7):
Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments
x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2
Drivers: hv: Use hv_setup_*() to set up hypercall arguments
PCI: hv: Use hv_setup_*() to set up hypercall arguments
Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv
code
Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
arch/x86/hyperv/hv_apic.c | 10 +--
arch/x86/hyperv/hv_init.c | 12 ++-
arch/x86/hyperv/hv_vtl.c | 3 +-
arch/x86/hyperv/irqdomain.c | 17 ++--
arch/x86/hyperv/ivm.c | 18 ++---
arch/x86/hyperv/mmu.c | 19 ++---
arch/x86/hyperv/nested.c | 14 ++--
drivers/hv/hv.c | 6 +-
drivers/hv/hv_balloon.c | 8 +-
drivers/hv/hv_common.c | 64 +++++----------
drivers/hv/hv_proc.c | 23 +++---
drivers/hv/hyperv_vmbus.h | 2 +-
drivers/hv/mshv_common.c | 31 +++----
drivers/hv/mshv_root_hv_call.c | 121 +++++++++++-----------------
drivers/hv/mshv_root_main.c | 5 +-
drivers/pci/controller/pci-hyperv.c | 18 ++---
include/asm-generic/mshyperv.h | 106 +++++++++++++++++++++++-
include/hyperv/hvgdk_mini.h | 4 +-
18 files changed, 251 insertions(+), 230 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-08-14 7:58 ` Tianyu Lan
2025-07-18 4:55 ` [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1 mhkelley58
` (7 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
Current code allocates the "hyperv_pcpu_input_arg", and in
some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
page of memory allocated per-vCPU. A hypercall call site disables
interrupts, then uses this memory to set up the input parameters for
the hypercall, read the output results after hypercall execution, and
re-enable interrupts. The open coding of these steps leads to
inconsistencies, and in some cases, violation of the generic
requirements for the hypercall input and output as described in the
Hyper-V Top Level Functional Spec (TLFS)[1].
To reduce these kinds of problems, introduce a family of inline
functions to replace the open coding. The functions provide a new way
to manage the use of this per-vCPU memory that is usually the input and
output arguments to Hyper-V hypercalls. The functions encapsulate
key aspects of the usage and ensure that the TLFS requirements are
met (max size of 1 page each for input and output, no overlap of
input and output, aligned to 8 bytes, etc.). Conceptually, there
is no longer a difference between the "per-vCPU input page" and
"per-vCPU output page". Only a single per-vCPU page is allocated, and
it provides both hypercall input and output memory. All current
hypercalls can fit their input and output within that single page,
though the new code allows easy changing to two pages should a future
hypercall require a full page for each of the input and output.
The new functions always zero the fixed-size portion of the hypercall
input area so that uninitialized memory is not inadvertently passed
to the hypercall. Current open-coded hypercall call sites are
inconsistent on this point, and use of the new functions addresses
that inconsistency. The output area is not zero'ed by the new code
as it is Hyper-V's responsibility to provide legal output.
When the input or output (or both) contain an array, the new functions
calculate and return how many array entries fit within the per-vCPU
memory page, which is effectively the "batch size" for the hypercall
processing multiple entries. This batch size can then be used in the
hypercall control word to specify the repetition count. This
calculation of the batch size replaces current open coding of the
batch size, which is prone to errors. Note that the array portion of
the input area is *not* zero'ed. The arrays are almost always 64-bit
GPAs or something similar, and zero'ing that much memory seems
wasteful at runtime when it will all be overwritten. The hypercall
call site is responsible for ensuring that no part of the array is
left uninitialized (just as with current code).
The new functions are realized as a single inline function that
handles the most complex case, which is a hypercall with input
and output, both of which contain arrays. Simpler cases are mapped to
this most complex case with #define wrappers that provide zero or NULL
for some arguments. Several of the arguments to this new function
must be compile-time constants generated by "sizeof()"
expressions. As such, most of the code in the new function can be
evaluated by the compiler, with the result that the code paths are
no longer than with the current open coding. The one exception is
new code generated to zero the fixed-size portion of the input area
in cases where it is not currently done.
[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---
Notes:
Changes in v4:
* Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
* Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
[Easwar Hariharan]
Changes in v3:
* Added wrapper #define hv_hvcall_in_batch_size() to get the batch size
without setting up hypercall input/output parameters. This call can be
used when the batch size is needed for validation checks or memory
allocations prior to disabling interrupts.
Changes in v2:
* Added comment that hv_hvcall_inout_array() should always be called with
interrupts disabled because it is returning pointers to per-cpu memory
[Nuno Das Neves]
include/asm-generic/mshyperv.h | 106 +++++++++++++++++++++++++++++++++
1 file changed, 106 insertions(+)
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index a729b77983fa..040c4650f411 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -151,6 +151,112 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
return status;
}
+/*
+ * Hypercall input and output argument setup
+ */
+
+/* Temporary mapping to be removed at the end of the patch series */
+#define hyperv_pcpu_arg hyperv_pcpu_input_arg
+
+/*
+ * Allocate one page that is shared between input and output args, which is
+ * sufficient for all current hypercalls. If a future hypercall requires
+ * more space, change this value to "2" and everything will work.
+ */
+#define HV_HVCALL_ARG_PAGES 1
+
+/*
+ * Allocate space for hypercall input and output arguments from the
+ * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input
+ * or output indicates to allocate no space for that argument. For input and
+ * for output, specify the size of the fixed portion, and the size of an
+ * element in a variable size array. A zero value for entry_size indicates
+ * there is no array. The fixed size space for the input is zero'ed.
+ *
+ * When variable size arrays are present, the function returns the number of
+ * elements (i.e, the batch size) that fit in the available space.
+ *
+ * The four "size" arguments must be constants so the compiler can do most of
+ * the calculations. Then the generated inline code is no larger than if open
+ * coding the access to the hyperv_pcpu_arg and doing memset() on the input.
+ *
+ * This function must be called with interrupts disabled so the thread is not
+ * rescheduled onto another vCPU while accessing the per-cpu args page.
+ */
+static inline int hv_setup_inout_array(void *input, u32 in_size, u32 in_entry_size,
+ void *output, u32 out_size, u32 out_entry_size)
+{
+ u32 in_batch_count = 0, out_batch_count = 0, batch_count;
+ u32 in_total_size, out_total_size, offset;
+ u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES;
+ void *space;
+
+ /*
+ * If input and output have arrays, allocate half the space to input
+ * and half to output. If only input has an array, the array can use
+ * all the space except for the fixed size output (but not to exceed
+ * one page), and vice versa.
+ */
+ if (in_entry_size && out_entry_size)
+ batch_space = batch_space / 2;
+ else if (in_entry_size)
+ batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size);
+ else if (out_entry_size)
+ batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size);
+
+ if (in_entry_size)
+ in_batch_count = (batch_space - in_size) / in_entry_size;
+ if (out_entry_size)
+ out_batch_count = (batch_space - out_size) / out_entry_size;
+
+ /*
+ * If input and output have arrays, use the smaller of the two batch
+ * counts, in case they are different. If only one has an array, use
+ * that batch count. batch_count will be zero if neither has an array.
+ */
+ if (in_batch_count && out_batch_count)
+ batch_count = min(in_batch_count, out_batch_count);
+ else
+ batch_count = in_batch_count | out_batch_count;
+
+ in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8);
+ out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8);
+
+ space = *this_cpu_ptr(hyperv_pcpu_arg);
+ if (input) {
+ *(void **)input = space;
+ if (space)
+ /* Zero the fixed size portion, not any array portion */
+ memset(space, 0, ALIGN(in_size, 8));
+ }
+
+ if (output) {
+ if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) {
+ offset = in_total_size;
+ } else {
+ offset = HV_HYP_PAGE_SIZE;
+ /* Need more than 1 page, but only 1 was allocated */
+ BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1);
+ }
+ *(void **)output = space + offset;
+ }
+
+ return batch_count;
+}
+
+/* Wrappers for some of the simpler cases with only input, or with no arrays */
+#define hv_setup_in(input, in_size) \
+ hv_setup_inout_array(input, in_size, 0, NULL, 0, 0)
+
+#define hv_setup_inout(input, in_size, output, out_size) \
+ hv_setup_inout_array(input, in_size, 0, output, out_size, 0)
+
+#define hv_setup_in_array(input, in_size, in_entry_size) \
+ hv_setup_inout_array(input, in_size, in_entry_size, NULL, 0, 0)
+
+#define hv_get_input_batch_size(in_size, in_entry_size) \
+ hv_setup_inout_array(NULL, in_size, in_entry_size, NULL, 0, 0)
+
/* Generate the guest OS identifier as described in the Hyper-V TLFS */
static inline u64 hv_generate_guest_id(u64 kernel_version)
{
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
2025-07-18 4:55 ` [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-08-13 0:22 ` Nuno Das Neves
2025-07-18 4:55 ` [PATCH v4 3/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2 mhkelley58
` (6 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
Update hypercall call sites to use the new hv_setup_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---
Notes:
Changes in v4:
* Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
* Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
[Easwar Hariharan]
Changes in v2:
* Fixed get_vtl() and hv_vtl_apicid_to_vp_id() to properly treat the input
and output arguments as arrays [Nuno Das Neves]
* Enhanced __send_ipi_mask_ex() and hv_map_interrupt() to check the number
of computed banks in the hv_vpset against the batch_size. Since an
hv_vpset currently represents a maximum of 4096 CPUs, the hv_vpset size
does not exceed 512 bytes and there should always be sufficent space. But
do the check just in case something changes. [Nuno Das Neves]
arch/x86/hyperv/hv_apic.c | 10 ++++------
arch/x86/hyperv/hv_init.c | 6 ++----
arch/x86/hyperv/hv_vtl.c | 3 +--
arch/x86/hyperv/irqdomain.c | 17 ++++++++++-------
4 files changed, 17 insertions(+), 19 deletions(-)
diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index bfde0a3498b9..bafb5dceb5d6 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -109,21 +109,19 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
{
struct hv_send_ipi_ex *ipi_arg;
unsigned long flags;
- int nr_bank = 0;
+ int batch_size, nr_bank = 0;
u64 status = HV_STATUS_INVALID_PARAMETER;
if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
return false;
local_irq_save(flags);
- ipi_arg = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+ batch_size = hv_setup_in_array(&ipi_arg, sizeof(*ipi_arg),
+ sizeof(ipi_arg->vp_set.bank_contents[0]));
if (unlikely(!ipi_arg))
goto ipi_mask_ex_done;
ipi_arg->vector = vector;
- ipi_arg->reserved = 0;
- ipi_arg->vp_set.valid_bank_mask = 0;
/*
* Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
@@ -140,7 +138,7 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
* represented in VP_SET. Return an error and fall back to
* native (architectural) method of sending IPIs.
*/
- if (nr_bank <= 0)
+ if (nr_bank <= 0 || nr_bank > batch_size)
goto ipi_mask_ex_done;
} else {
ipi_arg->vp_set.format = HV_GENERIC_SET_ALL;
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index afdbda2dd7b7..b7a2877c2a92 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -685,13 +685,11 @@ int hv_apicid_to_vp_index(u32 apic_id)
local_irq_save(irq_flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_inout_array(&input, sizeof(*input), sizeof(input->apic_ids[0]),
+ &output, 0, sizeof(*output));
input->partition_id = HV_PARTITION_ID_SELF;
input->apic_ids[0] = apic_id;
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
control = HV_HYPERCALL_REP_COMP_1 | HVCALL_GET_VP_INDEX_FROM_APIC_ID;
status = hv_do_hypercall(control, input, output);
ret = output[0];
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 042e8712d8de..fc523a5096f4 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -131,8 +131,7 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
local_irq_save(irq_flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
input->partition_id = HV_PARTITION_ID_SELF;
input->vp_index = target_vp_index;
diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 090f5ac9f492..87ebe43f58cf 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -21,15 +21,15 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
struct hv_device_interrupt_descriptor *intr_desc;
unsigned long flags;
u64 status;
- int nr_bank, var_size;
+ int batch_size, nr_bank, var_size;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+ batch_size = hv_setup_inout_array(&input, sizeof(*input),
+ sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
+ &output, sizeof(*output), 0);
intr_desc = &input->interrupt_descriptor;
- memset(input, 0, sizeof(*input));
input->partition_id = hv_current_partition_id;
input->device_id = device_id.as_uint64;
intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
@@ -41,7 +41,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
else
intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
- intr_desc->target.vp_set.valid_bank_mask = 0;
intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
if (nr_bank < 0) {
@@ -49,6 +48,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
pr_err("%s: unable to generate VP set\n", __func__);
return -EINVAL;
}
+ if (nr_bank > batch_size) {
+ local_irq_restore(flags);
+ pr_err("%s: nr_bank too large\n", __func__);
+ return -EINVAL;
+ }
intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
/*
@@ -78,9 +82,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
u64 status;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
intr_entry = &input->interrupt_entry;
input->partition_id = hv_current_partition_id;
input->device_id = id;
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v4 3/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
2025-07-18 4:55 ` [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments mhkelley58
2025-07-18 4:55 ` [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1 mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-07-18 4:55 ` [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments mhkelley58
` (5 subsequent siblings)
8 siblings, 0 replies; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
Update hypercall call sites to use the new hv_setup_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields.
For hv_mark_gpa_visibility(), use the computed batch_size instead
of HV_MAX_MODIFY_GPA_REP_COUNT. Also update the associated gpa_page_list[]
field to have zero size, which is more consistent with other array
arguments to hypercalls. Due to the interaction with the calling
hv_vtom_set_host_visibility(), HV_MAX_MODIFY_GPA_REP_COUNT cannot be
completely eliminated without some further restructuring, but that's
for another patch set.
Similarly, for the nested flush functions, update the gpa_list[] to
have zero size. Again, separate restructuring would be required to
completely eliminate the need for HV_MAX_FLUSH_REP_COUNT.
Finally, hyperv_flush_tlb_others_ex() requires special handling
because the input consists of two arrays -- one for the hv_vp_set and
another for the gva list. The batch_size computed by hv_setup_in_array()
is adjusted to account for the number of entries in the hv_vp_set.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
---
Notes:
Changes in v4:
* Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
* Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
[Easwar Hariharan]
Changes in v2:
* In hyperv_flush_tlb_others_ex(), added check of the adjusted
max_gvas to make sure it doesn't go to zero or negative, which would
happen if there is insufficient space to hold the hv_vpset and have
at least one entry in the gva list. Since an hv_vpset currently
represents a maximum of 4096 CPUs, the hv_vpset size does not exceed
512 bytes and there should always be sufficent space. But do the
check just in case something changes. [Nuno Das Neves]
arch/x86/hyperv/ivm.c | 18 +++++++++---------
arch/x86/hyperv/mmu.c | 19 +++++--------------
arch/x86/hyperv/nested.c | 14 +++++---------
include/hyperv/hvgdk_mini.h | 4 ++--
4 files changed, 21 insertions(+), 34 deletions(-)
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index ade6c665c97e..3084ae8a3eed 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -474,30 +474,30 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
{
struct hv_gpa_range_for_visibility *input;
u64 hv_status;
+ int batch_size;
unsigned long flags;
/* no-op if partition isolation is not enabled */
if (!hv_is_isolation_supported())
return 0;
- if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
- pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
- HV_MAX_MODIFY_GPA_REP_COUNT);
+ local_irq_save(flags);
+ batch_size = hv_setup_in_array(&input, sizeof(*input),
+ sizeof(input->gpa_page_list[0]));
+ if (unlikely(!input)) {
+ local_irq_restore(flags);
return -EINVAL;
}
- local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
- if (unlikely(!input)) {
+ if (count > batch_size) {
+ pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
+ batch_size);
local_irq_restore(flags);
return -EINVAL;
}
input->partition_id = HV_PARTITION_ID_SELF;
input->host_visibility = visibility;
- input->reserved0 = 0;
- input->reserved1 = 0;
memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
hv_status = hv_do_rep_hypercall(
HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index cfcb60468b01..f1aeb5bdb29a 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -72,7 +72,7 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
local_irq_save(flags);
- flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ max_gvas = hv_setup_in_array(&flush, sizeof(*flush), sizeof(flush->gva_list[0]));
if (unlikely(!flush)) {
local_irq_restore(flags);
@@ -86,13 +86,10 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
*/
flush->address_space = virt_to_phys(info->mm->pgd);
flush->address_space &= CR3_ADDR_MASK;
- flush->flags = 0;
} else {
- flush->address_space = 0;
flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
}
- flush->processor_mask = 0;
if (cpumask_equal(cpus, cpu_present_mask)) {
flush->flags |= HV_FLUSH_ALL_PROCESSORS;
} else {
@@ -139,8 +136,6 @@ static void hyperv_flush_tlb_multi(const struct cpumask *cpus,
* We can flush not more than max_gvas with one hypercall. Flush the
* whole address space if we were asked to do more.
*/
- max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
-
if (info->end == TLB_FLUSH_ALL) {
flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
@@ -179,7 +174,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
if (!(ms_hyperv.hints & HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED))
return HV_STATUS_INVALID_PARAMETER;
- flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ max_gvas = hv_setup_in_array(&flush, sizeof(*flush), sizeof(flush->gva_list[0]));
if (info->mm) {
/*
@@ -188,14 +183,10 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
*/
flush->address_space = virt_to_phys(info->mm->pgd);
flush->address_space &= CR3_ADDR_MASK;
- flush->flags = 0;
} else {
- flush->address_space = 0;
flush->flags = HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES;
}
- flush->hv_vp_set.valid_bank_mask = 0;
-
flush->hv_vp_set.format = HV_GENERIC_SET_SPARSE_4K;
nr_bank = cpumask_to_vpset_skip(&flush->hv_vp_set, cpus,
info->freed_tables ? NULL : cpu_is_lazy);
@@ -210,10 +201,10 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
* of flush->hv_vp_set as part of the fixed size input header.
* So the variable input header size is equal to nr_bank.
*/
- max_gvas =
- (PAGE_SIZE - sizeof(*flush) - nr_bank *
- sizeof(flush->hv_vp_set.bank_contents[0])) /
+ max_gvas -= (nr_bank * sizeof(flush->hv_vp_set.bank_contents[0])) /
sizeof(flush->gva_list[0]);
+ if (max_gvas <= 0)
+ return HV_STATUS_INVALID_PARAMETER;
if (info->end == TLB_FLUSH_ALL) {
flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index 8ccbb7c4fc27..8eeda2e2ad29 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -30,15 +30,13 @@ int hyperv_flush_guest_mapping(u64 as)
local_irq_save(flags);
- flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+ hv_setup_in(&flush, sizeof(*flush));
if (unlikely(!flush)) {
local_irq_restore(flags);
goto fault;
}
flush->address_space = as;
- flush->flags = 0;
status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
flush, NULL);
@@ -91,25 +89,23 @@ int hyperv_flush_guest_mapping_range(u64 as,
u64 status;
unsigned long flags;
int ret = -ENOTSUPP;
- int gpa_n = 0;
+ int batch_size, gpa_n = 0;
if (!hv_hypercall_pg || !fill_flush_list_func)
goto fault;
local_irq_save(flags);
- flush = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+ batch_size = hv_setup_in_array(&flush, sizeof(*flush),
+ sizeof(flush->gpa_list[0]));
if (unlikely(!flush)) {
local_irq_restore(flags);
goto fault;
}
flush->address_space = as;
- flush->flags = 0;
-
gpa_n = fill_flush_list_func(flush, data);
- if (gpa_n < 0) {
+ if (gpa_n < 0 || gpa_n > batch_size) {
local_irq_restore(flags);
goto fault;
}
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 1be7f6a02304..5590b5fe318c 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -557,7 +557,7 @@ union hv_gpa_page_range {
struct hv_guest_mapping_flush_list {
u64 address_space;
u64 flags;
- union hv_gpa_page_range gpa_list[HV_MAX_FLUSH_REP_COUNT];
+ union hv_gpa_page_range gpa_list[];
};
struct hv_tlb_flush { /* HV_INPUT_FLUSH_VIRTUAL_ADDRESS_LIST */
@@ -1244,7 +1244,7 @@ struct hv_gpa_range_for_visibility {
u32 host_visibility : 2;
u32 reserved0 : 30;
u32 reserved1;
- u64 gpa_page_list[HV_MAX_MODIFY_GPA_REP_COUNT];
+ u64 gpa_page_list[];
} __packed;
#if defined(CONFIG_X86)
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
` (2 preceding siblings ...)
2025-07-18 4:55 ` [PATCH v4 3/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2 mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-07-28 17:02 ` Nuno Das Neves
2025-07-18 4:55 ` [PATCH v4 5/7] PCI: " mhkelley58
` (4 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
Update hypercall call sites to use the new hv_setup_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant zero'ing of
input fields.
In hv_post_message(), the payload area is treated as an array to
avoid wasting cycles on zero'ing it and then overwriting with
memcpy().
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
Notes:
Changes in v4:
* Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
* Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
[Easwar Hariharan]
Changes in v3:
* Removed change to definition of struct hv_input_post_message so the
'payload' remains a fixed size array. Adjust hv_post_message() so
that the 'payload' array is not zero'ed. [Nuno Das Neves]
* Added check of the batch size in hv_free_page_report(). [Nuno Das Neves]
* In hv_call_deposit_pages(), use the new hv_hvcall_in_batch_size() to
get the batch size at the start of the function, and check the
'num_pages' input parameter against that batch size instead of against
a separately defined constant. Also use the batch size to compute the
size of the memory allocation. [Nuno Das Neves]
drivers/hv/hv.c | 4 +++-
drivers/hv/hv_balloon.c | 8 ++++----
drivers/hv/hv_common.c | 9 +++------
drivers/hv/hv_proc.c | 23 ++++++++++-------------
4 files changed, 20 insertions(+), 24 deletions(-)
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index b14c5f9e0ef2..ad063f535f95 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -66,7 +66,9 @@ int hv_post_message(union hv_connection_id connection_id,
if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
aligned_msg = this_cpu_ptr(hv_context.cpu_context)->post_msg_page;
else
- aligned_msg = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ hv_setup_in_array(&aligned_msg,
+ offsetof(typeof(*aligned_msg), payload),
+ sizeof(aligned_msg->payload[0]));
aligned_msg->connectionid = connection_id;
aligned_msg->reserved = 0;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 2b4080e51f97..d9b569b204d2 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -1577,21 +1577,21 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
{
unsigned long flags;
struct hv_memory_hint *hint;
- int i, order;
+ int i, order, batch_size;
u64 status;
struct scatterlist *sg;
- WARN_ON_ONCE(nents > HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES);
WARN_ON_ONCE(sgl->length < (HV_HYP_PAGE_SIZE << page_reporting_order));
local_irq_save(flags);
- hint = *this_cpu_ptr(hyperv_pcpu_input_arg);
+
+ batch_size = hv_setup_in_array(&hint, sizeof(*hint), sizeof(hint->ranges[0]));
if (!hint) {
local_irq_restore(flags);
return -ENOSPC;
}
+ WARN_ON_ONCE(nents > batch_size);
hint->heat_type = HV_EXTMEM_HEAT_HINT_COLD_DISCARD;
- hint->reserved = 0;
for_each_sg(sgl, sg, nents, i) {
union hv_gpa_page_range *range;
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 49898d10faff..ae56397af1ed 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -267,7 +267,7 @@ void __init hv_get_partition_id(void)
u64 status, pt_id;
local_irq_save(flags);
- output = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ hv_setup_inout(NULL, 0, &output, sizeof(*output));
status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output);
pt_id = output->partition_id;
local_irq_restore(flags);
@@ -288,13 +288,10 @@ u8 __init get_vtl(void)
u64 ret;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
- memset(input, 0, struct_size(input, names, 1));
+ hv_setup_inout_array(&input, sizeof(*input), sizeof(input->names[0]),
+ &output, sizeof(*output), sizeof(output->values[0]));
input->partition_id = HV_PARTITION_ID_SELF;
input->vp_index = HV_VP_INDEX_SELF;
- input->input_vtl.as_uint8 = 0;
input->names[0] = HV_REGISTER_VSM_VP_STATUS;
ret = hv_do_hypercall(control, input, output);
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index fbb4eb3901bb..bd63830b4141 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -9,12 +9,6 @@
#include <linux/export.h>
#include <asm/mshyperv.h>
-/*
- * See struct hv_deposit_memory. The first u64 is partition ID, the rest
- * are GPAs.
- */
-#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
-
/* Deposits exact number of pages. Must be called with interrupts enabled. */
int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
{
@@ -25,11 +19,13 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
int order;
u64 status;
int ret;
- u64 base_pfn;
+ u64 base_pfn, batch_size;
struct hv_deposit_memory *input_page;
unsigned long flags;
- if (num_pages > HV_DEPOSIT_MAX)
+ batch_size = hv_get_input_batch_size(sizeof(*input_page),
+ sizeof(input_page->gpa_page_list[0]));
+ if (num_pages > batch_size)
return -E2BIG;
if (!num_pages)
return 0;
@@ -40,7 +36,7 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
return -ENOMEM;
pages = page_address(page);
- counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
+ counts = kcalloc(batch_size, sizeof(int), GFP_KERNEL);
if (!counts) {
free_page((unsigned long)pages);
return -ENOMEM;
@@ -74,7 +70,9 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
local_irq_save(flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ /* Batch size is checked at the start of function; no need to repeat */
+ hv_setup_in_array(&input_page, sizeof(*input_page),
+ sizeof(input_page->gpa_page_list[0]));
input_page->partition_id = partition_id;
@@ -126,9 +124,8 @@ int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
/* We don't do anything with the output right now */
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+ hv_setup_inout(&input, sizeof(*input), &output, sizeof(*output));
input->lp_index = lp_index;
input->apic_id = apic_id;
@@ -169,7 +166,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
do {
local_irq_save(irq_flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ hv_setup_in(&input, sizeof(*input));
input->partition_id = partition_id;
input->vp_index = vp_index;
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v4 5/7] PCI: hv: Use hv_setup_*() to set up hypercall arguments
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
` (3 preceding siblings ...)
2025-07-18 4:55 ` [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-07-28 17:12 ` Nuno Das Neves
2025-07-18 4:55 ` [PATCH v4 6/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv code mhkelley58
` (3 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
Update hypercall call sites to use the new hv_setup_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset().
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
Notes:
Changes in v4:
* Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
* Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
[Easwar Hariharan]
Changes in v3:
* Removed change to definition of struct hv_mmio_write_input so it remains
consistent with original Hyper-V definitions. Adjusted argument to
hv_hvcall_in_array() accordingly so that the 64 byte 'data' array is
not zero'ed. [Nuno Das Neves]
Changes in v2:
* In hv_arch_irq_unmask(), added check of the number of computed banks
in the hv_vpset against the batch_size. Since an hv_vpset currently
represents a maximum of 4096 CPUs, the hv_vpset size does not exceed
512 bytes and there should always be sufficent space. But do the
check just in case something changes. [Nuno Das Neves]
drivers/pci/controller/pci-hyperv.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index d2b7e8ea710b..79de85d1d68b 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -620,7 +620,7 @@ static void hv_irq_retarget_interrupt(struct irq_data *data)
struct pci_dev *pdev;
unsigned long flags;
u32 var_size = 0;
- int cpu, nr_bank;
+ int cpu, nr_bank, batch_size;
u64 res;
dest = irq_data_get_effective_affinity_mask(data);
@@ -636,8 +636,8 @@ static void hv_irq_retarget_interrupt(struct irq_data *data)
local_irq_save(flags);
- params = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(params, 0, sizeof(*params));
+ batch_size = hv_setup_in_array(¶ms, sizeof(*params),
+ sizeof(params->int_target.vp_set.bank_contents[0]));
params->partition_id = HV_PARTITION_ID_SELF;
params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff;
@@ -669,7 +669,7 @@ static void hv_irq_retarget_interrupt(struct irq_data *data)
nr_bank = cpumask_to_vpset(¶ms->int_target.vp_set, tmp);
free_cpumask_var(tmp);
- if (nr_bank <= 0) {
+ if (nr_bank <= 0 || nr_bank > batch_size) {
res = 1;
goto out;
}
@@ -1102,11 +1102,9 @@ static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
/*
* Must be called with interrupts disabled so it is safe
- * to use the per-cpu input argument page. Use it for
- * both input and output.
+ * to use the per-cpu argument page.
*/
- in = *this_cpu_ptr(hyperv_pcpu_input_arg);
- out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
+ hv_setup_inout(&in, sizeof(*in), &out, sizeof(*out));
in->gpa = gpa;
in->size = size;
@@ -1135,9 +1133,9 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
/*
* Must be called with interrupts disabled so it is safe
- * to use the per-cpu input argument memory.
+ * to use the per-cpu argument page.
*/
- in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ hv_setup_in_array(&in, offsetof(typeof(*in), data), sizeof(in->data[0]));
in->gpa = gpa;
in->size = size;
switch (size) {
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v4 6/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv code
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
` (4 preceding siblings ...)
2025-07-18 4:55 ` [PATCH v4 5/7] PCI: " mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-08-13 22:18 ` Nuno Das Neves
2025-07-18 4:55 ` [PATCH v4 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
` (2 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
Update hypercall call sites to use the new hv_setup_*() functions
to set up hypercall arguments. Since these functions zero the
fixed portion of input memory, remove now redundant calls to memset()
and explicit zero'ing of input fields. Where feasible use batch size
returned by hv_setup_inout_array() instead of separate #define value.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
Notes:
Changes in v4:
* Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
* Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
[Easwar Hariharan]
Changes in v3:
* This patch is new in v3 due to rebasing on 6.15-rc1, which has new
mshv-related hypercalls.
drivers/hv/mshv_common.c | 31 +++------
drivers/hv/mshv_root_hv_call.c | 121 +++++++++++++--------------------
drivers/hv/mshv_root_main.c | 5 +-
3 files changed, 60 insertions(+), 97 deletions(-)
diff --git a/drivers/hv/mshv_common.c b/drivers/hv/mshv_common.c
index 6f227a8a5af7..94f8de66c096 100644
--- a/drivers/hv/mshv_common.c
+++ b/drivers/hv/mshv_common.c
@@ -17,12 +17,6 @@
#include "mshv.h"
-#define HV_GET_REGISTER_BATCH_SIZE \
- (HV_HYP_PAGE_SIZE / sizeof(union hv_register_value))
-#define HV_SET_REGISTER_BATCH_SIZE \
- ((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_set_vp_registers)) \
- / sizeof(struct hv_register_assoc))
-
int hv_call_get_vp_registers(u32 vp_index, u64 partition_id, u16 count,
union hv_input_vtl input_vtl,
struct hv_register_assoc *registers)
@@ -30,24 +24,23 @@ int hv_call_get_vp_registers(u32 vp_index, u64 partition_id, u16 count,
struct hv_input_get_vp_registers *input_page;
union hv_register_value *output_page;
u16 completed = 0;
- unsigned long remaining = count;
+ unsigned long batch_size, remaining = count;
int rep_count, i;
u64 status = HV_STATUS_SUCCESS;
unsigned long flags;
local_irq_save(flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
+ batch_size = hv_setup_inout_array(&input_page, sizeof(*input_page),
+ sizeof(input_page->names[0]),
+ &output_page, 0, sizeof(*output_page));
input_page->partition_id = partition_id;
input_page->vp_index = vp_index;
input_page->input_vtl.as_uint8 = input_vtl.as_uint8;
- input_page->rsvd_z8 = 0;
- input_page->rsvd_z16 = 0;
while (remaining) {
- rep_count = min(remaining, HV_GET_REGISTER_BATCH_SIZE);
+ rep_count = min(remaining, batch_size);
for (i = 0; i < rep_count; ++i)
input_page->names[i] = registers[i].name;
@@ -76,21 +69,19 @@ int hv_call_set_vp_registers(u32 vp_index, u64 partition_id, u16 count,
struct hv_input_set_vp_registers *input_page;
u16 completed = 0;
unsigned long remaining = count;
- int rep_count;
+ unsigned long rep_count, batch_size;
u64 status = HV_STATUS_SUCCESS;
unsigned long flags;
local_irq_save(flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+ batch_size = hv_setup_in_array(&input_page, sizeof(*input_page),
+ sizeof(input_page->elements[0]));
input_page->partition_id = partition_id;
input_page->vp_index = vp_index;
input_page->input_vtl.as_uint8 = input_vtl.as_uint8;
- input_page->rsvd_z8 = 0;
- input_page->rsvd_z16 = 0;
while (remaining) {
- rep_count = min(remaining, HV_SET_REGISTER_BATCH_SIZE);
+ rep_count = min(remaining, batch_size);
memcpy(input_page->elements, registers,
sizeof(struct hv_register_assoc) * rep_count);
@@ -120,9 +111,7 @@ int hv_call_get_partition_property(u64 partition_id,
struct hv_output_get_partition_property *output;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_inout(&input, sizeof(*input), &output, sizeof(*output));
input->partition_id = partition_id;
input->property_code = property_code;
status = hv_do_hypercall(HVCALL_GET_PARTITION_PROPERTY, input, output);
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index c9c274f29c3c..7217310b8fc3 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -22,22 +22,6 @@
#define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
#define HV_WITHDRAW_BATCH_SIZE (HV_HYP_PAGE_SIZE / sizeof(u64))
-#define HV_MAP_GPA_BATCH_SIZE \
- ((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_map_gpa_pages)) \
- / sizeof(u64))
-#define HV_GET_VP_STATE_BATCH_SIZE \
- ((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_get_vp_state)) \
- / sizeof(u64))
-#define HV_SET_VP_STATE_BATCH_SIZE \
- ((HV_HYP_PAGE_SIZE - sizeof(struct hv_input_set_vp_state)) \
- / sizeof(u64))
-#define HV_GET_GPA_ACCESS_STATES_BATCH_SIZE \
- ((HV_HYP_PAGE_SIZE - sizeof(union hv_gpa_page_access_state)) \
- / sizeof(union hv_gpa_page_access_state))
-#define HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT \
- ((HV_HYP_PAGE_SIZE - \
- sizeof(struct hv_input_modify_sparse_spa_page_host_access)) / \
- sizeof(u64))
int hv_call_withdraw_memory(u64 count, int node, u64 partition_id)
{
@@ -58,9 +42,7 @@ int hv_call_withdraw_memory(u64 count, int node, u64 partition_id)
while (remaining) {
local_irq_save(flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
- memset(input_page, 0, sizeof(*input_page));
+ hv_setup_in(&input_page, sizeof(*input_page));
input_page->partition_id = partition_id;
status = hv_do_rep_hypercall(HVCALL_WITHDRAW_MEMORY,
min(remaining, HV_WITHDRAW_BATCH_SIZE),
@@ -99,10 +81,7 @@ int hv_call_create_partition(u64 flags,
do {
local_irq_save(irq_flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
- memset(input, 0, sizeof(*input));
+ hv_setup_inout(&input, sizeof(*input), &output, sizeof(*output));
input->flags = flags;
input->compatibility_version = HV_COMPATIBILITY_21_H2;
@@ -206,11 +185,12 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
while (done < page_count) {
ulong i, completed, remain = page_count - done;
- int rep_count = min(remain, HV_MAP_GPA_BATCH_SIZE);
+ ulong rep_count, batch_size;
local_irq_save(irq_flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
+ batch_size = hv_setup_in_array(&input_page, sizeof(*input_page),
+ sizeof(input_page->source_gpa_page_list[0]));
+ rep_count = min(remain, batch_size);
input_page->target_partition_id = partition_id;
input_page->target_gpa_base = gfn + (done << large_shift);
input_page->map_flags = flags;
@@ -311,7 +291,7 @@ int hv_call_unmap_gpa_pages(u64 partition_id, u64 gfn, u64 page_count_4k,
int rep_count = min(remain, HV_UMAP_GPA_PAGES);
local_irq_save(irq_flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ hv_setup_in(&input_page, sizeof(*input_page));
input_page->target_partition_id = partition_id;
input_page->target_gpa_base = gfn + (done << large_shift);
@@ -340,7 +320,7 @@ int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
struct hv_input_get_gpa_pages_access_state *input_page;
union hv_gpa_page_access_state *output_page;
int completed = 0;
- unsigned long remaining = count;
+ unsigned long batch_size, remaining = count;
int rep_count, i;
u64 status = 0;
unsigned long flags;
@@ -348,13 +328,13 @@ int hv_call_get_gpa_access_states(u64 partition_id, u32 count, u64 gpa_base_pfn,
*written_total = 0;
while (remaining) {
local_irq_save(flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
+ batch_size = hv_setup_inout_array(&input_page, sizeof(*input_page),
+ 0, &output_page, 0, sizeof(*output_page));
input_page->partition_id = partition_id;
input_page->hv_gpa_page_number = gpa_base_pfn + *written_total;
input_page->flags = state_flags;
- rep_count = min(remaining, HV_GET_GPA_ACCESS_STATES_BATCH_SIZE);
+ rep_count = min(remaining, batch_size);
status = hv_do_rep_hypercall(HVCALL_GET_GPA_PAGES_ACCESS_STATES, rep_count,
0, input_page, output_page);
@@ -384,8 +364,7 @@ int hv_call_assert_virtual_interrupt(u64 partition_id, u32 vector,
u64 status;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
input->partition_id = partition_id;
input->vector = vector;
input->dest_addr = dest_addr;
@@ -422,21 +401,21 @@ int hv_call_get_vp_state(u32 vp_index, u64 partition_id,
u64 status;
int i;
u64 control;
- unsigned long flags;
+ unsigned long flags, batch_size;
int ret = 0;
- if (page_count > HV_GET_VP_STATE_BATCH_SIZE)
- return -EINVAL;
-
if (!page_count && !ret_output)
return -EINVAL;
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
- memset(input, 0, sizeof(*input));
- memset(output, 0, sizeof(*output));
+ batch_size = hv_setup_inout_array(&input, sizeof(*input),
+ sizeof(input->output_data_pfns[0]),
+ &output, sizeof(*output), 0);
+ if (page_count > batch_size) {
+ local_irq_restore(flags);
+ return -EINVAL;
+ }
input->partition_id = partition_id;
input->vp_index = vp_index;
@@ -478,11 +457,7 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
unsigned long flags;
int ret = 0;
u16 varhead_sz;
-
- if (page_count > HV_SET_VP_STATE_BATCH_SIZE)
- return -EINVAL;
- if (sizeof(*input) + num_bytes > HV_HYP_PAGE_SIZE)
- return -EINVAL;
+ u64 batch_size;
if (num_bytes)
/* round up to 8 and divide by 8 */
@@ -494,18 +469,26 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
- input->partition_id = partition_id;
- input->vp_index = vp_index;
- input->state_data = state_data;
if (num_bytes) {
+ batch_size = hv_setup_in_array(&input, sizeof(*input),
+ sizeof(input->data[0].bytes));
+ if (num_bytes > batch_size)
+ goto size_error;
+
memcpy((u8 *)input->data, bytes, num_bytes);
} else {
+ batch_size = hv_setup_in_array(&input, sizeof(*input),
+ sizeof(input->data[0].pfns));
+ if (page_count > batch_size)
+ goto size_error;
+
for (i = 0; i < page_count; i++)
input->data[i].pfns = page_to_pfn(pages[i]);
}
+ input->partition_id = partition_id;
+ input->vp_index = vp_index;
+ input->state_data = state_data;
control = (HVCALL_SET_VP_STATE) |
(varhead_sz << HV_HYPERCALL_VARHEAD_OFFSET);
@@ -524,6 +507,10 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
} while (!ret);
return ret;
+
+size_error:
+ local_irq_restore(flags);
+ return -EINVAL;
}
int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
@@ -539,8 +526,7 @@ int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+ hv_setup_inout(&input, sizeof(*input), &output, sizeof(*output));
input->partition_id = partition_id;
input->vp_index = vp_index;
@@ -574,9 +560,7 @@ int hv_call_unmap_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
input->partition_id = partition_id;
input->vp_index = vp_index;
@@ -614,8 +598,7 @@ hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
input->port_partition_id = port_partition_id;
input->port_id = port_id;
@@ -668,8 +651,7 @@ hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
input->port_partition_id = port_partition_id;
input->port_id = port_id;
input->connection_partition_id = connection_partition_id;
@@ -736,10 +718,7 @@ int hv_call_map_stat_page(enum hv_stats_object_type type,
do {
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
-
- memset(input, 0, sizeof(*input));
+ hv_setup_inout(&input, sizeof(*input), &output, sizeof(*output));
input->type = type;
input->identity = *identity;
@@ -773,9 +752,7 @@ int hv_call_unmap_stat_page(enum hv_stats_object_type type,
u64 status;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
-
- memset(input, 0, sizeof(*input));
+ hv_setup_in(&input, sizeof(*input));
input->type = type;
input->identity = *identity;
@@ -808,14 +785,14 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
}
while (done < page_count) {
- ulong i, completed, remain = page_count - done;
- int rep_count = min(remain,
- HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT);
+ ulong i, batch_size, completed, remain = page_count - done;
+ ulong rep_count;
local_irq_save(irq_flags);
- input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
+ batch_size = hv_setup_in_array(&input_page, sizeof(*input_page),
+ sizeof(input_page->spa_page_list[0]));
+ rep_count = min(remain, batch_size);
- memset(input_page, 0, sizeof(*input_page));
/* Only set the partition id if you are making the pages
* exclusive
*/
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 72df774e410a..aca3331ad516 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2051,11 +2051,8 @@ static int __init hv_retrieve_scheduler_type(enum hv_scheduler_type *out)
u64 status;
local_irq_save(flags);
- input = *this_cpu_ptr(hyperv_pcpu_input_arg);
- output = *this_cpu_ptr(hyperv_pcpu_output_arg);
- memset(input, 0, sizeof(*input));
- memset(output, 0, sizeof(*output));
+ hv_setup_inout(&input, sizeof(*input), &output, sizeof(*output));
input->property_id = HV_SYSTEM_PROPERTY_SCHEDULER_TYPE;
status = hv_do_hypercall(HVCALL_GET_SYSTEM_PROPERTY, input, output);
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH v4 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
` (5 preceding siblings ...)
2025-07-18 4:55 ` [PATCH v4 6/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv code mhkelley58
@ 2025-07-18 4:55 ` mhkelley58
2025-08-13 22:20 ` Nuno Das Neves
2025-07-18 16:33 ` [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args Easwar Hariharan
2025-08-12 23:58 ` Wei Liu
8 siblings, 1 reply; 26+ messages in thread
From: mhkelley58 @ 2025-07-18 4:55 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
From: Michael Kelley <mhklinux@outlook.com>
All open coded uses of hyperv_pcpu_input_arg and hyperv_pcpu_ouput_arg
have been replaced by hv_setup_*() functions. So combine
hyperv_pcpu_input_arg and hyperv_pcpu_output_arg in a single
hyperv_pcpu_arg. Remove logic for managing a separate output arg. Fixup
comment references to the old variable names.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
arch/x86/hyperv/hv_init.c | 6 ++--
drivers/hv/hv.c | 2 +-
drivers/hv/hv_common.c | 55 ++++++++++------------------------
drivers/hv/hyperv_vmbus.h | 2 +-
include/asm-generic/mshyperv.h | 6 +---
5 files changed, 22 insertions(+), 49 deletions(-)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b7a2877c2a92..2979d15223cf 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -453,16 +453,16 @@ void __init hyperv_init(void)
* A TDX VM with no paravisor only uses TDX GHCI rather than hv_hypercall_pg:
* when the hypercall input is a page, such a VM must pass a decrypted
* page to Hyper-V, e.g. hv_post_message() uses the per-CPU page
- * hyperv_pcpu_input_arg, which is decrypted if no paravisor is present.
+ * hyperv_pcpu_arg, which is decrypted if no paravisor is present.
*
* A TDX VM with the paravisor uses hv_hypercall_pg for most hypercalls,
* which are handled by the paravisor and the VM must use an encrypted
- * input page: in such a VM, the hyperv_pcpu_input_arg is encrypted and
+ * input page: in such a VM, the hyperv_pcpu_arg is encrypted and
* used in the hypercalls, e.g. see hv_mark_gpa_visibility() and
* hv_arch_irq_unmask(). Such a VM uses TDX GHCI for two hypercalls:
* 1. HVCALL_SIGNAL_EVENT: see vmbus_set_event() and _hv_do_fast_hypercall8().
* 2. HVCALL_POST_MESSAGE: the input page must be a decrypted page, i.e.
- * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_input_arg;
+ * hv_post_message() in such a VM can't use the encrypted hyperv_pcpu_arg;
* instead, hv_post_message() uses the post_msg_page, which is decrypted
* in such a VM and is only used in such a VM.
*/
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index ad063f535f95..3b5dfdecdfe7 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -60,7 +60,7 @@ int hv_post_message(union hv_connection_id connection_id,
/*
* A TDX VM with the paravisor must use the decrypted post_msg_page: see
* the comment in struct hv_per_cpu_context. A SNP VM with the paravisor
- * can use the encrypted hyperv_pcpu_input_arg because it copies the
+ * can use the encrypted hyperv_pcpu_arg because it copies the
* input into the GHCB page, which has been decrypted by the paravisor.
*/
if (hv_isolation_type_tdx() && ms_hyperv.paravisor_present)
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index ae56397af1ed..cbe4a954ad46 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -58,11 +58,8 @@ EXPORT_SYMBOL_GPL(hv_vp_index);
u32 hv_max_vp_index;
EXPORT_SYMBOL_GPL(hv_max_vp_index);
-void * __percpu *hyperv_pcpu_input_arg;
-EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
-
-void * __percpu *hyperv_pcpu_output_arg;
-EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
+void * __percpu *hyperv_pcpu_arg;
+EXPORT_SYMBOL_GPL(hyperv_pcpu_arg);
static void hv_kmsg_dump_unregister(void);
@@ -95,11 +92,8 @@ void __init hv_common_free(void)
kfree(hv_vp_index);
hv_vp_index = NULL;
- free_percpu(hyperv_pcpu_output_arg);
- hyperv_pcpu_output_arg = NULL;
-
- free_percpu(hyperv_pcpu_input_arg);
- hyperv_pcpu_input_arg = NULL;
+ free_percpu(hyperv_pcpu_arg);
+ hyperv_pcpu_arg = NULL;
free_percpu(hv_synic_eventring_tail);
hv_synic_eventring_tail = NULL;
@@ -255,11 +249,6 @@ static void hv_kmsg_dump_register(void)
}
}
-static inline bool hv_output_page_exists(void)
-{
- return hv_root_partition() || IS_ENABLED(CONFIG_HYPERV_VTL_MODE);
-}
-
void __init hv_get_partition_id(void)
{
struct hv_output_get_partition_id *output;
@@ -365,14 +354,8 @@ int __init hv_common_init(void)
* (per-CPU) hypercall input page and thus this failure is
* fatal on Hyper-V.
*/
- hyperv_pcpu_input_arg = alloc_percpu(void *);
- BUG_ON(!hyperv_pcpu_input_arg);
-
- /* Allocate the per-CPU state for output arg for root */
- if (hv_output_page_exists()) {
- hyperv_pcpu_output_arg = alloc_percpu(void *);
- BUG_ON(!hyperv_pcpu_output_arg);
- }
+ hyperv_pcpu_arg = alloc_percpu(void *);
+ BUG_ON(!hyperv_pcpu_arg);
if (hv_root_partition()) {
hv_synic_eventring_tail = alloc_percpu(u8 *);
@@ -466,33 +449,28 @@ void __init ms_hyperv_late_init(void)
int hv_common_cpu_init(unsigned int cpu)
{
- void **inputarg, **outputarg;
+ void **inputarg;
u8 **synic_eventring_tail;
u64 msr_vp_index;
gfp_t flags;
- const int pgcount = hv_output_page_exists() ? 2 : 1;
+ const int pgcount = HV_HVCALL_ARG_PAGES;
void *mem;
int ret = 0;
/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
- inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
+ inputarg = (void **)this_cpu_ptr(hyperv_pcpu_arg);
/*
- * The per-cpu memory is already allocated if this CPU was previously
- * online and then taken offline
+ * hyperv_pcpu_arg memory is already allocated if this CPU was
+ * previously online and then taken offline
*/
if (!*inputarg) {
mem = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
if (!mem)
return -ENOMEM;
- if (hv_output_page_exists()) {
- outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
- *outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
- }
-
if (!ms_hyperv.paravisor_present &&
(hv_isolation_type_snp() || hv_isolation_type_tdx())) {
ret = set_memory_decrypted((unsigned long)mem, pgcount);
@@ -506,13 +484,13 @@ int hv_common_cpu_init(unsigned int cpu)
/*
* In a fully enlightened TDX/SNP VM with more than 64 VPs, if
- * hyperv_pcpu_input_arg is not NULL, set_memory_decrypted() ->
+ * hyperv_pcpu_arg is not NULL, set_memory_decrypted() ->
* ... -> cpa_flush()-> ... -> __send_ipi_mask_ex() tries to
- * use hyperv_pcpu_input_arg as the hypercall input page, which
+ * use hyperv_pcpu_arg as the hypercall input page, which
* must be a decrypted page in such a VM, but the page is still
* encrypted before set_memory_decrypted() returns. Fix this by
* setting *inputarg after the above set_memory_decrypted(): if
- * hyperv_pcpu_input_arg is NULL, __send_ipi_mask_ex() returns
+ * hyperv_pcpu_arg is NULL, __send_ipi_mask_ex() returns
* HV_STATUS_INVALID_PARAMETER immediately, and the function
* hv_send_ipi_mask() falls back to orig_apic.send_IPI_mask(),
* which may be slightly slower than the hypercall, but still
@@ -544,9 +522,8 @@ int hv_common_cpu_die(unsigned int cpu)
{
u8 **synic_eventring_tail;
/*
- * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
- * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
- * may be used by the Hyper-V vPCI driver in reassigning interrupts
+ * The hyperv_pcpu_arg memory is not freed when the CPU goes offline as
+ * it may be used by the Hyper-V vPCI driver in reassigning interrupts
* as part of the offlining process. The interrupt reassignment
* happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
* called this function.
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 0b450e53161e..bef8a57100ec 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -126,7 +126,7 @@ struct hv_per_cpu_context {
/*
* The page is only used in hv_post_message() for a TDX VM (with the
* paravisor) to post a messages to Hyper-V: when such a VM calls
- * HVCALL_POST_MESSAGE, it can't use the hyperv_pcpu_input_arg (which
+ * HVCALL_POST_MESSAGE, it can't use the hyperv_pcpu_arg (which
* is encrypted in such a VM) as the hypercall input page, because
* the input page for HVCALL_POST_MESSAGE must be decrypted in such a
* VM, so post_msg_page (which is decrypted in hv_synic_alloc()) is
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 040c4650f411..3ce5b94ad41e 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -67,8 +67,7 @@ extern bool hv_nested;
extern u64 hv_current_partition_id;
extern enum hv_partition_type hv_curr_partition_type;
-extern void * __percpu *hyperv_pcpu_input_arg;
-extern void * __percpu *hyperv_pcpu_output_arg;
+extern void * __percpu *hyperv_pcpu_arg;
u64 hv_do_hypercall(u64 control, void *inputaddr, void *outputaddr);
u64 hv_do_fast_hypercall8(u16 control, u64 input8);
@@ -155,9 +154,6 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
* Hypercall input and output argument setup
*/
-/* Temporary mapping to be removed at the end of the patch series */
-#define hyperv_pcpu_arg hyperv_pcpu_input_arg
-
/*
* Allocate one page that is shared between input and output args, which is
* sufficient for all current hypercalls. If a future hypercall requires
--
2.25.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
` (6 preceding siblings ...)
2025-07-18 4:55 ` [PATCH v4 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
@ 2025-07-18 16:33 ` Easwar Hariharan
2025-07-18 17:13 ` Michael Kelley
2025-08-12 23:58 ` Wei Liu
8 siblings, 1 reply; 26+ messages in thread
From: Easwar Hariharan @ 2025-07-18 16:33 UTC (permalink / raw)
To: mhklinux
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd, eahariha, x86,
linux-hyperv, linux-kernel, linux-pci, linux-arch
On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> This patch set introduces a new way to manage the use of the per-vCPU
> memory that is usually the input and output arguments to Hyper-V
> hypercalls. Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses this memory to set up the input parameters for
> the hypercall, read the output results after hypercall execution, and
> re-enable interrupts. The open coding of these steps has led to
> inconsistencies, and in some cases, violation of the generic
> requirements for the hypercall input and output as described in the
> Hyper-V Top Level Functional Spec (TLFS)[1]. This patch set introduces
> a new family of inline functions to replace the open coding. The new
> functions encapsulate key aspects of the use of per-vCPU memory for
> hypercall input and output, and ensure that the TLFS requirements are
> met (max size of 1 page each for input and output, no overlap of input
> and output, aligned to 8 bytes, etc.).
>
> With this change, hypercall call sites no longer directly access
> "hyperv_pcpu_input_arg" and "hyperv_pcpu_output_arg". Instead, one of
> a family of new functions provides the per-vCPU memory that a hypercall
> call site uses to set up hypercall input and output areas.
> Conceptually, there is no longer a difference between the "per-vCPU
> input page" and "per-vCPU output page". Only a single per-vCPU page is
> allocated, and it is used to provide both hypercall input and output.
> All current hypercalls can fit their input and output within that single
> page, though the new code allows easy changing to two pages should a
> future hypercall require a full page for each of the input and output.
>
> The new functions always zero the fixed-size portion of the hypercall
> input area (but not any array portion -- see below) so that
> uninitialized memory isn't inadvertently passed to the hypercall.
> Current open-coded hypercall call sites are inconsistent on this point,
> and use of the new functions addresses that inconsistency. The output
> area is not zero'ed by the new code as it is Hyper-V's responsibility
> to provide legal output.
>
> When the input or output (or both) contain an array, the new code
> calculates and returns how many array entries fit within the per-vCPU
> memory page, which is effectively the "batch size" for the hypercall
> processing multiple entries. This batch size can then be used in the
> hypercall control word to specify the repetition count. This
> calculation of the batch size replaces current open coding of the
> batch size, which is prone to errors. Note that the array portion of
> the input area is *not* zero'ed. The arrays are almost always 64-bit
> GPAs or something similar, and zero'ing that much memory seems
> wasteful at runtime when it will all be overwritten. The hypercall
> call site is responsible for ensuring that no part of the array is
> left uninitialized (just as with current code).
>
> The new family of functions is realized as a single inline function
> that handles the most complex case, which is a hypercall with input
> and output, both of which contain arrays. Simpler cases are mapped to
> this most complex case with #define wrappers that provide zero or NULL
> for some arguments. Several of the arguments to this new function
> must be compile-time constants generated by "sizeof()" expressions.
> As such, most of the code in the new function is evaluated by the
> compiler, with the result that the runtime code paths are no longer
> than with the current open coding. An exception is the new code
> generated to zero the fixed-size portion of the input area in cases
> where it was not previously done.
>
> Use of the new function typically (but not always) saves a few lines
> of code at each hypercall call site. This is traded off against the
> lines of code added for the new functions. With code currently
> upstream, the net is an add of about 20 lines of code and comments.
>
> A couple hypercall call sites have requirements that are not 100%
> handled by the new function. These still require some manual open-
> coded adjustment or open-coded batch size calculations -- see the
> individual patches in this series. Suggestions on how to do better
> are welcome.
>
> The patches in the series do the following:
>
> Patch 1: Introduce the new family of functions for assigning hypercall
> input and output arguments.
>
> Patch 2 to 6: Change existing hypercall call sites to use one of the new
> functions. In some cases, tweaks to the hypercall argument data
> structures are necessary, but these tweaks are making the data
> structures more consistent with the overall pattern. These
> 5 patches are independent of each other, and can go in any
> order. The breakup into 5 patches is for ease of review.
>
> Patch 7: Update the name of the variable used to hold the per-vCPU memory
> used for hypercall arguments. Remove code for managing the
> per-vCPU output page.
>
> The new code compiles and runs successfully on x86 and arm64. However,
> basic smoke tests cover only a limited number of hypercall call sites
> that have been modified. I don't have the hardware or Hyper-V
> configurations needed to test running in the Hyper-V root partition
> or running in a VTL other than VTL 0. The related hypercall call sites
> still need to be tested to make sure I didn't break anything. Hopefully
> someone with the necessary configurations and Hyper-V versions can
> help with that testing.
>
> For gcc 9.4.0, I've looked at the generated code for a couple of
> hypercall call sites on both x86 and arm64 to ensure that it boils
> down to the equivalent of the current open coding. I have not looked
> at the generated code for later gcc versions or for Clang/LLVM, but
> there's no reason to expect something worse as the code isn't doing
> anything tricky.
>
> This patch set is built against linux-next20250716.
>
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
>
> Michael Kelley (7):
> Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments
> x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
> x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2
> Drivers: hv: Use hv_setup_*() to set up hypercall arguments
> PCI: hv: Use hv_setup_*() to set up hypercall arguments
> Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv
> code
> Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
>
> arch/x86/hyperv/hv_apic.c | 10 +--
> arch/x86/hyperv/hv_init.c | 12 ++-
> arch/x86/hyperv/hv_vtl.c | 3 +-
> arch/x86/hyperv/irqdomain.c | 17 ++--
> arch/x86/hyperv/ivm.c | 18 ++---
> arch/x86/hyperv/mmu.c | 19 ++---
> arch/x86/hyperv/nested.c | 14 ++--
> drivers/hv/hv.c | 6 +-
> drivers/hv/hv_balloon.c | 8 +-
> drivers/hv/hv_common.c | 64 +++++----------
> drivers/hv/hv_proc.c | 23 +++---
> drivers/hv/hyperv_vmbus.h | 2 +-
> drivers/hv/mshv_common.c | 31 +++----
> drivers/hv/mshv_root_hv_call.c | 121 +++++++++++-----------------
> drivers/hv/mshv_root_main.c | 5 +-
> drivers/pci/controller/pci-hyperv.c | 18 ++---
> include/asm-generic/mshyperv.h | 106 +++++++++++++++++++++++-
> include/hyperv/hvgdk_mini.h | 4 +-
> 18 files changed, 251 insertions(+), 230 deletions(-)
>
Thank you for spinning this version!
For the series,
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-18 16:33 ` [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args Easwar Hariharan
@ 2025-07-18 17:13 ` Michael Kelley
2025-07-18 20:25 ` Easwar Hariharan
0 siblings, 1 reply; 26+ messages in thread
From: Michael Kelley @ 2025-07-18 17:13 UTC (permalink / raw)
To: Easwar Hariharan
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
lpieralisi@kernel.org, kw@linux.com, mani@kernel.org,
robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Friday, July 18, 2025 9:33 AM
>
> On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
[snip]
> >
> > The new code compiles and runs successfully on x86 and arm64. However,
> > basic smoke tests cover only a limited number of hypercall call sites
> > that have been modified. I don't have the hardware or Hyper-V
> > configurations needed to test running in the Hyper-V root partition
> > or running in a VTL other than VTL 0. The related hypercall call sites
> > still need to be tested to make sure I didn't break anything. Hopefully
> > someone with the necessary configurations and Hyper-V versions can
> > help with that testing.
Easwar --
Thanks for reviewing.
Any chance you (or someone else) could do a quick smoke test of this
patch set when running in the Hyper-V root partition, and separately,
when running in VTL2? Some hypercall call sites are modified that
don't get called in normal VTL0 execution. It just needs a quick
verification that nothing is obviously broken for the root partition and
VTL2 cases.
Michael
> >
> > This patch set is built against linux-next20250716.
> >
[snip]
>
> Thank you for spinning this version!
>
> For the series,
>
> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-18 17:13 ` Michael Kelley
@ 2025-07-18 20:25 ` Easwar Hariharan
2025-07-19 1:15 ` Roman Kisel
0 siblings, 1 reply; 26+ messages in thread
From: Easwar Hariharan @ 2025-07-18 20:25 UTC (permalink / raw)
To: Michael Kelley, Nuno Das Neves, Naman Jain, Roman Kisel
Cc: eahariha, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
arnd@arndb.de, x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
On 7/18/2025 10:13 AM, Michael Kelley wrote:
> From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Friday, July 18, 2025 9:33 AM
>>
>> On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
>>> From: Michael Kelley <mhklinux@outlook.com>
>>>
>
> [snip]
>
>>>
>>> The new code compiles and runs successfully on x86 and arm64. However,
>>> basic smoke tests cover only a limited number of hypercall call sites
>>> that have been modified. I don't have the hardware or Hyper-V
>>> configurations needed to test running in the Hyper-V root partition
>>> or running in a VTL other than VTL 0. The related hypercall call sites
>>> still need to be tested to make sure I didn't break anything. Hopefully
>>> someone with the necessary configurations and Hyper-V versions can
>>> help with that testing.
>
> Easwar --
>
> Thanks for reviewing.
>
> Any chance you (or someone else) could do a quick smoke test of this
> patch set when running in the Hyper-V root partition, and separately,
> when running in VTL2? Some hypercall call sites are modified that
> don't get called in normal VTL0 execution. It just needs a quick
> verification that nothing is obviously broken for the root partition and
> VTL2 cases.
>
> Michael
>
I'm working almost entirely in VTL0, so I'd call on Nuno, Naman, and Roman (cc'ed) to help.
- Easwar
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-18 20:25 ` Easwar Hariharan
@ 2025-07-19 1:15 ` Roman Kisel
2025-07-21 2:19 ` Michael Kelley
0 siblings, 1 reply; 26+ messages in thread
From: Roman Kisel @ 2025-07-19 1:15 UTC (permalink / raw)
To: Easwar Hariharan, Michael Kelley, Nuno Das Neves, Naman Jain
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
lpieralisi@kernel.org, kw@linux.com, mani@kernel.org,
robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
On 7/18/2025 1:25 PM, Easwar Hariharan wrote:
> On 7/18/2025 10:13 AM, Michael Kelley wrote:
>> From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Friday, July 18, 2025 9:33 AM
>>>
>>> On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
>>>> From: Michael Kelley <mhklinux@outlook.com>
>>>>
>>
>> [snip]
>>
>>>>
>>>> The new code compiles and runs successfully on x86 and arm64. However,
>>>> basic smoke tests cover only a limited number of hypercall call sites
>>>> that have been modified. I don't have the hardware or Hyper-V
>>>> configurations needed to test running in the Hyper-V root partition
>>>> or running in a VTL other than VTL 0. The related hypercall call sites
>>>> still need to be tested to make sure I didn't break anything. Hopefully
>>>> someone with the necessary configurations and Hyper-V versions can
>>>> help with that testing.
>>
>> Easwar --
>>
>> Thanks for reviewing.
>>
>> Any chance you (or someone else) could do a quick smoke test of this
>> patch set when running in the Hyper-V root partition, and separately,
>> when running in VTL2? Some hypercall call sites are modified that
>> don't get called in normal VTL0 execution. It just needs a quick
>> verification that nothing is obviously broken for the root partition and
>> VTL2 cases.
>>
>> Michael
>>
>
> I'm working almost entirely in VTL0, so I'd call on Nuno, Naman, and Roman (cc'ed) to help.
>
Michael,
I'll try to squeeze that in during the next week. Folks should feel free
to beat me to that :) The caveat would be that there are scenarios that
are beyond the capabilities of the hardware that I have readily
available, and would need to run in test clusters in Azure, and these
are pretty busy.
VTL2 currently uses a limited number hypercalls that are set as enabled
in the OpenVMM code (`set_allowed_hypercalls`). You could take a look
and conclude if these hypercalls require any adjustments in the patches.
My opinion has been to have two pages (input and output ones). As the
new code introduces just one page I do feel a bit apprehensive, got no
hard evidence that this is a bad approach though. If we tweak the code
to have 2 pages, perhaps there would be no need to run a full-blown
validation, and even smoke tests will suffice?
> - Easwar
--
Thank you,
Roman
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-19 1:15 ` Roman Kisel
@ 2025-07-21 2:19 ` Michael Kelley
2025-07-21 17:14 ` Roman Kisel
0 siblings, 1 reply; 26+ messages in thread
From: Michael Kelley @ 2025-07-21 2:19 UTC (permalink / raw)
To: Roman Kisel, Easwar Hariharan, Nuno Das Neves, Naman Jain
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
lpieralisi@kernel.org, kw@linux.com, mani@kernel.org,
robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
From: Roman Kisel <romank@linux.microsoft.com> Sent: Friday, July 18, 2025 6:16 PM
>
> On 7/18/2025 1:25 PM, Easwar Hariharan wrote:
> > On 7/18/2025 10:13 AM, Michael Kelley wrote:
> >> From: Easwar Hariharan <eahariha@linux.microsoft.com> Sent: Friday, July 18, 2025
> 9:33 AM
> >>>
> >>> On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
> >>>> From: Michael Kelley <mhklinux@outlook.com>
> >>>>
> >>
> >> [snip]
> >>
> >>>>
> >>>> The new code compiles and runs successfully on x86 and arm64. However,
> >>>> basic smoke tests cover only a limited number of hypercall call sites
> >>>> that have been modified. I don't have the hardware or Hyper-V
> >>>> configurations needed to test running in the Hyper-V root partition
> >>>> or running in a VTL other than VTL 0. The related hypercall call sites
> >>>> still need to be tested to make sure I didn't break anything. Hopefully
> >>>> someone with the necessary configurations and Hyper-V versions can
> >>>> help with that testing.
> >>
> >> Easwar --
> >>
> >> Thanks for reviewing.
> >>
> >> Any chance you (or someone else) could do a quick smoke test of this
> >> patch set when running in the Hyper-V root partition, and separately,
> >> when running in VTL2? Some hypercall call sites are modified that
> >> don't get called in normal VTL0 execution. It just needs a quick
> >> verification that nothing is obviously broken for the root partition and
> >> VTL2 cases.
> >>
> >> Michael
> >>
> >
> > I'm working almost entirely in VTL0, so I'd call on Nuno, Naman, and Roman (cc'ed) to
> help.
> >
>
> Michael,
>
> I'll try to squeeze that in during the next week. Folks should feel free
> to beat me to that :) The caveat would be that there are scenarios that
> are beyond the capabilities of the hardware that I have readily
> available, and would need to run in test clusters in Azure, and these
> are pretty busy.
Thanks for any testing you can do on standalone test machines without
needing test clusters in Azure. It will be hard to get test coverage on
*every* hypercall call site that is modified by the patch set, but doing
basic smoke testing of running in the root partition and in VTL2 will
cover more than I can cover running in a VTL0 guest on my laptop or
in Azure. Fortunately, the changes overall in this patch set are pretty
straightforward, and my testing of VTL0 guests didn’t turn up any bugs.
I'm hoping that additional smoke testing is more about gaining
confidence than finding actual bugs. (Famous last words ....)
> VTL2 currently uses a limited number hypercalls that are set as enabled
> in the OpenVMM code (`set_allowed_hypercalls`). You could take a look
> and conclude if these hypercalls require any adjustments in the patches.
My patch set already covers all the hypercall call sites that originate in
VTL2 code. Again, a basic smoke test should help gain confidence, or
show that any confidence is misplaced :-)
> My opinion has been to have two pages (input and output ones). As the
> new code introduces just one page I do feel a bit apprehensive, got no
> hard evidence that this is a bad approach though. If we tweak the code
> to have 2 pages, perhaps there would be no need to run a full-blown
> validation, and even smoke tests will suffice?
My view is that the 1 page vs. 2 pages is much less of a risk than just
some coding error in introducing the new interfaces. The 1 page vs.
2 pages should only affect the batch size for rep hypercalls, and the
existing code already handles different batch sizes. So I'm not as
concerned about that risk. Wei Liu in the maintainer here, so I'll
certainly follow his judgment and guidance on what is needed to
be confident in this patch set.
Michael
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-21 2:19 ` Michael Kelley
@ 2025-07-21 17:14 ` Roman Kisel
2025-07-21 19:30 ` Michael Kelley
0 siblings, 1 reply; 26+ messages in thread
From: Roman Kisel @ 2025-07-21 17:14 UTC (permalink / raw)
To: Michael Kelley, Easwar Hariharan, Nuno Das Neves, Naman Jain
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
lpieralisi@kernel.org, kw@linux.com, mani@kernel.org,
robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
On 7/20/2025 7:19 PM, Michael Kelley wrote:
> From: Roman Kisel <romank@linux.microsoft.com> Sent: Friday, July 18, 2025 6:16 PM
[...]
>
> Thanks for any testing you can do on standalone test machines without
> needing test clusters in Azure. It will be hard to get test coverage on
> *every* hypercall call site that is modified by the patch set, but doing
> basic smoke testing of running in the root partition and in VTL2 will
> cover more than I can cover running in a VTL0 guest on my laptop or
> in Azure. Fortunately, the changes overall in this patch set are pretty
> straightforward, and my testing of VTL0 guests didn’t turn up any bugs.
> I'm hoping that additional smoke testing is more about gaining
> confidence than finding actual bugs. (Famous last words ....)
>
Thank you a million times for pushing the bar higher and supporting the
code :)
>> VTL2 currently uses a limited number hypercalls that are set as enabled
>> in the OpenVMM code (`set_allowed_hypercalls`). You could take a look
>> and conclude if these hypercalls require any adjustments in the patches.
>
> My patch set already covers all the hypercall call sites that originate in
> VTL2 code. Again, a basic smoke test should help gain confidence, or
> show that any confidence is misplaced :-)
>
Very nice, should be smooth sailing then :)
>> My opinion has been to have two pages (input and output ones). As the
>> new code introduces just one page I do feel a bit apprehensive, got no
>> hard evidence that this is a bad approach though. If we tweak the code
>> to have 2 pages, perhaps there would be no need to run a full-blown
>> validation, and even smoke tests will suffice?
>
> My view is that the 1 page vs. 2 pages is much less of a risk than just
> some coding error in introducing the new interfaces. The 1 page vs.
> 2 pages should only affect the batch size for rep hypercalls, and the
> existing code already handles different batch sizes. So I'm not as
> concerned about that risk. Wei Liu in the maintainer here, so I'll
> certainly follow his judgment and guidance on what is needed to
> be confident in this patch set.
>
I agree with your risk assessment. Perhaps I am playing too much of
a spec lawyer yet it states
1) Input and output area may not intersect,
2) Either can be up to 4KiB of size.
Hence, one (be that for feature development or one-off debugging) would
be within their right to implement a hypercall that accepts 4KiB of
data and returns 4KiB of data. My understanding that after this patch,
that won't work out-of-the-box, and would need some fixing in the
kernel.
Perhaps, we could have a KConfig option to let the user choose if they
need 2 pages instead of making the user figure out what needs to be
fixed in the kernel?
> Michael
--
Thank you,
Roman
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-21 17:14 ` Roman Kisel
@ 2025-07-21 19:30 ` Michael Kelley
0 siblings, 0 replies; 26+ messages in thread
From: Michael Kelley @ 2025-07-21 19:30 UTC (permalink / raw)
To: Roman Kisel, Easwar Hariharan, Nuno Das Neves, Naman Jain
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
lpieralisi@kernel.org, kw@linux.com, mani@kernel.org,
robh@kernel.org, bhelgaas@google.com, arnd@arndb.de,
x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
From: Roman Kisel <romank@linux.microsoft.com> Sent: Monday, July 21, 2025 10:15 AM
>
> On 7/20/2025 7:19 PM, Michael Kelley wrote:
> > From: Roman Kisel <romank@linux.microsoft.com> Sent: Friday, July 18, 2025 6:16
> PM
>
> [...]
>
> >
> > Thanks for any testing you can do on standalone test machines without
> > needing test clusters in Azure. It will be hard to get test coverage on
> > *every* hypercall call site that is modified by the patch set, but doing
> > basic smoke testing of running in the root partition and in VTL2 will
> > cover more than I can cover running in a VTL0 guest on my laptop or
> > in Azure. Fortunately, the changes overall in this patch set are pretty
> > straightforward, and my testing of VTL0 guests didn’t turn up any bugs.
> > I'm hoping that additional smoke testing is more about gaining
> > confidence than finding actual bugs. (Famous last words ....)
> >
>
> Thank you a million times for pushing the bar higher and supporting the
> code :)
>
> >> VTL2 currently uses a limited number hypercalls that are set as enabled
> >> in the OpenVMM code (`set_allowed_hypercalls`). You could take a look
> >> and conclude if these hypercalls require any adjustments in the patches.
> >
> > My patch set already covers all the hypercall call sites that originate in
> > VTL2 code. Again, a basic smoke test should help gain confidence, or
> > show that any confidence is misplaced :-)
> >
>
> Very nice, should be smooth sailing then :)
>
> >> My opinion has been to have two pages (input and output ones). As the
> >> new code introduces just one page I do feel a bit apprehensive, got no
> >> hard evidence that this is a bad approach though. If we tweak the code
> >> to have 2 pages, perhaps there would be no need to run a full-blown
> >> validation, and even smoke tests will suffice?
> >
> > My view is that the 1 page vs. 2 pages is much less of a risk than just
> > some coding error in introducing the new interfaces. The 1 page vs.
> > 2 pages should only affect the batch size for rep hypercalls, and the
> > existing code already handles different batch sizes. So I'm not as
> > concerned about that risk. Wei Liu in the maintainer here, so I'll
> > certainly follow his judgment and guidance on what is needed to
> > be confident in this patch set.
> >
>
> I agree with your risk assessment. Perhaps I am playing too much of
> a spec lawyer yet it states
>
> 1) Input and output area may not intersect,
> 2) Either can be up to 4KiB of size.
>
> Hence, one (be that for feature development or one-off debugging) would
> be within their right to implement a hypercall that accepts 4KiB of
> data and returns 4KiB of data. My understanding that after this patch,
> that won't work out-of-the-box, and would need some fixing in the
> kernel.
>
> Perhaps, we could have a KConfig option to let the user choose if they
> need 2 pages instead of making the user figure out what needs to be
> fixed in the kernel?
>
I'm not seeing the scenario where a Kconfig option is useful. Here's
my thinking:
A putative new hypercall that requires more than 4K for the sum of
the input and output would not be a rep hypercall. So the hypercall
has some large structure for the input and/or the output. Such a
hypercall would be handled one of two ways in the kernel:
(1) New code must be added in the kernel that includes a hypercall
call site to invoke this hypercall. The new code populates the large
input structure, and/or processes the large output structure in order
to accomplish whatever the Linux kernel needs done that the
hypercall helps with.
(2) The hypercall is invoked for the root partition or VTL2 via one
of the existing ioctl's that allow user space in the root partition or
VTL2 to invoke arbitrary hypercalls and get the results.
For (1), new code is being added to the kernel. So changing the
#define HV_HVCALL_ARG_PAGES from "1" to "2" is just part
of the code changes needed for the new hypercall.
For (2), the ioctls, such as mshv_ioctl_passthru_hvcall(), don't
use the pre-allocated hyperv_pcpu_input/output_arg. They do
a separate allocation of a full page for input and a full page for
output because of the complexities of copying the input
and output args from/to user space. So if the putative new
hypercall is invoked via the ioctl, it will just work.
I don't see a scenario where a Kconfig option provides any
value to someone building the kernel. Linux kernel code also
generally eschews adding mechanism to support future
scenarios that might or might not happen. Sure, we design
code to be extensible, but we don't generally add options or
APIs that aren't currently needed. I've consistently gotten
feedback to add such when/if the need actually arises. I
think the #define HV_HVCALL_ARG_PAGES 1 provides the
right balance.
Michael
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments
2025-07-18 4:55 ` [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments mhkelley58
@ 2025-07-28 17:02 ` Nuno Das Neves
2025-07-29 0:23 ` Michael Kelley
0 siblings, 1 reply; 26+ messages in thread
From: Nuno Das Neves @ 2025-07-28 17:02 UTC (permalink / raw)
To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
dave.hansen, hpa, lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
<snip>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 2b4080e51f97..d9b569b204d2 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -1577,21 +1577,21 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
> {
> unsigned long flags;
> struct hv_memory_hint *hint;
> - int i, order;
> + int i, order, batch_size;
> u64 status;
> struct scatterlist *sg;
>
> - WARN_ON_ONCE(nents > HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES);
> WARN_ON_ONCE(sgl->length < (HV_HYP_PAGE_SIZE << page_reporting_order));
> local_irq_save(flags);
> - hint = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + batch_size = hv_setup_in_array(&hint, sizeof(*hint), sizeof(hint->ranges[0]));
> if (!hint) {
> local_irq_restore(flags);
> return -ENOSPC;
> }
> + WARN_ON_ONCE(nents > batch_size);
>
I don't think WARN_ON_ONCE is sufficient here... this looks like a bug in the current code.
The loop below will go out of bounds of the input page if nents is too large.
Ideally this function would be refactored to batch the operation so that this isn't a
problem.
Nuno
> hint->heat_type = HV_EXTMEM_HEAT_HINT_COLD_DISCARD;
> - hint->reserved = 0;
> for_each_sg(sgl, sg, nents, i) {
> union hv_gpa_page_range *range;
>
<snip>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 5/7] PCI: hv: Use hv_setup_*() to set up hypercall arguments
2025-07-18 4:55 ` [PATCH v4 5/7] PCI: " mhkelley58
@ 2025-07-28 17:12 ` Nuno Das Neves
0 siblings, 0 replies; 26+ messages in thread
From: Nuno Das Neves @ 2025-07-28 17:12 UTC (permalink / raw)
To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
dave.hansen, hpa, lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> Update hypercall call sites to use the new hv_setup_*() functions
> to set up hypercall arguments. Since these functions zero the
> fixed portion of input memory, remove now redundant calls to memset().
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>
> Notes:
> Changes in v4:
> * Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
> * Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
> [Easwar Hariharan]
>
> Changes in v3:
> * Removed change to definition of struct hv_mmio_write_input so it remains
> consistent with original Hyper-V definitions. Adjusted argument to
> hv_hvcall_in_array() accordingly so that the 64 byte 'data' array is
> not zero'ed. [Nuno Das Neves]
>
> Changes in v2:
> * In hv_arch_irq_unmask(), added check of the number of computed banks
> in the hv_vpset against the batch_size. Since an hv_vpset currently
> represents a maximum of 4096 CPUs, the hv_vpset size does not exceed
> 512 bytes and there should always be sufficent space. But do the
> check just in case something changes. [Nuno Das Neves]
>
> drivers/pci/controller/pci-hyperv.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index d2b7e8ea710b..79de85d1d68b 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -620,7 +620,7 @@ static void hv_irq_retarget_interrupt(struct irq_data *data)
> struct pci_dev *pdev;
> unsigned long flags;
> u32 var_size = 0;
> - int cpu, nr_bank;
> + int cpu, nr_bank, batch_size;
> u64 res;
>
> dest = irq_data_get_effective_affinity_mask(data);
> @@ -636,8 +636,8 @@ static void hv_irq_retarget_interrupt(struct irq_data *data)
>
> local_irq_save(flags);
>
> - params = *this_cpu_ptr(hyperv_pcpu_input_arg);
> - memset(params, 0, sizeof(*params));
> + batch_size = hv_setup_in_array(¶ms, sizeof(*params),
> + sizeof(params->int_target.vp_set.bank_contents[0]));
> params->partition_id = HV_PARTITION_ID_SELF;
> params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
> params->int_entry.msi_entry.address.as_uint32 = int_desc->address & 0xffffffff;
> @@ -669,7 +669,7 @@ static void hv_irq_retarget_interrupt(struct irq_data *data)
> nr_bank = cpumask_to_vpset(¶ms->int_target.vp_set, tmp);
> free_cpumask_var(tmp);
>
> - if (nr_bank <= 0) {
> + if (nr_bank <= 0 || nr_bank > batch_size) {
> res = 1;
> goto out;
> }
> @@ -1102,11 +1102,9 @@ static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
>
> /*
> * Must be called with interrupts disabled so it is safe
> - * to use the per-cpu input argument page. Use it for
> - * both input and output.
> + * to use the per-cpu argument page.
> */
> - in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> - out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
> + hv_setup_inout(&in, sizeof(*in), &out, sizeof(*out));
> in->gpa = gpa;
> in->size = size;
>
> @@ -1135,9 +1133,9 @@ static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
>
> /*
> * Must be called with interrupts disabled so it is safe
> - * to use the per-cpu input argument memory.
> + * to use the per-cpu argument page.
> */
> - in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + hv_setup_in_array(&in, offsetof(typeof(*in), data), sizeof(in->data[0]));
> in->gpa = gpa;
> in->size = size;
> switch (size) {
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments
2025-07-28 17:02 ` Nuno Das Neves
@ 2025-07-29 0:23 ` Michael Kelley
0 siblings, 0 replies; 26+ messages in thread
From: Michael Kelley @ 2025-07-29 0:23 UTC (permalink / raw)
To: Nuno Das Neves, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
arnd@arndb.de
Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Monday, July 28, 2025 10:03 AM
>
> On 7/17/2025 9:55 PM, mhkelley58@gmail.com wrote:
> <snip>
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index 2b4080e51f97..d9b569b204d2 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -1577,21 +1577,21 @@ static int hv_free_page_report(struct page_reporting_dev_info *pr_dev_info,
> > {
> > unsigned long flags;
> > struct hv_memory_hint *hint;
> > - int i, order;
> > + int i, order, batch_size;
> > u64 status;
> > struct scatterlist *sg;
> >
> > - WARN_ON_ONCE(nents > HV_MEMORY_HINT_MAX_GPA_PAGE_RANGES);
> > WARN_ON_ONCE(sgl->length < (HV_HYP_PAGE_SIZE << page_reporting_order));
> > local_irq_save(flags);
> > - hint = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +
> > + batch_size = hv_setup_in_array(&hint, sizeof(*hint), sizeof(hint->ranges[0]));
> > if (!hint) {
> > local_irq_restore(flags);
> > return -ENOSPC;
> > }
> > + WARN_ON_ONCE(nents > batch_size);
> >
>
> I don't think WARN_ON_ONCE is sufficient here... this looks like a bug in the current
> code. The loop below will go out of bounds of the input page if nents is too large.
>
> Ideally this function would be refactored to batch the operation so that this isn't a
> problem.
Yes, I kept the existing functionality, which is slightly flawed. But there's not a
real problem, because "nents" is always PAGE_REPORTING_CAPACITY (which is
32) or smaller. See page_reporting_cycle(). Furthermore, the HV balloon driver
function enable_page_reporting() has a BUILD_BUG_ON to ensure everything fits.
Adding a batching loop around the hypercall here in hv_free_page_report() seems
like overkill unless PAGE_REPORTING_CAPACITY is changed to something larger.
Hyper-V has room for the value to be as large as 128.
The virtio balloon driver does a similar check, though at runtime, and
virtio_balloon_probe() fails if its capacity isn't at least PAGE_REPORTING_CAPACITY.
The Hyper-V balloon driver could do the same. Tidying this up seems to me
to be a separate patch that's outside the scope of this series.
Michael
>
> Nuno
> > hint->heat_type = HV_EXTMEM_HEAT_HINT_COLD_DISCARD;
> > - hint->reserved = 0;
> > for_each_sg(sgl, sg, nents, i) {
> > union hv_gpa_page_range *range;
> >
> <snip>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
` (7 preceding siblings ...)
2025-07-18 16:33 ` [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args Easwar Hariharan
@ 2025-08-12 23:58 ` Wei Liu
8 siblings, 0 replies; 26+ messages in thread
From: Wei Liu @ 2025-08-12 23:58 UTC (permalink / raw)
To: mhklinux
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd, x86, linux-hyperv,
linux-kernel, linux-pci, linux-arch
On Thu, Jul 17, 2025 at 09:55:38PM -0700, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
[...]
>
> Michael Kelley (7):
> Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments
> x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
> x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2
> Drivers: hv: Use hv_setup_*() to set up hypercall arguments
> PCI: hv: Use hv_setup_*() to set up hypercall arguments
> Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv
> code
> Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
Queued for inclusion into hyperv-next. Thanks!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
2025-07-18 4:55 ` [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1 mhkelley58
@ 2025-08-13 0:22 ` Nuno Das Neves
2025-08-13 0:41 ` Wei Liu
2025-08-14 18:48 ` Michael Kelley
0 siblings, 2 replies; 26+ messages in thread
From: Nuno Das Neves @ 2025-08-13 0:22 UTC (permalink / raw)
To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
dave.hansen, hpa, lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
On 7/17/2025 11:55 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> Update hypercall call sites to use the new hv_setup_*() functions
> to set up hypercall arguments. Since these functions zero the
> fixed portion of input memory, remove now redundant calls to memset()
> and explicit zero'ing of input fields.
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
>
> Notes:
> Changes in v4:
> * Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
> * Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
> [Easwar Hariharan]
>
> Changes in v2:
> * Fixed get_vtl() and hv_vtl_apicid_to_vp_id() to properly treat the input
> and output arguments as arrays [Nuno Das Neves]
> * Enhanced __send_ipi_mask_ex() and hv_map_interrupt() to check the number
> of computed banks in the hv_vpset against the batch_size. Since an
> hv_vpset currently represents a maximum of 4096 CPUs, the hv_vpset size
> does not exceed 512 bytes and there should always be sufficent space. But
> do the check just in case something changes. [Nuno Das Neves]
>
<snip>
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 090f5ac9f492..87ebe43f58cf 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -21,15 +21,15 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> struct hv_device_interrupt_descriptor *intr_desc;
> unsigned long flags;
> u64 status;
> - int nr_bank, var_size;
> + int batch_size, nr_bank, var_size;
>
> local_irq_save(flags);
>
> - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> - output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + batch_size = hv_setup_inout_array(&input, sizeof(*input),
> + sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
> + &output, sizeof(*output), 0);
>
Hi Michael, I finally managed to test this series on (nested) root
partition and encountered an issue when I applied this patch.
With the above change, I saw HV_STATUS_INVALID_ALIGNMENT from this
hypercall. I printed out the addresses and sizes and everything looked
correct. The output seemed to be correctly placed at the end of the
percpu page. E.g. if input was allocated at an address ending in 0x3000,
output would be at 0x3ff0, because hv_output_map_device_interrupt is
0x10 bytes in size.
But it turns out, the definition for hv_output_map_device_interrupt
is out of date (or was never correct)! It should be:
struct hv_output_map_device_interrupt {
struct hv_interrupt_entry interrupt_entry;
u64 extended_status_deprecated[5];
} __packed;
(The "extended_status_deprecated" field is missing in the current code.)
Due to this, when the hypervisor validates the hypercall input/output,
it sees that output is going across a page boundary, because it knows
sizeof(hv_output_map_device_interrupt) is actually 0x58.
I confirmed that adding the "extended_status_deprecated" field fixes the
issue. That should be fixed either as part of this patch or an additional
one.
Nuno
PS. I have yet to test the mshv driver changes in patch 6, I'll try to
do so this week.
> intr_desc = &input->interrupt_descriptor;
> - memset(input, 0, sizeof(*input));
> input->partition_id = hv_current_partition_id;
> input->device_id = device_id.as_uint64;
> intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> @@ -41,7 +41,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> else
> intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
>
> - intr_desc->target.vp_set.valid_bank_mask = 0;
> intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
> if (nr_bank < 0) {
> @@ -49,6 +48,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> pr_err("%s: unable to generate VP set\n", __func__);
> return -EINVAL;
> }
> + if (nr_bank > batch_size) {
> + local_irq_restore(flags);
> + pr_err("%s: nr_bank too large\n", __func__);
> + return -EINVAL;
> + }
> intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
>
> /*
> @@ -78,9 +82,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> u64 status;
>
> local_irq_save(flags);
> - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>
> - memset(input, 0, sizeof(*input));
> + hv_setup_in(&input, sizeof(*input));
> intr_entry = &input->interrupt_entry;
> input->partition_id = hv_current_partition_id;
> input->device_id = id;
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
2025-08-13 0:22 ` Nuno Das Neves
@ 2025-08-13 0:41 ` Wei Liu
2025-08-13 18:22 ` Nuno Das Neves
2025-08-14 18:48 ` Michael Kelley
1 sibling, 1 reply; 26+ messages in thread
From: Wei Liu @ 2025-08-13 0:41 UTC (permalink / raw)
To: Nuno Das Neves
Cc: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
dave.hansen, hpa, lpieralisi, kw, mani, robh, bhelgaas, arnd, x86,
linux-hyperv, linux-kernel, linux-pci, linux-arch
On Tue, Aug 12, 2025 at 05:22:29PM -0700, Nuno Das Neves wrote:
> On 7/17/2025 11:55 PM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > Update hypercall call sites to use the new hv_setup_*() functions
> > to set up hypercall arguments. Since these functions zero the
> > fixed portion of input memory, remove now redundant calls to memset()
> > and explicit zero'ing of input fields.
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> > ---
> >
> > Notes:
> > Changes in v4:
> > * Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
> > * Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
> > [Easwar Hariharan]
> >
> > Changes in v2:
> > * Fixed get_vtl() and hv_vtl_apicid_to_vp_id() to properly treat the input
> > and output arguments as arrays [Nuno Das Neves]
> > * Enhanced __send_ipi_mask_ex() and hv_map_interrupt() to check the number
> > of computed banks in the hv_vpset against the batch_size. Since an
> > hv_vpset currently represents a maximum of 4096 CPUs, the hv_vpset size
> > does not exceed 512 bytes and there should always be sufficent space. But
> > do the check just in case something changes. [Nuno Das Neves]
> >
>
> <snip>
>
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 090f5ac9f492..87ebe43f58cf 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -21,15 +21,15 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > struct hv_device_interrupt_descriptor *intr_desc;
> > unsigned long flags;
> > u64 status;
> > - int nr_bank, var_size;
> > + int batch_size, nr_bank, var_size;
> >
> > local_irq_save(flags);
> >
> > - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > - output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > + batch_size = hv_setup_inout_array(&input, sizeof(*input),
> > + sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
> > + &output, sizeof(*output), 0);
> >
>
> Hi Michael, I finally managed to test this series on (nested) root
> partition and encountered an issue when I applied this patch.
>
> With the above change, I saw HV_STATUS_INVALID_ALIGNMENT from this
> hypercall. I printed out the addresses and sizes and everything looked
> correct. The output seemed to be correctly placed at the end of the
> percpu page. E.g. if input was allocated at an address ending in 0x3000,
> output would be at 0x3ff0, because hv_output_map_device_interrupt is
> 0x10 bytes in size.
>
> But it turns out, the definition for hv_output_map_device_interrupt
> is out of date (or was never correct)! It should be:
>
> struct hv_output_map_device_interrupt {
> struct hv_interrupt_entry interrupt_entry;
> u64 extended_status_deprecated[5];
> } __packed;
>
> (The "extended_status_deprecated" field is missing in the current code.)
>
> Due to this, when the hypervisor validates the hypercall input/output,
> it sees that output is going across a page boundary, because it knows
> sizeof(hv_output_map_device_interrupt) is actually 0x58.
>
> I confirmed that adding the "extended_status_deprecated" field fixes the
> issue. That should be fixed either as part of this patch or an additional
> one.
Thanks for testing this, Nuno. In that case, can you please submit a
patch for hv_output_map_device_interrupt? That can go in via the fixes
tree.
Thanks,
Wei
>
> Nuno
>
> PS. I have yet to test the mshv driver changes in patch 6, I'll try to
> do so this week.
>
> > intr_desc = &input->interrupt_descriptor;
> > - memset(input, 0, sizeof(*input));
> > input->partition_id = hv_current_partition_id;
> > input->device_id = device_id.as_uint64;
> > intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > @@ -41,7 +41,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > else
> > intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> >
> > - intr_desc->target.vp_set.valid_bank_mask = 0;
> > intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
> > if (nr_bank < 0) {
> > @@ -49,6 +48,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > pr_err("%s: unable to generate VP set\n", __func__);
> > return -EINVAL;
> > }
> > + if (nr_bank > batch_size) {
> > + local_irq_restore(flags);
> > + pr_err("%s: nr_bank too large\n", __func__);
> > + return -EINVAL;
> > + }
> > intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
> >
> > /*
> > @@ -78,9 +82,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> > u64 status;
> >
> > local_irq_save(flags);
> > - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >
> > - memset(input, 0, sizeof(*input));
> > + hv_setup_in(&input, sizeof(*input));
> > intr_entry = &input->interrupt_entry;
> > input->partition_id = hv_current_partition_id;
> > input->device_id = id;
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
2025-08-13 0:41 ` Wei Liu
@ 2025-08-13 18:22 ` Nuno Das Neves
0 siblings, 0 replies; 26+ messages in thread
From: Nuno Das Neves @ 2025-08-13 18:22 UTC (permalink / raw)
To: Wei Liu
Cc: mhklinux, kys, haiyangz, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd, x86, linux-hyperv,
linux-kernel, linux-pci, linux-arch
On 8/12/2025 7:41 PM, Wei Liu wrote:
> On Tue, Aug 12, 2025 at 05:22:29PM -0700, Nuno Das Neves wrote:
>> On 7/17/2025 11:55 PM, mhkelley58@gmail.com wrote:
>>> From: Michael Kelley <mhklinux@outlook.com>
>>>
>>> Update hypercall call sites to use the new hv_setup_*() functions
>>> to set up hypercall arguments. Since these functions zero the
>>> fixed portion of input memory, remove now redundant calls to memset()
>>> and explicit zero'ing of input fields.
>>>
>>> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
>>> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>>> ---
>>>
>>> Notes:
>>> Changes in v4:
>>> * Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
>>> * Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
>>> [Easwar Hariharan]
>>>
>>> Changes in v2:
>>> * Fixed get_vtl() and hv_vtl_apicid_to_vp_id() to properly treat the input
>>> and output arguments as arrays [Nuno Das Neves]
>>> * Enhanced __send_ipi_mask_ex() and hv_map_interrupt() to check the number
>>> of computed banks in the hv_vpset against the batch_size. Since an
>>> hv_vpset currently represents a maximum of 4096 CPUs, the hv_vpset size
>>> does not exceed 512 bytes and there should always be sufficent space. But
>>> do the check just in case something changes. [Nuno Das Neves]
>>>
>>
>> <snip>
>>
>>> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
>>> index 090f5ac9f492..87ebe43f58cf 100644
>>> --- a/arch/x86/hyperv/irqdomain.c
>>> +++ b/arch/x86/hyperv/irqdomain.c
>>> @@ -21,15 +21,15 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>>> struct hv_device_interrupt_descriptor *intr_desc;
>>> unsigned long flags;
>>> u64 status;
>>> - int nr_bank, var_size;
>>> + int batch_size, nr_bank, var_size;
>>>
>>> local_irq_save(flags);
>>>
>>> - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>>> - output = *this_cpu_ptr(hyperv_pcpu_output_arg);
>>> + batch_size = hv_setup_inout_array(&input, sizeof(*input),
>>> + sizeof(input->interrupt_descriptor.target.vp_set.bank_contents[0]),
>>> + &output, sizeof(*output), 0);
>>>
>>
>> Hi Michael, I finally managed to test this series on (nested) root
>> partition and encountered an issue when I applied this patch.
>>
>> With the above change, I saw HV_STATUS_INVALID_ALIGNMENT from this
>> hypercall. I printed out the addresses and sizes and everything looked
>> correct. The output seemed to be correctly placed at the end of the
>> percpu page. E.g. if input was allocated at an address ending in 0x3000,
>> output would be at 0x3ff0, because hv_output_map_device_interrupt is
>> 0x10 bytes in size.
>>
>> But it turns out, the definition for hv_output_map_device_interrupt
>> is out of date (or was never correct)! It should be:
>>
>> struct hv_output_map_device_interrupt {
>> struct hv_interrupt_entry interrupt_entry;
>> u64 extended_status_deprecated[5];
>> } __packed;
>>
>> (The "extended_status_deprecated" field is missing in the current code.)
>>
>> Due to this, when the hypervisor validates the hypercall input/output,
>> it sees that output is going across a page boundary, because it knows
>> sizeof(hv_output_map_device_interrupt) is actually 0x58.
>>
>> I confirmed that adding the "extended_status_deprecated" field fixes the
>> issue. That should be fixed either as part of this patch or an additional
>> one.
>
> Thanks for testing this, Nuno. In that case, can you please submit a
> patch for hv_output_map_device_interrupt? That can go in via the fixes
> tree.
>
> Thanks,
> Wei
>
Sent the fix:
https://lore.kernel.org/linux-hyperv/1755109257-6893-1-git-send-email-nunodasneves@linux.microsoft.com/T/#u
>>
>> Nuno
>>
>> PS. I have yet to test the mshv driver changes in patch 6, I'll try to
>> do so this week.
>>
>>> intr_desc = &input->interrupt_descriptor;
>>> - memset(input, 0, sizeof(*input));
>>> input->partition_id = hv_current_partition_id;
>>> input->device_id = device_id.as_uint64;
>>> intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
>>> @@ -41,7 +41,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>>> else
>>> intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
>>>
>>> - intr_desc->target.vp_set.valid_bank_mask = 0;
>>> intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
>>> nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
>>> if (nr_bank < 0) {
>>> @@ -49,6 +48,11 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>>> pr_err("%s: unable to generate VP set\n", __func__);
>>> return -EINVAL;
>>> }
>>> + if (nr_bank > batch_size) {
>>> + local_irq_restore(flags);
>>> + pr_err("%s: nr_bank too large\n", __func__);
>>> + return -EINVAL;
>>> + }
>>> intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
>>>
>>> /*
>>> @@ -78,9 +82,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
>>> u64 status;
>>>
>>> local_irq_save(flags);
>>> - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>>>
>>> - memset(input, 0, sizeof(*input));
>>> + hv_setup_in(&input, sizeof(*input));
>>> intr_entry = &input->interrupt_entry;
>>> input->partition_id = hv_current_partition_id;
>>> input->device_id = id;
>>
>>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 6/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv code
2025-07-18 4:55 ` [PATCH v4 6/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv code mhkelley58
@ 2025-08-13 22:18 ` Nuno Das Neves
0 siblings, 0 replies; 26+ messages in thread
From: Nuno Das Neves @ 2025-08-13 22:18 UTC (permalink / raw)
To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
dave.hansen, hpa, lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
On 7/17/2025 11:55 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> Update hypercall call sites to use the new hv_setup_*() functions
> to set up hypercall arguments. Since these functions zero the
> fixed portion of input memory, remove now redundant calls to memset()
> and explicit zero'ing of input fields. Where feasible use batch size
> returned by hv_setup_inout_array() instead of separate #define value.
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
>
> Notes:
> Changes in v4:
> * Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
> * Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
> [Easwar Hariharan]
>
> Changes in v3:
> * This patch is new in v3 due to rebasing on 6.15-rc1, which has new
> mshv-related hypercalls.
>
> drivers/hv/mshv_common.c | 31 +++------
> drivers/hv/mshv_root_hv_call.c | 121 +++++++++++++--------------------
> drivers/hv/mshv_root_main.c | 5 +-
> 3 files changed, 60 insertions(+), 97 deletions(-)
<snip>
I tested most of the modified call sites (about 75%), and the rest look
correct to me.
Tested-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg
2025-07-18 4:55 ` [PATCH v4 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
@ 2025-08-13 22:20 ` Nuno Das Neves
0 siblings, 0 replies; 26+ messages in thread
From: Nuno Das Neves @ 2025-08-13 22:20 UTC (permalink / raw)
To: mhklinux, kys, haiyangz, wei.liu, decui, tglx, mingo, bp,
dave.hansen, hpa, lpieralisi, kw, mani, robh, bhelgaas, arnd
Cc: x86, linux-hyperv, linux-kernel, linux-pci, linux-arch
On 7/17/2025 11:55 PM, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> All open coded uses of hyperv_pcpu_input_arg and hyperv_pcpu_ouput_arg
> have been replaced by hv_setup_*() functions. So combine
> hyperv_pcpu_input_arg and hyperv_pcpu_output_arg in a single
> hyperv_pcpu_arg. Remove logic for managing a separate output arg. Fixup
> comment references to the old variable names.
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> arch/x86/hyperv/hv_init.c | 6 ++--
> drivers/hv/hv.c | 2 +-
> drivers/hv/hv_common.c | 55 ++++++++++------------------------
> drivers/hv/hyperv_vmbus.h | 2 +-
> include/asm-generic/mshyperv.h | 6 +---
> 5 files changed, 22 insertions(+), 49 deletions(-)
>
Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments
2025-07-18 4:55 ` [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments mhkelley58
@ 2025-08-14 7:58 ` Tianyu Lan
0 siblings, 0 replies; 26+ messages in thread
From: Tianyu Lan @ 2025-08-14 7:58 UTC (permalink / raw)
To: mhklinux
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
lpieralisi, kw, mani, robh, bhelgaas, arnd, x86, linux-hyperv,
linux-kernel, linux-pci, linux-arch
On Fri, Jul 18, 2025 at 12:56 PM <mhkelley58@gmail.com> wrote:
>
> From: Michael Kelley <mhklinux@outlook.com>
>
> Current code allocates the "hyperv_pcpu_input_arg", and in
> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB
> page of memory allocated per-vCPU. A hypercall call site disables
> interrupts, then uses this memory to set up the input parameters for
> the hypercall, read the output results after hypercall execution, and
> re-enable interrupts. The open coding of these steps leads to
> inconsistencies, and in some cases, violation of the generic
> requirements for the hypercall input and output as described in the
> Hyper-V Top Level Functional Spec (TLFS)[1].
>
> To reduce these kinds of problems, introduce a family of inline
> functions to replace the open coding. The functions provide a new way
> to manage the use of this per-vCPU memory that is usually the input and
> output arguments to Hyper-V hypercalls. The functions encapsulate
> key aspects of the usage and ensure that the TLFS requirements are
> met (max size of 1 page each for input and output, no overlap of
> input and output, aligned to 8 bytes, etc.). Conceptually, there
> is no longer a difference between the "per-vCPU input page" and
> "per-vCPU output page". Only a single per-vCPU page is allocated, and
> it provides both hypercall input and output memory. All current
> hypercalls can fit their input and output within that single page,
> though the new code allows easy changing to two pages should a future
> hypercall require a full page for each of the input and output.
>
> The new functions always zero the fixed-size portion of the hypercall
> input area so that uninitialized memory is not inadvertently passed
> to the hypercall. Current open-coded hypercall call sites are
> inconsistent on this point, and use of the new functions addresses
> that inconsistency. The output area is not zero'ed by the new code
> as it is Hyper-V's responsibility to provide legal output.
>
> When the input or output (or both) contain an array, the new functions
> calculate and return how many array entries fit within the per-vCPU
> memory page, which is effectively the "batch size" for the hypercall
> processing multiple entries. This batch size can then be used in the
> hypercall control word to specify the repetition count. This
> calculation of the batch size replaces current open coding of the
> batch size, which is prone to errors. Note that the array portion of
> the input area is *not* zero'ed. The arrays are almost always 64-bit
> GPAs or something similar, and zero'ing that much memory seems
> wasteful at runtime when it will all be overwritten. The hypercall
> call site is responsible for ensuring that no part of the array is
> left uninitialized (just as with current code).
>
> The new functions are realized as a single inline function that
> handles the most complex case, which is a hypercall with input
> and output, both of which contain arrays. Simpler cases are mapped to
> this most complex case with #define wrappers that provide zero or NULL
> for some arguments. Several of the arguments to this new function
> must be compile-time constants generated by "sizeof()"
> expressions. As such, most of the code in the new function can be
> evaluated by the compiler, with the result that the code paths are
> no longer than with the current open coding. The one exception is
> new code generated to zero the fixed-size portion of the input area
> in cases where it is not currently done.
>
> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
>
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> ---
>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
--
Thanks
Tianyu Lan
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1
2025-08-13 0:22 ` Nuno Das Neves
2025-08-13 0:41 ` Wei Liu
@ 2025-08-14 18:48 ` Michael Kelley
1 sibling, 0 replies; 26+ messages in thread
From: Michael Kelley @ 2025-08-14 18:48 UTC (permalink / raw)
To: Nuno Das Neves, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, lpieralisi@kernel.org, kw@linux.com,
mani@kernel.org, robh@kernel.org, bhelgaas@google.com,
arnd@arndb.de
Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
linux-arch@vger.kernel.org
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Tuesday, August 12, 2025 5:22 PM
>
> On 7/17/2025 11:55 PM, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > Update hypercall call sites to use the new hv_setup_*() functions
> > to set up hypercall arguments. Since these functions zero the
> > fixed portion of input memory, remove now redundant calls to memset()
> > and explicit zero'ing of input fields.
> >
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> > ---
> >
> > Notes:
> > Changes in v4:
> > * Rename hv_hvcall_*() functions to hv_setup_*() [Easwar Hariharan]
> > * Rename hv_hvcall_in_batch_size() to hv_get_input_batch_size()
> > [Easwar Hariharan]
> >
> > Changes in v2:
> > * Fixed get_vtl() and hv_vtl_apicid_to_vp_id() to properly treat the input
> > and output arguments as arrays [Nuno Das Neves]
> > * Enhanced __send_ipi_mask_ex() and hv_map_interrupt() to check the number
> > of computed banks in the hv_vpset against the batch_size. Since an
> > hv_vpset currently represents a maximum of 4096 CPUs, the hv_vpset size
> > does not exceed 512 bytes and there should always be sufficent space. But
> > do the check just in case something changes. [Nuno Das Neves]
> >
>
> <snip>
>
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 090f5ac9f492..87ebe43f58cf 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -21,15 +21,15 @@ static int hv_map_interrupt(union hv_device_id device_id,
> bool level,
> > struct hv_device_interrupt_descriptor *intr_desc;
> > unsigned long flags;
> > u64 status;
> > - int nr_bank, var_size;
> > + int batch_size, nr_bank, var_size;
> >
> > local_irq_save(flags);
> >
> > - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > - output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > + batch_size = hv_setup_inout_array(&input, sizeof(*input),
> > + sizeof(input-
> >interrupt_descriptor.target.vp_set.bank_contents[0]),
> > + &output, sizeof(*output), 0);
> >
>
> Hi Michael, I finally managed to test this series on (nested) root
> partition and encountered an issue when I applied this patch.
>
> With the above change, I saw HV_STATUS_INVALID_ALIGNMENT from this
> hypercall. I printed out the addresses and sizes and everything looked
> correct. The output seemed to be correctly placed at the end of the
> percpu page. E.g. if input was allocated at an address ending in 0x3000,
> output would be at 0x3ff0, because hv_output_map_device_interrupt is
> 0x10 bytes in size.
>
> But it turns out, the definition for hv_output_map_device_interrupt
> is out of date (or was never correct)! It should be:
>
> struct hv_output_map_device_interrupt {
> struct hv_interrupt_entry interrupt_entry;
> u64 extended_status_deprecated[5];
> } __packed;
>
> (The "extended_status_deprecated" field is missing in the current code.)
>
> Due to this, when the hypervisor validates the hypercall input/output,
> it sees that output is going across a page boundary, because it knows
> sizeof(hv_output_map_device_interrupt) is actually 0x58.
A very interesting problem! Not something I would have expected
to result from this patch set, but that's why we test. :-)
Thanks for the fix ....
Michael
>
> I confirmed that adding the "extended_status_deprecated" field fixes the
> issue. That should be fixed either as part of this patch or an additional
> one.
>
> Nuno
>
> PS. I have yet to test the mshv driver changes in patch 6, I'll try to
> do so this week.
>
> > intr_desc = &input->interrupt_descriptor;
> > - memset(input, 0, sizeof(*input));
> > input->partition_id = hv_current_partition_id;
> > input->device_id = device_id.as_uint64;
> > intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > @@ -41,7 +41,6 @@ static int hv_map_interrupt(union hv_device_id device_id, bool
> level,
> > else
> > intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> >
> > - intr_desc->target.vp_set.valid_bank_mask = 0;
> > intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
> > if (nr_bank < 0) {
> > @@ -49,6 +48,11 @@ static int hv_map_interrupt(union hv_device_id device_id,
> bool level,
> > pr_err("%s: unable to generate VP set\n", __func__);
> > return -EINVAL;
> > }
> > + if (nr_bank > batch_size) {
> > + local_irq_restore(flags);
> > + pr_err("%s: nr_bank too large\n", __func__);
> > + return -EINVAL;
> > + }
> > intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
> >
> > /*
> > @@ -78,9 +82,8 @@ static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry
> *old_entry)
> > u64 status;
> >
> > local_irq_save(flags);
> > - input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >
> > - memset(input, 0, sizeof(*input));
> > + hv_setup_in(&input, sizeof(*input));
> > intr_entry = &input->interrupt_entry;
> > input->partition_id = hv_current_partition_id;
> > input->device_id = id;
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2025-08-14 18:48 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-18 4:55 [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args mhkelley58
2025-07-18 4:55 ` [PATCH v4 1/7] Drivers: hv: Introduce hv_setup_*() functions for hypercall arguments mhkelley58
2025-08-14 7:58 ` Tianyu Lan
2025-07-18 4:55 ` [PATCH v4 2/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 1 mhkelley58
2025-08-13 0:22 ` Nuno Das Neves
2025-08-13 0:41 ` Wei Liu
2025-08-13 18:22 ` Nuno Das Neves
2025-08-14 18:48 ` Michael Kelley
2025-07-18 4:55 ` [PATCH v4 3/7] x86/hyperv: Use hv_setup_*() to set up hypercall arguments -- part 2 mhkelley58
2025-07-18 4:55 ` [PATCH v4 4/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments mhkelley58
2025-07-28 17:02 ` Nuno Das Neves
2025-07-29 0:23 ` Michael Kelley
2025-07-18 4:55 ` [PATCH v4 5/7] PCI: " mhkelley58
2025-07-28 17:12 ` Nuno Das Neves
2025-07-18 4:55 ` [PATCH v4 6/7] Drivers: hv: Use hv_setup_*() to set up hypercall arguments for mshv code mhkelley58
2025-08-13 22:18 ` Nuno Das Neves
2025-07-18 4:55 ` [PATCH v4 7/7] Drivers: hv: Replace hyperv_pcpu_input/output_arg with hyperv_pcpu_arg mhkelley58
2025-08-13 22:20 ` Nuno Das Neves
2025-07-18 16:33 ` [PATCH v4 0/7] hyperv: Introduce new way to manage hypercall args Easwar Hariharan
2025-07-18 17:13 ` Michael Kelley
2025-07-18 20:25 ` Easwar Hariharan
2025-07-19 1:15 ` Roman Kisel
2025-07-21 2:19 ` Michael Kelley
2025-07-21 17:14 ` Roman Kisel
2025-07-21 19:30 ` Michael Kelley
2025-08-12 23:58 ` Wei Liu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).