* RE: [PATCH v2] mshv_vtl: clear hypercall output before copyout
From: Michael Kelley @ 2026-06-25 18:22 UTC
To: Yousef Alhouseen, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260625181314.1399-1-alhouseenyousef@gmail.com>
From: Yousef Alhouseen <alhouseenyousef@gmail.com> Sent: Thursday, June 25, 2026 11:13 AM
>
> mshv_vtl_hvcall_call() copies output_size bytes to userspace.
>
> The output page is freshly allocated. Userspace chooses the copyout length.
>
> If the hypercall writes less, the tail can contain stale page data.
>
> Clear the copied range before issuing the hypercall.
>
> Also check both bounce page allocations before either page is used.
>
> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
> Changes in v2:
> - Use the mshv_vtl subject prefix.
> - Clear only the requested output byte range instead of the whole page.
> - Add a comment explaining why the output range is cleared.
> - Keep free_page() calls unconditional.
> - v1: https://lore.kernel.org/all/20260624172157.2790-1-alhouseenyousef@gmail.com/
>
> drivers/hv/mshv_vtl_main.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> index 0d3d41619..dbf03b667 100644
> --- a/drivers/hv/mshv_vtl_main.c
> +++ b/drivers/hv/mshv_vtl_main.c
> @@ -1148,12 +1148,22 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
> */
> in = (void *)__get_free_page(GFP_KERNEL);
> out = (void *)__get_free_page(GFP_KERNEL);
> + if (!in || !out) {
> + ret = -ENOMEM;
> + goto free_pages;
> + }
>
> if (copy_from_user(in, (void __user *)hvcall.input_ptr, hvcall.input_size)) {
> ret = -EFAULT;
> goto free_pages;
> }
>
> + /*
> + * The caller supplies output_size, so clear the range copied back to
> + * userspace in case the hypercall writes fewer bytes than requested.
> + */
> + memset(out, 0, hvcall.output_size);
> +
> hvcall.status = hv_do_hypercall(hvcall.control, in, out);
>
> if (copy_to_user((void __user *)hvcall.output_ptr, out, hvcall.output_size)) {
> --
> 2.54.0
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
* RE: [PATCH] hyperv: mshv: zero VTL hypercall input page
From: Yousef Alhouseen @ 2026-06-25 18:14 UTC
To: Michael Kelley, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li
Cc: linux-hyperv, linux-kernel
In-Reply-To: <SN6PR02MB415746E6EDCF82DAFA8B1C49D4EC2@SN6PR02MB4157.namprd02.prod.outlook.com>
Hi Michael,
That makes sense. Please drop this patch.
I sent a v2 for the output-page issue with the mshv_vtl prefix and
your suggested changes.
Thanks,
Yousef
On Thu, 25 Jun 2026 16:41:51 +0000, Michael Kelley <mhklinux@outlook.com> wrote:
> From: Yousef Alhouseen <alhouseenyousef@gmail.com> Sent: Wednesday, June 24, 2026 10:57 AM
> > Subject: [PATCH] hyperv: mshv: zero VTL hypercall input page
> >
>
> Same comment here about the patch "Subject:" prefix.
>
> > mshv_vtl_hvcall_call() copies only the user-provided input size.
> >
> > It then passes the page to hv_do_hypercall().
> >
> > For short inputs, stale bytes can remain in the bounce page.
> >
> > Those bytes can be consumed by the hypervisor.
>
> It's unclear to me that there's really a problem here. In a
> CoCo VM, the host hypervisor isn't trusted, so hypercall sites
> must be careful to only expose intended data in the hypercall
> input and output pages. But this code already doesn't support
> CoCo VMs, as noted in the comment. So in the supported
> scenario, the hypervisor has access to all of guest memory. Passing
> stale bytes to the hypervisor vs. passing zeros really wouldn't matter.
> And user space can already pass stale/garbage bytes to the hypervisor
> if it wants to. This code doesn't try to validate the input data for
> whatever hypercall user space is requesting to be made.
>
> When support for CoCo VMs is added, this code will indeed
> need to make sure not to allow garbage kernel data in the
> hypercall input or output pages. But decrypting the pages
> so the hypervisor can access them should take care of that
> issue.
>
> Michael
>
> >
> > Allocate the input page zeroed, matching the output page.
> >
> > Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> > ---
> > drivers/hv/mshv_vtl_main.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> > index 0365d207c..f2633148c 100644
> > --- a/drivers/hv/mshv_vtl_main.c
> > +++ b/drivers/hv/mshv_vtl_main.c
> > @@ -1146,7 +1146,7 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
> > *
> > * TODO: Take care of this when CVM support is added.
> > */
> > - in = (void *)__get_free_page(GFP_KERNEL);
> > + in = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> > out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> > if (!in || !out) {
> > ret = -ENOMEM;
> > --
> > 2.54.0
> >
* [PATCH v2] mshv_vtl: clear hypercall output before copyout
From: Yousef Alhouseen @ 2026-06-25 18:13 UTC
To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li
Cc: Michael Kelley, linux-hyperv, linux-kernel, Yousef Alhouseen
In-Reply-To: <20260624172157.2790-1-alhouseenyousef@gmail.com>
mshv_vtl_hvcall_call() copies output_size bytes to userspace.

The output page is freshly allocated. Userspace chooses the copyout length.

If the hypercall writes less, the tail can contain stale page data.

Clear the copied range before issuing the hypercall.

Also check both bounce page allocations before either page is used.

Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
---
Changes in v2:
- Use the mshv_vtl subject prefix.
- Clear only the requested output byte range instead of the whole page.
- Add a comment explaining why the output range is cleared.
- Keep free_page() calls unconditional.
- v1: https://lore.kernel.org/r/20260624172157.2790-1-alhouseenyousef@gmail.com
drivers/hv/mshv_vtl_main.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
index 0d3d41619..dbf03b667 100644
--- a/drivers/hv/mshv_vtl_main.c
+++ b/drivers/hv/mshv_vtl_main.c
@@ -1148,12 +1148,22 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
*/
in = (void *)__get_free_page(GFP_KERNEL);
out = (void *)__get_free_page(GFP_KERNEL);
+ if (!in || !out) {
+ ret = -ENOMEM;
+ goto free_pages;
+ }
if (copy_from_user(in, (void __user *)hvcall.input_ptr, hvcall.input_size)) {
ret = -EFAULT;
goto free_pages;
}
+ /*
+ * The caller supplies output_size, so clear the range copied back to
+ * userspace in case the hypercall writes fewer bytes than requested.
+ */
+ memset(out, 0, hvcall.output_size);
+
hvcall.status = hv_do_hypercall(hvcall.control, in, out);
if (copy_to_user((void __user *)hvcall.output_ptr, out, hvcall.output_size)) {
--
2.54.0
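The stale-tail scenario described in the commit message can be modeled in userspace. This is a sketch, not the driver code: fake_hypercall() and bounce_copyout() are hypothetical names, SKETCH_PAGE_SIZE assumes a 4K bounce page, and memcpy() stands in for copy_to_user(). The size check rejects output_size values larger than the page; we assume the real function bounds output_size before this point, since the quoted hunk does not show it.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4K bounce page */

/* Hypothetical hypercall that fills only the first 'written' bytes. */
static void fake_hypercall(unsigned char *out, size_t written)
{
	memset(out, 0xAB, written);
}

/*
 * Models the copyout path: the page starts with stale contents, the
 * caller chooses output_size, and memcpy() stands in for copy_to_user().
 * When clear_first is nonzero, the requested range is zeroed before the
 * "hypercall", as the v2 patch does with memset().
 */
static int bounce_copyout(unsigned char *user_buf, size_t output_size,
			  size_t written, int clear_first)
{
	static unsigned char page[SKETCH_PAGE_SIZE];

	if (output_size > SKETCH_PAGE_SIZE || written > output_size)
		return -1;

	memset(page, 0x5A, sizeof(page));	/* simulate stale page data */
	if (clear_first)
		memset(page, 0, output_size);
	fake_hypercall(page, written);
	memcpy(user_buf, page, output_size);
	return 0;
}
```

Without the clear, bytes past what the hypercall wrote still carry the 0x5A "stale" pattern; with it, they read back as zero. Clearing only output_size bytes, rather than the whole page, is enough because nothing beyond that range is copied out.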
* Re: [PATCH v2 2/6] firmware: smccc: Detect hypervisor via RSI host call in CCA Realms
From: sashiko-bot @ 2026-06-25 17:54 UTC
To: Kameron Carr; +Cc: linux-hyperv
In-Reply-To: <20260625173500.1995481-3-kameroncarr@linux.microsoft.com>
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Initialization order bug causes `arm_smccc_hypervisor_has_uuid()` to incorrectly fall back to standard SMC during early boot in a CCA Realm.
--
commit 8e27a5d40fac36e0edcb6814624362e892694ac4
Author: Kameron Carr <kameroncarr@linux.microsoft.com>
firmware: smccc: Detect hypervisor via RSI host call in CCA Realms
This commit modifies arm_smccc_hypervisor_has_uuid() to check is_realm_world()
and use an RSI host call to query the hypervisor UUID inside a Realm. It adds
a file-static RSI host call buffer serialized by a spinlock for ARM64.
> diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> index bdee057db2fd3..a876b7aa2dc99 100644
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
[ ... ]
> @@ -67,12 +73,45 @@ s32 arm_smccc_get_soc_id_revision(void)
> }
> EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);
>
[ ... ]
> bool arm_smccc_hypervisor_has_uuid(const uuid_t *hyp_uuid)
> {
> struct arm_smccc_res res = {};
> uuid_t uuid;
>
> - arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
> +#ifdef CONFIG_ARM64
> + if (is_realm_world())
> + arm_smccc_realm_get_hypervisor_uuid(&res);
> + else
> +#endif
[Severity: Medium]
Will this check evaluate to false during early boot, causing the detection to
incorrectly fall back to the standard SMC conduit?
Looking at setup_arch(), psci_dt_init() executes before arm64_rsi_init() is
called to set up the rsi_present static key used by is_realm_world():
setup_arch() {
	...
	if (acpi_disabled)
		psci_dt_init();
	else
		psci_acpi_init();
	arm64_rsi_init();
	...
}
Because psci_dt_init() calls psci_probe(), which calls kvm_init_hyp_services(),
this eventually reaches arm_smccc_hypervisor_has_uuid() before the static key
is initialized.
Does this sequence cause a silent failure to detect the KVM hypervisor inside
an Arm CCA Realm?
> + arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID,
> + &res);
> +
> if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
> return false;
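The ordering concern reduces to a toy model in which a plain bool stands in for the rsi_present static key. sketch_rsi_init(), sketch_is_realm_world() and sketch_query_conduit() are hypothetical stand-ins for arm64_rsi_init(), is_realm_world() and the conduit choice in arm_smccc_hypervisor_has_uuid(). No SMCCC or RSI call is made, and whether the real boot path behaves this way depends on the call order the review describes.

```c
#include <stdbool.h>

enum sketch_conduit { SKETCH_CONDUIT_SMCCC, SKETCH_CONDUIT_RSI };

/* Models the rsi_present static key: false until RSI init runs. */
static bool sketch_rsi_present;

/* Hypothetical stand-in for arm64_rsi_init() running in a Realm. */
static void sketch_rsi_init(void)
{
	sketch_rsi_present = true;
}

static bool sketch_is_realm_world(void)
{
	return sketch_rsi_present;
}

/* Mirrors the branch added to arm_smccc_hypervisor_has_uuid(). */
static enum sketch_conduit sketch_query_conduit(void)
{
	return sketch_is_realm_world() ? SKETCH_CONDUIT_RSI
				       : SKETCH_CONDUIT_SMCCC;
}
```

Calling sketch_query_conduit() before sketch_rsi_init() selects the SMCCC path even though the "guest" is a Realm. That is the silent fallback the review is asking about.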
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260625173500.1995481-1-kameroncarr@linux.microsoft.com?part=2
* Re: [PATCH v2 4/6] Drivers: hv: Mark shared memory as decrypted for CCA Realms
From: sashiko-bot @ 2026-06-25 17:50 UTC
To: Kameron Carr; +Cc: linux-hyperv
In-Reply-To: <20260625173500.1995481-5-kameroncarr@linux.microsoft.com>
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Using `kmalloc()` to allocate hypercall input/output pages violates the strict page-alignment requirements of `set_memory_decrypted()`, causing silent failures on ARM64 and exposing slab metadata to the untrusted host on x86.
--
commit 02dd1b4271844f489855c82e06efdec7e95897e3
Author: Kameron Carr <kameroncarr@linux.microsoft.com>
Drivers: hv: Mark shared memory as decrypted for CCA Realms
This commit extends the confidential VM shared memory decryption handling
for hypercall input/output pages to support Arm CCA Realm guests. It also
updates the memory allocation to round up to the nearest page size to satisfy
decryption requirements.
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 6b67ac6167891..17048a0a18729 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
[ ... ]
> @@ -489,7 +490,7 @@ int hv_common_cpu_init(unsigned int cpu)
> * online and then taken offline
> */
> if (!*inputarg) {
> - mem = kmalloc_array(pgcount, HV_HYP_PAGE_SIZE, flags);
> + mem = kmalloc(alloc_size, flags);
> if (!mem)
> return -ENOMEM;
>
> @@ -499,14 +500,16 @@ int hv_common_cpu_init(unsigned int cpu)
> }
>
> if (!ms_hyperv.paravisor_present &&
> - (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
> - ret = set_memory_decrypted((unsigned long)mem, pgcount);
> + (hv_isolation_type_snp() || hv_isolation_type_tdx() ||
> + hv_isolation_type_cca())) {
> + ret = set_memory_decrypted((unsigned long)kasan_reset_tag(mem),
> + alloc_size >> PAGE_SHIFT);
[Severity: High]
This is a pre-existing issue, but does using kmalloc() here guarantee the
required page alignment for set_memory_decrypted()?
When slub debugging (like redzoning) is enabled, kmalloc() can return an
unaligned address due to prepended metadata.
On arm64, set_memory_decrypted() requires a page-aligned address. If
given an unaligned address, it warns and returns 0 without decrypting.
This leaves the hypercall pages encrypted, which breaks guest-hypervisor
communication for CCA Realms.
On x86, set_memory_decrypted() masks the address to the page boundary.
Because this memory is from a slab cache, decrypting the entire underlying
physical page could expose other unrelated kernel slab objects on the same
page to the untrusted hypervisor.
> if (ret) {
> /* It may be unsafe to free 'mem' */
> return ret;
> }
>
> - memset(mem, 0x00, pgcount * HV_HYP_PAGE_SIZE);
> + memset(mem, 0x00, alloc_size);
> }
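The two architecture behaviours described in the review can be contrasted with a small userspace model that treats addresses as plain integers. It assumes a 4K SKETCH_PAGE_SIZE. arm64_style_would_decrypt() and x86_style_range() are hypothetical illustrations of "reject an unaligned start" versus "mask the start down to its page". They are not the real set_memory_decrypted() implementations.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* arm64-like: an unaligned start is rejected, so nothing is decrypted. */
static bool arm64_style_would_decrypt(uintptr_t addr)
{
	return (addr & ~SKETCH_PAGE_MASK) == 0;
}

/*
 * x86-like: the start is rounded down to its page boundary, so the
 * decrypted span [*start, *end) can begin before 'addr' (exposing
 * neighbouring bytes) and end before the object does.
 */
static void x86_style_range(uintptr_t addr, unsigned long npages,
			    uintptr_t *start, uintptr_t *end)
{
	*start = addr & SKETCH_PAGE_MASK;
	*end = *start + npages * SKETCH_PAGE_SIZE;
}
```

For an object that starts 64 bytes into a page, for example because of a debug redzone, the arm64-style check declines the request. The x86-style range shares the 64 bytes before the object and leaves the object's last 64 bytes private.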
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260625173500.1995481-1-kameroncarr@linux.microsoft.com?part=4
* Re: [PATCH v2 5/6] arm64: hyperv: Route hypercalls through RSI host call in CCA Realms
From: sashiko-bot @ 2026-06-25 17:50 UTC
To: Kameron Carr; +Cc: linux-hyperv
In-Reply-To: <20260625173500.1995481-6-kameroncarr@linux.microsoft.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Missing Unprotected IPA Translation
- [High] Alignment Constraint Violation / Boot Failure
- [High] Missing NULL Pointer Check / NULL Dereference
--
commit a71c2edcad24708c7ea03f518e33fcef346f68d6
Author: Kameron Carr <kameroncarr@linux.microsoft.com>
arm64: hyperv: Route hypercalls through RSI host call in CCA Realms
Modify the five hypercall wrapper functions to check is_realm_world()
and use the per-CPU rsi_host_call structure when inside a Realm.
> diff --git a/arch/arm64/hyperv/hv_core.c b/arch/arm64/hyperv/hv_core.c
> index e33a9e3c366a1..77cba08fca132 100644
> --- a/arch/arm64/hyperv/hv_core.c
> +++ b/arch/arm64/hyperv/hv_core.c
> @@ -13,9 +13,41 @@
[ ... ]
> +static u64 hv_do_rsi_hypercall(u64 control, u64 input1, u64 input2)
> +{
> + struct rsi_host_call *hostcall;
> + unsigned long flags;
> + u64 ret;
> +
> + if (!hv_hostcall_array)
> + return HV_STATUS_INVALID_HYPERCALL_INPUT;
> +
> + local_irq_save(flags);
> + hostcall = &hv_hostcall_array[smp_processor_id()];
> + memset(hostcall, 0, sizeof(*hostcall));
> + hostcall->gprs[0] = HV_FUNC_ID;
> + hostcall->gprs[1] = control;
> + hostcall->gprs[2] = input1;
> + hostcall->gprs[3] = input2;
> +
> + if (rsi_host_call(virt_to_phys(hostcall)) == RSI_SUCCESS)
[Severity: High]
Does this allocation guarantee the 256-byte alignment required by the Arm CCA
RMM specification for the hostcall structure IPA?
If hv_hostcall_array is allocated with kcalloc(), which does not guarantee
type alignment, SLUB debugging or KASAN redzones could shift the object
offset.
Will calling rsi_host_call() with a misaligned pointer cause the RMM to
reject the command with RSI_ERROR_INPUT, ultimately triggering a fatal
BUG_ON(!hv_result_success(status)) in the wrapper?
> + ret = hostcall->gprs[0];
> + else
> + ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
> +
> + local_irq_restore(flags);
> + return ret;
> +}
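A userspace sketch of the alignment question. It assumes a 256-byte SKETCH_HOSTCALL_ALIGN and a hypothetical 256-byte struct sketch_host_call, not the real struct rsi_host_call layout. When an array is indexed by CPU, every element is 256-byte aligned only if the base is, and a slab-style offset on the base misaligns every element. C11 aligned_alloc() is used here only as a userspace way to get an aligned base.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_HOSTCALL_ALIGN 256

/* Hypothetical 256-byte record (32 x u64), not the real rsi_host_call. */
struct sketch_host_call {
	uint64_t gprs[32];
};

_Static_assert(sizeof(struct sketch_host_call) == SKETCH_HOSTCALL_ALIGN,
	       "sketch record must be exactly one alignment unit");

static int is_hostcall_aligned(const void *p)
{
	return ((uintptr_t)p % SKETCH_HOSTCALL_ALIGN) == 0;
}

/* Nonzero if every per-CPU element starting at 'base' is aligned. */
static int all_elements_aligned(const struct sketch_host_call *base,
				unsigned int nr_cpus)
{
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		if (!is_hostcall_aligned(&base[cpu]))
			return 0;
	return 1;
}

static struct sketch_host_call *alloc_hostcall_array(unsigned int nr_cpus)
{
	/* aligned_alloc() requires size to be a multiple of the alignment. */
	return aligned_alloc(SKETCH_HOSTCALL_ALIGN,
			     nr_cpus * sizeof(struct sketch_host_call));
}
```

The kernel's memory-allocation documentation says that kmalloc() of a power-of-two size is aligned to at least that size. nr_cpu_ids * 256 is a power of two only for some CPU counts, so relying on that guarantee for the array base would need care.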
[ ... ]
> @@ -29,8 +61,11 @@ u64 hv_do_hypercall(u64 control, void *input, void *output)
> input_address = input ? virt_to_phys(input) : 0;
> output_address = output ? virt_to_phys(output) : 0;
>
> - arm_smccc_1_1_hvc(HV_FUNC_ID, control,
> - input_address, output_address, &res);
> + if (is_realm_world())
> + return hv_do_rsi_hypercall(control, input_address, output_address);
[Severity: Critical]
Will this translate the virtual address into a protected IPA without applying
the PROT_NS_SHARED bit required for host access?
In an Arm CCA Realm, memory shared with the host must be accessed via the
unprotected IPA space. The virt_to_phys() call only calculates the canonical
protected IPA.
When the untrusted host attempts to read or write the hypercall buffer using
these protected IPAs, won't the RMM block the access and generate a stage-2
fault?
> +
> + arm_smccc_1_1_hvc(HV_FUNC_ID, control, input_address,
> + output_address, &res);
> return res.a0;
> }
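The address arithmetic behind this comment can be sketched as follows. The sketch assumes that in an Arm CCA Realm the shared (unprotected) alias of an IPA is formed by setting the top IPA bit, bit (ipa_bits - 1). sketch_unprotected_ipa() and sketch_protected_ipa() are hypothetical helpers that take the IPA width as a parameter. In the kernel, the corresponding bit is derived from the Realm configuration and exposed through PROT_NS_SHARED.

```c
#include <stdint.h>

/* ipa_bits is assumed to be in [1, 64]. */
static uint64_t sketch_shared_bit(unsigned int ipa_bits)
{
	return 1ULL << (ipa_bits - 1);
}

/* Shared (unprotected) alias of a protected IPA, e.g. from virt_to_phys(). */
static uint64_t sketch_unprotected_ipa(uint64_t protected_ipa,
				       unsigned int ipa_bits)
{
	return protected_ipa | sketch_shared_bit(ipa_bits);
}

/* Clears the shared bit again, recovering the protected alias. */
static uint64_t sketch_protected_ipa(uint64_t ipa, unsigned int ipa_bits)
{
	return ipa & ~sketch_shared_bit(ipa_bits);
}
```

With a 48-bit IPA space, the protected address 0x80001000 has the shared alias 0x800080001000. That alias, not the plain virt_to_phys() result, is the kind of address the review suggests the host would need.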
[ ... ]
> @@ -76,24 +117,44 @@ EXPORT_SYMBOL_GPL(hv_do_fast_hypercall16);
> void hv_set_vpreg(u32 msr, u64 value)
> {
> struct arm_smccc_res res;
> + struct rsi_host_call *hostcall;
> + unsigned long flags;
> + u64 status;
>
> + if (is_realm_world()) {
> + local_irq_save(flags);
> + hostcall = &hv_hostcall_array[smp_processor_id()];
> + memset(hostcall, 0, sizeof(*hostcall));
[Severity: High]
What happens if this is called early during a kernel panic before
hv_hostcall_array is initialized?
If an early kernel panic occurs or allocation fails, the crash handler
hyperv_report_panic() will invoke hv_set_vpreg(). Because this lacks the NULL
check present in hv_do_rsi_hypercall(), calculating the offset from NULL and
passing it to memset() might cause a double fault.
Could this prevent the panic from being reported to the hypervisor and
hang the system?
> + hostcall->gprs[0] = HV_FUNC_ID;
> + hostcall->gprs[1] = HVCALL_SET_VP_REGISTERS |
[ ... ]
> @@ -108,29 +169,55 @@ void hv_get_vpreg_128(u32 msr, struct hv_get_vp_registers_output *result)
> {
> struct arm_smccc_1_2_regs args;
> struct arm_smccc_1_2_regs res;
> + struct rsi_host_call *hostcall;
> + unsigned long flags;
> + u64 status;
>
> + if (is_realm_world()) {
> + local_irq_save(flags);
> + hostcall = &hv_hostcall_array[smp_processor_id()];
> + memset(hostcall, 0, sizeof(*hostcall));
[Severity: High]
Similar to hv_set_vpreg(), does this need a check for hv_hostcall_array
being NULL to avoid a NULL pointer dereference if invoked before the array
is properly allocated?
>
> - /*
> - * Use the SMCCC 1.2 interface because the results are in registers
> - * beyond X0-X3.
> - */
> - arm_smccc_1_2_hvc(&args, &res);
> + hostcall->gprs[0] = HV_FUNC_ID;
> + hostcall->gprs[1] = HVCALL_GET_VP_REGISTERS |
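One way to make hv_set_vpreg() and hv_get_vpreg_128() tolerate an unallocated array is to route them through a single helper that performs the same NULL check that hv_do_rsi_hypercall() already has. The userspace sketch below is illustrative only. sketch_hostcall_array, sketch_host_call(), sketch_rsi_host_call() and the SKETCH_STATUS_* values are hypothetical, and the host call is stubbed. What the panic path should do with an error status is a separate design question, since the real hv_set_vpreg() BUG()s on failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_GPRS 31
#define SKETCH_STATUS_SUCCESS 0ULL
#define SKETCH_STATUS_INVALID_INPUT 1ULL	/* hypothetical nonzero error */

struct sketch_host_call {
	uint64_t gprs[SKETCH_NR_GPRS];
};

/* NULL until "boot" allocates it, like hv_hostcall_array. */
static struct sketch_host_call *sketch_hostcall_array;

/* Stub for the real RSI host call: reports success in gprs[0]. */
static int sketch_rsi_host_call(struct sketch_host_call *hc)
{
	hc->gprs[0] = SKETCH_STATUS_SUCCESS;
	return 0;
}

/*
 * Shared helper: copy up to SKETCH_NR_GPRS input registers into this
 * CPU's slot, issue the call and return the status from gprs[0].
 * It refuses to touch memory if the array was never allocated.
 */
static uint64_t sketch_host_call(unsigned int cpu, const uint64_t *in,
				 size_t nin)
{
	struct sketch_host_call *hc;

	if (!sketch_hostcall_array || nin > SKETCH_NR_GPRS)
		return SKETCH_STATUS_INVALID_INPUT;

	hc = &sketch_hostcall_array[cpu];
	memset(hc, 0, sizeof(*hc));
	memcpy(hc->gprs, in, nin * sizeof(*in));
	if (sketch_rsi_host_call(hc) != 0)
		return SKETCH_STATUS_INVALID_INPUT;
	return hc->gprs[0];
}
```

Centralizing the check also removes the duplicated memset()/gprs setup that the three Realm branches currently repeat.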
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260625173500.1995481-1-kameroncarr@linux.microsoft.com?part=5
* Re: [PATCH v2 3/6] arm64: hyperv: Add per-CPU RSI host call infrastructure for CCA Realms
From: sashiko-bot @ 2026-06-25 17:49 UTC
To: Kameron Carr; +Cc: linux-hyperv
In-Reply-To: <20260625173500.1995481-4-kameroncarr@linux.microsoft.com>
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The `hv_hostcall_array` is allocated with `kcalloc()`, which does not guarantee the strict 256-byte alignment required by the RMM for `SMC_RSI_HOST_CALL`.
- [High] If `hyperv_init()` fails, `hv_hostcall_array` is freed but panic notifiers are left registered, causing a fatal NULL pointer dereference during a subsequent panic.
- [Low] The early boot initialization function `hyperv_init()` contains unnecessary memory allocation failure handling and cleanup logic for `hv_hostcall_array`.
--
commit 906cd2b30d2ff5884265fe39d28d07e982c9f874
Author: Kameron Carr <kameroncarr@linux.microsoft.com>
arm64: hyperv: Add per-CPU RSI host call infrastructure for CCA Realms
This commit adds a per-CPU rsi_host_call array allocation for Arm CCA
Realms. It routes Hyper-V hypercalls through the RSI_HOST_CALL interface
instead of HVC.
> diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
> index 4fdc26ade1d74..7d536d7fb557e 100644
> --- a/arch/arm64/hyperv/mshyperv.c
> +++ b/arch/arm64/hyperv/mshyperv.c
[ ... ]
> @@ -85,6 +96,20 @@ static int __init hyperv_init(void)
> if (!hyperv_detect_via_acpi() && !hyperv_detect_via_smccc())
> return 0;
>
> + /*
> + * The RSI host-call buffers are only ever used when
> + * is_realm_world() is true. Skip the allocation on non-Realm
> + * guests. A single contiguous array of nr_cpu_ids entries is
> + * allocated; each CPU indexes into it by its processor ID.
> + */
> + if (is_realm_world()) {
> + hv_hostcall_array = kcalloc(nr_cpu_ids,
> + sizeof(struct rsi_host_call),
> + GFP_KERNEL);
[Severity: High]
The RMM requires the IPA of the host call structure to be strictly 256-byte
aligned for SMC_RSI_HOST_CALL. Since standard slab allocators like kcalloc()
do not guarantee alignment beyond ARCH_KMALLOC_MINALIGN, could this result
in an unaligned physical address being passed to rsi_host_call() in
hv_set_vpreg()?
> + if (!hv_hostcall_array)
> + return -ENOMEM;
[Severity: Low]
This isn't a bug, but since hyperv_init() is an early_initcall executed
early in the boot process, is it necessary to handle this allocation failure
gracefully? Early boot memory allocations typically do not need complex
failure paths, as a failure at this stage implies the system cannot boot.
> + }
> +
> /* Setup the guest ID */
[ ... ]
> @@ -125,6 +151,10 @@ static int __init hyperv_init(void)
>
> hyperv_initialized = true;
> return 0;
> +
> +free_hostcall_mem:
> + hv_hostcall_free();
> + return ret;
[Severity: High]
If hyperv_init() fails here, hv_hostcall_array is freed and set to NULL.
Since the panic notifiers registered earlier in hv_common_init() are not
unregistered on this error path, will this cause a newly introduced NULL
pointer dereference during a later panic?
When the leaked notifier invokes hyperv_report_panic(), it calls
hv_set_vpreg(), which unconditionally accesses the array:
hv_set_vpreg() {
	...
	hostcall = &hv_hostcall_array[smp_processor_id()];
	memset(hostcall, 0, sizeof(*hostcall));
	...
}
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260625173500.1995481-1-kameroncarr@linux.microsoft.com?part=3
* RE: [RFC PATCH 1/6] arm64: rsi: Add RSI host call structure and helper function
From: Kameron Carr @ 2026-06-25 17:44 UTC
To: 'Michael Kelley'
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, kys, haiyangz, wei.liu, decui, longli
In-Reply-To: <SN6PR02MB4157C9AA6BA2DD14E7697F2BD4E32@SN6PR02MB4157.namprd02.prod.outlook.com>
On Thursday, June 18, 2026 10:46 AM, Michael Kelley wrote:
> From: Kameron Carr <kameroncarr@linux.microsoft.com> Sent: Tuesday, June 9, 2026 11:10 AM
> > diff --git a/arch/arm64/include/asm/rsi_smc.h b/arch/arm64/include/asm/rsi_smc.h
> > index e19253f96c940..ffea93340ed7f 100644
> > --- a/arch/arm64/include/asm/rsi_smc.h
> > +++ b/arch/arm64/include/asm/rsi_smc.h
> > @@ -142,6 +142,12 @@ struct realm_config {
> > */
> > } __aligned(0x1000);
> >
> > +struct rsi_host_call {
> > + u16 immediate;
>
> I don't see the "immediate" used anywhere in this patch set.
> Is it always zero for the Hyper-V use cases? Just curious ...
Yes, the immediate value is always zero for Hyper-V host calls.
-- Kameron
* RE: [RFC PATCH 2/6] firmware: smccc: Detect hypervisor via RSI host call in CCA Realms
From: Kameron Carr @ 2026-06-25 17:42 UTC
To: 'Michael Kelley'
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, kys, haiyangz, wei.liu, decui, longli
In-Reply-To: <SN6PR02MB4157F6A66DEDE650298E120ED4E32@SN6PR02MB4157.namprd02.prod.outlook.com>
On Thursday, June 18, 2026 10:46 AM, Michael Kelley wrote:
> From: Kameron Carr <kameroncarr@linux.microsoft.com> Sent: Tuesday, June 9, 2026 11:10 AM
> > diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> > index bdee057db2fd3..6b465e65472b0 100644
> > --- a/drivers/firmware/smccc/smccc.c
> > +++ b/drivers/firmware/smccc/smccc.c
> > @@ -12,6 +12,12 @@
> > #include <linux/platform_device.h>
> > #include <asm/archrandom.h>
> >
> > +#ifdef CONFIG_ARM64
> > +#include <linux/cleanup.h>
> > +#include <linux/spinlock.h>
> > +#include <asm/rsi.h>
> > +#endif
> > +
> > static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
> > static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
> >
> > @@ -67,12 +73,45 @@ s32 arm_smccc_get_soc_id_revision(void)
> > }
> > EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);
> >
> > +#ifdef CONFIG_ARM64
> > +static struct rsi_host_call uuid_hc;
> > +static DEFINE_SPINLOCK(uuid_hc_lock);
>
> So evidently Sashiko is wrong in saying that struct rsi_host_call must be
> in decrypted memory?
Yes, Sashiko is wrong. The RMM spec clearly states that the rsi_host_call
struct must be in encrypted ("protected") memory. The other two
requirements are that it be 256-byte aligned and that its RIPAS not be
RIPAS_EMPTY.
* [PATCH v2 6/6] arm64: hyperv: Implement hv_is_isolation_supported() for CCA Realms
From: Kameron Carr @ 2026-06-25 17:35 UTC
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
In-Reply-To: <20260625173500.1995481-1-kameroncarr@linux.microsoft.com>
Provide an arm64 implementation of hv_is_isolation_supported() that
overrides the __weak default in drivers/hv/hv_common.c.

The implementation deliberately does not depend on
hv_is_hyperv_initialized() because hv_common_init() consults
hv_is_isolation_supported() before hyperv_initialized is set, so
hv_is_hyperv_initialized() would still return false at that point.

Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com>
---
arch/arm64/hyperv/mshyperv.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
index 8e8148b723d9c..62995b6133f6f 100644
--- a/arch/arm64/hyperv/mshyperv.c
+++ b/arch/arm64/hyperv/mshyperv.c
@@ -169,3 +169,8 @@ bool hv_isolation_type_cca(void)
{
return is_realm_world();
}
+
+bool hv_is_isolation_supported(void)
+{
+ return is_realm_world();
+}
--
2.45.4
* [PATCH v2 5/6] arm64: hyperv: Route hypercalls through RSI host call in CCA Realms
From: Kameron Carr @ 2026-06-25 17:34 UTC
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
In-Reply-To: <20260625173500.1995481-1-kameroncarr@linux.microsoft.com>
Modify the five hypercall wrapper functions to check is_realm_world()
and use the per-CPU rsi_host_call structure when inside a Realm.

Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com>
---
arch/arm64/hyperv/hv_core.c | 155 ++++++++++++++++++++++++++++--------
1 file changed, 121 insertions(+), 34 deletions(-)
diff --git a/arch/arm64/hyperv/hv_core.c b/arch/arm64/hyperv/hv_core.c
index e33a9e3c366a1..77cba08fca132 100644
--- a/arch/arm64/hyperv/hv_core.c
+++ b/arch/arm64/hyperv/hv_core.c
@@ -13,9 +13,41 @@
#include <linux/mm.h>
#include <linux/arm-smccc.h>
#include <linux/module.h>
+#include <linux/smp.h>
#include <asm-generic/bug.h>
#include <hyperv/hvhdk.h>
#include <asm/mshyperv.h>
+#include <asm/rsi.h>
+
+/*
+ * hv_do_rsi_hypercall - Helper function to invoke a hypercall from a
+ * Realm world using the RSI interface.
+ */
+static u64 hv_do_rsi_hypercall(u64 control, u64 input1, u64 input2)
+{
+ struct rsi_host_call *hostcall;
+ unsigned long flags;
+ u64 ret;
+
+ if (!hv_hostcall_array)
+ return HV_STATUS_INVALID_HYPERCALL_INPUT;
+
+ local_irq_save(flags);
+ hostcall = &hv_hostcall_array[smp_processor_id()];
+ memset(hostcall, 0, sizeof(*hostcall));
+ hostcall->gprs[0] = HV_FUNC_ID;
+ hostcall->gprs[1] = control;
+ hostcall->gprs[2] = input1;
+ hostcall->gprs[3] = input2;
+
+ if (rsi_host_call(virt_to_phys(hostcall)) == RSI_SUCCESS)
+ ret = hostcall->gprs[0];
+ else
+ ret = HV_STATUS_INVALID_HYPERCALL_INPUT;
+
+ local_irq_restore(flags);
+ return ret;
+}
/*
* hv_do_hypercall- Invoke the specified hypercall
@@ -29,8 +61,11 @@ u64 hv_do_hypercall(u64 control, void *input, void *output)
input_address = input ? virt_to_phys(input) : 0;
output_address = output ? virt_to_phys(output) : 0;
- arm_smccc_1_1_hvc(HV_FUNC_ID, control,
- input_address, output_address, &res);
+ if (is_realm_world())
+ return hv_do_rsi_hypercall(control, input_address, output_address);
+
+ arm_smccc_1_1_hvc(HV_FUNC_ID, control, input_address,
+ output_address, &res);
return res.a0;
}
EXPORT_SYMBOL_GPL(hv_do_hypercall);
@@ -48,6 +83,9 @@ u64 hv_do_fast_hypercall8(u16 code, u64 input)
control = (u64)code | HV_HYPERCALL_FAST_BIT;
+ if (is_realm_world())
+ return hv_do_rsi_hypercall(control, input, 0);
+
arm_smccc_1_1_hvc(HV_FUNC_ID, control, input, &res);
return res.a0;
}
@@ -65,6 +103,9 @@ u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
control = (u64)code | HV_HYPERCALL_FAST_BIT;
+ if (is_realm_world())
+ return hv_do_rsi_hypercall(control, input1, input2);
+
arm_smccc_1_1_hvc(HV_FUNC_ID, control, input1, input2, &res);
return res.a0;
}
@@ -76,24 +117,44 @@ EXPORT_SYMBOL_GPL(hv_do_fast_hypercall16);
void hv_set_vpreg(u32 msr, u64 value)
{
struct arm_smccc_res res;
+ struct rsi_host_call *hostcall;
+ unsigned long flags;
+ u64 status;
- arm_smccc_1_1_hvc(HV_FUNC_ID,
- HVCALL_SET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
- HV_HYPERCALL_REP_COMP_1,
- HV_PARTITION_ID_SELF,
- HV_VP_INDEX_SELF,
- msr,
- 0,
- value,
- 0,
- &res);
+ if (is_realm_world()) {
+ local_irq_save(flags);
+ hostcall = &hv_hostcall_array[smp_processor_id()];
+ memset(hostcall, 0, sizeof(*hostcall));
+ hostcall->gprs[0] = HV_FUNC_ID;
+ hostcall->gprs[1] = HVCALL_SET_VP_REGISTERS |
+ HV_HYPERCALL_FAST_BIT |
+ HV_HYPERCALL_REP_COMP_1;
+ hostcall->gprs[2] = HV_PARTITION_ID_SELF;
+ hostcall->gprs[3] = HV_VP_INDEX_SELF;
+ hostcall->gprs[4] = msr;
+ hostcall->gprs[6] = value;
+
+ if (rsi_host_call(virt_to_phys(hostcall)) == RSI_SUCCESS)
+ status = hostcall->gprs[0];
+ else
+ status = HV_STATUS_INVALID_HYPERCALL_INPUT;
+ local_irq_restore(flags);
+ } else {
+ arm_smccc_1_1_hvc(HV_FUNC_ID,
+ HVCALL_SET_VP_REGISTERS |
+ HV_HYPERCALL_FAST_BIT |
+ HV_HYPERCALL_REP_COMP_1,
+ HV_PARTITION_ID_SELF, HV_VP_INDEX_SELF, msr,
+ 0, value, 0, &res);
+ status = res.a0;
+ }
/*
- * Something is fundamentally broken in the hypervisor if
- * setting a VP register fails. There's really no way to
- * continue as a guest VM, so panic.
+ * Something is fundamentally broken in the hypervisor (or, in a
+ * Realm, the RMM denied the host call) if setting a VP register
+ * fails. There's really no way to continue as a guest VM, so panic.
*/
- BUG_ON(!hv_result_success(res.a0));
+ BUG_ON(!hv_result_success(status));
}
EXPORT_SYMBOL_GPL(hv_set_vpreg);
@@ -108,29 +169,55 @@ void hv_get_vpreg_128(u32 msr, struct hv_get_vp_registers_output *result)
{
struct arm_smccc_1_2_regs args;
struct arm_smccc_1_2_regs res;
+ struct rsi_host_call *hostcall;
+ unsigned long flags;
+ u64 status;
- args.a0 = HV_FUNC_ID;
- args.a1 = HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
- HV_HYPERCALL_REP_COMP_1;
- args.a2 = HV_PARTITION_ID_SELF;
- args.a3 = HV_VP_INDEX_SELF;
- args.a4 = msr;
+ if (is_realm_world()) {
+ local_irq_save(flags);
+ hostcall = &hv_hostcall_array[smp_processor_id()];
+ memset(hostcall, 0, sizeof(*hostcall));
- /*
- * Use the SMCCC 1.2 interface because the results are in registers
- * beyond X0-X3.
- */
- arm_smccc_1_2_hvc(&args, &res);
+ hostcall->gprs[0] = HV_FUNC_ID;
+ hostcall->gprs[1] = HVCALL_GET_VP_REGISTERS |
+ HV_HYPERCALL_FAST_BIT |
+ HV_HYPERCALL_REP_COMP_1;
+ hostcall->gprs[2] = HV_PARTITION_ID_SELF;
+ hostcall->gprs[3] = HV_VP_INDEX_SELF;
+ hostcall->gprs[4] = msr;
+
+ if (rsi_host_call(virt_to_phys(hostcall)) == RSI_SUCCESS) {
+ status = hostcall->gprs[0];
+ result->as64.low = hostcall->gprs[6];
+ result->as64.high = hostcall->gprs[7];
+ } else {
+ status = HV_STATUS_INVALID_HYPERCALL_INPUT;
+ }
+ local_irq_restore(flags);
+ } else {
+ args.a0 = HV_FUNC_ID;
+ args.a1 = HVCALL_GET_VP_REGISTERS | HV_HYPERCALL_FAST_BIT |
+ HV_HYPERCALL_REP_COMP_1;
+ args.a2 = HV_PARTITION_ID_SELF;
+ args.a3 = HV_VP_INDEX_SELF;
+ args.a4 = msr;
+
+ /*
+ * Use the SMCCC 1.2 interface because the results are in
+ * registers beyond X0-X3.
+ */
+ arm_smccc_1_2_hvc(&args, &res);
+ status = res.a0;
+ result->as64.low = res.a6;
+ result->as64.high = res.a7;
+ }
/*
- * Something is fundamentally broken in the hypervisor if
- * getting a VP register fails. There's really no way to
- * continue as a guest VM, so panic.
+ * Something is fundamentally broken in the hypervisor (or, in a
+ * Realm, the RMM denied the host call) if getting a VP register
+ * fails. There's really no way to continue as a guest VM, so panic.
*/
- BUG_ON(!hv_result_success(res.a0));
-
- result->as64.low = res.a6;
- result->as64.high = res.a7;
+ BUG_ON(!hv_result_success(status));
}
EXPORT_SYMBOL_GPL(hv_get_vpreg_128);
--
2.45.4
* [PATCH v2 4/6] Drivers: hv: Mark shared memory as decrypted for CCA Realms
From: Kameron Carr @ 2026-06-25 17:34 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
In-Reply-To: <20260625173500.1995481-1-kameroncarr@linux.microsoft.com>
In hv_common_cpu_init(), the per-CPU hypercall input/output pages need
to be marked as decrypted (shared) for confidential VM isolation types.
This is already done for SNP and TDX isolation; extend the same handling
to Arm CCA Realm guests so that the host hypervisor can access the
shared hypercall buffers.
The memory allocated for the input/output pages must be rounded up to a
multiple of PAGE_SIZE, since set_memory_decrypted() operates on whole
kernel pages. This only has an effect on arm64 VMs that use a PAGE_SIZE
larger than 4K.
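A minimal userspace sketch of the rounding, assuming HV_HYP_PAGE_SIZE is 4 KiB (Hyper-V's fixed page size). ALIGN_UP() and the hv_alloc_size()/hv_decrypt_pages() helpers are hypothetical stand-ins for the kernel's ALIGN() and the alloc_size >> PAGE_SHIFT computation in the diff; they are not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Power-of-two round-up, same arithmetic as the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Assumption: Hyper-V hypercall pages are always 4 KiB. */
#define HV_HYP_PAGE_SIZE ((size_t)4096)

/* Bytes to allocate for pgcount hypercall pages, given the kernel PAGE_SIZE. */
static size_t hv_alloc_size(int pgcount, size_t page_size)
{
	return ALIGN_UP((size_t)pgcount * HV_HYP_PAGE_SIZE, page_size);
}

/* Number of kernel pages that would be handed to set_memory_decrypted(). */
static size_t hv_decrypt_pages(int pgcount, size_t page_size)
{
	return hv_alloc_size(pgcount, page_size) / page_size;
}
```

With 4K kernel pages the two hypercall pages stay at 8 KiB (two kernel pages); with 64K kernel pages the same two hypercall pages round up to a single 64 KiB kernel page, which is why the change is a no-op on 4K kernels.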
is_realm_world() is only declared in arch/arm64/include/asm/rsi.h, so
using it directly in the arch-neutral drivers/hv/hv_common.c would
break the x86 build. Introduce a Hyper-V-specific helper following the
established hv_isolation_type_snp() / hv_isolation_type_tdx() pattern.
On architectures other than arm64 the weak default keeps the existing
behaviour.
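A small sketch of the weak-default pattern, assuming a GCC- or Clang-compatible toolchain for __attribute__((weak)). The stand-ins isolation_snp(), isolation_tdx() and hv_pages_need_decrypt() are hypothetical; only the shape of the condition mirrors hv_common_cpu_init().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Weak default: used only if no other object file provides a strong
 * definition. On arm64 the strong definition returns is_realm_world().
 */
__attribute__((weak)) bool hv_isolation_type_cca(void)
{
	return false;
}

/* Hypothetical stand-ins for the SNP/TDX checks (always false here). */
static bool isolation_snp(void) { return false; }
static bool isolation_tdx(void) { return false; }

/* Mirrors the condition that guards set_memory_decrypted(). */
static bool hv_pages_need_decrypt(bool paravisor_present)
{
	return !paravisor_present &&
	       (isolation_snp() || isolation_tdx() || hv_isolation_type_cca());
}
```

Linking in a strong hv_isolation_type_cca() replaces the default without any #ifdef in the arch-neutral code.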
Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com>
---
arch/arm64/hyperv/mshyperv.c | 5 +++++
drivers/hv/hv_common.c | 17 +++++++++++++----
include/asm-generic/mshyperv.h | 1 +
3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
index 7d536d7fb557e..8e8148b723d9c 100644
--- a/arch/arm64/hyperv/mshyperv.c
+++ b/arch/arm64/hyperv/mshyperv.c
@@ -164,3 +164,8 @@ bool hv_is_hyperv_initialized(void)
return hyperv_initialized;
}
EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
+
+bool hv_isolation_type_cca(void)
+{
+ return is_realm_world();
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 6b67ac6167891..17048a0a18729 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -476,6 +476,7 @@ int hv_common_cpu_init(unsigned int cpu)
u64 msr_vp_index;
gfp_t flags;
const int pgcount = hv_output_page_exists() ? 2 : 1;
+ const size_t alloc_size = ALIGN((size_t)pgcount * HV_HYP_PAGE_SIZE, PAGE_SIZE);
void *mem;
int ret = 0;
@@ -489,7 +490,7 @@ int hv_common_cpu_init(unsigned int cpu)
* online and then taken offline
*/
if (!*inputarg) {
- mem = kmalloc_array(pgcount, HV_HYP_PAGE_SIZE, flags);
+ mem = kmalloc(alloc_size, flags);
if (!mem)
return -ENOMEM;
@@ -499,14 +500,16 @@ int hv_common_cpu_init(unsigned int cpu)
}
if (!ms_hyperv.paravisor_present &&
- (hv_isolation_type_snp() || hv_isolation_type_tdx())) {
- ret = set_memory_decrypted((unsigned long)mem, pgcount);
+ (hv_isolation_type_snp() || hv_isolation_type_tdx() ||
+ hv_isolation_type_cca())) {
+ ret = set_memory_decrypted((unsigned long)kasan_reset_tag(mem),
+ alloc_size >> PAGE_SHIFT);
if (ret) {
/* It may be unsafe to free 'mem' */
return ret;
}
- memset(mem, 0x00, pgcount * HV_HYP_PAGE_SIZE);
+ memset(mem, 0x00, alloc_size);
}
/*
@@ -666,6 +669,12 @@ bool __weak hv_isolation_type_tdx(void)
}
EXPORT_SYMBOL_GPL(hv_isolation_type_tdx);
+bool __weak hv_isolation_type_cca(void)
+{
+ return false;
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_cca);
+
void __weak hv_setup_vmbus_handler(void (*handler)(void))
{
}
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index bf601d67cecb9..1fa79abce743c 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -79,6 +79,7 @@ u64 hv_do_fast_hypercall16(u16 control, u64 input1, u64 input2);
bool hv_isolation_type_snp(void);
bool hv_isolation_type_tdx(void);
+bool hv_isolation_type_cca(void);
/*
* On architectures where Hyper-V doesn't support AEOI (e.g., ARM64),
--
2.45.4
* [PATCH v2 3/6] arm64: hyperv: Add per-CPU RSI host call infrastructure for CCA Realms
From: Kameron Carr @ 2026-06-25 17:34 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
In-Reply-To: <20260625173500.1995481-1-kameroncarr@linux.microsoft.com>
Arm CCA Realms cannot issue Hyper-V hypercalls via HVC; the guest must
route them through the RSI_HOST_CALL interface, which takes the IPA of a
per-CPU rsi_host_call structure as its argument.
Add hv_hostcall_array, an array of struct rsi_host_call indexed by CPU
ID, and allocate it during hyperv_init(). The allocation is gated on
is_realm_world(), so non-Realm arm64 Hyper-V guests pay no memory cost.
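A userspace sketch of the array layout, assuming the 256-byte structure from patch 1/6. aligned_alloc() stands in for kcalloc(), and hostcall_alloc(), hostcall_for_cpu() and hostcall_free() are hypothetical names. Because each entry is exactly 256 bytes, every element is 256-byte aligned as long as the base of the array is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace mirror of struct rsi_host_call: 256 bytes, 256-byte aligned. */
struct host_call {
	_Alignas(256) uint16_t immediate;
	uint8_t padding[6];
	uint64_t gprs[31];
};
_Static_assert(sizeof(struct host_call) == 256, "one entry per 256 bytes");

static struct host_call *hostcall_array;

/* Allocate one zeroed entry per possible CPU, but only inside a Realm. */
static int hostcall_alloc(unsigned int nr_cpu_ids, int in_realm)
{
	size_t bytes = (size_t)nr_cpu_ids * sizeof(struct host_call);

	if (!in_realm)
		return 0;	/* non-Realm guests pay no memory cost */

	/* aligned_alloc() requires size to be a multiple of the alignment. */
	hostcall_array = aligned_alloc(256, bytes);
	if (!hostcall_array)
		return -1;
	memset(hostcall_array, 0, bytes);	/* kcalloc() also zeroes */
	return 0;
}

/* Each CPU uses the entry at its own processor ID. */
static struct host_call *hostcall_for_cpu(unsigned int cpu)
{
	return &hostcall_array[cpu];
}

static void hostcall_free(void)
{
	free(hostcall_array);
	hostcall_array = NULL;
}
```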
Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com>
---
arch/arm64/hyperv/mshyperv.c | 32 ++++++++++++++++++++++++++++++-
arch/arm64/include/asm/mshyperv.h | 4 ++++
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/hyperv/mshyperv.c b/arch/arm64/hyperv/mshyperv.c
index 4fdc26ade1d74..7d536d7fb557e 100644
--- a/arch/arm64/hyperv/mshyperv.c
+++ b/arch/arm64/hyperv/mshyperv.c
@@ -15,10 +15,15 @@
#include <linux/errno.h>
#include <linux/version.h>
#include <linux/cpuhotplug.h>
+#include <linux/slab.h>
#include <asm/mshyperv.h>
+#include <asm/rsi.h>
static bool hyperv_initialized;
+struct rsi_host_call *hv_hostcall_array;
+EXPORT_SYMBOL_GPL(hv_hostcall_array);
+
int hv_get_hypervisor_version(union hv_hypervisor_version_info *info)
{
hv_get_vpreg_128(HV_REGISTER_HYPERVISOR_VERSION,
@@ -60,6 +65,12 @@ static bool __init hyperv_detect_via_acpi(void)
#endif
+static void hv_hostcall_free(void)
+{
+ kfree(hv_hostcall_array);
+ hv_hostcall_array = NULL;
+}
+
static bool __init hyperv_detect_via_smccc(void)
{
uuid_t hyperv_uuid = UUID_INIT(
@@ -85,6 +96,20 @@ static int __init hyperv_init(void)
if (!hyperv_detect_via_acpi() && !hyperv_detect_via_smccc())
return 0;
+ /*
+ * The RSI host-call buffers are only ever used when
+ * is_realm_world() is true. Skip the allocation on non-Realm
+ * guests. A single contiguous array of nr_cpu_ids entries is
+ * allocated; each CPU indexes into it by its processor ID.
+ */
+ if (is_realm_world()) {
+ hv_hostcall_array = kcalloc(nr_cpu_ids,
+ sizeof(struct rsi_host_call),
+ GFP_KERNEL);
+ if (!hv_hostcall_array)
+ return -ENOMEM;
+ }
+
/* Setup the guest ID */
guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
hv_set_vpreg(HV_REGISTER_GUEST_OS_ID, guest_id);
@@ -106,12 +131,13 @@ static int __init hyperv_init(void)
ret = hv_common_init();
if (ret)
- return ret;
+ goto free_hostcall_mem;
ret = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "arm64/hyperv_init:online",
hv_common_cpu_init, hv_common_cpu_die);
if (ret < 0) {
hv_common_free();
+ hv_hostcall_free();
return ret;
}
@@ -125,6 +151,10 @@ static int __init hyperv_init(void)
hyperv_initialized = true;
return 0;
+
+free_hostcall_mem:
+ hv_hostcall_free();
+ return ret;
}
early_initcall(hyperv_init);
diff --git a/arch/arm64/include/asm/mshyperv.h b/arch/arm64/include/asm/mshyperv.h
index b721d3134ab66..c207a3f79b99b 100644
--- a/arch/arm64/include/asm/mshyperv.h
+++ b/arch/arm64/include/asm/mshyperv.h
@@ -63,4 +63,8 @@ static inline u64 hv_get_non_nested_msr(unsigned int reg)
#include <asm-generic/mshyperv.h>
+/* Per-CPU-indexed RSI host call structures for CCA Realms */
+struct rsi_host_call;
+extern struct rsi_host_call *hv_hostcall_array;
+
#endif
--
2.45.4
* [PATCH v2 2/6] firmware: smccc: Detect hypervisor via RSI host call in CCA Realms
From: Kameron Carr @ 2026-06-25 17:34 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
In-Reply-To: <20260625173500.1995481-1-kameroncarr@linux.microsoft.com>
Modify arm_smccc_hypervisor_has_uuid() to check is_realm_world() and
use rsi_host_call() to query the hypervisor vendor UUID when inside a
Realm. The Realm path is factored into a helper,
arm_smccc_realm_get_hypervisor_uuid(), that owns a file-static
rsi_host_call buffer (uuid_hc) serialized by a spinlock.
The RSI-specific includes, file-static state and helper are guarded
with CONFIG_ARM64 because <asm/rsi.h> does not exist on 32-bit ARM.
For non-Realm environments, the existing arm_smccc_1_1_invoke() path
is unchanged.
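A userspace sketch of the serialized helper, assuming a C11 atomic_flag spinlock in place of the kernel's spinlock_irqsave guard. The host_call_fn callback, the fake hosts and the RET_NOT_SUPPORTED/HOST_CALL_SUCCESS constants are hypothetical stand-ins for rsi_host_call(), SMCCC_RET_NOT_SUPPORTED and RSI_SUCCESS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for SMCCC_RET_NOT_SUPPORTED (-1) and RSI_SUCCESS. */
#define RET_NOT_SUPPORTED ((uint64_t)-1)
#define HOST_CALL_SUCCESS 0

struct res4 { uint64_t a0, a1, a2, a3; };

/* Same shape as struct rsi_host_call: immediate, padding, gprs[i] = Xi. */
struct host_call {
	uint16_t immediate;
	uint8_t padding[6];
	uint64_t gprs[31];
};

typedef int (*host_call_fn)(struct host_call *hc);

/* One file-static buffer shared by all callers, so it must be serialized. */
static struct host_call uuid_hc;
static atomic_flag uuid_hc_lock = ATOMIC_FLAG_INIT;

static void realm_get_hyp_uuid(struct res4 *res, uint64_t func_id,
			       host_call_fn host_call)
{
	/* Spin until this caller owns the shared buffer. */
	while (atomic_flag_test_and_set_explicit(&uuid_hc_lock,
						  memory_order_acquire))
		;

	memset(&uuid_hc, 0, sizeof(uuid_hc));
	uuid_hc.gprs[0] = func_id;	/* function ID goes in X0 */

	if (host_call(&uuid_hc) != HOST_CALL_SUCCESS) {
		res->a0 = RET_NOT_SUPPORTED;
	} else {
		/* The UUID comes back in X0..X3, like an SMCCC result. */
		res->a0 = uuid_hc.gprs[0];
		res->a1 = uuid_hc.gprs[1];
		res->a2 = uuid_hc.gprs[2];
		res->a3 = uuid_hc.gprs[3];
	}

	atomic_flag_clear_explicit(&uuid_hc_lock, memory_order_release);
}

/* Fake hosts for testing: one answers with a dummy UUID, one refuses. */
static int fake_host_ok(struct host_call *hc)
{
	hc->gprs[0] = 0x11;
	hc->gprs[1] = 0x22;
	hc->gprs[2] = 0x33;
	hc->gprs[3] = 0x44;
	return HOST_CALL_SUCCESS;
}

static int fake_host_fail(struct host_call *hc)
{
	(void)hc;
	return 1;
}
```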
Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com>
---
drivers/firmware/smccc/smccc.c | 41 +++++++++++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index bdee057db2fd3..a876b7aa2dc99 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -12,6 +12,12 @@
#include <linux/platform_device.h>
#include <asm/archrandom.h>
+#ifdef CONFIG_ARM64
+#include <linux/cleanup.h>
+#include <linux/spinlock.h>
+#include <asm/rsi.h>
+#endif
+
static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
@@ -67,12 +73,45 @@ s32 arm_smccc_get_soc_id_revision(void)
}
EXPORT_SYMBOL_GPL(arm_smccc_get_soc_id_revision);
+#ifdef CONFIG_ARM64
+static struct rsi_host_call uuid_hc;
+static DEFINE_SPINLOCK(uuid_hc_lock);
+
+/*
+ * Helper function to get the hypervisor UUID via an RsiHostCall.
+ */
+static void arm_smccc_realm_get_hypervisor_uuid(struct arm_smccc_res *res)
+{
+ guard(spinlock_irqsave)(&uuid_hc_lock);
+
+ memset(&uuid_hc, 0, sizeof(uuid_hc));
+ uuid_hc.gprs[0] = ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID;
+
+ if (rsi_host_call(__pa_symbol(&uuid_hc)) != RSI_SUCCESS) {
+ res->a0 = SMCCC_RET_NOT_SUPPORTED;
+ return;
+ }
+
+ res->a0 = uuid_hc.gprs[0];
+ res->a1 = uuid_hc.gprs[1];
+ res->a2 = uuid_hc.gprs[2];
+ res->a3 = uuid_hc.gprs[3];
+}
+#endif
+
bool arm_smccc_hypervisor_has_uuid(const uuid_t *hyp_uuid)
{
struct arm_smccc_res res = {};
uuid_t uuid;
- arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
+#ifdef CONFIG_ARM64
+ if (is_realm_world())
+ arm_smccc_realm_get_hypervisor_uuid(&res);
+ else
+#endif
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID,
+ &res);
+
if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
return false;
--
2.45.4
* [PATCH v2 1/6] arm64: rsi: Add RSI host call structure and helper function
From: Kameron Carr @ 2026-06-25 17:34 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
In-Reply-To: <20260625173500.1995481-1-kameroncarr@linux.microsoft.com>
Add struct rsi_host_call to rsi_smc.h, which represents the host call
data structure used by the Realm Management Monitor (RMM) for the
RSI_HOST_CALL interface. The structure contains a 16-bit immediate field,
6 bytes of explicit padding and 31 general-purpose register values, and is
aligned to 256 bytes as required by the CCA RMM specification.
Add an rsi_host_call() static inline wrapper in rsi_cmds.h that invokes
SMC_RSI_HOST_CALL with the physical address of the host call structure.
This will be used by Hyper-V guest code to route hypercalls through the
RSI interface when running inside an Arm CCA Realm.
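A compile-time check of the layout described above, written against a userspace mirror of the structure; the type name and host_call_addr_ok() are hypothetical, while the field sizes and the 256-byte alignment come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct rsi_host_call (field sizes as in the diff). */
struct rsi_host_call_mirror {
	uint16_t immediate;	/* offset 0: 16-bit immediate */
	uint8_t _padding[6];	/* offset 2: explicit padding up to byte 8 */
	uint64_t gprs[31];	/* offset 8: 31 register values */
} __attribute__((aligned(256)));

_Static_assert(offsetof(struct rsi_host_call_mirror, gprs) == 8,
	       "gprs start at byte 8");
_Static_assert(sizeof(struct rsi_host_call_mirror) == 2 + 6 + 31 * 8,
	       "no tail padding: exactly 256 bytes");

/* The address passed to the host call must be 256-byte aligned. */
static int host_call_addr_ok(uintptr_t addr)
{
	return (addr & 0xff) == 0;
}
```

Without the explicit _padding field the compiler would insert the same 6 bytes implicitly before gprs; spelling it out makes those bytes visible in the source.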
Signed-off-by: Kameron Carr <kameroncarr@linux.microsoft.com>
---
arch/arm64/include/asm/rsi_cmds.h | 22 ++++++++++++++++++++++
arch/arm64/include/asm/rsi_smc.h | 7 +++++++
2 files changed, 29 insertions(+)
diff --git a/arch/arm64/include/asm/rsi_cmds.h b/arch/arm64/include/asm/rsi_cmds.h
index 2c8763876dfb7..9daf8008e5da2 100644
--- a/arch/arm64/include/asm/rsi_cmds.h
+++ b/arch/arm64/include/asm/rsi_cmds.h
@@ -88,6 +88,28 @@ static inline long rsi_set_addr_range_state(phys_addr_t start,
return res.a0;
}
+/**
+ * rsi_host_call - Make a Host call.
+ * @host_call_struct: IPA of host call structure
+ *
+ * This call will fail if the IPA of the host call structure:
+ * * is not aligned to 256 bytes,
+ * * is not protected / encrypted,
+ * * is RIPAS_EMPTY
+ *
+ * Returns:
+ * On success, returns RSI_SUCCESS.
+ * Otherwise, returns an error code.
+ */
+static inline unsigned long rsi_host_call(phys_addr_t host_call_struct)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_smc(SMC_RSI_HOST_CALL, host_call_struct, 0, 0, 0, 0, 0, 0,
+ &res);
+ return res.a0;
+}
+
/**
* rsi_attestation_token_init - Initialise the operation to retrieve an
* attestation token.
diff --git a/arch/arm64/include/asm/rsi_smc.h b/arch/arm64/include/asm/rsi_smc.h
index e19253f96c940..9cc57b5be0c02 100644
--- a/arch/arm64/include/asm/rsi_smc.h
+++ b/arch/arm64/include/asm/rsi_smc.h
@@ -142,6 +142,13 @@ struct realm_config {
*/
} __aligned(0x1000);
+struct rsi_host_call {
+ u16 immediate;
+ u8 _padding[6];
+ u64 gprs[31];
+} __aligned(256);
+static_assert(sizeof(struct rsi_host_call) == 256);
+
#endif /* __ASSEMBLER__ */
/*
--
2.45.4
* [PATCH v2 0/6] arm64: hyperv: Add Realm support for Hyper-V
From: Kameron Carr @ 2026-06-25 17:34 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli
Cc: catalin.marinas, will, mark.rutland, lpieralisi, sudeep.holla,
arnd, thuth, linux-hyperv, linux-arm-kernel, linux-kernel,
linux-arch, mhklinux
Realms (CoCo VMs on ARM) require host calls to be routed through the RMM
(Realm Management Monitor) via the RSI (Realm Service Interface). This
series implements most of the necessary changes to support Realms on
Hyper-V.
One required change is not included in this series. The two buffers
allocated via vzalloc() in netvsc_init_buf() cannot be decrypted in
vmbus_establish_gpadl(). Currently only linearly mapped memory can be
decrypted. See my RFC patch [1]. I will implement the accompanying netvsc
changes based on the feedback I receive on that patch.
This patch series was tested by booting a Realm on Cobalt 200 running
Windows. I decreased the buffer size and used kzalloc() in
netvsc_init_buf() in my testing as a workaround for the issue mentioned
above.
Changes since v1 [2]:
Patch 1: Add explicit padding to the RSI host call structure
Patch 3: Change from a per-cpu pointer lazily allocated to an array
of host call structs indexed by cpu id
Patch 4: Align input_page + output_page allocation to PAGE_SIZE since
that is the smallest unit of memory that can be decrypted
Remove KASAN tags before passing the address to set_memory_decrypted(),
since __is_lm_address() does pointer arithmetic.
Patch 5: Add a helper function to reduce repetition
Check for NULL before indexing into host call array
[1] https://lore.kernel.org/all/20260521205834.1012925-1-kameroncarr@linux.microsoft.com/
[2] https://lore.kernel.org/all/20260609181030.2378391-1-kameroncarr@linux.microsoft.com/
Kameron Carr (6):
arm64: rsi: Add RSI host call structure and helper function
firmware: smccc: Detect hypervisor via RSI host call in CCA Realms
arm64: hyperv: Add per-CPU RSI host call infrastructure for CCA Realms
Drivers: hv: Mark shared memory as decrypted for CCA Realms
arm64: hyperv: Route hypercalls through RSI host call in CCA Realms
arm64: hyperv: Implement hv_is_isolation_supported() for CCA Realms
arch/arm64/hyperv/hv_core.c | 155 +++++++++++++++++++++++-------
arch/arm64/hyperv/mshyperv.c | 42 +++++++-
arch/arm64/include/asm/mshyperv.h | 4 +
arch/arm64/include/asm/rsi_cmds.h | 22 +++++
arch/arm64/include/asm/rsi_smc.h | 7 ++
drivers/firmware/smccc/smccc.c | 41 +++++++-
drivers/hv/hv_common.c | 17 +++-
include/asm-generic/mshyperv.h | 1 +
8 files changed, 249 insertions(+), 40 deletions(-)
base-commit: a4ffc59238be84dd1c26bf1c001543e832674fc6
--
2.45.4
* Re: [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Harshitha Ramamurthy @ 2026-06-25 17:06 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
Breno Leitao, joshwash, anthony.l.nguyen, przemyslaw.kitszel,
saeedm, tariqt, mbloch, leon, alexanderduyck, kernel-team, kys,
haiyangz, wei.liu, decui, longli, jordanrhee, jacob.e.keller,
nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal, linux-rdma,
linux-hyperv
In-Reply-To: <20260624190439.2521219-1-kuba@kernel.org>
On Wed, Jun 24, 2026 at 12:04 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Breno reports following splats on mlx5:
>
> RTNL: assertion failed at net/core/dev.c (2241)
> WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
> RIP: 0010:netif_state_change+0xf9/0x130
> Call Trace:
> <TASK>
> __linkwatch_sync_dev+0xea/0x120
> ethtool_op_get_link+0xe/0x20
> __ethtool_get_link+0x26/0x40
> linkstate_prepare_data+0x51/0x200
> ethnl_default_doit+0x213/0x470
> genl_family_rcv_msg_doit+0xdd/0x110
>
> Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
> which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
> it just returns the link state, so add an opt-in bit.
>
> Reported-by: Breno Leitao <leitao@debian.org>
> Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: joshwash@google.com
> CC: hramamurthy@google.com
> CC: anthony.l.nguyen@intel.com
> CC: przemyslaw.kitszel@intel.com
> CC: saeedm@nvidia.com
> CC: tariqt@nvidia.com
> CC: mbloch@nvidia.com
> CC: leon@kernel.org
> CC: alexanderduyck@fb.com
> CC: kernel-team@meta.com
> CC: kys@microsoft.com
> CC: haiyangz@microsoft.com
> CC: wei.liu@kernel.org
> CC: decui@microsoft.com
> CC: longli@microsoft.com
> CC: jordanrhee@google.com
> CC: jacob.e.keller@intel.com
> CC: nktgrg@google.com
> CC: debarghyak@google.com
> CC: leitao@debian.org
> CC: mohsin.bashr@gmail.com
> CC: ernis@linux.microsoft.com
> CC: sdf@fomichev.me
> CC: gal@nvidia.com
> CC: linux-rdma@vger.kernel.org
> CC: linux-hyperv@vger.kernel.org
> ---
> include/linux/ethtool.h | 2 ++
> net/ethtool/common.h | 4 ++++
> drivers/net/ethernet/google/gve/gve_ethtool.c | 3 ++-
> drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 1 +
> drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 3 ++-
> drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 3 ++-
> drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 4 +++-
> drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c | 3 ++-
> drivers/net/ethernet/microsoft/mana/mana_ethtool.c | 3 ++-
> 9 files changed, 20 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index 1b834e2a522e..5d491a98265e 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -942,6 +942,7 @@ struct kernel_ethtool_ts_info {
> #define ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM BIT(5)
> #define ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM BIT(6)
> #define ETHTOOL_OP_NEEDS_RTNL_RSS BIT(7)
> +#define ETHTOOL_OP_NEEDS_RTNL_GLINK BIT(8)
>
> /**
> * struct ethtool_ops - optional netdev operations
> @@ -978,6 +979,7 @@ struct kernel_ethtool_ts_info {
> * - phylink helpers (note that phydev is currently unsupported!)
> * - netdev_update_features()
> * - netif_set_real_num_tx_queues()
> + * - ethtool_op_get_link() (syncs link watch under rtnl_lock)
> *
> * @get_drvinfo: Report driver/device information. Modern drivers no
> * longer have to implement this callback. Most fields are
> diff --git a/net/ethtool/common.h b/net/ethtool/common.h
> index 2b3847f00801..4e5356e26f40 100644
> --- a/net/ethtool/common.h
> +++ b/net/ethtool/common.h
> @@ -113,6 +113,8 @@ ethtool_nl_msg_needs_rtnl(const struct net_device *dev, u8 cmd)
> return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM;
> case ETHTOOL_MSG_RSS_SET:
> return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
> + case ETHTOOL_MSG_LINKSTATE_GET:
> + return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
> case ETHTOOL_MSG_TSCONFIG_GET:
> case ETHTOOL_MSG_TSCONFIG_SET:
> /* tsconfig calls ndos (ndo_hwtstamp_set/get), not ethtool ops.
> @@ -159,6 +161,8 @@ ethtool_ioctl_needs_rtnl(const struct net_device *dev, u32 ethcmd)
> case ETHTOOL_SRXFH:
> case ETHTOOL_SRXFHINDIR:
> return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_RSS;
> + case ETHTOOL_GLINK:
> + return ops->op_needs_rtnl & ETHTOOL_OP_NEEDS_RTNL_GLINK;
> }
> return false;
> }
> diff --git a/drivers/net/ethernet/google/gve/gve_ethtool.c b/drivers/net/ethernet/google/gve/gve_ethtool.c
> index 7cc22916852f..8199738ba979 100644
> --- a/drivers/net/ethernet/google/gve/gve_ethtool.c
> +++ b/drivers/net/ethernet/google/gve/gve_ethtool.c
> @@ -984,7 +984,8 @@ const struct ethtool_ops gve_ethtool_ops = {
> .supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT |
> ETHTOOL_RING_USE_RX_BUF_LEN,
> .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> + ETHTOOL_OP_NEEDS_RTNL_GLINK,
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Thanks for the fix!
> .get_drvinfo = gve_get_drvinfo,
> .get_strings = gve_get_strings,
> .get_sset_count = gve_get_sset_count,
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> index a615d599b88e..e7cf12eaa268 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
> @@ -1855,6 +1855,7 @@ static const struct ethtool_ops iavf_ethtool_ops = {
> .supported_coalesce_params = ETHTOOL_COALESCE_USECS |
> ETHTOOL_COALESCE_USE_ADAPTIVE,
> .supported_input_xfrm = RXH_XFRM_SYM_XOR,
> + .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .get_drvinfo = iavf_get_drvinfo,
> .get_link = ethtool_op_get_link,
> .get_ringparam = iavf_get_ringparam,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> index 2f5b626ba33f..112926d07634 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
> @@ -2721,7 +2721,8 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
> .rxfh_max_num_contexts = MLX5E_MAX_NUM_RSS,
> .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> - ETHTOOL_OP_NEEDS_RTNL_SPFLAGS,
> + ETHTOOL_OP_NEEDS_RTNL_SPFLAGS |
> + ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .supported_coalesce_params = ETHTOOL_COALESCE_USECS |
> ETHTOOL_COALESCE_MAX_FRAMES |
> ETHTOOL_COALESCE_USE_ADAPTIVE |
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> index 1a8a19f980d3..c8b76d301c92 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
> @@ -419,7 +419,8 @@ static const struct ethtool_ops mlx5e_rep_ethtool_ops = {
> ETHTOOL_COALESCE_MAX_FRAMES |
> ETHTOOL_COALESCE_USE_ADAPTIVE,
> .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> + ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .get_drvinfo = mlx5e_rep_get_drvinfo,
> .get_link = ethtool_op_get_link,
> .get_strings = mlx5e_rep_get_strings,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
> index 9b3b32408c64..01ddc3def9ac 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
> @@ -286,7 +286,8 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
> ETHTOOL_COALESCE_MAX_FRAMES |
> ETHTOOL_COALESCE_USE_ADAPTIVE,
> .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> + ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .get_drvinfo = mlx5i_get_drvinfo,
> .get_strings = mlx5i_get_strings,
> .get_sset_count = mlx5i_get_sset_count,
> @@ -309,6 +310,7 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
> };
>
> const struct ethtool_ops mlx5i_pkey_ethtool_ops = {
> + .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .get_drvinfo = mlx5i_get_drvinfo,
> .get_link = ethtool_op_get_link,
> .get_ts_info = mlx5i_get_ts_info,
> diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
> index cb34fc166ef9..0e47088ec44b 100644
> --- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
> +++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
> @@ -2024,7 +2024,8 @@ static const struct ethtool_ops fbnic_ethtool_ops = {
> ETHTOOL_OP_NEEDS_RTNL_GPAUSEPARAM |
> ETHTOOL_OP_NEEDS_RTNL_SPAUSEPARAM |
> ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> + ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .get_drvinfo = fbnic_get_drvinfo,
> .get_regs_len = fbnic_get_regs_len,
> .get_regs = fbnic_get_regs,
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> index 94e658d07a27..881df597d7f9 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> @@ -597,7 +597,8 @@ static int mana_get_link_ksettings(struct net_device *ndev,
> const struct ethtool_ops mana_ethtool_ops = {
> .supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
> .op_needs_rtnl = ETHTOOL_OP_NEEDS_RTNL_SCHANNELS |
> - ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM,
> + ETHTOOL_OP_NEEDS_RTNL_SRINGPARAM |
> + ETHTOOL_OP_NEEDS_RTNL_GLINK,
> .get_ethtool_stats = mana_get_ethtool_stats,
> .get_sset_count = mana_get_sset_count,
> .get_strings = mana_get_strings,
> --
> 2.54.0
>
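A minimal sketch of the opt-in lookup done in ethtool_nl_msg_needs_rtnl(), using hypothetical mirror types and message names; only the RSS and GLINK bit positions come from the ethtool.h hunk. A driver whose .get_link is ethtool_op_get_link() sets the GLINK bit, while a driver that only returns the link state (bnxt, per the commit message) leaves it clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Bit positions copied from the ethtool.h hunk; other flags omitted. */
#define OP_NEEDS_RTNL_RSS   BIT(7)
#define OP_NEEDS_RTNL_GLINK BIT(8)

/* Hypothetical subset of Netlink message types. */
enum nl_msg { MSG_RSS_SET, MSG_LINKSTATE_GET, MSG_OTHER_GET };

struct ops_mirror {
	uint32_t op_needs_rtnl;	/* opt-in bits set by the driver */
};

/* Take rtnl_lock only if the driver opted in for this command. */
static bool nl_msg_needs_rtnl(const struct ops_mirror *ops, enum nl_msg cmd)
{
	switch (cmd) {
	case MSG_RSS_SET:
		return ops->op_needs_rtnl & OP_NEEDS_RTNL_RSS;
	case MSG_LINKSTATE_GET:
		return ops->op_needs_rtnl & OP_NEEDS_RTNL_GLINK;
	default:
		return false;
	}
}
```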
* Re: [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Breno Leitao @ 2026-06-25 16:47 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
horms, joshwash, hramamurthy, anthony.l.nguyen,
przemyslaw.kitszel, saeedm, tariqt, mbloch, leon, alexanderduyck,
kernel-team, kys, haiyangz, wei.liu, decui, longli, jordanrhee,
jacob.e.keller, nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal,
linux-rdma, linux-hyperv
In-Reply-To: <aj1Nqe3RoITzxSEb@devvm7509.cco0.facebook.com>
On Thu, Jun 25, 2026 at 08:48:03AM -0700, Stanislav Fomichev wrote:
> On 06/24, Jakub Kicinski wrote:
> > Breno reports following splats on mlx5:
> >
> > RTNL: assertion failed at net/core/dev.c (2241)
> > WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
> > RIP: 0010:netif_state_change+0xf9/0x130
> > Call Trace:
> > <TASK>
> > __linkwatch_sync_dev+0xea/0x120
> > ethtool_op_get_link+0xe/0x20
> > __ethtool_get_link+0x26/0x40
> > linkstate_prepare_data+0x51/0x200
> > ethnl_default_doit+0x213/0x470
> > genl_family_rcv_msg_doit+0xdd/0x110
> >
> > Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
> > which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
> > it just returns the link state, so add an opt-in bit.
> >
> > Reported-by: Breno Leitao <leitao@debian.org>
> > Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Breno Leitao <leitao@debian.org>
* RE: [PATCH] hyperv: mshv: zero VTL hypercall input page
From: Michael Kelley @ 2026-06-25 16:41 UTC (permalink / raw)
To: Yousef Alhouseen, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260624175703.9285-1-alhouseenyousef@gmail.com>
From: Yousef Alhouseen <alhouseenyousef@gmail.com> Sent: Wednesday, June 24, 2026 10:57 AM
> Subject: [PATCH] hyperv: mshv: zero VTL hypercall input page
>
Same comment here about the patch "Subject:" prefix.
> mshv_vtl_hvcall_call() copies only the user-provided input size.
>
> It then passes the page to hv_do_hypercall().
>
> For short inputs, stale bytes can remain in the bounce page.
>
> Those bytes can be consumed by the hypervisor.
It's unclear to me that there's really a problem here. In a
CoCo VM, the host hypervisor isn't trusted, so hypercall sites
must be careful to only expose intended data in the hypercall
input and output pages. But this code already doesn't support
CoCo VMs, as noted in the comment. So in the supported
scenario, the hypervisor has access to all of guest memory. Passing
stale bytes to the hypervisor vs. passing zeros really wouldn't matter.
And user space can already pass stale/garbage bytes to the hypervisor
if it wants to. This code doesn't try to validate the input data for
whatever hypercall user space is requesting to be made.
When support for CoCo VMs is added, this code will indeed
need to make sure not to allow garbage kernel data in the
hypercall input or output pages. But decrypting the pages
so the hypervisor can access them should take care of that
issue.
Michael
>
> Allocate the input page zeroed, matching the output page.
>
> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
> drivers/hv/mshv_vtl_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> index 0365d207c..f2633148c 100644
> --- a/drivers/hv/mshv_vtl_main.c
> +++ b/drivers/hv/mshv_vtl_main.c
> @@ -1146,7 +1146,7 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
> *
> * TODO: Take care of this when CVM support is added.
> */
> - in = (void *)__get_free_page(GFP_KERNEL);
> + in = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> if (!in || !out) {
> ret = -ENOMEM;
> --
> 2.54.0
>
* RE: [PATCH] hyperv: mshv: zero VTL hypercall output page
From: Michael Kelley @ 2026-06-25 16:41 UTC (permalink / raw)
To: Yousef Alhouseen, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
Dexuan Cui, Long Li
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260624172157.2790-1-alhouseenyousef@gmail.com>
From: Yousef Alhouseen <alhouseenyousef@gmail.com> Sent: Wednesday, June 24, 2026 10:22 AM
> Subject: [PATCH] hyperv: mshv: zero VTL hypercall output page
There was a recent discussion about what prefix to use in the patch
"Subject:" field for changes to MSHV VTL code. The agreement was to
use just "mshv_vtl:". See [1].
[1] https://lore.kernel.org/linux-hyperv/a0d271e3-ece8-45cf-9dbb-ced773d6f3f8@linux.microsoft.com/
>
> mshv_vtl_hvcall_call() copies output_size bytes from a freshly allocated
> hypercall output page back to userspace. The page is currently allocated
> without __GFP_ZERO, so any bytes not written by the hypervisor are copied
> from stale page contents.
This is a good find! Even though the VTL user space code is somewhat trusted,
there should not be any circumstances where the kernel could copy random
garbage to user space.
>
> Allocate the output page zeroed before issuing the hypercall.
Hypercall output is usually no more than a few tens of bytes. Zeroing
the entire page is a bit expensive. It would be sufficient to just zero
output_size bytes.
Standard practice is to *not* zero the hypercall output area, since
the hypercall invoker knows exactly how many bytes Hyper-V will
return for a particular hypercall, and Hyper-V is responsible for not
leaving any garbage. So it would be good to leave a code comment
here about why the output area is being zeroed contrary to that
standard practice.
I would note that many hypercalls don't return any output other
than the hypercall status. If output_size is zero, allocating the
output page could be skipped. But that's a further
optimization for another patch.
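A userspace sketch of the suggestion above, assuming a single 4 KiB bounce page and a fake hypercall that writes fewer bytes than the caller asked for. prepare_and_call(), fake_hypercall() and the size check are hypothetical. Only output_size bytes are cleared, which is exactly the range later copied back to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OUT_PAGE_SIZE ((size_t)4096)	/* assumption: one 4K bounce page */

/* Hypothetical hypercall stand-in that writes only 'written' output bytes. */
static void fake_hypercall(uint8_t *out, size_t written)
{
	memset(out, 0xab, written);
}

/*
 * Clear only the range that will be copied back, then make the call.
 * 'page' simulates a freshly allocated page that still holds stale data.
 */
static int prepare_and_call(uint8_t *page, size_t output_size, size_t written)
{
	if (output_size > OUT_PAGE_SIZE)
		return -EINVAL;		/* hypothetical bounds check */

	memset(page, 0, output_size);	/* tens of bytes, not a whole page */
	fake_hypercall(page, written);
	return 0;
}
```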
> Also check
> both bounce-page allocations before using them so memory pressure cannot
> turn the copy paths into NULL pointer dereferences.
>
> Signed-off-by: Yousef Alhouseen <alhouseenyousef@gmail.com>
> ---
> drivers/hv/mshv_vtl_main.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hv/mshv_vtl_main.c b/drivers/hv/mshv_vtl_main.c
> index 0d3d41619..0365d207c 100644
> --- a/drivers/hv/mshv_vtl_main.c
> +++ b/drivers/hv/mshv_vtl_main.c
> @@ -1147,7 +1147,11 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
> * TODO: Take care of this when CVM support is added.
> */
> in = (void *)__get_free_page(GFP_KERNEL);
> - out = (void *)__get_free_page(GFP_KERNEL);
> + out = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
> + if (!in || !out) {
> + ret = -ENOMEM;
> + goto free_pages;
> + }
>
> if (copy_from_user(in, (void __user *)hvcall.input_ptr, hvcall.input_size)) {
> ret = -EFAULT;
> @@ -1162,8 +1166,10 @@ static int mshv_vtl_hvcall_call(struct mshv_vtl_hvcall_fd *fd,
> }
> ret = put_user(hvcall.status, &hvcall_user->status);
> free_pages:
> - free_page((unsigned long)in);
> - free_page((unsigned long)out);
> + if (in)
> + free_page((unsigned long)in);
> + if (out)
> + free_page((unsigned long)out);
Testing "in" and "out" here isn't necessary. free_page()
already has code to do nothing if its argument is zero.
Michael
>
> return ret;
> }
> --
> 2.54.0
>
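The allocation check and the point about free_page() combine into the usual single-cleanup-label pattern. The sketch below is a userspace model under stated assumptions: sketch_get_free_page() and sketch_free_page() are hypothetical stand-ins, with sketch_free_page() ignoring a zero address the same way the kernel's free_pages() does, so the frees after the label can stay unconditional.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int fail_next_alloc; /* test hook: make the next allocation fail */
static int live_pages;      /* pages currently allocated */

/* Hypothetical stand-in for __get_free_page(): returns 0 on failure. */
static unsigned long sketch_get_free_page(void)
{
	void *p;

	if (fail_next_alloc) {
		fail_next_alloc = 0;
		return 0;
	}
	p = malloc(4096);
	if (p)
		live_pages++;
	return (unsigned long)(uintptr_t)p;
}

/* Like the kernel's free_pages(), silently ignore a zero address. */
static void sketch_free_page(unsigned long addr)
{
	if (addr == 0)
		return;
	free((void *)(uintptr_t)addr);
	live_pages--;
}

static int sketch_alloc_bounce_pages(int fail_second)
{
	void *in, *out;
	int ret = 0;

	in = (void *)(uintptr_t)sketch_get_free_page();
	fail_next_alloc = fail_second;
	out = (void *)(uintptr_t)sketch_get_free_page();
	/* Check both pages before either one is used. */
	if (!in || !out) {
		ret = -1; /* -ENOMEM in the kernel */
		goto free_pages;
	}
	/* ... copy_from_user(), the hypercall and copy_to_user() go here ... */
free_pages:
	/* Unconditional: a NULL page is simply ignored. */
	sketch_free_page((unsigned long)(uintptr_t)in);
	sketch_free_page((unsigned long)(uintptr_t)out);
	return ret;
}
```

Whether the second allocation succeeds or fails, every page that was allocated is released exactly once and no NULL page is dereferenced.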
* Re: [PATCH v5 net] net: mana: Optimize irq affinity for low vcpu configs
From: patchwork-bot+netdevbpf @ 2026-06-25 16:20 UTC (permalink / raw)
To: Shradha Gupta
Cc: decui, wei.liu, haiyangz, kys, andrew+netdev, davem, edumazet,
kuba, pabeni, kotaranov, horms, ernis, dipayanroy, shirazsaleem,
mhklinux, longli, yury.norov, linux-hyperv, linux-kernel, netdev,
paulros, shradhagupta, ssengar, stable, ynorov
In-Reply-To: <20260624072138.1632849-1-shradhagupta@linux.microsoft.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 24 Jun 2026 00:21:35 -0700 you wrote:
> Before the commit 755391121038 ("net: mana: Allocate MSI-X vectors
> dynamically"), all the MANA IRQs were assigned statically and together
> during early driver load.
>
> After this commit, the IRQ allocation for MANA was done in two phases.
> HWC IRQ allocated earlier and then, queue IRQs dynamically added at a
> later point. By this time, the IRQ weights on vCPUs can become imbalanced
> and if IRQ count is greater than the vCPU count the topology aware IRQ
> distribution logic in MANA can cause multiple MANA IRQs to land on the
> same vCPUs, while other sibling vCPUs have none (case 1).
>
> [...]
Here is the summary with links:
- [v5,net] net: mana: Optimize irq affinity for low vcpu configs
https://git.kernel.org/netdev/net/c/5316394b1752
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
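The imbalance described in the quoted commit message (IRQs added in a later phase piling onto the same vCPUs while siblings get none) can be illustrated with a simple least-loaded assignment. This is a generic sketch, not the algorithm in the applied commit, whose body is elided above; assign_irqs() and the per-vCPU weight array are hypothetical.

```c
#include <assert.h>

/*
 * Assign each of nr_irqs IRQs to the vCPU with the lowest current weight.
 * weight[] holds IRQ counts already present per vCPU (for example an IRQ
 * allocated in an earlier phase) and is updated in place; cpu_of_irq[]
 * receives the chosen vCPU per IRQ. Ties go to the lowest-numbered vCPU.
 */
static void assign_irqs(int nr_irqs, int nr_cpus, int weight[],
			int cpu_of_irq[])
{
	for (int irq = 0; irq < nr_irqs; irq++) {
		int best = 0;

		for (int cpu = 1; cpu < nr_cpus; cpu++)
			if (weight[cpu] < weight[best])
				best = cpu;
		cpu_of_irq[irq] = best;
		weight[best]++;
	}
}
```

With two vCPUs where vCPU 0 already carries one IRQ, three new IRQs land on vCPUs 1, 0 and 1, ending with two IRQs per vCPU. Counting previously assigned IRQs is what keeps the later allocation phase from ignoring the earlier one; a topology-aware variant would additionally weigh core siblings.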
* Re: [PATCH net] net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
From: Stanislav Fomichev @ 2026-06-25 15:48 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
Breno Leitao, joshwash, hramamurthy, anthony.l.nguyen,
przemyslaw.kitszel, saeedm, tariqt, mbloch, leon, alexanderduyck,
kernel-team, kys, haiyangz, wei.liu, decui, longli, jordanrhee,
jacob.e.keller, nktgrg, debarghyak, mohsin.bashr, ernis, sdf, gal,
linux-rdma, linux-hyperv
In-Reply-To: <20260624190439.2521219-1-kuba@kernel.org>
On 06/24, Jakub Kicinski wrote:
> Breno reports following splats on mlx5:
>
> RTNL: assertion failed at net/core/dev.c (2241)
> WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
> RIP: 0010:netif_state_change+0xf9/0x130
> Call Trace:
> <TASK>
> __linkwatch_sync_dev+0xea/0x120
> ethtool_op_get_link+0xe/0x20
> __ethtool_get_link+0x26/0x40
> linkstate_prepare_data+0x51/0x200
> ethnl_default_doit+0x213/0x470
> genl_family_rcv_msg_doit+0xdd/0x110
>
> Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
> which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
> it just returns the link state, so add an opt-in bit.
>
> Reported-by: Breno Leitao <leitao@debian.org>
> Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
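The opt-in idea in the quoted patch can be sketched as follows: a per-driver flag records whether get_link may need the lock, and the GET path takes the lock only when the flag is set. This is a single-threaded userspace model; fake_rtnl_lock(), struct sketch_ops and get_link_needs_lock are hypothetical names, not the ones used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Trivial stand-ins for rtnl_lock()/rtnl_unlock(); single-threaded only. */
static bool fake_rtnl_held;
static void fake_rtnl_lock(void)   { fake_rtnl_held = true; }
static void fake_rtnl_unlock(void) { fake_rtnl_held = false; }

struct sketch_ops {
	bool get_link_needs_lock; /* hypothetical opt-in bit */
	int (*get_link)(void);
};

/* Like ethtool_op_get_link() syncing linkwatch: the lock must be held. */
static int get_link_syncing(void)
{
	assert(fake_rtnl_held); /* models the "RTNL: assertion failed" check */
	return 1;
}

/* Like a driver that only reports cached link state: no lock needed. */
static int get_link_cached(void)
{
	return 1;
}

static int sketch_get_link(const struct sketch_ops *ops)
{
	int link;

	if (!ops->get_link_needs_lock)
		return ops->get_link();

	fake_rtnl_lock();
	link = ops->get_link();
	fake_rtnl_unlock();
	return link;
}
```

Pairing get_link_syncing with the flag cleared reproduces the splat in miniature: the assertion fires because the lockless path was taken for an operation that needs the lock.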
* Re: [PATCH net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
From: patchwork-bot+netdevbpf @ 2026-06-25 1:08 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, dipayanroy, ssengar, jacob.e.keller,
horms, gargaditya, kees, linux-hyperv, netdev, linux-kernel, bpf
In-Reply-To: <20260619055348.467224-1-ernis@linux.microsoft.com>
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 18 Jun 2026 22:53:38 -0700 you wrote:
> Commit d7709812e13d ("net: mana: hardening: Validate adapter_mtu from
> MANA_QUERY_DEV_CONFIG") rejected any adapter_mtu value smaller than
> ETH_MIN_MTU + ETH_HLEN, including 0, returning -EPROTO and failing
> mana_probe().
>
> Some older PF firmware versions still in the field report
> adapter_mtu as 0 in the MANA_QUERY_DEV_CONFIG response. With the
> hardening check in place, the MANA VF driver now fails to load on
> those hosts, breaking networking entirely for guests.
>
> [...]
Here is the summary with links:
- [net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
https://git.kernel.org/netdev/net/c/6bd81a5b4e0d
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
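The fallback described in the quoted commit message could look roughly like the sketch below: treat an adapter_mtu of 0 as "not reported" and substitute a standard frame size, while still rejecting other values below ETH_MIN_MTU + ETH_HLEN. The function name is hypothetical, and using ETH_FRAME_LEN (a 1500-byte MTU plus the 14-byte Ethernet header) as the fallback is an assumption; the applied patch body is elided above and may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values from <linux/if_ether.h>, copied so the sketch builds in userspace. */
#define SKETCH_ETH_HLEN      14
#define SKETCH_ETH_MIN_MTU   68
#define SKETCH_ETH_FRAME_LEN 1514 /* 1500-byte MTU + Ethernet header */

/* Returns 0 and sets *adapter_mtu, or -EPROTO for an invalid value. */
static int sketch_validate_adapter_mtu(uint32_t reported,
				       uint32_t *adapter_mtu)
{
	/* Older PF firmware reports 0: fall back instead of failing probe. */
	if (reported == 0) {
		*adapter_mtu = SKETCH_ETH_FRAME_LEN;
		return 0;
	}
	/* Any other implausibly small value is still rejected. */
	if (reported < SKETCH_ETH_MIN_MTU + SKETCH_ETH_HLEN)
		return -EPROTO;
	*adapter_mtu = reported;
	return 0;
}
```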
* RE: [EXTERNAL] Re: [PATCH net] net: mana: Sync page pool RX frags for CPU
From: Dexuan Cui @ 2026-06-24 22:50 UTC (permalink / raw)
To: Simon Horman
Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Long Li,
andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, Konstantin Taranov,
ernis@linux.microsoft.com, dipayanroy@linux.microsoft.com,
kees@kernel.org, jacob.e.keller@intel.com,
ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-rdma@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260619090514.GT827683@horms.kernel.org>
> From: Simon Horman <horms@kernel.org>
> Sent: Friday, June 19, 2026 2:05 AM
> > ...
> > Also validate the packet length reported in the RX CQE before using it as
> > a DMA sync length or passing it to skb processing. The CQE is supplied
> > by the device and should not be blindly trusted by Confidential VMs.
>
> I think this last part warrants being split out into a separate patch.
Sorry for the late reply. I split v1 into two patches for v2, which I just posted:
https://lwn.net/ml/linux-kernel/20260624222605.1794719-1-decui@microsoft.com/
Thanks,
Dexuan
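The CQE length validation mentioned above amounts to not trusting a device-supplied length until it is known to fit the posted buffer. A minimal sketch, assuming the upper bound is the RX buffer size (datasize); the function name is hypothetical and the actual check is in the separate v2 patch, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: only trust a CQE-reported packet length if it fits
 * in the posted RX buffer of datasize bytes. A larger value would make the
 * DMA sync and the skb code read past the buffer, so the packet should be
 * dropped instead.
 */
static bool sketch_rx_pktlen_ok(uint32_t pktlen, uint32_t datasize)
{
	return pktlen <= datasize;
}
```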
* [PATCH net v2 1/2] net: mana: Sync page pool RX frags for CPU
From: Dexuan Cui @ 2026-06-24 22:26 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, kotaranov, horms, ernis, dipayanroy, kees,
jacob.e.keller, ssengar, linux-hyperv, netdev, linux-kernel,
linux-rdma
Cc: stable
In-Reply-To: <20260624222605.1794719-1-decui@microsoft.com>
MANA allocates RX buffers from page pool fragments when frag_count is
greater than 1. In that case the buffers remain DMA mapped by page pool
and the RX completion path does not call dma_unmap_single(). As a result,
the implicit sync-for-CPU normally performed by dma_unmap_single() is
missing before the packet data is passed to the networking stack.
This breaks RX on configurations which require explicit DMA syncing, for
example when booted with swiotlb=force.
Fix this by recording the page pool page and DMA sync offset when the RX
buffer is allocated, and syncing the received packet range for CPU access
before handing the RX buffer to the stack.
Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.")
Cc: stable@vger.kernel.org
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
Changes since v1:
v1 is split into two patches in v2.
Add Haiyang's Reviewed-by.
drivers/net/ethernet/microsoft/mana/mana_en.c | 39 +++++++++++++++----
include/net/mana/mana.h | 8 ++++
2 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index c9b1df1ed109..1875bffd82b7 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2044,12 +2044,16 @@ static void mana_rx_skb(void *buf_va, bool from_pool,
}
static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
- dma_addr_t *da, bool *from_pool)
+ dma_addr_t *da, bool *from_pool,
+ struct page **pp_page, u32 *dma_sync_offset)
{
struct page *page;
u32 offset;
void *va;
+
*from_pool = false;
+ *pp_page = NULL;
+ *dma_sync_offset = 0;
/* Don't use fragments for jumbo frames or XDP where it's 1 fragment
* per page.
@@ -2087,31 +2091,47 @@ static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
va = page_to_virt(page) + offset;
*da = page_pool_get_dma_addr(page) + offset + rxq->headroom;
*from_pool = true;
+ *pp_page = page;
+ *dma_sync_offset = offset + rxq->headroom;
return va;
}
/* Allocate frag for rx buffer, and save the old buf */
static void mana_refill_rx_oob(struct device *dev, struct mana_rxq *rxq,
- struct mana_recv_buf_oob *rxoob, void **old_buf,
- bool *old_fp)
+ struct mana_recv_buf_oob *rxoob, u32 pktlen,
+ void **old_buf, bool *old_fp)
{
+ struct page *pp_page;
+ u32 dma_sync_offset;
bool from_pool;
dma_addr_t da;
void *va;
- va = mana_get_rxfrag(rxq, dev, &da, &from_pool);
+ va = mana_get_rxfrag(rxq, dev, &da, &from_pool, &pp_page,
+ &dma_sync_offset);
if (!va)
return;
- if (!rxoob->from_pool || rxq->frag_count == 1)
+ if (!rxoob->from_pool || rxq->frag_count == 1) {
dma_unmap_single(dev, rxoob->sgl[0].address, rxq->datasize,
DMA_FROM_DEVICE);
+ } else {
+ /* The page pool maps the whole page and only syncs for device
+ * automatically (PP_FLAG_DMA_SYNC_DEV). Sync the received bytes
+ * for the CPU before they are read: this is required if DMA
+ * is incoherent or bounce buffers are used.
+ */
+ page_pool_dma_sync_for_cpu(rxq->page_pool, rxoob->pp_page,
+ rxoob->dma_sync_offset, pktlen);
+ }
*old_buf = rxoob->buf_va;
*old_fp = rxoob->from_pool;
rxoob->buf_va = va;
rxoob->sgl[0].address = da;
rxoob->from_pool = from_pool;
+ rxoob->pp_page = pp_page;
+ rxoob->dma_sync_offset = dma_sync_offset;
}
static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
@@ -2170,7 +2190,7 @@ static void mana_process_rx_cqe(struct mana_rxq *rxq, struct mana_cq *cq,
rxbuf_oob = &rxq->rx_oobs[curr];
WARN_ON_ONCE(rxbuf_oob->wqe_inf.wqe_size_in_bu != 1);
- mana_refill_rx_oob(dev, rxq, rxbuf_oob, &old_buf, &old_fp);
+ mana_refill_rx_oob(dev, rxq, rxbuf_oob, pktlen, &old_buf, &old_fp);
/* Unsuccessful refill will have old_buf == NULL.
* In this case, mana_rx_skb() will drop the packet.
@@ -2566,6 +2586,8 @@ static int mana_fill_rx_oob(struct mana_recv_buf_oob *rx_oob, u32 mem_key,
struct mana_rxq *rxq, struct device *dev)
{
struct mana_port_context *mpc = netdev_priv(rxq->ndev);
+ struct page *pp_page = NULL;
+ u32 dma_sync_offset = 0;
bool from_pool = false;
dma_addr_t da;
void *va;
@@ -2573,13 +2595,16 @@ static int mana_fill_rx_oob(struct mana_recv_buf_oob *rx_oob, u32 mem_key,
if (mpc->rxbufs_pre)
va = mana_get_rxbuf_pre(rxq, &da);
else
- va = mana_get_rxfrag(rxq, dev, &da, &from_pool);
+ va = mana_get_rxfrag(rxq, dev, &da, &from_pool, &pp_page,
+ &dma_sync_offset);
if (!va)
return -ENOMEM;
rx_oob->buf_va = va;
rx_oob->from_pool = from_pool;
+ rx_oob->pp_page = pp_page;
+ rx_oob->dma_sync_offset = dma_sync_offset;
rx_oob->sgl[0].address = da;
rx_oob->sgl[0].size = rxq->datasize;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 8f721cd4e4a7..4111b93169d2 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -305,6 +305,14 @@ struct mana_recv_buf_oob {
void *buf_va;
bool from_pool; /* allocated from a page pool */
+ /* head page of the page_pool fragment; valid only when
+ * from_pool && frag_count > 1.
+ */
+ struct page *pp_page;
+ /* Fragment offset plus rxq->headroom, passed to
+ * page_pool_dma_sync_for_cpu().
+ */
+ u32 dma_sync_offset;
/* SGL of the buffer going to be sent as part of the work request. */
u32 num_sge;
--
2.34.1
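To see why the missing sync breaks RX under swiotlb=force, the following userspace model keeps a CPU-visible page and a separate bounce page that the "device" writes into; the CPU only sees the data after an explicit sync-for-CPU of the received range. This is a simplification under stated assumptions: the real page_pool_dma_sync_for_cpu() goes through the DMA API, which for swiotlb copies from the bounce buffer, and all names here are hypothetical apart from the pp_page/dma_sync_offset fields they mirror.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Model of a mapped page when every DMA is bounced (swiotlb=force). */
struct sketch_dma_page {
	unsigned char cpu[SKETCH_PAGE_SIZE];    /* what the driver reads */
	unsigned char bounce[SKETCH_PAGE_SIZE]; /* what the device writes */
};

/* Per-buffer bookkeeping, mirroring the fields the patch adds. */
struct sketch_rx_buf {
	struct sketch_dma_page *pp_page;
	uint32_t dma_sync_offset; /* fragment offset + headroom */
};

static int range_ok(uint32_t offset, uint32_t len)
{
	return offset <= SKETCH_PAGE_SIZE && len <= SKETCH_PAGE_SIZE - offset;
}

/* The device writes len bytes at offset into the bounce page only. */
static int sketch_device_rx(struct sketch_dma_page *pg, uint32_t offset,
			    const void *data, uint32_t len)
{
	if (!range_ok(offset, len))
		return -1;
	memcpy(pg->bounce + offset, data, len);
	return 0;
}

/* Models a sync-for-CPU of [offset, offset + len). */
static int sketch_sync_for_cpu(struct sketch_dma_page *pg, uint32_t offset,
			       uint32_t len)
{
	if (!range_ok(offset, len))
		return -1;
	memcpy(pg->cpu + offset, pg->bounce + offset, len);
	return 0;
}
```

Before the sync, the bytes at dma_sync_offset in the CPU page are still the old contents; after syncing pktlen bytes from that offset they match what the device wrote, which is the step the patch adds for page pool fragments.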