* Re: [PATCH net-next v5] net: mana: Add MAC address to vPort logs and clarify error messages
From: Simon Horman @ 2026-03-05 9:19 UTC
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, kuba, pabeni, dipayanroy, shirazsaleem, kees,
shradhagupta, gargaditya, linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260302174204.234837-1-ernis@linux.microsoft.com>
On Mon, Mar 02, 2026 at 09:41:52AM -0800, Erni Sri Satya Vennela wrote:
> Add MAC address to vPort configuration success message and update error
> message to be more specific about HWC message errors in
> mana_send_request.
>
> Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
> ---
> Changes in v5:
> * Remove __func__ and __LINE__ from error logs in hw_channel.c
Thanks for the update.
Reviewed-by: Simon Horman <horms@kernel.org>
* [PATCH v9 10/10] x86/hyperv/vtl: Use the wakeup mailbox to boot secondary CPUs
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
The hypervisor is an untrusted entity for TDX guests. It cannot be used
to boot secondary CPUs, so the function hv_vtl_wakeup_secondary_cpu() cannot
be used.
Instead, the virtual firmware boots the secondary CPUs and places them in
a state to transfer control to the kernel using the wakeup mailbox. The
firmware enumerates the mailbox via either an ACPI table or a DeviceTree
node.
If the wakeup mailbox is present, the kernel updates the APIC callback
wakeup_secondary_cpu_64() to use it.
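The resulting callback selection can be modeled in a few lines of
userspace C. This is only a sketch of the ordering described above, not
kernel code: wakeup_fn, mailbox_wakeup(), hypervisor_wakeup() and the
model_*() helpers are hypothetical stand-ins, and the assumption is that
the mailbox callback is installed before hv_vtl_early_init() runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the type of the APIC wakeup callback. */
typedef int (*wakeup_fn)(unsigned int apicid, unsigned long start_ip);

/* Stub wakeup methods; the return values only tell them apart. */
static int mailbox_wakeup(unsigned int apicid, unsigned long start_ip)
{
    (void)apicid; (void)start_ip;
    return 1;
}

static int hypervisor_wakeup(unsigned int apicid, unsigned long start_ip)
{
    (void)apicid; (void)start_ip;
    return 2;
}

static wakeup_fn wakeup_secondary_cpu_64;

/* Models early boot: install the mailbox callback if a mailbox was found. */
static void model_setup_arch(bool mailbox_present)
{
    wakeup_secondary_cpu_64 = mailbox_present ? mailbox_wakeup : NULL;
}

/* Models hv_vtl_early_init() with this patch: skip the override on TDX. */
static void model_hv_vtl_early_init(bool is_tdx)
{
    if (!is_tdx)
        wakeup_secondary_cpu_64 = hypervisor_wakeup;
}

/* Runs both steps in order and returns the callback that is left in place. */
static wakeup_fn model_boot(bool mailbox_present, bool is_tdx)
{
    model_setup_arch(mailbox_present);
    model_hv_vtl_early_init(is_tdx);

    /* A TDX guest must never end up using the hypervisor method. */
    assert(!is_tdx || wakeup_secondary_cpu_64 != hypervisor_wakeup);
    return wakeup_secondary_cpu_64;
}
```

In this model, model_boot(true, true) leaves the mailbox callback in
place, while model_boot(true, false) ends with the hypervisor callback.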
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- None
Changes in v7:
- None
Changes in v6:
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Added Reviewed-by tag from Michael. Thanks!
Changes in v3:
- Unconditionally use the wakeup mailbox in a TDX confidential VM.
(Michael).
- Edited the commit message for clarity.
Changes in v2:
- None
---
arch/x86/hyperv/hv_vtl.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 1e2f5b3ea772..07fac3d687c3 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -274,7 +274,15 @@ int __init hv_vtl_early_init(void)
panic("XSAVE has to be disabled as it is not supported by this module.\n"
"Please add 'noxsave' to the kernel command line.\n");
- apic_update_callback(wakeup_secondary_cpu_64, hv_vtl_wakeup_secondary_cpu);
+ /*
+ * TDX confidential VMs do not trust the hypervisor and cannot use it to
+ * boot secondary CPUs. Instead, they will be booted using the wakeup
+ * mailbox if detected during boot. See setup_arch().
+ *
+ * There is no paravisor present if we are here.
+ */
+ if (!hv_isolation_type_tdx())
+ apic_update_callback(wakeup_secondary_cpu_64, hv_vtl_wakeup_secondary_cpu);
return 0;
}
--
2.43.0
* [PATCH v9 09/10] x86/hyperv/vtl: Mark the wakeup mailbox page as private
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Yunhong Jiang,
Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
From: Yunhong Jiang <yunhong.jiang@linux.intel.com>
The current code maps MMIO devices as shared (decrypted) by default in a
confidential computing VM.
In a TDX environment, secondary CPUs are booted using the Multiprocessor
Wakeup Structure defined in the ACPI specification. The virtual firmware
and the operating system function in the guest context, without
intervention from the VMM. Map the physical memory of the mailbox as
private. Use the is_private_mmio() callback.
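The address check behind the callback can be exercised outside the
kernel. The sketch below mirrors the within_page() logic from the diff
in this patch; the 4 KiB PAGE_SIZE and the is_private_mmio() wrapper,
which takes the mailbox address as a parameter instead of calling
acpi_get_mp_wakeup_mailbox_paddr(), are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL /* assumed 4 KiB pages */

static_assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0, "page size is a power of two");

/* True if addr lies in the half-open range [start, start + PAGE_SIZE). */
static bool within_page(uint64_t addr, uint64_t start)
{
    return addr >= start && addr < start + PAGE_SIZE;
}

/*
 * Hypothetical wrapper: an MMIO address is private only if a mailbox
 * exists (non-zero physical address) and addr falls within its page.
 */
static bool is_private_mmio(uint64_t addr, uint64_t mailbox_paddr)
{
    return mailbox_paddr && within_page(addr, mailbox_paddr);
}
```

With a mailbox at the made-up address 0xffff0000, addresses 0xffff0000
through 0xffff0fff are treated as private and everything else stays
shared, including every address when no mailbox was enumerated.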
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- Included linux/acpi.h to add missing definitions that caused build
breaks (kernel test robot)
Changes in v7:
- Dropped check for !CONFIG_X86_MAILBOX_WAKEUP. The symbol is no longer
valid and now we have a stub for !CONFIG_ACPI.
- Dropped Reviewed-by tags from Dexuan and Michael as this patch
changed.
Changes in v6:
- Fixed a compile error with !CONFIG_X86_MAILBOX_WAKEUP.
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Updated to use the renamed function acpi_get_mp_wakeup_mailbox_paddr().
- Added Reviewed-by tag from Michael. Thanks!
Changes in v3:
- Use the new helper function get_mp_wakeup_mailbox_paddr().
- Edited the commit message for clarity.
Changes in v2:
- Added the helper function within_page() to improve readability
- Override the is_private_mmio() callback when detecting a TDX
environment. The address of the mailbox is checked in
hv_is_private_mmio_tdx().
---
arch/x86/hyperv/hv_vtl.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index b94fffa67312..1e2f5b3ea772 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -6,6 +6,9 @@
* Saurabh Sengar <ssengar@microsoft.com>
*/
+#include <linux/acpi.h>
+
+#include <asm/acpi.h>
#include <asm/apic.h>
#include <asm/boot.h>
#include <asm/desc.h>
@@ -59,6 +62,18 @@ static void __noreturn hv_vtl_restart(char __maybe_unused *cmd)
hv_vtl_emergency_restart();
}
+static inline bool within_page(u64 addr, u64 start)
+{
+ return addr >= start && addr < (start + PAGE_SIZE);
+}
+
+static bool hv_vtl_is_private_mmio_tdx(u64 addr)
+{
+ u64 mb_addr = acpi_get_mp_wakeup_mailbox_paddr();
+
+ return mb_addr && within_page(addr, mb_addr);
+}
+
void __init hv_vtl_init_platform(void)
{
/*
@@ -71,6 +86,8 @@ void __init hv_vtl_init_platform(void)
/* There is no paravisor present if we are here. */
if (hv_isolation_type_tdx()) {
x86_init.resources.realmode_limit = SZ_4G;
+ x86_platform.hyper.is_private_mmio = hv_vtl_is_private_mmio_tdx;
+
} else {
x86_platform.realmode_reserve = x86_init_noop;
x86_platform.realmode_init = x86_init_noop;
--
2.43.0
* [PATCH v9 08/10] x86/acpi: Add a helper to get the address of the wakeup mailbox
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
A Hyper-V VTL level 2 guest in a TDX environment needs to map the physical
page of the ACPI Multiprocessor Wakeup Structure as private (encrypted). It
needs to know the physical address of this structure. Add a helper function
to retrieve the address.
Suggested-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- Added Acked-by tag from Rafael. Thanks!
Changes in v7:
- Moved the added function to arch/x86/kernel/acpi/madt_wakeup.c
- Dropped Reviewed-by tags from Dexuan and Michael as this patch
changed.
Changes in v6:
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Renamed function to acpi_get_mp_wakeup_mailbox_paddr().
- Added Reviewed-by tag from Michael. Thanks!
Changes in v3:
- Introduced this patch
Changes in v2:
- N/A
---
arch/x86/include/asm/acpi.h | 6 ++++++
arch/x86/kernel/acpi/madt_wakeup.c | 5 +++++
2 files changed, 11 insertions(+)
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 820df375df79..c4e6459bd56b 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -184,6 +184,7 @@ void __iomem *x86_acpi_os_ioremap(acpi_physical_address phys, acpi_size size);
void acpi_setup_mp_wakeup_mailbox(u64 addr);
struct acpi_madt_multiproc_wakeup_mailbox *acpi_get_mp_wakeup_mailbox(void);
+u64 acpi_get_mp_wakeup_mailbox_paddr(void);
#else /* !CONFIG_ACPI */
@@ -210,6 +211,11 @@ static inline struct acpi_madt_multiproc_wakeup_mailbox *acpi_get_mp_wakeup_mail
return NULL;
}
+static inline u64 acpi_get_mp_wakeup_mailbox_paddr(void)
+{
+ return 0;
+}
+
#endif /* !CONFIG_ACPI */
#define ARCH_HAS_POWER_INIT 1
diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index 82caf44b45e3..48734e4a6e8f 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -258,3 +258,8 @@ struct acpi_madt_multiproc_wakeup_mailbox *acpi_get_mp_wakeup_mailbox(void)
{
return acpi_mp_wake_mailbox;
}
+
+u64 acpi_get_mp_wakeup_mailbox_paddr(void)
+{
+ return acpi_mp_wake_mailbox_paddr;
+}
--
2.43.0
* [PATCH v9 06/10] x86/realmode: Make the location of the trampoline configurable
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Yunhong Jiang,
Ricardo Neri, Thomas Gleixner
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
From: Yunhong Jiang <yunhong.jiang@linux.intel.com>
x86 CPUs boot in real mode. This mode uses a 1MB address space. The
trampoline must reside below this 1MB memory boundary.
There are platforms in which the firmware boots the secondary CPUs,
switches them to long mode and transfers control to the kernel. An example
of such a mechanism is the ACPI Multiprocessor Wakeup Structure.
In this scenario there is no restriction on locating the trampoline under
1MB memory. Moreover, certain platforms (for example, Hyper-V VTL guests)
may not have memory available for allocation below 1MB.
Add a new member to struct x86_init_resources to specify the upper bound
for the location of the trampoline memory. Preserve the default upper bound
of 1MB to keep the current behavior.
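The effect of a configurable upper bound can be seen with a toy
allocator. This is a sketch under stated assumptions, not the memblock
implementation: struct free_range and alloc_below() are hypothetical,
the search is simply "highest page-aligned block that ends at or below
the limit", and 0 is used as the failure value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1M     0x100000ULL
#define SZ_4G     0x100000000ULL
#define PAGE_SIZE 0x1000ULL

static_assert(SZ_4G == (1ULL << 32), "4 GiB is the 32-bit addressing limit");

/* Hypothetical description of a free physical range: [start, end). */
struct free_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return the highest page-aligned base such that [base, base + size)
 * fits in a free range and ends at or below limit. Returns 0 if no
 * such block exists.
 */
static uint64_t alloc_below(const struct free_range *r, size_t n,
                            uint64_t size, uint64_t limit)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++) {
        /* Clamp the range to the limit. */
        uint64_t end = r[i].end < limit ? r[i].end : limit;
        uint64_t base;

        if (end < size)
            continue;

        /* Highest aligned base that still fits below end. */
        base = (end - size) & ~(PAGE_SIZE - 1);
        if (base >= r[i].start && base > best)
            best = base;
    }

    return best;
}
```

For a guest whose only free memory is, say, [2MB, 1GB), a request with a
limit of SZ_1M fails, while the same request with SZ_4G succeeds. That
is the situation the new member is meant to handle.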
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- None
Changes in v7:
- None
Changes in v6:
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Added Reviewed-by tag from Michael. Thanks!
Changes in v3:
- Edited the commit message for clarity.
- Minor tweaks to comments.
- Removed the option to not reserve the first 1MB of memory as it is
not needed.
Changes in v2:
- Added this patch using code that Thomas suggested:
https://lore.kernel.org/lkml/87a5ho2q6x.ffs@tglx/
---
arch/x86/include/asm/x86_init.h | 3 +++
arch/x86/kernel/x86_init.c | 3 +++
arch/x86/realmode/init.c | 7 +++----
3 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6c8a6ead84f6..953d3199408a 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -31,12 +31,15 @@ struct x86_init_mpparse {
* platform
* @memory_setup: platform specific memory setup
* @dmi_setup: platform specific DMI setup
+ * @realmode_limit: platform specific address limit for the real mode trampoline
+ * (default 1M)
*/
struct x86_init_resources {
void (*probe_roms)(void);
void (*reserve_resources)(void);
char *(*memory_setup)(void);
void (*dmi_setup)(void);
+ unsigned long realmode_limit;
};
/**
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index ebefb77c37bb..252c5827d063 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -9,6 +9,7 @@
#include <linux/export.h>
#include <linux/pci.h>
#include <linux/acpi.h>
+#include <linux/sizes.h>
#include <asm/acpi.h>
#include <asm/bios_ebda.h>
@@ -69,6 +70,8 @@ struct x86_init_ops x86_init __initdata = {
.reserve_resources = reserve_standard_io_resources,
.memory_setup = e820__memory_setup_default,
.dmi_setup = dmi_setup,
+ /* Has to be under 1M so we can execute real-mode AP code. */
+ .realmode_limit = SZ_1M,
},
.mpparse = {
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 88be32026768..694d80a5c68e 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -46,7 +46,7 @@ void load_trampoline_pgtable(void)
void __init reserve_real_mode(void)
{
- phys_addr_t mem;
+ phys_addr_t mem, limit = x86_init.resources.realmode_limit;
size_t size = real_mode_size_needed();
if (!size)
@@ -54,10 +54,9 @@ void __init reserve_real_mode(void)
WARN_ON(slab_is_available());
- /* Has to be under 1M so we can execute real-mode AP code. */
- mem = memblock_phys_alloc_range(size, PAGE_SIZE, 0, 1<<20);
+ mem = memblock_phys_alloc_range(size, PAGE_SIZE, 0, limit);
if (!mem)
- pr_info("No sub-1M memory is available for the trampoline\n");
+ pr_info("No memory below %pa for the real-mode trampoline\n", &limit);
else
set_real_mode_mem(mem);
--
2.43.0
* [PATCH v9 07/10] x86/hyperv/vtl: Setup the 64-bit trampoline for TDX guests
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Yunhong Jiang,
Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
From: Yunhong Jiang <yunhong.jiang@linux.intel.com>
The hypervisor is an untrusted entity for TDX guests. It cannot be used
to boot secondary CPUs - neither via hypercalls nor the INIT assert,
de-assert, plus Start-Up IPI messages.
Instead, the platform virtual firmware boots the secondary CPUs and
puts them in a state to transfer control to the kernel. This mechanism uses
the wakeup mailbox described in the Multiprocessor Wakeup Structure of the
ACPI specification. The entry point to the kernel is trampoline_start64.
Allocate and set up the trampoline using the default x86_platform callbacks.
The platform firmware configures the secondary CPUs in long mode. It is no
longer necessary to locate the trampoline under 1MB memory. After handoff
from firmware, the trampoline code switches briefly to 32-bit addressing
mode, which has an addressing limit of 4GB. Set the upper bound of the
trampoline memory accordingly.
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- None
Changes in v7:
- None
Changes in v6:
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Added Reviewed-by tag from Michael. Thanks!
Changes in v3:
- Added a note that there is no need to check for a present
paravisor.
- Edited commit message for clarity.
Changes in v2:
- Dropped the function hv_reserve_real_mode(). Instead, used the new
members realmode_limit and reserve_bios of x86_init to
set the upper bound of the trampoline memory. (Thomas)
---
arch/x86/hyperv/hv_vtl.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index e045bf3a2c59..b94fffa67312 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -68,9 +68,14 @@ void __init hv_vtl_init_platform(void)
*/
pr_info("Linux runs in Hyper-V Virtual Trust Level %d\n", ms_hyperv.vtl);
- x86_platform.realmode_reserve = x86_init_noop;
- x86_platform.realmode_init = x86_init_noop;
- real_mode_header = &hv_vtl_real_mode_header;
+ /* There is no paravisor present if we are here. */
+ if (hv_isolation_type_tdx()) {
+ x86_init.resources.realmode_limit = SZ_4G;
+ } else {
+ x86_platform.realmode_reserve = x86_init_noop;
+ x86_platform.realmode_init = x86_init_noop;
+ real_mode_header = &hv_vtl_real_mode_header;
+ }
x86_init.irqs.pre_vector_init = x86_init_noop;
x86_init.timers.timer_init = x86_init_noop;
x86_init.resources.probe_roms = x86_init_noop;
--
2.43.0
* [PATCH v9 05/10] x86/hyperv/vtl: Set real_mode_header in hv_vtl_init_platform()
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Yunhong Jiang,
Ricardo Neri, Thomas Gleixner
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
From: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Hyper-V VTL clears x86_platform.realmode_{init(), reserve()} in
hv_vtl_init_platform() whereas it sets real_mode_header later in
hv_vtl_early_init(). There is no need to deal with the settings of real
mode memory in two places. Also, both functions are called much earlier
than x86_platform.realmode_init() (via an early_initcall), where the
real_mode_header is needed.
Set real_mode_header in hv_vtl_init_platform() to keep all code dealing
with memory for the real mode trampoline in one place. Besides making the
code more readable, it prepares it for a subsequent changeset in which the
behavior needs to change to support Hyper-V VTL guests in a TDX
environment.
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- None
Changes in v7:
- None
Changes in v6:
- Corrected reference to hv_vtl_init_platform() in the changelog.
(Dexuan)
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Added Reviewed-by tag from Michael. Thanks!
Changes in v3:
- Edited the commit message for clarity.
Changes in v2:
- Introduced this patch.
---
arch/x86/hyperv/hv_vtl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index 9b6a9bc4ab76..e045bf3a2c59 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -70,6 +70,7 @@ void __init hv_vtl_init_platform(void)
x86_platform.realmode_reserve = x86_init_noop;
x86_platform.realmode_init = x86_init_noop;
+ real_mode_header = &hv_vtl_real_mode_header;
x86_init.irqs.pre_vector_init = x86_init_noop;
x86_init.timers.timer_init = x86_init_noop;
x86_init.resources.probe_roms = x86_init_noop;
@@ -251,7 +252,6 @@ int __init hv_vtl_early_init(void)
panic("XSAVE has to be disabled as it is not supported by this module.\n"
"Please add 'noxsave' to the kernel command line.\n");
- real_mode_header = &hv_vtl_real_mode_header;
apic_update_callback(wakeup_secondary_cpu_64, hv_vtl_wakeup_secondary_cpu);
return 0;
--
2.43.0
* [PATCH v9 03/10] dt-bindings: reserved-memory: Wakeup Mailbox for Intel processors
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri,
Rafael J. Wysocki (Intel), Yunhong Jiang, Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
Add DeviceTree bindings to enumerate the wakeup mailbox used in platform
firmware for Intel processors.
x86 platforms commonly boot secondary CPUs using an INIT assert, de-assert
followed by Start-Up IPI messages. The wakeup mailbox can be used when this
mechanism is unavailable.
The wakeup mailbox offers more control to the operating system to boot
secondary CPUs than a spin-table. It allows the reuse of the same wakeup
vector for all CPUs while maintaining control over which CPUs to boot and
when. While it is possible to achieve the same level of control using a
spin-table, it would require specifying a separate `cpu-release-addr` for
each secondary CPU.
The operation and structure of the mailbox are described in the
Multiprocessor Wakeup Structure defined in the ACPI specification. Note
that this structure does not specify how to publish the mailbox to the
operating system (ACPI-based platform firmware uses a separate table). No
ACPI table is needed in DeviceTree-based firmware to enumerate the mailbox.
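For readers without the specification at hand, the handshake can be
sketched as below. The field layout follows the Multiprocessor Wakeup
Mailbox structure as I understand it from the ACPI specification; the
names os_post_wakeup() and fw_poll() are hypothetical, and the sketch
omits the memory barriers and volatile accesses that real OS and
firmware code would need.

```c
#include <assert.h>
#include <stdint.h>

#define MP_WAKE_COMMAND_NOOP   0
#define MP_WAKE_COMMAND_WAKEUP 1

/* Layout of the 4 KiB mailbox page shared by firmware and the OS. */
struct wakeup_mailbox {
    uint16_t command;
    uint16_t reserved;
    uint32_t apic_id;
    uint64_t wakeup_vector;
    uint8_t reserved_os[2032];
    uint8_t reserved_firmware[2048];
};

static_assert(sizeof(struct wakeup_mailbox) == 4096,
              "the mailbox occupies exactly one 4 KiB page");

/* OS side: request that one CPU starts at entry. Returns -1 if busy. */
static int os_post_wakeup(struct wakeup_mailbox *mb, uint32_t apic_id,
                          uint64_t entry)
{
    if (mb->command != MP_WAKE_COMMAND_NOOP)
        return -1;

    mb->apic_id = apic_id;
    mb->wakeup_vector = entry;
    mb->command = MP_WAKE_COMMAND_WAKEUP; /* written last */
    return 0;
}

/*
 * Firmware side, as run by a parked CPU: returns the entry address if
 * the request targets this CPU, or 0 if it should keep waiting.
 */
static uint64_t fw_poll(struct wakeup_mailbox *mb, uint32_t my_apic_id)
{
    uint64_t entry;

    if (mb->command != MP_WAKE_COMMAND_WAKEUP || mb->apic_id != my_apic_id)
        return 0;

    entry = mb->wakeup_vector;
    mb->command = MP_WAKE_COMMAND_NOOP; /* acknowledge the request */
    return entry;
}
```

A single mailbox serves all CPUs because each request names its target
APIC ID. This is why, unlike a spin-table, no per-CPU release address is
needed.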
Nodes that want to refer to the reserved memory usually define
a `memory-region` property. /cpus/cpu* nodes would want to refer to the
mailbox, but they do not have such property defined in the DeviceTree
specification. Moreover, it would imply that there is a memory region per
CPU. Instead, add a `compatible` property that the operating system can use
to discover the mailbox.
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Rafael J. Wysocki (Intel) <rafael.j.wysocki@intel.com>
Co-developed-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- Removed a spurious empty line. (Rob's bot)
Changes in v8:
- None
Changes in v7:
- Fixed Acked-by tag from Rafael to include the "(Intel)" suffix.
Changes in v6:
- Reworded the changelog for clarity.
- Added Acked-by tag from Rafael. Thanks!
- Added Reviewed-by tag from Rob. Thanks!
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- Specified the version and section of the ACPI spec in which the
wakeup mailbox is defined. (Rafael)
- Fixed a warning from yamllint about line lengths of URLs.
Changes in v4:
- Removed redefinitions of the mailbox and instead referred to ACPI
specification as per discussion on LKML.
- Clarified that DeviceTree-based firmware does not require the use of
ACPI tables to enumerate the mailbox. (Rob)
- Described the need of using a `compatible` property.
- Dropped the `alignment` property. (Krzysztof, Rafael)
- Used a real address for the mailbox node. (Krzysztof)
Changes in v3:
- Implemented the mailbox as a reserved-memory node. Add to it a
`compatible` property. (Krzysztof)
- Explained the relationship between the mailbox and the `enable-method`
property of the CPU nodes.
- Expanded the documentation of the binding.
Changes in v2:
- Added more details to the description of the binding.
- Added a new requirement for cpu@N nodes to add an
`enable-method`.
---
.../reserved-memory/intel,wakeup-mailbox.yaml | 49 ++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/Documentation/devicetree/bindings/reserved-memory/intel,wakeup-mailbox.yaml b/Documentation/devicetree/bindings/reserved-memory/intel,wakeup-mailbox.yaml
new file mode 100644
index 000000000000..4362bc058df4
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-memory/intel,wakeup-mailbox.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/reserved-memory/intel,wakeup-mailbox.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Wakeup Mailbox for Intel processors
+
+description: |
+ The Wakeup Mailbox provides a mechanism for the operating system to wake up
+ secondary CPUs on Intel processors. It is an alternative to the INIT-!INIT-
+ SIPI sequence used on most x86 systems.
+
+ The structure and operation of the mailbox is described in the Multiprocessor
+ Wakeup Structure of the ACPI specification version 6.6 section 5.2.12.19 [1].
+
+ The implementation of the mailbox in platform firmware is described in the
+ Intel TDX Virtual Firmware Design Guide section 4.3.5 [2].
+
+ 1: https://uefi.org/specs/ACPI/6.6/05_ACPI_Software_Programming_Model.html#multiprocessor-wakeup-structure
+ 2: https://www.intel.com/content/www/us/en/content-details/733585/intel-tdx-virtual-firmware-design-guide.html
+
+maintainers:
+ - Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
+
+allOf:
+ - $ref: reserved-memory.yaml
+
+properties:
+ compatible:
+ const: intel,wakeup-mailbox
+
+required:
+ - compatible
+ - reg
+
+unevaluatedProperties: false
+
+examples:
+ - |
+ reserved-memory {
+ #address-cells = <2>;
+ #size-cells = <1>;
+
+ wakeup-mailbox@ffff0000 {
+ compatible = "intel,wakeup-mailbox";
+ reg = <0x0 0xffff0000 0x1000>;
+ };
+ };
--
2.43.0
* [PATCH v9 04/10] x86/dt: Parse the Wakeup Mailbox for Intel processors
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Yunhong Jiang,
Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
The Wakeup Mailbox is a mechanism to boot secondary CPUs on systems that do
not want to or cannot use the INIT + StartUp IPI messages.
The platform firmware is expected to implement the mailbox as described in
the Multiprocessor Wakeup Structure of the ACPI specification. It is also
expected to publish the mailbox to the operating system as described in the
corresponding DeviceTree schema that accompanies the documentation of the
Linux kernel.
Reuse the existing functionality to set the memory location of the mailbox
and update the wakeup_secondary_cpu_64() APIC callback. Make this
functionality available to DeviceTree-based systems by making
CONFIG_X86_MAILBOX_WAKEUP depend on either CONFIG_OF or
CONFIG_ACPI_MADT_WAKEUP.
do_boot_cpu() uses wakeup_secondary_cpu_64() when set. It will be set if a
wakeup mailbox is enumerated via an ACPI table or a DeviceTree node. For
cases in which this behavior is not desired, this APIC callback can be
updated later during boot using platform-specific hooks.
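The sanity checks applied to the DeviceTree node can be tried in
isolation. The sketch below mirrors the alignment and size tests from
the diff in this patch; struct resource_model is a hypothetical stand-in
for the kernel's struct resource (inclusive end address), and
mailbox_resource_valid() is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAKEUP_MAILBOX_SIZE  0x1000ULL
#define WAKEUP_MAILBOX_ALIGN 0x1000ULL

static_assert((WAKEUP_MAILBOX_ALIGN & (WAKEUP_MAILBOX_ALIGN - 1)) == 0,
              "the mask test below needs a power-of-two alignment");

/* Hypothetical stand-in for struct resource; end is inclusive. */
struct resource_model {
    uint64_t start;
    uint64_t end;
};

/* Accept only a region that is 4 KiB-aligned and exactly 4 KiB long. */
static bool mailbox_resource_valid(const struct resource_model *res)
{
    if (res->start & (WAKEUP_MAILBOX_ALIGN - 1))
        return false;

    if (res->end - res->start + 1 != WAKEUP_MAILBOX_SIZE)
        return false;

    return true;
}
```

A node such as the one in the binding example, with a 0x1000-byte region
at 0xffff0000, maps to { 0xffff0000, 0xffff0fff } and passes both
checks.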
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Co-developed-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Yunhong Jiang <yunhong.jiang@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- None
Changes in v7:
- #included asm/acpi.h to reflect the updated declaration of the
needed functions.
- (Kept Reviewed-by tag from Dexuan, as this single change is trivial.)
Changes in v6:
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- Made CONFIG_X86_MAILBOX_WAKEUP depend on CONFIG_OF or
CONFIG_ACPI_MADT_WAKEUP.
Changes in v4:
- Look for the wakeup mailbox unconditionally, regardless of whether
cpu@N nodes have an `enable-method` property.
- Add a reference to the ACPI specification. (Rafael)
Changes in v3:
- Added extra sanity checks when parsing the mailbox node.
- Probe the mailbox using its `compatible` property
- Setup the Wakeup Mailbox if the `enable-method` is found in the CPU
nodes.
- Cleaned up unneeded ifdeffery.
- Clarified the mechanisms used to override the wakeup_secondary_cpu_64()
callback to not use the mailbox when not desired. (Michael)
- Edited the commit message for clarity.
Changes in v2:
- Disabled CPU offlining.
- Modified dtb_parse_mp_wake() to return the address of the mailbox.
---
arch/x86/kernel/devicetree.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index dd8748c45529..318acaecb5ca 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -18,6 +18,7 @@
#include <linux/of_pci.h>
#include <linux/initrd.h>
+#include <asm/acpi.h>
#include <asm/irqdomain.h>
#include <asm/hpet.h>
#include <asm/apic.h>
@@ -125,6 +126,51 @@ static void __init dtb_setup_hpet(void)
#endif
}
+#if defined(CONFIG_X86_64) && defined(CONFIG_SMP)
+
+#define WAKEUP_MAILBOX_SIZE 0x1000
+#define WAKEUP_MAILBOX_ALIGN 0x1000
+
+/** dtb_wakeup_mailbox_setup() - Parse the wakeup mailbox from the device tree
+ *
+ * Look for the presence of a wakeup mailbox in the DeviceTree. The mailbox is
+ * expected to follow the structure and operation described in the Multiprocessor
+ * Wakeup Structure of the ACPI specification.
+ */
+static void __init dtb_wakeup_mailbox_setup(void)
+{
+ struct device_node *node;
+ struct resource res;
+
+ node = of_find_compatible_node(NULL, NULL, "intel,wakeup-mailbox");
+ if (!node)
+ return;
+
+ if (of_address_to_resource(node, 0, &res))
+ goto done;
+
+ /* The mailbox is a 4KB-aligned region. */
+ if (res.start & (WAKEUP_MAILBOX_ALIGN - 1))
+ goto done;
+
+ /* The mailbox has a size of 4KB. */
+ if (res.end - res.start + 1 != WAKEUP_MAILBOX_SIZE)
+ goto done;
+
+ /* Not supported when the mailbox is used. */
+ cpu_hotplug_disable_offlining();
+
+ acpi_setup_mp_wakeup_mailbox(res.start);
+done:
+ of_node_put(node);
+}
+#else /* !CONFIG_X86_64 || !CONFIG_SMP */
+static inline int dtb_wakeup_mailbox_setup(void)
+{
+ return -EOPNOTSUPP;
+}
+#endif /* CONFIG_X86_64 && CONFIG_SMP */
+
#ifdef CONFIG_X86_LOCAL_APIC
static void __init dtb_cpu_setup(void)
@@ -287,6 +333,7 @@ static void __init x86_dtb_parse_smp_config(void)
dtb_setup_hpet();
dtb_apic_setup();
+ dtb_wakeup_mailbox_setup();
}
void __init x86_flattree_get_config(void)
--
2.43.0
* [PATCH v9 02/10] x86/acpi: Add functions to setup and access the wakeup mailbox
From: Ricardo Neri @ 2026-03-04 23:41 UTC
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri, Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
Systems that describe hardware using DeviceTree graphs may enumerate and
implement the wakeup mailbox as defined in the ACPI specification but do
not otherwise depend on ACPI. Expose functions to set up and access the
location of the wakeup mailbox from outside ACPI code.
The function acpi_setup_mp_wakeup_mailbox() stores the physical address of
the mailbox and updates the wakeup_secondary_cpu_64() APIC callback.
The function acpi_get_mp_wakeup_mailbox() returns a pointer to the
mailbox.
Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes in v9:
- None
Changes in v8:
- Added Acked-by tag from Rafael. Thanks!
Changes in v7:
- Moved function declarations to arch/x86/include/asm/acpi.h
- Added stubs for !CONFIG_ACPI.
- Do not use these new functions in madt_wakeup.c.
- Dropped Acked-by and Reviewed-by tags from Rafael and Dexuan as this
patch changed.
Changes in v6:
- Fixed grammar error in the subject of the patch. (Rafael)
- Added Acked-by tag from Rafael. Thanks!
- Added Reviewed-by tag from Dexuan. Thanks!
Changes in v5:
- None
Changes in v4:
- Squashed the two first patches of the series into one, both introduce
helper functions. (Rafael)
- Renamed setup_mp_wakeup_mailbox() as acpi_setup_mp_wakeup_mailbox().
(Rafael)
- Dropped the function prototype for !CONFIG_X86_64. (Rafael)
Changes in v3:
- Introduced this patch.
Changes in v2:
- N/A
---
arch/x86/include/asm/acpi.h | 10 ++++++++++
arch/x86/kernel/acpi/madt_wakeup.c | 11 +++++++++++
2 files changed, 21 insertions(+)
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index a03aa6f999d1..820df375df79 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -182,6 +182,9 @@ void __iomem *x86_acpi_os_ioremap(acpi_physical_address phys, acpi_size size);
#define acpi_os_ioremap acpi_os_ioremap
#endif
+void acpi_setup_mp_wakeup_mailbox(u64 addr);
+struct acpi_madt_multiproc_wakeup_mailbox *acpi_get_mp_wakeup_mailbox(void);
+
#else /* !CONFIG_ACPI */
#define acpi_lapic 0
@@ -200,6 +203,13 @@ static inline u64 x86_default_get_root_pointer(void)
return 0;
}
+static inline void acpi_setup_mp_wakeup_mailbox(u64 addr) { }
+
+static inline struct acpi_madt_multiproc_wakeup_mailbox *acpi_get_mp_wakeup_mailbox(void)
+{
+ return NULL;
+}
+
#endif /* !CONFIG_ACPI */
#define ARCH_HAS_POWER_INIT 1
diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
index 6d7603511f52..82caf44b45e3 100644
--- a/arch/x86/kernel/acpi/madt_wakeup.c
+++ b/arch/x86/kernel/acpi/madt_wakeup.c
@@ -247,3 +247,14 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
return 0;
}
+
+void __init acpi_setup_mp_wakeup_mailbox(u64 mailbox_paddr)
+{
+ acpi_mp_wake_mailbox_paddr = mailbox_paddr;
+ apic_update_callback(wakeup_secondary_cpu_64, acpi_wakeup_cpu);
+}
+
+struct acpi_madt_multiproc_wakeup_mailbox *acpi_get_mp_wakeup_mailbox(void)
+{
+ return acpi_mp_wake_mailbox;
+}
--
2.43.0
^ permalink raw reply related
* [PATCH v9 01/10] x86/topology: Add missing struct declaration and attribute dependency
From: Ricardo Neri @ 2026-03-04 23:41 UTC (permalink / raw)
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri,
kernel test robot, Ricardo Neri
In-Reply-To: <20260304-rneri-wakeup-mailbox-v9-0-a5c6845e6251@linux.intel.com>
The prototypes for get_topology_cpu_type_name() and
get_topology_cpu_type() take a pointer to struct cpuinfo_x86, but
asm/topology.h neither includes nor forward-declares the structure.
Including asm/topology.h, directly or indirectly, without including
asm/processor.h triggers a warning:
./arch/x86/include/asm/topology.h:159:47: error: ‘struct cpuinfo_x86’
declared inside parameter list will not be visible outside of this
definition or declaration [-Werror]
159 | const char *get_topology_cpu_type_name(struct cpuinfo_x86 *c);
| ^~~~~~~~~~~
Since only a pointer is needed, add a forward declaration of struct
cpuinfo_x86.
Additionally, sysctl_sched_itmt_enabled is declared in asm/topology.h with
the __read_mostly attribute, but the header does not include linux/cache.h.
This causes a build failure when including asm/topology.h but not
linux/cache.h:
./arch/x86/include/asm/topology.h:264:27: error: expected ‘=’, ‘,’,
‘;’, ‘asm’ or ‘__attribute__’ before ‘sysctl_sched_itmt_enabled’
264 | extern bool __read_mostly sysctl_sched_itmt_enabled;
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Include the required header.
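As an illustration of the first fix only (not part of the patch), the userspace sketch below shows why a forward declaration is enough when a prototype takes just a pointer. The names struct demo_cpuinfo and demo_cpu_type_name() are hypothetical stand-ins for struct cpuinfo_x86 and get_topology_cpu_type_name().

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough for prototypes that only take a pointer. */
struct demo_cpuinfo;

/* This prototype compiles without the full definition of the structure. */
const char *demo_cpu_type_name(const struct demo_cpuinfo *c);

/* The full definition can live elsewhere, for example in another header. */
struct demo_cpuinfo {
	int is_efficiency_core;
};

const char *demo_cpu_type_name(const struct demo_cpuinfo *c)
{
	assert(c != NULL);
	return c->is_efficiency_core ? "efficiency" : "performance";
}
```

Without the forward declaration, the compiler would treat the struct named in the parameter list as a new type scoped to the prototype, which is what the quoted diagnostic complains about.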
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511181954.UMxCeTV1-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511190008.AA0NTn3G-lkp@intel.com/
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
I independently found this issue when including asm/acpi.h in
arch/x86/hyperv/hv_vtl.c, which implicitly includes asm/topology.h but
neither asm/processor.h nor linux/cache.h.
---
Changes in v9:
- None
Changes in v8:
- Added this patch.
Changes in v7:
- N/A
Changes in v6:
- N/A
Changes in v5:
- N/A
Changes in v4:
- N/A
Changes in v3:
- N/A
Changes in v2:
- N/A
---
arch/x86/include/asm/topology.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 1fadf0cf520c..630521a03982 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -156,6 +156,8 @@ extern unsigned int __max_threads_per_core;
extern unsigned int __num_threads_per_package;
extern unsigned int __num_cores_per_package;
+struct cpuinfo_x86;
+
const char *get_topology_cpu_type_name(struct cpuinfo_x86 *c);
enum x86_topology_cpu_type get_topology_cpu_type(struct cpuinfo_x86 *c);
@@ -259,6 +261,7 @@ extern bool x86_topology_update;
#ifdef CONFIG_SCHED_MC_PRIO
#include <asm/percpu.h>
+#include <linux/cache.h>
DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
extern bool __read_mostly sysctl_sched_itmt_enabled;
--
2.43.0
^ permalink raw reply related
* [PATCH v9 00/10] x86/hyperv/hv_vtl: Use a wakeup mailbox to boot secondary CPUs
From: Ricardo Neri @ 2026-03-04 23:41 UTC (permalink / raw)
To: x86, Krzysztof Kozlowski, Conor Dooley, Rob Herring,
K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Michael Kelley, Rafael J. Wysocki
Cc: Saurabh Sengar, Chris Oo, Kirill A. Shutemov, linux-hyperv,
devicetree, linux-acpi, linux-kernel, Ricardo Neri,
kernel test robot, Ricardo Neri, Rafael J. Wysocki (Intel),
Yunhong Jiang, Thomas Gleixner
Hi x86 maintainers,
This is a new version of this patchset. The only change since the last
version is a fix for a warning from `make dt_binding_check`. Since v7, I
incorporated feedback from Boris. Also, the ACPI, DeviceTree and Hyper-V
maintainers have reviewed the patches. Any chance it could be merged?
I include the cover letter from the previous version for convenience.
Thanks a lot to all those who have reviewed the series!
...
This patchset adds functionality to use the ACPI wakeup mailbox to boot
secondary CPUs in Hyper-V VTL level 2 TDX guests with DeviceTree-based
virtual firmware. Although this is the target use case, the use of the
mailbox depends solely on it being enumerated in the DeviceTree graph.
On x86 platforms, secondary CPUs are typically booted using INIT assert,
de-assert followed by Start-Up IPI messages. Virtual machines can also use
hypercalls to bring up secondary CPUs to a desired execution state. These
two mechanisms require support from the hypervisor. Confidential computing
VMs in a TDX environment cannot use either mechanism because the hypervisor
is considered an untrusted entity.
Linux already supports the ACPI Multiprocessor Wakeup Structure in which
the guest platform firmware boots the secondary CPUs and transfers control
to the kernel using a mailbox. This mechanism does not need involvement
of the VMM. It can be used in a Hyper-V VTL level 2 TDX guest.
Currently, this mechanism can only be used on x86 platforms with firmware
that supports ACPI. There are platforms that use DeviceTree (e.g., OpenHCL
[2]) instead of ACPI to describe the hardware.
Provided that the wakeup mailbox enumerated in a DeviceTree-based platform
firmware is implemented as described in the ACPI specification, the kernel
can use the existing ACPI code for both DeviceTree and ACPI systems. The
DeviceTree firmware does not need to use any ACPI table to enumerate the
mailbox.
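For readers unfamiliar with the mailbox, here is a rough single-threaded userspace sketch of the handshake. The structure layout is modeled on the ACPI Multiprocessor Wakeup Mailbox, but every demo_* and DEMO_* name is hypothetical, the two functions are simplifications, and real code needs proper memory ordering and a mapped physical page. Treating an entry point of 0 as "nothing to do" is also just a convention of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CMD_NOOP   0
#define DEMO_CMD_WAKEUP 1

/* Layout modeled on the ACPI Multiprocessor Wakeup Mailbox (4 KiB total). */
struct demo_wakeup_mailbox {
	uint16_t command;
	uint16_t reserved;
	uint32_t apic_id;
	uint64_t wakeup_vector;
	uint8_t  reserved_os[2032];
	uint8_t  reserved_firmware[2048];
};

_Static_assert(sizeof(struct demo_wakeup_mailbox) == 4096,
	       "mailbox must occupy exactly 4 KiB");

/* OS side: publish the target and entry point, then issue the command last. */
static int demo_os_request_wakeup(struct demo_wakeup_mailbox *mb,
				  uint32_t apic_id, uint64_t entry)
{
	if (mb->command != DEMO_CMD_NOOP)
		return -1;	/* a previous request is still pending */
	mb->apic_id = apic_id;
	mb->wakeup_vector = entry;
	mb->command = DEMO_CMD_WAKEUP;
	return 0;
}

/*
 * Firmware side, as run by a parked CPU: returns the entry point if the
 * request targets this CPU, or 0 if there is nothing to do.
 */
static uint64_t demo_fw_poll(struct demo_wakeup_mailbox *mb,
			     uint32_t my_apic_id)
{
	uint64_t entry;

	if (mb->command != DEMO_CMD_WAKEUP || mb->apic_id != my_apic_id)
		return 0;
	entry = mb->wakeup_vector;
	mb->command = DEMO_CMD_NOOP;	/* acknowledge the request */
	return entry;
}
```

The point of the sketch is that neither side calls into the hypervisor: the only shared state is the mailbox page, which is why its location can be enumerated by either an ACPI table or a DeviceTree node.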
This patchset is structured as follows:
* Add missing dependencies to arch/x86/include/asm/topology.h. (patch 1)
* Expose functions to reuse the code handling the ACPI Multiprocessor
Wakeup Structure outside of ACPI code. (patch 2)
* Define DeviceTree bindings to enumerate a mailbox as described in
the ACPI specification. (patch 3)
* Find and set up the wakeup mailbox if enumerated in the DeviceTree
graph. (patch 4)
* Prepare Hyper-V VTL2 TDX guests to use the Wakeup Mailbox to boot
secondary CPUs when available. (patches 5-10)
I have tested this patchset on a Hyper-V host with VTL2 OpenHCL, QEMU, and
physical hardware.
Changes in v9:
- Fixed a warning from `make dt_binding_check` reported by Rob's bot.
- Link to v8: https://lore.kernel.org/r/20260107-rneri-wakeup-mailbox-v8-0-2f5b6785f2f5@linux.intel.com
Changes in v8:
- Fixed a build break. Same patch as [1].
- Added two Acked-by tags from Rafael. Thanks!
- Link to v7: https://lore.kernel.org/r/20251117-rneri-wakeup-mailbox-v7-0-4a8b82ab7c2c@linux.intel.com
Changes in v7:
- Dropped the patch that relocated the ACPI wakeup mailbox to a generic
location. (Boris)
- Instead, added function declarations to use the wakeup mailbox from
outside ACPI code. Also added stubs for !CONFIG_ACPI.
- Link to v6: https://lore.kernel.org/r/20251016-rneri-wakeup-mailbox-v6-0-40435fb9305e@linux.intel.com
Changes in v6:
- Fixed a build error with !CONFIG_X86_MAILBOX_WAKEUP and
CONFIG_HYPERV_VTL_MODE.
- Added Acked-by tags from Rafael. Thanks!
- Added Reviewed-by tags from Dexuan and Rob. Thanks!
- Corrected typos and function names in the changelog.
- Link to v5: https://lore.kernel.org/r/20250627-rneri-wakeup-mailbox-v5-0-df547b1d196e@linux.intel.com
Changes in v5:
- Referenced in the DeviceTree binding documentation the section of the
ACPI specification that defines the wakeup mailbox.
- Moved the dependency on CONFIG_OF to patch 4, where the flattened
DeviceTree is parsed for the mailbox.
- Fixed a warning from yamllint regarding line lengths.
- Link to v4: https://lore.kernel.org/r/20250603-rneri-wakeup-mailbox-v4-0-d533272b7232@linux.intel.com
Changes in v4:
- Added Reviewed-by: tags from Michael Kelley. Thanks!
- Relocated the common wakeup code from acpi/madt_wakeup.c to a new
smpwakeup.c to be used in DeviceTree- and ACPI-based systems.
- Dropped the x86 CPU bindings as they are not a good fit to document
firmware features.
- Dropped the code that parsed and validated the `enable-method`
property for cpu@N nodes in x86. Instead, unconditionally parse and use
the wakeup mailbox when found.
- Updated the wakeup mailbox schema to avoid redefining the structure and
operation of the mailbox. Instead, refer to the ACPI specification.
Also clarified that the enumeration of the mailbox is done separately.
- Prefixed helper functions of wakeup code with acpi_.
- Link to v3: https://lore.kernel.org/r/20250503191515.24041-1-ricardo.neri-calderon@linux.intel.com
Changes in v3:
- Only move out of the acpi directory acpi_wakeup_cpu() and its
accessory variables. Use helper functions to access the mailbox as
needed. This also fixed the warnings about unused code with
CONFIG_ACPI=n that Michael reported.
- Major rework of the DeviceTree bindings and schema. Now there is a
reserved-memory binding for the mailbox as well as new x86 CPU
bindings. Both have `compatible` properties.
- Rework of the code parsing the DeviceTree bindings for the mailbox.
Now configuring the mailbox depends solely on its enumeration in the
DeviceTree and not on Hyper-V VTL2 TDX guest.
- Do not make reserving the first 1MB of memory optional. It is not
needed and may introduce bugs.
- Prepare Hyper-V VTL2 guests to unconditionally use the mailbox in TDX
environments. If the mailbox is not available, booting secondary CPUs
will fail gracefully.
- Link to v2: https://lore.kernel.org/r/20240823232327.2408869-1-yunhong.jiang@linux.intel.com
Changes in v2:
- Fix the cover letter's summary phrase.
- Fix the DT binding document to pass validation.
- Change the DT binding document to be ACPI independent.
- Move ACPI-only functions into the #ifdef CONFIG_ACPI.
- Change dtb_parse_mp_wake() to return mailbox physical address.
- Rework the hv_is_private_mmio_tdx().
- Remove unrelated real mode change from the patch that marks mailbox
page private.
- Check hv_isolation_type_tdx() instead of wakeup_mailbox_addr in
hv_vtl_init_platform() because wakeup_mailbox_addr is not parsed yet.
- Add memory range support to reserve_real_mode.
- Remove realmode_reserve callback and use the memory range.
- Move setting the real_mode_header to hv_vtl_init_platform.
- Update comments and commit messages.
- Minor style changes.
- Link to v1: https://lore.kernel.org/r/20240806221237.1634126-1-yunhong.jiang@linux.intel.com
[1]. https://lore.kernel.org/all/20251117-rneri-topology-cpuinfo-bug-v1-1-a905bb5f91e2@linux.intel.com/
[2]. https://openvmm.dev/guide/user_guide/openhcl.html
--
2.43.0
---
Ricardo Neri (6):
x86/topology: Add missing struct declaration and attribute dependency
x86/acpi: Add functions to setup and access the wakeup mailbox
dt-bindings: reserved-memory: Wakeup Mailbox for Intel processors
x86/dt: Parse the Wakeup Mailbox for Intel processors
x86/acpi: Add a helper get the address of the wakeup mailbox
x86/hyperv/vtl: Use the wakeup mailbox to boot secondary CPUs
Yunhong Jiang (4):
x86/hyperv/vtl: Set real_mode_header in hv_vtl_init_platform()
x86/realmode: Make the location of the trampoline configurable
x86/hyperv/vtl: Setup the 64-bit trampoline for TDX guests
x86/hyperv/vtl: Mark the wakeup mailbox page as private
.../reserved-memory/intel,wakeup-mailbox.yaml | 49 ++++++++++++++++++++++
arch/x86/hyperv/hv_vtl.c | 38 +++++++++++++++--
arch/x86/include/asm/acpi.h | 16 +++++++
arch/x86/include/asm/topology.h | 3 ++
arch/x86/include/asm/x86_init.h | 3 ++
arch/x86/kernel/acpi/madt_wakeup.c | 16 +++++++
arch/x86/kernel/devicetree.c | 47 +++++++++++++++++++++
arch/x86/kernel/x86_init.c | 3 ++
arch/x86/realmode/init.c | 7 ++--
9 files changed, 174 insertions(+), 8 deletions(-)
---
base-commit: 18a93ea5e0ae3e3e6918a6efc6a1d60a37be47b2
change-id: 20250602-rneri-wakeup-mailbox-328efe72803f
Best regards,
--
Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
^ permalink raw reply
* RE: [PATCH V0] mshv: pass struct mshv_user_mem_region by reference
From: Michael Kelley @ 2026-03-04 18:45 UTC (permalink / raw)
To: Mukesh R, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: wei.liu@kernel.org
In-Reply-To: <20260304000251.2625375-1-mrathor@linux.microsoft.com>
From: Mukesh R <mrathor@linux.microsoft.com> Sent: Tuesday, March 3, 2026 4:03 PM
>
> For unstated reasons, function mshv_partition_ioctl_set_memory passes
> struct mshv_user_mem_region by value instead of by reference. Change
> it to pass by reference.
>
> Signed-off-by: Mukesh R <mrathor@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> ---
> drivers/hv/mshv_root_main.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index e6509c980763..87c5ffd2528d 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1289,7 +1289,7 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
> */
> static long
> mshv_map_user_memory(struct mshv_partition *partition,
> - struct mshv_user_mem_region mem)
> + struct mshv_user_mem_region *mem)
> {
> struct mshv_mem_region *region;
> struct vm_area_struct *vma;
> @@ -1297,12 +1297,12 @@ mshv_map_user_memory(struct mshv_partition *partition,
> ulong mmio_pfn;
> long ret;
>
> - if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
> - !access_ok((const void __user *)mem.userspace_addr, mem.size))
> + if (mem->flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
> + !access_ok((const void __user *)mem->userspace_addr, mem->size))
> return -EINVAL;
>
> mmap_read_lock(current->mm);
> - vma = vma_lookup(current->mm, mem.userspace_addr);
> + vma = vma_lookup(current->mm, mem->userspace_addr);
> is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
> mmio_pfn = is_mmio ? vma->vm_pgoff : 0;
> mmap_read_unlock(current->mm);
> @@ -1310,7 +1310,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
> if (!vma)
> return -EINVAL;
>
> - ret = mshv_partition_create_region(partition, &mem, &region,
> + ret = mshv_partition_create_region(partition, mem, &region,
> is_mmio);
> if (ret)
> return ret;
> @@ -1355,25 +1355,25 @@ mshv_map_user_memory(struct mshv_partition *partition,
> /* Called for unmapping both the guest ram and the mmio space */
> static long
> mshv_unmap_user_memory(struct mshv_partition *partition,
> - struct mshv_user_mem_region mem)
> + struct mshv_user_mem_region *mem)
> {
> struct mshv_mem_region *region;
>
> - if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
> + if (!(mem->flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
> return -EINVAL;
>
> spin_lock(&partition->pt_mem_regions_lock);
>
> - region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> + region = mshv_partition_region_by_gfn(partition, mem->guest_pfn);
> if (!region) {
> spin_unlock(&partition->pt_mem_regions_lock);
> return -ENOENT;
> }
>
> /* Paranoia check */
> - if (region->start_uaddr != mem.userspace_addr ||
> - region->start_gfn != mem.guest_pfn ||
> - region->nr_pages != HVPFN_DOWN(mem.size)) {
> + if (region->start_uaddr != mem->userspace_addr ||
> + region->start_gfn != mem->guest_pfn ||
> + region->nr_pages != HVPFN_DOWN(mem->size)) {
> spin_unlock(&partition->pt_mem_regions_lock);
> return -EINVAL;
> }
> @@ -1404,9 +1404,9 @@ mshv_partition_ioctl_set_memory(struct mshv_partition *partition,
> return -EINVAL;
>
> if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP))
> - return mshv_unmap_user_memory(partition, mem);
> + return mshv_unmap_user_memory(partition, &mem);
>
> - return mshv_map_user_memory(partition, mem);
> + return mshv_map_user_memory(partition, &mem);
> }
>
> static long
> --
> 2.51.2.vfs.0.1
>
^ permalink raw reply
* Re: [PATCH v2 net-next 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs
From: Leon Romanovsky @ 2026-03-04 14:59 UTC (permalink / raw)
To: Long Li
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
linux-kernel, linux-rdma
In-Reply-To: <20260304000017.333312-7-longli@microsoft.com>
On Tue, Mar 03, 2026 at 04:00:17PM -0800, Long Li wrote:
> Use the GIC functions to allocate interrupt contexts for RDMA EQs. These
> interrupt contexts may be shared with Ethernet EQs when MSI-X vectors
> are limited.
>
> The driver now supports allocating dedicated MSI-X for each EQ. Indicate
> this capability through driver capability bits.
>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
> drivers/infiniband/hw/mana/main.c | 33 ++++++++++++++++++++++++++-----
> include/net/mana/gdma.h | 7 +++++--
> 2 files changed, 33 insertions(+), 7 deletions(-)
>
Thanks,
Acked-by: Leon Romanovsky <leon@kernel.org>
^ permalink raw reply
* Re: [PATCH v2 net-next 1/6] net: mana: Create separate EQs for each vPort
From: Leon Romanovsky @ 2026-03-04 14:58 UTC (permalink / raw)
To: Long Li
Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
linux-kernel, linux-rdma
In-Reply-To: <20260304000017.333312-2-longli@microsoft.com>
On Tue, Mar 03, 2026 at 04:00:12PM -0800, Long Li wrote:
> To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
> sharing among the vPorts and create dedicated EQs for each vPort.
>
> Move the EQ definition from struct mana_context to struct mana_port_context
> and update related support functions. Export mana_create_eq() and
> mana_destroy_eq() for use by the MANA RDMA driver.
>
> Signed-off-by: Long Li <longli@microsoft.com>
> ---
> drivers/infiniband/hw/mana/main.c | 14 ++-
> drivers/infiniband/hw/mana/qp.c | 4 +-
> drivers/net/ethernet/microsoft/mana/mana_en.c | 109 ++++++++++--------
> include/net/mana/mana.h | 7 +-
> 4 files changed, 82 insertions(+), 52 deletions(-)
>
Thanks,
Acked-by: Leon Romanovsky <leon@kernel.org> # drivers/infiniband
^ permalink raw reply
* Re: [PATCH v4 5/5] mm/page_reporting: change page_reporting_order to PAGE_REPORTING_ORDER_UNSPECIFIED
From: Yuvraj Sakshith @ 2026-03-04 8:44 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: mst, kys, haiyangz, wei.liu, decui, longli, jasowang, xuanzhuo,
eperezma, akpm, lorenzo.stoakes, Liam.Howlett, vbabka, rppt,
surenb, mhocko, jackmanb, hannes, ziy, linux-hyperv, linux-kernel,
virtualization, linux-mm
In-Reply-To: <849233d2-9d40-4d67-965d-262153d15d07@kernel.org>
On Tue, Mar 03, 2026 at 03:59:10PM +0100, David Hildenbrand (Arm) wrote:
> On 3/3/26 12:30, Yuvraj Sakshith wrote:
> > Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Sweet. Thanks for the review, David.
Yuvraj
^ permalink raw reply
* Re: [PATCH v3] x86/hyperv: Use __naked attribute to fix stackless C function
From: Wei Liu @ 2026-03-04 5:50 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, x86, Andrew Cooper, Mukesh Rathor, Uros Bizjak,
Wei Liu, linux-hyperv
In-Reply-To: <20260302164530.50005-2-ardb@kernel.org>
On Mon, Mar 02, 2026 at 05:45:31PM +0100, Ard Biesheuvel wrote:
> hv_crash_c_entry() is a C function that is entered without a stack,
> and this is only allowed for functions that have the __naked attribute,
> which informs the compiler that it must not emit the usual prologue and
> epilogue or emit any other kind of instrumentation that relies on a
> stack frame.
>
> So split up the function, and set the __naked attribute on the initial
> part that sets up the stack, GDT, IDT and other pieces that are needed
> for ordinary C execution. Given that function calls are not permitted
> either, use the existing long return coded in an asm() block to call the
> second part of the function, which is an ordinary function that is
> permitted to call other functions as usual.
>
> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> # asm parts, not hv parts
> Reviewed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
> Acked-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: linux-hyperv@vger.kernel.org
> Fixes: 94212d34618c ("x86/hyperv: Implement hypervisor RAM collection into vmcore")
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Applied to hyperv-fixes. Thanks Ard!
^ permalink raw reply
* [PATCH 4/4] mshv: Pre-deposit pages for SLAT creation
From: Stanislav Kinsburskii @ 2026-03-04 0:23 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258296744.229866.4926075663598294228.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
Deposit enough pages up front to avoid guest address space region creation
failures due to low memory. This also speeds up guest creation.
Calculate the required number of pages based on the guest's physical
address space size, rounded up to 1 GB chunks. Even the smallest guests are
assumed to need at least 1 GB worth of deposits. This is because every
guest requires tens of megabytes of deposited pages for hypervisor
overhead, making smaller deposits impractical.
Estimating in 1 GB chunks prevents over-depositing for larger guests while
accepting some over-deposit for smaller ones. This trade-off keeps the
estimate close to actual needs for larger guests.
Also withdraw the deposited pages if address space region creation fails.
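To make the estimate concrete, the following userspace sketch redoes the arithmetic described above. It assumes 4 KiB pages for both the guest region and the deposit (the patch uses HV_HYP_PAGE_SHIFT and PAGE_SHIFT respectively), and the demo_* and DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12			/* assumes 4 KiB pages */
#define DEMO_SZ_1M      (UINT64_C(1) << 20)
#define DEMO_SZ_1G      (UINT64_C(1) << 30)
/* 6 MB of deposited pages per 1 GB chunk, mirroring MSHV_1G_DEPOSIT_PAGES. */
#define DEMO_1G_DEPOSIT_PAGES ((6 * DEMO_SZ_1M) >> DEMO_PAGE_SHIFT)

/* Number of pages to deposit for a region of nr_pages guest pages. */
static uint64_t demo_slat_deposit_pages(uint64_t nr_pages)
{
	uint64_t bytes = nr_pages << DEMO_PAGE_SHIFT;
	uint64_t gbs = (bytes + DEMO_SZ_1G - 1) / DEMO_SZ_1G;	/* round up */

	return gbs * DEMO_1G_DEPOSIT_PAGES;
}
```

Under these assumptions a single-page region and a full 1 GB region both cost 1536 pages (6 MB), while a region one page larger than 1 GB costs 3072 pages. This is the over-deposit for small regions that the description accepts.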
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root_main.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 48c842b6938d..cb5b4505f8eb 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -39,6 +39,7 @@
#define MSHV_PARTITION_DEPOSIT_PAGES (SZ_512K >> PAGE_SHIFT)
#define MSHV_PARTITION_DEPOSIT_PAGES_NESTED (20 * SZ_1M >> PAGE_SHIFT)
#define MSHV_VP_DEPOSIT_PAGES (1 * SZ_1M >> PAGE_SHIFT)
+#define MSHV_1G_DEPOSIT_PAGES (6 * SZ_1M >> PAGE_SHIFT)
MODULE_AUTHOR("Microsoft");
MODULE_LICENSE("GPL");
@@ -1324,6 +1325,18 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
return ret;
}
+static u64
+mshv_region_deposit_slat_pages(struct mshv_mem_region *region)
+{
+ u64 region_in_gbs, slat_pages;
+
+ /* SLAT needs 6 MB per 1 GB of address space. */
+ region_in_gbs = DIV_ROUND_UP(region->nr_pages << HV_HYP_PAGE_SHIFT, SZ_1G);
+ slat_pages = region_in_gbs * MSHV_1G_DEPOSIT_PAGES;
+
+ return slat_pages;
+}
+
/*
* This maps two things: guest RAM and for pci passthru mmio space.
*
@@ -1364,6 +1377,11 @@ mshv_map_user_memory(struct mshv_partition *partition,
if (ret)
return ret;
+ ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
+ mshv_region_deposit_slat_pages(region));
+ if (ret)
+ goto free_region;
+
switch (region->mreg_type) {
case MSHV_REGION_TYPE_MEM_PINNED:
ret = mshv_prepare_pinned_region(region);
@@ -1392,7 +1410,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
region->hv_map_flags, ret);
if (ret)
- goto errout;
+ goto withdraw_memory;
spin_lock(&partition->pt_mem_regions_lock);
hlist_add_head(&region->hnode, &partition->pt_mem_regions);
@@ -1400,7 +1418,10 @@ mshv_map_user_memory(struct mshv_partition *partition,
return 0;
-errout:
+withdraw_memory:
+ hv_call_withdraw_memory(mshv_region_deposit_slat_pages(region),
+ NUMA_NO_NODE, partition->pt_id);
+free_region:
vfree(region);
return ret;
}
^ permalink raw reply related
* [PATCH 3/4] mshv: Fix pre-depositing of pages for virtual processor initialization
From: Stanislav Kinsburskii @ 2026-03-04 0:23 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258296744.229866.4926075663598294228.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
Deposit enough pages up front to avoid virtual processor creation failures
due to low memory. This also speeds up guest creation. A VP uses 25% more
pages in a partition with nested virtualization enabled, but the exact
number doesn't vary much, so deposit a fixed number of pages per VP that
works for nested virtualization.
Move page depositing from the hypercall wrapper to the virtual processor
creation code. The required number of pages is based on empirical data.
This logic fits better in the virtual processor creation code than in the
hypercall wrapper.
Also withdraw the deposited memory if virtual processor creation fails.
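The deposit, create, withdraw-on-failure ordering can be sketched in userspace as follows. The hypervisor calls are replaced by hypothetical demo_* helpers that only update a counter, so this shows the shape of the error path, not the driver code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping standing in for the hypervisor's page pool. */
static long demo_deposited;

static int demo_deposit(long pages)   { demo_deposited += pages; return 0; }
static void demo_withdraw(long pages) { demo_deposited -= pages; }

/* Hypothetical creation step that can be made to fail for the demonstration. */
static int demo_create_vp(bool fail)  { return fail ? -1 : 0; }

#define DEMO_VP_DEPOSIT_PAGES 256	/* 1 MB of 4 KiB pages */

static int demo_ioctl_create_vp(bool fail_create)
{
	int ret;

	ret = demo_deposit(DEMO_VP_DEPOSIT_PAGES);
	if (ret)
		return ret;		/* nothing to undo yet */

	ret = demo_create_vp(fail_create);
	if (ret)
		goto withdraw_mem;	/* undo the deposit only */

	return 0;

withdraw_mem:
	demo_withdraw(DEMO_VP_DEPOSIT_PAGES);
	return ret;
}
```

A failed creation leaves the pool at the level it had before the call, which is the property the withdraw_mem label in the patch is meant to provide.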
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/hv_proc.c | 8 --------
drivers/hv/mshv_root_main.c | 11 ++++++++++-
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 0f84a70def30..3d41f52efd9a 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -251,14 +251,6 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
unsigned long irq_flags;
int ret = 0;
- /* Root VPs don't seem to need pages deposited */
- if (partition_id != hv_current_partition_id) {
- /* The value 90 is empirically determined. It may change. */
- ret = hv_call_deposit_pages(node, partition_id, 90);
- if (ret)
- return ret;
- }
-
do {
local_irq_save(irq_flags);
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index fbfc50de332c..48c842b6938d 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -38,6 +38,7 @@
/* The deposit values below are empirical and may need to be adjusted. */
#define MSHV_PARTITION_DEPOSIT_PAGES (SZ_512K >> PAGE_SHIFT)
#define MSHV_PARTITION_DEPOSIT_PAGES_NESTED (20 * SZ_1M >> PAGE_SHIFT)
+#define MSHV_VP_DEPOSIT_PAGES (1 * SZ_1M >> PAGE_SHIFT)
MODULE_AUTHOR("Microsoft");
MODULE_LICENSE("GPL");
@@ -1077,10 +1078,15 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
if (partition->pt_vp_array[args.vp_index])
return -EEXIST;
+ ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
+ MSHV_VP_DEPOSIT_PAGES);
+ if (ret)
+ return ret;
+
ret = hv_call_create_vp(NUMA_NO_NODE, partition->pt_id, args.vp_index,
0 /* Only valid for root partition VPs */);
if (ret)
- return ret;
+ goto withdraw_mem;
ret = hv_map_vp_state_page(partition->pt_id, args.vp_index,
HV_VP_STATE_PAGE_INTERCEPT_MESSAGE,
@@ -1177,6 +1183,9 @@ mshv_partition_ioctl_create_vp(struct mshv_partition *partition,
intercept_msg_page, input_vtl_zero);
destroy_vp:
hv_call_delete_vp(partition->pt_id, args.vp_index);
+withdraw_mem:
+ hv_call_withdraw_memory(MSHV_VP_DEPOSIT_PAGES, NUMA_NO_NODE,
+ partition->pt_id);
out:
trace_mshv_create_vp(partition->pt_id, args.vp_index, ret);
return ret;
^ permalink raw reply related
* [PATCH 2/4] mshv: Fix pre-depositing of pages for partition initialization
From: Stanislav Kinsburskii @ 2026-03-04 0:23 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258296744.229866.4926075663598294228.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
Deposit enough pages upfront to avoid partition initialization failures due
to low memory. This also speeds up partition initialization.
Move page depositing from the hypercall wrapper to the partition
initialization code. The required number of pages is empirical. This logic
fits better in the partition initialization code than in the hypercall
wrapper.
A partition with nested capability requires 40x more pages (20 MB) to
accommodate the nested MSHV hypervisor. This may be improved in the future.
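As a quick check of the numbers above, the sketch below computes both deposit sizes in pages. It assumes 4 KiB pages, and DEMO_FLAG_NESTED is a hypothetical bit standing in for HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE, whose value is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define DEMO_SZ_512K    (UINT64_C(512) << 10)
#define DEMO_SZ_1M      (UINT64_C(1) << 20)

#define DEMO_DEPOSIT_PAGES        (DEMO_SZ_512K >> DEMO_PAGE_SHIFT)
#define DEMO_DEPOSIT_PAGES_NESTED ((20 * DEMO_SZ_1M) >> DEMO_PAGE_SHIFT)

/* Hypothetical flag bit; the real flag value is not shown in this patch. */
#define DEMO_FLAG_NESTED (UINT64_C(1) << 0)

/* Pick the deposit size from the partition creation flags. */
static uint64_t demo_partition_deposit_pages(uint64_t flags)
{
	return (flags & DEMO_FLAG_NESTED) ? DEMO_DEPOSIT_PAGES_NESTED
					  : DEMO_DEPOSIT_PAGES;
}
```

With these assumptions the default deposit is 128 pages and the nested one is 5120 pages, which matches the 40x ratio quoted above.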
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/mshv_root.h | 1 +
drivers/hv/mshv_root_hv_call.c | 6 ------
drivers/hv/mshv_root_main.c | 23 +++++++++++++++++++++--
3 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 947dfb76bb19..40cf7bdbd62f 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -106,6 +106,7 @@ struct mshv_partition {
struct hlist_node pt_hnode;
u64 pt_id;
+ u64 pt_flags;
refcount_t pt_ref_count;
struct mutex pt_mutex;
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index bdcb8de7fb47..b8d199f95299 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -15,7 +15,6 @@
#include "mshv_root.h"
/* Determined empirically */
-#define HV_INIT_PARTITION_DEPOSIT_PAGES 208
#define HV_UMAP_GPA_PAGES 512
#define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
@@ -139,11 +138,6 @@ int hv_call_initialize_partition(u64 partition_id)
input.partition_id = partition_id;
- ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
- HV_INIT_PARTITION_DEPOSIT_PAGES);
- if (ret)
- return ret;
-
do {
status = hv_do_fast_hypercall8(HVCALL_INITIALIZE_PARTITION,
*(u64 *)&input);
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index d753f41d3b57..fbfc50de332c 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -35,6 +35,10 @@
#include "mshv.h"
#include "mshv_root.h"
+/* The deposit values below are empirical and may need to be adjusted. */
+#define MSHV_PARTITION_DEPOSIT_PAGES (SZ_512K >> PAGE_SHIFT)
+#define MSHV_PARTITION_DEPOSIT_PAGES_NESTED (20 * SZ_1M >> PAGE_SHIFT)
+
MODULE_AUTHOR("Microsoft");
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
@@ -1587,6 +1591,15 @@ mshv_partition_ioctl_set_msi_routing(struct mshv_partition *partition,
return ret;
}
+static u64
+mshv_partition_deposit_pages(struct mshv_partition *partition)
+{
+ if (partition->pt_flags &
+ HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE)
+ return MSHV_PARTITION_DEPOSIT_PAGES_NESTED;
+ return MSHV_PARTITION_DEPOSIT_PAGES;
+}
+
static long
mshv_partition_ioctl_initialize(struct mshv_partition *partition)
{
@@ -1595,6 +1608,11 @@ mshv_partition_ioctl_initialize(struct mshv_partition *partition)
if (partition->pt_initialized)
return 0;
+ ret = hv_call_deposit_pages(NUMA_NO_NODE, partition->pt_id,
+ mshv_partition_deposit_pages(partition));
+ if (ret)
+ goto withdraw_mem;
+
ret = hv_call_initialize_partition(partition->pt_id);
if (ret)
goto withdraw_mem;
@@ -1610,8 +1628,8 @@ mshv_partition_ioctl_initialize(struct mshv_partition *partition)
finalize_partition:
hv_call_finalize_partition(partition->pt_id);
withdraw_mem:
- hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
-
+ hv_call_withdraw_memory(MSHV_PARTITION_DEPOSIT_PAGES,
+ NUMA_NO_NODE, partition->pt_id);
return ret;
}
@@ -2032,6 +2050,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
return -ENOMEM;
partition->pt_module_dev = module_dev;
+ partition->pt_flags = creation_flags;
partition->isolation_type = isolation_properties.isolation_type;
refcount_set(&partition->pt_ref_count, 1);
^ permalink raw reply related
* [PATCH 1/4] mshv: Support larger memory deposits
From: Stanislav Kinsburskii @ 2026-03-04 0:23 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177258296744.229866.4926075663598294228.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
Convert hv_call_deposit_pages() into a wrapper supporting an arbitrary
number of pages, and use it in the memory deposit code paths.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
drivers/hv/hv_proc.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 49 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 5f4fd9c3231c..0f84a70def30 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -16,7 +16,7 @@
#define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
/* Deposits exact number of pages. Must be called with interrupts enabled. */
-int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
+static int __hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
{
struct page **pages, *page;
int *counts;
@@ -108,6 +108,54 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
kfree(counts);
return ret;
}
+
+/**
+ * hv_call_deposit_pages - Deposit memory pages to a partition
+ * @node : NUMA node from which to allocate pages
+ * @partition_id: Target partition ID to deposit pages to
+ * @num_pages : Number of pages to deposit
+ *
+ * Deposits memory pages to the specified partition. The deposit is
+ * performed in chunks of HV_DEPOSIT_MAX pages to handle large requests
+ * efficiently.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
+{
+ u32 done;
+ int ret = 0;
+
+ /*
+ * Do a double deposit for L1VH. This reserves enough memory for
+ * Hypervisor Hot Restart (HHR).
+ *
+ * During HHR, every data structure must be recreated in the new
+ * ("proto") hypervisor. Memory is required by the proto hypervisor
+ * to do this work.
+ *
+ * For regular L1 partitions, more memory can be requested from the
+ * root during HHR by sending an asynchronous message. But this is
+ * not supported for L1VHs. A guest must not be allowed to block
+ * HHR by refusing to deposit more memory.
+ *
+ * So for L1VH a deposit is always required for both current needs
+ * and future HHR work.
+ */
+ if (hv_l1vh_partition())
+ num_pages *= 2;
+
+ for (done = 0; done < num_pages; done += HV_DEPOSIT_MAX) {
+ u32 to_deposit = min(num_pages - done, HV_DEPOSIT_MAX);
+
+ ret = __hv_call_deposit_pages(node, partition_id,
+ to_deposit);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
EXPORT_SYMBOL_GPL(hv_call_deposit_pages);
int hv_deposit_memory_node(int node, u64 partition_id,
^ permalink raw reply related
* [PATCH 0/4] mshv: Fix and improve memory pre-depositing
From: Stanislav Kinsburskii @ 2026-03-04 0:23 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
This series fixes and improves memory pre-depositing in the Microsoft
Hypervisor (MSHV) driver to avoid partition and virtual processor
creation failures due to insufficient deposited memory, and to speed
up guest creation.
The first patch converts hv_call_deposit_pages() into a wrapper that
supports arbitrarily large deposit requests by splitting them into
HV_DEPOSIT_MAX-sized chunks. It also doubles the deposit amount for
L1 virtual hypervisor (L1VH) partitions to reserve memory for
Hypervisor Hot Restart (HHR), since L1VH guests cannot request
additional memory from the root partition during HHR.
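The chunking described above can be sketched in user space as follows.
This is only an illustration, not the kernel code: deposit_chunk_count()
and HV_DEPOSIT_MAX_SKETCH are hypothetical names, and the sketch assumes
a 4 KiB hypervisor page, which makes the per-call limit 511 pages.

```c
#include <stdint.h>

/* Assumption: 4 KiB hypervisor page, so 4096 / sizeof(u64) - 1 = 511. */
#define HV_DEPOSIT_MAX_SKETCH ((uint32_t)(4096 / sizeof(uint64_t)) - 1)

/*
 * Hypothetical helper: count how many chunked deposit calls a request
 * of num_pages would take. The request is doubled first when is_l1vh
 * is nonzero, mirroring the L1VH rule in the cover letter.
 * The sketch assumes num_pages stays far below UINT32_MAX.
 */
static uint32_t deposit_chunk_count(uint32_t num_pages, int is_l1vh)
{
	uint32_t done, chunks = 0;

	if (is_l1vh)
		num_pages *= 2;

	/* Each iteration stands for one deposit of at most 511 pages. */
	for (done = 0; done < num_pages; done += HV_DEPOSIT_MAX_SKETCH)
		chunks++;

	return chunks;
}
```

Under these assumptions a 20 MB request (5120 pages) takes 11 calls,
and the same request on L1VH (10240 pages) takes 21.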
The second patch moves partition initialization page depositing from
the hypercall wrapper to the partition initialization ioctl. The
required number of pages is determined empirically. Partitions with
nested virtualization capability require significantly more pages
(20 MB) to accommodate the nested hypervisor. The partition creation
flags are saved in the partition structure to allow selecting the
correct deposit size at initialization time.
The third patch moves virtual processor page depositing from
hv_call_create_vp() to mshv_partition_ioctl_create_vp(). A fixed
deposit of 1 MB per VP is used, which covers both regular and nested
virtualization cases. Deposited memory is now properly withdrawn if
VP creation fails.
The fourth patch adds pre-depositing of pages for guest address space
(SLAT) region creation. The deposit size is calculated based on the
region size rounded up to 1 GB chunks, with 6 MB deposited per GB of
address space. Deposited pages are withdrawn on failure.
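The deposit sizes listed above translate into page counts roughly as
shown below. This is a sketch under stated assumptions: 4 KiB pages,
and every SKETCH_* macro and function name is hypothetical. Only the
sizes themselves (512 KB, 20 MB, 1 MB, 6 MB per GB) come from this
series.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumption: 4 KiB pages */
#define SKETCH_SZ_1M (1ull << 20)
#define SKETCH_SZ_1G (1ull << 30)

/* Partition initialization: 512 KB, or 20 MB if nested-capable. */
static uint64_t partition_deposit_pages(int nested_capable)
{
	uint64_t bytes = nested_capable ? 20 * SKETCH_SZ_1M
					: SKETCH_SZ_1M / 2;

	return bytes >> SKETCH_PAGE_SHIFT;
}

/* Virtual processor creation: fixed 1 MB per VP. */
static uint64_t vp_deposit_pages(void)
{
	return SKETCH_SZ_1M >> SKETCH_PAGE_SHIFT;
}

/* SLAT: round the region up to whole GBs, then 6 MB per GB. */
static uint64_t slat_deposit_pages(uint64_t region_bytes)
{
	uint64_t gbs = (region_bytes + SKETCH_SZ_1G - 1) / SKETCH_SZ_1G;

	return gbs * ((6 * SKETCH_SZ_1M) >> SKETCH_PAGE_SHIFT);
}
```

With these assumptions a regular partition deposits 128 pages, a
nested-capable one 5120, each VP 256, and each started GB of guest
address space 1536.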
---
Stanislav Kinsburskii (4):
mshv: Support larger memory deposits
mshv: Fix pre-depositing of pages for partition initialization
mshv: Fix pre-depositing of pages for virtual processor initialization
mshv: Pre-deposit pages for SLAT creation
drivers/hv/hv_proc.c | 58 +++++++++++++++++++++++++++++++++------
drivers/hv/mshv_root.h | 1 +
drivers/hv/mshv_root_hv_call.c | 6 ----
drivers/hv/mshv_root_main.c | 59 +++++++++++++++++++++++++++++++++++++---
4 files changed, 104 insertions(+), 20 deletions(-)
^ permalink raw reply
* [PATCH V0] mshv: pass struct mshv_user_mem_region by reference
From: Mukesh R @ 2026-03-04 0:02 UTC (permalink / raw)
To: linux-hyperv, linux-kernel; +Cc: wei.liu
For unstated reasons, function mshv_partition_ioctl_set_memory passes
struct mshv_user_mem_region by value instead of by reference. Change
it to pass by reference.
Signed-off-by: Mukesh R <mrathor@linux.microsoft.com>
---
drivers/hv/mshv_root_main.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index e6509c980763..87c5ffd2528d 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1289,7 +1289,7 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
*/
static long
mshv_map_user_memory(struct mshv_partition *partition,
- struct mshv_user_mem_region mem)
+ struct mshv_user_mem_region *mem)
{
struct mshv_mem_region *region;
struct vm_area_struct *vma;
@@ -1297,12 +1297,12 @@ mshv_map_user_memory(struct mshv_partition *partition,
ulong mmio_pfn;
long ret;
- if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
- !access_ok((const void __user *)mem.userspace_addr, mem.size))
+ if (mem->flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
+ !access_ok((const void __user *)mem->userspace_addr, mem->size))
return -EINVAL;
mmap_read_lock(current->mm);
- vma = vma_lookup(current->mm, mem.userspace_addr);
+ vma = vma_lookup(current->mm, mem->userspace_addr);
is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
mmio_pfn = is_mmio ? vma->vm_pgoff : 0;
mmap_read_unlock(current->mm);
@@ -1310,7 +1310,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
if (!vma)
return -EINVAL;
- ret = mshv_partition_create_region(partition, &mem, &region,
+ ret = mshv_partition_create_region(partition, mem, &region,
is_mmio);
if (ret)
return ret;
@@ -1355,25 +1355,25 @@ mshv_map_user_memory(struct mshv_partition *partition,
/* Called for unmapping both the guest ram and the mmio space */
static long
mshv_unmap_user_memory(struct mshv_partition *partition,
- struct mshv_user_mem_region mem)
+ struct mshv_user_mem_region *mem)
{
struct mshv_mem_region *region;
- if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
+ if (!(mem->flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
return -EINVAL;
spin_lock(&partition->pt_mem_regions_lock);
- region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
+ region = mshv_partition_region_by_gfn(partition, mem->guest_pfn);
if (!region) {
spin_unlock(&partition->pt_mem_regions_lock);
return -ENOENT;
}
/* Paranoia check */
- if (region->start_uaddr != mem.userspace_addr ||
- region->start_gfn != mem.guest_pfn ||
- region->nr_pages != HVPFN_DOWN(mem.size)) {
+ if (region->start_uaddr != mem->userspace_addr ||
+ region->start_gfn != mem->guest_pfn ||
+ region->nr_pages != HVPFN_DOWN(mem->size)) {
spin_unlock(&partition->pt_mem_regions_lock);
return -EINVAL;
}
@@ -1404,9 +1404,9 @@ mshv_partition_ioctl_set_memory(struct mshv_partition *partition,
return -EINVAL;
if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP))
- return mshv_unmap_user_memory(partition, mem);
+ return mshv_unmap_user_memory(partition, &mem);
- return mshv_map_user_memory(partition, mem);
+ return mshv_map_user_memory(partition, &mem);
}
static long
--
2.51.2.vfs.0.1
^ permalink raw reply related
* RE: [EXTERNAL] Re: [PATCH net-next 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management
From: Long Li @ 2026-03-04 0:01 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Konstantin Taranov, David S . Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Jason Gunthorpe, Leon Romanovsky, Haiyang Zhang,
KY Srinivasan, Wei Liu, Dexuan Cui, Simon Horman,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260302185902.5a778bb4@kernel.org>
> On Fri, 27 Feb 2026 18:11:38 -0800 Long Li wrote:
> > This series adds per-vPort Event Queue (EQ) allocation and MSI-X
> > interrupt management for the MANA driver. Previously, all vPorts
> > shared a single set of EQs. This change enables dedicated EQs per
> > vPort with support for both dedicated and shared MSI-X vector allocation
> > modes.
>
> Does not apply to net-next, please rebase.
I have sent v2.
Thank you,
Long
^ permalink raw reply
* [PATCH v2 net-next 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs
From: Long Li @ 2026-03-04 0:00 UTC (permalink / raw)
To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shradha Gupta, Simon Horman, Konstantin Taranov,
Souradeep Chakrabarti, Erick Archer, linux-hyperv, netdev,
linux-kernel, linux-rdma
Cc: Long Li
In-Reply-To: <20260304000017.333312-1-longli@microsoft.com>
Use the GIC functions to allocate interrupt contexts for RDMA EQs. These
interrupt contexts may be shared with Ethernet EQs when MSI-X vectors
are limited.
The driver now supports allocating a dedicated MSI-X vector for each EQ. Indicate
this capability through driver capability bits.
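The sharing mentioned above follows from how the diff below picks an
MSI-X index for each RDMA EQ: index 0 for the fatal-error EQ and
(i + 1) % num_msix_usable for completion EQ i. A minimal user-space
sketch of that mapping, with hypothetical helper names and assuming
num_msix_usable is nonzero:

```c
/* MSI-X index used for completion EQ i (same formula as the diff). */
static unsigned int rdma_eq_msix_index(unsigned int i,
				       unsigned int num_msix_usable)
{
	return (i + 1) % num_msix_usable;
}

/*
 * Hypothetical helper: number of RDMA EQs that land on a given MSI-X
 * index, counting the fatal-error EQ on index 0.
 */
static unsigned int eqs_on_msix_index(unsigned int index,
				      unsigned int num_comp_vectors,
				      unsigned int num_msix_usable)
{
	unsigned int i, users = (index == 0) ? 1 : 0;

	for (i = 0; i < num_comp_vectors; i++)
		if (rdma_eq_msix_index(i, num_msix_usable) == index)
			users++;

	return users;
}
```

Because teardown recomputes the same formula before calling
mana_gd_put_gic(), each get is paired with a put on the same index
even when the indices wrap around and are shared.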
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 33 ++++++++++++++++++++++++++-----
include/net/mana/gdma.h | 7 +++++--
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index d51dd0ee85f4..0b74dd093b41 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -787,6 +787,7 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
{
struct gdma_context *gc = mdev_to_gc(mdev);
struct gdma_queue_spec spec = {};
+ struct gdma_irq_context *gic;
int err, i;
spec.type = GDMA_EQ;
@@ -797,9 +798,15 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
spec.eq.msix_index = 0;
+ gic = mana_gd_get_gic(gc, false, &spec.eq.msix_index);
+ if (!gic)
+ return -ENOMEM;
+
err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->fatal_err_eq);
- if (err)
+ if (err) {
+ mana_gd_put_gic(gc, false, 0);
return err;
+ }
mdev->eqs = kzalloc_objs(struct gdma_queue *,
mdev->ib_dev.num_comp_vectors);
@@ -810,31 +817,47 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
spec.eq.callback = NULL;
for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+
+ gic = mana_gd_get_gic(gc, false, &spec.eq.msix_index);
+ if (!gic) {
+ err = -ENOMEM;
+ goto destroy_eqs;
+ }
+
err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->eqs[i]);
- if (err)
+ if (err) {
+ mana_gd_put_gic(gc, false, spec.eq.msix_index);
goto destroy_eqs;
+ }
}
return 0;
destroy_eqs:
- while (i-- > 0)
+ while (i-- > 0) {
mana_gd_destroy_queue(gc, mdev->eqs[i]);
+ mana_gd_put_gic(gc, false, (i + 1) % gc->num_msix_usable);
+ }
kfree(mdev->eqs);
destroy_fatal_eq:
mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+ mana_gd_put_gic(gc, false, 0);
return err;
}
void mana_ib_destroy_eqs(struct mana_ib_dev *mdev)
{
struct gdma_context *gc = mdev_to_gc(mdev);
- int i;
+ int i, msi;
mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+ mana_gd_put_gic(gc, false, 0);
- for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++)
+ for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
mana_gd_destroy_queue(gc, mdev->eqs[i]);
+ msi = (i + 1) % gc->num_msix_usable;
+ mana_gd_put_gic(gc, false, msi);
+ }
kfree(mdev->eqs);
}
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 4e0278b00bbb..662e58f51e87 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -612,6 +612,7 @@ enum {
#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
#define GDMA_DRV_CAP_FLAG_1_GDMA_PAGES_4MB_1GB_2GB BIT(4)
#define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
+#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
/* Driver can handle holes (zeros) in the device list */
#define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11)
@@ -628,7 +629,8 @@ enum {
/* Driver detects stalled send queues and recovers them */
#define GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY BIT(18)
-#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
+/* Driver supports separate EQ/MSIs for each vPort */
+#define GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT BIT(19)
/* Driver supports linearizing the skb when num_sge exceeds hardware limit */
#define GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE BIT(20)
@@ -656,7 +658,8 @@ enum {
GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \
GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
- GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY)
+ GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
+ GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT)
#define GDMA_DRV_CAP_FLAGS2 0
--
2.43.0
^ permalink raw reply related