Linux-HyperV List

* Re: [PATCH] mshv: expose hv_call_scrub_partition
From: Wei Liu @ 2026-02-18 23:35 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: wei.liu, haiyangz, kys, decui, linux-hyperv, skinsburskii,
	magnuskulke, linux-kernel
In-Reply-To: <20260218141911.555592-1-magnuskulke@linux.microsoft.com>

On Wed, Feb 18, 2026 at 03:19:11PM +0100, Magnus Kulke wrote:
> This hypercall needs to be exposed so that VMMs can soft-reboot guests.
> It resets the APIC and the state of paravirtualized devices such as
> SynIC.
> 
> Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>

Applied to hyperv-next. Thanks.

^ permalink raw reply

* Re: [PATCH v5] mshv: Add support for integrated scheduler
From: Wei Liu @ 2026-02-18 23:28 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <177144189787.43429.7425661016523660268.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On Wed, Feb 18, 2026 at 07:11:40PM +0000, Stanislav Kinsburskii wrote:
> Query the hypervisor for integrated scheduler support and use it if
> configured.
> 
> Microsoft Hypervisor originally provided two schedulers: root and core. The
> root scheduler allows the root partition to schedule guest vCPUs across
> physical cores, supporting both time slicing and CPU affinity (e.g., via
> cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core
> scheduling entirely to the hypervisor.
> 
> Direct virtualization introduces a new privileged guest partition type, L1
> Virtual Host (L1VH), which can create child partitions from its own
> resources. These child partitions are effectively siblings, scheduled by
> the hypervisor's core scheduler. This prevents the L1VH parent from setting
> affinity or time slicing for its own processes or guest VPs. While cgroups,
> CFS, and cpuset controllers can still be used, their effectiveness is
> unpredictable, as the core scheduler swaps vCPUs according to its own logic
> (typically round-robin across all allocated physical CPUs). As a result,
> the system may appear to "steal" time from the L1VH and its children.
> 
> To address this, Microsoft Hypervisor introduces the integrated scheduler.
> This allows an L1VH partition to schedule its own vCPUs and those of its
> guests across its "physical" cores, effectively emulating root scheduler
> behavior within the L1VH, while retaining core scheduler behavior for the
> rest of the system.
> 
> The integrated scheduler is controlled by the root partition and gated by
> the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor
> supports the integrated scheduler. The L1VH partition must then check if it
> is enabled by querying the corresponding extended partition property. If
> this property is true, the L1VH partition must use the root scheduler
> logic; otherwise, it must use the core scheduler. An L1VH partition must
> therefore be able to read the VMM capabilities as well.
> 
> Signed-off-by: Andreea Pintilie <anpintil@microsoft.com>
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>

Applied.
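
For readers skimming the thread, the selection rule in the quoted description
(capability bit first, then the extended partition property) can be sketched in
plain userspace C. This is only an illustration under stated assumptions: every
demo_* name is hypothetical, and the property query is modeled as a callback
rather than a real hypercall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two scheduler types an L1VH can end up with. */
enum demo_sched_type { DEMO_SCHED_CORE_SMT, DEMO_SCHED_ROOT };

/* Hypothetical property query: returns 0 on success and fills *enabled. */
typedef int (*demo_query_fn)(uint64_t *enabled);

static int demo_query_on(uint64_t *enabled)   { *enabled = 1; return 0; }
static int demo_query_off(uint64_t *enabled)  { *enabled = 0; return 0; }
static int demo_query_fail(uint64_t *enabled) { (void)enabled; return -5; }

/*
 * Default to the core scheduler. Only when the capability bit is set is the
 * property queried, and only a non-zero property selects root scheduler logic.
 */
static int demo_l1vh_sched_type(bool cap_bit, demo_query_fn query,
				enum demo_sched_type *out)
{
	uint64_t enabled = 0;
	int ret;

	assert(query && out);
	*out = DEMO_SCHED_CORE_SMT;

	if (!cap_bit)
		return 0;

	ret = query(&enabled);
	if (ret)
		return ret;	/* a failed query is reported, not ignored */

	if (enabled)
		*out = DEMO_SCHED_ROOT;

	return 0;
}
```

Keeping the core scheduler as the default means a hypervisor without the
capability bit behaves exactly as it did before.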

^ permalink raw reply

* Re: [PATCH RESEND] mshv: Use try_cmpxchg() instead of cmpxchg()
From: Wei Liu @ 2026-02-18 23:27 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-hyperv, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li
In-Reply-To: <20260218110041.179949-1-ubizjak@gmail.com>

On Wed, Feb 18, 2026 at 12:00:18PM +0100, Uros Bizjak wrote:
> Use !try_cmpxchg() instead of cmpxchg(*ptr, old, new) != old.
> The x86 CMPXCHG instruction reports success in the ZF flag, so this
> change saves a compare after CMPXCHG.
> 
> The generated assembly code improves from e.g.:
> 
>      415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
>      41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
>      41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
>      426:	00 00
>      428:	48 3b 44 24 30       	cmp    0x30(%rsp),%rax
>      42d:	0f 84 09 ff ff ff    	je     33c <...>
> 
> to:
> 
>      415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
>      41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
>      41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
>      426:	00 00
>      428:	0f 84 0e ff ff ff    	je     33c <...>
> 
> No functional change intended.
> 
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: Long Li <longli@microsoft.com>

Applied. Thanks.
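
As a rough illustration of the difference between the two calling conventions,
here is a userspace sketch built on C11 atomics. The demo_* helpers are
hypothetical stand-ins and not the kernel macros; they only mirror the
semantics described in the commit message (value returned versus success flag
with 'old' refreshed on failure).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * cmpxchg()-style helper: returns the value found in *ptr, so the caller
 * has to compare the result with 'old' to learn whether the swap happened.
 */
static uint64_t demo_cmpxchg(_Atomic uint64_t *ptr, uint64_t old,
			     uint64_t new_val)
{
	/* On failure C11 writes the current value into 'old'; on success
	 * 'old' already equals the previous value. Return it either way. */
	atomic_compare_exchange_strong(ptr, &old, new_val);
	return old;
}

/*
 * try_cmpxchg()-style helper: returns true on success and, on failure,
 * updates *old with the value currently stored in *ptr.
 */
static bool demo_try_cmpxchg(_Atomic uint64_t *ptr, uint64_t *old,
			     uint64_t new_val)
{
	return atomic_compare_exchange_strong(ptr, old, new_val);
}

/* Typical retry loop: no explicit re-read and no extra compare needed. */
static uint64_t demo_set_bits(_Atomic uint64_t *word, uint64_t mask)
{
	uint64_t old;

	assert(word);
	old = atomic_load(word);
	while (!demo_try_cmpxchg(word, &old, old | mask))
		;	/* 'old' was refreshed by the failed attempt */

	return old;	/* value before the bits were set */
}
```

With the boolean form the compiler can branch directly on the flag set by the
instruction, which is the saving shown in the disassembly above.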

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Fix error pointer dereference
From: Wei Liu @ 2026-02-18 23:22 UTC (permalink / raw)
  To: Ethan Tidmore
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Saurabh Sengar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H . Peter Anvin, Michael Kelley, x86, linux-hyperv,
	linux-kernel
In-Reply-To: <DGIBQKCDK1KA.1BU93405XZZ9R@gmail.com>

On Wed, Feb 18, 2026 at 01:11:18PM -0600, Ethan Tidmore wrote:
> On Wed Feb 18, 2026 at 1:09 PM CST, Ethan Tidmore wrote:
> > The function idle_thread_get() can return an error pointer, but its
> > return value is not checked. Add the missing error pointer check.
> >
> > Detected by Smatch:
> > arch/x86/hyperv/hv_vtl.c:126 hv_vtl_bringup_vcpu() error:
> > 'idle' dereferencing possible ERR_PTR()
> >
> > Fixes: 2b4b90e053a29 ("x86/hyperv: Use per cpu initial stack for vtl context")
> > Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
> > ---
> 
> Oops, I forgot the v2 tag in the subject; this is v2.

Applied. Thanks.
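
For context, the error-pointer convention behind this fix can be sketched in
userspace C. The demo_* helpers below are hypothetical re-creations of the
idea (an errno value encoded in the top 4095 addresses), not the kernel
headers, and demo_idle_thread_get() is an invented stand-in for the real
function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the last page of memory. */
static inline void *demo_err_ptr(intptr_t error)
{
	return (void *)error;
}

/* Recover the negative errno value from an error pointer. */
static inline intptr_t demo_ptr_err(const void *ptr)
{
	return (intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for encoded errors. */
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_task { unsigned long sp; };

static struct demo_task demo_idle = { .sp = 0x1000 };

/* Hypothetical lookup that fails for negative CPU numbers. */
static struct demo_task *demo_idle_thread_get(int cpu)
{
	if (cpu < 0)
		return demo_err_ptr(-ENOMEM);
	return &demo_idle;
}

/* Check the result before dereferencing it, as the patch does. */
static int demo_bringup(int cpu, unsigned long *rsp)
{
	struct demo_task *idle = demo_idle_thread_get(cpu);

	assert(rsp);
	if (demo_is_err(idle))
		return (int)demo_ptr_err(idle);

	*rsp = idle->sp;
	return 0;
}
```

An error pointer is non-NULL, so a plain NULL check would not catch it;
dereferencing it would touch an address near the top of the address space.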

^ permalink raw reply

* RE: [EXTERNAL] [PATCH 1/1] Drivers: hv: vmbus: Simplify allocation of vmbus_evt
From: Long Li @ 2026-02-18 21:52 UTC (permalink / raw)
  To: mhklinux@outlook.com, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Dexuan Cui, linux-hyperv@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <20260218170121.1522-1-mhklkml@zohomail.com>

> From: Michael Kelley <mhklinux@outlook.com>
> 
> The per-cpu variable vmbus_evt is currently dynamically allocated. It's only 8
> bytes, so just allocate it statically to simplify and save a few lines of code.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>

Reviewed-by: Long Li <longli@microsoft.com>

> ---
>  drivers/hv/vmbus_drv.c | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 97dfa529d250..2219ce41b384 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -51,7 +51,7 @@ static struct device  *vmbus_root_device;
> 
>  static int hyperv_cpuhp_online;
> 
> -static long __percpu *vmbus_evt;
> +static DEFINE_PER_CPU(long, vmbus_evt);
> 
>  /* Values parsed from ACPI DSDT */
>  int vmbus_irq;
> @@ -1475,13 +1475,11 @@ static int vmbus_bus_init(void)
>         if (vmbus_irq == -1) {
>                 hv_setup_vmbus_handler(vmbus_isr);
>         } else {
> -               vmbus_evt = alloc_percpu(long);
>                 ret = request_percpu_irq(vmbus_irq, vmbus_percpu_isr,
> -                               "Hyper-V VMbus", vmbus_evt);
> +                               "Hyper-V VMbus", &vmbus_evt);
>                 if (ret) {
>                         pr_err("Can't request Hyper-V VMbus IRQ %d, Err %d",
>                                         vmbus_irq, ret);
> -                       free_percpu(vmbus_evt);
>                         goto err_setup;
>                 }
>         }
> @@ -1510,12 +1508,10 @@ static int vmbus_bus_init(void)
>         return 0;
> 
>  err_connect:
> -       if (vmbus_irq == -1) {
> +       if (vmbus_irq == -1)
>                 hv_remove_vmbus_handler();
> -       } else {
> -               free_percpu_irq(vmbus_irq, vmbus_evt);
> -               free_percpu(vmbus_evt);
> -       }
> +       else
> +               free_percpu_irq(vmbus_irq, &vmbus_evt);
>  err_setup:
>         bus_unregister(&hv_bus);
>         return ret;
> @@ -2981,12 +2977,11 @@ static void __exit vmbus_exit(void)
>         vmbus_connection.conn_state = DISCONNECTED;
>         hv_stimer_global_cleanup();
>         vmbus_disconnect();
> -       if (vmbus_irq == -1) {
> +       if (vmbus_irq == -1)
>                 hv_remove_vmbus_handler();
> -       } else {
> -               free_percpu_irq(vmbus_irq, vmbus_evt);
> -               free_percpu(vmbus_evt);
> -       }
> +       else
> +               free_percpu_irq(vmbus_irq, &vmbus_evt);
> +
>         for_each_online_cpu(cpu) {
>                 struct hv_per_cpu_context *hv_cpu
>                         = per_cpu_ptr(hv_context.cpu_context, cpu);
> --
> 2.25.1


^ permalink raw reply

* Re: [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions
From: H. Peter Anvin @ 2026-02-18 20:37 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel, x86, linux-coco, kvm, linux-hyperv,
	virtualization, llvm
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	Kiryl Shutsemau, Rick Edgecombe, Sean Christopherson,
	Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Vitaly Kuznetsov, Boris Ostrovsky, xen-devel,
	Ajay Kaher, Alexey Makhalov, Broadcom internal kernel review list,
	Andy Lutomirski, Peter Zijlstra, Xin Li, Nathan Chancellor,
	Nick Desaulniers, Bill Wendling, Justin Stitt, Josh Poimboeuf,
	andy.cooper
In-Reply-To: <20260218082133.400602-1-jgross@suse.com>

On February 18, 2026 12:21:17 AM PST, Juergen Gross <jgross@suse.com> wrote:
>When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
>infrastructure will always use functions for reading or writing MSRs,
>even when running on bare metal.
>
>Switch to inline RDMSR/WRMSR instructions in this case, reducing the
>paravirt overhead.
>
>The first patch is a prerequisite fix for alternative patching. It
>is needed because the initial indirect call needs to be padded with
>NOPs in some cases once the following patches are applied.
>
>In order to make this less intrusive, some further reorganization of
>the MSR access helpers is done in the patches 1-6.
>
>The next 4 patches are converting the non-paravirt case to use direct
>inlining of the MSR access instructions, including the WRMSRNS
>instruction and the immediate variants of RDMSR and WRMSR if possible.
>
>Patches 11-13 are some further preparations for making the real switch
>to directly patch in the native MSR instructions easier.
>
>Patch 14 is switching the paravirt MSR function interface from normal
>call ABI to one more similar to the native MSR instructions.
>
>Patch 15 is a little cleanup patch.
>
>Patch 16 is the final step for patching in the native MSR instructions
>when not running as a Xen PV guest.
>
>This series has been tested to work with Xen PV and on bare metal.
>
>Note that there is more room for improvement. This series is sent out
>to give a first impression of how the code will basically look.

Does that mean you are considering this patchset an RFC? If so, you should put that in the subject header. 

>Right now the same problem is solved differently for the paravirt and
>the non-paravirt cases. In case this is not desired, there are two
>possibilities to merge the two implementations. Both solutions have
>the common idea to have rather similar code for paravirt and
>non-paravirt variants, but just use a different main macro for
>generating the respective code. For making the code of both possible
>scenarios more similar, the following variants are possible:
>
>1. Remove the micro-optimizations of the non-paravirt case, making
>   it similar to the paravirt code in my series. This has the
>   advantage of being simpler, but might have a very small
>   negative performance impact (probably not really detectable).
>
>2. Add the same micro-optimizations to the paravirt case, which
>   requires enhancing paravirt patching to support a to-be-patched
>   indirect call in the middle of the initial code snippet.
>
>In both cases the native MSR function variants would no longer be
>usable in the paravirt case, but this would mostly affect Xen, as it
>would need to open code the WRMSR/RDMSR instructions instead of
>using the native_*msr*() functions.
>
>Changes since V2:
>- switch back to the paravirt approach
>
>Changes since V1:
>- Use Xin Li's approach for inlining
>- Several new patches
>
>Juergen Gross (16):
>  x86/alternative: Support alt_replace_call() with instructions after
>    call
>  coco/tdx: Rename MSR access helpers
>  x86/sev: Replace call of native_wrmsr() with native_wrmsrq()
>  KVM: x86: Remove the KVM private read_msr() function
>  x86/msr: Minimize usage of native_*() msr access functions
>  x86/msr: Move MSR trace calls one function level up
>  x86/opcode: Add immediate form MSR instructions
>  x86/extable: Add support for immediate form MSR instructions
>  x86/msr: Use the alternatives mechanism for WRMSR
>  x86/msr: Use the alternatives mechanism for RDMSR
>  x86/alternatives: Add ALTERNATIVE_4()
>  x86/paravirt: Split off MSR related hooks into new header
>  x86/paravirt: Prepare support of MSR instruction interfaces
>  x86/paravirt: Switch MSR access pv_ops functions to instruction
>    interfaces
>  x86/msr: Reduce number of low level MSR access helpers
>  x86/paravirt: Use alternatives for MSR access with paravirt
>
> arch/x86/coco/sev/internal.h              |   7 +-
> arch/x86/coco/tdx/tdx.c                   |   8 +-
> arch/x86/hyperv/ivm.c                     |   2 +-
> arch/x86/include/asm/alternative.h        |   6 +
> arch/x86/include/asm/fred.h               |   2 +-
> arch/x86/include/asm/kvm_host.h           |  10 -
> arch/x86/include/asm/msr.h                | 345 ++++++++++++++++------
> arch/x86/include/asm/paravirt-msr.h       | 148 ++++++++++
> arch/x86/include/asm/paravirt.h           |  67 -----
> arch/x86/include/asm/paravirt_types.h     |  57 ++--
> arch/x86/include/asm/qspinlock_paravirt.h |   4 +-
> arch/x86/kernel/alternative.c             |   5 +-
> arch/x86/kernel/cpu/mshyperv.c            |   7 +-
> arch/x86/kernel/kvmclock.c                |   2 +-
> arch/x86/kernel/paravirt.c                |  42 ++-
> arch/x86/kvm/svm/svm.c                    |  16 +-
> arch/x86/kvm/vmx/tdx.c                    |   2 +-
> arch/x86/kvm/vmx/vmx.c                    |   8 +-
> arch/x86/lib/x86-opcode-map.txt           |   5 +-
> arch/x86/mm/extable.c                     |  35 ++-
> arch/x86/xen/enlighten_pv.c               |  52 +++-
> arch/x86/xen/pmu.c                        |   4 +-
> tools/arch/x86/lib/x86-opcode-map.txt     |   5 +-
> tools/objtool/check.c                     |   1 +
> 24 files changed, 576 insertions(+), 264 deletions(-)
> create mode 100644 arch/x86/include/asm/paravirt-msr.h
>

Could you clarify *on the high design level* what "go back to the paravirt approach" means, and the motivation for that?

Note that for Xen *most* MSRs fall in one of two categories: those that are dropped entirely and those that are just passed straight on to the hardware.

I don't know if anyone cares about optimizing PV Xen anymore, but at least in theory Xen can un-paravirtualize most sites.

^ permalink raw reply

* [PATCH v5] mshv: Add support for integrated scheduler
From: Stanislav Kinsburskii @ 2026-02-18 19:11 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

Query the hypervisor for integrated scheduler support and use it if
configured.

Microsoft Hypervisor originally provided two schedulers: root and core. The
root scheduler allows the root partition to schedule guest vCPUs across
physical cores, supporting both time slicing and CPU affinity (e.g., via
cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core
scheduling entirely to the hypervisor.

Direct virtualization introduces a new privileged guest partition type, L1
Virtual Host (L1VH), which can create child partitions from its own
resources. These child partitions are effectively siblings, scheduled by
the hypervisor's core scheduler. This prevents the L1VH parent from setting
affinity or time slicing for its own processes or guest VPs. While cgroups,
CFS, and cpuset controllers can still be used, their effectiveness is
unpredictable, as the core scheduler swaps vCPUs according to its own logic
(typically round-robin across all allocated physical CPUs). As a result,
the system may appear to "steal" time from the L1VH and its children.

To address this, Microsoft Hypervisor introduces the integrated scheduler.
This allows an L1VH partition to schedule its own vCPUs and those of its
guests across its "physical" cores, effectively emulating root scheduler
behavior within the L1VH, while retaining core scheduler behavior for the
rest of the system.

The integrated scheduler is controlled by the root partition and gated by
the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor
supports the integrated scheduler. The L1VH partition must then check if it
is enabled by querying the corresponding extended partition property. If
this property is true, the L1VH partition must use the root scheduler
logic; otherwise, it must use the core scheduler. An L1VH partition must
therefore be able to read the VMM capabilities as well.

Signed-off-by: Andreea Pintilie <anpintil@microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/hv/mshv_root_main.c |   82 ++++++++++++++++++++++++++-----------------
 include/hyperv/hvhdk_mini.h |    7 +++-
 2 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 431aebf95bc7..c6ec88884728 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2079,6 +2079,29 @@ static const char *scheduler_type_to_string(enum hv_scheduler_type type)
 	};
 }
 
+static int __init l1vh_retrieve_scheduler_type(enum hv_scheduler_type *out)
+{
+	u64 integrated_sched_enabled;
+	int ret;
+
+	*out = HV_SCHEDULER_TYPE_CORE_SMT;
+
+	if (!mshv_root.vmm_caps.vmm_enable_integrated_scheduler)
+		return 0;
+
+	ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
+						HV_PARTITION_PROPERTY_INTEGRATED_SCHEDULER_ENABLED,
+						0, &integrated_sched_enabled,
+						sizeof(integrated_sched_enabled));
+	if (ret)
+		return ret;
+
+	if (integrated_sched_enabled)
+		*out = HV_SCHEDULER_TYPE_ROOT;
+
+	return 0;
+}
+
 /* TODO move this to hv_common.c when needed outside */
 static int __init hv_retrieve_scheduler_type(enum hv_scheduler_type *out)
 {
@@ -2111,13 +2134,12 @@ static int __init hv_retrieve_scheduler_type(enum hv_scheduler_type *out)
 /* Retrieve and stash the supported scheduler type */
 static int __init mshv_retrieve_scheduler_type(struct device *dev)
 {
-	int ret = 0;
+	int ret;
 
 	if (hv_l1vh_partition())
-		hv_scheduler_type = HV_SCHEDULER_TYPE_CORE_SMT;
+		ret = l1vh_retrieve_scheduler_type(&hv_scheduler_type);
 	else
 		ret = hv_retrieve_scheduler_type(&hv_scheduler_type);
-
 	if (ret)
 		return ret;
 
@@ -2237,42 +2259,29 @@ struct notifier_block mshv_reboot_nb = {
 static void mshv_root_partition_exit(void)
 {
 	unregister_reboot_notifier(&mshv_reboot_nb);
-	root_scheduler_deinit();
 }
 
 static int __init mshv_root_partition_init(struct device *dev)
 {
-	int err;
-
-	err = root_scheduler_init(dev);
-	if (err)
-		return err;
-
-	err = register_reboot_notifier(&mshv_reboot_nb);
-	if (err)
-		goto root_sched_deinit;
-
-	return 0;
-
-root_sched_deinit:
-	root_scheduler_deinit();
-	return err;
+	return register_reboot_notifier(&mshv_reboot_nb);
 }
 
-static void mshv_init_vmm_caps(struct device *dev)
+static int __init mshv_init_vmm_caps(struct device *dev)
 {
-	/*
-	 * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
-	 * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
-	 * case it's valid to proceed as if all vmm_caps are disabled (zero).
-	 */
-	if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
-					      HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
-					      0, &mshv_root.vmm_caps,
-					      sizeof(mshv_root.vmm_caps)))
-		dev_warn(dev, "Unable to get VMM capabilities\n");
+	int ret;
+
+	ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
+						HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
+						0, &mshv_root.vmm_caps,
+						sizeof(mshv_root.vmm_caps));
+	if (ret && hv_l1vh_partition()) {
+		dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
+		return ret;
+	}
 
 	dev_dbg(dev, "vmm_caps = %#llx\n", mshv_root.vmm_caps.as_uint64[0]);
+
+	return 0;
 }
 
 static int __init mshv_parent_partition_init(void)
@@ -2318,6 +2327,10 @@ static int __init mshv_parent_partition_init(void)
 
 	mshv_cpuhp_online = ret;
 
+	ret = mshv_init_vmm_caps(dev);
+	if (ret)
+		goto remove_cpu_state;
+
 	ret = mshv_retrieve_scheduler_type(dev);
 	if (ret)
 		goto remove_cpu_state;
@@ -2327,11 +2340,13 @@ static int __init mshv_parent_partition_init(void)
 	if (ret)
 		goto remove_cpu_state;
 
-	mshv_init_vmm_caps(dev);
+	ret = root_scheduler_init(dev);
+	if (ret)
+		goto exit_partition;
 
 	ret = mshv_debugfs_init();
 	if (ret)
-		goto exit_partition;
+		goto deinit_root_scheduler;
 
 	ret = mshv_irqfd_wq_init();
 	if (ret)
@@ -2346,6 +2361,8 @@ static int __init mshv_parent_partition_init(void)
 
 exit_debugfs:
 	mshv_debugfs_exit();
+deinit_root_scheduler:
+	root_scheduler_deinit();
 exit_partition:
 	if (hv_root_partition())
 		mshv_root_partition_exit();
@@ -2365,6 +2382,7 @@ static void __exit mshv_parent_partition_exit(void)
 	mshv_debugfs_exit();
 	misc_deregister(&mshv_dev);
 	mshv_irqfd_wq_cleanup();
+	root_scheduler_deinit();
 	if (hv_root_partition())
 		mshv_root_partition_exit();
 	cpuhp_remove_state(mshv_cpuhp_online);
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index 41a29bf8ec14..c0300910808b 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -87,6 +87,9 @@ enum hv_partition_property_code {
 	HV_PARTITION_PROPERTY_PRIVILEGE_FLAGS			= 0x00010000,
 	HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES		= 0x00010001,
 
+	/* Integrated scheduling properties */
+	HV_PARTITION_PROPERTY_INTEGRATED_SCHEDULER_ENABLED	= 0x00020005,
+
 	/* Resource properties */
 	HV_PARTITION_PROPERTY_GPA_PAGE_ACCESS_TRACKING		= 0x00050005,
 	HV_PARTITION_PROPERTY_UNIMPLEMENTED_MSR_ACTION		= 0x00050017,
@@ -102,7 +105,7 @@ enum hv_partition_property_code {
 };
 
 #define HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT		1
-#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT	59
+#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT	57
 
 struct hv_partition_property_vmm_capabilities {
 	u16 bank_count;
@@ -119,6 +122,8 @@ struct hv_partition_property_vmm_capabilities {
 			u64 reservedbit3: 1;
 #endif
 			u64 assignable_synthetic_proc_features: 1;
+			u64 reservedbit5: 1;
+			u64 vmm_enable_integrated_scheduler : 1;
 			u64 reserved0: HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT;
 		} __packed;
 	};



^ permalink raw reply related

* Re: [PATCH] x86/hyperv: Fix error pointer dereference
From: Ethan Tidmore @ 2026-02-18 19:11 UTC (permalink / raw)
  To: Ethan Tidmore, K . Y . Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Saurabh Sengar
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Michael Kelley, x86, linux-hyperv, linux-kernel
In-Reply-To: <20260218190903.7874-1-ethantidmore06@gmail.com>

On Wed Feb 18, 2026 at 1:09 PM CST, Ethan Tidmore wrote:
> The function idle_thread_get() can return an error pointer, but its
> return value is not checked. Add the missing error pointer check.
>
> Detected by Smatch:
> arch/x86/hyperv/hv_vtl.c:126 hv_vtl_bringup_vcpu() error:
> 'idle' dereferencing possible ERR_PTR()
>
> Fixes: 2b4b90e053a29 ("x86/hyperv: Use per cpu initial stack for vtl context")
> Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
> ---

Oops, I forgot the v2 tag in the subject; this is v2.

Thanks,

ET

^ permalink raw reply

* [PATCH] x86/hyperv: Fix error pointer dereference
From: Ethan Tidmore @ 2026-02-18 19:09 UTC (permalink / raw)
  To: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Saurabh Sengar
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Michael Kelley, x86, linux-hyperv, linux-kernel,
	Ethan Tidmore

The function idle_thread_get() can return an error pointer, but its
return value is not checked. Add the missing error pointer check.

Detected by Smatch:
arch/x86/hyperv/hv_vtl.c:126 hv_vtl_bringup_vcpu() error:
'idle' dereferencing possible ERR_PTR()

Fixes: 2b4b90e053a29 ("x86/hyperv: Use per cpu initial stack for vtl context")
Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com>
---
v2:
- Fixed typo.

 arch/x86/hyperv/hv_vtl.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
index c0edaed0efb3..9b6a9bc4ab76 100644
--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -110,7 +110,7 @@ static void hv_vtl_ap_entry(void)
 
 static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
 {
-	u64 status;
+	u64 status, rsp, rip;
 	int ret = 0;
 	struct hv_enable_vp_vtl *input;
 	unsigned long irq_flags;
@@ -123,9 +123,11 @@ static int hv_vtl_bringup_vcpu(u32 target_vp_index, int cpu, u64 eip_ignored)
 	struct desc_struct *gdt;
 
 	struct task_struct *idle = idle_thread_get(cpu);
-	u64 rsp = (unsigned long)idle->thread.sp;
+	if (IS_ERR(idle))
+		return PTR_ERR(idle);
 
-	u64 rip = (u64)&hv_vtl_ap_entry;
+	rsp = (unsigned long)idle->thread.sp;
+	rip = (u64)&hv_vtl_ap_entry;
 
 	native_store_gdt(&gdt_ptr);
 	store_idt(&idt_ptr);
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH 1/4] mshv: Add nested virtualization creation flag
From: Easwar Hariharan @ 2026-02-18 17:09 UTC (permalink / raw)
  To: Anatol Belski; +Cc: linux-hyperv, easwar.hariharan, wei.liu, muislam
In-Reply-To: <20260218144802.1962513-1-anbelski@linux.microsoft.com>

On 2/18/2026 6:47 AM, Anatol Belski wrote:
> From: Muminul Islam <muislam@microsoft.com>
> 
> Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to
> indicate support for nested virtualization during partition creation.
> 
> This enables clearer configuration and capability checks for nested
> virtualization scenarios.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Signed-off-by: Muminul Islam <muislam@microsoft.com>
> ---
>  include/hyperv/hvhdk.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
> index 08965970c17d..03afb7d0412b 100644
> --- a/include/hyperv/hvhdk.h
> +++ b/include/hyperv/hvhdk.h
> @@ -328,6 +328,7 @@ union hv_partition_isolation_properties {
>  #define HV_PARTITION_ISOLATION_HOST_TYPE_RESERVED   0x2
>  
>  /* Note: Exo partition is enabled by default */
> +#define HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE	BIT(1)
>  #define HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED		BIT(4)
>  #define HV_PARTITION_CREATION_FLAG_EXO_PARTITION			BIT(8)
>  #define HV_PARTITION_CREATION_FLAG_LAPIC_ENABLED			BIT(13)

Patches 1, 2, and 3 can all be squashed into 1 patch.

Thanks,
Easwar (he/him)

^ permalink raw reply

* [PATCH 1/1] Drivers: hv: vmbus: Simplify allocation of vmbus_evt
From: Michael Kelley @ 2026-02-18 17:01 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, linux-hyperv; +Cc: linux-kernel

From: Michael Kelley <mhklinux@outlook.com>

The per-cpu variable vmbus_evt is currently dynamically allocated. It's
only 8 bytes, so just allocate it statically to simplify and save a few
lines of code.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
 drivers/hv/vmbus_drv.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 97dfa529d250..2219ce41b384 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -51,7 +51,7 @@ static struct device  *vmbus_root_device;
 
 static int hyperv_cpuhp_online;
 
-static long __percpu *vmbus_evt;
+static DEFINE_PER_CPU(long, vmbus_evt);
 
 /* Values parsed from ACPI DSDT */
 int vmbus_irq;
@@ -1475,13 +1475,11 @@ static int vmbus_bus_init(void)
 	if (vmbus_irq == -1) {
 		hv_setup_vmbus_handler(vmbus_isr);
 	} else {
-		vmbus_evt = alloc_percpu(long);
 		ret = request_percpu_irq(vmbus_irq, vmbus_percpu_isr,
-				"Hyper-V VMbus", vmbus_evt);
+				"Hyper-V VMbus", &vmbus_evt);
 		if (ret) {
 			pr_err("Can't request Hyper-V VMbus IRQ %d, Err %d",
 					vmbus_irq, ret);
-			free_percpu(vmbus_evt);
 			goto err_setup;
 		}
 	}
@@ -1510,12 +1508,10 @@ static int vmbus_bus_init(void)
 	return 0;
 
 err_connect:
-	if (vmbus_irq == -1) {
+	if (vmbus_irq == -1)
 		hv_remove_vmbus_handler();
-	} else {
-		free_percpu_irq(vmbus_irq, vmbus_evt);
-		free_percpu(vmbus_evt);
-	}
+	else
+		free_percpu_irq(vmbus_irq, &vmbus_evt);
 err_setup:
 	bus_unregister(&hv_bus);
 	return ret;
@@ -2981,12 +2977,11 @@ static void __exit vmbus_exit(void)
 	vmbus_connection.conn_state = DISCONNECTED;
 	hv_stimer_global_cleanup();
 	vmbus_disconnect();
-	if (vmbus_irq == -1) {
+	if (vmbus_irq == -1)
 		hv_remove_vmbus_handler();
-	} else {
-		free_percpu_irq(vmbus_irq, vmbus_evt);
-		free_percpu(vmbus_evt);
-	}
+	else
+		free_percpu_irq(vmbus_irq, &vmbus_evt);
+
 	for_each_online_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
-- 
2.25.1


^ permalink raw reply related

* [PATCH 4/4] mshv: Add SMT_ENABLED_GUEST partition creation flag
From: Anatol Belski @ 2026-02-18 14:48 UTC (permalink / raw)
  To: linux-hyperv; +Cc: wei.liu, muislam
In-Reply-To: <20260218144802.1962513-1-anbelski@linux.microsoft.com>

Add support for HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST
to allow userspace VMMs to enable SMT for guest partitions.

Expose this via new MSHV_PT_BIT_SMT_ENABLED_GUEST flag in the UAPI.

Without this flag, the hypervisor schedules guest VPs incorrectly,
making SMT unusable.

Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c | 2 ++
 include/hyperv/hvhdk.h      | 1 +
 include/uapi/linux/mshv.h   | 1 +
 3 files changed, 4 insertions(+)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index fb3ade44e1f1..899e055d975f 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1923,6 +1923,8 @@ static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 *pt_flags,
 		*pt_flags |= HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED;
 	if (args.pt_flags & BIT(MSHV_PT_BIT_NESTED_VIRTUALIZATION))
 		*pt_flags |= HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE;
+	if (args.pt_flags & BIT(MSHV_PT_BIT_SMT_ENABLED_GUEST))
+		*pt_flags |= HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST;
 
 	isol_props->as_uint64 = 0;
 
diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
index 03afb7d0412b..331cebc471e1 100644
--- a/include/hyperv/hvhdk.h
+++ b/include/hyperv/hvhdk.h
@@ -328,6 +328,7 @@ union hv_partition_isolation_properties {
 #define HV_PARTITION_ISOLATION_HOST_TYPE_RESERVED   0x2
 
 /* Note: Exo partition is enabled by default */
+#define HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST			BIT(0)
 #define HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE	BIT(1)
 #define HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED		BIT(4)
 #define HV_PARTITION_CREATION_FLAG_EXO_PARTITION			BIT(8)
diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h
index 7ef5dd67a232..e0645a34b55b 100644
--- a/include/uapi/linux/mshv.h
+++ b/include/uapi/linux/mshv.h
@@ -28,6 +28,7 @@ enum {
 	MSHV_PT_BIT_GPA_SUPER_PAGES,
 	MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES,
 	MSHV_PT_BIT_NESTED_VIRTUALIZATION,
+	MSHV_PT_BIT_SMT_ENABLED_GUEST,
 	MSHV_PT_BIT_COUNT,
 };
 
-- 
2.34.1
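
The translation from UAPI bit indices to hypervisor creation flags that this
series extends can be sketched in userspace C roughly as follows. All DEMO_*
names are hypothetical, the enum only models the three bits whose hypervisor
flag values are visible in this series, and rejecting unknown bits is an
assumption of the sketch rather than something shown in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_BIT_ULL(n)	(1ULL << (n))

/* UAPI side: consecutive bit indices (positions here are illustrative). */
enum demo_pt_bit {
	DEMO_PT_BIT_GPA_SUPER_PAGES,
	DEMO_PT_BIT_NESTED_VIRTUALIZATION,
	DEMO_PT_BIT_SMT_ENABLED_GUEST,
	DEMO_PT_BIT_COUNT,
};

/* Hypervisor side: flag positions as they appear in the hvhdk.h hunks. */
#define DEMO_HV_FLAG_SMT_ENABLED_GUEST			DEMO_BIT_ULL(0)
#define DEMO_HV_FLAG_NESTED_VIRTUALIZATION_CAPABLE	DEMO_BIT_ULL(1)
#define DEMO_HV_FLAG_GPA_SUPER_PAGES_ENABLED		DEMO_BIT_ULL(4)

/*
 * Translate user-supplied bit indices into hypervisor creation flags.
 * Unknown bits are rejected with -EINVAL (an assumption of this sketch).
 */
static int demo_process_pt_flags(uint64_t user_flags, uint64_t *hv_flags)
{
	assert(hv_flags);

	if (user_flags & ~(DEMO_BIT_ULL(DEMO_PT_BIT_COUNT) - 1))
		return -EINVAL;

	*hv_flags = 0;
	if (user_flags & DEMO_BIT_ULL(DEMO_PT_BIT_GPA_SUPER_PAGES))
		*hv_flags |= DEMO_HV_FLAG_GPA_SUPER_PAGES_ENABLED;
	if (user_flags & DEMO_BIT_ULL(DEMO_PT_BIT_NESTED_VIRTUALIZATION))
		*hv_flags |= DEMO_HV_FLAG_NESTED_VIRTUALIZATION_CAPABLE;
	if (user_flags & DEMO_BIT_ULL(DEMO_PT_BIT_SMT_ENABLED_GUEST))
		*hv_flags |= DEMO_HV_FLAG_SMT_ENABLED_GUEST;

	return 0;
}
```

The point of the indirection is that the UAPI enum can grow by appending
entries while the hypervisor flag positions stay fixed by the hypervisor ABI.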


^ permalink raw reply related

* [PATCH 3/4] drivers: hv: enable nested virtualization
From: Anatol Belski @ 2026-02-18 14:48 UTC (permalink / raw)
  To: linux-hyperv; +Cc: wei.liu, muislam
In-Reply-To: <20260218144802.1962513-1-anbelski@linux.microsoft.com>

From: Muminul Islam <muislam@microsoft.com>

Based on the bits provided by the VMM, enable nested
virtualization in the partition creation flags.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
---
 drivers/hv/mshv_root_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 681b58154d5e..fb3ade44e1f1 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1921,6 +1921,8 @@ static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 *pt_flags,
 		*pt_flags |= HV_PARTITION_CREATION_FLAG_X2APIC_CAPABLE;
 	if (args.pt_flags & BIT_ULL(MSHV_PT_BIT_GPA_SUPER_PAGES))
 		*pt_flags |= HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED;
+	if (args.pt_flags & BIT(MSHV_PT_BIT_NESTED_VIRTUALIZATION))
+		*pt_flags |= HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE;
 
 	isol_props->as_uint64 = 0;
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH 2/4] hyperv: uapi: Add bit for nested virtualization
From: Anatol Belski @ 2026-02-18 14:48 UTC (permalink / raw)
  To: linux-hyperv; +Cc: wei.liu, muislam
In-Reply-To: <20260218144802.1962513-1-anbelski@linux.microsoft.com>

From: Muminul Islam <muislam@microsoft.com>

Add a new bit for the nested virtualization creation flag.
It is exposed through the user-space API so that it can be enabled
during partition creation.

Signed-off-by: Muminul Islam <muislam@microsoft.com>
---
 include/uapi/linux/mshv.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h
index dee3ece28ce5..7ef5dd67a232 100644
--- a/include/uapi/linux/mshv.h
+++ b/include/uapi/linux/mshv.h
@@ -27,6 +27,7 @@ enum {
 	MSHV_PT_BIT_X2APIC,
 	MSHV_PT_BIT_GPA_SUPER_PAGES,
 	MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES,
+	MSHV_PT_BIT_NESTED_VIRTUALIZATION,
 	MSHV_PT_BIT_COUNT,
 };
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH 1/4] mshv: Add nested virtualization creation flag
From: Anatol Belski @ 2026-02-18 14:47 UTC (permalink / raw)
  To: linux-hyperv; +Cc: wei.liu, muislam

From: Muminul Islam <muislam@microsoft.com>

Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to
indicate support for nested virtualization during partition creation.

This enables clearer configuration and capability checks for nested
virtualization scenarios.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
---
 include/hyperv/hvhdk.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hyperv/hvhdk.h b/include/hyperv/hvhdk.h
index 08965970c17d..03afb7d0412b 100644
--- a/include/hyperv/hvhdk.h
+++ b/include/hyperv/hvhdk.h
@@ -328,6 +328,7 @@ union hv_partition_isolation_properties {
 #define HV_PARTITION_ISOLATION_HOST_TYPE_RESERVED   0x2
 
 /* Note: Exo partition is enabled by default */
+#define HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE	BIT(1)
 #define HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED		BIT(4)
 #define HV_PARTITION_CREATION_FLAG_EXO_PARTITION			BIT(8)
 #define HV_PARTITION_CREATION_FLAG_LAPIC_ENABLED			BIT(13)
-- 
2.34.1


^ permalink raw reply related

* [PATCH] mshv: expose hv_call_scrub_partition
From: Magnus Kulke @ 2026-02-18 14:19 UTC (permalink / raw)
  To: wei.liu, haiyangz, kys, decui, linux-hyperv
  Cc: skinsburskii, magnuskulke, linux-kernel, Magnus Kulke

This hv call needs to be exposed for VMMs to be able to soft-reboot
guests. It will reset APIC and state of para-virtualized devices like
SynIC.

Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com>
---
 drivers/hv/mshv_root_main.c | 1 +
 include/hyperv/hvgdk_mini.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index cb2729f99e2c5..7c13d5f36437c 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -143,6 +143,7 @@ static u16 mshv_passthru_hvcalls[] = {
 	HVCALL_READ_GPA,
 	HVCALL_WRITE_GPA,
 	HVCALL_CLEAR_VIRTUAL_INTERRUPT,
+	HVCALL_SCRUB_PARTITION,
 	HVCALL_REGISTER_INTERCEPT_RESULT,
 	HVCALL_ASSERT_VIRTUAL_INTERRUPT,
 	HVCALL_GET_GPA_PAGES_ACCESS_STATES,
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index f98eb41342d40..9120fcf0161a4 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -501,6 +501,7 @@ union hv_vp_assist_msr_contents {	 /* HV_REGISTER_VP_ASSIST_PAGE */
 #define HVCALL_ENTER_SLEEP_STATE			0x0084
 #define HVCALL_NOTIFY_PARTITION_EVENT			0x0087
 #define HVCALL_NOTIFY_PORT_RING_EMPTY			0x008b
+#define HVCALL_SCRUB_PARTITION				0x008d
 #define HVCALL_REGISTER_INTERCEPT_RESULT		0x0091
 #define HVCALL_ASSERT_VIRTUAL_INTERRUPT			0x0094
 #define HVCALL_CREATE_PORT				0x0095
-- 
2.34.1


^ permalink raw reply related
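
For readers unfamiliar with the passthrough table edited above, here
is a rough userspace sketch of an allowlist of hypercall codes. It is
an assumption that the table is consulted with a simple linear scan;
the demo_* names are hypothetical. The three numeric codes are copied
from the hvgdk_mini.h hunk, and 0x0084 is used only as an example of a
code that is absent from this demo table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Codes as they appear in the hvgdk_mini.h hunk. */
#define DEMO_HVCALL_SCRUB_PARTITION		0x008d
#define DEMO_HVCALL_REGISTER_INTERCEPT_RESULT	0x0091
#define DEMO_HVCALL_ASSERT_VIRTUAL_INTERRUPT	0x0094

/* Demo allowlist; a small excerpt, not the driver's full table. */
static const uint16_t demo_passthru_hvcalls[] = {
	DEMO_HVCALL_SCRUB_PARTITION,
	DEMO_HVCALL_REGISTER_INTERCEPT_RESULT,
	DEMO_HVCALL_ASSERT_VIRTUAL_INTERRUPT,
};

/* Return true if the code is on the allowlist (linear scan, assumed). */
static bool demo_hvcall_is_passthru(uint16_t code)
{
	size_t i;
	size_t n = sizeof(demo_passthru_hvcalls) /
		   sizeof(demo_passthru_hvcalls[0]);

	for (i = 0; i < n; i++)
		if (demo_passthru_hvcalls[i] == code)
			return true;
	return false;
}

/* Usage: a listed code passes, an unlisted one is refused. */
static void demo_selfcheck(void)
{
	assert(demo_hvcall_is_passthru(DEMO_HVCALL_SCRUB_PARTITION));
	assert(!demo_hvcall_is_passthru(0x0084));
}
```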

* Re: [PATCH v2] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
From: kernel test robot @ 2026-02-18 12:35 UTC (permalink / raw)
  To: Mukesh R, linux-hyperv, linux-kernel
  Cc: oe-kbuild-all, kys, haiyangz, wei.liu, decui, longli, tglx, mingo,
	bp, dave.hansen, x86, hpa
In-Reply-To: <20260217231158.1184736-1-mrathor@linux.microsoft.com>

Hi Mukesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on linus/master v6.19 next-20260217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mukesh-R/x86-hyperv-Reserve-3-interrupt-vectors-used-exclusively-by-mshv/20260218-071406
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20260217231158.1184736-1-mrathor%40linux.microsoft.com
patch subject: [PATCH v2] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260218/202602182000.O5dSFVVd-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260218/202602182000.O5dSFVVd-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602182000.O5dSFVVd-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/cpu/mshyperv.c:485:13: warning: 'hv_reserve_irq_vectors' defined but not used [-Wunused-function]
     485 | static void hv_reserve_irq_vectors(void)
         |             ^~~~~~~~~~~~~~~~~~~~~~


vim +/hv_reserve_irq_vectors +485 arch/x86/kernel/cpu/mshyperv.c

   480	
   481	/*
   482	 * Reserve vectors hard coded in the hypervisor. If used outside, the hypervisor
   483	 * will either crash or hang or attempt to break into debugger.
   484	 */
 > 485	static void hv_reserve_irq_vectors(void)
   486	{
   487		#define HYPERV_DBG_FASTFAIL_VECTOR	0x29
   488		#define HYPERV_DBG_ASSERT_VECTOR	0x2C
   489		#define HYPERV_DBG_SERVICE_VECTOR	0x2D
   490	
   491		if (cpu_feature_enabled(X86_FEATURE_FRED))
   492			return;
   493	
   494		if (test_and_set_bit(HYPERV_DBG_ASSERT_VECTOR, system_vectors) ||
   495		    test_and_set_bit(HYPERV_DBG_SERVICE_VECTOR, system_vectors) ||
   496		    test_and_set_bit(HYPERV_DBG_FASTFAIL_VECTOR, system_vectors))
   497			BUG();
   498	
   499		pr_info("Hyper-V:reserve vectors: %d %d %d\n", HYPERV_DBG_ASSERT_VECTOR,
   500			HYPERV_DBG_SERVICE_VECTOR, HYPERV_DBG_FASTFAIL_VECTOR);
   501	}
   502	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* [PATCH RESEND] mshv: Use try_cmpxchg() instead of cmpxchg()
From: Uros Bizjak @ 2026-02-18 11:00 UTC (permalink / raw)
  To: linux-hyperv, linux-kernel
  Cc: Uros Bizjak, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li
In-Reply-To: <20260218102604.178561-1-ubizjak@gmail.com>

Use !try_cmpxchg() instead of cmpxchg(*ptr, old, new) != old.
The x86 CMPXCHG instruction returns success in the ZF flag, so this
change saves a compare after CMPXCHG.

The generated assembly code improves from e.g.:

     415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
     41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
     41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
     426:	00 00
     428:	48 3b 44 24 30       	cmp    0x30(%rsp),%rax
     42d:	0f 84 09 ff ff ff    	je     33c <...>

to:

     415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
     41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
     41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
     426:	00 00
     428:	0f 84 0e ff ff ff    	je     33c <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
---
 drivers/hv/hyperv_vmbus.h | 4 ++--
 drivers/hv/mshv_eventfd.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index cdbc5f5c3215..7bd8f8486e85 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -370,8 +370,8 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 	 * CHANNELMSG_UNLOAD_RESPONSE and we don't care about other messages
 	 * on crash.
 	 */
-	if (cmpxchg(&msg->header.message_type, old_msg_type,
-		    HVMSG_NONE) != old_msg_type)
+	if (!try_cmpxchg(&msg->header.message_type,
+			 &old_msg_type, HVMSG_NONE))
 		return;
 
 	/*
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 0b75ff1edb73..525e002758e4 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -128,8 +128,8 @@ static int mshv_vp_irq_try_set_vector(struct mshv_vp *vp, u32 vector)
 
 	new_iv.vector[new_iv.vector_count++] = vector;
 
-	if (cmpxchg(&vp->vp_register_page->interrupt_vectors.as_uint64,
-		    iv.as_uint64, new_iv.as_uint64) != iv.as_uint64)
+	if (!try_cmpxchg(&vp->vp_register_page->interrupt_vectors.as_uint64,
+			 &iv.as_uint64, new_iv.as_uint64))
 		return -EAGAIN;
 
 	return 0;
-- 
2.53.0


^ permalink raw reply related
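
The difference between the two calling patterns in the patch above can
be modelled in userspace with C11 atomics. This is a sketch, not the
kernel implementation: demo_cmpxchg() and demo_try_cmpxchg() are
hypothetical stand-ins built on atomic_compare_exchange_strong(), which
has the same "return success, write back the observed value on
failure" shape as try_cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Old pattern: return the value that was found in *ptr.
 * The caller has to compare it against 'old' to learn the outcome.
 */
static uint64_t demo_cmpxchg(_Atomic uint64_t *ptr, uint64_t old,
			     uint64_t new_val)
{
	/* On failure 'old' is overwritten with the observed value. */
	atomic_compare_exchange_strong(ptr, &old, new_val);
	return old;
}

/*
 * New pattern: return success directly.
 * On failure *old is updated to the value currently in *ptr.
 */
static bool demo_try_cmpxchg(_Atomic uint64_t *ptr, uint64_t *old,
			     uint64_t new_val)
{
	return atomic_compare_exchange_strong(ptr, old, new_val);
}

/* Usage: both patterns detect the same failure and the same success. */
static void demo_selfcheck(void)
{
	_Atomic uint64_t word = 5;
	uint64_t expected = 7;

	/* Old style: mismatch shows up as return value != expected. */
	assert(demo_cmpxchg(&word, 7, 9) != 7);

	/* New style: mismatch shows up as 'false', expected is refreshed. */
	assert(!demo_try_cmpxchg(&word, &expected, 9));
	assert(expected == 5);

	/* Retry with the refreshed value now succeeds. */
	assert(demo_try_cmpxchg(&word, &expected, 9));
	assert(atomic_load(&word) == 9);
}
```

One behavioural detail worth keeping in mind when reading the diff:
the try_ variant writes to the caller's "old" variable on failure. In
both call sites shown above the function returns right after a failed
exchange, so that write-back does not appear to be observed.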

* [PATCH] mshv: Use try_cmpxchg() instead of cmpxchg()
From: Uros Bizjak @ 2026-02-18 10:25 UTC (permalink / raw)
  To: linux-hyperv, linux-kernel
  Cc: Uros Bizjak, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li

Use !try_cmpxchg() instead of cmpxchg(*ptr, old, new) != old.
The x86 CMPXCHG instruction returns success in the ZF flag, so this
change saves a compare after cmpxchg().

The generated assembly code improves from e.g.:

     415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
     41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
     41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
     426:	00 00
     428:	48 3b 44 24 30       	cmp    0x30(%rsp),%rax
     42d:	0f 84 09 ff ff ff    	je     33c <...>

to:

     415:	48 8b 44 24 30       	mov    0x30(%rsp),%rax
     41a:	48 8b 54 24 38       	mov    0x38(%rsp),%rdx
     41f:	f0 49 0f b1 91 a8 02 	lock cmpxchg %rdx,0x2a8(%r9)
     426:	00 00
     428:	0f 84 0e ff ff ff    	je     33c <...>

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Long Li <longli@microsoft.com>
---
 drivers/hv/hyperv_vmbus.h | 4 ++--
 drivers/hv/mshv_eventfd.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index cdbc5f5c3215..7bd8f8486e85 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -370,8 +370,8 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type)
 	 * CHANNELMSG_UNLOAD_RESPONSE and we don't care about other messages
 	 * on crash.
 	 */
-	if (cmpxchg(&msg->header.message_type, old_msg_type,
-		    HVMSG_NONE) != old_msg_type)
+	if (!try_cmpxchg(&msg->header.message_type,
+			 &old_msg_type, HVMSG_NONE))
 		return;
 
 	/*
diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
index 0b75ff1edb73..525e002758e4 100644
--- a/drivers/hv/mshv_eventfd.c
+++ b/drivers/hv/mshv_eventfd.c
@@ -128,8 +128,8 @@ static int mshv_vp_irq_try_set_vector(struct mshv_vp *vp, u32 vector)
 
 	new_iv.vector[new_iv.vector_count++] = vector;
 
-	if (cmpxchg(&vp->vp_register_page->interrupt_vectors.as_uint64,
-		    iv.as_uint64, new_iv.as_uint64) != iv.as_uint64)
+	if (!try_cmpxchg(&vp->vp_register_page->interrupt_vectors.as_uint64,
+			 &iv.as_uint64, new_iv.as_uint64))
 		return -EAGAIN;
 
 	return 0;
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 05/16] x86/msr: Minimize usage of native_*() msr access functions
From: Juergen Gross @ 2026-02-18  8:21 UTC (permalink / raw)
  To: linux-kernel, x86, linux-hyperv, kvm
  Cc: Juergen Gross, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Paolo Bonzini,
	Vitaly Kuznetsov, Sean Christopherson, Boris Ostrovsky, xen-devel
In-Reply-To: <20260218082133.400602-1-jgross@suse.com>

In order to prepare for some MSR access function reorg work, switch
most users of native_{read|write}_msr[_safe]() to the more generic
rdmsr*()/wrmsr*() variants.

For now this will have some interim performance impact with
paravirtualization configured when running on bare metal, but this
is a prereq change for the planned direct inlining of the rdmsr/wrmsr
instructions with this configuration.

The main reason for this switch is the planned move of the MSR trace
function invocation from the native_*() functions to the generic
rdmsr*()/wrmsr*() variants. Without this switch the users of the
native_*() functions would lose the related tracing entries.

Note that the Xen related MSR access functions will not be switched,
as these will be handled after the move of the trace hooks.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
---
 arch/x86/hyperv/ivm.c          |  2 +-
 arch/x86/kernel/cpu/mshyperv.c |  7 +++++--
 arch/x86/kernel/kvmclock.c     |  2 +-
 arch/x86/kvm/svm/svm.c         | 16 ++++++++--------
 arch/x86/xen/pmu.c             |  4 ++--
 5 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 651771534cae..1b2222036a0b 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -327,7 +327,7 @@ int hv_snp_boot_ap(u32 apic_id, unsigned long start_ip, unsigned int cpu)
 	asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
 	hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
 
-	vmsa->efer = native_read_msr(MSR_EFER);
+	rdmsrq(MSR_EFER, vmsa->efer);
 
 	vmsa->cr4 = native_read_cr4();
 	vmsa->cr3 = __native_read_cr3();
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 579fb2c64cfd..9bebb1a1ebee 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -111,9 +111,12 @@ void hv_para_set_sint_proxy(bool enable)
  */
 u64 hv_para_get_synic_register(unsigned int reg)
 {
+	u64 val;
+
 	if (WARN_ON(!ms_hyperv.paravisor_present || !hv_is_synic_msr(reg)))
 		return ~0ULL;
-	return native_read_msr(reg);
+	rdmsrq(reg, val);
+	return val;
 }
 
 /*
@@ -123,7 +126,7 @@ void hv_para_set_synic_register(unsigned int reg, u64 val)
 {
 	if (WARN_ON(!ms_hyperv.paravisor_present || !hv_is_synic_msr(reg)))
 		return;
-	native_write_msr(reg, val);
+	wrmsrq(reg, val);
 }
 
 u64 hv_get_msr(unsigned int reg)
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index b5991d53fc0e..1002bdd45c0f 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -197,7 +197,7 @@ static void kvm_setup_secondary_clock(void)
 void kvmclock_disable(void)
 {
 	if (msr_kvm_system_time)
-		native_write_msr(msr_kvm_system_time, 0);
+		wrmsrq(msr_kvm_system_time, 0);
 }
 
 static void __init kvmclock_init_mem(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8f8bc863e214..1c0e7cae9e49 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -389,12 +389,12 @@ static void svm_init_erratum_383(void)
 		return;
 
 	/* Use _safe variants to not break nested virtualization */
-	if (native_read_msr_safe(MSR_AMD64_DC_CFG, &val))
+	if (rdmsrq_safe(MSR_AMD64_DC_CFG, &val))
 		return;
 
 	val |= (1ULL << 47);
 
-	native_write_msr_safe(MSR_AMD64_DC_CFG, val);
+	wrmsrq_safe(MSR_AMD64_DC_CFG, val);
 
 	erratum_383_found = true;
 }
@@ -554,9 +554,9 @@ static int svm_enable_virtualization_cpu(void)
 		u64 len, status = 0;
 		int err;
 
-		err = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
+		err = rdmsrq_safe(MSR_AMD64_OSVW_ID_LENGTH, &len);
 		if (!err)
-			err = native_read_msr_safe(MSR_AMD64_OSVW_STATUS, &status);
+			err = rdmsrq_safe(MSR_AMD64_OSVW_STATUS, &status);
 
 		if (err)
 			osvw_status = osvw_len = 0;
@@ -2029,7 +2029,7 @@ static bool is_erratum_383(void)
 	if (!erratum_383_found)
 		return false;
 
-	if (native_read_msr_safe(MSR_IA32_MC0_STATUS, &value))
+	if (rdmsrq_safe(MSR_IA32_MC0_STATUS, &value))
 		return false;
 
 	/* Bit 62 may or may not be set for this mce */
@@ -2040,11 +2040,11 @@ static bool is_erratum_383(void)
 
 	/* Clear MCi_STATUS registers */
 	for (i = 0; i < 6; ++i)
-		native_write_msr_safe(MSR_IA32_MCx_STATUS(i), 0);
+		wrmsrq_safe(MSR_IA32_MCx_STATUS(i), 0);
 
-	if (!native_read_msr_safe(MSR_IA32_MCG_STATUS, &value)) {
+	if (!rdmsrq_safe(MSR_IA32_MCG_STATUS, &value)) {
 		value &= ~(1ULL << 2);
-		native_write_msr_safe(MSR_IA32_MCG_STATUS, value);
+		wrmsrq_safe(MSR_IA32_MCG_STATUS, value);
 	}
 
 	/* Flush tlb to evict multi-match entries */
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 8f89ce0b67e3..d49a3bdc448b 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -323,7 +323,7 @@ static u64 xen_amd_read_pmc(int counter)
 		u64 val;
 
 		msr = amd_counters_base + (counter * amd_msr_step);
-		native_read_msr_safe(msr, &val);
+		rdmsrq_safe(msr, &val);
 		return val;
 	}
 
@@ -349,7 +349,7 @@ static u64 xen_intel_read_pmc(int counter)
 		else
 			msr = MSR_IA32_PERFCTR0 + counter;
 
-		native_read_msr_safe(msr, &val);
+		rdmsrq_safe(msr, &val);
 		return val;
 	}
 
-- 
2.53.0


^ permalink raw reply related
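
The motivation given in the commit message above (trace hooks moving
from the native_*() level up to the generic accessors) can be pictured
with a small userspace model. Everything here is hypothetical: the
demo_* names, the fake register file, and the counter that stands in
for a trace hook. It models the layering after the planned move, and
shows why a caller that stays on the lowest layer would no longer be
traced.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t demo_msr_file[4];	/* stand-in for hardware MSRs */
static unsigned int demo_trace_hits;	/* stand-in for the trace hook */

/* Lowest layer: raw access only, no tracing. */
static uint64_t demo_native_read_msr(unsigned int reg)
{
	return demo_msr_file[reg % 4];
}

/* Generic layer: fires the (mock) trace hook, then does the access. */
static uint64_t demo_rdmsrq_fn(unsigned int reg)
{
	demo_trace_hits++;
	return demo_native_read_msr(reg);
}

/*
 * Statement-style wrapper: the result is stored through the second
 * argument, matching the rdmsrq(msr, val) call shape seen in the diff.
 */
#define demo_rdmsrq(reg, val) ((void)((val) = demo_rdmsrq_fn(reg)))

/* Usage: same value either way, but only the generic path is traced. */
static void demo_selfcheck(void)
{
	unsigned int before = demo_trace_hits;
	uint64_t a, b;

	demo_msr_file[1] = 0xd01;

	a = demo_native_read_msr(1);	/* bypasses the trace hook */
	demo_rdmsrq(1, b);		/* goes through the trace hook */

	assert(a == b);
	assert(demo_trace_hits == before + 1);
}
```

The change of call shape is also why hv_para_get_synic_register() in
the diff gains a local "val": the value is no longer the return value
of the accessor but is written into a variable passed to it.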

* [PATCH v3 00/16] x86/msr: Inline rdmsr/wrmsr instructions
From: Juergen Gross @ 2026-02-18  8:21 UTC (permalink / raw)
  To: linux-kernel, x86, linux-coco, kvm, linux-hyperv, virtualization,
	llvm
  Cc: Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Kiryl Shutsemau, Rick Edgecombe,
	Sean Christopherson, Paolo Bonzini, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Vitaly Kuznetsov,
	Boris Ostrovsky, xen-devel, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Andy Lutomirski,
	Peter Zijlstra, Xin Li, Nathan Chancellor, Nick Desaulniers,
	Bill Wendling, Justin Stitt, Josh Poimboeuf

When building a kernel with CONFIG_PARAVIRT_XXL the paravirt
infrastructure will always use functions for reading or writing MSRs,
even when running on bare metal.

Switch to inline RDMSR/WRMSR instructions in this case, reducing the
paravirt overhead.

The first patch is a prerequisite fix for alternative patching. It
is needed because the initial indirect call needs to be padded with
NOPs in some cases with the following patches.

In order to make this less intrusive, some further reorganization of
the MSR access helpers is done in patches 1-6.

The next 4 patches are converting the non-paravirt case to use direct
inlining of the MSR access instructions, including the WRMSRNS
instruction and the immediate variants of RDMSR and WRMSR if possible.

Patches 11-13 are some further preparations for making the real switch
to directly patch in the native MSR instructions easier.

Patch 14 is switching the paravirt MSR function interface from normal
call ABI to one more similar to the native MSR instructions.

Patch 15 is a little cleanup patch.

Patch 16 is the final step for patching in the native MSR instructions
when not running as a Xen PV guest.

This series has been tested to work with Xen PV and on bare metal.

Note that there is more room for improvement. This series is sent out
to give a first impression of what the code will basically look like.

Right now the same problem is solved differently for the paravirt and
the non-paravirt cases. In case this is not desired, there are two
possibilities to merge the two implementations. Both solutions have
the common idea to have rather similar code for paravirt and
non-paravirt variants, but just use a different main macro for
generating the respective code. For making the code of both possible
scenarios more similar, the following variants are possible:

1. Remove the micro-optimizations of the non-paravirt case, making
   it similar to the paravirt code in my series. This has the
   advantage of being simpler, but might have a very small
   negative performance impact (probably not really detectable).

2. Add the same micro-optimizations to the paravirt case, which
   requires enhancing paravirt patching to support a to-be-patched
   indirect call in the middle of the initial code snippet.

In both cases the native MSR function variants would no longer be
usable in the paravirt case, but this would mostly affect Xen, as it
would need to open code the WRMSR/RDMSR instructions to be used
instead of the native_*msr*() functions.

Changes since V2:
- switch back to the paravirt approach

Changes since V1:
- Use Xin Li's approach for inlining
- Several new patches

Juergen Gross (16):
  x86/alternative: Support alt_replace_call() with instructions after
    call
  coco/tdx: Rename MSR access helpers
  x86/sev: Replace call of native_wrmsr() with native_wrmsrq()
  KVM: x86: Remove the KVM private read_msr() function
  x86/msr: Minimize usage of native_*() msr access functions
  x86/msr: Move MSR trace calls one function level up
  x86/opcode: Add immediate form MSR instructions
  x86/extable: Add support for immediate form MSR instructions
  x86/msr: Use the alternatives mechanism for WRMSR
  x86/msr: Use the alternatives mechanism for RDMSR
  x86/alternatives: Add ALTERNATIVE_4()
  x86/paravirt: Split off MSR related hooks into new header
  x86/paravirt: Prepare support of MSR instruction interfaces
  x86/paravirt: Switch MSR access pv_ops functions to instruction
    interfaces
  x86/msr: Reduce number of low level MSR access helpers
  x86/paravirt: Use alternatives for MSR access with paravirt

 arch/x86/coco/sev/internal.h              |   7 +-
 arch/x86/coco/tdx/tdx.c                   |   8 +-
 arch/x86/hyperv/ivm.c                     |   2 +-
 arch/x86/include/asm/alternative.h        |   6 +
 arch/x86/include/asm/fred.h               |   2 +-
 arch/x86/include/asm/kvm_host.h           |  10 -
 arch/x86/include/asm/msr.h                | 345 ++++++++++++++++------
 arch/x86/include/asm/paravirt-msr.h       | 148 ++++++++++
 arch/x86/include/asm/paravirt.h           |  67 -----
 arch/x86/include/asm/paravirt_types.h     |  57 ++--
 arch/x86/include/asm/qspinlock_paravirt.h |   4 +-
 arch/x86/kernel/alternative.c             |   5 +-
 arch/x86/kernel/cpu/mshyperv.c            |   7 +-
 arch/x86/kernel/kvmclock.c                |   2 +-
 arch/x86/kernel/paravirt.c                |  42 ++-
 arch/x86/kvm/svm/svm.c                    |  16 +-
 arch/x86/kvm/vmx/tdx.c                    |   2 +-
 arch/x86/kvm/vmx/vmx.c                    |   8 +-
 arch/x86/lib/x86-opcode-map.txt           |   5 +-
 arch/x86/mm/extable.c                     |  35 ++-
 arch/x86/xen/enlighten_pv.c               |  52 +++-
 arch/x86/xen/pmu.c                        |   4 +-
 tools/arch/x86/lib/x86-opcode-map.txt     |   5 +-
 tools/objtool/check.c                     |   1 +
 24 files changed, 576 insertions(+), 264 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt-msr.h

-- 
2.53.0

^ permalink raw reply

* Re: [PATCH 2/2] mshv: Add kexec blocking support
From: Wei Liu @ 2026-02-18  8:14 UTC (permalink / raw)
  To: Mukesh R
  Cc: Stanislav Kinsburskii, rppt, akpm, bhe, kys, haiyangz, wei.liu,
	decui, longli, kexec, linux-hyperv, linux-kernel
In-Reply-To: <32c4bc2a-5dd1-c54d-a089-45bfad6eec94@linux.microsoft.com>

On Thu, Feb 12, 2026 at 02:11:13PM -0800, Mukesh R wrote:
> On 1/28/26 09:42, Stanislav Kinsburskii wrote:
> > Add kexec notifier to prevent kexec when VMs are active or memory
> > is deposited. The notifier blocks kexec operations if:
> > - Active VMs exist in the partition table
> > - Pages are still deposited to the hypervisor
> > 
> > The kernel cannot access hypervisor deposited pages: any access
> > triggers a GPF. Until the deposited page state can be handed over
> > to the next kernel, kexec must be blocked if there is any shared
> > state between kernel and hypervisor.
> > 
> > For L1 host virtualization, attempt to withdraw all deposited memory before
> > allowing kexec to proceed. If withdrawal fails or pages remain deposited
> > block the kexec operation.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >   drivers/hv/Makefile            |    1 +
> >   drivers/hv/hv_proc.c           |    4 ++
> >   drivers/hv/mshv_kexec.c        |   66 ++++++++++++++++++++++++++++++++++++++++
> >   drivers/hv/mshv_root.h         |   14 ++++++++
> >   drivers/hv/mshv_root_hv_call.c |    2 +
> >   drivers/hv/mshv_root_main.c    |    7 ++++
> >   6 files changed, 94 insertions(+)
> >   create mode 100644 drivers/hv/mshv_kexec.c
> > 
> > diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
> > index a49f93c2d245..bb72be5cc525 100644
> > --- a/drivers/hv/Makefile
> > +++ b/drivers/hv/Makefile
> > @@ -15,6 +15,7 @@ hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
> >   hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o
> >   mshv_root-y := mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \
> >   	       mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o
> > +mshv_root-$(CONFIG_KEXEC) += mshv_kexec.o
> >   mshv_vtl-y := mshv_vtl_main.o
> >   # Code that must be built-in
> > diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> > index 89870c1b0087..39bbbedb0340 100644
> > --- a/drivers/hv/hv_proc.c
> > +++ b/drivers/hv/hv_proc.c
> > @@ -15,6 +15,8 @@
> >    */
> >   #define HV_DEPOSIT_MAX (HV_HYP_PAGE_SIZE / sizeof(u64) - 1)
> > +atomic_t hv_pages_deposited;
> > +
> >   /* Deposits exact number of pages. Must be called with interrupts enabled.  */
> >   int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> >   {
> > @@ -93,6 +95,8 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> >   		goto err_free_allocations;
> >   	}
> > +	atomic_add(page_count, &hv_pages_deposited);
> > +
> >   	ret = 0;
> >   	goto free_buf;
> > diff --git a/drivers/hv/mshv_kexec.c b/drivers/hv/mshv_kexec.c
> > new file mode 100644
> > index 000000000000..5222b2e4ff97
> > --- /dev/null
> > +++ b/drivers/hv/mshv_kexec.c
> > @@ -0,0 +1,66 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (c) 2026, Microsoft Corporation.
> > + *
> > + * Live update orchestration management for mshv_root module.
> > + *
> > + * Author: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > + */
> > +
> > +#include <linux/kexec.h>
> > +#include <linux/notifier.h>
> > +#include <asm/mshyperv.h>
> > +#include "mshv_root.h"
> > +
> > +static BLOCKING_NOTIFIER_HEAD(overlay_notify_chain);
> > +
> > +static int mshv_block_kexec_notify(struct notifier_block *nb,
> > +				   unsigned long action, void *arg)
> > +{
> > +	if (!hash_empty(mshv_root.pt_htable)) {
> > +		pr_warn("mshv: Cannot perform kexec while VMs are active\n");
> > +		return -EBUSY;
> > +	}
> > +
> > +	if (hv_l1vh_partition()) {
> > +		int err;
> > +
> > +		/* Attempt to withdraw all the deposited pages */
> > +		err = hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE,
> > +					      hv_current_partition_id);
> > +		if (err) {
> > +			pr_err("mshv: Failed to withdraw memory from L1 virtualization: %d\n",
> > +			       err);
> > +			return err;
> > +		}
> > +	}
> > +
> > +	if (atomic_read(&hv_pages_deposited)) {
> > +		pr_warn("mshv: Cannot perform kexec while pages are deposited\n");
> > +		return -EBUSY;
> > +	}
> > +	return 0;
> > +}
> > +
> 
> What guarantees another deposit won't happen after this? Are all CPUs
> "locked" in kexec path and not doing anything at this point?
> 

An alternative is to block kexec if any pages have ever been deposited.
This is a very heavy-handed approach.

Wei

^ permalink raw reply

* Re: [PATCH v4] mshv: Add support for integrated scheduler
From: Wei Liu @ 2026-02-18  8:05 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <20260204071816.GN79272@liuwe-devbox-debian-v2.local>

On Wed, Feb 04, 2026 at 07:18:16AM +0000, Wei Liu wrote:
> On Mon, Feb 02, 2026 at 07:26:06PM +0000, Stanislav Kinsburskii wrote:
> > Query the hypervisor for integrated scheduler support and use it if
> > configured.
> > 
> > Microsoft Hypervisor originally provided two schedulers: root and core. The
> 
> Microsoft Hypervisor provides three schedulers: root, classic
> (with or without SMT) and core. The latter two are hypervisor based.
> 
> > root scheduler allows the root partition to schedule guest vCPUs across
> > physical cores, supporting both time slicing and CPU affinity (e.g., via
> > cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core
> > scheduling entirely to the hypervisor.
> > 
> > Direct virtualization introduces a new privileged guest partition type - L1
> 
> Level-1 Virtualization Host.
> 
> > Virtual Host (L1VH) — which can create child partitions from its own
> > resources. These child partitions are effectively siblings, scheduled by
> > the hypervisor's core scheduler. This prevents the L1VH parent from setting
> > affinity or time slicing for its own processes or guest VPs. While cgroups,
> > CFS, and cpuset controllers can still be used, their effectiveness is
> > unpredictable, as the core scheduler swaps vCPUs according to its own logic
> > (typically round-robin across all allocated physical CPUs). As a result,
> > the system may appear to "steal" time from the L1VH and its children.
> > 
> > To address this, Microsoft Hypervisor introduces the integrated scheduler.
> > This allows an L1VH partition to schedule its own vCPUs and those of its
> > guests across its "physical" cores, effectively emulating root scheduler
> > behavior within the L1VH, while retaining core scheduler behavior for the
> > rest of the system.
> > 
> > The integrated scheduler is controlled by the root partition and gated by
> > the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor
> > supports the integrated scheduler. The L1VH partition must then check if it
> > is enabled by querying the corresponding extended partition property. If
> > this property is true, the L1VH partition must use the root scheduler
> > logic; otherwise, it must use the core scheduler. This requirement makes
> > reading VMM capabilities in L1VH partition a requirement too.
> > 
> > Signed-off-by: Andreea Pintilie <anpintil@microsoft.com>
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> [...]
> > +++ b/include/hyperv/hvhdk_mini.h
> > @@ -87,6 +87,9 @@ enum hv_partition_property_code {
> >  	HV_PARTITION_PROPERTY_PRIVILEGE_FLAGS			= 0x00010000,
> >  	HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES		= 0x00010001,
> >  
> > +	/* Integrated scheduling properties */
> > +	HV_PARTITION_PROPERTY_INTEGRATED_SCHEDULER_ENABLED	= 0x00020005,
> 
> The internal name is "HvPartitionPropertyHierarchicalIntegratedSchedulerEnabled".
> 
> You missed the "Hierarchical" part in the property code name.
> 

I attempted to apply this patch and fix these issues. Unfortunately, it
doesn't apply cleanly to hyperv-next.

Wei
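
The scheduler selection described in the quoted commit message comes down
to two checks: the capability bit first, then the partition property. A
minimal sketch in plain C follows. Every name in it (model_sched,
l1vh_pick_scheduler, cap_supported, prop_enabled) is hypothetical and only
illustrates the decision; it is not the mshv driver's code.

```c
#include <stdbool.h>

/* Hypothetical model of the two possible outcomes. */
enum model_sched {
	MODEL_SCHED_CORE,	/* hypervisor schedules vCPUs itself */
	MODEL_SCHED_ROOT	/* L1VH applies root scheduler logic */
};

/*
 * cap_supported: the integrated scheduler capability bit is set.
 * prop_enabled:  the partition property reports the scheduler as enabled.
 *
 * The property only matters when the capability is present.
 */
static enum model_sched l1vh_pick_scheduler(bool cap_supported,
					    bool prop_enabled)
{
	if (!cap_supported)
		return MODEL_SCHED_CORE;

	return prop_enabled ? MODEL_SCHED_ROOT : MODEL_SCHED_CORE;
}
```

With the capability bit clear, the property is never consulted and the
core scheduler is used.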


* Re: [PATCH v2] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
From: kernel test robot @ 2026-02-18  7:38 UTC (permalink / raw)
  To: Mukesh R, linux-hyperv, linux-kernel
  Cc: oe-kbuild-all, kys, haiyangz, wei.liu, decui, longli, tglx, mingo,
	bp, dave.hansen, x86, hpa
In-Reply-To: <20260217231158.1184736-1-mrathor@linux.microsoft.com>

Hi Mukesh,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on linus/master v6.19 next-20260217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mukesh-R/x86-hyperv-Reserve-3-interrupt-vectors-used-exclusively-by-mshv/20260218-071406
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20260217231158.1184736-1-mrathor%40linux.microsoft.com
patch subject: [PATCH v2] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20260218/202602180851.Pi2PY5LX-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260218/202602180851.Pi2PY5LX-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602180851.Pi2PY5LX-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kernel/cpu/mshyperv.c:485:13: warning: 'hv_reserve_irq_vectors' defined but not used [-Wunused-function]
     485 | static void hv_reserve_irq_vectors(void)
         |             ^~~~~~~~~~~~~~~~~~~~~~


vim +/hv_reserve_irq_vectors +485 arch/x86/kernel/cpu/mshyperv.c

   480	
   481	/*
   482	 * Reserve vectors hard coded in the hypervisor. If used outside, the hypervisor
   483	 * will either crash or hang or attempt to break into debugger.
   484	 */
 > 485	static void hv_reserve_irq_vectors(void)
   486	{
   487		#define HYPERV_DBG_FASTFAIL_VECTOR	0x29
   488		#define HYPERV_DBG_ASSERT_VECTOR	0x2C
   489		#define HYPERV_DBG_SERVICE_VECTOR	0x2D
   490	
   491		if (cpu_feature_enabled(X86_FEATURE_FRED))
   492			return;
   493	
   494		if (test_and_set_bit(HYPERV_DBG_ASSERT_VECTOR, system_vectors) ||
   495		    test_and_set_bit(HYPERV_DBG_SERVICE_VECTOR, system_vectors) ||
   496		    test_and_set_bit(HYPERV_DBG_FASTFAIL_VECTOR, system_vectors))
   497			BUG();
   498	
   499		pr_info("Hyper-V:reserve vectors: %d %d %d\n", HYPERV_DBG_ASSERT_VECTOR,
   500			HYPERV_DBG_SERVICE_VECTOR, HYPERV_DBG_FASTFAIL_VECTOR);
   501	}
   502	
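
The reservation in the listing relies on test_and_set_bit() returning the
previous value of the bit, so a nonzero result means the vector was already
claimed. A userspace sketch of that behaviour is below. It assumes a
non-atomic stand-in (model_test_and_set_bit) and a local bitmap
(vectors_model) in place of the kernel's system_vectors; both names are
made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_VECTORS	256
#define MODEL_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MODEL_WORDS \
	((MODEL_NR_VECTORS + MODEL_BITS_PER_WORD - 1) / MODEL_BITS_PER_WORD)

/* Hypothetical stand-in for the kernel's system_vectors bitmap. */
static unsigned long vectors_model[MODEL_WORDS];

/* Non-atomic model: set bit nr and report whether it was already set. */
static bool model_test_and_set_bit(unsigned int nr, unsigned long *map)
{
	unsigned long mask = 1UL << (nr % MODEL_BITS_PER_WORD);
	unsigned long *word = &map[nr / MODEL_BITS_PER_WORD];
	bool was_set = (*word & mask) != 0;

	*word |= mask;
	return was_set;
}

/* Returns true if all three vectors were free and are now reserved. */
static bool model_reserve(unsigned long *map)
{
	/* Same vector numbers as in the listing: assert, service, fastfail. */
	static const unsigned int vecs[] = { 0x2C, 0x2D, 0x29 };
	bool conflict = false;

	for (size_t i = 0; i < sizeof(vecs) / sizeof(vecs[0]); i++) {
		if (model_test_and_set_bit(vecs[i], map))
			conflict = true;
	}
	return !conflict;
}
```

Unlike the kernel listing, which stops at the first conflict because of the
short-circuiting ||, this model always tries all three vectors. It says
nothing about why the function has no caller in the reported config;
-Wunused-function fires for any non-inline static function that is defined
but never referenced in its translation unit.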

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v3] drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
From: Saurabh Singh Sengar @ 2026-02-18  7:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: Jan Kiszka, K. Y. Srinivasan, Haiyang Zhang, Dexuan Cui, Long Li,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	linux-hyperv, linux-kernel, Florian Bezdeka, RT, Mitchell Levy,
	Michael Kelley, Naman Jain
In-Reply-To: <20260218070557.GF2236050@liuwe-devbox-debian-v2.local>

On Wed, Feb 18, 2026 at 07:05:57AM +0000, Wei Liu wrote:
> On Mon, Feb 16, 2026 at 05:24:56PM +0100, Jan Kiszka wrote:
> > From: Jan Kiszka <jan.kiszka@siemens.com>
> > 
> > Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
> > with related guest support enabled:
> > 
> > [    1.127941] hv_vmbus: registering driver hyperv_drm
> > 
> > [    1.132518] =============================
> > [    1.132519] [ BUG: Invalid wait context ]
> > [    1.132521] 6.19.0-rc8+ #9 Not tainted
> > [    1.132524] -----------------------------
> > [    1.132525] swapper/0/0 is trying to lock:
> > [    1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
> > [    1.132543] other info that might help us debug this:
> > [    1.132544] context-{2:2}
> > [    1.132545] 1 lock held by swapper/0/0:
> > [    1.132547]  #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
> > [    1.132557] stack backtrace:
> > [    1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
> > [    1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> > [    1.132567] Call Trace:
> > [    1.132570]  <IRQ>
> > [    1.132573]  dump_stack_lvl+0x6e/0xa0
> > [    1.132581]  __lock_acquire+0xee0/0x21b0
> > [    1.132592]  lock_acquire+0xd5/0x2d0
> > [    1.132598]  ? vmbus_chan_sched+0xc4/0x2b0
> > [    1.132606]  ? lock_acquire+0xd5/0x2d0
> > [    1.132613]  ? vmbus_chan_sched+0x31/0x2b0
> > [    1.132619]  rt_spin_lock+0x3f/0x1f0
> > [    1.132623]  ? vmbus_chan_sched+0xc4/0x2b0
> > [    1.132629]  ? vmbus_chan_sched+0x31/0x2b0
> > [    1.132634]  vmbus_chan_sched+0xc4/0x2b0
> > [    1.132641]  vmbus_isr+0x2c/0x150
> > [    1.132648]  __sysvec_hyperv_callback+0x5f/0xa0
> > [    1.132654]  sysvec_hyperv_callback+0x88/0xb0
> > [    1.132658]  </IRQ>
> > [    1.132659]  <TASK>
> > [    1.132660]  asm_sysvec_hyperv_callback+0x1a/0x20
> > 
> > As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT,
> > the vmbus_isr execution needs to be moved into thread context. Open-
> > coding this allows to skip the IPI that irq_work would additionally
> > bring and which we do not need, being an IRQ, never an NMI.
> > 
> > This affects both x86 and arm64, therefore hook into the common driver
> > logic.
> > 
> > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> Applied to hyperv-next. Thanks.
> 
> Saurabh and Naman, I want to get this submitted in this merge window. If
> you find any more issues with this patch, we can address them in the RC
> phase. In the worst case, we can revert this patch later.
> 
> Wei

I was in the process of completing the final round of testing. However, since
the change has now been merged and will receive broader coverage, I will rely
on that.

Overall, the patch looks good to me.

- Saurabh
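
The quoted commit message moves the vmbus_isr work out of interrupt context
because the locks it takes can sleep under PREEMPT_RT. The general pattern
is an interrupt-side function that only records the event and wakes a
thread, plus a thread that does the locking. It can be modeled in
userspace as below. This is an analogy built on POSIX threads, not the
patch's kthread code; all names are invented for illustration and error
checking is omitted.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct isr_model {
	sem_t wake;		/* wakes the handler thread */
	atomic_uint pending;	/* events raised but not yet handled */
	atomic_bool stop;	/* set once no more events will arrive */
	pthread_mutex_t lock;	/* stands in for a lock that may sleep */
	unsigned int handled;	/* protected by lock */
};

/* "Interrupt" side: record the event and wake the thread, nothing else. */
static void model_raise(struct isr_model *m)
{
	atomic_fetch_add(&m->pending, 1);
	sem_post(&m->wake);
}

/* Thread side: the only place where the sleeping lock is taken. */
static void *model_thread(void *arg)
{
	struct isr_model *m = arg;

	for (;;) {
		sem_wait(&m->wake);

		/* Read stop before draining so no late event is missed. */
		bool stopping = atomic_load(&m->stop);
		unsigned int n = atomic_exchange(&m->pending, 0);

		pthread_mutex_lock(&m->lock);
		m->handled += n;
		pthread_mutex_unlock(&m->lock);

		if (stopping)
			break;
	}
	return NULL;
}

/* Raise `events` events, stop the thread, return how many it handled. */
static unsigned int model_run(unsigned int events)
{
	struct isr_model m = { .handled = 0 };
	pthread_t tid;

	sem_init(&m.wake, 0, 0);
	atomic_init(&m.pending, 0);
	atomic_init(&m.stop, false);
	pthread_mutex_init(&m.lock, NULL);
	pthread_create(&tid, NULL, model_thread, &m);

	for (unsigned int i = 0; i < events; i++)
		model_raise(&m);

	atomic_store(&m.stop, true);
	sem_post(&m.wake);	/* final wake so the thread sees stop */
	pthread_join(tid, NULL);

	pthread_mutex_destroy(&m.lock);
	sem_destroy(&m.wake);
	return m.handled;
}
```

The raise side uses only an atomic increment and sem_post(), neither of
which blocks; the mutex is confined to the thread, mirroring the split the
commit message describes.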

