Linux-HyperV List

* RE: [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
From: Michael Kelley @ 2026-02-06 18:54 UTC
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177031694699.186911.12873334535011325477.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, February 5, 2026 10:42 AM
> To: kys@microsoft.com; haiyangz@microsoft.com; wei.liu@kernel.org;
> decui@microsoft.com; longli@microsoft.com
> Cc: linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
> 
> When creating guest partition objects, the hypervisor may fail to
> allocate root partition pages and return an insufficient memory status.
> In this case, deposit memory using the root partition ID instead.
> 
> Note: This error should never occur in a guest of L1VH partition context.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/hv_common.c      |    2 +
>  drivers/hv/hv_proc.c        |   14 ++++++++++
>  include/hyperv/hvgdk_mini.h |   58 ++++++++++++++++++++++---------------------
>  3 files changed, 46 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index f20596276662..6b67ac616789 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -794,6 +794,8 @@ static const struct hv_status_info hv_status_infos[] = {
>  	_STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE,	-EIO),
>  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY,		-ENOMEM),
>  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,	-ENOMEM),
> +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_ROOT_MEMORY,	-ENOMEM),
> +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY, 	-ENOMEM),
>  	_STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID,		-EINVAL),
>  	_STATUS_INFO(HV_STATUS_INVALID_VP_INDEX,		-EINVAL),
>  	_STATUS_INFO(HV_STATUS_NOT_FOUND,			-EIO),
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 181f6d02bce3..5f4fd9c3231c 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -121,6 +121,18 @@ int hv_deposit_memory_node(int node, u64 partition_id,
>  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
>  		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
>  		break;
> +
> +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> +		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> +		fallthrough;
> +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> +		if (!hv_root_partition()) {
> +			hv_status_err(hv_status, "Unexpected root memory deposit\n");
> +			return -ENOMEM;
> +		}
> +		partition_id = HV_PARTITION_ID_SELF;
> +		break;
> +

Per the discussion in v1 of this patch set, if the number of pages that should be
deposited in a particular situation is different from what this function provides,
the fallback is to use hv_call_deposit_pages() directly. From what I see, there's
only one such fallback case after a hypercall failure -- in hv_do_map_gpa_hcall().
The other uses of hv_call_deposit_pages() are initial deposits when creating a
VP or partition.

But if hv_call_deposit_pages() is used directly, the logic added here to detect
insufficient root memory and deposit to HV_PARTITION_ID_SELF isn't applied.
So if the hypercall in hv_do_map_gpa_hcall() fails with insufficient root
memory, the deposit is done to the wrong partition ID. If that case can
actually happen, then some additional logic is needed in
hv_do_map_gpa_hcall() to handle it. Or there needs to be a fallback
function that contains the logic.
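
A minimal userspace sketch of such a fallback is below. It assumes a hypothetical helper named deposit_target_for_status() that only decides which partition ID should receive the pages and leaves the page count to the caller. A direct caller such as hv_do_map_gpa_hcall() could then pair it with hv_call_deposit_pages(). The status values come from the patch; the HV_PARTITION_ID_SELF and result-mask values are assumed to match the kernel headers, which this thread does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status values as added to hvgdk_mini.h by this patch. */
#define HV_STATUS_INSUFFICIENT_MEMORY                 0x0B
#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY            0x73
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY      0x75
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY 0x83

/* Assumed to match the kernel's definitions (not shown in this thread). */
#define HV_PARTITION_ID_SELF     ((uint64_t)-1)
#define HV_HYPERCALL_RESULT_MASK 0xFFFFULL

/*
 * Hypothetical helper: decide which partition should receive a deposit
 * after a hypercall failed with `status`. The caller still chooses how
 * many pages to deposit. Returns false when no deposit should be made.
 */
static bool deposit_target_for_status(uint64_t status, uint64_t partition_id,
				      bool is_root_partition, uint64_t *target)
{
	switch (status & HV_HYPERCALL_RESULT_MASK) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
		/* The guest partition itself is short of memory. */
		*target = partition_id;
		return true;
	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		/* Only expected when running as the root partition. */
		if (!is_root_partition)
			return false;
		/* The root needs the pages, so deposit to "self", not the guest. */
		*target = HV_PARTITION_ID_SELF;
		return true;
	default:
		return false;
	}
}
```

Keeping the page count out of the helper lets callers that deposit a non-default number of pages reuse the partition-ID logic.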

Other than that, everything else in this patch set looks good to me.

Michael

>  	default:
>  		hv_status_err(hv_status, "Unexpected!\n");
>  		return -ENOMEM;

* Re: [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
From: Stanislav Kinsburskii @ 2026-02-06 17:02 UTC
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <aYWCmVxnO8R3vsc-@anirudh-surface.localdomain>

On Fri, Feb 06, 2026 at 05:56:41AM +0000, Anirudh Rayabharam wrote:
> On Thu, Feb 05, 2026 at 06:42:27PM +0000, Stanislav Kinsburskii wrote:
> > When creating guest partition objects, the hypervisor may fail to
> > allocate root partition pages and return an insufficient memory status.
> > In this case, deposit memory using the root partition ID instead.
> > 
> > Note: This error should never occur in a guest of L1VH partition context.
> 
> I think you should rephrase this to:
> 
> "... should never occur in an L1VH partition"
> 
> because none of the errors in this patch series occur inside a guest. They
> either occur in L1VH or root or both.
> 
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/hv_common.c      |    2 +
> >  drivers/hv/hv_proc.c        |   14 ++++++++++
> >  include/hyperv/hvgdk_mini.h |   58 ++++++++++++++++++++++---------------------
> >  3 files changed, 46 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index f20596276662..6b67ac616789 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -794,6 +794,8 @@ static const struct hv_status_info hv_status_infos[] = {
> >  	_STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE,	-EIO),
> >  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY,		-ENOMEM),
> >  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,	-ENOMEM),
> > +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_ROOT_MEMORY,	-ENOMEM),
> > +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY,	-ENOMEM),
> >  	_STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID,		-EINVAL),
> >  	_STATUS_INFO(HV_STATUS_INVALID_VP_INDEX,		-EINVAL),
> >  	_STATUS_INFO(HV_STATUS_NOT_FOUND,			-EIO),
> > diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> > index 181f6d02bce3..5f4fd9c3231c 100644
> > --- a/drivers/hv/hv_proc.c
> > +++ b/drivers/hv/hv_proc.c
> > @@ -121,6 +121,18 @@ int hv_deposit_memory_node(int node, u64 partition_id,
> >  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
> >  		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> >  		break;
> > +
> > +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> > +		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> > +		fallthrough;
> > +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> 
> I have the same comment as on v2 about num_pages being uninitialized when we
> reach this case directly.
> 

It is initialized to 1 at the top of the function.
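
To illustrate why the fallthrough is safe, here is a stripped-down model of only the page-count part of the switch. It assumes, as stated above, that num_pages starts at 1, and it assumes the plain insufficient-memory case keeps that default. MAX_CONTIGUOUS_PAGES_ASSUMED is a placeholder: the real value of HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is not shown in this thread. The partition-ID redirection for the root cases is left out.

```c
#include <assert.h>
#include <stdint.h>

#define HV_STATUS_INSUFFICIENT_MEMORY                 0x0B
#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY            0x73
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY      0x75
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY 0x83

/* Placeholder for HV_MAX_CONTIGUOUS_ALLOCATION_PAGES (value assumed). */
#define MAX_CONTIGUOUS_PAGES_ASSUMED 8

/* Returns the number of pages to deposit, or 0 for an unexpected status. */
static int deposit_page_count(uint16_t status)
{
	int num_pages = 1;	/* default set at the top of the function */

	switch (status) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
		num_pages = MAX_CONTIGUOUS_PAGES_ASSUMED;
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		num_pages = MAX_CONTIGUOUS_PAGES_ASSUMED;
		/* fall through */
	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
		/* Entered directly, num_pages still holds the default of 1. */
		break;
	default:
		return 0;
	}
	return num_pages;
}
```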

Thanks,
Stanislav.

> Thanks,
> Anirudh.
> 
> > +		if (!hv_root_partition()) {
> > +			hv_status_err(hv_status, "Unexpected root memory deposit\n");
> > +			return -ENOMEM;
> > +		}
> > +		partition_id = HV_PARTITION_ID_SELF;
> > +		break;
> > +
> >  	default:
> >  		hv_status_err(hv_status, "Unexpected!\n");
> >  		return -ENOMEM;
> > @@ -134,6 +146,8 @@ bool hv_result_needs_memory(u64 status)
> >  	switch (hv_result(status)) {
> >  	case HV_STATUS_INSUFFICIENT_MEMORY:
> >  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
> > +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> > +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> >  		return true;
> >  	}
> >  	return false;
> > diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> > index 99ea0d03e657..50f5a1419052 100644
> > --- a/include/hyperv/hvgdk_mini.h
> > +++ b/include/hyperv/hvgdk_mini.h
> > @@ -14,34 +14,36 @@ struct hv_u128 {
> >  } __packed;
> >  
> >  /* NOTE: when adding below, update hv_result_to_string() */
> > -#define HV_STATUS_SUCCESS			    0x0
> > -#define HV_STATUS_INVALID_HYPERCALL_CODE	    0x2
> > -#define HV_STATUS_INVALID_HYPERCALL_INPUT	    0x3
> > -#define HV_STATUS_INVALID_ALIGNMENT		    0x4
> > -#define HV_STATUS_INVALID_PARAMETER		    0x5
> > -#define HV_STATUS_ACCESS_DENIED			    0x6
> > -#define HV_STATUS_INVALID_PARTITION_STATE	    0x7
> > -#define HV_STATUS_OPERATION_DENIED		    0x8
> > -#define HV_STATUS_UNKNOWN_PROPERTY		    0x9
> > -#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE	    0xA
> > -#define HV_STATUS_INSUFFICIENT_MEMORY		    0xB
> > -#define HV_STATUS_INVALID_PARTITION_ID		    0xD
> > -#define HV_STATUS_INVALID_VP_INDEX		    0xE
> > -#define HV_STATUS_NOT_FOUND			    0x10
> > -#define HV_STATUS_INVALID_PORT_ID		    0x11
> > -#define HV_STATUS_INVALID_CONNECTION_ID		    0x12
> > -#define HV_STATUS_INSUFFICIENT_BUFFERS		    0x13
> > -#define HV_STATUS_NOT_ACKNOWLEDGED		    0x14
> > -#define HV_STATUS_INVALID_VP_STATE		    0x15
> > -#define HV_STATUS_NO_RESOURCES			    0x1D
> > -#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED   0x20
> > -#define HV_STATUS_INVALID_LP_INDEX		    0x41
> > -#define HV_STATUS_INVALID_REGISTER_VALUE	    0x50
> > -#define HV_STATUS_OPERATION_FAILED		    0x71
> > -#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY    0x75
> > -#define HV_STATUS_TIME_OUT			    0x78
> > -#define HV_STATUS_CALL_PENDING			    0x79
> > -#define HV_STATUS_VTL_ALREADY_ENABLED		    0x86
> > +#define HV_STATUS_SUCCESS				0x0
> > +#define HV_STATUS_INVALID_HYPERCALL_CODE		0x2
> > +#define HV_STATUS_INVALID_HYPERCALL_INPUT		0x3
> > +#define HV_STATUS_INVALID_ALIGNMENT			0x4
> > +#define HV_STATUS_INVALID_PARAMETER			0x5
> > +#define HV_STATUS_ACCESS_DENIED				0x6
> > +#define HV_STATUS_INVALID_PARTITION_STATE		0x7
> > +#define HV_STATUS_OPERATION_DENIED			0x8
> > +#define HV_STATUS_UNKNOWN_PROPERTY			0x9
> > +#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE		0xA
> > +#define HV_STATUS_INSUFFICIENT_MEMORY			0xB
> > +#define HV_STATUS_INVALID_PARTITION_ID			0xD
> > +#define HV_STATUS_INVALID_VP_INDEX			0xE
> > +#define HV_STATUS_NOT_FOUND				0x10
> > +#define HV_STATUS_INVALID_PORT_ID			0x11
> > +#define HV_STATUS_INVALID_CONNECTION_ID			0x12
> > +#define HV_STATUS_INSUFFICIENT_BUFFERS			0x13
> > +#define HV_STATUS_NOT_ACKNOWLEDGED			0x14
> > +#define HV_STATUS_INVALID_VP_STATE			0x15
> > +#define HV_STATUS_NO_RESOURCES				0x1D
> > +#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED	0x20
> > +#define HV_STATUS_INVALID_LP_INDEX			0x41
> > +#define HV_STATUS_INVALID_REGISTER_VALUE		0x50
> > +#define HV_STATUS_OPERATION_FAILED			0x71
> > +#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY		0x73
> > +#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY	0x75
> > +#define HV_STATUS_TIME_OUT				0x78
> > +#define HV_STATUS_CALL_PENDING				0x79
> > +#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY	0x83
> > +#define HV_STATUS_VTL_ALREADY_ENABLED			0x86
> >  
> >  /*
> >   * The Hyper-V TimeRefCount register and the TSC
> > 
> > 
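
The two lists this patch extends can be modeled together: a status-to-errno lookup like hv_status_infos[] in hv_common.c, and the memory-shortage predicate hv_result_needs_memory(). This sketch includes only a subset of statuses. Its -EIO fallback for unknown statuses is an assumption of the sketch. The 16-bit mask reflects that only the low bits of the 64-bit hypercall status carry the result code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HV_HYPERCALL_RESULT_MASK                      0xFFFFULL
#define HV_STATUS_INSUFFICIENT_MEMORY                 0x0B
#define HV_STATUS_INVALID_PARTITION_ID                0x0D
#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY            0x73
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY      0x75
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY 0x83

struct status_errno {
	uint16_t status;
	int err;
};

/* Subset of the status table, including the two new root-memory entries. */
static const struct status_errno status_table[] = {
	{ HV_STATUS_INSUFFICIENT_MEMORY,                 -ENOMEM },
	{ HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,      -ENOMEM },
	{ HV_STATUS_INSUFFICIENT_ROOT_MEMORY,            -ENOMEM },
	{ HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY, -ENOMEM },
	{ HV_STATUS_INVALID_PARTITION_ID,                -EINVAL },
};

/* Map a raw hypercall status to an errno; unknown codes give -EIO (assumed). */
static int status_to_errno(uint64_t hv_status)
{
	uint16_t code = (uint16_t)(hv_status & HV_HYPERCALL_RESULT_MASK);

	for (size_t i = 0; i < sizeof(status_table) / sizeof(status_table[0]); i++)
		if (status_table[i].status == code)
			return status_table[i].err;
	return -EIO;
}

/* True for every status that a memory deposit may resolve. */
static bool result_needs_memory(uint64_t hv_status)
{
	switch (hv_status & HV_HYPERCALL_RESULT_MASK) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		return true;
	}
	return false;
}
```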

* Re: [PATCH 3/3] x86/virt: rename x2apic_available to x2apic_without_ir_available
From: Shashank Balaji @ 2026-02-06  9:23 UTC
  To: Sohil Mehta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Jan Kiszka, Paolo Bonzini,
	Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky, linux-kernel,
	linux-hyperv, virtualization, jailhouse-dev, kvm, xen-devel,
	Rahul Bukte, Daniel Palmer, Tim Bird
In-Reply-To: <ab7f5935-fd5e-4ba5-a97d-5433f241a089@intel.com>

On Thu, Feb 05, 2026 at 04:10:37PM -0800, Sohil Mehta wrote:
> On 2/2/2026 1:51 AM, Shashank Balaji wrote:
> > No functional change.
> > 
> > x86_init.hyper.x2apic_available is used only in try_to_enable_x2apic to check if
> > x2apic needs to be disabled if interrupt remapping support isn't present. But
> > the name x2apic_available doesn't reflect that usage.
> > 
> 
> I don't understand the premise of this patch. Shouldn't the variable
> name reflect what is stored rather than how it is used?

Sorry about the confusion; I should have used '()'.
x86_init.hyper.x2apic_available() is called only in
try_to_enable_x2apic(). Here's the relevant snippet:

	static __init void try_to_enable_x2apic(int remap_mode)
	{
		if (x2apic_state == X2APIC_DISABLED)
			return;

		if (remap_mode != IRQ_REMAP_X2APIC_MODE) {
			u32 apic_limit = 255;

			/*
			 * Using X2APIC without IR is not architecturally supported
			 * on bare metal but may be supported in guests.
			 */
			if (!x86_init.hyper.x2apic_available()) {
				pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n");
				x2apic_disable();
				return;
			}

So the question being asked is, "can x2apic be used without IR?", but
the name "x2apic_available" signals "is x2apic available?". I found this
confusing while going through the source.

Most hypervisors implement x2apic_available() to essentially return
whether the CPU supports x2apic, which is valid given the name
"x2apic_available", but x2apic availability is not what's in question
at the call site.

> > This is what x2apic_available is set to for various hypervisors:
> > 
> > 	acrn		boot_cpu_has(X86_FEATURE_X2APIC)
> > 	mshyperv	boot_cpu_has(X86_FEATURE_X2APIC)
> > 	xen		boot_cpu_has(X86_FEATURE_X2APIC) or false
> > 	vmware		vmware_legacy_x2apic_available
> > 	kvm		kvm_cpuid_base() != 0
> > 	jailhouse	x2apic_enabled()
> > 	bhyve		true
> > 	default		false
> > 
> 
> If both interrupt remapping and x2apic are enabled, what would the name
> x2apic_without_ir_available signify?

If IR is enabled, then the branch to call x2apic_available() wouldn't be taken :)
So the meaning of x2apic_without_ir_available wouldn't be relevant
anymore.

> A value of "true" would mean x2apic is available without IR. But that
> would be inaccurate for most hypervisors. A value of "false" could be
> interpreted as x2apic is not available, which is also inaccurate.
> 
> To me, x2apic_available makes more sense than
> x2apic_without_ir_available based on the values being set by the
> hypervisors.

I agree with you, and I think therein lies the problem. Most hypervisors
are answering the broader question "is x2apic available?", so the name
"x2apic_available" makes sense.

I think further work is required to check if various implementations of
x2apic_available() also need to be changed to reflect the "x2apic
without IR?" semantic, but I don't know enough to do that myself. Maybe
I should have added TODOs above the implementations.

I would like the feedback of the virt folks too on all this, maybe I'm
misinterpreting what's going on here.

> > Bare metal and vmware correctly check if x2apic is available without interrupt
> > remapping. The rest of them check if x2apic is enabled/supported, and kvm just
> > checks if the kernel is running on kvm. The other hypervisors may have to have
> > their checks audited.
> > 
> AFAIU, the value on bare metal is set to false because this is a
> hypervisor specific variable. Perhaps I have misunderstood something?

* Re: [PATCH 1/3] x86/x2apic: disable x2apic on resume if the kernel expects so
From: Shashank Balaji @ 2026-02-06  8:57 UTC
  To: Sohil Mehta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Suresh Siddha, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Jan Kiszka, Paolo Bonzini,
	Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky, Ingo Molnar,
	linux-kernel, linux-hyperv, virtualization, jailhouse-dev, kvm,
	xen-devel, Rahul Bukte, Daniel Palmer, Tim Bird, stable
In-Reply-To: <e5ac3272-795b-488c-b767-290fd50f2105@intel.com>

On Thu, Feb 05, 2026 at 03:18:58PM -0800, Sohil Mehta wrote:
> On 2/4/2026 10:07 PM, Shashank Balaji wrote:
> > On Wed, Feb 04, 2026 at 10:53:28AM -0800, Sohil Mehta wrote:
> 
> >> It's a bit odd then that the firmware chooses to enable x2apic without
> >> the OS requesting it.
> > 
> > Well, the firmware has a setting saying "Enable x2apic", which was
> > enabled. So it did what the setting says.
> > 
> 
> The expectation would be that firmware would restore to the same state
> before lapic_suspend().

I'm a bit out of my depth here, but I went looking around, and this is from the
latest ACPI spec (v6.6) [1]:

	When executing from the power-on reset vector as a result of waking
	from an S2 or S3 sleep state, the platform firmware performs only the
	hardware initialization required to restore the system to either the
	state the platform was in prior to the initial operating system boot,
	or to the pre-sleep configuration state. In multiprocessor systems,
	non-boot processors should be placed in the same state as prior to the
	initial operating system boot.

	(further ahead)

	 If this is an S2 or S3 wake, then the platform runtime firmware
	 restores minimum context of the system before jumping to the waking
	 vector. This includes:

	 	CPU configuration. Platform runtime firmware restores the
		pre-sleep configuration or initial boot configuration of each
		CPU (MSR, MTRR, firmware update, SMBase, and so on). Interrupts
		must be disabled (for IA-32 processors, disabled by CLI
		instruction).

		(and other things)

I suppose, in my case, the firmware is restoring initial boot
configuration on S3 resume. And initial boot configuration of x2apic is
set from the firmware's UI "Enable x2apic".
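
For readers following the mechanics, here is a userspace model of what the patch title describes, assuming the kernel decides based on whether it is itself running in x2APIC mode. It simulates the IA32_APIC_BASE MSR; resume_fixup() and sim_wrmsr_apicbase() are hypothetical names, not the patch's code. The two-step write reflects the architectural rule that x2APIC mode cannot be switched straight back to xAPIC mode; the APIC must pass through the disabled state first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_APIC_BASE MSR enable bits. */
#define XAPIC_ENABLE  (1ULL << 11)	/* EN: local APIC enabled */
#define X2APIC_ENABLE (1ULL << 10)	/* EXTD: x2APIC mode */

static uint64_t apicbase;	/* simulated MSR contents */

/*
 * Models two architectural rules: x2APIC -> xAPIC may not be done in one
 * write, and EXTD=1 with EN=0 is invalid. A rejected write stands in for
 * the #GP real hardware would raise.
 */
static bool sim_wrmsr_apicbase(uint64_t val)
{
	bool from_x2apic = (apicbase & XAPIC_ENABLE) && (apicbase & X2APIC_ENABLE);
	bool to_xapic = (val & XAPIC_ENABLE) && !(val & X2APIC_ENABLE);

	if (from_x2apic && to_xapic)
		return false;
	if ((val & X2APIC_ENABLE) && !(val & XAPIC_ENABLE))
		return false;
	apicbase = val;
	return true;
}

/*
 * Resume-time fixup: if the kernel runs in xAPIC mode but firmware
 * restored x2APIC, go x2APIC -> disabled -> xAPIC.
 */
static bool resume_fixup(bool kernel_uses_x2apic)
{
	if (kernel_uses_x2apic || !(apicbase & X2APIC_ENABLE))
		return true;	/* nothing to fix */
	if (!sim_wrmsr_apicbase(apicbase & ~(XAPIC_ENABLE | X2APIC_ENABLE)))
		return false;
	return sim_wrmsr_apicbase(apicbase | XAPIC_ENABLE);
}
```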

> Maybe a warning would be useful to encourage firmware to fix this going
> forward. I don't have a strong preference on the wording, but how about?
> 
> pr_warn_once("x2apic unexpectedly re-enabled by the firmware during resume.\n");

At least per the spec, this isn't something the firmware needs to fix,
and the re-enablement isn't unexpected.

Am I missing something?

But it _is_ surprising that this bug went unnoticed for so long :)

[1] https://uefi.org/specs/ACPI/6.6/16_Waking_and_Sleeping.html#initialization

* Re: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Naman Jain @ 2026-02-06  7:32 UTC
  To: Wei Liu, Jan Kiszka
  Cc: Magnus Kulke, K. Y. Srinivasan, Haiyang Zhang, Dexuan Cui,
	Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, linux-hyperv, linux-kernel, Florian Bezdeka, RT,
	Mitchell Levy, skinsburskii, mrathor, anirudh, schakrabarti,
	ssengar
In-Reply-To: <20260204073629.GP79272@liuwe-devbox-debian-v2.local>



On 2/4/2026 1:06 PM, Wei Liu wrote:
> On Wed, Feb 04, 2026 at 08:32:04AM +0100, Jan Kiszka wrote:
>> On 04.02.26 08:29, Wei Liu wrote:
>>> On Wed, Feb 04, 2026 at 08:26:48AM +0100, Jan Kiszka wrote:
>>>> On 04.02.26 08:19, Jan Kiszka wrote:
>>>>> On 04.02.26 08:00, Wei Liu wrote:
>>>>>> On Tue, Feb 03, 2026 at 05:01:30PM +0100, Jan Kiszka wrote:
>>>>>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>
>>>>>>> Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
>>>>>>> with related guest support enabled:
>>>>>>
>>>>>> So all it takes to reproduce this is to enable PREEMPT_RT?
>>>>>>
>>>>>
>>>>> ...and enable CONFIG_PROVE_LOCKING so that you do not have to wait for
>>>>> your system to actually run into the bug. Lockdep already triggers
>>>>> during bootup.
>>>>>
>>>>>> Asking because ...
>>>>>>
>>>>>>>   	struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>>> @@ -158,8 +196,12 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
>>>>>>>   	if (mshv_handler)
>>>>>>>   		mshv_handler();
>>>>>>
>>>>>> ... to err on the safe side we should probably do the same for
>>>>>> mshv_handler as well.
>>>>>>
>>>>>
>>>>> Valid question. So far we have worked from lockdep reports, and the
>>>>> mshv_handler hasn't triggered one yet. Either it does not run in our
>>>>> setup, or it is already fine. But a code review of mshv for potential
>>>>> remaining issues is on my agenda.
>>>>>
>>>>> Is there something needed to trigger the mshv_handler so that we can
>>>>> test it?
>>>>>
>>>>
>>>> Ah, that depends on CONFIG_MSHV_ROOT. Is that related to the accelerator
>>>> mode that Magnus presented in [1]? We briefly chatted about it and also
>>>> my problems with the drivers after his talk on Saturday.
>>>
>>> Yes. That is the driver. If PROVE_LOCKING triggers the warning without
>>> running the code, perhaps turning on MSHV_ROOT is enough.
>>>
>>
>> But if my VM is not a root partition, I wouldn't use that driver, would I?
> 
> No, you wouldn't.  You cannot do that until later this year. If you
> cannot test that, so be it. I'm fine with applying your patch and then
> moving the mshv_handler logic ourselves later.
> 
> I've CC'ed a few folks from Microsoft.
> 
> Saurabh, Long, and Dexuan, can you review and test this patch for VMBus?


I tested this and didn't see any issues with OpenHCL/mshv_vtl.

Regards,
Naman


* Re: [PATCH] mshv: fix SRCU protection in irqfd resampler ack handler
From: Wei Liu @ 2026-02-06  7:06 UTC
  To: lirongqing
  Cc: K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	linux-hyperv, linux-kernel
In-Reply-To: <20260205094010.4301-1-lirongqing@baidu.com>

On Thu, Feb 05, 2026 at 04:40:10AM -0500, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> Replace hlist_for_each_entry_rcu() with hlist_for_each_entry_srcu()
> in mshv_irqfd_resampler_ack() to correctly handle SRCU-protected
> linked list traversal.
> 
> The function uses SRCU (sleepable RCU) synchronization via
> partition->pt_irq_srcu, but was incorrectly using the RCU variant
> for list iteration. This could lead to race conditions when the
> list is modified concurrently.
> 
> Also add srcu_read_lock_held() assertion as required by
> hlist_for_each_entry_srcu() to ensure we're in the proper
> read-side critical section.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

Thank you for the patch. Applied.

I also added a Fixes tag to the commit message.

Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")

> ---
>  drivers/hv/mshv_eventfd.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
> index 0b75ff1..6d176ed 100644
> --- a/drivers/hv/mshv_eventfd.c
> +++ b/drivers/hv/mshv_eventfd.c
> @@ -87,8 +87,9 @@ static void mshv_irqfd_resampler_ack(struct mshv_irq_ack_notifier *mian)
>  
>  	idx = srcu_read_lock(&partition->pt_irq_srcu);
>  
> -	hlist_for_each_entry_rcu(irqfd, &resampler->rsmplr_irqfd_list,
> -				 irqfd_resampler_hnode) {
> +	hlist_for_each_entry_srcu(irqfd, &resampler->rsmplr_irqfd_list,
> +				 irqfd_resampler_hnode,
> +				 srcu_read_lock_held(&partition->pt_irq_srcu)) {
>  		if (hv_should_clear_interrupt(irqfd->irqfd_lapic_irq.lapic_control.interrupt_type))
>  			hv_call_clear_virtual_interrupt(partition->pt_id);
>  
> -- 
> 2.9.4
> 

* Re: [PATCH v3] mshv: make field names descriptive in a header struct
From: Wei Liu @ 2026-02-06  7:03 UTC
  To: Mukesh R; +Cc: linux-hyperv, wei.liu
In-Reply-To: <20260204202328.196690-1-mrathor@linux.microsoft.com>

On Wed, Feb 04, 2026 at 12:23:28PM -0800, Mukesh R wrote:
> When struct fields use very common names like "pages" or "type", it makes
> it difficult to find uses of these fields with tools like grep, cscope,
> etc., when the struct is in a header file included in many places. Add
> the prefix mreg_ to some fields in struct mshv_mem_region to make them
> easier to find.
> 
> There is no functional change.
> 
> Signed-off-by: Mukesh R <mrathor@linux.microsoft.com>
> ---
> V3: rebase to afefdb2bc945 (origin/hyperv-next)

Applied.

* [PATCH v2] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Jan Kiszka @ 2026-02-06  6:47 UTC
  To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: linux-hyperv, linux-kernel, Florian Bezdeka, RT, Mitchell Levy,
	Michael Kelley

From: Jan Kiszka <jan.kiszka@siemens.com>

Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
with related guest support enabled:

[    1.127941] hv_vmbus: registering driver hyperv_drm

[    1.132518] =============================
[    1.132519] [ BUG: Invalid wait context ]
[    1.132521] 6.19.0-rc8+ #9 Not tainted
[    1.132524] -----------------------------
[    1.132525] swapper/0/0 is trying to lock:
[    1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
[    1.132543] other info that might help us debug this:
[    1.132544] context-{2:2}
[    1.132545] 1 lock held by swapper/0/0:
[    1.132547]  #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
[    1.132557] stack backtrace:
[    1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
[    1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
[    1.132567] Call Trace:
[    1.132570]  <IRQ>
[    1.132573]  dump_stack_lvl+0x6e/0xa0
[    1.132581]  __lock_acquire+0xee0/0x21b0
[    1.132592]  lock_acquire+0xd5/0x2d0
[    1.132598]  ? vmbus_chan_sched+0xc4/0x2b0
[    1.132606]  ? lock_acquire+0xd5/0x2d0
[    1.132613]  ? vmbus_chan_sched+0x31/0x2b0
[    1.132619]  rt_spin_lock+0x3f/0x1f0
[    1.132623]  ? vmbus_chan_sched+0xc4/0x2b0
[    1.132629]  ? vmbus_chan_sched+0x31/0x2b0
[    1.132634]  vmbus_chan_sched+0xc4/0x2b0
[    1.132641]  vmbus_isr+0x2c/0x150
[    1.132648]  __sysvec_hyperv_callback+0x5f/0xa0
[    1.132654]  sysvec_hyperv_callback+0x88/0xb0
[    1.132658]  </IRQ>
[    1.132659]  <TASK>
[    1.132660]  asm_sysvec_hyperv_callback+0x1a/0x20

As code paths that handle vmbus IRQs use sleeping locks under PREEMPT_RT,
the complete vmbus_handler execution needs to be moved into thread
context. Open-coding this avoids the additional IPI that irq_work would
bring, which we do not need because we are called from an IRQ, never
from an NMI.
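
The "Invalid wait context" report can be read with a simplified model of lockdep's check. The hardirq context is {2:2} (spin), rcu_read_lock() is {1:3}, and channel->sched_lock, a spinlock_t that sleeps on PREEMPT_RT, is {3:3}, so acquiring it exceeds what the context allows. In thread context the same acquisition is fine. This is a sketch: the enum only mirrors the numbering shown in the report, task context is modeled as LD_WAIT_SLEEP, and lockdep's real check handles more cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Wait types, numbered as they appear in the report above. */
enum ld_wait_type {
	LD_WAIT_INV = 0,	/* not checked */
	LD_WAIT_FREE,		/* 1: wait-free, e.g. rcu_read_lock() */
	LD_WAIT_SPIN,		/* 2: raw_spinlock_t; hardirq context */
	LD_WAIT_CONFIG,		/* 3: spinlock_t, a sleeping lock on PREEMPT_RT */
	LD_WAIT_SLEEP,		/* 4: mutexes and other sleeping locks */
};

struct held_lock_model {
	enum ld_wait_type outer;
	enum ld_wait_type inner;
};

/*
 * Simplified wait-context check: the strictest of the context and every
 * held lock's inner type bounds what may be acquired next.
 */
static bool wait_context_ok(enum ld_wait_type ctx,
			    const struct held_lock_model *held, int nr_held,
			    enum ld_wait_type next_outer)
{
	enum ld_wait_type curr = ctx;

	for (int i = 0; i < nr_held; i++)
		if (held[i].inner != LD_WAIT_INV && held[i].inner < curr)
			curr = held[i].inner;

	return next_outer <= curr;
}
```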

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Changes in v2:
 - reorder vmbus_irq_pending clearing to fix a race condition
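
The race that this reordering fixes can be reproduced deterministically with a small model, assuming a single CPU and an interrupt that arrives just after the handler has finished scanning for work. struct vmbus_sim and the sim_* functions are hypothetical; they only mimic the pending-flag handshake between sysvec_hyperv_callback() and run_vmbus_irqd().

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU, single-threaded model of the vmbus_irq_pending handshake. */
struct vmbus_sim {
	bool pending;	/* stands in for the per-CPU vmbus_irq_pending */
	int posted;	/* messages the host has signalled */
	int consumed;	/* messages the handler has processed */
};

/* Interrupt: a new message arrives and the thread is marked runnable. */
static void sim_irq(struct vmbus_sim *s)
{
	s->posted++;
	s->pending = true;
}

/* Handler scans what is posted; optionally an IRQ lands right after the scan. */
static void sim_handler(struct vmbus_sim *s, bool irq_after_scan)
{
	s->consumed = s->posted;
	if (irq_after_scan)
		sim_irq(s);
}

/*
 * Run the thread loop (thread_should_run, then thread_fn) until it goes
 * idle. Returns the number of messages left unprocessed.
 */
static int sim_run(bool clear_before_handler)
{
	struct vmbus_sim s = { false, 0, 0 };
	bool first_pass = true;

	sim_irq(&s);
	while (s.pending) {
		if (clear_before_handler) {
			s.pending = false;	/* v2 ordering */
			sim_handler(&s, first_pass);
		} else {
			sim_handler(&s, first_pass);
			s.pending = false;	/* v1 ordering: erases the late IRQ */
		}
		first_pass = false;
	}
	return s.posted - s.consumed;
}
```

With the flag cleared first, the late interrupt re-arms it and the thread runs again. Clearing it after the handler erases that wakeup and leaves one message stranded until the next interrupt.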

 arch/x86/kernel/cpu/mshyperv.c | 52 ++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 579fb2c64cfd..b39cb983326a 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -17,6 +17,7 @@
 #include <linux/irq.h>
 #include <linux/kexec.h>
 #include <linux/random.h>
+#include <linux/smpboot.h>
 #include <asm/processor.h>
 #include <asm/hypervisor.h>
 #include <hyperv/hvhdk.h>
@@ -150,6 +151,43 @@ static void (*hv_stimer0_handler)(void);
 static void (*hv_kexec_handler)(void);
 static void (*hv_crash_handler)(struct pt_regs *regs);
 
+static DEFINE_PER_CPU(bool, vmbus_irq_pending);
+static DEFINE_PER_CPU(struct task_struct *, vmbus_irqd);
+
+static void vmbus_irqd_wake(void)
+{
+	struct task_struct *tsk = __this_cpu_read(vmbus_irqd);
+
+	__this_cpu_write(vmbus_irq_pending, true);
+	wake_up_process(tsk);
+}
+
+static void vmbus_irqd_setup(unsigned int cpu)
+{
+	sched_set_fifo(current);
+}
+
+static int vmbus_irqd_should_run(unsigned int cpu)
+{
+	return __this_cpu_read(vmbus_irq_pending);
+}
+
+static void run_vmbus_irqd(unsigned int cpu)
+{
+	__this_cpu_write(vmbus_irq_pending, false);
+	vmbus_handler();
+}
+
+static bool vmbus_irq_initialized;
+
+static struct smp_hotplug_thread vmbus_irq_threads = {
+	.store                  = &vmbus_irqd,
+	.setup			= vmbus_irqd_setup,
+	.thread_should_run      = vmbus_irqd_should_run,
+	.thread_fn              = run_vmbus_irqd,
+	.thread_comm            = "vmbus_irq/%u",
+};
+
 DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
 {
 	struct pt_regs *old_regs = set_irq_regs(regs);
@@ -158,8 +196,12 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
 	if (mshv_handler)
 		mshv_handler();
 
-	if (vmbus_handler)
-		vmbus_handler();
+	if (vmbus_handler) {
+		if (IS_ENABLED(CONFIG_PREEMPT_RT))
+			vmbus_irqd_wake();
+		else
+			vmbus_handler();
+	}
 
 	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
 		apic_eoi();
@@ -174,6 +216,10 @@ void hv_setup_mshv_handler(void (*handler)(void))
 
 void hv_setup_vmbus_handler(void (*handler)(void))
 {
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !vmbus_irq_initialized) {
+		BUG_ON(smpboot_register_percpu_thread(&vmbus_irq_threads));
+		vmbus_irq_initialized = true;
+	}
 	vmbus_handler = handler;
 }
 
@@ -181,6 +227,8 @@ void hv_remove_vmbus_handler(void)
 {
 	/* We have no way to deallocate the interrupt gate */
 	vmbus_handler = NULL;
+	smpboot_unregister_percpu_thread(&vmbus_irq_threads);
+	vmbus_irq_initialized = false;
 }
 
 /*
-- 
2.51.0

* Re: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Jan Kiszka @ 2026-02-06  6:40 UTC
  To: Michael Kelley, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86@kernel.org
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Florian Bezdeka, RT, Mitchell Levy,
	skinsburskii@linux.microsoft.com, mrathor@linux.microsoft.com,
	anirudh@anirudhrb.com, schakrabarti@linux.microsoft.com,
	ssengar@linux.microsoft.com
In-Reply-To: <SN6PR02MB4157B6A9C8BEFA312F0D9D68D499A@SN6PR02MB4157.namprd02.prod.outlook.com>

On 05.02.26 19:55, Michael Kelley wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com> Sent: Tuesday, February 3, 2026 8:02 AM
>>
>> Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
>> with related guest support enabled:
>>
>> [    1.127941] hv_vmbus: registering driver hyperv_drm
>>
>> [    1.132518] =============================
>> [    1.132519] [ BUG: Invalid wait context ]
>> [    1.132521] 6.19.0-rc8+ #9 Not tainted
>> [    1.132524] -----------------------------
>> [    1.132525] swapper/0/0 is trying to lock:
>> [    1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
>> [    1.132543] other info that might help us debug this:
>> [    1.132544] context-{2:2}
>> [    1.132545] 1 lock held by swapper/0/0:
>> [    1.132547]  #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
>> [    1.132557] stack backtrace:
>> [    1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
>> [    1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
>> [    1.132567] Call Trace:
>> [    1.132570]  <IRQ>
>> [    1.132573]  dump_stack_lvl+0x6e/0xa0
>> [    1.132581]  __lock_acquire+0xee0/0x21b0
>> [    1.132592]  lock_acquire+0xd5/0x2d0
>> [    1.132598]  ? vmbus_chan_sched+0xc4/0x2b0
>> [    1.132606]  ? lock_acquire+0xd5/0x2d0
>> [    1.132613]  ? vmbus_chan_sched+0x31/0x2b0
>> [    1.132619]  rt_spin_lock+0x3f/0x1f0
>> [    1.132623]  ? vmbus_chan_sched+0xc4/0x2b0
>> [    1.132629]  ? vmbus_chan_sched+0x31/0x2b0
>> [    1.132634]  vmbus_chan_sched+0xc4/0x2b0
>> [    1.132641]  vmbus_isr+0x2c/0x150
>> [    1.132648]  __sysvec_hyperv_callback+0x5f/0xa0
>> [    1.132654]  sysvec_hyperv_callback+0x88/0xb0
>> [    1.132658]  </IRQ>
>> [    1.132659]  <TASK>
>> [    1.132660]  asm_sysvec_hyperv_callback+0x1a/0x20
>>
>> As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT,
>> the complete vmbus_handler execution needs to be moved into thread
>> context. Open-coding this allows to skip the IPI that irq_work would
>> additionally bring and which we do not need, being an IRQ, never an NMI.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> This should resolve what was once brought forward via [1]. If it
>> actually resolves all remaining compatibility issues of the hyperv
>> support with RT is not yet clear, though. So far, lockdep is happy when
>> using this plus [2].
>>
>> [1] https://lore.kernel.org/all/20230809-b4-rt_preempt-fix-v1-0-7283bbdc8b14@gmail.com/
>> [2] https://lore.kernel.org/lkml/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com/
>>
>>  arch/x86/kernel/cpu/mshyperv.c | 52 ++++++++++++++++++++++++++++++++--	
> 
> You've added this code under arch/x86. But isn't it architecture independent? I
> think it should also work on arm64. If that's the case, the code should probably
> be added to drivers/hv/vmbus_drv.c instead.
> 

I checked that before: arm64 uses normal IRQs, not over-optimized APIC
vectors. And those IRQs are auto-threaded.

That said, someone with an arm64 Hyper-V deployment should still try to
run things there once (PREEMPT_RT + PROVE_LOCKING). I don't have such a
setup.

>>  1 file changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
>> index 579fb2c64cfd..1194ca452c52 100644
>> --- a/arch/x86/kernel/cpu/mshyperv.c
>> +++ b/arch/x86/kernel/cpu/mshyperv.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/irq.h>
>>  #include <linux/kexec.h>
>>  #include <linux/random.h>
>> +#include <linux/smpboot.h>
>>  #include <asm/processor.h>
>>  #include <asm/hypervisor.h>
>>  #include <hyperv/hvhdk.h>
>> @@ -150,6 +151,43 @@ static void (*hv_stimer0_handler)(void);
>>  static void (*hv_kexec_handler)(void);
>>  static void (*hv_crash_handler)(struct pt_regs *regs);
>>
>> +static DEFINE_PER_CPU(bool, vmbus_irq_pending);
>> +static DEFINE_PER_CPU(struct task_struct *, vmbus_irqd);
>> +
>> +static void vmbus_irqd_wake(void)
>> +{
>> +	struct task_struct *tsk = __this_cpu_read(vmbus_irqd);
>> +
>> +	__this_cpu_write(vmbus_irq_pending, true);
>> +	wake_up_process(tsk);
>> +}
>> +
>> +static void vmbus_irqd_setup(unsigned int cpu)
>> +{
>> +	sched_set_fifo(current);
>> +}
>> +
>> +static int vmbus_irqd_should_run(unsigned int cpu)
>> +{
>> +	return __this_cpu_read(vmbus_irq_pending);
>> +}
>> +
>> +static void run_vmbus_irqd(unsigned int cpu)
>> +{
>> +	vmbus_handler();
>> +	__this_cpu_write(vmbus_irq_pending, false);
>> +}
> 
> The two statements in this function should be swapped. This function
> runs with pre-emption enabled and interrupts enabled. If a VMBus
> interrupt comes in as vmbus_handler() is finishing, vmbus_irqd_wake()
> will run and set vmbus_irq_pending to "true". This function will then set
> vmbus_irq_pending to "false", wiping out the "true" setting. The hotplug
> thread will decide it doesn't need to run again, and whatever generated
> the new interrupt doesn't get processed (at least until another interrupt
> comes in).

You are absolutely right. The reordered pattern is the same as in
irq_work - for the very same reason. I'll send v2.
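
The lost-wakeup race Michael describes above can be reproduced with a small single-threaded userspace model. This is a sketch, not kernel code: "pending" plays the role of the per-CPU vmbus_irq_pending flag, raise_irq() stands in for vmbus_irqd_wake(), and the loop in drive() stands in for the smpboot thread calling run_vmbus_irqd() while vmbus_irqd_should_run() returns true. All function names here are hypothetical. In the kernel, the fix is simply swapping the two statements in run_vmbus_irqd().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; names are hypothetical stand-ins for the patch. */
static bool pending;     /* models vmbus_irq_pending */
static int handled;      /* number of handler invocations */
static int late_irqs;    /* IRQs that arrive while the handler is finishing */

static void raise_irq(void)
{
	pending = true;      /* models vmbus_irqd_wake() setting the flag */
}

static void fake_vmbus_handler(void)
{
	handled++;
	if (late_irqs > 0) {
		late_irqs--;
		raise_irq(); /* interrupt lands just before the handler returns */
	}
}

/* Ordering in the original patch: the late IRQ's flag is overwritten. */
static void run_handler_then_clear(void)
{
	fake_vmbus_handler();
	pending = false;
}

/* Ordering suggested in review: clear first, so a late IRQ re-arms it. */
static void run_clear_then_handler(void)
{
	pending = false;
	fake_vmbus_handler();
}

/* Runs the thread loop for one initial IRQ plus one late IRQ. */
static int drive(void (*run)(void))
{
	handled = 0;
	late_irqs = 1;
	raise_irq();
	while (pending)      /* models the should_run() check */
		run();
	assert(late_irqs == 0); /* every simulated IRQ was raised */
	return handled;
}
```

With one late IRQ injected, drive(run_handler_then_clear) returns 1 (the second interrupt is never serviced), while drive(run_clear_then_handler) returns 2.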

Thanks,
Jan

-- 
Siemens AG, Foundational Technologies
Linux Expert Center

* Re: [PATCH 1/1] mshv: Add comment about huge page mappings in guest physical address space
From: Anirudh Rayabharam @ 2026-02-06  6:12 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Stanislav Kinsburskii, mhkelley58@gmail.com, kys@microsoft.com,
	haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
	longli@microsoft.com, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR02MB41575CA65B0A07C935F85665D49BA@SN6PR02MB4157.namprd02.prod.outlook.com>

On Tue, Feb 03, 2026 at 06:35:40PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, February 2, 2026 10:56 AM
> > 
> > On Mon, Feb 02, 2026 at 06:26:42PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, February 2, 2026 9:18 AM
> > > >
> > > > On Mon, Feb 02, 2026 at 08:51:01AM -0800, mhkelley58@gmail.com wrote:
> > > > > From: Michael Kelley <mhklinux@outlook.com>
> > > > >
> > > > > Huge page mappings in the guest physical address space depend on having
> > > > > matching alignment of the userspace address in the parent partition and
> > > > > of the guest physical address. Add a comment that captures this
> > > > > information. See the link to the mailing list thread.
> > > > >
> > > > > No code or functional change.
> > > > >
> > > > > Link: https://lore.kernel.org/linux-hyperv/aUrC94YvscoqBzh3@skinsburskii.localdomain/T/#m0871d2cae9b297fd397ddb8459e534981307c7dc
> > > > > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > > > > ---
> > > > >  drivers/hv/mshv_root_main.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > >
> > > > > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > > > > index 681b58154d5e..bc738ff4508e 100644
> > > > > --- a/drivers/hv/mshv_root_main.c
> > > > > +++ b/drivers/hv/mshv_root_main.c
> > > > > @@ -1389,6 +1389,20 @@ mshv_partition_ioctl_set_memory(struct mshv_partition *partition,
> > > > >  	if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP))
> > > > >  		return mshv_unmap_user_memory(partition, mem);
> > > > >
> > > > > +	/*
> > > > > +	 * If the userspace_addr and the guest physical address (as derived
> > > > > +	 * from the guest_pfn) have the same alignment modulo PMD huge page
> > > > > +	 * size, the MSHV driver can map any PMD huge pages to the guest
> > > > > +	 * physical address space as PMD huge pages. If the alignments do
> > > > > +	 * not match, PMD huge pages must be mapped as single pages in the
> > > > > +	 * guest physical address space. The MSHV driver does not enforce
> > > > > +	 * that the alignments match, and it invokes the hypervisor to set
> > > > > +	 * up correct functional mappings either way. See mshv_chunk_stride().
> > > > > +	 * The caller of the ioctl is responsible for providing userspace_addr
> > > > > +	 * and guest_pfn values with matching alignments if it wants the guest
> > > > > +	 * to get the performance benefits of PMD huge page mappings of its
> > > > > +	 * physical address space to real system memory.
> > > > > +	 */
> > > >
> > > > Thanks. However, I'd suggest reducing this comment a lot and putting the
> > > > details into the commit message instead. Also, why this place? Why not a
> > > > part of the function description instead, for example?
> > >
> > > In general, I'm very much an advocate of putting a bit more detail into code
> > > comments, so that someone new reading the code has a chance of figuring
> > > out what's going on without having to search through the commit history
> > > and read commit messages. The commit history is certainly useful for the
> > > historical record, and especially how things have changed over time. But for
> > > "how non-obvious things work now", I like to see that in the code comments.
> > >
> > 
> > This approach is not well aligned with the existing kernel coding style.
> > It is common to answer the "why" question in the commit message.
> > Code comments should focus on "what" the code does.
> > 
> > https://www.kernel.org/doc/html/latest/process/coding-style.html
> > 
> 
> Which says "Instead, put the comments at the head of the function,
> telling people what it does, and possibly WHY it does it." I'm good with
> that approach.
> 
> > For more details, it is common to use `git blame` to learn the context
> > of a change when needed.
> 
> Yep, I use that all the time for the historical record.
> 
> > 
> > > As for where to put the comment, I'm flexible. I thought about placing it
> > > outside the function as a "header" (which is what I think you mean by the
> > > "function description"), but the function handles both "map" and "unmap"
> > > operations, and this comment applies only to "map".  Hence I put it after
> > > the test for whether we're doing "map" vs. "unmap".  But I wouldn't object
> > > to it being placed as a function description, though the text would need to be
> > > enhanced to more broadly be a function description instead of just a comment
> > > about a specific aspect of "map" behavior.
> > >
> > 
> > As for the location, since this documents the userspace API, I would
> > rather place it above the function as part of the function description.
> > Even though the function handles both map and unmap, unmap also deals
> > with huge pages.
> 
> I'll do a version written as the function description. But the full function
> description will be more extensive to cover all the "what" that this function
> implements:
> * input parameters, and their valid values
> * map and unmap
> * when pinned vs. movable vs. mmio regions are created
> * what is done with huge pages in the above cases (i.e., a massaged version
>    of what I've already written)
> * populating and pinning of pages for pinned regions

I'm happy to approve such a version of this patch.

Also, if you want to limit yourself to the map behavior and not unmap,
you could also place this in the description of mshv_map_user_memory().
I would happily approve such a patch as well.

Overall, I think your comment is very useful and points out things that
are easy to miss while reading, modifying or reviewing this code in the
future. I also believe that this information is better as a comment here
than a commit message as has been suggested elsewhere in this thread.
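
The alignment rule in the proposed comment comes down to comparing offsets within a PMD-sized block. A minimal userspace sketch, assuming 4 KiB base pages and 2 MiB PMD pages as on x86-64; the DEMO_* constants and the helper name are hypothetical, and the driver's actual chunking logic (mshv_chunk_stride()) is not reproduced here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                  /* assumed 4 KiB base pages */
#define DEMO_PMD_SIZE   (UINT64_C(1) << 21) /* assumed 2 MiB PMD pages */

/*
 * Hypothetical helper: true if userspace_addr and the guest physical
 * address derived from guest_pfn have the same offset within a PMD-sized
 * block, i.e. the same alignment modulo PMD size.
 */
static bool demo_pmd_alignment_matches(uint64_t userspace_addr,
				       uint64_t guest_pfn)
{
	uint64_t gpa;

	/* The shift must not overflow 64 bits. */
	assert(guest_pfn <= (UINT64_MAX >> DEMO_PAGE_SHIFT));
	gpa = guest_pfn << DEMO_PAGE_SHIFT;

	return (userspace_addr & (DEMO_PMD_SIZE - 1)) ==
	       (gpa & (DEMO_PMD_SIZE - 1));
}
```

For example, a userspace_addr of 0x7f0000201000 matches guest_pfn 0x201 (both are 4 KiB into a 2 MiB block) but not guest_pfn 0x200.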

Thanks,
Anirudh.


* Re: [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
From: Anirudh Rayabharam @ 2026-02-06  5:56 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <177031694699.186911.12873334535011325477.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

On Thu, Feb 05, 2026 at 06:42:27PM +0000, Stanislav Kinsburskii wrote:
> When creating guest partition objects, the hypervisor may fail to
> allocate root partition pages and return an insufficient memory status.
> In this case, deposit memory using the root partition ID instead.
> 
> Note: This error should never occur in a guest of L1VH partition context.

I think you should rephrase this to:

"... should never occur in an L1VH partition"

because none of the errors in this patch series occur inside a guest. They
either occur in L1VH or root or both.

> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/hv_common.c      |    2 +
>  drivers/hv/hv_proc.c        |   14 ++++++++++
>  include/hyperv/hvgdk_mini.h |   58 ++++++++++++++++++++++---------------------
>  3 files changed, 46 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index f20596276662..6b67ac616789 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -794,6 +794,8 @@ static const struct hv_status_info hv_status_infos[] = {
>  	_STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE,	-EIO),
>  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY,		-ENOMEM),
>  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,	-ENOMEM),
> +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_ROOT_MEMORY,	-ENOMEM),
> +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY,	-ENOMEM),
>  	_STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID,		-EINVAL),
>  	_STATUS_INFO(HV_STATUS_INVALID_VP_INDEX,		-EINVAL),
>  	_STATUS_INFO(HV_STATUS_NOT_FOUND,			-EIO),
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 181f6d02bce3..5f4fd9c3231c 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -121,6 +121,18 @@ int hv_deposit_memory_node(int node, u64 partition_id,
>  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
>  		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
>  		break;
> +
> +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> +		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> +		fallthrough;
> +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:

I have the same comment as on v2 about num_pages being uninitialized when we
reach this case directly.
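
One way to address this is to give num_pages a value before the switch, so that entering the HV_STATUS_INSUFFICIENT_ROOT_MEMORY case directly is well defined. The userspace sketch below models that. The status values are copied from the hvgdk_mini.h hunk of this patch; the page counts, DEMO_PARTITION_ID_SELF and the helper names are hypothetical placeholders. The 16-bit mask assumes the result code sits in bits 15:0 of the hypercall status, which is what hv_result() extracts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status codes as listed in the hvgdk_mini.h hunk of this patch. */
#define HV_STATUS_INSUFFICIENT_MEMORY                 0xB
#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY            0x73
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY      0x75
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY 0x83

/* Hypothetical placeholders; real values come from the kernel headers. */
#define DEMO_DEFAULT_DEPOSIT_PAGES 1
#define DEMO_MAX_CONTIGUOUS_PAGES  8
#define DEMO_PARTITION_ID_SELF     UINT64_MAX
#define DEMO_HV_RESULT_MASK        0xFFFFu /* result code in bits 15:0 */

static bool demo_needs_memory(uint64_t status)
{
	switch (status & DEMO_HV_RESULT_MASK) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		return true;
	}
	return false;
}

/* Returns 0 on success, -1 where the patch would return -ENOMEM. */
static int demo_pick_deposit(uint64_t status, bool is_root,
			     int *num_pages, uint64_t *partition_id)
{
	assert(num_pages && partition_id);
	*num_pages = DEMO_DEFAULT_DEPOSIT_PAGES; /* set on every path */

	switch (status & DEMO_HV_RESULT_MASK) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
		*num_pages = DEMO_MAX_CONTIGUOUS_PAGES;
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		*num_pages = DEMO_MAX_CONTIGUOUS_PAGES;
		/* fallthrough */
	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
		if (!is_root)
			return -1; /* "Unexpected root memory deposit" */
		*partition_id = DEMO_PARTITION_ID_SELF;
		break;
	default:
		return -1;
	}
	return 0;
}
```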

Thanks,
Anirudh.

> +		if (!hv_root_partition()) {
> +			hv_status_err(hv_status, "Unexpected root memory deposit\n");
> +			return -ENOMEM;
> +		}
> +		partition_id = HV_PARTITION_ID_SELF;
> +		break;
> +
>  	default:
>  		hv_status_err(hv_status, "Unexpected!\n");
>  		return -ENOMEM;
> @@ -134,6 +146,8 @@ bool hv_result_needs_memory(u64 status)
>  	switch (hv_result(status)) {
>  	case HV_STATUS_INSUFFICIENT_MEMORY:
>  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
> +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
>  		return true;
>  	}
>  	return false;
> diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> index 99ea0d03e657..50f5a1419052 100644
> --- a/include/hyperv/hvgdk_mini.h
> +++ b/include/hyperv/hvgdk_mini.h
> @@ -14,34 +14,36 @@ struct hv_u128 {
>  } __packed;
>  
>  /* NOTE: when adding below, update hv_result_to_string() */
> -#define HV_STATUS_SUCCESS			    0x0
> -#define HV_STATUS_INVALID_HYPERCALL_CODE	    0x2
> -#define HV_STATUS_INVALID_HYPERCALL_INPUT	    0x3
> -#define HV_STATUS_INVALID_ALIGNMENT		    0x4
> -#define HV_STATUS_INVALID_PARAMETER		    0x5
> -#define HV_STATUS_ACCESS_DENIED			    0x6
> -#define HV_STATUS_INVALID_PARTITION_STATE	    0x7
> -#define HV_STATUS_OPERATION_DENIED		    0x8
> -#define HV_STATUS_UNKNOWN_PROPERTY		    0x9
> -#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE	    0xA
> -#define HV_STATUS_INSUFFICIENT_MEMORY		    0xB
> -#define HV_STATUS_INVALID_PARTITION_ID		    0xD
> -#define HV_STATUS_INVALID_VP_INDEX		    0xE
> -#define HV_STATUS_NOT_FOUND			    0x10
> -#define HV_STATUS_INVALID_PORT_ID		    0x11
> -#define HV_STATUS_INVALID_CONNECTION_ID		    0x12
> -#define HV_STATUS_INSUFFICIENT_BUFFERS		    0x13
> -#define HV_STATUS_NOT_ACKNOWLEDGED		    0x14
> -#define HV_STATUS_INVALID_VP_STATE		    0x15
> -#define HV_STATUS_NO_RESOURCES			    0x1D
> -#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED   0x20
> -#define HV_STATUS_INVALID_LP_INDEX		    0x41
> -#define HV_STATUS_INVALID_REGISTER_VALUE	    0x50
> -#define HV_STATUS_OPERATION_FAILED		    0x71
> -#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY    0x75
> -#define HV_STATUS_TIME_OUT			    0x78
> -#define HV_STATUS_CALL_PENDING			    0x79
> -#define HV_STATUS_VTL_ALREADY_ENABLED		    0x86
> +#define HV_STATUS_SUCCESS				0x0
> +#define HV_STATUS_INVALID_HYPERCALL_CODE		0x2
> +#define HV_STATUS_INVALID_HYPERCALL_INPUT		0x3
> +#define HV_STATUS_INVALID_ALIGNMENT			0x4
> +#define HV_STATUS_INVALID_PARAMETER			0x5
> +#define HV_STATUS_ACCESS_DENIED				0x6
> +#define HV_STATUS_INVALID_PARTITION_STATE		0x7
> +#define HV_STATUS_OPERATION_DENIED			0x8
> +#define HV_STATUS_UNKNOWN_PROPERTY			0x9
> +#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE		0xA
> +#define HV_STATUS_INSUFFICIENT_MEMORY			0xB
> +#define HV_STATUS_INVALID_PARTITION_ID			0xD
> +#define HV_STATUS_INVALID_VP_INDEX			0xE
> +#define HV_STATUS_NOT_FOUND				0x10
> +#define HV_STATUS_INVALID_PORT_ID			0x11
> +#define HV_STATUS_INVALID_CONNECTION_ID			0x12
> +#define HV_STATUS_INSUFFICIENT_BUFFERS			0x13
> +#define HV_STATUS_NOT_ACKNOWLEDGED			0x14
> +#define HV_STATUS_INVALID_VP_STATE			0x15
> +#define HV_STATUS_NO_RESOURCES				0x1D
> +#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED	0x20
> +#define HV_STATUS_INVALID_LP_INDEX			0x41
> +#define HV_STATUS_INVALID_REGISTER_VALUE		0x50
> +#define HV_STATUS_OPERATION_FAILED			0x71
> +#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY		0x73
> +#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY	0x75
> +#define HV_STATUS_TIME_OUT				0x78
> +#define HV_STATUS_CALL_PENDING				0x79
> +#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY	0x83
> +#define HV_STATUS_VTL_ALREADY_ENABLED			0x86
>  
>  /*
>   * The Hyper-V TimeRefCount register and the TSC
> 
> 

* Re: [PATCH 1/3] x86/x2apic: disable x2apic on resume if the kernel expects so
From: Shashank Balaji @ 2026-02-06  3:44 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Suresh Siddha, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Jan Kiszka, Paolo Bonzini,
	Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky, Ingo Molnar,
	linux-kernel, linux-hyperv, virtualization, jailhouse-dev, kvm,
	xen-devel, Rahul Bukte, Daniel Palmer, Tim Bird, stable
In-Reply-To: <e5ac3272-795b-488c-b767-290fd50f2105@intel.com>

On Thu, Feb 05, 2026 at 03:18:58PM -0800, Sohil Mehta wrote:
> Maybe a warning would be useful to encourage firmware to fix this going
> forward. I don't have a strong preference on the wording, but how about:
> 
> pr_warn_once("x2apic unexpectedly re-enabled by the firmware during resume.\n");

That works

> A few nits:
> 
> For the code comments, you can use more of the line width. Generally, 72
> (perhaps even 80) chars is okay for comments dependent on the code in
> the vicinity.
> 
> The tip tree has slightly unique preferences, such as capitalizing the
> first word of the patch title.
> 
> Please refer:
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#patch-submission-notes

Thanks! I noticed that I also didn't use '()' for function names in the
commit message. I'll fix all these and add the pr_warn_once in v2.

* Re: [PATCH 3/3] x86/virt: rename x2apic_available to x2apic_without_ir_available
From: Sohil Mehta @ 2026-02-06  0:10 UTC (permalink / raw)
  To: Shashank Balaji, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Jan Kiszka, Paolo Bonzini,
	Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky
  Cc: linux-kernel, linux-hyperv, virtualization, jailhouse-dev, kvm,
	xen-devel, Rahul Bukte, Daniel Palmer, Tim Bird
In-Reply-To: <20260202-x2apic-fix-v1-3-71c8f488a88b@sony.com>

On 2/2/2026 1:51 AM, Shashank Balaji wrote:
> No functional change.
> 
> x86_init.hyper.x2apic_available is used only in try_to_enable_x2apic to check if
> x2apic needs to be disabled if interrupt remapping support isn't present. But
> the name x2apic_available doesn't reflect that usage.
> 

I don't understand the premise of this patch. Shouldn't the variable
name reflect what is stored rather than how it is used?

> This is what x2apic_available is set to for various hypervisors:
> 
> 	acrn		boot_cpu_has(X86_FEATURE_X2APIC)
> 	mshyperv	boot_cpu_has(X86_FEATURE_X2APIC)
> 	xen		boot_cpu_has(X86_FEATURE_X2APIC) or false
> 	vmware		vmware_legacy_x2apic_available
> 	kvm		kvm_cpuid_base() != 0
> 	jailhouse	x2apic_enabled()
> 	bhyve		true
> 	default		false
> 

If both interrupt remapping and x2apic are enabled, what would the name
x2apic_without_ir_available signify?

A value of "true" would mean x2apic is available without IR. But that
would be inaccurate for most hypervisors. A value of "false" could be
interpreted as x2apic is not available, which is also inaccurate.

To me, x2apic_available makes more sense than
x2apic_without_ir_available based on the values being set by the
hypervisors.



> Bare metal and vmware correctly check if x2apic is available without interrupt
> remapping. The rest of them check if x2apic is enabled/supported, and kvm just
> checks if the kernel is running on kvm. The other hypervisors may have to have
> their checks audited.
> 
AFAIU, the value on bare metal is set to false because this is a
hypervisor specific variable. Perhaps I have misunderstood something?



* Re: [PATCH 1/3] x86/x2apic: disable x2apic on resume if the kernel expects so
From: Sohil Mehta @ 2026-02-05 23:18 UTC (permalink / raw)
  To: Shashank Balaji
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Suresh Siddha, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Broadcom internal kernel review list, Jan Kiszka, Paolo Bonzini,
	Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky, Ingo Molnar,
	linux-kernel, linux-hyperv, virtualization, jailhouse-dev, kvm,
	xen-devel, Rahul Bukte, Daniel Palmer, Tim Bird, stable
In-Reply-To: <aYQzhRN83rJx6DSb@JPC00244420>

On 2/4/2026 10:07 PM, Shashank Balaji wrote:
> On Wed, Feb 04, 2026 at 10:53:28AM -0800, Sohil Mehta wrote:

>> It's a bit odd then that the firmware chooses to enable x2apic without
>> the OS requesting it.
> 
> Well, the firmware has a setting saying "Enable x2apic", which was
> enabled. So it did what the setting says
> 

The expectation would be that firmware would restore to the same state
before lapic_suspend().

>  
>>> Either way, a pr_warn may be helpful. How about "x2apic re-enabled by the
>>> firmware during resume. Disabling\n"?
>>
>> I mainly want to make sure the firmware is really at fault before we add
>> such a print. But it seems likely now that the firmware messed up.

Maybe a warning would be useful to encourage firmware to fix this going
forward. I don't have a strong preference on the wording, but how about:

pr_warn_once("x2apic unexpectedly re-enabled by the firmware during resume.\n");
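
For illustration, the resume-time check plus a warn-once message could look roughly like the userspace model below. It is a sketch under assumptions, not the patch's code: IA32_APIC_BASE is simulated with a plain variable (each assignment models one MSR write), demo_resume_fixup() and demo_x2apic_disable() are hypothetical names, and the one-shot flag stands in for pr_warn_once(). The two-step write reflects the architectural rule that leaving x2APIC mode requires passing through the disabled state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_APIC_EXTD (1u << 10) /* IA32_APIC_BASE bit 10: x2APIC mode */
#define DEMO_APIC_EN   (1u << 11) /* IA32_APIC_BASE bit 11: APIC enable */

static int demo_warnings_printed;

/* x2APIC -> xAPIC is not a legal direct transition: go via disabled. */
static void demo_x2apic_disable(uint32_t *apic_base)
{
	uint32_t v = *apic_base;

	assert(v & DEMO_APIC_EN); /* EXTD=1 with EN=0 is an invalid state */
	*apic_base = v & ~(DEMO_APIC_EN | DEMO_APIC_EXTD); /* disabled */
	*apic_base = v & ~DEMO_APIC_EXTD;                  /* xAPIC mode */
}

/* Undo a firmware-enabled x2APIC when the kernel runs in xAPIC mode. */
static void demo_resume_fixup(bool kernel_uses_x2apic, uint32_t *apic_base)
{
	if (kernel_uses_x2apic || !(*apic_base & DEMO_APIC_EXTD))
		return;
	if (demo_warnings_printed == 0) { /* stand-in for pr_warn_once() */
		demo_warnings_printed = 1;
		fprintf(stderr, "x2apic unexpectedly re-enabled by the firmware during resume.\n");
	}
	demo_x2apic_disable(apic_base);
}
```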

A few nits:

For the code comments, you can use more of the line width. Generally, 72
(perhaps even 80) chars is okay for comments dependent on the code in
the vicinity.

The tip tree has slightly unique preferences, such as capitalizing the
first word of the patch title.

Please refer:
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#patch-submission-notes



* RE: [PATCH 1/1] mshv: Add comment about huge page mappings in guest physical address space
From: Michael Kelley @ 2026-02-05 20:42 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: mhkelley58@gmail.com, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <aYKY0JmcnadPqwXK@skinsburskii.localdomain>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Tuesday, February 3, 2026 4:55 PM
> 
> On Tue, Feb 03, 2026 at 06:35:40PM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday,
> February 2, 2026 10:56 AM
> > >
> > > On Mon, Feb 02, 2026 at 06:26:42PM +0000, Michael Kelley wrote:
> > > > From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Monday, February 2, 2026 9:18 AM
> > > > >
> > > > > On Mon, Feb 02, 2026 at 08:51:01AM -0800, mhkelley58@gmail.com wrote:
> > > > > > From: Michael Kelley <mhklinux@outlook.com>
> > > > > >
> > > > > > Huge page mappings in the guest physical address space depend on having
> > > > > > matching alignment of the userspace address in the parent partition and
> > > > > > of the guest physical address. Add a comment that captures this
> > > > > > information. See the link to the mailing list thread.
> > > > > >
> > > > > > No code or functional change.
> > > > > >
> > > > > > Link: https://lore.kernel.org/linux-hyperv/aUrC94YvscoqBzh3@skinsburskii.localdomain/T/#m0871d2cae9b297fd397ddb8459e534981307c7dc
> > > > > > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > > > > > ---
> > > > > >  drivers/hv/mshv_root_main.c | 14 ++++++++++++++
> > > > > >  1 file changed, 14 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > > > > > index 681b58154d5e..bc738ff4508e 100644
> > > > > > --- a/drivers/hv/mshv_root_main.c
> > > > > > +++ b/drivers/hv/mshv_root_main.c
> > > > > > @@ -1389,6 +1389,20 @@ mshv_partition_ioctl_set_memory(struct mshv_partition *partition,
> > > > > >  	if (mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP))
> > > > > >  		return mshv_unmap_user_memory(partition, mem);
> > > > > >
> > > > > > +	/*
> > > > > > +	 * If the userspace_addr and the guest physical address (as derived
> > > > > > +	 * from the guest_pfn) have the same alignment modulo PMD huge page
> > > > > > +	 * size, the MSHV driver can map any PMD huge pages to the guest
> > > > > > +	 * physical address space as PMD huge pages. If the alignments do
> > > > > > +	 * not match, PMD huge pages must be mapped as single pages in the
> > > > > > +	 * guest physical address space. The MSHV driver does not enforce
> > > > > > +	 * that the alignments match, and it invokes the hypervisor to set
> > > > > > +	 * up correct functional mappings either way. See mshv_chunk_stride().
> > > > > > +	 * The caller of the ioctl is responsible for providing userspace_addr
> > > > > > +	 * and guest_pfn values with matching alignments if it wants the guest
> > > > > > +	 * to get the performance benefits of PMD huge page mappings of its
> > > > > > +	 * physical address space to real system memory.
> > > > > > +	 */
> > > > >
> > > > > Thanks. However, I'd suggest reducing this comment a lot and putting the
> > > > > details into the commit message instead. Also, why this place? Why not a
> > > > > part of the function description instead, for example?
> > > >
> > > > In general, I'm very much an advocate of putting a bit more detail into code
> > > > comments, so that someone new reading the code has a chance of figuring
> > > > out what's going on without having to search through the commit history
> > > > and read commit messages. The commit history is certainly useful for the
> > > > historical record, and especially how things have changed over time. But for
> > > > "how non-obvious things work now", I like to see that in the code comments.
> > > >
> > >
> > > This approach is not well aligned with the existing kernel coding style.
> > > It is common to answer the "why" question in the commit message.
> > > Code comments should focus on "what" the code does.
> > >
> > > https://www.kernel.org/doc/html/latest/process/coding-style.html
> > >
> >
> > Which says "Instead, put the comments at the head of the function,
> > telling people what it does, and possibly WHY it does it." I'm good with
> > that approach.
> >
> > > For more details, it is common to use `git blame` to learn the context
> > > of a change when needed.
> >
> > Yep, I use that all the time for the historical record.
> >
> > >
> > > > As for where to put the comment, I'm flexible. I thought about placing it
> > > > outside the function as a "header" (which is what I think you mean by the
> > > > "function description"), but the function handles both "map" and "unmap"
> > > > operations, and this comment applies only to "map".  Hence I put it after
> > > > the test for whether we're doing "map" vs. "unmap".  But I wouldn't object
> > > > to it being placed as a function description, though the text would need to be
> > > > enhanced to more broadly be a function description instead of just a comment
> > > > about a specific aspect of "map" behavior.
> > > >
> > >
> > > As for the location, since this documents the userspace API, I would
> > > rather place it above the function as part of the function description.
> > > Even though the function handles both map and unmap, unmap also deals
> > > with huge pages.
> >
> > I'll do a version written as the function description. But the full function
> > description will be more extensive to cover all the "what" that this function
> > implements:
> > * input parameters, and their valid values
> > * map and unmap
> > * when pinned vs. movable vs. mmio regions are created
> > * what is done with huge pages in the above cases (i.e., a massaged version
> >    of what I've already written)
> > * populating and pinning of pages for pinned regions
> >
> > Does that match with your expectations?
> 
> I'd rather suggest something simpler for the function header:
> 
> * What regions are created
> * What page sizes are supported
> 
> I.e. describe what the function does, not the rationale or the
> architecture behind it.
> 
> For example, something like this (suggested by AI, feel free to rewrite
> completely):
> 
>  * Depending on the request, the region is created as pinned RAM, movable RAM,
>  * or MMIO. PMD-sized huge page mappings are supported when the userspace
>  * address and guest physical address (guest_pfn << PAGE_SHIFT) have matching
>  * alignment modulo PMD_SIZE; otherwise the mapping is established using base
>  * pages.
> 
> The rationale and architecture can be put into the commit message.

I really disagree with the approach you are suggesting. In my view, putting
a detailed description in the function header is completely aligned with the
"Commenting" section of the Coding Guidelines. We're not in danger of over
commenting or commenting on trivialities. My goal is always to be as helpful
as possible to whoever comes next in reviewing or updating the code. I
think that means putting the information front-and-center in the function header,
or in comments within a function that call out noteworthy aspects.

But we're at an impasse here, and further discussion is not likely to resolve it.
You and the Microsoft team are custodians of this code, so I'll stand down. I
can't do justice to the approach you prefer, so let's drop my patch. You, or
someone else who can embrace your approach, can submit a new patch that
does so, and I won't object.

Michael

* RE: [EXTERNAL] [PATCH] scsi: storvsc: Fix scheduling while atomic on PREEMPT_RT
From: Michael Kelley @ 2026-02-05 19:09 UTC (permalink / raw)
  To: Jan Kiszka, Long Li, KY Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, James E.J. Bottomley, Martin K. Petersen,
	linux-hyperv@vger.kernel.org
  Cc: linux-scsi@vger.kernel.org, Linux Kernel Mailing List,
	Florian Bezdeka, RT, Mitchell Levy
In-Reply-To: <d9f9add2-27e9-4b17-a122-d14918968ea6@siemens.com>

From: Jan Kiszka <jan.kiszka@siemens.com> Sent: Wednesday, February 4, 2026 10:38 PM
> 
> On 05.02.26 06:42, Michael Kelley wrote:
> > From: Jan Kiszka <jan.kiszka@siemens.com> Sent: Monday, February 2, 2026 9:58 PM
> >>
> >> On 03.02.26 00:47, Long Li wrote:
> >>>> From: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>
> >>>> This resolves the following splat and lock-up when running with PREEMPT_RT
> >>>> enabled on Hyper-V:
> >>>
> >>> Hi Jan,
> >>>
> >>> It's interesting to know the use-case of running an RT kernel over Hyper-V.
> >>>
> >>> Can you give an example?
> >>>
> >>
> >> - functional testing of an RT base image over Hyper-V
> >> - re-use of a common RT base image, without exploiting RT properties
> >>
> >>> As far as I know, Hyper-V makes no RT guarantees of scheduling VPs for a VM.
> >>
> >> This is well understood and not our goal. We only need the kernel to run
> >> correctly over Hyper-V with PREEMPT-RT enabled, and that is not the case
> >> right now.
> >>
> >> Thanks,
> >> Jan
> >>
> >> PS: Who had the idea to drop a virtual UART from Gen 2 VMs? Early boot
> >> guest debugging is true fun now...
> >>
> >
> > Hmmm. I often do printk()-based debugging via a virtual UART in a Gen 2
> > VM. The Linux serial console outputs to that virtual UART and I see the
> > printk() output in PuTTY on the Windows host. What specifically are you
> > trying to do?  I'm trying to remember if there's any unique setup required
> > on a Gen 2 VM vs. a Gen 1 VM, and nothing immediately comes to mind.
> > Though maybe it's just so baked into my process that I don't remember it!
> >
> 
> Indeed:
> 
> Powershell> Set-VMComPort -VMName "Debian 13" 1 \\.\pipe\comport
> 
> <Start VM>
> 
> Powershell> putty -serial \\.\pipe\comport
> 
> Well hidden...

I just realized that the Hyper-V "Settings" UI for a VM shows COM1 and COM2
only for Gen1 VMs. I don't know why it's not shown for Gen2 VMs. The
Powershell command you found is what I have always used.

Michael


* RE: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Michael Kelley @ 2026-02-05 18:55 UTC (permalink / raw)
  To: Jan Kiszka, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86@kernel.org
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Florian Bezdeka, RT, Mitchell Levy,
	skinsburskii@linux.microsoft.com, mrathor@linux.microsoft.com,
	anirudh@anirudhrb.com, schakrabarti@linux.microsoft.com,
	ssengar@linux.microsoft.com
In-Reply-To: <133a95d9-8148-40ea-9acc-edfd8e3ceef4@siemens.com>

From: Jan Kiszka <jan.kiszka@siemens.com> Sent: Tuesday, February 3, 2026 8:02 AM
> 
> Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
> with related guest support enabled:
> 
> [    1.127941] hv_vmbus: registering driver hyperv_drm
> 
> [    1.132518] =============================
> [    1.132519] [ BUG: Invalid wait context ]
> [    1.132521] 6.19.0-rc8+ #9 Not tainted
> [    1.132524] -----------------------------
> [    1.132525] swapper/0/0 is trying to lock:
> [    1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
> [    1.132543] other info that might help us debug this:
> [    1.132544] context-{2:2}
> [    1.132545] 1 lock held by swapper/0/0:
> [    1.132547]  #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
> [    1.132557] stack backtrace:
> [    1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
> [    1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> [    1.132567] Call Trace:
> [    1.132570]  <IRQ>
> [    1.132573]  dump_stack_lvl+0x6e/0xa0
> [    1.132581]  __lock_acquire+0xee0/0x21b0
> [    1.132592]  lock_acquire+0xd5/0x2d0
> [    1.132598]  ? vmbus_chan_sched+0xc4/0x2b0
> [    1.132606]  ? lock_acquire+0xd5/0x2d0
> [    1.132613]  ? vmbus_chan_sched+0x31/0x2b0
> [    1.132619]  rt_spin_lock+0x3f/0x1f0
> [    1.132623]  ? vmbus_chan_sched+0xc4/0x2b0
> [    1.132629]  ? vmbus_chan_sched+0x31/0x2b0
> [    1.132634]  vmbus_chan_sched+0xc4/0x2b0
> [    1.132641]  vmbus_isr+0x2c/0x150
> [    1.132648]  __sysvec_hyperv_callback+0x5f/0xa0
> [    1.132654]  sysvec_hyperv_callback+0x88/0xb0
> [    1.132658]  </IRQ>
> [    1.132659]  <TASK>
> [    1.132660]  asm_sysvec_hyperv_callback+0x1a/0x20
> 
> Because the code paths that handle vmbus IRQs use sleeping locks under
> PREEMPT_RT, the entire vmbus_handler execution needs to be moved into
> thread context. Open-coding this lets us skip the additional IPI that
> irq_work would bring, which we do not need because this is an IRQ, never
> an NMI.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
> 
> This should resolve what was previously raised in [1]. Whether it
> resolves all remaining compatibility issues between the hyperv support
> and RT is not yet clear, though. So far, lockdep is happy when using
> this plus [2].
> 
> [1] https://lore.kernel.org/all/20230809-b4-rt_preempt-fix-v1-0-7283bbdc8b14@gmail.com/
> [2] https://lore.kernel.org/lkml/0c7fb5cd-fb21-4760-8593-e04bade84744@siemens.com/
> 
>  arch/x86/kernel/cpu/mshyperv.c | 52 ++++++++++++++++++++++++++++++++--	

You've added this code under arch/x86. But isn't it architecture independent? I
think it should also work on arm64. If that's the case, the code should probably
be added to drivers/hv/vmbus_drv.c instead.

>  1 file changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 579fb2c64cfd..1194ca452c52 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -17,6 +17,7 @@
>  #include <linux/irq.h>
>  #include <linux/kexec.h>
>  #include <linux/random.h>
> +#include <linux/smpboot.h>
>  #include <asm/processor.h>
>  #include <asm/hypervisor.h>
>  #include <hyperv/hvhdk.h>
> @@ -150,6 +151,43 @@ static void (*hv_stimer0_handler)(void);
>  static void (*hv_kexec_handler)(void);
>  static void (*hv_crash_handler)(struct pt_regs *regs);
> 
> +static DEFINE_PER_CPU(bool, vmbus_irq_pending);
> +static DEFINE_PER_CPU(struct task_struct *, vmbus_irqd);
> +
> +static void vmbus_irqd_wake(void)
> +{
> +	struct task_struct *tsk = __this_cpu_read(vmbus_irqd);
> +
> +	__this_cpu_write(vmbus_irq_pending, true);
> +	wake_up_process(tsk);
> +}
> +
> +static void vmbus_irqd_setup(unsigned int cpu)
> +{
> +	sched_set_fifo(current);
> +}
> +
> +static int vmbus_irqd_should_run(unsigned int cpu)
> +{
> +	return __this_cpu_read(vmbus_irq_pending);
> +}
> +
> +static void run_vmbus_irqd(unsigned int cpu)
> +{
> +	vmbus_handler();
> +	__this_cpu_write(vmbus_irq_pending, false);
> +}

The two statements in this function should be swapped. This function
runs with preemption enabled and interrupts enabled. If a VMBus
interrupt comes in as vmbus_handler() is finishing, vmbus_irqd_wake()
will run and set vmbus_irq_pending to "true". This function will then set
vmbus_irq_pending to "false", wiping out the "true" setting. The hotplug
thread will decide it doesn't need to run again, and whatever generated
the new interrupt doesn't get processed (at least until another interrupt
comes in).

This scenario could specifically happen because of the way VMBus messages
are processed. The vmbus_handler function calls vmbus_message_sched(),
which always processes a single message. When that message is handled,
Hyper-V sends the next message that may have been queued up, and
generates another interrupt to the guest VM. There's no looping in the Linux
code to process all messages, so Linux depends on getting a new interrupt for
each subsequent message in order to run vmbus_message_sched() again.
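To make the lost-wakeup window concrete, here is a hypothetical single-threaded userspace model in C (not kernel code; all model_* names are invented). A plain bool stands in for the per-CPU vmbus_irq_pending flag, model_drain() approximates the smpboot loop that keeps calling thread_fn while thread_should_run is true, and, as described above for vmbus_message_sched(), the simulated host raises the next interrupt only after one queued message has been consumed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the vmbus_irq thread, not kernel code.
 * irq_pending stands in for the per-CPU vmbus_irq_pending flag.
 */
static bool irq_pending;
static int queued_msgs;		/* messages the "host" still has to deliver */
static int handled_msgs;	/* messages consumed by the handler */

/* Stand-in for vmbus_irqd_wake(): the interrupt only sets the flag. */
static void model_irq(void)
{
	irq_pending = true;
}

/*
 * Stand-in for vmbus_handler(): consumes exactly one message. If more are
 * queued, the host raises the next interrupt as the handler is finishing.
 */
static void model_handler(void)
{
	if (queued_msgs == 0)
		return;
	queued_msgs--;
	handled_msgs++;
	if (queued_msgs > 0)
		model_irq();
}

/* Order used in the patch: handle, then clear (can wipe a new wakeup). */
static void thread_fn_handle_then_clear(void)
{
	model_handler();
	irq_pending = false;
}

/* Suggested order: clear, then handle (a new wakeup survives). */
static void thread_fn_clear_then_handle(void)
{
	irq_pending = false;
	model_handler();
}

/*
 * Approximates the smpboot loop: call thread_fn while the pending flag is
 * set, stopping where the real thread would go back to sleep. Returns the
 * number of messages handled.
 */
static int model_drain(void (*thread_fn)(void), int nr_msgs)
{
	assert(nr_msgs >= 0);
	queued_msgs = nr_msgs;
	handled_msgs = 0;
	model_irq();	/* interrupt for the first message */
	while (irq_pending)
		thread_fn();
	return handled_msgs;
}
```

With three queued messages, the handle-then-clear order stops after the first message, while the clear-then-handle order drains all three. The model only captures statement ordering; it ignores preemption, CPU placement and the real sleep/wake machinery.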

There might be a similar situation with vmbus_chan_sched() and channel
interrupts. There are three interrupt handling modes across multiple VMBus
devices, and it would take some additional sleuthing to see if any of them
depend on a similar scheme of needing a new interrupt for each channel
event.

Please double-check my thinking. The likelihood of the problem occurring
is very low, because VMBus messages generally are used only when VMBus
devices are being added (or removed), which is usually during boot, and
the timing window must be hit just right. But the fix is simple, so it should
be done.

Michael

> +
> +static bool vmbus_irq_initialized;
> +
> +static struct smp_hotplug_thread vmbus_irq_threads = {
> +	.store                  = &vmbus_irqd,
> +	.setup			= vmbus_irqd_setup,
> +	.thread_should_run      = vmbus_irqd_should_run,
> +	.thread_fn              = run_vmbus_irqd,
> +	.thread_comm            = "vmbus_irq/%u",
> +};
> +
>  DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
>  {
>  	struct pt_regs *old_regs = set_irq_regs(regs);
> @@ -158,8 +196,12 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_hyperv_callback)
>  	if (mshv_handler)
>  		mshv_handler();
> 
> -	if (vmbus_handler)
> -		vmbus_handler();
> +	if (vmbus_handler) {
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			vmbus_irqd_wake();
> +		else
> +			vmbus_handler();
> +	}
> 
>  	if (ms_hyperv.hints & HV_DEPRECATING_AEOI_RECOMMENDED)
>  		apic_eoi();
> @@ -174,6 +216,10 @@ void hv_setup_mshv_handler(void (*handler)(void))
> 
>  void hv_setup_vmbus_handler(void (*handler)(void))
>  {
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !vmbus_irq_initialized) {
> +		BUG_ON(smpboot_register_percpu_thread(&vmbus_irq_threads));
> +		vmbus_irq_initialized = true;
> +	}
>  	vmbus_handler = handler;
>  }
> 
> @@ -181,6 +227,8 @@ void hv_remove_vmbus_handler(void)
>  {
>  	/* We have no way to deallocate the interrupt gate */
>  	vmbus_handler = NULL;
> +	smpboot_unregister_percpu_thread(&vmbus_irq_threads);
> +	vmbus_irq_initialized = false;
>  }
> 
>  /*
> --
> 2.51.0



* Re: [PATCH v0 01/15] iommu/hyperv: rename hyperv-iommu.c to hyperv-irq.c
From: Anirudh Rayabharam @ 2026-02-05 18:48 UTC (permalink / raw)
  To: Mukesh R
  Cc: linux-kernel, linux-hyperv, linux-arm-kernel, iommu, linux-pci,
	linux-arch, kys, haiyangz, wei.liu, decui, longli,
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, hpa, joro,
	lpieralisi, kwilczynski, mani, robh, bhelgaas, arnd, nunodasneves,
	mhklinux, romank
In-Reply-To: <20260120064230.3602565-2-mrathor@linux.microsoft.com>

On Mon, Jan 19, 2026 at 10:42:16PM -0800, Mukesh R wrote:
> From: Mukesh Rathor <mrathor@linux.microsoft.com>
> 
> This file actually implements irq remapping, so rename it to the more
> appropriate hyperv-irq.c. A new file named hyperv-iommu.c will be introduced
> later. Also, drop the CONFIG_IRQ_REMAP guard from the file and have
> HYPERV_IOMMU select IRQ_REMAP in Kconfig instead.
> 
> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>

> ---
>  MAINTAINERS                                    | 2 +-
>  drivers/iommu/Kconfig                          | 1 +
>  drivers/iommu/Makefile                         | 2 +-
>  drivers/iommu/{hyperv-iommu.c => hyperv-irq.c} | 4 ----
>  4 files changed, 3 insertions(+), 6 deletions(-)
>  rename drivers/iommu/{hyperv-iommu.c => hyperv-irq.c} (99%)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5b11839cba9d..381a0e086382 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11741,7 +11741,7 @@ F:	drivers/hid/hid-hyperv.c
>  F:	drivers/hv/
>  F:	drivers/infiniband/hw/mana/
>  F:	drivers/input/serio/hyperv-keyboard.c
> -F:	drivers/iommu/hyperv-iommu.c
> +F:	drivers/iommu/hyperv-irq.c
>  F:	drivers/net/ethernet/microsoft/
>  F:	drivers/net/hyperv/
>  F:	drivers/pci/controller/pci-hyperv-intf.c
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 99095645134f..b4cc2b42b338 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -355,6 +355,7 @@ config HYPERV_IOMMU
>  	bool "Hyper-V IRQ Handling"
>  	depends on HYPERV && X86
>  	select IOMMU_API
> +	select IRQ_REMAP
>  	default HYPERV
>  	help
>  	  Stub IOMMU driver to handle IRQs to support Hyper-V Linux
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8e8843316c4b..598c39558e7d 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -30,7 +30,7 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
>  obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
> -obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> +obj-$(CONFIG_HYPERV_IOMMU) += hyperv-irq.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>  obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-irq.c
> similarity index 99%
> rename from drivers/iommu/hyperv-iommu.c
> rename to drivers/iommu/hyperv-irq.c
> index 0961ac805944..1944440a5004 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-irq.c
> @@ -24,8 +24,6 @@
>  
>  #include "irq_remapping.h"
>  
> -#ifdef CONFIG_IRQ_REMAP
> -
>  /*
>   * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
>   * Redirection Table. Hyper-V exposes one single IO-APIC and so define
> @@ -330,5 +328,3 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
>  	.alloc = hyperv_root_irq_remapping_alloc,
>  	.free = hyperv_root_irq_remapping_free,
>  };
> -
> -#endif
> -- 
> 2.51.2.vfs.0.1
> 


* Re: [PATCH v0 02/15] x86/hyperv: cosmetic changes in irqdomain.c for readability
From: Anirudh Rayabharam @ 2026-02-05 18:47 UTC (permalink / raw)
  To: Mukesh R
  Cc: linux-kernel, linux-hyperv, linux-arm-kernel, iommu, linux-pci,
	linux-arch, kys, haiyangz, wei.liu, decui, longli,
	catalin.marinas, will, tglx, mingo, bp, dave.hansen, hpa, joro,
	lpieralisi, kwilczynski, mani, robh, bhelgaas, arnd, nunodasneves,
	mhklinux, romank
In-Reply-To: <20260120064230.3602565-3-mrathor@linux.microsoft.com>

On Mon, Jan 19, 2026 at 10:42:17PM -0800, Mukesh R wrote:
> From: Mukesh Rathor <mrathor@linux.microsoft.com>
> 
> Make cosmetic changes:
>  o Rename struct pci_dev *dev to *pdev since there are cases of
>    struct device *dev in the file and all over the kernel
>  o Rename hv_build_pci_dev_id to hv_build_devid_type_pci in anticipation
>    of building different types of device ids
>  o Fix checkpatch.pl issues with return and extraneous printk
>  o Replace spaces with tabs
>  o Rename union hv_device_id variables (device_id, dev_id) to hv_devid,
>    given code paths involve many types of device ids
>  o Fix indentation in a large if block by using goto.
> 
> There are no functional changes.
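As a side note on the arithmetic in hv_build_devid_type_pci(): the requester ID is PCI_DEVID(bus, devfn), and PCI_SLOT()/PCI_FUNC() can be applied directly to it because its low byte is devfn. The userspace sketch below uses local MODEL_* copies of those macros (the bit layout matches the kernel's, but the names and struct model_bdf are hypothetical) and does not model the DMA-alias walk or the PCI-X bridge bus-range handling.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the bit layout used by the kernel's PCI helper macros. */
#define MODEL_PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define MODEL_PCI_DEVID(bus, devfn)	((uint16_t)((((uint16_t)(bus)) << 8) | (devfn)))
#define MODEL_PCI_BUS_NUM(rid)		(((rid) >> 8) & 0xff)
#define MODEL_PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define MODEL_PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical stand-in for the bdf fields of union hv_device_id. */
struct model_bdf {
	uint8_t bus;
	uint8_t device;
	uint8_t function;
};

/* Build a RID from bus/devfn, then split it the way the patch does. */
static struct model_bdf model_split_rid(uint8_t bus, uint8_t devfn)
{
	uint16_t rid = MODEL_PCI_DEVID(bus, devfn);
	struct model_bdf bdf = {
		.bus = MODEL_PCI_BUS_NUM(rid),
		.device = MODEL_PCI_SLOT(rid),		/* low byte of rid is devfn */
		.function = MODEL_PCI_FUNC(rid),
	};

	/* Sanity check: slot and function recombine into the original devfn. */
	assert(MODEL_PCI_DEVFN(bdf.device, bdf.function) == devfn);
	return bdf;
}
```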
> 
> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
> ---
>  arch/x86/hyperv/irqdomain.c | 197 +++++++++++++++++++-----------------
>  1 file changed, 103 insertions(+), 94 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index c3ba12b1bc07..f6b61483b3b8 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -1,5 +1,4 @@
>  // SPDX-License-Identifier: GPL-2.0
> -
>  /*
>   * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
>   *
> @@ -14,8 +13,8 @@
>  #include <linux/irqchip/irq-msi-lib.h>
>  #include <asm/mshyperv.h>
>  
> -static int hv_map_interrupt(union hv_device_id device_id, bool level,
> -		int cpu, int vector, struct hv_interrupt_entry *entry)
> +static int hv_map_interrupt(union hv_device_id hv_devid, bool level,
> +		int cpu, int vector, struct hv_interrupt_entry *ret_entry)
>  {
>  	struct hv_input_map_device_interrupt *input;
>  	struct hv_output_map_device_interrupt *output;
> @@ -32,7 +31,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>  	intr_desc = &input->interrupt_descriptor;
>  	memset(input, 0, sizeof(*input));
>  	input->partition_id = hv_current_partition_id;
> -	input->device_id = device_id.as_uint64;
> +	input->device_id = hv_devid.as_uint64;
>  	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
>  	intr_desc->vector_count = 1;
>  	intr_desc->target.vector = vector;
> @@ -44,7 +43,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>  
>  	intr_desc->target.vp_set.valid_bank_mask = 0;
>  	intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> -	nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
> +	nr_bank = cpumask_to_vpset(&intr_desc->target.vp_set, cpumask_of(cpu));
>  	if (nr_bank < 0) {
>  		local_irq_restore(flags);
>  		pr_err("%s: unable to generate VP set\n", __func__);
> @@ -61,7 +60,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>  
>  	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size,
>  			input, output);
> -	*entry = output->interrupt_entry;
> +	*ret_entry = output->interrupt_entry;
>  
>  	local_irq_restore(flags);
>  
> @@ -71,21 +70,19 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
>  	return hv_result_to_errno(status);
>  }
>  
> -static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *irq_entry)
>  {
>  	unsigned long flags;
>  	struct hv_input_unmap_device_interrupt *input;
> -	struct hv_interrupt_entry *intr_entry;
>  	u64 status;
>  
>  	local_irq_save(flags);
>  	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>  
>  	memset(input, 0, sizeof(*input));
> -	intr_entry = &input->interrupt_entry;
>  	input->partition_id = hv_current_partition_id;
>  	input->device_id = id;
> -	*intr_entry = *old_entry;
> +	input->interrupt_entry = *irq_entry;
>  
>  	status = hv_do_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, input, NULL);
>  	local_irq_restore(flags);
> @@ -115,67 +112,71 @@ static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
>  	return 0;
>  }
>  
> -static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +static union hv_device_id hv_build_devid_type_pci(struct pci_dev *pdev)
>  {
> -	union hv_device_id dev_id;
> +	int pos;
> +	union hv_device_id hv_devid;
>  	struct rid_data data = {
>  		.bridge = NULL,
> -		.rid = PCI_DEVID(dev->bus->number, dev->devfn)
> +		.rid = PCI_DEVID(pdev->bus->number, pdev->devfn)
>  	};
>  
> -	pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +	pci_for_each_dma_alias(pdev, get_rid_cb, &data);
>  
> -	dev_id.as_uint64 = 0;
> -	dev_id.device_type = HV_DEVICE_TYPE_PCI;
> -	dev_id.pci.segment = pci_domain_nr(dev->bus);
> +	hv_devid.as_uint64 = 0;
> +	hv_devid.device_type = HV_DEVICE_TYPE_PCI;
> +	hv_devid.pci.segment = pci_domain_nr(pdev->bus);
>  
> -	dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> -	dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> -	dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> -	dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +	hv_devid.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> +	hv_devid.pci.bdf.device = PCI_SLOT(data.rid);
> +	hv_devid.pci.bdf.function = PCI_FUNC(data.rid);
> +	hv_devid.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
>  
> -	if (data.bridge) {
> -		int pos;
> +	if (data.bridge == NULL)
> +		goto out;
>  
> -		/*
> -		 * Microsoft Hypervisor requires a bus range when the bridge is
> -		 * running in PCI-X mode.
> -		 *
> -		 * To distinguish conventional vs PCI-X bridge, we can check
> -		 * the bridge's PCI-X Secondary Status Register, Secondary Bus
> -		 * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
> -		 * Specification Revision 1.0 5.2.2.1.3.
> -		 *
> -		 * Value zero means it is in conventional mode, otherwise it is
> -		 * in PCI-X mode.
> -		 */
> +	/*
> +	 * Microsoft Hypervisor requires a bus range when the bridge is
> +	 * running in PCI-X mode.
> +	 *
> +	 * To distinguish conventional vs PCI-X bridge, we can check
> +	 * the bridge's PCI-X Secondary Status Register, Secondary Bus
> +	 * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
> +	 * Specification Revision 1.0 5.2.2.1.3.
> +	 *
> +	 * Value zero means it is in conventional mode, otherwise it is
> +	 * in PCI-X mode.
> +	 */
>  
> -		pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
> -		if (pos) {
> -			u16 status;
> +	pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
> +	if (pos) {
> +		u16 status;
>  
> -			pci_read_config_word(data.bridge, pos +
> -					PCI_X_BRIDGE_SSTATUS, &status);
> +		pci_read_config_word(data.bridge, pos + PCI_X_BRIDGE_SSTATUS,
> +				     &status);
>  
> -			if (status & PCI_X_SSTATUS_FREQ) {
> -				/* Non-zero, PCI-X mode */
> -				u8 sec_bus, sub_bus;
> +		if (status & PCI_X_SSTATUS_FREQ) {
> +			/* Non-zero, PCI-X mode */
> +			u8 sec_bus, sub_bus;
>  
> -				dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
> +			hv_devid.pci.source_shadow =
> +					     HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
>  
> -				pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
> -				dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
> -				pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
> -				dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
> -			}
> +			pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS,
> +					     &sec_bus);
> +			hv_devid.pci.shadow_bus_range.secondary_bus = sec_bus;
> +			pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS,
> +					     &sub_bus);
> +			hv_devid.pci.shadow_bus_range.subordinate_bus = sub_bus;
>  		}
>  	}
>  
> -	return dev_id;
> +out:
> +	return hv_devid;
>  }
>  
> -/**
> - * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
> +/*
> + * hv_map_msi_interrupt() - Map the MSI IRQ in the hypervisor.
>   * @data:      Describes the IRQ
>   * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
>   *
> @@ -188,22 +189,23 @@ int hv_map_msi_interrupt(struct irq_data *data,
>  {
>  	struct irq_cfg *cfg = irqd_cfg(data);
>  	struct hv_interrupt_entry dummy;
> -	union hv_device_id device_id;
> +	union hv_device_id hv_devid;
>  	struct msi_desc *msidesc;
> -	struct pci_dev *dev;
> +	struct pci_dev *pdev;
>  	int cpu;
>  
>  	msidesc = irq_data_get_msi_desc(data);
> -	dev = msi_desc_to_pci_dev(msidesc);
> -	device_id = hv_build_pci_dev_id(dev);
> +	pdev = msi_desc_to_pci_dev(msidesc);
> +	hv_devid = hv_build_devid_type_pci(pdev);
>  	cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
>  
> -	return hv_map_interrupt(device_id, false, cpu, cfg->vector,
> +	return hv_map_interrupt(hv_devid, false, cpu, cfg->vector,
>  				out_entry ? out_entry : &dummy);
>  }
>  EXPORT_SYMBOL_GPL(hv_map_msi_interrupt);
>  
> -static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
> +static void entry_to_msi_msg(struct hv_interrupt_entry *entry,
> +			     struct msi_msg *msg)
>  {
>  	/* High address is always 0 */
>  	msg->address_hi = 0;
> @@ -211,17 +213,19 @@ static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi
>  	msg->data = entry->msi_entry.data.as_uint32;
>  }
>  
> -static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
> +static int hv_unmap_msi_interrupt(struct pci_dev *pdev,
> +				  struct hv_interrupt_entry *irq_entry);
> +
>  static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  {
>  	struct hv_interrupt_entry *stored_entry;
>  	struct irq_cfg *cfg = irqd_cfg(data);
>  	struct msi_desc *msidesc;
> -	struct pci_dev *dev;
> +	struct pci_dev *pdev;
>  	int ret;
>  
>  	msidesc = irq_data_get_msi_desc(data);
> -	dev = msi_desc_to_pci_dev(msidesc);
> +	pdev = msi_desc_to_pci_dev(msidesc);
>  
>  	if (!cfg) {
>  		pr_debug("%s: cfg is NULL", __func__);
> @@ -240,7 +244,7 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  		stored_entry = data->chip_data;
>  		data->chip_data = NULL;
>  
> -		ret = hv_unmap_msi_interrupt(dev, stored_entry);
> +		ret = hv_unmap_msi_interrupt(pdev, stored_entry);
>  
>  		kfree(stored_entry);
>  
> @@ -249,10 +253,8 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	}
>  
>  	stored_entry = kzalloc(sizeof(*stored_entry), GFP_ATOMIC);
> -	if (!stored_entry) {
> -		pr_debug("%s: failed to allocate chip data\n", __func__);
> +	if (!stored_entry)
>  		return;
> -	}
>  
>  	ret = hv_map_msi_interrupt(data, stored_entry);
>  	if (ret) {
> @@ -262,18 +264,21 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  
>  	data->chip_data = stored_entry;
>  	entry_to_msi_msg(data->chip_data, msg);
> -
> -	return;
>  }
>  
> -static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
> +static int hv_unmap_msi_interrupt(struct pci_dev *pdev,
> +				  struct hv_interrupt_entry *irq_entry)
>  {
> -	return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry);
> +	union hv_device_id hv_devid;
> +
> +	hv_devid = hv_build_devid_type_pci(pdev);
> +	return hv_unmap_interrupt(hv_devid.as_uint64, irq_entry);
>  }
>  
> -static void hv_teardown_msi_irq(struct pci_dev *dev, struct irq_data *irqd)
> +/* NB: during map, hv_interrupt_entry is saved via data->chip_data */
> +static void hv_teardown_msi_irq(struct pci_dev *pdev, struct irq_data *irqd)
>  {
> -	struct hv_interrupt_entry old_entry;
> +	struct hv_interrupt_entry irq_entry;
>  	struct msi_msg msg;
>  
>  	if (!irqd->chip_data) {
> @@ -281,13 +286,13 @@ static void hv_teardown_msi_irq(struct pci_dev *dev, struct irq_data *irqd)
>  		return;
>  	}
>  
> -	old_entry = *(struct hv_interrupt_entry *)irqd->chip_data;
> -	entry_to_msi_msg(&old_entry, &msg);
> +	irq_entry = *(struct hv_interrupt_entry *)irqd->chip_data;
> +	entry_to_msi_msg(&irq_entry, &msg);
>  
>  	kfree(irqd->chip_data);
>  	irqd->chip_data = NULL;
>  
> -	(void)hv_unmap_msi_interrupt(dev, &old_entry);
> +	(void)hv_unmap_msi_interrupt(pdev, &irq_entry);
>  }
>  
>  /*
> @@ -302,7 +307,8 @@ static struct irq_chip hv_pci_msi_controller = {
>  };
>  
>  static bool hv_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
> -				 struct irq_domain *real_parent, struct msi_domain_info *info)
> +				 struct irq_domain *real_parent,
> +				 struct msi_domain_info *info)
>  {
>  	struct irq_chip *chip = info->chip;
>  
> @@ -317,7 +323,8 @@ static bool hv_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
>  }
>  
>  #define HV_MSI_FLAGS_SUPPORTED	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
> -#define HV_MSI_FLAGS_REQUIRED	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)
> +#define HV_MSI_FLAGS_REQUIRED	(MSI_FLAG_USE_DEF_DOM_OPS |	\
> +				 MSI_FLAG_USE_DEF_CHIP_OPS)
>  
>  static struct msi_parent_ops hv_msi_parent_ops = {
>  	.supported_flags	= HV_MSI_FLAGS_SUPPORTED,
> @@ -329,14 +336,13 @@ static struct msi_parent_ops hv_msi_parent_ops = {
>  	.init_dev_msi_info	= hv_init_dev_msi_info,
>  };
>  
> -static int hv_msi_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
> -			       void *arg)
> +static int hv_msi_domain_alloc(struct irq_domain *d, unsigned int virq,
> +			       unsigned int nr_irqs, void *arg)
>  {
>  	/*
> -	 * TODO: The allocation bits of hv_irq_compose_msi_msg(), i.e. everything except
> -	 * entry_to_msi_msg() should be in here.
> +	 * TODO: The allocation bits of hv_irq_compose_msi_msg(), i.e.
> +	 *	 everything except entry_to_msi_msg() should be in here.
>  	 */
> -
>  	int ret;
>  
>  	ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, arg);
> @@ -344,13 +350,15 @@ static int hv_msi_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned
>  		return ret;
>  
>  	for (int i = 0; i < nr_irqs; ++i) {
> -		irq_domain_set_info(d, virq + i, 0, &hv_pci_msi_controller, NULL,
> -				    handle_edge_irq, NULL, "edge");
> +		irq_domain_set_info(d, virq + i, 0, &hv_pci_msi_controller,
> +				    NULL, handle_edge_irq, NULL, "edge");
>  	}
> +
>  	return 0;
>  }
>  
> -static void hv_msi_domain_free(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs)
> +static void hv_msi_domain_free(struct irq_domain *d, unsigned int virq,
> +			       unsigned int nr_irqs)
>  {
>  	for (int i = 0; i < nr_irqs; ++i) {
>  		struct irq_data *irqd = irq_domain_get_irq_data(d, virq);
> @@ -362,6 +370,7 @@ static void hv_msi_domain_free(struct irq_domain *d, unsigned int virq, unsigned
>  
>  		hv_teardown_msi_irq(to_pci_dev(desc->dev), irqd);
>  	}
> +
>  	irq_domain_free_irqs_top(d, virq, nr_irqs);
>  }
>  
> @@ -394,25 +403,25 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  
>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
>  {
> -	union hv_device_id device_id;
> +	union hv_device_id hv_devid;
>  
> -	device_id.as_uint64 = 0;
> -	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> -	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +	hv_devid.as_uint64 = 0;
> +	hv_devid.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	hv_devid.ioapic.ioapic_id = (u8)ioapic_id;
>  
> -	return hv_unmap_interrupt(device_id.as_uint64, entry);
> +	return hv_unmap_interrupt(hv_devid.as_uint64, entry);
>  }
>  EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
>  
>  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
>  		struct hv_interrupt_entry *entry)
>  {
> -	union hv_device_id device_id;
> +	union hv_device_id hv_devid;
>  
> -	device_id.as_uint64 = 0;
> -	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> -	device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +	hv_devid.as_uint64 = 0;
> +	hv_devid.device_type = HV_DEVICE_TYPE_IOAPIC;
> +	hv_devid.ioapic.ioapic_id = (u8)ioapic_id;
>  
> -	return hv_map_interrupt(device_id, level, cpu, vector, entry);
> +	return hv_map_interrupt(hv_devid, level, cpu, vector, entry);
>  }
>  EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> -- 
> 2.51.2.vfs.0.1
> 

Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>


* [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
From: Stanislav Kinsburskii @ 2026-02-05 18:42 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177031674698.186911.179832109354647364.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

When creating guest partition objects, the hypervisor may fail to
allocate root partition pages and return an insufficient memory status.
In this case, deposit memory using the root partition ID instead.

Note: This error should never occur in a guest or L1VH partition context.
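The new cases in hv_deposit_memory_node() can be read as a small decision function. The following is a hypothetical userspace sketch, not the kernel implementation. The status codes come from include/hyperv/hvgdk_mini.h and MODEL_HV_PARTITION_ID_SELF mirrors the kernel's HV_PARTITION_ID_SELF ((u64)-1). MODEL_MAX_CONTIGUOUS_PAGES is a placeholder, since the value of HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is not shown here, and default_pages stands in for whatever page count the function uses before the switch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Status codes as listed in include/hyperv/hvgdk_mini.h. */
#define MODEL_HV_STATUS_INSUFFICIENT_MEMORY			0x0B
#define MODEL_HV_STATUS_INSUFFICIENT_ROOT_MEMORY		0x73
#define MODEL_HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY		0x75
#define MODEL_HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY	0x83

#define MODEL_HV_PARTITION_ID_SELF	((uint64_t)-1)
/* Placeholder only: the real HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is not shown. */
#define MODEL_MAX_CONTIGUOUS_PAGES	8

struct model_deposit {
	int err;			/* 0 or -ENOMEM */
	uint64_t partition_id;		/* partition that receives the pages */
	int num_pages;
};

/* Hypothetical model of the status switch in hv_deposit_memory_node(). */
static struct model_deposit model_deposit_target(uint16_t status, bool is_root,
						 uint64_t partition_id,
						 int default_pages)
{
	struct model_deposit d = {
		.err = 0,
		.partition_id = partition_id,
		.num_pages = default_pages,
	};

	assert(default_pages > 0);

	switch (status) {
	case MODEL_HV_STATUS_INSUFFICIENT_MEMORY:
		break;
	case MODEL_HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
		d.num_pages = MODEL_MAX_CONTIGUOUS_PAGES;
		break;
	case MODEL_HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		d.num_pages = MODEL_MAX_CONTIGUOUS_PAGES;
		/* fall through */
	case MODEL_HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
		if (!is_root) {
			d.err = -ENOMEM;	/* unexpected outside the root */
			break;
		}
		d.partition_id = MODEL_HV_PARTITION_ID_SELF;	/* deposit to root */
		break;
	default:
		d.err = -ENOMEM;
		break;
	}
	return d;
}
```

For example, model_deposit_target(0x83, true, 42, 1) yields the placeholder contiguous page count and MODEL_HV_PARTITION_ID_SELF, whereas the same status on a non-root partition yields -ENOMEM.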

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/hv_common.c      |    2 +
 drivers/hv/hv_proc.c        |   14 ++++++++++
 include/hyperv/hvgdk_mini.h |   58 ++++++++++++++++++++++---------------------
 3 files changed, 46 insertions(+), 28 deletions(-)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index f20596276662..6b67ac616789 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -794,6 +794,8 @@ static const struct hv_status_info hv_status_infos[] = {
 	_STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE,	-EIO),
 	_STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY,		-ENOMEM),
 	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,	-ENOMEM),
+	_STATUS_INFO(HV_STATUS_INSUFFICIENT_ROOT_MEMORY,	-ENOMEM),
+	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY,	-ENOMEM),
 	_STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID,		-EINVAL),
 	_STATUS_INFO(HV_STATUS_INVALID_VP_INDEX,		-EINVAL),
 	_STATUS_INFO(HV_STATUS_NOT_FOUND,			-EIO),
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 181f6d02bce3..5f4fd9c3231c 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -121,6 +121,18 @@ int hv_deposit_memory_node(int node, u64 partition_id,
 	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
 		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
 		break;
+
+	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
+		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
+		fallthrough;
+	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
+		if (!hv_root_partition()) {
+			hv_status_err(hv_status, "Unexpected root memory deposit\n");
+			return -ENOMEM;
+		}
+		partition_id = HV_PARTITION_ID_SELF;
+		break;
+
 	default:
 		hv_status_err(hv_status, "Unexpected!\n");
 		return -ENOMEM;
@@ -134,6 +146,8 @@ bool hv_result_needs_memory(u64 status)
 	switch (hv_result(status)) {
 	case HV_STATUS_INSUFFICIENT_MEMORY:
 	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
+	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
+	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
 		return true;
 	}
 	return false;
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 99ea0d03e657..50f5a1419052 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -14,34 +14,36 @@ struct hv_u128 {
 } __packed;
 
 /* NOTE: when adding below, update hv_result_to_string() */
-#define HV_STATUS_SUCCESS			    0x0
-#define HV_STATUS_INVALID_HYPERCALL_CODE	    0x2
-#define HV_STATUS_INVALID_HYPERCALL_INPUT	    0x3
-#define HV_STATUS_INVALID_ALIGNMENT		    0x4
-#define HV_STATUS_INVALID_PARAMETER		    0x5
-#define HV_STATUS_ACCESS_DENIED			    0x6
-#define HV_STATUS_INVALID_PARTITION_STATE	    0x7
-#define HV_STATUS_OPERATION_DENIED		    0x8
-#define HV_STATUS_UNKNOWN_PROPERTY		    0x9
-#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE	    0xA
-#define HV_STATUS_INSUFFICIENT_MEMORY		    0xB
-#define HV_STATUS_INVALID_PARTITION_ID		    0xD
-#define HV_STATUS_INVALID_VP_INDEX		    0xE
-#define HV_STATUS_NOT_FOUND			    0x10
-#define HV_STATUS_INVALID_PORT_ID		    0x11
-#define HV_STATUS_INVALID_CONNECTION_ID		    0x12
-#define HV_STATUS_INSUFFICIENT_BUFFERS		    0x13
-#define HV_STATUS_NOT_ACKNOWLEDGED		    0x14
-#define HV_STATUS_INVALID_VP_STATE		    0x15
-#define HV_STATUS_NO_RESOURCES			    0x1D
-#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED   0x20
-#define HV_STATUS_INVALID_LP_INDEX		    0x41
-#define HV_STATUS_INVALID_REGISTER_VALUE	    0x50
-#define HV_STATUS_OPERATION_FAILED		    0x71
-#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY    0x75
-#define HV_STATUS_TIME_OUT			    0x78
-#define HV_STATUS_CALL_PENDING			    0x79
-#define HV_STATUS_VTL_ALREADY_ENABLED		    0x86
+#define HV_STATUS_SUCCESS				0x0
+#define HV_STATUS_INVALID_HYPERCALL_CODE		0x2
+#define HV_STATUS_INVALID_HYPERCALL_INPUT		0x3
+#define HV_STATUS_INVALID_ALIGNMENT			0x4
+#define HV_STATUS_INVALID_PARAMETER			0x5
+#define HV_STATUS_ACCESS_DENIED				0x6
+#define HV_STATUS_INVALID_PARTITION_STATE		0x7
+#define HV_STATUS_OPERATION_DENIED			0x8
+#define HV_STATUS_UNKNOWN_PROPERTY			0x9
+#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE		0xA
+#define HV_STATUS_INSUFFICIENT_MEMORY			0xB
+#define HV_STATUS_INVALID_PARTITION_ID			0xD
+#define HV_STATUS_INVALID_VP_INDEX			0xE
+#define HV_STATUS_NOT_FOUND				0x10
+#define HV_STATUS_INVALID_PORT_ID			0x11
+#define HV_STATUS_INVALID_CONNECTION_ID			0x12
+#define HV_STATUS_INSUFFICIENT_BUFFERS			0x13
+#define HV_STATUS_NOT_ACKNOWLEDGED			0x14
+#define HV_STATUS_INVALID_VP_STATE			0x15
+#define HV_STATUS_NO_RESOURCES				0x1D
+#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED	0x20
+#define HV_STATUS_INVALID_LP_INDEX			0x41
+#define HV_STATUS_INVALID_REGISTER_VALUE		0x50
+#define HV_STATUS_OPERATION_FAILED			0x71
+#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY		0x73
+#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY	0x75
+#define HV_STATUS_TIME_OUT				0x78
+#define HV_STATUS_CALL_PENDING				0x79
+#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY	0x83
+#define HV_STATUS_VTL_ALREADY_ENABLED			0x86
 
 /*
  * The Hyper-V TimeRefCount register and the TSC

* [PATCH v3 3/4] mshv: Handle insufficient contiguous memory hypervisor status
From: Stanislav Kinsburskii @ 2026-02-05 18:42 UTC
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177031674698.186911.179832109354647364.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

The HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY status indicates that the
hypervisor lacks sufficient contiguous memory for its internal allocations.

When this status is encountered, allocate and deposit
HV_MAX_CONTIGUOUS_ALLOCATION_PAGES contiguous pages to the hypervisor.
HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is defined in the hypervisor headers,
and a deposit of this size always satisfies the hypervisor's requirements.
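The status-to-page-count decision this patch adds to hv_deposit_memory_node() can be modeled in user space as a pure function. This is a sketch only: sketch_deposit_pages_for() is a hypothetical name, the status values and HV_MAX_CONTIGUOUS_ALLOCATION_PAGES are copied from the hunks in this patch, and masking with 0xFFFF assumes the status code sits in the low 16 bits of the hypercall result, as the kernel's hv_result() extracts it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values copied from hvgdk_mini.h / hvhdk_mini.h in this series. */
#define HV_STATUS_INVALID_PARAMETER               0x5
#define HV_STATUS_INSUFFICIENT_MEMORY             0xB
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY  0x75
#define HV_MAX_CONTIGUOUS_ALLOCATION_PAGES        8

/*
 * Hypothetical helper: returns how many pages to deposit for a given
 * hypercall status, or -ENOMEM for a status that does not ask for memory.
 */
static int sketch_deposit_pages_for(uint64_t status)
{
	uint32_t num_pages = 1;		/* default: a single page */

	switch (status & 0xFFFF) {	/* assumed low-16-bit status code */
	case HV_STATUS_INSUFFICIENT_MEMORY:
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
		/* One maximum-size deposit covers any contiguous request. */
		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
		break;
	default:
		return -ENOMEM;		/* unexpected status: do not deposit */
	}
	return (int)num_pages;
}
```

Always depositing the maximum size trades a few extra pages for never having to query the hypervisor for the exact contiguous size it wanted.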

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/hv_common.c      |    1 +
 drivers/hv/hv_proc.c        |    4 ++++
 include/hyperv/hvgdk_mini.h |    1 +
 include/hyperv/hvhdk_mini.h |    2 ++
 4 files changed, 8 insertions(+)

diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index f1c17fb60dc1..f20596276662 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -793,6 +793,7 @@ static const struct hv_status_info hv_status_infos[] = {
 	_STATUS_INFO(HV_STATUS_UNKNOWN_PROPERTY,		-EIO),
 	_STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE,	-EIO),
 	_STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY,		-ENOMEM),
+	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,	-ENOMEM),
 	_STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID,		-EINVAL),
 	_STATUS_INFO(HV_STATUS_INVALID_VP_INDEX,		-EINVAL),
 	_STATUS_INFO(HV_STATUS_NOT_FOUND,			-EIO),
diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index 53622e5886b8..181f6d02bce3 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -118,6 +118,9 @@ int hv_deposit_memory_node(int node, u64 partition_id,
 	switch (hv_result(hv_status)) {
 	case HV_STATUS_INSUFFICIENT_MEMORY:
 		break;
+	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
+		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
+		break;
 	default:
 		hv_status_err(hv_status, "Unexpected!\n");
 		return -ENOMEM;
@@ -130,6 +133,7 @@ bool hv_result_needs_memory(u64 status)
 {
 	switch (hv_result(status)) {
 	case HV_STATUS_INSUFFICIENT_MEMORY:
+	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
 		return true;
 	}
 	return false;
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 30fbbde81c5c..99ea0d03e657 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -38,6 +38,7 @@ struct hv_u128 {
 #define HV_STATUS_INVALID_LP_INDEX		    0x41
 #define HV_STATUS_INVALID_REGISTER_VALUE	    0x50
 #define HV_STATUS_OPERATION_FAILED		    0x71
+#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY    0x75
 #define HV_STATUS_TIME_OUT			    0x78
 #define HV_STATUS_CALL_PENDING			    0x79
 #define HV_STATUS_VTL_ALREADY_ENABLED		    0x86
diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
index c0300910808b..091c03e26046 100644
--- a/include/hyperv/hvhdk_mini.h
+++ b/include/hyperv/hvhdk_mini.h
@@ -7,6 +7,8 @@
 
 #include "hvgdk_mini.h"
 
+#define HV_MAX_CONTIGUOUS_ALLOCATION_PAGES	8
+
 /*
  * Doorbell connection_info flags.
  */

* [PATCH v3 2/4] mshv: Introduce hv_deposit_memory helper functions
From: Stanislav Kinsburskii @ 2026-02-05 18:42 UTC
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177031674698.186911.179832109354647364.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

Introduce hv_deposit_memory_node() and hv_deposit_memory() helper
functions to deposit memory with proper error handling.

The new hv_deposit_memory_node() function takes the hypervisor status
as a parameter and validates it before depositing pages. It checks for
HV_STATUS_INSUFFICIENT_MEMORY specifically and returns an error for
unexpected status codes.
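The callers touched below all share a deposit-and-retry loop: issue the hypercall, stop on anything other than an out-of-memory status, otherwise deposit and try again. A minimal user-space sketch of that protocol, assuming a fake hypervisor (struct fake_hv and every fake_* function are hypothetical) that succeeds once enough pages have been deposited:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define HV_STATUS_SUCCESS              0x0
#define HV_STATUS_INSUFFICIENT_MEMORY  0xB

/* Hypothetical hypervisor model: succeeds after pages_needed deposits. */
struct fake_hv {
	unsigned int pages_needed;
	unsigned int pages_deposited;
};

static uint64_t fake_hypercall(struct fake_hv *hv)
{
	return hv->pages_deposited >= hv->pages_needed ?
	       HV_STATUS_SUCCESS : HV_STATUS_INSUFFICIENT_MEMORY;
}

/* Stand-in for hv_deposit_memory_node(): validate the status, then deposit. */
static int fake_deposit_memory(struct fake_hv *hv, uint64_t status)
{
	uint32_t num_pages = 1;

	switch (status & 0xFFFF) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
		break;
	default:
		fprintf(stderr, "Unexpected status %#llx\n",
			(unsigned long long)status);
		return -ENOMEM;
	}
	hv->pages_deposited += num_pages;
	return 0;
}

/* The do { ... } while (!ret) pattern used by the hypercall wrappers. */
static int fake_call_with_deposit(struct fake_hv *hv, unsigned int *attempts)
{
	uint64_t status;
	int ret;

	*attempts = 0;
	do {
		(*attempts)++;
		status = fake_hypercall(hv);
		if ((status & 0xFFFF) != HV_STATUS_INSUFFICIENT_MEMORY)
			/* Simplification: the kernel uses hv_result_to_errno(). */
			return (status & 0xFFFF) == HV_STATUS_SUCCESS ? 0 : -EIO;
		ret = fake_deposit_memory(hv, status);
	} while (!ret);

	return ret;
}
```

With pages_needed set to 3, the loop deposits three single pages and succeeds on the fourth attempt; centralizing the status check in the deposit helper is what lets later patches vary num_pages without touching every caller.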

This is a preparatory patch for supporting new out-of-memory error codes.
No functional changes intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/hv_proc.c           |   21 +++++++++++++++++++--
 drivers/hv/mshv_root_hv_call.c |   25 +++++++++----------------
 drivers/hv/mshv_root_main.c    |    3 +--
 include/asm-generic/mshyperv.h |   10 ++++++++++
 4 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index e53204b9e05d..53622e5886b8 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -110,6 +110,22 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 }
 EXPORT_SYMBOL_GPL(hv_call_deposit_pages);
 
+int hv_deposit_memory_node(int node, u64 partition_id,
+			   u64 hv_status)
+{
+	u32 num_pages = 1;
+
+	switch (hv_result(hv_status)) {
+	case HV_STATUS_INSUFFICIENT_MEMORY:
+		break;
+	default:
+		hv_status_err(hv_status, "Unexpected!\n");
+		return -ENOMEM;
+	}
+	return hv_call_deposit_pages(node, partition_id, num_pages);
+}
+EXPORT_SYMBOL_GPL(hv_deposit_memory_node);
+
 bool hv_result_needs_memory(u64 status)
 {
 	switch (hv_result(status)) {
@@ -155,7 +171,8 @@ int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
 			}
 			break;
 		}
-		ret = hv_call_deposit_pages(node, hv_current_partition_id, 1);
+		ret = hv_deposit_memory_node(node, hv_current_partition_id,
+					     status);
 	} while (!ret);
 
 	return ret;
@@ -197,7 +214,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
 			}
 			break;
 		}
-		ret = hv_call_deposit_pages(node, partition_id, 1);
+		ret = hv_deposit_memory_node(node, partition_id, status);
 
 	} while (!ret);
 
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index 1c4a2dbf49c0..7f91096f95a8 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -123,8 +123,7 @@ int hv_call_create_partition(u64 flags,
 			break;
 		}
 		local_irq_restore(irq_flags);
-		ret = hv_call_deposit_pages(NUMA_NO_NODE,
-					    hv_current_partition_id, 1);
+		ret = hv_deposit_memory(hv_current_partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -151,7 +150,7 @@ int hv_call_initialize_partition(u64 partition_id)
 			ret = hv_result_to_errno(status);
 			break;
 		}
-		ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id, 1);
+		ret = hv_deposit_memory(partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -465,8 +464,7 @@ int hv_call_get_vp_state(u32 vp_index, u64 partition_id,
 		}
 		local_irq_restore(flags);
 
-		ret = hv_call_deposit_pages(NUMA_NO_NODE,
-					    partition_id, 1);
+		ret = hv_deposit_memory(partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -525,8 +523,7 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
 		}
 		local_irq_restore(flags);
 
-		ret = hv_call_deposit_pages(NUMA_NO_NODE,
-					    partition_id, 1);
+		ret = hv_deposit_memory(partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -573,7 +570,7 @@ static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
 
 		local_irq_restore(flags);
 
-		ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id, 1);
+		ret = hv_deposit_memory(partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -722,8 +719,7 @@ hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
 			ret = hv_result_to_errno(status);
 			break;
 		}
-		ret = hv_call_deposit_pages(NUMA_NO_NODE, port_partition_id, 1);
-
+		ret = hv_deposit_memory(port_partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -776,8 +772,7 @@ hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
 			ret = hv_result_to_errno(status);
 			break;
 		}
-		ret = hv_call_deposit_pages(NUMA_NO_NODE,
-					    connection_partition_id, 1);
+		ret = hv_deposit_memory(connection_partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -855,8 +850,7 @@ static int hv_call_map_stats_page2(enum hv_stats_object_type type,
 			break;
 		}
 
-		ret = hv_call_deposit_pages(NUMA_NO_NODE,
-					    hv_current_partition_id, 1);
+		ret = hv_deposit_memory(hv_current_partition_id, status);
 	} while (!ret);
 
 	return ret;
@@ -929,8 +923,7 @@ hv_call_map_stats_page(enum hv_stats_object_type type,
 			return hv_result_to_errno(status);
 		}
 
-		ret = hv_call_deposit_pages(NUMA_NO_NODE,
-					    hv_current_partition_id, 1);
+		ret = hv_deposit_memory(hv_current_partition_id, status);
 		if (ret)
 			return ret;
 	} while (!ret);
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index f5525651a565..cf58d9954638 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -254,8 +254,7 @@ static int mshv_ioctl_passthru_hvcall(struct mshv_partition *partition,
 		if (!hv_result_needs_memory(status))
 			ret = hv_result_to_errno(status);
 		else
-			ret = hv_call_deposit_pages(NUMA_NO_NODE,
-						    pt_id, 1);
+			ret = hv_deposit_memory(pt_id, status);
 	} while (!ret);
 
 	args.status = hv_result(status);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 452426d5b2ab..d37b68238c97 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -344,6 +344,7 @@ static inline bool hv_parent_partition(void)
 }
 
 bool hv_result_needs_memory(u64 status);
+int hv_deposit_memory_node(int node, u64 partition_id, u64 status);
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -353,6 +354,10 @@ static inline bool hv_root_partition(void) { return false; }
 static inline bool hv_l1vh_partition(void) { return false; }
 static inline bool hv_parent_partition(void) { return false; }
 static inline bool hv_result_needs_memory(u64 status) { return false; }
+static inline int hv_deposit_memory_node(int node, u64 partition_id, u64 status)
+{
+	return -EOPNOTSUPP;
+}
 static inline int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 {
 	return -EOPNOTSUPP;
@@ -367,6 +372,11 @@ static inline int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u3
 }
 #endif /* CONFIG_MSHV_ROOT */
 
+static inline int hv_deposit_memory(u64 partition_id, u64 status)
+{
+	return hv_deposit_memory_node(NUMA_NO_NODE, partition_id, status);
+}
+
 #if IS_ENABLED(CONFIG_HYPERV_VTL_MODE)
 u8 __init get_vtl(void);
 #else

* [PATCH v3 1/4] mshv: Introduce hv_result_needs_memory() helper function
From: Stanislav Kinsburskii @ 2026-02-05 18:42 UTC
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel
In-Reply-To: <177031674698.186911.179832109354647364.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

Replace direct comparisons of hv_result(status) against
HV_STATUS_INSUFFICIENT_MEMORY with a new hv_result_needs_memory() helper
function. This improves code readability and provides a consistent,
extensible interface for checking out-of-memory conditions in hypercall
results.
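A user-space sketch of what the helper classifies, assuming (as the kernel's hv_result() does) that the status code occupies bits 15:0 of the 64-bit hypercall result and that bits 32-43 carry the rep-completion count. The sketch_* names are hypothetical; the status values are the ones in hvgdk_mini.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status codes as defined in include/hyperv/hvgdk_mini.h. */
#define HV_STATUS_SUCCESS              0x0
#define HV_STATUS_INVALID_PARAMETER    0x5
#define HV_STATUS_INSUFFICIENT_MEMORY  0xB

/* Assumption: mirrors hv_result(), keeping only the low 16 bits. */
static unsigned int sketch_hv_result(uint64_t status)
{
	return (unsigned int)(status & 0xFFFFu);
}

/*
 * Hypothetical stand-in for hv_result_needs_memory(): a single switch
 * that later patches can extend with further out-of-memory statuses.
 */
static bool sketch_needs_memory(uint64_t status)
{
	switch (sketch_hv_result(status)) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
		return true;
	}
	return false;
}
```

Because the upper bits are masked off, a rep hypercall that reports partial progress alongside HV_STATUS_INSUFFICIENT_MEMORY is still classified as needing memory.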

No functional changes intended.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/hv_proc.c           |   14 ++++++++++++--
 drivers/hv/mshv_root_hv_call.c |   25 ++++++++++++-------------
 drivers/hv/mshv_root_main.c    |    2 +-
 include/asm-generic/mshyperv.h |    3 +++
 4 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
index fbb4eb3901bb..e53204b9e05d 100644
--- a/drivers/hv/hv_proc.c
+++ b/drivers/hv/hv_proc.c
@@ -110,6 +110,16 @@ int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 }
 EXPORT_SYMBOL_GPL(hv_call_deposit_pages);
 
+bool hv_result_needs_memory(u64 status)
+{
+	switch (hv_result(status)) {
+	case HV_STATUS_INSUFFICIENT_MEMORY:
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL_GPL(hv_result_needs_memory);
+
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
 {
 	struct hv_input_add_logical_processor *input;
@@ -137,7 +147,7 @@ int hv_call_add_logical_proc(int node, u32 lp_index, u32 apic_id)
 					 input, output);
 		local_irq_restore(flags);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			if (!hv_result_success(status)) {
 				hv_status_err(status, "cpu %u apic ID: %u\n",
 					      lp_index, apic_id);
@@ -179,7 +189,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags)
 		status = hv_do_hypercall(HVCALL_CREATE_VP, input, NULL);
 		local_irq_restore(irq_flags);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			if (!hv_result_success(status)) {
 				hv_status_err(status, "vcpu: %u, lp: %u\n",
 					      vp_index, flags);
diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
index daee036e48bc..1c4a2dbf49c0 100644
--- a/drivers/hv/mshv_root_hv_call.c
+++ b/drivers/hv/mshv_root_hv_call.c
@@ -115,7 +115,7 @@ int hv_call_create_partition(u64 flags,
 		status = hv_do_hypercall(HVCALL_CREATE_PARTITION,
 					 input, output);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			if (hv_result_success(status))
 				*partition_id = output->partition_id;
 			local_irq_restore(irq_flags);
@@ -147,7 +147,7 @@ int hv_call_initialize_partition(u64 partition_id)
 		status = hv_do_fast_hypercall8(HVCALL_INITIALIZE_PARTITION,
 					       *(u64 *)&input);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			ret = hv_result_to_errno(status);
 			break;
 		}
@@ -239,7 +239,7 @@ static int hv_do_map_gpa_hcall(u64 partition_id, u64 gfn, u64 page_struct_count,
 
 		completed = hv_repcomp(status);
 
-		if (hv_result(status) == HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (hv_result_needs_memory(status)) {
 			ret = hv_call_deposit_pages(NUMA_NO_NODE, partition_id,
 						    HV_MAP_GPA_DEPOSIT_PAGES);
 			if (ret)
@@ -455,7 +455,7 @@ int hv_call_get_vp_state(u32 vp_index, u64 partition_id,
 
 		status = hv_do_hypercall(control, input, output);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			if (hv_result_success(status) && ret_output)
 				memcpy(ret_output, output, sizeof(*output));
 
@@ -518,7 +518,7 @@ int hv_call_set_vp_state(u32 vp_index, u64 partition_id,
 
 		status = hv_do_hypercall(control, input, NULL);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			local_irq_restore(flags);
 			ret = hv_result_to_errno(status);
 			break;
@@ -563,7 +563,7 @@ static int hv_call_map_vp_state_page(u64 partition_id, u32 vp_index, u32 type,
 		status = hv_do_hypercall(HVCALL_MAP_VP_STATE_PAGE, input,
 					 output);
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			if (hv_result_success(status))
 				*state_page = pfn_to_page(output->map_location);
 			local_irq_restore(flags);
@@ -718,7 +718,7 @@ hv_call_create_port(u64 port_partition_id, union hv_port_id port_id,
 		if (hv_result_success(status))
 			break;
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			ret = hv_result_to_errno(status);
 			break;
 		}
@@ -772,7 +772,7 @@ hv_call_connect_port(u64 port_partition_id, union hv_port_id port_id,
 		if (hv_result_success(status))
 			break;
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			ret = hv_result_to_errno(status);
 			break;
 		}
@@ -850,7 +850,7 @@ static int hv_call_map_stats_page2(enum hv_stats_object_type type,
 		if (!ret)
 			break;
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			hv_status_debug(status, "\n");
 			break;
 		}
@@ -899,7 +899,7 @@ hv_call_map_stats_page(enum hv_stats_object_type type,
 	struct hv_input_map_stats_page *input;
 	struct hv_output_map_stats_page *output;
 	u64 status, pfn;
-	int hv_status, ret = 0;
+	int ret = 0;
 
 	do {
 		local_irq_save(flags);
@@ -915,13 +915,12 @@ hv_call_map_stats_page(enum hv_stats_object_type type,
 
 		local_irq_restore(flags);
 
-		hv_status = hv_result(status);
-		if (hv_status != HV_STATUS_INSUFFICIENT_MEMORY) {
+		if (!hv_result_needs_memory(status)) {
 			if (hv_result_success(status))
 				break;
 
 			if (hv_stats_get_area_type(type, identity) == HV_STATS_AREA_PARENT &&
-			    hv_status == HV_STATUS_INVALID_PARAMETER) {
+			    hv_result(status) == HV_STATUS_INVALID_PARAMETER) {
 				*addr = NULL;
 				return 0;
 			}
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index b429bb1fdffa..f5525651a565 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -251,7 +251,7 @@ static int mshv_ioctl_passthru_hvcall(struct mshv_partition *partition,
 		if (hv_result_success(status))
 			break;
 
-		if (hv_result(status) != HV_STATUS_INSUFFICIENT_MEMORY)
+		if (!hv_result_needs_memory(status))
 			ret = hv_result_to_errno(status);
 		else
 			ret = hv_call_deposit_pages(NUMA_NO_NODE,
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index ecedab554c80..452426d5b2ab 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -342,6 +342,8 @@ static inline bool hv_parent_partition(void)
 {
 	return hv_root_partition() || hv_l1vh_partition();
 }
+
+bool hv_result_needs_memory(u64 status);
 int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages);
 int hv_call_add_logical_proc(int node, u32 lp_index, u32 acpi_id);
 int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
@@ -350,6 +352,7 @@ int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags);
 static inline bool hv_root_partition(void) { return false; }
 static inline bool hv_l1vh_partition(void) { return false; }
 static inline bool hv_parent_partition(void) { return false; }
+static inline bool hv_result_needs_memory(u64 status) { return false; }
 static inline int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
 {
 	return -EOPNOTSUPP;

* [PATCH v3 0/4] Improve Hyper-V memory deposit error handling
From: Stanislav Kinsburskii @ 2026-02-05 18:42 UTC
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

This series extends the MSHV driver to properly handle additional
memory-related error codes from the Microsoft Hypervisor by depositing
memory pages when needed.

Currently, when the hypervisor returns HV_STATUS_INSUFFICIENT_MEMORY
during partition creation, the driver calls hv_call_deposit_pages() to
provide the necessary memory. However, there are other memory-related
error codes that indicate the hypervisor needs additional memory
resources, but the driver does not attempt to deposit pages for these
cases.

This series introduces a dedicated helper function to identify all
memory-related error codes (HV_STATUS_INSUFFICIENT_MEMORY,
HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,
HV_STATUS_INSUFFICIENT_ROOT_MEMORY, and
HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY) and ensures the driver
attempts to deposit pages for all of them via the new hv_deposit_memory()
helper.

With these changes, partition creation becomes more robust by handling
all scenarios where the hypervisor requires additional memory deposits.

v3:
- Fix uninitialized num_pages variable in hv_deposit_memory_node() in case
  of HV_STATUS_INSUFFICIENT_ROOT_MEMORY status

v2:
- Rename hv_result_oom() to hv_result_needs_memory()
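The v3 fix is easiest to see in the final shape of the switch. The sketch below assumes the v3 code gives num_pages a default of one page (as the patch 2/4 hunk in this thread does) and keeps the v2 fallthrough from the contiguous-root case into the root case. sketch_plan_deposit(), struct deposit_plan, and SKETCH_PARTITION_ID_SELF are hypothetical stand-ins, and the is_root parameter replaces hv_root_partition().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define HV_STATUS_INSUFFICIENT_MEMORY                  0xB
#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY             0x73
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY       0x75
#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY  0x83
#define HV_MAX_CONTIGUOUS_ALLOCATION_PAGES             8
/* Assumption: placeholder for the kernel's HV_PARTITION_ID_SELF. */
#define SKETCH_PARTITION_ID_SELF                       ((uint64_t)-1)

struct deposit_plan {
	uint64_t partition_id;	/* partition that receives the pages */
	uint32_t num_pages;	/* how many pages to deposit */
};

static int sketch_plan_deposit(uint64_t status, uint64_t partition_id,
			       bool is_root, struct deposit_plan *plan)
{
	/* Initialized up front, so reaching any case directly is defined. */
	uint32_t num_pages = 1;

	switch (status & 0xFFFF) {
	case HV_STATUS_INSUFFICIENT_MEMORY:
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
		break;
	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
		/* fall through */
	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
		/* Only the root partition can supply root memory. */
		if (!is_root)
			return -ENOMEM;
		partition_id = SKETCH_PARTITION_ID_SELF;
		break;
	default:
		return -ENOMEM;
	}
	plan->partition_id = partition_id;
	plan->num_pages = num_pages;
	return 0;
}
```

In v2, jumping straight to the HV_STATUS_INSUFFICIENT_ROOT_MEMORY case skipped the only assignment of num_pages; the default initializer closes that path while the fallthrough still lets the contiguous-root case override the count.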

---

Stanislav Kinsburskii (4):
      mshv: Introduce hv_result_needs_memory() helper function
      mshv: Introduce hv_deposit_memory helper functions
      mshv: Handle insufficient contiguous memory hypervisor status
      mshv: Handle insufficient root memory hypervisor statuses

 drivers/hv/hv_common.c         |    3 ++
 drivers/hv/hv_proc.c           |   53 ++++++++++++++++++++++++++++++++++---
 drivers/hv/mshv_root_hv_call.c |   50 +++++++++++++++--------------------
 drivers/hv/mshv_root_main.c    |    5 +---
 include/asm-generic/mshyperv.h |   13 +++++++++
 include/hyperv/hvgdk_mini.h    |   57 +++++++++++++++++++++-------------------
 include/hyperv/hvhdk_mini.h    |    2 +
 7 files changed, 120 insertions(+), 63 deletions(-)

* Re: [PATCH v2 4/4] mshv: Handle insufficient root memory hypervisor statuses
From: Stanislav Kinsburskii @ 2026-02-05 18:36 UTC
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel
In-Reply-To: <s6orh5waw2djyiv5w6yzwiaxv7rcja6iua6kbzldthsmceelqv@dnf2zr2m74we>

On Thu, Feb 05, 2026 at 11:37:49PM +0530, Anirudh Rayabharam wrote:
> On Mon, Feb 02, 2026 at 05:59:14PM +0000, Stanislav Kinsburskii wrote:
> > When creating guest partition objects, the hypervisor may fail to
> > allocate root partition pages and return an insufficient memory status.
> > In this case, deposit memory using the root partition ID instead.
> > 
> > Note: This error should never occur in a guest or L1VH partition context.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/hv_common.c      |    2 +
> >  drivers/hv/hv_proc.c        |   14 ++++++++++
> >  include/hyperv/hvgdk_mini.h |   58 ++++++++++++++++++++++---------------------
> >  3 files changed, 46 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> > index c7f63c9de503..cab0d1733607 100644
> > --- a/drivers/hv/hv_common.c
> > +++ b/drivers/hv/hv_common.c
> > @@ -792,6 +792,8 @@ static const struct hv_status_info hv_status_infos[] = {
> >  	_STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE,	-EIO),
> >  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY,		-ENOMEM),
> >  	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY,	-ENOMEM),
> > +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_ROOT_MEMORY,	-ENOMEM),
> > +	_STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY,	-ENOMEM),
> >  	_STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID,		-EINVAL),
> >  	_STATUS_INFO(HV_STATUS_INVALID_VP_INDEX,		-EINVAL),
> >  	_STATUS_INFO(HV_STATUS_NOT_FOUND,			-EIO),
> > diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> > index dfa27be66ff7..935129e0b39d 100644
> > --- a/drivers/hv/hv_proc.c
> > +++ b/drivers/hv/hv_proc.c
> > @@ -122,6 +122,18 @@ int hv_deposit_memory_node(int node, u64 partition_id,
> >  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
> >  		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> >  		break;
> > +
> > +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> > +		num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> > +		fallthrough;
> > +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> 
> Is num_pages uninitialized when we reach this case directly?
> 

Yes, it is uninitialized on that path. I'll fix it.

Thanks,
Stanislav

> Thanks,
> Anirudh.
> 
> > +		if (!hv_root_partition()) {
> > +			hv_status_err(hv_status, "Unexpected root memory deposit\n");
> > +			return -ENOMEM;
> > +		}
> > +		partition_id = HV_PARTITION_ID_SELF;
> > +		break;
> > +
> >  	default:
> >  		hv_status_err(hv_status, "Unexpected!\n");
> >  		return -ENOMEM;
> > @@ -135,6 +147,8 @@ bool hv_result_needs_memory(u64 status)
> >  	switch (hv_result(status)) {
> >  	case HV_STATUS_INSUFFICIENT_MEMORY:
> >  	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
> > +	case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> > +	case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> >  		return true;
> >  	}
> >  	return false;
> > diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
> > index 70f22ef44948..5b74a857ef43 100644
> > --- a/include/hyperv/hvgdk_mini.h
> > +++ b/include/hyperv/hvgdk_mini.h
> > @@ -14,34 +14,36 @@ struct hv_u128 {
> >  } __packed;
> >  
> >  /* NOTE: when adding below, update hv_result_to_string() */
> > -#define HV_STATUS_SUCCESS			    0x0
> > -#define HV_STATUS_INVALID_HYPERCALL_CODE	    0x2
> > -#define HV_STATUS_INVALID_HYPERCALL_INPUT	    0x3
> > -#define HV_STATUS_INVALID_ALIGNMENT		    0x4
> > -#define HV_STATUS_INVALID_PARAMETER		    0x5
> > -#define HV_STATUS_ACCESS_DENIED			    0x6
> > -#define HV_STATUS_INVALID_PARTITION_STATE	    0x7
> > -#define HV_STATUS_OPERATION_DENIED		    0x8
> > -#define HV_STATUS_UNKNOWN_PROPERTY		    0x9
> > -#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE	    0xA
> > -#define HV_STATUS_INSUFFICIENT_MEMORY		    0xB
> > -#define HV_STATUS_INVALID_PARTITION_ID		    0xD
> > -#define HV_STATUS_INVALID_VP_INDEX		    0xE
> > -#define HV_STATUS_NOT_FOUND			    0x10
> > -#define HV_STATUS_INVALID_PORT_ID		    0x11
> > -#define HV_STATUS_INVALID_CONNECTION_ID		    0x12
> > -#define HV_STATUS_INSUFFICIENT_BUFFERS		    0x13
> > -#define HV_STATUS_NOT_ACKNOWLEDGED		    0x14
> > -#define HV_STATUS_INVALID_VP_STATE		    0x15
> > -#define HV_STATUS_NO_RESOURCES			    0x1D
> > -#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED   0x20
> > -#define HV_STATUS_INVALID_LP_INDEX		    0x41
> > -#define HV_STATUS_INVALID_REGISTER_VALUE	    0x50
> > -#define HV_STATUS_OPERATION_FAILED		    0x71
> > -#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY    0x75
> > -#define HV_STATUS_TIME_OUT			    0x78
> > -#define HV_STATUS_CALL_PENDING			    0x79
> > -#define HV_STATUS_VTL_ALREADY_ENABLED		    0x86
> > +#define HV_STATUS_SUCCESS				0x0
> > +#define HV_STATUS_INVALID_HYPERCALL_CODE		0x2
> > +#define HV_STATUS_INVALID_HYPERCALL_INPUT		0x3
> > +#define HV_STATUS_INVALID_ALIGNMENT			0x4
> > +#define HV_STATUS_INVALID_PARAMETER			0x5
> > +#define HV_STATUS_ACCESS_DENIED				0x6
> > +#define HV_STATUS_INVALID_PARTITION_STATE		0x7
> > +#define HV_STATUS_OPERATION_DENIED			0x8
> > +#define HV_STATUS_UNKNOWN_PROPERTY			0x9
> > +#define HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE		0xA
> > +#define HV_STATUS_INSUFFICIENT_MEMORY			0xB
> > +#define HV_STATUS_INVALID_PARTITION_ID			0xD
> > +#define HV_STATUS_INVALID_VP_INDEX			0xE
> > +#define HV_STATUS_NOT_FOUND				0x10
> > +#define HV_STATUS_INVALID_PORT_ID			0x11
> > +#define HV_STATUS_INVALID_CONNECTION_ID			0x12
> > +#define HV_STATUS_INSUFFICIENT_BUFFERS			0x13
> > +#define HV_STATUS_NOT_ACKNOWLEDGED			0x14
> > +#define HV_STATUS_INVALID_VP_STATE			0x15
> > +#define HV_STATUS_NO_RESOURCES				0x1D
> > +#define HV_STATUS_PROCESSOR_FEATURE_NOT_SUPPORTED	0x20
> > +#define HV_STATUS_INVALID_LP_INDEX			0x41
> > +#define HV_STATUS_INVALID_REGISTER_VALUE		0x50
> > +#define HV_STATUS_OPERATION_FAILED			0x71
> > +#define HV_STATUS_INSUFFICIENT_ROOT_MEMORY		0x73
> > +#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY	0x75
> > +#define HV_STATUS_TIME_OUT				0x78
> > +#define HV_STATUS_CALL_PENDING				0x79
> > +#define HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY	0x83
> > +#define HV_STATUS_VTL_ALREADY_ENABLED			0x86
> >  
> >  /*
> >   * The Hyper-V TimeRefCount register and the TSC
> > 
> > 