public inbox for linux-hyperv@vger.kernel.org
* [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
@ 2026-01-23 22:20 Stanislav Kinsburskii
  2026-01-24  0:09 ` Nuno Das Neves
                   ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-23 22:20 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli; +Cc: linux-hyperv, linux-kernel

The MSHV driver deposits kernel-allocated pages to the hypervisor during
runtime and never withdraws them. This creates a fundamental incompatibility
with KEXEC, as these deposited pages remain unavailable to the new kernel
loaded via KEXEC, leading to potential system crashes upon kernel accessing
hypervisor deposited pages.

Make MSHV mutually exclusive with KEXEC until proper page lifecycle
management is implemented.

Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
 drivers/hv/Kconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 7937ac0cbd0f..cfd4501db0fa 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -74,6 +74,7 @@ config MSHV_ROOT
 	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
 	# no particular order, making it impossible to reassemble larger pages
 	depends on PAGE_SIZE_4KB
+	depends on !KEXEC
 	select EVENTFD
 	select VIRT_XFER_TO_GUEST_WORK
 	select HMM_MIRROR
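
With this change, Kconfig will refuse to enable both options together, so a
generated .config can never contain both symbols set to y. A quick sanity
check over a .config (a hedged sketch; the helper name and the file paths
are illustrative, not part of the patch) could look like:

```shell
# Hypothetical helper: flag a .config that enables both MSHV_ROOT and
# KEXEC, a combination the new "depends on !KEXEC" line makes
# impossible to produce through Kconfig itself.
check_mshv_kexec() {
    if grep -q '^CONFIG_MSHV_ROOT=y' "$1" && grep -q '^CONFIG_KEXEC=y' "$1"; then
        echo conflict
    else
        echo ok
    fi
}
```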



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-23 22:20 [PATCH] mshv: Make MSHV mutually exclusive with KEXEC Stanislav Kinsburskii
@ 2026-01-24  0:09 ` Nuno Das Neves
  2026-01-24  0:16 ` Mukesh R
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 41+ messages in thread
From: Nuno Das Neves @ 2026-01-24  0:09 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel

On 1/23/2026 2:20 PM, Stanislav Kinsburskii wrote:
> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> runtime and never withdraws them. This creates a fundamental incompatibility
> with KEXEC, as these deposited pages remain unavailable to the new kernel
> loaded via KEXEC, leading to potential system crashes upon kernel accessing
> hypervisor deposited pages.
> 
> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> management is implemented.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/Kconfig |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 7937ac0cbd0f..cfd4501db0fa 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -74,6 +74,7 @@ config MSHV_ROOT
>  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>  	# no particular order, making it impossible to reassemble larger pages
>  	depends on PAGE_SIZE_4KB
> +	depends on !KEXEC
>  	select EVENTFD
>  	select VIRT_XFER_TO_GUEST_WORK
>  	select HMM_MIRROR
> 
> 

Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-23 22:20 [PATCH] mshv: Make MSHV mutually exclusive with KEXEC Stanislav Kinsburskii
  2026-01-24  0:09 ` Nuno Das Neves
@ 2026-01-24  0:16 ` Mukesh R
  2026-01-25 22:39   ` Stanislav Kinsburskii
  2026-01-26 18:49 ` Anirudh Rayabharam
  2026-02-02 16:56 ` Naman Jain
  3 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-01-24  0:16 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel

On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> runtime and never withdraws them. This creates a fundamental incompatibility
> with KEXEC, as these deposited pages remain unavailable to the new kernel
> loaded via KEXEC, leading to potential system crashes upon kernel accessing
> hypervisor deposited pages.
> 
> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> management is implemented.
> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/Kconfig |    1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 7937ac0cbd0f..cfd4501db0fa 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -74,6 +74,7 @@ config MSHV_ROOT
>   	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>   	# no particular order, making it impossible to reassemble larger pages
>   	depends on PAGE_SIZE_4KB
> +	depends on !KEXEC
>   	select EVENTFD
>   	select VIRT_XFER_TO_GUEST_WORK
>   	select HMM_MIRROR
> 
> 

Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP references
in kexec.c, implying that crash dump might be involved. Or did you
test kdump and it was fine?

Thanks,
-Mukesh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-24  0:16 ` Mukesh R
@ 2026-01-25 22:39   ` Stanislav Kinsburskii
  2026-01-26 20:20     ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-25 22:39 UTC (permalink / raw)
  To: Mukesh R
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > runtime and never withdraws them. This creates a fundamental incompatibility
> > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > hypervisor deposited pages.
> > 
> > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > management is implemented.
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >   drivers/hv/Kconfig |    1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > index 7937ac0cbd0f..cfd4501db0fa 100644
> > --- a/drivers/hv/Kconfig
> > +++ b/drivers/hv/Kconfig
> > @@ -74,6 +74,7 @@ config MSHV_ROOT
> >   	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> >   	# no particular order, making it impossible to reassemble larger pages
> >   	depends on PAGE_SIZE_4KB
> > +	depends on !KEXEC
> >   	select EVENTFD
> >   	select VIRT_XFER_TO_GUEST_WORK
> >   	select HMM_MIRROR
> > 
> > 
> 
> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> implying that crash dump might be involved. Or did you test kdump
> and it was fine?
> 

Yes, it will. Crash kexec depends on normal kexec functionality, so it
will be affected as well.

Thanks,
Stanislav

> Thanks,
> -Mukesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-23 22:20 [PATCH] mshv: Make MSHV mutually exclusive with KEXEC Stanislav Kinsburskii
  2026-01-24  0:09 ` Nuno Das Neves
  2026-01-24  0:16 ` Mukesh R
@ 2026-01-26 18:49 ` Anirudh Rayabharam
  2026-01-26 20:46   ` Stanislav Kinsburskii
  2026-02-02 16:56 ` Naman Jain
  3 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-01-26 18:49 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> runtime and never withdraws them. This creates a fundamental incompatibility
> with KEXEC, as these deposited pages remain unavailable to the new kernel
> loaded via KEXEC, leading to potential system crashes upon kernel accessing
> hypervisor deposited pages.
> 
> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> management is implemented.

Someone might want to stop all guest VMs and do a kexec, which is
valid and would work without any issue for L1VH.

Also, I don't think it is reasonable at all that someone needs to
disable basic kernel functionality such as kexec in order to use our
driver.

Thanks,
Anirudh.

> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/Kconfig |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 7937ac0cbd0f..cfd4501db0fa 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -74,6 +74,7 @@ config MSHV_ROOT
>  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>  	# no particular order, making it impossible to reassemble larger pages
>  	depends on PAGE_SIZE_4KB
> +	depends on !KEXEC
>  	select EVENTFD
>  	select VIRT_XFER_TO_GUEST_WORK
>  	select HMM_MIRROR
> 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-25 22:39   ` Stanislav Kinsburskii
@ 2026-01-26 20:20     ` Mukesh R
  2026-01-26 20:43       ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-01-26 20:20 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>> hypervisor deposited pages.
>>>
>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>> management is implemented.
>>>
>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>> ---
>>>    drivers/hv/Kconfig |    1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>> --- a/drivers/hv/Kconfig
>>> +++ b/drivers/hv/Kconfig
>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>    	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>    	# no particular order, making it impossible to reassemble larger pages
>>>    	depends on PAGE_SIZE_4KB
>>> +	depends on !KEXEC
>>>    	select EVENTFD
>>>    	select VIRT_XFER_TO_GUEST_WORK
>>>    	select HMM_MIRROR
>>>
>>>
>>
>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>> implying that crash dump might be involved. Or did you test kdump
>> and it was fine?
>>
> 
> Yes, it will. Crash kexec depends on normal kexec functionality, so it
> will be affected as well.

So I'm not sure I understand the reason for this patch. We can just
block kexec if there are any VMs running, right? Doing this would mean
any further development would be without a very important and major
feature, right?

> Thanks,
> Stanislav
> 
>> Thanks,
>> -Mukesh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-26 20:20     ` Mukesh R
@ 2026-01-26 20:43       ` Stanislav Kinsburskii
  2026-01-26 23:07         ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-26 20:43 UTC (permalink / raw)
  To: Mukesh R
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > hypervisor deposited pages.
> > > > 
> > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > management is implemented.
> > > > 
> > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > ---
> > > >    drivers/hv/Kconfig |    1 +
> > > >    1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > --- a/drivers/hv/Kconfig
> > > > +++ b/drivers/hv/Kconfig
> > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > >    	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > >    	# no particular order, making it impossible to reassemble larger pages
> > > >    	depends on PAGE_SIZE_4KB
> > > > +	depends on !KEXEC
> > > >    	select EVENTFD
> > > >    	select VIRT_XFER_TO_GUEST_WORK
> > > >    	select HMM_MIRROR
> > > > 
> > > > 
> > > 
> > > Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> > > implying that crash dump might be involved. Or did you test kdump
> > > and it was fine?
> > > 
> > 
> > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > will be affected as well.
> 
> So not sure I understand the reason for this patch. We can just block
> kexec if there are any VMs running, right? Doing this would mean any
> further developement would be without a ver important and major feature,
> right?

This is an option. But until it's implemented and merged, a user of
the mshv driver gets into a situation where kexec is broken in a
non-obvious way. The system may crash at any time after kexec,
depending on whether the new kernel touches the pages deposited to the
hypervisor or not. This is a bad user experience.
Therefore it should be explicitly forbidden, as it's essentially not
supported yet.

Thanks,
Stanislav

> 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > -Mukesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-26 18:49 ` Anirudh Rayabharam
@ 2026-01-26 20:46   ` Stanislav Kinsburskii
  2026-01-28 16:16     ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-26 20:46 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > runtime and never withdraws them. This creates a fundamental incompatibility
> > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > hypervisor deposited pages.
> > 
> > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > management is implemented.
> 
> Someone might want to stop all guest VMs and do a kexec. Which is valid
> and would work without any issue for L1VH.
> 

No, it won't work, and hypervisor-deposited pages won't be withdrawn.
Also, kernel consistency must not depend on user-space behavior.

> Also, I don't think it is reasonable at all that someone needs to
> disable basic kernel functionality such as kexec in order to use our
> driver.
> 

It's a temporary measure until proper page lifecycle management is
supported in the driver.
Mutual exclusion of the driver and kexec is a given, and thus it
should be explicitly stated in the Kconfig.

Thanks,
Stanislav

> Thanks,
> Anirudh.
> 
> > 
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> >  drivers/hv/Kconfig |    1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > index 7937ac0cbd0f..cfd4501db0fa 100644
> > --- a/drivers/hv/Kconfig
> > +++ b/drivers/hv/Kconfig
> > @@ -74,6 +74,7 @@ config MSHV_ROOT
> >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> >  	# no particular order, making it impossible to reassemble larger pages
> >  	depends on PAGE_SIZE_4KB
> > +	depends on !KEXEC
> >  	select EVENTFD
> >  	select VIRT_XFER_TO_GUEST_WORK
> >  	select HMM_MIRROR
> > 
> > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-26 20:43       ` Stanislav Kinsburskii
@ 2026-01-26 23:07         ` Mukesh R
  2026-01-27  0:21           ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-01-26 23:07 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>> hypervisor deposited pages.
>>>>>
>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>> management is implemented.
>>>>>
>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>> ---
>>>>>     drivers/hv/Kconfig |    1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>> --- a/drivers/hv/Kconfig
>>>>> +++ b/drivers/hv/Kconfig
>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>     	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>     	# no particular order, making it impossible to reassemble larger pages
>>>>>     	depends on PAGE_SIZE_4KB
>>>>> +	depends on !KEXEC
>>>>>     	select EVENTFD
>>>>>     	select VIRT_XFER_TO_GUEST_WORK
>>>>>     	select HMM_MIRROR
>>>>>
>>>>>
>>>>
>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>>>> implying that crash dump might be involved. Or did you test kdump
>>>> and it was fine?
>>>>
>>>
>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>> will be affected as well.
>>
>> So not sure I understand the reason for this patch. We can just block
>> kexec if there are any VMs running, right? Doing this would mean any
>> further developement would be without a ver important and major feature,
>> right?
> 
> This is an option. But until it's implemented and merged, a user mshv
> driver gets into a situation where kexec is broken in a non-obvious way.
> The system may crash at any time after kexec, depending on whether the
> new kernel touches the pages deposited to hypervisor or not. This is a
> bad user experience.

I understand that. But with this we cannot collect cores and debug any
crashes. I was thinking there would be a quick way to prohibit kexec
for update via a notifier or some other quick hack. Did you already
explore that and not find anything, hence this?

Thanks,
-Mukesh

> Therefor it should be explicitly forbidden as it's essentially not
> supported yet.
> 
> Thanks,
> Stanislav
> 
>>
>>> Thanks,
>>> Stanislav
>>>
>>>> Thanks,
>>>> -Mukesh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-26 23:07         ` Mukesh R
@ 2026-01-27  0:21           ` Stanislav Kinsburskii
  2026-01-27  1:39             ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-27  0:21 UTC (permalink / raw)
  To: Mukesh R
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> > On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> > > On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > > > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > hypervisor deposited pages.
> > > > > > 
> > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > management is implemented.
> > > > > > 
> > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > ---
> > > > > >     drivers/hv/Kconfig |    1 +
> > > > > >     1 file changed, 1 insertion(+)
> > > > > > 
> > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > --- a/drivers/hv/Kconfig
> > > > > > +++ b/drivers/hv/Kconfig
> > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > >     	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > >     	# no particular order, making it impossible to reassemble larger pages
> > > > > >     	depends on PAGE_SIZE_4KB
> > > > > > +	depends on !KEXEC
> > > > > >     	select EVENTFD
> > > > > >     	select VIRT_XFER_TO_GUEST_WORK
> > > > > >     	select HMM_MIRROR
> > > > > > 
> > > > > > 
> > > > > 
> > > > > Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> > > > > implying that crash dump might be involved. Or did you test kdump
> > > > > and it was fine?
> > > > > 
> > > > 
> > > > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > > > will be affected as well.
> > > 
> > > So not sure I understand the reason for this patch. We can just block
> > > kexec if there are any VMs running, right? Doing this would mean any
> > > further developement would be without a ver important and major feature,
> > > right?
> > 
> > This is an option. But until it's implemented and merged, a user mshv
> > driver gets into a situation where kexec is broken in a non-obvious way.
> > The system may crash at any time after kexec, depending on whether the
> > new kernel touches the pages deposited to hypervisor or not. This is a
> > bad user experience.
> 
> I understand that. But with this we cannot collect core and debug any
> crashes. I was thinking there would be a quick way to prohibit kexec
> for update via notifier or some other quick hack. Did you already
> explore that and didn't find anything, hence this?
> 

The quick hack you mention isn't quick in the upstream kernel, as
there is no hook to interrupt the kexec process except the live update
one.
I sent an RFC for that one, but given the details of today's
conversation it won't be accepted as is.
Making mshv mutually exclusive with kexec is the only viable option
for now, given the time constraints.
It is intended to be replaced with proper page lifecycle management in
the future.

Thanks,
Stanislav

> Thanks,
> -Mukesh
> 
> > Therefor it should be explicitly forbidden as it's essentially not
> > supported yet.
> > 
> > Thanks,
> > Stanislav
> > 
> > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > -Mukesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-27  0:21           ` Stanislav Kinsburskii
@ 2026-01-27  1:39             ` Mukesh R
  2026-01-27 17:47               ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-01-27  1:39 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>> hypervisor deposited pages.
>>>>>>>
>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>> management is implemented.
>>>>>>>
>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>> ---
>>>>>>>      drivers/hv/Kconfig |    1 +
>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>      	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>      	# no particular order, making it impossible to reassemble larger pages
>>>>>>>      	depends on PAGE_SIZE_4KB
>>>>>>> +	depends on !KEXEC
>>>>>>>      	select EVENTFD
>>>>>>>      	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>      	select HMM_MIRROR
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>> and it was fine?
>>>>>>
>>>>>
>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>> will be affected as well.
>>>>
>>>> So not sure I understand the reason for this patch. We can just block
>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>> further developement would be without a ver important and major feature,
>>>> right?
>>>
>>> This is an option. But until it's implemented and merged, a user mshv
>>> driver gets into a situation where kexec is broken in a non-obvious way.
>>> The system may crash at any time after kexec, depending on whether the
>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>> bad user experience.
>>
>> I understand that. But with this we cannot collect core and debug any
>> crashes. I was thinking there would be a quick way to prohibit kexec
>> for update via notifier or some other quick hack. Did you already
>> explore that and didn't find anything, hence this?
>>
> 
> This quick hack you mention isn't quick in the upstream kernel as there
> is no hook to interrupt kexec process except the live update one.

That's the one we want to interrupt and block, right? Crash kexec is
ok and should be allowed. We can document that we don't support kexec
for update for now.

> I sent an RFC for that one but given todays conversation details is
> won't be accepted as is.

Are you talking about this?

         "mshv: Add kexec safety for deposited pages"

> Making mshv mutually exclusive with kexec is the only viable option for
> now given time constraints.
> It is intended to be replaced with proper page lifecycle management in
> the future.

Yeah, that could take a long time, and imo we cannot just disable
KEXEC completely. What we want is to just block kexec for updates from
some mshv file for now; we can print during boot that kexec for
updates is not supported on mshv. Hope that makes sense.
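
The split I have in mind can be modeled roughly like this (purely an
illustrative sketch of the proposal, not upstream code; `kexec_allowed`,
the kind argument, and the VM counter are all hypothetical names, since
no such hook exists in the upstream kernel today):

```shell
# Illustrative model of the proposed policy: keep crash kexec (kdump)
# available for debugging, and block kexec-for-update only while the
# mshv driver has running VMs. All names here are hypothetical.
kexec_allowed() {
    kind="$1"        # "update" or "crash"
    running_vms="$2" # stand-in for a driver-maintained VM count
    if [ "$kind" = "crash" ]; then
        echo yes     # kdump stays allowed so cores can be collected
    elif [ "$running_vms" -eq 0 ]; then
        echo yes     # no VMs running, so update kexec is permitted
    else
        echo no      # block update kexec while VMs are running
    fi
}
```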

Thanks,
-Mukesh



> Thanks,
> Stanislav
> 
>> Thanks,
>> -Mukesh
>>
>>> Therefor it should be explicitly forbidden as it's essentially not
>>> supported yet.
>>>
>>> Thanks,
>>> Stanislav
>>>
>>>>
>>>>> Thanks,
>>>>> Stanislav
>>>>>
>>>>>> Thanks,
>>>>>> -Mukesh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-27  1:39             ` Mukesh R
@ 2026-01-27 17:47               ` Stanislav Kinsburskii
  2026-01-27 19:56                 ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-27 17:47 UTC (permalink / raw)
  To: Mukesh R
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> > On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> > > On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> > > > On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> > > > > On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > > > > > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > > > > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > hypervisor deposited pages.
> > > > > > > > 
> > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > management is implemented.
> > > > > > > > 
> > > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > > ---
> > > > > > > >      drivers/hv/Kconfig |    1 +
> > > > > > > >      1 file changed, 1 insertion(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > >      	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > >      	# no particular order, making it impossible to reassemble larger pages
> > > > > > > >      	depends on PAGE_SIZE_4KB
> > > > > > > > +	depends on !KEXEC
> > > > > > > >      	select EVENTFD
> > > > > > > >      	select VIRT_XFER_TO_GUEST_WORK
> > > > > > > >      	select HMM_MIRROR
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> > > > > > > implying that crash dump might be involved. Or did you test kdump
> > > > > > > and it was fine?
> > > > > > > 
> > > > > > 
> > > > > > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > > > > > will be affected as well.
> > > > > 
> > > > > So not sure I understand the reason for this patch. We can just block
> > > > > kexec if there are any VMs running, right? Doing this would mean any
> > > > > further developement would be without a ver important and major feature,
> > > > > right?
> > > > 
> > > > This is an option. But until it's implemented and merged, a user mshv
> > > > driver gets into a situation where kexec is broken in a non-obvious way.
> > > > The system may crash at any time after kexec, depending on whether the
> > > > new kernel touches the pages deposited to hypervisor or not. This is a
> > > > bad user experience.
> > > 
> > > I understand that. But with this we cannot collect core and debug any
> > > crashes. I was thinking there would be a quick way to prohibit kexec
> > > for update via notifier or some other quick hack. Did you already
> > > explore that and didn't find anything, hence this?
> > > 
> > 
> > This quick hack you mention isn't quick in the upstream kernel as there
> > is no hook to interrupt kexec process except the live update one.
> 
> That's the one we want to interrupt and block right? crash kexec
> is ok and should be allowed. We can document we don't support kexec
> for update for now.
> 
> > I sent an RFC for that one but given today's conversation details it
> > won't be accepted as is.
> 
> Are you talking about this?
> 
>         "mshv: Add kexec safety for deposited pages"
> 

Yes.

> > Making mshv mutually exclusive with kexec is the only viable option for
> > now given time constraints.
> > It is intended to be replaced with proper page lifecycle management in
> > the future.
> 
> Yeah, that could take a long time and imo we cannot just disable KEXEC
> completely. What we want is just block kexec for updates from some
> mshv file for now, we can print during boot that kexec for updates is
> not supported on mshv. Hope that makes sense.
> 

The trade-off here is between disabling kexec support and having the
kernel crash after kexec in a non-obvious way. This affects both regular
kexec and crash kexec.

It’s a pity we can’t apply a quick hack to disable only regular kexec.
However, since crash kexec would hit the same issues, until we have a
proper state transition for deposited pages, the best workaround for now
is to reset the hypervisor state on every kexec, which needs design,
work, and testing.

Disabling kexec is the only consistent way to handle this in the
upstream kernel at the moment.

Thanks, Stanislav


> Thanks,
> -Mukesh
> 
> 
> 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > -Mukesh
> > > 
> > > > Therefore it should be explicitly forbidden as it's essentially not
> > > > supported yet.
> > > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > 
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > > 
> > > > > > > Thanks,
> > > > > > > -Mukesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-27 17:47               ` Stanislav Kinsburskii
@ 2026-01-27 19:56                 ` Mukesh R
  2026-01-28 15:53                   ` Michael Kelley
  2026-01-28 23:08                   ` Stanislav Kinsburskii
  0 siblings, 2 replies; 41+ messages in thread
From: Mukesh R @ 2026-01-27 19:56 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>>>> hypervisor deposited pages.
>>>>>>>>>
>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>> management is implemented.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>>>> ---
>>>>>>>>>       drivers/hv/Kconfig |    1 +
>>>>>>>>>       1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>       	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>       	# no particular order, making it impossible to reassemble larger pages
>>>>>>>>>       	depends on PAGE_SIZE_4KB
>>>>>>>>> +	depends on !KEXEC
>>>>>>>>>       	select EVENTFD
>>>>>>>>>       	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>       	select HMM_MIRROR
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>> and it was fine?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>> will be affected as well.
>>>>>>
>>>>>> So not sure I understand the reason for this patch. We can just block
>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>> further developement would be without a ver important and major feature,
>>>>>> right?
>>>>>
>>>>> This is an option. But until it's implemented and merged, a user mshv
>>>>> driver gets into a situation where kexec is broken in a non-obvious way.
>>>>> The system may crash at any time after kexec, depending on whether the
>>>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>>>> bad user experience.
>>>>
>>>> I understand that. But with this we cannot collect core and debug any
>>>> crashes. I was thinking there would be a quick way to prohibit kexec
>>>> for update via notifier or some other quick hack. Did you already
>>>> explore that and didn't find anything, hence this?
>>>>
>>>
>>> This quick hack you mention isn't quick in the upstream kernel as there
>>> is no hook to interrupt kexec process except the live update one.
>>
>> That's the one we want to interrupt and block right? crash kexec
>> is ok and should be allowed. We can document we don't support kexec
>> for update for now.
>>
>>> I sent an RFC for that one but given todays conversation details is
>>> won't be accepted as is.
>>
>> Are you taking about this?
>>
>>          "mshv: Add kexec safety for deposited pages"
>>
> 
> Yes.
> 
>>> Making mshv mutually exclusive with kexec is the only viable option for
>>> now given time constraints.
>>> It is intended to be replaced with proper page lifecycle management in
>>> the future.
>>
>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>> completely. What we want is just block kexec for updates from some
>> mshv file for now, we an print during boot that kexec for updates is
>> not supported on mshv. Hope that makes sense.
>>
> 
> The trade-off here is between disabling kexec support and having the
> kernel crash after kexec in a non-obvious way. This affects both regular
> kexec and crash kexec.

crash kexec on bare metal is not affected, hence disabling it doesn't
make sense: we would then be unable to debug crashes on bare metal.

Let me think and explore a bit, and if I come up with something, I'll
send a patch here. If nothing, then we can do this as last resort.

Thanks,
-Mukesh


> It's a pity we can't apply a quick hack to disable only regular kexec.
> However, since crash kexec would hit the same issues, until we have a
> proper state transition for deposited pages, the best workaround for now
> is to reset the hypervisor state on every kexec, which needs design,
> work, and testing.
> 
> Disabling kexec is the only consistent way to handle this in the
> upstream kernel at the moment.
> 
> Thanks, Stanislav
> 
> 
>> Thanks,
>> -Mukesh
>>
>>
>>
>>> Thanks,
>>> Stanislav
>>>
>>>> Thanks,
>>>> -Mukesh
>>>>
>>>>> Therefor it should be explicitly forbidden as it's essentially not
>>>>> supported yet.
>>>>>
>>>>> Thanks,
>>>>> Stanislav
>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Stanislav
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Mukesh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-27 19:56                 ` Mukesh R
@ 2026-01-28 15:53                   ` Michael Kelley
  2026-01-30  2:52                     ` Mukesh R
  2026-01-28 23:08                   ` Stanislav Kinsburskii
  1 sibling, 1 reply; 41+ messages in thread
From: Michael Kelley @ 2026-01-28 15:53 UTC (permalink / raw)
  To: Mukesh R, Stanislav Kinsburskii
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

From: Mukesh R <mrathor@linux.microsoft.com> Sent: Tuesday, January 27, 2026 11:56 AM
> To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> Cc: kys@microsoft.com; haiyangz@microsoft.com; wei.liu@kernel.org;
> decui@microsoft.com; longli@microsoft.com; linux-hyperv@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
> 
> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> >> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> >>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> >>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> >>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> >>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> >>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> >>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> >>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> >>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
> >>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
> >>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
> >>>>>>>>> hypervisor deposited pages.
> >>>>>>>>>
> >>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> >>>>>>>>> management is implemented.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> >>>>>>>>> ---
> >>>>>>>>>       drivers/hv/Kconfig |    1 +
> >>>>>>>>>       1 file changed, 1 insertion(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> >>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
> >>>>>>>>> --- a/drivers/hv/Kconfig
> >>>>>>>>> +++ b/drivers/hv/Kconfig
> >>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
> >>>>>>>>>       	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> >>>>>>>>>       	# no particular order, making it impossible to reassemble larger pages
> >>>>>>>>>       	depends on PAGE_SIZE_4KB
> >>>>>>>>> +	depends on !KEXEC
> >>>>>>>>>       	select EVENTFD
> >>>>>>>>>       	select VIRT_XFER_TO_GUEST_WORK
> >>>>>>>>>       	select HMM_MIRROR
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> >>>>>>>> implying that crash dump might be involved. Or did you test kdump
> >>>>>>>> and it was fine?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
> >>>>>>> will be affected as well.
> >>>>>>
> >>>>>> So not sure I understand the reason for this patch. We can just block
> >>>>>> kexec if there are any VMs running, right? Doing this would mean any
> >>>>>> further developement would be without a ver important and major feature,
> >>>>>> right?
> >>>>>
> >>>>> This is an option. But until it's implemented and merged, a user mshv
> >>>>> driver gets into a situation where kexec is broken in a non-obvious way.
> >>>>> The system may crash at any time after kexec, depending on whether the
> >>>>> new kernel touches the pages deposited to hypervisor or not. This is a
> >>>>> bad user experience.
> >>>>
> >>>> I understand that. But with this we cannot collect core and debug any
> >>>> crashes. I was thinking there would be a quick way to prohibit kexec
> >>>> for update via notifier or some other quick hack. Did you already
> >>>> explore that and didn't find anything, hence this?
> >>>>
> >>>
> >>> This quick hack you mention isn't quick in the upstream kernel as there
> >>> is no hook to interrupt kexec process except the live update one.
> >>
> >> That's the one we want to interrupt and block right? crash kexec
> >> is ok and should be allowed. We can document we don't support kexec
> >> for update for now.
> >>
> >>> I sent an RFC for that one but given todays conversation details is
> >>> won't be accepted as is.
> >>
> >> Are you taking about this?
> >>
> >>          "mshv: Add kexec safety for deposited pages"
> >>
> >
> > Yes.
> >
> >>> Making mshv mutually exclusive with kexec is the only viable option for
> >>> now given time constraints.
> >>> It is intended to be replaced with proper page lifecycle management in
> >>> the future.
> >>
> >> Yeah, that could take a long time and imo we cannot just disable KEXEC
> >> completely. What we want is just block kexec for updates from some
> >> mshv file for now, we an print during boot that kexec for updates is
> >> not supported on mshv. Hope that makes sense.
> >>
> >
> > The trade-off here is between disabling kexec support and having the
> > kernel crash after kexec in a non-obvious way. This affects both regular
> > kexec and crash kexec.
> 
> crash kexec on baremetal is not affected, hence disabling that
> doesn't make sense as we can't debug crashes then on bm.
> 
> Let me think and explore a bit, and if I come up with something, I'll
> send a patch here. If nothing, then we can do this as last resort.
> 
> Thanks,
> -Mukesh

Maybe you've already looked at this, but there's a sysctl parameter
kernel.kexec_load_limit_reboot that prevents loading a kexec
kernel for reboot if the value is zero. Separately, there is
kernel.kexec_load_limit_panic that controls whether a kexec
kernel can be loaded for kdump purposes.

kernel.kexec_load_limit_reboot defaults to -1, which allows a kexec
kernel to be loaded for reboot an unlimited number of times. But the value
can be set to zero with this kernel boot line parameter:

sysctl.kernel.kexec_load_limit_reboot=0

Alternatively, the mshv driver initialization could add code along
the lines of process_sysctl_arg() to open
/proc/sys/kernel/kexec_load_limit_reboot and write a value of zero.
Then there's no dependency on setting the kernel boot line.
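For illustration, a minimal (untested) sketch of that idea, assuming it
runs late in mshv driver init; mshv_block_kexec_reboot is a hypothetical
helper name, and error handling beyond bailing out is omitted:

	static void mshv_block_kexec_reboot(void)
	{
		struct file *f;
		loff_t pos = 0;

		/* Best effort: clamp the reboot load limit to zero so a
		 * later kexec_load()/kexec_file_load() for reboot is
		 * refused, while kexec_load_limit_panic (kdump) is left
		 * untouched.
		 */
		f = filp_open("/proc/sys/kernel/kexec_load_limit_reboot",
			      O_WRONLY, 0);
		if (IS_ERR(f))
			return;
		kernel_write(f, "0", 1, &pos);
		filp_close(f, NULL);
	}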

The downside to either method is that after Linux in the root partition
is up-and-running, it is possible to change the sysctl to a non-zero value,
and then load a kexec kernel for reboot. So this approach isn't absolute
protection against doing a kexec for reboot. But it makes it harder, and 
until there's a mechanism to reclaim the deposited pages, it might be
a viable compromise to allow kdump to still be used.

Just a thought ....

Michael

> 
> 
> > It's a pity we can't apply a quick hack to disable only regular kexec.
> > However, since crash kexec would hit the same issues, until we have a
> > proper state transition for deposited pages, the best workaround for now
> > is to reset the hypervisor state on every kexec, which needs design,
> > work, and testing.
> >
> > Disabling kexec is the only consistent way to handle this in the
> > upstream kernel at the moment.
> >
> > Thanks, Stanislav

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-26 20:46   ` Stanislav Kinsburskii
@ 2026-01-28 16:16     ` Anirudh Rayabharam
  2026-01-28 23:11       ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-01-28 16:16 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > hypervisor deposited pages.
> > > 
> > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > management is implemented.
> > 
> > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > and would work without any issue for L1VH.
> > 
> 
> > No, it won't work and hypervisor deposited pages won't be withdrawn.

All pages that were deposited in the context of a guest partition (i.e.
with the guest partition ID), would be withdrawn when you kill the VMs,
right? What other deposited pages would be left?

Thanks,
Anirudh.

> > Also, kernel consistency must not depend on user space behavior.
> 
> > Also, I don't think it is reasonable at all that someone needs to
> > disable basic kernel functionality such as kexec in order to use our
> > driver.
> > 
> 
> It's a temporary measure until proper page lifecycle management is
> supported in the driver.
> > Mutual exclusion of the driver and kexec is a given and thus should be
> > explicitly stated in the Kconfig.
> 
> Thanks,
> Stanislav
> 
> > Thanks,
> > Anirudh.
> > 
> > > 
> > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > ---
> > >  drivers/hv/Kconfig |    1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > --- a/drivers/hv/Kconfig
> > > +++ b/drivers/hv/Kconfig
> > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > >  	# no particular order, making it impossible to reassemble larger pages
> > >  	depends on PAGE_SIZE_4KB
> > > +	depends on !KEXEC
> > >  	select EVENTFD
> > >  	select VIRT_XFER_TO_GUEST_WORK
> > >  	select HMM_MIRROR
> > > 
> > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-27 19:56                 ` Mukesh R
  2026-01-28 15:53                   ` Michael Kelley
@ 2026-01-28 23:08                   ` Stanislav Kinsburskii
  2026-01-30  2:59                     ` Mukesh R
  1 sibling, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-28 23:08 UTC (permalink / raw)
  To: Mukesh R
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> > > On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> > > > On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> > > > > On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> > > > > > On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> > > > > > > On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > > > > > > > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > > > > > > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > 
> > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > management is implemented.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > > > > ---
> > > > > > > > > >       drivers/hv/Kconfig |    1 +
> > > > > > > > > >       1 file changed, 1 insertion(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > > > >       	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > > > >       	# no particular order, making it impossible to reassemble larger pages
> > > > > > > > > >       	depends on PAGE_SIZE_4KB
> > > > > > > > > > +	depends on !KEXEC
> > > > > > > > > >       	select EVENTFD
> > > > > > > > > >       	select VIRT_XFER_TO_GUEST_WORK
> > > > > > > > > >       	select HMM_MIRROR
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> > > > > > > > > implying that crash dump might be involved. Or did you test kdump
> > > > > > > > > and it was fine?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > > > > > > > will be affected as well.
> > > > > > > 
> > > > > > > So not sure I understand the reason for this patch. We can just block
> > > > > > > kexec if there are any VMs running, right? Doing this would mean any
> > > > > > > further developement would be without a ver important and major feature,
> > > > > > > right?
> > > > > > 
> > > > > > This is an option. But until it's implemented and merged, a user mshv
> > > > > > driver gets into a situation where kexec is broken in a non-obvious way.
> > > > > > The system may crash at any time after kexec, depending on whether the
> > > > > > new kernel touches the pages deposited to hypervisor or not. This is a
> > > > > > bad user experience.
> > > > > 
> > > > > I understand that. But with this we cannot collect core and debug any
> > > > > crashes. I was thinking there would be a quick way to prohibit kexec
> > > > > for update via notifier or some other quick hack. Did you already
> > > > > explore that and didn't find anything, hence this?
> > > > > 
> > > > 
> > > > This quick hack you mention isn't quick in the upstream kernel as there
> > > > is no hook to interrupt kexec process except the live update one.
> > > 
> > > That's the one we want to interrupt and block right? crash kexec
> > > is ok and should be allowed. We can document we don't support kexec
> > > for update for now.
> > > 
> > > > I sent an RFC for that one but given todays conversation details is
> > > > won't be accepted as is.
> > > 
> > > Are you taking about this?
> > > 
> > >          "mshv: Add kexec safety for deposited pages"
> > > 
> > 
> > Yes.
> > 
> > > > Making mshv mutually exclusive with kexec is the only viable option for
> > > > now given time constraints.
> > > > It is intended to be replaced with proper page lifecycle management in
> > > > the future.
> > > 
> > > Yeah, that could take a long time and imo we cannot just disable KEXEC
> > > completely. What we want is just block kexec for updates from some
> > > mshv file for now, we an print during boot that kexec for updates is
> > > not supported on mshv. Hope that makes sense.
> > > 
> > 
> > The trade-off here is between disabling kexec support and having the
> > kernel crash after kexec in a non-obvious way. This affects both regular
> > kexec and crash kexec.
> 
> crash kexec on baremetal is not affected, hence disabling that
> doesn't make sense as we can't debug crashes then on bm.
> 

Bare metal support is not currently relevant, as it is not available.
This is the upstream kernel, and this driver will be accessible to
third-party customers beginning with kernel 6.19 for running their
kernels in Azure L1VH, so consistency is required.

Thanks,
Stanislav

> Let me think and explore a bit, and if I come up with something, I'll
> send a patch here. If nothing, then we can do this as last resort.
> 
> Thanks,
> -Mukesh
> 
> 
> > It's a pity we can't apply a quick hack to disable only regular kexec.
> > However, since crash kexec would hit the same issues, until we have a
> > proper state transition for deposited pages, the best workaround for now
> > is to reset the hypervisor state on every kexec, which needs design,
> > work, and testing.
> > 
> > Disabling kexec is the only consistent way to handle this in the
> > upstream kernel at the moment.
> > 
> > Thanks, Stanislav
> > 
> > 
> > > Thanks,
> > > -Mukesh
> > > 
> > > 
> > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > -Mukesh
> > > > > 
> > > > > > Therefor it should be explicitly forbidden as it's essentially not
> > > > > > supported yet.
> > > > > > 
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > > 
> > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Stanislav
> > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > -Mukesh

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-28 16:16     ` Anirudh Rayabharam
@ 2026-01-28 23:11       ` Stanislav Kinsburskii
  2026-01-30 17:11         ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-28 23:11 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > hypervisor deposited pages.
> > > > 
> > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > management is implemented.
> > > 
> > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > and would work without any issue for L1VH.
> > > 
> > 
> > No, it won't work and hypervisor deposited pages won't be withdrawn.
> 
> All pages that were deposited in the context of a guest partition (i.e.
> with the guest partition ID), would be withdrawn when you kill the VMs,
> right? What other deposited pages would be left?
> 

The driver deposits two types of pages: one for the guests (withdrawn
upon guest shutdown) and the other for the host itself (never
withdrawn).
See hv_call_create_partition, for example: it deposits pages for the
host partition.

Thanks,
Stanislav

> Thanks,
> Anirudh.
> 
> > Also, kernel consistency must not depend on user space behavior.
> > 
> > > Also, I don't think it is reasonable at all that someone needs to
> > > disable basic kernel functionality such as kexec in order to use our
> > > driver.
> > > 
> > 
> > It's a temporary measure until proper page lifecycle management is
> > supported in the driver.
> > Mutual exclusion of the driver and kexec is a given and thus should be
> > explicitly stated in the Kconfig.
> > 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > Anirudh.
> > > 
> > > > 
> > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > ---
> > > >  drivers/hv/Kconfig |    1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > --- a/drivers/hv/Kconfig
> > > > +++ b/drivers/hv/Kconfig
> > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > >  	# no particular order, making it impossible to reassemble larger pages
> > > >  	depends on PAGE_SIZE_4KB
> > > > +	depends on !KEXEC
> > > >  	select EVENTFD
> > > >  	select VIRT_XFER_TO_GUEST_WORK
> > > >  	select HMM_MIRROR
> > > > 
> > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-28 15:53                   ` Michael Kelley
@ 2026-01-30  2:52                     ` Mukesh R
  0 siblings, 0 replies; 41+ messages in thread
From: Mukesh R @ 2026-01-30  2:52 UTC (permalink / raw)
  To: Michael Kelley, Stanislav Kinsburskii
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, longli@microsoft.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org

On 1/28/26 07:53, Michael Kelley wrote:
> From: Mukesh R <mrathor@linux.microsoft.com> Sent: Tuesday, January 27, 2026 11:56 AM
>> To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>> Cc: kys@microsoft.com; haiyangz@microsoft.com; wei.liu@kernel.org;
>> decui@microsoft.com; longli@microsoft.com; linux-hyperv@vger.kernel.org; linux-
>> kernel@vger.kernel.org
>> Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
>>
>> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>>>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>>>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>>>>>> hypervisor deposited pages.
>>>>>>>>>>>
>>>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>>>> management is implemented.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>>>>>> ---
>>>>>>>>>>>        drivers/hv/Kconfig |    1 +
>>>>>>>>>>>        1 file changed, 1 insertion(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>>>        	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>>>        	# no particular order, making it impossible to reassemble larger pages
>>>>>>>>>>>        	depends on PAGE_SIZE_4KB
>>>>>>>>>>> +	depends on !KEXEC
>>>>>>>>>>>        	select EVENTFD
>>>>>>>>>>>        	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>>>        	select HMM_MIRROR
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>>>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>>>> and it was fine?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>>>> will be affected as well.
>>>>>>>>
>>>>>>>> So not sure I understand the reason for this patch. We can just block
>>>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>>>> further developement would be without a ver important and major feature,
>>>>>>>> right?
>>>>>>>
>>>>>>> This is an option. But until it's implemented and merged, a user mshv
>>>>>>> driver gets into a situation where kexec is broken in a non-obvious way.
>>>>>>> The system may crash at any time after kexec, depending on whether the
>>>>>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>>>>>> bad user experience.
>>>>>>
>>>>>> I understand that. But with this we cannot collect core and debug any
>>>>>> crashes. I was thinking there would be a quick way to prohibit kexec
>>>>>> for update via notifier or some other quick hack. Did you already
>>>>>> explore that and didn't find anything, hence this?
>>>>>>
>>>>>
>>>>> This quick hack you mention isn't quick in the upstream kernel as there
>>>>> is no hook to interrupt the kexec process except the live update one.
>>>>
>>>> That's the one we want to interrupt and block right? crash kexec
>>>> is ok and should be allowed. We can document we don't support kexec
>>>> for update for now.
>>>>
>>>>> I sent an RFC for that one, but given today's conversation details it
>>>>> won't be accepted as is.
>>>>
>>>> Are you talking about this?
>>>>
>>>>           "mshv: Add kexec safety for deposited pages"
>>>>
>>>
>>> Yes.
>>>
>>>>> Making mshv mutually exclusive with kexec is the only viable option for
>>>>> now given time constraints.
>>>>> It is intended to be replaced with proper page lifecycle management in
>>>>> the future.
>>>>
>>>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>>>> completely. What we want is to just block kexec for updates from some
>>>> mshv file for now; we can print during boot that kexec for updates is
>>>> not supported on mshv. Hope that makes sense.
>>>>
>>>
>>> The trade-off here is between disabling kexec support and having the
>>> kernel crash after kexec in a non-obvious way. This affects both regular
>>> kexec and crash kexec.
>>
>> crash kexec on baremetal is not affected, hence disabling that
>> doesn't make sense as we can't debug crashes then on bm.
>>
>> Let me think and explore a bit, and if I come up with something, I'll
>> send a patch here. If nothing, then we can do this as last resort.
>>
>> Thanks,
>> -Mukesh
> 
> Maybe you've already looked at this, but there's a sysctl parameter
> kernel.kexec_load_limit_reboot that prevents loading a kexec
> kernel for reboot if the value is zero. Separately, there is
> kernel.kexec_load_limit_panic that controls whether a kexec
> kernel can be loaded for kdump purposes.
> 
> kernel.kexec_load_limit_reboot defaults to -1, which allows an
> unlimited number of kexec kernel loads for reboot. But the value
> can be set to zero with this kernel boot line parameter:
> 
> sysctl.kernel.kexec_load_limit_reboot=0
> 
> Alternatively, the mshv driver initialization could add code along
> the lines of process_sysctl_arg() to open
> /proc/sys/kernel/kexec_load_limit_reboot and write a value of zero.
> Then there's no dependency on setting the kernel boot line.
> 
> The downside to either method is that after Linux in the root partition
> is up-and-running, it is possible to change the sysctl to a non-zero value,
> and then load a kexec kernel for reboot. So this approach isn't absolute
> protection against doing a kexec for reboot. But it makes it harder, and
> until there's a mechanism to reclaim the deposited pages, it might be
> a viable compromise to allow kdump to still be used.
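The load-limit semantics described above can be modeled in a few lines (a simplified illustration of the documented sysctl behavior, not kernel code): -1 means unlimited loads, 0 blocks any further load, and a positive value is decremented on each successful load.

```python
class KexecLoadLimit:
    """Toy model of kernel.kexec_load_limit_reboot semantics."""

    def __init__(self, limit=-1):
        # -1 means unlimited; 0 blocks loads; N > 0 allows N more loads.
        self.limit = limit

    def try_load(self):
        """Return True if a kexec kernel may be loaded now."""
        if self.limit == -1:
            return True   # unlimited
        if self.limit == 0:
            return False  # loading is blocked
        self.limit -= 1
        return True

# Default: unlimited loads are allowed.
assert KexecLoadLimit().try_load()

# A driver could effectively disable kexec-for-reboot by forcing the limit to 0.
blocked = KexecLoadLimit(limit=0)
assert not blocked.try_load()

# A positive limit permits that many loads, then blocks.
once = KexecLoadLimit(limit=1)
assert once.try_load() and not once.try_load()
```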

Mmm...eee...weelll... I think I see a much easier way to do this by
just hijacking __kexec_lock. I will resume my normal work tomorrow/Friday,
so let me test it out. If it works, I will send a patch Monday.

Thanks,
-Mukesh



> Just a thought ....
> 
> Michael
> 
>>
>>
>>> It's a pity we can't apply a quick hack to disable only regular kexec.
>>> However, since crash kexec would hit the same issues, until we have a
>>> proper state transition for deposited pages, the best workaround for now
>>> is to reset the hypervisor state on every kexec, which needs design,
>>> work, and testing.
>>>
>>> Disabling kexec is the only consistent way to handle this in the
>>> upstream kernel at the moment.
>>>
>>> Thanks, Stanislav


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-28 23:08                   ` Stanislav Kinsburskii
@ 2026-01-30  2:59                     ` Mukesh R
  2026-01-30 17:17                       ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-01-30  2:59 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On 1/28/26 15:08, Stanislav Kinsburskii wrote:
> On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
>> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>>>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>>>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>>>> loaded via KEXEC, leading to potential system crashes when the kernel
>>>>>>>>>>> accesses hypervisor-deposited pages.
>>>>>>>>>>>
>>>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>>>> management is implemented.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>>>>>> ---
>>>>>>>>>>>        drivers/hv/Kconfig |    1 +
>>>>>>>>>>>        1 file changed, 1 insertion(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>>>        	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>>>        	# no particular order, making it impossible to reassemble larger pages
>>>>>>>>>>>        	depends on PAGE_SIZE_4KB
>>>>>>>>>>> +	depends on !KEXEC
>>>>>>>>>>>        	select EVENTFD
>>>>>>>>>>>        	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>>>        	select HMM_MIRROR
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP checks in kexec.c
>>>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>>>> and it was fine?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>>>> will be affected as well.
>>>>>>>>
>>>>>>>> So not sure I understand the reason for this patch. We can just block
>>>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>>>> further development would be without a very important and major feature,
>>>>>>>> right?
>>>>>>>
>>>>>>> This is an option. But until it's implemented and merged, a user of the
>>>>>>> mshv driver gets into a situation where kexec is broken in a non-obvious way.
>>>>>>> The system may crash at any time after kexec, depending on whether the
>>>>>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>>>>>> bad user experience.
>>>>>>
>>>>>> I understand that. But with this we cannot collect core and debug any
>>>>>> crashes. I was thinking there would be a quick way to prohibit kexec
>>>>>> for update via notifier or some other quick hack. Did you already
>>>>>> explore that and didn't find anything, hence this?
>>>>>>
>>>>>
>>>>> This quick hack you mention isn't quick in the upstream kernel as there
>>>>> is no hook to interrupt the kexec process except the live update one.
>>>>
>>>> That's the one we want to interrupt and block right? crash kexec
>>>> is ok and should be allowed. We can document we don't support kexec
>>>> for update for now.
>>>>
>>>>> I sent an RFC for that one, but given today's conversation details it
>>>>> won't be accepted as is.
>>>>
>>>> Are you talking about this?
>>>>
>>>>           "mshv: Add kexec safety for deposited pages"
>>>>
>>>
>>> Yes.
>>>
>>>>> Making mshv mutually exclusive with kexec is the only viable option for
>>>>> now given time constraints.
>>>>> It is intended to be replaced with proper page lifecycle management in
>>>>> the future.
>>>>
>>>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>>>> completely. What we want is to just block kexec for updates from some
>>>> mshv file for now; we can print during boot that kexec for updates is
>>>> not supported on mshv. Hope that makes sense.
>>>>
>>>
>>> The trade-off here is between disabling kexec support and having the
>>> kernel crash after kexec in a non-obvious way. This affects both regular
>>> kexec and crash kexec.
>>
>> crash kexec on baremetal is not affected, hence disabling that
>> doesn't make sense as we can't debug crashes then on bm.
>>
> 
> Bare metal support is not currently relevant, as it is not available.
> This is the upstream kernel, and this driver will be accessible to
> third-party customers beginning with kernel 6.19 for running their
> kernels in Azure L1VH, so consistency is required.

Well, without crashdump support, customers will not be running anything
anywhere.

Thanks,
-Mukesh

> Thanks,
> Stanislav
> 
>> Let me think and explore a bit, and if I come up with something, I'll
>> send a patch here. If nothing, then we can do this as last resort.
>>
>> Thanks,
>> -Mukesh
>>
>>
>>> It's a pity we can't apply a quick hack to disable only regular kexec.
>>> However, since crash kexec would hit the same issues, until we have a
>>> proper state transition for deposited pages, the best workaround for now
>>> is to reset the hypervisor state on every kexec, which needs design,
>>> work, and testing.
>>>
>>> Disabling kexec is the only consistent way to handle this in the
>>> upstream kernel at the moment.
>>>
>>> Thanks, Stanislav
>>>
>>>
>>>> Thanks,
>>>> -Mukesh
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Stanislav
>>>>>
>>>>>> Thanks,
>>>>>> -Mukesh
>>>>>>
>>>>>>> Therefore it should be explicitly forbidden as it's essentially not
>>>>>>> supported yet.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Stanislav
>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Stanislav
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Mukesh


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-28 23:11       ` Stanislav Kinsburskii
@ 2026-01-30 17:11         ` Anirudh Rayabharam
  2026-01-30 18:46           ` Stanislav Kinsburskii
  2026-02-02 18:09           ` Stanislav Kinsburskii
  0 siblings, 2 replies; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-01-30 17:11 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > loaded via KEXEC, leading to potential system crashes when the kernel
> > > > > accesses hypervisor-deposited pages.
> > > > > 
> > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > management is implemented.
> > > > 
> > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > and would work without any issue for L1VH.
> > > > 
> > > 
> > > > No, it won't work, and hypervisor deposited pages won't be withdrawn.
> > 
> > All pages that were deposited in the context of a guest partition (i.e.
> > with the guest partition ID), would be withdrawn when you kill the VMs,
> > right? What other deposited pages would be left?
> > 
> 
> The driver deposits two types of pages: one for the guests (withdrawn
> > upon guest shutdown) and the other - for the host itself (never
> withdrawn).
> See hv_call_create_partition, for example: it deposits pages for the
> host partition.

Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
Also, can't we forcefully kill all running partitions in module_exit and
then reclaim memory? Would this help with kernel consistency
irrespective of userspace behavior?

Thanks,
Anirudh.

> 
> Thanks,
> Stanislav
> 
> > Thanks,
> > Anirudh.
> > 
> > > > Also, kernel consistency must not depend on user space behavior.
> > > 
> > > > Also, I don't think it is reasonable at all that someone needs to
> > > > disable basic kernel functionality such as kexec in order to use our
> > > > driver.
> > > > 
> > > 
> > > It's a temporary measure until proper page lifecycle management is
> > > supported in the driver.
> > > Mutual exclusion of the driver and kexec is a given and thus should be
> > > explicitly stated in the Kconfig.
> > > 
> > > Thanks,
> > > Stanislav
> > > 
> > > > Thanks,
> > > > Anirudh.
> > > > 
> > > > > 
> > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > ---
> > > > >  drivers/hv/Kconfig |    1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > --- a/drivers/hv/Kconfig
> > > > > +++ b/drivers/hv/Kconfig
> > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > >  	# no particular order, making it impossible to reassemble larger pages
> > > > >  	depends on PAGE_SIZE_4KB
> > > > > +	depends on !KEXEC
> > > > >  	select EVENTFD
> > > > >  	select VIRT_XFER_TO_GUEST_WORK
> > > > >  	select HMM_MIRROR
> > > > > 
> > > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30  2:59                     ` Mukesh R
@ 2026-01-30 17:17                       ` Anirudh Rayabharam
  2026-01-30 18:41                         ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-01-30 17:17 UTC (permalink / raw)
  To: Mukesh R
  Cc: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli,
	linux-hyperv, linux-kernel

On Thu, Jan 29, 2026 at 06:59:31PM -0800, Mukesh R wrote:
> On 1/28/26 15:08, Stanislav Kinsburskii wrote:
> > On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
> > > On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > > > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> > > > > On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> > > > > > On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> > > > > > > On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> > > > > > > > On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> > > > > > > > > On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > > > > > > > > > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > > > > > > > > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes when the kernel
> > > > > > > > > > > > accesses hypervisor-deposited pages.
> > > > > > > > > > > > 
> > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >        drivers/hv/Kconfig |    1 +
> > > > > > > > > > > >        1 file changed, 1 insertion(+)
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > > > > > >        	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > > > > > >        	# no particular order, making it impossible to reassemble larger pages
> > > > > > > > > > > >        	depends on PAGE_SIZE_4KB
> > > > > > > > > > > > +	depends on !KEXEC
> > > > > > > > > > > >        	select EVENTFD
> > > > > > > > > > > >        	select VIRT_XFER_TO_GUEST_WORK
> > > > > > > > > > > >        	select HMM_MIRROR
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP checks in kexec.c
> > > > > > > > > > > implying that crash dump might be involved. Or did you test kdump
> > > > > > > > > > > and it was fine?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > > > > > > > > > will be affected as well.
> > > > > > > > > 
> > > > > > > > > So not sure I understand the reason for this patch. We can just block
> > > > > > > > > kexec if there are any VMs running, right? Doing this would mean any
> > > > > > > > > > further development would be without a very important and major feature,
> > > > > > > > > right?
> > > > > > > > 
> > > > > > > > > This is an option. But until it's implemented and merged, a user of the
> > > > > > > > > mshv driver gets into a situation where kexec is broken in a non-obvious way.
> > > > > > > > The system may crash at any time after kexec, depending on whether the
> > > > > > > > new kernel touches the pages deposited to hypervisor or not. This is a
> > > > > > > > bad user experience.
> > > > > > > 
> > > > > > > I understand that. But with this we cannot collect core and debug any
> > > > > > > crashes. I was thinking there would be a quick way to prohibit kexec
> > > > > > > for update via notifier or some other quick hack. Did you already
> > > > > > > explore that and didn't find anything, hence this?
> > > > > > > 
> > > > > > 
> > > > > > This quick hack you mention isn't quick in the upstream kernel as there
> > > > > > is no hook to interrupt the kexec process except the live update one.
> > > > > 
> > > > > That's the one we want to interrupt and block right? crash kexec
> > > > > is ok and should be allowed. We can document we don't support kexec
> > > > > for update for now.
> > > > > 
> > > > > > I sent an RFC for that one, but given today's conversation details it
> > > > > > won't be accepted as is.
> > > > > 
> > > > > Are you talking about this?
> > > > > 
> > > > >           "mshv: Add kexec safety for deposited pages"
> > > > > 
> > > > 
> > > > Yes.
> > > > 
> > > > > > Making mshv mutually exclusive with kexec is the only viable option for
> > > > > > now given time constraints.
> > > > > > It is intended to be replaced with proper page lifecycle management in
> > > > > > the future.
> > > > > 
> > > > > Yeah, that could take a long time and imo we cannot just disable KEXEC
> > > > > completely. What we want is to just block kexec for updates from some
> > > > > mshv file for now; we can print during boot that kexec for updates is
> > > > > not supported on mshv. Hope that makes sense.
> > > > > 
> > > > 
> > > > The trade-off here is between disabling kexec support and having the
> > > > kernel crash after kexec in a non-obvious way. This affects both regular
> > > > kexec and crash kexec.
> > > 
> > > crash kexec on baremetal is not affected, hence disabling that
> > > doesn't make sense as we can't debug crashes then on bm.
> > > 
> > 
> > Bare metal support is not currently relevant, as it is not available.
> > This is the upstream kernel, and this driver will be accessible to
> > third-party customers beginning with kernel 6.19 for running their
> > kernels in Azure L1VH, so consistency is required.
> 
> Well, without crashdump support, customers will not be running anything
> anywhere.

This is my concern too. I don't think customers will be particularly
happy that kexec doesn't work with our driver.

Thanks,
Anirudh

> 
> Thanks,
> -Mukesh
> 
> > Thanks,
> > Stanislav
> > 
> > > Let me think and explore a bit, and if I come up with something, I'll
> > > send a patch here. If nothing, then we can do this as last resort.
> > > 
> > > Thanks,
> > > -Mukesh
> > > 
> > > 
> > > > It's a pity we can't apply a quick hack to disable only regular kexec.
> > > > However, since crash kexec would hit the same issues, until we have a
> > > > proper state transition for deposited pages, the best workaround for now
> > > > is to reset the hypervisor state on every kexec, which needs design,
> > > > work, and testing.
> > > > 
> > > > Disabling kexec is the only consistent way to handle this in the
> > > > upstream kernel at the moment.
> > > > 
> > > > Thanks, Stanislav
> > > > 
> > > > 
> > > > > Thanks,
> > > > > -Mukesh
> > > > > 
> > > > > 
> > > > > 
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > > 
> > > > > > > Thanks,
> > > > > > > -Mukesh
> > > > > > > 
> > > > > > > > Therefore it should be explicitly forbidden as it's essentially not
> > > > > > > > supported yet.
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Stanislav
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > Stanislav
> > > > > > > > > > 
> > > > > > > > > > > Thanks,
> > > > > > > > > > > -Mukesh
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 17:17                       ` Anirudh Rayabharam
@ 2026-01-30 18:41                         ` Stanislav Kinsburskii
  2026-01-30 19:47                           ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-30 18:41 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: Mukesh R, kys, haiyangz, wei.liu, decui, longli, linux-hyperv,
	linux-kernel

On Fri, Jan 30, 2026 at 05:17:52PM +0000, Anirudh Rayabharam wrote:
> On Thu, Jan 29, 2026 at 06:59:31PM -0800, Mukesh R wrote:
> > On 1/28/26 15:08, Stanislav Kinsburskii wrote:
> > > On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
> > > > On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > > > > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> > > > > > On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> > > > > > > On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> > > > > > > > On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> > > > > > > > > On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> > > > > > > > > > On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > > > > > > > > > > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > > > > > > > > > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes when the kernel
> > > > > > > > > > > > > accesses hypervisor-deposited pages.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >        drivers/hv/Kconfig |    1 +
> > > > > > > > > > > > >        1 file changed, 1 insertion(+)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > > > > > > >        	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > > > > > > >        	# no particular order, making it impossible to reassemble larger pages
> > > > > > > > > > > > >        	depends on PAGE_SIZE_4KB
> > > > > > > > > > > > > +	depends on !KEXEC
> > > > > > > > > > > > >        	select EVENTFD
> > > > > > > > > > > > >        	select VIRT_XFER_TO_GUEST_WORK
> > > > > > > > > > > > >        	select HMM_MIRROR
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP checks in kexec.c
> > > > > > > > > > > > implying that crash dump might be involved. Or did you test kdump
> > > > > > > > > > > > and it was fine?
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > > > > > > > > > > will be affected as well.
> > > > > > > > > > 
> > > > > > > > > > So not sure I understand the reason for this patch. We can just block
> > > > > > > > > > kexec if there are any VMs running, right? Doing this would mean any
> > > > > > > > > > further development would be without a very important and major feature,
> > > > > > > > > > right?
> > > > > > > > > 
> > > > > > > > > This is an option. But until it's implemented and merged, a user of the
> > > > > > > > > mshv driver gets into a situation where kexec is broken in a non-obvious way.
> > > > > > > > > The system may crash at any time after kexec, depending on whether the
> > > > > > > > > new kernel touches the pages deposited to hypervisor or not. This is a
> > > > > > > > > bad user experience.
> > > > > > > > 
> > > > > > > > I understand that. But with this we cannot collect core and debug any
> > > > > > > > crashes. I was thinking there would be a quick way to prohibit kexec
> > > > > > > > for update via notifier or some other quick hack. Did you already
> > > > > > > > explore that and didn't find anything, hence this?
> > > > > > > > 
> > > > > > > 
> > > > > > > This quick hack you mention isn't quick in the upstream kernel as there
> > > > > > > is no hook to interrupt the kexec process except the live update one.
> > > > > > 
> > > > > > That's the one we want to interrupt and block right? crash kexec
> > > > > > is ok and should be allowed. We can document we don't support kexec
> > > > > > for update for now.
> > > > > > 
> > > > > > > I sent an RFC for that one, but given today's conversation details it
> > > > > > > won't be accepted as is.
> > > > > > 
> > > > > > Are you talking about this?
> > > > > > 
> > > > > >           "mshv: Add kexec safety for deposited pages"
> > > > > > 
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > > > Making mshv mutually exclusive with kexec is the only viable option for
> > > > > > > now given time constraints.
> > > > > > > It is intended to be replaced with proper page lifecycle management in
> > > > > > > the future.
> > > > > > 
> > > > > > Yeah, that could take a long time and imo we cannot just disable KEXEC
> > > > > > completely. What we want is to just block kexec for updates from some
> > > > > > mshv file for now; we can print during boot that kexec for updates is
> > > > > > not supported on mshv. Hope that makes sense.
> > > > > > 
> > > > > 
> > > > > The trade-off here is between disabling kexec support and having the
> > > > > kernel crash after kexec in a non-obvious way. This affects both regular
> > > > > kexec and crash kexec.
> > > > 
> > > > crash kexec on baremetal is not affected, hence disabling that
> > > > doesn't make sense as we can't debug crashes then on bm.
> > > > 
> > > 
> > > Bare metal support is not currently relevant, as it is not available.
> > > This is the upstream kernel, and this driver will be accessible to
> > > third-party customers beginning with kernel 6.19 for running their
> > > kernels in Azure L1VH, so consistency is required.
> > 
> > Well, without crashdump support, customers will not be running anything
> > anywhere.
> 
> This is my concern too. I don't think customers will be particularly
> happy that kexec doesn't work with our driver.
> 

I wasn’t clear earlier, so let me restate it. Today, kexec is not
supported in L1VH. This is a bug we have not fixed yet. Disabling kexec
is not a long-term solution. But it is better to disable it explicitly
than to have kernel crashes after kexec.

This does not mean the bug should not be fixed. But the upstream kernel
has its own policies and merge windows. For kernel 6.19, it is better to
have a clear kexec error than random crashes after kexec.

Thanks,
Stanislav

> Thanks,
> Anirudh
> 
> > 
> > Thanks,
> > -Mukesh
> > 
> > > Thanks,
> > > Stanislav
> > > 
> > > > Let me think and explore a bit, and if I come up with something, I'll
> > > > send a patch here. If nothing, then we can do this as last resort.
> > > > 
> > > > Thanks,
> > > > -Mukesh
> > > > 
> > > > 
> > > > > It's a pity we can't apply a quick hack to disable only regular kexec.
> > > > > However, since crash kexec would hit the same issues, until we have a
> > > > > proper state transition for deposited pages, the best workaround for now
> > > > > is to reset the hypervisor state on every kexec, which needs design,
> > > > > work, and testing.
> > > > > 
> > > > > Disabling kexec is the only consistent way to handle this in the
> > > > > upstream kernel at the moment.
> > > > > 
> > > > > Thanks, Stanislav
> > > > > 
> > > > > 
> > > > > > Thanks,
> > > > > > -Mukesh
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > Thanks,
> > > > > > > Stanislav
> > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > -Mukesh
> > > > > > > > 
> > > > > > > > > > Therefore it should be explicitly forbidden as it's essentially not
> > > > > > > > > supported yet.
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Stanislav
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Stanislav
> > > > > > > > > > > 
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > -Mukesh
> > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 17:11         ` Anirudh Rayabharam
@ 2026-01-30 18:46           ` Stanislav Kinsburskii
  2026-01-30 20:32             ` Anirudh Rayabharam
  2026-02-02 18:09           ` Stanislav Kinsburskii
  1 sibling, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-01-30 18:46 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > loaded via KEXEC, leading to potential system crashes when the kernel
> > > > > > accesses hypervisor-deposited pages.
> > > > > > 
> > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > management is implemented.
> > > > > 
> > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > and would work without any issue for L1VH.
> > > > > 
> > > > 
> > > > No, it won't work and hypervsisor depostied pages won't be withdrawn.
> > > 
> > > All pages that were deposited in the context of a guest partition (i.e.
> > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > right? What other deposited pages would be left?
> > > 
> > 
> > The driver deposits two types of pages: one for the guests (withdrawn
> > upon gust shutdown) and the other - for the host itself (never
> > withdrawn).
> > See hv_call_create_partition, for example: it deposits pages for the
> > host partition.
> 
> Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> Also, can't we forcefully kill all running partitions in module_exit and
> then reclaim memory? Would this help with kernel consistency
> irrespective of userspace behavior?
> 

It would, but this is sloppy and cannot be a long-term solution.

It is also not reliable. We have no hook to prevent kexec. So if we
fail to kill the guests or reclaim the memory for any reason, the new
kernel may still crash.

There are two long-term solutions:
 1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.
 2. Hand the shared kernel state over to the new kernel.

I sent a series for the first one. The second one is not ready yet.
Anything else is neither robust nor reliable, so I don’t think it makes
sense to pursue it.

Thanks,
Stanislav


> Thanks,
> Anirudh.
> 
> > 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > Anirudh.
> > > 
> > > > Also, kernel consisntency must no depend on use space behavior. 
> > > > 
> > > > > Also, I don't think it is reasonable at all that someone needs to
> > > > > disable basic kernel functionality such as kexec in order to use our
> > > > > driver.
> > > > > 
> > > > 
> > > > It's a temporary measure until proper page lifecycle management is
> > > > supported in the driver.
> > > > Mutual exclusion of the driver and kexec is given and thus should be
> > > > expclitily stated in the Kconfig.
> > > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > Anirudh.
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > ---
> > > > > >  drivers/hv/Kconfig |    1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > > 
> > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > --- a/drivers/hv/Kconfig
> > > > > > +++ b/drivers/hv/Kconfig
> > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > >  	# no particular order, making it impossible to reassemble larger pages
> > > > > >  	depends on PAGE_SIZE_4KB
> > > > > > +	depends on !KEXEC
> > > > > >  	select EVENTFD
> > > > > >  	select VIRT_XFER_TO_GUEST_WORK
> > > > > >  	select HMM_MIRROR
> > > > > > 
> > > > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 18:41                         ` Stanislav Kinsburskii
@ 2026-01-30 19:47                           ` Mukesh R
  2026-02-02 16:43                             ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-01-30 19:47 UTC (permalink / raw)
  To: Stanislav Kinsburskii, Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On 1/30/26 10:41, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 05:17:52PM +0000, Anirudh Rayabharam wrote:
>> On Thu, Jan 29, 2026 at 06:59:31PM -0800, Mukesh R wrote:
>>> On 1/28/26 15:08, Stanislav Kinsburskii wrote:
>>>> On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
>>>>> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
>>>>>> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>>>>>>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>>>>>>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>>>>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>>>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>>>>>>>>> hypervisor deposited pages.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>>>>>>> management is implemented.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>         drivers/hv/Kconfig |    1 +
>>>>>>>>>>>>>>         1 file changed, 1 insertion(+)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>>>>>>         	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>>>>>>         	# no particular order, making it impossible to reassemble larger pages
>>>>>>>>>>>>>>         	depends on PAGE_SIZE_4KB
>>>>>>>>>>>>>> +	depends on !KEXEC
>>>>>>>>>>>>>>         	select EVENTFD
>>>>>>>>>>>>>>         	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>>>>>>         	select HMM_MIRROR
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>>>>>>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>>>>>>> and it was fine?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>>>>>>> will be affected as well.
>>>>>>>>>>>
>>>>>>>>>>> So not sure I understand the reason for this patch. We can just block
>>>>>>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>>>>>>> further developement would be without a ver important and major feature,
>>>>>>>>>>> right?
>>>>>>>>>>
>>>>>>>>>> This is an option. But until it's implemented and merged, a user mshv
>>>>>>>>>> driver gets into a situation where kexec is broken in a non-obvious way.
>>>>>>>>>> The system may crash at any time after kexec, depending on whether the
>>>>>>>>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>>>>>>>>> bad user experience.
>>>>>>>>>
>>>>>>>>> I understand that. But with this we cannot collect core and debug any
>>>>>>>>> crashes. I was thinking there would be a quick way to prohibit kexec
>>>>>>>>> for update via notifier or some other quick hack. Did you already
>>>>>>>>> explore that and didn't find anything, hence this?
>>>>>>>>>
>>>>>>>>
>>>>>>>> This quick hack you mention isn't quick in the upstream kernel as there
>>>>>>>> is no hook to interrupt kexec process except the live update one.
>>>>>>>
>>>>>>> That's the one we want to interrupt and block right? crash kexec
>>>>>>> is ok and should be allowed. We can document we don't support kexec
>>>>>>> for update for now.
>>>>>>>
>>>>>>>> I sent an RFC for that one but given todays conversation details is
>>>>>>>> won't be accepted as is.
>>>>>>>
>>>>>>> Are you taking about this?
>>>>>>>
>>>>>>>            "mshv: Add kexec safety for deposited pages"
>>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> Making mshv mutually exclusive with kexec is the only viable option for
>>>>>>>> now given time constraints.
>>>>>>>> It is intended to be replaced with proper page lifecycle management in
>>>>>>>> the future.
>>>>>>>
>>>>>>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>>>>>>> completely. What we want is just block kexec for updates from some
>>>>>>> mshv file for now, we an print during boot that kexec for updates is
>>>>>>> not supported on mshv. Hope that makes sense.
>>>>>>>
>>>>>>
>>>>>> The trade-off here is between disabling kexec support and having the
>>>>>> kernel crash after kexec in a non-obvious way. This affects both regular
>>>>>> kexec and crash kexec.
>>>>>
>>>>> crash kexec on baremetal is not affected, hence disabling that
>>>>> doesn't make sense as we can't debug crashes then on bm.
>>>>>
>>>>
>>>> Bare metal support is not currently relevant, as it is not available.
>>>> This is the upstream kernel, and this driver will be accessible to
>>>> third-party customers beginning with kernel 6.19 for running their
>>>> kernels in Azure L1VH, so consistency is required.
>>>
>>> Well, without crashdump support, customers will not be running anything
>>> anywhere.
>>
>> This is my concern too. I don't think customers will be particularly
>> happy that kexec doesn't work with our driver.
>>
> 
> I wasn't clear earlier, so let me restate it. Today, kexec is not
> supported in L1VH. This is a bug we have not fixed yet. Disabling kexec
> is not a long-term solution. But it is better to disable it explicitly
> than to have kernel crashes after kexec.

I don't think there is disagreement on this. The undesired part is turning
off KEXEC config completely.

Thanks,
-Mukesh


> This does not mean the bug should not be fixed. But the upstream kernel
> has its own policies and merge windows. For kernel 6.19, it is better to
> have a clear kexec error than random crashes after kexec.
> 
> Thanks,
> Stanislav
> 
>> Thanks,
>> Anirudh
>>
>>>
>>> Thanks,
>>> -Mukesh
>>>
>>>> Thanks,
>>>> Stanislav
>>>>
>>>>> Let me think and explore a bit, and if I come up with something, I'll
>>>>> send a patch here. If nothing, then we can do this as last resort.
>>>>>
>>>>> Thanks,
>>>>> -Mukesh
>>>>>
>>>>>
> > > > > > It's a pity we can't apply a quick hack to disable only regular kexec.
>>>>>> However, since crash kexec would hit the same issues, until we have a
>>>>>> proper state transition for deposted pages, the best workaround for now
>>>>>> is to reset the hypervisor state on every kexec, which needs design,
>>>>>> work, and testing.
>>>>>>
>>>>>> Disabling kexec is the only consistent way to handle this in the
>>>>>> upstream kernel at the moment.
>>>>>>
>>>>>> Thanks, Stanislav
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> -Mukesh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Stanislav
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Mukesh
>>>>>>>>>
>>>>>>>>>> Therefor it should be explicitly forbidden as it's essentially not
>>>>>>>>>> supported yet.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Stanislav
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Stanislav
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -Mukesh
>>>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 18:46           ` Stanislav Kinsburskii
@ 2026-01-30 20:32             ` Anirudh Rayabharam
  2026-02-02 17:10               ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-01-30 20:32 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > hypervisor deposited pages.
> > > > > > > 
> > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > management is implemented.
> > > > > > 
> > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > and would work without any issue for L1VH.
> > > > > > 
> > > > > 
> > > > > No, it won't work and hypervsisor depostied pages won't be withdrawn.
> > > > 
> > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > right? What other deposited pages would be left?
> > > > 
> > > 
> > > The driver deposits two types of pages: one for the guests (withdrawn
> > > upon gust shutdown) and the other - for the host itself (never
> > > withdrawn).
> > > See hv_call_create_partition, for example: it deposits pages for the
> > > host partition.
> > 
> > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > Also, can't we forcefully kill all running partitions in module_exit and
> > then reclaim memory? Would this help with kernel consistency
> > irrespective of userspace behavior?
> > 
> 
> It would, but this is sloppy and cannot be a long-term solution.
> 
> It is also not reliable. We have no hook to prevent kexec. So if we fail
> to kill the guest or reclaim the memory for any reason, the new kernel
> may still crash.

Actually guests won't be running by the time we reach our module_exit
function during a kexec. Userspace processes would've been killed by
then.

Also, why is this sloppy? Isn't this what module_exit should be
doing anyway? If someone unloads our module we should be trying to
clean everything up (including killing guests) and reclaim memory.

In any case, we can BUG() out if we fail to reclaim the memory. That would
stop the kexec.

This is a better solution than disabling KEXEC outright: our driver
would make its best effort to keep kexec working.

> 
> There are two long-term solutions:
>  1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.

I honestly think we should focus efforts on making kexec work rather
than finding ways to prevent it.

Thanks,
Anirudh

>  2. Hand the shared kernel state over to the new kernel.
> 
> I sent a series for the first one. The second one is not ready yet.
> Anything else is neither robust nor reliable, so I don’t think it makes
> sense to pursue it.
> 
> Thanks,
> Stanislav
> 
> 
> > Thanks,
> > Anirudh.
> > 
> > > 
> > > Thanks,
> > > Stanislav
> > > 
> > > > Thanks,
> > > > Anirudh.
> > > > 
> > > > > Also, kernel consisntency must no depend on use space behavior. 
> > > > > 
> > > > > > Also, I don't think it is reasonable at all that someone needs to
> > > > > > disable basic kernel functionality such as kexec in order to use our
> > > > > > driver.
> > > > > > 
> > > > > 
> > > > > It's a temporary measure until proper page lifecycle management is
> > > > > supported in the driver.
> > > > > Mutual exclusion of the driver and kexec is given and thus should be
> > > > > expclitily stated in the Kconfig.
> > > > > 
> > > > > Thanks,
> > > > > Stanislav
> > > > > 
> > > > > > Thanks,
> > > > > > Anirudh.
> > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > ---
> > > > > > >  drivers/hv/Kconfig |    1 +
> > > > > > >  1 file changed, 1 insertion(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > >  	# no particular order, making it impossible to reassemble larger pages
> > > > > > >  	depends on PAGE_SIZE_4KB
> > > > > > > +	depends on !KEXEC
> > > > > > >  	select EVENTFD
> > > > > > >  	select VIRT_XFER_TO_GUEST_WORK
> > > > > > >  	select HMM_MIRROR
> > > > > > > 
> > > > > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 19:47                           ` Mukesh R
@ 2026-02-02 16:43                             ` Stanislav Kinsburskii
  2026-02-02 20:15                               ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-02 16:43 UTC (permalink / raw)
  To: Mukesh R
  Cc: Anirudh Rayabharam, kys, haiyangz, wei.liu, decui, longli,
	linux-hyperv, linux-kernel

On Fri, Jan 30, 2026 at 11:47:48AM -0800, Mukesh R wrote:
> On 1/30/26 10:41, Stanislav Kinsburskii wrote:
> > On Fri, Jan 30, 2026 at 05:17:52PM +0000, Anirudh Rayabharam wrote:
> > > On Thu, Jan 29, 2026 at 06:59:31PM -0800, Mukesh R wrote:
> > > > On 1/28/26 15:08, Stanislav Kinsburskii wrote:
> > > > > On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
> > > > > > On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > > > > > > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> > > > > > > > On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> > > > > > > > > On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> > > > > > > > > > On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> > > > > > > > > > > On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> > > > > > > > > > > > On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> > > > > > > > > > > > > > On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >         drivers/hv/Kconfig |    1 +
> > > > > > > > > > > > > > >         1 file changed, 1 insertion(+)
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > > > > > > > > >         	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > > > > > > > > >         	# no particular order, making it impossible to reassemble larger pages
> > > > > > > > > > > > > > >         	depends on PAGE_SIZE_4KB
> > > > > > > > > > > > > > > +	depends on !KEXEC
> > > > > > > > > > > > > > >         	select EVENTFD
> > > > > > > > > > > > > > >         	select VIRT_XFER_TO_GUEST_WORK
> > > > > > > > > > > > > > >         	select HMM_MIRROR
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> > > > > > > > > > > > > > implying that crash dump might be involved. Or did you test kdump
> > > > > > > > > > > > > > and it was fine?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Yes, it will. Crash kexec depends on normal kexec functionality, so it
> > > > > > > > > > > > > will be affected as well.
> > > > > > > > > > > > 
> > > > > > > > > > > > So not sure I understand the reason for this patch. We can just block
> > > > > > > > > > > > kexec if there are any VMs running, right? Doing this would mean any
> > > > > > > > > > > > further developement would be without a ver important and major feature,
> > > > > > > > > > > > right?
> > > > > > > > > > > 
> > > > > > > > > > > This is an option. But until it's implemented and merged, a user mshv
> > > > > > > > > > > driver gets into a situation where kexec is broken in a non-obvious way.
> > > > > > > > > > > The system may crash at any time after kexec, depending on whether the
> > > > > > > > > > > new kernel touches the pages deposited to hypervisor or not. This is a
> > > > > > > > > > > bad user experience.
> > > > > > > > > > 
> > > > > > > > > > I understand that. But with this we cannot collect core and debug any
> > > > > > > > > > crashes. I was thinking there would be a quick way to prohibit kexec
> > > > > > > > > > for update via notifier or some other quick hack. Did you already
> > > > > > > > > > explore that and didn't find anything, hence this?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This quick hack you mention isn't quick in the upstream kernel as there
> > > > > > > > > is no hook to interrupt kexec process except the live update one.
> > > > > > > > 
> > > > > > > > That's the one we want to interrupt and block right? crash kexec
> > > > > > > > is ok and should be allowed. We can document we don't support kexec
> > > > > > > > for update for now.
> > > > > > > > 
> > > > > > > > > I sent an RFC for that one but given todays conversation details is
> > > > > > > > > won't be accepted as is.
> > > > > > > > 
> > > > > > > > Are you taking about this?
> > > > > > > > 
> > > > > > > >            "mshv: Add kexec safety for deposited pages"
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > > > Making mshv mutually exclusive with kexec is the only viable option for
> > > > > > > > > now given time constraints.
> > > > > > > > > It is intended to be replaced with proper page lifecycle management in
> > > > > > > > > the future.
> > > > > > > > 
> > > > > > > > Yeah, that could take a long time and imo we cannot just disable KEXEC
> > > > > > > > completely. What we want is just block kexec for updates from some
> > > > > > > > mshv file for now, we an print during boot that kexec for updates is
> > > > > > > > not supported on mshv. Hope that makes sense.
> > > > > > > > 
> > > > > > > 
> > > > > > > The trade-off here is between disabling kexec support and having the
> > > > > > > kernel crash after kexec in a non-obvious way. This affects both regular
> > > > > > > kexec and crash kexec.
> > > > > > 
> > > > > > crash kexec on baremetal is not affected, hence disabling that
> > > > > > doesn't make sense as we can't debug crashes then on bm.
> > > > > > 
> > > > > 
> > > > > Bare metal support is not currently relevant, as it is not available.
> > > > > This is the upstream kernel, and this driver will be accessible to
> > > > > third-party customers beginning with kernel 6.19 for running their
> > > > > kernels in Azure L1VH, so consistency is required.
> > > > 
> > > > Well, without crashdump support, customers will not be running anything
> > > > anywhere.
> > > 
> > > This is my concern too. I don't think customers will be particularly
> > > happy that kexec doesn't work with our driver.
> > > 
> > 
> > I wasn't clear earlier, so let me restate it. Today, kexec is not
> > supported in L1VH. This is a bug we have not fixed yet. Disabling kexec
> > is not a long-term solution. But it is better to disable it explicitly
> > than to have kernel crashes after kexec.
> 
> I don't think there is disagreement on this. The undesired part is turning
> off KEXEC config completely.
> 

There is no disagreement on this either. If you have a better solution
that can be implemented and merged before the next kernel merge window,
please propose it. Otherwise, this patch will remain as is for now.

Thanks,
Stanislav

> Thanks,
> -Mukesh
> 
> 
> > This does not mean the bug should not be fixed. But the upstream kernel
> > has its own policies and merge windows. For kernel 6.19, it is better to
> > have a clear kexec error than random crashes after kexec.
> > 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > Anirudh
> > > 
> > > > 
> > > > Thanks,
> > > > -Mukesh
> > > > 
> > > > > Thanks,
> > > > > Stanislav
> > > > > 
> > > > > > Let me think and explore a bit, and if I come up with something, I'll
> > > > > > send a patch here. If nothing, then we can do this as last resort.
> > > > > > 
> > > > > > Thanks,
> > > > > > -Mukesh
> > > > > > 
> > > > > > 
> > > > > > > It's a pity we can't apply a quick hack to disable only regular kexec.
> > > > > > > However, since crash kexec would hit the same issues, until we have a
> > > > > > > proper state transition for deposted pages, the best workaround for now
> > > > > > > is to reset the hypervisor state on every kexec, which needs design,
> > > > > > > work, and testing.
> > > > > > > 
> > > > > > > Disabling kexec is the only consistent way to handle this in the
> > > > > > > upstream kernel at the moment.
> > > > > > > 
> > > > > > > Thanks, Stanislav
> > > > > > > 
> > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > -Mukesh
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Stanislav
> > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > -Mukesh
> > > > > > > > > > 
> > > > > > > > > > > Therefor it should be explicitly forbidden as it's essentially not
> > > > > > > > > > > supported yet.
> > > > > > > > > > > 
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Stanislav
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Stanislav
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > -Mukesh
> > > > 
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-23 22:20 [PATCH] mshv: Make MSHV mutually exclusive with KEXEC Stanislav Kinsburskii
                   ` (2 preceding siblings ...)
  2026-01-26 18:49 ` Anirudh Rayabharam
@ 2026-02-02 16:56 ` Naman Jain
  3 siblings, 0 replies; 41+ messages in thread
From: Naman Jain @ 2026-02-02 16:56 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys, haiyangz, wei.liu, decui, longli
  Cc: linux-hyperv, linux-kernel



On 1/24/2026 3:50 AM, Stanislav Kinsburskii wrote:
> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> runtime and never withdraws them. This creates a fundamental incompatibility
> with KEXEC, as these deposited pages remain unavailable to the new kernel
> loaded via KEXEC, leading to potential system crashes upon kernel accessing
> hypervisor deposited pages.
> 
> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> management is implemented.
> 

I have not gone through the entire conversation on this, but if you 
send a next version, please change the commit message and subject to 
use MSHV_ROOT instead of MSHV, to avoid confusion.

Regards,
Naman

> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>   drivers/hv/Kconfig |    1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 7937ac0cbd0f..cfd4501db0fa 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -74,6 +74,7 @@ config MSHV_ROOT
>   	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>   	# no particular order, making it impossible to reassemble larger pages
>   	depends on PAGE_SIZE_4KB
> +	depends on !KEXEC
>   	select EVENTFD
>   	select VIRT_XFER_TO_GUEST_WORK
>   	select HMM_MIRROR
> 
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 20:32             ` Anirudh Rayabharam
@ 2026-02-02 17:10               ` Stanislav Kinsburskii
  2026-02-02 19:01                 ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-02 17:10 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > hypervisor deposited pages.
> > > > > > > > 
> > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > management is implemented.
> > > > > > > 
> > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > and would work without any issue for L1VH.
> > > > > > > 
> > > > > > 
> > > > > > No, it won't work and hypervsisor depostied pages won't be withdrawn.
> > > > > 
> > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > right? What other deposited pages would be left?
> > > > > 
> > > > 
> > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > upon gust shutdown) and the other - for the host itself (never
> > > > withdrawn).
> > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > host partition.
> > > 
> > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > Also, can't we forcefully kill all running partitions in module_exit and
> > > then reclaim memory? Would this help with kernel consistency
> > > irrespective of userspace behavior?
> > > 
> > 
> > It would, but this is sloppy and cannot be a long-term solution.
> > 
> > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > to kill the guest or reclaim the memory for any reason, the new kernel
> > may still crash.
> 
> Actually guests won't be running by the time we reach our module_exit
> function during a kexec. Userspace processes would've been killed by
> then.
> 

No, they will not: "kexec -e" doesn't kill user processes.
We must not rely on the OS to do a graceful shutdown before doing
kexec.

> Also, why is this sloppy? Isn't this what module_exit should be
> doing anyway? If someone unloads our module we should be trying to
> clean everything up (including killing guests) and reclaim memory.
> 

Kexec does not unload modules, and it wouldn't really matter even if it
did.
There are other means to plug into the reboot flow, but none of them is
robust or reliable.

> In any case, we can BUG() out if we fail to reclaim the memory. That would
> stop the kexec.
> 

By killing the whole system? This is not a good user experience, and I
don't see how it can be justified.

> This is a better solution since, instead of disabling KEXEC outright, our
> driver made the best possible efforts to make kexec work.
> 

How is an unreliable feature that leads to potential system crashes better
than disabling kexec outright?

It's the complete opposite for me: the latter provides limited but
robust functionality, while the former provides unreliable and
unpredictable behavior.

> > 
> > There are two long-term solutions:
> >  1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.
> 
> I honestly think we should focus efforts on making kexec work rather
> than finding ways to prevent it.
> 

There is no argument about it. But until we have it fixed properly, we
have two options: either disable kexec or stop claiming we have our
driver up and ready for external customers. Given the importance of
this driver for current projects, I believe the better way would be to
explicitly limit the functionality instead of postponing the
productization of the driver.

In other words, this is not about our feelings about kexec support: it's
about what we can reliably provide to our customers today.

Thanks,
Stanislav

> Thanks,
> Anirudh
> 
> >  2. Hand the shared kernel state over to the new kernel.
> > 
> > I sent a series for the first one. The second one is not ready yet.
> > Anything else is neither robust nor reliable, so I don’t think it makes
> > sense to pursue it.
> > 
> > Thanks,
> > Stanislav
> > 
> > 
> > > Thanks,
> > > Anirudh.
> > > 
> > > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > Anirudh.
> > > > > 
> > > > > > Also, kernel consistency must not depend on user space behavior.
> > > > > > 
> > > > > > > Also, I don't think it is reasonable at all that someone needs to
> > > > > > > disable basic kernel functionality such as kexec in order to use our
> > > > > > > driver.
> > > > > > > 
> > > > > > 
> > > > > > It's a temporary measure until proper page lifecycle management is
> > > > > > supported in the driver.
> > > > > > Mutual exclusion of the driver and kexec is a given and thus should be
> > > > > > explicitly stated in the Kconfig.
> > > > > > 
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > > 
> > > > > > > Thanks,
> > > > > > > Anirudh.
> > > > > > > 
> > > > > > > > 
> > > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > > > ---
> > > > > > > >  drivers/hv/Kconfig |    1 +
> > > > > > > >  1 file changed, 1 insertion(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > >  	# no particular order, making it impossible to reassemble larger pages
> > > > > > > >  	depends on PAGE_SIZE_4KB
> > > > > > > > +	depends on !KEXEC
> > > > > > > >  	select EVENTFD
> > > > > > > >  	select VIRT_XFER_TO_GUEST_WORK
> > > > > > > >  	select HMM_MIRROR
> > > > > > > > 
> > > > > > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-01-30 17:11         ` Anirudh Rayabharam
  2026-01-30 18:46           ` Stanislav Kinsburskii
@ 2026-02-02 18:09           ` Stanislav Kinsburskii
  1 sibling, 0 replies; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-02 18:09 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > hypervisor deposited pages.
> > > > > > 
> > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > management is implemented.
> > > > > 
> > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > and would work without any issue for L1VH.
> > > > > 
> > > > 
> > > > No, it won't work, and hypervisor deposited pages won't be withdrawn.
> > > 
> > > All pages that were deposited in the context of a guest partition (i.e.
> > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > right? What other deposited pages would be left?
> > > 
> > 
> > The driver deposits two types of pages: one for the guests (withdrawn
> > upon guest shutdown) and the other - for the host itself (never
> > withdrawn).
> > See hv_call_create_partition, for example: it deposits pages for the
> > host partition.
> 
> Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> Also, can't we forcefully kill all running partitions in module_exit and
> then reclaim memory? Would this help with kernel consistency
> irrespective of userspace behavior?
> 

First, module_exit is not called during kexec. Second, forcefully
killing all partitions during a kexec reboot would be bulky,
error-prone, and slow. It also does not guarantee robust behavior. Too
many things can go wrong, and we could still end up in the same broken
state.

To reiterate: today, the only safe way to use kexec is to avoid any
shared state between the kernel and the hypervisor. In other words, that
state should never be created, or it must be destroyed before issuing
kexec.
Neither of these conditions is controlled by our driver, so the only safe
option for now is to disable kexec.

Thanks,
Stanislav


> Thanks,
> Anirudh.
> 
> > 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > Anirudh.
> > > 
> > > > Also, kernel consistency must not depend on user space behavior.
> > > > 
> > > > > Also, I don't think it is reasonable at all that someone needs to
> > > > > disable basic kernel functionality such as kexec in order to use our
> > > > > driver.
> > > > > 
> > > > 
> > > > It's a temporary measure until proper page lifecycle management is
> > > > supported in the driver.
> > > > Mutual exclusion of the driver and kexec is a given and thus should be
> > > > explicitly stated in the Kconfig.
> > > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > Anirudh.
> > > > > 
> > > > > > 
> > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > > > > > ---
> > > > > >  drivers/hv/Kconfig |    1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > > 
> > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > --- a/drivers/hv/Kconfig
> > > > > > +++ b/drivers/hv/Kconfig
> > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > >  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > >  	# no particular order, making it impossible to reassemble larger pages
> > > > > >  	depends on PAGE_SIZE_4KB
> > > > > > +	depends on !KEXEC
> > > > > >  	select EVENTFD
> > > > > >  	select VIRT_XFER_TO_GUEST_WORK
> > > > > >  	select HMM_MIRROR
> > > > > > 
> > > > > > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-02 17:10               ` Stanislav Kinsburskii
@ 2026-02-02 19:01                 ` Anirudh Rayabharam
  2026-02-02 19:18                   ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-02-02 19:01 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > hypervisor deposited pages.
> > > > > > > > > 
> > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > management is implemented.
> > > > > > > > 
> > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > and would work without any issue for L1VH.
> > > > > > > > 
> > > > > > > 
> > > > > > > No, it won't work, and hypervisor deposited pages won't be withdrawn.
> > > > > > 
> > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > right? What other deposited pages would be left?
> > > > > > 
> > > > > 
> > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > withdrawn).
> > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > host partition.
> > > > 
> > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > then reclaim memory? Would this help with kernel consistency
> > > > irrespective of userspace behavior?
> > > > 
> > > 
> > > It would, but this is sloppy and cannot be a long-term solution.
> > > 
> > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > may still crash.
> > 
> > Actually guests won't be running by the time we reach our module_exit
> > function during a kexec. Userspace processes would've been killed by
> > then.
> > 
> 
> No, they will not: "kexec -e" doesn't kill user processes.
> We must not rely on the OS to do a graceful shutdown before doing
> kexec.

I see, kexec -e is too brutal. Something like systemctl kexec is
more graceful and is probably used more commonly. In this case at least
we could register a reboot notifier and attempt to clean things up.

I think it is better to support kexec to this extent rather than
disabling it entirely.

> 
> > Also, why is this sloppy? Isn't this what module_exit should be
> > doing anyway? If someone unloads our module we should be trying to
> > clean everything up (including killing guests) and reclaim memory.
> > 
> 
> Kexec does not unload modules, but it doesn't really matter even if it
> would.
> There are other means to plug into the reboot flow, but neither of them
> is robust or reliable.
> 
> > In any case, we can BUG() out if we fail to reclaim the memory. That would
> > stop the kexec.
> > 
> 
> By killing the whole system? This is not a good user experience and I
> don't see how it can be justified.

It is justified because, as you said, once we reach that failure we can
no longer guarantee integrity. So BUG() makes sense. This BUG() would
cause the system to go for a full reboot and restore integrity.

> 
> > This is a better solution since, instead of disabling KEXEC outright, our
> > driver made the best possible efforts to make kexec work.
> > 
> 
> How is an unreliable feature that leads to potential system crashes better
> than disabling kexec outright?

Because there are ways of using the feature reliably. What if someone
has MSHV_ROOT enabled but never starts a VM? (Just because someone has our
driver enabled in the kernel doesn't mean they're using it.) What about crash
dump?

It is far better to support some of these scenarios and be unreliable in
some corner cases rather than disabling the feature completely.

Also, I'm curious if any other driver in the kernel has ever done this
(force disable KEXEC).

> 
> It's the complete opposite for me: the latter provides limited but
> robust functionality, while the former provides unreliable and
> unpredictable behavior.
> 
> > > 
> > > There are two long-term solutions:
> > >  1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.
> > 
> > I honestly think we should focus efforts on making kexec work rather
> > than finding ways to prevent it.
> > 
> 
> There is no argument about it. But until we have it fixed properly, we
> have two options: either disable kexec or stop claiming we have our
> driver up and ready for external customers. Given the importance of
> this driver for current projects, I believe the better way would be to
> explicitly limit the functionality instead of postponing the
> productization of the driver.

It is okay to claim our driver as ready even if it doesn't support all
kexec cases. If we can support the common cases such as crash dump and
maybe kexec based servicing (pretty sure people do systemctl kexec and
not kexec -e for this with proper teardown) we can claim that our driver
is ready for general use.

Thanks,
Anirudh.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-02 19:01                 ` Anirudh Rayabharam
@ 2026-02-02 19:18                   ` Stanislav Kinsburskii
  2026-02-03  5:04                     ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-02 19:18 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > 
> > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > management is implemented.
> > > > > > > > > 
> > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > No, it won't work, and hypervisor deposited pages won't be withdrawn.
> > > > > > > 
> > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > right? What other deposited pages would be left?
> > > > > > > 
> > > > > > 
> > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > withdrawn).
> > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > host partition.
> > > > > 
> > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > then reclaim memory? Would this help with kernel consistency
> > > > > irrespective of userspace behavior?
> > > > > 
> > > > 
> > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > 
> > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > may still crash.
> > > 
> > > Actually guests won't be running by the time we reach our module_exit
> > > function during a kexec. Userspace processes would've been killed by
> > > then.
> > > 
> > 
> > No, they will not: "kexec -e" doesn't kill user processes.
> > We must not rely on the OS to do a graceful shutdown before doing
> > kexec.
> 
> I see kexec -e is too brutal. Something like systemctl kexec is
> more graceful and is probably used more commonly. In this case at least
> we could register a reboot notifier and attempt to clean things up.
> 
> I think it is better to support kexec to this extent rather than
> disabling it entirely.
> 

You do understand that once our kernel is released to third parties, we
can’t control how they will use kexec, right?

This is a valid and existing option. We have to account for it. Yet
again, L1VH will be used by arbitrary third parties out there, not just
by us.

We can’t say the kernel supports MSHV until we close these gaps. We must
not depend on user space to keep the kernel safe.

Do you agree?

Thanks,
Stanislav

> > 
> > > Also, why is this sloppy? Isn't this what module_exit should be
> > > doing anyway? If someone unloads our module we should be trying to
> > > clean everything up (including killing guests) and reclaim memory.
> > > 
> > 
> > Kexec does not unload modules, but it doesn't really matter even if it
> > would.
> > There are other means to plug into the reboot flow, but neither of them
> > is robust or reliable.
> > 
> > > In any case, we can BUG() out if we fail to reclaim the memory. That would
> > > stop the kexec.
> > > 
> > 
> > By killing the whole system? This is not a good user experience and I
> > don't see how it can be justified.
> 
> It is justified because, as you said, once we reach that failure we can
> no longer guarantee integrity. So BUG() makes sense. This BUG() would
> cause the system to go for a full reboot and restore integrity.
> 
> > 
> > > This is a better solution since, instead of disabling KEXEC outright, our
> > > driver made the best possible efforts to make kexec work.
> > > 
> > 
> > How is an unreliable feature that leads to potential system crashes better
> > than disabling kexec outright?
> 
> Because there are ways of using the feature reliably. What if someone
> has MSHV_ROOT enabled but never starts a VM? (Just because someone has our
> driver enabled in the kernel doesn't mean they're using it.) What about crash
> dump?
> 
> It is far better to support some of these scenarios and be unreliable in
> some corner cases rather than disabling the feature completely.
> 
> Also, I'm curious if any other driver in the kernel has ever done this
> (force disable KEXEC).
> 
> > 
> > It's the complete opposite for me: the latter provides limited but
> > robust functionality, while the former provides unreliable and
> > unpredictable behavior.
> > 
> > > > 
> > > > There are two long-term solutions:
> > > >  1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.
> > > 
> > > I honestly think we should focus efforts on making kexec work rather
> > > than finding ways to prevent it.
> > > 
> > 
> > There is no argument about it. But until we have it fixed properly, we
> > have two options: either disable kexec or stop claiming we have our
> > driver up and ready for external customers. Given the importance of
> > this driver for current projects, I believe the better way would be to
> > explicitly limit the functionality instead of postponing the
> > productization of the driver.
> 
> It is okay to claim our driver as ready even if it doesn't support all
> kexec cases. If we can support the common cases such as crash dump and
> maybe kexec based servicing (pretty sure people do systemctl kexec and
> not kexec -e for this with proper teardown) we can claim that our driver
> is ready for general use.
> 
> Thanks,
> Anirudh.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-02 16:43                             ` Stanislav Kinsburskii
@ 2026-02-02 20:15                               ` Mukesh R
  2026-02-04  2:46                                 ` Mukesh R
  0 siblings, 1 reply; 41+ messages in thread
From: Mukesh R @ 2026-02-02 20:15 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: Anirudh Rayabharam, kys, haiyangz, wei.liu, decui, longli,
	linux-hyperv, linux-kernel

On 2/2/26 08:43, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 11:47:48AM -0800, Mukesh R wrote:
>> On 1/30/26 10:41, Stanislav Kinsburskii wrote:
>>> On Fri, Jan 30, 2026 at 05:17:52PM +0000, Anirudh Rayabharam wrote:
>>>> On Thu, Jan 29, 2026 at 06:59:31PM -0800, Mukesh R wrote:
>>>>> On 1/28/26 15:08, Stanislav Kinsburskii wrote:
>>>>>> On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
>>>>>>> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
>>>>>>>> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>>>>>>>>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>>>>>>>>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>>>>>>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>>>>>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>>>>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>>>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>>>>>>>>>>> hypervisor deposited pages.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>>>>>>>>> management is implemented.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>          drivers/hv/Kconfig |    1 +
>>>>>>>>>>>>>>>>          1 file changed, 1 insertion(+)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>>>>>>>>          	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>>>>>>>>          	# no particular order, making it impossible to reassemble larger pages
>>>>>>>>>>>>>>>>          	depends on PAGE_SIZE_4KB
>>>>>>>>>>>>>>>> +	depends on !KEXEC
>>>>>>>>>>>>>>>>          	select EVENTFD
>>>>>>>>>>>>>>>>          	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>>>>>>>>          	select HMM_MIRROR
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
>>>>>>>>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>>>>>>>>> and it was fine?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>>>>>>>>> will be affected as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So not sure I understand the reason for this patch. We can just block
>>>>>>>>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>>>>>>>>> further development would be without a very important and major feature,
>>>>>>>>>>>>> right?
>>>>>>>>>>>>
>>>>>>>>>>>> This is an option. But until it's implemented and merged, a user of the
>>>>>>>>>>>> mshv driver gets into a situation where kexec is broken in a non-obvious way.
>>>>>>>>>>>> The system may crash at any time after kexec, depending on whether the
>>>>>>>>>>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>>>>>>>>>>> bad user experience.
>>>>>>>>>>>
>>>>>>>>>>> I understand that. But with this we cannot collect core and debug any
>>>>>>>>>>> crashes. I was thinking there would be a quick way to prohibit kexec
>>>>>>>>>>> for update via notifier or some other quick hack. Did you already
>>>>>>>>>>> explore that and didn't find anything, hence this?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This quick hack you mention isn't quick in the upstream kernel as there
>>>>>>>>>> is no hook to interrupt the kexec process except the live update one.
>>>>>>>>>
>>>>>>>>> That's the one we want to interrupt and block right? crash kexec
>>>>>>>>> is ok and should be allowed. We can document we don't support kexec
>>>>>>>>> for update for now.
>>>>>>>>>
>>>>>>>>>> I sent an RFC for that one, but given today's conversation details, it
>>>>>>>>>> won't be accepted as is.
>>>>>>>>>
>>>>>>>>> Are you taking about this?
>>>>>>>>>
>>>>>>>>>             "mshv: Add kexec safety for deposited pages"
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>>> Making mshv mutually exclusive with kexec is the only viable option for
>>>>>>>>>> now given time constraints.
>>>>>>>>>> It is intended to be replaced with proper page lifecycle management in
>>>>>>>>>> the future.
>>>>>>>>>
>>>>>>>>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>>>>>>>>> completely. What we want is just block kexec for updates from some
>>>>>>>>> mshv file for now; we can print during boot that kexec for updates is
>>>>>>>>> not supported on mshv. Hope that makes sense.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The trade-off here is between disabling kexec support and having the
>>>>>>>> kernel crash after kexec in a non-obvious way. This affects both regular
>>>>>>>> kexec and crash kexec.
>>>>>>>
>>>>>>> crash kexec on baremetal is not affected, hence disabling that
>>>>>>> doesn't make sense as we can't debug crashes then on bm.
>>>>>>>
>>>>>>
>>>>>> Bare metal support is not currently relevant, as it is not available.
>>>>>> This is the upstream kernel, and this driver will be accessible to
>>>>>> third-party customers beginning with kernel 6.19 for running their
>>>>>> kernels in Azure L1VH, so consistency is required.
>>>>>
>>>>> Well, without crashdump support, customers will not be running anything
>>>>> anywhere.
>>>>
>>>> This is my concern too. I don't think customers will be particularly
>>>> happy that kexec doesn't work with our driver.
>>>>
>>>
>>> I wasn't clear earlier, so let me restate it. Today, kexec is not
>>> supported in L1VH. This is a bug we have not fixed yet. Disabling kexec
>>> is not a long-term solution. But it is better to disable it explicitly
>>> than to have kernel crashes after kexec.
>>
>> I don't think there is disagreement on this. The undesired part is turning
>> off KEXEC config completely.
>>
> 
> There is no disagreement on this either. If you have a better solution
> that can be implemented and merged before next kernel merge window,
> please propose it. Otherwise, this patch will remain as is for now.

Like I said previously, I'll explore a bit. I think I found something,
but need to test it a bit and get a second opinion on it. For me, I am
not convinced this absolutely has to be in this merge window as it only
involves MSHV for l1vh and has been like this all this time. Moreover,
other things like makedumpfile are broken on l1vh. But Wei can make
final decision.

Thanks,
-Mukesh

> Thanks,
> Stanislav
> 
>> Thanks,
>> -Mukesh
>>
>>
>>> This does not mean the bug should not be fixed. But the upstream kernel
>>> has its own policies and merge windows. For kernel 6.19, it is better to
>>> have a clear kexec error than random crashes after kexec.
>>>
>>> Thanks,
>>> Stanislav
>>>
>>>> Thanks,
>>>> Anirudh
>>>>
>>>>>
>>>>> Thanks,
>>>>> -Mukesh
>>>>>
>>>>>> Thanks,
>>>>>> Stanislav
>>>>>>
>>>>>>> Let me think and explore a bit, and if I come up with something, I'll
>>>>>>> send a patch here. If nothing, then we can do this as last resort.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -Mukesh
>>>>>>>
>>>>>>>
>>>>>>>> It's a pity we can't apply a quick hack to disable only regular kexec.
>>>>>>>> However, since crash kexec would hit the same issues, until we have a
>>>>>>>> proper state transition for deposited pages, the best workaround for now
>>>>>>>> is to reset the hypervisor state on every kexec, which needs design,
>>>>>>>> work, and testing.
>>>>>>>>
>>>>>>>> Disabling kexec is the only consistent way to handle this in the
>>>>>>>> upstream kernel at the moment.
>>>>>>>>
>>>>>>>> Thanks, Stanislav
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -Mukesh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Stanislav
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -Mukesh
>>>>>>>>>>>
>>>>>>>>>>>> Therefore, it should be explicitly forbidden, as it's essentially not
>>>>>>>>>>>> supported yet.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Stanislav
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Stanislav
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> -Mukesh
>>>>>
>>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-02 19:18                   ` Stanislav Kinsburskii
@ 2026-02-03  5:04                     ` Anirudh Rayabharam
  2026-02-03 15:40                       ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-02-03  5:04 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > 
> > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > management is implemented.
> > > > > > > > > > 
> > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > No, it won't work and hypervisor deposited pages won't be withdrawn.
> > > > > > > > 
> > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > right? What other deposited pages would be left?
> > > > > > > > 
> > > > > > > 
> > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > withdrawn).
> > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > host partition.
> > > > > > 
> > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > irrespective of userspace behavior?
> > > > > > 
> > > > > 
> > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > 
> > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > may still crash.
> > > > 
> > > > Actually guests won't be running by the time we reach our module_exit
> > > > function during a kexec. Userspace processes would've been killed by
> > > > then.
> > > > 
> > > 
> > > No, they will not: "kexec -e" doesn't kill user processes.
> > > We must not rely on OS to do graceful shutdown before doing
> > > kexec.
> > 
> > I see kexec -e is too brutal. Something like systemctl kexec is
> > more graceful and is probably used more commonly. In this case at least
> > we could register a reboot notifier and attempt to clean things up.
> > 
> > I think it is better to support kexec to this extent rather than
> > disabling it entirely.
> > 
> 
> You do understand that once our kernel is released to third parties, we
> can’t control how they will use kexec, right?

Yes, we can't. But that's okay. It is fine for us to say that only some
kexec scenarios are supported and some aren't (iff you're creating VMs
using MSHV; if you're not creating VMs all of kexec is supported).

> 
> This is a valid and existing option. We have to account for it. Yet
> again, L1VH will be used by arbitrary third parties out there, not just
> by us.
> 
> We can’t say the kernel supports MSHV until we close these gaps. We must

We can. It is okay to say some scenarios are supported and some aren't.

All kexecs are supported if they never create VMs using MSHV. If they do
create VMs using MSHV and we implement cleanup in a reboot notifier, at
least systemctl kexec and crashdump kexec would work, which are probably
the most common uses of kexec. It's okay to say that this is all we
support as of now.

Also, what makes you think customers would even be interested in enabling
our module in their kernel configs if it takes away kexec?

Thanks,
Anirudh.

> not depend on user space to keep the kernel safe.
> 
> Do you agree?
> 
> Thanks,
> Stanislav
> 
> > > 
> > > > Also, why is this sloppy? Isn't this what module_exit should be
> > > > doing anyway? If someone unloads our module we should be trying to
> > > > clean everything up (including killing guests) and reclaim memory.
> > > > 
> > > 
> > > Kexec does not unload modules, but it doesn't really matter even if it
> > > would.
> > > There are other means to plug into the reboot flow, but neither of them
> > > is robust or reliable.
> > > 
> > > > In any case, we can BUG() out if we fail to reclaim the memory. That would
> > > > stop the kexec.
> > > > 
> > > 
> > > By killing the whole system? This is not a good user experience and I
> > > don't see how this can be justified.
> > 
> > It is justified because, as you said, once we reach that failure we can
> > no longer guarantee integrity. So BUG() makes sense. This BUG() would
> > cause the system to go for a full reboot and restore integrity.
> > 
> > > 
> > > > This is a better solution since instead of disabling KEXEC outright: our
> > > > driver made the best possible efforts to make kexec work.
> > > > 
> > > 
> > > How is an unreliable feature leading to potential system crashes better
> > > than disabling kexec outright?
> > 
> > Because there are ways of using the feature reliably. What if someone
> > has MSHV_ROOT enabled but never starts a VM? (Just because someone has our
> > driver enabled in the kernel doesn't mean they're using it.) What about crash
> > dump?
> > 
> > It is far better to support some of these scenarios and be unreliable in
> > some corner cases rather than disabling the feature completely.
> > 
> > Also, I'm curious if any other driver in the kernel has ever done this
> > (force disable KEXEC).
> > 
> > > 
> > > It's a complete opposite story for me: the latter provides a limited,
> > > but robust functionality, while the former provides an unreliable and
> > > unpredictable behavior.
> > > 
> > > > > 
> > > > > There are two long-term solutions:
> > > > >  1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.
> > > > 
> > > > I honestly think we should focus efforts on making kexec work rather
> > > > than finding ways to prevent it.
> > > > 
> > > 
> > > There is no argument about it. But until we have it fixed properly, we
> > > have two options: either disable kexec or stop claiming we have our
> > > driver up and ready for external customers. Given the importance of
> > > this driver for current projects, I believe the better way would be to
> > > explicitly limit the functionality instead of postponing the
> > > productization of the driver.
> > 
> > It is okay to claim our driver as ready even if it doesn't support all
> > kexec cases. If we can support the common cases such as crash dump and
> > maybe kexec based servicing (pretty sure people do systemctl kexec and
> > not kexec -e for this with proper teardown) we can claim that our driver
> > is ready for general use.
> > 
> > Thanks,
> > Anirudh.


* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-03  5:04                     ` Anirudh Rayabharam
@ 2026-02-03 15:40                       ` Stanislav Kinsburskii
  2026-02-03 16:46                         ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-03 15:40 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > 
> > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > management is implemented.
> > > > > > > > > > > 
> > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > No, it won't work and hypervisor deposited pages won't be withdrawn.
> > > > > > > > > 
> > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > withdrawn).
> > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > host partition.
> > > > > > > 
> > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > irrespective of userspace behavior?
> > > > > > > 
> > > > > > 
> > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > 
> > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > may still crash.
> > > > > 
> > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > function during a kexec. Userspace processes would've been killed by
> > > > > then.
> > > > > 
> > > > 
> > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > We must not rely on OS to do graceful shutdown before doing
> > > > kexec.
> > > 
> > > I see kexec -e is too brutal. Something like systemctl kexec is
> > > more graceful and is probably used more commonly. In this case at least
> > > we could register a reboot notifier and attempt to clean things up.
> > > 
> > > I think it is better to support kexec to this extent rather than
> > > disabling it entirely.
> > > 
> > 
> > You do understand that once our kernel is released to third parties, we
> > can’t control how they will use kexec, right?
> 
> Yes, we can't. But that's okay. It is fine for us to say that only some
> kexec scenarios are supported and some aren't (iff you're creating VMs
> using MSHV; if you're not creating VMs all of kexec is supported).
> 

Well, I disagree here. If we say the kernel supports MSHV, we must
provide a robust solution. A partially working solution is not
acceptable. It makes us look careless and can damage our reputation as a
team (and as a company).

> > 
> > This is a valid and existing option. We have to account for it. Yet
> > again, L1VH will be used by arbitrary third parties out there, not just
> > by us.
> > 
> > We can’t say the kernel supports MSHV until we close these gaps. We must
> 
> We can. It is okay to say some scenarios are supported and some aren't.
> 
> All kexecs are supported if they never create VMs using MSHV. If they do
> create VMs using MSHV and we implement cleanup in a reboot notifier, at
> least systemctl kexec and crashdump kexec would work, which are probably
> the most common uses of kexec. It's okay to say that this is all we
> support as of now.
> 

I'm repeating myself, but I'll try to put it differently.
There won't be any kernel core collected if a page was deposited. You're
arguing for a lost cause here. Once a page is allocated and deposited,
the crash kernel will try to write it into the core.

> Also, what makes you think customers would even be interested in enabling
> our module in their kernel configs if it takes away kexec?
> 

It's simple: L1VH isn't a host, so I can spin up new VMs instead of
servicing the existing ones.

Why do you think there won’t be customers interested in using MSHV in
L1VH without kexec support?

Thanks,
Stanislav

> Thanks,
> Anirudh.
> 
> > not depend on user space to keep the kernel safe.
> > 
> > Do you agree?
> > 
> > Thanks,
> > Stanislav
> > 
> > > > 
> > > > > Also, why is this sloppy? Isn't this what module_exit should be
> > > > > doing anyway? If someone unloads our module we should be trying to
> > > > > clean everything up (including killing guests) and reclaim memory.
> > > > > 
> > > > 
> > > > Kexec does not unload modules, but it doesn't really matter even if it
> > > > would.
> > > > There are other means to plug into the reboot flow, but neither of them
> > > > is robust or reliable.
> > > > 
> > > > > In any case, we can BUG() out if we fail to reclaim the memory. That would
> > > > > stop the kexec.
> > > > > 
> > > > 
> > > > By killing the whole system? This is not a good user experience and I
> > > > don't see how this can be justified.
> > > 
> > > It is justified because, as you said, once we reach that failure we can
> > > no longer guarantee integrity. So BUG() makes sense. This BUG() would
> > > cause the system to go for a full reboot and restore integrity.
> > > 
> > > > 
> > > > > This is a better solution since instead of disabling KEXEC outright: our
> > > > > driver made the best possible efforts to make kexec work.
> > > > > 
> > > > 
> > > > How is an unreliable feature leading to potential system crashes better
> > > > than disabling kexec outright?
> > > 
> > > Because there are ways of using the feature reliably. What if someone
> > > has MSHV_ROOT enabled but never starts a VM? (Just because someone has our
> > > driver enabled in the kernel doesn't mean they're using it.) What about crash
> > > dump?
> > > 
> > > It is far better to support some of these scenarios and be unreliable in
> > > some corner cases rather than disabling the feature completely.
> > > 
> > > Also, I'm curious if any other driver in the kernel has ever done this
> > > (force disable KEXEC).
> > > 
> > > > 
> > > > It's a complete opposite story for me: the latter provides a limited,
> > > > but robust functionality, while the former provides an unreliable and
> > > > unpredictable behavior.
> > > > 
> > > > > > 
> > > > > > There are two long-term solutions:
> > > > > >  1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.
> > > > > 
> > > > > I honestly think we should focus efforts on making kexec work rather
> > > > > than finding ways to prevent it.
> > > > > 
> > > > 
> > > > There is no argument about it. But until we have it fixed properly, we
> > > > have two options: either disable kexec or stop claiming we have our
> > > > driver up and ready for external customers. Given the importance of
> > > > this driver for current projects, I believe the better way would be to
> > > > explicitly limit the functionality instead of postponing the
> > > > productization of the driver.
> > > 
> > > It is okay to claim our driver as ready even if it doesn't support all
> > > kexec cases. If we can support the common cases such as crash dump and
> > > maybe kexec based servicing (pretty sure people do systemctl kexec and
> > > not kexec -e for this with proper teardown) we can claim that our driver
> > > is ready for general use.
> > > 
> > > Thanks,
> > > Anirudh.


* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-03 15:40                       ` Stanislav Kinsburskii
@ 2026-02-03 16:46                         ` Anirudh Rayabharam
  2026-02-03 19:42                           ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-02-03 16:46 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Tue, Feb 03, 2026 at 07:40:36AM -0800, Stanislav Kinsburskii wrote:
> On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> > On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > 
> > > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > No, it won't work and hypervisor deposited pages won't be withdrawn.
> > > > > > > > > > 
> > > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > > withdrawn).
> > > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > > host partition.
> > > > > > > > 
> > > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > > irrespective of userspace behavior?
> > > > > > > > 
> > > > > > > 
> > > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > > 
> > > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > > may still crash.
> > > > > > 
> > > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > > function during a kexec. Userspace processes would've been killed by
> > > > > > then.
> > > > > > 
> > > > > 
> > > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > > We must not rely on OS to do graceful shutdown before doing
> > > > > kexec.
> > > > 
> > > > I see kexec -e is too brutal. Something like systemctl kexec is
> > > > more graceful and is probably used more commonly. In this case at least
> > > > we could register a reboot notifier and attempt to clean things up.
> > > > 
> > > > I think it is better to support kexec to this extent rather than
> > > > disabling it entirely.
> > > > 
> > > 
> > > You do understand that once our kernel is released to third parties, we
> > > can’t control how they will use kexec, right?
> > 
> > Yes, we can't. But that's okay. It is fine for us to say that only some
> > kexec scenarios are supported and some aren't (iff you're creating VMs
> > using MSHV; if you're not creating VMs all of kexec is supported).
> > 
> 
> Well, I disagree here. If we say the kernel supports MSHV, we must
> provide a robust solution. A partially working solution is not
> acceptable. It makes us look careless and can damage our reputation as a
> team (and as a company).

It won't if we call out upfront what is supported and what is not.

> 
> > > 
> > > This is a valid and existing option. We have to account for it. Yet
> > > again, L1VH will be used by arbitrary third parties out there, not just
> > > by us.
> > > 
> > > We can’t say the kernel supports MSHV until we close these gaps. We must
> > 
> > We can. It is okay to say some scenarios are supported and some aren't.
> > 
> > All kexecs are supported if they never create VMs using MSHV. If they do
> > create VMs using MSHV and we implement cleanup in a reboot notifier, at
> > least systemctl kexec and crashdump kexec would work, which are probably
> > the most common uses of kexec. It's okay to say that this is all we
> > support as of now.
> > 
> 
> I'm repeating myself, but I'll try to put it differently.
> There won't be any kernel core collected if a page was deposited. You're
> arguing for a lost cause here. Once a page is allocated and deposited,
> the crash kernel will try to write it into the core.

That's why we have to implement something where we attempt to destroy
partitions and reclaim memory (and BUG() out if that fails, which
hopefully should happen very rarely, if at all). This should be *the*
solution we work towards. We don't need a temporary disable-kexec
solution.

> 
> > Also, what makes you think customers would even be interested in enabling
> > our module in their kernel configs if it takes away kexec?
> > 
> 
> It's simple: L1VH isn't a host, so I can spin up new VMs instead of
> servicing the existing ones.

And what about the L2 VM state then? They might not be throwaway in all
cases.

> 
> Why do you think there won’t be customers interested in using MSHV in
> L1VH without kexec support?

Because they could already be using kexec for their servicing needs or
whatever. And no, we can't just say "don't service these VMs, just spin up
new ones".

Also, keep in mind that once L1VH is available in Azure, the distros
that run on it would be the same distros that run on all other Azure
VMs. There won't be special distros with a kernel specifically built for
L1VH. And KEXEC is generally enabled in distros. Distro vendors won't be
happy that they would need to publish a separate version of their image with
MSHV_ROOT enabled and KEXEC disabled because they wouldn't want KEXEC to
be disabled for all Azure VMs. Also, the customers will be confused as to
why the same distro doesn't work on L1VH.

Thanks,
Anirudh.



* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-03 16:46                         ` Anirudh Rayabharam
@ 2026-02-03 19:42                           ` Stanislav Kinsburskii
  2026-02-04  5:33                             ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-03 19:42 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Tue, Feb 03, 2026 at 04:46:03PM +0000, Anirudh Rayabharam wrote:
> On Tue, Feb 03, 2026 at 07:40:36AM -0800, Stanislav Kinsburskii wrote:
> > On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> > > On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > > > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > No, it won't work and hypervisor deposited pages won't be withdrawn.
> > > > > > > > > > > 
> > > > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > > > withdrawn).
> > > > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > > > host partition.
> > > > > > > > > 
> > > > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > > > irrespective of userspace behavior?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > > > 
> > > > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > > > may still crash.
> > > > > > > 
> > > > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > > > function during a kexec. Userspace processes would've been killed by
> > > > > > > then.
> > > > > > > 
> > > > > > 
> > > > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > > > We must not rely on OS to do graceful shutdown before doing
> > > > > > kexec.
> > > > > 
> > > > > I see kexec -e is too brutal. Something like systemctl kexec is
> > > > > more graceful and is probably used more commonly. In this case at least
> > > > > we could register a reboot notifier and attempt to clean things up.
> > > > > 
> > > > > I think it is better to support kexec to this extent rather than
> > > > > disabling it entirely.
> > > > > 
> > > > 
> > > > You do understand that once our kernel is released to third parties, we
> > > > can’t control how they will use kexec, right?
> > > 
> > > Yes, we can't. But that's okay. It is fine for us to say that only some
> > > kexec scenarios are supported and some aren't (iff you're creating VMs
> > > using MSHV; if you're not creating VMs all of kexec is supported).
> > > 
> > 
> > Well, I disagree here. If we say the kernel supports MSHV, we must
> > provide a robust solution. A partially working solution is not
> > acceptable. It makes us look careless and can damage our reputation as a
> > team (and as a company).
> 
> It won't if we call out upfront what is supported and what is not.
> 
> > 
> > > > 
> > > > This is a valid and existing option. We have to account for it. Yet
> > > > again, L1VH will be used by arbitrary third parties out there, not just
> > > > by us.
> > > > 
> > > > We can’t say the kernel supports MSHV until we close these gaps. We must
> > > 
> > > We can. It is okay to say some scenarios are supported and some aren't.
> > > 
> > > All kexecs are supported if they never create VMs using MSHV. If they do
> > > create VMs using MSHV and we implement cleanup in a reboot notifier, at
> > > least systemctl kexec and crashdump kexec would work, which are probably
> > > the most common uses of kexec. It's okay to say that this is all we
> > > support as of now.
> > > 
> > 
> > I'm repeating myself, but I'll try to put it differently.
> > There won't be any kernel core collected if a page was deposited. You're
> > arguing for a lost cause here. Once a page is allocated and deposited,
> > the crash kernel will try to write it into the core.
> 
> That's why we have to implement something where we attempt to destroy
> partitions and reclaim memory (and BUG() out if that fails; which
> hopefully should happen very rarely if at all). This should be *the*
> solution we work towards. We don't need a temporary disable kexec
> solution.
> 

No, the solution is to preserve the shared state and pass it over to the
new kernel via KHO (Kexec HandOver).

> > 
> > > Also, what makes you think customers would even be interested in enabling
> > > our module in their kernel configs if it takes away kexec?
> > > 
> > 
> > It's simple: L1VH isn't a host, so I can spin up new VMs instead of
> > servicing the existing ones.
> 
> And what about the L2 VM state then? They might not be throwaway in all
> cases.
> 

L2 guests can (and likely will) be migrated from the old L1VH to the new
one.
And this is most likely the scenario customers are already using.

> > 
> > Why do you think there won’t be customers interested in using MSHV in
> > L1VH without kexec support?
> 
> Because they could already be using kexec for their servicing needs or
> whatever. And no we can't just say "don't service these VMs just spin up
> new ones".
> 

Are you speculating or know for sure?

> Also, keep in mind that once L1VH is available in Azure, the distros
> that run on it would be the same distros that run on all other Azure
> VMs. There won't be special distros with a kernel specifically built for
> L1VH. And KEXEC is generally enabled in distros. Distro vendors won't be
> happy that they would need to publish a separate version of their image with
> MSHV_ROOT enabled and KEXEC disabled because they wouldn't want KEXEC to
> be disabled for all Azure VMs. Also, the customers will be confused why
> the same distro doesn't work on L1VH.
> 

I don't think distro happiness is our concern. They already build custom
versions for Azure. They can build another custom version for L1VH if
needed.

Anyway, I don't see the point in continuing this discussion. All points
have been made, and solutions have been proposed.

If you can come up with something better in the next few days, so we at
least have a chance to get it merged in the next merge window, great. If
not, we should explicitly forbid the unsupported feature and move on.

Thanks,
Stanislav

> Thanks,
> Anirudh.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-02 20:15                               ` Mukesh R
@ 2026-02-04  2:46                                 ` Mukesh R
  0 siblings, 0 replies; 41+ messages in thread
From: Mukesh R @ 2026-02-04  2:46 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: Anirudh Rayabharam, kys, haiyangz, wei.liu, decui, longli,
	linux-hyperv, linux-kernel

On 2/2/26 12:15, Mukesh R wrote:
> On 2/2/26 08:43, Stanislav Kinsburskii wrote:
>> On Fri, Jan 30, 2026 at 11:47:48AM -0800, Mukesh R wrote:
>>> On 1/30/26 10:41, Stanislav Kinsburskii wrote:
>>>> On Fri, Jan 30, 2026 at 05:17:52PM +0000, Anirudh Rayabharam wrote:
>>>>> On Thu, Jan 29, 2026 at 06:59:31PM -0800, Mukesh R wrote:
>>>>>> On 1/28/26 15:08, Stanislav Kinsburskii wrote:
>>>>>>> On Tue, Jan 27, 2026 at 11:56:02AM -0800, Mukesh R wrote:
>>>>>>>> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
>>>>>>>>> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>>>>>>>>>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>>>>>>>>>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>>>>>>>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>>>>>>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>>>>>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>>>>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>>>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>>>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>>>>>>>>>>>> hypervisor deposited pages.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>>>>>>>>>> management is implemented.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>          drivers/hv/Kconfig |    1 +
>>>>>>>>>>>>>>>>>          1 file changed, 1 insertion(+)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>>>>>>>>>              # e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>>>>>>>>>              # no particular order, making it impossible to reassemble larger pages
>>>>>>>>>>>>>>>>>              depends on PAGE_SIZE_4KB
>>>>>>>>>>>>>>>>> +    depends on !KEXEC
>>>>>>>>>>>>>>>>>              select EVENTFD
>>>>>>>>>>>>>>>>>              select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>>>>>>>>>              select HMM_MIRROR
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP in kexec.c
>>>>>>>>>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>>>>>>>>>> and it was fine?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>>>>>>>>>> will be affected as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So not sure I understand the reason for this patch. We can just block
>>>>>>>>>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>>>>>>>>>> further development would be without a very important and major feature,
>>>>>>>>>>>>>> right?
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is an option. But until it's implemented and merged, a user mshv
>>>>>>>>>>>>> driver gets into a situation where kexec is broken in a non-obvious way.
>>>>>>>>>>>>> The system may crash at any time after kexec, depending on whether the
>>>>>>>>>>>>> new kernel touches the pages deposited to hypervisor or not. This is a
>>>>>>>>>>>>> bad user experience.
>>>>>>>>>>>>
>>>>>>>>>>>> I understand that. But with this we cannot collect core and debug any
>>>>>>>>>>>> crashes. I was thinking there would be a quick way to prohibit kexec
>>>>>>>>>>>> for update via notifier or some other quick hack. Did you already
>>>>>>>>>>>> explore that and didn't find anything, hence this?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This quick hack you mention isn't quick in the upstream kernel as there
>>>>>>>>>>> is no hook to interrupt the kexec process except the live update one.
>>>>>>>>>>
>>>>>>>>>> That's the one we want to interrupt and block, right? crash kexec
>>>>>>>>>> is ok and should be allowed. We can document we don't support kexec
>>>>>>>>>> for update for now.
>>>>>>>>>>
>>>>>>>>>>> I sent an RFC for that one, but given today's conversation details it
>>>>>>>>>>> won't be accepted as is.
>>>>>>>>>>
>>>>>>>>>> Are you talking about this?
>>>>>>>>>>
>>>>>>>>>>             "mshv: Add kexec safety for deposited pages"
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>>> Making mshv mutually exclusive with kexec is the only viable option for
>>>>>>>>>>> now given time constraints.
>>>>>>>>>>> It is intended to be replaced with proper page lifecycle management in
>>>>>>>>>>> the future.
>>>>>>>>>>
>>>>>>>>>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>>>>>>>>>> completely. What we want is to just block kexec for updates from some
>>>>>>>>>> mshv file for now; we can print during boot that kexec for updates is
>>>>>>>>>> not supported on mshv. Hope that makes sense.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The trade-off here is between disabling kexec support and having the
>>>>>>>>> kernel crash after kexec in a non-obvious way. This affects both regular
>>>>>>>>> kexec and crash kexec.
>>>>>>>>
>>>>>>>> crash kexec on baremetal is not affected, hence disabling that
>>>>>>>> doesn't make sense as we can't debug crashes then on bm.
>>>>>>>>
>>>>>>>
>>>>>>> Bare metal support is not currently relevant, as it is not available.
>>>>>>> This is the upstream kernel, and this driver will be accessible to
>>>>>>> third-party customers beginning with kernel 6.19 for running their
>>>>>>> kernels in Azure L1VH, so consistency is required.
>>>>>>
>>>>>> Well, without crashdump support, customers will not be running anything
>>>>>> anywhere.
>>>>>
>>>>> This is my concern too. I don't think customers will be particularly
>>>>> happy that kexec doesn't work with our driver.
>>>>>
>>>>
>>>> I wasn't clear earlier, so let me restate it. Today, kexec is not
>>>> supported in L1VH. This is a bug we have not fixed yet. Disabling kexec
>>>> is not a long-term solution. But it is better to disable it explicitly
>>>> than to have kernel crashes after kexec.
>>>
>>> I don't think there is disagreement on this. The undesired part is turning
>>> off KEXEC config completely.
>>>
>>
>> There is no disagreement on this either. If you have a better solution
>> that can be implemented and merged before next kernel merge window,
>> please propose it. Otherwise, this patch will remain as is for now.
> 
> Like I said previously, I'll explore a bit. I think I found something,
> but need to test it a bit and get a second opinion on it. For me, I am

Nah, it works, but it is too intrusive and has no chance of being accepted. So
giving up on it. Hopefully a cleaner way can be achieved working with
kexec folks.

Thanks,
-Mukesh


> not convinced this absolutely has to be in this merge window as it only
> involves MSHV for l1vh and has been like this all this time. Moreover,
> other things like makedumpfile are broken on l1vh. But Wei can make
> final decision.
> 
> Thanks,
> -Mukesh
> 
>> Thanks,
>> Stanislav
>>
>>> Thanks,
>>> -Mukesh
>>>
>>>
>>>> This does not mean the bug should not be fixed. But the upstream kernel
>>>> has its own policies and merge windows. For kernel 6.19, it is better to
>>>> have a clear kexec error than random crashes after kexec.
>>>>
>>>> Thanks,
>>>> Stanislav
>>>>
>>>>> Thanks,
>>>>> Anirudh
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> -Mukesh
>>>>>>
>>>>>>> Thanks,
>>>>>>> Stanislav
>>>>>>>
>>>>>>>> Let me think and explore a bit, and if I come up with something, I'll
>>>>>>>> send a patch here. If nothing, then we can do this as last resort.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Mukesh
>>>>>>>>
>>>>>>>>
>>>>>>>>> It's a pity we can't apply a quick hack to disable only regular kexec.
>>>>>>>>> However, since crash kexec would hit the same issues, until we have a
>>>>>>>>> proper state transition for deposited pages, the best workaround for now
>>>>>>>>> is to reset the hypervisor state on every kexec, which needs design,
>>>>>>>>> work, and testing.
>>>>>>>>>
>>>>>>>>> Disabling kexec is the only consistent way to handle this in the
>>>>>>>>> upstream kernel at the moment.
>>>>>>>>>
>>>>>>>>> Thanks, Stanislav
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Mukesh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Stanislav
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -Mukesh
>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore it should be explicitly forbidden as it's essentially not
>>>>>>>>>>>>> supported yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Stanislav
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Stanislav
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -Mukesh
>>>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-03 19:42                           ` Stanislav Kinsburskii
@ 2026-02-04  5:33                             ` Anirudh Rayabharam
  2026-02-04 18:33                               ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-02-04  5:33 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Tue, Feb 03, 2026 at 11:42:58AM -0800, Stanislav Kinsburskii wrote:
> On Tue, Feb 03, 2026 at 04:46:03PM +0000, Anirudh Rayabharam wrote:
> > On Tue, Feb 03, 2026 at 07:40:36AM -0800, Stanislav Kinsburskii wrote:
> > > On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> > > > On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > > > > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > > > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > No, it won't work, and hypervisor-deposited pages won't be withdrawn.
> > > > > > > > > > > > 
> > > > > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > > > > withdrawn).
> > > > > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > > > > host partition.
> > > > > > > > > > 
> > > > > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > > > > irrespective of userspace behavior?
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > > > > 
> > > > > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > > > > may still crash.
> > > > > > > > 
> > > > > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > > > > function during a kexec. Userspace processes would've been killed by
> > > > > > > > then.
> > > > > > > > 
> > > > > > > 
> > > > > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > > > > We must not rely on OS to do graceful shutdown before doing
> > > > > > > kexec.
> > > > > > 
> > > > > > I see kexec -e is too brutal. Something like systemctl kexec is
> > > > > > more graceful and is probably used more commonly. In this case at least
> > > > > > we could register a reboot notifier and attempt to clean things up.
> > > > > > 
> > > > > > I think it is better to support kexec to this extent rather than
> > > > > > disabling it entirely.
> > > > > > 
> > > > > 
> > > > > You do understand that once our kernel is released to third parties, we
> > > > > can’t control how they will use kexec, right?
> > > > 
> > > > Yes, we can't. But that's okay. It is fine for us to say that only some
> > > > kexec scenarios are supported and some aren't (iff you're creating VMs
> > > > using MSHV; if you're not creating VMs all of kexec is supported).
> > > > 
> > > 
> > > Well, I disagree here. If we say the kernel supports MSHV, we must
> > > provide a robust solution. A partially working solution is not
> > > acceptable. It makes us look careless and can damage our reputation as a
> > > team (and as a company).
> > 
> > It won't if we call out upfront what is supported and what is not.
> > 
> > > 
> > > > > 
> > > > > This is a valid and existing option. We have to account for it. Yet
> > > > > again, L1VH will be used by arbitrary third parties out there, not just
> > > > > by us.
> > > > > 
> > > > > We can’t say the kernel supports MSHV until we close these gaps. We must
> > > > 
> > > > We can. It is okay to say some scenarios are supported and some aren't.
> > > > 
> > > > All kexecs are supported if they never create VMs using MSHV. If they do
> > > > create VMs using MSHV and we implement cleanup in a reboot notifier at
> > > > least systemctl kexec and crashdump kexec would work, which are probably the
> > > > most common uses of kexec. It's okay to say that this is all we support
> > > > as of now.
> > > > 
> > > 
> > > I'm repeating myself, but I'll try to put it differently.
> > > There won't be any kernel core collected if a page was deposited. You're
> > > arguing for a lost cause here. Once a page is allocated and deposited,
> > > the crash kernel will try to write it into the core.
> > 
> > That's why we have to implement something where we attempt to destroy
> > partitions and reclaim memory (and BUG() out if that fails; which
> > hopefully should happen very rarely if at all). This should be *the*
> > solution we work towards. We don't need a temporary disable kexec
> > solution.
> > 
> 
> No, the solution is to preserve the shared state and pass it over via KHO.

Okay, then work towards it without doing temporary KEXEC disable. We can
call out that kexec is not supported until then. Disabling KEXEC is too
intrusive.

Is there any precedent for this? Do you know if any driver ever disabled
KEXEC this way?

> 
> > > 
> > > > Also, what makes you think customers would even be interested in enabling
> > > > our module in their kernel configs if it takes away kexec?
> > > > 
> > > 
> > > It's simple: L1VH isn't a host, so I can spin up new VMs instead of
> > > servicing the existing ones.
> > 
> > And what about the L2 VM state then? They might not be throwaway in all
> > cases.
> > 
> 
> > The L2 guest can (and likely will) be migrated from the old L1VH to the
> > new one.
> And this is most likely the current scenario customers are using.
> 
> > > 
> > > Why do you think there won’t be customers interested in using MSHV in
> > > L1VH without kexec support?
> > 
> > Because they could already be using kexec for their servicing needs or
> > whatever. And no we can't just say "don't service these VMs just spin up
> > new ones".
> > 
> 
> > Are you speculating, or do you know for sure?

It's a reasonable assumption that people are using kexec for servicing.

> 
> > Also, keep in mind that once L1VH is available in Azure, the distros
> > that run on it would be the same distros that run on all other Azure
> > VMs. There won't be special distros with a kernel specifically built for
> > L1VH. And KEXEC is generally enabled in distros. Distro vendors won't be
> > happy that they would need to publish a separate version of their image with
> > MSHV_ROOT enabled and KEXEC disabled because they wouldn't want KEXEC to
> > be disabled for all Azure VMs. Also, the customers will be confused why
> > the same distro doesn't work on L1VH.
> > 
> 
> I don't think distro happiness is our concern. They already build custom

If distros are not happy they won't package this and consequently
nobody will use it.

> versions for Azure. They can build another custom version for L1VH if
> needed.

We should at least check if they are ready to do this.

Thanks,
Anirudh.

> 
> Anyway, I don't see the point in continuing this discussion. All points
> have been made, and solutions have been proposed.
> 
> If you can come up with something better in the next few days, so we at
> least have a chance to get it merged in the next merge window, great. If
> not, we should explicitly forbid the unsupported feature and move on.
> 
> Thanks,
> Stanislav
> 
> > Thanks,
> > Anirudh.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-04  5:33                             ` Anirudh Rayabharam
@ 2026-02-04 18:33                               ` Stanislav Kinsburskii
  2026-02-05  4:59                                 ` Anirudh Rayabharam
  0 siblings, 1 reply; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-04 18:33 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Wed, Feb 04, 2026 at 05:33:29AM +0000, Anirudh Rayabharam wrote:
> On Tue, Feb 03, 2026 at 11:42:58AM -0800, Stanislav Kinsburskii wrote:
> > On Tue, Feb 03, 2026 at 04:46:03PM +0000, Anirudh Rayabharam wrote:
> > > On Tue, Feb 03, 2026 at 07:40:36AM -0800, Stanislav Kinsburskii wrote:
> > > > On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> > > > > On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > > > > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > No, it won't work, and hypervisor-deposited pages won't be withdrawn.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > > > > > withdrawn).
> > > > > > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > > > > > host partition.
> > > > > > > > > > > 
> > > > > > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > > > > > irrespective of userspace behavior?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > > > > > 
> > > > > > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > > > > > may still crash.
> > > > > > > > > 
> > > > > > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > > > > > function during a kexec. Userspace processes would've been killed by
> > > > > > > > > then.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > > > > > We must not rely on OS to do graceful shutdown before doing
> > > > > > > > kexec.
> > > > > > > 
> > > > > > > I see kexec -e is too brutal. Something like systemctl kexec is
> > > > > > > more graceful and is probably used more commonly. In this case at least
> > > > > > > we could register a reboot notifier and attempt to clean things up.
> > > > > > > 
> > > > > > > I think it is better to support kexec to this extent rather than
> > > > > > > disabling it entirely.
> > > > > > > 
> > > > > > 
> > > > > > You do understand that once our kernel is released to third parties, we
> > > > > > can’t control how they will use kexec, right?
> > > > > 
> > > > > Yes, we can't. But that's okay. It is fine for us to say that only some
> > > > > kexec scenarios are supported and some aren't (iff you're creating VMs
> > > > > using MSHV; if you're not creating VMs all of kexec is supported).
> > > > > 
> > > > 
> > > > Well, I disagree here. If we say the kernel supports MSHV, we must
> > > > provide a robust solution. A partially working solution is not
> > > > acceptable. It makes us look careless and can damage our reputation as a
> > > > team (and as a company).
> > > 
> > > It won't if we call out upfront what is supported and what is not.
> > > 
> > > > 
> > > > > > 
> > > > > > This is a valid and existing option. We have to account for it. Yet
> > > > > > again, L1VH will be used by arbitrary third parties out there, not just
> > > > > > by us.
> > > > > > 
> > > > > > We can’t say the kernel supports MSHV until we close these gaps. We must
> > > > > 
> > > > > We can. It is okay to say some scenarios are supported and some aren't.
> > > > > 
> > > > > All kexecs are supported if they never create VMs using MSHV. If they do
> > > > > create VMs using MSHV and we implement cleanup in a reboot notifier at
> > > > > least systemctl kexec and crashdump kexec would work, which are probably the
> > > > > most common uses of kexec. It's okay to say that this is all we support
> > > > > as of now.
> > > > > 
> > > > 
> > > > I'm repeating myself, but I'll try to put it differently.
> > > > There won't be any kernel core collected if a page was deposited. You're
> > > > arguing for a lost cause here. Once a page is allocated and deposited,
> > > > the crash kernel will try to write it into the core.
> > > 
> > > That's why we have to implement something where we attempt to destroy
> > > partitions and reclaim memory (and BUG() out if that fails; which
> > > hopefully should happen very rarely if at all). This should be *the*
> > > solution we work towards. We don't need a temporary disable kexec
> > > solution.
> > > 
> > 
> > No, the solution is to preserve the shared state and pass it over via KHO.
> 
> Okay, then work towards it without doing temporary KEXEC disable. We can
> call out that kexec is not supported until then. Disabling KEXEC is too
> intrusive.
> 

What do you mean by "too intrusive"? The change is local to the driver's
Kconfig. There are no verbal "callouts" in upstream Linux - that's
exactly what Kconfig is used for. Once the proper solution is
implemented, we can remove the restriction.
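
For reference, the entire "restriction" being discussed is one Kconfig
line (the comment here is my annotation, not part of the patch):

```kconfig
config MSHV_ROOT
	# Deposited pages are never withdrawn yet, so a kexec'd kernel
	# can reuse memory the hypervisor still owns. Lift this once
	# page lifecycle management / KHO hand-over is in place.
	depends on !KEXEC
```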

> Is there any precedent for this? Do you know if any driver ever disabled
> KEXEC this way?
> 

No, but there is no other driver similar to this one.
Why does it matter though?

> > 
> > > > 
> > > > > Also, what makes you think customers would even be interested in enabling
> > > > > our module in their kernel configs if it takes away kexec?
> > > > > 
> > > > 
> > > > It's simple: L1VH isn't a host, so I can spin up new VMs instead of
> > > > servicing the existing ones.
> > > 
> > > And what about the L2 VM state then? They might not be throwaway in all
> > > cases.
> > > 
> > 
> > The L2 guest can (and likely will) be migrated from the old L1VH to the
> > new one.
> > And this is most likely the current scenario customers are using.
> > 
> > > > 
> > > > Why do you think there won’t be customers interested in using MSHV in
> > > > L1VH without kexec support?
> > > 
> > > Because they could already be using kexec for their servicing needs or
> > > whatever. And no we can't just say "don't service these VMs just spin up
> > > new ones".
> > > 
> > 
> > Are you speculating, or do you know for sure?
> 
> It's a reasonable assumption that people are using kexec for servicing.
> 

Again, using kexec for servicing is not supported, so why pretend it is?

> > 
> > > Also, keep in mind that once L1VH is available in Azure, the distros
> > > that run on it would be the same distros that run on all other Azure
> > > VMs. There won't be special distros with a kernel specifically built for
> > > L1VH. And KEXEC is generally enabled in distros. Distro vendors won't be
> > > happy that they would need to publish a separate version of their image with
> > > MSHV_ROOT enabled and KEXEC disabled because they wouldn't want KEXEC to
> > > be disabled for all Azure VMs. Also, the customers will be confused why
> > > the same distro doesn't work on L1VH.
> > > 
> > 
> > I don't think distro happiness is our concern. They already build custom
> 
> If distros are not happy they won't package this and consequently
> nobody will use it.
> 

Could you provide an example of such issues in the past?

> > versions for Azure. They can build another custom version for L1VH if
> > needed.
> 
> We should at least check if they are ready to do this.
> 

This is a labor-intensive and long-term check. Unless there is solid
evidence that they won't do it, I don't see the point in doing this.

Thanks,
Stanislav

> Thanks,
> Anirudh.
> 
> > 
> > Anyway, I don't see the point in continuing this discussion. All points
> > have been made, and solutions have been proposed.
> > 
> > If you can come up with something better in the next few days, so we at
> > least have a chance to get it merged in the next merge window, great. If
> > not, we should explicitly forbid the unsupported feature and move on.
> > 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > Anirudh.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-04 18:33                               ` Stanislav Kinsburskii
@ 2026-02-05  4:59                                 ` Anirudh Rayabharam
  2026-02-05 17:12                                   ` Stanislav Kinsburskii
  0 siblings, 1 reply; 41+ messages in thread
From: Anirudh Rayabharam @ 2026-02-05  4:59 UTC (permalink / raw)
  To: Stanislav Kinsburskii
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Wed, Feb 04, 2026 at 10:33:11AM -0800, Stanislav Kinsburskii wrote:
> On Wed, Feb 04, 2026 at 05:33:29AM +0000, Anirudh Rayabharam wrote:
> > On Tue, Feb 03, 2026 at 11:42:58AM -0800, Stanislav Kinsburskii wrote:
> > > On Tue, Feb 03, 2026 at 04:46:03PM +0000, Anirudh Rayabharam wrote:
> > > > On Tue, Feb 03, 2026 at 07:40:36AM -0800, Stanislav Kinsburskii wrote:
> > > > > On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> > > > > > On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > No, it won't work and hypervisor deposited pages won't be withdrawn.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > > > > > > withdrawn).
> > > > > > > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > > > > > > host partition.
> > > > > > > > > > > > 
> > > > > > > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > > > > > > irrespective of userspace behavior?
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > > > > > > 
> > > > > > > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > > > > > > may still crash.
> > > > > > > > > > 
> > > > > > > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > > > > > > function during a kexec. Userspace processes would've been killed by
> > > > > > > > > > then.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > > > > > > We must not rely on OS to do graceful shutdown before doing
> > > > > > > > > kexec.
> > > > > > > > 
> > > > > > > > I see, kexec -e is too brutal. Something like systemctl kexec is
> > > > > > > > more graceful and is probably used more commonly. In this case at least
> > > > > > > > we could register a reboot notifier and attempt to clean things up.
> > > > > > > > 
> > > > > > > > I think it is better to support kexec to this extent rather than
> > > > > > > > disabling it entirely.
> > > > > > > > 
> > > > > > > 
> > > > > > > You do understand that once our kernel is released to third parties, we
> > > > > > > can’t control how they will use kexec, right?
> > > > > > 
> > > > > > Yes, we can't. But that's okay. It is fine for us to say that only some
> > > > > > kexec scenarios are supported and some aren't (iff you're creating VMs
> > > > > > using MSHV; if you're not creating VMs all of kexec is supported).
> > > > > > 
> > > > > 
> > > > > Well, I disagree here. If we say the kernel supports MSHV, we must
> > > > > provide a robust solution. A partially working solution is not
> > > > > acceptable. It makes us look careless and can damage our reputation as a
> > > > > team (and as a company).
> > > > 
> > > > It won't if we call out upfront what is supported and what is not.
> > > > 
> > > > > 
> > > > > > > 
> > > > > > > This is a valid and existing option. We have to account for it. Yet
> > > > > > > again, L1VH will be used by arbitrary third parties out there, not just
> > > > > > > by us.
> > > > > > > 
> > > > > > > We can’t say the kernel supports MSHV until we close these gaps. We must
> > > > > > 
> > > > > > We can. It is okay to say some scenarios are supported and some aren't.
> > > > > > 
> > > > > > All kexecs are supported if they never create VMs using MSHV. If they do
> > > > > > create VMs using MSHV and we implement cleanup in a reboot notifier at
> > > > > > least systemctl kexec and crashdump kexec would work, which are probably the
> > > > > > most common uses of kexec. It's okay to say that this is all we support
> > > > > > as of now.
> > > > > > 
> > > > > 
> > > > > I'm repeating myself, but I'll try to put it differently.
> > > > > There won't be any kernel core collected if a page was deposited. You're
> > > > > arguing for a lost cause here. Once a page is allocated and deposited,
> > > > > the crash kernel will try to write it into the core.
> > > > 
> > > > That's why we have to implement something where we attempt to destroy
> > > > partitions and reclaim memory (and BUG() out if that fails; which
> > > > hopefully should happen very rarely if at all). This should be *the*
> > > > solution we work towards. We don't need a temporary disable kexec
> > > > solution.
> > > > 
> > > 
> > > No, the solution is to preserve the shared state and pass it over via KHO.
> > 
> > Okay, then work towards it without doing temporary KEXEC disable. We can
> > call out that kexec is not supported until then. Disabling KEXEC is too
> > intrusive.
> > 
> 
> What do you mean by "too intrusive"? The change is local to the driver's
> Kconfig. There are no verbal "callouts" in upstream Linux - that's
> exactly what Kconfig is used for. Once the proper solution is
> implemented, we can remove the restriction.
> 
> > Is there any precedent for this? Do you know if any driver ever disabled
> > KEXEC this way?
> > 
> 
> No, but there is no other similar driver like this one.

It doesn't have to be like this one. There could be issues with device
states during kexec.

> Why does it matter though?

To learn from past precedents.

> 
> > > 
> > > > > 
> > > > > > Also, what makes you think customers would even be interested in enabling
> > > > > > our module in their kernel configs if it takes away kexec?
> > > > > > 
> > > > > 
> > > > > It's simple: L1VH isn't a host, so I can spin up new VMs instead of
> > > > > servicing the existing ones.
> > > > 
> > > > And what about the L2 VM state then? They might not be throwaway in all
> > > > cases.
> > > > 
> > > 
> > > L2 guests can (and likely will) be migrated from the old L1VH to the new
> > > one.
> > > And this is most likely the current scenario customers are using.
> > > 
> > > > > 
> > > > > Why do you think there won’t be customers interested in using MSHV in
> > > > > L1VH without kexec support?
> > > > 
> > > > Because they could already be using kexec for their servicing needs or
> > > > whatever. And no we can't just say "don't service these VMs just spin up
> > > > new ones".
> > > > 
> > > 
> > > Are you speculating, or do you know for sure?
> > 
> > It's a reasonable assumption that people are using kexec for servicing.
> > 
> 
> Again, using kexec for servicing is not supported: why pretend it is?

What this patch effectively asserts is that kexec is unsupported whenever the
MSHV driver is enabled. But that is not accurate. Enabling MSHV does not
necessarily imply that it is being used. The correct statement is that kexec is
unsupported only when MSHV is *in use*, i.e. when one or more VMs are
running.

By disabling kexec unconditionally, the patch prevents a valid workflow in
situations where no VMs exist and kexec would work without issue. This imposes a
blanket restriction instead of enforcing the actual requirement.

And sure, I understand there is no way to enforce that actual
requirement. So this is what I propose:

The statement "kexec is not supported when the MSHV driver is used" can be
documented on docs.microsoft.com once direct virtualization becomes broadly
available. The documentation can also provide operational guidance, such as
shutting down all VMs before invoking kexec for servicing. This preserves a
practical path for users who rely on kexec. If kexec is disabled entirely, that
flexibility is lost.

The stricter approach ensures users cannot accidentally make a mistake, which
has its merits. However, my approach gives more power and discretion to
the user. In parallel, we of course continue to work on making it
robust.

> 
> > > 
> > > > Also, keep in mind that once L1VH is available in Azure, the distros
> > > > that run on it would be the same distros that run on all other Azure
> > > > VMs. There won't be special distros with a kernel specifically built for
> > > > L1VH. And KEXEC is generally enabled in distros. Distro vendors won't be
> > > > happy that they would need to publish a separate version of their image with
> > > > MSHV_ROOT enabled and KEXEC disabled because they wouldn't want KEXEC to
> > > > be disabled for all Azure VMs. Also, the customers will be confused why
> > > > the same distro doesn't work on L1VH.
> > > > 
> > > 
> > > I don't think distro happiness is our concern. They already build custom
> > 
> > If distros are not happy they won't package this and consequently
> > nobody will use it.
> > 
> 
> Could you provide an example of such issues in the past?
> 
> > > versions for Azure. They can build another custom version for L1VH if
> > > needed.
> > 
> > We should at least check if they are ready to do this.
> > 
> 
> This is a labor-intensive, long-term check. Unless there is solid
> evidence that they won't do it, I don't see the point in doing this.

It is reasonable to assume that maintaining an additional flavor of a
distro is an overhead (maintaining new package(s), maintaining Azure
marketplace images, etc.). This should be enough reason to check. Not
everything needs solid evidence. Oftentimes a reasonable suspicion
will do.

Thanks,
Anirudh.

> 
> Thanks,
> Stanislav
> 
> > Thanks,
> > Anirudh.
> > 
> > > 
> > > Anyway, I don't see the point in continuing this discussion. All points
> > > have been made, and solutions have been proposed.
> > > 
> > > If you can come up with something better in the next few days, so we at
> > > least have a chance to get it merged in the next merge window, great. If
> > > not, we should explicitly forbid the unsupported feature and move on.
> > > 
> > > Thanks,
> > > Stanislav
> > > 
> > > > Thanks,
> > > > Anirudh.


* Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
  2026-02-05  4:59                                 ` Anirudh Rayabharam
@ 2026-02-05 17:12                                   ` Stanislav Kinsburskii
  0 siblings, 0 replies; 41+ messages in thread
From: Stanislav Kinsburskii @ 2026-02-05 17:12 UTC (permalink / raw)
  To: Anirudh Rayabharam
  Cc: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel

On Thu, Feb 05, 2026 at 04:59:35AM +0000, Anirudh Rayabharam wrote:
> On Wed, Feb 04, 2026 at 10:33:11AM -0800, Stanislav Kinsburskii wrote:
> > On Wed, Feb 04, 2026 at 05:33:29AM +0000, Anirudh Rayabharam wrote:
> > > On Tue, Feb 03, 2026 at 11:42:58AM -0800, Stanislav Kinsburskii wrote:
> > > > On Tue, Feb 03, 2026 at 04:46:03PM +0000, Anirudh Rayabharam wrote:
> > > > > On Tue, Feb 03, 2026 at 07:40:36AM -0800, Stanislav Kinsburskii wrote:
> > > > > > On Tue, Feb 03, 2026 at 10:34:28AM +0530, Anirudh Rayabharam wrote:
> > > > > > > On Mon, Feb 02, 2026 at 11:18:27AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > On Mon, Feb 02, 2026 at 07:01:01PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > On Mon, Feb 02, 2026 at 09:10:00AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > On Fri, Jan 30, 2026 at 08:32:45PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > > > > > > > > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > > > > > > > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > > > > > > > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > > > > > > > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > > > > > > > > > > > loaded via KEXEC, leading to potential system crashes upon kernel accessing
> > > > > > > > > > > > > > > > > > hypervisor deposited pages.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > > > > > > > > > > > > management is implemented.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > > > > > > > > > > > > and would work without any issue for L1VH.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > No, it won't work and hypervisor deposited pages won't be withdrawn.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > > > > > > > > > > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > > > > > > > > > > > > right? What other deposited pages would be left?
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > The driver deposits two types of pages: one for the guests (withdrawn
> > > > > > > > > > > > > > upon guest shutdown) and the other - for the host itself (never
> > > > > > > > > > > > > > withdrawn).
> > > > > > > > > > > > > > See hv_call_create_partition, for example: it deposits pages for the
> > > > > > > > > > > > > > host partition.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > > > > > > > > > > > > Also, can't we forcefully kill all running partitions in module_exit and
> > > > > > > > > > > > > then reclaim memory? Would this help with kernel consistency
> > > > > > > > > > > > > irrespective of userspace behavior?
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > It would, but this is sloppy and cannot be a long-term solution.
> > > > > > > > > > > > 
> > > > > > > > > > > > It is also not reliable. We have no hook to prevent kexec. So if we fail
> > > > > > > > > > > > to kill the guest or reclaim the memory for any reason, the new kernel
> > > > > > > > > > > > may still crash.
> > > > > > > > > > > 
> > > > > > > > > > > Actually guests won't be running by the time we reach our module_exit
> > > > > > > > > > > function during a kexec. Userspace processes would've been killed by
> > > > > > > > > > > then.
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > No, they will not: "kexec -e" doesn't kill user processes.
> > > > > > > > > > We must not rely on OS to do graceful shutdown before doing
> > > > > > > > > > kexec.
> > > > > > > > > 
> > > > > > > > > I see, kexec -e is too brutal. Something like systemctl kexec is
> > > > > > > > > more graceful and is probably used more commonly. In this case at least
> > > > > > > > > we could register a reboot notifier and attempt to clean things up.
> > > > > > > > > 
> > > > > > > > > I think it is better to support kexec to this extent rather than
> > > > > > > > > disabling it entirely.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > You do understand that once our kernel is released to third parties, we
> > > > > > > > can’t control how they will use kexec, right?
> > > > > > > 
> > > > > > > Yes, we can't. But that's okay. It is fine for us to say that only some
> > > > > > > kexec scenarios are supported and some aren't (iff you're creating VMs
> > > > > > > using MSHV; if you're not creating VMs all of kexec is supported).
> > > > > > > 
> > > > > > 
> > > > > > Well, I disagree here. If we say the kernel supports MSHV, we must
> > > > > > provide a robust solution. A partially working solution is not
> > > > > > acceptable. It makes us look careless and can damage our reputation as a
> > > > > > team (and as a company).
> > > > > 
> > > > > It won't if we call out upfront what is supported and what is not.
> > > > > 
> > > > > > 
> > > > > > > > 
> > > > > > > > This is a valid and existing option. We have to account for it. Yet
> > > > > > > > again, L1VH will be used by arbitrary third parties out there, not just
> > > > > > > > by us.
> > > > > > > > 
> > > > > > > > We can’t say the kernel supports MSHV until we close these gaps. We must
> > > > > > > 
> > > > > > > We can. It is okay to say some scenarios are supported and some aren't.
> > > > > > > 
> > > > > > > All kexecs are supported if they never create VMs using MSHV. If they do
> > > > > > > create VMs using MSHV and we implement cleanup in a reboot notifier at
> > > > > > > least systemctl kexec and crashdump kexec would work, which are probably the
> > > > > > > most common uses of kexec. It's okay to say that this is all we support
> > > > > > > as of now.
> > > > > > > 
> > > > > > 
> > > > > > I'm repeating myself, but I'll try to put it differently.
> > > > > > There won't be any kernel core collected if a page was deposited. You're
> > > > > > arguing for a lost cause here. Once a page is allocated and deposited,
> > > > > > the crash kernel will try to write it into the core.
> > > > > 
> > > > > That's why we have to implement something where we attempt to destroy
> > > > > partitions and reclaim memory (and BUG() out if that fails; which
> > > > > hopefully should happen very rarely if at all). This should be *the*
> > > > > solution we work towards. We don't need a temporary disable kexec
> > > > > solution.
> > > > > 
> > > > 
> > > > No, the solution is to preserve the shared state and pass it over via KHO.
> > > 
> > > Okay, then work towards it without doing temporary KEXEC disable. We can
> > > call out that kexec is not supported until then. Disabling KEXEC is too
> > > intrusive.
> > > 
> > 
> > What do you mean by "too intrusive"? The change is local to the driver's
> > Kconfig. There are no verbal "callouts" in upstream Linux - that's
> > exactly what Kconfig is used for. Once the proper solution is
> > implemented, we can remove the restriction.
> > 
> > > Is there any precedent for this? Do you know if any driver ever disabled
> > > KEXEC this way?
> > > 
> > 
> > No, but there is no other similar driver like this one.
> 
> It doesn't have to be like this one. There could be issues with device
> states during kexec.
> 
> > Why does it matter though?
> 
> To learn from past precedents.
> 
> > 
> > > > 
> > > > > > 
> > > > > > > Also, what makes you think customers would even be interested in enabling
> > > > > > > our module in their kernel configs if it takes away kexec?
> > > > > > > 
> > > > > > 
> > > > > > It's simple: L1VH isn't a host, so I can spin up new VMs instead of
> > > > > > servicing the existing ones.
> > > > > 
> > > > > And what about the L2 VM state then? They might not be throwaway in all
> > > > > cases.
> > > > > 
> > > > 
> > > > L2 guests can (and likely will) be migrated from the old L1VH to the new
> > > > one.
> > > > And this is most likely the current scenario customers are using.
> > > > 
> > > > > > 
> > > > > > Why do you think there won’t be customers interested in using MSHV in
> > > > > > L1VH without kexec support?
> > > > > 
> > > > > Because they could already be using kexec for their servicing needs or
> > > > > whatever. And no we can't just say "don't service these VMs just spin up
> > > > > new ones".
> > > > > 
> > > > 
> > > > Are you speculating, or do you know for sure?
> > > 
> > > It's a reasonable assumption that people are using kexec for servicing.
> > > 
> > 
> > Again, using kexec for servicing is not supported: why pretend it is?
> 
> What this patch effectively asserts is that kexec is unsupported whenever the
> MSHV driver is enabled. But that is not accurate. Enabling MSHV does not
> necessarily imply that it is being used. The correct statement is that kexec is
> unsupported only when MSHV is *in use*, i.e. when one or more VMs are
> running.
> 
> By disabling kexec unconditionally, the patch prevents a valid workflow in
> situations where no VMs exist and kexec would work without issue. This imposes a
> blanket restriction instead of enforcing the actual requirement.
> 
> And sure, I understand there is no way to enforce that actual
> requirement. So this is what I propose:
> 
> The statement "kexec is not supported when the MSHV driver is used" can be
> documented on docs.microsoft.com once direct virtualization becomes broadly
> available. The documentation can also provide operational guidance, such as
> shutting down all VMs before invoking kexec for servicing. This preserves a
> practical path for users who rely on kexec. If kexec is disabled entirely, that
> flexibility is lost.
> 
> The stricter approach ensures users cannot accidentally make a mistake, which
> has its merits. However, my approach gives more power and discretion to
> the user. In parallel, we of course continue to work on making it
> robust.
> 

The flexibility is much smaller than you described. The host can’t kexec
if a VM was ever created, because we don’t withdraw the host pages.

Even if we try to withdraw pages during kexec, it won’t help with crash
collection. Those pages will be in use and won’t be available to
withdraw.

So the trade-off is between being able to kexec safely only before any
VM has been launched, or blocking it completely.

> > 
> > > > 
> > > > > Also, keep in mind that once L1VH is available in Azure, the distros
> > > > > that run on it would be the same distros that run on all other Azure
> > > > > VMs. There won't be special distros with a kernel specifically built for
> > > > > L1VH. And KEXEC is generally enabled in distros. Distro vendors won't be
> > > > > happy that they would need to publish a separate version of their image with
> > > > > MSHV_ROOT enabled and KEXEC disabled because they wouldn't want KEXEC to
> > > > > be disabled for all Azure VMs. Also, the customers will be confused why
> > > > > the same distro doesn't work on L1VH.
> > > > > 
> > > > 
> > > > I don't think distro happiness is our concern. They already build custom
> > > 
> > > If distros are not happy they won't package this and consequently
> > > nobody will use it.
> > > 
> > 
> > Could you provide an example of such issues in the past?
> > 
> > > > versions for Azure. They can build another custom version for L1VH if
> > > > needed.
> > > 
> > > We should at least check if they are ready to do this.
> > > 
> > 
> > This is a labor-intensive, long-term check. Unless there is solid
> > evidence that they won't do it, I don't see the point in doing this.
> 
> It is reasonable to assume that maintaining an additional flavor of a
> distro is an overhead (maintaining new package(s), maintaining Azure
> marketplace images, etc.). This should be enough reason to check. Not
> everything needs solid evidence. Oftentimes a reasonable suspicion
> will do.
> 

There will be a new kernel flavor anyway. That means a new kernel
package. If we also need a separate distro image for MSHV on Azure VMs,
it will be needed regardless of kexec support. There won’t be a generic
Ubuntu build that works both for regular guest VMs and for L1VH VMs any
time soon.

Thanks,
Stanislav

> Thanks,
> Anirudh.
> 
> > 
> > Thanks,
> > Stanislav
> > 
> > > Thanks,
> > > Anirudh.
> > > 
> > > > 
> > > > Anyway, I don't see the point in continuing this discussion. All points
> > > > have been made, and solutions have been proposed.
> > > > 
> > > > If you can come up with something better in the next few days, so we at
> > > > least have a chance to get it merged in the next merge window, great. If
> > > > not, we should explicitly forbid the unsupported feature and move on.
> > > > 
> > > > Thanks,
> > > > Stanislav
> > > > 
> > > > > Thanks,
> > > > > Anirudh.


end of thread, other threads:[~2026-02-05 17:12 UTC | newest]

Thread overview: 41+ messages
2026-01-23 22:20 [PATCH] mshv: Make MSHV mutually exclusive with KEXEC Stanislav Kinsburskii
2026-01-24  0:09 ` Nuno Das Neves
2026-01-24  0:16 ` Mukesh R
2026-01-25 22:39   ` Stanislav Kinsburskii
2026-01-26 20:20     ` Mukesh R
2026-01-26 20:43       ` Stanislav Kinsburskii
2026-01-26 23:07         ` Mukesh R
2026-01-27  0:21           ` Stanislav Kinsburskii
2026-01-27  1:39             ` Mukesh R
2026-01-27 17:47               ` Stanislav Kinsburskii
2026-01-27 19:56                 ` Mukesh R
2026-01-28 15:53                   ` Michael Kelley
2026-01-30  2:52                     ` Mukesh R
2026-01-28 23:08                   ` Stanislav Kinsburskii
2026-01-30  2:59                     ` Mukesh R
2026-01-30 17:17                       ` Anirudh Rayabharam
2026-01-30 18:41                         ` Stanislav Kinsburskii
2026-01-30 19:47                           ` Mukesh R
2026-02-02 16:43                             ` Stanislav Kinsburskii
2026-02-02 20:15                               ` Mukesh R
2026-02-04  2:46                                 ` Mukesh R
2026-01-26 18:49 ` Anirudh Rayabharam
2026-01-26 20:46   ` Stanislav Kinsburskii
2026-01-28 16:16     ` Anirudh Rayabharam
2026-01-28 23:11       ` Stanislav Kinsburskii
2026-01-30 17:11         ` Anirudh Rayabharam
2026-01-30 18:46           ` Stanislav Kinsburskii
2026-01-30 20:32             ` Anirudh Rayabharam
2026-02-02 17:10               ` Stanislav Kinsburskii
2026-02-02 19:01                 ` Anirudh Rayabharam
2026-02-02 19:18                   ` Stanislav Kinsburskii
2026-02-03  5:04                     ` Anirudh Rayabharam
2026-02-03 15:40                       ` Stanislav Kinsburskii
2026-02-03 16:46                         ` Anirudh Rayabharam
2026-02-03 19:42                           ` Stanislav Kinsburskii
2026-02-04  5:33                             ` Anirudh Rayabharam
2026-02-04 18:33                               ` Stanislav Kinsburskii
2026-02-05  4:59                                 ` Anirudh Rayabharam
2026-02-05 17:12                                   ` Stanislav Kinsburskii
2026-02-02 18:09           ` Stanislav Kinsburskii
2026-02-02 16:56 ` Naman Jain
