From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6e480ee7-683a-e5f1-7448-51f257d58614@linux.microsoft.com>
Date: Thu, 29 Jan 2026 18:52:59 -0800
X-Mailing-List: linux-hyperv@vger.kernel.org
Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
To: Michael Kelley , Stanislav Kinsburskii
Cc: "kys@microsoft.com" , "haiyangz@microsoft.com" , "wei.liu@kernel.org" ,
 "decui@microsoft.com" , "longli@microsoft.com" ,
 "linux-hyperv@vger.kernel.org" , "linux-kernel@vger.kernel.org"
References: <176920684805.250171.6817228088359793537.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
 <549041d1-360d-d34c-4e3b-62802346acaa@linux.microsoft.com>
 <890506f6-9b91-5d59-8c98-086cf5d206bb@linux.microsoft.com>
 <2b42997d-7cc0-56ba-e1ca-a8640ce71ea9@linux.microsoft.com>
 <257ad7f1-5dc0-2644-41c3-960c396caa38@linux.microsoft.com>
 <4bcd7b66-6e3b-8f53-b688-ce0272123839@linux.microsoft.com>
From: Mukesh R
In-Reply-To:

On 1/28/26 07:53, Michael Kelley wrote:
> From: Mukesh R Sent: Tuesday, January 27, 2026 11:56 AM
>> To: Stanislav Kinsburskii
>> Cc: kys@microsoft.com; haiyangz@microsoft.com; wei.liu@kernel.org;
>> decui@microsoft.com; longli@microsoft.com; linux-hyperv@vger.kernel.org;
>> linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
>>
>> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
>>> On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
>>>> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
>>>>> On Mon,
>>>>> Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
>>>>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
>>>>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
>>>>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
>>>>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
>>>>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
>>>>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
>>>>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
>>>>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
>>>>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
>>>>>>>>>>> hypervisor deposited pages.
>>>>>>>>>>>
>>>>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
>>>>>>>>>>> management is implemented.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stanislav Kinsburskii
>>>>>>>>>>> ---
>>>>>>>>>>>  drivers/hv/Kconfig | 1 +
>>>>>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>>>>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
>>>>>>>>>>> --- a/drivers/hv/Kconfig
>>>>>>>>>>> +++ b/drivers/hv/Kconfig
>>>>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
>>>>>>>>>>> 	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>>>>>>>>>>> 	# no particular order, making it impossible to reassemble larger pages
>>>>>>>>>>> 	depends on PAGE_SIZE_4KB
>>>>>>>>>>> +	depends on !KEXEC
>>>>>>>>>>> 	select EVENTFD
>>>>>>>>>>> 	select VIRT_XFER_TO_GUEST_WORK
>>>>>>>>>>> 	select HMM_MIRROR
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP in kexec.c,
>>>>>>>>>> implying that crash dump might be involved. Or did you test kdump
>>>>>>>>>> and it was fine?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
>>>>>>>>> will be affected as well.
>>>>>>>>
>>>>>>>> So I'm not sure I understand the reason for this patch. We can just block
>>>>>>>> kexec if there are any VMs running, right? Doing this would mean any
>>>>>>>> further development would be without a very important and major feature,
>>>>>>>> right?
>>>>>>>
>>>>>>> This is an option. But until it's implemented and merged, a user of the
>>>>>>> mshv driver gets into a situation where kexec is broken in a non-obvious
>>>>>>> way. The system may crash at any time after kexec, depending on whether
>>>>>>> the new kernel touches the pages deposited to the hypervisor or not.
>>>>>>> This is a bad user experience.
>>>>>>
>>>>>> I understand that. But with this we cannot collect a core dump and debug
>>>>>> any crashes. I was thinking there would be a quick way to prohibit kexec
>>>>>> for update via a notifier or some other quick hack. Did you already
>>>>>> explore that and not find anything, hence this?
>>>>>>
>>>>>
>>>>> This quick hack you mention isn't quick in the upstream kernel, as there
>>>>> is no hook to interrupt the kexec process except the live update one.
>>>>
>>>> That's the one we want to interrupt and block, right? Crash kexec
>>>> is ok and should be allowed. We can document that we don't support kexec
>>>> for update for now.
>>>>
>>>>> I sent an RFC for that one, but given the details of today's conversation
>>>>> it won't be accepted as is.
>>>>
>>>> Are you talking about this?
>>>>
>>>> "mshv: Add kexec safety for deposited pages"
>>>>
>>>
>>> Yes.
>>>
>>>>> Making mshv mutually exclusive with kexec is the only viable option for
>>>>> now given time constraints.
>>>>> It is intended to be replaced with proper page lifecycle management in
>>>>> the future.
>>>>
>>>> Yeah, that could take a long time and imo we cannot just disable KEXEC
>>>> completely. What we want is to just block kexec for updates from some
>>>> mshv file for now; we can print during boot that kexec for updates is
>>>> not supported on mshv. Hope that makes sense.
>>>>
>>>
>>> The trade-off here is between disabling kexec support and having the
>>> kernel crash after kexec in a non-obvious way. This affects both regular
>>> kexec and crash kexec.
>>
>> Crash kexec on bare metal is not affected, hence disabling it
>> doesn't make sense, as we then can't debug crashes on bare metal.
>>
>> Let me think and explore a bit, and if I come up with something, I'll
>> send a patch here. If nothing, then we can do this as a last resort.
>>
>> Thanks,
>> -Mukesh
>
> Maybe you've already looked at this, but there's a sysctl parameter
> kernel.kexec_load_limit_reboot that prevents loading a kexec
> kernel for reboot if the value is zero. Separately, there is
> kernel.kexec_load_limit_panic that controls whether a kexec
> kernel can be loaded for kdump purposes.
>
> kernel.kexec_load_limit_reboot defaults to -1, which allows a kexec
> kernel to be loaded for reboot an unlimited number of times. But the value
> can be set to zero with this kernel boot line parameter:
>
> sysctl.kernel.kexec_load_limit_reboot=0
>
> Alternatively, the mshv driver initialization could add code along
> the lines of process_sysctl_arg() to open
> /proc/sys/kernel/kexec_load_limit_reboot and write a value of zero.
> Then there's no dependency on setting the kernel boot line.
>
> The downside to either method is that after Linux in the root partition
> is up and running, it is possible to change the sysctl to a non-zero value
> and then load a kexec kernel for reboot. So this approach isn't absolute
> protection against doing a kexec for reboot. But it makes it harder, and
> until there's a mechanism to reclaim the deposited pages, it might be
> a viable compromise to allow kdump to still be used.

Mmm...eee...weelll... I think I see a much easier way to do this by just
hijacking __kexec_lock. I will resume my normal work tmrw/Fri, so let me
test it out. If it works, I'll send a patch Monday.

Thanks,
-Mukesh

> Just a thought ....
>
> Michael
>
>>
>>
>>> It's a pity we can't apply a quick hack to disable only regular kexec.
>>> However, since crash kexec would hit the same issues, until we have a
>>> proper state transition for deposited pages, the best workaround for now
>>> is to reset the hypervisor state on every kexec, which needs design,
>>> work, and testing.
>>>
>>> Disabling kexec is the only consistent way to handle this in the
>>> upstream kernel at the moment.
>>>
>>> Thanks,
>>> Stanislav
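
[Editor's note: for readers following the thread, Michael's sysctl suggestion above amounts to the following shell sketch. It assumes a kernel new enough to have the kexec load-limit sysctls and a root shell in the root partition; it is a configuration illustration, not part of any patch in this thread.]

```shell
# Disallow loading a kexec kernel for reboot. kdump loading is governed
# by the separate kernel.kexec_load_limit_panic knob, so crash kernels
# can still be loaded and kdump keeps working.
sysctl -w kernel.kexec_load_limit_reboot=0

# Equivalent kernel boot-line form, applied before any userspace runs:
#   sysctl.kernel.kexec_load_limit_reboot=0

# Caveat noted in the thread: this is not absolute protection, because
# root can later raise the limit again and then load a kexec kernel:
#   sysctl -w kernel.kexec_load_limit_reboot=-1   # -1 = unlimited loads
```

Either form only blocks the load step of a reboot kexec; it does not touch the crash-kernel path, which is the property Mukesh wants to preserve for debugging on bare metal.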