* Re: [PATCH 0/2] kexec: Refuse kernel-unsafe Microsoft Hypervisor transitions
From: Stanislav Kinsburskii @ 2026-02-11 23:30 UTC
To: rppt, akpm, bhe, kys, haiyangz, wei.liu, decui, longli
Cc: kexec, linux-hyperv, linux-kernel
In-Reply-To: <176962149772.85424.9395505307198316093.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
On Wed, Jan 28, 2026 at 05:41:56PM +0000, Stanislav Kinsburskii wrote:
> When Microsoft Hypervisor is active, the kernel may have memory “deposited”
> to the hypervisor. Those pages are no longer safe for the kernel to touch,
> and attempting to access them can trigger a GPF. The problem becomes acute
> with kexec: the “deposited pages” state does not survive the transition,
> and the next kernel has no reliable way to know which pages are still
> owned/managed by the hypervisor.
>
> Until there is a proper handoff mechanism to preserve that state across
> kexec, the only safe behavior is to refuse kexec whenever there is shared
> hypervisor state that cannot survive the transition—most notably deposited
> pages, and also cases where VMs are still running.
>
> This series adds the missing kexec integration point needed by MSHV: a
> callback at the kexec “freeze” stage so the driver can make the transition
> safe (or block it). With this hook, MSHV can refuse kexec while VMs are
> running, attempt to withdraw deposited pages when possible (e.g. L1VH
> host), and fail the transition if any pages remain deposited.
>
> ---
>
> Stanislav Kinsburskii (2):
> kexec: Add permission notifier chain for kexec operations
> mshv: Add kexec blocking support
>
Hi,
I’m sending a gentle follow‑up on the patch series below, which I posted
about two weeks ago. I wanted to check whether anyone has had a chance
to look at it, or if there are concerns I should address.
Any feedback would be appreciated.
Thanks for your time.
Best regards,
Stanislav
>
> drivers/hv/Makefile | 1 +
> drivers/hv/hv_proc.c | 4 ++
> drivers/hv/mshv_kexec.c | 66 ++++++++++++++++++++++++++++++++++++++++
> drivers/hv/mshv_root.h | 14 ++++++++
> drivers/hv/mshv_root_hv_call.c | 2 +
> drivers/hv/mshv_root_main.c | 7 ++++
> include/linux/kexec.h | 6 ++++
> kernel/kexec_core.c | 24 +++++++++++++++
> 8 files changed, 124 insertions(+)
> create mode 100644 drivers/hv/mshv_kexec.c
>
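For readers who want to see the shape of the mechanism the cover letter describes, here is a minimal userspace sketch of a veto-style notifier chain. It is not the code from this series: every name in it (kexec_perm_register, kexec_perm_check, struct mshv_model, mshv_model_kexec_check) is hypothetical, and the real patches presumably build on the kernel's existing notifier infrastructure rather than a hand-rolled list.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from the series. */
typedef int (*kexec_perm_fn)(void *ctx);

struct kexec_perm_notifier {
	kexec_perm_fn fn;
	void *ctx;
	struct kexec_perm_notifier *next;
};

static struct kexec_perm_notifier *kexec_perm_chain;

/* Add a subscriber at the head of the chain. */
void kexec_perm_register(struct kexec_perm_notifier *n)
{
	n->next = kexec_perm_chain;
	kexec_perm_chain = n;
}

/* Returns 0 if every subscriber allows the transition, else the first error. */
int kexec_perm_check(void)
{
	struct kexec_perm_notifier *n;

	for (n = kexec_perm_chain; n; n = n->next) {
		int ret = n->fn(n->ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Model of the hypervisor driver state that matters for kexec. */
struct mshv_model {
	int running_vms;
	unsigned long deposited_pages;
	int can_withdraw;	/* assumed to model the L1VH host case */
};

/* Refuse while VMs run; try to withdraw pages; fail if any remain. */
int mshv_model_kexec_check(void *ctx)
{
	struct mshv_model *m = ctx;

	if (m->running_vms)
		return -EBUSY;
	if (m->deposited_pages && m->can_withdraw)
		m->deposited_pages = 0;	/* withdrawal succeeded in this model */
	return m->deposited_pages ? -EBUSY : 0;
}
```

In this model, the kexec path would call kexec_perm_check() at the freeze stage and abort the transition on any non-zero return.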
* RE: [PATCH 2/2] drm/hyperv: During panic do VMBus unload after frame buffer is flushed
From: mhklkml @ 2026-02-11 23:01 UTC
To: 'Jocelyn Falempe', mhklinux, drawat.floss,
maarten.lankhorst, mripard, tzimmermann, airlied, simona, kys,
haiyangz, wei.liu, decui, longli, ryasuoka
Cc: dri-devel, linux-kernel, linux-hyperv, stable
In-Reply-To: <a5372b72-8dc0-4f2d-ad5c-086f3e75ee81@redhat.com>
From: Jocelyn Falempe <jfalempe@redhat.com> Sent: Wednesday, February 11, 2026 1:54 PM
>
> On 09/02/2026 08:02, mhkelley58@gmail.com wrote:
> > From: Michael Kelley <mhklinux@outlook.com>
> >
> > In a VM, Linux panic information (reason for the panic, stack trace,
> > etc.) may be written to a serial console and/or a virtual frame buffer
> > for a graphics console. The latter may need to be flushed back to the
> > host hypervisor for display.
> >
> > The current Hyper-V DRM driver for the frame buffer does the flushing
> > *after* the VMBus connection has been unloaded, such that panic messages
> > are not displayed on the graphics console. A user with a Hyper-V graphics
> > console is left with just a hung empty screen after a panic. The enhanced
> > control that DRM provides over the panic display in the graphics console
> > is similarly non-functional.
> >
> > Commit 3671f3777758 ("drm/hyperv: Add support for drm_panic") added
> > the Hyper-V DRM driver support to flush the virtual frame buffer. It
> > provided necessary functionality but did not handle the sequencing
> > problem with VMBus unload.
> >
> > Fix the full problem by using VMBus functions to suppress the VMBus
> > unload that is normally done by the VMBus driver in the panic path. Then
> > after the frame buffer has been flushed, do the VMBus unload so that a
> > kdump kernel can start cleanly. As expected, CONFIG_DRM_PANIC must be
> > selected for these changes to have effect. As a side benefit, the
> > enhanced features of the DRM panic path are also functional.
>
> Thanks for properly fixing this issue with DRM Panic on hyperv.
>
> I've a small comment below.
>
> With that fixed:
> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
>
> The first patch looks good too. I can review it if no one else steps up,
> though I'm not familiar with hyperv.
>
> >
> > Fixes: 3671f3777758 ("drm/hyperv: Add support for drm_panic")
> > Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> > ---
> > drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 4 ++++
> > drivers/gpu/drm/hyperv/hyperv_drm_modeset.c | 15 ++++++++-------
> > 2 files changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> > index 06b5d96e6eaf..79e51643be67 100644
> > --- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> > +++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> > @@ -150,6 +150,9 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
> > goto err_free_mmio;
> > }
> >
> > + /* If DRM panic path is stubbed out VMBus code must do the unload */
> > + if (IS_ENABLED(CONFIG_DRM_PANIC) && IS_ENABLED(CONFIG_PRINTK))
>
> I think drm_panic should still work without printk.
> The "user" panic screen would be unaffected, but the "kmsg" screen might
> be blank, and the "qr_code" would generate an empty qr code.
> (Actually I never tried to build a kernel without printk).
Yeah, I had never built such a kernel either until recently when the kernel
test robot flagged an error in Hyper-V code when CONFIG_PRINTK is not set. :-)
But for this patch, the issue is that drm_panic() never gets called if CONFIG_PRINTK
isn't set. In that case, kmsg_dump_register() is a stub that returns an error. So
drm_panic_register() never registers the callback to drm_panic(). And if
drm_panic() isn't going to run, responsibility for doing the VMBus unload
must remain with the VMBus code. It's hard to actually test this case because
debugging output depends on printk(), so please double-check my
thinking.
Michael
>
> > + vmbus_set_skip_unload(true);
> > drm_client_setup(dev, NULL);
> >
> > return 0;
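The reasoning about CONFIG_PRINTK above can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: the configuration options become runtime booleans, and all model_* names are hypothetical. It only encodes the claim made in this message, namely that the DRM panic callback is never registered when printk is absent, so the unload must then stay with the VMBus code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Runtime stand-ins for CONFIG_DRM_PANIC and CONFIG_PRINTK. */
struct model_config {
	bool drm_panic;
	bool printk;
};

/* Models kmsg_dump_register(): a stub that fails when printk is absent. */
int model_kmsg_dump_register(const struct model_config *cfg, bool *registered)
{
	if (!cfg->printk)
		return -EINVAL;
	*registered = true;
	return 0;
}

/* Which side is responsible for the VMBus unload in the panic path. */
enum unload_owner { UNLOAD_BY_VMBUS, UNLOAD_BY_DRM };

/* Mirrors the check in the patch: both options must be enabled. */
enum unload_owner model_probe(const struct model_config *cfg)
{
	if (cfg->drm_panic && cfg->printk)
		return UNLOAD_BY_DRM;
	return UNLOAD_BY_VMBUS;
}

/* True if a panic would leave VMBus loaded because nobody does the unload. */
bool model_unload_missed(const struct model_config *cfg, enum unload_owner owner)
{
	bool drm_cb_registered = false;

	if (cfg->drm_panic)
		model_kmsg_dump_register(cfg, &drm_cb_registered);
	return owner == UNLOAD_BY_DRM && !drm_cb_registered;
}
```

With drm_panic set and printk clear, handing ownership to DRM would miss the unload in this model, while the owner chosen by model_probe() does not.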
* Re: [PATCH 2/2] drm/hyperv: During panic do VMBus unload after frame buffer is flushed
From: Jocelyn Falempe @ 2026-02-11 21:54 UTC
To: mhklinux, drawat.floss, maarten.lankhorst, mripard, tzimmermann,
airlied, simona, kys, haiyangz, wei.liu, decui, longli, ryasuoka
Cc: dri-devel, linux-kernel, linux-hyperv, stable
In-Reply-To: <20260209070201.1492-2-mhklinux@outlook.com>
On 09/02/2026 08:02, mhkelley58@gmail.com wrote:
> From: Michael Kelley <mhklinux@outlook.com>
>
> In a VM, Linux panic information (reason for the panic, stack trace,
> etc.) may be written to a serial console and/or a virtual frame buffer
> for a graphics console. The latter may need to be flushed back to the
> host hypervisor for display.
>
> The current Hyper-V DRM driver for the frame buffer does the flushing
> *after* the VMBus connection has been unloaded, such that panic messages
> are not displayed on the graphics console. A user with a Hyper-V graphics
> console is left with just a hung empty screen after a panic. The enhanced
> control that DRM provides over the panic display in the graphics console
> is similarly non-functional.
>
> Commit 3671f3777758 ("drm/hyperv: Add support for drm_panic") added
> the Hyper-V DRM driver support to flush the virtual frame buffer. It
> provided necessary functionality but did not handle the sequencing
> problem with VMBus unload.
>
> Fix the full problem by using VMBus functions to suppress the VMBus
> unload that is normally done by the VMBus driver in the panic path. Then
> after the frame buffer has been flushed, do the VMBus unload so that a
> kdump kernel can start cleanly. As expected, CONFIG_DRM_PANIC must be
> selected for these changes to have effect. As a side benefit, the
> enhanced features of the DRM panic path are also functional.
Thanks for properly fixing this issue with DRM Panic on hyperv.
I've a small comment below.
With that fixed:
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
The first patch looks good too. I can review it if no one else steps up,
though I'm not familiar with hyperv.
>
> Fixes: 3671f3777758 ("drm/hyperv: Add support for drm_panic")
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 4 ++++
> drivers/gpu/drm/hyperv/hyperv_drm_modeset.c | 15 ++++++++-------
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> index 06b5d96e6eaf..79e51643be67 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
> @@ -150,6 +150,9 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
> goto err_free_mmio;
> }
>
> + /* If DRM panic path is stubbed out VMBus code must do the unload */
> + if (IS_ENABLED(CONFIG_DRM_PANIC) && IS_ENABLED(CONFIG_PRINTK))
I think drm_panic should still work without printk.
The "user" panic screen would be unaffected, but the "kmsg" screen might
be blank, and the "qr_code" would generate an empty qr code.
(Actually I never tried to build a kernel without printk).
> + vmbus_set_skip_unload(true);
> drm_client_setup(dev, NULL);
>
> return 0;
> @@ -169,6 +172,7 @@ static void hyperv_vmbus_remove(struct hv_device *hdev)
> struct drm_device *dev = hv_get_drvdata(hdev);
> struct hyperv_drm_device *hv = to_hv(dev);
>
> + vmbus_set_skip_unload(false);
> drm_dev_unplug(dev);
> drm_atomic_helper_shutdown(dev);
> vmbus_close(hdev->channel);
> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
> index 7978f8c8108c..d48ca6c23b7c 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
> @@ -212,15 +212,16 @@ static void hyperv_plane_panic_flush(struct drm_plane *plane)
> struct hyperv_drm_device *hv = to_hv(plane->dev);
> struct drm_rect rect;
>
> - if (!plane->state || !plane->state->fb)
> - return;
> + if (plane->state && plane->state->fb) {
> + rect.x1 = 0;
> + rect.y1 = 0;
> + rect.x2 = plane->state->fb->width;
> + rect.y2 = plane->state->fb->height;
>
> - rect.x1 = 0;
> - rect.y1 = 0;
> - rect.x2 = plane->state->fb->width;
> - rect.y2 = plane->state->fb->height;
> + hyperv_update_dirt(hv->hdev, &rect);
> + }
>
> - hyperv_update_dirt(hv->hdev, &rect);
> + vmbus_initiate_unload(true);
> }
>
> static const struct drm_plane_helper_funcs hyperv_plane_helper_funcs = {
--
Jocelyn
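The sequencing problem the patch addresses can be illustrated with a small userspace trace model. It is a sketch only: the model_* functions and struct panic_trace are hypothetical, and the assumption that the VMBus panic handling runs before the DRM flush is taken from the commit message, not from reading the panic notifier code.

```c
#include <stdbool.h>
#include <stddef.h>

enum step { STEP_FLUSH, STEP_UNLOAD };

struct panic_trace {
	enum step steps[4];
	size_t n;
	bool skip_unload;	/* models the vmbus_set_skip_unload() state */
	bool unloaded;
};

static void record(struct panic_trace *t, enum step s)
{
	if (t->n < 4)
		t->steps[t->n++] = s;
}

/* Idempotent unload, so a second call is a no-op. */
void model_unload(struct panic_trace *t)
{
	if (t->unloaded)
		return;
	t->unloaded = true;
	record(t, STEP_UNLOAD);
}

/* VMBus panic handling: unload unless a display driver took over. */
void model_vmbus_panic(struct panic_trace *t)
{
	if (!t->skip_unload)
		model_unload(t);
}

/* DRM panic flush: push the frame buffer if possible, then unload. */
void model_drm_panic_flush(struct panic_trace *t, bool have_fb)
{
	/* A flush issued after the unload cannot reach the host. */
	if (have_fb && !t->unloaded)
		record(t, STEP_FLUSH);
	model_unload(t);
}

void model_panic(struct panic_trace *t, bool have_fb)
{
	model_vmbus_panic(t);
	model_drm_panic_flush(t, have_fb);
}
```

Without skip_unload the trace contains only the unload and the flush is lost; with skip_unload set the flush comes first and the unload still happens, including when no frame buffer is attached.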
* Re: [RFC PATCH V2] x86/VMBus: Confidential VMBus for dynamic DMA buffer transition
From: Robin Murphy @ 2026-02-11 19:51 UTC
To: Michael Kelley, Tianyu Lan, kys@microsoft.com,
haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com,
longli@microsoft.com
Cc: Tianyu Lan, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, hch@infradead.org,
vdso@hexbites.dev, Suzuki K Poulose, Aneesh Kumar K.V (Arm)
In-Reply-To: <SN6PR02MB41577FB84EC73E48ABAC7D18D463A@SN6PR02MB4157.namprd02.prod.outlook.com>
On 2026-02-11 6:00 pm, Michael Kelley wrote:
> From: Tianyu Lan <ltykernel@gmail.com> Sent: Tuesday, February 10, 2026 8:21 AM
>>
>> Hyper-V provides Confidential VMBus to communicate between
>> device model and device guest driver via encrypted/private
>> memory in Confidential VM. The device model is in OpenHCL
>> (https://openvmm.dev/guide/user_guide/openhcl.html) that
>> plays the paravisor role.
>>
>> For a VMBUS device, there are two communication methods to
>
> s/VMBUS/VMBus/
>
>> talk with Host/Hypervisor. 1) VMBus Ring buffer 2) dynamic
>> DMA transition.
>
> I'm not sure what "dynamic DMA transition" is. Maybe just
> "DMA transfers"? Also, do the same substitution further
> down in this commit message.
>
>> The Confidential VMBus Ring buffer has been
>> upstreamed by Roman Kisel(commit 6802d8af).
>
> It's customary to use 12 character commit IDs, which would be
> 6802d8af47d1 in this case.
>
>>
>> The dynamic DMA transition of VMBus device normally goes
>> through DMA core and it uses SWIOTLB as bounce buffer in
>> CVM
>
> "CVM" is Microsoft-speak. The Linux terminology is "a CoCo VM".
>
>> to communicate with Host/Hypervisor. The Confidential
>> VMBus device may use private/encrypted memory to do DMA
>> and so the device swiotlb(bounce buffer) isn't necessary.
>
> The phrase "isn't necessary" does not capture the real issue
> here. Saying "isn't necessary" makes it sound like this patch
> just avoids unnecessary work, so that it is a performance
> improvement. But that's not the case.
>
> The real issue is that swiotlb memory is decrypted. So bouncing
> through the swiotlb exposes to the host what is supposed to be
> confidential data passed on the Confidential VMBus. Disabling
> the swiotlb bouncing in this case is a hard requirement to preserve
> confidentiality.
Yeah, this really isn't a Hyper-V problem. Indeed as things stand,
"swiotlb=force" could potentially break confidentiality for any
environment trying to invent a notion of private DMA, and perhaps we
could throw a big warning about that, but really the answer there is
"Don't run your confidential workload with 'swiotlb=force'. Why would
you even do that? Debug your drivers in a regular VM or bare-metal with
full debug visibility like a normal person..."
The fact is we do not have a proper notion of trusted/private DMA yet,
and this is not the way to add it. The current assumption is very much
that all DMA is untrusted in the CoCo sense, because initially it was
only virtual devices emulated by a hypervisor, thus had to be bounced
through shared memory anyway. AMD SEV with a stage 1 IOMMU in the guest
can allow an assigned physical device to access a suitably-aligned
encrypted buffer directly, but that's still effectively just putting the
buffer into a temporarily shared state for that device, it merely skips
sharing it with the rest of the system. !force_dma_unencrypted() doesn't
mean "we trust this device's DMA", it just means "we don't have to use
explicitly-decrypted pages to accommodate untrusted/shared DMA here",
plus it also serves double-duty for host encryption which doesn't share
the same trust model anyway.
I assumed this would follow the TDISP stuff, but if Hyper-V has an
alternative device-trusting mechanism already then there's no need to
wait. We want some common device property (likely consolidating the
current PCI external-facing port notion of trustedness plus whatever
TDISP wants), with which we can then make proper decisions in all the
right DMA API paths - and if it can end up replacing the horrible
force_dma_unencrypted() as well then all the better! I'd totally
forgotten about the previous discussion that Michael referred to (which
I had to track down[1]), but it looks like all the main points were
already covered there and we were approaching a consensus, so really I
guess someone just needs to give it a go.
Thanks,
Robin.
[1]
https://lore.kernel.org/linux-hyperv/20250409000835.285105-6-romank@linux.microsoft.com/
>
> So I would reword the sentences as something like this:
>
> The Confidential VMBus device can do DMA directly to
> private/encrypted memory. Because the swiotlb is decrypted
> memory, the DMA transfer must not be bounced through the
> swiotlb, so as to preserve confidentiality. This is different from
> the default for Linux CoCo VMs, so disable the VMBus device's
> use of swiotlb.
>
>> To disable device's swiotlb, set device->dma_io_tlb_mem
>> to NULL in VMBus driver and is_swiotlb_force_bounce()
>> always returns false.
>>
>> Suggested-by: Michael Kelley <mhklinux@outlook.com>
>> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
>> ---
>> Change since v1:
>> Use device.dma_io_tlb_mem to disable device bounce buffer
>>
>> drivers/hv/vmbus_drv.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>> index a53af6fe81a6..58dab8cc3fcb 100644
>> --- a/drivers/hv/vmbus_drv.c
>> +++ b/drivers/hv/vmbus_drv.c
>> @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj)
>> child_device_obj->device.dma_mask = &child_device_obj->dma_mask;
>> dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64));
>>
>> + device_initialize(&child_device_obj->device);
>> + if (child_device_obj->channel->co_external_memory)
>> + child_device_obj->device.dma_io_tlb_mem = NULL;
>> +
>
> Doing this as part of the VMBus bus driver makes sense. While directly
> setting device.dma_io_tlb_mem to NULL should work, it would be better
> to add a function to the swiotlb code to do this, and then call that function
> here, passing the device as an argument. The need to disable swiotlb on a
> device will likely arise in similar contexts (such as TDISP), and it would be
> better to have a swiotlb function for that purpose. This use case may be
> a bit ahead of the TDISP work, and having a swiotlb function in place will
> help ensure that duplicate mechanisms aren't created as everything
> comes together.
>
> See my earlier comments in [1] about the key point in the commit message,
> and about adding a swiotlb_dev_disable() function to the swiotlb code.
>
> Michael
>
> [1] https://lore.kernel.org/linux-hyperv/SN6PR02MB4157DAE6D8CC6BA11CA87298D4DCA@SN6PR02MB4157.namprd02.prod.outlook.com/
>
>> /*
>> * Register with the LDM. This will kick off the driver/device
>> * binding...which will eventually call vmbus_match() and vmbus_probe()
>> */
>> - ret = device_register(&child_device_obj->device);
>> + ret = device_add(&child_device_obj->device);
>> if (ret) {
>> pr_err("Unable to register child device\n");
>> put_device(&child_device_obj->device);
>> --
>> 2.50.1
* Re: [PATCH net-next v3] net: mana: Add MAC address to vPort logs and clarify error messages
From: Jakub Kicinski @ 2026-02-11 18:42 UTC
To: Erni Sri Satya Vennela
Cc: ernis, kys, haiyangz, wei.liu, decui, longli, andrew+netdev,
davem, edumazet, pabeni, dipayanroy, shradhagupta, shirazsaleem,
gargaditya, linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260211054335.8511-1-ernis@linux.microsoft.com>
On Tue, 10 Feb 2026 21:43:24 -0800 Erni Sri Satya Vennela wrote:
> Add MAC address to vPort configuration success message and update error
> message to be more specific about HWC message errors in
> mana_send_request and mana_hwc_send_request.
## Form letter - net-next-closed
We have already submitted our pull request with net-next material for v7.0,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.
Please repost when net-next reopens after Feb 23rd.
RFC patches sent for review only are obviously welcome at any time.
See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
--
pw-bot: defer
pv-bot: closed
* RE: [RFC PATCH V2] x86/VMBus: Confidential VMBus for dynamic DMA buffer transition
From: Michael Kelley @ 2026-02-11 18:00 UTC
To: Tianyu Lan, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
Cc: Tianyu Lan, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, hch@infradead.org,
robin.murphy@arm.com, vdso@hexbites.dev
In-Reply-To: <20260210162107.2270823-1-ltykernel@gmail.com>
From: Tianyu Lan <ltykernel@gmail.com> Sent: Tuesday, February 10, 2026 8:21 AM
>
> Hyper-V provides Confidential VMBus to communicate between
> device model and device guest driver via encrypted/private
> memory in Confidential VM. The device model is in OpenHCL
> (https://openvmm.dev/guide/user_guide/openhcl.html) that
> plays the paravisor role.
>
> For a VMBUS device, there are two communication methods to
s/VMBUS/VMBus/
> talk with Host/Hypervisor. 1) VMBus Ring buffer 2) dynamic
> DMA transition.
I'm not sure what "dynamic DMA transition" is. Maybe just
"DMA transfers"? Also, do the same substitution further
down in this commit message.
> The Confidential VMBus Ring buffer has been
> upstreamed by Roman Kisel(commit 6802d8af).
It's customary to use 12 character commit IDs, which would be
6802d8af47d1 in this case.
>
> The dynamic DMA transition of VMBus device normally goes
> through DMA core and it uses SWIOTLB as bounce buffer in
> CVM
"CVM" is Microsoft-speak. The Linux terminology is "a CoCo VM".
> to communicate with Host/Hypervisor. The Confidential
> VMBus device may use private/encrypted memory to do DMA
> and so the device swiotlb(bounce buffer) isn't necessary.
The phrase "isn't necessary" does not capture the real issue
here. Saying "isn't necessary" makes it sound like this patch
just avoids unnecessary work, so that it is a performance
improvement. But that's not the case.
The real issue is that swiotlb memory is decrypted. So bouncing
through the swiotlb exposes to the host what is supposed to be
confidential data passed on the Confidential VMBus. Disabling
the swiotlb bouncing in this case is a hard requirement to preserve
confidentiality.
So I would reword the sentences as something like this:
The Confidential VMBus device can do DMA directly to
private/encrypted memory. Because the swiotlb is decrypted
memory, the DMA transfer must not be bounced through the
swiotlb, so as to preserve confidentiality. This is different from
the default for Linux CoCo VMs, so disable the VMBus device's
use of swiotlb.
> To disable device's swiotlb, set device->dma_io_tlb_mem
> to NULL in VMBus driver and is_swiotlb_force_bounce()
> always returns false.
>
> Suggested-by: Michael Kelley <mhklinux@outlook.com>
> Signed-off-by: Tianyu Lan <tiala@microsoft.com>
> ---
> Change since v1:
> Use device.dma_io_tlb_mem to disable device bounce buffer
>
> drivers/hv/vmbus_drv.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index a53af6fe81a6..58dab8cc3fcb 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2133,11 +2133,15 @@ int vmbus_device_register(struct hv_device *child_device_obj)
> child_device_obj->device.dma_mask = &child_device_obj->dma_mask;
> dma_set_mask(&child_device_obj->device, DMA_BIT_MASK(64));
>
> + device_initialize(&child_device_obj->device);
> + if (child_device_obj->channel->co_external_memory)
> + child_device_obj->device.dma_io_tlb_mem = NULL;
> +
Doing this as part of the VMBus bus driver makes sense. While directly
setting device.dma_io_tlb_mem to NULL should work, it would be better
to add a function to the swiotlb code to do this, and then call that function
here, passing the device as an argument. The need to disable swiotlb on a
device will likely arise in similar contexts (such as TDISP), and it would be
better to have a swiotlb function for that purpose. This use case may be
a bit ahead of the TDISP work, and having a swiotlb function in place will
help ensure that duplicate mechanisms aren't created as everything
comes together.
See my earlier comments in [1] about the key point in the commit message,
and about adding a swiotlb_dev_disable() function to the swiotlb code.
Michael
[1] https://lore.kernel.org/linux-hyperv/SN6PR02MB4157DAE6D8CC6BA11CA87298D4DCA@SN6PR02MB4157.namprd02.prod.outlook.com/
> /*
> * Register with the LDM. This will kick off the driver/device
> * binding...which will eventually call vmbus_match() and vmbus_probe()
> */
> - ret = device_register(&child_device_obj->device);
> + ret = device_add(&child_device_obj->device);
> if (ret) {
> pr_err("Unable to register child device\n");
> put_device(&child_device_obj->device);
> --
> 2.50.1
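As a rough illustration of the helper suggested above, here is a userspace model of what a swiotlb_dev_disable()-style function might do. It is a sketch, not the swiotlb API: the struct definitions are cut down to the two fields involved, the model_* names are hypothetical, and the real helper's name, signature, and location would be decided on the list. The bounce check assumes the kernel's test is "a pool is attached and it forces bouncing".

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures; only the relevant fields. */
struct io_tlb_mem {
	bool force_bounce;
};

struct device {
	struct io_tlb_mem *dma_io_tlb_mem;
};

/* Assumed shape of the bounce check: no pool attached means no bouncing. */
bool model_is_swiotlb_force_bounce(const struct device *dev)
{
	const struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->force_bounce;
}

/*
 * Hypothetical helper in the spirit of the suggested swiotlb_dev_disable():
 * detach the device from any bounce-buffer pool so that its DMA is never
 * redirected through decrypted memory.
 */
void model_swiotlb_dev_disable(struct device *dev)
{
	dev->dma_io_tlb_mem = NULL;
}
```

Keeping the assignment behind one function gives later users (such as TDISP-style trusted devices) a single place to hook into, rather than each bus driver poking at the field directly.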
* [PATCH v4 2/2] mshv: add arm64 support for doorbell & intercept SINTs
From: Anirudh Rayabharam @ 2026-02-11 17:07 UTC
To: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel; +Cc: anirudh
In-Reply-To: <20260211170728.3056226-1-anirudh@anirudhrb.com>
From: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic
interrupts (SINTs) from the hypervisor for doorbells and intercepts.
There is no such vector reserved for arm64.
On arm64, the hypervisor exposes a synthetic register that can be read
to find the INTID that should be used for SINTs. This INTID is in the
PPI range.
To better unify the code paths, introduce mshv_sint_vector_init() that
either reads the synthetic register and obtains the INTID (arm64) or
just uses HYPERVISOR_CALLBACK_VECTOR as the interrupt vector (x86).
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
---
drivers/hv/mshv_synic.c | 112 +++++++++++++++++++++++++++++++++---
include/hyperv/hvgdk_mini.h | 2 +
2 files changed, 107 insertions(+), 7 deletions(-)
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index 074e37c48876..7957ad0328dd 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -10,17 +10,24 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/mm.h>
+#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/random.h>
#include <linux/cpuhotplug.h>
#include <linux/reboot.h>
#include <asm/mshyperv.h>
+#include <linux/platform_device.h>
+#include <linux/acpi.h>
#include "mshv_eventfd.h"
#include "mshv.h"
static int synic_cpuhp_online;
static struct hv_synic_pages __percpu *synic_pages;
+static int mshv_sint_vector = -1; /* hwirq for the SynIC SINTs */
+#ifndef HYPERVISOR_CALLBACK_VECTOR
+static int mshv_sint_irq = -1; /* Linux IRQ for mshv_sint_vector */
+#endif
static u32 synic_event_ring_get_queued_port(u32 sint_index)
{
@@ -456,9 +463,7 @@ static int mshv_synic_cpu_init(unsigned int cpu)
union hv_synic_simp simp;
union hv_synic_siefp siefp;
union hv_synic_sirbp sirbp;
-#ifdef HYPERVISOR_CALLBACK_VECTOR
union hv_synic_sint sint;
-#endif
union hv_synic_scontrol sctrl;
struct hv_synic_pages *spages = this_cpu_ptr(synic_pages);
struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
@@ -501,10 +506,13 @@ static int mshv_synic_cpu_init(unsigned int cpu)
hv_set_non_nested_msr(HV_MSR_SIRBP, sirbp.as_uint64);
-#ifdef HYPERVISOR_CALLBACK_VECTOR
+#ifndef HYPERVISOR_CALLBACK_VECTOR
+ enable_percpu_irq(mshv_sint_irq, 0);
+#endif
+
/* Enable intercepts */
sint.as_uint64 = 0;
- sint.vector = HYPERVISOR_CALLBACK_VECTOR;
+ sint.vector = mshv_sint_vector;
sint.masked = false;
sint.auto_eoi = hv_recommend_using_aeoi();
hv_set_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_INTERCEPTION_SINT_INDEX,
@@ -512,13 +520,12 @@ static int mshv_synic_cpu_init(unsigned int cpu)
/* Doorbell SINT */
sint.as_uint64 = 0;
- sint.vector = HYPERVISOR_CALLBACK_VECTOR;
+ sint.vector = mshv_sint_vector;
sint.masked = false;
sint.as_intercept = 1;
sint.auto_eoi = hv_recommend_using_aeoi();
hv_set_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_DOORBELL_SINT_INDEX,
sint.as_uint64);
-#endif
/* Enable global synic bit */
sctrl.as_uint64 = hv_get_non_nested_msr(HV_MSR_SCONTROL);
@@ -573,6 +580,10 @@ static int mshv_synic_cpu_exit(unsigned int cpu)
hv_set_non_nested_msr(HV_MSR_SINT0 + HV_SYNIC_DOORBELL_SINT_INDEX,
sint.as_uint64);
+#ifndef HYPERVISOR_CALLBACK_VECTOR
+ disable_percpu_irq(mshv_sint_irq);
+#endif
+
/* Disable Synic's event ring page */
sirbp.as_uint64 = hv_get_non_nested_msr(HV_MSR_SIRBP);
sirbp.sirbp_enabled = false;
@@ -683,14 +694,98 @@ static struct notifier_block mshv_synic_reboot_nb = {
.notifier_call = mshv_synic_reboot_notify,
};
+#ifndef HYPERVISOR_CALLBACK_VECTOR
+#ifdef CONFIG_ACPI
+static long __percpu *mshv_evt;
+#endif
+
+static irqreturn_t mshv_percpu_isr(int irq, void *dev_id)
+{
+ mshv_isr();
+ return IRQ_HANDLED;
+}
+
+static int __init mshv_sint_vector_init(void)
+{
+#ifdef CONFIG_ACPI
+ int ret;
+ struct hv_register_assoc reg = {
+ .name = HV_ARM64_REGISTER_SINT_RESERVED_INTERRUPT_ID,
+ };
+ union hv_input_vtl input_vtl = { 0 };
+
+ ret = hv_call_get_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF,
+ 1, input_vtl, &reg);
+ if (ret || !reg.value.reg64)
+ return -ENODEV;
+
+ mshv_sint_vector = reg.value.reg64;
+ ret = acpi_register_gsi(NULL, mshv_sint_vector, ACPI_EDGE_SENSITIVE,
+ ACPI_ACTIVE_HIGH);
+ if (ret < 0)
+ goto out_fail;
+
+ mshv_sint_irq = ret;
+
+ mshv_evt = alloc_percpu(long);
+ if (!mshv_evt) {
+ ret = -ENOMEM;
+ goto out_unregister;
+ }
+
+ ret = request_percpu_irq(mshv_sint_irq, mshv_percpu_isr, "MSHV",
+ mshv_evt);
+ if (ret)
+ goto free_evt;
+
+ return 0;
+
+free_evt:
+ free_percpu(mshv_evt);
+out_unregister:
+ acpi_unregister_gsi(mshv_sint_vector);
+out_fail:
+ return ret;
+#else
+ return -ENODEV;
+#endif
+}
+
+static void mshv_sint_vector_cleanup(void)
+{
+#ifdef CONFIG_ACPI
+ free_percpu_irq(mshv_sint_irq, mshv_evt);
+ free_percpu(mshv_evt);
+ acpi_unregister_gsi(mshv_sint_vector);
+#endif
+}
+#else /* !HYPERVISOR_CALLBACK_VECTOR */
+static int __init mshv_sint_vector_init(void)
+{
+ mshv_sint_vector = HYPERVISOR_CALLBACK_VECTOR;
+ return 0;
+}
+
+static void mshv_sint_vector_cleanup(void)
+{
+}
+#endif /* HYPERVISOR_CALLBACK_VECTOR */
+
int __init mshv_synic_init(struct device *dev)
{
int ret = 0;
+ ret = mshv_sint_vector_init();
+ if (ret) {
+ dev_err(dev, "Failed to get MSHV SINT vector: %i\n", ret);
+ return ret;
+ }
+
synic_pages = alloc_percpu(struct hv_synic_pages);
if (!synic_pages) {
dev_err(dev, "Failed to allocate percpu synic page\n");
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto sint_vector_cleanup;
}
ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mshv_synic",
@@ -713,6 +808,8 @@ int __init mshv_synic_init(struct device *dev)
cpuhp_remove_state(synic_cpuhp_online);
free_synic_pages:
free_percpu(synic_pages);
+sint_vector_cleanup:
+ mshv_sint_vector_cleanup();
return ret;
}
@@ -721,4 +818,5 @@ void mshv_synic_cleanup(void)
unregister_reboot_notifier(&mshv_synic_reboot_nb);
cpuhp_remove_state(synic_cpuhp_online);
free_percpu(synic_pages);
+ mshv_sint_vector_cleanup();
}
diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h
index 30fbbde81c5c..7676f78e0766 100644
--- a/include/hyperv/hvgdk_mini.h
+++ b/include/hyperv/hvgdk_mini.h
@@ -1117,6 +1117,8 @@ enum hv_register_name {
HV_X64_REGISTER_MSR_MTRR_FIX4KF8000 = 0x0008007A,
HV_X64_REGISTER_REG_PAGE = 0x0009001C,
+#elif defined(CONFIG_ARM64)
+ HV_ARM64_REGISTER_SINT_RESERVED_INTERRUPT_ID = 0x00070001,
#endif
};
--
2.34.1
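The control flow of mshv_sint_vector_init() in the patch above can be summarized with a small userspace model. This is a sketch under stated assumptions, not the driver code: the hypervisor register read and the IRQ-layer calls are replaced by fields in a hypothetical struct sint_env, the IRQ number mapping is arbitrary, and the error codes merely follow the pattern visible in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the platform and the hypervisor. */
struct sint_env {
	bool has_fixed_vector;		/* x86-like: a callback vector is reserved */
	int fixed_vector;
	unsigned long reserved_intid;	/* arm64-like: register value, 0 if absent */
	bool irq_request_fails;		/* fault injection for the unwind path */
};

struct sint_state {
	int vector;
	int irq;
	bool gsi_registered;
	bool irq_requested;
};

int model_sint_vector_init(const struct sint_env *env, struct sint_state *st)
{
	int ret;

	st->vector = -1;
	st->irq = -1;
	st->gsi_registered = false;
	st->irq_requested = false;

	/* x86-like path: just use the reserved vector. */
	if (env->has_fixed_vector) {
		st->vector = env->fixed_vector;
		return 0;
	}

	/* arm64-like path: the INTID comes from a synthetic register. */
	if (!env->reserved_intid)
		return -ENODEV;
	st->vector = (int)env->reserved_intid;

	st->gsi_registered = true;
	st->irq = st->vector + 1000;	/* arbitrary mapping for the model */

	if (env->irq_request_fails) {
		ret = -EBUSY;
		goto out_unregister;
	}
	st->irq_requested = true;
	return 0;

out_unregister:
	/* Undo in reverse order, as the goto labels in the patch do. */
	st->gsi_registered = false;
	st->irq = -1;
	return ret;
}
```

Either path leaves st->vector holding the value that the per-CPU init would program into the SINT registers, which is the unification the commit message describes.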
* [PATCH v4 1/2] mshv: refactor synic init and cleanup
From: Anirudh Rayabharam @ 2026-02-11 17:07 UTC
To: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel; +Cc: anirudh
In-Reply-To: <20260211170728.3056226-1-anirudh@anirudhrb.com>
From: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
Rename mshv_synic_init() to mshv_synic_cpu_init() and
mshv_synic_cleanup() to mshv_synic_cpu_exit() to better reflect that
these functions handle per-cpu synic setup and teardown.
Use mshv_synic_init/cleanup() to perform init/cleanup that is not per-cpu.
Move all the synic related setup from mshv_parent_partition_init.
Move the reboot notifier to mshv_synic.c because it currently only
operates on the synic cpuhp state.
Move out synic_pages from the global mshv_root since its use is now
completely local to mshv_synic.c.
This is in preparation for the next patch which will add more stuff to
mshv_synic_init().
No functional change.
Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com>
---
drivers/hv/mshv_root.h | 5 ++-
drivers/hv/mshv_root_main.c | 59 +++++-------------------------
drivers/hv/mshv_synic.c | 71 +++++++++++++++++++++++++++++++++----
3 files changed, 75 insertions(+), 60 deletions(-)
diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index 3c1d88b36741..26e0320c8097 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -183,7 +183,6 @@ struct hv_synic_pages {
};
struct mshv_root {
- struct hv_synic_pages __percpu *synic_pages;
spinlock_t pt_ht_lock;
DECLARE_HASHTABLE(pt_htable, MSHV_PARTITIONS_HASH_BITS);
struct hv_partition_property_vmm_capabilities vmm_caps;
@@ -242,8 +241,8 @@ int mshv_register_doorbell(u64 partition_id, doorbell_cb_t doorbell_cb,
void mshv_unregister_doorbell(u64 partition_id, int doorbell_portid);
void mshv_isr(void);
-int mshv_synic_init(unsigned int cpu);
-int mshv_synic_cleanup(unsigned int cpu);
+int mshv_synic_init(struct device *dev);
+void mshv_synic_cleanup(void);
static inline bool mshv_partition_encrypted(struct mshv_partition *partition)
{
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 681b58154d5e..7c1666456e78 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2035,7 +2035,6 @@ mshv_dev_release(struct inode *inode, struct file *filp)
return 0;
}
-static int mshv_cpuhp_online;
static int mshv_root_sched_online;
static const char *scheduler_type_to_string(enum hv_scheduler_type type)
@@ -2198,40 +2197,14 @@ root_scheduler_deinit(void)
free_percpu(root_scheduler_output);
}
-static int mshv_reboot_notify(struct notifier_block *nb,
- unsigned long code, void *unused)
-{
- cpuhp_remove_state(mshv_cpuhp_online);
- return 0;
-}
-
-struct notifier_block mshv_reboot_nb = {
- .notifier_call = mshv_reboot_notify,
-};
-
static void mshv_root_partition_exit(void)
{
- unregister_reboot_notifier(&mshv_reboot_nb);
root_scheduler_deinit();
}
static int __init mshv_root_partition_init(struct device *dev)
{
- int err;
-
- err = root_scheduler_init(dev);
- if (err)
- return err;
-
- err = register_reboot_notifier(&mshv_reboot_nb);
- if (err)
- goto root_sched_deinit;
-
- return 0;
-
-root_sched_deinit:
- root_scheduler_deinit();
- return err;
+ return root_scheduler_init(dev);
}
static void mshv_init_vmm_caps(struct device *dev)
@@ -2276,31 +2249,18 @@ static int __init mshv_parent_partition_init(void)
MSHV_HV_MAX_VERSION);
}
- mshv_root.synic_pages = alloc_percpu(struct hv_synic_pages);
- if (!mshv_root.synic_pages) {
- dev_err(dev, "Failed to allocate percpu synic page\n");
- ret = -ENOMEM;
+ ret = mshv_synic_init(dev);
+ if (ret)
goto device_deregister;
- }
-
- ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mshv_synic",
- mshv_synic_init,
- mshv_synic_cleanup);
- if (ret < 0) {
- dev_err(dev, "Failed to setup cpu hotplug state: %i\n", ret);
- goto free_synic_pages;
- }
-
- mshv_cpuhp_online = ret;
ret = mshv_retrieve_scheduler_type(dev);
if (ret)
- goto remove_cpu_state;
+ goto synic_cleanup;
if (hv_root_partition())
ret = mshv_root_partition_init(dev);
if (ret)
- goto remove_cpu_state;
+ goto synic_cleanup;
mshv_init_vmm_caps(dev);
@@ -2318,10 +2278,8 @@ static int __init mshv_parent_partition_init(void)
exit_partition:
if (hv_root_partition())
mshv_root_partition_exit();
-remove_cpu_state:
- cpuhp_remove_state(mshv_cpuhp_online);
-free_synic_pages:
- free_percpu(mshv_root.synic_pages);
+synic_cleanup:
+ mshv_synic_cleanup();
device_deregister:
misc_deregister(&mshv_dev);
return ret;
@@ -2335,8 +2293,7 @@ static void __exit mshv_parent_partition_exit(void)
mshv_irqfd_wq_cleanup();
if (hv_root_partition())
mshv_root_partition_exit();
- cpuhp_remove_state(mshv_cpuhp_online);
- free_percpu(mshv_root.synic_pages);
+ mshv_synic_cleanup();
}
module_init(mshv_parent_partition_init);
diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
index f8b0337cdc82..074e37c48876 100644
--- a/drivers/hv/mshv_synic.c
+++ b/drivers/hv/mshv_synic.c
@@ -12,11 +12,16 @@
#include <linux/mm.h>
#include <linux/io.h>
#include <linux/random.h>
+#include <linux/cpuhotplug.h>
+#include <linux/reboot.h>
#include <asm/mshyperv.h>
#include "mshv_eventfd.h"
#include "mshv.h"
+static int synic_cpuhp_online;
+static struct hv_synic_pages __percpu *synic_pages;
+
static u32 synic_event_ring_get_queued_port(u32 sint_index)
{
struct hv_synic_event_ring_page **event_ring_page;
@@ -26,7 +31,7 @@ static u32 synic_event_ring_get_queued_port(u32 sint_index)
u32 message;
u8 tail;
- spages = this_cpu_ptr(mshv_root.synic_pages);
+ spages = this_cpu_ptr(synic_pages);
event_ring_page = &spages->synic_event_ring_page;
synic_eventring_tail = (u8 **)this_cpu_ptr(hv_synic_eventring_tail);
@@ -393,7 +398,7 @@ mshv_intercept_isr(struct hv_message *msg)
void mshv_isr(void)
{
- struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
+ struct hv_synic_pages *spages = this_cpu_ptr(synic_pages);
struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
struct hv_message *msg;
bool handled;
@@ -446,7 +451,7 @@ void mshv_isr(void)
}
}
-int mshv_synic_init(unsigned int cpu)
+static int mshv_synic_cpu_init(unsigned int cpu)
{
union hv_synic_simp simp;
union hv_synic_siefp siefp;
@@ -455,7 +460,7 @@ int mshv_synic_init(unsigned int cpu)
union hv_synic_sint sint;
#endif
union hv_synic_scontrol sctrl;
- struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
+ struct hv_synic_pages *spages = this_cpu_ptr(synic_pages);
struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
struct hv_synic_event_flags_page **event_flags_page =
&spages->synic_event_flags_page;
@@ -542,14 +547,14 @@ int mshv_synic_init(unsigned int cpu)
return -EFAULT;
}
-int mshv_synic_cleanup(unsigned int cpu)
+static int mshv_synic_cpu_exit(unsigned int cpu)
{
union hv_synic_sint sint;
union hv_synic_simp simp;
union hv_synic_siefp siefp;
union hv_synic_sirbp sirbp;
union hv_synic_scontrol sctrl;
- struct hv_synic_pages *spages = this_cpu_ptr(mshv_root.synic_pages);
+ struct hv_synic_pages *spages = this_cpu_ptr(synic_pages);
struct hv_message_page **msg_page = &spages->hyp_synic_message_page;
struct hv_synic_event_flags_page **event_flags_page =
&spages->synic_event_flags_page;
@@ -663,3 +668,57 @@ mshv_unregister_doorbell(u64 partition_id, int doorbell_portid)
mshv_portid_free(doorbell_portid);
}
+
+static int mshv_synic_reboot_notify(struct notifier_block *nb,
+ unsigned long code, void *unused)
+{
+ if (!hv_root_partition())
+ return 0;
+
+ cpuhp_remove_state(synic_cpuhp_online);
+ return 0;
+}
+
+static struct notifier_block mshv_synic_reboot_nb = {
+ .notifier_call = mshv_synic_reboot_notify,
+};
+
+int __init mshv_synic_init(struct device *dev)
+{
+ int ret = 0;
+
+ synic_pages = alloc_percpu(struct hv_synic_pages);
+ if (!synic_pages) {
+ dev_err(dev, "Failed to allocate percpu synic page\n");
+ return -ENOMEM;
+ }
+
+ ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mshv_synic",
+ mshv_synic_cpu_init,
+ mshv_synic_cpu_exit);
+ if (ret < 0) {
+ dev_err(dev, "Failed to setup cpu hotplug state: %i\n", ret);
+ goto free_synic_pages;
+ }
+
+ synic_cpuhp_online = ret;
+
+ ret = register_reboot_notifier(&mshv_synic_reboot_nb);
+ if (ret)
+ goto remove_cpuhp_state;
+
+ return 0;
+
+remove_cpuhp_state:
+ cpuhp_remove_state(synic_cpuhp_online);
+free_synic_pages:
+ free_percpu(synic_pages);
+ return ret;
+}
+
+void mshv_synic_cleanup(void)
+{
+ unregister_reboot_notifier(&mshv_synic_reboot_nb);
+ cpuhp_remove_state(synic_cpuhp_online);
+ free_percpu(synic_pages);
+}
--
2.34.1
^ permalink raw reply related
* [PATCH v4 0/2] ARM64 support for doorbell and intercept SINTs
From: Anirudh Rayabharam @ 2026-02-11 17:07 UTC (permalink / raw)
To: kys, haiyangz, wei.liu, decui, longli, linux-hyperv, linux-kernel; +Cc: anirudh
From: "Anirudh Rayabharam (Microsoft)" <anirudh@anirudhrb.com>
On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic
interrupts (SINTs) from the hypervisor for doorbells and intercepts.
There is no such vector reserved for arm64.
On arm64, the hypervisor exposes a synthetic register that can be read
to find the INTID that should be used for SINTs. This INTID is in the
PPI range.
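As a rough illustration of the PPI constraint mentioned above, a range
check could be sketched as below. The helper name intid_is_ppi() is
hypothetical and is not part of this series. The ranges are the GIC
architectural PPI INTIDs (16-31) plus the extended PPI range (1056-1119)
that GICv3.1 adds. How the driver actually validates or maps the value
it reads from the register is not shown in this cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

/* GIC architectural INTID ranges for private peripheral interrupts. */
#define GIC_PPI_FIRST	16u
#define GIC_PPI_LAST	31u
/* Extended PPI range, only present where the GIC implements it. */
#define GIC_EPPI_FIRST	1056u
#define GIC_EPPI_LAST	1119u

/* Hypothetical helper: true if intid falls in either PPI range. */
static bool intid_is_ppi(uint32_t intid)
{
	if (intid >= GIC_PPI_FIRST && intid <= GIC_PPI_LAST)
		return true;

	return intid >= GIC_EPPI_FIRST && intid <= GIC_EPPI_LAST;
}
```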
Changes in v4:
- Hypervisor now exposes a synthetic register to read the SINT vector
instead of using an ACPI platform device. So make changes to accommodate that.
Changes in v3:
- Moved the hv_root_partition() check into the reboot notifier
to avoid doing it multiple times.
v2: https://lore.kernel.org/linux-hyperv/20260202182706.648192-1-anirudh@anirudhrb.com/
Changes in v2:
Addressed review comments:
- Moved more stuff into mshv_synic.c
- Code simplifications
- Removed unnecessary debug prints
v1: https://lore.kernel.org/linux-hyperv/20260128160437.3342167-1-anirudh@anirudhrb.com/
Anirudh Rayabharam (Microsoft) (2):
mshv: refactor synic init and cleanup
mshv: add arm64 support for doorbell & intercept SINTs
drivers/hv/mshv_root.h | 5 +-
drivers/hv/mshv_root_main.c | 59 ++----------
drivers/hv/mshv_synic.c | 181 +++++++++++++++++++++++++++++++++---
include/hyperv/hvgdk_mini.h | 2 +
4 files changed, 181 insertions(+), 66 deletions(-)
--
2.34.1
^ permalink raw reply
* [PATCH net-next v3] net: mana: Add MAC address to vPort logs and clarify error messages
From: Erni Sri Satya Vennela @ 2026-02-11 5:43 UTC (permalink / raw)
To: ernis, kys, haiyangz, wei.liu, decui, longli, andrew+netdev,
davem, edumazet, kuba, pabeni, dipayanroy, ernis, shradhagupta,
shirazsaleem, gargaditya, linux-hyperv, netdev, linux-kernel
Add MAC address to vPort configuration success message and update error
message to be more specific about HWC message errors in
mana_send_request and mana_hwc_send_request.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
---
Changes in v3:
* Remove the changes from v2 and update commit message.
* Use "Enabled vPort ..." instead of "Configured vPort" in
mana_cfg_vport.
* Update error logs in mana_hwc_send_request.
Changes in v2:
* Update commit message.
* Use "Enabled vPort ..." instead of "Configured vPort" in
mana_cfg_vport.
* Add info log in mana_uncfg_vport, mana_gd_verify_vf_version,
mana_gd_query_max_resources, mana_query_device_cfg and
mana_query_vport_cfg.
---
.../net/ethernet/microsoft/mana/hw_channel.c | 19 +++++++++++--------
drivers/net/ethernet/microsoft/mana/mana_en.c | 8 ++++----
2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index aa4e2731e2ba..d4fd513dc1d6 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -853,6 +853,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
struct hwc_caller_ctx *ctx;
u32 dest_vrcq = 0;
u32 dest_vrq = 0;
+ u32 command;
u16 msg_id;
int err;
@@ -861,8 +862,8 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
tx_wr = &txq->msg_buf->reqs[msg_id];
if (req_len > tx_wr->buf_len) {
- dev_err(hwc->dev, "HWC: req msg size: %d > %d\n", req_len,
- tx_wr->buf_len);
+ dev_err(hwc->dev, "%s:%d: req msg size: %d > %d\n",
+ __func__, __LINE__, req_len, tx_wr->buf_len);
err = -EINVAL;
goto out;
}
@@ -878,6 +879,7 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
req_msg->req.hwc_msg_id = msg_id;
tx_wr->msg_size = req_len;
+ command = req_msg->req.msg_type;
if (gc->is_pf) {
dest_vrq = hwc->pf_dest_vrq_id;
@@ -886,15 +888,16 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
err = mana_hwc_post_tx_wqe(txq, tx_wr, dest_vrq, dest_vrcq, false);
if (err) {
- dev_err(hwc->dev, "HWC: Failed to post send WQE: %d\n", err);
+ dev_err(hwc->dev, "%s:%d: Failed to post send WQE: %d\n",
+ __func__, __LINE__, err);
goto out;
}
if (!wait_for_completion_timeout(&ctx->comp_event,
(msecs_to_jiffies(hwc->hwc_timeout)))) {
if (hwc->hwc_timeout != 0)
- dev_err(hwc->dev, "HWC: Request timed out: %u ms\n",
- hwc->hwc_timeout);
+ dev_err(hwc->dev, "%s:%d: Command 0x%x timed out: %u ms\n",
+ __func__, __LINE__, command, hwc->hwc_timeout);
/* Reduce further waiting if HWC no response */
if (hwc->hwc_timeout > 1)
@@ -914,9 +917,9 @@ int mana_hwc_send_request(struct hw_channel_context *hwc, u32 req_len,
err = -EOPNOTSUPP;
goto out;
}
- if (req_msg->req.msg_type != MANA_QUERY_PHY_STAT)
- dev_err(hwc->dev, "HWC: Failed hw_channel req: 0x%x\n",
- ctx->status_code);
+ if (command != MANA_QUERY_PHY_STAT)
+ dev_err(hwc->dev, "%s:%d: Command 0x%x failed with status: 0x%x\n",
+ __func__, __LINE__, command, ctx->status_code);
err = -EPROTO;
goto out;
}
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 9b5a72ada5c4..53f24244de75 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1023,8 +1023,8 @@ static int mana_send_request(struct mana_context *ac, void *in_buf,
if (req->req.msg_type != MANA_QUERY_PHY_STAT &&
mana_need_log(gc, err))
- dev_err(dev, "Failed to send mana message: %d, 0x%x\n",
- err, resp->status);
+ dev_err(dev, "Command 0x%x failed with status: 0x%x, err: %d\n",
+ req->req.msg_type, resp->status, err);
return err ? err : -EPROTO;
}
@@ -1337,8 +1337,8 @@ int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
apc->tx_shortform_allowed = resp.short_form_allowed;
apc->tx_vp_offset = resp.tx_vport_offset;
- netdev_info(apc->ndev, "Configured vPort %llu PD %u DB %u\n",
- apc->port_handle, protection_dom_id, doorbell_pg_id);
+ netdev_info(apc->ndev, "Enabled vPort %llu PD %u DB %u MAC %pM\n",
+ apc->port_handle, protection_dom_id, doorbell_pg_id, apc->mac_addr);
out:
if (err)
mana_uncfg_vport(apc);
--
2.34.1
^ permalink raw reply related
* Re: [PATCH net-next v2] net: mana: Improve diagnostic logging for better debuggability
From: Erni Sri Satya Vennela @ 2026-02-11 5:20 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jakub Kicinski, kys, haiyangz, wei.liu, decui, longli,
andrew+netdev, davem, edumazet, pabeni, kotaranov, shradhagupta,
yury.norov, dipayanroy, shirazsaleem, ssengar, gargaditya,
linux-hyperv, netdev, linux-kernel
In-Reply-To: <20260126195850.GO13967@unreal>
Hi Jakub, Leon,
Thank you for your comments.
I will send the next version with only the updated error logs and the
MAC address reporting in the vPort config message.
Additional information will be added into debugfs in a different patch.
^ permalink raw reply
* Re: [PATCH v2 3/3] x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
From: H. Peter Anvin @ 2026-02-10 22:52 UTC (permalink / raw)
To: Uros Bizjak
Cc: linux-hyperv, x86, linux-kernel, Michael Kelley, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen
In-Reply-To: <CAFULd4ZaRGENKVYXZiaPO0heT+1bpGrVBGzA+Wz9VS1NG6trAQ@mail.gmail.com>
On 2025-11-22 01:33, Uros Bizjak wrote:
>>
>> I think it would be good to have a comment at the point where ASM_CALL_CONSTRAINT is defined explaining its proper use.
>>
>> Specifically, instructions like syscall, vmcall, vmfunc, vmmcall, int xx and VM-specific escape instructions are not "calls" because they either don't modify the stack or create an exception frame (kernel) or signal frame (user space) which is completely special.
>
> The existing comment already mentions CALL instruction only:
>
> /*
> * This output constraint should be used for any inline asm which has a "call"
> * instruction. Otherwise the asm may be inserted before the frame pointer
> * gets set up by the containing function. If you forget to do this, objtool
> * may print a "call without frame pointer save/setup" warning.
> */
>
Yes. Some people seem to have misunderstood it to mean any instruction with
"CALL" in the name.
-hpa
^ permalink raw reply
* RE: [PATCH v2 2/3] x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
From: Michael Kelley @ 2026-02-10 22:40 UTC (permalink / raw)
To: Uros Bizjak, linux-hyperv@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org, Wei Liu
Cc: K. Y. Srinivasan, Haiyang Zhang, Dexuan Cui, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin
In-Reply-To: <20251121141437.205481-2-ubizjak@gmail.com>
From: Uros Bizjak <ubizjak@gmail.com> Sent: Friday, November 21, 2025 6:14 AM
>
> Use standard savesegment() utility macro to save segment registers.
Patch 1 of this series was included in the tip tree. But this patch (Patch 2) and
Patch 3 have not been picked up anywhere.
Wei Liu -- could you pick these two up in the hyperv tree?
Michael
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Acked-by: Wei Liu <wei.liu@kernel.org>
> Tested-by: Michael Kelley <mhklinux@outlook.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
> arch/x86/hyperv/ivm.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index 651771534cae..7365d8f43181 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -25,6 +25,7 @@
> #include <asm/e820/api.h>
> #include <asm/desc.h>
> #include <asm/msr.h>
> +#include <asm/segment.h>
> #include <uapi/asm/vmx.h>
>
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> @@ -315,16 +316,16 @@ int hv_snp_boot_ap(u32 apic_id, unsigned long start_ip,
> unsigned int cpu)
> vmsa->gdtr.base = gdtr.address;
> vmsa->gdtr.limit = gdtr.size;
>
> - asm volatile("movl %%es, %%eax;" : "=a" (vmsa->es.selector));
> + savesegment(es, vmsa->es.selector);
> hv_populate_vmcb_seg(vmsa->es, vmsa->gdtr.base);
>
> - asm volatile("movl %%cs, %%eax;" : "=a" (vmsa->cs.selector));
> + savesegment(cs, vmsa->cs.selector);
> hv_populate_vmcb_seg(vmsa->cs, vmsa->gdtr.base);
>
> - asm volatile("movl %%ss, %%eax;" : "=a" (vmsa->ss.selector));
> + savesegment(ss, vmsa->ss.selector);
> hv_populate_vmcb_seg(vmsa->ss, vmsa->gdtr.base);
>
> - asm volatile("movl %%ds, %%eax;" : "=a" (vmsa->ds.selector));
> + savesegment(ds, vmsa->ds.selector);
> hv_populate_vmcb_seg(vmsa->ds, vmsa->gdtr.base);
>
> vmsa->efer = native_read_msr(MSR_EFER);
> --
> 2.51.1
^ permalink raw reply
* Re: [PATCH v1] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
From: Thomas Gleixner @ 2026-02-10 21:23 UTC (permalink / raw)
To: Mukesh Rathor, linux-hyperv, linux-kernel
Cc: kys, haiyangz, wei.liu, decui, longli, mingo, bp, dave.hansen,
x86, hpa
In-Reply-To: <20260102220208.862818-1-mrathor@linux.microsoft.com>
On Fri, Jan 02 2026 at 14:02, Mukesh Rathor wrote:
>
> +#ifndef CONFIG_X86_FRED
This #ifndef is broken. A FRED enabled kernel is no guarantee that the
CPU has FRED. So this has to be unconditional.
> + * Reserve vectors hard coded in the hypervisor. If used outside, the hypervisor
> + * will crash or hang or break into debugger.
> + */
> +static void hv_reserve_irq_vectors(void)
> +{
> + #define HYPERV_DBG_FASTFAIL_VECTOR 0x29
> + #define HYPERV_DBG_ASSERT_VECTOR 0x2C
> + #define HYPERV_DBG_SERVICE_VECTOR 0x2D
As FRED does not need this bit fiddling you want:
if (cpu_feature_enabled(X86_FEATURE_FRED))
return;
right here.
> + if (test_and_set_bit(HYPERV_DBG_ASSERT_VECTOR, system_vectors) ||
> + test_and_set_bit(HYPERV_DBG_SERVICE_VECTOR, system_vectors) ||
> + test_and_set_bit(HYPERV_DBG_FASTFAIL_VECTOR, system_vectors))
> + BUG();
> +
Thanks,
tglx
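
For illustration only, the two review comments above could be combined
roughly as follows. This is a userspace model, not kernel code:
fred_active, system_vectors_model and model_test_and_set_bit() are
stand-ins invented here for cpu_feature_enabled(X86_FEATURE_FRED),
system_vectors and test_and_set_bit(). The vector numbers are the ones
from the quoted patch, and the model returns -1 where the patch calls
BUG().

```c
#include <limits.h>
#include <stdbool.h>

/* Vector numbers taken from the quoted patch. */
#define HYPERV_DBG_FASTFAIL_VECTOR	0x29
#define HYPERV_DBG_ASSERT_VECTOR	0x2C
#define HYPERV_DBG_SERVICE_VECTOR	0x2D

#define NR_VECTORS_MODEL	256
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the kernel's system_vectors bitmap. */
static unsigned long system_vectors_model[NR_VECTORS_MODEL / BITS_PER_WORD + 1];

/* Hypothetical stand-in for cpu_feature_enabled(X86_FEATURE_FRED). */
static bool fred_active;

/* Non-atomic model of test_and_set_bit(): set the bit, return its old value. */
static bool model_test_and_set_bit(unsigned int nr, unsigned long *map)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];
	bool old = (*word & mask) != 0;

	*word |= mask;
	return old;
}

/*
 * Returns 0 on success or when FRED is active (nothing to reserve),
 * -1 if one of the vectors was already taken.
 */
static int hv_reserve_irq_vectors_model(void)
{
	/* Runtime check, not #ifndef: a FRED-capable build may run without FRED. */
	if (fred_active)
		return 0;

	if (model_test_and_set_bit(HYPERV_DBG_ASSERT_VECTOR, system_vectors_model) ||
	    model_test_and_set_bit(HYPERV_DBG_SERVICE_VECTOR, system_vectors_model) ||
	    model_test_and_set_bit(HYPERV_DBG_FASTFAIL_VECTOR, system_vectors_model))
		return -1;

	return 0;
}
```

With the runtime check inside the function, the caller needs no
preprocessor guard either.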
^ permalink raw reply
* Re: [PATCH v1] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
From: H. Peter Anvin @ 2026-02-10 20:33 UTC (permalink / raw)
To: Mukesh Rathor, linux-hyperv, linux-kernel
Cc: kys, haiyangz, wei.liu, decui, longli, tglx, mingo, bp,
dave.hansen, x86
In-Reply-To: <20260102220208.862818-1-mrathor@linux.microsoft.com>
On 2026-01-02 14:02, Mukesh Rathor wrote:
>
> v1: Add ifndef CONFIG_X86_FRED (thanks hpa)
>
It just clicked in my brain.
This must be cpu_feature_enabled() not a static #ifndef. Just because the
kernel is compiled with FRED support doesn't mean that it is *using* FRED!
-hpa
> arch/x86/kernel/cpu/mshyperv.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 579fb2c64cfd..8ef4ca6733ac 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -478,6 +478,27 @@ int hv_get_hypervisor_version(union hv_hypervisor_version_info *info)
> }
> EXPORT_SYMBOL_GPL(hv_get_hypervisor_version);
>
> +#ifndef CONFIG_X86_FRED
> +/*
> + * Reserve vectors hard coded in the hypervisor. If used outside, the hypervisor
> + * will crash or hang or break into debugger.
> + */
> +static void hv_reserve_irq_vectors(void)
> +{
> + #define HYPERV_DBG_FASTFAIL_VECTOR 0x29
> + #define HYPERV_DBG_ASSERT_VECTOR 0x2C
> + #define HYPERV_DBG_SERVICE_VECTOR 0x2D
> +
> + if (test_and_set_bit(HYPERV_DBG_ASSERT_VECTOR, system_vectors) ||
> + test_and_set_bit(HYPERV_DBG_SERVICE_VECTOR, system_vectors) ||
> + test_and_set_bit(HYPERV_DBG_FASTFAIL_VECTOR, system_vectors))
> + BUG();
> +
> + pr_info("Hyper-V:reserve vectors: %d %d %d\n", HYPERV_DBG_ASSERT_VECTOR,
> + HYPERV_DBG_SERVICE_VECTOR, HYPERV_DBG_FASTFAIL_VECTOR);
> +}
> +#endif /* CONFIG_X86_FRED */
> +
> static void __init ms_hyperv_init_platform(void)
> {
> int hv_max_functions_eax, eax;
> @@ -510,6 +531,11 @@ static void __init ms_hyperv_init_platform(void)
>
> hv_identify_partition_type();
>
> +#ifndef CONFIG_X86_FRED
> + if (hv_root_partition())
> + hv_reserve_irq_vectors();
> +#endif /* CONFIG_X86_FRED */
> +
> if (cc_platform_has(CC_ATTR_SNP_SECURE_AVIC))
> ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
>
^ permalink raw reply
* Re: [PATCH v1] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
From: H. Peter Anvin @ 2026-02-10 19:41 UTC (permalink / raw)
To: Wei Liu, Mukesh Rathor
Cc: linux-hyperv, linux-kernel, kys, haiyangz, decui, longli, tglx,
mingo, bp, dave.hansen, x86
In-Reply-To: <20260115072509.GF3557088@liuwe-devbox-debian-v2.local>
On 2026-01-14 23:25, Wei Liu wrote:
>
> I briefly looked up FRED and checked the code. I understand that once it
> is enabled, Linux kernel doesn't setup the IDT anymore (code in
> arch/x86/kernel/traps.c).
>
> My question is, do we need to do anything when FRED is enabled?
>
Assuming you don't need them handled in any specific way, then you don't. INT
instructions are treated completely separately from hardware interrupts in
FRED, and so they cannot be confused.
By default they will emulate a #GP(0) just as if an INT instruction had been
executed in user space with the DPL of the corresponding interrupt gate < 3;
this is currently the case for interrupt vectors other than 3, 4, and 0x80
(which are handled in fred_intx).
Any INT instruction in kernel space will end up in fred_bad_type().
-hpa
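
A simplified model of the behaviour described above, written as plain
userspace C for illustration. The enum and function names are invented
here and do not exist in the kernel. The model only encodes what this
mail states: INT from kernel space ends up in fred_bad_type(), vectors
3, 4 and 0x80 from user space are handled in fred_intx, and every other
vector from user space gets an emulated #GP(0). It ignores any kernel
configuration dependencies.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for an INT n instruction under FRED. */
enum int_outcome {
	INT_BREAKPOINT,		/* vector 3, handled in fred_intx */
	INT_OVERFLOW,		/* vector 4, handled in fred_intx */
	INT_SYSCALL_0X80,	/* vector 0x80, handled in fred_intx */
	INT_EMULATE_GP,		/* any other vector from user space: #GP(0) */
	INT_BAD_TYPE,		/* any INT from kernel space: fred_bad_type() */
};

/* Hypothetical classifier mirroring the description in this mail. */
static enum int_outcome fred_swint_outcome(bool from_user, unsigned int vector)
{
	if (!from_user)
		return INT_BAD_TYPE;

	switch (vector) {
	case 3:
		return INT_BREAKPOINT;
	case 4:
		return INT_OVERFLOW;
	case 0x80:
		return INT_SYSCALL_0X80;
	default:
		return INT_EMULATE_GP;
	}
}
```

In this model the three Hyper-V debug vectors (0x29, 0x2C, 0x2D) never
reach a device interrupt handler, which is why nothing needs to be
reserved when FRED is in use.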
^ permalink raw reply
* Re: [PATCH v1] x86/hyperv: Reserve 3 interrupt vectors used exclusively by mshv
From: Mukesh R @ 2026-02-10 19:12 UTC (permalink / raw)
To: linux-hyperv, linux-kernel
Cc: kys, haiyangz, wei.liu, decui, longli, tglx, mingo, bp,
dave.hansen, x86, hpa
In-Reply-To: <20260102220208.862818-1-mrathor@linux.microsoft.com>
On 1/2/26 14:02, Mukesh Rathor wrote:
> MSVC compiler, used to compile the Microsoft Hyper-V hypervisor currently,
> has an assert intrinsic that uses interrupt vector 0x29 to create an
> exception. This will cause hypervisor to then crash and collect core. As
> such, if this interrupt number is assigned to a device by linux and the
> device generates it, hypervisor will crash. There are two other such
> vectors hard coded in the hypervisor, 0x2C and 0x2D for debug purposes.
> Fortunately, the three vectors are part of the kernel driver space and
> that makes it feasible to reserve them early so they are not assigned
> later.
>
> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
> ---
>
> v1: Add ifndef CONFIG_X86_FRED (thanks hpa)
>
> arch/x86/kernel/cpu/mshyperv.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 579fb2c64cfd..8ef4ca6733ac 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -478,6 +478,27 @@ int hv_get_hypervisor_version(union hv_hypervisor_version_info *info)
> }
> EXPORT_SYMBOL_GPL(hv_get_hypervisor_version);
>
> +#ifndef CONFIG_X86_FRED
> +/*
> + * Reserve vectors hard coded in the hypervisor. If used outside, the hypervisor
> + * will crash or hang or break into debugger.
> + */
> +static void hv_reserve_irq_vectors(void)
> +{
> + #define HYPERV_DBG_FASTFAIL_VECTOR 0x29
> + #define HYPERV_DBG_ASSERT_VECTOR 0x2C
> + #define HYPERV_DBG_SERVICE_VECTOR 0x2D
> +
> + if (test_and_set_bit(HYPERV_DBG_ASSERT_VECTOR, system_vectors) ||
> + test_and_set_bit(HYPERV_DBG_SERVICE_VECTOR, system_vectors) ||
> + test_and_set_bit(HYPERV_DBG_FASTFAIL_VECTOR, system_vectors))
> + BUG();
> +
> + pr_info("Hyper-V:reserve vectors: %d %d %d\n", HYPERV_DBG_ASSERT_VECTOR,
> + HYPERV_DBG_SERVICE_VECTOR, HYPERV_DBG_FASTFAIL_VECTOR);
> +}
> +#endif /* CONFIG_X86_FRED */
> +
> static void __init ms_hyperv_init_platform(void)
> {
> int hv_max_functions_eax, eax;
> @@ -510,6 +531,11 @@ static void __init ms_hyperv_init_platform(void)
>
> hv_identify_partition_type();
>
> +#ifndef CONFIG_X86_FRED
> + if (hv_root_partition())
> + hv_reserve_irq_vectors();
> +#endif /* CONFIG_X86_FRED */
> +
> if (cc_platform_has(CC_ATTR_SNP_SECURE_AVIC))
> ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
>
Hi Wei,
Ping on this one... this is a show stopper. Are we waiting for an ack
from someone else on the x86 side? If so, please let me know if you know
the best person to ping.
Thanks,
-Mukesh
^ permalink raw reply
* RE: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Michael Kelley @ 2026-02-09 18:25 UTC (permalink / raw)
To: Florian Bezdeka, Jan Kiszka, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, RT,
Mitchell Levy, skinsburskii@linux.microsoft.com,
mrathor@linux.microsoft.com, anirudh@anirudhrb.com,
schakrabarti@linux.microsoft.com, ssengar@linux.microsoft.com
In-Reply-To: <1b569fcf3d096066aeb011e21f9c1fe21f7df9b5.camel@siemens.com>
From: Florian Bezdeka <florian.bezdeka@siemens.com> Sent: Monday, February 9, 2026 2:35 AM
>
> On Sat, 2026-02-07 at 01:30 +0000, Michael Kelley wrote:
>
> [snip]
> >
> > I've run your suggested experiment on an arm64 VM in the Azure cloud. My
> > kernel was linux-next 20260128. I set CONFIG_PREEMPT_RT=y and
> > CONFIG_PROVE_LOCKING=y, but did not add either of your two patches
> > (neither the storvsc driver patch nor the x86 VMBus interrupt handling patch).
> > The VM comes up and runs, but with this warning during boot:
> >
> > [ 3.075604] hv_utils: Registering HyperV Utility Driver
> > [ 3.075636] hv_vmbus: registering driver hv_utils
> > [ 3.085920] =============================
> > [ 3.088128] hv_vmbus: registering driver hv_netvsc
> > [ 3.091180] [ BUG: Invalid wait context ]
> > [ 3.093544] 6.19.0-rc7-next-20260128+ #3 Tainted: G E
> > [ 3.097582] -----------------------------
> > [ 3.099899] systemd-udevd/284 is trying to lock:
> > [ 3.102568] ffff000100e24490 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
> > [ 3.108208] other info that might help us debug this:
> > [ 3.111454] context-{2:2}
> > [ 3.112987] 1 lock held by systemd-udevd/284:
> > [ 3.115626] #0: ffffd5cfc20bcc80 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0xcc/0x3b8 [hv_vmbus]
> > [ 3.121224] stack backtrace:
> > [ 3.122897] CPU: 0 UID: 0 PID: 284 Comm: systemd-udevd Tainted: G E 6.19.0-rc7-next-20260128+ #3 PREEMPT_RT
> > [ 3.129631] Tainted: [E]=UNSIGNED_MODULE
> > [ 3.131946] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 06/10/2025
> > [ 3.138553] Call trace:
> > [ 3.140015] show_stack+0x20/0x38 (C)
> > [ 3.142137] dump_stack_lvl+0x9c/0x158
> > [ 3.144340] dump_stack+0x18/0x28
> > [ 3.146290] __lock_acquire+0x488/0x1e20
> > [ 3.148569] lock_acquire+0x11c/0x388
> > [ 3.150703] rt_spin_lock+0x54/0x230
> > [ 3.152785] vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
> > [ 3.155611] vmbus_isr+0x34/0x80 [hv_vmbus]
> > [ 3.158093] vmbus_percpu_isr+0x18/0x30 [hv_vmbus]
> > [ 3.160848] handle_percpu_devid_irq+0xdc/0x348
> > [ 3.163495] handle_irq_desc+0x48/0x68
> > [ 3.165851] generic_handle_domain_irq+0x20/0x38
> > [ 3.168664] gic_handle_irq+0x1dc/0x430
> > [ 3.170868] call_on_irq_stack+0x30/0x70
> > [ 3.173161] do_interrupt_handler+0x88/0xa0
> > [ 3.175724] el1_interrupt+0x4c/0xb0
> > [ 3.177855] el1h_64_irq_handler+0x18/0x28
> > [ 3.180332] el1h_64_irq+0x84/0x88
> > [ 3.182378] _raw_spin_unlock_irqrestore+0x4c/0xb0 (P)
> > [ 3.185493] rt_mutex_slowunlock+0x404/0x440
> > [ 3.187951] rt_spin_unlock+0xb8/0x178
> > [ 3.190394] kmem_cache_alloc_noprof+0xf0/0x4f8
> > [ 3.193100] alloc_empty_file+0x64/0x148
> > [ 3.195461] path_openat+0x58/0xaa0
> > [ 3.197658] do_file_open+0xa0/0x140
> > [ 3.199752] do_sys_openat2+0x190/0x278
> > [ 3.202124] do_sys_open+0x60/0xb8
> > [ 3.204047] __arm64_sys_openat+0x2c/0x48
> > [ 3.206433] invoke_syscall+0x6c/0xf8
> > [ 3.208519] el0_svc_common.constprop.0+0x48/0xf0
> > [ 3.211050] do_el0_svc+0x24/0x38
> > [ 3.212990] el0_svc+0x164/0x3c8
> > [ 3.214842] el0t_64_sync_handler+0xd0/0xe8
> > [ 3.217251] el0t_64_sync+0x1b0/0x1b8
> > [ 3.219450] hv_utils: Heartbeat IC version 3.0
> > [ 3.219471] hv_utils: Shutdown IC version 3.2
> > [ 3.219844] hv_utils: TimeSync IC version 4.0
>
> That matches with my expectation that the same problem exists on arm64.
> The patch from Jan addresses that issue for x86 (only, so far) as we do
> not have a working test environment for arm64 yet.
OK. I had understood Jan's earlier comments to mean that the VMBus
interrupt problem was implicitly solved on arm64 because of VMBus using
a standard Linux IRQ on arm64. But evidently that's not the case. So my
earlier comment stands: The code changes should go into the architecture
independent portion of the VMBus driver, and not under arch/x86. I
can probably work with you to test on arm64 if need be.
>
> >
> > I don't see an indication that vmbus_isr() has been offloaded from
> > interrupt level onto a thread. The stack starting with el1h_64_irq()
> > and going forward is the stack for normal per-cpu interrupt handling.
> > Maybe arm64 with PREEMPT_RT does the offload to a thread only
> > for SPIs and LPIs, but not for PPIs? I haven't looked at the source code
> > for how PREEMPT_RT affects arm64 interrupt handling.
> >
> > Also, I had expected to see a problem with storvsc because I did
> > not apply your storvsc patch. But there was no such problem, even
> > with some disk I/O load (read only). arm64 VMs in Azure use exactly
> > the same virtual SCSI devices that are used with x86 VMs in Azure or
> > on local Hyper-V. I don't have an explanation. Will think about it.
> >
>
> Running the --iomix stressor provided by stress-ng was able to trigger
> the SCSI problem within 2 minutes. The result was a completely frozen
> system. For completeness the complete stress-ng command line:
>
> # stress-ng --cpu 2 --iomix 8 --vm 2 --vm-bytes 128M --fork 4
>
Thanks!
Yes, that command line reproduced the storvsc problem on arm64. And
then applying the storvsc patch made the problem go away. FWIW, on
arm64 Linux recovered and kept running after hitting the storvsc
problem.
Michael
^ permalink raw reply
* Re: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Florian Bezdeka @ 2026-02-09 10:35 UTC (permalink / raw)
To: Michael Kelley, Jan Kiszka, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86@kernel.org
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, RT,
Mitchell Levy, skinsburskii@linux.microsoft.com,
mrathor@linux.microsoft.com, anirudh@anirudhrb.com,
schakrabarti@linux.microsoft.com, ssengar@linux.microsoft.com
In-Reply-To: <SN6PR02MB41579F60E39CA2A3CA8A5A75D467A@SN6PR02MB4157.namprd02.prod.outlook.com>
On Sat, 2026-02-07 at 01:30 +0000, Michael Kelley wrote:
[snip]
>
> I've run your suggested experiment on an arm64 VM in the Azure cloud. My
> kernel was linux-next 20260128. I set CONFIG_PREEMPT_RT=y and
> CONFIG_PROVE_LOCKING=y, but did not add either of your two patches
> (neither the storvsc driver patch nor the x86 VMBus interrupt handling patch).
> The VM comes up and runs, but with this warning during boot:
>
> [ 3.075604] hv_utils: Registering HyperV Utility Driver
> [ 3.075636] hv_vmbus: registering driver hv_utils
> [ 3.085920] =============================
> [ 3.088128] hv_vmbus: registering driver hv_netvsc
> [ 3.091180] [ BUG: Invalid wait context ]
> [ 3.093544] 6.19.0-rc7-next-20260128+ #3 Tainted: G E
> [ 3.097582] -----------------------------
> [ 3.099899] systemd-udevd/284 is trying to lock:
> [ 3.102568] ffff000100e24490 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
> [ 3.108208] other info that might help us debug this:
> [ 3.111454] context-{2:2}
> [ 3.112987] 1 lock held by systemd-udevd/284:
> [ 3.115626] #0: ffffd5cfc20bcc80 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0xcc/0x3b8 [hv_vmbus]
> [ 3.121224] stack backtrace:
> [ 3.122897] CPU: 0 UID: 0 PID: 284 Comm: systemd-udevd Tainted: G E 6.19.0-rc7-next-20260128+ #3 PREEMPT_RT
> [ 3.129631] Tainted: [E]=UNSIGNED_MODULE
> [ 3.131946] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 06/10/2025
> [ 3.138553] Call trace:
> [ 3.140015] show_stack+0x20/0x38 (C)
> [ 3.142137] dump_stack_lvl+0x9c/0x158
> [ 3.144340] dump_stack+0x18/0x28
> [ 3.146290] __lock_acquire+0x488/0x1e20
> [ 3.148569] lock_acquire+0x11c/0x388
> [ 3.150703] rt_spin_lock+0x54/0x230
> [ 3.152785] vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
> [ 3.155611] vmbus_isr+0x34/0x80 [hv_vmbus]
> [ 3.158093] vmbus_percpu_isr+0x18/0x30 [hv_vmbus]
> [ 3.160848] handle_percpu_devid_irq+0xdc/0x348
> [ 3.163495] handle_irq_desc+0x48/0x68
> [ 3.165851] generic_handle_domain_irq+0x20/0x38
> [ 3.168664] gic_handle_irq+0x1dc/0x430
> [ 3.170868] call_on_irq_stack+0x30/0x70
> [ 3.173161] do_interrupt_handler+0x88/0xa0
> [ 3.175724] el1_interrupt+0x4c/0xb0
> [ 3.177855] el1h_64_irq_handler+0x18/0x28
> [ 3.180332] el1h_64_irq+0x84/0x88
> [ 3.182378] _raw_spin_unlock_irqrestore+0x4c/0xb0 (P)
> [ 3.185493] rt_mutex_slowunlock+0x404/0x440
> [ 3.187951] rt_spin_unlock+0xb8/0x178
> [ 3.190394] kmem_cache_alloc_noprof+0xf0/0x4f8
> [ 3.193100] alloc_empty_file+0x64/0x148
> [ 3.195461] path_openat+0x58/0xaa0
> [ 3.197658] do_file_open+0xa0/0x140
> [ 3.199752] do_sys_openat2+0x190/0x278
> [ 3.202124] do_sys_open+0x60/0xb8
> [ 3.204047] __arm64_sys_openat+0x2c/0x48
> [ 3.206433] invoke_syscall+0x6c/0xf8
> [ 3.208519] el0_svc_common.constprop.0+0x48/0xf0
> [ 3.211050] do_el0_svc+0x24/0x38
> [ 3.212990] el0_svc+0x164/0x3c8
> [ 3.214842] el0t_64_sync_handler+0xd0/0xe8
> [ 3.217251] el0t_64_sync+0x1b0/0x1b8
> [ 3.219450] hv_utils: Heartbeat IC version 3.0
> [ 3.219471] hv_utils: Shutdown IC version 3.2
> [ 3.219844] hv_utils: TimeSync IC version 4.0
That matches my expectation that the same problem exists on arm64.
Jan's patch addresses the issue for x86 only so far, because we do
not yet have a working test environment for arm64.
>
> I don't see an indication that vmbus_isr() has been offloaded from
> interrupt level onto a thread. The stack starting with el1h_64_irq()
> and going forward is the stack for normal per-cpu interrupt handling.
> Maybe arm64 with PREEMPT_RT does the offload to a thread only
> for SPIs and LPIs, but not for PPIs? I haven't looked at the source code
> for how PREEMPT_RT affects arm64 interrupt handling.
>
> Also, I had expected to see a problem with storvsc because I did
> not apply your storvsc patch. But there was no such problem, even
> with some disk I/O load (read only). arm64 VMs in Azure use exactly
> the same virtual SCSI devices that are used with x86 VMs in Azure or
> on local Hyper-V. I don't have an explanation. Will think about it.
>
Running the --iomix stressor provided by stress-ng triggered the SCSI
problem within 2 minutes. The result was a completely frozen system.
For completeness, here is the full stress-ng command line:
# stress-ng --cpu 2 --iomix 8 --vm 2 --vm-bytes 128M --fork 4
Best regards,
Florian
^ permalink raw reply
* [PATCH 2/2] drm/hyperv: During panic do VMBus unload after frame buffer is flushed
From: mhkelley58 @ 2026-02-09 7:02 UTC (permalink / raw)
To: drawat.floss, maarten.lankhorst, mripard, tzimmermann, airlied,
simona, kys, haiyangz, wei.liu, decui, longli, ryasuoka, jfalempe
Cc: dri-devel, linux-kernel, linux-hyperv, stable
In-Reply-To: <20260209070201.1492-1-mhklinux@outlook.com>
From: Michael Kelley <mhklinux@outlook.com>
In a VM, Linux panic information (reason for the panic, stack trace,
etc.) may be written to a serial console and/or a virtual frame buffer
for a graphics console. The latter may need to be flushed back to the
host hypervisor for display.
The current Hyper-V DRM driver for the frame buffer does the flushing
*after* the VMBus connection has been unloaded, such that panic messages
are not displayed on the graphics console. A user with a Hyper-V graphics
console is left with just a hung empty screen after a panic. The enhanced
control that DRM provides over the panic display in the graphics console
is similarly non-functional.
Commit 3671f3777758 ("drm/hyperv: Add support for drm_panic") added
the Hyper-V DRM driver support to flush the virtual frame buffer. It
provided necessary functionality but did not handle the sequencing
problem with VMBus unload.
Fix the full problem by using VMBus functions to suppress the VMBus
unload that is normally done by the VMBus driver in the panic path. Then
after the frame buffer has been flushed, do the VMBus unload so that a
kdump kernel can start cleanly. As expected, CONFIG_DRM_PANIC must be
selected for these changes to have effect. As a side benefit, the
enhanced features of the DRM panic path are also functional.
Fixes: 3671f3777758 ("drm/hyperv: Add support for drm_panic")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
drivers/gpu/drm/hyperv/hyperv_drm_drv.c | 4 ++++
drivers/gpu/drm/hyperv/hyperv_drm_modeset.c | 15 ++++++++-------
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
index 06b5d96e6eaf..79e51643be67 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_drv.c
@@ -150,6 +150,9 @@ static int hyperv_vmbus_probe(struct hv_device *hdev,
goto err_free_mmio;
}
+ /* If DRM panic path is stubbed out VMBus code must do the unload */
+ if (IS_ENABLED(CONFIG_DRM_PANIC) && IS_ENABLED(CONFIG_PRINTK))
+ vmbus_set_skip_unload(true);
drm_client_setup(dev, NULL);
return 0;
@@ -169,6 +172,7 @@ static void hyperv_vmbus_remove(struct hv_device *hdev)
struct drm_device *dev = hv_get_drvdata(hdev);
struct hyperv_drm_device *hv = to_hv(dev);
+ vmbus_set_skip_unload(false);
drm_dev_unplug(dev);
drm_atomic_helper_shutdown(dev);
vmbus_close(hdev->channel);
diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
index 7978f8c8108c..d48ca6c23b7c 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_modeset.c
@@ -212,15 +212,16 @@ static void hyperv_plane_panic_flush(struct drm_plane *plane)
struct hyperv_drm_device *hv = to_hv(plane->dev);
struct drm_rect rect;
- if (!plane->state || !plane->state->fb)
- return;
+ if (plane->state && plane->state->fb) {
+ rect.x1 = 0;
+ rect.y1 = 0;
+ rect.x2 = plane->state->fb->width;
+ rect.y2 = plane->state->fb->height;
- rect.x1 = 0;
- rect.y1 = 0;
- rect.x2 = plane->state->fb->width;
- rect.y2 = plane->state->fb->height;
+ hyperv_update_dirt(hv->hdev, &rect);
+ }
- hyperv_update_dirt(hv->hdev, &rect);
+ vmbus_initiate_unload(true);
}
static const struct drm_plane_helper_funcs hyperv_plane_helper_funcs = {
--
2.25.1
^ permalink raw reply related
* [PATCH 1/2] Drivers: hv: vmbus: Provide option to skip VMBus unload on panic
From: mhkelley58 @ 2026-02-09 7:02 UTC (permalink / raw)
To: drawat.floss, maarten.lankhorst, mripard, tzimmermann, airlied,
simona, kys, haiyangz, wei.liu, decui, longli, ryasuoka, jfalempe
Cc: dri-devel, linux-kernel, linux-hyperv, stable
From: Michael Kelley <mhklinux@outlook.com>
Currently, VMBus code initiates a VMBus unload in the panic path so
that if a kdump kernel is loaded, it can start fresh in setting up its
own VMBus connection. However, a driver for the VMBus virtual frame
buffer may need to flush dirty portions of the frame buffer back to
the Hyper-V host so that panic information is visible in the graphics
console. To support such flushing, provide exported functions for the
frame buffer driver to specify that the VMBus unload should not be
done by the VMBus driver, and to initiate the VMBus unload itself.
Together these allow a frame buffer driver to delay the VMBus unload
until after it has completed the flush.
Ideally, the VMBus driver could use its own panic-path callback to do
the unload after all frame buffer drivers have finished. But DRM frame
buffer drivers use the kmsg dump callback, and there are no callbacks
after that in the panic path. Hence this somewhat messy approach to
properly sequencing the frame buffer flush and the VMBus unload.
Fixes: 3671f3777758 ("drm/hyperv: Add support for drm_panic")
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
drivers/hv/channel_mgmt.c | 1 +
drivers/hv/hyperv_vmbus.h | 1 -
drivers/hv/vmbus_drv.c | 25 ++++++++++++++++++-------
include/linux/hyperv.h | 3 +++
4 files changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 74fed2c073d4..5de83676dbad 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -944,6 +944,7 @@ void vmbus_initiate_unload(bool crash)
else
vmbus_wait_for_unload();
}
+EXPORT_SYMBOL_GPL(vmbus_initiate_unload);
static void vmbus_setup_channel_state(struct vmbus_channel *channel,
struct vmbus_channel_offer_channel *offer)
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index cdbc5f5c3215..5d3944fc93ae 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -440,7 +440,6 @@ void hv_vss_deinit(void);
int hv_vss_pre_suspend(void);
int hv_vss_pre_resume(void);
void hv_vss_onchannelcallback(void *context);
-void vmbus_initiate_unload(bool crash);
static inline void hv_poll_channel(struct vmbus_channel *channel,
void (*cb)(void *))
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 6785ad63a9cb..97dfa529d250 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -69,19 +69,29 @@ bool vmbus_is_confidential(void)
}
EXPORT_SYMBOL_GPL(vmbus_is_confidential);
+static bool skip_vmbus_unload;
+
+/*
+ * Allow a VMBus framebuffer driver to specify that in the case of a panic,
+ * it will do the VMBus unload operation once it has flushed any dirty
+ * portions of the framebuffer to the Hyper-V host.
+ */
+void vmbus_set_skip_unload(bool skip)
+{
+ skip_vmbus_unload = skip;
+}
+EXPORT_SYMBOL_GPL(vmbus_set_skip_unload);
+
/*
* The panic notifier below is responsible solely for unloading the
* vmbus connection, which is necessary in a panic event.
- *
- * Notice an intrincate relation of this notifier with Hyper-V
- * framebuffer panic notifier exists - we need vmbus connection alive
- * there in order to succeed, so we need to order both with each other
- * [see hvfb_on_panic()] - this is done using notifiers' priorities.
*/
static int hv_panic_vmbus_unload(struct notifier_block *nb, unsigned long val,
void *args)
{
- vmbus_initiate_unload(true);
+ if (!skip_vmbus_unload)
+ vmbus_initiate_unload(true);
+
return NOTIFY_DONE;
}
static struct notifier_block hyperv_panic_vmbus_unload_block = {
@@ -2848,7 +2858,8 @@ static void hv_crash_handler(struct pt_regs *regs)
{
int cpu;
- vmbus_initiate_unload(true);
+ if (!skip_vmbus_unload)
+ vmbus_initiate_unload(true);
/*
* In crash handler we can't schedule synic cleanup for all CPUs,
* doing the cleanup for current CPU only. This should be sufficient
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index dfc516c1c719..b0502a336eb3 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1334,6 +1334,9 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
bool fb_overlap_ok);
void vmbus_free_mmio(resource_size_t start, resource_size_t size);
+void vmbus_initiate_unload(bool crash);
+void vmbus_set_skip_unload(bool skip);
+
/*
* GUID definitions of various offer types - services offered to the guest.
*/
--
2.25.1
^ permalink raw reply related
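The sequencing that this two-patch series sets up can be modeled in user space. The sketch below is not kernel code: every model_* name is a hypothetical stand-in, and it assumes (as the commit message implies) that the VMBus panic notifier runs before the kmsg dump callback used by the DRM panic path. It only shows how a skip flag moves the unload to after the frame buffer flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the flag that vmbus_set_skip_unload() sets in the patch. */
static bool model_skip_unload;

/* Event log so the ordering can be inspected; purely illustrative. */
static char model_events[8];
static size_t model_event_count;

static void model_record(char event)
{
	assert(model_event_count < sizeof(model_events) - 1);
	model_events[model_event_count++] = event;
	model_events[model_event_count] = '\0';
}

/* Models vmbus_initiate_unload(true); 'U' marks the unload. */
static void model_unload(void)
{
	model_record('U');
}

/* Models the VMBus panic notifier: unload only if nobody took it over. */
static void model_panic_notifier(void)
{
	if (!model_skip_unload)
		model_unload();
}

/* Models the frame buffer panic flush: 'F' marks the flush, then unload. */
static void model_fb_panic_flush(void)
{
	model_record('F');
	model_unload();
}

/* Runs the modeled panic path and returns the order of events. */
static const char *model_panic(bool fb_driver_bound)
{
	model_event_count = 0;
	model_events[0] = '\0';
	model_skip_unload = fb_driver_bound;

	model_panic_notifier();		/* assumed to run first */
	if (fb_driver_bound)
		model_fb_panic_flush();	/* assumed to run later */
	return model_events;
}
```

With a frame buffer driver bound, the model flushes before unloading; without one, the notifier does the unload itself, which matches the behavior before this series.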
* Re: [PATCH] PCI: hv: Allocate MMIO from above 4GB for the config window
From: Krister Johansen @ 2026-02-07 1:42 UTC (permalink / raw)
To: Matthew Ruffell, Michael Kelley
Cc: DECUI@microsoft.com, bhelgaas@google.com, haiyangz@microsoft.com,
jakeo@microsoft.com, kwilczynski@kernel.org, kys@microsoft.com,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, longli@microsoft.com,
lpieralisi@kernel.org, mani@kernel.org, robh@kernel.org,
stable@vger.kernel.org, wei.liu@kernel.org
In-Reply-To: <SN6PR02MB41573CD2EA6CD82A0C238F66D494A@SN6PR02MB4157.namprd02.prod.outlook.com>
Hi Matthew and Michael,
On Fri, Jan 23, 2026 at 06:39:24AM +0000, Michael Kelley wrote:
> From: Matthew Ruffell <matthew.ruffell@canonical.com> Sent: Thursday, January 22, 2026 9:39 PM
> > > > There's a parameter to the kexec() command that governs whether it uses the
> > > > kexec_file_load() system call or the kexec_load() system call.
> > > > I wonder if that parameter makes a difference in the problem described for this
> > > > patch.
> >
> > Yes, it does indeed make a difference. I have been debugging this the past few
> > days, and my colleague Melissa noticed that the problem reproduces when secure
> > boot is disabled, but it does not reproduce when secure boot is enabled.
> > Additionally, it reproduces on jammy, but not noble. It turns out that
> > kexec-tools on jammy defaults to kexec_load() when secure boot is disabled,
> > and when enabled, it instead uses kexec_file_load(). On noble, it defaults to
> > first trying kexec_file_load() before falling back to kexec_load(), so the
> > issue does not reproduce.
>
> This is good info, and definitely a clue. So to be clear, the problem repros
> only when kexec_load() is used. With kexec_file_load(), it does not repro. Is that
> right? I saw a similar distinction when working on commit 304386373007,
> though in the opposite direction!
Just to muddy the waters here, I have a team on the Noble 6.8 kernel
train that's running into this issue on Standard_D#pds_v6 with secure
boot disabled. I've validated via strace(1) that kexec(8) is calling
kexec_file_load(2). In this case the problem Dexuan describes in the
cover letter still occurs, but it affects NIC attachment instead of the
NVMe storage device (e.g. pci_hyperv attach of the NIC reports the
pass-through error instead of successfully attaching).
> > > > > /*
> > > > > * Set up a region of MMIO space to use for accessing configuration
> > > > > - * space.
> > > > > + * space. Use the high MMIO range to not conflict with the hyperv_drm
> > > > > + * driver (which normally gets MMIO from the low MMIO range) in the
> > > > > + * kdump kernel of a Gen2 VM, which fails to reserve the framebuffer
> > > > > + * MMIO range in vmbus_reserve_fb() due to screen_info.lfb_base being
> > > > > + * zero in the kdump kernel.
> > > > > */
> > > > > - ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, 0, -1,
> > > > > + ret = vmbus_allocate_mmio(&hbus->mem_config, hbus->hdev, SZ_4G, -1,
> > > > > PCI_CONFIG_MMIO_LENGTH, 0x1000, false);
> > > > > if (ret)
> > > > > return ret;
> > > > > --
> >
> > Thank you for the patch Dexuan.
> >
> > This patch fixes the problem on Ubuntu 5.15, and 6.8 based kernels
> > booting V6 instance types on Azure with Gen 2 images.
>
> Are you seeing the problem on x86/64 or arm64 instances in Azure?
> "V6 instance types" could be either, I think, but I'm guessing you
> are on x86/64.
>
> And just to confirm: are you seeing the problem with the
> Hyper-V DRM driver, or the Hyper-V FB driver? This patch mentions
> the DRM driver, so I assume that's the problematic config.
It's been arm64 and not x86 for the case I've seen. They're currently
running with the hyperv_drm driver, but they've also tried swapping to
the fb driver without any change in results.
> > Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
All of the above said, I also tested Dexuan's fix on these instances and
found that with the patch applied kdump did work again.
Tested-by: Krister Johansen <johansen@templeofstupid.com>
-K
^ permalink raw reply
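The effect of raising the minimum address in the vmbus_allocate_mmio() call from 0 to SZ_4G can be illustrated with a toy first-fit allocator. This is a simplified sketch, not the real allocator: the model_* names, the free-range table, and the addresses in it are all made up for illustration, and the real function has additional behavior (such as frame buffer overlap handling) that is not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SZ_4G 0x100000000ULL

/* A free MMIO range; both bounds are inclusive. */
struct model_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical layout: one low range (below 4GB) and one high range. */
static const struct model_range model_free[] = {
	{ 0xf8000000ULL, 0xffffffffULL },
	{ 0x1000000000ULL, 0x1fffffffffULL },
};

/*
 * Return the first aligned address at or above min that fits size bytes
 * in one of the free ranges, or 0 if nothing fits.
 */
static uint64_t model_alloc(const struct model_range *ranges, size_t count,
			    uint64_t min, uint64_t size, uint64_t align)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < count; i++) {
		uint64_t base = ranges[i].start > min ? ranges[i].start : min;

		base = (base + align - 1) & ~(align - 1);
		if (base <= ranges[i].end && ranges[i].end - base >= size - 1)
			return base;
	}
	return 0;
}
```

In this model, a minimum of 0 lands in the low range, where a frame buffer would normally live, while a minimum of MODEL_SZ_4G skips it and returns an address from the high range.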
* RE: [PATCH] x86: mshyperv: Use kthread for vmbus interrupts on PREEMPT_RT
From: Michael Kelley @ 2026-02-07 1:30 UTC (permalink / raw)
To: Jan Kiszka, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
Long Li, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
Florian Bezdeka, RT, Mitchell Levy,
skinsburskii@linux.microsoft.com, mrathor@linux.microsoft.com,
anirudh@anirudhrb.com, schakrabarti@linux.microsoft.com,
ssengar@linux.microsoft.com
In-Reply-To: <eb5debe8-b7d6-4076-b295-9a02271c2ee6@siemens.com>
From: Jan Kiszka <jan.kiszka@siemens.com> Sent: Thursday, February 5, 2026 10:41 PM
>
> On 05.02.26 19:55, Michael Kelley wrote:
> > From: Jan Kiszka <jan.kiszka@siemens.com> Sent: Tuesday, February 3, 2026 8:02 AM
> >>
> >> Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V
> >> with related guest support enabled:
> >>
> >> [ 1.127941] hv_vmbus: registering driver hyperv_drm
> >>
> >> [ 1.132518] =============================
> >> [ 1.132519] [ BUG: Invalid wait context ]
> >> [ 1.132521] 6.19.0-rc8+ #9 Not tainted
> >> [ 1.132524] -----------------------------
> >> [ 1.132525] swapper/0/0 is trying to lock:
> >> [ 1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0
> >> [ 1.132543] other info that might help us debug this:
> >> [ 1.132544] context-{2:2}
> >> [ 1.132545] 1 lock held by swapper/0/0:
> >> [ 1.132547] #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0
> >> [ 1.132557] stack backtrace:
> >> [ 1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)}
> >> [ 1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025
> >> [ 1.132567] Call Trace:
> >> [ 1.132570] <IRQ>
> >> [ 1.132573] dump_stack_lvl+0x6e/0xa0
> >> [ 1.132581] __lock_acquire+0xee0/0x21b0
> >> [ 1.132592] lock_acquire+0xd5/0x2d0
> >> [ 1.132598] ? vmbus_chan_sched+0xc4/0x2b0
> >> [ 1.132606] ? lock_acquire+0xd5/0x2d0
> >> [ 1.132613] ? vmbus_chan_sched+0x31/0x2b0
> >> [ 1.132619] rt_spin_lock+0x3f/0x1f0
> >> [ 1.132623] ? vmbus_chan_sched+0xc4/0x2b0
> >> [ 1.132629] ? vmbus_chan_sched+0x31/0x2b0
> >> [ 1.132634] vmbus_chan_sched+0xc4/0x2b0
> >> [ 1.132641] vmbus_isr+0x2c/0x150
> >> [ 1.132648] __sysvec_hyperv_callback+0x5f/0xa0
> >> [ 1.132654] sysvec_hyperv_callback+0x88/0xb0
> >> [ 1.132658] </IRQ>
> >> [ 1.132659] <TASK>
> >> [ 1.132660] asm_sysvec_hyperv_callback+0x1a/0x20
> >>
> >> As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT,
> >> the complete vmbus_handler execution needs to be moved into thread
> >> context. Open-coding this allows to skip the IPI that irq_work would
> >> additionally bring and which we do not need, being an IRQ, never an NMI.
> >>
> >> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >> ---
[snip]
> >>
> >> arch/x86/kernel/cpu/mshyperv.c | 52 ++++++++++++++++++++++++++++++++--
> >
> > You've added this code under arch/x86. But isn't it architecture independent? I
> > think it should also work on arm64. If that's the case, the code should probably
> > be added to drivers/hv/vmbus_drv.c instead.
> >
>
> I checked that before: arm64 uses normal IRQs, not over-optimized APIC
> vectors. And those IRQs are auto-threaded.
Just to clarify, with CONFIG_PREEMPT_RT=y, you expect the normal Linux
IRQ handling mechanism to offload a per-CPU interrupt handler from interrupt
level onto a thread?
>
> That said, someone with an arm64 Hyper-V deployment should still try to
> run things there once (PREEMPT_RT + PROVE_LOCKING). I don't have such a
> setup.
>
I've run your suggested experiment on an arm64 VM in the Azure cloud. My
kernel was linux-next 20260128. I set CONFIG_PREEMPT_RT=y and
CONFIG_PROVE_LOCKING=y, but did not add either of your two patches
(neither the storvsc driver patch nor the x86 VMBus interrupt handling patch).
The VM comes up and runs, but with this warning during boot:
[ 3.075604] hv_utils: Registering HyperV Utility Driver
[ 3.075636] hv_vmbus: registering driver hv_utils
[ 3.085920] =============================
[ 3.088128] hv_vmbus: registering driver hv_netvsc
[ 3.091180] [ BUG: Invalid wait context ]
[ 3.093544] 6.19.0-rc7-next-20260128+ #3 Tainted: G E
[ 3.097582] -----------------------------
[ 3.099899] systemd-udevd/284 is trying to lock:
[ 3.102568] ffff000100e24490 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
[ 3.108208] other info that might help us debug this:
[ 3.111454] context-{2:2}
[ 3.112987] 1 lock held by systemd-udevd/284:
[ 3.115626] #0: ffffd5cfc20bcc80 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0xcc/0x3b8 [hv_vmbus]
[ 3.121224] stack backtrace:
[ 3.122897] CPU: 0 UID: 0 PID: 284 Comm: systemd-udevd Tainted: G E 6.19.0-rc7-next-20260128+ #3 PREEMPT_RT
[ 3.129631] Tainted: [E]=UNSIGNED_MODULE
[ 3.131946] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 06/10/2025
[ 3.138553] Call trace:
[ 3.140015] show_stack+0x20/0x38 (C)
[ 3.142137] dump_stack_lvl+0x9c/0x158
[ 3.144340] dump_stack+0x18/0x28
[ 3.146290] __lock_acquire+0x488/0x1e20
[ 3.148569] lock_acquire+0x11c/0x388
[ 3.150703] rt_spin_lock+0x54/0x230
[ 3.152785] vmbus_chan_sched+0x128/0x3b8 [hv_vmbus]
[ 3.155611] vmbus_isr+0x34/0x80 [hv_vmbus]
[ 3.158093] vmbus_percpu_isr+0x18/0x30 [hv_vmbus]
[ 3.160848] handle_percpu_devid_irq+0xdc/0x348
[ 3.163495] handle_irq_desc+0x48/0x68
[ 3.165851] generic_handle_domain_irq+0x20/0x38
[ 3.168664] gic_handle_irq+0x1dc/0x430
[ 3.170868] call_on_irq_stack+0x30/0x70
[ 3.173161] do_interrupt_handler+0x88/0xa0
[ 3.175724] el1_interrupt+0x4c/0xb0
[ 3.177855] el1h_64_irq_handler+0x18/0x28
[ 3.180332] el1h_64_irq+0x84/0x88
[ 3.182378] _raw_spin_unlock_irqrestore+0x4c/0xb0 (P)
[ 3.185493] rt_mutex_slowunlock+0x404/0x440
[ 3.187951] rt_spin_unlock+0xb8/0x178
[ 3.190394] kmem_cache_alloc_noprof+0xf0/0x4f8
[ 3.193100] alloc_empty_file+0x64/0x148
[ 3.195461] path_openat+0x58/0xaa0
[ 3.197658] do_file_open+0xa0/0x140
[ 3.199752] do_sys_openat2+0x190/0x278
[ 3.202124] do_sys_open+0x60/0xb8
[ 3.204047] __arm64_sys_openat+0x2c/0x48
[ 3.206433] invoke_syscall+0x6c/0xf8
[ 3.208519] el0_svc_common.constprop.0+0x48/0xf0
[ 3.211050] do_el0_svc+0x24/0x38
[ 3.212990] el0_svc+0x164/0x3c8
[ 3.214842] el0t_64_sync_handler+0xd0/0xe8
[ 3.217251] el0t_64_sync+0x1b0/0x1b8
[ 3.219450] hv_utils: Heartbeat IC version 3.0
[ 3.219471] hv_utils: Shutdown IC version 3.2
[ 3.219844] hv_utils: TimeSync IC version 4.0
I don't see an indication that vmbus_isr() has been offloaded from
interrupt level onto a thread. The stack starting with el1h_64_irq()
and going forward is the stack for normal per-cpu interrupt handling.
Maybe arm64 with PREEMPT_RT does the offload to a thread only
for SPIs and LPIs, but not for PPIs? I haven't looked at the source code
for how PREEMPT_RT affects arm64 interrupt handling.
Also, I had expected to see a problem with storvsc because I did
not apply your storvsc patch. But there was no such problem, even
with some disk I/O load (read only). arm64 VMs in Azure use exactly
the same virtual SCSI devices that are used with x86 VMs in Azure or
on local Hyper-V. I don't have an explanation. Will think about it.
Michael
^ permalink raw reply
* Re: [PATCH 1/3] x86/x2apic: disable x2apic on resume if the kernel expects so
From: Sohil Mehta @ 2026-02-07 0:37 UTC (permalink / raw)
To: Shashank Balaji
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, Suresh Siddha, K. Y. Srinivasan, Haiyang Zhang,
Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
Broadcom internal kernel review list, Jan Kiszka, Paolo Bonzini,
Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky, Ingo Molnar,
linux-kernel, linux-hyperv, virtualization, jailhouse-dev, kvm,
xen-devel, Rahul Bukte, Daniel Palmer, Tim Bird, stable
In-Reply-To: <aYWs-wvDuS53BHMe@JPC00244420>
On 2/6/2026 12:57 AM, Shashank Balaji wrote:
> On Thu, Feb 05, 2026 at 03:18:58PM -0800, Sohil Mehta wrote:
>> On 2/4/2026 10:07 PM, Shashank Balaji wrote:
>>> On Wed, Feb 04, 2026 at 10:53:28AM -0800, Sohil Mehta wrote:
>>
>>>> It's a bit odd then that the firmware chooses to enable x2apic without
>>>> the OS requesting it.
>>>
>>> Well, the firmware has a setting saying "Enable x2apic", which was
>>> enabled. So it did what the setting says
>>>
>>
I still think the firmware behavior is flawed. Some confusion
originates from the option "Enable x2apic". Based on just these two
words, the behavior seems highly ambiguous. I interpret it to mean "Make
the x2apic feature available to the kernel". Then it is up to the kernel
whether it decides to use it.
If the intention of the BIOS option is to automatically/always enable
x2apic then there is an architectural way to do it. It can set the bits
that track to x2apic_hw_locked(). But doing it this way during resume
(behind the OS's back) is unexpected.
>>
>> pr_warn_once("x2apic unexpectedly re-enabled by the firmware during
>> resume.\n");
>
> At least as per the spec, it's not something the firmware needs to fix,
> and it's not unexpected re-enablement.
>
> Am I missing something?
>
> But it _is_ surprising that this bug went unnoticed for so long :)
Maybe this BIOS option isn't typically found in a production firmware. I
think it would be preferable to have the pr_warn_once() but I won't push
super hard for it due to the unclear information.
Whatever you choose for v2, can you please briefly describe the
ambiguity in the commit message? It would help other reviewers provide
better insight and be handy if we ever need to come back to this.
^ permalink raw reply
* RE: [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
From: Michael Kelley @ 2026-02-06 18:54 UTC (permalink / raw)
To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177031694699.186911.12873334535011325477.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, February 5, 2026 10:42 AM
> To: kys@microsoft.com; haiyangz@microsoft.com; wei.liu@kernel.org;
> decui@microsoft.com; longli@microsoft.com
> Cc: linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v3 4/4] mshv: Handle insufficient root memory hypervisor statuses
>
> When creating guest partition objects, the hypervisor may fail to
> allocate root partition pages and return an insufficient memory status.
> In this case, deposit memory using the root partition ID instead.
>
> Note: This error should never occur in a guest of L1VH partition context.
>
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
> drivers/hv/hv_common.c | 2 +
> drivers/hv/hv_proc.c | 14 ++++++++++
> include/hyperv/hvgdk_mini.h | 58 ++++++++++++++++++++++---------------------
> 3 files changed, 46 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index f20596276662..6b67ac616789 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -794,6 +794,8 @@ static const struct hv_status_info hv_status_infos[] = {
> _STATUS_INFO(HV_STATUS_PROPERTY_VALUE_OUT_OF_RANGE, -EIO),
> _STATUS_INFO(HV_STATUS_INSUFFICIENT_MEMORY, -ENOMEM),
> _STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY, -ENOMEM),
> + _STATUS_INFO(HV_STATUS_INSUFFICIENT_ROOT_MEMORY, -ENOMEM),
> + _STATUS_INFO(HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY, -ENOMEM),
> _STATUS_INFO(HV_STATUS_INVALID_PARTITION_ID, -EINVAL),
> _STATUS_INFO(HV_STATUS_INVALID_VP_INDEX, -EINVAL),
> _STATUS_INFO(HV_STATUS_NOT_FOUND, -EIO),
> diff --git a/drivers/hv/hv_proc.c b/drivers/hv/hv_proc.c
> index 181f6d02bce3..5f4fd9c3231c 100644
> --- a/drivers/hv/hv_proc.c
> +++ b/drivers/hv/hv_proc.c
> @@ -121,6 +121,18 @@ int hv_deposit_memory_node(int node, u64 partition_id,
> case HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY:
> num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> break;
> +
> + case HV_STATUS_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
> + num_pages = HV_MAX_CONTIGUOUS_ALLOCATION_PAGES;
> + fallthrough;
> + case HV_STATUS_INSUFFICIENT_ROOT_MEMORY:
> + if (!hv_root_partition()) {
> + hv_status_err(hv_status, "Unexpected root memory deposit\n");
> + return -ENOMEM;
> + }
> + partition_id = HV_PARTITION_ID_SELF;
> + break;
> +
Per the discussion in v1 of this patch set, if the number of pages that should be
deposited in a particular situation is different from what this function provides,
the fallback is to use hv_call_deposit_pages() directly. From what I see, there's
only one such fallback case after a hypercall failure -- in hv_do_map_gpa_hcall().
The other uses of hv_call_deposit_pages() are initial deposits when creating a
VP or partition.
But if hv_call_deposit_pages() is used directly, the logic added here to detect
insufficient root memory and deposit to HV_PARTITION_ID_SELF isn't applied.
So if the hypercall in hv_do_map_gpa_hcall() fails with insufficient root
memory, the deposit is done to the wrong partition ID. If that case can
actually happen, then some additional logic is needed in
hv_do_map_gpa_hcall() to handle it. Or there needs to be a fallback
function that contains the logic.
Other than that, everything else in this patch set looks good to me.
Michael
> default:
> hv_status_err(hv_status, "Unexpected!\n");
> return -ENOMEM;
^ permalink raw reply
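One possible shape for the shared fallback logic suggested in the review above is a small helper that picks the deposit target from the hypercall status, so that direct callers of hv_call_deposit_pages() could reuse it. The sketch below is a user-space model only: the enum, the constants, and model_deposit_target() are hypothetical stand-ins, and the placeholder values are not taken from include/hyperv/hvgdk_mini.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hypervisor status codes in the patch. */
enum model_status {
	MODEL_INSUFFICIENT_MEMORY,
	MODEL_INSUFFICIENT_CONTIGUOUS_MEMORY,
	MODEL_INSUFFICIENT_ROOT_MEMORY,
	MODEL_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY,
	MODEL_OTHER,
};

/* Placeholder values, chosen only for this model. */
#define MODEL_PARTITION_ID_SELF	UINT64_MAX
#define MODEL_NO_DEPOSIT	((uint64_t)0)

/*
 * Pick the partition that should receive the deposit for a given status.
 * Returns MODEL_NO_DEPOSIT when depositing is not an appropriate response.
 */
static uint64_t model_deposit_target(enum model_status status, bool is_root,
				     uint64_t requested_id)
{
	assert(requested_id != MODEL_NO_DEPOSIT);

	switch (status) {
	case MODEL_INSUFFICIENT_MEMORY:
	case MODEL_INSUFFICIENT_CONTIGUOUS_MEMORY:
		/* Ordinary case: deposit to the partition the caller named. */
		return requested_id;
	case MODEL_INSUFFICIENT_ROOT_MEMORY:
	case MODEL_INSUFFICIENT_CONTIGUOUS_ROOT_MEMORY:
		/* Root memory statuses are only expected on the root. */
		return is_root ? MODEL_PARTITION_ID_SELF : MODEL_NO_DEPOSIT;
	default:
		return MODEL_NO_DEPOSIT;
	}
}
```

Keeping the decision in one function would let both the generic deposit path and a direct-deposit caller such as hv_do_map_gpa_hcall() agree on the target partition, whatever page count each one chooses.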