Linux-HyperV List
 help / color / mirror / Atom feed
* [PATCH] hyperv: move vmbus_root_device outside of the VMBus driver
From: Hamza Mahfooz @ 2026-05-20 15:49 UTC (permalink / raw)
  To: linux-hyperv
  Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Stanislav Kinsburskii, Anirudh Rayabharam (Microsoft),
	Jork Loeser, Michael Kelley, Hamza Mahfooz, Arnd Bergmann

Conceptually, we shouldn't have to build the VMBus driver to answer the
question "does VMBus exist?" So, move vmbus_root_device and the
functions that access it directly to hyperv.h. Also, introduce
hv_set_vmbus_root_device() to allow vmbus_drv to set the root device.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Closes: https://lore.kernel.org/r/20260520074044.923728-1-arnd@kernel.org/
Fixes: f1a9e67c1138 ("mshv: limit SynIC management to MSHV-owned resources")
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
---
 drivers/hv/vmbus_drv.c | 27 ++++++---------------------
 include/linux/hyperv.h | 18 ++++++++++++++++--
 2 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 71f1b4d52f7f..b26e6b42fa49 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -47,9 +47,6 @@ struct vmbus_dynid {
 	struct hv_vmbus_device_id id;
 };
 
-/* VMBus Root Device */
-static struct device  *vmbus_root_device;
-
 static int hyperv_cpuhp_online;
 
 static DEFINE_PER_CPU(long, vmbus_evt);
@@ -95,18 +92,6 @@ static struct resource *fb_mmio;
 static struct resource *hyperv_mmio;
 static DEFINE_MUTEX(hyperv_mmio_lock);
 
-struct device *hv_get_vmbus_root_device(void)
-{
-	return vmbus_root_device;
-}
-EXPORT_SYMBOL_GPL(hv_get_vmbus_root_device);
-
-bool hv_vmbus_exists(void)
-{
-	return vmbus_root_device != NULL;
-}
-EXPORT_SYMBOL_GPL(hv_vmbus_exists);
-
 static u8 channel_monitor_group(const struct vmbus_channel *channel)
 {
 	return (u8)channel->offermsg.monitorid / 32;
@@ -903,7 +888,7 @@ static int vmbus_dma_configure(struct device *child_device)
 	 * On x86/x64 coherence is assumed and these calls have no effect.
 	 */
 	hv_setup_dma_ops(child_device,
-		device_get_dma_attr(vmbus_root_device) == DEV_DMA_COHERENT);
+		device_get_dma_attr(hv_get_vmbus_root_device()) == DEV_DMA_COHERENT);
 	return 0;
 }
 
@@ -2172,7 +2157,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 		     &child_device_obj->channel->offermsg.offer.if_instance);
 
 	child_device_obj->device.bus = &hv_bus;
-	child_device_obj->device.parent = vmbus_root_device;
+	child_device_obj->device.parent = hv_get_vmbus_root_device();
 	child_device_obj->device.release = vmbus_device_release;
 
 	child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
@@ -2561,7 +2546,7 @@ static int vmbus_acpi_add(struct platform_device *pdev)
 	struct acpi_device *ancestor;
 	struct acpi_device *device = ACPI_COMPANION(&pdev->dev);
 
-	vmbus_root_device = &device->dev;
+	hv_set_vmbus_root_device(&device->dev);
 
 	/*
 	 * Older versions of Hyper-V for ARM64 fail to include the _CCA
@@ -2648,7 +2633,7 @@ static int vmbus_device_add(struct platform_device *pdev)
 	struct device_node *np = pdev->dev.of_node;
 	int ret;
 
-	vmbus_root_device = &pdev->dev;
+	hv_set_vmbus_root_device(&pdev->dev);
 
 	ret = of_range_parser_init(&parser, np);
 	if (ret)
@@ -2969,7 +2954,7 @@ static int __init hv_acpi_init(void)
 	if (ret)
 		return ret;
 
-	if (!vmbus_root_device) {
+	if (!hv_vmbus_exists()) {
 		ret = -ENODEV;
 		goto cleanup;
 	}
@@ -3000,7 +2985,7 @@ static int __init hv_acpi_init(void)
 
 cleanup:
 	platform_driver_unregister(&vmbus_platform_driver);
-	vmbus_root_device = NULL;
+	hv_set_vmbus_root_device(NULL);
 	return ret;
 }
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 5459e776ec17..3e8ea7d9653c 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1302,9 +1302,23 @@ static inline void *hv_get_drvdata(struct hv_device *dev)
 	return dev_get_drvdata(&dev->device);
 }
 
-struct device *hv_get_vmbus_root_device(void);
+/* Do *not* use this outside of hyperv.h */
+static struct device *__vmbus_root_device __read_mostly;
 
-bool hv_vmbus_exists(void);
+static inline void hv_set_vmbus_root_device(struct device *dev)
+{
+	__vmbus_root_device = dev;
+}
+
+static inline struct device *hv_get_vmbus_root_device(void)
+{
+	return __vmbus_root_device;
+}
+
+static inline bool hv_vmbus_exists(void)
+{
+	return hv_get_vmbus_root_device();
+}
 
 struct hv_ring_buffer_debug_info {
 	u32 current_interrupt_mask;
-- 
2.54.0


^ permalink raw reply related

* Re: [PATCH v1 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest
From: Yu Zhang @ 2026-05-20 15:50 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	iommu@lists.linux.dev, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org, wei.liu@kernel.org, kys@microsoft.com,
	haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com,
	joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	bhelgaas@google.com, kwilczynski@kernel.org,
	lpieralisi@kernel.org, mani@kernel.org, robh@kernel.org,
	arnd@arndb.de, jgg@ziepe.ca, jacob.pan@linux.microsoft.com,
	tgopinath@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com, Mukesh R
In-Reply-To: <SN6PR02MB4157A1B2D9B56062A0917BC5D4042@SN6PR02MB4157.namprd02.prod.outlook.com>

On Fri, May 15, 2026 at 05:36:40PM +0000, Michael Kelley wrote:
> From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Friday, May 15, 2026 9:54 AM
> > 
> > On Fri, May 15, 2026 at 02:51:38PM +0000, Michael Kelley wrote:
> > > From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Friday, May 15, 2026 7:00 AM
> > > >
> > > > On Thu, May 14, 2026 at 06:13:24PM +0000, Michael Kelley wrote:
> > > > > From: Yu Zhang <zhangyu1@linux.microsoft.com> Sent: Monday, May 11, 2026 9:24 AM
> 
> [....]
> 
> > > > >
> > > > > Previous versions of this function did hv_iommu_detach_dev(). With that call
> > > > > removed from here, hv_iommu_detach_dev() is only called when attaching a
> > > > > domain to a device that already has a domain attached. Is it the case that
> > > > > Hyper-V doesn't require the detach as a cleanup step?
> > > > >
> > > >
> > > > The IOMMU core attaches the device to release_domain (our blocking domain)
> > > > before calling release_device(), so I believe the explicit detach in the RFC
> > > > was redundant. I simply didn't realize that at the time.
> > > >
> > >
> > > Got it. But after the IOMMU core attaches the device to the blocking
> > > domain, there's the possibility that the vPCI device is rescinded by
> > > Hyper-V and it goes away entirely. Or the device might be subjected
> > > to an "unbind/bind" cycle in Linux. Does the detach need to be done
> > > on the blocking domain in such cases? In this version of the patches, the
> > > Hyper-V "attach" and "detach" hypercalls still end up unbalanced. That
> > > seems a bit untidy at best, and I wonder if there are scenarios where
> > > Hyper-V will complain about the lack of balance.
> > >
> > 
> > Thank you, Michael. May I ask what "the vPCI device is rescinded by
> > Hyper-V and it goes away entirely" mean?
> > 
> 
> See the documentation at Documentation/virt/hyperv/vpci.rst in a
> kernel source code tree, and particularly the section entitled "PCI Device
> Removal". Such removals can and do happen in running Azure guest
> VMs. Start with that info and then I'll do my best to answer follow-up
> questions you may have.
> 
> The unbind/bind case is separate, but has some of the same effects in
> that Linux should be removing all setup of the PCI device. There's actually
> two unbind steps -- one to unbind the device-specific driver (e.g., the
> Mellanox MLX5 driver or the NMVe driver) driver from the device, and
> potentially a second to unbind the VMBus vPCI driver from the device.
> These unbind/bind sequences can be done in the Linux guest without
> the Hyper-V host rescinding the device.

Thanks for pointing me to the vpci.rst doc, Michael!

IIUC, for the vPCI rescind case and the VMBus vPCI driver unbind
case, both will trigger iommu_deinit_device(), in which IOMMU core
attaches the device to our blocking domain. And hypervisor will
handle such attaching to the blocking domain by essentially disable
the DMA translation and IOPF for the device. Since the device is
already in a safe state after that, I don't think an additional
detach is necessary.

For the device-specific driver unbind (e.g., unbind nvme then
bind back, or bind to vfio-pci), that is transparent to the IOMMU
layer.

Also, while looking at Jason's comments on the detach/attach
semantics, I'm now thinking that making the attach hypercall
atomic (having the hypervisor handle the domain switch
internally) might be a cleaner approach. That also eliminates
the attach/detach imbalance issue entirely. I need to do some
verification on the hypervisor side first, but will keep you
posted. Thanks!

B.R.
Yu

> 
> Michael
> 

^ permalink raw reply

* Re: [PATCH v1 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest
From: Yu Zhang @ 2026-05-20 15:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-hyperv, iommu, linux-pci, linux-arch, wei.liu,
	kys, haiyangz, decui, longli, joro, will, robin.murphy, bhelgaas,
	kwilczynski, lpieralisi, mani, robh, arnd, mhklinux, jacob.pan,
	tgopinath, easwar.hariharan
In-Reply-To: <20260515223139.GK7702@ziepe.ca>

On Fri, May 15, 2026 at 07:31:39PM -0300, Jason Gunthorpe wrote:
> On Tue, May 12, 2026 at 12:24:07AM +0800, Yu Zhang wrote:
> > +/*
> > + * Identity and blocking domains are static singletons: identity is a 1:1
> > + * passthrough with no page table, blocking rejects all DMA. Neither holds
> > + * per-IOMMU state, so one instance suffices even with multiple vIOMMUs.
> > + */
> > +static struct hv_iommu_domain hv_identity_domain;
> > +static struct hv_iommu_domain hv_blocking_domain;
> > +static const struct iommu_domain_ops hv_iommu_identity_domain_ops;
> > +static const struct iommu_domain_ops hv_iommu_blocking_domain_ops;
> 
> Please follow the format of other drivers and statically initialize
> these here instead of in C code.
> 

Thanks! Will do.

> > +static struct iommu_ops hv_iommu_ops;
> > +static LIST_HEAD(hv_iommu_pci_bus_list);
> > +static DEFINE_SPINLOCK(hv_iommu_pci_bus_lock);
> > +
> > +#define hv_iommu_present(iommu_cap) (iommu_cap & HV_IOMMU_CAP_PRESENT)
> > +#define hv_iommu_s1_domain_supported(iommu_cap) (iommu_cap & HV_IOMMU_CAP_S1)
> > +#define hv_iommu_5lvl_supported(iommu_cap) (iommu_cap & HV_IOMMU_CAP_S1_5LVL)
> > +#define hv_iommu_ats_supported(iommu_cap) (iommu_cap & HV_IOMMU_CAP_ATS)
> 
> prefer to see static inlines
> 

Yes. Thanks.

> > +static void hv_iommu_detach_dev(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	u64 status;
> > +	unsigned long flags;
> > +	struct hv_input_detach_device_domain *input;
> > +	struct pci_dev *pdev;
> > +	struct hv_iommu_domain *hv_domain = to_hv_iommu_domain(domain);
> > +	struct hv_iommu_endpoint *vdev = dev_iommu_priv_get(dev);
> > +
> > +	/* See the attach function, only PCI devices for now */
> > +	if (!dev_is_pci(dev) || vdev->hv_domain != hv_domain)
> > +		return;
> > +
> > +	pdev = to_pci_dev(dev);
> > +
> > +	dev_dbg(dev, "detaching from domain %d\n", hv_domain->device_domain.domain_id.id);
> > +
> > +	local_irq_save(flags);
> > +
> > +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	memset(input, 0, sizeof(*input));
> > +	input->partition_id = HV_PARTITION_ID_SELF;
> > +	if (hv_iommu_lookup_logical_dev_id(pdev, &input->device_id.as_uint64)) {
> > +		local_irq_restore(flags);
> > +		dev_warn(&pdev->dev, "no IOMMU registration for vPCI bus on detach\n");
> > +		return;
> > +	}
> > +	status = hv_do_hypercall(HVCALL_DETACH_DEVICE_DOMAIN, input, NULL);
> 
> FWIW the hypervisor cannot implement the linux attach semantics if it
> has detach/attach. It must support simply 'attach' which atomically
> changes the translation. With detach you have a confusing problem if
> errors happen in the middle of the sequence the device is left in an
> unclear state. You should at least document what state the hypervisor
> is supposed to leaave the translation iin during these failures..
> 

Thanks for raising this!

My understanding is that the detach hypercall just clears the HW
IOMMU entry for the device (e.g., context table entry on VT-d, DTE
on AMD IOMMU) and flushes the IOTLB. So after detach and before the
subsequent attach, DMA from the device is blocked by the HW IOMMU
- the device is not in an unprotected state.

The detach shall not fail unless the pvIOMMU driver passes incorrect
parameters or is otherwise in a buggy state, or Hyper-V itself has
a bug. 

But making attach atomic might be a cleaner approach (drop the
explicit detach and let the hypervisor handle the domain switch
internally)? I'll look into making that work, though it needs
verification (and possibly changes) on the hypervisor side (these
hypercalls are not exclusively for Linux guest). If that doesn't
work out, I'll keep the two-step approach with comments documenting
the intermediate translation state.

> > +static void hv_iommu_release_device(struct device *dev)
> > +{
> > +	struct hv_iommu_endpoint *vdev = dev_iommu_priv_get(dev);
> > +	struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +	if (pdev->ats_enabled)
> > +		pci_disable_ats(pdev);
> > +
> > +	dev_iommu_priv_set(dev, NULL);
> > +	set_dma_ops(dev, NULL);
> 
> Does the driver really need to mess with set_dma_ops ? I'd rather not
> see that in new iommu drivers if at all possible :|
> 

No. I should have removed it. Thanks!

> > +static int __init hv_initialize_static_domains(void)
> > +{
> > +	int ret;
> > +	struct hv_iommu_domain *hv_domain;
> > +
> > +	/* Default stage-1 identity domain */
> > +	hv_domain = &hv_identity_domain;
> > +
> > +	ret = hv_create_device_domain(hv_domain, HV_DEVICE_DOMAIN_TYPE_S1);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = hv_configure_device_domain(hv_domain, IOMMU_DOMAIN_IDENTITY);
> > +	if (ret)
> > +		goto delete_identity_domain;
> > +
> > +	hv_domain->domain.type = IOMMU_DOMAIN_IDENTITY;
> > +	hv_domain->domain.ops = &hv_iommu_identity_domain_ops;
> > +	hv_domain->domain.owner = &hv_iommu_ops;
> > +	hv_domain->domain.geometry = hv_iommu_device->geometry;
> > +	hv_domain->domain.pgsize_bitmap = hv_iommu_device->pgsize_bitmap;
> 
> identity doesn't use geometry or pgsize_bitmap
> 

Sure. Will remove it.

> > +	/* Default stage-1 blocked domain */
> > +	hv_domain = &hv_blocking_domain;
> > +
> > +	ret = hv_create_device_domain(hv_domain, HV_DEVICE_DOMAIN_TYPE_S1);
> > +	if (ret)
> > +		goto delete_identity_domain;
> > +
> > +	ret = hv_configure_device_domain(hv_domain, IOMMU_DOMAIN_BLOCKED);
> > +	if (ret)
> > +		goto delete_blocked_domain;
> > +
> > +	hv_domain->domain.type = IOMMU_DOMAIN_BLOCKED;
> > +	hv_domain->domain.ops = &hv_iommu_blocking_domain_ops;
> > +	hv_domain->domain.owner = &hv_iommu_ops;
> > +	hv_domain->domain.geometry = hv_iommu_device->geometry;
> > +	hv_domain->domain.pgsize_bitmap = hv_iommu_device->pgsize_bitmap;
> 
> Nor does blocked
> 

Same. Will remove.

> > +#define INTERRUPT_RANGE_START	(0xfee00000)
> > +#define INTERRUPT_RANGE_END	(0xfeefffff)
> > +static void hv_iommu_get_resv_regions(struct device *dev,
> > +		struct list_head *head)
> > +{
> > +	struct iommu_resv_region *region;
> > +
> > +	region = iommu_alloc_resv_region(INTERRUPT_RANGE_START,
> > +				      INTERRUPT_RANGE_END - INTERRUPT_RANGE_START + 1,
> > +				      0, IOMMU_RESV_MSI, GFP_KERNEL);
> 
> Surprised these constants are not discovered from the hv?
> 

Thanks. Since the pvIOMMU currently only targets x86, hardcoding
the x86 architectural MSI range should be reasonable?

If in the future we need additional reserved regions (e.g.,
RMRR-style identity-mapped ranges), those would be discovered
from the hypervisor.

> > +static void hv_iommu_flush_iotlb_all(struct iommu_domain *domain)
> > +{
> > +	hv_flush_device_domain(to_hv_iommu_domain(domain));
> > +}
> > +
> > +static void hv_iommu_iotlb_sync(struct iommu_domain *domain,
> > +				struct iommu_iotlb_gather *iotlb_gather)
> > +{
> > +	hv_flush_device_domain(to_hv_iommu_domain(domain));
> > +
> > +	iommu_put_pages_list(&iotlb_gather->freelist);
> > +}
> 
> Full invalidation only huh?
> 

Page-selective IOTLB flush is added in patch 4/4.

> > +static const struct iommu_domain_ops hv_iommu_identity_domain_ops = {
> > +	.attach_dev	= hv_iommu_attach_dev,
> > +};
> > +
> > +static const struct iommu_domain_ops hv_iommu_blocking_domain_ops = {
> > +	.attach_dev	= hv_iommu_attach_dev,
> > +};
> 
> Usually I would expect these to have their own attach
> functions. blocking in particular must have an attach op that cannot
> fail. It is used to recover the device back to a known translation in
> case of cascading other errors.
> 

For blocking domain, the hypercall handler of such attach essentially
disable the translation and IOPF for the device. It should not fail
unless there is a bug in the driver or the hypervisor.
How about something like this?

	static int hv_blocking_attach_dev(domain, dev, old)
	{
		...
		status = hv_do_hypercall(HVCALL_ATTACH, blocking, dev);
		WARN_ON(!hv_result_success(status));
		vdev->hv_domain = blocking;
		return 0;
	}

For identity domain, would it be fine to keep sharing the same attach
callback with paging domain?

> > +static const struct iommu_domain_ops hv_iommu_paging_domain_ops = {
> > +	.attach_dev	= hv_iommu_attach_dev,
> > +	IOMMU_PT_DOMAIN_OPS(x86_64),
> > +	.flush_iotlb_all = hv_iommu_flush_iotlb_all,
> > +	.iotlb_sync = hv_iommu_iotlb_sync,
> > +	.free = hv_iommu_paging_domain_free,
> > +};
> > +
> > +static struct iommu_domain *hv_iommu_domain_alloc_paging(struct device *dev)
> > +{
> > +	int ret;
> > +	struct hv_iommu_domain *hv_domain;
> > +	struct pt_iommu_x86_64_cfg cfg = {};
> > +
> > +	hv_domain = kzalloc_obj(*hv_domain, GFP_KERNEL);
> > +	if (!hv_domain)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	ret = hv_create_device_domain(hv_domain, HV_DEVICE_DOMAIN_TYPE_S1);
> > +	if (ret) {
> > +		kfree(hv_domain);
> > +		return ERR_PTR(ret);
> > +	}
> > +
> > +	hv_domain->domain.geometry = hv_iommu_device->geometry;
> 
> geoemtry shouldn't be set here, it is overriden by
> pt_iommu_x86_64_init() with the exact page table configuration
> 

Right. Will remove.
> > +static void __init hv_init_iommu_device(struct hv_iommu_dev *hv_iommu,
> > +			struct hv_output_get_iommu_capabilities *hv_iommu_cap)
> > +{
> > +	ida_init(&hv_iommu->domain_ids);
> > +
> > +	hv_iommu->cap = hv_iommu_cap->iommu_cap;
> > +	hv_iommu->max_iova_width = hv_iommu_cap->max_iova_width;
> > +	if (!hv_iommu_5lvl_supported(hv_iommu->cap) &&
> > +	    hv_iommu->max_iova_width > 48) {
> > +		pr_info("5-level paging not supported, limiting iova width to 48.\n");
> > +		hv_iommu->max_iova_width = 48;
> > +	}
> > +
> > +	hv_iommu->geometry = (struct iommu_domain_geometry) {
> > +		.aperture_start = 0,
> > +		.aperture_end = (((u64)1) << hv_iommu->max_iova_width) - 1,
> > +		.force_aperture = true,
> > +	};
> > +
> > +	hv_iommu->first_domain = HV_DEVICE_DOMAIN_ID_DEFAULT + 1;
> > +	hv_iommu->last_domain = HV_DEVICE_DOMAIN_ID_NULL - 1;
> > +	/* Only x86 page sizes (4K/2M/1G) are supported */
> > +	hv_iommu->pgsize_bitmap = hv_iommu_cap->pgsize_bitmap &
> > +				  (SZ_4K | SZ_2M | SZ_1G);
> > +	if (hv_iommu->pgsize_bitmap != hv_iommu_cap->pgsize_bitmap)
> > +		pr_warn("unsupported page sizes masked: 0x%llx -> 0x%llx\n",
> > +			hv_iommu_cap->pgsize_bitmap, hv_iommu->pgsize_bitmap);
> 
> IKD if you need this logic, the way the page table code is used it
> just sort of falls out naturally that other page sizes are ignored.
> 

Right, the page table code can handle 4K/2M/1G naturally. We'd like
hypervisor to be able to choose to support only a subset of {4K, 2M,
1G}. This masking filters out any unexpected page sizes and falls
back to 4K if nothing valid remains.

The if statement may be a bit paranoid perhaps, but it felt safer
to validate explicitly. Happy to drop it if you think it's unnecessary.


> > +struct hv_iommu_domain {
> > +	union {
> > +		struct iommu_domain    domain;
> > +		struct pt_iommu        pt_iommu;
> > +		struct pt_iommu_x86_64 pt_iommu_x86_64;
> > +	};
> 
> You should retain the build assertions here
> 

Sure. Thanks!

B.R.
Yu

> Jason
> 

^ permalink raw reply

* RE: [PATCH v3 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: Michael Kelley @ 2026-05-20 14:24 UTC (permalink / raw)
  To: Berkant Koc, Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Michael Kelley, Thomas Zimmermann, Maarten Lankhorst,
	Maxime Ripard, Deepak Rawat, sashiko-reviews@lists.linux.dev
In-Reply-To: <177928341826.371979.14701698047864220449@berkoc.com>

From: Berkant Koc <me@berkoc.com> Sent: Wednesday, May 20, 2026 6:24 AM
> 
> sashiko-bot@kernel.org wrote:
> > - [Critical] Using `bytes_recvd` for `memcpy()` without checking
> >   `vmbus_recvpacket()` return value leads to a massive heap buffer
> >   overflow.
> 
> This one is bounded on this channel. hyperv_connect_vsp() calls
> vmbus_open() without setting max_pkt_size, so the inbound ring uses
> VMBUS_DEFAULT_MAX_PKT_SIZE (4096) and hv_pkt_iter_first() clamps the
> packet length to pkt_buffer_size. bytes_recvd therefore cannot exceed
> 4096, well under the 16 KiB recv_buf and init_buf, and
> vmbus_recvpacket() does not return -ENOBUFS here, so the memcpy length
> stays bounded.

Actually, the behavior of vmbus_recvpacket() is more subtle,
and the problem pointed out by Sashiko AI is real. I had forgotten
about this subtle behavior in my first reply to you on this topic. :-(

The incoming message from Hyper-V has a length encoded in it.
hv_ringbuffer_read() gets that message length, and if it is larger
than the buflen_parameter, -ENOBUFS is returned. But the
returned buffer_actual_len is also set to the incoming message
length provided by Hyper-V. This allows the caller to realize
it didn't provide enough buffer space, and also tells it how much
buffer space is needed. hv_pci_onchannelcallback() uses
this functionality to retry the receive operation with a larger
buffer. I think all the other callers just treat -ENOBUFS as a
fatal error, which is also fine.

> 
> I will still gate the dispatch on a successful vmbus_recvpacket()
> return in the next revision, as defense in depth, so the bound is
> local instead of relying on the ring clamp.

Indeed, the error check is needed, and it's not just defense in depth.

> 
> > - [High] Strict sizeof() validation incorrectly rejects
> >   dynamically-sized SYNTHVID_RESOLUTION_RESPONSE packets.
> 
> Agreed. The response carries resolution_count entries, not the full
> SYNTHVID_MAX_RESOLUTION_COUNT array, so checking against
> sizeof(struct synthvid_supported_resolution_resp) is too strict. The
> next revision validates the fixed prefix, reads and bounds
> resolution_count, then requires only the count-sized array.

OK, good.

> 
> > - [High] Concurrent lockless write to `hv->init_buf` from VMBus
> >   callback allows a malicious host to overwrite data while the guest
> >   is validating it.

I don't think this actually happens. In hv_pkt_iter_first(), the message
is copied out of the ring buffer into a temporary buffer that is not
explicitly shared with the host. This temporary copy prevents a
malicious host from modifying the data while the guest is validating it.

> > - [High] Missing `reinit_completion()` before reusing the shared
> >   `hv->wait` completion object.
> 
> Both pre-existing. On v2 Michael Kelley suggested splitting the
> completion reinit into a separate patch on the resume path. The
> init_buf reuse sits in the same area, so I plan to send the reinit and
> the related response-type handling as a separate follow-up rather than
> fold them into this size-validation change.

Handling a timeout, and then the host providing a belated response
is a really messy problem. There's some mechanism using request IDs
in hv_ringbuffer_write() and vmbus_sendpacket_getid() to help match
up requests and responses, but my recollection is that even this
extra machinery is not 100% foolproof. You may or may not want
to go down the path of trying to fix it. :-)
	
A process comment:  The emails for v2 and v3 of your patch set
are being threaded in an unexpected way. They are showing up
as replies under the original v1. See https://lore.kernel.org/linux-hyperv/.
The preferred approach is for each version to start a new email
thread. The cover letter should start the new thread, and the
patches should show up as threaded under the cover letter.

Also, don't post a new version more frequently than every
24 hours at a minimum, and there's no problem with waiting
2 to 3 days. The idea is to give people a chance to review and
provide comments so that you can batch any changes in
response to the feedback.

Michael

^ permalink raw reply

* Re: [PATCH] mshv: add vmbus dependency
From: Arnd Bergmann @ 2026-05-20 13:51 UTC (permalink / raw)
  To: Michael Kelley, Arnd Bergmann, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Anirudh Rayabharam (Microsoft),
	Jork Loeser, Stanislav Kinsburskii
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <SN6PR02MB4157CDF10D33D62DD8709D63D4012@SN6PR02MB4157.namprd02.prod.outlook.com>

On Wed, May 20, 2026, at 15:46, Michael Kelley wrote:
> From: Arnd Bergmann <arnd@kernel.org> Sent: Wednesday, May 20, 2026 12:40 AM
>> 
>> When the vmbus driver is not part of the kernel, the mvhv_root
>> driver now fails to link:
>> 
>> ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined!
>> 
>> Avoid this by adding an explicit Kconfig dependency. Note that
>> stubbing out the hv_vmbus_exists() based on configuration would
>> also work for some cases, but not with MSHV_ROOT=y and HYPERV_VMBUS=m.
>
> Conceptually, the MSHV root code should not have a dependency on
> VMBus. The "does VMBus exist?" question should handled differently
> by setting up a boolean in the core Hyper-V code that defaults to "false".
> If the VMBus driver loads, it would set the boolean to "true". MSHV
> root code would query the boolean.

That makes sense to me, but it's outside of my area for a drive-by
build fix. Please treat my patch as a 'Reported-by' then, I expect
this can easily be addressed by Jork or someone else in the
hyperv team.

      Arnd

^ permalink raw reply

* RE: [PATCH] mshv: add vmbus dependency
From: Michael Kelley @ 2026-05-20 13:46 UTC (permalink / raw)
  To: Arnd Bergmann, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Anirudh Rayabharam (Microsoft), Jork Loeser,
	Stanislav Kinsburskii
  Cc: Arnd Bergmann, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20260520074044.923728-1-arnd@kernel.org>

From: Arnd Bergmann <arnd@kernel.org> Sent: Wednesday, May 20, 2026 12:40 AM
> 
> When the vmbus driver is not part of the kernel, the mvhv_root
> driver now fails to link:
> 
> ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined!
> 
> Avoid this by adding an explicit Kconfig dependency. Note that
> stubbing out the hv_vmbus_exists() based on configuration would
> also work for some cases, but not with MSHV_ROOT=y and HYPERV_VMBUS=m.

Conceptually, the MSHV root code should not have a dependency on
VMBus. The "does VMBus exist?" question should handled differently
by setting up a boolean in the core Hyper-V code that defaults to "false".
If the VMBus driver loads, it would set the boolean to "true". MSHV
root code would query the boolean.

Michael

> 
> Fixes: f1a9e67c1138 ("mshv: limit SynIC management to MSHV-owned resources")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/hv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 52af086fdeb2..21193b571a80 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -75,6 +75,7 @@ config MSHV_ROOT
>  	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
>  	# no particular order, making it impossible to reassemble larger pages
>  	depends on PAGE_SIZE_4KB
> +	depends on HYPERV_VMBUS
>  	select EVENTFD
>  	select VIRT_XFER_TO_GUEST_WORK
>  	select HMM_MIRROR
> --
> 2.39.5
> 


^ permalink raw reply

* Re: [PATCH v1 1/4] iommu: Move Hyper-V IOMMU driver to its own subdirectory
From: Jason Gunthorpe @ 2026-05-20 13:38 UTC (permalink / raw)
  To: Yu Zhang
  Cc: linux-kernel, linux-hyperv, iommu, linux-pci, linux-arch, wei.liu,
	kys, haiyangz, decui, longli, joro, will, robin.murphy, bhelgaas,
	kwilczynski, lpieralisi, mani, robh, arnd, mhklinux, jacob.pan,
	tgopinath, easwar.hariharan
In-Reply-To: <elo7enhxk5m7x4rc4quiqnkgbwsqa3ex7di2ksv7szu75xqe6x@zum5nwmkl6kn>

On Wed, May 20, 2026 at 02:37:03PM +0800, Yu Zhang wrote:
> On Fri, May 15, 2026 at 07:19:18PM -0300, Jason Gunthorpe wrote:
> > On Tue, May 12, 2026 at 12:24:05AM +0800, Yu Zhang wrote:
> > > From: Easwar Hariharan <eahariha@linux.microsoft.com>
> > > 
> > > The Hyper-V IOMMU driver currently only supports IRQ remapping.
> > > As it will be adding DMA remapping support, prepare a directory
> > > to contain all the different feature files.
> > 
> > Any possibility we could put the irq remapping thing under the irq
> > directory?
> > 
> > The other drivers have it here because they are co-mingled with their
> > iommu HW, will hyperv have the same issue?
> > 
> 
> Good question. I don't think Hyper-V have the same co-mingling issue.
> 
> But from a code organization perspective, I think drivers/iommu/hyperv/
> is still the most natural place:
> 
> - The IRQ remapping framework itself (drivers/iommu/irq_remapping.c
>   and its internal header irq_remapping.h) lives under drivers/iommu/,
>   and all three backends (intel/, amd/, hyperv/) sit there today.
>   hyperv/irq_remapping.c includes that internal header directly.

IMHO it is a fair question if that even belongs under iommu.

I think it was dumped into here because of the co-mingled drivers..

Jason

^ permalink raw reply

* Re: [PATCH v3 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: Berkant Koc @ 2026-05-20 13:23 UTC (permalink / raw)
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat, sashiko-reviews
In-Reply-To: <20260519213450.50E611F000E9@smtp.kernel.org>

sashiko-bot@kernel.org wrote:
> - [Critical] Using `bytes_recvd` for `memcpy()` without checking
>   `vmbus_recvpacket()` return value leads to a massive heap buffer
>   overflow.

This one is bounded on this channel. hyperv_connect_vsp() calls
vmbus_open() without setting max_pkt_size, so the inbound ring uses
VMBUS_DEFAULT_MAX_PKT_SIZE (4096) and hv_pkt_iter_first() clamps the
packet length to pkt_buffer_size. bytes_recvd therefore cannot exceed
4096, well under the 16 KiB recv_buf and init_buf, and
vmbus_recvpacket() does not return -ENOBUFS here, so the memcpy length
stays bounded.

I will still gate the dispatch on a successful vmbus_recvpacket()
return in the next revision, as defense in depth, so the bound is
local instead of relying on the ring clamp.

> - [High] Strict sizeof() validation incorrectly rejects
>   dynamically-sized SYNTHVID_RESOLUTION_RESPONSE packets.

Agreed. The response carries resolution_count entries, not the full
SYNTHVID_MAX_RESOLUTION_COUNT array, so checking against
sizeof(struct synthvid_supported_resolution_resp) is too strict. The
next revision validates the fixed prefix, reads and bounds
resolution_count, then requires only the count-sized array.

> - [High] Concurrent lockless write to `hv->init_buf` from VMBus
>   callback allows a malicious host to overwrite data while the guest
>   is validating it.
> - [High] Missing `reinit_completion()` before reusing the shared
>   `hv->wait` completion object.

Both pre-existing. On v2 Michael Kelley suggested splitting the
completion reinit into a separate patch on the resume path. The
init_buf reuse sits in the same area, so I plan to send the reinit and
the related response-type handling as a separate follow-up rather than
fold them into this size-validation change.

Thanks for the review.

Berkant

^ permalink raw reply

* Re: [PATCH] drivers: hv: use kmalloc_array in mshv_root_scheduler_init
From: sashiko-bot @ 2026-05-20  7:55 UTC (permalink / raw)
  To: Can Peng; +Cc: linux-hyperv
In-Reply-To: <20260520071632.557990-1-pengcan@kylinos.cn>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] A CPU hotplug offline operation can lead to a NULL pointer dereference in `mshv_vp_dispatch` because user tasks can still execute on the offlining CPU after the per-CPU buffers have been freed.
--

commit 182b63fe189dbbf1da723a03d972eaa7cbb71f56
Author: Can Peng <pengcan@kylinos.cn>

drivers: hv: use kmalloc_array in mshv_root_scheduler_init

Replace kmalloc() with kmalloc_array() to prevent potential overflow, as
recommended in Documentation/process/deprecated.rst.

> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index bd1359eb58dd4..146726cc4e9ba 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -2241,7 +2241,7 @@ static int mshv_root_scheduler_init(unsigned int cpu)
>  	outputarg = (void **)this_cpu_ptr(root_scheduler_output);
>  
>  	/* Allocate two consecutive pages. One for input, one for output. */
> -	p = kmalloc(2 * HV_HYP_PAGE_SIZE, GFP_KERNEL);
> +	p = kmalloc_array(2, HV_HYP_PAGE_SIZE, GFP_KERNEL);
        ^^^^^^^^^^^^^

[Severity: High]
This isn't a bug introduced by this patch, but is there a pre-existing
issue where a CPU hotplug offline operation can lead to a NULL pointer
dereference in mshv_vp_dispatch()?

The CPU hotplug teardown callback mshv_root_scheduler_cleanup() is registered
at CPUHP_AP_ONLINE_DYN. During a CPU offline operation, it frees the per-CPU
buffers and sets the pointers to NULL:

drivers/hv/mshv_root_main.c:mshv_root_scheduler_cleanup() {
    ...
    *inputarg = NULL;
    *outputarg = NULL;
    ...
}

User tasks can still execute on the offlining CPU at this high state in
the offline sequence. If a vCPU thread scheduled on the offlining CPU invokes
mshv_vp_dispatch() after the teardown callback has run, it unconditionally
dereferences the input buffer:

drivers/hv/mshv_root_main.c:mshv_vp_dispatch() {
    ...
    preempt_disable();
    input = *this_cpu_ptr(root_scheduler_input);
    ...
    memset(input, 0, sizeof(*input));
    ...
}

Does this sequence result in a NULL pointer dereference if the input buffer
pointer has already been cleared by the cleanup callback?

>  	if (!p)
>  		return -ENOMEM;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260520071632.557990-1-pengcan@kylinos.cn?part=1

^ permalink raw reply

* [PATCH] mshv: add vmbus dependency
From: Arnd Bergmann @ 2026-05-20  7:40 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Anirudh Rayabharam (Microsoft), Jork Loeser,
	Stanislav Kinsburskii
  Cc: Arnd Bergmann, linux-hyperv, linux-kernel

From: Arnd Bergmann <arnd@arndb.de>

When the vmbus driver is not part of the kernel, the mvhv_root
driver now fails to link:

ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined!

Avoid this by adding an explicit Kconfig dependency. Note that
stubbing out the hv_vmbus_exists() based on configuration would
also work for some cases, but not with MSHV_ROOT=y and HYPERV_VMBUS=m.

Fixes: f1a9e67c1138 ("mshv: limit SynIC management to MSHV-owned resources")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/hv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 52af086fdeb2..21193b571a80 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -75,6 +75,7 @@ config MSHV_ROOT
 	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
 	# no particular order, making it impossible to reassemble larger pages
 	depends on PAGE_SIZE_4KB
+	depends on HYPERV_VMBUS
 	select EVENTFD
 	select VIRT_XFER_TO_GUEST_WORK
 	select HMM_MIRROR
-- 
2.39.5


^ permalink raw reply related

* [PATCH] drivers: hv: use kmalloc_array in mshv_root_scheduler_init
From: Can Peng @ 2026-05-20  7:16 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, longli, decui
  Cc: linux-kernel, linux-hyperv, Can Peng

Replace kmalloc() with kmalloc_array() to prevent potential
overflow, as recommended in Documentation/process/deprecated.rst.

Signed-off-by: Can Peng <pengcan@kylinos.cn>
---
 drivers/hv/mshv_root_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index bd1359eb58dd..146726cc4e9b 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -2241,7 +2241,7 @@ static int mshv_root_scheduler_init(unsigned int cpu)
 	outputarg = (void **)this_cpu_ptr(root_scheduler_output);
 
 	/* Allocate two consecutive pages. One for input, one for output. */
-	p = kmalloc(2 * HV_HYP_PAGE_SIZE, GFP_KERNEL);
+	p = kmalloc_array(2, HV_HYP_PAGE_SIZE, GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
 
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v1 1/4] iommu: Move Hyper-V IOMMU driver to its own subdirectory
From: Yu Zhang @ 2026-05-20  6:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-kernel, linux-hyperv, iommu, linux-pci, linux-arch, wei.liu,
	kys, haiyangz, decui, longli, joro, will, robin.murphy, bhelgaas,
	kwilczynski, lpieralisi, mani, robh, arnd, mhklinux, jacob.pan,
	tgopinath, easwar.hariharan
In-Reply-To: <20260515221918.GJ7702@ziepe.ca>

On Fri, May 15, 2026 at 07:19:18PM -0300, Jason Gunthorpe wrote:
> On Tue, May 12, 2026 at 12:24:05AM +0800, Yu Zhang wrote:
> > From: Easwar Hariharan <eahariha@linux.microsoft.com>
> > 
> > The Hyper-V IOMMU driver currently only supports IRQ remapping.
> > As it will be adding DMA remapping support, prepare a directory
> > to contain all the different feature files.
> 
> Any possibility we could put the irq remapping thing under the irq
> directory?
> 
> The other drivers have it here because they are co-mingled with their
> iommu HW, will hyperv have the same issue?
> 

Good question. I don't think Hyper-V have the same co-mingling issue.

But from a code organization perspective, I think drivers/iommu/hyperv/
is still the most natural place:

- The IRQ remapping framework itself (drivers/iommu/irq_remapping.c
  and its internal header irq_remapping.h) lives under drivers/iommu/,
  and all three backends (intel/, amd/, hyperv/) sit there today.
  hyperv/irq_remapping.c includes that internal header directly.

- irq_remapping_prepare() references the extern hyperv_irq_remap_ops
  declared in that header, so the provider naturally belongs in the
  same tree.

Moving it to e.g. drivers/irqchip/ would break that symmetry.

Yu

> Jason
> 

^ permalink raw reply

* [PATCH net] net: mana: validate rx_req_idx to prevent out-of-bounds array access
From: Aditya Garg @ 2026-05-20  5:15 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, dipayanroy, horms, ernis, gargaditya,
	gargaditya, kees, stephen, shacharr, ssengar, linux-hyperv,
	netdev, linux-kernel

In mana_hwc_rx_event_handler(), rx_req_idx is derived from
sge->address in DMA-coherent memory. In Confidential VMs
(SEV-SNP/TDX), this memory is shared unencrypted and HW can modify
WQE contents at any time. No bounds check exists on rx_req_idx,
which can lead to an out-of-bounds access into reqs[].

Add bounds check on rx_req_idx in mana_hwc_rx_event_handler() before
using it to index the reqs[] array.

Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/hw_channel.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
index dbbde0fa57e7..a60f733d1a07 100644
--- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
+++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
@@ -266,6 +266,12 @@ static void mana_hwc_rx_event_handler(void *ctx, u32 gdma_rxq_id,
 	rq_base_addr = hwc_rxq->msg_buf->mem_info.dma_handle;
 	rx_req_idx = (sge->address - rq_base_addr) / hwc->max_req_msg_size;
 
+	if (rx_req_idx >= hwc_rxq->msg_buf->num_reqs) {
+		dev_err(hwc->dev, "HWC RX: wrong rx_req_idx=%llu, num_reqs=%u\n",
+			rx_req_idx, hwc_rxq->msg_buf->num_reqs);
+		return;
+	}
+
 	rx_req = &hwc_rxq->msg_buf->reqs[rx_req_idx];
 	resp = (struct gdma_resp_hdr *)rx_req->buf_va;
 
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net 2/2] net: mana: Skip redundant detach in queue reset handler if already detached
From: Jakub Kicinski @ 2026-05-19 22:55 UTC (permalink / raw)
  To: dipayanroy
  Cc: Jakub Kicinski, kys, haiyangz, wei.liu, decui, andrew+netdev,
	davem, edumazet, pabeni, leon, longli, kotaranov, horms,
	shradhagupta, ssengar, ernis, shirazsaleem, linux-hyperv, netdev,
	linux-kernel, linux-rdma, stephen, jacob.e.keller, dipayanroy,
	leitao, kees, john.fastabend, hawk, bpf, daniel, ast, sdf,
	yury.norov
In-Reply-To: <20260518194654.735580-3-dipayanroy@linux.microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: Skip redundant detach in queue reset handler if already detached

This commit adds a check for netif_device_present() in the queue reset
handler. If the device is already detached, it skips the redundant
mana_detach() call and proceeds directly to mana_attach().

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9e7e4bf526bf6..740f3c34db451 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -316,12 +316,19 @@ static void mana_per_port_queue_reset_work_handler(struct work_struct *work)
[ ... ]
> +	/* If already detached (indicates detach succeeded but attach failed
> +	 * previously). Now skip mana detach and just retry mana_attach.
> +	 */
> +	if (!netif_device_present(ndev))
> +		goto attach;
> +
>  	err = mana_detach(ndev, false);

The fix here correctly prevents mana_detach() from running on an already
detached device. However, do identical issues remain in mana_change_mtu(),
mana_set_channels(), mana_set_ringparam(), and mana_bpf()?

If mana_attach() previously failed (for example, due to out of memory),
the port is left with apc->port_is_up = false, but it preserves
apc->port_st_save = true, and calls netif_device_detach(ndev).

If an admin tries to recover the port by retrying a configuration command
like changing the MTU or attaching XDP, these functions appear to
unconditionally call mana_detach(ndev, false) without the
!netif_device_present(ndev) check added here.

Inside mana_detach(), apc->port_st_save = apc->port_is_up executes. Since
port_is_up is already false, port_st_save is overwritten to false,
permanently erasing the intent that the port should be up.

The subsequent call to mana_attach() would then see port_st_save == false,
skip mana_alloc_queues(), set port_is_up = false, and successfully call
netif_device_attach(ndev).

Would this result in the configuration command returning 0 to userspace
while the port queues were never allocated, leaving the interface silently
broken until manually toggled down and up?

^ permalink raw reply

* Re: [PATCH net 1/2] net: mana: Add NULL guards in teardown path to prevent panic on attach failure
From: Jakub Kicinski @ 2026-05-19 22:55 UTC (permalink / raw)
  To: dipayanroy
  Cc: Jakub Kicinski, kys, haiyangz, wei.liu, decui, andrew+netdev,
	davem, edumazet, pabeni, leon, longli, kotaranov, horms,
	shradhagupta, ssengar, ernis, shirazsaleem, linux-hyperv, netdev,
	linux-kernel, linux-rdma, stephen, jacob.e.keller, dipayanroy,
	leitao, kees, john.fastabend, hawk, bpf, daniel, ast, sdf,
	yury.norov
In-Reply-To: <20260518194654.735580-2-dipayanroy@linux.microsoft.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mana: Add NULL guards in teardown path to prevent panic on attach failure
    
When queue allocation fails partway through, the error cleanup frees
and NULLs apc->tx_qp and apc->rxqs. The patch adds NULL guards for these
pointers in teardown paths such as mana_destroy_vport(), mana_fence_rqs(),
and mana_dealloc_queues() to prevent NULL pointer dereferences.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index 9afc786b297a8..9e7e4bf526bf6 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> @@ -2858,13 +2861,16 @@ static void mana_destroy_vport(struct mana_port_context *apc)
>  	struct mana_rxq *rxq;
>  	u32 rxq_idx;
>  
> -	for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> -		rxq = apc->rxqs[rxq_idx];
> -		if (!rxq)
> -			continue;
> +	if (apc->rxqs) {
>  
> -		mana_destroy_rxq(apc, rxq, true);
> -		apc->rxqs[rxq_idx] = NULL;
> +		for (rxq_idx = 0; rxq_idx < apc->num_queues; rxq_idx++) {
> +			rxq = apc->rxqs[rxq_idx];
> +			if (!rxq)
> +				continue;
> +
> +			mana_destroy_rxq(apc, rxq, true);
> +			apc->rxqs[rxq_idx] = NULL;
> +		}
>  	}

If the teardown path frees apc->rxqs and sets it to NULL on attach failure,
how does the interface recover if the user subsequently brings it back up?

When mana_open() is invoked, it directly calls mana_alloc_queues(). Inside
mana_alloc_queues() -> mana_add_rx_queues(), the code does:

    apc->rxqs[i] = rxq;

If apc->rxqs was left as NULL during the teardown, will this cause a NULL
pointer dereference on the next interface bring-up?

[ ... ]
> @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  	 * number of queues.
>  	 */
>  
> -	for (i = 0; i < apc->num_queues; i++) {
> -		txq = &apc->tx_qp[i].txq;
> -		tsleep = 1000;
> -		while (atomic_read(&txq->pending_sends) > 0 &&
> -		       time_before(jiffies, timeout)) {
> -			usleep_range(tsleep, tsleep + 1000);
> -			tsleep <<= 1;
> -		}
> -		if (atomic_read(&txq->pending_sends)) {
> -			err = pcie_flr(to_pci_dev(gd->gdma_context->dev));
> -			if (err) {
> -				netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> -					   err, atomic_read(&txq->pending_sends),
> -					   txq->gdma_txq_id);
> +	if (apc->tx_qp) {

In the error path where queue creation fails partway through, will
mana_destroy_txq() be called?

If mana_destroy_txq() unconditionally iterates over all apc->num_queues:

    for (i = 0; i < apc->num_queues; i++) {
        ...
        mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object);
        ...

Will it pass 0 or INVALID_MANA_HANDLE to mana_destroy_wq_obj() for queues
that were never fully initialized?

Does mana_destroy_wq_obj() send a MANA_DESTROY_WQ_OBJ command to the hardware
with these uninitialized handles, and could handle 0 corrupt the firmware
state?

> +		for (i = 0; i < apc->num_queues; i++) {
> +			txq = &apc->tx_qp[i].txq;
> +			tsleep = 1000;
> +			while (atomic_read(&txq->pending_sends) > 0 &&
> +			       time_before(jiffies, timeout)) {
> +				usleep_range(tsleep, tsleep + 1000);
> +				tsleep <<= 1;
> +			}
> +			if (atomic_read(&txq->pending_sends)) {
> +				err =
> +				    pcie_flr(to_pci_dev(gd->gdma_context->dev));

If pcie_flr() is triggered here, the PCI Function Level Reset wipes the
internal GDMA state.

Further down in this function, mana_disable_vport_rx(), mana_fence_rqs(),
and mana_destroy_vport() are called. Will these functions attempt to send
commands to the now-destroyed GDMA admin queue via mana_send_request() and
wait for completions?

Could this lead to cascading timeouts since the hardware was just reset and
cannot process admin commands?

> +				if (err) {
> +					netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> +						   err,
> +					    atomic_read(&txq->pending_sends),
> +					    txq->gdma_txq_id);
> +				}
> +				break;
>  			}
> -			break;
>  		}
-- 
pw-bot: cr

^ permalink raw reply

* Re: [PATCH v3 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: sashiko-bot @ 2026-05-19 21:34 UTC (permalink / raw)
  To: Berkant Koc; +Cc: linux-hyperv
In-Reply-To: <e6e63276cca2901641ab39029e4fd3d621b1ee92.1779221799.git.me@berkoc.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Critical] Using `bytes_recvd` for `memcpy()` without checking `vmbus_recvpacket()` return value leads to a massive heap buffer overflow.
- [High] Strict sizeof() validation incorrectly rejects dynamically-sized SYNTHVID_RESOLUTION_RESPONSE packets.

Pre-existing issues:
- [High] Concurrent lockless write to `hv->init_buf` from VMBus callback allows a malicious host to overwrite data while the guest is validating it.
- [High] Missing `reinit_completion()` before reusing the shared `hv->wait` completion object.
--

commit 8035423b95c9b24007478019421c5f94fe1179ce
Author: Berkant Koc <me@berkoc.com>
Subject: drm/hyperv: validate VMBus packet size in receive callback

This patch passes bytes_recvd into hyperv_receive_sub() to validate that
packets cover the pipe and synthvid headers as well as the type-specific
payload. The memcpy operation is also modified to use bytes_recvd.

Cc: stable@vger.kernel.org # 5.14+

> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index c3d0ff229e3d0..12d3feb4f88ac 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
[ ... ]
> @@ -420,26 +420,62 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
>  	return 0;
>  }
>  
> -static void hyperv_receive_sub(struct hv_device *hdev)
> +static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
>  {
[ ... ]
>  	if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
>  	    msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
>  	    msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
> -		memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE);
> +		size_t need = hdr_size;
> +
> +		switch (msg->vid_hdr.type) {
> +		case SYNTHVID_VERSION_RESPONSE:
> +			need += sizeof(struct synthvid_version_resp);
> +			break;
> +		case SYNTHVID_RESOLUTION_RESPONSE:
> +			need += sizeof(struct synthvid_supported_resolution_resp);

[Severity: High]
Does this strict sizeof() check correctly handle dynamically-sized packets?

If the host sends a response sized only for the active elements (up to
resolution_count) rather than the maximum 64 elements, bytes_recvd will
be less than need. Will this cause valid packets to be falsely rejected
and break the driver initialization sequence?

> +			break;
> +		case SYNTHVID_VRAM_LOCATION_ACK:
> +			need += sizeof(struct synthvid_vram_location_ack);
> +			break;
> +		}
> +		if (bytes_recvd < need) {
> +			drm_err_ratelimited(&hv->dev,
> +					    "synthvid packet too small for type %u: %u < %zu\n",
> +					    msg->vid_hdr.type, bytes_recvd, need);
> +			return;
> +		}
> +		memcpy(hv->init_buf, msg, bytes_recvd);

[Severity: High]
This is a pre-existing issue, but does this pattern allow a malicious host
to overwrite data while the guest is validating it?

The callback unconditionally copies incoming response packets into
hv->init_buf via memcpy() and calls complete().

Since the waiting thread processes hv->init_buf in process context without
copying it to isolated local memory, can an untrusted host send a second
packet immediately after validation to trigger a time-of-check to
time-of-use (TOCTOU) data race?

>  		complete(&hv->wait);

[Severity: High]
This isn't a bug introduced by this patch, but does the driver need to call
reinit_completion(&hv->wait) before sending a new request?

If a prior request times out and its response arrives late, the completion
counter is incremented. Will the next wait immediately consume the
completion counter and process stale data left in hv->init_buf?

>  		return;
>  	}
[ ... ]
> @@ -466,7 +502,7 @@ static void hyperv_receive(void *ctx)
>  				       &bytes_recvd, &req_id);
>  		if (bytes_recvd > 0 &&
>  		    recv_buf->pipe_hdr.type == PIPE_MSG_DATA)
> -			hyperv_receive_sub(hdev);
> +			hyperv_receive_sub(hdev, bytes_recvd);

[Severity: Critical]
If vmbus_recvpacket() encounters a packet larger than VMBUS_MAX_PACKET_SIZE
(16KB), it returns -ENOBUFS and sets bytes_recvd to the required packet
length.

Since ret == 0 is only checked in the while condition, does this pass an
unbounded bytes_recvd into the callback? If recv_buf still contains the
header from a previous successful iteration, it looks like
memcpy(hv->init_buf, msg, bytes_recvd) will copy more than 16KB into
hv->init_buf, causing a heap buffer overflow.

>  	} while (bytes_recvd > 0 && ret == 0);
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1779221339.git.me@berkoc.com?part=2

^ permalink raw reply

* Re: [PATCH v3 1/2] drm/hyperv: validate resolution_count and fix WIN8 fallback
From: sashiko-bot @ 2026-05-19 20:55 UTC (permalink / raw)
  To: Berkant Koc; +Cc: linux-hyperv
In-Reply-To: <1b88bc7edeb2f0153475225b67f19aaca629eca8.1779221799.git.me@berkoc.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Unsynchronized access to the shared `hv->init_buf` introduces Double Fetch/TOCTOU vulnerabilities, allowing a malicious host to bypass bounds checks and leak kernel memory.
- [High] The driver reuses the `hv->wait` completion for synchronous messages without re-initializing it, which can cause spurious wakeups and synchronization bypass.
--

commit dde0fd11b79967ca7df11b1049abebec363d25a5
Author: Berkant Koc <me@berkoc.com>

drm/hyperv: validate resolution_count and fix WIN8 fallback

Adds bounds checking for resolution_count to prevent walking past the
supported_resolution array. It also adds fallback handling to ensure WIN8
defaults are used when the resolution probe fails.

> diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> index 051ecc5268321..c3d0ff229e3d0 100644
> --- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> +++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
> @@ -387,12 +387,15 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
>  	hyperv_sendpacket(hdev, msg);
>  
>  	t = wait_for_completion_timeout(&hv->wait, VMBUS_VSP_TIMEOUT);

[Severity: High]
This isn't a bug introduced by this patch, but does the driver need to call
reinit_completion(&hv->wait) before sending requests?

If a delayed or unsolicited response arrives, hyperv_receive_sub() calls
complete(&hv->wait), incrementing its internal counter.  When this function
is called later, wait_for_completion_timeout() will return immediately
without waiting for the actual response.

Could this cause the driver to erroneously parse its own outgoing request
buffer as a response?

>  	if (!t) {
>  		drm_err(dev, "Time out on waiting resolution response\n");
>  		return -ETIMEDOUT;
>  	}
>  
> -	if (msg->resolution_resp.resolution_count == 0) {
> -		drm_err(dev, "No supported resolutions\n");
> +	if (msg->resolution_resp.resolution_count == 0 ||
> +	    msg->resolution_resp.resolution_count >
> +	    SYNTHVID_MAX_RESOLUTION_COUNT) {

[Severity: Critical]
This is a pre-existing issue, but does reading directly from the shared
hv->init_buf introduce a TOCTOU or Double Fetch problem here?

Since hv->init_buf is also used by the asynchronous hyperv_receive_sub()
callback, could a malicious host send a valid count to pass this check,
and then immediately overwrite hv->init_buf with a large value before the
subsequent for-loop evaluates i < msg->resolution_resp.resolution_count?

Additionally, when the driver prepares requests earlier in this function
(e.g., setting msg->vid_hdr.size), an unsolicited response could overwrite
hv->init_buf. Could this trick vmbus_sendpacket() into using a
host-controlled size and leaking kernel memory?

> +		drm_err(dev, "Invalid resolution count: %d\n",
> +			msg->resolution_resp.resolution_count);
>  		return -ENODEV;
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/cover.1779221339.git.me@berkoc.com?part=1

^ permalink raw reply

* [PATCH v3 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: Berkant Koc @ 2026-05-19 20:08 UTC (permalink / raw)
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat
In-Reply-To: <cover.1779221339.git.me@berkoc.com>

hyperv_receive_sub() reads msg->vid_hdr.type and dispatches into one
of four message-type branches without knowing how many bytes the host
wrote into hv->recv_buf. The completion path then runs
memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE), so the consumer
that wakes on wait_for_completion_timeout() can read up to 16 KiB of
residue from a prior message as if it were the response payload.

Pass bytes_recvd into hyperv_receive_sub() and reject any packet that
does not cover the pipe + synthvid header. For each of the three
completion-driving types (SYNTHVID_VERSION_RESPONSE,
SYNTHVID_RESOLUTION_RESPONSE, SYNTHVID_VRAM_LOCATION_ACK) also
require the type-specific payload before memcpy/complete, and apply
the same rule to SYNTHVID_FEATURE_CHANGE before reading is_dirt_needed.
The memcpy then uses bytes_recvd, which is bounded by
VMBUS_MAX_PACKET_SIZE through the call to vmbus_recvpacket().

Rejected packets are reported via drm_err_ratelimited() rather than
silently dropped, matching the CoCo-hardened pattern in
hv_kvp_onchannelcallback().

Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Cc: stable@vger.kernel.org # 5.14+
Signed-off-by: Berkant Koc <me@berkoc.com>
Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
---
 drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 42 +++++++++++++++++++++--
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
index c3d0ff229..12d3feb4f 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
@@ -420,26 +420,62 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
 	return 0;
 }
 
-static void hyperv_receive_sub(struct hv_device *hdev)
+static void hyperv_receive_sub(struct hv_device *hdev, u32 bytes_recvd)
 {
 	struct hyperv_drm_device *hv = hv_get_drvdata(hdev);
 	struct synthvid_msg *msg;
+	size_t hdr_size;
 
 	if (!hv)
 		return;
 
+	hdr_size = sizeof(struct pipe_msg_hdr) +
+		   sizeof(struct synthvid_msg_hdr);
+	if (bytes_recvd < hdr_size) {
+		drm_err_ratelimited(&hv->dev,
+				    "synthvid packet too small for header: %u\n",
+				    bytes_recvd);
+		return;
+	}
+
 	msg = (struct synthvid_msg *)hv->recv_buf;
 
 	/* Complete the wait event */
 	if (msg->vid_hdr.type == SYNTHVID_VERSION_RESPONSE ||
 	    msg->vid_hdr.type == SYNTHVID_RESOLUTION_RESPONSE ||
 	    msg->vid_hdr.type == SYNTHVID_VRAM_LOCATION_ACK) {
-		memcpy(hv->init_buf, msg, VMBUS_MAX_PACKET_SIZE);
+		size_t need = hdr_size;
+
+		switch (msg->vid_hdr.type) {
+		case SYNTHVID_VERSION_RESPONSE:
+			need += sizeof(struct synthvid_version_resp);
+			break;
+		case SYNTHVID_RESOLUTION_RESPONSE:
+			need += sizeof(struct synthvid_supported_resolution_resp);
+			break;
+		case SYNTHVID_VRAM_LOCATION_ACK:
+			need += sizeof(struct synthvid_vram_location_ack);
+			break;
+		}
+		if (bytes_recvd < need) {
+			drm_err_ratelimited(&hv->dev,
+					    "synthvid packet too small for type %u: %u < %zu\n",
+					    msg->vid_hdr.type, bytes_recvd, need);
+			return;
+		}
+		memcpy(hv->init_buf, msg, bytes_recvd);
 		complete(&hv->wait);
 		return;
 	}
 
 	if (msg->vid_hdr.type == SYNTHVID_FEATURE_CHANGE) {
+		if (bytes_recvd < hdr_size +
+		    sizeof(struct synthvid_feature_change)) {
+			drm_err_ratelimited(&hv->dev,
+					    "synthvid feature change packet too small: %u\n",
+					    bytes_recvd);
+			return;
+		}
 		hv->dirt_needed = msg->feature_chg.is_dirt_needed;
 		if (hv->dirt_needed)
 			hyperv_hide_hw_ptr(hv->hdev);
@@ -466,7 +502,7 @@ static void hyperv_receive(void *ctx)
 				       &bytes_recvd, &req_id);
 		if (bytes_recvd > 0 &&
 		    recv_buf->pipe_hdr.type == PIPE_MSG_DATA)
-			hyperv_receive_sub(hdev);
+			hyperv_receive_sub(hdev, bytes_recvd);
 	} while (bytes_recvd > 0 && ret == 0);
 }
 
-- 
2.47.3


^ permalink raw reply related

* [PATCH v3 1/2] drm/hyperv: validate resolution_count and fix WIN8 fallback
From: Berkant Koc @ 2026-05-19 20:08 UTC (permalink / raw)
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat
In-Reply-To: <cover.1779221339.git.me@berkoc.com>

A SYNTHVID_RESOLUTION_RESPONSE with resolution_count > 64 walks past
the supported_resolution[SYNTHVID_MAX_RESOLUTION_COUNT] array in the
parse loop. Bound resolution_count against the array size, folded
into the existing zero-check.

When the WIN10 resolution probe fails, the caller in
hyperv_connect_vsp() left hv->screen_*_max / preferred_* unpopulated,
which sets mode_config.max_width / max_height to 0 and makes
drm_internal_framebuffer_create() reject every userspace framebuffer
with -EINVAL. The pre-WIN10 branch had the same gap for
preferred_width / preferred_height. Use a single post-probe fallback
guarded by screen_width_max == 0 so both paths converge on the WIN8
defaults.

Signed-off-by: Berkant Koc <me@berkoc.com>
Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Cc: stable@vger.kernel.org # 5.14+
---
 drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
index 051ecc526..c3d0ff229 100644
--- a/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
+++ b/drivers/gpu/drm/hyperv/hyperv_drm_proto.c
@@ -391,8 +391,11 @@ static int hyperv_get_supported_resolution(struct hv_device *hdev)
 		return -ETIMEDOUT;
 	}
 
-	if (msg->resolution_resp.resolution_count == 0) {
-		drm_err(dev, "No supported resolutions\n");
+	if (msg->resolution_resp.resolution_count == 0 ||
+	    msg->resolution_resp.resolution_count >
+	    SYNTHVID_MAX_RESOLUTION_COUNT) {
+		drm_err(dev, "Invalid resolution count: %d\n",
+			msg->resolution_resp.resolution_count);
 		return -ENODEV;
 	}
 
@@ -508,9 +511,13 @@ int hyperv_connect_vsp(struct hv_device *hdev)
 		ret = hyperv_get_supported_resolution(hdev);
 		if (ret)
 			drm_err(dev, "Failed to get supported resolution from host, use default\n");
-	} else {
+	}
+
+	if (!hv->screen_width_max) {
 		hv->screen_width_max = SYNTHVID_WIDTH_WIN8;
 		hv->screen_height_max = SYNTHVID_HEIGHT_WIN8;
+		hv->preferred_width = SYNTHVID_WIDTH_WIN8;
+		hv->preferred_height = SYNTHVID_HEIGHT_WIN8;
 	}
 
 	hv->mmio_megabytes = hdev->channel->offermsg.offer.mmio_megabytes;
-- 
2.47.3


^ permalink raw reply related

* [PATCH v3 0/2] drm/hyperv: harden host message parsing
From: Berkant Koc @ 2026-05-19 20:08 UTC (permalink / raw)
  To: Saurabh Sengar, Dexuan Cui, Long Li
  Cc: linux-hyperv, dri-devel, linux-kernel, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Michael Kelley, Thomas Zimmermann,
	Maarten Lankhorst, Maxime Ripard, Deepak Rawat
In-Reply-To: <20260517-drm-hyperv-cover-v2@berkoc.com>

Two independent issues in the synthetic video driver that both stem
from trusting unvalidated host data.

1/2 bounds resolution_count from SYNTHVID_RESOLUTION_RESPONSE against
the supported_resolution[] array, and populates WIN8 defaults for
hv->screen_*_max / hv->preferred_* in both the WIN10-probe-failure
path and the pre-WIN10 path, so a failed or pre-WIN10 probe yields
a usable display instead of having drm_internal_framebuffer_create()
reject every userspace framebuffer with -EINVAL.

2/2 forwards bytes_recvd from vmbus_recvpacket() into the sub-handler,
rejects packets that do not cover the synthvid header, and additionally
requires the type-specific payload size before memcpy/complete or
before reading the feature-change byte. Rejected packets are logged
via drm_err_ratelimited() instead of being silently dropped, matching
the CoCo-hardened pattern in hv_kvp_onchannelcallback().

Changes since v2 (per review by Michael Kelley):

  1/2: dropped the reinit_completion() change. Kelley pointed out that
       the negotiate-version and update-vram-location timeouts cause
       hyperv_vmbus_probe() to fail and free the device, so the stale
       completion can only outlive its request in hyperv_vmbus_resume()
       after a get_supported_resolution() timeout. That is a narrower
       fix and belongs in a separate patch against the resume path.
       Subject and commit message rewritten to reflect that this patch
       is now bounds-check + WIN8 fallback only. Pre-WIN10 branch now
       also populates hv->preferred_* (Kelley spotted the gap).
       Followed the post-probe-test refactor Kelley suggested: the else
       branch is gone, a single screen_width_max == 0 check covers
       both the pre-WIN10 case and a failed WIN10 probe.

  2/2: dropped the redundant upper bound on bytes_recvd. Added a
       per-type switch for the three completion-driving message types
       (SYNTHVID_VERSION_RESPONSE, SYNTHVID_RESOLUTION_RESPONSE,
       SYNTHVID_VRAM_LOCATION_ACK) so the wait-completion path
       validates payload size before memcpy/complete. Every reject
       path now emits drm_err_ratelimited() rather than returning
       silently. Commit message rewritten to lead with the residue
       read, with "wasteful copy" reframed as the secondary observation.

Changes since v1:
  1/2: bound resolution_count check folded into the existing zero
       check; populate WIN8 defaults when hyperv_get_supported_resolution()
       fails.
  2/2: forward bytes_recvd into hyperv_receive_sub(); enforce the
       pipe + synthvid header minimum; check synthvid_feature_change
       payload size before reading is_dirt_needed.

Both patches carry an Assisted-by: Claude:claude-opus-4-7 berkoc-pipeline
trailer per the kernel coding-assistants policy. Code, analysis and
review responses are mine; the model is used as a structured reviewer
under human verification.

base-commit: 4bf5d3da79c48e1df4bab82c9680c53adeff7820

Berkant Koc (2):
  drm/hyperv: validate resolution_count and fix WIN8 fallback
  drm/hyperv: validate VMBus packet size in receive callback

 drivers/gpu/drm/hyperv/hyperv_drm_proto.c | 58 ++++++++++++++++++++---
 1 file changed, 52 insertions(+), 6 deletions(-)

--
2.47.3

^ permalink raw reply

* Re: [PATCH v2 2/2] drm/hyperv: validate VMBus packet size in receive callback
From: Berkant Koc @ 2026-05-19 20:20 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Saurabh Sengar, Dexuan Cui, Long Li, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Thomas Zimmermann, Maarten Lankhorst,
	Maxime Ripard, Deepak Rawat, linux-hyperv, dri-devel,
	linux-kernel
In-Reply-To: <SN6PR02MB415761EE7992EAFA14F2201BD4002@SN6PR02MB4157.namprd02.prod.outlook.com>

Hello Michael,

Thanks for the thorough review. v3 is on the list and addresses each
point:

> Does copying the full 16 KiB break anything? Or are you flagging as just
> wasteful activity?

It is the residue read that is the actual hazard, not the copy cost:
the consumer that wakes on complete() then reads up to 16 KiB of bytes
the host did not write in this packet as if it were the response
payload. The v3 commit message now leads with that and treats
"wasteful" as a secondary observation.

> Version related comments should go below the "---" following the
> Signed-off line.

Moved into the cover letter changelog in v3 so it stays out of git
log.

> The check against VMBUS_MAX_PACKET_SIZE shouldn't be needed.

Dropped. The v3 check is bytes_recvd < hdr_size only.

> In similar cases in other drivers that have been hardened for CoCo VMs,
> the code outputs a rate limited error message. [...] See
> hv_kvp_onchannelcallback() for example.

Done in v3 via drm_err_ratelimited() on every short-packet path
(synthvid header underflow, type-specific payload underflow, feature
change underflow). The driver already uses drm_err_ratelimited() in
hyperv_sendpacket() for the corresponding send path.

> Additional logic is needed here. Each of the three message types
> in the "if" statement has data beyond just the header. Before doing
> the memcpy() and complete(), the code should validate that the msg
> is big enough to contain that expected data.

Fixed in v3. For the three completion types I now compute the required
payload size with a switch on msg->vid_hdr.type and reject the packet
before memcpy/complete:

  SYNTHVID_VERSION_RESPONSE    -> sizeof(struct synthvid_version_resp)
  SYNTHVID_RESOLUTION_RESPONSE -> sizeof(struct synthvid_supported_resolution_resp)
  SYNTHVID_VRAM_LOCATION_ACK   -> sizeof(struct synthvid_vram_location_ack)

The memcpy then uses bytes_recvd, so wait_for_completion_timeout()
consumers never see truncated data and never read past what the host
wrote.

Series: <cover.1779221339.git.me@berkoc.com>

The v3 patches carry an `Assisted-by: Claude:claude-opus-4-7
berkoc-pipeline` trailer per the kernel coding-assistants policy.
Code, analysis and review responses are mine; the model is used as
a structured reviewer under human verification.

Thanks,
Berkant

^ permalink raw reply

* Re: [PATCH v2 1/2] drm/hyperv: validate resolution_count and harden VSP request paths
From: Berkant Koc @ 2026-05-19 20:20 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Saurabh Sengar, Dexuan Cui, Long Li, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Thomas Zimmermann, Maarten Lankhorst,
	Maxime Ripard, Deepak Rawat, linux-hyperv, dri-devel,
	linux-kernel
In-Reply-To: <SN6PR02MB4157D595B990A321BFA85B40D4002@SN6PR02MB4157.namprd02.prod.outlook.com>

Hello Michael,

Thanks for the CoCo context, that lines up with what is in
vmbus_devs[] for the framebuffer device. The piecemeal approach is
what I am aiming for here.

v3 is on the list and addresses your three concrete points:

> This change should probably be a separate patch, as it's not really
> related to the bounds checking issue.
> [...]
> All that notwithstanding, I don't think your fix is needed in its
> current form.

Dropped from v3. You are right that the negotiate-version and
update-vram-location timeouts let hyperv_vmbus_probe() bail out and
free the device, so the stale-completion path is only reachable from
hyperv_vmbus_resume() after a get_supported_resolution() timeout.
That is a narrower fix and belongs in a separate patch against the
resume path, which I will send afterwards.

> Is there a separate problem here in that preferred_width and
> preferred_height are not set in the pre-WIN10 case?

Yes, separate problem, and I missed it in v2. The pre-WIN10 branch
in hyperv_connect_vsp() sets only screen_*_max and leaves preferred_*
at zero, which is inconsistent with the WIN10-failure path.

> Also, having to duplicate the fallback code is distasteful. Instead
> of having an "else" clause, maybe have a follow-up test for
> screen_width_max [...] being zero as an indicator [...]

Adopted in v3. The else branch is gone; the WIN10 path runs the probe
and the post-probe block applies the WIN8 defaults whenever
screen_width_max is still zero. One source of truth, both paths
converge on it.

> In my view, your commit message is a bit too detailed.

Rewritten in v3. The bounds check and the WIN8 fallback are now one
short paragraph each, focused on the "why".

Series: <cover.1779221339.git.me@berkoc.com>

The v3 patches carry an `Assisted-by: Claude:claude-opus-4-7
berkoc-pipeline` trailer per the kernel coding-assistants policy.
Code, analysis and review responses are mine; the model is used as
a structured reviewer under human verification.

Thanks,
Berkant

^ permalink raw reply

* RE: [PATCH v4 16/18] mshv: Validate scheduler message bounds from hypervisor
From: Michael Kelley @ 2026-05-19 20:21 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177816866691.21765.15605640837157423543.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, May 7, 2026 8:44 AM
> 
> handle_pair_message() iterates up to msg->vp_count without verifying it
> against HV_MESSAGE_MAX_PARTITION_VP_PAIR_COUNT. Since vp_count is read
> from untrusted hypervisor data, a malformed message with a large value
> would cause out-of-bounds reads from the partition_ids and vp_indexes
> arrays.
> 
> handle_bitset_message() iterates over set bits in valid_bank_mask (up to
> 64) and advances bank_contents for each one. However, the payload buffer
> only has space for 16 bank entries. A valid_bank_mask with more than 16
> bits set causes bank_contents to read beyond the message buffer.
> 
> Fix both by adding bounds validation:
> - Clamp vp_count to HV_MESSAGE_MAX_PARTITION_VP_PAIR_COUNT
> - Track banks consumed and stop before exceeding buffer capacity
> 
> Fixes: 621191d709b1 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_synic.c |   20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/mshv_synic.c b/drivers/hv/mshv_synic.c
> index 89207aad7cf1f..5d509299f14d7 100644
> --- a/drivers/hv/mshv_synic.c
> +++ b/drivers/hv/mshv_synic.c
> @@ -190,7 +190,9 @@ static void kick_vp(struct mshv_vp *vp)
>  static void
>  handle_bitset_message(const struct hv_vp_signal_bitset_scheduler_message *msg)
>  {
> -	int bank_idx, vps_signaled = 0, bank_mask_size;
> +	int bank_idx, vps_signaled = 0, bank_mask_size, banks_used = 0;
> +	const int max_banks = sizeof(msg->vp_bitset.bitset_buffer) /
> +			      sizeof(u64) - 2; /* subtract format + mask */
>  	struct mshv_partition *partition;
>  	const struct hv_vpset *vpset;
>  	const u64 *bank_contents;
> @@ -230,6 +232,11 @@ handle_bitset_message(const struct hv_vp_signal_bitset_scheduler_message *msg)
>  		if (bank_idx == bank_mask_size)
>  			break;
> 
> +		if (unlikely(banks_used >= max_banks)) {
> +			pr_debug("valid_bank_mask exceeds buffer capacity\n");
> +			goto unlock_out;
> +		}

Instead of counting the banks while going through the loop, and checking
against the max on each loop iteration, consider doing the following up
front before entering the while loop:

		if (hweight64(vpset->valid_bank_mask) > max_banks) {
			pr_debug("valid_bank_mask exceeds buffer capacity\n");
			return;
		}

Of course, such an approach would not process *any* bits in the message,
but that's probably OK since the message Linux got is malformed.

Michael

> +
>  		while (true) {
>  			struct mshv_vp *vp;
> 
> @@ -258,6 +265,7 @@ handle_bitset_message(const struct hv_vp_signal_bitset_scheduler_message *msg)
>  		}
> 
>  		bank_contents++;
> +		banks_used++;
>  	}
> 
>  unlock_out:
> @@ -274,10 +282,18 @@ handle_pair_message(const struct hv_vp_signal_pair_scheduler_message *msg)
>  	struct mshv_partition *partition = NULL;
>  	struct mshv_vp *vp;
>  	int idx;
> +	u8 vp_count = msg->vp_count;
> +
> +	if (unlikely(vp_count > HV_MESSAGE_MAX_PARTITION_VP_PAIR_COUNT)) {
> +		pr_debug("pair message vp_count %u exceeds max %lu\n",
> +			 vp_count,
> +			 (unsigned long)HV_MESSAGE_MAX_PARTITION_VP_PAIR_COUNT);
> +		return;
> +	}
> 
>  	rcu_read_lock();
> 
> -	for (idx = 0; idx < msg->vp_count; idx++) {
> +	for (idx = 0; idx < vp_count; idx++) {
>  		u64 partition_id = msg->partition_ids[idx];
>  		u32 vp_index = msg->vp_indexes[idx];
> 
> 
> 


^ permalink raw reply

* RE: [PATCH v4 04/18] mshv: Add NULL check for vp in mshv_try_assert_irq_fast
From: Michael Kelley @ 2026-05-19 20:08 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177816860118.21765.7481864545928795603.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, May 7, 2026 8:43 AM
> 
> mshv_try_assert_irq_fast() dereferences the vp pointer obtained from
> pt_vp_array[lapic_apic_id] without checking for NULL or validating that
> lapic_apic_id is within bounds. A spurious interrupt from the hypervisor
> targeting a non-existent VP (or one not yet created) causes a NULL
> pointer dereference and crashes the host.
> 
> Add a bounds check on lapic_apic_id against MSHV_MAX_VPS and a NULL
> check on the vp pointer before dereferencing.
> 
> Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_eventfd.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/hv/mshv_eventfd.c b/drivers/hv/mshv_eventfd.c
> index 5995a62aff8d8..b398e58411dd7 100644
> --- a/drivers/hv/mshv_eventfd.c
> +++ b/drivers/hv/mshv_eventfd.c
> @@ -169,7 +169,12 @@ static int mshv_try_assert_irq_fast(struct mshv_irqfd *irqfd)
>  		return -EOPNOTSUPP;
>  #endif
> 
> +	if (irq->lapic_apic_id >= MSHV_MAX_VPS)
> +		return -EINVAL;

APIC IDs are 8-bit values, and indeed lapic_apic_id is set in
mshv_copy_girq_info() after masking the value with 0xFF. So
this check doesn't do anything. (I guess MSHV_MAX_VPS
could be changed to something smaller than 256, but that
seems highly unlikely.) There is extended destination
id functionality that provides for a 15-bit APIC ID, but I don't
see any support for that in the MSHV code.

> +
>  	vp = partition->pt_vp_array[irq->lapic_apic_id];

The APIC ID does *not* equal the Linux CPU number or VP
index in the general case, so this indexing is problematic. APIC
IDs are not dense if the target partition has multiple NUMA
nodes and the number of VPs in a NUMA node is not a power
of 2. In such case, the APIC ID space has gaps. Azure has such
VM sizes, and you can create such a configuration using
the local Hyper-V Manager UI. So having APIC IDs that aren't
dense is a real case.

> +	if (!vp)
> +		return -EINVAL;
> 
>  	if (!vp->vp_register_page)
>  		return -EOPNOTSUPP;
> 
> 


^ permalink raw reply

* RE: [PATCH v4 01/18] mshv: Fix IRQ leak and type hazards in hv_call_modify_spa_host_access
From: Michael Kelley @ 2026-05-19 20:07 UTC (permalink / raw)
  To: Stanislav Kinsburskii, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, longli@microsoft.com
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <177816858406.21765.9718563917415905259.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>

From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Sent: Thursday, May 7, 2026 8:43 AM
> 
> The bounds check inside the PFN-filling loop can return -EINVAL while
> interrupts are disabled via local_irq_save(), leaking IRQ state.
> 
> Remove the check — it is redundant because the loop invariant
> (done + i < page_count == page_struct_count >> large_shift) guarantees
> (done + i) << large_shift < page_struct_count always holds.
> 
> While here, fix type mismatches: change 'int done' to 'u64 done' and
> use u64 for loop and batch-size variables so they match the u64
> page_count they are compared against.
> 
> Fixes: 621191d709b14 ("Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs")
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
>  drivers/hv/mshv_root_hv_call.c |   18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> index 129456bd72aba..cc580225e9e45 100644
> --- a/drivers/hv/mshv_root_hv_call.c
> +++ b/drivers/hv/mshv_root_hv_call.c
> @@ -1042,7 +1042,7 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>  {
>  	struct hv_input_modify_sparse_spa_page_host_access *input_page;
>  	u64 status;
> -	int done = 0;
> +	u64 done = 0;
>  	unsigned long irq_flags, large_shift = 0;
>  	u64 page_count = page_struct_count;
>  	u16 code = acquire ? HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS :
> @@ -1059,9 +1059,9 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>  	}
> 
>  	while (done < page_count) {
> -		ulong i, completed, remain = page_count - done;
> -		int rep_count = min(remain,
> -				        HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT);
> +		u64 i, completed, remain = page_count - done;
> +		u64 rep_count = min_t(u64, remain,
> + 					HV_MODIFY_SPARSE_SPA_PAGE_HOST_ACCESS_MAX_PAGE_COUNT);

Why does this need to be "min_t()" instead of just "min()"?  Needing min_t()
was the case in times past, but "min()" does a much better job now of handling
mixed integer types. There are a number of patches on LKML that are
replacing min_t() with min(). This change seems to be going the opposite
direction.

> 
>  		local_irq_save(irq_flags);
>  		input_page = *this_cpu_ptr(hyperv_pcpu_input_arg);
> @@ -1075,15 +1075,9 @@ int hv_call_modify_spa_host_access(u64 partition_id, struct page **pages,
>  		input_page->flags = flags;
>  		input_page->host_access = host_access;
> 
> -		for (i = 0; i < rep_count; i++) {
> -			u64 index = (done + i) << large_shift;
> -
> -			if (index >= page_struct_count)
> -				return -EINVAL;
> -
> +		for (i = 0; i < rep_count; i++)
>  			input_page->spa_page_list[i] =
> -						page_to_pfn(pages[index]);
> -		}
> +				page_to_pfn(pages[(done + i) << large_shift]);
> 
>  		status = hv_do_rep_hypercall(code, rep_count, 0, input_page,
>  					     NULL);
> 
The "completed" local variable could be collapsed out by doing:

		done += hv_repcomp(status)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox