Re: [PATCH v2 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest

Linux-HyperV List
 help / color / mirror / Atom feed

From: Jason Gunthorpe <jgg@ziepe.ca>
To: Yu Zhang <zhangyu1@linux.microsoft.com>
Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	iommu@lists.linux.dev, linux-pci@vger.kernel.org,
	linux-arch@vger.kernel.org, wei.liu@kernel.org,
	kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com,
	longli@microsoft.com, joro@8bytes.org, will@kernel.org,
	robin.murphy@arm.com, bhelgaas@google.com,
	kwilczynski@kernel.org, lpieralisi@kernel.org, mani@kernel.org,
	robh@kernel.org, arnd@arndb.de, mhklinux@outlook.com,
	jacob.pan@linux.microsoft.com, tgopinath@linux.microsoft.com,
	easwar.hariharan@linux.microsoft.com,
	mrathor@linux.microsoft.com
Subject: Re: [PATCH v2 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest
Date: Fri, 3 Jul 2026 14:32:48 -0300	[thread overview]
Message-ID: <20260703173248.GB1968184@ziepe.ca> (raw)
In-Reply-To: <20260702160518.311234-4-zhangyu1@linux.microsoft.com>

On Fri, Jul 03, 2026 at 12:05:17AM +0800, Yu Zhang wrote:

> +static bool hv_iommu_capable(struct device *dev, enum iommu_cap cap)
> +{
> +	switch (cap) {
> +	case IOMMU_CAP_CACHE_COHERENCY:
> +		return true;
> +	case IOMMU_CAP_DEFERRED_FLUSH:
> +		return true;

This CAP isn't necessary anymore

> +static struct iommu_device *hv_iommu_probe_device(struct device *dev)
> +{
> +	struct pci_dev *pdev;
> +	struct hv_iommu_endpoint *vdev;
> +	struct hv_output_get_logical_device_property device_iommu_property = {0};
> +
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-ENODEV);
> +
> +	pdev = to_pci_dev(dev);
> +
> +	if (hv_iommu_get_logical_device_property(dev,
> +						 HV_LOGICAL_DEVICE_PROPERTY_PVIOMMU,
> +						 &device_iommu_property) ||
> +	    !(device_iommu_property.device_iommu & HV_DEVICE_IOMMU_ENABLED))
> +		return ERR_PTR(-ENODEV);
> +
> +	vdev = kzalloc_obj(*vdev, GFP_KERNEL);
> +	if (!vdev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	vdev->dev = dev;
> +	vdev->hv_iommu = hv_iommu_device;
> +	dev_iommu_priv_set(dev, vdev);
> +
> +	if (hv_iommu_ats_supported(hv_iommu_device->cap) &&
> +	    pci_ats_supported(pdev))
> +		pci_enable_ats(pdev, __ffs(hv_iommu_device->pgsize_bitmap));

This can probably just be PAGE_SHIFT

Also ATS shouldn't be enabled until a translation is installed,
otherwise the driver cannot participate in the ATS error handling
Nicolin is working on.

> +static void hv_iommu_release_device(struct device *dev)
> +{
> +	struct hv_iommu_endpoint *vdev = dev_iommu_priv_get(dev);
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	if (pdev->ats_enabled)
> +		pci_disable_ats(pdev);
> +
> +	dev_iommu_priv_set(dev, NULL);

No necessary, the caller does it

> +static struct iommu_group *hv_iommu_device_group(struct device *dev)
> +{
> +	if (dev_is_pci(dev))
> +		return pci_device_group(dev);
> +
> +	WARN_ON_ONCE(1);
> +	return generic_device_group(dev);

I think you can just return failure here instead of WARN_ON ?

> +static int __init hv_initialize_static_domains(void)
> +{
> +	int ret;
> +	struct hv_iommu_domain *hv_domain;
> +
> +	/* Default stage-1 identity domain */
> +	hv_domain = &hv_identity_domain;
> +
> +	ret = hv_create_device_domain(hv_domain, HV_DEVICE_DOMAIN_TYPE_S1);
> +	if (ret)
> +		return ret;
> +
> +	ret = hv_configure_device_domain(hv_domain, IOMMU_DOMAIN_IDENTITY);
> +	if (ret)
> +		goto delete_identity_domain;

IMHO I would change this around to have a single function that accepts
a struct hv_input_configure_device_domain as input and does both of
the hypercalls inside. Then here it is easy to directly construct the
hv_input_configure_device_domain for blocking and identity.

I'd be happy if this never touched domain_type, drivers shouldn't be
touching that.

> +static void __init hv_init_iommu_device(struct hv_iommu_dev *hv_iommu,
> +			struct hv_output_get_iommu_capabilities *hv_iommu_cap)
> +{
> +	ida_init(&hv_iommu->domain_ids);
> +
> +	hv_iommu->cap = hv_iommu_cap->iommu_cap;
> +	hv_iommu->max_iova_width = hv_iommu_cap->max_iova_width;
> +	if (!hv_iommu_5lvl_supported(hv_iommu->cap) &&
> +	    hv_iommu->max_iova_width > 48) {
> +		pr_info("5-level paging not supported, limiting iova width to 48.\n");
> +		hv_iommu->max_iova_width = 48;
> +	}
> +
> +	hv_iommu->geometry = (struct iommu_domain_geometry) {
> +		.aperture_start = 0,
> +		.aperture_end = (((u64)1) << hv_iommu->max_iova_width) - 1,
> +		.force_aperture = true,
> +	};

I don't see anything reading this, I don't expect this to be used?

The max_iova_width has to be passed into the iommupt creation, which
it does:

 +	cfg.common.hw_max_vasz_lg2 = hv_iommu_device->max_iova_width;
 +	cfg.common.hw_max_oasz_lg2 = 52;
 +	cfg.top_level = (hv_iommu_device->max_iova_width > 48) ? 4 : 3;
 +	ret = pt_iommu_x86_64_init(&hv_domain->pt_iommu_x86_64, &cfg, GFP_KERNEL);
 +	if (ret)

So just delete hv->iommu->geometry.

Also, VT-D has weirdness where the HW can require a 4 level table but
only a 3 level worth of IOVA width is being used. This was a
real-world bug we hit when converting to iommupt. This interaction
with the HV doesn't seem able to represent that.

> +	/*
> +	 * The page table code only maps x86 page sizes (4K/2M/1G); require the
> +	 * hypervisor to advertise a non-empty subset of exactly those.
> +	 */
> +	if (!hv_iommu_cap.pgsize_bitmap ||
> +	    (hv_iommu_cap.pgsize_bitmap & ~(u64)(SZ_4K | SZ_2M | SZ_1G))) {
> +		pr_err("unsupported page sizes: pgsize_bitmap=0x%llx\n",
> +		       hv_iommu_cap.pgsize_bitmap);
> +		return -ENODEV;
> +	}

This can just be

if (!(hv_iommu_cap.pgsize_bitmap & PAGE_SHIFT)) {
		pr_err("unsupported page sizes: pgsize_bitmap=0x%llx\n",
		       hv_iommu_cap.pgsize_bitmap);
}		return -ENODEV;

Which is all you really need. If the HV doesn't support 1G it is
perfectly fine, the iommupt page bitmap is already masked by this. 

> +	ret = iommu_device_register(&hv_iommu->iommu, &hv_iommu_ops, NULL);
> +	if (ret) {
> +		pr_err("iommu_device_register failed: %d\n", ret);
> +		goto err_sysfs_remove;
> +	}
> +
> +	pr_info("successfully initialized\n");

Don't log someting so vauge?

Jason

next prev parent reply	other threads:[~2026-07-03 17:32 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-07-02 16:05 [PATCH v2 0/4] Hyper-V: Add para-virtualized IOMMU support for Linux guests Yu Zhang
2026-07-02 16:05 ` [PATCH v2 1/4] hyperv: Introduce new hypercall interfaces used by Hyper-V guest IOMMU Yu Zhang
2026-07-02 16:36   ` sashiko-bot
2026-07-02 16:05 ` [PATCH v2 2/4] Drivers: hv: Add logical device ID registry for vPCI devices Yu Zhang
2026-07-02 16:42   ` sashiko-bot
2026-07-02 16:05 ` [PATCH v2 3/4] iommu/hyperv: Add para-virtualized IOMMU support for Hyper-V guest Yu Zhang
2026-07-02 17:08   ` sashiko-bot
2026-07-03 17:32   ` Jason Gunthorpe [this message]
2026-07-02 16:05 ` [PATCH v2 4/4] iommu/hyperv: Add page-selective IOTLB flush support Yu Zhang
2026-07-02 17:20   ` sashiko-bot
2026-07-03 17:10   ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260703173248.GB1968184@ziepe.ca \
    --to=jgg@ziepe.ca \
    --cc=arnd@arndb.de \
    --cc=bhelgaas@google.com \
    --cc=decui@microsoft.com \
    --cc=easwar.hariharan@linux.microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=iommu@lists.linux.dev \
    --cc=jacob.pan@linux.microsoft.com \
    --cc=joro@8bytes.org \
    --cc=kwilczynski@kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=lpieralisi@kernel.org \
    --cc=mani@kernel.org \
    --cc=mhklinux@outlook.com \
    --cc=mrathor@linux.microsoft.com \
    --cc=robh@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=tgopinath@linux.microsoft.com \
    --cc=wei.liu@kernel.org \
    --cc=will@kernel.org \
    --cc=zhangyu1@linux.microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox