From: Baolu Lu <baolu.lu@linux.intel.com>
To: Nicolin Chen <nicolinc@nvidia.com>,
	will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
	bhelgaas@google.com, jgg@nvidia.com
Cc: rafael@kernel.org, lenb@kernel.org, praan@google.com,
	xueshuai@linux.alibaba.com, kevin.tian@intel.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, vsethi@nvidia.com
Subject: Re: [PATCH v2 2/7] iommu: Add reset_device_done callback for hardware fault recovery
Date: Wed, 18 Mar 2026 13:59:58 +0800
Message-ID: <566549c9-fab1-43b5-a35b-e3c76f1c285d@linux.intel.com>
In-Reply-To: <3750a106b4ab4235df842fa2b9defbc8226ebbef.1773774441.git.nicolinc@nvidia.com>

On 3/18/26 03:15, Nicolin Chen wrote:
> When an IOMMU hardware detects an error due to a faulty device (e.g. an ATS
> invalidation timeout), IOMMU drivers may quarantine the device by disabling
> specific hardware features or dropping translation capabilities.
> 
> To recover from these states, the IOMMU driver needs a reliable signal that
> the underlying physical hardware has been cleanly reset (e.g., via PCIe AER
> or a sysfs Function Level Reset) so as to lift the quarantine.
> 
> Introduce a reset_device_done callback in struct iommu_ops. Trigger it from
> the existing pci_dev_reset_iommu_done() path to notify the underlying IOMMU
> driver that the device's internal state has been sanitized.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   include/linux/iommu.h |  2 ++
>   drivers/iommu/iommu.c | 12 ++++++++++++
>   2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 54b8b48c762e8..9ba12b2164724 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -626,6 +626,7 @@ __iommu_copy_struct_to_user(const struct iommu_user_data *dst_data,
>    * @release_device: Remove device from iommu driver handling
>    * @probe_finalize: Do final setup work after the device is added to an IOMMU
>    *                  group and attached to the groups domain
> + * @reset_device_done: Notify the driver about the completion of a device reset
>    * @device_group: find iommu group for a particular device
>    * @get_resv_regions: Request list of reserved regions for a device
>    * @of_xlate: add OF master IDs to iommu grouping
> @@ -683,6 +684,7 @@ struct iommu_ops {
>   	struct iommu_device *(*probe_device)(struct device *dev);
>   	void (*release_device)(struct device *dev);
>   	void (*probe_finalize)(struct device *dev);
> +	void (*reset_device_done)(struct device *dev);
>   	struct iommu_group *(*device_group)(struct device *dev);
>   
>   	/* Request/Free a list of reserved regions for a device */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 40a15c9360bd1..fcd2902d9e8db 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -4013,11 +4013,13 @@ EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
>   void pci_dev_reset_iommu_done(struct pci_dev *pdev)
>   {
>   	struct iommu_group *group = pdev->dev.iommu_group;
> +	const struct iommu_ops *ops;
>   	unsigned long pasid;
>   	void *entry;
>   
>   	if (!pci_ats_supported(pdev) || !dev_has_iommu(&pdev->dev))
>   		return;
> +	ops = dev_iommu_ops(&pdev->dev);
>   
>   	guard(mutex)(&group->mutex);
>   
> @@ -4029,6 +4031,16 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
>   	if (WARN_ON(!group->blocking_domain))
>   		return;
>   
> +	/*
> +	 * A PCI device might have been in an error state, so the IOMMU driver
> +	 * had to quarantine the device by disabling specific hardware feature
> +	 * or dropping translation capability. Here notify the IOMMU driver as
> +	 * a reliable signal that the faulty PCI device has been cleanly reset
> +	 * so now it can lift its quarantine and restore full functionality.
> +	 */
> +	if (ops && ops->reset_device_done)
> +		ops->reset_device_done(&pdev->dev);

Nit: dev_iommu_ops() always returns a valid iommu "ops", so there is no
need to check "ops != NULL" here. Just:

	if (ops->reset_device_done)
		ops->reset_device_done(&pdev->dev);
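
For illustration only, here is a minimal userspace sketch of the same
pattern: the accessor guarantees a non-NULL ops table, so only the optional
callback needs checking. All toy_* names are hypothetical stand-ins, not
kernel APIs, and the assert() merely models the assumption that valid ops
are installed once a device has been probed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct toy_dev;

struct toy_ops {
	/* Optional callback: a driver may leave this NULL. */
	void (*reset_device_done)(struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_ops *ops;	/* assumed set once the device is probed */
	int resets_seen;
};

/* Accessor modelled on the guarantee described above: never returns NULL. */
static const struct toy_ops *toy_dev_ops(struct toy_dev *dev)
{
	assert(dev->ops);	/* assumption: probed devices always have ops */
	return dev->ops;
}

static void toy_reset_done(struct toy_dev *dev)
{
	const struct toy_ops *ops = toy_dev_ops(dev);

	/* Only the optional callback needs a NULL check, not ops itself. */
	if (ops->reset_device_done)
		ops->reset_device_done(dev);
}

/* Example callback that just counts how often it was notified. */
static void toy_count_reset(struct toy_dev *dev)
{
	dev->resets_seen++;
}
```

A driver that leaves the callback unset is simply skipped, while one that
sets it is notified once per call to toy_reset_done().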

> +
>   	/* Re-attach RID domain back to group->domain */
>   	if (group->domain != group->blocking_domain) {
>   		WARN_ON(__iommu_attach_device(group->domain, &pdev->dev,

Thanks,
baolu
