public inbox for linux-arm-kernel@lists.infradead.org
From: Baolu Lu <baolu.lu@linux.intel.com>
To: Nicolin Chen <nicolinc@nvidia.com>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: "Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	Pranjal Shrivastava <praan@google.com>,
	Mostafa Saleh <smostafa@google.com>,
	Kevin Tian <kevin.tian@intel.com>,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, vsethi@nvidia.com,
	Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v3 02/11] iommu: Pass in reset result to pci_dev_reset_iommu_done()
Date: Fri, 24 Apr 2026 10:38:09 +0800
Message-ID: <c8c8b482-3781-4d33-9aea-866467d15b69@linux.intel.com>
In-Reply-To: <bf99f11ac9a42b5552ec3367d02840366459ae7b.1776381841.git.nicolinc@nvidia.com>

On 4/17/26 07:28, Nicolin Chen wrote:
> IOMMU drivers handle ATC cache maintenance. They may encounter ATC-related
> errors (e.g., ATC invalidation request timeout), indicating that ATC cache
> might have stale entries that can corrupt the memory. In this case, IOMMU
> driver has no choice but to block the device's ATS function and wait for a
> device recovery.
> 
> The pci_dev_reset_iommu_done() called at the end of a reset function could
> serve as a reliable signal to the IOMMU subsystem that the physical device
> cache is completely clean. However, the function is called unconditionally
> even if the reset operation had actually failed, which would re-attach the
> faulty device back to a normal translation domain. And this will leave the
> system highly exposed, creating vulnerabilities for data corruption:
>      IOMMU blocks RID/ATS
>      pci_reset_function():
>          pci_dev_reset_iommu_prepare(); // Block RID/ATS
>          __reset(); // Failed (ATC is still stale)
>          pci_dev_reset_iommu_done(); // Unblock RID/ATS (ah-ha)
> 
> Instead, add a @reset_succeeds parameter to pci_dev_reset_iommu_done() and
> pass the reset result from each caller:
>      IOMMU blocks RID/ATS
>      pci_reset_function():
>          pci_dev_reset_iommu_prepare(); // Block RID/ATS
>          rc = __reset();
>          pci_dev_reset_iommu_done(!rc); // Unblock or quarantine
> 
> On a successful reset, done() restores the device to its RID/PASID domains
> and decrements group->recovery_cnt. On failure, the device remains blocked,
> and concurrent domain attachment will be rejected until a successful reset.
> 
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>   include/linux/iommu.h  |  5 +++--
>   drivers/iommu/iommu.c  | 28 +++++++++++++++++++++++++---
>   drivers/pci/pci-acpi.c |  2 +-
>   drivers/pci/pci.c      | 10 +++++-----
>   drivers/pci/quirks.c   |  2 +-
>   5 files changed, 35 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 54b8b48c762e8..d3685967e960a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -1191,7 +1191,7 @@ void iommu_free_global_pasid(ioasid_t pasid);
>   
>   /* PCI device reset functions */
>   int pci_dev_reset_iommu_prepare(struct pci_dev *pdev);
> -void pci_dev_reset_iommu_done(struct pci_dev *pdev);
> +void pci_dev_reset_iommu_done(struct pci_dev *pdev, bool reset_succeeds);
>   #else /* CONFIG_IOMMU_API */
>   
>   struct iommu_ops {};
> @@ -1521,7 +1521,8 @@ static inline int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
>   	return 0;
>   }
>   
> -static inline void pci_dev_reset_iommu_done(struct pci_dev *pdev)
> +static inline void pci_dev_reset_iommu_done(struct pci_dev *pdev,
> +					    bool reset_succeeds)
>   {
>   }
>   #endif /* CONFIG_IOMMU_API */
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ff181db687bbf..28d4c1f143a08 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -80,6 +80,7 @@ struct group_device {
>   	 * Device is blocked for a pending recovery while its group->domain is
>   	 * retained. This can happen when:
>   	 *  - Device is undergoing a reset
> +	 *  - Device failed the last reset
>   	 */
>   	bool blocked;
>   	unsigned int reset_depth;
> @@ -3971,7 +3972,9 @@ EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, "IOMMUFD_INTERNAL");
>    * reset is finished, pci_dev_reset_iommu_done() can restore everything.
>    *
>    * Caller must use pci_dev_reset_iommu_prepare() with pci_dev_reset_iommu_done()
> - * before/after the core-level reset routine, to decrement the recovery_cnt.
> + * before/after the core-level reset routine. On a successful reset, done() will
> + * decrement group->recovery_cnt and restore domains. On a failure, recovery_cnt
> + * is left intact and the device stays blocked.
>    *
>    * Return: 0 on success or negative error code if the preparation failed.
>    *
> @@ -4000,6 +4003,9 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
>   
>   	if (gdev->reset_depth++)
>   		return 0;
> +	/* Device might be already blocked for a quarantine */
> +	if (gdev->blocked)
> +		return 0;
>   
>   	ret = __iommu_group_alloc_blocking_domain(group);
>   	if (ret)
> @@ -4047,18 +4053,22 @@ EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
>   /**
>    * pci_dev_reset_iommu_done() - Restore IOMMU after a PCI device reset is done
>    * @pdev: PCI device that has finished a reset routine
> + * @reset_succeeds: Whether the PCI device reset is successful or not
>    *
>    * After a PCIe device finishes a reset routine, it wants to restore its IOMMU
>    * activity, including new translation and cache invalidation, by re-attaching
>    * all RID/PASID of the device back to the domains retained in the core-level
>    * structure.
>    *
> - * Caller must pair it with a successful pci_dev_reset_iommu_prepare().
> + * This is a pairing function for pci_dev_reset_iommu_prepare(). Caller should
> + * pass in the reset state via @reset_succeeds. On a failed reset, the device
> + * remains blocked for a quarantine with the group->recovery_cnt intact, so as
> + * to protect system memory until a subsequent successful reset.
>    *
>    * Note that, although unlikely, there is a risk that re-attaching domains might
>    * fail due to some unexpected happening like OOM.
>    */
> -void pci_dev_reset_iommu_done(struct pci_dev *pdev)
> +void pci_dev_reset_iommu_done(struct pci_dev *pdev, bool reset_succeeds)
>   {
>   	struct iommu_group *group = pdev->dev.iommu_group;
>   	struct group_device *gdev;
> @@ -4083,6 +4093,18 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
>   	if (WARN_ON(!group->blocking_domain))
>   		return;
>   
> +	/*
> +	 * A reset failure implies that the device might be unreliable. E.g. its
> +	 * device cache might retain stale entries, which potentially results in
> +	 * memory corruption. Thus, do not unblock the device until a successful
> +	 * reset.
> +	 */
> +	if (!reset_succeeds) {
> +		pci_err(pdev,
> +			"Reset failed. Keep it blocked to protect memory\n");
> +		return;
> +	}

Nit: when reset_succeeds is false, pci_dev_reset_iommu_done() only logs
an error and returns without touching the domains. Would it be better to
handle this in the caller instead? Something like:

	if (reset_succeeds)
		pci_dev_reset_iommu_done(dev);

?
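
To make the comparison concrete, here is a rough userspace sketch of the
two shapes. It is not kernel code: struct mock_dev and every mock_*
helper are made up for illustration, the reset result is passed in as rc
rather than coming from a real reset, and reset_depth and recovery_cnt
are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device state; not a kernel struct. */
struct mock_dev {
	bool blocked;	/* true while RID/ATS traffic is blocked */
};

/* Models prepare(): block the device before the reset. */
static void mock_prepare(struct mock_dev *dev)
{
	assert(dev);
	dev->blocked = true;
}

/* Models the restore path: re-attach and unblock. */
static void mock_unblock(struct mock_dev *dev)
{
	assert(dev);
	dev->blocked = false;
}

/* Shape used by the patch: done() receives the reset result. */
static void mock_done(struct mock_dev *dev, bool reset_succeeds)
{
	if (!reset_succeeds) {
		fprintf(stderr, "Reset failed. Keep it blocked\n");
		return;
	}
	mock_unblock(dev);
}

/* Callee decides: rc is the simulated reset result (0 on success). */
static bool mock_reset_callee_checks(struct mock_dev *dev, int rc)
{
	mock_prepare(dev);
	mock_done(dev, !rc);
	return dev->blocked;
}

/* Caller decides: the restore path is skipped on a failed reset. */
static bool mock_reset_caller_checks(struct mock_dev *dev, int rc)
{
	mock_prepare(dev);
	if (!rc)
		mock_unblock(dev);
	return dev->blocked;
}
```

In this simplified model both shapes leave the device in the same state
for a given rc. The only visible difference is that the callee-side
check gives one central place for the error message, while the
caller-side check keeps done() unconditional.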

> +
>   	/* Re-attach RID domain back to group->domain */
>   	if (group->domain != group->blocking_domain) {
>   		WARN_ON(__iommu_attach_device(group->domain, &pdev->dev,

Thanks,
baolu


