From: Nicolin Chen <nicolinc@nvidia.com>
To: Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
"Joerg Roedel" <joro@8bytes.org>,
Bjorn Helgaas <bhelgaas@google.com>,
"Jason Gunthorpe" <jgg@nvidia.com>
Cc: "Rafael J . Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Pranjal Shrivastava <praan@google.com>,
Mostafa Saleh <smostafa@google.com>,
Lu Baolu <baolu.lu@linux.intel.com>,
Kevin Tian <kevin.tian@intel.com>,
<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>,
Shuai Xue <xueshuai@linux.alibaba.com>
Subject: [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Date: Mon, 18 May 2026 20:38:43 -0700
Message-ID: <cover.1779161849.git.nicolinc@nvidia.com>
Hi all,
This series addresses a critical vulnerability and stability issue in which an
unresponsive PCIe device that fails to process ATC (Address Translation Cache)
invalidation requests causes silent data corruption and a continuous stream of
SMMU CMDQ errors.
[ As Jason pointed out, this series is not treated as a standard bug fix,
because it fundamentally introduces a new RAS feature to quarantine devices
and recover from hardware faults, and because it relies on a recently
accepted SMMU driver rework. Thus, most of the patches here don't carry a
"Fixes" tag. ]
Currently, when an ATC invalidation times out, the SMMUv3 driver simply skips
over the CMDQ_ERR_CERROR_ATC_INV_IDX error. This leaves the device's ATC state
desynchronized from the SMMU: the device may retain stale ATC entries for
memory pages that the OS has already reclaimed and reassigned, creating a
direct vector for data corruption. Furthermore, the driver may keep issuing
ATC_INV commands to the unresponsive device, resulting in a constant stream
of CMDQ errors:
unexpected global error reported (0x00000001), this could be serious
CMDQ error (cons 0x0302bb84): ATC invalidate timeout
unexpected global error reported (0x00000001), this could be serious
CMDQ error (cons 0x0302bb88): ATC invalidate timeout
unexpected global error reported (0x00000001), this could be serious
CMDQ error (cons 0x0302bb8c): ATC invalidate timeout
...
To resolve this, introduce a mechanism in the SMMUv3 driver and the IOMMU core
to quarantine a broken device. This requires the following preparatory changes:
- Pass the PCI reset result to pci_dev_reset_iommu_done()
- Co-clear pending CMDQ_ERR from the cmdq issuer under a raw_spinlock_t, so
  that an ATC_INV timeout flagged in cmdq->atc_sync_timeouts is definitive
  when the issuer reads its bit after the CMD_SYNC poll
- Introduce a reset_device_done op, allowing the core to signal the driver
  when the physical hardware has been cleanly recovered (e.g., via AER or a
  manual reset), so that the quarantine can be lifted
- Use a per-group_device WQ, via a new iommu_report_device_broken() helper,
  to quarantine a broken device
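The second item above describes a lockless handoff. The error-handling side flags an ATC_INV timeout in cmdq->atc_sync_timeouts, and the issuer later reads its own bit. The following is a minimal user-space C11 model of that handoff. It is a sketch under stated assumptions, not the code in this series. The struct and function names are hypothetical, and so is the one-bit-per-issuer-slot indexing. The raw_spinlock_t that the series uses around the CMDQ_ERR co-clear is also omitted. Only the field name atc_sync_timeouts comes from this cover letter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: one bit per concurrent CMD_SYNC issuer slot. */
#define MODEL_MAX_ISSUERS 64

/* Hypothetical model of the relevant part of a command queue. */
struct model_cmdq {
	/* Bit n set: the batch owned by issuer slot n hit an ATC_INV timeout. */
	_Atomic unsigned long long atc_sync_timeouts;
};

static void model_cmdq_init(struct model_cmdq *q)
{
	atomic_init(&q->atc_sync_timeouts, 0);
}

/*
 * Error-handling side: record that the batch owned by @slot timed out.
 * In the series this happens while CMDQ_ERR is cleared under a
 * raw_spinlock_t; this model leaves the lock out.
 */
static void model_mark_atc_timeout(struct model_cmdq *q, unsigned int slot)
{
	assert(slot < MODEL_MAX_ISSUERS);
	atomic_fetch_or_explicit(&q->atc_sync_timeouts, 1ULL << slot,
				 memory_order_release);
}

/*
 * Issuer side: after its CMD_SYNC poll returns, atomically read and clear
 * only its own bit. Returns true if an ATC invalidation timeout was flagged.
 */
static bool model_consume_atc_timeout(struct model_cmdq *q, unsigned int slot)
{
	unsigned long long mask, old;

	assert(slot < MODEL_MAX_ISSUERS);
	mask = 1ULL << slot;
	old = atomic_fetch_and_explicit(&q->atc_sync_timeouts, ~mask,
					memory_order_acquire);
	return (old & mask) != 0;
}
```

Each issuer clears only its own bit with a single atomic read-modify-write, so issuers never contend on a lock to consume their result. The model does not capture ordering between the two sides. As described above, the series makes the bit definitive by having the issuer itself co-clear any pending CMDQ_ERR before it reads the bit. An in-kernel version would more naturally use the bitops helpers, such as set_bit() and test_and_clear_bit(), on an unsigned long bitmap.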
On the SMMUv3 driver side, retry the timed-out ATC_INV batch to identify the
faulty device(s). Then perform a surgical STE update to clear STE.EATS, and
flag ATS as broken for each faulty master, so that further ATS/ATC requests
are rejected at the HW level and the timeout spam is suppressed.
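The retry step above can be viewed as a bisection over the failed batch; the v2 changelog below mentions bisecting batched ATC invalidation commands. The following C sketch shows that idea in isolation, under stated assumptions. model_issue_fn is hypothetical: it stands in for re-issuing a sub-range of ATC_INV commands followed by a CMD_SYNC and reporting whether a timeout was flagged. model_fake_issue is a test double, not an SMMU interface. Re-issuing ATC_INV to healthy devices should be harmless, because invalidating an already-invalidated range has no further effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical: re-issue entries [lo, hi) of a failed ATC_INV batch followed
 * by a CMD_SYNC; return true if an ATC invalidation timeout was flagged.
 */
typedef bool (*model_issue_fn)(void *ctx, size_t lo, size_t hi);

/*
 * [lo, hi) is known to have timed out. Split it and retry each half,
 * recursing into every half that still times out, so that more than one
 * faulty entry can be found. Returns the number of entries marked in @faulty.
 */
static size_t model_bisect(model_issue_fn issue, void *ctx, size_t lo,
			   size_t hi, bool *faulty)
{
	size_t mid, found = 0;

	if (hi - lo == 1) {
		faulty[lo] = true;
		return 1;
	}
	mid = lo + (hi - lo) / 2;
	if (issue(ctx, lo, mid))
		found += model_bisect(issue, ctx, lo, mid, faulty);
	if (issue(ctx, mid, hi))
		found += model_bisect(issue, ctx, mid, hi, faulty);
	return found;
}

/*
 * Called after a whole batch of @n entries timed out. In this sketch the
 * batch is retried once first; if the retry succeeds, the timeout is
 * treated as transient and nothing is marked.
 */
static size_t model_find_faulty(model_issue_fn issue, void *ctx, size_t n,
				bool *faulty)
{
	if (n == 0 || !issue(ctx, 0, n))
		return 0;
	return model_bisect(issue, ctx, 0, n, faulty);
}

/* Test double: entry i times out whenever broken[i] is true. */
struct model_fake {
	const bool *broken;
	size_t issued; /* number of (sub-)batches issued so far */
};

static bool model_fake_issue(void *ctx, size_t lo, size_t hi)
{
	struct model_fake *f = ctx;
	size_t i;

	f->issued++;
	for (i = lo; i < hi; i++)
		if (f->broken[i])
			return true;
	return false;
}
```

With k faulty entries out of n, this costs on the order of k * log2(n) extra sub-batches. That should stay small in the common case of a single misbehaving device.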
This series is also available on GitHub:
https://github.com/nicolinc/iommufd/commits/smmuv3_atc_timeout-v4
Changelog
v4:
* Rebase on Joerg's IOMMU "fixes" branch
* Rebase on Jason's SMMUv3 cmd_ent series
https://lore.kernel.org/all/0-v2-47b2bf710ad5+716ac-smmu_no_cmdq_ent_jgg@nvidia.com/
* [PCI] Don't suspend IOMMU in probe mode
* [iommu] kfree_rcu() iommu_group
* [iommu] Convert gdev->blocked to enum gdev_blocked
* [iommu] Use disable_work_sync() to fix UAF and ref leak
* [iommu] Gate done() transitions to preserve BLOCKED_BROKEN
* [iommu] Decrement recovery_cnt when unplugging a blocked gdev
* [iommu] Drop racy dev_has_iommu() in iommu_report_device_broken()
* [iommu] Add gdev->broken_pending to skip worker after racing recovery
* [smmuv3] Add master->ats_invs scratch
* [smmuv3] Add arm_smmu_cmdq_batch_issue() wrapper
* [smmuv3] Force per-flush sync for has_ats batches
* [smmuv3] Serialize STE.EATS and ats_broken updates
* [smmuv3] Co-clear pending CMDQ_ERR from cmdq issuer
* [smmuv3] Add invs and has_ats to arm_smmu_cmdq_batch
* [smmuv3] Move arm_smmu_invs_for_each_entry to header
* [smmuv3] Set master->ats_broken after clearing STE.EATS
* [smmuv3] Issue CFGI_STE via arm_smmu_cmdq_issue_cmd_with_sync()
* [smmuv3] Keep "smmu" pointer in arm_smmu_inv but add "master" for ATS
v3:
https://lore.kernel.org/all/cover.1776381841.git.nicolinc@nvidia.com/
* Rebase on arm/smmu/updates branch + bug fix
* Update commit messages and inline comments
* [iommu] Drop unnecessary ops validation
* [iommu] Add missed function stub when !CONFIG_IOMMU_API
* [iommu] Change iommu_report_device_broken() to per gdev
* [iommu] Separate quarantine from pci_dev_reset_prepare()
* [iommu] Check reset failure in pci_dev_reset_iommu_done()
* [smmuv3] Fix STE update with try_cmpxchg64()
* [smmuv3] Fix "continue" bug when skipping ATC commands
* [smmuv3] Replace atomic_t prod_err with a lockless bitmap
* [smmuv3] Drop master->invs_domain; disable ATS per-master directly
* [smmuv3] Return -EIO for ATC timeout vs. -ETIMEDOUT for poll timeout
* [smmuv3] Replace INV_TYPE_ATS_DISABLED with per-master ats_broken flag
v2:
https://lore.kernel.org/all/cover.1773774441.git.nicolinc@nvidia.com/
* Rebase on arm_smmu_invs-v13 series
* Bisect batched ATC invalidation commands
* Drop the direct pci_reset_function() call
* Move the work queue from SMMUv3 to the core
* Perform a surgical STE update to disable EATS
* Wait for pci_dev_reset_iommu_done() to signal a recovery
v1:
https://lore.kernel.org/all/cover.1772686998.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (24):
PCI: Don't suspend IOMMU when probing reset capability
PCI: Propagate FLR return values to callers
iommu: Convert gdev->blocked from bool to enum gdev_blocked
iommu: Pass in reset result to pci_dev_reset_iommu_done()
iommu: Add reset_device_done callback for hardware fault recovery
iommu: Defer iommu_group free via kfree_rcu()
iommu: Defer __iommu_group_free_device() to be outside group->mutex
iommu: Change group->devices to RCU-protected list
iommu: Add group pointer to struct group_device
iommu: Add __iommu_group_block_device helper
iommu: Add iommu_report_device_broken() to quarantine a broken device
iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM
iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback
iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out
iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails
iommu/arm-smmu-v3: Add master in arm_smmu_inv for ATS entries
iommu/arm-smmu-v3: Introduce master->ats_broken flag
iommu/arm-smmu-v3: Add invs and has_ats to struct arm_smmu_cmdq_batch
iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper
iommu/arm-smmu-v3: Move arm_smmu_invs_for_each_entry to header
iommu/arm-smmu-v3: Introduce master->ats_invs
iommu/arm-smmu-v3: Serialize STE.EATS and ats_broken updates
iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 72 +++-
include/linux/iommu.h | 18 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 387 ++++++++++++++---
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 36 +-
drivers/iommu/iommu.c | 406 ++++++++++++++----
drivers/pci/pci-acpi.c | 2 +-
drivers/pci/pci.c | 21 +-
drivers/pci/quirks.c | 43 +-
8 files changed, 820 insertions(+), 165 deletions(-)
--
2.43.0