From: Nicolin Chen <nicolinc@nvidia.com>
To: Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
"Joerg Roedel" <joro@8bytes.org>,
Bjorn Helgaas <bhelgaas@google.com>,
"Jason Gunthorpe" <jgg@nvidia.com>
Cc: "Rafael J . Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Pranjal Shrivastava <praan@google.com>,
Mostafa Saleh <smostafa@google.com>,
Lu Baolu <baolu.lu@linux.intel.com>,
Kevin Tian <kevin.tian@intel.com>,
<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>,
Shuai Xue <xueshuai@linux.alibaba.com>
Subject: [PATCH v3 00/11] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout
Date: Thu, 16 Apr 2026 16:28:29 -0700
Message-ID: <cover.1776381841.git.nicolinc@nvidia.com>
Hi all,
This series addresses a critical vulnerability and stability issue in which an
unresponsive PCIe device that fails to process ATC (Address Translation Cache)
invalidation requests causes silent data corruption and a continuous stream of
SMMU CMDQ errors.
[ As Jason pointed out, because this series fundamentally introduces a new
RAS feature to quarantine faulty devices and recover from hardware faults,
and relies on a recently accepted SMMU driver rework, it is not treated as a
standard bug fix. Thus, none of the patches here carries a "Fixes" tag. ]
Currently, when an ATC invalidation times out, the SMMUv3 driver skips the
CMDQ_ERR_CERROR_ATC_INV_IDX error. This leaves the device's ATS cache state
desynchronized from the SMMU: the device may retain stale ATC entries for
memory pages that the OS has already reclaimed and reassigned, creating a
direct vector for data corruption. Furthermore, the driver may keep issuing
ATC_INV commands, resulting in a constant stream of CMDQ errors:
  unexpected global error reported (0x00000001), this could be serious
  CMDQ error (cons 0x0302bb84): ATC invalidate timeout
  unexpected global error reported (0x00000001), this could be serious
  CMDQ error (cons 0x0302bb88): ATC invalidate timeout
  unexpected global error reported (0x00000001), this could be serious
  CMDQ error (cons 0x0302bb8c): ATC invalidate timeout
  ...
To resolve this, introduce a mechanism to quarantine a broken device in the
SMMUv3 driver and the IOMMU core. This requires some preparatory changes:
- Tighten the semantics of pci_dev_reset_iommu_done(), which is now strictly
  called only upon a successful hardware reset
- Introduce a reset_device_done op, allowing the core to signal the driver
  when the physical hardware has been cleanly recovered (e.g., via AER or
  a manual reset) so that the quarantine can be lifted
- Add an iommu_report_device_broken() helper backed by a per-group_device
  work queue (WQ)
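
A minimal userspace sketch of the quarantine lifecycle described above, under
a deliberately simplified model. Every name here (struct gdev_sketch,
report_device_broken(), quarantine_work(), reset_device_done()) is
hypothetical and only stands in for iommu_report_device_broken(), its
per-group_device work item, and the reset_device_done op; it is not the
series' code. It shows two properties: only the first report queues work,
and the quarantine is lifted only when the reset result reports success.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the kernel's struct group_device. */
struct gdev_sketch {
	atomic_bool broken;	/* set by the first broken-device report */
	bool work_queued;	/* stands in for queue_work() on a per-gdev WQ */
	bool blocked;		/* device is attached to a blocking domain */
};

/* Error path: may run in atomic context, so it only flags and defers. */
static void report_device_broken(struct gdev_sketch *g)
{
	bool expected = false;

	/* Only the first reporter wins the CAS and queues the work. */
	if (atomic_compare_exchange_strong(&g->broken, &expected, true))
		g->work_queued = true;
}

/* Deferred work: in the kernel this could sleep and take group->mutex. */
static void quarantine_work(struct gdev_sketch *g)
{
	if (!g->work_queued)
		return;
	assert(atomic_load(&g->broken));
	g->work_queued = false;
	g->blocked = true;
}

/* Reset completion: lift the quarantine only if the reset succeeded. */
static void reset_device_done(struct gdev_sketch *g, int reset_result)
{
	if (reset_result)
		return;		/* e.g. -EIO: keep the device quarantined */
	g->blocked = false;
	atomic_store(&g->broken, false);
}
```

For example, two back-to-back reports leave a single pending work item; after
quarantine_work() runs, reset_device_done(g, -EIO) keeps the device blocked,
while reset_device_done(g, 0) unblocks it.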
On the SMMUv3 driver side, retry the timed-out ATC_INV batch to identify the
faulty device(s) via an atc_sync_timeouts tracker. Then perform a surgical STE
update and flag ATS as broken, so that further ATS/ATC requests are rejected at
the hardware level and the timeout spam is suppressed.
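
The retry step can be pictured as a bisection, the approach named in the v2
changelog ("Bisect batched atc invalidation commands"): re-issue each half of a
timed-out batch until every failing command is isolated, and record each
culprit in a bitmap updated with atomic OR operations rather than under a
lock. This is a hedged userspace sketch: issue_batch_and_sync(),
bisect_timeouts(), the faulty[] simulation and the atc_timeouts bitmap are
hypothetical stand-ins, not the driver's atc_sync_timeouts tracker or its
actual retry policy.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MASTERS	128
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS	((NR_MASTERS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical lockless tracker: one bit per master index. */
static _Atomic unsigned long atc_timeouts[NR_WORDS];

static void mark_timeout(size_t idx)
{
	atomic_fetch_or(&atc_timeouts[idx / BITS_PER_WORD],
			1UL << (idx % BITS_PER_WORD));
}

static bool test_timeout(size_t idx)
{
	return atomic_load(&atc_timeouts[idx / BITS_PER_WORD]) &
	       (1UL << (idx % BITS_PER_WORD));
}

/* Simulated hardware: these masters never answer ATC invalidations. */
static bool faulty[NR_MASTERS];

/* Stand-in for "issue ATC_INV for each master in the batch, then CMD_SYNC". */
static int issue_batch_and_sync(const size_t *masters, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (faulty[masters[i]])
			return -EIO;
	return 0;
}

/* Re-issue a timed-out batch in halves until each failing master is isolated. */
static void bisect_timeouts(const size_t *masters, size_t n)
{
	assert(n > 0);
	if (issue_batch_and_sync(masters, n) == 0)
		return;
	if (n == 1) {
		mark_timeout(masters[0]);
		return;
	}
	bisect_timeouts(masters, n / 2);
	bisect_timeouts(masters + n / 2, n - n / 2);
}
```

With faulty[42] and faulty[100] set, bisecting the batch {3, 7, 42, 64, 100}
marks exactly bits 42 and 100. For k faulty devices in a batch of n, the number
of re-syncs grows roughly with k * log2(n) rather than n, which favors
bisection over a one-by-one retry when faults are rare.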
This series is available on GitHub:
https://github.com/nicolinc/iommufd/commits/smmuv3_atc_timeout-v3
Note that the patches are rebased on a bug fix that is still under review:
https://lore.kernel.org/all/20260407194644.171304-1-nicolinc@nvidia.com/
Changelog
v3:
* Rebase on arm/smmu/updates branch + bug fix
* Update commit messages and inline comments
* [iommu] Drop unnecessary ops validation
* [iommu] Add missed function stub when !CONFIG_IOMMU_API
* [iommu] Change iommu_report_device_broken() to per gdev
* [iommu] Separate quarantine from pci_dev_reset_prepare()
* [iommu] Check reset failure in pci_dev_reset_iommu_done()
* [smmuv3] Fix STE update with try_cmpxchg64()
* [smmuv3] Fix "continue" bug when skipping ATC commands
* [smmuv3] Replace atomic_t prod_err with a lockless bitmap
* [smmuv3] Drop master->invs_domain; disable ATS per-master directly
* [smmuv3] Return -EIO for ATC timeout vs. -ETIMEDOUT for poll timeout
* [smmuv3] Replace INV_TYPE_ATS_DISABLED with per-master ats_broken flag
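
The "surgical STE update" and the try_cmpxchg64() fix can be illustrated with
a compare-and-exchange loop that clears only the EATS field of one 64-bit STE
word, so that a concurrent change to other bits is not lost. This is a sketch
under stated assumptions: the mask assumes EATS sits in bits [29:28] of STE
word 1 (check STRTAB_STE_1_EATS in arm-smmu-v3.h), C11
atomic_compare_exchange_weak() plays the role of the kernel's try_cmpxchg64()
(both refresh the expected value on failure), and ste_disable_eats() is a
hypothetical name. A real driver would still need to invalidate the cached STE
(CMD_CFGI_STE followed by CMD_SYNC) before the SMMU observes the change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed for illustration: EATS occupies bits [29:28] of STE word 1. */
#define STE1_EATS_MASK (UINT64_C(3) << 28)

/*
 * Clear EATS (ATS translation enable) in one STE word, leaving every
 * other field as-is. If another writer changed the word between the load
 * and the exchange, the CAS fails, 'old' is refreshed with the current
 * value, and 'new' is recomputed from it. Returns the previous value.
 */
static uint64_t ste_disable_eats(_Atomic uint64_t *ste1)
{
	uint64_t old = atomic_load(ste1);
	uint64_t new;

	do {
		new = old & ~STE1_EATS_MASK;
	} while (!atomic_compare_exchange_weak(ste1, &old, new));

	assert(!(new & STE1_EATS_MASK));
	return old;
}
```

For example, a word holding 0xF0000005 becomes 0xC0000005: bits 28 and 29 are
cleared, while bits 30, 31 and the low bits are preserved.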
v2:
https://lore.kernel.org/all/cover.1773774441.git.nicolinc@nvidia.com/
* Rebase on arm_smmu_invs-v13 series [0]
* Bisect batched ATC invalidation commands
* Drop the direct pci_reset_function() call
* Move the work queue from SMMUv3 to the core
* Perform a surgical STE update to disable EATS
* Wait for pci_dev_reset_iommu_done() to signal a recovery
v1:
https://lore.kernel.org/all/cover.1772686998.git.nicolinc@nvidia.com/
[0] https://lore.kernel.org/all/cover.1773733797.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (11):
PCI: Propagate FLR return values to callers
iommu: Pass in reset result to pci_dev_reset_iommu_done()
iommu: Add reset_device_done callback for hardware fault recovery
iommu: Add __iommu_group_block_device helper
iommu: Change group->devices to RCU-protected list
iommu: Defer __iommu_group_free_device() to be outside group->mutex
iommu: Add iommu_report_device_broken() to quarantine a broken device
iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
iommu/arm-smmu-v3: Replace smmu with master in arm_smmu_inv
iommu/arm-smmu-v3: Introduce master->ats_broken flag
iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 +-
include/linux/iommu.h | 15 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c | 34 ++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 193 +++++++++++-
drivers/iommu/iommu.c | 284 ++++++++++++++----
drivers/pci/pci-acpi.c | 2 +-
drivers/pci/pci.c | 10 +-
drivers/pci/quirks.c | 24 +-
8 files changed, 454 insertions(+), 112 deletions(-)
--
2.43.0