public inbox for linux-kernel@vger.kernel.org
* [PATCH rc v2 0/5] iommu/arm-smmu-v3: Fix device crash on kdump kernel
@ 2026-04-15 21:17 Nicolin Chen
From: Nicolin Chen @ 2026-04-15 21:17 UTC (permalink / raw)
  To: will, robin.murphy, jgg, kevin.tian
  Cc: joro, praan, baolu.lu, miko.lenczewski, smostafa,
	linux-arm-kernel, iommu, linux-kernel, stable, jamien

When transitioning to a kdump kernel, the primary kernel might have crashed
while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
and setting the Global Bypass Attribute (GBPA) to ABORT.

In a kdump scenario, this aggressive reset is highly destructive:
a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
   PCIe AER errors or SErrors that may panic the kdump kernel.
b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
   the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.

To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
and avoid modifying STRTAB_BASE. This allows HW to continue translating
in-flight DMA using the crashed kernel's page tables until the endpoint
device drivers probe and quiesce their respective hardware.

However, the ARM SMMUv3 architecture specification states that updating the
SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.

This leaves a kdump kernel no choice but to adopt the stream table from the
crashed kernel.
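
As a rough illustration of the reasoning above, here is a small user-space
model of the reset decision. It is only a sketch and not the driver code:
struct smmu_model, choose_reset_action() and model_reset() are hypothetical
names, and gating the kdump path on SMMUEN having been left set is an
assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_SMMUEN	(1u << 0)	/* SMMU_CR0.SMMUEN */

/* Hypothetical software model of the two registers discussed above. */
struct smmu_model {
	uint32_t cr0;
	uint64_t strtab_base;
};

enum reset_action {
	RESET_FULL,		/* disable the SMMU and install a fresh stream table */
	RESET_ADOPT_STRTAB,	/* keep SMMUEN and STRTAB_BASE, reuse the old table */
};

/*
 * Assumption for this sketch: the adopt path is taken only when we are a
 * kdump kernel and the crashed kernel left translation enabled.
 */
static enum reset_action choose_reset_action(const struct smmu_model *smmu,
					     bool is_kdump)
{
	assert(smmu);
	if (is_kdump && (smmu->cr0 & CR0_SMMUEN))
		return RESET_ADOPT_STRTAB;
	return RESET_FULL;
}

static void model_reset(struct smmu_model *smmu, bool is_kdump,
			uint64_t new_strtab_base)
{
	/* kdump: in-flight DMA keeps translating through the old table. */
	if (choose_reset_action(smmu, is_kdump) == RESET_ADOPT_STRTAB)
		return;

	/* Normal boot: STRTAB_BASE is only written while SMMUEN == 0. */
	smmu->cr0 &= ~CR0_SMMUEN;
	smmu->strtab_base = new_strtab_base;
	smmu->cr0 |= CR0_SMMUEN;
}
```

With is_kdump set and SMMUEN already 1, model_reset() leaves both fields
untouched; in every other case it performs the usual disable, program and
enable sequence.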

In this series:
 - Introduce an ARM_SMMU_OPT_KDUMP option
 - Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
 - Map the crashed kernel's stream tables into the kdump kernel [*]
 - Defer any default domain attachment to retain STEs until device drivers
   explicitly request it.

[*] This is implemented via memremap, which only works on a coherent SMMU.
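
Before the old stream table can be mapped, its format and size have to be
recovered from what the crashed kernel programmed. The following user-space
sketch shows one way such a check could look. The field positions (FMT in
bits [17:16], SPLIT in [10:6], LOG2SIZE in [5:0] of STRTAB_BASE_CFG) follow
my reading of the register layout and should be checked against the spec;
struct hw_caps, struct strtab_layout and decode_strtab_cfg() are
hypothetical names, not the helpers added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed STRTAB_BASE_CFG field layout, see the note above. */
#define CFG_LOG2SIZE(v)	((v) & 0x3fu)
#define CFG_SPLIT(v)	(((v) >> 6) & 0x1fu)
#define CFG_FMT(v)	(((v) >> 16) & 0x3u)
#define FMT_LINEAR	0u
#define FMT_2LVL	1u

/* Hypothetical summary of what the hardware reports about itself. */
struct hw_caps {
	bool two_level;		/* 2-level stream tables supported */
	unsigned int sid_bits;	/* number of StreamID bits implemented */
};

struct strtab_layout {
	bool two_level;
	unsigned int log2size;
	unsigned int split;
};

/*
 * Decode the inherited configuration and reject it if it does not match
 * what the hardware can do, rather than forcing a format.
 */
static bool decode_strtab_cfg(uint32_t cfg, const struct hw_caps *caps,
			      struct strtab_layout *out)
{
	unsigned int fmt = CFG_FMT(cfg);

	assert(caps && out);
	if (fmt != FMT_LINEAR && fmt != FMT_2LVL)
		return false;	/* reserved encoding */
	if (fmt == FMT_2LVL && !caps->two_level)
		return false;	/* HW cannot walk a 2-level table */
	if (CFG_LOG2SIZE(cfg) > caps->sid_bits)
		return false;	/* table covers more StreamIDs than exist */

	out->two_level = (fmt == FMT_2LVL);
	out->log2size = CFG_LOG2SIZE(cfg);
	out->split = out->two_level ? CFG_SPLIT(cfg) : 0;
	return true;
}
```

A caller would only go on to map the table when this returns true, and
would otherwise fall back to whatever the non-kdump path does.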

Note that the entire series requires Jason's work that was merged in v6.12:
85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
I have a backported version that is verified with a v6.8 kernel. I can send
it if we see a strong need after this version is accepted.

This is on GitHub:
https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v2

Changelog
v2
 * Add a warning in non-coherent SMMU cases
 * Keep eventq/priq disabled instead of enabling them and disabling them later
 * Check KDUMP option at the beginning of arm_smmu_device_reset()
 * Validate STRTAB format matches HW capability instead of forcing flags
v1:
 https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/

Nicolin Chen (5):
  iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
  iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
  iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
  iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
  iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in
    arm_smmu_device_hw_probe()

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 225 ++++++++++++++++++--
 2 files changed, 207 insertions(+), 19 deletions(-)

-- 
2.43.0



Thread overview: 10+ messages
2026-04-15 21:17 [PATCH rc v2 0/5] iommu/arm-smmu-v3: Fix device crash on kdump kernel Nicolin Chen
2026-04-15 21:17 ` [PATCH rc v2 1/5] iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump Nicolin Chen
2026-04-15 21:17 ` [PATCH rc v2 2/5] iommu/arm-smmu-v3: Implement is_attach_deferred() " Nicolin Chen
2026-04-15 21:17 ` [PATCH rc v2 3/5] iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset Nicolin Chen
2026-04-15 21:17 ` [PATCH rc v2 4/5] iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel Nicolin Chen
2026-04-15 21:17 ` [PATCH rc v2 5/5] iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in arm_smmu_device_hw_probe() Nicolin Chen
2026-04-16 16:49 ` [PATCH rc v2 0/5] iommu/arm-smmu-v3: Fix device crash on kdump kernel Robin Murphy
2026-04-16 17:20   ` Jason Gunthorpe
2026-04-17  7:48     ` Tian, Kevin
2026-04-17 11:59       ` Jason Gunthorpe
