public inbox for linux-arm-kernel@lists.infradead.org
From: Robin Murphy <robin.murphy@arm.com>
To: Nicolin Chen <nicolinc@nvidia.com>,
	will@kernel.org, jgg@nvidia.com, kevin.tian@intel.com
Cc: joro@8bytes.org, praan@google.com, baolu.lu@linux.intel.com,
	miko.lenczewski@arm.com, smostafa@google.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	jamien@nvidia.com
Subject: Re: [PATCH rc v2 0/5] iommu/arm-smmu-v3: Fix device crash on kdump kernel
Date: Thu, 16 Apr 2026 17:49:24 +0100
Message-ID: <3eaf217f-8e1e-4d64-983a-6b888886f157@arm.com>
In-Reply-To: <cover.1776286352.git.nicolinc@nvidia.com>

On 15/04/2026 10:17 pm, Nicolin Chen wrote:
> When transitioning to a kdump kernel, the primary kernel might have crashed
> while endpoint devices were actively bus-mastering DMA. Currently, the SMMU
> driver aggressively resets the hardware during probe by clearing CR0_SMMUEN
> and setting the Global Bypass Attribute (GBPA) to ABORT.
> 
> In a kdump scenario, this aggressive reset is highly destructive:
> a) If GBPA is set to ABORT, in-flight DMA will be aborted, generating fatal
>     PCIe AER or SErrors that may panic the kdump kernel
> b) If GBPA is set to BYPASS, in-flight DMA targeting some IOVAs will bypass
>     the SMMU and corrupt the physical memory at those 1:1 mapped IOVAs.

But wasn't that rather the point? The kdump kernel doesn't know the scope 
of how much could have gone wrong (including potentially the SMMU 
configuration itself), so it just blocks everything, resets and 
re-enables the devices it cares about, and ignores whatever else might be 
on fire.

If AER can panic a kdump kernel, that seems like a failing of the kdump 
kernel itself more than anything else (especially given the likelihood 
that additional AER events could follow from whatever initial 
crash/failure triggered kdump to begin with). And frankly if some device 
getting a translation fault could directly SError the whole system, then 
I'd say that system is pretty doomed in general, kdump or not.

Thanks,
Robin.

> To safely absorb in-flight DMA, the kdump kernel must leave SMMUEN=1 intact
> and avoid modifying STRTAB_BASE. This allows HW to continue translating in-
> flight DMA using the crashed kernel's page tables until the endpoint device
> drivers probe and quiesce their respective hardware.
> 
> However, the ARM SMMUv3 architecture specification states that updating the
> SMMU_STRTAB_BASE register while SMMUEN == 1 is UNPREDICTABLE or ignored.
> 
> This leaves a kdump kernel no choice but to adopt the stream table from the
> crashed kernel.
> 
> In this series:
>   - Introduce an ARM_SMMU_OPT_KDUMP
>   - Skip SMMUEN and STRTAB_BASE resets in arm_smmu_device_reset()
>   - Map the crashed kernel's stream tables into the kdump kernel [*]
>   - Defer any default domain attachment to retain STEs until device drivers
>     explicitly request it.
> 
> [*] This is implemented via memremap, which only works on a coherent SMMU.
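
[Editorial illustration, not part of the posted series.] The difference
between the conventional reset and the behaviour the cover letter proposes
can be sketched with a toy register model. Every name below (toy_smmu,
toy_reset_normal, toy_reset_kdump, TOY_CR0_SMMUEN) is hypothetical, and the
fallback taken when SMMUEN is found clear is an assumption rather than
something the series states:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the arm-smmu-v3 driver's code. */
#define TOY_CR0_SMMUEN (1u << 0)

struct toy_smmu {
	uint32_t cr0;         /* stands in for SMMU_CR0 */
	uint64_t strtab_base; /* stands in for SMMU_STRTAB_BASE */
	bool gbpa_abort;      /* stands in for GBPA.ABORT */
};

/*
 * Conventional reset: stop translating, abort incoming transactions,
 * install a fresh stream table, then re-enable translation.
 */
static void toy_reset_normal(struct toy_smmu *s, uint64_t new_strtab)
{
	s->cr0 &= ~TOY_CR0_SMMUEN;
	s->gbpa_abort = true;
	s->strtab_base = new_strtab;
	s->cr0 |= TOY_CR0_SMMUEN;
}

/*
 * kdump-style reset as described in the cover letter: when SMMUEN is
 * already set, leave both SMMUEN and STRTAB_BASE untouched and hand the
 * inherited base back so the caller can adopt the old stream table.
 * Assumption: if SMMUEN is clear there is nothing to preserve, so the
 * sketch falls back to the conventional reset.
 */
static uint64_t toy_reset_kdump(struct toy_smmu *s, uint64_t new_strtab)
{
	if (!(s->cr0 & TOY_CR0_SMMUEN)) {
		toy_reset_normal(s, new_strtab);
		return new_strtab;
	}
	return s->strtab_base;
}
```

In this model, in-flight DMA keeps hitting the inherited stream table after
toy_reset_kdump(), whereas toy_reset_normal() passes through a window with
SMMUEN clear and GBPA set to abort, which is the "block everything"
behaviour defended in the reply above.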
> 
> Note that the entire series requires Jason's work that was merged in v6.12:
> 85196f54743d ("iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg").
> I have a backported version that is verified with a v6.8 kernel. I can send
> if we see a strong need after this version is accepted.
> 
> This is on GitHub:
> https://github.com/nicolinc/iommufd/commits/smmuv3_kdump-v2
> 
> Changelog
> v2
>   * Add warning in non-coherent SMMU cases
>   * Keep eventq/priq disabled vs. enabling and disabling later
>   * Check KDUMP option in the beginning of arm_smmu_device_reset()
>   * Validate STRTAB format matches HW capability instead of forcing flags
> v1:
>   https://lore.kernel.org/all/cover.1775763475.git.nicolinc@nvidia.com/
> 
> Nicolin Chen (5):
>    iommu/arm-smmu-v3: Add arm_smmu_adopt_strtab() for kdump
>    iommu/arm-smmu-v3: Implement is_attach_deferred() for kdump
>    iommu/arm-smmu-v3: Retain CR0_SMMUEN during kdump device reset
>    iommu/arm-smmu-v3: Skip EVTQ/PRIQ setup in kdump kernel
>    iommu/arm-smmu-v3: Detect ARM_SMMU_OPT_KDUMP in
>      arm_smmu_device_hw_probe()
> 
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 225 ++++++++++++++++++--
>   2 files changed, 207 insertions(+), 19 deletions(-)
> 



