From: Pranjal Shrivastava <praan@google.com>
To: Mostafa Saleh <smostafa@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
Nicolin Chen <nicolinc@nvidia.com>,
will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
kees@kernel.org, baolu.lu@linux.intel.com, kevin.tian@intel.com,
miko.lenczewski@arm.com, linux-arm-kernel@lists.infradead.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
stable@vger.kernel.org, jamien@nvidia.com
Subject: Re: [PATCH rc v7 0/7] iommu/arm-smmu-v3: Fix device crash on kdump kernel
Date: Wed, 1 Jul 2026 13:36:29 +0000
Message-ID: <akUX3T3fIoN42sdM@google.com>
In-Reply-To: <akUQj2pa1W-MekgF@google.com>
On Wed, Jul 01, 2026 at 01:05:19PM +0000, Mostafa Saleh wrote:
> On Tue, Jun 30, 2026 at 03:59:42PM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 30, 2026 at 03:33:12PM +0000, Mostafa Saleh wrote:
> >
> > > For example patch#1 verifies log2size and split and both are read
> > > from HW registers. Same for the base address or other addresses as
> > > the page tables, they might be corrupted due to a buggy driver.
> > > My point is that, it is really hard to assume that the previous state
> > > of registers/STE/page-tables were valid or even consistent, when the
> > > kernel crashed and did not transition the state gracefully.
> >
> > Sure, and this mechanism is probably not very useful for debugging
> > these kinds of errors in the SMMU driver. Oh well, that isn't a common
> > source of kernel crashes :)
>
> I hope not! Although memory corruption can happen due to many other
> reasons :/
>
> I am not trying to bikeshed, but I am wondering if there is a more
> reliable way rather than doing archaeology from a panicked kernel
> SMMUv3 configuration, as I am worried that will be even harder to
> debug if it goes wrong.
>
> >
> > > Similarly for TLBs, the kernel might have panicked in the middle of an
> > > unmap or free domain. (not to mention what that means for RPM where
> > > a device reset with unknown TLBs)
> >
> > TLB is fine. kdump works by carving out a chunk of memory for the
> > future crash kernel. When the kernel boots it ignores all the memory
> > used by the prior kernel. So DMA can keep running into the old kernel's
> > memory with no issue. It doesn't matter if the TLBs are inconsistent or
> > not.
>
> Ideally if a TLB is to be missed (because of the panic), it should not
> point to kdump memory as it is carved-out. However, it is still a leap to
> assume that the TLBs are in a good shape as I mentioned with RPM (or
> even if the device resets transiently for some reason) it can end up
> with garbage in its TLBs.

Regarding RPM: even if we panicked while the SMMU was off in the
previous kernel, when we call device_reset() in the new kernel we
still issue the TLBI_ALL with the reset.

However, I agree with the overall problem, i.e. if an active device
unmaps the DMA address after the transaction in the previous kernel
(with the SMMU powered on) but the TLBI was missed due to a crash/panic,
any new DMA in the new kernel may go through the stale TLB entry and
alias onto memory in the previous (crashed) kernel, not the kdump
kernel.

So I agree that continuing DMA could be problematic, as we may corrupt
the very memory we want to analyze for the crash.

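To make the missed-TLBI scenario concrete, here is a toy model as a
minimal Python sketch. It is not driver code: ToySmmu, map_page,
unmap_page, tlbi_all, dma_write and run_scenario are all hypothetical
names, and the "TLB" is just a dictionary that caches translations
until it is explicitly invalidated.

```python
# Toy model only; nothing here corresponds to real SMMUv3 driver code.
# All class, function and constant names are hypothetical.

IOVA = 0x8000           # address the device uses for DMA
OLD_KERNEL_PA = 0x1000  # page owned by the previous (crashed) kernel


class ToySmmu:
    def __init__(self):
        self.page_table = {}  # IOVA -> PA, the in-memory translation
        self.tlb = {}         # cached IOVA -> PA translations

    def map_page(self, iova, pa):
        self.page_table[iova] = pa

    def unmap_page(self, iova, invalidate=True):
        self.page_table.pop(iova, None)
        if invalidate:
            self.tlb.pop(iova, None)  # the invalidation that follows an unmap

    def tlbi_all(self):
        self.tlb.clear()

    def translate(self, iova):
        if iova not in self.tlb:
            # TLB miss: walk the page table. A KeyError stands in for a
            # translation fault on an unmapped IOVA.
            self.tlb[iova] = self.page_table[iova]
        return self.tlb[iova]


def dma_write(smmu, memory, iova, value):
    memory[smmu.translate(iova)] = value


def run_scenario():
    memory = {}
    smmu = ToySmmu()

    # Previous kernel: map, do one DMA (this fills the TLB), then unmap.
    smmu.map_page(IOVA, OLD_KERNEL_PA)
    dma_write(smmu, memory, IOVA, "dma buffer")
    smmu.unmap_page(IOVA, invalidate=False)  # panic before the TLBI
    memory[OLD_KERNEL_PA] = "crash evidence"  # page reused by old kernel

    # DMA continues: the stale TLB entry still resolves.
    dma_write(smmu, memory, IOVA, "late dma")
    corrupted = memory[OLD_KERNEL_PA] == "late dma"

    # After a full invalidate the same DMA faults instead.
    smmu.tlbi_all()
    try:
        dma_write(smmu, memory, IOVA, "later dma")
        faulted = False
    except KeyError:
        faulted = True
    return corrupted, faulted
```

In this model the write that arrives before any invalidation overwrites
the old kernel's page, and the same write after tlbi_all() faults, which
is the window I am worried about.
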
Thanks,
Praan