From: Will Deacon <will@kernel.org>
To: Connor Abbott <cwabbott0@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>,
Robin Murphy <robin.murphy@arm.com>,
Joerg Roedel <joro@8bytes.org>, Sean Paul <sean@poorly.run>,
Konrad Dybcio <konradybcio@kernel.org>,
Abhinav Kumar <quic_abhinavk@quicinc.com>,
Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>,
Marijn Suijten <marijn.suijten@somainline.org>,
iommu@lists.linux.dev, linux-arm-msm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
freedreno@lists.freedesktop.org
Subject: Re: [PATCH v6 0/7] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault
Date: Tue, 20 May 2025 16:38:56 +0100
Message-ID: <20250520153855.GG18901@willie-the-truck>
In-Reply-To: <CACu1E7HdJvbx_6L9KvX3n78_cbkrey8npo=O=AkEzg335wJC=g@mail.gmail.com>
On Tue, May 20, 2025 at 10:42:49AM -0400, Connor Abbott wrote:
> On Tue, May 20, 2025 at 10:19 AM Will Deacon <will@kernel.org> wrote:
> > On Thu, May 15, 2025 at 03:58:42PM -0400, Connor Abbott wrote:
> > > drm/msm uses the stall-on-fault model to record the GPU state on the
> > > first GPU page fault to help debugging. On systems where the GPU is
> > > paired with an MMU-500, there were two problems:
> > >
> > > 1. The MMU-500 doesn't de-assert its interrupt line until the fault is
> > > resumed, which led to a storm of interrupts until the fault handler
> > > was called. If we got unlucky and the fault handler was on the same
> > > CPU as the interrupt, there was a deadlock.
> > > 2. The GPU is capable of generating page faults much faster than we can
> > > resume them. GMU (GPU Management Unit) shares the same context bank
> > > as the GPU, so if there was a sudden spurt of page faults it would be
> > > effectively starved and would trigger a watchdog reset, made even
> > > worse because the GPU cannot be reset while there's a pending
> > > transaction leaving the GPU permanently wedged.
> > >
> > > Patches 1-2 and 4 fix the first problem by switching the IRQ to be a
> > > threaded IRQ and then making drm/msm do its devcoredump work
> > > synchronously in the threaded IRQ. Patch 4 is dependent on patches 1-2.
> > > Patch 6 fixes the second problem and is dependent on patch 3. Patch 5 is
> > > a cleanup for patch 4 and patch 7 is a subsequent further cleanup to get
> > > rid of the resume_fault() callback once we switch resuming to being done
> > > by the SMMU's fault handler.
> >
> > Thanks for reworking this; I think it looks much better now from the
> > SMMU standpoint.
> >
> > > I've organized the series in the order that it should be picked up:
> > >
> > > - Patches 1-3 need to be applied to the iommu tree first.
> >
> > Which kernel version did you base these on? I can't seem to apply the
> > second patch, as you seem to have a stale copy of arm-smmu-qcom.c?
> >
> Sorry about that, for the next version I'll rebase on linux-next. I
> was using an older version of msm-next for a while now.

Can you base on v6.15-rc2 instead, please? linux-next is a moving
target so it's not massively helpful to use that.

Cheers,
Will