Linux ARM-MSM sub-architecture
* [PATCH 0/3] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault
@ 2025-01-17 18:47 Connor Abbott
  2025-01-17 18:47 ` [PATCH 1/3] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Connor Abbott
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Connor Abbott @ 2025-01-17 18:47 UTC
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

drm/msm uses the stall-on-fault model to record the GPU state on the
first GPU page fault to help with debugging. On systems where the GPU
is paired with an MMU-500, there were two problems:

1. The MMU-500 doesn't de-assert its interrupt line until the fault is
   resumed, which led to a storm of interrupts until the fault handler
   was called. If we got unlucky and the fault handler was on the same
   CPU as the interrupt, there was a deadlock.
2. The GPU is capable of generating page faults much faster than we can
   resume them. The GMU (GPU Management Unit) shares the same context
   bank as the GPU, so a sudden burst of page faults would effectively
   starve it and trigger a watchdog reset. This was made even worse
   because the GPU cannot be reset while there is a pending
   transaction, which left the GPU permanently wedged.
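
To illustrate the first problem, here is a toy user-space model of a
level-triggered fault interrupt that stays asserted until the fault is
resumed. It is only a sketch under stated assumptions: every name in it
(toy_cb, toy_irq_handler, toy_run, ...) is hypothetical, it does not
touch real SMMU registers, and masking the interrupt until the resume
is shown as one possible way to stop the storm, not necessarily what
patch 1 does. It also does not model the same-CPU deadlock, only the
repeated handler invocations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one context bank. All names here are hypothetical. */
struct toy_cb {
	bool fault_stalled;	/* a stalled fault is waiting to be resumed */
	bool irq_enabled;	/* fault interrupt is unmasked */
	unsigned int irq_count;	/* how many times the IRQ handler ran */
};

/* Level-triggered: the line is high for as long as the fault is pending. */
static bool toy_irq_line(const struct toy_cb *cb)
{
	return cb->fault_stalled && cb->irq_enabled;
}

/*
 * The IRQ handler cannot resume the fault itself (that happens later,
 * in the deferred fault handler). Assumption: it may mask the interrupt
 * so that the line stops firing until the resume.
 */
static void toy_irq_handler(struct toy_cb *cb, bool mask_until_resume)
{
	cb->irq_count++;
	if (mask_until_resume)
		cb->irq_enabled = false;
}

/* Deferred fault handler: resume the transaction and unmask again. */
static void toy_resume(struct toy_cb *cb)
{
	cb->fault_stalled = false;
	cb->irq_enabled = true;
}

/*
 * Simulate 'delay' steps that pass before the deferred handler gets to
 * run, and return how many times the IRQ handler was entered.
 */
static unsigned int toy_run(bool mask_until_resume, unsigned int delay)
{
	struct toy_cb cb = {
		.fault_stalled = true,
		.irq_enabled = true,
		.irq_count = 0,
	};

	for (unsigned int step = 0; step < delay; step++)
		if (toy_irq_line(&cb))
			toy_irq_handler(&cb, mask_until_resume);

	toy_resume(&cb);
	assert(!toy_irq_line(&cb));	/* line drops once the fault is resumed */
	return cb.irq_count;
}
```

In this model, toy_run(false, n) enters the handler on every step until
the resume, while toy_run(true, n) enters it once per fault.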

Patch 1 fixes the first problem and is independent of the rest of the
series. Patch 3 fixes the second problem and is dependent on patch 2, so
there will have to be some cross-tree coordination.
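
As a rough illustration of the idea in the title of patch 3, the sketch
below turns stalling off on a page fault and only turns it back on
after a quiet period. This is a toy model, not the code in the patch:
the names, the 500 ms delay and the polling-style re-enable are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quiet period before stall-on-fault is turned back on. */
#define TOY_REENABLE_DELAY_MS 500

/* Hypothetical per-GPU state; not the layout used by the real driver. */
struct toy_stall_state {
	bool stall_enabled;
	uint64_t reenable_at_ms;
};

static void toy_stall_init(struct toy_stall_state *s)
{
	assert(s != NULL);
	s->stall_enabled = true;
	s->reenable_at_ms = 0;
}

/*
 * Called on every page fault. Returns true if this fault stalled (so
 * GPU state could be captured). Either way, stalling is switched off
 * and the re-enable deadline is pushed out.
 */
static bool toy_on_fault(struct toy_stall_state *s, uint64_t now_ms)
{
	bool stalled = s->stall_enabled;

	s->stall_enabled = false;
	s->reenable_at_ms = now_ms + TOY_REENABLE_DELAY_MS;
	return stalled;
}

/* Called from some periodic point; re-enables after the quiet period. */
static void toy_maybe_reenable(struct toy_stall_state *s, uint64_t now_ms)
{
	if (!s->stall_enabled && now_ms >= s->reenable_at_ms)
		s->stall_enabled = true;
}

/* Count how many of n_faults back-to-back faults (1 ms apart) stall. */
static unsigned int toy_count_stalls(unsigned int n_faults)
{
	struct toy_stall_state s;
	unsigned int stalls = 0;

	toy_stall_init(&s);
	for (unsigned int i = 0; i < n_faults; i++) {
		uint64_t now_ms = i;

		toy_maybe_reenable(&s, now_ms);
		if (toy_on_fault(&s, now_ms))
			stalls++;
	}
	return stalls;
}
```

With this scheme only the first fault of a burst stalls, so the state
for the first fault can still be recorded while the rest of the burst
no longer holds up the shared context bank.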

I've rebased this series on the latest linux-next to avoid rebase
troubles.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
Connor Abbott (3):
      iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
      iommu/arm-smmu-qcom: Make set_stall work when the device is on
      drm/msm: Temporarily disable stall-on-fault after a page fault

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  2 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      |  4 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    | 56 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h    | 21 +++++++++++
 drivers/gpu/drm/msm/msm_iommu.c            |  9 +++++
 drivers/gpu/drm/msm/msm_mmu.h              |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 46 +++++++++++++++++++++---
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 32 +++++++++++++++++
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  2 +-
 9 files changed, 167 insertions(+), 6 deletions(-)
---
base-commit: 0907e7fb35756464aa34c35d6abb02998418164b
change-id: 20250117-msm-gpu-fault-fixes-next-96e3098023e1

Best regards,
-- 
Connor Abbott <cwabbott0@gmail.com>



Thread overview: 5+ messages
2025-01-17 18:47 [PATCH 0/3] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
2025-01-17 18:47 ` [PATCH 1/3] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Connor Abbott
2025-01-17 19:34   ` Robin Murphy
2025-01-17 18:47 ` [PATCH 2/3] iommu/arm-smmu-qcom: Make set_stall work when the device is on Connor Abbott
2025-01-17 18:47 ` [PATCH 3/3] drm/msm: Temporarily disable stall-on-fault after a page fault Connor Abbott
