[PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault
@ 2025-03-04 16:56 Connor Abbott
  2025-03-04 16:56 ` [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault Connor Abbott
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Connor Abbott @ 2025-03-04 16:56 UTC (permalink / raw)
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

drm/msm uses the stall-on-fault model to record the GPU state on the
first GPU page fault to help debugging. On systems where the GPU is
paired with a MMU-500, there were two problems:

1. The MMU-500 doesn't de-assert its interrupt line until the fault is
   resumed, which led to a storm of interrupts until the fault handler
   was called. If we got unlucky and the fault handler was on the same
   CPU as the interrupt, there was a deadlock.
2. The GPU is capable of generating page faults much faster than we can
   resume them. GMU (GPU Management Unit) shares the same context bank
   as the GPU, so if there was a sudden spurt of page faults it would be
   effectively starved and would trigger a watchdog reset, made even
   worse because the GPU cannot be reset while there's a pending
   transaction leaving the GPU permanently wedged.

Patches 1-3 fixes the first problem and is independent of the rest of the
series. Patch 5 fixes the second problem and is dependent on patch 4, so
there will have to be some cross-tree coordination.

I've rebased this series on the latest linux-next to avoid rebase
troubles.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
Changes in v4:
- Add patches 1-2, which fix reading registers in drm/msm when
  acknowledging the fault early. This was Robin's preferred solution
  compared to making drm/msm's fault handler tell arm-smmu to resume the
  fault.
- Link to v3: https://lore.kernel.org/r/20250122-msm-gpu-fault-fixes-next-v3-0-0afa00158521@gmail.com

Changes in v3:
- Acknowledge the fault before resuming the transaction in patch 1.
- Add suggested extra context to commit messages.
- Link to v2: https://lore.kernel.org/r/20250120-msm-gpu-fault-fixes-next-v2-0-d636c4027042@gmail.com

Changes in v2:
- Remove unnecessary _irqsave when locking in IRQ handler (Robin)
- Reuse existing spinlock for CFIE manipulation (Robin)
- Lock CFCFG manipulation against concurrent CFIE manipulation
- Don't use timer to re-enable stall-on-fault. (Rob)
- Use more descriptive name for the function that re-enables
  stall-on-fault if the cooldown period has ended. (Rob)
- Link to v1: https://lore.kernel.org/r/20250117-msm-gpu-fault-fixes-next-v1-0-bc9b332b5d0b@gmail.com

---
Connor Abbott (5):
      iommu/arm-smmu: Save additional information on context fault
      iommu/arm-smmu-qcom: Don't read fault registers directly
      iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
      iommu/arm-smmu-qcom: Make set_stall work when the device is on
      drm/msm: Temporarily disable stall-on-fault after a page fault

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c            |  2 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c            |  4 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c          | 42 ++++++++++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h          | 24 ++++++++
 drivers/gpu/drm/msm/msm_iommu.c                  |  9 +++
 drivers/gpu/drm/msm/msm_mmu.h                    |  1 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c       | 64 ++++++++++++++-----
 drivers/iommu/arm/arm-smmu/arm-smmu.c            | 78 ++++++++++++++++++------
 drivers/iommu/arm/arm-smmu/arm-smmu.h            | 19 +++---
 10 files changed, 204 insertions(+), 43 deletions(-)
---
base-commit: 866e43b945bf98f8e807dfa45eca92f931f3a032
change-id: 20250117-msm-gpu-fault-fixes-next-96e3098023e1

Best regards,
-- 
Connor Abbott <cwabbott0@gmail.com>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-04 16:56 [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
@ 2025-03-04 16:56 ` Connor Abbott
  2025-03-05 19:09   ` Rob Clark
  2025-03-11 18:05   ` Will Deacon
  2025-03-04 16:56 ` [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly Connor Abbott
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 33+ messages in thread
From: Connor Abbott @ 2025-03-04 16:56 UTC (permalink / raw)
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

This will be used by drm/msm for GPU page faults, replacing the manual
register reading it does.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 ++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c            | 27 +++++++++++++-----------
 drivers/iommu/arm/arm-smmu/arm-smmu.h            |  5 ++++-
 3 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
index 548783f3f8e89fd978367afa65c473002f66e2e7..ae4fdbbce6ba80440f539557a39866a932360d4e 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
@@ -400,7 +400,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
 
 	if (list_empty(&tbu_list)) {
 		ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
-					 cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+					 cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 
 		if (ret == -ENOSYS)
 			arm_smmu_print_context_fault_info(smmu, idx, &cfi);
@@ -412,7 +412,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
 	phys_soft = ops->iova_to_phys(ops, cfi.iova);
 
 	tmp = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
-				 cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+				 cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 	if (!tmp || tmp == -EBUSY) {
 		ret = IRQ_HANDLED;
 		resume = ARM_SMMU_RESUME_TERMINATE;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
 				      struct arm_smmu_context_fault_info *cfi)
 {
 	cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
+	cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
 	cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
-	cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
+	cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
+	cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
 	cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
+	cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
 }
 
 void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
@@ -419,7 +422,7 @@ void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
 {
 	dev_err(smmu->dev,
 		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
-		cfi->fsr, cfi->iova, cfi->fsynr, cfi->cbfrsynra, idx);
+		cfi->fsr, cfi->iova, cfi->fsynr0, cfi->cbfrsynra, idx);
 
 	dev_err(smmu->dev, "FSR    = %08x [%s%sFormat=%u%s%s%s%s%s%s%s%s], SID=0x%x\n",
 		cfi->fsr,
@@ -437,15 +440,15 @@ void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
 		cfi->cbfrsynra);
 
 	dev_err(smmu->dev, "FSYNR0 = %08x [S1CBNDX=%u%s%s%s%s%s%s PLVL=%u]\n",
-		cfi->fsynr,
-		(u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_S1CBNDX, cfi->fsynr),
-		(cfi->fsynr & ARM_SMMU_CB_FSYNR0_AFR) ? " AFR" : "",
-		(cfi->fsynr & ARM_SMMU_CB_FSYNR0_PTWF) ? " PTWF" : "",
-		(cfi->fsynr & ARM_SMMU_CB_FSYNR0_NSATTR) ? " NSATTR" : "",
-		(cfi->fsynr & ARM_SMMU_CB_FSYNR0_IND) ? " IND" : "",
-		(cfi->fsynr & ARM_SMMU_CB_FSYNR0_PNU) ? " PNU" : "",
-		(cfi->fsynr & ARM_SMMU_CB_FSYNR0_WNR) ? " WNR" : "",
-		(u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_PLVL, cfi->fsynr));
+		cfi->fsynr0,
+		(u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_S1CBNDX, cfi->fsynr0),
+		(cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_AFR) ? " AFR" : "",
+		(cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_PTWF) ? " PTWF" : "",
+		(cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_NSATTR) ? " NSATTR" : "",
+		(cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_IND) ? " IND" : "",
+		(cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_PNU) ? " PNU" : "",
+		(cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR) ? " WNR" : "",
+		(u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_PLVL, cfi->fsynr0));
 }
 
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
@@ -464,7 +467,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 		return IRQ_NONE;
 
 	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
-		cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+		cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 
 	if (ret == -ENOSYS && __ratelimit(&rs))
 		arm_smmu_print_context_fault_info(smmu, idx, &cfi);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index e2aeb511ae903302e3c15d2cf5f22e2a26ac2346..d3bc77dcd4d40f25bc70f3289616fb866649b022 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -543,9 +543,12 @@ int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
 struct arm_smmu_context_fault_info {
 	unsigned long iova;
+	u64 ttbr0;
 	u32 fsr;
-	u32 fsynr;
+	u32 fsynr0;
+	u32 fsynr1;
 	u32 cbfrsynra;
+	u32 contextidr;
 };
 
 void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-04 16:56 [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
  2025-03-04 16:56 ` [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault Connor Abbott
@ 2025-03-04 16:56 ` Connor Abbott
  2025-03-05 19:08   ` Rob Clark
  2025-03-11 18:08   ` Will Deacon
  2025-03-04 16:56 ` [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Connor Abbott
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 33+ messages in thread
From: Connor Abbott @ 2025-03-04 16:56 UTC (permalink / raw)
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

In some cases drm/msm has to resume a stalled transaction directly in
its fault handler. Experimentally this doesn't work on SMMU500 if the
fault hasn't already been acknowledged by clearing FSR. Rather than
trying to clear FSR in msm's fault handler and implementing a
tricky handshake to avoid accidentally clearing FSR twice, we want to
clear FSR before calling the fault handlers, but this means that the
contents of registers can change underneath us in the fault handler and
msm currently uses a private function to read the register contents for
its own purposes in its fault handler, such as using the
implementation-defined FSYNR1 to determine which block caused the fault.
Fix this by making msm use the register values already read by arm-smmu
itself before clearing FSR rather than messing around with reading
registers directly.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
 drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
 3 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 6372f3e25c4bc24cb52f9233095170e8aa510a53..186d6ad4fd1c990398df4dec53f4d58ada9e658c 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -62,16 +62,15 @@ static void qcom_adreno_smmu_get_fault_info(const void *cookie,
 		struct adreno_smmu_fault_info *info)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
-	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
-
-	info->fsr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSR);
-	info->fsynr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR0);
-	info->fsynr1 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR1);
-	info->far = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_FAR);
-	info->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
-	info->ttbr0 = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
-	info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
+	struct arm_smmu_context_fault_info *cfi = &smmu_domain->cfi;
+
+	info->fsr = cfi->fsr;
+	info->fsynr0 = cfi->fsynr0;
+	info->fsynr1 = cfi->fsynr1;
+	info->far = cfi->iova;
+	info->cbfrsynra = cfi->cbfrsynra;
+	info->ttbr0 = cfi->ttbr0;
+	info->contextidr = cfi->contextidr;
 }
 
 static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index a9213e0f1579d1e3be0bfba75eea1d5de23117de..498b96e95cb4fdb67c246ef13de1eb8f40d68f7d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -453,26 +453,26 @@ void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
 
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 {
-	struct arm_smmu_context_fault_info cfi;
 	struct arm_smmu_domain *smmu_domain = dev;
+	struct arm_smmu_context_fault_info *cfi = &smmu_domain->cfi;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
 	int idx = smmu_domain->cfg.cbndx;
 	int ret;
 
-	arm_smmu_read_context_fault_info(smmu, idx, &cfi);
+	arm_smmu_read_context_fault_info(smmu, idx, cfi);
 
-	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
+	if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT))
 		return IRQ_NONE;
 
-	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
-		cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
+	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi->iova,
+		cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 
 	if (ret == -ENOSYS && __ratelimit(&rs))
-		arm_smmu_print_context_fault_info(smmu, idx, &cfi);
+		arm_smmu_print_context_fault_info(smmu, idx, cfi);
 
-	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
+	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
 	return IRQ_HANDLED;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
 	ARM_SMMU_DOMAIN_NESTED,
 };
 
+struct arm_smmu_context_fault_info {
+	unsigned long iova;
+	u64 ttbr0;
+	u32 fsr;
+	u32 fsynr0;
+	u32 fsynr1;
+	u32 cbfrsynra;
+	u32 contextidr;
+};
+
 struct arm_smmu_domain {
 	struct arm_smmu_device		*smmu;
 	struct io_pgtable_ops		*pgtbl_ops;
@@ -380,6 +390,7 @@ struct arm_smmu_domain {
 	const struct iommu_flush_ops	*flush_ops;
 	struct arm_smmu_cfg		cfg;
 	enum arm_smmu_domain_stage	stage;
+	struct arm_smmu_context_fault_info cfi;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	struct iommu_domain		domain;
@@ -541,16 +552,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
 void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
-struct arm_smmu_context_fault_info {
-	unsigned long iova;
-	u64 ttbr0;
-	u32 fsr;
-	u32 fsynr0;
-	u32 fsynr1;
-	u32 cbfrsynra;
-	u32 contextidr;
-};
-
 void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
 				      struct arm_smmu_context_fault_info *cfi);
 

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
  2025-03-04 16:56 [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
  2025-03-04 16:56 ` [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault Connor Abbott
  2025-03-04 16:56 ` [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly Connor Abbott
@ 2025-03-04 16:56 ` Connor Abbott
  2025-03-05 19:12   ` Rob Clark
  2025-03-04 16:56 ` [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on Connor Abbott
  2025-03-04 16:56 ` [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault Connor Abbott
  4 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-04 16:56 UTC (permalink / raw)
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS
asserts an interrupt. The only way to clear that bit is to resume the
transaction by writing SMMU_CBn_RESUME, but typically resuming the
transaction requires complex operations (copying in pages, etc.) that
can't be done in IRQ context. drm/msm already has a problem, because
its fault handler sometimes schedules a job to dump the GPU state and
doesn't resume translation until this is complete.

Work around this by disabling context fault interrupts until after the
transaction is resumed. Because other context banks can share an IRQ
line, we may still get an interrupt intended for another context bank,
but in this case only SMMU_CBn_FSR.SS will be asserted and we can skip
it assuming that interrupts are disabled which is accomplished by
removing the bit from ARM_SMMU_CB_FSR_FAULT. SMMU_CBn_FSR.SS won't be
asserted unless an external user enabled stall-on-fault, and they are
expected to resume the translation and re-enable interrupts.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 ++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 41 +++++++++++++++++++++++++++++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 -
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 186d6ad4fd1c990398df4dec53f4d58ada9e658c..a428e53add08d451fb2152e3ab80e0fba936e214 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -90,12 +90,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	u32 reg = 0;
+	u32 reg = 0, sctlr;
+	unsigned long flags;
 
 	if (terminate)
 		reg |= ARM_SMMU_RESUME_TERMINATE;
 
+	spin_lock_irqsave(&smmu_domain->cb_lock, flags);
+
 	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
+
+	/*
+	 * Re-enable interrupts after they were disabled by
+	 * arm_smmu_context_fault().
+	 */
+	sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
+	sctlr |= ARM_SMMU_SCTLR_CFIE;
+	arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr);
+
+	spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
 }
 
 #define QCOM_ADRENO_SMMU_GPU_SID 0
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 498b96e95cb4fdb67c246ef13de1eb8f40d68f7d..284079ef95cd2deeb71816a284850523897badd8 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -466,13 +466,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT))
 		return IRQ_NONE;
 
+	/*
+	 * On some implementations FSR.SS asserts a context fault
+	 * interrupt. We do not want this behavior, because resolving the
+	 * original context fault typically requires operations that cannot be
+	 * performed in IRQ context but leaving the stall unacknowledged will
+	 * immediately lead to another spurious interrupt as FSR.SS is still
+	 * set. Work around this by disabling interrupts for this context bank.
+	 * It's expected that interrupts are re-enabled after resuming the
+	 * translation.
+	 *
+	 * We have to do this before report_iommu_fault() so that we don't
+	 * leave interrupts disabled in case the downstream user decides the
+	 * fault can be resolved inside its fault handler.
+	 *
+	 * There is a possible race if there are multiple context banks sharing
+	 * the same interrupt and both signal an interrupt in between writing
+	 * RESUME and SCTLR. We could disable interrupts here before we
+	 * re-enable them in the resume handler, leaving interrupts enabled.
+	 * Lock the write to serialize it with the resume handler.
+	 */
+	if (cfi->fsr & ARM_SMMU_CB_FSR_SS) {
+		u32 val;
+
+		spin_lock(&smmu_domain->cb_lock);
+		val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
+		val &= ~ARM_SMMU_SCTLR_CFIE;
+		arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val);
+		spin_unlock(&smmu_domain->cb_lock);
+	}
+
+	/*
+	 * The SMMUv2 architecture specification says that if stall-on-fault is
+	 * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
+	 * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
+	 * first before running the user's fault handler to make sure we follow
+	 * this sequence. It should be ok if there is another fault in the
+	 * meantime because we have already read the fault info.
+	 */
+	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
+
 	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi->iova,
 		cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
 
 	if (ret == -ENOSYS && __ratelimit(&rs))
 		arm_smmu_print_context_fault_info(smmu, idx, cfi);
 
-	arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
 	return IRQ_HANDLED;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 411d807e0a7033833716635efb3968a0bd3ff237..4235b772c2cb032778816578c9e6644512543a5e 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -214,7 +214,6 @@ enum arm_smmu_cbar_type {
 					 ARM_SMMU_CB_FSR_TLBLKF)
 
 #define ARM_SMMU_CB_FSR_FAULT		(ARM_SMMU_CB_FSR_MULTI |	\
-					 ARM_SMMU_CB_FSR_SS |		\
 					 ARM_SMMU_CB_FSR_UUT |		\
 					 ARM_SMMU_CB_FSR_EF |		\
 					 ARM_SMMU_CB_FSR_PF |		\

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-04 16:56 [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
                   ` (2 preceding siblings ...)
  2025-03-04 16:56 ` [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Connor Abbott
@ 2025-03-04 16:56 ` Connor Abbott
  2025-03-05 19:09   ` Rob Clark
  2025-03-11 18:11   ` Will Deacon
  2025-03-04 16:56 ` [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault Connor Abbott
  4 siblings, 2 replies; 33+ messages in thread
From: Connor Abbott @ 2025-03-04 16:56 UTC (permalink / raw)
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

Up until now we have only called the set_stall callback during
initialization when the device is off. But we will soon start calling it
to temporarily disable stall-on-fault when the device is on, so handle
that by checking if the device is on and writing SCTLR.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
 {
 	struct arm_smmu_domain *smmu_domain = (void *)cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+	u32 mask = BIT(cfg->cbndx);
+	bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
+	unsigned long flags;
 
 	if (enabled)
-		qsmmu->stall_enabled |= BIT(cfg->cbndx);
+		qsmmu->stall_enabled |= mask;
 	else
-		qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
+		qsmmu->stall_enabled &= ~mask;
+
+	/*
+	 * If the device is on and we changed the setting, update the register.
+	 */
+	if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
+		spin_lock_irqsave(&smmu_domain->cb_lock, flags);
+
+		u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
+
+		if (enabled)
+			reg |= ARM_SMMU_SCTLR_CFCFG;
+		else
+			reg &= ~ARM_SMMU_SCTLR_CFCFG;
+
+		arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
+
+		spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
+
+		pm_runtime_put_autosuspend(smmu->dev);
+	}
 }
 
 static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault
  2025-03-04 16:56 [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
                   ` (3 preceding siblings ...)
  2025-03-04 16:56 ` [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on Connor Abbott
@ 2025-03-04 16:56 ` Connor Abbott
  2025-03-05 19:07   ` Rob Clark
  4 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-04 16:56 UTC (permalink / raw)
  To: Rob Clark, Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten
  Cc: iommu, linux-arm-msm, linux-arm-kernel, freedreno, Connor Abbott

When things go wrong, the GPU is capable of quickly generating millions
of faulting translation requests per second. When that happens, in the
stall-on-fault model each access will stall until it wins the race to
signal the fault and then the RESUME register is written. This slows
processing page faults to a crawl as the GPU can generate faults much
faster than the CPU can acknowledge them. It also means that all
available resources in the SMMU are saturated waiting for the stalled
transactions, so that other transactions such as transactions generated
by the GMU, which shares a context bank with the GPU, cannot proceed.
This causes a GMU watchdog timeout, which leads to a failed reset
because GX cannot collapse when there is a transaction pending and a
permanently hung GPU.

On older platforms with qcom,smmu-v2, it seems that when one transaction
is stalled subsequent faulting transactions are terminated, which avoids
this problem, but the MMU-500 follows the spec here.

To work around these problem, disable stall-on-fault as soon as we get a
page fault until a cooldown period after pagefaults stop. This allows
the GMU some guaranteed time to continue working. We only use
stall-on-fault to halt the GPU while we collect a devcoredump and we
always terminate the transaction afterward, so it's fine to miss some
subsequent page faults. We also keep it disabled so long as the current
devcoredump hasn't been deleted, because in that case we likely won't
capture another one if there's a fault.

After this commit HFI messages still occasionally time out, because the
crashdump handler doesn't run fast enough to let the GMU resume, but the
driver seems to recover from it. This will probably go away after the
HFI timeout is increased.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 ++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  4 ++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 42 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++
 drivers/gpu/drm/msm/msm_iommu.c         |  9 +++++++
 drivers/gpu/drm/msm/msm_mmu.h           |  1 +
 6 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 71dca78cd7a5324e9ff5b14f173e2209fa42e196..670141531112c9d29cef8ef1fd51b74759fdd6d2 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -131,6 +131,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	adreno_check_and_reenable_stall(adreno_gpu);
+
 	if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
 		ring->cur_ctx_seqno = 0;
 		a5xx_submit_in_rb(gpu, submit);
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 0ae29a7c8a4d3f74236a35cc919f69d5c0a384a0..5a34cd2109a2d74c92841448a61ccb0d4f34e264 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -212,6 +212,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	adreno_check_and_reenable_stall(adreno_gpu);
+
 	a6xx_set_pagetable(a6xx_gpu, ring, submit);
 
 	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
@@ -335,6 +337,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	adreno_check_and_reenable_stall(adreno_gpu);
+
 	/*
 	 * Toggle concurrent binning for pagetable switch and set the thread to
 	 * BR since only it can execute the pagetable switch packets.
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 1238f326597808eb28b4c6822cbd41a26e555eb9..bac586101dc0494f46b069a8440a45825dfe9b5e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -246,16 +246,53 @@ u64 adreno_private_address_space_size(struct msm_gpu *gpu)
 	return SZ_4G;
 }
 
+void adreno_check_and_reenable_stall(struct adreno_gpu *adreno_gpu)
+{
+	struct msm_gpu *gpu = &adreno_gpu->base;
+	unsigned long flags;
+
+	/*
+	 * Wait until the cooldown period has passed and we would actually
+	 * collect a crashdump to re-enable stall-on-fault.
+	 */
+	spin_lock_irqsave(&adreno_gpu->fault_stall_lock, flags);
+	if (!adreno_gpu->stall_enabled &&
+			ktime_after(ktime_get(), adreno_gpu->stall_reenable_time) &&
+			!READ_ONCE(gpu->crashstate)) {
+		adreno_gpu->stall_enabled = true;
+
+		gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, true);
+	}
+	spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, flags);
+}
+
 #define ARM_SMMU_FSR_TF                 BIT(1)
 #define ARM_SMMU_FSR_PF			BIT(3)
 #define ARM_SMMU_FSR_EF			BIT(4)
+#define ARM_SMMU_FSR_SS			BIT(30)
 
 int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
 			 struct adreno_smmu_fault_info *info, const char *block,
 			 u32 scratch[4])
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	const char *type = "UNKNOWN";
-	bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
+	bool do_devcoredump = info && (info->fsr & ARM_SMMU_FSR_SS) &&
+		!READ_ONCE(gpu->crashstate);
+	unsigned long irq_flags;
+
+	/*
+	 * In case there is a subsequent storm of pagefaults, disable
+	 * stall-on-fault for at least half a second.
+	 */
+	spin_lock_irqsave(&adreno_gpu->fault_stall_lock, irq_flags);
+	if (adreno_gpu->stall_enabled) {
+		adreno_gpu->stall_enabled = false;
+
+		gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, false);
+	}
+	adreno_gpu->stall_reenable_time = ktime_add_ms(ktime_get(), 500);
+	spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, irq_flags);
 
 	/*
 	 * If we aren't going to be resuming later from fault_worker, then do
@@ -1143,6 +1180,9 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		adreno_gpu->info->inactive_period);
 	pm_runtime_use_autosuspend(dev);
 
+	spin_lock_init(&adreno_gpu->fault_stall_lock);
+	adreno_gpu->stall_enabled = true;
+
 	return msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
 			gpu_name, &adreno_gpu_config);
 }
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index dcf454629ce037b2a8274a6699674ad754ce1f07..a528036b46216bd898f6d48c5fb0555c4c4b053b 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -205,6 +205,28 @@ struct adreno_gpu {
 	/* firmware: */
 	const struct firmware *fw[ADRENO_FW_MAX];
 
+	/**
+	 * fault_stall_lock:
+	 *
+	 * Serialize changes to stall-on-fault state.
+	 */
+	spinlock_t fault_stall_lock;
+
+	/**
+	 * fault_stall_reenable_time:
+	 *
+	 * if stall_enabled is false, when to reenable stall-on-fault.
+	 */
+	ktime_t stall_reenable_time;
+
+	/**
+	 * stall_enabled:
+	 *
+	 * Whether stall-on-fault is currently enabled.
+	 */
+	bool stall_enabled;
+
+
 	struct {
 		/**
 		 * @rgb565_predicator: Unknown, introduced with A650 family,
@@ -629,6 +651,8 @@ int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
 			 struct adreno_smmu_fault_info *info, const char *block,
 			 u32 scratch[4]);
 
+void adreno_check_and_reenable_stall(struct adreno_gpu *gpu);
+
 int adreno_read_speedbin(struct device *dev, u32 *speedbin);
 
 /*
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 2a94e82316f95c5f9dcc37ef0a4664a29e3492b2..8d5380e6dcc217c7c209b51527bf15748b3ada71 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -351,6 +351,14 @@ static void msm_iommu_resume_translation(struct msm_mmu *mmu)
 		adreno_smmu->resume_translation(adreno_smmu->cookie, true);
 }
 
+static void msm_iommu_set_stall(struct msm_mmu *mmu, bool enable)
+{
+	struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev);
+
+	if (adreno_smmu->set_stall)
+		adreno_smmu->set_stall(adreno_smmu->cookie, enable);
+}
+
 static void msm_iommu_detach(struct msm_mmu *mmu)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
@@ -399,6 +407,7 @@ static const struct msm_mmu_funcs funcs = {
 		.unmap = msm_iommu_unmap,
 		.destroy = msm_iommu_destroy,
 		.resume_translation = msm_iommu_resume_translation,
+		.set_stall = msm_iommu_set_stall,
 };
 
 struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks)
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 88af4f490881f2a6789ae2d03e1c02d10046331a..2694a356a17904e7572b767b16ed0cee806406cf 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -16,6 +16,7 @@ struct msm_mmu_funcs {
 	int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
 	void (*destroy)(struct msm_mmu *mmu);
 	void (*resume_translation)(struct msm_mmu *mmu);
+	void (*set_stall)(struct msm_mmu *mmu, bool enable);
 };
 
 enum msm_mmu_type {

-- 
2.47.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault
  2025-03-04 16:56 ` [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault Connor Abbott
@ 2025-03-05 19:07   ` Rob Clark
  2025-03-05 19:38     ` Connor Abbott
  0 siblings, 1 reply; 33+ messages in thread
From: Rob Clark @ 2025-03-05 19:07 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> When things go wrong, the GPU is capable of quickly generating millions
> of faulting translation requests per second. When that happens, in the
> stall-on-fault model each access will stall until it wins the race to
> signal the fault and then the RESUME register is written. This slows
> processing page faults to a crawl as the GPU can generate faults much
> faster than the CPU can acknowledge them. It also means that all
> available resources in the SMMU are saturated waiting for the stalled
> transactions, so that other transactions such as transactions generated
> by the GMU, which shares a context bank with the GPU, cannot proceed.

Nit, the GMU does not actually share a cb.. looking on x1e80100.dtsi,
the GMU has cb 5 and gpu has 0 and 1.  (Currently we just use the
first, but I guess the 2nd would be used if we supported protected
content?)

fwiw, you can read this from dtsi, ie. in the GMU node, "iommus =
<&adreno_smmu 5 0x0>;"

> This causes a GMU watchdog timeout, which leads to a failed reset
> because GX cannot collapse when there is a transaction pending and a
> permanently hung GPU.
>
> On older platforms with qcom,smmu-v2, it seems that when one transaction
> is stalled subsequent faulting transactions are terminated, which avoids
> this problem, but the MMU-500 follows the spec here.
>
> To work around these problem, disable stall-on-fault as soon as we get a
> page fault until a cooldown period after pagefaults stop. This allows
> the GMU some guaranteed time to continue working. We only use
> stall-on-fault to halt the GPU while we collect a devcoredump and we
> always terminate the transaction afterward, so it's fine to miss some
> subsequent page faults. We also keep it disabled so long as the current
> devcoredump hasn't been deleted, because in that case we likely won't
> capture another one if there's a fault.
>
> After this commit HFI messages still occasionally time out, because the
> crashdump handler doesn't run fast enough to let the GMU resume, but the
> driver seems to recover from it. This will probably go away after the
> HFI timeout is increased.
>
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 ++
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  4 ++++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 42 ++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++
>  drivers/gpu/drm/msm/msm_iommu.c         |  9 +++++++
>  drivers/gpu/drm/msm/msm_mmu.h           |  1 +
>  6 files changed, 81 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 71dca78cd7a5324e9ff5b14f173e2209fa42e196..670141531112c9d29cef8ef1fd51b74759fdd6d2 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -131,6 +131,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>         struct msm_ringbuffer *ring = submit->ring;
>         unsigned int i, ibs = 0;
>
> +       adreno_check_and_reenable_stall(adreno_gpu);
> +
>         if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
>                 ring->cur_ctx_seqno = 0;
>                 a5xx_submit_in_rb(gpu, submit);
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 0ae29a7c8a4d3f74236a35cc919f69d5c0a384a0..5a34cd2109a2d74c92841448a61ccb0d4f34e264 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -212,6 +212,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>         struct msm_ringbuffer *ring = submit->ring;
>         unsigned int i, ibs = 0;
>
> +       adreno_check_and_reenable_stall(adreno_gpu);
> +
>         a6xx_set_pagetable(a6xx_gpu, ring, submit);
>
>         get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
> @@ -335,6 +337,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>         struct msm_ringbuffer *ring = submit->ring;
>         unsigned int i, ibs = 0;
>
> +       adreno_check_and_reenable_stall(adreno_gpu);
> +
>         /*
>          * Toggle concurrent binning for pagetable switch and set the thread to
>          * BR since only it can execute the pagetable switch packets.
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index 1238f326597808eb28b4c6822cbd41a26e555eb9..bac586101dc0494f46b069a8440a45825dfe9b5e 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -246,16 +246,53 @@ u64 adreno_private_address_space_size(struct msm_gpu *gpu)
>         return SZ_4G;
>  }
>
> +void adreno_check_and_reenable_stall(struct adreno_gpu *adreno_gpu)
> +{
> +       struct msm_gpu *gpu = &adreno_gpu->base;
> +       unsigned long flags;
> +
> +       /*
> +        * Wait until the cooldown period has passed and we would actually
> +        * collect a crashdump to re-enable stall-on-fault.
> +        */
> +       spin_lock_irqsave(&adreno_gpu->fault_stall_lock, flags);
> +       if (!adreno_gpu->stall_enabled &&
> +                       ktime_after(ktime_get(), adreno_gpu->stall_reenable_time) &&
> +                       !READ_ONCE(gpu->crashstate)) {
> +               adreno_gpu->stall_enabled = true;
> +
> +               gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, true);
> +       }
> +       spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, flags);
> +}
> +
>  #define ARM_SMMU_FSR_TF                 BIT(1)
>  #define ARM_SMMU_FSR_PF                        BIT(3)
>  #define ARM_SMMU_FSR_EF                        BIT(4)
> +#define ARM_SMMU_FSR_SS                        BIT(30)
>
>  int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
>                          struct adreno_smmu_fault_info *info, const char *block,
>                          u32 scratch[4])
>  {
> +       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         const char *type = "UNKNOWN";
> -       bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
> +       bool do_devcoredump = info && (info->fsr & ARM_SMMU_FSR_SS) &&
> +               !READ_ONCE(gpu->crashstate);
> +       unsigned long irq_flags;
> +
> +       /*
> +        * In case there is a subsequent storm of pagefaults, disable
> +        * stall-on-fault for at least half a second.
> +        */
> +       spin_lock_irqsave(&adreno_gpu->fault_stall_lock, irq_flags);
> +       if (adreno_gpu->stall_enabled) {
> +               adreno_gpu->stall_enabled = false;
> +
> +               gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, false);
> +       }
> +       adreno_gpu->stall_reenable_time = ktime_add_ms(ktime_get(), 500);
> +       spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, irq_flags);
>
>         /*
>          * If we aren't going to be resuming later from fault_worker, then do
> @@ -1143,6 +1180,9 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>                 adreno_gpu->info->inactive_period);
>         pm_runtime_use_autosuspend(dev);
>
> +       spin_lock_init(&adreno_gpu->fault_stall_lock);
> +       adreno_gpu->stall_enabled = true;
> +
>         return msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
>                         gpu_name, &adreno_gpu_config);
>  }
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index dcf454629ce037b2a8274a6699674ad754ce1f07..a528036b46216bd898f6d48c5fb0555c4c4b053b 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -205,6 +205,28 @@ struct adreno_gpu {
>         /* firmware: */
>         const struct firmware *fw[ADRENO_FW_MAX];
>
> +       /**
> +        * fault_stall_lock:

nit, @fault_stall_lock:  And for
fault_stall_reenable_time/stall_enabled, it wouldn't hurt to add
something along the lines of "Protected by @fault_stall_lock".  I've
been slowly trying to improve the comment docs over time, I have some
of that in my vmbind patchset.

Anyways, with those nits addressed, r-b

BR,
-R

> +        *
> +        * Serialize changes to stall-on-fault state.
> +        */
> +       spinlock_t fault_stall_lock;
> +
> +       /**
> +        * fault_stall_reenable_time:
> +        *
> +        * if stall_enabled is false, when to reenable stall-on-fault.
> +        */
> +       ktime_t stall_reenable_time;
> +
> +       /**
> +        * stall_enabled:
> +        *
> +        * Whether stall-on-fault is currently enabled.
> +        */
> +       bool stall_enabled;
> +
> +
>         struct {
>                 /**
>                  * @rgb565_predicator: Unknown, introduced with A650 family,
> @@ -629,6 +651,8 @@ int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
>                          struct adreno_smmu_fault_info *info, const char *block,
>                          u32 scratch[4]);
>
> +void adreno_check_and_reenable_stall(struct adreno_gpu *gpu);
> +
>  int adreno_read_speedbin(struct device *dev, u32 *speedbin);
>
>  /*
> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> index 2a94e82316f95c5f9dcc37ef0a4664a29e3492b2..8d5380e6dcc217c7c209b51527bf15748b3ada71 100644
> --- a/drivers/gpu/drm/msm/msm_iommu.c
> +++ b/drivers/gpu/drm/msm/msm_iommu.c
> @@ -351,6 +351,14 @@ static void msm_iommu_resume_translation(struct msm_mmu *mmu)
>                 adreno_smmu->resume_translation(adreno_smmu->cookie, true);
>  }
>
> +static void msm_iommu_set_stall(struct msm_mmu *mmu, bool enable)
> +{
> +       struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev);
> +
> +       if (adreno_smmu->set_stall)
> +               adreno_smmu->set_stall(adreno_smmu->cookie, enable);
> +}
> +
>  static void msm_iommu_detach(struct msm_mmu *mmu)
>  {
>         struct msm_iommu *iommu = to_msm_iommu(mmu);
> @@ -399,6 +407,7 @@ static const struct msm_mmu_funcs funcs = {
>                 .unmap = msm_iommu_unmap,
>                 .destroy = msm_iommu_destroy,
>                 .resume_translation = msm_iommu_resume_translation,
> +               .set_stall = msm_iommu_set_stall,
>  };
>
>  struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks)
> diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> index 88af4f490881f2a6789ae2d03e1c02d10046331a..2694a356a17904e7572b767b16ed0cee806406cf 100644
> --- a/drivers/gpu/drm/msm/msm_mmu.h
> +++ b/drivers/gpu/drm/msm/msm_mmu.h
> @@ -16,6 +16,7 @@ struct msm_mmu_funcs {
>         int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
>         void (*destroy)(struct msm_mmu *mmu);
>         void (*resume_translation)(struct msm_mmu *mmu);
> +       void (*set_stall)(struct msm_mmu *mmu, bool enable);
>  };
>
>  enum msm_mmu_type {
>
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-04 16:56 ` [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly Connor Abbott
@ 2025-03-05 19:08   ` Rob Clark
  2025-03-11 18:08   ` Will Deacon
  1 sibling, 0 replies; 33+ messages in thread
From: Rob Clark @ 2025-03-05 19:08 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> In some cases drm/msm has to resume a stalled transaction directly in
> its fault handler. Experimentally this doesn't work on SMMU500 if the
> fault hasn't already been acknowledged by clearing FSR. Rather than
> trying to clear FSR in msm's fault handler and implementing a
> tricky handshake to avoid accidentally clearing FSR twice, we want to
> clear FSR before calling the fault handlers, but this means that the
> contents of registers can change underneath us in the fault handler and
> msm currently uses a private function to read the register contents for
> its own purposes in its fault handler, such as using the
> implementation-defined FSYNR1 to determine which block caused the fault.
> Fix this by making msm use the register values already read by arm-smmu
> itself before clearing FSR rather than messing around with reading
> registers directly.
>
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>

Reviewed-by: Rob Clark <robdclark@gmail.com>

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
>  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
>  drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
>  3 files changed, 27 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 6372f3e25c4bc24cb52f9233095170e8aa510a53..186d6ad4fd1c990398df4dec53f4d58ada9e658c 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -62,16 +62,15 @@ static void qcom_adreno_smmu_get_fault_info(const void *cookie,
>                 struct adreno_smmu_fault_info *info)
>  {
>         struct arm_smmu_domain *smmu_domain = (void *)cookie;
> -       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -       struct arm_smmu_device *smmu = smmu_domain->smmu;
> -
> -       info->fsr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSR);
> -       info->fsynr0 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR0);
> -       info->fsynr1 = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_FSYNR1);
> -       info->far = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_FAR);
> -       info->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(cfg->cbndx));
> -       info->ttbr0 = arm_smmu_cb_readq(smmu, cfg->cbndx, ARM_SMMU_CB_TTBR0);
> -       info->contextidr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_CONTEXTIDR);
> +       struct arm_smmu_context_fault_info *cfi = &smmu_domain->cfi;
> +
> +       info->fsr = cfi->fsr;
> +       info->fsynr0 = cfi->fsynr0;
> +       info->fsynr1 = cfi->fsynr1;
> +       info->far = cfi->iova;
> +       info->cbfrsynra = cfi->cbfrsynra;
> +       info->ttbr0 = cfi->ttbr0;
> +       info->contextidr = cfi->contextidr;
>  }
>
>  static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index a9213e0f1579d1e3be0bfba75eea1d5de23117de..498b96e95cb4fdb67c246ef13de1eb8f40d68f7d 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -453,26 +453,26 @@ void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
>
>  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  {
> -       struct arm_smmu_context_fault_info cfi;
>         struct arm_smmu_domain *smmu_domain = dev;
> +       struct arm_smmu_context_fault_info *cfi = &smmu_domain->cfi;
>         struct arm_smmu_device *smmu = smmu_domain->smmu;
>         static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>                                       DEFAULT_RATELIMIT_BURST);
>         int idx = smmu_domain->cfg.cbndx;
>         int ret;
>
> -       arm_smmu_read_context_fault_info(smmu, idx, &cfi);
> +       arm_smmu_read_context_fault_info(smmu, idx, cfi);
>
> -       if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
> +       if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT))
>                 return IRQ_NONE;
>
> -       ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> -               cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +       ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi->iova,
> +               cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
>         if (ret == -ENOSYS && __ratelimit(&rs))
> -               arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> +               arm_smmu_print_context_fault_info(smmu, idx, cfi);
>
> -       arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi.fsr);
> +       arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
>         return IRQ_HANDLED;
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
>         ARM_SMMU_DOMAIN_NESTED,
>  };
>
> +struct arm_smmu_context_fault_info {
> +       unsigned long iova;
> +       u64 ttbr0;
> +       u32 fsr;
> +       u32 fsynr0;
> +       u32 fsynr1;
> +       u32 cbfrsynra;
> +       u32 contextidr;
> +};
> +
>  struct arm_smmu_domain {
>         struct arm_smmu_device          *smmu;
>         struct io_pgtable_ops           *pgtbl_ops;
> @@ -380,6 +390,7 @@ struct arm_smmu_domain {
>         const struct iommu_flush_ops    *flush_ops;
>         struct arm_smmu_cfg             cfg;
>         enum arm_smmu_domain_stage      stage;
> +       struct arm_smmu_context_fault_info cfi;
>         struct mutex                    init_mutex; /* Protects smmu pointer */
>         spinlock_t                      cb_lock; /* Serialises ATS1* ops and TLB syncs */
>         struct iommu_domain             domain;
> @@ -541,16 +552,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
>  void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
>  int arm_mmu500_reset(struct arm_smmu_device *smmu);
>
> -struct arm_smmu_context_fault_info {
> -       unsigned long iova;
> -       u64 ttbr0;
> -       u32 fsr;
> -       u32 fsynr0;
> -       u32 fsynr1;
> -       u32 cbfrsynra;
> -       u32 contextidr;
> -};
> -
>  void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
>                                       struct arm_smmu_context_fault_info *cfi);
>
>
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-04 16:56 ` [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault Connor Abbott
@ 2025-03-05 19:09   ` Rob Clark
  2025-03-11 18:05   ` Will Deacon
  1 sibling, 0 replies; 33+ messages in thread
From: Rob Clark @ 2025-03-05 19:09 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> This will be used by drm/msm for GPU page faults, replacing the manual
> register reading it does.
>
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>

Reviewed-by: Rob Clark <robdclark@gmail.com>

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 ++--
>  drivers/iommu/arm/arm-smmu/arm-smmu.c            | 27 +++++++++++++-----------
>  drivers/iommu/arm/arm-smmu/arm-smmu.h            |  5 ++++-
>  3 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> index 548783f3f8e89fd978367afa65c473002f66e2e7..ae4fdbbce6ba80440f539557a39866a932360d4e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> @@ -400,7 +400,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
>
>         if (list_empty(&tbu_list)) {
>                 ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> -                                        cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +                                        cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
>                 if (ret == -ENOSYS)
>                         arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> @@ -412,7 +412,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
>         phys_soft = ops->iova_to_phys(ops, cfi.iova);
>
>         tmp = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> -                                cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +                                cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>         if (!tmp || tmp == -EBUSY) {
>                 ret = IRQ_HANDLED;
>                 resume = ARM_SMMU_RESUME_TERMINATE;
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
>                                       struct arm_smmu_context_fault_info *cfi)
>  {
>         cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> +       cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
>         cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> -       cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> +       cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> +       cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
>         cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> +       cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
>  }
>
>  void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
> @@ -419,7 +422,7 @@ void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
>  {
>         dev_err(smmu->dev,
>                 "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> -               cfi->fsr, cfi->iova, cfi->fsynr, cfi->cbfrsynra, idx);
> +               cfi->fsr, cfi->iova, cfi->fsynr0, cfi->cbfrsynra, idx);
>
>         dev_err(smmu->dev, "FSR    = %08x [%s%sFormat=%u%s%s%s%s%s%s%s%s], SID=0x%x\n",
>                 cfi->fsr,
> @@ -437,15 +440,15 @@ void arm_smmu_print_context_fault_info(struct arm_smmu_device *smmu, int idx,
>                 cfi->cbfrsynra);
>
>         dev_err(smmu->dev, "FSYNR0 = %08x [S1CBNDX=%u%s%s%s%s%s%s PLVL=%u]\n",
> -               cfi->fsynr,
> -               (u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_S1CBNDX, cfi->fsynr),
> -               (cfi->fsynr & ARM_SMMU_CB_FSYNR0_AFR) ? " AFR" : "",
> -               (cfi->fsynr & ARM_SMMU_CB_FSYNR0_PTWF) ? " PTWF" : "",
> -               (cfi->fsynr & ARM_SMMU_CB_FSYNR0_NSATTR) ? " NSATTR" : "",
> -               (cfi->fsynr & ARM_SMMU_CB_FSYNR0_IND) ? " IND" : "",
> -               (cfi->fsynr & ARM_SMMU_CB_FSYNR0_PNU) ? " PNU" : "",
> -               (cfi->fsynr & ARM_SMMU_CB_FSYNR0_WNR) ? " WNR" : "",
> -               (u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_PLVL, cfi->fsynr));
> +               cfi->fsynr0,
> +               (u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_S1CBNDX, cfi->fsynr0),
> +               (cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_AFR) ? " AFR" : "",
> +               (cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_PTWF) ? " PTWF" : "",
> +               (cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_NSATTR) ? " NSATTR" : "",
> +               (cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_IND) ? " IND" : "",
> +               (cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_PNU) ? " PNU" : "",
> +               (cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR) ? " WNR" : "",
> +               (u32)FIELD_GET(ARM_SMMU_CB_FSYNR0_PLVL, cfi->fsynr0));
>  }
>
>  static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
> @@ -464,7 +467,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>                 return IRQ_NONE;
>
>         ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> -               cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +               cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
>         if (ret == -ENOSYS && __ratelimit(&rs))
>                 arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index e2aeb511ae903302e3c15d2cf5f22e2a26ac2346..d3bc77dcd4d40f25bc70f3289616fb866649b022 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -543,9 +543,12 @@ int arm_mmu500_reset(struct arm_smmu_device *smmu);
>
>  struct arm_smmu_context_fault_info {
>         unsigned long iova;
> +       u64 ttbr0;
>         u32 fsr;
> -       u32 fsynr;
> +       u32 fsynr0;
> +       u32 fsynr1;
>         u32 cbfrsynra;
> +       u32 contextidr;
>  };
>
>  void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
>
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-04 16:56 ` [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on Connor Abbott
@ 2025-03-05 19:09   ` Rob Clark
  2025-03-11 18:11   ` Will Deacon
  1 sibling, 0 replies; 33+ messages in thread
From: Rob Clark @ 2025-03-05 19:09 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> Up until now we have only called the set_stall callback during
> initialization when the device is off. But we will soon start calling it
> to temporarily disable stall-on-fault when the device is on, so handle
> that by checking if the device is on and writing SCTLR.
>
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>

Reviewed-by: Rob Clark <robdclark@gmail.com>

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
>  {
>         struct arm_smmu_domain *smmu_domain = (void *)cookie;
>         struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -       struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> +       struct arm_smmu_device *smmu = smmu_domain->smmu;
> +       struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> +       u32 mask = BIT(cfg->cbndx);
> +       bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
> +       unsigned long flags;
>
>         if (enabled)
> -               qsmmu->stall_enabled |= BIT(cfg->cbndx);
> +               qsmmu->stall_enabled |= mask;
>         else
> -               qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> +               qsmmu->stall_enabled &= ~mask;
> +
> +       /*
> +        * If the device is on and we changed the setting, update the register.
> +        */
> +       if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
> +               spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> +
> +               u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> +
> +               if (enabled)
> +                       reg |= ARM_SMMU_SCTLR_CFCFG;
> +               else
> +                       reg &= ~ARM_SMMU_SCTLR_CFCFG;
> +
> +               arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
> +
> +               spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
> +
> +               pm_runtime_put_autosuspend(smmu->dev);
> +       }
>  }
>
>  static void qcom_adreno_smmu_resume_translation(const void *cookie, bool terminate)
>
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault
  2025-03-04 16:56 ` [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Connor Abbott
@ 2025-03-05 19:12   ` Rob Clark
  0 siblings, 0 replies; 33+ messages in thread
From: Rob Clark @ 2025-03-05 19:12 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> On some SMMUv2 implementations, including MMU-500, SMMU_CBn_FSR.SS
> asserts an interrupt. The only way to clear that bit is to resume the
> transaction by writing SMMU_CBn_RESUME, but typically resuming the
> transaction requires complex operations (copying in pages, etc.) that
> can't be done in IRQ context. drm/msm already has a problem, because
> its fault handler sometimes schedules a job to dump the GPU state and
> doesn't resume translation until this is complete.
>
> Work around this by disabling context fault interrupts until after the
> transaction is resumed. Because other context banks can share an IRQ
> line, we may still get an interrupt intended for another context bank,
> but in this case only SMMU_CBn_FSR.SS will be asserted and we can skip
> it assuming that interrupts are disabled which is accomplished by
> removing the bit from ARM_SMMU_CB_FSR_FAULT. SMMU_CBn_FSR.SS won't be
> asserted unless an external user enabled stall-on-fault, and they are
> expected to resume the translation and re-enable interrupts.
>
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> Reviewed-by Robin Murphy <robin.murphy@arm.com>

Reviewed-by: Rob Clark <robdclark@gmail.com>

> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 15 ++++++++++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 41 +++++++++++++++++++++++++++++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.h      |  1 -
>  3 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 186d6ad4fd1c990398df4dec53f4d58ada9e658c..a428e53add08d451fb2152e3ab80e0fba936e214 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -90,12 +90,25 @@ static void qcom_adreno_smmu_resume_translation(const void *cookie, bool termina
>         struct arm_smmu_domain *smmu_domain = (void *)cookie;
>         struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>         struct arm_smmu_device *smmu = smmu_domain->smmu;
> -       u32 reg = 0;
> +       u32 reg = 0, sctlr;
> +       unsigned long flags;
>
>         if (terminate)
>                 reg |= ARM_SMMU_RESUME_TERMINATE;
>
> +       spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> +
>         arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_RESUME, reg);
> +
> +       /*
> +        * Re-enable interrupts after they were disabled by
> +        * arm_smmu_context_fault().
> +        */
> +       sctlr = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> +       sctlr |= ARM_SMMU_SCTLR_CFIE;
> +       arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, sctlr);
> +
> +       spin_unlock_irqrestore(&smmu_domain->cb_lock, flags);
>  }
>
>  #define QCOM_ADRENO_SMMU_GPU_SID 0
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 498b96e95cb4fdb67c246ef13de1eb8f40d68f7d..284079ef95cd2deeb71816a284850523897badd8 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -466,13 +466,52 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>         if (!(cfi->fsr & ARM_SMMU_CB_FSR_FAULT))
>                 return IRQ_NONE;
>
> +       /*
> +        * On some implementations FSR.SS asserts a context fault
> +        * interrupt. We do not want this behavior, because resolving the
> +        * original context fault typically requires operations that cannot be
> +        * performed in IRQ context but leaving the stall unacknowledged will
> +        * immediately lead to another spurious interrupt as FSR.SS is still
> +        * set. Work around this by disabling interrupts for this context bank.
> +        * It's expected that interrupts are re-enabled after resuming the
> +        * translation.
> +        *
> +        * We have to do this before report_iommu_fault() so that we don't
> +        * leave interrupts disabled in case the downstream user decides the
> +        * fault can be resolved inside its fault handler.
> +        *
> +        * There is a possible race if there are multiple context banks sharing
> +        * the same interrupt and both signal an interrupt in between writing
> +        * RESUME and SCTLR. We could disable interrupts here before we
> +        * re-enable them in the resume handler, leaving interrupts enabled.
> +        * Lock the write to serialize it with the resume handler.
> +        */
> +       if (cfi->fsr & ARM_SMMU_CB_FSR_SS) {
> +               u32 val;
> +
> +               spin_lock(&smmu_domain->cb_lock);
> +               val = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_SCTLR);
> +               val &= ~ARM_SMMU_SCTLR_CFIE;
> +               arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, val);
> +               spin_unlock(&smmu_domain->cb_lock);
> +       }
> +
> +       /*
> +        * The SMMUv2 architecture specification says that if stall-on-fault is
> +        * enabled the correct sequence is to write to SMMU_CBn_FSR to clear
> +        * the fault and then write to SMMU_CBn_RESUME. Clear the interrupt
> +        * first before running the user's fault handler to make sure we follow
> +        * this sequence. It should be ok if there is another fault in the
> +        * meantime because we have already read the fault info.
> +        */
> +       arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
> +
>         ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi->iova,
>                 cfi->fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>
>         if (ret == -ENOSYS && __ratelimit(&rs))
>                 arm_smmu_print_context_fault_info(smmu, idx, cfi);
>
> -       arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_FSR, cfi->fsr);
>         return IRQ_HANDLED;
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index 411d807e0a7033833716635efb3968a0bd3ff237..4235b772c2cb032778816578c9e6644512543a5e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -214,7 +214,6 @@ enum arm_smmu_cbar_type {
>                                          ARM_SMMU_CB_FSR_TLBLKF)
>
>  #define ARM_SMMU_CB_FSR_FAULT          (ARM_SMMU_CB_FSR_MULTI |        \
> -                                        ARM_SMMU_CB_FSR_SS |           \
>                                          ARM_SMMU_CB_FSR_UUT |          \
>                                          ARM_SMMU_CB_FSR_EF |           \
>                                          ARM_SMMU_CB_FSR_PF |           \
>
> --
> 2.47.1
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault
  2025-03-05 19:07   ` Rob Clark
@ 2025-03-05 19:38     ` Connor Abbott
  2025-03-05 19:56       ` Rob Clark
  0 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-05 19:38 UTC (permalink / raw)
  To: Rob Clark
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 5, 2025 at 2:07 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
> >
> > When things go wrong, the GPU is capable of quickly generating millions
> > of faulting translation requests per second. When that happens, in the
> > stall-on-fault model each access will stall until it wins the race to
> > signal the fault and then the RESUME register is written. This slows
> > processing page faults to a crawl as the GPU can generate faults much
> > faster than the CPU can acknowledge them. It also means that all
> > available resources in the SMMU are saturated waiting for the stalled
> > transactions, so that other transactions such as transactions generated
> > by the GMU, which shares a context bank with the GPU, cannot proceed.
>
> Nit, the GMU does not actually share a cb.. looking on x1e80100.dtsi,
> the GMU has cb 5 and gpu has 0 and 1.  (Currently we just use the
> first, but I guess the 2nd would be used if we supported protected
> content?)

Yeah, I realized after writing this that's the case. But I guess the
QoS issues happen even with separate context banks due to the way they
allocate translation units?

Connor

>
> fwiw, you can read this from dtsi, ie. in the GMU node, "iommus =
> <&adreno_smmu 5 0x0>;"
>
> > This causes a GMU watchdog timeout, which leads to a failed reset
> > because GX cannot collapse when there is a transaction pending and a
> > permanently hung GPU.
> >
> > On older platforms with qcom,smmu-v2, it seems that when one transaction
> > is stalled subsequent faulting transactions are terminated, which avoids
> > this problem, but the MMU-500 follows the spec here.
> >
> > To work around these problem, disable stall-on-fault as soon as we get a
> > page fault until a cooldown period after pagefaults stop. This allows
> > the GMU some guaranteed time to continue working. We only use
> > stall-on-fault to halt the GPU while we collect a devcoredump and we
> > always terminate the transaction afterward, so it's fine to miss some
> > subsequent page faults. We also keep it disabled so long as the current
> > devcoredump hasn't been deleted, because in that case we likely won't
> > capture another one if there's a fault.
> >
> > After this commit HFI messages still occasionally time out, because the
> > crashdump handler doesn't run fast enough to let the GMU resume, but the
> > driver seems to recover from it. This will probably go away after the
> > HFI timeout is increased.
> >
> > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > ---
> >  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 ++
> >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  4 ++++
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 42 ++++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++
> >  drivers/gpu/drm/msm/msm_iommu.c         |  9 +++++++
> >  drivers/gpu/drm/msm/msm_mmu.h           |  1 +
> >  6 files changed, 81 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > index 71dca78cd7a5324e9ff5b14f173e2209fa42e196..670141531112c9d29cef8ef1fd51b74759fdd6d2 100644
> > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > @@ -131,6 +131,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >         struct msm_ringbuffer *ring = submit->ring;
> >         unsigned int i, ibs = 0;
> >
> > +       adreno_check_and_reenable_stall(adreno_gpu);
> > +
> >         if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
> >                 ring->cur_ctx_seqno = 0;
> >                 a5xx_submit_in_rb(gpu, submit);
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 0ae29a7c8a4d3f74236a35cc919f69d5c0a384a0..5a34cd2109a2d74c92841448a61ccb0d4f34e264 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -212,6 +212,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >         struct msm_ringbuffer *ring = submit->ring;
> >         unsigned int i, ibs = 0;
> >
> > +       adreno_check_and_reenable_stall(adreno_gpu);
> > +
> >         a6xx_set_pagetable(a6xx_gpu, ring, submit);
> >
> >         get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
> > @@ -335,6 +337,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> >         struct msm_ringbuffer *ring = submit->ring;
> >         unsigned int i, ibs = 0;
> >
> > +       adreno_check_and_reenable_stall(adreno_gpu);
> > +
> >         /*
> >          * Toggle concurrent binning for pagetable switch and set the thread to
> >          * BR since only it can execute the pagetable switch packets.
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > index 1238f326597808eb28b4c6822cbd41a26e555eb9..bac586101dc0494f46b069a8440a45825dfe9b5e 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > @@ -246,16 +246,53 @@ u64 adreno_private_address_space_size(struct msm_gpu *gpu)
> >         return SZ_4G;
> >  }
> >
> > +void adreno_check_and_reenable_stall(struct adreno_gpu *adreno_gpu)
> > +{
> > +       struct msm_gpu *gpu = &adreno_gpu->base;
> > +       unsigned long flags;
> > +
> > +       /*
> > +        * Wait until the cooldown period has passed and we would actually
> > +        * collect a crashdump to re-enable stall-on-fault.
> > +        */
> > +       spin_lock_irqsave(&adreno_gpu->fault_stall_lock, flags);
> > +       if (!adreno_gpu->stall_enabled &&
> > +                       ktime_after(ktime_get(), adreno_gpu->stall_reenable_time) &&
> > +                       !READ_ONCE(gpu->crashstate)) {
> > +               adreno_gpu->stall_enabled = true;
> > +
> > +               gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, true);
> > +       }
> > +       spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, flags);
> > +}
> > +
> >  #define ARM_SMMU_FSR_TF                 BIT(1)
> >  #define ARM_SMMU_FSR_PF                        BIT(3)
> >  #define ARM_SMMU_FSR_EF                        BIT(4)
> > +#define ARM_SMMU_FSR_SS                        BIT(30)
> >
> >  int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
> >                          struct adreno_smmu_fault_info *info, const char *block,
> >                          u32 scratch[4])
> >  {
> > +       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> >         const char *type = "UNKNOWN";
> > -       bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
> > +       bool do_devcoredump = info && (info->fsr & ARM_SMMU_FSR_SS) &&
> > +               !READ_ONCE(gpu->crashstate);
> > +       unsigned long irq_flags;
> > +
> > +       /*
> > +        * In case there is a subsequent storm of pagefaults, disable
> > +        * stall-on-fault for at least half a second.
> > +        */
> > +       spin_lock_irqsave(&adreno_gpu->fault_stall_lock, irq_flags);
> > +       if (adreno_gpu->stall_enabled) {
> > +               adreno_gpu->stall_enabled = false;
> > +
> > +               gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, false);
> > +       }
> > +       adreno_gpu->stall_reenable_time = ktime_add_ms(ktime_get(), 500);
> > +       spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, irq_flags);
> >
> >         /*
> >          * If we aren't going to be resuming later from fault_worker, then do
> > @@ -1143,6 +1180,9 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> >                 adreno_gpu->info->inactive_period);
> >         pm_runtime_use_autosuspend(dev);
> >
> > +       spin_lock_init(&adreno_gpu->fault_stall_lock);
> > +       adreno_gpu->stall_enabled = true;
> > +
> >         return msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
> >                         gpu_name, &adreno_gpu_config);
> >  }
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > index dcf454629ce037b2a8274a6699674ad754ce1f07..a528036b46216bd898f6d48c5fb0555c4c4b053b 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > @@ -205,6 +205,28 @@ struct adreno_gpu {
> >         /* firmware: */
> >         const struct firmware *fw[ADRENO_FW_MAX];
> >
> > +       /**
> > +        * fault_stall_lock:
>
> nit, @fault_stall_lock:  And for
> fault_stall_reenable_time/stall_enabled, it wouldn't hurt to add
> something along the lines of "Protected by @fault_stall_lock".  I've
> been slowly trying to improve the comment docs over time, I have some
> of that in my vmbind patchset.
>
> Anyways, with those nits addressed, r-b
>
> BR,
> -R
>
> > +        *
> > +        * Serialize changes to stall-on-fault state.
> > +        */
> > +       spinlock_t fault_stall_lock;
> > +
> > +       /**
> > +        * fault_stall_reenable_time:
> > +        *
> > +        * if stall_enabled is false, when to reenable stall-on-fault.
> > +        */
> > +       ktime_t stall_reenable_time;
> > +
> > +       /**
> > +        * stall_enabled:
> > +        *
> > +        * Whether stall-on-fault is currently enabled.
> > +        */
> > +       bool stall_enabled;
> > +
> > +
> >         struct {
> >                 /**
> >                  * @rgb565_predicator: Unknown, introduced with A650 family,
> > @@ -629,6 +651,8 @@ int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
> >                          struct adreno_smmu_fault_info *info, const char *block,
> >                          u32 scratch[4]);
> >
> > +void adreno_check_and_reenable_stall(struct adreno_gpu *gpu);
> > +
> >  int adreno_read_speedbin(struct device *dev, u32 *speedbin);
> >
> >  /*
> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> > index 2a94e82316f95c5f9dcc37ef0a4664a29e3492b2..8d5380e6dcc217c7c209b51527bf15748b3ada71 100644
> > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > @@ -351,6 +351,14 @@ static void msm_iommu_resume_translation(struct msm_mmu *mmu)
> >                 adreno_smmu->resume_translation(adreno_smmu->cookie, true);
> >  }
> >
> > +static void msm_iommu_set_stall(struct msm_mmu *mmu, bool enable)
> > +{
> > +       struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev);
> > +
> > +       if (adreno_smmu->set_stall)
> > +               adreno_smmu->set_stall(adreno_smmu->cookie, enable);
> > +}
> > +
> >  static void msm_iommu_detach(struct msm_mmu *mmu)
> >  {
> >         struct msm_iommu *iommu = to_msm_iommu(mmu);
> > @@ -399,6 +407,7 @@ static const struct msm_mmu_funcs funcs = {
> >                 .unmap = msm_iommu_unmap,
> >                 .destroy = msm_iommu_destroy,
> >                 .resume_translation = msm_iommu_resume_translation,
> > +               .set_stall = msm_iommu_set_stall,
> >  };
> >
> >  struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks)
> > diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> > index 88af4f490881f2a6789ae2d03e1c02d10046331a..2694a356a17904e7572b767b16ed0cee806406cf 100644
> > --- a/drivers/gpu/drm/msm/msm_mmu.h
> > +++ b/drivers/gpu/drm/msm/msm_mmu.h
> > @@ -16,6 +16,7 @@ struct msm_mmu_funcs {
> >         int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
> >         void (*destroy)(struct msm_mmu *mmu);
> >         void (*resume_translation)(struct msm_mmu *mmu);
> > +       void (*set_stall)(struct msm_mmu *mmu, bool enable);
> >  };
> >
> >  enum msm_mmu_type {
> >
> > --
> > 2.47.1
> >


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault
  2025-03-05 19:38     ` Connor Abbott
@ 2025-03-05 19:56       ` Rob Clark
  0 siblings, 0 replies; 33+ messages in thread
From: Rob Clark @ 2025-03-05 19:56 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 5, 2025 at 11:38 AM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> On Wed, Mar 5, 2025 at 2:07 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Tue, Mar 4, 2025 at 8:57 AM Connor Abbott <cwabbott0@gmail.com> wrote:
> > >
> > > When things go wrong, the GPU is capable of quickly generating millions
> > > of faulting translation requests per second. When that happens, in the
> > > stall-on-fault model each access will stall until it wins the race to
> > > signal the fault and then the RESUME register is written. This slows
> > > processing page faults to a crawl as the GPU can generate faults much
> > > faster than the CPU can acknowledge them. It also means that all
> > > available resources in the SMMU are saturated waiting for the stalled
> > > transactions, so that other transactions such as transactions generated
> > > by the GMU, which shares a context bank with the GPU, cannot proceed.
> >
> > Nit, the GMU does not actually share a cb.. looking on x1e80100.dtsi,
> > the GMU has cb 5 and gpu has 0 and 1.  (Currently we just use the
> > first, but I guess the 2nd would be used if we supported protected
> > content?)
>
> Yeah, I realized after writing this that's the case. But I guess the
> QoS issues happen even with separate context banks due to the way they
> allocate translation units?

Right, however resources are allocated to track pending translations
seems to be shared across CBs

BR,
-R

>
> Connor
>
> >
> > fwiw, you can read this from dtsi, ie. in the GMU node, "iommus =
> > <&adreno_smmu 5 0x0>;"
> >
> > > This causes a GMU watchdog timeout, which leads to a failed reset
> > > because GX cannot collapse when there is a transaction pending and a
> > > permanently hung GPU.
> > >
> > > On older platforms with qcom,smmu-v2, it seems that when one transaction
> > > is stalled subsequent faulting transactions are terminated, which avoids
> > > this problem, but the MMU-500 follows the spec here.
> > >
> > > To work around these problem, disable stall-on-fault as soon as we get a
> > > page fault until a cooldown period after pagefaults stop. This allows
> > > the GMU some guaranteed time to continue working. We only use
> > > stall-on-fault to halt the GPU while we collect a devcoredump and we
> > > always terminate the transaction afterward, so it's fine to miss some
> > > subsequent page faults. We also keep it disabled so long as the current
> > > devcoredump hasn't been deleted, because in that case we likely won't
> > > capture another one if there's a fault.
> > >
> > > After this commit HFI messages still occasionally time out, because the
> > > crashdump handler doesn't run fast enough to let the GMU resume, but the
> > > driver seems to recover from it. This will probably go away after the
> > > HFI timeout is increased.
> > >
> > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > ---
> > >  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  2 ++
> > >  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  4 ++++
> > >  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 42 ++++++++++++++++++++++++++++++++-
> > >  drivers/gpu/drm/msm/adreno/adreno_gpu.h | 24 +++++++++++++++++++
> > >  drivers/gpu/drm/msm/msm_iommu.c         |  9 +++++++
> > >  drivers/gpu/drm/msm/msm_mmu.h           |  1 +
> > >  6 files changed, 81 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > > index 71dca78cd7a5324e9ff5b14f173e2209fa42e196..670141531112c9d29cef8ef1fd51b74759fdd6d2 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> > > @@ -131,6 +131,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> > >         struct msm_ringbuffer *ring = submit->ring;
> > >         unsigned int i, ibs = 0;
> > >
> > > +       adreno_check_and_reenable_stall(adreno_gpu);
> > > +
> > >         if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
> > >                 ring->cur_ctx_seqno = 0;
> > >                 a5xx_submit_in_rb(gpu, submit);
> > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > index 0ae29a7c8a4d3f74236a35cc919f69d5c0a384a0..5a34cd2109a2d74c92841448a61ccb0d4f34e264 100644
> > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > > @@ -212,6 +212,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> > >         struct msm_ringbuffer *ring = submit->ring;
> > >         unsigned int i, ibs = 0;
> > >
> > > +       adreno_check_and_reenable_stall(adreno_gpu);
> > > +
> > >         a6xx_set_pagetable(a6xx_gpu, ring, submit);
> > >
> > >         get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP(0),
> > > @@ -335,6 +337,8 @@ static void a7xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
> > >         struct msm_ringbuffer *ring = submit->ring;
> > >         unsigned int i, ibs = 0;
> > >
> > > +       adreno_check_and_reenable_stall(adreno_gpu);
> > > +
> > >         /*
> > >          * Toggle concurrent binning for pagetable switch and set the thread to
> > >          * BR since only it can execute the pagetable switch packets.
> > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > index 1238f326597808eb28b4c6822cbd41a26e555eb9..bac586101dc0494f46b069a8440a45825dfe9b5e 100644
> > > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> > > @@ -246,16 +246,53 @@ u64 adreno_private_address_space_size(struct msm_gpu *gpu)
> > >         return SZ_4G;
> > >  }
> > >
> > > +void adreno_check_and_reenable_stall(struct adreno_gpu *adreno_gpu)
> > > +{
> > > +       struct msm_gpu *gpu = &adreno_gpu->base;
> > > +       unsigned long flags;
> > > +
> > > +       /*
> > > +        * Wait until the cooldown period has passed and we would actually
> > > +        * collect a crashdump to re-enable stall-on-fault.
> > > +        */
> > > +       spin_lock_irqsave(&adreno_gpu->fault_stall_lock, flags);
> > > +       if (!adreno_gpu->stall_enabled &&
> > > +                       ktime_after(ktime_get(), adreno_gpu->stall_reenable_time) &&
> > > +                       !READ_ONCE(gpu->crashstate)) {
> > > +               adreno_gpu->stall_enabled = true;
> > > +
> > > +               gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, true);
> > > +       }
> > > +       spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, flags);
> > > +}
> > > +
> > >  #define ARM_SMMU_FSR_TF                 BIT(1)
> > >  #define ARM_SMMU_FSR_PF                        BIT(3)
> > >  #define ARM_SMMU_FSR_EF                        BIT(4)
> > > +#define ARM_SMMU_FSR_SS                        BIT(30)
> > >
> > >  int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
> > >                          struct adreno_smmu_fault_info *info, const char *block,
> > >                          u32 scratch[4])
> > >  {
> > > +       struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
> > >         const char *type = "UNKNOWN";
> > > -       bool do_devcoredump = info && !READ_ONCE(gpu->crashstate);
> > > +       bool do_devcoredump = info && (info->fsr & ARM_SMMU_FSR_SS) &&
> > > +               !READ_ONCE(gpu->crashstate);
> > > +       unsigned long irq_flags;
> > > +
> > > +       /*
> > > +        * In case there is a subsequent storm of pagefaults, disable
> > > +        * stall-on-fault for at least half a second.
> > > +        */
> > > +       spin_lock_irqsave(&adreno_gpu->fault_stall_lock, irq_flags);
> > > +       if (adreno_gpu->stall_enabled) {
> > > +               adreno_gpu->stall_enabled = false;
> > > +
> > > +               gpu->aspace->mmu->funcs->set_stall(gpu->aspace->mmu, false);
> > > +       }
> > > +       adreno_gpu->stall_reenable_time = ktime_add_ms(ktime_get(), 500);
> > > +       spin_unlock_irqrestore(&adreno_gpu->fault_stall_lock, irq_flags);
> > >
> > >         /*
> > >          * If we aren't going to be resuming later from fault_worker, then do
> > > @@ -1143,6 +1180,9 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
> > >                 adreno_gpu->info->inactive_period);
> > >         pm_runtime_use_autosuspend(dev);
> > >
> > > +       spin_lock_init(&adreno_gpu->fault_stall_lock);
> > > +       adreno_gpu->stall_enabled = true;
> > > +
> > >         return msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
> > >                         gpu_name, &adreno_gpu_config);
> > >  }
> > > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > > index dcf454629ce037b2a8274a6699674ad754ce1f07..a528036b46216bd898f6d48c5fb0555c4c4b053b 100644
> > > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > > @@ -205,6 +205,28 @@ struct adreno_gpu {
> > >         /* firmware: */
> > >         const struct firmware *fw[ADRENO_FW_MAX];
> > >
> > > +       /**
> > > +        * fault_stall_lock:
> >
> > nit, @fault_stall_lock:  And for
> > fault_stall_reenable_time/stall_enabled, it wouldn't hurt to add
> > something along the lines of "Protected by @fault_stall_lock".  I've
> > been slowly trying to improve the comment docs over time, I have some
> > of that in my vmbind patchset.
> >
> > Anyways, with those nits addressed, r-b
> >
> > BR,
> > -R
> >
> > > +        *
> > > +        * Serialize changes to stall-on-fault state.
> > > +        */
> > > +       spinlock_t fault_stall_lock;
> > > +
> > > +       /**
> > > +        * fault_stall_reenable_time:
> > > +        *
> > > +        * if stall_enabled is false, when to reenable stall-on-fault.
> > > +        */
> > > +       ktime_t stall_reenable_time;
> > > +
> > > +       /**
> > > +        * stall_enabled:
> > > +        *
> > > +        * Whether stall-on-fault is currently enabled.
> > > +        */
> > > +       bool stall_enabled;
> > > +
> > > +
> > >         struct {
> > >                 /**
> > >                  * @rgb565_predicator: Unknown, introduced with A650 family,
> > > @@ -629,6 +651,8 @@ int adreno_fault_handler(struct msm_gpu *gpu, unsigned long iova, int flags,
> > >                          struct adreno_smmu_fault_info *info, const char *block,
> > >                          u32 scratch[4]);
> > >
> > > +void adreno_check_and_reenable_stall(struct adreno_gpu *gpu);
> > > +
> > >  int adreno_read_speedbin(struct device *dev, u32 *speedbin);
> > >
> > >  /*
> > > diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
> > > index 2a94e82316f95c5f9dcc37ef0a4664a29e3492b2..8d5380e6dcc217c7c209b51527bf15748b3ada71 100644
> > > --- a/drivers/gpu/drm/msm/msm_iommu.c
> > > +++ b/drivers/gpu/drm/msm/msm_iommu.c
> > > @@ -351,6 +351,14 @@ static void msm_iommu_resume_translation(struct msm_mmu *mmu)
> > >                 adreno_smmu->resume_translation(adreno_smmu->cookie, true);
> > >  }
> > >
> > > +static void msm_iommu_set_stall(struct msm_mmu *mmu, bool enable)
> > > +{
> > > +       struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(mmu->dev);
> > > +
> > > +       if (adreno_smmu->set_stall)
> > > +               adreno_smmu->set_stall(adreno_smmu->cookie, enable);
> > > +}
> > > +
> > >  static void msm_iommu_detach(struct msm_mmu *mmu)
> > >  {
> > >         struct msm_iommu *iommu = to_msm_iommu(mmu);
> > > @@ -399,6 +407,7 @@ static const struct msm_mmu_funcs funcs = {
> > >                 .unmap = msm_iommu_unmap,
> > >                 .destroy = msm_iommu_destroy,
> > >                 .resume_translation = msm_iommu_resume_translation,
> > > +               .set_stall = msm_iommu_set_stall,
> > >  };
> > >
> > >  struct msm_mmu *msm_iommu_new(struct device *dev, unsigned long quirks)
> > > diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
> > > index 88af4f490881f2a6789ae2d03e1c02d10046331a..2694a356a17904e7572b767b16ed0cee806406cf 100644
> > > --- a/drivers/gpu/drm/msm/msm_mmu.h
> > > +++ b/drivers/gpu/drm/msm/msm_mmu.h
> > > @@ -16,6 +16,7 @@ struct msm_mmu_funcs {
> > >         int (*unmap)(struct msm_mmu *mmu, uint64_t iova, size_t len);
> > >         void (*destroy)(struct msm_mmu *mmu);
> > >         void (*resume_translation)(struct msm_mmu *mmu);
> > > +       void (*set_stall)(struct msm_mmu *mmu, bool enable);
> > >  };
> > >
> > >  enum msm_mmu_type {
> > >
> > > --
> > > 2.47.1
> > >


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-04 16:56 ` [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault Connor Abbott
  2025-03-05 19:09   ` Rob Clark
@ 2025-03-11 18:05   ` Will Deacon
  2025-03-11 22:36     ` Connor Abbott
  1 sibling, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-11 18:05 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> This will be used by drm/msm for GPU page faults, replacing the manual
> register reading it does.
> 
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 ++--
>  drivers/iommu/arm/arm-smmu/arm-smmu.c            | 27 +++++++++++++-----------
>  drivers/iommu/arm/arm-smmu/arm-smmu.h            |  5 ++++-
>  3 files changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> index 548783f3f8e89fd978367afa65c473002f66e2e7..ae4fdbbce6ba80440f539557a39866a932360d4e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> @@ -400,7 +400,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
>  
>  	if (list_empty(&tbu_list)) {
>  		ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> -					 cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +					 cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>  
>  		if (ret == -ENOSYS)
>  			arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> @@ -412,7 +412,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
>  	phys_soft = ops->iova_to_phys(ops, cfi.iova);
>  
>  	tmp = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> -				 cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +				 cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
>  	if (!tmp || tmp == -EBUSY) {
>  		ret = IRQ_HANDLED;
>  		resume = ARM_SMMU_RESUME_TERMINATE;
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
>  				      struct arm_smmu_context_fault_info *cfi)
>  {
>  	cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> +	cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
>  	cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> -	cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> +	cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> +	cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);

We already have an implementation hook (->get_fault_info()) which the
qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
That thing dumps these registers already so if we're moving that into
the core SMMU driver, let's get rid of the hook and move everybody over
rather than having it done in both places.

>  	cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> +	cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);

I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
it for stage-2 domains.

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-04 16:56 ` [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly Connor Abbott
  2025-03-05 19:08   ` Rob Clark
@ 2025-03-11 18:08   ` Will Deacon
  2025-03-11 19:42     ` Connor Abbott
  1 sibling, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-11 18:08 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 04, 2025 at 11:56:48AM -0500, Connor Abbott wrote:
> In some cases drm/msm has to resume a stalled transaction directly in
> its fault handler. Experimentally this doesn't work on SMMU500 if the
> fault hasn't already been acknowledged by clearing FSR. Rather than
> trying to clear FSR in msm's fault handler and implementing a
> tricky handshake to avoid accidentally clearing FSR twice, we want to
> clear FSR before calling the fault handlers, but this means that the
> contents of registers can change underneath us in the fault handler and
> msm currently uses a private function to read the register contents for
> its own purposes in its fault handler, such as using the
> implementation-defined FSYNR1 to determine which block caused the fault.
> Fix this by making msm use the register values already read by arm-smmu
> itself before clearing FSR rather than messing around with reading
> registers directly.
> 
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
>  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
>  drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
>  3 files changed, 27 insertions(+), 27 deletions(-)

[...]

> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> @@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
>  	ARM_SMMU_DOMAIN_NESTED,
>  };
>  
> +struct arm_smmu_context_fault_info {
> +	unsigned long iova;
> +	u64 ttbr0;
> +	u32 fsr;
> +	u32 fsynr0;
> +	u32 fsynr1;
> +	u32 cbfrsynra;
> +	u32 contextidr;
> +};
> +
>  struct arm_smmu_domain {
>  	struct arm_smmu_device		*smmu;
>  	struct io_pgtable_ops		*pgtbl_ops;
> @@ -380,6 +390,7 @@ struct arm_smmu_domain {
>  	const struct iommu_flush_ops	*flush_ops;
>  	struct arm_smmu_cfg		cfg;
>  	enum arm_smmu_domain_stage	stage;
> +	struct arm_smmu_context_fault_info cfi;

Does this mean we have to serialise all faults for a given domain? That
can't be right...

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-04 16:56 ` [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on Connor Abbott
  2025-03-05 19:09   ` Rob Clark
@ 2025-03-11 18:11   ` Will Deacon
  2025-03-11 20:01     ` Connor Abbott
  1 sibling, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-11 18:11 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 04, 2025 at 11:56:50AM -0500, Connor Abbott wrote:
> Up until now we have only called the set_stall callback during
> initialization when the device is off. But we will soon start calling it
> to temporarily disable stall-on-fault when the device is on, so handle
> that by checking if the device is on and writing SCTLR.
> 
> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
>  {
>  	struct arm_smmu_domain *smmu_domain = (void *)cookie;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> +	u32 mask = BIT(cfg->cbndx);
> +	bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
> +	unsigned long flags;
>  
>  	if (enabled)
> -		qsmmu->stall_enabled |= BIT(cfg->cbndx);
> +		qsmmu->stall_enabled |= mask;
>  	else
> -		qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> +		qsmmu->stall_enabled &= ~mask;
> +
> +	/*
> +	 * If the device is on and we changed the setting, update the register.
> +	 */
> +	if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
> +		spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> +
> +		u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> +
> +		if (enabled)
> +			reg |= ARM_SMMU_SCTLR_CFCFG;
> +		else
> +			reg &= ~ARM_SMMU_SCTLR_CFCFG;
> +
> +		arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);

Are you sure you don't need TLB invalidation for this to take effect? I
think some fields in the SCTLR can be cached in the TLB but you'll need
to check whether or not that applies to CFCFG.

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-11 18:08   ` Will Deacon
@ 2025-03-11 19:42     ` Connor Abbott
  2025-03-11 20:00       ` Rob Clark
  0 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-11 19:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 2:08 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Mar 04, 2025 at 11:56:48AM -0500, Connor Abbott wrote:
> > In some cases drm/msm has to resume a stalled transaction directly in
> > its fault handler. Experimentally this doesn't work on SMMU500 if the
> > fault hasn't already been acknowledged by clearing FSR. Rather than
> > trying to clear FSR in msm's fault handler and implementing a
> > tricky handshake to avoid accidentally clearing FSR twice, we want to
> > clear FSR before calling the fault handlers, but this means that the
> > contents of registers can change underneath us in the fault handler and
> > msm currently uses a private function to read the register contents for
> > its own purposes in its fault handler, such as using the
> > implementation-defined FSYNR1 to determine which block caused the fault.
> > Fix this by making msm use the register values already read by arm-smmu
> > itself before clearing FSR rather than messing around with reading
> > registers directly.
> >
> > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
> >  drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
> >  3 files changed, 27 insertions(+), 27 deletions(-)
>
> [...]
>
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > @@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
> >       ARM_SMMU_DOMAIN_NESTED,
> >  };
> >
> > +struct arm_smmu_context_fault_info {
> > +     unsigned long iova;
> > +     u64 ttbr0;
> > +     u32 fsr;
> > +     u32 fsynr0;
> > +     u32 fsynr1;
> > +     u32 cbfrsynra;
> > +     u32 contextidr;
> > +};
> > +
> >  struct arm_smmu_domain {
> >       struct arm_smmu_device          *smmu;
> >       struct io_pgtable_ops           *pgtbl_ops;
> > @@ -380,6 +390,7 @@ struct arm_smmu_domain {
> >       const struct iommu_flush_ops    *flush_ops;
> >       struct arm_smmu_cfg             cfg;
> >       enum arm_smmu_domain_stage      stage;
> > +     struct arm_smmu_context_fault_info cfi;
>
> Does this mean we have to serialise all faults for a given domain? That
> can't be right...
>
> Will

They are already serialized? There's only one of each register per
context bank, so you can only have one context fault at a time per
context bank, and AFAIK a context bank is 1:1 with a domain. Also this
struct is only written and then read inside the context bank's
interrupt handler, and you can only have one interrupt at a time, no?

Connor


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-11 19:42     ` Connor Abbott
@ 2025-03-11 20:00       ` Rob Clark
  2025-03-12 12:57         ` Robin Murphy
  2025-03-12 13:06         ` Will Deacon
  0 siblings, 2 replies; 33+ messages in thread
From: Rob Clark @ 2025-03-11 20:00 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 12:42 PM Connor Abbott <cwabbott0@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 2:08 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Tue, Mar 04, 2025 at 11:56:48AM -0500, Connor Abbott wrote:
> > > In some cases drm/msm has to resume a stalled transaction directly in
> > > its fault handler. Experimentally this doesn't work on SMMU500 if the
> > > fault hasn't already been acknowledged by clearing FSR. Rather than
> > > trying to clear FSR in msm's fault handler and implementing a
> > > tricky handshake to avoid accidentally clearing FSR twice, we want to
> > > clear FSR before calling the fault handlers, but this means that the
> > > contents of registers can change underneath us in the fault handler and
> > > msm currently uses a private function to read the register contents for
> > > its own purposes in its fault handler, such as using the
> > > implementation-defined FSYNR1 to determine which block caused the fault.
> > > Fix this by making msm use the register values already read by arm-smmu
> > > itself before clearing FSR rather than messing around with reading
> > > registers directly.
> > >
> > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > ---
> > >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
> > >  3 files changed, 27 insertions(+), 27 deletions(-)
> >
> > [...]
> >
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > @@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
> > >       ARM_SMMU_DOMAIN_NESTED,
> > >  };
> > >
> > > +struct arm_smmu_context_fault_info {
> > > +     unsigned long iova;
> > > +     u64 ttbr0;
> > > +     u32 fsr;
> > > +     u32 fsynr0;
> > > +     u32 fsynr1;
> > > +     u32 cbfrsynra;
> > > +     u32 contextidr;
> > > +};
> > > +
> > >  struct arm_smmu_domain {
> > >       struct arm_smmu_device          *smmu;
> > >       struct io_pgtable_ops           *pgtbl_ops;
> > > @@ -380,6 +390,7 @@ struct arm_smmu_domain {
> > >       const struct iommu_flush_ops    *flush_ops;
> > >       struct arm_smmu_cfg             cfg;
> > >       enum arm_smmu_domain_stage      stage;
> > > +     struct arm_smmu_context_fault_info cfi;
> >
> > Does this mean we have to serialise all faults for a given domain? That
> > can't be right...
> >
> > Will
>
> They are already serialized? There's only one of each register per
> context bank, so you can only have one context fault at a time per
> context bank, and AFAIK a context bank is 1:1 with a domain. Also this
> struct is only written and then read inside the context bank's
> interrupt handler, and you can only have one interrupt at a time, no?
>
> Connor

And if it was a race condition with cfi getting overridden, it would
have already been an equivalent race condition currently when reading
the values from registers (ie. the register values could have changed
in the elapsed time)

So no additional serialization needed here.

BR,
-R


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-11 18:11   ` Will Deacon
@ 2025-03-11 20:01     ` Connor Abbott
  2025-03-12 12:49       ` Will Deacon
  0 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-11 20:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 2:11 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Mar 04, 2025 at 11:56:50AM -0500, Connor Abbott wrote:
> > Up until now we have only called the set_stall callback during
> > initialization when the device is off. But we will soon start calling it
> > to temporarily disable stall-on-fault when the device is on, so handle
> > that by checking if the device is on and writing SCTLR.
> >
> > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
> >  1 file changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > @@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
> >  {
> >       struct arm_smmu_domain *smmu_domain = (void *)cookie;
> >       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > -     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> > +     u32 mask = BIT(cfg->cbndx);
> > +     bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
> > +     unsigned long flags;
> >
> >       if (enabled)
> > -             qsmmu->stall_enabled |= BIT(cfg->cbndx);
> > +             qsmmu->stall_enabled |= mask;
> >       else
> > -             qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> > +             qsmmu->stall_enabled &= ~mask;
> > +
> > +     /*
> > +      * If the device is on and we changed the setting, update the register.
> > +      */
> > +     if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
> > +             spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> > +
> > +             u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> > +
> > +             if (enabled)
> > +                     reg |= ARM_SMMU_SCTLR_CFCFG;
> > +             else
> > +                     reg &= ~ARM_SMMU_SCTLR_CFCFG;
> > +
> > +             arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
>
> Are you sure you don't need TLB invalidation for this to take effect? I
> think some fields in the SCTLR can be cached in the TLB but you'll need
> to check whether or not that applies to CFCFG.
>
> Will

I think it should be fine because CFCFG only controls behavior when
there's a context fault and there can't be TLB entries for entries
that cause a context fault: "The architecture permits the caching of
any translation table entry that has been returned from memory without
a fault and that does not, as a result of that entry, cause a
Translation Fault or an Access Flag fault."

Connor


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-11 18:05   ` Will Deacon
@ 2025-03-11 22:36     ` Connor Abbott
  2025-03-12 13:05       ` Will Deacon
  0 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-11 22:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> > This will be used by drm/msm for GPU page faults, replacing the manual
> > register reading it does.
> >
> > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 ++--
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c            | 27 +++++++++++++-----------
> >  drivers/iommu/arm/arm-smmu/arm-smmu.h            |  5 ++++-
> >  3 files changed, 21 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > index 548783f3f8e89fd978367afa65c473002f66e2e7..ae4fdbbce6ba80440f539557a39866a932360d4e 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > @@ -400,7 +400,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
> >
> >       if (list_empty(&tbu_list)) {
> >               ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > -                                      cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > +                                      cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> >
> >               if (ret == -ENOSYS)
> >                       arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> > @@ -412,7 +412,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
> >       phys_soft = ops->iova_to_phys(ops, cfi.iova);
> >
> >       tmp = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > -                              cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > +                              cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> >       if (!tmp || tmp == -EBUSY) {
> >               ret = IRQ_HANDLED;
> >               resume = ARM_SMMU_RESUME_TERMINATE;
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> >                                     struct arm_smmu_context_fault_info *cfi)
> >  {
> >       cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> >       cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
>
> We already have an implementation hook (->get_fault_info()) which the
> qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> That thing dumps these registers already so if we're moving that into
> the core SMMU driver, let's get rid of the hook and move everybody over
> rather than having it done in both places.

As you probably saw, the next commit moves over
qcom_adreno_smmu_get_fault_info() to use this. The current back door
used by drm/msm to access these functions is specific to adreno_smmu
and there isn't an equivalent interface to allow it to call a generic
SMMU function so it isn't possible to move it entirely to the core. At
least not without a bigger refactoring that isn't justified for this
series that is just trying to fix things.

>
> >       cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
>
> I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> it for stage-2 domains.
>
> Will

Does it matter if we read the register though, as long as users are
aware of this and don't use its value for anything?

Connor


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-11 20:01     ` Connor Abbott
@ 2025-03-12 12:49       ` Will Deacon
  2025-03-12 13:30         ` Connor Abbott
  0 siblings, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-12 12:49 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 04:01:00PM -0400, Connor Abbott wrote:
> On Tue, Mar 11, 2025 at 2:11 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Tue, Mar 04, 2025 at 11:56:50AM -0500, Connor Abbott wrote:
> > > Up until now we have only called the set_stall callback during
> > > initialization when the device is off. But we will soon start calling it
> > > to temporarily disable stall-on-fault when the device is on, so handle
> > > that by checking if the device is on and writing SCTLR.
> > >
> > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > ---
> > >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
> > >  1 file changed, 27 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > @@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
> > >  {
> > >       struct arm_smmu_domain *smmu_domain = (void *)cookie;
> > >       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > > -     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> > > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > > +     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> > > +     u32 mask = BIT(cfg->cbndx);
> > > +     bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
> > > +     unsigned long flags;
> > >
> > >       if (enabled)
> > > -             qsmmu->stall_enabled |= BIT(cfg->cbndx);
> > > +             qsmmu->stall_enabled |= mask;
> > >       else
> > > -             qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> > > +             qsmmu->stall_enabled &= ~mask;
> > > +
> > > +     /*
> > > +      * If the device is on and we changed the setting, update the register.
> > > +      */
> > > +     if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
> > > +             spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> > > +
> > > +             u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> > > +
> > > +             if (enabled)
> > > +                     reg |= ARM_SMMU_SCTLR_CFCFG;
> > > +             else
> > > +                     reg &= ~ARM_SMMU_SCTLR_CFCFG;
> > > +
> > > +             arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
> >
> > Are you sure you don't need TLB invalidation for this to take effect? I
> > think some fields in the SCTLR can be cached in the TLB but you'll need
> > to check whether or not that applies to CFCFG.
> >
> 
> I think it should be fine because CFCFG only controls behavior when
> there's a context fault and there can't be TLB entries for entries
> that cause a context fault: "The architecture permits the caching of
> any translation table entry that has been returned from memory without
> a fault and that does not, as a result of that entry, cause a
> Translation Fault or an Access Flag fault."

Ok, but what about other types of fault? For example, a permission fault
or an address size fault?

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-11 20:00       ` Rob Clark
@ 2025-03-12 12:57         ` Robin Murphy
  2025-03-12 13:06         ` Will Deacon
  1 sibling, 0 replies; 33+ messages in thread
From: Robin Murphy @ 2025-03-12 12:57 UTC (permalink / raw)
  To: Rob Clark, Connor Abbott
  Cc: Will Deacon, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On 11/03/2025 8:00 pm, Rob Clark wrote:
> On Tue, Mar 11, 2025 at 12:42 PM Connor Abbott <cwabbott0@gmail.com> wrote:
>>
>> On Tue, Mar 11, 2025 at 2:08 PM Will Deacon <will@kernel.org> wrote:
>>>
>>> On Tue, Mar 04, 2025 at 11:56:48AM -0500, Connor Abbott wrote:
>>>> In some cases drm/msm has to resume a stalled transaction directly in
>>>> its fault handler. Experimentally this doesn't work on SMMU500 if the
>>>> fault hasn't already been acknowledged by clearing FSR. Rather than
>>>> trying to clear FSR in msm's fault handler and implementing a
>>>> tricky handshake to avoid accidentally clearing FSR twice, we want to
>>>> clear FSR before calling the fault handlers, but this means that the
>>>> contents of registers can change underneath us in the fault handler and
>>>> msm currently uses a private function to read the register contents for
>>>> its own purposes in its fault handler, such as using the
>>>> implementation-defined FSYNR1 to determine which block caused the fault.
>>>> Fix this by making msm use the register values already read by arm-smmu
>>>> itself before clearing FSR rather than messing around with reading
>>>> registers directly.
>>>>
>>>> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
>>>> ---
>>>>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
>>>>   drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
>>>>   drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
>>>>   3 files changed, 27 insertions(+), 27 deletions(-)
>>>
>>> [...]
>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
>>>> index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
>>>> @@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
>>>>        ARM_SMMU_DOMAIN_NESTED,
>>>>   };
>>>>
>>>> +struct arm_smmu_context_fault_info {
>>>> +     unsigned long iova;
>>>> +     u64 ttbr0;
>>>> +     u32 fsr;
>>>> +     u32 fsynr0;
>>>> +     u32 fsynr1;
>>>> +     u32 cbfrsynra;
>>>> +     u32 contextidr;
>>>> +};
>>>> +
>>>>   struct arm_smmu_domain {
>>>>        struct arm_smmu_device          *smmu;
>>>>        struct io_pgtable_ops           *pgtbl_ops;
>>>> @@ -380,6 +390,7 @@ struct arm_smmu_domain {
>>>>        const struct iommu_flush_ops    *flush_ops;
>>>>        struct arm_smmu_cfg             cfg;
>>>>        enum arm_smmu_domain_stage      stage;
>>>> +     struct arm_smmu_context_fault_info cfi;
>>>
>>> Does this mean we have to serialise all faults for a given domain? That
>>> can't be right...
>>>
>>> Will
>>
>> They are already serialized? There's only one of each register per
>> context bank, so you can only have one context fault at a time per
>> context bank, and AFAIK a context bank is 1:1 with a domain. Also this
>> struct is only written and then read inside the context bank's
>> interrupt handler, and you can only have one interrupt at a time, no?
>>
>> Connor
> 
> And if it was a race condition with cfi getting overridden, it would
> have already been an equivalent race condition currently when reading
> the values from registers (ie. the register values could have changed
> in the elapsed time)
> 
> So no additional serialization needed here.

Agreed, the only subtlety is the other side of things is true as well - 
i.e. cfi is only valid and stable between the IRQ being fired and 
CBn_RESUME being written, so actually it *mustn't* be touched by 
anything outside the fault handling path.

Thanks,
Robin.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-11 22:36     ` Connor Abbott
@ 2025-03-12 13:05       ` Will Deacon
  2025-03-12 14:59         ` Rob Clark
  0 siblings, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-12 13:05 UTC (permalink / raw)
  To: Connor Abbott
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
> On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> > > This will be used by drm/msm for GPU page faults, replacing the manual
> > > register reading it does.
> > >
> > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > ---
> > >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 ++--
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.c            | 27 +++++++++++++-----------
> > >  drivers/iommu/arm/arm-smmu/arm-smmu.h            |  5 ++++-
> > >  3 files changed, 21 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > > index 548783f3f8e89fd978367afa65c473002f66e2e7..ae4fdbbce6ba80440f539557a39866a932360d4e 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > > @@ -400,7 +400,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
> > >
> > >       if (list_empty(&tbu_list)) {
> > >               ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > > -                                      cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > +                                      cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > >
> > >               if (ret == -ENOSYS)
> > >                       arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> > > @@ -412,7 +412,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
> > >       phys_soft = ops->iova_to_phys(ops, cfi.iova);
> > >
> > >       tmp = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > > -                              cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > +                              cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > >       if (!tmp || tmp == -EBUSY) {
> > >               ret = IRQ_HANDLED;
> > >               resume = ARM_SMMU_RESUME_TERMINATE;
> > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> > >                                     struct arm_smmu_context_fault_info *cfi)
> > >  {
> > >       cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > > +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> > >       cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > > -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
> >
> > We already have an implementation hook (->get_fault_info()) which the
> > qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> > That thing dumps these registers already so if we're moving that into
> > the core SMMU driver, let's get rid of the hook and move everybody over
> > rather than having it done in both places.
> 
> As you probably saw, the next commit moves over
> qcom_adreno_smmu_get_fault_info() to use this. The current back door
> used by drm/msm to access these functions is specific to adreno_smmu
> and there isn't an equivalent interface to allow it to call a generic
> SMMU function so it isn't possible to move it entirely to the core. At
> least not without a bigger refactoring that isn't justified for this
> series that is just trying to fix things.

Ok :(

> > >       cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > > +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
> >
> > I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> > it for stage-2 domains.
> >
> Does it matter if we read the register though, as long as users are
> aware of this and don't use its value for anything?

I think the contents are "UNKNOWN", so it could be hugely confusing even
if they just got logged someplace. Why is it difficult to avoid touching
it for stage-2?

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly
  2025-03-11 20:00       ` Rob Clark
  2025-03-12 12:57         ` Robin Murphy
@ 2025-03-12 13:06         ` Will Deacon
  1 sibling, 0 replies; 33+ messages in thread
From: Will Deacon @ 2025-03-12 13:06 UTC (permalink / raw)
  To: Rob Clark
  Cc: Connor Abbott, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	iommu, linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 11, 2025 at 01:00:34PM -0700, Rob Clark wrote:
> On Tue, Mar 11, 2025 at 12:42 PM Connor Abbott <cwabbott0@gmail.com> wrote:
> >
> > On Tue, Mar 11, 2025 at 2:08 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Tue, Mar 04, 2025 at 11:56:48AM -0500, Connor Abbott wrote:
> > > > In some cases drm/msm has to resume a stalled transaction directly in
> > > > its fault handler. Experimentally this doesn't work on SMMU500 if the
> > > > fault hasn't already been acknowledged by clearing FSR. Rather than
> > > > trying to clear FSR in msm's fault handler and implementing a
> > > > tricky handshake to avoid accidentally clearing FSR twice, we want to
> > > > clear FSR before calling the fault handlers, but this means that the
> > > > contents of registers can change underneath us in the fault handler and
> > > > msm currently uses a private function to read the register contents for
> > > > its own purposes in its fault handler, such as using the
> > > > implementation-defined FSYNR1 to determine which block caused the fault.
> > > > Fix this by making msm use the register values already read by arm-smmu
> > > > itself before clearing FSR rather than messing around with reading
> > > > registers directly.
> > > >
> > > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > > ---
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++++++++----------
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu.c      | 14 +++++++-------
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu.h      | 21 +++++++++++----------
> > > >  3 files changed, 27 insertions(+), 27 deletions(-)
> > >
> > > [...]
> > >
> > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > > index d3bc77dcd4d40f25bc70f3289616fb866649b022..411d807e0a7033833716635efb3968a0bd3ff237 100644
> > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
> > > > @@ -373,6 +373,16 @@ enum arm_smmu_domain_stage {
> > > >       ARM_SMMU_DOMAIN_NESTED,
> > > >  };
> > > >
> > > > +struct arm_smmu_context_fault_info {
> > > > +     unsigned long iova;
> > > > +     u64 ttbr0;
> > > > +     u32 fsr;
> > > > +     u32 fsynr0;
> > > > +     u32 fsynr1;
> > > > +     u32 cbfrsynra;
> > > > +     u32 contextidr;
> > > > +};
> > > > +
> > > >  struct arm_smmu_domain {
> > > >       struct arm_smmu_device          *smmu;
> > > >       struct io_pgtable_ops           *pgtbl_ops;
> > > > @@ -380,6 +390,7 @@ struct arm_smmu_domain {
> > > >       const struct iommu_flush_ops    *flush_ops;
> > > >       struct arm_smmu_cfg             cfg;
> > > >       enum arm_smmu_domain_stage      stage;
> > > > +     struct arm_smmu_context_fault_info cfi;
> > >
> > > Does this mean we have to serialise all faults for a given domain? That
> > > can't be right...
> > >
> >
> > They are already serialized? There's only one of each register per
> > context bank, so you can only have one context fault at a time per
> > context bank, and AFAIK a context bank is 1:1 with a domain. Also this
> > struct is only written and then read inside the context bank's
> > interrupt handler, and you can only have one interrupt at a time, no?
> >
> > Connor
> 
> And if it was a race condition with cfi getting overridden, it would
> have already been an equivalent race condition currently when reading
> the values from registers (ie. the register values could have changed
> in the elapsed time)
> 
> So no additional serialization needed here.

Oops, yes, sorry. I've been spending too long on SMMUv3 and forgot how
the context banks worked.

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-12 12:49       ` Will Deacon
@ 2025-03-12 13:30         ` Connor Abbott
  2025-03-18 13:36           ` Robin Murphy
  0 siblings, 1 reply; 33+ messages in thread
From: Connor Abbott @ 2025-03-12 13:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 12, 2025 at 8:49 AM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Mar 11, 2025 at 04:01:00PM -0400, Connor Abbott wrote:
> > On Tue, Mar 11, 2025 at 2:11 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Tue, Mar 04, 2025 at 11:56:50AM -0500, Connor Abbott wrote:
> > > > Up until now we have only called the set_stall callback during
> > > > initialization when the device is off. But we will soon start calling it
> > > > to temporarily disable stall-on-fault when the device is on, so handle
> > > > that by checking if the device is on and writing SCTLR.
> > > >
> > > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > > ---
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
> > > >  1 file changed, 27 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > > index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
> > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > > > @@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
> > > >  {
> > > >       struct arm_smmu_domain *smmu_domain = (void *)cookie;
> > > >       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > > > -     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
> > > > +     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > > > +     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> > > > +     u32 mask = BIT(cfg->cbndx);
> > > > +     bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
> > > > +     unsigned long flags;
> > > >
> > > >       if (enabled)
> > > > -             qsmmu->stall_enabled |= BIT(cfg->cbndx);
> > > > +             qsmmu->stall_enabled |= mask;
> > > >       else
> > > > -             qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
> > > > +             qsmmu->stall_enabled &= ~mask;
> > > > +
> > > > +     /*
> > > > +      * If the device is on and we changed the setting, update the register.
> > > > +      */
> > > > +     if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
> > > > +             spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> > > > +
> > > > +             u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> > > > +
> > > > +             if (enabled)
> > > > +                     reg |= ARM_SMMU_SCTLR_CFCFG;
> > > > +             else
> > > > +                     reg &= ~ARM_SMMU_SCTLR_CFCFG;
> > > > +
> > > > +             arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
> > >
> > > Are you sure you don't need TLB invalidation for this to take effect? I
> > > think some fields in the SCTLR can be cached in the TLB but you'll need
> > > to check whether or not that applies to CFCFG.
> > >
> >
> > I think it should be fine because CFCFG only controls behavior when
> > there's a context fault and there can't be TLB entries for entries
> > that cause a context fault: "The architecture permits the caching of
> > any translation table entry that has been returned from memory without
> > a fault and that does not, as a result of that entry, cause a
> > Translation Fault or an Access Flag fault."
>
> Ok, but what about other types of fault? For example, a permission fault
> or an address size fault?
>
> Will

I'm not sure, but the pseudocode for context faults mentions
resampling CFCFG after a fault happens ("We have a fault and must
resample FSR, CFCFG and HUPCF") so I don't think it would be legal to
cache it. Also in practice this does seem to work. Does that answer
it?

Connor


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-12 13:05       ` Will Deacon
@ 2025-03-12 14:59         ` Rob Clark
  2025-03-12 16:47           ` Will Deacon
  0 siblings, 1 reply; 33+ messages in thread
From: Rob Clark @ 2025-03-12 14:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Connor Abbott, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	iommu, linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 12, 2025 at 6:05 AM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
> > On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> > > > This will be used by drm/msm for GPU page faults, replacing the manual
> > > > register reading it does.
> > > >
> > > > Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
> > > > ---
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c |  4 ++--
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu.c            | 27 +++++++++++++-----------
> > > >  drivers/iommu/arm/arm-smmu/arm-smmu.h            |  5 ++++-
> > > >  3 files changed, 21 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > > > index 548783f3f8e89fd978367afa65c473002f66e2e7..ae4fdbbce6ba80440f539557a39866a932360d4e 100644
> > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.c
> > > > @@ -400,7 +400,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
> > > >
> > > >       if (list_empty(&tbu_list)) {
> > > >               ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > > > -                                      cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > > +                                      cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > >
> > > >               if (ret == -ENOSYS)
> > > >                       arm_smmu_print_context_fault_info(smmu, idx, &cfi);
> > > > @@ -412,7 +412,7 @@ irqreturn_t qcom_smmu_context_fault(int irq, void *dev)
> > > >       phys_soft = ops->iova_to_phys(ops, cfi.iova);
> > > >
> > > >       tmp = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
> > > > -                              cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > > +                              cfi.fsynr0 & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> > > >       if (!tmp || tmp == -EBUSY) {
> > > >               ret = IRQ_HANDLED;
> > > >               resume = ARM_SMMU_RESUME_TERMINATE;
> > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> > > >                                     struct arm_smmu_context_fault_info *cfi)
> > > >  {
> > > >       cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > > > +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> > > >       cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > > > -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > > +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > > +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
> > >
> > > We already have an implementation hook (->get_fault_info()) which the
> > > qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> > > That thing dumps these registers already so if we're moving that into
> > > the core SMMU driver, let's get rid of the hook and move everybody over
> > > rather than having it done in both places.
> >
> > As you probably saw, the next commit moves over
> > qcom_adreno_smmu_get_fault_info() to use this. The current back door
> > used by drm/msm to access these functions is specific to adreno_smmu
> > and there isn't an equivalent interface to allow it to call a generic
> > SMMU function so it isn't possible to move it entirely to the core. At
> > least not without a bigger refactoring that isn't justified for this
> > series that is just trying to fix things.
>
> Ok :(
>
> > > >       cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > > > +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
> > >
> > > I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> > > it for stage-2 domains.
> > >
> > Does it matter if we read the register though, as long as users are
> > aware of this and don't use its value for anything?
>
> I think the contents are "UNKNOWN", so it could be hugely confusing even
> if they just got logged someplace. Why is it difficult to avoid touching
> it for stage-2?
>
> Will

Fwiw, we are only ever using stage-1

BR,
-R


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-12 14:59         ` Rob Clark
@ 2025-03-12 16:47           ` Will Deacon
  2025-03-12 17:23             ` Rob Clark
  0 siblings, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-12 16:47 UTC (permalink / raw)
  To: Rob Clark
  Cc: Connor Abbott, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	iommu, linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 12, 2025 at 07:59:52AM -0700, Rob Clark wrote:
> On Wed, Mar 12, 2025 at 6:05 AM Will Deacon <will@kernel.org> wrote:
> > On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
> > > On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
> > > > On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> > > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> > > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> > > > >                                     struct arm_smmu_context_fault_info *cfi)
> > > > >  {
> > > > >       cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > > > > +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> > > > >       cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > > > > -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > > > +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > > > +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
> > > >
> > > > We already have an implementation hook (->get_fault_info()) which the
> > > > qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> > > > That thing dumps these registers already so if we're moving that into
> > > > the core SMMU driver, let's get rid of the hook and move everybody over
> > > > rather than having it done in both places.
> > >
> > > As you probably saw, the next commit moves over
> > > qcom_adreno_smmu_get_fault_info() to use this. The current back door
> > > used by drm/msm to access these functions is specific to adreno_smmu
> > > and there isn't an equivalent interface to allow it to call a generic
> > > SMMU function so it isn't possible to move it entirely to the core. At
> > > least not without a bigger refactoring that isn't justified for this
> > > series that is just trying to fix things.
> >
> > Ok :(
> >
> > > > >       cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > > > > +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
> > > >
> > > > I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> > > > it for stage-2 domains.
> > > >
> > > Does it matter if we read the register though, as long as users are
> > > aware of this and don't use its value for anything?
> >
> > I think the contents are "UNKNOWN", so it could be hugely confusing even
> > if they just got logged someplace. Why is it difficult to avoid touching
> > it for stage-2?
> >
> Fwiw, we are only ever using stage-1

Sure, but this is in arm-smmu.c which is used by other people and supports
both stages.

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-12 16:47           ` Will Deacon
@ 2025-03-12 17:23             ` Rob Clark
  2025-03-12 18:01               ` Robin Murphy
  0 siblings, 1 reply; 33+ messages in thread
From: Rob Clark @ 2025-03-12 17:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Connor Abbott, Robin Murphy, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	iommu, linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 12, 2025 at 9:47 AM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Mar 12, 2025 at 07:59:52AM -0700, Rob Clark wrote:
> > On Wed, Mar 12, 2025 at 6:05 AM Will Deacon <will@kernel.org> wrote:
> > > On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
> > > > On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
> > > > > On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> > > > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > > index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> > > > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > > > @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> > > > > >                                     struct arm_smmu_context_fault_info *cfi)
> > > > > >  {
> > > > > >       cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > > > > > +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> > > > > >       cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > > > > > -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > > > > +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > > > > > +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
> > > > >
> > > > > We already have an implementation hook (->get_fault_info()) which the
> > > > > qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> > > > > That thing dumps these registers already so if we're moving that into
> > > > > the core SMMU driver, let's get rid of the hook and move everybody over
> > > > > rather than having it done in both places.
> > > >
> > > > As you probably saw, the next commit moves over
> > > > qcom_adreno_smmu_get_fault_info() to use this. The current back door
> > > > used by drm/msm to access these functions is specific to adreno_smmu
> > > > and there isn't an equivalent interface to allow it to call a generic
> > > > SMMU function so it isn't possible to move it entirely to the core. At
> > > > least not without a bigger refactoring that isn't justified for this
> > > > series that is just trying to fix things.
> > >
> > > Ok :(
> > >
> > > > > >       cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > > > > > +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
> > > > >
> > > > > I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> > > > > it for stage-2 domains.
> > > > >
> > > > Does it matter if we read the register though, as long as users are
> > > > aware of this and don't use its value for anything?
> > >
> > > I think the contents are "UNKNOWN", so it could be hugely confusing even
> > > if they just got logged someplace. Why is it difficult to avoid touching
> > > it for stage-2?
> > >
> > Fwiw, we are only ever using stage-1
>
> Sure, but this is in arm-smmu.c which is used by other people and supports
> both stages.

Sure, but no one else is using this field in the fault-info.  So maybe
the addition of a comment in the struct would be enough if it isn't
going to cause an SError/etc to read it for S2 cb?

BR,
-R


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-12 17:23             ` Rob Clark
@ 2025-03-12 18:01               ` Robin Murphy
  2025-03-12 20:20                 ` Rob Clark
  0 siblings, 1 reply; 33+ messages in thread
From: Robin Murphy @ 2025-03-12 18:01 UTC (permalink / raw)
  To: Rob Clark, Will Deacon
  Cc: Connor Abbott, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On 12/03/2025 5:23 pm, Rob Clark wrote:
> On Wed, Mar 12, 2025 at 9:47 AM Will Deacon <will@kernel.org> wrote:
>>
>> On Wed, Mar 12, 2025 at 07:59:52AM -0700, Rob Clark wrote:
>>> On Wed, Mar 12, 2025 at 6:05 AM Will Deacon <will@kernel.org> wrote:
>>>> On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
>>>>> On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
>>>>>> On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
>>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>>>>> index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
>>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>>>>> @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
>>>>>>>                                      struct arm_smmu_context_fault_info *cfi)
>>>>>>>   {
>>>>>>>        cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>>>>>>> +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
>>>>>>>        cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>>>>>>> -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
>>>>>>> +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
>>>>>>> +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
>>>>>>
>>>>>> We already have an implementation hook (->get_fault_info()) which the
>>>>>> qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
>>>>>> That thing dumps these registers already so if we're moving that into
>>>>>> the core SMMU driver, let's get rid of the hook and move everybody over
>>>>>> rather than having it done in both places.
>>>>>
>>>>> As you probably saw, the next commit moves over
>>>>> qcom_adreno_smmu_get_fault_info() to use this. The current back door
>>>>> used by drm/msm to access these functions is specific to adreno_smmu
>>>>> and there isn't an equivalent interface to allow it to call a generic
>>>>> SMMU function so it isn't possible to move it entirely to the core. At
>>>>> least not without a bigger refactoring that isn't justified for this
>>>>> series that is just trying to fix things.
>>>>
>>>> Ok :(
>>>>
>>>>>>>        cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>>>>>>> +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
>>>>>>
>>>>>> I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
>>>>>> it for stage-2 domains.
>>>>>>
>>>>> Does it matter if we read the register though, as long as users are
>>>>> aware of this and don't use its value for anything?
>>>>
>>>> I think the contents are "UNKNOWN", so it could be hugely confusing even
>>>> if they just got logged someplace. Why is it difficult to avoid touching
>>>> it for stage-2?
>>>>
>>> Fwiw, we are only ever using stage-1
>>
>> Sure, but this is in arm-smmu.c which is used by other people and supports
>> both stages.
> 
> Sure, but no one else is using this field in the fault-info.  So maybe
> the addition of a comment in the struct would be enough if it isn't
> going to cause an SError/etc to read it for S2 cb?

Any worthwhile comment isn't going to be significantly shorter or 
clearer than 1 extra line of "if (smmu_domain->stage == 
ARM_SMMU_DOMAIN_S1)"...

TBH it's the Qualcomm register-middle-man firmware I'd be more worried 
about than real hardware, given how touchy it can be even with register 
accesses which *should* be well defined. But then I guess it also has 
the habit of killing the system if anything other than the GPU dares 
cause a fault in the first place, so maybe it OK?

If anyone still uses Arm Fast Models SMMUv1/2 components it'll probably 
squawk an annoying warning there too - ISTR I had at least one patch 
motivated by that in the past.

Thanks,
Robin.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-12 18:01               ` Robin Murphy
@ 2025-03-12 20:20                 ` Rob Clark
  2025-03-18 15:46                   ` Will Deacon
  0 siblings, 1 reply; 33+ messages in thread
From: Rob Clark @ 2025-03-12 20:20 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, Connor Abbott, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	iommu, linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 12, 2025 at 11:01 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 12/03/2025 5:23 pm, Rob Clark wrote:
> > On Wed, Mar 12, 2025 at 9:47 AM Will Deacon <will@kernel.org> wrote:
> >>
> >> On Wed, Mar 12, 2025 at 07:59:52AM -0700, Rob Clark wrote:
> >>> On Wed, Mar 12, 2025 at 6:05 AM Will Deacon <will@kernel.org> wrote:
> >>>> On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
> >>>>> On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
> >>>>>> On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> >>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>>>>>> index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> >>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> >>>>>>> @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> >>>>>>>                                      struct arm_smmu_context_fault_info *cfi)
> >>>>>>>   {
> >>>>>>>        cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> >>>>>>> +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> >>>>>>>        cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> >>>>>>> -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> >>>>>>> +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> >>>>>>> +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
> >>>>>>
> >>>>>> We already have an implementation hook (->get_fault_info()) which the
> >>>>>> qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> >>>>>> That thing dumps these registers already so if we're moving that into
> >>>>>> the core SMMU driver, let's get rid of the hook and move everybody over
> >>>>>> rather than having it done in both places.
> >>>>>
> >>>>> As you probably saw, the next commit moves over
> >>>>> qcom_adreno_smmu_get_fault_info() to use this. The current back door
> >>>>> used by drm/msm to access these functions is specific to adreno_smmu
> >>>>> and there isn't an equivalent interface to allow it to call a generic
> >>>>> SMMU function so it isn't possible to move it entirely to the core. At
> >>>>> least not without a bigger refactoring that isn't justified for this
> >>>>> series that is just trying to fix things.
> >>>>
> >>>> Ok :(
> >>>>
> >>>>>>>        cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> >>>>>>> +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
> >>>>>>
> >>>>>> I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> >>>>>> it for stage-2 domains.
> >>>>>>
> >>>>> Does it matter if we read the register though, as long as users are
> >>>>> aware of this and don't use its value for anything?
> >>>>
> >>>> I think the contents are "UNKNOWN", so it could be hugely confusing even
> >>>> if they just got logged someplace. Why is it difficult to avoid touching
> >>>> it for stage-2?
> >>>>
> >>> Fwiw, we are only ever using stage-1
> >>
> >> Sure, but this is in arm-smmu.c which is used by other people and supports
> >> both stages.
> >
> > Sure, but no one else is using this field in the fault-info.  So maybe
> > the addition of a comment in the struct would be enough if it isn't
> > going to cause an SError/etc to read it for S2 cb?
>
> Any worthwhile comment isn't going to be significantly shorter or
> clearer than 1 extra line of "if (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_S1)"...

Just that smmu_domain isn't passed to
arm_smmu_read_context_fault_info(), so it would add some extra churn
on top of that one extra line

> TBH it's the Qualcomm register-middle-man firmware I'd be more worried
> about than real hardware, given how touchy it can be even with register
> accesses which *should* be well defined. But then I guess it also has
> the habit of killing the system if anything other than the GPU dares
> cause a fault in the first place, so maybe it OK?

If we have qc hyp, then we don't get S2 in the first place ;-)

BR,
-R

> If anyone still uses Arm Fast Models SMMUv1/2 components it'll probably
> squawk an annoying warning there too - ISTR I had at least one patch
> motivated by that in the past.
>
> Thanks,
> Robin.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-12 13:30         ` Connor Abbott
@ 2025-03-18 13:36           ` Robin Murphy
  2025-03-18 15:47             ` Will Deacon
  0 siblings, 1 reply; 33+ messages in thread
From: Robin Murphy @ 2025-03-18 13:36 UTC (permalink / raw)
  To: Connor Abbott, Will Deacon
  Cc: Rob Clark, Joerg Roedel, Sean Paul, Konrad Dybcio, Abhinav Kumar,
	Dmitry Baryshkov, Marijn Suijten, iommu, linux-arm-msm,
	linux-arm-kernel, freedreno

On 12/03/2025 1:30 pm, Connor Abbott wrote:
> On Wed, Mar 12, 2025 at 8:49 AM Will Deacon <will@kernel.org> wrote:
>>
>> On Tue, Mar 11, 2025 at 04:01:00PM -0400, Connor Abbott wrote:
>>> On Tue, Mar 11, 2025 at 2:11 PM Will Deacon <will@kernel.org> wrote:
>>>>
>>>> On Tue, Mar 04, 2025 at 11:56:50AM -0500, Connor Abbott wrote:
>>>>> Up until now we have only called the set_stall callback during
>>>>> initialization when the device is off. But we will soon start calling it
>>>>> to temporarily disable stall-on-fault when the device is on, so handle
>>>>> that by checking if the device is on and writing SCTLR.
>>>>>
>>>>> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
>>>>> ---
>>>>>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 30 +++++++++++++++++++++++++++---
>>>>>   1 file changed, 27 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>> index a428e53add08d451fb2152e3ab80e0fba936e214..d34a0d917013bb3d5a24b3ce72f48e3b38474da2 100644
>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>>> @@ -77,12 +77,36 @@ static void qcom_adreno_smmu_set_stall(const void *cookie, bool enabled)
>>>>>   {
>>>>>        struct arm_smmu_domain *smmu_domain = (void *)cookie;
>>>>>        struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>>>>> -     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu_domain->smmu);
>>>>> +     struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>> +     struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>>>> +     u32 mask = BIT(cfg->cbndx);
>>>>> +     bool stall_changed = !!(qsmmu->stall_enabled & mask) != enabled;
>>>>> +     unsigned long flags;
>>>>>
>>>>>        if (enabled)
>>>>> -             qsmmu->stall_enabled |= BIT(cfg->cbndx);
>>>>> +             qsmmu->stall_enabled |= mask;
>>>>>        else
>>>>> -             qsmmu->stall_enabled &= ~BIT(cfg->cbndx);
>>>>> +             qsmmu->stall_enabled &= ~mask;
>>>>> +
>>>>> +     /*
>>>>> +      * If the device is on and we changed the setting, update the register.
>>>>> +      */
>>>>> +     if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
>>>>> +             spin_lock_irqsave(&smmu_domain->cb_lock, flags);
>>>>> +
>>>>> +             u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
>>>>> +
>>>>> +             if (enabled)
>>>>> +                     reg |= ARM_SMMU_SCTLR_CFCFG;
>>>>> +             else
>>>>> +                     reg &= ~ARM_SMMU_SCTLR_CFCFG;
>>>>> +
>>>>> +             arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
>>>>
>>>> Are you sure you don't need TLB invalidation for this to take effect? I
>>>> think some fields in the SCTLR can be cached in the TLB but you'll need
>>>> to check whether or not that applies to CFCFG.
>>>>
>>>
>>> I think it should be fine because CFCFG only controls behavior when
>>> there's a context fault and there can't be TLB entries for entries
>>> that cause a context fault: "The architecture permits the caching of
>>> any translation table entry that has been returned from memory without
>>> a fault and that does not, as a result of that entry, cause a
>>> Translation Fault or an Access Flag fault."
>>
>> Ok, but what about other types of fault? For example, a permission fault
>> or an address size fault?
>>
>> Will
> 
> I'm not sure, but the pseudocode for context faults mentions
> resampling CFCFG after a fault happens ("We have a fault and must
> resample FSR, CFCFG and HUPCF") so I don't think it would be legal to
> cache it. Also in practice this does seem to work. Does that answer
> it?

FWIW I checked with the former MMU-500 design lead, and although he 
doesn't remember the exact details he's pretty confident that they 
wouldn't have cached anything fault-related, so at least from our side 
I'd consider this OK.

Thanks,
Robin.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault
  2025-03-12 20:20                 ` Rob Clark
@ 2025-03-18 15:46                   ` Will Deacon
  0 siblings, 0 replies; 33+ messages in thread
From: Will Deacon @ 2025-03-18 15:46 UTC (permalink / raw)
  To: Rob Clark
  Cc: Robin Murphy, Connor Abbott, Joerg Roedel, Sean Paul,
	Konrad Dybcio, Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten,
	iommu, linux-arm-msm, linux-arm-kernel, freedreno

On Wed, Mar 12, 2025 at 01:20:45PM -0700, Rob Clark wrote:
> On Wed, Mar 12, 2025 at 11:01 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > On 12/03/2025 5:23 pm, Rob Clark wrote:
> > > On Wed, Mar 12, 2025 at 9:47 AM Will Deacon <will@kernel.org> wrote:
> > >>
> > >> On Wed, Mar 12, 2025 at 07:59:52AM -0700, Rob Clark wrote:
> > >>> On Wed, Mar 12, 2025 at 6:05 AM Will Deacon <will@kernel.org> wrote:
> > >>>> On Tue, Mar 11, 2025 at 06:36:38PM -0400, Connor Abbott wrote:
> > >>>>> On Tue, Mar 11, 2025 at 2:06 PM Will Deacon <will@kernel.org> wrote:
> > >>>>>> On Tue, Mar 04, 2025 at 11:56:47AM -0500, Connor Abbott wrote:
> > >>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >>>>>>> index ade4684c14c9b2724a71e2457288dbfaf7562c83..a9213e0f1579d1e3be0bfba75eea1d5de23117de 100644
> > >>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > >>>>>>> @@ -409,9 +409,12 @@ void arm_smmu_read_context_fault_info(struct arm_smmu_device *smmu, int idx,
> > >>>>>>>                                      struct arm_smmu_context_fault_info *cfi)
> > >>>>>>>   {
> > >>>>>>>        cfi->iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
> > >>>>>>> +     cfi->ttbr0 = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_TTBR0);
> > >>>>>>>        cfi->fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
> > >>>>>>> -     cfi->fsynr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > >>>>>>> +     cfi->fsynr0 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR0);
> > >>>>>>> +     cfi->fsynr1 = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSYNR1);
> > >>>>>>
> > >>>>>> We already have an implementation hook (->get_fault_info()) which the
> > >>>>>> qcom SMMU driver can override with qcom_adreno_smmu_get_fault_info().
> > >>>>>> That thing dumps these registers already so if we're moving that into
> > >>>>>> the core SMMU driver, let's get rid of the hook and move everybody over
> > >>>>>> rather than having it done in both places.
> > >>>>>
> > >>>>> As you probably saw, the next commit moves over
> > >>>>> qcom_adreno_smmu_get_fault_info() to use this. The current back door
> > >>>>> used by drm/msm to access these functions is specific to adreno_smmu
> > >>>>> and there isn't an equivalent interface to allow it to call a generic
> > >>>>> SMMU function so it isn't possible to move it entirely to the core. At
> > >>>>> least not without a bigger refactoring that isn't justified for this
> > >>>>> series that is just trying to fix things.
> > >>>>
> > >>>> Ok :(
> > >>>>
> > >>>>>>>        cfi->cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
> > >>>>>>> +     cfi->contextidr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_CONTEXTIDR);
> > >>>>>>
> > >>>>>> I think the CONTEXTIDR register is stage-1 only, so we shouldn't dump
> > >>>>>> it for stage-2 domains.
> > >>>>>>
> > >>>>> Does it matter if we read the register though, as long as users are
> > >>>>> aware of this and don't use its value for anything?
> > >>>>
> > >>>> I think the contents are "UNKNOWN", so it could be hugely confusing even
> > >>>> if they just got logged someplace. Why is it difficult to avoid touching
> > >>>> it for stage-2?
> > >>>>
> > >>> Fwiw, we are only ever using stage-1
> > >>
> > >> Sure, but this is in arm-smmu.c which is used by other people and supports
> > >> both stages.
> > >
> > > Sure, but no one else is using this field in the fault-info.  So maybe
> > > the addition of a comment in the struct would be enough if it isn't
> > > going to cause an SError/etc to read it for S2 cb?
> >
> > Any worthwhile comment isn't going to be significantly shorter or
> > clearer than 1 extra line of "if (smmu_domain->stage ==
> > ARM_SMMU_DOMAIN_S1)"...
> 
> Just that smmu_domain isn't passed to
> arm_smmu_read_context_fault_info(), so it would add some extra churn
> on top of that one extra line

Churn is what we do best in this driver :)

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on
  2025-03-18 13:36           ` Robin Murphy
@ 2025-03-18 15:47             ` Will Deacon
  0 siblings, 0 replies; 33+ messages in thread
From: Will Deacon @ 2025-03-18 15:47 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Connor Abbott, Rob Clark, Joerg Roedel, Sean Paul, Konrad Dybcio,
	Abhinav Kumar, Dmitry Baryshkov, Marijn Suijten, iommu,
	linux-arm-msm, linux-arm-kernel, freedreno

On Tue, Mar 18, 2025 at 01:36:55PM +0000, Robin Murphy wrote:
> On 12/03/2025 1:30 pm, Connor Abbott wrote:
> > On Wed, Mar 12, 2025 at 8:49 AM Will Deacon <will@kernel.org> wrote:
> > > On Tue, Mar 11, 2025 at 04:01:00PM -0400, Connor Abbott wrote:
> > > > On Tue, Mar 11, 2025 at 2:11 PM Will Deacon <will@kernel.org> wrote:
> > > > > On Tue, Mar 04, 2025 at 11:56:50AM -0500, Connor Abbott wrote:
> > > > > > +     /*
> > > > > > +      * If the device is on and we changed the setting, update the register.
> > > > > > +      */
> > > > > > +     if (stall_changed && pm_runtime_get_if_active(smmu->dev) > 0) {
> > > > > > +             spin_lock_irqsave(&smmu_domain->cb_lock, flags);
> > > > > > +
> > > > > > +             u32 reg = arm_smmu_cb_read(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR);
> > > > > > +
> > > > > > +             if (enabled)
> > > > > > +                     reg |= ARM_SMMU_SCTLR_CFCFG;
> > > > > > +             else
> > > > > > +                     reg &= ~ARM_SMMU_SCTLR_CFCFG;
> > > > > > +
> > > > > > +             arm_smmu_cb_write(smmu, cfg->cbndx, ARM_SMMU_CB_SCTLR, reg);
> > > > > 
> > > > > Are you sure you don't need TLB invalidation for this to take effect? I
> > > > > think some fields in the SCTLR can be cached in the TLB but you'll need
> > > > > to check whether or not that applies to CFCFG.
> > > > > 
> > > > 
> > > > I think it should be fine because CFCFG only controls behavior when
> > > > there's a context fault and there can't be TLB entries for entries
> > > > that cause a context fault: "The architecture permits the caching of
> > > > any translation table entry that has been returned from memory without
> > > > a fault and that does not, as a result of that entry, cause a
> > > > Translation Fault or an Access Flag fault."
> > > 
> > > Ok, but what about other types of fault? For example, a permission fault
> > > or an address size fault?
> > > 
> > I'm not sure, but the pseudocode for context faults mentions
> > resampling CFCFG after a fault happens ("We have a fault and must
> > resample FSR, CFCFG and HUPCF") so I don't think it would be legal to
> > cache it. Also in practice this does seem to work. Does that answer
> > it?
> 
> FWIW I checked with the former MMU-500 design lead, and although he doesn't
> remember the exact details he's pretty confident that they wouldn't have
> cached anything fault-related, so at least from our side I'd consider this
> OK.

Thanks for checking, Robin.

Connor -- please can you stick some of the above in a comment prior
to the CFCFG munging so I don't "rediscover" the problem in the future?

Will


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2025-03-18 15:51 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-04 16:56 [PATCH v4 0/5] iommu/arm-smmu, drm/msm: Fixes for stall-on-fault Connor Abbott
2025-03-04 16:56 ` [PATCH v4 1/5] iommu/arm-smmu: Save additional information on context fault Connor Abbott
2025-03-05 19:09   ` Rob Clark
2025-03-11 18:05   ` Will Deacon
2025-03-11 22:36     ` Connor Abbott
2025-03-12 13:05       ` Will Deacon
2025-03-12 14:59         ` Rob Clark
2025-03-12 16:47           ` Will Deacon
2025-03-12 17:23             ` Rob Clark
2025-03-12 18:01               ` Robin Murphy
2025-03-12 20:20                 ` Rob Clark
2025-03-18 15:46                   ` Will Deacon
2025-03-04 16:56 ` [PATCH v4 2/5] iommu/arm-smmu-qcom: Don't read fault registers directly Connor Abbott
2025-03-05 19:08   ` Rob Clark
2025-03-11 18:08   ` Will Deacon
2025-03-11 19:42     ` Connor Abbott
2025-03-11 20:00       ` Rob Clark
2025-03-12 12:57         ` Robin Murphy
2025-03-12 13:06         ` Will Deacon
2025-03-04 16:56 ` [PATCH v4 3/5] iommu/arm-smmu: Fix spurious interrupts with stall-on-fault Connor Abbott
2025-03-05 19:12   ` Rob Clark
2025-03-04 16:56 ` [PATCH v4 4/5] iommu/arm-smmu-qcom: Make set_stall work when the device is on Connor Abbott
2025-03-05 19:09   ` Rob Clark
2025-03-11 18:11   ` Will Deacon
2025-03-11 20:01     ` Connor Abbott
2025-03-12 12:49       ` Will Deacon
2025-03-12 13:30         ` Connor Abbott
2025-03-18 13:36           ` Robin Murphy
2025-03-18 15:47             ` Will Deacon
2025-03-04 16:56 ` [PATCH v4 5/5] drm/msm: Temporarily disable stall-on-fault after a page fault Connor Abbott
2025-03-05 19:07   ` Rob Clark
2025-03-05 19:38     ` Connor Abbott
2025-03-05 19:56       ` Rob Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).