public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
@ 2026-01-27 12:11 Prakash Gupta
  2026-01-27 12:52 ` Akhil P Oommen
  2026-01-27 16:05 ` Robin Murphy
  0 siblings, 2 replies; 11+ messages in thread
From: Prakash Gupta @ 2026-01-27 12:11 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, Rob Clark, Connor Abbott
  Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma, Prakash Gupta

Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the driver")
enabled pm_runtime for the arm-smmu device. On systems where the SMMU
sits in a power domain, all register accesses must be done while the
device is runtime-resumed to avoid unclocked register reads and
potential NoC errors.

So far, this has not been an issue for most SMMU clients because
stall-on-fault is enabled by default. While a translation fault is
being handled, the SMMU stalls further translations for that context
bank, so the fault handler would not race with a powered-down
SMMU.

Adreno SMMU now disables stall-on-fault in the presence of fault
storms to avoid saturating SMMU resources and hanging the GMU. With
stall-on-fault disabled, the SMMU can generate faults while its power
domain may no longer be enabled, which makes unclocked accesses to
fault-status registers in the SMMU fault handlers possible.

Guard the context and global fault handlers with arm_smmu_rpm_get() /
arm_smmu_rpm_put() so that all SMMU fault register accesses are done
with the SMMU powered.

Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault after a page fault")
Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53 ++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 573085349df3..2d03df72612d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
 	const struct of_device_id *client_match;
+	const struct arm_smmu_impl *impl = qsmmu->data->impl;
 	int cbndx = smmu_domain->cfg.cbndx;
 	struct adreno_smmu_priv *priv;
 
@@ -350,10 +351,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
 	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
 	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
 	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
-	priv->set_stall = qcom_adreno_smmu_set_stall;
 	priv->set_prr_bit = NULL;
 	priv->set_prr_addr = NULL;
 
+	if (impl->context_fault_needs_threaded_irq)
+		priv->set_stall = qcom_adreno_smmu_set_stall;
+
 	if (of_device_is_compatible(np, "qcom,smmu-500") &&
 	    !of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
 	    of_device_is_compatible(np, "qcom,adreno-smmu")) {
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 5e690cf85ec9..183f12e45b02 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	int idx = smmu_domain->cfg.cbndx;
 	int ret;
 
+	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) {
+		ret = arm_smmu_rpm_get(smmu);
+		if (ret < 0)
+			return IRQ_NONE;
+	}
+
+	if (smmu->impl && smmu->impl->context_fault) {
+		ret = smmu->impl->context_fault(irq, dev);
+		goto out_power_off;
+	}
+
 	arm_smmu_read_context_fault_info(smmu, idx, &cfi);
 
-	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
-		return IRQ_NONE;
+	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) {
+		ret = IRQ_NONE;
+		goto out_power_off;
+	}
 
 	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
 		cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
@@ -480,7 +493,14 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 				  ret == -EAGAIN ? 0 : ARM_SMMU_RESUME_TERMINATE);
 	}
 
-	return IRQ_HANDLED;
+	ret = IRQ_HANDLED;
+
+out_power_off:
+
+	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
+		arm_smmu_rpm_put(smmu);
+
+	return ret;
 }
 
 static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
@@ -489,14 +509,21 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	struct arm_smmu_device *smmu = dev;
 	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
+	int ret;
+
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		return IRQ_NONE;
 
 	gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR);
 	gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0);
 	gfsynr1 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR1);
 	gfsynr2 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR2);
 
-	if (!gfsr)
-		return IRQ_NONE;
+	if (!gfsr) {
+		ret = IRQ_NONE;
+		goto out_power_off;
+	}
 
 	if (__ratelimit(&rs)) {
 		if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
@@ -513,7 +540,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	}
 
 	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
-	return IRQ_HANDLED;
+	ret = IRQ_HANDLED;
+
+out_power_off:
+	arm_smmu_rpm_put(smmu);
+	return ret;
 }
 
 static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
@@ -683,7 +714,6 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain,
 	enum io_pgtable_fmt fmt;
 	struct iommu_domain *domain = &smmu_domain->domain;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	irqreturn_t (*context_fault)(int irq, void *dev);
 
 	mutex_lock(&smmu_domain->init_mutex);
 	if (smmu_domain->smmu)
@@ -850,19 +880,14 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain,
 	 */
 	irq = smmu->irqs[cfg->irptndx];
 
-	if (smmu->impl && smmu->impl->context_fault)
-		context_fault = smmu->impl->context_fault;
-	else
-		context_fault = arm_smmu_context_fault;
-
 	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
 		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
-						context_fault,
+						arm_smmu_context_fault,
 						IRQF_ONESHOT | IRQF_SHARED,
 						"arm-smmu-context-fault",
 						smmu_domain);
 	else
-		ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED,
+		ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, IRQF_SHARED,
 				       "arm-smmu-context-fault", smmu_domain);
 
 	if (ret < 0) {

---
base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2
change-id: 20251208-smmu-rpm-8bd67db93dca

Best regards,
--  
Prakash Gupta <prakash.gupta@oss.qualcomm.com>


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-01-27 12:11 [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers Prakash Gupta
@ 2026-01-27 12:52 ` Akhil P Oommen
  2026-01-27 16:05 ` Robin Murphy
  1 sibling, 0 replies; 11+ messages in thread
From: Akhil P Oommen @ 2026-01-27 12:52 UTC (permalink / raw)
  To: Prakash Gupta, Will Deacon, Robin Murphy, Joerg Roedel, Rob Clark,
	Connor Abbott
  Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Pratyush Brahma

On 1/27/2026 5:41 PM, Prakash Gupta wrote:
> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the driver")
> enabled pm_runtime for the arm-smmu device. On systems where the SMMU
> sits in a power domain, all register accesses must be done while the
> device is runtime-resumed to avoid unclocked register reads and
> potential NoC errors.
> 
> So far, this has not been an issue for most SMMU clients because
> stall-on-fault is enabled by default. While a translation fault is
> being handled, the SMMU stalls further translations for that context
> bank, so the fault handler would not race with a powered-down
> SMMU.
> 
> Adreno SMMU now disables stall-on-fault in the presence of fault
> storms to avoid saturating SMMU resources and hanging the GMU. With
> stall-on-fault disabled, the SMMU can generate faults while its power
> domain may no longer be enabled, which makes unclocked accesses to
> fault-status registers in the SMMU fault handlers possible.
> 
> Guard the context and global fault handlers with arm_smmu_rpm_get() /
> arm_smmu_rpm_put() so that all SMMU fault register accesses are done
> with the SMMU powered.
> 
> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault after a page fault")
> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>

Acked-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>

-Akhil

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-01-27 12:11 [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers Prakash Gupta
  2026-01-27 12:52 ` Akhil P Oommen
@ 2026-01-27 16:05 ` Robin Murphy
  2026-01-28  5:56   ` Prakash Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2026-01-27 16:05 UTC (permalink / raw)
  To: Prakash Gupta, Will Deacon, Joerg Roedel, Rob Clark,
	Connor Abbott
  Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma

On 2026-01-27 12:11 pm, Prakash Gupta wrote:
> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the driver")
> enabled pm_runtime for the arm-smmu device. On systems where the SMMU
> sits in a power domain, all register accesses must be done while the
> device is runtime-resumed to avoid unclocked register reads and
> potential NoC errors.
> 
> So far, this has not been an issue for most SMMU clients because
> stall-on-fault is enabled by default. While a translation fault is
> being handled, the SMMU stalls further translations for that context
> bank, so the fault handler would not race with a powered-down
> SMMU.
> 
> Adreno SMMU now disables stall-on-fault in the presence of fault
> storms to avoid saturating SMMU resources and hanging the GMU. With
> stall-on-fault disabled, the SMMU can generate faults while its power
> domain may no longer be enabled, which makes unclocked accesses to
> fault-status registers in the SMMU fault handlers possible.

At face value, that sounds wrong - how does an SMMU generate a fault, or 
indeed do anything, when it's powered off? In principle it's possible 
that the SMMU might signal an interrupt, and is _then_ suspended (with 
the interrupt line somehow remaining asserted, so probably more 
clock-gated than completely powered down) before the interrupt hander 
runs, but we rather assume that we're not going to have an unhandled 
hardware IRQ hanging around for longer than the autosuspend delay.

So, judging by the diff below, I guess what you really mean is that in 
the case of a threaded context IRQ handler, it can take long enough 
between handling the hardware IRQ and the thread actually running that 
the SMMU may have suspended in between?

> Guard the context and global fault handlers with arm_smmu_rpm_get() /
> arm_smmu_rpm_put() so that all SMMU fault register accesses are done
> with the SMMU powered.
> 
> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault after a page fault")
> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>
> ---
>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++-
>   drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53 ++++++++++++++++++++++--------
>   2 files changed, 43 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 573085349df3..2d03df72612d 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>   	struct arm_smmu_device *smmu = smmu_domain->smmu;
>   	struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>   	const struct of_device_id *client_match;
> +	const struct arm_smmu_impl *impl = qsmmu->data->impl;
>   	int cbndx = smmu_domain->cfg.cbndx;
>   	struct adreno_smmu_priv *priv;
>   
> @@ -350,10 +351,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>   	priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>   	priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>   	priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
> -	priv->set_stall = qcom_adreno_smmu_set_stall;
>   	priv->set_prr_bit = NULL;
>   	priv->set_prr_addr = NULL;
>   
> +	if (impl->context_fault_needs_threaded_irq)
> +		priv->set_stall = qcom_adreno_smmu_set_stall;
> +
>   	if (of_device_is_compatible(np, "qcom,smmu-500") &&
>   	    !of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
>   	    of_device_is_compatible(np, "qcom,adreno-smmu")) {
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 5e690cf85ec9..183f12e45b02 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>   	int idx = smmu_domain->cfg.cbndx;
>   	int ret;
>   
> +	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) {

Why is this conditional on being threaded, if the global fault handler 
that can never be threaded at all apparently needs it unconditionally?

Thanks,
Robin.

> +		ret = arm_smmu_rpm_get(smmu);
> +		if (ret < 0)
> +			return IRQ_NONE;
> +	}
> +
> +	if (smmu->impl && smmu->impl->context_fault) {
> +		ret = smmu->impl->context_fault(irq, dev);
> +		goto out_power_off;
> +	}
> +
>   	arm_smmu_read_context_fault_info(smmu, idx, &cfi);
>   
> -	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
> -		return IRQ_NONE;
> +	if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) {
> +		ret = IRQ_NONE;
> +		goto out_power_off;
> +	}
>   
>   	ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
>   		cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> @@ -480,7 +493,14 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>   				  ret == -EAGAIN ? 0 : ARM_SMMU_RESUME_TERMINATE);
>   	}
>   
> -	return IRQ_HANDLED;
> +	ret = IRQ_HANDLED;
> +
> +out_power_off:
> +
> +	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
> +		arm_smmu_rpm_put(smmu);
> +
> +	return ret;
>   }
>   
>   static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
> @@ -489,14 +509,21 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>   	struct arm_smmu_device *smmu = dev;
>   	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>   				      DEFAULT_RATELIMIT_BURST);
> +	int ret;
> +
> +	ret = arm_smmu_rpm_get(smmu);
> +	if (ret < 0)
> +		return IRQ_NONE;
>   
>   	gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR);
>   	gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0);
>   	gfsynr1 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR1);
>   	gfsynr2 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR2);
>   
> -	if (!gfsr)
> -		return IRQ_NONE;
> +	if (!gfsr) {
> +		ret = IRQ_NONE;
> +		goto out_power_off;
> +	}
>   
>   	if (__ratelimit(&rs)) {
>   		if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
> @@ -513,7 +540,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>   	}
>   
>   	arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
> -	return IRQ_HANDLED;
> +	ret = IRQ_HANDLED;
> +
> +out_power_off:
> +	arm_smmu_rpm_put(smmu);
> +	return ret;
>   }
>   
>   static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> @@ -683,7 +714,6 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain,
>   	enum io_pgtable_fmt fmt;
>   	struct iommu_domain *domain = &smmu_domain->domain;
>   	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	irqreturn_t (*context_fault)(int irq, void *dev);
>   
>   	mutex_lock(&smmu_domain->init_mutex);
>   	if (smmu_domain->smmu)
> @@ -850,19 +880,14 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain,
>   	 */
>   	irq = smmu->irqs[cfg->irptndx];
>   
> -	if (smmu->impl && smmu->impl->context_fault)
> -		context_fault = smmu->impl->context_fault;
> -	else
> -		context_fault = arm_smmu_context_fault;
> -
>   	if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
>   		ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
> -						context_fault,
> +						arm_smmu_context_fault,
>   						IRQF_ONESHOT | IRQF_SHARED,
>   						"arm-smmu-context-fault",
>   						smmu_domain);
>   	else
> -		ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED,
> +		ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, IRQF_SHARED,
>   				       "arm-smmu-context-fault", smmu_domain);
>   
>   	if (ret < 0) {
> 
> ---
> base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2
> change-id: 20251208-smmu-rpm-8bd67db93dca
> 
> Best regards,
> --
> Prakash Gupta <prakash.gupta@oss.qualcomm.com>
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-01-27 16:05 ` Robin Murphy
@ 2026-01-28  5:56   ` Prakash Gupta
  2026-01-28 18:44     ` Robin Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Prakash Gupta @ 2026-01-28  5:56 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott
  Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma


On 1/27/2026 9:35 PM, Robin Murphy wrote:
> On 2026-01-27 12:11 pm, Prakash Gupta wrote:
>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the
>> driver")
>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU
>> sits in a power domain, all register accesses must be done while the
>> device is runtime-resumed to avoid unclocked register reads and
>> potential NoC errors.
>>
>> So far, this has not been an issue for most SMMU clients because
>> stall-on-fault is enabled by default. While a translation fault is
>> being handled, the SMMU stalls further translations for that context
>> bank, so the fault handler would not race with a powered-down
>> SMMU.
>>
>> Adreno SMMU now disables stall-on-fault in the presence of fault
>> storms to avoid saturating SMMU resources and hanging the GMU. With
>> stall-on-fault disabled, the SMMU can generate faults while its power
>> domain may no longer be enabled, which makes unclocked accesses to
>> fault-status registers in the SMMU fault handlers possible.
>
> At face value, that sounds wrong - how does an SMMU generate a fault,
> or indeed do anything, when it's powered off? In principle it's
> possible that the SMMU might signal an interrupt, and is _then_
> suspended (with the interrupt line somehow remaining asserted, so
> probably more clock-gated than completely powered down) before the
> interrupt hander runs, but we rather assume that we're not going to
> have an unhandled hardware IRQ hanging around for longer than the
> autosuspend delay.
>
> So, judging by the diff below, I guess what you really mean is that in
> the case of a threaded context IRQ handler, it can take long enough
> between handling the hardware IRQ and the thread actually running that
> the SMMU may have suspended in between? 

You are correct that the SMMU cannot generate a fault while powered
down. A more accurate description of the race condition is as follows:
When stall-on-fault is disabled, the faulting transaction does is
terminated. This allows the master (the GPU) to complete its work, drop
its power vote for the SMMU, and allow the SMMU to suspend. However, the
SMMU fault handler may still be waiting to execute on the CPU.
If the SMMU suspends before the handler reads the fault registers, an
unclocked access occurs. This scenario is significantly more likely when
using threaded IRQs due to the scheduling latency involved. I will
update the next iteration to reflect this.

>
>> Guard the context and global fault handlers with arm_smmu_rpm_get() /
>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done
>> with the SMMU powered.
>>
>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault
>> after a page fault")
>> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
>> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
>> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>
>> ---
>>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++-
>>   drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53
>> ++++++++++++++++++++++--------
>>   2 files changed, 43 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> index 573085349df3..2d03df72612d 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct
>> arm_smmu_domain *smmu_domain,
>>       struct arm_smmu_device *smmu = smmu_domain->smmu;
>>       struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>       const struct of_device_id *client_match;
>> +    const struct arm_smmu_impl *impl = qsmmu->data->impl;
>>       int cbndx = smmu_domain->cfg.cbndx;
>>       struct adreno_smmu_priv *priv;
>>   @@ -350,10 +351,12 @@ static int
>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>       priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>>       priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>>       priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
>> -    priv->set_stall = qcom_adreno_smmu_set_stall;
>>       priv->set_prr_bit = NULL;
>>       priv->set_prr_addr = NULL;
>>   +    if (impl->context_fault_needs_threaded_irq)
>> +        priv->set_stall = qcom_adreno_smmu_set_stall;
>> +
>>       if (of_device_is_compatible(np, "qcom,smmu-500") &&
>>           !of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
>>           of_device_is_compatible(np, "qcom,adreno-smmu")) {
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> index 5e690cf85ec9..183f12e45b02 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int
>> irq, void *dev)
>>       int idx = smmu_domain->cfg.cbndx;
>>       int ret;
>>   +    if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) {
>
> Why is this conditional on being threaded, if the global fault handler
> that can never be threaded at all apparently needs it unconditionally? 
Synchronous runtime PM calls can sleep, which would cause issue if
called within a hard IRQ context. This is why I added the conditional
check for threaded IRQs.
Furthermore, this change only allow the driver to override the
stall-on-fault setting when context_fault_needs_threaded_irq is true.
Since the unclocked access issue is tied to disabling stall-on-fault,
the fix is only logically required for the threaded IRQ path.
For the Global Fault handler, which runs in a hard IRQ context, you are
right—we cannot safely vote for power there. I will remove the runtime
PM call from that section.
>
> Thanks,
> Robin.
>
>> +        ret = arm_smmu_rpm_get(smmu);
>> +        if (ret < 0)
>> +            return IRQ_NONE;
>> +    }
>> +
>> +    if (smmu->impl && smmu->impl->context_fault) {
>> +        ret = smmu->impl->context_fault(irq, dev);
>> +        goto out_power_off;
>> +    }
>> +
>>       arm_smmu_read_context_fault_info(smmu, idx, &cfi);
>>   -    if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
>> -        return IRQ_NONE;
>> +    if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) {
>> +        ret = IRQ_NONE;
>> +        goto out_power_off;
>> +    }
>>         ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
>>           cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE :
>> IOMMU_FAULT_READ);
>> @@ -480,7 +493,14 @@ static irqreturn_t arm_smmu_context_fault(int
>> irq, void *dev)
>>                     ret == -EAGAIN ? 0 : ARM_SMMU_RESUME_TERMINATE);
>>       }
>>   -    return IRQ_HANDLED;
>> +    ret = IRQ_HANDLED;
>> +
>> +out_power_off:
>> +
>> +    if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
>> +        arm_smmu_rpm_put(smmu);
>> +
>> +    return ret;
>>   }
>>     static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>> @@ -489,14 +509,21 @@ static irqreturn_t arm_smmu_global_fault(int
>> irq, void *dev)
>>       struct arm_smmu_device *smmu = dev;
>>       static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
>>                         DEFAULT_RATELIMIT_BURST);
>> +    int ret;
>> +
>> +    ret = arm_smmu_rpm_get(smmu);
>> +    if (ret < 0)
>> +        return IRQ_NONE;
>>         gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR);
>>       gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0);
>>       gfsynr1 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR1);
>>       gfsynr2 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR2);
>>   -    if (!gfsr)
>> -        return IRQ_NONE;
>> +    if (!gfsr) {
>> +        ret = IRQ_NONE;
>> +        goto out_power_off;
>> +    }
>>         if (__ratelimit(&rs)) {
>>           if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
>> @@ -513,7 +540,11 @@ static irqreturn_t arm_smmu_global_fault(int
>> irq, void *dev)
>>       }
>>         arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
>> -    return IRQ_HANDLED;
>> +    ret = IRQ_HANDLED;
>> +
>> +out_power_off:
>> +    arm_smmu_rpm_put(smmu);
>> +    return ret;
>>   }
>>     static void arm_smmu_init_context_bank(struct arm_smmu_domain
>> *smmu_domain,
>> @@ -683,7 +714,6 @@ static int arm_smmu_init_domain_context(struct
>> arm_smmu_domain *smmu_domain,
>>       enum io_pgtable_fmt fmt;
>>       struct iommu_domain *domain = &smmu_domain->domain;
>>       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> -    irqreturn_t (*context_fault)(int irq, void *dev);
>>         mutex_lock(&smmu_domain->init_mutex);
>>       if (smmu_domain->smmu)
>> @@ -850,19 +880,14 @@ static int arm_smmu_init_domain_context(struct
>> arm_smmu_domain *smmu_domain,
>>        */
>>       irq = smmu->irqs[cfg->irptndx];
>>   -    if (smmu->impl && smmu->impl->context_fault)
>> -        context_fault = smmu->impl->context_fault;
>> -    else
>> -        context_fault = arm_smmu_context_fault;
>> -
>>       if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
>>           ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
>> -                        context_fault,
>> +                        arm_smmu_context_fault,
>>                           IRQF_ONESHOT | IRQF_SHARED,
>>                           "arm-smmu-context-fault",
>>                           smmu_domain);
>>       else
>> -        ret = devm_request_irq(smmu->dev, irq, context_fault,
>> IRQF_SHARED,
>> +        ret = devm_request_irq(smmu->dev, irq,
>> arm_smmu_context_fault, IRQF_SHARED,
>>                          "arm-smmu-context-fault", smmu_domain);
>>         if (ret < 0) {
>>
>> ---
>> base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2
>> change-id: 20251208-smmu-rpm-8bd67db93dca
>>
>> Best regards,
>> -- 
>> Prakash Gupta <prakash.gupta@oss.qualcomm.com>
>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-01-28  5:56   ` Prakash Gupta
@ 2026-01-28 18:44     ` Robin Murphy
  2026-01-29 16:03       ` Prakash Gupta
  2026-02-02 20:14       ` Pranjal Shrivastava
  0 siblings, 2 replies; 11+ messages in thread
From: Robin Murphy @ 2026-01-28 18:44 UTC (permalink / raw)
  To: Prakash Gupta, Will Deacon, Joerg Roedel, Rob Clark,
	Connor Abbott
  Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma, Pranjal Shrivastava

[ +Pranjal as this might matter for v3 too... ]

On 28/01/2026 5:56 am, Prakash Gupta wrote:
> 
> On 1/27/2026 9:35 PM, Robin Murphy wrote:
>> On 2026-01-27 12:11 pm, Prakash Gupta wrote:
>>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the
>>> driver")
>>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU
>>> sits in a power domain, all register accesses must be done while the
>>> device is runtime-resumed to avoid unclocked register reads and
>>> potential NoC errors.
>>>
>>> So far, this has not been an issue for most SMMU clients because
>>> stall-on-fault is enabled by default. While a translation fault is
>>> being handled, the SMMU stalls further translations for that context
>>> bank, so the fault handler would not race with a powered-down
>>> SMMU.
>>>
>>> Adreno SMMU now disables stall-on-fault in the presence of fault
>>> storms to avoid saturating SMMU resources and hanging the GMU. With
>>> stall-on-fault disabled, the SMMU can generate faults while its power
>>> domain may no longer be enabled, which makes unclocked accesses to
>>> fault-status registers in the SMMU fault handlers possible.
>>
>> At face value, that sounds wrong - how does an SMMU generate a fault,
>> or indeed do anything, when it's powered off? In principle it's
>> possible that the SMMU might signal an interrupt, and is _then_
>> suspended (with the interrupt line somehow remaining asserted, so
>> probably more clock-gated than completely powered down) before the
>> interrupt hander runs, but we rather assume that we're not going to
>> have an unhandled hardware IRQ hanging around for longer than the
>> autosuspend delay.
>>
>> So, judging by the diff below, I guess what you really mean is that in
>> the case of a threaded context IRQ handler, it can take long enough
>> between handling the hardware IRQ and the thread actually running that
>> the SMMU may have suspended in between?
> 
> You are correct that the SMMU cannot generate a fault while powered
> down. A more accurate description of the race condition is as follows:
> When stall-on-fault is disabled, the faulting transaction does is
> terminated. This allows the master (the GPU) to complete its work, drop
> its power vote for the SMMU, and allow the SMMU to suspend. However, the
> SMMU fault handler may still be waiting to execute on the CPU.
> If the SMMU suspends before the handler reads the fault registers, an
> unclocked access occurs. This scenario is significantly more likely when
> using threaded IRQs due to the scheduling latency involved. I will
> update the next iteration to reflect this.
> 
>>
>>> Guard the context and global fault handlers with arm_smmu_rpm_get() /
>>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done
>>> with the SMMU powered.
>>>
>>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault
>>> after a page fault")
>>> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
>>> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
>>> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>
>>> ---
>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++-
>>>    drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53
>>> ++++++++++++++++++++++--------
>>>    2 files changed, 43 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> index 573085349df3..2d03df72612d 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct
>>> arm_smmu_domain *smmu_domain,
>>>        struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>        struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>>        const struct of_device_id *client_match;
>>> +    const struct arm_smmu_impl *impl = qsmmu->data->impl;
>>>        int cbndx = smmu_domain->cfg.cbndx;
>>>        struct adreno_smmu_priv *priv;
>>>    @@ -350,10 +351,12 @@ static int
>>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>>        priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>>>        priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>>>        priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
>>> -    priv->set_stall = qcom_adreno_smmu_set_stall;
>>>        priv->set_prr_bit = NULL;
>>>        priv->set_prr_addr = NULL;
>>>    +    if (impl->context_fault_needs_threaded_irq)
>>> +        priv->set_stall = qcom_adreno_smmu_set_stall;
>>> +
>>>        if (of_device_is_compatible(np, "qcom,smmu-500") &&
>>>            !of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
>>>            of_device_is_compatible(np, "qcom,adreno-smmu")) {
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> index 5e690cf85ec9..183f12e45b02 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int
>>> irq, void *dev)
>>>        int idx = smmu_domain->cfg.cbndx;
>>>        int ret;
>>>    +    if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) {
>>
>> Why is this conditional on being threaded, if the global fault handler
>> that can never be threaded at all apparently needs it unconditionally?
> Synchronous runtime PM calls can sleep, which would cause issue if
> called within a hard IRQ context. This is why I added the conditional
> check for threaded IRQs.
> Furthermore, this change only allow the driver to override the
> stall-on-fault setting when context_fault_needs_threaded_irq is true.
> Since the unclocked access issue is tied to disabling stall-on-fault,
> the fix is only logically required for the threaded IRQ path.
> For the Global Fault handler, which runs in a hard IRQ context, you are
> right—we cannot safely vote for power there. I will remove the runtime
> PM call from that section.

Hmm, but then how *do* we actually guarantee that autosuspend doesn't 
happen to kick in and power down the SMMU just as a hardirq handler 
runs, when there's some unexpected event? I fear there's a horrible can 
of worms here...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-01-28 18:44     ` Robin Murphy
@ 2026-01-29 16:03       ` Prakash Gupta
  2026-02-02 20:14       ` Pranjal Shrivastava
  1 sibling, 0 replies; 11+ messages in thread
From: Prakash Gupta @ 2026-01-29 16:03 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott
  Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma, Pranjal Shrivastava


On 1/29/2026 12:14 AM, Robin Murphy wrote:
> [ +Pranjal as this might matter for v3 too... ]
>
> On 28/01/2026 5:56 am, Prakash Gupta wrote:
>>
>> On 1/27/2026 9:35 PM, Robin Murphy wrote:
>>> On 2026-01-27 12:11 pm, Prakash Gupta wrote:
>>>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the
>>>> driver")
>>>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU
>>>> sits in a power domain, all register accesses must be done while the
>>>> device is runtime-resumed to avoid unclocked register reads and
>>>> potential NoC errors.
>>>>
>>>> So far, this has not been an issue for most SMMU clients because
>>>> stall-on-fault is enabled by default. While a translation fault is
>>>> being handled, the SMMU stalls further translations for that context
>>>> bank, so the fault handler would not race with a powered-down
>>>> SMMU.
>>>>
>>>> Adreno SMMU now disables stall-on-fault in the presence of fault
>>>> storms to avoid saturating SMMU resources and hanging the GMU. With
>>>> stall-on-fault disabled, the SMMU can generate faults while its power
>>>> domain may no longer be enabled, which makes unclocked accesses to
>>>> fault-status registers in the SMMU fault handlers possible.
>>>
>>> At face value, that sounds wrong - how does an SMMU generate a fault,
>>> or indeed do anything, when it's powered off? In principle it's
>>> possible that the SMMU might signal an interrupt, and is _then_
>>> suspended (with the interrupt line somehow remaining asserted, so
>>> probably more clock-gated than completely powered down) before the
>>> interrupt hander runs, but we rather assume that we're not going to
>>> have an unhandled hardware IRQ hanging around for longer than the
>>> autosuspend delay.
>>>
>>> So, judging by the diff below, I guess what you really mean is that in
>>> the case of a threaded context IRQ handler, it can take long enough
>>> between handling the hardware IRQ and the thread actually running that
>>> the SMMU may have suspended in between?
>>
>> You are correct that the SMMU cannot generate a fault while powered
>> down. A more accurate description of the race condition is as follows:
>> When stall-on-fault is disabled, the faulting transaction does is
>> terminated. This allows the master (the GPU) to complete its work, drop
>> its power vote for the SMMU, and allow the SMMU to suspend. However, the
>> SMMU fault handler may still be waiting to execute on the CPU.
>> If the SMMU suspends before the handler reads the fault registers, an
>> unclocked access occurs. This scenario is significantly more likely when
>> using threaded IRQs due to the scheduling latency involved. I will
>> update the next iteration to reflect this.
>>
>>>
>>>> Guard the context and global fault handlers with arm_smmu_rpm_get() /
>>>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done
>>>> with the SMMU powered.
>>>>
>>>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault
>>>> after a page fault")
>>>> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
>>>> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
>>>> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>
>>>> ---
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++-
>>>>    drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53
>>>> ++++++++++++++++++++++--------
>>>>    2 files changed, 43 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> index 573085349df3..2d03df72612d 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>>>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct
>>>> arm_smmu_domain *smmu_domain,
>>>>        struct arm_smmu_device *smmu = smmu_domain->smmu;
>>>>        struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
>>>>        const struct of_device_id *client_match;
>>>> +    const struct arm_smmu_impl *impl = qsmmu->data->impl;
>>>>        int cbndx = smmu_domain->cfg.cbndx;
>>>>        struct adreno_smmu_priv *priv;
>>>>    @@ -350,10 +351,12 @@ static int
>>>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
>>>>        priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
>>>>        priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
>>>>        priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
>>>> -    priv->set_stall = qcom_adreno_smmu_set_stall;
>>>>        priv->set_prr_bit = NULL;
>>>>        priv->set_prr_addr = NULL;
>>>>    +    if (impl->context_fault_needs_threaded_irq)
>>>> +        priv->set_stall = qcom_adreno_smmu_set_stall;
>>>> +
>>>>        if (of_device_is_compatible(np, "qcom,smmu-500") &&
>>>>            !of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
>>>>            of_device_is_compatible(np, "qcom,adreno-smmu")) {
>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> index 5e690cf85ec9..183f12e45b02 100644
>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int
>>>> irq, void *dev)
>>>>        int idx = smmu_domain->cfg.cbndx;
>>>>        int ret;
>>>>    +    if (smmu->impl &&
>>>> smmu->impl->context_fault_needs_threaded_irq) {
>>>
>>> Why is this conditional on being threaded, if the global fault handler
>>> that can never be threaded at all apparently needs it unconditionally?
>> Synchronous runtime PM calls can sleep, which would cause issue if
>> called within a hard IRQ context. This is why I added the conditional
>> check for threaded IRQs.
>> Furthermore, this change only allow the driver to override the
>> stall-on-fault setting when context_fault_needs_threaded_irq is true.
>> Since the unclocked access issue is tied to disabling stall-on-fault,
>> the fix is only logically required for the threaded IRQ path.
>> For the Global Fault handler, which runs in a hard IRQ context, you are
>> right—we cannot safely vote for power there. I will remove the runtime
>> PM call from that section.
>
> Hmm, but then how *do* we actually guarantee that autosuspend doesn't
> happen to kick in and power down the SMMU just as a hardirq handler
> runs, when there's some unexpected event? I fear there's a horrible
> can of worms here... 
>
You are right. To safely handle the race where the device suspends
before the handler runs, we must be able to resume it. Since that
requires sleeping, the hard IRQ handler is insufficient if the device is
not already active.

So, disabling stall-on-fault can only be safely allowed when both the
global and context fault handlers are moved to threaded IRQs. This will
allow us to safely use runtime resume in the handler, ensuring the
device is powered to clear the fault status regardless of when the
suspend occurred.

> Thanks,
> Robin.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-01-28 18:44     ` Robin Murphy
  2026-01-29 16:03       ` Prakash Gupta
@ 2026-02-02 20:14       ` Pranjal Shrivastava
  2026-02-10 11:09         ` Prakash Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-02-02 20:14 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Prakash Gupta, Will Deacon, Joerg Roedel, Rob Clark,
	Connor Abbott, linux-arm-msm, linux-arm-kernel, iommu,
	linux-kernel, Akhil P Oommen, Pratyush Brahma

On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote:
> [ +Pranjal as this might matter for v3 too... ]
> 

Hi Robin,

To weigh in from the arm-smmu-v3 side, we’ve attempted to address the
"can of worms" regarding power races by leaning on these differences:

 - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and
   event handlers are fully threaded. This allows us to call 
   arm_smmu_rpm_get() safely, as the handler can sleep while waiting for
   the hardware to resume.

 - GERROR Handling: Since GERROR remains a hard IRQ, we handle any
   pending gerrors in the suspend callback before the SMMU actually
   powers down. Any GERROR interrupts received while the device was
   suspended are treated as spurious and ignored.

Thanks,
Praan 

[1] https://lore.kernel.org/all/20260126151157.3418145-1-praan@google.com/

[...]

> 
> Hmm, but then how *do* we actually guarantee that autosuspend doesn't happen
> to kick in and power down the SMMU just as a hardirq handler runs, when
> there's some unexpected event? I fear there's a horrible can of worms
> here...
> 
> Thanks,
> Robin.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-02-02 20:14       ` Pranjal Shrivastava
@ 2026-02-10 11:09         ` Prakash Gupta
  2026-02-10 13:15           ` Pranjal Shrivastava
  0 siblings, 1 reply; 11+ messages in thread
From: Prakash Gupta @ 2026-02-10 11:09 UTC (permalink / raw)
  To: Pranjal Shrivastava, Robin Murphy
  Cc: Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott,
	linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma



On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote:
> On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote:
>> [ +Pranjal as this might matter for v3 too... ]
>>
> 
> Hi Robin,
> 
> To weigh in from the arm-smmu-v3 side, we’ve attempted to address the
> "can of worms" regarding power races by leaning on these differences:
> 
>  - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and
>    event handlers are fully threaded. This allows us to call 
>    arm_smmu_rpm_get() safely, as the handler can sleep while waiting for
>    the hardware to resume.
> 
>  - GERROR Handling: Since GERROR remains a hard IRQ, we handle any
>    pending gerrors in the suspend callback before the SMMU actually
>    powers down. Any GERROR interrupts received while the device was
>    suspended are treated as spurious and ignored.
> 
> Thanks,
> Praan

[1] refer to case where SMMU state is not retained during smmu device
power down, this I think is equally applicable for both context and
global faults.

Since the ARM SMMU runtime resume triggers a device reset, any pending
faults would be cleared during resume. Here the solution can be to
handle both global and context faults before allowing the SMMU device to
suspend.
With this approach, any hard or threaded IRQ scheduled after the SMMU
device has suspended can be safely ignored.
One concern I see is iommu fault reporting to clients while handling
fault during smmu device suspend.

Thanks,
Prakash

[1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-02-10 11:09         ` Prakash Gupta
@ 2026-02-10 13:15           ` Pranjal Shrivastava
  2026-02-11 16:10             ` Prakash Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-02-10 13:15 UTC (permalink / raw)
  To: Prakash Gupta
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott,
	linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma

On Tue, Feb 10, 2026 at 04:39:56PM +0530, Prakash Gupta wrote:
> 
> 
> On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote:
> > On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote:
> >> [ +Pranjal as this might matter for v3 too... ]
> >>
> > 
> > Hi Robin,
> > 
> > To weigh in from the arm-smmu-v3 side, we’ve attempted to address the
> > "can of worms" regarding power races by leaning on these differences:
> > 
> >  - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and
> >    event handlers are fully threaded. This allows us to call 
> >    arm_smmu_rpm_get() safely, as the handler can sleep while waiting for
> >    the hardware to resume.
> > 
> >  - GERROR Handling: Since GERROR remains a hard IRQ, we handle any
> >    pending gerrors in the suspend callback before the SMMU actually
> >    powers down. Any GERROR interrupts received while the device was
> >    suspended are treated as spurious and ignored.
> > 
> > Thanks,
> > Praan
> 
> [1] refer to case where SMMU state is not retained during smmu device
> power down, this I think is equally applicable for both context and
> global faults.
> 
> Since the ARM SMMU runtime resume triggers a device reset, any pending
> faults would be cleared during resume. Here the solution can be to
> handle both global and context faults before allowing the SMMU device to
> suspend.
> With this approach, any hard or threaded IRQ scheduled after the SMMU
> device has suspended can be safely ignored.
> One concern I see is iommu fault reporting to clients while handling
> fault during smmu device suspend.

I believe by the time we've reached suspend it's safe to assume that all
clients have been suspended. Thus, we could simply not report the error
and instead scream by having a dev_warn_ratelimited about the situation.

Thanks,
Praan

> 
> Thanks,
> Prakash
> 
> [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-02-10 13:15           ` Pranjal Shrivastava
@ 2026-02-11 16:10             ` Prakash Gupta
  2026-02-11 16:59               ` Pranjal Shrivastava
  0 siblings, 1 reply; 11+ messages in thread
From: Prakash Gupta @ 2026-02-11 16:10 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott,
	linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma



On 2/10/2026 6:45 PM, Pranjal Shrivastava wrote:
> On Tue, Feb 10, 2026 at 04:39:56PM +0530, Prakash Gupta wrote:
>>
>>
>> On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote:
>>> On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote:
>>>> [ +Pranjal as this might matter for v3 too... ]
>>>>
>>>
>>> Hi Robin,
>>>
>>> To weigh in from the arm-smmu-v3 side, we’ve attempted to address the
>>> "can of worms" regarding power races by leaning on these differences:
>>>
>>>  - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and
>>>    event handlers are fully threaded. This allows us to call 
>>>    arm_smmu_rpm_get() safely, as the handler can sleep while waiting for
>>>    the hardware to resume.
>>>
>>>  - GERROR Handling: Since GERROR remains a hard IRQ, we handle any
>>>    pending gerrors in the suspend callback before the SMMU actually
>>>    powers down. Any GERROR interrupts received while the device was
>>>    suspended are treated as spurious and ignored.
>>>
>>> Thanks,
>>> Praan
>>
>> [1] refer to case where SMMU state is not retained during smmu device
>> power down, this I think is equally applicable for both context and
>> global faults.
>>
>> Since the ARM SMMU runtime resume triggers a device reset, any pending
>> faults would be cleared during resume. Here the solution can be to
>> handle both global and context faults before allowing the SMMU device to
>> suspend.
>> With this approach, any hard or threaded IRQ scheduled after the SMMU
>> device has suspended can be safely ignored.
>> One concern I see is iommu fault reporting to clients while handling
>> fault during smmu device suspend.
> 
> I believe by the time we've reached suspend it's safe to assume that all
> clients have been suspended. Thus, we could simply not report the error
> and instead scream by having a dev_warn_ratelimited about the situation.
> 

By reporting error I meant reporting the error to client with
report_iommu_fault(). I agree that if smmu device is being suspended the
dma devices should have suspended by now. If so, it should be safe to
just handle the fault excluding report_iommu_fault() in suspend path and
complete smmu device suspend. Will update in next patchset.

> Thanks,
> Praan
> 
>>
>> Thanks,
>> Prakash
>>
>> [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
  2026-02-11 16:10             ` Prakash Gupta
@ 2026-02-11 16:59               ` Pranjal Shrivastava
  0 siblings, 0 replies; 11+ messages in thread
From: Pranjal Shrivastava @ 2026-02-11 16:59 UTC (permalink / raw)
  To: Prakash Gupta
  Cc: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott,
	linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
	Akhil P Oommen, Pratyush Brahma

On Wed, Feb 11, 2026 at 09:40:29PM +0530, Prakash Gupta wrote:
> 
> 
> On 2/10/2026 6:45 PM, Pranjal Shrivastava wrote:
> > On Tue, Feb 10, 2026 at 04:39:56PM +0530, Prakash Gupta wrote:
> >>
> >>
> >> On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote:
> >>> On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote:
> >>>> [ +Pranjal as this might matter for v3 too... ]
> >>>>
> >>>
> >>> Hi Robin,
> >>>
> >>> To weigh in from the arm-smmu-v3 side, we’ve attempted to address the
> >>> "can of worms" regarding power races by leaning on these differences:
> >>>
> >>>  - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and
> >>>    event handlers are fully threaded. This allows us to call 
> >>>    arm_smmu_rpm_get() safely, as the handler can sleep while waiting for
> >>>    the hardware to resume.
> >>>
> >>>  - GERROR Handling: Since GERROR remains a hard IRQ, we handle any
> >>>    pending gerrors in the suspend callback before the SMMU actually
> >>>    powers down. Any GERROR interrupts received while the device was
> >>>    suspended are treated as spurious and ignored.
> >>>
> >>> Thanks,
> >>> Praan
> >>
> >> [1] refer to case where SMMU state is not retained during smmu device
> >> power down, this I think is equally applicable for both context and
> >> global faults.
> >>
> >> Since the ARM SMMU runtime resume triggers a device reset, any pending
> >> faults would be cleared during resume. Here the solution can be to
> >> handle both global and context faults before allowing the SMMU device to
> >> suspend.
> >> With this approach, any hard or threaded IRQ scheduled after the SMMU
> >> device has suspended can be safely ignored.
> >> One concern I see is iommu fault reporting to clients while handling
> >> fault during smmu device suspend.
> > 
> > I believe by the time we've reached suspend it's safe to assume that all
> > clients have been suspended. Thus, we could simply not report the error
> > and instead scream by having a dev_warn_ratelimited about the situation.
> > 
> 
> By reporting error I meant reporting the error to client with
> report_iommu_fault(). I agree that if smmu device is being suspended the
> dma devices should have suspended by now. If so, it should be safe to
> just handle the fault excluding report_iommu_fault() in suspend path and
> complete smmu device suspend. Will update in next patchset.
> 

Yes, that's what I meant, since the client is likely suspended, we can't
call report_iommu_fault() because the client might've registered a
handler which touches some MMIO (accesses registers) while the client
is suspended. Thus, if we see a fault during suspend, we could
potentially log it at an appropriate level and not call
report_iommu_fault()..

Thanks,
Praan

> > 
> >>
> >> Thanks,
> >> Prakash
> >>
> >> [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/
> >>
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-02-11 16:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-27 12:11 [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers Prakash Gupta
2026-01-27 12:52 ` Akhil P Oommen
2026-01-27 16:05 ` Robin Murphy
2026-01-28  5:56   ` Prakash Gupta
2026-01-28 18:44     ` Robin Murphy
2026-01-29 16:03       ` Prakash Gupta
2026-02-02 20:14       ` Pranjal Shrivastava
2026-02-10 11:09         ` Prakash Gupta
2026-02-10 13:15           ` Pranjal Shrivastava
2026-02-11 16:10             ` Prakash Gupta
2026-02-11 16:59               ` Pranjal Shrivastava

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox