* [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers
@ 2026-01-27 12:11 Prakash Gupta
2026-01-27 12:52 ` Akhil P Oommen
2026-01-27 16:05 ` Robin Murphy
0 siblings, 2 replies; 11+ messages in thread
From: Prakash Gupta @ 2026-01-27 12:11 UTC (permalink / raw)
To: Will Deacon, Robin Murphy, Joerg Roedel, Rob Clark, Connor Abbott
Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel,
Akhil P Oommen, Pratyush Brahma, Prakash Gupta
Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the driver")
enabled pm_runtime for the arm-smmu device. On systems where the SMMU
sits in a power domain, all register accesses must be done while the
device is runtime-resumed to avoid unclocked register reads and
potential NoC errors.
So far, this has not been an issue for most SMMU clients because
stall-on-fault is enabled by default. While a translation fault is
being handled, the SMMU stalls further translations for that context
bank, so the fault handler would not race with a powered-down
SMMU.
Adreno SMMU now disables stall-on-fault in the presence of fault
storms to avoid saturating SMMU resources and hanging the GMU. With
stall-on-fault disabled, the SMMU can generate faults while its power
domain may no longer be enabled, which makes unclocked accesses to
fault-status registers in the SMMU fault handlers possible.
Guard the context and global fault handlers with arm_smmu_rpm_get() /
arm_smmu_rpm_put() so that all SMMU fault register accesses are done
with the SMMU powered.
Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault after a page fault")
Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com>
Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com>
---
drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 5 ++-
drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 ++++++++++++++++++++++--------
2 files changed, 43 insertions(+), 15 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 573085349df3..2d03df72612d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
struct arm_smmu_device *smmu = smmu_domain->smmu;
struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
const struct of_device_id *client_match;
+ const struct arm_smmu_impl *impl = qsmmu->data->impl;
int cbndx = smmu_domain->cfg.cbndx;
struct adreno_smmu_priv *priv;
@@ -350,10 +351,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg;
priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg;
priv->get_fault_info = qcom_adreno_smmu_get_fault_info;
- priv->set_stall = qcom_adreno_smmu_set_stall;
priv->set_prr_bit = NULL;
priv->set_prr_addr = NULL;
+ if (impl->context_fault_needs_threaded_irq)
+ priv->set_stall = qcom_adreno_smmu_set_stall;
+
if (of_device_is_compatible(np, "qcom,smmu-500") &&
!of_device_is_compatible(np, "qcom,sm8250-smmu-500") &&
of_device_is_compatible(np, "qcom,adreno-smmu")) {
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 5e690cf85ec9..183f12e45b02 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
int idx = smmu_domain->cfg.cbndx;
int ret;
+ if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) {
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return IRQ_NONE;
+ }
+
+ if (smmu->impl && smmu->impl->context_fault) {
+ ret = smmu->impl->context_fault(irq, dev);
+ goto out_power_off;
+ }
+
arm_smmu_read_context_fault_info(smmu, idx, &cfi);
- if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT))
- return IRQ_NONE;
+ if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) {
+ ret = IRQ_NONE;
+ goto out_power_off;
+ }
ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova,
cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
@@ -480,7 +493,14 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
ret == -EAGAIN ? 0 : ARM_SMMU_RESUME_TERMINATE);
}
- return IRQ_HANDLED;
+ ret = IRQ_HANDLED;
+
+out_power_off:
+
+ if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
+ arm_smmu_rpm_put(smmu);
+
+ return ret;
}
static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
@@ -489,14 +509,21 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
struct arm_smmu_device *smmu = dev;
static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
+ int ret;
+
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return IRQ_NONE;
gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR);
gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0);
gfsynr1 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR1);
gfsynr2 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR2);
- if (!gfsr)
- return IRQ_NONE;
+ if (!gfsr) {
+ ret = IRQ_NONE;
+ goto out_power_off;
+ }
if (__ratelimit(&rs)) {
if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) &&
@@ -513,7 +540,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
}
arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr);
- return IRQ_HANDLED;
+ ret = IRQ_HANDLED;
+
+out_power_off:
+ arm_smmu_rpm_put(smmu);
+ return ret;
}
static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
@@ -683,7 +714,6 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain,
enum io_pgtable_fmt fmt;
struct iommu_domain *domain = &smmu_domain->domain;
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
- irqreturn_t (*context_fault)(int irq, void *dev);
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
@@ -850,19 +880,14 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain,
*/
irq = smmu->irqs[cfg->irptndx];
- if (smmu->impl && smmu->impl->context_fault)
- context_fault = smmu->impl->context_fault;
- else
- context_fault = arm_smmu_context_fault;
-
if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq)
ret = devm_request_threaded_irq(smmu->dev, irq, NULL,
- context_fault,
+ arm_smmu_context_fault,
IRQF_ONESHOT | IRQF_SHARED,
"arm-smmu-context-fault",
smmu_domain);
else
- ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED,
+ ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, IRQF_SHARED,
"arm-smmu-context-fault", smmu_domain);
if (ret < 0) {
---
base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2
change-id: 20251208-smmu-rpm-8bd67db93dca
Best regards,
--
Prakash Gupta <prakash.gupta@oss.qualcomm.com>
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-01-27 12:11 [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers Prakash Gupta @ 2026-01-27 12:52 ` Akhil P Oommen 2026-01-27 16:05 ` Robin Murphy 1 sibling, 0 replies; 11+ messages in thread From: Akhil P Oommen @ 2026-01-27 12:52 UTC (permalink / raw) To: Prakash Gupta, Will Deacon, Robin Murphy, Joerg Roedel, Rob Clark, Connor Abbott Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Pratyush Brahma On 1/27/2026 5:41 PM, Prakash Gupta wrote: > Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the driver") > enabled pm_runtime for the arm-smmu device. On systems where the SMMU > sits in a power domain, all register accesses must be done while the > device is runtime-resumed to avoid unclocked register reads and > potential NoC errors. > > So far, this has not been an issue for most SMMU clients because > stall-on-fault is enabled by default. While a translation fault is > being handled, the SMMU stalls further translations for that context > bank, so the fault handler would not race with a powered-down > SMMU. > > Adreno SMMU now disables stall-on-fault in the presence of fault > storms to avoid saturating SMMU resources and hanging the GMU. With > stall-on-fault disabled, the SMMU can generate faults while its power > domain may no longer be enabled, which makes unclocked accesses to > fault-status registers in the SMMU fault handlers possible. > > Guard the context and global fault handlers with arm_smmu_rpm_get() / > arm_smmu_rpm_put() so that all SMMU fault register accesses are done > with the SMMU powered. > > Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault after a page fault") > Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> > Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> > Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com> Acked-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> -Akhil ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-01-27 12:11 [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers Prakash Gupta 2026-01-27 12:52 ` Akhil P Oommen @ 2026-01-27 16:05 ` Robin Murphy 2026-01-28 5:56 ` Prakash Gupta 1 sibling, 1 reply; 11+ messages in thread From: Robin Murphy @ 2026-01-27 16:05 UTC (permalink / raw) To: Prakash Gupta, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On 2026-01-27 12:11 pm, Prakash Gupta wrote: > Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the driver") > enabled pm_runtime for the arm-smmu device. On systems where the SMMU > sits in a power domain, all register accesses must be done while the > device is runtime-resumed to avoid unclocked register reads and > potential NoC errors. > > So far, this has not been an issue for most SMMU clients because > stall-on-fault is enabled by default. While a translation fault is > being handled, the SMMU stalls further translations for that context > bank, so the fault handler would not race with a powered-down > SMMU. > > Adreno SMMU now disables stall-on-fault in the presence of fault > storms to avoid saturating SMMU resources and hanging the GMU. With > stall-on-fault disabled, the SMMU can generate faults while its power > domain may no longer be enabled, which makes unclocked accesses to > fault-status registers in the SMMU fault handlers possible. At face value, that sounds wrong - how does an SMMU generate a fault, or indeed do anything, when it's powered off? In principle it's possible that the SMMU might signal an interrupt, and is _then_ suspended (with the interrupt line somehow remaining asserted, so probably more clock-gated than completely powered down) before the interrupt hander runs, but we rather assume that we're not going to have an unhandled hardware IRQ hanging around for longer than the autosuspend delay. So, judging by the diff below, I guess what you really mean is that in the case of a threaded context IRQ handler, it can take long enough between handling the hardware IRQ and the thread actually running that the SMMU may have suspended in between? > Guard the context and global fault handlers with arm_smmu_rpm_get() / > arm_smmu_rpm_put() so that all SMMU fault register accesses are done > with the SMMU powered. > > Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault after a page fault") > Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> > Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> > Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com> > --- > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 5 ++- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 ++++++++++++++++++++++-------- > 2 files changed, 43 insertions(+), 15 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > index 573085349df3..2d03df72612d 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, > struct arm_smmu_device *smmu = smmu_domain->smmu; > struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); > const struct of_device_id *client_match; > + const struct arm_smmu_impl *impl = qsmmu->data->impl; > int cbndx = smmu_domain->cfg.cbndx; > struct adreno_smmu_priv *priv; > > @@ -350,10 +351,12 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, > priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; > priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; > priv->get_fault_info = qcom_adreno_smmu_get_fault_info; > - priv->set_stall = qcom_adreno_smmu_set_stall; > priv->set_prr_bit = NULL; > priv->set_prr_addr = NULL; > > + if (impl->context_fault_needs_threaded_irq) > + priv->set_stall = qcom_adreno_smmu_set_stall; > + > if (of_device_is_compatible(np, "qcom,smmu-500") && > !of_device_is_compatible(np, "qcom,sm8250-smmu-500") && > of_device_is_compatible(np, "qcom,adreno-smmu")) { > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c > index 5e690cf85ec9..183f12e45b02 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c > @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) > int idx = smmu_domain->cfg.cbndx; > int ret; > > + if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) { Why is this conditional on being threaded, if the global fault handler that can never be threaded at all apparently needs it unconditionally? Thanks, Robin. > + ret = arm_smmu_rpm_get(smmu); > + if (ret < 0) > + return IRQ_NONE; > + } > + > + if (smmu->impl && smmu->impl->context_fault) { > + ret = smmu->impl->context_fault(irq, dev); > + goto out_power_off; > + } > + > arm_smmu_read_context_fault_info(smmu, idx, &cfi); > > - if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) > - return IRQ_NONE; > + if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) { > + ret = IRQ_NONE; > + goto out_power_off; > + } > > ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova, > cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ); > @@ -480,7 +493,14 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev) > ret == -EAGAIN ? 0 : ARM_SMMU_RESUME_TERMINATE); > } > > - return IRQ_HANDLED; > + ret = IRQ_HANDLED; > + > +out_power_off: > + > + if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) > + arm_smmu_rpm_put(smmu); > + > + return ret; > } > > static irqreturn_t arm_smmu_global_fault(int irq, void *dev) > @@ -489,14 +509,21 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev) > struct arm_smmu_device *smmu = dev; > static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, > DEFAULT_RATELIMIT_BURST); > + int ret; > + > + ret = arm_smmu_rpm_get(smmu); > + if (ret < 0) > + return IRQ_NONE; > > gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR); > gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0); > gfsynr1 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR1); > gfsynr2 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR2); > > - if (!gfsr) > - return IRQ_NONE; > + if (!gfsr) { > + ret = IRQ_NONE; > + goto out_power_off; > + } > > if (__ratelimit(&rs)) { > if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) && > @@ -513,7 +540,11 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev) > } > > arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr); > - return IRQ_HANDLED; > + ret = IRQ_HANDLED; > + > +out_power_off: > + arm_smmu_rpm_put(smmu); > + return ret; > } > > static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, > @@ -683,7 +714,6 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain, > enum io_pgtable_fmt fmt; > struct iommu_domain *domain = &smmu_domain->domain; > struct arm_smmu_cfg *cfg = &smmu_domain->cfg; > - irqreturn_t (*context_fault)(int irq, void *dev); > > mutex_lock(&smmu_domain->init_mutex); > if (smmu_domain->smmu) > @@ -850,19 +880,14 @@ static int arm_smmu_init_domain_context(struct arm_smmu_domain *smmu_domain, > */ > irq = smmu->irqs[cfg->irptndx]; > > - if (smmu->impl && smmu->impl->context_fault) > - context_fault = smmu->impl->context_fault; > - else > - context_fault = arm_smmu_context_fault; > - > if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) > ret = devm_request_threaded_irq(smmu->dev, irq, NULL, > - context_fault, > + arm_smmu_context_fault, > IRQF_ONESHOT | IRQF_SHARED, > "arm-smmu-context-fault", > smmu_domain); > else > - ret = devm_request_irq(smmu->dev, irq, context_fault, IRQF_SHARED, > + ret = devm_request_irq(smmu->dev, irq, arm_smmu_context_fault, IRQF_SHARED, > "arm-smmu-context-fault", smmu_domain); > > if (ret < 0) { > > --- > base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2 > change-id: 20251208-smmu-rpm-8bd67db93dca > > Best regards, > -- > Prakash Gupta <prakash.gupta@oss.qualcomm.com> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-01-27 16:05 ` Robin Murphy @ 2026-01-28 5:56 ` Prakash Gupta 2026-01-28 18:44 ` Robin Murphy 0 siblings, 1 reply; 11+ messages in thread From: Prakash Gupta @ 2026-01-28 5:56 UTC (permalink / raw) To: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On 1/27/2026 9:35 PM, Robin Murphy wrote: > On 2026-01-27 12:11 pm, Prakash Gupta wrote: >> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the >> driver") >> enabled pm_runtime for the arm-smmu device. On systems where the SMMU >> sits in a power domain, all register accesses must be done while the >> device is runtime-resumed to avoid unclocked register reads and >> potential NoC errors. >> >> So far, this has not been an issue for most SMMU clients because >> stall-on-fault is enabled by default. While a translation fault is >> being handled, the SMMU stalls further translations for that context >> bank, so the fault handler would not race with a powered-down >> SMMU. >> >> Adreno SMMU now disables stall-on-fault in the presence of fault >> storms to avoid saturating SMMU resources and hanging the GMU. With >> stall-on-fault disabled, the SMMU can generate faults while its power >> domain may no longer be enabled, which makes unclocked accesses to >> fault-status registers in the SMMU fault handlers possible. > > At face value, that sounds wrong - how does an SMMU generate a fault, > or indeed do anything, when it's powered off? In principle it's > possible that the SMMU might signal an interrupt, and is _then_ > suspended (with the interrupt line somehow remaining asserted, so > probably more clock-gated than completely powered down) before the > interrupt hander runs, but we rather assume that we're not going to > have an unhandled hardware IRQ hanging around for longer than the > autosuspend delay. > > So, judging by the diff below, I guess what you really mean is that in > the case of a threaded context IRQ handler, it can take long enough > between handling the hardware IRQ and the thread actually running that > the SMMU may have suspended in between? You are correct that the SMMU cannot generate a fault while powered down. A more accurate description of the race condition is as follows: When stall-on-fault is disabled, the faulting transaction does is terminated. This allows the master (the GPU) to complete its work, drop its power vote for the SMMU, and allow the SMMU to suspend. However, the SMMU fault handler may still be waiting to execute on the CPU. If the SMMU suspends before the handler reads the fault registers, an unclocked access occurs. This scenario is significantly more likely when using threaded IRQs due to the scheduling latency involved. I will update the next iteration to reflect this. > >> Guard the context and global fault handlers with arm_smmu_rpm_get() / >> arm_smmu_rpm_put() so that all SMMU fault register accesses are done >> with the SMMU powered. >> >> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault >> after a page fault") >> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> >> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> >> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com> >> --- >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 5 ++- >> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 >> ++++++++++++++++++++++-------- >> 2 files changed, 43 insertions(+), 15 deletions(-) >> >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> index 573085349df3..2d03df72612d 100644 >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct >> arm_smmu_domain *smmu_domain, >> struct arm_smmu_device *smmu = smmu_domain->smmu; >> struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); >> const struct of_device_id *client_match; >> + const struct arm_smmu_impl *impl = qsmmu->data->impl; >> int cbndx = smmu_domain->cfg.cbndx; >> struct adreno_smmu_priv *priv; >> @@ -350,10 +351,12 @@ static int >> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >> priv->get_fault_info = qcom_adreno_smmu_get_fault_info; >> - priv->set_stall = qcom_adreno_smmu_set_stall; >> priv->set_prr_bit = NULL; >> priv->set_prr_addr = NULL; >> + if (impl->context_fault_needs_threaded_irq) >> + priv->set_stall = qcom_adreno_smmu_set_stall; >> + >> if (of_device_is_compatible(np, "qcom,smmu-500") && >> !of_device_is_compatible(np, "qcom,sm8250-smmu-500") && >> of_device_is_compatible(np, "qcom,adreno-smmu")) { >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c >> b/drivers/iommu/arm/arm-smmu/arm-smmu.c >> index 5e690cf85ec9..183f12e45b02 100644 >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c >> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int >> irq, void *dev) >> int idx = smmu_domain->cfg.cbndx; >> int ret; >> + if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) { > > Why is this conditional on being threaded, if the global fault handler > that can never be threaded at all apparently needs it unconditionally? Synchronous runtime PM calls can sleep, which would cause issue if called within a hard IRQ context. This is why I added the conditional check for threaded IRQs. Furthermore, this change only allow the driver to override the stall-on-fault setting when context_fault_needs_threaded_irq is true. Since the unclocked access issue is tied to disabling stall-on-fault, the fix is only logically required for the threaded IRQ path. For the Global Fault handler, which runs in a hard IRQ context, you are right—we cannot safely vote for power there. I will remove the runtime PM call from that section. > > Thanks, > Robin. > >> + ret = arm_smmu_rpm_get(smmu); >> + if (ret < 0) >> + return IRQ_NONE; >> + } >> + >> + if (smmu->impl && smmu->impl->context_fault) { >> + ret = smmu->impl->context_fault(irq, dev); >> + goto out_power_off; >> + } >> + >> arm_smmu_read_context_fault_info(smmu, idx, &cfi); >> - if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) >> - return IRQ_NONE; >> + if (!(cfi.fsr & ARM_SMMU_CB_FSR_FAULT)) { >> + ret = IRQ_NONE; >> + goto out_power_off; >> + } >> ret = report_iommu_fault(&smmu_domain->domain, NULL, cfi.iova, >> cfi.fsynr & ARM_SMMU_CB_FSYNR0_WNR ? IOMMU_FAULT_WRITE : >> IOMMU_FAULT_READ); >> @@ -480,7 +493,14 @@ static irqreturn_t arm_smmu_context_fault(int >> irq, void *dev) >> ret == -EAGAIN ? 0 : ARM_SMMU_RESUME_TERMINATE); >> } >> - return IRQ_HANDLED; >> + ret = IRQ_HANDLED; >> + >> +out_power_off: >> + >> + if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) >> + arm_smmu_rpm_put(smmu); >> + >> + return ret; >> } >> static irqreturn_t arm_smmu_global_fault(int irq, void *dev) >> @@ -489,14 +509,21 @@ static irqreturn_t arm_smmu_global_fault(int >> irq, void *dev) >> struct arm_smmu_device *smmu = dev; >> static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, >> DEFAULT_RATELIMIT_BURST); >> + int ret; >> + >> + ret = arm_smmu_rpm_get(smmu); >> + if (ret < 0) >> + return IRQ_NONE; >> gfsr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSR); >> gfsynr0 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR0); >> gfsynr1 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR1); >> gfsynr2 = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sGFSYNR2); >> - if (!gfsr) >> - return IRQ_NONE; >> + if (!gfsr) { >> + ret = IRQ_NONE; >> + goto out_power_off; >> + } >> if (__ratelimit(&rs)) { >> if (IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT) && >> @@ -513,7 +540,11 @@ static irqreturn_t arm_smmu_global_fault(int >> irq, void *dev) >> } >> arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sGFSR, gfsr); >> - return IRQ_HANDLED; >> + ret = IRQ_HANDLED; >> + >> +out_power_off: >> + arm_smmu_rpm_put(smmu); >> + return ret; >> } >> static void arm_smmu_init_context_bank(struct arm_smmu_domain >> *smmu_domain, >> @@ -683,7 +714,6 @@ static int arm_smmu_init_domain_context(struct >> arm_smmu_domain *smmu_domain, >> enum io_pgtable_fmt fmt; >> struct iommu_domain *domain = &smmu_domain->domain; >> struct arm_smmu_cfg *cfg = &smmu_domain->cfg; >> - irqreturn_t (*context_fault)(int irq, void *dev); >> mutex_lock(&smmu_domain->init_mutex); >> if (smmu_domain->smmu) >> @@ -850,19 +880,14 @@ static int arm_smmu_init_domain_context(struct >> arm_smmu_domain *smmu_domain, >> */ >> irq = smmu->irqs[cfg->irptndx]; >> - if (smmu->impl && smmu->impl->context_fault) >> - context_fault = smmu->impl->context_fault; >> - else >> - context_fault = arm_smmu_context_fault; >> - >> if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) >> ret = devm_request_threaded_irq(smmu->dev, irq, NULL, >> - context_fault, >> + arm_smmu_context_fault, >> IRQF_ONESHOT | IRQF_SHARED, >> "arm-smmu-context-fault", >> smmu_domain); >> else >> - ret = devm_request_irq(smmu->dev, irq, context_fault, >> IRQF_SHARED, >> + ret = devm_request_irq(smmu->dev, irq, >> arm_smmu_context_fault, IRQF_SHARED, >> "arm-smmu-context-fault", smmu_domain); >> if (ret < 0) { >> >> --- >> base-commit: fcb70a56f4d81450114034b2c61f48ce7444a0e2 >> change-id: 20251208-smmu-rpm-8bd67db93dca >> >> Best regards, >> -- >> Prakash Gupta <prakash.gupta@oss.qualcomm.com> >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-01-28 5:56 ` Prakash Gupta @ 2026-01-28 18:44 ` Robin Murphy 2026-01-29 16:03 ` Prakash Gupta 2026-02-02 20:14 ` Pranjal Shrivastava 0 siblings, 2 replies; 11+ messages in thread From: Robin Murphy @ 2026-01-28 18:44 UTC (permalink / raw) To: Prakash Gupta, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma, Pranjal Shrivastava [ +Pranjal as this might matter for v3 too... ] On 28/01/2026 5:56 am, Prakash Gupta wrote: > > On 1/27/2026 9:35 PM, Robin Murphy wrote: >> On 2026-01-27 12:11 pm, Prakash Gupta wrote: >>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the >>> driver") >>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU >>> sits in a power domain, all register accesses must be done while the >>> device is runtime-resumed to avoid unclocked register reads and >>> potential NoC errors. >>> >>> So far, this has not been an issue for most SMMU clients because >>> stall-on-fault is enabled by default. While a translation fault is >>> being handled, the SMMU stalls further translations for that context >>> bank, so the fault handler would not race with a powered-down >>> SMMU. >>> >>> Adreno SMMU now disables stall-on-fault in the presence of fault >>> storms to avoid saturating SMMU resources and hanging the GMU. With >>> stall-on-fault disabled, the SMMU can generate faults while its power >>> domain may no longer be enabled, which makes unclocked accesses to >>> fault-status registers in the SMMU fault handlers possible. >> >> At face value, that sounds wrong - how does an SMMU generate a fault, >> or indeed do anything, when it's powered off? In principle it's >> possible that the SMMU might signal an interrupt, and is _then_ >> suspended (with the interrupt line somehow remaining asserted, so >> probably more clock-gated than completely powered down) before the >> interrupt hander runs, but we rather assume that we're not going to >> have an unhandled hardware IRQ hanging around for longer than the >> autosuspend delay. >> >> So, judging by the diff below, I guess what you really mean is that in >> the case of a threaded context IRQ handler, it can take long enough >> between handling the hardware IRQ and the thread actually running that >> the SMMU may have suspended in between? > > You are correct that the SMMU cannot generate a fault while powered > down. A more accurate description of the race condition is as follows: > When stall-on-fault is disabled, the faulting transaction does is > terminated. This allows the master (the GPU) to complete its work, drop > its power vote for the SMMU, and allow the SMMU to suspend. However, the > SMMU fault handler may still be waiting to execute on the CPU. > If the SMMU suspends before the handler reads the fault registers, an > unclocked access occurs. This scenario is significantly more likely when > using threaded IRQs due to the scheduling latency involved. I will > update the next iteration to reflect this. > >> >>> Guard the context and global fault handlers with arm_smmu_rpm_get() / >>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done >>> with the SMMU powered. >>> >>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault >>> after a page fault") >>> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> >>> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> >>> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com> >>> --- >>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 5 ++- >>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 >>> ++++++++++++++++++++++-------- >>> 2 files changed, 43 insertions(+), 15 deletions(-) >>> >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> index 573085349df3..2d03df72612d 100644 >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct >>> arm_smmu_domain *smmu_domain, >>> struct arm_smmu_device *smmu = smmu_domain->smmu; >>> struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); >>> const struct of_device_id *client_match; >>> + const struct arm_smmu_impl *impl = qsmmu->data->impl; >>> int cbndx = smmu_domain->cfg.cbndx; >>> struct adreno_smmu_priv *priv; >>> @@ -350,10 +351,12 @@ static int >>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >>> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >>> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >>> priv->get_fault_info = qcom_adreno_smmu_get_fault_info; >>> - priv->set_stall = qcom_adreno_smmu_set_stall; >>> priv->set_prr_bit = NULL; >>> priv->set_prr_addr = NULL; >>> + if (impl->context_fault_needs_threaded_irq) >>> + priv->set_stall = qcom_adreno_smmu_set_stall; >>> + >>> if (of_device_is_compatible(np, "qcom,smmu-500") && >>> !of_device_is_compatible(np, "qcom,sm8250-smmu-500") && >>> of_device_is_compatible(np, "qcom,adreno-smmu")) { >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> index 5e690cf85ec9..183f12e45b02 100644 >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int >>> irq, void *dev) >>> int idx = smmu_domain->cfg.cbndx; >>> int ret; >>> + if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) { >> >> Why is this conditional on being threaded, if the global fault handler >> that can never be threaded at all apparently needs it unconditionally? > Synchronous runtime PM calls can sleep, which would cause issue if > called within a hard IRQ context. This is why I added the conditional > check for threaded IRQs. > Furthermore, this change only allow the driver to override the > stall-on-fault setting when context_fault_needs_threaded_irq is true. > Since the unclocked access issue is tied to disabling stall-on-fault, > the fix is only logically required for the threaded IRQ path. > For the Global Fault handler, which runs in a hard IRQ context, you are > right—we cannot safely vote for power there. I will remove the runtime > PM call from that section. Hmm, but then how *do* we actually guarantee that autosuspend doesn't happen to kick in and power down the SMMU just as a hardirq handler runs, when there's some unexpected event? I fear there's a horrible can of worms here... Thanks, Robin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-01-28 18:44 ` Robin Murphy @ 2026-01-29 16:03 ` Prakash Gupta 2026-02-02 20:14 ` Pranjal Shrivastava 1 sibling, 0 replies; 11+ messages in thread From: Prakash Gupta @ 2026-01-29 16:03 UTC (permalink / raw) To: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott Cc: linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma, Pranjal Shrivastava On 1/29/2026 12:14 AM, Robin Murphy wrote: > [ +Pranjal as this might matter for v3 too... ] > > On 28/01/2026 5:56 am, Prakash Gupta wrote: >> >> On 1/27/2026 9:35 PM, Robin Murphy wrote: >>> On 2026-01-27 12:11 pm, Prakash Gupta wrote: >>>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the >>>> driver") >>>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU >>>> sits in a power domain, all register accesses must be done while the >>>> device is runtime-resumed to avoid unclocked register reads and >>>> potential NoC errors. >>>> >>>> So far, this has not been an issue for most SMMU clients because >>>> stall-on-fault is enabled by default. While a translation fault is >>>> being handled, the SMMU stalls further translations for that context >>>> bank, so the fault handler would not race with a powered-down >>>> SMMU. >>>> >>>> Adreno SMMU now disables stall-on-fault in the presence of fault >>>> storms to avoid saturating SMMU resources and hanging the GMU. With >>>> stall-on-fault disabled, the SMMU can generate faults while its power >>>> domain may no longer be enabled, which makes unclocked accesses to >>>> fault-status registers in the SMMU fault handlers possible. >>> >>> At face value, that sounds wrong - how does an SMMU generate a fault, >>> or indeed do anything, when it's powered off? In principle it's >>> possible that the SMMU might signal an interrupt, and is _then_ >>> suspended (with the interrupt line somehow remaining asserted, so >>> probably more clock-gated than completely powered down) before the >>> interrupt hander runs, but we rather assume that we're not going to >>> have an unhandled hardware IRQ hanging around for longer than the >>> autosuspend delay. >>> >>> So, judging by the diff below, I guess what you really mean is that in >>> the case of a threaded context IRQ handler, it can take long enough >>> between handling the hardware IRQ and the thread actually running that >>> the SMMU may have suspended in between? >> >> You are correct that the SMMU cannot generate a fault while powered >> down. A more accurate description of the race condition is as follows: >> When stall-on-fault is disabled, the faulting transaction does is >> terminated. This allows the master (the GPU) to complete its work, drop >> its power vote for the SMMU, and allow the SMMU to suspend. However, the >> SMMU fault handler may still be waiting to execute on the CPU. >> If the SMMU suspends before the handler reads the fault registers, an >> unclocked access occurs. This scenario is significantly more likely when >> using threaded IRQs due to the scheduling latency involved. I will >> update the next iteration to reflect this. >> >>> >>>> Guard the context and global fault handlers with arm_smmu_rpm_get() / >>>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done >>>> with the SMMU powered. >>>> >>>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault >>>> after a page fault") >>>> Co-developed-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> >>>> Signed-off-by: Pratyush Brahma <pratyush.brahma@oss.qualcomm.com> >>>> Signed-off-by: Prakash Gupta <prakash.gupta@oss.qualcomm.com> >>>> --- >>>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 5 ++- >>>> drivers/iommu/arm/arm-smmu/arm-smmu.c | 53 >>>> ++++++++++++++++++++++-------- >>>> 2 files changed, 43 insertions(+), 15 deletions(-) >>>> >>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> index 573085349df3..2d03df72612d 100644 >>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct >>>> arm_smmu_domain *smmu_domain, >>>> struct arm_smmu_device *smmu = smmu_domain->smmu; >>>> struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); >>>> const struct of_device_id *client_match; >>>> + const struct arm_smmu_impl *impl = qsmmu->data->impl; >>>> int cbndx = smmu_domain->cfg.cbndx; >>>> struct adreno_smmu_priv *priv; >>>> @@ -350,10 +351,12 @@ static int >>>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >>>> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >>>> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >>>> priv->get_fault_info = qcom_adreno_smmu_get_fault_info; >>>> - priv->set_stall = qcom_adreno_smmu_set_stall; >>>> priv->set_prr_bit = NULL; >>>> priv->set_prr_addr = NULL; >>>> + if (impl->context_fault_needs_threaded_irq) >>>> + priv->set_stall = qcom_adreno_smmu_set_stall; >>>> + >>>> if (of_device_is_compatible(np, "qcom,smmu-500") && >>>> !of_device_is_compatible(np, "qcom,sm8250-smmu-500") && >>>> of_device_is_compatible(np, "qcom,adreno-smmu")) { >>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c >>>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c >>>> index 5e690cf85ec9..183f12e45b02 100644 >>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c >>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c >>>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int >>>> irq, void *dev) >>>> int idx = smmu_domain->cfg.cbndx; >>>> int ret; >>>> + if (smmu->impl && >>>> smmu->impl->context_fault_needs_threaded_irq) { >>> >>> Why is this conditional on being threaded, if the global fault handler >>> that can never be threaded at all apparently needs it unconditionally? >> Synchronous runtime PM calls can sleep, which would cause issue if >> called within a hard IRQ context. This is why I added the conditional >> check for threaded IRQs. >> Furthermore, this change only allow the driver to override the >> stall-on-fault setting when context_fault_needs_threaded_irq is true. >> Since the unclocked access issue is tied to disabling stall-on-fault, >> the fix is only logically required for the threaded IRQ path. >> For the Global Fault handler, which runs in a hard IRQ context, you are >> right—we cannot safely vote for power there. I will remove the runtime >> PM call from that section. > > Hmm, but then how *do* we actually guarantee that autosuspend doesn't > happen to kick in and power down the SMMU just as a hardirq handler > runs, when there's some unexpected event? I fear there's a horrible > can of worms here... > You are right. To safely handle the race where the device suspends before the handler runs, we must be able to resume it. Since that requires sleeping, the hard IRQ handler is insufficient if the device is not already active. So, disabling stall-on-fault can only be safely allowed when both the global and context fault handlers are moved to threaded IRQs. This will allow us to safely use runtime resume in the handler, ensuring the device is powered to clear the fault status regardless of when the suspend occurred. > Thanks, > Robin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-01-28 18:44 ` Robin Murphy 2026-01-29 16:03 ` Prakash Gupta @ 2026-02-02 20:14 ` Pranjal Shrivastava 2026-02-10 11:09 ` Prakash Gupta 1 sibling, 1 reply; 11+ messages in thread From: Pranjal Shrivastava @ 2026-02-02 20:14 UTC (permalink / raw) To: Robin Murphy Cc: Prakash Gupta, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott, linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote: > [ +Pranjal as this might matter for v3 too... ] > Hi Robin, To weigh in from the arm-smmu-v3 side, we’ve attempted to address the "can of worms" regarding power races by leaning on these differences: - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and event handlers are fully threaded. This allows us to call arm_smmu_rpm_get() safely, as the handler can sleep while waiting for the hardware to resume. - GERROR Handling: Since GERROR remains a hard IRQ, we handle any pending gerrors in the suspend callback before the SMMU actually powers down. Any GERROR interrupts received while the device was suspended are treated as spurious and ignored. Thanks, Praan [1] https://lore.kernel.org/all/20260126151157.3418145-1-praan@google.com/ [...] > > Hmm, but then how *do* we actually guarantee that autosuspend doesn't happen > to kick in and power down the SMMU just as a hardirq handler runs, when > there's some unexpected event? I fear there's a horrible can of worms > here... > > Thanks, > Robin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-02-02 20:14 ` Pranjal Shrivastava @ 2026-02-10 11:09 ` Prakash Gupta 2026-02-10 13:15 ` Pranjal Shrivastava 0 siblings, 1 reply; 11+ messages in thread From: Prakash Gupta @ 2026-02-10 11:09 UTC (permalink / raw) To: Pranjal Shrivastava, Robin Murphy Cc: Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott, linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote: > On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote: >> [ +Pranjal as this might matter for v3 too... ] >> > > Hi Robin, > > To weigh in from the arm-smmu-v3 side, we’ve attempted to address the > "can of worms" regarding power races by leaning on these differences: > > - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and > event handlers are fully threaded. This allows us to call > arm_smmu_rpm_get() safely, as the handler can sleep while waiting for > the hardware to resume. > > - GERROR Handling: Since GERROR remains a hard IRQ, we handle any > pending gerrors in the suspend callback before the SMMU actually > powers down. Any GERROR interrupts received while the device was > suspended are treated as spurious and ignored. > > Thanks, > Praan [1] refer to case where SMMU state is not retained during smmu device power down, this I think is equally applicable for both context and global faults. Since the ARM SMMU runtime resume triggers a device reset, any pending faults would be cleared during resume. Here the solution can be to handle both global and context faults before allowing the SMMU device to suspend. With this approach, any hard or threaded IRQ scheduled after the SMMU device has suspended can be safely ignored. One concern I see is iommu fault reporting to clients while handling fault during smmu device suspend. Thanks, Prakash [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/ ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-02-10 11:09 ` Prakash Gupta @ 2026-02-10 13:15 ` Pranjal Shrivastava 2026-02-11 16:10 ` Prakash Gupta 0 siblings, 1 reply; 11+ messages in thread From: Pranjal Shrivastava @ 2026-02-10 13:15 UTC (permalink / raw) To: Prakash Gupta Cc: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott, linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On Tue, Feb 10, 2026 at 04:39:56PM +0530, Prakash Gupta wrote: > > > On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote: > > On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote: > >> [ +Pranjal as this might matter for v3 too... ] > >> > > > > Hi Robin, > > > > To weigh in from the arm-smmu-v3 side, we’ve attempted to address the > > "can of worms" regarding power races by leaning on these differences: > > > > - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and > > event handlers are fully threaded. This allows us to call > > arm_smmu_rpm_get() safely, as the handler can sleep while waiting for > > the hardware to resume. > > > > - GERROR Handling: Since GERROR remains a hard IRQ, we handle any > > pending gerrors in the suspend callback before the SMMU actually > > powers down. Any GERROR interrupts received while the device was > > suspended are treated as spurious and ignored. > > > > Thanks, > > Praan > > [1] refer to case where SMMU state is not retained during smmu device > power down, this I think is equally applicable for both context and > global faults. > > Since the ARM SMMU runtime resume triggers a device reset, any pending > faults would be cleared during resume. Here the solution can be to > handle both global and context faults before allowing the SMMU device to > suspend. > With this approach, any hard or threaded IRQ scheduled after the SMMU > device has suspended can be safely ignored. > One concern I see is iommu fault reporting to clients while handling > fault during smmu device suspend. I believe by the time we've reached suspend it's safe to assume that all clients have been suspended. Thus, we could simply not report the error and instead scream by having a dev_warn_ratelimited about the situation. Thanks, Praan > > Thanks, > Prakash > > [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/ > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-02-10 13:15 ` Pranjal Shrivastava @ 2026-02-11 16:10 ` Prakash Gupta 2026-02-11 16:59 ` Pranjal Shrivastava 0 siblings, 1 reply; 11+ messages in thread From: Prakash Gupta @ 2026-02-11 16:10 UTC (permalink / raw) To: Pranjal Shrivastava Cc: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott, linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On 2/10/2026 6:45 PM, Pranjal Shrivastava wrote: > On Tue, Feb 10, 2026 at 04:39:56PM +0530, Prakash Gupta wrote: >> >> >> On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote: >>> On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote: >>>> [ +Pranjal as this might matter for v3 too... ] >>>> >>> >>> Hi Robin, >>> >>> To weigh in from the arm-smmu-v3 side, we’ve attempted to address the >>> "can of worms" regarding power races by leaning on these differences: >>> >>> - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and >>> event handlers are fully threaded. This allows us to call >>> arm_smmu_rpm_get() safely, as the handler can sleep while waiting for >>> the hardware to resume. >>> >>> - GERROR Handling: Since GERROR remains a hard IRQ, we handle any >>> pending gerrors in the suspend callback before the SMMU actually >>> powers down. Any GERROR interrupts received while the device was >>> suspended are treated as spurious and ignored. >>> >>> Thanks, >>> Praan >> >> [1] refer to case where SMMU state is not retained during smmu device >> power down, this I think is equally applicable for both context and >> global faults. >> >> Since the ARM SMMU runtime resume triggers a device reset, any pending >> faults would be cleared during resume. Here the solution can be to >> handle both global and context faults before allowing the SMMU device to >> suspend. >> With this approach, any hard or threaded IRQ scheduled after the SMMU >> device has suspended can be safely ignored. >> One concern I see is iommu fault reporting to clients while handling >> fault during smmu device suspend. > > I believe by the time we've reached suspend it's safe to assume that all > clients have been suspended. Thus, we could simply not report the error > and instead scream by having a dev_warn_ratelimited about the situation. > By reporting error I meant reporting the error to client with report_iommu_fault(). I agree that if smmu device is being suspended the dma devices should have suspended by now. If so, it should be safe to just handle the fault excluding report_iommu_fault() in suspend path and complete smmu device suspend. Will update in next patchset. > Thanks, > Praan > >> >> Thanks, >> Prakash >> >> [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/ >> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers 2026-02-11 16:10 ` Prakash Gupta @ 2026-02-11 16:59 ` Pranjal Shrivastava 0 siblings, 0 replies; 11+ messages in thread From: Pranjal Shrivastava @ 2026-02-11 16:59 UTC (permalink / raw) To: Prakash Gupta Cc: Robin Murphy, Will Deacon, Joerg Roedel, Rob Clark, Connor Abbott, linux-arm-msm, linux-arm-kernel, iommu, linux-kernel, Akhil P Oommen, Pratyush Brahma On Wed, Feb 11, 2026 at 09:40:29PM +0530, Prakash Gupta wrote: > > > On 2/10/2026 6:45 PM, Pranjal Shrivastava wrote: > > On Tue, Feb 10, 2026 at 04:39:56PM +0530, Prakash Gupta wrote: > >> > >> > >> On 2/3/2026 1:44 AM, Pranjal Shrivastava wrote: > >>> On Wed, Jan 28, 2026 at 06:44:35PM +0000, Robin Murphy wrote: > >>>> [ +Pranjal as this might matter for v3 too... ] > >>>> > >>> > >>> Hi Robin, > >>> > >>> To weigh in from the arm-smmu-v3 side, we’ve attempted to address the > >>> "can of worms" regarding power races by leaning on these differences: > >>> > >>> - Threaded IRQs for PRI/Events: In the recent series[1], the PRI and > >>> event handlers are fully threaded. This allows us to call > >>> arm_smmu_rpm_get() safely, as the handler can sleep while waiting for > >>> the hardware to resume. > >>> > >>> - GERROR Handling: Since GERROR remains a hard IRQ, we handle any > >>> pending gerrors in the suspend callback before the SMMU actually > >>> powers down. Any GERROR interrupts received while the device was > >>> suspended are treated as spurious and ignored. > >>> > >>> Thanks, > >>> Praan > >> > >> [1] refer to case where SMMU state is not retained during smmu device > >> power down, this I think is equally applicable for both context and > >> global faults. > >> > >> Since the ARM SMMU runtime resume triggers a device reset, any pending > >> faults would be cleared during resume. Here the solution can be to > >> handle both global and context faults before allowing the SMMU device to > >> suspend. > >> With this approach, any hard or threaded IRQ scheduled after the SMMU > >> device has suspended can be safely ignored. > >> One concern I see is iommu fault reporting to clients while handling > >> fault during smmu device suspend. > > > > I believe by the time we've reached suspend it's safe to assume that all > > clients have been suspended. Thus, we could simply not report the error > > and instead scream by having a dev_warn_ratelimited about the situation. > > > > By reporting error I meant reporting the error to client with > report_iommu_fault(). I agree that if smmu device is being suspended the > dma devices should have suspended by now. If so, it should be safe to > just handle the fault excluding report_iommu_fault() in suspend path and > complete smmu device suspend. Will update in next patchset. > Yes, that's what I meant, since the client is likely suspended, we can't call report_iommu_fault() because the client might've registered a handler which touches some MMIO (accesses registers) while the client is suspended. Thus, if we see a fault during suspend, we could potentially log it at an appropriate level and not call report_iommu_fault().. Thanks, Praan > > > >> > >> Thanks, > >> Prakash > >> > >> [1] https://lore.kernel.org/all/20260126151157.3418145-9-praan@google.com/ > >> > ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-02-11 16:59 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-27 12:11 [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers Prakash Gupta 2026-01-27 12:52 ` Akhil P Oommen 2026-01-27 16:05 ` Robin Murphy 2026-01-28 5:56 ` Prakash Gupta 2026-01-28 18:44 ` Robin Murphy 2026-01-29 16:03 ` Prakash Gupta 2026-02-02 20:14 ` Pranjal Shrivastava 2026-02-10 11:09 ` Prakash Gupta 2026-02-10 13:15 ` Pranjal Shrivastava 2026-02-11 16:10 ` Prakash Gupta 2026-02-11 16:59 ` Pranjal Shrivastava
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox