* [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
` (9 subsequent siblings)
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
Refactor arm_smmu_setup_irqs by splitting it into two parts: one that
registers the interrupt handlers and one that enables interrupt
generation in the hardware. This refactor allows the hardware
interrupts to be re-initialized by a subsequent patch that enables
runtime power management for the arm-smmu-v3 driver.
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 +++++++++++++--------
1 file changed, 30 insertions(+), 17 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 852379845359..d479fadc3fe6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4097,14 +4097,32 @@ static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu)
}
}
+static void arm_smmu_enable_irqs(struct arm_smmu_device *smmu)
+{
+ int ret;
+ u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
+
+ if (smmu->features & ARM_SMMU_FEAT_PRI)
+ irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
+
+ ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
+ ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
+ if (ret)
+ dev_warn(smmu->dev, "failed to enable irqs\n");
+}
+
+static int arm_smmu_disable_irqs(struct arm_smmu_device *smmu)
+{
+ return arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
+ ARM_SMMU_IRQ_CTRLACK);
+}
+
static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
{
int ret, irq;
- u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
/* Disable IRQs first */
- ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
- ARM_SMMU_IRQ_CTRLACK);
+ ret = arm_smmu_disable_irqs(smmu);
if (ret) {
dev_err(smmu->dev, "failed to disable irqs\n");
return ret;
@@ -4126,15 +4144,6 @@ static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
} else
arm_smmu_setup_unique_irqs(smmu);
- if (smmu->features & ARM_SMMU_FEAT_PRI)
- irqen_flags |= IRQ_CTRL_PRIQ_IRQEN;
-
- /* Enable interrupt generation on the SMMU */
- ret = arm_smmu_write_reg_sync(smmu, irqen_flags,
- ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK);
- if (ret)
- dev_warn(smmu->dev, "failed to enable irqs\n");
-
return 0;
}
@@ -4277,11 +4286,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
}
}
- ret = arm_smmu_setup_irqs(smmu);
- if (ret) {
- dev_err(smmu->dev, "failed to setup irqs\n");
- return ret;
- }
+ /* Enable interrupt generation on the SMMU */
+ arm_smmu_enable_irqs(smmu);
if (is_kdump_kernel())
enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
@@ -4905,6 +4911,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
/* Check for RMRs and install bypass STEs if any */
arm_smmu_rmr_install_bypass_ste(smmu);
+ /* Setup interrupt handlers */
+ ret = arm_smmu_setup_irqs(smmu);
+ if (ret) {
+ dev_err(smmu->dev, "failed to setup irqs\n");
+ goto err_free_iopf;
+ }
+
/* Reset the device */
ret = arm_smmu_device_reset(smmu);
if (ret)
--
2.52.0.457.g6b5491de43-goog
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
` (8 subsequent siblings)
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
Before we suspend the SMMU, we want to ensure that all commands
(especially ATC_INV) have been consumed from the CMDQ, i.e. the CMDQs
are empty.
Add a helper function that polls until the queues are drained.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 34 +++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ++
2 files changed, 37 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d479fadc3fe6..014d08cdfb92 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -992,6 +992,40 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
cmds->num, true);
}
+int arm_smmu_queue_poll_until_empty(struct arm_smmu_device *smmu,
+ struct arm_smmu_queue *q)
+{
+ struct arm_smmu_queue_poll qp;
+ struct arm_smmu_ll_queue *llq = &q->llq;
+ int ret = 0;
+
+ queue_poll_init(smmu, &qp);
+ do {
+ if (queue_empty(llq))
+ break;
+
+ ret = queue_poll(&qp);
+ WRITE_ONCE(llq->cons, readl_relaxed(q->cons_reg));
+
+ } while (!ret);
+
+ return ret;
+}
+
+static int arm_smmu_drain_queues(struct arm_smmu_device *smmu)
+{
+ int ret;
+
+ /*
+ * Since this is only called from the suspend callback where
+ * exclusive access is ensured as nr_cmdq_users = 0 blocks new
+ * command submissions to the cmdq, we can skip the cmdq locking.
+ */
+ ret = arm_smmu_queue_poll_until_empty(smmu, &smmu->cmdq.q);
+
+ return ret;
+}
+
static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused,
struct iommu_page_response *resp)
{
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3c6d65d36164..4c04f63966eb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -985,6 +985,9 @@ int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq *cmdq);
+int arm_smmu_queue_poll_until_empty(struct arm_smmu_device *smmu,
+ struct arm_smmu_queue *q);
+
static inline bool arm_smmu_master_canwbs(struct arm_smmu_master *master)
{
return dev_iommu_fwspec_get(master->dev)->flags &
--
2.52.0.457.g6b5491de43-goog
* [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
` (7 subsequent siblings)
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
The tegra241-cmdqv driver supports vCMDQs, which need to be drained
before suspending the SMMU. The current driver implementation only uses
VINTF0 for kernel-owned vCMDQs. Add a helper that drains all the
enabled vCMDQs under VINTF0.
Add another function pointer to arm_smmu_impl_ops to drain
implementation-specific queues and call it from `arm_smmu_drain_queues`.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 +++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 27 +++++++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 014d08cdfb92..d9003f5f40b6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1023,6 +1023,13 @@ static int arm_smmu_drain_queues(struct arm_smmu_device *smmu)
*/
ret = arm_smmu_queue_poll_until_empty(smmu, &smmu->cmdq.q);
+ if (ret)
+ goto out;
+
+ /* Drain all implementation-specific queues */
+ if (smmu->impl_ops && smmu->impl_ops->drain_queues)
+ ret = smmu->impl_ops->drain_queues(smmu);
+out:
return ret;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 4c04f63966eb..d083141c6563 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -731,6 +731,7 @@ struct arm_smmu_impl_ops {
size_t (*get_viommu_size)(enum iommu_viommu_type viommu_type);
int (*vsmmu_init)(struct arm_vsmmu *vsmmu,
const struct iommu_user_data *user_data);
+ int (*drain_queues)(struct arm_smmu_device *smmu);
};
/* An SMMUv3 instance */
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index 1fc03b72beb8..38d637fde86e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -414,6 +414,32 @@ tegra241_cmdqv_get_cmdq(struct arm_smmu_device *smmu,
return &vcmdq->cmdq;
}
+static int tegra241_cmdqv_drain_vintf0_lvcmdqs(struct arm_smmu_device *smmu)
+{
+ struct tegra241_cmdqv *cmdqv =
+ container_of(smmu, struct tegra241_cmdqv, smmu);
+ struct tegra241_vintf *vintf = cmdqv->vintfs[0];
+ int ret = 0;
+ u16 lidx;
+
+ /* Kernel only uses VINTF0. Return if it's disabled */
+ if (!READ_ONCE(vintf->enabled))
+ return 0;
+
+ for (lidx = 0; lidx < cmdqv->num_lvcmdqs_per_vintf; lidx++) {
+ struct tegra241_vcmdq *vcmdq = vintf->lvcmdqs[lidx];
+
+ if (!vcmdq || !READ_ONCE(vcmdq->enabled))
+ continue;
+
+ ret = arm_smmu_queue_poll_until_empty(smmu, &vcmdq->cmdq.q);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
/* HW Reset Functions */
/*
@@ -844,6 +870,7 @@ static struct arm_smmu_impl_ops tegra241_cmdqv_impl_ops = {
.get_secondary_cmdq = tegra241_cmdqv_get_cmdq,
.device_reset = tegra241_cmdqv_hw_reset,
.device_remove = tegra241_cmdqv_remove,
+ .drain_queues = tegra241_cmdqv_drain_vintf0_lvcmdqs,
/* For user-space use */
.hw_info = tegra241_cmdqv_hw_info,
.get_viommu_size = tegra241_cmdqv_get_vintf_size,
--
2.52.0.457.g6b5491de43-goog
* [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (2 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
` (6 subsequent siblings)
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
From: Ashish Mhetre <amhetre@nvidia.com>
The PROD and CONS indices of the vCMDQs are reset to 0 after resume,
so the vCMDQs stop consuming commands after resume. Fix this by
restoring the PROD and CONS indices from their saved values on resume.
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index 38d637fde86e..6a951386baad 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -507,6 +507,8 @@ static int tegra241_vcmdq_hw_init(struct tegra241_vcmdq *vcmdq)
/* Configure and enable VCMDQ */
writeq_relaxed(vcmdq->cmdq.q.q_base, REG_VCMDQ_PAGE1(vcmdq, BASE));
+ writel_relaxed(vcmdq->cmdq.q.llq.prod, REG_VCMDQ_PAGE0(vcmdq, PROD));
+ writel_relaxed(vcmdq->cmdq.q.llq.cons, REG_VCMDQ_PAGE0(vcmdq, CONS));
ret = vcmdq_write_config(vcmdq, VCMDQ_EN);
if (ret) {
--
2.52.0.457.g6b5491de43-goog
* [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (3 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
` (5 subsequent siblings)
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
The SMMU's MSI configuration registers (*_IRQ_CFGn) containing target
address, data and memory attributes lose their state when the SMMU is
powered down. We'll need to cache and restore their contents to ensure
that MSIs work after the system resumes.
To address this, cache the original `msi_msg` within the `msi_desc`
when the configuration is first written by `arm_smmu_write_msi_msg`.
This primarily includes the target address and data since the memory
attributes are fixed.
Introduce a new helper, `arm_smmu_resume_msis`, which will later be
called from the driver's resume callback. The helper retrieves the
cached MSI message for each relevant interrupt (evtq, gerr, priq) via
get_cached_msi_msg() and re-programs the registers via
arm_smmu_write_msi_msg().
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 37 +++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d9003f5f40b6..91edcd8922a7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4047,6 +4047,9 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
struct arm_smmu_device *smmu = dev_get_drvdata(dev);
phys_addr_t *cfg = arm_smmu_msi_cfg[desc->msi_index];
+ /* Cache the msi_msg for resume */
+ desc->msg = *msg;
+
doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
doorbell &= MSI_CFG0_ADDR_MASK;
@@ -4055,6 +4058,40 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]);
}
+static void arm_smmu_resume_msi(struct arm_smmu_device *smmu,
+ unsigned int irq, const char *name)
+{
+ struct msi_desc *desc;
+ struct msi_msg msg;
+
+ if (!irq)
+ return;
+
+ desc = irq_get_msi_desc(irq);
+ if (!desc) {
+ dev_err(smmu->dev, "Failed to resume msi: %s", name);
+ return;
+ }
+
+ get_cached_msi_msg(irq, &msg);
+ arm_smmu_write_msi_msg(desc, &msg);
+}
+
+static void arm_smmu_resume_msis(struct arm_smmu_device *smmu)
+{
+ if (!(smmu->features & ARM_SMMU_FEAT_MSI))
+ return;
+
+ if (!dev_get_msi_domain(smmu->dev))
+ return;
+
+ arm_smmu_resume_msi(smmu, smmu->gerr_irq, "gerror");
+ arm_smmu_resume_msi(smmu, smmu->evtq.q.irq, "evtq");
+
+ if (smmu->features & ARM_SMMU_FEAT_PRI)
+ arm_smmu_resume_msi(smmu, smmu->priq.q.irq, "priq");
+}
+
static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
{
int ret, nvec = ARM_SMMU_MAX_MSIS;
--
2.52.0.457.g6b5491de43-goog
* [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (4 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-03-08 21:23 ` Daniel Mentz
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
` (4 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
Introduce a biased counter to track the number of active cmdq owners as
a preparatory step for the runtime PM implementation.
The counter will be used to gate command submission, preventing the
submission of new commands while the device is suspended and deferring
suspend while the command submissions are in-flight.
The counter is biased to a value of 1 during device reset. A cmdq
owner, or a thread issuing commands with a sync, increments it before
accessing HW registers and decrements it with release semantics
afterwards.
A value of 1 represents an idle (but active) state. A suspend operation
transitions it from 1 -> 0, representing the suspended state.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
2 files changed, 57 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 91edcd8922a7..3a8d3c2a8d69 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
u64 cmd_sync[CMDQ_ENT_DWORDS];
u32 prod;
unsigned long flags;
- bool owner;
+ bool owner, has_ref = false;
struct arm_smmu_ll_queue llq, head;
int ret = 0;
@@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
while (!queue_has_space(&llq, n + sync)) {
local_irq_restore(flags);
+
+ if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
+ /* Device is suspended, don't wait for space */
+ return 0;
+
if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
+
+ atomic_dec_return_release(&smmu->nr_cmdq_users);
local_irq_save(flags);
}
@@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
/*
- * d. Advance the hardware prod pointer
+ * d. Advance the hardware prod pointer (if smmu is still active)
* Control dependency ordering from the entries becoming valid.
*/
- writel_relaxed(prod, cmdq->q.prod_reg);
+ if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
+ writel_relaxed(prod, cmdq->q.prod_reg);
+
+ if (sync) {
+ has_ref = true;
+ } else {
+ /*
+ * Use release semantics to enforce ordering without a full barrier.
+ * This ensures the prior writel_relaxed() is ordered/visible
+ * before the refcount decrement, avoiding the heavy pipeline
+ * stall of a full wmb().
+ *
+ * We need the atomic_dec_return_release() below and the
+ * atomic_set_release() in step (e) below doesn't suffice.
+ *
+ * Specifically, without release semantics on the decrement,
+ * the CPU is free to reorder the independent atomic_dec_relaxed()
+ * before the writel_relaxed().
+ *
+ * If this happens, the refcount could drop to zero, allowing the PM
+ * suspend path (running on another CPU) to disable the SMMU before
+ * the register write completes, resulting in a bus fault.
+ */
+ atomic_dec_return_release(&smmu->nr_cmdq_users);
+ }
+ }
/*
* e. Tell the next owner we're done
@@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
if (sync) {
- llq.prod = queue_inc_prod_n(&llq, n);
- ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
- if (ret) {
- dev_err_ratelimited(smmu->dev,
- "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
- llq.prod,
- readl_relaxed(cmdq->q.prod_reg),
- readl_relaxed(cmdq->q.cons_reg));
+
+ /* If we are not the owner, check if we're suspended */
+ if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
+ has_ref = true;
+ llq.prod = queue_inc_prod_n(&llq, n);
+ ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
+ if (ret) {
+ dev_err_ratelimited(smmu->dev,
+ "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
+ llq.prod,
+ readl_relaxed(cmdq->q.prod_reg),
+ readl_relaxed(cmdq->q.cons_reg));
+ }
}
/*
@@ -914,6 +951,9 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
}
}
+ if (has_ref)
+ atomic_dec_return_release(&smmu->nr_cmdq_users);
+
local_irq_restore(flags);
return ret;
}
@@ -4310,6 +4350,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
return ret;
}
+ /* Set the cmdq to be active before issuing any commands */
+ atomic_set(&smmu->nr_cmdq_users, 1);
+
/* Invalidate any cached configuration */
cmd.opcode = CMDQ_OP_CFGI_ALL;
arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d083141c6563..50a2513e4425 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -804,6 +804,9 @@ struct arm_smmu_device {
struct rb_root streams;
struct mutex streams_mutex;
+
+ /* Tracks the cmdq usage count for runtime PM */
+ atomic_t nr_cmdq_users;
};
struct arm_smmu_stream {
--
2.52.0.457.g6b5491de43-goog
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
@ 2026-03-08 21:23 ` Daniel Mentz
2026-03-09 19:46 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-08 21:23 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
>
> Introduce a biased counter to track the number of active cmdq owners as
> a preparatory step for the runtime PM implementation.
Non-owners waiting for their CMD_SYNC to be consumed also appear to
increase this counter.
>
> The counter will be used to gate command submission, preventing the
> submission of new commands while the device is suspended and deferring
> suspend while the command submissions are in-flight.
What does command submission mean in this context? I understand that
commands are still added to the queue (until the queue is full). It's
only the producer index that is not advanced.
>
> The counter is biased to a value of 1 during device reset. A cmdq owner
> or a thread issuing cmds with sync, increment it before accessing HW
> registers and decrements it with release semantics afterwards.
>
> A value of 1 represents an idle (but active) state. A suspend operation
> will set it to from 1 -> 0 representing the suspended state.
>
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
> 2 files changed, 57 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 91edcd8922a7..3a8d3c2a8d69 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> u64 cmd_sync[CMDQ_ENT_DWORDS];
> u32 prod;
> unsigned long flags;
> - bool owner;
> + bool owner, has_ref = false;
> struct arm_smmu_ll_queue llq, head;
> int ret = 0;
>
> @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>
> while (!queue_has_space(&llq, n + sync)) {
> local_irq_restore(flags);
> +
> + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> + /* Device is suspended, don't wait for space */
> + return 0;
> +
Does this mean that we still accumulate commands until the queue is
full while in the suspended state and then have the SMMU execute all of
these (now irrelevant) commands when the command queue is re-enabled?
One might argue that this is wasteful, but I guess it might be
technically correct, because it doesn't hurt either.
> if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> +
> + atomic_dec_return_release(&smmu->nr_cmdq_users);
> local_irq_save(flags);
> }
>
> @@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
>
> /*
> - * d. Advance the hardware prod pointer
> + * d. Advance the hardware prod pointer (if smmu is still active)
Consider "(if command queue is still enabled)". Otherwise, what does
"active" mean? SMMU and/or command queue enabled?
> * Control dependency ordering from the entries becoming valid.
> */
> - writel_relaxed(prod, cmdq->q.prod_reg);
> + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> + writel_relaxed(prod, cmdq->q.prod_reg);
> +
> + if (sync) {
> + has_ref = true;
> + } else {
> + /*
> + * Use release semantics to enforce ordering without a full barrier.
> + * This ensures the prior writel_relaxed() is ordered/visible
> + * before the refcount decrement, avoiding the heavy pipeline
> + * stall of a full wmb().
> + *
> + * We need the atomic_dec_return_release() below and the
> + * atomic_set_release() in step (e) below doesn't suffice.
> + *
> + * Specifically, without release semantics on the decrement,
> + * the CPU is free to reorder the independent atomic_dec_relaxed()
> + * before the writel_relaxed().
> + *
> + * If this happens, the refcount could drop to zero, allowing the PM
> + * suspend path (running on another CPU) to disable the SMMU before
> + * the register write completes, resulting in a bus fault.
> + */
> + atomic_dec_return_release(&smmu->nr_cmdq_users);
> + }
> + }
>
> /*
> * e. Tell the next owner we're done
> @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>
> /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> if (sync) {
> - llq.prod = queue_inc_prod_n(&llq, n);
> - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> - if (ret) {
> - dev_err_ratelimited(smmu->dev,
> - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> - llq.prod,
> - readl_relaxed(cmdq->q.prod_reg),
> - readl_relaxed(cmdq->q.cons_reg));
> +
> + /* If we are not the owner, check if we're suspended */
> + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
Can there be a situation, where we execute this branch and the
hardware prod pointer has not been updated, in which case
arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
it times out). Imagine the following series of events:
* nr_cmdq_users==0
* hardware prod pointer update is skipped
* nr_cmdq_users becomes 1
* arm_smmu_cmdq_poll_until_sync waits but commands are not executed
by the SMMU because the hardware prod pointer wasn't updated
* arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
stuck because the queue is full.
* cmdq->q.llq.cons is never updated because the shared lock is held.
queue_has_space() returns false, and
arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> + has_ref = true;
> + llq.prod = queue_inc_prod_n(&llq, n);
> + ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
If you skip arm_smmu_cmdq_poll_until_sync() in the nr_cmdq_users==0
case, I'm thinking that llq.cons is not updated. If this is the case,
is the subsequent WRITE_ONCE(cmdq->q.llq.cons, llq.cons); potentially
incorrect?
I'm still trying to get my head around this algorithm, but according
to my current understanding, the following rule applies:
Only the cmdq owner may update the hardware prod pointer, and only
after arm_smmu_cmdq_poll_valid_map() returned for the relevant range.
Does the following instruction in arm_smmu_device_reset() violate this
rule?
writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
I'm wondering if we should just set ARM_SMMU_CMDQ_PROD to
smmu->cmdq.q.llq.cons and then have the next owner update the hardware
prod pointer.
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-08 21:23 ` Daniel Mentz
@ 2026-03-09 19:46 ` Pranjal Shrivastava
2026-03-09 21:13 ` Pranjal Shrivastava
2026-03-09 22:41 ` Daniel Mentz
0 siblings, 2 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-09 19:46 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Sun, Mar 08, 2026 at 02:23:03PM -0700, Daniel Mentz wrote:
> On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > Introduce a biased counter to track the number of active cmdq owners as
> > a preparatory step for the runtime PM implementation.
>
> Non-owners waiting for their CMD_SYNC to be consumed also appear to
> increase this counter.
>
Ack, I'll clarify that in the commit message too: any owner, or
non-owner waiting on a sync, increments this counter.
> >
> > The counter will be used to gate command submission, preventing the
> > submission of new commands while the device is suspended and deferring
> > suspend while the command submissions are in-flight.
>
> What does command submission mean in this context? I understand that
> commands are still added to the queue (until the queue is full). It's
> only the producer index that is not advanced.
>
Exactly, incrementing the "producer index" is how the command is
actually submitted to the HW. In case the SMMU is suspended, we'd simply
add it to the queue but not touch the PROD register in the HW. I can
clarify this further by saying "preventing the flushing of new commands"
> >
> > The counter is biased to a value of 1 during device reset. A cmdq owner
> > or a thread issuing cmds with sync, increment it before accessing HW
> > registers and decrements it with release semantics afterwards.
> >
> > A value of 1 represents an idle (but active) state. A suspend operation
> > will set it to from 1 -> 0 representing the suspended state.
> >
> > Signed-off-by: Pranjal Shrivastava <praan@google.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
> > 2 files changed, 57 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 91edcd8922a7..3a8d3c2a8d69 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > u64 cmd_sync[CMDQ_ENT_DWORDS];
> > u32 prod;
> > unsigned long flags;
> > - bool owner;
> > + bool owner, has_ref = false;
> > struct arm_smmu_ll_queue llq, head;
> > int ret = 0;
> >
> > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> >
> > while (!queue_has_space(&llq, n + sync)) {
> > local_irq_restore(flags);
> > +
> > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > + /* Device is suspended, don't wait for space */
> > + return 0;
> > +
>
> Does this mean that we still accumulate commands until the queue is
> full while in the suspended state and then have SMMU execute all of
> these (now irrelevant) commands when the command queue is re-enabled?
> One might argue that this is wasteful, but I guess it might be
> technically correct, because it doesn't hurt either.
>
No, we drop any commands that are issued if the command queue doesn't
have space while it is suspended. There are two reasons for this:
1. The arm_smmu_cmdq_poll_until_not_full() below would try to perform an
MMIO access (readl_relaxed(cmdq->q.cons_reg)) which shouldn't be allowed
when the SMMU is suspended.
2. We usually have 3 types of commands in the SMMUv3:
a) Invalidation commands: We clear the entire TLB during resume, so we
don't really need to batch these.
b) Prefetch commands: Any prefetches are effectively no-ops while the
SMMU is suspended, since the resume path re-reads the latest config
anyway.
c) Page responses: These are NOT expected to occur while the SMMU is
suspended and we shouldn't be batching them anyway. (With this series,
the page resp handlers detect such situations and avoid queueing
commands.)
Thus, any dropped invalidation / prefetch commands don't harm us if the
SMMU was suspended.
> > if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> > dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> > +
> > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > local_irq_save(flags);
> > }
> >
> > @@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
> >
> > /*
> > - * d. Advance the hardware prod pointer
> > + * d. Advance the hardware prod pointer (if smmu is still active)
>
> Consider "(if command queue is still enabled)". Otherwise, what does
> "active" mean? SMMU and/or command queue enabled?
>
The suspend callback is implemented in a way where we can't have a
"partial" suspend, i.e. there is no code path where the CMDQ is disabled
but the SMMU is not. Thus, "active" was supposed to mean the SMMU is
enabled entirely, since there isn't a partial suspend/resume implemented.
Although, I can update the comment to say (if smmu is still enabled) OR
(if smmu is NOT suspended).
> > * Control dependency ordering from the entries becoming valid.
> > */
> > - writel_relaxed(prod, cmdq->q.prod_reg);
> > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > + writel_relaxed(prod, cmdq->q.prod_reg);
> > +
> > + if (sync) {
> > + has_ref = true;
> > + } else {
> > + /*
> > + * Use release semantics to enforce ordering without a full barrier.
> > + * This ensures the prior writel_relaxed() is ordered/visible
> > + * before the refcount decrement, avoiding the heavy pipeline
> > + * stall of a full wmb().
> > + *
> > + * We need the atomic_dec_return_release() below and the
> > + * atomic_set_release() in step (e) below doesn't suffice.
> > + *
> > + * Specifically, without release semantics on the decrement,
> > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > + * before the writel_relaxed().
> > + *
> > + * If this happens, the refcount could drop to zero, allowing the PM
> > + * suspend path (running on another CPU) to disable the SMMU before
> > + * the register write completes, resulting in a bus fault.
> > + */
> > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > + }
> > + }
> >
> > /*
> > * e. Tell the next owner we're done
> > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> >
> > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > if (sync) {
> > - llq.prod = queue_inc_prod_n(&llq, n);
> > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > - if (ret) {
> > - dev_err_ratelimited(smmu->dev,
> > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > - llq.prod,
> > - readl_relaxed(cmdq->q.prod_reg),
> > - readl_relaxed(cmdq->q.cons_reg));
> > +
> > + /* If we are not the owner, check if we're suspended */
> > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
>
> Can there be a situation, where we execute this branch and the
> hardware prod pointer has not been updated, in which case
> arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> it times out). Imagine the following series of events:
> * nr_cmdq_users==0
> * hardware prod pointer update is skipped
> * nr_cmdq_users becomes 1
> * arm_smmu_cmdq_poll_until_sync waits but commands are not executed
> by SMMU because hardware prod pointer update wasn't updated
> * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> stuck because the queue is full.
> * cmdq->q.llq.cons is never updated because the shared lock is held.
> queue_has_space() returns false, and
> arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
>
Thanks for catching this! IIUC, you mean a race between the resume and
this code block can cause a deadlock? Something like the following:
CPU 0                              CPU 1
nr_cmdq_users == 0
skip hw_prod++
                                   ->resume cb
                                   nr_cmdq_users == 1
nr_cmdq_users == 1
in the if (sync) part
arm_smmu_cmdq_poll_until_sync()
                                   CMDQ_OP_CFGI_ALL stuck for space
I guess we could add a check here: if has_ref == false but
nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
need to factor out a helper function here.. I'll think more about this..
I guess the simplest thing would be to bail out much earlier, something
like: if (nr_cmdq_users == 0 && sync) return;
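For reference, the biased-counter protocol being debated can be boiled down to a short userspace sketch (C11 atomics standing in for the kernel helpers; cmdq_users_get/put and cmdq_try_suspend are illustrative names, not the driver API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Biased usage counter: 1 == idle but active, 0 == suspended,
 * >1 == command submissions in flight.
 */
static atomic_int nr_cmdq_users;

static void cmdq_users_init(void)
{
	atomic_init(&nr_cmdq_users, 1);	/* bias: device active, no users */
}

/* Userspace stand-in for atomic_inc_not_zero(): fails once suspended. */
static bool cmdq_users_get(void)
{
	int v = atomic_load(&nr_cmdq_users);

	do {
		if (v == 0)
			return false;	/* suspended: must not touch MMIO */
	} while (!atomic_compare_exchange_weak(&nr_cmdq_users, &v, v + 1));
	return true;
}

/* Release ordering: prior register writes become visible before the drop. */
static void cmdq_users_put(void)
{
	atomic_fetch_sub_explicit(&nr_cmdq_users, 1, memory_order_release);
}

/* Suspend only succeeds from the idle state (1 -> 0); otherwise defer. */
static bool cmdq_try_suspend(void)
{
	int idle = 1;

	return atomic_compare_exchange_strong(&nr_cmdq_users, &idle, 0);
}

static void cmdq_resume(void)
{
	atomic_store(&nr_cmdq_users, 1);
}
```

The deadlock discussed above is precisely the window between a failed cmdq_users_get() (prod write skipped) and a subsequent successful one on the sync path.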
> > + has_ref = true;
> > + llq.prod = queue_inc_prod_n(&llq, n);
> > + ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
>
> If you skip arm_smmu_cmdq_poll_until_sync() in the nr_cmdq_users==0
> case, I'm thinking that llq.cons is not updated. If this is the case,
> is the subsequent WRITE_ONCE(cmdq->q.llq.cons, llq.cons); potentially
> incorrect?
>
The goal here was to "simulate" a successful command submission while
the smmu is suspended (as mentioned by Will earlier): we do everything
except touch MMIO here.
Since the commands can be one of the three types mentioned above, we
don't really miss out on anything with the faux-cmd submission here.
Thus, what if we always update llq.prod and conditionally set
llq.cons == llq.prod if nr_cmdq_users == 0?
Ideally, this should also prevent the `queue_full` scenario as all
commands are faux-issued instantly. Maybe something like:
if (sync) {
	llq.prod = queue_inc_prod_n(&llq, n);

	if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
		if (!has_ref) {
			/*
			 * Repair: We were suspended earlier but now we're
			 * active. Kick the hardware to publish our commands.
			 */
			has_ref = true;
			... owner logic again ...
			...
			writel_relaxed(llq.prod, cmdq->q.prod_reg);
		}
		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
		if (ret) {
			// ...
		}
	} else {
		/*
		 * Device is suspended. Fast-forward the consumer pointer in
		 * memory to "drop" these commands and avoid a full queue
		 * deadlock. (n + 1 to include the SYNC command)
		 */
		llq.cons = queue_inc_prod_n(&llq, n + 1);
	}

	if (!arm_smmu_cmdq_shared_tryunlock(cmdq)) {
		WRITE_ONCE(cmdq->q.llq.cons, llq.cons);
		arm_smmu_cmdq_shared_unlock(cmdq);
	}
}
This could still be racy.. I'll give it another thought..
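The "fast-forward cons" idea can be checked in isolation with a minimal model (hypothetical names, fixed queue depth; not the driver's queue math): while suspended, cons is advanced to prod after every batch, so the queue can never look full and no MMIO is ever needed.

```c
#include <stdbool.h>
#include <stdint.h>

#define QSZ 8	/* hypothetical queue depth */

/* Minimal stand-in for the low-level queue indices. */
struct fake_llq {
	uint32_t prod;
	uint32_t cons;
};

static bool fake_has_space(const struct fake_llq *q, uint32_t n)
{
	return (q->prod - q->cons) + n <= QSZ;
}

/*
 * Issue n commands plus a CMD_SYNC. While suspended, pretend the
 * hardware consumed everything by fast-forwarding cons to prod, i.e.
 * the batch is "dropped" and the full-queue deadlock cannot occur.
 */
static bool fake_issue(struct fake_llq *q, uint32_t n, bool suspended)
{
	if (!fake_has_space(q, n + 1))
		return false;
	q->prod += n + 1;		/* n commands + the CMD_SYNC */
	if (suspended)
		q->cons = q->prod;	/* faux consumption */
	return true;
}
```

With the hardware active, nothing consumes in this model, so the queue fills after two batches of 3 + sync; while suspended it never fills.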
> I'm still trying to get my head around this algorithm, but according
> to my current understanding, the following rule applies:
>
> Only the cmdq owner may update the hardware prod pointer, and only
> after arm_smmu_cmdq_poll_valid_map() returned for the relevant range.
> Does the following instruction in arm_smmu_device_reset() violate this
> rule?
>
> writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
>
Not really as we don't enable the command queue by that point.
(but it's still racy as you've mentioned in the queue_full scenario
above)
> I'm wondering if we should just set ARM_SMMU_CMDQ_PROD to
> smmu->cmdq.q.llq.cons and then have the next owner update the hardware
> prod pointer.
Hmm.. what if we ensure that the cmdq.q.llq.prod == cons instead? I
guess that would be racy though?
Thanks,
Praan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-09 19:46 ` Pranjal Shrivastava
@ 2026-03-09 21:13 ` Pranjal Shrivastava
2026-03-09 22:56 ` Daniel Mentz
2026-03-09 22:41 ` Daniel Mentz
1 sibling, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-09 21:13 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Mar 09, 2026 at 07:46:24PM +0000, Pranjal Shrivastava wrote:
> On Sun, Mar 08, 2026 at 02:23:03PM -0700, Daniel Mentz wrote:
> > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > Introduce a biased counter to track the number of active cmdq owners as
> > > a preparatory step for the runtime PM implementation.
> >
> > Non-owners waiting for their CMD_SYNC to be consumed also appear to
> > increase this counter.
> >
>
> Ack, I'll clarify that in the commit message too, any owners or
> non-owners waiting on sync would increment this counter.
>
> > >
> > > The counter will be used to gate command submission, preventing the
> > > submission of new commands while the device is suspended and deferring
> > > suspend while the command submissions are in-flight.
> >
> > What does command submission mean in this context? I understand that
> > commands are still added to the queue (until the queue is full). It's
> > only the producer index that is not advanced.
> >
>
> Exactly, incrementing the "producer index" is how the command is
> actually submitted to the HW. In case the SMMU is suspended, we'd simply
> add it to the queue but not touch the PROD register in the HW. I can
> clarify this further by saying "preventing the flushing of new commands"
>
> > >
> > > The counter is biased to a value of 1 during device reset. A cmdq owner
> > > or a thread issuing cmds with sync increments it before accessing HW
> > > registers and decrements it with release semantics afterwards.
> > >
> > > A value of 1 represents an idle (but active) state. A suspend operation
> > > will move it from 1 -> 0, representing the suspended state.
> > >
> > > Signed-off-by: Pranjal Shrivastava <praan@google.com>
> > > ---
> > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
> > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
> > > 2 files changed, 57 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 91edcd8922a7..3a8d3c2a8d69 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > u64 cmd_sync[CMDQ_ENT_DWORDS];
> > > u32 prod;
> > > unsigned long flags;
> > > - bool owner;
> > > + bool owner, has_ref = false;
> > > struct arm_smmu_ll_queue llq, head;
> > > int ret = 0;
> > >
> > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > > while (!queue_has_space(&llq, n + sync)) {
> > > local_irq_restore(flags);
> > > +
> > > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > + /* Device is suspended, don't wait for space */
> > > + return 0;
> > > +
> >
> > Does this mean that we still accumulate commands until the queue is
> > full while in the suspended state and then have SMMU execute all of
> > these (now irrelevant) commands when the command queue is re-enabled?
> > One might argue that this is wasteful, but I guess it might be
> > technically correct, because it doesn't hurt either.
> >
>
> No, we drop any commands that are issued if the command queue doesn't
> have space while it is suspended. There are two reasons for this:
>
> 1. The arm_smmu_cmdq_poll_until_not_full below would try to perform an
> MMIO access (readl_relaxed(cmdq->q.cons_reg)) which shouldn't be allowed
> when the SMMU is suspended.
>
> 2. We usually have 3 types of commands in the SMMUv3:
>
> a) Invalidation commands: We clear the entire TLB during resume, so we
> don't really need to batch these.
>
> b) Prefetch commands: Any prefetches are effectively no-ops as when the
> SMMU is suspended, the resume would effectively read the latest config
>
> c) Page responses: These are NOT expected to occur while the SMMU is
> suspended and we shouldn't be batching them anyway. (The page resp
> handlers handle such situations and avoid queueing-in commands anyway
> with this series)
>
> Thus, any dropped invalidation / prefetch commands don't harm us if the
> SMMU was suspended.
>
>
> > > if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> > > dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> > > +
> > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > local_irq_save(flags);
> > > }
> > >
> > > @@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
> > >
> > > /*
> > > - * d. Advance the hardware prod pointer
> > > + * d. Advance the hardware prod pointer (if smmu is still active)
> >
> > Consider "(if command queue is still enabled)". Otherwise, what does
> > "active" mean? SMMU and/or command queue enabled?
> >
>
> The suspend callback is implemented in a way where we can't have a
> "partial" suspend, i.e. there is no code path where the CMDQ is disabled
> but the SMMU is not. Thus, "active" was supposed to mean The SMMU is
> enabled entirely since there isn't a partial suspend/resume implemented.
>
> Although, I can update the comment to say (if smmu is still enabled) OR
> (if smmu is NOT suspended).
>
> > > * Control dependency ordering from the entries becoming valid.
> > > */
> > > - writel_relaxed(prod, cmdq->q.prod_reg);
> > > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > + writel_relaxed(prod, cmdq->q.prod_reg);
> > > +
> > > + if (sync) {
> > > + has_ref = true;
> > > + } else {
> > > + /*
> > > + * Use release semantics to enforce ordering without a full barrier.
> > > + * This ensures the prior writel_relaxed() is ordered/visible
> > > + * before the refcount decrement, avoiding the heavy pipeline
> > > + * stall of a full wmb().
> > > + *
> > > + * We need the atomic_dec_return_release() below and the
> > > + * atomic_set_release() in step (e) below doesn't suffice.
> > > + *
> > > + * Specifically, without release semantics on the decrement,
> > > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > + * before the writel_relaxed().
> > > + *
> > > + * If this happens, the refcount could drop to zero, allowing the PM
> > > + * suspend path (running on another CPU) to disable the SMMU before
> > > + * the register write completes, resulting in a bus fault.
> > > + */
> > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > + }
> > > + }
> > >
> > > /*
> > > * e. Tell the next owner we're done
> > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > > if (sync) {
> > > - llq.prod = queue_inc_prod_n(&llq, n);
> > > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > - if (ret) {
> > > - dev_err_ratelimited(smmu->dev,
> > > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > - llq.prod,
> > > - readl_relaxed(cmdq->q.prod_reg),
> > > - readl_relaxed(cmdq->q.cons_reg));
> > > +
> > > + /* If we are not the owner, check if we're suspended */
> > > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> >
> > Can there be a situation, where we execute this branch and the
> > hardware prod pointer has not been updated, in which case
> > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > it times out). Imagine the following series of events:
> > * nr_cmdq_users==0
> > * hardware prod pointer update is skipped
> > * nr_cmdq_users becomes 1
> > * arm_smmu_cmdq_poll_until_sync waits but commands are not executed
> > by SMMU because hardware prod pointer update wasn't updated
> > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > stuck because the queue is full.
> > * cmdq->q.llq.cons is never updated because the shared lock is held.
> > queue_has_space() returns false, and
> > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> >
>
> Thanks for catching this! IIUC, you mean a race between the resume and
> this code block can cause a deadlock? Something like the following:
>
> CPU 0 CPU 1
> nr_cmdq_users == 0
> skip hw_prod++
> ->resume cb
> nr_cmdq_user == 1
> nr_cmdq_users == 1
> in the if (sync) part
> arm_smmu_cmdq_poll_until_sync()
> CMDQ_OP_CFGI_ALL stuck for space
>
> I guess we could add a check here if (has_ref == false) but
> nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> need to factor out a helper function here.. I'll think more about this..
>
> I guess the simplest thing would be to bail out much before, something
> like if (nr_cmdq_users == 0 && sync) return;
>
> > > + has_ref = true;
> > > + llq.prod = queue_inc_prod_n(&llq, n);
> > > + ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> >
> > If you skip arm_smmu_cmdq_poll_until_sync() in the nr_cmdq_users==0
> > case, I'm thinking that llq.cons is not updated. If this is the case,
> > is the subsequent WRITE_ONCE(cmdq->q.llq.cons, llq.cons); potentially
> > incorrect?
> >
>
> The goal here was to "simulate" a successful command submission while
> the smmu is suspended (as mentioned by Will earlier), we do everything
> except for touching MMIO here.
>
> Since the commands can be one of the three types mentioned above, we
> don't really miss out on anything with the faux-cmd submission here.
>
> Thus, what if we update llq.prod always and set llq.cons == llq.prod
> conditionally if nr_cmdq_users == 0.
>
> Ideally, this should also prevent the `queue_full` scenario as all
> commands are faux-issued instantly. Maybe something like:
>
> if (sync) {
> llq.prod = queue_inc_prod_n(&llq, n);
>
> if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> if (!has_ref) {
> /* Repair: We were suspended earlier but now we're active.
> * Kick the hardware to publish our commands. */
> has_ref = true;
> ... owner logic again ...
> ...
> writel_relaxed(llq.prod, cmdq->q.prod_reg);
> }
> ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
>
> if (ret) {
> // ...
> }
> } else {
> /*
> * Device is suspended. Fast-forward the consumer pointer in memory
> * to "drop" these commands and avoid a full queue deadlock.
> * (n + 1 to include the SYNC command)
> */
> llq.cons = queue_inc_prod_n(&llq, n + 1);
> }
>
> if (!arm_smmu_cmdq_shared_tryunlock(cmdq)) {
> WRITE_ONCE(cmdq->q.llq.cons, llq.cons);
> arm_smmu_cmdq_shared_unlock(cmdq);
> }
> }
>
> This could still be racy.. I'll give it another thought..
>
Still thinking out loud, but maybe we can have something like:
if (owner) {
	....

	if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
		writel_relaxed(llq.prod, cmdq->q.prod_reg);
		has_ref = true;
	} else {
		/* SMMU is suspended, fake a consumption */
		WRITE_ONCE(cmdq->q.llq.cons, prod);

		/* Avoid racing with resume */
		sync = false;
	}

	atomic_set_release(&cmdq->owner_prod, prod);
}

if (sync) {
	... same stuff ..
}
LMK, what you think?
[ snip ]
Thanks,
Praan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-09 21:13 ` Pranjal Shrivastava
@ 2026-03-09 22:56 ` Daniel Mentz
2026-03-10 16:46 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-09 22:56 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Mar 9, 2026 at 2:13 PM Pranjal Shrivastava <praan@google.com> wrote:
> Still thinking out loud, but maybe we can have something like:
>
> if (owner) {
> ....
>
> if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> writel_relaxed(llq.prod, cmdq->q.prod_reg);
> has_ref = true;
> } else {
>
> /* SMMU is suspended, fake a consumption */
> WRITE_ONCE(cmdq->q.llq.cons, prod);
>
> /* Avoid racing with resume */
> sync = false;
> }
>
> atomic_set_release(&cmdq->owner_prod, prod);
> }
>
> if (sync) {
>
> ... same stuff ..
>
> }
>
> LMK, what you think?
You'd need to make sure to cover the ARM_SMMU_OPT_MSIPOLL case: The
function __arm_smmu_cmdq_poll_until_msi() doesn't read
cmdq->q.llq.cons but rather waits for the MSI write into the command
queue.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-09 22:56 ` Daniel Mentz
@ 2026-03-10 16:46 ` Pranjal Shrivastava
0 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-10 16:46 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Mar 09, 2026 at 03:56:03PM -0700, Daniel Mentz wrote:
> On Mon, Mar 9, 2026 at 2:13 PM Pranjal Shrivastava <praan@google.com> wrote:
> > Still thinking out loud, but maybe we can have something like:
> >
> > if (owner) {
> > ....
> >
> > if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > writel_relaxed(llq.prod, cmdq->q.prod_reg);
> > has_ref = true;
> > } else {
> >
> > /* SMMU is suspended, fake a consumption */
> > WRITE_ONCE(cmdq->q.llq.cons, prod);
> >
> > /* Avoid racing with resume */
> > sync = false;
> > }
> >
> > atomic_set_release(&cmdq->owner_prod, prod);
> > }
> >
> > if (sync) {
> >
> > ... same stuff ..
> >
> > }
> >
> > LMK, what you think?
>
> You'd need to make sure to cover the ARM_SMMU_OPT_MSIPOLL case: The
> function __arm_smmu_cmdq_poll_until_msi() doesn't read
> cmdq->q.llq.cons but rather waits for the MSI write into the command
> queue.
Why? The goal is that we don't poll at all: if users == 0, we simply set
sync = false, fake a consumption, and move on.
Thanks,
Praan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-09 19:46 ` Pranjal Shrivastava
2026-03-09 21:13 ` Pranjal Shrivastava
@ 2026-03-09 22:41 ` Daniel Mentz
2026-03-10 16:54 ` Pranjal Shrivastava
1 sibling, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-09 22:41 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Mar 9, 2026 at 12:46 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > > while (!queue_has_space(&llq, n + sync)) {
> > > local_irq_restore(flags);
> > > +
> > > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > + /* Device is suspended, don't wait for space */
> > > + return 0;
> > > +
> >
> > Does this mean that we still accumulate commands until the queue is
> > full while in the suspended state and then have SMMU execute all of
> > these (now irrelevant) commands when the command queue is re-enabled?
> > One might argue that this is wasteful, but I guess it might be
> > technically correct, because it doesn't hurt either.
> >
>
> No, we drop any commands that are issued if the command queue doesn't
> have space while it is suspended. There are two reasons for this:
>
> 1. The arm_smmu_cmdq_poll_until_not_full below would try to perform an
> MMIO access (readl_relaxed(cmdq->q.cons_reg)) which shouldn't be allowed
> when the SMMU is suspended.
>
> 2. We usually have 3 types of commands in the SMMUv3:
>
> a) Invalidation commands: We clear the entire TLB during resume, so we
> don't really need to batch these.
>
> b) Prefetch commands: Any prefetches are effectively no-ops as when the
> SMMU is suspended, the resume would effectively read the latest config
>
> c) Page responses: These are NOT expected to occur while the SMMU is
> suspended and we shouldn't be batching them anyway. (The page resp
> handlers handle such situations and avoid queueing-in commands anyway
> with this series)
>
> Thus, any dropped invalidation / prefetch commands don't harm us if the
> SMMU was suspended.
Understood. My point is that we start dropping commands only when the
command queue fills up, even though we could potentially drop commands
even earlier.
> > > * Control dependency ordering from the entries becoming valid.
> > > */
> > > - writel_relaxed(prod, cmdq->q.prod_reg);
> > > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > + writel_relaxed(prod, cmdq->q.prod_reg);
> > > +
> > > + if (sync) {
> > > + has_ref = true;
> > > + } else {
> > > + /*
> > > + * Use release semantics to enforce ordering without a full barrier.
> > > + * This ensures the prior writel_relaxed() is ordered/visible
> > > + * before the refcount decrement, avoiding the heavy pipeline
> > > + * stall of a full wmb().
> > > + *
> > > + * We need the atomic_dec_return_release() below and the
> > > + * atomic_set_release() in step (e) below doesn't suffice.
> > > + *
> > > + * Specifically, without release semantics on the decrement,
> > > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > + * before the writel_relaxed().
> > > + *
> > > + * If this happens, the refcount could drop to zero, allowing the PM
> > > + * suspend path (running on another CPU) to disable the SMMU before
> > > + * the register write completes, resulting in a bus fault.
> > > + */
> > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > + }
> > > + }
> > >
> > > /*
> > > * e. Tell the next owner we're done
> > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > > if (sync) {
> > > - llq.prod = queue_inc_prod_n(&llq, n);
> > > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > - if (ret) {
> > > - dev_err_ratelimited(smmu->dev,
> > > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > - llq.prod,
> > > - readl_relaxed(cmdq->q.prod_reg),
> > > - readl_relaxed(cmdq->q.cons_reg));
> > > +
> > > + /* If we are not the owner, check if we're suspended */
> > > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> >
> > Can there be a situation, where we execute this branch and the
> > hardware prod pointer has not been updated, in which case
> > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > it times out). Imagine the following series of events:
> > * nr_cmdq_users==0
> > * hardware prod pointer update is skipped
> > * nr_cmdq_users becomes 1
> > * arm_smmu_cmdq_poll_until_sync waits but commands are not executed
> > by SMMU because hardware prod pointer update wasn't updated
> > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > stuck because the queue is full.
> > * cmdq->q.llq.cons is never updated because the shared lock is held.
> > queue_has_space() returns false, and
> > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> >
>
> Thanks for catching this! IIUC, you mean a race between the resume and
> this code block can cause a deadlock? Something like the following:
>
> CPU 0 CPU 1
> nr_cmdq_users == 0
> skip hw_prod++
> ->resume cb
> nr_cmdq_user == 1
> nr_cmdq_users == 1
> in the if (sync) part
> arm_smmu_cmdq_poll_until_sync()
> CMDQ_OP_CFGI_ALL stuck for space
>
> I guess we could add a check here if (has_ref == false) but
> nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> need to factor out a helper function here.. I'll think more about this..
>
> I guess the simplest thing would be to bail out much before, something
> like if (nr_cmdq_users == 0 && sync) return;
On a related note: Can you guarantee that with your current approach,
nr_cmdq_users drops to 1 even with a high rate of calls to
arm_smmu_cmdq_issue_cmdlist() from multiple CPUs or will your "/* Try
to suspend the device, wait for in-flight submissions */" eventually
time out?
Have you considered an alternative approach where you put
if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
return 0;
at the beginning of arm_smmu_cmdq_issue_cmdlist() and an unconditional
atomic_dec_return_release(&smmu->nr_cmdq_users); at the end? I think
that would avoid this potential deadlock situation.
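Daniel's alternative amounts to moving the gate to the boundaries of the whole issue path: take the reference once at entry, drop it unconditionally at exit. A minimal sketch of that shape (illustrative names; the increment of hw_writes stands in for the entire real submission path):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Biased counter: 1 == active/idle, 0 == suspended. */
static atomic_int users = 1;

/* Userspace stand-in for atomic_inc_not_zero(). */
static bool users_get(void)
{
	int v = atomic_load(&users);

	do {
		if (v == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&users, &v, v + 1));
	return true;
}

/*
 * Whole-function gating: a suspended device is never entered at all,
 * and the single put at the end cannot leave the counter unbalanced.
 */
static int issue_cmdlist_gated(int *hw_writes)
{
	if (!users_get())
		return 0;	/* suspended: drop the batch, report success */

	(*hw_writes)++;		/* stands in for the real submission path */

	atomic_fetch_sub_explicit(&users, 1, memory_order_release);
	return 0;
}
```

The trade-off is that the reference is held for the full duration of the call (including queue-space polling), which is what keeps the suspend path deferred.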
>
> > I'm still trying to get my head around this algorithm, but according
> > to my current understanding, the following rule applies:
> >
> > Only the cmdq owner may update the hardware prod pointer, and only
> > after arm_smmu_cmdq_poll_valid_map() returned for the relevant range.
> > Does the following instruction in arm_smmu_device_reset() violate this
> > rule?
> >
> > writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> >
>
> Not really as we don't enable the command queue by that point.
> (but it's still racy as you've mentioned in the queue_full scenario
> above)
You're enabling the command queue right after writing to ARM_SMMU_CMDQ_PROD
/* Command queue */
writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);
enables = CR0_CMDQEN;
ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
ARM_SMMU_CR0ACK);
My point is that writing smmu->cmdq.q.llq.prod to ARM_SMMU_CMDQ_PROD
is premature. This might lead to the SMMU reading from command slots
that haven't been fully populated.
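The reset-ordering suggestion a few lines up can be modelled the same way (hypothetical names, plain struct fields standing in for the MMIO registers): program PROD from llq.cons rather than the possibly-stale shadow llq.prod, so nothing unpopulated is published before the queue is enabled.

```c
#include <stdint.h>

/* Fake register file: fields stand in for the SMMU MMIO registers. */
struct fake_regs {
	uint32_t cmdq_prod;
	uint32_t cmdq_cons;
	uint32_t cr0;		/* bit 0 stands in for CR0_CMDQEN */
};

struct fake_llq_state {
	uint32_t prod;		/* shadow producer, may be ahead of hw */
	uint32_t cons;
};

/*
 * Reset sketch: publish nothing the hardware hasn't already consumed,
 * then enable the queue. The next cmdq owner republishes any pending
 * entries once they are fully populated and the valid map is settled.
 */
static void fake_reset_cmdq(struct fake_regs *r,
			    const struct fake_llq_state *llq)
{
	r->cmdq_prod = llq->cons;	/* not llq->prod */
	r->cmdq_cons = llq->cons;
	r->cr0 |= 1;			/* CMDQEN only after PROD/CONS are sane */
}
```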
Daniel
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-09 22:41 ` Daniel Mentz
@ 2026-03-10 16:54 ` Pranjal Shrivastava
2026-03-11 4:43 ` Daniel Mentz
0 siblings, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-10 16:54 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Mar 09, 2026 at 03:41:16PM -0700, Daniel Mentz wrote:
> On Mon, Mar 9, 2026 at 12:46 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >
> > > > while (!queue_has_space(&llq, n + sync)) {
> > > > local_irq_restore(flags);
> > > > +
> > > > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > > + /* Device is suspended, don't wait for space */
> > > > + return 0;
> > > > +
> > >
> > > Does this mean that we still accumulate commands until the queue is
> > > full while in the suspended state and then have SMMU execute all of
> > > these (now irrelevant) commands when the command queue is re-enabled?
> > > One might argue that this is wasteful, but I guess it might be
> > > technically correct, because it doesn't hurt either.
> > >
> >
> > No, we drop any commands that are issued if the command queue doesn't
> > have space while it is suspended. There are two reasons for this:
> >
> > 1. The arm_smmu_cmdq_poll_until_not_full below would try to perform an
> > MMIO access (readl_relaxed(cmdq->q.cons_reg)) which shouldn't be allowed
> > when the SMMU is suspended.
> >
> > 2. We usually have 3 types of commands in the SMMUv3:
> >
> > a) Invalidation commands: We clear the entire TLB during resume, so we
> > don't really need to batch these.
> >
> > b) Prefetch commands: Any prefetches are effectively no-ops as when the
> > SMMU is suspended, the resume would effectively read the latest config
> >
> > c) Page responses: These are NOT expected to occur while the SMMU is
> > suspended and we shouldn't be batching them anyway. (The page resp
> > handlers handle such situations and avoid queueing-in commands anyway
> > with this series)
> >
> > Thus, any dropped invalidation / prefetch commands don't harm us if the
> > SMMU was suspended.
>
> Understood. My point is that we start dropping commands only when the
> command queue fills up, even though we could potentially drop commands
> even earlier.
>
Hmm.. but how early? We'd still need the part where the current owner
lets the next owner know that it's done..
> > > > * Control dependency ordering from the entries becoming valid.
> > > > */
> > > > - writel_relaxed(prod, cmdq->q.prod_reg);
> > > > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > > + writel_relaxed(prod, cmdq->q.prod_reg);
> > > > +
> > > > + if (sync) {
> > > > + has_ref = true;
> > > > + } else {
> > > > + /*
> > > > + * Use release semantics to enforce ordering without a full barrier.
> > > > + * This ensures the prior writel_relaxed() is ordered/visible
> > > > + * before the refcount decrement, avoiding the heavy pipeline
> > > > + * stall of a full wmb().
> > > > + *
> > > > + * We need the atomic_dec_return_release() below and the
> > > > + * atomic_set_release() in step (e) below doesn't suffice.
> > > > + *
> > > > + * Specifically, without release semantics on the decrement,
> > > > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > > + * before the writel_relaxed().
> > > > + *
> > > > + * If this happens, the refcount could drop to zero, allowing the PM
> > > > + * suspend path (running on another CPU) to disable the SMMU before
> > > > + * the register write completes, resulting in a bus fault.
> > > > + */
> > > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > > + }
> > > > + }
> > > >
> > > > /*
> > > > * e. Tell the next owner we're done
> > > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >
> > > > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > > > if (sync) {
> > > > - llq.prod = queue_inc_prod_n(&llq, n);
> > > > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > > - if (ret) {
> > > > - dev_err_ratelimited(smmu->dev,
> > > > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > > - llq.prod,
> > > > - readl_relaxed(cmdq->q.prod_reg),
> > > > - readl_relaxed(cmdq->q.cons_reg));
> > > > +
> > > > + /* If we are not the owner, check if we're suspended */
> > > > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > >
> > > Can there be a situation, where we execute this branch and the
> > > hardware prod pointer has not been updated, in which case
> > > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > > it times out). Imagine the following series of events:
> > > * nr_cmdq_users==0
> > > * hardware prod pointer update is skipped
> > > * nr_cmdq_users becomes 1
> > > * arm_smmu_cmdq_poll_until_sync waits but commands are not executed
> > > by SMMU because hardware prod pointer update wasn't updated
> > > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > > stuck because the queue is full.
> > > * cmdq->q.llq.cons is never updated because the shared lock is held.
> > > queue_has_space() returns false, and
> > > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> > >
> >
> > Thanks for catching this! IIUC, you mean a race between the resume and
> > this code block can cause a deadlock? Something like the following:
> >
> > CPU 0 CPU 1
> > nr_cmdq_users == 0
> > skip hw_prod++
> > ->resume cb
> > nr_cmdq_user == 1
> > nr_cmdq_users == 1
> > in the if (sync) part
> > arm_smmu_cmdq_poll_until_sync()
> > CMDQ_OP_CFGI_ALL stuck for space
> >
> > I guess we could add a check here if (has_ref == false) but
> > nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> > need to factor out a helper function here.. I'll think more about this..
> >
> > I guess the simplest thing would be to bail out much before, something
> > like if (nr_cmdq_users == 0 && sync) return;
>
> On a related note: Can you guarantee that with your current approach,
> nr_cmdq_users drops to 1 even with a high rate of calls to
> arm_smmu_cmdq_issue_cmdlist() from multiple CPUs or will your "/* Try
> to suspend the device, wait for in-flight submissions */" eventually
> time out?
>
> Have you considered an alternative approach where you put
>
> if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> return 0;
>
> at the beginning of arm_smmu_cmdq_issue_cmdlist() and an unconditional
> atomic_dec_return_release(&smmu->nr_cmdq_users); at the end? I think
> that would avoid this potential deadlock situation.
>
Right that's what I meant with "bail out much before", I'll try to think
if that would work always and if there's a situation where it may fail?
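For illustration, the "bail out much before" gate could look like this
minimal userspace model (stdatomic stand-ins for the kernel atomics;
issue_cmdlist(), try_suspend() and the variables are made-up names, not
the driver code): every submitter takes a reference on entry and drops
it on exit, and suspend only wins when it observes the idle value 1.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for smmu->nr_cmdq_users and the cmdq prod index. */
static atomic_int nr_cmdq_users = 1;   /* 1 == idle/usable, 0 == suspended */
static atomic_int hw_prod;

/* atomic_inc_not_zero() equivalent: increment unless the counter is 0. */
static bool get_cmdq_ref(void)
{
    int v = atomic_load(&nr_cmdq_users);

    while (v != 0)
        if (atomic_compare_exchange_weak(&nr_cmdq_users, &v, v + 1))
            return true;
    return false;                       /* suspended: caller must bail out */
}

/* Gate at the very top of the submission path, put at the end. */
static int issue_cmdlist(int n)
{
    if (!get_cmdq_ref())
        return 0;                       /* SMMU suspended, drop the commands */
    atomic_fetch_add(&hw_prod, n);      /* stands in for the real submission */
    atomic_fetch_sub_explicit(&nr_cmdq_users, 1, memory_order_release);
    return n;
}

/* Suspend succeeds only when the counter is at its idle value 1. */
static bool try_suspend(void)
{
    int expected = 1;

    return atomic_compare_exchange_strong(&nr_cmdq_users, &expected, 0);
}
```

The real suspend path in the patch retries this for a bounded time
(ARM_SMMU_SUSPEND_TIMEOUT_US) before giving up with -EAGAIN, which is
where the starvation concern below comes in.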
> >
> > > I'm still trying to get my head around this algorithm, but according
> > > to my current understanding, the following rule applies:
> > >
> > > Only the cmdq owner may update the hardware prod pointer, and only
> > > after arm_smmu_cmdq_poll_valid_map() returned for the relevant range.
> > > Does the following instruction in arm_smmu_device_reset() violate this
> > > rule?
> > >
> > > writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> > >
> >
> > Not really as we don't enable the command queue by that point.
> > (but it's still racy as you've mentioned in the queue_full scenario
> > above)
>
> You're enabling the command queue right after writing to ARM_SMMU_CMDQ_PROD
>
> /* Command queue */
> writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
> writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);
>
> enables = CR0_CMDQEN;
> ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> ARM_SMMU_CR0ACK);
>
> My point is that writing smmu->cmdq.q.llq.prod to ARM_SMMU_CMDQ_PROD
> is premature. This might lead to the SMMU reading from command slots
> that haven't been fully populated.
>
Well.. if the smmu->cmdq.q.llq.cons was also updated with the "fake"
consumption logic, then it wouldn't, right?
I agree that the current implementation might have missed a case but
otherwise, this should be fine if both llq.prod & llq.cons are updated
in a balanced manner during the suspend.
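In other words, a balanced update on suspend would be something like
the following trivial model (not the driver code; the helper names are
made up): software pretends the hardware consumed every outstanding
entry, so the queue is restored empty at resume time.

```c
#include <stdint.h>

/* Hypothetical software copy of a queue's linked-list indices. */
struct llq {
    uint32_t prod;
    uint32_t cons;
};

/*
 * "Fake" consumption on suspend: advance cons to match prod so the two
 * indices stay balanced and the queue comes back empty after resume.
 */
static void fake_consume_all(struct llq *q)
{
    q->cons = q->prod;
}

static int queue_is_empty(const struct llq *q)
{
    return q->prod == q->cons;
}
```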
> Daniel
Thanks,
Praan
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-10 16:54 ` Pranjal Shrivastava
@ 2026-03-11 4:43 ` Daniel Mentz
2026-03-11 13:53 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-11 4:43 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
Hi Pranjal,
I have thought about this further and would like to propose an
alternative design.
A key difference in this design is that no commands are written to the
command queue, and the producer index is not updated while the SMMU is
suspended.
Similar to the existing CMDQ_PROD_OWNED_FLAG, we could define a new
flag, CMDQ_PROD_STOP_FLAG, stored in cmdq->q.llq.prod.
In arm_smmu_cmdq_issue_cmdlist():
Perform "old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);".
If CMDQ_PROD_STOP_FLAG is set in the variable named old, return from
arm_smmu_cmdq_issue_cmdlist() immediately.
In the Suspend handler:
1. Clear SMMU_CR0.SMMUEN.
2. Set CMDQ_PROD_STOP_FLAG in cmdq->q.llq.prod.
3. Wait until CMDQ_PROD_OWNED_FLAG is cleared from cmdq->q.llq.prod.
4. Wait until cmdq->owner_prod matches cmdq->q.llq.prod (excluding the
CMDQ_PROD_STOP_FLAG).
5. Wait until cmdq->lock is 0.
6. Clear SMMU_CR0.
In the Resume handler:
Follow the same logic as your patch, but clear CMDQ_PROD_STOP_FLAG in
cmdq->q.llq.prod instead of setting nr_cmdq_users to 1.
One benefit of this proposal is that it avoids adding a new atomic
variable that multiple CPU threads might compete for. Instead, we
reuse existing atomic operations, which should reduce the performance
impact for server use cases where many CPU threads compete for command
queue access.
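As a rough userspace model of the bookkeeping (the flag values and
helper names are made up for illustration, not the proposed kernel
code), the STOP flag lives in the shared prod word alongside the
existing OWNED flag:

```c
#include <stdatomic.h>

#define PROD_STOP_FLAG   0x80000000u  /* hypothetical CMDQ_PROD_STOP_FLAG */
#define PROD_OWNED_FLAG  0x40000000u  /* mirrors CMDQ_PROD_OWNED_FLAG */
#define PROD_IDX_MASK    0x3fffffffu

static atomic_uint llq_prod;          /* stands in for cmdq->q.llq.prod */

/* Step 2 of the suspend sequence: publish STOP so no new batch commits. */
static void cmdq_set_stop(void)
{
    atomic_fetch_or(&llq_prod, PROD_STOP_FLAG);
}

/* Step 3: suspend must wait until no batch owner is active. */
static int cmdq_owner_active(void)
{
    return atomic_load(&llq_prod) & PROD_OWNED_FLAG;
}

/* Resume: clear STOP while leaving the producer index untouched. */
static void cmdq_clear_stop(void)
{
    atomic_fetch_and(&llq_prod, ~PROD_STOP_FLAG);
}
```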
Daniel
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-11 4:43 ` Daniel Mentz
@ 2026-03-11 13:53 ` Pranjal Shrivastava
2026-03-11 22:56 ` Daniel Mentz
0 siblings, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-11 13:53 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Tue, Mar 10, 2026 at 09:43:05PM -0700, Daniel Mentz wrote:
> Hi Pranjal,
>
> I have thought about this further and would like to propose an
> alternative design.
>
> A key difference in this design is that no commands are written to the
> command queue, and the producer index is not updated while the SMMU is
> suspended.
>
> Similar to the existing CMDQ_PROD_OWNED_FLAG, we could define a new
> flag, CMDQ_PROD_STOP_FLAG, stored in cmdq->q.llq.prod.
>
> In arm_smmu_cmdq_issue_cmdlist():
> Perform "old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);".
> If CMDQ_PROD_STOP_FLAG is set in the variable named old, return from
> arm_smmu_cmdq_issue_cmdlist() immediately.
>
> In the Suspend handler:
>
> 1. Clear SMMU_CR0.SMMUEN.
> 2. Set CMDQ_PROD_STOP_FLAG in cmdq->q.llq.prod.
> 3. Wait until CMDQ_PROD_OWNED_FLAG is cleared from cmdq->q.llq.prod.
> 4. Wait until cmdq->owner_prod matches cmdq->q.llq.prod (excluding the
> CMDQ_PROD_STOP_FLAG).
> 5. Wait until cmdq->lock is 0.
> 6. Clear SMMU_CR0.
>
> In the Resume handler:
> Follow the same logic as your patch, but clear CMDQ_PROD_STOP_FLAG in
> cmdq->q.llq.prod instead of setting nr_cmdq_users to 1.
>
> One benefit of this proposal is that it avoids adding a new atomic
> variable that multiple CPU threads might compete for. Instead, we
> reuse existing atomic operations, which should reduce the performance
> impact for server use cases where many CPU threads compete for command
> queue access.
>
I like the idea/direction! Although, there could be a few (solvable)
challenges, for example:
1. Abandoned Command Batch
In the SMMUv3 driver, cmdq->q.llq.val is the shared global state in
RAM. The cmpxchg is the "Point of Commitment." Consider the following
sequence of events:
a. Assume STOP_FLAG is already set. Global prod is 10 | STOP_FLAG.
b. CPU-A calls issue_cmdlist. It reads the global state:
llq.val = 10 | STOP_FLAG.
c. CPU-A calculates its new target:
head.prod = 14 | STOP_FLAG | OWNED_FLAG.
d. CPU-A executes: old = cmpxchg(&cmdq->q.llq.val, llq.val, head.val)
The swap succeeds because the global state was exactly llq.val.
Global prod is now 14. We have just told the entire system that
there are 14 commands.
e. CPU-A now checks the old variable. It sees STOP_FLAG is set and
returns immediately.
CPU-A has "reserved" the indices 10, 11, 12, and 13 in the global
producer pointer, but it never wrote the commands to those memory slots.
Indices 10–13 now contain garbage/stale data. When the SMMU resumes and
processes up to index 14, it will execute that garbage cmd leading to a
fault (CERROR_ILL).
2. The OWNED_FLAG Deadlock
a. The CPU who finds OWNED_FLAG is 0 in the cmpxchg becomes the owner
Its job is to gather all concurrent work and update the hardware.
b. Any subsequent CPUs that find OWNED_FLAG is 1 in the cmpxchg. They
write their commands to RAM and then just wait. They rely on the
Owner to "kick" the HW for them & to clear the flag for the next
batch.
The Deadlock Scenario:
a. Suspend Handler: Sets the STOP_FLAG in the global cmdq->q.llq.val.
b. CPU-A (The Owner): Executes cmpxchg. It moves the global prod from
10 to 14. Because it was the first in this batch, it does not set
the OWNED_FLAG.
c. CPU-B (The Follower): Executes cmpxchg. It moves the global prod
from 14 to 18. Because it is joining CPU-A's batch, its cmpxchg
sets the OWNED_FLAG in the global state.
d. Now, CPU-A checks the old value, sees STOP_FLAG, and returns.
CPU-B checks the old value, sees STOP_FLAG, and returns.
e. The Consequence: The global state cmdq->q.llq.prod now has the
OWNED_FLAG set.
f. Suspend Handler: Calls Wait until CMDQ_PROD_OWNED_FLAG is cleared
g. Since the Owner (CPU-A) should clear that flag (it does so at the
very end of the function). The Suspend Handler will spin on this
flag forever.
This can be solved by checking the flag before setting the OWNED_FLAG.
While the atomic counter (nr_cmdq_users) used in v5 adds an atomic, I
believe that it provides a cleaner "gate" that prevents CPUs from ever
entering the "Commitment" phase once a suspend is imminent (with fixes).
However, I agree that the atomic counter can cause perf drops on servers.
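To make scenario 1 above concrete, here's a minimal single-threaded
userspace model (all names hypothetical, stdatomic stand-ins for the
kernel atomics) of the check-after-cmpxchg ordering: the STOP check
only inspects the value returned by the cmpxchg, i.e. after the swap
has already published a producer index that nobody will back out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define STOP_FLAG 0x80000000u
#define IDX_MASK  0x7fffffffu

static atomic_uint llq_val;   /* shared prod word, stands in for llq.val */

/*
 * Model of the hazard: the cmpxchg commits a new producer index first,
 * and only afterwards does the caller notice STOP was set and bail out,
 * leaving "reserved" slots that were never written. Note the new value
 * also drops STOP, mirroring how queue_inc_prod_n() would clear it.
 */
static bool issue_check_after(unsigned int n)
{
    unsigned int old = atomic_load(&llq_val);

    while (!atomic_compare_exchange_weak(&llq_val, &old,
                                         (old & IDX_MASK) + n))
        ;                 /* on failure, old is reloaded; retry */
    if (old & STOP_FLAG)
        return false;     /* abandon the batch, but prod already moved */
    /* ...command slots would be populated here... */
    return true;
}
```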
I'm okay to adopt Daniel's direction (it only impacts 2 patches) & post
a v6. However, given we've been working through this for a year now, I'd
want to make sure we have a solid consensus on the design from everyone.
I felt with v4 that we have consensus on the design but it still seems
to be evolving. I’m ready to post v6 once we settle on the path forward.
Thanks,
Praan
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-11 13:53 ` Pranjal Shrivastava
@ 2026-03-11 22:56 ` Daniel Mentz
2026-03-16 15:36 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-11 22:56 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Wed, Mar 11, 2026 at 6:53 AM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Tue, Mar 10, 2026 at 09:43:05PM -0700, Daniel Mentz wrote:
> > Hi Pranjal,
> >
> > I have thought about this further and would like to propose an
> > alternative design.
> >
> > A key difference in this design is that no commands are written to the
> > command queue, and the producer index is not updated while the SMMU is
> > suspended.
> >
> > Similar to the existing CMDQ_PROD_OWNED_FLAG, we could define a new
> > flag, CMDQ_PROD_STOP_FLAG, stored in cmdq->q.llq.prod.
> >
> > In arm_smmu_cmdq_issue_cmdlist():
> > Perform "old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);".
> > If CMDQ_PROD_STOP_FLAG is set in the variable named old, return from
> > arm_smmu_cmdq_issue_cmdlist() immediately.
> >
> > In the Suspend handler:
> >
> > 1. Clear SMMU_CR0.SMMUEN.
> > 2. Set CMDQ_PROD_STOP_FLAG in cmdq->q.llq.prod.
> > 3. Wait until CMDQ_PROD_OWNED_FLAG is cleared from cmdq->q.llq.prod.
> > 4. Wait until cmdq->owner_prod matches cmdq->q.llq.prod (excluding the
> > CMDQ_PROD_STOP_FLAG).
> > 5. Wait until cmdq->lock is 0.
> > 6. Clear SMMU_CR0.
> >
> > In the Resume handler:
> > Follow the same logic as your patch, but clear CMDQ_PROD_STOP_FLAG in
> > cmdq->q.llq.prod instead of setting nr_cmdq_users to 1.
> >
> > One benefit of this proposal is that it avoids adding a new atomic
> > variable that multiple CPU threads might compete for. Instead, we
> > reuse existing atomic operations, which should reduce the performance
> > impact for server use cases where many CPU threads compete for command
> > queue access.
> >
>
> I like the idea/direction! Although, there could be a few (solvable)
> challenges, for example:
>
> 1. Abandoned Command Batch
> In the SMMUv3 driver, cmdq->q.llq.val is the shared global state in
> RAM. The cmpxchg is the "Point of Commitment." Consider the following
> sequence of events:
>
> a. Assume STOP_FLAG is already set. Global prod is 10 | STOP_FLAG.
> b. CPU-A calls issue_cmdlist. It reads the global state:
> llq.val = 10 | STOP_FLAG.
> c. CPU-A calculates its new target:
> head.prod = 14 | STOP_FLAG | OWNED_FLAG.
head.prod is calculated like so, and queue_inc_prod_n() will clear the
CMDQ_PROD_STOP_FLAG.
head.prod = queue_inc_prod_n(&llq, n + sync) |
CMDQ_PROD_OWNED_FLAG;
You're bringing up a good point, though. This would clear the
CMDQ_PROD_STOP_FLAG which we don't want.
> d. CPU-A executes: old = cmpxchg(&cmdq->q.llq.val, llq.val, head.val)
> The swap succeeds because the global state was exactly llq.val.
> Global prod is now 14. We have just told the entire system that
> there are 14 commands.
> e. CPU-A now checks the old variable. It sees STOP_FLAG is set and
> returns immediately.
>
> CPU-A has "reserved" the indices 10, 11, 12, and 13 in the global
> producer pointer, but it never wrote the commands to those memory slots.
> Indices 10–13 now contain garbage/stale data. When the SMMU resumes and
> processes up to index 14, it will execute that garbage cmd leading to a
> fault (CERROR_ILL).
>
> 2. The OWNED_FLAG Deadlock
>
> a. The CPU who finds OWNED_FLAG is 0 in the cmpxchg becomes the owner
> Its job is to gather all concurrent work and update the hardware.
> b. Any subsequent CPUs that find OWNED_FLAG is 1 in the cmpxchg. They
> write their commands to RAM and then just wait. They rely on the
> Owner to "kick" the HW for them & to clear the flag for the next
> batch.
>
> The Deadlock Scenario:
> a. Suspend Handler: Sets the STOP_FLAG in the global cmdq->q.llq.val.
> b. CPU-A (The Owner): Executes cmpxchg. It moves the global prod from
> 10 to 14. Because it was the first in this batch, it does not set
> the OWNED_FLAG.
Everybody sets the OWNED_FLAG in
head.prod = queue_inc_prod_n(&llq, n + sync) |
CMDQ_PROD_OWNED_FLAG;
The OWNED_FLAG is always set when any CPU breaks out of that first
do/while loop in arm_smmu_cmdq_issue_cmdlist.
> c. CPU-B (The Follower): Executes cmpxchg. It moves the global prod
> from 14 to 18. Because it is joining CPU-A's batch, its cmpxchg
> sets the OWNED_FLAG in the global state.
> d. Now, CPU-A checks the old value, sees STOP_FLAG, and returns.
> CPU-B checks the old value, sees STOP_FLAG, and returns.
> e. The Consequence: The global state cmdq->q.llq.prod now has the
> OWNED_FLAG set.
> f. Suspend Handler: Calls Wait until CMDQ_PROD_OWNED_FLAG is cleared
> g. Since the Owner (CPU-A) should clear that flag (it does so at the
> very end of the function). The Suspend Handler will spin on this
> flag forever.
>
> This can be solved by checking the flag before setting the OWNED_FLAG.
Good point. We should check llq.prod for CMDQ_PROD_STOP_FLAG at the
beginning of the do/while loop in arm_smmu_cmdq_issue_cmdlist and
return early if it's set. The function arm_smmu_cmdq_issue_cmdlist
should never clear CMDQ_PROD_STOP_FLAG.
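A sketch of that corrected ordering (again a hypothetical userspace
model, not the driver code): STOP is tested on the value we are about
to swap against, so once it is observed nothing is ever committed and
the flag is never cleared by the submission path.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define STOP_FLAG 0x80000000u
#define IDX_MASK  0x7fffffffu

static atomic_uint llq_val;   /* shared prod word, stands in for llq.val */

/*
 * Corrected ordering: check STOP at the top of the cmpxchg loop and
 * return early, before publishing a new producer index. The STOP bit
 * is preserved by never committing while it is set.
 */
static bool issue_check_before(unsigned int n)
{
    unsigned int old = atomic_load(&llq_val);

    do {
        if (old & STOP_FLAG)
            return false;    /* bail out before publishing anything */
    } while (!atomic_compare_exchange_weak(&llq_val, &old,
                                           (old + n) & IDX_MASK));
    /* ...command slots written and hardware kicked here... */
    return true;
}
```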
> While the atomic counter (nr_cmdq_users) used in v5 adds an tomic, I
> believe that it provides a cleaner "gate" that prevents CPUs from ever
> entering the "Commitment" phase once a suspend is imminent (with fixes).
>
> However, I agree that the atomic cntr can cause pref drops on servers.
I think another benefit of my approach is that it avoids the problem
where a high rate of calls to arm_smmu_cmdq_issue_cmdlist can starve
the suspend handler. I'm afraid that with the nr_cmdq_users approach,
we might end up in situations where the suspend handler times out
because nr_cmdq_users never drops to 1.
> I'm okay to adopt Daniel's direction (it only impacts 2 patches) & post
> a v6. However, given we've been working through this for a year now, I'd
> want to make sure we have a solid consensus on the design from everyone.
> I felt with v4 that we have consensus on the design but it still seems
> to be evolving. I’m ready to post v6 once we settle on the path forward.
I think the fastest way forward is to post an updated patch right
away. We can then use that as a baseline for discussions.
* Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
2026-03-11 22:56 ` Daniel Mentz
@ 2026-03-16 15:36 ` Pranjal Shrivastava
0 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-16 15:36 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Wed, Mar 11, 2026 at 03:56:55PM -0700, Daniel Mentz wrote:
> On Wed, Mar 11, 2026 at 6:53 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 09:43:05PM -0700, Daniel Mentz wrote:
> > > Hi Pranjal,
> > >
> > > I have thought about this further and would like to propose an
> > > alternative design.
> > >
> > > A key difference in this design is that no commands are written to the
> > > command queue, and the producer index is not updated while the SMMU is
> > > suspended.
> > >
> > > Similar to the existing CMDQ_PROD_OWNED_FLAG, we could define a new
> > > flag, CMDQ_PROD_STOP_FLAG, stored in cmdq->q.llq.prod.
> > >
> > > In arm_smmu_cmdq_issue_cmdlist():
> > > Perform "old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);".
> > > If CMDQ_PROD_STOP_FLAG is set in the variable named old, return from
> > > arm_smmu_cmdq_issue_cmdlist() immediately.
> > >
> > > In the Suspend handler:
> > >
> > > 1. Clear SMMU_CR0.SMMUEN.
> > > 2. Set CMDQ_PROD_STOP_FLAG in cmdq->q.llq.prod.
> > > 3. Wait until CMDQ_PROD_OWNED_FLAG is cleared from cmdq->q.llq.prod.
> > > 4. Wait until cmdq->owner_prod matches cmdq->q.llq.prod (excluding the
> > > CMDQ_PROD_STOP_FLAG).
> > > 5. Wait until cmdq->lock is 0.
> > > 6. Clear SMMU_CR0.
> > >
> > > In the Resume handler:
> > > Follow the same logic as your patch, but clear CMDQ_PROD_STOP_FLAG in
> > > cmdq->q.llq.prod instead of setting nr_cmdq_users to 1.
> > >
> > > One benefit of this proposal is that it avoids adding a new atomic
> > > variable that multiple CPU threads might compete for. Instead, we
> > > reuse existing atomic operations, which should reduce the performance
> > > impact for server use cases where many CPU threads compete for command
> > > queue access.
> > >
> >
> > I like the idea/direction! Although, there could be a few (solvable)
> > challenges, for example:
> >
> > 1. Abandoned Command Batch
> > In the SMMUv3 driver, cmdq->q.llq.val is the shared global state in
> > RAM. The cmpxchg is the "Point of Commitment." Consider the following
> > sequence of events:
> >
> > a. Assume STOP_FLAG is already set. Global prod is 10 | STOP_FLAG.
> > b. CPU-A calls issue_cmdlist. It reads the global state:
> > llq.val = 10 | STOP_FLAG.
> > c. CPU-A calculates its new target:
> > head.prod = 14 | STOP_FLAG | OWNED_FLAG.
>
> head.prod is calculated like so, and queue_inc_prod_n() will clear the
> CMDQ_PROD_STOP_FLAG.
>
> head.prod = queue_inc_prod_n(&llq, n + sync) |
> CMDQ_PROD_OWNED_FLAG;
>
> You're bringing up a good point, though. This would clear the
> CMDQ_PROD_STOP_FLAG which we don't want.
>
> > d. CPU-A executes: old = cmpxchg(&cmdq->q.llq.val, llq.val, head.val)
> > The swap succeeds because the global state was exactly llq.val.
> > Global prod is now 14. We have just told the entire system that
> > there are 14 commands.
> > e. CPU-A now checks the old variable. It sees STOP_FLAG is set and
> > returns immediately.
> >
> > CPU-A has "reserved" the indices 10, 11, 12, and 13 in the global
> > producer pointer, but it never wrote the commands to those memory slots.
> > Indices 10–13 now contain garbage/stale data. When the SMMU resumes and
> > processes up to index 14, it will execute that garbage cmd leading to a
> > fault (CERROR_ILL).
> >
> > 2. The OWNED_FLAG Deadlock
> >
> > a. The CPU who finds OWNED_FLAG is 0 in the cmpxchg becomes the owner
> > Its job is to gather all concurrent work and update the hardware.
> > b. Any subsequent CPUs that find OWNED_FLAG is 1 in the cmpxchg. They
> > write their commands to RAM and then just wait. They rely on the
> > Owner to "kick" the HW for them & to clear the flag for the next
> > batch.
> >
> > The Deadlock Scenario:
> > a. Suspend Handler: Sets the STOP_FLAG in the global cmdq->q.llq.val.
> > b. CPU-A (The Owner): Executes cmpxchg. It moves the global prod from
> > 10 to 14. Because it was the first in this batch, it does not set
> > the OWNED_FLAG.
>
> Everybody sets the OWNED_FLAG in
>
> head.prod = queue_inc_prod_n(&llq, n + sync) |
> CMDQ_PROD_OWNED_FLAG;
>
> The OWNED_FLAG is always set when any CPU breaks out of that first
> do/while loop in arm_smmu_cmdq_issue_cmdlist.
>
> > c. CPU-B (The Follower): Executes cmpxchg. It moves the global prod
> > from 14 to 18. Because it is joining CPU-A's batch, its cmpxchg
> > sets the OWNED_FLAG in the global state.
> > d. Now, CPU-A checks the old value, sees STOP_FLAG, and returns.
> > CPU-B checks the old value, sees STOP_FLAG, and returns.
> > e. The Consequence: The global state cmdq->q.llq.prod now has the
> > OWNED_FLAG set.
> > f. Suspend Handler: Calls Wait until CMDQ_PROD_OWNED_FLAG is cleared
> > g. Since the Owner (CPU-A) should clear that flag (it does so at the
> > very end of the function). The Suspend Handler will spin on this
> > flag forever.
> >
> > This can be solved by checking the flag before setting the OWNED_FLAG.
>
> Good point. We should check llq.prod for CMDQ_PROD_STOP_FLAG at the
> beginning of the do/while loop in arm_smmu_cmdq_issue_cmdlist and
> return early if it's set. The function arm_smmu_cmdq_issue_cmdlist
> should never clear CMDQ_PROD_STOP_FLAG.
>
> > While the atomic counter (nr_cmdq_users) used in v5 adds an atomic, I
> > believe that it provides a cleaner "gate" that prevents CPUs from ever
> > entering the "Commitment" phase once a suspend is imminent (with fixes).
> >
> > However, I agree that the atomic counter can cause perf drops on servers.
>
> I think another benefit of my approach is that it avoids the problem
> where a high rate of calls to arm_smmu_cmdq_issue_cmdlist can starve
> the suspend handler. I'm afraid that with the nr_cmdq_users approach,
> we might end up in situations where the suspend handler times out
> because nr_cmdq_users never drops to 1.
>
I agree.
> > I'm okay to adopt Daniel's direction (it only impacts 2 patches) & post
> > a v6. However, given we've been working through this for a year now, I'd
> > want to make sure we have a solid consensus on the design from everyone.
> > I felt with v4 that we have consensus on the design but it still seems
> > to be evolving. I’m ready to post v6 once we settle on the path forward.
>
> I think the fastest way forward is to post an updated patch right
> away. We can then use that as a baseline for discussions.
Agreed. I'll think through this CMDQ_STOP_FLAG approach (addressing the
issues I've discussed above) and post a new version soon.
Thanks,
Praan
* [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (5 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-03-10 1:45 ` Daniel Mentz
2026-03-12 19:20 ` Daniel Mentz
2026-01-26 15:11 ` [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
` (3 subsequent siblings)
10 siblings, 2 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
Implement pm_runtime and system sleep ops for arm-smmu-v3.
The suspend callback configures the SMMU to abort new transactions,
disables the main translation unit and then drains the command queue
to ensure completion of any in-flight commands.
The resume callback restores the MSI configuration and performs a full
device reset via `arm_smmu_device_reset` to bring the SMMU back to an
operational state. The MSIs are cached during the msi_write and are
restored during the resume operation by using the helper.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 124 ++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
2 files changed, 127 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3a8d3c2a8d69..19339ac9095b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -26,6 +26,7 @@
#include <linux/pci.h>
#include <linux/pci-ats.h>
#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
#include <linux/string_choices.h>
#include <kunit/visibility.h>
#include <uapi/linux/iommufd.h>
@@ -108,6 +109,33 @@ static const char * const event_class_str[] = {
static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
+/* Runtime PM helpers */
+__maybe_unused static int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+{
+ int ret;
+
+ if (pm_runtime_enabled(smmu->dev)) {
+ ret = pm_runtime_resume_and_get(smmu->dev);
+ if (ret < 0) {
+ dev_err(smmu->dev, "Failed to resume device: %d\n", ret);
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
+__maybe_unused static void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+{
+ int ret;
+
+ if (pm_runtime_enabled(smmu->dev)) {
+ ret = pm_runtime_put_autosuspend(smmu->dev);
+ if (ret < 0)
+ dev_err(smmu->dev, "Failed to suspend device: %d\n", ret);
+ }
+}
+
static void parse_driver_options(struct arm_smmu_device *smmu)
{
int i = 0;
@@ -5085,6 +5113,101 @@ static void arm_smmu_device_shutdown(struct platform_device *pdev)
arm_smmu_device_disable(smmu);
}
+static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
+{
+ struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+ int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
+ u32 enables;
+ int ret;
+
+ /* Try to suspend the device, wait for in-flight submissions */
+ do {
+ if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
+ break;
+
+ udelay(1);
+ } while (--timeout);
+
+ if (!timeout) {
+ dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
+ return -EAGAIN;
+ }
+
+ /* Abort all transactions before disable to avoid spurious bypass */
+ arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
+
+ /* Disable the SMMU via CR0.EN and all queues except CMDQ */
+ enables = CR0_CMDQEN;
+ ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
+ if (ret) {
+ dev_err(smmu->dev, "Timed-out while disabling smmu\n");
+ atomic_set(&smmu->nr_cmdq_users, 1);
+ return ret;
+ }
+
+ /*
+ * At this point the SMMU is completely disabled and won't access
+ * any translation/config structures, even speculative accesses
+ * aren't performed as per the IHI0070 spec (section 6.3.9.6).
+ */
+
+ /* Wait for the CMDQs to be drained to flush any pending commands */
+ ret = arm_smmu_drain_queues(smmu);
+ if (ret)
+ dev_err(smmu->dev, "Draining queues timed-out..forcing suspend\n");
+
+ /* Disable everything */
+ arm_smmu_device_disable(smmu);
+ dev_dbg(dev, "Suspended smmu\n");
+
+ return 0;
+}
+
+static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
+{
+ int ret;
+ struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+ dev_dbg(dev, "Resuming device\n");
+
+ /* Re-configure MSIs */
+ arm_smmu_resume_msis(smmu);
+
+ /*
+ * The reset will re-initialize all the base addresses, queues,
+ * prod and cons maintained within struct arm_smmu_device as well as
+ * re-enable the interrupts.
+ */
+ ret = arm_smmu_device_reset(smmu);
+
+ if (ret)
+ dev_err(dev, "Failed to reset during resume operation: %d\n", ret);
+
+ return ret;
+}
+
+static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
+{
+ if (pm_runtime_suspended(dev))
+ return 0;
+
+ return arm_smmu_runtime_suspend(dev);
+}
+
+static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
+{
+ if (pm_runtime_suspended(dev))
+ return 0;
+
+ return arm_smmu_runtime_resume(dev);
+}
+
+static const struct dev_pm_ops arm_smmu_pm_ops = {
+ SET_SYSTEM_SLEEP_PM_OPS(arm_smmu_pm_suspend, arm_smmu_pm_resume)
+ SET_RUNTIME_PM_OPS(arm_smmu_runtime_suspend,
+ arm_smmu_runtime_resume, NULL)
+};
+
static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,smmu-v3", },
{ },
@@ -5101,6 +5224,7 @@ static struct platform_driver arm_smmu_driver = {
.driver = {
.name = "arm-smmu-v3",
.of_match_table = arm_smmu_of_match,
+ .pm = &arm_smmu_pm_ops,
.suppress_bind_attrs = true,
},
.probe = arm_smmu_device_probe,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 50a2513e4425..6a752fe06f54 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -502,11 +502,14 @@ static inline unsigned int arm_smmu_cdtab_l2_idx(unsigned int ssid)
/* High-level queue structures */
#define ARM_SMMU_POLL_TIMEOUT_US 1000000 /* 1s! */
+#define ARM_SMMU_SUSPEND_TIMEOUT_US 100 /* 100us! */
#define ARM_SMMU_POLL_SPIN_COUNT 10
#define MSI_IOVA_BASE 0x8000000
#define MSI_IOVA_LENGTH 0x100000
+#define RPM_AUTOSUSPEND_DELAY_MS 15
+
enum pri_resp {
PRI_RESP_DENY = 0,
PRI_RESP_FAIL = 1,
--
2.52.0.457.g6b5491de43-goog
^ permalink raw reply related	[flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
@ 2026-03-10 1:45 ` Daniel Mentz
2026-03-10 16:58 ` Pranjal Shrivastava
2026-03-12 19:20 ` Daniel Mentz
1 sibling, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-10 1:45 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
>
> +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> +{
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> + u32 enables;
> + int ret;
> +
> + /* Try to suspend the device, wait for in-flight submissions */
> + do {
> + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
I understand we must follow the rule: do not elide any invalidations
while SMMU is still enabled. Otherwise, the following sequence of
events might occur:
* The driver clears a table descriptor
* The TLBI command is skipped
* The driver frees the page that backed the translation table
* The page is re-allocated for some unrelated purpose
* SMMU performs a table walk and then descends into the page that has
been reallocated, misinterpreting its contents.
Can you disable SMMU before setting nr_cmdq_users to 0?
>
> + /* Disable everything */
> + arm_smmu_device_disable(smmu);
> + dev_dbg(dev, "Suspended smmu\n");
> +
> + return 0;
> +}
> +
> +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
> +{
> + int ret;
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> +
> + dev_dbg(dev, "Resuming device\n");
Try to be consistent with these messages. Here, it says "Resuming
device" whereas above, it says "Suspended smmu". Also, try to align
new messages with existing messages in the driver. I believe most
other messages start with lowercase characters.
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-10 1:45 ` Daniel Mentz
@ 2026-03-10 16:58 ` Pranjal Shrivastava
2026-03-10 21:16 ` Daniel Mentz
0 siblings, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-10 16:58 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > +{
> > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > + u32 enables;
> > + int ret;
> > +
> > + /* Try to suspend the device, wait for in-flight submissions */
> > + do {
> > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
>
>
> I understand we must follow the rule: do not elide any invalidations
> while SMMU is still enabled. Otherwise, the following sequence of
> events might occur:
> * The driver clears a table descriptor
> * The TLBI command is skipped
> * The driver frees the page that backed the translation table
> * The page is re-allocated for some unrelated purpose
> * SMMU performs a table walk and then descends into the page that has
> been reallocated, misinterpreting its contents.
>
> Can you disable SMMU before setting nr_cmdq_users to 0?
>
No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
"owner" thread might believe that the SMMU is still functional and poll
for consumption.
I don't think that the SMMU would perform a walk as we first set the
nr_cmdq_users == 0 to stop command submissions and then immediately set
GBPA to abort. IF a command submission raced with this, we wait for the
CMDQ to be drained before doing anything, the owner thread would poll
till the sync in this case; only then would the driver return from the unmap
call and get a chance to free the page.
> >
> > + /* Disable everything */
> > + arm_smmu_device_disable(smmu);
> > + dev_dbg(dev, "Suspended smmu\n");
> > +
> > + return 0;
> > +}
> > +
> > +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
> > +{
> > + int ret;
> > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > +
> > + dev_dbg(dev, "Resuming device\n");
>
>
> Try to be consistent with these messages. Here, it says "Resuming
> device" whereas above, it says "Suspended smmu". Also, try to align
> new messages with existing messages in the driver. I believe most
> other messages start with lowercase characters.
Ack. I'll update it in v6.
Thanks
Praan
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-10 16:58 ` Pranjal Shrivastava
@ 2026-03-10 21:16 ` Daniel Mentz
2026-03-10 21:40 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-10 21:16 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Tue, Mar 10, 2026 at 9:58 AM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > > +{
> > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > > + u32 enables;
> > > + int ret;
> > > +
> > > + /* Try to suspend the device, wait for in-flight submissions */
> > > + do {
> > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> >
> >
> > I understand we must follow the rule: do not elide any invalidations
> > while SMMU is still enabled. Otherwise, the following sequence of
> > events might occur:
> > * The driver clears a table descriptor
> > * The TLBI command is skipped
> > * The driver frees the page that backed the translation table
> > * The page is re-allocated for some unrelated purpose
> > * SMMU performs a table walk and then descends into the page that has
> > been reallocated, misinterpreting its contents.
> >
> > Can you disable SMMU before setting nr_cmdq_users to 0?
> >
>
> No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
> "owner" thread might believe that the SMMU is still functional and poll
> for consumption.
I assume that SMMU still processes commands if SMMU_CR0.SMMUEN is
cleared but SMMU_CR0.CMDQEN is set. The arch spec reads: "When
translation is not enabled for a Security state, an SMMU: [...] Can
process commands after the relevant queue pointers are initialized and
SMMU_(*_)CR0.CMDQEN is enabled." Also, the sequence described under
"Arm recommends that software initializing the SMMU performs the
following steps:" suggests this behavior. The existing driver code
also relies on this when it runs CMDQ_OP_CFGI_ALL before setting
SMMU_CR0.SMMUEN.
>
> I don't think that the SMMU would perform a walk as we first set the
> nr_cmdq_users == 0 to stop command submissions and then immediately set
> GBPA to abort. IF a command submission raced with this, we wait for the
To ensure we're on the same page: Setting SMMU_GBPA.ABORT doesn't do
anything while SMMU_CR0.SMMUEN is set. SMMU_GBPA only plays a role if
SMMU_CR0.SMMUEN is cleared.
The moment you set nr_cmdq_users == 0, you potentially start eliding
TLBIs which means that the TLB entries are at risk of becoming stale.
How about the following sequence of events:
* suspend handler sets nr_cmdq_users == 0
* arm_smmu_cmdq_issue_cmdlist() stops updating the producer index
which means that TLBIs are not executed
* (sequence described above)
Now that I think more about it: I'm concerned that if we set
nr_cmdq_users == 0, the hw prod index will no longer be updated and
the arm_smmu_drain_queues() function will wait forever, because the
cons index is not catching up.
> CMDQ to be drained before doing anything, the owner thread would poll
> till the sync in this case; only then would the driver return from the unmap
> call and get a chance to free the page.
>
> > >
> > > + /* Disable everything */
> > > + arm_smmu_device_disable(smmu);
> > > + dev_dbg(dev, "Suspended smmu\n");
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
> > > +{
> > > + int ret;
> > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > +
> > > + dev_dbg(dev, "Resuming device\n");
> >
> >
> > Try to be consistent with these messages. Here, it says "Resuming
> > device" whereas above, it says "Suspended smmu". Also, try to align
> > new messages with existing messages in the driver. I believe most
> > other messages start with lowercase characters.
>
> Ack. I'll update it in v6.
>
> Thanks
> Praan
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-10 21:16 ` Daniel Mentz
@ 2026-03-10 21:40 ` Pranjal Shrivastava
2026-03-11 0:12 ` Daniel Mentz
0 siblings, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-10 21:40 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Tue, Mar 10, 2026 at 02:16:31PM -0700, Daniel Mentz wrote:
> On Tue, Mar 10, 2026 at 9:58 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> > > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > >
> > > > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > > > +{
> > > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > > > + u32 enables;
> > > > + int ret;
> > > > +
> > > > + /* Try to suspend the device, wait for in-flight submissions */
> > > > + do {
> > > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > >
> > >
> > > I understand we must follow the rule: do not elide any invalidations
> > > while SMMU is still enabled. Otherwise, the following sequence of
> > > events might occur:
> > > * The driver clears a table descriptor
> > > * The TLBI command is skipped
> > > * The driver frees the page that backed the translation table
> > > * The page is re-allocated for some unrelated purpose
> > > * SMMU performs a table walk and then descends into the page that has
> > > been reallocated, misinterpreting its contents.
> > >
> > > Can you disable SMMU before setting nr_cmdq_users to 0?
> > >
> >
> > No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
> > "owner" thread might believe that the SMMU is still functional and poll
> > for consumption.
>
> I assume that SMMU still processes commands if SMMU_CR0.SMMUEN is
> cleared but SMMU_CR0.CMDQEN is set. The arch spec reads: "When
> translation is not enabled for a Security state, an SMMU: [...] Can
> process commands after the relevant queue pointers are initialized and
> SMMU_(*_)CR0.CMDQEN is enabled." Also, the sequence described under
> "Arm recommends that software initializing the SMMU performs the
> following steps:" suggests this behavior. The existing driver code
> also relies on this when it runs CMDQ_OP_CFGI_ALL before setting
> SMMU_CR0.SMMUEN.
>
> >
> > I don't think that the SMMU would perform a walk as we first set the
> > nr_cmdq_users == 0 to stop command submissions and then immediately set
> > GBPA to abort. IF a command submission raced with this, we wait for the
>
> To ensure we're on the same page: Setting SMMU_GBPA.ABORT doesn't do
> anything while SMMU_CR0.SMMUEN is set. SMMU_GBPA only plays a role if
> SMMU_CR0.SMMUEN is cleared.
>
I think you're missing that we *DO* actually set SMMUEN = 0 *with* the
GBPA abort, please look at this part of the patch:
+ /* Abort all transactions before disable to avoid spurious bypass */
+ arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
+
+ /* Disable the SMMU via CR0.EN and all queues except CMDQ */
+ enables = CR0_CMDQEN;
+ ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
+ if (ret) {
+ dev_err(smmu->dev, "Timed-out while disabling smmu\n");
+ atomic_set(&smmu->nr_cmdq_users, 1);
+ return ret;
+ }
We disable everything except for the CMDQ. Since, everything is disabled
at this point, any new commands can safely be elided as the SMMU won't
access any config structures as mentioned by the spec.
> The moment you set nr_cmdq_users == 0, you potentially start eliding
> TLBIs which means that the TLB entries are at risk of becoming stale.
Not really, the SMMU isn't accessing anything since SMMUEN=0 and GBPA is
set by this point.
> How about the following sequence of events:
>
> * suspend handler sets nr_cmdq_users == 0
> * arm_smmu_cmdq_issue_cmdlist() stops updating the producer index
> which means that TLBIs are not executed
> * (sequence described above)
>
> Now that I think more about it: I'm concerned that if we set
> nr_cmdq_users == 0, the hw prod index will no longer be updated and
> the arm_smmu_drain_queues() function will wait forever, because the
> cons index is not catching up.
>
If we go ahead with the suggestion to completely elide a command
submission, if (nr_cmdq_users == 0) return 0; (as discussed in the other
thread), then we don't update llq.cons / prod. Then this shouldn't be an
issue right?
The problem I see in swapping the sequence is:
+ /* Abort all transactions before disable to avoid spurious bypass */
+ arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
+
+ /* Disable the SMMU via CR0.EN and all queues except CMDQ */
+ enables = CR0_CMDQEN; <-- Tear down everything except CMDQ
+ ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
+ if (ret) {
+ dev_err(smmu->dev, "Timed-out while disabling smmu\n");
+ atomic_set(&smmu->nr_cmdq_users, 1);
+ return ret;
+ }
+ /* Try to suspend the device, wait for in-flight submissions */
+ do {
+ if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
+ break;
+
+ udelay(1);
+ } while (--timeout);
+
+ if (!timeout) {
+ dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
+ return -EAGAIN; <--- suspend failed, but we already tore down the smmu
+ }
+
+
+ /*
+ * At this point the SMMU is completely disabled and won't access
+ * any translation/config structures, even speculative accesses
+ * aren't performed as per the IHI0070 spec (section 6.3.9.6).
+ */
+
+ /* Wait for the CMDQs to be drained to flush any pending commands */
+ ret = arm_smmu_drain_queues(smmu);
+ if (ret)
+ dev_err(smmu->dev, "Draining queues timed-out..forcing suspend\n");
+
+ /* Disable everything */
+ arm_smmu_device_disable(smmu); <---- Note that this still just disables the CMDQ
+ dev_dbg(dev, "Suspended smmu\n");
+
+ return 0;
+}
+
In the current implementation we first see if we can suspend only then
tear down the SMMU (Except for the CMDQ),
I think this would NOT be a problem if we just bail out of issue_cmdlist
early, as discussed on the other thread (if (nr_cmdq_users == 0) return 0;);
that way, we don't end up in a "fake command" consumption situation
which could lead to the cons never catching up while draining the queue.
Thanks
Praan
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-10 21:40 ` Pranjal Shrivastava
@ 2026-03-11 0:12 ` Daniel Mentz
2026-03-11 5:31 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-11 0:12 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Tue, Mar 10, 2026 at 2:40 PM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Tue, Mar 10, 2026 at 02:16:31PM -0700, Daniel Mentz wrote:
> > On Tue, Mar 10, 2026 at 9:58 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> > > > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > > >
> > > > > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > > > > +{
> > > > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > > > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > > > > + u32 enables;
> > > > > + int ret;
> > > > > +
> > > > > + /* Try to suspend the device, wait for in-flight submissions */
> > > > > + do {
> > > > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > > >
> > > >
> > > > I understand we must follow the rule: do not elide any invalidations
> > > > while SMMU is still enabled. Otherwise, the following sequence of
> > > > events might occur:
> > > > * The driver clears a table descriptor
> > > > * The TLBI command is skipped
> > > > * The driver frees the page that backed the translation table
> > > > * The page is re-allocated for some unrelated purpose
> > > > * SMMU performs a table walk and then descends into the page that has
> > > > been reallocated, misinterpreting its contents.
> > > >
> > > > Can you disable SMMU before setting nr_cmdq_users to 0?
> > > >
> > >
> > > No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
> > > "owner" thread might believe that the SMMU is still functional and poll
> > > for consumption.
> >
> > I assume that SMMU still processes commands if SMMU_CR0.SMMUEN is
> > cleared but SMMU_CR0.CMDQEN is set. The arch spec reads: "When
> > translation is not enabled for a Security state, an SMMU: [...] Can
> > process commands after the relevant queue pointers are initialized and
> > SMMU_(*_)CR0.CMDQEN is enabled." Also, the sequence described under
> > "Arm recommends that software initializing the SMMU performs the
> > following steps:" suggests this behavior. The existing driver code
> > also relies on this when it runs CMDQ_OP_CFGI_ALL before setting
> > SMMU_CR0.SMMUEN.
> >
> > >
> > > I don't think that the SMMU would perform a walk as we first set the
> > > nr_cmdq_users == 0 to stop command submissions and then immediately set
> > > GBPA to abort. IF a command submission raced with this, we wait for the
> >
> > To ensure we're on the same page: Setting SMMU_GBPA.ABORT doesn't do
> > anything while SMMU_CR0.SMMUEN is set. SMMU_GBPA only plays a role if
> > SMMU_CR0.SMMUEN is cleared.
> >
>
> I think you're missing that we *DO* actually set SMMUEN = 0 *with* the
> GBPA abort, please look at this part of the patch:
>
> + /* Abort all transactions before disable to avoid spurious bypass */
> + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> +
> + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> + enables = CR0_CMDQEN;
> + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> + if (ret) {
> + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> + atomic_set(&smmu->nr_cmdq_users, 1);
> + return ret;
> + }
>
> We disable everything except for the CMDQ. Since, everything is disabled
> at this point, any new commands can safely be elided as the SMMU won't
> access any config structures as mentioned by the spec.
>
> > The moment you set nr_cmdq_users == 0, you potentially start eliding
> > TLBIs which means that the TLB entries are at risk of becoming stale.
>
> Not really, the SMMU isn't accessing anything since SMMUEN=0 and GBPA is
> set by this point.
I believe you are assuming that all these things happen atomically
when they, in fact, happen sequentially. Imagine the CPU takes an
interrupt after setting nr_cmdq_users==0 but before clearing
SMMU_CR0.SMMUEN, then we are in a state where nr_cmdq_users==0 but
SMMUEN is still set.
>
> > How about the following sequence of events:
> >
> > * suspend handler sets nr_cmdq_users == 0
> > * arm_smmu_cmdq_issue_cmdlist() stops updating the producer index
> > which means that TLBIs are not executed
> > * (sequence described above)
> >
> > Now that I think more about it: I'm concerned that if we set
> > nr_cmdq_users == 0, the hw prod index will no longer be updated and
> > the arm_smmu_drain_queues() function will wait forever, because the
> > cons index is not catching up.
> >
>
> If we go ahead with the suggestion to completely elide a command
> submission, if (nr_cmdq_users == 0) return 0; (as discussed in the other
> thread), then we don't update llq.cons / prod. Then this shouldn't be an
> issue right?
I still think we have to clear SMMUEN before setting nr_users_cmdq=0
>
> The problem I see in swapping the sequence is:
>
> + /* Abort all transactions before disable to avoid spurious bypass */
> + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> +
> + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> + enables = CR0_CMDQEN; <-- Tear down everything except CMDQ
> + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> + if (ret) {
> + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> + atomic_set(&smmu->nr_cmdq_users, 1);
> + return ret;
> + }
> + /* Try to suspend the device, wait for in-flight submissions */
> + do {
> + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> + break;
> +
> + udelay(1);
> + } while (--timeout);
> +
> + if (!timeout) {
> + dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
> + return -EAGAIN; <--- suspend failed, but we already tore down the smmu
> + }
> +
> +
> + /*
> + * At this point the SMMU is completely disabled and won't access
> + * any translation/config structures, even speculative accesses
> + * aren't performed as per the IHI0070 spec (section 6.3.9.6).
> + */
> +
> + /* Wait for the CMDQs to be drained to flush any pending commands */
> + ret = arm_smmu_drain_queues(smmu);
> + if (ret)
> + dev_err(smmu->dev, "Draining queues timed-out..forcing suspend\n");
> +
> + /* Disable everything */
> + arm_smmu_device_disable(smmu); <---- Note that this still just disables the CMDQ
> + dev_dbg(dev, "Suspended smmu\n");
> +
> + return 0;
> +}
> +
>
> In the current implementation we first see if we can suspend only then
> tear down the SMMU (Except for the CMDQ),
What do you mean by "if we can suspend"? I think we can clear SMMUEN
even while commands are in-flight.
>
> I think this would NOT be a problem if we just bail out of issue_cmdlist
> early, as discussed on the other thread (if (nr_cmdq_users == 0) return 0;);
> that way, we don't end up in a "fake command" consumption situation
> which could lead to the cons never catching up while draining the queue.
>
> Thanks
> Praan
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-11 0:12 ` Daniel Mentz
@ 2026-03-11 5:31 ` Pranjal Shrivastava
2026-03-11 17:26 ` Daniel Mentz
0 siblings, 1 reply; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-11 5:31 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Tue, Mar 10, 2026 at 05:12:15PM -0700, Daniel Mentz wrote:
> On Tue, Mar 10, 2026 at 2:40 PM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 02:16:31PM -0700, Daniel Mentz wrote:
> > > On Tue, Mar 10, 2026 at 9:58 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > >
> > > > On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> > > > > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > > > >
> > > > > > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > > > > > +{
> > > > > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > > > > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > > > > > + u32 enables;
> > > > > > + int ret;
> > > > > > +
> > > > > > + /* Try to suspend the device, wait for in-flight submissions */
> > > > > > + do {
> > > > > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > > > >
> > > > >
> > > > > I understand we must follow the rule: do not elide any invalidations
> > > > > while SMMU is still enabled. Otherwise, the following sequence of
> > > > > events might occur:
> > > > > * The driver clears a table descriptor
> > > > > * The TLBI command is skipped
> > > > > * The driver frees the page that backed the translation table
> > > > > * The page is re-allocated for some unrelated purpose
> > > > > * SMMU performs a table walk and then descends into the page that has
> > > > > been reallocated, misinterpreting its contents.
> > > > >
> > > > > Can you disable SMMU before setting nr_cmdq_users to 0?
> > > > >
> > > >
> > > > No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
> > > > "owner" thread might believe that the SMMU is still functional and poll
> > > > for consumption.
> > >
> > > I assume that SMMU still processes commands if SMMU_CR0.SMMUEN is
> > > cleared but SMMU_CR0.CMDQEN is set. The arch spec reads: "When
> > > translation is not enabled for a Security state, an SMMU: [...] Can
> > > process commands after the relevant queue pointers are initialized and
> > > SMMU_(*_)CR0.CMDQEN is enabled." Also, the sequence described under
> > > "Arm recommends that software initializing the SMMU performs the
> > > following steps:" suggests this behavior. The existing driver code
> > > also relies on this when it runs CMDQ_OP_CFGI_ALL before setting
> > > SMMU_CR0.SMMUEN.
> > >
> > > >
> > > > I don't think that the SMMU would perform a walk as we first set the
> > > > nr_cmdq_users == 0 to stop command submissions and then immediately set
> > > > GBPA to abort. IF a command submission raced with this, we wait for the
> > >
> > > To ensure we're on the same page: Setting SMMU_GBPA.ABORT doesn't do
> > > anything while SMMU_CR0.SMMUEN is set. SMMU_GBPA only plays a role if
> > > SMMU_CR0.SMMUEN is cleared.
> > >
> >
> > I think you're missing that we *DO* actually set SMMUEN = 0 *with* the
> > GBPA abort, please look at this part of the patch:
> >
> > + /* Abort all transactions before disable to avoid spurious bypass */
> > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > +
> > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > + enables = CR0_CMDQEN;
> > + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> > + if (ret) {
> > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > + atomic_set(&smmu->nr_cmdq_users, 1);
> > + return ret;
> > + }
> >
> > We disable everything except for the CMDQ. Since, everything is disabled
> > at this point, any new commands can safely be elided as the SMMU won't
> > access any config structures as mentioned by the spec.
> >
> > > The moment you set nr_cmdq_users == 0, you potentially start eliding
> > > TLBIs which means that the TLB entries are at risk of becoming stale.
> >
> > Not really, the SMMU isn't accessing anything since SMMUEN=0 and GBPA is
> > set by this point.
>
> I believe you are assuming that all these things happen atomically
> when they, in fact, happen sequentially. Imagine the CPU takes an
> interrupt after setting nr_cmdq_users==0 but before clearing
> SMMU_CR0.SMMUEN, then we are in a state where nr_cmdq_users==0 but
> SMMUEN is still set.
>
> >
> > > How about the following sequence of events:
> > >
> > > * suspend handler sets nr_cmdq_users == 0
> > > * arm_smmu_cmdq_issue_cmdlist() stops updating the producer index
> > > which means that TLBIs are not executed
> > > * (sequence described above)
> > >
> > > Now that I think more about it: I'm concerned that if we set
> > > nr_cmdq_users == 0, the hw prod index will no longer be updated and
> > > the arm_smmu_drain_queues() function will wait forever, because the
> > > cons index is not catching up.
> > >
> >
> > If we go ahead with the suggestion to completely elide a command
> > submission, if (nr_cmdq_users == 0) return 0; (as discussed in the other
> > thread), then we don't update llq.cons / prod. Then this shouldn't be an
> > issue right?
>
I still think we have to clear SMMUEN before setting nr_cmdq_users=0
>
> >
> > The problem I see in swapping the sequence is:
> >
> > + /* Abort all transactions before disable to avoid spurious bypass */
> > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > +
> > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > + enables = CR0_CMDQEN; <-- Tear down everything except CMDQ
> > + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> > + if (ret) {
> > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > + atomic_set(&smmu->nr_cmdq_users, 1);
> > + return ret;
> > + }
> > + /* Try to suspend the device, wait for in-flight submissions */
> > + do {
> > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > + break;
> > +
> > + udelay(1);
> > + } while (--timeout);
> > +
> > + if (!timeout) {
> > + dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
> > + return -EAGAIN; <--- suspend failed, but we already tore down the smmu
> > + }
> > +
> > +
> > + /*
> > + * At this point the SMMU is completely disabled and won't access
> > + * any translation/config structures, even speculative accesses
> > + * aren't performed as per the IHI0070 spec (section 6.3.9.6).
> > + */
> > +
> > + /* Wait for the CMDQs to be drained to flush any pending commands */
> > + ret = arm_smmu_drain_queues(smmu);
> > + if (ret)
> > + dev_err(smmu->dev, "Draining queues timed-out..forcing suspend\n");
> > +
> > + /* Disable everything */
> > + arm_smmu_device_disable(smmu); <---- Note that this still just disables the CMDQ
> > + dev_dbg(dev, "Suspended smmu\n");
> > +
> > + return 0;
> > +}
> > +
> >
> > In the current implementation we first see if we can suspend only then
> > tear down the SMMU (Except for the CMDQ),
>
> What do you mean by "if we can suspend"? I think we can clear SMMUEN
> even while commands are in-flight.
>
I mean a race where we set SMMUEN = 0 but then get another CMDQ user;
since nr_cmdq_users != 0, we can have multiple users flushing their
commands until it's back to 1. The problem is how long do we wait to
set nr_cmdq_users = 0? If we time out, we'll have to reset the SMMU as
we can't simply re-enable SMMUEN.
We can't forcibly set nr_cmdq_users = 0 as we don't know where in
issue_cmdlist a thread is; forcibly setting it to 0 would cause trouble
if a thread is polling on a sync or about to increment the prod index.
Thanks
Praan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-11 5:31 ` Pranjal Shrivastava
@ 2026-03-11 17:26 ` Daniel Mentz
2026-03-16 15:05 ` Pranjal Shrivastava
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Mentz @ 2026-03-11 17:26 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Tue, Mar 10, 2026 at 10:31 PM Pranjal Shrivastava <praan@google.com> wrote:
>
> On Tue, Mar 10, 2026 at 05:12:15PM -0700, Daniel Mentz wrote:
> > On Tue, Mar 10, 2026 at 2:40 PM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > On Tue, Mar 10, 2026 at 02:16:31PM -0700, Daniel Mentz wrote:
> > > > On Tue, Mar 10, 2026 at 9:58 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > > >
> > > > > On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> > > > > > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > > > > >
> > > > > > > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > > > > > > +{
> > > > > > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > > > > > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > > > > > > + u32 enables;
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > + /* Try to suspend the device, wait for in-flight submissions */
> > > > > > > + do {
> > > > > > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > > > > >
> > > > > >
> > > > > > I understand we must follow the rule: do not elide any invalidations
> > > > > > while SMMU is still enabled. Otherwise, the following sequence of
> > > > > > events might occur:
> > > > > > * The driver clears a table descriptor
> > > > > > * The TLBI command is skipped
> > > > > > * The driver frees the page that backed the translation table
> > > > > > * The page is re-allocated for some unrelated purpose
> > > > > > * SMMU performs a table walk and then descends into the page that has
> > > > > > been reallocated, misinterpreting its contents.
> > > > > >
> > > > > > Can you disable SMMU before setting nr_cmdq_users to 0?
> > > > > >
> > > > >
> > > > > No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
> > > > > "owner" thread might believe that the SMMU is still functional and poll
> > > > > for consumption.
> > > >
> > > > I assume that SMMU still processes commands if SMMU_CR0.SMMUEN is
> > > > cleared but SMMU_CR0.CMDQEN is set. The arch spec reads: "When
> > > > translation is not enabled for a Security state, an SMMU: [...] Can
> > > > process commands after the relevant queue pointers are initialized and
> > > > SMMU_(*_)CR0.CMDQEN is enabled." Also, the sequence described under
> > > > "Arm recommends that software initializing the SMMU performs the
> > > > following steps:" suggests this behavior. The existing driver code
> > > > also relies on this when it runs CMDQ_OP_CFGI_ALL before setting
> > > > SMMU_CR0.SMMUEN.
> > > >
> > > > >
> > > > > I don't think that the SMMU would perform a walk as we first set the
> > > > > nr_cmdq_users == 0 to stop command submissions and then immediately set
> > > > > GBPA to abort. IF a command submission raced with this, we wait for the
> > > >
> > > > To ensure we're on the same page: Setting SMMU_GBPA.ABORT doesn't do
> > > > anything while SMMU_CR0.SMMUEN is set. SMMU_GBPA only plays a role if
> > > > SMMU_CR0.SMMUEN is cleared.
> > > >
> > >
> > > I think you're missing that we *DO* actually set SMMUEN = 0 *with* the
> > > GBPA abort, please look at this part of the patch:
> > >
> > > + /* Abort all transactions before disable to avoid spurious bypass */
> > > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > > +
> > > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > > + enables = CR0_CMDQEN;
> > > + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> > > + if (ret) {
> > > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > > + atomic_set(&smmu->nr_cmdq_users, 1);
> > > + return ret;
> > > + }
> > >
> > > We disable everything except for the CMDQ. Since, everything is disabled
> > > at this point, any new commands can safely be elided as the SMMU won't
> > > access any config structures as mentioned by the spec.
> > >
> > > > The moment you set nr_cmdq_users == 0, you potentially start eliding
> > > > TLBIs which means that the TLB entries are at risk of becoming stale.
> > >
> > > Not really, the SMMU isn't accessing anything since SMMUEN=0 and GBPA is
> > > set by this point.
> >
> > I believe you are assuming that all these things happen atomically
> > when they, in fact, happen sequentially. Imagine the CPU takes an
> > interrupt after setting nr_cmdq_users==0 but before clearing
> > SMMU_CR0.SMMUEN, then we are in a state where nr_cmdq_users==0 but
> > SMMUEN is still set.
> >
> > >
> > > > How about the following sequence of events:
> > > >
> > > > * suspend handler sets nr_cmdq_users == 0
> > > > * arm_smmu_cmdq_issue_cmdlist() stops updating the producer index
> > > > which means that TLBIs are not executed
> > > > * (sequence described above)
> > > >
> > > > Now that I think more about it: I'm concerned that if we set
> > > > nr_cmdq_users == 0, the hw prod index will no longer be updated and
> > > > the arm_smmu_drain_queues() function will wait for ever, because the
> > > > cons index is not catching up.
> > > >
> > >
> > > If we go ahead with the suggestion to completely elide a command
> > > submission, if (nr_users_cmdq == 0) return 0; (as discussed in the other
> > > thread), then we don't update llq.cons / prod. Then this shouldn't be an
> > > issue right?
> >
> > I still think we have to clear SMMUEN before setting nr_users_cmdq=0
> >
> > >
> > > The problem I see in swapping the sequence is:
> > >
> > > + /* Abort all transactions before disable to avoid spurious bypass */
> > > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > > +
> > > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > > + enables = CR0_CMDQEN; <-- Tear down everything except CMDQ
> > > + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> > > + if (ret) {
> > > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > > + atomic_set(&smmu->nr_cmdq_users, 1);
> > > + return ret;
> > > + }
> > > + /* Try to suspend the device, wait for in-flight submissions */
> > > + do {
> > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > > + break;
> > > +
> > > + udelay(1);
> > > + } while (--timeout);
> > > +
> > > + if (!timeout) {
> > > + dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
> > > + return -EAGAIN; <--- suspend failed, but we already tore down the smmu
> > > + }
> > > +
> > > +
> > > + /*
> > > + * At this point the SMMU is completely disabled and won't access
> > > + * any translation/config structures, even speculative accesses
> > > + * aren't performed as per the IHI0070 spec (section 6.3.9.6).
> > > + */
> > > +
> > > + /* Wait for the CMDQs to be drained to flush any pending commands */
> > > + ret = arm_smmu_drain_queues(smmu);
> > > + if (ret)
> > > + dev_err(smmu->dev, "Draining queues timed-out..forcing suspend\n");
> > > +
> > > + /* Disable everything */
> > > + arm_smmu_device_disable(smmu); <---- Note that this still just disables the CMDQ
> > > + dev_dbg(dev, "Suspended smmu\n");
> > > +
> > > + return 0;
> > > +}
> > > +
> > >
> > > In the current implementation we first see if we can suspend only then
> > > tear down the SMMU (Except for the CMDQ),
> >
> > What do you mean by "if we can suspend"? I think we can clear SMMUEN
> > even while commands are in-flight.
> >
>
> I mean a race where we set SMMUEN = 0 but we get another CMDQ user,
> since the nr_cmdq_users != 0, we can have multiple users flushing their
> commands unless it's back to 1. The problem is how long do we wait for
> nr_users = 0 ? If we timeout, we'll have to reset the SMMU as we can't
> simply re-enable SMMUEN.
Why can't we just re-enable SMMUEN? If we never set nr_cmdq_users to
0, we never elided any invalidation commands.
>
> We can't forcibly set nr_cmdq_user = 0 as we don't know where in the
> issue_cmdlist a thread is, forcibly setting it 0 would cause trouble if
> a thread is polling on a sync or about to increment the prod_index..
I agree that we can't forcibly set nr_cmdq_users = 0. When I wrote "set
nr_cmdq_users to 0", I meant using your atomic_cmpxchg loop.
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-03-11 17:26 ` Daniel Mentz
@ 2026-03-16 15:05 ` Pranjal Shrivastava
0 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-03-16 15:05 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Wed, Mar 11, 2026 at 10:26:49AM -0700, Daniel Mentz wrote:
> On Tue, Mar 10, 2026 at 10:31 PM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 05:12:15PM -0700, Daniel Mentz wrote:
> > > On Tue, Mar 10, 2026 at 2:40 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > >
> > > > On Tue, Mar 10, 2026 at 02:16:31PM -0700, Daniel Mentz wrote:
> > > > > On Tue, Mar 10, 2026 at 9:58 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > > > >
> > > > > > On Mon, Mar 09, 2026 at 06:45:09PM -0700, Daniel Mentz wrote:
> > > > > > > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > > > > > > >
> > > > > > > > +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
> > > > > > > > +{
> > > > > > > > + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> > > > > > > > + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> > > > > > > > + u32 enables;
> > > > > > > > + int ret;
> > > > > > > > +
> > > > > > > > + /* Try to suspend the device, wait for in-flight submissions */
> > > > > > > > + do {
> > > > > > > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > > > > > >
> > > > > > >
> > > > > > > I understand we must follow the rule: do not elide any invalidations
> > > > > > > while SMMU is still enabled. Otherwise, the following sequence of
> > > > > > > events might occur:
> > > > > > > * The driver clears a table descriptor
> > > > > > > * The TLBI command is skipped
> > > > > > > * The driver frees the page that backed the translation table
> > > > > > > * The page is re-allocated for some unrelated purpose
> > > > > > > * SMMU performs a table walk and then descends into the page that has
> > > > > > > been reallocated, misinterpreting its contents.
> > > > > > >
> > > > > > > Can you disable SMMU before setting nr_cmdq_users to 0?
> > > > > > >
> > > > > >
> > > > > > No, that'd be racy, if I disable the smmu but nr_cmdq_users != 0 an
> > > > > > "owner" thread might believe that the SMMU is still functional and poll
> > > > > > for consumption.
> > > > >
> > > > > I assume that SMMU still processes commands if SMMU_CR0.SMMUEN is
> > > > > cleared but SMMU_CR0.CMDQEN is set. The arch spec reads: "When
> > > > > translation is not enabled for a Security state, an SMMU: [...] Can
> > > > > process commands after the relevant queue pointers are initialized and
> > > > > SMMU_(*_)CR0.CMDQEN is enabled." Also, the sequence described under
> > > > > "Arm recommends that software initializing the SMMU performs the
> > > > > following steps:" suggests this behavior. The existing driver code
> > > > > also relies on this when it runs CMDQ_OP_CFGI_ALL before setting
> > > > > SMMU_CR0.SMMUEN.
> > > > >
> > > > > >
> > > > > > I don't think that the SMMU would perform a walk as we first set the
> > > > > > nr_cmdq_users == 0 to stop command submissions and then immediately set
> > > > > > GBPA to abort. IF a command submission raced with this, we wait for the
> > > > >
> > > > > To ensure we're on the same page: Setting SMMU_GBPA.ABORT doesn't do
> > > > > anything while SMMU_CR0.SMMUEN is set. SMMU_GBPA only plays a role if
> > > > > SMMU_CR0.SMMUEN is cleared.
> > > > >
> > > >
> > > > I think you're missing that we *DO* actually set SMMUEN = 0 *with* the
> > > > GBPA abort, please look at this part of the patch:
> > > >
> > > > + /* Abort all transactions before disable to avoid spurious bypass */
> > > > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > > > +
> > > > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > > > + enables = CR0_CMDQEN;
> > > > + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> > > > + if (ret) {
> > > > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > > > + atomic_set(&smmu->nr_cmdq_users, 1);
> > > > + return ret;
> > > > + }
> > > >
> > > > We disable everything except for the CMDQ. Since, everything is disabled
> > > > at this point, any new commands can safely be elided as the SMMU won't
> > > > access any config structures as mentioned by the spec.
> > > >
> > > > > The moment you set nr_cmdq_users == 0, you potentially start eliding
> > > > > TLBIs which means that the TLB entries are at risk of becoming stale.
> > > >
> > > > Not really, the SMMU isn't accessing anything since SMMUEN=0 and GBPA is
> > > > set by this point.
> > >
> > > I believe you are assuming that all these things happen atomically
> > > when they, in fact, happen sequentially. Imagine the CPU takes an
> > > interrupt after setting nr_cmdq_users==0 but before clearing
> > > SMMU_CR0.SMMUEN, then we are in a state where nr_cmdq_users==0 but
> > > SMMUEN is still set.
> > >
> > > >
> > > > > How about the following sequence of events:
> > > > >
> > > > > * suspend handler sets nr_cmdq_users == 0
> > > > > * arm_smmu_cmdq_issue_cmdlist() stops updating the producer index
> > > > > which means that TLBIs are not executed
> > > > > * (sequence described above)
> > > > >
> > > > > Now that I think more about it: I'm concerned that if we set
> > > > > nr_cmdq_users == 0, the hw prod index will no longer be updated and
> > > > > the arm_smmu_drain_queues() function will wait for ever, because the
> > > > > cons index is not catching up.
> > > > >
> > > >
> > > > If we go ahead with the suggestion to completely elide a command
> > > > submission, if (nr_users_cmdq == 0) return 0; (as discussed in the other
> > > > thread), then we don't update llq.cons / prod. Then this shouldn't be an
> > > > issue right?
> > >
> > > I still think we have to clear SMMUEN before setting nr_users_cmdq=0
> > >
> > > >
> > > > The problem I see in swapping the sequence is:
> > > >
> > > > + /* Abort all transactions before disable to avoid spurious bypass */
> > > > + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> > > > +
> > > > + /* Disable the SMMU via CR0.EN and all queues except CMDQ */
> > > > + enables = CR0_CMDQEN; <-- Tear down everything except CMDQ
> > > > + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, ARM_SMMU_CR0ACK);
> > > > + if (ret) {
> > > > + dev_err(smmu->dev, "Timed-out while disabling smmu\n");
> > > > + atomic_set(&smmu->nr_cmdq_users, 1);
> > > > + return ret;
> > > > + }
> > > > + /* Try to suspend the device, wait for in-flight submissions */
> > > > + do {
> > > > + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> > > > + break;
> > > > +
> > > > + udelay(1);
> > > > + } while (--timeout);
> > > > +
> > > > + if (!timeout) {
> > > > + dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
> > > > + return -EAGAIN; <--- suspend failed, but we already tore down the smmu
> > > > + }
> > > > +
> > > > +
> > > > + /*
> > > > + * At this point the SMMU is completely disabled and won't access
> > > > + * any translation/config structures, even speculative accesses
> > > > + * aren't performed as per the IHI0070 spec (section 6.3.9.6).
> > > > + */
> > > > +
> > > > + /* Wait for the CMDQs to be drained to flush any pending commands */
> > > > + ret = arm_smmu_drain_queues(smmu);
> > > > + if (ret)
> > > > + dev_err(smmu->dev, "Draining queues timed-out..forcing suspend\n");
> > > > +
> > > > + /* Disable everything */
> > > > + arm_smmu_device_disable(smmu); <---- Note that this still just disables the CMDQ
> > > > + dev_dbg(dev, "Suspended smmu\n");
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > >
> > > > In the current implementation we first see if we can suspend only then
> > > > tear down the SMMU (Except for the CMDQ),
> > >
> > > What do you mean by "if we can suspend"? I think we can clear SMMUEN
> > > even while commands are in-flight.
> > >
> >
> > I mean a race where we set SMMUEN = 0 but we get another CMDQ user,
> > since the nr_cmdq_users != 0, we can have multiple users flushing their
> > commands unless it's back to 1. The problem is how long do we wait for
> > nr_users = 0 ? If we timeout, we'll have to reset the SMMU as we can't
> > simply re-enable SMMUEN.
>
> Why can't we just re-enable SMMUEN? If we never set nr_cmdq_users to
> 0, we never elided any invalidation commands.
>
Hmm... this question makes me agree that moving the nr_cmdq_users
gating loop to the top is the better approach.
We can't simply re-enable SMMUEN to recover from a failed suspend. Per
spec section 6.3.9.6, transitioning SMMUEN to 0 is a slightly
destructive operation that forces the termination of stalled
transactions... but I think we already have a problem in the current
path, where we disable SMMUEN first and a timeout causes us to return
early.
Thus, I agree with the suggestion of moving the nr_cmdq_users cmpxchg
loop. It creates a clear 'Point of No Return' in software before we
modify the hardware state.
If we cannot transition nr_cmdq_users from 1 to 0, we can abort the
suspend safely because the SMMU is still fully enabled. Once we
successfully gate submissions, we can proceed with SMMUEN = 0 and
draining the queues, knowing that no new commands will race with the
teardown. Thanks for being persistent!
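The ordering agreed on here -- gate command submissions first, touch the
hardware only after the gate succeeds -- can be sketched as a userspace
analogue (C11 atomics standing in for the kernel's atomic_t helpers;
the function names, the -1 return code, and the bool SMMUEN stand-in
are all illustrative, not the driver's code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int nr_cmdq_users = 1;	/* 1 == idle, >1 == users in flight */
static bool smmu_enabled = true;	/* stand-in for SMMU_CR0.SMMUEN */

/* Point of no return: succeeds only when no submissions are in flight. */
static bool gate_cmdq(void)
{
	int idle = 1;

	return atomic_compare_exchange_strong(&nr_cmdq_users, &idle, 0);
}

static int runtime_suspend_sketch(void)
{
	if (!gate_cmdq())
		return -1;	/* like -EAGAIN: hardware untouched, retry later */

	/* Only after the gate succeeds is it safe to clear SMMUEN and drain. */
	smmu_enabled = false;
	return 0;
}
```

If the cmpxchg fails, the caller bails out with the "hardware" left
fully enabled, which is exactly the safe-abort property being argued
for above.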
> >
> > We can't forcibly set nr_cmdq_user = 0 as we don't know where in the
> > issue_cmdlist a thread is, forcibly setting it 0 would cause trouble if
> > a thread is polling on a sync or about to increment the prod_index..
>
> I agree that we can't forcibly set nr_cmdq_user = 0. When I wrote set
> nr_cmdq_user to 0, I meant using your atomic_cmpxchg loop.
Ack.
Thanks,
Praan
* Re: [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-03-10 1:45 ` Daniel Mentz
@ 2026-03-12 19:20 ` Daniel Mentz
1 sibling, 0 replies; 33+ messages in thread
From: Daniel Mentz @ 2026-03-12 19:20 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Ashish Mhetre, Sairaj Kodilkar
On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> +static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
I understand that most or all other IOMMU drivers use __maybe_unused,
but have you considered gating these definitions by an #ifdef
CONFIG_PM instead of using __maybe_unused?
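To illustrate the trade-off in a minimal userspace sketch (CONFIG_PM
here is just a stand-in macro and the ops struct is reduced to one
field; this is not kernel code): with #ifdef gating, the callback and
its ops entry compile away together when PM support is off, whereas
__maybe_unused keeps the dead code and merely silences the
unused-function warning.

```c
#include <assert.h>
#include <stddef.h>

#define CONFIG_PM 1	/* stand-in for the Kconfig symbol */

struct pm_ops_sketch {
	int (*runtime_suspend)(void);
};

#ifdef CONFIG_PM
static int runtime_suspend_cb(void)
{
	return 0;	/* hardware teardown would go here */
}
#define RUNTIME_SUSPEND_CB runtime_suspend_cb
#else
/* With #ifdef gating the function is not compiled at all. */
#define RUNTIME_SUSPEND_CB NULL
#endif

static const struct pm_ops_sketch pm_ops = {
	.runtime_suspend = RUNTIME_SUSPEND_CB,
};
```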
> +{
> + struct arm_smmu_device *smmu = dev_get_drvdata(dev);
> + int timeout = ARM_SMMU_SUSPEND_TIMEOUT_US;
> + u32 enables;
> + int ret;
> +
> + /* Try to suspend the device, wait for in-flight submissions */
> + do {
> + if (atomic_cmpxchg(&smmu->nr_cmdq_users, 1, 0) == 1)
> + break;
> +
> + udelay(1);
In other situations where we are waiting, e.g. in queue_poll() and
readl_relaxed_poll_timeout(), we use a procedure based on ktime_get().
Consider doing the same here or using queue_poll().
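A deadline-based poll in that spirit might look like the following
userspace sketch, with clock_gettime() standing in for ktime_get();
the names are illustrative, not the driver's:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds, standing in for ktime_get(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll until *cond equals expected or timeout_us elapses: compute an
 * absolute deadline once, instead of decrementing a counter around
 * udelay(). A final re-check after the deadline avoids a false timeout
 * if the condition became true just as time ran out.
 */
static bool poll_until(atomic_int *cond, int expected, uint64_t timeout_us)
{
	uint64_t deadline = now_ns() + timeout_us * 1000ull;

	do {
		if (atomic_load(cond) == expected)
			return true;
	} while (now_ns() < deadline);

	return atomic_load(cond) == expected;
}
```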
> + } while (--timeout);
> +
> + if (!timeout) {
> + dev_warn(smmu->dev, "SMMU in use, aborting suspend\n");
> + return -EAGAIN;
> + }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 50a2513e4425..6a752fe06f54 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -502,11 +502,14 @@ static inline unsigned int arm_smmu_cdtab_l2_idx(unsigned int ssid)
>
> /* High-level queue structures */
> #define ARM_SMMU_POLL_TIMEOUT_US 1000000 /* 1s! */
> +#define ARM_SMMU_SUSPEND_TIMEOUT_US 100 /* 100us! */
Isn't a 100us timeout very short? For the other operations in
arm_smmu_runtime_suspend including arm_smmu_update_gbpa,
arm_smmu_write_reg_sync, arm_smmu_drain_queues and
arm_smmu_device_disable, we allow for a 1 second timeout. Also,
waiting for nr_cmdq_users to become 1 indirectly means we are waiting
for a SYNC command to be consumed for which we allow a 1 second
timeout.
> #define ARM_SMMU_POLL_SPIN_COUNT 10
>
> #define MSI_IOVA_BASE 0x8000000
> #define MSI_IOVA_LENGTH 0x100000
>
> +#define RPM_AUTOSUSPEND_DELAY_MS 15
Consider moving this to "iommu/arm-smmu-v3: Enable pm_runtime and
setup devlinks" where it is first used.
* [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (6 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
` (2 subsequent siblings)
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava, Jason Gunthorpe
The GERROR register's state might be lost when the SMMU is powered down
during runtime suspend. Handle any pending errors before suspending.
Refactor the gerror handling logic into a helper function and invoke it
from the runtime suspend callback after disabling the SMMU. This ensures
that any late-breaking gerrors are logged and ack'ed before the hardware
state is lost.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 19339ac9095b..22782025d0f0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2202,10 +2202,9 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
static int arm_smmu_device_disable(struct arm_smmu_device *smmu);
-static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
+static irqreturn_t arm_smmu_handle_gerror(struct arm_smmu_device *smmu)
{
u32 gerror, gerrorn, active;
- struct arm_smmu_device *smmu = dev;
gerror = readl_relaxed(smmu->base + ARM_SMMU_GERROR);
gerrorn = readl_relaxed(smmu->base + ARM_SMMU_GERRORN);
@@ -2245,9 +2244,17 @@ static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
arm_smmu_cmdq_skip_err(smmu);
writel(gerror, smmu->base + ARM_SMMU_GERRORN);
+
return IRQ_HANDLED;
}
+static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
+{
+ struct arm_smmu_device *smmu = dev;
+
+ return arm_smmu_handle_gerror(smmu);
+}
+
static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev)
{
struct arm_smmu_device *smmu = dev;
@@ -5158,6 +5165,10 @@ static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
/* Disable everything */
arm_smmu_device_disable(smmu);
+
+ /* Handle any pending gerrors before powering down */
+ arm_smmu_handle_gerror(smmu);
+
dev_dbg(dev, "Suspended smmu\n");
return 0;
--
2.52.0.457.g6b5491de43-goog
* [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (7 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
Enable PM runtime for SMMUs having a power-domain during smmu probe.
Add a devlink between the clients and SMMU device. The absence of a
power domain effectively disables runtime power management.
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 22782025d0f0..9931dde41239 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3752,6 +3752,9 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
pci_prepare_ats(to_pci_dev(dev), stu);
}
+ device_link_add(dev, smmu->dev,
+ DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
+
return &smmu->iommu;
err_free_master:
@@ -5079,11 +5082,18 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
if (ret)
goto err_disable;
+ if (dev->pm_domain) {
+ pm_runtime_set_active(dev);
+ pm_runtime_use_autosuspend(dev);
+ pm_runtime_set_autosuspend_delay(dev, RPM_AUTOSUSPEND_DELAY_MS);
+ pm_runtime_enable(dev);
+ }
+
/* And we're up. Go go go! */
ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
"smmu3.%pa", &ioaddr);
if (ret)
- goto err_disable;
+ goto err_rpm_disable;
ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
if (ret) {
@@ -5095,6 +5105,8 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
err_free_sysfs:
iommu_device_sysfs_remove(&smmu->iommu);
+err_rpm_disable:
+ pm_runtime_disable(dev);
err_disable:
arm_smmu_device_disable(smmu);
err_free_iopf:
--
2.52.0.457.g6b5491de43-goog
* [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (8 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
@ 2026-01-26 15:11 ` Pranjal Shrivastava
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
10 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-01-26 15:11 UTC (permalink / raw)
To: iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Ashish Mhetre,
Sairaj Kodilkar, Pranjal Shrivastava
Invoke the pm_runtime helpers everywhere before accessing the hardware.
The idea is to invoke the runtime PM helpers at common points used by
exposed ops or interrupt handlers. Elide all TLB and CFG invalidations
if the SMMU is suspended, but not ATC invalidations.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 10 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 92 +++++++++++++++++--
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
3 files changed, 97 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 823461a26659..f1f0920ac12c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -15,6 +15,11 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length,
struct iommu_hw_info_arm_smmuv3 *info;
u32 __iomem *base_idr;
unsigned int i;
+ int ret;
+
+ ret = arm_smmu_rpm_get(master->smmu);
+ if (ret < 0)
+ return ERR_PTR(-EIO);
if (*type != IOMMU_HW_INFO_TYPE_DEFAULT &&
*type != IOMMU_HW_INFO_TYPE_ARM_SMMUV3) {
@@ -24,8 +29,10 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length,
}
info = kzalloc(sizeof(*info), GFP_KERNEL);
- if (!info)
+ if (!info) {
+ arm_smmu_rpm_put(master->smmu);
return ERR_PTR(-ENOMEM);
+ }
base_idr = master->smmu->base + ARM_SMMU_IDR0;
for (i = 0; i <= 5; i++)
@@ -36,6 +43,7 @@ void *arm_smmu_hw_info(struct device *dev, u32 *length,
*length = sizeof(*info);
*type = IOMMU_HW_INFO_TYPE_ARM_SMMUV3;
+ arm_smmu_rpm_put(master->smmu);
return info;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9931dde41239..9e2d16c128b0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -110,7 +110,7 @@ static const char * const event_class_str[] = {
static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
/* Runtime PM helpers */
-__maybe_unused static int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
{
int ret;
@@ -125,7 +125,7 @@ __maybe_unused static int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
return 0;
}
-__maybe_unused static void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
{
int ret;
@@ -1106,7 +1106,9 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused
{
struct arm_smmu_cmdq_ent cmd = {0};
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+ struct arm_smmu_device *smmu = master->smmu;
int sid = master->streams[0].id;
+ int ret;
if (WARN_ON(!master->stall_enabled))
return;
@@ -1126,6 +1128,25 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused
break;
}
+ /*
+ * The SMMU is guaranteed to be active via device_link if any master is
+ * active. Furthermore, on suspend we set GBPA to abort, flushing any
+ * pending stalled transactions.
+ *
+ * Receiving a page fault while suspended implies a bug in the power
+ * dependency chain or a stale event. Since the SMMU is powered down
+ * and the command queue is inaccessible, we cannot issue the
+ * RESUME command and must drop it.
+ */
+ if (!atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
+ dev_err(smmu->dev, "Ignoring page fault while suspended\n");
+ return;
+ }
+
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ goto out;
+
arm_smmu_cmdq_issue_cmd(master->smmu, &cmd);
/*
* Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP.
@@ -1133,6 +1154,9 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused
* terminated... at some point in the future. PRI_RESP is fire and
* forget.
*/
+ arm_smmu_rpm_put(smmu);
+out:
+ atomic_dec_return_release(&smmu->nr_cmdq_users);
}
/* Context descriptor manipulation functions */
@@ -2111,6 +2135,7 @@ static void arm_smmu_dump_event(struct arm_smmu_device *smmu, u64 *raw,
static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
{
+ int ret;
u64 evt[EVTQ_ENT_DWORDS];
struct arm_smmu_event event = {0};
struct arm_smmu_device *smmu = dev;
@@ -2119,6 +2144,10 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
DEFAULT_RATELIMIT_BURST);
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return IRQ_NONE;
+
do {
while (!queue_remove_raw(q, evt)) {
arm_smmu_decode_event(smmu, evt, &event);
@@ -2139,6 +2168,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
/* Sync our overflow flag, as we believe we're up to speed */
queue_sync_cons_ovf(q);
+ arm_smmu_rpm_put(smmu);
return IRQ_HANDLED;
}
@@ -2186,6 +2216,11 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
struct arm_smmu_queue *q = &smmu->priq.q;
struct arm_smmu_ll_queue *llq = &q->llq;
u64 evt[PRIQ_ENT_DWORDS];
+ int ret;
+
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return IRQ_NONE;
do {
while (!queue_remove_raw(q, evt))
@@ -2197,6 +2232,7 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void *dev)
/* Sync our overflow flag, as we believe we're up to speed */
queue_sync_cons_ovf(q);
+ arm_smmu_rpm_put(smmu);
return IRQ_HANDLED;
}
@@ -2251,8 +2287,21 @@ static irqreturn_t arm_smmu_handle_gerror(struct arm_smmu_device *smmu)
static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev)
{
struct arm_smmu_device *smmu = dev;
+ irqreturn_t ret;
+
+ /* If we are suspended, this is a spurious interrupt */
+ if (!atomic_read(&smmu->nr_cmdq_users) ||
+ (pm_runtime_enabled(smmu->dev) &&
+ pm_runtime_get_if_active(smmu->dev) <= 0)) {
+ dev_err(smmu->dev,
+ "Ignoring gerror interrupt because the SMMU is suspended\n");
+ return IRQ_NONE;
+ }
- return arm_smmu_handle_gerror(smmu);
+ ret = arm_smmu_handle_gerror(smmu);
+ arm_smmu_rpm_put(smmu);
+
+ return ret;
}
static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev)
@@ -2342,26 +2391,33 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
static int arm_smmu_atc_inv_master(struct arm_smmu_master *master,
ioasid_t ssid)
{
- int i;
+ int i, ret;
struct arm_smmu_cmdq_ent cmd;
struct arm_smmu_cmdq_batch cmds;
arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
+ /* ATC invalidations shouldn't be elided */
+ ret = arm_smmu_rpm_get(master->smmu);
+ if (ret < 0)
+ return ret;
+
arm_smmu_cmdq_batch_init(master->smmu, &cmds, &cmd);
for (i = 0; i < master->num_streams; i++) {
cmd.atc.sid = master->streams[i].id;
arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
}
- return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
+ ret = arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
+ arm_smmu_rpm_put(master->smmu);
+ return ret;
}
int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
unsigned long iova, size_t size)
{
struct arm_smmu_master_domain *master_domain;
- int i;
+ int i, ret;
unsigned long flags;
struct arm_smmu_cmdq_ent cmd = {
.opcode = CMDQ_OP_ATC_INV,
@@ -2388,6 +2444,11 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
if (!atomic_read(&smmu_domain->nr_ats_masters))
return 0;
+ /* ATC invalidations shouldn't be elided */
+ ret = arm_smmu_rpm_get(smmu_domain->smmu);
+ if (ret < 0)
+ return ret;
+
arm_smmu_cmdq_batch_init(smmu_domain->smmu, &cmds, &cmd);
spin_lock_irqsave(&smmu_domain->devices_lock, flags);
@@ -2416,7 +2477,9 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
}
spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
- return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
+ ret = arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
+ arm_smmu_rpm_put(smmu_domain->smmu);
+ return ret;
}
/* IO_PGTABLE API */
@@ -5117,10 +5180,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
static void arm_smmu_device_remove(struct platform_device *pdev)
{
struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
+ int ret;
iommu_device_unregister(&smmu->iommu);
iommu_device_sysfs_remove(&smmu->iommu);
+
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ goto free_iopf;
+
arm_smmu_device_disable(smmu);
+ arm_smmu_rpm_put(smmu);
+
+free_iopf:
iopf_queue_free(smmu->evtq.iopf);
ida_destroy(&smmu->vmid_map);
}
@@ -5128,8 +5200,14 @@ static void arm_smmu_device_remove(struct platform_device *pdev)
static void arm_smmu_device_shutdown(struct platform_device *pdev)
{
struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
+ int ret;
+
+ ret = arm_smmu_rpm_get(smmu);
+ if (ret < 0)
+ return;
arm_smmu_device_disable(smmu);
+ arm_smmu_rpm_put(smmu);
}
static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 6a752fe06f54..41e8da97193a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -1023,6 +1023,9 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
bool sync);
+int arm_smmu_rpm_get(struct arm_smmu_device *smmu);
+void arm_smmu_rpm_put(struct arm_smmu_device *smmu);
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
void arm_smmu_sva_notifier_synchronize(void);
--
2.52.0.457.g6b5491de43-goog
* Re: [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
` (9 preceding siblings ...)
2026-01-26 15:11 ` [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
@ 2026-02-06 15:48 ` Ashish Mhetre
2026-02-11 13:33 ` Pranjal Shrivastava
10 siblings, 1 reply; 33+ messages in thread
From: Ashish Mhetre @ 2026-02-06 15:48 UTC (permalink / raw)
To: Pranjal Shrivastava, iommu
Cc: Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Sairaj Kodilkar
On 1/26/2026 8:41 PM, Pranjal Shrivastava wrote:
>
> As arm-smmu-v3 rapidly finds its way into SoCs designed for hand-held
> devices, power management capabilities, similar to its predecessors, are
> crucial for these applications. This series introduces power management
> support for the arm-smmu-v3 driver.
>
> Design
> =======
> The arm-smmu-v3 primarily operates with in-memory data structures
> through HW registers pointing to these data structures. The proposed design
> makes use of this fact for implementing suspend and resume ops, centered
> around a driver-specific biased usage counter.
>
> 1. Biased Usage Counter
> To safely manage runtime PM state, this series introduces an atomic
> `nr_cmdq_users` counter. This counter is "biased" by being initialized to
> 1, representing an "idle but active" state.
>
> The suspend callback waits for this counter to be exactly 1 and then
> atomically transitions it to 0, signifying a "suspended" state. Any
> other value indicates that the command queue is in use, and the suspend
> operation is aborted.
>
> Code paths that submit commands to the hardware (the "fast path") use
> `atomic_inc_not_zero()` to acquire a reference. This operation only
> succeeds if the counter is not zero (i.e., not suspended), effectively
> gating hardware access. The reference is dropped using
> `atomic_dec_return_release()` after the hardware interaction is
> complete.
>
> 2. Suspend / Resume Flow
> The "suspend" op now polls for a short period (100us) for the usage
> counter to become 1. Once successful, it configures the GBPA register
> to abort new transactions and disables the SMMU, EVTQ, and PRIQ, leaving
> only the CMDQ enabled to drain any in-flight commands.
>
> The "resume" operation uses the `arm_smmu_device_reset` function, which
> re-initializes the HW using the SW-copies maintained by the driver
> (e.g., prod/cons pointers, queue base addresses) and clears all cached
> configurations.
>
> 3. Guarding Hardware Access
> Instead of eliding operations, the driver now ensures the SMMU is active
> before any hardware access. This is managed by `arm_smmu_rpm_get()` and
> `arm_smmu_rpm_put()` helpers, which wrap the standard runtime PM calls.
> These wrappers are now used throughout the driver in interrupt handlers,
> TLB invalidation paths, and any other function that touches hardware
> registers. This ensures that operations are implicitly queued until the
> device resumes.
>
> 4. Interrupt Re-config
> a. Wired irqs: The series refactors `arm_smmu_setup_irqs` so that
> irqs can be enabled/disabled and their handlers installed separately,
> which helps re-initialize the interrupts correctly on resume.
>
> b. MSIs: The series relies on caching the `msi_msg` and retrieving it
> through a newly introduced helper `arm_smmu_resume_msis()` which
> re-configures the *_IRQ_CFGn registers by writing back the cached msi_msg.
>
> Call for review
> Any insights/comments on the proposed changes are appreciated,
> especially in areas related to locking, atomic contexts, and potential
> optimizations.
>
> Note: The series isn't tested with MSIs and is weakly tested for PCIe
> clients. The same holds true for tegra241_cmdqv changes. Any help in
> reviewing and testing these parts is much appreciated.
>
> [v5]
> - Refactored GERROR handling into a helper function and invoked it during
> runtime suspend after disabling the SMMU to capture any late-breaking
> gerrors as suggested by Jason.
> - Updated `arm_smmu_page_response` to be power-state aware and drop
> page faults received while suspended.
> - Included a patch from Ashish to correctly restore PROD and CONS
> indices for tegra241-cmdqv after a hardware reset.
> - Collected Reviewed-bys from Mostafa and Nicolin.
>
> [v4]
> - https://lore.kernel.org/all/20251117191433.3360130-1-praan@google.com/
> - Dropped the `pm_runtime_get_if_not_suspended()` API in favor of a
> simpler, driver-specific biased counter (`nr_cmdq_users`) to manage
> runtime PM state.
> - Reworked the suspend callback to poll on the biased counter before
> disabling the SMMU.
> - Addressed comments for the MSI refactor.
>
> [v3]
> - https://lore.kernel.org/all/20250616203149.2649118-1-praan@google.com/
> - Introduced `pm_runtime_get_if_not_suspended` API to avoid races due
> to bouncing RPM states while eliding TLBIs as pointed out by Daniel.
> - Addressed Nicolin's comments regarding msi_resume and CMDQV flush
> - Addressed Daniel's comments about CMDQ locking and draining
> - Addressed issues related to draining the evtq and priq
> - Dropped the code to identify and track user-space attachments
>
> [v2]
> - https://lore.kernel.org/all/20250418233409.3926715-1-praan@google.com/
> - Introduced `arm_smmu_rpm_get_if_active` for eliding TLBIs & CFGIs
> - Updated the rpm helper invocation strategy.
> - Drained all queues in suspend callback (including tegra241-cmdqv)
> - Cache and restore msi_msg instead of free-ing realloc-ing on resume
> - Added support to identify and track user-space attachments
> - Fixed the setup_irqs as per Nicolin & Mostafa's suggestions
> - Used force_runtime_suspend/resume instead as per Mostafa's suggestion.
> - Added "Reviewed-by" line from Mostafa on an unchanged patch
>
> [v1]
> - https://lore.kernel.org/all/20250319004254.2547950-1-praan@google.com/
>
> Ashish Mhetre (1):
> iommu/tegra241-cmdqv: Restore PROD and CONS after resume
>
> Pranjal Shrivastava (9):
> iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs
> iommu/arm-smmu-v3: Add a helper to drain cmd queues
> iommu/tegra241-cmdqv: Add a helper to drain VCMDQs
> iommu/arm-smmu-v3: Cache and restore MSI config
> iommu/arm-smmu-v3: Add a usage counter for cmdq
> iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops
> iommu/arm-smmu-v3: Handle gerror during suspend
> iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks
> iommu/arm-smmu-v3: Invoke pm_runtime before hw access
>
> .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 10 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 429 ++++++++++++++++--
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 13 +
> .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 29 ++
> 4 files changed, 445 insertions(+), 36 deletions(-)
>
> --
> 2.52.0.457.g6b5491de43-goog
>
Hi Pranjal,
I have pulled this series and tested on Tegra264 with CMDQV enabled
and 16 vcmdqs assigned to a guest. I ran multiple iterations of system
suspend and resume, with each iteration followed by SMMU tests for
multiple client devices. Everything worked as expected and the tests
were passing.
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Thanks,
Ashish Mhetre
* Re: [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
@ 2026-02-11 13:33 ` Pranjal Shrivastava
0 siblings, 0 replies; 33+ messages in thread
From: Pranjal Shrivastava @ 2026-02-11 13:33 UTC (permalink / raw)
To: Ashish Mhetre
Cc: iommu, Will Deacon, Joerg Roedel, Robin Murphy, Jason Gunthorpe,
Mostafa Saleh, Nicolin Chen, Daniel Mentz, Sairaj Kodilkar
On Fri, Feb 06, 2026 at 09:18:06PM +0530, Ashish Mhetre wrote:
>
>
> On 1/26/2026 8:41 PM, Pranjal Shrivastava wrote:
> > [...]
>
> Hi Pranjal,
>
> I have pulled this series and tested on Tegra264 with CMDQV enabled
> > and 16 vcmdqs assigned to a guest. I ran multiple iterations of system
> > suspend and resume, with each iteration followed by SMMU tests for
> > multiple client devices. Everything worked as expected and the tests
> were passing.
>
> Tested-by: Ashish Mhetre <amhetre@nvidia.com>
>
Thanks a lot for helping test this, Ashish, especially on Tegra CMDQV!
Regards,
Praan