Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] iommu/tegra241-cmdqv: Skip CMD_SYNC flush during remove
@ 2026-05-29  9:10 Shameer Kolothum
  2026-05-29 11:53 ` Jason Gunthorpe
  0 siblings, 1 reply; 3+ messages in thread
From: Shameer Kolothum @ 2026-05-29  9:10 UTC (permalink / raw)
  To: iommu, linux-kernel, linux-arm-kernel
  Cc: nicolinc, jgg, joro, will, robin.murphy, nathanc, mochs,
	skolothumtho

tegra241_vcmdq_hw_deinit() unconditionally issues a CMD_SYNC on
smmu->cmdq via tegra241_vcmdq_hw_flush_timeout(). When the SMMU is
being torn down (eg: probe failure), this CMD_SYNC hits freed
memory and UAFs. Observed during testing with a QEMU hack that
fails tegra241_cmdqv_setup_vcmdq(), so the guest sees the VCMDQ
enable as failed:

 platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: failed to enable, STATUS=0x00000000
 platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: GERRORN=0x0, GERROR=0x4, CONS=0x0
 platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: uncleared error detected, resetting
 arm-smmu-v3 arm-smmu-v3.0.auto: failed to reset impl
 arm-smmu-v3 arm-smmu-v3.0.auto: probe with driver arm-smmu-v3 failed with error -110
 Unable to handle kernel paging request at virtual address ffff8000891e0098
 ...
 Internal error: Oops: 0000000096000047 [#1] SMP
 ...
 Call trace:
  arm_smmu_cmdq_issue_cmdlist+0x320/0x6fc (P)
  tegra241_vcmdq_hw_deinit+0x98/0x168
  tegra241_vintf_hw_deinit+0x5c/0x1b0
  tegra241_cmdqv_remove_vintf+0x34/0xec
  tegra241_cmdqv_remove+0x40/0x9c
  arm_smmu_impl_remove+0x20/0x30
  devm_action_release+0x14/0x20
  devres_release_all+0xa8/0x110
  device_unbind_cleanup+0x18/0x84
  really_probe+0x1f0/0x29c

The reason is the devres ordering. arm_smmu_impl_probe() registers
arm_smmu_impl_remove as a devres action before arm_smmu_init_queues()
does dmam_alloc_coherent() for smmu->cmdq.q.base.

devres unwinds LIFO, so q.base is released first, then
arm_smmu_impl_remove()
 tegra241_cmdqv_remove()
  tegra241_vintf_hw_deinit()
   tegra241_vcmdq_hw_deinit()
     hw_flush_timeout() on a freed q.base.

The flush exists to drain a guest-owned VCMDQ's pending ATC_INV
TIMEOUT before the VCMDQ is handed to the next VM (see the comment
above tegra241_vcmdq_hw_flush_timeout()). impl-remove is not a
handover: no VM is taking the VCMDQ here. The next time a VM is
assigned a VCMDQ via the IOMMU_HW_QUEUE_ALLOC ioctl, this host
kernel driver's tegra241_vcmdq_hw_init_user() runs hw_deinit() ->
hw_flush_timeout() before the hw_queue is returned to userspace,
so any pending TIMEOUT is drained by the host before any guest
sees the VCMDQ.

Mark the cmdqv as removing at the top of tegra241_cmdqv_remove() and
skip the flush in tegra241_vcmdq_hw_deinit() when the flag is set.
All other hw_deinit callers (in-kernel hw_init reset, vintf_hw_init
failure unwind, user-mode lvcmdq destroy) run while smmu->cmdq is
still valid and continue to issue the flush as before.

Fixes: 4dc0d12474f9 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org # v6.17+
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index 4915ed96baca..b336b0bffe96 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -208,6 +208,7 @@ struct tegra241_vintf_sid {
  * @num_sids_per_vintf: Total number of SID mappings per VINTF
  * @vintf_ids: VINTF id allocator
  * @vintfs: List of VINTFs
+ * @removing: Set while the device is being torn down via impl_remove
  */
 struct tegra241_cmdqv {
 	struct arm_smmu_device smmu;
@@ -226,6 +227,8 @@ struct tegra241_cmdqv {
 	struct ida vintf_ids;
 
 	struct tegra241_vintf **vintfs;
+
+	bool removing;
 };
 
 /* Config and Polling Helpers */
@@ -452,7 +455,9 @@ static void tegra241_vcmdq_hw_deinit(struct tegra241_vcmdq *vcmdq)
 			readl_relaxed(REG_VCMDQ_PAGE0(vcmdq, GERROR)),
 			readl_relaxed(REG_VCMDQ_PAGE0(vcmdq, CONS)));
 	}
-	tegra241_vcmdq_hw_flush_timeout(vcmdq);
+	/* In the removing path, smmu->cmdq.q.base is freed by the devres */
+	if (!vcmdq->cmdqv->removing)
+		tegra241_vcmdq_hw_flush_timeout(vcmdq);
 
 	writel_relaxed(0, REG_VCMDQ_PAGE0(vcmdq, PROD));
 	writel_relaxed(0, REG_VCMDQ_PAGE0(vcmdq, CONS));
@@ -789,6 +794,13 @@ static void tegra241_cmdqv_remove(struct arm_smmu_device *smmu)
 		container_of(smmu, struct tegra241_cmdqv, smmu);
 	u16 idx;
 
+	/*
+	 * tegra241_cmdqv_remove() is added to devres at the very beginning. So,
+	 * at this point, devres has already freed the SMMU resources that this
+	 * path must not access to avoid a UAF.
+	 */
+	cmdqv->removing = true;
+
 	/* Remove VINTF resources */
 	for (idx = 0; idx < cmdqv->num_vintfs; idx++) {
 		if (cmdqv->vintfs[idx]) {
@@ -932,6 +944,7 @@ __tegra241_cmdqv_probe(struct arm_smmu_device *smmu, struct resource *res,
 	cmdqv->base = base;
 	cmdqv->dev = smmu->impl_dev;
 	cmdqv->base_phys = res->start;
+	cmdqv->removing = false;
 
 	if (cmdqv->irq > 0) {
 		ret = request_threaded_irq(irq, NULL, tegra241_cmdqv_isr,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-29 12:50 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29  9:10 [PATCH] iommu/tegra241-cmdqv: Skip CMD_SYNC flush during remove Shameer Kolothum
2026-05-29 11:53 ` Jason Gunthorpe
2026-05-29 12:50   ` Shameer Kolothum Thodi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox