Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] iommu/arm-smmu-v3: Fix Tegra241 CMDQV CMD_SYNC use-after-free
@ 2026-06-11  8:42 Shameer Kolothum
  2026-06-11  8:42 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Manage teardown with devm Shameer Kolothum
  2026-06-11  8:42 ` [PATCH v2 2/2] iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown Shameer Kolothum
  0 siblings, 2 replies; 5+ messages in thread
From: Shameer Kolothum @ 2026-06-11  8:42 UTC (permalink / raw)
  To: iommu, linux-kernel, linux-arm-kernel
  Cc: nicolinc, jgg, joro, will, robin.murphy, nathanc, mochs,
	skolothumtho

Hi,

The arm-smmu-v3 probe teardown mixes devm and manual cleanup, so resources
unwind in the wrong order. The IOPF queue is freed before the event-queue
IRQ whose handler uses it. On Tegra241 this is worse: devres frees
smmu->cmdq.q.base before arm_smmu_impl_remove() runs, and the CMDQV
teardown then issues a CMD_SYNC on the freed queue, a use-after-free.

Patch 1 moves the remaining manual teardown (IOPF queue, vmid_map, device
disable) onto devm so the unwind order is correct. Patch 2 adds a
device_disable() impl op and uses it to quiesce the Tegra241 VINTFs
while the CMDQ is still up, fixing the UAF.

v1 is here:
 https://lore.kernel.org/linux-iommu/20260529091052.317102-1-skolothumtho@nvidia.com/

Please take a look and let me know.

Thanks,
Shameer

Shameer Kolothum (2):
  iommu/arm-smmu-v3: Manage teardown with devm
  iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 54 +++++++++++++------
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 15 +++++-
 3 files changed, 52 insertions(+), 18 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] iommu/arm-smmu-v3: Manage teardown with devm
  2026-06-11  8:42 [PATCH v2 0/2] iommu/arm-smmu-v3: Fix Tegra241 CMDQV CMD_SYNC use-after-free Shameer Kolothum
@ 2026-06-11  8:42 ` Shameer Kolothum
  2026-06-12  1:15   ` Nicolin Chen
  2026-06-11  8:42 ` [PATCH v2 2/2] iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown Shameer Kolothum
  1 sibling, 1 reply; 5+ messages in thread
From: Shameer Kolothum @ 2026-06-11  8:42 UTC (permalink / raw)
  To: iommu, linux-kernel, linux-arm-kernel
  Cc: nicolinc, jgg, joro, will, robin.murphy, nathanc, mochs,
	skolothumtho

arm_smmu_device_remove() manually frees the IOPF queue, destroys the
vmid_map and disables the device, while the IRQs and queues are devm
managed. devm unwinds only after remove() returns, so the cleanup runs
in the wrong order. The IOPF queue is freed before the event-queue IRQ
whose handler uses it.

Manage all of it with devm so the unwind order is correct. Free the IOPF
queue and vmid_map via devm actions, and disable the device from one
registered after arm_smmu_device_reset().

This is also a prerequisite for fixing a Tegra241 CMDQV CMD_SYNC
use-after-free in the subsequent patch.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 50 ++++++++++++++-------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e8d7dbe495f0..00261e77e7bc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4464,6 +4464,16 @@ int arm_smmu_cmdq_init(struct arm_smmu_device *smmu,
 	return 0;
 }
 
+static void arm_smmu_free_iopf_action(void *data)
+{
+	iopf_queue_free(data);
+}
+
+static void arm_smmu_destroy_vmid_map(void *data)
+{
+	ida_destroy(data);
+}
+
 static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 {
 	int ret;
@@ -4491,6 +4501,11 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
 		smmu->evtq.iopf = iopf_queue_alloc(dev_name(smmu->dev));
 		if (!smmu->evtq.iopf)
 			return -ENOMEM;
+		ret = devm_add_action_or_reset(smmu->dev,
+					       arm_smmu_free_iopf_action,
+					       smmu->evtq.iopf);
+		if (ret)
+			return ret;
 	}
 
 	/* priq */
@@ -4569,7 +4584,8 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 
 	ida_init(&smmu->vmid_map);
 
-	return 0;
+	return devm_add_action_or_reset(smmu->dev, arm_smmu_destroy_vmid_map,
+					&smmu->vmid_map);
 }
 
 static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
@@ -4782,6 +4798,11 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
 	return ret;
 }
 
+static void arm_smmu_disable_action(void *data)
+{
+	arm_smmu_device_disable(data);
+}
+
 static void arm_smmu_write_strtab(struct arm_smmu_device *smmu)
 {
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
@@ -5540,7 +5561,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	/* Initialise in-memory data structures */
 	ret = arm_smmu_init_structures(smmu);
 	if (ret)
-		goto err_free_iopf;
+		return ret;
 
 	/* Record our private device structure */
 	platform_set_drvdata(pdev, smmu);
@@ -5550,30 +5571,30 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 
 	/* Reset the device */
 	ret = arm_smmu_device_reset(smmu);
+	if (ret) {
+		arm_smmu_device_disable(smmu);
+		return ret;
+	}
+
+	/* Register last so it unwinds first, while the CMDQ is still up. */
+	ret = devm_add_action_or_reset(smmu->dev, arm_smmu_disable_action, smmu);
 	if (ret)
-		goto err_disable;
+		return ret;
 
 	/* And we're up. Go go go! */
 	ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL,
 				     "smmu3.%pa", &ioaddr);
 	if (ret)
-		goto err_disable;
+		return ret;
 
 	ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
 	if (ret) {
 		dev_err(dev, "Failed to register iommu\n");
-		goto err_free_sysfs;
+		iommu_device_sysfs_remove(&smmu->iommu);
+		return ret;
 	}
 
 	return 0;
-
-err_free_sysfs:
-	iommu_device_sysfs_remove(&smmu->iommu);
-err_disable:
-	arm_smmu_device_disable(smmu);
-err_free_iopf:
-	iopf_queue_free(smmu->evtq.iopf);
-	return ret;
 }
 
 static void arm_smmu_device_remove(struct platform_device *pdev)
@@ -5582,9 +5603,6 @@ static void arm_smmu_device_remove(struct platform_device *pdev)
 
 	iommu_device_unregister(&smmu->iommu);
 	iommu_device_sysfs_remove(&smmu->iommu);
-	arm_smmu_device_disable(smmu);
-	iopf_queue_free(smmu->evtq.iopf);
-	ida_destroy(&smmu->vmid_map);
 }
 
 static void arm_smmu_device_shutdown(struct platform_device *pdev)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown
  2026-06-11  8:42 [PATCH v2 0/2] iommu/arm-smmu-v3: Fix Tegra241 CMDQV CMD_SYNC use-after-free Shameer Kolothum
  2026-06-11  8:42 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Manage teardown with devm Shameer Kolothum
@ 2026-06-11  8:42 ` Shameer Kolothum
  2026-06-12  1:10   ` Nicolin Chen
  1 sibling, 1 reply; 5+ messages in thread
From: Shameer Kolothum @ 2026-06-11  8:42 UTC (permalink / raw)
  To: iommu, linux-kernel, linux-arm-kernel
  Cc: nicolinc, jgg, joro, will, robin.murphy, nathanc, mochs,
	skolothumtho

arm_smmu_impl_remove() is registered as a devres action in
arm_smmu_impl_probe(), before arm_smmu_init_queues() allocates
smmu->cmdq.q.base. On a devres unwind, whether a failed probe or an
unbind, the queue is freed first and arm_smmu_impl_remove() then runs
tegra241_cmdqv_remove_vintf(), whose VINTF deinit issues a CMD_SYNC on
the freed memory.

Observed during testing with a QEMU hack that makes the VCMDQ fail to
enable, so the impl reset fails and probe aborts into the devres unwind:

 platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: failed to enable, STATUS=0x00000000
 platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: GERRORN=0x0, GERROR=0x4, CONS=0x0
 platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: uncleared error detected, resetting
 arm-smmu-v3 arm-smmu-v3.0.auto: failed to reset impl
 arm-smmu-v3 arm-smmu-v3.0.auto: probe with driver arm-smmu-v3 failed with error -110
 Unable to handle kernel paging request at virtual address ffff8000891e0098
 ...
 Internal error: Oops: 0000000096000047 [#1] SMP
 ...
 Call trace:
  arm_smmu_cmdq_issue_cmdlist+0x320/0x6fc (P)
  tegra241_vcmdq_hw_deinit+0x98/0x168
  tegra241_vintf_hw_deinit+0x5c/0x1b0
  tegra241_cmdqv_remove_vintf+0x34/0xec
  tegra241_cmdqv_remove+0x40/0x9c
  arm_smmu_impl_remove+0x20/0x30
  devm_action_release+0x14/0x20
  devres_release_all+0xa8/0x110
  device_unbind_cleanup+0x18/0x84
  really_probe+0x1f0/0x29c

Drop the VINTF deinit from tegra241_cmdqv_remove_vintf() so the unwind no
longer touches the freed queue. Quiesce the VINTFs earlier instead. Add a
device_disable() impl op and run it from arm_smmu_disable_action() while
the CMDQ is still up. That handles a live unbind. A failed reset is already
handled because tegra241_vintf_hw_init() deinits the VINTF on its own error
path. tegra241_cmdqv_remove_vintf() is also used by the iommufd viommu
destroy path, so quiesce there too.

Fixes: 4dc0d12474f9 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h    |  1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c    |  6 +++++-
 drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 15 +++++++++++++--
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index ef42df4753ec..690055aa63c7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -804,6 +804,7 @@ struct arm_smmu_strtab_cfg {
 
 struct arm_smmu_impl_ops {
 	int (*device_reset)(struct arm_smmu_device *smmu);
+	void (*device_disable)(struct arm_smmu_device *smmu);
 	void (*device_remove)(struct arm_smmu_device *smmu);
 	int (*init_structures)(struct arm_smmu_device *smmu);
 	struct arm_smmu_cmdq *(*get_secondary_cmdq)(
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 00261e77e7bc..3d531af7e16c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4800,7 +4800,11 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
 
 static void arm_smmu_disable_action(void *data)
 {
-	arm_smmu_device_disable(data);
+	struct arm_smmu_device *smmu = data;
+
+	if (smmu->impl_ops && smmu->impl_ops->device_disable)
+		smmu->impl_ops->device_disable(smmu);
+	arm_smmu_device_disable(smmu);
 }
 
 static void arm_smmu_write_strtab(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
index 83f6e9f6c51d..c7af7658a4ea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
+++ b/drivers/iommu/arm/arm-smmu-v3/tegra241-cmdqv.c
@@ -761,8 +761,6 @@ static void tegra241_cmdqv_remove_vintf(struct tegra241_cmdqv *cmdqv, u16 idx)
 	struct tegra241_vintf *vintf = cmdqv->vintfs[idx];
 	u16 lidx;
 
-	tegra241_vintf_hw_deinit(vintf);
-
 	/* Remove LVCMDQ resources */
 	for (lidx = 0; lidx < vintf->cmdqv->num_lvcmdqs_per_vintf; lidx++)
 		if (vintf->lvcmdqs[lidx])
@@ -779,6 +777,17 @@ static void tegra241_cmdqv_remove_vintf(struct tegra241_cmdqv *cmdqv, u16 idx)
 	}
 }
 
+static void tegra241_cmdqv_hw_disable(struct arm_smmu_device *smmu)
+{
+	struct tegra241_cmdqv *cmdqv =
+		container_of(smmu, struct tegra241_cmdqv, smmu);
+	u16 idx;
+
+	for (idx = 0; idx < cmdqv->num_vintfs; idx++)
+		if (cmdqv->vintfs[idx])
+			tegra241_vintf_hw_deinit(cmdqv->vintfs[idx]);
+}
+
 static void tegra241_cmdqv_remove(struct arm_smmu_device *smmu)
 {
 	struct tegra241_cmdqv *cmdqv =
@@ -844,6 +853,7 @@ static struct arm_smmu_impl_ops tegra241_cmdqv_impl_ops = {
 	/* For in-kernel use */
 	.get_secondary_cmdq = tegra241_cmdqv_get_cmdq,
 	.device_reset = tegra241_cmdqv_hw_reset,
+	.device_disable = tegra241_cmdqv_hw_disable,
 	.device_remove = tegra241_cmdqv_remove,
 	/* For user-space use */
 	.hw_info = tegra241_cmdqv_hw_info,
@@ -1152,6 +1162,7 @@ static void tegra241_cmdqv_destroy_vintf_user(struct iommufd_viommu *viommu)
 	if (vintf->mmap_offset)
 		iommufd_viommu_destroy_mmap(&vintf->vsmmu.core,
 					    vintf->mmap_offset);
+	tegra241_vintf_hw_deinit(vintf);
 	tegra241_cmdqv_remove_vintf(vintf->cmdqv, vintf->idx);
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown
  2026-06-11  8:42 ` [PATCH v2 2/2] iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown Shameer Kolothum
@ 2026-06-12  1:10   ` Nicolin Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-12  1:10 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: iommu, linux-kernel, linux-arm-kernel, jgg, joro, will,
	robin.murphy, nathanc, mochs

On Thu, Jun 11, 2026 at 09:42:05AM +0100, Shameer Kolothum wrote:
> arm_smmu_impl_remove() is registered as a devres action in
> arm_smmu_impl_probe(), before arm_smmu_init_queues() allocates
> smmu->cmdq.q.base. On a devres unwind, whether a failed probe or an
> unbind, the queue is freed first and arm_smmu_impl_remove() then runs
> tegra241_cmdqv_remove_vintf(), whose VINTF deinit issues a CMD_SYNC on
> the freed memory.
> 
> Observed during testing with a QEMU hack that makes the VCMDQ fail to
> enable, so the impl reset fails and probe aborts into the devres unwind:
> 
>  platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: failed to enable, STATUS=0x00000000
>  platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: GERRORN=0x0, GERROR=0x4, CONS=0x0
>  platform NVDA200C:00: tegra241_cmdqv: VINTF0: VCMDQ0/LVCMDQ0: uncleared error detected, resetting
>  arm-smmu-v3 arm-smmu-v3.0.auto: failed to reset impl
>  arm-smmu-v3 arm-smmu-v3.0.auto: probe with driver arm-smmu-v3 failed with error -110
>  Unable to handle kernel paging request at virtual address ffff8000891e0098
>  ...
>  Internal error: Oops: 0000000096000047 [#1] SMP
>  ...
>  Call trace:
>   arm_smmu_cmdq_issue_cmdlist+0x320/0x6fc (P)
>   tegra241_vcmdq_hw_deinit+0x98/0x168
>   tegra241_vintf_hw_deinit+0x5c/0x1b0
>   tegra241_cmdqv_remove_vintf+0x34/0xec
>   tegra241_cmdqv_remove+0x40/0x9c
>   arm_smmu_impl_remove+0x20/0x30
>   devm_action_release+0x14/0x20
>   devres_release_all+0xa8/0x110
>   device_unbind_cleanup+0x18/0x84
>   really_probe+0x1f0/0x29c
> 
> Drop the VINTF deinit from tegra241_cmdqv_remove_vintf() so the unwind no
> longer touches the freed queue. Quiesce the VINTFs earlier instead. Add a
> device_disable() impl op and run it from arm_smmu_disable_action() while
> the CMDQ is still up. That handles a live unbind. A failed reset is already
> handled because tegra241_vintf_hw_init() deinits the VINTF on its own error
> path. tegra241_cmdqv_remove_vintf() is also used by the iommufd viommu
> destroy path, so quiesce there too.
> 
> Fixes: 4dc0d12474f9 ("iommu/tegra241-cmdqv: Add user-space use support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 1/2] iommu/arm-smmu-v3: Manage teardown with devm
  2026-06-11  8:42 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Manage teardown with devm Shameer Kolothum
@ 2026-06-12  1:15   ` Nicolin Chen
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-12  1:15 UTC (permalink / raw)
  To: Shameer Kolothum
  Cc: iommu, linux-kernel, linux-arm-kernel, jgg, joro, will,
	robin.murphy, nathanc, mochs

On Thu, Jun 11, 2026 at 09:42:04AM +0100, Shameer Kolothum wrote:
> arm_smmu_device_remove() manually frees the IOPF queue, destroys the
> vmid_map and disables the device, while the IRQs and queues are devm
> managed. devm unwinds only after remove() returns, so the cleanup runs
> in the wrong order. The IOPF queue is freed before the event-queue IRQ
> whose handler uses it.
> 
> Manage all of it with devm so the unwind order is correct. Free the IOPF
> queue and vmid_map via devm actions, and disable the device from one
> registered after arm_smmu_device_reset().
> 
> This is also a prerequisite for fixing a Tegra241 CMDQV CMD_SYNC
> use-after-free in the subsequent patch.
> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>

Given that the PATCH-2 adds new code to arm_smmu_disable_action()
which is introduced here, should this patch also cc stable tree?

> +static void arm_smmu_free_iopf_action(void *data)
> +{
> +	iopf_queue_free(data);
> +}
> +
> +static void arm_smmu_destroy_vmid_map(void *data)
> +{
> +	ida_destroy(data);
> +}
[...]
> +static void arm_smmu_disable_action(void *data)
> +{
> +	arm_smmu_device_disable(data);
> +}

Jason prefers casting.

Otherwise,

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-12  1:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11  8:42 [PATCH v2 0/2] iommu/arm-smmu-v3: Fix Tegra241 CMDQV CMD_SYNC use-after-free Shameer Kolothum
2026-06-11  8:42 ` [PATCH v2 1/2] iommu/arm-smmu-v3: Manage teardown with devm Shameer Kolothum
2026-06-12  1:15   ` Nicolin Chen
2026-06-11  8:42 ` [PATCH v2 2/2] iommu/tegra241-cmdqv: Fix CMD_SYNC use-after-free on teardown Shameer Kolothum
2026-06-12  1:10   ` Nicolin Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox