* [PATCH v1 0/4] iommufd: Cache invalidation hardening and SMMUv3 batching rework
@ 2026-06-03 21:26 Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 1/4] iommufd: Set upper bounds on cache invalidation entry_num and entry_len Nicolin Chen
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-03 21:26 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian
Cc: Robin Murphy, Joerg Roedel, Shuah Khan, Pranjal Shrivastava,
Kees Cook, Yi Liu, Eric Auger, linux-arm-kernel, iommu,
linux-kernel, linux-kselftest
Sashiko pointed out several issues in the iommufd invalidation path, which
also prompted a rework of the ARM SMMUv3 vIOMMU invalidation handler:
- entry_len is user-controlled and unbounded, so the trailing-zero check
for its forward-compat fields can scan gigabytes of user memory without
yielding, long enough to trip the soft-lockup watchdog.
- A large entry_num drives a backend's per-entry invalidation loop with no
reschedule, e.g. the VT-d nested path, pinning the CPU.
- The full-array copy helper copies the array twice on the equal-size fast
path: once in bulk, then again entry by entry.
- arm_vsmmu_cache_invalidate() reports converted-but-unsubmitted commands
as handled on its error paths.
- It sizes a single kernel allocation from the user-controlled entry_num.
- It rejects an empty-array data_type probe that the uAPI allows.
Fix them properly.
This is on Github:
https://github.com/nicolinc/iommufd/commits/smmuv3_fix_iommufd-v1
Nicolin Chen (4):
iommufd: Set upper bounds on cache invalidation entry_num and
entry_len
iommufd/selftest: Add invalidation entry_num and entry_len boundary
tests
iommu: Avoid copying the user array twice in the full-array copy
helper
iommu/arm-smmu-v3: Process vIOMMU invalidations in batches
include/linux/iommu.h | 1 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 91 +++++++++++--------
drivers/iommu/iommufd/hw_pagetable.c | 11 ++-
tools/testing/selftests/iommu/iommufd.c | 15 +++
4 files changed, 81 insertions(+), 37 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v1 1/4] iommufd: Set upper bounds on cache invalidation entry_num and entry_len
2026-06-03 21:26 [PATCH v1 0/4] iommufd: Cache invalidation hardening and SMMUv3 batching rework Nicolin Chen
@ 2026-06-03 21:26 ` Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 2/4] iommufd/selftest: Add invalidation entry_num and entry_len boundary tests Nicolin Chen
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-03 21:26 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian
Cc: Robin Murphy, Joerg Roedel, Shuah Khan, Pranjal Shrivastava,
Kees Cook, Yi Liu, Eric Auger, linux-arm-kernel, iommu,
linux-kernel, linux-kselftest
iommufd_hwpt_invalidate() takes a user-controlled entry_num and entry_len,
each bounded only by U32_MAX. An entry_len beyond the kernel's struct size
makes the copy helper verify the extra bytes are zero, scanning that excess
in one uninterruptible pass; a multi-gigabyte value over zeroed user memory
trips the soft-lockup watchdog.
A large entry_num is the other half, driving the backend invalidation loop
with no reschedule. The VT-d nested handler, for one, copies each entry and
flushes caches per iteration, pinning the CPU on a non-preemptible kernel.
Cap both in the ioctl. entry_len is held under PAGE_SIZE, above any request
struct, and entry_num under 1 << 19, the order of a hardware invalidation
queue and well beyond any real batch, bounding the per-call loop length.
Fixes: 8c6eabae3807 ("iommufd: Add IOMMU_HWPT_INVALIDATE")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/hw_pagetable.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index fe789c2dc0c97..623cc608ca0cd 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -489,6 +489,9 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd)
return rc;
}
+/* An arbitrary entry_num cap, far above any realistic invalidation batch */
+#define IOMMU_HWPT_INVALIDATE_ENTRY_NUM_MAX (1U << 19)
+
int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
{
struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
@@ -507,7 +510,13 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
goto out;
}
- if (cmd->entry_num && (!cmd->data_uptr || !cmd->entry_len)) {
+ /*
+ * Bound entry_num and entry_len so a single call cannot pin the CPU;
+ * entry_len also caps the copy_struct_from_user() trailing-zero scan.
+ */
+ if (cmd->entry_num &&
+ (!cmd->data_uptr || !cmd->entry_len || cmd->entry_len > PAGE_SIZE ||
+ cmd->entry_num > IOMMU_HWPT_INVALIDATE_ENTRY_NUM_MAX)) {
rc = -EINVAL;
goto out;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v1 2/4] iommufd/selftest: Add invalidation entry_num and entry_len boundary tests
2026-06-03 21:26 [PATCH v1 0/4] iommufd: Cache invalidation hardening and SMMUv3 batching rework Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 1/4] iommufd: Set upper bounds on cache invalidation entry_num and entry_len Nicolin Chen
@ 2026-06-03 21:26 ` Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 3/4] iommu: Avoid copying the user array twice in the full-array copy helper Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 4/4] iommu/arm-smmu-v3: Process vIOMMU invalidations in batches Nicolin Chen
3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-03 21:26 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian
Cc: Robin Murphy, Joerg Roedel, Shuah Khan, Pranjal Shrivastava,
Kees Cook, Yi Liu, Eric Auger, linux-arm-kernel, iommu,
linux-kernel, linux-kselftest
Test that the cache invalidation ioctl rejects an oversized entry_len and
an oversized entry_num, covering the CPU soft-lockup paths the caps close.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
tools/testing/selftests/iommu/iommufd.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d1fe5dbc2813e..653aa251f8122 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -556,6 +556,21 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
1, &num_inv);
assert(!num_inv);
+ /* Negative test: entry_len is bounded by PAGE_SIZE */
+ num_inv = 1;
+ test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs,
+ IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
+ PAGE_SIZE + 1, &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: entry_num is bounded */
+#define IOMMU_HWPT_INVALIDATE_ENTRY_NUM_MAX (1U << 19)
+ num_inv = IOMMU_HWPT_INVALIDATE_ENTRY_NUM_MAX + 1;
+ test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs,
+ IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
/* Negative test: invalid flag is passed */
num_inv = 1;
inv_reqs[0].flags = 0xffffffff;
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v1 3/4] iommu: Avoid copying the user array twice in the full-array copy helper
2026-06-03 21:26 [PATCH v1 0/4] iommufd: Cache invalidation hardening and SMMUv3 batching rework Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 1/4] iommufd: Set upper bounds on cache invalidation entry_num and entry_len Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 2/4] iommufd/selftest: Add invalidation entry_num and entry_len boundary tests Nicolin Chen
@ 2026-06-03 21:26 ` Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 4/4] iommu/arm-smmu-v3: Process vIOMMU invalidations in batches Nicolin Chen
3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-03 21:26 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian
Cc: Robin Murphy, Joerg Roedel, Shuah Khan, Pranjal Shrivastava,
Kees Cook, Yi Liu, Eric Auger, linux-arm-kernel, iommu,
linux-kernel, linux-kselftest
iommu_copy_struct_from_full_user_array() copies a whole user array into a
kernel buffer. In the common case, where user entry_len equals destination
entry size, it takes a fast path and copies the whole array with a single
copy_from_user().
That fast path does not return, so it falls through into the item-by-item
copy_struct_from_user() loop and copies every entry a second time. For an
equal entry_len that loop is just a copy_from_user() of the same bytes, so
the whole array is copied twice for no benefit.
Return right after the bulk copy. The per-item loop then runs only on the
slow path, where entry_len differs and each entry needs size adaption.
Fixes: 4f2e59ccb698 ("iommu: Add iommu_copy_struct_from_full_user_array helper")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e587d4ac4d331..6957144263793 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -547,6 +547,7 @@ iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
user_array->entry_num *
user_array->entry_len))
return -EFAULT;
+ return 0;
}
/* Copy item by item */
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v1 4/4] iommu/arm-smmu-v3: Process vIOMMU invalidations in batches
2026-06-03 21:26 [PATCH v1 0/4] iommufd: Cache invalidation hardening and SMMUv3 batching rework Nicolin Chen
` (2 preceding siblings ...)
2026-06-03 21:26 ` [PATCH v1 3/4] iommu: Avoid copying the user array twice in the full-array copy helper Nicolin Chen
@ 2026-06-03 21:26 ` Nicolin Chen
3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-06-03 21:26 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian
Cc: Robin Murphy, Joerg Roedel, Shuah Khan, Pranjal Shrivastava,
Kees Cook, Yi Liu, Eric Auger, linux-arm-kernel, iommu,
linux-kernel, linux-kselftest
arm_vsmmu_cache_invalidate() copies the entire user invalidation array into
one kernel allocation, then converts and submits those commands in batches
of CMDQ_BATCH_ENTRIES - 1. Sizing a single allocation from a user-supplied
count while tracking progress with a separate cursor is the root of several
independent problems:
- entry_num, the in/out count of successfully handled requests, is set to
"cur - cmds" on every error path. "cur" counts converted commands, not
submitted ones, so a conversion or submission failure reports the unsent
batch as handled, telling user space that invalidations which never made
it to the hardware have completed.
- The allocation is sized straight from the user's entry_num with no upper
bound and no __GFP_ACCOUNT, letting a large request drive an oversized,
unaccounted allocation and spam the page allocator.
- The -ENOMEM path returns without touching entry_num, once more reporting
the full input count as handled on a failure.
- A zero-length array, which the uAPI defines as a probe for a data_type
the kernel supports, reaches the full-array copy helper, which rejects
zero entries, so a supported type fails exactly like an unsupported one.
Rework the function to process the array one batch at a time, copying up to
CMDQ_BATCH_ENTRIES - 1 entries into a fixed-size stack buffer, converting
them in place, and issuing the batch, which removes the user-sized buffer.
A refactor is preferred over a series of smaller fixes because the problems
above are symptoms of one structure; changing the structure removes them as
a class rather than patching each in place:
- Dropping the allocation makes the size bound, the __GFP_ACCOUNT, and the
-ENOMEM accounting question moot, not three separate fixes.
- entry_num advances only as batches are issued, so the count it reports
tracks submitted commands and needs no correction on the error paths.
- The type is checked once up front, and an empty array then iterates zero
times, so the probe case falls out for free.
Each batch is still copied in one bulk transfer via the full-array helper,
iommu_copy_struct_from_full_user_array(); the whole-array copy becomes one
copy per batch that the preceding helper fix keeps to a single bulk copy.
If converting an entry fails, the already-converted prefix of the batch is
issued before the error is returned; if that issue fails, the batch is not
counted. So entry_num stays tied only to commands that actually reached the
hardware, without dropping the valid ones that preceded the failure.
Fixes: d68beb276ba2 ("iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object")
Assisted-by: Codex:GPT-5
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 91 +++++++++++--------
1 file changed, 55 insertions(+), 36 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index ddae0b07c76b5..5574ccaecf381 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -350,53 +350,72 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
return 0;
}
-int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
- struct iommu_user_data_array *array)
+static int arm_vsmmu_cache_invalidate_batch(struct arm_vsmmu *vsmmu,
+ struct iommu_user_data_array *array,
+ u32 *idx)
{
- struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ struct arm_vsmmu_invalidation_cmd cmds[CMDQ_BATCH_ENTRIES - 1];
struct arm_smmu_device *smmu = vsmmu->smmu;
- struct arm_vsmmu_invalidation_cmd *last;
- struct arm_vsmmu_invalidation_cmd *cmds;
- struct arm_vsmmu_invalidation_cmd *cur;
- struct arm_vsmmu_invalidation_cmd *end;
- int ret;
-
- cmds = kzalloc_objs(*cmds, array->entry_num);
- if (!cmds)
- return -ENOMEM;
- cur = cmds;
- end = cmds + array->entry_num;
+ struct iommu_user_data_array batch = {
+ .type = array->type,
+ .entry_len = array->entry_len,
+ };
+ unsigned int num, i;
+ int ret, issue_ret;
static_assert(sizeof(*cmds) == 2 * sizeof(u64));
+
+ /* Copy one batch of the user array in a single bulk copy */
+ num = min_t(u32, array->entry_num - *idx, ARRAY_SIZE(cmds));
+ batch.uptr = array->uptr + array->entry_len * *idx;
+ batch.entry_num = num;
ret = iommu_copy_struct_from_full_user_array(
- cmds, sizeof(*cmds), array,
+ cmds, sizeof(*cmds), &batch,
IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3);
if (ret)
- goto out;
-
- last = cmds;
- while (cur != end) {
- ret = arm_vsmmu_convert_user_cmd(vsmmu, cur);
- if (ret)
- goto out;
-
- /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
- cur++;
- if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
- continue;
+ return ret;
- /* FIXME always uses the main cmdq rather than trying to group by type */
- ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, last->cmd,
- cur - last, true);
+ /* Convert in place; only the converted prefix may be issued */
+ for (i = 0; i < num; i++) {
+ ret = arm_vsmmu_convert_user_cmd(vsmmu, &cmds[i]);
if (ret) {
- cur--;
- goto out;
+ num = i;
+ break;
}
- last = cur;
}
-out:
- array->entry_num = cur - cmds;
- kfree(cmds);
+ if (!num)
+ return ret;
+
+ /* FIXME always uses the main cmdq rather than trying to group by type */
+ issue_ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, cmds->cmd,
+ num, true);
+ if (issue_ret)
+ return issue_ret;
+
+ *idx += num;
+ return ret;
+}
+
+int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+ struct iommu_user_data_array *array)
+{
+ struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ u32 issued = 0;
+ int ret = 0;
+
+ if (array->type != IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3) {
+ array->entry_num = 0;
+ return -EINVAL;
+ }
+
+ while (issued != array->entry_num) {
+ /* Process and issue the command(s) in batch */
+ ret = arm_vsmmu_cache_invalidate_batch(vsmmu, array, &issued);
+ if (ret)
+ break;
+ }
+
+ array->entry_num = issued;
return ret;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-06-03 21:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-03 21:26 [PATCH v1 0/4] iommufd: Cache invalidation hardening and SMMUv3 batching rework Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 1/4] iommufd: Set upper bounds on cache invalidation entry_num and entry_len Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 2/4] iommufd/selftest: Add invalidation entry_num and entry_len boundary tests Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 3/4] iommu: Avoid copying the user array twice in the full-array copy helper Nicolin Chen
2026-06-03 21:26 ` [PATCH v1 4/4] iommu/arm-smmu-v3: Process vIOMMU invalidations in batches Nicolin Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox