* [PATCH v1 0/5] iommufd: Iterate the cache invalidation array in the core
From: Nicolin Chen @ 2026-06-29 21:15 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian, Lu Baolu
Cc: Robin Murphy, joro, David Woodhouse, linux-arm-kernel, iommu,
linux-kernel
The vIOMMU cache_invalidate() and the nested-HWPT cache_invalidate_user()
ops are each handed the full user invalidation array and must report, via
array->entry_num, how many of its entries they handled. That makes every
driver open-code the same array walk, with real downsides:
- each driver carries its own loop and sub-array bookkeeping;
- the ARM SMMUv3 driver allocates a buffer sized to the whole array just
to iterate over it;
- hand-rolling the loop left the ARM SMMUv3 driver with two long-standing
bugs:
1) on a conversion failure it counts commands that it converted but
never issued, so user space skips invalidations that never reached
the cmdq;
2) it rejects a zero-length array, which the uAPI documents as a valid
request that only probes the data type.
The walk is identical for every driver, so move it into the iommufd core.
The core now drives the iteration:
- it invokes the op on a sub-array starting at the first not-yet-handled
entry;
- the op handles one chunk from the front of that sub-array and reports
the count via array->entry_num;
- the core advances and re-invokes until the whole array is consumed or
the op returns an error.
A driver then only has to handle one bounded chunk per call, e.g. the ARM
SMMUv3 op copies a single cmdq batch into a fixed on-stack buffer and drops
its whole-array allocation. An op still handling the entire array in one
call keeps working, so each driver converts independently.
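The division of labour described above can be sketched in plain user-space C. This is a simplified model, not the kernel code: `struct user_array`, `chunked_op()`, `core_invalidate()` and `CHUNK_MAX` are hypothetical names, and the `assert()` that guards against an op over-reporting its count is an addition of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iommu_user_data_array. */
struct user_array {
	const uint8_t *uptr;	/* first not-yet-handled entry */
	size_t entry_len;	/* bytes per entry */
	uint32_t entry_num;	/* in: entries offered; out: entries handled */
};

typedef int (*invalidate_op)(struct user_array *array);

#define CHUNK_MAX 4u	/* illustrative per-call limit, not a kernel constant */

/*
 * Example op: handles at most CHUNK_MAX entries from the front of the
 * sub-array. An entry whose first byte is 0xff is treated as malformed.
 */
static int chunked_op(struct user_array *array)
{
	uint32_t n = array->entry_num < CHUNK_MAX ? array->entry_num : CHUNK_MAX;
	uint32_t i;

	for (i = 0; i < n; i++) {
		if (array->uptr[i * array->entry_len] == 0xff) {
			array->entry_num = i;	/* handled before the bad entry */
			return -EIO;
		}
	}
	array->entry_num = n;
	return 0;
}

/* Core-side walk: returns the op's result, *done gets the handled total. */
static int core_invalidate(invalidate_op op, const uint8_t *base,
			   size_t entry_len, uint32_t total, uint32_t *done)
{
	struct user_array array = {
		.uptr = base, .entry_len = entry_len, .entry_num = total,
	};
	int rc;

	*done = 0;
	do {
		rc = op(&array);
		/* Sketch-only sanity check: never more than was offered. */
		assert(array.entry_num <= total - *done);
		*done += array.entry_num;
		array.uptr += (size_t)array.entry_num * entry_len;
		array.entry_num = total - *done;
	} while (!rc && *done != total);
	return rc;
}
```

With ten two-byte entries the model invokes the op three times (4, 4 and 2 entries). If entry 6 is malformed, the walk stops with an error and a handled count of 6.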
The bugs fixed here are long-standing corner cases, so this series targets
for-next rather than for-rc.
This series is on GitHub:
https://github.com/nicolinc/iommufd/commits/iommufd_invalidation_loop-v1
[Note to Jason and Will]
This has some conflicts with Ashish's ARM_SMMU_OPT_REPEAT_TLBI_CFGI series:
https://lore.kernel.org/all/20260609073204.1760077-1-amhetre@nvidia.com/
Nicolin Chen (5):
iommu/arm-smmu-v3-iommufd: Reject unsupported bits in invalidation
commands
iommufd: Iterate the cache invalidation array in the core
iommufd/selftest: Convert cache invalidation mocks to the core array
loop
iommu/arm-smmu-v3-iommufd: Convert cache invalidation to the core
array loop
iommu/vt-d: Convert nested cache invalidation to the core array loop
include/linux/iommu.h | 6 +-
include/linux/iommufd.h | 2 +
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 131 ++++++++++++----
drivers/iommu/intel/nested.c | 54 ++++---
drivers/iommu/iommufd/hw_pagetable.c | 22 ++-
drivers/iommu/iommufd/selftest.c | 147 +++++++++---------
6 files changed, 222 insertions(+), 140 deletions(-)
base-commit: dc59e4fea9d83f03bad6bddf3fa2e52491777482
--
2.43.0
* [PATCH v1 1/5] iommu/arm-smmu-v3-iommufd: Reject unsupported bits in invalidation commands
From: Nicolin Chen @ 2026-06-29 21:15 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian, Lu Baolu
Cc: Robin Murphy, joro, David Woodhouse, linux-arm-kernel, iommu,
linux-kernel
The arm_vsmmu_cache_invalidate() op hands a guest's invalidation commands
to the trusted main command queue after enforcing only the VMID or the SID,
and passes the rest of the command through to the queue unchanged.
That lets a guest set bits the host never meant to forward, in two ways. A
bit can take the command out of the guest's own scope: the ATC_INV Global
bit, for one, makes the SMMU ignore the SID and invalidate the ATC of every
device, not just the guest's. A reserved or undefined bit instead makes the
command malformed; per the Arm SMMUv3 specification, in its section 4.1.3
"Command errors", a CERROR_ILL is raised, among other cases, when:
A valid command opcode is used and a Reserved or undefined field is
optionally detected as non-zero, which results in the command being
treated as malformed.
Restrict each opcode to the fields that the driver supports and reject the
command with -EIO if it sets any other bit, before the command reaches the
queue. This keeps a guest scoped to its own devices and stops the host from
forwarding any bit whose meaning it does not control.
Some fields and whole opcodes are legal only on an SMMU that implements the
matching feature, so accept them conditionally. The NUM, SCALE and TG range
fields need FEAT_RANGE_INV. The ATC_INV opcode needs FEAT_ATS. Per the same
specification's section 4.5 "ATS and PRI", CMD_ATC_INV is ILLEGAL when:
SMMU_IDR0.ATS == 0 and this command is issued on a Non-secure or Secure
Command queue.
The SSV and SSID substream fields require a non-zero ssid_bits. Without
substream support, setting them is not illegal but CONSTRAINED UNPREDICTABLE,
which a guest should not be able to provoke.
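The allow-list idea can be illustrated outside the kernel. The sketch below is a minimal model under stated assumptions: the `EX_*` masks and opcodes are made-up layouts, not the real SMMUv3 encodings, `struct ex_caps` stands in for the driver's feature flags, and rejecting unknown opcodes in the same function is a choice of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only; these are NOT the real SMMUv3 field encodings. */
#define EX_OP_MASK	0x00000000000000ffULL	/* opcode, always allowed */
#define EX_RANGE_MASK	0x0000000000001f00ULL	/* range fields, feature-gated */
#define EX_ID_MASK	0xffff000000000000ULL	/* ID field the host enforces */
#define EX_ADDR_MASK	0xfffffffffffff000ULL	/* address in the second dword */

#define EX_OP_INV_ADDR	0x12u	/* hypothetical by-address invalidation */
#define EX_OP_INV_ALL	0x30u	/* hypothetical invalidate-all */

struct ex_caps {
	bool range_inv;	/* models a feature flag such as FEAT_RANGE_INV */
};

/* Returns 0 if the command sets only bits allowed for its opcode. */
static int ex_check_cmd(const struct ex_caps *caps, const uint64_t cmd[2])
{
	uint64_t allowed[2] = { EX_OP_MASK, 0 };

	assert(caps && cmd);

	switch (cmd[0] & EX_OP_MASK) {
	case EX_OP_INV_ADDR:
		allowed[0] |= EX_ID_MASK;
		allowed[1] |= EX_ADDR_MASK;
		/* Range fields are legal only when the feature is present */
		if (caps->range_inv)
			allowed[0] |= EX_RANGE_MASK;
		break;
	case EX_OP_INV_ALL:
		allowed[0] |= EX_ID_MASK;
		break;
	default:
		/* Sketch's choice: refuse opcodes it does not know */
		return -EIO;
	}

	/* Any bit outside the per-opcode allow-list fails the command */
	if ((cmd[0] & ~allowed[0]) || (cmd[1] & ~allowed[1]))
		return -EIO;
	return 0;
}
```

Starting from an allow-list that contains only the opcode field means a newly defined or reserved bit is rejected by default, so the host never forwards a bit it has not explicitly opted into.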
Fixes: d68beb276ba2 ("iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 58 +++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 1e9f7d2de3441..393d69783225c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -315,10 +315,64 @@ struct arm_vsmmu_invalidation_cmd {
static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
struct arm_vsmmu_invalidation_cmd *cmd)
{
+ u64 allowed[2] = { CMDQ_0_OP };
+
/* Commands are le64 stored in u64 */
cmd->cmd.data[0] = le64_to_cpu(cmd->ucmd.cmd[0]);
cmd->cmd.data[1] = le64_to_cpu(cmd->ucmd.cmd[1]);
+ /* Collect the fields userspace is allowed to set for each opcode */
+ switch (cmd->cmd.data[0] & CMDQ_0_OP) {
+ case CMDQ_OP_TLBI_NH_VA:
+ allowed[0] |= CMDQ_TLBI_0_ASID;
+ fallthrough;
+ case CMDQ_OP_TLBI_NH_VAA:
+ allowed[0] |= CMDQ_TLBI_0_VMID;
+ allowed[1] |= CMDQ_TLBI_1_LEAF | CMDQ_TLBI_1_TTL |
+ CMDQ_TLBI_1_VA_MASK;
+ /* NUM/SCALE/TG are range fields gated on FEAT_RANGE_INV */
+ if (vsmmu->smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
+ allowed[0] |= CMDQ_TLBI_0_NUM | CMDQ_TLBI_0_SCALE;
+ allowed[1] |= CMDQ_TLBI_1_TG;
+ }
+ break;
+ case CMDQ_OP_TLBI_NH_ASID:
+ allowed[0] |= CMDQ_TLBI_0_ASID;
+ fallthrough;
+ case CMDQ_OP_TLBI_NH_ALL:
+ allowed[0] |= CMDQ_TLBI_0_VMID;
+ break;
+ case CMDQ_OP_ATC_INV:
+ /*
+ * Exclude the Global bit: it makes the SMMU ignore the SID and
+ * invalidate the ATC of every device, not just the guest's.
+ */
+ allowed[0] |= CMDQ_ATC_0_SID;
+ allowed[1] |= CMDQ_ATC_1_SIZE | CMDQ_ATC_1_ADDR_MASK;
+ /* SSV/SSID require substream support */
+ if (vsmmu->smmu->ssid_bits)
+ allowed[0] |= CMDQ_0_SSV | CMDQ_ATC_0_SSID;
+ break;
+ case CMDQ_OP_CFGI_CD:
+ allowed[1] |= CMDQ_CFGI_1_LEAF;
+ /* No SSV for CFGI_CD; SSID requires substream support */
+ if (vsmmu->smmu->ssid_bits)
+ allowed[0] |= CMDQ_CFGI_0_SSID;
+ fallthrough;
+ case CMDQ_OP_CFGI_CD_ALL:
+ allowed[0] |= CMDQ_CFGI_0_SID;
+ break;
+ }
+
+ /*
+ * Reject any other bit, e.g. a RES0 bit or a Secure bit, before the
+ * command reaches the trusted main cmdq, so a guest cannot wedge the
+ * shared queue for every device with a CERROR_ILL.
+ */
+ if ((cmd->cmd.data[0] & ~allowed[0]) ||
+ (cmd->cmd.data[1] & ~allowed[1]))
+ return -EIO;
+
switch (cmd->cmd.data[0] & CMDQ_0_OP) {
case CMDQ_OP_TLBI_NSNH_ALL:
/* Convert to NH_ALL */
@@ -334,6 +388,10 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
cmd->cmd.data[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
break;
case CMDQ_OP_ATC_INV:
+ /* ATC_INV is illegal unless the SMMU implements ATS */
+ if (!(vsmmu->smmu->features & ARM_SMMU_FEAT_ATS))
+ return -EIO;
+ fallthrough;
case CMDQ_OP_CFGI_CD:
case CMDQ_OP_CFGI_CD_ALL: {
u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd.data[0]);
--
2.43.0
* [PATCH v1 2/5] iommufd: Iterate the cache invalidation array in the core
From: Nicolin Chen @ 2026-06-29 21:15 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian, Lu Baolu
Cc: Robin Murphy, joro, David Woodhouse, linux-arm-kernel, iommu,
linux-kernel
The cache invalidation ops, cache_invalidate_user() for a nested HWPT and
cache_invalidate() for a vIOMMU, are each handed the full user request
array and report how many entries they handled by setting
array->entry_num. Every driver therefore implements its own loop over the
array, and a driver that wants to process the array in fixed-size chunks
(e.g. to issue commands out of a fixed-size on-stack buffer) has to carry
the loop and its sub-array bookkeeping on its own.
Move the iteration into the iommufd core instead. Invoke the op with a
sub-array that starts at the first not-yet-handled entry, let it handle a
prefix of that sub-array and report the count via array->entry_num, then
advance the base pointer and re-invoke the op until the entire array has
been consumed or until the op returns an error along the way.
A driver that handles the entire sub-array in a single call, as all of the
current drivers do, finishes the loop in one pass, so this does not change
any existing behavior. It instead lets each driver convert to bounded chunk
processing on its own, which the following patches do.
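As a worked example of the bookkeeping, the sketch below counts how many times the op would be invoked and how far the base pointer advances, assuming an op that always succeeds and handles up to `per_call` entries. `ex_walk_array()` and `struct ex_walk` are hypothetical names. Passing `UINT32_MAX` as `per_call` models an unconverted driver that consumes the whole array at once.

```c
#include <assert.h>
#include <stdint.h>

struct ex_walk {
	uint32_t passes;	/* number of op invocations */
	uint64_t end_offset;	/* bytes the base pointer advanced */
};

/* Model the core loop for an op that never fails. */
static struct ex_walk ex_walk_array(uint32_t total, uint32_t entry_len,
				    uint32_t per_call)
{
	struct ex_walk w = { 0, 0 };
	uint32_t done = 0, offered = total, handled;

	assert(per_call > 0);	/* an op that handles nothing would never finish */

	do {
		/* The op takes a prefix of whatever it is offered */
		handled = offered < per_call ? offered : per_call;
		w.passes++;

		done += handled;
		w.end_offset += (uint64_t)handled * entry_len;
		offered = total - done;
	} while (done != total);

	return w;
}
```

For 10 entries of 16 bytes, a whole-array op gives one pass and a chunk size of 4 gives three passes. Both end at byte offset 160. A zero-length array still results in exactly one invocation, which is what lets the op validate the data type.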
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 6 ++++--
include/linux/iommufd.h | 2 ++
drivers/iommu/iommufd/hw_pagetable.c | 22 +++++++++++++++++-----
3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d20aa6f6863ab..969758f87e445 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -773,8 +773,10 @@ struct iommu_ops {
* passes in the cache invalidation requests, in form
* of a driver data structure. The driver must update
* array->entry_num to report the number of handled
- * invalidation requests. The driver data structure
- * must be defined in include/uapi/linux/iommufd.h
+ * invalidation requests. A driver may handle fewer than
+ * the requested, in which case the core re-invokes the
+ * op for the remainder. The driver data structure must
+ * be defined in include/uapi/linux/iommufd.h
* @iova_to_phys: translate iova to physical address
* @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
* including no-snoop TLPs on PCIe or other platform
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 6e7efe83bc5d8..3087f5b2def84 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -154,6 +154,8 @@ struct iommufd_hw_queue {
* The @array passes in the cache invalidation requests, in
* form of a driver data structure. A driver must update the
* array->entry_num to report the number of handled requests.
+ * A driver may handle fewer than the requested entry_num, in
+ * which case the core re-invokes the op for the remainder.
* The data structure of the array entry must be defined in
* include/uapi/linux/iommufd.h
* @vdevice_size: Size of the driver-defined vDEVICE structure per this vIOMMU
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 623cc608ca0cd..409ba2216f8bd 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -535,8 +535,15 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
rc = -EOPNOTSUPP;
goto out_put_pt;
}
- rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
- &data_array);
+ do {
+ rc = hwpt->domain->ops->cache_invalidate_user(
+ hwpt->domain, &data_array);
+
+ done_num += data_array.entry_num;
+ data_array.uptr +=
+ data_array.entry_num * cmd->entry_len;
+ data_array.entry_num = cmd->entry_num - done_num;
+ } while (!rc && done_num != cmd->entry_num);
} else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
struct iommufd_viommu *viommu =
container_of(pt_obj, struct iommufd_viommu, obj);
@@ -545,14 +552,19 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
rc = -EOPNOTSUPP;
goto out_put_pt;
}
- rc = viommu->ops->cache_invalidate(viommu, &data_array);
+ do {
+ rc = viommu->ops->cache_invalidate(viommu, &data_array);
+
+ done_num += data_array.entry_num;
+ data_array.uptr +=
+ data_array.entry_num * cmd->entry_len;
+ data_array.entry_num = cmd->entry_num - done_num;
+ } while (!rc && done_num != cmd->entry_num);
} else {
rc = -EINVAL;
goto out_put_pt;
}
- done_num = data_array.entry_num;
-
out_put_pt:
iommufd_put_object(ucmd->ictx, pt_obj);
out:
--
2.43.0
* [PATCH v1 3/5] iommufd/selftest: Convert cache invalidation mocks to the core array loop
From: Nicolin Chen @ 2026-06-29 21:15 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian, Lu Baolu
Cc: Robin Murphy, joro, David Woodhouse, linux-arm-kernel, iommu,
linux-kernel
The vIOMMU and nested-domain selftest invalidation mocks each used to walk
the whole request array on their own, and the vIOMMU mock first allocated a
buffer sized to the entire array to do so.
The iommufd core now iterates the request array itself and re-invokes the
op with the not-yet-handled sub-array, so handle a single request per call
from the front of that sub-array and report one handled entry via
array->entry_num. Drop both loops and the kzalloc_objs() in the viommu
callback, and keep returning success for an empty array as a probe of the
selftest data type.
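The shape of a one-request-per-call op can be sketched in user-space C as follows. Everything here is hypothetical and simplified: `struct ex_inv`, `struct ex_array`, the `EX_*` constants and `ex_invalidate_one()` are illustrative, and `memcpy()` stands in for the copy from user memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define EX_DATA_TYPE	0x1u	/* hypothetical data type tag */
#define EX_CACHE_NUM	4u	/* hypothetical number of cache slots */
#define EX_FLAG_ALL	0x1u	/* hypothetical "invalidate everything" flag */

struct ex_inv {
	uint32_t flags;
	uint32_t cache_id;
};

struct ex_array {
	uint32_t type;
	const void *uptr;	/* front of the not-yet-handled sub-array */
	uint32_t entry_num;	/* in: offered; out: handled (0 or 1) */
};

static int ex_invalidate_one(uint32_t cache[EX_CACHE_NUM],
			     struct ex_array *array)
{
	struct ex_inv inv;
	uint32_t processed = 0;
	int rc = 0;

	assert(cache && array);

	if (array->type != EX_DATA_TYPE) {
		rc = -EINVAL;
		goto out;
	}

	/* A zero-length array only probes the type, validated above */
	if (!array->entry_num)
		goto out;

	/* Only entry 0 is read; the caller re-invokes for the rest */
	memcpy(&inv, array->uptr, sizeof(inv));

	if (inv.flags & ~EX_FLAG_ALL) {
		rc = -EOPNOTSUPP;
		goto out;
	}
	if (inv.cache_id >= EX_CACHE_NUM) {
		rc = -EINVAL;
		goto out;
	}

	if (inv.flags & EX_FLAG_ALL)
		memset(cache, 0, EX_CACHE_NUM * sizeof(cache[0]));
	else
		cache[inv.cache_id] = 0;

	processed = 1;
out:
	array->entry_num = processed;
	return rc;
}
```

Because every exit path funnels through `out`, the reported count is always 0 or 1, so the caller never has to guess how far the op got.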
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/selftest.c | 147 +++++++++++++++----------------
1 file changed, 72 insertions(+), 75 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index af07c642a5260..51da687e432ef 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -631,70 +631,63 @@ mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu,
struct iommu_user_data_array *array)
{
- struct iommu_viommu_invalidate_selftest *cmds;
- struct iommu_viommu_invalidate_selftest *cur;
- struct iommu_viommu_invalidate_selftest *end;
- int rc;
+ struct iommu_viommu_invalidate_selftest cmd;
+ struct mock_dev *mdev;
+ struct device *dev;
+ u32 processed = 0;
+ int rc = 0;
+ int i;
- /* A zero-length array is allowed to validate the array type */
- if (array->entry_num == 0 &&
- array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) {
- array->entry_num = 0;
- return 0;
+ if (array->type != IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) {
+ rc = -EINVAL;
+ goto out;
}
- cmds = kzalloc_objs(*cmds, array->entry_num);
- if (!cmds)
- return -ENOMEM;
- cur = cmds;
- end = cmds + array->entry_num;
+ /*
+ * The core re-invokes this op for the remaining requests, so handle one
+ * request per call. A zero-length array only probes the type, validated
+ * above.
+ */
+ if (!array->entry_num)
+ goto out;
- static_assert(sizeof(*cmds) == 3 * sizeof(u32));
- rc = iommu_copy_struct_from_full_user_array(
- cmds, sizeof(*cmds), array,
- IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST);
+ rc = iommu_copy_struct_from_user_array(
+ &cmd, array, IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 0,
+ cache_id);
if (rc)
goto out;
- while (cur != end) {
- struct mock_dev *mdev;
- struct device *dev;
- int i;
-
- if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
- rc = -EOPNOTSUPP;
- goto out;
- }
-
- if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) {
- rc = -EINVAL;
- goto out;
- }
+ if (cmd.flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+ rc = -EOPNOTSUPP;
+ goto out;
+ }
- xa_lock(&viommu->vdevs);
- dev = iommufd_viommu_find_dev(viommu,
- (unsigned long)cur->vdev_id);
- if (!dev) {
- xa_unlock(&viommu->vdevs);
- rc = -EINVAL;
- goto out;
- }
- mdev = container_of(dev, struct mock_dev, dev);
+ if (cmd.cache_id > MOCK_DEV_CACHE_ID_MAX) {
+ rc = -EINVAL;
+ goto out;
+ }
- if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
- /* Invalidate all cache entries and ignore cache_id */
- for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
- mdev->cache[i] = 0;
- } else {
- mdev->cache[cur->cache_id] = 0;
- }
+ xa_lock(&viommu->vdevs);
+ dev = iommufd_viommu_find_dev(viommu, (unsigned long)cmd.vdev_id);
+ if (!dev) {
xa_unlock(&viommu->vdevs);
-
- cur++;
+ rc = -EINVAL;
+ goto out;
+ }
+ mdev = container_of(dev, struct mock_dev, dev);
+
+ if (cmd.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+ /* Invalidate all cache entries and ignore cache_id */
+ for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+ mdev->cache[i] = 0;
+ } else {
+ mdev->cache[cmd.cache_id] = 0;
}
+ xa_unlock(&viommu->vdevs);
+
+ processed = 1;
out:
- array->entry_num = cur - cmds;
- kfree(cmds);
+ array->entry_num = processed;
return rc;
}
@@ -875,42 +868,46 @@ mock_domain_cache_invalidate_user(struct iommu_domain *domain,
struct mock_iommu_domain_nested *mock_nested = to_mock_nested(domain);
struct iommu_hwpt_invalidate_selftest inv;
u32 processed = 0;
- int i = 0, j;
int rc = 0;
+ int i;
if (array->type != IOMMU_HWPT_INVALIDATE_DATA_SELFTEST) {
rc = -EINVAL;
goto out;
}
- for ( ; i < array->entry_num; i++) {
- rc = iommu_copy_struct_from_user_array(&inv, array,
- IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
- i, iotlb_id);
- if (rc)
- break;
+ /*
+ * The core re-invokes this op for the remaining requests, so handle one
+ * request per call. A zero-length array only probes the type, validated
+ * above.
+ */
+ if (!array->entry_num)
+ goto out;
- if (inv.flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
- rc = -EOPNOTSUPP;
- break;
- }
+ rc = iommu_copy_struct_from_user_array(
+ &inv, array, IOMMU_HWPT_INVALIDATE_DATA_SELFTEST, 0, iotlb_id);
+ if (rc)
+ goto out;
- if (inv.iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX) {
- rc = -EINVAL;
- break;
- }
+ if (inv.flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+ rc = -EOPNOTSUPP;
+ goto out;
+ }
- if (inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
- /* Invalidate all mock iotlb entries and ignore iotlb_id */
- for (j = 0; j < MOCK_NESTED_DOMAIN_IOTLB_NUM; j++)
- mock_nested->iotlb[j] = 0;
- } else {
- mock_nested->iotlb[inv.iotlb_id] = 0;
- }
+ if (inv.iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX) {
+ rc = -EINVAL;
+ goto out;
+ }
- processed++;
+ if (inv.flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+ /* Invalidate all mock iotlb entries and ignore iotlb_id */
+ for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++)
+ mock_nested->iotlb[i] = 0;
+ } else {
+ mock_nested->iotlb[inv.iotlb_id] = 0;
}
+ processed = 1;
out:
array->entry_num = processed;
return rc;
--
2.43.0
* [PATCH v1 4/5] iommu/arm-smmu-v3-iommufd: Convert cache invalidation to the core array loop
From: Nicolin Chen @ 2026-06-29 21:15 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian, Lu Baolu
Cc: Robin Murphy, joro, David Woodhouse, linux-arm-kernel, iommu,
linux-kernel
arm_vsmmu_cache_invalidate() allocated a buffer for the entire user request
array, walked the array converting each command, and issued the converted
commands to the cmdq in CMDQ_BATCH_ENTRIES sized chunks, carrying the
sub-array bookkeeping on its own.
The iommufd core now iterates the invalidation array and re-invokes the op
with the not-yet-handled sub-array, so the driver only has to handle a
single chunk per call.
Instead of a per-array allocation, use a fixed on-stack batch to copy from
the userspace array. If the copy fails due to nonzero padding (VMM violates
the ABI), fail the entire batch.
Convert the whole batch before issuing any of it: a malformed command is a
userspace bug, so the first illegal command fails the batch as a unit,
issuing nothing and leaving array->entry_num at zero, the same way the copy
above bails on nonzero padding. A batch that converts cleanly is issued in
full, so the op returns either a handled count with no error or zero with
an error.
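The all-or-nothing rule can be sketched in user-space C as below. This is a model under stated assumptions rather than the driver code: `ex_handle_batch()`, the callback types, `EX_BATCH_MAX` and the toy `ex_convert()` and `ex_issue()` helpers are all hypothetical (the patch itself sizes its batch as `CMDQ_BATCH_ENTRIES - 1`).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_BATCH_MAX 8u	/* illustrative batch size, not a kernel constant */

typedef int (*ex_convert_fn)(uint64_t *cmd);
typedef int (*ex_issue_fn)(const uint64_t *cmds, uint32_t n);

static uint32_t ex_issued;	/* commands that reached the modelled queue */

/* Toy conversion: odd values are "illegal", even values get a tag bit. */
static int ex_convert(uint64_t *cmd)
{
	if (*cmd & 1)
		return -EIO;
	*cmd |= 0x100;
	return 0;
}

/* Toy issue: just count what was submitted. */
static int ex_issue(const uint64_t *cmds, uint32_t n)
{
	(void)cmds;
	ex_issued += n;
	return 0;
}

/*
 * Handle one batch from the front of src. On success *handled is the batch
 * size; on any failure *handled is 0 and nothing has been issued.
 */
static int ex_handle_batch(const uint64_t *src, uint32_t offered,
			   ex_convert_fn convert, ex_issue_fn issue,
			   uint32_t *handled)
{
	uint64_t cmds[EX_BATCH_MAX];	/* fixed on-stack buffer */
	uint32_t n = offered < EX_BATCH_MAX ? offered : EX_BATCH_MAX;
	uint32_t i;
	int ret;

	assert(handled);
	*handled = 0;
	if (!n)
		return 0;

	/* Convert everything first; one illegal command fails the batch */
	for (i = 0; i < n; i++) {
		cmds[i] = src[i];
		ret = convert(&cmds[i]);
		if (ret)
			return ret;
	}

	ret = issue(cmds, n);
	if (!ret)
		*handled = n;
	return ret;
}
```

Separating the convert pass from the issue step is what rules out the old failure mode: the count can no longer include commands that were converted but never submitted.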
A zero-length array now returns success once the data type gets validated,
matching the documented probe behavior, rather than the -EINVAL that the
full-array copy helper would previously return.
This also fixes two long-standing bugs:
1) On a conversion failure the old code reported commands that it had
converted but not yet issued, so user space advanced its consumer
index past invalidations that never reached the cmdq.
2) A zero-length array was rejected with -EINVAL, although the uAPI
documents it as a valid request that only probes the data type.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 73 ++++++++++---------
1 file changed, 39 insertions(+), 34 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 393d69783225c..aee58c0be4597 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -412,49 +412,54 @@ int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
struct iommu_user_data_array *array)
{
struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ struct arm_vsmmu_invalidation_cmd cmds[CMDQ_BATCH_ENTRIES - 1];
struct arm_smmu_device *smmu = vsmmu->smmu;
- struct arm_vsmmu_invalidation_cmd *last;
- struct arm_vsmmu_invalidation_cmd *cmds;
- struct arm_vsmmu_invalidation_cmd *cur;
- struct arm_vsmmu_invalidation_cmd *end;
+ struct iommu_user_data_array batch = {
+ .type = array->type,
+ .uptr = array->uptr,
+ .entry_len = array->entry_len,
+ };
int ret;
-
- cmds = kzalloc_objs(*cmds, array->entry_num);
- if (!cmds)
- return -ENOMEM;
- cur = cmds;
- end = cmds + array->entry_num;
+ u32 i;
static_assert(sizeof(*cmds) == 2 * sizeof(u64));
+
+ if (array->type != IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3) {
+ array->entry_num = 0;
+ return -EINVAL;
+ }
+
+ /* A zero-length array only probes the type, validated above */
+ if (!array->entry_num)
+ return 0;
+
+ /*
+ * The core re-invokes this op for the remaining requests, so copy one
+ * cmdq batch worth of commands into a fixed on-stack buffer rather than
+ * allocating for the whole array.
+ */
+ batch.entry_num = min_t(u32, array->entry_num, ARRAY_SIZE(cmds));
ret = iommu_copy_struct_from_full_user_array(
- cmds, sizeof(*cmds), array,
+ cmds, sizeof(*cmds), &batch,
IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3);
- if (ret)
- goto out;
-
- last = cmds;
- while (cur != end) {
- ret = arm_vsmmu_convert_user_cmd(vsmmu, cur);
- if (ret)
- goto out;
-
- /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
- cur++;
- if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
- continue;
-
- /* FIXME always uses the main cmdq rather than trying to group by type */
- ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, &last->cmd,
- cur - last, true);
+ if (ret) {
+ array->entry_num = 0;
+ return ret;
+ }
+
+ /* Convert the whole batch; a single illegal command fails it all */
+ for (i = 0; i < batch.entry_num; i++) {
+ ret = arm_vsmmu_convert_user_cmd(vsmmu, &cmds[i]);
if (ret) {
- cur--;
- goto out;
+ array->entry_num = 0;
+ return ret;
}
- last = cur;
}
-out:
- array->entry_num = cur - cmds;
- kfree(cmds);
+
+ /* FIXME always uses the main cmdq rather than trying to group by type */
+ ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, &cmds->cmd,
+ batch.entry_num, true);
+ array->entry_num = ret ? 0 : batch.entry_num;
return ret;
}
--
2.43.0
* [PATCH v1 5/5] iommu/vt-d: Convert nested cache invalidation to the core array loop
From: Nicolin Chen @ 2026-06-29 21:15 UTC (permalink / raw)
To: Will Deacon, Jason Gunthorpe, Kevin Tian, Lu Baolu
Cc: Robin Murphy, joro, David Woodhouse, linux-arm-kernel, iommu,
linux-kernel
intel_nested_cache_invalidate_user() used to walk the whole request array
on its own, copying and then flushing one entry at a time.
The iommufd core now iterates the request array itself and re-invokes the
op with the not-yet-handled sub-array, so handle a single request per call
from the front of that sub-array and report one handled entry via
array->entry_num. An empty array still returns success, serving as a probe
for IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 support.
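The per-entry checks that survive the conversion can be modelled in user-space C. The sketch below assumes a 4 KiB page size and a single defined flag bit; `struct ex_vtd_inv`, `EX_PAGE_SIZE`, `EX_FLAG_LEAF` and `ex_validate()` are hypothetical stand-ins for the uAPI structure and the kernel macros. The diff's condition suggests that an all-ones `npages` is accepted only together with address 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_PAGE_SIZE	4096ULL	/* assumed stand-in for VTD_PAGE_SIZE */
#define EX_FLAG_LEAF	0x1u	/* assumed to be the only defined flag */

/* Hypothetical mirror of one invalidation request entry. */
struct ex_vtd_inv {
	uint64_t addr;
	uint64_t npages;
	uint32_t flags;
	uint32_t reserved;
};

/* Returns 0 if the entry passes the same checks the op applies. */
static int ex_validate(const struct ex_vtd_inv *inv)
{
	assert(inv);

	/* Unknown flags or a non-zero reserved field are unsupported */
	if ((inv->flags & ~EX_FLAG_LEAF) || inv->reserved)
		return -EOPNOTSUPP;

	/* Address must be page aligned; all-ones npages requires addr 0 */
	if ((inv->addr & (EX_PAGE_SIZE - 1)) ||
	    (inv->npages == UINT64_MAX && inv->addr))
		return -EINVAL;

	return 0;
}
```

Because the op now sees one entry per call, each failed check maps to exactly one rejected request, and the handled count stays at zero for that call.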
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/intel/nested.c | 54 ++++++++++++++++++++----------------
1 file changed, 30 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 2b979bec56cef..cad5c1573196e 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -93,7 +93,7 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct iommu_hwpt_vtd_s1_invalidate inv_entry;
- u32 index, processed = 0;
+ u32 processed = 0;
int ret = 0;
if (array->type != IOMMU_HWPT_INVALIDATE_DATA_VTD_S1) {
@@ -101,31 +101,37 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
goto out;
}
- for (index = 0; index < array->entry_num; index++) {
- ret = iommu_copy_struct_from_user_array(&inv_entry, array,
- IOMMU_HWPT_INVALIDATE_DATA_VTD_S1,
- index, __reserved);
- if (ret)
- break;
-
- if ((inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) ||
- inv_entry.__reserved) {
- ret = -EOPNOTSUPP;
- break;
- }
-
- if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
- ((inv_entry.npages == U64_MAX) && inv_entry.addr)) {
- ret = -EINVAL;
- break;
- }
-
- cache_tag_flush_range(dmar_domain, inv_entry.addr,
- inv_entry.addr + nrpages_to_size(inv_entry.npages) - 1,
- inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF);
- processed++;
+ /*
+ * The core re-invokes this op for the remaining requests, so handle one
+ * request per call. A zero-length array only probes the type, validated
+ * above.
+ */
+ if (!array->entry_num)
+ goto out;
+
+ ret = iommu_copy_struct_from_user_array(
+ &inv_entry, array, IOMMU_HWPT_INVALIDATE_DATA_VTD_S1, 0,
+ __reserved);
+ if (ret)
+ goto out;
+
+ if ((inv_entry.flags & ~IOMMU_VTD_INV_FLAGS_LEAF) ||
+ inv_entry.__reserved) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (!IS_ALIGNED(inv_entry.addr, VTD_PAGE_SIZE) ||
+ (inv_entry.npages == U64_MAX && inv_entry.addr)) {
+ ret = -EINVAL;
+ goto out;
}
+ cache_tag_flush_range(dmar_domain, inv_entry.addr,
+ inv_entry.addr +
+ nrpages_to_size(inv_entry.npages) - 1,
+ inv_entry.flags & IOMMU_VTD_INV_FLAGS_LEAF);
+ processed = 1;
out:
array->entry_num = processed;
return ret;
--
2.43.0