* [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary
From: Nicolin Chen @ 2026-05-22 0:36 UTC
To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest
The veventq allocation has never enforced an upper bound on veventq_depth,
leaving a vulnerability where userspace could exhaust the atomic memory pool.
Fix it properly:
- Allocate outside the spinlock to avoid GFP_ATOMIC
- Cap the veventq_depth upper bound
- Fix the event_data byte-count vs element-count mismatch
- Add selftest coverage
Note that QEMU's SMMU emulation already allocates its veventq using the "HW"
EVTQ entry number. So, pick a cap of 1 << 19 entries to cover this known use
case, for a minimal level of ABI consistency.
This series is on GitHub:
https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v2
Changelog:
v2
* Add Reviewed-by from Jason
* Rebase on Jason's for-rc tree
* Update commit message for clarification
* Move the "data_len byte-count" patch to the first position
* Drop optimistic read in the allocation path
v1
https://lore.kernel.org/all/cover.1779070992.git.nicolinc@nvidia.com/
Nicolin Chen (4):
iommufd: Fix data_len byte-count vs element-count mismatch
iommufd: Move vevent memory allocation outside spinlock
iommufd: Set veventq_depth upper bound
iommufd/selftest: Add boundary tests for veventq_depth
drivers/iommu/iommufd/iommufd_private.h | 2 +-
tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
drivers/iommu/iommufd/driver.c | 13 ++++++++-----
drivers/iommu/iommufd/eventq.c | 5 ++++-
tools/testing/selftests/iommu/iommufd.c | 19 +++++++++++++++++--
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
6 files changed, 40 insertions(+), 18 deletions(-)
base-commit: be93d186ae88a92e7aa77e122d4e661fa57b1e39
--
2.43.0
* [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch
From: Nicolin Chen @ 2026-05-22 0:36 UTC
To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest
kzalloc_flex() computes the allocation size from the element count of the
flexible array. With event_data typed as u64, data_len is interpreted as a
u64 element count. Yet, every caller and the read path treat data_len as a
byte count. The current code therefore over-allocates the event_data area by
a factor of sizeof(u64), and the __counted_by() annotation overstates the
length by the same factor.
Re-type event_data as u8, so that data_len is a byte count everywhere. No
user-visible functional change.
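The size mismatch described above can be illustrated in plain userspace C. This is only a sketch: the two structs and the flex_size_*() helpers are hypothetical stand-ins for struct iommufd_vevent and for the size computation that a flex-array allocator performs, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With a u8 element type, an element count and a byte count are the same. */
static_assert(sizeof(uint8_t) == 1, "u8 elements are one byte each");

/* Hypothetical stand-ins for struct iommufd_vevent, one per element type. */
struct vevent_u64 {
	long data_len;
	uint64_t event_data[];	/* old typing: data_len counts u64 elements */
};

struct vevent_u8 {
	long data_len;
	uint8_t event_data[];	/* new typing: data_len counts bytes */
};

/* Header plus "count" elements, as a flex-array allocation is sized. */
static size_t flex_size_u64(size_t count)
{
	return offsetof(struct vevent_u64, event_data) +
	       count * sizeof(uint64_t);
}

static size_t flex_size_u8(size_t count)
{
	return offsetof(struct vevent_u8, event_data) +
	       count * sizeof(uint8_t);
}
```

For a caller that passes data_len = 16 meaning 16 bytes, the u64 typing reserves 128 bytes for event_data while the u8 typing reserves exactly 16.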
Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9a..43fbc5bed8de3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -602,7 +602,7 @@ struct iommufd_vevent {
struct iommufd_vevent_header header;
struct list_head node; /* for iommufd_eventq::deliver */
ssize_t data_len;
- u64 event_data[] __counted_by(data_len);
+ u8 event_data[] __counted_by(data_len);
};
#define vevent_for_lost_events_header(vevent) \
--
2.43.0
* [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock
From: Nicolin Chen @ 2026-05-22 0:36 UTC
To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest
The vevent memory allocation happens inside the spinlock, which requires
GFP_ATOMIC. Since the queue depth is chosen by userspace, this leaves a
vulnerability where userspace can allocate large queues to exhaust the
atomic memory reserves.
Move the allocation outside the spinlock and use GFP_NOWAIT, which can fail
fast under memory pressure without dipping into the GFP_ATOMIC reserves or
direct-reclaiming from the threaded IRQ handler. On allocation failure,
queue the lost_events_header (so userspace learns of the drop) and return
-ENOMEM so the caller learns of the kernel-side memory pressure.
This is intentionally distinct from the queue-overflow path, which also
queues the lost_events_header but returns 0: a full queue is an expected
userspace-pacing condition rather than a kernel error.
A subsequent change will cap the upper bound of the veventq_depth.
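The ordering described above (allocate first, then take the lock, then decide between queueing and dropping) can be sketched as a userspace analogy. Everything here is an assumption made for illustration: the toy_* names are hypothetical, a C11 atomic_flag stands in for the spinlock, malloc() stands in for a GFP_NOWAIT allocation, and a lost_events counter stands in for queueing the lost_events_header.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct toy_event {
	struct toy_event *next;
	size_t data_len;
	unsigned char data[];
};

struct toy_queue {
	atomic_flag lock;		/* toy spinlock */
	unsigned int num_events;
	unsigned int depth;
	unsigned int lost_events;	/* models the lost_events_header */
	struct toy_event *head, *tail;
};

static void toy_queue_init(struct toy_queue *q, unsigned int depth)
{
	memset(q, 0, sizeof(*q));
	atomic_flag_clear(&q->lock);
	q->depth = depth;
}

static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the flag is released */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns 0 on success or on a full queue, -ENOMEM on allocation failure. */
static int toy_report_event(struct toy_queue *q, const void *data, size_t len)
{
	struct toy_event *ev;
	int rc = 0;

	assert(q && data);

	/* Allocate before taking the lock, so the allocator never runs under it */
	ev = malloc(sizeof(*ev) + len);

	toy_lock(&q->lock);
	if (!ev) {
		q->lost_events++;	/* the reader learns of the drop */
		rc = -ENOMEM;		/* the caller learns of memory pressure */
	} else if (q->num_events == q->depth) {
		free(ev);		/* the pre-allocated entry is not needed */
		q->lost_events++;	/* full queue is expected: rc stays 0 */
	} else {
		ev->next = NULL;
		ev->data_len = len;
		memcpy(ev->data, data, len);
		if (q->tail)
			q->tail->next = ev;
		else
			q->head = ev;
		q->tail = ev;
		q->num_events++;
	}
	toy_unlock(&q->lock);
	return rc;
}
```

The cost of this ordering is a wasted allocate/free pair whenever the queue turns out to be full, which is the trade-off taken for keeping the allocator out of the locked region.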
Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/driver.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 61e6b02601d1a..3b8067976eac0 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -149,15 +149,18 @@ int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
goto out_unlock_veventqs;
}
- spin_lock(&veventq->common.lock);
- if (veventq->num_events == veventq->depth) {
+ /* Pre-allocate to avoid GFP_ATOMIC; use GFP_NOWAIT to avoid sleeping */
+ vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_NOWAIT);
+ if (!vevent) {
+ spin_lock(&veventq->common.lock);
vevent = &veventq->lost_events_header;
+ rc = -ENOMEM;
goto out_set_header;
}
- vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_ATOMIC);
- if (!vevent) {
- rc = -ENOMEM;
+ spin_lock(&veventq->common.lock);
+ if (veventq->num_events == veventq->depth) {
+ kfree(vevent);
vevent = &veventq->lost_events_header;
goto out_set_header;
}
--
2.43.0
* [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound
From: Nicolin Chen @ 2026-05-22 0:36 UTC
To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest
iommufd_veventq_alloc() accepts any non-zero veventq_depth from userspace,
bounded only by U32_MAX.
This leaves a vulnerability where userspace can allocate excessively large
queues to exhaust kernel memory reserves.
Cap the veventq_depth (maximum number of entries) to 1 << 19, matching the
maximum number of entries in the SMMUv3 EVTQ (the largest use case today).
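As a standalone sketch of the resulting range check, the helper below mirrors the condition this patch adds. The function name check_veventq_depth() is hypothetical; the macro value is the one introduced by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as the cap introduced by this patch: 1 << 19 entries. */
#define VEVENTQ_MAX_DEPTH (1U << 19)

static_assert(VEVENTQ_MAX_DEPTH == 524288, "1 << 19 is 524288 entries");

/* Hypothetical helper: accept depths in the range [1, VEVENTQ_MAX_DEPTH]. */
static int check_veventq_depth(uint32_t depth)
{
	if (!depth || depth > VEVENTQ_MAX_DEPTH)
		return -EINVAL;
	return 0;
}
```

The cap itself is a valid depth, so the first rejected value on the high side is VEVENTQ_MAX_DEPTH + 1.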
Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/eventq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c
index 78689fb52d24c..1f1e415285b1a 100644
--- a/drivers/iommu/iommufd/eventq.c
+++ b/drivers/iommu/iommufd/eventq.c
@@ -473,6 +473,9 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
static const struct file_operations iommufd_veventq_fops =
INIT_EVENTQ_FOPS(iommufd_veventq_fops_read, NULL);
+/* An arbitrary upper bound for veventq_depth that fits all existing HWs */
+#define VEVENTQ_MAX_DEPTH (1U << 19)
+
int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
{
struct iommu_veventq_alloc *cmd = ucmd->cmd;
@@ -484,7 +487,7 @@ int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
if (cmd->flags || cmd->__reserved ||
cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
return -EOPNOTSUPP;
- if (!cmd->veventq_depth)
+ if (!cmd->veventq_depth || cmd->veventq_depth > VEVENTQ_MAX_DEPTH)
return -EINVAL;
viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
--
2.43.0
* [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth
From: Nicolin Chen @ 2026-05-22 0:36 UTC
To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest
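Add tests for the lower and upper bounds of veventq_depth, covering the
memory exhaustion vulnerability.
Pass the depth as a new argument to the veventq alloc helpers, keeping
veventq_depth=2 for the existing callers.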
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
tools/testing/selftests/iommu/iommufd.c | 19 +++++++++++++++++--
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
3 files changed, 27 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 5502751d500c8..b4928cbd4d9c8 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1060,12 +1060,13 @@ static int _test_cmd_hw_queue_alloc(int fd, __u32 viommu_id, __u32 type,
base_addr, len, out_qid))
static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
- __u32 *veventq_id, __u32 *veventq_fd)
+ __u32 depth, __u32 *veventq_id,
+ __u32 *veventq_fd)
{
struct iommu_veventq_alloc cmd = {
.size = sizeof(cmd),
.type = type,
- .veventq_depth = 2,
+ .veventq_depth = depth,
.viommu_id = viommu_id,
};
int ret;
@@ -1080,13 +1081,13 @@ static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
return 0;
}
-#define test_cmd_veventq_alloc(viommu_id, type, veventq_id, veventq_fd) \
- ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+#define test_cmd_veventq_alloc(viommu_id, type, depth, veventq_id, veventq_fd) \
+ ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \
veventq_id, veventq_fd))
-#define test_err_veventq_alloc(_errno, viommu_id, type, veventq_id, \
- veventq_fd) \
- EXPECT_ERRNO(_errno, \
- _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+#define test_err_veventq_alloc(_errno, viommu_id, type, depth, veventq_id, \
+ veventq_fd) \
+ EXPECT_ERRNO(_errno, \
+ _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \
veventq_id, veventq_fd))
static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d1fe5dbc2813e..2e8a27dab0bb8 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2986,11 +2986,26 @@ TEST_F(iommufd_viommu, vdevice_alloc)
test_err_mock_domain_replace(ENOENT, self->stdev_id,
self->nested_hwpt_id);
+ /* Test depth lower and upper bounds (mirrors kernel cap) */
+#define VEVENTQ_MAX_DEPTH (1U << 19)
+ test_err_veventq_alloc(EINVAL, viommu_id,
+ IOMMU_VEVENTQ_TYPE_SELFTEST, 0, NULL,
+ NULL);
+ test_err_veventq_alloc(EINVAL, viommu_id,
+ IOMMU_VEVENTQ_TYPE_SELFTEST,
+ VEVENTQ_MAX_DEPTH + 1, NULL, NULL);
+ test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
+ VEVENTQ_MAX_DEPTH, &veventq_id,
+ &veventq_fd);
+ close(veventq_fd);
+ test_ioctl_destroy(veventq_id);
+
/* Allocate a vEVENTQ with veventq_depth=2 */
test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
- &veventq_id, &veventq_fd);
+ 2, &veventq_id, &veventq_fd);
test_err_veventq_alloc(EEXIST, viommu_id,
- IOMMU_VEVENTQ_TYPE_SELFTEST, NULL, NULL);
+ IOMMU_VEVENTQ_TYPE_SELFTEST, 2, NULL,
+ NULL);
/* Set vdev_id to 0x99, unset it, and set to 0x88 */
test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
test_cmd_mock_domain_replace(self->stdev_id,
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 45c14323a6183..25495d8dceb3d 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -712,7 +712,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
return -1;
if (_test_cmd_veventq_alloc(self->fd, viommu_id,
- IOMMU_VEVENTQ_TYPE_SELFTEST, &veventq_id,
+ IOMMU_VEVENTQ_TYPE_SELFTEST, 2, &veventq_id,
&veventq_fd))
return -1;
close(veventq_fd);
--
2.43.0