* [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary
@ 2026-05-22 0:36 Nicolin Chen
2026-05-22 0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-05-22 0:36 UTC (permalink / raw)
To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest
The upper bound of veventq_depth has been missing for veventq allocation,
leaving a vulnerability where userspace could exhaust atomic memory pool.
Fix it properly:
- Allocate outside the spinlock to avoid GFP_ATOMIC
- Cap the veventq_depth upper bound
- Fix event_data byte-count
- Add selftest coverage
Note that QEMU's SMMU has been already allocating veventq using a "HW"
EVTQ entry number. So, picking 19 as the known use case, for a minimal
level of ABI consistency.
This is on github:
https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v2
Changelog:
v2
* Add Reviewed-by from Jason
* Rebase on Jason's for-rc tree
* Update commit message for clarification
* Move "data_len byte-count" to the first
* Drop optimistic read in the allocation path
v1
https://lore.kernel.org/all/cover.1779070992.git.nicolinc@nvidia.com/
Nicolin Chen (4):
iommufd: Fix data_len byte-count vs element-count mismatch
iommufd: Move vevent memory allocation outside spinlock
iommufd: Set veventq_depth upper bound
iommufd/selftest: Add boundary tests for veventq_depth
drivers/iommu/iommufd/iommufd_private.h | 2 +-
tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
drivers/iommu/iommufd/driver.c | 13 ++++++++-----
drivers/iommu/iommufd/eventq.c | 5 ++++-
tools/testing/selftests/iommu/iommufd.c | 19 +++++++++++++++++--
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
6 files changed, 40 insertions(+), 18 deletions(-)
base-commit: be93d186ae88a92e7aa77e122d4e661fa57b1e39
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch 2026-05-22 0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen @ 2026-05-22 0:36 ` Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen ` (2 subsequent siblings) 3 siblings, 0 replies; 5+ messages in thread From: Nicolin Chen @ 2026-05-22 0:36 UTC (permalink / raw) To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest kzalloc_flex() computes the allocation size. With event_data typed as u64, data_len is interpreted as a u64 element count. Yet, every caller and the read path treat data_len as a byte count. The current code over-allocates by sizeof(u64) and the __counted_by() annotation overstates the length by the same factor. Re-type event_data as u8. No functional change in user-visible behavior. Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/iommu/iommufd/iommufd_private.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 6ac1965199e9a..43fbc5bed8de3 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -602,7 +602,7 @@ struct iommufd_vevent { struct iommufd_vevent_header header; struct list_head node; /* for iommufd_eventq::deliver */ ssize_t data_len; - u64 event_data[] __counted_by(data_len); + u8 event_data[] __counted_by(data_len); }; #define vevent_for_lost_events_header(vevent) \ -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock 2026-05-22 0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen @ 2026-05-22 0:36 ` Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen 3 siblings, 0 replies; 5+ messages in thread From: Nicolin Chen @ 2026-05-22 0:36 UTC (permalink / raw) To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest The veventq memory allocation happens inside the spinlock. Given its depth is decided by the user space, this leaves a vulnerability, where userspace can allocate large queues to exhaust atomic memory reserves. Move the allocation outside the spinlock and use GFP_NOWAIT, which can fail fast under memory pressure without dipping into the GFP_ATOMIC reserves or direct-reclaiming from the threaded IRQ handler. On allocation failure, queue the lost_events_header (so userspace learns of the drop) and return -ENOMEM so the caller learns of the kernel-side memory pressure. This is intentionally distinct from the queue-overflow path, which also queues the lost_events_header but returns 0: a full queue is an expected userspace-pacing condition rather than a kernel error. A subsequent change will cap the upper bound of the veventq_depth. Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/iommu/iommufd/driver.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c index 61e6b02601d1a..3b8067976eac0 100644 --- a/drivers/iommu/iommufd/driver.c +++ b/drivers/iommu/iommufd/driver.c @@ -149,15 +149,18 @@ int iommufd_viommu_report_event(struct iommufd_viommu *viommu, goto out_unlock_veventqs; } - spin_lock(&veventq->common.lock); - if (veventq->num_events == veventq->depth) { + /* Pre-allocate to avoid GFP_ATOMIC; use GFP_NOWAIT to avoid sleeping */ + vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_NOWAIT); + if (!vevent) { + spin_lock(&veventq->common.lock); vevent = &veventq->lost_events_header; + rc = -ENOMEM; goto out_set_header; } - vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_ATOMIC); - if (!vevent) { - rc = -ENOMEM; + spin_lock(&veventq->common.lock); + if (veventq->num_events == veventq->depth) { + kfree(vevent); vevent = &veventq->lost_events_header; goto out_set_header; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound 2026-05-22 0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen @ 2026-05-22 0:36 ` Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen 3 siblings, 0 replies; 5+ messages in thread From: Nicolin Chen @ 2026-05-22 0:36 UTC (permalink / raw) To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest iommufd_veventq_alloc() accepts any !0 veventq_depth from userspace, with an upper bound at U32_MAX. This leaves a vulnerability where userspace can allocate excessively large queues to exhaust kernel memory reserves. Cap the veventq_depth (maximum number of entries) to 1 << 19, matching the maximum number of entries in the SMMUv3 EVTQ (the largest use case today). Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/iommu/iommufd/eventq.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c index 78689fb52d24c..1f1e415285b1a 100644 --- a/drivers/iommu/iommufd/eventq.c +++ b/drivers/iommu/iommufd/eventq.c @@ -473,6 +473,9 @@ int iommufd_fault_iopf_handler(struct iopf_group *group) static const struct file_operations iommufd_veventq_fops = INIT_EVENTQ_FOPS(iommufd_veventq_fops_read, NULL); +/* An arbitrary upper bound for veventq_depth that fits all existing HWs */ +#define VEVENTQ_MAX_DEPTH (1U << 19) + int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd) { struct iommu_veventq_alloc *cmd = ucmd->cmd; @@ -484,7 +487,7 @@ int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd) if (cmd->flags || cmd->__reserved || cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT) return -EOPNOTSUPP; - if (!cmd->veventq_depth) + if (!cmd->veventq_depth || cmd->veventq_depth > VEVENTQ_MAX_DEPTH) return -EINVAL; viommu = iommufd_get_viommu(ucmd, cmd->viommu_id); -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth 2026-05-22 0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen ` (2 preceding siblings ...) 2026-05-22 0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen @ 2026-05-22 0:36 ` Nicolin Chen 3 siblings, 0 replies; 5+ messages in thread From: Nicolin Chen @ 2026-05-22 0:36 UTC (permalink / raw) To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest Test veventq_depth to cover a memory exhaustion vulnerability. Keep veventq_depth=2 for the existing callers. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++-------- tools/testing/selftests/iommu/iommufd.c | 19 +++++++++++++++++-- .../selftests/iommu/iommufd_fail_nth.c | 2 +- 3 files changed, 27 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 5502751d500c8..b4928cbd4d9c8 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -1060,12 +1060,13 @@ static int _test_cmd_hw_queue_alloc(int fd, __u32 viommu_id, __u32 type, base_addr, len, out_qid)) static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type, - __u32 *veventq_id, __u32 *veventq_fd) + __u32 depth, __u32 *veventq_id, + __u32 *veventq_fd) { struct iommu_veventq_alloc cmd = { .size = sizeof(cmd), .type = type, - .veventq_depth = 2, + .veventq_depth = depth, .viommu_id = viommu_id, }; int ret; @@ -1080,13 +1081,13 @@ static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type, return 0; } -#define test_cmd_veventq_alloc(viommu_id, type, veventq_id, veventq_fd) \ - ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, \ +#define test_cmd_veventq_alloc(viommu_id, type, depth, veventq_id, veventq_fd) \ + ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \ veventq_id, veventq_fd)) -#define test_err_veventq_alloc(_errno, viommu_id, type, veventq_id, \ - veventq_fd) \ - EXPECT_ERRNO(_errno, \ - _test_cmd_veventq_alloc(self->fd, viommu_id, type, \ +#define test_err_veventq_alloc(_errno, viommu_id, type, depth, veventq_id, \ + veventq_fd) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \ veventq_id, veventq_fd)) static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents) diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index d1fe5dbc2813e..2e8a27dab0bb8 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -2986,11 +2986,26 @@ TEST_F(iommufd_viommu, vdevice_alloc) test_err_mock_domain_replace(ENOENT, self->stdev_id, self->nested_hwpt_id); + /* Test depth lower and upper bounds (mirrors kernel cap) */ +#define VEVENTQ_MAX_DEPTH (1U << 19) + test_err_veventq_alloc(EINVAL, viommu_id, + IOMMU_VEVENTQ_TYPE_SELFTEST, 0, NULL, + NULL); + test_err_veventq_alloc(EINVAL, viommu_id, + IOMMU_VEVENTQ_TYPE_SELFTEST, + VEVENTQ_MAX_DEPTH + 1, NULL, NULL); + test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST, + VEVENTQ_MAX_DEPTH, &veventq_id, + &veventq_fd); + close(veventq_fd); + test_ioctl_destroy(veventq_id); + /* Allocate a vEVENTQ with veventq_depth=2 */ test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST, - &veventq_id, &veventq_fd); + 2, &veventq_id, &veventq_fd); test_err_veventq_alloc(EEXIST, viommu_id, - IOMMU_VEVENTQ_TYPE_SELFTEST, NULL, NULL); + IOMMU_VEVENTQ_TYPE_SELFTEST, 2, NULL, + NULL); /* Set vdev_id to 0x99, unset it, and set to 0x88 */ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); test_cmd_mock_domain_replace(self->stdev_id, diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index 45c14323a6183..25495d8dceb3d 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -712,7 +712,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) return -1; if (_test_cmd_veventq_alloc(self->fd, viommu_id, - IOMMU_VEVENTQ_TYPE_SELFTEST, &veventq_id, + IOMMU_VEVENTQ_TYPE_SELFTEST, 2, &veventq_id, &veventq_fd)) return -1; close(veventq_fd); -- 2.43.0 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-22 0:37 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-22 0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen 2026-05-22 0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox