[PATCH rc v2 0/4] iommufd: Fix veventq

The Linux Kernel Mailing List

* [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary
@ 2026-05-22  0:36 Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

The upper bound of veventq_depth has been missing for veventq allocation,
leaving a vulnerability where userspace could exhaust the atomic memory pool.

Fix it properly:
 - Allocate outside the spinlock to avoid GFP_ATOMIC
 - Cap the veventq_depth upper bound
 - Fix event_data byte-count
 - Add selftest coverage

Note that QEMU's SMMU already allocates its veventq using the "HW" EVTQ
entry number. So, pick 19 (a cap of 1 << 19 entries) to match this known
use case, for a minimal level of ABI consistency.

This is on github:
https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v2

Changelog:
v2
 * Add Reviewed-by from Jason
 * Rebase on Jason's for-rc tree
 * Update commit message for clarification
 * Move the "data_len byte-count" patch to the first position
 * Drop optimistic read in the allocation path
v1
 https://lore.kernel.org/all/cover.1779070992.git.nicolinc@nvidia.com/

Nicolin Chen (4):
  iommufd: Fix data_len byte-count vs element-count mismatch
  iommufd: Move vevent memory allocation outside spinlock
  iommufd: Set veventq_depth upper bound
  iommufd/selftest: Add boundary tests for veventq_depth

 drivers/iommu/iommufd/iommufd_private.h       |  2 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
 drivers/iommu/iommufd/driver.c                | 13 ++++++++-----
 drivers/iommu/iommufd/eventq.c                |  5 ++++-
 tools/testing/selftests/iommu/iommufd.c       | 19 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 6 files changed, 40 insertions(+), 18 deletions(-)


base-commit: be93d186ae88a92e7aa77e122d4e661fa57b1e39
-- 
2.43.0



* [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

kzalloc_flex() computes the allocation size from the element type of the
flexible array. With event_data typed as u64, data_len is interpreted as a
u64 element count. Yet, every caller and the read path treat data_len as a
byte count. The current code over-allocates by a factor of sizeof(u64) and
the __counted_by() annotation overstates the length by the same factor.

Re-type event_data as u8. No functional change in user-visible behavior.
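
To make the mismatch concrete, here is a minimal userspace sketch of the
sizing arithmetic. It assumes the allocation helper sizes the object as
"header + count * sizeof(element)". The struct names, FLEX_SIZE() and the
payload_bytes_*() helpers are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct iommufd_vevent, before and after. */
struct vevent_u64 {
	long data_len;
	uint64_t event_data[];	/* old typing: each element is 8 bytes */
};

struct vevent_u8 {
	long data_len;
	uint8_t event_data[];	/* new typing: each element is 1 byte */
};

/* Assumed sizing rule: header size + count * sizeof(one element). */
#define FLEX_SIZE(type, member, count) \
	(offsetof(type, member) + (size_t)(count) * sizeof(((type *)0)->member[0]))

/* Bytes reserved for the payload when data_len counts u64 elements. */
static size_t payload_bytes_u64(size_t data_len)
{
	assert(data_len <= SIZE_MAX / sizeof(uint64_t)); /* no overflow */
	return FLEX_SIZE(struct vevent_u64, event_data, data_len) -
	       offsetof(struct vevent_u64, event_data);
}

/* Bytes reserved for the payload when data_len counts bytes. */
static size_t payload_bytes_u8(size_t data_len)
{
	return FLEX_SIZE(struct vevent_u8, event_data, data_len) -
	       offsetof(struct vevent_u8, event_data);
}
```

With a caller passing data_len = 16 (meaning 16 bytes), the u64 typing
reserves 128 payload bytes while the u8 typing reserves exactly 16.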

Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9a..43fbc5bed8de3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -602,7 +602,7 @@ struct iommufd_vevent {
 	struct iommufd_vevent_header header;
 	struct list_head node; /* for iommufd_eventq::deliver */
 	ssize_t data_len;
-	u64 event_data[] __counted_by(data_len);
+	u8 event_data[] __counted_by(data_len);
 };
 
 #define vevent_for_lost_events_header(vevent) \
-- 
2.43.0



* [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen
  3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

The veventq memory allocation happens inside the spinlock. Given that the
queue depth is decided by userspace, this leaves a vulnerability where
userspace can allocate large queues to exhaust atomic memory reserves.

Move the allocation outside the spinlock and use GFP_NOWAIT, which can fail
fast under memory pressure without dipping into the GFP_ATOMIC reserves or
direct-reclaiming from the threaded IRQ handler. On allocation failure,
queue the lost_events_header (so userspace learns of the drop) and return
-ENOMEM so the caller learns of the kernel-side memory pressure.

This is intentionally distinct from the queue-overflow path, which also
queues the lost_events_header but returns 0: a full queue is an expected
userspace-pacing condition rather than a kernel error.

A subsequent change will cap the upper bound of the veventq_depth.
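
The ordering described above can be sketched in userspace as follows. This
is only an illustration under stated assumptions: a C11 atomic_flag stands
in for the spinlock, a caller-supplied allocator stands in for the
GFP_NOWAIT allocation, and a "lost" counter stands in for queuing the
lost_events_header. All names here (struct queue, report_event, and so on)
are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct event {
	struct event *next;
	size_t len;
	unsigned char data[];
};

struct queue {
	atomic_flag lock;	/* stand-in for the spinlock */
	unsigned int num_events;
	unsigned int depth;
	unsigned int lost;	/* stand-in for the lost_events_header */
	struct event *head;
};

typedef void *(*alloc_fn)(size_t);

static void q_lock(struct queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin */
}

static void q_unlock(struct queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void queue_init(struct queue *q, unsigned int depth)
{
	assert(depth > 0);
	atomic_flag_clear(&q->lock);
	q->num_events = 0;
	q->depth = depth;
	q->lost = 0;
	q->head = NULL;
}

static int report_event(struct queue *q, const void *data, size_t len,
			alloc_fn alloc)
{
	/* Allocate before taking the lock, so the allocator never runs under it */
	struct event *ev = alloc(sizeof(*ev) + len);
	int rc = 0;

	q_lock(q);
	if (!ev) {
		q->lost++;		/* allocation failure: record the drop */
		rc = -ENOMEM;		/* and report the memory pressure */
	} else if (q->num_events == q->depth) {
		free(ev);		/* queue full: discard the pre-allocation */
		q->lost++;		/* record the drop, but still return 0 */
	} else {
		memcpy(ev->data, data, len);
		ev->len = len;
		ev->next = q->head;
		q->head = ev;
		q->num_events++;
	}
	q_unlock(q);
	return rc;
}

/* Allocator that always fails, to exercise the -ENOMEM path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

static void queue_drain(struct queue *q)
{
	q_lock(q);
	while (q->head) {
		struct event *ev = q->head;

		q->head = ev->next;
		free(ev);
		q->num_events--;
	}
	q_unlock(q);
}
```

Note how the two drop paths differ only in the return code: a full queue
returns 0, while a failed allocation returns -ENOMEM, matching the
distinction drawn in the commit message.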

Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/driver.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 61e6b02601d1a..3b8067976eac0 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -149,15 +149,18 @@ int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
 		goto out_unlock_veventqs;
 	}
 
-	spin_lock(&veventq->common.lock);
-	if (veventq->num_events == veventq->depth) {
+	/* Pre-allocate to avoid GFP_ATOMIC; use GFP_NOWAIT to avoid sleeping */
+	vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_NOWAIT);
+	if (!vevent) {
+		spin_lock(&veventq->common.lock);
 		vevent = &veventq->lost_events_header;
+		rc = -ENOMEM;
 		goto out_set_header;
 	}
 
-	vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_ATOMIC);
-	if (!vevent) {
-		rc = -ENOMEM;
+	spin_lock(&veventq->common.lock);
+	if (veventq->num_events == veventq->depth) {
+		kfree(vevent);
 		vevent = &veventq->lost_events_header;
 		goto out_set_header;
 	}
-- 
2.43.0



* [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen
  3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

iommufd_veventq_alloc() accepts any non-zero veventq_depth from userspace,
so the only upper bound is U32_MAX.

This leaves a vulnerability where userspace can allocate excessively large
queues to exhaust kernel memory reserves.

Cap the veventq_depth (maximum number of entries) to 1 << 19, matching the
maximum number of entries in the SMMUv3 EVTQ (the largest use case today).
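
As a rough illustration of what the cap buys, the sketch below mirrors the
depth check in plain userspace C and computes a worst-case payload
footprint. check_veventq_depth() and worst_case_bytes() are hypothetical
helpers, and the per-event size passed to worst_case_bytes() is an
illustrative parameter, not a value taken from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as the cap introduced by this patch. */
#define VEVENTQ_MAX_DEPTH (1U << 19)

/* Mirror of the check: reject a zero depth and anything above the cap. */
static int check_veventq_depth(uint32_t depth)
{
	if (!depth || depth > VEVENTQ_MAX_DEPTH)
		return -EINVAL;
	return 0;
}

/*
 * Worst-case payload bytes a full queue could hold, for a hypothetical
 * fixed per-event payload size. Struct overhead is ignored.
 */
static uint64_t worst_case_bytes(uint32_t depth, uint32_t bytes_per_event)
{
	assert(check_veventq_depth(depth) == 0);
	return (uint64_t)depth * bytes_per_event;
}
```

For example, with an assumed 32-byte payload per event, a queue at the cap
holds at most 16 MiB of payload, whereas an uncapped U32_MAX depth would
allow roughly 128 GiB.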

Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/eventq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c
index 78689fb52d24c..1f1e415285b1a 100644
--- a/drivers/iommu/iommufd/eventq.c
+++ b/drivers/iommu/iommufd/eventq.c
@@ -473,6 +473,9 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
 static const struct file_operations iommufd_veventq_fops =
 	INIT_EVENTQ_FOPS(iommufd_veventq_fops_read, NULL);
 
+/* An arbitrary upper bound for veventq_depth that fits all existing HWs */
+#define VEVENTQ_MAX_DEPTH (1U << 19)
+
 int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_veventq_alloc *cmd = ucmd->cmd;
@@ -484,7 +487,7 @@ int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
 	if (cmd->flags || cmd->__reserved ||
 	    cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
 		return -EOPNOTSUPP;
-	if (!cmd->veventq_depth)
+	if (!cmd->veventq_depth || cmd->veventq_depth > VEVENTQ_MAX_DEPTH)
 		return -EINVAL;
 
 	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
-- 
2.43.0



* [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
                   ` (2 preceding siblings ...)
  2026-05-22  0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  3 siblings, 0 replies; 5+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

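Add boundary tests for veventq_depth to cover a memory exhaustion
vulnerability: depths of 0 and VEVENTQ_MAX_DEPTH + 1 must fail with EINVAL,
while VEVENTQ_MAX_DEPTH must succeed.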

Keep veventq_depth=2 for the existing callers.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
 tools/testing/selftests/iommu/iommufd.c       | 19 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 5502751d500c8..b4928cbd4d9c8 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1060,12 +1060,13 @@ static int _test_cmd_hw_queue_alloc(int fd, __u32 viommu_id, __u32 type,
 					      base_addr, len, out_qid))
 
 static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
-				   __u32 *veventq_id, __u32 *veventq_fd)
+				   __u32 depth, __u32 *veventq_id,
+				   __u32 *veventq_fd)
 {
 	struct iommu_veventq_alloc cmd = {
 		.size = sizeof(cmd),
 		.type = type,
-		.veventq_depth = 2,
+		.veventq_depth = depth,
 		.viommu_id = viommu_id,
 	};
 	int ret;
@@ -1080,13 +1081,13 @@ static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
 	return 0;
 }
 
-#define test_cmd_veventq_alloc(viommu_id, type, veventq_id, veventq_fd) \
-	ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+#define test_cmd_veventq_alloc(viommu_id, type, depth, veventq_id, veventq_fd) \
+	ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \
 					     veventq_id, veventq_fd))
-#define test_err_veventq_alloc(_errno, viommu_id, type, veventq_id,     \
-			       veventq_fd)                              \
-	EXPECT_ERRNO(_errno,                                            \
-		     _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+#define test_err_veventq_alloc(_errno, viommu_id, type, depth, veventq_id,     \
+			       veventq_fd)                                     \
+	EXPECT_ERRNO(_errno,                                                   \
+		     _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \
 					     veventq_id, veventq_fd))
 
 static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d1fe5dbc2813e..2e8a27dab0bb8 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2986,11 +2986,26 @@ TEST_F(iommufd_viommu, vdevice_alloc)
 		test_err_mock_domain_replace(ENOENT, self->stdev_id,
 					     self->nested_hwpt_id);
 
+		/* Test depth lower and upper bounds (mirrors kernel cap) */
+#define VEVENTQ_MAX_DEPTH (1U << 19)
+		test_err_veventq_alloc(EINVAL, viommu_id,
+				       IOMMU_VEVENTQ_TYPE_SELFTEST, 0, NULL,
+				       NULL);
+		test_err_veventq_alloc(EINVAL, viommu_id,
+				       IOMMU_VEVENTQ_TYPE_SELFTEST,
+				       VEVENTQ_MAX_DEPTH + 1, NULL, NULL);
+		test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
+				       VEVENTQ_MAX_DEPTH, &veventq_id,
+				       &veventq_fd);
+		close(veventq_fd);
+		test_ioctl_destroy(veventq_id);
+
 		/* Allocate a vEVENTQ with veventq_depth=2 */
 		test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
-				       &veventq_id, &veventq_fd);
+				       2, &veventq_id, &veventq_fd);
 		test_err_veventq_alloc(EEXIST, viommu_id,
-				       IOMMU_VEVENTQ_TYPE_SELFTEST, NULL, NULL);
+				       IOMMU_VEVENTQ_TYPE_SELFTEST, 2, NULL,
+				       NULL);
 		/* Set vdev_id to 0x99, unset it, and set to 0x88 */
 		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
 		test_cmd_mock_domain_replace(self->stdev_id,
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 45c14323a6183..25495d8dceb3d 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -712,7 +712,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 		return -1;
 
 	if (_test_cmd_veventq_alloc(self->fd, viommu_id,
-				    IOMMU_VEVENTQ_TYPE_SELFTEST, &veventq_id,
+				    IOMMU_VEVENTQ_TYPE_SELFTEST, 2, &veventq_id,
 				    &veventq_fd))
 		return -1;
 	close(veventq_fd);
-- 
2.43.0


