[PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary

* [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary
@ 2026-05-22  0:36 Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

The upper bound of veventq_depth has been missing for veventq allocation,
leaving a vulnerability where userspace could exhaust the atomic memory pool.

Fix it properly:
 - Allocate outside the spinlock to avoid GFP_ATOMIC
 - Cap the veventq_depth upper bound
 - Fix event_data byte-count
 - Add selftest coverage

Note that QEMU's SMMU already allocates its veventq using a "HW" EVTQ entry
number. So, pick 1 << 19, the largest known use case, as the cap for a
minimal level of ABI consistency.

This is on github:
https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v2

Changelog:
v2
 * Add Reviewed-by from Jason
 * Rebase on Jason's for-rc tree
 * Update commit message for clarification
 * Move the "data_len byte-count" patch to the first position
 * Drop optimistic read in the allocation path
v1
 https://lore.kernel.org/all/cover.1779070992.git.nicolinc@nvidia.com/

Nicolin Chen (4):
  iommufd: Fix data_len byte-count vs element-count mismatch
  iommufd: Move vevent memory allocation outside spinlock
  iommufd: Set veventq_depth upper bound
  iommufd/selftest: Add boundary tests for veventq_depth

 drivers/iommu/iommufd/iommufd_private.h       |  2 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
 drivers/iommu/iommufd/driver.c                | 13 ++++++++-----
 drivers/iommu/iommufd/eventq.c                |  5 ++++-
 tools/testing/selftests/iommu/iommufd.c       | 19 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 6 files changed, 40 insertions(+), 18 deletions(-)


base-commit: be93d186ae88a92e7aa77e122d4e661fa57b1e39
-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-25  6:49   ` Tian, Kevin
  2026-05-22  0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

kzalloc_flex() computes the allocation size. With event_data typed as u64,
data_len is interpreted as a u64 element count. Yet, every caller and the
read path treat data_len as a byte count. The current code over-allocates
by a factor of sizeof(u64) and the __counted_by() annotation overstates the
length by the same factor.

Re-type event_data as u8. No functional change in user-visible behavior.

Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 6ac1965199e9a..43fbc5bed8de3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -602,7 +602,7 @@ struct iommufd_vevent {
 	struct iommufd_vevent_header header;
 	struct list_head node; /* for iommufd_eventq::deliver */
 	ssize_t data_len;
-	u64 event_data[] __counted_by(data_len);
+	u8 event_data[] __counted_by(data_len);
 };
 
 #define vevent_for_lost_events_header(vevent) \
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread
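
The size mismatch described in patch 1/4 can be illustrated with a small
user-space sketch. It assumes that kzalloc_flex() sizes the allocation as the
offset of the flexible array plus count * sizeof(element). The two structs,
the FLEX_SIZE() macro and overallocated_bytes() are hypothetical stand-ins
for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for struct iommufd_vevent before and after the fix */
struct vevent_u64 {
	ptrdiff_t data_len;
	uint64_t event_data[];	/* data_len would count u64 elements */
};

struct vevent_u8 {
	ptrdiff_t data_len;
	uint8_t event_data[];	/* data_len counts bytes */
};

/* Hypothetical sizing helper: header plus count elements of the array type */
#define FLEX_SIZE(type, member, count) \
	(offsetof(type, member) + \
	 (size_t)(count) * sizeof(((type *)0)->member[0]))

/* Extra bytes requested when a byte count is passed for a u64 array */
static size_t overallocated_bytes(size_t data_len)
{
	size_t as_u64 = FLEX_SIZE(struct vevent_u64, event_data, data_len);
	size_t as_u8 = FLEX_SIZE(struct vevent_u8, event_data, data_len);

	assert(as_u64 >= as_u8);
	return as_u64 - as_u8;
}
```

With a 16-byte payload, the u64-typed layout asks for 128 payload bytes while
the u8-typed layout asks for 16, which is the factor of sizeof(u64) mentioned
in the commit message.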

* [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-25  6:50   ` Tian, Kevin
  2026-05-22  0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

The veventq memory allocation happens inside the spinlock. Given that its
depth is decided by userspace, this leaves a vulnerability where userspace
can allocate large queues to exhaust atomic memory reserves.

Move the allocation outside the spinlock and use GFP_NOWAIT, which can fail
fast under memory pressure without dipping into the GFP_ATOMIC reserves or
direct-reclaiming from the threaded IRQ handler. On allocation failure,
queue the lost_events_header (so userspace learns of the drop) and return
-ENOMEM so the caller learns of the kernel-side memory pressure.

This is intentionally distinct from the queue-overflow path, which also
queues the lost_events_header but returns 0: a full queue is an expected
userspace-pacing condition rather than a kernel error.

A subsequent change will cap the upper bound of the veventq_depth.

Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/driver.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 61e6b02601d1a..3b8067976eac0 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -149,15 +149,18 @@ int iommufd_viommu_report_event(struct iommufd_viommu *viommu,
 		goto out_unlock_veventqs;
 	}
 
-	spin_lock(&veventq->common.lock);
-	if (veventq->num_events == veventq->depth) {
+	/* Pre-allocate to avoid GFP_ATOMIC; use GFP_NOWAIT to avoid sleeping */
+	vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_NOWAIT);
+	if (!vevent) {
+		spin_lock(&veventq->common.lock);
 		vevent = &veventq->lost_events_header;
+		rc = -ENOMEM;
 		goto out_set_header;
 	}
 
-	vevent = kzalloc_flex(*vevent, event_data, data_len, GFP_ATOMIC);
-	if (!vevent) {
-		rc = -ENOMEM;
+	spin_lock(&veventq->common.lock);
+	if (veventq->num_events == veventq->depth) {
+		kfree(vevent);
 		vevent = &veventq->lost_events_header;
 		goto out_set_header;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread
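
The control flow introduced by patch 2/4 can be sketched in user-space C as
follows. This is only an illustration under stated assumptions: a C11
atomic_flag stands in for the kernel spinlock, an injectable allocator stands
in for a GFP_NOWAIT allocation that may fail, and a lost_events counter
stands in for queuing the lost_events_header. All names (struct queue,
report_event(), failing_alloc()) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct event {
	struct event *next;
	size_t data_len;
	unsigned char data[];
};

struct queue {
	atomic_flag lock;		/* stand-in for the spinlock */
	unsigned int depth;
	unsigned int num_events;
	unsigned int lost_events;	/* stand-in for lost_events_header */
	struct event *head;
};

typedef void *(*alloc_fn)(size_t size);

/* Allocator that always fails, to model memory pressure */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

static void queue_init(struct queue *q, unsigned int depth)
{
	memset(q, 0, sizeof(*q));
	atomic_flag_clear(&q->lock);
	q->depth = depth;
}

static void queue_lock(struct queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock,
						 memory_order_acquire))
		;
}

static void queue_unlock(struct queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static int report_event(struct queue *q, alloc_fn alloc, const void *data,
			size_t len)
{
	struct event *ev;
	int rc = 0;

	assert(q && alloc && data);

	/* Allocate before taking the lock, so no allocation happens under it */
	ev = alloc(sizeof(*ev) + len);

	queue_lock(q);
	if (!ev) {
		/* Allocation failed: record the drop and report an error */
		q->lost_events++;
		rc = -ENOMEM;
	} else if (q->num_events == q->depth) {
		/* Queue full: record the drop, but this is not an error */
		free(ev);
		q->lost_events++;
	} else {
		ev->data_len = len;
		memcpy(ev->data, data, len);
		ev->next = q->head;	/* ordering is ignored in this sketch */
		q->head = ev;
		q->num_events++;
	}
	queue_unlock(q);
	return rc;
}
```

Calling report_event() three times on a queue of depth 2 returns 0 each time
and leaves one lost event recorded, whereas a call with failing_alloc returns
-ENOMEM. This mirrors the distinction the commit message draws between queue
overflow and allocation failure.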

* [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
  2026-05-22  0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-25  6:52   ` Tian, Kevin
  2026-05-22  0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen
  2026-06-05 13:43 ` [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Jason Gunthorpe
  4 siblings, 1 reply; 11+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

iommufd_veventq_alloc() accepts any !0 veventq_depth from userspace, with
an upper bound at U32_MAX.

This leaves a vulnerability where userspace can allocate excessively large
queues to exhaust kernel memory reserves.

Cap the veventq_depth (maximum number of entries) to 1 << 19, matching the
maximum number of entries in the SMMUv3 EVTQ (the largest use case today).

Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/eventq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c
index 78689fb52d24c..1f1e415285b1a 100644
--- a/drivers/iommu/iommufd/eventq.c
+++ b/drivers/iommu/iommufd/eventq.c
@@ -473,6 +473,9 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
 static const struct file_operations iommufd_veventq_fops =
 	INIT_EVENTQ_FOPS(iommufd_veventq_fops_read, NULL);
 
+/* An arbitrary upper bound for veventq_depth that fits all existing HWs */
+#define VEVENTQ_MAX_DEPTH (1U << 19)
+
 int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_veventq_alloc *cmd = ucmd->cmd;
@@ -484,7 +487,7 @@ int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
 	if (cmd->flags || cmd->__reserved ||
 	    cmd->type == IOMMU_VEVENTQ_TYPE_DEFAULT)
 		return -EOPNOTSUPP;
-	if (!cmd->veventq_depth)
+	if (!cmd->veventq_depth || cmd->veventq_depth > VEVENTQ_MAX_DEPTH)
 		return -EINVAL;
 
 	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread
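
The bounds check added by patch 3/4 can be exercised in isolation with a
user-space sketch. The VEVENTQ_MAX_DEPTH value is copied from the patch;
check_veventq_depth() and depth_from_log2() are hypothetical helpers that
only mirror the condition in iommufd_veventq_alloc() and the "1 << 19"
arithmetic from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as the cap defined in the patch */
#define VEVENTQ_MAX_DEPTH (1U << 19)

/* Convert a log2 queue size (e.g. 19) into a number of entries */
static uint32_t depth_from_log2(unsigned int log2size)
{
	assert(log2size < 32);	/* shifting a 32-bit value by >= 32 is undefined */
	return UINT32_C(1) << log2size;
}

/* Mirror of the depth validation: reject zero and anything above the cap */
static int check_veventq_depth(uint32_t depth)
{
	if (!depth || depth > VEVENTQ_MAX_DEPTH)
		return -EINVAL;
	return 0;
}
```

A log2 size of 19 yields 524288 entries, which is accepted, while 0 and
524289 are rejected with -EINVAL. These are the same three boundary cases the
selftest in patch 4/4 covers.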

* [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
                   ` (2 preceding siblings ...)
  2026-05-22  0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen
@ 2026-05-22  0:36 ` Nicolin Chen
  2026-05-25  6:52   ` Tian, Kevin
  2026-06-05 13:43 ` [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Jason Gunthorpe
  4 siblings, 1 reply; 11+ messages in thread
From: Nicolin Chen @ 2026-05-22  0:36 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: iommu, linux-kernel, linux-kselftest

Test veventq_depth to cover a memory exhaustion vulnerability.

Keep veventq_depth=2 for the existing callers.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
 tools/testing/selftests/iommu/iommufd.c       | 19 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 5502751d500c8..b4928cbd4d9c8 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1060,12 +1060,13 @@ static int _test_cmd_hw_queue_alloc(int fd, __u32 viommu_id, __u32 type,
 					      base_addr, len, out_qid))
 
 static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
-				   __u32 *veventq_id, __u32 *veventq_fd)
+				   __u32 depth, __u32 *veventq_id,
+				   __u32 *veventq_fd)
 {
 	struct iommu_veventq_alloc cmd = {
 		.size = sizeof(cmd),
 		.type = type,
-		.veventq_depth = 2,
+		.veventq_depth = depth,
 		.viommu_id = viommu_id,
 	};
 	int ret;
@@ -1080,13 +1081,13 @@ static int _test_cmd_veventq_alloc(int fd, __u32 viommu_id, __u32 type,
 	return 0;
 }
 
-#define test_cmd_veventq_alloc(viommu_id, type, veventq_id, veventq_fd) \
-	ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+#define test_cmd_veventq_alloc(viommu_id, type, depth, veventq_id, veventq_fd) \
+	ASSERT_EQ(0, _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \
 					     veventq_id, veventq_fd))
-#define test_err_veventq_alloc(_errno, viommu_id, type, veventq_id,     \
-			       veventq_fd)                              \
-	EXPECT_ERRNO(_errno,                                            \
-		     _test_cmd_veventq_alloc(self->fd, viommu_id, type, \
+#define test_err_veventq_alloc(_errno, viommu_id, type, depth, veventq_id,     \
+			       veventq_fd)                                     \
+	EXPECT_ERRNO(_errno,                                                   \
+		     _test_cmd_veventq_alloc(self->fd, viommu_id, type, depth, \
 					     veventq_id, veventq_fd))
 
 static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d1fe5dbc2813e..2e8a27dab0bb8 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2986,11 +2986,26 @@ TEST_F(iommufd_viommu, vdevice_alloc)
 		test_err_mock_domain_replace(ENOENT, self->stdev_id,
 					     self->nested_hwpt_id);
 
+		/* Test depth lower and upper bounds (mirrors kernel cap) */
+#define VEVENTQ_MAX_DEPTH (1U << 19)
+		test_err_veventq_alloc(EINVAL, viommu_id,
+				       IOMMU_VEVENTQ_TYPE_SELFTEST, 0, NULL,
+				       NULL);
+		test_err_veventq_alloc(EINVAL, viommu_id,
+				       IOMMU_VEVENTQ_TYPE_SELFTEST,
+				       VEVENTQ_MAX_DEPTH + 1, NULL, NULL);
+		test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
+				       VEVENTQ_MAX_DEPTH, &veventq_id,
+				       &veventq_fd);
+		close(veventq_fd);
+		test_ioctl_destroy(veventq_id);
+
 		/* Allocate a vEVENTQ with veventq_depth=2 */
 		test_cmd_veventq_alloc(viommu_id, IOMMU_VEVENTQ_TYPE_SELFTEST,
-				       &veventq_id, &veventq_fd);
+				       2, &veventq_id, &veventq_fd);
 		test_err_veventq_alloc(EEXIST, viommu_id,
-				       IOMMU_VEVENTQ_TYPE_SELFTEST, NULL, NULL);
+				       IOMMU_VEVENTQ_TYPE_SELFTEST, 2, NULL,
+				       NULL);
 		/* Set vdev_id to 0x99, unset it, and set to 0x88 */
 		test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
 		test_cmd_mock_domain_replace(self->stdev_id,
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 45c14323a6183..25495d8dceb3d 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -712,7 +712,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 		return -1;
 
 	if (_test_cmd_veventq_alloc(self->fd, viommu_id,
-				    IOMMU_VEVENTQ_TYPE_SELFTEST, &veventq_id,
+				    IOMMU_VEVENTQ_TYPE_SELFTEST, 2, &veventq_id,
 				    &veventq_fd))
 		return -1;
 	close(veventq_fd);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch
  2026-05-22  0:36 ` [PATCH rc v2 1/4] iommufd: Fix data_len byte-count vs element-count mismatch Nicolin Chen
@ 2026-05-25  6:49   ` Tian, Kevin
  0 siblings, 0 replies; 11+ messages in thread
From: Tian, Kevin @ 2026-05-25  6:49 UTC (permalink / raw)
  To: Nicolin Chen, jgg@nvidia.com
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 22, 2026 8:37 AM
> 
> kzalloc_flex() computes the allocation size. With event_data typed as u64,
> data_len is interpreted as a u64 element count. Yet, every caller and the
> read path treat data_len as a byte count. The current code over-allocates
> by a factor of sizeof(u64) and the __counted_by() annotation overstates the
> length by the same factor.
> 
> Re-type event_data as u8. No functional change in user-visible behavior.
> 
> Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
> Cc: stable@vger.kernel.org
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock
  2026-05-22  0:36 ` [PATCH rc v2 2/4] iommufd: Move vevent memory allocation outside spinlock Nicolin Chen
@ 2026-05-25  6:50   ` Tian, Kevin
  0 siblings, 0 replies; 11+ messages in thread
From: Tian, Kevin @ 2026-05-25  6:50 UTC (permalink / raw)
  To: Nicolin Chen, jgg@nvidia.com
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 22, 2026 8:37 AM
> 
> The veventq memory allocation happens inside the spinlock. Given that its
> depth is decided by userspace, this leaves a vulnerability where userspace
> can allocate large queues to exhaust atomic memory reserves.
> 
> Move the allocation outside the spinlock and use GFP_NOWAIT, which can fail
> fast under memory pressure without dipping into the GFP_ATOMIC reserves or
> direct-reclaiming from the threaded IRQ handler. On allocation failure,
> queue the lost_events_header (so userspace learns of the drop) and return
> -ENOMEM so the caller learns of the kernel-side memory pressure.
> 
> This is intentionally distinct from the queue-overflow path, which also
> queues the lost_events_header but returns 0: a full queue is an expected
> userspace-pacing condition rather than a kernel error.
> 
> A subsequent change will cap the upper bound of the veventq_depth.
> 
> Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
> Cc: stable@vger.kernel.org
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound
  2026-05-22  0:36 ` [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound Nicolin Chen
@ 2026-05-25  6:52   ` Tian, Kevin
  2026-05-25 18:41     ` Nicolin Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Tian, Kevin @ 2026-05-25  6:52 UTC (permalink / raw)
  To: Nicolin Chen, jgg@nvidia.com
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 22, 2026 8:37 AM
> 
> iommufd_veventq_alloc() accepts any !0 veventq_depth from userspace, with
> an upper bound at U32_MAX.
> 
> This leaves a vulnerability where userspace can allocate excessively large
> queues to exhaust kernel memory reserves.
> 
> Cap the veventq_depth (maximum number of entries) to 1 << 19, matching the
> maximum number of entries in the SMMUv3 EVTQ (the largest use case today).

probably add a comment to uapi header that the maximum number of
supported veventq depth is implementation specific hence user may
expect -EINVAL returned if the specified value is too large?

> 
> Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC")
> Cc: stable@vger.kernel.org
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth
  2026-05-22  0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen
@ 2026-05-25  6:52   ` Tian, Kevin
  0 siblings, 0 replies; 11+ messages in thread
From: Tian, Kevin @ 2026-05-25  6:52 UTC (permalink / raw)
  To: Nicolin Chen, jgg@nvidia.com
  Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, May 22, 2026 8:37 AM
> 
> Test veventq_depth to cover a memory exhaustion vulnerability.
> 
> Keep veventq_depth=2 for the existing callers.
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH rc v2 3/4] iommufd: Set veventq_depth upper bound
  2026-05-25  6:52   ` Tian, Kevin
@ 2026-05-25 18:41     ` Nicolin Chen
  0 siblings, 0 replies; 11+ messages in thread
From: Nicolin Chen @ 2026-05-25 18:41 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: jgg@nvidia.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org

On Mon, May 25, 2026 at 06:52:38AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, May 22, 2026 8:37 AM
> > 
> > iommufd_veventq_alloc() accepts any !0 veventq_depth from userspace, with
> > an upper bound at U32_MAX.
> > 
> > This leaves a vulnerability where userspace can allocate excessively large
> > queues to exhaust kernel memory reserves.
> > 
> > Cap the veventq_depth (maximum number of entries) to 1 << 19, matching the
> > maximum number of entries in the SMMUv3 EVTQ (the largest use case today).
> 
> probably add a comment to uapi header that the maximum number of
> supported veventq depth is implementation specific hence user may
> expect -EINVAL returned if the specified value is too large?

Sure.

@@ -1267,7 +1267,9 @@ struct iommu_vevent_tegra241_cmdqv {
  * can have multiple FDs for different types, but is confined to one per @type.
  * User space should open the @out_veventq_fd to read vEVENTs out of a vEVENTQ,
  * if there are vEVENTs available. A vEVENTQ will lose events due to overflow,
- * if the number of the vEVENTs hits @veventq_depth.
+ * if the number of the vEVENTs hits @veventq_depth. The maximum @veventq_depth
+ * is implementation-specific; -EINVAL will be returned if the requested value
+ * exceeds it.
  *
  * Each vEVENT in a vEVENTQ encloses a struct iommufd_vevent_header followed by
  * a type-specific data structure, in a normal case:

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary
  2026-05-22  0:36 [PATCH rc v2 0/4] iommufd: Fix veventq_depth boundary Nicolin Chen
                   ` (3 preceding siblings ...)
  2026-05-22  0:36 ` [PATCH rc v2 4/4] iommufd/selftest: Add boundary tests for veventq_depth Nicolin Chen
@ 2026-06-05 13:43 ` Jason Gunthorpe
  4 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-06-05 13:43 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, iommu, linux-kernel, linux-kselftest

On Thu, May 21, 2026 at 05:36:31PM -0700, Nicolin Chen wrote:
> The upper bound of veventq_depth has been missing for veventq allocation,
> leaving a vulnerability where userspace could exhaust the atomic memory pool.
> 
> Fix it properly:
>  - Allocate outside the spinlock to avoid GFP_ATOMIC
>  - Cap the veventq_depth upper bound
>  - Fix event_data byte-count
>  - Add selftest coverage
> 
> Note that QEMU's SMMU already allocates its veventq using a "HW" EVTQ entry
> number. So, pick 1 << 19, the largest known use case, as the cap for a
> minimal level of ABI consistency.
> 
> This is on github:
> https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v2
> 
> Changelog:
> v2
>  * Add Reviewed-by from Jason
>  * Rebase on Jason's for-rc tree
>  * Update commit message for clarification
>  * Move the "data_len byte-count" patch to the first position
>  * Drop optimistic read in the allocation path
> v1
>  https://lore.kernel.org/all/cover.1779070992.git.nicolinc@nvidia.com/
> 
> Nicolin Chen (4):
>   iommufd: Fix data_len byte-count vs element-count mismatch
>   iommufd: Move vevent memory allocation outside spinlock
>   iommufd: Set veventq_depth upper bound
>   iommufd/selftest: Add boundary tests for veventq_depth

I applied this a few days ago

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread
