stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rc 0/3] Several iommufd bug fixes
@ 2023-07-25 19:05 Jason Gunthorpe
  2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2023-07-25 19:05 UTC (permalink / raw)
  To: iommu
  Cc: Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian, Lixiao Yang,
	Matthew Rosato, Nicolin Chen, stable, syzbot+7574ebfe589049630608,
	Terrence Xu, Yi Liu

Syzkaller found another small issue, a race with
iommufd_object_destroy_user() and IOMMUFD_DESTROY.

While researching that I noticed an error unwind mistake by inspection.

Yi and Nicolin stumbled on the last one running the test suite sometimes.

Jason Gunthorpe (3):
  iommufd/selftest: Do not try to destroy an access once it is attached
  iommufd: IOMMUFD_DESTROY should not increase the refcount
  iommufd: Set end correctly when doing batch carry

 drivers/iommu/iommufd/device.c          | 12 +---
 drivers/iommu/iommufd/iommufd_private.h | 15 ++++-
 drivers/iommu/iommufd/main.c            | 78 +++++++++++++++++++------
 drivers/iommu/iommufd/pages.c           |  2 +-
 drivers/iommu/iommufd/selftest.c        |  9 +--
 5 files changed, 81 insertions(+), 35 deletions(-)


base-commit: 6eaae198076080886b9e7d57f4ae06fa782f90ef
-- 
2.41.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached
  2023-07-25 19:05 [PATCH rc 0/3] Several iommufd bug fixes Jason Gunthorpe
@ 2023-07-25 19:05 ` Jason Gunthorpe
  2023-07-25 21:45   ` Nicolin Chen
  2023-07-26  3:53   ` Greg KH
  2023-07-25 19:05 ` [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount Jason Gunthorpe
  2023-07-25 19:05 ` [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry Jason Gunthorpe
  2 siblings, 2 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2023-07-25 19:05 UTC (permalink / raw)
  To: iommu
  Cc: Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian, Lixiao Yang,
	Matthew Rosato, Nicolin Chen, stable, syzbot+7574ebfe589049630608,
	Terrence Xu, Yi Liu

The access must be detached first.

To make the cleanup simpler copy the fdno to userspace before creating the
access in the first place. Then there is no need to unwind after
iommufd_access_attach.

Fixes: 54b47585db66 ("iommufd: Create access in vfio_iommufd_emulated_bind()")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/selftest.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 74c2076105d486..65c8b6ad2504f0 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -716,6 +716,10 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 		rc = -ENOMEM;
 		goto out_free_staccess;
 	}
+	cmd->create_access.out_access_fd = fdno;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_put_fdno;
 
 	access = iommufd_access_create(
 		ucmd->ictx,
@@ -727,11 +731,8 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 		rc = PTR_ERR(access);
 		goto out_put_fdno;
 	}
+
 	rc = iommufd_access_attach(access, ioas_id);
-	if (rc)
-		goto out_destroy;
-	cmd->create_access.out_access_fd = fdno;
-	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
 		goto out_destroy;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount
  2023-07-25 19:05 [PATCH rc 0/3] Several iommufd bug fixes Jason Gunthorpe
  2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
@ 2023-07-25 19:05 ` Jason Gunthorpe
  2023-07-27  5:25   ` Tian, Kevin
  2023-07-25 19:05 ` [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry Jason Gunthorpe
  2 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2023-07-25 19:05 UTC (permalink / raw)
  To: iommu
  Cc: Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian, Lixiao Yang,
	Matthew Rosato, Nicolin Chen, stable, syzbot+7574ebfe589049630608,
	Terrence Xu, Yi Liu

syzkaller found a race where IOMMUFD_DESTROY increments the refcount:

       obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY);
       if (IS_ERR(obj))
               return PTR_ERR(obj);
       iommufd_ref_to_users(obj);
       /* See iommufd_ref_to_users() */
       if (!iommufd_object_destroy_user(ucmd->ictx, obj))

As part of the sequence to join the two existing primitives together.

Allowing the refcount the be elevated without holding the destroy_rwsem
violates the assumption that all temporary refcoutn elevations are
protected by destroy_rwsem. Racing IOMMUFD_DESTROY with
iommufd_object_destroy_user() will cause spurious failures:

  WARNING: CPU: 0 PID: 3076 at drivers/iommu/iommufd/device.c:477 iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:478
  Modules linked in:
  CPU: 0 PID: 3076 Comm: syz-executor.0 Not tainted 6.3.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
  RIP: 0010:iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:477
  Code: e8 3d 4e 00 00 84 c0 74 01 c3 0f 0b c3 0f 1f 44 00 00 f3 0f 1e fa 48 89 fe 48 8b bf a8 00 00 00 e8 1d 4e 00 00 84 c0 74 01 c3 <0f> 0b c3 0f 1f 44 00 00 41 57 41 56 41 55 4c 8d ae d0 00 00 00 41
  RSP: 0018:ffffc90003067e08 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff888109ea0300 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
  RBP: 0000000000000004 R08: 0000000000000000 R09: ffff88810bbb3500
  R10: ffff88810bbb3e48 R11: 0000000000000000 R12: ffffc90003067e88
  R13: ffffc90003067ea8 R14: ffff888101249800 R15: 00000000fffffffe
  FS:  00007ff7254fe6c0(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000555557262da8 CR3: 000000010a6fd000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   iommufd_test_create_access drivers/iommu/iommufd/selftest.c:596 [inline]
   iommufd_test+0x71c/0xcf0 drivers/iommu/iommufd/selftest.c:813
   iommufd_fops_ioctl+0x10f/0x1b0 drivers/iommu/iommufd/main.c:337
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:870 [inline]
   __se_sys_ioctl fs/ioctl.c:856 [inline]
   __x64_sys_ioctl+0x84/0xc0 fs/ioctl.c:856
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x38/0x80 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

The solution is to not increment the refcount on the IOMMUFD_DESTROY path
at all. Instead use the xa_lock to serialize everything. The refcount
check == 1 and xa_erase can be done under a single critical region. This
avoids the need for any refcount incrementing.

It has the downside that if userspace races destroy with other operations
it will get an EBUSY instead of waiting, but this is kind of racing is
already dangerous.

Fixes: 2ff4bed7fee7 ("iommufd: File descriptor, context, kconfig and makefiles")
Reported-by: syzbot+7574ebfe589049630608@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 12 +---
 drivers/iommu/iommufd/iommufd_private.h | 15 ++++-
 drivers/iommu/iommufd/main.c            | 78 +++++++++++++++++++------
 3 files changed, 75 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 29d05663d4d17a..ed2937a4e196f6 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -109,10 +109,7 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
  */
 void iommufd_device_unbind(struct iommufd_device *idev)
 {
-	bool was_destroyed;
-
-	was_destroyed = iommufd_object_destroy_user(idev->ictx, &idev->obj);
-	WARN_ON(!was_destroyed);
+	iommufd_object_destroy_user(idev->ictx, &idev->obj);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
 
@@ -382,7 +379,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
 	mutex_unlock(&hwpt->devices_lock);
 
 	if (hwpt->auto_domain)
-		iommufd_object_destroy_user(idev->ictx, &hwpt->obj);
+		iommufd_object_deref_user(idev->ictx, &hwpt->obj);
 	else
 		refcount_dec(&hwpt->obj.users);
 
@@ -456,10 +453,7 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
  */
 void iommufd_access_destroy(struct iommufd_access *access)
 {
-	bool was_destroyed;
-
-	was_destroyed = iommufd_object_destroy_user(access->ictx, &access->obj);
-	WARN_ON(!was_destroyed);
+	iommufd_object_destroy_user(access->ictx, &access->obj);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b38e67d1988bdb..f9790983699ce6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -176,8 +176,19 @@ void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
 				      struct iommufd_object *obj);
 void iommufd_object_finalize(struct iommufd_ctx *ictx,
 			     struct iommufd_object *obj);
-bool iommufd_object_destroy_user(struct iommufd_ctx *ictx,
-				 struct iommufd_object *obj);
+void __iommufd_object_destroy_user(struct iommufd_ctx *ictx,
+				   struct iommufd_object *obj, bool allow_fail);
+static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
+					       struct iommufd_object *obj)
+{
+	__iommufd_object_destroy_user(ictx, obj, false);
+}
+static inline void iommufd_object_deref_user(struct iommufd_ctx *ictx,
+					     struct iommufd_object *obj)
+{
+	__iommufd_object_destroy_user(ictx, obj, true);
+}
+
 struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     size_t size,
 					     enum iommufd_object_type type);
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a69..69387f9ef7f14c 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -116,14 +116,56 @@ struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id,
 	return obj;
 }
 
+/*
+ * Remove the given object id from the xarray if the only reference to the
+ * object is held by the xarray. The caller must call ops destroy().
+ */
+static struct iommufd_object *iommufd_object_remove(struct iommufd_ctx *ictx,
+						    u32 id, bool extra_put)
+{
+	struct iommufd_object *obj;
+	XA_STATE(xas, &ictx->objects, id);
+
+	xa_lock(&ictx->objects);
+	obj = xas_load(&xas);
+	if (xa_is_zero(obj) || !obj) {
+		obj = ERR_PTR(-ENOENT);
+		goto out_xa;
+	}
+
+	/*
+	 * If the caller is holding a ref on obj we put it here under the
+	 * spinlock.
+	 */
+	if (extra_put)
+		refcount_dec(&obj->users);
+
+	if (!refcount_dec_if_one(&obj->users)) {
+		obj = ERR_PTR(-EBUSY);
+		goto out_xa;
+	}
+
+	xas_store(&xas, NULL);
+	if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
+		ictx->vfio_ioas = NULL;
+
+out_xa:
+	xa_unlock(&ictx->objects);
+
+	/* The returned object reference count is zero */
+	return obj;
+}
+
 /*
  * The caller holds a users refcount and wants to destroy the object. Returns
  * true if the object was destroyed. In all cases the caller no longer has a
  * reference on obj.
  */
-bool iommufd_object_destroy_user(struct iommufd_ctx *ictx,
-				 struct iommufd_object *obj)
+void __iommufd_object_destroy_user(struct iommufd_ctx *ictx,
+				   struct iommufd_object *obj, bool allow_fail)
 {
+	struct iommufd_object *ret;
+
 	/*
 	 * The purpose of the destroy_rwsem is to ensure deterministic
 	 * destruction of objects used by external drivers and destroyed by this
@@ -131,22 +173,22 @@ bool iommufd_object_destroy_user(struct iommufd_ctx *ictx,
 	 * side of this, such as during ioctl execution.
 	 */
 	down_write(&obj->destroy_rwsem);
-	xa_lock(&ictx->objects);
-	refcount_dec(&obj->users);
-	if (!refcount_dec_if_one(&obj->users)) {
-		xa_unlock(&ictx->objects);
-		up_write(&obj->destroy_rwsem);
-		return false;
-	}
-	__xa_erase(&ictx->objects, obj->id);
-	if (ictx->vfio_ioas && &ictx->vfio_ioas->obj == obj)
-		ictx->vfio_ioas = NULL;
-	xa_unlock(&ictx->objects);
+	ret = iommufd_object_remove(ictx, obj->id, true);
 	up_write(&obj->destroy_rwsem);
 
+	if (allow_fail && IS_ERR(ret))
+		return;
+
+	/*
+	 * If there is a bug and we couldn't destroy the object then we did put
+	 * back the callers refcount and will eventually try to free it again
+	 * during close.
+	 */
+	if (WARN_ON(IS_ERR(ret)))
+		return;
+
 	iommufd_object_ops[obj->type].destroy(obj);
 	kfree(obj);
-	return true;
 }
 
 static int iommufd_destroy(struct iommufd_ucmd *ucmd)
@@ -154,13 +196,11 @@ static int iommufd_destroy(struct iommufd_ucmd *ucmd)
 	struct iommu_destroy *cmd = ucmd->cmd;
 	struct iommufd_object *obj;
 
-	obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY);
+	obj = iommufd_object_remove(ucmd->ictx, cmd->id, false);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
-	iommufd_ref_to_users(obj);
-	/* See iommufd_ref_to_users() */
-	if (!iommufd_object_destroy_user(ucmd->ictx, obj))
-		return -EBUSY;
+	iommufd_object_ops[obj->type].destroy(obj);
+	kfree(obj);
 	return 0;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry
  2023-07-25 19:05 [PATCH rc 0/3] Several iommufd bug fixes Jason Gunthorpe
  2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
  2023-07-25 19:05 ` [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount Jason Gunthorpe
@ 2023-07-25 19:05 ` Jason Gunthorpe
  2023-07-25 19:55   ` Nicolin Chen
  2023-07-27  5:26   ` Tian, Kevin
  2 siblings, 2 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2023-07-25 19:05 UTC (permalink / raw)
  To: iommu
  Cc: Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian, Lixiao Yang,
	Matthew Rosato, Nicolin Chen, stable, syzbot+7574ebfe589049630608,
	Terrence Xu, Yi Liu

Even though the test suite covers this it somehow became obscured that
this wasn't working.

The test iommufd_ioas.mock_domain.access_domain_destory would blow up
rarely.

end should be set to 1 because this just pushed an item, the carry, to the
pfns list.

Sometimes the test would blow up with:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP
  CPU: 5 PID: 584 Comm: iommufd Not tainted 6.5.0-rc1-dirty #1236
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:batch_unpin+0xa2/0x100 [iommufd]
  Code: 17 48 81 fe ff ff 07 00 77 70 48 8b 15 b7 be 97 e2 48 85 d2 74 14 48 8b 14 fa 48 85 d2 74 0b 40 0f b6 f6 48 c1 e6 04 48 01 f2 <48> 8b 3a 48 c1 e0 06 89 ca 48 89 de 48 83 e7 f0 48 01 c7 e8 96 dc
  RSP: 0018:ffffc90001677a58 EFLAGS: 00010246
  RAX: 00007f7e2646f000 RBX: 0000000000000000 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 00000000fefc4c8d RDI: 0000000000fefc4c
  RBP: ffffc90001677a80 R08: 0000000000000048 R09: 0000000000000200
  R10: 0000000000030b98 R11: ffffffff81f3bb40 R12: 0000000000000001
  R13: ffff888101f75800 R14: ffffc90001677ad0 R15: 00000000000001fe
  FS:  00007f9323679740(0000) GS:ffff8881ba540000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000105ede003 CR4: 00000000003706a0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   ? show_regs+0x5c/0x70
   ? __die+0x1f/0x60
   ? page_fault_oops+0x15d/0x440
   ? lock_release+0xbc/0x240
   ? exc_page_fault+0x4a4/0x970
   ? asm_exc_page_fault+0x27/0x30
   ? batch_unpin+0xa2/0x100 [iommufd]
   ? batch_unpin+0xba/0x100 [iommufd]
   __iopt_area_unfill_domain+0x198/0x430 [iommufd]
   ? __mutex_lock+0x8c/0xb80
   ? __mutex_lock+0x6aa/0xb80
   ? xa_erase+0x28/0x30
   ? iopt_table_remove_domain+0x162/0x320 [iommufd]
   ? lock_release+0xbc/0x240
   iopt_area_unfill_domain+0xd/0x10 [iommufd]
   iopt_table_remove_domain+0x195/0x320 [iommufd]
   iommufd_hw_pagetable_destroy+0xb3/0x110 [iommufd]
   iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
   iommufd_device_detach+0xc5/0x140 [iommufd]
   iommufd_selftest_destroy+0x1f/0x70 [iommufd]
   iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
   iommufd_destroy+0x3a/0x50 [iommufd]
   iommufd_fops_ioctl+0xfb/0x170 [iommufd]
   __x64_sys_ioctl+0x40d/0x9a0
   do_syscall_64+0x3c/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

Cc: <stable@vger.kernel.org>
Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages")
Reported-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/pages.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 412ca96be128ea..8d9aa297c117e4 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -297,7 +297,7 @@ static void batch_clear_carry(struct pfn_batch *batch, unsigned int keep_pfns)
 	batch->pfns[0] = batch->pfns[batch->end - 1] +
 			 (batch->npfns[batch->end - 1] - keep_pfns);
 	batch->npfns[0] = keep_pfns;
-	batch->end = 0;
+	batch->end = 1;
 }
 
 static void batch_skip_carry(struct pfn_batch *batch, unsigned int skip_pfns)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry
  2023-07-25 19:05 ` [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry Jason Gunthorpe
@ 2023-07-25 19:55   ` Nicolin Chen
  2023-07-27  5:26   ` Tian, Kevin
  1 sibling, 0 replies; 10+ messages in thread
From: Nicolin Chen @ 2023-07-25 19:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian,
	Lixiao Yang, Matthew Rosato, stable, syzbot+7574ebfe589049630608,
	Terrence Xu, Yi Liu

On Tue, Jul 25, 2023 at 04:05:50PM -0300, Jason Gunthorpe wrote:
> Even though the test suite covers this it somehow became obscured that
> this wasn't working.
> 
> The test iommufd_ioas.mock_domain.access_domain_destory would blow up
> rarely.
> 
> end should be set to 1 because this just pushed an item, the carry, to the
> pfns list.
> 
> Sometimes the test would blow up with:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] SMP
>   CPU: 5 PID: 584 Comm: iommufd Not tainted 6.5.0-rc1-dirty #1236
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>   RIP: 0010:batch_unpin+0xa2/0x100 [iommufd]
>   Code: 17 48 81 fe ff ff 07 00 77 70 48 8b 15 b7 be 97 e2 48 85 d2 74 14 48 8b 14 fa 48 85 d2 74 0b 40 0f b6 f6 48 c1 e6 04 48 01 f2 <48> 8b 3a 48 c1 e0 06 89 ca 48 89 de 48 83 e7 f0 48 01 c7 e8 96 dc
>   RSP: 0018:ffffc90001677a58 EFLAGS: 00010246
>   RAX: 00007f7e2646f000 RBX: 0000000000000000 RCX: 0000000000000001
>   RDX: 0000000000000000 RSI: 00000000fefc4c8d RDI: 0000000000fefc4c
>   RBP: ffffc90001677a80 R08: 0000000000000048 R09: 0000000000000200
>   R10: 0000000000030b98 R11: ffffffff81f3bb40 R12: 0000000000000001
>   R13: ffff888101f75800 R14: ffffc90001677ad0 R15: 00000000000001fe
>   FS:  00007f9323679740(0000) GS:ffff8881ba540000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000000 CR3: 0000000105ede003 CR4: 00000000003706a0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    <TASK>
>    ? show_regs+0x5c/0x70
>    ? __die+0x1f/0x60
>    ? page_fault_oops+0x15d/0x440
>    ? lock_release+0xbc/0x240
>    ? exc_page_fault+0x4a4/0x970
>    ? asm_exc_page_fault+0x27/0x30
>    ? batch_unpin+0xa2/0x100 [iommufd]
>    ? batch_unpin+0xba/0x100 [iommufd]
>    __iopt_area_unfill_domain+0x198/0x430 [iommufd]
>    ? __mutex_lock+0x8c/0xb80
>    ? __mutex_lock+0x6aa/0xb80
>    ? xa_erase+0x28/0x30
>    ? iopt_table_remove_domain+0x162/0x320 [iommufd]
>    ? lock_release+0xbc/0x240
>    iopt_area_unfill_domain+0xd/0x10 [iommufd]
>    iopt_table_remove_domain+0x195/0x320 [iommufd]
>    iommufd_hw_pagetable_destroy+0xb3/0x110 [iommufd]
>    iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
>    iommufd_device_detach+0xc5/0x140 [iommufd]
>    iommufd_selftest_destroy+0x1f/0x70 [iommufd]
>    iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
>    iommufd_destroy+0x3a/0x50 [iommufd]
>    iommufd_fops_ioctl+0xfb/0x170 [iommufd]
>    __x64_sys_ioctl+0x40d/0x9a0
>    do_syscall_64+0x3c/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> Cc: <stable@vger.kernel.org>
> Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages")
> Reported-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

This fixes the memory leak at the HugePages, and likely the rarely
triggered BUG too since I see no repro after applying this patch.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached
  2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
@ 2023-07-25 21:45   ` Nicolin Chen
  2023-07-26  3:53   ` Greg KH
  1 sibling, 0 replies; 10+ messages in thread
From: Nicolin Chen @ 2023-07-25 21:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian,
	Lixiao Yang, Matthew Rosato, stable, syzbot+7574ebfe589049630608,
	Terrence Xu, Yi Liu

On Tue, Jul 25, 2023 at 04:05:48PM -0300, Jason Gunthorpe wrote:
> The access must be detached first.
>
> To make the cleanup simpler copy the fdno to userspace before creating the
> access in the first place. Then there is no need to unwind after
> iommufd_access_attach.
>
> Fixes: 54b47585db66 ("iommufd: Create access in vfio_iommufd_emulated_bind()")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Hmm, I was expecting that the iopt_remove_access() call in the
iommufd_access_destroy_object() could "detach" the access. If
calling iopt_remove_access() isn't enough, it means that we'd
need the full routine from the iommufd_access_detach() in cdev
series, i.e. we are missing the unmap part?

In that case, though this patch can fix the issue in selftest,
yet does the emulated pathway potentially have the same issue?

Thanks
Nic

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached
  2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
  2023-07-25 21:45   ` Nicolin Chen
@ 2023-07-26  3:53   ` Greg KH
  1 sibling, 0 replies; 10+ messages in thread
From: Greg KH @ 2023-07-26  3:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Alex Williamson, Lu Baolu, Eric Auger, Kevin Tian,
	Lixiao Yang, Matthew Rosato, Nicolin Chen, stable,
	syzbot+7574ebfe589049630608, Terrence Xu, Yi Liu

On Tue, Jul 25, 2023 at 04:05:48PM -0300, Jason Gunthorpe wrote:
> The access must be detached first.
> 
> To make the cleanup simpler copy the fdno to userspace before creating the
> access in the first place. Then there is no need to unwind after
> iommufd_access_attach.
> 
> Fixes: 54b47585db66 ("iommufd: Create access in vfio_iommufd_emulated_bind()")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/iommufd/selftest.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount
  2023-07-25 19:05 ` [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount Jason Gunthorpe
@ 2023-07-27  5:25   ` Tian, Kevin
  2023-07-27 14:10     ` Jason Gunthorpe
  0 siblings, 1 reply; 10+ messages in thread
From: Tian, Kevin @ 2023-07-27  5:25 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu@lists.linux.dev
  Cc: Alex Williamson, Lu Baolu, Eric Auger, Lixiao Yang,
	Matthew Rosato, Nicolin Chen, stable@vger.kernel.org,
	syzbot+7574ebfe589049630608@syzkaller.appspotmail.com,
	Xu, Terrence, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, July 26, 2023 3:06 AM
> 
> syzkaller found a race where IOMMUFD_DESTROY increments the refcount:
> 
>        obj = iommufd_get_object(ucmd->ictx, cmd->id, IOMMUFD_OBJ_ANY);
>        if (IS_ERR(obj))
>                return PTR_ERR(obj);
>        iommufd_ref_to_users(obj);
>        /* See iommufd_ref_to_users() */
>        if (!iommufd_object_destroy_user(ucmd->ictx, obj))
> 
> As part of the sequence to join the two existing primitives together.
> 
> Allowing the refcount the be elevated without holding the destroy_rwsem
> violates the assumption that all temporary refcoutn elevations are

s/refcoutn/refcount/

> protected by destroy_rwsem. Racing IOMMUFD_DESTROY with
> iommufd_object_destroy_user() will cause spurious failures:
> 
>   WARNING: CPU: 0 PID: 3076 at drivers/iommu/iommufd/device.c:477
> iommufd_access_destroy+0x18/0x20 drivers/iommu/iommufd/device.c:478
>   Modules linked in:
>   CPU: 0 PID: 3076 Comm: syz-executor.0 Not tainted 6.3.0-rc1-syzkaller #0
>   Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 07/03/2023
>   RIP: 0010:iommufd_access_destroy+0x18/0x20
> drivers/iommu/iommufd/device.c:477
>   Code: e8 3d 4e 00 00 84 c0 74 01 c3 0f 0b c3 0f 1f 44 00 00 f3 0f 1e fa 48 89
> fe 48 8b bf a8 00 00 00 e8 1d 4e 00 00 84 c0 74 01 c3 <0f> 0b c3 0f 1f 44 00 00
> 41 57 41 56 41 55 4c 8d ae d0 00 00 00 41
>   RSP: 0018:ffffc90003067e08 EFLAGS: 00010246
>   RAX: 0000000000000000 RBX: ffff888109ea0300 RCX: 0000000000000000
>   RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
>   RBP: 0000000000000004 R08: 0000000000000000 R09: ffff88810bbb3500
>   R10: ffff88810bbb3e48 R11: 0000000000000000 R12: ffffc90003067e88
>   R13: ffffc90003067ea8 R14: ffff888101249800 R15: 00000000fffffffe
>   FS:  00007ff7254fe6c0(0000) GS:ffff888237c00000(0000)
> knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000555557262da8 CR3: 000000010a6fd000 CR4: 0000000000350ef0
>   Call Trace:
>    <TASK>
>    iommufd_test_create_access drivers/iommu/iommufd/selftest.c:596
> [inline]
>    iommufd_test+0x71c/0xcf0 drivers/iommu/iommufd/selftest.c:813
>    iommufd_fops_ioctl+0x10f/0x1b0 drivers/iommu/iommufd/main.c:337
>    vfs_ioctl fs/ioctl.c:51 [inline]
>    __do_sys_ioctl fs/ioctl.c:870 [inline]
>    __se_sys_ioctl fs/ioctl.c:856 [inline]
>    __x64_sys_ioctl+0x84/0xc0 fs/ioctl.c:856
>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>    do_syscall_64+0x38/0x80 arch/x86/entry/common.c:80
>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> The solution is to not increment the refcount on the IOMMUFD_DESTROY
> path
> at all. Instead use the xa_lock to serialize everything. The refcount
> check == 1 and xa_erase can be done under a single critical region. This
> avoids the need for any refcount incrementing.
> 
> It has the downside that if userspace races destroy with other operations
> it will get an EBUSY instead of waiting, but this is kind of racing is
> already dangerous.

it's not a new downside. Even old code also returns -EBUSY if
iommufd_object_destroy_user() returns false.

> +
>  /*
>   * The caller holds a users refcount and wants to destroy the object. Returns

s/users/user's/

> +
> +	/*
> +	 * If there is a bug and we couldn't destroy the object then we did put
> +	 * back the callers refcount and will eventually try to free it again

s/callers/caller's/

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

btw,

> -	iommufd_ref_to_users(obj);
> -	/* See iommufd_ref_to_users() */
> -	if (!iommufd_object_destroy_user(ucmd->ictx, obj))
> -		return -EBUSY;

I wonder whether there is any other reason to keep iommufd_ref_to_users().
Now the only invocation is in iommufd_access_attach(), but it could be
simply replaced by "get_object(); refcount_inc(); put_object()" as all other
places are doing.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry
  2023-07-25 19:05 ` [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry Jason Gunthorpe
  2023-07-25 19:55   ` Nicolin Chen
@ 2023-07-27  5:26   ` Tian, Kevin
  1 sibling, 0 replies; 10+ messages in thread
From: Tian, Kevin @ 2023-07-27  5:26 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu@lists.linux.dev
  Cc: Alex Williamson, Lu Baolu, Eric Auger, Lixiao Yang,
	Matthew Rosato, Nicolin Chen, stable@vger.kernel.org,
	syzbot+7574ebfe589049630608@syzkaller.appspotmail.com,
	Xu, Terrence, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, July 26, 2023 3:06 AM
> 
> Even though the test suite covers this it somehow became obscured that
> this wasn't working.
> 
> The test iommufd_ioas.mock_domain.access_domain_destory would blow
> up
> rarely.
> 
> end should be set to 1 because this just pushed an item, the carry, to the
> pfns list.
> 
> Sometimes the test would blow up with:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000000
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   PGD 0 P4D 0
>   Oops: 0000 [#1] SMP
>   CPU: 5 PID: 584 Comm: iommufd Not tainted 6.5.0-rc1-dirty #1236
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>   RIP: 0010:batch_unpin+0xa2/0x100 [iommufd]
>   Code: 17 48 81 fe ff ff 07 00 77 70 48 8b 15 b7 be 97 e2 48 85 d2 74 14 48 8b
> 14 fa 48 85 d2 74 0b 40 0f b6 f6 48 c1 e6 04 48 01 f2 <48> 8b 3a 48 c1 e0 06 89
> ca 48 89 de 48 83 e7 f0 48 01 c7 e8 96 dc
>   RSP: 0018:ffffc90001677a58 EFLAGS: 00010246
>   RAX: 00007f7e2646f000 RBX: 0000000000000000 RCX: 0000000000000001
>   RDX: 0000000000000000 RSI: 00000000fefc4c8d RDI: 0000000000fefc4c
>   RBP: ffffc90001677a80 R08: 0000000000000048 R09: 0000000000000200
>   R10: 0000000000030b98 R11: ffffffff81f3bb40 R12: 0000000000000001
>   R13: ffff888101f75800 R14: ffffc90001677ad0 R15: 00000000000001fe
>   FS:  00007f9323679740(0000) GS:ffff8881ba540000(0000)
> knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 0000000000000000 CR3: 0000000105ede003 CR4: 00000000003706a0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>   Call Trace:
>    <TASK>
>    ? show_regs+0x5c/0x70
>    ? __die+0x1f/0x60
>    ? page_fault_oops+0x15d/0x440
>    ? lock_release+0xbc/0x240
>    ? exc_page_fault+0x4a4/0x970
>    ? asm_exc_page_fault+0x27/0x30
>    ? batch_unpin+0xa2/0x100 [iommufd]
>    ? batch_unpin+0xba/0x100 [iommufd]
>    __iopt_area_unfill_domain+0x198/0x430 [iommufd]
>    ? __mutex_lock+0x8c/0xb80
>    ? __mutex_lock+0x6aa/0xb80
>    ? xa_erase+0x28/0x30
>    ? iopt_table_remove_domain+0x162/0x320 [iommufd]
>    ? lock_release+0xbc/0x240
>    iopt_area_unfill_domain+0xd/0x10 [iommufd]
>    iopt_table_remove_domain+0x195/0x320 [iommufd]
>    iommufd_hw_pagetable_destroy+0xb3/0x110 [iommufd]
>    iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
>    iommufd_device_detach+0xc5/0x140 [iommufd]
>    iommufd_selftest_destroy+0x1f/0x70 [iommufd]
>    iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
>    iommufd_destroy+0x3a/0x50 [iommufd]
>    iommufd_fops_ioctl+0xfb/0x170 [iommufd]
>    __x64_sys_ioctl+0x40d/0x9a0
>    do_syscall_64+0x3c/0x80
>    entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> Cc: <stable@vger.kernel.org>
> Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages")
> Reported-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount
  2023-07-27  5:25   ` Tian, Kevin
@ 2023-07-27 14:10     ` Jason Gunthorpe
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2023-07-27 14:10 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu@lists.linux.dev, Alex Williamson, Lu Baolu, Eric Auger,
	Lixiao Yang, Matthew Rosato, Nicolin Chen, stable@vger.kernel.org,
	syzbot+7574ebfe589049630608@syzkaller.appspotmail.com,
	Xu, Terrence, Liu, Yi L

On Thu, Jul 27, 2023 at 05:25:33AM +0000, Tian, Kevin wrote:

> > It has the downside that if userspace races destroy with other operations
> > it will get an EBUSY instead of waiting, but this is kind of racing is
> > already dangerous.
> 
> it's not a new downside. Even old code also returns -EBUSY if
> iommufd_object_destroy_user() returns false.

It sort of is, previously you could race ioctls and destroy would wait
for the ioctl to finish, now it returns -EBUSY


> > +
> >  /*
> >   * The caller holds a users refcount and wants to destroy the object. Returns
> 
> s/users/user's/

'users' is the name of the variable
 
> > +	/*
> > +	 * If there is a bug and we couldn't destroy the object then we did put
> > +	 * back the callers refcount and will eventually try to free it again
> 
> s/callers/caller's/

Yep
 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> 
> btw,
> 
> > -	iommufd_ref_to_users(obj);
> > -	/* See iommufd_ref_to_users() */
> > -	if (!iommufd_object_destroy_user(ucmd->ictx, obj))
> > -		return -EBUSY;
> 
> I wonder whether there is any other reason to keep iommufd_ref_to_users().
> Now the only invocation is in iommufd_access_attach(), but it could be
> simply replaced by "get_object(); refcount_inc(); put_object()" as all other
> places are doing.

Hmm, yes, the replace series could probably be tweaked to comfortably
do this.

Jason

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-07-27 14:10 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-25 19:05 [PATCH rc 0/3] Several iommufd bug fixes Jason Gunthorpe
2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
2023-07-25 21:45   ` Nicolin Chen
2023-07-26  3:53   ` Greg KH
2023-07-25 19:05 ` [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount Jason Gunthorpe
2023-07-27  5:25   ` Tian, Kevin
2023-07-27 14:10     ` Jason Gunthorpe
2023-07-25 19:05 ` [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry Jason Gunthorpe
2023-07-25 19:55   ` Nicolin Chen
2023-07-27  5:26   ` Tian, Kevin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).