* [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef
@ 2024-07-18 20:31 Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 1/4] drm/xe: Move part of xe_file cleanup to a helper Umesh Nerlige Ramappa
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-07-18 20:31 UTC (permalink / raw)
To: intel-xe
Just posting the revision that will be pushed for reference. This rev
only has documentation changes compared to v2, so will not run CI on
this.
-----
xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.
Issue: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908
The series adds xef refcounting and ensures all consumers of xef take a
ref to it.
v2:
- Include review comments from v1
- Squash patch 3 and 5 from v1 to add Fixes/Closes tags
v3:
- kernel-doc fixes (Lucas, Matt)
Note: Patches 1 - 3 can be merged independently
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Umesh Nerlige Ramappa (4):
drm/xe: Move part of xe_file cleanup to a helper
drm/xe: Add ref counting for xe_file
drm/xe: Take a ref to xe file when user creates a VM
drm/xe: Fix use after free when client stats are captured
drivers/gpu/drm/xe/xe_device.c | 58 +++++++++++++++++++-----
drivers/gpu/drm/xe/xe_device.h | 3 ++
drivers/gpu/drm/xe/xe_device_types.h | 3 ++
drivers/gpu/drm/xe/xe_drm_client.c | 5 +-
drivers/gpu/drm/xe/xe_exec_queue.c | 10 +++-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 7 ++-
drivers/gpu/drm/xe/xe_vm.c | 6 ++-
7 files changed, 71 insertions(+), 21 deletions(-)
--
2.38.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/4] drm/xe: Move part of xe_file cleanup to a helper
2024-07-18 20:31 [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef Umesh Nerlige Ramappa
@ 2024-07-18 20:31 ` Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 2/4] drm/xe: Add ref counting for xe_file Umesh Nerlige Ramappa
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-07-18 20:31 UTC (permalink / raw)
To: intel-xe
In order to make xe_file ref counted, move destruction of xe_file
members to a helper.
v2: Move xe_vm_close_and_put back into xe_file_close (Matt)
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index f51d456d15f7..0a7478d1ee63 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -93,9 +93,25 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
return 0;
}
+static void xe_file_destroy(struct xe_file *xef)
+{
+ struct xe_device *xe = xef->xe;
+
+ xa_destroy(&xef->exec_queue.xa);
+ mutex_destroy(&xef->exec_queue.lock);
+ xa_destroy(&xef->vm.xa);
+ mutex_destroy(&xef->vm.lock);
+
+ spin_lock(&xe->clients.lock);
+ xe->clients.count--;
+ spin_unlock(&xe->clients.lock);
+
+ xe_drm_client_put(xef->client);
+ kfree(xef);
+}
+
static void xe_file_close(struct drm_device *dev, struct drm_file *file)
{
- struct xe_device *xe = to_xe_device(dev);
struct xe_file *xef = file->driver_priv;
struct xe_vm *vm;
struct xe_exec_queue *q;
@@ -111,21 +127,12 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
xe_exec_queue_kill(q);
xe_exec_queue_put(q);
}
- xa_destroy(&xef->exec_queue.xa);
- mutex_destroy(&xef->exec_queue.lock);
mutex_lock(&xef->vm.lock);
xa_for_each(&xef->vm.xa, idx, vm)
xe_vm_close_and_put(vm);
mutex_unlock(&xef->vm.lock);
- xa_destroy(&xef->vm.xa);
- mutex_destroy(&xef->vm.lock);
- spin_lock(&xe->clients.lock);
- xe->clients.count--;
- spin_unlock(&xe->clients.lock);
-
- xe_drm_client_put(xef->client);
- kfree(xef);
+ xe_file_destroy(xef);
}
static const struct drm_ioctl_desc xe_ioctls[] = {
--
2.38.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 2/4] drm/xe: Add ref counting for xe_file
2024-07-18 20:31 [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 1/4] drm/xe: Move part of xe_file cleanup to a helper Umesh Nerlige Ramappa
@ 2024-07-18 20:31 ` Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 3/4] drm/xe: Take a ref to xe file when user creates a VM Umesh Nerlige Ramappa
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-07-18 20:31 UTC (permalink / raw)
To: intel-xe
Add ref counting for xe_file.
v2:
- Add kernel doc for exported functions (Matt)
- Instead of xe_file_destroy, export the get/put helpers (Lucas)
v3: Fixup the kernel-doc format and description (Matt, Lucas)
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 33 ++++++++++++++++++++++++++--
drivers/gpu/drm/xe/xe_device.h | 3 +++
drivers/gpu/drm/xe/xe_device_types.h | 3 +++
3 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 0a7478d1ee63..50c302cf3249 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -90,11 +90,14 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
spin_unlock(&xe->clients.lock);
file->driver_priv = xef;
+ kref_init(&xef->refcount);
+
return 0;
}
-static void xe_file_destroy(struct xe_file *xef)
+static void xe_file_destroy(struct kref *ref)
{
+ struct xe_file *xef = container_of(ref, struct xe_file, refcount);
struct xe_device *xe = xef->xe;
xa_destroy(&xef->exec_queue.xa);
@@ -110,6 +113,32 @@ static void xe_file_destroy(struct xe_file *xef)
kfree(xef);
}
+/**
+ * xe_file_get() - Take a reference to the xe file object
+ * @xef: Pointer to the xe file
+ *
+ * Anyone with a pointer to xef must take a reference to the xe file
+ * object using this call.
+ *
+ * Return: xe file pointer
+ */
+struct xe_file *xe_file_get(struct xe_file *xef)
+{
+ kref_get(&xef->refcount);
+ return xef;
+}
+
+/**
+ * xe_file_put() - Drop a reference to the xe file object
+ * @xef: Pointer to the xe file
+ *
+ * Used to drop reference to the xef object
+ */
+void xe_file_put(struct xe_file *xef)
+{
+ kref_put(&xef->refcount, xe_file_destroy);
+}
+
static void xe_file_close(struct drm_device *dev, struct drm_file *file)
{
struct xe_file *xef = file->driver_priv;
@@ -132,7 +161,7 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
xe_vm_close_and_put(vm);
mutex_unlock(&xef->vm.lock);
- xe_file_destroy(xef);
+ xe_file_put(xef);
}
static const struct drm_ioctl_desc xe_ioctls[] = {
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 0a2a3e7fd402..533ccfb2567a 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -171,4 +171,7 @@ static inline bool xe_device_wedged(struct xe_device *xe)
void xe_device_declare_wedged(struct xe_device *xe);
+struct xe_file *xe_file_get(struct xe_file *xef);
+void xe_file_put(struct xe_file *xef);
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 8e81ade7279b..36252d5b1663 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -581,6 +581,9 @@ struct xe_file {
/** @client: drm client */
struct xe_drm_client *client;
+
+ /** @refcount: ref count of this xe file */
+ struct kref refcount;
};
#endif
--
2.38.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 3/4] drm/xe: Take a ref to xe file when user creates a VM
2024-07-18 20:31 [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 1/4] drm/xe: Move part of xe_file cleanup to a helper Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 2/4] drm/xe: Add ref counting for xe_file Umesh Nerlige Ramappa
@ 2024-07-18 20:31 ` Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 4/4] drm/xe: Fix use after free when client stats are captured Umesh Nerlige Ramappa
2024-07-18 20:37 ` ✗ CI.Patch_applied: failure for Have xe_vm and xe_exec_queue take references to xef (rev3) Patchwork
4 siblings, 0 replies; 6+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-07-18 20:31 UTC (permalink / raw)
To: intel-xe
Take a reference to xef when user creates the VM and put the reference
when user destroys the VM.
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_vm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index cf3aea5d8cdc..6adb0ff09d40 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1670,6 +1670,10 @@ static void vm_destroy_work_func(struct work_struct *w)
XE_WARN_ON(vm->pt_root[id]);
trace_xe_vm_free(vm);
+
+ if (vm->xef)
+ xe_file_put(vm->xef);
+
kfree(vm);
}
@@ -1802,7 +1806,7 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
}
args->vm_id = id;
- vm->xef = xef;
+ vm->xef = xe_file_get(xef);
/* Record BO memory for VM pagetable created against client */
for_each_tile(tile, xe, id)
--
2.38.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v3 4/4] drm/xe: Fix use after free when client stats are captured
2024-07-18 20:31 [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef Umesh Nerlige Ramappa
` (2 preceding siblings ...)
2024-07-18 20:31 ` [PATCH v3 3/4] drm/xe: Take a ref to xe file when user creates a VM Umesh Nerlige Ramappa
@ 2024-07-18 20:31 ` Umesh Nerlige Ramappa
2024-07-18 20:37 ` ✗ CI.Patch_applied: failure for Have xe_vm and xe_exec_queue take references to xef (rev3) Patchwork
4 siblings, 0 replies; 6+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-07-18 20:31 UTC (permalink / raw)
To: intel-xe
xe_file_close triggers an asynchronous queue cleanup and then frees up
the xef object. Since queue cleanup flushes all pending jobs and the KMD
stores client usage stats into the xef object after jobs are flushed, we
see a use-after-free for the xef object. Resolve this by taking a
reference to xef from xe_exec_queue.
While at it, revert an earlier change that contained a partial work
around for this issue.
v2:
- Take a ref to xef even for the VM bind queue (Matt)
- Squash patches relevant to that fix and work around (Lucas)
v3: Fix typo (Lucas)
Fixes: ce62827bc294 ("drm/xe: Do not access xe file when updating exec queue run_ticks")
Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1908
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_drm_client.c | 5 +----
drivers/gpu/drm/xe/xe_exec_queue.c | 10 +++++++++-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 7 +++----
3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index 6a26923fa10e..7ddd59908334 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -251,11 +251,8 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
/* Accumulate all the exec queues from this client */
mutex_lock(&xef->exec_queue.lock);
- xa_for_each(&xef->exec_queue.xa, i, q) {
+ xa_for_each(&xef->exec_queue.xa, i, q)
xe_exec_queue_update_run_ticks(q);
- xef->run_ticks[q->class] += q->run_ticks - q->old_run_ticks;
- q->old_run_ticks = q->run_ticks;
- }
mutex_unlock(&xef->exec_queue.lock);
/* Get the total GPU cycles */
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 3336a01a1006..69867a7b7c77 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -37,6 +37,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
{
if (q->vm)
xe_vm_put(q->vm);
+
+ if (q->xef)
+ xe_file_put(q->xef);
+
kfree(q);
}
@@ -649,6 +653,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
goto kill_exec_queue;
args->exec_queue_id = id;
+ q->xef = xe_file_get(xef);
return 0;
@@ -762,6 +767,7 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
*/
void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
{
+ struct xe_file *xef;
struct xe_lrc *lrc;
u32 old_ts, new_ts;
@@ -773,6 +779,8 @@ void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
if (!q->vm || !q->vm->xef)
return;
+ xef = q->vm->xef;
+
/*
* Only sample the first LRC. For parallel submission, all of them are
* scheduled together and we compensate that below by multiplying by
@@ -783,7 +791,7 @@ void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
*/
lrc = q->lrc[0];
new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
- q->run_ticks += (new_ts - old_ts) * q->width;
+ xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
}
void xe_exec_queue_kill(struct xe_exec_queue *q)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index ded9f9396429..1408b02eea53 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -38,6 +38,9 @@ enum xe_exec_queue_priority {
* a kernel object.
*/
struct xe_exec_queue {
+ /** @xef: Back pointer to xe file if this is user created exec queue */
+ struct xe_file *xef;
+
/** @gt: graphics tile this exec queue can submit to */
struct xe_gt *gt;
/**
@@ -139,10 +142,6 @@ struct xe_exec_queue {
* Protected by @vm's resv. Unused if @vm == NULL.
*/
u64 tlb_flush_seqno;
- /** @old_run_ticks: prior hw engine class run time in ticks for this exec queue */
- u64 old_run_ticks;
- /** @run_ticks: hw engine class run time in ticks for this exec queue */
- u64 run_ticks;
/** @lrc: logical ring context for this exec queue */
struct xe_lrc *lrc[];
};
--
2.38.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* ✗ CI.Patch_applied: failure for Have xe_vm and xe_exec_queue take references to xef (rev3)
2024-07-18 20:31 [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef Umesh Nerlige Ramappa
` (3 preceding siblings ...)
2024-07-18 20:31 ` [PATCH v3 4/4] drm/xe: Fix use after free when client stats are captured Umesh Nerlige Ramappa
@ 2024-07-18 20:37 ` Patchwork
4 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2024-07-18 20:37 UTC (permalink / raw)
To: Umesh Nerlige Ramappa; +Cc: intel-xe
== Series Details ==
Series: Have xe_vm and xe_exec_queue take references to xef (rev3)
URL : https://patchwork.freedesktop.org/series/135865/
State : failure
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 776d87c393f0 drm-tip: 2024y-07m-18d-20h-08m-58s UTC integration manifest
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_vm.c:1670
error: drivers/gpu/drm/xe/xe_vm.c: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: drm/xe: Move part of xe_file cleanup to a helper
Applying: drm/xe: Add ref counting for xe_file
Applying: drm/xe: Take a ref to xe file when user creates a VM
Patch failed at 0003 drm/xe: Take a ref to xe file when user creates a VM
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-07-18 20:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-18 20:31 [PATCH v3 0/4] Have xe_vm and xe_exec_queue take references to xef Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 1/4] drm/xe: Move part of xe_file cleanup to a helper Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 2/4] drm/xe: Add ref counting for xe_file Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 3/4] drm/xe: Take a ref to xe file when user creates a VM Umesh Nerlige Ramappa
2024-07-18 20:31 ` [PATCH v3 4/4] drm/xe: Fix use after free when client stats are captured Umesh Nerlige Ramappa
2024-07-18 20:37 ` ✗ CI.Patch_applied: failure for Have xe_vm and xe_exec_queue take references to xef (rev3) Patchwork
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox