* [PATCH] drm/xe: Do not access xe file when updating exec queue runtime
@ 2024-05-24 18:49 Umesh Nerlige Ramappa
2024-05-24 19:25 ` Umesh Nerlige Ramappa
2024-05-24 19:35 ` Rodrigo Vivi
0 siblings, 2 replies; 4+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-05-24 18:49 UTC (permalink / raw)
To: intel-xe; +Cc: jani.saarinen, matthew.auld, lucas.demarchi
The current code is running into a use after free case where xe file is
closed before the exec queue runtime can be updated. This is occuring in
the xe_file_close path. To fix that, do not access xe file when updating
the exec queue runtime. Instead store the exec queue runtime locally in
the exec queue object and accumulate it when the user dumps the drm
client stats. We know that the xe file is valid when user is dumping the
runtime stats for the drm client, so this effectively removes the
dependency on xe file object in xe_exec_queue_update_runtime().
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
drivers/gpu/drm/xe/xe_drm_client.c | 4 +++-
drivers/gpu/drm/xe/xe_exec_queue.c | 5 +----
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index af404c9e5cc0..a6a15562b8f7 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -251,8 +251,10 @@ static void show_runtime(struct drm_printer *p, struct drm_file *file)
/* Accumulate all the exec queues from this client */
mutex_lock(&xef->exec_queue.lock);
- xa_for_each(&xef->exec_queue.xa, i, q)
+ xa_for_each(&xef->exec_queue.xa, i, q) {
xe_exec_queue_update_runtime(q);
+ xef->runtime[q->class] += q->runtime - xef->runtime[q->class];
+ }
mutex_unlock(&xef->exec_queue.lock);
/* Get the total GPU cycles */
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 0fd61fb4d104..a0775a96efb1 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -765,7 +765,6 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
*/
void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
{
- struct xe_file *xef;
struct xe_lrc *lrc;
u32 old_ts, new_ts;
@@ -777,8 +776,6 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
if (!q->vm || !q->vm->xef)
return;
- xef = q->vm->xef;
-
/*
* Only sample the first LRC. For parallel submission, all of them are
* scheduled together and we compensate that below by multiplying by
@@ -789,7 +786,7 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
*/
lrc = &q->lrc[0];
new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
- xef->runtime[q->class] += (new_ts - old_ts) * q->width;
+ q->runtime += (new_ts - old_ts) * q->width;
}
void xe_exec_queue_kill(struct xe_exec_queue *q)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index ee78d497d838..085a7dce7565 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -151,6 +151,8 @@ struct xe_exec_queue {
* Protected by @vm's resv. Unused if @vm == NULL.
*/
u64 tlb_flush_seqno;
+ /** @runtime: hw engine class runtime in ticks for this exec queue*/
+ u64 runtime;
/** @lrc: logical ring context for this exec queue */
struct xe_lrc lrc[];
};
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/xe: Do not access xe file when updating exec queue runtime
2024-05-24 18:49 [PATCH] drm/xe: Do not access xe file when updating exec queue runtime Umesh Nerlige Ramappa
@ 2024-05-24 19:25 ` Umesh Nerlige Ramappa
2024-05-24 19:35 ` Rodrigo Vivi
1 sibling, 0 replies; 4+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-05-24 19:25 UTC (permalink / raw)
To: intel-xe; +Cc: jani.saarinen, matthew.auld, lucas.demarchi
On Fri, May 24, 2024 at 11:49:58AM -0700, Umesh Nerlige Ramappa wrote:
>The current code is running into a use after free case where xe file is
>closed before the exec queue runtime can be updated. This is occuring in
>the xe_file_close path. To fix that, do not access xe file when updating
>the exec queue runtime. Instead store the exec queue runtime locally in
>the exec queue object and accumulate it when the user dumps the drm
>client stats. We know that the xe file is valid when user is dumping the
>runtime stats for the drm client, so this effectively removes the
>dependency on xe file object in xe_exec_queue_update_runtime().
>
>Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
>Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>---
> drivers/gpu/drm/xe/xe_drm_client.c | 4 +++-
> drivers/gpu/drm/xe/xe_exec_queue.c | 5 +----
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
>index af404c9e5cc0..a6a15562b8f7 100644
>--- a/drivers/gpu/drm/xe/xe_drm_client.c
>+++ b/drivers/gpu/drm/xe/xe_drm_client.c
>@@ -251,8 +251,10 @@ static void show_runtime(struct drm_printer *p, struct drm_file *file)
>
> /* Accumulate all the exec queues from this client */
> mutex_lock(&xef->exec_queue.lock);
>- xa_for_each(&xef->exec_queue.xa, i, q)
>+ xa_for_each(&xef->exec_queue.xa, i, q) {
> xe_exec_queue_update_runtime(q);
>+ xef->runtime[q->class] += q->runtime - xef->runtime[q->class];
This needs to be delta for the specific queue. Will post another rev.
Umesh
>+ }
> mutex_unlock(&xef->exec_queue.lock);
>
> /* Get the total GPU cycles */
>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>index 0fd61fb4d104..a0775a96efb1 100644
>--- a/drivers/gpu/drm/xe/xe_exec_queue.c
>+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>@@ -765,7 +765,6 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> */
> void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> {
>- struct xe_file *xef;
> struct xe_lrc *lrc;
> u32 old_ts, new_ts;
>
>@@ -777,8 +776,6 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> if (!q->vm || !q->vm->xef)
> return;
>
>- xef = q->vm->xef;
>-
> /*
> * Only sample the first LRC. For parallel submission, all of them are
> * scheduled together and we compensate that below by multiplying by
>@@ -789,7 +786,7 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> */
> lrc = &q->lrc[0];
> new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
>- xef->runtime[q->class] += (new_ts - old_ts) * q->width;
>+ q->runtime += (new_ts - old_ts) * q->width;
> }
>
> void xe_exec_queue_kill(struct xe_exec_queue *q)
>diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>index ee78d497d838..085a7dce7565 100644
>--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>@@ -151,6 +151,8 @@ struct xe_exec_queue {
> * Protected by @vm's resv. Unused if @vm == NULL.
> */
> u64 tlb_flush_seqno;
>+ /** @runtime: hw engine class runtime in ticks for this exec queue*/
>+ u64 runtime;
> /** @lrc: logical ring context for this exec queue */
> struct xe_lrc lrc[];
> };
>--
>2.34.1
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/xe: Do not access xe file when updating exec queue runtime
2024-05-24 18:49 [PATCH] drm/xe: Do not access xe file when updating exec queue runtime Umesh Nerlige Ramappa
2024-05-24 19:25 ` Umesh Nerlige Ramappa
@ 2024-05-24 19:35 ` Rodrigo Vivi
2024-05-24 20:03 ` Umesh Nerlige Ramappa
1 sibling, 1 reply; 4+ messages in thread
From: Rodrigo Vivi @ 2024-05-24 19:35 UTC (permalink / raw)
To: Umesh Nerlige Ramappa
Cc: intel-xe, jani.saarinen, matthew.auld, lucas.demarchi
On Fri, May 24, 2024 at 11:49:58AM -0700, Umesh Nerlige Ramappa wrote:
> The current code is running into a use after free case where xe file is
> closed before the exec queue runtime can be updated. This is occuring in
> the xe_file_close path. To fix that, do not access xe file when updating
> the exec queue runtime. Instead store the exec queue runtime locally in
> the exec queue object and accumulate it when the user dumps the drm
> client stats. We know that the xe file is valid when user is dumping the
> runtime stats for the drm client, so this effectively removes the
> dependency on xe file object in xe_exec_queue_update_runtime().
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
> Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
> drivers/gpu/drm/xe/xe_drm_client.c | 4 +++-
> drivers/gpu/drm/xe/xe_exec_queue.c | 5 +----
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
> 3 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index af404c9e5cc0..a6a15562b8f7 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -251,8 +251,10 @@ static void show_runtime(struct drm_printer *p, struct drm_file *file)
>
> /* Accumulate all the exec queues from this client */
> mutex_lock(&xef->exec_queue.lock);
> - xa_for_each(&xef->exec_queue.xa, i, q)
> + xa_for_each(&xef->exec_queue.xa, i, q) {
> xe_exec_queue_update_runtime(q);
> + xef->runtime[q->class] += q->runtime - xef->runtime[q->class];
> + }
> mutex_unlock(&xef->exec_queue.lock);
>
> /* Get the total GPU cycles */
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 0fd61fb4d104..a0775a96efb1 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -765,7 +765,6 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
> */
> void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> {
> - struct xe_file *xef;
> struct xe_lrc *lrc;
> u32 old_ts, new_ts;
>
> @@ -777,8 +776,6 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> if (!q->vm || !q->vm->xef)
> return;
>
> - xef = q->vm->xef;
> -
> /*
> * Only sample the first LRC. For parallel submission, all of them are
> * scheduled together and we compensate that below by multiplying by
> @@ -789,7 +786,7 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
> */
> lrc = &q->lrc[0];
> new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
> - xef->runtime[q->class] += (new_ts - old_ts) * q->width;
> + q->runtime += (new_ts - old_ts) * q->width;
> }
>
> void xe_exec_queue_kill(struct xe_exec_queue *q)
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index ee78d497d838..085a7dce7565 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -151,6 +151,8 @@ struct xe_exec_queue {
> * Protected by @vm's resv. Unused if @vm == NULL.
> */
> u64 tlb_flush_seqno;
> + /** @runtime: hw engine class runtime in ticks for this exec queue*/
^
missing space ^
good idea to move to the exec_queue
> + u64 runtime;
could we please find a better name for this? runtime what?
or do you mean run_time?
Is this the total run time of the exec queue expressed in ticks of the eu clock?
My poor brain parses runtime as the phase of the operation, vs
run_time as the time spent running.
or perhaps 'run_ticks'?
> /** @lrc: logical ring context for this exec queue */
> struct xe_lrc lrc[];
> };
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] drm/xe: Do not access xe file when updating exec queue runtime
2024-05-24 19:35 ` Rodrigo Vivi
@ 2024-05-24 20:03 ` Umesh Nerlige Ramappa
0 siblings, 0 replies; 4+ messages in thread
From: Umesh Nerlige Ramappa @ 2024-05-24 20:03 UTC (permalink / raw)
To: Rodrigo Vivi; +Cc: intel-xe, jani.saarinen, matthew.auld, lucas.demarchi
On Fri, May 24, 2024 at 03:35:32PM -0400, Rodrigo Vivi wrote:
>On Fri, May 24, 2024 at 11:49:58AM -0700, Umesh Nerlige Ramappa wrote:
>> The current code is running into a use after free case where xe file is
>> closed before the exec queue runtime can be updated. This is occuring in
>> the xe_file_close path. To fix that, do not access xe file when updating
>> the exec queue runtime. Instead store the exec queue runtime locally in
>> the exec queue object and accumulate it when the user dumps the drm
>> client stats. We know that the xe file is valid when user is dumping the
>> runtime stats for the drm client, so this effectively removes the
>> dependency on xe file object in xe_exec_queue_update_runtime().
>>
>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
>> Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_drm_client.c | 4 +++-
>> drivers/gpu/drm/xe/xe_exec_queue.c | 5 +----
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
>> 3 files changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
>> index af404c9e5cc0..a6a15562b8f7 100644
>> --- a/drivers/gpu/drm/xe/xe_drm_client.c
>> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
>> @@ -251,8 +251,10 @@ static void show_runtime(struct drm_printer *p, struct drm_file *file)
>>
>> /* Accumulate all the exec queues from this client */
>> mutex_lock(&xef->exec_queue.lock);
>> - xa_for_each(&xef->exec_queue.xa, i, q)
>> + xa_for_each(&xef->exec_queue.xa, i, q) {
>> xe_exec_queue_update_runtime(q);
>> + xef->runtime[q->class] += q->runtime - xef->runtime[q->class];
>> + }
>> mutex_unlock(&xef->exec_queue.lock);
>>
>> /* Get the total GPU cycles */
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 0fd61fb4d104..a0775a96efb1 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -765,7 +765,6 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
>> */
>> void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
>> {
>> - struct xe_file *xef;
>> struct xe_lrc *lrc;
>> u32 old_ts, new_ts;
>>
>> @@ -777,8 +776,6 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
>> if (!q->vm || !q->vm->xef)
>> return;
>>
>> - xef = q->vm->xef;
>> -
>> /*
>> * Only sample the first LRC. For parallel submission, all of them are
>> * scheduled together and we compensate that below by multiplying by
>> @@ -789,7 +786,7 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
>> */
>> lrc = &q->lrc[0];
>> new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
>> - xef->runtime[q->class] += (new_ts - old_ts) * q->width;
>> + q->runtime += (new_ts - old_ts) * q->width;
>> }
>>
>> void xe_exec_queue_kill(struct xe_exec_queue *q)
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index ee78d497d838..085a7dce7565 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -151,6 +151,8 @@ struct xe_exec_queue {
>> * Protected by @vm's resv. Unused if @vm == NULL.
>> */
>> u64 tlb_flush_seqno;
>> + /** @runtime: hw engine class runtime in ticks for this exec queue*/
> ^
>missing space ^
>
>good idea to move to the exec_queue
>
>> + u64 runtime;
>
>could we please find a better name for this? runtime what?
>or do you mean run_time?
>Is this the total run time of the exec queue expressed in ticks of the eu clock?
>
>My poor brain parses runtime as the phase of the operation, vs
>run_time as the time spent running.
>
>or perhaps 'run_ticks'?
It was xef->runtime earlier, so just carried over the name to exec queue
as is. xef still has a runtime field that will accumulate the
q->runtime, so I would need to change it there as well. I will pick
run_ticks for now since it is indeed ticks.
Thanks,
Umesh
>
>> /** @lrc: logical ring context for this exec queue */
>> struct xe_lrc lrc[];
>> };
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-05-24 20:03 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-24 18:49 [PATCH] drm/xe: Do not access xe file when updating exec queue runtime Umesh Nerlige Ramappa
2024-05-24 19:25 ` Umesh Nerlige Ramappa
2024-05-24 19:35 ` Rodrigo Vivi
2024-05-24 20:03 ` Umesh Nerlige Ramappa
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.