All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <jani.saarinen@intel.com>,
	<matthew.auld@intel.com>, <lucas.demarchi@intel.com>
Subject: Re: [PATCH] drm/xe: Do not access xe file when updating exec queue runtime
Date: Fri, 24 May 2024 15:35:32 -0400	[thread overview]
Message-ID: <ZlDsBOpHS9ybagbt@intel.com> (raw)
In-Reply-To: <20240524184958.1344787-1-umesh.nerlige.ramappa@intel.com>

On Fri, May 24, 2024 at 11:49:58AM -0700, Umesh Nerlige Ramappa wrote:
> The current code is running into a use after free case where xe file is
> closed before the exec queue runtime can be updated. This is occuring in
> the xe_file_close path. To fix that, do not access xe file when updating
> the exec queue runtime. Instead store the exec queue runtime locally in
> the exec queue object and accumulate it when the user dumps the drm
> client stats. We know that the xe file is valid when user is dumping the
> runtime stats for the drm client, so this effectively removes the
> dependency on xe file object in xe_exec_queue_update_runtime().
> 
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1908
> Fixes: 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime")
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_drm_client.c       | 4 +++-
>  drivers/gpu/drm/xe/xe_exec_queue.c       | 5 +----
>  drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 ++
>  3 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index af404c9e5cc0..a6a15562b8f7 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -251,8 +251,10 @@ static void show_runtime(struct drm_printer *p, struct drm_file *file)
>  
>  	/* Accumulate all the exec queues from this client */
>  	mutex_lock(&xef->exec_queue.lock);
> -	xa_for_each(&xef->exec_queue.xa, i, q)
> +	xa_for_each(&xef->exec_queue.xa, i, q) {
>  		xe_exec_queue_update_runtime(q);
> +		xef->runtime[q->class] += q->runtime - xef->runtime[q->class];
> +	}
>  	mutex_unlock(&xef->exec_queue.lock);
>  
>  	/* Get the total GPU cycles */
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 0fd61fb4d104..a0775a96efb1 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -765,7 +765,6 @@ bool xe_exec_queue_is_idle(struct xe_exec_queue *q)
>   */
>  void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
>  {
> -	struct xe_file *xef;
>  	struct xe_lrc *lrc;
>  	u32 old_ts, new_ts;
>  
> @@ -777,8 +776,6 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
>  	if (!q->vm || !q->vm->xef)
>  		return;
>  
> -	xef = q->vm->xef;
> -
>  	/*
>  	 * Only sample the first LRC. For parallel submission, all of them are
>  	 * scheduled together and we compensate that below by multiplying by
> @@ -789,7 +786,7 @@ void xe_exec_queue_update_runtime(struct xe_exec_queue *q)
>  	 */
>  	lrc = &q->lrc[0];
>  	new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
> -	xef->runtime[q->class] += (new_ts - old_ts) * q->width;
> +	q->runtime += (new_ts - old_ts) * q->width;
>  }
>  
>  void xe_exec_queue_kill(struct xe_exec_queue *q)
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index ee78d497d838..085a7dce7565 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -151,6 +151,8 @@ struct xe_exec_queue {
>  	 * Protected by @vm's resv. Unused if @vm == NULL.
>  	 */
>  	u64 tlb_flush_seqno;
> +	/** @runtime: hw engine class runtime in ticks for this exec queue*/
                                                                          ^
missing space                                                             ^

good idea to move to the exec_queue

> +	u64 runtime;

could we please find a better name for this? runtime what?
or do you mean run_time?
Is this the total run time of the exec queue expressed in ticks of the eu clock?

My poor brain parses runtime as the phase of the operation, vs
run_time as the time spent running.

or perhaps 'run_ticks'?

>  	/** @lrc: logical ring context for this exec queue */
>  	struct xe_lrc lrc[];
>  };
> -- 
> 2.34.1
> 

  parent reply	other threads:[~2024-05-24 19:35 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-24 18:49 [PATCH] drm/xe: Do not access xe file when updating exec queue runtime Umesh Nerlige Ramappa
2024-05-24 19:25 ` Umesh Nerlige Ramappa
2024-05-24 19:35 ` Rodrigo Vivi [this message]
2024-05-24 20:03   ` Umesh Nerlige Ramappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZlDsBOpHS9ybagbt@intel.com \
    --to=rodrigo.vivi@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jani.saarinen@intel.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.auld@intel.com \
    --cc=umesh.nerlige.ramappa@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.