From: Matthew Auld <matthew.auld@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>,
intel-xe@lists.freedesktop.org
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>,
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>,
Matthew Brost <matthew.brost@intel.com>
Subject: Re: [PATCH v2 4/4] drm/xe: Wait on killed exec queues
Date: Wed, 30 Oct 2024 10:56:27 +0000
Message-ID: <b4b37380-caee-4a46-bedd-78f32f622698@intel.com>
In-Reply-To: <20241029214351.776293-5-lucas.demarchi@intel.com>
On 29/10/2024 21:43, Lucas De Marchi wrote:
> When an exec queue is killed it triggers an async process of asking the
> GuC to schedule the context out. The timestamp in the context image is
> only updated when this process completes. If a userspace process kills
> an exec queue and then tries to read the timestamp, it may not get an
> updated runtime.
>
> Add synchronization between the process reading the fdinfo and the exec
> queue being killed. After reading all the timestamps, wait on exec
> queues in the process of being killed. Once that wait completes,
> xe_exec_queue_fini() has already been called and has updated the
> timestamps.
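
(Aside: the pairing here is the usual wait_var_event()/wake_up_var()
pattern; roughly, per xe_file, something like:

	/* teardown side: last pending removal wakes any fdinfo reader */
	if (atomic_dec_and_test(&xef->exec_queue.pending_removal))
		wake_up_var(&xef->exec_queue.pending_removal);

	/* fdinfo side: block until no removals are in flight */
	wait_var_event(&xef->exec_queue.pending_removal,
		       !atomic_read(&xef->exec_queue.pending_removal));

with the two halves living in different files in the patch below.)
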
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2667
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device_types.h | 5 +++++
>  drivers/gpu/drm/xe/xe_drm_client.c   | 7 +++++++
>  drivers/gpu/drm/xe/xe_exec_queue.c   | 4 ++++
>  3 files changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index ef7412d653d2e..b949376ca388a 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -614,6 +614,11 @@ struct xe_file {
>  		 * which does things while being held.
>  		 */
>  		struct mutex lock;
> +		/**
> +		 * @exec_queue.pending_removal: items pending to be removed to
> +		 * synchronize GPU state update with ongoing query.
> +		 */
> +		atomic_t pending_removal;
>  	} exec_queue;
>
>  	/** @run_ticks: hw engine class run time in ticks for this drm client */
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index 22f0f1a6dfd55..24a0a7377abf2 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -317,6 +317,13 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
>  		break;
>  	}
>
> +	/*
> +	 * Wait for any exec queue going away: their cycles will get updated on
> +	 * context switch out, so wait for that to happen
> +	 */
> +	wait_var_event(&xef->exec_queue.pending_removal,
> +		       !atomic_read(&xef->exec_queue.pending_removal));
> +
>  	xe_pm_runtime_put(xe);
>
>  	if (unlikely(!hwe))
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index fd0f3b3c9101d..58dd35beb15ad 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -262,8 +262,11 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
>
>  	/*
>  	 * Before releasing our ref to lrc and xef, accumulate our run ticks
> +	 * and wakeup any waiters.
>  	 */
>  	xe_exec_queue_update_run_ticks(q);
> +	if (q->xef && atomic_dec_and_test(&q->xef->exec_queue.pending_removal))
> +		wake_up_var(&q->xef->exec_queue.pending_removal);
>
>  	for (i = 0; i < q->width; ++i)
>  		xe_lrc_put(q->lrc[i]);
> @@ -824,6 +827,7 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>  	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>  		return -EINVAL;
>
> +	atomic_inc(&xef->exec_queue.pending_removal);
Probably need to move this down. Otherwise a user can supply a bogus
queue id, or accidentally supply an already destroyed queue, and get
many increments per queue. Those increments are then unbalanced with
the single decrement in fini(), and show_run_ticks() gets stuck waiting
forever? See the sketch below the quoted hunk.
>  	mutex_lock(&xef->exec_queue.lock);
>  	q = xa_erase(&xef->exec_queue.xa, args->exec_queue_id);
>  	mutex_unlock(&xef->exec_queue.lock);
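
Maybe something like this (untested sketch; only bump the counter once
we know the erase actually removed a queue, and assuming the existing
!q -> -ENOENT check stays where it is):

	mutex_lock(&xef->exec_queue.lock);
	q = xa_erase(&xef->exec_queue.xa, args->exec_queue_id);
	if (q)
		atomic_inc(&xef->exec_queue.pending_removal);
	mutex_unlock(&xef->exec_queue.lock);

	if (XE_IOCTL_DBG(xe, !q))
		return -ENOENT;

That way every increment is guaranteed to be paired with exactly one
decrement in xe_exec_queue_fini().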