Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <stable@vger.kernel.org>,
	Matthew Brost <matthew.brost@intel.com>
Subject: Re: [PATCH v2] drm/xe/client: Better correlate exec_queue and GT timestamps
Date: Mon, 13 Jan 2025 15:00:55 -0800	[thread overview]
Message-ID: <Z4WbJ4DMaPS+LfJ5@orsosgc001> (raw)
In-Reply-To: <20250109200340.1774314-1-lucas.demarchi@intel.com>

On Thu, Jan 09, 2025 at 12:03:40PM -0800, Lucas De Marchi wrote:
>This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec
>queue file lock usage."). While it's desired to have the mutex to
>protect only the reference to the exec queue, getting and dropping each
>mutex and then later getting the GPU timestamp, doesn't produce a
>correct result: it introduces multiple opportunities for the task to be
>scheduled out and thus wrecking havoc the deltas reported to userspace.
>
>Also, to better correlate the timestamp from the exec queues with the
>GPU, disable preemption so they can be updated without allowing the task
>to be scheduled out. We leave interrupts enabled as that shouldn't be
>enough disturbance for the deltas to matter to userspace.
>
>Test scenario:
>
>	* IGT'S `xe_drm_fdinfo --r utilization-single-full-load`
>	* Platform: LNL, where CI occasionally reports failures
>	* `stress -c $(nproc)` running in parallel to disturb the
>	  system
>
>This brings a first failure from "after ~150 executions" to "never
>occurs after 1000 attempts".
>
>v2: Also keep xe_hw_engine_read_timestamp() call inside the
>    preemption-disabled section (Umesh)
>
>Cc: stable@vger.kernel.org # v6.11+
>Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>Cc: Matthew Brost <matthew.brost@intel.com>
>Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3512
>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>---
> drivers/gpu/drm/xe/xe_drm_client.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
>index 7d55ad846bac5..2220a09bf9751 100644
>--- a/drivers/gpu/drm/xe/xe_drm_client.c
>+++ b/drivers/gpu/drm/xe/xe_drm_client.c
>@@ -337,20 +337,18 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
> 		return;
> 	}
>
>+	/* Let both the GPU timestamp and exec queue be updated together */
>+	preempt_disable();
>+	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>+
> 	/* Accumulate all the exec queues from this client */
> 	mutex_lock(&xef->exec_queue.lock);

mutex_lock could sleep and you have disabled preemption above, so not a 
good idea. I think it will bug check if the lock is contended.

Earlier you had mutex_lock on the outside, so that was fine.

Thanks,
Umesh

>-	xa_for_each(&xef->exec_queue.xa, i, q) {
>-		xe_exec_queue_get(q);
>-		mutex_unlock(&xef->exec_queue.lock);
>
>+	xa_for_each(&xef->exec_queue.xa, i, q)
> 		xe_exec_queue_update_run_ticks(q);
>
>-		mutex_lock(&xef->exec_queue.lock);
>-		xe_exec_queue_put(q);
>-	}
> 	mutex_unlock(&xef->exec_queue.lock);
>-
>-	gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
>+	preempt_enable();
>
> 	xe_force_wake_put(gt_to_fw(hwe->gt), fw_ref);
> 	xe_pm_runtime_put(xe);
>-- 
>2.47.0
>

  parent reply	other threads:[~2025-01-13 23:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-09 20:03 [PATCH v2] drm/xe/client: Better correlate exec_queue and GT timestamps Lucas De Marchi
2025-01-09 22:38 ` ✓ CI.Patch_applied: success for drm/xe/client: Better correlate exec_queue and GT timestamps (rev2) Patchwork
2025-01-09 22:38 ` ✗ CI.checkpatch: warning " Patchwork
2025-01-09 22:39 ` ✓ CI.KUnit: success " Patchwork
2025-01-09 22:58 ` ✓ CI.Build: " Patchwork
2025-01-09 23:00 ` ✓ CI.Hooks: " Patchwork
2025-01-09 23:01 ` ✓ CI.checksparse: " Patchwork
2025-01-09 23:30 ` ✓ Xe.CI.BAT: " Patchwork
2025-01-12 12:41 ` ✗ Xe.CI.Full: failure " Patchwork
2025-01-13 23:00 ` Umesh Nerlige Ramappa [this message]
2025-01-14  0:38   ` [PATCH v2] drm/xe/client: Better correlate exec_queue and GT timestamps Lucas De Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4WbJ4DMaPS+LfJ5@orsosgc001 \
    --to=umesh.nerlige.ramappa@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox