From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH 1/2] drm/xe: Always capture exec queues on snapshot
Date: Mon, 8 Apr 2024 14:32:41 -0400
Message-ID: <ZhQ4SSuBH82XDiGN@intel.com>
In-Reply-To: <20240405211632.223568-2-matthew.brost@intel.com>
On Fri, Apr 05, 2024 at 02:16:31PM -0700, Matthew Brost wrote:
> Always capture exec queues on snapshot, regardless of whether the exec
> queue has pending jobs. Whether or not a queue has jobs does not
> indicate whether its capture is useful.
>
> Examples of bugs that are hard to detect if capture is skipped when the
> pending job list is empty:
> - Jobs pending on the exec queue that are waiting on dependencies
> - Leaking exec queue refs
> - GuC protocol issues (e.g. a lost G2H message)
>
> Beyond the bugs above, it is generally useful to see every exec queue
> registered with the GuC along with its state.
>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
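
A minimal sketch of the policy change described in the commit message, using a hypothetical, simplified model rather than the real struct xe_exec_queue. It assumes (as I understand drm_sched) that jobs still waiting on dependencies have not reached the scheduler's pending list, so the old "pending jobs only" check skips exactly the queues that are stuck. All names here (model_exec_queue, old_should_capture, new_should_capture, count_captured) are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified exec queue; not the real xe structure. */
struct model_exec_queue {
	int guc_id;
	int pending_jobs; /* jobs on the scheduler's pending list */
	int blocked_jobs; /* assumed: jobs still waiting on dependencies */
};

/* Before the patch: only queues with pending jobs were captured. */
static bool old_should_capture(const struct model_exec_queue *q)
{
	return q->pending_jobs > 0;
}

/* After the patch: every registered queue is captured. */
static bool new_should_capture(const struct model_exec_queue *q)
{
	(void)q;
	return true;
}

/* Count how many queues a given policy would include in a snapshot. */
static size_t count_captured(const struct model_exec_queue *qs, size_t n,
			     bool (*should_capture)(const struct model_exec_queue *))
{
	size_t captured = 0;

	for (size_t i = 0; i < n; i++)
		if (should_capture(&qs[i]))
			captured++;
	return captured;
}
```

With one busy queue, one queue whose jobs are all blocked on dependencies, and one idle queue, the old policy captures only the busy queue while the new policy captures all three, so the stuck queue shows up in the dump.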
> ---
> drivers/gpu/drm/xe/xe_devcoredump.c | 2 +-
> drivers/gpu/drm/xe/xe_guc_submit.c | 25 +++----------------------
> drivers/gpu/drm/xe/xe_guc_submit.h | 4 ++--
> 3 files changed, 6 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
> index a951043b2943..283ca7518aff 100644
> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
> @@ -188,7 +188,7 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
> xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n");
>
> coredump->snapshot.ct = xe_guc_ct_snapshot_capture(&guc->ct, true);
> - coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(job);
> + coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(q);
> coredump->snapshot.job = xe_sched_job_snapshot_capture(job);
> coredump->snapshot.vm = xe_vm_snapshot_capture(q->vm);
>
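
The devcoredump hunk only switches the argument from the job to its queue. A hedged sketch of why taking the queue is the more flexible shape: a caller holding a faulting job can still reach the queue through job->q, while a caller that has only a queue (like guc_exec_queue_print()) no longer needs to find a job first. The model_* types and helpers are hypothetical stand-ins, not xe code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the xe structures. */
struct model_exec_queue { int guc_id; };
struct model_sched_job { struct model_exec_queue *q; };
struct model_snapshot { int guc_id; };

/* New shape: the capture helper only needs the exec queue. */
static struct model_snapshot *
model_snapshot_capture(struct model_exec_queue *q)
{
	struct model_snapshot *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->guc_id = q->guc_id;
	return s;
}

/* Caller owning a faulting job, mirroring devcoredump_snapshot(). */
static struct model_snapshot *capture_for_job(struct model_sched_job *job)
{
	return model_snapshot_capture(job->q);
}

/* Caller that only has a queue, mirroring guc_exec_queue_print(). */
static struct model_snapshot *capture_for_queue(struct model_exec_queue *q)
{
	return model_snapshot_capture(q);
}
```

Both call paths end up in the same helper; the caller stays responsible for freeing the snapshot, much as xe_guc_exec_queue_snapshot_free() is used in the real code.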
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 9c30bd9ac8c0..cc1890e322cb 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1777,7 +1777,7 @@ guc_exec_queue_wq_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
>
> /**
> * xe_guc_exec_queue_snapshot_capture - Take a quick snapshot of the GuC Engine.
> - * @job: faulty Xe scheduled job.
> + * @q: faulty exec queue
> *
> * This can be printed out in a later stage like during dev_coredump
> * analysis.
> @@ -1786,9 +1786,8 @@ guc_exec_queue_wq_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
> * caller, using `xe_guc_exec_queue_snapshot_free`.
> */
> struct xe_guc_submit_exec_queue_snapshot *
> -xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job)
> +xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
> {
> - struct xe_exec_queue *q = job->q;
> struct xe_gpu_scheduler *sched = &q->guc->sched;
> struct xe_guc_submit_exec_queue_snapshot *snapshot;
> int i;
> @@ -1944,28 +1943,10 @@ void xe_guc_exec_queue_snapshot_free(struct xe_guc_submit_exec_queue_snapshot *s
> static void guc_exec_queue_print(struct xe_exec_queue *q, struct drm_printer *p)
> {
> struct xe_guc_submit_exec_queue_snapshot *snapshot;
> - struct xe_gpu_scheduler *sched = &q->guc->sched;
> - struct xe_sched_job *job;
> - bool found = false;
>
> - spin_lock(&sched->base.job_list_lock);
> - list_for_each_entry(job, &sched->base.pending_list, drm.list) {
> - if (job->q == q) {
> - xe_sched_job_get(job);
> - found = true;
> - break;
> - }
> - }
> - spin_unlock(&sched->base.job_list_lock);
> -
> - if (!found)
> - return;
> -
> - snapshot = xe_guc_exec_queue_snapshot_capture(job);
> + snapshot = xe_guc_exec_queue_snapshot_capture(q);
> xe_guc_exec_queue_snapshot_print(snapshot, p);
> xe_guc_exec_queue_snapshot_free(snapshot);
> -
> - xe_sched_job_put(job);
> }
>
> /**
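
A rough model of the guc_exec_queue_print() change, assuming a single simplified scheduler shared by all model queues (in xe each queue has its own q->guc->sched). The removed code took job_list_lock, searched the pending list for a job belonging to the queue, held a reference while printing, and returned early when nothing was found; the new code prints unconditionally. Locking and refcounting appear only as comments and counters here; every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model, not kernel code. */
struct model_exec_queue { int guc_id; };
struct model_job { struct model_exec_queue *q; int refcount; };

struct model_sched {
	struct model_job *pending[8];
	size_t n_pending;
};

static int printed; /* number of queues printed so far */

static void model_print_queue_state(struct model_exec_queue *q)
{
	printf("exec queue guc_id=%d\n", q->guc_id);
	printed++;
}

/* Before the patch: print only if a pending job references q. */
static void old_queue_print(struct model_sched *sched,
			    struct model_exec_queue *q)
{
	struct model_job *job = NULL;

	/* kernel: spin_lock(&sched->base.job_list_lock) */
	for (size_t i = 0; i < sched->n_pending; i++) {
		if (sched->pending[i]->q == q) {
			job = sched->pending[i];
			job->refcount++; /* kernel: xe_sched_job_get() */
			break;
		}
	}
	/* kernel: spin_unlock(&sched->base.job_list_lock) */

	if (!job)
		return; /* idle or stuck queue is silently skipped */

	model_print_queue_state(q);
	job->refcount--; /* kernel: xe_sched_job_put() */
}

/* After the patch: every queue is printed. */
static void new_queue_print(struct model_exec_queue *q)
{
	model_print_queue_state(q);
}
```

Besides making idle or stuck queues visible, dropping the search also removes the lock and job reference juggling from the print path.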
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index 2f14dfd04722..fad0421ead36 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -9,8 +9,8 @@
> #include <linux/types.h>
>
> struct drm_printer;
> +struct xe_exec_queue;
> struct xe_guc;
> -struct xe_sched_job;
>
> int xe_guc_submit_init(struct xe_guc *guc);
>
> @@ -27,7 +27,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
>
> struct xe_guc_submit_exec_queue_snapshot *
> -xe_guc_exec_queue_snapshot_capture(struct xe_sched_job *job);
> +xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
> void
> xe_guc_exec_queue_snapshot_capture_delayed(struct xe_guc_submit_exec_queue_snapshot *snapshot);
> void
> --
> 2.34.1
>
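
The header hunk swaps the forward declaration of struct xe_sched_job for struct xe_exec_queue. That suffices because the prototype only takes a pointer, and C allows pointers to an incomplete struct type. A small single-file illustration with hypothetical demo_* names: the "header" part declares only, and the full definitions follow as they would in the .c file.

```c
#include <assert.h>
#include <stdlib.h>

/* "Header" part: forward declarations are enough for pointer params. */
struct demo_exec_queue;
struct demo_snapshot;

struct demo_snapshot *demo_snapshot_capture(struct demo_exec_queue *q);
int demo_snapshot_guc_id(const struct demo_snapshot *s);

/* "Source file" part: the full definitions live here. */
struct demo_exec_queue { int guc_id; };
struct demo_snapshot { int guc_id; };

struct demo_snapshot *demo_snapshot_capture(struct demo_exec_queue *q)
{
	struct demo_snapshot *s = malloc(sizeof(*s));

	if (s)
		s->guc_id = q->guc_id;
	return s;
}

int demo_snapshot_guc_id(const struct demo_snapshot *s)
{
	return s->guc_id;
}
```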
Thread overview: 13+ messages
2024-04-05 21:16 [PATCH 0/2] Snapshot updates Matthew Brost
2024-04-05 21:16 ` [PATCH 1/2] drm/xe: Always capture exec queues on snapshot Matthew Brost
2024-04-08 18:32 ` Rodrigo Vivi [this message]
2024-04-05 21:16 ` [PATCH 2/2] drm/xe: Capture GuC CT snapshot when stopped Matthew Brost
2024-04-08 18:32 ` Rodrigo Vivi
2024-04-09 7:45 ` Matthew Auld
2024-04-05 23:21 ` ✓ CI.Patch_applied: success for Snapshot updates Patchwork
2024-04-05 23:22 ` ✓ CI.checkpatch: " Patchwork
2024-04-05 23:23 ` ✓ CI.KUnit: " Patchwork
2024-04-05 23:34 ` ✓ CI.Build: " Patchwork
2024-04-05 23:37 ` ✓ CI.Hooks: " Patchwork
2024-04-05 23:38 ` ✓ CI.checksparse: " Patchwork
2024-04-06 0:06 ` ✓ CI.BAT: " Patchwork