From: John Harrison <john.c.harrison@intel.com>
To: Matthew Auld <matthew.auld@intel.com>, <intel-xe@lists.freedesktop.org>
Cc: Matthew Brost <matthew.brost@intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [PATCH] drm/xe/guc_submit: improve schedule disable error logging
Date: Fri, 27 Sep 2024 16:05:48 -0700
Message-ID: <e2f109e3-d34c-4461-bfda-910965a14ce9@intel.com>
In-Reply-To: <20240927133535.548793-2-matthew.auld@intel.com>
On 9/27/2024 06:35, Matthew Auld wrote:
> A few things here. Make the two prints consistent (and distinct), print
> the guc_id, and finally dump the CT queues. It should be possible to
> spot the guc_id in the CT queue dump, and for example see that host side
> has yet to process the response for the schedule disable, or see that
> GuC is yet to send it, to help narrow things down if we trigger the
> timeout.

Where are you seeing these failures? Is there an understanding of why?
Or is this patch basically a "we have no idea what is going on, so get
better logs out of CI" type thing? In which case what you really want is
to generate a devcoredump (with my debug improvements patch set to include
the GuC log and suchlike) and to get CI to give you the core dumps back.

And maybe this is related to the fix from Badal: "drm/xe/guc: In
guc_ct_send_recv flush g2h worker if g2h resp times out"? We have seen
problems where the worker is simply not getting to run before the
timeout expires.
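
To illustrate that last failure mode, here is a rough userspace sketch. It
is not the xe code, and every name in it (fake_channel, fake_worker_run,
wait_for_reply) is made up for illustration. It assumes the reply is
already queued but only becomes visible to the waiter once a deferred
worker has run, so a starved worker looks exactly like a missing reply
unless the waiter flushes the worker once before giving up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a G2H channel: the firmware's reply may be
 * sitting in the buffer, but the waiter only sees it after the worker
 * has processed it. */
struct fake_channel {
	bool reply_queued;	/* firmware has written its response */
	bool reply_processed;	/* worker has consumed the response */
};

/* Hypothetical worker body: consume a queued reply, if there is one. */
static void fake_worker_run(struct fake_channel *ch)
{
	assert(ch != NULL);
	if (ch->reply_queued) {
		ch->reply_queued = false;
		ch->reply_processed = true;
	}
}

enum wait_result { WAIT_OK, WAIT_OK_AFTER_FLUSH, WAIT_TIMEOUT };

/* Poll up to max_polls times. worker_scheduled models whether the worker
 * actually gets CPU time during the wait. When the polls run out, run the
 * worker once by hand (the "flush") and re-check before reporting a
 * timeout. */
static enum wait_result wait_for_reply(struct fake_channel *ch,
				       int max_polls, bool worker_scheduled)
{
	for (int i = 0; i < max_polls; i++) {
		if (worker_scheduled)
			fake_worker_run(ch);
		if (ch->reply_processed)
			return WAIT_OK;
	}

	fake_worker_run(ch);	/* flush before declaring failure */
	return ch->reply_processed ? WAIT_OK_AFTER_FLUSH : WAIT_TIMEOUT;
}
```

In this model a queued reply with a starved worker comes back as
WAIT_OK_AFTER_FLUSH rather than WAIT_TIMEOUT, which is the distinction a
plain "failed to respond" message cannot make.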
John.
>
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1638
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Nirmoy Das <nirmoy.das@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 80062e1d3f66..52ed7c0043f9 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -977,7 +977,12 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>  					 !exec_queue_pending_disable(q) ||
>  					 guc_read_stopped(guc), HZ * 5);
>  		if (!ret) {
> -			drm_warn(&xe->drm, "Schedule disable failed to respond");
> +			struct xe_gt *gt = guc_to_gt(guc);
> +			struct drm_printer p = xe_gt_err_printer(gt);
> +
> +			xe_gt_warn(gt, "%s schedule disable failed to respond guc_id=%d",
> +				   __func__, ge->id);
> +			xe_guc_ct_print(&guc->ct, &p, false);
>  			xe_sched_submission_start(sched);
>  			xe_gt_reset_async(q->gt);
>  			return;
> @@ -1177,8 +1182,14 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  					 guc_read_stopped(guc), HZ * 5);
>  		if (!ret || guc_read_stopped(guc)) {
>  trigger_reset:
> -			if (!ret)
> -				xe_gt_warn(guc_to_gt(guc), "Schedule disable failed to respond");
> +			if (!ret) {
> +				struct xe_gt *gt = guc_to_gt(guc);
> +				struct drm_printer p = xe_gt_err_printer(gt);
> +
> +				xe_gt_warn(gt, "%s schedule disable failed to respond guc_id=%d",
> +					   __func__, q->guc->id);
> +				xe_guc_ct_print(&guc->ct, &p, true);
> +			}
>  			set_exec_queue_extra_ref(q);
>  			xe_exec_queue_get(q);	/* GT reset owns this */
>  			set_exec_queue_banned(q);