From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: John Harrison <john.c.harrison@intel.com>,
Michal Wajdeczko <michal.wajdeczko@intel.com>,
<Intel-Xe@Lists.FreeDesktop.Org>
Subject: Re: [PATCH] drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from
Date: Mon, 24 Mar 2025 09:48:45 -0700 [thread overview]
Message-ID: <de0073ad-9b42-4c0e-bdfe-a47d45215cc4@intel.com> (raw)
In-Reply-To: <094db2ab-de03-4505-9256-5fa552babb0a@intel.com>
<snip>
>>>> @@ -1141,6 +1165,47 @@ static int guc_crash_process_msg(struct
>>>> xe_guc_ct *ct, u32 action)
>>>> return 0;
>>>> }
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>>>> +static void fast_req_report(struct xe_guc_ct *ct, u16 fence)
>>>> +{
>>>> + unsigned int n;
>>>> + bool found = false;
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> + char *buf;
>>>> +#endif
>>>> +
>>>> + lockdep_assert_held(&ct->lock);
>>>> +
>>>> + for (n = 0; n < ARRAY_SIZE(ct->fast_req); n++) {
>>>> + if (ct->fast_req[n].fence != fence)
>>>> + continue;
>>>> + found = true;
>>>> +
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> + buf = kmalloc(SZ_4K, GFP_NOWAIT);
>>>> + if (buf && stack_depot_snprint(ct->fast_req[n].stack, buf,
>>>> SZ_4K, 0))
>>>> + xe_gt_err(ct_to_gt(ct), "Fence 0x%x was used by action
>>>> %#04x sent at\n%s",
>>>> + fence, ct->fast_req[n].action, buf);
>>>> + else
>>>> + xe_gt_err(ct_to_gt(ct), "Fence 0x%x was used by action
>>>> %#04x [failed to retrieve stack]\n",
>>>> + fence, ct->fast_req[n].action);
>>>> + kfree(buf);
>>>> +#else
>>>> + xe_gt_err(ct_to_gt(ct), "Fence 0x%x was used by action
>>>> %#04x\n",
>>>> + fence, ct->fast_req[n].action);
>>>> +#endif
>>>> + break;
>>>> + }
>>>> +
>>>> + if (!found)
>>>> + xe_gt_warn(ct_to_gt(ct), "FAST_REQ G2H fence 0x%x not found!
>>>> \n", fence);
>>> Not convinced about this error message. the fast_req array is only 32
>>> entries deep, so it wouldn't be weird for entries to be overwritten
>>> in a
>>> busy system, but the read I get from this message is that something is
>>> wrong with the fact that we didn't find the fence. Maybe go for
>>> something like: "FAST_REQ G2H fence 0x%x action unknown". Not a
>>> blocker.
> How about:
> xe_gt_warn(gt, "Fence 0x%x not found - tracking buffer
> wrapped?\n", fence);
>
>
Sounds good to me
<snip>
>
>>
>>>> + /** @action: H2G action code */
>>>> + u16 action;
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> + /** @stack: call stack from when the H2G was sent */
>>>> + depot_stack_handle_t stack;
>>>> +#endif
>>>> +};
>>> nit: should this whole struct be wrapped in CONFIG_DRM_XE_DEBUG? Not
>>> sure if any code analyzer would be smart enough to mark it as unused if
>>> CONFIG_DRM_XE_DEBUG is not set.
> Um, it is. Or are you saying that it should not be?
>
> That '#endif' just below matches a CONFIG_DRM_XE_DEBUG that was
> previously just wrapping the DEAD_CT structure. Now it wraps both that
> and the FAST_REQ fence structure.
D'oh! I missed that the ifdef was already there.
Daniele
>
> John.
>
>
>>>
>>> Apart from the nits this looks good to me:
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>
>>> Daniele
>>>
>>>> #endif
>>>> /**
>>>> @@ -152,6 +165,8 @@ struct xe_guc_ct {
>>>> #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>>>> /** @dead: information for debugging dead CTs */
>>>> struct xe_dead_ct dead;
>>>> + /** @fast_req: history of FAST_REQ messages for matching with G2H
>>>> error responses*/
>>>> + struct xe_fast_req_fence fast_req[SZ_32];
>>>> #endif
>>>> };
>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_log.h b/drivers/gpu/drm/xe/
>>>> xe_guc_log.h
>>>> index 5b896f5fafaf..f1e2b0be90a9 100644
>>>> --- a/drivers/gpu/drm/xe/xe_guc_log.h
>>>> +++ b/drivers/gpu/drm/xe/xe_guc_log.h
>>>> @@ -12,7 +12,7 @@
>>>> struct drm_printer;
>>>> struct xe_device;
>>>> -#if IS_ENABLED(CONFIG_DRM_XE_LARGE_GUC_BUFFER)
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> #define CRASH_BUFFER_SIZE SZ_1M
>>>> #define DEBUG_BUFFER_SIZE SZ_8M
>>>> #define CAPTURE_BUFFER_SIZE SZ_2M
>
prev parent reply other threads:[~2025-03-24 16:48 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-21 3:14 [PATCH] drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from John.C.Harrison
2025-02-21 3:19 ` ✓ CI.Patch_applied: success for " Patchwork
2025-02-21 3:20 ` ✗ CI.checkpatch: warning " Patchwork
2025-02-21 3:21 ` ✓ CI.KUnit: success " Patchwork
2025-02-21 3:37 ` ✓ CI.Build: " Patchwork
2025-02-21 3:40 ` ✓ CI.Hooks: " Patchwork
2025-02-21 3:41 ` ✓ CI.checksparse: " Patchwork
2025-02-21 4:02 ` ✓ Xe.CI.BAT: " Patchwork
2025-02-21 18:23 ` ✗ Xe.CI.Full: failure " Patchwork
2025-03-06 1:15 ` [PATCH] " Daniele Ceraolo Spurio
2025-03-06 22:48 ` Michal Wajdeczko
2025-03-21 17:07 ` John Harrison
2025-03-24 16:48 ` Daniele Ceraolo Spurio [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de0073ad-9b42-4c0e-bdfe-a47d45215cc4@intel.com \
--to=daniele.ceraolospurio@intel.com \
--cc=Intel-Xe@Lists.FreeDesktop.Org \
--cc=john.c.harrison@intel.com \
--cc=michal.wajdeczko@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox