Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<Intel-Xe@Lists.FreeDesktop.Org>
Subject: Re: [PATCH] drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from
Date: Mon, 24 Mar 2025 09:48:45 -0700	[thread overview]
Message-ID: <de0073ad-9b42-4c0e-bdfe-a47d45215cc4@intel.com> (raw)
In-Reply-To: <094db2ab-de03-4505-9256-5fa552babb0a@intel.com>

<snip>

>>>> @@ -1141,6 +1165,47 @@ static int guc_crash_process_msg(struct
>>>> xe_guc_ct *ct, u32 action)
>>>>        return 0;
>>>>    }
>>>>    +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>>>> +static void fast_req_report(struct xe_guc_ct *ct, u16 fence)
>>>> +{
>>>> +    unsigned int n;
>>>> +    bool found = false;
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> +    char *buf;
>>>> +#endif
>>>> +
>>>> +    lockdep_assert_held(&ct->lock);
>>>> +
>>>> +    for (n = 0; n < ARRAY_SIZE(ct->fast_req); n++) {
>>>> +        if (ct->fast_req[n].fence != fence)
>>>> +            continue;
>>>> +        found = true;
>>>> +
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> +        buf = kmalloc(SZ_4K, GFP_NOWAIT);
>>>> +        if (buf && stack_depot_snprint(ct->fast_req[n].stack, buf,
>>>> SZ_4K, 0))
>>>> +            xe_gt_err(ct_to_gt(ct), "Fence 0x%x was used by action
>>>> %#04x sent at\n%s",
>>>> +                  fence, ct->fast_req[n].action, buf);
>>>> +        else
>>>> +            xe_gt_err(ct_to_gt(ct), "Fence 0x%x was used by action
>>>> %#04x [failed to retrieve stack]\n",
>>>> +                  fence, ct->fast_req[n].action);
>>>> +        kfree(buf);
>>>> +#else
>>>> +        xe_gt_err(ct_to_gt(ct), "Fence 0x%x was used by action 
>>>> %#04x\n",
>>>> +              fence, ct->fast_req[n].action);
>>>> +#endif
>>>> +        break;
>>>> +    }
>>>> +
>>>> +    if (!found)
>>>> +        xe_gt_warn(ct_to_gt(ct), "FAST_REQ G2H fence 0x%x not found!
>>>> \n", fence);
>>> Not convinced about this error message. the fast_req array is only 32
>>> entries deep, so it wouldn't be weird for entries to be overwritten 
>>> in a
>>> busy system, but the read I get from this message is that something is
>>> wrong with the fact that we didn't find the fence. Maybe go for
>>> something like: "FAST_REQ G2H fence 0x%x action unknown". Not a 
>>> blocker.
> How about:
>                 xe_gt_warn(gt, "Fence 0x%x not found - tracking buffer 
> wrapped?\n", fence);
>
>

Sounds good to me

<snip>

>
>>
>>>> +    /** @action: H2G action code */
>>>> +    u16 action;
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>> +    /** @stack: call stack from when the H2G was sent */
>>>> +    depot_stack_handle_t stack;
>>>> +#endif
>>>> +};
>>> nit: should this whole struct be wrapped in CONFIG_DRM_XE_DEBUG? Not
>>> sure if any code analyzer would be smart enough to mark it as unused if
>>> CONFIG_DRM_XE_DEBUG is not set.
> Um, it is. Or are you saying that it should not be?
>
> That '#endif' just below matches a CONFIG_DRM_XE_DEBUG that was 
> previously just wrapping the DEAD_CT structure. Now it wraps both that 
> and the FAST_REQ fence structure.

D'oh! I missed that the ifdef was already there.

Daniele

>
> John.
>
>
>>>
>>> Apart from the nits this looks good to me:
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>
>>> Daniele
>>>
>>>>    #endif
>>>>      /**
>>>> @@ -152,6 +165,8 @@ struct xe_guc_ct {
>>>>    #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>>>>        /** @dead: information for debugging dead CTs */
>>>>        struct xe_dead_ct dead;
>>>> +    /** @fast_req: history of FAST_REQ messages for matching with G2H
>>>> error responses*/
>>>> +    struct xe_fast_req_fence fast_req[SZ_32];
>>>>    #endif
>>>>    };
>>>>    diff --git a/drivers/gpu/drm/xe/xe_guc_log.h b/drivers/gpu/drm/xe/
>>>> xe_guc_log.h
>>>> index 5b896f5fafaf..f1e2b0be90a9 100644
>>>> --- a/drivers/gpu/drm/xe/xe_guc_log.h
>>>> +++ b/drivers/gpu/drm/xe/xe_guc_log.h
>>>> @@ -12,7 +12,7 @@
>>>>    struct drm_printer;
>>>>    struct xe_device;
>>>>    -#if IS_ENABLED(CONFIG_DRM_XE_LARGE_GUC_BUFFER)
>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG_GUC)
>>>>    #define CRASH_BUFFER_SIZE       SZ_1M
>>>>    #define DEBUG_BUFFER_SIZE       SZ_8M
>>>>    #define CAPTURE_BUFFER_SIZE     SZ_2M
>


      reply	other threads:[~2025-03-24 16:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-21  3:14 [PATCH] drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from John.C.Harrison
2025-02-21  3:19 ` ✓ CI.Patch_applied: success for " Patchwork
2025-02-21  3:20 ` ✗ CI.checkpatch: warning " Patchwork
2025-02-21  3:21 ` ✓ CI.KUnit: success " Patchwork
2025-02-21  3:37 ` ✓ CI.Build: " Patchwork
2025-02-21  3:40 ` ✓ CI.Hooks: " Patchwork
2025-02-21  3:41 ` ✓ CI.checksparse: " Patchwork
2025-02-21  4:02 ` ✓ Xe.CI.BAT: " Patchwork
2025-02-21 18:23 ` ✗ Xe.CI.Full: failure " Patchwork
2025-03-06  1:15 ` [PATCH] " Daniele Ceraolo Spurio
2025-03-06 22:48   ` Michal Wajdeczko
2025-03-21 17:07     ` John Harrison
2025-03-24 16:48       ` Daniele Ceraolo Spurio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=de0073ad-9b42-4c0e-bdfe-a47d45215cc4@intel.com \
    --to=daniele.ceraolospurio@intel.com \
    --cc=Intel-Xe@Lists.FreeDesktop.Org \
    --cc=john.c.harrison@intel.com \
    --cc=michal.wajdeczko@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox