Intel-XE Archive on lore.kernel.org
From: John Harrison <john.c.harrison@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <Intel-Xe@lists.freedesktop.org>
Subject: Re: [PATCH 2/2] drm/xe/guc: Support crash dump notification from GuC
Date: Fri, 8 Nov 2024 18:09:58 -0800
Message-ID: <7a2641b8-2cc4-4f1c-bcb0-b82d8a7df125@intel.com>
In-Reply-To: <90f215aa-5d52-46d0-9f85-0cefc580c707@intel.com>

On 11/8/2024 16:39, John Harrison wrote:
> On 11/8/2024 15:56, Matthew Brost wrote:
>> On Fri, Nov 08, 2024 at 03:51:12PM -0800, John Harrison wrote:
>>> On 11/8/2024 15:35, Matthew Brost wrote:
>>>> On Fri, Nov 08, 2024 at 01:27:37PM -0800, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> Add support for the two crash dump notifications from GuC. Either one
>>>>> means GuC is toast, so just capture state and trigger a reset.
>>>>>
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/xe/xe_guc_ct.c | 23 +++++++++++++++++++++++
>>>>>    1 file changed, 23 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>>>>> index 63bd91963eb1..7eb175a0b874 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>>>>> @@ -54,6 +54,7 @@ enum {
>>>>>        CT_DEAD_PARSE_G2H_UNKNOWN,        /* 0x1000 */
>>>>>        CT_DEAD_PARSE_G2H_ORIGIN,        /* 0x2000 */
>>>>>        CT_DEAD_PARSE_G2H_TYPE,            /* 0x4000 */
>>>>> +    CT_DEAD_CRASH,                /* 0x8000 */
>>>>>    };
>>>>>    static void ct_dead_worker_func(struct work_struct *w);
>>>>> @@ -1120,6 +1121,24 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
>>>>>        return 0;
>>>>>    }
>>>>> +static int guc_crash_process_msg(struct xe_guc_ct *ct, u32 action)
>>>>> +{
>>>>> +    struct xe_gt *gt = ct_to_gt(ct);
>>>>> +
>>>>> +    if (action == XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED)
>>>>> +        xe_gt_err(gt, "GuC Crash dump notification\n");
>>>>> +    else if (action == XE_GUC_ACTION_NOTIFY_EXCEPTION)
>>>>> +        xe_gt_err(gt, "GuC Exception notification\n");
>>>>> +    else
>>>>> +        xe_gt_err(gt, "Unknown GuC crash notification: 0x%04X\n", action);
>>>>> +
>>>>> +    CT_DEAD(ct, NULL, CRASH);
>>>>> +
>>>>> +    kick_reset(ct);
>>>> Side note, we may want to wire a devcoredump to a GT reset too.
>>> I have a work-in-progress series to allow creating a devcoredump without a
>>> scheduler job. I assume that would be a prerequisite to creating one from an
>>> arbitrary GT reset. Certainly coming in from an async event such as this,
>>> there is no scheduler job to use. Hoping to post that soon. Should be easy
>>> enough to connect it to the GT reset then.
>>>
>> We appear to be stepping on each other's feet, just posted this one...
>>
>> https://patchwork.freedesktop.org/series/141110/
> I did see that. Haven't had a chance to look in detail yet. But I 
> don't think it really affects the changes I'm doing. Either sched_job 
> or exec_queue doesn't make a difference, we don't have access to 
> either outside of the submission path. My other changes are more about 
> splitting the print code up a bit to allow dump via the dmesg helper 
> (for internal developer use) as well as via sysfs. The bit I'm 
> missing at the moment is how to get to engine state without having a 
> job/queue to start from. I was also wanting to allow capture of 
> multiple GTs in a single dump. I'll see if I can quickly clean up what 
> I've got so far and post it so you can take a look.
>
Posted: https://patchwork.freedesktop.org/series/141122/

> John.
>
>>
>> I had to code these locally while working on something else so threw
>> them on the list.
>>
>> Let me know if I missed anything there or if you want me to hold up
>> merging as I was planning on merging once CI is clean.
>>
>> Also agree it is a small rework (don't assume we have a queue) on top of
>> this to connect this to a GT reset.
>>
>> Matt
>>
>>> John.
>>>
>>>> Anyways this patch LGTM. With that:
>>>> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>>>>
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>>    static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>>>>>    {
>>>>>        struct xe_gt *gt =  ct_to_gt(ct);
>>>>> @@ -1294,6 +1313,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>>>>>        case GUC_ACTION_GUC2PF_ADVERSE_EVENT:
>>>>>            ret = xe_gt_sriov_pf_monitor_process_guc2pf(gt, hxg, hxg_len);
>>>>>            break;
>>>>>            break;
>>>>> +    case XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED:
>>>>> +    case XE_GUC_ACTION_NOTIFY_EXCEPTION:
>>>>> +        ret = guc_crash_process_msg(ct, action);
>>>>> +        break;
>>>>>        default:
>>>>>            xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>>>>>        }
>>>>> -- 
>>>>> 2.47.0
>>>>>
>
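
For readers following the quoted patch outside a kernel tree, here is a rough userspace sketch of the same pattern: two notification codes share one handler, which logs, records a "crash" reason bit, and requests a reset. Every name here (demo_ct, demo_process_msg, the DEMO_* constants) is hypothetical, and the action values are placeholders, not the real GuC ABI numbers. It also assumes the hex comment next to CT_DEAD_CRASH denotes the bit that reason sets (bit 15, i.e. 0x8000). The counter stands in for kick_reset(), and nothing here models the state capture done by CT_DEAD().

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical action codes; placeholders, not the real GuC ABI values. */
enum demo_action {
	DEMO_ACTION_CRASH_DUMP_POSTED = 0x1,
	DEMO_ACTION_EXCEPTION = 0x2,
	DEMO_ACTION_OTHER = 0x3,
};

/* Assumed bit index of the "crash" reason: 1 << 15 == 0x8000. */
enum { DEMO_DEAD_CRASH = 15 };

struct demo_ct {
	uint32_t dead_reason;	/* bitmask of recorded failure reasons */
	unsigned int resets;	/* number of resets requested so far */
};

/* Log which notification arrived, then record the reason and ask for a reset. */
static int demo_crash_process_msg(struct demo_ct *ct, uint32_t action)
{
	if (action == DEMO_ACTION_CRASH_DUMP_POSTED)
		fprintf(stderr, "crash dump notification\n");
	else if (action == DEMO_ACTION_EXCEPTION)
		fprintf(stderr, "exception notification\n");
	else
		fprintf(stderr, "unknown crash notification: 0x%04X\n",
			(unsigned int)action);

	ct->dead_reason |= UINT32_C(1) << DEMO_DEAD_CRASH;
	ct->resets++;	/* stand-in for kicking a reset */

	return 0;
}

/* Dispatch: both crash-type actions fall through to the same handler. */
static int demo_process_msg(struct demo_ct *ct, uint32_t action)
{
	switch (action) {
	case DEMO_ACTION_CRASH_DUMP_POSTED:
	case DEMO_ACTION_EXCEPTION:
		return demo_crash_process_msg(ct, action);
	default:
		fprintf(stderr, "unexpected action 0x%04x\n",
			(unsigned int)action);
		return -1;	/* this sketch's choice, not taken from the patch */
	}
}
```

Calling demo_process_msg() on a zero-initialised struct demo_ct with DEMO_ACTION_EXCEPTION leaves dead_reason at 0x8000 and resets at 1. An unrecognised action is only logged and leaves both fields untouched.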


Thread overview: 19+ messages
2024-11-08 21:27 [PATCH 0/2] drm/xe/guc: Handle crash notifications & drop default log verbosity John.C.Harrison
2024-11-08 21:27 ` [PATCH 1/2] drm/xe/guc: Reduce default GuC " John.C.Harrison
2024-11-08 23:33   ` Matthew Brost
2024-11-08 23:49     ` John Harrison
2024-11-08 21:27 ` [PATCH 2/2] drm/xe/guc: Support crash dump notification from GuC John.C.Harrison
2024-11-08 23:35   ` Matthew Brost
2024-11-08 23:51     ` John Harrison
2024-11-08 23:56       ` Matthew Brost
2024-11-09  0:39         ` John Harrison
2024-11-09  2:09           ` John Harrison [this message]
2024-11-08 21:50 ` [PATCH 0/2] drm/xe/guc: Handle crash notifications & drop default log verbosity Cavitt, Jonathan
2024-11-08 22:28 ` ✓ CI.Patch_applied: success for " Patchwork
2024-11-08 22:28 ` ✓ CI.checkpatch: " Patchwork
2024-11-08 22:29 ` ✓ CI.KUnit: " Patchwork
2024-11-08 22:41 ` ✓ CI.Build: " Patchwork
2024-11-08 22:43 ` ✓ CI.Hooks: " Patchwork
2024-11-08 22:45 ` ✓ CI.checksparse: " Patchwork
2024-11-08 23:02 ` ✓ CI.BAT: " Patchwork
2024-11-10  1:47 ` ✗ CI.FULL: failure " Patchwork
