From: John Harrison <john.c.harrison@intel.com>
To: Nirmoy Das <nirmoy.das@intel.com>,
Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe: Add engine name to the engine reset and cat-err log
Date: Mon, 29 Apr 2024 14:17:31 -0700 [thread overview]
Message-ID: <5ce5dda0-5de8-4619-bab4-5875c46d9c92@intel.com> (raw)
In-Reply-To: <2a989da8-26ad-498f-bf7d-19796eac2fa8@intel.com>
On 4/25/2024 14:16, Nirmoy Das wrote:
> Hi Matt,
>
> On 4/25/2024 10:31 PM, Matthew Brost wrote:
>> On Thu, Apr 25, 2024 at 09:11:22PM +0200, Nirmoy Das wrote:
>>> Hi Matt,
>>>
>>> On 4/25/2024 7:46 PM, Matthew Brost wrote:
>>>> On Thu, Apr 25, 2024 at 02:18:56PM +0200, Nirmoy Das wrote:
>>>>> Add engine name to the engine reset and cat error log
>>>>> which should be useful while debugging.
>>>>>
>>>>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/xe_guc_submit.c | 5 +++--
>>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> b/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> index c7d38469fb46..245e29d095c0 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>>>>> @@ -1655,7 +1655,7 @@ int xe_guc_exec_queue_reset_handler(struct
>>>>> xe_guc *guc, u32 *msg, u32 len)
>>>>> if (unlikely(!q))
>>>>> return -EPROTO;
>>>>> - drm_info(&xe->drm, "Engine reset: guc_id=%d", guc_id);
>>>>> + drm_info(&xe->drm, "Engine reset: name=%s, guc_id=%d",
>>>>> q->hwe->name, guc_id);
>>>> I don't think q->hwe->name name is useful as it might not actually be
>>>> exec queue is running. I'd drop that, and replace with string
>>>> indicating
Not following this. What is q->hwe->name if it is not the name of the
exec queue which owns the given guc_id?
Note that the notification is officially about a context reset not an
engine reset. The actual implementation mechanism might be an engine
reset but the intent and purpose is to reset the specific context as
identified by the guc_id field. That is, the error report is not that
BCS37 failed and needed to be reset, but that context 43 failed and
needed to be reset and the fact that it happened to executing on BCS37
at the time is more coincidence than cause.
If the q name is not meaningful and just some generic string then maybe
the better fix would be to make that name more useful?
I would also change the base string to be 'Context reset' rather than
'Engine reset'.
>>>> the hardware engine class.
>>> I will resend with engine class instead.
>>>
>> Maybe include the logical mask of exec queue too.
>
> Will do that!
>
> Nirmoy
>
>>
>> Matt
>>
>>> Thanks,
>>>
>>> Nirmoy
>>>
>>>>> /* FIXME: Do error capture, most likely async */
>>>>> @@ -1690,7 +1690,8 @@ int
>>>>> xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32
>>>>> *msg,
>>>>> if (unlikely(!q))
>>>>> return -EPROTO;
>>>>> - drm_dbg(&xe->drm, "Engine memory cat error: guc_id=%d", guc_id);
>>>>> + drm_dbg(&xe->drm, "Engine memory cat error: name=%s, guc_id=%d",
>>>>> + q->hwe->name, guc_id);
>>>> Same here.
>>>>
Indeed. This is also not about an engine failing and create a memory
error. It is about a context attempting to access an invalid address.
John.
>>>> Matt
>>>>
>>>>> trace_xe_exec_queue_memory_cat_error(q);
>>>>> /* Treat the same as engine reset */
>>>>> --
>>>>> 2.42.0
>>>>>
next prev parent reply other threads:[~2024-04-29 21:17 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-25 12:18 [PATCH] drm/xe: Add engine name to the engine reset and cat-err log Nirmoy Das
2024-04-25 13:05 ` ✓ CI.Patch_applied: success for " Patchwork
2024-04-25 13:06 ` ✓ CI.checkpatch: " Patchwork
2024-04-25 13:07 ` ✓ CI.KUnit: " Patchwork
2024-04-25 13:18 ` ✓ CI.Build: " Patchwork
2024-04-25 13:21 ` ✓ CI.Hooks: " Patchwork
2024-04-25 13:22 ` ✓ CI.checksparse: " Patchwork
2024-04-25 13:53 ` ✓ CI.BAT: " Patchwork
2024-04-25 17:46 ` [PATCH] " Matthew Brost
2024-04-25 19:11 ` Nirmoy Das
2024-04-25 20:31 ` Matthew Brost
2024-04-25 21:16 ` Nirmoy Das
2024-04-29 21:17 ` John Harrison [this message]
2024-04-30 3:28 ` Matthew Brost
2024-04-25 19:15 ` Michal Wajdeczko
2024-04-25 20:30 ` Matthew Brost
2024-04-25 21:17 ` Nirmoy Das
2024-04-26 3:12 ` ✓ CI.FULL: success for " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5ce5dda0-5de8-4619-bab4-5875c46d9c92@intel.com \
--to=john.c.harrison@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=nirmoy.das@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox