From: John Harrison <john.c.harrison@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
<Intel-GFX@Lists.FreeDesktop.Org>
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Look for a guilty context when an engine reset fails
Date: Wed, 30 Nov 2022 13:04:23 -0800
Message-ID: <751f5d84-b7c4-e459-957a-06ad47d4b1de@intel.com>
In-Reply-To: <17ba580d-556b-c963-703c-b80e74c050f9@linux.intel.com>
On 11/30/2022 00:30, Tvrtko Ursulin wrote:
> On 29/11/2022 21:12, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Engine resets are supposed to never happen. But in the case when one
>
> Engine resets or engine reset failures? Hopefully the latter.
>
Oops. Yes, that was meant to say "engine resets are never supposed to fail."
>> does (due to unknwon reasons that normally come down to a missing
unknwon -> unknown
>> w/a), it is useful to get as much information out of the system as
>> possible. Given that the GuC effectively dies on such a situation, it
>> is not possible to get a guilty context notification back. So do a
>> manual search instead. Given that GuC is dead, this is safe because
>> GuC won't be changing the engine state asynchronously.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>> drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 0a42f1807f52c..c82730804a1c4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -4751,11 +4751,24 @@ static void reset_fail_worker_func(struct work_struct *w)
>> guc->submission_state.reset_fail_mask = 0;
>> spin_unlock_irqrestore(&guc->submission_state.lock, flags);
>> - if (likely(reset_fail_mask))
>> + if (likely(reset_fail_mask)) {
>> + struct intel_engine_cs *engine;
>> + enum intel_engine_id id;
>> +
>> + /*
>> + * GuC is toast at this point - it dead loops after sending the failed
>> + * reset notification. So need to manually determine the guilty context.
>> + * Note that it should be safe/reliable to do this here because the GuC
>> + * is toast and will not be scheduling behind the KMD's back.
>> + */
>> + for_each_engine_masked(engine, gt, reset_fail_mask, id)
>> + intel_guc_find_hung_context(engine);
>> +
>> intel_gt_handle_error(gt, reset_fail_mask,
>> I915_ERROR_CAPTURE,
>> "GuC failed to reset engine mask=0x%x\n",
>> reset_fail_mask);
>
> If GuC is defined by ABI contract to be dead, should the flow be
> attempting to do a full GPU reset here, or maybe it happens somewhere
> else as a consequence anyway? (In which case is the engine reset here
> even needed?)
This is a full GT reset. i915 is not allowed to perform an engine reset
when using GuC submission; those can only be done by GuC. So any reset
forced by i915 is escalated to a full GT reset internally.
John.
>
> Regards,
>
> Tvrtko
>
>> + }
>> }
>> int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
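
For illustration only, the flow discussed in this message (walk the failed
engine mask to look for a guilty context on each engine, then let the host
driver's reset escalate to a full GT reset under GuC submission) can be
modelled in a small user-space C sketch. Every name here (MODEL_NUM_ENGINES,
model_failed_engines, model_pick_reset) is hypothetical and is not part of
i915; the non-GuC branch returning a per-engine reset is an assumption of the
model, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine count for this model; not an i915 constant. */
#define MODEL_NUM_ENGINES 8

enum model_reset_kind {
	MODEL_RESET_NONE,
	MODEL_RESET_ENGINE,
	MODEL_RESET_FULL_GT,
};

/*
 * Collect the indices of the engines whose bit is set in fail_mask,
 * lowest bit first. This mirrors the shape of a masked engine walk:
 * each returned index is an engine that would be searched for a hung
 * context. Returns the number of entries written to ids.
 */
static size_t model_failed_engines(uint32_t fail_mask, unsigned int *ids,
				   size_t max_ids)
{
	size_t n = 0;

	assert(ids != NULL);

	for (unsigned int id = 0; id < MODEL_NUM_ENGINES && n < max_ids; id++) {
		if (fail_mask & (UINT32_C(1) << id))
			ids[n++] = id;
	}

	return n;
}

/*
 * Decide which kind of reset the host driver ends up performing.
 * With GuC submission, per-engine resets belong to the GuC, so a reset
 * forced by the host is escalated to a full GT reset. The non-GuC
 * branch is an assumption made for this model only.
 */
static enum model_reset_kind model_pick_reset(uint32_t engine_mask,
					      int uses_guc_submission)
{
	if (!engine_mask)
		return MODEL_RESET_NONE;

	return uses_guc_submission ? MODEL_RESET_FULL_GT : MODEL_RESET_ENGINE;
}
```

With a failed mask of 0x5, the walk visits engines 0 and 2 in this model, and
the reset that follows is a full GT reset whenever GuC submission is in use.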