Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: John Harrison <john.c.harrison@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	<Intel-GFX@Lists.FreeDesktop.Org>
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 2/4] drm/i915: Allow error capture of a pending request
Date: Thu, 12 Jan 2023 12:46:27 -0800	[thread overview]
Message-ID: <5a8be54d-9627-3d60-b6b0-22f3732e6962@intel.com> (raw)
In-Reply-To: <dab002d8-75f7-d8b5-d0a0-a6a21ec724b0@linux.intel.com>

On 1/12/2023 02:06, Tvrtko Ursulin wrote:
> On 12/01/2023 02:53, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> A hang situation has been observed where the only requests on the
>> context were either completed or not yet started according to the
>> breaadcrumbs. However, the register state claimed a batch was (maybe)
>> in progress. So, allow capture of the pending request on the grounds
>> that this might be better than nothing.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gpu_error.c | 8 +++-----
>>   1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
>> b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index bd2cf7d235df0..2e338a9667a4b 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1628,11 +1628,9 @@ capture_engine(struct intel_engine_cs *engine,
>>       if (ce) {
>>           intel_engine_clear_hung_context(engine);
>>           rq = intel_context_find_active_request(ce);
>> -        if (rq && !i915_request_started(rq)) {
>> -            drm_info(&engine->gt->i915->drm, "Got hung context on %s 
>> with no active request!\n",
>> -                 engine->name);
>> -            rq = NULL;
>> -        }
>> +        if (rq && !i915_request_started(rq))
>> +            drm_info(&engine->gt->i915->drm, "Confused - active 
>> request not yet started: %lld:%lld, ce = 0x%04X/%s!\n",
>> +                 rq->fence.context, rq->fence.seqno, ce->guc_id.id, 
>> engine->name);
>
> Ah you change active to started in this patch! :)
Yeah, I'm wanting to keep these two patches separate. This one is a more 
questionable change in actual behaviour. The previous patch just allows 
capturing the context when the request has been rejected. Whereas this 
one changes the request acceptance criteria. With the potential to start 
blaming innocent requests. It seems plausible to me, especially with the 
warning message. We know the context owning the request is guilty so why 
wouldn't we blame that request just because the tracking is off (maybe 
due to some driver bug). But I could see someone objecting on grounds of 
being super strict about who/what gets blamed for a hang and either 
nacks or maybe wants this change reverted some time later.

>
> I suggest no "ce" in user visible messages and maybe stick with the 
> convention grep suggest is already established:
>
> "Hung context with active request %lld:%lld [0x%04X] not started!"
>
Are you also meaning to drop the engine name? I think it is important to 
keep the '%s' in there somewhere.

John.


> Regards,
>
> Tvrtko
>
>>       } else {
>>           /*
>>            * Getting here with GuC enabled means it is a forced error 
>> capture


  reply	other threads:[~2023-01-12 20:46 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-12  2:53 [Intel-gfx] [PATCH 0/4] Allow error capture without a request / on reset failure John.C.Harrison
2023-01-12  2:53 ` [Intel-gfx] [PATCH 1/4] drm/i915: Allow error capture without a request John.C.Harrison
2023-01-12 10:01   ` Tvrtko Ursulin
2023-01-12 20:40     ` John Harrison
2023-01-13  9:51       ` Tvrtko Ursulin
2023-01-13 17:46         ` Hellstrom, Thomas
2023-01-13 21:29           ` John Harrison
2023-01-16 12:38             ` Tvrtko Ursulin
2023-01-17 19:40               ` John Harrison
2023-01-16 12:13           ` Tvrtko Ursulin
2023-01-12  2:53 ` [Intel-gfx] [PATCH 2/4] drm/i915: Allow error capture of a pending request John.C.Harrison
2023-01-12 10:06   ` Tvrtko Ursulin
2023-01-12 20:46     ` John Harrison [this message]
2023-01-13  9:10       ` Tvrtko Ursulin
2023-01-12  2:53 ` [Intel-gfx] [PATCH 3/4] drm/i915/guc: Look for a guilty context when an engine reset fails John.C.Harrison
2023-01-12 10:15   ` Tvrtko Ursulin
2023-01-12 20:59     ` John Harrison
2023-01-13  9:22       ` Tvrtko Ursulin
2023-01-14  1:27         ` John Harrison
2023-01-16 12:43           ` Tvrtko Ursulin
2023-01-17 21:14             ` John Harrison
2023-01-12  2:53 ` [Intel-gfx] [PATCH 4/4] drm/i915/guc: Add a debug print on GuC triggered reset John.C.Harrison
2023-01-12 10:11   ` Tvrtko Ursulin
2023-01-12  3:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Allow error capture without a request / on reset failure (rev2) Patchwork
2023-01-12  3:36 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-01-12  5:36 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a8be54d-9627-3d60-b6b0-22f3732e6962@intel.com \
    --to=john.c.harrison@intel.com \
    --cc=DRI-Devel@Lists.FreeDesktop.Org \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox