Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
To: "Harrison, John C" <john.c.harrison@intel.com>,
	"Intel-GFX@Lists.FreeDesktop.Org"
	<Intel-GFX@Lists.FreeDesktop.Org>
Cc: "DRI-Devel@Lists.FreeDesktop.Org" <DRI-Devel@Lists.FreeDesktop.Org>
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: Fix error capture for virtual engines
Date: Thu, 27 Apr 2023 17:00:08 +0000	[thread overview]
Message-ID: <174d334438e887a3a80f6442184cf2f51f8e6c83.camel@intel.com> (raw)
In-Reply-To: <20230415002710.3971019-1-John.C.Harrison@Intel.com>

On Fri, 2023-04-14 at 17:27 -0700, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> GuC based register dumps in error capture logs were basically broken
> for virtual engines. This can be seen in igt@gem_exec_balancer@hang:
>   [IGT] gem_exec_balancer: starting subtest hang
>   [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388]
>   [drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D
>   [drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388]
>   [IGT] gem_exec_balancer: exiting, ret=0
> 
> The test causes a hang on both engines of a virtual engine context.
> The engine instance zero hang gets a valid error capture but the
> non-instance-zero hang does not.
> 
> Fix that by scanning through the list of pending register captures
> when a hang notification for a virtual engine is received. That way,
> the hang can be assigned to the correct physical engine prior to
> starting the error capture process. So later on, when the error capture
> handler tries to find the engine register list, it looks for one on
> the correct engine.
> 
> Also, sneak in a missing blank line before a comment in the node
> search code.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

LGTM - thanks for fixing this! :D
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>

A side conversation - potentially requring an unrelated future patch,...
i notice that the array "error->reset_engine_count[]" (which is being
used for error state reporting and as some verification in selftests) seem
to have a size indicating of engine-instance-count but the reading/wrting
of members of this array keep using the engine->uabi_class as index...
meaning its being used to track engine class reset counts? Maybe this is
an issue or i am misundertanding that. Either way, that issue is unrelated
to the intent of this patch - i just wanted to get that highlighted for
future action if needed. I can take that onus if its in fact an issue.

      parent reply	other threads:[~2023-04-27 17:00 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-15  0:27 [Intel-gfx] [PATCH] drm/i915/guc: Fix error capture for virtual engines John.C.Harrison
2023-04-15  1:10 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
2023-04-15  7:13 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2023-04-27 17:00 ` Teres Alexis, Alan Previn [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=174d334438e887a3a80f6442184cf2f51f8e6c83.camel@intel.com \
    --to=alan.previn.teres.alexis@intel.com \
    --cc=DRI-Devel@Lists.FreeDesktop.Org \
    --cc=Intel-GFX@Lists.FreeDesktop.Org \
    --cc=john.c.harrison@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox