From: Matthew Brost <matthew.brost@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Intel-gfx@lists.freedesktop.org,
	John Harrison <John.C.Harrison@Intel.com>,
	dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: Log engine resets
Date: Mon, 20 Dec 2021 09:55:37 -0800
Message-ID: <20211220175537.GA2663@jons-linux-dev-box>
In-Reply-To: <3d32df02-c02e-9c35-5165-79af1cb10100@linux.intel.com>

On Mon, Dec 20, 2021 at 03:00:53PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/12/2021 16:22, Matthew Brost wrote:
> > On Fri, Dec 17, 2021 at 12:15:53PM +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 14/12/2021 15:07, Tvrtko Ursulin wrote:
> > > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > 
> > > > Log engine resets done by the GuC firmware in the similar way it is done
> > > > by the execlists backend.
> > > > 
> > > > This way we have notion of where the hangs are before the GuC gains
> > > > support for proper error capture.
> > > 
> > > Ping - any interest to log this info?
> > > 
> > > All there currently is a non-descriptive "[drm] GPU HANG: ecode
> > > 12:0:00000000".
> > > 
> > 
> > Yea, this could be helpful. One suggestion below.
> > 
> > > Also, will GuC be reporting the reason for the engine reset at any point?
> > > 
> > 
> > We are working on the error state capture, presumably the registers will
> > give a clue what caused the hang.
> > 
> > As for the GuC providing a reason, that isn't defined in the interface,
> > but it is a decent idea to provide a hint in the G2H about what the
> > issue was. Let me run that by the i915 GuC developers / GuC firmware
> > team and see what they think.
> > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: John Harrison <John.C.Harrison@Intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 +++++++++++-
> > > >    1 file changed, 11 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 97311119da6f..51512123dc1a 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -11,6 +11,7 @@
> > > >    #include "gt/intel_context.h"
> > > >    #include "gt/intel_engine_pm.h"
> > > >    #include "gt/intel_engine_heartbeat.h"
> > > > +#include "gt/intel_engine_user.h"
> > > >    #include "gt/intel_gpu_commands.h"
> > > >    #include "gt/intel_gt.h"
> > > >    #include "gt/intel_gt_clock_utils.h"
> > > > @@ -3934,9 +3935,18 @@ static void capture_error_state(struct intel_guc *guc,
> > > >    {
> > > >    	struct intel_gt *gt = guc_to_gt(guc);
> > > >    	struct drm_i915_private *i915 = gt->i915;
> > > > -	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > > > +	struct intel_engine_cs *engine = ce->engine;
> > > >    	intel_wakeref_t wakeref;
> > > > +	if (intel_engine_is_virtual(engine)) {
> > > > +		drm_notice(&i915->drm, "%s class, engines 0x%x; GuC engine reset\n",
> > > > +			   intel_engine_class_repr(engine->class),
> > > > +			   engine->mask);
> > > > +		engine = guc_virtual_get_sibling(engine, 0);
> > > > +	} else {
> > > > +		drm_notice(&i915->drm, "%s GuC engine reset\n", engine->name);
> > 
> > Probably include the guc_id of the context too then?
> 
> Is the guc id stable and useful on its own - who would be the user?
> 

Technically not stable, but in practice it is. A user could use it to
correlate the context that was reset with a GuC log.

More debug info is typically better.
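
To illustrate the suggestion, here is a rough userspace sketch of what the
two messages could look like with the guc_id appended. It is not the patch
itself: struct mock_engine, struct mock_context, format_reset_msg() and the
"guc_id=%u" suffix are hypothetical stand-ins, class_name stands in for the
result of intel_engine_class_repr(), and the assumption that the id lives
in a plain guc_id field on the context is only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct intel_engine_cs (illustration only). */
struct mock_engine {
	const char *name;       /* e.g. "rcs0"; not used for virtual engines */
	const char *class_name; /* stands in for intel_engine_class_repr() */
	uint32_t mask;          /* mask of physical engines behind this one */
	bool is_virtual;        /* stands in for intel_engine_is_virtual() */
};

/* Hypothetical stand-in for struct intel_context (illustration only). */
struct mock_context {
	const struct mock_engine *engine;
	uint16_t guc_id;        /* assumed location of the GuC context id */
};

/*
 * Build the reset message into buf, mirroring the two branches of the
 * patch and appending the guc_id in both. Returns what snprintf() returns.
 */
static int format_reset_msg(char *buf, size_t len,
			    const struct mock_context *ce)
{
	const struct mock_engine *engine = ce->engine;

	if (engine->is_virtual)
		return snprintf(buf, len,
				"%s class, engines 0x%x; GuC engine reset, guc_id=%u",
				engine->class_name, (unsigned int)engine->mask,
				(unsigned int)ce->guc_id);

	return snprintf(buf, len, "%s GuC engine reset, guc_id=%u",
			engine->name, (unsigned int)ce->guc_id);
}
```

In the driver the same format strings would go through drm_notice() as in
the patch; the sketch only shows how the id could be carried in both the
virtual and the physical engine case so it can be matched against a GuC log.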

Matt

> Regards,
> 
> Tvrtko
