public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Don't claim an unstarted request was guilty
Date: Fri, 08 Feb 2019 17:31:54 +0200	[thread overview]
Message-ID: <87zhr61el1.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <154963787506.3378.11368751484623076898@skylake-alporthouse-com>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2019-02-08 14:47:13)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > If we haven't even begun executing the payload of the stalled request,
>> > then we should not claim that its userspace context was guilty of
>> > submitting a hanging batch.
>> >
>> > v2: Check for context corruption before trying to restart.
>> > v3: Preserve semaphores on skipping requests (need to keep the timelines
>> > intact).
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/intel_lrc.c              | 42 +++++++++++++++++--
>> >  drivers/gpu/drm/i915/selftests/igt_spinner.c  |  9 +++-
>> >  .../gpu/drm/i915/selftests/intel_hangcheck.c  |  6 +++
>> >  3 files changed, 53 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index 5e98fd79bd9d..e3134a635926 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -1387,6 +1387,10 @@ static int gen8_emit_init_breadcrumb(struct i915_request *rq)
>> >       *cs++ = rq->fence.seqno - 1;
>> >  
>> >       intel_ring_advance(rq, cs);
>> > +
>> > +     /* Record the updated position of the request's payload */
>> > +     rq->infix = intel_ring_offset(rq, cs);
>> > +
>> >       return 0;
>> >  }
>> >  
>> > @@ -1878,6 +1882,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
>> >       spin_unlock_irqrestore(&engine->timeline.lock, flags);
>> >  }
>> >  
>> > +static bool lrc_regs_ok(const struct i915_request *rq)
>> > +{
>> > +     const struct intel_ring *ring = rq->ring;
>> > +     const u32 *regs = rq->hw_context->lrc_reg_state;
>> > +
>> > +     /* Quick spot check for the common signs of context corruption */
>> > +
>> > +     if (regs[CTX_RING_BUFFER_CONTROL + 1] !=
>> > +         (RING_CTL_SIZE(ring->size) | RING_VALID))
>> > +             return false;
>> 
>> You been noticing this with ctx corruption? Well now
>> thinking about it, we have had reports where on init,
>> on some trouble, the valid vanishes.
>
> Yes, it's why we've been copying the default context over guilty for a
> long time (pretty much since live_hangcheck became a thing iirc).
>
>> > +
>> > +     if (regs[CTX_RING_BUFFER_START + 1] != i915_ggtt_offset(ring->vma))
>> > +             return false;
>> > +
>> 
>> Seen this on bugzilla reports too. Are there more
>> in your sleeve or is this a compromise on complexity
>> and performance? Checking on a sane actual head too?
>
> No, I can't remember seeing this, just loosing CTL. But I do recall that
> at one point it seemed the whole reg state was zero, but that is foggy
> memory.

I might mix this with the failed init where the start was
bogus/stale. Looked like after the hang, the hardware just
swapped to a previous ctx without any provocation.

Regardless of the reason, this will guard the
restart.

>  
>> The heuristics of it bothers me some as we will
>> get false positives.
>
> They cannot be false positives! If we restore to a batch setup like
> this, it will hang -- which is why we explicitly reset them.
>  
>> So in effect, when we get one, we just move ahead
>> after an extra reset as we got it all wrong?
>
> Yup. The context is corrupt, we replace it with a sane one and hope
> nobody notices. Mesa does notice though....

We tell them about our shortcomings, if they choose to listen.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-02-08 15:32 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-07  7:18 [PATCH 1/8] drm/i915: Hack and slash, throttle execbuffer hogs Chris Wilson
2019-02-07  7:18 ` [PATCH 2/8] drm/i915: Defer removing fence register tracking to rpm wakeup Chris Wilson
2019-02-07 13:22   ` Mika Kuoppala
2019-02-07 13:38     ` Chris Wilson
2019-02-07 14:09       ` Mika Kuoppala
2019-02-07 14:13         ` Chris Wilson
2019-02-07  7:18 ` [PATCH 3/8] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
2019-02-07 15:05   ` Mika Kuoppala
2019-02-07  7:18 ` [PATCH 4/8] drm/i915: Force the GPU reset upon wedging Chris Wilson
2019-02-08  9:31   ` Mika Kuoppala
2019-02-08  9:47     ` Chris Wilson
2019-02-07  7:18 ` [PATCH 5/8] drm/i915: Uninterruptibly drain the timelines on unwedging Chris Wilson
2019-02-08  9:46   ` Mika Kuoppala
2019-02-08 10:00     ` Chris Wilson
2019-02-08 15:07       ` Mika Kuoppala
2019-02-08 15:13         ` Chris Wilson
2019-02-07  7:18 ` [PATCH 6/8] drm/i915: Wait for old resets before applying debugfs/i915_wedged Chris Wilson
2019-02-08  9:56   ` Mika Kuoppala
2019-02-08 10:01     ` Chris Wilson
2019-02-07  7:18 ` [PATCH 7/8] drm/i915: Serialise resets with wedging Chris Wilson
2019-02-08 14:30   ` Mika Kuoppala
2019-02-07  7:18 ` [PATCH 8/8] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
2019-02-07  7:41   ` [PATCH] " Chris Wilson
2019-02-08 14:47     ` Mika Kuoppala
2019-02-08 14:58       ` Chris Wilson
2019-02-08 15:31         ` Mika Kuoppala [this message]
2019-02-07  8:08 ` ✗ Fi.CI.SPARSE: warning for series starting with [1/8] drm/i915: Hack and slash, throttle execbuffer hogs (rev2) Patchwork
2019-02-07  8:25 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-07  9:53 ` ✓ Fi.CI.IGT: " Patchwork
2019-02-07 16:01 ` [PATCH 1/8] drm/i915: Hack and slash, throttle execbuffer hogs Joonas Lahtinen
2019-02-07 16:05   ` Chris Wilson
2019-02-07 16:21   ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zhr61el1.fsf@gaia.fi.intel.com \
    --to=mika.kuoppala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox