From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Don't claim an unstarted request was guilty
Date: Fri, 08 Feb 2019 17:31:54 +0200 [thread overview]
Message-ID: <87zhr61el1.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <154963787506.3378.11368751484623076898@skylake-alporthouse-com>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Quoting Mika Kuoppala (2019-02-08 14:47:13)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > If we haven't even begun executing the payload of the stalled request,
>> > then we should not claim that its userspace context was guilty of
>> > submitting a hanging batch.
>> >
>> > v2: Check for context corruption before trying to restart.
>> > v3: Preserve semaphores on skipping requests (need to keep the timelines
>> > intact).
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> > drivers/gpu/drm/i915/intel_lrc.c | 42 +++++++++++++++++--
>> > drivers/gpu/drm/i915/selftests/igt_spinner.c | 9 +++-
>> > .../gpu/drm/i915/selftests/intel_hangcheck.c | 6 +++
>> > 3 files changed, 53 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> > index 5e98fd79bd9d..e3134a635926 100644
>> > --- a/drivers/gpu/drm/i915/intel_lrc.c
>> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> > @@ -1387,6 +1387,10 @@ static int gen8_emit_init_breadcrumb(struct i915_request *rq)
>> > *cs++ = rq->fence.seqno - 1;
>> >
>> > intel_ring_advance(rq, cs);
>> > +
>> > + /* Record the updated position of the request's payload */
>> > + rq->infix = intel_ring_offset(rq, cs);
>> > +
>> > return 0;
>> > }
>> >
>> > @@ -1878,6 +1882,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
>> > spin_unlock_irqrestore(&engine->timeline.lock, flags);
>> > }
>> >
>> > +static bool lrc_regs_ok(const struct i915_request *rq)
>> > +{
>> > + const struct intel_ring *ring = rq->ring;
>> > + const u32 *regs = rq->hw_context->lrc_reg_state;
>> > +
>> > + /* Quick spot check for the common signs of context corruption */
>> > +
>> > + if (regs[CTX_RING_BUFFER_CONTROL + 1] !=
>> > + (RING_CTL_SIZE(ring->size) | RING_VALID))
>> > + return false;
>>
>> You been noticing this with ctx corruption? Well now
>> thinking about it, we have had reports where on init,
>> on some trouble, the valid vanishes.
>
> Yes, it's why we've been copying the default context over guilty for a
> long time (pretty much since live_hangcheck became a thing iirc).
>
>> > +
>> > + if (regs[CTX_RING_BUFFER_START + 1] != i915_ggtt_offset(ring->vma))
>> > + return false;
>> > +
>>
>> Seen this on bugzilla reports too. Are there more
>> in your sleeve or is this a compromise on complexity
>> and performance? Checking on a sane actual head too?
>
> No, I can't remember seeing this, just loosing CTL. But I do recall that
> at one point it seemed the whole reg state was zero, but that is foggy
> memory.
I might mix this with the failed init where the start was
bogus/stale. Looked like after the hang, the hardware just
swapped to a previous ctx without any provocation.
Regardless of the reason, this will guard the
restart.
>
>> The heuristics of it bothers me some as we will
>> get false positives.
>
> They cannot be false positives! If we restore to a batch setup like
> this, it will hang -- which is why we explicitly reset them.
>
>> So in effect, when we get one, we just move ahead
>> after an extra reset as we got it all wrong?
>
> Yup. The context is corrupt, we replace it with a sane one and hope
> nobody notices. Mesa does notice though....
We tell them about our shortcomings, if they choose to listen.
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-02-08 15:32 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-07 7:18 [PATCH 1/8] drm/i915: Hack and slash, throttle execbuffer hogs Chris Wilson
2019-02-07 7:18 ` [PATCH 2/8] drm/i915: Defer removing fence register tracking to rpm wakeup Chris Wilson
2019-02-07 13:22 ` Mika Kuoppala
2019-02-07 13:38 ` Chris Wilson
2019-02-07 14:09 ` Mika Kuoppala
2019-02-07 14:13 ` Chris Wilson
2019-02-07 7:18 ` [PATCH 3/8] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
2019-02-07 15:05 ` Mika Kuoppala
2019-02-07 7:18 ` [PATCH 4/8] drm/i915: Force the GPU reset upon wedging Chris Wilson
2019-02-08 9:31 ` Mika Kuoppala
2019-02-08 9:47 ` Chris Wilson
2019-02-07 7:18 ` [PATCH 5/8] drm/i915: Uninterruptibly drain the timelines on unwedging Chris Wilson
2019-02-08 9:46 ` Mika Kuoppala
2019-02-08 10:00 ` Chris Wilson
2019-02-08 15:07 ` Mika Kuoppala
2019-02-08 15:13 ` Chris Wilson
2019-02-07 7:18 ` [PATCH 6/8] drm/i915: Wait for old resets before applying debugfs/i915_wedged Chris Wilson
2019-02-08 9:56 ` Mika Kuoppala
2019-02-08 10:01 ` Chris Wilson
2019-02-07 7:18 ` [PATCH 7/8] drm/i915: Serialise resets with wedging Chris Wilson
2019-02-08 14:30 ` Mika Kuoppala
2019-02-07 7:18 ` [PATCH 8/8] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
2019-02-07 7:41 ` [PATCH] " Chris Wilson
2019-02-08 14:47 ` Mika Kuoppala
2019-02-08 14:58 ` Chris Wilson
2019-02-08 15:31 ` Mika Kuoppala [this message]
2019-02-07 8:08 ` ✗ Fi.CI.SPARSE: warning for series starting with [1/8] drm/i915: Hack and slash, throttle execbuffer hogs (rev2) Patchwork
2019-02-07 8:25 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-07 9:53 ` ✓ Fi.CI.IGT: " Patchwork
2019-02-07 16:01 ` [PATCH 1/8] drm/i915: Hack and slash, throttle execbuffer hogs Joonas Lahtinen
2019-02-07 16:05 ` Chris Wilson
2019-02-07 16:21 ` Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zhr61el1.fsf@gaia.fi.intel.com \
--to=mika.kuoppala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.