From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 3/4] drm/i915/gt: Perform an arbitration check before busywaiting
Date: Tue, 12 Jan 2021 09:21:17 +0000
Message-ID: <ca8e82e6-7ec3-ffe9-f151-6f08bd304333@linux.intel.com>
In-Reply-To: <161040207613.28181.5503270808079806649@build.alporthouse.com>
On 11/01/2021 21:54, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2021-01-11 17:12:57)
>>
>> On 11/01/2021 16:27, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2021-01-11 16:19:40)
>>>>
>>>> On 11/01/2021 10:57, Chris Wilson wrote:
>>>>> During igt_reset_nop_engine, it was observed that an unexpected failed
>>>>> engine reset led to us busywaiting on the stop-ring semaphore (set
>>>>> during the reset preparations) on the first request afterwards. There was
>>>>> no explicit MI_ARB_CHECK in this sequence as the presumption was that
>>>>> the failed MI_SEMAPHORE_WAIT would itself act as an arbitration point.
>>>>> It did not in this circumstance, so force it.
>>>>
>>>> In other words MI_SEMAPHORE_POLL is not a preemption point? Can't
>>>> remember if I knew that or not.
>>>
>>> MI_SEMAPHORE_WAIT | POLL is most definitely a preemption point on a
>>> miss.
>>>
>>>> 1)
>>>> Why not the same handling in !gen12 version?
>>>
>>> Because I think it's a bug in tgl [a0 at least]. I think I've seen the
>>> same symptoms on tgl before, but not earlier. This is the first time the
>>> sequence clicked as to why it was busy spinning. Random engine reset
>>> failures are rare enough -- I was meant to also write a test case to
>>> inject failure.
>>
>> Random engine reset failure you think is a TGL issue?
>
> The MI_SEMAPHORE_WAIT | POLL miss not generating an arbitration point.
> We have quite a few selftests and IGT that use this feature.
>
> So I was wondering if this was similar to one of those tgl issues with
> semaphores and CS events.
>
> The random engine reset failure here is also decidedly odd. The engine
> was idle!
>
>>>> 2)
>>>> Failed reset leads to busy-hang in following request _tail_? But there
>>>> is an arb check at the start of following request as well. Or in cases
>>>> where we context switch into the middle of a previously executing request?
>>>
>>> It was the first request submitted after the failed reset. We expect to
>>> clear the ring-stop flag on the CS IDLE->ACTIVE event.
>>>
>>>> But why would that busy hang? Hasn't the failed request unpaused the ring?
>>>
>>> The engine was idle at the time of the failed reset. We left the
>>> ring-stop set, and submitted the next batch of requests. We hit the
>>> MI_SEMAPHORE_WAIT(ring-stop) at the end of the first request, but
>>> without hitting an arbitration point (first request, no init-breadcrumb
>>> in this case), the semaphore was stuck.
>>
>> So a kernel context request?
>
> Ish. The selftest is using empty requests, and not emitting the
> initial breadcrumb. (So acting like a kernel context.)
>
>> Why hasn't IDLE->ACTIVE cleared ring stop?
>
> There hasn't been an idle->active event, not a single CS event after
> writing to ELSP and timing out while still spinning on the semaphore.
>
>> Presumably this CSB must come after the first request has been submitted
>> so apparently I am still not getting how it hangs.
>
> It was never sent. The context is still in pending[0] (not active[0])
> and there's no sign in the trace of any interrupts/tasklet handling other
> than the semaphore-wait interrupt.
>
>> Just because igt_reset_nop_engine does things "quickly"? It prevents the
>> CSB from arriving?
>
> More that, since we do very little, we hit the semaphore before the CS
> has recovered from the shock of being asked to do something.
>
>> So ARB_CHECK picks up on the fact that ELSP has been
>> rewritten before the IDLE->ACTIVE event is received, and/or the engine
>> reset prevented it from arriving?
>
> The ARB_CHECK should trigger the CS to generate the IDLE->ACTIVE event.
> (Of course assuming that the bug is in the semaphore not triggering the
> event due to strange circumstances and not a bug in the event generator
> itself.) I'm suspicious of the semaphore due to the earlier CS bugs with
> lite-restores + semaphores, and am expecting that since the MI_ARB_CHECK
> is explicit, it actually works.

Okay, got it, thanks. I suggest slightly improving the commit message so
it is clear what the suspected TGL quirks are. But in general:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx