From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org,
Mika Kuoppala <mika.kuoppala@intel.com>
Subject: Re: [PATCH v3 2/2] drm/i915: Restore engine->submit_request before unwedging
Date: Wed, 15 Mar 2017 10:04:13 +0000 [thread overview]
Message-ID: <482f5c2f-ae2c-ba37-63be-fe9ffb0823c1@linux.intel.com> (raw)
In-Reply-To: <20170315093421.GQ2118@nuc-i3427.alporthouse.com>
On 15/03/2017 09:34, Chris Wilson wrote:
> On Wed, Mar 15, 2017 at 09:23:01AM +0000, Tvrtko Ursulin wrote:
>>
>> On 14/03/2017 09:34, Chris Wilson wrote:
>>> When we wedge the device, we override engine->submit_request with a nop
>>> to ensure that all in-flight requests are marked in error. However, igt
>>> would like to unwedge the device to test -EIO handling. This requires us
>>> to flush those in-flight requests and restore the original
>>> engine->submit_request.
>>>
>>> v2: Use a vfunc to unify enabling request submission to engines
>>> v3: Split new vfunc to a separate patch.
>>>
>>> Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
>>> Testcase: igt/gem_eio
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_drv.c | 2 +-
>>> drivers/gpu/drm/i915/i915_drv.h | 1 +
>>> drivers/gpu/drm/i915/i915_gem.c | 51 +++++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 53 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>>> index e312b61ba6bb..11d1066b673c 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>> @@ -1822,7 +1822,7 @@ void i915_reset(struct drm_i915_private *dev_priv)
>>> return;
>>>
>>> /* Clear any previous failed attempts at recovery. Time to try again. */
>>> - __clear_bit(I915_WEDGED, &error->flags);
>>> + i915_gem_unset_wedged(dev_priv);
>>> error->reset_count++;
>>>
>>> pr_notice("drm/i915: Resetting chip after gpu hang\n");
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 48ff64812289..53a791d8d992 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -3406,6 +3406,7 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv);
>>> void i915_gem_reset(struct drm_i915_private *dev_priv);
>>> void i915_gem_reset_finish(struct drm_i915_private *dev_priv);
>>> void i915_gem_set_wedged(struct drm_i915_private *dev_priv);
>>> +void i915_gem_unset_wedged(struct drm_i915_private *dev_priv);
>>>
>>> void i915_gem_init_mmio(struct drm_i915_private *i915);
>>> int __must_check i915_gem_init(struct drm_i915_private *dev_priv);
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 202bb850f260..e06830916a05 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -3000,6 +3000,57 @@ void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
>>> mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
>>> }
>>>
>>> +void i915_gem_unset_wedged(struct drm_i915_private *i915)
>>> +{
>>> + struct i915_gem_timeline *tl;
>>> + int i;
>>> +
>>> + lockdep_assert_held(&i915->drm.struct_mutex);
>>> + if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
>>> + return;
>>> +
>>> + /* Before unwedging, make sure that all pending operations
>>> + * are flushed and errored out. No more can be submitted until
>>> + * we reset the wedged bit.
>>> + */
>>> + list_for_each_entry(tl, &i915->gt.timelines, link) {
>>> + for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
>>> + struct drm_i915_gem_request *rq;
>>> +
>>> + rq = i915_gem_active_peek(&tl->engine[i].last_request,
>>> + &i915->drm.struct_mutex);
>>> + if (!rq)
>>> + continue;
>>> +
>>> + /* We can't use our normal waiter as we want to
>>> + * avoid recursively trying to handle the current
>>> + * reset. The basic dma_fence_default_wait() installs
>>> + * a callback for dma_fence_signal(), which is
>>> + * triggered by our nop handler (indirectly, the
>>> + * callback enables the signaler thread which is
>>> + * woken by the nop_submit_request() advancing the seqno
>>> + * and when the seqno passes the fence, the signaler
>>> + * then signals the fence waking us up).
>>> + */
>>> + dma_fence_default_wait(&rq->fence, false,
>>> + MAX_SCHEDULE_TIMEOUT);
>>> + }
>>> + }
>>> +
>>> + /* Undo nop_submit_request. We prevent all new i915 requests from
>>> + * being queued (by disallowing execbuf whilst wedged) so having
>>> + * waited for all active requests above, we know the system is idle
>>> + * and do not have to worry about a thread being inside
>>> + * engine->submit_request() as we swap over. So unlike installing
>>> + * the nop_submit_request on reset, we can do this from normal
>>> + * context and do not require stop_machine().
>>> + */
>>> + intel_engines_enable_submission(i915);
>>
>> So the point of the dma_fence_default_wait above is it to ensure all
>> nop_submit_request call backs have completed? I don't at the moment
>> understand how could there be such callbacks since unwedge is
>> happening after the wedge. So the wedge already installed the nop
>> handler, and by the time we get to another reset attempt, isn't it
>> already guaranteed all of those have exited?
>
> There's no such guarantee. The nop_submit_request() is to ensure that all
> third party driven requests are flushed and our requests are marked with
> dma_fence_set_error(-EIO) and not submitted to hw.
I was thinking that the above loop is only about the runnable requests
but it's not. Okay, I think I get it now.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2017-03-15 10:04 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-14 9:34 [PATCH v3 1/2] drm/i915: Move engine->submit_request selection to a vfunc Chris Wilson
2017-03-14 9:34 ` [PATCH v3 2/2] drm/i915: Restore engine->submit_request before unwedging Chris Wilson
2017-03-15 9:23 ` Tvrtko Ursulin
2017-03-15 9:34 ` Chris Wilson
2017-03-15 10:04 ` Tvrtko Ursulin [this message]
2017-03-14 16:17 ` ✓ Fi.CI.BAT: success for series starting with [v3,1/2] drm/i915: Move engine->submit_request selection to a vfunc Patchwork
2017-03-14 16:31 ` [PATCH v3 1/2] " Tvrtko Ursulin
2017-03-14 21:33 ` Chris Wilson
2017-03-15 8:14 ` Tvrtko Ursulin
2017-03-15 9:41 ` Chris Wilson
2017-03-15 10:05 ` Tvrtko Ursulin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=482f5c2f-ae2c-ba37-63be-fe9ffb0823c1@linux.intel.com \
--to=tvrtko.ursulin@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=mika.kuoppala@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox