Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Park the breadcrumbs signaler across a GPU reset
Date: Mon, 13 Feb 2017 12:34:26 +0200	[thread overview]
Message-ID: <87h93ypefh.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20170213101544.GB30915@nuc-i3427.alporthouse.com>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Feb 13, 2017 at 11:56:46AM +0200, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > The signal threads may be running concurrently with the GPU reset. The
>> > completion from the GPU run asynchronous with the reset and two threads
>> > may see different snapshots of the state, and the signaler may mark a
>> > request as complete as we try to reset it. We don't tolerate 2 different
>> > views of the same state and complain if we try to mark a request as
>> > failed if it is already complete. Disable the signal threads during
>> > reset to prevent this conflict (even though the conflict implies that
>> > the state we resetting to is invalid, we have already made our
>> > decision!).
>> >
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/i915_gem.c          | 16 +++++++++++++++-
>> >  drivers/gpu/drm/i915/intel_breadcrumbs.c |  3 +++
>> >  2 files changed, 18 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> > index 48922ff454e6..95582295b219 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > @@ -35,6 +35,7 @@
>> >  #include "intel_frontbuffer.h"
>> >  #include "intel_mocs.h"
>> >  #include <linux/dma-fence-array.h>
>> > +#include <linux/kthread.h>
>> >  #include <linux/reservation.h>
>> >  #include <linux/shmem_fs.h>
>> >  #include <linux/slab.h>
>> > @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>> >  	for_each_engine(engine, dev_priv, id) {
>> >  		struct drm_i915_gem_request *request;
>> >  
>> > +		/* Prevent the signaler thread from updating the request
>> > +		 * state (by calling dma_fence_signal) as we are processing
>> > +		 * the reset. The write from the GPU of the seqno is
>> > +		 * asynchronous and the signaler thread may see a different
>> > +		 * value to us and declare the request complete, even though
>> > +		 * the reset routine have picked that request as the active
>> > +		 * (incomplete) request. This conflict is not handled
>> > +		 * gracefully!
>> > +		 */
>> > +		kthread_park(engine->breadcrumbs.signaler);
>> > +
>> 
>> I was wondering here if it would be best to disable the tasklet first.
>> But stopping the car before we stop the engine seems wise thing to do.
>
> Note that we would have to do two passes over the engines to stop the
> signalers, and even then we still have other sources (other devices)
> that can cause tasklets to be invoked. There is not a foolproof answer
> here, they each serve a different purpose and can be disabled
> orthogonally.
>  
>> Another unrelated thing is that we might be better off just to
>> make find_active_request always seqno coherent and fork a
>> noncoherent version for error capture.
>
> It's slightly more than that. In the worst case irq_barrier is not
> sufficient, and a second call to active_request may return a different
> victim - which again raises some awkwards issues in how we assign blame.
>
> At least we now do the irq_barrier hammer once at the start in reset_prepare,
> so we should be better, but I'm wondering if we want to store the
> request from prepare and then double check in the actual reset.
>

#1 store seqno from hangcheck
#2 get mutex for reset
#3 barrier
#4 find_request (only once) 
#5 on prepare path, check the submachinery
against this req and if inconsistent, queue hangcheck
and return from prepare without resetting.

?
-Mika
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2017-02-13 10:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-12  0:44 [PATCH] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
2017-02-12  1:22 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-02-12 14:08 ` [PATCH] " Chris Wilson
2017-02-13  9:56 ` Mika Kuoppala
2017-02-13 10:15   ` Chris Wilson
2017-02-13 10:34     ` Mika Kuoppala [this message]
2017-02-13 10:48       ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h93ypefh.fsf@gaia.fi.intel.com \
    --to=mika.kuoppala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox