From: Mika Kuoppala <mika.kuoppala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Park the breadcrumbs signaler across a GPU reset
Date: Mon, 13 Feb 2017 12:34:26 +0200 [thread overview]
Message-ID: <87h93ypefh.fsf@gaia.fi.intel.com> (raw)
In-Reply-To: <20170213101544.GB30915@nuc-i3427.alporthouse.com>
Chris Wilson <chris@chris-wilson.co.uk> writes:
> On Mon, Feb 13, 2017 at 11:56:46AM +0200, Mika Kuoppala wrote:
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > The signal threads may be running concurrently with the GPU reset. The
>> > completion from the GPU run asynchronous with the reset and two threads
>> > may see different snapshots of the state, and the signaler may mark a
>> > request as complete as we try to reset it. We don't tolerate 2 different
>> > views of the same state and complain if we try to mark a request as
>> > failed if it is already complete. Disable the signal threads during
>> > reset to prevent this conflict (even though the conflict implies that
>> > the state we resetting to is invalid, we have already made our
>> > decision!).
>> >
>> > References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > ---
>> > drivers/gpu/drm/i915/i915_gem.c | 16 +++++++++++++++-
>> > drivers/gpu/drm/i915/intel_breadcrumbs.c | 3 +++
>> > 2 files changed, 18 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> > index 48922ff454e6..95582295b219 100644
>> > --- a/drivers/gpu/drm/i915/i915_gem.c
>> > +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > @@ -35,6 +35,7 @@
>> > #include "intel_frontbuffer.h"
>> > #include "intel_mocs.h"
>> > #include <linux/dma-fence-array.h>
>> > +#include <linux/kthread.h>
>> > #include <linux/reservation.h>
>> > #include <linux/shmem_fs.h>
>> > #include <linux/slab.h>
>> > @@ -2643,6 +2644,17 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>> > for_each_engine(engine, dev_priv, id) {
>> > struct drm_i915_gem_request *request;
>> >
>> > + /* Prevent the signaler thread from updating the request
>> > + * state (by calling dma_fence_signal) as we are processing
>> > + * the reset. The write from the GPU of the seqno is
>> > + * asynchronous and the signaler thread may see a different
>> > + * value to us and declare the request complete, even though
>> > + * the reset routine have picked that request as the active
>> > + * (incomplete) request. This conflict is not handled
>> > + * gracefully!
>> > + */
>> > + kthread_park(engine->breadcrumbs.signaler);
>> > +
>>
>> I was wondering here if it would be best to disable the tasklet first.
>> But stopping the car before we stop the engine seems wise thing to do.
>
> Note that we would have to do two passes over the engines to stop the
> signalers, and even then we still have other sources (other devices)
> that can cause tasklets to be invoked. There is not a foolproof answer
> here, they each serve a different purpose and can be disabled
> orthogonally.
>
>> Another unrelated thing is that we might be better off just to
>> make find_active_request always seqno coherent and fork a
>> noncoherent version for error capture.
>
> It's slightly more than that. In the worst case irq_barrier is not
> sufficient, and a second call to active_request may return a different
> victim - which again raises some awkwards issues in how we assign blame.
>
> At least we now do the irq_barrier hammer once at the start in reset_prepare,
> so we should be better, but I'm wondering if we want to store the
> request from prepare and then double check in the actual reset.
>
#1 store seqno from hangcheck
#2 get mutex for reset
#3 barrier
#4 find_request (only once)
#5 on prepare path, check the submachinery
against this req and if inconsistent, queue hangcheck
and return from prepare without resetting.
?
-Mika
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2017-02-13 10:35 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-02-12 0:44 [PATCH] drm/i915: Park the breadcrumbs signaler across a GPU reset Chris Wilson
2017-02-12 1:22 ` ✓ Fi.CI.BAT: success for " Patchwork
2017-02-12 14:08 ` [PATCH] " Chris Wilson
2017-02-13 9:56 ` Mika Kuoppala
2017-02-13 10:15 ` Chris Wilson
2017-02-13 10:34 ` Mika Kuoppala [this message]
2017-02-13 10:48 ` Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h93ypefh.fsf@gaia.fi.intel.com \
--to=mika.kuoppala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox