From: Carlos Santa <carlos.santa@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Michel Thierry <michel.thierry@intel.com>
Subject: Re: [PATCH v5 3/5] drm/i915: Watchdog timeout: Ringbuffer command emission for gen8+
Date: Mon, 01 Apr 2019 17:57:10 -0700 [thread overview]
Message-ID: <926e77bc13e29a2742aa2c398dcd8b21bbc53e1e.camel@intel.com> (raw)
In-Reply-To: <155393647005.24691.2200638315146110295@skylake-alporthouse-com>
On Sat, 2019-03-30 at 09:01 +0000, Chris Wilson wrote:
> Quoting Carlos Santa (2019-03-22 23:41:16)
> > From: Michel Thierry <michel.thierry@intel.com>
> >
> > Emit the required commands into the ring buffer for starting and
> > stopping the watchdog timer before/after batch buffer start during
> > batch buffer submission.
>
> I'm expecting to see some discussion of how this is handled across
> preemption here since you are inside an arbitration enabled section.
>
> > v2: Support watchdog threshold per context engine, merge lri
> > commands,
> > and move watchdog commands emission to emit_bb_start. Request space
> > of
> > combined start_watchdog, bb_start and stop_watchdog to avoid any
> > error
> > after emitting bb_start.
> >
> > v3: There were too many req->engine in emit_bb_start.
> > Use GEM_BUG_ON instead of returning a very late EINVAL in the
> > remote
> > case of watchdog misprogramming; set correct LRI cmd size in
> > emit_stop_watchdog. (Chris)
> >
> > v4: Rebase.
> > v5: use to_intel_context instead of ctx->engine.
> > v6: Rebase.
> > v7: Rebase,
> > Store gpu watchdog capability in engine flag (Tvrtko)
> > Store WATCHDOG_DISABLE magic # in engine (Tvrtko)
> > No need to declare emit_{start|stop}_watchdog as vfuncs
> > (Tvrtko)
> > Replace flag watchdog_running with enable_watchdog (Tvrtko)
> > Emit a single MI_NOOP by conditionally checking whether the #
> > of emitted OPs is odd (Tvrtko)
> > v8: Rebase
> >
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Antonio Argenziano <antonio.argenziano@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> > Signed-off-by: Carlos Santa <carlos.santa@intel.com>
> > ---
> > drivers/gpu/drm/i915/intel_context_types.h | 4 +
> > drivers/gpu/drm/i915/intel_engine_cs.c | 2 +
> > drivers/gpu/drm/i915/intel_engine_types.h | 17 ++++-
> > drivers/gpu/drm/i915/intel_lrc.c | 89
> > +++++++++++++++++++++-
> > drivers/gpu/drm/i915/intel_lrc.h | 2 +
> > 5 files changed, 106 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_context_types.h
> > b/drivers/gpu/drm/i915/intel_context_types.h
> > index 6dc9b4b9067b..e56fc263568e 100644
> > --- a/drivers/gpu/drm/i915/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/intel_context_types.h
> > @@ -51,6 +51,10 @@ struct intel_context {
> > u64 lrc_desc;
> >
> > atomic_t pin_count;
> > + /** watchdog_threshold: hw watchdog threshold value,
> > + * in clock counts
> > + */
>
> Gah. Why would you put it here? Between a tightly coupled count +
> mutex.
>
> > + u32 watchdog_threshold;
> > struct mutex pin_mutex; /* guards pinning and associated
> > on-gpuing */
> >
> > /**
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c
> > b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 88cf0fc07623..d4ea07b70904 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -324,6 +324,8 @@ intel_engine_setup(struct drm_i915_private
> > *dev_priv,
> > if (engine->context_size)
> > DRIVER_CAPS(dev_priv)->has_logical_contexts = true;
> >
> > + engine->watchdog_disable_id = get_watchdog_disable(engine);
> > +
> > /* Nothing to do here, execute in order of dependencies */
> > engine->schedule = NULL;
> >
> > diff --git a/drivers/gpu/drm/i915/intel_engine_types.h
> > b/drivers/gpu/drm/i915/intel_engine_types.h
> > index c4f66b774e7c..1f99b536471d 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> > @@ -260,6 +260,7 @@ struct intel_engine_cs {
> > unsigned int guc_id;
> > intel_engine_mask_t mask;
> >
> > + u32 watchdog_disable_id;
>
> You've just put this between a pair of u8s.
>
> > u8 uabi_class;
> >
> > u8 class;
> > @@ -422,10 +423,12 @@ struct intel_engine_cs {
> >
> > struct intel_engine_hangcheck hangcheck;
> >
> > -#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
> > -#define I915_ENGINE_SUPPORTS_STATS BIT(1)
> > -#define I915_ENGINE_HAS_PREEMPTION BIT(2)
> > -#define I915_ENGINE_HAS_SEMAPHORES BIT(3)
> > +#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
> > +#define I915_ENGINE_SUPPORTS_STATS BIT(1)
> > +#define I915_ENGINE_HAS_PREEMPTION BIT(2)
> > +#define I915_ENGINE_HAS_SEMAPHORES BIT(3)
> > +#define I915_ENGINE_SUPPORTS_WATCHDOG BIT(4)
> > +
> > unsigned int flags;
> >
> > /*
> > @@ -509,6 +512,12 @@ intel_engine_has_semaphores(const struct
> > intel_engine_cs *engine)
> > return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> > }
> >
> > +static inline bool
> > +intel_engine_supports_watchdog(const struct intel_engine_cs
> > *engine)
> > +{
> > + return engine->flags & I915_ENGINE_SUPPORTS_WATCHDOG;
> > +}
> > +
> > #define instdone_slice_mask(dev_priv__) \
> > (IS_GEN(dev_priv__, 7) ? \
> > 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> > b/drivers/gpu/drm/i915/intel_lrc.c
> > index 85785a94f6ae..78ea54a5dbc3 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -2036,16 +2036,75 @@ static void execlists_reset_finish(struct
> > intel_engine_cs *engine)
> > atomic_read(&execlists->tasklet.count));
> > }
> >
> > +static u32 *gen8_emit_start_watchdog(struct i915_request *rq, u32
> > *cs)
> > +{
> > + struct intel_engine_cs *engine = rq->engine;
> > + struct i915_gem_context *ctx = rq->gem_context;
> > + struct intel_context *ce = intel_context_lookup(ctx,
> > engine);
> > +
> > + GEM_BUG_ON(!intel_engine_supports_watchdog(engine));
> > +
> > + /*
> > + * watchdog register must never be programmed to zero. This
> > would
> > + * cause the watchdog counter to exceed and not allow the
> > engine to
> > + * go into IDLE state
> > + */
> > + GEM_BUG_ON(ce->watchdog_threshold == 0);
> > +
> > + /* Set counter period */
> > + *cs++ = MI_LOAD_REGISTER_IMM(2);
> > + *cs++ = i915_mmio_reg_offset(RING_THRESH(engine-
> > >mmio_base));
> > + *cs++ = ce->watchdog_threshold;
> > + /* Start counter */
> > + *cs++ = i915_mmio_reg_offset(RING_CNTR(engine->mmio_base));
> > + *cs++ = GEN8_WATCHDOG_ENABLE;
>
> Hmm, so no watchdog seqno.
>
> > + return cs;
> > +}
> > +
> > +static u32 *gen8_emit_stop_watchdog(struct i915_request *rq, u32
> > *cs)
> > +{
> > + struct intel_engine_cs *engine = rq->engine;
> > +
> > + GEM_BUG_ON(!intel_engine_supports_watchdog(engine));
> > +
> > + *cs++ = MI_LOAD_REGISTER_IMM(1);
> > + *cs++ = i915_mmio_reg_offset(RING_CNTR(engine->mmio_base));
> > + *cs++ = engine->watchdog_disable_id;
> > +
> > + return cs;
> > +}
> > +
> > static int gen8_emit_bb_start(struct i915_request *rq,
> > u64 offset, u32 len,
> > const unsigned int flags)
> > {
> > + struct intel_engine_cs *engine = rq->engine;
> > + struct i915_gem_context *ctx = rq->gem_context;
> > + struct intel_context *ce = intel_context_lookup(ctx,
> > engine);
>
> Ahem. This keeps on getting worse.
Can you explain a bit more why?
>
> > u32 *cs;
> > + u32 num_dwords;
> > + bool enable_watchdog = false;
> >
> > - cs = intel_ring_begin(rq, 6);
> > + /* bb_start only */
> > + num_dwords = 6;
> > +
> > + /* check if watchdog will be required */
> > + if (ce->watchdog_threshold != 0) {
> > + /* + start_watchdog (6) + stop_watchdog (4) */
> > + num_dwords += 10;
> > + enable_watchdog = true;
> > + }
> > +
> > + cs = intel_ring_begin(rq, num_dwords);
> > if (IS_ERR(cs))
> > return PTR_ERR(cs);
> >
> > + if (enable_watchdog) {
> > + /* Start watchdog timer */
>
> Please don't simply repeat code in comments.
Ack.
>
> > + cs = gen8_emit_start_watchdog(rq, cs);
> > + }
> > +
> > /*
> > * WaDisableCtxRestoreArbitration:bdw,chv
> > *
> > @@ -2072,10 +2131,16 @@ static int gen8_emit_bb_start(struct
> > i915_request *rq,
> > *cs++ = upper_32_bits(offset);
> >
> > *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
> > - *cs++ = MI_NOOP;
> >
> > - intel_ring_advance(rq, cs);
> > + if (enable_watchdog) {
> > + /* Cancel watchdog timer */
> > + cs = gen8_emit_stop_watchdog(rq, cs);
> > + }
> > +
> > + if (*cs%2 != 0)
> > + *cs++ = MI_NOOP;
>
> This is wrong. cs points into the unset portion of the ring. The
> watchdog commands are even, so there is no reason to move the
> original
> NOOP, or at least no reason to make it conditional.
Ok, I initially took the suggestion from Tvrtko from way back, where we
were trying to get rid of too many MI_NOOPs by emitting them
conditionally based on whether the cs pointer was odd...
Carlos
> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-04-02 0:57 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-22 23:41 [PATCH v5 0/5] GEN8+ GPU Watchdog Reset Support Carlos Santa
2019-03-22 23:41 ` [PATCH v5 1/5] drm/i915: Add engine reset count in get-reset-stats ioctl Carlos Santa
2019-03-30 8:45 ` Chris Wilson
2019-03-22 23:41 ` [PATCH v5 2/5] drm/i915: Watchdog timeout: IRQ handler for gen8+ Carlos Santa
2019-03-25 10:00 ` Tvrtko Ursulin
2019-03-27 1:58 ` Carlos Santa
2019-03-27 10:40 ` Tvrtko Ursulin
2019-03-22 23:41 ` [PATCH v5 3/5] drm/i915: Watchdog timeout: Ringbuffer command emission " Carlos Santa
2019-03-30 8:49 ` Chris Wilson
2019-03-30 9:01 ` Chris Wilson
2019-04-02 0:57 ` Carlos Santa [this message]
2019-03-22 23:41 ` [PATCH v5 4/5] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout Carlos Santa
2019-03-22 23:41 ` [PATCH v5 5/5] drm/i915: Watchdog timeout: Include threshold value in error state Carlos Santa
2019-03-22 23:59 ` ✗ Fi.CI.BAT: failure for GEN8+ GPU Watchdog Reset Support Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=926e77bc13e29a2742aa2c398dcd8b21bbc53e1e.camel@intel.com \
--to=carlos.santa@intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=michel.thierry@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox