From: Carlos Santa <carlos.santa@intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Michel Thierry <michel.thierry@intel.com>
Subject: Re: [PATCH v4 2/5] drm/i915: Watchdog timeout: IRQ handler for gen8+
Date: Fri, 01 Mar 2019 18:08:52 -0800
Message-ID: <9f1cd56b2f11301084667a7e316d78fc49b96d71.camel@intel.com>
In-Reply-To: <155143298308.5847.17161978101892410437@skylake-alporthouse-com>
On Fri, 2019-03-01 at 09:36 +0000, Chris Wilson wrote:
> Quoting Carlos Santa (2019-02-21 02:58:16)
> > +#define GEN8_WATCHDOG_1000US(dev_priv) watchdog_to_clock_counts(dev_priv, 1000)
> > +static void gen8_watchdog_irq_handler(unsigned long data)
> > +{
> > + struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> > + struct drm_i915_private *dev_priv = engine->i915;
> > + unsigned int hung = 0;
> > + u32 current_seqno=0;
> > + char msg[80];
> > + unsigned int tmp;
> > + int len;
> > +
> > + /* Stop the counter to prevent further timeout interrupts */
> > + I915_WRITE_FW(RING_CNTR(engine->mmio_base), get_watchdog_disable(engine));
> > +
> > + /* Read the heartbeat seqno once again to check if we are stuck? */
> > + current_seqno = intel_engine_get_hangcheck_seqno(engine);
>
> I have said this before, but this doesn't exist either, it's just a
> temporary glitch in the matrix.
That was my only way to check for the "guilty" seqno right before
resetting during smoke testing. I will reach out again before sending a
new revision to cross-check the new approach you mentioned today.
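For readers following the seqno discussion, the check the quoted handler performs (treat an engine as stuck when the seqno read at watchdog expiry equals the one recorded earlier, `engine->current_seqno` in the patch, then report it as "watchdog timeout on <engines>") can be sketched in plain user-space C. This is a simplified model under stated assumptions, not the patch: `struct engine_model`, `watchdog_hung_mask()`, `append()` and `watchdog_msg()` are hypothetical names, and `snprintf()` with explicit clamping stands in for the kernel's `scnprintf()`. The sketch also trims the trailing ", " only when it is actually present, because blindly writing `msg[len-2]` would cut real characters if nothing had been appended or the buffer had been truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified engine: only what the stuck check needs.
 * This is NOT the i915 struct intel_engine_cs. */
struct engine_model {
	const char *name;
	unsigned int mask;         /* one bit per engine */
	unsigned int seqno_armed;  /* seqno recorded when the watchdog was armed */
	unsigned int seqno_now;    /* seqno read when the watchdog fired */
};

/* Collect every engine whose seqno has not moved since the watchdog was armed. */
static unsigned int watchdog_hung_mask(const struct engine_model *e, size_t n)
{
	unsigned int hung = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].seqno_now == e[i].seqno_armed)
			hung |= e[i].mask;
	return hung;
}

/* Append s at offset len and return the new length. Like scnprintf(), it
 * counts only characters actually stored, so len never passes size - 1. */
static size_t append(char *buf, size_t size, size_t len, const char *s)
{
	size_t room, w;
	int ret;

	if (len + 1 >= size)
		return len;                 /* buffer already full */
	ret = snprintf(buf + len, size - len, "%s", s);
	if (ret < 0)
		return len;
	room = size - len - 1;
	w = (size_t)ret;
	return len + (w < room ? w : room);
}

/* Build "watchdog timeout on <name>, <name>" for the engines in hung. */
static void watchdog_msg(char *buf, size_t size,
			 const struct engine_model *e, size_t n,
			 unsigned int hung)
{
	size_t len;

	assert(size > 0);
	buf[0] = '\0';
	len = append(buf, size, 0, "watchdog timeout on ");
	for (size_t i = 0; i < n; i++) {
		if (!(hung & e[i].mask))
			continue;
		len = append(buf, size, len, e[i].name);
		len = append(buf, size, len, ", ");
	}
	/* Trim the trailing ", " only if it is really there. */
	if (len >= 2 && strcmp(buf + len - 2, ", ") == 0)
		buf[len - 2] = '\0';
}
```

With three engines where `rcs0` and `vcs0` have not advanced but `bcs0` has, `watchdog_hung_mask()` returns `0x5` and `watchdog_msg()` with an 80-byte buffer produces `watchdog timeout on rcs0, vcs0`.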
>
> > + if (current_seqno == engine->current_seqno) {
> > + hung |= engine->mask;
> > +
> > + len = scnprintf(msg, sizeof(msg), "%s on ", "watchdog timeout");
> > + for_each_engine_masked(engine, dev_priv, hung, tmp)
> > + len += scnprintf(msg + len, sizeof(msg) - len,
> > + "%s, ", engine->name);
> > + msg[len-2] = '\0';
> > +
> > + i915_handle_error(dev_priv, hung, 0, "%s", msg);
> > +
> > + /* Reset timer in case GPU hangs without another request being added */
> > + i915_queue_hangcheck(dev_priv);
>
> You still haven't explained why we are not just resetting the engine
> immediately. Have you looked at the preempt-timeout patches that need to
> do the same thing from timer-irq context?
>
> Resending the same old stuff over and over again is just
> exasperating.
> -Chris
Oops, I had the wrong assumption: I honestly thought that removing the
workqueue from v3 would allow for an immediate reset. Thanks for the
feedback and the pointer to the preempt-timeout series; I will rework this.
Carlos
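The alternative raised in review, acting on the stuck engine directly in the expiry path rather than flagging it and leaving recovery to a later hang check, can be contrasted in a toy user-space model. Everything here is hypothetical (`struct wd_engine`, `wd_engine_reset()`, `wd_expire_deferred()`, `wd_expire_immediate()`); it is not how i915 or the preempt-timeout series implement it and only illustrates the control-flow difference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one engine; not i915 code, all names hypothetical. */
struct wd_engine {
	unsigned int seqno_armed;  /* seqno recorded when the watchdog was armed */
	unsigned int seqno_now;    /* seqno observed when the watchdog fires */
	unsigned int resets;       /* engine resets performed so far */
	bool recheck_pending;      /* recovery left to a later hang check */
};

static bool wd_engine_stuck(const struct wd_engine *e)
{
	return e->seqno_now == e->seqno_armed;
}

/* Hypothetical per-engine reset: only ever called on a stuck engine. */
static void wd_engine_reset(struct wd_engine *e)
{
	assert(wd_engine_stuck(e));
	e->resets++;
	e->recheck_pending = false;
}

/* Deferred: flag the engine and let a later check do the recovery. */
static void wd_expire_deferred(struct wd_engine *e)
{
	if (wd_engine_stuck(e))
		e->recheck_pending = true;
}

/* Immediate: the expiry path resets the stuck engine itself. */
static void wd_expire_immediate(struct wd_engine *e)
{
	if (wd_engine_stuck(e))
		wd_engine_reset(e);
}
```

In the kernel the expiry runs in interrupt or timer context, so whatever reset path it calls there must not sleep; this model ignores that constraint entirely.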
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx