From: Carlos Santa <carlos.santa@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org
Cc: Michel Thierry <michel.thierry@intel.com>
Subject: Re: [PATCH v4 2/5] drm/i915: Watchdog timeout: IRQ handler for gen8+
Date: Sun, 17 Mar 2019 17:15:14 -0700 [thread overview]
Message-ID: <1271220339df25098eb6305051972c80c125bc52.camel@intel.com> (raw)
In-Reply-To: <62d65486-a45a-7cf9-1574-1c961be2ee9a@linux.intel.com>
On Mon, 2019-03-11 at 10:39 +0000, Tvrtko Ursulin wrote:
> On 08/03/2019 03:16, Carlos Santa wrote:
> > On Fri, 2019-03-01 at 09:36 +0000, Chris Wilson wrote:
> > > >
> > >
> > > Quoting Carlos Santa (2019-02-21 02:58:16)
> > > > +#define GEN8_WATCHDOG_1000US(dev_priv)
> > > > watchdog_to_clock_counts(dev_priv, 1000)
> > > > +static void gen8_watchdog_irq_handler(unsigned long data)
> > > > +{
> > > > + struct intel_engine_cs *engine = (struct
> > > > intel_engine_cs
> > > > *)data;
> > > > + struct drm_i915_private *dev_priv = engine->i915;
> > > > + unsigned int hung = 0;
> > > > + u32 current_seqno=0;
> > > > + char msg[80];
> > > > + unsigned int tmp;
> > > > + int len;
> > > > +
> > > > + /* Stop the counter to prevent further timeout
> > > > interrupts
> > > > */
> > > > + I915_WRITE_FW(RING_CNTR(engine->mmio_base),
> > > > get_watchdog_disable(engine));
> > > > +
> > > > + /* Read the heartbeat seqno once again to check if we
> > > > are
> > > > stuck? */
> > > > + current_seqno =
> > > > intel_engine_get_hangcheck_seqno(engine);
> > >
> > > I have said this before, but this doesn't exist either, it's just
> > > a
> > > temporary glitch in the matrix.
> > >
> >
> > Chris, Tvrtko, I need some guidance on how to find the quilty seqno
> > during a hang, can you please advice here what to do?
>
> When an interrupt fires you need to ascertain whether the same
> request
> which enabled the watchdog is running, correct?
>
> So I think you would need this, with a disclaimer that I haven't
> thought
> about the details really:
>
> 1. Take a reference to timeline hwsp when setting up the watchdog for
> a
> request.
>
> 2. Store the initial seqno associated with this request.
>
> 3. Force enable user interrupts.
>
> 4. When timeout fires, inspect the HWSP seqno to see if the request
> completed or not.
>
> 5. Reset the engine if not completed.
>
> 6. Put the timeline/hwsp reference.
static int gen8_emit_bb_start(struct i915_request *rq,
u64 offset, u32
len,
const unsigned
int flags)
{
struct i915_timeline *tl;
u32 seqno;
if (enable_watchdog) {
/* Start watchdog timer */
cs = gen8_emit_start_watchdog(rq, cs);
tl = ce->ring->timeline;
i915_timeline_get_seqno(tl, rq, &seqno);
/*Store initial hwsp seqno associated with this request
engine->watchdog_hwsp_seqno = tl->hwsp_seqno;
}
}
static void gen8_watchdog_tasklet(unsigned long data)
{
struct i915_request *rq;
rq = intel_engine_find_active_request(engine);
/* Inspect the watchdog seqno once again for
completion? */
if (!i915_seqno_passed(engine->watchdog_hwsp_seqno, rq-
>fence.seqno)) {
//Reset Engine
}
}
Tvrtko, is the above acceptable to inspect whether the seqno has
completed?
I noticed there's a helper function i915_request_completed(struct
i915_request *rq) but it will require me to modify it in order to pass
2 different seqnos.
Regards,
Carlos
>
> If the user interrupt fires with the request completed cancel the
> above
> operations.
>
> There could be an inherent race between inspecting the seqno and
> deciding to reset. Not sure at the moment what to do. Maybe just call
> it
> bad luck?
>
> I also think for the software implementation you need to force no
> request coalescing for contexts with timeout set. Because you want
> to
> have 100% defined borders for request in and out - since the timeout
> is
> defined per request.
>
> In this case you don't need the user interrupt for the trailing edge
> signal but can use context complete. Maybe putting hooks into
> context_in/out in intel_lrc.c would work under these circumstances.
>
> Also if preempted you need to cancel the timer setup and store
> elapsed
> execution time.
>
> Or it may make sense to just disable preemption for these contexts.
> Otherwise there is no point in trying to mandate the timeout?
>
> But it is also kind of bad since non-privileged contexts can make
> themselves non-preemptable by setting the watchdog timeout.
>
> Maybe as a compromise we need to automatically apply an elevated
> priority level, but not as high to be completely non-preemptable.
> Sounds
> like a hard question.
>
> Regards,
>
> Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2019-03-18 0:15 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-21 2:58 [PATCH v4 0/5] GEN8+ GPU Watchdog Reset Support Carlos Santa
2019-02-21 2:58 ` [PATCH v4 1/5] drm/i915: Add engine reset count in get-reset-stats ioctl Carlos Santa
2019-02-25 13:34 ` Tvrtko Ursulin
2019-03-06 23:08 ` Carlos Santa
2019-03-07 7:27 ` Tvrtko Ursulin
2019-02-21 2:58 ` [PATCH v4 2/5] drm/i915: Watchdog timeout: IRQ handler for gen8+ Carlos Santa
2019-02-28 17:38 ` Tvrtko Ursulin
2019-03-01 1:51 ` Carlos Santa
2019-03-01 9:36 ` Chris Wilson
2019-03-02 2:08 ` Carlos Santa
2019-03-08 3:16 ` Carlos Santa
2019-03-11 10:39 ` Tvrtko Ursulin
2019-03-18 0:15 ` Carlos Santa [this message]
2019-03-19 12:39 ` Tvrtko Ursulin
2019-03-19 12:46 ` Tvrtko Ursulin
2019-03-19 17:52 ` Carlos Santa
2019-02-21 2:58 ` [PATCH v4 3/5] drm/i915: Watchdog timeout: Ringbuffer command emission " Carlos Santa
2019-02-21 2:58 ` [PATCH v4 4/5] drm/i915: Watchdog timeout: DRM kernel interface to set the timeout Carlos Santa
2019-02-28 17:22 ` Tvrtko Ursulin
2019-02-21 2:58 ` [PATCH v4 5/5] drm/i915: Watchdog timeout: Include threshold value in error state Carlos Santa
2019-02-21 2:58 ` drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Carlos Santa
2019-02-21 3:24 ` ✗ Fi.CI.BAT: failure for drm/i915: Replace global_seqno with a hangcheck heartbeat seqno (rev3) Patchwork
2019-03-11 11:54 ` [PATCH v4 0/5] GEN8+ GPU Watchdog Reset Support Chris Wilson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1271220339df25098eb6305051972c80c125bc52.camel@intel.com \
--to=carlos.santa@intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
--cc=michel.thierry@intel.com \
--cc=tvrtko.ursulin@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox