From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>,
intel-gfx@lists.freedesktop.org,
Jani Nikula <jani.nikula@intel.com>,
Daniel Vetter <dnaiel.vetter@ffwll.chm>
Subject: Re: [PATCH] drm/i915: Convert hangcheck from a timer into a delayed work item
Date: Thu, 4 Sep 2014 19:40:32 +0300
Message-ID: <20140904164032.GN4193@intel.com>
In-Reply-To: <20140904153839.GE13230@nuc-i3427.alporthouse.com>
On Thu, Sep 04, 2014 at 04:38:39PM +0100, Chris Wilson wrote:
> On Thu, Sep 04, 2014 at 06:25:03PM +0300, Ville Syrjälä wrote:
> > On Thu, Sep 04, 2014 at 04:09:02PM +0100, Chris Wilson wrote:
> > > When run as a timer, i915_hangcheck_elapsed() must adhere to all the
> > > rules of running in a softirq context. This is advantageous to us as we
> > > want to minimise the risk that a driver bug will prevent us from
> > > detecting a hung GPU. However, that is irrelevant if the driver bug
> > > prevents us from resetting and recovering. Still it is prudent not to
> > > rely on mutexes inside the checker, but given the coarseness of
> > > dev->struct_mutex doing so is extremely hard.
> > >
> > > Give in and run from a work queue, i.e. outside of softirq.
> > >
> > > v2:
> > >
> > > The conversion does have one significant change: the switch from
> > > mod_timer to schedule_delayed_work means that the time at which we
> > > execute the first hangcheck is fixed and not continually deferred by
> > > later work.
> > > This has the advantage of not allowing userspace to fill the ring before
> > > hangcheck can finally run. At the same time, it removes the ability for
> > > the interrupt to defer the hangcheck as well. This is sensible because
> > > an interrupt is only for a single engine, whereas we perform hangcheck
> > > globally, so whilst one ring may have hung, the other could be running
> > > normally and preventing the hangcheck from firing.
> >
> > But doesn't this make it so that we may not detect a hang unless more
> > work gets submitted constantly? Eg.
> >
> > 1. execbuffer batch 1 -> queue hangcheck schedules work
> > 2. execbuffer batch 2 -> queue hangcheck does nothing
> > 3. execbuffer batch 3 -> queue hangcheck does nothing
> > 4. hangcheck expires and sees progress up to batch 2 -> everything is fine
> 4.b hangcheck rearms itself as there is outstanding work
Indeed. I should have actually read the code and it would have been
obvious.
> > 5. batch 3 hangs
> 6. hangcheck fires, sees progress, rearms
> 7. hangcheck fires, sees no progress, shoots the user.
Sounds like we need a disclaimer about the dangers of causing a GPU
hang :)
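
For anyone who wants to replay the sequence above without reading the
driver, here is a rough user-space sketch of the rearm behaviour. It is
a toy model only: HangcheckModel, submit(), fire() and replay() are
made-up names, not i915 symbols, and the real hangcheck looks at more
than a single completion counter. The timeline in replay() (batch 1
retired before the first check, batch 2 before the second, batch 3
never) is one reading of the steps above, not something the thread
spells out.

```python
# Toy user-space model; none of these names exist in the i915 driver.
from dataclasses import dataclass


@dataclass
class HangcheckModel:
    submitted: int = 0    # batches queued so far
    completed: int = 0    # batches the (simulated) GPU has retired
    last_seen: int = 0    # value of `completed` at the previous check
    armed: bool = False   # is a check currently scheduled?

    def queue_hangcheck(self) -> bool:
        """Arm the check only if it is not already pending.

        This mirrors the fixed-deadline behaviour described for
        schedule_delayed_work(): later submissions do not push the
        deadline back. Returns True if this call armed the check.
        """
        if self.armed:
            return False
        self.armed = True
        return True

    def submit(self) -> bool:
        """Queue one batch (steps 1-3) and try to arm the hangcheck."""
        self.submitted += 1
        return self.queue_hangcheck()

    def fire(self) -> str:
        """Run one check; returns 'idle', 'progress' or 'hung'."""
        assert self.armed, "hangcheck fired without being armed"
        self.armed = False
        if self.completed == self.submitted:
            result = "idle"        # nothing outstanding, do not rearm
        elif self.completed != self.last_seen:
            result = "progress"    # steps 4.b and 6: rearm, work outstanding
            self.armed = True
        else:
            result = "hung"        # step 7: no progress since the last check
        self.last_seen = self.completed
        return result


def replay() -> list:
    """Replay three submissions followed by three hangcheck periods."""
    hc = HangcheckModel()
    for _ in range(3):
        hc.submit()
    results = []
    hc.completed = 1               # assumed: batch 1 retired by first check
    results.append(hc.fire())
    hc.completed = 2               # assumed: batch 2 retired, batch 3 hangs
    results.append(hc.fire())
    results.append(hc.fire())      # no further progress
    return results
```

In this model only the first of the three submissions arms the check,
and the hang is still caught because every check that sees progress
with work outstanding rearms itself.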
--
Ville Syrjälä
Intel OTC
Thread overview: 12+ messages
2014-09-04 12:04 [PATCH] drm/i915: Convert hangcheck from a timer into a delayed work item Chris Wilson
2014-09-04 12:17 ` Jani Nikula
2014-09-04 12:26 ` Chris Wilson
2014-09-04 13:31 ` Daniel Vetter
2014-09-04 13:33 ` Chris Wilson
2014-09-04 14:12 ` Daniel Vetter
2014-09-04 14:18 ` Chris Wilson
2014-09-04 15:09 ` Chris Wilson
2014-09-04 15:25 ` Ville Syrjälä
2014-09-04 15:38 ` Chris Wilson
2014-09-04 16:40 ` Ville Syrjälä [this message]
2014-10-07 15:34 ` Mika Kuoppala