From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 01/62] drm/i915: Only start retire worker when idle
Date: Wed, 08 Jun 2016 15:07:45 +0300 [thread overview]
Message-ID: <1465387665.5803.39.camel@linux.intel.com> (raw)
In-Reply-To: <20160608110602.GV32344@nuc-i3427.alporthouse.com>
On ke, 2016-06-08 at 12:06 +0100, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 11:53:15AM +0100, Chris Wilson wrote:
> >
> > On Tue, Jun 07, 2016 at 02:31:07PM +0300, Joonas Lahtinen wrote:
> > >
> > > On pe, 2016-06-03 at 17:36 +0100, Chris Wilson wrote:
> > > >
> > > > i915_gem_idle_work_handler(struct work_struct *work)
> > > > {
> > > > struct drm_i915_private *dev_priv =
> > > > - container_of(work, typeof(*dev_priv), mm.idle_work.work);
> > > > + container_of(work, typeof(*dev_priv), gt.idle_work.work);
> > > > struct drm_device *dev = dev_priv->dev;
> > > > struct intel_engine_cs *engine;
> > > >
> > > > - for_each_engine(engine, dev_priv)
> > > > - if (!list_empty(&engine->request_list))
> > > > - return;
> > > > + if (!READ_ONCE(dev_priv->gt.awake))
> > > > + return;
> > > >
> > > > - /* we probably should sync with hangcheck here, using cancel_work_sync.
> > > > - * Also locking seems to be fubar here, engine->request_list is protected
> > > > - * by dev->struct_mutex. */
> > > > + mutex_lock(&dev->struct_mutex);
> > > > + if (dev_priv->gt.active_engines)
> > > > + goto out;
> > > >
> > > > - intel_mark_idle(dev_priv);
> > > > + for_each_engine(engine, dev_priv)
> > > > + i915_gem_batch_pool_fini(&engine->batch_pool);
> > > >
> > > > - if (mutex_trylock(&dev->struct_mutex)) {
> > > > - for_each_engine(engine, dev_priv)
> > > > - i915_gem_batch_pool_fini(&engine->batch_pool);
> > > > + GEM_BUG_ON(!dev_priv->gt.awake);
> > > > + dev_priv->gt.awake = false;
> > > >
> > > > - mutex_unlock(&dev->struct_mutex);
> > > > + if (INTEL_INFO(dev_priv)->gen >= 6)
> > > > + gen6_rps_idle(dev_priv);
> > > > + intel_runtime_pm_put(dev_priv);
> > > > +out:
> > > > + mutex_unlock(&dev->struct_mutex);
> > > > +
> > > > + if (!dev_priv->gt.awake &&
> > > No READ_ONCE here, even we just unlocked the mutex. So lacks some
> > > consistency.
> > >
> > > Also, this assumes we might be pre-empted between unlocking mutex and
> > > making this test, so I'm little bit confused. Do you want to optimize
> > > by avoiding calling cancel_delayed_work_sync?
> > General principle to never call work_sync functions with locks held. I
> > had actually thought I had fixed this up (but realized that I just
> > rewrote hangcheck later on instead ;)
> >
> > Ok, what I think is safer here is
> >
> > bool hangcheck = cancel_delay_work_sync(hangcheck_work)
> >
> > mutex_lock()
> > if (actually_idle()) {
> > awake = false;
> > missed_irq_rings |= intel_kick_waiters();
> > }
> > mutex_unlock();
> >
> > if (awake && hangcheck)
> > queue_hangcheck()
> >
> > So always kick the hangcheck and reeanble if we tried to idle too early.
> > This will potentially delay hangcheck by one full hangcheck period if we
> > do encounter that race. But we shouldn't be hitting this race that
> > often, or hanging the GPU for that mterr.
> Actual delta:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 406046f66e36..856da4036fb3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3066,10 +3066,15 @@ i915_gem_idle_work_handler(struct work_struct *work)
> container_of(work, typeof(*dev_priv), gt.idle_work.work);
> struct drm_device *dev = dev_priv->dev;
> struct intel_engine_cs *engine;
> + unsigned stuck_engines;
> + bool rearm_hangcheck;
>
> if (!READ_ONCE(dev_priv->gt.awake))
> return;
>
> + rearm_hangcheck =
> + cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> +
> mutex_lock(&dev->struct_mutex);
> if (dev_priv->gt.active_engines)
> goto out;
> @@ -3079,6 +3084,13 @@ i915_gem_idle_work_handler(struct work_struct *work)
>
> GEM_BUG_ON(!dev_priv->gt.awake);
> dev_priv->gt.awake = false;
> + rearm_hangcheck = false;
> +
> + stuck_engines = intel_kick_waiters(dev_priv);
> + if (unlikely(stuck_engines)) {
> + DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> + dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
> + }
>
> if (INTEL_INFO(dev_priv)->gen >= 6)
> gen6_rps_idle(dev_priv);
> @@ -3086,14 +3098,8 @@ i915_gem_idle_work_handler(struct work_struct *work)
> out:
> mutex_unlock(&dev->struct_mutex);
>
> - if (!dev_priv->gt.awake &&
> - cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work)) {
> - unsigned stuck = intel_kick_waiters(dev_priv);
> - if (unlikely(stuck)) {
> - DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> - dev_priv->gpu_error.missed_irq_rings |= stuck;
> - }
> - }
> + if (rearm_hangcheck)
> + i915_queue_hangcheck(dev_priv);
As discussed in IRC, should not race, so with above hunk;
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> }
> -Chris
>
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2016-06-08 12:07 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-03 16:36 The vma leak fix from yonder Chris Wilson
2016-06-03 16:36 ` [PATCH 01/62] drm/i915: Only start retire worker when idle Chris Wilson
2016-06-07 11:31 ` Joonas Lahtinen
2016-06-08 10:53 ` Chris Wilson
2016-06-08 11:06 ` Chris Wilson
2016-06-08 12:07 ` Joonas Lahtinen [this message]
2016-06-03 16:36 ` [PATCH 02/62] drm/i915: Do not keep postponing the idle-work Chris Wilson
2016-06-07 11:34 ` Joonas Lahtinen
2016-06-03 16:36 ` [PATCH 03/62] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl Chris Wilson
2016-06-07 11:39 ` Joonas Lahtinen
2016-06-03 16:36 ` [PATCH 04/62] drm/i915: Restore waitboost credit to the synchronous waiter Chris Wilson
2016-06-08 9:04 ` Daniel Vetter
2016-06-08 10:38 ` Chris Wilson
2016-06-03 16:36 ` [PATCH 05/62] drm/i915: Add background commentary to "waitboosting" Chris Wilson
2016-06-03 16:36 ` [PATCH 06/62] drm/i915: Flush the RPS bottom-half when the GPU idles Chris Wilson
2016-06-16 8:49 ` Michał Winiarski
2016-06-16 11:09 ` Chris Wilson
2016-06-03 16:36 ` [PATCH 07/62] drm/i915: Remove temporary RPM wakeref assert disables Chris Wilson
2016-06-03 16:36 ` [PATCH 08/62] drm/i915: Remove stop-rings debugfs interface Chris Wilson
2016-06-08 11:50 ` Arun Siluvery
2016-06-03 16:36 ` [PATCH 09/62] drm/i915: Record the ringbuffer associated with the request Chris Wilson
2016-06-03 16:36 ` [PATCH 10/62] drm/i915: Allow userspace to request no-error-capture upon GPU hangs Chris Wilson
2016-06-03 16:36 ` [PATCH 11/62] drm/i915: Clean up GPU hang message Chris Wilson
2016-06-14 8:13 ` Mika Kuoppala
2016-06-03 16:36 ` [PATCH 12/62] drm/i915: Skip capturing an error state if we already have one Chris Wilson
2016-06-08 11:14 ` Arun Siluvery
2016-06-08 12:06 ` Chris Wilson
2016-06-03 16:36 ` [PATCH 13/62] drm/i915: Derive GEM requests from dma-fence Chris Wilson
2016-06-08 9:14 ` Daniel Vetter
2016-06-08 10:33 ` Chris Wilson
2016-06-03 16:36 ` [PATCH 14/62] drm/i915: Rename request reference/unreference to get/put Chris Wilson
2016-06-08 9:15 ` Daniel Vetter
2016-06-03 16:36 ` [PATCH 15/62] drm/i915: Rename i915_gem_context_reference/unreference() Chris Wilson
2016-06-06 12:12 ` Joonas Lahtinen
2016-06-03 16:36 ` [PATCH 16/62] drm/i915: Wrap drm_gem_object_lookup in i915_gem_object_lookup Chris Wilson
2016-06-03 16:36 ` [PATCH 17/62] drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get Chris Wilson
2016-06-03 16:36 ` [PATCH 18/62] drm/i915: Rename drm_gem_object_unreference in preparation for lockless free Chris Wilson
2016-06-03 16:36 ` [PATCH 19/62] drm/i915: Rename drm_gem_object_unreference_unlocked " Chris Wilson
2016-06-03 16:36 ` [PATCH 20/62] drm/i915: Disable waitboosting for fence_wait() Chris Wilson
2016-06-03 16:36 ` [PATCH 21/62] drm/i915: Disable waitboosting for mmioflips/semaphores Chris Wilson
2016-06-03 16:36 ` [PATCH 22/62] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
2016-06-03 16:36 ` [PATCH 23/62] drm/i915: Rename ring->virtual_start as ring->vaddr Chris Wilson
2016-06-03 16:36 ` [PATCH 24/62] drm/i915: Convert i915_semaphores_is_enabled over to early sanitize Chris Wilson
2016-06-03 16:36 ` [PATCH 25/62] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Chris Wilson
2016-06-03 16:36 ` [PATCH 26/62] drm/i915: Rename request->ring to request->engine Chris Wilson
2016-06-06 13:42 ` Tvrtko Ursulin
2016-06-03 16:36 ` [PATCH 27/62] drm/i915: Rename request->ringbuf to request->ring Chris Wilson
2016-06-06 13:44 ` Tvrtko Ursulin
2016-06-08 9:18 ` Daniel Vetter
2016-06-03 16:36 ` [PATCH 28/62] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs Chris Wilson
2016-06-06 13:45 ` Tvrtko Ursulin
2016-06-03 16:36 ` [PATCH 29/62] drm/i915: Rename intel_context[engine].ringbuf Chris Wilson
2016-06-03 16:36 ` [PATCH 30/62] drm/i915: Rename struct intel_ringbuffer to struct intel_ring Chris Wilson
2016-06-03 16:36 ` [PATCH 31/62] drm/i915: Rename residual ringbuf parameters Chris Wilson
2016-06-03 16:36 ` [PATCH 32/62] drm/i915: Rename intel_pin_and_map_ring() Chris Wilson
2016-06-03 16:36 ` [PATCH 33/62] drm/i915: Remove obsolete engine->gpu_caches_dirty Chris Wilson
2016-06-03 16:36 ` [PATCH 34/62] drm/i915: Simplify request_alloc by returning the allocated request Chris Wilson
2016-06-03 16:37 ` [PATCH 35/62] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Chris Wilson
2016-06-03 16:37 ` [PATCH 36/62] drm/i915: Convert engine->write_tail to operate on a request Chris Wilson
2016-06-03 16:37 ` [PATCH 37/62] drm/i915: Unify request submission Chris Wilson
2016-06-03 16:37 ` [PATCH 38/62] drm/i915: Stop passing caller's num_dwords to engine->semaphore.signal() Chris Wilson
2016-06-03 16:37 ` [PATCH 39/62] drm/i915: Reuse legacy breadcrumbs + tail emission Chris Wilson
2016-06-03 16:37 ` [PATCH 40/62] drm/i915: Remove duplicate golden render state init from execlists Chris Wilson
2016-06-03 16:37 ` [PATCH 41/62] drm/i915: Unify legacy/execlists submit_execbuf callbacks Chris Wilson
2016-06-03 16:37 ` [PATCH 42/62] drm/i915: Simplify calling engine->sync_to Chris Wilson
2016-06-03 16:37 ` [PATCH 43/62] drm/i915: Introduce i915_gem_active for request tracking Chris Wilson
2016-06-03 16:37 ` [PATCH 44/62] drm/i915: Prepare i915_gem_active for annotations Chris Wilson
2016-06-03 16:37 ` [PATCH 45/62] drm/i915: Mark up i915_gem_active for locking annotation Chris Wilson
2016-06-03 16:37 ` [PATCH 46/62] drm/i915: Refactor blocking waits Chris Wilson
2016-06-03 16:37 ` [PATCH 47/62] drm/i915: Rename request->list to link for consistency Chris Wilson
2016-06-03 16:37 ` [PATCH 48/62] drm/i915: Remove obsolete i915_gem_object_flush_active() Chris Wilson
2016-06-03 16:37 ` [PATCH 49/62] drm/i915: Refactor activity tracking for requests Chris Wilson
2016-06-03 16:37 ` [PATCH 50/62] drm/i915: Double check activity before relocations Chris Wilson
2016-06-03 16:37 ` [PATCH 51/62] drm/i915: Move request list retirement to i915_gem_request.c Chris Wilson
2016-06-03 16:37 ` [PATCH 52/62] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers Chris Wilson
2016-06-03 16:37 ` [PATCH 53/62] drm/i915: Split early global GTT initialisation Chris Wilson
2016-06-03 16:37 ` [PATCH 54/62] drm/i915: Store owning file on the i915_address_space Chris Wilson
2016-06-03 16:37 ` [PATCH 55/62] drm/i915: i915_vma_move_to_active prep patch Chris Wilson
2016-06-03 16:37 ` [PATCH 56/62] drm/i915: Count how many VMA are bound for an object Chris Wilson
2016-06-03 16:37 ` [PATCH 57/62] drm/i915: Be more careful when unbinding vma Chris Wilson
2016-06-03 16:37 ` [PATCH 58/62] drm/i915: Kill drop_pages() Chris Wilson
2016-06-03 16:37 ` [PATCH 59/62] drm/i915: Track active vma requests Chris Wilson
2016-06-03 16:37 ` [PATCH 60/62] drm/i915: Release vma when the handle is closed Chris Wilson
2016-06-03 16:37 ` [PATCH 61/62] drm/i915: Mark the context and address space as closed Chris Wilson
2016-06-03 16:37 ` [PATCH 62/62] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
2016-06-05 5:24 ` ✗ Ro.CI.BAT: failure for series starting with [01/62] drm/i915: Only start retire worker when idle Patchwork
2016-06-08 9:30 ` The vma leak fix from yonder Daniel Vetter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1465387665.5803.39.camel@linux.intel.com \
--to=joonas.lahtinen@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).