intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 01/62] drm/i915: Only start retire worker when idle
Date: Wed, 08 Jun 2016 15:07:45 +0300	[thread overview]
Message-ID: <1465387665.5803.39.camel@linux.intel.com> (raw)
In-Reply-To: <20160608110602.GV32344@nuc-i3427.alporthouse.com>

On ke, 2016-06-08 at 12:06 +0100, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 11:53:15AM +0100, Chris Wilson wrote:
> > 
> > On Tue, Jun 07, 2016 at 02:31:07PM +0300, Joonas Lahtinen wrote:
> > > 
> > > On pe, 2016-06-03 at 17:36 +0100, Chris Wilson wrote:
> > > > 
> > > >  i915_gem_idle_work_handler(struct work_struct *work)
> > > >  {
> > > >  	struct drm_i915_private *dev_priv =
> > > > -		container_of(work, typeof(*dev_priv), mm.idle_work.work);
> > > > +		container_of(work, typeof(*dev_priv), gt.idle_work.work);
> > > >  	struct drm_device *dev = dev_priv->dev;
> > > >  	struct intel_engine_cs *engine;
> > > >  
> > > > -	for_each_engine(engine, dev_priv)
> > > > -		if (!list_empty(&engine->request_list))
> > > > -			return;
> > > > +	if (!READ_ONCE(dev_priv->gt.awake))
> > > > +		return;
> > > >  
> > > > -	/* we probably should sync with hangcheck here, using cancel_work_sync.
> > > > -	 * Also locking seems to be fubar here, engine->request_list is protected
> > > > -	 * by dev->struct_mutex. */
> > > > +	mutex_lock(&dev->struct_mutex);
> > > > +	if (dev_priv->gt.active_engines)
> > > > +		goto out;
> > > >  
> > > > -	intel_mark_idle(dev_priv);
> > > > +	for_each_engine(engine, dev_priv)
> > > > +		i915_gem_batch_pool_fini(&engine->batch_pool);
> > > >  
> > > > -	if (mutex_trylock(&dev->struct_mutex)) {
> > > > -		for_each_engine(engine, dev_priv)
> > > > -			i915_gem_batch_pool_fini(&engine->batch_pool);
> > > > +	GEM_BUG_ON(!dev_priv->gt.awake);
> > > > +	dev_priv->gt.awake = false;
> > > >  
> > > > -		mutex_unlock(&dev->struct_mutex);
> > > > +	if (INTEL_INFO(dev_priv)->gen >= 6)
> > > > +		gen6_rps_idle(dev_priv);
> > > > +	intel_runtime_pm_put(dev_priv);
> > > > +out:
> > > > +	mutex_unlock(&dev->struct_mutex);
> > > > +
> > > > +	if (!dev_priv->gt.awake &&
> > > No READ_ONCE here, even we just unlocked the mutex. So lacks some
> > > consistency.
> > > 
> > > Also, this assumes we might be pre-empted between unlocking mutex and
> > > making this test, so I'm little bit confused. Do you want to optimize
> > > by avoiding calling cancel_delayed_work_sync?
> > General principle to never call work_sync functions with locks held. I
> > had actually thought I had fixed this up (but realized that I just
> > rewrote hangcheck later on instead ;)
> > 
> > Ok, what I think is safer here is
> > 
> > 	bool hangcheck = cancel_delay_work_sync(hangcheck_work)
> > 
> > 	mutex_lock()
> > 	if (actually_idle()) {
> > 		awake = false;
> > 		missed_irq_rings |= intel_kick_waiters();
> > 	}
> > 	mutex_unlock();
> > 
> > 	if (awake && hangcheck)
> > 		queue_hangcheck()
> > 	
> > So always kick the hangcheck and reeanble if we tried to idle too early.
> > This will potentially delay hangcheck by one full hangcheck period if we
> > do encounter that race. But we shouldn't be hitting this race that
> > often, or hanging the GPU for that mterr.
> Actual delta:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 406046f66e36..856da4036fb3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3066,10 +3066,15 @@ i915_gem_idle_work_handler(struct work_struct *work)
>                 container_of(work, typeof(*dev_priv), gt.idle_work.work);
>         struct drm_device *dev = dev_priv->dev;
>         struct intel_engine_cs *engine;
> +       unsigned stuck_engines;
> +       bool rearm_hangcheck;
>  
>         if (!READ_ONCE(dev_priv->gt.awake))
>                 return;
>  
> +       rearm_hangcheck =
> +               cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> +
>         mutex_lock(&dev->struct_mutex);
>         if (dev_priv->gt.active_engines)
>                 goto out;
> @@ -3079,6 +3084,13 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  
>         GEM_BUG_ON(!dev_priv->gt.awake);
>         dev_priv->gt.awake = false;
> +       rearm_hangcheck = false;
> +
> +       stuck_engines = intel_kick_waiters(dev_priv);
> +       if (unlikely(stuck_engines)) {
> +               DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> +               dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
> +       }
>  
>         if (INTEL_INFO(dev_priv)->gen >= 6)
>                 gen6_rps_idle(dev_priv);
> @@ -3086,14 +3098,8 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  out:
>         mutex_unlock(&dev->struct_mutex);
>  
> -       if (!dev_priv->gt.awake &&
> -           cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work)) {
> -               unsigned stuck = intel_kick_waiters(dev_priv);
> -               if (unlikely(stuck)) {
> -                       DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> -                       dev_priv->gpu_error.missed_irq_rings |= stuck;
> -               }
> -       }
> +       if (rearm_hangcheck)
> +               i915_queue_hangcheck(dev_priv);

As discussed in IRC, should not race, so with above hunk;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

>  }
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2016-06-08 12:07 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-03 16:36 The vma leak fix from yonder Chris Wilson
2016-06-03 16:36 ` [PATCH 01/62] drm/i915: Only start retire worker when idle Chris Wilson
2016-06-07 11:31   ` Joonas Lahtinen
2016-06-08 10:53     ` Chris Wilson
2016-06-08 11:06       ` Chris Wilson
2016-06-08 12:07         ` Joonas Lahtinen [this message]
2016-06-03 16:36 ` [PATCH 02/62] drm/i915: Do not keep postponing the idle-work Chris Wilson
2016-06-07 11:34   ` Joonas Lahtinen
2016-06-03 16:36 ` [PATCH 03/62] drm/i915: Remove redundant queue_delayed_work() from throttle ioctl Chris Wilson
2016-06-07 11:39   ` Joonas Lahtinen
2016-06-03 16:36 ` [PATCH 04/62] drm/i915: Restore waitboost credit to the synchronous waiter Chris Wilson
2016-06-08  9:04   ` Daniel Vetter
2016-06-08 10:38     ` Chris Wilson
2016-06-03 16:36 ` [PATCH 05/62] drm/i915: Add background commentary to "waitboosting" Chris Wilson
2016-06-03 16:36 ` [PATCH 06/62] drm/i915: Flush the RPS bottom-half when the GPU idles Chris Wilson
2016-06-16  8:49   ` Michał Winiarski
2016-06-16 11:09     ` Chris Wilson
2016-06-03 16:36 ` [PATCH 07/62] drm/i915: Remove temporary RPM wakeref assert disables Chris Wilson
2016-06-03 16:36 ` [PATCH 08/62] drm/i915: Remove stop-rings debugfs interface Chris Wilson
2016-06-08 11:50   ` Arun Siluvery
2016-06-03 16:36 ` [PATCH 09/62] drm/i915: Record the ringbuffer associated with the request Chris Wilson
2016-06-03 16:36 ` [PATCH 10/62] drm/i915: Allow userspace to request no-error-capture upon GPU hangs Chris Wilson
2016-06-03 16:36 ` [PATCH 11/62] drm/i915: Clean up GPU hang message Chris Wilson
2016-06-14  8:13   ` Mika Kuoppala
2016-06-03 16:36 ` [PATCH 12/62] drm/i915: Skip capturing an error state if we already have one Chris Wilson
2016-06-08 11:14   ` Arun Siluvery
2016-06-08 12:06     ` Chris Wilson
2016-06-03 16:36 ` [PATCH 13/62] drm/i915: Derive GEM requests from dma-fence Chris Wilson
2016-06-08  9:14   ` Daniel Vetter
2016-06-08 10:33     ` Chris Wilson
2016-06-03 16:36 ` [PATCH 14/62] drm/i915: Rename request reference/unreference to get/put Chris Wilson
2016-06-08  9:15   ` Daniel Vetter
2016-06-03 16:36 ` [PATCH 15/62] drm/i915: Rename i915_gem_context_reference/unreference() Chris Wilson
2016-06-06 12:12   ` Joonas Lahtinen
2016-06-03 16:36 ` [PATCH 16/62] drm/i915: Wrap drm_gem_object_lookup in i915_gem_object_lookup Chris Wilson
2016-06-03 16:36 ` [PATCH 17/62] drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get Chris Wilson
2016-06-03 16:36 ` [PATCH 18/62] drm/i915: Rename drm_gem_object_unreference in preparation for lockless free Chris Wilson
2016-06-03 16:36 ` [PATCH 19/62] drm/i915: Rename drm_gem_object_unreference_unlocked " Chris Wilson
2016-06-03 16:36 ` [PATCH 20/62] drm/i915: Disable waitboosting for fence_wait() Chris Wilson
2016-06-03 16:36 ` [PATCH 21/62] drm/i915: Disable waitboosting for mmioflips/semaphores Chris Wilson
2016-06-03 16:36 ` [PATCH 22/62] drm/i915: Treat ringbuffer writes as write to normal memory Chris Wilson
2016-06-03 16:36 ` [PATCH 23/62] drm/i915: Rename ring->virtual_start as ring->vaddr Chris Wilson
2016-06-03 16:36 ` [PATCH 24/62] drm/i915: Convert i915_semaphores_is_enabled over to early sanitize Chris Wilson
2016-06-03 16:36 ` [PATCH 25/62] drm/i915: Unify intel_logical_ring_emit and intel_ring_emit Chris Wilson
2016-06-03 16:36 ` [PATCH 26/62] drm/i915: Rename request->ring to request->engine Chris Wilson
2016-06-06 13:42   ` Tvrtko Ursulin
2016-06-03 16:36 ` [PATCH 27/62] drm/i915: Rename request->ringbuf to request->ring Chris Wilson
2016-06-06 13:44   ` Tvrtko Ursulin
2016-06-08  9:18     ` Daniel Vetter
2016-06-03 16:36 ` [PATCH 28/62] drm/i915: Rename backpointer from intel_ringbuffer to intel_engine_cs Chris Wilson
2016-06-06 13:45   ` Tvrtko Ursulin
2016-06-03 16:36 ` [PATCH 29/62] drm/i915: Rename intel_context[engine].ringbuf Chris Wilson
2016-06-03 16:36 ` [PATCH 30/62] drm/i915: Rename struct intel_ringbuffer to struct intel_ring Chris Wilson
2016-06-03 16:36 ` [PATCH 31/62] drm/i915: Rename residual ringbuf parameters Chris Wilson
2016-06-03 16:36 ` [PATCH 32/62] drm/i915: Rename intel_pin_and_map_ring() Chris Wilson
2016-06-03 16:36 ` [PATCH 33/62] drm/i915: Remove obsolete engine->gpu_caches_dirty Chris Wilson
2016-06-03 16:36 ` [PATCH 34/62] drm/i915: Simplify request_alloc by returning the allocated request Chris Wilson
2016-06-03 16:37 ` [PATCH 35/62] drm/i915: Unify legacy/execlists emission of MI_BATCHBUFFER_START Chris Wilson
2016-06-03 16:37 ` [PATCH 36/62] drm/i915: Convert engine->write_tail to operate on a request Chris Wilson
2016-06-03 16:37 ` [PATCH 37/62] drm/i915: Unify request submission Chris Wilson
2016-06-03 16:37 ` [PATCH 38/62] drm/i915: Stop passing caller's num_dwords to engine->semaphore.signal() Chris Wilson
2016-06-03 16:37 ` [PATCH 39/62] drm/i915: Reuse legacy breadcrumbs + tail emission Chris Wilson
2016-06-03 16:37 ` [PATCH 40/62] drm/i915: Remove duplicate golden render state init from execlists Chris Wilson
2016-06-03 16:37 ` [PATCH 41/62] drm/i915: Unify legacy/execlists submit_execbuf callbacks Chris Wilson
2016-06-03 16:37 ` [PATCH 42/62] drm/i915: Simplify calling engine->sync_to Chris Wilson
2016-06-03 16:37 ` [PATCH 43/62] drm/i915: Introduce i915_gem_active for request tracking Chris Wilson
2016-06-03 16:37 ` [PATCH 44/62] drm/i915: Prepare i915_gem_active for annotations Chris Wilson
2016-06-03 16:37 ` [PATCH 45/62] drm/i915: Mark up i915_gem_active for locking annotation Chris Wilson
2016-06-03 16:37 ` [PATCH 46/62] drm/i915: Refactor blocking waits Chris Wilson
2016-06-03 16:37 ` [PATCH 47/62] drm/i915: Rename request->list to link for consistency Chris Wilson
2016-06-03 16:37 ` [PATCH 48/62] drm/i915: Remove obsolete i915_gem_object_flush_active() Chris Wilson
2016-06-03 16:37 ` [PATCH 49/62] drm/i915: Refactor activity tracking for requests Chris Wilson
2016-06-03 16:37 ` [PATCH 50/62] drm/i915: Double check activity before relocations Chris Wilson
2016-06-03 16:37 ` [PATCH 51/62] drm/i915: Move request list retirement to i915_gem_request.c Chris Wilson
2016-06-03 16:37 ` [PATCH 52/62] drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers Chris Wilson
2016-06-03 16:37 ` [PATCH 53/62] drm/i915: Split early global GTT initialisation Chris Wilson
2016-06-03 16:37 ` [PATCH 54/62] drm/i915: Store owning file on the i915_address_space Chris Wilson
2016-06-03 16:37 ` [PATCH 55/62] drm/i915: i915_vma_move_to_active prep patch Chris Wilson
2016-06-03 16:37 ` [PATCH 56/62] drm/i915: Count how many VMA are bound for an object Chris Wilson
2016-06-03 16:37 ` [PATCH 57/62] drm/i915: Be more careful when unbinding vma Chris Wilson
2016-06-03 16:37 ` [PATCH 58/62] drm/i915: Kill drop_pages() Chris Wilson
2016-06-03 16:37 ` [PATCH 59/62] drm/i915: Track active vma requests Chris Wilson
2016-06-03 16:37 ` [PATCH 60/62] drm/i915: Release vma when the handle is closed Chris Wilson
2016-06-03 16:37 ` [PATCH 61/62] drm/i915: Mark the context and address space as closed Chris Wilson
2016-06-03 16:37 ` [PATCH 62/62] Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
2016-06-05  5:24 ` ✗ Ro.CI.BAT: failure for series starting with [01/62] drm/i915: Only start retire worker when idle Patchwork
2016-06-08  9:30 ` The vma leak fix from yonder Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1465387665.5803.39.camel@linux.intel.com \
    --to=joonas.lahtinen@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).