public inbox for intel-gfx@lists.freedesktop.org
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 02/20] drm/i915: Delay queuing hangcheck to wait-request
Date: Fri, 1 Jul 2016 16:32:16 +0100
Message-ID: <57768D00.3030100@linux.intel.com>
In-Reply-To: <1467372140-30422-3-git-send-email-chris@chris-wilson.co.uk>


On 01/07/16 12:22, Chris Wilson wrote:
> We can forgo queuing the hangcheck from the start of every request to
> until we wait upon a request. This reduces the overhead of every
> request, but may increase the latency of detecting a hang. However, if
> nothing ever waits upon a hang, did it ever hang? It also improves the
> robustness of the wait-request by ensuring that the hangchecker is
> indeed running before we sleep indefinitely (and thereby ensuring that
> we never actually sleep forever waiting for a dead GPU).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
>   drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
>   2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1d9878258103..34f724cc40b8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1532,6 +1532,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			break;
>   		}
>
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(dev_priv);
> +
>   		timer.function = NULL;
>   		if (timeout || missed_irq(dev_priv, engine)) {
>   			unsigned long expire;
> @@ -2919,8 +2922,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   	/* Not allowed to fail! */
>   	WARN(ret, "emit|add_request failed: %d!\n", ret);
>
> -	i915_queue_hangcheck(engine->i915);
> -
>   	queue_delayed_work(dev_priv->wq,
>   			   &dev_priv->mm.retire_work,
>   			   round_jiffies_up_relative(HZ));
> @@ -3264,8 +3265,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
>
>   	if (idle)
>   		mod_delayed_work(dev_priv->wq,
> -				   &dev_priv->mm.idle_work,
> -				   msecs_to_jiffies(100));
> +				 &dev_priv->mm.idle_work,
> +				 msecs_to_jiffies(100));
>
>   	return idle;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 4378a659d962..5614582ca240 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>
>   	for_each_engine_id(engine, dev_priv, id) {
> +		bool busy = waitqueue_active(&engine->irq_queue);
>   		u64 acthd;
>   		u32 seqno;
>   		unsigned user_interrupts;
> -		bool busy = true;
>
>   		semaphore_clear_deadlocks(dev_priv);
>
> @@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		if (engine->hangcheck.seqno == seqno) {
>   			if (ring_idle(engine, seqno)) {
>   				engine->hangcheck.action = HANGCHECK_IDLE;
> -				if (waitqueue_active(&engine->irq_queue)) {
> +				if (busy) {
>   					/* Safeguard against driver failure */
>   					user_interrupts = kick_waiters(engine);
>   					engine->hangcheck.score += BUSY;
> -				} else
> -					busy = false;
> +				}
>   			} else {
>   				/* We always increment the hangcheck score
>   				 * if the ring is busy and still processing
> @@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		goto out;
>   	}
>
> +	/* Reset timer in case GPU hangs without another request being added */
>   	if (busy_count)
> -		/* Reset timer case chip hangs without another request
> -		 * being added */
>   		i915_queue_hangcheck(dev_priv);
>
>   out:
>

I thought I saw a problem here, but I was just confused. I think it is 
OK. It just won't re-queue the hangcheck if no one is waiting and no new 
requests get submitted. That is unlikely to cause a problem in practice: 
it would be very unlucky for the last submitted request to be the one 
that hangs. Balanced against the benefit of not running the hangcheck 
while the GPU is processing work, I think we can give it a go.
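
To make the trade-off concrete, here is a minimal toy model in C of the 
behaviour after the patch. It is only a sketch, not driver code: every 
toy_* name is hypothetical, the waiters counter stands in for 
waitqueue_active(&engine->irq_queue), and it assumes busy_count simply 
counts engines whose busy flag is set, which the quoted hunks do not show.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ENGINES 2

/* Hypothetical toy types; none of these names exist in the i915 driver. */
struct toy_engine {
	int waiters;	/* stand-in for waitqueue_active(&engine->irq_queue) */
};

struct toy_gpu {
	struct toy_engine engine[NUM_ENGINES];
	bool hangcheck_queued;
};

/* Stand-in for i915_queue_hangcheck(): arming twice is harmless. */
static void toy_queue_hangcheck(struct toy_gpu *gpu)
{
	gpu->hangcheck_queued = true;
}

/* After the patch, submitting a request no longer arms the hangcheck. */
static void toy_add_request(struct toy_gpu *gpu)
{
	(void)gpu;
}

/* After the patch, a waiter arms the hangcheck before it goes to sleep. */
static void toy_wait_request(struct toy_gpu *gpu, int id)
{
	assert(id >= 0 && id < NUM_ENGINES);
	toy_queue_hangcheck(gpu);
	gpu->engine[id].waiters++;
}

/* The waiter woke up and left the wait queue. */
static void toy_wait_done(struct toy_gpu *gpu, int id)
{
	assert(id >= 0 && id < NUM_ENGINES && gpu->engine[id].waiters > 0);
	gpu->engine[id].waiters--;
}

/*
 * Models only the tail of the hangcheck worker: the timer has fired, so it
 * is no longer queued, and it is re-armed only if some engine has a waiter.
 * Returns true if the hangcheck was re-armed.
 */
static bool toy_hangcheck_elapsed(struct toy_gpu *gpu)
{
	int busy_count = 0;

	gpu->hangcheck_queued = false;
	for (int id = 0; id < NUM_ENGINES; id++)
		busy_count += gpu->engine[id].waiters > 0;

	if (busy_count)
		toy_queue_hangcheck(gpu);

	return gpu->hangcheck_queued;
}
```

In this model, toy_add_request() alone leaves the hangcheck unarmed, 
toy_wait_request() arms it, and toy_hangcheck_elapsed() keeps re-arming it 
only for as long as some engine still has a waiter. That is the case 
described above: a request that hangs with nobody waiting on it goes 
unnoticed until something does wait.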

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Thread overview (29+ messages):
2016-07-01 11:22 To the gingerbread house! Chris Wilson
2016-07-01 11:22 ` [PATCH 01/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
2016-07-01 11:22 ` [PATCH 02/20] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
2016-07-01 15:32   ` Tvrtko Ursulin [this message]
2016-07-01 11:22 ` [PATCH 03/20] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
2016-07-01 11:22 ` [PATCH 04/20] drm/i915: Make queueing the hangcheck work inline Chris Wilson
2016-07-01 11:22 ` [PATCH 05/20] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
2016-07-01 14:54   ` Tvrtko Ursulin
2016-07-01 11:22 ` [PATCH 06/20] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
2016-07-01 11:22 ` [PATCH 07/20] drm/i915: Spin after waking up for an interrupt Chris Wilson
2016-07-01 11:22 ` [PATCH 08/20] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
2016-07-01 14:09   ` Tvrtko Ursulin
2016-07-01 14:14     ` Chris Wilson
2016-07-01 11:22 ` [PATCH 09/20] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
2016-07-01 11:22 ` [PATCH 10/20] drm/i915: Allocate scratch page from stolen Chris Wilson
2016-07-01 11:22 ` [PATCH 11/20] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
2016-07-01 11:22 ` [PATCH 12/20] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
2016-07-01 14:27   ` Tvrtko Ursulin
2016-07-01 14:35     ` Chris Wilson
2016-07-01 11:22 ` [PATCH 13/20] drm/i915: Check the CPU cached value in HWS of seqno after waking the waiter Chris Wilson
2016-07-01 11:22 ` [PATCH 14/20] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
2016-07-01 11:22 ` [PATCH 15/20] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
2016-07-01 11:22 ` [PATCH 16/20] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
2016-07-01 11:22 ` [PATCH 17/20] drm/i915: Embed signaling node into the GEM request Chris Wilson
2016-07-01 11:22 ` [PATCH 18/20] drm/i915: Move the get/put irq locking into the caller Chris Wilson
2016-07-01 14:39   ` Tvrtko Ursulin
2016-07-01 11:22 ` [PATCH 19/20] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
2016-07-01 11:22 ` [PATCH 20/20] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
2016-07-01 11:51 ` ✗ Ro.CI.BAT: failure for series starting with [01/20] drm/i915/shrinker: Flush active on objects before counting Patchwork
