Re: [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ben Widawsky <ben@bwidawsk.net>
To: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early
Date: Sun, 30 Oct 2011 18:29:13 -0700	[thread overview]
Message-ID: <20111030182913.511d8d53@bwidawsk.net> (raw)
In-Reply-To: <1320001932-1846-2-git-send-email-daniel.vetter@ffwll.ch>

On Sun, 30 Oct 2011 20:12:09 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> In the pre-gem days with non-existing hangcheck and gpu reset code,
> this timeout of 3 seconds was pretty important to avoid stuck
> processes.
> 
> But now we have the hangcheck code in gem that goes to great length
> to ensure that the gpu is really dead before declaring it wedged.
> 
> So there's no need for this timeout anymore. Actually it's even harmful
> because we can bail out too early (e.g. with xscreensaver slip)
> when running giant batchbuffers. And our code isn't robust enough
> to properly unroll any state-changes, we pretty much rely on the gpu
> reset code cleaning up the mess (like cache tracking, fencing state,
> active list/request tracking, ...).
> 
> With this change intel_begin_ring can only fail when the gpu is
> wedged, and it will return -EAGAIN (like wait_request in case the
> gpu reset is still outstanding).
> 
> v2: Chris Wilson noted that on resume timers aren't running and hence
> we won't ever get kicked out of this loop by the hangcheck code. Use
> an insanely large timeout instead for the HAS_GEM case to prevent
> resume bugs from totally hanging the machine.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ca70e2f..6e28301 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1119,7 +1119,16 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
>  	}
>  
>  	trace_i915_ring_wait_begin(ring);
> -	end = jiffies + 3 * HZ;
> +	if (drm_core_check_feature(dev, DRIVER_GEM))
> +		/* With GEM the hangcheck timer should kick us out of the loop,
> +		 * leaving it early runs the risk of corrupting GEM state (due
> +		 * to running on almost untested codepaths). But on resume
> +		 * timers don't work yet, so prevent a complete hang in that
> +		 * case by choosing an insanely large timeout. */
> +		end = jiffies + 60 * HZ;
> +	else
> +		end = jiffies + 3 * HZ;
> +
>  	do {
>  		ring->head = I915_READ_HEAD(ring);
>  		ring->space = ring_space(ring);

I didn't really check to see if there is actually an issue here, but
instead of 60, do you want to play nice with timeouts such as
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT (ie. the min of all the timeouts and
60)?

Acked-by: Ben Widawsky <ben@bwidawsk.net>

next prev parent reply	other threads:[~2011-10-31  1:29 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-30 19:12 [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Daniel Vetter
2011-10-30 19:12 ` [PATCH 2/5] drm/i915: don't bail out of intel_wait_ring_buffer too early Daniel Vetter
2011-10-31  1:29   ` Ben Widawsky [this message]
2011-10-31  7:37     ` Daniel Vetter
2011-11-01 15:31   ` Eugeni Dodonov
2011-10-30 19:12 ` [PATCH 3/5] drm/i915: switch ring->id to be a real id Daniel Vetter
2011-10-31  1:48   ` Ben Widawsky
2011-11-01 15:29   ` Eugeni Dodonov
2011-10-30 19:12 ` [PATCH 4/5] drm/i915: refactor ring error state capture to use arrays Daniel Vetter
2011-10-30 19:32   ` Chris Wilson
2011-10-31  1:47   ` Ben Widawsky
2011-10-31  1:50     ` Ben Widawsky
2011-10-31  7:53       ` Daniel Vetter
2011-10-30 19:12 ` [PATCH 5/5] drm/i915: collect more per ring error state Daniel Vetter
2011-10-31  1:51   ` Ben Widawsky
2011-10-31  1:25 ` [PATCH 1/5] drm/i915: kicking rings stuck on semaphores considered harmful Ben Widawsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111030182913.511d8d53@bwidawsk.net \
    --to=ben@bwidawsk.net \
    --cc=daniel.vetter@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.