From mboxrd@z Thu Jan  1 00:00:00 1970
From: Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [PATCH] drm/i915: kicking rings considered harmful
Date: Tue, 27 Sep 2011 22:54:01 +0100
Message-ID: <e0d58a$1mj5pp@orsmga002.jf.intel.com>
References: <CAObL_7Hi+c2aEtEMzrMNnrQXfnUmNY_ZnP==xCd7egMoKBow_g@mail.gmail.com>
	<1317059990-1922-1-git-send-email-daniel.vetter@ffwll.ch>
	<20110926222201.1ca2afcd@bwidawsk.net>
	<20110927100322.GA2785@phenom.ffwll.local>
	<20110927094614.158112a2@bwidawsk.net>
	<c55c5d$k5v25@AZSMGA002.ch.intel.com>
	<20110927180317.GC2785@phenom.ffwll.local>
	<20110927123859.5cd58ba8@bwidawsk.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org>
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
	by gabe.freedesktop.org (Postfix) with ESMTP id 00B9E9E7B8
	for <intel-gfx@lists.freedesktop.org>;
	Tue, 27 Sep 2011 14:54:05 -0700 (PDT)
In-Reply-To: <20110927123859.5cd58ba8@bwidawsk.net>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
	<mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: Ben Widawsky <ben@bwidawsk.net>, Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>, intel-gfx <intel-gfx@lists.freedesktop.org>
List-Id: intel-gfx@lists.freedesktop.org

On Tue, 27 Sep 2011 12:38:59 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> If we do this we lose the possibility to kick rings, but not reset the
> GPU (not that I find that terribly useful. If we do this, it does fire a
> wq event, but I don't see a problem with that for this case.
> 
> I think I would rather do this:
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 012732b..803524e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1698,6 +1698,10 @@ void i915_hangcheck_elapsed(unsigned long data)
>                 if (dev_priv->hangcheck_count++ > 1) {
>                         DRM_ERROR("Hangcheck timer elapsed... GPU hung\n");
>  
> +                       /* Save off error state before kicking the rings and
> +                        * possibly ruining the GPU state.
> +                        */
> +                       i915_handle_error(dev, true);
>                         if (!IS_GEN2(dev)) {
>                                 /* Is the chip hanging on a WAIT_FOR_EVENT?
>                                  * If so we can simply poke the RB_WAIT bit
> @@ -1717,7 +1721,6 @@ void i915_hangcheck_elapsed(unsigned long data)
>                                         goto repeat;
>                         }
>  
> -                       i915_handle_error(dev, true);
>                         return;
>                 }
>         } else {

Interesting. If we simply call i915_capture_error_state() rather than move
the i915_handle_error() call earlier, we do in fact get the best of both worlds.
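
To make the ordering concrete, here is a rough sketch in plain C. It is a toy
model, not driver code: struct mock_gpu and every mock_* function are
hypothetical stand-ins, and it assumes that the error handler skips the
capture when a snapshot already exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the real i915 structures. */
struct mock_gpu {
	bool waiting_on_event;  /* ring is stuck on a WAIT_FOR_EVENT */
	bool state_clobbered;   /* ring registers were poked after the hang */
	bool snapshot_taken;    /* an error state has been recorded */
	bool snapshot_clean;    /* the snapshot predates any ring kick */
	bool reset_queued;      /* the error handler was asked to reset */
};

/* Models the capture step: record state once and note whether it is pristine. */
static void mock_capture_error_state(struct mock_gpu *gpu)
{
	assert(gpu != NULL);
	if (gpu->snapshot_taken)
		return; /* assumption: only the first error state is kept */
	gpu->snapshot_taken = true;
	gpu->snapshot_clean = !gpu->state_clobbered;
}

/* Models the ring kick: pokes the ring only if it is waiting on an event. */
static bool mock_kick_rings(struct mock_gpu *gpu)
{
	assert(gpu != NULL);
	if (!gpu->waiting_on_event)
		return false;
	gpu->state_clobbered = true;
	gpu->waiting_on_event = false;
	return true;
}

/* Models the error handler: capture (if not already done), then queue a reset. */
static void mock_handle_error(struct mock_gpu *gpu)
{
	mock_capture_error_state(gpu);
	gpu->reset_queued = true;
}

/* Hang path: snapshot first, then try the kick, then fall back to a reset. */
static void mock_hangcheck_hung(struct mock_gpu *gpu)
{
	mock_capture_error_state(gpu);
	if (mock_kick_rings(gpu))
		return; /* the kick was enough, so no reset is queued */
	mock_handle_error(gpu);
}
```

In this model the snapshot is always taken before the ring is touched, and a
successful kick still avoids the reset, which is the "best of both worlds"
meant above.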

However, it doesn't address Daniel's statement that kick_rings() provoked
an unrecoverable hang and so we still need to disable that in order to
save the error-state. The origin of ring-kicking was to try and recover
from the modesetting/vsync issues, which apart from the outstanding issue
in intel_crtc_disable() are behind us. (I hope ;-) We shouldn't be relying
on i915_reset(), and i915.reset=0 tends to be either deliberate or an act of
desperation, so I don't see the issue in also preventing ring-kicking with
the same parameter. Is there an issue I'm overlooking?
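
Roughly what I have in mind, as a toy model rather than a patch: struct
mock_hang, its fields and mock_hangcheck_gated() are all hypothetical names,
and the reset_allowed flag merely stands in for the i915.reset parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one hangcheck expiry; not the real driver state. */
struct mock_hang {
	bool reset_allowed;   /* stands in for the i915.reset parameter */
	bool kick_would_help; /* the hang is a plain WAIT_FOR_EVENT stall */
	bool error_captured;  /* an error state was saved */
	bool rings_kicked;    /* ring registers were poked */
	bool reset_queued;    /* a GPU reset was requested */
};

/* With resets disabled, save the error state and touch nothing else. */
static void mock_hangcheck_gated(struct mock_hang *h)
{
	assert(h != NULL);

	h->error_captured = true; /* always keep the evidence */

	if (!h->reset_allowed)
		return; /* reset=0: no ring kick and no reset */

	h->rings_kicked = true;
	if (h->kick_would_help)
		return; /* the kick recovered the ring */

	h->reset_queued = true;
}
```

The point of the single flag is that anyone who has asked for no resets also
gets a GPU that is left untouched after the hang, so the saved error state
reflects the hang and not the recovery attempt.
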
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre