From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Wilson Subject: Re: [PATCH] drm/i915: kicking rings considered harmful Date: Tue, 27 Sep 2011 22:54:01 +0100 Message-ID: References: <1317059990-1922-1-git-send-email-daniel.vetter@ffwll.ch> <20110926222201.1ca2afcd@bwidawsk.net> <20110927100322.GA2785@phenom.ffwll.local> <20110927094614.158112a2@bwidawsk.net> <20110927180317.GC2785@phenom.ffwll.local> <20110927123859.5cd58ba8@bwidawsk.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTP id 00B9E9E7B8 for ; Tue, 27 Sep 2011 14:54:05 -0700 (PDT) In-Reply-To: <20110927123859.5cd58ba8@bwidawsk.net> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Ben Widawsky , Daniel Vetter Cc: Daniel Vetter , intel-gfx List-Id: intel-gfx@lists.freedesktop.org On Tue, 27 Sep 2011 12:38:59 -0700, Ben Widawsky wrote: > If we do this we lose the possibility to kick rings, but not reset the > GPU (not that I find that terribly useful. If we do this, it does fire a > wq event, but I don't see a problem with that for this case. > > I think I would rather do this: > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index 012732b..803524e 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -1698,6 +1698,10 @@ void i915_hangcheck_elapsed(unsigned long data) > if (dev_priv->hangcheck_count++ > 1) { > DRM_ERROR("Hangcheck timer elapsed... GPU hung\n"); > > + /* Save off error state before kicking the rings and > + * possibly ruining the GPU state. > + */ > + i915_handle_error(dev, true); > if (!IS_GEN2(dev)) { > /* Is the chip hanging on a WAIT_FOR_EVENT? > * If so we can simply poke the RB_WAIT bit > @@ -1717,7 +1721,6 @@ void i915_hangcheck_elapsed(unsigned long data) > goto repeat; > } > > - i915_handle_error(dev, true); > return; > } > } else { Interesting, if we simply call i915_capture_error_state() rather than move the i195_handle_error() earlier we do in fact get the best of both worlds. However, it doesn't address Daniel's statement that kick_rings() provoked an unrecoverable hang and so we still need to disable that in order to save the error-state. The origin of ring-kicking was to try and recover from the modesetting/vsync issues, which apart from the outstanding issue in intel_crtc_disable() are behind us. (I hope ;-) We shouldn't be relying on i915_reset() and i915.reset=0 tends to be either deliberate or an act of desparation so I don't see the issue in also preventing ring-kicking with the same parameter. Is there an issue I'm overlooking? -Chris -- Chris Wilson, Intel Open Source Technology Centre