From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Widawsky
Subject: Re: [PATCH] drm/i915: Fail gpu reset if the forcewake fifo hasn't drained
Date: Sat, 8 Mar 2014 10:50:41 -0800
Message-ID: <20140308185041.GA16066@bwidawsk.net>
References: <1394222943-7241-1-git-send-email-daniel.vetter@ffwll.ch>
 <20140307213556.GG25837@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail.bwidawsk.net (bwidawsk.net [166.78.191.112])
 by gabe.freedesktop.org (Postfix) with ESMTP id 42F7EFA8CB
 for ; Sat, 8 Mar 2014 10:50:55 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <20140307213556.GG25837@phenom.ffwll.local>
Sender: intel-gfx-bounces@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
To: Daniel Vetter , Chris Wilson
Cc: Daniel Vetter , Intel Graphics Development , Mika Kuoppala
List-Id: intel-gfx@lists.freedesktop.org

On Fri, Mar 07, 2014 at 10:35:56PM +0100, Daniel Vetter wrote:
> On Fri, Mar 07, 2014 at 09:09:03PM +0100, Daniel Vetter wrote:
> > Since the gpu reset + full ppgtt merge we have a hard hang on snb when
> > running the gem_reset_stat tests. Recently Mika also added some more strict
> > forcewake fifo warnings for gen6/7 in
> >
> > commit 20277c60ed08ab4f7237854cc6c2046649f9200f
> > Author: Mika Kuoppala
> > Date:   Wed Mar 5 18:08:19 2014 +0200
> >
> >     drm/i915: Always set fifo count to zero in gen6_reset
> >
> > and they _do_ fire right before the final failing reset which
> > then results in the machine's ultimate demise.
> >
> > So use this indicator to fail the gpu reset with an -EIO code,
> > preventing further command submission, further hangs and so the deadly
> > final gpu reset attempt. It seems to work and my snb survives now.
> >
> > The gpu is still dead though unfortunately.
> >
> > Cc: Mika Kuoppala
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > Signed-off-by: Daniel Vetter
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index c666af8232ef..9e22b11d0b0c 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -989,9 +989,11 @@ static int gen6_do_reset(struct drm_device *dev)
> >  	if (fw_engine)
> >  		dev_priv->uncore.funcs.force_wake_get(dev_priv, fw_engine);
> >
> > -	if (IS_GEN6(dev) || IS_GEN7(dev))
> > -		WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
> > -			GT_FIFO_FREE_ENTRIES_MASK) != 0);
> > +	if (IS_GEN6(dev) || IS_GEN7(dev)) {
> > +		if (WARN_ON((__raw_i915_read32(dev_priv, GTFIFOCTL) &
> > +			     GT_FIFO_FREE_ENTRIES_MASK) != 0))
> > +			ret = -EIO;
>
> Chris pointed out that this WARN doesn't make much sense, and testing
> confirmed that this completely breaks gpu reset on my machines here.
>
> I've backed out Mika's original patch, this seems to be the wrong path.
> -Daniel
>
> > +	}
> >
> >  	dev_priv->uncore.fifo_count = 0;
> >

I've seen this too. Though I think the WARN does coincide with what the
docs state - it doesn't seem to match reality. So I totally agree this
is the right course. However, for my curiosity, Chris, can you elaborate
on why you think it doesn't make sense?

-- 
Ben Widawsky, Intel Open Source Technology Center