From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Wilson
Subject: Re: [PATCH] drm/i915: add interface to simulate gpu hangs
Date: Sat, 03 Dec 2011 01:33:59 +0000
References: <1320942887-6919-1-git-send-email-daniel.vetter@ffwll.ch>
 <1322864509-4130-1-git-send-email-daniel.vetter@ffwll.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by gabe.freedesktop.org (Postfix) with ESMTP id 38A519E746
 for ; Fri, 2 Dec 2011 17:34:20 -0800 (PST)
In-Reply-To: <1322864509-4130-1-git-send-email-daniel.vetter@ffwll.ch>
Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org
To: intel-gfx@lists.freedesktop.org
Cc: Daniel Vetter
List-Id: intel-gfx@lists.freedesktop.org

On Fri, 2 Dec 2011 23:21:49 +0100, Daniel Vetter wrote:
> gpu reset is a very important piece of our infrastructure.
> Unfortunately we only really test it by actually hanging the gpu,
> which often has bad side-effects for the entire system. And the gpu
> hang handling code is one of the rather complicated pieces of code we
> have, consisting of
> - hang detection
> - error capture
> - actual gpu reset
> - reset of all the gem bookkeeping
> - reinitialization of the entire gpu
>
> This patch adds a debugfs interface to selectively stop rings by
> ceasing to update the hw tail pointer, which will result in the gpu
> no longer updating its head pointer and eventually in the hangcheck
> firing. This way we can exercise the gpu hang code under controlled
> conditions without a dying gpu taking down the entire system.
>
> Patch motivated by me forgetting to properly reinitialize ppgtt after
> a gpu reset.
>
> Usage:
>
> echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
>
> echo 0xffffffff > i915_ring_stop # stops all, future-proof version
>
> then run whatever testload is desired. i915_ring_stop automatically
> resets after a gpu hang is detected to avoid hanging the gpu too fast
> and declaring it wedged.
>
> v2: Incorporate feedback from Chris Wilson.
>
> v3: Add the missing cleanup.

I think I've made my peace with this patch. I'm still not completely
sold on its value, but if Daniel found it useful then it has merit.

>
> Signed-Off-by: Daniel Vetter
Reviewed-by: Chris Wilson
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     |   65 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drv.c         |    2 +
>  drivers/gpu/drm/i915/i915_drv.h         |    2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    4 ++
>  4 files changed, 73 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index db83552..85328f7 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1397,6 +1397,64 @@ static const struct file_operations i915_wedged_fops = {
>  };
>
>  static ssize_t
> +i915_ring_stop_read(struct file *filp,
> +		    char __user *ubuf,
> +		    size_t max,
> +		    loff_t *ppos)
> +{
> +	struct drm_device *dev = filp->private_data;
> +	drm_i915_private_t *dev_priv = dev->dev_private;
> +	char buf[80];
> +	int len;
> +
> +	len = snprintf(buf, sizeof(buf),
> +		       "%d\n", dev_priv->stop_rings);

%08x since it is a flags value, though 8 may be overkill!
> +
> +	if (len > sizeof(buf))
> +		len = sizeof(buf);
> +
> +	return simple_read_from_buffer(ubuf, max, ppos, buf, len);
> +}
> +
> +static ssize_t
> +i915_ring_stop_write(struct file *filp,
> +		     const char __user *ubuf,
> +		     size_t cnt,
> +		     loff_t *ppos)
> +{
> +	struct drm_device *dev = filp->private_data;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	char buf[20];
> +	int val = 0;
> +
> +	if (cnt > 0) {
> +		if (cnt > sizeof(buf) - 1)
> +			return -EINVAL;
> +
> +		if (copy_from_user(buf, ubuf, cnt))
> +			return -EFAULT;
> +		buf[cnt] = 0;
> +
> +		val = simple_strtoul(buf, NULL, 0);
> +	}
> +
> +	DRM_DEBUG_DRIVER("Stopping rings %u\n", val);

%x here as well

-- 
Chris Wilson, Intel Open Source Technology Centre
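
For illustration only, here is a minimal userspace sketch of the two
formatting remarks above. It is not part of the patch: format_mask() and
parse_mask() are hypothetical names, strtoul() stands in for the kernel's
simple_strtoul(), and the "0x" prefix is an assumption of this sketch. The
prefix is there because a bare "%08x" string such as 0000000a, if written
back, would be treated as octal by base-0 parsing and stop at the 'a'.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format a ring-stop bitmask as hex, as suggested in
 * the review. The "0x" prefix is an assumption of this sketch so that the
 * string can be fed back through base-0 parsing unchanged.
 * Returns the number of characters stored in buf, excluding the NUL.
 */
static int format_mask(char *buf, size_t size, unsigned int mask)
{
	int len;

	if (size == 0)
		return 0;

	len = snprintf(buf, size, "0x%08x\n", mask);
	if (len < 0)
		return 0;

	/* snprintf reports the untruncated length; clamp to what fits. */
	if ((size_t)len >= size)
		len = (int)size - 1;

	return len;
}

/*
 * Hypothetical helper: parse a mask with base 0, so decimal input and
 * 0x-prefixed hex input are both accepted, as in the quoted usage text.
 */
static unsigned int parse_mask(const char *s)
{
	return (unsigned int)strtoul(s, NULL, 0);
}
```

With these helpers, parsing "0xffffffff" and formatting the result gives
the string 0xffffffff, and parsing "5" (rings 0 and 2) gives 0x00000005,
which is easier to read as a set of flags than the decimal form.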