From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ville Syrjälä
Subject: Re: [PATCH] drm/i915: Decouple GPU error reporting from ring initialisation
Date: Mon, 27 Jan 2014 16:05:24 +0200
Message-ID: <20140127140524.GT9454@intel.com>
References: <20140124120612.GC9454@intel.com> <1390830754-952-1-git-send-email-chris@chris-wilson.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTP id 25868FADB6 for ; Mon, 27 Jan 2014 06:05:28 -0800 (PST)
Content-Disposition: inline
In-Reply-To: <1390830754-952-1-git-send-email-chris@chris-wilson.co.uk>
Sender: intel-gfx-bounces@lists.freedesktop.org
Errors-To: intel-gfx-bounces@lists.freedesktop.org
To: Chris Wilson
Cc: intel-gfx@lists.freedesktop.org, Ben Widawsky, stable@vger.kernel.org
List-Id: intel-gfx@lists.freedesktop.org

On Mon, Jan 27, 2014 at 01:52:34PM +0000, Chris Wilson wrote:
> Currently we report through our error state only the rings that have
> been initialised (as detected by ring->obj). This check is done after
> the GPU reset and ring re-initialisation, which means that the software
> state may not be the same as when we captured the hardware error and we
> may not print out any of the vital information for debugging the hang.
>
> This (and the implied object leak) is a regression from
>
> commit 3d57e5bd1284f44e325f3a52d966259ed42f9e05
> Author: Ben Widawsky
> Date: Mon Oct 14 10:01:36 2013 -0700
>
>     drm/i915: Do a fuller init after reset
>
> Note that we are already starting to get bug reports with incomplete
> error states from 3.13.
>
> v2: Prevent a NULL dereference on 830gm/845g after a GPU reset where
> the scratch obj may be NULL.
>
> Signed-off-by: Chris Wilson
> Cc: Ben Widawsky
> Cc: Ville Syrjälä
> References: https://bugs.freedesktop.org/show_bug.cgi?id=74094
> Cc: stable@vger.kernel.org

Looks OK to me.

Reviewed-by: Ville Syrjälä

> ---
>  drivers/gpu/drm/i915/i915_drv.h       |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c | 22 +++++++++++++++-------
>  2 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2e6c67d944eb..0249c9aa345a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -335,6 +335,7 @@ struct drm_i915_error_state {
>  	struct timeval time;
>
>  	struct drm_i915_error_ring {
> +		bool valid;
>  		struct drm_i915_error_object {
>  			int page_count;
>  			u32 gtt_offset;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 6832473bc386..96e945c3d44f 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -240,6 +240,9 @@ static void i915_ring_error_state(struct drm_i915_error_state_buf *m,
>  				  unsigned ring)
>  {
>  	BUG_ON(ring >= I915_NUM_RINGS); /* shut up confused gcc */
> +	if (!error->ring[ring].valid)
> +		return;
> +
>  	err_printf(m, "%s command stream:\n", ring_str(ring));
>  	err_printf(m, " HEAD: 0x%08x\n", error->head[ring]);
>  	err_printf(m, " TAIL: 0x%08x\n", error->tail[ring]);
> @@ -295,7 +298,6 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  	struct drm_device *dev = error_priv->dev;
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	struct drm_i915_error_state *error = error_priv->error;
> -	struct intel_ring_buffer *ring;
>  	int i, j, page, offset, elt;
>
>  	if (!error) {
> @@ -330,7 +332,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  	if (INTEL_INFO(dev)->gen == 7)
>  		err_printf(m, "ERR_INT: 0x%08x\n", error->err_int);
>
> -	for_each_ring(ring, dev_priv, i)
> +	for (i = 0; i < ARRAY_SIZE(error->ring); i++)
>  		i915_ring_error_state(m, dev, error, i);
>
>  	for (i = 0; i < error->vm_count; i++) {
> @@ -405,8 +407,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  			}
>  		}
>
> -		obj = error->ring[i].ctx;
> -		if (obj) {
> +		if ((obj = error->ring[i].ctx)) {
>  			err_printf(m, "%s --- HW Context = 0x%08x\n",
>  				   dev_priv->ring[i].name,
>  				   obj->gtt_offset);
> @@ -730,7 +731,8 @@ i915_error_first_batchbuffer(struct drm_i915_private *dev_priv,
>  			return NULL;
>
>  		obj = ring->scratch.obj;
> -		if (acthd >= i915_gem_obj_ggtt_offset(obj) &&
> +		if (obj != NULL &&
> +		    acthd >= i915_gem_obj_ggtt_offset(obj) &&
>  		    acthd < i915_gem_obj_ggtt_offset(obj) + obj->base.size)
>  			return i915_error_ggtt_object_create(dev_priv, obj);
>  	}
> @@ -875,11 +877,17 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  				  struct drm_i915_error_state *error)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	struct intel_ring_buffer *ring;
>  	struct drm_i915_gem_request *request;
>  	int i, count;
>
> -	for_each_ring(ring, dev_priv, i) {
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		struct intel_ring_buffer *ring = &dev_priv->ring[i];
> +
> +		if (ring->dev == NULL)
> +			continue;
> +
> +		error->ring[i].valid = true;
> +
>  		i915_record_ring_state(dev, error, ring);
>
>  		error->ring[i].batchbuffer =
> --
> 1.8.5.3

-- 
Ville Syrjälä
Intel OTC
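
For readers following the thread, the idea in the patch can be sketched outside the kernel: record a per-ring "valid" flag at capture time, and have the reporting side consult only that snapshot, never the live ring state that a reset may have changed. What follows is a minimal userspace model under stated assumptions, not the driver code. The names struct ring, struct error_state, capture_rings() and report_rings() are hypothetical stand-ins for illustration; only the valid flag and the ring->dev == NULL check are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_RINGS 3 /* hypothetical; stands in for I915_NUM_RINGS */

/* Hypothetical model of a live ring. dev == NULL means "never set up". */
struct ring {
	const void *dev;
	void *obj;          /* software state that a reset may tear down */
	unsigned int head;
	unsigned int tail;
};

/* Snapshot of one ring, taken when the error is captured. */
struct error_ring {
	bool valid;
	unsigned int head;
	unsigned int tail;
};

struct error_state {
	struct error_ring ring[NUM_RINGS];
};

/* Capture: decide validity now, while the live state still matches the hang. */
static void capture_rings(struct error_state *error, const struct ring *rings)
{
	assert(error != NULL && rings != NULL);

	for (size_t i = 0; i < NUM_RINGS; i++) {
		if (rings[i].dev == NULL)
			continue;

		error->ring[i].valid = true;
		error->ring[i].head = rings[i].head;
		error->ring[i].tail = rings[i].tail;
	}
}

/* Report: look only at the snapshot. Returns the number of rings printed. */
static int report_rings(const struct error_state *error)
{
	int printed = 0;

	assert(error != NULL);

	for (size_t i = 0; i < NUM_RINGS; i++) {
		if (!error->ring[i].valid)
			continue;

		printf("ring %zu: HEAD 0x%08x TAIL 0x%08x\n",
		       i, error->ring[i].head, error->ring[i].tail);
		printed++;
	}

	return printed;
}
```

To exercise it, fill in two of the three rings, call capture_rings(), then zero the live head, tail and obj fields to imitate a reset before calling report_rings(). The report still covers both captured rings with their pre-reset values, because nothing on the reporting path reads the live array.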