From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Subject: Re: [RFC] [PATCH 7/7] drm/i915: find guilty batch buffer on ring resets Date: Fri, 15 Feb 2013 16:55:25 +0200 Message-ID: <20130215145525.GD9135@intel.com> References: <1359986683-29788-1-git-send-email-mika.kuoppala@intel.com> <1359986683-29788-8-git-send-email-mika.kuoppala@intel.com> <20130207141136.GD9135@intel.com> <87wqu9fyyn.fsf@gaia.fi.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by gabe.freedesktop.org (Postfix) with ESMTP id D34D4E5F89 for ; Fri, 15 Feb 2013 06:55:28 -0800 (PST) Content-Disposition: inline In-Reply-To: <87wqu9fyyn.fsf@gaia.fi.intel.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+gcfxdi-intel-gfx=m.gmane.org@lists.freedesktop.org To: Mika Kuoppala Cc: intel-gfx@lists.freedesktop.org List-Id: intel-gfx@lists.freedesktop.org On Fri, Feb 15, 2013 at 04:12:16PM +0200, Mika Kuoppala wrote: > Ville Syrj=E4l=E4 writes: > = > > On Mon, Feb 04, 2013 at 04:04:43PM +0200, Mika Kuoppala wrote: > >> After hang check timer has declared gpu to be hang, > >> rings are reset. In ring reset, when clearing > >> request list, do post mortem analysis to find out > >> the guilty batch buffer. > >> = > >> Select requests for further analysis by inspecting > >> the completed sequence number which has been updated > >> into the HWS page. If request was completed, it can't > >> be related to the hang. > >> = > >> For completed requests mark the batch as guilty > > ^^^^^^^^^ > > > > That's a typo, right? > = > It sure is. Will fix. > = > >> if the ring was not waiting and the ring head was > >> stuck inside the buffer object or in the flush region > >> right after the batch. For everything else, mark > >> them as innocents. > >> = > >> Signed-off-by: Mika Kuoppala > >> --- > >> drivers/gpu/drm/i915/i915_gem.c | 91 ++++++++++++++++++++++++++++++= +++++++++ > >> 1 file changed, 91 insertions(+) > >> = > >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i9= 15_gem.c > >> index b304b06..db0f3e3 100644 > >> --- a/drivers/gpu/drm/i915/i915_gem.c > >> +++ b/drivers/gpu/drm/i915/i915_gem.c > >> @@ -2092,9 +2092,97 @@ i915_gem_request_remove_from_client(struct drm_= i915_gem_request *request) > >> spin_unlock(&file_priv->mm.lock); > >> } > >> = > >> +static bool i915_head_inside_object(u32 acthd, struct drm_i915_gem_ob= ject *obj) > >> +{ > >> + if (acthd >=3D obj->gtt_offset && > >> + acthd < obj->gtt_offset + obj->base.size) > >> + return true; > >> + > >> + return false; > >> +} > >> + > >> +static bool i915_head_inside_request(u32 acthd, u32 rs, u32 re) > >> +{ > >> + if (rs < re) { > >> + if (acthd >=3D rs && acthd < re) > >> + return true; > >> + } else if (rs > re) { > >> + if (acthd >=3D rs || acthd < re) > >> + return true; > >> + } > >> + > >> + return false; > >> +} > >> + > >> +static bool i915_request_guilty(struct drm_i915_gem_request *request, > >> + const u32 acthd, bool *inside) > >> +{ > >> + if (request->batch_obj) { > >> + if (i915_head_inside_object(acthd, request->batch_obj)) { > >> + *inside =3D true; > >> + return true; > >> + } > >> + } > >> + > >> + if (i915_head_inside_request(acthd, request->head, request->tail)) { > >> + *inside =3D false; > >> + return true; > >> + } > >> + > >> + return false; > >> +} > >> + > >> +static void i915_set_reset_status(struct intel_ring_buffer *ring, > >> + struct drm_i915_gem_request *request, > >> + u32 acthd) > >> +{ > >> + bool inside; > >> + struct i915_reset_stats *rs =3D NULL; > >> + bool guilty; > >> + > >> + /* Innocent until proven guilty */ > >> + guilty =3D false; > >> + > >> + if (!ring->hangcheck_waiting && > >> + i915_request_guilty(request, acthd, &inside)) { > >> + DRM_ERROR("%s hung %s bo (0x%x ctx %d) at 0x%x\n", > >> + ring->name, > >> + inside ? "inside" : "flushing", > >> + request->batch_obj ? > >> + request->batch_obj->gtt_offset : 0, > >> + request->ctx ? request->ctx->id : 0, > >> + acthd); > >> + > >> + guilty =3D true; > >> + } > >> + > >> + /* If contexts are disabled or this is the default context, use > >> + * file_priv->reset_stats > >> + */ > >> + if (request->ctx && request->ctx->id !=3D DEFAULT_CONTEXT_ID) > >> + rs =3D &request->ctx->reset_stats; > >> + else if (request->file_priv) > >> + rs =3D &request->file_priv->reset_stats; > >> + > >> + if (rs) { > >> + rs->total++; > >> + > >> + if (guilty) > >> + rs->guilty++; > >> + else > >> + rs->innocent++; > >> + } > >> +} > >> + > >> static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_pr= iv, > >> struct intel_ring_buffer *ring) > >> { > >> + u32 completed_seqno; > >> + u32 acthd; > >> + > >> + acthd =3D intel_ring_get_active_head(ring); > >> + completed_seqno =3D ring->get_seqno(ring, false); > >> + > >> while (!list_empty(&ring->request_list)) { > >> struct drm_i915_gem_request *request; > >> = > >> @@ -2102,6 +2190,9 @@ static void i915_gem_reset_ring_lists(struct drm= _i915_private *dev_priv, > >> struct drm_i915_gem_request, > >> list); > >> = > >> + if (request->seqno > completed_seqno) > > > > i915_seqno_passed()? > = > For readability or for correctness? > = > When seqno wraps, the request queue will be cleaned up so > we can't have cross wrap boundary stuff in here. > = > Or did you have something else in mind that i have missed. Nah. It just seems suspicious to have a direct comparison with any comment why i915_seqno_passed() isn't used. -- = Ville Syrj=E4l=E4 Intel OTC