From: Daniel Vetter
Subject: Re: [RFC] drm/i915: reference count batch object on requests
Date: Tue, 3 Dec 2013 18:10:05 +0100
Message-ID: <20131203171005.GN27344@phenom.ffwll.local>
References: <1385995666-3541-1-git-send-email-mika.kuoppala@intel.com> <1385998313-5783-1-git-send-email-mika.kuoppala@intel.com>
In-Reply-To: <1385998313-5783-1-git-send-email-mika.kuoppala@intel.com>
To: Mika Kuoppala
Cc: intel-gfx@lists.freedesktop.org, miku@iki.fi

On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote:
> We used to lean on active_list to handle the references
> to batch objects. But there are useful cases when the same,
> albeit simple, batch can be executing on multiple rings
> concurrently. For this case the active_list reference count
> handling is just not enough, as the batch could be freed by
> ring A's request retirement while it is still running on ring B.
>
> Fix this by doing proper batch_obj reference counting.
>
> Signed-off-by: Mika Kuoppala
>
> Notes:
> This is a patch which ameliorates the
> [PATCH] tests/gem_reset_stats: add close-pending-fork
>
> Chris wasn't happy about the refcounting as it might hide
> the true problem. But I haven't been able to find the real culprit,
> thus the RFC.

I think I understand the bug now, and your patch looks like the correct fix.
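To make the failure mode concrete, here is a minimal userspace sketch of the situation the commit message describes. It rests on simplifying assumptions: every name in it (`toy_obj`, `toy_request`, `toy_obj_put`, `toy_reset_rings_old_scheme`) is hypothetical and is not the i915 or DRM API; userspace is assumed to have already closed its handle, so the active-list reference is the last one; and the final put sets a `dead` flag instead of freeing memory, so the stale access can be observed without crashing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of the pre-patch scheme (not the i915/DRM API).
 * The last put marks the object dead instead of calling free(), so a
 * later access through a stale pointer can be detected safely.
 */
struct toy_obj {
	int refcount;
	bool dead;
};

struct toy_request {
	struct toy_obj *batch_obj; /* borrowed: no reference of its own */
};

static void toy_obj_put(struct toy_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->dead = true; /* stands in for freeing the object */
}

/*
 * One batch has a request on ring 1 and one on ring 2 and was last
 * active on ring 1. Only the active-list reference is left (assumed).
 * Rings are cleaned up in the order 1, then 2.
 * Returns true if ring 2's request is found pointing at a dead object.
 */
static bool toy_reset_rings_old_scheme(void)
{
	struct toy_obj batch = { .refcount = 1, .dead = false }; /* active ref */
	struct toy_request ring1_rq = { .batch_obj = &batch };
	struct toy_request ring2_rq = { .batch_obj = &batch };

	/* Ring 1: its request still sees a live batch, then the batch moves
	 * to the inactive list and loses the only remaining reference. */
	assert(!ring1_rq.batch_obj->dead);
	toy_obj_put(&batch);

	/* Ring 2: its request still points at the batch. */
	return ring2_rq.batch_obj->dead;
}
```

Under these assumptions `toy_reset_rings_old_scheme()` returns true: ring 2's request is left pointing at an object whose last reference went away while ring 1 was being cleaned up. In the kernel that would be a use-after-free rather than a flag.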
But we need to pimp the commit message. In i915_gem_reset_ring_lists we
reset requests and move objects to the inactive list. This means that if
the active list holds the last reference, the object will disappear. Now
the problem is that we do this per ring, and not in the order in which
the objects would have been retired had the GPU not hung. E.g. if a batch
is active on both rings 1 and 2 but was last active on ring 1, then we'd
free it before we go ahead with cleaning up the requests for ring 2, and
those requests still point at the freed batch. So reference-counting the
batch_obj pointers looks like the real fix.

Can you please amend your patch with this explanation and also add a
backtrace that shows the crash? You might need to gather it with
netconsole.

Thanks, Daniel

> ---
> drivers/gpu/drm/i915/i915_gem.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 40d9dcf..858538f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2145,13 +2145,12 @@ int __i915_add_request(struct intel_ring_buffer *ring,
>  	request->head = request_start;
>  	request->tail = request_ring_position;
>  
> -	/* Whilst this request exists, batch_obj will be on the
> -	 * active_list, and so will hold the active reference. Only when this
> -	 * request is retired will the the batch_obj be moved onto the
> -	 * inactive_list and lose its active reference. Hence we do not need
> -	 * to explicitly hold another reference here.
> +	/* Active list has one reference but that is not enough as same
> +	 * batch_obj can be active on multiple rings
>  	 */
>  	request->batch_obj = obj;
> +	if (request->batch_obj)
> +		drm_gem_object_reference(&request->batch_obj->base);
>  
>  	/* Hold a reference to the current context so that we can inspect
>  	 * it later in case a hangcheck error event fires.
> @@ -2340,6 +2339,9 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
>  	if (request->ctx)
>  		i915_gem_context_unreference(request->ctx);
>  
> +	if (request->batch_obj)
> +		drm_gem_object_unreference(&request->batch_obj->base);
> +
>  	kfree(request);
>  }
>  
> -- 
> 1.7.9.5

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
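A toy model of the patched scheme, again only a sketch under stated assumptions: each request takes its own reference when it is created (the counterpart of the drm_gem_object_reference() call added to __i915_add_request()) and drops it when it is freed (the counterpart of the drm_gem_object_unreference() call added to i915_gem_free_request()). All `toy_*` names are hypothetical. The per-ring cleanup is reduced to three steps (look at the request's batch, free the request, drop the active reference on the ring where the batch was last active); that sequencing is an assumption for illustration, not a description of the real i915_gem_reset_ring_lists().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the patched scheme (not the i915/DRM API). */
struct toy_obj {
	int refcount;
	bool dead; /* set by the last put instead of calling free() */
};

struct toy_request {
	struct toy_obj *batch_obj; /* holds its own reference */
};

static void toy_obj_get(struct toy_obj *obj)
{
	assert(!obj->dead);
	obj->refcount++;
}

static void toy_obj_put(struct toy_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->dead = true;
}

/* Counterpart of the reference taken in __i915_add_request(). */
static void toy_add_request(struct toy_request *rq, struct toy_obj *obj)
{
	rq->batch_obj = obj;
	if (rq->batch_obj)
		toy_obj_get(rq->batch_obj);
}

/* Counterpart of the unreference added to i915_gem_free_request(). */
static void toy_free_request(struct toy_request *rq)
{
	if (rq->batch_obj)
		toy_obj_put(rq->batch_obj);
	rq->batch_obj = NULL;
}

/*
 * One batch with a request on each of two rings (indices 0 and 1), only
 * the active-list reference left besides the requests' own, and the
 * batch last active on ring 0. Rings are cleaned up in the given order.
 * Returns true if every request saw a live batch and the batch was
 * released once the last reference went away.
 */
static bool toy_reset_rings(int first, int second)
{
	struct toy_obj batch = { .refcount = 1, .dead = false }; /* active ref */
	struct toy_request rq[2];
	int order[2] = { first, second };
	bool ok = true;

	assert((first == 0 && second == 1) || (first == 1 && second == 0));

	toy_add_request(&rq[0], &batch);
	toy_add_request(&rq[1], &batch);

	for (int i = 0; i < 2; i++) {
		int ring = order[i];

		ok = ok && !rq[ring].batch_obj->dead; /* e.g. a reset-status check */
		toy_free_request(&rq[ring]);
		if (ring == 0) /* last active here: object goes inactive */
			toy_obj_put(&batch);
	}
	return ok && batch.dead && batch.refcount == 0;
}
```

Both `toy_reset_rings(0, 1)` and `toy_reset_rings(1, 0)` return true: the batch stays alive until the last request referencing it is freed, whichever ring is cleaned up first. That order independence is why per-request references fix the reset path: it no longer has to match the order in which the objects would have been retired.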