public inbox for intel-gfx@lists.freedesktop.org
From: Daniel Vetter <daniel@ffwll.ch>
To: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, miku@iki.fi
Subject: Re: [RFC] drm/i915: reference count batch object on requests
Date: Tue, 3 Dec 2013 18:10:05 +0100
Message-ID: <20131203171005.GN27344@phenom.ffwll.local>
In-Reply-To: <1385998313-5783-1-git-send-email-mika.kuoppala@intel.com>

On Mon, Dec 02, 2013 at 05:31:53PM +0200, Mika Kuoppala wrote:
> We used to lean on active_list to handle the references
> to batch objects. But there are useful cases when same,
> albeit simple, batch can be executing on multiple rings
> concurrently. For this case the active_list reference count
> handling is just not enough as batch could be freed by
> ring A request retirement as it is still running on ring B.
> 
> Fix this by doing proper batch_obj reference counting.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> 
> Notes:
>     This is a patch which ameliorates the
>     [PATCH] tests/gem_reset_stats: add close-pending-fork
>     
>     Chris wasn't happy about the refcounting as it might hide
>     the true problem. But I haven't been able to find the real culprit,
>     thus the RFC.

I think I understand the bug now, and your patch looks like the correct
fix. But we need to pimp the commit message.

In i915_gem_reset_ring_lists() we reset the requests and move the objects
to the inactive list. This means that if the active list holds the last
reference, the object will disappear.

Now the problem is that we do this per-ring, and not in the order in which
the objects would have been retired had the gpu not hung. E.g. if a batch
is active on both ring 1 and ring 2 but was last active on ring 1, then
we'd free it before we go ahead with cleaning up the requests for ring 2.

So reference-counting the batch_obj pointers looks like the real fix.
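
To make the ordering problem concrete, here is a minimal user-space sketch.
It is a toy model and not i915 code: every toy_* name is hypothetical, the
"active list" is reduced to a single reference, and it assumes userspace has
already closed its handle while the batch is still pending on both rings.
The function reports whether ring 2's request would still point at a freed
batch once ring 1 has been cleaned up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object: a refcount and a "freed" flag. */
struct toy_obj {
	int refcount;
	bool freed;
};

/* Hypothetical stand-in for a request that points at its batch object. */
struct toy_request {
	struct toy_obj *batch_obj;
	bool holds_ref;	/* true models a request that took its own reference */
};

static void toy_obj_get(struct toy_obj *obj)
{
	obj->refcount++;
}

static void toy_obj_put(struct toy_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->freed = true;	/* real code would release memory here */
}

static void toy_add_request(struct toy_request *rq, struct toy_obj *obj,
			    bool take_ref)
{
	rq->batch_obj = obj;
	rq->holds_ref = take_ref;
	if (take_ref)
		toy_obj_get(obj);
}

static void toy_free_request(struct toy_request *rq)
{
	if (rq->holds_ref)
		toy_obj_put(rq->batch_obj);
	rq->batch_obj = NULL;
}

/*
 * Simulate a per-ring reset with one batch pending on two rings.
 * Returns true if the batch is already freed while ring 2's request
 * still points at it.
 */
bool toy_reset_leaves_dangling_batch(bool requests_take_ref)
{
	struct toy_obj batch = { .refcount = 1, .freed = false }; /* handle ref */
	struct toy_request rq[2];
	bool dangling;

	toy_add_request(&rq[0], &batch, requests_take_ref);	/* ring 1 */
	toy_add_request(&rq[1], &batch, requests_take_ref);	/* ring 2 */

	toy_obj_get(&batch);	/* the single active-list reference */
	toy_obj_put(&batch);	/* assumption: userspace closes its handle */

	/* Reset ring 1: drop its request, then the active reference. */
	toy_free_request(&rq[0]);
	toy_obj_put(&batch);

	/* Ring 2 has not been cleaned up yet; is its batch still alive? */
	dangling = batch.freed;

	toy_free_request(&rq[1]);
	return dangling;
}
```

Calling toy_reset_leaves_dangling_batch(false) models the old behaviour,
where only the active list holds a reference, and
toy_reset_leaves_dangling_batch(true) models requests that hold their own
reference, as the patch does.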

Can you please amend your patch with this explanation and also add a
backtrace that shows the crash? You might need to gather it with netconsole.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |   12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 40d9dcf..858538f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2145,13 +2145,12 @@ int __i915_add_request(struct intel_ring_buffer *ring,
>  	request->head = request_start;
>  	request->tail = request_ring_position;
>  
> -	/* Whilst this request exists, batch_obj will be on the
> -	 * active_list, and so will hold the active reference. Only when this
> -	 * request is retired will the the batch_obj be moved onto the
> -	 * inactive_list and lose its active reference. Hence we do not need
> -	 * to explicitly hold another reference here.
> +	/* Active list has one reference but that is not enough as same
> +	 * batch_obj can be active on multiple rings
>  	 */
>  	request->batch_obj = obj;
> +	if (request->batch_obj)
> +		drm_gem_object_reference(&request->batch_obj->base);
>  
>  	/* Hold a reference to the current context so that we can inspect
>  	 * it later in case a hangcheck error event fires.
> @@ -2340,6 +2339,9 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
>  	if (request->ctx)
>  		i915_gem_context_unreference(request->ctx);
>  
> +	if (request->batch_obj)
> +		drm_gem_object_unreference(&request->batch_obj->base);
> +
>  	kfree(request);
>  }
>  
> -- 
> 1.7.9.5
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Thread overview: 12+ messages
2013-12-02 14:47 [PATCH] tests/gem_reset_stats: add close-pending-fork Mika Kuoppala
2013-12-02 15:03 ` Chris Wilson
2013-12-02 16:32   ` Mika Kuoppala
2013-12-03 17:03     ` Daniel Vetter
2013-12-04 14:39       ` Mika Kuoppala
2013-12-04 15:48         ` Daniel Vetter
2013-12-02 15:31 ` [RFC] drm/i915: reference count batch object on requests Mika Kuoppala
2013-12-03 17:10   ` Daniel Vetter [this message]
2013-12-04 11:24     ` Chris Wilson
2013-12-04 12:08       ` Daniel Vetter
2013-12-04 12:11         ` Chris Wilson
2013-12-04 13:28     ` Mika Kuoppala
