public inbox for intel-gfx@lists.freedesktop.org
From: Matthew Auld <matthew.auld@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v3 4/6] drm/i915/ttm: Break refcounting loops at device region unref time
Date: Mon, 15 Nov 2021 10:49:32 +0000
Message-ID: <8a789fe6-4278-a9a3-0201-82f80cc5a69a@intel.com>
In-Reply-To: <20211114111218.623138-5-thomas.hellstrom@linux.intel.com>

On 14/11/2021 11:12, Thomas Hellström wrote:
> There is an interesting refcounting loop:
> struct intel_memory_region has a struct ttm_resource_manager,
> ttm_resource_manager->move may hold a reference to i915_request,
> i915_request may hold a reference to intel_context,
> intel_context may hold a reference to drm_i915_gem_object,
> drm_i915_gem_object may hold a reference to intel_memory_region.

Would it help if we dropped the per-object region refcounting? IIRC that
was originally added to make some selftest teardown cleaner, or
something along those lines.
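
To make the loop described above concrete, here is a minimal userspace
sketch. It is only a toy model: struct node, node_put(), build_loop() and
break_edge() are hypothetical names, not i915 or TTM code. Four nodes stand
in for region -> request -> context -> object -> region, and the region
also carries one extra reference for the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: each holds at most one counted reference to the next. */
struct node {
	int refs;          /* number of references pointing at this node */
	int alive;         /* cleared once refs drops to zero */
	struct node *next; /* counted reference held by this node, or NULL */
};

/* Drop one reference; on the last put, release what the node holds. */
static void node_put(struct node *n)
{
	struct node *next;

	if (!n)
		return;
	assert(n->refs > 0);
	if (--n->refs > 0)
		return;

	n->alive = 0;
	next = n->next;
	n->next = NULL;
	node_put(next);
}

/*
 * Build the ring chain[0] -> chain[1] -> chain[2] -> chain[3] -> chain[0]
 * (region, request, context, object) and add the device's own reference
 * on the region, chain[0].
 */
static void build_loop(struct node chain[4])
{
	int i;

	for (i = 0; i < 4; i++) {
		chain[i].refs = 1; /* held by the previous node in the ring */
		chain[i].alive = 1;
		chain[i].next = &chain[(i + 1) % 4];
	}
	chain[0].refs++; /* device reference */
}

/* Break one edge of the ring by dropping the reference that holder owns. */
static void break_edge(struct node *holder)
{
	struct node *next = holder->next;

	holder->next = NULL;
	node_put(next);
}
```

In this model, dropping only the device reference leaves every node alive.
Calling break_edge() on chain[0] first (the analogue of putting the move
fence) or on chain[3] (the analogue of not taking a per-object region
reference) lets the final put tear the whole chain down.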

> 
> Break this loop when we drop the device reference count on the
> region by putting the region move fence.
> 
> Also hold dropping the device reference count until all objects of
> the region have been deleted, to avoid issues if proceeding with the
> device takedown while the region is still present.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c     |  1 +
>   drivers/gpu/drm/i915/gt/intel_region_lmem.c |  1 +
>   drivers/gpu/drm/i915/intel_memory_region.c  |  5 +++-
>   drivers/gpu/drm/i915/intel_memory_region.h  |  1 +
>   drivers/gpu/drm/i915/intel_region_ttm.c     | 28 +++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_region_ttm.h     |  2 ++
>   6 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 537a81445b90..a1df49378a0f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1044,6 +1044,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>   
>   static const struct intel_memory_region_ops ttm_system_region_ops = {
>   	.init_object = __i915_gem_ttm_object_init,
> +	.disable = intel_region_ttm_disable,
>   };
>   
>   struct intel_memory_region *
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index aec838ecb2ef..956916fd21f8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -108,6 +108,7 @@ region_lmem_init(struct intel_memory_region *mem)
>   static const struct intel_memory_region_ops intel_region_lmem_ops = {
>   	.init = region_lmem_init,
>   	.release = region_lmem_release,
> +	.disable = intel_region_ttm_disable,
>   	.init_object = __i915_gem_ttm_object_init,
>   };
>   
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
> index e7f7e6627750..1f67d2b68c24 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/intel_memory_region.c
> @@ -233,8 +233,11 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
>   		struct intel_memory_region *region =
>   			fetch_and_zero(&i915->mm.regions[i]);
>   
> -		if (region)
> +		if (region) {
> +			if (region->ops->disable)
> +				region->ops->disable(region);
>   			intel_memory_region_put(region);
> +		}
>   	}
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
> index 3feae3353d33..9bb77eacd206 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.h
> +++ b/drivers/gpu/drm/i915/intel_memory_region.h
> @@ -52,6 +52,7 @@ struct intel_memory_region_ops {
>   
>   	int (*init)(struct intel_memory_region *mem);
>   	void (*release)(struct intel_memory_region *mem);
> +	void (*disable)(struct intel_memory_region *mem);
>   
>   	int (*init_object)(struct intel_memory_region *mem,
>   			   struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
> index 2e901a27e259..4219d83a2b19 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.c
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.c
> @@ -114,6 +114,34 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
>   	mem->region_private = NULL;
>   }
>   
> +/**
> + * intel_region_ttm_disable - A TTM region disable callback helper
> + * @mem: The memory region.
> + *
> + * A helper that ensures that nothing any longer references a region at
> + * device takedown. Breaks refcounting loops and waits for objects in the
> + * region to be deleted.
> + */
> +void intel_region_ttm_disable(struct intel_memory_region *mem)
> +{
> +	struct ttm_resource_manager *man = mem->region_private;
> +
> +	/*
> +	 * Put the region's move fences. This releases requests that
> +	 * may hold on to contexts and vms that may hold on to buffer
> +	 * objects that may have a refcount on the region. :/
> +	 */
> +	if (man)
> +		ttm_resource_manager_cleanup(man);
> +
> +	/* Flush objects that may just have been freed */
> +	i915_gem_flush_free_objects(mem->i915);
> +
> +	/* Wait until the only region reference left is our own. */
> +	while (kref_read(&mem->kref) > 1)
> +		msleep(20);

If we leak an object, I guess we get an infinite loop here at driver 
release?
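
One possible way to avoid spinning forever would be to bound the wait. The
following is a userspace sketch of that idea only, not a proposed patch:
struct region_sketch, region_wait_idle() and the sleep callbacks are
hypothetical stand-ins for the region kref and msleep(), and the timeout
and poll values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the region; only the refcount matters here. */
struct region_sketch {
	unsigned int refcount;
};

/* Hypothetical sleep hook: models time passing while others drop refs. */
typedef void (*sleep_fn)(struct region_sketch *r, unsigned int ms);

/*
 * Wait until only our own reference is left, but give up once timeout_ms
 * has elapsed. Returns true if the region went idle, false on timeout
 * (which would suggest a leaked object).
 */
static bool region_wait_idle(struct region_sketch *r, unsigned int timeout_ms,
			     unsigned int poll_ms, sleep_fn do_sleep)
{
	unsigned int waited = 0;

	assert(poll_ms > 0);

	while (r->refcount > 1) {
		if (waited >= timeout_ms)
			return false;
		do_sleep(r, poll_ms);
		waited += poll_ms;
	}

	return true;
}

/* Models the normal case: one reference goes away per sleep. */
static void sleep_and_release(struct region_sketch *r, unsigned int ms)
{
	(void)ms;
	if (r->refcount > 1)
		r->refcount--;
}

/* Models a leak: nothing ever drops its reference. */
static void sleep_leaked(struct region_sketch *r, unsigned int ms)
{
	(void)r;
	(void)ms;
}
```

On timeout the caller could complain loudly and carry on with the
takedown instead of hanging; whether that is actually safer than waiting
is a separate question.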

> +}
> +
>   /**
>    * intel_region_ttm_resource_to_rsgt -
>    * Convert an opaque TTM resource manager resource to a refcounted sg_table.
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
> index 7bbe2b46b504..197a8c179370 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.h
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.h
> @@ -22,6 +22,8 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
>   
>   void intel_region_ttm_fini(struct intel_memory_region *mem);
>   
> +void intel_region_ttm_disable(struct intel_memory_region *mem);
> +
>   struct i915_refct_sgt *
>   intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
>   				  struct ttm_resource *res);
> 
