From: Matthew Auld <matthew.auld@intel.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v3 4/6] drm/i915/ttm: Break refcounting loops at device region unref time
Date: Mon, 15 Nov 2021 10:49:32 +0000
Message-ID: <8a789fe6-4278-a9a3-0201-82f80cc5a69a@intel.com>
In-Reply-To: <20211114111218.623138-5-thomas.hellstrom@linux.intel.com>
On 14/11/2021 11:12, Thomas Hellström wrote:
> There is an interesting refcounting loop:
> struct intel_memory_region has a struct ttm_resource_manager,
> ttm_resource_manager->move may hold a reference to i915_request,
> i915_request may hold a reference to intel_context,
> intel_context may hold a reference to drm_i915_gem_object,
> drm_i915_gem_object may hold a reference to intel_memory_region.
Would it help if we dropped the per-object region refcounting? IIRC that
was originally added to more cleanly appease some selftest teardown or
something.
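To make the loop concrete, here is a minimal userspace sketch of it. This is not i915 code: the struct names, the helpers, and the collapsing of request -> context -> object into a single hop are all assumptions made for illustration. It only shows that dropping the device reference leaves the region alive unless the move fence is put first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the i915 structures; not the driver's types. */
struct region;

struct object {
	int refs;
	struct region *region;	/* the object holds a region reference */
};

struct fence {
	int refs;
	struct object *obj;	/* request -> context -> object, collapsed */
};

struct region {
	int refs;
	struct fence *move;	/* the manager's move fence */
	bool *freed;		/* set when the region is released */
};

static void region_put(struct region *r)
{
	if (--r->refs == 0) {
		*r->freed = true;
		free(r);
	}
}

static void object_put(struct object *o)
{
	if (--o->refs == 0) {
		region_put(o->region);
		free(o);
	}
}

static void fence_put(struct fence *f)
{
	if (--f->refs == 0) {
		object_put(f->obj);
		free(f);
	}
}

/* Drop the region's move fence, which is what breaks the loop. */
static void region_put_move_fence(struct region *r)
{
	struct fence *f = r->move;

	r->move = NULL;
	if (f)
		fence_put(f);
}

/* Build region -> fence -> object -> region, plus one device reference. */
static struct region *build_loop(bool *freed)
{
	struct region *r = calloc(1, sizeof(*r));
	struct object *o = calloc(1, sizeof(*o));
	struct fence *f = calloc(1, sizeof(*f));

	assert(r && o && f);
	*freed = false;
	r->refs = 2;		/* device reference + the object's reference */
	r->freed = freed;
	o->refs = 1;		/* held by the fence */
	o->region = r;
	f->refs = 1;		/* held by the region's manager */
	f->obj = o;
	r->move = f;
	return r;
}

/* Returns true if dropping the device reference released the region. */
static bool teardown(bool break_loop)
{
	bool freed, released;
	struct region *r = build_loop(&freed);

	if (break_loop)
		region_put_move_fence(r);
	region_put(r);		/* the device reference */
	released = freed;

	/* Model only: undo the loop afterwards so the demo does not leak. */
	if (!released)
		region_put_move_fence(r);
	return released;
}
```

In this model, teardown(true) releases the region and teardown(false) does not, which is the behaviour the commit message describes.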
>
> Break this loop when we drop the device reference count on the
> region by putting the region move fence.
>
> Also hold dropping the device reference count until all objects of
> the region have been deleted, to avoid issues if proceeding with the
> device takedown while the region is still present.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 1 +
> drivers/gpu/drm/i915/gt/intel_region_lmem.c | 1 +
> drivers/gpu/drm/i915/intel_memory_region.c | 5 +++-
> drivers/gpu/drm/i915/intel_memory_region.h | 1 +
> drivers/gpu/drm/i915/intel_region_ttm.c | 28 +++++++++++++++++++++
> drivers/gpu/drm/i915/intel_region_ttm.h | 2 ++
> 6 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 537a81445b90..a1df49378a0f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -1044,6 +1044,7 @@ int __i915_gem_ttm_object_init(struct intel_memory_region *mem,
>
> static const struct intel_memory_region_ops ttm_system_region_ops = {
> .init_object = __i915_gem_ttm_object_init,
> + .disable = intel_region_ttm_disable,
> };
>
> struct intel_memory_region *
> diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> index aec838ecb2ef..956916fd21f8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c
> @@ -108,6 +108,7 @@ region_lmem_init(struct intel_memory_region *mem)
> static const struct intel_memory_region_ops intel_region_lmem_ops = {
> .init = region_lmem_init,
> .release = region_lmem_release,
> + .disable = intel_region_ttm_disable,
> .init_object = __i915_gem_ttm_object_init,
> };
>
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
> index e7f7e6627750..1f67d2b68c24 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/intel_memory_region.c
> @@ -233,8 +233,11 @@ void intel_memory_regions_driver_release(struct drm_i915_private *i915)
>  		struct intel_memory_region *region =
>  			fetch_and_zero(&i915->mm.regions[i]);
>
> -		if (region)
> +		if (region) {
> +			if (region->ops->disable)
> +				region->ops->disable(region);
>  			intel_memory_region_put(region);
> +		}
>  	}
>  }
>
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
> index 3feae3353d33..9bb77eacd206 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.h
> +++ b/drivers/gpu/drm/i915/intel_memory_region.h
> @@ -52,6 +52,7 @@ struct intel_memory_region_ops {
>
> int (*init)(struct intel_memory_region *mem);
> void (*release)(struct intel_memory_region *mem);
> + void (*disable)(struct intel_memory_region *mem);
>
> int (*init_object)(struct intel_memory_region *mem,
> struct drm_i915_gem_object *obj,
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
> index 2e901a27e259..4219d83a2b19 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.c
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.c
> @@ -114,6 +114,34 @@ void intel_region_ttm_fini(struct intel_memory_region *mem)
> mem->region_private = NULL;
> }
>
> +/**
> + * intel_region_ttm_disable - A TTM region disable callback helper
> + * @mem: The memory region.
> + *
> + * A helper that ensures that nothing any longer references a region at
> + * device takedown. Breaks refcounting loops and waits for objects in the
> + * region to be deleted.
> + */
> +void intel_region_ttm_disable(struct intel_memory_region *mem)
> +{
> +	struct ttm_resource_manager *man = mem->region_private;
> +
> +	/*
> +	 * Put the region's move fences. This releases requests that
> +	 * may hold on to contexts and vms that may hold on to buffer
> +	 * objects that may have a refcount on the region. :/
> +	 */
> +	if (man)
> +		ttm_resource_manager_cleanup(man);
> +
> +	/* Flush objects that may just have been freed */
> +	i915_gem_flush_free_objects(mem->i915);
> +
> +	/* Wait until the only region reference left is our own. */
> +	while (kref_read(&mem->kref) > 1)
> +		msleep(20);
If we leak an object, I guess we get an infinite loop here at driver
release?
> +}
> +
> /**
> * intel_region_ttm_resource_to_rsgt -
> * Convert an opaque TTM resource manager resource to a refcounted sg_table.
> diff --git a/drivers/gpu/drm/i915/intel_region_ttm.h b/drivers/gpu/drm/i915/intel_region_ttm.h
> index 7bbe2b46b504..197a8c179370 100644
> --- a/drivers/gpu/drm/i915/intel_region_ttm.h
> +++ b/drivers/gpu/drm/i915/intel_region_ttm.h
> @@ -22,6 +22,8 @@ int intel_region_ttm_init(struct intel_memory_region *mem);
>
> void intel_region_ttm_fini(struct intel_memory_region *mem);
>
> +void intel_region_ttm_disable(struct intel_memory_region *mem);
> +
> struct i915_refct_sgt *
> intel_region_ttm_resource_to_rsgt(struct intel_memory_region *mem,
> struct ttm_resource *res);
>
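On the msleep() loop in intel_region_ttm_disable(): one way to avoid spinning forever on a leaked object would be to bound the wait. The following is only a userspace sketch of that idea, not a proposed patch. The helper name wait_for_last_ref(), its parameters, and the use of C11 atomics and nanosleep() in place of kref_read() and msleep() are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper: poll until at most one reference is left, or give
 * up once roughly timeout_ms has been spent sleeping. Returns true if the
 * count dropped to one (or below), false on timeout.
 */
static bool wait_for_last_ref(atomic_int *refs, int poll_ms, int timeout_ms)
{
	struct timespec ts;
	int waited = 0;

	if (poll_ms <= 0)
		poll_ms = 1;	/* avoid a zero-length poll that never times out */

	ts.tv_sec = poll_ms / 1000;
	ts.tv_nsec = (poll_ms % 1000) * 1000000L;

	while (atomic_load(refs) > 1) {
		if (waited >= timeout_ms)
			return false;	/* likely a leaked reference: stop waiting */
		nanosleep(&ts, NULL);
		waited += poll_ms;
	}
	return true;
}
```

A caller could log a warning when this returns false and carry on with teardown. Whether continuing is actually safe for the driver is a separate question that this sketch does not answer.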