Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	matthew.brost@intel.com, dri-devel@lists.freedesktop.org,
	intel-xe@lists.freedesktop.org
Subject: Re: Switching over to GEM refcounts and a bunch of cleanups
Date: Fri, 22 Aug 2025 10:51:13 +0200	[thread overview]
Message-ID: <c14a919a352742d6344f34455de6aa1e535ecc70.camel@linux.intel.com> (raw)
In-Reply-To: <d3f60146-39d3-458f-8271-cfcec1c92254@amd.com>

Hi

On Thu, 2025-08-21 at 16:59 +0200, Christian König wrote:
> On 21.08.25 16:06, Thomas Hellström wrote:
> > > What are you referring to?
> > 
> > https://lore.kernel.org/intel-xe/a004736315d77837172418eb196d5b5f80b74e6c.camel@linux.intel.com/
> 
> Thanks, that one never made it into my inbox as far as I can see.
> 
> > A couple of questions on the design direction here:
> > 
> > IIRC both xe and i915 has checks to consider objects with a 0 gem
> > refcount as zombies requiring special treatment or skipping, when
> > encountered in TTM callbacks. We need to double-check that.
> 
> I think I've found all of those. The one in i915 were actually not
> TTM specific but try to catch the same problem on the GEM refcount.
> 
> > But I wonder, 
> > first this practice of resurrecting refcounts seem a bit unusual, I
> > wonder if we can get rid of that somehow?
> 
> I was also going back on forth if that is a good idea or not as well.
> 
> The usual solution to such kinds of issues is to use two reference
> counts, so that you got a multi stage cleanup approach. E.g. backing
> store and object, like what mm_struct is using as well.
> 
> The problem was simply that TTM/GEM ended up having *four* reference
> counts for the same object, each was doing something different and
> they didn't worked well together at all.
> 
> > Furthermore, it seems the problem with drm_exec is related only to
> > the
> > LRU walk. What about adding a struct completion to the object, that
> > is
> > signaled when the object has freed its final backing-store. The LRU
> > walk would then check if the object is a zombie, and if so just
> > wait on
> > the struct completion. (Need of course to carefully set up locking
> > orders). Then we wouldn't need to resurrect the gem refcount, nor
> > use
> > drm_exec locking for zombies.
> 
> I had a similar idea, waiting is already possible by waiting for the
> BOs work item.
> 
> But I abandoned that idea because I couldn't see how we could solve
> the locking.
> 
> > We would still need some form of refcounting while waiting on the
> > struct completion, but if we restricted the TTM refcount to *only*
> > be
> > used internally for that sole purpose, and also replaced the final
> > ttm_bo_put() with the ttm_bo_finalize() that you suggest we
> > wouldn't
> > need to resurrect that refcount since it wouldn't drop to zero
> > until
> > the object is ready for final free.
> > 
> > Ideas, comments?
> 
> Ideally I think we would use the handle_count as backing store the
> drm_gem_object->refcount as structure reference.
> 
> But that means a massive rework of the GEM handling/drivers/TTM.
> 
> Alternative we could just grab a reference to a unsignaled fence when
> we encounter a dead BO on the LRU.
> 
> What do you think of that idea?

I think to be able to *guarantee* exhaustive eviction, we need
1) all unfreed resources to sit on an LRU, and
2) everything on the LRU needs to be able to have something to wait
for.

A fence can't really guarantee 2), but it's close. There is a time-
interval in betwen where the last fence signals and we take the
resource from the LRU and free it.

A struct completion can be made to signal when the resource is freed.
I think the locking restriction in the struct completion case (the
struct completion is likely waited for under a dma-resv), is that
nothing except the object destructor may take an individualized resv of
a zombie gem object whose refcount has gone to zero. The destructor
should use an asserted trylock only to make lockdep happy. The struct
completion also needs a refcount to avoid destroying it while there are
waiters.

So what do you think about starting out with a fence, and if / when
that appears not to be sufficient, we have a backup plan to move to a
struct completion?

Thomas


> 
> Regards,
> Christian.


  reply	other threads:[~2025-08-22  8:51 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-16 16:04 Switching over to GEM refcounts and a bunch of cleanups Christian König
2025-07-16 16:04 ` [PATCH 1/7] drm/ttm: replace TTMs refcount with the DRM refcount v3 Christian König
2025-07-16 16:04 ` [PATCH 2/7] drm/ttm: remove ttm_lru_walk_ops Christian König
2025-07-16 16:04 ` [PATCH 3/7] drm/ttm: grab BO reference before locking it Christian König
2025-07-16 16:04 ` [PATCH 4/7] drm/ttm: switch to ttm_bo_lru_for_each_reserved_guarded for swapout Christian König
2025-07-16 16:04 ` [PATCH 5/7] drm/ttm: move zombie handling into ttm_bo_evict Christian König
2025-07-16 16:04 ` [PATCH 6/7] drm/ttm: use ttm_bo_lru_for_each_reserved_guarded in evict_all Christian König
2025-07-16 16:04 ` [PATCH 7/7] drm/xe: remove workaround for TTM internals Christian König
2025-07-22 12:36 ` Switching over to GEM refcounts and a bunch of cleanups Thomas Hellström
2025-08-04 12:24   ` Christian König
2025-08-21 12:30     ` Christian König
2025-08-21 14:06       ` Thomas Hellström
2025-08-21 14:59         ` Christian König
2025-08-22  8:51           ` Thomas Hellström [this message]
2025-09-08 15:16             ` Christian König
2025-09-09  7:21               ` Thomas Hellström
2025-09-09  8:13                 ` Christian König
2025-09-09  8:36                   ` Thomas Hellström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c14a919a352742d6344f34455de6aa1e535ecc70.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox